Scenario-Based MCQs on Cassandra

  • Case Based
  • Query Based

Case Based

These scenario-based MCQs help in testing your practical understanding of Cassandra in real-world environments. By diving into use cases and problem-solving scenarios, you will learn how to apply Cassandra’s features and best practices effectively. The answers are provided at the end for self-assessment.


1. Scenario: High Write Throughput

You are working for an e-commerce company that has millions of users placing orders every second. The order data needs to be stored in a way that can handle high write throughput and allow fast retrieval.

What feature of Cassandra will be most beneficial for your use case?
a) Secondary indexing
b) Horizontal scalability
c) Complex JOIN operations
d) Data consistency across nodes

2. Scenario: Real-Time Analytics

Your company needs to provide real-time analytics of customer interactions on a website. The platform handles millions of events every minute and you need a system that can quickly process and store these events.

Which feature of Cassandra will help in handling real-time analytics?
a) High availability
b) Data compression
c) Low latency write operations
d) Advanced query optimization

3. Scenario: Handling Time-Series Data in IoT

You’re working with a company in the IoT space that collects data from millions of devices every second, including temperature, pressure, and humidity readings. The data needs to be stored in a time-series format.

Which Cassandra feature will help store time-series data efficiently?
a) Composite primary keys
b) Complex joins
c) Partitioning by timestamp
d) Data denormalization

4. Scenario: Bank Transaction Data

You are tasked with implementing a system for banking transaction data, which requires both fast writes and reads to allow real-time access to transaction history.

What Cassandra feature is essential for ensuring fast access to the transaction data?
a) ACID compliance
b) High scalability and write-heavy optimization
c) Complex relational joins
d) Row-level locking

5. Scenario: Social Media Feed

You need to design a system to store user posts, likes, and comments for a social media platform. The system must handle frequent updates and support real-time retrieval of recent posts for a specific user.

Which of the following should be used as the partition key in Cassandra?
a) Post ID
b) User ID
c) Timestamp
d) Post content

6. Scenario: Multi-Data Center Replication

Your company has a global presence and needs to ensure that users in different regions can read and write data with low latency, even if one data center goes down.

Which Cassandra feature allows you to achieve this?
a) Single-node architecture
b) Peer-to-peer distributed architecture
c) Multi-data center replication
d) Master-slave architecture

7. Scenario: E-Commerce Product Catalog

An e-commerce site needs to store a product catalog that should be available globally and should be able to scale as the catalog grows with thousands of products being added frequently.

What is the most suitable way to model this data in Cassandra?
a) Denormalize product data across multiple tables
b) Store each product in a separate database
c) Normalize the product data into multiple tables
d) Store product data in a single table with no partitioning

8. Scenario: Scaling Cassandra Cluster

Your company’s database is growing and you are facing performance degradation. You need to scale your Cassandra cluster to accommodate more data without losing performance.

Which of the following should you do?
a) Add more nodes to the cluster
b) Move to a relational database
c) Use more complex queries
d) Normalize the data further

9. Scenario: Handling Hotspots

Your Cassandra cluster is experiencing hotspots, where some nodes are much busier than others due to uneven data distribution.

What is the most likely cause of this issue?
a) Incorrect partition key selection
b) Too many secondary indexes
c) Use of multiple clustering columns
d) Use of large data types

10. Scenario: Data Backup and Recovery

You need to ensure that your Cassandra cluster data is safe and can be recovered in case of failure. What is the recommended way to back up your Cassandra data?

a) Take snapshots of the data
b) Rely on the operating system’s backup tools
c) Use a manual data dump with SQL commands
d) Back up only the configuration files

11. Scenario: Performance Tuning

Your application is experiencing slower read performance as the data volume grows in Cassandra. You want to optimize the system for faster reads.

What Cassandra feature would help improve the read performance?
a) Use of lightweight transactions
b) Denormalizing data
c) Use of multiple secondary indexes
d) Optimizing queries with partition keys

12. Scenario: Distributed Systems

Your application is deployed in a globally distributed environment. You need to ensure that the data is available and consistent across multiple geographical regions.

Which Cassandra feature helps manage this?
a) Consistency level tuning
b) Partition key design
c) Using a single data center
d) Relational data model

13. Scenario: Data Migration

You have a large dataset stored in another database system and need to migrate it to Cassandra. The data includes millions of records with complex relationships.

What is the best approach for migrating this data to Cassandra?
a) Normalize the data first
b) Use denormalization and design based on query patterns
c) Migrate it as-is without modification
d) Use SQL queries for direct migration

14. Scenario: Large-Scale Online Service

You are managing a service that handles millions of customer records. The data must be available with very low latency for customer support representatives, regardless of the number of active users.

What is the key factor to consider when modeling this data in Cassandra?
a) Optimizing for complex queries
b) Ensuring horizontal scalability
c) Using ACID transactions
d) Reducing data redundancy

15. Scenario: High Availability in Cassandra

Your organization requires high availability for an application that must be operational 24/7 with no downtime.

Which feature of Cassandra ensures high availability?
a) Single-node architecture
b) Multi-data center replication
c) Complex indexing
d) Transactional integrity

16. Scenario: Large Product Catalog Search

You are working for an online marketplace that allows users to search for products from thousands of listings. The product search should be fast, even as the catalog expands rapidly.

What Cassandra feature will ensure efficient searching across a large product catalog?
a) Using a dedicated search engine like Elasticsearch
b) Using secondary indexes on product attributes
c) Normalizing product information
d) Partitioning data based on categories

17. Scenario: Sensor Data for Smart Home

You’re building a system to store data from thousands of smart home devices, including temperature, humidity, and security system statuses. The data needs to be stored efficiently and queried in real-time.

What is the best data model to use for this scenario?
a) Relational model with JOINs
b) Denormalized data model with a focus on time-series
c) Graph database model
d) Complex relational tables

18. Scenario: Data Consistency

Your Cassandra cluster is used to store session data for users across multiple servers. Eventual consistency is acceptable for the session data, as long as it remains available for reads and writes during high-traffic periods.

Which consistency level should you use for this scenario?
a) QUORUM
b) ONE
c) ALL
d) LOCAL_QUORUM

19. Scenario: Reducing Latency in Writes

Your application needs to ensure that writes are as fast as possible while maintaining good availability. You don’t need to worry about immediate consistency across replicas.

Which consistency level should you choose to minimize write latency?
a) ONE
b) ALL
c) LOCAL_QUORUM
d) QUORUM

20. Scenario: Real-Time User Activity Tracking

Your company needs to track user activity in real-time across multiple devices. The data needs to be available for analytics and reporting within seconds.

How should you design the data model in Cassandra?
a) Use a wide-column design with denormalized data
b) Use complex SQL joins for real-time queries
c) Normalize the data into relational tables
d) Use a single large partition for all activities

21. Scenario: Data Replication Strategy

You’re managing a globally distributed system with Cassandra. The company requires that data be replicated in multiple regions for redundancy, but the replication must be fast.

Which strategy should you use for efficient data replication?
a) Multi-data center replication with minimal consistency
b) A single replication factor shared by all data centers
c) Synchronous replication in a single data center
d) Manual replication between data centers

22. Scenario: Handling Customer Feedback

You need to store customer feedback and ensure that it can be retrieved quickly by customer service agents. The feedback data includes timestamps and customer IDs.

Which should be used as the partition key in Cassandra?
a) Customer ID
b) Timestamp
c) Feedback content
d) Feedback ID

23. Scenario: Fraud Detection System

You are tasked with implementing a fraud detection system for a financial institution. The system must handle real-time processing of large transaction volumes and detect fraud patterns quickly.

What Cassandra feature will best help with this requirement?
a) Real-time analytics with low-latency writes
b) Advanced relational data modeling
c) Complex SQL queries with JOINs
d) Single-node configuration

24. Scenario: Event Logging

Your company needs to log events from its website in real-time. The log data needs to be stored in a way that allows efficient retrieval and analysis over time.

How should you model the event data in Cassandra?
a) Use a time-series data model with partitioning based on timestamp
b) Use a normalized model for log data
c) Store logs in a single table with no partitioning
d) Use complex JOINs to link event logs

25. Scenario: Dealing with Hot Partitions

Your Cassandra application is facing issues with data being unevenly distributed across the cluster, leading to some nodes being over-utilized.

What should you do to resolve the issue?
a) Change the partition key design
b) Add more nodes to the cluster without addressing the partition key
c) Reduce the replication factor
d) Use relational joins to balance the load

26. Scenario: Storing User Preferences

You are tasked with storing user preferences for an online platform. These preferences need to be updated frequently, and users may have different categories of preferences (e.g., content, notifications).

How should you model this data in Cassandra?
a) Use one partition for each user’s preferences
b) Store preferences in multiple tables based on category
c) Use a single wide table with no partitioning
d) Use complex JOINs to link preferences

27. Scenario: Cross-Region Access

Your application needs to serve users from different parts of the world and ensure fast read and write operations, even if one region’s data center is down.

What Cassandra feature allows you to achieve this?
a) Peer-to-peer architecture
b) Multi-data center replication
c) Single-node architecture
d) Centralized data replication

28. Scenario: Handling User Sessions

You need to store session data for millions of users in an application. The session data must be available for quick retrieval and can be updated frequently.

Which Cassandra feature will help ensure fast retrieval and updates?
a) Use of secondary indexes
b) Efficient partition key design based on user ID
c) Normalized data model for session management
d) Use of complex relational queries

29. Scenario: Managing Inventory Data

Your company needs to track product inventory across multiple stores. The inventory levels should be updated in real-time and be accessible for reporting and analytics.

What is the best data modeling approach in Cassandra?
a) Use partitioning by product ID and store location
b) Normalize the inventory data
c) Use complex JOINs to link inventory data
d) Store all data in a single table

30. Scenario: Scaling During Peak Traffic

During holiday seasons, your e-commerce platform experiences a surge in traffic. You need to ensure that the system can handle increased read and write loads without crashing.

What Cassandra feature should be utilized to ensure scalability during peak traffic?
a) Increased hardware resources for individual nodes
b) Adding more nodes to the cluster to scale horizontally
c) Using complex queries for traffic load balancing
d) Reducing data volume by denormalizing all data


Answer Key

Qno  Answer
1    b) Horizontal scalability
2    c) Low latency write operations
3    c) Partitioning by timestamp
4    b) High scalability and write-heavy optimization
5    b) User ID
6    c) Multi-data center replication
7    a) Denormalize product data across multiple tables
8    a) Add more nodes to the cluster
9    a) Incorrect partition key selection
10   a) Take snapshots of the data
11   d) Optimizing queries with partition keys
12   a) Consistency level tuning
13   b) Use denormalization and design based on query patterns
14   b) Ensuring horizontal scalability
15   b) Multi-data center replication
16   b) Using secondary indexes on product attributes
17   b) Denormalized data model with a focus on time-series
18   b) ONE
19   a) ONE
20   a) Use a wide-column design with denormalized data
21   a) Multi-data center replication with minimal consistency
22   a) Customer ID
23   a) Real-time analytics with low-latency writes
24   a) Use a time-series data model with partitioning based on timestamp
25   a) Change the partition key design
26   a) Use one partition for each user’s preferences
27   b) Multi-data center replication
28   b) Efficient partition key design based on user ID
29   a) Use partitioning by product ID and store location
30   b) Adding more nodes to the cluster to scale horizontally
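
Answers 9 and 25 both trace hotspots back to the partition key. The following is a minimal Python sketch of that effect, not Cassandra's actual placement logic: it assumes a hypothetical 6-node cluster, uses an MD5 hash with a modulo as a stand-in for Cassandra's partitioner and token ring, and invents a workload of 60,000 writes from 10,000 users in two countries. The names node_for and load_per_node are illustrative only.

```python
import hashlib
from collections import Counter


def node_for(partition_key: str, num_nodes: int) -> int:
    """Map a partition key to a node index (simplified stand-in for a token ring)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def load_per_node(partition_keys, num_nodes):
    """Count how many writes land on each node for the given partition keys."""
    counts = Counter(node_for(key, num_nodes) for key in partition_keys)
    return [counts.get(node, 0) for node in range(num_nodes)]


NUM_NODES = 6  # assumed cluster size for this sketch

# Hypothetical workload: each write carries a user id and a country code.
# Nine out of ten writes come from "US", the rest from "CA".
writes = [(f"user-{i % 10_000}", "US" if i % 10 else "CA") for i in range(60_000)]

# High-cardinality key: writes spread over all nodes.
by_user = load_per_node([user for user, _ in writes], NUM_NODES)
# Low-cardinality key: only two distinct partitions exist, so at most two nodes do all the work.
by_country = load_per_node([country for _, country in writes], NUM_NODES)

print("partitioned by user id:", by_user)
print("partitioned by country:", by_country)
```

Adding nodes does not help the second layout, because two partitions can occupy at most two nodes no matter how large the cluster is. That is why answer 25 changes the partition key rather than the cluster size.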

Query Based

Here are 20 query-based scenario questions related to Cassandra. These scenarios will test your understanding of how to interact with Cassandra using CQL (Cassandra Query Language) for practical use cases.


1. Scenario: User Information Retrieval

You have a table users that stores user details, including user_id, first_name, last_name, and email. You want to retrieve the details of a user with a specific user_id.

What CQL query would you use to retrieve the user details for user_id = 12345?
a) SELECT * FROM users WHERE user_id = 12345;
b) SELECT * FROM users WHERE email = 12345;
c) SELECT user_id, first_name, last_name FROM users;
d) SELECT * FROM users WHERE user_id IN (12345);


2. Scenario: Inserting Data into a Table

You need to insert a new record into the orders table, which includes order_id, user_id, product_id, and order_date.

What is the correct CQL query to insert a new order with order_id = 101, user_id = 202, product_id = 303, and order_date = '2024-12-01'?
a) INSERT INTO orders (order_id, user_id, product_id, order_date) VALUES (101, 202, 303, '2024-12-01');
b) INSERT INTO orders (order_id, product_id, order_date) VALUES (101, 303, '2024-12-01');
c) INSERT INTO orders (user_id, product_id, order_date) VALUES (202, 303, '2024-12-01');
d) INSERT INTO orders VALUES (101, 202, 303, '2024-12-01');


3. Scenario: Updating Data

You need to update the email address of the user with user_id = 12345 in the users table.

What CQL query would you use?
a) UPDATE users SET email = 'newemail@example.com' WHERE user_id = 12345;
b) UPDATE users SET first_name = 'John', email = 'newemail@example.com';
c) MODIFY users SET email = 'newemail@example.com' WHERE user_id = 12345;
d) UPDATE users SET email = 'newemail@example.com' WHERE first_name = 'John';


4. Scenario: Deleting Data

You want to delete the user with user_id = 12345 from the users table.

Which CQL query would you use?
a) DELETE FROM users WHERE user_id = 12345;
b) REMOVE FROM users WHERE user_id = 12345;
c) DELETE * FROM users WHERE user_id = 12345;
d) DELETE users WHERE user_id = 12345;


5. Scenario: Counting Rows

You want to count the total number of orders in the orders table.

What CQL query will give you the count?
a) SELECT COUNT(*) FROM orders;
b) SELECT TOTAL COUNT FROM orders;
c) SELECT COUNT(*) FROM orders WHERE order_id;
d) COUNT * FROM orders;


6. Scenario: Querying with WHERE Clause

You need to retrieve all orders where user_id = 202 from the orders table. The table is partitioned by user_id and clustered by order_date.

What CQL query would you use to retrieve all orders for user_id = 202?
a) SELECT * FROM orders WHERE user_id = 202;
b) SELECT * FROM orders WHERE user_id = 202 AND order_date > '2024-01-01';
c) SELECT * FROM orders WHERE order_id = 101 AND user_id = 202;
d) SELECT * FROM orders WHERE order_date > '2024-01-01';


7. Scenario: Querying with LIKE Operator

You need to retrieve all users whose first_name starts with 'J'. Assuming first_name has a SASI index (which the LIKE operator requires in Cassandra), how would you write the query?
a) SELECT * FROM users WHERE first_name LIKE 'J%';
b) SELECT * FROM users WHERE first_name LIKE '%J';
c) SELECT * FROM users WHERE first_name LIKE '%John%';
d) SELECT * FROM users WHERE first_name LIKE 'J_';


8. Scenario: Using ALLOW FILTERING

You want to retrieve all orders for user_id = 202 in the orders table, even though in this table user_id is neither part of the primary key nor indexed.

What query will work in this scenario?
a) SELECT * FROM orders WHERE user_id = 202 ALLOW FILTERING;
b) SELECT * FROM orders WHERE user_id = 202;
c) SELECT * FROM orders WHERE user_id = 202 INDEXED;
d) SELECT * FROM orders WHERE user_id = 202 AND order_id > 100;


9. Scenario: Creating a Secondary Index

You need to create a secondary index on the email column in the users table, without giving the index an explicit name. What CQL query would you use?
a) CREATE INDEX ON users(email);
b) CREATE SECONDARY INDEX users(email);
c) CREATE INDEX email_index ON users(email);
d) CREATE INDEX email ON users;


10. Scenario: Using IN Clause

You need to retrieve details for users with user_id values 101, 102, and 103 from the users table. What query would you use?
a) SELECT * FROM users WHERE user_id IN (101, 102, 103);
b) SELECT * FROM users WHERE user_id = 101 OR user_id = 102 OR user_id = 103;
c) SELECT * FROM users WHERE user_id = 101, 102, 103;
d) SELECT * FROM users WHERE user_id IN (101 AND 102 AND 103);


11. Scenario: Using TTL for Expiring Data

You want to insert a session record that expires in 1 hour (3600 seconds) in the sessions table, which has a session_id, user_id, and session_data. How would you write this query?

a) INSERT INTO sessions (session_id, user_id, session_data) VALUES ('session1', 123, 'data') USING TTL 3600;
b) INSERT INTO sessions (session_id, user_id, session_data) VALUES ('session1', 123, 'data') TTL 3600;
c) INSERT INTO sessions (session_id, user_id, session_data) VALUES ('session1', 123, 'data') WITH TTL 3600;
d) INSERT INTO sessions (session_id, user_id, session_data) VALUES ('session1', 123, 'data') USING TTL 360;


12. Scenario: Querying with LIMIT

You need to retrieve the top 5 most recent orders for user_id = 202 from the orders table, which is clustered by order_date.

What query would you use?
a) SELECT * FROM orders WHERE user_id = 202 LIMIT 5;
b) SELECT * FROM orders WHERE user_id = 202 ORDER BY order_date DESC LIMIT 5;
c) SELECT * FROM orders WHERE user_id = 202 AND order_date > '2024-01-01' LIMIT 5;
d) SELECT * FROM orders LIMIT 5;


13. Scenario: Querying a Large Dataset

You have a table events partitioned by event_date and clustered by event_id. You need to retrieve all events for a specific event_date, but the dataset is very large.

What query will you use?
a) SELECT * FROM events WHERE event_date = '2024-12-01';
b) SELECT * FROM events WHERE event_date = '2024-12-01' AND event_id > 100;
c) SELECT * FROM events WHERE event_date = '2024-12-01' ALLOW FILTERING;
d) SELECT * FROM events WHERE event_date = '2024-12-01' LIMIT 100;


14. Scenario: Using BATCH for Multiple Inserts

You want to insert multiple records into the orders table in a single batch. What CQL query would you use?
a) BEGIN BATCH INSERT INTO orders (order_id, user_id) VALUES (101, 202); APPLY BATCH;
b) BEGIN BATCH INSERT INTO orders (order_id, user_id) VALUES (101, 202), (102, 203); APPLY BATCH;
c) BEGIN BATCH INSERT INTO orders (order_id, user_id) VALUES (101, 202); END BATCH;
d) BEGIN BATCH VALUES (101, 202), (102, 203); APPLY BATCH;


15. Scenario: Retrieving Data Based on Multiple Columns

You need to query the orders table, partitioned by user_id and clustered by order_date. Retrieve the order details for user_id = 202 and a specific order_date = '2024-12-01'.

What query would you write?
a) SELECT * FROM orders WHERE user_id = 202 AND order_date = '2024-12-01';
b) SELECT * FROM orders WHERE user_id = 202 AND order_date = '2024-12-01' LIMIT 10;
c) SELECT * FROM orders WHERE order_date = '2024-12-01' AND user_id = 202;
d) SELECT * FROM orders WHERE user_id = 202 OR order_date = '2024-12-01';


16. Scenario: Dropping a Column

You need to remove the email column from the users table.

What CQL query would you use?
a) ALTER TABLE users DROP COLUMN email;
b) DROP COLUMN email FROM users;
c) DELETE COLUMN email FROM users;
d) ALTER TABLE users REMOVE email;


17. Scenario: Using ALLOW FILTERING

You are trying to query the users table with WHERE on first_name, but the column is not part of the primary key or indexed.

Which query will work?
a) SELECT * FROM users WHERE first_name = 'John' ALLOW FILTERING;
b) SELECT * FROM users WHERE first_name = 'John';
c) SELECT * FROM users WHERE first_name = 'John' INDEXED;
d) SELECT * FROM users WHERE first_name LIKE 'John';


18. Scenario: Querying Collections

You have a table user_profiles with a user_id and a favorites collection column (list). You need to retrieve all the favorites for user_id = 101.

Which query would you use?
a) SELECT favorites FROM user_profiles WHERE user_id = 101;
b) SELECT * FROM user_profiles WHERE user_id = 101 AND favorites IN ('item1');
c) SELECT favorites FROM user_profiles WHERE user_id = 101 AND favorites LIKE 'item1';
d) SELECT favorites FROM user_profiles WHERE favorites IN ('item1');


19. Scenario: Creating a Table

You need to create a table user_login with columns: user_id, login_time, and ip_address. The table should be partitioned by user_id and clustered by login_time. What CQL query will you use?
a) CREATE TABLE user_login (user_id INT, login_time TIMESTAMP, ip_address TEXT, PRIMARY KEY (user_id, login_time));
b) CREATE TABLE user_login (user_id INT, login_time TIMESTAMP, ip_address TEXT, PRIMARY KEY (login_time, user_id));
c) CREATE TABLE user_login (user_id INT, login_time TIMESTAMP, ip_address TEXT, PRIMARY KEY (login_time));
d) CREATE TABLE user_login (user_id INT, login_time TIMESTAMP, ip_address TEXT, PRIMARY KEY (user_id));


20. Scenario: Query with DISTINCT

You need to retrieve distinct user_id values from the orders table. What CQL query would you use?
a) SELECT DISTINCT user_id FROM orders;
b) SELECT user_id FROM orders WHERE DISTINCT;
c) SELECT DISTINCT user_id FROM orders WHERE user_id;
d) SELECT DISTINCT FROM orders (user_id);

Here are the answers to the query-based scenario questions in table format:

Qno  Answer
1    a) SELECT * FROM users WHERE user_id = 12345;
2    a) INSERT INTO orders (order_id, user_id, product_id, order_date) VALUES (101, 202, 303, '2024-12-01');
3    a) UPDATE users SET email = 'newemail@example.com' WHERE user_id = 12345;
4    a) DELETE FROM users WHERE user_id = 12345;
5    a) SELECT COUNT(*) FROM orders;
6    a) SELECT * FROM orders WHERE user_id = 202;
7    a) SELECT * FROM users WHERE first_name LIKE 'J%';
8    a) SELECT * FROM orders WHERE user_id = 202 ALLOW FILTERING;
9    a) CREATE INDEX ON users(email);
10   a) SELECT * FROM users WHERE user_id IN (101, 102, 103);
11   a) INSERT INTO sessions (session_id, user_id, session_data) VALUES ('session1', 123, 'data') USING TTL 3600;
12   b) SELECT * FROM orders WHERE user_id = 202 ORDER BY order_date DESC LIMIT 5;
13   a) SELECT * FROM events WHERE event_date = '2024-12-01';
14   a) BEGIN BATCH INSERT INTO orders (order_id, user_id) VALUES (101, 202); APPLY BATCH;
15   a) SELECT * FROM orders WHERE user_id = 202 AND order_date = '2024-12-01';
16   a) ALTER TABLE users DROP COLUMN email;
17   a) SELECT * FROM users WHERE first_name = 'John' ALLOW FILTERING;
18   a) SELECT favorites FROM user_profiles WHERE user_id = 101;
19   a) CREATE TABLE user_login (user_id INT, login_time TIMESTAMP, ip_address TEXT, PRIMARY KEY (user_id, login_time));
20   a) SELECT DISTINCT user_id FROM orders;
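
Several of the answers above (6, 12, 15 and 19) rely on the same layout: a table partitioned by user_id and clustered by a date or time column. The sketch below is a toy in-memory model of that layout in Python, written only to show why those queries touch a single partition while a filter on the clustering column alone has to scan everything. The class OrdersTable and its method names are hypothetical, and the model ignores replication, storage format, and everything else a real cluster does.

```python
from datetime import date


class OrdersTable:
    """Toy model of a table with PRIMARY KEY (user_id, order_date)."""

    def __init__(self):
        # partition key (user_id) -> {clustering key (order_date): order_id}
        self._partitions = {}

    def insert(self, user_id, order_date, order_id):
        # Writing the same primary key again overwrites the row (an upsert).
        self._partitions.setdefault(user_id, {})[order_date] = order_id

    def by_user(self, user_id):
        """Like: SELECT * FROM orders WHERE user_id = ?  (one partition, clustering order)."""
        return sorted(self._partitions.get(user_id, {}).items())

    def latest(self, user_id, limit):
        """Like: ... WHERE user_id = ? ORDER BY order_date DESC LIMIT ?"""
        return self.by_user(user_id)[::-1][:limit]

    def on_date(self, user_id, order_date):
        """Like: ... WHERE user_id = ? AND order_date = ?"""
        rows = self._partitions.get(user_id, {})
        return [(order_date, rows[order_date])] if order_date in rows else []

    def scan_by_date(self, order_date):
        """No partition key given: every partition must be visited."""
        return [
            (user_id, rows[order_date])
            for user_id, rows in self._partitions.items()
            if order_date in rows
        ]


orders = OrdersTable()
orders.insert(202, date(2024, 11, 30), 100)
orders.insert(202, date(2024, 12, 1), 101)
orders.insert(202, date(2024, 12, 2), 102)
orders.insert(203, date(2024, 12, 1), 103)

print(orders.by_user(202))                      # all rows in user 202's partition
print(orders.latest(202, 2))                    # two most recent rows for user 202
print(orders.on_date(202, date(2024, 12, 1)))   # one row inside one partition
print(orders.scan_by_date(date(2024, 12, 1)))   # full scan across partitions
```

The first three lookups start from the partition key and read only one user's rows. The last one is the shape of query that Cassandra rejects unless ALLOW FILTERING or an index is used, as in answers 8 and 17.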

Note your answers on a blank sheet, then tally them against the answer keys above and give yourself a score.
