Apache Flink is a stream-processing framework that handles both real-time (unbounded) and batch (bounded) workloads. Chapter 3 covers core topics: stream vs. batch processing, handling late data, aggregations, joins, and advanced operations such as custom operators. Test your knowledge with these multiple-choice questions (MCQs), which suit beginners and experts alike.
Stream Processing vs Batch Processing in Flink
1. What is the key characteristic of stream processing in Flink? a) Processes data as it arrives b) Processes data in fixed intervals c) Only supports static datasets d) Requires batch files as input
2. Which processing mode is more suitable for real-time analytics in Flink? a) Batch processing b) Stream processing c) File processing d) None of the above
3. In Flink, what defines batch processing? a) Continuous data processing b) Processing finite datasets c) Event-driven architecture d) Processing in real-time
4. Stream processing is primarily used for: a) Historical analysis b) Processing static data c) Real-time data processing d) Processing outdated datasets
5. Which Flink feature is ideal for combining stream and batch workloads? a) Hybrid architecture b) Flink SQL c) Unified data processing d) Task chaining
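As a rough illustration of the distinction these questions draw, the sketch below applies the same logic to a finite list (batch style, one final result) and to an unbounded generator (stream style, one updated result per arriving element). It is plain Python rather than the Flink API; `running_total`, `batch_total`, and `sensor_readings` are hypothetical names chosen for this example.

```python
from itertools import count, islice

def running_total(events):
    """Stream style: emit an updated total as each element arrives."""
    total = 0
    for value in events:
        total += value
        yield total

def batch_total(dataset):
    """Batch style: the input is finite, so only the final total is kept."""
    result = 0
    for result in running_total(dataset):
        pass
    return result

def sensor_readings():
    """Hypothetical unbounded source: 1, 2, 3, ... with no end."""
    return count(1)

# Bounded input: a single answer once all data has been read.
print(batch_total([1, 2, 3, 4]))

# Unbounded input: results are available immediately; we only peek at five.
print(list(islice(running_total(sensor_readings()), 5)))
```

Both paths reuse `running_total`, which loosely mirrors the idea of treating a batch as a stream that happens to end.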
Handling Late Data and Out-of-Order Events
6. How does Flink identify late data in stream processing? a) By discarding it b) By using watermarks c) By reprocessing the entire dataset d) By pausing the stream
7. What are watermarks in Flink? a) Indicators of event time progress b) Markers for state partitioning c) Metrics for performance monitoring d) Identifiers for batch boundaries
8. Which configuration helps in handling out-of-order events in Flink? a) Keyed streams b) Sliding windows c) Event time characteristics d) Parallelism adjustment
9. Late data can be processed using: a) Timeout settings b) Allowed lateness c) Timestamp overrides d) Stateful operators
10. An event counts as late in Flink, and is dropped by default once its window has closed, when: a) Watermarks pass their event time b) Parallelism is reduced c) Checkpointing is enabled d) Backpressure occurs
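The interplay of watermarks and allowed lateness can be sketched with a small model. This is plain Python, not Flink code. It assumes tumbling event-time windows, a watermark that trails the highest timestamp seen by a fixed bound, and a drop rule under which an event is discarded once the watermark has reached the last timestamp of its window plus the allowed lateness. All function names are hypothetical, and the model is a simplified approximation of what a window operator does.

```python
def window_end(event_ts, size):
    """Exclusive end of the tumbling window that contains event_ts."""
    return event_ts - (event_ts % size) + size

def bounded_watermark(max_seen_ts, max_out_of_orderness):
    """Watermark that lags the highest timestamp seen so far by a fixed bound."""
    return max_seen_ts - max_out_of_orderness - 1

def is_dropped(event_ts, watermark, size, allowed_lateness=0):
    """True if the event's window is already past its cleanup time."""
    last_ts_in_window = window_end(event_ts, size) - 1
    return last_ts_in_window + allowed_lateness <= watermark

# Assumed scenario: 10 ms windows, events up to 5 ms out of order,
# and the highest timestamp seen so far is 25.
wm = bounded_watermark(max_seen_ts=25, max_out_of_orderness=5)

# An event stamped 7 belongs to window [0, 10), which the watermark has passed.
print(is_dropped(7, wm, size=10))                       # no allowed lateness
print(is_dropped(7, wm, size=10, allowed_lateness=15))  # window kept open longer

# An event stamped 22 belongs to window [20, 30), which is still open.
print(is_dropped(22, wm, size=10))
```

Raising `allowed_lateness` trades extra state retention for fewer dropped events, which is the choice the questions above point at.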
Aggregations, Joins, and Stateful Operations
11. What is a keyed stream in Flink used for? a) Stateless operations b) Partitioning based on a key c) Real-time data visualization d) Scheduling tasks
12. Aggregations in Flink work on: a) Non-keyed streams only b) Both keyed and non-keyed streams c) Source streams exclusively d) Batch data only
13. Which operation joins elements of two keyed streams whose timestamps lie within a bounded time range of each other? a) Merge b) CoGroup c) Interval join d) Reduce
14. What is the purpose of stateful operations in Flink? a) Storing intermediate results b) Partitioning data c) Discarding late events d) Generating watermarks
15. Which is a valid aggregation in Flink? a) Count b) Collect c) Sample d) Snapshot
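To make the interval-join idea concrete, here is a minimal sketch in plain Python (not the Flink API). It pairs events from two keyed streams when they share a key and the right-hand timestamp falls within `[left_ts + lower, left_ts + upper]`. The event tuples, the `interval_join` function, and the sample data are all hypothetical. A real streaming engine would buffer events in keyed state rather than loop over complete lists.

```python
from collections import defaultdict

def interval_join(left, right, lower, upper):
    """Join (key, ts, value) events that share a key and satisfy
    left_ts + lower <= right_ts <= left_ts + upper."""
    # Index the right-hand events by key, much as keyed state would.
    right_by_key = defaultdict(list)
    for key, ts, value in right:
        right_by_key[key].append((ts, value))

    matches = []
    for key, ts, value in left:
        for r_ts, r_value in right_by_key[key]:
            if ts + lower <= r_ts <= ts + upper:
                matches.append((key, value, r_value))
    return matches

# Hypothetical data: (key, event timestamp, payload).
orders = [("u1", 100, "order-A"), ("u2", 105, "order-B")]
payments = [("u1", 103, "pay-1"), ("u1", 120, "pay-2"), ("u2", 104, "pay-3")]

# Match payments stamped from 2 units before to 5 units after each order.
print(interval_join(orders, payments, lower=-2, upper=5))
```

Here `pay-2` is left unmatched because it arrives 20 units after `order-A`, outside the interval.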
Custom Operators and User-Defined Functions
16. What is the main purpose of a user-defined function (UDF) in Flink? a) Optimizing built-in operators b) Implementing custom logic c) Managing watermarks d) Scheduling tasks
17. UDFs in Flink can be: a) Stateless only b) Stateful or stateless c) Used for batch processing only d) Limited to Java
18. Which interface is used for defining a FlatMap function in Flink? a) FlatMapFunction b) MapFunction c) ReduceFunction d) AggregateFunction
19. What is the role of custom operators in Flink? a) Managing clusters b) Enhancing state management c) Enabling user-defined processing logic d) Performing windowing operations
20. How are UDFs implemented in Flink? a) As abstract classes b) As anonymous functions c) By implementing specific interfaces d) By using key-value pairs
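The pattern behind these questions, custom logic supplied by implementing a function interface, can be sketched without any Flink dependency. In the example below, `FlatMapFunction` is a local stand-in defined in the snippet itself, not a class imported from Flink, and `SplitWords`, `Deduplicate`, and `apply_flat_map` are hypothetical names. One implementation is stateless and the other keeps state between calls.

```python
from abc import ABC, abstractmethod

class FlatMapFunction(ABC):
    """Local stand-in for a flat-map interface (not imported from Flink)."""

    @abstractmethod
    def flat_map(self, value):
        """Yield zero, one, or many outputs for a single input."""

class SplitWords(FlatMapFunction):
    """Stateless UDF: one line in, one output per word."""

    def flat_map(self, value):
        for word in value.split():
            yield word.lower()

class Deduplicate(FlatMapFunction):
    """Stateful UDF: remembers earlier inputs and emits only first occurrences."""

    def __init__(self):
        self.seen = set()

    def flat_map(self, value):
        if value not in self.seen:
            self.seen.add(value)
            yield value

def apply_flat_map(function, stream):
    """Run a flat-map function over every element of an iterable."""
    for element in stream:
        yield from function.flat_map(element)

lines = ["Flink streams", "", "flink state"]
words = list(apply_flat_map(SplitWords(), lines))
print(words)
print(list(apply_flat_map(Deduplicate(), words)))
```

The empty line produces no output at all, which is what separates a flat map from a one-to-one map.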
Partitioning and Parallelism
21. Partitioning in Flink is primarily used for: a) Watermark generation b) Data redistribution c) Reducing backpressure d) Time-based processing
22. The parallelism of an operator in a Flink job corresponds to: a) The cluster size b) Number of operators c) Degree of data partitioning d) Number of network buffers
23. The keyBy operation in Flink is used for: a) Sorting streams b) Grouping streams by a key c) Generating watermarks d) Partitioning data across jobs
24. When consecutive operators have different parallelism and no partitioner is specified, which partitioning strategy does Flink apply by default? a) Shuffle b) Keyed c) Hash d) Round-robin
25. Parallelism in Flink refers to: a) Task distribution across nodes b) Data aggregation across partitions c) Real-time batch processing d) Stream synchronization
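The difference between key-based and round-robin distribution can be sketched as follows. This is a simplified model in plain Python, not Flink's implementation. It assumes keys are first mapped to a fixed number of key groups and each key group is then mapped to a parallel subtask. `zlib.crc32` stands in for the hash function (an assumption made so that results are stable across Python runs), the default of 128 key groups is likewise an assumption, and the round-robin model starts at subtask 0 for simplicity.

```python
import zlib
from itertools import cycle, islice

def key_group(key, max_parallelism):
    """Map a key to a key group using a stable stand-in hash."""
    return zlib.crc32(key.encode("utf-8")) % max_parallelism

def subtask_for_key(key, parallelism, max_parallelism=128):
    """Map a key's key group onto one of `parallelism` subtasks."""
    return key_group(key, max_parallelism) * parallelism // max_parallelism

def round_robin(num_records, parallelism):
    """Assign records to subtasks in turn, ignoring their content."""
    return list(islice(cycle(range(parallelism)), num_records))

# Keyed partitioning: every record with the same key reaches the same subtask,
# which is what makes per-key state possible.
for user in ["user-1", "user-2", "user-1"]:
    print(user, "-> subtask", subtask_for_key(user, parallelism=4))

# Round-robin: load is spread evenly, but records with the same key are scattered.
print(round_robin(num_records=5, parallelism=3))
```

Changing `parallelism` while keeping `max_parallelism` fixed only changes which subtask owns each key group, which hints at why keyed state can be redistributed when a job is rescaled.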
Answers Table
QNo | Answer
1 | a) Processes data as it arrives
2 | b) Stream processing
3 | b) Processing finite datasets
4 | c) Real-time data processing
5 | c) Unified data processing
6 | b) By using watermarks
7 | a) Indicators of event time progress
8 | c) Event time characteristics
9 | b) Allowed lateness
10 | a) Watermarks pass their event time
11 | b) Partitioning based on a key
12 | b) Both keyed and non-keyed streams
13 | c) Interval join
14 | a) Storing intermediate results
15 | a) Count
16 | b) Implementing custom logic
17 | b) Stateful or stateless
18 | a) FlatMapFunction
19 | c) Enabling user-defined processing logic
20 | c) By implementing specific interfaces
21 | b) Data redistribution
22 | c) Degree of data partitioning
23 | b) Grouping streams by a key
24 | d) Round-robin
25 | a) Task distribution across nodes
Stream Processing vs Batch Processing in Flink
26. Which programming model does Flink use for batch processing? a) MapReduce b) Bulk synchronous parallel c) Stream model d) Parallel loop model
Handling Late Data and Out-of-Order Events
27. In Flink, out-of-order events are primarily addressed using: a) Event time and watermarks b) Processing time windows c) System clock synchronization d) Batch reprocessing
Aggregations, Joins, and Stateful Operations
28. What is a window join in Flink? a) Joining two streams based on the same timestamp b) Joining two streams within a specified time window c) Joining static datasets d) Joining streams without keys
Custom Operators and User-Defined Functions
29. Which annotation is used to mark a UDF as rich in Flink? a) @RichFunction b) @RichUDF c) No specific annotation d) @CustomFunction
Partitioning and Parallelism
30. Rescaling in Flink helps to: a) Dynamically adjust parallelism during runtime b) Partition streams by key c) Reduce checkpoint intervals d) Manage memory backpressure
Updated Answers Table
QNo | Answer
26 | c) Stream model
27 | a) Event time and watermarks
28 | b) Joining two streams within a specified time window