Azure Data Factory (ADF) is a robust cloud-based data integration service designed to simplify complex workflows. This collection of Azure Data Factory MCQs focuses on advanced data flows and transformations, covering aggregations, joins, pivot transformations, schema drift, and integration with Databricks and Spark. These MCQs are essential for mastering ADF’s advanced features and optimizing performance for large-scale workflows.
Chapter 6: Data Flows and Transformations (Advanced)
Topic 1: Aggregation, Join, Union, and Pivot Transformations
Which transformation in Azure Data Factory allows summarizing data across rows? a) Pivot b) Aggregate c) Union d) Join
The Join transformation in ADF is used to: a) Combine data from multiple sources b) Split datasets into smaller partitions c) Filter rows based on conditions d) Generate pivot tables
What is the purpose of a Union transformation in ADF? a) Combining multiple datasets with identical schema b) Filtering data across multiple rows c) Aggregating numerical columns d) Splitting data into multiple outputs
A Pivot transformation in ADF allows: a) Aggregating rows into columns b) Combining datasets horizontally c) Adding new derived columns d) Handling schema drift
Which of the following transformations requires a grouping key? a) Join b) Aggregate c) Pivot d) Union
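ADF Mapping Data Flows execute on Spark, so the Aggregate, Join, Union, and Pivot transformations in the Topic 1 questions above behave much like their Spark counterparts. The PySpark sketch below is purely illustrative (the DataFrames and column names are invented) and is not the code ADF generates:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("adf-transformation-analogies").getOrCreate()

# Hypothetical sales data standing in for two data flow sources.
sales = spark.createDataFrame(
    [("2024-01", "EU", 100.0), ("2024-01", "US", 250.0), ("2024-02", "EU", 80.0)],
    ["month", "region", "amount"],
)
more_sales = spark.createDataFrame(
    [("2024-02", "US", 175.0)], ["month", "region", "amount"]
)
regions = spark.createDataFrame(
    [("EU", "Europe"), ("US", "United States")], ["region", "region_name"]
)

# Aggregate: summarize rows per grouping key (cf. the ADF Aggregate transformation).
totals = sales.groupBy("region").agg(F.sum("amount").alias("total_amount"))

# Join: combine data from multiple sources on a key (cf. the ADF Join transformation).
enriched = sales.join(regions, on="region", how="left")

# Union: stack datasets that share the same schema (cf. the ADF Union transformation).
combined = sales.unionByName(more_sales)

# Pivot: turn row values into columns with an aggregation (cf. the ADF Pivot transformation).
pivoted = sales.groupBy("region").pivot("month").agg(F.sum("amount"))

pivoted.show()
```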
Topic 2: Derived Columns and Conditional Logic
A derived column in Azure Data Factory is: a) An input column passed as-is b) A new column created using expressions c) A duplicate column for validation d) A column used for filtering
To add conditional logic to a data flow, ADF uses: a) SQL queries b) If-Else expressions c) Schema drift handling d) Spark scripts
Which transformation in ADF allows creating new columns based on existing ones? a) Aggregate b) Derived Column c) Union d) Lookup
Conditional expressions in ADF data flows are written in: a) SQL b) Data Flow Expression Language c) Python d) JSON
In ADF, derived columns can be used for: a) Schema validation b) Applying transformations dynamically c) Enabling real-time data flow monitoring d) Creating new attributes based on logic
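In a Mapping Data Flow, a Derived Column transformation is written in the Data Flow Expression Language, where conditional logic typically uses iif(condition, trueValue, falseValue). The hedged PySpark analog below (the column names and thresholds are invented for illustration) shows the same idea: a new attribute computed from existing columns with if-else logic.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("derived-column-sketch").getOrCreate()

orders = spark.createDataFrame(
    [(1, 120.0), (2, 45.0), (3, 300.0)],
    ["order_id", "amount"],
)

# Derived column: a new attribute built from an expression, with
# if-else style conditional logic (cf. iif() in ADF's expression language).
orders = orders.withColumn(
    "order_size",
    F.when(F.col("amount") >= 200, "large")
     .when(F.col("amount") >= 100, "medium")
     .otherwise("small"),
)

orders.show()
```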
Topic 3: Handling Nulls, Expressions, and Schema Drift
In ADF, handling null values is typically done using: a) Conditional splits b) Null-coalescing expressions c) Join transformations d) SQL queries
Schema drift in ADF refers to: a) Data type mismatches b) Changes in the input schema over time c) Missing rows in the dataset d) Misaligned partitions
To handle schema drift, ADF uses: a) Dynamic mappings b) Fixed schemas only c) Hardcoded transformations d) SQL queries
Which function can replace null values in a column? a) replaceNull() b) isNull() c) coalesce() d) handleNull()
ADF expressions are evaluated in: a) Real time b) Batch mode c) Spark clusters exclusively d) The Integration Runtime
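The coalesce() function asked about above returns the first non-null value from a list of values, which is also how Spark's coalesce works; schema drift, by contrast, is handled in ADF by enabling the "Allow schema drift" option and using pattern-based (dynamic) mappings rather than code. The sketch below is a PySpark analogy with invented column names, not ADF's own runtime behavior:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("null-handling-sketch").getOrCreate()

customers = spark.createDataFrame(
    [("a", None), ("b", "IT"), ("c", None)],
    ["customer_id", "department"],
)

# coalesce(): the first non-null value wins; here a literal default fills the gaps.
customers = customers.withColumn(
    "department_clean",
    F.coalesce(F.col("department"), F.lit("unknown")),
)

# A conditional-split style check: flag rows that were originally null.
customers = customers.withColumn("was_null", F.col("department").isNull())

customers.show()
```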
Topic 4: Optimizing Performance for Large-Scale Transformations
Optimizing ADF performance for large datasets involves: a) Using single-threaded execution b) Leveraging partitioning and caching c) Avoiding transformations entirely d) Storing data locally
Which of the following enhances ADF pipeline efficiency? a) Minimizing transformations b) Using Azure Data Lake only c) Avoiding parallel processing d) Over-partitioning data
The performance of a data flow can be optimized by: a) Enabling lazy evaluation b) Adjusting partition sizes c) Using on-premises integration runtimes exclusively d) Avoiding derived columns
Data skew can be reduced in ADF by: a) Increasing the number of partitions evenly b) Decreasing cluster nodes c) Applying static schema mappings d) Disabling caching
Monitoring pipeline performance in ADF is achieved through: a) Azure Monitor b) Data Lake logs c) SQL Profiler d) Power BI dashboards
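In Spark terms, the partitioning and skew questions above come down to how evenly data is spread across partitions. A minimal, illustrative PySpark sketch (the key column and partition count are arbitrary assumptions, not tuning advice):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("partitioning-sketch").getOrCreate()

# One million synthetic rows keyed by a made-up customer_id.
events = spark.range(0, 1_000_000).withColumn("customer_id", F.col("id") % 1000)

# Repartition by a key to spread work evenly across executors.
balanced = events.repartition(64, "customer_id")

# Inspect the resulting layout; heavily uneven partition sizes indicate skew.
print("partitions:", balanced.rdd.getNumPartitions())
```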
Topic 5: Integration with Databricks and Spark for Transformations
Azure Data Factory integrates with Databricks to: a) Store data securely b) Perform advanced transformations with Spark c) Monitor pipeline activities d) Replace derived columns
Spark integration in ADF is beneficial for: a) Real-time database administration b) Processing large-scale distributed datasets c) Static data extraction only d) Simple file transfer
Which cluster type is commonly used in ADF Databricks integration? a) SQL Server clusters b) Apache Spark clusters c) PostgreSQL clusters d) Machine Learning clusters
The key advantage of using Databricks with ADF is: a) Cost reduction b) Scalability and parallelism for big data processing c) Data visualization features d) Easier network management
Integration with Spark in ADF requires: a) A Synapse Workspace b) A Databricks cluster or equivalent compute resource c) SQL Server Management Studio d) On-premises servers
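ADF runs Databricks work through a Databricks Notebook (or Jar/Python) activity inside a pipeline, referencing an Azure Databricks linked service. The Python dictionary below sketches the general shape of such an activity definition; all names, paths, and parameter values are hypothetical placeholders, and the authoritative JSON should come from the ADF authoring UI or documentation rather than this sketch.

```python
# Illustrative shape of an ADF pipeline activity that calls a Databricks notebook.
# Every name, path, and parameter value here is a hypothetical placeholder.
databricks_notebook_activity = {
    "name": "TransformWithDatabricks",
    "type": "DatabricksNotebook",
    "linkedServiceName": {
        "referenceName": "AzureDatabricksLinkedService",  # placeholder linked service
        "type": "LinkedServiceReference",
    },
    "typeProperties": {
        "notebookPath": "/Shared/transformations/advanced_aggregations",  # placeholder path
        "baseParameters": {"run_date": "2024-01-01"},  # values passed to the notebook
    },
}

print(databricks_notebook_activity["typeProperties"]["notebookPath"])
```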
Topic 6: Advanced Use Cases
ADF’s Mapping Data Flows are ideal for: a) Performing complex transformations with minimal code b) Hosting applications c) Storing backup files d) Managing network traffic
Pivot and aggregate transformations are commonly used together for: a) Data cleaning b) Summarizing and restructuring data c) Error logging d) Schema validation
ADF works seamlessly with Spark to: a) Create static schemas b) Enable real-time alerting c) Process batch and real-time data at scale d) Visualize datasets
When integrating with Spark, ADF can handle: a) Large datasets with dynamic transformations b) Static files exclusively c) Only structured data d) Limited data flows
Which advanced transformation is often performed using Databricks in ADF? a) Real-time ML model training b) Schema drift handling c) Advanced aggregations and joins d) Backup synchronization
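As the use-case questions above suggest, pivot and aggregate are often combined to summarize and restructure data in a single pass. A short, hedged PySpark illustration (sample data and names are invented):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pivot-aggregate-sketch").getOrCreate()

sales = spark.createDataFrame(
    [("EU", "2024-01", 100.0), ("EU", "2024-02", 80.0), ("US", "2024-01", 250.0)],
    ["region", "month", "amount"],
)

# Summarize (aggregate) and restructure (pivot) together:
# one row per region, one set of columns per month, with totals and order counts.
summary = (
    sales.groupBy("region")
    .pivot("month")
    .agg(F.sum("amount").alias("total"), F.count("amount").alias("orders"))
)
summary.show()
```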
Answers
1. b) Aggregate
2. a) Combine data from multiple sources
3. a) Combining multiple datasets with identical schema
4. a) Aggregating rows into columns
5. b) Aggregate
6. b) A new column created using expressions
7. b) If-Else expressions
8. b) Derived Column
9. b) Data Flow Expression Language
10. d) Creating new attributes based on logic
11. b) Null-coalescing expressions
12. b) Changes in the input schema over time
13. a) Dynamic mappings
14. c) coalesce()
15. a) Real time
16. b) Leveraging partitioning and caching
17. a) Minimizing transformations
18. b) Adjusting partition sizes
19. a) Increasing the number of partitions evenly
20. a) Azure Monitor
21. b) Perform advanced transformations with Spark
22. b) Processing large-scale distributed datasets
23. b) Apache Spark clusters
24. b) Scalability and parallelism for big data processing
25. b) A Databricks cluster or equivalent compute resource
26. a) Performing complex transformations with minimal code
27. b) Summarizing and restructuring data
28. c) Process batch and real-time data at scale
29. a) Large datasets with dynamic transformations
30. c) Advanced aggregations and joins