Dive into these AWS Glue MCQs and answers to strengthen your understanding of ETL concepts, Glue Studio, and data transformations. Chapter 4 covers creating ETL jobs, scripting transformations, and leveraging AWS Glue Studio. This set is ideal for sharpening your skills in AWS Glue and data-processing workflows.
MCQs on Creating ETL Jobs in AWS Glue
1. AWS Glue is primarily designed for: a) Data storage b) Extracting, transforming, and loading data c) Running machine learning models d) Managing EC2 instances
2. What is the main component of an AWS Glue ETL process? a) Glue Catalog b) Glue Job c) Glue Crawler d) Glue API
3. Glue jobs are executed using: a) AWS Lambda functions b) Spark runtime environment c) EC2 instances d) Step Functions
4. AWS Glue supports which programming languages for ETL scripts? a) Java and Ruby b) Python and Scala c) C++ and PHP d) Node.js and Go
5. What is the purpose of an AWS Glue Crawler? a) To store ETL scripts b) To create metadata tables in the Glue Data Catalog c) To monitor Glue Jobs d) To manage IAM roles
6. Which of the following AWS services integrates with AWS Glue for storing ETL results? a) S3 b) DynamoDB c) RDS d) CloudWatch
7. AWS Glue ETL jobs can be triggered by: a) Manually invoking them only b) Scheduled triggers or event-based triggers c) EC2 instance startups d) VPC flow logs
8. To handle large datasets in AWS Glue, which job parameter is commonly configured? a) Timeout b) Worker type and number c) IAM role d) Glue version
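The questions above revolve around the script side of job creation. A minimal sketch of the boilerplate a Glue PySpark job typically starts from is shown below; note that it runs only inside the Glue Spark runtime, that worker type and count are set on the job definition rather than in the script, and that the database and table names (`sales_db`, `orders`) are placeholders invented for this sketch:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# The job name arrives as the --JOB_NAME argument; worker type
# (e.g. G.1X) and number of workers are configured on the job itself.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])

sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)  # also picks up job-bookmark state

# Read a table that a crawler registered in the Glue Data Catalog.
# "sales_db" and "orders" are placeholder names for this sketch.
orders = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="orders"
)

# ... apply transformations, then write the results out, e.g. to S3 ...

job.commit()  # persists bookmark progress for the next incremental run
```

The `job.init()`/`job.commit()` pair is what lets scheduled or event-based triggers rerun the same job incrementally instead of reprocessing everything.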
MCQs on Working with Glue Studio
9. AWS Glue Studio provides a graphical interface for: a) Managing Glue Catalogs b) Building and monitoring ETL workflows c) Running Lambda functions d) Configuring EC2 instances
10. What feature of AWS Glue Studio simplifies ETL job creation? a) Pre-built connectors for data sources b) Automated database indexing c) Machine learning integration d) Workflow testing
11. Glue Studio allows users to create jobs using: a) SQL queries b) Drag-and-drop interface or custom code c) CloudFormation templates d) AWS CLI commands
12. In AWS Glue Studio, transformations can be added using: a) SparkSQL b) Visual nodes in the job graph c) DataSync policies d) API Gateway integrations
13. Glue Studio can preview data transformations using: a) Sample datasets b) Real-time logs c) Test EC2 instances d) RDS snapshots
14. Glue Studio provides a visual representation of: a) ETL workflows and transformations b) IAM policy usage c) CloudWatch metrics d) EC2 instance metrics
15. Which type of job does Glue Studio support creating? a) Batch processing jobs only b) Both batch and streaming jobs c) Machine learning inference jobs d) Event-driven Lambda jobs
16. Glue Studio jobs can be monitored via: a) CloudWatch dashboards b) The Glue Console’s job run history c) Step Function workflows d) DynamoDB streams
MCQs on Data Transformations and Scripts
17. In AWS Glue, a data transformation is: a) A process of modifying, filtering, or enriching data b) Storing data in a database c) Encrypting data at rest d) Setting IAM policies
18. What is a common transformation in AWS Glue? a) Filtering rows based on conditions b) Launching EC2 instances c) Updating IAM roles d) Archiving logs
19. AWS Glue provides dynamic frame APIs for: a) Data transformation and schema management b) Visualizing data on dashboards c) Creating IAM policies d) Encrypting data
20. Which library is used by Glue to simplify data transformation in Python? a) Pandas b) PySpark c) NumPy d) Scikit-learn
21. To handle schema evolution during transformations, Glue uses: a) DynamicFrames b) StaticTables c) SparkSQL queries d) Lambda functions
22. What is the main advantage of writing custom scripts in AWS Glue? a) Enables fine-grained control over ETL processes b) Improves IAM security c) Reduces data storage costs d) Simplifies workflow deployment
23. AWS Glue scripts can be developed in: a) AWS Lambda console b) Jupyter notebooks or Glue Console c) EC2 instances only d) CloudTrail dashboard
24. A PySpark script in AWS Glue typically starts by creating: a) A DataFrame b) A DynamicFrame c) An EC2 instance d) An S3 bucket
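The transformation questions above center on row-level operations over self-describing records. The sketch below imitates, in plain Python, what a DynamicFrame choice-type cast and filter do: each dict stands in for a DynamicRecord, the `price` field arrives with two different types (the mixed "choice" type that appears after schema evolution), and the predicate is the same kind of plain function you would pass as `f` to `Filter.apply`. This illustrates the idea only; it is not the Glue API itself:

```python
# Plain dicts stand in for Glue DynamicRecords. "price" arrives as a
# string in older files and a number in newer ones -- the mixed
# ("choice") type that DynamicFrames tolerate and resolveChoice cleans up.
records = [
    {"item": "widget", "price": "19.99"},
    {"item": "gadget", "price": 24},
    {"item": "widget", "price": "3.50"},
]

def cast_price_to_double(rec):
    # Analogous to dyf.resolveChoice(specs=[("price", "cast:double")]).
    out = dict(rec)
    out["price"] = float(out["price"])
    return out

def is_expensive(rec):
    # The same kind of plain predicate passed as `f` to Filter.apply.
    return rec["price"] > 10.0

resolved = [cast_price_to_double(r) for r in records]
expensive = [r for r in resolved if is_expensive(r)]
print(expensive)  # [{'item': 'widget', 'price': 19.99}, {'item': 'gadget', 'price': 24.0}]
```

Resolving the choice type first is what makes the numeric comparison in the filter safe for every record, old or new.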
MCQs on Advanced ETL Techniques
25. What is a bookmark in AWS Glue? a) A marker to track processed data in incremental loads b) A log entry for failed jobs c) A tag for identifying workflows d) An IAM role assigned to the job
26. Glue bookmarks are useful for: a) Avoiding reprocessing of already processed data b) Optimizing job execution time c) Managing IAM policies d) Setting up CloudWatch alarms
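Glue tracks bookmark state internally (per source, when bookmarks are enabled and the job calls `job.commit()`). The toy sketch below only illustrates the underlying idea: a persisted watermark lets each incremental run skip records the previous run already processed. Glue's real bookmarks key on things like file names and timestamps, not a simple `id` column:

```python
def incremental_batch(all_records, bookmark):
    """Return records newer than the bookmark, plus the updated bookmark.

    A toy stand-in for what job bookmarks achieve: state saved after one
    run filters the next run's input so nothing is reprocessed.
    """
    fresh = [r for r in all_records if r["id"] > bookmark]
    new_bookmark = max((r["id"] for r in fresh), default=bookmark)
    return fresh, new_bookmark

source = [{"id": 1}, {"id": 2}, {"id": 3}]
run1, bm = incremental_batch(source, bookmark=0)   # first run: all 3 rows
run2, bm = incremental_batch(source, bookmark=bm)  # rerun: nothing new
print(len(run1), len(run2))  # 3 0
```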
27. Which transformation is used for data aggregation in Glue? a) GroupBy b) Flatten c) Join d) Map
28. What is the best practice for handling errors in Glue scripts? a) Add error-handling logic in the script b) Disable error logging c) Use IAM roles to resolve errors d) Restart the job
29. Glue supports joining datasets by using: a) SQL-style joins in scripts b) Predefined CloudFormation templates c) IAM roles d) S3 bucket configurations
30. To process streaming data with AWS Glue, you must use: a) Glue Streaming Jobs b) Glue Batch Jobs c) EC2 Data Processors d) RDS triggers
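Two of the techniques above, SQL-style joins and GroupBy aggregation, can be sketched with the same SQL you would hand to `spark.sql(...)` in a Glue script after registering your frames as temp views. Here `sqlite3` stands in for the Spark SQL engine purely so the example is self-contained, and all table and column names are made up for the illustration:

```python
import sqlite3

# A SQL-style join plus a GROUP BY aggregation over two small tables.
# In Glue you would run the same SQL via spark.sql(...) against temp
# views, or wire up Join and Aggregate nodes visually in Glue Studio.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customers (id INTEGER, region TEXT);
    INSERT INTO orders VALUES (1, 10, 5.0), (2, 10, 7.5), (3, 11, 2.0);
    INSERT INTO customers VALUES (10, 'us-east'), (11, 'eu-west');
""")
rows = conn.execute("""
    SELECT c.region, SUM(o.amount) AS total
    FROM orders o JOIN customers c ON o.customer_id = c.id
    GROUP BY c.region ORDER BY c.region
""").fetchall()
print(rows)  # [('eu-west', 2.0), ('us-east', 12.5)]
```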
Answer Key
1. b) Extracting, transforming, and loading data
2. b) Glue Job
3. b) Spark runtime environment
4. b) Python and Scala
5. b) To create metadata tables in the Glue Data Catalog
6. a) S3
7. b) Scheduled triggers or event-based triggers
8. b) Worker type and number
9. b) Building and monitoring ETL workflows
10. a) Pre-built connectors for data sources
11. b) Drag-and-drop interface or custom code
12. b) Visual nodes in the job graph
13. a) Sample datasets
14. a) ETL workflows and transformations
15. b) Both batch and streaming jobs
16. b) The Glue Console’s job run history
17. a) A process of modifying, filtering, or enriching data
18. a) Filtering rows based on conditions
19. a) Data transformation and schema management
20. b) PySpark
21. a) DynamicFrames
22. a) Enables fine-grained control over ETL processes
23. b) Jupyter notebooks or Glue Console
24. b) A DynamicFrame
25. a) A marker to track processed data in incremental loads
26. a) Avoiding reprocessing of already processed data
27. a) GroupBy
28. a) Add error-handling logic in the script
29. a) SQL-style joins in scripts
30. a) Glue Streaming Jobs