MCQs on ETL Basics | AWS Glue MCQ Questions

Dive into these AWS Glue MCQ questions and answers to strengthen your understanding of ETL concepts, Glue Studio, and data transformations. Chapter 4 covers creating ETL jobs, scripting transformations, and leveraging AWS Glue Studio. This set is perfect for enhancing your skills in AWS Glue and data processing workflows.


MCQs on Creating ETL Jobs in AWS Glue

  1. AWS Glue is primarily designed for:
    a) Data storage
    b) Extracting, transforming, and loading data
    c) Running machine learning models
    d) Managing EC2 instances
  2. What is the main component of an AWS Glue ETL process?
    a) Glue Catalog
    b) Glue Job
    c) Glue Crawler
    d) Glue API
  3. Glue jobs are executed using:
    a) AWS Lambda functions
    b) Spark runtime environment
    c) EC2 instances
    d) Step Functions
  4. AWS Glue supports which programming languages for ETL scripts?
    a) Java and Ruby
    b) Python and Scala
    c) C++ and PHP
    d) Node.js and Go
  5. What is the purpose of an AWS Glue Crawler?
    a) To store ETL scripts
    b) To create metadata tables in the Glue Data Catalog
    c) To monitor Glue Jobs
    d) To manage IAM roles
  6. Which of the following AWS services is most commonly used with AWS Glue to store ETL results?
    a) S3
    b) DynamoDB
    c) RDS
    d) CloudWatch
  7. AWS Glue ETL jobs can be triggered by:
    a) Manually invoking them only
    b) Scheduled triggers or event-based triggers
    c) EC2 instance startups
    d) VPC flow logs
  8. To handle large datasets in AWS Glue, which job parameter is commonly configured?
    a) Timeout
    b) Worker type and number
    c) IAM role
    d) Glue version
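
For readers who want to see how the job settings from this set fit together, here is a minimal sketch. It only builds a dictionary shaped like the keyword arguments of boto3's Glue `create_job` call and never contacts AWS. The helper `build_job_definition`, the job name, the role ARN, and the S3 path are all hypothetical placeholders, and the default values are illustrative rather than recommendations.

```python
# Hypothetical helper: assembles settings shaped like the keyword arguments
# of boto3's Glue create_job call. Nothing in this block talks to AWS.

def build_job_definition(name, role_arn, script_location,
                         worker_type="G.1X", number_of_workers=10,
                         timeout_minutes=60):
    """Return a dict describing a Spark ETL job (illustrative values only)."""
    if number_of_workers < 1:
        raise ValueError("number_of_workers must be a positive integer")
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",                  # Spark ETL job type
            "ScriptLocation": script_location,  # where the ETL script lives
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",
        "WorkerType": worker_type,              # size of each worker
        "NumberOfWorkers": number_of_workers,   # how many workers to run
        "Timeout": timeout_minutes,             # minutes before the run stops
    }


# Placeholder names below are assumptions made up for this example.
job = build_job_definition(
    name="example-orders-etl",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    script_location="s3://example-bucket/scripts/orders_etl.py",
    worker_type="G.2X",
    number_of_workers=20,
)
print(job["WorkerType"], job["NumberOfWorkers"])
```

In an environment with boto3 installed and AWS credentials configured, a dictionary like this could be passed along as `boto3.client("glue").create_job(**job)`. Scaling a job for a larger dataset would then mostly be a matter of changing the two worker settings.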

MCQs on Working with Glue Studio

  1. AWS Glue Studio provides a graphical interface for:
    a) Managing Glue Catalogs
    b) Building and monitoring ETL workflows
    c) Running Lambda functions
    d) Configuring EC2 instances
  2. What feature of AWS Glue Studio simplifies ETL job creation?
    a) Pre-built connectors for data sources
    b) Automated database indexing
    c) Machine learning integration
    d) Workflow testing
  3. Glue Studio allows users to create jobs using:
    a) SQL Queries
    b) Drag-and-drop interface or custom code
    c) CloudFormation templates
    d) AWS CLI commands
  4. In AWS Glue Studio, transformations can be added using:
    a) Spark SQL
    b) Visual nodes in the job graph
    c) DataSync policies
    d) API Gateway integrations
  5. Glue Studio can preview data transformations using:
    a) Sample datasets
    b) Real-time logs
    c) Test EC2 instances
    d) RDS snapshots
  6. Glue Studio provides a visual representation of:
    a) ETL workflows and transformations
    b) IAM policy usage
    c) CloudWatch metrics
    d) EC2 instance metrics
  7. Which types of jobs can be created in Glue Studio?
    a) Batch processing jobs only
    b) Both batch and streaming jobs
    c) Machine learning inference jobs
    d) Event-driven Lambda jobs
  8. Within AWS Glue itself, Glue Studio jobs can be monitored via:
    a) CloudWatch dashboards
    b) The Glue Console’s job run history
    c) Step Function workflows
    d) DynamoDB streams

MCQs on Data Transformations and Scripts

  1. In AWS Glue, a data transformation is:
    a) A process of modifying, filtering, or enriching data
    b) Storing data in a database
    c) Encrypting data at rest
    d) Setting IAM policies
  2. What is a common transformation in AWS Glue?
    a) Filtering rows based on conditions
    b) Launching EC2 instances
    c) Updating IAM roles
    d) Archiving logs
  3. AWS Glue provides dynamic frame APIs for:
    a) Data transformation and schema management
    b) Visualizing data on dashboards
    c) Creating IAM policies
    d) Encrypting data
  4. Which library is used by Glue to simplify data transformation in Python?
    a) Pandas
    b) PySpark
    c) NumPy
    d) Scikit-learn
  5. To handle schema evolution during transformations, Glue uses:
    a) DynamicFrames
    b) StaticTables
    c) Spark SQL queries
    d) Lambda functions
  6. What is the main advantage of writing custom scripts in AWS Glue?
    a) Enables fine-grained control over ETL processes
    b) Improves IAM security
    c) Reduces data storage costs
    d) Simplifies workflow deployment
  7. AWS Glue scripts can be developed in:
    a) AWS Lambda console
    b) Jupyter notebooks or Glue Console
    c) EC2 instances only
    d) CloudTrail dashboard
  8. After setting up its GlueContext, a PySpark script in AWS Glue typically reads source data into:
    a) A DataFrame
    b) A DynamicFrame
    c) An EC2 instance
    d) An S3 bucket
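
The filtering and enriching transformations asked about in this set can be pictured with a small plain-Python sketch. This is a stand-in and not Glue code: it uses ordinary lists of dictionaries so it runs anywhere. The record fields and the functions `is_large_order` and `add_amount_band` are made up for illustration. In a real Glue script, the `Filter` and `Map` transforms from `awsglue.transforms` accept similar per-record functions and apply them to a DynamicFrame.

```python
# Plain-Python stand-in for per-record transformations (not Glue code).

def is_large_order(record):
    """Predicate for filtering: keep orders of 100 or more."""
    return record["amount"] >= 100


def add_amount_band(record):
    """Enrichment step: return a copy of the record with a derived field."""
    enriched = dict(record)  # copy so the input record is left unchanged
    enriched["band"] = "high" if record["amount"] >= 500 else "standard"
    return enriched


# Made-up sample records for the example.
orders = [
    {"order_id": 1, "amount": 40},
    {"order_id": 2, "amount": 150},
    {"order_id": 3, "amount": 900},
]

filtered = [r for r in orders if is_large_order(r)]   # filter rows
enriched = [add_amount_band(r) for r in filtered]     # enrich what is left
print(enriched)
```

Keeping each step as its own small function makes the logic easy to test on sample records before it is moved into a job script.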

MCQs on Advanced ETL Techniques

  1. What is a bookmark in AWS Glue?
    a) A marker to track processed data in incremental loads
    b) A log entry for failed jobs
    c) A tag for identifying workflows
    d) An IAM role assigned to the job
  2. Glue bookmarks are useful for:
    a) Avoiding reprocessing of already processed data
    b) Optimizing job execution time
    c) Managing IAM policies
    d) Setting up CloudWatch alarms
  3. Which transformation is used for data aggregation in Glue?
    a) GroupBy
    b) Flatten
    c) Join
    d) Map
  4. What is the best practice for handling errors in Glue scripts?
    a) Add error-handling logic in the script
    b) Disable error logging
    c) Use IAM roles to resolve errors
    d) Restart the job
  5. Glue supports joining datasets by using:
    a) SQL-style joins in scripts
    b) Predefined CloudFormation templates
    c) IAM roles
    d) S3 bucket configurations
  6. To process streaming data with AWS Glue, you must use:
    a) Glue Streaming Jobs
    b) Glue Batch Jobs
    c) EC2 Data Processors
    d) RDS triggers
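
To make the idea behind job bookmarks in this set more concrete, here is a conceptual simulation in plain Python. It is an assumption-laden toy, not how the Glue service stores or evaluates bookmark state: the function `run_incremental`, the file names, and the use of a simple set as the bookmark are all invented for this example.

```python
# Conceptual simulation of incremental loading with a bookmark.
# This is a toy model and does not reflect Glue's internal implementation.

def run_incremental(available_files, bookmark):
    """Return (files processed in this run, updated bookmark).

    `bookmark` is the set of files handled by earlier runs, so only files
    that are not in it get picked up this time.
    """
    new_files = sorted(f for f in available_files if f not in bookmark)
    updated_bookmark = set(bookmark) | set(new_files)
    return new_files, updated_bookmark


bookmark = set()  # nothing has been processed before the first run

# First run: both files are new, so both are processed.
first_run, bookmark = run_incremental(["day1.csv", "day2.csv"], bookmark)

# Second run: only the file that arrived since the first run is processed.
second_run, bookmark = run_incremental(
    ["day1.csv", "day2.csv", "day3.csv"], bookmark
)

print(first_run)
print(second_run)
```

In Glue itself, bookmarks are switched on per job, for example through the `--job-bookmark-option` job argument set to `job-bookmark-enable`, and the service keeps track of the processed state between runs.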

Answer Key

Answers are numbered 1 to 30 in order across the four sections above (1-8 Creating ETL Jobs, 9-16 Glue Studio, 17-24 Data Transformations and Scripts, 25-30 Advanced ETL Techniques).

QNo   Answer
1     b) Extracting, transforming, and loading data
2     b) Glue Job
3     b) Spark runtime environment
4     b) Python and Scala
5     b) To create metadata tables in the Glue Data Catalog
6     a) S3
7     b) Scheduled triggers or event-based triggers
8     b) Worker type and number
9     b) Building and monitoring ETL workflows
10    a) Pre-built connectors for data sources
11    b) Drag-and-drop interface or custom code
12    b) Visual nodes in the job graph
13    a) Sample datasets
14    a) ETL workflows and transformations
15    b) Both batch and streaming jobs
16    b) The Glue Console’s job run history
17    a) A process of modifying, filtering, or enriching data
18    a) Filtering rows based on conditions
19    a) Data transformation and schema management
20    b) PySpark
21    a) DynamicFrames
22    a) Enables fine-grained control over ETL processes
23    b) Jupyter notebooks or Glue Console
24    b) A DynamicFrame
25    a) A marker to track processed data in incremental loads
26    a) Avoiding reprocessing of already processed data
27    a) GroupBy
28    a) Add error-handling logic in the script
29    a) SQL-style joins in scripts
30    a) Glue Streaming Jobs

Use a blank sheet to note your answers, then tally them against the answer key and give yourself a score.
