AWS Glue is a fully managed ETL (extract, transform, load) service that simplifies data preparation for analytics. Chapter 2 covers configuring AWS Glue: environment setup, working with the AWS Glue Console and CLI, and IAM roles and permissions. Work through these multiple-choice questions and answers to test your knowledge.
MCQs: Prerequisites and Environment Setup
1. What is a prerequisite for using AWS Glue?
a) A VPC configuration
b) S3 bucket for data storage
c) Enabling DynamoDB streams
d) Setting up EC2 instances

2. Which programming languages does AWS Glue support for writing ETL scripts?
a) Java and Python
b) Python and Scala
c) Python and Ruby
d) Scala and JavaScript

3. What is the default location for storing AWS Glue scripts?
a) AWS Glue Console
b) Amazon DynamoDB
c) Amazon S3
d) AWS CloudFormation

4. How do you set up a connection for accessing data sources in AWS Glue?
a) Through AWS Lambda
b) By creating a connection in the AWS Glue Console
c) By enabling AWS Secrets Manager
d) Using the AWS Glue CLI

5. What is the purpose of the Data Catalog in AWS Glue?
a) To store raw data
b) To maintain metadata about your data
c) To host ETL jobs
d) To handle schema migration

6. Before running AWS Glue jobs, what must be configured in the environment?
a) S3 buckets and Lambda functions
b) IAM roles and Glue Data Catalog
c) EC2 instances and EBS volumes
d) RDS databases

7. What network configuration is necessary for AWS Glue to connect to on-premises databases?
a) Elastic Load Balancer setup
b) Direct Connect or VPN setup
c) Lambda integration
d) API Gateway configuration

8. Which AWS service is commonly used with AWS Glue for storing extracted data?
a) Amazon RDS
b) Amazon S3
c) Amazon DynamoDB
d) Amazon EC2

9. What type of database is required for enabling Glue Data Catalog integration with Athena?
a) NoSQL database
b) Relational database
c) MySQL-compatible database
d) None; it uses Glue Data Catalog directly

10. How is the Glue Python library installed for local development?
a) Using pip to install glue-python
b) By downloading from the AWS Glue Console
c) Through the AWS CLI
d) By setting up a Lambda layer
MCQs: AWS Glue Console and CLI
11. Which interface is used for managing AWS Glue resources?
a) AWS Lambda Console
b) AWS Glue Console
c) CloudFormation Console
d) Elastic Beanstalk Console

12. What does the “Jobs” section in the AWS Glue Console allow you to do?
a) Manage Glue Data Catalog
b) Configure IAM roles
c) Create, edit, and run ETL jobs
d) Monitor EC2 instances

13. How can you start an AWS Glue job from the command line?
a) Using aws-glue-cli
b) With the start-job-run command in the AWS CLI
c) By running a Lambda function
d) Through an S3 event trigger

14. What is the command to list Glue Data Catalog tables using the AWS CLI?
a) aws glue get-databases
b) aws glue list-tables
c) aws glue list-databases
d) aws glue get-tables

15. How do you monitor the progress of AWS Glue jobs in the Console?
a) Check logs in Amazon CloudWatch
b) Use the “Triggers” section
c) Access the AWS Lambda Console
d) View job status in the “Jobs” section

16. What type of scripts can be edited in the AWS Glue Console?
a) CloudFormation templates
b) Python or Scala ETL scripts
c) Java Lambda functions
d) Shell scripts

17. What does the AWS Glue CLI allow you to do?
a) Deploy applications on EC2
b) Perform all tasks that can be done in the AWS Glue Console
c) Set up serverless databases
d) Monitor S3 events

18. How can you trigger a Glue job manually using the Console?
a) By configuring IAM policies
b) By selecting the job and clicking “Run”
c) By setting up an EventBridge rule
d) Using the AWS SDK

19. What is required to access the AWS Glue CLI?
a) Access to an EC2 instance
b) AWS IAM user credentials with Glue permissions
c) Setting up a Glue endpoint
d) A CloudFormation template

20. Which AWS Glue resource can be created using both the Console and CLI?
a) Virtual Private Cloud (VPC)
b) Data Catalog table
c) Lambda function
d) CloudTrail event
MCQs: IAM Roles and Permissions
21. What is the primary purpose of IAM roles in AWS Glue?
a) To schedule Glue jobs
b) To provide permissions for AWS Glue jobs to access resources
c) To create Glue Data Catalogs
d) To manage AWS billing

22. What is the default policy required for AWS Glue to read and write data in S3?
a) AmazonS3FullAccess
b) AWSGlueServiceRole
c) GlueS3ReadWritePolicy
d) AWSDataPipelineRole

23. Which policy ensures Glue jobs can access AWS Glue Data Catalog?
a) AWSGlueConsoleFullAccess
b) AWSGlueServiceRole
c) GlueCatalogReadWritePolicy
d) AmazonDynamoDBFullAccess

24. How do you restrict a Glue job from accessing a specific S3 bucket?
a) Remove the Glue service role
b) Attach a deny policy in IAM
c) Use SCPs in AWS Organizations
d) Revoke Glue Console permissions

25. What permission is needed to create triggers for Glue jobs?
a) CloudWatch Logs permissions
b) IAM role with glue:CreateTrigger
c) AdministratorAccess policy
d) GlueCatalogPolicy

26. What does the glue:BatchGetJobs permission allow?
a) Access to Data Catalog metadata
b) Retrieve details of specific Glue jobs
c) Execute Glue jobs
d) Monitor Glue job metrics

27. What must be included in a Glue service role to allow integration with Amazon Redshift?
a) RedshiftDataPolicy
b) S3ReadWriteRole
c) GlueServiceRole for Redshift
d) AmazonRedshiftFullAccess policy

28. Which IAM feature can restrict Glue job access to a specific VPC?
a) IAM inline policies
b) Resource-based policies
c) VPC endpoint policies
d) Glue trigger configurations

29. What AWS Glue-specific managed policy grants full Console access?
a) AWSGlueServicePolicy
b) AWSGlueFullAccess
c) GlueDataAdminPolicy
d) AWSGlueConsoleFullAccess

30. How can you ensure least privilege for AWS Glue jobs?
a) Use the AWS Glue Administrator role
b) Assign Glue-specific managed policies
c) Grant only the permissions required for each task
d) Allow full access to S3
Answers Table
Qno | Answer (Option with Text)
1 | b) S3 bucket for data storage
2 | b) Python and Scala
3 | c) Amazon S3
4 | b) By creating a connection in the AWS Glue Console
5 | b) To maintain metadata about your data
6 | b) IAM roles and Glue Data Catalog
7 | b) Direct Connect or VPN setup
8 | b) Amazon S3
9 | d) None; it uses Glue Data Catalog directly
10 | a) Using pip to install glue-python (closest listed option; in practice AWS distributes the local library through the aws-glue-libs GitHub repository and Docker images rather than a pip package of this name)
11 | b) AWS Glue Console
12 | c) Create, edit, and run ETL jobs
13 | b) With the start-job-run command in the AWS CLI
14 | d) aws glue get-tables
15 | a) Check logs in Amazon CloudWatch
16 | b) Python or Scala ETL scripts
17 | b) Perform all tasks that can be done in the AWS Glue Console
18 | b) By selecting the job and clicking “Run”
19 | b) AWS IAM user credentials with Glue permissions
20 | b) Data Catalog table
21 | b) To provide permissions for AWS Glue jobs to access resources
22 | b) AWSGlueServiceRole
23 | b) AWSGlueServiceRole
24 | b) Attach a deny policy in IAM
25 | b) IAM role with glue:CreateTrigger
26 | b) Retrieve details of specific Glue jobs
27 | d) AmazonRedshiftFullAccess policy
28 | c) VPC endpoint policies
29 | d) AWSGlueConsoleFullAccess
30 | c) Grant only the permissions required for each task
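
To make the least-privilege idea behind answers 24 and 30 concrete, here is a minimal Python sketch that assembles an IAM policy document for a single Glue job. It is an illustration under stated assumptions, not a complete production role: the function name build_glue_job_policy and all bucket names are hypothetical, and a real job role would typically need further permissions (for example, for logging). The IAM actions themselves (s3:GetObject, s3:ListBucket, s3:PutObject, glue:GetDatabase, glue:GetTable, glue:GetPartitions) are real.

```python
import json
from typing import Optional


def build_glue_job_policy(
    input_bucket: str,
    output_bucket: str,
    denied_bucket: Optional[str] = None,
) -> dict:
    """Build an IAM policy document scoped to what one Glue job needs.

    All bucket names are hypothetical placeholders supplied by the caller.
    """
    statements = [
        {   # List and read objects in the source bucket only
            "Sid": "ReadSourceData",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{input_bucket}",
                f"arn:aws:s3:::{input_bucket}/*",
            ],
        },
        {   # Write results to the target bucket only
            "Sid": "WriteResults",
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": [f"arn:aws:s3:::{output_bucket}/*"],
        },
        {   # Read-only Data Catalog metadata; "*" is used for brevity and
            # could be narrowed to specific catalog, database, and table ARNs
            "Sid": "ReadCatalogMetadata",
            "Effect": "Allow",
            "Action": ["glue:GetDatabase", "glue:GetTable", "glue:GetPartitions"],
            "Resource": ["*"],
        },
    ]
    if denied_bucket is not None:
        # An explicit Deny overrides any Allow granted by other attached policies
        statements.append({
            "Sid": "DenyRestrictedBucket",
            "Effect": "Deny",
            "Action": ["s3:*"],
            "Resource": [
                f"arn:aws:s3:::{denied_bucket}",
                f"arn:aws:s3:::{denied_bucket}/*",
            ],
        })
    return {"Version": "2012-10-17", "Statement": statements}


policy = build_glue_job_policy(
    "example-raw-data", "example-curated-data", denied_bucket="example-restricted"
)
print(json.dumps(policy, indent=2))
```

The printed JSON could be pasted into a customer-managed policy and attached to the job's role. Keeping the Allow statements narrow and adding an explicit Deny for the restricted bucket mirrors the two techniques the quiz asks about.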