Efficiently handling data is a cornerstone of working with Snowflake. Chapter 4 delves into bulk loading data using the powerful COPY command, explores supported file formats like CSV, JSON, and Parquet, and guides you on importing data from cloud storage solutions (S3, GCS, Azure). Additionally, it covers strategies for unloading and exporting data to external destinations. Here’s a set of 30 multiple-choice questions (MCQs) to test your understanding of these concepts.
Bulk Loading Data Using COPY Command
1. What is the primary purpose of the COPY INTO command in Snowflake? a) To export data to a local machine b) To load data into Snowflake tables c) To perform analytics on data d) To delete rows from a table
2. Which keyword in the COPY INTO command specifies the location of the source data files? a) FROM b) USING c) LOCATION d) INTO
3. When using the COPY command, which option lets you validate the data without loading it? a) VALIDATE_ONLY b) VALIDATION_MODE c) PREVIEW d) TEST
4. Which of the following is true about the ON_ERROR parameter in the COPY command? a) It defines how errors are logged b) It specifies what to do when errors occur during data loading c) It determines the format of the output d) It is optional and not recommended
5. Which file types does the COPY command support for loading data? a) Only CSV b) CSV and Parquet c) Various formats including CSV, JSON, and Parquet d) None of the above
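For reference while answering the questions above, a minimal bulk load might look like the following sketch (table, stage, and path names are placeholders):

```sql
-- Load CSV files from a named stage; skip the header row and
-- continue past rows that fail to parse.
COPY INTO my_table
  FROM @my_stage/data/
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
  ON_ERROR = 'CONTINUE';

-- Dry run: report parse errors without loading anything.
COPY INTO my_table
  FROM @my_stage/data/
  VALIDATION_MODE = RETURN_ERRORS;
```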
Data File Formats (CSV, JSON, Parquet)
6. Which of the following is a columnar storage format? a) CSV b) JSON c) Parquet d) XML
7. What does the FIELD_OPTIONALLY_ENCLOSED_BY parameter do in a CSV file format? a) Specifies the delimiter for columns b) Indicates the character used for enclosing fields c) Specifies the encoding standard d) Defines the maximum row size
8. JSON files are primarily used for storing: a) Tabular data b) Key-value pairs and hierarchical structures c) Binary data d) Images
9. Which file format is most suitable for analytics on large datasets in Snowflake? a) CSV b) JSON c) Parquet d) Excel
10. Parquet files store data: a) Row-wise b) Column-wise c) As a binary tree d) In plain text
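The three formats in this section are typically captured as named file formats that COPY can then reference. A minimal sketch (format names are placeholders):

```sql
-- CSV: comma-delimited, double-quoted fields, one header row.
CREATE OR REPLACE FILE FORMAT csv_fmt
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  SKIP_HEADER = 1;

-- JSON: treat a top-level array as one record per element.
CREATE OR REPLACE FILE FORMAT json_fmt
  TYPE = 'JSON'
  STRIP_OUTER_ARRAY = TRUE;

-- Parquet: columnar format, schema travels with the file.
CREATE OR REPLACE FILE FORMAT parquet_fmt
  TYPE = 'PARQUET';
```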
Loading from Cloud Storage (S3, GCS, Azure)
11. What is required to load data from AWS S3 into Snowflake? a) A storage integration or access credentials b) A Google Cloud Platform account c) Azure Data Lake access d) Local file system permissions
12. Which Snowflake command creates a named location for staging data in cloud storage? a) LOAD_DATA b) PUT c) CREATE STAGE d) IMPORT
13. To load data from GCS, you need: a) A Google Cloud Storage bucket and credentials b) An S3 bucket c) Azure Blob Storage integration d) A REST API
14. Azure Blob Storage integration with Snowflake uses: a) SAS tokens b) AWS CLI c) OAuth d) Snowpipe
15. Which file format supports efficient compression for loading from cloud storage? a) CSV b) JSON c) Parquet d) XML
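External stages tie these pieces together; one sketch per provider, where the bucket URLs, integration names, and the SAS token are all placeholders:

```sql
-- S3 via a storage integration (preferred over embedded keys).
CREATE STAGE s3_stage
  URL = 's3://my-bucket/path/'
  STORAGE_INTEGRATION = my_s3_integration;

-- GCS likewise uses a storage integration.
CREATE STAGE gcs_stage
  URL = 'gcs://my-bucket/path/'
  STORAGE_INTEGRATION = my_gcs_integration;

-- Azure Blob Storage can authenticate with a SAS token.
CREATE STAGE azure_stage
  URL = 'azure://myaccount.blob.core.windows.net/mycontainer/path/'
  CREDENTIALS = (AZURE_SAS_TOKEN = '...');
```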
Data Unloading and Exporting
16. The COPY INTO command for unloading data is used to: a) Load data into a table b) Export data from a Snowflake table to an external location c) Create new tables d) Merge tables
17. Data can be unloaded to: a) Local storage only b) External cloud storage like S3, GCS, and Azure c) Excel files only d) A local database only
18. Which parameter is critical for defining the file format when unloading data? a) FILE_FORMAT b) STORAGE_LOCATION c) ACCESS_ROLE d) DATA_COMPRESSION
19. During unloading, the OVERWRITE parameter is used to: a) Append data to existing files b) Remove existing files in the destination c) Optimize file size d) Generate summary reports
20. What is the default file format for unloaded data? a) Parquet b) CSV c) JSON d) XML
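Unloading reverses the direction of COPY INTO: the stage becomes the target and a table or query the source. A minimal sketch (stage and table names are placeholders):

```sql
-- Export query results to a stage as gzip-compressed CSV,
-- replacing any files already at that path.
COPY INTO @my_stage/exports/
  FROM (SELECT * FROM my_table)
  FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP')
  HEADER = TRUE
  OVERWRITE = TRUE;
```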
General Concepts
21. What is a Snowflake stage? a) A temporary table b) A data loading area c) A database schema d) A predefined set of queries
22. Internal stages in Snowflake are: a) Managed by Snowflake for file storage b) User-defined c) Third-party tools d) Deprecated
23. To secure data during unloading, you should use: a) SSL/TLS encryption b) Password protection c) Encryption keys provided by the cloud provider d) All of the above
24. Multi-part files in Snowflake unloading are useful for: a) Small datasets b) Optimizing load balancing and parallel processing c) Preventing data duplication d) Manual editing of files
25. Metadata for unloaded files can be managed through: a) Snowflake streams b) Stage listings c) File headers d) External cataloging tools
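As a quick illustration of internal stages and stage listings: run from a client such as SnowSQL (PUT is not available in the web worksheet), the sketch below uploads a local file to a table's built-in stage and then lists what is staged (file path and table name are placeholders):

```sql
-- Upload a local file to the table stage; AUTO_COMPRESS
-- gzips the file during upload.
PUT file:///tmp/data.csv @%my_table AUTO_COMPRESS = TRUE;

-- Inspect staged-file metadata (name, size, checksum).
LIST @%my_table;
```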
Advanced Techniques
26. Using Snowpipe for loading data enables: a) Real-time or near real-time data ingestion b) Archiving of historical data c) Batch processing of files d) Manual data loading
27. Which option improves performance during data loading? a) Splitting large files into smaller chunks b) Compressing the files using gzip c) Storing files closer to Snowflake regions d) All of the above
28. Unloading compressed data can help: a) Save storage costs b) Decrease processing speed c) Add redundant information d) Make the data human-readable
29. AUTO_COMPRESS in the PUT command: a) Enables automatic compression of files during upload b) Automatically decompresses files during loading c) Specifies error-handling modes d) Deletes unwanted rows
30. Partitioning data during unloading is used for: a) Reducing file sizes for small datasets b) Optimizing storage and query efficiency c) Creating a data backup d) Enforcing access controls
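The two advanced techniques in this section can be sketched as follows; the pipe, stage, table, and column names are all placeholders:

```sql
-- Snowpipe: auto-ingest files as they land in an external stage.
CREATE PIPE my_pipe AUTO_INGEST = TRUE AS
  COPY INTO my_table
  FROM @s3_stage
  FILE_FORMAT = (TYPE = 'JSON');

-- Partitioned unload: one output path per region value,
-- which keeps related rows together for downstream queries.
COPY INTO @my_stage/sales/
  FROM my_table
  PARTITION BY ('region=' || region)
  FILE_FORMAT = (TYPE = 'PARQUET');
```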
Answers Table
1. b) To load data into Snowflake tables
2. a) FROM
3. b) VALIDATION_MODE
4. b) It specifies what to do when errors occur during data loading
5. c) Various formats including CSV, JSON, and Parquet
6. c) Parquet
7. b) Indicates the character used for enclosing fields
8. b) Key-value pairs and hierarchical structures
9. c) Parquet
10. b) Column-wise
11. a) A storage integration or access credentials
12. c) CREATE STAGE
13. a) A Google Cloud Storage bucket and credentials
14. a) SAS tokens
15. c) Parquet
16. b) Export data from a Snowflake table to an external location
17. b) External cloud storage like S3, GCS, and Azure
18. a) FILE_FORMAT
19. b) Remove existing files in the destination
20. b) CSV
21. b) A data loading area
22. a) Managed by Snowflake for file storage
23. d) All of the above
24. b) Optimizing load balancing and parallel processing
25. b) Stage listings
26. a) Real-time or near real-time data ingestion
27. d) All of the above
28. a) Save storage costs
29. a) Enables automatic compression of files during upload
30. b) Optimizing storage and query efficiency