This chapter covers Hadoop Distributed File System (HDFS) storage concepts: how files are stored as blocks, the file write and read paths, and how block size and the replication factor affect data reliability and performance.
Topic 1: HDFS File Storage Structure
What is the main unit of storage in HDFS? a) Block b) File c) Directory d) Node
Which of the following components manages file storage in HDFS? a) HDFS Client b) DataNode c) NameNode d) ResourceManager
How is a file stored in HDFS? a) As a single file on one machine b) As blocks across multiple nodes c) In a centralized database d) As partitions in memory
What is the role of the NameNode in HDFS? a) Storing actual data b) Managing the file system namespace c) Handling client requests for data d) Maintaining data replication
How does HDFS ensure fault tolerance in the file storage system? a) By storing data in multiple copies across different nodes b) By keeping files in a single node with backup c) By using encryption for files d) By allowing direct access to data files
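The storage model behind these questions can be sketched in a few lines of Python. This is a toy model, not HDFS code: the node names and the round-robin placement are illustrative assumptions (real HDFS placement is rack-aware), but it shows the two ideas the questions test — a file is cut into fixed-size blocks, and each block's replicas land on distinct DataNodes.

```python
BLOCK_SIZE = 128 * 1024 * 1024   # default HDFS block size: 128 MB
REPLICATION = 3                  # default replication factor

def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the sizes of the blocks a file of `file_size` bytes is split
    into. Only the last block may be smaller than the block size."""
    if file_size == 0:
        return []
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

def place_replicas(num_blocks: int, datanodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Toy placement: give each block `replication` replicas on distinct
    DataNodes, round-robin. Real HDFS is rack-aware; the point here is only
    that one block's replicas never share a node."""
    return {
        b: [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        for b in range(num_blocks)
    }

# A 300 MB file becomes three blocks: 128 MB + 128 MB + 44 MB.
blocks = split_into_blocks(300 * 1024 * 1024)
print(len(blocks))                  # 3
print(blocks[-1] // (1024 * 1024))  # 44
print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

Losing one node therefore costs at most one replica of any block, which is why storing multiple copies across different nodes is the fault-tolerance answer.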
Topic 2: Writing Files to HDFS
What command is used to copy files from a local system to HDFS? a) hdfs put b) hdfs copy c) hdfs move d) hdfs transfer
In HDFS, how are large files split? a) Into partitions b) Into blocks of a fixed size c) Into segments d) By using a hash function
When writing data to HDFS, what happens to the file before it is stored? a) It is compressed automatically b) It is split into multiple blocks c) It is encrypted for security d) It is validated against the file system
Which of the following is a feature of writing data in HDFS? a) Data is written sequentially to a single block b) Data is written in parallel across different nodes c) Data is split only after reaching a certain size d) Data is split and written synchronously
How does HDFS handle write requests to the same file from multiple clients? a) HDFS allows concurrent writes to the same file b) HDFS serializes the write operations c) HDFS locks the file until the write completes d) HDFS writes are always discarded if the file is open
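In practice a file reaches HDFS with `hdfs dfs -put <localsrc> <dst>`; internally the client splits the stream into blocks before anything is stored, and each block travels down a pipeline of DataNodes. The sketch below is an illustrative simulation of that write path (node names and the pipeline-selection rule are assumptions, not the real placement policy):

```python
BLOCK_SIZE = 128 * 1024 * 1024  # default block size: 128 MB

def hdfs_write(data: bytes, datanodes: list[str],
               replication: int = 3) -> dict[int, dict[str, bytes]]:
    """Toy model of the HDFS write path: split the payload into blocks
    *before* storage, then push each block down a pipeline of
    `replication` DataNodes (the first node forwards to the second, the
    second to the third). Returns {block_index: {datanode: stored_bytes}}."""
    stored: dict[int, dict[str, bytes]] = {}
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    for idx, block in enumerate(blocks):
        # pick `replication` distinct nodes for this block's pipeline
        pipeline = [datanodes[(idx + r) % len(datanodes)] for r in range(replication)]
        stored[idx] = {}
        for node in pipeline:   # each node persists the block, then forwards it
            stored[idx][node] = block
    return stored

result = hdfs_write(b"x" * 10, ["dn1", "dn2", "dn3", "dn4"])
print(len(result))        # 1 block: the payload is far below the block size
print(sorted(result[0]))  # ['dn1', 'dn2', 'dn3']
```

Because different blocks get different pipelines, blocks of a large file are written across the cluster rather than queued onto a single node.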
Topic 3: Reading Files from HDFS
How does HDFS read a file that has been split into blocks? a) It reads each block sequentially from one node b) It reads blocks in parallel from different DataNodes c) It reconstructs the file from its metadata d) It waits for the full file to be available before reading
What happens when a block is not available while reading a file in HDFS? a) The file is corrupted and cannot be accessed b) The system retries reading the block from another replica c) The file is deleted d) The read request fails without retries
How are blocks retrieved in the HDFS read process? a) From the NameNode b) From the DataNode where they are stored c) From the client machine d) From the ResourceManager
Which HDFS command is used to read a file stored in HDFS? a) hdfs get b) hdfs read c) hdfs cat d) hdfs fetch
In case of a block failure, what does HDFS do to ensure data availability? a) It immediately starts a repair operation b) It serves the block from another replica c) It blocks further reads until the block is repaired d) It ignores the failure and serves partial data
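The read-path behavior these questions describe — fetch replica locations from the NameNode's metadata, stream the bytes directly from a DataNode, and fall back to another replica on failure — can be sketched as follows (a minimal simulation; the node names and the in-memory `stored` map are stand-ins for real DataNodes):

```python
def read_block(replica_nodes: list[str], alive: set[str],
               stored: dict[str, bytes]) -> bytes:
    """Toy model of an HDFS block read: try the block's replicas in the
    order the NameNode returned them, skipping unreachable DataNodes.
    The read fails only when *every* replica is gone."""
    for node in replica_nodes:
        if node in alive:
            return stored[node]
    raise IOError("all replicas of the block are unavailable")

# dn1 holds a replica but is down; the read transparently falls back to dn2.
stored = {"dn1": b"block-0-data", "dn2": b"block-0-data", "dn3": b"block-0-data"}
print(read_block(["dn1", "dn2", "dn3"], alive={"dn2", "dn3"}, stored=stored))
```

With the default replication factor of 3, all three DataNodes holding a block must be unreachable at once before a read actually fails.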
Topic 4: Understanding File Block Size and Replication Factor
What is the default block size in HDFS? a) 16MB b) 64MB c) 128MB d) 256MB
How can the block size of a file be configured in HDFS? a) By modifying the NameNode configuration b) By setting the block size during file creation c) By using the hdfs set command d) By setting the block size in the client configuration
What is the primary reason for setting a replication factor in HDFS? a) To improve file compression b) To enhance security c) To ensure data availability and fault tolerance d) To reduce file access times
How many replicas are typically stored for each block in a default HDFS setup? a) 1 b) 2 c) 3 d) 4
What happens if the replication factor of a block falls below the configured number? a) The system alerts the administrator and tries to replicate the block b) The block is deleted automatically c) The block becomes unavailable for reading d) The block is replicated only when required
How does HDFS manage block replication? a) The NameNode replicates blocks based on user requests b) The DataNode takes care of replicating blocks automatically c) HDFS does not replicate blocks d) The client is responsible for replicating blocks
What effect does increasing the block size in HDFS have on performance? a) It increases read and write latency b) It reduces the overhead of block management c) It improves fault tolerance d) It reduces the file size
In HDFS, what is the role of DataNodes with respect to block storage? a) Storing block metadata b) Storing file blocks and serving them to clients c) Managing replication d) Running the NameNode
Can the block size of an existing file be changed in HDFS? a) Yes, by using the hdfs resize command b) No, the block size is fixed once the file is written c) Yes, but only by overwriting the file d) No, it can only be changed during file read
Which of the following is a consequence of having too many replicas of a file in HDFS? a) It leads to reduced storage efficiency b) It increases file access speed c) It improves fault tolerance d) It reduces the network load
How does HDFS handle file block replication in the case of DataNode failure? a) It fails to serve any blocks b) It replicates the blocks from other DataNodes to maintain replication factor c) It increases the block size automatically d) It stops further writes to the file
What determines the number of replicas in HDFS? a) The client configuration b) The block size c) The replication factor set in the HDFS configuration d) The data type
What is the impact of a large replication factor on HDFS performance? a) It increases storage overhead b) It reduces fault tolerance c) It decreases storage efficiency d) It increases read performance
How is the replication factor in HDFS adjusted? a) By changing the configuration file b) By using the hdfs dfs -setrep command c) By modifying the file system properties in the client d) By increasing block size
What is the ideal replication factor for a high-availability HDFS setup? a) 1 b) 2 c) 3 d) 4
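Two of the trade-offs in this topic reduce to simple arithmetic: replication multiplies raw storage consumption, and block size determines how many blocks (and hence NameNode metadata entries) a file needs. The sketch below works it out; the real knobs are the `dfs.blocksize` and `dfs.replication` configuration properties, and an existing file's replication can be changed with `hdfs dfs -setrep`.

```python
MB = 1024 ** 2
GB = 1024 ** 3

def storage_cost(file_size: int, replication: int) -> int:
    """Raw bytes consumed cluster-wide: every block is stored `replication`
    times, trading storage efficiency for availability."""
    return file_size * replication

def block_count(file_size: int, block_size: int) -> int:
    """Number of blocks a file occupies (ceiling division). Each block is a
    metadata entry on the NameNode, so larger blocks mean less
    block-management overhead."""
    return -(-file_size // block_size)

print(storage_cost(1 * GB, 3) // GB)  # 3: a 1 GB file consumes 3 GB at the default factor
print(block_count(1 * GB, 128 * MB))  # 8 blocks at the default 128 MB block size
print(block_count(1 * GB, 256 * MB))  # 4 blocks: doubling the block size halves the metadata
```

This is why raising the replication factor improves availability at the cost of storage, while raising the block size cuts metadata overhead but leaves fewer blocks to read in parallel.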
Answers Table
QNo  Answer
1    a) Block
2    c) NameNode
3    b) As blocks across multiple nodes
4    b) Managing the file system namespace
5    a) By storing data in multiple copies across different nodes
6    a) hdfs put
7    b) Into blocks of a fixed size
8    b) It is split into multiple blocks
9    b) Data is written in parallel across different nodes
10   b) HDFS serializes the write operations
11   b) It reads blocks in parallel from different DataNodes
12   b) The system retries reading the block from another replica
13   b) From the DataNode where they are stored
14   c) hdfs cat
15   b) It serves the block from another replica
16   c) 128MB
17   b) By setting the block size during file creation
18   c) To ensure data availability and fault tolerance
19   c) 3
20   a) The system alerts the administrator and tries to replicate the block
21   b) The DataNode takes care of replicating blocks automatically
22   b) It reduces the overhead of block management
23   b) Storing file blocks and serving them to clients
24   b) No, the block size is fixed once the file is written
25   a) It leads to reduced storage efficiency
26   b) It replicates the blocks from other DataNodes to maintain replication factor
27   c) The replication factor set in the HDFS configuration
28   a) It increases storage overhead
29   b) By using the hdfs dfs -setrep command
30   c) 3