MCQs on File Storage in HDFS | Hadoop HDFS

This chapter covers how the Hadoop Distributed File System (HDFS) stores data: how files are laid out as blocks, how files are written and read, and how block size and replication factor affect reliability and performance.


Topic 1: HDFS File Storage Structure

  1. What is the main unit of storage in HDFS?
    a) Block
    b) File
    c) Directory
    d) Node
  2. Which HDFS component keeps track of which blocks make up each file and where those blocks are stored?
    a) HDFS Client
    b) DataNode
    c) NameNode
    d) ResourceManager
  3. How is a file stored in HDFS?
    a) As a single file on one machine
    b) As blocks across multiple nodes
    c) In a centralized database
    d) As partitions in memory
  4. What is the role of the NameNode in HDFS?
    a) Storing actual data
    b) Managing the file system namespace
    c) Handling client requests for data
    d) Maintaining data replication
  5. How does HDFS ensure fault tolerance in the file storage system?
    a) By storing data in multiple copies across different nodes
    b) By keeping files in a single node with backup
    c) By using encryption for files
    d) By allowing direct access to data files
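
The sketch below shows, in simplified form, how a file maps onto blocks and DataNodes. It is a minimal Python model and not Hadoop code. It assumes a 128 MB block size and a replication factor of 3. The `NameNode` class is hypothetical, and it places replicas round-robin so the output is predictable. Real HDFS uses a rack-aware placement policy. The model shows that the NameNode records only which DataNodes hold each block. The bytes themselves live on the DataNodes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

MB = 1024 * 1024
BLOCK_SIZE = 128 * MB   # default block size in Hadoop 2.x and later
REPLICATION = 3         # default replication factor

# (block_id, length in bytes, DataNodes holding a replica)
Block = Tuple[int, int, List[str]]


@dataclass
class NameNode:
    """Hypothetical toy NameNode: records metadata only, never file contents."""
    datanodes: List[str]
    namespace: Dict[str, List[Block]] = field(default_factory=dict)
    next_block_id: int = 0

    def add_file(self, path: str, size: int) -> List[Block]:
        """Split a file of `size` bytes into blocks and assign replicas."""
        blocks: List[Block] = []
        copies = min(REPLICATION, len(self.datanodes))
        offset = 0
        while offset < size:
            length = min(BLOCK_SIZE, size - offset)
            # Round-robin placement keeps the output deterministic; real HDFS
            # uses a rack-aware placement policy instead.
            start = self.next_block_id % len(self.datanodes)
            replicas = [self.datanodes[(start + i) % len(self.datanodes)]
                        for i in range(copies)]
            blocks.append((self.next_block_id, length, replicas))
            self.next_block_id += 1
            offset += length
        self.namespace[path] = blocks
        return blocks


nn = NameNode(datanodes=["dn1", "dn2", "dn3", "dn4"])
for block_id, length, replicas in nn.add_file("/data/logs.txt", 300 * MB):
    print(block_id, length // MB, "MB", replicas)
# 0 128 MB ['dn1', 'dn2', 'dn3']
# 1 128 MB ['dn2', 'dn3', 'dn4']
# 2 44 MB ['dn3', 'dn4', 'dn1']
```

In this example, losing any single DataNode still leaves two copies of every block.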

Topic 2: Writing Files to HDFS

  1. What command is used to copy files from the local file system to HDFS?
    a) hdfs dfs -put
    b) hdfs dfs -copy
    c) hdfs dfs -move
    d) hdfs dfs -transfer
  2. In HDFS, how are large files split?
    a) Into partitions
    b) Into blocks of a fixed size
    c) Into segments
    d) By using a hash function
  3. What happens to a file's data as it is written to HDFS?
    a) It is compressed automatically
    b) It is split into multiple blocks
    c) It is encrypted for security
    d) It is validated against the file system
  4. Which of the following is a feature of writing data in HDFS?
    a) Data is written sequentially to a single block
    b) Each block is written through a pipeline of DataNodes, one per replica
    c) Data is split only after reaching a certain size
    d) Data is split and written synchronously
  5. How does HDFS handle write requests to the same file from multiple clients?
    a) HDFS allows concurrent writes to the same file
    b) HDFS queues and serializes the write operations
    c) HDFS grants one client an exclusive write lease on the file until that client closes it
    d) HDFS writes are always discarded if the file is open
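
The write path can be sketched as a toy model. This is a simplified illustration, not the HDFS client. The class `ToyHDFS` and the exception `LeaseError` are hypothetical names. Blocks are only a few bytes long so the output stays readable. The replication pipeline is modeled by storing each block on the pipeline's DataNodes in order. The model shows two behaviors from the questions above. First, a file is cut into fixed-size blocks as it is written. Second, only the client that holds a file's write lease may write to it.

```python
from typing import Dict, List, Tuple


class LeaseError(Exception):
    """Hypothetical error: a client tried to write without holding the file's lease."""


class ToyHDFS:
    def __init__(self, datanodes: List[str], block_size: int, replication: int = 3):
        self.datanodes = datanodes
        self.block_size = block_size
        self.replication = min(replication, len(datanodes))
        self.leases: Dict[str, str] = {}  # path -> client holding the write lease
        self.storage: Dict[str, Dict[int, bytes]] = {dn: {} for dn in datanodes}
        self.files: Dict[str, List[Tuple[int, List[str]]]] = {}  # path -> [(block_id, pipeline)]
        self.next_block_id = 0

    def create(self, path: str, client: str) -> None:
        holder = self.leases.get(path)
        if holder is not None and holder != client:
            raise LeaseError(f"{path} is being written by {holder}")
        self.leases[path] = client
        self.files[path] = []

    def write(self, path: str, client: str, data: bytes) -> None:
        if self.leases.get(path) != client:
            raise LeaseError(f"{client} does not hold the lease on {path}")
        # Simplification: each call starts fresh blocks; a real client keeps
        # filling the last partial block before allocating a new one.
        for offset in range(0, len(data), self.block_size):
            chunk = data[offset:offset + self.block_size]
            pipeline = self._choose_pipeline()
            # Model of the replication pipeline: the client sends the block to
            # the first DataNode, which forwards it to the next, and so on.
            for dn in pipeline:
                self.storage[dn][self.next_block_id] = chunk
            self.files[path].append((self.next_block_id, pipeline))
            self.next_block_id += 1

    def close(self, path: str, client: str) -> None:
        if self.leases.get(path) == client:
            del self.leases[path]

    def _choose_pipeline(self) -> List[str]:
        start = self.next_block_id % len(self.datanodes)
        return [self.datanodes[(start + i) % len(self.datanodes)]
                for i in range(self.replication)]


fs = ToyHDFS(["dn1", "dn2", "dn3"], block_size=4)
fs.create("/tmp/a.txt", "client-A")
fs.write("/tmp/a.txt", "client-A", b"hello world")  # 11 bytes -> 3 blocks
try:
    fs.create("/tmp/a.txt", "client-B")              # a second writer is refused
except LeaseError as err:
    print("rejected:", err)  # rejected: /tmp/a.txt is being written by client-A
fs.close("/tmp/a.txt", "client-A")
```

After `close` releases the lease, another client can open the file for writing.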

Topic 3: Reading Files from HDFS

  1. How does HDFS read a file that has been split into blocks?
    a) It reads the whole file from a single DataNode
    b) It fetches each block from a DataNode that holds a replica, so different blocks can come from different DataNodes
    c) It reconstructs the file from its metadata
    d) It waits for the full file to be available before reading
  2. What happens when the DataNode holding a requested block replica is unavailable during a read in HDFS?
    a) The file is corrupted and cannot be accessed
    b) The system retries reading the block from another replica
    c) The file is deleted
    d) The read request fails without retries
  3. How are blocks retrieved in the HDFS read process?
    a) From the NameNode
    b) From the DataNode where they are stored
    c) From the client machine
    d) From the ResourceManager
  4. Which command prints the contents of a file stored in HDFS?
    a) hdfs dfs -get
    b) hdfs dfs -read
    c) hdfs dfs -cat
    d) hdfs dfs -fetch
  5. If one replica of a block is corrupt or its DataNode has failed, what does HDFS do to keep the data available to readers?
    a) It immediately starts a repair operation
    b) It serves the block from another replica
    c) It blocks further reads until the block is repaired
    d) It ignores the failure and serves partial data
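
The read path, including failover to another replica, can be sketched as follows. The sketch assumes the NameNode has already returned, for each block in file order, the DataNodes that hold a replica, sorted closest first. `read_file` and the in-memory `storage` dictionary are hypothetical stand-ins for real DataNode connections. Blocks are read one after another. If the DataNode for a replica is unreachable, the client skips it and tries the next replica.

```python
from typing import Dict, List, Set, Tuple


def read_file(blocks: List[Tuple[int, List[str]]],
              alive: Set[str],
              storage: Dict[str, Dict[int, bytes]]) -> bytes:
    """Read blocks in file order, falling back to the next replica on failure.

    blocks:  (block_id, replica DataNodes sorted closest-first), as the
             NameNode would return them
    alive:   DataNodes the client can currently reach
    storage: DataNode -> {block_id: bytes}, standing in for real DataNodes
    """
    parts = []
    for block_id, replicas in blocks:
        for dn in replicas:
            if dn in alive and block_id in storage.get(dn, {}):
                parts.append(storage[dn][block_id])
                break  # got this block; move on to the next one
        else:
            # Every replica was unreachable, so the read fails for this block.
            raise OSError(f"could not obtain block {block_id} from any replica")
    return b"".join(parts)


storage = {
    "dn1": {0: b"hell", 1: b"o wo"},
    "dn2": {0: b"hell", 1: b"o wo", 2: b"rld"},
    "dn3": {2: b"rld"},
}
blocks = [(0, ["dn1", "dn2"]), (1, ["dn1", "dn2"]), (2, ["dn3", "dn2"])]
print(read_file(blocks, alive={"dn2", "dn3"}, storage=storage))  # dn1 is down
# b'hello world'
```

The read fails only when every replica of some block is unreachable.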

Topic 4: Understanding File Block Size and Replication Factor

  1. What is the default block size in HDFS (Hadoop 2.x and later)?
    a) 16MB
    b) 64MB
    c) 128MB
    d) 256MB
  2. How can the block size of a file be configured in HDFS?
    a) By modifying the NameNode configuration
    b) By setting the block size during file creation
    c) By using the hdfs set command
    d) By resizing the blocks after the file is written
  3. What is the primary reason for setting a replication factor in HDFS?
    a) To improve file compression
    b) To enhance security
    c) To ensure data availability and fault tolerance
    d) To reduce file access times
  4. How many replicas are typically stored for each block in a default HDFS setup?
    a) 1
    b) 2
    c) 3
    d) 4
  5. What happens if the replication factor of a block falls below the configured number?
    a) The NameNode detects the under-replicated block and schedules new replicas on other DataNodes
    b) The block is deleted automatically
    c) The block becomes unavailable for reading
    d) The block is replicated only when required
  6. How does HDFS manage block replication?
    a) The NameNode copies blocks only when a user requests it
    b) DataNodes copy blocks to each other as instructed by the NameNode, without user intervention
    c) HDFS does not replicate blocks
    d) The client is responsible for replicating blocks
  7. What effect does increasing the block size in HDFS have on performance?
    a) It increases read and write latency
    b) It reduces the overhead of block management
    c) It improves fault tolerance
    d) It reduces the file size
  8. In HDFS, what is the role of DataNodes with respect to block storage?
    a) Storing block metadata
    b) Storing file blocks and serving them to clients
    c) Managing replication
    d) Running the NameNode
  9. Can the block size of an existing file be changed in HDFS?
    a) Yes, by using the hdfs resize command
    b) No, the block size is fixed once the file is written
    c) Yes, but only by overwriting the file
    d) No, it can only be changed during file read
  10. Which of the following is a drawback of setting too many replicas for a file in HDFS?
    a) It leads to reduced storage efficiency
    b) It increases file access speed
    c) It improves fault tolerance
    d) It reduces the network load
  11. How does HDFS handle file block replication in the case of DataNode failure?
    a) It fails to serve any blocks
    b) It replicates the blocks from other DataNodes to maintain replication factor
    c) It increases the block size automatically
    d) It stops further writes to the file
  12. What determines the number of replicas in HDFS?
    a) The client configuration
    b) The block size
    c) The replication factor set in the HDFS configuration
    d) The data type
  13. What is the main cost of a large replication factor in HDFS?
    a) It increases storage overhead
    b) It reduces fault tolerance
    c) It reduces the number of DataNodes needed
    d) It increases read performance
  14. How is the replication factor of an existing file in HDFS adjusted?
    a) By changing the configuration file
    b) By using the hdfs dfs -setrep command
    c) By modifying the file system properties in the client
    d) By increasing block size
  15. What replication factor is generally recommended for fault-tolerant production HDFS clusters?
    a) 1
    b) 2
    c) 3
    d) 4
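
The block-size questions come down to simple arithmetic, sketched below in Python. The helpers `num_blocks` and `block_lengths` are hypothetical. A file needs ceil(file size / block size) blocks, and only its last block can be smaller than the block size. In HDFS a short last block uses only as much disk as the data it holds. Every block, however small, is still an entry the NameNode must track. That is why larger blocks reduce block-management overhead.

```python
from typing import List

MB = 1024 * 1024
GB = 1024 * MB


def num_blocks(file_size: int, block_size: int) -> int:
    """Blocks needed for a file: ceil(file_size / block_size)."""
    return -(-file_size // block_size)


def block_lengths(file_size: int, block_size: int) -> List[int]:
    """Bytes held by each block; only the last block can be smaller."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


# Fewer, larger blocks mean fewer metadata entries for the NameNode to track.
for size_mb in (64, 128, 256):
    print(f"{size_mb:>3} MB blocks: {num_blocks(10 * GB, size_mb * MB)} blocks for a 10 GB file")
#  64 MB blocks: 160 blocks for a 10 GB file
# 128 MB blocks: 80 blocks for a 10 GB file
# 256 MB blocks: 40 blocks for a 10 GB file

# A small file does not fill a whole block on disk: its single block holds 1 MB.
print([n // MB for n in block_lengths(1 * MB, 128 * MB)])  # [1]
```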

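The replication questions can be illustrated the same way. In HDFS, the NameNode notices a failed DataNode when its heartbeats stop. It then schedules new copies of the affected blocks, and DataNodes carry out the copying. The Python sketch below models only the planning step. `raw_bytes` and `plan_rereplication` are hypothetical helpers. The destination choice simply takes the first live node that lacks a replica, whereas real HDFS uses rack-aware placement. On a real cluster, `hdfs dfs -setrep` changes the replication factor of an existing file. The NameNode then adds or removes replicas to match.

```python
from typing import Dict, List, Set, Tuple

GB = 1024 ** 3


def raw_bytes(file_size: int, replication: int) -> int:
    """Disk space used across the cluster: each byte is stored `replication` times."""
    return file_size * replication


def plan_rereplication(block_map: Dict[int, Set[str]], dead_node: str,
                       live_nodes: List[str], target: int) -> List[Tuple[int, str, str]]:
    """Return copy tasks (block_id, source, destination) after `dead_node` fails.

    block_map is the NameNode's view (block_id -> DataNodes holding a replica)
    and is updated in place as tasks are planned.
    """
    tasks = []
    for block_id, holders in sorted(block_map.items()):
        holders.discard(dead_node)
        if not holders:
            continue  # no surviving replica: the block is lost, nothing to copy from
        source = sorted(holders)[0]
        candidates = [dn for dn in live_nodes if dn not in holders]
        while len(holders) < target and candidates:
            dest = candidates.pop(0)  # simplistic choice; real HDFS is rack-aware
            tasks.append((block_id, source, dest))
            holders.add(dest)
    return tasks


print(raw_bytes(1 * GB, 3) // GB, "GB of disk for a 1 GB file at replication 3")
block_map = {0: {"dn1", "dn2", "dn3"}, 1: {"dn2", "dn3", "dn4"}}
print(plan_rereplication(block_map, "dn3", ["dn1", "dn2", "dn4", "dn5"], target=3))
# [(0, 'dn1', 'dn4'), (1, 'dn2', 'dn1')]
```

The first line shows the storage cost of replication. The second shows the replication factor being restored after a DataNode failure.
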
Answers Table

Topic | Q | Answer
1 | 1 | a) Block
1 | 2 | c) NameNode
1 | 3 | b) As blocks across multiple nodes
1 | 4 | b) Managing the file system namespace
1 | 5 | a) By storing data in multiple copies across different nodes
2 | 1 | a) hdfs dfs -put
2 | 2 | b) Into blocks of a fixed size
2 | 3 | b) It is split into multiple blocks
2 | 4 | b) Each block is written through a pipeline of DataNodes, one per replica
2 | 5 | c) HDFS grants one client an exclusive write lease on the file until that client closes it
3 | 1 | b) It fetches each block from a DataNode that holds a replica, so different blocks can come from different DataNodes
3 | 2 | b) The system retries reading the block from another replica
3 | 3 | b) From the DataNode where they are stored
3 | 4 | c) hdfs dfs -cat
3 | 5 | b) It serves the block from another replica
4 | 1 | c) 128MB
4 | 2 | b) By setting the block size during file creation
4 | 3 | c) To ensure data availability and fault tolerance
4 | 4 | c) 3
4 | 5 | a) The NameNode detects the under-replicated block and schedules new replicas on other DataNodes
4 | 6 | b) DataNodes copy blocks to each other as instructed by the NameNode, without user intervention
4 | 7 | b) It reduces the overhead of block management
4 | 8 | b) Storing file blocks and serving them to clients
4 | 9 | b) No, the block size is fixed once the file is written
4 | 10 | a) It leads to reduced storage efficiency
4 | 11 | b) It replicates the blocks from other DataNodes to maintain replication factor
4 | 12 | c) The replication factor set in the HDFS configuration
4 | 13 | a) It increases storage overhead
4 | 14 | b) By using the hdfs dfs -setrep command
4 | 15 | c) 3

Write your answers on a blank sheet, check them against the answer table above, and give yourself a score.
