Explore critical concepts in Hadoop Distributed File System (HDFS), including configuring High Availability (HA), Zookeeper’s role, Standby NameNode setup, and robust fault tolerance mechanisms to ensure seamless data reliability and accessibility.
Topic 1: Configuring HDFS High Availability (HA)
1. What is the primary purpose of High Availability (HA) in HDFS?
a) To improve data compression
b) To reduce replication
c) To ensure NameNode availability during failures
d) To increase block size

2. Which component enables seamless failover in HDFS High Availability?
a) DataNode
b) ResourceManager
c) Zookeeper
d) Standby NameNode

3. How many NameNodes are typically configured in an HA-enabled HDFS cluster?
a) One active and one standby
b) Only one active
c) Two active nodes
d) Multiple standby nodes

4. What is shared between the active and standby NameNodes in HDFS HA?
a) Block reports
b) Metadata
c) Data blocks
d) Configuration files

5. In an HA configuration, what happens when the active NameNode fails?
a) The cluster halts operations
b) Zookeeper selects a new active NameNode
c) Data is transferred to another cluster
d) The system switches to a backup cluster

6. Which protocol is used for communication between active and standby NameNodes?
a) HDFS Protocol
b) Journal Protocol
c) Failover Protocol
d) RPC Protocol

7. What ensures consistent state synchronization between NameNodes in HA?
a) Block replication
b) Edit logs stored in shared storage
c) Periodic data snapshots
d) DataNode heartbeats

8. What command is used to manually fail over between NameNodes in HA?
a) hdfs failover
b) hdfs haadmin -failover
c) hdfs switch
d) hdfs toggle

9. What is a critical component of HA shared storage?
a) Data blocks
b) Quorum Journal Manager
c) ResourceManager
d) Checkpoints

10. In HDFS HA, which feature prevents split-brain scenarios?
a) Data replication
b) Zookeeper quorum management
c) Multiple active NameNodes
d) Dynamic block allocation
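
The idea behind shared edit logs in this topic can be sketched in a few lines of Python. This is a toy model, not Hadoop code: the class names (SharedJournal, ActiveNameNode, StandbyNameNode) and the "create"/"delete" operations are hypothetical, and the namespace is reduced to a set of paths. It only illustrates how a standby that replays logged edits can end up with the same metadata as the active node.

```python
from dataclasses import dataclass, field


@dataclass
class SharedJournal:
    """Toy stand-in for shared edit-log storage (hypothetical, not a Hadoop API)."""
    edits: list = field(default_factory=list)  # (txid, op, path) tuples

    def append(self, op, path):
        # Transaction ids increase by one with every logged edit.
        txid = len(self.edits) + 1
        self.edits.append((txid, op, path))
        return txid

    def read_from(self, txid):
        """Return every edit whose transaction id is greater than txid."""
        return [edit for edit in self.edits if edit[0] > txid]


def apply_edit(namespace, op, path):
    """Apply one namespace change to an in-memory set of paths."""
    if op == "create":
        namespace.add(path)
    elif op == "delete":
        namespace.discard(path)


class ActiveNameNode:
    def __init__(self, journal):
        self.journal = journal
        self.namespace = set()

    def mutate(self, op, path):
        # Log the edit to shared storage first, then apply it locally.
        self.journal.append(op, path)
        apply_edit(self.namespace, op, path)


class StandbyNameNode:
    def __init__(self, journal):
        self.journal = journal
        self.namespace = set()
        self.last_applied_txid = 0

    def tail_edits(self):
        # Replay only the edits that have not been applied yet.
        for txid, op, path in self.journal.read_from(self.last_applied_txid):
            apply_edit(self.namespace, op, path)
            self.last_applied_txid = txid


journal = SharedJournal()
active = ActiveNameNode(journal)
standby = StandbyNameNode(journal)

active.mutate("create", "/data/a.txt")
active.mutate("create", "/data/b.txt")
active.mutate("delete", "/data/a.txt")

standby.tail_edits()
print(standby.namespace == active.namespace)
```

Because the standby tracks the last transaction id it applied, calling tail_edits() repeatedly is safe: each call picks up only the new edits.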
Topic 2: Role of Zookeeper in HDFS High Availability
11. What is the primary role of Zookeeper in HDFS High Availability?
a) Storing file data
b) Coordinating NameNode failover
c) Managing block replication
d) Storing metadata

12. How does Zookeeper detect a NameNode failure?
a) By monitoring edit logs
b) Through heartbeats from NameNodes
c) By analyzing block reports
d) Using DataNode feedback

13. In HDFS HA, what does Zookeeper maintain to ensure high availability?
a) Namespace snapshots
b) Cluster health reports
c) A list of active and standby NameNodes
d) A checkpoint of the file system

14. What is required to set up Zookeeper for HA in HDFS?
a) At least three Zookeeper nodes for quorum
b) A single centralized Zookeeper node
c) An additional DataNode
d) Exclusive hardware for Zookeeper

15. How does Zookeeper handle a network partition in HDFS HA?
a) By stopping all NameNodes
b) By maintaining a quorum to decide the active NameNode
c) By assigning a new DataNode as the active node
d) By reducing replication factor
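
The quorum questions in this topic come down to simple majority arithmetic, sketched below in Python. The function names are hypothetical and the code models only the counting rule (a strict majority of the ensemble must agree), not Zookeeper itself.

```python
def majority(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still reachable."""
    return ensemble_size - majority(ensemble_size)


def side_with_quorum(ensemble_size, partition_sizes):
    """Return the index of the partition that holds a majority, or None."""
    if sum(partition_sizes) > ensemble_size:
        raise ValueError("partitions contain more servers than the ensemble")
    needed = majority(ensemble_size)
    for index, size in enumerate(partition_sizes):
        if size >= needed:
            return index
    return None


for n in (1, 3, 4, 5):
    print(n, majority(n), tolerated_failures(n))

# A 5-node ensemble split 3/2: only the 3-node side can keep making decisions.
print(side_with_quorum(5, [3, 2]))
# A 4-node ensemble split 2/2: neither side has a majority.
print(side_with_quorum(4, [2, 2]))
```

Three nodes is the smallest ensemble that survives one failure, and a fourth node adds no extra tolerance, which is why odd ensemble sizes are the usual choice. Since at most one side of a partition can hold a majority, two NameNodes cannot both be confirmed as active.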
Topic 3: Standby NameNode Configuration
16. What is the role of the Standby NameNode in HDFS?
a) Storing data blocks
b) Monitoring DataNode health
c) Synchronizing metadata with the active NameNode
d) Acting as a backup DataNode

17. How does the Standby NameNode stay updated with the active NameNode?
a) By replicating block reports
b) By reading shared edit logs
c) By storing snapshots of the file system
d) By directly accessing DataNodes

18. Can a Standby NameNode handle client requests in HDFS HA?
a) Yes, it serves all read and write requests
b) No, it only handles metadata synchronization
c) Yes, but only for read requests
d) No, it remains passive

19. What component allows a Standby NameNode to transition to active mode?
a) Heartbeats from DataNodes
b) Zookeeper quorum notifications
c) Shared data blocks
d) HDFS client requests

20. What configuration file is essential for setting up a Standby NameNode?
a) core-site.xml
b) hdfs-site.xml
c) mapred-site.xml
d) yarn-site.xml
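
As a rough illustration of what the HA-related part of hdfs-site.xml can look like, the Python sketch below generates a configuration fragment for an active/standby pair that shares edits through a JournalNode quorum. The property names follow the Hadoop HA documentation; the nameservice ID (mycluster), the NameNode IDs (nn1, nn2), all host names, and the helper function build_ha_hdfs_site are assumptions made up for this example. The set of properties is deliberately incomplete: a real deployment also needs, among other things, a fencing method (dfs.ha.fencing.methods) and the Zookeeper quorum address (ha.zookeeper.quorum, which is set in core-site.xml).

```python
import xml.etree.ElementTree as ET


def build_ha_hdfs_site(nameservice, namenodes, journal_nodes):
    """Build an hdfs-site.xml <configuration> element for an HA NameNode pair.

    namenodes: dict mapping NameNode ID -> "host:port" RPC address
    journal_nodes: list of "host:port" JournalNode addresses
    """
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        # Shared edit log location on the JournalNode quorum.
        "dfs.namenode.shared.edits.dir":
            "qjournal://" + ";".join(journal_nodes) + "/" + nameservice,
        # Lets clients find whichever NameNode is currently active.
        f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha."
            "ConfiguredFailoverProxyProvider",
        "dfs.ha.automatic-failover.enabled": "true",
    }
    for nn_id, address in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = address

    configuration = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return configuration


# Hypothetical cluster layout, for illustration only.
config = build_ha_hdfs_site(
    "mycluster",
    {"nn1": "nn-host-1.example.com:8020", "nn2": "nn-host-2.example.com:8020"},
    ["jn1.example.com:8485", "jn2.example.com:8485", "jn3.example.com:8485"],
)
if hasattr(ET, "indent"):  # pretty-printing helper, available in Python 3.9+
    ET.indent(config)
print(ET.tostring(config, encoding="unicode"))
```

Both NameNodes would typically carry the same file, so either one can take the active role after a failover.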
Topic 4: Fault Tolerance Mechanisms in HDFS
21. What is the primary mechanism HDFS uses to achieve fault tolerance?
a) Data partitioning
b) Replication of data blocks
c) Distributed metadata storage
d) File compression

22. How many replicas are created by default for each block in HDFS?
a) 2
b) 3
c) 4
d) 5

23. What happens when a DataNode fails in HDFS?
a) The cluster halts operations
b) Blocks are re-replicated to other DataNodes
c) Files stored in that node are lost
d) Metadata is recreated by the NameNode

24. Which mechanism ensures the replication factor is maintained in HDFS?
a) Periodic metadata updates
b) DataNode heartbeats
c) Zookeeper notifications
d) HDFS manual replication

25. How does HDFS handle corrupted blocks?
a) By deleting the corrupted block
b) By replacing the block from another replica
c) By halting client access
d) By marking the file as corrupted

26. What ensures metadata recovery during NameNode failure in HDFS?
a) Checkpointing by Secondary NameNode
b) Zookeeper notifications
c) DataNode logs
d) Replication of block reports

27. How is data integrity verified in HDFS?
a) By Zookeeper validation
b) By checksum verification
c) By metadata replication
d) By periodic DataNode validation

28. What is the effect of increasing the replication factor in HDFS?
a) Reduced data reliability
b) Improved fault tolerance
c) Reduced storage requirements
d) Faster write operations

29. What happens when the replication factor of a file falls below the configured value?
a) The file becomes inaccessible
b) HDFS replicates the blocks to other DataNodes
c) HDFS deletes the file automatically
d) HDFS reduces the block size

30. What component periodically reports block health to the NameNode?
a) ResourceManager
b) DataNode
c) Zookeeper
d) Secondary NameNode
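
The heartbeat and re-replication questions in this topic can be tied together with a small Python simulation. Everything here is a simplified assumption: the function names, the DataNode names (dn1 to dn4), the block IDs, and the timeout value are illustrative, and target selection ignores the rack-aware placement a real cluster uses. The sketch only shows the general flow: missed heartbeats mark a DataNode as dead, blocks with too few live replicas are identified, and new replicas are scheduled on other live nodes.

```python
def live_datanodes(last_heartbeat, now, timeout_seconds):
    """Nodes whose most recent heartbeat is within the timeout window."""
    return {
        node for node, seen in last_heartbeat.items()
        if now - seen <= timeout_seconds
    }


def find_under_replicated(block_locations, live_nodes, replication_factor=3):
    """Return {block_id: missing_replicas}, counting replicas on live nodes only."""
    under = {}
    for block_id, nodes in block_locations.items():
        live_replicas = nodes & live_nodes
        if len(live_replicas) < replication_factor:
            under[block_id] = replication_factor - len(live_replicas)
    return under


def re_replicate(block_locations, live_nodes, replication_factor=3):
    """Return new block locations with missing replicas placed on live nodes."""
    result = {}
    for block_id, nodes in block_locations.items():
        live_replicas = nodes & live_nodes
        if not live_replicas:
            # No surviving replica to copy from; leave the block as it is.
            result[block_id] = set(nodes)
            continue
        missing = max(replication_factor - len(live_replicas), 0)
        candidates = sorted(live_nodes - live_replicas)
        result[block_id] = live_replicas | set(candidates[:missing])
    return result


# Hypothetical cluster state: dn3 stopped sending heartbeats long ago.
last_heartbeat = {"dn1": 1000, "dn2": 1000, "dn3": 100, "dn4": 1000}
block_locations = {
    "blk_1": {"dn1", "dn2", "dn3"},
    "blk_2": {"dn1", "dn2", "dn4"},
}

live = live_datanodes(last_heartbeat, now=1010, timeout_seconds=630)
print(sorted(live))
print(find_under_replicated(block_locations, live))
print(sorted(re_replicate(block_locations, live)["blk_1"]))
```

In this run, blk_1 loses its replica on dn3 and gains one on dn4, while blk_2 is untouched because all three of its replicas are on live nodes.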
Answers Table
QNo | Answer
1 | c) To ensure NameNode availability during failures
2 | c) Zookeeper
3 | a) One active and one standby
4 | b) Metadata
5 | b) Zookeeper selects a new active NameNode
6 | b) Journal Protocol
7 | b) Edit logs stored in shared storage
8 | b) hdfs haadmin -failover
9 | b) Quorum Journal Manager
10 | b) Zookeeper quorum management
11 | b) Coordinating NameNode failover
12 | b) Through heartbeats from NameNodes
13 | c) A list of active and standby NameNodes
14 | a) At least three Zookeeper nodes for quorum
15 | b) By maintaining a quorum to decide the active NameNode
16 | c) Synchronizing metadata with the active NameNode