Explore 30 MCQs covering critical aspects of managing HDFS in a cluster. Topics include monitoring tools, configuring metrics, log analysis, and troubleshooting techniques to optimize HDFS performance and reliability.
Topic 1: Cluster Monitoring and Management Tools
Which tool is commonly used to monitor Hadoop cluster health and performance? a) Ganglia b) Tableau c) Power BI d) Google Analytics
What does Ambari provide for Hadoop cluster management? a) A graphical interface for monitoring and managing Hadoop clusters b) A command-line tool for file operations c) A debugging tool for MapReduce jobs d) A distributed storage monitoring tool
Which command-line utility is used to check the overall health of an HDFS cluster? a) hdfs fsck b) hdfs ls c) hdfs du d) hdfs copy
What type of data is displayed in the HDFS Web UI? a) Job execution plans b) Active DataNodes, Namenode health, and storage usage c) MapReduce log summaries d) Network traffic details
Which of the following is true about the ResourceManager in Hadoop? a) It handles HDFS storage failures b) It manages cluster resources for job execution c) It monitors DataNode storage usage d) It is a backup for the Namenode
How does Cloudera Manager help in HDFS management? a) By encrypting HDFS files b) By providing automated alerts for cluster issues c) By optimizing SQL queries in Hive d) By integrating HDFS with Spark
What is the primary use of the dfsadmin command? a) Managing HDFS metrics b) Performing administrative tasks on HDFS c) Monitoring YARN jobs d) Troubleshooting MapReduce issues
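As context for the dfsadmin question above: `hdfs dfsadmin -report` prints cluster capacity figures as plain text, one labeled value per line. A minimal Python sketch that pulls those figures out of a sample report (the sample text and its byte counts are illustrative, not captured from a real cluster):

```python
import re

# Illustrative excerpt of `hdfs dfsadmin -report` output (hypothetical values).
SAMPLE_REPORT = """\
Configured Capacity: 107374182400 (100 GB)
Present Capacity: 96636764160 (90 GB)
DFS Remaining: 64424509440 (60 GB)
DFS Used: 32212254720 (30 GB)
DFS Used%: 33.33%
Live datanodes (3):
"""

def parse_report(text):
    """Extract the byte count for each capacity line into a dict."""
    stats = {}
    for key in ("Configured Capacity", "Present Capacity",
                "DFS Remaining", "DFS Used"):
        m = re.search(rf"^{key}: (\d+)", text, re.MULTILINE)
        if m:
            stats[key] = int(m.group(1))
    return stats

stats = parse_report(SAMPLE_REPORT)
print(stats["DFS Used"] / stats["Configured Capacity"])  # fraction of capacity in use
```

A scheduled job could run the real command, parse it this way, and alert when the used fraction crosses a threshold.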
Which of the following is used to visualize HDFS metrics in real time? a) Apache Zeppelin b) Grafana c) Tableau d) Microsoft Excel
How can an administrator monitor disk usage of individual DataNodes in a cluster? a) By using the Namenode UI b) By running the jps command c) By using the HDFS balancer tool d) By reviewing the DataNode logs
Which tool integrates seamlessly with Hadoop for log aggregation and analysis? a) Logstash b) Tableau c) Jenkins d) Power BI
Topic 2: Configuring and Monitoring HDFS Metrics
Which property in the hdfs-site.xml file is used to configure the Namenode heap size? a) dfs.replication b) dfs.namenode.memory.size c) dfs.namenode.heap.size d) dfs.datanode.du.reserved
What is the role of JMX in HDFS monitoring? a) To provide a user-friendly GUI for HDFS management b) To expose runtime metrics for monitoring tools c) To replicate blocks across DataNodes d) To manage Namenode log files
Which metric reports the number of under-replicated blocks in HDFS? a) dfs.capacity.remaining b) dfs.replication.under c) dfs.blocks.missing d) dfs.block.underreplicated
What is the function of the dfsadmin -report command? a) To display storage capacity and utilization statistics of the cluster b) To list files in the HDFS root directory c) To show block replication details d) To check cluster configuration files
How are HDFS metrics exposed for external monitoring systems? a) Through JSON files stored on Namenodes b) Using REST APIs or JMX interfaces c) By generating CSV reports periodically d) Through shell scripts run on DataNodes
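The JMX answer can be made concrete: Hadoop daemons expose their MBeans as JSON through the `/jmx` HTTP servlet on the daemon's web port (on a live cluster you would fetch something like `http://<namenode>:9870/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState`). A sketch that reads one bean from a sample payload; the attribute values here are hypothetical:

```python
import json

# Hypothetical response from the Namenode /jmx servlet, trimmed to one bean.
SAMPLE_JMX = json.dumps({
    "beans": [{
        "name": "Hadoop:service=NameNode,name=FSNamesystemState",
        "NumLiveDataNodes": 3,
        "NumDeadDataNodes": 0,
        "UnderReplicatedBlocks": 12,
    }]
})

def bean_attr(payload, bean_name, attr):
    """Find a named MBean in a /jmx payload and return one attribute."""
    for bean in json.loads(payload)["beans"]:
        if bean.get("name") == bean_name:
            return bean.get(attr)
    return None

under = bean_attr(SAMPLE_JMX,
                  "Hadoop:service=NameNode,name=FSNamesystemState",
                  "UnderReplicatedBlocks")
print(under)  # 12
```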
Which metric indicates the total disk capacity available in the cluster? a) dfs.disk.capacity.available b) dfs.capacity.used c) dfs.datanode.capacity d) dfs.capacity.remaining
How can an administrator monitor HDFS throughput in real time? a) Using the hadoop fsck command b) By enabling JMX and analyzing metrics in a monitoring tool c) Through Namenode logs d) By running periodic disk scans
What does the HDFS metric dfs.namenode.startup.time indicate? a) Namenode downtime b) Namenode initialization time c) DataNode failure detection time d) Block replication time
How can HDFS metrics be exported to a tool like Prometheus? a) By using a log parser b) By enabling a JMX exporter for HDFS metrics c) By writing custom scripts for metrics aggregation d) Through Namenode Web UI
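In production this is usually done by attaching the Prometheus JMX exporter as a Java agent to the Namenode JVM, but the translation step itself is simple: each JMX attribute becomes one line of Prometheus text exposition format. A hedged sketch of that conversion (metric names and values are hypothetical):

```python
def to_prometheus(metrics, prefix="hdfs"):
    """Render a flat metric dict as Prometheus text exposition lines."""
    lines = []
    for name, value in sorted(metrics.items()):
        # Prometheus metric names are conventionally lowercase with a prefix.
        lines.append(f"{prefix}_{name.lower()} {value}")
    return "\n".join(lines)

sample = {"UnderReplicatedBlocks": 12, "NumLiveDataNodes": 3}
print(to_prometheus(sample))
```

Serving this text from an HTTP endpoint is all a Prometheus scrape target needs.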
Which of the following is critical for monitoring Namenode performance? a) Disk read latency b) Block scanning frequency c) Heap memory usage d) Cluster replication factor
Topic 3: Understanding DataNode and NameNode Logs
Which file contains the Namenode’s operational logs? a) namenode.log b) hadoop-namenode.log c) dfs-namenode.log d) namenode-operations.log
What does a “BlockMissingException” indicate in Namenode logs? a) Namenode memory overflow b) DataNode storage full c) A block with no replicas is detected d) Network latency
How can DataNode logs be accessed? a) By running hdfs fsck on the DataNode b) Through the Namenode Web UI c) By checking log files on the DataNode server d) By running the dfsadmin command
What does a “heartbeat lost” message in the Namenode logs indicate? a) A DataNode is removed from the cluster b) A DataNode is temporarily unreachable c) The Namenode has crashed d) Block corruption
How are HDFS logs typically stored in a Hadoop cluster? a) In a relational database b) As flat files on disk c) In the HDFS root directory d) In the Namenode metadata file
Which tool can be used to analyze Namenode and DataNode logs for troubleshooting? a) Logstash b) Tableau c) Grafana d) Apache Spark
How can log verbosity levels for Namenode logs be configured? a) By editing log4j.properties b) Using the HDFS Web UI c) By restarting the cluster in debug mode d) By changing the log file size
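A sketch of what that log4j.properties edit typically looks like, raising only the Namenode package to DEBUG; the exact logger names can vary between Hadoop versions:

```properties
# log4j.properties: more verbose logging for the Namenode package only,
# leaving the rest of the daemon at its default level.
log4j.logger.org.apache.hadoop.hdfs.server.namenode=DEBUG
```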
What is indicated by “BlockUnderConstructionException” in DataNode logs? a) Corrupted block b) Incomplete file write c) Disk failure d) Namenode failure
Which log file records the history of HDFS checkpoint operations? a) namenode-checkpoint.log b) hdfs-checkpoint.log c) secondarynamenode.log d) hadoop-checkpoint.log
What should an administrator check first when troubleshooting a failed DataNode? a) Disk space utilization b) DataNode log files c) Namenode configuration file d) Cluster replication factor
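The troubleshooting questions in this topic come down to scanning flat log files for known failure signatures such as lost heartbeats and missing blocks. A minimal sketch over a hypothetical log excerpt (the line formats are illustrative, not verbatim Hadoop output):

```python
import re

# Hypothetical Namenode log excerpt; real log lines vary by Hadoop version.
SAMPLE_LOG = """\
2024-01-15 10:02:11 INFO  BlockManager: processing block reports
2024-01-15 10:03:40 WARN  heartbeat lost from DataNode 10.0.0.7:9866
2024-01-15 10:04:02 ERROR BlockMissingException: blk_1073741825 has no replicas
"""

def flag_problems(log_text):
    """Return (line_no, line) pairs for heartbeat or missing-block events."""
    pattern = re.compile(r"heartbeat lost|BlockMissingException")
    return [(i, line)
            for i, line in enumerate(log_text.splitlines(), 1)
            if pattern.search(line)]

for lineno, line in flag_problems(SAMPLE_LOG):
    print(lineno, line)
```

The same pattern list is what a log aggregator such as Logstash would match on when routing alerts.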
Answers Table

| Qno | Answer (option with text) |
| --- | --- |
| 1 | a) Ganglia |
| 2 | a) A graphical interface for monitoring and managing Hadoop clusters |
| 3 | a) hdfs fsck |
| 4 | b) Active DataNodes, Namenode health, and storage usage |
| 5 | b) It manages cluster resources for job execution |
| 6 | b) By providing automated alerts for cluster issues |
| 7 | b) Performing administrative tasks on HDFS |
| 8 | b) Grafana |
| 9 | a) By using the Namenode UI |
| 10 | a) Logstash |
| 11 | c) dfs.namenode.heap.size |
| 12 | b) To expose runtime metrics for monitoring tools |
| 13 | d) dfs.block.underreplicated |
| 14 | a) To display storage capacity and utilization statistics of the cluster |
| 15 | b) Using REST APIs or JMX interfaces |
| 16 | d) dfs.capacity.remaining |
| 17 | b) By enabling JMX and analyzing metrics in a monitoring tool |