MCQs on Managing HDFS in a Cluster | Hadoop HDFS

Explore 30 MCQs covering critical aspects of managing HDFS in a cluster. Topics include monitoring tools, configuring metrics, log analysis, and troubleshooting techniques to optimize HDFS performance and reliability.


Topic 1: Cluster Monitoring and Management Tools

  1. Which tool is commonly used to monitor Hadoop cluster health and performance?
    a) Ganglia
    b) Tableau
    c) Power BI
    d) Google Analytics
  2. What does Ambari provide for Hadoop cluster management?
    a) A graphical interface for monitoring and managing Hadoop clusters
    b) A command-line tool for file operations
    c) A debugging tool for MapReduce jobs
    d) A distributed storage monitoring tool
  3. Which command-line utility is used to check the overall health of an HDFS cluster?
    a) hdfs fsck
    b) hdfs ls
    c) hdfs du
    d) hdfs copy
  4. What type of data is displayed in the HDFS Web UI?
    a) Job execution plans
    b) Active DataNodes, Namenode health, and storage usage
    c) MapReduce log summaries
    d) Network traffic details
  5. Which of the following is true about the ResourceManager in Hadoop?
    a) It handles HDFS storage failures
    b) It manages cluster resources for job execution
    c) It monitors DataNode storage usage
    d) It is a backup for the Namenode
  6. How does Cloudera Manager help in HDFS management?
    a) By encrypting HDFS files
    b) By providing automated alerts for cluster issues
    c) By optimizing SQL queries in Hive
    d) By integrating HDFS with Spark
  7. What is the primary use of the dfsadmin command?
    a) Managing HDFS metrics
    b) Performing administrative tasks on HDFS
    c) Monitoring YARN jobs
    d) Troubleshooting MapReduce issues
  8. Which of the following is used to visualize HDFS metrics in real time?
    a) Apache Zeppelin
    b) Grafana
    c) Tableau
    d) Microsoft Excel
  9. How can an administrator monitor disk usage of individual DataNodes in a cluster?
    a) By using the Namenode UI
    b) By running the jps command
    c) By using the HDFS balancer tool
    d) By reviewing the DataNode logs
  10. Which tool integrates seamlessly with Hadoop for log aggregation and analysis?
    a) Logstash
    b) Tableau
    c) Jenkins
    d) Power BI
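
Several of the monitoring facilities referenced above (the NameNode Web UI, DataNode counts, storage usage) are also exposed as JSON via the NameNode's `/jmx` HTTP endpoint, which is what tools like Ganglia and Grafana ultimately consume. A minimal sketch of parsing such a payload, using a trimmed, made-up sample in place of a live fetch from `http://<namenode>:9870/jmx` (field and bean names follow the real `FSNamesystemState` bean; the values are illustrative):

```python
import json

# Trimmed sample of the JSON returned by the NameNode's /jmx endpoint.
# Real payloads contain many beans; values here are invented for illustration.
sample_jmx = json.dumps({
    "beans": [
        {
            "name": "Hadoop:service=NameNode,name=FSNamesystemState",
            "NumLiveDataNodes": 4,
            "NumDeadDataNodes": 1,
            "CapacityTotal": 4_000_000_000_000,
            "CapacityUsed": 1_500_000_000_000,
        }
    ]
})

def summarize_cluster(jmx_payload: str) -> dict:
    """Extract a small health summary from a NameNode JMX JSON payload."""
    beans = json.loads(jmx_payload)["beans"]
    state = next(b for b in beans if b["name"].endswith("FSNamesystemState"))
    used_pct = 100.0 * state["CapacityUsed"] / state["CapacityTotal"]
    return {
        "live_datanodes": state["NumLiveDataNodes"],
        "dead_datanodes": state["NumDeadDataNodes"],
        "capacity_used_pct": round(used_pct, 1),
    }

print(summarize_cluster(sample_jmx))
# {'live_datanodes': 4, 'dead_datanodes': 1, 'capacity_used_pct': 37.5}
```

In practice an administrator would fetch the endpoint with an HTTP client and feed the same summary into an alerting tool.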

Topic 2: Configuring and Monitoring HDFS Metrics

  1. Which property in the hdfs-site.xml file is used to configure the Namenode heap size?
    a) dfs.replication
    b) dfs.namenode.memory.size
    c) dfs.namenode.heap.size
    d) dfs.datanode.du.reserved
  2. What is the role of JMX in HDFS monitoring?
    a) To provide a user-friendly GUI for HDFS management
    b) To expose runtime metrics for monitoring tools
    c) To replicate blocks across DataNodes
    d) To manage Namenode log files
  3. Which metric measures the percentage of blocks that are under-replicated in HDFS?
    a) dfs.capacity.remaining
    b) dfs.replication.under
    c) dfs.blocks.missing
    d) dfs.block.underreplicated
  4. What is the function of the dfsadmin -report command?
    a) To display storage capacity and utilization statistics of the cluster
    b) To list files in the HDFS root directory
    c) To show block replication details
    d) To check cluster configuration files
  5. How are HDFS metrics exposed for external monitoring systems?
    a) Through JSON files stored on Namenodes
    b) Using REST APIs or JMX interfaces
    c) By generating CSV reports periodically
    d) Through shell scripts run on DataNodes
  6. Which metric indicates the total disk capacity available in the cluster?
    a) dfs.disk.capacity.available
    b) dfs.capacity.used
    c) dfs.datanode.capacity
    d) dfs.capacity.remaining
  7. How can an administrator monitor HDFS throughput in real time?
    a) Using the hadoop fsck command
    b) By enabling JMX and analyzing metrics in a monitoring tool
    c) Through Namenode logs
    d) By running periodic disk scans
  8. What does the HDFS metric dfs.namenode.startup.time indicate?
    a) Namenode downtime
    b) Namenode initialization time
    c) DataNode failure detection time
    d) Block replication time
  9. How can HDFS metrics be exported to a tool like Prometheus?
    a) By using a log parser
    b) By enabling a JMX exporter for HDFS metrics
    c) By writing custom scripts for metrics aggregation
    d) Through Namenode Web UI
  10. Which of the following is critical for monitoring Namenode performance?
    a) Disk read latency
    b) Block scanning frequency
    c) Heap memory usage
    d) Cluster replication factor
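
The capacity and replication figures asked about in this topic are printed by `hdfs dfsadmin -report` as plain `Key: value` lines, which makes them easy to scrape for ad-hoc monitoring. A small sketch, using an abbreviated, made-up report sample (the field names follow the real report output; the numbers are illustrative):

```python
import re

# Abbreviated sample of `hdfs dfsadmin -report` output.
report = """\
Configured Capacity: 4000000000000 (3.64 TB)
DFS Used: 1500000000000 (1.36 TB)
DFS Remaining: 2500000000000 (2.27 TB)
Under replicated blocks: 12
Blocks with corrupt replicas: 0
Missing blocks: 0
"""

def parse_report(text: str) -> dict:
    """Pull the leading numeric value out of each 'Key: value' report line."""
    metrics = {}
    for line in text.splitlines():
        m = re.match(r"([A-Za-z ]+):\s+(\d+)", line)
        if m:
            metrics[m.group(1).strip()] = int(m.group(2))
    return metrics

metrics = parse_report(report)
print(metrics["Under replicated blocks"])  # 12
print(metrics["DFS Remaining"])            # 2500000000000
```

For continuous monitoring, the same figures are better pulled from JMX (for example via a JMX exporter into Prometheus, as Question 9 notes) rather than by scraping command output.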

Topic 3: Understanding DataNode and NameNode Logs

  1. Which file contains the Namenode’s operational logs?
    a) namenode.log
    b) hadoop-namenode.log
    c) dfs-namenode.log
    d) namenode-operations.log
  2. What does the error “Block missing exception” indicate in Namenode logs?
    a) Namenode memory overflow
    b) DataNode storage full
    c) A block with no replicas is detected
    d) Network latency
  3. How can DataNode logs be accessed?
    a) By running hdfs fsck on the DataNode
    b) Through the Namenode Web UI
    c) By checking log files on the DataNode server
    d) By running the dfsadmin command
  4. What does a “heartbeat lost” message in the Namenode logs indicate?
    a) A DataNode is removed from the cluster
    b) A DataNode is temporarily unreachable
    c) The Namenode has crashed
    d) Block corruption
  5. How are HDFS logs typically stored in a Hadoop cluster?
    a) In a relational database
    b) As flat files on disk
    c) In the HDFS root directory
    d) In the Namenode metadata file
  6. Which tool can be used to analyze Namenode and DataNode logs for troubleshooting?
    a) Logstash
    b) Tableau
    c) Grafana
    d) Apache Spark
  7. How can log verbosity levels for Namenode logs be configured?
    a) By editing log4j.properties
    b) Using the HDFS Web UI
    c) By restarting the cluster in debug mode
    d) By changing the log file size
  8. What is indicated by “BlockUnderConstructionException” in DataNode logs?
    a) Corrupted block
    b) Incomplete file write
    c) Disk failure
    d) Namenode failure
  9. Which log file records the history of HDFS checkpoint operations?
    a) namenode-checkpoint.log
    b) hdfs-checkpoint.log
    c) secondarynamenode.log
    d) hadoop-checkpoint.log
  10. What should an administrator check first when troubleshooting a failed DataNode?
    a) Disk space utilization
    b) DataNode log files
    c) Namenode configuration file
    d) Cluster replication factor
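
When troubleshooting from the flat log files discussed above, a common first step is filtering by severity level before reading messages such as "heartbeat lost". A minimal sketch, assuming log lines in the standard log4j layout (timestamp, level, class, message); the lines themselves are hypothetical:

```python
import re

# Hypothetical NameNode log lines in the usual log4j pattern layout.
log_lines = [
    "2024-05-01 10:15:02,113 INFO  org.apache.hadoop.hdfs.StateChange: BLOCK* allocate blk_1073741825",
    "2024-05-01 10:16:40,201 WARN  org.apache.hadoop.hdfs.server.namenode.NameNode: Heartbeat lost from DataNode 10.0.0.7:9866",
    "2024-05-01 10:17:05,912 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Block missing exception for blk_1073741830",
]

def filter_by_level(lines, level):
    """Return only the log lines whose severity matches the given level."""
    # Date and time are the first two whitespace-delimited tokens.
    pattern = re.compile(r"^\S+ \S+ " + re.escape(level) + r"\b")
    return [line for line in lines if pattern.search(line)]

for line in filter_by_level(log_lines, "WARN"):
    print(line)
```

Verbosity of what lands in these files is controlled through `log4j.properties`, as Question 7 indicates, so raising a logger to DEBUG temporarily is often the next step after filtering turns up too little.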

Answers Table

  Qno | Answer (option with its text)
  1. a) Ganglia
  2. a) A graphical interface for monitoring and managing Hadoop clusters
  3. a) hdfs fsck
  4. b) Active DataNodes, Namenode health, and storage usage
  5. b) It manages cluster resources for job execution
  6. b) By providing automated alerts for cluster issues
  7. b) Performing administrative tasks on HDFS
  8. b) Grafana
  9. a) By using the Namenode UI
  10. a) Logstash
  11. c) dfs.namenode.heap.size
  12. b) To expose runtime metrics for monitoring tools
  13. d) dfs.block.underreplicated
  14. a) To display storage capacity and utilization statistics of the cluster
  15. b) Using REST APIs or JMX interfaces
  16. d) dfs.capacity.remaining
  17. b) By enabling JMX and analyzing metrics in a monitoring tool
  18. b) Namenode initialization time
  19. b) By enabling a JMX exporter for HDFS metrics
  20. c) Heap memory usage
  21. b) hadoop-namenode.log
  22. c) A block with no replicas is detected
  23. c) By checking log files on the DataNode server
  24. b) A DataNode is temporarily unreachable
  25. b) As flat files on disk
  26. a) Logstash
  27. a) By editing log4j.properties
  28. b) Incomplete file write
  29. c) secondarynamenode.log
  30. b) DataNode log files

Use a blank sheet to note your answers, then tally them against the answer table above and give yourself a score.
