MCQs on Data Cleaning and Preprocessing | R

Data Cleaning and Preprocessing in R
Master the essential techniques of data cleaning and preprocessing in R. Learn how to handle missing and duplicate data, transform data with dplyr, and perform data summarization efficiently.


Handling Missing and Duplicate Data

  1. Which function is used to identify missing values in a data frame in R?
    a) missing()
    b) is.na()
    c) na.test()
    d) is.null()
  2. What does the na.omit() function do in R?
    a) Replaces NA values with zeros
    b) Removes rows with missing values
    c) Replaces missing values with the mean
    d) Identifies missing values
  3. Which base R function can replace missing values in a column with the column mean?
    a) replace.na()
    b) replace()
    c) mutate()
    d) fill()
  4. Which dplyr function removes duplicate rows from a data frame?
    a) distinct()
    b) remove_duplicates()
    c) unique()
    d) drop_duplicates()
  5. What does the function complete.cases() return?
    a) Rows without any missing values
    b) Rows with missing values
    c) A summary of missing values
    d) A boolean value for each row
  6. How can you count the number of missing values in a column of a data frame in R?
    a) sum(is.na(column))
    b) count.na(column)
    c) is.na(column).sum()
    d) missing.count(column)
  7. Which of the following is NOT a method for handling missing data in R?
    a) Removing rows with missing values
    b) Replacing missing values with zeros
    c) Replacing missing values with column mean
    d) Ignoring missing data
  8. To identify duplicate rows based on specific columns, which function can be used?
    a) duplicated()
    b) remove_duplicates()
    c) distinct()
    d) unique()
  9. Which tidyr function replaces missing values in a data frame with a specified value?
    a) fill()
    b) replace()
    c) replace_na()
    d) substitute()
  10. What does the tidyr function drop_na() do?
    a) Drop rows that contain any missing values
    b) Drop rows with specific missing values
    c) Drop columns with missing values
    d) None of the above
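After attempting the questions, the minimal base R sketch below can help you check how these functions behave. It runs on a small made-up data frame; the column names `id` and `score` are illustrative only. The tidyr functions `replace_na()` and `drop_na()` are the tidyverse counterparts of the `replace()` and `na.omit()` steps shown here.

```r
# Made-up data frame for illustration: rows 2 and 3 are identical and contain an NA
df <- data.frame(
  id    = c(1, 2, 2, 3, 4),
  score = c(10, NA, NA, 30, 40)
)

# is.na() flags each missing value; sum() counts the TRUEs
n_missing <- sum(is.na(df$score))            # 2

# complete.cases() returns one TRUE/FALSE per row (TRUE = no missing values)
cc <- complete.cases(df)                     # TRUE FALSE FALSE TRUE TRUE

# na.omit() drops every row that has at least one NA
df_no_na <- na.omit(df)                      # 3 rows remain

# duplicated() flags rows that repeat an earlier row (NAs compare as equal here)
dup <- duplicated(df)                        # FALSE FALSE TRUE FALSE FALSE
df_unique <- df[!dup, ]                      # same rows as unique(df)

# Duplicates judged on one specific column only
dup_id <- duplicated(df$id)                  # FALSE FALSE TRUE FALSE FALSE

# Mean imputation with base replace(): fill NAs with the mean of the observed values
score_mean <- mean(df$score, na.rm = TRUE)   # (10 + 30 + 40) / 3
df$score <- replace(df$score, is.na(df$score), score_mean)
```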

Data Transformation with dplyr

  1. Which R package provides functions like mutate(), filter(), and select()?
    a) tidyr
    b) ggplot2
    c) dplyr
    d) base
  2. What does the mutate() function do in R?
    a) Adds new columns to a data frame
    b) Filters rows based on conditions
    c) Changes the structure of a data frame
    d) Selects specific columns
  3. How do you select specific columns from a data frame using dplyr?
    a) filter()
    b) select()
    c) slice()
    d) mutate()
  4. Which function in dplyr is used to filter rows based on conditions?
    a) select()
    b) mutate()
    c) filter()
    d) arrange()
  5. Which dplyr function sorts the rows of a data frame in ascending order of a column?
    a) sort()
    b) arrange()
    c) order()
    d) order_by()
  6. How do you apply a function to each column of a data frame using dplyr?
    a) apply()
    b) summarize()
    c) mutate_all()
    d) map()
  7. What is the purpose of the group_by() function in dplyr?
    a) To group data by specific columns for aggregation
    b) To rearrange the data
    c) To filter data based on conditions
    d) To transform data
  8. Which function in dplyr is used to summarize data by a group?
    a) summarize()
    b) mutate()
    c) group_by()
    d) filter()
  9. How do you create a new column in a data frame based on an existing one using dplyr?
    a) mutate()
    b) create()
    c) transform()
    d) add_column()
  10. To calculate the mean of a column after grouping by another column, you would use:
    a) summarize(mean(column))
    b) group_by(group_col) %>% summarize(mean(value_col))
    c) mutate(mean(column))
    d) mean_by(column)
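The dplyr verbs from this section are usually chained into a single pipeline. The sketch below assumes the dplyr package (version 1.0.0 or later, for `across()` and the `.groups` argument) is installed; the `sales` data and its column names are made up for illustration.

```r
library(dplyr)

# Made-up sales data for illustration
sales <- data.frame(
  region = c("North", "South", "North", "South", "East"),
  units  = c(10, 4, 6, 8, 5),
  price  = c(2.5, 3.0, 2.5, 3.0, 4.0),
  stringsAsFactors = FALSE
)

result <- sales %>%
  filter(units > 4) %>%                        # keep rows with more than 4 units
  mutate(revenue = units * price) %>%          # new column built from existing ones
  select(region, units, revenue) %>%           # keep only these columns
  group_by(region) %>%                         # one group per region
  summarize(mean_units    = mean(units),       # per-group mean
            total_revenue = sum(revenue),      # per-group sum
            .groups = "drop") %>%              # return an ungrouped result
  arrange(desc(total_revenue))                 # largest revenue first
# result rows: North 8 40, South 8 24, East 5 20

# Apply one function to several columns at once (the modern form of mutate_all())
doubled <- sales %>% mutate(across(c(units, price), ~ .x * 2))
```

`arrange()` sorts in ascending order by default; wrapping the column in `desc()` reverses it.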

Data Summarization Techniques

  1. What does the summary() function provide in R?
    a) A summary of missing values
    b) A quick overview of statistics like mean, median, etc.
    c) The structure of a data frame
    d) A detailed visualization
  2. Which R function is used to compute the mean of a numeric column?
    a) sum()
    b) mean()
    c) avg()
    d) calculate()
  3. Which function is used to calculate the standard deviation of a numeric column in R?
    a) sd()
    b) std()
    c) stdev()
    d) deviation()
  4. Which function would you use to get a quick count of non-missing values in a column?
    a) count()
    b) length()
    c) n()
    d) sum(!is.na(column))
  5. What does the table() function in R do?
    a) Displays summary statistics
    b) Creates a frequency table
    c) Transforms data
    d) Creates visualizations
  6. Which function can be used to calculate the median of a numeric column in R?
    a) median()
    b) mean()
    c) average()
    d) calc.median()
  7. How would you group data by one or more variables and then calculate the sum of each group in R?
    a) group_by(group_col) %>% summarise(total = sum(value_col))
    b) summarise() %>% group_by()
    c) sum_group_by()
    d) group_sum()
  8. To count the number of unique values in a column, which function would you use?
    a) unique()
    b) n_distinct()
    c) count()
    d) length()
  9. Which of the following is used to calculate the interquartile range (IQR) of a numeric column in R?
    a) iqr()
    b) IQR()
    c) interquartile()
    d) range()
  10. Which function returns exactly the five-number summary (minimum, first quartile, median, third quartile, maximum) of a numeric column?
    a) summary()
    b) quantile()
    c) five_number_summary()
    d) stat_summary()
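The base R sketch below runs the summary functions from this section on a made-up vector with one missing value. Most of them return NA when the input contains NA unless `na.rm = TRUE` is passed (`quantile()` and `IQR()` stop with an error instead). `tapply()` stands in here for the dplyr `group_by() %>% summarise()` pattern, and the `group` and `grade` labels are hypothetical.

```r
# Made-up numeric vector with one missing value
x <- c(2, 4, 4, 5, 7, 9, NA)

x_mean   <- mean(x, na.rm = TRUE)       # 31 / 6
x_median <- median(x, na.rm = TRUE)     # 4.5
x_sd     <- sd(x, na.rm = TRUE)
x_iqr    <- IQR(x, na.rm = TRUE)        # Q3 - Q1 = 6.5 - 4
x_five   <- quantile(x, na.rm = TRUE)   # default probs 0, .25, .5, .75, 1

n_present       <- sum(!is.na(x))                # count of non-missing values: 6
n_distinct_vals <- length(unique(x[!is.na(x)]))  # distinct non-missing values: 5

# Frequency table of a categorical vector
grade <- c("A", "B", "A", "C", "A")
grade_counts <- table(grade)            # A: 3, B: 1, C: 1

# Group-wise sums in base R
vals  <- x[!is.na(x)]                   # the six non-missing values
group <- c("g1", "g1", "g2", "g2", "g2", "g1")
group_sums <- tapply(vals, group, sum)  # g1: 15, g2: 16

# summary() reports Min, 1st Qu., Median, Mean, 3rd Qu., Max and the NA count
print(summary(x))
```

Note that `summary()` includes the mean alongside the five-number summary, while `quantile()` with its default `probs` returns just the five values.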

Answer Key

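Handling Missing and Duplicate Data
  1. b) is.na()
  2. b) Removes rows with missing values
  3. b) replace()
  4. a) distinct()
  5. d) A boolean value for each row (TRUE where the row has no missing values, so df[complete.cases(df), ] keeps the complete rows)
  6. a) sum(is.na(column))
  7. d) Ignoring missing data
  8. a) duplicated()
  9. c) replace_na()
  10. a) Drop rows that contain any missing values

Data Transformation with dplyr
  1. c) dplyr
  2. a) Adds new columns to a data frame
  3. b) select()
  4. c) filter()
  5. b) arrange()
  6. c) mutate_all() (superseded since dplyr 1.0.0 by mutate(across(everything(), f)))
  7. a) To group data by specific columns for aggregation
  8. a) summarize()
  9. a) mutate()
  10. b) group_by(group_col) %>% summarize(mean(value_col))

Data Summarization Techniques
  1. b) A quick overview of statistics like mean, median, etc.
  2. b) mean()
  3. a) sd()
  4. d) sum(!is.na(column))
  5. b) Creates a frequency table
  6. a) median()
  7. a) group_by(group_col) %>% summarise(total = sum(value_col))
  8. b) n_distinct()
  9. b) IQR()
  10. b) quantile() (summary() also reports the mean, so it returns six values)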

Note your answers on a blank sheet, then check them against the answer key above and give yourself a score.
