MCQs on Advanced Data Manipulation | R

Test your data manipulation skills in R with questions on complex joins and aggregations in dplyr, efficient processing with data.table, and advanced reshaping with tidyr.


Complex Joins and Aggregations (dplyr)

  1. Which function in dplyr is used to combine two data frames by a common column?
    a) merge()
    b) join()
    c) left_join()
    d) bind_rows()
  2. How would you perform a full outer join using dplyr?
    a) left_join()
    b) right_join()
    c) full_join()
    d) inner_join()
  3. When joining data frames in dplyr, what specifies the multiple columns to match on?
    a) multi_join()
    b) full_join()
    c) inner_join()
    d) the by argument
  4. What does the summarise() function do in dplyr?
    a) Creates new columns in a data frame
    b) Filters data based on conditions
    c) Summarizes data by calculating aggregates
    d) Groups data based on specific columns
  5. Which function is used to group data by one or more variables before summarizing it in dplyr?
    a) arrange()
    b) group_by()
    c) select()
    d) filter()
  6. How can you calculate the sum of a grouped column in dplyr?
    a) summarize(sum())
    b) group_by() %>% sum()
    c) summarize(total())
    d) group_by() %>% summarize(sum(column))
  7. In dplyr, which function can be used to combine data frames vertically?
    a) bind_rows()
    b) full_join()
    c) left_join()
    d) merge()
  8. Which of the following dplyr joins returns all rows from the left data frame and matching rows from the right data frame?
    a) full_join()
    b) left_join()
    c) right_join()
    d) inner_join()
  9. To join two data frames where the key columns have different names in each, which argument of the dplyr join functions is used?
    a) by
    b) on
    c) column_names
    d) matching
  10. What function can be used to calculate the mean of a grouped variable in dplyr?
    a) mean_by()
    b) group_by() %>% summarize(mean(column))
    c) mutate(mean())
    d) group_by() %>% mean()

Working with data.table for Efficiency

  1. Which package is primarily used for handling large data efficiently in R?
    a) tidyverse
    b) data.table
    c) dplyr
    d) ggplot2
  2. How can you convert a data frame to a data.table object?
    a) as.data.table()
    b) data.table()
    c) convert()
    d) to.data.table()
  3. Which expression returns a column of a data.table as a vector, using its unquoted name inside [ ]?
    a) dt[, "column_name"]
    b) dt[, column_name]
    c) dt$column_name
    d) dt["column_name"]
  4. Which of the following is the correct syntax to filter rows based on a condition in data.table?
    a) dt[column_name > value]
    b) filter(dt, column_name > value)
    c) subset(dt, column_name > value)
    d) dt[filter(column_name > value)]
  5. How do you update a column in data.table by reference?
    a) dt[, column_name := new_value]
    b) dt$column_name <- new_value
    c) update(dt, column_name, new_value)
    d) dt[column_name] <- new_value
  6. What is the primary purpose of the setkey() function in data.table?
    a) Sorts the data by a specific column
    b) Creates a key for indexing (and sorts the table by it)
    c) Joins data tables
    d) Selects rows by column values
  7. In data.table, how would you calculate the sum of a column grouped by another column?
    a) dt[, sum(column), by = group_column]
    b) dt$sum(column) %>% group_by(group_column)
    c) group_by(dt, group_column) %>% sum(column)
    d) aggregate(dt, by = group_column, FUN = sum)
  8. Which of the following is used to perform an inner join in data.table?
    a) merge()
    b) inner_join()
    c) setkey()
    d) join()
  9. What is the main advantage of using data.table over a regular data.frame?
    a) Smaller memory usage and faster computation
    b) Better visualizations
    c) Simpler syntax
    d) Supports only smaller data sets
  10. Which method does merge() dispatch to when it merges two data.table objects efficiently?
    a) merge()
    b) left_join()
    c) setkey()
    d) merge.data.table()

Advanced Reshaping (tidyr)

  1. Which tidyr function, now superseded by pivot_longer(), converts a wide-format data frame into long format?
    a) spread()
    b) gather()
    c) pivot_wider()
    d) pivot_longer()
  2. What does the pivot_wider() function do in tidyr?
    a) Converts long-format data into wide format
    b) Converts wide-format data into long format
    c) Filters data based on conditions
    d) Summarizes data by groups
  3. How do you separate a single column into multiple columns based on a delimiter in tidyr?
    a) separate()
    b) split()
    c) extract()
    d) subseparate()
  4. Which tidyr function replaces missing values in a column with a specified value?
    a) fill()
    b) na.fill()
    c) replace_na()
    d) complete()
  5. To stack multiple columns into a single column, which function takes key and value arguments that name the new columns?
    a) gather()
    b) spread()
    c) pivot_wider()
    d) pivot_longer()
  6. Which tidyr function is used to convert a data frame into a more complete form by filling missing combinations of data?
    a) expand()
    b) complete()
    c) fill()
    d) expand_grid()
  7. What is the purpose of the unnest() function in tidyr?
    a) Expands nested data frames or lists into regular rows and columns
    b) Removes NA values
    c) Reshapes data from long to wide format
    d) Converts categorical data into numeric
  8. Which function does tidyr currently recommend for converting a long-format data frame to wide format?
    a) spread()
    b) pivot_wider()
    c) gather()
    d) separate()
  9. Which function in tidyr is used to make a data frame with all possible combinations of a set of columns?
    a) expand()
    b) complete()
    c) spread()
    d) nest()
  10. What does the separate() function do in tidyr?
    a) Combines two columns into one
    b) Converts wide-format data into long format
    c) Splits a single column into multiple columns
    d) Removes missing values from a column

Answer Key

Answers are numbered consecutively across the three sections: 1 to 10 cover dplyr, 11 to 20 cover data.table, and 21 to 30 cover tidyr.

Q No.  Answer (option with text)
1      c) left_join()
2      c) full_join()
3      d) the by argument
4      c) Summarizes data by calculating aggregates
5      b) group_by()
6      d) group_by() %>% summarize(sum(column))
7      a) bind_rows()
8      b) left_join()
9      a) by
10     b) group_by() %>% summarize(mean(column))
11     b) data.table
12     a) as.data.table()
13     b) dt[, column_name]
14     a) dt[column_name > value]
15     a) dt[, column_name := new_value]
16     b) Creates a key for indexing (and sorts the table by it)
17     a) dt[, sum(column), by = group_column]
18     a) merge()
19     a) Smaller memory usage and faster computation
20     d) merge.data.table()
21     b) gather()
22     a) Converts long-format data into wide format
23     a) separate()
24     c) replace_na()
25     a) gather()
26     b) complete()
27     a) Expands nested data frames or lists into regular rows and columns
28     b) pivot_wider()
29     a) expand()
30     c) Splits a single column into multiple columns
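
Answers 1 to 10 can be checked with a short sketch like the one below. It assumes the dplyr package is installed; the data frames (orders, customers, sales, targets) and their columns are hypothetical, invented purely for illustration.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
orders <- data.frame(customer_id = c(1, 1, 2, 4), amount = c(10, 20, 5, 7))
customers <- data.frame(id = c(1, 2, 3), name = c("Ana", "Ben", "Cy"))

# left_join() keeps every row of `orders`; `by` maps key columns whose
# names differ between the two data frames
joined <- left_join(orders, customers, by = c("customer_id" = "id"))

# full_join() also keeps rows that have no match on either side
all_rows <- full_join(orders, customers, by = c("customer_id" = "id"))

# Passing several names to `by` joins on multiple columns at once
sales   <- data.frame(region = c("N", "S"), year = c(2023, 2023), sales = c(100, 80))
targets <- data.frame(region = c("N", "S"), year = c(2023, 2024), target = c(90, 85))
both <- inner_join(sales, targets, by = c("region", "year"))

# group_by() followed by summarise() computes per-group aggregates
totals <- joined %>%
  group_by(customer_id) %>%
  summarise(total = sum(amount), average = mean(amount))

# bind_rows() stacks data frames vertically
stacked <- bind_rows(orders, orders)
```

In this sketch the order for customer 4 has no matching customer, so left_join() fills its name with NA, while full_join() additionally keeps customer 3, who has no orders.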
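
Answers 11 to 20 can be explored with a minimal sketch such as the following. It assumes the data.table package is installed; the tables and column names (group, value, doubled, label) are hypothetical.

```r
library(data.table)

# Hypothetical toy data, invented for illustration only
df <- data.frame(group = c("a", "a", "b"), value = c(1, 2, 3))
dt <- as.data.table(df)            # convert a data frame to a data.table

vals <- dt[, value]                # unquoted name in j returns the column as a vector
big  <- dt[value > 1]              # filter rows with a condition in i

dt[, doubled := value * 2]         # := adds or updates a column by reference

# Grouped sum; naming the result avoids the default column name
sums <- dt[, .(total = sum(value)), by = group]

setkey(dt, group)                  # sorts dt by `group` and marks it as the key

# merge() on data.table objects dispatches to merge.data.table();
# with the default all = FALSE it performs an inner join
lookup <- data.table(group = c("a", "c"), label = c("alpha", "gamma"))
inner  <- merge(dt, lookup, by = "group")
```

Only group "a" appears in both tables here, so the inner join keeps the two "a" rows and drops the rest.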
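
Answers 21 to 30 can be tried out with a sketch along these lines. It assumes a tidyr version of 1.0 or later is installed; all data frames and column names are hypothetical.

```r
library(tidyr)

# Hypothetical toy data, invented for illustration only
wide <- data.frame(id = c(1, 2), q1 = c(10, 30), q2 = c(20, NA))

# Wide to long: pivot_longer() is the current function, gather() the superseded one
long     <- pivot_longer(wide, cols = c(q1, q2), names_to = "quarter", values_to = "sales")
long_old <- gather(wide, key = "quarter", value = "sales", q1, q2)

# Long to wide again
back <- pivot_wider(long, names_from = quarter, values_from = sales)

# replace_na() swaps missing values for a specified value
filled <- replace_na(long, list(sales = 0))

# separate() splits one column into several on a delimiter
codes <- data.frame(code = c("A-1", "B-2"))
parts <- separate(codes, code, into = c("letter", "number"), sep = "-")

# expand() lists all combinations; complete() adds the missing ones to the data
obs <- data.frame(site = c("x", "x", "y"), year = c(2020, 2021, 2020), n = c(5, 6, 7))
combos    <- expand(obs, site, year)
completed <- complete(obs, site, year)

# nest() packs rows into a list-column; unnest() expands it back into rows and columns
nested <- nest(obs, data = c(year, n))
flat   <- unnest(nested, data)
```

Here complete() adds the missing site "y" / year 2021 combination with n set to NA, while expand() returns only the four site and year combinations without the n column.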

Note your answers on a blank sheet, then tally them against the answer key above and give yourself a score.
