MCQs on Exploratory Data Analysis (EDA) | R

Test your grasp of Exploratory Data Analysis (EDA) in R with these MCQs on understanding distributions, visualizing relationships, and using EDA packages such as DataExplorer and skimr.

Understanding Distributions and Trends

  1. Which function in R is used to plot a histogram of a numeric variable?
    a) hist()
    b) plot()
    c) boxplot()
    d) density()
  2. What does a boxplot in R represent?
    a) The distribution of data with a summary of its quartiles and outliers
    b) The relationship between two categorical variables
    c) A scatter plot with smoothed lines
    d) A distribution of the data points for linear regression
  3. What is the purpose of the density() function in R?
    a) To plot a histogram
    b) To plot a kernel density estimate
    c) To create a boxplot
    d) To visualize a correlation matrix
  4. How can you detect skewness in a dataset using a plot in R?
    a) By creating a boxplot
    b) By creating a histogram or density plot
    c) By plotting a correlation matrix
    d) By using a scatter plot
  5. Which R function allows you to summarize the distribution of a numeric variable with statistics such as mean, median, and quantiles?
    a) summary()
    b) describe()
    c) summaryStats()
    d) stat.summary()
  6. What is the primary purpose of the qqnorm() function in R?
    a) To create a normal probability plot to assess the normality of a dataset
    b) To summarize the quantitative relationship between variables
    c) To visualize the distribution of a categorical variable
    d) To generate a time series plot
  7. What type of plot is most appropriate for showing the shape of a continuous variable's distribution in R?
    a) Bar plot
    b) Box plot
    c) Histogram
    d) Pie chart
  8. In R, what does the skewness() function from the e1071 package measure?
    a) The asymmetry of the distribution
    b) The spread of the data
    c) The central tendency of the data
    d) The relationship between two variables
  9. Which function, provided by add-on packages such as e1071 and moments, calculates the kurtosis (peakedness) of a distribution in R?
    a) kurtosis()
    b) describe()
    c) summary()
    d) kurt()
  10. Which of the following methods can be used to visualize trends over time in R?
    a) Line plot
    b) Histogram
    c) Box plot
    d) Bar chart

Visualizing Relationships Between Variables

  1. Which plot in R is most commonly used to visualize the relationship between two continuous variables?
    a) Scatter plot
    b) Heatmap
    c) Boxplot
    d) Histogram
  2. What does the cor() function in R compute?
    a) The variance of a dataset
    b) The correlation coefficient between two variables
    c) The mean of a dataset
    d) The sum of all data points
  3. Which function from the scatterplot3d package can you use to visualize the relationship between three variables in R?
    a) scatter3D()
    b) scatterplot3d()
    c) heatmap()
    d) plot3D()
  4. Which type of plot is ideal for visualizing the relationship between categorical and continuous variables?
    a) Histogram
    b) Bar plot
    c) Boxplot
    d) Pie chart
  5. What does the pairs() function in R do?
    a) Plots a pairwise scatterplot matrix
    b) Generates a time series plot
    c) Creates a 3D scatter plot
    d) Summarizes categorical data
  6. How do you add a regression line to a scatter plot in R?
    a) Use the abline() function
    b) Use the add.line() function
    c) Use the plot.regression() function
    d) Use the scatter.line() function
  7. Which function can be used to compute the correlation matrix for multiple variables in R?
    a) cor()
    b) cor.test()
    c) cov()
    d) cor.matrix()
  8. To visualize the distribution of a continuous variable by groups in R, which plot is appropriate?
    a) Histogram
    b) Bar plot
    c) Boxplot
    d) Scatter plot
  9. Which function from the package of the same name is designed specifically to draw a heatmap-style plot of a correlation matrix in R?
    a) heatmap()
    b) corrplot()
    c) scatter3D()
    d) ggplot()
  10. Which of the following functions in R is used to add gridlines to a plot?
    a) grid()
    b) lines()
    c) abline()
    d) points()

Using EDA Packages (DataExplorer, skimr)

  1. Which package in R is used to automatically generate EDA reports?
    a) ggplot2
    b) skimr
    c) DataExplorer
    d) dplyr
  2. What does the plot_intro() function from the DataExplorer package do?
    a) Plots basic information about the dataset, such as the share of discrete and continuous columns and of complete rows
    b) Plots the summary statistics of each variable
    c) Displays missing values
    d) Provides an overview of correlations
  3. In the skimr package, which function is used to generate a concise summary of a dataset?
    a) skim()
    b) summarize()
    c) quick_summary()
    d) data_summary()
  4. What is the purpose of the DataExplorer::create_report() function in R?
    a) To generate an HTML report of the exploratory analysis of a dataset
    b) To generate summary statistics
    c) To plot histograms
    d) To generate regression models
  5. How does the skimr package handle missing values in the summary?
    a) It reports the number of missing values and the complete rate for each variable
    b) It automatically removes missing values
    c) It imputes missing values
    d) It replaces missing values with zeros
  6. Which function in the DataExplorer package can be used to visualize missing values in a dataset?
    a) plot_missing()
    b) plot_na()
    c) missing_plot()
    d) na_plot()
  7. What does the DataExplorer::plot_bar() function do?
    a) Plots a bar plot for categorical variables
    b) Plots a histogram for numerical variables
    c) Creates a scatter plot
    d) Plots a time series graph
  8. How can you check the number of unique values in character and factor columns using skimr?
    a) skim()
    b) skim_with()
    c) summarize()
    d) unique_count()
  9. What is the default behavior of the DataExplorer::plot_histogram() function?
    a) It plots histograms for continuous variables
    b) It plots bar plots for categorical variables
    c) It summarizes correlations
    d) It generates boxplots for each variable
  10. How do you visualize the distribution of missing data across columns using the DataExplorer package?
    a) plot_missing()
    b) plot_na()
    c) missing_data_plot()
    d) na_visualize()

Answers Table

Understanding Distributions and Trends

Qno  Answer
1    a) hist()
2    a) The distribution of data with a summary of its quartiles and outliers
3    b) To plot a kernel density estimate
4    b) By creating a histogram or density plot
5    a) summary()
6    a) To create a normal probability plot to assess the normality of a dataset
7    c) Histogram
8    a) The asymmetry of the distribution
9    a) kurtosis()
10   a) Line plot

Visualizing Relationships Between Variables

Qno  Answer
1    a) Scatter plot
2    b) The correlation coefficient between two variables
3    b) scatterplot3d()
4    c) Boxplot
5    a) Plots a pairwise scatterplot matrix
6    a) Use the abline() function
7    a) cor()
8    c) Boxplot
9    b) corrplot()
10   a) grid()

Using EDA Packages (DataExplorer, skimr)

Qno  Answer
1    c) DataExplorer
2    a) Plots basic information about the dataset, such as the share of discrete and continuous columns and of complete rows
3    a) skim()
4    a) To generate an HTML report of the exploratory analysis of a dataset
5    a) It reports the number of missing values and the complete rate for each variable
6    a) plot_missing()
7    a) Plots a bar plot for categorical variables
8    a) skim()
9    a) It plots histograms for continuous variables
10   a) plot_missing()
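
After checking your answers, it can help to run the base-R functions from the first two sections yourself. The following is a minimal sketch, assuming the built-in mtcars dataset as example data (the quiz names no dataset); the variable names mpg_summary, fit, r_wt_mpg, and cor_mat are illustrative only.

```r
# When run as a script (e.g. via Rscript), discard plots instead of writing a file
if (!interactive()) pdf(NULL)

data(mtcars)  # built-in dataset, assumed here purely as example data

# --- Understanding distributions ---
hist(mtcars$mpg, main = "Histogram of mpg", xlab = "mpg")
plot(density(mtcars$mpg), main = "Kernel density estimate of mpg")
boxplot(mtcars$mpg, main = "Boxplot of mpg")  # quartiles and outliers
qqnorm(mtcars$mpg)                            # normal Q-Q plot
qqline(mtcars$mpg)                            # reference line for normality

mpg_summary <- summary(mtcars$mpg)  # min, quartiles, median, mean, max
print(mpg_summary)

# --- Visualizing relationships ---
plot(mtcars$wt, mtcars$mpg, xlab = "wt", ylab = "mpg")  # scatter plot
grid()                                                  # add gridlines
fit <- lm(mpg ~ wt, data = mtcars)                      # simple linear model
abline(fit, col = "red")                                # regression line

r_wt_mpg <- cor(mtcars$wt, mtcars$mpg)           # Pearson correlation by default
cor_mat  <- cor(mtcars[, c("mpg", "wt", "hp")])  # correlation matrix
print(round(cor_mat, 2))

pairs(mtcars[, c("mpg", "wt", "hp")])  # pairwise scatterplot matrix
boxplot(mpg ~ cyl, data = mtcars)      # continuous variable by group
```

skewness() and kurtosis() from e1071, scatterplot3d(), and corrplot() come from add-on packages and are left out so that the sketch runs on a plain R installation.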

Use a blank sheet to note your answers, then tally them against the answer table and give yourself a score.
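
The package functions from the third section can also be tried with a short sketch. It assumes the built-in airquality dataset as example data and only calls skimr and DataExplorer when they are installed; na_counts is an illustrative name, and the base-R count of missing values is included only for comparison.

```r
# When run as a script (e.g. via Rscript), discard plots instead of writing a file
if (!interactive()) pdf(NULL)

df <- airquality  # built-in dataset with missing values in Ozone and Solar.R

# Base-R count of missing values per column, for comparison with the packages
na_counts <- colSums(is.na(df))
print(na_counts)

if (requireNamespace("skimr", quietly = TRUE)) {
  # Per-column summary, including n_missing and complete_rate
  print(skimr::skim(df))
}

if (requireNamespace("DataExplorer", quietly = TRUE)) {
  DataExplorer::plot_intro(df)      # basic information about the dataset
  DataExplorer::plot_missing(df)    # missing-value profile per column
  DataExplorer::plot_histogram(df)  # histograms of the continuous columns
  # DataExplorer::create_report(df) # writes an HTML report; left commented out
}
```

create_report() is commented out because it writes a report file to disk; uncomment it when you want the full HTML output.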
