Master Exploratory Data Analysis (EDA) in R by diving into key concepts like understanding distributions, visualizing relationships, and using EDA packages like DataExplorer and skimr. Test your knowledge with these MCQs!
MCQs on Exploratory Data Analysis (EDA) in R
Understanding Distributions and Trends
1. Which function in R is used to plot a histogram of a numeric variable? a) hist() b) plot() c) boxplot() d) density()
2. What does a boxplot in R represent? a) The distribution of data with a summary of its quartiles and outliers b) The relationship between two categorical variables c) A scatter plot with smoothed lines d) A distribution of the data points for linear regression
3. What is the purpose of the density() function in R? a) To plot a histogram b) To plot a kernel density estimate c) To create a boxplot d) To visualize a correlation matrix
4. How can you most directly see skewness in the shape of a variable's distribution using a plot in R? a) By creating a boxplot b) By creating a histogram or density plot c) By plotting a correlation matrix d) By using a scatter plot
5. Which base R function summarizes the distribution of a numeric variable with statistics such as the mean, median, and quartiles? a) summary() b) describe() c) summaryStats() d) stat.summary()
6. What is the primary purpose of the qqnorm() function in R? a) To create a normal probability plot to assess the normality of a dataset b) To summarize the quantitative relationship between variables c) To visualize the distribution of a categorical variable d) To generate a time series plot
7. What type of plot is most appropriate to visualize the shape of the distribution of a single continuous variable in R? a) Bar plot b) Box plot c) Histogram d) Pie chart
8. In R, what does the skewness() function from the e1071 package measure? a) The symmetry of the distribution b) The spread of the data c) The central tendency of the data d) The relationship between two variables
9. Which function, available in packages such as e1071 and moments, calculates the kurtosis (tail heaviness, often described as peakedness) of a distribution? a) kurtosis() b) describe() c) summary() d) kurt()
10. Which of the following methods can be used to visualize trends over time in R? a) Line plot b) Histogram c) Box plot d) Bar chart
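The functions named in this group of questions can be tried on a built-in dataset. The following is a minimal sketch, assuming the mtcars and AirPassengers datasets that ship with R. The skewness and excess kurtosis are computed by hand from sample moments as a stand-in for the package functions, so the values may differ slightly from the e1071 defaults.

```r
# Numeric variable from a dataset bundled with R.
x <- mtcars$mpg

# Minimum, quartiles, median, mean, and maximum in one call.
s <- summary(x)
print(s)

# Histogram and kernel density estimate of the same variable.
hist(x, main = "Histogram of mpg", xlab = "Miles per gallon")
plot(density(x), main = "Kernel density estimate of mpg")

# Boxplot: quartiles, whiskers, and any points flagged as outliers.
boxplot(x, horizontal = TRUE, main = "Boxplot of mpg")

# Normal Q-Q plot with a reference line to judge normality.
qqnorm(x)
qqline(x)

# Hand-rolled moment-based skewness and excess kurtosis (illustrative only;
# e1071::skewness() and e1071::kurtosis() offer several estimator types).
m2 <- mean((x - mean(x))^2)
m3 <- mean((x - mean(x))^3)
m4 <- mean((x - mean(x))^4)
skew <- m3 / m2^1.5
excess_kurt <- m4 / m2^2 - 3
print(c(skewness = skew, excess_kurtosis = excess_kurt))

# Line plot of a monthly time series to show a trend over time.
plot(AirPassengers, ylab = "Passengers (thousands)")
```

A positive skewness value here corresponds to the longer right tail visible in the histogram and density plot.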
Visualizing Relationships Between Variables
11. Which plot in R is most commonly used to visualize the relationship between two continuous variables? a) Scatter plot b) Heatmap c) Boxplot d) Histogram
12. What does the cor() function in R compute? a) The variance of a dataset b) The correlation coefficient between two variables c) The mean of a dataset d) The sum of all data points
13. Which function, from the package of the same name, draws a 3D scatter plot of three variables in R? a) scatter3D() b) scatterplot3d() c) heatmap() d) plot3D()
14. Which type of plot is ideal for visualizing the relationship between categorical and continuous variables? a) Histogram b) Bar plot c) Boxplot d) Pie chart
15. What does the pairs() function in R do? a) Plots a pairwise scatterplot matrix b) Generates a time series plot c) Creates a 3D scatter plot d) Summarizes categorical data
16. How do you add a fitted regression line to an existing scatter plot in base R? a) Use the abline() function b) Use the add.line() function c) Use the plot.regression() function d) Use the scatter.line() function
17. Which function can be used to compute the correlation matrix for multiple variables in R? a) cor() b) cor.test() c) cov() d) cor.matrix()
18. To visualize the distribution of a continuous variable by groups in R, which plot is appropriate? a) Histogram b) Bar plot c) Boxplot d) Scatter plot
19. Which function, from the package of the same name, is designed specifically to visualize a correlation matrix as a heatmap-style plot? a) heatmap() b) corrplot() c) scatter3D() d) ggplot()
20. Which of the following functions in R is used to add gridlines to a plot? a) grid() b) lines() c) abline() d) points()
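The relationship plots and correlation functions from this group can be sketched in a few lines of base R. This example assumes the built-in mtcars dataset. The corrplot call is guarded because that package may not be installed.

```r
# Three continuous columns from a dataset bundled with R.
df <- mtcars[, c("mpg", "wt", "hp")]

# Scatter plot of two continuous variables, with gridlines.
plot(df$wt, df$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
grid()

# Fit a simple linear model and draw its line on the existing plot.
fit <- lm(mpg ~ wt, data = df)
abline(fit, col = "red")

# Correlation between two variables, then the full correlation matrix.
r <- cor(df$wt, df$mpg)
cm <- cor(df)
print(round(cm, 2))

# Pairwise scatterplot matrix of all three variables.
pairs(df)

# Continuous variable split by a categorical one.
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Cylinders", ylab = "Miles per gallon")

# Correlation plot, only if the optional corrplot package is available.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cm)
}
```

Passing a data frame to cor() returns the whole matrix, while passing two vectors returns a single coefficient.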
Using EDA Packages (DataExplorer, skimr)
21. Which package in R is used to automatically generate EDA reports? a) ggplot2 b) skimr c) DataExplorer d) dplyr
22. What does the plot_intro() function from the DataExplorer package do? a) Creates an introductory plot of data distribution b) Plots the summary statistics of each variable c) Displays missing values d) Provides an overview of correlations
23. In the skimr package, which function is used to generate a concise summary of a dataset? a) skim() b) summarize() c) quick_summary() d) data_summary()
24. What is the purpose of the DataExplorer::create_report() function in R? a) To generate an HTML report of the exploratory analysis of a dataset b) To generate summary statistics c) To plot histograms d) To generate regression models
25. How does the skimr package handle missing values in the summary? a) It provides the number and percentage of missing values for each variable b) It automatically removes missing values c) It imputes missing values d) It replaces missing values with zeros
26. Which function in the DataExplorer package can be used to visualize missing values in a dataset? a) plot_missing() b) plot_na() c) missing_plot() d) na_plot()
27. What does the DataExplorer::plot_bar() function do? a) Plots a bar plot for categorical variables b) Plots a histogram for numerical variables c) Creates a scatter plot d) Plots a time series graph
28. How can you check the number of unique values in each character or factor column using skimr? a) skim() b) skim_with() c) summarize() d) unique_count()
29. What is the default behavior of the DataExplorer::plot_histogram() function? a) It plots histograms for continuous variables b) It plots bar plots for categorical variables c) It summarizes correlations d) It generates boxplots for each variable
30. How do you visualize the distribution of missing data across columns using the DataExplorer package? a) plot_missing() b) plot_na() c) missing_data_plot() d) na_visualize()
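The package functions in this group can be tried on a dataset that contains missing values. The sketch below assumes the built-in airquality dataset and converts Month to a factor so that there is a categorical column to plot. Each package call is guarded with requireNamespace() because skimr and DataExplorer are not part of base R, and create_report() is left commented out because it writes an HTML file and needs rmarkdown.

```r
# Built-in data with missing values in Ozone and Solar.R.
df <- airquality
df$Month <- factor(month.abb[df$Month])  # give the data one categorical column

# Base R count of missing values per column, for comparison with the packages.
na_counts <- colSums(is.na(df))
print(na_counts)

# skimr: one summary table per column type, including missing-value counts.
if (requireNamespace("skimr", quietly = TRUE)) {
  print(skimr::skim(df))
}

# DataExplorer: overview, missing values, histograms, and bar plots.
if (requireNamespace("DataExplorer", quietly = TRUE)) {
  DataExplorer::plot_intro(df)
  DataExplorer::plot_missing(df)
  DataExplorer::plot_histogram(df)  # continuous columns
  DataExplorer::plot_bar(df)        # categorical columns
  # DataExplorer::create_report(df) # writes an HTML report to disk
}
```

The base R counts give a quick cross-check on what the package summaries report for the same columns.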
Answers Table
Qno | Answer
1 | a) hist()
2 | a) The distribution of data with a summary of its quartiles and outliers
3 | b) To plot a kernel density estimate
4 | b) By creating a histogram or density plot
5 | a) summary()
6 | a) To create a normal probability plot to assess the normality of a dataset
7 | c) Histogram
8 | a) The symmetry of the distribution
9 | a) kurtosis()
10 | a) Line plot
11 | a) Scatter plot
12 | b) The correlation coefficient between two variables
13 | b) scatterplot3d()
14 | c) Boxplot
15 | a) Plots a pairwise scatterplot matrix
16 | a) Use the abline() function
17 | a) cor()
18 | c) Boxplot
19 | b) corrplot()
20 | a) grid()
21 | c) DataExplorer
22 | a) Creates an introductory plot of data distribution
23 | a) skim()
24 | a) To generate an HTML report of the exploratory analysis of a dataset
25 | a) It provides the number and percentage of missing values for each variable