Dive into Machine Learning in R with these 30 multiple-choice questions covering key concepts such as ML libraries (caret, tidymodels), supervised learning (Decision Trees, Random Forests), and unsupervised learning (Clustering, PCA). Test your skills!
MCQs on Machine Learning in R
Introduction to ML Libraries (caret, tidymodels)
1. Which R package is widely used for training and tuning machine learning models? a) ggplot2 b) caret c) dplyr d) lubridate
2. What is the primary function of the caret package in R? a) Data visualization b) Model training, tuning, and evaluation c) Data manipulation d) File handling
3. The tidymodels package in R is primarily used for: a) Data cleaning b) Machine learning workflows c) Data visualization d) Statistical testing
4. Which of the following is the core model-training function provided by the caret package? a) train() b) fit() c) predict() d) evaluate()
5. What does the trainControl() function in caret allow you to specify? a) The number of rows for cross-validation b) The algorithm used for prediction c) The control parameters for model training d) The metrics to evaluate a model
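A minimal sketch of how these caret functions fit together, assuming the caret package (and the rpart package, which ships with R) is installed; iris is a built-in dataset:

```r
# Assumes the caret package is installed.
library(caret)

set.seed(42)
# trainControl() specifies the control parameters for model training,
# here 5-fold cross-validation.
ctrl <- trainControl(method = "cv", number = 5)

# train() is caret's core model-training function.
fit <- train(Species ~ ., data = iris,
             method = "rpart",      # the type of model to train
             trControl = ctrl)

# predict() generates predictions on new data from the fitted model.
preds <- predict(fit, newdata = iris)
head(preds)
```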
6. Which function from the tidymodels framework is used to split data into training and testing sets? a) initial_split() b) split_data() c) train_split() d) partition_data()
7. What is the main advantage of using tidymodels over caret? a) It provides faster model training b) It simplifies workflows by using consistent data objects c) It supports more algorithms d) It focuses on time series data
8. Which function from caret is used to generate predictions on new data? a) predict() b) forecast() c) model_predict() d) predict_model()
9. In caret's train() function, what does the method argument specify? a) The type of model to train b) The evaluation metrics to use c) The number of cross-validation folds d) The split ratio for training and testing
10. The tidymodels framework is designed to: a) Handle only regression models b) Simplify model creation and evaluation workflows c) Work exclusively with decision trees d) Visualize machine learning models
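A minimal tidymodels workflow, assuming the tidymodels meta-package is installed; mtcars is a built-in dataset and the split/fit/predict steps use the consistent objects the questions above refer to:

```r
# Assumes the tidymodels meta-package is installed.
library(tidymodels)

set.seed(42)
# initial_split() divides the data into training and testing sets.
split    <- initial_split(mtcars, prop = 0.8)
train_df <- training(split)
test_df  <- testing(split)

# A model specification: the same interface works across many engines.
spec  <- linear_reg() %>% set_engine("lm")
model <- fit(spec, mpg ~ wt + hp, data = train_df)

# predict() on new data returns a tibble with a .pred column.
predict(model, new_data = test_df)
```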
Supervised Learning (Decision Trees, Random Forests)
11. What does a Decision Tree model do? a) Identifies the best clustering solution b) Classifies data based on binary decisions at each node c) Reduces dimensionality of the data d) Performs linear regression
12. Which of the following is a key advantage of using Random Forests over a single Decision Tree? a) Random Forests are faster b) Random Forests reduce overfitting by averaging over multiple trees c) Decision Trees work better with categorical data d) Random Forests do not require feature scaling
13. In a Random Forest, what is the function of the ntree parameter? a) The number of training data points b) The maximum depth of each tree c) The number of trees to be built in the forest d) The number of features used in each tree
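A short sketch of ntree in practice, assuming the randomForest package is installed:

```r
# Assumes the randomForest package is installed.
library(randomForest)

set.seed(42)
# ntree sets how many trees are grown; their votes are aggregated,
# which is what reduces overfitting relative to a single tree.
rf <- randomForest(Species ~ ., data = iris, ntree = 200)
rf$ntree
```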
14. Which of the following is a criterion used to split nodes in a Decision Tree? a) Variance reduction b) Entropy or Gini impurity c) Root Mean Squared Error (RMSE) d) Log-likelihood
15. In the Random Forest algorithm, what is bagging? a) A method to tune hyperparameters b) A method of combining multiple weak models to create a stronger model c) A way to eliminate irrelevant features d) A technique for dimensionality reduction
16. What is the primary use case of Decision Trees? a) Regression problems b) Classification problems c) Clustering problems d) Time series forecasting
17. How can CART-style decision trees handle missing data? a) By imputing values with the mean b) By excluding observations with missing values c) By using multiple imputation methods d) By using surrogate splits in trees
18. What is one limitation of Decision Trees? a) They require a lot of training data b) They are prone to overfitting, especially with complex datasets c) They cannot handle categorical data d) They are computationally expensive
19. Which of the following is a common hyperparameter in Decision Trees? a) Learning rate b) Max depth c) Number of clusters d) Number of features
20. What technique can be used to reduce overfitting in a Decision Tree model? a) Increase the number of data points b) Limit the tree depth c) Use a lower learning rate d) Perform feature selection
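Limiting tree depth can be sketched with rpart (a recommended package that ships with R); the maxdepth control caps how deep the tree may grow, one common guard against overfitting:

```r
# rpart ships with standard R distributions.
library(rpart)

# A shallow tree: maxdepth = 2 caps depth, trading some training-set
# fit for better generalization.
shallow <- rpart(Species ~ ., data = iris,
                 control = rpart.control(maxdepth = 2, cp = 0))
shallow
```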
Unsupervised Learning (Clustering, PCA)
21. What is Clustering in the context of Unsupervised Learning? a) Grouping data points into predefined categories b) Grouping data points based on similarity or distance c) Predicting the future values of data d) Reducing the dimensionality of data
22. Which of the following algorithms is commonly used for Clustering? a) K-means b) Logistic Regression c) Decision Tree d) Random Forest
23. What is the primary objective of Principal Component Analysis (PCA)? a) To predict target variables b) To reduce the dimensionality of data while retaining most of the variance c) To identify clusters in the data d) To handle missing values
24. In K-means clustering, what does the parameter k represent? a) The maximum number of iterations b) The number of clusters to create c) The number of features used d) The number of data points
25. What does the Elbow Method help determine in Clustering? a) The optimal number of clusters b) The best performing model c) The most important features d) The model’s accuracy
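Both k and the Elbow Method can be sketched in base R with the built-in kmeans() function; the "elbow" is the point where the within-cluster sum of squares stops dropping sharply as k grows:

```r
# Base-R sketch: k-means on the scaled iris measurements.
set.seed(42)
x  <- scale(iris[, 1:4])
km <- kmeans(x, centers = 3, nstart = 25)  # k = number of clusters
table(km$cluster)

# Elbow Method: total within-cluster sum of squares for k = 1..6;
# plot wss against k and look for the bend ("elbow").
wss <- sapply(1:6, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)
wss
```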
26. In PCA, what are the principal components? a) The original features b) The new, uncorrelated features that explain most of the variance in the data c) The clusters identified in the dataset d) The predicted target variables
27. What is a disadvantage of using K-means clustering? a) It only works for regression problems b) It assumes the data is linearly separable c) It requires the number of clusters to be pre-specified d) It cannot handle categorical data
28. How does PCA handle correlated variables? a) It eliminates all correlated variables b) It creates new uncorrelated variables that are combinations of the original ones c) It replaces the correlated variables with their mean d) It selects only the most important correlated variable
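A base-R sketch of both points using the built-in prcomp() function: the principal components are new, uncorrelated linear combinations of the original variables, ordered by the variance they explain:

```r
# Base-R sketch: PCA on the iris measurements, scaled to unit variance.
pca <- prcomp(iris[, 1:4], scale. = TRUE)

# Proportion of variance explained per component.
summary(pca)

# The component scores are uncorrelated: off-diagonal correlations ~ 0.
round(cor(pca$x), 3)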
29. What is the role of the Silhouette Score in Clustering? a) It measures the accuracy of the model b) It helps to determine the number of clusters c) It evaluates the quality of clusters based on cohesion and separation d) It identifies the features that matter most
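A silhouette sketch, assuming the cluster package (a recommended package included with standard R distributions); widths near 1 indicate cohesive, well-separated clusters, while values near 0 or below suggest overlapping ones:

```r
# Assumes the cluster package (ships with standard R distributions).
library(cluster)

set.seed(42)
x  <- scale(iris[, 1:4])
km <- kmeans(x, centers = 3, nstart = 25)

# Silhouette width per point, from cohesion (within-cluster distance)
# and separation (distance to the nearest other cluster).
sil <- silhouette(km$cluster, dist(x))
mean(sil[, "sil_width"])  # average silhouette score
```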
30. In K-means, what happens during the update step? a) The algorithm calculates the optimal number of clusters b) The centroid of each cluster is recalculated based on the mean of its points c) The clusters are split into smaller groups d) The data points are assigned to the nearest centroid
Answers Table
1. b) caret
2. b) Model training, tuning, and evaluation
3. b) Machine learning workflows
4. a) train()
5. c) The control parameters for model training
6. a) initial_split()
7. b) It simplifies workflows by using consistent data objects
8. a) predict()
9. a) The type of model to train
10. b) Simplify model creation and evaluation workflows
11. b) Classifies data based on binary decisions at each node
12. b) Random Forests reduce overfitting by averaging over multiple trees
13. c) The number of trees to be built in the forest
14. b) Entropy or Gini impurity
15. b) A method of combining multiple weak models to create a stronger model
16. b) Classification problems
17. d) By using surrogate splits in trees
18. b) They are prone to overfitting, especially with complex datasets
19. b) Max depth
20. b) Limit the tree depth
21. b) Grouping data points based on similarity or distance
22. a) K-means
23. b) To reduce the dimensionality of data while retaining most of the variance
24. b) The number of clusters to create
25. a) The optimal number of clusters
26. b) The new, uncorrelated features that explain most of the variance in the data
27. c) It requires the number of clusters to be pre-specified
28. b) It creates new uncorrelated variables that are combinations of the original ones
29. c) It evaluates the quality of clusters based on cohesion and separation
30. b) The centroid of each cluster is recalculated based on the mean of its points