MCQs on Machine Learning in R

Dive into Machine Learning in R with these 30 multiple-choice questions covering key concepts such as ML libraries (caret, tidymodels), supervised learning (Decision Trees, Random Forests), and unsupervised learning (Clustering, PCA). Test your skills!


Introduction to ML Libraries (caret, tidymodels)

  1. Which R package is widely used for training and tuning machine learning models?
    a) ggplot2
    b) caret
    c) dplyr
    d) lubridate
  2. What is the primary function of the caret package in R?
    a) Data visualization
    b) Model training, tuning, and evaluation
    c) Data manipulation
    d) File handling
  3. The tidymodels package in R is primarily used for:
    a) Data cleaning
    b) Machine learning workflows
    c) Data visualization
    d) Statistical testing
  4. Which of the following is caret's main function for fitting and tuning a model?
    a) train()
    b) fit()
    c) predict()
    d) evaluate()
  5. What does the trainControl() function in caret allow you to specify?
    a) The number of rows for cross-validation
    b) The algorithm used for prediction
    c) The control parameters for model training
    d) The metrics to evaluate a model
  6. Which function from the tidymodels package is used to split data into training and testing sets?
    a) initial_split()
    b) split_data()
    c) train_split()
    d) partition_data()
  7. What is the main advantage of using tidymodels over caret?
    a) It provides faster model training
    b) It simplifies workflows by using consistent data objects
    c) It supports more algorithms
    d) It focuses on time series data
  8. Which function from caret is used to generate predictions on new data?
    a) predict()
    b) forecast()
    c) model_predict()
    d) predict_model()
  9. In the caret package, what does the method argument specify?
    a) The type of model to train
    b) The evaluation metrics to use
    c) The number of cross-validation folds
    d) The split ratio for training and testing
  10. The tidymodels framework is designed to:
    a) Handle only regression models
    b) Simplify model creation and evaluation workflows
    c) Work exclusively with decision trees
    d) Visualize machine learning models
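
For readers who want to try the caret functions named in this section after answering, here is a minimal sketch. It assumes the caret and rpart packages are installed. The iris dataset, the 80/20 split, the 5-fold cross-validation, and all variable names are illustrative choices, not part of the quiz.

```r
# Minimal caret sketch; runs only if caret and rpart are available.
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("rpart", quietly = TRUE)) {
  library(caret)
  set.seed(42)

  # Stratified 80/20 split of the built-in iris data (illustrative choice).
  idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
  train_set <- iris[idx, ]
  test_set <- iris[-idx, ]

  # trainControl() holds the resampling settings: here 5-fold cross-validation.
  ctrl <- trainControl(method = "cv", number = 5)

  # train() fits and tunes the model; `method` names the model type.
  fit <- train(Species ~ ., data = train_set,
               method = "rpart", trControl = ctrl)

  # predict() generates predictions on data the model has not seen.
  preds <- predict(fit, newdata = test_set)
  accuracy <- mean(preds == test_set$Species)
  print(accuracy)
} else {
  message("Install the caret and rpart packages to run this sketch.")
}
```

A tidymodels version would follow the same outline, starting from initial_split() to create the training and testing sets.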

Supervised Learning (Decision Trees, Random Forests)

  11. What does a Decision Tree model do?
    a) Identifies the best clustering solution
    b) Classifies data based on binary decisions at each node
    c) Reduces dimensionality of the data
    d) Performs linear regression
  12. Which of the following is a key advantage of using Random Forests over a single Decision Tree?
    a) Random Forests are faster
    b) Random Forests reduce overfitting by averaging over multiple trees
    c) Decision Trees work better with categorical data
    d) Random Forests do not require feature scaling
  13. In R's randomForest package, what does the ntree parameter specify?
    a) The number of training data points
    b) The maximum depth of each tree
    c) The number of trees to be built in the forest
    d) The number of features used in each tree
  14. Which of the following is a criterion used to split nodes in a classification Decision Tree?
    a) Variance reduction
    b) Entropy or Gini impurity
    c) Root Mean Squared Error (RMSE)
    d) Log-likelihood
  15. In the Random Forest algorithm, what is bagging?
    a) A method to tune hyperparameters
    b) A method of combining multiple weak models to create a stronger model
    c) A way to eliminate irrelevant features
    d) A technique for dimensionality reduction
  16. Decision Trees can also be used for regression, but what is their most common use case?
    a) Regression problems
    b) Classification problems
    c) Clustering problems
    d) Time series forecasting
  17. How does a CART-style Decision Tree (for example, rpart in R) handle missing predictor values?
    a) By imputing values with the mean
    b) By excluding observations with missing values
    c) By using multiple imputation methods
    d) By using surrogate splits in trees
  18. What is one limitation of Decision Trees?
    a) They require a lot of training data
    b) They are prone to overfitting, especially with complex datasets
    c) They cannot handle categorical data
    d) They are computationally expensive
  19. Which of the following is a common hyperparameter in Decision Trees?
    a) Learning rate
    b) Max depth
    c) Number of clusters
    d) Number of features
  20. What technique can be used to reduce overfitting in a Decision Tree model?
    a) Increase the number of data points
    b) Limit the tree depth
    c) Use a lower learning rate
    d) Perform feature selection
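
To see the split criterion and depth limit from this section in practice after answering, here is a minimal sketch of a single classification tree. It assumes the rpart package is installed (it ships with most R distributions). The iris dataset, the depth cap of 3, and the variable names are illustrative assumptions. A Random Forest is not shown, since it would need an extra package such as randomForest.

```r
# Assumes the rpart package is installed.
library(rpart)

# Fit a classification tree on iris, splitting on Gini impurity and
# capping the depth at 3 to limit overfitting.
tree <- rpart(
  Species ~ .,
  data = iris,
  method = "class",
  parms = list(split = "gini"),
  control = rpart.control(maxdepth = 3)
)

# rpart numbers its nodes so that node n sits at depth floor(log2(n)),
# with the root (node 1) at depth 0.
node_ids <- as.numeric(rownames(tree$frame))
tree_depth <- max(floor(log2(node_ids)))

# Predict class labels and measure accuracy on the training data.
# Training accuracy is optimistic; use held-out data for a fair estimate.
pred <- predict(tree, newdata = iris, type = "class")
train_accuracy <- mean(pred == iris$Species)

print(tree_depth)
print(train_accuracy)
```

Setting split = "information" in parms would switch the criterion from Gini impurity to entropy.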

Unsupervised Learning (Clustering, PCA)

  21. What is Clustering in the context of Unsupervised Learning?
    a) Grouping data points into predefined categories
    b) Grouping data points based on similarity or distance
    c) Predicting the future values of data
    d) Reducing the dimensionality of data
  22. Which of the following algorithms is commonly used for Clustering?
    a) K-means
    b) Logistic Regression
    c) Decision Tree
    d) Random Forest
  23. What is the primary objective of Principal Component Analysis (PCA)?
    a) To predict target variables
    b) To reduce the dimensionality of data while retaining most of the variance
    c) To identify clusters in the data
    d) To handle missing values
  24. In K-means clustering, what does the parameter k represent?
    a) The maximum number of iterations
    b) The number of clusters to create
    c) The number of features used
    d) The number of data points
  25. What does the Elbow Method help determine in Clustering?
    a) The optimal number of clusters
    b) The best performing model
    c) The most important features
    d) The model’s accuracy
  26. In PCA, what are the principal components?
    a) The original features
    b) The new, uncorrelated features that explain most of the variance in the data
    c) The clusters identified in the dataset
    d) The predicted target variables
  27. What is a disadvantage of using K-means clustering?
    a) It only works for regression problems
    b) It assumes the data is linearly separable
    c) It requires the number of clusters to be pre-specified
    d) It cannot be used on numeric data
  28. How does PCA handle correlated variables?
    a) It eliminates all correlated variables
    b) It creates new uncorrelated variables that are combinations of the original ones
    c) It replaces the correlated variables with their mean
    d) It selects only the most important correlated variable
  29. What does the Silhouette Score measure in Clustering?
    a) It measures the accuracy of the model
    b) It helps to determine the number of clusters
    c) It evaluates the quality of clusters based on cohesion and separation
    d) It identifies the features that matter most
  30. In K-means, what happens during the update step?
    a) The algorithm calculates the optimal number of clusters
    b) The centroid of each cluster is recalculated based on the mean of its points
    c) The clusters are split into smaller groups
    d) The data points are assigned to the nearest centroid
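
To experiment with the clustering and PCA ideas from this section after answering, here is a minimal sketch using only base R (the stats functions kmeans() and prcomp()). The iris measurements, the choice of 3 clusters, the range of k values, and the variable names are illustrative assumptions.

```r
set.seed(1)

# Standardize the four numeric iris columns so no feature dominates distances.
x <- scale(iris[, 1:4])

# K-means with k = 3 clusters; nstart = 10 tries several random starts.
km <- kmeans(x, centers = 3, nstart = 10)

# After fitting, each centroid is the mean of the points assigned to it.
manual_centers <- rowsum(x, km$cluster) / as.vector(table(km$cluster))
print(max(abs(manual_centers - km$centers)))

# Elbow Method input: total within-cluster sum of squares for k = 1..6.
# Plot wss against k and look for the bend in the curve.
wss <- sapply(1:6, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)
print(wss)

# PCA on the same features: the components are uncorrelated combinations
# of the original variables, ordered by the variance they explain.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Largest absolute correlation between distinct components (close to zero).
score_cor <- cor(pca$x)
print(max(abs(score_cor[upper.tri(score_cor)])))
```

The Silhouette Score is not shown here because it is usually computed with an additional package such as cluster.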

Answers Table

Q no.  Answer
1      b) caret
2      b) Model training, tuning, and evaluation
3      b) Machine learning workflows
4      a) train()
5      c) The control parameters for model training
6      a) initial_split()
7      b) It simplifies workflows by using consistent data objects
8      a) predict()
9      a) The type of model to train
10     b) Simplify model creation and evaluation workflows
11     b) Classifies data based on binary decisions at each node
12     b) Random Forests reduce overfitting by averaging over multiple trees
13     c) The number of trees to be built in the forest
14     b) Entropy or Gini impurity
15     b) A method of combining multiple weak models to create a stronger model
16     b) Classification problems
17     d) By using surrogate splits in trees
18     b) They are prone to overfitting, especially with complex datasets
19     b) Max depth
20     b) Limit the tree depth
21     b) Grouping data points based on similarity or distance
22     a) K-means
23     b) To reduce the dimensionality of data while retaining most of the variance
24     b) The number of clusters to create
25     a) The optimal number of clusters
26     b) The new, uncorrelated features that explain most of the variance in the data
27     c) It requires the number of clusters to be pre-specified
28     b) It creates new uncorrelated variables that are combinations of the original ones
29     c) It evaluates the quality of clusters based on cohesion and separation
30     b) The centroid of each cluster is recalculated based on the mean of its points

Use a blank sheet to note your answers, then tally them against the answer table above and give yourself a score.
