CDI Practical User Guides
I PREFACE
Welcome to the Machine Learning Domain
🚀 What You’ll Gain
II DATA PREP & EDA
1
How do you load and inspect a dataset for modeling?
1.1
Recommended Dataset: Titanic Survival (Classification)
1.2
Explanation
1.3
Python Code
1.4
R Code
2
How do you handle missing values in a machine learning dataset?
2.1
Explanation
2.2
Python Code
2.3
R Code
3
How do you encode categorical variables for machine learning?
3.1
Explanation
3.2
Python Code
3.3
R Code
4
How do you split a dataset into training and testing sets?
4.1
Explanation
4.2
Python Code
4.3
R Code
III SUPERVISED LEARNING MODEL TRAINING
🧠 Supervised Learning
5
How do you train and visualize a polynomial regression model using the Boston housing dataset?
5.1
Explanation
5.2
Python Code
5.3
R Code
6
How do you evaluate regression models using R², RMSE, and MAE?
6.1
Explanation
6.2
Python Code
6.3
R Code
7
How do you train a decision tree classifier?
7.1
Explanation
7.2
Python Code
8
How do you evaluate model performance using a confusion matrix and accuracy?
8.1
Explanation
8.2
Python Code
9
How do you evaluate a model using ROC curve and AUC?
9.1
Explanation
9.2
Python Code
9.3
R Code
10
How do you train a logistic regression model?
10.1
Explanation
10.2
Python Code
10.3
R Code
11
How do you train a random forest model and check variable importance?
11.1
Explanation
11.2
Python Code
11.3
R Code
12
How do you train a support vector machine (SVM) model?
12.1
Explanation
12.2
Python Code
12.3
R Code
13
How do you train a k-nearest neighbors (KNN) model?
13.1
Explanation
13.2
Python Code
13.3
R Code
14
How do you train a Naive Bayes model?
14.1
Explanation
14.2
Python Code
14.3
R Code
15
How do you train a gradient boosting model using XGBoost?
15.1
Explanation
15.2
Python Code
15.3
R Code
16
How do you visualize decision boundaries and understand model overfitting?
16.1
Explanation
16.2
Python Code
16.3
R Code
17
How do you compare L1 and L2 regularization in regression models?
17.1
Explanation
17.2
Python Code
17.3
R Code
18
How do you visualize L1 vs. L2 regularization paths side by side in R?
18.1
Explanation
18.2
R Code
IV UNSUPERVISED LEARNING MODEL TRAINING
🔍 Unsupervised Learning
19
How do you perform clustering with k-means?
19.1
Explanation
19.2
Recommended Dataset: Gene Expression (Unlabeled Clustering)
19.3
Python Code
19.4
R Code
20
How do you reduce dimensions with PCA or t-SNE for visualization?
20.1
Explanation
20.2
Python Code
20.3
R Code
21
How do you cluster data using hierarchical clustering or DBSCAN?
21.1
Explanation
21.2
Python Code
21.3
R Code
22
How do you visualize clusters with UMAP in Python or R?
22.1
Explanation
22.2
Python Code
22.3
R Code
23
How do you combine dimensionality reduction with clustering to improve results?
23.1
Explanation
23.2
Python Code
23.3
R Code
24
How do you evaluate clustering quality using silhouette score and ARI?
24.1
Explanation
24.2
Python Code
24.3
R Code
V MODEL COMPARISON
🔍 Model Comparison
25
How do you compare multiple models and choose the best one?
25.1
Explanation
25.2
Python Code
25.3
R Code
26
How do you create a heatmap to compare model performance across metrics?
26.1
Explanation
26.2
Python Code
26.3
R Code
VI FEATURE IMPORTANCE
27
How do you tune hyperparameters to improve model performance?
27.1
Explanation
27.2
Python Code
27.3
R Code
VII MODEL INTERPRETATION
28
How do you explain predictions using SHAP or LIME?
28.1
Explanation
28.2
Python Code
28.3
R Code
VIII MODEL DEPLOYMENT
29
How do you save and load machine learning models for reuse?
29.1
Explanation
29.2
Python Code
29.3
R Code
30
How do you build a basic Streamlit app to deploy your ML model?
30.1
Explanation
30.2
Python Code: streamlit_app.py
Explore More Guides
Machine Learning Q&A Guide
Last updated: July 16, 2025