Chapter 4

Machine Learning for Data Science

Data Science and Analytics · BCA · Updated Apr 23, 2026

Machine learning (ML) automates pattern discovery in data. Models learn relationships from historical examples and then apply them to new data, producing actionable predictions at scale.

Regression

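Regression predicts a continuous value. Linear regression models the target as a straight-line function of one predictor. Multiple regression uses several predictors, and polynomial regression adds powers of a predictor to capture curvature.

Metrics: mean squared error (MSE); root mean squared error (RMSE, in the target's own units); mean absolute error (MAE, less sensitive to outliers than MSE); and R-squared (the proportion of the target's variance explained by the model).

Regularisation (Ridge, Lasso) reduces overfitting by penalising large coefficients. Ridge adds an L2 penalty that shrinks coefficients towards zero. Lasso's L1 penalty can set some coefficients to exactly zero, which also performs feature selection.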
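
To make the regression metrics concrete, here is a minimal sketch in plain Python with no libraries assumed. It fits a one-predictor line by least squares, optionally adding a Ridge penalty `lam` on the slope, and then computes MSE, RMSE, MAE and R-squared. The names `fit_line` and `regression_metrics` and the sample data are hypothetical; in practice a library would normally do this work.

```python
import math


def fit_line(xs, ys, lam=0.0):
    """Fit y = a + b*x by least squares.

    lam > 0 adds a Ridge (L2) penalty lam * b**2 on the slope only;
    the intercept is left unpenalised.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / (sxx + lam)        # closed-form slope (ordinary least squares when lam == 0)
    a = mean_y - b * mean_x      # the fitted line passes through the means
    return a, b


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R-squared for paired true/predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(r * r for r in residuals)                 # sum of squared errors
    mean_t = sum(y_true) / n
    sst = sum((t - mean_t) ** 2 for t in y_true)        # total sum of squares
    mse = sse / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(r) for r in residuals) / n,
        "R2": 1 - sse / sst,                            # share of variance explained
    }


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(xs, ys)
print(regression_metrics(ys, [a + b * x for x in xs]))
print(fit_line(xs, ys, lam=10.0))   # same data, slope shrunk towards zero
```

With `lam=10.0` the slope is pulled towards zero, and that shrinkage is how Ridge limits overfitting. Lasso's L1 penalty is usually solved iteratively (for example by coordinate descent) rather than with a closed-form line like this.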

Classification

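Classification predicts a discrete label. Common algorithms include logistic regression, decision trees, random forests, support vector machines (SVM), and Naive Bayes (a fast, strong baseline for text).

Metrics: the confusion matrix counts true/false positives and negatives. Several metrics are derived from it:
- accuracy;
- precision (the share of predicted positives that are correct);
- recall (the share of actual positives that are found);
- the F1-score (the harmonic mean of precision and recall).

ROC-AUC measures how well the model's scores rank positives above negatives across all thresholds. Accuracy alone can mislead when classes are imbalanced.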
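
The classification metrics can be computed by hand for a binary problem. The sketch below assumes labels are `1` (positive) and `0` (negative). It returns `0.0` when a ratio's denominator is zero; that is one common convention, chosen here as an assumption. ROC-AUC is computed as the probability that a randomly chosen positive gets a higher score than a randomly chosen negative, with ties counting one half. The function names and data are illustrative only.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FP, FN, TN) for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        elif t == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 derived from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores, positive=1):
    """ROC-AUC via pairwise ranking (O(P*N) pairs; fine for small data)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts half
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]   # e.g. predicted probabilities of class 1
print(confusion_counts(y_true, y_pred))
print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```
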

Clustering

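Clustering groups unlabelled data points by similarity. The main algorithms are:
- K-means, which assigns each point to the nearest of k centroids. It is usually paired with the elbow method: plot the within-cluster sum of squares against k and pick the value where the curve bends.
- Hierarchical clustering, which builds a tree (dendrogram) of nested clusters.
- DBSCAN, which is density-based: it finds clusters of arbitrary shape and labels isolated points as noise.

Evaluation: the silhouette score, which ranges from -1 to 1; higher means tighter, better-separated clusters. Typical uses are customer segmentation and anomaly detection.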
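
The k-means loop, the elbow method and the silhouette score fit in a short pure-Python sketch. It assumes points are tuples of numbers and uses Lloyd's algorithm with several random restarts, keeping the run with the lowest inertia. Inertia is the within-cluster sum of squared distances, which is the quantity the elbow method plots. The names `kmeans`, `assign` and `silhouette_score` are hypothetical here, and the toy data is made up.

```python
import math
import random


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's k-means with random restarts; returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centroids = rng.sample(points, k)            # random initial centroids
        for _ in range(max_iter):
            labels = assign(points, centroids)
            new_centroids = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:                          # move centroid to the cluster mean
                    new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
                else:                                # empty cluster keeps its old centroid
                    new_centroids.append(centroids[j])
            if new_centroids == centroids:           # converged
                break
            centroids = new_centroids
        labels = assign(points, centroids)
        inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
        if best is None or inertia < best[2]:
            best = (labels, centroids, inertia)
    return best


def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b) over all points; needs at least two clusters."""
    clusters = set(labels)
    total = 0.0
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:                  # singleton cluster: silhouette taken as 0
            continue
        a = sum(same) / len(same)     # mean distance within own cluster
        b = min(                      # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        total += (b - a) / max(a, b)
    return total / len(points)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for k in range(1, 5):                 # elbow method: inertia for each k
    print(k, round(kmeans(points, k)[2], 2))
labels, centroids, _ = kmeans(points, 2)
print(labels, round(silhouette_score(points, labels), 3))
```

Restarting from several random initialisations is a simple guard against the poor local minima k-means can fall into.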

Model Selection

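A train/test split holds out part of the data for the final evaluation; 80/20 is a common default. In k-fold cross-validation, the training data is split into k folds. The model is trained on k-1 folds and validated on the remaining one, rotating through all k. The k scores are averaged for a more stable estimate than a single split.

Hyperparameter tuning scores candidate settings with cross-validation. Grid search tries every combination, while random search samples settings from the space.

Bias-variance trade-off: overly simple models have high bias and underfit. Overly complex models have high variance and overfit.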
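
The workflow above can be sketched end to end in plain Python, under these assumptions:
- The data is synthetic: y = 2x + 1 plus Gaussian noise.
- The model is one-predictor Ridge regression.
- The hyperparameter being tuned is the penalty `lam`.
- All function names are hypothetical.

The code holds out a 20% test set first, grid-searches `lam` with 5-fold cross-validation on the training part only, and then touches the test set once at the end.

```python
import random


def train_test_split(n, test_size=0.2, seed=0):
    """Shuffle row indices 0..n-1 and hold out `test_size` of them for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_size)
    return idx[n_test:], idx[:n_test]                    # (train, test)


def kfold_indices(n, k=5, seed=0):
    """Yield (train, validation) index lists; each row is validated exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def fit_ridge_line(xs, ys, lam):
    """One-predictor Ridge regression: y = a + b*x with penalty lam * b**2."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / (sxx + lam)
    return my - b * mx, b


def cv_mse(xs, ys, lam, k=5):
    """Average validation MSE over k folds for one hyperparameter value."""
    fold_errors = []
    for train, val in kfold_indices(len(xs), k):
        a, b = fit_ridge_line([xs[i] for i in train], [ys[i] for i in train], lam)
        fold_errors.append(sum((ys[i] - (a + b * xs[i])) ** 2 for i in val) / len(val))
    return sum(fold_errors) / len(fold_errors)


def grid_search(xs, ys, grid, k=5):
    """Try every candidate value; keep the one with the lowest CV error."""
    return min(grid, key=lambda lam: cv_mse(xs, ys, lam, k))


rng = random.Random(1)
xs = [i / 2 for i in range(40)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]          # synthetic data
train, test = train_test_split(len(xs))                  # 80/20 split
x_train, y_train = [xs[i] for i in train], [ys[i] for i in train]
best_lam = grid_search(x_train, y_train, grid=[0.0, 1.0, 10.0, 100.0, 1000.0])
a, b = fit_ridge_line(x_train, y_train, best_lam)        # refit on all training rows
test_mse = sum((ys[i] - (a + b * xs[i])) ** 2 for i in test) / len(test)
print(len(train), len(test), best_lam, round(test_mse, 3))
```

A one-feature, well-specified problem like this one gives regularisation little to fix. The penalty tends to matter more with many correlated features, where a larger `lam` can trade a little bias for a large cut in variance.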

Ensemble Methods

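Ensembles combine several models. The three main approaches are:
- Bagging trains models on bootstrap samples (random samples drawn with replacement) and averages their predictions or takes a vote, as in Random Forest.
- Boosting trains models sequentially, each one focusing on the errors of the previous ones, as in AdaBoost, XGBoost and LightGBM.
- Stacking feeds the base models' predictions into a meta-model.

Ensembles often outperform individual models, especially when their members make different errors.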
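
Bagging fits in a few lines. This is a minimal sketch, not Random Forest itself: the base learners are one-feature decision stumps, a deliberate simplification. Random Forest uses full decision trees plus random feature subsets. Each stump is trained on a bootstrap sample, and the ensemble predicts by majority vote. The function names and toy labels are hypothetical.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Decision stump on one feature: the threshold t with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label


def bagging_fit(xs, ys, n_estimators=25, seed=0):
    """Train each stump on a bootstrap sample (n rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Majority vote across the ensemble."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


xs = list(range(1, 11))
ys = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]       # noisy toy labels: class 1 more common as x grows
models = bagging_fit(xs, ys)
print([bagging_predict(models, x) for x in xs])
```

Because each stump sees a slightly different resample, individual thresholds vary. The vote averages out much of that variance, which is the main reason bagging helps unstable learners.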

Feature Importance

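Three common tools explain what a model has learned:
- Tree-based importance scores each feature by how much its splits reduce impurity.
- SHAP values attribute each individual prediction to the contributions of the features.
- Partial dependence plots show how the average prediction changes as one feature varies.

Together, these tools support model interpretation and feature selection.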
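
Partial dependence is model-agnostic and short enough to sketch directly. The sketch assumes any fitted model is exposed as a plain function `model(row)`. For each grid value, it forces one feature to that value in every row, keeps the other features as observed, and averages the predictions. The `model` here is a made-up formula, chosen so the shape of each curve is easy to check by hand. This is not the implementation used by any particular library.

```python
def partial_dependence(model, rows, feature, grid):
    """Average prediction with `feature` forced to each grid value.

    Other features keep their observed values, so each point on the curve is
    the model's mean output over the dataset at that feature value.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature] = value
            total += model(modified)
        curve.append(total / len(rows))
    return curve


def model(row):
    """Hypothetical fitted model: linear in feature 0, quadratic in feature 1."""
    return 3 * row[0] + row[1] ** 2


rows = [(1, 0), (2, 1), (3, 2)]
grid = [0, 1, 2]
print(partial_dependence(model, rows, feature=0, grid=grid))  # rises linearly
print(partial_dependence(model, rows, feature=1, grid=grid))  # curves upward
```

A flat curve suggests the feature barely affects the average prediction. Partial dependence can hide interactions, though, because it averages over the other features.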

Summary

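Machine learning for data science combines regression, classification, and clustering with sound evaluation and ensemble methods to produce reliable, actionable results. Sound evaluation here means held-out test data, cross-validation and metrics suited to the task.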
