Q&A 29 How do you save and load machine learning models for reuse?

29.1 Explanation

After training a machine learning model, you’ll often want to save it for later use, especially when deploying it in an application or sharing it with others. This avoids retraining every time and enables fast and reproducible predictions.

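  • In Python, common tools include joblib and pickle; joblib is usually the more efficient choice for scikit-learn models that hold large NumPy arrays
  • In R, use saveRDS() to save a single object and readRDS() to load it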

This Q&A shows how to persist trained models and restore them when needed.
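The Python listing in this Q&A uses joblib, so here is a minimal sketch of the same save/load round trip with pickle from the standard library. The dictionary of coefficients, the predict() helper, and the file name threshold_model.pkl are hypothetical stand-ins chosen so the example runs without any third-party package; with scikit-learn you would pass the fitted estimator itself to pickle.dump().

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a trained model: coefficients of a linear scorer
model = {"weights": [0.5, -0.25], "bias": 1.0}


def predict(params, rows):
    """Return 1 when the linear score is positive, else 0."""
    preds = []
    for row in rows:
        score = params["bias"] + sum(w * x for w, x in zip(params["weights"], row))
        preds.append(1 if score > 0 else 0)
    return preds


# Use a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "threshold_model.pkl"  # hypothetical file name

    # Save: pickle requires a file opened in binary write mode
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    # Load: reopen the same file in binary read mode
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

rows = [[2.0, 4.0], [0.0, 8.0], [-4.0, 0.0]]
original_preds = predict(model, rows)
restored_preds = predict(restored, rows)
print(restored_preds)  # [1, 0, 0]
```

The restored object is a separate copy with the same contents, so it yields the same predictions. Because unpickling can execute arbitrary code, only load pickle or joblib files from sources you trust.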


29.2 Python Code

## Python Code
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import joblib
import os

# Load dataset
df = pd.read_csv("data/titanic.csv").dropna(subset=["Age", "Fare", "Embarked", "Sex", "Survived"])
X = df[["Pclass", "Age", "Fare"]].copy()
X["Sex"] = pd.factorize(df["Sex"])[0]
X["Embarked"] = pd.factorize(df["Embarked"])[0]
y = df["Survived"]

# Train model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_train, y_train)

# Save model to 'models/' directory
os.makedirs("models", exist_ok=True)
joblib.dump(rf_model, "models/rf_titanic.joblib")

# Load model
loaded_model = joblib.load("models/rf_titanic.joblib")

# Predict with loaded model
preds = loaded_model.predict(X_test)
print(preds[:15])

[1 1 1 1 0 0 1 1 1 0 0 0 0 0 1]
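A quick way to confirm that a saved model was restored correctly is to compare its predictions with those of the in-memory original. The sketch below is one way to do that, under stated assumptions: it uses a synthetic dataset from make_classification instead of the Titanic file so that it runs on its own, and the file name rf_check.joblib is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (an assumption; stands in for the Titanic features)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
model = RandomForestClassifier(n_estimators=20, random_state=42).fit(X, y)

# Save and reload inside a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "rf_check.joblib")  # hypothetical file name
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce both labels and class probabilities
same_labels = np.array_equal(model.predict(X), restored.predict(X))
same_probs = np.allclose(model.predict_proba(X), restored.predict_proba(X))
print(same_labels, same_probs)  # True True
```

A check like this is cheap to run right after saving, before the file is handed to another script or application.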

29.3 R Code

library(readr)
library(dplyr)
library(tidyr)
library(randomForest)

# Load and prepare Titanic dataset
df <- read_csv("data/titanic.csv") |>
  drop_na(Age, Fare, Embarked, Sex, Survived) |>
  mutate(
    Sex = as.factor(Sex),
    Embarked = as.factor(Embarked),
    Survived = as.factor(Survived)
  )

# Train random forest
rf_model <- randomForest(Survived ~ Pclass + Age + Fare + Sex + Embarked, data = df)

# Create models/ directory if it doesn't exist
if (!dir.exists("models")) dir.create("models")

# Save model
saveRDS(rf_model, "models/rf_titanic.rds")

# Load model
loaded_model <- readRDS("models/rf_titanic.rds")

# Predict with loaded model
predict(loaded_model, df[1:5, ])

1 2 3 4 5 
0 1 0 1 0 
Levels: 0 1

✅ Clarified Benefit: Creating a models/ folder ensures your saved model is organized, and future scripts or apps can easily locate it.
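To illustrate how a later script might find what has been saved, here is a small sketch that lists model files in a models/ folder. The helper name list_saved_models and the set of file extensions are assumptions made for this example, and the demo creates empty placeholder files in a temporary directory instead of touching a real project folder.

```python
import tempfile
from pathlib import Path

# Assumed naming convention for saved model files
MODEL_SUFFIXES = {".joblib", ".pkl", ".rds"}


def list_saved_models(models_dir):
    """Return the sorted file names of saved models in models_dir."""
    folder = Path(models_dir)
    if not folder.is_dir():
        return []  # nothing saved yet
    return sorted(p.name for p in folder.iterdir() if p.suffix in MODEL_SUFFIXES)


# Demo with empty placeholder files in a temporary models/ folder
with tempfile.TemporaryDirectory() as tmp_dir:
    models_dir = Path(tmp_dir) / "models"
    models_dir.mkdir(parents=True, exist_ok=True)
    (models_dir / "rf_titanic.joblib").touch()
    (models_dir / "rf_titanic.rds").touch()
    (models_dir / "notes.txt").touch()  # ignored: not a model file

    found = list_saved_models(models_dir)
    missing = list_saved_models(models_dir / "does_not_exist")

print(found)  # ['rf_titanic.joblib', 'rf_titanic.rds']
```

Keeping every saved model in one folder means a helper like this only has to look in a single place.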