Saving and Loading Scikit-learn Models

Saving a trained model to disk lets you deploy it and reuse it later without retraining. Scikit-learn models can be persisted with joblib or with Python's built-in pickle module, both of which serialize and deserialize Python objects.

Key Characteristics

Enables reuse of trained models
Reduces computational overhead
Supports reproducibility, provided the model is loaded with the same library versions used to save it
Compatible with most Scikit-learn objects

Basic Rules

Use joblib for Scikit-learn models (it handles large NumPy arrays more efficiently)
Use pickle for general Python object serialization
Save preprocessing steps along with the model
Validate reloaded models before use
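
The last rule can be checked with a few lines of code. The following is a minimal sketch, not a definitive test suite: it assumes a LogisticRegression model trained on the iris data and a temporary file named model.joblib, both chosen only for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")  # illustrative file name
    joblib.dump(model, path)
    reloaded = joblib.load(path)

# A reloaded model should reproduce the original predictions exactly.
predictions_match = np.array_equal(model.predict(X), reloaded.predict(X))
print("Predictions match:", predictions_match)
```

Comparing predictions on a fixed set of inputs is a cheap way to catch a corrupted file or a mismatched environment before the model reaches production.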

Syntax Table

No.	Technique	Syntax Example	Description
1	Save with joblib	`joblib.dump(model, 'model.pkl')`	Saves the model to a file
2	Load with joblib	`model = joblib.load('model.pkl')`	Loads the model from a file
3	Save with pickle	`with open('file.pkl', 'wb') as f: pickle.dump(model, f)`	Saves the model using pickle and closes the file afterwards
4	Load with pickle	`with open('file.pkl', 'rb') as f: model = pickle.load(f)`	Loads the model using pickle and closes the file afterwards
5	Save pipeline	`joblib.dump(pipe, 'pipeline.pkl')`	Saves the preprocessing steps and the model as one pipeline

Syntax Explanation

1. Saving a Model with joblib

What is it?
Serializes a trained model and saves it to disk using joblib, which is optimized for objects containing large NumPy arrays.

Syntax:

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import joblib

# Load the data and hold out a test set (fixed seed for a repeatable split)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train the model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Serialize the fitted model to disk
joblib.dump(model, 'rf_model.pkl')

Explanation:

Trains a model and saves it using joblib
Creates a file rf_model.pkl containing the model
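
joblib.dump also accepts a compress argument, which can be useful when the saved file is large. The sketch below is one way to compare the two modes. The file names, the compression level of 3, and the forest size are assumptions made for illustration, and the actual size difference depends on the model.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, "rf_plain.joblib")        # illustrative name
    packed_path = os.path.join(tmp_dir, "rf_compressed.joblib")  # illustrative name

    joblib.dump(model, plain_path)               # no compression (the default)
    joblib.dump(model, packed_path, compress=3)  # compression level 3

    plain_size = os.path.getsize(plain_path)
    packed_size = os.path.getsize(packed_path)

    # Compressed files load with the same call as uncompressed ones.
    reloaded = joblib.load(packed_path)

print("Uncompressed bytes:", plain_size)
print("Compressed bytes:", packed_size)
same_predictions = np.array_equal(model.predict(X), reloaded.predict(X))
```

Compression trades a smaller file for somewhat slower saving and loading, so it is worth measuring both on your own model.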

2. Loading a Model with joblib

What is it?
Deserializes a model file created with joblib and loads it back into memory.

Syntax:

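import joblib

# Restore the model saved in the previous step
model = joblib.load('rf_model.pkl')

# Predict on the held-out data from the previous step; no retraining needed
y_pred = model.predict(X_test)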

Explanation:

Reloads the saved model
Predicts with no need to retrain
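
In practice it can help to wrap loading in a small guard that fails early with a clear message. The helper below, load_model, is a hypothetical function written for this sketch and is not part of joblib or Scikit-learn. It assumes the file was produced by joblib.dump and contains a fitted estimator.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.validation import check_is_fitted


def load_model(path):
    """Hypothetical helper: load a joblib file and run basic sanity checks."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No saved model at {path!r}")
    model = joblib.load(path)
    if not hasattr(model, "predict"):
        raise TypeError(f"Object in {path!r} has no predict method")
    check_is_fitted(model)  # raises NotFittedError for an unfitted estimator
    return model


X, y = load_iris(return_X_y=True)
trained = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "rf_model.joblib")
    joblib.dump(trained, path)
    model = load_model(path)

y_pred = model.predict(X)
print("Number of predictions:", len(y_pred))
```

These checks only confirm that the object looks usable. They do not replace comparing its predictions against known results.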

3. Saving a Model with pickle

What is it?
Serializes a trained model using Python’s built-in pickle module for general-purpose object saving.

Syntax:

import pickle
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

Explanation:

Uses Python’s built-in pickle module
Works for general Python objects including models
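
pickle can also serialize to an in-memory bytes object instead of a file, which is handy when the model is stored in a database or sent over a network. This is a small sketch under assumed names: the DecisionTreeClassifier and the variable names are illustrative only.

```python
import pickle

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Serialize to bytes rather than to a file on disk.
payload = pickle.dumps(model, protocol=pickle.HIGHEST_PROTOCOL)
print("Serialized size in bytes:", len(payload))

# Restore the model from the bytes object.
restored = pickle.loads(payload)
same_predictions = np.array_equal(model.predict(X), restored.predict(X))
print("Predictions match:", same_predictions)
```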

4. Loading a Model with pickle

What is it?
Deserializes a file saved using pickle and restores the model object.

Syntax:

with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

Explanation:

Reads binary file and loads the original model object
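
A pickled model is only guaranteed to load correctly with the library version that created it, so one option is to store the version next to the model and compare it on load. The dictionary layout below, with the keys model and sklearn_version, is an assumption made for this sketch and not a Scikit-learn convention. Note also that unpickling can execute arbitrary code, so only load files from sources you trust.

```python
import os
import pickle
import tempfile

import sklearn
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier().fit(X, y)

# Hypothetical bundle layout: the model plus the version that produced it.
bundle = {"model": model, "sklearn_version": sklearn.__version__}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "bundle.pkl")
    with open(path, "wb") as f:
        pickle.dump(bundle, f)
    with open(path, "rb") as f:
        loaded = pickle.load(f)

# Compare the stored version with the version running now.
version_matches = loaded["sklearn_version"] == sklearn.__version__
if not version_matches:
    print("Warning: model was saved with scikit-learn", loaded["sklearn_version"])

restored_model = loaded["model"]
print("Sample predictions:", restored_model.predict(X[:5]))
```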

5. Saving a Pipeline

What is it?
Saves an entire Scikit-learn Pipeline including both preprocessing steps and the final estimator.

Syntax:

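from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import joblib

# Chain scaling and classification into a single estimator
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('lr', LogisticRegression())
])

# X_train and y_train come from the split in section 1
pipe.fit(X_train, y_train)

# Save the fitted scaler and classifier together
joblib.dump(pipe, 'pipeline.pkl')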

Explanation:

Saves both preprocessing and model steps
Useful for production deployments
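
To see why saving the whole pipeline matters, the sketch below reloads a pipeline and confirms that the fitted scaler statistics come back with it, so raw and unscaled inputs can be passed straight to predict. The iris data and the file name are assumptions chosen for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("lr", LogisticRegression()),
]).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pipeline.joblib")  # illustrative file name
    joblib.dump(pipe, path)
    loaded_pipe = joblib.load(path)

# The scaler's learned means are restored along with the classifier.
scaler_preserved = np.allclose(
    pipe.named_steps["scaler"].mean_,
    loaded_pipe.named_steps["scaler"].mean_,
)
print("Scaler statistics preserved:", scaler_preserved)

# Raw samples go in; the pipeline scales them internally before predicting.
raw_samples = X[:3]
print("Predictions on raw samples:", loaded_pipe.predict(raw_samples))
```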

Real-Life Project: Save and Reload KNN Pipeline

Project Name

Reusable KNN Pipeline

Project Overview

Train a KNN model with preprocessing and persist it for reuse.

Project Goal

Save, reload, and reuse a Scikit-learn pipeline with minimal reconfiguration.

Code for This Project

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
import joblib

# Prepare data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create pipeline
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('knn', KNeighborsClassifier())
])
pipe.fit(X_train, y_train)

# Save pipeline
joblib.dump(pipe, 'knn_pipeline.pkl')

# Load pipeline
loaded_pipe = joblib.load('knn_pipeline.pkl')
print("Loaded Pipeline Accuracy:", loaded_pipe.score(X_test, y_test))

Expected Output

Prints the test-set accuracy of the reloaded pipeline
The value is identical to the accuracy of the original pipeline on the same test set
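
The claim that the reloaded pipeline matches the original can be verified directly. The following sketch repeats the project setup in a self-contained form and compares the two scores. The temporary file location is an assumption made so the example cleans up after itself.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])
pipe.fit(X_train, y_train)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "knn_pipeline.joblib")
    joblib.dump(pipe, path)
    loaded_pipe = joblib.load(path)

# Score both pipelines on the same held-out data.
original_score = pipe.score(X_test, y_test)
reloaded_score = loaded_pipe.score(X_test, y_test)
print("Original accuracy:", original_score)
print("Reloaded accuracy:", reloaded_score)
print("Scores identical:", original_score == reloaded_score)
```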

Common Mistakes to Avoid

❌ Saving only the model without preprocessing steps
❌ Forgetting to test the reloaded model
❌ Using pickle for models that hold large NumPy arrays (prefer joblib)

📘 Hands-On Python and Scikit-Learn: A Practical Guide to Machine Learning by Sarful Hassan


