How to Save and Load Scikit-Learn Models

Saving and loading Scikit-Learn models is part of the lifecycle of most models - typically, you'll train them in one runtime and serve them in another.

In this Byte - you'll learn how to save and load a Scikit-Learn regressor using pickle and joblib. First off, let's build a simple regressor and fit it:

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn import ensemble

X, y = datasets.fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = ensemble.RandomForestRegressor().fit(X_train_scaled, y_train)

With the model fit - let's go ahead and save it.

Note: The data is scaled for the model to learn from. You'll want to save the state of this scaler as well, and load it to preprocess the data when the model is used for inference. To learn more about saving scalers - read "How to Save and Load Fit Scikit-Learn Scalers"!
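
One way to keep the scaler and the model together is to wrap both in a Scikit-Learn Pipeline and save that single object. The following is a minimal sketch rather than the approach of the linked guide: it assumes synthetic data from make_regression as a stand-in for the housing dataset (so it runs without a download), and the file name rfr_pipeline.sav and the smaller n_estimators value are illustrative choices. It uses the pickle module covered in the next section:

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption), so the sketch runs without a download
X, y = make_regression(n_samples=200, n_features=8, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The pipeline fits the scaler and the regressor as one object
pipeline = make_pipeline(
    MinMaxScaler(),
    RandomForestRegressor(n_estimators=20, random_state=0),
)
pipeline.fit(X_train, y_train)

# Hypothetical file name, written to a temporary directory
pipeline_path = os.path.join(tempfile.mkdtemp(), 'rfr_pipeline.sav')
with open(pipeline_path, 'wb') as f:
    pickle.dump(pipeline, f)

with open(pipeline_path, 'rb') as f:
    loaded_pipeline = pickle.load(f)

# The loaded pipeline scales raw inputs itself before predicting
pipeline_score = pipeline.score(X_test, y_test)
loaded_pipeline_score = loaded_pipeline.score(X_test, y_test)
print('Pipeline score:', pipeline_score)
print('Loaded pipeline score:', loaded_pipeline_score)
```

With this layout, the serving runtime only has to load one file and can pass unscaled data straight to loaded_pipeline.predict().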

Pickle

pickle is the serialization module in Python's standard library, and it can store and retrieve Scikit-Learn models without any extra dependencies:

import pickle

print('Model score:', model.score(X_test_scaled, y_test))

# Use context managers so the file handles are closed once the work is done
with open('rfr_model.sav', 'wb') as f:
    pickle.dump(model, f)

with open('rfr_model.sav', 'rb') as f:
    loaded_model = pickle.load(f)

print('Loaded model score:', loaded_model.score(X_test_scaled, y_test))

This results in:

Model score: 0.7974923279678516
Loaded model score: 0.7974923279678516

Joblib

Generally speaking - joblib is more efficient than pickle for objects that carry large NumPy arrays. For saving and loading a model like this one, though, it makes little practical difference:

import joblib

print('Model score:', model.score(X_test_scaled, y_test))
joblib.dump(model, 'rfr_model.sav')
 
loaded_model = joblib.load('rfr_model.sav')
print('Loaded model score:', loaded_model.score(X_test_scaled, y_test))

This results in:

Model score: 0.7974923279678516
Loaded model score: 0.7974923279678516
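
One place where joblib does differ from plain pickle is the compress argument of joblib.dump(), which can shrink the saved file at the cost of slower saving and loading. The sketch below is illustrative only: it assumes synthetic data from make_regression instead of the housing dataset, a smaller forest, and hypothetical file names in a temporary directory. The sizes you see will depend on your model:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption), so the sketch runs without a download
X, y = make_regression(n_samples=200, n_features=8, noise=0.1, random_state=0)
model = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

# Hypothetical file names, written to a temporary directory
out_dir = tempfile.mkdtemp()
plain_path = os.path.join(out_dir, 'rfr_model.sav')
compressed_path = os.path.join(out_dir, 'rfr_model_compressed.sav')

# Save once without compression and once with a moderate compression level
joblib.dump(model, plain_path)
joblib.dump(model, compressed_path, compress=3)

print('Uncompressed size (bytes):', os.path.getsize(plain_path))
print('Compressed size (bytes):', os.path.getsize(compressed_path))

# joblib.load() detects the compression on its own
loaded_model = joblib.load(compressed_path)
predictions_match = np.array_equal(model.predict(X), loaded_model.predict(X))
print('Predictions match:', predictions_match)
```

Compression is lossless, so the loaded model should behave the same as the original. Whether the smaller file is worth the extra time is a judgment call for your deployment.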


Last Updated: July 1st, 2022


Author: David Landup

