How to Save and Load a Scikit-Learn Model
Saving and loading Scikit-Learn models is part of the lifecycle of most models: typically, you'll train them in one runtime and serve them in another.
In this Byte, you'll learn how to save and load a Scikit-Learn regressor using pickle and joblib. First, let's build a simple regressor and fit it:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn import ensemble
X, y = datasets.fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
model = ensemble.RandomForestRegressor().fit(X_train_scaled, y_train)
With the model fit, let's go ahead and save it.
Note: The data is scaled before the model learns from it. You'll want to save the state of this scaler as well, and load it to preprocess the data in the same way when the model is used for inference. To learn more about saving scalers, read "How to Save and Load Fit Scikit-Learn Scalers"!
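One way to keep the scaler and the model together is to wrap both in a Scikit-Learn pipeline and persist that single object. The following is a minimal sketch rather than the approach used in the rest of this Byte: it assumes synthetic data from make_regression (so it runs offline), a smaller forest for speed, and a hypothetical file name, rfr_pipeline.joblib, written to a temporary directory.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (an assumption, so the sketch needs no download)
X, y = make_regression(n_samples=200, n_features=8, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The pipeline fits the scaler on the training data, then fits the regressor
# on the scaled output, so both pieces of state live in one object
pipeline = make_pipeline(
    MinMaxScaler(),
    RandomForestRegressor(n_estimators=20, random_state=0),
)
pipeline.fit(X_train, y_train)

# Persist scaler and model as a single artifact (hypothetical file name)
path = os.path.join(tempfile.mkdtemp(), 'rfr_pipeline.joblib')
joblib.dump(pipeline, path)

# The loaded pipeline scales raw inputs itself before predicting
loaded_pipeline = joblib.load(path)
same_predictions = np.array_equal(
    pipeline.predict(X_test), loaded_pipeline.predict(X_test)
)
print('Predictions match:', same_predictions)
```

With this layout, the serving code passes unscaled features to loaded_pipeline.predict() and never has to track a separate scaler file.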
Pickle
pickle is a widely used serialization module from Python's standard library that can easily store and retrieve Scikit-Learn models:
import pickle
print('Model score:', model.score(X_test_scaled, y_test))
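# Use context managers so the file handles are closed after each operation
with open('rfr_model.sav', 'wb') as f:
    pickle.dump(model, f)
with open('rfr_model.sav', 'rb') as f:
    loaded_model = pickle.load(f)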
print('Loaded model score:', loaded_model.score(X_test_scaled, y_test))
This results in:
Model score: 0.7974923279678516
Loaded model score: 0.7974923279678516
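A pickled model is only safe to load from a source you trust, and it is not guaranteed to load correctly under a different Scikit-Learn version than the one that produced it. As a hedged sketch of one way to guard against the second issue, the version can be stored next to the model and checked on load. The dictionary keys 'model' and 'sklearn_version' are hypothetical names chosen for illustration, and the data is synthetic.

```python
import pickle

import sklearn
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (an assumption, so the sketch needs no download)
X, y = make_regression(n_samples=100, n_features=4, random_state=0)
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# Bundle the model with the library version it was trained under
# (the key names are hypothetical, not a Scikit-Learn convention)
payload = {'model': model, 'sklearn_version': sklearn.__version__}
blob = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)

# Later, possibly in another runtime: restore and compare versions
restored = pickle.loads(blob)
version_matches = restored['sklearn_version'] == sklearn.__version__
if not version_matches:
    print('Warning: model was saved with scikit-learn',
          restored['sklearn_version'])

loaded_model = restored['model']
predictions = loaded_model.predict(X[:5])
```

Here pickle.dumps() and pickle.loads() work on bytes in memory; the same payload could be written to disk with pickle.dump() as in the example above.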
Joblib
Generally speaking, joblib is faster than pickle with large arrays. For saving and loading a model like this one, however, it makes no practical difference:
import joblib
print('Model score:', model.score(X_test_scaled, y_test))
joblib.dump(model, 'rfr_model.sav')
loaded_model = joblib.load('rfr_model.sav')
print('Loaded model score:', loaded_model.score(X_test_scaled, y_test))
This results in:
Model score: 0.7974923279678516
Loaded model score: 0.7974923279678516