How to Save and Load Fit Scikit-Learn Scalers
Scikit-Learn's scalers are a standard preprocessing step for many regressors and classifiers, rescaling each feature to a comparable, workable range before the model learns from it.
If you'd like to read more about feature scaling, see our "Feature Scaling Data with Scikit-Learn for Machine Learning in Python"!
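When you push your model to production, incoming data has to be scaled in the same way it was scaled during training for the model to work as expected. A fresh scaler that wasn't fit on your training data learns different statistics (a different mean and standard deviation, or a different minimum and maximum), so it generally won't map inputs to the same scaled values the model was trained on!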
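To make that concrete, here's a minimal sketch of the problem. The data is synthetic and the names `X_train`, `X_prod`, `train_scaler` and `fresh_scaler` are placeholders chosen for illustration, not something from a real project. It fits one `StandardScaler` on the training data and another on a shifted production batch, then scales the same input with both:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical training data: a single feature centered near 50
X_train = rng.normal(loc=50.0, scale=10.0, size=(1000, 1))
# Hypothetical production batch that happens to sit in a different range
X_prod = rng.normal(loc=80.0, scale=5.0, size=(100, 1))

# The scaler fit during training: the one the model was trained with
train_scaler = StandardScaler().fit(X_train)
# A fresh scaler fit on the production batch instead
fresh_scaler = StandardScaler().fit(X_prod)

# Scale the exact same input with both scalers
sample = np.array([[80.0]])
scaled_by_train = train_scaler.transform(sample)
scaled_by_fresh = fresh_scaler.transform(sample)

print('Training scaler mean:', train_scaler.mean_)
print('Fresh scaler mean:', fresh_scaler.mean_)
print('Scaled by training scaler:', scaled_by_train)
print('Scaled by fresh scaler:', scaled_by_fresh)
```

Under these assumptions, the training scaler places the value 80 a few standard deviations above its mean, while the fresh scaler places it close to zero, so a model trained on the first scaling would see a very different input.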
Thankfully, it's easy to save an already fit scaler and load it in a different environment alongside the model, to scale the data in the same way as during training:
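import joblib
import sklearn.preprocessing

scaler = sklearn.preprocessing.StandardScaler()
scaler.fit(X_train)  # fit on the training data before saving
joblib.dump(scaler, 'scaler.save')
# Later, possibly in a different environment:
scaler = joblib.load('scaler.save')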
Putting it into practice:
import joblib
import sklearn.preprocessing

# X_train is your training feature matrix
scaler = sklearn.preprocessing.MinMaxScaler()
scaler.fit(X_train)
print('Scaler results:', scaler.transform(X_train)[:1])

# Save the fit scaler, then load it back
joblib.dump(scaler, 'scaler.save')
scaler = joblib.load('scaler.save')
print('Loaded scaler results:', scaler.transform(X_train)[:1])
This results in:
Scaler results: [[0.16060468 0.52941176 0.02742132 0.02532079 0.02561875 0.00184402
0.4293305 0.47310757]]
Loaded scaler results: [[0.16060468 0.52941176 0.02742132 0.02532079 0.02561875 0.00184402
0.4293305 0.47310757]]
The data was scaled in the exact same way across both scaler objects!
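If you'd rather check this programmatically than compare printed arrays by eye, something along these lines should work. It's a self-contained sketch that uses synthetic placeholder data (`X_train`, `X_new`) and a temporary directory, both of which are assumptions made for the example. It compares the parameters `MinMaxScaler` learned during fitting (`data_min_` and `data_max_`) and the output on data the scaler has not seen:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)
X_train = rng.uniform(0.0, 100.0, size=(200, 3))  # placeholder training data
X_new = rng.uniform(0.0, 100.0, size=(5, 3))      # placeholder unseen data

scaler = MinMaxScaler().fit(X_train)

# Save and reload inside a temporary directory so no file is left behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'scaler.save')
    joblib.dump(scaler, path)
    loaded_scaler = joblib.load(path)

# The parameters learned during fit() survive the round trip
params_match = (
    np.array_equal(scaler.data_min_, loaded_scaler.data_min_)
    and np.array_equal(scaler.data_max_, loaded_scaler.data_max_)
)
# So unseen data is scaled the same way by both objects
outputs_match = np.allclose(scaler.transform(X_new), loaded_scaler.transform(X_new))

print('Parameters match:', params_match)
print('Outputs match:', outputs_match)
```

Because the saved file is a serialized Python object, it's safest to load it with the same Scikit-Learn version that was used to save it.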