End-to-End Random Forest Regression Pipeline with Scikit-Learn

Regression is a technique in statistics and machine learning, in which the value of an independent variable is predicted by its relationship with other variables.

Frameworks like Scikit-Learn make it easier than ever to perform regression with a wide variety of models - one of the strongest ones being built on the Random Forest algorithm.

If you'd like to read an in-depth guide to Random Forests, read our "Random Forest Algorithm with Python and Scikit-Learn"!

Using Scikit-Learn pipelines, you can build an end-to-end pipeline, load a dataset, perform feature scaling and and supply the data into a regression model in as little as 4 lines of code:

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline

X, y = datasets.fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)

pipeline = Pipeline([('scaler', MinMaxScaler()), ('regressor', RandomForestRegressor())])
pipeline.fit(X_train, y_train)

r2 = pipeline.score(X_test, y_test)
print(f"RFR: {r2}") # RFR: 0.811299774049597

Alternatively, you can separate the steps outside of the pipeline, which is a bit more verbose, yet more flexible:

X, y = datasets.fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

rfr = ensemble.RandomForestRegressor().fit(X_train_scaled, y_train)

r2 = rfr.score(X_test_scaled, y_test)
print(f"RFR: {r2}") # RFR: 0.8039270852048439
Last Updated: July 1st, 2022
Was this helpful?
David LandupAuthor

Entrepreneur, Software and Machine Learning Engineer, with a deep fascination towards the application of Computation and Deep Learning in Life Sciences (Bioinformatics, Drug Discovery, Genomics), Neuroscience (Computational Neuroscience), robotics and BCIs.

Great passion for accessible education and promotion of reason, science, humanism, and progress.


Bank Note Fraud Detection with SVMs in Python with Scikit-Learn

# python# machine learning# scikit-learn# data science

Can you tell the difference between a real and a fraud bank note? Probably! Can you do it for 1000 bank notes? Probably! But it...

David Landup
Cássia Sampaio

© 2013-2024 Stack Abuse. All rights reserved.