Load Scikit-Learn Dataset as Pandas DataFrame

Scikit-Learn offers several datasets to experiment with, most of them toy datasets meant for learning and testing things out.

Some beginners find a tabular Pandas DataFrame more intuitive than NumPy arrays. Thankfully, you can load a dataset as a Bunch object containing a DataFrame by setting as_frame to True:

import pandas as pd
import numpy as np
from sklearn.datasets import fetch_california_housing

data = fetch_california_housing(as_frame=True)

This Bunch object contains the data and target fields, our "X" and "y", but they're separate! The data field is a DataFrame:

data.data

While our target is a Series:

data.target

0        4.526
1        3.585
2        3.521
3        3.413
4        3.422
         ...  
20635    0.781
20636    0.771
20637    0.923
20638    0.847
20639    0.894
Name: MedHouseVal, Length: 20640, dtype: float64
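
To check these types without downloading anything, here is a minimal sketch that swaps in load_iris, one of the datasets bundled with scikit-learn, for the California housing data. The dataset choice and the variable name iris are assumptions for illustration only, and as_frame requires scikit-learn 0.23 or later:

```python
import pandas as pd
from sklearn.datasets import load_iris

# load_iris ships with scikit-learn, so nothing is downloaded here
iris = load_iris(as_frame=True)

# Features come back as a DataFrame, the target as a Series
print(type(iris.data).__name__)    # DataFrame
print(type(iris.target).__name__)  # Series

# Same number of rows and the same index, so the two line up row by row
print(iris.data.shape)    # (150, 4)
print(iris.target.shape)  # (150,)
print(iris.data.index.equals(iris.target.index))  # True
```

Because both objects share an index, combining them later does not reorder or drop rows.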

The easiest way to combine them is to assign the Series to the DataFrame as a new column:

df = data.data.assign(MedHouseVal=data.target)
df

This results in a single DataFrame with 20640 rows, holding the eight feature columns plus MedHouseVal.
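
As a hedged illustration of the same pattern on a bundled dataset, the sketch below again uses load_iris (an assumption, chosen so that nothing has to be downloaded) and names the new column target. It also looks at the frame attribute, which scikit-learn populates on the Bunch when as_frame=True:

```python
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)

# assign() returns a new DataFrame and leaves iris.data unchanged
combined = iris.data.assign(target=iris.target)

print(combined.shape)   # (150, 5)
print(iris.data.shape)  # (150, 4)

# With as_frame=True, the Bunch also carries a combined frame of its own
print(list(iris.frame.columns) == list(combined.columns))  # True
```

If the ready-made frame attribute already has the layout you need, you can use it directly instead of combining the pieces yourself.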

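Or, you can create a new frame from the data and feature_names, then add the target by assigning it to a new column. This form also works when the dataset was loaded without as_frame=True, in which case data.data is a NumPy array: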

df = pd.DataFrame(data=data.data, columns=data.feature_names)
df['MedHouseVal'] = data.target
df
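
A minimal sketch of this second approach starting from plain NumPy arrays, assuming the bundled load_iris dataset as a stand-in and a column name of target (both chosen for illustration):

```python
import pandas as pd
from sklearn.datasets import load_iris

# Without as_frame, data and target are plain NumPy arrays
iris = load_iris()
print(type(iris.data).__name__)  # ndarray

# Build the frame from the array and the feature names, then add the target
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df["target"] = iris.target

print(df.shape)        # (150, 5)
print(df.columns[-1])  # target
```

Unlike assign(), the column assignment here modifies df in place, which is fine because df was just created for this purpose.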
Author: David Landup
© 2013-2023 Stack Abuse. All rights reserved.