Load Scikit-Learn Dataset as Pandas DataFrame
Scikit-Learn offers several datasets to play around with - most of them being toy datasets to learn from and test things out.
Some beginners find a tabular Pandas DataFrame more intuitive to work with than NumPy arrays. Thankfully, you can import a dataset as a Bunch object containing a DataFrame by setting as_frame to True:
import pandas as pd
import numpy as np
from sklearn.datasets import fetch_california_housing
data = fetch_california_housing(as_frame=True)
This Bunch object contains data and target - our "X" and "y" - but they're stored separately. The data field is a DataFrame:
data.data
The target, meanwhile, is a Series:
data.target
0 4.526
1 3.585
2 3.521
3 3.413
4 3.422
...
20635 0.781
20636 0.771
20637 0.923
20638 0.847
20639 0.894
Name: MedHouseVal, Length: 20640, dtype: float64
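To confirm what each field holds, you can check the types and shapes directly. The sketch below uses load_iris as a stand-in, only because it ships with Scikit-Learn and needs no download; the variable names are illustrative, and the same checks should apply to the California housing Bunch.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Stand-in dataset: bundled with Scikit-Learn, so nothing is downloaded
iris = load_iris(as_frame=True)

X, y = iris.data, iris.target

print(type(X).__name__)  # DataFrame
print(type(y).__name__)  # Series
print(X.shape, y.shape)  # (150, 4) (150,)

# Features and target share the same row index, so they line up row by row
print(X.index.equals(y.index))
```

Because the two objects share an index, combining them column-wise keeps every target value next to the row it belongs to.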
The easiest way to combine them is to assign the Series to the DataFrame as a new column:
df = data.data.assign(MedHouseVal=data.target)
df
This results in a single DataFrame of 20640 rows, holding the eight feature columns followed by MedHouseVal.
Alternatively, you can build a new DataFrame from data and feature_names, then add the target by assigning it to a new column:
df = pd.DataFrame(data=data.data, columns=data.feature_names)
df['MedHouseVal'] = data.target
df
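When as_frame is True, the returned Bunch also carries a frame attribute that already holds the features and the target in one DataFrame, so the manual step can often be skipped. A minimal sketch follows, again using the bundled load_iris as a stand-in to avoid a download. The names df_frame, df_assign and df_concat are illustrative, and pd.concat is shown only as one more option.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Stand-in dataset: bundled with Scikit-Learn, so nothing is downloaded
iris = load_iris(as_frame=True)

# Ready-made DataFrame with the feature columns and the target column
df_frame = iris.frame

# Manual combination, mirroring the assign() approach
df_assign = iris.data.assign(target=iris.target)

# Column-wise concatenation as another option
df_concat = pd.concat([iris.data, iris.target], axis=1)

print(df_frame.shape)  # (150, 5)
print(list(df_frame.columns) == list(df_assign.columns))
print(list(df_frame.columns) == list(df_concat.columns))
```

For the California housing data, the equivalent would be data.frame, with MedHouseVal as the last column.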