Scikit-Learn offers several datasets to play around with, most of them toy datasets to learn from and test things out.
Some beginners find a tabular Pandas DataFrame more intuitive than NumPy arrays. Thankfully, you can import a dataset as a Bunch object containing a DataFrame by setting as_frame=True:
import pandas as pd
import numpy as np
from sklearn.datasets import fetch_california_housing

data = fetch_california_housing(as_frame=True)
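To see what as_frame actually changes, here is a minimal sketch that loads the same dataset twice, once with the default and once with as_frame=True. It swaps in load_iris purely as an assumption for illustration, because that dataset ships with Scikit-Learn and needs no download; fetch_california_housing accepts the same as_frame flag.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris  # bundled dataset, used here instead of fetch_california_housing

# Default loading: the Bunch holds plain NumPy arrays
raw = load_iris()
print(isinstance(raw.data, np.ndarray), isinstance(raw.target, np.ndarray))

# as_frame=True: features become a DataFrame and the target becomes a Series
framed = load_iris(as_frame=True)
print(type(framed.data).__name__, type(framed.target).__name__)
print(framed.data.shape, framed.target.name)
```

The rest of the Bunch (feature_names, DESCR and so on) is the same either way; only the containers for the data and the target change.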
The Bunch object contains data and target, our "X" and "y", but they're separate! The data field is a DataFrame holding the features, while our target is a Series:
0        4.526
1        3.585
2        3.521
3        3.413
4        3.422
         ...
20635    0.781
20636    0.771
20637    0.923
20638    0.847
20639    0.894
Name: MedHouseVal, Length: 20640, dtype: float64
The easiest way to combine them is to add the Series to the DataFrame as a new column with assign():
df = data.data.assign(MedHouseVal=data.target)
df
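This results in a single DataFrame with 20,640 rows and nine columns: the eight features followed by the MedHouseVal target.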
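As a minimal sketch of how assign() behaves, again swapping in the bundled load_iris dataset as an assumption so nothing is downloaded: the keyword argument name becomes the new column's name, and the features DataFrame inside the Bunch is left untouched, because assign() returns a new DataFrame rather than modifying the original.

```python
from sklearn.datasets import load_iris  # stand-in for fetch_california_housing

bunch = load_iris(as_frame=True)

# The keyword "target" becomes the column name; a new DataFrame is returned
df = bunch.data.assign(target=bunch.target)

print(df.shape)                         # features plus one target column
print("target" in bunch.data.columns)   # the Bunch's own features frame is unchanged
```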
Or, you can create a new DataFrame from the data and feature_names, adding the target by assigning it to a new column:
df = pd.DataFrame(data=data.data, columns=data.feature_names)
df['MedHouseVal'] = data.target
df
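Two further options, sketched under the same assumption (load_iris standing in for fetch_california_housing): in recent Scikit-Learn versions, a Bunch loaded with as_frame=True also exposes a frame attribute that already holds the features and the target together, and pd.concat() along axis=1 builds the same table from the two separate pieces.

```python
import pandas as pd
from sklearn.datasets import load_iris  # stand-in for fetch_california_housing

bunch = load_iris(as_frame=True)

# Option 1: the combined table the Bunch already carries
combined = bunch.frame

# Option 2: place the features and the target Series side by side
concatenated = pd.concat([bunch.data, bunch.target], axis=1)

print(combined.columns.tolist())
print(concatenated.equals(combined))
```

pd.concat() uses the Series name as the new column's name, so the target column keeps whatever name the dataset gives it.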