scikit-learn

Articles: 51

Recently published

Article

Guide to the K-Nearest Neighbors Algorithm in Python and Scikit-Learn

The K-nearest Neighbors (KNN) algorithm is a supervised machine learning algorithm used for classification, regression, and outlier detection. It is extremely easy to implement in its most basic form but can perform fairly complex tasks. It is a lazy learning algorithm since it doesn't have a...

Cássia Sampaio

Aug 21, 2022·54 min read

Article

K-Means Clustering with the Elbow method

K-means clustering is an unsupervised learning algorithm that groups data based on each point's Euclidean distance to a central point called a centroid. Each centroid is defined as the mean of all points in the same cluster. The algorithm first chooses random points as centroids and then iterates adjusting...

Cássia Sampaio

Jul 08, 2022·6 min read
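The loop this excerpt describes (random starting centroids, then repeated assignment and mean updates) can be sketched in a few lines. This is a minimal illustration in plain Python rather than Scikit-Learn's `KMeans`, so the steps stay visible; the function name `kmeans`, the sample points, and the fixed seed are assumptions made for the example, not taken from the article.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Toy K-Means on 2D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k randomly chosen data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        # Squared Euclidean distance gives the same ordering as the true distance.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean_x = sum(x for x, _ in members) / len(members)
                mean_y = sum(y for _, y in members) / len(members)
                new_centroids.append((mean_x, mean_y))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # nothing moved, so the assignments are stable
        centroids = new_centroids
    return centroids, labels


# Two visibly separated groups of hypothetical sample points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

On data this well separated, the first three points end up sharing one label and the last three the other; which group gets label 0 depends on the random start.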

Article

Definitive Guide to K-Means Clustering with Scikit-Learn

K-Means clustering is one of the most widely used unsupervised machine learning algorithms that form clusters of data based on the similarity between data instances. In this guide, we will first take a look at a simple example to understand how the K-Means algorithm works before implementing it using Scikit-Learn....

Cássia Sampaio

Jul 07, 2022·38 min read

Byte

Load Scikit-Learn Dataset as Pandas DataFrame

Scikit-Learn offers several datasets to play around with - most of them being toy datasets to learn from and test things out. Some beginners find the comfort of a tabular Pandas DataFrame format more intuitive than NumPy arrays. Thankfully, you can import a dataset as a Bunch object containing a...

David Landup

Jul 06, 2022·3 min read
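As a rough sketch of what this byte is about: Scikit-Learn's dataset loaders accept an `as_frame=True` argument (available from version 0.23), and the returned Bunch then exposes a Pandas DataFrame. The choice of the Iris dataset here is an assumption for illustration; the linked byte may use a different one.

```python
from sklearn.datasets import load_iris

# as_frame=True asks the loader to return Pandas objects inside the Bunch.
iris = load_iris(as_frame=True)

# .frame holds the feature columns plus a "target" column in one DataFrame.
df = iris.frame
print(df.shape)
print(df.head())
```

The features alone are also available as `iris.data`, and the labels as `iris.target`, if a combined frame is not needed.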

Article

K-Means Elbow Method and Silhouette Analysis with Yellowbrick and Scikit Learn

K-Means is one of the most popular clustering algorithms. It places a central point in each cluster and groups the other points based on their distance to that central point. A downside of K-Means is having to choose the number of clusters, K, before running the algorithm. If...

Cássia Sampaio

Jul 04, 2022·7 min read

Byte

How to Save and Load XGBoost Models

Models are more often than not trained to be deployed to production and to give meaningful predictions for new input. To move them outside of your training environment - you'll want to save a trained model and load it in a different one. XGBoost is a great, flexible and blazingly...

David Landup

Jul 03, 2022·7 min read

Byte

Agglomerative Hierarchical Clustering in Python with Scikit-Learn

Agglomerative Hierarchical Clustering is an unsupervised learning algorithm that links data points based on distance to form a cluster, and then links those already clustered points into another cluster, creating a structure of clusters with sub-clusters. It is easily implemented using Scikit-Learn, which already has single, average, complete and ward...

Cássia Sampaio

Jul 02, 2022·8 min read
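The four options named in this excerpt correspond to values of the `linkage` parameter of Scikit-Learn's `AgglomerativeClustering`. The sketch below runs each one on a small made-up dataset; the sample points and the choice of two clusters are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two visibly separated groups of hypothetical sample points.
X = np.array(
    [[1.0, 1.0], [1.5, 2.0], [1.0, 0.5], [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]]
)

results = {}
for linkage in ("single", "average", "complete", "ward"):
    # Each linkage defines differently how the distance between clusters is measured.
    model = AgglomerativeClustering(n_clusters=2, linkage=linkage)
    results[linkage] = model.fit_predict(X)
    print(linkage, results[linkage])
```

On data this well separated all four linkages agree; they tend to differ on elongated, noisy, or unevenly sized clusters.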

Article

Definitive Guide to Hierarchical Clustering with Python and Scikit-Learn

In this guide, we will focus on implementing the Hierarchical Clustering Algorithm with Scikit-Learn to solve a marketing problem. After reading the guide, you will understand: when to apply Hierarchical Clustering; how to visualize the dataset to understand if it is fit for clustering; how to pre-process features and engineer...

Cássia Sampaio

Jul 01, 2022·52 min read

Byte

End-to-End Random Forest Regression Pipeline with Scikit-Learn

Regression is a technique in statistics and machine learning in which the value of a dependent variable is predicted from its relationship with other variables. Frameworks like Scikit-Learn make it easier than ever to perform regression with a wide variety of models - one of the strongest ones being built...

David Landup

Jul 01, 2022·4 min read