Data Scientist, Research Software Engineer, and teacher. Cassia is passionate about transformative processes in data, technology and life. She is graduated in Philosophy and Information Systems, with a Strictu Sensu Master's Degree in the field of Foundations Of Mathematics.
Definitive Guide to the Random Forest Algorithm with Python and Scikit-Learn
The Random Forest algorithm is one of the most flexible, powerful and widely-used algorithms for classification and regression, built as an ensemble of Decision Trees. If you aren't familiar with these - no worries, we'll cover all of these concepts. In this in-depth hands-on guide, we'll build an intuition on...
Get Feature Importances for Random Forest with Python and Scikit-Learn
The Random Forest algorithm is a tree-based supervised learning algorithm that uses an ensemble of predicitions of many decision trees, either to classify a data point or determine it's approximate value. This means it can either be used for classification or regression. When applied for classification, the class of the...
Definitive Guide to Logistic Regression in Python
Sometimes confused with linear regression by novices - due to sharing the term regression - logistic regression is far different from linear regression. While linear regression predicts values such as 2, 2.45, 6.77 or continuous values, making it a regression algorithm, logistic regression predicts values such as 0...
Guide to the K-Nearest Neighbors Algorithm in Python and Scikit-Learn
The K-nearest Neighbors (KNN) algorithm is a type of supervised machine learning algorithm used for classification, regression as well as outlier detection. It is extremely easy to implement in its most basic form but can perform fairly complex tasks. It is a lazy learning algorithm since it doesn't have a...
K-Means Clustering with the Elbow method
K-means clustering is an unsupervised learning algorithm that groups data based on each point euclidean distance to a central point called centroid. The centroids are defined by the means of all points that are in the same cluster. The algorithm first chooses random points as centroids and then iterates adjusting...
Definitive Guide to K-Means Clustering with Scikit-Learn
K-Means clustering is one of the most widely used unsupervised machine learning algorithms that form clusters of data based on the similarity between data instances. In this guide, we will first take a look at a simple example to understand how the K-Means algorithm works before implementing it using Scikit-Learn....
K-Means Elbow Method and Silhouette Analysis with Yellowbrick and Scikit-Learn
K-Means is one of the most popular clustering algorithms. By having central points to a cluster, it groups other points based on their distance to that central point. A downside of K-Means is having to choose the number of clusters, K, prior to running the algorithm that groups points. If...
Definitive Guide to Hierarchical Clustering with Python and Scikit-Learn
In this guide, we will focus on implementing the Hierarchical Clustering Algorithm with Scikit-Learn to solve a marketing problem. After reading the guide, you will understand: When to apply Hierarchical Clustering How to visualize the dataset to understand if it is fit for clustering How to pre-process features and engineer...
Linear Regression in Python with Scikit-Learn
If you had studied longer, would your overall scores get any better? One way of answering this question is by having data on how long you studied for and what scores you got. We can then try to see if there is a pattern in that data, and if in...