data science

Articles: 64

Recently published

Article

Loading a Pretrained TensorFlow Model into TensorFlow Serving

You are part of a project that will use deep learning to identify what is in images, such as cars, ducks, mountains, sky, and trees. In this project, two things are important. The first is that the deep learning model trains quickly and efficiently (because...

Cássia Sampaio

Mar 03, 2023·28 min read

Byte

Plot Decision Trees Using Python and Scikit-Learn

Decision trees are widely used in machine learning problems. We'll assume you are already familiar with the concept of decision trees and you've just trained your tree-based algorithm! If not, you can read our in-depth guide on "Decision Trees in Python with Scikit-Learn". Now, it...

Cássia Sampaio

Nov 12, 2022·7 min read

Article

Definitive Guide to the Random Forest Algorithm with Python and Scikit-Learn

The Random Forest algorithm is one of the most flexible, powerful and widely-used algorithms for classification and regression, built as an ensemble of Decision Trees. If you aren't familiar with these - no worries, we'll cover all of these concepts. In this in-depth hands-on guide, we'll build an intuition on...

Cássia Sampaio

Oct 25, 2022·53 min read

Article

Get Feature Importances for Random Forest with Python and Scikit-Learn

The Random Forest algorithm is a tree-based supervised learning algorithm that combines the predictions of many decision trees, either to classify a data point or to estimate its value. This means it can be used for either classification or regression. When applied to classification, the class of the...

Cássia Sampaio

Oct 18, 2022·12 min read

Article

Definitive Guide to Logistic Regression in Python

Novices sometimes confuse logistic regression with linear regression because both share the term "regression", but the two are quite different. While linear regression predicts continuous values such as 2, 2.45, or 6.77, making it a regression algorithm, logistic regression predicts values such as 0...

Cássia Sampaio

Sep 03, 2022·53 min read

Article

How to Fill NaNs in a Pandas DataFrame

Missing values are common and occur either due to human error, instrument error, processing from another team, or otherwise just a lack of data for a certain observation. In this Byte, we'll take a look at how to fill NaNs in a DataFrame, if you choose to handle NaNs by...

David Landup

Jul 12, 2022·6 min read

Article

K-Means Clustering with the Elbow method

K-means clustering is an unsupervised learning algorithm that groups data based on each point's Euclidean distance to a central point called a centroid. The centroids are defined by the means of all points that are in the same cluster. The algorithm first chooses random points as centroids and then iterates adjusting...

Cássia Sampaio

Jul 08, 2022·6 min read

Article

Definitive Guide to K-Means Clustering with Scikit-Learn

K-Means clustering is one of the most widely used unsupervised machine learning algorithms that form clusters of data based on the similarity between data instances. In this guide, we will first take a look at a simple example to understand how the K-Means algorithm works before implementing it using Scikit-Learn....

Cássia Sampaio

Jul 07, 2022·38 min read

Byte

Load Scikit-Learn Dataset as Pandas DataFrame

Scikit-Learn offers several datasets to play around with - most of them being toy datasets to learn from and test things out. Some beginners find the comfort of a tabular Pandas DataFrame format more intuitive than NumPy arrays. Thankfully, you can import a dataset as a Bunch object containing a...

David Landup

Jul 06, 2022·3 min read