Cássia Sampaio

Articles: 34

Joined: Jun 08, 2022

Data Scientist, Research Software Engineer, and teacher. Cássia is passionate about transformative processes in data, technology and life. She holds degrees in Philosophy and Information Systems, and a Stricto Sensu Master's Degree in the field of Foundations of Mathematics.

Recently published

Article

Simple NLP in Python with TextBlob: Lemmatization

TextBlob is a package built on top of two other packages: the Natural Language Toolkit, known mainly by its abbreviation NLTK, and Pattern. NLTK is a traditional package used for text processing or Natural Language Processing (NLP), and Pattern is built...

Cássia Sampaio

Jun 01, 2023·19 min read

Article

Implementing Other SVM Flavors with Python's Scikit-Learn

This guide is the third and final part of three guides about Support Vector Machines (SVMs). In this guide, we will keep working with the forged bank notes use case, have a quick recap of the general idea behind SVMs, understand what the kernel trick is, and implement different types...

Cássia Sampaio

Apr 24, 2023·16 min read

Article

Understanding SVM Hyperparameters

This guide is the second part of three guides about Support Vector Machines (SVMs). In this guide, we will keep working on the forged bank notes use case, understand which SVM parameters are already being set by Scikit-Learn, what the C and Gamma hyperparameters are, and how to tune them using...

Cássia Sampaio

Apr 21, 2023·22 min read

Article

Implementing SVM and Kernel SVM with Python's Scikit-Learn

This guide is the first part of three guides about Support Vector Machines (SVMs). In this series, we will work on a forged bank notes use case, learn about the simple SVM, then about SVM hyperparameters and, finally, learn a concept called the kernel trick and explore other types of...

Cássia Sampaio

Apr 17, 2023·45 min read

Article

DBSCAN with Scikit-Learn in Python

You are working in a consulting company as a data scientist. The project you are currently assigned to has data from students who have recently finished courses about finances. The financial company that conducts the courses wants to understand whether there are common factors that influence students to purchase the...

Cássia Sampaio

Mar 17, 2023·27 min read

Article

Loading a Pretrained TensorFlow Model into TensorFlow Serving

You are part of a project that will use deep learning to try to identify what is in images - such as cars, ducks, mountains, sky, trees, etc. In this project, two things are important - the first one is that the deep learning model trains quickly and efficiently (because...

Cássia Sampaio

Mar 03, 2023·28 min read

Article

Converting JSON to a Dictionary in Python

In the world of software development, exchanging data between different systems is a common task. One popular format for data exchange is JSON (JavaScript Object Notation), which is a lightweight and easy-to-read format for data representation. In Python, JSON data can be easily converted to dictionary objects, and vice versa....

Cássia Sampaio

Feb 21, 2023·10 min read
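
The JSON-to-dictionary conversion described in the teaser above can be sketched with Python's standard `json` module. This is a minimal illustration, not the article's own code; the `payload` string and the names `record` and `round_trip` are hypothetical and made up for this example.

```python
import json

# Hypothetical JSON text, invented for illustration only.
payload = '{"name": "example", "tags": ["a", "b"], "count": 3}'

# json.loads parses JSON text; a JSON object becomes a Python dict.
record = json.loads(payload)

# json.dumps goes the other way: a dict is serialized back to JSON text.
round_trip = json.dumps(record)

print(type(record).__name__)
print(record["tags"])
```

Parsing the serialized text again with `json.loads(round_trip)` should give back a dictionary equal to `record`, which is the "vice versa" direction the teaser mentions.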

Article

Definitive Guide to the Random Forest Algorithm with Python and Scikit-Learn

The Random Forest algorithm is one of the most flexible, powerful and widely used algorithms for classification and regression, built as an ensemble of Decision Trees. If you aren't familiar with these - no worries, we'll cover all of these concepts. In this in-depth hands-on guide, we'll build an intuition on...

Cássia Sampaio

Oct 25, 2022·53 min read

Article

Get Feature Importances for Random Forest with Python and Scikit-Learn

The Random Forest algorithm is a tree-based supervised learning algorithm that uses an ensemble of predictions from many decision trees, either to classify a data point or to determine its approximate value. This means it can be used for either classification or regression. When applied for classification, the class of the...

Cássia Sampaio

Oct 18, 2022·12 min read