Exploratory Data Analysis

David Landup

Loading the Data

We'll start by downloading the dataset and loading it. We'll be working with the Breast Histopathology Images dataset, which contains 198,738 IDC(-) image patches and 78,786 IDC(+) image patches.

  • IDC(-) refers to benign cases
  • IDC(+) refers to malignant cases

Note: IDC(-) in this dataset means that the sample shows no Invasive Ductal Carcinoma, that is, benign or normal tissue rather than a malignant case. Besides IDC, a related condition exists: non-invasive ductal carcinoma, also known as ductal carcinoma in situ (DCIS).
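The two patch counts quoted above already tell us the classes are imbalanced. As a quick sanity check, here is a minimal sketch that works out the total and the class shares from those two numbers alone (the variable names are ours, not part of the dataset):

```python
# Patch counts as quoted in the text above.
n_negative = 198738  # IDC(-) patches
n_positive = 78786   # IDC(+) patches

total = n_negative + n_positive
positive_share = n_positive / total

print(f"Total patches: {total}")
print(f"IDC(+) share: {positive_share:.1%}")
print(f"IDC(-) to IDC(+) ratio: {n_negative / n_positive:.2f}:1")
```

Roughly 28% of the patches are IDC(+), so a model that always predicts IDC(-) would already be right about 72% of the time. That is worth keeping in mind when reading accuracy figures later on.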

The dataset comes from a 2016 study, "Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases" by Andrew Janowczyk and Anant Madabhushi. Their study covered several tasks, one of which was IDC classification, for which they reported an F-score of 0.7648 on 50k testing patches.
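For reference, here is a minimal sketch of how an F-score is computed, assuming the reported figure is the standard F1 (the harmonic mean of precision and recall). The function name and the example counts are made up for illustration and are not taken from the study:

```python
def f_score(tp, fp, fn):
    """F1 score from true positives, false positives and false negatives."""
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, purely to exercise the function.
example = f_score(tp=80, fp=20, fn=30)
print(f"F1: {example:.4f}")
```

Unlike plain accuracy, the F-score ignores true negatives, which makes it a more informative metric when the negative class dominates.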

The dataset we're working with is derived from 279 patients, each of whom has a unique ID. Each patient has a dedicated folder, named after their ID, with two subfolders: 0 and 1. The folder named 0 contains images of benign tissue samples (those without IDC markers). The folder named 1 contains images of malignant tissue samples (those containing IDC markers).
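The folder layout described above can be explored with a few lines of Python. This is a minimal sketch, assuming the dataset has been extracted so that the patient folders sit directly under a single root directory. The `count_patches` helper is our own and is not part of the dataset or any library:

```python
from pathlib import Path


def count_patches(root):
    """Return {patient_id: {"0": n_benign, "1": n_malignant}} for the given root."""
    counts = {}
    for patient_dir in sorted(Path(root).iterdir()):
        if not patient_dir.is_dir():
            continue  # skip stray files next to the patient folders
        per_class = {}
        for label in ("0", "1"):
            class_dir = patient_dir / label
            # If a class subfolder is missing, count it as zero patches.
            if class_dir.is_dir():
                per_class[label] = sum(1 for p in class_dir.iterdir() if p.is_file())
            else:
                per_class[label] = 0
        counts[patient_dir.name] = per_class
    return counts
```

Calling `count_patches` on the extracted root directory returns one entry per patient. Summing the "0" and "1" values across patients should then reproduce the overall class counts, which is a cheap way to confirm the download is complete.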


© 2013-2024 Stack Abuse. All rights reserved.