Object Detection and Instance Segmentation in Python with Detectron2

Object detection is a large field in computer vision, and one of the more important applications of computer vision "in the wild". On one end, it can be used to build autonomous systems that navigate agents through environments, be it robots performing tasks or self-driving cars, though this requires combining it with other fields. On the other end, anomaly detection (such as spotting defective products on a production line), locating objects within images, facial detection and various other applications can be built with object detection alone.

Advice: This short guide is based on a small part of a much larger lesson on object detection belonging to our "Practical Deep Learning for Computer Vision with Python" course.

Object detection isn't as standardized as image classification, mainly because most of the new developments are typically done by individual researchers, maintainers and developers, rather than large libraries and frameworks. It's difficult to package the necessary utility scripts in a framework like TensorFlow or PyTorch and maintain the API guidelines that guided the development so far.

This makes object detection somewhat more complex, typically more verbose (but not always), and less approachable than image classification. One of the major benefits of being in an ecosystem is that you don't have to go searching for good practices, tools and approaches. With object detection, most practitioners have to do far more research on the landscape of the field to get a good grip.

In this short guide, we'll be performing Object Detection and Instance Segmentation, using a Mask R-CNN, in Python, with the Detectron2 Platform, written in PyTorch.

Meta AI's Detectron2 - Instance Segmentation and Object Detection

Detectron2 is an open source object detection, segmentation and pose estimation package from Meta AI (formerly FAIR, Facebook AI Research), all in one. Given an input image, it can return the labels, bounding boxes, confidence scores, masks and skeletons of objects. This is well-represented on the repository's page:

It's meant to be used as a library on top of which you can build research projects. It offers a model zoo with most implementations relying on Mask R-CNN and R-CNNs in general, alongside RetinaNet. The documentation is also pretty decent. Let's run an example inference script!

First, let's install the dependencies:

$ pip install pyyaml==5.1
$ pip install 'git+https://github.com/facebookresearch/detectron2.git'

Next, we'll import the Detectron2 utilities - this is where framework-domain knowledge comes into play. You can construct a detector using the DefaultPredictor class, by passing in a configuration object that sets it up. The Visualizer offers support for visualizing results. MetadataCatalog and DatasetCatalog belong to Detectron2's data API and offer information on built-in datasets as well as their metadata.

Let's import the classes and functions we'll be using:

import torch, detectron2
from detectron2.utils.logger import setup_logger
setup_logger()  # enable Detectron2's default logging output

from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog

Using requests, we'll download an image and save it to our local drive:

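import cv2
import matplotlib.pyplot as plt
import requests

# Download a sample image from the COCO validation set and save it locally
response = requests.get('http://images.cocodataset.org/val2017/000000439715.jpg')
with open("input.jpg", "wb") as f:
    f.write(response.content)

# OpenCV loads images in BGR order, so convert to RGB before plotting
im = cv2.imread("./input.jpg")
fig, ax = plt.subplots(figsize=(18, 8))
ax.imshow(cv2.cvtColor(im, cv2.COLOR_BGR2RGB))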

This results in:
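A short note on channel order: the cv2.cvtColor(im, cv2.COLOR_BGR2RGB) call above and the im[:, :, ::-1] slice used later in this guide do the same job. OpenCV loads images with channels in blue-green-red order, while Matplotlib expects red-green-blue. A minimal sketch with a tiny synthetic array (the pixel values are made up for illustration) shows the effect of reversing the last axis:

```python
import numpy as np

# A hypothetical 1x2 "image" in BGR order, as cv2.imread() would return it:
# the first pixel is pure blue, the second is pure red.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last (channel) axis turns BGR into RGB.
rgb = bgr[:, :, ::-1]

print(rgb[0, 0].tolist())  # [0, 0, 255] -> blue, now expressed in RGB order
print(rgb[0, 1].tolist())  # [255, 0, 0] -> red, now expressed in RGB order
```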

Now, we load the configuration and make changes if need be. The models run on a GPU by default, so if you don't have one, you'll want to set the device to 'cpu' in the config:

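cfg = get_cfg()
# Load the model's architecture and dataset settings from the model zoo
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
# Point to the matching pre-trained weights
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
# If you don't have a GPU and CUDA enabled, the next line is required
# cfg.MODEL.DEVICE = "cpu"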
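Rather than commenting the device line in and out by hand, the choice can be made at runtime. The sketch below assumes that PyTorch's torch.cuda.is_available() is a sufficient check for your setup; pick_device is a hypothetical helper name, not part of Detectron2:

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed, fall back to the CPU setting.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)

# With the cfg object from this guide, the result would be applied as:
# cfg.MODEL.DEVICE = device
```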

Here, we specify which model we'd like to run from the model_zoo. We've imported an instance segmentation model, based on the Mask R-CNN architecture, and with a ResNet50 backbone. Depending on what you'd like to achieve (keypoint detection, instance segmentation, panoptic segmentation or object detection), you'll load in the appropriate model.
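As a rough illustration of picking a model per task, the mapping below pairs each task with one model zoo config that uses a ResNet50-FPN backbone. The dictionary and the config_for_task helper are hypothetical conveniences rather than part of Detectron2, and the listed paths are only one option per task, so it's worth checking the model zoo for the full list:

```python
# One model zoo config per task, all with a ResNet50-FPN backbone.
# The mapping itself is an illustration, not a Detectron2 API.
TASK_CONFIGS = {
    "object_detection": "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml",
    "instance_segmentation": "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml",
    "keypoint_detection": "COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml",
    "panoptic_segmentation": "COCO-PanopticSegmentation/panoptic_fpn_R_50_3x.yaml",
}


def config_for_task(task: str) -> str:
    """Look up the config path for a task, failing loudly on unknown names."""
    try:
        return TASK_CONFIGS[task]
    except KeyError:
        known = ", ".join(sorted(TASK_CONFIGS))
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}")


path = config_for_task("instance_segmentation")
print(path)

# The same path would then be passed to both model zoo helpers:
# cfg.merge_from_file(model_zoo.get_config_file(path))
# cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(path)
```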

Finally, we can construct a predictor with this cfg and run it on the inputs! The Visualizer class is used to draw predictions on the image (in this case, segmented instances, classes and bounding boxes):

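# Build the predictor from the config and run inference on the BGR image
predictor = DefaultPredictor(cfg)
outputs = predictor(im)

# The Visualizer expects RGB, so reverse the channel order of the OpenCV image
v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
fig, ax = plt.subplots(figsize=(18, 8))
# get_image() already returns RGB, which is what Matplotlib expects
ax.imshow(out.get_image())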

This results in:
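Besides the rendered image, the raw predictions can be read directly. In Detectron2, outputs["instances"] carries fields such as pred_classes, scores, pred_boxes and pred_masks, and the class names are available through the metadata's thing_classes. The sketch below uses small made-up arrays in place of real model output so that it runs on its own; summarize_detections is a hypothetical helper:

```python
import numpy as np


def summarize_detections(pred_classes, scores, class_names, min_score=0.5):
    """Return (class name, score) pairs for detections at or above min_score."""
    return [
        (class_names[int(c)], round(float(s), 2))
        for c, s in zip(pred_classes, scores)
        if s >= min_score
    ]


# Made-up stand-ins for real predictions. With the predictor from this guide,
# the equivalent values would come from:
#   instances = outputs["instances"].to("cpu")
#   pred_classes = instances.pred_classes.numpy()
#   scores = instances.scores.numpy()
#   class_names = MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes
class_names = ["person", "bicycle", "horse"]  # hypothetical label list
pred_classes = np.array([0, 2, 2, 1])
scores = np.array([0.98, 0.91, 0.42, 0.77])

summary = summarize_detections(pred_classes, scores, class_names)
print(summary)  # [('person', 0.98), ('horse', 0.91), ('bicycle', 0.77)]
```

The min_score cut-off of 0.5 is an arbitrary choice for the example; a sensible value depends on how costly false positives are in your application.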

Instance segmentation goes one step beyond semantic segmentation, and distinguishes between individual instances of a class (person 1, person 2, etc.) rather than just noting whether a pixel belongs to that class. In a way, it's pixel-level classification.
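To make the distinction concrete, here is a small NumPy sketch with a made-up 4x6 scene containing two "person" regions. Instance segmentation keeps one boolean mask per object, whereas a semantic mask only records which pixels belong to the class:

```python
import numpy as np

height, width = 4, 6

# Instance segmentation: one boolean mask per detected object, shape (N, H, W).
instance_masks = np.zeros((2, height, width), dtype=bool)
instance_masks[0, 1:3, 0:2] = True  # person 1
instance_masks[1, 1:3, 4:6] = True  # person 2

# Semantic segmentation: a single mask per class, so the instances are merged.
semantic_mask = instance_masks.any(axis=0)

print(instance_masks.shape[0])   # 2 -> two separate people
print(int(semantic_mask.sum()))  # 8 -> "person" pixels, with no count of people
```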

In this short guide, we've taken a quick look at how Detectron2 makes instance segmentation and object detection easy and accessible through its API, using a Mask R-CNN.

Author: David Landup

Entrepreneur, Software and Machine Learning Engineer, with a deep fascination towards the application of Computation and Deep Learning in Life Sciences (Bioinformatics, Drug Discovery, Genomics), Neuroscience (Computational Neuroscience), robotics and BCIs.

Great passion for accessible education and promotion of reason, science, humanism, and progress.

© 2013-2023 Stack Abuse. All rights reserved.