Introduction
Object detection is a large field in computer vision, and one of the more important applications of computer vision "in the wild". On one end, it can be used to build autonomous systems that navigate agents through environments - be it robots performing tasks or self-driving cars - but this requires intersection with other fields. On the other end, applications such as anomaly detection (e.g. defective products on a production line), locating objects within images, and facial detection can be built with object detection alone.
Advice: This short guide is based on a small part of a much larger lesson on object detection from our "Practical Deep Learning for Computer Vision with Python" course.
Object detection isn't as standardized as image classification, mainly because most of the new developments are done by individual researchers, maintainers and developers, rather than by large libraries and frameworks. It's difficult to package the necessary utility scripts into a framework like TensorFlow or PyTorch while still maintaining the API guidelines that guided the development so far.
This makes object detection somewhat more complex, typically more verbose (but not always), and less approachable than image classification. One of the major benefits of being in an ecosystem is that it spares you from having to search for useful information on good practices, tools and approaches. With object detection, most people have to do much more research on the landscape of the field to get a good grip on it.
In this short guide, we'll be performing object detection and instance segmentation using Mask R-CNN, in Python, with Detectron2, a platform written in PyTorch.
Meta AI's Detectron2 - Instance Segmentation and Object Detection
Detectron2 is the open source object detection, segmentation and pose estimation package from Meta AI (formerly FAIR - Facebook AI Research) - all in one. Given an input image, it can return the labels, bounding boxes, confidence scores, masks and skeletons of objects. This is well-represented on the repository's page.
It's meant to be used as a library on top of which you can build research projects. It offers a model zoo, with most implementations relying on Mask R-CNN and R-CNNs in general, alongside RetinaNet. It also has pretty decent documentation. Let's run an exemplary inference script!
First, let's install the dependencies:
$ pip install pyyaml==5.1
$ pip install 'git+https://github.com/facebookresearch/detectron2.git'
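To sanity-check the installation, you can print the library versions - a minimal check, and the exact versions you see will depend on your environment:
import torch, detectron2
# Confirm both libraries import correctly and report their versions
print("torch:", torch.__version__)
print("detectron2:", detectron2.__version__)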
Next, we'll import the Detectron2 utilities - this is where framework-domain knowledge comes into play. You can construct a detector using the DefaultPredictor class, by passing in a configuration object that sets it up. The Visualizer offers support for visualizing results. MetadataCatalog and DatasetCatalog belong to Detectron2's data API and offer information on built-in datasets as well as their metadata.
Let's import the classes and functions we'll be using:
import torch, detectron2
from detectron2.utils.logger import setup_logger
setup_logger()
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog
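As a quick illustration of the data API - MetadataCatalog lets you look up the metadata of any registered dataset, such as the class names of COCO, which the pre-trained models predict. Here's a minimal sketch, assuming the built-in "coco_2017_val" registration:
# Fetch metadata for a built-in COCO dataset registration
metadata = MetadataCatalog.get("coco_2017_val")
# thing_classes holds the 80 COCO category names the models predict
print(len(metadata.thing_classes))  # 80
print(metadata.thing_classes[:5])   # ['person', 'bicycle', 'car', ...]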
Using requests, we'll download an image and save it to our local drive:
import cv2
import matplotlib.pyplot as plt
import requests
response = requests.get('http://images.cocodataset.org/val2017/000000439715.jpg')
open("input.jpg", "wb").write(response.content)
im = cv2.imread("./input.jpg")
fig, ax = plt.subplots(figsize=(18, 8))
ax.imshow(cv2.cvtColor(im, cv2.COLOR_BGR2RGB))
This reads and displays the downloaded input image.
Now, we load the configuration and enact changes if need be (the models run on GPU by default, so if you don't have a GPU, you'll want to set the device to "cpu" in the config):
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
# If you don't have a GPU and CUDA enabled, the next line is required
# cfg.MODEL.DEVICE = "cpu"
Here, we specify which model we'd like to run from the model_zoo. We've imported an instance segmentation model, based on the Mask R-CNN architecture, with a ResNet50 backbone. Depending on what you'd like to achieve (keypoint detection, instance segmentation, panoptic segmentation or object detection), you'll load in the appropriate model, as sketched below.
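The same config/checkpoint pattern works for those other tasks - you just swap in the matching config file. A minimal sketch, with model zoo names that you should double-check against the repository for your Detectron2 version:
# Pick one config name from the model zoo and reuse it for both calls
keypoint_config = "COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml"          # keypoint detection
# panoptic_config = "COCO-PanopticSegmentation/panoptic_fpn_R_50_3x.yaml"  # panoptic segmentation
# detection_config = "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"         # bounding boxes only
keypoint_cfg = get_cfg()
keypoint_cfg.merge_from_file(model_zoo.get_config_file(keypoint_config))
keypoint_cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(keypoint_config)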
Finally, we can construct a predictor with this cfg and run it on the inputs! The Visualizer class is used to draw predictions on the image (in this case, segmented instances, classes and bounding boxes):
predictor = DefaultPredictor(cfg)
outputs = predictor(im)
v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
fig, ax = plt.subplots(figsize=(18, 8))
ax.imshow(out.get_image()[:, :, ::-1])
Finally, this results in the input image with the predicted masks, class labels, confidence scores and bounding boxes drawn over it.
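If you need the raw predictions rather than a rendered image, the outputs dictionary holds an Instances object with per-instance tensor fields you can read directly - a minimal sketch of the fields an instance segmentation model returns:
instances = outputs["instances"].to("cpu")
# Predicted class indices into the dataset's thing_classes list
print(instances.pred_classes)
# Confidence score per detected instance
print(instances.scores)
# Bounding boxes in (x1, y1, x2, y2) format
print(instances.pred_boxes)
# One binary mask per instance, shape (N, H, W)
print(instances.pred_masks.shape)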