Introduction
Object detection is a large field in computer vision, and one of the more important applications of computer vision "in the wild". On one end, it can be used to build autonomous systems that navigate agents through environments - be it robots performing tasks or self-driving cars - but this requires intersection with other fields. However, anomaly detection (such as spotting defective products on a line), locating objects within images, facial detection and various other applications can be handled with object detection alone.
Advice: This short guide is based on a small part of a much larger lesson on object detection, belonging to our "Practical Deep Learning for Computer Vision with Python" course.
Object detection isn't as standardized as image classification, mainly because most of the new developments are typically done by individual researchers, maintainers and developers, rather than large libraries and frameworks. It's difficult to package the necessary utility scripts in a framework like TensorFlow or PyTorch and maintain the API guidelines that guided the development so far.
This makes object detection somewhat more complex, typically more verbose (but not always), and less approachable than image classification. One of the major benefits of being in an ecosystem is that you don't have to search far for useful information on good practices, tools and approaches to use. With object detection - most people have to do way more research on the landscape of the field to get a good grip.
Fortunately for the masses - Ultralytics has developed a simple, very powerful and beautiful object detection API around their YOLOv5 implementation.
In this short guide, we'll be performing Object Detection in Python, with YOLOv5 built by Ultralytics in PyTorch, using a set of pre-trained weights trained on MS COCO.
YOLOv5
YOLO (You Only Look Once) is a methodology, as well as a family of models built for object detection. Since its inception in 2015, YOLOv1, YOLOv2 (YOLO9000) and YOLOv3 have been proposed by the same author(s) - and the deep learning community has continued with open-sourced advancements in the years since.
Ultralytics' YOLOv5 is the first large-scale implementation of YOLO in PyTorch, which made it more accessible than ever before, but the main reason YOLOv5 has gained such a foothold is also the beautifully simple and powerful API built around it. The project abstracts away the unnecessary details, while allowing customizability, practically all usable export formats, and employs amazing practices that make the entire project both efficient and as optimal as it can be. Truly, it's an example of the beauty of open source software implementation, and how it powers the world we live in.
The project provides pre-trained weights on MS COCO, a staple dataset on objects in context, which can be used to both benchmark and build general object detection systems - but most importantly, can be used to transfer general knowledge of objects in context to custom datasets.
Advice: If you'd like to learn more about the YOLO method, as well as competing methods such as SSDs (Single-Shot Detectors) and the two-stage detector camp, including Faster R-CNN and RetinaNet - check out our course lesson on "Object Detection and Segmentation - R-CNNs, RetinaNet, SSD, YOLO"!
Object Detection with YOLOv5
Before moving forward, make sure you have torch and torchvision installed:
! python -m pip install torch torchvision
YOLOv5's got detailed, no-nonsense documentation and a beautifully simple API, as shown on the repo itself, and in the following example:
import torch
import matplotlib.pyplot as plt

# Loading in yolov5s - you can switch to larger models such as yolov5m or yolov5l, or smaller such as yolov5n
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

img = 'https://i.ytimg.com/vi/q71MCWAEfL8/maxresdefault.jpg'  # or file, Path, PIL, OpenCV, numpy, list
results = model(img)

fig, ax = plt.subplots(figsize=(16, 12))
ax.imshow(results.render()[0])
plt.show()
The second argument of the hub.load() method specifies the weights we'd like to use. By choosing anywhere from yolov5n to yolov5l6 - we're loading in the MS COCO pre-trained weights. For custom models:
model = torch.hub.load('ultralytics/yolov5', 'custom', path='path_to_weights.pt')
In any case - once you pass the input through the model, the returned object includes helpful methods to interpret the results, and we've chosen to render() them, which returns a NumPy array that we can chuck into an imshow() call. This results in a nicely rendered image:
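Besides rendering, the results object also exposes the detections as a pandas DataFrame via results.pandas().xyxy[0], with one row per detection (columns xmin, ymin, xmax, ymax, confidence, class and name). As a sketch - the DataFrame below mimics that structure with made-up values, since we're not running the model here:

```python
import pandas as pd

# Illustrative stand-in for results.pandas().xyxy[0] - the columns match
# what YOLOv5 returns, but the values here are fabricated
detections = pd.DataFrame({
    'xmin': [295.1, 120.4],
    'ymin': [277.0, 310.2],
    'xmax': [514.2, 250.9],
    'ymax': [494.8, 480.0],
    'confidence': [0.25, 0.87],
    'class': [0, 2],
    'name': ['person', 'car'],
})

# Keep only detections we're fairly confident about
confident = detections[detections['confidence'] > 0.5]
print(confident['name'].tolist())  # ['car']
```

Working with a DataFrame makes it easy to filter, sort or export detections without touching the raw tensors.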
Saving Results as Files
You can save the results of the inference as a file, using the results.save() method:
results.save(save_dir='results')
This will create a new directory if it isn't already present, and save the same image we've just plotted as a file.
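To confirm what actually landed on disk, a quick directory listing does the trick. The sketch below only demonstrates the check itself, without running the model - it creates the directory in place of the results.save() call above (the name 'results' matches that call):

```python
from pathlib import Path

# results.save(save_dir='results') creates this directory if needed;
# here we create it ourselves to demonstrate the check
out_dir = Path('results')
out_dir.mkdir(exist_ok=True)

saved_files = sorted(p.name for p in out_dir.iterdir())
print(f'{len(saved_files)} file(s) in {out_dir}/')
```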
Cropping Out Objects
You can also decide to crop out the detected objects as individual files. In our case, for every label detected, a number of images can be extracted. This is easily achieved via the results.crop() method, which creates a runs/detect/ directory, with expN/crops (where N increases for each run), in which a directory with cropped images is made for each label:
results.crop()
Saved 1 image to runs/detect/exp2
Saved results to runs/detect/exp2
[{'box': [tensor(295.09409),
tensor(277.03699),
tensor(514.16113),
tensor(494.83691)],
'conf': tensor(0.25112),
'cls': tensor(0.),
'label': 'person 0.25',
'im': array([[[167, 186, 165],
       [174, 184, 167],
       [173, 184, 164],
       ...
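Each element of the list returned by results.crop() is a dict like the one printed above - 'im' holds the cropped image as a NumPy array, and 'label' holds the class name with the confidence appended. A small sketch of iterating over them, using a fabricated entry in place of real model output:

```python
import numpy as np

# A fabricated crop entry mirroring the structure printed above
crops = [{
    'box': [295.1, 277.0, 514.2, 494.8],
    'conf': 0.25,
    'cls': 0.0,
    'label': 'person 0.25',
    'im': np.zeros((218, 219, 3), dtype=np.uint8),  # dummy image array
}]

for crop in crops:
    name = crop['label'].rsplit(' ', 1)[0]  # strip the trailing confidence
    h, w = crop['im'].shape[:2]
    print(f'{name}: {w}x{h} pixels')  # person: 219x218 pixels
```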
You can also verify the output file structure with:
! ls runs/detect/exp2
# crops maxresdefault.jpg
! ls runs/detect/exp2/crops
# backpack bus car handbag person 'traffic light' umbrella
Object Counting
By default, when you perform detection or print the results object - you'll get the number of images that inference was performed on for that results object (YOLOv5 works with batches of images as well), its resolution and the count of each label detected:
print(results)
This results in:
image 1/1: 720x1280 14 persons, 1 car, 3 buss, 6 traffic lights, 1 backpack, 1 umbrella, 1 handbag
Speed: 35.0ms pre-process, 256.2ms inference, 0.7ms NMS per image at shape (1, 3, 384, 640)
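If you'd rather count objects programmatically than parse the printed summary, the per-detection names from results.pandas().xyxy[0]['name'] can be tallied with value_counts(). The series below stands in for that column, using a few of the counts from the summary above:

```python
import pandas as pd

# Stand-in for results.pandas().xyxy[0]['name'] - one entry per detection
names = pd.Series(
    ['person'] * 14 + ['traffic light'] * 6 + ['bus'] * 3 + ['car']
)

counts = names.value_counts()
print(counts.to_dict())
# {'person': 14, 'traffic light': 6, 'bus': 3, 'car': 1}
```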
Inference with Scripts
Alternatively, you can run the detection script, detect.py, by cloning the YOLOv5 repository:
$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt
And then running:
$ python detect.py --source img.jpg
Alternatively, you can provide a URL, video file, path to a directory with multiple files, a glob in a path to only match for certain files, a YouTube link or any other HTTP stream. The results are saved into the runs/detect directory.
Conclusion
In this short guide, we've taken a look at how you can perform object detection with YOLOv5 built using PyTorch.