Object Detection Inference in Python with YOLOv5 and PyTorch

Introduction

Object detection is a large field in computer vision, and one of the more important applications of computer vision "in the wild". On one end, it can be used to build autonomous systems that navigate agents through environments, be it robots performing tasks or self-driving cars, though this requires combining it with other fields. On the other end, anomaly detection (such as spotting defective products on a production line), locating objects within images, facial detection and various other applications of object detection can be done without involving other fields.

Advice: This short guide is based on a small part of a much larger lesson on object detection belonging to our "Practical Deep Learning for Computer Vision with Python" course.

Object detection isn't as standardized as image classification, mainly because most new developments are done by individual researchers, maintainers and developers rather than by large libraries and frameworks. It's difficult to package the necessary utility scripts into a framework like TensorFlow or PyTorch while keeping to the API guidelines that have shaped its development so far.

This makes object detection somewhat more complex, typically (but not always) more verbose, and less approachable than image classification. One of the major benefits of working within an ecosystem is that it spares you from having to hunt for information on good practices, tools and approaches. With object detection, most people have to research the landscape of the field much more thoroughly to get a good grip on it.

Fortunately for the masses, Ultralytics has developed a simple, very powerful and beautiful object detection API around their YOLOv5 implementation.

In this short guide, we'll be performing Object Detection in Python, with YOLOv5 built by Ultralytics in PyTorch, using a set of weights pre-trained on MS COCO.

YOLOv5

YOLO (You Only Look Once) is a methodology, as well as a family of models built for object detection. Since its inception in 2015, YOLOv1, YOLOv2 (YOLO9000) and YOLOv3 have been proposed by the same author(s), and the deep learning community has continued with open-source advancements in the years since.

Ultralytics' YOLOv5 is the first large-scale implementation of YOLO in PyTorch, which made it more accessible than ever before, but the main reason YOLOv5 has gained such a foothold is the beautifully simple and powerful API built around it. The project abstracts away unnecessary details while still allowing customization, supports practically every usable export format, and follows good engineering practices that keep the entire project efficient. Truly, it's an example of the beauty of open source software, and of how it powers the world we live in.

The project provides pre-trained weights on MS COCO, a staple dataset on objects in context, which can be used to both benchmark and build general object detection systems - but most importantly, can be used to transfer general knowledge of objects in context to custom datasets.

Advice: If you'd like to learn more about the YOLO method, as well as competing methods such as SSDs (Single-Shot Detectors) and RetinaNet, or two-stage detectors such as Faster R-CNN, check out our course lesson on "Object Detection and Segmentation - R-CNNs, RetinaNet, SSD, YOLO"!

Object Detection with YOLOv5

Before moving forward, make sure you have torch and torchvision installed, along with matplotlib, which we'll use to display the results:

! python -m pip install torch torchvision matplotlib

YOLOv5's got detailed, no-nonsense documentation and a beautifully simple API, as shown on the repo itself, and in the following example:

import torch
import matplotlib.pyplot as plt

# Load yolov5s; you can switch to larger models such as yolov5m or yolov5l, or smaller ones such as yolov5n
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
img = 'https://i.ytimg.com/vi/q71MCWAEfL8/maxresdefault.jpg'  # or file, Path, PIL, OpenCV, numpy, list
results = model(img)

# render() returns a list of annotated images, one per input; we passed a single image
fig, ax = plt.subplots(figsize=(16, 12))
ax.imshow(results.render()[0])
plt.show()

The second argument of the hub.load() method specifies the weights we'd like to use. By choosing any of the released checkpoints, from yolov5n up to yolov5x6, we're loading in the MS COCO pre-trained weights. For custom models, pass 'custom' along with the path to your weights file:

model = torch.hub.load('ultralytics/yolov5', 'custom', path='path_to_weights.pt')

In any case, once you pass the input through the model, the returned object includes helpful methods to interpret the results. We've chosen to render() them, which returns a list of NumPy arrays (one annotated image per input), so we take the first one and chuck it into an imshow() call. This results in a nicely formatted image with the bounding boxes and labels drawn onto it.

Saving Results as Files

You can save the results of the inference as a file, using the results.save() method:

results.save(save_dir='results')

This will create a new directory if it isn't already present, and save the same image we've just plotted as a file.
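By default, YOLOv5 doesn't overwrite earlier outputs: each run goes into its own directory under runs/detect (exp, then exp2, exp3, and so on), which is where the runs/detect/exp2 paths later in this guide come from. YOLOv5 handles this internally, but the naming scheme can be sketched in plain Python. next_run_dir is a hypothetical helper, not part of YOLOv5, and it assumes numeric suffixes start at 2; whether a custom save_dir such as 'results' is also incremented may depend on your YOLOv5 version.

```python
from pathlib import Path
from typing import Union


def next_run_dir(base: Union[str, Path], name: str = "exp") -> Path:
    """Pick the next unused run directory under `base`: exp, exp2, exp3, ...

    Hypothetical helper mimicking YOLOv5's run naming; it only chooses the
    name and does not create the directory.
    """
    base = Path(base)
    if not (base / name).exists():
        return base / name  # first run: no numeric suffix
    n = 2  # suffixes start at 2, so the second run is "exp2"
    while (base / f"{name}{n}").exists():
        n += 1  # skip names that are already taken
    return base / f"{name}{n}"


# Example:
# run_dir = next_run_dir("runs/detect")
# run_dir.mkdir(parents=True)
```

Picking the first free suffix, rather than overwriting, keeps the results of every earlier run intact for comparison.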

Cropping Out Objects

You can also crop out the detected objects as individual files. Since several objects of the same class can appear in one image, a single label may yield multiple crops. This is easily achieved via the results.crop() method, which saves the crops into runs/detect/expN/crops (where N increases with each run), with a subdirectory of cropped images for each label:

results.crop()

The call reports where it saved the files and returns a list of dictionaries, one per detected object (truncated here):

Saved 1 image to runs/detect/exp2
Saved results to runs/detect/exp2

[{'box': [tensor(295.09409),
   tensor(277.03699),
   tensor(514.16113),
   tensor(494.83691)],
  'conf': tensor(0.25112),
  'cls': tensor(0.),
  'label': 'person 0.25',
  'im': array([[[167, 186, 165],
          [174, 184, 167],
          [173, 184, 164],
          ...
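Each returned dictionary carries the box as [x1, y1, x2, y2] pixel coordinates, the confidence, the class index, a label string combining class name and confidence, and the cropped pixels. A minimal sketch of post-processing these records, grouping them by class name and computing box sizes; summarize_crops is a hypothetical helper, and the sample input reuses the person record above with plain floats in place of tensors and the 'im' array left out.

```python
from collections import defaultdict


def summarize_crops(crops):
    """Group crop records by class name, with each box's confidence and size.

    Each record is assumed to look like the dicts results.crop() returns:
    'box' holds [x1, y1, x2, y2] in pixels, 'conf' the confidence, and
    'label' a string of the form "<class name> <confidence>".
    """
    by_class = defaultdict(list)
    for crop in crops:
        # rsplit on the last space keeps multi-word names like "traffic light" intact
        name, _ = crop["label"].rsplit(" ", 1)
        # float() also accepts the 0-d tensors YOLOv5 returns
        x1, y1, x2, y2 = (float(v) for v in crop["box"])
        by_class[name].append({
            "conf": float(crop["conf"]),
            "width": x2 - x1,
            "height": y2 - y1,
        })
    return dict(by_class)


# The person record from the output above, with floats instead of tensors
crops = [{"box": [295.09409, 277.03699, 514.16113, 494.83691],
          "conf": 0.25112, "cls": 0.0, "label": "person 0.25"}]
summary = summarize_crops(crops)
print(summary["person"][0]["width"])  # about 219.07 pixels
```

Parsing the name out of 'label' avoids needing a separate index-to-name mapping, at the cost of depending on the label's "name confidence" format.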

You can also verify the output file structure with:

! ls runs/detect/exp2
# crops  maxresdefault.jpg

! ls runs/detect/exp2/crops
# backpack   bus   car   handbag   person  'traffic light'   umbrella
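Beyond eyeballing the ls output, the saved crops can be counted per label programmatically. A minimal sketch, assuming the layout shown above (one subdirectory per class name, each holding .jpg crops); count_crops is a hypothetical helper, not a YOLOv5 function.

```python
from pathlib import Path


def count_crops(crops_dir):
    """Count saved crops per class in a YOLOv5-style crops/ directory.

    Assumes one subdirectory per class name, each holding .jpg crops,
    e.g. runs/detect/exp2/crops/person/*.jpg.
    """
    counts = {}
    for class_dir in sorted(Path(crops_dir).iterdir()):
        if class_dir.is_dir():  # ignore any stray files at the top level
            counts[class_dir.name] = len(list(class_dir.glob("*.jpg")))
    return counts


# Example (path taken from the listing above):
# print(count_crops("runs/detect/exp2/crops"))
```

Because the counts come from the files on disk, they reflect only what that particular run saved.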

Object Counting

When you print the results object, you'll get the number of images that inference was performed on for that results object (YOLOv5 works with batches of images as well), each image's resolution and the count of each detected label, followed by timing information:

print(results)

This results in:

image 1/1: 720x1280 14 persons, 1 car, 3 buss, 6 traffic lights, 1 backpack, 1 umbrella, 1 handbag
Speed: 35.0ms pre-process, 256.2ms inference, 0.7ms NMS per image at shape (1, 3, 384, 640)
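Reproducing the summary line explains the odd "3 buss": the plural is formed by appending an "s" whenever a count exceeds one. The sketch below mimics that format; summarize_detections is a hypothetical helper, and its inputs (plain integer class indices plus a partial COCO index-to-name mapping) are assumptions for illustration. With a real results object, the class indices would be the last column of results.xyxy[0] and the names would come from results.names.

```python
from collections import Counter


def summarize_detections(class_ids, names):
    """Format per-class counts like YOLOv5's printout, e.g. '14 persons, 1 car'.

    class_ids: class indices detected in one image (ints or floats).
    names: mapping from class index to class name.
    """
    counts = Counter(int(c) for c in class_ids)
    # Classes are listed in ascending index order (matching the output above),
    # and a naive "s" is appended for counts above one, so "bus" becomes "buss"
    return ", ".join(
        f"{counts[c]} {names[c]}{'s' * (counts[c] > 1)}" for c in sorted(counts)
    )


# Partial COCO mapping covering the classes detected in the image above
coco_names = {0: "person", 2: "car", 5: "bus", 9: "traffic light",
              24: "backpack", 25: "umbrella", 26: "handbag"}
ids = [0] * 14 + [2] + [5] * 3 + [9] * 6 + [24, 25, 26]
print(summarize_detections(ids, coco_names))
# 14 persons, 1 car, 3 buss, 6 traffic lights, 1 backpack, 1 umbrella, 1 handbag
```

For exact counts in your own code, it's more robust to count class indices directly, as done here, than to parse the printed string.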

Inference with Scripts

Alternatively, you can run the detection script, detect.py, by cloning the YOLOv5 repository:

$ git clone https://github.com/ultralytics/yolov5 
$ cd yolov5
$ pip install -r requirements.txt

And then running:

$ python detect.py --source img.jpg

Instead of a single image, --source also accepts a URL, a video file, a path to a directory with multiple files, a glob pattern to match only certain files, a YouTube link or any other HTTP stream. The results are saved into the runs/detect directory.
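When calling detect.py from another Python program, it helps to assemble the arguments as a list. A minimal sketch; build_detect_command is a hypothetical helper, and while --weights, --source, --conf-thres and --save-crop are flags of YOLOv5's detect.py, check python detect.py --help in your checkout for the exact set. Passing a list to subprocess.run(), rather than a shell string, means a glob such as data/images/*.jpg reaches detect.py unexpanded, so the script can match the files itself.

```python
import shlex
import subprocess


def build_detect_command(source, weights="yolov5s.pt", conf_thres=0.25, save_crop=False):
    """Assemble the argument list for YOLOv5's detect.py.

    Run the resulting command from inside the cloned yolov5 directory.
    """
    cmd = ["python", "detect.py",
           "--weights", weights,
           "--source", str(source),  # image, video, directory, glob, URL or stream
           "--conf-thres", str(conf_thres)]
    if save_crop:
        cmd.append("--save-crop")  # also save per-label crops, as results.crop() does
    return cmd


cmd = build_detect_command("data/images/*.jpg", save_crop=True)
# Print the quoted equivalent you would type into a shell:
print(shlex.join(cmd))
# python detect.py --weights yolov5s.pt --source 'data/images/*.jpg' --conf-thres 0.25 --save-crop
# subprocess.run(cmd, check=True)  # uncomment to actually run inference
```

If you type the command into a shell yourself, quote the glob as shlex.join() does; otherwise the shell expands it into several arguments before detect.py ever sees it.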

Conclusion

In this short guide, we've taken a look at how you can perform object detection with YOLOv5 built using PyTorch.


Author: David Landup

Entrepreneur, Software and Machine Learning Engineer, with a deep fascination towards the application of Computation and Deep Learning in Life Sciences (Bioinformatics, Drug Discovery, Genomics), Neuroscience (Computational Neuroscience), robotics and BCIs.

Great passion for accessible education and promotion of reason, science, humanism, and progress.

© 2013-2025 Stack Abuse. All rights reserved.
