Object Detection Inference in Python with YOLOv5 and PyTorch

Introduction

Object detection is a large field in computer vision, and one of the more important applications of computer vision "in the wild". On one end, it can be used to build autonomous systems that navigate agents through environments - be it robots performing tasks or self-driving cars, but this requires intersection with other fields. However, anomaly detection (such as defective products on a line), locating objects within images, facial detection and various other applications of object detection can be done without intersecting other fields.

Advice: This short guide is based on a small part of a much larger lesson on object detection belonging to our "Practical Deep Learning for Computer Vision with Python" course.

Object detection isn't as standardized as image classification, mainly because most of the new developments are typically done by individual researchers, maintainers and developers, rather than large libraries and frameworks. It's difficult to package the necessary utility scripts in a framework like TensorFlow or PyTorch and maintain the API guidelines that guided the development so far.

This makes object detection somewhat more complex, typically more verbose (but not always), and less approachable than image classification. One of the major benefits of working within an ecosystem is that you don't have to search for information on good practices, tools and approaches yourself. With object detection - most people have to do far more research on the landscape of the field to get a good grip.

Fortunately for the masses - Ultralytics has developed a simple, very powerful and beautiful object detection API around their YOLOv5 implementation.

In this short guide, we'll be performing Object Detection in Python, with YOLOv5 built by Ultralytics in PyTorch, using a set of pre-trained weights trained on MS COCO.

YOLOv5

YOLO (You Only Look Once) is a methodology, as well as a family of models built for object detection. Since its inception in 2015, YOLOv1, YOLOv2 (YOLO9000) and YOLOv3 have been proposed by the same author(s) - and the deep learning community continued with open-sourced advancements in the following years.

Ultralytics' YOLOv5 is the first large-scale implementation of YOLO in PyTorch, which made it more accessible than ever before, but the main reason YOLOv5 has gained such a foothold is also the beautifully simple and powerful API built around it. The project abstracts away the unnecessary details, while allowing customizability, practically all usable export formats, and employs amazing practices that make the entire project both efficient and as optimal as it can be. Truly, it's an example of the beauty of open source software implementation, and how it powers the world we live in.

The project provides pre-trained weights on MS COCO, a staple dataset on objects in context, which can be used to both benchmark and build general object detection systems - but most importantly, can be used to transfer general knowledge of objects in context to custom datasets.

Advice: If you'd like to learn more about the YOLO method, as well as competing methods such as SSDs (Single-Shot Detectors), RetinaNet and two-stage detectors such as Faster R-CNN - check out our course lesson on "Object Detection and Segmentation - R-CNNs, RetinaNet, SSD, YOLO"!

Object Detection with YOLOv5

Before moving forward, make sure you have torch and torchvision installed:

! python -m pip install torch torchvision

YOLOv5's got detailed, no-nonsense documentation and a beautifully simple API, as shown on the repo itself, and in the following example:

import torch
import matplotlib.pyplot as plt

# Loading in yolov5s - you can switch to larger models such as yolov5m or yolov5l, or smaller such as yolov5n
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
img = 'https://i.ytimg.com/vi/q71MCWAEfL8/maxresdefault.jpg'  # or file, Path, PIL, OpenCV, numpy, list
results = model(img)  # run inference on the image

# render() draws the boxes and labels onto the image(s); we plot the first one
fig, ax = plt.subplots(figsize=(16, 12))
ax.imshow(results.render()[0])
plt.show()

The second argument of the hub.load() method specifies the weights we'd like to use. By choosing anything from yolov5n to yolov5l6 - we're loading in the MS COCO pre-trained weights. For custom models:

model = torch.hub.load('ultralytics/yolov5', 'custom', path='path_to_weights.pt')

In any case - once you pass the input through the model, the returned object includes helpful methods to interpret the results. We've chosen to render() them, which returns a list of NumPy arrays (one annotated image per input), the first of which we can chuck into an imshow() call. This results in a nicely formatted image, with a labeled bounding box drawn around each detected object.
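Rendering is not the only way to use the results - the raw detections can also be handled as plain data. The sketch below is a minimal illustration that assumes each detection is a row of the form [x1, y1, x2, y2, confidence, class_index], which is the layout we'd expect from results.xyxy in YOLOv5. The rows, the CLASS_NAMES mapping and the filter_detections() helper are hypothetical and only mimic that layout, so no model is loaded here:

```python
# Hypothetical detections mimicking an assumed
# [x1, y1, x2, y2, confidence, class_index] row layout
detections = [
    [295.1, 277.0, 514.2, 494.8, 0.25, 0],
    [10.0, 20.0, 110.0, 220.0, 0.91, 0],
    [400.0, 50.0, 900.0, 400.0, 0.78, 5],
]

# Partial, illustrative class-index mapping (in MS COCO, 0 is person and 5 is bus)
CLASS_NAMES = {0: 'person', 5: 'bus'}


def filter_detections(rows, min_conf=0.5):
    """Keep rows whose confidence is at least min_conf, as readable dicts."""
    kept = []
    for x1, y1, x2, y2, conf, cls in rows:
        if conf >= min_conf:
            kept.append({
                'label': CLASS_NAMES.get(int(cls), 'unknown'),
                'conf': conf,
                'box': (x1, y1, x2, y2),
            })
    return kept


confident = filter_detections(detections, min_conf=0.5)
for det in confident:
    print(det['label'], det['conf'], det['box'])
```

With a real results object, the same loop could be fed the rows of the first image after converting them to a list, though that conversion is left out here to keep the sketch dependency-free.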

Saving Results as Files

You can save the results of the inference as a file, using the results.save() method:

results.save(save_dir='results')

This will create a new directory if it isn't already present, and save the same image we've just plotted as a file.

Cropping Out Objects

You can also decide to crop out the detected objects as individual files. In our case, for every label detected, a number of images can be extracted. This is easily achieved via the results.crop() method, which creates a runs/detect/ directory containing expN/crops (where N increases for each run), in which a directory with cropped images is made for each label:

results.crop()
Saved 1 image to runs/detect/exp2
Saved results to runs/detect/exp2

[{'box': [tensor(295.09409),
   tensor(277.03699),
   tensor(514.16113),
   tensor(494.83691)],
  'conf': tensor(0.25112),
  'cls': tensor(0.),
  'label': 'person 0.25',
  'im': array([[[167, 186, 165],
          [174, 184, 167],
          [173, 184, 164],

You can also verify the output file structure with:

! ls runs/detect/exp2
# crops  maxresdefault.jpg

! ls runs/detect/exp2/crops
# backpack   bus   car   handbag   person  'traffic light'   umbrella
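To make the 'box' entries in the output above a bit more tangible, here's a minimal sketch of what cropping by a box amounts to. It assumes the four values are corner coordinates (x1, y1, x2, y2) in pixels, and uses a tiny hypothetical image stored as nested lists rather than a real NumPy array. The crop_box() helper is our own illustration, not part of YOLOv5:

```python
def crop_box(image, box):
    """Crop image (a list of pixel rows) to box = (x1, y1, x2, y2), assumed corner format."""
    height = len(image)
    width = len(image[0]) if height else 0
    # Round to whole pixels and clamp to the image bounds
    x1 = max(0, min(width, int(round(box[0]))))
    y1 = max(0, min(height, int(round(box[1]))))
    x2 = max(0, min(width, int(round(box[2]))))
    y2 = max(0, min(height, int(round(box[3]))))
    # Rows are indexed by y, columns by x
    return [row[x1:x2] for row in image[y1:y2]]


# Hypothetical 4x6 "image" where each pixel stores its own (row, column) position
image = [[(r, c) for c in range(6)] for r in range(4)]
crop = crop_box(image, (1.2, 0.9, 4.1, 3.0))
print(len(crop), len(crop[0]))
```

With a NumPy image, the equivalent slice would be image[y1:y2, x1:x2] - note that rows (y) come first.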

Object Counting

By default, when you perform detection or print the results object - you'll get the number of images that inference was performed on for that results object (YOLOv5 works with batches of images as well), its resolution and the count of each label detected:

print(results)

This results in:

image 1/1: 720x1280 14 persons, 1 car, 3 buss, 6 traffic lights, 1 backpack, 1 umbrella, 1 handbag
Speed: 35.0ms pre-process, 256.2ms inference, 0.7ms NMS per image at shape (1, 3, 384, 640)
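If you need the counts as data rather than as printed text, you can tally them yourself. The following is a minimal sketch, assuming you have a list of dictionaries shaped like the ones returned by results.crop() earlier, where 'label' holds the class name followed by the confidence. The records and the count_labels() helper are hypothetical:

```python
from collections import Counter

# Hypothetical records shaped like the dictionaries returned by results.crop()
crops = [
    {'label': 'person 0.25'},
    {'label': 'person 0.88'},
    {'label': 'traffic light 0.61'},
    {'label': 'bus 0.79'},
    {'label': 'traffic light 0.47'},
]


def count_labels(records):
    """Count detections per class name."""
    counts = Counter()
    for record in records:
        # Split off the trailing confidence; class names may contain spaces
        name, _conf = record['label'].rsplit(' ', 1)
        counts[name] += 1
    return counts


counts = count_labels(crops)
print(dict(counts))
```

Splitting from the right keeps multi-word class names such as 'traffic light' intact.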

Inference with Scripts

Alternatively, you can run the detection script, detect.py, by cloning the YOLOv5 repository:

$ git clone https://github.com/ultralytics/yolov5 
$ cd yolov5
$ pip install -r requirements.txt

And then running:

$ python detect.py --source img.jpg

Alternatively, you can provide a URL, video file, path to a directory with multiple files, a glob in a path to only match for certain files, a YouTube link or any other HTTP stream. The results are saved into the runs/detect directory.
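If you'd rather launch the script from Python, one option is to assemble the command as an argument list first. The sketch below only builds and prints the command. It assumes the --weights and --conf-thres flags alongside --source, so it's worth confirming them with python detect.py --help for your version of the repository. The build_detect_command() helper is hypothetical:

```python
import shlex


def build_detect_command(source, weights='yolov5s.pt', conf_thres=None):
    """Return the detect.py invocation as an argument list (nothing is executed)."""
    cmd = ['python', 'detect.py', '--weights', weights, '--source', str(source)]
    if conf_thres is not None:
        # Assumed flag for the minimum confidence of reported detections
        cmd += ['--conf-thres', str(conf_thres)]
    return cmd


cmd = build_detect_command('img.jpg', conf_thres=0.4)
print(shlex.join(cmd))
```

The resulting list can then be passed to subprocess.run(cmd, check=True) from inside the cloned yolov5 directory.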

Conclusion

In this short guide, we've taken a look at how you can perform object detection with YOLOv5 built using PyTorch.

Author: David Landup

© 2013-2022 Stack Abuse. All rights reserved.
