# Object Detection with OpenCV-Python Using a Haar-Cascade Classifier

### Introduction

Python has many applications in the field of Computer Vision, typically through Deep Learning. From performing OCR on documents to allowing robots to "see", Computer Vision is an exciting and challenging field!

OpenCV is an open source, cross-platform framework, developed as a library oriented towards real-time Computer Vision. Since it's cross-platform and has bindings for C++, Python and Java, you can use it regardless of your operating system!

Computer Vision is a wide field, and there are many individual tasks/problems you could try tackling. A large one is Object Detection.

Note: Object Detection refers to the classification (labelling), position detection and outline detection (usually crude, such as a bounding box) of an object in an image, video or stream. These are three distinct tasks, each of which could be a topic in its own right.
Non-crude outline detection can also be referred to as image segmentation, if you segment the image into each distinct object, though image segmentation isn't limited to this application.

In this guide, you'll learn how to perform Object Detection in Python with OpenCV. We'll cover how to read, detect and display detected objects in an image, in a video file and in real time, using pretrained Haar-Cascade Classifiers.

Let's get started with installing OpenCV!

### Object Detection Using OpenCV

If you haven't installed OpenCV yet, installing its Python bindings is easy with pip:

```shell
$ pip install opencv-python
```

That's it! OpenCV and all of the dependencies it works with will be installed.

Note: If you're getting errors with the installation, try installing opencv-contrib-python instead.

Now that we have our library set up, our first step in object recognition is reading and displaying an image with OpenCV. You can use any image you like; in this guide we'll use face_image.jpg, obtained from thispersondoesnotexist.com.

The website generates "imagined people" using StyleGAN.

The imread() function of the cv2 module (OpenCV's Python module) loads an image from disk. Then we can display it in a window:

```python
import cv2

image_path = "face_image.jpg"  # Put an absolute/relative path to your image
original_image = cv2.imread(image_path)  # Load the image as a BGR NumPy array
if original_image is None:  # imread() returns None instead of raising on failure
    raise FileNotFoundError(f"Could not read {image_path}")

window_name = f"Detected Objects in {image_path}"  # Set name of window that shows image
cv2.namedWindow(window_name, cv2.WINDOW_KEEPRATIO)  # Create window and set title
cv2.imshow(window_name, original_image)  # Load image in window
cv2.resizeWindow(window_name, 400, 400)  # Resize window to 400x400 pixels
cv2.waitKey(0)  # Keep window open indefinitely until any keypress
cv2.destroyAllWindows()  # Destroy all open OpenCV windows
```


Running this code will bring up a window like this:

Note: Sometimes your OS may not bring the window to the front of the screen, making it seem like the code is running indefinitely. If you don't see a window after running the code, cycle through your open windows.

The imread() function loads the image, and imshow() displays it in the window. The namedWindow() and resizeWindow() functions create a custom, resizable window, which helps when the image is much larger or smaller than the window size you want.

The waitKey() function keeps a window open for a given number of milliseconds, or until a key is pressed. A value of 0 means that OpenCV will keep the window open indefinitely until we press a key to close it. The destroyAllWindows() function tells OpenCV to close all windows that it opened.

With the basic setup, let's take the next steps to detect objects with OpenCV. We need to understand:

1. How to draw using OpenCV (to "localize"/outline objects when detected)
2. Haar Cascade Classifiers (how OpenCV distinguishes objects)

#### How to Draw Using OpenCV?

OpenCV can draw various shapes, including rectangles, circles and lines. We can even use the putText() function to put a text label next to a shape. Let's draw a simple rectangle on the image using the rectangle() function, which takes the image, the top-left and bottom-right corner points, a color (in BGR order) and the line thickness.

Add a new line to create a rectangle after reading the image and before naming the window:

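```python
# Reading the image
...

# Draw a green, 2px-thick rectangle; cv2.rectangle() draws in place on original_image
cv2.rectangle(original_image,
              (200, 100),   # X-Y of the top-left corner
              (900, 800),   # X-Y of the bottom-right corner
              (0, 255, 0),  # Color in BGR order (green)
              2)            # Line thickness in pixels

# Naming the window
cv2.namedWindow(window_name, cv2.WINDOW_KEEPRATIO)
...
```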


Now, re-run your code to see a rectangle drawn over the image:

Here, we hard-coded the location of the rectangle in the cv2.rectangle() call. In practice, these locations should be inferred from the image, not guessed. That's where OpenCV can do the heavy lifting! Once it does, we can use this exact function to draw a rectangle around the detected object instead.

Drawing rectangles (or circles) like this is an important step in Object Detection, as it lets us annotate (label) the objects we detect in a clear way.
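Since putText() came up above, here's a small helper you might use when labelling a box. It's a sketch, not part of OpenCV: the name label_origin() and the 5-pixel margin are our own assumptions. putText() positions text by its bottom-left corner, so the helper places the label just above the rectangle, and moves it inside the box when the rectangle touches the top edge of the image:

```python
from typing import Tuple


def label_origin(top_left: Tuple[int, int], text_height: int,
                 margin: int = 5) -> Tuple[int, int]:
    """Return the bottom-left point for a text label attached to a box.

    Hypothetical helper: cv2.putText() anchors text at its bottom-left
    corner, so the baseline goes `margin` pixels above the box's top edge.
    """
    x, y = top_left
    baseline_y = y - margin
    if baseline_y - text_height < 0:
        # Not enough room above the box: put the label just inside it instead
        baseline_y = y + text_height + margin
    return x, baseline_y
```

In a real script, text_height could come from cv2.getTextSize(), which returns ((width, height), baseline), and the result would be passed as the org argument of cv2.putText().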

Now that we've covered drawing with OpenCV, let's take a look at the concept of the Haar-Cascade Classifier, how it works, and how it lets us identify objects in an image!

#### Haar-Cascade Classifiers

A Haar-Cascade Classifier is a machine learning classifier that works with Haar features. In OpenCV, it's implemented by the cv2.CascadeClassifier class. Several pretrained XML files come pre-packaged with OpenCV, each holding a trained cascade of Haar features for a different kind of object.

Haar features work in a similar fashion to feature maps of regular Convolutional Neural Networks (CNNs).

Each feature is calculated over adjacent rectangular regions of an image: the pixel intensities within each region are summed, and then the difference between these sums is computed. Summarizing many pixels as a handful of sums yields a simplified feature map that can be used to detect patterns, such as edges and lines, in images.
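To make the "sum the intensities, then take the difference" idea concrete, here's a minimal pure-Python sketch of a two-rectangle Haar-like feature. It uses an integral image (summed-area table), the trick popularized by the Viola-Jones detector, which lets any rectangle sum be read with four lookups. The function names and the tiny test image are illustrative assumptions; OpenCV does this internally in C++ (and exposes the table itself through cv2.integral()).

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """Summed-area table padded with a leading zero row and column.

    ii[y][x] holds the sum of img[0..y-1][0..x-1].
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum of the w*h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Horizontal two-rectangle feature: left half sum minus right half sum."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# A 2x4 patch that is bright on the left and dark on the right
patch = [[9, 9, 1, 1],
         [9, 9, 1, 1]]
ii = integral_image(patch)
print(two_rect_feature(ii, 0, 0, 4, 2))  # 36 - 4 = 32
```

A large positive value means the left half is much brighter than the right half, which is the kind of vertical-edge pattern a single Haar feature responds to.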

Note: There are many pattern-recognition options out there, including extremely powerful networks which offer better accuracy and more flexibility than Haar-Cascade Classifiers. The main appeal of Haar features and the Haar-Cascade Classifier is their speed, which makes them well suited for real-time object detection, where they see most of their use.
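Much of that speed comes from the "cascade" part of the name: a candidate window is passed through a sequence of stages, and most windows that don't contain the object are rejected by the first, cheapest stages, so the expensive later stages rarely run. The sketch below illustrates that control flow only; the stage layout, the feature names and the thresholds are hypothetical and don't mirror OpenCV's XML format.

```python
from typing import Callable, Dict, List, Tuple

Window = Dict[str, float]  # hypothetical: precomputed feature values for one window
# A stage is (weak classifiers, threshold); each weak classifier scores a window
Stage = Tuple[List[Callable[[Window], float]], float]


def run_cascade(window: Window, stages: List[Stage]) -> Tuple[bool, int]:
    """Return (accepted, number of stages evaluated) for one candidate window."""
    for i, (weak_classifiers, threshold) in enumerate(stages, start=1):
        score = sum(clf(window) for clf in weak_classifiers)
        if score < threshold:
            return False, i  # early rejection: the remaining stages never run
    return True, len(stages)


# Hypothetical two-stage cascade: a cheap 1-feature stage, then a 2-feature stage
stages: List[Stage] = [
    ([lambda w: w["contrast"]], 0.5),
    ([lambda w: w["symmetry"], lambda w: w["edges"]], 1.0),
]
print(run_cascade({"contrast": 0.1, "symmetry": 0.9, "edges": 0.9}, stages))  # (False, 1)
```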

When you install OpenCV, you get access to XML files with the Haar features for:

1. Eyes
2. Frontal faces
3. Profile faces
4. Full bodies
5. Upper bodies
6. Lower bodies
7. Smiles
8. Frontal cat faces
9. Russian license plates

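You can find their filenames in the official GitHub repository. The opencv-python package also installs them locally, in the directory given by cv2.data.haarcascades.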

These cover a fairly wide spectrum of use! For instance, let's load in the classifier for eyes and try to detect eyes in the image we've loaded in, drawing a rectangle around the detected object:

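```python
import cv2

image_path = "face_image.jpg"
window_name = f"Detected Objects in {image_path}"
original_image = cv2.imread(image_path)

# Convert the image to grayscale for easier computation
image_grey = cv2.cvtColor(original_image, cv2.COLOR_BGR2GRAY)

# Load the pretrained eye classifier bundled with opencv-python
cascade_classifier = cv2.CascadeClassifier(
    f"{cv2.data.haarcascades}haarcascade_eye.xml")
# Detect eyes; each detection is an (x, y, width, height) rectangle
detected_objects = cascade_classifier.detectMultiScale(image_grey, minSize=(50, 50))

# Draw rectangles on the detected objects
if len(detected_objects) != 0:
    for (x, y, width, height) in detected_objects:
        cv2.rectangle(original_image, (x, y),
                      (x + width, y + height),
                      (0, 255, 0), 2)

cv2.namedWindow(window_name, cv2.WINDOW_KEEPRATIO)
cv2.imshow(window_name, original_image)
cv2.resizeWindow(window_name, 400, 400)
cv2.waitKey(0)
cv2.destroyAllWindows()
```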



Running this code should show something similar to this:

Here, we're greyscaling the image for the classifier to reduce the computational cost (more information means more computation). The colors don't matter too much for this detection, as the patterns that define eyes look pretty much the same whether they're colored or not.

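The cascade_classifier is a CascadeClassifier instance loaded with the Haar features for eyes. We're locating the XML file dynamically with an f-string, prepending cv2.data.haarcascades, the directory in which opencv-python installs the bundled cascade files.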
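One thing to watch out for: if the path is wrong, cv2.CascadeClassifier() ends up empty instead of raising an error (you can check with its empty() method), and the failure only surfaces later, when detectMultiScale() complains. A small, hypothetical helper like cascade_path() below builds the same kind of path and fails early. We pass the base directory in as a parameter (in real code, cv2.data.haarcascades) so the sketch runs without OpenCV.

```python
import os


def cascade_path(name: str, base_dir: str) -> str:
    """Return the path of haarcascade_<name>.xml inside base_dir, or raise.

    Hypothetical helper; in real code base_dir would be cv2.data.haarcascades.
    It's a parameter here so the function has no OpenCV dependency.
    """
    path = os.path.join(base_dir, f"haarcascade_{name}.xml")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No cascade file found at {path}")
    return path
```

With OpenCV, usage would look like cv2.CascadeClassifier(cascade_path("eye", cv2.data.haarcascades)).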

The detectMultiScale() method does the actual detection and can find the same kind of object at different scales in an image. It returns the detected objects as rectangles, one (x, y, width, height) entry per detection (a NumPy array, or an empty tuple when nothing is found). This makes it natural to outline them with, well, rectangles! For each (x, y, width, height) entry in detected_objects, we can draw a rectangle.
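Because each detection is (x, y, width, height) while cv2.rectangle() wants two corner points, the width has to be added to x and the height to y. Mixing those up still looks right for square boxes, which is why it's an easy bug to miss. A tiny, hypothetical conversion helper keeps that logic in one place and also casts NumPy integers to plain Python ints:

```python
from typing import Sequence, Tuple

Point = Tuple[int, int]


def box_to_corners(box: Sequence[int]) -> Tuple[Point, Point]:
    """Convert (x, y, width, height) into the (top-left, bottom-right)
    points that cv2.rectangle() expects. Hypothetical helper."""
    x, y, width, height = (int(v) for v in box)
    return (x, y), (x + width, y + height)


top_left, bottom_right = box_to_corners((10, 20, 30, 40))
print(top_left, bottom_right)  # (10, 20) (40, 60)
```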

The minSize argument defines the minimum size of an object to be considered. If you set it really small, the classifier will likely pick up a lot of false positives in the image. The right value usually depends on the resolution of the images you're working with and the average object size. In practice, it comes down to trying a few reasonable sizes until the classifier performs well.
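To get a feel for how minSize interacts with the multi-scale search, here's a simplified model of it. detectMultiScale() also accepts a scaleFactor argument (1.1 by default) that controls how much the search window grows between passes, and windows smaller than minSize are skipped. Treat this as an approximation under stated assumptions: the 24x24 base window is an assumed value (the real one is stored in each XML file), and OpenCV actually shrinks the image rather than growing the window.

```python
from typing import List, Tuple

Size = Tuple[int, int]


def scanned_window_sizes(image_size: Size, base_size: Size = (24, 24),
                         scale_factor: float = 1.1,
                         min_size: Size = (0, 0)) -> List[Size]:
    """Approximate the (width, height) windows a multi-scale detector scans."""
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    img_w, img_h = image_size
    sizes = []
    factor = 1.0
    while True:
        win_w = round(base_size[0] * factor)
        win_h = round(base_size[1] * factor)
        if win_w > img_w or win_h > img_h:
            break  # the window no longer fits inside the image
        if win_w >= min_size[0] and win_h >= min_size[1]:
            sizes.append((win_w, win_h))  # smaller windows are skipped
        factor *= scale_factor
    return sizes


print(scanned_window_sizes((100, 100), scale_factor=2.0))  # [(24, 24), (48, 48), (96, 96)]
```

Raising minSize removes the smallest windows from the search, which is where many tiny false positives come from.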

Let's set the min size to (0, 0) to see what gets picked up:

In this image, there isn't much else that could be misclassified as an eye, so we only really have two misclassifications: one inside the eye itself, and one on the chin! Depending on the resolution of the image as well as its contents, setting a low size might end up highlighting a good portion of the image incorrectly.

The process is the same for detecting any other kind of object: you load in the correctly trained classifier, run detectMultiScale() and draw on top of the detected_objects.

It's worth noting that you can combine multiple classifiers! For instance, you could detect the frontal face, eyes and smile of an individual separately and draw on them. Let's load in these classifiers and use the same image with different colors for each object type:

```python
import cv2

image_path = "face_image.jpg"
window_name = f"Detected Objects in {image_path}"
original_image = cv2.imread(image_path)

# Convert the image to grayscale for easier computation
image_grey = cv2.cvtColor(original_image, cv2.COLOR_BGR2GRAY)

# Load one pretrained classifier per object type from OpenCV's bundled cascades
eye_classifier = cv2.CascadeClassifier(
    f"{cv2.data.haarcascades}haarcascade_eye.xml")
face_classifier = cv2.CascadeClassifier(
    f"{cv2.data.haarcascades}haarcascade_frontalface_default.xml")
smile_classifier = cv2.CascadeClassifier(
    f"{cv2.data.haarcascades}haarcascade_smile.xml")

detected_eyes = eye_classifier.detectMultiScale(image_grey, minSize=(50, 50))
detected_face = face_classifier.detectMultiScale(image_grey, minSize=(50, 50))
detected_smile = smile_classifier.detectMultiScale(image_grey, minSize=(200, 200))

# Draw green rectangles on eyes
if len(detected_eyes) != 0:
    for (x, y, width, height) in detected_eyes:
        cv2.rectangle(original_image, (x, y),
                      (x + width, y + height),
                      (0, 255, 0), 2)

# Draw blue rectangles on faces
if len(detected_face) != 0:
    for (x, y, width, height) in detected_face:
        cv2.rectangle(original_image, (x, y),
                      (x + width, y + height),
                      (255, 0, 0), 2)

# Draw red rectangles on smiles
if len(detected_smile) != 0:
    for (x, y, width, height) in detected_smile:
        cv2.rectangle(original_image, (x, y),
                      (x + width, y + height),
                      (0, 0, 255), 2)

cv2.namedWindow(window_name, cv2.WINDOW_KEEPRATIO)
cv2.imshow(window_name, original_image)
cv2.resizeWindow(window_name, 400, 400)
cv2.waitKey(0)
cv2.destroyAllWindows()
```


Here, we've loaded in three classifiers: one for smiles, one for eyes and one for faces. Each of them is run on the image, and we draw rectangles around all detected objects, coloring the rectangles by the object's class:
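The three drawing loops differ only in the detections and the color, so they can be folded into one. Below is a sketch with a hypothetical rectangles_to_draw() helper and a label-to-color mapping (BGR, matching the colors used in this example). Since detectMultiScale() returns an empty tuple when it finds nothing, looping over the results directly is safe even without the len() check:

```python
from typing import Dict, Iterable, Iterator, Sequence, Tuple

Color = Tuple[int, int, int]  # BGR order, as OpenCV expects
Point = Tuple[int, int]

# Colors per class: eyes green, face blue, smile red
CLASS_COLORS: Dict[str, Color] = {
    "eye": (0, 255, 0),
    "face": (255, 0, 0),
    "smile": (0, 0, 255),
}


def rectangles_to_draw(
    detections_by_class: Dict[str, Iterable[Sequence[int]]],
    colors: Dict[str, Color] = CLASS_COLORS,
) -> Iterator[Tuple[Point, Point, Color]]:
    """Yield (top_left, bottom_right, color) for every detection of every class."""
    for label, boxes in detections_by_class.items():
        color = colors[label]
        for (x, y, width, height) in boxes:
            yield (int(x), int(y)), (int(x + width), int(y + height)), color

# Usage with the variables from the example above:
# for top_left, bottom_right, color in rectangles_to_draw(
#         {"eye": detected_eyes, "face": detected_face, "smile": detected_smile}):
#     cv2.rectangle(original_image, top_left, bottom_right, color, 2)
```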

The smile didn't get picked up that well - perhaps because the smile in the image is pretty neutral. It's not a wide smile, which could've thrown the classifier off.

### Object Detection in a Video Using OpenCV

With object detection in images out of the way, let's switch to videos. Videos are just images in quick succession anyway, so much the same process applies. This time, though, it's applied to each frame.

To detect objects in a video, the first step is to load the video file into the program. After loading it, we split the video into individual frames and perform object detection on each one, just like before.
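That "read a frame, check it, process it" pattern can be wrapped in a generator. The sketch below works with anything whose read() returns an (ok, frame) pair, like cv2.VideoCapture; the name iter_frames() and the FakeCapture stand-in are hypothetical, used only so the example runs without a video file:

```python
from typing import Any, Iterator


def iter_frames(capture: Any) -> Iterator[Any]:
    """Yield frames until capture.read() reports failure (e.g. end of file)."""
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        yield frame


class FakeCapture:
    """Hypothetical stand-in for cv2.VideoCapture that returns preset frames."""

    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        if self._frames:
            return True, self._frames.pop(0)
        return False, None


for frame in iter_frames(FakeCapture(["frame 1", "frame 2"])):
    print(frame)  # with a real capture, run detection on `frame` here
```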

For this guide, we'll be using a freely available video of a cat on a tree, saved as cat-on-tree.mp4 locally. The file is free to use, according to the creator of the video, so we're good to go!

Let's first load in the video and display it:

```python
import cv2
import time

video_path = "cat-on-tree.mp4"
window_name = f"Detected Objects in {video_path}"
video = cv2.VideoCapture(video_path)
# Resizable window, so the vertical, long video can be fit to the screen
cv2.namedWindow(window_name, cv2.WINDOW_NORMAL)

while True:
    # read() returns a boolean alongside the image data if it was successful
    ret, frame = video.read()
    # Quit if no image can be read from the video
    if not ret:
        break
    cv2.imshow(window_name, frame)
    # Quit when Esc (key code 27) is pressed
    if cv2.waitKey(1) == 27:
        break
    # Sleep for 1/30 seconds to get roughly 30 frames per second in the output
    time.sleep(1/30)

video.release()
cv2.destroyAllWindows()
```


This code reads the video file and displays its contents until the Esc key is pressed (or the video ends). VideoCapture() reads the video file from the given path; if we pass it 0 instead, it opens the default webcam and reads frames from it. We'll do that later; for now, let's deal with a local video file.
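About the pacing: the loop above both sleeps for 1/30 s and waits 1 ms in waitKey(). An alternative is to pass the whole per-frame delay to waitKey(), computed from the video's own frame rate, which cv2.VideoCapture can report through video.get(cv2.CAP_PROP_FPS). The helper below is a hypothetical sketch; the 30 FPS fallback (some sources report 0) and the optional processing-time correction are our own assumptions. It never returns 0, since waitKey(0) would wait forever:

```python
def frame_delay_ms(fps: float, processing_ms: float = 0.0) -> int:
    """Delay in milliseconds to pass to cv2.waitKey() for roughly `fps` playback."""
    if fps <= 0:
        fps = 30.0  # assumed fallback when the source reports no frame rate
    # Subtract time already spent on detection; never go below 1 ms,
    # because waitKey(0) would block until a key is pressed
    return max(1, round(1000.0 / fps - processing_ms))


print(frame_delay_ms(30))  # 33
```

Inside the loop, this would replace the sleep: if cv2.waitKey(frame_delay_ms(video.get(cv2.CAP_PROP_FPS))) == 27: break.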

Now, we can apply a Haar-Cascade Classifier just like before on each image in the video:

```python
import cv2

video_path = "cat-on-tree.mp4"
window_name = f"Detected Objects in {video_path}"
video = cv2.VideoCapture(video_path)
cv2.namedWindow(window_name, cv2.WINDOW_NORMAL)

# Define classifier once, outside the loop (trained on frontal cat faces)
cascade_classifier = cv2.CascadeClassifier(
    f"{cv2.data.haarcascades}haarcascade_frontalcatface.xml")

while True:
    # read() returns a boolean alongside the image data if it was successful
    ret, frame = video.read()
    # Quit if no image can be read from the video
    if not ret:
        break
    # Greyscale image for classification
    image = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Detect objects
    detected_objects = cascade_classifier.detectMultiScale(
        image, minSize=(50, 50))
    # Draw rectangles
    if len(detected_objects) != 0:
        for (x, y, width, height) in detected_objects:
            cv2.rectangle(
                frame, (x, y), (x + width, y + height), (0, 255, 0), 15)
    # Show image
    cv2.imshow(window_name, frame)

    if cv2.waitKey(1) == 27:
        break

video.release()
cv2.destroyAllWindows()
```


The classifier is trained on frontal images of cats, which means that it can't really detect profiles. For a good portion of the video, the cat is shown in profile, so until it turns its face towards the camera, there are bound to be plenty of misclassifications.

It just so happens that the blurred background has some features that the classifier picks up as possible cat faces. However, once the cat turns its head, the classifier clearly locks onto its face.

This is what it classifies when the cat is looking to the side:

And how it correctly gets the cat when it's facing the camera:

We're really detecting these boxes in real time as the video plays. We could also save these detected objects (again, just a list of numbers), then draw them "offline" on each frame and re-render the video, so the CPU doesn't have to run detection during playback.
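A minimal sketch of that "detect once, draw later" idea, assuming detections are collected per frame index into a dict and stored as JSON (the helper names and file format are our own choice, not an OpenCV feature). The int() casts are there because detectMultiScale() returns NumPy integers, which the json module can't serialize directly:

```python
import json
from typing import Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def save_detections(path: str, detections_per_frame: Dict[int, Sequence[Sequence[int]]]) -> None:
    """Write {frame_index: [[x, y, w, h], ...]} to a JSON file."""
    serializable = {str(i): [[int(v) for v in box] for box in boxes]
                    for i, boxes in detections_per_frame.items()}
    with open(path, "w") as f:
        json.dump(serializable, f)


def load_detections(path: str) -> Dict[int, List[Box]]:
    """Read the file back, restoring integer frame indices and tuple boxes."""
    with open(path) as f:
        raw = json.load(f)
    return {int(i): [tuple(box) for box in boxes] for i, boxes in raw.items()}
```

During detection you'd fill the dict with something like detections[frame_index] = detected_objects; a later pass would read the video again, look up each frame's boxes and draw them with cv2.rectangle().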

### Object Detection in Real-Time Using OpenCV

Detecting objects in a real-time video stream is, again, no different from detecting them in video files or images. We've already detected the cat face in real time in the video, though the video was local.

Let's get a video stream from a webcam! To take input from the webcam, we have to make a slight change to the VideoCapture() call. As mentioned earlier, instead of giving it a file path, we give it a device index (in most cases 0, when you have one webcam):

```python
import cv2

window_name = "Detected Objects in webcam"
video = cv2.VideoCapture(0)  # 0 = the first (default) camera

while video.isOpened():
    ret, frame = video.read()
    if not ret:
        break
    cv2.imshow(window_name, frame)
    if cv2.waitKey(1) == 27:  # Esc quits
        break

video.release()
cv2.destroyAllWindows()
```


Note: On macOS, you may have to give the Terminal or program running the Terminal permissions to use the webcam before this works.

Now, to perform real-time object detection, we can follow the same approach as with the video file: split the stream into frames, detect objects frame by frame and display the results as we go:

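```python
import cv2

window_name = "Detected Objects in webcam"
video = cv2.VideoCapture(0)

# Frontal face classifier, loaded once before the loop
cascade_classifier = cv2.CascadeClassifier(
    f"{cv2.data.haarcascades}haarcascade_frontalface_default.xml")

while video.isOpened():
    ret, frame = video.read()
    if not ret:
        break

    image = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    detected_objects = cascade_classifier.detectMultiScale(
        image, minSize=(20, 20))

    if len(detected_objects) != 0:
        for (x, y, width, height) in detected_objects:
            cv2.rectangle(
                frame, (x, y), (x + width, y + height), (0, 255, 0), 5)
    cv2.imshow(window_name, frame)

    if cv2.waitKey(1) == 27:
        break

video.release()
cv2.destroyAllWindows()
```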


When you run the above code, a window will pop up streaming from your webcam, and you'll see a rectangle highlighting your face! This code will most likely run faster than the previous one, as webcams generally don't have very high resolution, so their frames are much less computationally expensive to process.
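Since smaller images are cheaper to process, a common speed-up (not something the code above does) is to run detection on a downscaled copy of each frame and map the boxes back to full size before drawing. A minimal sketch, assuming a hypothetical scale_boxes() helper; in a real loop the small frame could come from cv2.resize(frame, None, fx=0.5, fy=0.5). Keep in mind that minSize then applies to the downscaled frame:

```python
from typing import List, Sequence, Tuple


def scale_boxes(boxes: Sequence[Sequence[float]], factor: float) -> List[Tuple[int, ...]]:
    """Map (x, y, width, height) boxes found on a frame resized by `factor`
    back to the coordinates of the original frame."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [tuple(int(round(v / factor)) for v in box) for box in boxes]


print(scale_boxes([(10, 20, 30, 40)], 0.5))  # [(20, 40, 60, 80)]
```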

It helps if you're sitting in a well-lit room, or if you at least have a light source directed towards your face.

### Conclusion

In this guide, we've used OpenCV to perform Object Detection in Python, using the Haar-Cascade Classifier.

We've been introduced to the classifier and to Haar features, and performed object detection on images, on videos in real time, and on a live video stream from a webcam!

The next step in object detection using OpenCV is to explore other detectors, such as YOLO and MobileNetV3-based models, since the accuracy you get from Haar Cascades is lackluster compared to deep neural network alternatives.

Last Updated: January 4th, 2022
