OpenCV Adaptive Thresholding in Python with cv2.adaptiveThreshold()

Introduction

Thresholding is a simple and efficient technique to perform basic segmentation in an image, and to binarize it (turn it into a binary image), where pixels are either 0 or 1 (or 0 or 255, if you're representing them as 8-bit integers).

Typically, you can use thresholding to perform simple background-foreground segmentation in an image, and it boils down to variants on a simple technique for each pixel:

if pixel_value > threshold:
    pixel_value = MAX
else:
    pixel_value = 0
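The per-pixel rule above can be sketched directly with NumPy, which applies it to every pixel at once. This is a minimal illustration of the idea, not OpenCV's implementation:

```python
import numpy as np

def binary_threshold(image, threshold, max_value=255):
    # Apply the per-pixel rule: values above the threshold become
    # max_value, everything else becomes 0
    return np.where(image > threshold, max_value, 0).astype(np.uint8)

# A tiny synthetic "image" to illustrate the rule
pixels = np.array([[12, 200],
                   [130, 90]], dtype=np.uint8)
print(binary_threshold(pixels, 127))
# 12 and 90 fall below the threshold (-> 0), 200 and 130 exceed it (-> 255)
```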

Advice: This essential process is known as Binary Thresholding. There are various ways you can tweak this general idea, and we've covered them in the previous guide - "OpenCV Thresholding in Python with cv2.threshold()".

Simple thresholding has glaring issues and requires fairly pristine input, which makes it not-so-practical for many use cases. The main offender is a global threshold which is applied to the entire image, whereas images are rarely uniform enough for blanket thresholds to work, unless they're artificial.

A global threshold would work well for separating characters on the scanned pages of a black-and-white book. It will very likely fail on a phone picture of that same page, since lighting conditions may vary between parts of the page, making a global cut-off point too brittle for real data.

To combat this - we can employ local thresholds, using a technique known as adaptive thresholding. Instead of treating all parts of the image with the same rule, we can change the threshold for each local area with the one that seems fitting for it. This makes thresholding partly invariant to changes in lighting, noise and other factors. While much more useful than global thresholding, thresholding itself is a limited, rigid technique, and is best applied for help with image preprocessing (especially when it comes to identifying images to discard), rather than segmentation.

For more delicate applications that require context, you're better off employing more advanced techniques, including deep learning, which has been driving the recent advancements in computer vision.

Advice: If you'd like to learn more about multi-class semantic segmentation with Deep Learning - you can enroll in our DeepLabV3+ Semantic Segmentation with Keras!

Adaptive Thresholding with OpenCV

Let's load in an image with variable lighting conditions, where one part of the image is in more focus than another, with the picture being taken from an angle. A picture I took of Harold McGee's "On Food and Cooking" will serve great!

import cv2
import matplotlib.pyplot as plt

img = cv2.imread('book.jpg')
# OpenCV loads images in BGR order, while Matplotlib expects RGB
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(img)

Now, using regular thresholding, we can try to separate out the letters from the background, since there's a clear color difference between them. All paper-color will be treated as the background. Since we don't really know what the threshold should be - let's apply Otsu's method to find a good value, anticipating that the image is somewhat bi-modal (dominated by two colors mostly):

img = cv2.imread('book.jpg')
# Otsu's method requires grayscale images, and blurring helps
# both accentuate the bi-modal colors and remove some noise
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (7, 7), 0)

ret, mask = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print(f'Threshold: {ret}')

fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ax[0].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
# The mask is single-channel, so display it as grayscale
ax[1].imshow(mask, cmap='gray')

Let's take a look at the result:

Ouch. The left part of the text is mainly faded, the shadow around the gutter totally ate a portion of the image, and the text is too saturated! This is an image "in the wild", and blanket rules such as global thresholding don't work well. What should the threshold be? It depends on the part of the image!

The cv2.adaptiveThreshold() method allows us to do exactly this:

cv2.adaptiveThreshold(img, 
                      max_value, 
                      adaptive_method, 
                      threshold_method, 
                      block_size, 
                      C)

The adaptive_method can be cv2.ADAPTIVE_THRESH_MEAN_C or cv2.ADAPTIVE_THRESH_GAUSSIAN_C, where C is the last argument you set. Both of these methods calculate the threshold from the neighborhood of the pixel in question, where the block_size dictates the side length of the square neighborhood to be considered.

ADAPTIVE_THRESH_MEAN_C takes the mean of the neighbors and subtracts C, while ADAPTIVE_THRESH_GAUSSIAN_C takes the Gaussian-weighted sum of the neighbors and subtracts C.
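To make the mean variant concrete, here's a small NumPy sketch of what ADAPTIVE_THRESH_MEAN_C computes for a single pixel: the mean of its block_size x block_size neighborhood, minus C. The function name and toy image are illustrative, not part of OpenCV:

```python
import numpy as np

def mean_local_threshold(image, x, y, block_size=3, C=2):
    # The local threshold for pixel (y, x) is the mean of its
    # block_size x block_size neighborhood, minus the constant C
    half = block_size // 2
    neighborhood = image[y - half:y + half + 1, x - half:x + half + 1]
    return neighborhood.mean() - C

img = np.array([[10, 10, 10],
                [10, 100, 10],
                [10, 10, 10]], dtype=np.uint8)

t = mean_local_threshold(img, 1, 1)  # mean of all 9 pixels (20.0) minus C
print(t)  # 18.0
print(img[1, 1] > t)  # True - the bright center pixel counts as foreground
```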

It also allows you to set a binarization strategy, but is limited to THRESH_BINARY and THRESH_BINARY_INV, and changing between them will effectively switch what's "background" and what's "foreground".
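The background/foreground swap can be illustrated with plain NumPy: THRESH_BINARY_INV produces exactly the complement of the THRESH_BINARY mask. The fixed threshold below is just for illustration:

```python
import numpy as np

image = np.array([[30, 200],
                  [90, 150]], dtype=np.uint8)
threshold = 120  # a fixed threshold, just for this illustration

# THRESH_BINARY: pixels above the threshold become foreground (255)
binary = np.where(image > threshold, 255, 0).astype(np.uint8)
# THRESH_BINARY_INV: the same rule with the output values swapped
binary_inv = np.where(image > threshold, 0, 255).astype(np.uint8)

print(binary)
print(binary_inv)  # the exact complement: binary_inv == 255 - binary
```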

The method just returns the mask for the image - not the return code and the mask. Let's try segmenting the characters in the same image as before, using adaptive thresholding:

# Read and prepare image
img = cv2.imread('book.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (7, 7), 0)

# Apply adaptive thresholding
mask = cv2.adaptiveThreshold(blurred, 
                              255, 
                              cv2.ADAPTIVE_THRESH_MEAN_C, 
                              cv2.THRESH_BINARY, 
                              31, 
                              10)

# Plot results
fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ax[0].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
# The mask is single-channel, so display it as grayscale
ax[1].imshow(mask, cmap='gray')
plt.tight_layout()

This results in a much cleaner image:

Note: The block_size argument must be an odd number greater than 1.

In much the same way, we can apply Gaussian thresholding:

mask = cv2.adaptiveThreshold(blurred, 
                              255, 
                              cv2.ADAPTIVE_THRESH_GAUSSIAN_C, 
                              cv2.THRESH_BINARY, 
                              31, 
                              10)

Which also produces a pretty satisfactory image in the end:

Both the block size (neighborhood area) and C are hyperparameters to tune here. Try out different values and choose the combination that works best on your image. In general, Gaussian thresholding is less sensitive to noise and will produce slightly fainter but cleaner images, though this varies and depends on the input.

Limitations of Adaptive Thresholding

With adaptive thresholding, we were able to avoid the overarching limitation of thresholding, but it's still relatively rigid and doesn't work great for colorful inputs. For example, if we load in an image of scissors and a small kit with differing colors, even adaptive thresholding will have issues truly segmenting it right, with certain dark features being outlined, but without entire objects being considered:

If we tweak the block size and C, we can make it consider larger patches to be part of the same object, but then run into issues with making the neighbor sizes too global, falling back to the same overarching issues with global thresholding:

Conclusion

In recent years, binary segmentation (like what we did here) and multi-label segmentation (where you can have an arbitrary number of classes encoded) have been successfully modeled with deep learning networks, which are much more powerful and flexible. In addition, they can encode global and local context into the images they're segmenting. The downside is - you need data to train them, as well as time and expertise.

For on-the-fly, simple thresholding, you can use OpenCV, and battle some of the limitations using adaptive thresholding rather than global thresholding strategies. For accurate, production-level segmentation, you'll want to use neural networks.

Last Updated: November 16th, 2023
David Landup, Author
