Introduction
Thresholding is a simple and efficient technique for performing basic segmentation in an image, and for binarizing it (turning it into a binary image) where pixels are either 0 or 1 (or 255, if you're using integers to represent them).
Typically, you can use thresholding to perform simple background-foreground segmentation in an image, and it boils down to variants on a simple technique for each pixel:
if pixel_value > threshold:
    pixel_value = MAX
else:
    pixel_value = 0
Advice: This essential process is known as Binary Thresholding. There are various ways you can tweak this general idea, and we've covered them in the previous guide - "OpenCV Thresholding in Python with cv2.threshold()".
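The per-pixel rule above is naturally vectorized in NumPy - a minimal sketch, where the function name and the small sample patch are just for illustration:

```python
import numpy as np

def binary_threshold(image, threshold, max_value=255):
    # Pixels above the threshold become max_value, everything else becomes 0
    return np.where(image > threshold, max_value, 0).astype(np.uint8)

# Hypothetical 2x3 grayscale patch
patch = np.array([[10, 120, 200],
                  [90, 130, 40]], dtype=np.uint8)
print(binary_threshold(patch, 100))
# [[  0 255 255]
#  [  0 255   0]]
```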
Simple thresholding has glaring issues and requires fairly pristine input, which makes it not-so-practical for many use cases. The main offender is a global threshold which is applied to the entire image, whereas images are rarely uniform enough for blanket thresholds to work, unless they're artificial.
A global threshold would work well on separating characters in a black and white book, on scanned pages. A global threshold will very likely fail on a phone picture of that same page, since the lighting conditions may be variable between parts of the page, making a global cut-off point too sensitive to real data.
To combat this - we can employ local thresholds, using a technique known as adaptive thresholding. Instead of treating all parts of the image with the same rule, we can change the threshold for each local area with the one that seems fitting for it. This makes thresholding partly invariant to changes in lighting, noise and other factors. While much more useful than global thresholding, thresholding itself is a limited, rigid technique, and is best applied for help with image preprocessing (especially when it comes to identifying images to discard), rather than segmentation.
For more delicate applications that require context, you're better off employing more advanced techniques, including deep learning, which has been driving the recent advancements in computer vision.
Advice: If you'd like to learn more about multi-class semantic segmentation with Deep Learning - you can enroll in our DeepLabV3+ Semantic Segmentation with Keras!
Adaptive Thresholding with OpenCV
Let's load in an image with variable lighting conditions, where one part of the image is in more focus than another, with the picture being taken from an angle. A picture I took of Harold McGee's "On Food and Cooking" will serve great!
import cv2
import matplotlib.pyplot as plt

img = cv2.imread('book.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(img)
Now, using regular thresholding, we can try to separate out the letters from the background, since there's a clear color difference between them. All paper-color will be treated as the background. Since we don't really know what the threshold should be - let's apply Otsu's method to find a good value, anticipating that the image is somewhat bi-modal (dominated by two colors mostly):
img = cv2.imread('book.jpg')
# Otsu's method requires grayscale images, and blurring helps
# accentuate the bi-modal distribution while also removing some noise
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (7, 7), 0)
ret, mask = cv2.threshold(blurred, 0, 255, cv2.THRESH_OTSU)
print(f'Threshold: {ret}')
fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ax[0].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
ax[1].imshow(mask, cmap='gray')
Let's take a look at the result:
Ouch. The left part of the text is mainly faded, the shadow around the gutter totally ate a portion of the image, and the text is too saturated! This is an image "in the wild", and blanket rules such as global thresholding don't work well. What should the threshold be? It depends on the part of the image!
The cv2.adaptiveThreshold() method allows us to do exactly this:
cv2.adaptiveThreshold(img,
                      max_value,
                      adaptive_method,
                      threshold_method,
                      block_size,
                      C)
The adaptive_method can be cv2.ADAPTIVE_THRESH_MEAN_C or cv2.ADAPTIVE_THRESH_GAUSSIAN_C, where C is the last argument you set. Both of these methods calculate the threshold from the neighbors of the pixel in question, where block_size dictates the size of the neighborhood to be considered. ADAPTIVE_THRESH_MEAN_C takes the mean of the neighbors and subtracts C, while ADAPTIVE_THRESH_GAUSSIAN_C takes the Gaussian-weighted sum of the neighbors and subtracts C.
It also allows you to set a binarization strategy, but it's limited to THRESH_BINARY and THRESH_BINARY_INV - switching between them effectively swaps what's "background" and what's "foreground".
The method returns just the mask for the image - not a return code plus the mask. Let's try segmenting the characters in the same image as before, using adaptive thresholding:
# Read and prepare image
img = cv2.imread('book.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (7, 7), 0)
# Apply adaptive thresholding
mask = cv2.adaptiveThreshold(blurred,
                             255,
                             cv2.ADAPTIVE_THRESH_MEAN_C,
                             cv2.THRESH_BINARY,
                             31,
                             10)
# Plot results
fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ax[0].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
ax[1].imshow(mask, cmap='gray')
plt.tight_layout()
This results in a much cleaner image:
Note: The block_size argument must be an odd number greater than 1.
In much the same way, we can apply Gaussian thresholding:
mask = cv2.adaptiveThreshold(blurred,
                             255,
                             cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                             cv2.THRESH_BINARY,
                             31,
                             10)
Which also produces a pretty satisfactory image in the end:
Both the block size (neighborhood area) and C are hyperparameters to tune here. Try out different values and choose the combination that works best on your image. In general, Gaussian thresholding is less sensitive to noise and tends to produce slightly smoother, cleaner images, but this varies and depends on the input.
Limitations of Adaptive Thresholding
With adaptive thresholding, we were able to avoid the overarching limitation of thresholding, but it's still relatively rigid and doesn't work great for colorful inputs. For example, if we load in an image of scissors and a small kit with differing colors, even adaptive thresholding will have issues truly segmenting it right, with certain dark features being outlined, but without entire objects being considered:
If we tweak the block size and C, we can make it consider larger patches to be part of the same object - but then we run into issues with making the neighborhood sizes too global, falling back to the same overarching issues of global thresholding:
Conclusion
In recent years, binary segmentation (like what we did here) and multi-label segmentation (where you can have an arbitrary number of classes encoded) have been successfully modeled with deep learning networks, which are much more powerful and flexible. In addition, they can encode global and local context into the images they're segmenting. The downside is - you need data to train them, as well as time and expertise.
For on-the-fly, simple thresholding, you can use OpenCV, and battle some of the limitations using adaptive thresholding rather than global thresholding strategies. For accurate, production-level segmentation, you'll want to use neural networks.