Simple NLP in Python with TextBlob: Parts of Speech (PoS) Tagging

Introduction

In the field of Natural Language Processing (NLP), one of the fundamental tasks is Parts of Speech (PoS) tagging. PoS tagging involves assigning grammatical categories, like nouns, verbs, adjectives, etc., to words in a sentence. This process plays an important role in many NLP applications, including text analysis, information retrieval, and machine translation.

In this article, we will explore how to perform PoS tagging using the TextBlob library in Python.

What is Part of Speech (PoS) Tagging?

Part of Speech tagging is the process of labeling words in a sentence with their respective grammatical categories. Each word is assigned a tag based on its syntactic role and function within the sentence. These PoS tags provide valuable information about the word's behavior and its relationship with other words in the sentence.

For example, consider the sentence: "The cat is sleeping." Here, "The" is a determiner, "cat" is a noun, and "is" and "sleeping" are both verbs (an auxiliary and a present participle, respectively).

Installing TextBlob

Before we dive into PoS tagging with TextBlob, let's make sure that we have the necessary libraries installed. To install TextBlob, you can use the following command:

$ pip install textblob

Additionally, we need to install the required language resources by running the following command:

$ python -m textblob.download_corpora

Preprocessing Steps

Before performing PoS tagging, it can help to preprocess the text by removing unnecessary elements and normalizing the words. Typical preprocessing steps include removing punctuation, converting text to lowercase, and handling contractions. Expanding contractions is a form of normalization, such as converting "can't" to "cannot". Keep in mind that taggers use capitalization and punctuation as cues, so apply these steps only where they suit your data.

Let's take a look at an example to understand these steps better:

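import re

# A small, illustrative contraction map; extend it for your own data
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "i'm": "i am"}

def preprocess_text(text):
    # Convert to lowercase
    text = text.lower()

    # Expand contractions (e.g., "can't" becomes "cannot")
    # This must happen before the apostrophes are stripped
    for contraction, expansion in CONTRACTIONS.items():
        text = text.replace(contraction, expansion)

    # Remove punctuation
    text = re.sub(r'[^\w\s]', '', text)

    return text

# Example usage
sentence = "I can't wait to see the movie!"
preprocessed_sentence = preprocess_text(sentence)
print(preprocessed_sentence)

In the above code, we define a preprocess_text function that takes a sentence as input and performs the preprocessing steps. The function converts the text to lowercase, expands contractions using a small lookup table, and then removes punctuation using a regular expression. The order matters: once the apostrophe is stripped, "can't" becomes "cant" and can no longer be matched. TextBlob's correct method is not a substitute for this step, because it performs spelling correction and can turn "cant" into "can", which reverses the meaning of the sentence. This preprocessing results in the following output:

i cannot wait to see the movie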

Meaning Mapping Table of PoS Tags

To understand the PoS tags assigned by TextBlob, it's helpful to have a "meaning mapping" table that describes each tag and gives a few examples. The tags follow the Penn Treebank convention:

| Tag | Description | Examples |
|------|-------------|----------|
| CC | Coordinating conjunction | and, or, but |
| CD | Cardinal number | 1, 2, 3 |
| DT | Determiner | the, a, an |
| EX | Existential there | there |
| FW | Foreign word | bonjour, hola |
| IN | Preposition/subordinating conjunction | in, on, after |
| JJ | Adjective | beautiful, happy |
| JJR | Adjective, comparative | bigger, stronger |
| JJS | Adjective, superlative | biggest, strongest |
| LS | List item marker | 1., 2., a) |
| MD | Modal | can, could, may |
| NN | Noun, singular or mass | cat, dog, happiness |
| NNS | Noun, plural | cats, dogs, books |
| NNP | Proper noun, singular | John, London, Google |
| NNPS | Proper noun, plural | Smiths, Americans, Alps |
| PDT | Predeterminer | all, both, half |
| POS | Possessive ending | 's, ' |
| PRP | Personal pronoun | I, you, he |
| PRP$ | Possessive pronoun | my, your, his |
| RB | Adverb | quickly, very |
| RBR | Adverb, comparative | faster, earlier |
| RBS | Adverb, superlative | fastest, best |
| RP | Particle | up, off, down |
| SYM | Symbol | $, %, + |
| TO | to | to |
| UH | Interjection | oh, wow, hey |
| VB | Verb, base form | eat, run, play |
| VBD | Verb, past tense | ate, ran, played |
| VBG | Verb, gerund or present participle | eating, running, playing |
| VBN | Verb, past participle | eaten, run, played |
| VBP | Verb, non-3rd person singular present | eat, run, play |
| VBZ | Verb, 3rd person singular present | eats, runs, plays |
| WDT | Wh-determiner | which, what |
| WP | Wh-pronoun | who, what, whom |
| WP$ | Possessive wh-pronoun | whose |
| WRB | Wh-adverb | where, when, how |
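
In code, a table like this is easy to keep as a dictionary, so that tags can be printed with a readable description. The following is a minimal sketch that copies only a few rows from the table above; the names TAG_DESCRIPTIONS and describe_tag are our own and are not part of TextBlob:

```python
# A few rows copied from the table above; extend as needed.
# TAG_DESCRIPTIONS and describe_tag are illustrative names, not TextBlob APIs.
TAG_DESCRIPTIONS = {
    "DT": "Determiner",
    "NN": "Noun, singular or mass",
    "VBZ": "Verb, 3rd person singular present",
    "VBG": "Verb, gerund or present participle",
}

def describe_tag(tag):
    """Return a readable description, or a fallback for tags not in the map."""
    return TAG_DESCRIPTIONS.get(tag, "Unknown tag")

for tag in ["DT", "NN", "VBZ", "XYZ"]:
    print(tag, "-", describe_tag(tag))
```

The fallback value keeps the lookup from raising a KeyError when a tag has not been added to the dictionary yet.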

Basic Implementation (Extracting All PoS)

Now, let's explore the basic implementation of PoS tagging using TextBlob. The following code snippet demonstrates how to extract and print all the PoS tags from a given sentence:

from textblob import TextBlob

sentence = "The cat is sleeping."
blob = TextBlob(sentence)

for word, tag in blob.tags:
    print(word, "-", tag)

In the code above, we create a TextBlob object by passing the sentence to it. Then, we iterate over each word and its corresponding PoS tag using the tags property of the TextBlob object, printing each word and its tag on its own line. Note that the final period does not appear in the result. The output would be the following:

The - DT
cat - NN
is - VBZ
sleeping - VBG
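
Once the tags are available as (word, tag) pairs, simple summaries are a short step away. The sketch below counts how often each tag occurs. It hard-codes the pairs from the output above instead of calling TextBlob, so it runs without the library installed; the variable names tagged and tag_counts are illustrative.

```python
from collections import Counter

# (word, tag) pairs copied from the output above.
# In practice, this list would come from blob.tags.
tagged = [("The", "DT"), ("cat", "NN"), ("is", "VBZ"), ("sleeping", "VBG")]

# Count how often each tag occurs in the sentence
tag_counts = Counter(tag for _, tag in tagged)

for tag, count in tag_counts.most_common():
    print(tag, "-", count)
```

On longer texts, a count like this gives a quick picture of how noun-heavy or verb-heavy a passage is.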

Advanced Implementation (Selective Extraction of PoS)

In some cases, we may only be interested in extracting specific PoS tags. Using the data returned by TextBlob, we can perform selective extraction by specifying the desired tags and filtering on those.

Here's an example:

from textblob import TextBlob

sentence = "The cat is sleeping."
blob = TextBlob(sentence)

desired_tags = ["NN", "VB"]
selected_words = [word for word, tag in blob.tags if tag in desired_tags]

print(selected_words)

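In the above code, we define a list of desired PoS tags (desired_tags). We then create a new list (selected_words) using a list comprehension that keeps only the words whose tag appears in desired_tags. The comparison is an exact match, so "is" (VBZ) and "sleeping" (VBG) are not selected even though both are verbs; only the base-form tag VB was requested. Finally, we print the selected words: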

['cat']
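
If the goal is to select every noun or every verb regardless of its exact form, one option is to match on the tag prefix instead of the full tag. This is a sketch of that idea, not something TextBlob provides directly. It reuses the (word, tag) pairs from the earlier output so it runs on its own, and desired_prefixes is a hypothetical name:

```python
# (word, tag) pairs as returned by blob.tags for "The cat is sleeping."
tagged = [("The", "DT"), ("cat", "NN"), ("is", "VBZ"), ("sleeping", "VBG")]

# "NN" covers NN, NNS, NNP and NNPS; "VB" covers VB, VBD, VBG, VBN, VBP and VBZ
desired_prefixes = ("NN", "VB")

# str.startswith accepts a tuple and returns True if any prefix matches
selected_words = [word for word, tag in tagged if tag.startswith(desired_prefixes)]

print(selected_words)  # ['cat', 'is', 'sleeping']
```

Prefix matching works here because the Penn Treebank tags group related categories under a shared leading pair of letters.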

Drawbacks and Improvements

While TextBlob provides a convenient way to perform PoS tagging, it is important to note that it may not always produce perfect results. The accuracy of PoS tagging heavily relies on the quality of the underlying model and the context of the text being analyzed. In cases where high precision is required, more advanced techniques and models may be necessary.

To improve the accuracy of PoS tagging, you can consider using other libraries, such as spaCy or NLTK, which offer more sophisticated PoS tagging models. Additionally, fine-tuning or training custom models on domain-specific data can help improve the results for specific tasks.
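
One option that stays inside TextBlob is to swap the tagger. The TextBlob documentation describes an NLTKTagger class that can be passed through the pos_tagger argument. Below is a minimal sketch, assuming TextBlob is installed and NLTK's tagger data is available locally (the download_corpora step should fetch it); the function name tag_with_nltk is our own:

```python
def tag_with_nltk(text):
    """Tag `text` with TextBlob's NLTK-backed tagger instead of the default one."""
    # Imported inside the function so this sketch can be loaded even where
    # TextBlob or the tagger's data has not been installed yet.
    from textblob import TextBlob
    from textblob.taggers import NLTKTagger

    blob = TextBlob(text, pos_tagger=NLTKTagger())
    return blob.tags

# Usage (requires TextBlob and its corpora):
# print(tag_with_nltk("The cat is sleeping."))
```

The two taggers may disagree on some words, so it is worth comparing them on a sample of your own text before choosing one.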

Conclusion

PoS tagging is a fundamental task in Natural Language Processing, and TextBlob provides a simple and accessible way to perform PoS tagging in Python. In this article, we explored the concept of PoS tagging, learned how to install the necessary libraries, preprocess the text, and implemented basic and advanced PoS tagging using TextBlob.

We also discussed the meaning mapping table of PoS tags and highlighted the drawbacks and potential improvements. Armed with this knowledge, you can now apply PoS tagging in your NLP projects to extract grammatical structure from text data.

Last Updated: June 12th, 2023

© 2013-2024 Stack Abuse. All rights reserved.
