Simple NLP in Python with TextBlob: Parts of Speech (PoS) Tagging

Introduction

In the field of Natural Language Processing (NLP), one of the fundamental tasks is Parts of Speech (PoS) tagging. PoS tagging involves assigning grammatical categories, like nouns, verbs, adjectives, etc., to words in a sentence. This process plays an important role in many NLP applications, including text analysis, information retrieval, and machine translation.

In this article, we will explore how to perform PoS tagging using the TextBlob library in Python.

What is Part of Speech (PoS) Tagging?

Part of Speech tagging is the process of labeling words in a sentence with their respective grammatical categories. Each word is assigned a tag based on its syntactic role and function within the sentence. These PoS tags provide valuable information about the word's behavior and its relationship with other words in the sentence.

For example, consider the sentence: "The cat is sleeping." Here, "The" is a determiner, "cat" is a noun, and "is" and "sleeping" are both verbs.

Installing TextBlob

Before we dive into PoS tagging with TextBlob, let's make sure that we have the necessary libraries installed. To install TextBlob, you can use the following command:

$ pip install textblob

Additionally, we need to install the required language resources by running the following command:

$ python -m textblob.download_corpora
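
To confirm that the installation worked before going further, a quick check from Python can help. This is a minimal sketch using only the standard library; the helper name is_installed is an illustrative assumption and not part of TextBlob:

```python
import importlib.util


def is_installed(package_name):
    # find_spec returns None when a top-level package cannot be found
    return importlib.util.find_spec(package_name) is not None


# Prints True if TextBlob is importable in the current environment
print(is_installed("textblob"))
```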

Preprocessing Steps

Before performing PoS tagging, we may want to preprocess the text by removing unnecessary elements and normalizing the words. Typical preprocessing steps include removing punctuation, converting text to lowercase, and handling contractions. Expanding contractions is a form of normalization, such as converting "can't" to "cannot". Keep in mind that taggers use capitalization and punctuation as cues, so aggressive normalization can change the tags that are assigned.

Let's take a look at an example to understand these steps better:

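import re

# Small illustrative mapping; extend it to cover the contractions in your data
CONTRACTIONS = {"can't": "cannot", "won't": "will not"}

def preprocess_text(text):
    # Convert to lowercase
    text = text.lower()

    # Expand contractions (e.g., "can't" becomes "cannot") while the apostrophes are still present
    for contraction, expansion in CONTRACTIONS.items():
        text = text.replace(contraction, expansion)

    # Remove punctuation
    text = re.sub(r'[^\w\s]', '', text)

    return text

# Example usage
sentence = "I can't wait to see the movie!"
preprocessed_sentence = preprocess_text(sentence)
print(preprocessed_sentence)

In the above code, we define a preprocess_text function that takes a sentence as input and performs the preprocessing steps. The function converts the text to lowercase, expands contractions using a small lookup table, and then removes punctuation using a regular expression. The order matters: contractions must be expanded before the apostrophes are stripped. Note that TextBlob's correct method performs spelling correction and does not expand contractions, which is why an explicit mapping is used here. This preprocessing results in the following output:

i cannot wait to see the movie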
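
Contraction expansion can also be sketched as a standalone helper that matches whole words with a regular expression, which avoids rewriting text inside longer words and tolerates curly apostrophes. The mapping below is a deliberately small, hypothetical example, and the names CONTRACTIONS and expand_contractions are illustrative assumptions:

```python
import re

# Hypothetical, deliberately small mapping; extend it for your own data
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "don't": "do not",
    "isn't": "is not",
}

# One pattern that matches any key as a whole word, ignoring case
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)


def expand_contractions(text):
    # Normalize curly apostrophes so both apostrophe styles are treated alike
    text = text.replace("\u2019", "'")
    # Look up each match by its lowercased form and substitute the expansion
    return _PATTERN.sub(lambda match: CONTRACTIONS[match.group(0).lower()], text)


print(expand_contractions("I can't wait to see the movie!"))
```

Because the lookup uses the lowercased match, a capitalized contraction such as "Can't" is replaced with the lowercase expansion; that is acceptable when the text is lowercased anyway, but worth adjusting if case needs to be preserved.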

Meaning Mapping Table of PoS Tags

To interpret the PoS tags assigned by TextBlob, which follow the Penn Treebank tag set, it's helpful to have a reference table that describes each tag and gives a few examples:

Tag	Description	Examples
CC	Coordinating conjunction	and, or, but
CD	Cardinal number	1, 2, 3
DT	Determiner	the, a, an
EX	Existential there	there
FW	Foreign word	bonjour, hola
IN	Preposition/subordinating conjunction	in, on, after
JJ	Adjective	beautiful, happy
JJR	Adjective, comparative	bigger, stronger
JJS	Adjective, superlative	biggest, strongest
LS	List item marker	1., 2., a)
MD	Modal	can, could, may
NN	Noun, singular or mass	cat, dog, happiness
NNS	Noun, plural	cats, dogs, books
NNP	Proper noun, singular	John, London, Google
NNPS	Proper noun, plural	Smiths, Americans, Alps
PDT	Predeterminer	all, both, half
POS	Possessive ending	's, '
PRP	Personal pronoun	I, you, he
PRP$	Possessive pronoun	my, your, his
RB	Adverb	quickly, very
RBR	Adverb, comparative	faster, harder, better
RBS	Adverb, superlative	fastest, hardest, best
RP	Particle	up, off, down
SYM	Symbol	$, %, +
TO	to	to
UH	Interjection	oh, wow, hey
VB	Verb, base form	eat, run, play
VBD	Verb, past tense	ate, ran, played
VBG	Verb, gerund or present participle	eating, running, playing
VBN	Verb, past participle	eaten, run, played
VBP	Verb, non-3rd person singular present	eat, run, play
VBZ	Verb, 3rd person singular present	eats, runs, plays
WDT	Wh-determiner	which, what
WP	Wh-pronoun	who, what, whom
WP$	Possessive wh-pronoun	whose
WRB	Wh-adverb	where, when, how
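
In code, a table like this is often kept as a dictionary so that tags can be translated into readable descriptions. The following is a minimal sketch covering only a handful of the rows above; the names TAG_DESCRIPTIONS and describe_tag are illustrative assumptions, not part of TextBlob:

```python
# A subset of the table above, keyed by tag
TAG_DESCRIPTIONS = {
    "DT": "Determiner",
    "JJ": "Adjective",
    "NN": "Noun, singular or mass",
    "NNS": "Noun, plural",
    "VBG": "Verb, gerund or present participle",
    "VBZ": "Verb, 3rd person singular present",
}


def describe_tag(tag):
    # Fall back to a placeholder for tags missing from the mapping
    return TAG_DESCRIPTIONS.get(tag, "Unknown tag")


print(describe_tag("VBZ"))
print(describe_tag("XYZ"))
```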

Basic Implementation (Extracting All PoS)

Now, let's explore the basic implementation of PoS tagging using TextBlob. The following code snippet demonstrates how to extract and print all the PoS tags from a given sentence:

from textblob import TextBlob

sentence = "The cat is sleeping."
blob = TextBlob(sentence)

for word, tag in blob.tags:
    print(word, "-", tag)

In the code above, we create a TextBlob object by passing the sentence to it. Then, we iterate over each word and its corresponding PoS tag using the tags property of the TextBlob object, printing each word together with its tag on its own line. Note that the final period does not appear, because the tags property leaves out punctuation. The output would be the following:

The - DT
cat - NN
is - VBZ
sleeping - VBG
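
Since the result is a list of (word, tag) pairs, it is easy to summarize. The sketch below hard-codes the pairs from the output above so that it runs without TextBlob, and counts tags by their first two letters as a rough grouping (so VBZ and VBG both count as VB). The two-letter grouping is a heuristic assumption, and coarse_counts is a hypothetical helper name:

```python
from collections import Counter

# The (word, tag) pairs from the output above, hard-coded for illustration
tagged = [("The", "DT"), ("cat", "NN"), ("is", "VBZ"), ("sleeping", "VBG")]


def coarse_counts(pairs):
    # Group tags by their first two characters, e.g. VBZ and VBG -> VB
    return Counter(tag[:2] for _, tag in pairs)


print(coarse_counts(tagged))
```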

Advanced Implementation (Selective Extraction of PoS)

In some cases, we may only be interested in extracting specific PoS tags. Using the data returned by TextBlob, we can perform selective extraction by specifying the desired tags and filtering on those.

Here's an example:

from textblob import TextBlob

sentence = "The cat is sleeping."
blob = TextBlob(sentence)

desired_tags = ["NN", "VB"]
selected_words = [word for word, tag in blob.tags if tag in desired_tags]

print(selected_words)

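In the above code, we define a list of desired PoS tags (desired_tags). We then create a new list (selected_words) using a list comprehension that keeps only the words whose tag appears in that list. The comparison is exact, so "is" (VBZ) and "sleeping" (VBG) are not selected even though they are verbs; only "cat" (NN) matches. Finally, we print the selected words: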

['cat']
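
Because the filter in the example above compares tags exactly, a whole family of tags (for instance, every verb form) has to be listed tag by tag. One alternative, sketched here under the assumption that matching on the tag prefix is acceptable, is to use str.startswith. The pairs are hard-coded from the earlier output so the snippet runs on its own, and select_by_prefix is a hypothetical helper name:

```python
# The (word, tag) pairs produced for "The cat is sleeping."
tagged = [("The", "DT"), ("cat", "NN"), ("is", "VBZ"), ("sleeping", "VBG")]


def select_by_prefix(pairs, prefixes):
    # str.startswith accepts a tuple, so one call checks every prefix
    prefixes = tuple(prefixes)
    return [word for word, tag in pairs if tag.startswith(prefixes)]


print(select_by_prefix(tagged, ["NN", "VB"]))
```

With the prefixes NN and VB, this selects "cat", "is", and "sleeping", since VBZ and VBG both start with VB.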

Drawbacks and Improvements

While TextBlob provides a convenient way to perform PoS tagging, it is important to note that it may not always produce perfect results. The accuracy of PoS tagging heavily relies on the quality of the underlying model and the context of the text being analyzed. In cases where high precision is required, more advanced techniques and models may be necessary.

To improve the accuracy of PoS tagging, you can consider using other libraries, such as spaCy or NLTK, which give you more direct control over the tagging model and offer a choice of models. Additionally, fine-tuning or training custom models on domain-specific data can help improve the results for specific tasks.
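
When comparing taggers, it helps to measure accuracy against a small hand-labeled sample. The following is a minimal sketch of token-level accuracy; both tag lists are made-up illustrations rather than real tagger output, and tagging_accuracy is a hypothetical helper name:

```python
def tagging_accuracy(predicted, gold):
    # Both sequences must describe the same tokens, position by position
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches / len(gold)


# Hypothetical hand-labeled tags for "The cat is sleeping"
gold = ["DT", "NN", "VBZ", "VBG"]
# Hypothetical tagger output containing one mistake
predicted = ["DT", "NN", "VBZ", "NN"]

print(tagging_accuracy(predicted, gold))  # 0.75
```

Running the same measurement over the output of each candidate library on the same labeled sentences gives a simple basis for choosing between them.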

Conclusion

PoS tagging is a fundamental task in Natural Language Processing, and TextBlob provides a simple and accessible way to perform it in Python. In this article, we explored the concept of PoS tagging, learned how to install the necessary libraries and preprocess the text, and implemented basic and selective PoS tagging using TextBlob.

We also went through the meaning mapping table of PoS tags and highlighted the drawbacks and potential improvements. Armed with this knowledge, you can now use PoS tagging in your NLP projects to extract grammatical information from text data.

Last Updated: June 12th, 2023
