Count Number of Word Occurrences in List Python

Introduction

Counting the frequency of words in a Python list is a relatively common task - especially when creating distribution data for histograms.

Say we have a list ['b', 'b', 'a'] - we have two occurrences of "b" and one of "a". This guide will show you four different ways to count the number of word occurrences in a Python list:

  • Using Pandas and NumPy
  • Using the count() Function
  • Using the Collection Module's Counter
  • Using a Loop and a Counter Variable

In practice, you'll usually reach for Pandas/NumPy, the count() method or a Counter, as they're the most convenient to use.

Using Pandas and NumPy

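The shortest and easiest way to get value counts in an easily-manipulable format (a Pandas Series) is via NumPy and Pandas. Note that the top-level pd.value_counts() function used here is deprecated as of pandas 2.1 in favor of pd.Series(words).value_counts(). We can wrap the list in a NumPy array and pass it to pd.value_counts() (the same operation is also available as the value_counts() method of Series and DataFrame instances):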

import numpy as np
import pandas as pd

words = ['hello', 'goodbye', 'howdy', 'hello', 'hello', 'hi', 'bye']

pd.value_counts(np.array(words))

This results in a Series that contains:

hello      3
goodbye    1
bye        1
howdy      1
hi         1
dtype: int64

You can access its values field to get the counts themselves, or index to get the words themselves:

df = pd.value_counts(np.array(words))

print('Index:', df.index)
print('Values:', df.values)

This results in:

Index: Index(['hello', 'goodbye', 'bye', 'howdy', 'hi'], dtype='object')
Values: [3 1 1 1 1]

Using the count() Function

The "standard" way (no external libraries) to get the count of word occurrences in a list is by using the list object's count() method.

The count() method is built into list objects. It takes an element as its only argument and returns the number of times that element appears in the list.

The complexity of the count() method is O(n), where n is the number of elements in the list.

The code below uses count() to get the number of occurrences for a word in a list:

words = ['hello', 'goodbye', 'howdy', 'hello', 'hello', 'hi', 'bye']

print(f'"hello" appears {words.count("hello")} time(s)')
print(f'"howdy" appears {words.count("howdy")} time(s)')

This gives us the following output:

"hello" appears 3 time(s)
"howdy" appears 1 time(s)

The count() method offers us an easy way to get the number of word occurrences in a list for each individual word.
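To get a count for every word rather than one word at a time, count() can be called once per distinct word. The following is a minimal sketch of that idea; the counts name is only illustrative, and set(words) is used so that repeated words are not scanned for more than once:

```python
words = ['hello', 'goodbye', 'howdy', 'hello', 'hello', 'hi', 'bye']

# One count() call per distinct word; set(words) removes duplicates first.
# Each call still scans the whole list, so this costs roughly
# (number of distinct words) x (length of the list) comparisons.
counts = {word: words.count(word) for word in set(words)}

# Sets are unordered, so sort the items to print in a stable order.
for word, n in sorted(counts.items()):
    print(f'"{word}" appears {n} time(s)')
```

This is fine for short lists, but because every count() call rescans the list, it scales poorly when there are many distinct words.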

Using the Collection Module's Counter

The Counter class can be used to, well, count instances of other objects. By passing a list into its constructor, we instantiate a Counter, which is a dictionary subclass that maps each element to its number of occurrences in the list.

From there, to get a single word's occurrence you can just use the word as a key for the dictionary:

from collections import Counter

words = ['hello', 'goodbye', 'howdy', 'hello', 'hello', 'hi', 'bye']

word_counts = Counter(words)

print(f'"hello" appears {word_counts["hello"]} time(s)')
print(f'"howdy" appears {word_counts["howdy"]} time(s)')

This results in:

"hello" appears 3 time(s)
"howdy" appears 1 time(s)
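Because a Counter is a dictionary subclass with a few extras, two behaviors are worth knowing. The short sketch below shows them on the same list; the word "farewell" is just an illustrative example of a word that is not in the list:

```python
from collections import Counter

words = ['hello', 'goodbye', 'howdy', 'hello', 'hello', 'hi', 'bye']
word_counts = Counter(words)

# Looking up a word that never appears returns 0 instead of raising KeyError.
missing = word_counts["farewell"]

# most_common(n) returns the n highest counts as (word, count) pairs.
top = word_counts.most_common(1)

print(missing)  # 0
print(top)      # [('hello', 3)]
```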

Using a Loop and a Counter Variable

Ultimately, a brute force approach that loops through every word in the list, incrementing a counter by one when the word is found, and returning the total word count will work!

Of course, this method gets slower as the list grows, since every element is compared in interpreted Python code, but it's conceptually easy to understand and implement.

The code below uses this approach in the count_occurrence() function:

def count_occurrence(words, word_to_count):
    count = 0
    for word in words:
        if word == word_to_count:
            # update counter variable
            count += 1
    return count


words = ['hello', 'goodbye', 'howdy', 'hello', 'hello', 'hi', 'bye']
print(f'"hello" appears {count_occurrence(words, "hello")} time(s)')
print(f'"howdy" appears {count_occurrence(words, "howdy")} time(s)')

If you run this code you should see this output:

"hello" appears 3 time(s)
"howdy" appears 1 time(s)

Nice and easy!
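The same loop can be extended to count every word in a single pass by keeping one counter per word in a dictionary. This is a sketch of that variation rather than part of the original approach, and count_all_occurrences() is a hypothetical helper name:

```python
def count_all_occurrences(words):
    """Return a dict mapping each word to how often it appears."""
    counts = {}
    for word in words:
        # dict.get() supplies 0 the first time a word is seen.
        counts[word] = counts.get(word, 0) + 1
    return counts


words = ['hello', 'goodbye', 'howdy', 'hello', 'hello', 'hi', 'bye']
all_counts = count_all_occurrences(words)
print(all_counts)
# {'hello': 3, 'goodbye': 1, 'howdy': 1, 'hi': 1, 'bye': 1}
```

This visits each element once, which is essentially what Counter does for you.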

Most Efficient Solution?

Naturally - you'll be searching for the most efficient solution if you're dealing with a large corpus of words. Let's benchmark all of these to see how they perform.

The task can be broken down into finding occurrences for all words or a single word, and we'll be doing benchmarks for both, starting with all words:

import numpy as np
import pandas as pd
import collections

def pdNumpy(words):
    def _pdNumpy():
        return pd.value_counts(np.array(words))
    return _pdNumpy

def countFunction(words):
    def _countFunction():
        counts = []
        for word in words:
            counts.append(words.count(word))
        return counts
    return _countFunction

def counterObject(words):
    def _counterObject():
        return collections.Counter(words)
    return _counterObject
    
import timeit

words = ['hello', 'goodbye', 'howdy', 'hello', 'hello', 'hi', 'bye']

print("Time to execute:\n")
print("Pandas/NumPy: %ss" % timeit.Timer(pdNumpy(words)).timeit(1000))
print("count(): %ss" % timeit.Timer(countFunction(words)).timeit(1000))
print("Counter: %ss" % timeit.Timer(counterObject(words)).timeit(1000))

Which results in:

Time to execute:

Pandas/NumPy: 0.33886080000047514s
count(): 0.0009540999999444466s
Counter: 0.0019409999995332328s

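On this seven-word list, the count() approach is extremely fast compared to the other variants, however, it doesn't give us the labels associated with the counts like the other two do. Keep in mind that it calls count() once per element, and each call rescans the whole list, so the total work grows quadratically with the list size and this lead should not be expected to hold for large lists.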

If you need the labels - the Counter outperforms the inefficient process of wrapping the list in a NumPy array and then counting.

On the other hand, with Pandas you can make use of sorting and other manipulation methods that plain lists don't offer. Counter has some unique methods as well, such as most_common().

Ultimately, you can use the Counter to create a dictionary and turn the dictionary into a DataFrame as well, to leverage the speed of Counter and the versatility of DataFrames:

df = pd.DataFrame.from_dict([Counter(words)]).T

If you don't need the labels and the list is small - count() is the way to go.
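To see how the all-words comparison behaves beyond a seven-word list, the benchmark can be repeated on a larger input. The sketch below is an illustration under assumed sizes (2,000 words drawn from a made-up 500-word vocabulary), not a measurement from this guide, and timings will vary by machine:

```python
import random
import timeit
from collections import Counter

# Assumed, illustrative input: 2,000 words from a 500-word vocabulary.
random.seed(0)
vocabulary = [f"word{i}" for i in range(500)]
words = [random.choice(vocabulary) for _ in range(2_000)]


def count_per_word():
    # One full scan of the list for every element: quadratic work.
    return [words.count(word) for word in words]


def counter_once():
    # A single pass over the list: linear work.
    return Counter(words)


print("count() per word: %ss" % timeit.timeit(count_per_word, number=1))
print("Counter:          %ss" % timeit.timeit(counter_once, number=1))
```

Both functions produce the same counts. Only the amount of work differs, and the gap widens as the list grows.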

Alternatively, if you're looking for a single word:

import numpy as np
import pandas as pd
import collections

def countFunction(words, word_to_search):
    def _countFunction():
        return words.count(word_to_search)
    return _countFunction

def counterObject(words, word_to_search):
    def _counterObject():
        return collections.Counter(words)[word_to_search]
    return _counterObject

def bruteForce(words, word_to_search):
    def _bruteForce():
        count = 0
        for word in words:
            if word == word_to_search:
                # update counter variable
                count += 1
        # return the single total, matching count() and Counter above
        return count
    return _bruteForce
    
import timeit

words = ['hello', 'goodbye', 'howdy', 'hello', 'hello', 'hi', 'bye']

print("Time to execute:\n")
print("count(): %ss" % timeit.Timer(countFunction(words, 'hello')).timeit(1000))
print("Counter: %ss" % timeit.Timer(counterObject(words, 'hello')).timeit(1000))
print("Brute Force: %ss" % timeit.Timer(bruteForce(words, 'hello')).timeit(1000))

Which results in:

Time to execute:

count(): 0.0001573999998072395s
Counter: 0.0019498999999996158s
Brute Force: 0.0005682000000888365s

The brute force search and count() methods outperform the Counter, mainly because the Counter inherently counts all words instead of one.
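The picture changes when several words are looked up in the same list: the Counter is built once, and each lookup afterwards is a dictionary access rather than another scan. A small sketch of that pattern follows; the queries list is an assumed example:

```python
from collections import Counter

words = ['hello', 'goodbye', 'howdy', 'hello', 'hello', 'hi', 'bye']
queries = ['hello', 'howdy', 'bye']  # illustrative words to look up

# Build the counts once (one pass over the list)...
word_counts = Counter(words)

# ...then each query is a constant-time dictionary lookup.
lookups = {query: word_counts[query] for query in queries}

# The equivalent with count() rescans the list once per query.
rescans = {query: words.count(query) for query in queries}

print(lookups)  # {'hello': 3, 'howdy': 1, 'bye': 1}
```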

Conclusion

In this guide, we explored several ways of counting word occurrences in a Python list, assessing the efficiency of each solution and weighing when each is more suitable.

Last Updated: October 27th, 2023
© 2013-2024 Stack Abuse. All rights reserved.