Differences Between Python's defaultdict and dict

Introduction

In Python, dictionaries are one of the most flexible built-in data types. They are great for structuring data and can help you solve a myriad of problems. But what if I told you there's a more powerful version of dictionaries that you might not have heard of? Yep, I'm talking about the collections.defaultdict type.

In this article, we'll explore what a defaultdict is and how it differs from a regular Python dictionary (or dict for short).

What is a dict?

A dict, or dictionary, is a built-in, mutable Python data type that stores a collection of key-value pairs. Since Python 3.7, dictionaries also preserve the order in which keys were inserted. Each key maps to its associated value, which makes it easy to retrieve the value for a specific key.

student = {
    "name": "John Doe",
    "age": 20,
    "courses": ["Math", "Science"]
}

print(student["name"])
# Output: John Doe

In this code, the keys are "name", "age", and "courses", and they each have associated values. You can access any value by its key, as we did with student["name"].

But what happens when you try to access a key that doesn't exist in the dictionary? Well, Python raises a KeyError:

print(student["grade"])
# Raises KeyError: 'grade'

This is one of the limitations of a normal dictionary. It doesn't handle missing keys very well. In real-world applications, this can cause your program to crash if you're not careful. This is where defaultdicts come into play, but we'll get into that later in the article.

Note: You can avoid KeyError exceptions in normal dictionaries by using the get method, which returns None (or a fallback you pass as a second argument) if the key is not found. However, this isn't always ideal, because get never stores the fallback in the dictionary, so you have to repeat it on every lookup.

print(student.get("grade"))
# Output: None
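
As a rough sketch of the difference between reading with a fallback and storing one: get never adds the key, while setdefault (another built-in dict method) does. The data reuses the student example above, and the fallback values are only illustrative.

```python
student = {"name": "John Doe", "age": 20}

# get() returns the fallback but leaves the dictionary unchanged
grade = student.get("grade", "N/A")
print(grade)               # N/A
print("grade" in student)  # False

# setdefault() returns the fallback and also stores it under the key
courses = student.setdefault("courses", [])
courses.append("Math")
print(student["courses"])  # ['Math']
```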

What is a defaultdict?

A defaultdict is a specialized dictionary provided by the collections module in Python. It's a subclass of the built-in dict class. So, what makes it so special? Well, it doesn't raise a KeyError when you look up a key that isn't in the dictionary. Instead, it calls the function you passed when creating the defaultdict, stores the result under that key, and returns it. This is very useful when you're building up values such as lists or counters key by key.

Let's take a quick look at how you would initialize a defaultdict:

from collections import defaultdict

# Initializing with list as default_factory
dd = defaultdict(list)

In the example above, if you try to access a key that doesn't exist, Python will return an empty list [] instead of raising a KeyError, and it will store that list under the key.

print(dd["non_existent_key"])
# Output: []
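
One detail worth knowing is that the default is only created when a missing key is looked up with square brackets. A short sketch (the key name is illustrative) showing that membership tests and get do not trigger it:

```python
from collections import defaultdict

dd = defaultdict(list)

# Neither `in` nor get() calls default_factory
print("a" in dd)    # False
print(dd.get("a"))  # None
print(len(dd))      # 0

# Indexing a missing key creates and stores the default
value = dd["a"]
print(value)        # []
print("a" in dd)    # True
print(len(dd))      # 1
```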

The argument you pass while initializing defaultdict is called default_factory. It's a callable that takes no arguments and returns the default value for a missing key. If this argument is absent, default_factory is None and the defaultdict behaves like a normal dict, raising a KeyError for missing keys.
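
To illustrate the case without a factory, the sketch below creates a defaultdict with no arguments; the key name is made up for the example.

```python
from collections import defaultdict

dd = defaultdict()          # no default_factory given
print(dd.default_factory)   # None

try:
    dd["missing"]
except KeyError as exc:
    error = exc             # keep a reference for later inspection
    print("KeyError:", exc)  # KeyError: 'missing'
```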

Key Differences

Now that we understand what a defaultdict is, let's take a look at the key differences between a defaultdict and a typical Python dictionary.

  • Default Values: The most significant difference, as we've already seen, is that defaultdict automatically creates a default value when you look up a non-existent key. This is different from a standard dict, which raises a KeyError in that case.
# Standard dict
d = {}
print(d["non_existent_key"])
# Raises KeyError: 'non_existent_key'

# Defaultdict
from collections import defaultdict
dd = defaultdict(int)
print(dd["non_existent_key"])
# Output: 0
  • Initialization: When creating a defaultdict, you normally pass a default_factory callable as the first argument, which produces the default value for non-existent keys. A standard dict, on the other hand, has no such parameter.
d = {}  # Standard dict
dd = defaultdict(list)  # Defaultdict
  • Use Cases: defaultdict is most useful when you would otherwise write repeated key-existence checks or handle KeyError exceptions. It's commonly used for grouping, counting, or accumulating operations.
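
As a small sketch of the grouping use case, the example below groups a made-up list of words by their first letter; the word list and variable names are assumptions for illustration.

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Group words by their first letter; missing letters start as an empty list
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```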

When to Use defaultdict vs dict

Of course, the choice between defaultdict and dict depends on your specific needs. If you're dealing with a situation where you want to avoid key errors and you know in advance what kind of default value you'd want for non-existing keys, defaultdict is the way to go.

Let's say you're building a dictionary to count the frequency of words in a text. With a normal dictionary, you'd have to check if a word is already a key in the dictionary before incrementing its count. With defaultdict, you can simply pass int as the default_factory, so every new word starts at 0, and increment the count without any checks.
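
Here is one way the word count could look with both approaches, side by side. The sample text is made up for the example.

```python
from collections import defaultdict

text = "the quick brown fox jumps over the lazy dog the end"

# Regular dict: check for the key before incrementing
counts = {}
for word in text.split():
    if word in counts:
        counts[word] += 1
    else:
        counts[word] = 1

# defaultdict(int): missing keys start at int(), which is 0
dd_counts = defaultdict(int)
for word in text.split():
    dd_counts[word] += 1

print(dd_counts["the"])           # 3
print(counts == dict(dd_counts))  # True
```

Both loops produce the same counts; the defaultdict version just drops the membership check.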

On the other hand, if you want your program to throw an error when a non-existent key is accessed, or if you don't have a clear default value, a regular dict may be more suitable.

How to Use defaultdict

Using defaultdict is quite straightforward. You start by importing it from the collections module. Then, when you create a defaultdict, you pass in a default factory: a callable that returns the default value. This could be int, list, set, dict, or even a user-defined function.

Let's take a look at an example. Suppose we want to create a dictionary to store the grades of students in different subjects. We can use a defaultdict with list as the default factory:

from collections import defaultdict

# Create a defaultdict with list as the default type
grades = defaultdict(list)

# Add grades
grades['Math'].append(85)
grades['English'].append(90)

print(grades)

When you run this code, you'll get the following output:

defaultdict(<class 'list'>, {'Math': [85], 'English': [90]})

As you can see, we didn't have to check if "Math" or "English" were already keys in the dictionary. We were able to directly append the grades. If we try to access the grades for a subject that hasn't been added yet, we'll get an empty list instead of a KeyError:

print(grades['Science'])

This will output:

[]

Note: Remember that the argument you pass to defaultdict is a callable (something Python can call with no arguments), not a value. So, you should pass list instead of [], or int instead of 0.
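
The sketch below shows what happens when a value is passed instead of a callable, and how a lambda or a user-defined function can supply a custom default. The names default_profile, status, and profiles are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Passing a value instead of a callable fails immediately
try:
    defaultdict([])
except TypeError as exc:
    message = str(exc)
    print("TypeError:", message)

# A user-defined, zero-argument function can act as the factory
def default_profile():
    # Hypothetical default record, made up for this example
    return {"grade": "N/A", "credits": 0}

# A lambda works too, for example to return a constant
status = defaultdict(lambda: "unknown")
profiles = defaultdict(default_profile)

print(status["alice"])             # unknown
print(profiles["bob"]["credits"])  # 0
```

Because the factory is called once per missing key, each key in profiles gets its own separate dictionary.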

How to Use dict

The Python dict is a built-in data type used to store data in key-value pairs. Here's a simple example of how to use it:

# Creating a dictionary
my_dict = {'name': 'John', 'age': 30}

# Accessing a value
print(my_dict['name'])  # Output: John

# Updating a value
my_dict['age'] = 31
print(my_dict['age'])  # Output: 31

# Adding a new key-value pair
my_dict['job'] = 'Engineer'
print(my_dict)  # Output: {'name': 'John', 'age': 31, 'job': 'Engineer'}

One thing to remember when using dict is that it will raise a KeyError if you try to access a key that doesn't exist:

print(my_dict['hobby'])  # Raises KeyError: 'hobby'

Note: To avoid this, you can use the get() method, which returns None or a default value of your choice if the key doesn't exist.

print(my_dict.get('hobby'))  # Output: None
print(my_dict.get('hobby', 'default'))  # Output: default

Conclusion

In this article, we've taken a deeper dive into the world of Python dictionaries, focusing specifically on the dict and collections.defaultdict types. We've explored their key differences, like how defaultdict provides a default value for non-existent keys, thus avoiding KeyError exceptions. We've also looked at their use cases: dict is better when you need to strictly control which keys exist in your dictionary, while defaultdict is more useful when you're grouping, counting, or accumulating values and want to avoid constant key-existence checks.

Last Updated: August 31st, 2023

© 2013-2024 Stack Abuse. All rights reserved.
