Introduction
In this article, we will see how to use MongoDB, a non-relational database, with Django, a Python Web Framework.
Django is commonly used with PostgreSQL, MariaDB or MySQL, all relational databases, due to its ORM under the hood. MongoDB, being quite flexible, is commonly paired with lightweight frameworks such as Flask for the ease of prototyping. However, it is also increasingly being used in larger projects due to scalability, dynamic structures and query support.
The Django MongoDB Engine is used to declaratively define schemas.
Note: At the time of writing, this engine doesn't support Python 3.x. The latest supported version is Python 2.7.
Non-relational vs Relational Databases
The key difference of this engine compared to other popular engines is that it works with a non-relational database, whereas Django applications are more commonly developed with relational databases.
Choosing between these two approaches boils down to the project you're working on, as each type has certain pros and cons depending on the situation. Non-relational databases are usually more flexible (both a pro and a con), while relational databases are more rigid (also both a pro and a con).
Non-relational databases are also, usually, better for scalable systems that hold a lot of data. However, for small-to-mid systems, the ease of maintaining relational databases often prevails.
Relational Database
A relational database stores data in tables, which consist of columns and rows.
- A table represents an entity (e.g. a Movie)
- A column represents an attribute of the entity (e.g. the name of the movie, its length, year of release, etc.)
- A row represents one entry in a database (e.g. {"The Matrix", 2h 16min, 1999.}).
Every table row should have a unique key (an ID), which represents solely that one row.
Some of the most famous relational databases are: Oracle, PostgreSQL, MySQL and MariaDB.
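The table, column and row terminology above can be sketched with Python's built-in sqlite3 module. This is only an illustration of the relational layout, not part of the Django setup; the table and column names (movie, length_minutes) are assumptions made for the example.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")

# The table models the Movie entity; each column is one attribute.
conn.execute(
    """
    CREATE TABLE movie (
        id INTEGER PRIMARY KEY,   -- unique key identifying exactly one row
        name TEXT NOT NULL,
        length_minutes INTEGER,
        year INTEGER
    )
    """
)

# Each row is one entry in the table.
conn.execute(
    "INSERT INTO movie (name, length_minutes, year) VALUES (?, ?, ?)",
    ("The Matrix", 136, 1999),
)

row = conn.execute(
    "SELECT id, name, length_minutes, year FROM movie"
).fetchone()
print(row)
```

The id column is filled in by the database, which is what makes each row individually addressable.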
Non-relational Database
A non-relational database does not store data in tables; how the data is stored depends on the type of database. There are four main types of non-relational databases:
- Document-oriented Database (or Document Store)
- Manages a set of named string fields usually in the form of JSON, XML or YAML documents. These formats can also have derivatives.
- Wide-Column Store
- Organizes data in columns, in a similar structure to relational databases
- Graph Store
- Stores relations between entities (most complex type of non-relational database)
- Used when data is widely interconnected
- Key-Value Store
- Simple key-value pair collection
Some of the most famous non-relational databases are: MongoDB, Cassandra, Redis.
MongoDB is a document-based non-relational database, that saves documents in BSON (Binary JSON) format - a derivative of JSON.
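To make the idea of a document more concrete, here is a minimal sketch using Python's json module. JSON stands in for BSON here, which is an assumption made purely for illustration, since BSON is a binary format with additional types. The movie_document name is hypothetical.

```python
import json

# A document is a set of named fields; values may themselves be lists
# or nested documents, so no second table is needed for the actors.
movie_document = {
    "name": "The Matrix",
    "length": 136,
    "year": 1999,
    "actors": ["Keanu Reeves", "Laurence Fishburne"],
}

# Serialize to text and back again to show the structure survives intact.
encoded = json.dumps(movie_document)
decoded = json.loads(encoded)
print(decoded["actors"])
```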
Installation and Setup
To implement the Django MongoDB Engine in a project, we'll want to install three things:
- Django-nonrel - Support for non-relational databases (This will also install Django 1.5 for you and uninstall any previously installed version).
- djangotoolbox - Tools for non-relational Django applications.
- Django MongoDB Engine - The engine itself.
Let's install them via pip, alongside Django itself:
$ pip install django
$ pip install git+https://github.com/django-nonrel/django@nonrel-1.5
$ pip install git+https://github.com/django-nonrel/djangotoolbox
$ pip install git+https://github.com/django-nonrel/mongodb-engine
Let's initialize a Django project via the command-line to get a starting point:
$ django-admin.py startproject djangomongodbengine
Now, with a skeleton project that contains some basic files, we'll want to let Django know which engine we'd like to use. To do that, we'll update our settings.py file, and more specifically, the DATABASES property:
DATABASES = {
    'default' : {
        'ENGINE' : 'django_mongodb_engine',
        'NAME' : 'example_database'
    }
}
With the installation and setup done, let's take a look at some of the things we can do with the Django MongoDB Engine.
Models and Fields
When it comes to working with Models, in a standard MVC (Model-View-Controller) architecture, the classic approach is to use the django.db.models module. The Model class has CharFields, TextFields, etc. that allow you to essentially define the schema of your models and how they'll be mapped to the database by Django's ORM.
Let's add a Movie model to our models.py:
from django.db import models

class Movie(models.Model):
    # CharField requires a max_length
    name = models.CharField(max_length=100)
    length = models.IntegerField()
Here, we have a Movie model that has two fields - name and length. Each of these is a Field implementation, which represents a database column with the given data type.
While there are quite a few field types, the models module doesn't have great support for fields that have multiple values.
This is mainly because the models module is meant to be used with relational databases. When an object has a field with multiple values, such as a Movie having many Actors, you'd have a One-to-Many relationship with another table.
With MongoDB, you can save them as a list within that document, without having to make a database reference to another table or document. This is where we feel the lack of fields such as ListField and DictField.
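The contrast described above can be sketched in plain Python. The relational half uses the built-in sqlite3 module with a hypothetical movie/actor schema, and the document half is an ordinary dictionary. Neither is Django or MongoDB code; the sketch only shows where the list of actors lives in each approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Relational approach: actors live in their own table and point back
# at the movie they belong to (a One-to-Many relationship).
conn.executescript(
    """
    CREATE TABLE movie (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE actor (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        movie_id INTEGER NOT NULL REFERENCES movie(id)
    );
    """
)
movie_id = conn.execute(
    "INSERT INTO movie (name) VALUES (?)", ("The Matrix",)
).lastrowid
conn.executemany(
    "INSERT INTO actor (name, movie_id) VALUES (?, ?)",
    [("Keanu Reeves", movie_id), ("Laurence Fishburne", movie_id)],
)

# Reading the actors back requires a join across the two tables.
actors = [
    name
    for (name,) in conn.execute(
        "SELECT actor.name FROM actor "
        "JOIN movie ON movie.id = actor.movie_id "
        "WHERE movie.name = ? ORDER BY actor.id",
        ("The Matrix",),
    )
]
relational_view = {"name": "The Matrix", "actors": actors}

# Document approach: the list is simply stored inside the document.
document_view = {
    "name": "The Matrix",
    "actors": ["Keanu Reeves", "Laurence Fishburne"],
}
print(relational_view == document_view)
```

Both views end up holding the same information; the difference is that the document keeps it in one place, while the relational version spreads it over two tables and a join.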
ListField
ListField is a list-type attribute, an attribute which can hold multiple values. It belongs to the djangotoolbox.fields module, and can be used to specify fields that contain list-like values, which are then saved into the BSON document.
Let's tweak our Movie model from before:
from django.db import models
from djangotoolbox.fields import ListField

class Movie(models.Model):
    name = models.CharField(max_length=100)
    length = models.IntegerField()
    year = models.IntegerField()
    actors = ListField()
Note that we didn't specify the id field. There is no need for it, since MongoDB will implicitly assign it to the instance of the Model. Additionally, we added the actors field, which is a ListField.
Now, when creating a Movie instance, we can assign a list to the actors field, and save it to our MongoDB database as is, without creating a separate table to contain Actor instances and referencing them in our Movie documents:
movie = Movie.objects.create(
    name = "The Matrix",
    length = 136,
    year = 1999,
    actors = ["Keanu Reeves", "Laurence Fishburne"]
)
Running this piece of code results in a MongoDB document:
{
    "_id" : ObjectId("..."),
    "name" : "The Matrix",
    "length" : 136,
    "year" : 1999,
    "actors" : [
        "Keanu Reeves",
        "Laurence Fishburne"
    ]
}
We can also extend() the ListField, and add more values to it. Saving the instance afterwards persists the change:
movie.actors.extend(['Carrie-Ann Moss'])
movie.save()
This results in an updated BSON document:
{
    "_id" : ObjectId("..."),
    "name" : "The Matrix",
    "length" : 136,
    "year" : 1999,
    "actors" : [
        "Keanu Reeves",
        "Laurence Fishburne",
        "Carrie-Ann Moss"
    ]
}
SetField
SetField is the same as ListField except that it's interpreted as a Python set, which means no duplicates are allowed.
If we add the same actor twice:
movie.actors.extend(['Carrie-Ann Moss'])
movie.save()
We quickly realize that the output is a bit weird:
{
    "_id" : ObjectId("..."),
    "name" : "The Matrix",
    "length" : 136,
    "year" : 1999,
    "actors" : [
        "Keanu Reeves",
        "Laurence Fishburne",
        "Carrie-Ann Moss",
        "Carrie-Ann Moss"
    ]
}
Since we'd like to avoid duplicate entries, with each actor appearing only once, it makes more sense to make actors a SetField instead of a ListField:
from django.db import models
# SetField, not ListField, is what the model below uses
from djangotoolbox.fields import SetField

class Movie(models.Model):
    name = models.CharField(max_length=100)
    length = models.IntegerField()
    year = models.IntegerField()
    actors = SetField()
Now, we can add multiple actors, some of which are duplicates, and we'll only have unique additions:
movie = Movie.objects.create(
    name = "John Wick",
    length = 102,
    year = 2014,
    actors = ["Keanu Reeves", "Keanu Reeves", "Bridget Moynahan"]
)
The resulting document will only have one entry for "Keanu Reeves", the one and only:
{
    "_id" : ObjectId("..."),
    "name" : "John Wick",
    "length" : 102,
    "year" : 2014,
    "actors" : [
        "Keanu Reeves",
        "Bridget Moynahan"
    ]
}
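The deduplication described in this section mirrors how Python's built-in list and set types behave. The sketch below uses only plain Python, with no Django or MongoDB involved, and the variable names are made up for the example.

```python
# A list keeps every value it is given, duplicates included.
actors_list = ["Keanu Reeves", "Keanu Reeves", "Bridget Moynahan"]

# Building a set from the same values drops the repeated entry.
actors_set = set(actors_list)

# Adding a value that is already present has no effect.
actors_set.add("Keanu Reeves")

print(len(actors_list), len(actors_set))
```

Keep in mind that a Python set does not guarantee any particular ordering of its elements.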
DictField
DictField stores Python dictionaries, as yet another BSON document, within your own document. These are preferable when you're not sure what the dictionary might look like - and you don't have a predefined structure for it.
On the other hand, if the structure is known in advance, it is recommended to use Embedded Models, as models within models. For example, an Actor could be a model of its own, and we could let the Movie model have multiple embedded Actor models. If, however, a variable set of values is to be added, they can be mapped as key-value elements and saved through a DictField.
For example, let's add a reviews field, that can have 0..n reviews. While reviews do have a predictable structure (name, grade, comment), we'll implement them as a DictField, before making a separate Model for actors and reviews:
from django.db import models
from djangotoolbox.fields import SetField
from djangotoolbox.fields import DictField

class Movie(models.Model):
    name = models.CharField(max_length=100)
    length = models.IntegerField()
    year = models.IntegerField()
    actors = SetField()
    reviews = DictField()
Now, when creating movies, we can add dictionaries of reviewers, and their reviews of the movies:
movie = Movie.objects.create(
    name = "Good Will Hunting",
    length = 126,
    year = 1997,
    actors = ["Matt Damon", "Stellan Skarsgard"],
    # A DictField takes a single dictionary: reviewer name -> review text
    reviews = {
        "Portland Oregonian" : "With its sweet soul and sharp mind...",
        "Newsweek" : "Gus Van Sant, working from the tangy, well-written script..."
    }
)
Running this code results in:
{
    "_id" : ObjectId("..."),
    "name" : "Good Will Hunting",
    "length" : 126,
    "year" : 1997,
    "actors" : [
        "Matt Damon",
        "Stellan Skarsgard"
    ],
    "reviews" : {
        "Portland Oregonian" : "With its sweet soul and sharp mind...",
        "Newsweek" : "Gus Van Sant, working from the tangy, well-written script..."
    }
}
Embedded Models
Now, the reviews will, arguably, all follow the same kind of structure - a name followed by a comment. Actors, too, amount to more than just their names - they have a last_name, a date_of_birth and other characteristics.
For both of these, we can make standalone models, much like we'd make with relational databases. With relational databases, though, we'd save them in their own tables and link to them from the Movie table.
With MongoDB, we can turn them into Embedded Models - entire documents, embedded into another document.
Let's change our Movie model once again:
from django.db import models
# SetField is the container used below, so it has to be imported
from djangotoolbox.fields import SetField, EmbeddedModelField

class Movie(models.Model):
    name = models.CharField(max_length=100)
    length = models.IntegerField()
    year = models.IntegerField()
    actors = SetField(EmbeddedModelField("Actor"))
    reviews = SetField(EmbeddedModelField("Review"))
Here, we've made a SetField (which could have also been something like a ListField) for both actors and reviews. However, this time around, we've made them SetFields of other models, by passing EmbeddedModelField into the constructors of the SetFields.
We've also specified which models to embed in the constructor of the EmbeddedModelField class.
Now, let's define those two as well, in the models.py file:
class Actor(models.Model):
    first_name = models.CharField(max_length=30)
    last_name = models.CharField(max_length=30)
    date_of_birth = models.CharField(max_length=11)

class Review(models.Model):
    name = models.CharField(max_length=30)
    comment = models.CharField(max_length=300)
Now, when creating a Movie object, and saving it into the database, we can also add new Actor and Review instances to it:
movie = Movie.objects.create(
    name = "Focus",
    length = 105,
    year = 2015,
    actors = [
        Actor(
            first_name="Will",
            last_name="Smith",
            date_of_birth="25.09.1968."
        )
    ],
    reviews = [
        Review(
            name = "Portland Oregonian",
            comment = "With its sweet soul and sharp mind..."
        ),
        Review(
            name = "Newsweek",
            comment = "Gus Van Sant, working from the tangy, well-written script..."
        )
    ]
)
This creates new BSON documents for each Actor and Review in the sets, and saves them as embedded objects into our movie document:
{
    "_id" : ObjectId("..."),
    "name" : "Focus",
    "length" : 105,
    "year" : 2015,
    "actors" : [
        {
            "first_name" : "Will",
            "last_name" : "Smith",
            "date_of_birth" : "25.09.1968."
        }
    ],
    "reviews" : [
        {
            "name" : "Portland Oregonian",
            "comment" : "With its sweet soul and sharp mind..."
        },
        {
            "name" : "Newsweek",
            "comment" : "Gus Van Sant, working from the tangy, well-written script..."
        }
    ]
}
Each entry in the reviews BSON array is an individual Review instance. The same goes for actors.
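The way embedded objects nest inside a parent document can be sketched with the standard library's dataclasses module. This is plain Python 3 and deliberately independent of Django and the engine; it is not how EmbeddedModelField is implemented, only an illustration of objects turning into nested dictionaries. Lists are used as containers here, which is an assumption made to keep the output order predictable.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Actor:
    first_name: str
    last_name: str
    date_of_birth: str


@dataclass
class Review:
    name: str
    comment: str


@dataclass
class Movie:
    name: str
    length: int
    year: int
    actors: List[Actor] = field(default_factory=list)
    reviews: List[Review] = field(default_factory=list)


movie = Movie(
    name="Focus",
    length=105,
    year=2015,
    actors=[Actor("Will", "Smith", "25.09.1968.")],
    reviews=[Review("Portland Oregonian", "With its sweet soul and sharp mind...")],
)

# asdict() recurses into nested dataclasses, producing one nested
# dictionary that has the same shape as the embedded document.
document = asdict(movie)
print(document["actors"][0])
```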
File Handling
MongoDB has a built-in specification for storing and retrieving files, called GridFS, which is used in the Django MongoDB Engine too.
Note: GridFS stores files by splitting them into chunks of 255 kB each (the default chunk size). When the file is accessed, GridFS collects the chunks and merges them.
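The split-and-merge idea from the note can be sketched in a few lines of plain Python. This is not the GridFS implementation; the function names are hypothetical and the chunk size simply mirrors the 255 kB mentioned above.

```python
CHUNK_SIZE = 255 * 1024  # 255 kB, as in the note above


def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Cut a bytes object into consecutive pieces of at most chunk_size."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def merge_chunks(chunks):
    """Reassemble the original bytes from its ordered pieces."""
    return b"".join(chunks)


# A 600 kB payload of zero bytes stands in for an uploaded file.
payload = bytes(600 * 1024)
chunks = split_into_chunks(payload)

print(len(chunks), [len(chunk) // 1024 for chunk in chunks])
restored = merge_chunks(chunks)
```

Only the last chunk may be smaller than the chunk size, and the pieces must be merged in their original order to get the file back.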
To use the GridFS system, we'll import GridFSStorage from the django_mongodb_engine.storage module:
from django_mongodb_engine.storage import GridFSStorage
gridfs = GridFSStorage()
uploads_location = GridFSStorage(location = '/uploaded_files')
Another field we can use is the GridFSField(), which allows us to specify fields that utilize the GridFS system to store data:
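from django.db import models
from djangotoolbox.fields import SetField, EmbeddedModelField
from django_mongodb_engine.fields import GridFSField

class Movie(models.Model):
    name = models.CharField(max_length=100)
    length = models.IntegerField()
    year = models.IntegerField()
    actors = SetField(EmbeddedModelField("Actor"))
    reviews = SetField(EmbeddedModelField("Review"))
    # Stored through GridFS rather than inline in the document
    poster = GridFSField()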
Now this image will be saved in chunks and lazy-loaded only on demand.
Conclusion
To sum up, the Django MongoDB Engine is a fairly powerful engine. Its main downside is that it only works with old versions of Django (1.5) and Python (2.7), whereas, at the time of writing, Django is at 3.2 LTS and Python is at 3.9. Support for Django 1.5 ended a long time ago, and support for Python 2.7 ended last year. In addition to all of that, development of the Django MongoDB Engine seems to have stopped back in 2015.