Guide to Using The Django MongoDB Engine with Python

Introduction

In this article, we will see how to use MongoDB, a non-relational database, with Django, a Python Web Framework.

Django is commonly used with PostgreSQL, MariaDB or MySQL, all relational databases, largely because the ORM it uses under the hood is designed around relational databases. MongoDB, being quite flexible, is commonly paired with lightweight frameworks such as Flask for ease of prototyping. However, it is also increasingly being used in larger projects due to its scalability, dynamic structures and query support.

The Django MongoDB Engine is a MongoDB backend for Django that lets us declaratively define schemas for our MongoDB data using Django models.

Note: At the time of writing, this engine doesn't support Python 3.x. The latest supported version is Python 2.7.

Non-relational vs Relational Databases

The key difference between this engine and other popular engines is that it works with a non-relational database, whereas Django applications are more commonly developed with relational databases.

Choosing between these two approaches boils down to the project you're working on, as each type has its pros and cons depending on the situation. Non-relational databases are usually more flexible (both a pro and a con), while relational databases are more rigidly structured (also both a pro and a con).

Non-relational databases are also usually better suited for scalable systems that hold a lot of data. However, for small- to mid-sized systems, the ease of maintaining a relational database often prevails.

Relational Database

A relational database stores data in tables, which consist of columns and rows.

  • A table represents an entity (e.g. a Movie)
  • A column represents an attribute of the entity (e.g. name of the movie, its length, year of release, etc.)
  • A row represents one entry in the table (e.g. ("The Matrix", 2h 16min, 1999)).

Every table row should have a unique key (an ID, usually called the primary key) that identifies that one row and nothing else.

Some of the most famous relational databases are: Oracle, PostgreSQL, MySQL and MariaDB.
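To make this vocabulary concrete, here's a minimal sketch using Python's built-in sqlite3 module, with an in-memory SQLite database standing in for any relational database. The movie table and its column names are illustrative assumptions, not something the Django MongoDB Engine creates:

```python
import sqlite3

# An in-memory SQLite database stands in for any relational database.
conn = sqlite3.connect(":memory:")

# The table represents the Movie entity; each column is one attribute.
# "id" is the primary key that uniquely identifies each row.
conn.execute(
    "CREATE TABLE movie ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " length_minutes INTEGER,"
    " year INTEGER)"
)

# Each row is one entry: The Matrix, 2h 16min (136 minutes), released in 1999.
# The id is left out, so SQLite assigns the next free key automatically.
conn.execute(
    "INSERT INTO movie (name, length_minutes, year) VALUES (?, ?, ?)",
    ("The Matrix", 136, 1999),
)

row = conn.execute(
    "SELECT id, name, length_minutes, year FROM movie"
).fetchone()
print(row)
```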

Non-relational Database

A non-relational database does not store data in tables; instead, the way it stores data depends on the kind of data it's designed for. There are four main types of non-relational databases:

  • Document-oriented Database (or Document Store)
    • Manages a set of named fields, usually in the form of JSON, XML or YAML documents. These formats can also have derivatives.
  • Wide-Column Store
    • Organizes data in columns, in a similar structure to relational databases
  • Graph Store
    • Stores relations between entities (most complex type of non-relational database)
    • Used when data is widely interconnected
  • Key-Value Store
    • Simple key-value pair collection

Some of the most famous non-relational databases are: MongoDB, Cassandra, Redis.
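The difference between two of these types can be sketched in plain Python. In this sketch, which is only an analogy and not how any particular database is implemented, a dictionary of JSON strings plays the role of a simple key-value store, and a list of dictionaries plays the role of a document collection:

```python
import json

movie = {
    "name": "The Matrix",
    "year": 1999,
    "actors": ["Keanu Reeves", "Laurence Fishburne"],
}

# Key-value store: the value is an opaque blob that is looked up by key only.
kv_store = {"movie:1": json.dumps(movie)}
blob = kv_store["movie:1"]
# To use any field, the client has to fetch and decode the whole value.
decoded_year = json.loads(blob)["year"]

# Document store: the database understands the fields inside each document,
# so it can filter on them (simulated here with a list comprehension).
collection = [
    movie,
    {"name": "John Wick", "year": 2014, "actors": ["Keanu Reeves"]},
]
keanu_movies = [doc["name"] for doc in collection if "Keanu Reeves" in doc["actors"]]
print(keanu_movies)
```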


MongoDB is a document-oriented non-relational database that stores documents in BSON (Binary JSON), a binary derivative of JSON.

Installation and Setup

To implement the Django MongoDB Engine in a project, we'll want to install three things:

  1. Django-nonrel - Support for non-relational databases (This will also install Django 1.5 for you and uninstall any previously installed version).
  2. djangotoolbox - Tools for non-relational Django applications.
  3. Django MongoDB Engine - The engine itself.

Let's install them via pip. Since Django-nonrel replaces any existing Django installation, there's no need to install Django separately:

$ pip install git+https://github.com/django-nonrel/django@nonrel-1.5
$ pip install git+https://github.com/django-nonrel/djangotoolbox
$ pip install git+https://github.com/django-nonrel/mongodb-engine

Let's initialize a Django project via the command-line to get a starting point:

$ django-admin.py startproject djangomongodbengine

Now, with a skeleton project that contains some basic files, we'll want to let Django know which engine we'd like to use. To do that, we'll update our settings.py file, and more specifically, the DATABASES setting:

DATABASES = {
   'default' : {
      'ENGINE' : 'django_mongodb_engine',
      'NAME' : 'example_database'
   }
}

With the installation and setup done, let's take a look at some of the things we can do with the Django MongoDB Engine.

Models and Fields

When it comes to working with models in a standard MVC (Model-View-Controller) architecture, the classic approach is to use the django.db.models module. It provides field classes such as CharField and TextField that essentially let you define the schema of your models and how Django's ORM maps them to the database.

Let's add a Movie model to our models.py:

from django.db import models

class Movie(models.Model):
    name = models.CharField(max_length=100)
    length = models.IntegerField()

Here, we have a Movie model with two fields, name and length. Each of them is a Field instance that represents a database column of the given data type.

While there are quite a few field types, the models module doesn't offer great support for fields that hold multiple values.

This is mainly because the models module is meant to be used with relational databases. When an object has a field with multiple values, such as a Movie having many Actors, you'd model it as a relationship with another table (one-to-many, or many-to-many if an actor can appear in several movies).

With MongoDB, you can save them as a list within the document itself, without referencing another table or document. This is where the lack of fields such as ListField and DictField becomes noticeable.
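The contrast can be sketched without Django at all. The example below is a minimal illustration assuming plain sqlite3 for the relational side and a dictionary for the document side; the table names are made up and differ from what Django's ORM would generate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Relational approach: actors live in their own table, and a join table
# links movies to actors (an actor can appear in many movies).
conn.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movie_actor (
    movie_id INTEGER REFERENCES movie(id),
    actor_id INTEGER REFERENCES actor(id)
);
INSERT INTO movie (id, name) VALUES (1, 'The Matrix');
INSERT INTO actor (id, name) VALUES (1, 'Keanu Reeves'), (2, 'Laurence Fishburne');
INSERT INTO movie_actor (movie_id, actor_id) VALUES (1, 1), (1, 2);
""")

# Reading the actors back requires a JOIN through the link table.
joined = [
    name for (name,) in conn.execute(
        "SELECT actor.name FROM actor"
        " JOIN movie_actor ON movie_actor.actor_id = actor.id"
        " WHERE movie_actor.movie_id = 1"
        " ORDER BY actor.id"
    )
]

# Document approach: the actors are simply a list inside the movie.
movie_document = {"name": "The Matrix", "actors": ["Keanu Reeves", "Laurence Fishburne"]}
embedded = movie_document["actors"]

print(joined == embedded)
```

Both approaches yield the same actors; the relational one needs an extra table and a join, while the document keeps them inline.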

ListField

ListField is a list-type attribute: an attribute that can hold multiple values. It belongs to the djangotoolbox.fields module and can be used to specify fields that contain list-like values, which are then saved into the BSON document.

Let's tweak our Movie model from before:

from django.db import models
from djangotoolbox.fields import ListField

class Movie(models.Model):
    name = models.CharField(max_length=100)
    length = models.IntegerField()
    year = models.IntegerField()
    actors = ListField()

Note that we didn't specify an id field. There's no need for one, since MongoDB implicitly assigns each document a unique _id (an ObjectId), which serves as the model instance's primary key. Additionally, we added the actors field, which is a ListField.

Now, when creating a Movie instance, we can assign a list to the actors field and save it to our MongoDB database as-is, without creating a separate table to hold Actor instances and referencing them from our Movie documents:

movie = Movie.objects.create(
	name = "The Matrix",
	length = 136,
	year = 1999,
	actors = ["Keanu Reeves", "Laurence Fishburne"]
)

Running this piece of code results in a MongoDB document:

{
  "_id" : ObjectId("..."),
  "name" : "The Matrix",
  "length" : 136,
  "year" : 1999,
  "actors" : [
  	"Keanu Reeves", 
  	"Laurence Fishburne"
  ]
}

Since actors holds a regular Python list, we can also extend() it to add more values, and then save() the movie to persist the change:

movie.actors.extend(['Carrie-Ann Moss'])
movie.save()

This results in an updated BSON document:

{
  "_id" : ObjectId("..."),
  "name" : "The Matrix",
  "length" : 136,
  "year" : 1999,
  "actors" : [
    "Keanu Reeves",
    "Laurence Fishburne",
    "Carrie-Ann Moss"
  ]
}

SetField

SetField is the same as ListField except that it's interpreted as a Python set, which means no duplicates are allowed.

With a ListField, however, nothing stops us from adding the same actor twice:

movie.actors.extend(['Carrie-Ann Moss'])
movie.save()

We quickly realize that the output is a bit weird, since the same actor now appears twice:

{
  "_id" : ObjectId("..."),
  "name" : "The Matrix",
  "length" : 136,
  "year" : 1999,
  "actors" : [
    "Keanu Reeves",
    "Laurence Fishburne",
    "Carrie-Ann Moss",
    "Carrie-Ann Moss"
  ]
}

Since each actor should appear only once in the cast, it makes more sense to make actors a SetField instead of a ListField:

from django.db import models
from djangotoolbox.fields import SetField

class Movie(models.Model):
    name = models.CharField(max_length=100)
    length = models.IntegerField()
    year = models.IntegerField()
    actors = SetField()

Now, we can add multiple actors, some of which are duplicates, and only the unique ones will be stored:

movie = Movie.objects.create(
	name = "John Wick",
	length = 102,
	year = 2014,
	actors = ["Keanu Reeves", "Keanu Reeves", "Bridget Moynahan"]
)

The resulting document has only one entry for "Keanu Reeves", the one and only:

{
  "_id" : ObjectId("..."),
  "name" : "John Wick",
  "length" : 102,
  "year" : 2014,
  "actors" : [
  	"Keanu Reeves", 
  	"Bridget Moynahan"
  ]
}
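Conceptually, the difference between the two fields comes down to Python's list and set semantics, sketched below in plain Python. One consequence worth keeping in mind: sets are unordered, so a SetField may not keep actors in the order you passed them in. If order matters, an order-preserving de-duplication (shown last, as an alternative technique rather than anything SetField does) is one option:

```python
from collections import OrderedDict

submitted = ["Keanu Reeves", "Keanu Reeves", "Bridget Moynahan"]

# ListField semantics: every value is kept as given, duplicates included.
as_list = list(submitted)

# SetField semantics: the values are interpreted as a Python set, so
# duplicates collapse. Sets are unordered, so iteration order (and thus the
# order of a stored array) may not match the input order.
as_set = set(submitted)

# Alternative technique (not what SetField does): drop duplicates while
# keeping the first occurrence of each value in its original position.
ordered_unique = list(OrderedDict.fromkeys(submitted))

print(len(as_list), len(as_set))
print(ordered_unique)
```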

DictField

DictField stores Python dictionaries as embedded BSON documents within your own document. It's preferable when you're not sure what the dictionary might look like and don't have a predefined structure for it.

If the structure is known, however, it's recommended to use Embedded Models (models within models). For example, an Actor could be a model of its own, and the Movie model could contain multiple embedded Actor models. If, on the other hand, a variable set of values is to be added, the values can be mapped as key-value pairs and saved through a DictField.

For example, let's add a reviews field that can hold 0..n reviews. While reviews do have a predictable structure (name, grade, comment), we'll first implement them as a DictField, before making separate models for actors and reviews:

from django.db import models
from djangotoolbox.fields import SetField, DictField

class Movie(models.Model):
    name = models.CharField(max_length=100)
    length = models.IntegerField()
    year = models.IntegerField()
    actors = SetField()
    reviews = DictField()

Now, when creating movies, we can pass a dictionary that maps each reviewer to their review of the movie:

movie = Movie.objects.create(
    name = "Good Will Hunting",
    length = 126,
    year = 1997,
    actors = ["Matt Damon", "Stellan Skarsgard"],
    reviews = {
        "Portland Oregonian" : "With its sweet soul and sharp mind...",
        "Newsweek" : "Gus Van Sant, working from the tangy, well-written script..."
    }
)

Running this code stores the reviews as an embedded document:

{
  "_id" : ObjectId("..."),
  "name" : "Good Will Hunting",
  "length" : 126,
  "year" : 1997,
  "actors" : [
    "Matt Damon",
    "Stellan Skarsgard"
  ],
  "reviews" : {
    "Portland Oregonian" : "With its sweet soul and sharp mind...",
    "Newsweek" : "Gus Van Sant, working from the tangy, well-written script..."
  }
}
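To contrast the two options discussed above, here's a plain-Python sketch: a dictionary with arbitrary reviewer keys mirrors the DictField approach, while a namedtuple with fixed attributes stands in for an embedded Review model. The namedtuple is only an analogy; it's not how djangotoolbox represents models:

```python
from collections import namedtuple

# DictField-style: reviewer names are arbitrary keys, so adding a new
# reviewer requires no schema change.
reviews = {
    "Portland Oregonian": "With its sweet soul and sharp mind...",
}
reviews["Newsweek"] = "Gus Van Sant, working from the tangy, well-written script..."

# Embedded-model-style: every review has the same named attributes.
# A namedtuple stands in for a Review model here.
Review = namedtuple("Review", ["name", "comment"])
structured = [Review(name, comment) for name, comment in sorted(reviews.items())]

for review in structured:
    print(review.name + ": " + review.comment)
```

The dictionary accepts any key at any time, while every Review must provide exactly the same fields, which is the trade-off between flexibility and a known structure.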

Embedded Models

Arguably, every review follows the same structure: a name followed by a comment. Actors, in turn, amount to more than just their names: they have a first_name, last_name, date_of_birth and other characteristics.

For both of these, we can make standalone models, much like we'd make with relational databases. With relational databases, though, we'd save them in their own tables and link to them from the Movie table.

With MongoDB, we can turn them into Embedded Models - entire documents, embedded into another document.

Let's change our Movie once again:

from django.db import models
from djangotoolbox.fields import SetField, EmbeddedModelField


class Movie(models.Model):
    name = models.CharField(max_length=100)
    length = models.IntegerField()
    year = models.IntegerField()
    actors = SetField(EmbeddedModelField("Actor"))
    reviews = SetField(EmbeddedModelField("Review"))

Here, we've made a SetField (which could've also been something like a ListField) for both actors and reviews. This time around, however, they're SetFields of other models, which we achieve by passing an EmbeddedModelField into each SetField constructor.

We've also passed the name of the embedded model ("Actor" or "Review") to each EmbeddedModelField constructor.

Now, let's define those two as well, in the models.py file:

class Actor(models.Model):
    first_name = models.CharField(max_length=30)
    last_name = models.CharField(max_length=30)
    date_of_birth = models.CharField(max_length=11)

class Review(models.Model):
    name = models.CharField(max_length=30)
    comment = models.CharField(max_length=300)

Now, when creating a Movie object, and saving it into the database, we can also add new Actor and Review instances to it:

movie = Movie.objects.create(
    name = "Focus",
    length = 105,
    year = 2015,
    actors = [
        Actor(
            first_name="Will",
            last_name="Smith",
            date_of_birth="25.09.1968"
        )
    ],
    reviews = [
        Review(
            name = "Portland Oregonian",
            comment = "With its sweet soul and sharp mind..."
        ),
        Review(
            name = "Newsweek",
            comment = "Gus Van Sant, working from the tangy, well-written script..."
        )
    ]
)

This creates new BSON documents for each Actor and Review in the sets, and saves them as embedded objects into our movie document:

{
  "_id" : ObjectId("..."),
  "name" : "Focus",
  "length" : 105,
  "year" : 2015,
  "actors" : [
    {
      "first_name" : "Will",
      "last_name" : "Smith",
      "date_of_birth" : "25.09.1968"
    }
  ],
  "reviews" : [
    {
      "name" : "Portland Oregonian",
      "comment" : "With its sweet soul and sharp mind..."
    },
    {
      "name" : "Newsweek",
      "comment" : "Gus Van Sant, working from the tangy, well-written script..."
    }
  ]
}

Each entry in the reviews array is an embedded document holding an individual Review instance. The same goes for actors.
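To illustrate what embedding means, the sketch below converts model-like Python objects into nested dictionaries shaped like the document above. The Record base class and the to_document() function are hypothetical stand-ins, not the engine's actual serialization code, which also has to deal with details such as type conversion and primary keys:

```python
class Record(object):
    """Hypothetical stand-in for a Django model: a fixed set of named fields."""

    fields = ()

    def __init__(self, **values):
        for name in self.fields:
            setattr(self, name, values.get(name))


class Actor(Record):
    fields = ("first_name", "last_name", "date_of_birth")


class Review(Record):
    fields = ("name", "comment")


class Movie(Record):
    fields = ("name", "length", "year", "actors", "reviews")


def to_document(value):
    """Recursively turn records into dicts and collections into lists,
    so embedded records become sub-documents of their parent."""
    if isinstance(value, Record):
        return dict((name, to_document(getattr(value, name))) for name in value.fields)
    if isinstance(value, (list, tuple, set)):
        return [to_document(item) for item in value]
    return value


movie = Movie(
    name="Focus",
    length=105,
    year=2015,
    actors=[Actor(first_name="Will", last_name="Smith", date_of_birth="25.09.1968")],
    reviews=[Review(name="Newsweek",
                    comment="Gus Van Sant, working from the tangy, well-written script...")],
)

document = to_document(movie)
print(document["actors"][0]["first_name"])
```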

File Handling

MongoDB has a built-in specification for storing and retrieving files in the database, called GridFS, which the Django MongoDB Engine uses as well.

Note: GridFS stores files by splitting them into chunks of 255 kB each (by default). When the file is accessed, GridFS collects the chunks and merges them back together.
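The chunking itself is easy to picture. The following is a simplified in-memory sketch, assuming the default chunk size of 255 kB (255 * 1024 bytes); real GridFS stores each chunk as a separate document in a chunks collection (fs.chunks by default) alongside file metadata in fs.files, which this version skips:

```python
# Default GridFS chunk size: 255 kB, i.e. 255 * 1024 bytes.
CHUNK_SIZE = 255 * 1024


def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Split a bytes object into consecutive pieces of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def reassemble(chunks):
    """Merge the pieces back into the original file contents, in order."""
    return b"".join(chunks)


# A fake ~600 kB "image": a PNG-like header followed by zero bytes.
poster = b"\x89PNG" + b"\x00" * (600 * 1024)

chunks = split_into_chunks(poster)  # 3 chunks: 261120, 261120 and 92164 bytes
print(len(chunks), [len(chunk) for chunk in chunks])
restored = reassemble(chunks)
```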

To use GridFS for file storage, we'll import GridFSStorage from the django_mongodb_engine.storage module:

from django_mongodb_engine.storage import GridFSStorage

gridfs = GridFSStorage()
uploads_location = GridFSStorage(location = '/uploaded_files')

Another option is GridFSField(), which lets us specify model fields that use GridFS to store their data:

from django.db import models
from djangotoolbox.fields import SetField, EmbeddedModelField
from django_mongodb_engine.fields import GridFSField

class Movie(models.Model):
    name = models.CharField(max_length=100)
    length = models.IntegerField()
    year = models.IntegerField()
    actors = SetField(EmbeddedModelField("Actor"))
    reviews = SetField(EmbeddedModelField("Review"))
    poster = GridFSField()

Now the poster (for example, an image file) will be saved in chunks and lazy-loaded only on demand.
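Lazy loading simply means the data isn't fetched until it's actually used. The LazyFile class below is a hypothetical illustration of that idea, not the engine's GridFSField implementation; the fetch_chunks callable stands in for a GridFS read:

```python
class LazyFile(object):
    """Hypothetical illustration of lazy loading: the chunks are fetched only
    on the first read() and cached for later reads."""

    def __init__(self, fetch_chunks):
        # fetch_chunks: a callable returning the file's chunks as bytes objects.
        self._fetch_chunks = fetch_chunks
        self._data = None
        self.fetch_count = 0

    def read(self):
        if self._data is None:
            self.fetch_count += 1
            self._data = b"".join(self._fetch_chunks())
        return self._data


stored_chunks = [b"first chunk|", b"second chunk"]
poster = LazyFile(lambda: stored_chunks)

fetches_before_read = poster.fetch_count  # nothing fetched on creation
content = poster.read()                   # first access triggers the fetch
content_again = poster.read()             # served from the cache
print(fetches_before_read, poster.fetch_count)
```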

Conclusion

To sum up, the Django MongoDB Engine is a fairly powerful engine. Its main downside is that it only works with old versions of Django (1.5) and Python (2.7), whereas Django is now at 3.2 LTS and support for 1.5 ended a long time ago. Python is at 3.9, and support for 2.7 ended at the start of 2020. On top of all that, Django MongoDB Engine seems to have stopped further development back in 2015.