Integrating H2 with Python and Flask

Introduction

H2 is a lightweight database server written in Java. It can be embedded in Java applications, or run as a standalone server.

In this tutorial, we'll review why H2 can be a good option for your projects. We'll also learn how to integrate H2 with Python by building a simple Flask API.

The Features of H2

H2 was built with performance in mind.

"H2 is a combination of: fast, stable, easy to use, and features".

Although H2 is prominent mainly because it can be embedded in Java applications, it has some interesting features that also apply to its server version. Let's see some of them next.

Size and Performance

The .jar file used for the server version is around 2MB. We can download it from the H2 site, bundled with extra scripts and documentation. If we search in Maven Central though, we can download the .jar file on its own.

H2 performance shines in its embedded version. Even so, the official benchmark shows that its client-server version is also impressive.

In-Memory Databases and Encryption

In-Memory databases are not persistent. All data is stored in memory, so speed is greatly increased.

The H2 site explains that In-Memory databases are particularly useful when prototyping, or when using read-only databases.

Encryption is another useful feature to protect data at rest. Databases can be encrypted with the AES-128 algorithm.

Other Useful Features

H2 also provides a cluster mode, the ability to run multiple servers and connect them together. Writes are done in all servers at the same time, while reads are done from the first server in the cluster.

H2 surprises for its simplicity. It provides several useful features and it's easy to set up.

Let's start an H2 server in preparation for the following sections:

$ java -cp ./h2-1.4.200.jar org.h2.tools.Server -tcp -tcpAllowOthers -tcpPort 5234 -baseDir ./ -ifNotExists

The arguments that start with tcp enable communication to the server. The ifNotExists argument allows the database to be created when accessing it for the first time.

Description of the API and General Diagram

Let's suppose we are writing an API to register all exoplanets found to date. Exoplanets are planets found outside our Solar System, orbiting other stars.

This is our simple API definition, a CRUD for one resource:

REST API Definition

This definition along with the rest of the code we'll see next is available in this GitHub repo.

This is how our application will look like at the end of this tutorial:

General diagram

Left of the diagram we see the API Client. That client can be the "Try it out" function of the Swagger Editor, or any other client, like Postman or cURL.

On the other end we find the H2 database server, running on TCP port 5234 as explained above.

Finally, our application in the middle is composed of three Python files. The first one will have the Flask app that will answer all REST API requests. All endpoints we described in the definition above will be added to this file.

The second file will have the persistence, functions that access the database to execute the CRUD operations, using the JayDeBeApi package.

Lastly, a third file will contain a schema representing the resource the API manages, the Exoplanet. We'll use the Marshmallow package to represent that schema. The first two python files will use this schema to represent resources and pass them to each other.

Let's start off with the persistence file.

Database Schema

To store the Exoplanet resource to an H2 database we should write the basic CRUD functions first. Let's start by writing the creation of the database. We use the JayDeBeApi package to access databases through JDBC:

import jaydebeapi

def initialize():
    _execute(
        ("CREATE TABLE IF NOT EXISTS exoplanets ("
         "  id INT PRIMARY KEY AUTO_INCREMENT,"
         "  name VARCHAR NOT NULL,"
         "  year_discovered SIGNED,"
         "  light_years FLOAT,"
         "  mass FLOAT,"
         "  link VARCHAR)"))

def _execute(query, returnResult=None):
    connection  = jaydebeapi.connect(
        "org.h2.Driver",
        "jdbc:h2:tcp://localhost:5234/exoplanets",
        ["SA", ""],
        "../h2-1.4.200.jar")
    cursor = connection.cursor()
    cursor.execute(query)
    if returnResult:
        returnResult = _convert_to_schema(cursor)
    cursor.close()
    connection.close()

    return returnResult

The initialize() function is simple enough because of the helper functions after. It creates the exoplanets table if it doesn't exist already. This function should be executed before our API starts receiving requests. We'll see later where to do that with Flask.

The _execute() function contains the connection string and credentials to access the database server. It is simpler for this example, but there is room for improvement regarding security. We could save our credentials elsewhere, like environment variables for example.

Also, we added the path to the H2 jar file to the connect() method, as it has the driver we need to connect to H2 - org.h2.Driver.

The JDBC connection string ends in /exoplanets. This means that when connecting for the first time a database called exoplanets will be created.

You may have noticed that _execute() can return the result of the SQL query using the _convert_to_schema() function. Let's now see how that function works.

Marshmallow Schemas and CRUD Database Functions

Some SQL queries return tabular results, particularly the SELECT statement. JayDeBeApi will format those results as a list of tuples. For example, for the schema defined in the last section we could get a result similar to this:

>>> connection  = jaydebeapi.connect(...
>>> cursor = connection.cursor()
>>> cursor.execute("SELECT * FROM exoplanets")
>>> cursor.fetchall()
[(1, 'Sample1', 2019, 4.5, 1.2, 'http://sample1.com')]

Nothing's stopping us from managing results in this format and eventually return it to the API client. But looking ahead we know we will use Flask, so it would be good to already return results in a format Flask recommends.

In particular, we'll be using Flask-RESTful to facilitate the use of API routes. That package recommends to use Marshmallow to parse requests. This step allows for normalization of the objects. This way we can discard unknown properties and highlight validation errors, for example.

Let's see how the Exoplanet class would look so we can discuss further:

from marshmallow import Schema, fields, EXCLUDE

class ExoplanetSchema(Schema):
    id = fields.Integer(allow_none=True)
    name = fields.Str(required=True, error_messages={"required": "An exoplanet needs at least a name"})
    year_discovered = fields.Integer(allow_none=True)
    light_years = fields.Float(allow_none=True)
    mass = fields.Float(allow_none=True)
    link = fields.Url(allow_none=True)
    class Meta:
        unknown = EXCLUDE

The definition of the properties looks familiar. It's the same as the database schema, including the definition of required fields. All fields have a type that defines some default validation. For example, the link field is defined as an URL, so a string that doesn't look like an URL won't be valid.

Specific error messages can also be included here, like the validation for a name.

For this example project, we want to discard, or exclude, all unknown fields and API client may send erroneously. This is achieved in the Meta nested class.

Now we can use the load() and loads() Marshmallow methods to convert and validate our resources.

Now that we're acquainted with Marshmallow, we can explain what the _convert_to_schema() does:

def _convert_to_schema(cursor):
    column_names = [record[0].lower() for record in cursor.description]
    column_and_values = [dict(zip(column_names, record)) for record in cursor.fetchall()]

    return ExoplanetSchema().load(column_and_values, many=True)

In JayDeBeApi, column names are saved in the description field of the cursor, while the data can be obtained with the fetchall() method. We used list comprehensions in the first two lines to get the column names and values, and zip() to merge them.

The last line takes the merged result and converts them to ExoplanetSchema objects that Flask can further process.

Now that we explained the _execute() function and the ExoplanetSchema class, let's see all the CRUD database functions:

def get_all():
    return _execute("SELECT * FROM exoplanets", returnResult=True)

def get(Id):
    return _execute("SELECT * FROM exoplanets WHERE id = {}".format(Id), returnResult=True)

def create(exoplanet):
    count = _execute("SELECT count(*) AS count FROM exoplanets WHERE name LIKE '{}'".format(exoplanet.get("name")), returnResult=True)
    if count[0]["count"] > 0:
        return

    columns = ", ".join(exoplanet.keys())
    values = ", ".join("'{}'".format(value) for value in exoplanet.values())
    _execute("INSERT INTO exoplanets ({}) VALUES({})".format(columns, values))

    return {}

def update(exoplanet, Id):
    count = _execute("SELECT count(*) AS count FROM exoplanets WHERE id = {}".format(Id), returnResult=True)
    if count[0]["count"] == 0:
        return

    values = ["'{}'".format(value) for value in exoplanet.values()]
    update_values = ", ".join("{} = {}".format(key, value) for key, value in zip(exoplanet.keys(), values))
    _execute("UPDATE exoplanets SET {} WHERE id = {}".format(update_values, Id))

    return {}

def delete(Id):
    count = _execute("SELECT count(*) AS count FROM exoplanets WHERE id = {}".format(Id), returnResult=True)
    if count[0]["count"] == 0:
        return

    _execute("DELETE FROM exoplanets WHERE id = {}".format(Id))
    return {}

All functions are mainly SQL queries, but create() and update() deserve some more explanation.

The INSERT SQL statement can receive column and values separated, in the form INSERT INTO table (column1Name) VALUES ('column1Value'). We can use the join() function to merge all columns and separate them with commas, and do something similar to join all values we want to insert.

The UPDATE SQL statement is a bit more complex. Its form is UPDATE table SET column1Name = 'column1Value'. So we need to alternate keys and values, and we did it using the zip() function.

All these functions return None when there is a problem. Later when we call them we will have to check for that value.

Let's save all database functions on its own file, persistence.py, so we can add some context when we call the functions, like this:

import persistence

persistence.get_all()

REST API With Flask

Now that we wrote a layer to abstract the access to the database, we are ready to write the REST API. We'll use the Flask and Flask-RESTful packages to make our definition as easy as possible. As we learned before, we'll also use Marshmallow to validate resources.

Flask-RESTful requires to define one class per API resource, in our case the Exoplanet resource only. Then we can associate that resource with a route like this:

from flask import Flask
from flask_restful import Resource, Api

app = Flask(__name__)
api = Api(app)

class Exoplanet(Resource):
    # ...

api.add_resource(Exoplanet, "/exoplanets", "/exoplanets/<int:Id>")

This way all our routes, /exoplanets and /exoplanets/<int:Id> will be directed to the class we defined.

For example, the GET /exoplanets endpoint will be answered by a method called get() inside the Exoplanet class. Because we also have the GET /exoplanet/<Id> endpoint, that get() method must have an optional parameter called Id.

Let's see the whole class to understand this better:

from flask import request
from flask_restful import Resource, abort
from marshmallow import ValidationError
import persistence

class Exoplanet(Resource):
    def get(self, Id=None):
        if Id is None:
            return persistence.get_all()

        exoplanet = persistence.get(Id)
        if not exoplanet:
            abort(404, errors={"errors": {"message": "Exoplanet with Id {} does not exist".format(Id)}})
        return exoplanet

    def post(self):
        try:
            exoplanet = ExoplanetSchema(exclude=["id"]).loads(request.json)
            if not persistence.create(exoplanet):
                abort(404, errors={"errors": {"message": "Exoplanet with name {} already exists".format(request.json["name"])}})
        except ValidationError as e:
            abort(405, errors=e.messages)

    def put(self, Id):
        try:
            exoplanet = ExoplanetSchema(exclude=["id"]).loads(request.json)
            if not persistence.update(exoplanet, Id):
                abort(404, errors={"errors": {"message": "Exoplanet with Id {} does not exist".format(Id)}})
        except ValidationError as e:
            abort(405, errors=e.messages)

    def delete(self, Id):
        if not persistence.delete(Id):
            abort(404, errors={"errors": {"message": "Exoplanet with Id {} does not exist".format(Id)}})

The remaining HTTP verbs are processed in the same way as GET, by the methods with name post(), put() and delete() .

As we said before, logic errors when accessing the database will cause the functions to return None. Those errors are captured here when needed.

Also, exceptions that represent validation errors are triggered by Marshmallow, so those errors are also captured and returned to the user along with an appropriate return error.

Conclusion

H2 is a useful database server, performant and easy to use. Although it is a Java package, it can also run as a standalone server, so we can use it in Python with the JayDeBeApi package.

In this tutorial we defined a simple CRUD application to illustrate how to access the database, and which functions are available. After that, we defined a REST API with Flask and Flask-RESTful.

Although several concepts were omitted for the sake of brevity, like authentication and paging, this tutorial is a good reference to start using H2 in our Flask projects.