Backing up and Restoring PostgreSQL Databases

Introduction

Making regular database backups is an essential maintenance task and fail point recovery strategy for anyone who is responsible for a database.

A common misnomer for software developers is that there will be a database administrator who will take care of these things for us. Unfortunately, in my experience this is far from the truth. Of the many projects I have ever worked on for multiple different companies, I have never once had the luxury of being able to put off those tasks to the fable unicorn known as a database administrator (DBA). Instead, I have had to do my best to keep the databases to the systems I work on in as best a situation as I can, which has involved settings up a routine backup strategy and knowing how to recover one if in the face of such a tragedy.

In this article I am going to be covering the methods I have used to back up and recover PostgreSQL databases on Linux servers.

Creating Database Backups

There are two ways that I am aware of how to create a backup of a PostgreSQL database, which I will be discussing in this section. Both ways utilize point-in-time copies of the database. One uses what is known as WAL archiving and the other utilizes a SQL dump.

A. Creating an Archived Copy of your Database at the File System Level with WAL Archiving

Before getting too far in the explanation of this method, I should say that a directory needs to be created on the system that will hold arhived files and must be owned by the postgres user. In my examples I will be utilizing the directory /home/username/pg which is owned by postgres and it will contain two subdirectories wals and cluster, also owned by postgres.

This first method uses the pair of PostgreSQL functions pg_start_backup(...) and pg_stop_backup(...). In order to use these functions the user executing them must have replication privelliges, which is defined in the pg_hba.conf config file. Additionally, to utilize this method of backup you need to add some settings and commands to the postgresql.conf file, which I will discuss next.

In the postgresql.conf file you need to set archive_mode to on and provide an archive command that will look something like so:

archive_mode = on  
archive_command = 'cp %p /home/username/pg/wals/%f'  

Here, "%p" is replaced with the path for the WAL (write ahead log) file being archived and "%f" is the name of the actual file.

The aforementioned Postgres functions work by creating a checkpoint (aka, a single point-in-time) copy of the Postgres data directory as a compressed archive. I have typically used this combination of functions within a bash script scheduled to run at set intervals within a cron task.

For example, a bash script on a Linux server hosting a Postgres database could implement such a cron task to backup the database like this:

1 1 * * * bash /path/to/backup.sh  

In the above example the cron task entry would run the script, backup.sh, in the first minute of the first hour of every day.

Below is the example's backup.sh bash script to perform the function of creating the archived copy:

#! /bin/bash

function backup_pg() {  
   ARCHIVE=$1
   echo Backing up database ...
   psql -U username -h localhost -c "SELECT pg_start_backup('$ARCHIVE');" databasename
   tar czf $ARCHIVE $SRC
   psql -U username -h localhost -c "SELECT pg_stop_backup();" databasename
   echo Completed!
}

SRC=/var/lib/pgsql/9.6/data  
DST=/home/username/pg/cluster/

TODAY=`date +%F`  
ARCHIVE=databasename-backup-$TODAY.tgz  
BAKDIR=Archive-$TODAY

mkdir -p $DST  
cd $DST/$BACKDIR

backup_pg $ARCHIVE  

As I mentioned previously this creates a point-in-time copy of the database and archives the entire data directory. You would want to replace databasename, username, as well as SRC and DST with the values that make sense for your setup. For more information of using this method of creating backup consult the official docs.

B. Creating a SQL Dump of the Database

This second way of backing up a database, which utilizes the pg_dump command, is my preferred method as I feel it requires fewer steps and hence is less complicated. This method still creates a point-in-time copy of the database, but it does so by creating a single file which contains commands that will recreate tables, contraints and indexes as well as populate the tables with data.

There are different ways to use this command which are detailed in the PostgreSQL docs, but they primarily differ in the format of the output.

The two most common ways I've used them are to produce a dump file of the database that is either in plain readable text filled with SQL commands, or to produce the output in PostgreSQL's custom binary format. I do appreciate the readability of the plain text output for the sake of curiosity and being able to take a look at what is going on, but in actuality there is very little practical use as these things are usually quite large.

In contrast the custom binary dump is not human readable, but it is compressed to save space and better handles restoration without causing confusing errors associated with constraint violations while loading table data, which come up a lot in the plain text method.

Again you could use the same cron task entry which would run the script backup.sh in the first minute of the first hour of every day.

1 1 * * * bash /path/to/backup.sh  

Here is an example bash script for creating a back up using pg_dump in custom binary format:

#!/bin/bash

TODAY=`date +%F`  
ARCHIVE=databasename-$TODAY.bk  
echo creating archive file $ARCHIVE  
cd /var/lib/pgsql/9.6/backups  
pg_dump -Fc databasename > $ARCHIVE  
echo 'archive complete'  

If one wanted to produce the plain text SQL output then you would swap out the line pg_dump -Fc databasename > $ARCHIVE with pg_dump databasename > $ARCHIVE.

Restoring the Database

Recovering a database can be a daunting task, as this is usually done in a time where you've expirienced a serious crash plus various team members and stakeholders my be in an emotionally charged frenzy. I hope this is something that you as a developer will never have to experience, but I am writing this artcle knowing that it is likely to be an unfortunate event for a certain percent of its readers. That being said remember just to relax and take it one step or command at time. After all, the worst has already happened so you have no where to go but up.

A. Restoring from a File System Backup

In order to restore from the WAL / file system backup you need to follow these steps:

  1. Stop the PostgreSQL server if it is running on what is likely to be a new Linux server
  2. Remove the existing files and directories under the data directory usually found in /var/lib/pgsql/x.x/data where "x.x" is the version of PostgreSQL that you are running, such as 9.6. Also make note of who owns these files and directories, which are usually owned by the postgres user
  3. Take your latest archived copy of the database that was created in the backup.sh script and depcompress it in the postresql data directory. Make sure the ownership of these files are the same ones as you noted in step 2
  4. Turn off all other apps or services that may try to access the database. It is also a good idea to modify your pg_hba.conf file to ensure no other users are able to access that database
  5. Create a recovery.conf file in the data directory and place the following command in it which is basically a reverse of the archive_command in the postgresql.conf file, discussed earlier. Again, make sure this file has the same owner as noted in step 2

    recovery.conf

    restore_command 'cp /home/username/pg/wals/%f %p'

  6. Start the PostgreSQL service, which will put the database into recovery mode utilizing the recovery.conf file's command to restore any data that resided in the WAL (write ahead log) after the point-in-time copy was made.

Whew! You're done.

B. Restoring from a SQL Dump Backup

Restoring a database from a SQL dump that was created using pg_dump involves far fewer steps than the previous method, but you will almost certainly recover less data using this method as you will not be utilizing the WAL to pick up any additional data that didn't make it into the point-in-time copy.

To restore the database you need to make sure the postgresql service is running on the machine you want to restore the database on. Then all that is required is to issue the following command as a user that has full write permissions to the PostgreSQL database.

pg_restore -Fc databasename-YYYY-MM-DD.bk  

Postgres will read the backup file and recreate the tables, indexes, and constraints as well as load the tables with the data present at the moment the original database was copied.

And voila, you are done!

I prefer this method not only because it is less complicated, but also I prefer this method of backup and recovery because as a developer I often find it useful, or even required, that I be able to easily load a production database on to my local development machine. This allows me to debug or develop a new feature with data that was recent and similar as possible to that of the live system, and SQL dumps make this possible.

So, even if you are using the WAL / file system method for backups and recovery you can still use pg_dump and pg_restore to make copies and restore your local dev machine.

Conclusion

In this article I have demonstrated two approaches to creating scheduled backups for a PostgreSQL database running on a Linux server and perform recovery in case the database were to become corrupt or otherwise unusable. I have also discussed the pros and cons of each method as well as discussed the need for a developer to know how to accomplish these tasks.

As always I thank you for reading and welcome comments and criticisms below.

Author image
Lincoln, Nebraska Twitter