Last year, around this time we, published details of how we setup PostgresSQL backups using Continuous Archiving. This technique involves continuous WAL archives combined with a daily "baseline" backup script:
Backups are only half the picture, though; the ability to easily restore data is just as critical. Unfortunately, the recovery process, as documented, can be a bit complex. Here is an distilled, exact step-by-step procedure we use to recover from backup.
This assumes that hot-backups are enabled in postgres. Ensure the following are setup in postgresql.conf on the box where the backups are located
wal_level = hot_standby # minimal, archive, or hot_standby archive_mode = on # allows archiving to be done archive_command = 'cp -n %p /backup/postgresql/%f'
Understanding the Backup Folder Structures
The PostgreSQL WAL-based backup approach is structured on a base backup, and then a set of transaction logs that can be applied on it.
Via the backup script, daily baselines are stored in the PostgreSQL backup folder, /backup/pg_backups, in folders like this:
drwxr-xr-x 2 106 113 4096 Aug 24 05:15 2011-08-23.09:00:01 drwxr-xr-x 2 106 113 4096 Aug 25 05:16 2011-08-24.09:00:01 drwxr-xr-x 2 106 113 4096 Aug 26 05:16 2011-08-25.09:00:01 drwxr-xr-x 2 106 113 4096 Aug 27 05:16 2011-08-26.09:00:01 drwxr-xr-x 2 106 113 4096 Aug 27 05:19 2011-08-27.09:00:01
In this case, the folder is named after the current date and the base backup. For example, the folder named 2011-08-25.09:00:01 would contain the base backup from 9 AM UTC, along with the WAL log files archived between 2011-08-25.09:00:01 and 2011-08-26.09:00:01. Thus, to restore to any point in that time you should use the files from that backup folder:
- base.tar.bz2 - full backup of the postgresql data folder (i.e. /var/lib/postgresql/9.x/main)
- pit-wal.tar.bz2 - point in time logs captured for transactions active while the base snapshot was being taken
- full-wal.tar.bz2 - the archive of all transaction logs present between the base backup and the following day's base backup
In the case of the current date, the folder corresponding to today's date will contain the base backup (base.tar.bz2) and the point in time logs pit-wal.tar.bz2. However, the full-wal.tar.bz2 file will not be present because the day's logs have not been archived yet. Instead, the current day's archive logs are stored in the archive logs directory, /backup/postgresql.
There are two options for data restoration:
- restore up to the last backup point in the current date
- restore to a previous day's backups
The process between these two is basically the same, the only difference is whether the WAL logs are extracted from the full-wal.tar.bz2 or taken from /backup/postgresql
Step 1. Copy data backup
Copy all the files from the appropriate backup folder to your home directory. The following assumes you have the archive files in the home directory of ~postgres (substitute as necessary).
In order to perform the restore, we'll create a temporary directory in which to stage the files.
cd ~postgres mkdir restore cd restore
Step 2. Extract base data and PIT logs
tar xjf ~postgres/base.tar.bz2 tar xjf ~postgres/pit-wal.tar.bz2 cd backup/postgresql
Step 3. Extract WAL data
- If you are restoring from a previous day
tar xjf ~postgres/full-wal.tar.bz2
- If you are restoring from the current day
cp /path/to/today/backup/postgresql/* .
Step 4. Shutdown Postgres
# shutdown postgres sudo su - service postgresql stop
Step 4. Restore main data directory
Start by moving into the base version directory. Move the existing main folder out of the way, then move the restored base back into place
cd /var/lib/postgresql/9.x/ mv main main.bak mv ~postgres/restore/var/lib/postgresql/9.x/main main cd main
Step 5. Create a recovery.conf
Now that the base backup is restored, you must to tell PostgreSQL how to apply the recovery procedure. First, create file called recovery.conf in the data directory, /var/lib/postgresql/9.x/main/recovery.conf. The contents should minimally include the following line, adjusted to the location of the WAL files.
restore_command = 'cp /home/postgres/restore/backup/postgresql/%f %p'
If you only want to apply transactions up to a specific point in time, for instance to recover to the middle of a specific day, add the recovery_target_time (note the time stamp will be relative to UTC):
recovery_target_time = '2011-08-27 04:00:00.0'
Step 6. Start PostgreSQL
service postgresql start
When PostgreSQL has finished restoring the files the recovery.conf file will be renamed to recovery.done.
Note: Information and additional details from:
- The continuing archive documentation: http://www.postgresql.org/docs/9.1/static/continuous-archiving.html
- Zetetic blog post documenting exact setup of our archiving mode http://zetetic.net/blog/2011/02/01/postgresql-on-ebs-moving-to-aws-part-3