Point In Time Recovery From Backup using PostgreSQL Continuous Archving

2012-03-09 13:55:44 -0500

Last year, around this time we, published details of how we setup PostgresSQL backups using Continuous Archiving. This technique involves continuous WAL archives combined with a daily "baseline" backup script:

Backups are only half the picture, though; the ability to easily restore data is just as critical. Unfortunately, the recovery process, as documented, can be a bit complex. Here is an distilled, exact step-by-step procedure we use to recover from backup.

Prerequsites

This assumes that hot-backups are enabled in postgres. Ensure the following are setup in postgresql.conf on the box where the backups are located

wal_level = hot_standby                 # minimal, archive, or hot_standby
archive_mode = on               # allows archiving to be done
archive_command = 'cp -n %p /backup/postgresql/%f'

Understanding the Backup Folder Structures

The PostgreSQL WAL-based backup approach is structured on a base backup, and then a set of transaction logs that can be applied on it.

Via the backup script, daily baselines are stored in the PostgreSQL backup folder, /backup/pg_backups, in folders like this:

drwxr-xr-x    2 106      113           4096 Aug 24 05:15 2011-08-23.09:00:01
drwxr-xr-x    2 106      113           4096 Aug 25 05:16 2011-08-24.09:00:01
drwxr-xr-x    2 106      113           4096 Aug 26 05:16 2011-08-25.09:00:01
drwxr-xr-x    2 106      113           4096 Aug 27 05:16 2011-08-26.09:00:01
drwxr-xr-x    2 106      113           4096 Aug 27 05:19 2011-08-27.09:00:01

In this case, the folder is named after the current date and the base backup. For example, the folder named 2011-08-25.09:00:01 would contain the base backup from 9 AM UTC, along with the WAL log files archived between 2011-08-25.09:00:01 and 2011-08-26.09:00:01. Thus, to restore to any point in that time you should use the files from that backup folder:

  1. base.tar.bz2 - full backup of the postgresql data folder (i.e. /var/lib/postgresql/9.x/main)
  2. pit-wal.tar.bz2 - point in time logs captured for transactions active while the base snapshot was being taken
  3. full-wal.tar.bz2 - the archive of all transaction logs present between the base backup and the following day's base backup

In the case of the current date, the folder corresponding to today's date will contain the base backup (base.tar.bz2) and the point in time logs pit-wal.tar.bz2. However, the full-wal.tar.bz2 file will not be present because the day's logs have not been archived yet. Instead, the current day's archive logs are stored in the archive logs directory, /backup/postgresql.

There are two options for data restoration:

  1. restore up to the last backup point in the current date
  2. restore to a previous day's backups

The process between these two is basically the same, the only difference is whether the WAL logs are extracted from the full-wal.tar.bz2 or taken from /backup/postgresql

Step 1. Copy data backup

Copy all the files from the appropriate backup folder to your home directory. The following assumes you have the archive files in the home directory of ~postgres (substitute as necessary).

In order to perform the restore, we'll create a temporary directory in which to stage the files.

cd ~postgres
mkdir restore
cd restore

Step 2. Extract base data and PIT logs

tar xjf ~postgres/base.tar.bz2
tar xjf ~postgres/pit-wal.tar.bz2
cd backup/postgresql

Step 3. Extract WAL data

  • If you are restoring from a previous day
    tar xjf ~postgres/full-wal.tar.bz2
  • If you are restoring from the current day
    cp /path/to/today/backup/postgresql/* .

Step 4. Shutdown Postgres

# shutdown postgres
sudo su -
service postgresql stop

Step 4. Restore main data directory

Start by moving into the base version directory. Move the existing main folder out of the way, then move the restored base back into place

cd /var/lib/postgresql/9.x/
mv main main.bak
mv ~postgres/restore/var/lib/postgresql/9.x/main main
cd main

Step 5. Create a recovery.conf

Now that the base backup is restored, you must to tell PostgreSQL how to apply the recovery procedure. First, create file called recovery.conf in the data directory, /var/lib/postgresql/9.x/main/recovery.conf. The contents should minimally include the following line, adjusted to the location of the WAL files.

restore_command = 'cp /home/postgres/restore/backup/postgresql/%f %p'

If you only want to apply transactions up to a specific point in time, for instance to recover to the middle of a specific day, add the recovery_target_time (note the time stamp will be relative to UTC):

recovery_target_time = '2011-08-27 04:00:00.0'

Step 6. Start PostgreSQL

service postgresql start

When PostgreSQL has finished restoring the files the recovery.conf file will be renamed to recovery.done.

Note: Information and additional details from:

 

blog comments powered by Disqus