PostgreSQL on EBS: Moving to AWS Part 3

2011-01-31 19:00:00 -0500

Databases and EBS: What you need to know.

Just a few things that you should know about EBS.

EBS is slow

All your data travels over a network before it reaches a disk, or data from thedisk reaches your instance. This means that writes and reads can be slow orintermittent at times.

Further compounding the issue, your SAN is shared with hundreds (thousands?) ofother users! While these machines are some high powered “big iron”, it stillmeans you’re going to have I/O contention and a number of other issues.

Even further, your disk access is metered! This means all those operations aretickling tiny little counters. This isn’t a lot in reality, but it all adds up!

On the bright side, EC2 instances have a lot of RAM. Let’s play to the field!

PostgreSQL Configuration and Use

This might sound a little preachy and redundant, but here goes:

Indexes!

No database should go without being properly indexed, from head to toe, witheverything you query upon and the vast majority of the combinations you use inyour queries. Yes, write performance will suffer, but we’re about to renderthat much less troublesome by sending the writes to RAM as frequently aspossible.

Less time spent searching tables = less disk access = greater performance.

Build queries to be sent over the network

If you’re doing anything with an ORM, you’re probably guilty of this at leastonce or twice: building your queries to be sent to the app to be handled later.You know those kooky DBA types that say “do everything in the database”, well,they’re on to something here.

Well, ‘lo and behold you do something like this:

When something like this:

Would have not only likely saved you a lot of computational cycles, but quite abit of network traffic is reduced, and continues to pay off as your tables growin size. This happens a lot in the rails community, unfortunately.

(Yes, I’m aware this example is a bit contrived. You could easily prepare thatquery with find() or ARel’s composition methods.)

The skinny: the less you do in the database the more you’re spending on networkresources and time to deliver your result. The database is probably working Ntimes as hard, too, to deliver your responses.

Even if it takes the “pretty” out of your code, do it in the database.

Shared Buffer Cache

Shared Buffer Cache is the meat and potatoes of PostgreSQL tuning. Increasingthis value will greatly decrease the frequency at which your data is flushed todisk. An EC2 Large Instance will happily accomodate a 4GB PostgreSQLinstallation which would be more than enough for lots of reasonably traffickedapplications.

Why is this important? The less time it spends writing to disk, or the lessfrequently it writes to disk, can mean a lot for your application’sperformance!

Database backups on the cloud

We have a few options for backing things up. As usual with redundancy, the bestoption is to… be redundant. (See what I did there?) Using a strategy thatallows us the best of both worlds.

EBS snapshots

You’ve already seen our snapshot script:

Which iterates over your volumes and maintains the last 5 backups.Here is a detailed account of the script’s function.

We use the script, amongst other things, to back up our database partitions,which are composed of the database master, the transaction log, and the backupsof the WAL.

WAL archiving

Write Ahead Logging and Continuous Archiving for Point in Time Recoveryis a pretty sticky topic and you would do yourself well to read that whole document.

Instead of repeating it here verbatim, I’ll tell you what our backup script does:

This script manages the archiving of three tarballs:

base.tar.bz2, the base database system
full-wal.tar.bz2, the whole WAL for the last day.
pit-wal.tar.bz2, the point in time portion of the WAL.

The major difference between ‘full-wal’ and ‘pit-wal’ is that at the time thefirst backup is taken (the night of the backup), the data may not be fullycommitted to disk. Therefore, we write as much as we can to the ‘pit-wal’ filefor the purposes of crashes that day. The ‘full-wal’, as you might suspect, isthe fully written representation and is actually written out a day after thebackup occurred.

In a recovery scenario, both of these tarballs would be merged with theexisting WAL files in order of ‘pit-wal’, then ‘full-wal’ would be unpacked.

The WAL directory itself has some data hidden in the filenames, let’s checkthat out:

2011-02-01 09:05 000000030000000300000026
2011-02-01 09:05 000000030000000300000026.000076B8.backup
2011-02-01 10:12 000000030000000300000027
2011-02-01 11:30 000000030000000300000028
2011-02-01 12:57 000000030000000300000029
2011-02-01 14:10 00000003000000030000002A
2011-02-01 14:58 00000003000000030000002B
2011-02-01 15:30 00000003000000030000002C

The filenames themselves hold two important pieces of information:

The first 8 characters of the filename are the recovery version. As we’re good little children and test our backups, this is at version 3.
The last 8 characters of the filename are ordered, you can see this by comparing the times and the filenames themselves.
If there is an extension, that is a demarcation point where pg_start_backup()/pg_stop_backup() was invoked. This is what we use to create the ‘full-wal’ tarball.

As for the backup structure? Well, here’s a sneak peek:

2011-01-28 09:05 2011-01-27.09:00:01/
2011-01-29 09:05 2011-01-28.09:00:01/
2011-01-30 09:05 2011-01-29.09:00:01/
2011-01-31 09:05 2011-01-30.09:00:01/
2011-02-01 09:05 2011-01-31.09:00:01/
2011-02-01 09:07 2011-02-01.09:00:01/

The $today and $yesterday calls just generate these filenames. At the endof the script, we see this idiom:

cd $backup_dir
ls -1d * | sort -rn | tail -n +15 | xargs rm -vr 
cd $OLDPWD

Which is a way of saying, “show us the last 15 dirs and delete the rest”. Thiskeeps our filesystem size low and we rsync these files nightly.

The sed usage here is a little tricky but not anything incomprehensible. Basically,

breakpoint=`ls *.backup | sort -r | head -n1 | sed -e 's/\..*$//'

Finds the latest backup file. Now,

arline=`ls | sort | sed -ne "/^$breakpoint$/ =" `
archive=`ls | sort | head -n $arline`

Uses that as a demarcation point to determine the archive files. Those filesare archived and removed and result in full-wal. The rest leftover result inpit-wal.