PostgreSQL on EBS: Moving to AWS Part 3

2011-01-31 19:00:00 -0500


Databases and EBS: What you need to know.


Just a few things that you should know about EBS.

EBS is slow

All your data travels over a network before it reaches a disk, or before data from the disk reaches your instance. This means that writes and reads can be slow or intermittent at times.

Further compounding the issue, your SAN is shared with hundreds (thousands?) of other users! While these machines are some high powered “big iron”, it still means you’re going to have I/O contention and a number of other issues.

Even further, your disk access is metered! This means all those operations are tickling tiny little counters. This isn’t a lot in reality, but it all adds up!

On the bright side, EC2 instances have a lot of RAM. Let’s play to the field!

PostgreSQL Configuration and Use


This might sound a little preachy and redundant, but here goes:

Indexes!

No database should go without being properly indexed, from head to toe, with everything you query upon and the vast majority of the combinations you use in your queries. Yes, write performance will suffer, but we’re about to render that much less troublesome by sending the writes to RAM as frequently as possible.

Less time spent searching tables = less disk access = greater performance.

Build queries to be sent over the network

If you’re doing anything with an ORM, you’re probably guilty of this at least once or twice: building your queries to be sent to the app to be handled later. You know those kooky DBA types that say “do everything in the database”? Well, they’re on to something here.

Well, ‘lo and behold you do something like this:

When something like this:

Would not only have saved you a lot of computational cycles, but also quite a bit of network traffic, and it continues to pay off as your tables grow in size. This happens a lot in the Rails community, unfortunately.

(Yes, I’m aware this example is a bit contrived. You could easily prepare that query with find() or ARel’s composition methods.)

The skinny: the less you do in the database, the more you’re spending on network resources and time to deliver your result. The database is probably working N times as hard, too, to deliver your responses.

Even if it takes the “pretty” out of your code, do it in the database.

Shared Buffer Cache

Shared Buffer Cache is the meat and potatoes of PostgreSQL tuning. Increasing this value will greatly decrease the frequency at which your data is flushed to disk. An EC2 Large Instance will happily accommodate a 4GB PostgreSQL installation, which would be more than enough for lots of reasonably trafficked applications.

Why is this important? The less time PostgreSQL spends writing to disk, and the less frequently it does so, the better your application performs!

Database backups on the cloud


We have a few options for backing things up. As usual with redundancy, the best option is to… be redundant. (See what I did there?) We use a strategy that gives us the best of both worlds.

EBS snapshots

You’ve already seen our snapshot script:

Which iterates over your volumes and maintains the last 5 backups. Here is a detailed account of the script’s function.

We use the script, amongst other things, to back up our database partitions, which are composed of the database master, the transaction log, and the backups of the WAL.

WAL archiving

Write Ahead Logging and Continuous Archiving for Point in Time Recovery is a pretty sticky topic, and you would do yourself well to read that whole document.

Instead of repeating it here verbatim, I’ll tell you what our backup script does:

This script manages the archiving of three tarballs:

  • base.tar.bz2, the base database system
  • full-wal.tar.bz2, the whole WAL for the last day.
  • pit-wal.tar.bz2, the point in time portion of the WAL.

The major difference between ‘full-wal’ and ‘pit-wal’ is that at the time the first backup is taken (the night of the backup), the data may not be fully committed to disk. Therefore, we write as much as we can to the ‘pit-wal’ file for the purposes of crashes that day. The ‘full-wal’, as you might suspect, is the fully written representation and is actually written out a day after the backup occurred.

In a recovery scenario, both of these tarballs would be merged with the existing WAL files: ‘pit-wal’ is unpacked first, then ‘full-wal’.

The WAL directory itself has some data hidden in the filenames. Let’s check that out:

2011-02-01 09:05 000000030000000300000026
2011-02-01 09:05 000000030000000300000026.000076B8.backup
2011-02-01 10:12 000000030000000300000027
2011-02-01 11:30 000000030000000300000028
2011-02-01 12:57 000000030000000300000029
2011-02-01 14:10 00000003000000030000002A
2011-02-01 14:58 00000003000000030000002B
2011-02-01 15:30 00000003000000030000002C

The filenames themselves hold three important pieces of information:

  • The first 8 characters of the filename are the recovery version (what PostgreSQL calls the timeline ID). As we’re good little children and test our backups, this is at version 3.
  • The last 8 characters of the filename are ordered; you can see this by comparing the times and the filenames themselves.
  • If there is an extension, that is a demarcation point where pg_start_backup()/pg_stop_backup() was invoked. This is what we use to create the ‘full-wal’ tarball.
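As a minimal sketch of the naming scheme described above, the function below splits a WAL filename into its three 8-character hex fields. The function name wal_fields is hypothetical and is not part of our backup script; it assumes the standard 24-character segment name, optionally followed by a suffix such as .000076B8.backup.

```shell
# Split a WAL segment filename into its three 8-character hex fields.
# wal_fields is an illustrative helper, not part of the backup script.
wal_fields() {
    name=${1%%.*}                                 # drop any ".XXXXXXXX.backup" suffix
    first=$(printf '%s' "$name" | cut -c1-8)      # recovery version (timeline)
    middle=$(printf '%s' "$name" | cut -c9-16)    # middle field
    last=$(printf '%s' "$name" | cut -c17-24)     # ordered sequence field
    printf '%s %s %s\n' "$first" "$middle" "$last"
}
```

Calling wal_fields 00000003000000030000002A prints the three fields separated by spaces, which makes it easy to compare recovery versions or sort on the last field.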

As for the backup structure? Well, here’s a sneak peek:

2011-01-28 09:05 2011-01-27.09:00:01/
2011-01-29 09:05 2011-01-28.09:00:01/
2011-01-30 09:05 2011-01-29.09:00:01/
2011-01-31 09:05 2011-01-30.09:00:01/
2011-02-01 09:05 2011-01-31.09:00:01/
2011-02-01 09:07 2011-02-01.09:00:01/

The $today and $yesterday calls just generate these filenames. At the end of the script, we see this idiom:

cd $backup_dir
ls -1d * | sort -rn | tail -n +15 | xargs rm -vr
cd $OLDPWD

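Which is a way of saying, “keep the 14 newest dirs and delete the rest” (tail -n +15 starts printing at the 15th line, so everything from the 15th entry onward is removed). This keeps our filesystem size low, and we rsync these files nightly.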
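To check what that idiom would remove before letting rm loose, a dry-run variant can help. This is a sketch only: prune_candidates and its keep argument are hypothetical names, and it uses a plain reverse sort, which is sufficient because the directory names begin with an ISO-style date. Note that tail -n +N begins output at line N, so keeping K entries means starting at line K+1.

```shell
# List the entries that would be pruned from a backup directory,
# keeping the newest $2 entries (default 14, matching "tail -n +15").
# prune_candidates is an illustrative helper, not part of the backup script.
prune_candidates() {
    dir=$1
    keep=${2:-14}
    # Newest first, then skip the first $keep lines; what remains is old.
    ( cd "$dir" && ls -1d -- * | sort -r | tail -n +"$((keep + 1))" )
}
```

Once the output looks right, piping it to xargs rm -vr from inside the backup directory reproduces the original behavior.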

The sed usage here is a little tricky but not anything incomprehensible. Basically,

breakpoint=`ls *.backup | sort -r | head -n1 | sed -e 's/\..*$//'`

Finds the latest backup file. Now,

arline=`ls | sort | sed -ne "/^$breakpoint$/ =" `
archive=`ls | sort | head -n $arline`

Uses that as a demarcation point to determine the archive files. Those files are archived and removed, and result in full-wal. The leftover files result in pit-wal.
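Put together, the selection logic can be tried out safely against a copy of a WAL directory. The sketch below wraps the same pipeline in a function that only prints the files that would land in full-wal; the name full_wal_files is hypothetical, and it uses $(...) in place of backticks purely for readability.

```shell
# Print the files that would go into full-wal: every entry up to and
# including the segment named by the newest *.backup marker.
# full_wal_files is an illustrative helper, not part of the backup script.
full_wal_files() {
    ( cd "$1" || exit 1
      # Newest marker, with everything from the first dot onward removed.
      breakpoint=$(ls *.backup | sort -r | head -n1 | sed -e 's/\..*$//')
      # Line number of that segment in the sorted directory listing.
      arline=$(ls | sort | sed -ne "/^$breakpoint\$/ =")
      ls | sort | head -n "$arline" )
}
```

Everything the function does not print (including the .backup marker itself, which sorts after its segment) is what would be left over for pit-wal.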

Happy Hacking!


Let's talk about EBS, baby: Moving to AWS Part 2

2011-01-30 19:00:00 -0500


EBS — or the Amazon Elastic Block Store — is the way you get persistence on most EC2 instances. Let's talk about what EBS is good for, what it's not good for, and why it matters to the EC2 consumer.

About EBS

EBS is basically a volume system with dynamic attachment. You go into the EC2 system, select an EBS "volume", and attach it to an instance. There are additional ways, such as EBS rooting, to use EBS volumes.

EBS is implemented at Amazon via a Storage Area Network (SAN) that is dynamically attached to your instances. Each EBS "volume" you attach is a portion of the disks that make up the SAN; a portion that can and will be allocated sparsely.

This has performance drawbacks. EBS can be very slow and unresponsive at points (there are no availability guarantees on EC2 for any of its products), so it's important that your EBS-related task can handle intermittent outages even if very small. For the most part, things that need read performance or will block on writes will suffer. There are ways you can mitigate it, such as "striping" volumes, but in practice this is very troublesome.

What's the difference between EBS root and the instance store?

EBS rooting is where the device that your root-level filesystem lives on is an EBS volume. This is different because instance stores are ephemeral, and will disappear after the machine is stopped. Therefore, it is wise to use EBS rooted machines for machines you want to last.

What's the difference between an EBS volume and a traditional physical disk?

The major difference (other than the provisioning, of course) is that EBS volumes will not necessarily be available at boot time, so you must account for that with any non-root volumes.
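One way to cope with a volume that shows up late is to poll for it before mounting. The following is a minimal sketch under stated assumptions, not a description of our own boot scripts: wait_for_path is a hypothetical helper, and the attempt count and delay are arbitrary defaults.

```shell
# Poll until a path exists, giving up after a number of attempts.
# Usage: wait_for_path PATH [ATTEMPTS] [DELAY_SECONDS]
# wait_for_path is an illustrative helper; the defaults are arbitrary.
wait_for_path() {
    path=$1
    attempts=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        [ -e "$path" ] && return 0    # the device node (or file) has appeared
        sleep "$delay"
        i=$((i + 1))
    done
    return 1                          # still missing after all attempts
}
```

In an init script this might be used as wait_for_path /dev/sdf && mount /dev/sdf /data, where both the device name and the mount point are placeholders for whatever your instance actually uses.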

How does Zetetic use EBS?

We use it for two roles:

  • Database Servers (with a high RAM setting)
  • Support machines (monitoring, repositories, wiki, ticket tracker, etc)

We'll talk about the database management in the next article; our support machines are very simple in execution but require a lot of configuration, so automating them is a bit of a bear.

Why not run everything as EBS root?

You could do that, but EBS is billed on a per-transaction (writes and reads) basis, and then there is the performance issue. EBS-rooted volumes additionally have a reboot penalty which (at the time of writing) is a dollar. That can get expensive quick! It's probably best to stick to EBS rooting where configuration management is hard and leave the rest to instance stores.

Hopefully this article has been a decent overview of EBS; next time we will cover PostgreSQL management in the cloud!


Introducing Codebook 1.5.1

2011-01-26 19:00:00 -0500


On January 22 we released Codebook 1.5.1 to the iTunes App Store. This is the best version of Codebook yet, with many bug fixes, interface enhancements, and a really cool new feature: Sync with Dropbox. Read on to find out what’s new in our secure notebook app.

Sync with Dropbox

Codebook 1.5 has a new Sync tab that provides a means for you to upload and sync your data with an encrypted replica stored in your Dropbox account. Simply enter your Dropbox account information and start a sync, and when it’s all done you’ll see that you’ve got a new folder named “Codebook” in your Dropbox account, containing your encrypted database file.

This resolves a number of long-standing feature requests from our users. No longer do you need to rely on the unreliable iTunes to keep a backup of your most important data. Not only that, but with an encrypted copy up on Dropbox, you have an incredibly easy means of restoring your data if it is somehow lost. Sync often!

But we didn’t just stop there. Thanks to our Ditto replication technology, the Sync feature allows you to share your data across multiple devices. If you use Codebook on your iPhone and an iPad, you can use the sync feature to replicate your data across both of them (or any other iOS devices you’re rocking). Just run Sync and Codebook will take care of the rest.

Note: The Sync feature is only available to users who purchase the Unlimited Upgrade inside Codebook. More detailed information about how Sync works is available on our Codebook FAQ.

Edit Screen Updated

This update to Codebook is all about details. If you remember what the edit screen looked like in Codebook 1.4 or 1.3, you’ll find this screen has seen some serious improvements, mostly to the effect of putting things where they ought to be and making it look and feel better. For instance, the background no longer squishes and resizes when the keyboard is revealed and hidden!

The new toolbar at the bottom of the screen provides a new trash-icon button to delete the current note; that’s an FAQ item we can remove. Power users figured out the swipe-to-delete thing on the list view screen, but it was still a nuisance to have to pop back to the previous screen to delete the current note, because you’d then have to locate the note again. And if you weren’t hip to swipe-to-delete, it just looked like you couldn’t delete a note in Codebook.

Also on the toolbar is a relocated forward-icon that reveals a modal menu of Sharing Options (currently just the ability to e-mail the document somewhere, unencrypted, mind you). This just seems like a much better way of making the e-mail feature available without the user accidentally kicking it off (thereby dumping plain-text data into the Mail app and potentially mailing it by mistake). More handy little buttons will probably appear in this toolbar going forward.

And then there’s the add-icon, the big + button on the right side of the navigation bar. If you need to tap in a few notes, one after the next, now you can! It was getting really annoying to have to go back to the main list view just to create a new note.

Finally, we’ve started using the tear-off-page style animation to transition the view when you hit Add or Delete on the edit screen. Being able to seamlessly transition the views like this (whether you use animation or not) is what allows us to provide these two capabilities on the edit screen without forcing the user to go back to the list view first.

Graphics for Retina Display

We finally got all the icons and background graphics updated to support the hi-res display of the iPhone 4, and it’s such a huge improvement to the app. You’ve got to see it to understand the value here, but I’ll say that if you regularly use Codebook on an iPhone 4 it will be literally relieving to your eyes to interact with this new version. To be quite honest, does anyone like using apps on the iPhone 4 that don’t support Retina Display? I sure don’t.

Rotation and Landscape Orientation

If you prefer to work in landscape mode, you’re really going to love this release of Codebook. All screens (except for the login and dropbox link screens) support landscape orientation. Codebook 1.4.7 supported landscape-orientation for note editing, but it needed some love (like stretching the text view to the expanded view to make more of the current note visible while editing!) Codebook 1.5 brings a ton of tweaks to the rotation behavior, and makes landscape orientation available on the Sync and Settings screens, too.

Bugfixes

Version 1.5.0 had a mean little bug on the edit screen—if you switched to another application while editing, certain things happened to inaccurately re-draw the screen upon returning to Codebook that garbled the text and moved it off screen. No data was actually harmed, but it freaked people out and was obviously insanely annoying. We actually pulled the app from iTunes before more than a handful of people could install that one.

Codebook 1.4.7 suffered from a bug where the list view of notes would suddenly disappear. The data was still there, but the app had to be restarted to work around the problem. The 1.5 series of Codebook fixes this issue.


gem-testers: Great QA Justice in a Gem

2011-01-18 19:00:00 -0500


If you're my age (I had a MC Hammer cassette tape in elementary school) and a rubyist, chances are you've done a dash of Perl in your time. Whatever the reason may be, we are now rubyists, and writing our web applications, database interaction and daemons in Ruby.

When it comes to Perl vs. Ruby, two things immediately stand out as weighing heavily in Perl's favor:

  • Documentation
  • Quality Assurance

Today, I'm focusing on the latter. In fact, it has been my full-time focus for the last two months outside of work, and a healthy chunk of the last two years.

"Anything your QA infrastructure can do, mine can do better" - anonymous, possibly fictional Perl programmer

And they're right. Compared to the swath of tools available to the eager Perl tester, Ruby's testing facilities look weak. Granted, compared to the programming spectrum as a whole, Ruby's not so bad. Perl, however, has an amazingly great testing credo in which several things happen even if you don't care:

  1. When you install a module from CPAN, it is tested before it can be installed.
  2. Day in day out, results from testing are reported to a site where they can be reviewed by library authors. This is SERIOUS BUSINESS.
  3. Good Perl programmers simply do not use libraries that aren't tested or documented. This is fact.

Note that I am not saying "use Perl" here. Perl is a fine system, but we're attracted to Ruby for a reason, right? Instead, let's make Ruby better. We need testing on a variety of environments to fix bugs in our gems. Gem authors frequently do not have the resources that, say, the whole freakin' internet has.

Gem Testers and its companion gem, rubygems-test, are an attempt to provide these things. The gem testers system is pretty crude at this point, but should be a good start. Here's what it does when you run gem test rubygems-test:

  • Runs 'rake test' for the provided gem.
  • Captures the output of the system and records any exit status.
  • Reports this to the Gem Testers Website.
  • Returns a happy little URL you can give to people when you write that angry trouble ticket about their gem.

Additionally, gem users are unfortunately more and more frequently finding themselves confronted with gems that are inadequately suited to their needs. This isn't necessarily because they are bad; maybe they haven't been maintained, or simply don't work on their preferred platform. Gem Testers is not just for gem authors; users can quickly work with the results to adequately vet the stability of a particular gem (or even a particular version). Evaluating libraries can be one of the more important things you do when starting a new application or library feature.

Josiah Kiehl and I have made a herculean effort with the help of people on the rubygems and various ruby platform teams to support a wide range of ruby platforms. All of these rubies run rubygems-test on Windows, Mac, and other POSIX systems:

  • MRI 1.8
  • MRI 1.9
  • Rubinius 1.2
  • JRuby 1.5 and 1.6beta
  • Windows versions include both Luis Lavena's RubyInstaller and the 'garbagecollect' Visual C builds.

You're out of excuses

rubygems-test is easy to integrate, most likely supports your preferred platform, and gives you testing that you simply don't have time or resources to produce on your own. Heck, even gem building systems like Hoe and Ore are adding support for it. Why not take the plunge? I dare you to!

Want to contribute?

The entire project is open source, down to the Capfile. You can help us! rubygems-test and gem-testers have all the fixings for you to help!


Doing it right: Moving to AWS Part 1

2011-01-17 19:00:00 -0500


Greetings folks, I'm Erik Hollensbe, recent Zetetic recruit. One of my first tasks here has been to move Tempo, this site, and our marketing sites for products like Strip and Connect to AWS, specifically EC2.

Why EC2?

I'm going to presume you've heard the standard-fare cloud rhetoric; so I'm going to just skip that part and go straight into the meat:

EC2 has a remarkable commandline-driven API:

Anything you can do in the web interface (and then some) is represented in Unix-philosophy ("do one thing and do it well") Java or Ruby-based utilities. For example, we have a script that takes nightly snapshots of our in-use EBS volumes and then prunes everything but the last 5 of these:

You can see here we are able (for the most part) to cleanly parse and manage each individual step of the process from start to finish, having an easily accessible point anywhere in the script to break it, and check whether the moving parts are functioning in the way we suspect.

For example, ec2-describe-volumes yields output like this:


VOLUME vol-5ab82932 15 snap-ad6f2234 us-east-1d in-use 2010-12-14T17:07:21+0000
ATTACHMENT vol-5ab82932 i-f0370bed /dev/sda1 attached 2010-12-14T17:07:24+0000
TAG volume vol-5ab82932 Name somebox-root

(That's one line)

End-over-end enumerating all your volumes, which can be pulled apart with classic unix dissection utilities like sort, awk, and perl. You can see with the information being supplied to the script above, how the hard work is done for us, leaving us to extract the pieces we care about, culminating in a relatively boilerplate-free experience.
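As a small illustration of that kind of dissection, here is a sketch that pulls the IDs of in-use volumes out of output shaped like the sample above. The function name in_use_volumes is hypothetical, and the field positions are assumptions read off that sample (record type first, volume ID second, state sixth); a VOLUME record with a missing column would shift them.

```shell
# Read ec2-describe-volumes style output on stdin and print the IDs of
# volumes whose state is "in-use". Field positions are assumed from the
# sample output above; in_use_volumes is an illustrative helper.
in_use_volumes() {
    awk '$1 == "VOLUME" && $6 == "in-use" { print $2 }'
}
```

Used as ec2-describe-volumes | in_use_volumes, this yields one volume ID per line, which is a convenient shape for feeding a loop that snapshots each volume.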

There are 101 tools at a quick count of my ec2-api-tools directory (filtering duplicates for Windows and legacy systems), and that's not even counting the ec2-ami-tools, which are used to build custom systems (more on that later).

EC2 was built by network engineers for network engineers

One of the most useful features of EC2 is its networking stack. EC2 machines are behind a NAT (10/8 spanning all DCs) and assigned external IPs dynamically. One can change the external routable IPs using the "Elastic IP" system. These IP addresses can be attached and detached from systems about as liberally as you attach a thumb drive to machines, giving you maximum flexibility. No more waiting for pesky TTLs to expire or having to change 20 /etc/hosts files or scripts.

EC2 also supports "security groups", which are basically network roles. Roles can be used in firewall rules to create logical partitioning between machines in your network.

For example, you can have a set of machines in the 'webserver' class and another set of machines in the 'database' class that only accept connections on port 5432 (the standard PostgreSQL port) from 'webservers'. This means no matter how many 'webservers' you create, or how many times they change IP addresses, your firewall rules still keep out the hax0rs and let webservers talk to your databases. These rules are port/protocol dependent and the "security group" takes the place of CIDR notation for in/out.

Get as detailed or simple as you want

AMIs, EBS and instance stores, what you do with EC2 is up to you. If you want a classic (oy, calling it classic is a bit much, but hey) VPS, you can run with one of the pre-rolled EBS rooted instances. If you want a shiny new cloud system, you can run with instance stores which, when terminated, disappear forever. You can even roll your own kernels and initrd's to customize your system with the AMI toolkit. We will dive deeper into these in upcoming articles.

"Ok Erik, my brain's getting tired. What can I expect for next time?"

This is going to be a minimum 4-part series (including this one) which will at least cover:

  1. EBS management both on and off the root device — here
  2. PostgreSQL on the cloud - management and backup tips — here
  3. Deploying Web Applications for volatile systems

Thanks for reading! I will be happy to field any comments below. -Erik