Doing it right: Moving to AWS Part 1

2011-01-17 19:00:00 -0500


Greetings folks, I'm Erik Hollensbe, a recent Zetetic recruit. One of my first tasks here has been to move Tempo, this site, and our marketing sites for products like Strip and Connect to AWS, specifically EC2.

Why EC2?

I'm going to presume you've heard the standard-fare cloud rhetoric, so I'll skip that part and go straight to the meat:

EC2 has a remarkable commandline-driven API:

Anything you can do in the web interface (and then some) is available as a Java- or Ruby-based utility in the spirit of the Unix philosophy: "do one thing and do it well". For example, we have a script that takes nightly snapshots of our in-use EBS volumes and then prunes everything but the last 5 of these:

You can see here that we are able (for the most part) to cleanly parse and manage each step of the process from start to finish. We can break into the script at any point and check whether the moving parts are functioning the way we expect.

For example, ec2-describe-volumes yields output like this:


VOLUME vol-5ab82932 15 snap-ad6f2234 us-east-1d in-use 2010-12-14T17:07:21+0000
ATTACHMENT vol-5ab82932 i-f0370bed /dev/sda1 attached 2010-12-14T17:07:24+0000
TAG volume vol-5ab82932 Name somebox-root

(That's one line)

The output enumerates all your volumes end over end, and it can be pulled apart with classic Unix dissection utilities like sort, awk, and perl. With this information supplied to the script above, the hard work is done for us; all that is left is to extract the pieces we care about, which makes for a relatively boilerplate-free experience.
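As a rough illustration of that kind of dissection, here is a minimal Python sketch. It is not the script mentioned earlier. It assumes whitespace-separated records shaped like the sample output, with the volume ID as the second field of each VOLUME record, and it assumes "the last 5" means the five newest snapshots per volume. The function names (`in_use_volume_ids`, `snapshots_to_prune`, `parse_ts`) and the snapshot tuples are hypothetical, introduced only for this example.

```python
from datetime import datetime

# Sample text copied from the ec2-describe-volumes output shown earlier.
SAMPLE = """\
VOLUME vol-5ab82932 15 snap-ad6f2234 us-east-1d in-use 2010-12-14T17:07:21+0000
ATTACHMENT vol-5ab82932 i-f0370bed /dev/sda1 attached 2010-12-14T17:07:24+0000
TAG volume vol-5ab82932 Name somebox-root
"""


def parse_ts(ts):
    """Parse a timestamp in the format used by the sample output."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S%z")


def in_use_volume_ids(describe_output):
    """Return the IDs of volumes whose VOLUME record is marked in-use."""
    ids = []
    for line in describe_output.splitlines():
        fields = line.split()
        # Only records starting with VOLUME carry the volume status.
        if len(fields) >= 2 and fields[0] == "VOLUME" and "in-use" in fields:
            ids.append(fields[1])
    return ids


def snapshots_to_prune(snapshots, keep=5):
    """Pick snapshot IDs to delete, keeping the `keep` newest per volume.

    `snapshots` is an iterable of (snapshot_id, volume_id, started_at)
    tuples; this tuple layout is an assumption made for the sketch.
    """
    by_volume = {}
    for snap_id, vol_id, started in snapshots:
        by_volume.setdefault(vol_id, []).append((started, snap_id))
    doomed = []
    for vol_id in sorted(by_volume):
        newest_first = sorted(by_volume[vol_id], reverse=True)
        # Everything past the first `keep` entries is older and gets pruned.
        doomed.extend(snap_id for _, snap_id in newest_first[keep:])
    return doomed


if __name__ == "__main__":
    print(in_use_volume_ids(SAMPLE))
```

Keeping the selection logic in pure functions like these means each step can be checked on captured output before anything is actually deleted, which is the same "break it anywhere and inspect" property described above.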

There are 101 tools at a quick count of my ec2-api-tools directory (filtering out duplicates for Windows and legacy systems), and that's not even counting the ec2-ami-tools, which are used to build custom systems (more on that later).

EC2 was built by network engineers for network engineers

One of the most useful features of EC2 is its networking stack. EC2 machines sit behind a NAT (10/8 spanning all DCs) and are assigned external IPs dynamically. You can change the externally routable IPs using the "Elastic IP" system. These IP addresses can be attached to and detached from systems about as liberally as you plug a thumb drive into machines, giving you maximum flexibility. No more waiting for pesky TTLs to expire or having to change 20 /etc/hosts files or scripts.

EC2 also supports "security groups", which are basically network roles. Roles can be used in firewall rules to create logical partitioning between machines in your network.

For example, you can have a set of machines in the 'webserver' class and another set of machines in the 'database' class that only accept connections on port 5432 (the standard PostgreSQL port) from 'webserver' machines. This means no matter how many webservers you create, or how many times they change IP addresses, your firewall rules still keep out the hax0rs and let webservers talk to your databases. These rules are port/protocol dependent, and the "security group" takes the place of CIDR notation for in/out.
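To make the idea concrete, here is a toy model in Python of rules keyed on group membership rather than on IP addresses. It is only a sketch of the concept, not how EC2 implements or exposes security groups. The `Rule` class, the `is_allowed` function, and the instance IDs are all hypothetical names made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Allow traffic to members of target_group from members of source_group."""
    target_group: str
    source_group: str
    protocol: str
    port: int


# Hypothetical rule mirroring the example: databases accept
# PostgreSQL connections from webservers only.
RULES = [Rule("database", "webserver", "tcp", 5432)]

# Hypothetical instance IDs mapped to their group memberships.
MEMBERSHIP = {
    "i-web1": {"webserver"},
    "i-web2": {"webserver"},
    "i-db1": {"database"},
    "i-other": set(),
}


def is_allowed(src, dst, protocol, port, rules=RULES, membership=MEMBERSHIP):
    """Return True if any rule permits src -> dst on this protocol and port."""
    src_groups = membership.get(src, set())
    dst_groups = membership.get(dst, set())
    # No IP address appears anywhere: only group membership matters.
    return any(
        r.target_group in dst_groups
        and r.source_group in src_groups
        and r.protocol == protocol
        and r.port == port
        for r in rules
    )


if __name__ == "__main__":
    print(is_allowed("i-web1", "i-db1", "tcp", 5432))
    print(is_allowed("i-other", "i-db1", "tcp", 5432))
```

Adding another webserver in this model is just a new entry in `MEMBERSHIP`; the rule list stays untouched, which is the property that makes group-based rules survive IP changes.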

Get as detailed or simple as you want

AMIs, EBS, and instance stores: what you do with EC2 is up to you. If you want a classic (oy, calling it classic is a bit much, but hey) VPS, you can run with one of the pre-rolled EBS-rooted instances. If you want a shiny new cloud system, you can run with instance stores, which disappear forever when the instance is terminated. You can even roll your own kernels and initrds to customize your system with the AMI toolkit. We will dive deeper into these in upcoming articles.

"Ok Erik, my brain's getting tired. What can I expect for next time?"

This is going to be at least a 4-part series (including this one), which will cover at minimum:

  1. EBS management both on and off the root device — here
  2. PostgreSQL on the cloud - management and backup tips — here
  3. Deploying Web Applications for volatile systems

Thanks for reading! I will be happy to field any comments below. -Erik