I think Amazon S3 is awesome. I was looking into building a RAID NAS (Network-Attached Storage) for backing up all my important data and I nearly bought a setup that would run into the hundreds of dollars - but then I did a little fancy multiplication and addition and realized S3 would cost me less than one one-hundredth what the NAS would have cost.
In case you’re as in the dark about S3 as I recently was, here’s a little rundown: it’s a very simple, super fast, extremely large storage system that Amazon uses for all of its own storage needs. Amazon has opened the service up to the public on a pay-as-you-go basis.
Costs:
- $0.15 per GB-Month of storage used
- $0.20 per GB of data transferred
In other words, it’s damn cheap. Say you want to upload 500 MS Word documents that are around 45KB each. How much would it cost to keep those somewhere easy to access, highly secure, and permanent? Less than one cent per month. Eight years later you’d only be out about a third of a dollar ($0.32 to be precise).
So S3 is my new storage/backup location of choice. The one difficulty is that I need some way of automating the backups, so that they actually happen and can easily be recovered if necessary. In particular, I need some way to automatically back up the website data that is so crucial to me.
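For the curious, here is a rough sketch of that arithmetic in Ruby. It assumes decimal units (1GB = 1,000,000KB) and counts storage only, which is what reproduces the $0.32 figure. The method and constant names are made up for illustration.

```ruby
# Prices quoted in the list above.
STORAGE_PER_GB_MONTH = 0.15 # dollars per GB-month stored
TRANSFER_PER_GB      = 0.20 # dollars per GB transferred

# Dollars spent storing `gigabytes` for `months`.
def storage_cost(gigabytes, months)
  gigabytes * STORAGE_PER_GB_MONTH * months
end

# Dollars spent transferring `gigabytes` once (the initial upload).
def transfer_cost(gigabytes)
  gigabytes * TRANSFER_PER_GB
end

# 500 documents at roughly 45KB each, assuming 1GB = 1,000,000KB.
gigabytes = 500 * 45 / 1_000_000.0

puts format("per month:   $%.4f", storage_cost(gigabytes, 1))
puts format("eight years: $%.2f", storage_cost(gigabytes, 8 * 12))
puts format("upload once: $%.4f", transfer_cost(gigabytes))
```

The one-time upload adds less than half a cent on top of the storage bill, so it barely moves the total.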
So I made a plugin.
The S3 plugin will allow you to back up your crucial website data to S3 via a handy Rake task (written by the talented Adam Greene).
Amazon has been an excellent supporter of Ruby/Rails lately (they fund 43things.com among other things) and they’ve made sure to release a Ruby library for S3. I’ve combined that with Adam’s S3 Rake task into a handy S3 backup plugin.
You can install it via the following two commands:
    ruby script/plugin source http://svn.6brand.com/projects/plugins
    ruby script/plugin install -x s3
Then backing up is as easy as:
    rake s3:backup:db
    rake s3:backup:code
    rake s3:backup:scm
or, to get them all together:
    rake s3:backup
13 responses so far ↓
1 adam greene // Jan 08, 2007 at 01:04 PM
wow! I was just starting to do the same thing. But this looks great.
The next version I was working on uses the AWS::S3 gem, as it significantly cuts down on the amount of code. Plus I need to integrate pushing my static files to S3, so I was going to add that in as well. I'm wondering if we should combine 'forces' ;)
thanks for this, Adam
2 Danger // Jan 10, 2007 at 02:16 PM
3 Aryk // Jan 31, 2007 at 06:29 PM
4 Danger // Jan 31, 2007 at 09:17 PM
5 Aryk // Feb 01, 2007 at 03:27 PM
6 Aryk // Feb 01, 2007 at 03:30 PM
7 Aryk // Feb 01, 2007 at 03:30 PM
8 Aryk // Feb 01, 2007 at 03:31 PM
9 Danger // Feb 01, 2007 at 07:10 PM
10 Ben // Nov 05, 2007 at 02:49 AM
Nice work! After sorting out a little trouble with server times, it worked a treat.
I’m in the process of adding tasks to handle backup and retrieve of any assets in the ‘shared’ directory (as used by the standard Capistrano deploys).
It’d be good to get a bit more automation on the retrieve stage as well.
I’ll forward you the code when I’ve finished.
11 Ben // Nov 05, 2007 at 05:03 AM
Hm. There’s a problem. The tar.gz file for code is being created properly and seems to be stored correctly; however, the retrieve task isn’t finding it.
It’s probably looking in the wrong bucket. I’ll see if I can fix it and upload the changes.
12 Ben // Nov 05, 2007 at 05:06 AM
By the way, the retrieve fails silently, leaving an XML error in place of the tar.gz file.
If you’re using this, check that retrieve works properly. Currently there’s no automation for this task, so you’d have to untar and unzip the file manually in any case.
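One way to catch that silent failure is to check that the retrieved file really is a gzip archive before trusting it. The sketch below is not part of the plugin; `gzip_file?` and `verify_backup!` are hypothetical helpers that rely only on the fact that every gzip file starts with the bytes 0x1F 0x8B.

```ruby
# Every gzip stream starts with these two magic bytes (RFC 1952).
GZIP_MAGIC = "\x1F\x8B".b

# True when the file at `path` starts with the gzip magic bytes.
# An XML error document saved under a .tar.gz name will return false.
def gzip_file?(path)
  File.open(path, "rb") { |f| f.read(2) } == GZIP_MAGIC
end

# Hypothetical guard: returns `path` if it looks like gzip, raises otherwise.
def verify_backup!(path)
  return path if gzip_file?(path)

  raise "#{path} is not a gzip archive; S3 may have returned an XML error instead"
end
```

Calling something like `verify_backup!` right after a retrieve would turn the silent failure into a loud one. It only inspects the header, so it won't notice an archive that is truncated further in.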
13 Ben // Nov 05, 2007 at 06:12 AM
OK, as I suspected, the retrieve_file function used a hard-coded ‘db’ instead of the passed-in name.
The fix is to change a line in the ‘retrieve_file’ function:
data = conn.get(bucket_name(name), entry_key).object.data
(name was ‘db’ in the original)
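To make the bug concrete, here is a small offline sketch. Only the `conn.get(bucket_name(name), entry_key).object.data` line comes from the plugin. The fake connection, the bucket naming scheme and the method signatures are assumptions made up so the example can run without S3.

```ruby
# Hypothetical stand-ins for the S3 response objects.
Response = Struct.new(:object)
S3Object = Struct.new(:data)

# Hypothetical in-memory connection: { bucket name => { key => data } }.
class FakeConnection
  def initialize(buckets)
    @buckets = buckets
  end

  # Mimics conn.get(bucket, key); data is nil when nothing is found.
  def get(bucket, key)
    Response.new(S3Object.new(@buckets.fetch(bucket, {})[key]))
  end
end

# Assumed naming scheme; the plugin's real bucket_name may differ.
def bucket_name(name)
  "myapp-#{name}"
end

# Original behaviour: always looks in the 'db' bucket, whatever was asked for.
def retrieve_file_buggy(conn, name, entry_key)
  conn.get(bucket_name('db'), entry_key).object.data
end

# Fixed behaviour: uses the passed-in name.
def retrieve_file(conn, name, entry_key)
  conn.get(bucket_name(name), entry_key).object.data
end

conn = FakeConnection.new({
  "myapp-db"   => { "db.tar.gz"   => "db archive" },
  "myapp-code" => { "code.tar.gz" => "code archive" }
})

p retrieve_file_buggy(conn, "code", "code.tar.gz") # looks in myapp-db, finds nothing
p retrieve_file(conn, "code", "code.tar.gz")       # looks in myapp-code
```

With the hard-coded name, only the db retrieve could ever succeed, which matches the symptom of the code archive being stored but never found.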
I’ve changed the tasks to add a prefix (specified in the s3.yml file) to the bucket names to help avoid conflicts. Seems to be working so far… I’ll put the code up on the web somewhere once I’ve integrated it fully and run some more tests.
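Roughly, the prefix idea could look like the sketch below. The `bucket_prefix` key and the hyphen separator are assumptions for illustration; the actual key names in s3.yml aren't shown here.

```ruby
require "yaml"

# Hypothetical s3.yml contents, inlined so the sketch runs on its own.
S3_CONFIG = YAML.load(<<~YML)
  bucket_prefix: myapp
YML

# Builds a bucket name such as "myapp-db" from the configured prefix.
# Falls back to the bare name when no prefix is configured.
def bucket_name(name, config = S3_CONFIG)
  prefix = config["bucket_prefix"].to_s
  prefix.empty? ? name.to_s : "#{prefix}-#{name}"
end

puts bucket_name("db")
puts bucket_name("code")
puts bucket_name("scm", {})
```

Bucket names are shared across all S3 users, so a per-site prefix makes a collision on a generic name like "db" much less likely.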
Next on the agenda: automated retrieve. I’m thinking about the best way to do this – anything to shave vital seconds off a retrieve in a catastrophic failure situation is a bonus. If you have any ideas, let me know. A DB restore would be good, and a more general rake task that can recreate the whole site, put files in the right places, re-point ‘current’ to the new code directory and run a cap deploy:restart…
I have a bit of a problem with the scm functions. They seem to require that the live directory has been directly deployed with a Subversion check-out. Unless you take care to disallow web access to files in the .svn directories, this is probably bad practice as it could potentially make your repository insecure.
If you use an svn export to deploy cleanly (or the default Capistrano method which leaves no .svn directories) the backup tasks can’t find out the SVN login information.
I’ll put in more config options to allow the repository and user details to be specified directly in due course.