Amazon Elastic Block Store for facile sharing and archiving of biological data

Amazon Web Services offers enormous potential for people who need to process, store, and share large amounts of data.

And it’s a huge boon for bioinformatics. It’s cost effective and it’s fasta. Hah. Get it? It’s “>fasta”. Archiving and sharing data has never been easier.

Here’s a quick tutorial on creating an Elastic Block Store volume that you can share with your colleagues.

1. Create a volume

  • From the AWS Management Console, click on the EC2 tab, then on “Elastic Block Store > Volumes”
  • Click on “Create Volume”.
  • Pick an appropriate size for your volume. For EBS volumes that I am going to use to store and archive data, I create a volume 1.5 times the size of the data. This lets me store an unpacked version and a packed version simultaneously, making it easy to update data at a later date.
  • Add some informative tags.
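
The 1.5x rule of thumb is easy to script. Here's a minimal sketch, assuming you know the size of your data in bytes. The name `ebs_size_gib` is made up for illustration, and it rounds up because volume sizes are given in whole GiB.

```shell
# Hypothetical helper: given a data size in bytes, print a volume size in GiB
# that is 1.5 times the data, rounded up to the next whole GiB.
ebs_size_gib() {
    ebs_bytes=$1
    ebs_gib=1073741824    # bytes per GiB
    # ceil(bytes * 1.5 / GiB), using integer arithmetic only
    echo $(( (ebs_bytes * 3 + 2 * ebs_gib - 1) / (2 * ebs_gib) ))
}

# Example: 10 GiB of data
ebs_size_gib 10737418240    # prints 15
```

If you have GNU du, you could feed it the real size with something like `ebs_size_gib "$(du -sb /path/to/data | cut -f1)"`, where `/path/to/data` is a placeholder for your own directory.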

2. Attach the volume to an EC2 instance.

From the Volumes window in the Management Console, select the new volume, then right-click and select “Attach”. I attach devices starting at

3. Format the volume.

Once you’ve created a volume and attached it to an EC2 instance, you’ll need to format it. Fire up the instance and SSH in.

ssh -i
> sudo mkfs.ext3 /dev/sdf

Device names are available from /dev/sdf through /dev/sdp.

4. Mount the volume

> sudo mkdir /mnt/data
> sudo mount -t ext3 /dev/sdf /mnt/data

If you expect to deal with many versions of your data over time, you might want to version your mount points. This lets you attach multiple EBS volumes at sensibly named directories:

> sudo mkdir /mnt/data-v0.2
> sudo mount -t ext3 /dev/sdf /mnt/data-v0.2

Alternatively, you might consider handling versioning when creating snapshots of your volume.

5. Set the EBS volume to mount automatically (optional)

> sudo emacs /etc/fstab
/dev/sdf /mnt/data ext3 defaults 0 0
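
If you'd rather not open an editor, the entry can be appended from the shell. This is a rough sketch, not the one true way: `add_fstab_entry` is a hypothetical helper, and it assumes the ext3 filesystem and default options used above. It skips the append when the device is already listed, so running it twice is harmless.

```shell
# Hypothetical helper: append a mount entry to an fstab-style file unless an
# entry for that device is already present.
add_fstab_entry() {
    fstab_file=$1 fstab_dev=$2 fstab_dir=$3
    if grep -q "^${fstab_dev}[[:space:]]" "$fstab_file" 2>/dev/null; then
        echo "$fstab_dev already listed in $fstab_file"
    else
        printf '%s %s ext3 defaults 0 0\n' "$fstab_dev" "$fstab_dir" >> "$fstab_file"
    fi
}

# Try it on a scratch file before touching the real /etc/fstab
fstab_scratch=$(mktemp)
add_fstab_entry "$fstab_scratch" /dev/sdf /mnt/data
cat "$fstab_scratch"
```

Once you're happy with it, run it as root against /etc/fstab, then use `sudo mount -a` to check that the new entry mounts cleanly.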

And you’re done! Now what?

Throw some data on there. Do some computes. Go nuts.

Share your data

Sharing your data is as easy as creating a snapshot.

1. Create a snapshot

Power down your instance. From the Management interface, select the volume and choose “Create Snapshot”.

Tips for effective data archiving and sharing

1. Add informative tags.

Be sure to add informative tags such as the release date and version of the data.

Release Date = 02 Jan 2011
Source = Todd’s Data Emporium
Contact =

2. Include informative READMEs on the volume itself.
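
A README can carry the same information as your tags, so it travels with the data. Here's a minimal sketch. `write_readme` and the "Contents" line are my own invention for illustration, so adapt the fields to whatever your users need.

```shell
# Hypothetical helper: write a plain-text README into the given directory
# (for example, the mount point of the volume).
write_readme() {
    readme_dir=$1 readme_date=$2 readme_source=$3
    cat > "$readme_dir/README" <<EOF
Release Date = $readme_date
Source = $readme_source
Contents = (describe the files and directory layout here)
EOF
}

# Example on a scratch directory; on the volume you would pass /mnt/data
readme_scratch=$(mktemp -d)
write_readme "$readme_scratch" "02 Jan 2011" "Todd's Data Emporium"
cat "$readme_scratch/README"
```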

3. Be sure to make the snapshot public!

Updating your data

Updating your data to the next release of your resource is simple. Attach the original volume to an instance and mount it, copy in the new data, then create a new snapshot.

Hide ‘n Seek: What to do with empty data fields?

We’ve been working on a fundamental website redesign for a hefty biological database.

One design dilemma has been what to do with empty data fields. For example, on a Gene Summary we might have a “Variation” field listing variations found in the gene. Obviously, not all genes have variations.

Displaying field labels with empty contents clearly delineates the limits of our knowledge or curation, but at the same time leads to more visually confusing pages.

Current options we’re considering are:

1. Omit the field entirely.

Known unknowns (apologies to D. Rumsfeld), if you don’t know what you might know, you don’t know how much you do know. Or something like that.

2. Display the field label, but with empty contents.


3. Display the field label with a string:

Variations: no data available

This offers the same advantage as above, namely that gaps in our knowledge or curation are clearly indicated. But sparse entries become visually thick very fast.
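
To make the three options concrete, here's a toy shell sketch. `render_field` and its mode names are hypothetical and have nothing to do with our actual site code. It only shows how each option treats an empty value.

```shell
# Hypothetical sketch of the three display options for a possibly empty field.
# mode is one of: omit, blank, placeholder
render_field() {
    rf_label=$1 rf_value=$2 rf_mode=$3
    # Fields with data are always shown in full
    if [ -n "$rf_value" ]; then
        echo "$rf_label: $rf_value"
        return
    fi
    case $rf_mode in
        omit)        ;;                                      # option 1: print nothing
        blank)       echo "$rf_label:" ;;                    # option 2: label only
        placeholder) echo "$rf_label: no data available" ;;  # option 3: label plus string
    esac
}

render_field "Variations" "" placeholder    # prints "Variations: no data available"
```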

We’re currently experimenting with other design patterns for handling this situation, too, including using color to de-emphasize empty fields or allowing users to turn off their display as a configuration option.

What do you prefer? Would you rather see all available data fields on a report page even if they’re empty? Or are you a minimalist who prefers that empty fields be hidden?

A Worthy Mercurial (hg) Tutorial from Joel Spolsky

At WormBase, we’ve been busy re-writing the website from the ground up to build a modern information discovery space that will generically handle genomic data.

As the project manager, I made the executive decision to switch from CVS/SVN to a distributed version control system (DVCS). I’d used both git and mercurial personally for over a year and enjoyed their flexibility.

And given the already distributed nature of our project, DVCS was a natural fit. (In fact, I believe that DVCS should be roundly adopted across the genomics/bioinformatics research sector precisely for this reason.)

Nonetheless, for small teams accustomed to the quirks of SVN, the transition to DVCS can be a rocky road. Recently I came across Joel Spolsky’s excellent HG Init: Mercurial Tutorial.

If you’re considering or in the process of switching to Mercurial, I highly recommend checking out Joel’s tutorial and circulating it to your team.

Managing multiple Perl module directories

If you develop in Perl or act as a system administrator, you have undoubtedly come up against the hassle of managing local collections of Perl modules.

I’ve tried everything in the past. I’ve built modules by hand specifying Makefile.PL prefix paths. I’ve flattened architecture specific directories. I’ve lived through the introduction of Module::Build and the inconsistencies between it and EUMM. I’ve built bundles, packages, even virtual machines. I’ve scripted in the shell and with CPAN/CPANplus.

Still, maintaining distinct directories of Perl modules for multiple current applications was a pain. Until now.

local::lib gets around the tedium of maintaining local Perl libraries. It modifies environment variables for you so you don’t have to screw with -I, INSTALL_BASE, --install_base, or PREFIX. Best of all, you can continue to use CPAN, too!

Here’s how easy it is:

 # Install local::lib globally (assuming you have sudo/root)
 $ sudo perl -MCPAN -e 'CPAN::install("local::lib")'
 # Change to your project's library directory
 $ cd ~/my_project/extlib
 # Print the environment settings that make this your local lib dir
 $ perl -Mlocal::lib=./

 # Update your environment for the current shell
 $ eval $(perl -Mlocal::lib=--self-contained,./)

 # Install a module
 $ perl -MCPAN -e 'CPAN::install(GD::SVG)'

A thing of beauty, really.