xinetd is great when it’s working but can be a complete pain to debug when things go wrong. As a start, try launching it in the foreground in debugging mode:
/usr/sbin/xinetd -d -dontfork
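If the daemon starts cleanly in debug mode but a service still misbehaves, the next thing to check is that service's config file, since xinetd's debug output will complain about entries it can't parse. A minimal entry looks something like this (the service name, port, and server path here are hypothetical placeholders, and `type = UNLISTED` is needed because the port isn't in /etc/services):

```
service myservice
{
    type         = UNLISTED
    disable      = no
    socket_type  = stream
    protocol     = tcp
    wait         = no
    user         = nobody
    server       = /usr/local/bin/myserver
    port         = 9000
}
```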
GitHub’s “Organizations” feature is a great tool for distributed bioinformatics teams. Here’s how I migrated some of our repositories from Mercurial to Git to take advantage of it.
After much evangelizing, weeping, and wailing, I finally convinced everyone at one highly geographically and functionally distributed project that we should at least try consolidating our code in one place.
Currently we have legacy repositories in CVS, mid-range projects in SVN, new development in Git and Mercurial, and AFAIK a bunch of code in no SCM system at all.
Given that a DVCS doesn’t offer the directory-level granularity of SVN, we definitely don’t want to consolidate everything into a single repository. So far, GitHub seems to offer the best solution with its “Organizations” feature, which lets a team group multiple repositories under a single umbrella with a shared news feed and administration. Perfect.
hg-git looks like a useful tool if you want to maintain code in both Git and Mercurial. I don’t. Here’s how I handled a full-scale migration of our repositories:
todd> cd ~/projects
todd> git clone http://repo.or.cz/r/fast-export.git
todd> mkdir new_git_repository ; cd new_git_repository
todd> git init
todd> ../fast-export/hg-fast-export.sh -r ~/projects/old_hg_repository
todd> git checkout HEAD
todd> git remote add origin git@github.com:[organization]/[reponame].git
todd> git push origin master
Bing! And you’re done. Or whatever.
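If you have a pile of repositories to move, it helps to review the whole plan before touching anything. Here’s a small sketch that just prints the steps above for each Mercurial repository it finds; the fast-export path, repository locations, and organization name are assumptions, so adjust to taste:

```shell
#!/bin/sh
# Sketch: print the migration steps for a batch of Mercurial repositories
# so you can review them before running anything for real.

migration_plan() {
    hg_repo="$1"                   # path to an existing Mercurial repository
    org="$2"                       # GitHub organization name (assumed)
    name=$(basename "$hg_repo")    # reuse the directory name for the new repo
    cat <<EOF
mkdir $name && cd $name
git init
../fast-export/hg-fast-export.sh -r $hg_repo
git checkout HEAD
git remote add origin git@github.com:$org/$name.git
git push origin master
EOF
}

# Print the plan for each directory under ~/projects that holds an hg repo.
for repo in "$HOME"/projects/*/; do
    [ -d "$repo/.hg" ] && migration_plan "${repo%/}" my-organization
done
```

Swap the final `echo`-style review for real execution only once the printed commands look right.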
Amazon’s Web Services offers enormous potential for people who need to process, store, and share large amounts of data.
And it’s a huge boon for bioinformatics. It’s cost effective and it’s fasta. Hah. Get it? It’s “>fasta”. Archiving and sharing data has never been easier.
Here’s a quick tutorial on creating an Elastic Block Store volume that you can share with your colleagues.
1. Create a volume.
From the Volumes window in the Management Console, click “Create Volume” and choose a size. Be sure to create the volume in the same availability zone as the instance you plan to attach it to.
2. Attach the volume to an EC2 instance.
From the Volumes window in the Management Console, select the new volume, then right-click and select “Attach”. You’ll need a running EC2 instance to attach to; fire one up if you haven’t already. Device names /dev/sdf through /dev/sdp are available; I attach devices starting at /dev/sdf.
3. Format the volume.
Once the volume is attached, SSH into the instance and format the volume before its first use:
> sudo mkfs.ext3 /dev/sdf
4. Mount the volume
> sudo mkdir /mnt/data
> sudo mount -t ext3 /dev/sdf /mnt/data
If you are going to be dealing with many versions of data over time, you might want to version your mount points. This lets you attach multiple EBS volumes at sensibly named directories:
> sudo mkdir /mnt/data-v0.2
> sudo mount -t ext3 /dev/sdf /mnt/data-v0.2
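If you settle on versioned mount points, a tiny helper keeps the naming consistent. This sketch just prints the commands (so you can eyeball them before running with sudo); the device names and version strings are assumptions:

```shell
#!/bin/sh
# Sketch: emit the mkdir/mount commands for a versioned data mount point,
# following the /mnt/data-<version> convention above.

mount_data_release() {
    device="$1"     # e.g. /dev/sdf (assumed device name)
    version="$2"    # e.g. v0.2 (your data release label)
    mount_point="/mnt/data-${version}"
    echo "sudo mkdir -p $mount_point"
    echo "sudo mount -t ext3 $device $mount_point"
}

# Example: commands for two releases living on two attached volumes.
mount_data_release /dev/sdf v0.2
mount_data_release /dev/sdg v0.3
```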
Alternatively, you might handle versioning when creating snapshots of your volume.
5. Set the EBS volume to mount automatically (optional)
> sudo emacs /etc/fstab
/dev/sdf /mnt/data ext3 defaults 0 0
And you’re done! Now what?
Throw some data on there. Do some computes. Go nuts.
Sharing your data is as easy as creating a snapshot.
1. Create a snapshot
Power down your instance. From the Management Console, select the volume and choose “Create Snapshot”.
2. Add informative tags.
Be sure to add informative tags such as the release date and version of the data.
Release Date = 02 Jan 2011
Source = Todd’s Data Emporium
Contact = email@example.com
3. Include informative READMEs on the volume itself.
4. Be sure to make the snapshot public!
Updating your data to the next release of your resource is simple. Mount the original volume to an instance, copy in new data, then create a new snapshot.
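That update cycle can be sketched as a script, too. This one only prints the steps for review; the device, paths, volume id, and the old ec2-api-tools snapshot command are assumptions, so substitute your own:

```shell
#!/bin/sh
# Sketch of the release-update cycle: mount, copy in new data, unmount,
# snapshot. Prints the commands rather than executing them.

release_update_plan() {
    volume_id="$1"    # EBS volume id, e.g. vol-12345678 (assumed)
    new_data="$2"     # directory holding the new data release
    version="$3"      # label for the new snapshot
    cat <<EOF
sudo mount -t ext3 /dev/sdf /mnt/data
rsync -av $new_data/ /mnt/data/
sudo umount /mnt/data
ec2-create-snapshot $volume_id -d "data release $version"
EOF
}

# Example: plan the update for one volume and one new release directory.
release_update_plan vol-12345678 /tmp/new_release v0.3
```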
We’ve been working on a fundamental website redesign for a hefty biological database.
One design dilemma has been what to do with empty data fields. For example, on a Gene Summary we might have a “Variation” field listing variations found in the gene. Obviously, not all genes have variations.
Displaying field labels with empty contents clearly delineates the limits of our knowledge or curation, but at the same time leads to more visually confusing pages.
Current options we’re considering are:
1. Omit the field entirely.
The risk: known unknowns become unknown unknowns (apologies to D. Rumsfeld). Readers can’t tell whether a gene truly has no variations or whether we simply haven’t curated them yet.
2. Display the field label, but with empty contents.
3. Display the field label with a string:
Variations: no data available
This offers the same advantage as above, namely that gaps in our knowledge or curation are clearly indicated. But sparse entries become visually cluttered very quickly.
We’re currently experimenting with other design patterns for handling this situation, too, including using color to de-emphasize empty fields or allowing users to turn off their display as a configuration option.
What do you prefer? Would you rather see all available data fields on a report page even if they’re empty? Or are you a minimalist who prefers that empty fields be hidden?