Todd Harris, PhD

Facilitating scientific discovery at the intersection of genetics, genomics, bioinformatics, big data, cloud computing, and open science.

Happy belated birthday, Mendel!

July 21, 2011 By Todd Harris

Gregor Mendel: Geneticist, Rastafarian Luminary.

Photoshop art from my old grad school days.

Filed Under: Uncategorized Tagged With: genetics, Mendel, photoshop

Debugging xinetd configuration problems

June 19, 2011 By Todd Harris

xinetd is great when it’s working but can be a complete pain to debug when things go wrong. As a start, try launching it in the foreground in debugging mode:

   /usr/sbin/xinetd -d -dontfork
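
If the debug output suggests that a service never gets registered, one thing worth ruling out is a "disable = yes" line in its configuration file. The following is a minimal sketch rather than a complete diagnostic: it assumes service files live in /etc/xinetd.d (the conventional include directory, but check the includedir line in your xinetd.conf), and the function name list_disabled_services is made up for illustration.

```shell
# List xinetd service files that contain an active "disable = yes" line.
# The directory defaults to /etc/xinetd.d (an assumption; adjust to your setup).
list_disabled_services() {
    conf_dir=${1:-/etc/xinetd.d}
    for conf_file in "$conf_dir"/*; do
        [ -f "$conf_file" ] || continue
        # Match "disable = yes" with arbitrary spacing. Commented-out lines
        # start with "#" and so do not match the anchored pattern.
        if grep -Eq '^[[:space:]]*disable[[:space:]]*=[[:space:]]*yes' "$conf_file"; then
            basename "$conf_file"
        fi
    done
}
```

Calling list_disabled_services with no argument scans /etc/xinetd.d; pass another directory to scan a copy of your configuration instead.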

Filed Under: cloud, development Tagged With: AMI, xinetd

GitHub’s “Organizations” for distributed #bioinformatics dev; migrating from Mercurial

February 12, 2011 By Todd Harris

GitHub.com’s “Organizations” is a great tool for distributed bioinformatics teams. Here’s how I migrated some of our repositories from Mercurial to Git to take advantage of this feature.

After much evangelizing, weeping, and wailing, I finally convinced everyone at one of my highly geographically and functionally distributed projects that we should at least try consolidating our code in one place.

Currently we have old legacy repositories in CVS, mid-range projects in SVN, new development in Git and Mercurial, and AFAIK a bunch of code in no SCM system at all.

Given that a DVCS doesn’t have the directory-level granularity of SVN, we definitely don’t want to consolidate everything in a single repository. So far, it seems that GitHub offers the best solution with its “Organizations” feature. This lets a team group multiple repositories under a single umbrella with a shared news feed and administration. Perfect.

hg-git looks like a useful tool if you want to maintain code in both Git and Mercurial. I don’t. Here’s how I handled a full-scale migration of our repositories:

todd> cd ~/projects
todd> git clone http://repo.or.cz/r/fast-export.git
todd> mkdir new_git_repository ; cd new_git_repository
todd> git init
todd> ../fast-export/hg-fast-export.sh -r ~/projects/old_hg_repository
todd> git checkout HEAD
todd> git remote add origin git@github.com:[organization]/[reponame].git
todd> git push origin master

Bing! And you’re done. Or whatever.
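
If you have several repositories to move, the same steps can be wrapped in a function. This is only a sketch of the commands above, not a tested migration tool. The function names github_remote_url and migrate_hg_to_git and their argument order are made up for illustration, and all paths should be absolute because the function changes directory.

```shell
# Build the SSH remote URL for a repository inside a GitHub organization.
github_remote_url() {
    printf 'git@github.com:%s/%s.git\n' "$1" "$2"
}

# Convert one Mercurial repository and push it to GitHub.
# Usage: migrate_hg_to_git /abs/fast-export /abs/old_hg_repo /abs/new_git_repo org repo
migrate_hg_to_git() {
    fast_export_dir=$1
    hg_repo=$2
    git_repo=$3
    org=$4
    repo=$5
    (   # Run in a subshell so the caller's working directory is left alone.
        mkdir -p "$git_repo" && cd "$git_repo" &&
        git init &&
        "$fast_export_dir/hg-fast-export.sh" -r "$hg_repo" &&
        git checkout HEAD &&
        git remote add origin "$(github_remote_url "$org" "$repo")" &&
        git push origin master
    )
}
```

Each step is chained with &&, so a failed conversion stops before anything is pushed.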

Filed Under: development Tagged With: DVCS, git, mercurial, project management

Amazon Elastic Block Store for facile sharing and archiving of biological data

February 10, 2011 By Todd Harris

Amazon’s Web Services offers enormous potential for people who need to process, store, and share large amounts of data.

And it’s a huge boon for bioinformatics. It’s cost-effective and it’s fasta. Hah. Get it? It’s “>fasta”. Archiving and sharing data has never been easier.

Here’s a quick tutorial on creating an Elastic Block Store volume that you can share with your colleagues.

1. Create a volume

  • From the AWS Management Console, click on the EC2 tab, then on “Elastic Block Store > Volumes”
  • Click on “Create Volume”.
  • Pick an appropriate size for your volume. For EBS volumes that I am going to use to store and archive data, I create a volume 1.5 times the size of the data. This lets me store an unpacked version and a packed version simultaneously, making it easy to update data at a later date.
  • Add some informative tags.
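
The 1.5x sizing rule and the volume creation can also be scripted. This is a rough sketch, assuming the AWS command line tool (aws) is installed and configured, which the console-based walkthrough here does not require. The helper names ebs_size_gb and create_data_volume are hypothetical, and the availability zone you pass in is up to you.

```shell
# Round 1.5 x the data size up to a whole number of GiB.
ebs_size_gb() {
    data_gb=$1
    echo $(( (data_gb * 3 + 1) / 2 ))
}

# Create a volume sized for the data set in the given availability zone.
# The volume must be in the same zone as the instance it will be attached to.
# Usage: create_data_volume <data size in GiB> <availability zone>
create_data_volume() {
    aws ec2 create-volume \
        --size "$(ebs_size_gb "$1")" \
        --availability-zone "$2"
}
```

For example, 100 GiB of data yields a 150 GiB volume, leaving room for both the packed and unpacked copies.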

2. Attach the volume to an EC2 instance.

From the Volumes window in the Management Console, select the new volume, then right-click and select “Attach”. I attach devices starting at

3. Format the volume.

Once you’ve created a volume and attached it to an EC2 instance, you’ll need to format it before it can be mounted. SSH in to the instance and create a filesystem:

ssh -i [keyfile] [user]@yourdns.amazonaws.com
> sudo mkfs.ext3 /dev/sdf

Device names are available from /dev/sdf through /dev/sdp.
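
When several volumes are attached, it helps to know which device name is still free. Here is a small sketch, assuming attached volumes show up under exactly the names listed here (on some kernels they appear under different names, such as /dev/xvdf). The function name next_free_device and its optional directory argument are hypothetical.

```shell
# Print the first device name from sdf through sdp that does not exist yet.
# The optional argument is the directory to look in (default: /dev).
next_free_device() {
    dev_dir=${1:-/dev}
    for letter in f g h i j k l m n o p; do
        if [ ! -e "$dev_dir/sd$letter" ]; then
            echo "$dev_dir/sd$letter"
            return 0
        fi
    done
    echo "no free device name between sdf and sdp" >&2
    return 1
}
```

The function only inspects the instance itself, so a volume that is attached in the console but not yet visible to the kernel will not be counted.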

4. Mount the volume

> sudo mkdir /mnt/data
> sudo mount -t ext3 /dev/sdf /mnt/data

If you expect to deal with many versions of your data over time, you might want to version your mount points. This will allow you to attach multiple EBS volumes at different sensible directories:

> sudo mkdir /mnt/data-v0.2
> sudo mount -t ext3 /dev/sdf /mnt/data-v0.2

Alternatively, you might consider handling versioning when creating snapshots of your volume.

5. Set the EBS volume to mount automatically (optional)

> sudo emacs /etc/fstab
/dev/sdf /mnt/data ext3 defaults 0 0
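
Editing /etc/fstab by hand is fine for a single volume. If you script your setup, a sketch along the following lines appends the entry only once. The helper name add_fstab_entry is hypothetical, the ext3 options simply mirror the line above, and you would need to run it as root to write to the real /etc/fstab.

```shell
# Append "<device> <mount point> ext3 defaults 0 0" to an fstab file,
# unless an entry for that mount point is already present.
# The third argument (default: /etc/fstab) lets you try it on a copy first.
add_fstab_entry() {
    device=$1
    mount_point=$2
    fstab=${3:-/etc/fstab}
    # The second whitespace-separated field of an fstab line is the mount point.
    if awk -v mp="$mount_point" \
        '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$fstab"; then
        echo "entry for $mount_point already exists in $fstab" >&2
        return 0
    fi
    printf '%s %s ext3 defaults 0 0\n' "$device" "$mount_point" >> "$fstab"
}
```

Running it against a scratch copy of the file first is a cheap way to check the result before touching the real one.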

And you’re done! Now what?

Throw some data on there. Do some computes. Go nuts.

Share your data

Sharing your data is as easy as creating a snapshot.

1. Create a snapshot

Power down your instance. From the Management interface, select the volume and choose “Create Snapshot”.

Tips for effective data archiving and sharing

1. Add informative tags.

Be sure to add informative tags such as the release date and version of the data.

Release Date = 02 Jan 2011
Source = Todd’s Data Emporium
Contact = data@tharris.org

2. Include informative READMEs on the volume itself.

3. Be sure to make the snapshot public!
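
The snapshot and sharing steps can be scripted as well. What follows is a sketch, assuming the AWS command line tool (aws) is installed and configured; the function name publish_snapshot is hypothetical. Granting create-volume permission to the "all" group is what makes a snapshot public.

```shell
# Snapshot a volume, wait for the snapshot to complete, then make it public.
# Prints the new snapshot ID.
# Usage: publish_snapshot <volume id> <description>
publish_snapshot() {
    volume_id=$1
    description=$2
    snapshot_id=$(aws ec2 create-snapshot \
        --volume-id "$volume_id" \
        --description "$description" \
        --query SnapshotId --output text) || return 1
    aws ec2 wait snapshot-completed --snapshot-ids "$snapshot_id" || return 1
    # Let any AWS account create volumes from this snapshot.
    aws ec2 modify-snapshot-attribute \
        --snapshot-id "$snapshot_id" \
        --attribute createVolumePermission \
        --operation-type add \
        --group-names all > /dev/null || return 1
    echo "$snapshot_id"
}
```

Putting the release version in the description gives colleagues something to search for when they look up the snapshot.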

Updating your data

Updating your data to the next release of your resource is simple. Attach the original volume to an instance and mount it, copy in the new data, then create a new snapshot.

Filed Under: bioinformatics, development Tagged With: cloud, data archiving, EBS, EC2

Welcome!
My name is Todd Harris. A geneticist by training, I now work at the intersection of biology and computer science developing tools and systems to organize, visualize, and query large-scale genomic data across a variety of organisms.

I'm driven by the desire to accelerate the pace of scientific discovery and to improve the transparency and reproducibility of the scientific process.
