Todd Harris, PhD

Facilitating scientific discovery at the intersection of genetics, genomics, bioinformatics, big data, cloud computing, and open science.


Targeted gene deletions in C. elegans using transposon excision

April 28, 2010 By Todd Harris 1 Comment

“Targeted gene deletions in C. elegans using transposon excision” is now available in advance online publication form at Nature Methods.

Even after 40 years of intense genetics in the model system C. elegans, a large majority of genes have not yet been disabled by deletion. Although targeted deletions have been possible in flies and mice for years, the technology has been elusive in worms.


Filed Under: bioinformatics Tagged With: C. elegans, deletions, genetics, publications

A Worthy Mercurial (hg) Tutorial from Joel Spolsky

April 4, 2010 By Todd Harris 1 Comment

At WormBase, we’ve been busy re-writing the website from the ground up to build a modern information discovery space that will generically handle genomic data.

As the project manager, I made the executive decision to switch from CVS/SVN to a distributed version control system (DVCS). I'd used both Git and Mercurial personally for over a year and enjoyed their flexibility.

And given the already distributed nature of our project, DVCS was a natural fit. (In fact, I believe that DVCS should be broadly adopted across the genomics/bioinformatics research sector precisely for this reason.)

Nonetheless, for small teams accustomed to the quirks of SVN, the transition to DVCS can be a rocky road. Recently I came across Joel Spolsky’s excellent HG Init: Mercurial Tutorial.

If you’re considering or in the process of switching to Mercurial, I highly recommend checking out Joel’s tutorial and circulating it to your team.
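For teams making that jump, the day-to-day core of Mercurial is pleasantly small. Here is a minimal sketch of the basic clone/commit/push cycle (the repository URL and file names are hypothetical, for illustration only):

```shell
# Clone a full copy of the repository, history and all
# (URL is hypothetical)
hg clone https://hg.example.org/wormbase-website
cd wormbase-website

# Edit files, then commit locally -- no network connection required
hg status
hg commit -m "Refactor the gene summary widget"

# Pull and merge changes from colleagues, then share your own
hg pull
hg merge
hg commit -m "Merge upstream changes"
hg push
```

The key conceptual shift from SVN is that `hg commit` is local and instantaneous; sharing your work happens as a separate, explicit `hg push` or `hg pull`.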


Filed Under: bioinformatics, development, howto, project management Tagged With: DVCS, mercurial, project management, tutorial

The (im)permanence of online biological resources

October 12, 2009 By Todd Harris Leave a Comment

As the number of online biological databases explodes, how will we ensure their availability over time?

I received a help desk request this morning from a user looking for an old bioinformatics project. The user had followed a broken link to the project's supporting website, a link that appeared in the publication just a few years ago.

This project is now complete, no longer funded, and no longer staffed. The project may be over, but the expectation that the resources it created should continue to exist lives on: in links from the original publication, in citations, and in search engines.

This particular project happens to be hosted on a heavily taxed machine that I administer. The machine itself is nearly obsolete, out of warranty, and chugging along on its last legs. And as an essentially redundant production node in a cluster, it isn't backed up.

Now, I know in part I should be responsible for anything that’s on a machine under my purview. But I’m not a system administrator by interest, job description, or training. It’s just one of the hats I necessarily wear.

But software and operating systems evolve, security updates are released and legacy software breaks. Without dedicated maintainers who know precisely what they are to maintain and how, legacy resources will quickly become obsolete. Ironically, perhaps only the printed record will testify to the fact that these online services once existed.

This conundrum raises a number of interesting questions.

What responsibility do we have to ensure that online resources generated (directly or indirectly) as part of publicly funded projects remain available after their funding has run dry?

If we have a duty to ensure that these resources remain available — and I believe that we do — what is the easiest way to do it?

Should grant funds carry conditions that code be documented, hardware requirements stated, and maintenance details described? Like reagent-sharing requirements, perhaps proof of a plan for resource longevity should be required as a condition of publication.

Should there be a final review from funding agencies at the time of project completion to ensure that suitable plans exist for maintenance of the resource?

Minimally, when a project winds down, a final document should be drafted describing in detail how to maintain the resource. It should include a simple manifest describing the data, the website, software version dependencies, and hardware requirements. And when feasible, an accompanying tarball to allow easy restoration would be appreciated, too.
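As a sketch of what such a manifest could contain (every project name, version, path, and contact below is hypothetical):

```yaml
# Hypothetical end-of-project manifest for an orphaned web resource
project: example-genome-browser
contact: pi@example.edu            # last responsible party
data:
  - path: /var/lib/example/db_dump.sql.gz
    description: Final database dump, MySQL 5.0 schema
website:
  document_root: /var/www/example
  server: Apache 2.2 with mod_perl
software_dependencies:
  - Perl 5.8.8
  - BioPerl 1.5.2
  - MySQL 5.0
hardware_requirements:
  memory: 2 GB minimum
  disk: 40 GB for data plus indexes
restoration:
  tarball: example-genome-browser-final.tar.gz
  notes: Unpack, restore the database dump, point Apache at document_root.
```

Even a one-page file like this would let a future administrator resurrect the resource without archaeology.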

Virtualization is also an attractive approach. But virtualization, like migrating data from one storage medium to its successor to prevent obsolescence, carries maintenance overhead that must be factored in.

But after 10+ years of watching online biological resources develop and disappear, perhaps we need to consider a consolidated public repository, established under the auspices of Ensembl, NCBI, DDBJ, or NHGRI, to host and maintain orphaned projects.


Filed Under: bioinformatics, science policy Tagged With: archives, online databases, outreach, science policy


Welcome!
My name is Todd Harris. A geneticist by training, I now work at the intersection of biology and computer science developing tools and systems to organize, visualize, and query large-scale genomic data across a variety of organisms.

I'm driven by the desire to accelerate the pace of scientific discovery and to improve the transparency and reproducibility of the scientific process.

