Todd Harris, PhD

Facilitating scientific discovery at the intersection of genetics, genomics, bioinformatics, big data, cloud computing, and open science.

  • About

The $24 Poor Man’s Social Media Expression Pattern Database (PoMaSoMeExpPaDa)

September 17, 2008 By Todd Harris 2 Comments

Expression pattern images are some of the most information-rich data housed at model organism databases. They are time consuming to generate. They are time consuming to collect and annotate.

Moreover, copyright restrictions mean that many images remain captive at publisher’s websites, unable to be placed within the rich intellectual framework that exists at sites like WormBase and FlyBase. How many near identical images are stashed away in darkened confocal rooms? How many possibly informative rejects are tossed out due to the puny limitations of publication? Gabijillions?

I wanted to build an easy to use expression pattern image resource that got around these limitations. The system would allow people to add their own photos for display within a broader intellectual context, comment on photos, add tags, search for a variety of criteria, etc. The problem? Developer cycles. This is a lower than low priority project and there aren’t enough hands to go around as it is.

I started wondering if I could leverage a site like Flickr to create a Poor Man’s Expression Pattern Database. Flickr is a key exemplar for Web 2.0 community style features. Tags, contacts, comments, an API.

The images

I took approximately 6000 public, highly curated expression pattern images from WormBase. We display these on Expression Pattern Summary pages.

Uploading images

I wrote a script exploiting Flickr’s REST-like API to programmatically upload images.

For each image, the script added a text description of the expression pattern with hyperlinks back to WormBase genes, anatomy ontology terms, gene ontology terms, strains, transgenes, etc. Images were posted to a dedicated user named, ahem, wormbase.

Tags were added to each image corresponding to the unique gene ID, public gene names, and anatomy ontology terms.

Here’s an example image on Flickr.

Integration with WormBase

I wasn’t happy with the current Perl interfaces to the Flickr API so I wrote my own (Flickr::API::Simple; note that I haven’t released this to CPAN yet and probably never will).

To pull the correct images, tags, and comments from Flickr, individual expression pattern pages levy a query for images at Flickr from the wormbase user with tags corresponding to either the expression pattern ID or the current gene being displayed. Information is displayed inline on the page but served from Flickr.

Posts from the community
WormBase.

If a user has an image that they would like to share on the WormBase site proper, all they need to do is:

* Upload the image to their account
* Post the image to the WormBase group on Flickr
* Tag the image with the unique gene ID

These images will automatically be displayed on WormBase Expression Pattern pages using the exact mechanism as above: Expression Pattern pages search Flickr for images belonging to the WormBase group (instead of user), tagged with the current gene.

Summary

That’s it! A Poor Man’s Expression Pattern database with integration and cross links to a public genomics repository.

We get tagging, searching (clustered tag analysis), social features like commenting and blog integration for (nearly) free. We don’t have to spend six months time in development.

Cost: $24 bucks a year for a Flickr Pro account. This gives 24 GB of storage. Ridiculous. No electricty costs. No sysadmin. No maintenance. $24 dollars or 6 pints.
Time: about 2 hours of programming time to figure out the Flickr REST-like API. About 2 days of running time to upload images (I’m on a slow link).

Share this:

  • Twitter
  • Facebook
  • LinkedIn

Filed Under: development, user interface & design Tagged With: expression pattern, flickr, ideas, proof of concept, social media, WormBase

Implementing a simple web-log based recommender system

July 25, 2008 By Todd Harris Leave a Comment

I’ve now implemented such a system as an extension to Catalyst, the open source Perl web framework. The system isn’t yet ready for general distribution, but I’d like to share my approach.

First, I’ve gathered ten years of web access logs from WormBase, a generic model organism database where I work as the project manager.

Next, I correlated IP addresses with requests and tried to trace browsing patterns from one object to the next. This isn’t an exact science since we haven’t historically tried to uniquely identify users.

Data is loaded into a simple MySQL schema with object and object2related tables. Expediently simple.

Share this:

  • Twitter
  • Facebook
  • LinkedIn

Filed Under: development Tagged With: catalyst, implementation, model organism database, Perl, recommender, recommender system, social media, UI

Recommender systems for biological databases

July 25, 2008 By Todd Harris Leave a Comment

Recommender systems [Wikipedia] seek to provide users with information related to what they are currently browsing. These are now ubiquitous in e-commerce sites such as Amazon, where each page contains a list of items viewed or purchased by other users.

I’ve long felt that a recommender system could revolutionize the browsing and mining of biological data. The idea would be to provide users with a list of related objects based on browsing history of cadres. See this post for some preliminary implementation notes.

I am worried that a recommender system won’t be received with open arms. Given that my current implementation is based on web log analysis it presents serious privacy issues. It’s conceivable that it’s use could inadvertently reveal the identity of uncloned and unpublished loci.

Share this:

  • Twitter
  • Facebook
  • LinkedIn

Filed Under: development Tagged With: model organism database, recommender, recommender system, social media, UI, WormBase

Social media and career advancement

June 1, 2008 By Todd Harris Leave a Comment

Burgeoning scientists take note: Mitch Joel at Twist Image / Six Pixels of Separation has a great post on using social media for career development.

Share this:

  • Twitter
  • Facebook
  • LinkedIn

Filed Under: careers Tagged With: career development, job seeking, social media

Welcome!
My name is Todd Harris. A geneticist by training, I now work at the intersection of biology and computer science developing tools and systems to organize, visualize, and query large-scale genomic data across a variety of organisms.

I'm driven by the desire to accelerate the pace of scientific discovery and to improve the transparency and reproducibility of the scientific process.

Stay in touch!

Enter your address to receive notifications of new posts by email.

Join 1,296 other subscribers

Copyright © 2023 · Genesis Sample Theme on Genesis Framework · WordPress · Log in