Hide ‘n Seek: What to do with empty data fields?

We’ve been working on a fundamental website redesign for a hefty biological database.

One design dilemma has been what to do with empty data fields. For example, on a Gene Summary we might have a “Variation” field listing variations found in the gene. Obviously, not all genes have variations.

Displaying field labels with empty contents clearly delineates the limits of our knowledge or curation, but at the same time leads to more visually confusing pages.

Current options we’re considering are:

1. Omit the field entirely.

The page stays clean, but readers can’t tell whether the data doesn’t exist or simply isn’t shown. Known unknowns (apologies to D. Rumsfeld): if you don’t know what you might know, you don’t know how much you do know. Or something like that.

2. Display the field label, but with empty contents.


Variations:

3. Display the field label with a string:


Variations: no data available

This offers the same advantage as above, namely that gaps in our knowledge or curation are clearly indicated. But sparse entries become visually thick very fast.

We’re currently experimenting with other design patterns for handling this situation, too, including using color to de-emphasize empty fields or allowing users to turn off their display as a configuration option.
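As a rough illustration of how the three options, plus a user-configurable switch, could sit behind a single rendering function, here is a minimal Python sketch. The names EmptyFieldPolicy and render_field are hypothetical and are not taken from our codebase.

```python
from enum import Enum
from typing import Optional, Sequence


class EmptyFieldPolicy(Enum):
    OMIT = "omit"                # option 1: drop the field entirely
    LABEL_ONLY = "label_only"    # option 2: label with empty contents
    PLACEHOLDER = "placeholder"  # option 3: label plus an explicit string


def render_field(
    label: str,
    values: Sequence[str],
    policy: EmptyFieldPolicy = EmptyFieldPolicy.PLACEHOLDER,
    placeholder: str = "no data available",
) -> Optional[str]:
    """Return one line of report text, or None when the field is hidden."""
    if values:
        # Populated fields render the same way under every policy.
        return f"{label}: {', '.join(values)}"
    if policy is EmptyFieldPolicy.OMIT:
        return None
    if policy is EmptyFieldPolicy.LABEL_ONLY:
        return f"{label}:"
    return f"{label}: {placeholder}"
```

With something like this, the policy could be a site-wide default that individual users override in their preferences.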

What do you prefer? Would you rather see all available data fields on a report page even if they’re empty? Or are you a minimalist who prefers that empty fields be hidden?

Google Sidewiki for Community Annotation

Might Google Sidewiki be the answer for scientific databases wishing to add community annotation features?

In the past, I’ve presented cautionary, real-world experiments in community annotation of genomic databases. They haven’t worked. Some mistook these tales of woe to mean that I’m against the idea of community-based annotation (Combat). On the contrary, I see no other way for curated scientific resources to keep up with the immense flood of data we now face. We must leverage the scientific community to make sense of data stored in federated databases. Besides, I’m a firm believer in community intelligence and the emergent properties of data that may not be readily apparent from the perspective of most bench scientists.

Naturally I was intrigued by the announcement today of Google Sidewiki (announcement | home page). Google Sidewiki is a Google Toolbar extension for Firefox, IE, and soon Google Chrome.

With Google Sidewiki installed, users see a wiki-like page in the sidebar for sites that have it enabled. There, they can leave comments and read those left by others. Google monitors posts, placing those that it deems the most relevant at the top of the Sidewiki.

Sidewiki will never be a full-featured annotation tool. It’s much more of a commenting system. But that simple functionality is an important thing missing from most scientific databases.

So why am I so jazzed about this?

1. Ease of use

Users of biological databases aren’t really inclined to edit data online. They have better things to do, like experiments. Sidewiki keeps the activation energy low, encouraging participation.

2. Limited scope

Hand-in-hand with the ease of use is the limited scope of Sidewiki. It’s simple and lightweight.

3. Ease of implementation

For already overworked developers and managers of scientific databases, there’s nothing to install. What could be easier?

Possible uses

Sidewiki could be extremely useful as a quick way for scientists to communicate with maintainers of the resource. See a problem with an annotation? No problem, just make note of it in the wiki. Curators can monitor domain-specific posts for annotations that need updating. See a bug or have a feature request? Make note of it right in the wiki.

I’ll be posting about our use of it shortly at one of the databases I manage.

You might want to follow Google SideWiki on Twitter to stay abreast.

The $24 Poor Man’s Social Media Expression Pattern Database (PoMaSoMeExpPaDa)

Expression pattern images are some of the most information-rich data housed at model organism databases. They are time consuming to generate. They are time consuming to collect and annotate.

Moreover, copyright restrictions mean that many images remain captive at publishers’ websites, unable to be placed within the rich intellectual framework that exists at sites like WormBase and FlyBase. How many near-identical images are stashed away in darkened confocal rooms? How many possibly informative rejects are tossed out due to the puny limitations of publication? Gabijillions?

I wanted to build an easy-to-use expression pattern image resource that got around these limitations. The system would allow people to add their own photos for display within a broader intellectual context, comment on photos, add tags, search by a variety of criteria, etc. The problem? Developer cycles. This is a lower than low priority project and there aren’t enough hands to go around as it is.

I started wondering if I could leverage a site like Flickr to create a Poor Man’s Expression Pattern Database. Flickr is a key exemplar for Web 2.0 community style features. Tags, contacts, comments, an API.

The images

I took approximately 6000 public, highly curated expression pattern images from WormBase. We display these on Expression Pattern Summary pages.

Uploading images

I wrote a script exploiting Flickr’s REST-like API to programmatically upload images.

For each image, the script added a text description of the expression pattern with hyperlinks back to WormBase genes, anatomy ontology terms, gene ontology terms, strains, transgenes, etc. Images were posted to a dedicated user named, ahem, wormbase.

Tags were added to each image corresponding to the unique gene ID, public gene names, and anatomy ontology terms.
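To make the tagging step concrete, here is a minimal Python sketch of how the upload metadata for one image might be assembled. It is not the script I ran. The function names and the example values are made up for illustration, and it assumes Flickr’s upload convention of a space-separated tag list in which multi-word tags are wrapped in double quotes.

```python
from typing import Dict, Iterable, List


def flickr_tag_string(tags: Iterable[str]) -> str:
    """Join tags with spaces, quoting any tag that itself contains a space."""
    quoted: List[str] = []
    for tag in tags:
        tag = tag.strip()
        if not tag:
            continue  # skip blank entries rather than emit empty tags
        quoted.append(f'"{tag}"' if " " in tag else tag)
    return " ".join(quoted)


def upload_fields(
    gene_id: str,
    gene_names: Iterable[str],
    anatomy_terms: Iterable[str],
    description: str,
) -> Dict[str, str]:
    """Build the text fields sent alongside one image upload."""
    tags = [gene_id, *gene_names, *anatomy_terms]
    return {"description": description, "tags": flickr_tag_string(tags)}


# Placeholder values, not a real WormBase record.
fields = upload_fields(
    "WBGene00000000",
    ["abc-1"],
    ["body wall muscle"],
    "Expressed in body wall muscle.",
)
```

Anatomy terms are the case to watch: most of them contain spaces, so without the quoting each word would become its own tag.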

Here’s an example image on Flickr.

Integration with WormBase

I wasn’t happy with the current Perl interfaces to the Flickr API so I wrote my own (Flickr::API::Simple; note that I haven’t released this to CPAN yet and probably never will).

To pull the correct images, tags, and comments from Flickr, individual expression pattern pages levy a query for images at Flickr from the wormbase user with tags corresponding to either the expression pattern ID or the current gene being displayed. Information is displayed inline on the page but served from Flickr.
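The query itself is small. Below is a minimal Python sketch (my own interface is Perl, so this is only an illustration) that builds a flickr.photos.search request restricted to one user and a set of tags. The function name is hypothetical, and the API key, user ID, and tags in the example are placeholders.

```python
from typing import Iterable
from urllib.parse import urlencode

FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def photo_search_url(
    api_key: str,
    tags: Iterable[str],
    user_id: str,
    tag_mode: str = "any",
) -> str:
    """Build a flickr.photos.search URL for one user's photos with given tags."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "user_id": user_id,
        "tags": ",".join(tags),   # the search method takes comma-delimited tags
        "tag_mode": tag_mode,     # "any" matches either tag, "all" requires both
        "format": "json",
        "nojsoncallback": "1",    # plain JSON instead of a JSONP wrapper
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


# Placeholder key, user ID, and tags.
url = photo_search_url("YOUR_API_KEY", ["Expr0000", "WBGene00000000"], "12345678@N00")
```

Using tag_mode "any" mirrors the "either the expression pattern ID or the current gene" behaviour described above.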

Posts from the community

If a user has an image that they would like to share on the WormBase site proper, all they need to do is:

* Upload the image to their account
* Post the image to the WormBase group on Flickr
* Tag the image with the unique gene ID

These images will automatically be displayed on WormBase Expression Pattern pages using the same mechanism as above: Expression Pattern pages search Flickr for images belonging to the WormBase group (instead of the wormbase user), tagged with the current gene.
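Whether the results come from the wormbase user or from the WormBase group pool, the page still has to turn a search response into something it can display. Here is a minimal Python sketch of that step. It assumes the JSON response layout and the photo URL patterns that Flickr documents for flickr.photos.search; the function name and the sample response are made up.

```python
import json
from typing import Dict, List


def photos_from_response(body: str) -> List[Dict[str, str]]:
    """Turn a flickr.photos.search JSON response into display-ready records."""
    data = json.loads(body)
    if data.get("stat") != "ok":
        raise ValueError(f"Flickr returned an error: {data.get('message', 'unknown')}")
    results = []
    for photo in data["photos"]["photo"]:
        results.append({
            "title": photo.get("title", ""),
            # Image file served directly by Flickr.
            "image_url": (
                f"https://live.staticflickr.com/{photo['server']}/"
                f"{photo['id']}_{photo['secret']}.jpg"
            ),
            # Photo page, where comments and tags live.
            "page_url": f"https://www.flickr.com/photos/{photo['owner']}/{photo['id']}",
        })
    return results


# Made-up response for illustration only.
sample = json.dumps({
    "stat": "ok",
    "photos": {"photo": [{
        "id": "111", "owner": "12345678@N00", "secret": "abc123",
        "server": "65535", "title": "example pattern",
    }]},
})
records = photos_from_response(sample)
```

Because the image_url points at Flickr, the page embeds the image without WormBase storing or serving the file.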

Summary

That’s it! A Poor Man’s Expression Pattern database with integration and cross links to a public genomics repository.

We get tagging, searching (clustered tag analysis), and social features like commenting and blog integration for (nearly) free. We don’t have to spend six months in development.

Cost: $24 a year for a Flickr Pro account. This gives 24 GB of storage. Ridiculous. No electricity costs. No sysadmin. No maintenance. $24, or 6 pints.
Time: about 2 hours of programming time to figure out the Flickr REST-like API. About 2 days of running time to upload images (I’m on a slow link).