<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Todd Harris</title>
	<atom:link href="http://toddharris.net/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://toddharris.net/blog</link>
	<description>Discussing Genomics, Bioinformatics, Social Media, Science Policy, and Outreach.</description>
	<lastBuildDate>Wed, 26 Oct 2011 19:46:09 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Ascaris suum draft genome published</title>
		<link>http://toddharris.net/blog/2011/10/26/ascaris-suum-draft-genome-published/</link>
		<comments>http://toddharris.net/blog/2011/10/26/ascaris-suum-draft-genome-published/#comments</comments>
		<pubDate>Wed, 26 Oct 2011 19:41:20 +0000</pubDate>
		<dc:creator>tharris</dc:creator>
				<category><![CDATA[this week in genomics]]></category>
		<category><![CDATA[ascaris]]></category>
		<category><![CDATA[helminths]]></category>
		<category><![CDATA[parasites]]></category>

		<guid isPermaLink="false">http://toddharris.net/blog/?p=486</guid>
		<description><![CDATA[A draft assembly of the 273 Mb Ascaris suum genome has been published in Nature. A. suum is a model for human ascariasis, the infection caused by the common roundworm.]]></description>
			<content:encoded><![CDATA[<p></p><p>A draft assembly of the 273 Mb <em>Ascaris suum</em> genome has been <a href="http://www.nature.com/nature/journal/vaop/ncurrent/full/nature10553.html">published in Nature</a>. <em>A. suum</em> is a model for human ascariasis, the infection caused by the common roundworm.</p>
]]></content:encoded>
			<wfw:commentRss>http://toddharris.net/blog/2011/10/26/ascaris-suum-draft-genome-published/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Running the Generic Genome Browser under PSGI/Plack</title>
		<link>http://toddharris.net/blog/2011/09/11/running-the-generic-genome-browser-under-psgiplack/</link>
		<comments>http://toddharris.net/blog/2011/09/11/running-the-generic-genome-browser-under-psgiplack/#comments</comments>
		<pubDate>Sun, 11 Sep 2011 19:17:09 +0000</pubDate>
		<dc:creator>tharris</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[howto]]></category>
		<category><![CDATA[gbrowse]]></category>
		<category><![CDATA[genome browsers]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[PSGI/Plack]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://toddharris.net/blog/?p=451</guid>
		<description><![CDATA[Here&#8217;s a simple approach for installing and running a local instance of GBrowse, leveraging the PSGI/Plack webserver web application stack. You don&#8217;t need root access, you don&#8217;t need Apache, and you don&#8217;t need to request any firewall exceptions (for now). Background Both the current implementation and installer of GBrowse are loosely tied to Apache. By [...]]]></description>
			<content:encoded><![CDATA[<p></p><p><em>Here&#8217;s a simple approach for installing and running a local instance of <a href="http://gmod.org/wiki/GBrowse">GBrowse</a>, leveraging the <a href="http://plackperl.org/">PSGI/Plack</a> webserver &lt;-&gt; web application stack.  You don&#8217;t need root access, you don&#8217;t need Apache, and you don&#8217;t need to request any firewall exceptions (for now).</em></p>
<h3>Background</h3>
<p>
Both the current implementation and installer of <a href="http://gmod.org/wiki/GBrowse">GBrowse</a> are loosely tied to Apache. By loosely, I mean that the installer generates suitable configuration and assumes installation paths as if the instance will be run under Apache. The implementation is tightly tied to the CGI specification; it&#8217;s a suite of CGI scripts.  Although GBrowse will run under any webserver that implements the CGI specification (are there any that DON&#8217;T?), this approach increases the administrative effort required for running a local instance, increases the complexity of configuration, makes it more difficult to run GBrowse under other environments, and makes it impossible to leverage powerful advances in Perl web application development.
</p>
<p>
Enter PSGI (the Perl Web Server Gateway Interface), a specification for glueing Perl applications to webservers. Plack is a reference implementation of this specification.  PSGI as implemented by Plack makes it simple to run Perl-based applications (even CGI-based ones like GBrowse) in a variety of environments.
</p>
<p>
In other words, PSGI abstracts the request/response cycle so that you can focus on your application.  Running your application under CGI, FastCGI, or mod_perl is just a matter of changing the application handler.  The core Plack distribution provides a number of handlers out of the box (CGI, FCGI, mod_perl, for example) and even includes a lightweight webserver (HTTP::Server::PSGI) which is perfect for development.  Other webservers also implement the PSGI specification, including the high-performance preforking server Starman.
</p>
<p>
Middleware handlers also let you do cool things with ease: <a href="http://search.cpan.org/dist/Plack/lib/Plack/App/URLMap.pm">map multiple applications to different URLs</a> (how about running the last 10 versions of GBrowse, all without touching Apache config or dealing with library conflicts?), serve static files, mangle requests and responses, and so on.
</p>
<h3>What this isn&#8217;t (yet)</h3>
<p>
This isn&#8217;t a rewrite of GBrowse using PSGI. It&#8217;s just some modifications to the current GBrowse to make it possible to wrap the CGI components so that they can be used via servers that implement the PSGI specification. There is a project to rewrite GBrowse as a pure PSGI app. Stay tuned for details.
</p>
<h3>Conventions</h3>
<ol>
<li>Installation root.
<p>Our working installation root is configured via the environment variable GBROWSE_ROOT.</p></li>
<li>No root privileges required.
<p>You do not need to be root. Ever. In fact, one of the great advantages of this approach is the ease with which you can install a local instance.</p></li>
<li>Self-contained, versioned installation paths.
<p>This tutorial installs everything under a single directory for simplified management and configuration.  This path corresponds to the version of GBrowse being installed.</p>
<p>The current version of GBrowse is specified by an environment variable (GBROWSE_VERSION).  If you want to use the same installation path from release to release, you can also create and adjust symlinks as necessary (~/gbrowse/current -&gt; ~/gbrowse/2.40, for example, and set GBROWSE_VERSION=current). This isn&#8217;t required, but it means that you won&#8217;t need to set GBROWSE_VERSION every time you update to a new version of GBrowse.  At any rate, maintaining installations by version is a Good Practice and makes it easy to revert to older versions should the need arise.</p></li>
<li>Each installation has its own set of local libraries.
<p>In keeping with the self-contained non-privileged design gestalt, we&#8217;ll install all required libraries to a local path tied to the installed version of GBrowse ($GBROWSE_ROOT/$GBROWSE_VERSION/extlib).  This makes it dead simple to run many possibly conflicting variants of GBrowse all with their own dedicated suite of libraries. Awesome.</p></li>
</ol>
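<p>As a rough illustration of the symlink convention described above, here is a minimal shell sketch. It builds the layout in a scratch directory so nothing under a real ~/gbrowse is touched; the version numbers and the <code>use_version</code> helper are made up for the example and are not part of GBrowse.</p>

```shell
#!/bin/sh
# Sketch only: a versioned layout with a "current" symlink.
# The versions (2.39, 2.40) and the helper name are illustrative assumptions.

# Point "current" at the given version directory, replacing any old link.
# The body runs in a subshell, so the caller's working directory is untouched.
use_version() (
    root=$1
    version=$2
    cd "$root" || exit 1
    [ -d "$version" ] || exit 1
    # -s symbolic, -f replace an existing link, -n do not follow the old link
    ln -sfn "$version" current
)

# Demo in a throwaway directory rather than a real installation root.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/2.39/extlib" "$demo_root/2.40/extlib"
use_version "$demo_root" 2.40
readlink "$demo_root/current"
```

<p>Reverting to an older release is then a single call such as <code>use_version "$demo_root" 2.39</code>, with GBROWSE_VERSION left at "current".</p>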
<h3>Installation</h3>
<ol>
<li>Set up your environment.</li>
<pre>
  # Set environment variables for your installation root and the version of GBrowse you are installing.
  > export GBROWSE_ROOT=~/gbrowse
  > export GBROWSE_VERSION=2.40
</pre>
<li>Prepare your library directory.</li>
<pre>
  # You may need to install the local::lib library first
  > (sudo) perl -MCPAN -e 'install local::lib'
  > mkdir -p ${GBROWSE_ROOT}/${GBROWSE_VERSION}/extlib
  > cd ${GBROWSE_ROOT}/${GBROWSE_VERSION}/extlib
  # Print the environment settings local::lib would use for this directory
  > perl -Mlocal::lib=./
  # Apply those settings to the current shell
  > eval $(perl -Mlocal::lib=./)
</pre>
<li>Check out <a href="http://github.com/tharris/GBrowse-PSGI">GBrowse fork</a> with modifications for running under PSGI/Plack.</li>
<pre>
  > cd ${GBROWSE_ROOT}
  > mkdir src ; cd src
  > git clone https://github.com/tharris/GBrowse-PSGI.git
  > cd GBrowse-PSGI
  # Here, the wwwuser is YOU, not the Apache user.
  > perl Build.PL --conf         ${GBROWSE_ROOT}/${GBROWSE_VERSION}/conf \
                  --htdocs       ${GBROWSE_ROOT}/${GBROWSE_VERSION}/html \
                  --cgibin       ${GBROWSE_ROOT}/${GBROWSE_VERSION}/cgi \
                  --wwwuser      $LOGNAME \
                  --tmp          ${GBROWSE_ROOT}/${GBROWSE_VERSION}/tmp \
                  --persistent   ${GBROWSE_ROOT}/${GBROWSE_VERSION}/tmp/persistent \
                  --databases    ${GBROWSE_ROOT}/${GBROWSE_VERSION}/databases \
                  --installconf  n \
                  --installetc   n
  > ./Build installdeps   # Be sure to install all components of the Plack stack:

      Plack
      Plack::App::CGIBin
      Plack::App::WrapCGI
      Plack::Builder
      Plack::Middleware::ReverseProxy
      Plack::Middleware::Debug
      CGI::Emulate::PSGI
      CGI::Compile

  # Should you need to adjust any values, run
  > ./Build reconfig
  > ./Build install
</pre>
<p><em>Note: the current installer script SHOULD NOT require a root password if using local paths like this example. When it asks if you want to restart Apache, select NO.  It&#8217;s not relevant for us.</em></p>
<li>Fire up a Plack server using plackup.</li>
<p>The Build script will have installed a suitable .psgi file at conf/GBrowse.psgi. Launch a simple plack HTTP server via:</p>
<pre>
   > plackup -p 9001 ${GBROWSE_ROOT}/${GBROWSE_VERSION}/conf/GBrowse.psgi
   # Open http://localhost:9001/
</pre>
<p><em>Note: By default, plackup will use HTTP::Server::PSGI.</em></p>
</ol>
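<p>If plackup complains that it can&#8217;t find the application, a quick check of the path can help. The following is only a sketch: <code>gbrowse_psgi_path</code> is a hypothetical helper, and it assumes the .psgi file lives at conf/GBrowse.psgi under the versioned installation directory, as described above.</p>

```shell
#!/bin/sh
# Sketch only: derive the .psgi path from the two environment variables
# used in this tutorial. The function name is a hypothetical helper.

gbrowse_psgi_path() {
    # ${VAR:?message} aborts with the message if VAR is unset or empty.
    printf '%s/%s/conf/GBrowse.psgi\n' \
        "${GBROWSE_ROOT:?set GBROWSE_ROOT first}" \
        "${GBROWSE_VERSION:?set GBROWSE_VERSION first}"
}

# Example values matching the installation steps; adjust to your setup.
GBROWSE_ROOT="$HOME/gbrowse"
GBROWSE_VERSION=2.40

psgi=$(gbrowse_psgi_path)
if [ -f "$psgi" ]; then
    echo "found $psgi; launch with: plackup -p 9001 $psgi"
else
    echo "missing $psgi; check the paths given to Build.PL"
fi
```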
<h3>Where To From Here</h3>
<p>PSGI/Plack is really powerful. Here are some examples that take advantage of configuration already in the conf/GBrowse.psgi file.</p>
<p>Enable the Plack debugging middleware:</p>
<pre>
   > export GBROWSE_DEVELOPMENT=true
   > plackup -p 9001 ${GBROWSE_ROOT}/${GBROWSE_VERSION}/conf/GBrowse.psgi
   # Visit http://localhost:9001/ and see all the handy debugging information.
</pre>
<p>Run GBrowse under the preforking, lightweight HTTP server Starman:</p>
<pre>
   > perl -MCPAN -e 'install Starman'
   > starman -p 9001 ${GBROWSE_ROOT}/${GBROWSE_VERSION}/conf/GBrowse.psgi
</pre>
]]></content:encoded>
			<wfw:commentRss>http://toddharris.net/blog/2011/09/11/running-the-generic-genome-browser-under-psgiplack/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>An introduction to cloud computing for biologists (aka the 10-minute model organism database installation)</title>
		<link>http://toddharris.net/blog/2011/08/11/cloud-computing-for-biologists/</link>
		<comments>http://toddharris.net/blog/2011/08/11/cloud-computing-for-biologists/#comments</comments>
		<pubDate>Thu, 11 Aug 2011 19:39:48 +0000</pubDate>
		<dc:creator>tharris</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[EC2]]></category>
		<category><![CDATA[tutorial]]></category>
		<category><![CDATA[WormBase]]></category>

		<guid isPermaLink="false">http://toddharris.net/blog/?p=396</guid>
		<description><![CDATA[This tutorial will explain the basic concepts of cloud computing and get you up and running in minutes. No knowledge of system administration or programming is necessary. As an example, it describes how to launch your own instance of the model organism database WormBase. Introduction to cloud computing If you aren&#8217;t familiar with cloud computing [...]]]></description>
			<content:encoded><![CDATA[<p></p><p><em>This tutorial will explain the basic concepts of cloud computing and get you up and running in minutes.  No knowledge of system administration or programming is necessary.  As an example, it describes how to launch your own instance of the model organism database <a href="http://www.wormbase.org/">WormBase</a>.</em></p>
<h2>Introduction to cloud computing</h2>
<p>If you aren&#8217;t familiar with cloud computing, here&#8217;s all you need to know. At its simplest, cloud computing refers to using remote compute resources over the network as if they were a computer sitting on your desktop.  These services are typically virtualized and used in an on-demand fashion.</p>
<p>Several vendors provide cloud computing options. Here, we&#8217;ll focus on<br />
<a href="http://aws.amazon.com/ec2">Amazon&#8217;s Elastic Compute Cloud (EC2)</a>.</p>
<p>On EC2, developers can create  Amazon Machine Images (AMIs) which are essentially snapshots of a full computer system. For example, the WormBase AMI contains everything necessary to run WormBase &#8212; all software and databases with the operating system preconfigured.</p>
<p>Booting up an image is referred to as launching an &#8220;instance&#8221;.  When you do so, you choose the size of the server to allocate for the instance (for example, how many cores and how much RAM).  You can start, stop, or reboot the instance at any time.  Terminating the instance completely removes it from your account. The original reference AMI remains; you can launch a new instance from it any time.  This is what Amazon means by elastic: you can provision and decommission new servers with custom capacity in minutes, avoiding overhead costs like data centers, surly IT departments, and draconian firewall regulations.</p>
<p>Amazon&#8217;s EC2 service is a &#8220;<a href="http://aws.amazon.com/ec2/pricing/">pay-for-what-you-use</a>&#8221; service; running an instance is <b>not free</b>. You are charged nominal rates for 1) the size of the instance allocated; 2) the amount of disk space the instance requires even if it isn&#8217;t running; 3) the amount of bandwidth the instance consumes; 4) how long the instance is running.</p>
<p>A complicated model organism database like WormBase typically requires a &#8220;large&#8221; instance (see below). Running 24/7, the estimated cost would be approximately $2700/year.  Costs can be reduced by stopping the instance when you don&#8217;t need it, which pauses it in its current state, and starting it again later. This is conceptually similar to putting a desktop computer to sleep.  Alternatively, if you aren&#8217;t modifying the data on the server, you can safely terminate it when you are done, avoiding disk use charges, too.  Simply launch a new instance from the original WormBase AMI the next time you need it.  Launching from an AMI requires slightly more time (several minutes) than restarting a stopped instance (less than a minute). Reserving an instance in advance from Amazon further reduces the cost by approximately 30%.</p>
<p><em>caveat emptor</em>: these are back-of-the-napkin calculations. Costs can vary dramatically especially if you start making many, many requests to the website. Bandwidth charges for accessing the website are nominal.</p>
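<p>To make the napkin math concrete, here is a small shell sketch that scales the 24/7 figure down to part-time use. It assumes, purely for illustration, that the roughly $2700/year estimate is all compute time billed by the hour; storage and bandwidth charges are left out, and the function name is made up.</p>

```shell
#!/bin/sh
# Sketch only: back-of-the-napkin EC2 compute cost for part-time use.
# Assumes the ~$2700/year figure is pure compute time for 24/7 operation.

# Usage: yearly_compute_cost HOURS_PER_WEEK
yearly_compute_cost() {
    awk -v hours_per_week="$1" 'BEGIN {
        full_year_cost = 2700         # dollars per year, running 24/7
        hours_per_year = 24 * 365     # 8760 hours
        hourly_rate    = full_year_cost / hours_per_year
        printf "%.2f\n", hourly_rate * hours_per_week * 52
    }'
}

yearly_compute_cost 40     # instance running 40 hours a week
yearly_compute_cost 168    # instance running around the clock
```

<p>Under those assumptions, running the instance only during a 40-hour working week comes to about $641 a year in compute charges.</p>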
<h2>Example: Personal Instances of WormBase through Amazon&#8217;s EC2</h2>
<p>In the past, running a private instance of WormBase has been a time-consuming process requiring substantial computer science acumen.</p>
<p>Today I&#8217;m happy to announce WormBase Amazon Machine Images (wAMIs, pronounced &#8220;whammys&#8221;) for <a href="http://aws.amazon.com/ec2">Amazon&#8217;s Elastic Compute Cloud (EC2)</a>.  The WormBase AMI makes it absolutely trivial to run your own private version of WormBase.</p>
<p>Running your own instance gives you:<br />
* Dedicated resources<br />
* A feature-rich data mining platform<br />
* Privacy</p>
<h2>Contents of the WormBase AMI</h2>
<p>* The WS226 (and beyond) version of the database<br />
* The (mostly) full WormBase website<br />
* The Genome Browser with 10 species<br />
* A wealth of pre-installed libraries for data mining (to be covered in a subsequent post)</p>
<p>The first WormBase AMI is missing a few features:<br />
* WormMart<br />
* BLAST</p>
<h2>Launching your own instance of WormBase</h2>
<p>Here&#8217;s a really bad screencast.  You might want to read through the rest of the tutorial for details.</p>
<p><object classid="clsid:02BF25D5-8C17-4B23-BC80-D3488ABDDC6B" codebase="http://www.apple.com/qtactivex/qtplugin.cab" width="500" height="287"><param name="src" value="/screencasts/2011/201106-wormbase-ec2-small.mov"><param name="autoplay" value="true"><param name="type" value="video/quicktime"><embed src="/screencasts/2011/201106-wormbase-ec2-small.mov" width="500" height="287" autoplay="false" type="video/quicktime" pluginspage="http://www.apple.com/quicktime/download/"></object></p>
<p>
<em>View the screencast in <a href="/screencasts/2011/201106-wormbase-ec2.mov">full size</a>.</em>
</p>
<p>The general steps for launching an instance of a new AMI are as follows.  Note that the management console also lets you complete many of these steps while launching an instance.</p>
<h3>1. Sign up for an Amazon Web Services account</h3>
<p>Sign up for an account at <a href="http://aws.amazon.com/">aws.amazon.com</a>. You&#8217;ll need a credit card.</p>
<h3>2. Create a keypair</h3>
<p><em>Note: You can also complete this step when you launch your instance if you prefer.</em></p>
<p>When you launch an instance Amazon needs to ensure that you are who you say you are (read: that you have the ability to pay for the resources that you consume), as well as give you a mechanism for logging into the server.  This authentication process is handled through the use of secret keys.  Even if you only intend to use the web interface of WormBase and not log in directly to the server, you will still need to generate a keypair.</p>
<p>To do this, log in to your Amazon AWS account and click on the EC2 tab.  In the left-hand pane, click on &#8220;Keypairs&#8221;.  You&#8217;ll see a small button labeled &#8220;Create Keypair&#8221;.  Click it and create a new keypair.  You can name it whatever you like.  When you click continue, a file will be downloaded to your computer.  You will need this file if you intend to log on to the server.  Store it in a safe place, as others can launch services using your account if they get access to this file!</p>
<h3>3. Configure a new security group</h3>
<p><em>Note: You can also complete this step when you launch your instance if you prefer.</em></p>
<p>Security groups are a list of firewall rules for what types of requests your instances respond to.  They can be standard services on standard ports (HTTP on port 80) or custom, and they can range from allowing the entire internet to a single IP address.  They are a quick way to lock down who gets to use your instance.  For now, we&#8217;ll create a security group that is very permissive.</p>
<p>Click &#8220;Create new group&#8221;, give the group a name and description.  From the dropdown, select &#8220;HTTP&#8221;.  Click Add Rule.  Repeat, this time selecting SSH.  Although not required, enabling SSH will allow us to actually log into the server to perform administrative or diagnostic tasks.  Click Add Rule, then Save.</p>
<h3>4. Find and launch an instance of the WormBase (WS226) AMI</h3>
<p>Now we&#8217;re ready to launch our own instance. See the video tutorial for description.</p>
<h3>5. Get the public DNS entry for your new instance</h3>
<p>Your new instance is elastic; it gets a new IP address every time it is launched (although Amazon has services that let it retain a static address, too).  You need to get the hostname so that you can connect to the server.  Click on &#8220;Instances&#8221;, select the running instance, and in the bottom pane, find the &#8220;Public DNS&#8221; entry.  Copy this entry, open a new tab in your browser and paste in the URI.  It will look something like this:</p>
<p>ec2-50-17-41-111.compute-1.amazonaws.com</p>
<h3>6. Stopping your instance</h3>
<p>When you are done with your instance, shut it down by going to the EC2 tab &gt; Instances. Select the instance and, either from the &#8220;Instance Action&#8221; drop-down or by right-clicking, select &#8220;Stop&#8221;.  Your instance will be paused where you are.  Repeat these steps, selecting &#8220;Start&#8221;, to restart it.  <em>Note: you will continue to accumulate charges associated with disk storage while the instance is stopped, but will not incur compute charges.</em> Alternatively, you can choose to &#8220;Terminate&#8221; the instance. Once you do so, be sure to visit &#8220;Volumes&#8221;, select the EBS volume that had been attached to the instance (it will be 150GB in size), and delete it if you no longer need it.  It will cost about $7/month to save this volume.</p>
<p>In a subsequent tutorial, I&#8217;ll show you how to go beyond the web browser to use the powerful command line data mining tools packaged with every WormBase AMI.</p>
<p><em>Questions? Contact me at todd@wormbase.org</em>.</p>
]]></content:encoded>
			<wfw:commentRss>http://toddharris.net/blog/2011/08/11/cloud-computing-for-biologists/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Happy belated birthday, Mendel!</title>
		<link>http://toddharris.net/blog/2011/07/21/happy-belated-birthday-mendel/</link>
		<comments>http://toddharris.net/blog/2011/07/21/happy-belated-birthday-mendel/#comments</comments>
		<pubDate>Thu, 21 Jul 2011 17:00:12 +0000</pubDate>
		<dc:creator>tharris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[genetics]]></category>
		<category><![CDATA[Mendel]]></category>
		<category><![CDATA[photoshop]]></category>

		<guid isPermaLink="false">http://toddharris.net/blog/?p=416</guid>
		<description><![CDATA[Photoshop art from my old grad school days.]]></description>
			<content:encoded><![CDATA[<p></p><p><a href="http://toddharris.net/i/2011/07/mendel.jpg"><img src="http://toddharris.net/i/2011/07/mendel-233x300.jpg" alt="Gregor Mendel: Geneticist, Rastafarian Luminary." title="mendel" width="233" height="300" class="alignnone size-medium wp-image-417" /></a></p>
<p>Photoshop art from my old grad school days.</p>
]]></content:encoded>
			<wfw:commentRss>http://toddharris.net/blog/2011/07/21/happy-belated-birthday-mendel/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Debugging xinetd configuration problems</title>
		<link>http://toddharris.net/blog/2011/06/19/debugging-xinetd-at-system-launch/</link>
		<comments>http://toddharris.net/blog/2011/06/19/debugging-xinetd-at-system-launch/#comments</comments>
		<pubDate>Sun, 19 Jun 2011 13:54:14 +0000</pubDate>
		<dc:creator>tharris</dc:creator>
				<category><![CDATA[cloud]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[AMI]]></category>
		<category><![CDATA[xinetd]]></category>

		<guid isPermaLink="false">http://toddharris.net/blog/?p=390</guid>
		<description><![CDATA[xinetd is great when it&#8217;s working but can be a complete pain to debug when things go wrong. As a start, try launching it in the foreground in debugging mode: /usr/sbin/xinetd -d -dontfork]]></description>
			<content:encoded><![CDATA[<p></p><p>xinetd is great when it&#8217;s working but can be a complete pain to debug when things go wrong.  As a start, try launching it in the foreground in debugging mode:</p>
<pre>
   /usr/sbin/xinetd -d -dontfork
</pre>
]]></content:encoded>
			<wfw:commentRss>http://toddharris.net/blog/2011/06/19/debugging-xinetd-at-system-launch/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>GitHub&#8217;s &#8220;Organizations&#8221; for distributed #bioinformatics dev; migrating from Mercurial</title>
		<link>http://toddharris.net/blog/2011/02/12/githubs-organizations-for-distributed-bioinformatics-dev-migrating-from-mercurial/</link>
		<comments>http://toddharris.net/blog/2011/02/12/githubs-organizations-for-distributed-bioinformatics-dev-migrating-from-mercurial/#comments</comments>
		<pubDate>Sat, 12 Feb 2011 16:57:41 +0000</pubDate>
		<dc:creator>tharris</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[DVCS]]></category>
		<category><![CDATA[git]]></category>
		<category><![CDATA[mercurial]]></category>
		<category><![CDATA[project management]]></category>

		<guid isPermaLink="false">http://toddharris.net/blog/?p=374</guid>
		<description><![CDATA[GitHub.com&#8217;s &#8220;Organizations&#8221; is a great tool for distributed bioinformatics teams. Here&#8217;s how I migrated some of our repositories from Mercurial to Git to take advantage of this feature After much evangelizing, weeping, and wailing, I finally convinced everyone at one highly geographically and functionally distributed projects that we should at least try consolidating our code [...]]]></description>
			<content:encoded><![CDATA[<p></p><p><i>GitHub.com&#8217;s &#8220;Organizations&#8221; is a great tool for distributed bioinformatics teams. Here&#8217;s how I migrated some of our repositories from Mercurial to Git to take advantage of this feature.</i></p>
<p>After much evangelizing, weeping, and wailing, I finally convinced everyone at one highly geographically and functionally distributed project that we should at least try consolidating our code in one place.</p>
<p>Currently we have old legacy repositories in CVS, mid-range projects in SVN, new development in Git and Mercurial, and AFAIK a bunch of code in no SCM system at all.</p>
<p>Given that DVCS doesn&#8217;t have the directory level granularity of SVN, we definitely don&#8217;t want to consolidate everything in a single repository.  So far, it seems that <a href="http://github.com/">GitHub</a> offers the best solution with its &#8220;<a href="http://github.com/WormBase">Organizations</a>&#8221; feature.  This lets a team group multiple repositories under a single umbrella with a shared news feed and administration. Perfect.</p>
<p><a href="http://hg-git.github.com/">hg-git</a> looks like a useful tool if you want to maintain code in both git and mercurial.  I don&#8217;t.  Here&#8217;s how I handled a full-scale migration of our repositories:</p>
<pre>
todd> cd ~/projects
todd> git clone http://repo.or.cz/r/fast-export.git
todd> mkdir new_git_repository ; cd new_git_repository
todd> git init
todd> ../fast-export/hg-fast-export.sh -r ~/projects/old_hg_repository
todd> git checkout HEAD
todd> git remote add origin git@github.com:[organization]/[reponame].git
todd> git push origin master
</pre>
<p>Bing! And you&#8217;re done. Or <a href="http://www.youtube.com/watch?v=Xz7_3n7xyDg">whatever</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://toddharris.net/blog/2011/02/12/githubs-organizations-for-distributed-bioinformatics-dev-migrating-from-mercurial/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Amazon Elastic Block Store for facile sharing and archiving of biological data</title>
		<link>http://toddharris.net/blog/2011/02/10/amazon-elastic-block-store-for-facile-sharing-and-archiving-of-biological-data/</link>
		<comments>http://toddharris.net/blog/2011/02/10/amazon-elastic-block-store-for-facile-sharing-and-archiving-of-biological-data/#comments</comments>
		<pubDate>Thu, 10 Feb 2011 18:09:21 +0000</pubDate>
		<dc:creator>tharris</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[data archiving]]></category>
		<category><![CDATA[EBS]]></category>
		<category><![CDATA[EC2]]></category>

		<guid isPermaLink="false">http://toddharris.net/blog/?p=362</guid>
		<description><![CDATA[Amazon&#8217;s Web Services offers enormous potential for people who need to process, store, and share large amounts of data. And it&#8217;s a huge boon for bioinformatics. It&#8217;s cost effective and it&#8217;s fasta. Hah. Get it? It&#8217;s &#8220;>fasta&#8221;. Archiving and sharing data has never been easier. Here&#8217;s a quick tutorial on creating an Elastic Block Store [...]]]></description>
			<content:encoded><![CDATA[<p></p><p><i>Amazon&#8217;s Web Services offers enormous potential for people who need to process, store, and share large amounts of data.</i></p>
<p><i>And it&#8217;s a huge boon for bioinformatics. It&#8217;s cost effective and it&#8217;s fasta. Hah. Get it? It&#8217;s &#8220;&gt;fasta&#8221;. Archiving and sharing data has never been easier.</i></p>
<p><i>Here&#8217;s a quick tutorial on creating an Elastic Block Store volume that you can share with your colleagues.</i></p>
<p><b>1. Create a volume</b></p>
<ul>
<li>From the AWS Management Console, click on the EC2 tab,  then on &#8220;Elastic Block Store > Volumes&#8221;
<li>Click on &#8220;Create Volume&#8221;.
<li>Pick an appropriate size for your volume.  For EBS volumes that I am going to use to store and archive data, I create a volume 1.5 times the size of the data.  This lets me store an unpacked version and a packed version simultaneously, making it easy to update data at a later date.
<li>Add some informative tags.
</ul>
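<p>The 1.5x rule of thumb is easy to turn into a number. Here&#8217;s a minimal sketch that rounds up to whole gigabytes; the function name is just something I made up for the example, and it assumes you already know the size of your data in GB.</p>

```shell
#!/bin/sh
# Sketch only: size an EBS volume at 1.5 times the data, rounded up to
# whole gigabytes. The function name is a hypothetical helper.

# Usage: ebs_volume_size_gb DATA_SIZE_GB   (whole gigabytes)
ebs_volume_size_gb() {
    data_gb=$1
    # Integer form of ceil(data_gb * 1.5): adding 1 before halving rounds up.
    echo $(( ($data_gb * 3 + 1) / 2 ))
}

ebs_volume_size_gb 100
ebs_volume_size_gb 75
```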
<p><b>2. Attach the volume to an EC2 instance.</b></p>
<p>From the Volumes window in the Management Console, select the new volume, then right click and Select &#8220;Attach&#8221;.  I attach devices starting at </p>
<p><b>3. Format the volume.</b></p>
<p>Once you&#8217;ve created and attached a volume, you&#8217;ll need to format it before it can be mounted. SSH in to the instance the volume is attached to.</p>
<blockquote><p>
ssh -i &lt;your.pem&gt; &lt;user&gt;@yourdns.amazonaws.com<br />
> sudo mkfs.ext3 /dev/sdf
</p></blockquote>
<p><i>Device names are available from /dev/sdf through /dev/sdp.</i></p>
<p><b>4. Mount the volume</b></p>
<blockquote><p>
> sudo mkdir /mnt/data<br />
> sudo mount -t  ext3 /dev/sdf /mnt/data
</p></blockquote>
<p>If you are potentially going to be dealing with many versions of data over time, you might want to version your mount points.  This will allow you to mount multiple EBS volumes at different sensible directories:</p>
<blockquote><p>
> sudo mkdir /mnt/data-v0.2<br />
> sudo mount -t ext3 /dev/sdf /mnt/data-v0.2
</p></blockquote>
<p><i>Alternatively, you might consider handling versioning when creating snapshots of your volume.</i></p>
<p><b>5. Set the EBS volume to mount automatically (optional)</b></p>
<blockquote><p>
 > sudo emacs /etc/fstab<br />
 /dev/sdf /mnt/data ext3 defaults 0 0
</p></blockquote>
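<p>If you&#8217;d rather script this than open an editor, something like the following sketch adds the entry only when it isn&#8217;t already there. It uses the device and mount point from step 4 and is demonstrated on a scratch file; <code>add_fstab_entry</code> is a hypothetical helper, not a standard tool.</p>

```shell
#!/bin/sh
# Sketch only: add a mount entry to an fstab-style file exactly once.
# Demonstrated on a scratch file; the function name is hypothetical.

# Usage: add_fstab_entry FILE DEVICE MOUNT_POINT
add_fstab_entry() {
    fstab=$1
    device=$2
    mount_point=$3
    entry="$device $mount_point ext3 defaults 0 0"
    # -x whole line, -F fixed string, -q quiet: do nothing if already present.
    if grep -qxF "$entry" "$fstab"; then
        return 0
    fi
    printf '%s\n' "$entry" >> "$fstab"
}

scratch=$(mktemp)
add_fstab_entry "$scratch" /dev/sdf /mnt/data
add_fstab_entry "$scratch" /dev/sdf /mnt/data   # second call adds nothing
cat "$scratch"
```

<p>To use it on the real /etc/fstab you would need root privileges, and keeping a backup copy of the file first is a good idea.</p>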
<p><b>And you&#8217;re done!  Now what?</b></p>
<p>Throw some data on there. Do some computes. Go nuts.</p>
<h3>Share your data</h3>
<p><i>Sharing your data is as easy as creating a snapshot.</i></p>
<p>1. Create a snapshot</p>
<p>Power down your instance. From the Management interface, select the volume and choose &#8220;Create Snapshot&#8221;.</p>
<h3>Tips for effective data archiving and sharing</h3>
<p>1. Add informative tags.</p>
<p>Be sure to add informative tags such as the release date and version of the data.</p>
<blockquote><p>
  Release Date = 02 Jan 2011<br />
  Source = Todd&#8217;s Data Emporium<br />
  Contact = data@tharris.org
</p></blockquote>
<p>2. Include informative READMEs on the volume itself.</p>
<p>3. Be sure to make the snapshot public!</p>
<h3>Updating your data</h3>
<p>Updating your data to the next release of your resource is simple. Attach the original volume to an instance and mount it, copy in new data, then create a new snapshot.</p>
]]></content:encoded>
			<wfw:commentRss>http://toddharris.net/blog/2011/02/10/amazon-elastic-block-store-for-facile-sharing-and-archiving-of-biological-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hide &#8216;n Seek: What to do with empty data fields?</title>
		<link>http://toddharris.net/blog/2011/02/08/hide-n-seek-what-to-do-with-empty-data-fields/</link>
		<comments>http://toddharris.net/blog/2011/02/08/hide-n-seek-what-to-do-with-empty-data-fields/#comments</comments>
		<pubDate>Tue, 08 Feb 2011 19:45:05 +0000</pubDate>
		<dc:creator>tharris</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[user interface & design]]></category>
		<category><![CDATA[visualization]]></category>
		<category><![CDATA[empty fields]]></category>
		<category><![CDATA[user interface]]></category>

		<guid isPermaLink="false">http://toddharris.net/blog/?p=356</guid>
		<description><![CDATA[We&#8217;ve been working on a fundamental website redesign for a hefty biological database. One design dilemma has been what to do with empty data fields. For example, on a Gene Summary we might have a &#8220;Variation&#8221; field listing variations found in the gene. Obviously, not all genes have variations. Displaying field labels with empty contents [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>We&#8217;ve been working on a fundamental website redesign for a hefty biological database.  </p>
<p>One design dilemma has been what to do with empty data fields.  For example, on a Gene Summary we might have a &#8220;Variation&#8221; field listing variations found in the gene. Obviously, not all genes have variations.</p>
<p>Displaying field labels with empty contents clearly delineates the limits of our knowledge or curation, but at the same time leads to more visually confusing pages. </p>
<p>Current options we&#8217;re considering are:</p>
<p>1. Omit the field entirely.</p>
<p>Known unknowns (apologies to D. Rumsfeld): if you don&#8217;t know what you might know, you don&#8217;t know how much you do know. Or something like that.</p>
<p>2. Display the field label, but with empty contents.</p>
<p><code><br />
Variations:<br />
</code></p>
<p>3. Display the field label with a string:</p>
<p><code><br />
Variations: no data available<br />
</code></p>
<p>This offers the same advantage as above, namely that gaps in our knowledge or curation are clearly indicated.  But sparse entries become visually thick very fast.</p>
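<p>The three options amount to a display policy applied whenever a field has no value. Here is a minimal sketch of that decision, written as a shell function purely for illustration. The function name and the policy names <code>omit</code>, <code>blank</code> and <code>placeholder</code> are made up for this example, and on a real site the same logic would sit in the page templates.</p>

```shell
#!/bin/sh
# Illustrative only: render one labelled field under a chosen empty-field policy.
# Policy names (omit, blank, placeholder) are hypothetical labels for options 1-3.
render_field() {
    label=$1    # field label, e.g. Variations
    value=$2    # field contents; empty string means no data
    policy=$3   # what to do when the value is empty

    # A field with data is always shown the same way.
    if [ -n "$value" ]; then
        printf '%s: %s\n' "$label" "$value"
        return 0
    fi

    case $policy in
        omit)        ;;                                            # option 1: show nothing
        blank)       printf '%s:\n' "$label" ;;                    # option 2: label only
        placeholder) printf '%s: no data available\n' "$label" ;;  # option 3: label plus string
    esac
}
```

<p>Keeping the policy as a single parameter also makes it straightforward to expose as a per-user configuration option.</p>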
<p>We&#8217;re currently experimenting with other design patterns for handling this situation, too,  including using color to de-emphasize empty fields or allowing users to turn off their display as a configuration option.</p>
<p>What do you prefer?  Would you rather see all available data fields on a report page even if they&#8217;re empty?  Or are you a minimalist and prefer that empty fields be hidden?</p>
]]></content:encoded>
			<wfw:commentRss>http://toddharris.net/blog/2011/02/08/hide-n-seek-what-to-do-with-empty-data-fields/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>On my way to Science Online &#8217;11. Biological databases, represent! #scio11</title>
		<link>http://toddharris.net/blog/2011/01/13/on-my-way-to-science-online-11-biological-databases-represent-scio11/</link>
		<comments>http://toddharris.net/blog/2011/01/13/on-my-way-to-science-online-11-biological-databases-represent-scio11/#comments</comments>
		<pubDate>Thu, 13 Jan 2011 14:50:08 +0000</pubDate>
		<dc:creator>tharris</dc:creator>
				<category><![CDATA[meetings]]></category>
		<category><![CDATA[science online]]></category>

		<guid isPermaLink="false">http://toddharris.net/blog/?p=342</guid>
		<description><![CDATA[Another early morning for the 6:05AM from Bozeman. 4 AM doesn&#8217;t feel so bad when the stars are shining and it&#8217;s 30° F outside. Today I&#8217;m on my way to the Science Online &#8217;11 meeting &#8212; in fact, I&#8217;m posting this in the air between Bozeman and Minneapolis. This is the first year I&#8217;ve been [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Another early morning for the 6:05AM from Bozeman.  4 AM doesn&#8217;t feel so bad when the stars are shining and it&#8217;s 30° F outside.</p>
<p>Today I&#8217;m on my way to the <a href="http://scienceonline2011.com/">Science Online &#8217;11</a> meeting &#8212; in fact, I&#8217;m posting this in the air between Bozeman and Minneapolis.  This is the first year I&#8217;ve been able to attend, having been stymied by conflicting advisory board meetings the past two years.</p>
<p>Humbly joining luminaries from science writing and blogging, my motivation for attending is a bit different.  I&#8217;m most interested in exploring how we can make use of online tools and communities to make the process of science more transparent to other scientists, more accessible to the public, and in general, easier and more efficient.</p>
<p>Publicly-accessible web-based databases have become an essential component of daily research in biomedical sciences.  I&#8217;m the project manager and lead developer of <a href="http://www.wormbase.org/">one such database</a>. We know from user surveys that the vast majority of our users visit the site every day.  Most databases &#8212; including ours &#8212; are referential in nature.  You log on, look something up, and log off.  But these resources could be so much more than that.  We owe it to ourselves to look at success cases in other fields to make these websites more interactive and useful.</p>
<p>We are currently in the middle of a ground-up rewrite of our site.  Inspired by the rise of web 2.0 social media and networking, we&#8217;re building a number of new tools into the site not commonly found on biological websites.</p>
<p>For example, can we glean biologically meaningful information from the browsing patterns of users?  I&#8217;ve tried to do this a number of times in the past using log file analysis with limited success.  In our new site, we&#8217;ve built a tool that does this in real-time to collect the most popular objects. When correlated with unique users, we can also use this as an Amazon-style suggest feature (&#8220;Users interested in gene X were also interested in gene Y&#8221;).  We&#8217;ve extended this concept to a common &#8220;favorite this&#8221; design pattern to make possible matches even more relevant.</p>
<p>Features like this that revolve around community intelligence pose interesting questions for privacy and transparency.  One approach that we are considering is to only tally and only present results to users who have specifically opted in.</p>
<p>Well, we&#8217;re descending below 10K feet. Time to post.</p>
]]></content:encoded>
			<wfw:commentRss>http://toddharris.net/blog/2011/01/13/on-my-way-to-science-online-11-biological-databases-represent-scio11/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Volume 18, Number 3 of the Worm Breeder&#8217;s Gazette now available</title>
		<link>http://toddharris.net/blog/2010/12/26/volume-18-number-3-of-the-worm-breeders-gazette-now-available/</link>
		<comments>http://toddharris.net/blog/2010/12/26/volume-18-number-3-of-the-worm-breeders-gazette-now-available/#comments</comments>
		<pubDate>Sun, 26 Dec 2010 20:55:57 +0000</pubDate>
		<dc:creator>tharris</dc:creator>
				<category><![CDATA[open access]]></category>
		<category><![CDATA[outreach]]></category>
		<category><![CDATA[Worm Breeder's Gazette]]></category>
		<category><![CDATA[WormBook]]></category>

		<guid isPermaLink="false">http://toddharris.net/blog/?p=339</guid>
		<description><![CDATA[Volume 18, Number 3 of the resurrected, open access research newsletter of the Caenorhabditis elegans research field is now available. Go get it while the gettin&#8217;s good! The next issue of the Gazette will be released in June 2011, just prior to the 18th International Worm Meeting. You can submit articles now online at the [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Volume 18, Number 3 of the resurrected, open access research newsletter of the <em>Caenorhabditis elegans</em> research field is <a href="http://www.wormbook.org/wbg/">now available</a>.  Go get it while the gettin&#8217;s good!</p>
<p>The next issue of the Gazette will be released in June 2011, just prior to the 18th International Worm Meeting.  You can submit articles now online at the <a href="http://www.wormbook.org/wbg/">Worm Breeder&#8217;s Gazette</a>.  The deadline for submissions is June 1, 2011.</p>
]]></content:encoded>
			<wfw:commentRss>http://toddharris.net/blog/2010/12/26/volume-18-number-3-of-the-worm-breeders-gazette-now-available/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

