Running the Generic Genome Browser under PSGI/Plack

Here’s a simple approach for installing and running a local instance of GBrowse, leveraging the PSGI/Plack webserver <-> web application stack. You don’t need root access, you don’t need Apache, and you don’t need to request any firewall exceptions (for now).

Background

Both the current implementation and the installer of GBrowse are loosely tied to Apache. By loosely, I mean that the installer generates configuration and assumes installation paths as if the instance will be run under Apache. The implementation itself is tightly tied to the CGI specification; it’s a suite of CGI scripts. Although GBrowse will run under any webserver that implements the CGI specification (are there any that DON’T?), this approach increases the administrative effort required to run a local instance, complicates configuration, makes it more difficult to run GBrowse in other environments, and makes it impossible to leverage powerful advances in Perl web application development.

Enter PSGI (the Perl Web Server Gateway Interface), a specification for gluing Perl applications to webservers. Plack is a reference implementation of this specification. PSGI as implemented by Plack makes it simple to run Perl-based applications (even CGI-based ones like GBrowse) in a variety of environments.

In other words, PSGI abstracts the request/response cycle so that you can focus on your application. Running your application under CGI, FastCGI, or mod_perl is just a matter of changing the application handler. The core Plack distribution provides a number of handlers out of the box (CGI, FCGI, and mod_perl, for example) and even includes a lightweight webserver (HTTP::Server::PSGI) which is perfect for development. Other webservers also implement the PSGI specification, including the high-performance preforking server Starman.
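To make the "just change the handler" point concrete, here is a minimal sketch of how the same .psgi file can be launched under different backends by varying plackup's -s option. The helper name plack_cmd and the file name app.psgi are hypothetical; the function only prints the command lines so you can inspect them before running anything.

```shell
#!/bin/sh
# Sketch: print the plackup command line for a chosen PSGI server backend.
# plack_cmd is a hypothetical helper, not part of Plack or GBrowse.
plack_cmd() {
    server="$1"   # backend name passed to -s (e.g. Starman, FCGI), or "" for the default
    port="$2"
    app="$3"      # path to a .psgi file
    if [ -z "$server" ]; then
        # No -s flag: plackup picks its default server.
        printf 'plackup -p %s %s\n' "$port" "$app"
    else
        printf 'plackup -s %s -p %s %s\n' "$server" "$port" "$app"
    fi
}

# Same application file, three different ways of serving it.
plack_cmd ""      9001 app.psgi
plack_cmd Starman 9001 app.psgi
plack_cmd FCGI    9001 app.psgi
```

The application file never changes; only the server named after -s does.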

You can also do cool things via middleware, like mapping multiple applications to different URLs with ease (how about running the last 10 versions of GBrowse, all without touching Apache config or dealing with library conflicts?), serving static files, or mangling requests and responses.
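As a rough illustration of the URL-mapping idea, the sketch below generates a small .psgi file that mounts one CGI directory per installed version using Plack::Builder and Plack::App::CGIBin. The helper gen_multi_psgi, the /gbrowse-<version> URL prefixes, and the assumption that each version keeps its scripts under <root>/<version>/cgi are mine, not something the GBrowse installer produces.

```shell
#!/bin/sh
# Sketch: emit a .psgi file that mounts several GBrowse versions side by side.
# gen_multi_psgi is a hypothetical helper; the URL prefixes are assumptions.
gen_multi_psgi() {
    root="$1"; shift           # installation root, e.g. ~/gbrowse
    echo 'use strict; use warnings;'
    echo 'use Plack::Builder;'
    echo 'use Plack::App::CGIBin;'
    echo 'builder {'
    for v in "$@"; do          # remaining arguments are version directories
        echo "    mount '/gbrowse-$v' => Plack::App::CGIBin->new(root => '$root/$v/cgi')->to_app;"
    done
    echo '};'
}

# Print a config for two versions; redirect to a file to use it with plackup.
gen_multi_psgi "$HOME/gbrowse" 2.39 2.40
```

Redirect the output to something like multi.psgi and start it with plackup to try it out.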

What this isn’t (yet)

This isn’t a rewrite of GBrowse using PSGI. It’s just some modifications to the current GBrowse to make it possible to wrap the CGI components so that they can be used via servers that implement the PSGI specification. There is a project to rewrite GBrowse as a pure PSGI app. Stay tuned for details.

Conventions

  1. Installation root.
     Our working installation root is configured via the environment variable GBROWSE_ROOT.

  2. No root privileges required.
     You do not need to be root. Ever. In fact, one of the great advantages of this approach is the ease with which you can install a local instance.

  3. Self-contained, versioned installation paths.
     This tutorial installs everything under a single directory for simplified management and configuration. This path corresponds to the version of GBrowse being installed.

     The current version of GBrowse is specified by an environment variable (GBROWSE_VERSION). If you want to use the same installation path from release to release, you can also create and adjust symlinks as necessary (~/gbrowse/current -> ~/gbrowse/2.40, for example, and set GBROWSE_VERSION=current). This isn’t required, but it means that you won’t need to set GBROWSE_VERSION every time you update to a new version of GBrowse. At any rate, maintaining installations by version is a Good Practice and makes it easy to revert to older versions should the need arise.

  4. Each installation has its own set of local libraries.
     In keeping with the self-contained, non-privileged design gestalt, we’ll install all required libraries to a local path tied to the installed version of GBrowse ($GBROWSE_ROOT/$GBROWSE_VERSION/extlib). This makes it dead simple to run many possibly conflicting variants of GBrowse, all with their own dedicated suite of libraries. Awesome.
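The conventions above boil down to a directory per version, an extlib inside it, and an optional "current" symlink. Here is a minimal sketch of that layout; use_version is a hypothetical helper name, and the demo runs in a throwaway temporary directory rather than your real GBROWSE_ROOT.

```shell
#!/bin/sh
# Sketch: create a versioned install directory with its own extlib and
# repoint a "current" symlink at it. use_version is a hypothetical helper.
use_version() {
    root="$1"
    version="$2"
    mkdir -p "$root/$version/extlib" || return 1
    # -n replaces the symlink itself instead of following it into the old target.
    ln -sfn "$version" "$root/current"
}

# Demo in a temporary directory so nothing in $HOME is touched.
demo_root=$(mktemp -d)
use_version "$demo_root" 2.39
use_version "$demo_root" 2.40
readlink "$demo_root/current"
rm -rf "$demo_root"
```

With a real installation you would call use_version "$GBROWSE_ROOT" 2.40 and then export GBROWSE_VERSION=current; rolling back is just another call with the older version.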

Installation

  1. Set up your environment.
      // Set environment variables for your installation root and the version of GBrowse you are installing.
      > export GBROWSE_ROOT=~/gbrowse
      > export GBROWSE_VERSION=2.40

  2. Prepare your library directory.
      // You may need to install the local::lib library first
      > (sudo) perl -MCPAN -e 'install local::lib'
      > mkdir -p ${GBROWSE_ROOT}
      > cd ${GBROWSE_ROOT}
      > mkdir ${GBROWSE_VERSION}
      > cd ${GBROWSE_VERSION}
      > mkdir extlib ; cd extlib
      // Print the environment settings local::lib will use, then apply them to the current shell
      > perl -Mlocal::lib=./
      > eval $(perl -Mlocal::lib=./)

  3. Check out the GBrowse fork with modifications for running under PSGI/Plack.
      > cd ${GBROWSE_ROOT}
      > mkdir src ; cd src
      > git clone git@github.com:tharris/GBrowse-PSGI.git
      > cd GBrowse-PSGI
      # Here, the wwwuser is YOU, not the Apache user.
      > perl Build.PL --conf         ${GBROWSE_ROOT}/${GBROWSE_VERSION}/conf \
                      --htdocs       ${GBROWSE_ROOT}/${GBROWSE_VERSION}/html \
                      --cgibin       ${GBROWSE_ROOT}/${GBROWSE_VERSION}/cgi \
                      --wwwuser      $LOGNAME \
                      --tmp          ${GBROWSE_ROOT}/${GBROWSE_VERSION}/tmp \
                      --persistent   ${GBROWSE_ROOT}/${GBROWSE_VERSION}/tmp/persistent \
                      --databases    ${GBROWSE_ROOT}/${GBROWSE_VERSION}/databases \
                      --installconf  n \
                      --installetc   n
      > ./Build installdeps   # Be sure to install all components of the Plack stack:

          Plack
          Plack::App::CGIBin
          Plack::App::WrapCGI
          Plack::Builder
          Plack::Middleware::ReverseProxy
          Plack::Middleware::Debug
          CGI::Emulate::PSGI
          CGI::Compile

      // Should you need to adjust any values, run
      > ./Build reconfig
      > ./Build install

    Note: the current installer script SHOULD NOT require a root password if using local paths like this example. When it asks if you want to restart Apache, select NO. It’s not relevant for us.

  4. Fire up a Plack server using plackup.
     The Build script will have installed a suitable .psgi file at conf/GBrowse.psgi. Launch a simple Plack HTTP server via:

       > plackup -p 9001 ${GBROWSE_ROOT}/${GBROWSE_VERSION}/conf/GBrowse.psgi
       // Open http://localhost:9001/

    Note: By default, plackup will use HTTP::Server::PSGI.
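If plackup complains about missing modules, it can help to check which pieces of the Plack stack listed in the installation steps are actually visible to perl in your current shell (that is, with the local::lib environment applied). The sketch below tries to load each module and prints the ones that fail; missing_modules is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report which Perl modules cannot be loaded in the current environment.
# missing_modules is a hypothetical helper, not part of GBrowse or Plack.
missing_modules() {
    for m in "$@"; do
        # perl -MFoo -e 1 exits non-zero when module Foo cannot be loaded.
        perl -M"$m" -e 1 >/dev/null 2>&1 || echo "$m"
    done
}

# Prints one line per module that still needs installing (nothing if all load).
missing_modules Plack Plack::App::CGIBin Plack::App::WrapCGI Plack::Builder \
    Plack::Middleware::ReverseProxy Plack::Middleware::Debug \
    CGI::Emulate::PSGI CGI::Compile
```

An empty result means every module loaded; anything printed can be installed into extlib with your CPAN client of choice.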

Where To From Here

PSGI/Plack is really powerful. Here are some examples that take advantage of configuration already in the conf/GBrowse.psgi file.

Enable the Plack debugging middleware:

    > export GBROWSE_DEVELOPMENT=true
    > plackup -p 9001 ${GBROWSE_ROOT}/${GBROWSE_VERSION}/conf/GBrowse.psgi
    // Visit http://localhost:9001/ and see all the handy debugging information.

Run GBrowse under the preforking, lightweight HTTP server Starman:

    > perl -MCPAN -e 'install Starman'
    > starman -p 9001 ${GBROWSE_ROOT}/${GBROWSE_VERSION}/conf/GBrowse.psgi
    

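Putting the two examples above together, here is a small sketch of a launcher that picks plackup when GBROWSE_DEVELOPMENT is set to true and Starman otherwise. The helper name gbrowse_launch_cmd and the default of 4 workers are assumptions for illustration; the function only prints the command so you can check it before running it.

```shell
#!/bin/sh
# Sketch: choose a launch command based on GBROWSE_DEVELOPMENT.
# gbrowse_launch_cmd is a hypothetical helper; the worker count is an assumed default.
gbrowse_launch_cmd() {
    app="$1"               # path to the .psgi file
    port="${2:-9001}"
    workers="${3:-4}"
    if [ "${GBROWSE_DEVELOPMENT:-}" = "true" ]; then
        # Development: single-process server started by plackup.
        printf 'plackup -p %s %s\n' "$port" "$app"
    else
        # Otherwise: preforking Starman with a fixed number of workers.
        printf 'starman --workers %s -p %s %s\n' "$workers" "$port" "$app"
    fi
}

gbrowse_launch_cmd "${GBROWSE_ROOT:-$HOME/gbrowse}/${GBROWSE_VERSION:-2.40}/conf/GBrowse.psgi"
```

Tune the worker count to your machine; the number here is a placeholder, not a recommendation.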
A Worthy Mercurial (hg) Tutorial from Joel Spolsky

At WormBase, we’ve been busy re-writing the website from the ground up to build a modern information discovery space that will generically handle genomic data.

As the project manager, I made the executive decision to switch from CVS/SVN to a distributed version control system (DVCS). I’d used both git and mercurial personally for over a year and enjoyed their flexibility.

And given the already distributed nature of our project, DVCS was a natural fit. (In fact, I believe that DVCS should be widely adopted across the genomics/bioinformatics research sector for precisely this reason.)

Nonetheless, for small teams accustomed to the quirks of SVN, the transition to DVCS can be a rocky road. Recently I came across Joel Spolsky’s excellent HG Init: Mercurial Tutorial.

If you’re considering or in the process of switching to Mercurial, I highly recommend checking out Joel’s tutorial and circulating it to your team.

Installing Trac

We’re already using 37signals’ Basecamp for project management. It works well for collaboration and for managing distributed teams, but it is tedious when used as a feature tracker.

Today we discussed the need for a bug/issue tracker. In the past, we’ve considered RT since it integrates with our existing email flow. But frankly, RT is a PITA to configure.

I know that some people are hot and heavy on Mantis. I’m not. For one, I don’t like its name or its logo.

Here’s how I installed Trac:


// install Python
cd ~/src
wget http://python.org/ftp/python/2.5.2/Python-2.5.2.tgz
cd ../build
tar xzf ../src/Python*
cd Python*
./configure; make ; sudo make install

// Install easy_install (via the ez_setup.py bootstrap script)
cd ~/src
wget http://peak.telecommunity.com/dist/ez_setup.py
sudo python ez_setup.py

// Install Trac
sudo easy_install Trac==0.11rc2

// Install mod_wsgi
cd ~/src
wget http://modwsgi.googlecode.com/files/mod_wsgi-2.0.tar.gz
tar xzf mod_wsgi*
cd mod_wsgi*
./configure
make
sudo make install

// Set up Trac environment
trac-admin /usr/local/wormbase/trac initenv
... // follow the configuration prompts

// Test it
tracd --port 9001 /usr/local/wormbase/trac
// Open http://localhost:9001/

// Configure wsgi: save the following as /usr/local/wormbase/cgi-perl/misc/trac.wsgi
import os

os.environ['TRAC_ENV'] = '/usr/local/wormbase/trac'
os.environ['PYTHON_EGG_CACHE'] = '/usr/local/wormbase/trac/eggs'

import trac.web.main
application = trac.web.main.dispatch_request

// Configure apache

// apache conf
LoadModule wsgi_module libexec/mod_wsgi.so
AddModule mod_wsgi.c

// wormbase/conf/httpd.conf
WSGIScriptAlias /trac /usr/local/wormbase/cgi-perl/misc/trac.wsgi


<Directory /usr/local/wormbase/cgi-perl/misc>
    WSGIApplicationGroup %{GLOBAL}
    Order deny,allow
    Allow from all
</Directory>


AuthType Basic
AuthName "WormBase Trac"
AuthUserFile /usr/local/wormbase/trac/.htpasswd
Require valid-user
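
One thing that is easy to miss: the directory named in PYTHON_EGG_CACHE has to exist and be writable by whatever user the webserver runs as. Here is a minimal sketch of a check; prepare_egg_cache is a hypothetical helper, and note that it tests writability for the user running the script, which may not be the Apache user.

```shell
#!/bin/sh
# Sketch: make sure an egg cache directory exists and is writable.
# prepare_egg_cache is a hypothetical helper; it checks as the CURRENT user only.
prepare_egg_cache() {
    dir="$1"
    mkdir -p "$dir" || return 1
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir" >&2
        return 1
    fi
}

# Demo against a temporary directory rather than the real Trac environment.
demo_dir=$(mktemp -d)
prepare_egg_cache "$demo_dir/eggs" && echo "egg cache ready"
rm -rf "$demo_dir"
```

For the setup above you would point it at /usr/local/wormbase/trac/eggs and then adjust ownership so the webserver user can write there.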