Running the Generic Genome Browser under PSGI/Plack

Here’s a simple approach for installing and running a local instance of GBrowse, leveraging the PSGI/Plack webserver <-> web application stack. You don’t need root access, you don’t need Apache, and you don’t need to request any firewall exceptions (for now).

Background

Both the current implementation and the installer of GBrowse are loosely tied to Apache. By loosely, I mean that the installer generates suitable configuration and assumes installation paths as if the instance will be run under Apache. The implementation is tightly tied to the CGI specification; it’s a suite of CGI scripts. Although GBrowse will run under any webserver that implements the CGI specification (are there any that DON’T?), this approach increases the administrative effort required to run a local instance, complicates configuration, makes it more difficult to run GBrowse in other environments, and makes it impossible to leverage powerful advances in Perl web application development.

Enter PSGI (the Perl Web Server Gateway Interface), a specification for glueing Perl applications to webservers. Plack is a reference implementation of this specification. PSGI as implemented by Plack makes it simple to run Perl-based applications (even CGI-based ones like GBrowse) in a variety of environments.

In other words, PSGI abstracts the request/response cycle so that you can focus on your application. Running your application under CGI, Fast CGI, or mod_perl is just a matter of changing the application handler. The core Plack distribution provides a number of handlers out of the box (CGI, FCGI, mod_perl, for example) and even includes a light-weight webserver (HTTP::Server::PSGI) which is perfect for development. Other webservers also implement the PSGI specification, including the high-performance preforking server Starman.
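To make the abstraction concrete, here is a minimal sketch of a PSGI application, written out from the shell so that it can be tried with plackup. The file name hello.psgi, the scratch directory, and the response text are placeholders of mine, not part of GBrowse.

```shell
# Scratch directory for the demo (hypothetical; any writable path works).
demo_dir=${demo_dir:-$(mktemp -d)}

# A PSGI application is just a code reference that takes the request
# environment and returns [status, headers, body].
cat > "$demo_dir/hello.psgi" <<'EOF'
my $app = sub {
    my $env = shift;
    return [
        200,
        [ 'Content-Type' => 'text/plain' ],
        [ "Hello from $env->{PATH_INFO}\n" ],
    ];
};
EOF

echo "Wrote $demo_dir/hello.psgi"
```

With Plack installed, `plackup -p 9001 "$demo_dir/hello.psgi"` should serve this file through HTTP::Server::PSGI, and the same file can be handed to another PSGI server unchanged.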

You can also do cool things via middleware: map multiple applications to different URLs with ease (how about running the last 10 versions of GBrowse, all without touching Apache config or dealing with library conflicts?), serve static files, mangle requests and responses, and so on.
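As a rough illustration of URL mapping, the sketch below writes a .psgi file that mounts two GBrowse versions side by side using Plack::Builder and Plack::App::CGIBin. The file name multi.psgi, the version 2.39, and the assumption that each version keeps its CGI scripts under $GBROWSE_ROOT/<version>/cgi are mine; this is not the GBrowse.psgi that the fork installs, and it only shows routing, not how each version's libraries would be kept apart.

```shell
# Scratch directory for the generated file (hypothetical).
psgi_dir=${psgi_dir:-$(mktemp -d)}

cat > "$psgi_dir/multi.psgi" <<'EOF'
use strict;
use warnings;
use Plack::Builder;
use Plack::App::CGIBin;

my $root = $ENV{GBROWSE_ROOT} or die "GBROWSE_ROOT is not set\n";

builder {
    # Each mount point serves the CGI scripts of one installed version.
    mount "/2.39" => Plack::App::CGIBin->new(root => "$root/2.39/cgi")->to_app;
    mount "/2.40" => Plack::App::CGIBin->new(root => "$root/2.40/cgi")->to_app;
};
EOF

echo "Wrote $psgi_dir/multi.psgi"
```

Launched with plackup, requests under /2.39/ and /2.40/ would be routed to the matching CGI directory, with no Apache configuration involved.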

What this isn’t (yet)

This isn’t a rewrite of GBrowse using PSGI. It’s just some modifications to the current GBrowse to make it possible to wrap the CGI components so that they can be used via servers that implement the PSGI specification. There is a project to rewrite GBrowse as a pure PSGI app. Stay tuned for details.

Conventions

  1. Installation root.
    Our working installation root is configured via the environment variable GBROWSE_ROOT.

  2. No root privileges required.
    You do not need to be root. Ever. In fact, one of the great advantages of this approach is the ease with which you can install a local instance.

  3. Self-contained, versioned installation paths.
    This tutorial installs everything under a single directory for simplified management and configuration. This path corresponds to the version of GBrowse being installed.

    The current version of GBrowse is specified by the environment variable GBROWSE_VERSION. If you want to use the same installation path from release to release, you can also create and adjust symlinks as necessary (~/gbrowse/current -> ~/gbrowse/gbrowse-2.40, for example, and set GBROWSE_VERSION=current). This isn’t strictly required, but it means that you won’t need to set GBROWSE_VERSION every time you update to a new version of GBrowse. At any rate, maintaining installations by version is a Good Practice and makes it easy to revert to older versions should the need arise.

  4. Each installation has its own set of local libraries.
    In keeping with the self-contained, non-privileged design gestalt, we’ll install all required libraries to a local path tied to the installed version of GBrowse ($GBROWSE_ROOT/$GBROWSE_VERSION/extlib). This makes it dead simple to run many possibly conflicting variants of GBrowse, each with its own dedicated suite of libraries. Awesome.
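The symlink convention can be wrapped in a small helper. The following is only a sketch: the function name use_gbrowse_version is hypothetical, and it assumes versioned directories such as gbrowse-2.40 sit directly under the installation root.

```shell
# Hypothetical helper: point <root>/current at one installed version.
use_gbrowse_version() {
    root=$1
    version_dir=$2
    if [ ! -d "$root/$version_dir" ]; then
        echo "no such installation: $root/$version_dir" >&2
        return 1
    fi
    # -n replaces the existing link itself instead of following it into
    # the previously selected directory.
    ln -sfn "$version_dir" "$root/current"
}
```

For example, `use_gbrowse_version ~/gbrowse gbrowse-2.40` followed by `export GBROWSE_VERSION=current` keeps the environment stable across upgrades, and reverting is a matter of calling the helper again with the older directory.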

Installation

  1. Set up your environment.
    Set environment variables for your installation root and the version of GBrowse you are installing.

      > export GBROWSE_ROOT=~/gbrowse
      > export GBROWSE_VERSION=2.40

  2. Prepare your library directory.

      # You may need to install the local::lib library first
      # (prefix with sudo for a system-wide install)
      > perl -MCPAN -e 'install local::lib'
      > mkdir -p ${GBROWSE_ROOT}/${GBROWSE_VERSION}/extlib
      > cd ${GBROWSE_ROOT}/${GBROWSE_VERSION}/extlib
      # Preview the environment settings, then apply them to the current shell
      > perl -Mlocal::lib=./
      > eval $(perl -Mlocal::lib=./)

  3. Check out the GBrowse fork with modifications for running under PSGI/Plack.

      > cd ${GBROWSE_ROOT}
      > mkdir src ; cd src
      > git clone git@github.com:tharris/GBrowse-PSGI.git
      > cd GBrowse-PSGI
      # Here, the wwwuser is YOU, not the Apache user.
      > perl Build.PL --conf         ${GBROWSE_ROOT}/${GBROWSE_VERSION}/conf \
                      --htdocs       ${GBROWSE_ROOT}/${GBROWSE_VERSION}/html \
                      --cgibin       ${GBROWSE_ROOT}/${GBROWSE_VERSION}/cgi \
                      --wwwuser      $LOGNAME \
                      --tmp          ${GBROWSE_ROOT}/${GBROWSE_VERSION}/tmp \
                      --persistent   ${GBROWSE_ROOT}/${GBROWSE_VERSION}/tmp/persistent \
                      --databases    ${GBROWSE_ROOT}/${GBROWSE_VERSION}/databases \
                      --installconf  n \
                      --installetc   n
      > ./Build installdeps

      # Be sure that all components of the Plack stack get installed:

          Plack
          Plack::App::CGIBin
          Plack::App::WrapCGI
          Plack::Builder
          Plack::Middleware::ReverseProxy
          Plack::Middleware::Debug
          CGI::Emulate::PSGI
          CGI::Compile

      # Should you need to adjust any values, run
      > ./Build reconfig
      > ./Build install

    Note: the current installer script SHOULD NOT require a root password if you use local paths as in this example. When it asks if you want to restart Apache, select NO. It’s not relevant for us.

  4. Fire up a Plack server using plackup.
    The Build script will have installed a suitable .psgi file at conf/GBrowse.psgi. Launch a simple Plack HTTP server via:

       > plackup -p 9001 ${GBROWSE_ROOT}/${GBROWSE_VERSION}/conf/GBrowse.psgi
       # Open http://localhost:9001/
    

    Note: By default, plackup will use HTTP::Server::PSGI.
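Before launching, it can help to confirm that the install produced the layout described in the steps above. The check below is a sketch under my own assumptions: the function name check_gbrowse_install is hypothetical, and whether every one of these directories exists immediately after `./Build install` has not been verified against the installer.

```shell
# Hypothetical sanity check for a self-contained GBrowse install.
# Usage: check_gbrowse_install [root] [version]
check_gbrowse_install() {
    base="${1:-$GBROWSE_ROOT}/${2:-$GBROWSE_VERSION}"
    missing=0
    # Directories passed to Build.PL, plus the local library path.
    for dir in conf html cgi tmp databases extlib; do
        if [ ! -d "$base/$dir" ]; then
            echo "missing directory: $base/$dir" >&2
            missing=1
        fi
    done
    # The file plackup is pointed at.
    if [ ! -f "$base/conf/GBrowse.psgi" ]; then
        echo "missing file: $base/conf/GBrowse.psgi" >&2
        missing=1
    fi
    return "$missing"
}
```

A typical use would be `check_gbrowse_install && plackup -p 9001 ${GBROWSE_ROOT}/${GBROWSE_VERSION}/conf/GBrowse.psgi`, so that the server only starts when nothing is reported missing.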

    Where To From Here

    PSGI/Plack is really powerful. Here are some examples that take advantage of configuration already in the conf/GBrowse.psgi file.

    Enable the Plack debugging middleware:

       > export GBROWSE_DEVELOPMENT=true
       > plackup -p 9001 ${GBROWSE_ROOT}/${GBROWSE_VERSION}/conf/GBrowse.psgi
       # Visit http://localhost:9001/ and see all the handy debugging information.
    

    Run GBrowse under the preforking, lightweight HTTP server Starman:

       > perl -MCPAN -e 'install Starman'
       > starman -p 9001 ${GBROWSE_ROOT}/${GBROWSE_VERSION}/conf/GBrowse.psgi
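The two launch styles above can be folded into one helper that picks a server based on GBROWSE_DEVELOPMENT. This is a sketch of one possible convention, not something the fork provides: the function name is hypothetical, and the choice of plackup for development, Starman otherwise, and five workers is mine.

```shell
# Hypothetical helper: print the command that would start GBrowse.
# Usage: gbrowse_server_cmd [port]
gbrowse_server_cmd() {
    psgi="${GBROWSE_ROOT}/${GBROWSE_VERSION}/conf/GBrowse.psgi"
    port=${1:-9001}
    if [ "${GBROWSE_DEVELOPMENT:-}" = "true" ]; then
        # Single-process server, convenient with the debug middleware.
        echo "plackup -p $port $psgi"
    else
        # Preforking server; --workers sets the number of worker processes.
        echo "starman -p $port --workers 5 $psgi"
    fi
}
```

Printing the command rather than running it makes the choice easy to inspect; `eval "$(gbrowse_server_cmd 9001)"` would then start the server.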
    

Using Dropbox for automatic Perl library synchronization

Working efficiently across multiple machines just got a whole lot simpler.

Working on multiple machines with varied architectures can be a pain. It’s never obvious where the most recent version of something is. Merging changes is tedious and error-prone. Building software over and over is a waste of time.

I’ve heard of some gluttons who check their entire home directory into GitHub. This glutton did the same with SVN in the way-way-past. Although source code management systems can be shoehorned into watching almost anything, they make the problem of synchronization worse. Have you ever frantically checked in and out on several machines before leaving for the airport to make sure your laptop is up to date?

Rsync gets equally confusing when source and destination are constantly changing. And don’t even consider using the --delete flag unless you’re prepared for heartache.

Dropbox solves all of these problems with aplomb. It’s in essence a network disk with a stored local copy and a thin synchronization layer. The service comes with 2GB of free storage. If you haven’t tried it yet, here’s an invitation. Following this link when you register will net you an extra 250MB.

Here’s how I’m using it to manage Perl versions and libraries. Nothing earth-shattering here. Since Dropbox is in essence just a directory, this isn’t any different from installing your own local Perl.

Directory structure

I like to keep multiple versions of Perl around as well as distinct library collections for different architectures. See below for details.

# A local directory for all things Perl
cd ~/Dropbox/perl

todd> ls -1
5.8.8-darwin  # Perl 5.8.8 and core libraries built for darwin
5.10.1-darwin  # Perl 5.10.1 and core libraries built for darwin
lib/darwin-5.10.1 # site_perl for Perl 5.10.1/darwin 
lib/x86-64-5.8.8  # site_perl for Perl 5.8.8/x86-64
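The naming pattern in this listing can be generated rather than typed. The helper below is a sketch only: perl_lib_dir is a hypothetical name, and the auto-detected architecture string (the lowercased output of `uname -s`, such as darwin or linux) is my assumption and will not reproduce a label like x86-64, so pass the label explicitly where it matters.

```shell
# Hypothetical helper: build the "<base>/lib/<arch>-<perl version>" path.
# Usage: perl_lib_dir [base] [arch] [version]
perl_lib_dir() {
    base=${1:-$HOME/Dropbox/perl}
    # Auto-detect only when the caller does not pass a value.
    arch=${2:-$(uname -s | tr '[:upper:]' '[:lower:]')}
    version=${3:-$(perl -e 'printf "%vd", $^V')}
    printf '%s/lib/%s-%s\n' "$base" "$arch" "$version"
}
```

Called as `perl_lib_dir ~/Dropbox/perl darwin 5.10.1`, it prints the site_perl location used for Perl 5.10.1 on darwin in the listing above.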

If you are already happy with your current Perl, skip to Installing Modules.

Download and unpack

todd> cd ~/src
todd> curl -O http://search.cpan.org/CPAN/authors/id/D/DA/DAPM/perl-5.10.1.tar.gz
todd> tar xzf perl-5.10.1.tar.gz 

Configure

todd> cd perl-5.10.1
todd> ./Configure -des -Dinc_version_list=none \
                  -Dprefix=/Users/todd/Dropbox/perl/5.10.1-darwin

Flags: -des means “Accept the defaults for your architecture”, -Dinc_version_list=none prevents Perl from accidentally including other library paths, and -Dprefix=/Users/todd/Dropbox/perl/5.10.1-darwin sets the installation path.

Make and install

todd> make
todd> make test
todd> make install

Configure environment to use the new Perl

todd> export PATH=/Users/todd/Dropbox/perl/5.10.1-darwin/bin:${PATH}
todd> perl -v

This is perl, v5.10.1 (*) built for darwin-2level
(with 1 registered patch, see perl -V for more detail)

Copyright 1987-2009, Larry Wall
...
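With several builds living side by side, switching between them is a matter of changing PATH. A small helper along these lines could do it; the name use_perl and the default base directory are assumptions of mine.

```shell
# Hypothetical helper: put one of the synced Perl builds first in PATH.
# Usage: use_perl <build-dir-name> [base]
use_perl() {
    base=${2:-$HOME/Dropbox/perl}
    bindir="$base/$1/bin"
    # Refuse to touch PATH if that build has no perl executable.
    if [ ! -x "$bindir/perl" ]; then
        echo "no perl found in $bindir" >&2
        return 1
    fi
    PATH="$bindir:$PATH"
    export PATH
}
```

For example, `use_perl 5.10.1-darwin && perl -v` selects the build installed above and confirms which interpreter is now first in PATH.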

Installing modules

You can install modules directly into this local tree with the CPAN shell:

todd> perl -MCPAN -e shell
cpan> install Module::Install

Alternatively, you can maintain libraries in a distinct path. This is useful if a) you want to test different sets of modules in isolation, b) you are already happy with your installed Perl version, or c) you don’t have sufficient privileges to install modules into the default site_perl path. Although you can set this up during Configure, it’s easiest to use local::lib:

todd> perl -MCPAN -e 'CPAN::install(local::lib)'
todd> cd ~/Dropbox/perl/lib/darwin-5.10.1
todd> perl -Mlocal::lib=./
todd> eval $(perl -Mlocal::lib=--self-contained,./)
todd> perl -MCPAN -e shell
cpan> install GO::Nuts

This will install the library GO::Nuts into the independent local library path.

And the best part is that GO::Nuts will instantly be available to your other environments! No more trying to remember what modules you installed during your last dev session. Just make sure the correct Perl (and/or Perl library directory) is in your path and go to it!

Managing multiple Perl module directories

If you develop in Perl or act as a system administrator, you have undoubtedly come up against the hassle of managing local collections of Perl modules.

I’ve tried everything in the past. I’ve built modules by hand specifying Makefile.PL prefix paths. I’ve flattened architecture specific directories. I’ve lived through the introduction of Module::Build and the inconsistencies between it and EUMM. I’ve built bundles, packages, even virtual machines. I’ve scripted in the shell and with CPAN/CPANplus.

Still, maintaining distinct directories of Perl modules for multiple current applications was a pain. Until now.

local::lib gets around the tedium of maintaining local Perl libraries. It modifies environment variables for you so you don’t have to screw with -I, INSTALL_BASE, --install_base, or PREFIX. Best of all, you can continue to use CPAN, too!

Here’s how easy it is:

 # Install local::lib globally (assuming you have sudo/root)
 $ sudo perl -MCPAN -e 'CPAN::install(local::lib)'
 
 # Change to your local library dir
 $ cd ~/my_project/extlib
 
 # Preview the environment settings local::lib would apply for this dir
 $ perl -Mlocal::lib=./

 # Update your environment for the current shell
 $ eval $(perl -Mlocal::lib=--self-contained,./)

 # Install a module
 $ perl -MCPAN -e 'CPAN::install(GD::SVG)'

A thing of beauty, really.
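For the curious, the environment changes that the eval line applies look roughly like the hand-written version below. Treat it as an approximation for illustration: the function name activate_local_lib is hypothetical, the real local::lib output quotes paths differently and also sets PERL_LOCAL_LIB_ROOT, and this sketch does not cope with spaces in the directory name.

```shell
# Hypothetical stand-in for: eval $(perl -Mlocal::lib=<dir>)
activate_local_lib() {
    dir=$(cd "$1" && pwd) || return 1   # resolve to an absolute path
    # Where perl looks for modules at run time.
    export PERL5LIB="$dir/lib/perl5${PERL5LIB:+:$PERL5LIB}"
    # Where ExtUtils::MakeMaker and Module::Build install new modules.
    export PERL_MM_OPT="INSTALL_BASE=$dir"
    export PERL_MB_OPT="--install_base $dir"
    # Scripts shipped with modules land in <dir>/bin.
    export PATH="$dir/bin:$PATH"
}
```

Seeing the variables spelled out shows why nothing else needs configuring: CPAN simply inherits them from the shell.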

Implementing a simple web-log based recommender system

I’ve now implemented such a system as an extension to Catalyst, the open source Perl web framework. The system isn’t yet ready for general distribution, but I’d like to share my approach.

First, I’ve gathered ten years of web access logs from WormBase, a generic model organism database where I work as the project manager.

Next, I correlated IP addresses with requests and tried to trace browsing patterns from one object to the next. This isn’t an exact science since we haven’t historically tried to uniquely identify users.
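The tracing step can be approximated with a few lines of awk. This is a simplified sketch rather than the code I run: it assumes logs in Common Log Format (client IP in field 1, request path in field 7), entries in chronological order, and one user per IP address. The function name count_transitions is made up for the example.

```shell
# Hypothetical helper: count "viewed A, then B" transitions per client IP.
# Usage: count_transitions access.log [more logs...]
count_transitions() {
    awk '
        {
            ip = $1; path = $7
            # A transition is two consecutive, different requests from one IP.
            if ((ip in last) && last[ip] != path) {
                pairs[last[ip] "\t" path]++
            }
            last[ip] = path
        }
        END {
            for (p in pairs) printf "%d\t%s\n", pairs[p], p
        }
    ' "$@" | sort -k1,1nr -k2
}
```

Each output line holds a count, the source path, and the destination path, separated by tabs and ordered with the most frequent transition first, which is a convenient shape for bulk loading into a relational table.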

Data is loaded into a simple MySQL schema with object and object2related tables. Expediently simple.