The Top Five 2017 AWS Re:Invent Announcements Impacting Bioinformatics

The sixth annual Amazon Web Services (AWS) Re:Invent conference was held last week in Las Vegas. As in years past, 2017 Re:Invent was a dizzying week of new announcements, enhancements to existing products, and interesting prognostications on the future of cloud computing.

The first Re:Invent conference was held in 2012. 6000 attendees gorged themselves on keynotes, sessions, hackathons, bootcamps, cloud evangelism, and video games. Not much has changed, except for the number of attendees. 2017 boasted 43,000 attendees with a conference campus spread across multiple venues on the Las Vegas strip. It was not difficult to get your 10,000 steps in this year.

I’ve been fortunate to attend Re:Invent every year. In 2012, I’d already been using AWS for nearly 5 years and was thoroughly convinced of its utility and value. In those early days, there was still a lot of reluctance and skepticism about the cloud, particularly in academic settings. To some extent, these biases still exist, which partially explains the slower uptake of the cloud for academic projects. By now, I think it’s overwhelmingly clear that academic compute and basic research projects should be leveraging the many benefits of building and deploying on the cloud.

Without further ado, here are my top five announcements at the 2017 AWS Re:Invent impacting bioinformatics.

1. AWS Sagemaker
AWS Sagemaker is a fully managed service for building, training, and deploying machine learning workflows in the AWS cloud. Machine learning has always played an important role in bioinformatics. Simplifying training and deployment of ML workflows will have a profound impact on bioinformatics and big data. For one, Sagemaker offers the opportunity to introduce ML approaches to a broader audience, and to a broader range of research topics. Of the hundreds of announcements at Re:Invent, Sagemaker is the one I’m most excited to put to use.

2. Amazon Neptune
Bioinformatics is all about highly connected data. These relationships are often a bear to model in relational database management systems. Graph databases are a perfect fit for biological data. Amazon Neptune is the latest entry into the crowded graph database space. Many of the commercial options currently available force difficult decisions and raise significant issues of cost and lock-in. Neptune is still in a preview phase and I haven’t had any direct interaction with it, so I can’t address how it will perform against these challenges. However, given its integration with AWS, I expect its rate of adoption to increase rapidly. As a highly available and scalable managed database supporting graph APIs and designed for the cloud, Neptune could be an amazing tool for bioinformatics projects.

3. AWS Fargate
AWS Fargate promises to bring the serverless revolution to containers. Containers already have a strong presence in bioinformatics and have greatly simplified the maintenance and deployment of applications that may be, ahem, short on documentation. Still, they’ve required managing the underlying infrastructure. Fargate is a launch type for Amazon ECS that lets you launch containers without having to manage that infrastructure. You don’t have to define instance type or family or manage scaling or clusters. Just define CPU and memory, IAM, and networking, and let Fargate handle the rest. While we are on containers, AWS also introduced Elastic Container Service for Kubernetes (EKS). Although it doesn’t rank in my top five, it does bear mention here.
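To make the “just define CPU and memory, IAM, and networking” idea concrete, here is a minimal sketch of what a Fargate launch might look like through boto3. The helper names, the image name, and the subnet and security group IDs are all hypothetical placeholders of mine; only the `register_task_definition` and `run_task` calls and their parameters come from the ECS API. Treat it as an illustration under those assumptions rather than a recommended setup.

```python
def fargate_task_definition(family, image, cpu="256", memory="512",
                            execution_role_arn=None):
    """Build keyword arguments for ECS register_task_definition (Fargate)."""
    task = {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",  # Fargate tasks use awsvpc networking
        "cpu": cpu,               # CPU units as a string (256 = 0.25 vCPU)
        "memory": memory,         # memory in MiB as a string
        "containerDefinitions": [
            {"name": family, "image": image, "essential": True}
        ],
    }
    if execution_role_arn:
        # IAM role that lets ECS pull the image and write logs on your behalf
        task["executionRoleArn"] = execution_role_arn
    return task


def fargate_run_request(cluster, task_definition, subnets, security_groups):
    """Build keyword arguments for ECS run_task with the Fargate launch type."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
                "assignPublicIp": "ENABLED",
            }
        },
    }


def launch(task_kwargs, run_kwargs):
    """Register the task definition and start it (needs AWS credentials)."""
    import boto3  # imported here so the builders above work without boto3

    ecs = boto3.client("ecs")
    ecs.register_task_definition(**task_kwargs)
    return ecs.run_task(**run_kwargs)
```

Notice what is missing: no instance type, no AMI, no autoscaling group. Keeping the request builders separate from `launch` also makes it easy to inspect or version-control the definitions before anything is actually started.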

4. AWS Comprehend
Did I say that I was most excited about Sagemaker? Well, I’m also pretty psyched about the introduction of AWS Comprehend. Comprehend is a natural language processing (NLP) managed service that relies on machine learning to process text. At the end of the day, much of the most interesting information in bioinformatics is locked up in text. Comprehend offers a really cool way to get at that information. It can extract key phrases, known vocabularies, and custom lexica. It also does expected things like weighting occurrences and displaying them in context. Of course, it has an API and integrates with other AWS services, too.
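As a rough sketch of the API side, key phrase extraction through boto3 might look something like the following. The `detect_key_phrases` call and the `KeyPhrases`, `Text`, and `Score` response fields come from the Comprehend API; the helper names and the 0.9 confidence cutoff are my own assumptions for illustration.

```python
def top_key_phrases(response, min_score=0.9):
    """Return (phrase, score) pairs from a detect_key_phrases response.

    Phrases below min_score are dropped; the rest are sorted best first.
    """
    phrases = [
        (kp["Text"], kp["Score"])
        for kp in response.get("KeyPhrases", [])
        if kp["Score"] >= min_score
    ]
    return sorted(phrases, key=lambda pair: pair[1], reverse=True)


def detect_key_phrases(text, language_code="en"):
    """Send text to Comprehend (needs AWS credentials and the boto3 package)."""
    import boto3  # imported here so top_key_phrases works without boto3

    client = boto3.client("comprehend")
    return client.detect_key_phrases(Text=text, LanguageCode=language_code)
```

You could feed an abstract or a methods section to `detect_key_phrases` and pass the result to `top_key_phrases` to get a quick, confidence-ranked list of candidate terms.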

5. AWS Glacier Select
Last but not least is AWS Glacier Select. Really, you ask? A storage enhancement made my top five list? Yes. Here’s why. Biology (and bioinformatics) is about data. Data is expensive to generate and expensive to keep around. You either pay a lot for storage, throw your data away and commit to regenerating it later, or place it in essentially inaccessible archival storage.
That’s where Glacier Select comes in. Glacier is an AWS archival service for data that you don’t need immediately accessible. But Glacier Select actually lets you execute an SQL query against a Glacier archive. Since it’s archival storage, you also specify when you would like your results returned. Standard queries take 3–5 hours, and results can be deposited in an S3 bucket. Of course, there’s an API that you can build into existing applications. I’m super psyched about cheap archival storage that can still be queried and think this has many applications in bioinformatics.
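Here is a minimal sketch of how a select job could be submitted through boto3, assuming the archive holds CSV data with a header row. The `initiate_job` call and the shape of `jobParameters` follow the Glacier API as I understand it; the helper names, the vault, bucket, and prefix values you would pass in, and any column names in your SQL are hypothetical.

```python
VALID_TIERS = ("Expedited", "Standard", "Bulk")


def glacier_select_job(archive_id, sql, bucket, prefix, tier="Standard"):
    """Build the jobParameters dict for a Glacier Select job over a CSV archive."""
    if tier not in VALID_TIERS:
        raise ValueError(f"tier must be one of {VALID_TIERS}, got {tier!r}")
    return {
        "Type": "select",
        "ArchiveId": archive_id,
        "Tier": tier,  # how quickly you want the results back
        "SelectParameters": {
            # Assumption: the archive is CSV and its first row names the columns
            "InputSerialization": {"csv": {"FileHeaderInfo": "USE"}},
            "ExpressionType": "SQL",
            "Expression": sql,
            "OutputSerialization": {"csv": {}},
        },
        # Results are written under this S3 bucket and prefix
        "OutputLocation": {"S3": {"BucketName": bucket, "Prefix": prefix}},
    }


def submit(vault_name, job_parameters):
    """Start the job (needs AWS credentials and the boto3 package)."""
    import boto3  # imported here so the builder above works without boto3

    glacier = boto3.client("glacier")
    return glacier.initiate_job(
        accountId="-",  # "-" means the account that owns the credentials
        vaultName=vault_name,
        jobParameters=job_parameters,
    )
```

A query such as `SELECT * FROM archive s WHERE s.gene = 'BRCA1'` (with `gene` standing in for whatever column your data actually has) would go in as `sql`. The `tier` argument is where the trade-off between cost and turnaround time shows up.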

There were many, many other announcements that have direct applications in bioinformatics. I’d highly encourage you to watch the keynotes from Andy Jassy and Werner Vogels to dive in a little deeper.

Time to do away with the “Darwin” fish

I was really psyched to see Carl Zimmer’s recent NY Times article on the use of the word “theory”.

In Science, It’s Never Just A Theory

The misuse of “theory” in the general public — and sometimes even in scientific circles — has always been a personal pet peeve of mine. And don’t get me started on the various misuses of homology, orthology, and paralogy.

Now that everyone is clear on when and how to use “theory”, let me turn to something I find equally annoying: the Darwin fish emblems. You’ve seen them. They are riffs on the ichthys or Jesus fish symbol, except they have legs and “Darwin” inscribed. It’s astounding to see the continued popularity of these emblems since they first appeared in the 1980s. At that time, I thought they were hilarious. Now, I just think they reek of a self-satisfied smugness, vaunting the superiority of the owner’s intellect. And since their introduction, they’ve led to a stupid one-upmanship almost as inane and tasteless as the Calvin (of Calvin and Hobbes) peeing stickers. [BTW: Don’t miss this hilarious take on the Peeing Calvin from The Onion]

The problem is that they conflate science and religion when the two things are completely different endeavors. Science isn’t a belief system. It doesn’t tell you what you should think. Science is a process and method of understanding the world around us. Nothing more and nothing less. It is, hopefully, introspective of itself and always moving to deeper understanding.

When scientists mistakenly place religion and science in the same sphere, they introduce unnecessary conflict. Religion and science aren’t simply two sides of the same coin. They are entirely different currencies altogether.

Do you like #Slack? And the TV series Silicon Valley? You aren’t alone…

If you are a Slack fanatic (a Slanatic?), you probably belong to several teams. Some of my teams have purpose-driven names, like $company-$department.

But others are, um, more whimsical. Recently I needed to start a new team for a nascent project. I thought, what could possibly be a richer source of names than the hilarious Silicon Valley TV series?

Well — surprise! — all the obvious names are taken.

Here are just a few names I tried that someone has already staked out:

  • bachman
  • dinesh
  • hooli
  • piedpiper
  • trescommas
  • rebillionization

There are obviously lots of other options, but you have to dig pretty deep to find something relevant. The gold rush is over.

Investing for busy scientists: roboadvisers to the rescue.

Roboadvisers like WealthFront offer an excellent way for scientists to easily invest in the stock market, letting them focus their energy on more important things, you know, like science.

Over the years, I’ve used a lot of different investment firms, brokerage houses, and online trading platforms: TIAA-CREF, American Century, Merrill Lynch, Fidelity, E*Trade, Schwab, to name a small sample.

The problem with all of these firms is that unless you are a very high net worth individual, they probably don’t have your best interest in mind. Your tiny account offers too low a return for the cost of servicing it.

I’ve even stupidly wasted time with things like Interactive Brokers, read lots of books on short, day, swing, options, and forex trading, and desperately tried to read the candlesticks and shooting stars to predict the future. But the problem with trading on your own (or through a brokerage) is that you simply cannot compete against people with detailed reporting and insight into a company’s finances and development, or against people exploiting technological nuances of the system. It’s also really nerve-wracking, and it’s very easy to find yourself facing the sunk-cost fallacy on an hourly or daily basis.

Stepping beyond that, the fees of brokerage firms — if you can actually figure out what they are — are exorbitant, often nullifying your return over time.

Still, investing is one of the best things young scientists can do. I did so in an exceptionally small way by dollar cost averaging throughout my graduate career. By my final year of graduate school, I had enough saved with a modest capital return to buy a new computer for writing my thesis and to take a little trip once said thesis was accepted, bound, signed, delivered, and shelved.

But investing takes time. And it carries significant risk. Nothing can remove the risk, but the time required to invest can be reduced, and a new crop of companies extends benefits previously available only to the few.

Enter: The Roboadvisers

Roboadvisers are basically brokerages that manage funds algorithmically, typically tracking stock indices. In doing so, they often extend tax-loss harvesting, automatic daily rebalancing, and direct indexing to all investors, services that were previously only available to the ultra rich. Better yet, roboadvisers are typically much cheaper than brokerages like Fidelity.

My current favorite is WealthFront. WealthFront offers incredibly low fees (0.25% per year) and they’ll manage the first $10,000 you invest for free. Better yet, there is a very low account minimum ($500) so you can get started early.
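To put those numbers in perspective, here is a back-of-the-envelope sketch of the annual advisory fee, assuming the 0.25% rate applies only to the balance above the free $10,000. The function name is mine, and the sketch ignores fund expense ratios and any promotional allowances, so read it as rough arithmetic rather than the company’s actual billing logic.

```python
def annual_advisory_fee(balance, rate=0.0025, free_allowance=10_000):
    """Estimate a yearly advisory fee in dollars.

    Assumes the rate is charged only on the balance above free_allowance
    and that the balance stays constant over the year.
    """
    billable = max(balance - free_allowance, 0)
    return billable * rate


# Print the estimated fee for a few example balances
for balance in (5_000, 10_000, 50_000):
    print(f"${balance:,} invested: about ${annual_advisory_fee(balance):,.2f} per year")
```

For a graduate-student-sized account, the fee under these assumptions is zero, and it stays small well beyond that.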

Everything they do is automated: tax-loss harvesting, dividend reinvestment, account rebalancing. And not just on an annual basis like most financial advisors — they do it all, daily.

And, as you might expect of a company disrupting the gross largesse of investment and retirement planning, they have a great website and mobile app that’s constantly improving, and dedicated and friendly support staff. Sign up now and get an additional $5000 managed for free.

This post may contain affiliate links. If you click on these links and purchase something, I may receive a minuscule commission. I only endorse things I find informative and useful.