Todd Harris, PhD

Facilitating scientific discovery at the intersection of genetics, genomics, bioinformatics, big data, cloud computing, and open science.


Investing for busy scientists: roboadvisers to the rescue.

April 22, 2016 By Todd Harris

Roboadvisers like WealthFront offer an excellent way for scientists to easily invest in the stock market, letting them focus their energy on more important things, you know, like science.

Over the years, I’ve used a lot of different investment firms, brokerage houses, and online trading platforms: TIAA-CREF, American Century, Merrill Lynch, Fidelity, E*Trade, Schwab, to name a small sample.

The problem with all of these firms is that unless you are a very high net worth individual, they probably don’t have your best interest in mind. Your tiny account offers too low of a return for the cost of servicing it.

I’ve even stupidly wasted time with things like Interactive Brokers, read lots of books on short, day, swing, options, and forex trading, and desperately tried to read the candlesticks and shooting stars to predict the future. But the problem with trading on your own (or through a brokerage) is that you simply cannot compete against people with detailed reporting and insight into a company’s finances and development, or against people exploiting technological nuances of the system. It’s also really nerve-wracking, and it is very easy to find yourself facing the sunk-cost fallacy on an hourly or daily basis.

Stepping beyond that, the fees of brokerage firms — if you can actually figure out what they are — are exorbitant, often nullifying your return over time.

Still, investing is one of the best things young scientists can do. I did so in an exceptionally small way by dollar cost averaging throughout my graduate career. By my final year of graduate school, I had enough saved with a modest capital return to buy a new computer for writing my thesis and to take a little trip once said thesis was accepted, bound, signed, delivered, and shelved.

But investing takes time. And it carries significant risk. Nothing can remove the risk, but the time required to invest can be reduced, and a new crop of companies extends benefits previously available only to the few.

Enter: The Roboadvisers

Roboadvisers are basically brokerages that manage funds algorithmically, typically tracking stock indices. In doing so, they often extend tax-loss harvesting, automatic daily rebalancing, and direct indexing to all investors, services that were previously available only to the ultra rich. Better yet, roboadvisers are typically much cheaper than brokerages like Fidelity.

My current favorite is WealthFront. WealthFront offers incredibly low fees (0.25% per year) and they’ll manage the first $10,000 you invest for free. Better yet, there is a very low account minimum ($500) so you can get started early.
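To put those numbers in perspective, here is a rough sketch of the annual advisory fee under the terms quoted above (0.25% per year, with the first $10,000 managed for free). The function name and the example balances are illustrative assumptions. The calculation ignores any other costs, such as the expense ratios of the underlying funds, and current pricing may differ.

```python
def annual_advisory_fee(balance, fee_rate=0.0025, free_tier=10_000):
    """Estimate the yearly advisory fee for a given account balance.

    Assumes the terms quoted in this post: 0.25% per year, with the
    first $10,000 managed for free. Other costs are ignored.
    """
    # Only the portion of the balance above the free tier is billed.
    billable = max(balance - free_tier, 0)
    return billable * fee_rate


# Illustrative balances, not recommendations.
for balance in (500, 10_000, 25_000, 100_000):
    fee = annual_advisory_fee(balance)
    print(f"${balance:>7,} invested -> ${fee:,.2f} per year")
```

Under these assumptions, an account at or below the free tier pays no advisory fee at all, which is what makes the low account minimum attractive for a graduate student budget.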

Everything they do is automated: tax-loss harvesting, dividend reinvestment, account rebalancing. And not just on an annual basis like most financial advisors — they do it all, daily.

And, as you might expect of a company disrupting the gross largesse of investment and retirement planning, they have a great website and mobile app that’s constantly improving, and dedicated and friendly support staff. Sign up now and get an additional $5000 managed for free.


Filed Under: careers Tagged With: career tips, investing, roboadvisers, survive-and-thrive

How do we assess the value of biological data repositories?

May 18, 2015 By Todd Harris 2 Comments

In an era of constrained funding and shifting focus, how do we effectively measure the value of online biological data repositories? Which should receive sustained funding? Which should be folded into other resources? What efficiencies can be gained over how these repositories are currently operated?

Every day, it seems, a new biological data repository is launched. Flip through the annual NAR Database issue if you do not believe this.

Quick aside: I want to differentiate repositories from resources. Repositories are typically of two varieties. The first collects primary data in support of a single or small consortium of labs. The second culls data from the scientific literature or collects it by author submissions before, during, or after the publication process. Resources may have some characteristics of repositories but are usually more tool oriented and aim to process data that users bring to it.

Encouraging the creation of a large number of repositories has been an important development in the rapid advancement of bioinformatics. These repositories, in turn, have played a critical role in the genomic- and post-genomic research world.

The current community of biological repositories allows for experimentation and innovation in data modeling and user interfaces. These repositories provide opportunities for education. And they let small labs participate in the grand process of Scientific Curation: the documentation of scientific progress outside of the traditional prose-based publication narrative. We should continue to carefully create new repositories when warranted for the innovation and educational opportunities that they present.

On the flip side, these repositories are often brittle: graduate students and postdocs move on and create knowledge vacuums. Elements of the repository break due to software upgrades and security patches. Data becomes unreliable (e.g., as genomic coordinates are refined). Interest declines, yet the cost of maintaining the resource remains. And having many repositories carrying slightly different versions of data also introduces confusion for downstream analyses and hinders reproducibility.

Clearly, we need an effective way of measuring the reach and value of biological data repositories. When a repository crosses a certain threshold, its funding should be decreased or removed. Remaining funds should be used (or allocated if necessary) to port the repository to a parent resource for long-term maintenance.

How can we determine the value of a biological repository?

1. Page views.

Simple metrics like page views, taken in isolation, are wholly insufficient for assessing the value of a repository. Each repository, for example, may have different tolerances for robots, different definitions of what constitutes a robot, or different architectures that mitigate the importance of page views. Page views are only one element that should be taken into account. I personally believe that they are one of the least effective ways of defining the value of a repository and worry that too much emphasis might be placed on them.
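A small sketch shows how much the definition of a robot alone can move the number. The log records, paths, and both marker lists below are hypothetical, made up purely for illustration; real repositories will have their own rules.

```python
# Hypothetical access-log records: (path, user_agent). Not real data.
requests = [
    ("/gene/abc-1", "Mozilla/5.0 (Macintosh)"),
    ("/gene/abc-1", "Googlebot/2.1"),
    ("/gene/xyz-2", "python-requests/2.31"),
    ("/gene/xyz-2", "Mozilla/5.0 (Windows NT 10.0)"),
    ("/search", "curl/8.4.0"),
]

# Two plausible but different definitions of "robot" (both are assumptions).
STRICT_MARKERS = ("bot", "spider", "crawl")
BROAD_MARKERS = STRICT_MARKERS + ("python-requests", "curl", "wget")


def count_page_views(records, robot_markers):
    """Count requests whose user agent contains none of the robot markers."""
    return sum(
        1
        for _path, agent in records
        if not any(marker in agent.lower() for marker in robot_markers)
    )


strict_views = count_page_views(requests, STRICT_MARKERS)
broad_views = count_page_views(requests, BROAD_MARKERS)
print("strict robot filter:", strict_views)
print("broad robot filter: ", broad_views)
```

The same five requests yield different page view counts depending on the filter. Note also that the broad filter discards scripted access, which may well come from scientists using the repository programmatically.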

2. Size of the user community.

How big is the user community? This should include registered users (if the repository has such an implementation), users determined via analytics, and the rate of citation.

3. Collective funding of the core user community.

How much money has the core user community of the repository been entrusted with? How much of that money would be lost or wasted if the repository were to be placed in a maintenance mode or defunded altogether? There is no sense in throwing good money after bad and sometimes tough choices must be made, but if the core research community — and all of the money vested in that as well — is affected, funding choices of the repository should be weighed very, very carefully.

Don’t get me wrong: repositories that serve relatively small communities (with a relatively small amount of funding) have value. But the net value of such repositories cannot compare to one that serves a user community 10x the size with 100x the funding.

4. Frequency of use.

How frequently do members of the core user community access the repository? Is it essential for day-to-day life at the bench? Or is it used in a more referential manner on a periodic basis?

5. Difficulty of assimilation.

How difficult and time consuming would it be to fold an existing repository into another? Repositories containing very specialized data, data models, or analysis tools that still support moderately sized communities with substantive funding could actually be MORE expensive to fold into another repository than to maintain independently.

In sum, page views are insufficient. We need to define the size of the user community, the extent of its funding, and how critical the repository is to continued progress of that community. Finally, we need to carefully weigh sustained funding vs. maintenance funding vs. assimilation costs.

Without knowing these five parameters, making consistent decisions about the value of biological repositories will be challenging at best.

And even with these metrics in hand, the real question — and the one that is much more difficult to address — is:

What are the thresholds for continued support of a repository?

I will share my thoughts on this in an upcoming post.

What are your thoughts? What other metrics should we be using to determine the value of biological data repositories? Leave a comment or ping me on Twitter.


Filed Under: bioinformatics, careers, funding, science policy Tagged With: big data, bioinformatics, data, funding

It’s time to reboot bioinformatics education

March 23, 2015 By Todd Harris 28 Comments

Nearly 15 years after completion of the human genome, undergraduate and graduate programs still aren’t adequately training future scientists with the basic bioinformatics skills needed to be successful in the “big data in biology” era. Why?

As a project manager and developer of a long running model organism database (and a former bench scientist myself), I interact with biologists on a daily basis. Frankly, I’m alarmed by what I see. Here are some examples of the types of questions I field every day:

I have a list of genes and I’d like to know the function of each.

I need all the [unspliced|spliced|upstream|downstream|translated] sequence for a group of genes.

I need my data in one very specific file format to support a legacy platform.

I need to do <this generic task> over and over again. It’s killing me and is a waste of my time. Help!

Many junior scientists percolating through the ranks lack the basic skills to address such questions. (I’ll talk about old dogs and new tricks in a subsequent post.) More troubling, they often lack the core skills and initiative to tackle rudimentary informatics problems. These include common tasks like collecting and collating data from diverse sources, searching a wiki, reading a mailing list archive, or hacking a pre-existing script to suit a new purpose.

Bioinformatics is here to stay. Get used to it.

Ten or fifteen years ago, many research institutions displayed significant resistance to (and significant ignorance about) the field of bioinformatics. Was it really science? Was it sufficiently hypothesis driven? How did it fit into the mission of a research institute or primarily undergraduate teaching environment? Happily, that resistance has been overcome at most institutions.

Bioinformatics isn’t the same as learning a transient and fleeting laboratory skill. Becoming proficient at running Southern blots or learning a protein purification process might help a student address the discrete questions of their thesis. But in the long term, these are disposable skills learned at great cost.

Not so with bioinformatics. Bioinformatics is a way of thinking. It’s a critical process of organizing information that spills over into many aspects of modern research life. It’s also very easy to develop a useful skill set with a very small time investment.

Frustratingly, many students still have a mental block about programming. They’ve learned (through assimilation and not experience) that programming is difficult. Or they’ve been trained to expect a convenient web interface for everything they need to do. In an ideal world, there would be a web interface for everything. This isn’t an ideal world.

Why has bioinformatics education failed?

I believe that current efforts in bioinformatics education have failed for three reasons.

First, and most fundamentally, bioinformatics training still isn’t universally available. Because of the initial resistance to the field, many institutions still lack qualified personnel capable of teaching entry and intermediate level bioinformatics courses.

Second, when bioinformatics training is offered, it’s often as an elective and not considered part of the core curriculum.

Finally, much bioinformatics training is too rarefied. It doesn’t spend enough time on core skills like basic scripting and data processing. For example, algorithm development has no place in a bioinformatics overview course, especially if that course is the only exposure to the field the student will have.

Can we fix bioinformatics education?

Yes. Look, it’s easy. Students need primer courses on basic skills first. And those courses need to be MANDATORY. Maybe drop the radiation safety course if there isn’t time. Who uses radioactivity anymore anyway? Here are the three core areas that I think all students in cellular & molecular biology, genetics, and related subfields need to master to succeed.

Core Area 1: Data Discovery

Data discovery refers to a related set of knowledge and skills. What data is available and where can it be found? How can it be retrieved? What if there isn’t a web interface or the data needs to be fetched on a routine basis? Being able to answer such questions forms the basis for programmatically accessing and managing data.

Students should learn how to access common data repository structures like FTP sites, web-based data mining interfaces, wikis, and APIs. They should learn skills for programmatically mining data repositories by learning how to write basic web spiders.
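As a flavor of what a "basic web spider" involves, here is a minimal sketch using only the Python standard library. It pulls the links out of a directory-style listing page and keeps the ones that look like data files. The URL, file names, and function names are hypothetical, and the listing is inlined so the example runs offline; a real repository may offer an API or FTP site that is a better fit than scraping.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url, suffix=""):
    """Return absolute URLs for links in `html` that end with `suffix`."""
    collector = LinkCollector()
    collector.feed(html)
    return [urljoin(base_url, h) for h in collector.hrefs if h.endswith(suffix)]


def fetch_page(url, timeout=30):
    """Download a page and decode it as text (the network step of a spider)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Hypothetical directory listing, inlined so the example runs offline.
# Against a real site you would call fetch_page(base_url) instead.
base_url = "https://data.example.org/releases/current/"
listing = """
<html><body>
<a href="../">Parent directory</a>
<a href="genes.gff3.gz">genes.gff3.gz</a>
<a href="proteins.fa.gz">proteins.fa.gz</a>
<a href="README.txt">README.txt</a>
</body></html>
"""

data_files = extract_links(listing, base_url, suffix=".gz")
for url in data_files:
    print(url)
```

Once a student can do this much, fetching the same files on a routine basis is a matter of putting the script on a schedule, and being polite about how often it runs.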

Core Area 2: Data Management

Naming files and datasets consistently and unambiguously is rarely discussed. Nor are data organization and management. These skills are critical for effective analysis, for communication and publication, and for reproducibility.

Boring? Perhaps. But it is absolutely shocking what file naming and management schemes scientifically minded people have created.

Effective data management is not always intuitive. But there are conventions and strategies that can be immensely helpful for transparency, data sharing, and interoperability. Being able to programmatically manage data files is also incredibly useful and a great time saver: rearranging directories, renaming files, archiving files, basic I/O redirection. This is not just for bioinformatics per se; it applies to many areas of biology, such as managing confocal images.
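Programmatic renaming can be sketched in a few lines. The naming convention below (lowercase, underscores instead of spaces and punctuation) is just one assumed convention among many, and the file names are invented. The demo works in a temporary directory so nothing real is touched.

```python
import re
import tempfile
from pathlib import Path


def tidy_name(name):
    """Lowercase a file name and collapse spaces and odd characters to '_'."""
    path = Path(name)
    stem = re.sub(r"[^a-z0-9]+", "_", path.stem.lower()).strip("_")
    return stem + path.suffix.lower()


def tidy_directory(directory):
    """Rename every file in `directory` with tidy_name; return the new names."""
    new_names = []
    # sorted() takes a snapshot of the listing before anything is renamed.
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        target = path.with_name(tidy_name(path.name))
        if target.name != path.name:
            if target.exists():
                raise FileExistsError(f"refusing to overwrite {target}")
            path.rename(target)
        new_names.append(target.name)
    return sorted(new_names)


# Demo on throwaway files with made-up names.
with tempfile.TemporaryDirectory() as workdir:
    for messy in ("Confocal Image 1 (FINAL).TIF", "gene list v2.XLSX", "notes.txt"):
        (Path(workdir) / messy).touch()
    tidy_names = tidy_directory(workdir)
    print(tidy_names)
```

The function refuses to overwrite an existing file rather than guessing, which is the safer default when the files are irreplaceable images or raw data.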

Core Area 3: Data transmogrification

Finally, up-and-coming scientists should be able to easily convert files from one format into another.

Again, boring. But useful? You bet. Cast off your Excel shackles.
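One common conversion, sketched here under simple assumptions, is turning a table exported from a spreadsheet into FASTA. The column names (`gene`, `sequence`), gene identifiers, and sequences are all invented for illustration; real exports will need their own column names and validation.

```python
import csv
import io


def csv_to_fasta(csv_text, id_column="gene", seq_column="sequence", width=60):
    """Convert CSV rows into FASTA records, wrapping sequences at `width`."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        sequence = row[seq_column].strip().upper()
        # Split the sequence into fixed-width lines, as FASTA files usually do.
        lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
        records.append(">" + row[id_column].strip() + "\n" + "\n".join(lines))
    return "\n".join(records) + "\n"


# Made-up spreadsheet export with hypothetical genes and sequences.
spreadsheet_export = """gene,sequence
abc-1,atggctagctaa
xyz-2,ATGTTTGGGCCCTAA
"""

fasta = csv_to_fasta(spreadsheet_export, width=10)
print(fasta, end="")
```

The same pattern (read rows, reshape each one, write them out) covers a surprising share of the "I need my data in one very specific file format" requests.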

A quick note to current graduate level students

Are you a graduate student in cell biology, molecular biology, biochemistry, or genetics (or related subfields)?

You should be receiving bioinformatics training as part of your core curriculum. If you aren’t, your program is failing you and you should seek out this training independently. You should also ask your program leaders and department chairs why training in this field isn’t being made available to you.


Filed Under: bioinformatics, careers, education, teaching Tagged With: bioinformatics, education

Social media and career advancement

June 1, 2008 By Todd Harris

Burgeoning scientists take note: Mitch Joel at Twist Image / Six Pixels of Separation has a great post on using social media for career development.


Filed Under: careers Tagged With: career development, job seeking, social media

Welcome!
My name is Todd Harris. A geneticist by training, I now work at the intersection of biology and computer science developing tools and systems to organize, visualize, and query large-scale genomic data across a variety of organisms.

I'm driven by the desire to accelerate the pace of scientific discovery and to improve the transparency and reproducibility of the scientific process.

