Nearly 15 years after completion of the human genome, undergraduate and graduate programs still aren’t adequately training future scientists with the basic bioinformatics skills needed to be successful in the “big data in biology” era. Why?
As a project manager and developer of a long-running model organism database (and a former bench scientist myself), I interact with biologists on a daily basis. Frankly, I’m alarmed by what I see. Here are some examples of the types of questions I field every day:
I have a list of genes and I’d like to know the function of each.
I need all the [unspliced|spliced|upstream|downstream|translated] sequence for a group of genes.
I need my data in one very specific file format to support a legacy platform.
I need to do <this generic task> over and over again. It’s killing me and is a waste of my time. Help!
Many junior scientists percolating through the ranks lack the basic skills to address such questions. (I’ll talk about old dogs and new tricks in a subsequent post.) More troubling, they often lack the core skills and initiative to tackle rudimentary informatics problems. These include common tasks like collecting and collating data from diverse sources, searching a wiki, reading a mailing list archive, or hacking a pre-existing script to suit a new purpose.
Bioinformatics is here to stay. Get used to it.
Ten or fifteen years ago, many research institutions displayed significant resistance to (and significant ignorance about) the field of bioinformatics. Was it really science? Was it sufficiently hypothesis driven? How did it fit into the mission of a research institute or primarily undergraduate teaching environment? Happily, that resistance has been overcome at most institutions.
Bioinformatics isn’t the same as learning a transient and fleeting laboratory skill. Becoming proficient at running Southern blots or learning a protein purification process might help a student address the discrete questions of their thesis. But in the long term, these are disposable skills learned at great cost.
Not so with bioinformatics. Bioinformatics is a way of thinking. It’s a critical process of organizing information that spills over into many aspects of modern research life. It’s also very easy to develop a useful skill set with a very small time investment.
Frustratingly, many students still have a mental block about programming. They’ve learned (through assimilation and not experience) that programming is difficult. Or they’ve been trained to expect a convenient web interface for everything they need to do. In an ideal world, there would be a web interface for everything. This isn’t an ideal world.
Why has bioinformatics education failed?
I believe that current efforts in bioinformatics education have failed for three reasons.
First, and most fundamentally, bioinformatics training still isn’t universally available. Because of the initial resistance to the field, many institutions still lack qualified personnel capable of teaching entry and intermediate level bioinformatics courses.
Second, when bioinformatics training is offered, it’s often as an elective and not considered part of the core curriculum.
Finally, the nature of much bioinformatics training is too rarefied. It doesn’t spend enough time on core skills like basic scripting and data processing. For example, algorithm development has no place in a bioinformatics overview course, especially if that is the only exposure to the field the student will have.
Can we fix bioinformatics education?
Yes. Look, it’s easy. Students need primer courses on basic skills first. And those courses need to be MANDATORY. Maybe drop the radiation safety course if there isn’t time. Who uses radioactivity anymore anyways? Here are the three core areas that I think all students in cellular & molecular biology, genetics, and related subfields need to succeed.
Core Area 1: Data Discovery
Data discovery refers to a related set of knowledge and skills. What data is available and where can it be found? How can it be retrieved? What if there isn’t a web interface or the data needs to be fetched on a routine basis? Being able to answer such questions forms the basis for programmatically accessing and managing data.
Students should learn how to access common data repository structures like FTP sites, web-based data mining interfaces, wikis, and APIs. They should learn skills for programmatically mining data repositories by learning how to write basic web spiders.
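As a rough illustration of the link-harvesting step at the heart of a basic web spider, here is a minimal Python sketch using only the standard library. Everything named in it is an assumption made up for the example: the LinkCollector class, the find_data_files function, the sample directory listing, the file suffixes, and the example.org address. It only shows the idea of pulling data file links out of a page; it is not a complete or polite crawler.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag found on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_data_files(html, base_url, suffixes=(".gff3.gz", ".fa.gz")):
    """Return absolute URLs for links ending in one of the given suffixes."""
    parser = LinkCollector()
    parser.feed(html)
    wanted = tuple(suffixes)  # str.endswith accepts a tuple of suffixes
    return [urljoin(base_url, link) for link in parser.links if link.endswith(wanted)]


# Hypothetical directory listing, standing in for a page fetched from a
# data repository. The file names and the address are invented.
SAMPLE_LISTING = """
<html><body>
<a href="../">Parent directory</a>
<a href="genes.gff3.gz">genes.gff3.gz</a>
<a href="genome.fa.gz">genome.fa.gz</a>
<a href="README.txt">README.txt</a>
</body></html>
"""

if __name__ == "__main__":
    base = "https://example.org/releases/current/"
    for url in find_data_files(SAMPLE_LISTING, base):
        print(url)
```

In real use the page text would come from the repository itself, for example via urllib.request.urlopen, and a well-behaved script would pause between requests and use the repository's API or FTP site where one exists.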
Core Area 2: Data Management
Naming files and datasets consistently and unambiguously is rarely discussed. Nor is data organization and management. These skills are critical for effective analysis, for communication and publication, and for reproducibility.
Boring? Perhaps. But it is absolutely shocking what file naming and management schemes scientifically minded people have created.
Effective data management is not always intuitive. But there are conventions and strategies that can be immensely helpful for transparency, data sharing, and interoperability. Being able to programmatically manage data files is also incredibly useful and a great time saver: rearranging directories, renaming files, archiving files, basic I/O redirection. This is not just for bioinformatics per se, but applies to many areas of biology such as managing confocal images, for example.
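To make the file renaming idea concrete, here is a small Python sketch that applies one possible naming convention (lowercase, underscores instead of spaces and punctuation) to every file in a directory. The convention itself and the names tidy_name and rename_all are assumptions chosen for illustration, not an established standard. The function defaults to a dry run so the proposed changes can be reviewed before anything is touched.

```python
import re
from pathlib import Path


def tidy_name(name):
    """Lowercase a file name and replace runs of spaces or punctuation with '_'.

    Dots inside the name are kept so that multi-part extensions such as
    '.gff3.gz' and version strings such as 'v2.1' survive.
    """
    path = Path(name)
    stem = re.sub(r"[^A-Za-z0-9.]+", "_", path.stem).strip("_").lower()
    return stem + path.suffix.lower()


def rename_all(directory, dry_run=True):
    """Rename each file in `directory` to its tidy name.

    Returns a list of (old_name, new_name) pairs. With dry_run=True
    nothing is renamed; the list only shows what would change.
    """
    changes = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        target = path.with_name(tidy_name(path.name))
        if target == path:
            continue  # already follows the convention
        if target.exists():
            raise FileExistsError(f"{target} already exists; refusing to overwrite")
        changes.append((path.name, target.name))
        if not dry_run:
            path.rename(target)
    return changes


if __name__ == "__main__":
    # Preview the changes for the current directory without renaming anything.
    for old, new in rename_all(".", dry_run=True):
        print(f"{old} -> {new}")
```

Running it once as a dry run and then again with dry_run=False is a cheap safeguard against surprises such as two files collapsing to the same name.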
Core Area 3: Data Transmogrification
Finally, up-and-coming scientists should be able to easily convert files from one format into another.
Again, boring. But useful? You bet. Cast off your Excel shackles.
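One example of the kind of conversion meant here is turning a FASTA file into a tab-delimited table that can be sorted, filtered, or loaded elsewhere. The Python sketch below assumes plain-text FASTA input, uses the first word of each header line as the identifier, and writes three columns chosen for this example (id, length, sequence). The function names read_fasta and fasta_to_tsv and the sample records are hypothetical.

```python
import csv
import io


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text stream."""
    identifier, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, "".join(chunks)
            # Keep only the first word of the header as the identifier.
            identifier = (line[1:].split() or [""])[0]
            chunks = []
        elif identifier is not None:
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)


def fasta_to_tsv(fasta_handle, tsv_handle):
    """Write one tab-delimited row per FASTA record; return the record count."""
    writer = csv.writer(tsv_handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "length", "sequence"])
    count = 0
    for identifier, sequence in read_fasta(fasta_handle):
        writer.writerow([identifier, len(sequence), sequence])
        count += 1
    return count


if __name__ == "__main__":
    # Invented records, used only to show the shape of the output.
    example = io.StringIO(">geneA first example\nATGC\nGGTA\n\n>geneB\nTTAA\n")
    output = io.StringIO()
    fasta_to_tsv(example, output)
    print(output.getvalue(), end="")
```

For anything beyond simple cases, an established library such as Biopython is likely a better choice than a hand-written parser, but a dozen lines like these are often enough to get data out of one format and into another.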
A quick note to current graduate level students
Are you a graduate student in cell biology, molecular biology, biochemistry, or genetics (or related subfields)?
You should be receiving bioinformatics training as part of your core curriculum. If you aren’t, your program is failing you and you should seek out this training independently. You should also ask your program leaders and department chairs why training in this field isn’t being made available to you.
I can’t think of more than a handful of programs that are Doing It Even Approximately Right. Do you have any pointers to programs that you think are good? Thanks!
I haven’t reviewed many recently but would be interested to do so.
The various Programming for Biology courses that have been offered in the past at CSHL provide a good model to follow.
I think the new O’Reilly Bioinformatics book by Vince Buffalo is the first textbook written that would actually be useful for a course.
Thanks for this tip. We do some training of biologists at our institute and this looks like a useful book to use with them or recommend.
Great post, I agree with you. I finished my Biology degree 2 years ago and received no training in informatics skills. I decided that I wanted to be a bioinformatician, but because I think MS programs currently don’t teach enough programming (or data management), I’ve taken a little detour to do a 2-year Programming Vocational Training Course. Then I’ll take an MS.
Anyway, as my Vocational Training is focused on programming but not related to bioinformatics at all, I kept searching on the internet and found that there are a lot of great resources such as MOOCs. As you said, current programs should change to teach the bioinformatics skills needed. One easy way could be to encourage students to take these courses. I learnt to code in Python from scratch and now I’m moving on to more advanced topics such as algorithm design and statistical analysis of data.
If any reader is interested, I found this great list of online resources to start with, from a PLOS journal: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002632
Good work on pursuing your interests. I like your approach and think it will provide you with a very solid foundation.
Thanks for posting the link, too. Very useful.
I’d like to expand on this a bit. I think this should be rolled out to folks who are no longer in grad school too. It should be ongoing development of the workforce. These tools change all the time, new features roll out, new tech is coming along. Even if you saw some of it in grad school, there’s new and different stuff continuously.
I think those with deeper knowledge from years of work in the field could benefit a lot too. However, some of them are afraid to admit they need the training, can’t seem to schedule it, or don’t even quite get how it could benefit them either. We see this all the time.
Mary, this is a great point. I completely agree. I know many PIs that would benefit greatly from some basic bioinformatics skills.
I do think there is a sweet spot for training. Week long courses are a big commitment. An ongoing seminar series or a weekend bootcamp might be a reasonable approach.
Perhaps we should be starting to show the integral part bioinformatics and data analysis take in the life sciences, even before university?
At my daughter’s school, they have a biology teacher trying to convince 16-17 year olds that bioinformatics is the future. To try and help him out, I contacted the EBI to see if they have any sort of introduction or tour and was told there is an open day aimed at undergraduates each year, but nothing for younger students.
Great post! One issue that I ran into early in my career is that particular individuals can sometimes act as gatekeepers. I had to teach myself bioinformatics and admit that I am not particularly proficient, but my efforts were harshly criticized because “I was doing it all wrong” etc. This was very discouraging and is still causing me to question my approaches and skills. I agree that this training has to happen, but it needs to happen in a supportive environment and in an environment where assumptions are not made about prior knowledge and experience so that everyone starts out on a level playing field.
Thank you for your thoughtful comment.
Don’t be discouraged by the naysayers.
Developers have a tendency toward arrogance, especially towards newbies. People with an incrementally larger amount of knowledge simply cannot suffer the inane questions of the fools immediately below them. Even the format of questions is important. That attitude has no place in the classroom. It’s bad enough on internet forums.
You’re no doubt familiar with TIMTOWTDI – “There Is More Than One Way To Do It”.
Your code does not need to be beautiful nor does it need to be efficient. If it gets the job done and saves you some time, then it is a success. Obviously, it gets more complicated if building things that other people need to maintain or interact with.
And just like learning a language, the sooner you start using programming skills the faster you will become proficient. Having a real world problem to solve is a great way to learn.
While I agree that $people are lacking all kinds of $skills, you (and me, kind of) have learned some of those through the last 15 (and more) years. I haven’t learned any $skill formally, really. So, while I agree the next generation should learn a lot of $skills, curricula hold a lot of stuff already. And you can’t teach everything.
What we definitely need to teach is how to learn. But that starts way before University.
Precisely. The best education is one that teaches you how to learn. Excellent point.
That’s the type of training in bioinformatics I think people should receive. It’s critical, as this week’s darling database and favorite framework will certainly be something different next week.
And there’s nothing wrong with giving people a leg up with practical skills to solve real-world problems, either. Indeed, in an era of declining funding we probably need better and faster knowledge transfer and less beard stroking in the ivory tower.
If biological curricula are too full, maybe the curricula should be revisited and revised.
I did what I could with informatics in undergraduate/graduate school using the tools that I was comfortable with, but there’s a definite barrier to progress for most people without some sort of formal education about specific topics. I didn’t have the fear of programming, but I had trouble applying what I learned in CS classes to my biological questions.
You hit the nail on the head with the core areas you suggest, especially the data management piece. In my experience, most researchers really only respond when the funding agencies start asking for things, and there’s definite movement on both the bioinformatics and data management fronts, with BD2K and data management plans becoming more commonplace. I have high hopes for data reuse in the future.
One place that researchers tend not to look for help (in general and with this topic specifically) is the library. I’m a former research scientist who was hired as faculty at a Health Sciences Library to act as a liaison between researchers and the library. People on my campus are hungry for this type of information, as evidenced by the attendance I see at the monthly informatics workshop I run. I’m also in the process of developing 60-90 minute classes that can be run as standalone workshops or integrated as individual modules in the PhD curriculum. You’ve definitely given me some food for thought as far as new modules to develop.
Not all libraries have the capacity to do this, but it’s worth investigating.
Tobin, thank you for your feedback.
I find it revealing that even with your experience in CS, you had some trouble applying those skills to your research problems.
That would be a good group to focus some attention on for training. Not just “bioinformatics for CS” but more like “Hey CS people, these are the real-world day-to-day computational issues biologists need help with. Don’t sneer because they use a GUI. Just help them out.” 😉
I’d love to hear more about the workshops you are running. Can you point me (or send me) more information on them? Sounds great!
Great post, really encouraging for me. I’m from Computer Science and had an opportunity to improve the processes of a genetics laboratory without knowing the term bioinformatics. The project changed my whole life’s targets. Through self-study I have come to understand that bioinformatics is a really marvellous subject if a person is competent with the core values. But sadly I still haven’t had any formal education in bioinformatics due to a lot of problems in my home country.
But I’m encouraged by posts like this to go abroad to gain the knowledge. However, I fully agree that bioinformaticians from Computer Science need a good understanding of the biological domain of the application they are going to build, and that key risks must be handled carefully.
Thanks Todd,
Really Nice post. It has taken many years on my own learning and teaching journey to realise how important it is to focus on these foundation skills before moving on to the sexy stuff. The other dirty secret we need to face is the need to understand software installation and configuration.
John –
That’s a great point. It’s amazing/frightening how much of the operational side of bioinformatics is being efficient at the sysadmin level of things. Hopefully containerization and virtual machines will nullify this to some extent. But at the end of the day, software needs to be built, installed, and configured. Being able to do so efficiently and reproducibly is really important.
Hello Todd, we are in the process of developing a platform that will simplify things a lot to the level of visual interface, I would be interested in showing you. However, there are already other tools out there that are being used, but I don’t think too many are used in research due to their price.
Great post, our company is developing a platform for project-based bioinformatics education. I think it’s a great challenge that will lead to new discoveries. Learning data skills is like learning a new language, so I think biologists need to learn them, just as bioinformaticians need to learn the lab process, so that both groups can communicate…
Elia, I would love to see the platform! I’ve dropped you an email…