Thursday, June 26, 2014

How to Learn Bioinformatics


At least once a month someone asks me for help learning bioinformatics.  I love it when this happens because it usually means they want to take control of their own analysis, which frees up my time for problems that interest me.  This post is a collection of tips and resources for people who want to learn how to do bioinformatics.

Keep These Things in Mind:
  • Learning the basics of bioinformatics is easy.  The basics as described in this post are often taught in high school.  However, don't get frustrated if you don't understand everything all at once.  Learning anything new takes time and practice no matter its difficulty.  
  • A little bit goes a long way.  I estimate that nearly 90% of my work consists of simple routine procedures.  Learning how to do these tasks will substantially expand your ability to analyze and interpret your data.
  • Google it.  Google is the best resource for learning new techniques and troubleshooting problems.  If you have a question, type it into the Google search bar exactly as you would say it to a person.  When you take questions to your bioinformatics friends, it's likely they won't know the answer offhand and will Google it anyway.
  • Try it.  If you're not sure about something, try it and see what happens.  Generally, there is very little danger in just trying a command to see if and how it works.  That being said, it's a good idea to back up important files and data just in case something goes very wrong.  Every Unix programmer that I know has deleted a really important file using the rm command (which is one of the few irreversible Unix commands).  It's going to happen to you too, so make a backup.
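
The backup advice above can be scripted.  Here is a minimal sketch in Python, assuming you just want a copy of one file placed in a separate folder before you experiment on the original.  The function name backup_file and the example paths are hypothetical and not part of any tool mentioned in this post.

```python
import shutil
from pathlib import Path


def backup_file(path, backup_dir):
    """Copy one file into backup_dir and return the path of the copy.

    The original file is left untouched.
    """
    source = Path(path)
    destination_dir = Path(backup_dir)
    # Create the backup folder (and any parent folders) if it is missing
    destination_dir.mkdir(parents=True, exist_ok=True)
    destination = destination_dir / source.name
    # copy2 copies the contents and also tries to preserve timestamps
    shutil.copy2(source, destination)
    return destination
```

A call such as backup_file("reads.fastq", "backups") would leave reads.fastq in place and put a copy in the backups folder.  Note that a second call with the same file name overwrites the earlier copy, so this is only a starting point.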

Learn the Unix Basics
  • Get on a Unix machine.  Doing is the most important aspect of learning Unix.  You will never fully understand the basic concepts if you only read about them.  Mac users have it easy because OS X is built on Unix.  Simply open the Terminal application and you are ready to start with an online tutorial.  For non-Mac users I recommend finding an old computer and installing a Linux/Unix operating system like Ubuntu.  A slightly more difficult approach would be to partition the drive of an existing computer to dual boot a Linux/Unix OS along with the existing OS.
  • Complete an online tutorial
  • Buy a book if you are a book learner.  However, the basics can pretty much all be learned using online materials.  My favorite Unix book is O'Reilly's Unix Power Tools.

Learn a Scripting Language
  • Pick a scripting language.  Scripting languages are computer languages that are not compiled (i.e. they are interpreted by the computer on the fly).  The two most popular bioinformatics scripting languages are Perl and Python.  Both languages have their strengths and weaknesses, but I personally prefer Perl.
  • Complete an online tutorial for your language.
  • Buy a book.  My favorite Perl book is Perl Best Practices by Damian Conway.  This book is a must have for all Perl programmers!  I don't have much experience with Python books, so I would recommend looking at book reviews before making a purchase.
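
To give a feel for the kind of simple routine task a scripting language handles, here is a minimal sketch in Python (the same idea works just as well in Perl).  It reads sequences in FASTA format and reports the length and GC fraction of each.  The function names read_fasta and gc_fraction and the two example sequences are made up for illustration; a real project would more likely lean on an existing library.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    # Emit the final record once the input is exhausted
    if header is not None:
        yield header, "".join(chunks)


def gc_fraction(sequence):
    """Return the fraction of G and C bases, ignoring case."""
    if not sequence:
        return 0.0
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Made-up example data, not taken from any real organism
example = """>seq1 made-up example
ATGC
GGCC
>seq2 made-up example
ATAT
""".splitlines()

for name, seq in read_fasta(example):
    print(name, len(seq), round(gc_fraction(seq), 2))
```

To run it on a real file, you could pass an open file handle to read_fasta instead of the example lines.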

Learn a Statistical/Graphing Language
  • Pick a language for doing statistical operations and building figures.  Languages like R and Matlab are prime choices for both statistics and graphics.  Both languages have their strengths and weaknesses, but I personally prefer R.  If you choose R I highly recommend using the ggplot2 library for building figures.  
  • Complete an online tutorial for your language.
  • Give up Excel.  Excel is a powerful program but lacks the flexibility of computer languages like R and Matlab.  While there is a steeper learning curve for R and Matlab, you will substantially enhance your ability to do statistical analyses and build graphics by getting away from Excel.

Learn Basic Bioinformatics Procedures and Corresponding Software Tools

For example:
This is only a small list of procedures and tools primarily focusing on DNA sequence analysis.  For a more comprehensive list see OMICtools.

Find a Problem

I strongly encourage new bioinformaticians to find some real data and use the above principles and skills to do meaningful science.  If you don't personally have data, I recommend downloading data from a public repository (e.g. GenBank).  A similar alternative would be to choose a paper that uses a procedure you are interested in learning and reproduce its results.