Writer’s note: This blog post was written for the class of Science One 2015/2016, 9 years after my own experience in Science One 2006/2007. My classmate, Jacob Bayless, is giving a talk to them titled “Life After Science One”, and reached out to me for some perspectives on learning computation. Here’s my piece, for him, for this year’s class, and beyond.
10 years ago, I joined UBC as a student in the Science One program. It was a fun year, and one of the best educational experiences I’ve had. During Science One, we learned to think integratively across disciplines. For example, we saw how order of magnitude estimation, a common tool in physics, could be applied to ecology, biochemistry, and thermodynamics. As another example, we learned about the application of ordinary differential equations to predator-prey systems in ecology and immunology. That way of thinking – bridging disciplines and meshing ideas – is something I’ve re-discovered, re-encountered, and re-applied over and over in my research career to date. It doesn’t stop, even after Science One.
There is one thing I wish had been emphasized back in my year, something you all now have the privilege of learning: computation and computational thinking.
While at UBC, I took some quantitative classes, including multivariable calculus, introductory programming (Java was all the rage back then), and statistics, but nothing beyond that. By the end of my undergraduate training, I was thoroughly trained in molecular biology, but utterly helpless with programming ecosystems. I only knew how to transcribe and translate DNA sequences in Java.
Later, I transitioned into doing computational work during my PhD. I was ending my 2nd year in the MIT Biological Engineering department, and in search of a good topic for my thesis. I was a 2-month-old Pythonista at that point (this is what Python programmers call ourselves, so you all are Pythonistas now!), teaching myself Python to improve cloning workflows by automating PCR primer design. On the recommendation of a friend, I checked out the Boston Python meetup group. There, I met Giles, a software developer with a gene synthesis company, and a former Broadie. I told him about this idea I had to classify all of the internal genes of the influenza virus. Looking back, that was far too ambitious for a thesis project, but Giles and I were both naive enough about the problem that we set about talking through it. I drew him a matrix, he came back with an idea, we Googled things, I came up with another idea and drew another thing.
Rinse, wash, repeat. It was such an energizing and exciting time! At one point, Giles looked at our ideas and said, “I think you need a clustering algorithm. Try… affinity propagation. It’s a relatively new one, but it nonetheless has a mature implementation available in Python. Search scikit-learn for it.” In effect, he was asking a 2-month-old Pythonista to do machine learning in Python. Well played, Giles; fast forward a few months, and he had effectively kickstarted the groundwork for my thesis, which would eventually evolve into a computational study of influenza’s capacity for reticulate evolution (through reassortment) and its importance in switching hosts (or ecological niches). My advisor Jon, though not a computationally trained person himself, trusted me with the freedom to learn, fail, and create under his mentorship, and I hope that in due time we can reap the fruits of that trust.
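If you’re curious how approachable Giles’s suggestion actually is, here’s a tiny sketch of affinity propagation clustering with scikit-learn. The data here are made up (random blobs, not influenza genes), purely to show the shape of the API:

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Hypothetical data: pretend each row is a gene summarized
# by two numeric features. Two well-separated blobs.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0, 0.5, size=(10, 2)),  # one group near (0, 0)
    rng.normal(5, 0.5, size=(10, 2)),  # another group near (5, 5)
])

# Affinity propagation picks the number of clusters itself,
# by passing messages between points until exemplars emerge.
model = AffinityPropagation(random_state=0).fit(X)

print(model.labels_)                 # cluster assignment for each point
print(len(model.cluster_centers_))   # number of clusters it discovered
```

The appeal for a beginner is that the whole thing is three lines of actual modelling code; the hard part, as I learned, is deciding what the features and similarities should mean biologically.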
We ended up never using machine learning in that project, but a few things happened that drew me deeper into computation. My experience with the scikit-learn library piqued my interest in machine learning tools as applied to biology. I learned about network analysis through Allen Downey’s book, “Think Complexity”, and adopted it as my main modelling tool. I went to PyCon 2014 and 2015 (in Montreal), giving tutorials on data analysis and network analysis. At those conferences, I also learned a ton about how to use the scikit-learn library, and about good practices from the software development world that could be carried over into research. This year, I will be at PyCon 2016 giving a tutorial on statistical network analysis, while continuing my journey in the Python and data science worlds. The learning never stops; I soon discovered that scikit-learn alone wasn’t enough, and I’ve started learning the internals of deep learning with a group up at Harvard, with the goal of developing highly interpretable models of phenotype from genotype. The environment here (MIT & the Broad Institute) helps, too; there’s a Models, Inferences & Algorithms seminar series, during which we learn both the mathematical underpinnings of computational methods and their application to biomedical problems.
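To give a flavour of the network analysis that hooked me, here’s a minimal, entirely hypothetical example using the networkx library. The graph below is invented for illustration (it is not data from my research), but it shows the basic idiom: build a graph, then ask it questions:

```python
import networkx as nx

# A toy bipartite-style graph: influenza subtypes connected to
# the host species they have been found in. Invented data.
G = nx.Graph()
G.add_edges_from([
    ("H3N2", "human"), ("H3N2", "swine"),
    ("H1N1", "human"), ("H1N1", "swine"), ("H1N1", "avian"),
    ("H5N1", "avian"),
])

# Degree centrality: which node touches the largest fraction
# of the other nodes?
centrality = nx.degree_centrality(G)
print(max(centrality, key=centrality.get))  # → 'H1N1'
```

Once your problem is expressed as nodes and edges, decades of graph algorithms become available to you for free, which is exactly the kind of leverage I mean when I call computation a superpower.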
For the past 9 years, the learning hasn’t stopped. I’ve found that thinking integratively, which is exactly what is taught in Science One, is a crucial element of the creative process. New ideas come from juxtaposing and composing old ideas in ways others have never done before. Computation has also played a huge role in letting me tackle problems that would otherwise be beyond my reach; from simple automation to modelling complex systems, it’s a superpower you’ll want to have, and computational thinking has helped me scale the scope of what I can solve. So, to SciOne 15/16: cherish the chance to learn what you are learning now. As soon as you can, find an outlet, a problem on which to apply what you’ve learned; go solve a problem for the world around you, whether big or small. Find people who will teach you, trust you with the resources to finish the task, and pick you up when you’re down. And while you’re at it, have a ton of fun!