Lessons Learned from Looping over Matplotlib

I recently wrote some code that, on every iteration, had to create six figures by calling a function from an imported class. The basic logic went like this:

for i in xrange(2000):

    # do some logic
    create_figure(1)
    save_figure(1)

    create_figure(2)
    save_figure(2)

    ...


    create_figure(6)
    save_figure(6)

    # do some other logic

This turned out to be a huge time sink: by iteration 500, my code was running at 1 iteration per minute, compared with the 7-8 iterations per minute I saw early on.

What exactly was going on? As you might guess from the title, it was matplotlib at work.

This is probably not the most elegant or Pythonic way of tracking down the source of the issue, but it worked. I'd like to share it here so that others don't have to suffer through the same thing.
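The excerpt stops before naming the culprit, so what follows only illustrates one well-known way this kind of slowdown happens; it is not necessarily what was going on in my code. `matplotlib.pyplot` keeps a reference to every figure it creates until that figure is closed, so a loop that creates six figures per iteration and never closes them keeps thousands of figures alive (matplotlib starts warning once more than 20 are open). A minimal sketch, assuming the figures are made through pyplot: `make_and_save_figures` is a hypothetical stand-in for the `create_figure`/`save_figure` calls above, and the random data stands in for "do some logic".

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render straight to files
import matplotlib.pyplot as plt
import numpy as np


def make_and_save_figures(n_iterations, n_figures, out_dir):
    """Hypothetical stand-in for the create_figure()/save_figure() loop above."""
    for i in range(n_iterations):
        data = np.random.rand(50)  # placeholder for "do some logic"
        for k in range(1, n_figures + 1):
            fig, ax = plt.subplots()
            ax.plot(data * k)
            fig.savefig(os.path.join(out_dir, f"iter{i:04d}_fig{k}.png"))
            # Without this, pyplot keeps a reference to every figure it has
            # created, so the number of live figures grows every iteration.
            plt.close(fig)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        make_and_save_figures(n_iterations=3, n_figures=6, out_dir=d)
        print(len(os.listdir(d)), "files saved;",
              len(plt.get_fignums()), "figures still open")
```

Closing each figure right after `savefig` keeps the count of open figures at zero between iterations; calling `plt.close("all")` at the end of each iteration would have the same effect.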

Continue reading

Two ideas for using Evernote’s tagging productively

I have heard a lot about Evernote's tagging feature and how to use it effectively. I'd like to share two ways that tagging has helped me keep my stuff in order. (These aren't the only ways to do it – there are myriad creative uses for tags!)

1. Keeping track of people associated with work/ideas

I create tags made of a "#" followed by a person's name, and attach them to my notes. For example, if I am taking notes on a discussion between myself and a colleague, at some point I will tag the note with "#<Colleague's Name Goes Here>". That way, when I am looking for notes associated with a particular person, I can easily narrow the search by typing the "#" symbol before their name.

2. Creating temporary “folders” for meetings and for archival

I use tags of the form "YYYYMMDD <meeting name>" to gather, in preparation for a meeting, notes that may be scattered across a number of notebooks under one namespace (to borrow a programming term). I try to limit the number of notes to about 6-7, so that I don't end up scrolling through a whole bunch of other notes. Evernote also provides a list of "related notes", which helps limit the number of notes that get tagged. The tag goes temporarily into the shortcuts area, so when it's meeting time, everything is at my fingertips. I also file the tag under a "master tag" called "meeting notes" for archival purposes.

What is the history of the H3 numbering system?

Where possible, researchers tend to refer to the amino acid positions on the influenza hemagglutinin protein by two numbers: one based on itself, and one based on the “H3 numbering system”. What exactly is the “H3 numbering system”? Is there a rational reason for using such a numbering system, or is this simply an artefact of history?

To answer this question, I did a bit of digging around on the history of the H3 numbering system. As it turns out, the numbering system is really an artefact of history.

Back in the 1980s, when sequencing began to be used as a tool for comparative analysis, groups at the MRC (Cambridge, UK) and at Harvard needed a way of establishing a reference system. Since little data was available for each subtype, researchers defaulted to an earlier virus, A/Aichi/2/68, which happened to be an H3 virus. Let me show you what I've found based on a backward citation trace.

Continue reading
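Whatever its history, using the H3 numbering system in practice comes down to aligning a hemagglutinin sequence against the reference (A/Aichi/2/68) and reading each residue's reference position off the alignment. The sketch below illustrates only that bookkeeping step, assuming you already have a pairwise alignment as two gapped strings; `map_to_reference_numbering` is a hypothetical helper, and the sequences are made-up toys rather than real hemagglutinin. It ignores conventions of the real scheme, such as where counting starts and how insertions relative to the reference are labeled.

```python
def map_to_reference_numbering(query_aln, ref_aln):
    """Hypothetical helper: map 1-based query positions to reference positions.

    query_aln and ref_aln are the two rows of one pairwise alignment, with
    '-' marking gaps. A query residue aligned to a gap in the reference has
    no reference number, so it maps to None.
    """
    if len(query_aln) != len(ref_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    q_pos = r_pos = 0
    for q, r in zip(query_aln, ref_aln):
        if r != "-":
            r_pos += 1  # advance the reference counter on reference residues
        if q != "-":
            q_pos += 1  # advance the query counter on query residues
            mapping[q_pos] = r_pos if r != "-" else None
    return mapping


# Toy alignment (made-up sequences, not real HA):
ref = "MKT-IALSY"
query = "MKTAIA-SY"
print(map_to_reference_numbering(query, ref))
# {1: 1, 2: 2, 3: 3, 4: None, 5: 4, 6: 5, 7: 7, 8: 8}
```

Query residue 4 is an insertion relative to the reference, so it gets no reference number, and the deletion after query residue 6 makes the two numberings drift apart by one from there on.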

Strategies to declutter email

Tonight, I made a very interesting observation. I checked my inbox at about 10:45 pm or so, and found that I had zero messages in it.

This isn't the first time I've made this observation, but only now, in the context of what I've been experiencing and reading, do I realize that it has been many months in the making. I used to look forward to notices and newsletters arriving in my inbox. After realizing how disruptive they were to my creative practice (as a graduate student researcher, that is), I began a decluttering experiment that started with my email and is still extending into other areas of my life. (After all, decluttering information is less physically laborious than decluttering physical items.)

The journey was long, but here are some of the strategies I have ended up using to declutter my email.

Continue reading

Email Productivity

After giving it much thought, I think I have realized what my biggest problem with email is.

It’s distracting. Immensely distracting.

Yet for deep work, distractions are the thing I need least. The user interfaces of today's email apps just don't cut it for productive work. Our tools are supposed to get out of the way of what we do, not get in our way and prevent us from working.

Say I just want to send a simple email. If I fire up Gmail, though, I might be confronted with other unread messages, which are an immediate distraction.

The solution I have found is to use a menu-bar email app that lets me send but not check. There are a number of them out there, but the one I recently purchased (for $0.99) is QuickMailer, available on the App Store. It's true to its name: click on the menu bar icon, type up your email, and go back to work without having to check any other email.

But there’s more to email productivity and habits. Let me share what has worked for me over the past year of experimentation.

Continue reading

Anaconda Distribution

I have previously written about actively choosing not to use the Anaconda distribution of Python & its packages because I wanted to stay vendor-independent. While I have been able to do that on my MacBook Air, the main computer I code on, I've since gotten my hands on a Sony VAIO box (courtesy of my friend Thomas from San Francisco), which I have converted into a clean-slate Linux box.

Because of that, I am going to try out the Anaconda distribution on this machine and see how it works out. I’ve got a few reasons why – some technical, some personal.

As I recall, getting SciPy, NumPy and Pandas set up correctly gave me a ton of headaches. On my friends' computers, ensuring that the installation directories were correct was also an issue, and managing both Python 2 and Python 3 caused trouble on other people's computers as well. At the last PyCon, it also looked like Continuum's Anaconda distro was gaining a lot of momentum. Finally, I've wanted to find a way to act on my appreciation for Travis, Continuum's CEO, whom I met at the last PyData conference, for sharing his own life story with me and passing on a lot of advice. I think getting familiar with Anaconda will be a first step in the right direction.
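Much of the trouble described above comes down to not knowing which Python, and which copy of each package, is actually in use. A quick diagnostic can help. The sketch below is a hypothetical helper (`describe_environment` is not part of Anaconda or any library) that reports the running interpreter's version and prefix, the `CONDA_PREFIX` environment variable that conda sets when an environment is activated, and where NumPy, SciPy and Pandas would be loaded from. It uses `importlib.util.find_spec`, so nothing is actually imported, and it assumes Python 3.

```python
import importlib.util
import os
import sys


def describe_environment(packages=("numpy", "scipy", "pandas")):
    """Hypothetical helper: report which Python runs and where packages live."""
    report = {
        "python_version": "%d.%d.%d" % sys.version_info[:3],
        "executable": sys.executable,
        "prefix": sys.prefix,
        # conda sets CONDA_PREFIX when an environment is activated;
        # None here means no conda environment is active.
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
        "packages": {},
    }
    for name in packages:
        # find_spec locates a top-level package without importing it;
        # it returns None when the package is not installed.
        spec = importlib.util.find_spec(name)
        report["packages"][name] = spec.origin if spec is not None else None
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(key, ":", value)
```

Running this from each interpreter on a machine (for example the system Python and an Anaconda one) makes it obvious when a package is being picked up from an unexpected directory.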

Wrap-up on PyCon 2014 tutorial

People are interested in using Pandas in IPython. I think most people who came to the conference are already well grounded in the fundamentals of data analysis. The missing piece is the tools.

Feedback on the tutorial was generally positive, and I think attendees appreciated the time spent wrestling with Pandas in the IPython HTML shell.

Based on this, I think the following things could be changed:

  1. De-emphasize counting and ranking; mention them only briefly.
  2. Provide a richer data set on which more sophisticated analyses can be performed. This would tie in well with the next point.
  3. Provide a cheat sheet for Pandas functions. That would mean arithmetic operations (add, subtract, multiply, divide), basic data summarization (mean, mode), groupby operations (especially aggregation), and plotting.
  4. Keep the format where people have to work on a particular problem of choice, in groups, for a long period of time.
  5. Re-distribute the class to be more evenly balanced between coders and analyzers.
  6. Follow up with additional resources if possible.
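As a starting point for the cheat sheet in item 3, here is a minimal sketch covering the four areas listed there: arithmetic, mean and mode, groupby aggregation, and plotting. The toy DataFrame and its column names are made up for illustration, and the plotting step assumes matplotlib is installed, since pandas uses it as its plotting backend.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so no display is needed
import pandas as pd

# Hypothetical toy data set, made up for illustration
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "x": [1.0, 2.0, 3.0, 4.0, 4.0],
    "y": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Arithmetic: element-wise operations between columns
df["sum"] = df["x"] + df["y"]
df["diff"] = df["y"] - df["x"]
df["prod"] = df["x"] * df["y"]
df["ratio"] = df["y"] / df["x"]

# Basic summaries
x_mean = df["x"].mean()  # 2.8
x_mode = df["x"].mode()  # a Series, because several values can tie; here just 4.0

# groupby + aggregation: one row per group, a different function per column
summary = df.groupby("group").agg({"x": "mean", "y": "sum"})
print(summary)

# Plotting: pandas delegates to matplotlib and returns an Axes
ax = summary.plot(kind="bar")
ax.set_ylabel("value")
# ax.figure.savefig("summary.png") would write the chart to disk
```

Passing a dict to `agg` is one way to apply a different aggregation to each column; `df.groupby("group")["y"].sum()` is the simpler form when only one column matters.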

I really enjoyed this year’s tutorial sessions, and I’d like to give it a shot next year!