Lessons Learned from Looping over Matplotlib

I recently wrote some code that had to create six figures in every iteration, each one generated by calling a method of an imported class. The basic logic went like this:

for i in range(2000):

    # do some logic
    create_figure(1)
    save_figure(1)

    create_figure(2)
    save_figure(2)

    ...


    create_figure(6)
    save_figure(6)

    # do some other logic

This turned out to be a huge time sink: by iteration 500, my code was running at 1 iteration per minute, compared with the 7-8 iterations per minute I saw early on.

What exactly was going on? As you might guess from the title, it was matplotlib at work.

This is probably not the most elegant or Pythonic way to track down the source of the problem, but it worked, and I'd like to share it here so that others don't have to suffer through the same thing.

Conceptually, I recorded the time just before a block of logic started and just after it ended, then computed the difference. In practice, that meant inserting a bunch of time() calls everywhere:

import numpy as np
from time import time

# One entry per cycle, in seconds.
logic_time = np.zeros(100)
plotting_time = np.zeros(100)

for i in range(100):
    # Time the non-plotting work.
    logic_start_time = time()
    # do some logic
    logic_end_time = time()

    logic_time[i] = logic_end_time - logic_start_time

    # Time all six create/save pairs together.
    plotting_start_time = time()
    create_figure(1)
    save_figure(1)

    create_figure(2)
    save_figure(2)

    ...


    create_figure(6)
    save_figure(6)
    plotting_end_time = time()

    plotting_time[i] = plotting_end_time - plotting_start_time
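The start/end bookkeeping in the loop above can be packaged so that each timed block only needs a `with` statement. This is a minimal sketch rather than the code used above: `timed` and `timings` are hypothetical names, the `sum(range(...))` and `pass` lines are stand-ins for the real logic and plotting calls, and it swaps `time.time()` for `time.perf_counter()`, the standard-library clock intended for measuring short intervals.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Maps a label such as "logic" or "plotting" to a list of per-cycle durations.
timings = defaultdict(list)


@contextmanager
def timed(label, store=timings):
    """Record how long the enclosed block takes, in seconds, under `label`."""
    start = time.perf_counter()  # monotonic clock meant for measuring intervals
    try:
        yield
    finally:
        # Runs even if the block raises, so every cycle gets an entry.
        store[label].append(time.perf_counter() - start)


for i in range(100):
    with timed("logic"):
        sum(range(10_000))  # stand-in for "do some logic"
    with timed("plotting"):
        pass  # stand-in for the create_figure/save_figure calls

print(len(timings["logic"]), len(timings["plotting"]))  # → 100 100
```

Each list in `timings` plays the same role as the `logic_time` and `plotting_time` arrays, but new labels can be added without preallocating anything.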

I was then able to plot the time required for each cycle, effectively getting a line chart of time spent as a function of cycle number. That was when I discovered that the plotting time crept up by tenths of a second with each cycle, eventually growing from a fraction of a second to more than five seconds. It's not hard to see, then, why over 500 cycles the run time of each loop rose from under 10 seconds to over a minute, and why by cycle 700 it was close to 2-3 minutes per cycle.
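Turning the recorded arrays into that chart takes only a few lines. This is a minimal sketch that assumes the `logic_time` and `plotting_time` arrays from the timing loop; `plot_cycle_times` and `growth_per_cycle` are hypothetical helper names, and the straight-line fit with `np.polyfit` is an added way to put a number on the per-cycle growth, not part of the original analysis.

```python
import matplotlib
matplotlib.use("Agg")  # render to files only; no display window needed
import matplotlib.pyplot as plt
import numpy as np


def plot_cycle_times(logic_time, plotting_time, path="cycle_times.png"):
    """Save a line chart of per-cycle logic and plotting times (in seconds)."""
    cycles = np.arange(len(plotting_time))  # assumes both arrays have the same length
    fig, ax = plt.subplots()
    ax.plot(cycles, logic_time, label="logic")
    ax.plot(cycles, plotting_time, label="plotting")
    ax.set_xlabel("cycle")
    ax.set_ylabel("time (s)")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)  # release the figure so this helper does not leak one itself
    return path


def growth_per_cycle(times):
    """Slope of a straight-line fit to per-cycle times, in seconds per cycle."""
    cycles = np.arange(len(times))
    slope, _intercept = np.polyfit(cycles, times, 1)  # highest-degree coefficient first
    return slope
```

Calling `plot_cycle_times(logic_time, plotting_time)` writes the chart, and `growth_per_cycle(plotting_time)` returns the fitted slope; a value well above zero is the signature of the steadily growing plotting time described above.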

Because I reasoned that the figures being generated were not publication-ready and were taking up unnecessary disk space, I eventually decided to comment out the create/save_figure code. That solved the runtime problem, which was sufficient for what I needed. However, I did read other people's blog posts on how to speed up matplotlib, which I think are worth a read, especially if the plotting code sits directly in your script (which was not my case).
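The root cause was never pinned down here, so treat the following as an assumption rather than a diagnosis. A frequent culprit in loops like this is creating figures through `pyplot` without closing them: `pyplot` keeps every such figure registered until `plt.close` is called (and warns once more than 20 are open), so memory use and bookkeeping can grow with every cycle. The sketch below shows the usual remedy; `create_and_save_figure` is a hypothetical stand-in for the create/save_figure pair, and the random data and file names are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: figures are only written to disk
import matplotlib.pyplot as plt
import numpy as np


def create_and_save_figure(fig_num, data, path):
    """Hypothetical stand-in for the create_figure/save_figure pair."""
    fig, ax = plt.subplots()
    ax.plot(data)
    ax.set_title(f"figure {fig_num}")
    fig.savefig(path)
    # pyplot holds a reference to every figure it creates until it is closed;
    # closing here keeps the number of open figures from growing each cycle.
    plt.close(fig)


for i in range(2):
    data = np.random.rand(50)  # placeholder data
    for fig_num in range(1, 7):
        create_and_save_figure(fig_num, data, f"cycle{i}_fig{fig_num}.png")

print(len(plt.get_fignums()))  # → 0
```

Without the `plt.close(fig)` line, `plt.get_fignums()` would list all 12 figures from both cycles as still open, and that count would keep climbing in a 2000-iteration loop.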
