Python’s Pickle is Magical

I only recently stumbled upon Python’s pickle module, and now that I’ve used it, I can only say, “Why did nobody tell me about this earlier?”

But before I go on, what the heck is pickling?

From the Python docs:

The pickle module implements a fundamental, but powerful algorithm for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream is converted back into an object hierarchy. Pickling (and unpickling) is alternatively known as “serialization”, “marshalling,” [1] or “flattening”, however, to avoid confusion, the terms used here are “pickling” and “unpickling”.

In true Pythonic style, the docs are well-written and easy to understand… if you understand what an object is, and what serialization is. I won’t even try to define them here. However, I will describe what I think about Pickling.

To me, Pickling is like what theoretical physicists might call “teleportation”. One essentially decomposes almost any Python object (a dictionary, list, NetworkX Graph, Biopython SeqRecord) into its elementary units, writes them to disk, and re-composes them as the original object in another script or file.
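As a rough illustration of that round trip, here is a minimal sketch using only the standard library. The dictionary contents and the file name `thing.pkl` are made up for the example; a temporary directory stands in for "another script or file".

```python
import pickle
import tempfile
from pathlib import Path

# A made-up object with nested containers, just for illustration.
original = {
    "name": "A/northern, 2009",
    "neighbors": ["B", "C"],
    "weights": (0.5, 1.5),
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "thing.pkl"

    # Pickle files are binary, so open with "wb" / "rb".
    with path.open("wb") as f:
        pickle.dump(original, f)

    with path.open("rb") as f:
        restored = pickle.load(f)

print(restored == original)  # True: same contents
print(restored is original)  # False: a new, equal object
```

One thing worth knowing: the pickle documentation warns that unpickling can execute arbitrary code, so this is for files you wrote yourself or otherwise trust.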

I’ll give an example that sparked off my interest in Pickling.

I work with graphs in NetworkX, and I found myself stuck on which file format to use for storing a graph. There are adjacency matrices, edge lists, GML, GEXF… damn, the list is long. The problem I see is that it’s hard to know whether, after writing the graph to disk, I will be able to re-open it in its original form. My graphs are more than the simple Graph() objects provided; there are DiGraph() and MultiDiGraph() objects as well.

So I tried my hand at GML. I tried writing the graph to disk and re-opening it in another IPython notebook. No dice; it was having trouble with the node names, which have “/” and “,” inside them.

Then I tried my hand at edge lists. That didn’t work either – the file didn’t record that I was working with a MultiDiGraph() object.

Then I saw this weird word called “Pickle”, which I thought was interesting. Clicking on it, I thought, “why not give it a shot?”

So I did. I saved the MultiDiGraph() as a Python Pickle, and tried re-opening it in another IPython notebook. BINGO! Everything re-composed as it was previously!
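For the curious, what I did looks roughly like the sketch below. The node names, edge keys, and weights are hypothetical stand-ins (chosen to contain “/” and “,” like my real ones), and I am calling the standard `pickle` module directly on the graph object rather than using any NetworkX-specific helper.

```python
import pickle
import tempfile
from pathlib import Path

import networkx as nx

# Hypothetical graph: two parallel directed edges between the same nodes.
G = nx.MultiDiGraph()
G.add_edge("A/northern, 1", "B/southern, 2", key="first", weight=1.0)
G.add_edge("A/northern, 1", "B/southern, 2", key="second", weight=2.5)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "graph.pkl"

    # Save the whole graph object.
    with path.open("wb") as f:
        pickle.dump(G, f)

    # Load it back, as another notebook would.
    with path.open("rb") as f:
        H = pickle.load(f)

print(type(H).__name__)      # the class survives the round trip
print(H.number_of_edges())   # so do both parallel edges
```

The point is that the graph class, the awkward node names, and the per-edge attributes all come back without my having to choose a file format that understands them.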

That’s when I realized why it’s called “pickling”. It’s a data preservation method. Pickling preserves food; Python Pickles preserve data. Not just the numbers, but the entire data representation.

For the record, I tried seeing what was inside the Python Pickle file. Opening up the file in Sublime Text, all I saw was the following:

8002 636e 6574 776f 726b 782e 636c 6173
7365 732e 6d75 6c74 6964 6967 7261 7068
0a4d 756c 7469 4469 4772 6170 680a 7101
2981 7102 7d71 0328 5504 6e6f 6465 7104
7d71 0528 5539 412f 6e6f 7274 6865 726e

……
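The dump is less opaque than it looks. As a small sketch, the snippet below decodes just the first three rows shown above using the standard library. The interpretation (a protocol marker, then an opcode naming the module and class to import) is my reading of the pickle format, not something the file announces on its own.

```python
# First three rows of the hex dump, copied from above.
dump = """
8002 636e 6574 776f 726b 782e 636c 6173
7365 732e 6d75 6c74 6964 6967 7261 7068
0a4d 756c 7469 4469 4772 6170 680a 7101
"""

# Remove all whitespace, then turn the hex digits back into raw bytes.
head = bytes.fromhex("".join(dump.split()))

# Byte 0 (0x80) marks "protocol version follows"; byte 1 is the version.
protocol = head[1]

# Byte 2 is the opcode; b"c" means "import this module and class".
opcode = head[2:3]

# The module and class names follow, each terminated by a newline.
module, name, rest = head[3:].split(b"\n", 2)

print(protocol)          # 2
print(opcode)            # b'c'
print(module.decode())   # networkx.classes.multidigraph
print(name.decode())     # MultiDiGraph
```

So the file starts by recording which class to rebuild, which is exactly the piece of information the edge list lost. For a full breakdown of a pickle you created yourself, the standard library's `pickletools.dis` prints every opcode.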

So, it looks a lot like teleportation. Decompose the original object and capture the state of every quark, move it to some other place, and then re-compose it just as it was originally. All sounds pretty awesome to me!

One thought on “Python’s Pickle is Magical”

  1. I’ve used Python pickles when loading in huge data sets (numpy sparse matrices). It makes loading in the data much faster. I just load it in once with numpy.genfromtxt and then pickle it, and then load the pickle from then on. It really hurts when loading the data is *painfully* slow.
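The parse-once, load-the-pickle-afterwards pattern from this comment might look something like the sketch below. To keep it standard-library only, a small CSV reader stands in for `numpy.genfromtxt`; the function names `parse_text` and `load_cached` and the file names are hypothetical.

```python
import csv
import pickle
import tempfile
from pathlib import Path


def parse_text(text_path):
    """Stand-in for a slow parser: read a CSV of numbers into a list of rows."""
    with open(text_path, newline="") as f:
        return [[float(x) for x in row] for row in csv.reader(f)]


def load_cached(text_path, cache_path):
    """Return the parsed data, reusing the pickle at cache_path if it exists."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as f:
            return pickle.load(f)

    # No cache yet: do the slow parse once and save the result.
    data = parse_text(text_path)
    with cache_path.open("wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    return data


with tempfile.TemporaryDirectory() as tmp:
    text_path = Path(tmp) / "data.csv"
    cache_path = Path(tmp) / "data.pkl"
    text_path.write_text("1,2,3\n4,5,6\n")

    first = load_cached(text_path, cache_path)    # parses the text, writes the cache
    cache_written = cache_path.exists()
    second = load_cached(text_path, cache_path)   # reads the pickle only
```

Note that this simple version never checks whether the text file has changed since the cache was written, so a stale pickle has to be deleted by hand.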
