5 Great Things about the Anaconda Distribution

On Monday, I found Finder crashing way too often for my own liking, and MS Office misbehaving with the Save dialog. Therefore, I made the leap and decided to reformat my MacBook Air. The clean slate gave me an opportunity to formally try out Continuum Analytics’ distribution of Python.

The biggest selling point of the Anaconda Python distribution is that environments are a first-class citizen. Back when I wrote my first blog post detailing why I didn’t want to use any distributions of Python, including Anaconda, environments were basically a non-issue for me. However, in the intervening months, I have realized that in academic research, specified programming environments are a cornerstone of code reproducibility. If I am able to specify in a published paper the Python and package versions that I am using, I increase the likelihood that another research group can download my code and run it locally. Therefore, to gain this benefit for the sake of science, I decided to give Anaconda a second shot (my first try didn’t convince me, partly because I didn’t realize the importance of reproducible code at that time). Here’s a short write-up about my experience with the Anaconda distribution so far, with a focus on why I’ve begun to find it really, really great.
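To make the reproducibility point concrete, here is a minimal Python sketch of how the interpreter and package versions could be recorded alongside an analysis. The function name describe_environment and the package list are illustrative assumptions, not part of Anaconda, and importlib.metadata requires Python 3.8 or newer.

```python
import platform
from importlib import metadata


def describe_environment(packages):
    """Map 'python' and each requested package name to its installed version.

    Packages that are not installed in the current environment map to None.
    """
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


if __name__ == "__main__":
    # Illustrative package list; substitute whatever the analysis imports.
    for name, version in describe_environment(["numpy", "pandas", "matplotlib"]).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Saving this output next to the results gives another group the version numbers they would need to rebuild a matching environment.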

After downloading the Mac installer, which comes with Python 3.4, the first thing I learned I should do was remove the anaconda package, using the command:

conda remove anaconda

The reason for this was that when I tried updating all packages present in the distribution, anaconda was causing package specification conflicts. With a bit of searching around on the internet, I found that removing anaconda would solve that problem.

I think it’s worth mentioning that there’s no sudo-ing required here, unlike when working with the built-in Python on the Mac. For the record, others have made the case that the system Python should not be messed around with, as apps like iPhoto use it – and I foolhardily ignored that advice to my photo library’s peril. Having to sudo and type in my password almost every time I needed to install a new package also meant a small frictional barrier to getting things up and running.

Therefore, here’s the first great thing about the Anaconda distribution: no more sudo-ing.

I then needed to ensure that I could use the usual tools I use for my day-to-day analytics. This meant ensuring that SciPy, NumPy, Pandas, statsmodels, and matplotlib were installed in addition to IPython (and the HTML notebook environment). Anaconda comes with those as well – completely in line with the batteries-included philosophy of Python. However, the bundled packages are sometimes not the latest versions available. Therefore, I ran the next command:

conda update --all

Magic – everything was updated in an instant. No passwords required either.

Therefore, here’s the second great thing about the Anaconda distribution: updating everything is a cinch.

I then delved into my code. Soon, I found out that Python 3.4 with Pandas was having issues with one of my CSV files (utf-8 encoding errors, specifically), for which going back to Python 2.7 was going to help, according to the wonderfully knowledgeable inter-webs. Doing this with Anaconda proved to be easy enough as well:

conda create -n py27 python=2.7

Translated into English, this means: “Hey Anaconda, help me create a new environment named (-n) py27, using Python 2.7 (python=2.7).” If I so desired, I could have also named additional packages with their version specifications too, such as numpy=1.8, matplotlib=1.0 etc.

To activate the new environment, I just had to type in:

source activate py27

Each prompt in my Terminal then was prefixed with (py27), indicating that I was using the py27 environment. This isn’t a “permanent” change – if I launched a new Terminal instance, I would go back to the root environment, which uses Python 3.4. In the case that I wanted to go back to the root environment in the same terminal, all I had to type in was:

source deactivate

And voilà, I’m back in the root environment. Therefore, here’s the third great thing about the Anaconda distribution: creating, activating and deactivating environments is super easy.
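Besides looking at the prompt prefix, the active environment can also be checked from inside Python. The sketch below assumes that activating an environment sets the CONDA_DEFAULT_ENV variable, as conda's activate script does; the function name active_environment is an illustrative choice.

```python
import os
import sys


def active_environment():
    """Report which environment the running interpreter belongs to."""
    return {
        # Set by conda's activate script; None if no environment was activated.
        "env_name": os.environ.get("CONDA_DEFAULT_ENV"),
        # Directory the running interpreter is installed under.
        "prefix": sys.prefix,
        # Major.minor version of the running interpreter, e.g. "2.7" or "3.4".
        "python": "{}.{}".format(*sys.version_info[:2]),
    }


if __name__ == "__main__":
    for key, value in active_environment().items():
        print("{}: {}".format(key, value))
```

Run after source activate py27, the prefix should point inside the py27 environment's folder rather than the root installation.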

I also happen to work on a separate project that requires a different set of packages, such as pydna. pydna is also not supported on Python 3.4 right now, so the code I write has to be done using Python 2.7. I can create a new environment containing pydna and related packages using the name pydna to remind myself that this environment is the one needed when working on that project.

One can imagine actually accumulating so many different environments that it becomes hard to track in the long run. To know what environments are present, we can run the command:

conda info -e

In English, “Hey Anaconda, give me all the info about my environments (-e).” At this point, a list of environments is printed out to the terminal screen. To remove an environment that I don’t need anymore:

conda remove -n environment_name --all

Naturally, we would replace environment_name with the name of the environment that we wanted removed. I haven’t used it yet, but knowing that this command exists lets me know the fourth thing that’s great about the Anaconda distribution: environment management is extremely easy.
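For scripted bookkeeping, the same list can be approximated from Python. This is only a sketch under two assumptions: named environments live as folders under an envs directory inside the Anaconda installation, and each environment folder contains a conda-meta subfolder. The ~/anaconda/envs path and the name list_environments are hypothetical; conda info -e remains the authoritative listing.

```python
import os


def list_environments(envs_dir):
    """Return sorted names of folders in envs_dir that look like conda environments."""
    if not os.path.isdir(envs_dir):
        return []
    names = []
    for entry in os.listdir(envs_dir):
        # conda keeps per-environment package records in a conda-meta folder,
        # so its presence is used here as the marker of an environment.
        if os.path.isdir(os.path.join(envs_dir, entry, "conda-meta")):
            names.append(entry)
    return sorted(names)


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever Anaconda is installed.
    default_envs = os.path.join(os.path.expanduser("~"), "anaconda", "envs")
    for name in list_environments(default_envs):
        print(name)
```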

When installing new Python packages, Anaconda defaults to Binstar. However, there are some packages that aren’t available on Binstar yet – the most common place to get Python packages is PyPI (a.k.a. the Cheese Shop). (Incidentally, this is one of the reasons I initially wanted to stay away from the Anaconda distribution.) One such package is joblib, which allows me to run embarrassingly parallel code for embarrassingly parallel problems (e.g. running 8 multiple sequence alignments at one shot using the BioPython Commandline objects). I wasn’t able to install joblib by typing conda install joblib. Therefore, I went to the cheese shop, expanded the tarball in my Downloads folder, cd’ed into it, and ran python setup.py install with no issues at all. Because I was running the py27 environment at the time, joblib went straight into the py27 environment’s packages directory. Perfect stuff. Which leads me to the fifth great thing about the Anaconda distribution: manual package installation behaves exactly as one would expect.
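One way to double-check where a manually installed package ended up is to ask Python where it would import the module from. A minimal sketch, with module_location and lives_in_prefix as illustrative helper names; it uses importlib.util.find_spec, so it needs Python 3.4 or newer and would not run as-is inside a Python 2.7 environment.

```python
import importlib.util
import os
import sys


def module_location(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def lives_in_prefix(path, prefix=sys.prefix):
    """Return True if path sits somewhere underneath the given environment prefix."""
    path = os.path.realpath(path)
    prefix = os.path.realpath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


if __name__ == "__main__":
    location = module_location("joblib")
    if location is None:
        print("joblib is not installed in this environment")
    else:
        print(location)
        print("inside active environment:", lives_in_prefix(location))
```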

I’m sure there are others out there who have already been convinced of Anaconda’s merits compared to manual environment and package management. If so, I hope not to be judged in your eyes for coming late to the party. For those who haven’t yet been converted, I hope you make the jump sooner rather than later. Get it here. Also, shout-out to Travis and his team at Continuum – thanks for making great tools! I’m finally a convert :-D.

7 thoughts on “5 Great Things about the Anaconda Distribution”

  1. Hi Eric. Excellent post, thanks very much. About manual package installation, it might be even easier than that: if a package is in PyPI, you should be able to “pip install packagename” (as long as the Anaconda version of pip is in your command search path).

    1. Anaconda was creating conflicts when I tried updating all of the packages together. I found online (somewhere, can’t remember now) that removing the anaconda package would fix that, and it didn’t remove Python, conda (the package manager) and the other good stuff in the Anaconda distribution of Python.

  2. FYI: Glad you’re enjoying Anaconda; it’s a great distro. Environment management is part of official Python – it’s called “virtualenv” and its commands are exactly what you described. After installing official Python-2.7.9, type `pip install virtualenv`. Pip is now part of official Python. Also, you can install Python-3.x side by side with Python-2.7. Finally, folks can install Miniconda instead of Anaconda if they just want Python and conda. One important issue to understand is that compilers are not compatible. E.g., if Anaconda uses GCC but other binary packages use LLVM (Xcode), then there may be problems that are not immediately obvious.
