Wrap-up on PyCon 2014 tutorial

People are interested in using Pandas in IPython. I think most people who came to the conference are already well-grounded in the fundamentals of data analysis. What they lack is familiarity with the tools.

Feedback from the tutorial was generally positive, and I think attendees appreciated the time spent wrestling with Pandas in the IPython HTML shell.

Based on this, I think the following changes would improve the tutorial:

1. De-emphasize counting and ranking; mention them only briefly.
2. Provide a richer data set on which more sophisticated analyses can be performed. This would tie in well with the next point.
3. Provide a cheat sheet for Pandas functions. It would cover arithmetic operations (add, subtract, multiply, divide), basic data summarization (mean, mode), groupby operations (especially aggregation), and plotting.
4. Keep the format where people work in groups on a problem of their choice for an extended period of time.
5. Re-distribute the class so that it is more evenly balanced between coders and analyzers.
6. Follow up with additional resources if possible.
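A cheat sheet along the lines of point 3 might start from a snippet like the one below. It is only a minimal sketch, not material from the tutorial itself: the toy DataFrame and its column names (`group`, `x`, `y`) are made up for illustration, and the plotting line assumes matplotlib is installed.

```python
import pandas as pd

# Hypothetical toy data set; the column names are invented for illustration.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "x": [1, 2, 3, 4, 4],
    "y": [10, 20, 30, 40, 50],
})

# Arithmetic: element-wise between columns, or between a column and a scalar.
df["sum"] = df["x"] + df["y"]     # equivalent to df["x"].add(df["y"])
df["diff"] = df["y"] - df["x"]    # equivalent to df["y"].sub(df["x"])
df["double"] = df["x"] * 2        # equivalent to df["x"].mul(2)
df["ratio"] = df["y"] / df["x"]   # equivalent to df["y"].div(df["x"])

# Basic summaries: mean returns a scalar, mode returns a Series (ties are possible).
mean_x = df["x"].mean()
mode_x = df["x"].mode()

# Groupby aggregation: several statistics of y for each value of "group".
per_group = df.groupby("group")["y"].agg(["mean", "sum", "count"])

# Plotting (needs matplotlib installed):
# df.plot(x="x", y="y", kind="scatter")
```

Each line maps to one category in point 3, which keeps the sheet short enough to fit on a single page.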

I really enjoyed this year’s tutorial sessions, and I’d like to give it a shot next year!

Python’s Pickle is Magical

I only recently stumbled upon Python's pickle module, and now that I've used it, I can only say, "Why did nobody tell me about this earlier?"

But before I go on, what the heck is pickling?

From the Python docs:

The pickle module implements a fundamental, but powerful algorithm for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream is converted back into an object hierarchy. Pickling (and unpickling) is alternatively known as “serialization”, “marshalling,” [1] or “flattening”, however, to avoid confusion, the terms used here are “pickling” and “unpickling”.

In true Pythonic style, the docs are well-written and easy to understand… if you understand what an object is, and what serialization is. I won't even try to define them here. However, I will describe how I think about pickling.
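To make the docs' definition concrete, here is a minimal sketch of a pickling round trip using only the standard library. The `results` dictionary is a hypothetical stand-in for whatever object hierarchy you want to save; `pickle.dumps`/`pickle.loads` work with bytes in memory, while `pickle.dump`/`pickle.load` work with a file opened in binary mode.

```python
import os
import pickle
import tempfile

# A hypothetical nested object hierarchy: a dict holding a string, a list and a tuple.
results = {"sample": "A01", "counts": [3, 1, 4], "range": (0, 10)}

# Pickling: object hierarchy -> byte stream.
data = pickle.dumps(results)

# Unpickling: byte stream -> an equal (but new) object hierarchy.
restored = pickle.loads(data)

# The same round trip through a file on disk; binary mode ("wb"/"rb") is required.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "results.pkl")
    with open(path, "wb") as f:
        pickle.dump(results, f)
    with open(path, "rb") as f:
        from_disk = pickle.load(f)
```

One caveat the official docs stress: unpickling can execute arbitrary code, so only load pickles from sources you trust.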


Planning, writing, doing, and thinking: reflections on the writing cycle

After a hectic semester, I’ve been in the thick of writing the thesis proposal and finding committee members.

As I was writing the thesis proposal, I noticed two very interesting facts about the act of writing.

First, writing for a purpose highlights the gaps in how well I am communicating toward that purpose. Second, writing, thinking and doing follow a cycle in which each reinforces the others.

I've come to realize that this is really important, and knowing it has changed the way I approach writing. Rather than writing merely because I have to communicate an idea, I now write with the aim of producing a polished piece of work. Here, I'd like to set out six observations from experiencing this writing cycle.


How to get up-and-running using Python, the easy_way, for the scientist

As a graduate student doing biological sequence analysis in Python, I spend quite a bit of time in my IPython HTML notebooks. (They're my preference because they're easier on the eyes than the Terminal.) However, getting set up wasn't easy, not when there are so many ways just to get started with scientific computing.

If you're a typical scientist, you'd want to get up and running as fast as possible with few obstructions. If you're a scientist who takes the long view of things, you'd also want your setup to be "vendor-independent". With Python, there's a variety of ways to get the scientific computing packages up and running: manual installation, Anaconda, Canopy… There are also different ways of installing Python packages, such as homebrew, pip, conda, easy_install… gosh, it's a nightmare for the newcomer to Python. This clearly violates a Pythonic principle:

There should be one– and preferably only one –obvious way to do it.

I've gone through the nightmare of managing packages scattered across the separate directories used by Anaconda, homebrew, pip and easy_install, to the point that I wiped out every Python installation except the Apple-provided one and started from scratch with my packages. I have also seen a fellow scientist struggle with packages breaking because updates were being installed into different directories…

Therefore, in this post, I'd like to provide a guide to getting up and running with Python in a way that is simple but keeps your installations clear-cut and vendor-independent.
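When packages end up scattered like this, a quick first step is to ask the running interpreter where it lives and where it would import each package from. The sketch below uses only the standard library (`sys`, `sysconfig`, `importlib.util`); it is not the setup guide itself, and the helper name `python_environment_report` and the example package list are hypothetical choices for illustration.

```python
import importlib.util
import sys
import sysconfig


def python_environment_report(module_names):
    """Return the running interpreter's location and where each module would be imported from."""
    report = {
        "executable": sys.executable,                       # which python binary is running
        "prefix": sys.prefix,                               # root directory of this installation
        "site_packages": sysconfig.get_paths()["purelib"],  # default install target for packages
        "modules": {},
    }
    for name in module_names:
        # find_spec returns None when a top-level module cannot be found;
        # otherwise spec.origin is usually the file the module would be loaded from.
        spec = importlib.util.find_spec(name)
        report["modules"][name] = spec.origin if spec is not None else None
    return report


if __name__ == "__main__":
    # "numpy" and "pandas" are only examples; list whichever packages you rely on.
    info = python_environment_report(["json", "numpy", "pandas"])
    for key in ("executable", "prefix", "site_packages"):
        print(f"{key}: {info[key]}")
    for name, origin in info["modules"].items():
        print(f"{name}: {origin or 'not installed'}")
```

If the paths printed for different packages point into different installations (say, one under a homebrew prefix and another under an Anaconda directory), that is a sign the setup has become fragmented.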

6 Thoughts on Agile Results

I’m reading a book called “Getting Results the Agile Way” by J. D. Meier.

Meier worked at Microsoft, where he developed and refined this system. I'm studying it for ways to be more effective in my work as a graduate student and, I hope, to develop a refined, personalized system that carries beyond graduate school into my working life.

The heart of the "Agile" system that Meier proposes is to do a "Monday Vision", to plan for "Daily Outcomes", and to do a "Friday Reflection". Here are some of my reflections on the system.


“Remember Everything”: How I’ve Used Evernote As A Digital Lab Notebook

If you were to ask me, I’d say that Evernote’s more than “just a pretty cool piece of software”. It’s quite literally become the centre of most of my workflows.

In particular, I use Evernote extensively as a documentation system for my research work, effectively using it as a laboratory notebook. I've found Evernote to be a good replacement for a traditional pen-and-paper notebook, and it serves my ultimate goal of making myself more efficient and effective in what I do. Here's how I've used Evernote to get that done.

“Remember Everything”: How I’ve Used Evernote To Take Reading Notes

Evernote has been lauded in many corners as a must-have in a scientist's toolkit. But I'm sure that, like me, there are people out there who don't know where to start with Evernote as a scientist-in-training. I've decided to share some of my thoughts on how one can use Evernote as a scientist, but rather than provide a simple listaragraph (list + paragraph) on how Evernote can be used, I'll go more deeply into the workflows that I've designed around Evernote. In this post, I'll share how I use Evernote to take reading notes on the papers I read.