Continuing the synthetic biology posts, I have decided to write about the scripts I wrote to automate the primer design stage of my project.
My current workflow involves identifying the construct that I want to build, then ordering the primers, and creating the plasmid files for them.
The problem I have is three-fold:
- At the primer design stage, I sometimes copy the reverse-complement sequence when I actually want the non-complemented sequence.
- Manually entering all the primers that I want to order takes hours, and I can introduce transcription errors.
- Creating the plasmid files and naming them one by one is tedious. Sequences copied and pasted from various ApE files often retain their old annotations; sometimes that is harmless, but for the most part it produces duplicate annotations, which end up being confusing.
To solve these problems, I have written a series of scripts. I have uploaded the scripts and a sample file to GitHub, and I am making them available under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
To run these scripts, you need the following:
- Python 2.7 (later versions are not supported because of Biopython)
- Google Data Python Client
- A Google spreadsheet to hold your primer list, with the following headings: name, sequence, notes, length. (I’m assuming you have a Google account.)
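If you want to try the catalog logic without touching a real spreadsheet, here is a minimal sketch that uses an in-memory CSV with the same four headings as a stand-in. It is not the code from my scripts: the helper names (`load_catalog`, `add_if_missing`), the "P001"-style naming scheme and the sample rows are all hypothetical, and it uses only the standard library of a current Python rather than the Google Data Python Client.

```python
import csv
import io

HEADINGS = ["name", "sequence", "notes", "length"]

def load_catalog(handle):
    """Read catalog rows into a dict keyed by upper-cased sequence."""
    return {row["sequence"].upper(): row for row in csv.DictReader(handle)}

def add_if_missing(catalog, sequence, notes=""):
    """Return the catalog name for a primer, adding a new row if needed."""
    key = sequence.upper()
    if key not in catalog:
        # Hypothetical naming scheme: a prefix plus a running number.
        name = "P%03d" % (len(catalog) + 1)
        catalog[key] = {"name": name, "sequence": key,
                        "notes": notes, "length": str(len(key))}
    return catalog[key]["name"]

# Made-up sample catalog with one primer already in it.
sample = io.StringIO(
    "name,sequence,notes,length\n"
    "P001,ATGGCTAGCAAAGGAGAAG,example fw,19\n"
)
catalog = load_catalog(sample)

print(add_if_missing(catalog, "atggctagcaaaggagaag"))                # P001 (already listed)
print(add_if_missing(catalog, "TTATTTGTAGAGCTCATCC", "example re"))  # P002 (newly added)
```

Keying the catalog on the upper-cased sequence means the same primer typed in a different case is still recognized as a duplicate.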
Script 1: Generate Primers, Enter Into Catalog
The input to the first script (“gibson-ipcr-primer-design-tool (dictreader version).py”) is a CSV file listing all the constructs that I want to build. Each construct is split into the PCR products that will be used in the Gibson assembly reaction, and each has its strategy specified. Currently, I support “Gibson”, “Gibson30” and “iPCR”; later on, I’m hoping to add Type IIs and MoClo. The input file is called “constructs-to-make-shortened2.csv” (as you can probably see, I got lazy and decided not to standardize my file names). The script produces several outputs:
- A list of sorted primers with associated construct, part number, notes about the primer, sequence, length and direction (fw/re). This is “primers-with-notes.csv”.
- Updates to a specified Google Spreadsheet that contains my list of primers. The script checks whether each primer already exists in the spreadsheet, adds the ones that are missing, and outputs a list that includes each primer’s name. This is “primers-with-notes-names.csv”. (This is necessary for the master construction list.)
- A master construction list (“construction-master-list.csv”), with all the primers organized such that the primers are matched with the source template for PCR.
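To give a feel for the primer-generation step, here is a simplified sketch of how a forward/reverse pair for one Gibson part could be built. It is an illustration under assumptions rather than the logic in my script: the function names, the fixed 20-nt annealing and overlap lengths, and the choice to put a neighbor’s overlap on both primers are all hypothetical, and there is no Tm calculation. The point is that the reverse complement is taken in exactly one place, which removes the copy-the-wrong-strand mistake.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Reverse-complement a DNA string (raises KeyError on non-ACGT bases)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def gibson_primers(prev_part, part, next_part, anneal=20, overlap=20):
    """Return (fw_primer, re_primer) for `part`, both written 5'->3'.

    fw = tail of the previous part + head of this part (top strand as given).
    re = reverse complement of (tail of this part + head of the next part).
    """
    prev_part, part, next_part = (s.upper() for s in (prev_part, part, next_part))
    fw_primer = prev_part[-overlap:] + part[:anneal]
    re_primer = reverse_complement(part[-anneal:] + next_part[:overlap])
    return fw_primer, re_primer

# Toy sequences and short lengths so the result is easy to check by eye.
fw, re_ = gibson_primers("AAAACCC", "GGGGTTTTAC", "CATCAT", anneal=4, overlap=3)
print(fw)   # CCCGGGG
print(re_)  # ATGGTAA
```

For a circular plasmid, the "next" part of the last fragment is simply the first fragment again, so the same function covers the junction that closes the circle.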
Script 2: PCR Worksheet Generator
The input to the second script (“pcr-worksheet-generator.py”) is the master construction list. It takes this CSV file and outputs a PCR worksheet containing the “source” (template), “primer1”, “primer2”, “bp” (band size), and “pcr number” (based on construct number and part number).
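As a rough sketch of that transformation: the worksheet headings below are the ones listed above, but the master-list column names (`construct`, `part`, `template`, `fw_primer`, `re_primer`, `product_length`), the "construct.part" numbering format and the sample rows are assumptions made for illustration, not the real layout of “construction-master-list.csv”.

```python
import csv
import io

WORKSHEET_FIELDS = ["pcr number", "source", "primer1", "primer2", "bp"]

def make_worksheet(master_handle, out_handle):
    """Copy one row per PCR from the master list into worksheet layout."""
    writer = csv.DictWriter(out_handle, fieldnames=WORKSHEET_FIELDS,
                            lineterminator="\n")
    writer.writeheader()
    for row in csv.DictReader(master_handle):
        writer.writerow({
            # Hypothetical numbering: construct number, dot, part number.
            "pcr number": "%s.%s" % (row["construct"], row["part"]),
            "source": row["template"],
            "primer1": row["fw_primer"],
            "primer2": row["re_primer"],
            "bp": row["product_length"],
        })

# Made-up master list with two PCR products for one construct.
master = io.StringIO(
    "construct,part,template,fw_primer,re_primer,product_length\n"
    "1,1,pTemplateA,P001,P002,750\n"
    "1,2,pTemplateB,P003,P004,3200\n"
)
worksheet = io.StringIO()
make_worksheet(master, worksheet)
print(worksheet.getvalue())
```

Swapping the `io.StringIO` objects for files opened with `open(...)` would read and write real CSV files instead.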
Script 3: Generate Plasmid Files
The input to the final script (“plasmid-genbank-generator.py”) is also the master construction list. It takes this CSV file and outputs GenBank files of the plasmids, complete with file names (based on the construct description). You can take these GenBank files, open them in ApE, and annotate them using your own Features file. The advantage of this is that the GenBank files are clean and devoid of any legacy annotations.
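To show what "clean" means here, this is a bare-bones sketch that builds an annotation-free GenBank-style record and a file name from a construct description, using only the standard library. Everything in it is an assumption for illustration: the function names, the file-naming rule and the loosely spaced LOCUS line are mine for this example. Strict parsers may expect fixed-width LOCUS columns, so a library such as Biopython is the safer route for real files.

```python
import re

def safe_filename(description):
    """Turn a construct description into a file name (hypothetical rule)."""
    slug = re.sub(r"[^A-Za-z0-9]+", "-", description).strip("-")
    return slug + ".gb"

def genbank_text(name, sequence, circular=True):
    """Build a minimal GenBank-style record with no feature annotations."""
    seq = sequence.lower()
    topology = "circular" if circular else "linear"
    lines = ["LOCUS       %s %d bp DNA %s" % (name, len(seq), topology),
             "ORIGIN"]
    # Sequence block: 60 bases per line in groups of 10, with a 1-based
    # position number right-aligned in front of each line.
    for start in range(0, len(seq), 60):
        chunk = seq[start:start + 60]
        groups = " ".join(chunk[i:i + 10] for i in range(0, len(chunk), 10))
        lines.append("%9d %s" % (start + 1, groups))
    lines.append("//")
    return "\n".join(lines) + "\n"

# Made-up description and toy sequence.
print(safe_filename("Construct 1: backbone + insert"))  # Construct-1-backbone-insert.gb
print(genbank_text("demo", "ACGT" * 20))
```

Because the record is built from the bare sequence alone, nothing from the source ApE files can leak into it.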
All the files are written to the same directory as the script; you can move them wherever you like afterwards. If you modify the scripts, you can specify your own file names and so on.
I hope to modify the code so that I can write into a database and do automated annotations as well, but that’s code for another time. (For now, it’s faster to simply open the GenBank files in ApE, and use Cmd+K to auto-annotate my plasmids.)
Let me know if this works for you!