Thursday, May 14, 2009

Protein synthesis functions

The chromosome module mentioned previously is part of my JCCC evolution build on the agents of evolution. Mutations are any heritable change in the DNA, that is, changes in the DNA that are passed on to future generations of cells. Chromosomal mutations are large-scale rearrangements involving many nucleotide bases. But a discussion of mutations also has to include point mutations: changes at the level of a single nucleotide base.

Discussing these means dealing with protein synthesis first. So for the last couple of days I have been scripting core functions to carry out the basic steps in protein synthesis, namely transcription and translation. Max Chatnoir over at Genome Island has a nice little collaborative game related to protein synthesis, but since my focus is on evolution at JCCC, I've decided to build my module around a series of functions that start with a strand of DNA, transcribe that DNA to get a messenger RNA, and then translate that mRNA.

Doing this involves a series of string manipulations and here are some functions I've written specifically to manipulate DNA and RNA represented as a sequence of letters:

string stringclean(string toclean, string allowed);

This function converts upper-case letters in the string toclean to lower case, then strips out blanks and any characters that are not allowed. Permissible characters are listed in the string allowed. For instance, DNA nucleotide bases are represented as a, t, g or c, so for DNA the string allowed is "atgc". If toclean represented RNA, allowed would be "augc". Just to be safe, the function also trims any leading and trailing spaces. The purpose of this function is to catch elementary mistakes and strip out extra characters from genetic information copied and pasted from GenBank or FASTA-formatted data.
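The functions in this post are LSL scripts running in Second Life. Purely as an illustration of the cleaning step, here is a minimal Python sketch of stringclean. It is not the author's script, and it assumes the intended behavior is to lower-case the input first and then keep only the characters listed in allowed.

```python
def stringclean(toclean: str, allowed: str) -> str:
    """Lower-case the input, then keep only characters listed in `allowed`.

    Blanks, digits, line breaks and anything else not in `allowed` are
    dropped, which also removes leading and trailing spaces.
    """
    lowered = toclean.strip().lower()
    return "".join(ch for ch in lowered if ch in allowed)


# A pasted GenBank-style fragment with line numbers and spacing.
print(stringclean("    1 ATGCCG tta\n   61 GGC", "atgc"))  # atgccgttaggc
```

Because every letter in allowed survives, a FASTA header line (the one beginning with ">") should be removed before cleaning; otherwise any a, t, g or c in the sequence name would be kept.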

string compdna(string dna);

This function takes a DNA strand and outputs the complementary strand. This is useful because databases often give the so-called sense strand, which has the same sequence as the mRNA except that "t" appears instead of "u". To illustrate transcription you need to start with the complement of the sense strand (the template strand), as happens in the cell. Hence the need for a function to generate the correct DNA strand.
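A minimal Python sketch of compdna, assuming the input has already been cleaned to lower-case a, t, g and c. It complements base by base and does not reverse the result, so the output lines up position by position with the input; the post does not say whether the original reverses it. The name DNA_PAIRS is illustrative.

```python
# Watson-Crick pairing for DNA: a pairs with t, g pairs with c.
DNA_PAIRS = {"a": "t", "t": "a", "g": "c", "c": "g"}


def compdna(dna: str) -> str:
    """Return the base-by-base complement of a cleaned, lower-case DNA strand.

    Raises KeyError on any character other than a, t, g or c.
    """
    return "".join(DNA_PAIRS[base] for base in dna)


print(compdna("atgccgtta"))  # tacggcaat
```

If a reverse complement written 5' to 3' is wanted instead, reversing the result (compdna(dna)[::-1]) gives it.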

string transcription(string dna);

This function takes whatever DNA strand it is given and mechanically carries out transcription. It doesn't recognize any sort of promoter region such as the -35 region or the Pribnow box. If you don't know what those are... well, don't worry.
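Transcription of a template strand can be sketched the same way: each template base is paired with its RNA complement, with u taking the place of t. This Python version, like the function described, ignores promoters entirely. It is an illustration rather than the author's code, and the dictionary name TEMPLATE_TO_RNA is hypothetical.

```python
# RNA polymerase pairs each template DNA base with an RNA base;
# uracil (u) replaces thymine, so a template "a" pairs with "u".
TEMPLATE_TO_RNA = {"a": "u", "t": "a", "g": "c", "c": "g"}


def transcription(dna: str) -> str:
    """Transcribe a template DNA strand base by base (no promoter search)."""
    return "".join(TEMPLATE_TO_RNA[base] for base in dna)


print(transcription("tacggcaat"))  # augccguua
```

For example, transcribing the template tacggcaat gives augccguua, which is the sense strand atgccgtta with u in place of t.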

string transcodon(string codon);

This function takes an mRNA codon and translates it into the corresponding amino acid using the standard genetic code table, which is used by most organisms. This function is needed for the translation function:
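One compact way to hold the standard genetic code in a script is a 64-character string of one-letter amino acids, indexed by codon in u, c, a, g order. The Python sketch below builds such a table. The names AMINO_ACIDS and CODON_TABLE are hypothetical, and writing stop codons as "*" is an assumption, since the post does not say how stops are represented.

```python
from itertools import product

BASES = "ucag"
# One-letter amino acids for all 64 codons, first base varying slowest and
# third base fastest (uuu, uuc, uua, uug, ucu, ...). "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcodon(codon: str) -> str:
    """Translate one mRNA codon with the standard genetic code ('*' = stop)."""
    return CODON_TABLE[codon.lower()]


print(transcodon("aug"), transcodon("uaa"), transcodon("UGG"))  # M * W
```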

string translation(string mrna);

This function takes the mRNA string and, using the function transcodon, translates the mRNA into the polypeptide that would be produced at the ribosome in the cell. The function does not recognize the Shine-Dalgarno sequence; for simplicity, mRNAs start with the start codon, 'aug'. The function terminates the polypeptide when it reaches a stop codon. Polypeptides are represented by the now-standard one-letter abbreviations commonly used in protein databases. This lets students compare the effects of frameshift mutations, caused by insertions or deletions, with the effects of substitutions on the resulting polypeptide.
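The translation step can be sketched in Python as follows; the codon table is repeated so the block runs on its own. It is an illustration, not the author's LSL code, and one assumption differs slightly from the description: rather than requiring the mRNA to begin with 'aug', the sketch scans for the first 'aug', which behaves identically when the mRNA does start there. The stop codon is not added to the polypeptide, and an incomplete codon at the end is ignored.

```python
from itertools import product

BASES = "ucag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcodon(codon: str) -> str:
    """Standard genetic code lookup; '*' marks a stop codon."""
    return CODON_TABLE[codon]


def translation(mrna: str) -> str:
    """Translate from the first 'aug' until a stop codon or the end of the mRNA."""
    start = mrna.find("aug")
    if start == -1:
        return ""  # no start codon, so no polypeptide
    protein = []
    # Step through complete codons only; a trailing 1 or 2 bases is ignored.
    for i in range(start, len(mrna) - 2, 3):
        aa = transcodon(mrna[i:i + 3])
        if aa == "*":
            break  # stop codon: the polypeptide is released
        protein.append(aa)
    return "".join(protein)


print(translation("augccguuaggcuaaccg"))  # MPLG
print(translation("augucguuaggcuaaccg"))  # MSLG (substitution at base 4)
print(translation("augcguuaggcuaaccg"))   # MR   (deletion of base 4)
```

The last two lines show the comparison the post describes: substituting u for the fourth base (a c) changes a single amino acid, while deleting that same base shifts the reading frame so that it runs into the stop codon uag almost immediately.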

These functions work for small genes on the order of 250 nucleotide bases. One frustrating thing is the limited ability of SL to write data to files; for now the easiest way for the user to save the output is to e-mail it to themselves. That will be built into the module as an option. Otherwise users would have to cut and paste from the chat window.

Users will have the option of using data they obtain from another source, by configuring a note card with raw data copied from, say, NCBI, or of using a small gene data set preloaded onto a note card.

Three other core functions are being developed:

string makesubstitution(string dna) makes a random base substitution in an original DNA strand.

string makeinsertion(string dna) makes a random insertion.

string makedeletion(string dna) makes a random deletion.
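The three mutation functions might look like this in Python. This is a hypothetical sketch, not the author's LSL code: it assumes a cleaned, lower-case, non-empty DNA string and a uniform random choice of position and base. The optional rng argument is an addition of this sketch, so that a seeded random.Random can make runs repeatable for a class.

```python
import random

DNA_BASES = "atgc"


def makesubstitution(dna: str, rng=random) -> str:
    """Replace one randomly chosen base with a different base."""
    i = rng.randrange(len(dna))
    new_base = rng.choice([b for b in DNA_BASES if b != dna[i]])
    return dna[:i] + new_base + dna[i + 1:]


def makeinsertion(dna: str, rng=random) -> str:
    """Insert one random base at a random position, including either end."""
    i = rng.randrange(len(dna) + 1)
    return dna[:i] + rng.choice(DNA_BASES) + dna[i:]


def makedeletion(dna: str, rng=random) -> str:
    """Delete the base at one randomly chosen position."""
    i = rng.randrange(len(dna))
    return dna[:i] + dna[i + 1:]


rng = random.Random(2009)  # seeded so the same "random" mutations recur
original = "atgccgttaggctaa"
print(makesubstitution(original, rng))
print(makeinsertion(original, rng))
print(makedeletion(original, rng))
```

A substitution is forced to change the base, so it never returns the original strand, whereas an insertion or deletion always changes the length by one and, inside a coding region, usually shifts the reading frame.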

But these will be easy to write. The activities are being designed around a pencil-and-paper exercise I currently use in my classes, only now students will be able to use more realistic data and quickly investigate a number of different mutations.


Peter Miller said...

In pedant mode (and as you probably know), "Dalgarno" and "GenBank". Otherwise, looking good. I'm hoping to do stuff in this area over the Summer but think that some of the code will have to be server-based. Even then, I'm curious as to how the (little) server will cope with the load. The problem, of course, is that you only find out in class (unless you're running asynchronously -- probably a good idea). May have to try some crowd-sourcing...

Simone said...


Right now I am manually importing the data as a string from NCBI and cleaning them up in SL, hence the limitations. And I am looking for ways to get around those.

SL is getting better at handling crowds, but yes I will expect my students to work in SL individually or as collaborative pairs.