Vect DNA to Protein Tutorial

Final Part: Converting DNA to Protein

Now that the gene sequences are in upper case, we can turn them into protein sequences! :-D Insert a new rule “To Translate data that may also change length.” Name it “Quoted Proteins” and change it to the following:

Generic DNA to protein translation from Upper Case Genes.

The rule is named “Quoted” because the string it returns has a * after every protein sequence to mark its end.
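For readers who want to see what such a translation does outside of Vect, here is a minimal Python sketch. It is not Vect's implementation. It assumes the standard genetic code, assumes translation starts at the first base and stops at the first stop codon, and uses a hypothetical function name, translate. Unrecognised codons are shown as X, which is also an assumption.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G
# (third base changes fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an upper-case DNA string, ending with * at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X for unknown codons (assumption)
        protein.append(aa)
        if aa == "*":
            break  # the * marks the end of the protein
    return "".join(protein)


print(translate("ATGGCCTAA"))
```

The trailing * in the result is the same end marker that the “Quoted Proteins” rule leaves on each sequence.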



 

Our final step will be to get rid of the *. Insert a new rule “Extract quoted data from other rules.” Name it “Proteins” and change the rule to look like the following:

Multiple quoted data from Quoted Proteins on the
left with nothing and on the right with *.

So we finally got the protein sequences we want! :-)
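The idea behind this rule (keep whatever sits to the left of each *) can be sketched in Python as follows. This only illustrates the behaviour described above and is not how Vect does it internally. The function name extract_proteins is hypothetical, and the sketch assumes the quoted proteins arrive as one string with a * after each sequence.

```python
import re


def extract_proteins(quoted):
    """Return every run of characters that is followed by a *, without the *."""
    # Nothing is required on the left; the * on the right ends each match.
    return re.findall(r"([^*]*)\*", quoted)


print(extract_proteins("MA*MKV*"))
```

Text that is not followed by a * is ignored, which matches the idea of extracting only data that is "quoted" on the right.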




For the output, select the rule "Gene Names" and copy it over to the output panel. Then select the last rule, "Proteins", and copy it over to the output panel. To word wrap, set the Proteins tag to be <Proteins:60>.

Your template should look like the following:

><Gene Names>
<Proteins:60>

The :60 means the protein sequence is word wrapped every 60 characters. Notice the > symbol before the gene name: it starts the header line of each record, as in FASTA format.
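To make the effect of the template concrete, here is a small Python sketch that builds one record the same way: a > header line with the gene name, then the protein broken into lines of 60 characters. The function name format_record and the sample gene name are made up for illustration, and the sketch assumes the wrap is a plain cut every 60 characters.

```python
def format_record(gene_name, protein, width=60):
    """Build a header line plus the protein wrapped at `width` characters."""
    lines = [">" + gene_name]
    for i in range(0, len(protein), width):
        lines.append(protein[i:i + width])
    return "\n".join(lines)


# Hypothetical gene name and a dummy 130-residue protein.
print(format_record("geneA", "M" * 130))
```

With a 130-character protein this prints the header followed by lines of 60, 60 and 10 characters.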

Your output should look like the following:



Here is a summary of what we did:

1) Extract the gene names from the GenBank report.
2) Extract the entire DNA sequence from the GenBank report.
3) Extract the coordinates of the protein-coding sequences from the GenBank report.
4) Get the start and end coordinates.
5) Use the coordinates to extract the DNA sequences that are to be translated into protein.
6) Reverse the DNA sequences that need to be reversed.
7) Merge the reversed and non-reversed DNA sequences.
8) Turn the combined DNA sequences into upper case letters.
9) Translate the DNA sequences into protein sequences.
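Steps 5 to 8 above can be sketched in a few lines of Python. This is only an illustration, not a replacement for the Vect rules. It assumes the coordinates are 1-based and inclusive, as in GenBank reports, and it assumes that "reversed" means taking the reverse complement of genes on the complement strand. The function name extract_gene and its on_complement parameter are hypothetical.

```python
# Maps each base to its complement, for lower- and upper-case input.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def extract_gene(genome, start, end, on_complement=False):
    """Cut out positions start..end (1-based, inclusive) and return them in upper case."""
    segment = genome[start - 1:end]
    if on_complement:
        # Assumption: genes on the complement strand need the reverse complement.
        segment = segment.translate(COMPLEMENT)[::-1]
    return segment.upper()


print(extract_gene("aaatggcctaattt", 3, 11))
print(extract_gene("ttaggccat", 1, 9, on_complement=True))
```

Both calls give the same upper-case gene sequence, which is then ready for translation in step 9.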

 

This tutorial was written by Jia Zhen Lee
October 2004

 


Last modified June 13, 2008. All rights reserved.
