Final Part: Converting DNA to Protein
Now that the gene sequences are in upper case, we can turn them into protein sequences! :-D Insert a new rule “To Translate data that may also change length.” Name it “Quoted Proteins” and change it to the following:
Generic DNA to protein translation from Upper Case Genes.
The rule is named "Quoted" because the string it returns has a * after every protein sequence to mark where that sequence ends.
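For readers who want to see what the translation rule does outside the tool, here is a minimal Python sketch. It assumes the standard genetic code and reads the sequence three bases at a time, writing * for a stop codon. The names `CODON_TABLE` and `translate` are made up for this illustration and are not part of the tool, and the tool's own rule may treat stop codons or unknown bases differently.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate upper-case DNA codon by codon; '*' marks a stop codon."""
    protein = []
    # Step through complete codons only; a trailing 1-2 bases are ignored.
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        # Codons with unexpected letters (e.g. N) become X (an assumption).
        protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


print(translate("ATGGCCTAA"))  # MA*
```

The lookup only knows upper-case codons, which is why the genes had to be converted to upper case first.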
Our final step will be to get rid of the *. Insert a new rule “Extract quoted data from other rules.” Name it “Proteins” and change the rule to look like the following:
Multiple quoted data from Quoted Proteins on the left with nothing and on the right with " * ".
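The same idea can be sketched in Python: keep everything from the start of each quoted protein (nothing on the left) up to the first * (the delimiter on the right). The function name `strip_terminator` is hypothetical, and the sketch assumes the delimiter is a bare asterisk with no surrounding spaces.

```python
def strip_terminator(quoted_proteins, right="*"):
    """Return each protein up to, but not including, the right-hand delimiter."""
    proteins = []
    for quoted in quoted_proteins:
        end = quoted.find(right)
        # If no delimiter is present, keep the sequence unchanged.
        proteins.append(quoted if end == -1 else quoted[:end])
    return proteins


print(strip_terminator(["MA*", "MKV*"]))  # ['MA', 'MKV']
```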
So we finally got the protein sequences we want! :-)
For the output, select the rule "Gene Names" and copy it over to the output panel.
Select the last rule, "Proteins", and copy it over to the output panel. To word wrap, change the Proteins tag to <Proteins:60>.
Your template should look like the following:
><Gene Names>
<Proteins:60>
The :60 means to word wrap every 60 characters. Notice the > symbol before the gene name.
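To make the template concrete, here is a rough Python equivalent of what it produces: a FASTA-style record with a > header line followed by the protein wrapped at 60 characters. The function name `format_fasta` and the sample gene name are invented for this sketch; the tool itself does this through the output panel.

```python
def format_fasta(gene_names, proteins, width=60):
    """Build '>name' header lines followed by the sequence wrapped at `width`."""
    lines = []
    for name, protein in zip(gene_names, proteins):
        lines.append(">" + name)
        # Cut the sequence into chunks of at most `width` characters.
        for i in range(0, len(protein), width):
            lines.append(protein[i:i + width])
    return "\n".join(lines)


# A made-up 70-residue protein wraps onto a 60-character and a 10-character line.
print(format_fasta(["geneA"], ["M" * 70]))
```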
Your output should look like the following:
Here is a summary of what we did:
1) Extract the gene names from the GenBank report.
2) Extract the entire DNA sequence from the GenBank report.
3) Extract the coordinates of the protein-coding sequences from the GenBank report.
4) Get the start and end coordinates.
5) Use the coordinates to extract the DNA sequences that will be translated into protein.
6) Reverse the DNA sequences that need to be reversed.
7) Merge the reversed and non-reversed DNA sequences.
8) Turn the combined DNA sequences into upper case letters.
9) Translate the DNA sequences into protein sequences.
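Steps 4 to 8 above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's implementation: it assumes coordinates are 1-based and inclusive, as in GenBank feature locations, and that "reversing" means taking the reverse complement of genes on the complementary strand. The function name `gene_dna` and the toy sequence are hypothetical.

```python
# Maps each base to its complement, preserving case.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def gene_dna(genome, start, end, on_complement_strand):
    """Slice a gene by 1-based inclusive coordinates, reverse it if needed, upper-case it."""
    segment = genome[start - 1:end]
    if on_complement_strand:
        # Assumption: "reverse" means reverse complement.
        segment = segment.translate(COMPLEMENT)[::-1]
    return segment.upper()


genome = "ccatggcctaagg"  # toy sequence, not from a real GenBank report
print(gene_dna(genome, 3, 11, False))  # ATGGCCTAA
```

The upper-case result is what the translation rule in step 9 takes as input.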
This tutorial written by Jia Zhen Lee, October 2004.
Modified by Ye Lin, December 2005.