Vect   DNA to Protein Tutorial

Part 2: Extracting Coordinate Sequences

We need more data from the GenBank report. This time, we need the coordinates of the coding sequences. The coordinates point to locations in the long BAC DNA sequence that we extracted in the previous step. They are given in a format such as “join(2423, 2941)” or “complement(2341, 6577)”. Complement means the coding sequence lies on the opposite strand, so the extracted segment has to be reversed and complemented.
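To make the two coordinate forms concrete, here is a minimal Python sketch of how such a string might be interpreted once it has been extracted. It is not part of Vect; the function names `parse_coordinates` and `reverse_complement` are made up for this illustration, and it assumes that handling a "complement" entry means taking the reverse complement of the DNA segment, as is usual for features on the opposite strand.

```python
import re

# Base pairing used to complement a DNA strand.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def parse_coordinates(text):
    """Return (is_complement, numbers) for one coordinate string.

    Hypothetical helper: it only checks for the word "complement"
    and collects every integer in the order it appears.
    """
    is_complement = text.strip().startswith("complement")
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    return is_complement, numbers


def reverse_complement(dna):
    """Reverse the sequence and swap each base for its partner."""
    return "".join(PAIRS[base] for base in reversed(dna.upper()))
```

With these helpers, a "join" entry yields its numbers unchanged, while a "complement" entry is flagged so that the matching segment can be passed through `reverse_complement` before translation.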

Let's get the coordinates from the GenBank report. Coordinates appear throughout the report, and each one is always followed by a /gene tag.

Red arrows point to coordinates; green arrows point to the /gene tag that always follows a coordinate sequence.

However, we only want the ones under the CDS tag. So, right-click on CDS and select New Block Open Condition. CDS will be highlighted green.

We can't use the closing parenthesis as a Block Close Condition because a line of coordinates can contain several closing parentheses. The length of a coordinate sequence also varies. So the logical choice is to use the /gene tag as the Block Close Condition.

Right-click on the /gene tag and select New Block Close Condition.
/gene should be highlighted red now. But we're not done! The coordinate sequences differ in length and end in different places, so if you try to select them now, you'll also select things you don't want, such as the entire /gene="at2g18300" tag.

So, right-click on the green highlighted CDS tag and select "Selection Exclusive". Do the same thing for the /gene tag.

Now, left-click on the rows of coordinate sequences and you should find that they are beautifully selected in grey :-) (Two clicks should be enough: the first on the first row and the second on the second row.) You need to use /gene as an end concatenation marker, so drag-select "/gene" as the end marker for each coordinate set as well.

Click on Move to copy the data to the Convert Data panel. Name the rule something meaningful, such as “Raw Coordinates”.
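For readers who want to see the logic of this rule outside the graphical interface, the following Python sketch imitates it under some assumptions. It is not how Vect works internally. The function name `extract_raw_coordinates` and the sample report are made up for illustration, and the sample writes ranges in the start..end notation that GenBank feature tables use. A block opens on a line starting with CDS, closes on the next line starting with /gene, both markers are left out (the "Selection Exclusive" step), and the rows in between are joined into one string.

```python
# Made-up sample in the layout of a GenBank feature table.
SAMPLE_REPORT = """\
     gene            complement(100..900)
                     /gene="geneA"
     CDS             join(2423..2941,3020..3100,
                     3200..3300)
                     /gene="geneA"
     CDS             complement(2341..6577)
                     /gene="geneB"
"""


def extract_raw_coordinates(report):
    """Collect the text between each CDS tag and the next /gene tag."""
    blocks = []
    current = None  # rows of the block being collected, or None
    for line in report.splitlines():
        stripped = line.strip()
        if current is None:
            # Block open condition: the CDS tag (excluded from the data).
            if stripped.startswith("CDS"):
                current = [stripped[len("CDS"):].strip()]
        elif stripped.startswith("/gene"):
            # Block close condition: /gene ends the set (also excluded).
            blocks.append("".join(current))
            current = None
        else:
            # A continuation row of a long coordinate sequence.
            current.append(stripped)
    return blocks
```

Coordinates that sit under a gene tag rather than a CDS tag are skipped, because no block is open when they are read.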

 

 

Last modified June 13, 2008. All rights reserved.
