Vect   GenBank Report Data Extraction (page 3)

Tutorial 3: Extraction and Conversion of Gene Names

We will now extract the gene names from the Arabidopsis file. In Vect, make sure you are in the 'Input Data' panel with the AC006439.txt file open. Notice that each gene name appears on several lines of the Arabidopsis file. We will need to set block conditions so that each gene name is extracted only once rather than once per occurrence. The figure to the right highlights the locations where each gene name appears.

Right-click on the 'CDS' letters and choose New Block Open Condition from the pull-down menu. A green box will appear. Right-click on the 'mRNA' letters and choose New Block Close Condition from the pull-down menu. A red box will appear.

Right-click and drag over the '/gene=' text and choose New Line Selection from the pull-down menu. Select the gene name with a left-click. Now only gene names located inside a 'CDS' block will be chosen, and all others will be excluded. Select 'Move' from the icon panel to move to the 'Convert Data' panel.

Note: If there are no grey highlighted regions in the 'Input Data' panel, then you have not selected any data and nothing can be moved over to the next panel.
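The effect of the block conditions and the line selection can be sketched outside of Vect. The following Python snippet is only an illustration under stated assumptions: it is not the Perl code that Vect generates, the sample text is a made-up excerpt (it is not copied from AC006439.txt), and the function name select_gene_lines is hypothetical. It keeps the text after '/gene=' only while a 'CDS' block is open and stops at the next 'mRNA' line.

```python
import re

# Hypothetical excerpt in a GenBank-like feature layout (not from AC006439.txt).
SAMPLE = """\
     gene            1000..2000
                     /gene="GENE_A"
     CDS             1000..2000
                     /gene="GENE_A"
                     /product="hypothetical protein"
     mRNA            1000..2000
                     /gene="GENE_A"
     gene            3000..4000
                     /gene="GENE_B"
     CDS             3000..4000
                     /gene="GENE_B"
     mRNA            3000..4000
                     /gene="GENE_B"
"""


def select_gene_lines(text):
    """Return the text after '/gene=' for lines inside a CDS..mRNA block."""
    selected = []
    in_block = False
    for line in text.splitlines():
        fields = line.split()
        key = fields[0] if fields else ""
        if key == "CDS":        # block open condition (green box)
            in_block = True
        elif key == "mRNA":     # block close condition (red box)
            in_block = False
        elif in_block:
            match = re.search(r"/gene=(.*)", line)  # line selection
            if match:
                selected.append(match.group(1).strip())
    return selected


print(select_gene_lines(SAMPLE))
```

Each gene name is picked up once, from the 'CDS' block, even though it occurs three times in the sample. The selected values still carry their quotation marks at this stage.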

The next step is to keep only the gene names, without the quotes around them. This is done with the 'Quoted Data Rule.' In the 'Convert Data' panel, select 'Insert' from the icon panel and select the 'Quoted Data Rule.' Give your rule a descriptive name and select the rule you wish to apply it to. Select the grey highlighted boxes labeled 'nothing' and put a quote mark (") in both. Your data should be similar to the figure below.
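As a rough picture of what a quoted-data rule does, the Python sketch below returns the text found between an opening and a closing mark. It is an assumption about the rule's behavior based on the description above, not Vect's implementation, and the function name quoted_data and its parameters are hypothetical.

```python
import re


def quoted_data(value, open_mark='"', close_mark='"'):
    """Return the text between the first open_mark and the next close_mark.

    Returns None when the value does not contain a quoted section.
    """
    pattern = re.escape(open_mark) + r"(.*?)" + re.escape(close_mark)
    match = re.search(pattern, value)
    return match.group(1) if match else None


print(quoted_data('"GENE_A"'))
```

The two marks correspond to the two boxes filled in with a quote mark in the 'Convert Data' panel; passing different marks (for example parentheses) would extract data enclosed by those instead.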

Select Rule 2 (the concatenated sequence), then select the 'Copy' button from the icon panel to move your data to the 'Output Data' panel. In the 'Output Data' panel, users can add any text formatting to the data set and view the changes by selecting the 'Output' icon in the icon panel.

The tag can be moved around but should not otherwise be modified, with one exception: to limit the width of the output lines, add ':width' before the closing bracket (>). This keeps the body from flowing past the specified width. Example: <gene sequence:60>.
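The ':width' suffix can be pictured with a small Python sketch. It assumes that the width counts characters per line and that the body is folded at exactly that width; the page does not say whether Vect folds hard or wraps at word boundaries, so treat this as one possible reading. The function name render and the values dictionary are hypothetical.

```python
import re

# Matches <name> or <name:width>, e.g. <gene sequence:60>.
TAG = re.compile(r"<([^<>:]+)(?::(\d+))?>")


def render(template, values):
    """Replace each tag with values[name], folded at the optional width."""
    def substitute(match):
        name, width = match.group(1), match.group(2)
        body = values[name]
        if width is None or int(width) == 0:
            return body  # no width given: leave the body on one line
        step = int(width)
        return "\n".join(body[i:i + step] for i in range(0, len(body), step))
    return TAG.sub(substitute, template)


print(render(">demo\n<gene sequence:10>", {"gene sequence": "ACGT" * 6}))
```

With a width of 10, the 24-character body is printed as two lines of 10 characters followed by one line of 4, while text outside the tag (here the '>demo' line) is left untouched.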

To show the Perl code, move to the 'Perl Program' panel and select 'Compile.' Your Perl program appears as shown below. To run the generated program, select the 'Run' icon. A new window will appear with the results of your Perl program.

Last modified June 13, 2008. All rights reserved.
