Tutorial 3: Extraction and Conversion of Gene Names
We will now extract the gene names from the Arabidopsis file. In Vect, make
sure you are in the 'Input Data' panel with the AC006439.txt file open.
Notice that each gene name appears on several lines of the Arabidopsis
file. We will need to create several block conditions so that only one of
the identical gene names is extracted. The figure to the right highlights
the various locations of each gene name.
Right-click on the 'CDS' letters and choose New Block Open Condition from the
pull-down menu. A green box will appear. Right-click on the 'mRNA' letters and
choose New Block Close Condition from the pull-down menu. A red box will appear.
Right-click and drag over the '/gene=' block and choose New Line Selection from
the pull-down menu. Select the gene name with a left-click. Now only gene names
located in the 'CDS' block will be chosen and all others will be excluded.
Select 'Move' from the icon panel to move to the 'Convert Data' panel.
Note: If you do not have any grey highlighted regions in the 'Input Data'
panel then you have not selected any data and no data can be moved over
to the next panel.
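The effect of the block open condition, block close condition, and line selection can be sketched outside of Vect. The Python below is only an illustration, not the Perl that Vect generates: the function name `extract_gene_lines`, the rule of matching a marker against the first word of a line, and the sample records (including the gene names) are all assumptions made for this example.

```python
def extract_gene_lines(lines, open_marker="CDS", close_marker="mRNA", select="/gene="):
    """Collect lines containing `select` that fall inside an open/close block.

    Assumption: a marker matches when it is the first whitespace-separated
    word on a line. Vect's real matching rule may differ.
    """
    inside = False
    selected = []
    for line in lines:
        fields = line.split()
        key = fields[0] if fields else ""
        if key == open_marker:
            inside = True      # block open condition (green box)
        elif key == close_marker:
            inside = False     # block close condition (red box)
        if inside and select in line:
            selected.append(line.strip())
    return selected


# Invented sample in the style of a GenBank feature table (not real AC006439 data).
sample = [
    "     mRNA            join(100..200,300..400)",
    '                     /gene="F1O11.1"',
    "     CDS             join(120..200,300..380)",
    '                     /gene="F1O11.1"',
    '                     /note="hypothetical protein"',
    "     mRNA            500..900",
    '                     /gene="F1O11.2"',
    "     CDS             520..880",
    '                     /gene="F1O11.2"',
]

gene_lines = extract_gene_lines(sample)
print(gene_lines)
```

Each gene name appears under both 'mRNA' and 'CDS' in the sample, but only the copy inside a 'CDS' block is kept, which is why the duplicates disappear.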
The gene names must now be extracted without the quotes around them. This is
done with the 'Quoted Data Rule.' In the 'Convert Data' panel, select
'Insert' from the icon panel and select the 'Quoted Data Rule.' Give
your rule a descriptive name and select the rule you wish to apply it
to. Select the grey highlighted boxes labeled 'nothing' and put a quote
(") mark in both. Your data should be similar to the figure below.
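As a rough picture of what a quoted-data rule does, the following Python sketch returns the text between an opening and a closing delimiter. It is not Vect's implementation: the function name `quoted_data`, its behavior when a delimiter is missing, and the sample gene names are assumptions made for illustration.

```python
def quoted_data(text, open_quote='"', close_quote='"'):
    """Return the text between the first open_quote and the next close_quote.

    Returns None when either delimiter is missing (an assumption; Vect may
    handle this case differently).
    """
    start = text.find(open_quote)
    if start == -1:
        return None
    start += len(open_quote)
    end = text.find(close_quote, start)
    if end == -1:
        return None
    return text[start:end]


# Invented selections, as they might look after the 'Input Data' step.
selected = ['/gene="F1O11.1"', '/gene="F1O11.2"']
names = [quoted_data(s) for s in selected]
print(names)
```

Putting a quote mark in both delimiter boxes corresponds to the two default arguments here: the same character opens and closes the quoted region.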

Select Rule 2 (the concatenated sequence), then select
the 'Copy' button from the icon panel to move your data to the 'Output
Data' panel. In the 'Output Data' panel,
users can add any text formatting to the data set and view the changes
by selecting the 'Output' icon in the icon panel.
The tag should
not be modified, but it can be moved around. If users wish
to limit each output line to a set width, the tag may be
edited by including a ':width' before the closing bracket (>). This
restricts the body from flowing past the specified width. Example: <gene
sequence:60>.
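The ':width' suffix can be pictured as splitting the body into lines of at most that many characters. The Python sketch below is a guess at the behavior, not Vect's code: the helper names `parse_tag` and `wrap_body`, the assumption that a short final line is left as-is, and the sample body are all hypothetical.

```python
def parse_tag(tag):
    """Split a tag such as '<gene sequence:60>' into (name, width).

    Width is None when the tag carries no ':width' suffix.
    """
    inner = tag.strip()[1:-1]                 # drop the enclosing < and >
    name, sep, width = inner.rpartition(":")
    if sep and width.isdigit():
        return name, int(width)
    return inner, None


def wrap_body(body, width):
    """Break body into lines no longer than width characters."""
    if width is None:
        return body
    return "\n".join(body[i:i + width] for i in range(0, len(body), width))


name, width = parse_tag("<gene sequence:60>")
body = "ACGT" * 40                            # invented 160-character body
wrapped = wrap_body(body, width)
print(name, width)
print(wrapped)
```

With a width of 60, the 160-character body is written as two full lines and one shorter final line.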
To show the Perl code, move to the 'Perl Program' panel and select 'Compile.'
Your Perl program appears as shown below. To run the generated program,
select the 'Run' icon. A new window will appear with the results of
your Perl program.
