Part 1: Extracting Gene Names & DNA (1)
First we are going to select all the gene names. These are the names that look like this:
/gene="At2g18200"
If you look at the Arabidopsis file, you'll see that gene names are spread throughout the entire document. They appear under mRNA, gene, CDS, and other entries. The good thing, though, is that the gene names are all the same length in this file:
In the vect file, it looks something like the above.
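The same-length observation can also be checked outside the tool. The following is only a minimal Python sketch, not part of the tutorial's software: the sample text and the function name gene_name_lengths are made up for illustration, and it assumes the file uses plain straight double quotes around gene names.

```python
import re

# Hypothetical excerpt for illustration; a real check would read the
# Arabidopsis file from disk instead.
SAMPLE = '''\
     gene            1..2000
                     /gene="At2g18200"
     mRNA            1..2000
                     /gene="At2g18200"
     CDS             100..1900
                     /gene="At2g19323"
'''


def gene_name_lengths(text):
    """Return the set of distinct lengths found among /gene="..." values."""
    names = re.findall(r'/gene="([^"]*)"', text)
    return {len(name) for name in names}


lengths = gene_name_lengths(SAMPLE)
print(lengths)
```

If the resulting set holds a single value, every gene name in the input has that length.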
Right-click and drag over /gene=". Release the right mouse button and a menu will come up. Select “New Block Open Condition”. The characters /gene=" should now be highlighted green.
Now right-click and drag to highlight the " at the end of the gene name and select “New Block Close Condition”. The " should be highlighted red. Everything that is not in a /gene line should be highlighted pink. Pink means that the text is no longer selectable.
Now left-click once on the /gene="At2g19323" line and all the gene names should be highlighted grey. Remember that block definitions do not select text; they only limit text selections to the blocks. It should now look like the following:
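For readers who think in code, the open/close block idea can be approximated in a few lines of Python. This is a sketch of the concept only, not how the tool is implemented: the function select_blocks and the sample text are hypothetical, and whether the tool keeps the delimiters in its selection is not stated here, so this version drops them.

```python
def select_blocks(text, open_cond, close_cond):
    """Collect the text between each open condition and the next close condition."""
    blocks = []
    pos = 0
    while True:
        start = text.find(open_cond, pos)
        if start == -1:
            break  # no further block opens
        start += len(open_cond)
        end = text.find(close_cond, start)
        if end == -1:
            break  # block was opened but never closed
        blocks.append(text[start:end])
        pos = end + len(close_cond)
    return blocks


# Hypothetical excerpt for illustration only.
SAMPLE = '''\
     gene            1..2000
                     /gene="At2g18200"
     mRNA            1..2000
                     /gene="At2g18200"
     CDS             100..1900
                     /gene="At2g19323"
'''

# '/gene="' plays the role of the open condition, '"' the close condition.
raw_gene_names = select_blocks(SAMPLE, '/gene="', '"')
print(raw_gene_names)
```

Text outside the blocks (the pink region in the tool) is never returned, which mirrors the point that block definitions restrict what can be selected.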
Click on the Move button in the tool panel and the selected lines will be copied over to the Convert Data panel. Give the rule a meaningful name, such as “RawGeneNames”.
Since this tutorial has many rules that rely on each other, it's a good idea to use the names we suggest so that the later steps are easier to follow.