Part 1: Extracting Gene Names & DNA (2)
Now that you are in the Convert Data panel, click the Insert button and select the rule “Extract quoted data from other rules” from the list. Click the first ???? and select the rule “RawGeneNames”. Then click the second and the last ???? in turn and type a double quote (") into each. This tells Vect to extract anything contained within double quotes. The result should be all the gene names without the surrounding /gene="" wrapper, as in the accompanying screenshot.
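For readers who want to see what this rule does outside of Vect, here is a minimal Python sketch of the same idea: pull out whatever sits between a pair of double quotes. It is not how Vect works internally. The function name extract_quoted and the sample gene names are made up for illustration.

```python
import re

# Matches one pair of double quotes and captures the text between them.
QUOTED = re.compile(r'"([^"]*)"')

def extract_quoted(lines):
    """Return the text found between each pair of double quotes."""
    names = []
    for line in lines:
        names.extend(QUOTED.findall(line))
    return names

# Hypothetical sample of what a rule like RawGeneNames might hold.
raw_gene_names = ['/gene="GENE_A"', '                     /gene="GENE_B"']
print(extract_quoted(raw_gene_names))  # prints ['GENE_A', 'GENE_B']
```

Lines that contain no quoted text simply contribute nothing to the result.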
Return to the Input Data panel. Now that we have the gene names, we need the DNA sequences. First, extract the large BAC DNA sequence found at the end of the GenBank report in the Arabidopsis file, as pictured below:
Right-click the word ORIGIN and select “New Block Open Condition”. Everything above ORIGIN is now highlighted in pink. Now scroll down to the very end of the GenBank report, where there should be a // marker. Right-click the // and select “New Block Close Condition”.
Now left-click, one by one, each column containing DNA sequence, and also the end // marker, which will be used to mark the end of concatenation on the next page. They should all be highlighted grey, as in the following screenshot. Make sure that only the DNA sequences and the end // marker are selected; if anything else is selected, you goofed up somewhere!
Click on Move and name the rule something meaningful, such as “Raw DNA sequence”.
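The block conditions and column selection above can be pictured with a short Python sketch. It is only an illustration of the idea, not Vect's actual mechanism: start collecting at ORIGIN, stop at //, and keep the sequence columns while dropping the leading position number. The function name extract_origin_sequence and the sample report are hypothetical.

```python
def extract_origin_sequence(report_text):
    """Concatenate the sequence columns between ORIGIN and // in a GenBank report."""
    in_block = False
    pieces = []
    for line in report_text.splitlines():
        stripped = line.strip()
        if not in_block:
            # Block open condition: the ORIGIN line starts the sequence section.
            if stripped.startswith("ORIGIN"):
                in_block = True
            continue
        # Block close condition: the // marker ends the sequence section.
        if stripped == "//":
            break
        # Each sequence line is a position number followed by columns of bases;
        # keep the columns and drop the number.
        columns = stripped.split()
        pieces.extend(columns[1:])
    return "".join(pieces)

# Hypothetical, heavily shortened report used only for illustration.
sample_report = """\
LOCUS       EXAMPLE
ORIGIN
        1 gatcctccat atacaacggt
       21 atctccacct
//
"""
print(extract_origin_sequence(sample_report))  # prints gatcctccatatacaacggtatctccacct
```

In this sketch the // marker only tells the loop where to stop, which is the same role it plays as the end-of-concatenation marker in the tutorial.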