Part 4: Using Coordinates to Extract DNA
Now that we have the coordinates and the DNA sequence, let's use the coordinates to extract the DNA sequences!
Insert a new rule called “To extract substrings from other rules”. Name it something meaningful like “Gene Fragments”. Change the rule to say the following:
Multi-part substrings extracted from Source DNA going from coordinates-1 in Start Coordinates to coordinates-1 in Stop Coordinates.
This rule extracts the DNA sequences into multiple strings based on the coordinates. Because of the way Vect works, an extracted substring does not come out as one entire string: a sequence that spans several lines is returned as several pieces. We have to concatenate the pieces we just extracted so that each multi-line sequence is merged into one string. This makes the data easier to manipulate and lets us treat a single sequence as one sequence instead of a bunch of fragments.
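To make the idea concrete outside of Vect, here is a minimal Python sketch of what a multi-part extraction could look like. It is only an illustration, not how Vect works internally. It assumes that the coordinates are 1-based and inclusive (which is one reading of the "coordinates-1" adjustment) and that the source DNA is stored as a list of lines. The function name extract_fragments and the sample data are made up for this example.

```python
def extract_fragments(lines, start, stop):
    """Return the pieces of `lines` that cover positions start..stop.

    Assumption: start and stop are 1-based and inclusive, counted over
    the whole sequence as if the lines were joined together.
    """
    begin = start - 1   # convert the 1-based start to a 0-based index
    end = stop          # 1-based inclusive stop == 0-based exclusive end
    fragments = []
    offset = 0          # 0-based position of the first base on this line
    for line in lines:
        lo = max(begin - offset, 0)
        hi = min(end - offset, len(line))
        if lo < hi:     # this line overlaps the requested range
            fragments.append(line[lo:hi])
        offset += len(line)
    return fragments


# Hypothetical multi-line source sequence
source_dna = ["ATGCCGTA", "GGCTTAAC", "CGATTGCA"]
fragments = extract_fragments(source_dna, 5, 18)
print(fragments)  # ['CGTA', 'GGCTTAAC', 'CG']
```

One gene comes back as three fragments, one per line it touches, which is why a concatenation step is still needed.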
Insert a new concatenation rule and name it something like “Extracted Genes”. Change it to look like the following:
Concatenated data from Gene Fragments up to level 1 with nothing in between items.
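As a rough Python analogy for this concatenation rule (an illustration only, not Vect's implementation), assume Gene Fragments is a nested list with one inner list of fragments per gene. Joining each inner list with an empty separator corresponds to "nothing in between items" and leaves one string per gene. The variable names and sample fragments are hypothetical.

```python
# Hypothetical output of the "Gene Fragments" rule:
# one inner list of fragments per gene.
gene_fragments = [
    ["CGTA", "GGCTTAAC", "CG"],
    ["TTA", "ACCG"],
]

# Join the fragments of each gene with nothing in between them,
# so every gene ends up as a single string.
extracted_genes = ["".join(parts) for parts in gene_fragments]
print(extracted_genes)  # ['CGTAGGCTTAACCG', 'TTAACCG']
```

The outer list is kept, so the genes stay separate from one another while the pieces inside each gene are merged.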