
In recent decades, biomedical researchers have faced the new and daunting task of interpreting the tremendous amounts of information generated in areas such as genomics, microarrays, and proteomics. Specialized data extraction and conversion programs are needed to interpret such data and keep biomedical research advancing rapidly.
Vect is a visual extraction and conversion program that gives biomedical researchers with minimal programming background a way to generate data extraction and conversion programs. Its graphical user interface lets users extract and arrange data with common point-and-click methods in whatever way suits their purposes, and then provides a computer program that produces the same results on other documents with the same formatting. Vect can be used with virtually any text format generated by computational technologies, making it a powerful tool for biomedical scientists.
* Please note that this version of Vect is still under development and subject to change. More features will be added in the near future.
Introduction to Tutorial
This tutorial is designed especially for biomedical researchers and gives a basic understanding of how Vect is used for data extraction and conversion. Upon completing the tutorial, users should be able to perform the following tasks:
- Load a target data file
- Select data to be used in the output
- Apply rules to the selected data set
- Arrange the data in a desired format for output
- Convert the final format to programming code
An Overview of Vect Functionality
Vect (Visual Extraction Conversion Tool) is a program that generates Perl code to extract specific data from lengthy files and reports and to arrange that data according to user preferences. Code generation proceeds in phases arranged as a pipeline: the output of each phase becomes the input of the next. The graphical user interface of Vect lets the user step through the phases of loading data files and defining rules to extract specific data, and Vect in turn generates Perl code that can be run on other files with the same format. Although Vect can work on files of any format, semi-structured files (those with a predefined outline) help the user create more meaningful and intuitive rules and results.
The following four phases are involved in the generation of Perl code using Vect:
Loading input data: During this phase, one or more input data files are loaded into Vect. These will usually be text/data files.
Creating rules for extraction: During this phase, users apply various rules to the input file to extract specific information. There are two types of extraction rules:
1) Data-dependent. These rules are defined from the input data and provide the easiest method of data extraction.
2) Rule-dependent. These rules provide the most powerful means of manipulating, formatting, filtering, converting and composing various data sets. Future versions of Vect will add new rule-dependent rules that extend its abilities.
Formatting the output data: During this phase, the rules are arranged visually to produce an organized and legible output file format.
Auto generation phase: In this final phase, Vect generates Perl code based on the previous phases, producing a program that can be run on other files that have the same format as the original data file. Users can modify the auto-generated Perl code to further customize the output data and its format.
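To make the four phases more concrete, the sketch below walks a small semi-structured report through loading, extraction, rule application and output formatting. It is only an illustration written in Python: Vect itself generates Perl, and this is not code produced by Vect. The sample report, the field labels ("Gene", "Score"), the function names and the 0.5 score threshold are all assumptions invented for this example.

```python
import re

# Hypothetical semi-structured report; labels and values are invented.
SAMPLE_REPORT = """\
Gene: BRCA1
Score: 0.82
Gene: TP53
Score: 0.35
Gene: MYC
Score: 0.91
"""


def load_input(text):
    """Phase 1: load the input data as a list of lines."""
    return text.splitlines()


def extract_records(lines):
    """Phase 2, data-dependent rule: pick out values by the labels
    that appear in the input data itself."""
    records = []
    current = {}
    for line in lines:
        match = re.match(r"^(Gene|Score):\s*(\S+)$", line)
        if not match:
            continue  # ignore lines that carry no known label
        label, value = match.groups()
        current[label] = value
        # A record is complete once a Score follows a Gene.
        if label == "Score":
            if "Gene" in current:
                records.append(current)
            current = {}
    return records


def apply_rules(records, min_score):
    """Phase 2, rule-dependent step: convert scores to numbers
    and keep only records at or above the threshold."""
    converted = [
        {"Gene": r["Gene"], "Score": float(r["Score"])} for r in records
    ]
    return [r for r in converted if r["Score"] >= min_score]


def format_output(records):
    """Phase 3: arrange the extracted data as a tab-separated table."""
    rows = ["Gene\tScore"]
    rows += [f"{r['Gene']}\t{r['Score']:.2f}" for r in records]
    return "\n".join(rows)


def run_pipeline(text, min_score=0.5):
    """Chain the phases: the output of each one feeds the next."""
    lines = load_input(text)
    records = extract_records(lines)
    kept = apply_rules(records, min_score)
    return format_output(kept)


if __name__ == "__main__":
    print(run_pipeline(SAMPLE_REPORT))
```

The script as a whole stands in for the kind of program the auto generation phase delivers: once written, run_pipeline can be applied to any other text that follows the same layout as the sample report.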
