GenCore 6 - Features and Technology
GenCore 6 is a comprehensive, feature-rich package for rigorous searches of protein and genomic databases. Originally developed as the underlying software engine for our Bioccelerator hardware, and refined through dozens of installations at high-throughput sites around the world, the package has evolved into an advanced database search tool built for high throughput. Its features were developed over many years, driven by feedback from large bioinformatics sites and by ongoing research and development.
The main features of GenCore 6 include:
1. Support for Multiple Sequence Database Formats:
The GenCore 6 package seamlessly supports a wide variety of input sequence database formats, including FASTA, BLAST (formatdb), GenBank, EMBL, SwissProt, GCG, NBRF, PIR, RAW, IG, and PFAM (HMM).
All of these formats can be accessed directly by the package, with no reformatting step. The package can also combine databases in different formats through database configuration files and FARM files.
In addition, an optional indexing mechanism, which supports all of the above formats, provides faster and more efficient access to individual sequences within the databases.
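The idea behind such an index can be illustrated for the FASTA case. The following is a minimal sketch, not GenCore's actual index format: it assumes a plain FASTA file opened in binary mode, records the byte offset and length of each record in an in-memory dictionary, and then seeks straight to a requested record instead of rescanning the file. The names `build_fasta_index` and `fetch_sequence` are hypothetical.

```python
import io
from typing import BinaryIO, Dict, Tuple

# Hypothetical index layout: sequence ID -> (byte offset of the header line, record length in bytes).
FastaIndex = Dict[str, Tuple[int, int]]


def build_fasta_index(handle: BinaryIO) -> FastaIndex:
    """Scan a FASTA file once and record where each record starts and how long it is."""
    index: FastaIndex = {}
    current_id = None
    start = 0
    offset = 0
    for line in handle:  # binary lines keep their b"\n", so len() gives exact byte counts
        if line.startswith(b">"):
            if current_id is not None:
                index[current_id] = (start, offset - start)
            fields = line[1:].split()
            current_id = fields[0].decode() if fields else ""  # ID = first word of the header
            start = offset
        offset += len(line)
    if current_id is not None:
        index[current_id] = (start, offset - start)
    return index


def fetch_sequence(handle: BinaryIO, index: FastaIndex, seq_id: str) -> str:
    """Seek directly to one record and return its residues, without rescanning the file."""
    start, length = index[seq_id]
    handle.seek(start)
    record_lines = handle.read(length).decode().splitlines()
    return "".join(line.strip() for line in record_lines[1:])  # skip the header line


if __name__ == "__main__":
    data = io.BytesIO(b">seq1 first entry\nACGT\nAC\n>seq2\nMKV\n")
    idx = build_fasta_index(data)
    print(fetch_sequence(data, idx, "seq2"))  # MKV
```

The index is built in a single pass, after which each lookup costs one seek and one read regardless of where the record sits in the file.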
2. Multi-Query Support:
The GenCore 6 package inherently supports multi-query processing. Multiple sequences, or even a subset of a database, can be submitted as the query set, in any of the supported database formats listed above.
The processing engine divides the query set into small groups and automatically generates a separate result file for each query sequence.
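The grouping scheme can be pictured with a small sketch. It assumes a hypothetical `search_group` callback that searches one whole group (for example, in a single pass over the database) and returns result text per query ID; the group size, the `.out` file naming and the use of query IDs as file names are illustrative assumptions, not GenCore's conventions.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Query = Tuple[str, str]  # (query ID, residues)
# Hypothetical search callback: takes a whole group, returns result text per query ID.
GroupSearch = Callable[[List[Query]], Dict[str, str]]


def query_groups(queries: Iterable[Query], group_size: int) -> Iterator[List[Query]]:
    """Split the query set into consecutive groups of at most group_size (>= 1) queries."""
    group: List[Query] = []
    for query in queries:
        group.append(query)
        if len(group) == group_size:
            yield group
            group = []
    if group:
        yield group


def run_multi_query(queries: Iterable[Query], group_size: int,
                    search_group: GroupSearch, out_dir: Path) -> List[Path]:
    """Search each group, then write one result file per query (IDs assumed file-name safe)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for group in query_groups(queries, group_size):
        results = search_group(group)
        for query_id, _ in group:
            path = out_dir / f"{query_id}.out"
            path.write_text(results[query_id])
            written.append(path)
    return written


if __name__ == "__main__":
    demo_queries = [("q1", "ACGT"), ("q2", "MKV"), ("q3", "WW")]
    fake_search = lambda group: {qid: f"{qid} length={len(seq)}\n" for qid, seq in group}
    with tempfile.TemporaryDirectory() as tmp:
        for path in run_multi_query(demo_queries, 2, fake_search, Path(tmp)):
            print(path.name, path.read_text(), end="")
```

Grouping lets one pass over the database serve several queries, while the per-query output keeps each result file independent.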
3. Support for a Rich Set of Search Algorithms and Parameters:
The GenCore 6 package supports a wide variety of search algorithms, including Smith-Waterman, ProfileSearch, ProfileScan, translated searches, Frame+, ProfileFrame+, HMMsearch, HMMScan, and GeneWise.
The package gives the user full control over search parameters such as gap penalties, comparison tables, translation tables, and scoring and normalization methods. Any parameter that is not specified takes an optimal default value, and these defaults can also be customized by the user.
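To show how a comparison table and gap penalties enter a search, here is a textbook Smith-Waterman score computation with affine gaps (Gotoh's recurrences). It is a sketch for illustration, not GenCore's optimized engine: the comparison table is modelled as a plain Python function, and the toy match/mismatch table stands in for a real substitution matrix such as BLOSUM62.

```python
from typing import Callable

# A comparison table is modelled here as a function (residue, residue) -> score.
Scorer = Callable[[str, str], int]


def match_mismatch(match: int = 2, mismatch: int = -1) -> Scorer:
    """Toy comparison table; a real protein search would use e.g. a BLOSUM or PAM matrix."""
    return lambda x, y: match if x == y else mismatch


def smith_waterman_score(a: str, b: str, table: Scorer, gap_open: int, gap_extend: int) -> int:
    """Best local alignment score with affine gaps.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    neg = float("-inf")
    m = len(b)
    h_prev = [0] * (m + 1)      # H, row i-1: best score of an alignment ending at (i-1, j)
    f_prev = [neg] * (m + 1)    # F, row i-1: alignments ending with a gap in b (vertical move)
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * (m + 1)
        f_cur = [neg] * (m + 1)
        e = neg                 # E: alignments ending with a gap in a (horizontal move)
        for j in range(1, m + 1):
            e = max(e - gap_extend, h_cur[j - 1] - gap_open)
            f_cur[j] = max(f_prev[j] - gap_extend, h_prev[j] - gap_open)
            diag = h_prev[j - 1] + table(a[i - 1], b[j - 1])
            h_cur[j] = max(0, diag, e, f_cur[j])  # 0 lets a local alignment start anywhere
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best
```

For example, aligning `AAAA` against `AAXAA` with the toy table scores 5 when `gap_open=3` (a mismatch is cheaper than a gap), but 7 when `gap_open=1` (bridging `X` with a one-residue gap wins), which is exactly the kind of behaviour the gap penalty parameters control.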
4. Highly Optimized Code Provides Significant Speed Improvements:
By using generic or processor-specific optimized code for the most popular search algorithms, GenCore 6 runs significantly faster than standard software implementations. Generic code optimizations provide up to a 3-fold speedup on almost any machine type. Processor-specific optimizations were developed for the SSE2 instruction set found in the Intel Xeon and Pentium 4 and in the AMD Athlon 64 and Opteron processors. These processor-specific optimizations provide up to a 10-fold speedup on Linux machines with supported processors.
5. Multi-Threaded Operation:
With full support for multi-threaded operation, GenCore 6 makes full use of multiprocessor Unix machines: all the processors in a single machine can work together on a single database search run. Multi-threading also takes full advantage of Intel's new Hyper-Threading technology and of future multicore processors. Running four threads on a dual Pentium 4 Linux machine gives about a 3-fold speedup over single-threaded operation.
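The way a single search run can be shared among threads can be sketched as follows: the database is split into one contiguous slice per thread, each worker scores its slice, and the partial hit lists are merged and ranked. This is an illustrative sketch under assumptions, not GenCore's scheduler; the `score` callback is hypothetical. Note that in CPython the global interpreter lock keeps pure-Python scoring from running in parallel, so this shows the work-splitting structure only; a real speedup needs the scoring to run in native code that releases the lock, or in separate processes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Entry = Tuple[str, str]  # (sequence ID, residues)
Hit = Tuple[int, str]    # (score, sequence ID)


def _search_chunk(query: str, chunk: Sequence[Entry],
                  score: Callable[[str, str], int]) -> List[Hit]:
    """Worker: score the query against every sequence in one slice of the database."""
    return [(score(query, seq), seq_id) for seq_id, seq in chunk]


def threaded_search(query: str, database: Sequence[Entry], score: Callable[[str, str], int],
                    num_threads: int = 4, top_n: int = 10) -> List[Hit]:
    """Split one database search run across num_threads (>= 1) workers and merge the best hits."""
    size = max(1, -(-len(database) // num_threads))  # ceiling division, at least 1
    chunks = [database[k:k + size] for k in range(0, len(database), size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partial_results = pool.map(lambda c: _search_chunk(query, c, score), chunks)
        hits = [hit for part in partial_results for hit in part]
    hits.sort(key=lambda h: (-h[0], h[1]))  # best score first, ties broken by ID
    return hits[:top_n]
```

Because each worker only reads the query and its own slice, no locking is needed until the final merge.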
6. Fast and Robust Alignment even for Very Long Sequences:
By using a linear-memory alignment algorithm, GenCore 6 can produce accurate alignments even for very long sequences. Linear-memory alignment is applied to all the supported search algorithms through unique technology developed by Biocceleration. In addition, because the alignment end point is usually known in advance, an accurate banded linear-memory alignment is used, which is orders of magnitude faster than a full-matrix alignment. The band is constructed so as to guarantee the accuracy of the resulting alignment.
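The two ideas, keeping only one row of the dynamic-programming matrix and computing only the cells inside a diagonal band, can be illustrated on a simple global alignment score with a linear gap penalty. This is a minimal sketch, not Biocceleration's technology: it returns only the score (recovering the alignment itself in linear memory requires a divide-and-conquer scheme such as Hirschberg's algorithm), it assumes the alignment runs corner to corner, and the band half-width `band` is a hypothetical parameter. It also does not reproduce the accuracy guarantee described above; a band that is too narrow can miss the optimal path.

```python
from typing import List

NEG_INF = float("-inf")


def banded_global_score(a: str, b: str, match: int, mismatch: int, gap: int, band: int) -> float:
    """Global alignment score restricted to cells with |i - j| <= band, keeping one row at a time."""
    n, m = len(a), len(b)
    if abs(n - m) > band:
        raise ValueError("end cell (n, m) lies outside the band")
    # Row 0: leading gaps in a, only inside the band.
    prev: List[float] = [NEG_INF] * (m + 1)
    for j in range(min(m, band) + 1):
        prev[j] = -gap * j
    for i in range(1, n + 1):
        cur: List[float] = [NEG_INF] * (m + 1)  # cells outside the band stay at -inf
        lo, hi = max(0, i - band), min(m, i + band)
        if lo == 0:
            cur[0] = -gap * i                     # leading gaps in b
        for j in range(max(1, lo), hi + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] - gap                    # a[i-1] aligned to a gap
            left = cur[j - 1] - gap               # b[j-1] aligned to a gap
            cur[j] = max(diag, up, left)
        prev = cur                                # only the previous row is kept
    return prev[m]
```

For simplicity each row is a full list of length m + 1; a tighter version would store only the 2 * band + 1 cells inside the band, so that both memory and work grow with the band width rather than with the product of the sequence lengths.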
7. Multiple Normalization Methods:
GenCore 6 supports several score normalization methods, as different annotation and research projects may require: the Pearson statistical method, logarithmic normalization, and standard normalization.
These normalization methods are available for all the search algorithms, and minimum and maximum thresholds can be set on the normalized scores of any of the methods.
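Normalization followed by threshold filtering can be sketched with a generic z-score. This is an assumption for illustration only: the exact formulas behind GenCore's Pearson, logarithmic and standard methods are not given here, and the function names are hypothetical. The sketch expresses each raw score in standard deviations from the mean of all scores in a run, then keeps only the hits whose normalized score falls between optional minimum and maximum thresholds.

```python
import statistics
from typing import Dict, List, Optional, Tuple


def z_normalize(raw_scores: Dict[str, float]) -> Dict[str, float]:
    """Express each raw score in standard deviations from the mean of all scores."""
    values = list(raw_scores.values())
    if not values:
        return {}
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return {seq_id: 0.0 for seq_id in raw_scores}  # all scores equal: nothing stands out
    return {seq_id: (s - mean) / sd for seq_id, s in raw_scores.items()}


def apply_thresholds(scores: Dict[str, float], min_score: Optional[float] = None,
                     max_score: Optional[float] = None) -> List[Tuple[str, float]]:
    """Keep hits whose normalized score lies in [min_score, max_score]; None means no limit."""
    kept = [(seq_id, s) for seq_id, s in scores.items()
            if (min_score is None or s >= min_score) and (max_score is None or s <= max_score)]
    kept.sort(key=lambda item: item[1], reverse=True)  # best hits first
    return kept
```

Because the thresholds act on normalized rather than raw scores, the same cut-off can be reused across databases and algorithms whose raw score ranges differ.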
8. Support for Various Result File Formats:
For easy integration into different bioinformatics environments, GenCore 6 supports various result file formats. The currently supported formats are PFS, GeneWise, and IG/Maspar. Additional result file formats can easily be integrated into the package through a dedicated output-format software layer.
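One common way to build such an output layer is a registry of formatter functions behind a single entry point, so that adding a format never touches the search code. The sketch below assumes that design; it is not GenCore's internal API. The `Hit` fields, the decorator and the tab-separated `tsv` format are hypothetical, and the actual PFS, GeneWise and IG/Maspar layouts are not reproduced.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hit:
    """One search result: which query matched which database sequence, and how well."""
    query_id: str
    target_id: str
    score: float


Formatter = Callable[[List[Hit]], str]
_FORMATTERS: Dict[str, Formatter] = {}


def register_format(name: str) -> Callable[[Formatter], Formatter]:
    """Decorator that plugs a new result format into the output layer under `name`."""
    def decorator(func: Formatter) -> Formatter:
        _FORMATTERS[name] = func
        return func
    return decorator


@register_format("tsv")
def format_tsv(hits: List[Hit]) -> str:
    """Hypothetical tab-separated format: query, target, score."""
    return "".join(f"{h.query_id}\t{h.target_id}\t{h.score:g}\n" for h in hits)


def render_results(hits: List[Hit], fmt: str) -> str:
    """Single entry point used by the search engine, whatever format is requested."""
    try:
        formatter = _FORMATTERS[fmt]
    except KeyError:
        raise ValueError(f"unknown result format: {fmt!r}") from None
    return formatter(hits)
```

A new format is then a single decorated function, for example `@register_format("ids")` above a function that joins the target IDs, and the engine keeps calling `render_results` unchanged.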