The GenCore Grid Server Performance and Reliability
for Smith-Waterman Searches

Recent technological advances have brought to the fore issues related to I/O loads and failover capabilities in distributed homology search solutions. The growth in processing power of Linux clusters and grid systems, driven by increases in the number of processors in these systems as well as in the density and performance of each processor, has radically changed the balance between processing power and I/O and created a need for innovative software solutions. To address this need, Biocceleration has developed the GenCore Grid Server, a robust solution for large-scale Smith-Waterman searches that delivers top performance, scalability and reliability while minimizing I/O load on the cluster infrastructure.

The GenCore Grid Server is a management and interface tool that controls and interfaces with the GenCore6 licenses and the queuing system within the grid (or Linux cluster) environment, enabling optimized performance and resource management.

The GenCore Grid Server consists of two main modules:

The Search Job Splitter/Merger:
This module splits any search job into any number of sub-jobs, each to be run on a single processing node. A single search job can be split into a few, hundreds or even thousands of independent sub-jobs, with each sub-job handling the search against one segment of the database.

The produced sub-jobs are directed into the grid queuing system for processing, and the queuing system can assign each sub-job to any processing node that has a GenCore6 license installed on it. After assignment, the sub-job is executed by the GenCore6 system on the processing node and the partial results file is saved into a predefined location.

After all sub-jobs are completed, the merger automatically merges the results of all the sub-jobs and produces a single result file that corresponds to the original search job. The Grid Server Splitter/Merger module maintains full compatibility with the GenCore6 results and output format, and produces results identical to those obtained if the original search job were run on a single processor, regardless of search parameters, normalization modes etc., thus providing a seamless solution for hyper-parallelization of very large search requests.
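The split/search/merge flow described above can be illustrated with a minimal Python sketch. It is not the GenCore6 implementation: the function names (`sw_score`, `split_database`, `run_subjob`, `merge_results`), the toy scoring parameters and the in-memory database are all assumptions made for illustration, and the sketch covers only raw scores, not normalization modes.

```python
def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Toy Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def split_database(entries, n_subjobs):
    """Partition the database into n_subjobs contiguous segments (hypothetical helper)."""
    size, extra = divmod(len(entries), n_subjobs)
    segments, start = [], 0
    for i in range(n_subjobs):
        end = start + size + (1 if i < extra else 0)
        segments.append(entries[start:end])
        start = end
    return segments


def run_subjob(query, segment):
    """Search the query against one database segment; return (score, id) hits."""
    return [(sw_score(query, seq), seq_id) for seq_id, seq in segment]


def merge_results(partials, top_n):
    """Combine partial hit lists and keep the top_n hits.

    Sorting by (-score, id) gives a deterministic order, so the merged list
    does not depend on how the database was split.
    """
    all_hits = [hit for part in partials for hit in part]
    return sorted(all_hits, key=lambda h: (-h[0], h[1]))[:top_n]
```

Because each sequence is scored independently of the others, merging the partial hit lists and sorting them with a deterministic tie-break yields the same ranking as scoring the whole database in one pass.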

The Caching Module:
The Caching Module equips the grid system with a seamless, on-the-fly database caching mechanism for GenCore6 search applications. By taking advantage of unused local disk space within each processing node, the Caching Module significantly reduces I/O requirements between the processing nodes and the main storage.

The queuing-system component of the Caching Module optimizes the cache hit ratio by assigning sub-jobs to processing nodes whose cache content is best suited to the particular sub-job. When a sub-job starts executing on a particular node, a dedicated Caching Module component takes care of cache syncing and usage, seamlessly directing the GenCore6 search application to the appropriate location of the database segment, either within the local cache or on the main storage.
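As a rough illustration of cache-aware assignment, the following Python sketch prefers an idle node that already holds the required database segment and then resolves where the search should read that segment from. The function names, the `cache_contents` mapping and the directory layout are hypothetical; the actual Caching Module logic is not documented here.

```python
from pathlib import PurePosixPath


def pick_node(segment_id, idle_nodes, cache_contents):
    """Prefer an idle node whose local cache already holds the segment.

    cache_contents maps node name -> set of cached segment ids (assumed layout).
    Falls back to the first idle node, or None if no node is idle.
    """
    for node in idle_nodes:
        if segment_id in cache_contents.get(node, set()):
            return node
    return idle_nodes[0] if idle_nodes else None


def resolve_segment_path(segment_id, node, cache_contents, cache_root, storage_root):
    """Return the local cache path on a hit, else the main-storage path."""
    if segment_id in cache_contents.get(node, set()):
        return str(PurePosixPath(cache_root) / segment_id)
    return str(PurePosixPath(storage_root) / segment_id)
```

A cache miss in this sketch simply reads from main storage; a fuller version would also copy the segment into the node's local cache so that later sub-jobs can hit it.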

The caching process requires no database preprocessing, and can take advantage of even small amounts of unused disk space (1-10 GB) within each processing node, thus providing a robust I/O enhancement on almost any grid or Linux cluster architecture.

Inherent and Robust Failover Support:
Since all sub-jobs produced by the Splitter/Merger module are regular GenCore6 search jobs, and no third-party parallelism tools (MPI etc.) are used, the system provides inherent failover capability: if a single sub-job fails on a specific processing node, only that sub-job is executed again on a different processing node. Once execution completes successfully, its results are merged with the results of the rest of the sub-jobs, thus limiting the effect of the failure to a single sub-job re-execution.
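The failover behavior can be sketched in Python as a retry loop that re-runs only the failed sub-job on another node. This is a simplified, sequential illustration under stated assumptions: `run_with_failover`, the `execute` callback and the use of `RuntimeError` to signal a node failure are hypothetical, and a real queuing system would dispatch sub-jobs in parallel.

```python
def run_with_failover(subjobs, nodes, execute):
    """Run each sub-job, retrying on the next node if the current one fails.

    subjobs: dict mapping job id -> job description
    nodes:   list of node names, tried in order
    execute: callable(job, node) that returns a partial result or raises
             RuntimeError on failure (an assumed convention for this sketch)
    Returns (results, attempts), where attempts records the nodes tried per job.
    """
    results, attempts = {}, {}
    for job_id, job in subjobs.items():
        attempts[job_id] = []
        for node in nodes:
            attempts[job_id].append(node)
            try:
                results[job_id] = execute(job, node)
                break  # success: no further nodes needed for this sub-job
            except RuntimeError:
                continue  # failure: retry this sub-job only, on the next node
        else:
            raise RuntimeError(f"sub-job {job_id} failed on every node")
    return results, attempts
```

Sub-jobs that succeed are never re-run; their partial results are kept and merged once the retried sub-job completes.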

The combination of the robust GenCore Grid Server with the advanced GenCore6 search package and the GenQueue queuing system provides state-of-the-art parallelism for very large search jobs over large grid and cluster systems, cutting search time from months to hours while maintaining full results compatibility and inherent failover capability.