The GenCore Grid Server – Performance and Reliability for Smith-Waterman Searches
Recent technological advances have brought issues of I/O load and failover
capability in distributed homology search solutions to the fore. The
processing power of Linux clusters and grid systems has grown with both the
number of processors and the density and performance of each processor,
radically changing the balance between compute power and I/O and creating a
need for innovative software solutions that address this new reality. To meet
this need, Biocceleration has developed the GenCore Grid Server, a robust
solution for large-scale Smith-Waterman searches that delivers top
performance, scalability and reliability while minimizing the I/O load on the
cluster infrastructure.
The GenCore Grid Server is a management and interface tool that controls
GenCore6 licenses and interfaces with the queuing system within the grid (or
Linux cluster) environment, enabling optimized performance and resource
management.
The GenCore Grid Server is built from two main modules:
The Search Job Splitter/Merger:
This module splits any search job into any number of sub-jobs, each run on a
single processing node. A single search job can be split into a few, hundreds
or even thousands of independent sub-jobs, each handling the search against a
partial segment of the database.
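As a rough illustration of the splitting idea, the sketch below distributes
the records of a FASTA database across a chosen number of segment files, one
per sub-job. The function name, segment naming scheme and round-robin
distribution are illustrative assumptions, not the actual GenCore splitting
logic or database format.

# Minimal sketch: split a FASTA database into N segments, one per sub-job.
# File names and the round-robin distribution are illustrative only.
def split_fasta(db_path, n_segments, out_prefix="segment"):
    """Distribute FASTA records round-robin into n_segments files."""
    outputs = [open(f"{out_prefix}_{i:04d}.fasta", "w") for i in range(n_segments)]
    try:
        record_index = -1
        current = None
        with open(db_path) as db:
            for line in db:
                if line.startswith(">"):          # new sequence record
                    record_index += 1
                    current = outputs[record_index % n_segments]
                if current is not None:
                    current.write(line)
    finally:
        for f in outputs:
            f.close()
    return [f.name for f in outputs]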
The resulting sub-jobs are directed to the grid queuing system for
processing, and the queuing system can assign each sub-job to any processing
node that has a GenCore6 license installed. After assignment, the sub-job is
executed by the GenCore6 system on the processing node and the partial
results file is saved to a predefined location.
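The following sketch shows one way such sub-jobs could be handed to a batch
scheduler. The gencore_search command line, the qsub-based submission and the
result-file layout are placeholders for the site-specific GenCore6 invocation
and whatever queuing system is actually in use.

import os
import subprocess

def submit_subjobs(query_path, segment_paths, result_dir):
    """Wrap each sub-job in a small shell script and submit it to the queue."""
    job_ids = []
    for i, segment in enumerate(segment_paths):
        result_file = os.path.join(result_dir, f"part_{i:04d}.out")
        script_path = os.path.join(result_dir, f"subjob_{i:04d}.sh")
        with open(script_path, "w") as script:
            # Each sub-job is an ordinary search of the query against one segment.
            script.write("#!/bin/sh\n")
            script.write(f"gencore_search {query_path} {segment} -o {result_file}\n")
        # Submit the script to the cluster scheduler (placeholder call).
        proc = subprocess.run(["qsub", script_path],
                              capture_output=True, text=True, check=True)
        job_ids.append(proc.stdout.strip())
    return job_ids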
After all sub-jobs are completed, the merger automatically merges their
results and produces a single result file that corresponds to the original
search job. The Grid Server Splitter/Merger module maintains full
compatibility with the GenCore6 results and output format, and produces
results identical to those obtained by running the original search job on a
single processor, regardless of search parameters, normalization modes and so
on, thus providing a seamless solution for hyper-parallelization of very
large search requests.
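A minimal merge sketch follows, assuming each partial result is a simple
tab-separated hit list of sequence identifiers and scores; the real GenCore6
output format is richer, but the principle is the same: combine the
per-segment hit lists and re-rank them globally so the merged list matches
what a single-processor run would report.

def merge_results(partial_files, merged_path, max_hits=500):
    """Combine per-segment hit lists and re-rank them globally."""
    hits = []
    for path in partial_files:
        with open(path) as part:
            for line in part:
                seq_id, score = line.rstrip("\n").split("\t")
                hits.append((float(score), seq_id))
    # Re-rank globally so the merged list matches a single-processor run.
    hits.sort(reverse=True)
    with open(merged_path, "w") as out:
        for score, seq_id in hits[:max_hits]:
            out.write(f"{seq_id}\t{score:g}\n")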
The Caching Module:
The Caching Module equips the grid system with a seamless, on-the-fly
database caching mechanism for GenCore6 search applications. By taking
advantage of unused local disk space on each processing node, the Caching
Module significantly reduces the I/O traffic between the processing nodes and
the main storage.
Working with the queuing system, the Caching Module optimizes the cache hit
ratio by assigning sub-jobs to the processing nodes whose cache content best
suits each particular sub-job. When a sub-job starts on a particular node, a
dedicated Caching Module component handles cache syncing and usage,
seamlessly directing the GenCore6 search application to the appropriate
location of the database segment, either in the local cache or on the main
storage.
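The cache-aware assignment idea can be sketched as follows: prefer the node
whose local cache already holds the database segment a sub-job needs, and
break ties by load. The node names, cache-content map and load figures are
illustrative, not the Grid Server's internal data structures.

def pick_node(segment, node_cache_contents, node_load):
    """Return the least-loaded node, preferring nodes that already cache `segment`."""
    cached = [n for n, segs in node_cache_contents.items() if segment in segs]
    candidates = cached if cached else list(node_cache_contents)
    return min(candidates, key=lambda n: node_load.get(n, 0))

node_caches = {"node01": {"segment_0001"}, "node02": {"segment_0002"}}
load = {"node01": 3, "node02": 1}
print(pick_node("segment_0002", node_caches, load))   # -> node02 (cache hit)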
The caching process requires no database preprocessing and can take advantage
of even a small amount of unused disk space (1-10 GB) on each processing
node, thus providing a robust I/O enhancement on almost any grid or Linux
cluster architecture.
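A per-node cache-sync step under such a small disk budget might look like the
sketch below: copy the needed segment into the local cache if space allows,
evicting the least recently used segments first, and otherwise fall back to
reading the segment from main storage. The paths, budget and LRU policy are
assumptions for illustration.

import os
import shutil

def resolve_segment(segment_path, cache_dir, budget_bytes):
    """Return a local cached path for the segment, or the main-storage path."""
    cached = os.path.join(cache_dir, os.path.basename(segment_path))
    if os.path.exists(cached):
        os.utime(cached)                       # refresh LRU timestamp
        return cached                          # cache hit: use local copy
    needed = os.path.getsize(segment_path)
    entries = sorted((os.path.join(cache_dir, f) for f in os.listdir(cache_dir)),
                     key=os.path.getatime)     # oldest access first
    used = sum(os.path.getsize(e) for e in entries)
    while entries and used + needed > budget_bytes:
        victim = entries.pop(0)                # evict least recently used
        used -= os.path.getsize(victim)
        os.remove(victim)
    if used + needed <= budget_bytes:
        shutil.copy2(segment_path, cached)
        return cached                          # newly cached local copy
    return segment_path                        # too big: read from main storage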
Inherent and Robust Failover Support:
Because all sub-jobs produced by the Splitter/Merger module are regular
GenCore6 search jobs and no third-party parallelization tools (such as MPI)
are used, the system provides inherent failover capability: if a single
sub-job fails on a particular processing node, only that sub-job is executed
again, on a different processing node. Once it completes successfully, its
results are merged with those of the remaining sub-jobs, limiting the impact
of a failure to the re-execution of a single sub-job.
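This failover behaviour can be illustrated with a simple polling loop:
because every sub-job is an independent GenCore6 search, a failure only
requires resubmitting that one sub-job. The check_status and resubmit helpers
below stand in for whatever interfaces the queuing system provides and are
assumptions, not a documented API.

import time

def wait_with_failover(subjobs, check_status, resubmit, poll_seconds=30):
    """Poll sub-jobs until all finish, resubmitting only those that fail."""
    pending = dict(subjobs)            # sub-job id -> its description
    while pending:
        for job_id, descr in list(pending.items()):
            state = check_status(job_id)
            if state == "done":
                del pending[job_id]    # partial result ready for the merger
            elif state == "failed":
                del pending[job_id]
                new_id = resubmit(descr)          # rerun only this sub-job,
                pending[new_id] = descr           # ideally on another node
        if pending:
            time.sleep(poll_seconds)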
The combination of the robust GenCore Grid Server with the advanced GenCore6
search package and the GenQueue queuing system provides state-of-the-art
parallelism for very large search jobs on large grid and cluster systems,
cutting search times from months to hours while maintaining full results
compatibility and inherent failover capability.