CoSort Helps IBM Set TPC-H Record
2.4 GB/minute Pre-sorting Speeds Decision Support Queries
Melbourne, Fla. (June 15, 2000) – Innovative Routines International, Inc. (better known as The CoSort Company) today announced that International Business Machines Corporation (IBM) selected Version 7.1.2 of IRI's UNIX® CoSort™ package to speed the preparation of data on the way to a record-setting business intelligence performance benchmark in May.
CoSort and the subsequent Transaction Processing Performance Council (TPC) TPC-H tests were run on a 64-CPU NUMA-Q® 2000 Model E410 server using Version 4.5.1 of the DYNIX/ptx® operating system and Version 7.1 of IBM's DB2® Universal Database (UDB) Enterprise-Extended Edition (EEE). IBM reported that CoSort pre-sorted 272 GB of load data in under two hours. This CoSort throughput record set the stage for the benchmark's load and ad-hoc queries. In this benchmark, IBM's configuration significantly outperformed all previously released TPC-H benchmark results at the 300 GB scale factor in both performance and price/performance.
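The headline throughput figure can be checked against the per-table sort times given later in this release (18 minutes for ORDERS, 96 minutes for LINEITEM). The short Python sketch below does that arithmetic. It assumes the two tables were sorted one after the other, so that their times add up; the release does not say so explicitly, but the sum is consistent with "under two hours".

```python
# Figures taken from this release; variable names are illustrative only.
orders_gb, orders_minutes = 48, 18
lineitem_gb, lineitem_minutes = 224, 96

total_gb = orders_gb + lineitem_gb                  # 272 GB of load data
# Assumption: the two sorts ran back to back, so elapsed times are summed.
total_minutes = orders_minutes + lineitem_minutes   # 114 minutes

throughput_gb_per_minute = total_gb / total_minutes

print(f"{total_gb} GB in {total_minutes} minutes")
print(f"about {throughput_gb_per_minute:.1f} GB per minute")
```

Under that assumption the result rounds to the 2.4 GB per minute quoted in the headline.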
According to the TPC Benchmark H Standard Specification Revision 1.2.1, "This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions. The TPC-H Benchmark represents decision support environments where users don't know which queries will be executed against a database system; hence, the 'ad-hoc' label. Given this ad-hocness, no pre-knowledge of the queries can be built into the DBMS system and the query execution times can be very long."
To improve its benchmark results, IBM needed a software package that would help speed query execution. One of the key optimizations was the use of the clustered indexes of DB2 Universal Database. By using CoSort to sort and load table row data in the same order as the index, internal sorts were avoided during query execution. This greatly improved the performance of a number of the queries.
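The idea behind this optimization can be illustrated with a small Python sketch. It is not DB2 or CoSort code, and the rows and the function name are invented for illustration. It shows only the general principle: when rows are already stored in key order, a per-key aggregation can run in a single pass with no sort step of its own.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (order_date, amount) rows; not actual TPC-H data.
rows = [
    ("1998-01-03", 40),
    ("1998-01-01", 10),
    ("1998-01-02", 25),
    ("1998-01-01", 5),
    ("1998-01-03", 20),
]

def totals_by_date_streaming(rows_in_date_order):
    """Sum amounts per date in one pass.

    Correct only when the input is already in date order, because
    groupby merges adjacent rows with equal keys and nothing else.
    """
    return [
        (date, sum(amount for _, amount in group))
        for date, group in groupby(rows_in_date_order, key=itemgetter(0))
    ]

# Sort once before loading (the role the release attributes to CoSort).
presorted = sorted(rows, key=itemgetter(0))

# Later queries stream over the stored order without sorting again.
daily_totals = totals_by_date_streaming(presorted)
print(daily_totals)
# [('1998-01-01', 15), ('1998-01-02', 25), ('1998-01-03', 60)]
```

Running the same function over the unsorted rows would split each date into several groups, which is why a query engine would otherwise have to sort internally first.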
IBM's Benchmark and CoSort Use Details
The TPC-H benchmark was executed on a 16-quad (64-processor) NUMA-Q 2000 system with 64 GB of main memory. The DB2 EEE database was configured with 16 database partitions, 1 per quad. The data for the TPC-H benchmark was generated using the TPC-supplied program dbgen.
The data was loaded into a DB2 EEE database by partition. This load required preprocessing the input data and splitting it by partition key into the correct load file using the db2split utility. In this case, each input data file created by dbgen was split into 16 load files by the db2split program. The output of the db2split program was used as input to the CoSort utility, SortCL (CoSort's sort control language). The data was sorted by executing 16 concurrent SortCL jobs. Each SortCL job executed using 4 sort processes with 100 MB of memory per processor. The filesystems that contained the input data, sort overflow work area, and output data were configured on physically separated disks to minimize I/O contention. CoSort sorted the two largest tables in date order in 18 minutes (ORDERS - 48 GB) and 96 minutes (LINEITEM - 224 GB), respectively.
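The split-then-sort workflow described above can be sketched in Python as follows. This is an in-memory illustration only, not the scripts IBM used: the function names and sample rows are hypothetical, the modulo rule is a stand-in for whatever mapping db2split applies to the partition key, and Python's built-in sort stands in for a SortCL job.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 16  # one load file per database partition, as in the release

def split_by_partition_key(rows, num_partitions=NUM_PARTITIONS):
    """Route each (order_key, order_date) row to one partition.

    Assumption: key modulo partition count. The actual db2split
    mapping is not described in the release.
    """
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[row[0] % num_partitions].append(row)
    return partitions

def sort_partition(rows):
    """Sort one partition's rows by date (stand-in for one SortCL job)."""
    return sorted(rows, key=lambda row: row[1])

def presort_for_load(rows, num_partitions=NUM_PARTITIONS):
    """Split the input, then sort every partition concurrently."""
    partitions = split_by_partition_key(rows, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        # pool.map returns results in partition order.
        return list(pool.map(sort_partition, partitions))

# Made-up sample rows: integer order keys with ISO-format dates.
rows = [(key, f"1998-01-{(key * 7) % 28 + 1:02d}") for key in range(1, 101)]
load_files = presort_for_load(rows)
print(len(load_files), "sorted partitions")
```

Threads are used here only to show the structure of the job; they do not give Python true CPU parallelism. In the configuration described above, 16 jobs with 4 sort processes each come to 64 sort processes, matching the 64 processors of the server.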
"We were impressed by CoSort's ease-of-use and performance," stated Vincent Carbone, IBM Web Servers, Benchmark & Performance, Solutions Integration Group. "With minimal effort, we were able to develop scripts that resulted in the sorting and merging of 272GB of data in less than 2 hours. CoSort allowed us to regenerate the data as needed within our tight testing schedule."
About CoSort / Innovative Routines International, Inc.
CoSort is the world's first, fastest, and most widely licensed commercial sort package on high-end UNIX and Windows NT platforms. CoSort features several end-user and programmer interfaces to a parallel processing sort algorithm in a coroutine architecture. Included in its hardware-based, perpetual use prices are: third party sort replacement and conversion tools; support for more than 100 data and record types; data warehouse ETL, join and report generation functionality; SMP resource configuration and recovery; on-line documentation; and on-site technical support in more than 30 countries. For more information on The CoSort Company and its family of data management solutions, call (321) 952-9400 or (800) 333-SORT, or visit www.cosort.com.
CoSort is a trademark of Innovative Routines International, Inc. IBM, DB2, DYNIX/ptx and NUMA-Q are registered trademarks of International Business Machines Corporation in the United States, other countries, or both.