Speed is a chief concern for DBAs who need to unload big data from very large database (VLDB) tables. Faster unloading makes data accessible in different forms for different purposes and platforms. The faster data is acquired, the faster it can be processed and delivered. The ultimate result is faster time-to-solution, and thus greater productivity and competitiveness for companies that make money from information.
Most of the tools and methods currently used for extracting data from large fact tables are simply too slow. Those that purport to be faster are complicated, proprietary, or require the purchase of a costly ETL package. A shift to an ELT or Hadoop paradigm entails major hardware (or DB appliance) expenses, taxes the underlying database that must now also perform the transformations, and imposes steep learning curves and maintenance burdens.
Are there not other options for the unload, and downstream processes, that are faster, simpler, and more affordable?
Why Unload VLDB Tables?
There are many reasons why you would bulk-unload data from these tables in the first place:
Database Migration: The faster you can get data out of the legacy database(s), the faster you can map and move the data into the new database(s). Bulk data movement is beneficial when data volumes are massive, when offline transformation, protection, and/or analysis adds value (all possible at once in IRI CoSort), and when pre-sorted bulk loads are the only way to meet SLA deadlines.
Database Reorg: Unload, sort, and re-load to keep the database operating efficiently and to optimize common queries by keeping the tables in join order. Unloading is performed in off-line, or external, reorganizations. See this comparison of off-line and on-line reorg methods.
Data Integration: Large-scale DW extract-transform-load (ETL) operations start with the extraction of tables into the data staging area. Data dumped from tables and commingled with mainframe data sets, web logs, and other flat files can most efficiently be integrated and processed in the file system. External transformations are not only more efficient because multiple actions can be staged in a single I/O pass, but also because the computational overhead of all this work is removed from the database (see ETL vs. ELT).
Data Replication / Archive: By offloading fact tables, golden-source operational data can be duplicated and stored in a portable format. Data in flat files can be queried, manipulated, and re-formatted with tools like IRI NextForm or CoSort, and used to populate other databases and applications. Similarly, an accessible repository of this data can be stored off-line for backup, restore, and retrieval, or for distribution to parties who require access to the data in a different environment.
Business Intelligence: It can be faster and easier to import operational data into Excel and other BI tools in flat-file formats like CSV and XML than to try to bridge between the table and a spreadsheet or BI cube. Offloading tables into flat files is therefore an initial, efficient step in data franchising – the preparation of data for BI operations.
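The reorg pattern above (unload, sort on the join or primary key, re-load) can be sketched generically. This is only an illustration: the file names and the "cust_id" key column are hypothetical, and the in-memory sort stands in for the external (disk-based) sort a true VLDB extract would require.

```python
# Sketch of the reorg pattern: sort an unloaded CSV extract on its
# join/primary key before bulk re-loading. File names and the key
# column ("cust_id") are hypothetical examples.
import csv

def sort_extract(in_path, out_path, key_col):
    """Sort a CSV extract on key_col, preserving the header row."""
    with open(in_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    idx = header.index(key_col)
    # In-memory sort; a real VLDB extract would need an external sort
    # (e.g., a merge sort over spill files) instead of list.sort().
    rows.sort(key=lambda r: r[idx])
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

if __name__ == "__main__":
    # Fabricate a tiny unordered extract, then sort it into load order.
    with open("extract.csv", "w", newline="") as f:
        f.write("cust_id,amount\n3,10\n1,20\n2,30\n")
    sort_extract("extract.csv", "sorted.csv", "cust_id")
```

Pre-sorting like this is what lets the subsequent bulk load skip index-order maintenance and run in its fastest (direct-path) mode.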
A Look at Unload Methods
VLDB extraction methods vary in performance and functionality. A good utility for unloading big data from Oracle and other major DBs needs to be efficient, easy-to-use, and support certain reformatting functions that would not encumber the process.
The SQL*Plus SPOOL command can dump data to a flat file, but it is typically slow at volume. Native utilities like Oracle's export or data pump are faster, but they produce proprietary extracts that can only be re-imported into the same database, and cannot be analyzed the way a flat file can.
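For comparison, a generic cursor-based unload to a portable CSV file looks like the sketch below. It uses Python's DB-API with sqlite3 standing in for any database driver; the "sales" table and its schema are hypothetical. Batch fetching like this is portable, but it runs as a single client-side stream, which is one reason generic unloads lag behind parallel, native-protocol extractors at volume.

```python
# Sketch: generic cursor-based unload of a table to a portable CSV
# flat file via Python's DB-API. sqlite3 is a stand-in for any
# DB-API driver; the "sales" table is a hypothetical example.
import csv
import sqlite3

def unload_table(conn, table, out_path, batch_size=10000):
    """Stream all rows of a table into a CSV file with a header row."""
    cur = conn.cursor()
    cur.execute(f"SELECT * FROM {table}")
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        # cur.description yields one 7-tuple per column; [0] is the name.
        writer.writerow([d[0] for d in cur.description])
        while True:
            rows = cur.fetchmany(batch_size)  # fetch in batches, not all at once
            if not rows:
                break
            writer.writerows(rows)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 9.5), (2, 12.0)])
    unload_table(conn, "sales", "sales.csv")
```

The result is a plain CSV that any downstream tool can read, unlike a proprietary export or data pump dump file.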
If you need to rapidly unload large tables to portable flat files, consider IRI FACT (Fast Extract) for Oracle, DB2, Sybase, MySQL, SQL Server, Altibase, Tibero, etc. FACT uses parallelism and native database connection protocols (like Oracle OCI) to optimize extraction performance. The free FACT GUI (wizard) that’s part of the IRI Workbench, built on Eclipse™, presents available tables and columns to extract and uses SQL SELECT syntax.
Another benefit of using FACT is its metadata integration with the SortCL program in IRI Voracity for transforming, protecting, targeting, and even reporting on the flat-file extracts. FACT also creates the database bulk-load utility control file in anticipation of same-table re-populations using pre-CoSorted flat files in large-scale ETL or reorg operations.