CoSort Integrates and Transforms Financial Data for Thomson Reuters
Eran Barak, global head of Strategy, Collaboration Services, Thomson Reuters.
The Thomson Corporation combined with Reuters Group PLC in 2008 to create Thomson Reuters, a leading source of financial information and BI data for many industries. With 50,000 employees and annual revenues of $13 billion, Thomson Reuters is double the size of its largest rival. Powered by the renowned Reuters News organization, Thomson Reuters combines industry expertise with innovative technologies to deliver critical information to leading decision-makers in the financial, legal, tax and accounting, scientific, health care and media markets. The Thomson Reuters Collaboration Services group offers communication and community services in a secure and compliant manner to financial professionals.
Multi-CPU Intel servers running Windows Server 2003.
Thomson Reuters has multiple sources of user data across databases and internal applications, which serve as the basis for our BI reporting needs. To make better use of this data, Thomson Reuters needed to consolidate and manipulate it rapidly. CoSort helped us quickly integrate and transform huge files from these systems so that the data would be normalized and prepared for use with our reporting tools. Prior to using CoSort, Thomson Reuters could not easily normalize these large, disparate source files for ad hoc analysis.
Thomson Reuters makes use of CoSort's Sort Control Language (SortCL) data manipulation program. Open text SortCL job scripts define file filters and transformations, as well as new output target formats. The data integration features of SortCL used most are sorting, joining, deduplication, conditional (if-then-else) selection, aggregation and field remapping. Thomson Reuters leverages SortCL's metadata repository features to separate the data descriptions from our applications so we can define and modify layouts in one place, but use them in different jobs. SortCL jobs can run on the command line, in batch, from a Java graphical user interface, or via an application programming interface call. Job events can be monitored on screen, while application statistics and other details can go to optional log files.
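To make the operations named above concrete, here is a minimal sketch in plain Python. It is not SortCL syntax and does not reflect how Thomson Reuters wrote its jobs; the field names, the LAYOUT list (standing in for a shared layout definition), and the sample data are all hypothetical. It shows conditional selection, sorting, deduplication, aggregation and field remapping applied to one input.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Hypothetical record layout, defined once so several jobs could reuse it.
# This stands in for a shared metadata definition; it is not SortCL syntax.
LAYOUT = ["user_id", "region", "messages"]


def run_job(source_text):
    """Filter, sort, deduplicate, aggregate and remap one delimited input."""
    rows = list(csv.DictReader(io.StringIO(source_text), fieldnames=LAYOUT))

    # Conditional selection: keep only rows whose message count is numeric.
    rows = [r for r in rows if r["messages"].isdigit()]

    # Sort on the key fields so duplicates and groups become adjacent.
    key = itemgetter("region", "user_id")
    rows.sort(key=key)

    # Deduplicate on (region, user_id), keeping the first record of each group.
    unique = [next(group) for _, group in groupby(rows, key=key)]

    # Aggregate per region and remap the fields to a new output layout.
    output = []
    for region, group in groupby(unique, key=itemgetter("region")):
        members = list(group)
        output.append({
            "Region": region,
            "Users": len(members),
            "TotalMessages": sum(int(m["messages"]) for m in members),
        })
    return output


# Hypothetical sample input: one duplicate row and one non-numeric count.
SAMPLE = "u1,EMEA,10\nu2,EMEA,5\nu1,EMEA,10\nu3,APAC,7\nu4,APAC,n/a\n"
```

Calling run_job(SAMPLE) yields one summary record per region. Keeping LAYOUT outside the function mirrors the idea of describing the data in one place and using that description in different jobs.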
CoSort's SortCL tool wraps multiple data integration and transformation tasks into the same product, job script, and input/output pass. By combining that efficiency with parallelism and dynamic memory allocation, CoSort is able to provide a superior level of price-performance for high-volume data processing. The job language of SortCL is both logical and explicit, and therefore simplifies task design and execution.
Where CSV fields contained binary data or line feeds, SortCL error messages did not identify the line number of the mismatch until IRI sent a patch. SortCL is also missing certain string manipulation functions that we would like to use beyond the substring and pattern matching features that are included.
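Locating a malformed record is harder when quoted fields span several physical lines, because record count and line count diverge. The following is a rough illustration in plain Python of how a line number can be attached to a field-count mismatch. It uses the standard csv module and says nothing about how SortCL or the IRI patch works; the function name and the sample data are hypothetical.

```python
import csv
import io


def find_bad_records(text, expected_fields):
    """Return (physical_line_number, field_count) for each malformed record."""
    reader = csv.reader(io.StringIO(text, newline=""))
    problems = []
    for record in reader:
        if len(record) != expected_fields:
            # reader.line_num counts physical lines consumed so far, so it
            # points at the line where the record ends, even when an earlier
            # quoted field contained embedded line feeds.
            problems.append((reader.line_num, len(record)))
    return problems


# Hypothetical sample: the second record has a line feed inside a quoted
# field, and the third record is missing its last field.
SAMPLE = 'id,note,score\n1,"first line\nsecond line",9\n2,missing score\n'
```

Here the short record is the third logical record but sits on the fourth physical line, which is the number a person opening the file in an editor would need.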
Previously, Thomson Reuters was using homegrown programs to aggregate and transform its large data volumes. But with more than 1.5 GB of daily data to process, there was a need for a high-performance, highly functional tool on our Windows systems. Thomson Reuters' research effort concluded after a successful evaluation of the CoSort SortCL tool.
The CoSort SortCL tool allows Thomson Reuters to fully customize record layouts for fixed and delimited text file outputs like CSV and LDIF with bespoke and derived field layouts. The result of a process can be one or more outputs in one or more formats, all made in the same place and pass. SortCL's ability to integrate large, disparate data sources and produce these different targets simultaneously saves us a tremendous amount of time in development and execution.
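The idea of producing several target formats in the same pass can be sketched in plain Python as follows. This is an illustration under stated assumptions, not SortCL syntax: the attribute names, the base DN (ou=people,dc=example,dc=com) and the sample record are hypothetical, and the LDIF output skips the base64 encoding that the format requires for values containing special characters.

```python
import csv
import io


def write_targets(records):
    """Render the same records as CSV and as LDIF in a single loop."""
    csv_out, ldif_out = io.StringIO(), io.StringIO()
    writer = csv.writer(csv_out, lineterminator="\n")
    writer.writerow(["uid", "mail"])

    for rec in records:
        # Target 1: one delimited row per record.
        writer.writerow([rec["uid"], rec["mail"]])

        # Target 2: one LDIF entry per record. The dn is a derived field
        # built from uid plus a hypothetical base DN.
        ldif_out.write(f"dn: uid={rec['uid']},ou=people,dc=example,dc=com\n")
        ldif_out.write(f"uid: {rec['uid']}\n")
        ldif_out.write(f"mail: {rec['mail']}\n\n")

    return csv_out.getvalue(), ldif_out.getvalue()


# Hypothetical sample record.
RECORDS = [{"uid": "jdoe", "mail": "jdoe@example.com"}]
```

Because both targets are written inside one loop, the input is read only once regardless of how many output formats are added.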
IRI supported Thomson Reuters beyond normal parameters both during evaluation and after licensing CoSort. We appreciate IRI's responsiveness and flexibility, and we are in discussions regarding future product releases.
CoSort documentation is lengthy but straightforward. The SortCL job examples included in the manual and software installation made it easy to get up to speed; we were writing complex SortCL jobs in a few days. This speaks to the clarity of the metadata and the relevance of the documented examples.