{"id":10530,"date":"2016-09-14T11:55:17","date_gmt":"2016-09-14T15:55:17","guid":{"rendered":"http:\/\/www.iri.com\/blog\/?p=10530"},"modified":"2025-01-31T15:50:01","modified_gmt":"2025-01-31T20:50:01","slug":"optimizing-talend-transforms-with-cosort","status":"publish","type":"post","link":"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/","title":{"rendered":"Improving Talend Performance"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">Talend has been on the market for several years now, and its flexible UI components make it a very reasonable choice for developers when it comes to customization. Talend Studio is built in Java, which gives it a certain degree of control and flexibility in managing its performance. That said, however, I found that Talend has two general limitations in processing large data sets:<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">It\u00a0requires too much RAM for data processing, especially for sort, join, and aggregation operations. Talend always tries to perform them in memory, and must have that memory pre-allocated before the operation runs. For example, just to perform a lookup in a 2GB file, you must first allocate 2GB of JVM heap memory.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Even after enough RAM is allocated, performance depends on the hardware resources and the configuration options used (e.g., checking \u201cStored on Disk\u201d in tMap, or bulk loading).<\/span><\/li>\n<\/ol>\n<p><a href=\"http:\/\/www.iri.com\/products\/cosort\"><span style=\"font-weight: 400;\">IRI CoSort<\/span><\/a><span style=\"font-weight: 400;\"> continues to perform or accelerate high-volume data manipulations in DB, BI, and ETL environments. 
CoSort is also the default engine in the <\/span><a href=\"http:\/\/www.iri.com\/products\/voracity\"><span style=\"font-weight: 400;\">IRI Voracity<\/span><\/a><span style=\"font-weight: 400;\"> data management platform, which, like Talend, is built on Eclipse\u2122, and now overlaps in many areas, including <\/span><a href=\"http:\/\/www.iri.com\/solutions\/data-integration\/etl\"><span style=\"font-weight: 400;\">ETL<\/span><\/a><span style=\"font-weight: 400;\">. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">Like other independent DW consultants have with <\/span><a href=\"http:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-transforms-in-datastage-ds-with-cosort-cs-shyam-padamati\/\"><span style=\"font-weight: 400;\">DataStage<\/span><\/a><span style=\"font-weight: 400;\">, <\/span><a href=\"http:\/\/www.iri.com\/blog\/data-transformation2\/informatica-pushdown-optimization-with-cosort\/\"><span style=\"font-weight: 400;\">Informatica<\/span><\/a><span style=\"font-weight: 400;\">, and <\/span><a href=\"http:\/\/www.iri.com\/blog\/data-transformation2\/cosort-speed-sort-process-pentaho\/\"><span style=\"font-weight: 400;\">Pentaho<\/span><\/a><span style=\"font-weight: 400;\">, I performed multiple benchmarks on large data sets in Talend natively, with CoSort (Voracity) alone, and from a Talend workflow calling CoSort. 
The head-to-head comparisons speak for themselves, while the embedded use of CoSort overcame Talend\u2019s inherent processing limitations and need for additional hardware.<\/span><\/p>\n<h3><b>Configuration<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">The following hardware and software specifications were used for this POC:<\/span><\/p>\n<p><b>Hardware<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">HP ProLiant DL360-G7<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Intel\u00ae Xeon\u00ae CPU X5650 @ 2.67GHz 12MB cache, 24 cores (4 CPUs x 6 cores)<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">72GB (18 x 4GB single rank RDIMM)<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Virtual Machine constrained to 8 cores (2 CPUs x 4 cores) and 32GB RAM<\/span><\/li>\n<\/ul>\n<p><b>Software<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Talend Open Studio 6<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">IRI Voracity\u00ae 1.0, which includes CoSort\u00ae 9.5.3<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">IRI Workbench, a free GUI option, built on Eclipse<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">The scenario took two file sources (Customer and Transactions) of various sizes and sorted, joined, aggregated, and filtered them. In CoSort, that\u2019s all expressed in one transform block, job script, and I\/O pass. Talend requires that the files be joined in memory, and then split using a filter transformation into two data streams that are later sorted and aggregated. In both cases, two target files are written to the disk with filtered and formatted results. 
<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The source data was created automatically by <\/span><a href=\"http:\/\/www.iri.com\/products\/rowgen\"><span style=\"font-weight: 400;\">IRI RowGen<\/span><\/a><span style=\"font-weight: 400;\">. RowGen is sold as a standalone test data generation package for EDW DB and file targets, and is a spin-off of the CoSort <\/span><a href=\"http:\/\/www.iri.com\/products\/cosort\/sortcl\"><span style=\"font-weight: 400;\">SortCL<\/span><\/a><span style=\"font-weight: 400;\"> program. RowGen also uses the same Eclipse GUI (<\/span><a href=\"http:\/\/www.iri.com\/products\/workbench\"><span style=\"font-weight: 400;\">IRI Workbench<\/span><\/a><span style=\"font-weight: 400;\">) and <\/span><a href=\"http:\/\/www.iri.com\/products\/cosort\/sortcl-metadata\"><span style=\"font-weight: 400;\">metadata<\/span><\/a><span style=\"font-weight: 400;\"> as all other IRI software, and its use is supported in a Voracity platform subscription.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Following are the relative benchmarks:<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-10531 size-full alignnone\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/Table-Relative-Benchmarks.png\" alt=\"Table-Relative Benchmarks\" width=\"622\" height=\"311\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/Table-Relative-Benchmarks.png 622w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/Table-Relative-Benchmarks-300x150.png 300w\" sizes=\"(max-width: 622px) 100vw, 622px\" \/><\/p>\n<h3><b>Talend Workflow (Native) <\/b><\/h3>\n<p><span style=\"font-weight: 400;\">Below is the Talend job design required to first join the same two sequential files, filter them on a condition and, later, sort and aggregate them. The final results are then written to two separate sequential files. 
<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We used Run <\/span><span style=\"font-weight: 400;\">\u2192<\/span><span style=\"font-weight: 400;\"> Advanced settings <\/span><span style=\"font-weight: 400;\">\u2192<\/span><span style=\"font-weight: 400;\"> JVM settings to specify 16GB as the maximum memory for Talend to process the large data set under consideration. In the tMap transformation, we enabled the \u201cStore on Disk\u201d option to speed up the join process. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">Things ran fine until the data sizes exceeded memory. CoSort sped things up at every file size (by 2-4X) up to the point where Talend failed, and it kept going beyond that.<\/span><\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/2-Job-POC-FLOW-TMAP.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-10533\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/2-Job-POC-FLOW-TMAP.png\" alt=\"Job POC Flow Tmap\" width=\"600\" height=\"513\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/2-Job-POC-FLOW-TMAP.png 878w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/2-Job-POC-FLOW-TMAP-300x256.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/2-Job-POC-FLOW-TMAP-768x656.png 768w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<h3><b>CoSort Process (Voracity Workflow)<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">The\u00a0<\/span><a href=\"http:\/\/www.iri.com\/products\/cosort\"><span style=\"font-weight: 400;\">IRI CoSort<\/span><\/a><span style=\"font-weight: 400;\">\u00a0product \u2014 and its\u00a0<\/span><a href=\"http:\/\/www.iri.com\/products\/cosort\/sortcl\"><span style=\"font-weight: 400;\">SortCL<\/span><\/a><span style=\"font-weight: 400;\">\u00a0data definition and manipulation 4GL program doing the transforms \u2014 is shown below in the IRI <\/span><a 
href=\"http:\/\/www.iri.com\/products\/cosort\/workbench\"><span style=\"font-weight: 400;\">Workbench<\/span><\/a><span style=\"font-weight: 400;\">\u2019s syntax-aware editor.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">SortCL is fast primarily because it combines the\u00a0<\/span><a href=\"http:\/\/www.iri.com\/solutions\/data-transformation\"><span style=\"font-weight: 400;\">transforms<\/span><\/a><span style=\"font-weight: 400;\">\u00a0(and other functionality) in the same job script and I\/O pass. It also uses the CoSort engine in the file system, which takes advantage of multi-threading, task consolidation, modern memory management, and proven I\/O optimization techniques. It is also not a Java program that must be compiled at every runtime.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">SortCL jobs can be hand-written, generated via job wizard, or built in the ETL workflow palette (with an\u00a0<\/span><a href=\"http:\/\/www.iri.com\/products\/voracity\"><span style=\"font-weight: 400;\">IRI Voracity<\/span><\/a><span style=\"font-weight: 400;\">\u00a0subscription). The scripts are easy-to-read text files you can run (ad hoc or scheduled) in the<\/span> <span style=\"font-weight: 400;\">GUI or on any Windows, Unix, or Linux command line. That is how I was able to call them so easily from my Talend workflow.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In the Hadoop version of Voracity, these same jobs can run in MapReduce 2, Spark, Storm, or Tez without code modification. It also supports HDFS file browsing, editing and transfer, and certification is underway for Cloudera, HortonWorks, and MapR distributions. 
IRI recommends running these jobs when speed and scalability demand it, usually with data sets above 5TB.<\/span><\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/1-CoSort-SortCL-script.jpeg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-10532\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/1-CoSort-SortCL-script.jpeg\" alt=\"Cosort SortCL Script\" width=\"600\" height=\"370\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/1-CoSort-SortCL-script.jpeg 624w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/1-CoSort-SortCL-script-300x185.jpeg 300w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p><i><span style=\"font-weight: 400;\">The CoSort \u2018SortCL\u2019 script is shown above, with an outline, in the IRI Workbench (Eclipse) GUI. The IRI Voracity ETL workflow and transform mapping diagrams of the same job are below those. All the metadata is: 1) re-entrant regardless of work-style, 2) manageable in EGit and other repositories, and 3) compatible with Erwin (formerly AnalytiX DS) Mapping Manager CATfx templates for spreadsheet-style mapping definition, stratification, and automatic conversion from Talend and other ETL tools to Voracity.<\/span><\/i><\/p>\n<p><b>Talend Workflow (with CoSort)<\/b><\/p>\n<p><span style=\"font-weight: 400;\">To combine the processes for the benefit of my Talend operation, I wrote a simple call to my CoSort SortCL script using the tSystem transformation. 
This \u201cpushes out\u201d all of the big data transformation overhead to CoSort with a simple command line call: <\/span><\/p>\n<pre><span style=\"font-weight: 400;\">sortcl \/spec=CoSortJob.scl<\/span><\/pre>\n<p><span style=\"font-weight: 400;\">This performs all the same transformations against the same inputs with the same results, but at the faster times shown in the chart.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Like Talend, CoSort provides options for tuning its own performance, such as a memory limit. RAM use, along with thread, overflow file, and other I\/O options, is set in resource controls for use at the job, user, or global level. In IRI Workbench, I just clicked on:<\/span><\/p>\n<p><i><span style=\"font-weight: 400;\">Run -&gt; Resource Control Analysis -&gt; Run Setup -&gt; Tune CoSort<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400;\">This opened a dialog box with options to edit the values and put new ones into effect in the workspace. I moved MEMORY_MAX down to 16GB to match what Talend could use, and left the remaining values at their default settings. 
It\u2019s worth noting that CoSort\u2019s tuning values are ceilings, not floors, so the product can run as a \u201cgood neighbor\u201d on multi-user systems.<\/span><\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/3-External-IRI-Call.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-10534\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/3-External-IRI-Call.png\" alt=\"External IRI Call\" width=\"600\" height=\"340\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/3-External-IRI-Call.png 1245w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/3-External-IRI-Call-300x170.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/3-External-IRI-Call-768x435.png 768w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/3-External-IRI-Call-1024x580.png 1024w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">From the graphic version of the benchmarks above, it is easy to see the value CoSort adds to Talend in data transformations. 
CoSort more efficiently performs multiple, complex data transformations in big data use cases without adding memory, server hardware, or Hadoop:<\/span><\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/Talend-Voracity-Chart-3.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-10538 alignnone\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/Talend-Voracity-Chart-3.png\" alt=\"Talend-IRI Voracity Comparison Chart\" width=\"650\" height=\"287\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/Talend-Voracity-Chart-3.png 1110w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/Talend-Voracity-Chart-3-300x133.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/Talend-Voracity-Chart-3-768x339.png 768w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/Talend-Voracity-Chart-3-1024x452.png 1024w\" sizes=\"(max-width: 650px) 100vw, 650px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">If you would like more information or have any comments or suggestions, please provide feedback in the form below, or contact <\/span><a href=\"mailto:cosort@iri.com\"><span style=\"font-weight: 400;\">cosort@iri.com<\/span><\/a><span style=\"font-weight: 400;\">. 
Look for a subsequent article comparing the relative ETL ergonomics of IRI Voracity (powered by CoSort) and Talend.<\/span><\/p>\n<p><a href=\"http:\/\/bigdatadimension.com\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-10551 size-full\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/BigData-Dimension.png\" alt=\"BigData Dimension\" width=\"167\" height=\"72\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><em>This third-party POC comparison was demonstrated by\u00a0Tahir Aziz of BigData Dimension Inc.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Talend has been on the market for several years now, and its flexible UI components make it a very reasonable choice for developers when it comes to customization. Talend Studio is built in Java, which gives it a certain degree of control and flexibility in managing its performance. That said, however, I found that Talend<\/p>\n<div><a class=\"btn-filled btn\" href=\"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/\" title=\"Improving Talend Performance\">Read 
More<\/a><\/div>\n","protected":false},"author":102,"featured_media":10538,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[1],"tags":[1001,998,1183,388,71,100,81,1184,289,546,526,789,850,842,1185,497,1182,68,2017,2015,1181,2016,1094,940],"class_list":["post-10530","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-transformation2","tag-analytix-ds","tag-catfx","tag-cloudera","tag-datastage","tag-eclipse","tag-etl","tag-hadoop","tag-hortonworks","tag-informatica","tag-iri-cosort","tag-iri-rowgen","tag-iri-voracity","tag-iri-workbench","tag-java","tag-mapr","tag-pentaho","tag-poc","tag-sortcl","tag-talend-data-integration","tag-talend-etl","tag-talend-open-studio-6","tag-talend-studio","tag-transform-mapping-diagram","tag-workflow"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v23.4 (Yoast SEO v23.4) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Improving Talend Performance - IRI<\/title>\n<meta name=\"description\" content=\"Uncover the strengths and limitations of Talend for processing large data sets. Learn to speed Talend transforms with an IRI CoSort call out.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Improving Talend Performance\" \/>\n<meta property=\"og:description\" content=\"Uncover the strengths and limitations of Talend for processing large data sets. 
Learn to speed Talend transforms with an IRI CoSort call out.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/\" \/>\n<meta property=\"og:site_name\" content=\"IRI\" \/>\n<meta property=\"article:published_time\" content=\"2016-09-14T15:55:17+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-01-31T20:50:01+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/Talend-Voracity-Chart-3.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1110\" \/>\n\t<meta property=\"og:image:height\" content=\"490\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Tahir Aziz\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Tahir Aziz\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/\"},\"author\":{\"name\":\"Tahir Aziz\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/27e0bb127bbe51a05e6ae7e353315959\"},\"headline\":\"Improving Talend Performance\",\"datePublished\":\"2016-09-14T15:55:17+00:00\",\"dateModified\":\"2025-01-31T20:50:01+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/\"},\"wordCount\":1142,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/Talend-Voracity-Chart-3.png\",\"keywords\":[\"AnalytiX DS\",\"CATfx\",\"Cloudera\",\"DataStage\",\"Eclipse\",\"ETL\",\"hadoop\",\"HortonWorks\",\"informatica\",\"IRI CoSort\",\"IRI RowGen\",\"IRI Voracity\",\"IRI Workbench\",\"Java\",\"MapR\",\"pentaho\",\"POC\",\"SortCL\",\"Talend Data Integration\",\"Talend ETL\",\"Talend Open studio 6\",\"Talend Studio\",\"transform mapping diagram\",\"workflow\"],\"articleSection\":[\"Data 
Transformation\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/\",\"url\":\"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/\",\"name\":\"Improving Talend Performance - IRI\",\"isPartOf\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/Talend-Voracity-Chart-3.png\",\"datePublished\":\"2016-09-14T15:55:17+00:00\",\"dateModified\":\"2025-01-31T20:50:01+00:00\",\"description\":\"Uncover the strengths and limitations of Talend for processing large data sets. 
Learn to speed Talend transforms with an IRI CoSort call out.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/#primaryimage\",\"url\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/Talend-Voracity-Chart-3.png\",\"contentUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/Talend-Voracity-Chart-3.png\",\"width\":1110,\"height\":490,\"caption\":\"Talend-IRI Voracity Comparison Chart\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.iri.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Improving Talend Performance\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.iri.com\/blog\/#website\",\"url\":\"https:\/\/www.iri.com\/blog\/\",\"name\":\"IRI\",\"description\":\"Total Data Management 
Blog\",\"publisher\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.iri.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\",\"name\":\"IRI\",\"url\":\"https:\/\/www.iri.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"contentUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"width\":750,\"height\":206,\"caption\":\"IRI\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/27e0bb127bbe51a05e6ae7e353315959\",\"name\":\"Tahir Aziz\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/5d03f6244068d53050ed526ac4b32924?s=96&d=blank&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/5d03f6244068d53050ed526ac4b32924?s=96&d=blank&r=g\",\"caption\":\"Tahir Aziz\"},\"url\":\"https:\/\/www.iri.com\/blog\/author\/taziz\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Improving Talend Performance - IRI","description":"Uncover the strengths and limitations of Talend for processing large data sets. 
Learn to speed Talend transforms with an IRI CoSort call out.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/","og_locale":"en_US","og_type":"article","og_title":"Improving Talend Performance","og_description":"Uncover the strengths and limitations of Talend for processing large data sets. Learn to speed Talend transforms with an IRI CoSort call out.","og_url":"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/","og_site_name":"IRI","article_published_time":"2016-09-14T15:55:17+00:00","article_modified_time":"2025-01-31T20:50:01+00:00","og_image":[{"width":1110,"height":490,"url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/Talend-Voracity-Chart-3.png","type":"image\/png"}],"author":"Tahir Aziz","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Tahir Aziz","Est. 
reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/#article","isPartOf":{"@id":"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/"},"author":{"name":"Tahir Aziz","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/27e0bb127bbe51a05e6ae7e353315959"},"headline":"Improving Talend Performance","datePublished":"2016-09-14T15:55:17+00:00","dateModified":"2025-01-31T20:50:01+00:00","mainEntityOfPage":{"@id":"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/"},"wordCount":1142,"commentCount":0,"publisher":{"@id":"https:\/\/www.iri.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/#primaryimage"},"thumbnailUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/Talend-Voracity-Chart-3.png","keywords":["AnalytiX DS","CATfx","Cloudera","DataStage","Eclipse","ETL","hadoop","HortonWorks","informatica","IRI CoSort","IRI RowGen","IRI Voracity","IRI Workbench","Java","MapR","pentaho","POC","SortCL","Talend Data Integration","Talend ETL","Talend Open studio 6","Talend Studio","transform mapping diagram","workflow"],"articleSection":["Data Transformation"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/","url":"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/","name":"Improving Talend Performance - 
IRI","isPartOf":{"@id":"https:\/\/www.iri.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/#primaryimage"},"image":{"@id":"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/#primaryimage"},"thumbnailUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/Talend-Voracity-Chart-3.png","datePublished":"2016-09-14T15:55:17+00:00","dateModified":"2025-01-31T20:50:01+00:00","description":"Uncover the strengths and limitations of Talend for processing large data sets. Learn to speed Talend transforms with an IRI CoSort call out.","breadcrumb":{"@id":"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/#primaryimage","url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/Talend-Voracity-Chart-3.png","contentUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/Talend-Voracity-Chart-3.png","width":1110,"height":490,"caption":"Talend-IRI Voracity Comparison Chart"},{"@type":"BreadcrumbList","@id":"https:\/\/www.iri.com\/blog\/data-transformation2\/optimizing-talend-transforms-with-cosort\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.iri.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Improving Talend Performance"}]},{"@type":"WebSite","@id":"https:\/\/www.iri.com\/blog\/#website","url":"https:\/\/www.iri.com\/blog\/","name":"IRI","description":"Total Data Management 
Blog","publisher":{"@id":"https:\/\/www.iri.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.iri.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.iri.com\/blog\/#organization","name":"IRI","url":"https:\/\/www.iri.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png","contentUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png","width":750,"height":206,"caption":"IRI"},"image":{"@id":"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/27e0bb127bbe51a05e6ae7e353315959","name":"Tahir Aziz","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/5d03f6244068d53050ed526ac4b32924?s=96&d=blank&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/5d03f6244068d53050ed526ac4b32924?s=96&d=blank&r=g","caption":"Tahir 
Aziz"},"url":"https:\/\/www.iri.com\/blog\/author\/taziz\/"}]}},"jetpack_featured_media_url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/Talend-Voracity-Chart-3.png","_links":{"self":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/10530"}],"collection":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/users\/102"}],"replies":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/comments?post=10530"}],"version-history":[{"count":16,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/10530\/revisions"}],"predecessor-version":[{"id":18231,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/10530\/revisions\/18231"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/media\/10538"}],"wp:attachment":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/media?parent=10530"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/categories?post=10530"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/tags?post=10530"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}