Scrub/Cleanse 
"Data scrubbing is about as enjoyable as cleaning an encrusted frying pan with a worn sponge. Specialized cleansing tools can be expensive, ranging in price from $20,000 to $300,000, depending on the scope of the project and the systems involved..."
John Edwards, CIO Magazine
Available Melissa Data Cleansing Functions:

• AddressObject
• AddressDoctor
• CleanAddress
• NameObject
• DQ*Plus
• DPV
• PhoneObject
• GeoCoder Object
• RBDI
• PersonatorAPI
• RightFielderAPI
• DoubleTakeAPI
Transform and Clean Big Data in the Same Pass
Challenges:
Data cleansing can be complicated, time-consuming, and expensive. The data quality functions inside your tools may not satisfy your business rules or do the whole job. Custom functions may have to run in separate batch steps, or inside a special "script transform component" that you must connect to your tool's data flow and run in smaller chunks. With large data volumes, cleansing time adds up quickly. The bottom line? If you have more than one million rows, improving data quality can become an inefficient and cumbersome process.

Solutions:
The CoSort package's SortCL tool can scrub many large files in the same job in which it transforms, protects, and/or reports on them. Native scrubbing functions, which you can use alone or in combination, include:
• de-duplication
• character validation
• data homogenization
• find (scan) and replace
• horizontal and conditional vertical selection
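The point of combining these functions is that each record is read only once. The following is a minimal Python sketch of that single-pass idea, not SortCL script syntax; the field names (`country`, `phone`, `name`, `street`) and the specific rules are hypothetical examples. Each record streams through row filtering, character validation, homogenization, find-and-replace, field selection, and de-duplication before it is emitted.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple


def scrub(records: Iterable[Dict[str, str]], key_fields: Tuple[str, ...]) -> Iterator[Dict[str, str]]:
    """Cleanse records in one pass (hypothetical rules, for illustration only)."""
    seen = set()
    for rec in records:
        # Conditional record (row) selection: hypothetical rule keeps US rows only.
        if rec.get("country") != "US":
            continue
        # Character validation: strip non-digits, then require a 10-digit phone.
        phone = re.sub(r"[^0-9]", "", rec.get("phone", ""))
        if len(phone) != 10:
            continue
        # Homogenization: collapse whitespace and apply consistent title case.
        name = " ".join(rec.get("name", "").split()).title()
        # Find and replace: expand a trailing "St" or "St." to "Street".
        street = re.sub(r"\bSt\.?$", "Street", rec.get("street", "").strip())
        # Field (column) selection: keep only the fields needed downstream.
        out = {"name": name, "phone": phone, "street": street}
        # De-duplication on the cleansed key fields, so variants collapse together.
        key = tuple(out[f] for f in key_fields)
        if key in seen:
            continue
        seen.add(key)
        yield out
```

Because de-duplication runs after homogenization, two inputs such as `"  jane   DOE "` / `"(555) 123-4567"` and `"Jane Doe"` / `"555.123.4567"` are recognized as the same record.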

For advanced cleansing based on complex business rules at the field level, you can plug in your own functions or those from data quality vendor libraries. SortCL supports custom transformations in either the inrec or the outfile phase of a job script, so you can declare a cleansing function for any field in either place (i.e., up to two DQ routines per field, per job). One example in the CoSort documentation uses a Melissa Data address standardization library.
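The two-phase model described above, with at most one routine per field on input and one on output, can be sketched in Python as follows. This is an illustration under stated assumptions, not the SortCL or Melissa Data API: `run_job`, the `inrec_hooks`/`outfile_hooks` dictionaries, and the routines `collapse_spaces` and `expand_abbrev` are all hypothetical, with `expand_abbrev` merely standing in for a vendor address-standardization call.

```python
from typing import Callable, Dict, Iterable, Iterator

FieldFn = Callable[[str], str]


def _identity(value: str) -> str:
    """Default when no routine is registered for a field in a given phase."""
    return value


def collapse_spaces(value: str) -> str:
    """Hypothetical input-phase routine: trim and collapse internal whitespace."""
    return " ".join(value.split())


def expand_abbrev(value: str) -> str:
    """Hypothetical output-phase routine standing in for a vendor address library."""
    return value.upper().replace("AVE.", "AVENUE")


def run_job(
    records: Iterable[Dict[str, str]],
    inrec_hooks: Dict[str, FieldFn],
    outfile_hooks: Dict[str, FieldFn],
    transform: Callable[[Dict[str, str]], Dict[str, str]] = dict,
) -> Iterator[Dict[str, str]]:
    """Apply up to two routines per field: one as records are read, one on output."""
    for rec in records:
        # Input phase: cleanse each field as the record is read.
        staged = {k: inrec_hooks.get(k, _identity)(v) for k, v in rec.items()}
        # Core processing between phases (sorting, joining, etc. in a real job).
        staged = transform(staged)
        # Output phase: a second routine on the same field before writing.
        yield {k: outfile_hooks.get(k, _identity)(v) for k, v in staged.items()}
```

For example, with `collapse_spaces` registered for `street` on input and `expand_abbrev` on output, the value `"  10  Park Ave. "` comes out as `"10 PARK AVENUE"`, while fields without hooks pass through unchanged.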

The bottom line? With CoSort and the data quality library functions you already have, you can cleanse your data in the same I/O pass in which you filter, transform, protect, and/or present it.

See also:
Select/Filter
Custom Transforms
Products > CoSort > SortCL
