
Using Data Templates to Find Data Format Errors
In Working towards Data Quality, we defined data quality (DQ) as a state in which data can be used for operations. Data quality is high when errors are scarce.
Note: This article, first published in 2014, refers to date reformatting technology in the SortCL data manipulation program central to the IRI CoSort data transformation product and larger IRI Voracity data management platform.
Introduction
In this article, I suggest ways to move your company’s data towards a higher state of quality. The highest quality occurs when the data meets the needs of your company.
Data profiling, or data discovery, refers to the process of obtaining information from, and descriptive statistics about, various sources of data. The purpose of data profiling is to get a better understanding of the content of data, as well as its structure, relationships, and current levels of accuracy and integrity.
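As a rough illustration of the descriptive statistics that profiling produces, here is a minimal Python sketch for a single column of string values. The function name `profile_column` and the sample values are hypothetical and chosen only for this example; they are not part of any IRI product.

```python
from collections import Counter


def profile_column(values):
    """Return basic descriptive statistics for one column of string values.

    Hypothetical helper for illustration only; None and blank strings
    are counted as missing.
    """
    non_empty = [v for v in values if v is not None and v.strip() != ""]
    lengths = [len(v) for v in non_empty]
    return {
        "count": len(values),
        "missing": len(values) - len(non_empty),
        "distinct": len(set(non_empty)),
        "min_length": min(lengths) if lengths else 0,
        "max_length": max(lengths) if lengths else 0,
        "most_common": Counter(non_empty).most_common(1)[0][0] if non_empty else None,
    }


# Made-up sample column mixing two date layouts and two missing values.
sample = ["2014-01-05", "2014-02-17", "", "17/02/2014", "2014-02-17", None]
stats = profile_column(sample)
print(stats)
```

Counts of missing and distinct values, together with length ranges, are often enough to show that a column mixes several formats and deserves a closer look.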
Note: This article was originally drafted in 2015, but was updated in 2019 to reflect new integration between IRI Voracity and KNIME (short for Konstanz Information Miner), now the most powerful open-source data mining platform available.
The increasing sophistication of software applications and the expanding role of database testers require high volumes of high-quality, realistic test data that can faithfully represent existing platforms and stress-test new ones.
One of the best ways to speed up big data processing operations is to avoid processing so much data in the first place; that is, to eliminate unnecessary data ahead of time.
Data validation is a process that ensures a program operates with clean, correct and useful data. It uses routines known as validation rules that systematically check for correctness and meaningfulness of data that are entered into the system.
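The idea of validation rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`id`, `email`, `signup_date`), the deliberately loose email pattern, and the `validate` helper are all hypothetical and do not reflect how any particular product defines its rules.

```python
import re
from datetime import datetime


def is_iso_date(value):
    """Return True if value parses as a real calendar date in YYYY-MM-DD form."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


# Hypothetical rules: each field name maps to a function returning True when valid.
RULES = {
    "id": lambda v: v.isdigit(),
    # Loose shape check only (something@something.something), not full address validation.
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "signup_date": is_iso_date,
}


def validate(record):
    """Return the names of the fields in record that fail their rule."""
    return [field for field, rule in RULES.items() if not rule(record.get(field, ""))]


good = {"id": "1001", "email": "ann@example.com", "signup_date": "2014-02-17"}
bad = {"id": "10A", "email": "bob@example", "signup_date": "2014-02-30"}

print(validate(good))
print(validate(bad))
```

Keeping the rules in a table separate from the checking loop means a new field or a stricter rule can be added without touching the code that applies them.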
IRI Voracity is an affordable data management platform that streamlines information architectures and helps enterprises leverage the intrinsic value of data without the cost or complexity of multiple tools.