Quasi-identifiers, or indirect identifiers, are personal attributes that are true about, but not necessarily unique, to an individual. Examples are one’s age or date of birth, race, salary, educational attainment, occupation, marital status and zip code.
One of the biggest concerns with releasing a dataset is the risk that a potential attacker can identify the owners of particular records. Even though masking or removing unique identifiers, like names and Social Security Numbers, can reduce that risk substantially, it may still not be enough.
According to Simson L. Garfinkel at the NIST Information Access Division’s Information Technology Laboratory,
De-identification is not a single technique, but a collection of approaches, algorithms, and tools that can be applied to different kinds of data with differing levels of effectiveness.