{"id":11993,"date":"2018-02-05T09:55:09","date_gmt":"2018-02-05T14:55:09","guid":{"rendered":"http:\/\/www.iri.com\/blog\/?p=11993"},"modified":"2025-02-06T14:26:22","modified_gmt":"2025-02-06T19:26:22","slug":"hipaa-re-id-risk-scoring","status":"publish","type":"post","link":"https:\/\/www.iri.com\/blog\/data-protection\/hipaa-re-id-risk-scoring\/","title":{"rendered":"Scoring Datasets for Re-ID Risk"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">One of the biggest concerns with releasing a dataset is the risk that a potential attacker can identify the people behind particular records. Even though masking or removing unique identifiers, like names and Social Security Numbers, can reduce that risk substantially, it may still not be enough. Harvard professor Latanya Sweeney reported that 87% of the U.S. population can likely be uniquely identified using only their gender, date of birth, and 5-digit ZIP code<span id='easy-footnote-1-11993' class='easy-footnote-margin-adjust'><\/span><span class='easy-footnote'><a href='https:\/\/www.iri.com\/blog\/data-protection\/hipaa-re-id-risk-scoring\/#easy-footnote-bottom-1-11993' title='&lt;a href=&quot;https:\/\/epic.org\/privacy\/reidentification\/Sweeney_Article.pdf&quot;&gt;https:\/\/epic.org\/privacy\/reidentification\/Sweeney_Article.pdf&lt;\/a&gt;'><sup>1<\/sup><\/a><\/span><\/span><span style=\"font-weight: 400;\">. 
<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To limit the impact of data breaches and comply with the Health Insurance Portability and Accountability Act (HIPAA) and GDPR Recital 26, you should also de-identify such \u201cquasi-identifiers\u201d (along with \u201ckey identifiers\u201d like name and SSN) to the point where the risk of re-identification is statistically very small<span id='easy-footnote-2-11993' class='easy-footnote-margin-adjust'><\/span><span class='easy-footnote'><a href='https:\/\/www.iri.com\/blog\/data-protection\/hipaa-re-id-risk-scoring\/#easy-footnote-bottom-2-11993' title='The HIPAA Expert Determination Method (EDM) requires a &lt;a href=&quot;https:\/\/www.hhs.gov\/hipaa\/for-professionals\/special-topics\/de-identification\/index.html#standard&quot;&gt;very low risk of re-identification&lt;\/a&gt;.'><sup>2<\/sup><\/a><\/span><\/span><span style=\"font-weight: 400;\">. IRI <\/span><a href=\"\/solutions\/data-masking\"><span style=\"font-weight: 400;\">data masking<\/span><\/a><span style=\"font-weight: 400;\"> software is designed for this purpose, particularly for anonymizing healthcare data (PHI) for privacy compliance, but how do you assess the results?\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This article shows how to credibly score the HIPAA re-identification risk of structured data de-identified with IRI <\/span><a href=\"\/products\/fieldshield\"><span style=\"font-weight: 400;\">FieldShield<\/span><\/a><span style=\"font-weight: 400;\">, <\/span><a href=\"\/products\/cellshield\"><span style=\"font-weight: 400;\">CellShield<\/span><\/a><span style=\"font-weight: 400;\"> or <a href=\"https:\/\/www.iri.com\/products\/darkshield\">DarkShield<\/a><\/span><span style=\"font-weight: 400;\"> using the risk scoring wizard now available as a plugin to <\/span><a href=\"\/products\/workbench\"><span style=\"font-weight: 400;\">IRI Workbench<\/span><\/a><span style=\"font-weight: 400;\">. 
While the wizard was designed with the HIPAA Privacy Rule\u2019s Expert Determination Method (EDM) in mind, student data privacy laws like <a href=\"https:\/\/www.iri.com\/solutions\/data-masking\/ferpa\">FERPA<\/a> also warrant this kind of approach.<\/span><\/p>\n<h2><b>Loading Data<\/b><\/h2>\n<figure id=\"attachment_11994\" class=\"thumbnail wp-caption aligncenter\" style=\"width: 610px\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-11994\" src=\"\/blog\/wp-content\/uploads\/2018\/02\/arx-risk-scoring-gui-1-1024x592.png\" alt=\"FieldShield New Field Rule Wizard\" width=\"600\" height=\"347\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/arx-risk-scoring-gui-1-1024x592.png 1024w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/arx-risk-scoring-gui-1-300x173.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/arx-risk-scoring-gui-1-768x444.png 768w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/arx-risk-scoring-gui-1.png 1165w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><figcaption class=\"caption wp-caption-text\">IRI FieldShield data masking job in the IRI Voracity platform\u2019s Eclipse IDE, IRI Workbench<\/figcaption><\/figure>\n<p><span style=\"font-weight: 400;\">In our example, we analyze the re-ID risk of a delimited file whose unique identifiers have already been masked by FieldShield. The goal now is to verify that the dataset cannot be re-identified through its quasi-identifiers alone.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To open the Re-ID Scoring wizard, select <\/span><i><span style=\"font-weight: 400;\">New Re-ID Risk Scoring <\/span><\/i><span style=\"font-weight: 400;\">from the FieldShield <\/span><span style=\"font-weight: 400;\">menu. 
You may also navigate to the wizard by selecting <\/span><i><span style=\"font-weight: 400;\">File -&gt; New -&gt; Other <\/span><\/i><span style=\"font-weight: 400;\">and then choosing <\/span><i><span style=\"font-weight: 400;\">Re-ID Risk Scoring <\/span><\/i><span style=\"font-weight: 400;\">from the <\/span><i><span style=\"font-weight: 400;\">IRI <\/span><\/i><span style=\"font-weight: 400;\">category.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-12061\" src=\"\/blog\/wp-content\/uploads\/2018\/02\/workbench-new-risk-scorer.png\" alt=\"new risk scorer\" width=\"600\" height=\"370\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/workbench-new-risk-scorer.png 880w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/workbench-new-risk-scorer-300x185.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/workbench-new-risk-scorer-768x473.png 768w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/p>\n<h2><b>Specifying the Data Source<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">Once you have opened the Re-ID Scoring wizard, you are prompted to select the source of the dataset that you wish to score. The wizard accepts either flat, character-delimited files, like Comma-Separated Values (CSV) or Tab-Separated Values (TSV) files, or a database table connected in Workbench via its <\/span><a href=\"https:\/\/www.eclipse.org\/datatools\/\"><span style=\"font-weight: 400;\">Data Tools Platform<\/span><\/a><span style=\"font-weight: 400;\"> (DTP) plug-in, as seen in the Data Source Explorer.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Depending on your choice, the next page asks for specific configuration details. For flat files, you must provide the file path, along with the character set, delimiter, and frame character (the character, such as a quotation mark, that encloses field values) used within the dataset. 
For databases, a <\/span><i><span style=\"font-weight: 400;\">Connection Profile<\/span><\/i><span style=\"font-weight: 400;\"> from one of the DTP connections and the corresponding <\/span><i><span style=\"font-weight: 400;\">Database Table<\/span><\/i><span style=\"font-weight: 400;\"> must be selected from a dropdown list, or created from a dialog in the wizard. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">Both file and table scoring require you to specify the <\/span><i><span style=\"font-weight: 400;\">Superset Region<\/span><\/i><span style=\"font-weight: 400;\"> that will be used to estimate the uniqueness of your dataset against a broader population. The region should match the country or region that most of the records in your dataset come from.<\/span><\/p>\n<h2><b>Previewing Your Data<\/b><\/h2>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-12060\" src=\"\/blog\/wp-content\/uploads\/2018\/02\/workbench-new-re-id-risk-scoring.png\" alt=\"new de-id risk scoring preview\" width=\"600\" height=\"428\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/workbench-new-re-id-risk-scoring.png 611w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/workbench-new-re-id-risk-scoring-300x214.png 300w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">To help you confirm that the data source is configured correctly, a preview page shows a subset of your data as the wizard parses it with your configuration. For flat files, the <\/span><i><span style=\"font-weight: 400;\">delimiter<\/span><\/i><span style=\"font-weight: 400;\"> and <\/span><i><span style=\"font-weight: 400;\">frame<\/span><\/i><span style=\"font-weight: 400;\"> characters determine how the file is split into records and how the attributes within each record are separated from one another. 
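The following minimal sketch shows why the delimiter and frame characters matter. It uses Python's standard csv module, where the frame character corresponds to the quotechar parameter. This is not how the wizard itself parses files. The sample rows are hypothetical, and the pipe is used as the frame character only for illustration. With the right frame character, a delimiter inside a framed value stays within one field. With the wrong one, that field is split and each record gains an extra attribute.

```python
import csv

# Hypothetical sample rows: the second field contains the delimiter (a comma),
# so it is enclosed in a frame character. The pipe plays the frame role here
# purely for illustration.
ROWS = ['1,|Boston, MA|,02139', '2,|Cambridge, MA|,02141']


def parse(lines, delimiter, frame):
    # Split each line into fields; the csv module calls the frame character quotechar.
    return list(csv.reader(lines, delimiter=delimiter, quotechar=frame))


# Correct configuration: the framed comma stays inside its field.
good = parse(ROWS, delimiter=',', frame='|')
# Wrong frame character: the embedded comma splits the field in two.
bad = parse(ROWS, delimiter=',', frame='~')

for label, parsed in (('correct frame', good), ('wrong frame', bad)):
    # Print the field count of each row, then the first parsed row.
    print(label, [len(fields) for fields in parsed], parsed[0])
```

With the correct frame character, each row parses into 3 fields. With the wrong one, each row parses into 4. The preview page helps you catch this kind of misalignment before scoring.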
<\/span><\/p>\n<p><span style=\"font-weight: 400;\">If your dataset is incorrectly delimited or shows an error in the preview, go back to the previous step and correct the configuration options. For flat files, there is an additional option to specify whether the first line of the file is a header row; if it is not, default field names are assigned and used in the risk score.<\/span><\/p>\n<h2><b>Labeling Attributes<\/b><\/h2>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-12062\" src=\"\/blog\/wp-content\/uploads\/2018\/02\/workbench-risk-scoring-attributes.png\" alt=\"risk scoring attributes\" width=\"600\" height=\"428\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/workbench-risk-scoring-attributes.png 614w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/workbench-risk-scoring-attributes-300x214.png 300w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Once you are satisfied with the layout and content of your data, label each attribute according to the risk of exposing it and how a potential attacker could use it. The labels, in decreasing order of re-identification risk, are:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><i><span style=\"font-weight: 400;\">Identifying <\/span><\/i><span style=\"font-weight: 400;\">&#8211; Attributes that directly identify a person and are typically unique to each record in the dataset. Examples: Name, Social Security Number.<\/span><\/li>\n<li style=\"font-weight: 400;\"><i><span style=\"font-weight: 400;\">Quasi-Identifying<\/span><\/i><span style=\"font-weight: 400;\"> &#8211; Attributes that, in combination with other quasi-identifiers, can be used to re-identify a record. 
Examples: Age, race, ZIP code.<\/span><\/li>\n<li style=\"font-weight: 400;\"><i><span style=\"font-weight: 400;\">Sensitive <\/span><\/i><span style=\"font-weight: 400;\">&#8211; Private information that would be exposed if a record were re-identified. Examples: Medical conditions, salary.<\/span><\/li>\n<li style=\"font-weight: 400;\"><i><span style=\"font-weight: 400;\">Insensitive <\/span><\/i><span style=\"font-weight: 400;\">&#8211; Attributes that cannot be used to re-identify a record or reveal sensitive information about a person. Examples: an already redacted or encrypted name, country.<\/span><\/li>\n<\/ul>\n<h2><b>Generating Output<\/b><\/h2>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-12063\" src=\"\/blog\/wp-content\/uploads\/2018\/02\/workbench-risk-scoring-output.png\" alt=\"risk scoring output\" width=\"600\" height=\"428\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/workbench-risk-scoring-output.png 614w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/workbench-risk-scoring-output-300x214.png 300w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">The last page of the wizard lets you specify an output folder in Workbench and a name for the output files. From the information collected, the wizard generates a <\/span><i><span style=\"font-weight: 400;\">.riskscorer<\/span><\/i><span style=\"font-weight: 400;\"> model file, which you can use to show your risk score again or to modify the configuration later without re-creating it from scratch in another wizard session.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The output page also provides an option for creating an HTML report based on the generated score. 
Once you are satisfied with your input, select <\/span><i><span style=\"font-weight: 400;\">Finish <\/span><\/i><span style=\"font-weight: 400;\">to generate your results. They will appear in Workbench and, optionally, in the folder you specified.<\/span><\/p>\n<h2><b>Results: Risk Score<\/b><\/h2>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-12054\" src=\"\/blog\/wp-content\/uploads\/2018\/02\/risk-score-results-1024x565.png\" alt=\"risk score graphs and charts\" width=\"600\" height=\"331\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/risk-score-results-1024x565.png 1024w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/risk-score-results-300x165.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/risk-score-results-768x423.png 768w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/risk-score-results.png 1219w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">The score view quantifies the risk of re-identification based on the quasi-identifiers in the dataset. 
The risk is calculated for three attacker models:<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\"><i><span style=\"font-weight: 400;\">Prosecutor <\/span><\/i><span style=\"font-weight: 400;\">&#8211; an attacker who knows that a specific target is in the dataset and uses background knowledge about that person to find their record.<\/span><\/li>\n<li style=\"font-weight: 400;\"><i><span style=\"font-weight: 400;\">Journalist <\/span><\/i><span style=\"font-weight: 400;\">&#8211; an attacker with no background knowledge about anyone in the dataset, who is trying to re-identify an arbitrary record.<\/span><\/li>\n<li style=\"font-weight: 400;\"><i><span style=\"font-weight: 400;\">Marketer <\/span><\/i><span style=\"font-weight: 400;\">&#8211; an attacker interested only in re-identifying as many records as possible.<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400;\">The score for your dataset is influenced by three factors: the number of records at risk, the highest risk that any single record will be re-identified, and the success rate for re-identifying a randomly selected record.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Below the dials is the \u201cEquivalence Class Chart\u201d showing how many records are currently at risk of being re-identified. An <\/span><i><span style=\"font-weight: 400;\">equivalence class<\/span><\/i><span style=\"font-weight: 400;\"> is the set of records that share the same combination of quasi-identifier values, which makes the records in the class indistinguishable from one another on those attributes alone. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">Larger equivalence classes give their records more protection from re-identification attacks. 
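The three factors above can be sketched for the prosecutor model by grouping records into equivalence classes. This is a minimal Python illustration under stated assumptions, not the wizard's own implementation:

- Each record's risk is taken as 1 divided by the size of its equivalence class.
- Records in classes smaller than 5 (the threshold cited in this section) count as at risk.
- The sample records and the names K_THRESHOLD and prosecutor_metrics are hypothetical.

```python
from collections import Counter

# Hypothetical masked records; each tuple holds only the quasi-identifiers
# (age band, sex, 3-digit ZIP prefix).
RECORDS = [
    ('30-39', 'F', '021'), ('30-39', 'F', '021'), ('30-39', 'F', '021'),
    ('30-39', 'F', '021'), ('30-39', 'F', '021'),
    ('40-49', 'M', '021'), ('40-49', 'M', '021'),
    ('50-59', 'F', '100'),
]

# Assumed threshold: classes with fewer than 5 records count as at risk.
K_THRESHOLD = 5


def prosecutor_metrics(records, k=K_THRESHOLD):
    # Equivalence classes: records sharing identical quasi-identifier values.
    class_sizes = Counter(records)
    n = len(records)
    # A record's risk is 1 / (size of its class), so small classes mean high risk.
    at_risk = sum(size for size in class_sizes.values() if k > size)
    return {
        'class_sizes': dict(class_sizes),
        # Share of records whose class is smaller than k.
        'records_at_risk': at_risk / n,
        # Risk of the most exposed record (the one in the smallest class).
        'highest_risk': 1 / min(class_sizes.values()),
        # Average per-record risk, which simplifies to classes / records.
        'success_rate': len(class_sizes) / n,
    }


metrics = prosecutor_metrics(RECORDS)
for name, value in metrics.items():
    print(name, value)
```

In these sample records, three of the eight records sit in classes smaller than 5. As a result, records_at_risk and success_rate both come out to 0.375. highest_risk is 1.0 because one record is unique on its quasi-identifiers.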
Although no standard fixes a minimum equivalence class size, peer-reviewed research suggests that every record should belong to an equivalence class of 5 or more records (that is, a re-identification risk by chance of at most 1 in 5, or 20%)<span id='easy-footnote-3-11993' class='easy-footnote-margin-adjust'><\/span><span class='easy-footnote'><a href='https:\/\/www.iri.com\/blog\/data-protection\/hipaa-re-id-risk-scoring\/#easy-footnote-bottom-3-11993' title='&lt;a href=&quot;https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC2826964\/&quot;&gt;https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC2826964\/&lt;\/a&gt;'><sup>3<\/sup><\/a><\/span><\/span><span style=\"font-weight: 400;\">. The chart reflects this threshold with red and green bars, which show the number of records that are at risk and safe, respectively.<\/span><\/p>\n<h2><b>Results: Quasi-Identifier Risks<\/b><\/h2>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-12052\" src=\"\/blog\/wp-content\/uploads\/2018\/02\/risk-score-quasi-identifier-risks-1024x565.png\" alt=\"quasi-identifier risks\" width=\"600\" height=\"331\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/risk-score-quasi-identifier-risks-1024x565.png 1024w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/risk-score-quasi-identifier-risks-300x166.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/risk-score-quasi-identifier-risks-768x424.png 768w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/risk-score-quasi-identifier-risks.png 1217w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Recall that <\/span><i><span style=\"font-weight: 400;\">quasi-identifiers <\/span><\/i><span style=\"font-weight: 400;\">are attributes that can help identify a record in a dataset, especially when combined with other quasi-identifiers. 
For example, age alone narrows down who a person might be, but age and race together narrow the possibilities even further. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">To represent the risk associated with different quasi-identifier combinations, IRI provides a view showing how each combination affects re-identification risk. The view is based on two quasi-identifier metrics: <\/span><i><span style=\"font-weight: 400;\">distinction <\/span><\/i><span style=\"font-weight: 400;\">and <\/span><i><span style=\"font-weight: 400;\">separation<\/span><\/i><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><i><span style=\"font-weight: 400;\">Distinction<\/span><\/i><span style=\"font-weight: 400;\"> is the ratio of the number of distinct value combinations of the quasi-identifiers to the total number of records. <\/span><i><span style=\"font-weight: 400;\">Separation <\/span><\/i><span style=\"font-weight: 400;\">is the ratio of the number of record pairs that differ in at least one quasi-identifier value to the total number of ways that two different records can be paired<span id='easy-footnote-4-11993' class='easy-footnote-margin-adjust'><\/span><span class='easy-footnote'><a href='https:\/\/www.iri.com\/blog\/data-protection\/hipaa-re-id-risk-scoring\/#easy-footnote-bottom-4-11993' title=' &lt;a href=&quot;https:\/\/ieeexplore.ieee.org\/abstract\/document\/9652545&quot;&gt;https:\/\/ieeexplore.ieee.org\/abstract\/document\/9652545&lt;\/a&gt;\u00a0'><sup>4<\/sup><\/a><\/span><\/span><span style=\"font-weight: 400;\">. 
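Both metrics can be computed directly from the definitions above. The following Python sketch is an illustration only and is not the code behind the IRI view. The records and the helper names project, distinction and separation are hypothetical. It evaluates the two metrics for a chain of quasi-identifier combinations, similar in spirit to the dropdown tree described in this section.

```python
from itertools import combinations

# Hypothetical records; the quasi-identifiers are (age, sex, ZIP code).
RECORDS = [
    (34, 'F', '02139'),
    (34, 'F', '02142'),
    (34, 'M', '02139'),
    (51, 'F', '02141'),
]


def project(records, columns):
    # Keep only the quasi-identifier columns in the chosen combination.
    return [tuple(record[c] for c in columns) for record in records]


def distinction(records, columns):
    # Number of distinct value combinations divided by the number of records.
    values = project(records, columns)
    return len(set(values)) / len(values)


def separation(records, columns):
    # Share of all record pairs that differ in at least one chosen column.
    values = project(records, columns)
    pairs = list(combinations(values, 2))
    return sum(1 for a, b in pairs if a != b) / len(pairs)


# Chain quasi-identifiers together: age, then age + sex, then all three.
for columns in ([0], [0, 1], [0, 1, 2]):
    print(columns, distinction(RECORDS, columns),
          round(separation(RECORDS, columns), 3))
```

In these sample records, both metrics rise from 0.5 for age alone to 1.0 when age, sex, and ZIP code are combined. At that point, every record is unique on the full combination.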
In general, higher distinction and separation values indicate that the quasi-identifiers are more likely to re-identify a record.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To see the distinction and separation values of different quasi-identifier combinations, expand a quasi-identifier in the dropdown tree to show its values when it is combined with each of the remaining quasi-identifiers. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">The graph on the left updates its data points to reflect the distinction and separation values of the quasi-identifier combinations you select in the dropdown tree on the right, which chains several quasi-identifiers together. The last level in the tree shows the distinction and separation values when every quasi-identifier in your dataset is used together to re-identify records.<\/span><\/p>\n<h2><b>Results: Report<\/b><\/h2>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-12053\" src=\"\/blog\/wp-content\/uploads\/2018\/02\/risk-score-report.png\" alt=\"risk scoring report uniqueness estimate\" width=\"600\" height=\"371\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/risk-score-report.png 989w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/risk-score-report-300x185.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/risk-score-report-768x474.png 768w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">The wizard can also generate an HTML report, which is saved to the project folder you specified. The report provides a more formal and detailed representation of the results. 
Along with the tabs for the different attacker risks, quasi-identifiers, and equivalence classes, the report has additional information on the dataset\u2019s uniqueness that is not displayed in the two views discussed above.<\/span><\/p>\n<h2><b>Re-Scoring Dataset and Showing Views<\/b><\/h2>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-12055\" src=\"\/blog\/wp-content\/uploads\/2018\/02\/risk-scoring-view.png\" alt=\"re-scoring data risk with views\" width=\"600\" height=\"275\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/risk-scoring-view.png 982w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/risk-scoring-view-300x137.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/risk-scoring-view-768x352.png 768w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">To view or modify the risk score model\u2019s quasi-identifier attributes outside the wizard where you first labeled them, right-click on the <\/span><i><span style=\"font-weight: 400;\">.riskscorer <\/span><\/i><span style=\"font-weight: 400;\">file within your project. 
From the IRI options in the context menu, choose either <\/span><i><span style=\"font-weight: 400;\">show<\/span><\/i><span style=\"font-weight: 400;\"> to display your risk score again (the risk score and quasi-identifier views for that model) or <\/span><i><span style=\"font-weight: 400;\">edit <\/span><\/i><span style=\"font-weight: 400;\">to reopen your risk score model in the wizard, so that you can re-use it without re-labeling all the attributes.<\/span><\/p>\n<h2><b>Next Steps<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">Depending on the score, you may need FieldShield to perform further de-identification, e.g., <a href=\"https:\/\/www.iri.com\/solutions\/data-masking\/static-data-masking\">masking<\/a> missed key identifiers, or <a href=\"https:\/\/www.iri.com\/solutions\/data-masking\/static-data-masking\/blur\">generalizing \/ blurring<\/a> quasi-identifiers. IRI can help you with these processes, and can also refer you to the services of a certified HIPAA statistician and legal counsel who can help you assess and use the results, as well as prepare certification and breach defense paperwork. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">IRI offers a course covering all of the above, described <\/span><a href=\"\/ftp9\/pdf\/FieldShield\/HIPAA_Data_Certification_Course_Outline.pdf\"><span style=\"font-weight: 400;\">here<\/span><\/a><span style=\"font-weight: 400;\">. 
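The generalizing (blurring) option mentioned above can be illustrated with a short Python sketch. The following choices are all hypothetical and do not represent FieldShield's blurring functions:

- the records
- the 10-year age bands
- the 3-digit ZIP truncation
- the helper names generalize and smallest_class

The point is only that coarser quasi-identifier values merge records into larger equivalence classes.

```python
from collections import Counter

# Hypothetical quasi-identifiers: (age, 5-digit ZIP code).
RECORDS = [
    (34, '02139'), (36, '02138'), (31, '02139'), (38, '02141'), (33, '02142'),
    (45, '10001'), (47, '10002'), (41, '10003'), (44, '10001'), (49, '10004'),
]


def generalize(record):
    # Blur age into a 10-year band and keep only the first 3 ZIP digits.
    age, zip_code = record
    low = age // 10 * 10
    return (f'{low}-{low + 9}', zip_code[:3])


def smallest_class(records):
    # Size of the smallest equivalence class; 1 means some record is unique.
    return min(Counter(records).values())


before = smallest_class(RECORDS)
after = smallest_class([generalize(r) for r in RECORDS])
print('smallest class before:', before)
print('smallest class after:', after)
print('meets threshold of 5:', after >= 5)
```

Here every record starts out unique (smallest class of 1). After generalization, the smallest class has 5 records. Re-scoring the actual de-identified data in the wizard remains the way to confirm the new risk.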
\u00a0<\/span><span style=\"font-weight: 400;\">If you need help using the technologies together for HIPAA compliance or another application in which re-identifiability risk is at issue, please contact your IRI <\/span><a href=\"\/company\/contact\"><span style=\"font-weight: 400;\">representative<\/span><\/a><span style=\"font-weight: 400;\"> or expert <\/span><a href=\"\/partners\/experts\/data-masking\"><span style=\"font-weight: 400;\">partner<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the biggest concerns with releasing a dataset is the risk that a potential attacker can identify the owners of particular records. Even though masking or removing unique identifiers, like names and Social Security Numbers, can reduce that risk substantially, it may still not be enough. Harvard professor Latanya Sweeney reported that 87% of<\/p>\n<div><a class=\"btn-filled btn\" href=\"https:\/\/www.iri.com\/blog\/data-protection\/hipaa-re-id-risk-scoring\/\" title=\"Scoring Datasets for Re-ID Risk\">Read 
More<\/a><\/div>\n","protected":false},"author":112,"featured_media":12064,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[8,34],"tags":[20,14,1305,211,13,1300,1299,1301,1352,9,1219,1751,1354,412,604,1377,1752,1296],"class_list":["post-11993","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-protection","category-business","tag-data-anonymization","tag-data-masking","tag-data-masking-tools","tag-data-privacy-laws","tag-data-protection-2","tag-data-risk","tag-de-id","tag-de-id-phi","tag-ferpa","tag-fieldshield","tag-gdpr","tag-healthcare-data","tag-hipaa-expert-determination","tag-phi","tag-protected-health-information","tag-re-id-risk-determination","tag-re-identification","tag-risk-scoring"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v23.4 (Yoast SEO v23.4) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Scoring Datasets for Re-ID Risk - IRI<\/title>\n<meta name=\"description\" content=\"Understand the role of re-ID risk determination in HIPAA compliance. Learn how IRI FieldShield users score and lower re-identification risk.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.iri.com\/blog\/data-protection\/hipaa-re-id-risk-scoring\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Scoring Datasets for Re-ID Risk\" \/>\n<meta property=\"og:description\" content=\"Understand the role of re-ID risk determination in HIPAA compliance. 
Learn how IRI FieldShield users score and lower re-identification risk.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.iri.com\/blog\/data-protection\/hipaa-re-id-risk-scoring\/\" \/>\n<meta property=\"og:site_name\" content=\"IRI\" \/>\n<meta property=\"article:published_time\" content=\"2018-02-05T14:55:09+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-02-06T19:26:22+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/risk-scoring-charts.png\" \/>\n\t<meta property=\"og:image:width\" content=\"804\" \/>\n\t<meta property=\"og:image:height\" content=\"246\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Dmitry Kulakov\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Dmitry Kulakov\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/hipaa-re-id-risk-scoring\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/hipaa-re-id-risk-scoring\/\"},\"author\":{\"name\":\"Dmitry Kulakov\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/6434d748d01ce766d6a2ff576d747cfb\"},\"headline\":\"Scoring Datasets for Re-ID Risk\",\"datePublished\":\"2018-02-05T14:55:09+00:00\",\"dateModified\":\"2025-02-06T19:26:22+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/hipaa-re-id-risk-scoring\/\"},\"wordCount\":1638,\"commentCount\":1,\"publisher\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/hipaa-re-id-risk-scoring\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/risk-scoring-charts.png\",\"keywords\":[\"data anonymization\",\"data masking\",\"data masking tools\",\"data privacy laws\",\"data protection\",\"data risk\",\"de-id\",\"de-id phi\",\"FERPA\",\"FieldShield\",\"GDPR\",\"healthcare data\",\"HIPAA Expert Determination\",\"PHI\",\"protected health information\",\"re-ID risk determination\",\"re-identification\",\"risk scoring\"],\"articleSection\":[\"Data Masking\/Protection\",\"IRI Business\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.iri.com\/blog\/data-protection\/hipaa-re-id-risk-scoring\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/hipaa-re-id-risk-scoring\/\",\"url\":\"https:\/\/www.iri.com\/blog\/data-protection\/hipaa-re-id-risk-scoring\/\",\"name\":\"Scoring Datasets for Re-ID Risk - 
IRI\",\"isPartOf\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/hipaa-re-id-risk-scoring\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/hipaa-re-id-risk-scoring\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/risk-scoring-charts.png\",\"datePublished\":\"2018-02-05T14:55:09+00:00\",\"dateModified\":\"2025-02-06T19:26:22+00:00\",\"description\":\"Understand the role of re-ID risk determination in HIPAA compliance. Learn how IRI FieldShield users score and lower re-identification risk.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/hipaa-re-id-risk-scoring\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.iri.com\/blog\/data-protection\/hipaa-re-id-risk-scoring\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/hipaa-re-id-risk-scoring\/#primaryimage\",\"url\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/risk-scoring-charts.png\",\"contentUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2018\/02\/risk-scoring-charts.png\",\"width\":804,\"height\":246,\"caption\":\"workbench risk scoring charts\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/hipaa-re-id-risk-scoring\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.iri.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Scoring Datasets for Re-ID Risk\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.iri.com\/blog\/#website\",\"url\":\"https:\/\/www.iri.com\/blog\/\",\"name\":\"IRI\",\"description\":\"Total Data Management 
Blog\",\"publisher\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.iri.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\",\"name\":\"IRI\",\"url\":\"https:\/\/www.iri.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"contentUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"width\":750,\"height\":206,\"caption\":\"IRI\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/6434d748d01ce766d6a2ff576d747cfb\",\"name\":\"Dmitry Kulakov\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/c74394d71a1044376b7336db0b3ab4c7?s=96&d=blank&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/c74394d71a1044376b7336db0b3ab4c7?s=96&d=blank&r=g\",\"caption\":\"Dmitry Kulakov\"},\"url\":\"https:\/\/www.iri.com\/blog\/author\/dmitryk\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->"}