Defining Startpoint Data Security

by David Friedland

IRI has discussed startpoint security in further detail with the Outlook Series in a segment about data masking.

This article defines what we’d like to call “startpoint security,” mostly by comparison with endpoint security. Searches for the term didn’t yield anything on point, so at least for me, this is a case of first impression and an attempt to coin a term. Whether it’s bold, bogus, or something in between is in the eye of the beholder, so please share your feedback via the comment button below.

We hear about endpoint security all the time. It refers to protection technologies for sensitive information in points along a network. Endpoint security covers mobile devices, laptops, and desktop PCs, as well as the servers and networking devices they connect through. It may also refer to storage devices like thumb and hard drives, and even more granular points within, including folders, files, and entire databases that can be encrypted, for example.

But what about securing data at its starting points; i.e., as it gets created by users and by the applications that feed columns in databases (or values in files)? These are the places where the actually sensitive data is created, and from which it is stored, queried, processed, and moved along to endpoints.

The data masking industry is built around the concept of atomically protecting the personally identifiable information (PII) directly in the data source. Securing PII directly at these startpoints instead of (or at least in addition to) its endpoints with different techniques has several benefits, including:

Efficiency – it’s much quicker (and less resource intensive) to encrypt or apply other de-identification functions to discrete values than to everything else around them
Usability – by masking only what’s sensitive, the data around it is still accessible
Breach nullification – any misappropriated data is already de-identified
Accountability – data lineage and audit logs pointing to specific element protections are a better way to verify compliance with privacy laws applicable to specific PII (identifiers).
Security – multiple data masking techniques are harder to reverse than a single endpoint protection technique. For example, suppose the encryption algorithm used to secure a network or hard drive is compromised. If that same algorithm had been used to mask just one field, while other fields were protected with other functions, the exposure would be far smaller.
Testing – masked production data can also be used for prototyping and benchmarking
Independence – data secured at its atomic source can move safely between databases, applications, and platforms (be they on-premises or in the cloud).
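
To make the field-level idea concrete, here is a minimal Python sketch of my own. It is not IRI's implementation or any product's API. The field names, the `MASKING_RULES` table, and the demo key are all hypothetical, and the three functions (a keyed hash, a partial redaction, and an email mask) are just stand-ins for whatever masking functions you would actually choose. The point is only that each sensitive value gets its own protection while everything around it stays readable.

```python
import hashlib
import hmac

# Hypothetical demo key; a real deployment would fetch this from a key manager.
SECRET_KEY = b"demo-key-not-for-production"


def pseudonymize(value: str) -> str:
    """Keyed hash (HMAC-SHA256), truncated to 16 hex characters.

    Deterministic, so the same input always yields the same token.
    """
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def redact_ssn(value: str) -> str:
    """Hide everything except the last four characters."""
    return "***-**-" + value[-4:]


def mask_email(value: str) -> str:
    """Keep the first character of the local part and the full domain."""
    local, _, domain = value.partition("@")
    return local[:1] + "***@" + domain


# Hypothetical rule table: one masking function per sensitive field.
MASKING_RULES = {
    "name": pseudonymize,
    "ssn": redact_ssn,
    "email": mask_email,
}


def mask_record(record: dict) -> dict:
    """Mask only the fields that have a rule; pass all others through unchanged."""
    return {
        field: MASKING_RULES[field](value) if field in MASKING_RULES else value
        for field, value in record.items()
    }


record = {
    "name": "Jane Doe",
    "ssn": "123-45-6789",
    "email": "jane@example.com",
    "state": "FL",
    "balance": 1042.50,
}
masked = mask_record(record)
```

Because each field has its own function, compromising one of them (say, the key behind `pseudonymize`) exposes only the name column, and the non-sensitive `state` and `balance` values remain usable for reporting or testing.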

For those in the data governance industry, it may seem as if I’m just creating a new buzzword for data masking called startpoint security. I am not, however, equating them. Under my definition, startpoint security would also take into consideration the following:

Data Discovery – the ability to find the PII via pattern, fuzzy-logic, and other searches
Data Classification – grouping discovered data into logical categories for global masking
Data Lineage – tracking PII value and/or location changes through time for surety, etc.
Data Latency – whether masking functions get applied to data at rest or in transit
Metadata Lineage – recording and analyzing the changes to layouts and job definitions
Authorization – managing who can mask, and/or access (restore), the data
Risk Scoring – determining the statistical likelihood of re-identification (think HIPAA)
Audit Logs – being able to query who masked what, and who saw what, when, and where.
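
As a rough illustration of the first two items, discovery and classification, here is a small Python sketch. It covers pattern searches only (no fuzzy logic), and the regular expressions, the `PATTERNS` and `DATA_CLASSES` tables, and the sample rows are all assumptions I made up for the example. Real discovery tools use far more robust matching than these simplified patterns.

```python
import re

# Simplified, hypothetical patterns; real PII discovery needs stricter rules.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

# Hypothetical data classes, so one masking rule can cover a whole group.
DATA_CLASSES = {
    "ssn": "government_id",
    "email": "contact",
    "phone": "contact",
}


def discover(rows):
    """Return {column: set of PII labels} for every column with a pattern hit."""
    findings = {}
    for row in rows:
        for column, value in row.items():
            for label, pattern in PATTERNS.items():
                if pattern.search(str(value)):
                    findings.setdefault(column, set()).add(label)
    return findings


def classify(findings):
    """Map each column's discovered labels to sorted, de-duplicated data classes."""
    return {
        column: sorted({DATA_CLASSES[label] for label in labels})
        for column, labels in findings.items()
    }


rows = [
    {"id": 1, "tax_id": "123-45-6789", "contact": "jane@example.com",
     "notes": "call 555-867-5309"},
    {"id": 2, "tax_id": "987-65-4321", "contact": "305-555-0100",
     "notes": "n/a"},
]

findings = discover(rows)
classes = classify(findings)
```

Note that the free-text `notes` column is flagged too, which is the kind of hit a purely column-name-based search would miss. Once columns are grouped into classes, a single masking rule per class can be applied globally.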

Many of these additional considerations are not exclusive to startpoint security, but I think we can agree that classification, lineage, and latency are more relevant in the data-centric realm than they are to endpoint security.

What’s your opinion? Please leave me a comment below and we’ll start the discussion there.
