IRI Blog Articles

How to Mask Data in Web Logs

by Chaitali Mitra

This article is the third in a three-part series on CLF and ELF web log data. The first article introduced the CLF and ELF web log formats, the second covered IRI solutions for processing web log data, and this one concludes the series by masking private data in web log files.

Web log files are created by, and stored on, web servers to record each visitor's clickstream trail. Some of the information in these logs is sensitive or personally identifiable.

As we know from articles in the data masking sections of IRI’s blog and website, there are multiple ways to shield personally identifiable information (PII) or otherwise sensitive data in structured sources. String masking, for example, covers over (or redacts) original values using other characters. Encryption, on the other hand, produces ciphertext that de-identifies the original value, but allows its restoration (decryption).

IRI FieldShield software protects PII in databases and many other data sources, including web logs, with multiple field-level security functions. FieldShield can mask, encrypt, and randomize IP addresses, as well as other items subject to data protection and privacy laws. Its other de-identification functions include pseudonymization, hashing, and sub-string manipulation.
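
To make the difference between these techniques concrete, here is a minimal Python sketch. It is not FieldShield code: the function names (`redact`, `keyed_hash`, `pseudonymize`), the demo key, and the substitute values are all hypothetical, and HMAC-SHA256 is simply a stand-in for a keyed one-way hash.

```python
import hashlib
import hmac


def redact(value, mask_char="*"):
    """String masking: cover every letter or digit, keep separators like dots."""
    return "".join(mask_char if ch.isalnum() else ch for ch in value)


def keyed_hash(value, key):
    """Hashing: one-way HMAC-SHA256 digest; the same input and key give the same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize(value, key, substitutes):
    """Pseudonymization: map a value to a stable substitute drawn from a list."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    return substitutes[int.from_bytes(digest[:4], "big") % len(substitutes)]


key = b"demo-key-not-for-production"  # hypothetical key, for illustration only
ip = "96.47.227.21"                   # client IP taken from the sample log

print(redact(ip))
print(keyed_hash(ip, key)[:16])
print(pseudonymize(ip, key, ["visitor-a", "visitor-b", "visitor-c"]))
```

Redaction and hashing cannot be reversed, whereas encryption can be undone with the right key. That is why the choice of function depends on whether the original value ever needs to be restored.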

Consider the sample Extended Log Format (ELF) file below. Each record contains the visit date, time, client IP address, server IP address, request method, port, status code, number of transferred bytes, and the URL of the opened page:

2014-05-24,12:55:15,32.09.130.15,96.48.225.22,GET,80,200,10801,"http://www.iri.com/products/fieldshield/why-is-fieldshield-better"
2014-05-24,20:55:15,96.47.227.21,96.46.220.42,GET,80,200,10801,"http://www.iri.com/solutions/data-masking/encryption/format-preserving-encryption"
2014-05-24,22:18:01,12.41.114.23,96.45.225.98,GET,80,200,10801,"http://www.iri.com/solutions/data-masking/de-identification/overview"
2014-05-24,13:15:06,96.46.230.79,96.47.126.99,GET,80,200,10801,"http://www.iri.com/products/workbench/fieldshield-gui/apply-rules"
2014-05-24,23:15:06,96.45.226.19,95.47.214.50,GET,80,200,10801,"http://www.iri.com/blog/data-protection/data-risk-fieldshield-mitigation/"
2014-05-25,23:15:22,11.11.111.11,95.47.214.50,GET,80,200,10801,"http://www.iri.com/blog/test-data/rowgen-v3-automates-database-test-data-generation/"
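
As a rough illustration of how such records break down into named fields, the Python sketch below reads two of the sample rows with the standard `csv` module. The field names are lowercase versions of those used in the FieldShield job script in this article. The `parse_log` helper is hypothetical, and the delimiter is left as a parameter because the listing here is comma-separated while the job script declares a space separator.

```python
import csv
import io

# Field names mirror the job script in this article (lowercased).
FIELDS = ["date", "time", "c_ip", "s_ip", "cs_method",
          "s_port", "status", "bytes", "cs_uri_stem"]

# Two rows copied from the sample log.
SAMPLE = (
    '2014-05-24,12:55:15,32.09.130.15,96.48.225.22,GET,80,200,10801,'
    '"http://www.iri.com/products/fieldshield/why-is-fieldshield-better"\n'
    '2014-05-25,23:15:22,11.11.111.11,95.47.214.50,GET,80,200,10801,'
    '"http://www.iri.com/blog/test-data/rowgen-v3-automates-database-test-data-generation/"\n'
)


def parse_log(text, delimiter=","):
    """Split each record on the delimiter; the quoted URL is unwrapped by csv."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar='"')
    return [dict(zip(FIELDS, row)) for row in reader if row]


records = parse_log(SAMPLE)
for rec in records:
    print(rec["c_ip"], rec["cs_uri_stem"])
```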

Use the Encryption and Decryption dialog in the IRI Workbench GUI for FieldShield to apply field-level encryption. Below is an example of encrypting each visitor’s IP address with a format-preserving AES-256 function:

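[Screenshot: the Encryption and Decryption dialog in IRI Workbench, applying format-preserving AES-256 encryption to the visitor IP address field]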

Similar dialogs exist for string masking, pseudonymization, randomization, hashing, de-ID, encoding, etc.

The portable FieldShield job script, created automatically in the GUI (or by hand, if you prefer), reflects both field encryption and redaction. It also omits the record whose client IP address is 11.11.111.11:

/INFILE=rawlog.elf
   /PROCESS=ELF
      /FIELD=(DATE, POSITION=1,TYPE=ASCII, SEPARATOR=" ")
      /FIELD=(TIME, POSITION=2,TYPE=ASCII, SEPARATOR=" ")
      /FIELD=(C_IP, POSITION=3, SEPARATOR=" ", TYPE=IP_ADDRESS)
      /FIELD=(S_IP, POSITION=4, SEPARATOR=" ", TYPE=IP_ADDRESS)
      /FIELD=(CSMETHOD, POSITION=5,TYPE=ASCII, SEPARATOR=" ") 
      /FIELD=(S_PORT, POSITION=6, SEPARATOR=" ")
      /FIELD=(STATUS, POSITION=7, SEPARATOR=" ")
      /FIELD=(BYTES, POSITION=8, SEPARATOR=" ")
      /FIELD=(CS_URI_STEM, POSITION=9, SEPARATOR=" ",TYPE=ASCII,FRAME='"')
   /OMIT WHERE C_IP EQ "11.11.111.11"

/OUTFILE=maskedlog.elf
   /PROCESS=ELF
   /HEADREC="DATE        TIME    MASKED IP    CS_URI_STEM\n\n"
      /FIELD=(DATE, POSITION=1, SIZE=12, TYPE=ASCII)
      /FIELD=(TIME, POSITION=15, SIZE=10, TYPE=ASCII)
      /FIELD=(ENC_AES256_C_IP=enc_fp_aes256_alphanum(C_IP), POSITION=30, SIZE=12, TYPE=IP_ADDRESS)
      /FIELD=(replace_chars(CS_URI_STEM, "*", 7, 8, "#", 30, 8), POSITION=45, SIZE=55, TYPE=ASCII)
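
The redaction in the last output field can be read as "replace 8 characters starting at position 7 with `*`, then 8 characters starting at position 30 with `#`". The Python sketch below emulates that reading so you can check it against the output in this article. It is an assumption inferred from the script arguments and the masked URLs shown here, not the FieldShield implementation, and the Python function is hypothetical despite sharing the name.

```python
def replace_chars(value, *args):
    """Overwrite character ranges given as (mask_char, start, length) triples.

    start is 1-based, matching the positions used in the job script.
    """
    if len(args) % 3 != 0:
        raise ValueError("expected (mask_char, start, length) triples")
    chars = list(value)
    for i in range(0, len(args), 3):
        mask_char, start, length = args[i], args[i + 1], args[i + 2]
        if start < 1 or length < 0:
            raise ValueError("start must be >= 1 and length >= 0")
        begin = start - 1
        end = min(begin + length, len(chars))  # never write past the end
        for pos in range(begin, end):
            chars[pos] = mask_char
    return "".join(chars)


url = "http://www.iri.com/products/fieldshield/why-is-fieldshield-better"
masked = replace_chars(url, "*", 7, 8, "#", 30, 8)

# The output field is 55 characters wide, so truncate to compare.
print(masked[:55])
# http:/********.com/products/f########ld/why-is-fieldshi
```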

After running the script, we get the desired ELF-style output, but in fixed positions and compliant with privacy regulations:

DATE        TIME    MASKED IP    CS_URI_STEM
2014-05-24 12:55:15 32.09.130.15 http:/********.com/products/f########ld/why-is-fieldshi
2014-05-24 13:15:06 05.07.569.95 http:/********.com/products/w########/fieldshield-gui/a
2014-05-24 20:55:15 98.68.117.52 http:/********.com/solutions/########king/encryption/fo
2014-05-24 22:18:01 69.67.212.32 http:/********.com/solutions/########king/de-identifica
2014-05-24 23:15:06 42.01.555.73 http:/********.com/blog/data-########on/data-risk-field
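
To show how the POSITION and SIZE attributes of the output fields translate into a fixed-position record, here is a small Python sketch. The `fixed_record` helper and the `LAYOUT` table are hypothetical; the column numbers are copied from the job script. In the listing above the columns appear separated by single spaces, which may simply be how the page rendered them.

```python
# (field name, 1-based start position, size) taken from the /OUTFILE section.
LAYOUT = [
    ("date", 1, 12),
    ("time", 15, 10),
    ("masked_ip", 30, 12),
    ("cs_uri_stem", 45, 55),
]


def fixed_record(rec):
    """Place each field at its column, truncating or space-padding to its size."""
    width = max(pos + size - 1 for _, pos, size in LAYOUT)
    line = [" "] * width
    for name, pos, size in LAYOUT:
        text = rec[name][:size].ljust(size)
        line[pos - 1:pos - 1 + size] = text  # same length, so width is unchanged
    return "".join(line).rstrip()


row = {
    "date": "2014-05-24",
    "time": "13:15:06",
    "masked_ip": "05.07.569.95",
    "cs_uri_stem": "http:/********.com/products/w########/fieldshield-gui/apply-rules",
}
line = fixed_record(row)
print(line)
```

Because the URL field is limited to 55 characters, longer values are cut off, which matches the truncated URLs in the output.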

See additional formatting, filtering, transformation, and calculation functions in the previous blog on CLF and ELF Web Log Data Processing. Contact fieldshield@iri.com for assistance.
