Direct Data Masking for MongoDB

by Claudia Irvine

In previous articles, we demonstrated file-based examples of masking data in, and generating test data for, MongoDB. Thanks to IRI’s recent success with Progress Software’s DataDirect drivers for MongoDB in the Voracity data management (ETL, etc.) platform and its included components like FieldShield and RowGen, you can manipulate and mask Mongo collection data without intermediate steps.

To expose your collections (tables) in data source explorer views, and ingest their metadata for use in IRI job creation wizards or other Workbench tools, you will need the DataDirect JDBC driver for MongoDB. You will also need the ODBC driver to move data between MongoDB collections in IRI software engines like FieldShield.

This example uses a CUSTOMERS collection as a source and masks the PHONE field using IRI FieldShield, while using ODBC to load the protected results into another MongoDB collection called CUSTOMERS_MASK.

After following the installation instructions for both drivers, you must use the DataDirect Schema Tool (supplied in each driver download) to tell the driver how to map your NoSQL data model to a relational model that IRI Workbench can read.

This tool is a graphical wizard that reads your Mongo database and lets you select the type of structure you want to use: Normalized, Flattened, or Custom. After you select Normalized, the tool displays the data structure of the database.

The data connections can now be set up. Add a DSN in the ODBC Admin screen. When prompted, use the schema file created above. In the Advanced tab, clear the Read Only check box.

In the IRI Workbench, add a JDBC data connection in the Data Source Explorer. On the Optional properties screen, add the property SchemaDefinition=path\mySchema.config, where path is the absolute path to the schema file created above. Also add a “ReadOnly=false” property to override the driver’s default read-only behavior.
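The two optional properties can be checked before you type them into the connection dialog. The following is a minimal Python sketch, not part of IRI Workbench or the DataDirect driver: the function name build_optional_properties and the path C:\schemas\mySchema.config are hypothetical, and the semicolon-joined output is only a convenient way to print the properties, not a documented connection string format. Only the property names SchemaDefinition and ReadOnly come from the steps above.

```python
from pathlib import PureWindowsPath


def build_optional_properties(schema_file: str) -> str:
    """Return the two optional properties as one printable string.

    Hypothetical helper for illustration; it only validates and formats text.
    """
    # The schema file must be given as an absolute path.
    if not PureWindowsPath(schema_file).is_absolute():
        raise ValueError(f"schema path must be absolute: {schema_file}")

    properties = {
        "SchemaDefinition": schema_file,  # file written by the Schema Tool
        "ReadOnly": "false",              # allow writes to the target collection
    }
    # Joining with ';' is an assumption made for display purposes only.
    return ";".join(f"{name}={value}" for name, value in properties.items())


if __name__ == "__main__":
    # Hypothetical schema file location.
    print(build_optional_properties(r"C:\schemas\mySchema.config"))
```

A relative path such as mySchema.config raises a ValueError here, which mirrors the requirement that the schema file be referenced by its absolute path.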

By using the JDBC connector, the data in both collections can be seen. The CUSTOMERS_MASK collection is empty before starting the job, while the PHONE field is unmasked in CUSTOMERS.

Select the New Multi-Table Protect Job from the FieldShield menu. On the first run, you will be prompted to map your JDBC connection to the ODBC connection. You can also do this in Properties before running the wizard. This example uses ODBC as both extractor and loader to transfer the data.

After you select the CUSTOMERS collection as the data source, move on to the Field Modification Rules page to define how the PHONE column will be masked. This page allows you to use a regular expression to find your desired column, and then either create a new rule or browse for an existing one to apply to that column.
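To illustrate how a regular expression can pick out the columns a rule should apply to, here is a small Python sketch. It is not how the wizard is implemented; the function columns_matching is hypothetical, and every column name other than PHONE is made up for the example.

```python
import re


def columns_matching(pattern: str, columns: list[str]) -> list[str]:
    """Return the column names that match the whole pattern, ignoring case."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return [name for name in columns if compiled.fullmatch(name)]


if __name__ == "__main__":
    # Hypothetical column list; only PHONE appears in the article.
    columns = ["CUSTOMER_ID", "NAME", "PHONE", "EMAIL"]
    print(columns_matching(r".*PHONE.*", columns))
```

A broad pattern like this one would also catch names such as HOME_PHONE, so it is worth reviewing the matched columns before a rule is applied to all of them.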

As seen in the “Details” text box, the PHONE field will be masked with the “*” character starting at position 4. This leaves the area code of the US phone number visible after masking.
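The effect of this rule can be approximated in a few lines of Python. This is a sketch of the idea only, not FieldShield's masking function: mask_from_position is a hypothetical name, the position is assumed to be 1-based, and separators such as hyphens are assumed to be masked along with the digits.

```python
def mask_from_position(value: str, start: int = 4, mask_char: str = "*") -> str:
    """Replace every character from the 1-based position `start` onward."""
    keep = max(start - 1, 0)                 # characters left in the clear
    masked_length = max(len(value) - keep, 0)
    return value[:keep] + mask_char * masked_length


if __name__ == "__main__":
    # Hypothetical US phone number; the first three characters stay visible.
    print(mask_from_position("415-555-0123"))
```

With the default start of 4, the first three characters (the area code in this format) are kept and everything after them becomes an asterisk, so the masked value has the same length as the original.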

After finishing the wizard, a FieldShield job script, executable batch file, and flow file (usually for use in Voracity ETL project design) are created. Because only one script is created during this job, either the batch file or the script can be executed.

The job script contains the masking function, and after execution the CUSTOMERS_MASK collection holds the masked data. Everything was transferred ‘as is’ except the PHONE field, which was partially masked.

In the next release of Voracity, you will be able to munge, mask, and mine MongoDB data even faster. Native BSON handling (via the CoSort v10 SortCL engine) will dramatically improve throughput in high volume MongoDB environments.
