Self-Directed Learning Site for IRI Software
This page links you to free, self-help content for the many data management objectives that IRI software products can help you achieve.
Please note that:
- All IRI software users, from freemium to licensed subscribers, can access this material. Only supported users, however, can get help from IRI beyond this content.
- The IRI software product or licensing agreement you have may or may not support a given feature, or may not entitle you to support for that feature even if the feature itself is enabled in your product. For more information, please refer to this product-feature matrix.
- Additional training materials, custom-designed for your company's requirements and for certification-oriented learning, are available on request. Please describe your material requirements below and an IRI services representative will contact you.
IRI Product Installation
- IRI Product Names & Architecture
- Installation Manual for all IRI Products
- IRI Workbench Demo Projects in Git
- Connecting to RDBs (via O/JDBC)
- Using Flat Files in Cloud Stores
- Connection Registry
- DB Data Type Mapping Wizard
- DSN Files
- IRI Ripcurrent (Real-Time DB CDC)
- Multi-Table Filtering
- Connecting to NoSQL DBs
- Cassandra
- using Flat (CSV) File Import/Export
- using Native Driver, All Collections
- Elasticsearch
- MarkLogic
- MongoDB
- using FieldShield (Flat Collections)
- using DarkShield (All Collections)
IRI Workbench - General
- Getting Started
- Job Design Methods
- New Job Wizards (see Welcome > First Steps)
- Dialogs (use ? for in-context Help)
- Script Editor & Outline (see product manuals for syntax)
- Mapping Diagrams (see Work Flow from the Palette below)
- erwin Mapping Manager
- DataSwitch (no-code, AI-enabled data engineering)
- Java apps via Gulfstream API
- Flow Design (how-to articles for Voracity ETL and other batch jobs)
- Compatible Plugins
- Job Deployment
- Command line ($sortcl /spec=jobname.scl)
- Single Job Execution Options
- Batch Job (Work Flow) Execution Options
- Remote Connection/Executions
- Running Voracity Jobs in Hadoop
- Running Voracity Jobs with KNIME
- Job Scheduling
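The command-line form listed under Job Deployment (`sortcl /spec=jobname.scl`) can also be called from a script, for example from a scheduler or a CI step. The minimal Python sketch below assumes only that the `sortcl` executable is on the PATH and that `jobname.scl` is a job script you have already built. The helper names `build_sortcl_command` and `run_sortcl_job` are hypothetical and are not part of any IRI API. The sketch returns the process exit code without interpreting it; see the CoSort manual for what SortCL's exit statuses mean.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def build_sortcl_command(spec_file: str) -> List[str]:
    """Build the argument list for the documented form: sortcl /spec=jobname.scl.

    Hypothetical helper; it only assembles the two arguments shown on this page.
    """
    return ["sortcl", f"/spec={spec_file}"]


def run_sortcl_job(spec_file: str) -> int:
    """Run one SortCL job script and return the raw process exit code.

    Assumption: the 'sortcl' executable is installed and on the PATH.
    The exit code is returned as-is; its meaning is defined by the product docs.
    """
    if not Path(spec_file).is_file():
        # Fail early with a clear error instead of letting sortcl report it.
        raise FileNotFoundError(spec_file)
    result = subprocess.run(build_sortcl_command(spec_file), check=False)
    return result.returncode


if __name__ == "__main__":
    # Print the shell-equivalent command line without running it.
    print(shlex.join(build_sortcl_command("jobname.scl")))
```

Passing an argument list (rather than a single shell string) to `subprocess.run` avoids shell quoting issues when job script paths contain spaces.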
Data Discovery & Classification
- DB Profiling (includes 1+ table searching)
- Flat-File Profiling
- NoSQL DB Data Class Search (and Mask)
- Structured Metadata Discovery
- IRI Metadata Search (via Git)
- Structured (DB & Flat-File) Data Classification
- Directory Data Class Search (structured files)
- Schema Data Class Search (all tables in 1 or more RDB schemas)
- Schema-wide Pattern Search (all tables in an RDB schema)
- Unstructured File Search, Extract, Structure & Profile (text, documents, images, faces)
- Data Discovery Charts in IRI Workbench
Data Integration
- Data Integration Architectures (and Voracity DI paradigms)
- Enterprise Data Warehouse
- Logical Data Warehouse
- Operational Data Store / Enterprise Data Hub
- Production Analytic Platform (4-part series)
- Data Lake
- ETL vs. ELT
- ETL Job Design (see IRI Workbench > Flow Design above)
- CoSort Script Reuse
- ETL Execution (see IRI Workbench > Job Deployment above)
- Faster Extraction
- Data Transformation
- Fast Loading
- Change Data Capture
- Slowly Changing Dimensions
- DB-Specific Optimization (see tabs)
- Video: Voracity ETL Workflow (wizard mode; see Flow Design above)
- Voracity ETL Job Preview (via live or test data)
- Legacy ETL Optimization
- Legacy ETL Tool Migration (via erwin Smart Connector)
Data & Database Migration
- File-Format Conversion
- Database Migration
- Database Subsetting
- Basic Data Replication
- Incremental Data Replication
- Data Federation
- Schema Migration: Relational to DataVault 2.0
- Schema Migration: Relational to Star
- Vision File Conversion
- XML (Complex) Parse/Process
Data Governance
- Data Masking
- Dynamic Data Masking (DDM)
- Static Data Masking (SDM)
- Which IRI Data Masking Product Should I Use?
- Getting Started with FieldShield
- Which Data Masking Function Should I Use?
- Multi-Table RDB Masking (with Referential Integrity)
- Real-Time (Incremental / Refresh) Class-Based Masking
- Rule-based SDM using Column Names Only
- Rule-based SDM using Data Classes (Recommended)
- Video Demo: Multi-table Classify/Mask (2021)
- Video Series: Multi-table Masking Tutorials (2020)
- Data Classification & Discovery (see links above)
- Applying Masking Rules to Classified Data
- Data Class DB Masking Wizard
- Multi-Flat-File Masking (in 1 or more directories)
- Applying Field Rules Using Classification
- Structured & Semi-Structured Data Sources
- Amazon S3 file buckets
- Amazon DynamoDB NoSQL (DarkShield API example)
- Apache Cassandra
- Structured or unstructured collections via DarkShield
- Structured collections (only) via FieldShield
- ASN.1 CDR files (see Example #5)
- Azure BLOB Storage via DarkShield API
- CosmosDB (MS NoSQL, DarkShield API Example)
- Couchbase, Redis & Solr NoSQL DBs
- CSV Files
- Dates & Ages
- Elasticsearch via DarkShield
- Excel Spreadsheets: comparing CellShield, DarkShield & FieldShield approaches
- via CellShield Personal Edition (PE)
- via CellShield Enterprise Edition (EE)
- Video: via DarkShield
- via DarkShield files API
- via FieldShield (see example #6)
- GCP Storage (Google Cloud Platform buckets) via DarkShield API
- HL7 and X12 EDI formats via DarkShield API for files
- HL7 with relations in Voracity (or see X12 via DarkShield, below)
- Kafka Streams (via DarkShield API; also supported in FieldShield)
- JSON & XML Files via DarkShield path filters
- Live Feeds
- MongoDB
- MQTT (IoT) Streams
- NIDs & SSNs
- NRIC
- Oracle & other RDBs
- via FieldShield (for structured/1NF columns only, with classification & search)
- via DarkShield (with structured & unstructured - C/BLOB, XML, text - columns)
- Parquet Files
- Pentaho
- PostgreSQL
- SAP HANA (subset and mask)
- Salesforce
- SharePoint
- Snowflake DB
- Splunk
- Web Logs
- Video: X12 via DarkShield (see blog with full X12 parser and HL7 here)
- Re-ID Risk Determination (for HIPAA Expert Determination Method security rule)
- Unstructured (Dark) Data Sources
- Getting Started with DarkShield (GUI)
- Finding & Masking PII in Text/EDI files, MS Office, etc.
- Finding & Masking PII in PDF and Image files via GUI
- DarkShield GUI Search Logs & Analytics in Datadog
- DarkShield GUI Search Logs & Analytics in Splunk
- Analyzing DarkShield Search/Mask Results in Splunk
- Universal Forwarder (sending search/mask logs to Splunk)
- Adaptive Response to log events in Splunk ES
- Invoking DarkShield from a Splunk Phantom Playbook
- DarkShield (Workbench) CLI SDK
- DarkShield RPC API SDK - Plankton Web Services Framework
- DarkShield Base API (for all sources, silos, and feeds)
- DarkShield Files API (CSV/XML/JSON, text, PDFs, images, Parquet, MS Word & Excel)
- Video: DarkShield RPC API
- Credit Card PANs in Images
- DICOM (Medical Imaging) Files
- Image Preprocessing Techniques
- GitHub Demo Repository
- Load Balancing & Authentication (via NGINX reverse proxy)
- Masking Files in Amazon, Google & Microsoft Cloud Storage
- Named Entity Recognition (NER) via Tensorflow and PyTorch
- Restoring Masked Values (decryption and reverse pseudonymization)
- Data Quality
- Data Quality Rule Wizards
- Filter & De-Duplication
- Fuzzy Searching
- Data Validation
- Data Unification (Homogenization, Reconciliation)
- Finding Business Rule Violations
- Finding Format Errors
- Master Data Management (MDM)
- MD Consolidation
- MDM Registry (pending)
- Alternative (via Git)
- Metadata Management
- Teamwork (via Git)
- Version Control
- Lineage (via Git)
- Lineage & Impact Analysis (via erwin/ADS)
- Data & Metadata Lineage (built-in, pending)
- Asset Security (via Git)
- Role Based Access Controls (RBAC)
- AD-compatible IAM (pending)
- Test Data Management (TDM)
- Considerations for developing test sets:
- Windocks Virtualized DB (Masked/Synthesized Clones)
- Value Labs Test Data Hub (On-Demand Dataset Portal)
- Test (Synthetic) Data Generation, for:
- Continuous Integration / Continuous Delivery-Deployment (CI/CD)
- in Amazon CodePipeline
- in GitLab
- in Azure DevOps
- in Jenkins
- Data Values (via IRI Workbench Rule Wizards)
- DataVault 2.0 Models
- ETL Ops (Voracity mapping preview example)
- RDBs via data masking (see SDM above)
- RDBs via auto-parse/populate/generate (synthesis)
- Video: IRI RowGen Test File & RDB Synthesis Operations
- RDBs via masked table subsets (DB subsetting with optional masking)
- Flat files via the job wizard (e.g., CSV; see more formats below)
- Specific formats or data classes
- ASN.1 CDR files (see Example #4)
- COBOL files
- Datadog (XML or JSON)
- Excel spreadsheets (see Examples #2 & 7)
- Customer (transaction) data
- EDI Files (HL7 & X12)
- Personally Identifiable Information (PII; fake PII for DevOps)
- Credit Card Numbers
- Individual CSV, JSON, XML Files & ODBC (SQL) Tables
- MarkLogic DB
- NIDs (US & Korea SSNs, Italy CF, Netherlands BSN)
- PDF and Image Files
- UUIDs/GUIDs
- Cassandra (CSV)
- MongoDB
- Teradata
- Set Files: A Primer
- Weighted distributions
- Java & Hadoop (API)
Analytics & BI
- Production Analytic Platform (4-part series)
- Embedded BI
- See report examples in the CoSort manual (SortCL program)
- IoT: Aggregation on the Edge
- Change Data Capture
- Slowly Changing Dimensions
- Predictive Analytics
- Clickstream Analytics
- BIRT Integration
- Datadog Integration
- Splunk Integrations
- Data Wrangling for Other BI & Analytic Tools