IRI Blog Articles

Diving Deeper into Data Management

Scheduling IRI Jobs in Stonebranch Universal Controller

by Vic Frisk

Beyond the client-side task launch scheduler provided in IRI Workbench for Voracity™ ETL and sub-product users, server-side IRI job scheduling is also possible with simple tools like cron and with more advanced workflow automation suites such as Stonebranch Universal Automation Center™ (UAC), CA Autosys, Cisco TES, IBM Tivoli, and ASC ActiveBatch. These third-party automation tools provide enhanced features and more ergonomic user interfaces.

UAC, in particular, provides a modern web 2.0 interface for users to define, modify, control, and monitor the execution of IRI CoSort SortCL programs and IRI Voracity ETL batches.

UAC is a more sophisticated alternative to cron because of its relative granularity, transparency and ease of use, and its centralized web console that can be deployed to networked instances through a client service. Some of the limitations of cron are:

  • Scheduling does not understand holidays, so avoiding running workloads on holidays or specific days (e.g., an inventory day) requires significant manual intervention.
  • Jobs are restricted to the server that cron runs on and to that server's time zone.
  • Workflow logic must be built into the script being executed.
  • Automation triggers are based only on time instead of file or DB activity.
  • There is no central point of control – users must manually log in and review workload outputs to assess success or failure, and for each new server, create accesses, crontab entries, etc.
  • Notifications are limited to email unless additional utilities are used.
  • There is limited support for logging and audit trails.
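To make the first and third limitations concrete: under plain cron, any calendar awareness has to live in the job wrapper itself. The sketch below is a hypothetical guard that a cron-launched script might call before doing any work. The `HOLIDAYS` set, the `should_run` name, and the dates are assumptions for illustration only; they are not part of cron or of any IRI or Stonebranch tool.

```python
from datetime import date

# Hypothetical site calendar (assumed dates). cron has no concept of
# holidays, so a list like this must be maintained by hand.
HOLIDAYS = {
    date(2024, 12, 25),  # example public holiday
    date(2024, 6, 28),   # example inventory day
}


def should_run(today: date, holidays: set = HOLIDAYS) -> bool:
    """Return True when the job is allowed to run on the given date."""
    return today not in holidays


if __name__ == "__main__":
    if should_run(date.today()):
        print("running job")
    else:
        print("skipping: holiday or blackout day")
```

Every script scheduled this way needs its own copy of the check, or a shared module, which is the kind of manual upkeep a scheduler with calendar support removes.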

In contrast, UAC provides:

  • Task and workflow definitions across all servers, with web-based job status monitoring.
  • SMS or email notification when a workflow fails, along with diagnostic output for debugging.
  • Cross-platform dependency monitoring across servers through the graphical workflow editor.
  • Monitoring for triggers by system activity, such as file changes or database updates.
  • Full audit trail and reporting, with role-based security, for compliance tracking.
  • Advanced calendar and time zone support.
  • A cron compatibility mode that uses similar parameters, which helps in migrating from cron.

The easiest way to use a workflow automation tool such as UAC to schedule tasks based on IRI tools is to start in IRI Workbench. Two basic types of jobs can be developed in Workbench:

  • Individual job scripts, which are executed by running them with an IRI command line product.
  • Flows, which are series of commands executed as a group through a batch file or shell script.

Individual job scripts are text files containing instructions for the IRI product they were designed for. We recommend using the standard product-specific file extensions. The standard file extensions and an example of a job script invocation are shown in the following table:

IRI Product | Metadata File Extension | Invocation Command(s)
Voracity (data management) | .flow, .bat | all below, Java filename.jar, sqlplus filename.sql, etc.
CoSort (transformation, reporting) | .scl | sortcl /specification=myJobScript.scl
FACT (VLDB extraction) | .ini | fact myJobScript.ini
NextForm (DB/data migration) | .ncl | nextform /specification=myJobScript.ncl
FieldShield (data masking) | .fcl | fieldshield /specification=myJobScript.fcl
RowGen (test data generation) | .rcl | rowgen /specification=myJobScript.rcl
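As a rough illustration of how the table above might be used by a wrapper or scheduler task, the following sketch maps a job script's file extension to the command line listed for it. The function name `build_command` and the dictionary `SPEC_STYLE` are hypothetical helpers, not part of any IRI product; only the executables and the `/specification=` form come from the table.

```python
from pathlib import Path

# Extension -> executable for the products in the table that take
# a /specification=<script> argument.
SPEC_STYLE = {
    ".scl": "sortcl",
    ".ncl": "nextform",
    ".fcl": "fieldshield",
    ".rcl": "rowgen",
}


def build_command(script: str) -> list:
    """Return the argument list used to invoke the given job script."""
    ext = Path(script).suffix.lower()
    if ext in SPEC_STYLE:
        return [SPEC_STYLE[ext], f"/specification={script}"]
    if ext == ".ini":
        # FACT takes the .ini file name directly, per the table.
        return ["fact", script]
    raise ValueError(f"no known IRI invocation for extension {ext!r}")


if __name__ == "__main__":
    for name in ("myJobScript.scl", "myJobScript.ini", "myJobScript.fcl"):
        print(" ".join(build_command(name)))
```

Returning an argument list rather than a single string keeps the result usable with `subprocess.run` without shell quoting concerns.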

IRI Voracity workflows are often a series of steps that perform a more complex job with more than one command. The contents of the sample batch file for this article consist of a couple of FACT extract jobs, SQL scripts, CoSort jobs, and database loads.

To execute, the job needs not only the batch file or shell script, but also all of the scripts and configuration files generated by IRI Workbench wizards or the Flow palette. The Flow and batch file are shown here in IRI Workbench:

[Figure: IRI ETL Flow]

If you are creating and editing individual job scripts or multi-file Flows using IRI Workbench on a different computer from the one where they will execute, there are several ways to move the needed files to the execution system. They can be edited in place on the execution system, or they can be transferred after editing and testing, when they are ready for production.

Some of the options for accessing the execution system include:

  • SMB or CIFS file shares, which are native to Windows and most Linux systems.
  • Secure file transfer using the Remote File Explorer in IRI Workbench.
  • Separate FTP or web-based file transfer utilities.

When working with Flows created in IRI Workbench, there are two approaches you can take with regard to workflow automation. One approach is to design the workflow automation in the scheduling tool itself. This approach may be more powerful, but it also locks you into a particular tool. In UAC, there is a graphical drag-and-drop editor for designing and editing a workflow. The individual tool executions can be combined with conditional checks and triggers:

[Figure: Workflow GUI]

The other approach is to keep the flow logic in the batch file, as designed in the Flow diagram. With this approach, all the control logic is coded into the batch file and the command line tools and utilities. An advantage of this method is that the batch can be run by any execution method: manually, from cron, or from any other automation tool, including the scheduler built into IRI Workbench.
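A minimal sketch of what "control logic coded into the batch" can mean, written here in Python rather than as a batch file or shell script: run each step in order and stop at the first failure. The `run_flow` function is a hypothetical illustration, not something IRI Workbench generates, and the demo steps use the Python interpreter as a stand-in so the sketch runs without any IRI software installed.

```python
import subprocess
import sys


def run_flow(steps):
    """Run each step in order and stop at the first non-zero exit code.

    Returns (failed_index, return_code), or (None, 0) if every step succeeded.
    """
    for index, argv in enumerate(steps):
        result = subprocess.run(argv)
        if result.returncode != 0:
            return index, result.returncode
    return None, 0


if __name__ == "__main__":
    # Stand-in steps. A real flow would list commands such as
    # ["fact", "myJobScript.ini"] or
    # ["sortcl", "/specification=myJobScript.scl"].
    demo = [
        [sys.executable, "-c", "print('extract ok')"],
        [sys.executable, "-c", "import sys; sys.exit(3)"],
        [sys.executable, "-c", "print('never reached')"],
    ]
    failed, code = run_flow(demo)
    sys.exit(code)
```

Because the wrapper exits with the failing step's return code, whichever tool launches it (cron, UAC, or a manual run) sees the same success or failure signal.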

Use the task editor screens to schedule an individual job or a batch file using UAC. Many options, like environment variables, can also be configured on a per-task basis:

[Figure: Task editor screen to schedule an individual job or a batch file]

When using UAC, the dashboard allows real-time monitoring of all task instances and job statuses:

[Figure: Dashboard]

Detailed graphical historical reports can be created for tasks:

[Figure: Color Reporting]

When IRI jobs are run, for example, any output sent to the console is immediately available. This includes data written to standard output and error messages written to standard error.
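For readers launching jobs outside a scheduler, the same two streams can be collected with a small wrapper. This is only a sketch under the assumption that the job is an ordinary command line process; `run_and_capture` is a hypothetical helper, and the demo command uses the Python interpreter as a stand-in for an IRI executable.

```python
import subprocess
import sys


def run_and_capture(argv):
    """Run a command and return (exit_code, stdout_text, stderr_text)."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # Stand-in job that writes a report line to stdout and an event to stderr.
    demo = [
        sys.executable,
        "-c",
        "import sys; print('report row'); print('event', file=sys.stderr)",
    ]
    code, out, err = run_and_capture(demo)
    print("exit code:", code)
    print("stdout:", out.strip())
    print("stderr:", err.strip())
```

Keeping the streams separate preserves the distinction between report data and error messages that the scheduler's output view also relies on.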

The UAC output tab shown below contains the stdout (federated) report target and the stderr on-screen events from the execution of an IRI CoSort SortCL program:

[Figure: Stdout Report 1]

[Figure: Stdout Report 2]

Having multiple methods and tools for scheduling and executing workflows allows you to use the right tool, simple or complex, for any data processing requirement.
