IRI Blog Articles

Diving Deeper into Data Management

 

 


Executing ETL & Batch Flows

by Susan Gegner

Batch files contain plain-text commands that the operating system's command line interpreter executes to accomplish a specific purpose. Windows batch files usually have an extension of .bat or .cmd, while Linux and Unix shell scripts usually have .sh or no extension at all.

We could have a batch file with only a few commands, or one with many lines, some calling other programs (including other batch files). In IRI Workbench, a batch file will also run the blocks of a (Voracity ETL) flow, like the one shown below. The steps for execution are the same for all batch files, and similar to those used to run a standalone CoSort or any other SortCL-based language job.

Flow into Your Batch File

The final step in creating a Flow is to export it for execution. This process writes the steps necessary to execute each block of the Flow into a batch file.
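To make the idea concrete, here is a minimal sketch of what a flow batch file does, not the literal file that Workbench exports. The block names are placeholders, and the sortcl /SPEC= call is an assumption about how each job script is invoked.

```shell
#!/bin/sh
# Hypothetical sketch of a flow batch file; not the literal Workbench export.
# RUNNER defaults to a dry run that only prints each command. Setting
# RUNNER=sortcl would run the real jobs (assumes sortcl is on the PATH).
RUNNER="${RUNNER:-echo sortcl}"

run_flow() {
    # Run each job script in order; stop at the first block that fails.
    for script in "$@"; do
        $RUNNER "/SPEC=$script" || return 1
    done
}

# Placeholder names standing in for the job scripts of a flow.
run_flow extract.scl transform.scl load.scl
```

As written, the sketch only prints the three commands it would run. Stopping at the first failure is a design choice made here so that later blocks do not consume incomplete output.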

Exporting for execution

If the batch file for your flow has not yet been created, then:

  1. Right-click in the diagram for the flowlet
  2. Click IRI Diagram Actions
  3. Click Export Flow Component
  4. In the File name field, type the name for the batch file
  5. Select the Platform from the drop-down. This selection determines whether a .bat or .sh file is created.
  6. Click Finish.

The batch file is created and placed in the project along with other files that are created as part of the flow.

Command Line Options

The .bat files run on Windows, and .sh files run on Linux and Unix. Your batch files ultimately run on the command line, within or outside the IRI Workbench, and on a local or remote (see Remote Execution below) computer. They can be moved to any compatible system as long as all the components needed for execution (including job scripts and executables) are there, too.
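Before moving a batch file to another system, a quick check along these lines could confirm that its components are present. This is only a sketch: the file names are placeholders, and sortcl is assumed to be the executable that the jobs call.

```shell
#!/bin/sh
# Print the name of every job script that is missing or unreadable here.
missing_files() {
    for f in "$@"; do
        [ -r "$f" ] || echo "$f"
    done
}

# Print the name of every executable that cannot be found on the PATH.
missing_commands() {
    for c in "$@"; do
        command -v "$c" >/dev/null 2>&1 || echo "$c"
    done
}

# Placeholder names; substitute the scripts and tools your batch file calls.
missing_files extract.scl load.scl
missing_commands sh sortcl
```

Empty output from both functions would mean that everything named was found.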

These batch files can be a complete process or a component of a greater process. You can execute them by logging into the system independently, or from a shell interface in Workbench that opens a command line prompt. Such shells are available through ShelExec, Wicked Shell, or the Launch Shell feature in the Remote Systems Explorer view.

Batch file executions can also be automated in the IRI Workbench task scheduler, or any external scheduler like cron, Stonebranch UAC, Autosys, or Grand Logic JobServer.
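For an external scheduler such as cron, a small wrapper like the following could run a batch file from its own folder and keep a log of each run. It is a sketch under assumptions: the wrapper name run_batch.sh, the MonthEnd.sh script, and all paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper (run_batch.sh) for use with a scheduler such as cron.
# Example crontab entry, 02:00 on the first day of each month:
#   0 2 1 * * /path/to/run_batch.sh /path/to/MonthEnd.sh /path/to/MonthEnd.log

run_logged() {
    batch="$1"
    log="$2"
    status=0
    # cron does not start in the project folder, so change into the
    # batch file's directory (inside a subshell) before running it.
    ( cd "$(dirname "$batch")" && sh "./$(basename "$batch")" ) >>"$log" 2>&1 || status=$?
    # Append a timestamped line recording the exit status of the run.
    echo "$(date '+%Y-%m-%d %H:%M:%S') $batch exit=$status" >>"$log"
    return "$status"
}

# Only run when both arguments (batch file, log file) are supplied.
if [ "$#" -ge 2 ]; then
    run_logged "$1" "$2"
fi
```

Returning the batch file's own exit status lets the scheduler report a failed run.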

Local Execution Options

In the Workbench, batch files are considered a type of external tool. The menu for external tools is located on the main toolbar. Access it by clicking on the down arrow next to the white arrow in the green circle with the red toolbox.

Option 1: Run As

To use this option, you must first click on a batch file in the Project Explorer or in the editor, then select Run As > Batch Program. You can instead right-click to bring up a context menu where you can also select Run As > Batch Program.

Once the batch file is executed, any output is placed in the locations defined by the job, and a definition of the job is recorded in External Tools Configurations.

Option 2: External Tools Configuration

You can also configure, run, and schedule batch files (including the ones that represent flows) using the Eclipse External Tools Configurations option. This is an opportunity to save specific runtime parameters for your batch files, and select previously defined batch files to run.

Note that the External Tools Configurations option is different from the Run Configurations option available for parameterized individual job executions.

When you execute a batch file, the information to run the job is recorded in External Tools Configurations. In that window, the panel on the left has a tree. Under the tree's Program item are the names used for running batch files and possibly for other utilities. The information necessary to execute the batch file is recorded in the right panel. Usually the only information needed is on the Main tab. Click on a name in the tree, then click Run to execute it.

The items in the Program list are also listed in the top portion of the menu that displays when you select the run menu for External Tools from the main menu bar. Each is preceded by the icon that is an arrow with a green circle and a red toolbox. Additionally, as a shortcut, you can click on a batch file in either the Project Explorer or in the editor. Then on the menu bar, click on the circled arrow with the toolbox to execute the batch job.

See this article on using the built-in task scheduler to automate the execution of these jobs.

Example

In the window shown below, the batch file called MonthEnd.bat is highlighted in the tree. In the panel on the right, the Name shows the name of the batch file by default, but you can change it to something more descriptive about the configuration.

In the Main tab, the Location field has the value ${workspace_loc:/MonthEnd/MonthEnd.bat}, which resolves to the absolute path and name of the batch file. The Working Directory field, ${workspace_loc:/MonthEnd}, resolves to the absolute path of the folder that contains the batch file. In this case, that is the project directory MonthEnd.
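As a rough illustration of how those two values relate, the function below imitates the variable in a simplified way. It assumes the project sits directly under the workspace root, and the workspace path shown is hypothetical.

```shell
#!/bin/sh
# Simplified stand-in for Eclipse's ${workspace_loc:...} variable.
# Assumes the project lives directly under the workspace root.
workspace_loc() {
    # $1 = workspace root, $2 = workspace-relative resource path
    printf '%s%s\n' "${1%/}" "$2"
}

WORKSPACE=/home/user/workspace   # hypothetical workspace root
workspace_loc "$WORKSPACE" /MonthEnd/MonthEnd.bat   # Location
workspace_loc "$WORKSPACE" /MonthEnd                # Working Directory
```

Under that assumption, the Location is simply the Working Directory followed by the batch file name.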

Create, manage, and run configurations

Remote Execution

You can also execute batch files on a remote system from within IRI Workbench using Ssh Terminals in the Remote Systems view. See this demonstration on setting up a remote system connection and a remote project that appears in the local Project Explorer view.

Create the batch script in a local project, and then copy it into a remote project folder. In this case, we built SortChiefs.sh and copied it into the remote project folder called Force5Project1, which is a directory on the remote system named force5. To run this batch script on force5 from the command line, we will use the Ssh Terminals option that was created for force5 when we established that connection.

Right-click on Ssh Terminals for force5, then click Launch Terminal.

A tab for force5 will appear in the Terminals view, where you will be logged in using the access details provided during the remote project setup. If you want, you can launch multiple Ssh Terminals to the same remote system, each with its own tab. They will be labelled force5, force5 1, force5 2, and so on for the system force5.

Now that we are logged into force5, we change to the directory where SortChiefs.sh resides. After running that batch script from the command line of an Ssh terminal for force5, any output written into that project directory will also appear in the corresponding project folder in the IRI Workbench Project Explorer view.

In the window shown below, the batch file SortChiefs.sh is displayed in the editor. Listed under the Remote Systems view are Ssh Terminals force5, force5 1, and force5 2.

In the Terminals view are the tabs where we are logged into each terminal. The force5 2 tab shows the two commands we entered to run our batch job:

cd RemoteProjects/Force5Project1
sh SortChiefs.sh

Terminals view

The two files chiefs.out and SortChiefs.log were created as a result of running the batch file, and are now in the Project Explorer in the remote project Force5Project1.
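Outside Workbench, the same two commands could be sent in one step from a local shell with a standard ssh client. The sketch below only prints the call by default; the login name user is a placeholder, and the paths are assumed to contain no spaces.

```shell
#!/bin/sh
# Build the command line that runs a batch script in a remote directory.
# Uses && so the script is not started if the cd fails.
remote_batch_cmd() {
    printf 'cd %s && sh %s\n' "$1" "$2"
}

# Dry run by default: print the ssh call instead of opening a connection.
# Set SSH_CMD=ssh to execute it; "user" is a placeholder login name.
SSH_CMD="${SSH_CMD:-echo ssh}"
$SSH_CMD user@force5 "$(remote_batch_cmd RemoteProjects/Force5Project1 SortChiefs.sh)"
```

Joining the commands with && is a small safeguard that the terminal session above does not need, because there you can see whether the cd succeeded before running the script.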

If you need help creating or running batch flows in the IRI Workbench, contact voracity@iri.com.
