
Processor Command Line Interface Code

The Processor Command Line Interface Code transforms files in a configuration using a Bash script. Bash is a command-line shell for interacting with the operating system. A Bash script lets you run a whole sequence of commands, which can be a single simple command, a list of commands, or functions, loops, and other control flow constructs.

Requirements

Fundamental knowledge of programming concepts and some experience with the Bash scripting language.

Features

Distribution: Debian Jessie

Available Utilities:

  • Complete Bash with standard Unix utilities.
  • jq 
  • Additional utilities can be installed on request.
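
Because the set of installed utilities is limited, a script can check up front that the tools it depends on are present. The following is a minimal sketch of such a check; require_tools is a hypothetical helper name, not something the processor provides.

```shell
#!/usr/bin/env bash
# require_tools (hypothetical helper): report every listed tool that is
# not available on PATH and return non-zero if any is missing.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "Missing utility: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check for Bash itself; list jq, wget, etc. as your script needs.
if require_tools bash; then
  echo "All required utilities are available."
fi
```

Calling require_tools at the top of a script makes a missing utility visible in the log right away instead of failing halfway through processing.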

Limitations:

  • 2 GB RAM 
  • 1 vCPU
  • 3 hours of execution

Data In/Data Out

Data In

Input files for processing and transformation are located in /data/in/files/ or /data/in/tables/, depending on the previous component in the dataflow.

Data Out

Files should be moved to /data/out/tables or /data/out/files, depending on what the next component needs.

Learn more: see this article about the folder structure.
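
The flow from the input folders to the output folders can be sketched as a simple pass-through. This is an illustration only: DATA_DIR is a hypothetical variable that stands in for /data (it defaults to a temporary folder so the sketch runs anywhere), and sample.csv is a made-up input file.

```shell
#!/usr/bin/env bash
set -eu

# DATA_DIR is a hypothetical variable for this sketch. In the processor the
# data folder is /data; here a temporary folder stands in for it.
DATA_DIR="${DATA_DIR:-$(mktemp -d)}"
mkdir -p "$DATA_DIR/in/tables" "$DATA_DIR/in/files" \
         "$DATA_DIR/out/tables" "$DATA_DIR/out/files"

# Made-up sample input so the sketch is self-contained.
printf 'id,name\n1,alice\n' > "$DATA_DIR/in/tables/sample.csv"

# Copy every input table to the output tables folder unchanged.
for f in "$DATA_DIR"/in/tables/*; do
  [ -f "$f" ] || continue   # nothing to do if the folder is empty
  cp "$f" "$DATA_DIR/out/tables/"
  echo "Passed through: $(basename "$f")"
done
```

In a real configuration you would drop the sample-file line and replace cp with the transformation you need.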

Code Editor, Script

Processor-CLIC.png

This field is intended for the Bash script that you write for processing the file.

Learn more: how to search & replace within a code editor.

Script location and paths

 

The script file script.sh is located in the root /data folder. To access the data files, you can use an absolute or a relative path.

Data In

  • an absolute path /data/in/tables/..  and /data/in/files/..
  • a relative path in/tables/..  and in/files/.. 

Data Out

  • an absolute path /data/out/tables/..  and /data/out/files/..
  • a relative path out/tables/..  and out/files/.. 
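
The sketch below illustrates that a relative and an absolute path refer to the same file when the script runs from the data folder. It assumes the working directory is the data folder; DATA_DIR is a hypothetical stand-in for /data so that the example can run outside the processor.

```shell
#!/usr/bin/env bash
set -eu

# DATA_DIR is a hypothetical stand-in for /data.
DATA_DIR="${DATA_DIR:-$(mktemp -d)}"
mkdir -p "$DATA_DIR/in/tables"
printf 'a,b\n1,2\n' > "$DATA_DIR/in/tables/test.csv"

# Assumption: the script runs with the data folder as its working directory.
cd "$DATA_DIR"

relative="in/tables/test.csv"
absolute="$DATA_DIR/in/tables/test.csv"

# -ef is true when both paths refer to the same file.
if [ "$relative" -ef "$absolute" ]; then
  echo "Both paths point to the same file."
fi
```

Absolute paths keep working if the script changes directory with cd, which is a reason to prefer them in longer scripts.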

Standard output

The activity log in Meiro Integrations is the equivalent of a console log. If you run the script echo 'Hello world!' in the configuration, the system writes the result to the activity log as follows.

 

Hello-word-script.png
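
Since standard output ends up in the activity log, echo can also be used to trace the progress of a longer script. The following is a small sketch; log is a hypothetical helper that adds a UTC timestamp to each message.

```shell
#!/usr/bin/env bash
# log (hypothetical helper): print a message prefixed with a UTC timestamp.
log() {
  printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*"
}

log "Processing started"
echo 'Hello world!'
log "Processing finished"
```

Whether the activity log adds its own timestamps is not covered here, so treat the prefix as optional.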

 

Example 1: moving and renaming files

By default, connectors and processors save CSV files in /data/out/tables, and the next configuration in the dataflow finds them in /data/in/tables. At the same time, the AWS S3 loader requires all files to be located in /data/out/files. You can move the files from the tables folder to the files folder.

  • Move all files from /data/in/tables to /data/out/files.

mv in/tables/* out/files/

  • Move only the .csv files from /data/in/tables to /data/out/files.

mv in/tables/*.csv out/files/

  • Move and rename a file: move test.csv from /data/in/tables to /data/out/files and rename it to newfile.csv.

mv in/tables/test.csv out/files/newfile.csv
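
When in/tables contains no matching files, a plain mv with a wildcard fails because the pattern is passed to mv unexpanded. The sketch below moves CSV files one by one and skips the loop body when nothing matches. DATA_DIR is a hypothetical stand-in for /data, and the sample test.csv is created only so that the sketch runs on its own.

```shell
#!/usr/bin/env bash
set -e

# DATA_DIR is a hypothetical stand-in for /data.
DATA_DIR="${DATA_DIR:-$(mktemp -d)}"
mkdir -p "$DATA_DIR/in/tables" "$DATA_DIR/out/files"
printf 'a,b\n1,2\n' > "$DATA_DIR/in/tables/test.csv"
cd "$DATA_DIR"

moved=0
for f in in/tables/*.csv; do
  # If nothing matches, the pattern stays literal and does not exist.
  [ -e "$f" ] || continue
  mv "$f" out/files/
  moved=$((moved + 1))
done

echo "Moved $moved file(s) to out/files"
```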

 

Example 2: downloading a file into the required folder

This example demonstrates how to download a file using the URL link.

Data out folder: /data/out/tables/titanic_data.csv 

wget -O /data/out/tables/titanic_data.csv "http://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv"
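
A failed download can leave an empty or partial file at the path given to -O, and the next component would then pick it up. The sketch below wraps the download in a check and removes the file on failure. download_to is a hypothetical helper, and the retry and timeout values are arbitrary choices for illustration.

```shell
#!/usr/bin/env bash
# download_to (hypothetical helper): download URL to DESTINATION and
# remove the destination file if the download fails.
# Usage: download_to URL DESTINATION
download_to() {
  url=$1
  dest=$2
  if wget -q --tries=2 --timeout=30 -O "$dest" "$url"; then
    echo "Downloaded $url to $dest"
  else
    echo "Download failed: $url" >&2
    rm -f "$dest"
    return 1
  fi
}

# Example call with the URL from this example (needs network access):
# download_to "http://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv" /data/out/tables/titanic_data.csv
```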

 

Example 3: creating headers for a table

Sometimes the data you receive has no headers, which makes future transformations inconvenient. This example shows how to create a new file with the headers for the table and join it with the table you have. For this demonstration, we create the example file ourselves in the script.

 

  • Create an example file: table with 5 rows and 2 columns

for num in 1 2 3 4 5
do
echo $num, $((num*2)) >> out/files/numbers.csv
done

  • Create a file with headers for the columns.

echo '"number", multiply_by_two' > out/files/headers.csv

  • Concatenate rows from numbers.csv file with headers.csv file

cat out/files/numbers.csv >> out/files/headers.csv
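
As a variation on the steps above, the header and the rows can be written to a new file in one step by grouping the commands, which leaves the original files untouched. This is a sketch under assumptions: DATA_DIR is a hypothetical stand-in for /data, numbers_with_header.csv is a made-up output name, and the header and rows are written without quotes or spaces.

```shell
#!/usr/bin/env bash
set -e

# DATA_DIR is a hypothetical stand-in for /data.
DATA_DIR="${DATA_DIR:-$(mktemp -d)}"
mkdir -p "$DATA_DIR/out/files" "$DATA_DIR/out/tables"
cd "$DATA_DIR"

# Recreate the sample data. Truncating first keeps reruns from
# appending duplicate rows.
: > out/files/numbers.csv
for num in 1 2 3 4 5; do
  echo "$num,$((num * 2))" >> out/files/numbers.csv
done

# Write the header line followed by the data rows to a new file.
{
  echo 'number,multiply_by_two'
  cat out/files/numbers.csv
} > out/tables/numbers_with_header.csv

cat out/tables/numbers_with_header.csv
```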

Reproducing and debugging

If you want to run the code on your computer for testing and debugging, or write the script in a local IDE and paste it into a Meiro Integrations configuration, the easiest way is to reproduce the folder structure as follows:

/data
    script.sh
    /in
          /tables
          /files
    /out
          /tables
          /files

The script file should be located in the /data folder, input files and tables in in/files and in/tables respectively, and output files and tables in out/files and out/tables respectively.

To reproduce the scripts from Example 1, save any CSV file named test.csv to the folder /data/in/tables, paste the code from the example into the script file, and run it. The files will be moved to the folder /data/out/files and/or renamed.
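
The local setup can itself be scripted. The sketch below creates the folder structure, a placeholder test.csv, and a script.sh containing the move-and-rename command from Example 1, then runs it. LOCAL_ROOT is a hypothetical variable; it defaults to a temporary folder so that nothing is written to the real /data on your machine.

```shell
#!/usr/bin/env bash
set -e

# LOCAL_ROOT is a hypothetical parent folder for the local copy of /data.
LOCAL_ROOT="${LOCAL_ROOT:-$(mktemp -d)}"
mkdir -p "$LOCAL_ROOT/data/in/tables" "$LOCAL_ROOT/data/in/files" \
         "$LOCAL_ROOT/data/out/tables" "$LOCAL_ROOT/data/out/files"

# Placeholder input file and the move-and-rename command from Example 1.
printf 'id,value\n1,10\n' > "$LOCAL_ROOT/data/in/tables/test.csv"
echo 'mv in/tables/test.csv out/files/newfile.csv' > "$LOCAL_ROOT/data/script.sh"

# Run the script from the data folder so that relative paths resolve.
(cd "$LOCAL_ROOT/data" && bash script.sh)

ls "$LOCAL_ROOT/data/out/files"
```

Running the script from inside the data folder mirrors the relative paths used in the examples, so the same script.sh can be pasted into the configuration unchanged.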

Tutorials:

How to create a header in your file using the Command Line Interface Code processor

How to move files from one folder in a configuration to another using the Command Line Interface Code processor