Processor Command Line Interface Code
The Processor Command Line Interface Code transforms files in a configuration using a Bash script. Bash is a command-line shell for interacting with the operating system. A Bash script lets you run a whole sequence of commands, which may be a single simple command, a list of commands, or functions, loops, and other control flow constructs.
Requirements
Fundamental knowledge of programming concepts and some experience with Bash scripting language.
Useful Links: GNU Coreutils manual, Shell Scripting Tutorial
Features
Distribution: Debian Jessie
Available Utilities:
- Complete BASH with standard Unix utilities.
- jq
- Additional utilities can be installed on request.
Limitations:
- 2 GB RAM
- 1 vCPU
- 3 hours of execution
Data In/Data Out
Data In
Files for processing and transformation are located in the /data/in/files/ or /data/in/tables/ folder, depending on the previous component in the dataflow.
Data Out
Files should be moved to /data/out/tables or /data/out/files, depending on the needs of the next component.
To learn more about the folder structure please go to this article.
Code Editor
Script
This field is intended for the Bash script that you write for processing the file.
Script location and paths
The script file script.sh is located in the root /data folder. To access the data files, you can use an absolute or a relative path.
Data In
- an absolute path: /data/in/tables/.. and /data/in/files/..
- a relative path: in/tables/.. and in/files/..
Data Out
- an absolute path: /data/out/tables/.. and /data/out/files/..
- a relative path: out/tables/.. and out/files/..
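The relationship between the two path styles can be sketched as follows. This is a minimal illustration, assuming the script runs with the data root as its working directory (which is what makes in/tables/.. resolve). DATA_DIR and sample.csv are hypothetical names used only so the sketch can run outside the processor; in the processor the data root is /data.

```shell
#!/usr/bin/env bash
set -e

# DATA_DIR is a hypothetical variable for local experiments;
# in the processor the data root is /data.
DATA_DIR="${DATA_DIR:-$(mktemp -d)}"
mkdir -p "$DATA_DIR/in/tables"
echo "id,name" > "$DATA_DIR/in/tables/sample.csv"

# Relative paths are resolved against the current working directory.
cd "$DATA_DIR"

relative_content=$(cat in/tables/sample.csv)
absolute_content=$(cat "$DATA_DIR/in/tables/sample.csv")

# Both paths point to the same file.
if [ "$relative_content" = "$absolute_content" ]; then
  echo "Relative and absolute paths read the same file"
fi
```

If a script changes directory with cd, relative paths stop pointing at the data folders, so absolute paths are the safer choice in longer scripts.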
Standard output
The equivalent of a console log in Meiro Integrations is the activity log. If you run echo 'Hello world!' in the configuration, the system writes the output to the activity log.
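For longer scripts, it can help to prefix each message with a timestamp so the steps are easier to follow in the log. The sketch below uses a hypothetical helper named log; it is not part of Meiro Integrations. It writes to standard output only, because this article describes only standard output as being shown in the activity log.

```shell
#!/usr/bin/env bash

# log is a hypothetical helper, not part of Meiro Integrations.
# It prints a UTC timestamp followed by the message to standard output.
log() {
  printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*"
}

log "Starting processing"
log "Finished processing"
```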
Examples
In this section, we demonstrate how to solve frequently encountered problems with files and tables:
- Moving and renaming files.
- Creating headers for a table.
- Downloading a file into the required folder.
Example 1
By default, connectors and processors save CSV files in /data/out/tables. The next configuration in the dataflow finds them in /data/in/tables. However, the AWS S3 loader requires all files to be located in /data/out/files. You can move the files from the tables folder to the files folder.
- Move all files from /data/in/tables to /data/out/files:
mv in/tables/* out/files/
- Move only the required files using a wildcard (glob) pattern; mv does not accept regular expressions. Move all CSV files from /data/in/tables to /data/out/files:
mv in/tables/*.csv out/files/
- Move and rename a file. Move test.csv from /data/in/tables to /data/out/files and rename it to newfile.csv:
mv in/tables/test.csv out/files/newfile.csv
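When no file matches a wildcard pattern, Bash passes the pattern to mv unchanged and the command fails. The following is a minimal sketch of one way to guard against that, using the Bash nullglob option. DATA_DIR is a hypothetical variable and the sample file is created only so the sketch runs on its own; in the processor the data root is /data.

```shell
#!/usr/bin/env bash
set -e

# DATA_DIR is a hypothetical variable for local runs;
# in the processor the data root is /data.
DATA_DIR="${DATA_DIR:-$(mktemp -d)}"
mkdir -p "$DATA_DIR/in/tables" "$DATA_DIR/out/files"
cd "$DATA_DIR"

# Sample input so the sketch runs on its own.
echo "a,b" > in/tables/test.csv

# With nullglob, a pattern that matches nothing expands to an empty list
# instead of staying as the literal string "in/tables/*.csv".
shopt -s nullglob
csv_files=(in/tables/*.csv)
shopt -u nullglob

if [ "${#csv_files[@]}" -gt 0 ]; then
  mv -- "${csv_files[@]}" out/files/
  echo "Moved ${#csv_files[@]} file(s)"
else
  echo "No CSV files found in in/tables"
fi
```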
Example 2
This example demonstrates how to download a file using the URL link.
Data out folder: /data/out/tables/titanic_data.csv
URL: http://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv
wget -O /data/out/tables/titanic_data.csv "http://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv"
Example 3
Sometimes the data you receive has no headers, which makes later transformations inconvenient. This example demonstrates how to create a new file with the headers for the table and join it with the table you have. For the purposes of this demonstration, we create the example file ourselves in the script.
- Create an example file: table with 5 rows and 2 columns
for num in 1 2 3 4 5
do
  # Quote the row and omit the space after the comma to keep the CSV clean
  echo "$num,$((num*2))" >> out/files/numbers.csv
done
- Create a file with headers for the columns.
echo 'number,multiply_by_two' > out/files/headers.csv
- Append the rows from the numbers.csv file to the headers.csv file, so that headers.csv contains the complete table.
cat out/files/numbers.csv >> out/files/headers.csv
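An alternative is to write the header and the rows into a third file in one step, which leaves both inputs unchanged and gives the same result if the script is run again. The sketch below is one possible approach, not the only one. DATA_DIR and the output name table.csv are hypothetical, and the rows are written without a space after the comma.

```shell
#!/usr/bin/env bash
set -e

# DATA_DIR is a hypothetical variable for local runs;
# in the processor the data root is /data.
DATA_DIR="${DATA_DIR:-$(mktemp -d)}"
mkdir -p "$DATA_DIR/out/files"
cd "$DATA_DIR"

# Rows without a header. A single redirect after "done" truncates the file,
# so rerunning the script does not duplicate rows.
for num in 1 2 3 4 5; do
  echo "$num,$((num * 2))"
done > out/files/numbers.csv

# Group the header and the rows, and write both to a new file in one step.
{
  echo 'number,multiply_by_two'
  cat out/files/numbers.csv
} > out/files/table.csv

# Show the header and the first data row.
head -n 2 out/files/table.csv
```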
Reproducing and debugging
To run the code on your own computer for testing and debugging, or to write the script in a local IDE and then paste it into a Meiro Integrations configuration, the easiest way is to reproduce the folder structure as follows:
/data
    script.sh
    /in
        /tables
        /files
    /out
        /tables
        /files
The script file should be located in the /data folder, input files and tables in the in/files and in/tables subfolders, and output files and tables in out/files and out/tables respectively.
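The folder structure can be created locally with a few commands. This is a minimal sketch; LOCAL_ROOT is a hypothetical variable that sets where the local copy of the /data layout is created, and it defaults to ./data in the current directory.

```shell
#!/usr/bin/env bash
set -e

# LOCAL_ROOT is a hypothetical variable: where to create the local copy of
# the /data layout. It defaults to ./data in the current directory.
LOCAL_ROOT="${LOCAL_ROOT:-./data}"

# mkdir -p creates missing parent folders and does not fail if they exist.
mkdir -p "$LOCAL_ROOT/in/tables" "$LOCAL_ROOT/in/files" \
         "$LOCAL_ROOT/out/tables" "$LOCAL_ROOT/out/files"

# Create an empty script.sh only if one is not there yet.
[ -f "$LOCAL_ROOT/script.sh" ] || : > "$LOCAL_ROOT/script.sh"

echo "Created folder structure in $LOCAL_ROOT"
```

To test a script that uses relative paths, change into the created folder first (for example, cd data) and then run bash script.sh.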
To reproduce the scripts from Example 1, save any CSV file named test.csv to the folder /data/in/tables, paste the code from the example into the script file, and run it. The files will be moved to the folder /data/out/files/ and/or renamed.
Recommended articles: How to create a header in your file using Command Line Interface Code processor, How to move files from one folder in configuration to another using Command Line Interface Code processor