Connector AWS S3

The AWS S3 connector allows you to connect an AWS S3 bucket with Meiro Integrations and use it as a data source for the data workflow. 

Requirements

You need an AWS account with an S3 bucket to configure the AWS S3 connector. You can create your account here. If you have not used AWS before, we recommend that you check out these articles:

Features

Incremental mode

The extractor keeps track of downloaded files in a state file and downloads only the files it has not processed yet.

Wildcard

Using * (wildcard) at the end of the expression in the Key field lets you select all files whose keys start with that expression. For example, use MyFolder/* to connect to all the files in the directory MyFolder.
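As a rough illustration of the wildcard behaviour described above, the following Python sketch treats a key ending in * as a prefix match and any other key as an exact match. The function name matches_key and the sample object keys are hypothetical, and the connector's actual matching logic may differ.

```python
def matches_key(pattern: str, object_key: str) -> bool:
    """Return True if object_key is selected by pattern.

    Assumption: a trailing '*' means "any key starting with this prefix";
    without it, the pattern must equal the object key exactly.
    """
    if pattern.endswith("*"):
        return object_key.startswith(pattern[:-1])
    return object_key == pattern


# Hypothetical object keys in a bucket
keys = ["MyFolder/a.csv", "MyFolder/b.csv", "Other/c.csv"]
selected = [key for key in keys if matches_key("MyFolder/*", key)]
print(selected)
```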

Data In/Data Out

Data In 

N/A

Data Out

Archived files (GZip), located in the /data/out/files folder.

Learn more: for details about the folder structure, please see this article.
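Because the output files are GZip archives, a downstream step usually has to decompress them before reading. The sketch below shows one way to do that with Python's standard gzip module. The file name export.csv.gz and its contents are made up for the example, and a temporary directory stands in for /data/out/files so the snippet can run anywhere.

```python
import gzip
import tempfile
from pathlib import Path


def read_gzip_text(path) -> str:
    """Decompress a GZip file and return its contents as text."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return handle.read()


# A temporary directory stands in for /data/out/files (assumption for the demo)
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "export.csv.gz"  # hypothetical file name
    with gzip.open(sample, "wt", encoding="utf-8") as handle:
        handle.write("id,name\n1,Alice\n")
    content = read_gzip_text(sample)

print(content)
```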

Parameters

Access Key ID (required)

The AWS Access Key ID looks like AKIA****. You need to create it in the credentials section of your AWS account:

My_AWS -> My Security Credentials -> Access keys (access key ID and secret access key) -> Create New Access Key -> Download Key File 

More details on how to create your AWS S3 Access Key can be found here.

Secret Access Key (required)

The AWS Secret Access Key is provided by AWS when you create a new AWS Access Key:

My_AWS -> My Security Credentials -> Access keys (access key ID and secret access key) -> Create New Access Key -> Download Key File

More details on how to create your AWS S3 Secret Access Key can be found here.

Bucket (required)

Provide the name of the AWS S3 bucket. Bucket names are globally unique, and the region is detected automatically.

Key (required)

The key prefix used to search for files in the AWS S3 bucket. It can optionally end with a * wildcard. For example, if you want to connect only the files from a particular folder “myfolder” in the bucket, enter myfolder/*.

Save As (optional)

Provide the name of the folder inside the /data/out/files directory where you’d like to store your downloaded files. If not specified, files will be saved directly in /data/out/files.
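The effect of Save As on the destination path can be sketched as follows. This is only an illustration of the rule above: the helper name output_dir and the folder name s3_exports are hypothetical, and the connector may normalise the value differently.

```python
import posixpath

OUTPUT_ROOT = "/data/out/files"


def output_dir(save_as: str = "") -> str:
    """Return the folder where downloaded files would be stored.

    Assumption: an empty Save As value means the output root itself.
    """
    folder = save_as.strip("/")
    return posixpath.join(OUTPUT_ROOT, folder) if folder else OUTPUT_ROOT


print(output_dir())              # no Save As value
print(output_dir("s3_exports"))  # hypothetical folder name
```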

Include Sub-Folders (true/false)

Download data from the bucket, including all subfolders. Available only with the wildcard *.
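One plausible reading of this option is sketched below: with sub-folders disabled, only objects directly under the key prefix are selected, and with it enabled, nested objects are selected too. The function select_keys and the sample keys are hypothetical, and the exact behaviour of the connector is an assumption here.

```python
def select_keys(keys, prefix, include_subfolders):
    """Filter object keys by prefix, optionally descending into sub-folders.

    Assumption: a '/' in the part of the key after the prefix means the
    object lives in a sub-folder.
    """
    selected = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        remainder = key[len(prefix):]
        if not include_subfolders and "/" in remainder:
            continue
        selected.append(key)
    return selected


# Hypothetical object keys in a bucket; the key setting would be myfolder/*
keys = ["myfolder/a.csv", "myfolder/2024/b.csv", "other/c.csv"]
top_level_only = select_keys(keys, "myfolder/", include_subfolders=False)
with_subfolders = select_keys(keys, "myfolder/", include_subfolders=True)
print(top_level_only)
print(with_subfolders)
```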

New Files Only (true/false)

Turns on incremental mode for loading the data. After the first configuration run in incremental mode, the file state.json in the Data Out bucket will have these properties:

  • lastDownloadedFileTimestamp for the timestamp of the last change in the AWS S3 connector (seconds since Jan 01 1970 UTC)
  • processedFilesInLastTimestampSecond for the names of the files processed in that last timestamp second.

Each time you run the configuration, Meiro Integrations will check the values of these properties and download only the unprocessed files to the output bucket.
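The check described above can be sketched in Python as follows. Only the two property names come from state.json. The function names, the (name, timestamp) tuple layout, and the comparison rules are assumptions made for illustration, not the connector's actual implementation.

```python
def unprocessed_files(files, state):
    """Return the (name, timestamp) pairs that still need downloading.

    A file counts as unprocessed if it is newer than the stored timestamp,
    or has the same timestamp but is not listed as already processed.
    """
    last_ts = state.get("lastDownloadedFileTimestamp", 0)
    seen = set(state.get("processedFilesInLastTimestampSecond", []))
    return [
        (name, ts)
        for name, ts in files
        if ts > last_ts or (ts == last_ts and name not in seen)
    ]


def next_state(downloaded, state):
    """Build the state to store after a run that downloaded these files."""
    if not downloaded:
        return state
    newest = max(ts for _, ts in downloaded)
    names = {name for name, ts in downloaded if ts == newest}
    if newest == state.get("lastDownloadedFileTimestamp"):
        # Same second as the previous run: keep the earlier names as well
        names |= set(state.get("processedFilesInLastTimestampSecond", []))
    return {
        "lastDownloadedFileTimestamp": newest,
        "processedFilesInLastTimestampSecond": sorted(names),
    }


# Hypothetical state and bucket listing (timestamps in epoch seconds)
state = {
    "lastDownloadedFileTimestamp": 1700000000,
    "processedFilesInLastTimestampSecond": ["a.csv"],
}
files = [
    ("old.csv", 1699999000),
    ("a.csv", 1700000000),
    ("b.csv", 1700000000),
    ("c.csv", 1700000060),
]
todo = unprocessed_files(files, state)
new_state = next_state(todo, state)
print(todo)
print(new_state)
```

Storing the file names for the last timestamp second lets a run distinguish files that share that second, so a file modified in the same second as the last processed one is not skipped.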

Limit (required)

The maximum number of files to download. If the key matches more files than this limit, the oldest files will be downloaded.
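A minimal sketch of how such a limit could be applied is shown below. It assumes that "oldest" refers to each file's last-modified timestamp. The function apply_limit and the sample data are hypothetical.

```python
def apply_limit(files, limit):
    """Keep at most `limit` files, preferring the oldest ones.

    files is a list of (name, last_modified_epoch_seconds) pairs.
    """
    oldest_first = sorted(files, key=lambda item: item[1])
    return oldest_first[:limit]


# Hypothetical matched files with last-modified timestamps
files = [("new.csv", 300), ("old.csv", 100), ("mid.csv", 200)]
limited = apply_limit(files, 2)
print(limited)
```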