Connector AWS S3
The AWS S3 connector allows you to connect an AWS S3 bucket with Meiro Integrations and use it as a data source for the data workflow.
Requirements
You need an AWS account with an S3 bucket to configure the AWS S3 connector. You can create your account here. If you have not used AWS before, we recommend reading these articles:
- How to create and activate AWS account
- How to create an S3 Bucket
- How do I create an AWS Access Key?
- Where is My Secret Access Key?
- Best Practices for Managing AWS Access Keys
Features
Incremental mode
The extractor keeps track of the downloaded files in a state file and downloads only the files it has not yet processed.
Wildcard
Using * (wildcard) at the end of the expression in the Key field lets you match multiple files in the bucket. For example, use MyFolder/* to connect to all the files in the MyFolder directory.
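The wildcard behaviour described above can be pictured as a simple prefix match. The following is a minimal sketch, not the connector's actual implementation; the function name `matches_key` is hypothetical, and it assumes that a key without a trailing * must match an object key exactly.

```python
def matches_key(pattern: str, object_key: str) -> bool:
    """Return True if object_key is selected by the Key pattern.

    Assumption: a trailing * turns the pattern into a prefix match;
    without it, the pattern must equal the object key exactly.
    """
    if pattern.endswith("*"):
        prefix = pattern[:-1]  # drop the wildcard, keep the prefix
        return object_key.startswith(prefix)
    return object_key == pattern


# Example: MyFolder/* selects objects whose key starts with "MyFolder/".
print(matches_key("MyFolder/*", "MyFolder/report.csv"))
print(matches_key("MyFolder/*", "Other/report.csv"))
```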
Data In/Data Out
Data In
N/A
Data Out
Archived files (GZip), located in the /data/out/files folder.
To learn more about the folder structure please go to this article.
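If a later step in the workflow needs to read the downloaded archives, something along these lines could work. This is only a sketch: the helper name `read_gzip_outputs` is hypothetical, and it assumes the archives carry a .gz extension, which the description above does not confirm.

```python
import gzip
from pathlib import Path


def read_gzip_outputs(folder: str = "/data/out/files") -> dict:
    """Decompress every *.gz file under folder and return its raw bytes.

    Assumption: downloaded archives end with ".gz".
    Keys of the result are paths relative to folder.
    """
    contents = {}
    for path in sorted(Path(folder).rglob("*.gz")):
        with gzip.open(path, "rb") as handle:
            contents[str(path.relative_to(folder))] = handle.read()
    return contents
```

The function returns bytes rather than text because the encoding of the downloaded files is not known in advance.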
Parameters
Access Key ID (required)
The AWS Access Key ID looks like AKIA****. You need to create it in the credentials section of your AWS account:
My_AWS -> My Security Credentials -> Access keys (access key ID and secret access key) -> Create New Access Key -> Download Key File
More details on how to create your AWS S3 Access Key can be found here.
Secret Access Key (required)
The AWS Secret Access Key is provided by AWS when you create a new AWS Access Key:
My_AWS -> My Security Credentials -> Access keys (access key ID and secret access key) -> Create New Access Key -> Download Key File
More details on how to create your AWS S3 Secret Access Key can be found here.
Bucket (required)
Provide the AWS S3 bucket name, which is a globally unique identifier. The region is detected automatically.
Key (required)
The key prefix used to search for files in the AWS S3 bucket. It can optionally end with a * wildcard. For example, if you want to connect only the files from a particular folder “myfolder” in the bucket, enter myfolder/*.
Save As (optional)
Provide the name of the folder inside the /data/out/files directory where you’d like to store your downloaded files. If not specified, files will be saved in /data/out/files.
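The effect of Save As on the output location can be sketched as follows. The helper `output_dir` is hypothetical and only illustrates the rule described above; how the connector handles leading or trailing slashes in the value is an assumption.

```python
import posixpath
from typing import Optional

BASE_DIR = "/data/out/files"


def output_dir(save_as: Optional[str] = None) -> str:
    """Return the folder where downloaded files would be stored.

    Assumption: surrounding slashes in save_as are ignored, so the
    result always stays inside BASE_DIR.
    """
    if not save_as:
        return BASE_DIR  # Save As not set: files go to the base folder
    return posixpath.join(BASE_DIR, save_as.strip("/"))


print(output_dir())              # /data/out/files
print(output_dir("s3_exports"))  # /data/out/files/s3_exports
```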
Include Sub-Folders (true/false)
Download data from the bucket including all subfolders. Available only with the wildcard *.
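One way to picture how this option interacts with the wildcard is shown below. This is a sketch under an assumption the text does not state explicitly: with Include Sub-Folders turned off, only objects directly under the prefix are selected. The function name `select_keys` is hypothetical.

```python
def select_keys(keys, pattern, include_subfolders):
    """Filter object keys by a Key pattern and the sub-folder option.

    Assumption: when include_subfolders is False, a key is kept only
    if nothing after the prefix contains a further "/".
    """
    if not pattern.endswith("*"):
        # Without a wildcard the option has no effect: exact match only.
        return [key for key in keys if key == pattern]

    prefix = pattern[:-1]
    selected = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        remainder = key[len(prefix):]
        if include_subfolders or "/" not in remainder:
            selected.append(key)
    return selected


keys = ["myfolder/a.csv", "myfolder/sub/b.csv", "other/c.csv"]
print(select_keys(keys, "myfolder/*", include_subfolders=False))
print(select_keys(keys, "myfolder/*", include_subfolders=True))
```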
New Files Only (true/false)
Turns on the incremental mode of loading the data. After the first configuration run in incremental mode, the file state.json in the Data Out bucket will have these properties:
- lastDownloadedFileTimestamp for the timestamp of the last change in AWS S3 connector (seconds since Jan 01 1970 UTC)
- processedFilesInLastTimestampSecond for names of the processed files in the last timestamp.
Each time you run the configuration, Meiro Integrations will check the values of these properties and download only the unprocessed files to the output bucket.
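The check against the two state properties could work roughly as sketched below. This is an illustration, not the connector's code: the function names are hypothetical, files are represented as (key, last-modified seconds) pairs, and the sketch assumes that lastDownloadedFileTimestamp holds the last-modified time of the newest downloaded file.

```python
def filter_new_files(files, state):
    """Return the (key, timestamp) pairs not yet processed, oldest first."""
    last_ts = state.get("lastDownloadedFileTimestamp", 0)
    processed = set(state.get("processedFilesInLastTimestampSecond", []))
    new_files = [
        (key, ts)
        for key, ts in files
        # Newer than the stored second, or in the same second but unseen.
        if ts > last_ts or (ts == last_ts and key not in processed)
    ]
    return sorted(new_files, key=lambda item: (item[1], item[0]))


def update_state(state, downloaded):
    """Build the next state from files returned by filter_new_files."""
    if not downloaded:
        return dict(state)
    newest = max(ts for _, ts in downloaded)
    names = [key for key, ts in downloaded if ts == newest]
    if newest == state.get("lastDownloadedFileTimestamp", 0):
        # Same second as before: keep the names recorded earlier.
        names += state.get("processedFilesInLastTimestampSecond", [])
    return {
        "lastDownloadedFileTimestamp": newest,
        "processedFilesInLastTimestampSecond": sorted(set(names)),
    }
```

Recording the file names for the last timestamp second is what allows a file that arrives in that same second, after a run has finished, to be picked up by the next run without downloading its neighbours again.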
Limit (required)
The maximum number of files to download. If the key matches more files than this limit, the oldest files will be downloaded.
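The limit rule can be illustrated with a short sketch. The function `apply_limit` is hypothetical, and it assumes that "oldest" refers to the last-modified time of each file, given here in seconds.

```python
def apply_limit(files, limit):
    """Keep at most `limit` files, preferring the oldest ones.

    files is a list of (key, last_modified_seconds) pairs.
    """
    oldest_first = sorted(files, key=lambda item: item[1])
    return oldest_first[:limit]


files = [("c.csv", 300), ("a.csv", 100), ("b.csv", 200)]
print(apply_limit(files, 2))  # [('a.csv', 100), ('b.csv', 200)]
```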