Format events
Format events into a format suitable for the customer_events table
This component prepares events for loading into the customer_events table. The component is available in this repository: https://github.com/meiroio-components/cdp-format-events.git.
Remember: Events should be supplied in newline-delimited JSON format (NDJSON, file suffix .ndjson).
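In NDJSON, each line holds one complete JSON object, so a file can be parsed line by line. The Python sketch below illustrates this; the function name read_ndjson is hypothetical and not part of the component.

```python
import json
from typing import Iterable, Iterator


def read_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed event per non-empty line of NDJSON input."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no} is not valid JSON: {exc}") from exc


# Usage with a file on disk (a file object iterates over its lines):
# with open("data/in/files/events_mc_subscribed.ndjson", encoding="utf-8") as f:
#     events = list(read_ndjson(f))
```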
Configuration
For example, given a file in data/in/files/events_mc_subscribed.ndjson with the following 2 rows:
{"email":"robin@meiro.io", "meta": {"date": "2018-08-18T14:15:16Z"}, "status": "subscribed", "list_id": "12345b", "list_name": "Loyal customers"}
{"email":"foo@bar.io", "meta": {"date": "2018-08-18T15:16:17Z"}, "status": "subscribed", "list_id": "12345b", "list_name": "Loyal customers"}
The goal is to produce data/out/tables/events_mc_subscribed.csv, which can be uploaded to the customer_events table:
| id | customer_entity_id | event_id | source_id | event_time | type | version | payload | created_at |
|---|---|---|---|---|---|---|---|---|
| md5(...) | | md5(...) | mailchimp | 2018-08-18T14:15:16Z | subscribed | 0-1-0 | {"email":"robin@"... - 1:1 copy of original} | current_utc_iso8601 stamp |
| md5(...) | | md5(...) | mailchimp | 2018-08-18T15:16:17Z | subscribed | 0-1-0 | {"email":"foo@bar.io"... - 1:1 copy of original} | current_utc_iso8601 stamp |
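As a rough illustration of the output side, the following Python sketch writes rows in the column order of the example table using the standard csv module. It assumes, hypothetically, that customer_entity_id is left empty (the component does not appear to compute it), that payload is passed in as the original JSON text, and that one created_at timestamp is shared by all rows of a run; write_rows is an illustrative name.

```python
import csv
from datetime import datetime, timezone

# Column order of the customer_events output, as in the example table.
COLUMNS = [
    "id", "customer_entity_id", "event_id", "source_id", "event_time",
    "type", "version", "payload", "created_at",
]


def write_rows(rows, out) -> None:
    """Write rows as CSV with a header; columns a row lacks are written empty.

    ASSUMPTION: created_at is one current UTC ISO 8601 timestamp per run.
    """
    created_at = datetime.now(timezone.utc).isoformat()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "created_at": created_at})


# When writing to a file, open it with newline="" as the csv module recommends:
# with open("data/out/tables/events_mc_subscribed.csv", "w", newline="", encoding="utf-8") as f:
#     write_rows(rows, f)
```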
In a nutshell, this component extracts some values from the ndjson events to:
- construct the id of the event
- construct the event_id (a reference to the events table)
- extract the event time
- set the required columns
The config.json describes where to find these values in the event JSONs.
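To make the mapping concrete, here is a hedged Python sketch of how a config entry could locate values in one event. The mapping (source to source_id, the value at event_time_rule to event_time, event_type to type, version to version, the original text to payload) is inferred from the example output rather than taken from the component's code, and resolve_path and base_columns are hypothetical names. The id and event_id columns are left out because they are hashes, described in the following sections.

```python
import json
from functools import reduce


def resolve_path(event: dict, path: str):
    """Follow a dot-separated path such as "meta.date" through nested dicts."""
    return reduce(lambda node, key: node[key], path.split("."), event)


def base_columns(entry: dict, raw_line: str) -> dict:
    """Build the non-hash output columns for one NDJSON line.

    ASSUMPTION: column mapping inferred from the documented example output.
    """
    raw_line = raw_line.rstrip("\r\n")  # drop the line terminator only
    event = json.loads(raw_line)
    return {
        "source_id": entry["source"],
        "event_time": resolve_path(event, entry["event_time_rule"]),
        "type": entry["event_type"],
        "version": entry["version"],
        "payload": raw_line,  # 1:1 copy of the original event text
    }
```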
The id calculation
By default, the id of an event is calculated as an md5 hash of event_time, source and event_type.
Optionally, you can include extra values in the hash with the extra_id_rules parameter (described below). The values are dot-separated JSON paths; for example, meta.value resolves to 42 in the JSON {"foo": "bar", "meta": {"value": 42}}.
Important: The order of the paths in extra_id_rules matters, because a different order produces a different hash.
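The id calculation could be sketched in Python as follows. How the component joins the hashed values is not stated here, so this sketch assumes, hypothetically, that each value is converted with str() and the values are concatenated without a separator in the order event_time, source, event_type, followed by the extra_id_rules values. compute_id and resolve_path are illustrative names, and ids produced this way are not guaranteed to match the component's.

```python
import hashlib
from functools import reduce


def resolve_path(event: dict, path: str):
    """Follow a dot-separated path such as "meta.value" through nested dicts."""
    return reduce(lambda node, key: node[key], path.split("."), event)


def compute_id(event: dict, event_time: str, source: str, event_type: str,
               extra_id_rules=()) -> str:
    """md5 of event_time, source and event_type, then extra values in rule order.

    ASSUMPTION: str() of each value, concatenated without a separator.
    """
    parts = [event_time, source, event_type]
    parts.extend(str(resolve_path(event, rule)) for rule in extra_id_rules)
    return hashlib.md5("".join(parts).encode("utf-8")).hexdigest()


event = {"email": "robin@meiro.io", "list_id": "12345b",
         "meta": {"date": "2018-08-18T14:15:16Z"}}
a = compute_id(event, "2018-08-18T14:15:16Z", "mailchimp", "subscribed", ["email", "list_id"])
b = compute_id(event, "2018-08-18T14:15:16Z", "mailchimp", "subscribed", ["list_id", "email"])
print(a != b)  # True: swapping the rules changes the hash input
```

Because this sketch joins values without a separator, inputs such as "ab" + "c" and "a" + "bc" would hash identically; putting a delimiter between the values avoids that.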
The event_id calculation
The formula, in SQL syntax, is md5("source_id" || "type" || "version").
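The same formula in Python, as a sketch: SQL's || operator concatenates strings with no separator, and md5() in SQL (for example in PostgreSQL) returns a lowercase hexadecimal digest, which hashlib's hexdigest() also produces. UTF-8 encoding of the concatenated text is an assumption, and compute_event_id is an illustrative name.

```python
import hashlib


def compute_event_id(source_id: str, event_type: str, version: str) -> str:
    """Python counterpart of md5("source_id" || "type" || "version")."""
    # || joins the three strings directly, with no separator in between.
    concatenated = source_id + event_type + version
    return hashlib.md5(concatenated.encode("utf-8")).hexdigest()


# For the Mailchimp example this hashes the string "mailchimpsubscribed0-1-0".
event_id = compute_event_id("mailchimp", "subscribed", "0-1-0")
```

Since event_id depends only on source, type and version, every event of the same kind shares one event_id, which is what makes it usable as a reference to the events table.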
Vanilla config
{
  "events": [
    {
      "filename": "mc_subscribed_events.ndjson",
      "optional": true,
      "version": "0-1-0",
      "event_type": "subscribed",
      "source": "mailchimp",
      "event_time_rule": "meta.date",
      "extra_id_rules": ["email", "list_id"],
      "event_time_exclude": true
    }
  ]
}
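| Parameter | Description |
|---|---|
| filename | Which input file contains the events. |
| optional | If true, a missing file does not cause an error (false by default). |
| source | Hardcoded value of the source_id column (e.g. mailchimp). |
| event_time_rule | Dot-separated JSON path to the event time in each event (e.g. meta.date). |
| extra_id_rules | An array of dot-separated JSON paths whose values are included in the event id hash, in the given order. |
| event_type | Hardcoded value of the type column (e.g. subscribed). |
| event_time_exclude | If set to true, … |
| version | Hardcoded value of the version column (e.g. 0-1-0). |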
Remember: By default, all files defined in the events array must be supplied; otherwise, an error is thrown. To continue when a file is missing, set optional to true (it is false by default).
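The missing-file rule can be sketched in Python like this. files_to_process is a hypothetical helper, and the entries are assumed to be the objects of the events array in config.json.

```python
from pathlib import Path


def files_to_process(entries, input_dir):
    """Return the configured input files that exist; raise for missing required ones."""
    found = []
    for entry in entries:
        path = Path(input_dir) / entry["filename"]
        if path.is_file():
            found.append(path)
        elif not entry.get("optional", False):  # optional defaults to false
            raise FileNotFoundError(f"required input file is missing: {path}")
        # otherwise the file is optional and missing, so it is skipped
    return found


# Usage, e.g.: files_to_process(config["events"], "data/in/files")
```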