Transform events (CR)
Transform events into suitable format for customer_events table
The purpose of this component is to prepare the events' format for loading the events into the customer_events
table.
CSV structure
Before loading data into the customer_events
table, we would need to transform the data into a specific structure so that it is ready for loading.
Here are the columns required in the csv, their data type and the values expected
Column Name | Type | Values |
source_id | string | The source id of a source defined under Sources in Meiro Business Explorer It should be in snake casing and human readable. |
type | string |
The type of event defined under Events in Meiro Business Explorer It should be in snake casing and human readable. |
version | string |
The version of event defined under Events in Meiro Business Explorer It should follow this format X-X-X. |
user_identifiers | string |
A concatenated string of identifiers of a customer. Example: john.doe@meiro.io, john.doe@meiro.io420987654321 |
event_time | string - in datetime format | A datetime string in this format 'YYYY-MM-DD HH:MM:SS' This event time must be in UTC timezone |
payload | json | This is where the bulk of the data goes. So long as the payload is a valid json format, anything goes. |
Remember: Events should be supplied in csv format, with these 6 columns.
source_id, type, version, user_identifiers, event_time and payload
Sample code in Python
For example, given a file in data/in/files/events_mc_subscribed.ndjson
with the following 2 rows:
{"email":"robin@meiro.io", "meta": {"date": "2018-08-18T14:15:16Z"}, "status": "subscribed", "list_id": "12345b", "list_name": "Loyal customers"}
{"email":"foo@bar.io", "meta": {"date": "2018-08-18T15:16:17Z"}, "status": "subscribed", "list_id": "12345b", "list_name": "Loyal customers"}
The goal is to produce data/out/tables/events_mc_subscribed.csv
, which can be uploaded to customer_events
table:
source_id | type | version | user_identifiers | event_time | payload |
---|---|---|---|---|---|
mailchimp | subscribed | 1-0-0 | robin@meiro.io | 2018-08-18T14:15:16Z | {"email":"robin@"... - 1:1 copy of original} |
mailchimp | subscribed | 1-0-0 | foo@bar.io | 2018-08-18T15:16:17Z | {"email":"foo@bar.io"... - 1:1 copy of original} |
Here is a sample code in python to do this transformation.
import json, csv
from os.path import exists
if exists('/data/in/files/events_mc_subscribed.ndjson'):
with open('/data/in/files/events_mc_subscribed.ndjson','r') as inf, open('/data/out/tables/events_mc_subscribed.csv', 'w') as outf:
outcsv = csv.DictWriter(outf, ['source_id', 'type', 'version', 'user_identifiers', 'event_time', 'payload'])
outcsv.writeheader()
for l in inf:
jl = json.loads(l)
if not jl['user_id']:
continue
outcsv.writerow({
'source_id': 'mailchimp',
'type': 'subscribed',
'version': '1-0-0',
'user_identifiers': jl['email'],
'event_time': jl['meta']['date'],
'payload': json.dumps(jl)
})
Sample code in SQL
It is also possible to skip the transformation step in Python and transform the data during extraction with a SQL script. Below is an example.
Assume we have mc_subscription_status
table with these data:
status | list_id | list_name | status_date | |
robin@meiro.io | subscribed | 12345b | Loyal customers | 2018-08-18T14:15:16Z |
foo@bar.io | subscribed | 12345b | Loyal customers | 2018-08-18T15:16:17Z |
Here is a sample code in python to do this transformation.
WITH base as (
SELECT email, status, list_id, list_name, status_date
FROM mc_subscription_status
)
SELECT
'mailchimp' as source_id,
'subscribed' as type,
'1-0-0' as version,
status_date as event_time,
email as user_identifiers,
to_json(base)
FROM base
WHERE status = 'subscribed'
No Comments