ePrivacy and GPDR Cookie Consent by Cookie Consent Skip to main content

Transform events (CR)

Transform events into suitable format for customer_events table

The purpose of this component is to prepare the events' format for loading the events into the customer_events table. 

CSV structure

Before loading data into the customer_events table, we would need to transform the data into a specific structure so that it is ready for loading.

Here are the columns required in the csv, their data type and the values expected

Column Name Type Values
source_id string The source id of a source defined under Sources in Meiro Business Explorer
It should be in snake casing and human readable.
type string

The type of event defined under Events in Meiro Business Explorer

It should be in snake casing and human readable.

version string

The version of event defined under Events in Meiro Business Explorer

It should follow this format X-X-X. 
Example: 1-0-0

user_identifiers string

A concatenated string of identifiers of a customer.
It can be a single identifier (like an email or phone number) or a combination of identifiers.

Example: john.doe@meiro.io, john.doe@meiro.io420987654321

event_time string - in datetime format A datetime string in this format 'YYYY-MM-DD HH:MM:SS'
This event time must be in UTC timezone
payload json This is where the bulk of the data goes. So long as the payload is a valid json format, anything goes.

Remember: Events should be supplied in csv format, with these 6 columns.
source_id, type, version, user_identifiers, event_time and payload

Sample code in Python

For example, given a file in data/in/files/events_mc_subscribed.ndjson with the following 2 rows:

{"email":"robin@meiro.io", "meta": {"date": "2018-08-18T14:15:16Z"}, "status": "subscribed", "list_id": "12345b", "list_name": "Loyal customers"}
{"email":"foo@bar.io", "meta": {"date": "2018-08-18T15:16:17Z"}, "status": "subscribed", "list_id": "12345b", "list_name": "Loyal customers"}

The goal is to produce data/out/tables/events_mc_subscribed.csv, which can be uploaded to customer_events table:

source_id type version user_identifiers event_time payload
mailchimp subscribed 1-0-0 robin@meiro.io 2018-08-18T14:15:16Z {"email":"robin@"... - 1:1 copy of original}
mailchimp subscribed 1-0-0 foo@bar.io 2018-08-18T15:16:17Z {"email":"foo@bar.io"... - 1:1 copy of original}

Here is a sample code in python to do this transformation.

import json, csv
from os.path import exists

if exists('/data/in/files/events_mc_subscribed.ndjson'):
    with open('/data/in/files/events_mc_subscribed.ndjson','r') as inf, open('/data/out/tables/events_mc_subscribed.csv', 'w') as outf:
        outcsv = csv.DictWriter(outf, ['source_id', 'type', 'version', 'user_identifiers', 'event_time', 'payload'])
        outcsv.writeheader()
        for l in inf:
            jl = json.loads(l)
            if not jl['user_id']:
                continue
            outcsv.writerow({
                'source_id': 'mailchimp',
                'type': 'subscribed',
                'version': '1-0-0',
                'user_identifiers': jl['email'],
                'event_time': jl['meta']['date'],
                'payload': json.dumps(jl)
            })

Sample code in SQL

It is also possible to skip the transformation step in Python and transform the data during extraction with a SQL script. Below is an example.

Assume we have mc_subscription_status table with these data:

email status list_id list_name status_date
robin@meiro.io subscribed 12345b Loyal customers 2018-08-18T14:15:16Z
foo@bar.io subscribed 12345b Loyal customers 2018-08-18T15:16:17Z

Here is a sample code in python to do this transformation.

WITH base as (
  SELECT email, status, list_id, list_name, status_date
  FROM mc_subscription_status
  )
SELECT 
	'mailchimp' as source_id,
    'subscribed' as type,
    '1-0-0' as version,
    status_date as event_time,
	email as user_identifiers,
    to_json(base)
FROM base
WHERE status = 'subscribed'