
Format events

Format events into format suitable for customer_events table

The purpose of this component is to prepare events in the format required for loading them into the customer_events table. To use the component, see the repository: https://github.com/meiroio-components/cdp-format-events.git

Remember: Events should be supplied in new-line-delimited-JSON format (suffix .ndjson).
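As an illustration of the expected input format, here is a minimal Python sketch that reads new-line-delimited JSON, one event per line. The helper name read_ndjson and the sample rows are hypothetical and are not part of the component.

```python
import io
import json

def read_ndjson(stream):
    """Yield one parsed event (a dict) per non-empty line of the stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between events
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {line_number} is not valid JSON: {err}") from err

# Two sample events, each on its own line (in-memory stand-in for a .ndjson file).
sample = io.StringIO(
    '{"email": "robin@meiro.io", "status": "subscribed"}\n'
    '{"email": "foo@bar.io", "status": "subscribed"}\n'
)
events = list(read_ndjson(sample))
```

Each line must be a complete JSON document on its own; a JSON array spanning the whole file is not NDJSON.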

Configuration

For example, given a file in data/in/files/events_mc_subscribed.ndjson with the following 2 rows:

{"email":"robin@meiro.io", "meta": {"date": "2018-08-18T14:15:16Z"}, "status": "subscribed", "list_id": "12345b", "list_name": "Loyal customers"}
{"email":"foo@bar.io", "meta": {"date": "2018-08-18T15:16:17Z"}, "status": "subscribed", "list_id": "12345b", "list_name": "Loyal customers"}

The goal is to produce data/out/tables/events_mc_subscribed.csv, which can be uploaded to the customer_events table:

id | customer_entity_id | event_id | source_id | event_time | type | version | payload | created_at
md5(...) | (empty) | md5(...) | mailchimp | 2018-08-18T14:15:16Z | subscribed | 0-1-0 | {"email":"robin@"... - 1:1 copy of original} | current_utc_iso8601 stamp
md5(...) | (empty) | md5(...) | mailchimp | 2018-08-18T15:16:17Z | subscribed | 0-1-0 | {"email":"foo@bar.io"... - 1:1 copy of original} | current_utc_iso8601 stamp

In a nutshell, this component extracts some values from the NDJSON events in order to:

  • construct the id of the event
  • construct the event_id (a reference to the events table)
  • extract event time
  • set the required columns

The config.json describes where to find these values in the event JSONs.

The id calculation

By default, the event id is calculated as an md5 of event_time, source and event_type.
Optionally, you can specify extra values to be included in the hash with the extra_id_rules parameter (below). The values are dot-separated JSON paths; for example, meta.value resolves to 42 in this JSON: {"foo": "bar", "meta": {"value": 42}}

Important: The order of the extra_id_rules DOES matter, as we are dealing with hashes.
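The id calculation described above can be sketched in Python as follows. This is only an illustration: the function names resolve_path and compute_id are hypothetical, and the way the parts are combined (stringified and joined without a separator, encoded as UTF-8) is an assumption, so the hashes will not necessarily match those produced by the component.

```python
import hashlib

def resolve_path(event, path):
    """Follow a dot-separated path such as "meta.value" through nested dicts."""
    value = event
    for key in path.split("."):
        value = value[key]
    return value

def compute_id(event, source, event_type, event_time_rule, extra_id_rules=()):
    """md5 over event_time, source, event_type and the extra values, in that order.

    ASSUMPTION: parts are stringified and joined with no separator; the real
    component may combine them differently.
    """
    parts = [resolve_path(event, event_time_rule), source, event_type]
    parts += [resolve_path(event, rule) for rule in extra_id_rules]
    joined = "".join(str(part) for part in parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

event = {"email": "robin@meiro.io", "list_id": "12345b",
         "meta": {"date": "2018-08-18T14:15:16Z"}}

# The same rules in a different order give a different id.
id_a = compute_id(event, "mailchimp", "subscribed", "meta.date", ["email", "list_id"])
id_b = compute_id(event, "mailchimp", "subscribed", "meta.date", ["list_id", "email"])
```

Because the hash input is order-sensitive, changing the order of extra_id_rules between runs would assign new ids to events that were already loaded.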

The event_id calculation

The formula, in sql syntax, is md5("source_id" || "type" || "version").
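For reference, a minimal Python equivalent of this formula might look like the sketch below. The function name compute_event_id is hypothetical, and UTF-8 encoding of the concatenated string is an assumption about how the text is hashed.

```python
import hashlib

def compute_event_id(source_id, event_type, version):
    # Mirrors md5("source_id" || "type" || "version"):
    # plain concatenation with no separator, then md5 as a hex digest.
    concatenated = f"{source_id}{event_type}{version}"
    return hashlib.md5(concatenated.encode("utf-8")).hexdigest()

event_id = compute_event_id("mailchimp", "subscribed", "0-1-0")
```

Since only source_id, type and version go into the hash, every event of the same kind shares the same event_id.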

Vanilla config

{
  "events": [
    {
      "filename": "mc_subscribed_events.ndjson",
      "optional": true,
      "version": "0-1-0",
      "event_type": "subscribed",
      "source": "mailchimp",
      "event_time_rule": "meta.date",
      "extra_id_rules": ["email", "list_id"],
      "event_time_exclude": true
    }
  ]
}
  • filename -

    Which input file contains the events.

  • optional -

    If true, doesn't raise an error if the file is not found (which can happen if there are no events for this particular batch).

  • source -

    Hardcoded source_id (as defined in the sources  table).

  • event_time_rule - 

    "path.to.event_time.in.payload" (jq style) used to populate the event_time column

  • extra_id_rules -

    An array of values which are included in the event id calculation. This must include values that uniquely identify the event (e.g. a customer_id plus the event's id in the source system). The values of this array are "paths" (= rules) describing where to find the actual values in the event JSON.

  • event_type -

    If set to '' or null, or left undefined, it is inferred from the filename. Used as the value of the customer_events.type column.

  • event_time_exclude -

    If set to true, event_time won't be included in the event id calculation, and the id will be calculated as the md5 of source, event_type and the extra ids (if available). If not set or set to false, event_time will be included in the calculation as usual.

  • version -

    Hardcoded value of the version column.

Remember: By default, all files defined in the events array need to be supplied; otherwise an error is thrown. If you want to continue on missing files, set optional to true (false by default).
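The behaviour of the optional flag can be pictured with the following Python sketch. It is a hypothetical illustration, not the component's actual code: the function name select_input_files and the choice of FileNotFoundError are assumptions.

```python
import os
import tempfile

def select_input_files(events_config, input_dir):
    """Return the config entries whose files exist.

    Raises FileNotFoundError for a missing file unless its entry sets
    "optional" to true (it is treated as false when absent).
    """
    present = []
    for entry in events_config:
        path = os.path.join(input_dir, entry["filename"])
        if os.path.isfile(path):
            present.append(entry)
        elif not entry.get("optional", False):
            raise FileNotFoundError(f"required events file not found: {path}")
    return present

# An empty directory stands in for a batch with no events.
with tempfile.TemporaryDirectory() as input_dir:
    optional_only = [{"filename": "mc_subscribed_events.ndjson", "optional": True}]
    found = select_input_files(optional_only, input_dir)  # missing but optional: no error
```

With "optional" removed from the entry, the same call would raise instead of returning an empty list.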