ePrivacy and GPDR Cookie Consent by Cookie Consent Skip to main content

Compound attributes

This article describes how to implement attribute with the compound data type.

Imagine use-case with flights in CDP. Flight can have multiple attributes like Originorigin city, Destinationdestination city, Datedate of flight and Price.price.

With regular data types like string, boolean, date, datetime, int or float, we are able to create separate attributes like:

List of destination cities - London, Prague, …

Last destination city - London

Total revenue from all flights in USD - 1024

Dates of all flights - 2018-12-01, 2019-01-01, …

However, using separate attributes one cannot build a segment containing “all customers who flew to London in December 2018”, because we do not know the which date of flight belongs to which destination. That is why we need attributes of the compound data type.

The goal is to create 1 attribute with the origin, destination, date and price as sub-attributes (dimensions) of the attribute.

This is done by creating a new, independent, isolated attribute with the compound data type (sub-attributes can have any of standard scalar data types) in the attributes table. Then, calculating values for the attribute.

In attributes table create a new attribute (row). Fill in all standard values like ID, name, description, all flags, etc. Then, for the data type, fill compound like this:

compound([["origin","Origin","string"],["destination","Destination","string"],["date","Date","date"],["price","Price","float"]])

To see, what is going on, let’s break it down with some whitespace:

compound(
	[
		["origin","Origin","string"],
		["destination","Destination","string"],
		["date","Date","date"],
		["price","Price","float"]
	]
)

As you can see, the compound() is the name of the data type. Inside parenthesesparentheses, there is a JSON array or arrays. The outer array is the array of sub-attributes. Each nested array is one sub-attribute.

The meaning of elements in the nested array is based on order. FirstThe first value is ID of sub-attribute (needs to be unique on sub-attribute level - no 2 sub-attributes in one attribute can have the same ID). SecondThe second value is athe name of the sub-attribute (shown in UI). ThirdThe third value is the data type of the sub-attribute, which can be any attribute scalar type: string, date, datetime, boolean, int or float. Sub-attribute can not be of type compound (no nested compound attributes). We are done with the attribute definition.

The value for such attribute will look like this:

["Prague","London","2018-12-01","128"]

The value is, again, a JSON array of 4 elements. First element is Origin,the origin, second Destination,destination, third Datedate and fourth Price.price. As you can see - the values follow the order of the sub-attributes defined in the attributes table’s column data_type. Note that all elements in the array must have double-quotes - be strings. The array must be homogenous.

There are multiple ways on how to calculate this attribute, here is one suggestion on how to do it.

SELECT
	customer_entity_id,
	'attribute_id' as "attribute_id",
	json_build_array(
		"origin"::text, 
		"destination"::text, 
		"date"::text, 
		"price"::text
	) as "value"
FROM (
	SELECT
		"customer_entity_id",
		payload->>'origin_city' as "origin",
		payload->>'destination_city' as "destination",
		payload->>'flight_date' as "date",
		payload->>'flight_price' as "price"
	FROM
		"public"."customer_events"
	WHERE
		"source_id"='flights_system'
		 AND "version" IN ('1-0-0')
		 AND "type" IN ('flight')
		 AND "customer_entity_id" IS NOT null
	) AS prep

The whole idea behind this query is that one takes values of interest from the customer_events and combine it into the JSON array using json_build_array function.