Customer Entity Retention Principles
What is Customer Entity Retention?
When ingesting events into the CDP, we must decide how long to keep them in the system. In general, we want to retain events that carry lasting business value and set an expiry on events that stop being useful after a period of time.
For example, web page views typically have little business value after 6 to 12 months (although this depends on the nature of the client's business).
Customer Entity Retention (CER) is our approach to optimizing the performance of the client's instance and the value of the captured data by keeping only relevant events. CER is defined as the number of days to retain certain types of events. It is determined at the beginning of the project and defined in the Order Form. However, throughout the engagement with the customer there are natural checkpoints for reviewing whether the CER limits are still reasonable and practical:
- If the customer's data volume is quickly exceeding their license limits
- If there are new use cases that may require more historical events
Best Practice: Keep ALL transaction events - do not set any Customer Entity Retention rule for transactions.
How to execute CER
We use two approaches to Customer Entity Retention: entity-based and event-based.
Entity-based
How it works:
If a customer entity has all of its identifying attributes NULL and has received no new event for XY seconds, the entity is deleted.
How to use it:
- In the cdp_attr.attributes table there is a flag, is_identifying, that marks whether an attribute is identifying. By default all attributes are identifying. Change the flag from 1 to 0 to mark an attribute as non-identifying (it is then not used in the customer retention calculation).
- In public.settings there is a row with id customer_entity_ttl and an integer value giving the period of inactivity (in seconds) after which the entity is removed.
Example: The flag is_identifying = 0 is set for all attributes except the email and phone attributes. In the public.settings table there is a row with id customer_entity_ttl and value 7776000 (90 days in seconds).
- Customer entity A has both the email and phone attributes empty (IS NULL in pivoted_customer_attributes), and now() - max(event_time) over all events belonging to this entity is greater than 7776000: the entity is removed.
- Customer entity B has now() - max(event_time) over all events belonging to this entity greater than 7776000 and the phone attribute is NULL, but the email attribute has a value: the entity IS NOT REMOVED.
Both conditions (all identifying attributes are NULL, and no activity for XY seconds) must be met for a customer entity to be removed.
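The two-condition rule in the example above can be sketched in Python. This is only an illustration of the decision logic, not the CDP's actual implementation: the function name should_remove_entity, the dictionary representation of an entity's identifying attributes, and the fixed timestamps are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional


def should_remove_entity(
    identifying_attrs: Mapping[str, Optional[str]],
    last_event_time: datetime,
    now: datetime,
    ttl_seconds: int,
) -> bool:
    """Hypothetical helper: True only when BOTH removal conditions hold."""
    # Condition 1: every identifying attribute is NULL (None here).
    # Note: with no identifying attributes at all, all() is True, so in this
    # sketch inactivity alone would decide (an assumption, not confirmed).
    all_identifying_null = all(v is None for v in identifying_attrs.values())
    # Condition 2: now() - max(event_time) is greater than the TTL.
    inactive_seconds = (now - last_event_time).total_seconds()
    return all_identifying_null and inactive_seconds > ttl_seconds


# 90 days expressed in seconds, as stored under customer_entity_ttl.
TTL_SECONDS = 90 * 24 * 60 * 60

now = datetime(2024, 1, 1, tzinfo=timezone.utc)   # illustrative timestamp
last_event = now - timedelta(days=91)              # inactive for 91 days

# Only email and phone are identifying in this example.
entity_a = {"email": None, "phone": None}
entity_b = {"email": "b@example.com", "phone": None}

remove_a = should_remove_entity(entity_a, last_event, now, TTL_SECONDS)
remove_b = should_remove_entity(entity_b, last_event, now, TTL_SECONDS)
print(remove_a, remove_b)
```

Entity A satisfies both conditions and is selected for removal, while entity B is kept because its email attribute still has a value.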
Event-based
How it works:
If there are events older than XY seconds, remove them.
How to use it:
There is a table cdp_ce.events with the columns id and ttl (time-to-live). Add a new row for each event ID you want to remove, with the TTL in seconds.
Example: There is a row for the event ID of page_view from ME with ttl = 7776000 (90 days in seconds). Any page_view events assigned to a customer entity that are older than 90 days are removed. No other events have a record in this table, so they will NOT be removed.
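As a rough sketch of the event-based rule, the snippet below selects events for removal from a per-event-ID TTL mapping. The function name select_expired_events, the dictionary keys event_id and event_time, and the sample events are hypothetical; the mapping stands in for the id and ttl columns described above and does not reflect how the CDP actually performs the deletion.

```python
from datetime import datetime, timedelta, timezone


def select_expired_events(events, ttl_by_event_id, now):
    """Hypothetical helper: return the events whose age exceeds their TTL."""
    expired = []
    for event in events:
        ttl_seconds = ttl_by_event_id.get(event["event_id"])
        if ttl_seconds is None:
            # No TTL row for this event ID: the event is never removed.
            continue
        age_seconds = (now - event["event_time"]).total_seconds()
        if age_seconds > ttl_seconds:
            expired.append(event)
    return expired


now = datetime(2024, 1, 1, tzinfo=timezone.utc)  # illustrative timestamp

# Only page_view has a TTL configured (90 days in seconds).
ttl_by_event_id = {"page_view": 7776000}

events = [
    {"event_id": "page_view", "event_time": now - timedelta(days=91)},
    {"event_id": "page_view", "event_time": now - timedelta(days=10)},
    {"event_id": "transaction", "event_time": now - timedelta(days=400)},
]

expired = select_expired_events(events, ttl_by_event_id, now)
print(len(expired))
```

Only the 91-day-old page_view is selected. The 400-day-old transaction is retained because no TTL is configured for it, which is consistent with the best practice of keeping all transaction events.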