
Customer Entity Retention Principles

What is Customer Entity Retention? 

When ingesting events into CDP, we must be mindful of how long these events stay in the system. In general, we want to retain events that have lasting business value and set an expiry on events that stop serving a business purpose after a period of time.

For example, page views on the web typically lose their business value after 6 to 12 months (this depends on the nature of the client's business).

Customer Entity Retention (CER) is our approach to optimizing the performance of the client's instance and the value of captured data by keeping only relevant events. CER is defined as the number of days to retain certain types of events. It is determined at the beginning of the project and defined in the Order Form. However, throughout the engagement with the customer there are natural checkpoints to review whether the CER limits are still reasonable and practical:

  • If the customer's data volume is quickly exceeding their license limits
  • If there are new use cases that may require more historical events

Best Practice: Keep ALL transaction events - do not set any Customer Entity Retention rule for transactions.  

How to execute CER

We use two approaches to Customer Entity Retention: entity-based and event-based.

Entity-based

How it works:

If all identifying attributes of a customer entity are NULL and the entity has had no new event for XY seconds, the entity is deleted.

How to use it:
  1. In the cdp_attr.attributes table, the flag is_identifying marks whether an attribute is identifying. By default, all attributes are identifying. Change the flag from 1 to 0 to mark an attribute as non-identifying, so that it is not used in the customer retention calculation.
  2. In the public.settings table, the row with id customer_entity_ttl holds an integer value: the period of inactivity (in seconds) after which the entity is removed.

Example: The flag is_identifying is set to 0 for all attributes except email and phone. In the public.settings table, the row with id customer_entity_ttl has the value 7776000 (90 days in seconds).

  • Customer entity A has both email and phone empty (IS NULL in pivoted_customer_attributes), and now() - max(event_time) across all events belonging to this entity is greater than 7776000. The entity is removed.
  • Customer entity B also has now() - max(event_time) greater than 7776000 and its phone attribute is NULL, but its email attribute has a value. The entity is NOT removed.

Both conditions (all identifying attributes are NULL, and no activity for XY seconds) must be met for a customer entity to be removed.
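
The two-condition rule for entity-based retention can be sketched in Python. This is only an illustration of the logic, not the actual CDP implementation: the function should_remove_entity, its parameters, and the sample dates and values are hypothetical, and the identifying attributes are passed in as a plain dictionary instead of being read from pivoted_customer_attributes.

```python
from datetime import datetime, timedelta, timezone

# 90 days in seconds, matching the customer_entity_ttl value in the example
CUSTOMER_ENTITY_TTL = 90 * 24 * 60 * 60  # 7776000


def should_remove_entity(identifying_attrs, last_event_time, now,
                         ttl_seconds=CUSTOMER_ENTITY_TTL):
    """Hypothetical helper: True only if BOTH retention conditions hold."""
    # Condition 1: every identifying attribute is empty (NULL)
    all_null = all(value is None for value in identifying_attrs.values())
    # Condition 2: the newest event is older than the configured TTL
    inactive = (now - last_event_time).total_seconds() > ttl_seconds
    return all_null and inactive


# Assumed sample data: both entities last had an event 120 days ago
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
last_event = now - timedelta(days=120)

entity_a = {"email": None, "phone": None}
entity_b = {"email": "b@example.com", "phone": None}

print(should_remove_entity(entity_a, last_event, now))  # True
print(should_remove_entity(entity_b, last_event, now))  # False
```

Entity B is kept even though it is just as inactive as entity A, because one identifying attribute (email) still has a value.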

Event-based

How it works:

If there are events of a configured type that are older than XY seconds, they are removed.

How to use it:

The table cdp_ce.events has the columns id and ttl (time-to-live). Add a row for each event ID whose events should expire, with the TTL in seconds.

Example: There is a row for the event ID of page_view from ME with ttl = 7776000 (90 days in seconds). If a customer entity has page_view events older than 90 days, those events are removed. No other event types have a record in this table, so they will NOT be removed.
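
The event-based rule can be sketched in Python as well. This is an illustration under stated assumptions, not the actual CDP implementation: the EVENT_TTL dictionary stands in for the rows of cdp_ce.events, and the function expired_events, the (event ID, event time) tuples, and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Stand-in for rows of cdp_ce.events: event ID -> TTL in seconds (assumed shape)
EVENT_TTL = {"page_view": 7776000}  # 90 days


def expired_events(events, now, ttl_by_event_id=EVENT_TTL):
    """Hypothetical helper: return events whose ID has a TTL that has elapsed."""
    expired = []
    for event_id, event_time in events:
        ttl = ttl_by_event_id.get(event_id)
        # Events without a TTL row are never removed
        if ttl is not None and (now - event_time).total_seconds() > ttl:
            expired.append((event_id, event_time))
    return expired


# Assumed sample data
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
events = [
    ("page_view", now - timedelta(days=120)),    # older than 90 days: removed
    ("page_view", now - timedelta(days=10)),     # recent: kept
    ("transaction", now - timedelta(days=400)),  # no TTL row: kept
]

print([event_id for event_id, _ in expired_events(events, now)])  # ['page_view']
```

The old transaction event survives because no TTL is configured for it, which matches the best practice of keeping all transaction events.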