Problems with large customer entities and how to investigate and solve them
Problems with large customer entities
When a customer entity grows too large, it has the following impact:
- Some attributes will be missing values.
- Some customers will be unsearchable by attributes such as e-mail, because those attribute values are missing.
- Some customers cannot be segmented, because those attribute values are missing.
- Some customers cannot be sent to exports, because those attribute values are missing.
The exact reason is as follows:
While processing attributes, the algorithm only considers the latest 10,000 events of a customer entity. This prevents the process from getting stuck on the recalculation of a single entity and leaving no time for other entities to be processed. Each attribute is calculated over all events that belong to an entity, so recalculating the attributes of an entity that is too large (> 10,000 events) would take a very long time.
In addition, there is a limit of 250 values per attribute per entity. This limit exists to improve search results during segmentation, which uses OpenSearch. It is very unlikely for an attribute to have more than 250 values unless a customer entity has grown too large, which is why the limit is set to 250.
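The effect of these two limits can be illustrated with a minimal Python sketch. This is not the actual implementation: the event shape (`timestamp`, `attributes`), the function name `compute_attributes`, and the choice to keep the first 250 distinct values encountered are assumptions made for illustration. Only the 10,000 and 250 limits come from the description above.

```python
from collections import defaultdict

MAX_EVENTS = 10_000  # from the text: only the latest 10,000 events are considered
MAX_VALUES = 250     # from the text: at most 250 values per attribute per entity


def compute_attributes(events):
    """Collect attribute values for one entity under both limits.

    `events` is a list of dicts with the hypothetical keys
    'timestamp' (sortable) and 'attributes' (dict of name -> value).
    """
    # Keep only the most recent MAX_EVENTS events; older ones are ignored.
    latest = sorted(events, key=lambda e: e["timestamp"])[-MAX_EVENTS:]

    values = defaultdict(list)
    for event in latest:
        for name, value in event["attributes"].items():
            # Assumption: once an attribute holds MAX_VALUES distinct
            # values, further values are dropped.
            if value not in values[name] and len(values[name]) < MAX_VALUES:
                values[name].append(value)
    return dict(values)


# Hypothetical entity: the only event carrying an e-mail is also the oldest
# of 10,001 events, so it falls outside the window.
events = [{"timestamp": 0, "attributes": {"email": "a@example.com"}}]
events += [
    {"timestamp": i, "attributes": {"page": f"/p/{i}"}}
    for i in range(1, 10_001)
]
attrs = compute_attributes(events)
```

In this sketch `attrs` has no `email` key at all, and `page` is capped at 250 values, which mirrors how a customer in an oversized entity can end up unsearchable by e-mail.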
Investigating large customer entities
In addition to the QA checks that you can do here, you can use the following steps as a guide when investigating a customer entity that has grown too large.
Example problems and causes
Remember that profile stitching is only as good as the uniqueness of the identifiers used in the stitching rules, so most problems arise because an identifier is not as unique as it is assumed to be. Drawing on previous real-world examples can help in finding the root cause of a large entity, so here is a list of reasons, based on our experience, why an entity can grow so large:
1. A device identifier is being used in a stitching rule (refer to issue 3 in this doc), and many users, or even test/bot users, used the same device to perform actions on the website/app, which caused the profile stitching process to assume that they all belong to the same customer entity.
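One way to check whether a device identifier is shared like this is to count how many distinct users appear behind each identifier value. The sketch below is only an illustration under assumptions: the field names `device_id` and `email`, the function name `find_shared_identifiers`, and the threshold are hypothetical and should be replaced with whatever your data actually contains.

```python
from collections import defaultdict


def find_shared_identifiers(events, id_key="device_id", user_key="email", threshold=5):
    """Return identifier values linked to more than `threshold` distinct users.

    `events` is a list of dicts; `id_key` and `user_key` are hypothetical
    field names for the stitching identifier and a more unique identifier.
    """
    users_per_id = defaultdict(set)
    for event in events:
        ident = event.get(id_key)
        user = event.get(user_key)
        # Skip events that lack either field.
        if ident is not None and user is not None:
            users_per_id[ident].add(user)

    # Keep only identifiers shared by suspiciously many distinct users.
    return {
        ident: len(users)
        for ident, users in users_per_id.items()
        if len(users) > threshold
    }


# Hypothetical sample: device "d1" is used by three different e-mails.
sample = [
    {"device_id": "d1", "email": "a@example.com"},
    {"device_id": "d1", "email": "b@example.com"},
    {"device_id": "d1", "email": "c@example.com"},
    {"device_id": "d2", "email": "a@example.com"},
]
shared = find_shared_identifiers(sample, threshold=2)
```

An identifier value that shows up in the result is a candidate for a shared kiosk, test, or bot device, and is worth inspecting before choosing a solution.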
Solution to 1:
There are multiple ways to solve this, depending on the data that is available or can be made available to you.