How to create context queries that require information of every customer entity
In some use-cases, you may find yourself having to create context queries to query data on the entire
However, we need to keep in mind that a full-scan on the
public.customer_events table would eat up a lot of disk throughput, and has the potential of killing the database by using up more than it can handle.
In order to avoid that, instead create an attribute on the value you are trying to calculate with the context query - eg. top 90th percentile of total page views grouped by customer. After which, you can directly make use of the
public.pivoted_customer_attributes table to calculate the percentile rather than aggregate and group by on the
An example to illustrate how you might implement a context query that requires to query the entire customer dataset:
This query is getting the top 95 percentile value of the total number of events per customer
select percentile_disc(0.95) WITHIN GROUP (ORDER BY count_all) as value
count(*) as count_all
where "customer_entity_id" IS NOT NULL
and source_id = 'me_web'
group by 1
Best practice approach for implementing the query
Add attribute with definition:
This calculates the total number of events for each customer entity.
New context query:
This query will then get the 95th percentile value from the calculated attribute.
select percentile_disc(0.95) WITHIN GROUP (
ORDER BY count_all
) as value
select web_num_events::int as count_all
WHERE web_num_events IS NOT NULL
This query will give you the maximum number of events at every 10th percentile
with tmp as (
select max(value) as value, ntile * 10 as percentile
from (select web_num_events as value, ntile(10) over (order by web_num_events::int) as ntile
where web_num_events is not null) as buckets
group by 2
order by 2
select jsonb_object(array_agg(value)::text, array_agg(percentile)::text) as value