How to create context queries that require information of every customer entity
In some use-cases, you may find yourself having to create context queries to query data on the entire public.customer_events
table.
However, we need to keep in mind that a full-scan on the public.customer_events
table would eat up a lot of disk throughput, and has the potential of killing the database by using up more than it can handle.
In order to avoid that, instead create an attribute on the value you are trying to calculate with the context query - eg. top 90th percentile of total page views grouped by customer. After which, you can directly make use of the public.pivoted_customer_attributes
table to calculate the percentile rather than aggregate and group by on the public.customer_events
table.
An example to illustrate how you might implement a context query that requires to query the entire customer dataset:
Original approach
Context query:
This query is getting the top 95 percentile value of the total number of events per customer
select percentile_disc(0.95) WITHIN GROUP (ORDER BY count_all) as value
from (
select
customer_entity_id,
count(*) as count_all
from public.customer_events
where "customer_entity_id" IS NOT NULL
and source_id = 'me_web'
group by 1
) z;
Best practice approach for implementing the query
Add attribute with definition:
This calculates the total number of events for each customer entity.
{
"id": "web_num_events",
"type": "sum",
"types": [],
"value": "1",
"filter": "",
"weight": "1",
"sources": [
"me_web"
],
"versions": [
"1-0-0",
"1-0-1"
],
"outer_value": "value",
"event_time": "event_time"
}
New context query:
This query will then get the 95th percentile value from the calculated attribute.
select percentile_disc(0.95) WITHIN GROUP (
ORDER BY count_all
) as value
from (
select web_num_events[1]::int as count_all
from public.pivoted_customer_attributes
WHERE web_num_events IS NOT NULL
) z
This query will give you the maximum number of events at every 10th percentile
with tmp as (
select max(value) as value, ntile * 10 as percentile
from (select web_num_events[1] as value, ntile(10) over (order by web_num_events[1]::int) as ntile
from public.pivoted_customer_attributes
where web_num_events is not null) as buckets
group by 2
order by 2
)
select jsonb_object(array_agg(value)::text[], array_agg(percentile)::text[]) as value
from tmp