ePrivacy and GPDR Cookie Consent by Cookie Consent Skip to main content

How to create context queries that require information of every customer entity

In some use-cases, you may find yourself having to create context queries to query data on the entire public.customer_events table. 

However, we need to keep in mind that a full-scan on the public.customer_events table would eat up a lot of disk throughput, and has the potential of killing the database by using up more than it can handle. 

In order to avoid that, instead create an attribute on the value you are trying to calculate with the context query - eg. top 90th percentile of total page views grouped by customer. After which, you can directly make use of the public.pivoted_customer_attributes table to calculate the percentile rather than aggregate and group by on the public.customer_events table.

An example to illustrate how you might implement a context query that requires to query the entire customer dataset:

Original approach

Context query:

This query is getting the top 95 percentile value of the total number of events per customer

select percentile_disc(0.95) WITHIN GROUP (ORDER BY count_all) as value 
from (
  select 
    customer_entity_id,
    count(*) as count_all
  from public.customer_events
  where "customer_entity_id" IS NOT NULL 
  and source_id = 'me_web'
  group by 1 
) z;


Best practice approach for implementing the query

Add attribute with definition:

This calculates the total number of events for each customer entity.

{
    "id": "web_num_events",
    "type": "sum",
    "types": [],
    "value": "1",
    "filter": "",
    "weight": "1",
    "sources": [
        "me_web"
    ],
    "versions": [
        "1-0-0",
        "1-0-1"
    ],
    "outer_value": "value",
    "event_time": "event_time"
}

New context query:

This query will then get the 95th percentile value from the calculated attribute.

select percentile_disc(0.95) WITHIN GROUP (
ORDER BY count_all
) as value
from (
select web_num_events[1]::int as count_all
from public.pivoted_customer_attributes
WHERE web_num_events IS NOT NULL
) z


This query will give you the maximum number of events at every 10th percentile

with tmp as (
select max(value) as value, ntile * 10 as percentile
from (select web_num_events[1] as value, ntile(10) over (order by web_num_events[1]::int) as ntile
from public.pivoted_customer_attributes
where web_num_events is not null) as buckets
group by 2
order by 2
)
select jsonb_object(array_agg(value)::text[], array_agg(percentile)::text[]) as value
from tmp