
Guide to investigating segment export performance

When a client runs many segment exports frequently on large volumes of data, exports can pile up in the queue and take longer than expected to finish. In that case, the first step is to analyze which exports take the most time to finish, how long they sit in the queue, and so on.

Find out which destinations are taking the longest to run

The following query returns the average time it takes to export to each destination, which is a good starting point for identifying the destinations that are affected the most:

with exports as (
    -- Combine the current and historical export records so that every
    -- recorded status of an export is available in one place
    select
        segment_id,
        id as segment_export_id,
        segment_export_destination_id as target_id,
        user_id,
        status,
        created
    from segment_exports
    union all
    select
        segment_id,
        segment_export_id,
        segment_export_destination_id as target_id,
        user_id,
        status,
        created
    from segment_exports_history
), grouped as (
    -- One row per export; the duration is the time between the first and
    -- the last recorded status of that export
    select
        min(segment_id) as segment_id,
        array_agg(status::text) as statuses,
        min(target_id) as target_id,
        min(user_id) as user_id,
        min(created) as min_created,
        max(created) as max_created,
        max(created) - min(created) as duration
    from exports
    group by segment_export_id
    order by min_created desc
    limit 5000  -- only consider the 5000 most recent exports
)
-- Average duration per destination, for exports that went through the
-- full waiting -> running -> finished lifecycle
select
    target_id,
    avg(duration) as avg_duration
from grouped
where statuses @> array['waiting', 'running', 'finished']
group by 1
order by avg_duration desc;  -- slowest destinations first

With the results of the above query, the next step is to investigate the destinations with the highest average duration more closely, as in the sketch below.
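
For example, a minimal follow-up sketch that lists the slowest individual exports to a single destination, reusing the same tables as above (the target_id value 123 is a placeholder; substitute the destination id from the previous query's output):

-- Follow-up sketch: the slowest individual exports to one destination
-- (123 is a placeholder target_id taken from the previous query's output)
with exports as (
    select
        id as segment_export_id,
        segment_id,
        segment_export_destination_id as target_id,
        created
    from segment_exports
    union all
    select
        segment_export_id,
        segment_id,
        segment_export_destination_id as target_id,
        created
    from segment_exports_history
)
select
    segment_export_id,
    min(segment_id) as segment_id,
    min(created) as min_created,
    max(created) - min(created) as duration
from exports
where target_id = 123  -- replace with the destination under investigation
group by segment_export_id
order by duration desc
limit 100;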

Find out which segment exports are taking the longest to run

The following query gets the segment id for each export and orders the exports by how long they took to run, filtered by the export's created date (you can modify this date range):

with exports as (
    -- 'finished' records from the current exports table
    select
        segment_id,
        id as segment_export_id,
        segment_export_destination_id as target_id,
        user_id,
        status,
        created
    from segment_exports
    where status = 'finished'
    union all
    -- 'running' records for the same exports from the history table
    select
        segment_id,
        segment_export_id,
        segment_export_destination_id as target_id,
        user_id,
        status,
        created
    from segment_exports_history
    where status = 'running'
)
-- Duration is the time between the 'running' and 'finished' records of each export
select
    segment_export_id,
    min(segment_id) as segment_id,
    array_agg(status) as statuses,
    min(created) as min_created,
    max(created) as max_created,
    max(created) - min(created) as duration
from exports
group by segment_export_id
having min(created) >= '2023-04-01'  -- adjust this date range as needed
order by 6 desc  -- longest-running exports first
limit 10000;

With the results of the above query, you can start to diagnose the segments and destinations that take a long time to run, and determine whether the cause is the data volume itself or an MI workspace that does not handle large volumes of data efficiently.
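
One way to narrow this down is to roll the per-export durations up per segment; a segment that is consistently slow across its exports points at data volume rather than a single destination's setup. A minimal sketch using the same tables and status filters as the query above:

-- Sketch: average export duration per segment, to see whether slowness is
-- concentrated in a few (presumably large) segments
with exports as (
    select
        id as segment_export_id,
        segment_id,
        created
    from segment_exports
    where status = 'finished'
    union all
    select
        segment_export_id,
        segment_id,
        created
    from segment_exports_history
    where status = 'running'
), per_export as (
    select
        segment_export_id,
        min(segment_id) as segment_id,
        max(created) - min(created) as duration
    from exports
    group by segment_export_id
    having min(created) >= '2023-04-01'  -- same adjustable date range as above
)
select
    segment_id,
    count(*) as export_count,
    avg(duration) as avg_duration
from per_export
group by segment_id
order by avg_duration desc
limit 100;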

Suggestions for resolution and mitigation

To deal with segments that export huge amounts of data, explore these options:

  • Export data incrementally, for example only the latest 7 days of data, instead of exporting the entire segment every time (see the sketch after this list).

  • Consider reducing the frequency of the scheduled export.

  • Think about the business use case and see whether the segment can be made more granular, and therefore smaller in volume.
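
The incremental idea from the first bullet can be illustrated with a minimal sketch; the table and column names here (segment_customer_events, event_time) are hypothetical placeholders for whatever data the export actually reads:

-- Hypothetical incremental filter: export only the last 7 days of data
-- instead of the whole segment (table and column names are placeholders)
select *
from segment_customer_events
where event_time >= now() - interval '7 days';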


If the issue lies with the MI workspace, it has to be dealt with based on the setup of each destination, but some general suggestions are:

  • If it’s possible to “upsert” the segment, parallelize the process by splitting the export into smaller chunks (see the sketch after this list). Keep in mind that this uses up more workers in MI, which can be a problem if MI already has many jobs running in parallel in other workspaces.

  • If the MI workspace does any kind of sorting or deduplication, consider doing it in the command line processor rather than in a Python component; in some cases this can be faster.
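
A minimal sketch of the chunking idea from the first suggestion above, assuming the exported rows carry a numeric customer_id (a hypothetical column, as is the table name); each of the four parallel jobs exports one bucket:

-- Hypothetical split of an export into 4 buckets so that each bucket can be
-- upserted by a separate parallel job (customer_id is a placeholder column)
select *
from segment_customer_events
where mod(customer_id, 4) = 0;  -- bucket 0 of 4; the other jobs use 1, 2 and 3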