How to clean wrongly stitched customer entities without rerunning PS
PS edge cases
It may happen when setting up PS rules, that you miss out some edge cases and this caused stitching customer entities wrongly. Examples of this cases can be:
ps_dentifier
in the payload was parsed to text valueNone
or0
. CPS is checking if PS identifieris not NULL
, butNone
will be a valid value for the algorythm and it will stitch all customer event wherepd_identifier = "None"
to one huge entity.ps_dentifier
was parsed intointeger
andfloat
for different data sources:12345
and12345.0
are different values for CPS and entities with these values won't be stitched.
Usually at the very beginning of the project when only small samples of the data were loaded, it makes more sense to rerun PS from scratch. However if big volume of the data was already processed rerunning PS from zero can take days to catchup and it is simply not effective. In this case it might be more sufficient to clean only problematic customer entities and customer events.
Preparation
Before the cleaning:
- Define the problematic customer entity(s). If it is just one huge entity which needs to be "unstitched" - save it's value, if there many entities - decide how to filter these entities.
- Make sure you know how PS rule should be fixed and test it on a relevant data sample
- Disable CPS component in MI and stop all the processes with app_name "Meiro CPS" from running.
--Check DB there is no query with app_name "Meiro CPS" running:
SELECT * FROM pg_stat_activity;
--If there is, kill it:
SELECT pg_terminate_backend(pid);
- Create a separate schema where tables with entities and events which needs to be restiched will be stored.
Warning: Make sure CPS is not running until you finish with the cleaning.
How to make the cleaning
Step 1
Create a table where entities to delete will be stored.
create table temp_table.entites_to_clean as
select customer_entity_id from public.customer_events
where --condition based on which entity(ies) are filtered
Step 2
create table temp_table.events_to_clean as
select id from public.customer_events
where customer_entity_id in (select * from temp_table.entities_to_clean)
Step 3
cdp_ps.matching
table (this will "unstitch" the events).delete from cdp_ps.matching
where matching.customer_entity_id in (select customer_entity_id::uuid from temp_table.entites_to_clean)
Step 4
cdp_attr.customer_entities_queue_to_delete
to clean everything related to these customer_entity_id
s.insert into cdp_attr.customer_entities_queue_to_delete(customer_entity_id)
select distinct customer_entity_id::uuid from temp_table.entites_to_clean
Step 5
cdp_ps.customer_events_to_process
to restitch the events.insert into cdp_ps.customer_events_to_process(customer_event_id)
select * from temp_table.events_to_clean
Step 6
Update PS rules and enable CPS for running.
Important: Run the queries one by one and make sure that the previous step was finished before running the next one.