ePrivacy and GPDR Cookie Consent by Cookie Consent Skip to main content

How to clean wrongly stitched customer entities without rerunning PS

PS edge cases

It may happen when setting up PS rules, that you miss out some edge cases and this caused stitching customer entities wrongly. Examples of this cases can be:

  • ps_dentifier in the payload was parsed to text value None or 0. CPS is checking if PS identifier is not NULL, but None will be a valid value for the algorythm and it will stitch all customer event where pd_identifier = "None" to one huge entity.
  • ps_dentifier was parsed into integer and float for different data sources: 12345 and 12345.0 are different values for CPS and entities with these values won't be stitched.

Usually at the very beginning of the project when only small samples of the data were loaded, it makes more sense to rerun PS from scratch. However if big volume of the data was already processed rerunning PS from zero can take days to catchup and it is simply not effective. In this case it might be more sufficient to clean only problematic customer entities and customer events.

Preparation

PostgreSQL process (to be deprecated)

Before the cleaning:

  • Define the problematic customer entity(s). If it is just one huge entity which needs to be "unstitched" - save it's value, if there many entities - decide how to filter these entities.
  • Make sure you know how PS rule should be fixed and test it on a relevant data sample
  • Disable CPS component in MI and stop all the processes with app_name "Meiro CPS" from running. 
--Check DB there is no query with app_name "Meiro CPS" running:
SELECT * FROM pg_stat_activity;

--If there is, kill it:
SELECT pg_terminate_backend(pid);
  • Create a separate schema where tables with entities and events which needs to be restiched will be stored.

Warning: Make sure CPS is not running until you finish with the cleaning.

How to make the cleaning

PostgreSQL process (to be deprecated)

Step 0

If events need to be deleted, delete them as the first step.

delete from cdp_ce.customer_events ce where id in --condition based on which entity(ies) are filtered
Step 1

Create a table where entities to delete will be stored.

create table temp_table.entities_to_clean as
select customer_entity_id from public.customer_events
where --condition based on which entity(ies) are filtered
Step 2
Create a table to store all IDs of customer_events belonging to these customer entities
create table temp_table.events_to_clean as
	select id from public.customer_events
	where customer_entity_id in (select * from temp_table.entities_to_clean)
Step 3
 Remove all rows with customer_entity_ids from cdp_ps.matching table (this will "unstitch" the events).
delete from cdp_ps.matching
where matching.customer_entity_id in (select customer_entity_id::uuid from temp_table.entities_to_clean)
Step 4
Insert customer_entity_id and delete action into cdp_attr.customer_entities_audit to clean everything related to these customer_entity_ids.
insert into cdp_attr.customer_entities_audit(customer_entity_id, created, action)
select distinct customer_entity_id::uuid,current_timestamp, 'delete' from temp_table.entities_to_clean
Step 5
insert all customer_events ids belonging to the entities to clean into cdp_ps.customer_events_to_process to restitch the events.
insert into cdp_ps.customer_events_to_process(customer_event_id, created)
select id::uuid, current_timestamp from temp_table.events_to_clean
Step 6

Update PS rules and enable CPS for running.

Important: Run the queries one by one and make sure that the previous step was finished before running the next one. 


Help! I have accidentally deleted the customer event ids from cdp_ps.matching and lost the customer event ids to clean before inserting them into cdp_ps.customer_events_to_process !

Don't panic, first, make sure CPS is not running, then run the following query to insert ALL events that are not present in cdp_ps.matching table, a.k.a. events that have not been stitched.

insert into cdp_ps.customer_events_to_process(customer_event_id, created)
select id::uuid, current_timestamp from cdp_ce.customer_events ce
left join cdp_ps.matching m on ce.id::uuid = m.customer_event_id where m.customer_event_id is null;

Warning: Take note that this query can be expensive depending on how large the volume of data is. To avoid this problem from happening in the first place, follow the previous steps carefully. This is intended to be a workaround only.