Profile Stitching edge cases and solutions (wip)
This article describes the profile stitching (PS) issues / edge cases Analysts have faced during implementation. We hope that it can serve as a reference point on the possible solutions to PS.
General tips
Remove trailing and leading whitespaces, and use lower case for emails
LOWER(TRIM(payload ->> 'email'))
Validate your emails and phone numbers during data processing
# email regular expression
import re
EMAIL_REGEX = r'(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)'
email_pattern = re.compile(EMAIL_REGEX)
if email_pattern.fullmatch(email):
return email.strip().lower()
# phonenumbers package can parse phone numbers with country code and split it to country code and phone number
import phonenumbers
x = phonenumbers.parse(phone_number)
event['payload']['country_code'] = x.country_code
event['payload']['phone_number_validated'] = x.national_number
Issue 1: Physical/In-store events
Some clients have a physical store or have in-store events for customers. In most cases, a tablet is used by multiple customers for browsing, logging in and making purchases. This results in many different customers having the same meiro user id
because they used the same device, which causes large entities during PS
Possible fixes:
- Do not stitch
meiro user id
for these problematic events - Add flags/geolocation fields (like IP addresses) and update PS rules to exclude the appropriate fields
-- in this example, the offline flag is used to show if the event was in-store (offline = '1') or online (offline = '0')
CASE
WHEN payload->'payload'->>'offline' = '1'
THEN NULL
ELSE payload->>'user_id'
END
- Process change - allow for browsing on incognito mode only. Close and open the incognito window for each new customer. The action of closing and re-opening incognito windows will generate new
meiro user ids
, ensuring that different customers will have differentmeiro user ids
. - In Meiro Events, implement
MeiroEvents.resetIdentity()
on specific triggers, like on a log out event. WhenMeiroEvents.resetIdentity()
method is called, the trackedmeiro user id
will be refreshed, so another customer using the same device will be tracked under a newmeiro user id
.
Issue 2: Emails with same tracking ID opened by different users
Many Email Direct Marketing (EDM) platforms now decorate their emails links with an ID to track user's browsing activity. This ID is usually unique per email campaign and per email address. However, users may forward their email or email links containing this ID to other users, which causes all such users to be stitched together because they have the same email ID.
Possible fix:
- Edit PS rule to only stitch the FIRST instance of email ID, ignoring all subsequent events with the same email ID. This can be done by add specific rule to
value_expression_prep
incdp_ps.stitching_attributes_config
for the event you would like to fix. Example of the rule:
CASE
-- a general check on the id to ensure it has 32 characters
WHEN
length(payload->'url_parts'->'query_parsed'->'sid'->>0)=32
THEN
CASE
-- compares the list of identifiers already stitched (m.identifiers and p.identifiers), against the identifier
-- if it has been stitched already (count is > 0), then do not this identifier
-- here, 'stitching_se_sid' is the attribute_id and
-- st.payload->'url_parts'->'query_parsed'->'sid'->>0 is the identifier in the payload
WHEN
(
(
SELECT COUNT(*)
FROM cdp_ps.matching m
INNER JOIN cdp_ce.customer_events ce ON ce.id::uuid = m.customer_event_id
WHERE ce.source_id = 'eshop_website' AND ce."type" = 'page_view'
AND m.identifiers && ARRAY [md5('stitching_se_sid' || (st.payload->'url_parts'->'query_parsed'->'sid'->>0))::uuid]
) > 0
) OR
(
(
SELECT COUNT(*)
FROM cdp_ps.prep p
INNER JOIN cdp_ce.customer_events ce ON ce.id::uuid = p.customer_event_id
WHERE ce.source_id = 'eshop_website' AND ce."type" = 'page_view'
AND p.identifiers && ARRAY [md5('stitching_se_sid' || (st.payload->'url_parts'->'query_parsed'->'sid'->>0))::uuid]
) > 0
)
THEN
NULL
ELSE
payload->'url_parts'->'query_parsed'->'sid'->>0
END
END