Tutorial: How to implement a product feed
Here, we provide some general guidelines on product feeds and how they can be implemented as part of the data model in the CDP.
The client should provide a link to their product feed. It is usually an XML, HTML, or CSV file with important information about their products. Here are the fields that a product feed must have to fulfill the majority of our use cases.
|Product ID||The ID of the product, sometimes also referred to as the SKU. Ensure that the product ID matches the product IDs from other data sources (purchase and cart events from Meiro Events, transactions from the client's POS database, etc.).|
|Product Name||Name of the product|
|URL||The link to the product's page on the website|
|Image URL||A publicly accessible link to the image of the product. The link usually ends with .jpg, .jpeg, or .png. The product image is often attached to emails or other activation mediums.|
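The required fields and the ID-matching rule above can be checked before the feed is loaded. The following is a minimal sketch, assuming the feed has already been parsed into a list of dictionaries keyed by the column names used later in this tutorial (`product_id`, `product_name`, `url`, `img_url`). The function name `check_feed` and the `known_ids` argument (product IDs seen in other sources, such as purchase events) are hypothetical and not part of any Meiro API.

```python
REQUIRED_FIELDS = ("product_id", "product_name", "url", "img_url")
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png")


def check_feed(rows, known_ids):
    """Return a list of human-readable problems found in the parsed feed."""
    problems = []
    feed_ids = set()
    for i, row in enumerate(rows):
        # Every required field must be present and non-empty.
        for field in REQUIRED_FIELDS:
            if not (row.get(field) or "").strip():
                problems.append(f"row {i}: missing {field}")
        # Image links usually end with a known image extension.
        img = (row.get("img_url") or "").lower()
        if img and not img.endswith(IMAGE_SUFFIXES):
            problems.append(f"row {i}: unexpected image extension")
        if row.get("product_id"):
            feed_ids.add(row["product_id"].strip())
    # IDs seen in other sources but absent from the feed point to a mismatch.
    unmatched = set(known_ids) - feed_ids
    if unmatched:
        problems.append(f"ids not in feed: {sorted(unmatched)}")
    return problems
```

The extension check is only a warning-level heuristic: image links that carry query strings or use other formats would be reported even though they may be valid.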
The following fields are optional in a product feed, but they are useful for specific use cases.
|Price||Monetary value of the product. Useful when you want to notify customers of a price drop.|
|Category||Fields that categorize the product|
|Stock availability||Flag that indicates whether the product is in stock. Useful when you want to notify a customer that a product is back in stock.|
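To illustrate how the price and stock fields support the price-drop and back-in-stock use cases, here is a minimal sketch that compares two snapshots of the feed. It assumes each snapshot has been reduced to a dictionary keyed by product ID, with a numeric `price` and a boolean `in_stock`; the function name `feed_changes` and this snapshot shape are hypothetical.

```python
def feed_changes(previous, current):
    """Compare two feed snapshots keyed by product_id.

    Each snapshot maps product_id -> {"price": float, "in_stock": bool}.
    Returns (price_drops, back_in_stock) as sorted lists of product IDs.
    """
    price_drops, back_in_stock = [], []
    for product_id, now in current.items():
        before = previous.get(product_id)
        if before is None:
            continue  # new product: nothing to compare against
        if now["price"] < before["price"]:
            price_drops.append(product_id)
        if now["in_stock"] and not before["in_stock"]:
            back_in_stock.append(product_id)
    return sorted(price_drops), sorted(back_in_stock)
```

In practice this comparison may be done inside the CDP rather than in a script; the sketch only shows why both fields need to be present in every feed load.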
The product feed workspace
The product feed workspace can be as simple as two components.
The first component is a Python processor that extracts the product feed from the link and parses the data into a CSV file.
The second component loads the product feed into the CDP.
How to parse your product feed
If your product feed is a CSV file, you can use the pandas package with our Python processor to format the product feed. Here is an example script you can use to extract and parse a .csv feed. Remember that the product feed source varies from client to client, so you will have to adapt the script to parse each feed.
    import csv

    import requests
    import pandas as pd

    product_feed_url = 'https://feed.<redacted>.csv'

    download = requests.get(product_feed_url)
    decoded_content = download.content.decode('utf-8')

    cr = csv.reader(decoded_content.splitlines(), delimiter=',')
    list_pdts = list(cr)

    # remove header row
    list_pdts.pop(0)

    # format feed as a dataframe with our own column names
    df = pd.DataFrame(
        list_pdts,
        columns=['product_id', 'product_name', 'url', 'img_url',
                 'brand', 'category', 'price', 'currency'],
    )
    df.to_csv('out/tables/product_feed.csv', index=False)
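The script above assigns column names by position, so it depends on the client keeping the same column order. As a hedged alternative, the sketch below maps columns by header name using only the standard library. The client header names in `COLUMN_MAP` (`id`, `title`, `link`, `image_link`) and the function name `rename_feed_columns` are hypothetical; replace them with the headers found in the actual feed.

```python
import csv
import io

# Hypothetical mapping from one client's header names to our column names.
COLUMN_MAP = {
    "id": "product_id",
    "title": "product_name",
    "link": "url",
    "image_link": "img_url",
}


def rename_feed_columns(csv_text, column_map=COLUMN_MAP):
    """Parse CSV text and keep only the mapped columns under our own names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        # Missing or empty client columns become empty strings.
        {ours: (row.get(theirs) or "").strip() for theirs, ours in column_map.items()}
        for row in reader
    ]
```

Columns that are not listed in the mapping are dropped, which keeps the output stable when the client adds new fields to the feed.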
If your product feed is in XML or HTML, you can use the BeautifulSoup package with our Python processor to parse and format the product feed. Here is an example script.
    import requests
    from bs4 import BeautifulSoup
    import pandas as pd

    response = requests.get('https://feed.<redacted>.xml')
    products = BeautifulSoup(response.content, "html.parser")

    # in this product feed, the products are called 'entry'
    # this will vary with client, so please check and change the word accordingly
    list_pdts = products.find_all('entry')

    # here we parse the product feed by extracting the necessary fields
    # please explore the feed and use cases to get all required fields
    data = []
    for pdt in list_pdts:
        product_name = pdt.product_name.get_text() if pdt.product_name is not None else None
        image_link = pdt.image_link.get_text() if pdt.image_link is not None else None
        price = pdt.price.get_text() if pdt.price is not None else None
        currency = pdt.currency.get_text() if pdt.currency is not None else None
        # html.parser treats <link> as an empty tag, so the URL text
        # ends up as the tag's next sibling rather than its content
        row = [pdt.id.get_text(), product_name, pdt.link.next_sibling, image_link, price, currency]
        data.append(row)

    df = pd.DataFrame(data, columns=['product_id', 'product_name', 'url', 'img_url', 'price', 'currency'])
    df.to_csv('out/tables/product_feed.csv', index=False)
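For feeds that are well-formed XML, the standard library's `xml.etree.ElementTree` can be used instead of BeautifulSoup. This is a swapped-in technique, not the approach of the script above. The sketch assumes the same tag names as that script (`entry`, `id`, `product_name`, `link`, `image_link`, `price`, `currency`) and a feed without XML namespaces; the function name `parse_xml_feed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_xml_feed(xml_text, item_tag="entry"):
    """Extract selected fields from each product element of a well-formed XML feed."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter(item_tag):
        # findtext returns None when the child element is absent.
        rows.append({
            "product_id": item.findtext("id"),
            "product_name": item.findtext("product_name"),
            "url": item.findtext("link"),
            "img_url": item.findtext("image_link"),
            "price": item.findtext("price"),
            "currency": item.findtext("currency"),
        })
    return rows
```

Feeds that declare namespaces (for example prefixed tags) would need the namespace included in each tag lookup, so check the raw feed before choosing this approach.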
Where to store your product feed
Your product feed should live in the external_data schema of the CDP.