Tutorial: How to implement product feed (WIP)
Here, we will provide some general guidelines on product feeds and how it can be implemented as part of the data model in CDP
Pre-requisites
The client should provide a link to their product feed. It is usually a xml, html, or csv file with important information about their products. Here are the important fields that a product feed must have to fulfill a majority of our use cases.
Field | Description |
Product ID |
The ID of the product, also referred to the SKU sometimes. Ensure that the Product ID matches with the product IDs from other sources of data (Purchase, cart events from Meiro Events, transactions from their POS database etc) |
Product Name | Name of the product |
URL | The link to the product's page of the website |
Image URL |
A publicly accessable link to the image of the product. The link usually ends with .jpg .jpeg or .png The product image is often attached to emails or other activation mediums. |
Here are some useful but not necessary fields in a product feed. They are useful for specific use cases.
Field | Description |
Price, Currency |
Monetary values of the product Useful in cases where you want to notify customers of a price drop |
Brand, Category |
Fields that categorize the product |
In Stock |
Flag that says if the product is in stock Useful in cases where you want to notify a customer that a product is back in stock |
The product feed workspace
The product feed workspace can be as simple as just 2 components.
The first component is a Python processor which extracts the product feed from the link and parses the data into a csv.
The second component loads the product feed into the CDP
How to parse your product feed
If your product feed is in csv, you can use Pandas package with our Python processor to format the product feed. Here is an example script you can use to extract and parse a .csv feed. Remember that the product feed source can vary from client to client and you will have to change the script accordingly to parse the feed.
import csv
import requests
import pandas as pd
product_feed_url = 'https://feed.<redacted>.csv'
download = requests.get(product_feed_url)
decoded_content = download.content.decode('utf-8')
cr = csv.reader(decoded_content.splitlines(), delimiter=',')
list_pdts = list(cr)
# remove header row
list_pdts.pop(0)
# format feed as a dataframe with our own column names
df = pd.DataFrame(list_pdts, columns = ['product_id', 'product_name', 'url', 'img_url', 'brand', 'category', 'price', 'currency'])
df.to_csv('out/tables/product_feed.csv', index=False)
if your product feed is in xml/html, you can use BeautifulSoup package with our Python processor to parse and format the product feed. Here is an example script.
import requests
from bs4 import BeautifulSoup
import pandas as pd
response = requests.get('https://feed.<redacted>.xml')
products = BeautifulSoup(response.content, "html.parser")
# in this product feed, the products are called 'entry'
# this will vary with client, so please check and change the word accordingly
list_pdts = products.findAll('entry')
# here we parse the product feed by extracting the neccesary fields
# please explore the feed and use cases to get all required fields
data = []
for pdt in list_pdts:
product_name = pdt.product_name.get_text() if pdt.product_name is not None else None
image_link = pdt.image_link.get_text() if pdt.image_link is not None else None
price = pdt.price.get_text() if pdt.price is not None else None
currency = pdt.currency.get_text() if pdt.currency is not None else None
row = [pdt.id.get_text(), product_name,
pdt.link.next_sibling, image_link,
price, currency]
data.append(row)
df = pd.DataFrame(data, columns=['product_id', 'product_name', 'url', 'img_url', 'price', 'currency'])
df.to_csv('out/tables/product_feed.csv', index=False)
Where to store your product feed
Your product feed should live in the external_data
schema of the CDP