Product data scraping
How Bambuser Live Shopping product data scraping works
This document describes in detail how Bambuser Live Shopping product data scraping works.
If you do not have access to the Bambuser workspace, you can still test how our product scraper behaves with your product URLs. Visit this page and enter your product URL to test.
When you add a product to a show in the Bambuser Live Shopping Workspace, some basic product details are scraped from the content of the given product URL (e.g. https://yourcompany.com/products/pink-shirt).
The following properties will be extracted from the page:
- a product name or title
- an image URL (product thumbnail)
- a brand name
- a reference, often called SKU (like the other fields, it can be fetched or entered manually by the admin when setting up the show)
- price
- currency
All fields can also be entered and modified manually. The Bambuser product scraper reduces that manual work by automating product data insertion, which makes setting up a show easier.
How does the product scraper work?
The scraper looks for different kinds of structured product data and metadata, using the following priority order:
- Schema.org markup
  - JSON-LD
  - Microdata
- OpenGraph meta-tags (og:)
- Generic HTML tags
If the scraper cannot find a product reference (SKU), it uses the provided product URL as the reference.
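The priority order and the URL fallback can be sketched in a few lines of Python. This is only an illustration, not the scraper's actual implementation: the function name `merge_by_priority`, the one-dict-per-source shape, and the choice to resolve each field separately (rather than taking the first matching source wholesale) are all assumptions.

```python
# Field names mirror the property list above; the dict shape is an assumption.
FIELDS = ("name", "image", "brand", "sku", "price", "currency")


def merge_by_priority(url: str, sources: list[dict]) -> dict:
    """Take each field from the first source that provides it.

    `sources` must be ordered from highest to lowest priority, e.g.
    [json_ld, microdata, open_graph, generic_html].
    """
    product = {}
    for field in FIELDS:
        product[field] = next(
            (src[field] for src in sources if src.get(field)), None
        )
    # Documented fallback: without a SKU, the product URL is the reference.
    if not product["sku"]:
        product["sku"] = url
    return product
```

For example, if JSON-LD supplies a name and price while only OpenGraph supplies an image, the result combines all three and uses the product URL as the reference.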
1. Schema.org markup
Specification: https://schema.org/Product
Google's testing tool can be used to see if your site supports this: https://search.google.com/structured-data/testing-tool/u/0/
JSON-LD (Recommended)
Example:
<script type="application/ld+json">
{
"@type": "Product",
"@context": "http://schema.org/",
"name": "My Product Name",
"description": "My Description",
"brand": { "@type": "Thing", "name": "My Brand Name" },
"image": "https://yoursite.com/path-to-image.jpg",
"sku": "product-sku-12345",
"offers":[{
"@type":"Offer",
"priceCurrency":"EUR",
"price":"45"
}]
}
</script>
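To check your own markup before adding a product, you can extract the JSON-LD block yourself. The sketch below uses only the Python standard library; `JsonLdCollector` and `find_product` are hypothetical names, and it makes no claim about how the Bambuser scraper parses pages. It does show one practical point: a strict JSON parser such as Python's `json` module rejects a block that contains a trailing comma.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_json_ld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_json_ld = False


def find_product(html: str):
    """Return the first JSON-LD object whose @type is "Product", or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # invalid JSON (e.g. a trailing comma) is skipped
        candidates = data if isinstance(data, list) else [data]
        for item in candidates:
            if isinstance(item, dict) and item.get("@type") == "Product":
                return item
    return None
```

If `find_product` returns `None` for a page that you expect to contain product data, validating the JSON inside the script tag is a reasonable first step.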
Microdata (Beta)
Example:
<div itemscope itemtype="http://schema.org/Product">
<span itemprop="name">My Product Name</span>
<span itemprop="brand">My Brand Name</span>
<img itemprop="image" src="https://yoursite.com/path-to-image.jpg">
<span itemprop="description">Some optional description</span>
<span itemprop="sku" content="product-sku-12345"></span>
<div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
<meta itemprop="priceCurrency" content="EUR">
<span itemprop="price">45</span>
</div>
</div>
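A rough way to see which `itemprop` values a page exposes is to collect them with Python's standard `html.parser`. The class name `ItempropCollector` is hypothetical and the logic is deliberately simplified: it flattens all properties into one dict, does not track nested item scopes, and is not a full Microdata parser, nor a description of the Bambuser scraper's behavior.

```python
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    """Flat itemprop -> value map; nested item scopes are not tracked."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._pending = None  # itemprop whose value is the upcoming text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        # Elements that open a nested item (e.g. "offers") carry no value.
        if prop is None or "itemscope" in attrs:
            return
        if "content" in attrs:
            self.props.setdefault(prop, attrs["content"])
        elif tag == "img":
            self.props.setdefault(prop, attrs.get("src"))
        else:
            self._pending = prop  # value is the element's text

    def handle_data(self, data):
        if self._pending and data.strip():
            self.props.setdefault(self._pending, data.strip())
            self._pending = None

    def handle_endtag(self, tag):
        self._pending = None
```

Feeding the example markup above into `ItempropCollector().feed(...)` fills `props` with entries such as `name`, `brand`, `image`, `sku`, `priceCurrency`, and `price`.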
2. OpenGraph meta-tags (og:)
Specification: https://developers.facebook.com/docs/payments/product/
Example:
<meta property="og:type" content="og:product" />
<meta property="og:title" content="My Product Name" />
<meta property="product:brand" content="My Brand Name" />
<meta property="og:image" content="http://path-to-thumbnail" />
<meta property="og:description" content="Some optional description!" />
<meta property="product:retailer_item_id" content="product-sku-12345" />
<meta property="product:price:amount" content="45">
<meta property="product:price:currency" content="EUR">
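The meta-tags in this example map onto the scraped fields in an obvious way. The following sketch makes that mapping explicit; the `OG_TO_FIELD` table and the `OpenGraphCollector` class are assumptions inferred from the example, not a published specification of the scraper.

```python
from html.parser import HTMLParser

# Assumed mapping from the meta properties in the example to scraped fields.
OG_TO_FIELD = {
    "og:title": "name",
    "og:image": "image",
    "product:brand": "brand",
    "product:retailer_item_id": "sku",
    "product:price:amount": "price",
    "product:price:currency": "currency",
}


class OpenGraphCollector(HTMLParser):
    """Collects mapped <meta property="..." content="..."> values."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        field = OG_TO_FIELD.get(attrs.get("property"))
        if field and attrs.get("content"):
            # Keep the first occurrence if a property is repeated.
            self.fields.setdefault(field, attrs["content"])
```

Both `<meta ... />` and `<meta ...>` forms are handled, since `html.parser` routes self-closing tags through `handle_starttag` by default.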
3. Generic HTML tags
If none of the structured product data above is found, the product scraper falls back to generic information found on most websites, such as the title element and images.
<head>
<title> My Product Page Name </title>
</head>
<body>
...
<img src="https://yoursite.com/path-to-image.jpg">
...
</body>
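As a rough illustration of this last-resort fallback, the sketch below reads the page title and the first image from plain HTML. Which image the real scraper selects is not documented here; taking the first `<img>` is purely an assumption, as is the class name `GenericCollector`.

```python
from html.parser import HTMLParser


class GenericCollector(HTMLParser):
    """Reads the <title> text and the src of the first <img> on the page."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.image = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "img" and self.image is None:
            self.image = dict(attrs).get("src")  # assumption: first image wins

    def handle_data(self, data):
        if self._in_title and self.title is None and data.strip():
            self.title = data.strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
```

Because this fallback has no price, currency, brand, or SKU to work with, those fields would have to be entered manually.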
The Bambuser product scraper server is located in the US, so your assets need to be accessible from US-based IP addresses. Otherwise, you need to whitelist our product scraper as described below.
Whitelist the scraper
A typical case for whitelisting our scraper is when you want to add products from a staging or test environment that is not publicly accessible. You can make an exemption for the scraper's user-agent or whitelist its static IP address.
By User-agent:
The scraper identifies itself with the following user-agent: BambuserLiveShopping/1.0. You can make an exception for requests made by this user-agent.
Once the user-agent is whitelisted, the scraper should start working right away.
By Static IP address:
You can also whitelist our scraper through the static IP address:
- Global server: 35.224.84.15
- EU server: 35.240.106.166
Besides whitelisting our static IP address on your side, you also need to ask Bambuser staff to enable 'Static IP proxy' for your organization.
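If your staging environment is protected by application code rather than by CDN or firewall rules, the two whitelisting options could be expressed roughly as follows. The function name `is_bambuser_scraper` is hypothetical, and matching the user-agent as a substring is an assumption. Keep in mind that a user-agent header can be sent by any client, so the IP check is the stricter of the two.

```python
import ipaddress

SCRAPER_USER_AGENT = "BambuserLiveShopping/1.0"
SCRAPER_IPS = {
    ipaddress.ip_address("35.224.84.15"),    # Global server
    ipaddress.ip_address("35.240.106.166"),  # EU server
}


def is_bambuser_scraper(user_agent: str, remote_addr: str) -> bool:
    """Return True if the request matches either whitelisting option."""
    # Option 1: user-agent match (substring matching is an assumption).
    if SCRAPER_USER_AGENT in (user_agent or ""):
        return True
    # Option 2: static IP match; malformed addresses never match.
    try:
        return ipaddress.ip_address(remote_addr) in SCRAPER_IPS
    except ValueError:
        return False
```

A request handler would call this with the incoming `User-Agent` header and the client address, and skip its usual access restriction when the result is `True`.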
Whitelist Bambuser Image Transformer
If your products are scraped correctly, but the images/thumbnails are not shown properly, you may need to whitelist our image transformer.
This can be caused by restrictions on your CDN that block requests from our cloud-based image transformer.
The issue is also common if you use the Akamai CDN with strict rules.
Solution
Follow the same process as for whitelisting the scraper on your CDN rules.
For best performance, we recommend that you only whitelist Bambuser Image Transformer by user-agent (BambuserLiveShopping/1.0) as this option does not require an additional proxy stage.
Troubleshooting
- Our product data is not getting scraped correctly, why?
- Thumbnail image URL scraped correctly, but image is blank/white
Product data scraping FAQ
We highly recommend using the JSON-LD format of Schema.org/Product.
Absolutely! You can then update product details such as Title and Thumbnail manually. The product scraper is only a tool to automate manual data insertion and make your life easier.