Data Feeds

Overview #
Regardless of your eCommerce platform, and whether we have an integration or not, you can always sync data with Clerk.io through one or more feeds in JSON format.
We support two different variations of the feeds:
- Multiple files for different objects
- A single file containing all objects
The two solutions use the same object structure, but differ in the features available for securing and importing them, which are outlined in this guide.
All object types except orders are mirrored from the feeds to Clerk.io’s database. If you remove an object from the feed, Clerk.io will remove it from the database when it’s imported. Orders are logged and kept in the database.
We recommend generating the JSON feed(s) at least once a day, but ideally more often. They can also be generated on demand when Clerk.io’s importer requests them.
The feed(s) should be available at a URL that is accessible from Clerk.io’s servers.
https://your-website.com/json-feed.json
Data Types #
We support attributes of the types: int, float, str, array, bool.
Null values #
Unchecked null values are a sure way for errors to sneak in over time. If an attribute does not exist for a given product, category, or order, simply omit the attribute.
ID value types #
We highly recommend using integers as IDs, but it is possible to use strings as well. You must commit to one type in your feed: all IDs for your objects must be of the same type.
Attribute names #
Attribute names can only contain alphanumeric characters (A-Z, 0-9) and underscores.
Thus, a valid attribute name could be brand_name but not läbel-mærke.
Using dashes or special characters in the attribute names will cause them to be ignored in the sync.
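The rules above (omit null values, keep ID types consistent, restrict attribute names) can be checked before a feed is published. The following is a minimal Python sketch, not part of Clerk.io's tooling. The function names clean_object and check_id_types are hypothetical, and the accepted character set assumes that lowercase letters are allowed, as the brand_name example suggests.

```python
import re

# Assumed pattern: letters, digits and underscores only (see "Attribute names").
VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def clean_object(obj):
    """Return a copy of obj without None values and without invalid attribute names."""
    cleaned = {}
    for name, value in obj.items():
        if value is None:
            continue  # omit the attribute instead of sending null
        if not VALID_NAME.fullmatch(name):
            continue  # such names would be ignored in the sync anyway
        cleaned[name] = value
    return cleaned


def check_id_types(objects):
    """Raise ValueError if the objects mix ID types (for example int and str)."""
    id_types = {type(obj["id"]) for obj in objects}
    if len(id_types) > 1:
        names = sorted(t.__name__ for t in id_types)
        raise ValueError(f"Mixed ID types in feed: {names}")


product = {"id": 135, "brand_name": "Je'daii", "list_price": None, "läbel-mærke": "x"}
print(clean_object(product))
```

Dropping an invalid attribute silently mirrors what the sync is described as doing. In practice you may prefer to log or rename such attributes so the data is not lost.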
Objects Structure #
JSON feeds consist of one list of objects, with a range of fields that make up their data.
At a minimum, objects must contain the required fields for their type for Clerk.io’s AI to function properly. Optionally, they can contain any extra attributes available in the eCommerce platform.
Products #
Each object represents a single product. If you have configurable products, we recommend sending just the parent product, and including attributes that describe the children, such as color, size, material, etc.
Below you can see the required fields and examples of optional ones that are often used by eCommerce stores.
Attribute | Importance | Type | Description |
---|---|---|---|
id | Required | int/str | The product ID, which should be unique for each product |
name | Required | str | The product name. |
description | Required | str | The product description. |
price | Required | float | The product’s current selling price. |
list_price | Optional | float | The product’s original list price. Useful to show discounts. |
image | Required | str | The full URL for the product image. When used for thumbnails we recommend a maximum image size of 200x200px. |
url | Required | str | The product URL. |
categories | Required | array | An array of category IDs that the product belongs to. |
created_at | Required | int | The UNIX timestamp of when the product was created. |
brand | Optional | str | The product’s brand. |
color_names | Optional | array | An array of color names for the product. |
color_codes | Optional | array | An array of color codes for the product. |
reviews_amount | Optional | int | The number of reviews for the product. |
reviews_avg | Optional | float | The average review score for the product. |
Example JSON #
[
{
"id": 135,
"name": "Lightsaber",
"description": "Antique Rebel Lightsaber",
"price": 99995.95,
"image": "https://galactic-empire-merch.com/images/a-r-lightsaber.jpg",
"url": "https://galactic-empire-merch.com/antique-rebel-lightsaber",
"brand": "Je'daii",
"categories": [987, 654],
"created_at": 1199145600,
"color_names": ["Green","Red"],
"color_codes": ["#7CFC00","#FF3131"],
"reviews_amount": 164,
"reviews_avg": 4.8
},
{
"id": 261,
"name": "Death Star Deluxe",
"description": "Death Star - Guaranteed idiot proof",
"price": 99999999999999.95,
"image": "https://galactic-empire-merch.com/images/death-star.jpg",
"url": "https://galactic-empire-merch.com/death-star",
"brand": "Imperial Inc.",
"categories": [345678],
"created_at": 1197565600
}
]
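As a rough illustration of the required fields listed in the table, the sketch below checks a product object before it is written to a feed. It is an assumption-laden example rather than Clerk.io's own validation: the name product_problems is hypothetical, whole-number prices are assumed to be acceptable, and only single-language feeds (plain string values) are covered.

```python
# Required product attributes and their expected Python types, taken from the table above.
REQUIRED_PRODUCT_FIELDS = {
    "id": (int, str),
    "name": str,
    "description": str,
    "price": (int, float),  # assumption: whole-number prices such as 100 are acceptable
    "image": str,
    "url": str,
    "categories": list,
    "created_at": int,
}


def product_problems(product):
    """Return a list of readable problems; an empty list means the product looks valid."""
    problems = []
    for field, expected in REQUIRED_PRODUCT_FIELDS.items():
        if field not in product:
            problems.append(f"missing required field: {field}")
            continue
        value = product[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"wrong type for {field}: {type(value).__name__}")
    return problems


lightsaber = {
    "id": 135,
    "name": "Lightsaber",
    "description": "Antique Rebel Lightsaber",
    "price": 99995.95,
    "image": "https://galactic-empire-merch.com/images/a-r-lightsaber.jpg",
    "url": "https://galactic-empire-merch.com/antique-rebel-lightsaber",
    "categories": [987, 654],
    "created_at": 1199145600,
}
print(product_problems(lightsaber))
```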
Keep Products Without Indexing #
For some setups, you might want to keep products in Clerk.io’s database without showing them in any results.
If you sell tickets or used items that will be available for a time before never coming back, it’s a good idea to keep the history of these products intact, so Clerk can use it to improve results.
To do this, add the special attribute index: false to the product objects that should be kept without being indexed. Clerk will then use the history of their sales for showing results, but they will never be shown in any API calls.
For other products, simply leave the attribute out or set it to index: true.
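A small sketch of how this could look when generating the feed, assuming you keep a set of IDs for products that are no longer for sale. The function name mark_unindexed and the retired_ids set are hypothetical.

```python
import json


def mark_unindexed(products, retired_ids):
    """Return copies of the products, with index set to False for retired IDs."""
    result = []
    for product in products:
        copy = dict(product)
        if copy["id"] in retired_ids:
            copy["index"] = False  # kept for history, not shown in results
        result.append(copy)
    return result


products = [{"id": 135, "name": "Lightsaber"}, {"id": 261, "name": "Death Star Deluxe"}]
print(json.dumps(mark_unindexed(products, retired_ids={261}), indent=2))
```

Products that are not retired are left without the attribute, which the text above describes as equivalent to index: true.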
Categories #
Each object represents a single category. Clerk.io builds an internal category tree based on the subcategories provided for each category.
Below you can see the required fields and examples of optional ones that are often used by eCommerce stores.
Attribute | Importance | Type | Description |
---|---|---|---|
id | Required | int/str | The unique ID for the category. |
name | Required | str | The category name. |
url | Required | str | The category URL. |
subcategories | Required | array | An array of category IDs that are subcategories of this category. Can be an empty list for categories without subcategories. |
image | Optional | str | Full URL for the category image. |
description | Optional | str | The category description. |
Example JSON #
[
{
"id": 1,
"name": "Imperial Goods",
"subcategories": [42, 25],
"url": "https://galactic-empire-merch.com/imperial-goods"
},
{
"id": 42,
"name": "Tatooine",
"subcategories": [],
"url": "https://galactic-empire-merch.com/imperial-goods/tatooine"
},
{
"id": 25,
"name": "Coruscant",
"subcategories": [],
"url": "https://galactic-empire-merch.com/imperial-goods/coruscant"
}
]
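Many platforms store a parent reference on each category rather than a list of children. The sketch below shows one way to derive the subcategories arrays from such data. The parent_id field and the function name build_category_feed are assumptions for illustration and are not part of the feed format.

```python
def build_category_feed(rows):
    """rows: dicts with id, name, url and parent_id (None for top-level categories)."""
    feed = {
        row["id"]: {
            "id": row["id"],
            "name": row["name"],
            "url": row["url"],
            "subcategories": [],  # stays an empty list for leaf categories
        }
        for row in rows
    }
    for row in rows:
        parent = row["parent_id"]
        if parent is not None and parent in feed:
            feed[parent]["subcategories"].append(row["id"])
    return list(feed.values())


rows = [
    {"id": 1, "name": "Imperial Goods", "parent_id": None,
     "url": "https://galactic-empire-merch.com/imperial-goods"},
    {"id": 42, "name": "Tatooine", "parent_id": 1,
     "url": "https://galactic-empire-merch.com/imperial-goods/tatooine"},
    {"id": 25, "name": "Coruscant", "parent_id": 1,
     "url": "https://galactic-empire-merch.com/imperial-goods/coruscant"},
]
print(build_category_feed(rows))
```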
Orders #
Orders are logged and are not deleted when removed from the feed. They generally only have to be sent during the first import and can then be removed again to save server capacity. They can be deleted via our CRUD API.
parcels data can currently only be synced via the CRUD API. Check the documentation here.
Each object represents a single order. Clerk.io uses the product IDs and email address/customer ID inside orders to analyze customer behaviour and identify trends. Along with products, it is the most important object type.
Below you can see the required fields and optional fields. It is not possible to send additional attributes for orders.
Attribute | Importance | Type | Description |
---|---|---|---|
id | Required | int/str | The order ID, which should be unique for each order. |
products | Required | array | The products in the order. Each product is an object with an ID, quantity, and unit price. |
time | Required | int | The time the order was placed, as a UNIX timestamp. |
customer | Optional | int/str | The customer ID. |
email | Optional | str | The customer email. Needed for using our Auto-Email and Audience products. |
Example JSON #
[
{
"id": 123458,
"customer": 789,
"email": "vader@the-death-star.com",
"products": [{"id":456,"quantity":1,"price":200.00}, {"id":789,"quantity":2,"price":120.00}],
"time": 1389871120
},
{
"id": 123456,
"customer": 456,
"email": "obi.wan@kenobi.me",
"products": [{"id":456,"quantity":1,"price":200.00}, {"id":789,"quantity":2,"price":120.00},{"id":123,"quantity":2,"price":60.00}],
"time": 1389870977
},
{
"id": 123457,
"customer": "",
"products": [{"id":789,"quantity":2,"price":120.00}],
"time": 1389871090
}
]
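The following sketch shows how an order from your own system could be turned into an order object with the fields above. The function name order_to_feed and its arguments are hypothetical. Optional attributes are left out when they are missing, following the advice on null values earlier in this guide.

```python
from datetime import datetime, timezone


def order_to_feed(order_id, placed_at, lines, customer=None, email=None):
    """lines: iterable of (product_id, quantity, unit_price) tuples."""
    feed_order = {
        "id": order_id,
        "products": [
            {"id": pid, "quantity": qty, "price": price} for pid, qty, price in lines
        ],
        "time": int(placed_at.timestamp()),  # UNIX timestamp in seconds
    }
    # Optional attributes are omitted rather than sent as null or empty values.
    if customer is not None:
        feed_order["customer"] = customer
    if email:
        feed_order["email"] = email
    return feed_order


placed = datetime(2014, 1, 16, 11, 18, 40, tzinfo=timezone.utc)
print(order_to_feed(123458, placed, [(456, 1, 200.00), (789, 2, 120.00)], customer=789))
```

Using a timezone-aware datetime avoids the timestamp shifting with the server's local time zone.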
Customers #
Each object represents a single customer. The attributes provided are then merged with the customer’s email or customer ID from orders to create a single customer profile for use with Audience segmentation.
Below you can see the required fields and examples of optional ones that are often used by eCommerce stores.
Attribute | Importance | Type | Description |
---|---|---|---|
id | Required | int/str | The customer ID, which should be unique for each customer. |
name | Required | str | The customer’s full name. |
email | Required | str | The customer’s email. |
subscribed | Required | bool | Boolean indicating whether the customer has subscribed to newsletters. This must be true for Clerk.io to send marketing emails to this customer. |
zip | Optional | str | The customer’s zip code. |
gender | Optional | str | The customer’s gender. |
age | Optional | int | The customer’s age. |
is_b2b | Optional | bool | Boolean indicating whether the customer is a business customer. |
Example JSON #
[
{
"id": 135,
"name": "Luke Skywalker",
"email": "luke@rebels.com",
"subscribed": true,
"gender": "male",
"zip": "1134",
"is_b2b": false
},
{
"id": 165,
"name": "Leia Organa",
"email": "leia@royalty.org",
"subscribed": false,
"gender": "female",
"age": 19,
"interests": ["politics", "outlaws"],
"is_b2b": true
}
]
Pages #
Each object represents a single page. Pages are generally all types of eCommerce content that are not classified as a product or category. These could be articles, blog posts, landing pages, brand pages, and other types of written content.
Below you can see the required fields and examples of optional ones that are often used by eCommerce stores.
Attribute | Importance | Type | Description |
---|---|---|---|
id | Required | int/str | The page ID, which should be unique for each page. |
type | Required | str | Type of the content. Used to separate pages such as CMS pages, blog posts and landing pages. |
url | Required | str | Full URL of the page. |
title | Required | str | Title of the page. |
text | Required | str | Full body of text for the page. |
image | Optional | str | The full URL for the page image. |
Example JSON #
[
{
"id": 135,
"type": "cms",
"url": "https://galactic-empire-merch.com/imperial-goods/tatooine",
"title": "Open Hours",
"text": "The main text about our opening hours..."
},
{
"id": 1354,
"type": "blog",
"url": "https://galactic-empire-merch.com/imperial-goods/tatooine",
"title": "New Blog Post",
"text": "The main text about our opening hours...",
"keywords": ["blog", "post", "new"]
}
]
Multi-language #
Clerk.io works best when you create separate Stores for each language. Each Store can be configured with the language of the content, which makes Search understand grammar and spelling mistakes much better.
Further, customers from different regions or countries tend to have different preferences and search patterns, which means that it works best to separate the order data into different Stores as well.
An alternative to this is to build multi-language JSON feeds, where all text attributes are provided as objects with language codes as keys, and their translations as values.
All text attributes must have language keys even if the content inside them is the same, to make sure they are searchable for the language.
When making API calls, include the parameter language with the matching language key as its value, to fetch the correct data.
Example Multi-language JSON #
[
{
"id": 135,
"name": {
"english":"Lightsaber",
"spanish":"Sable de luz",
"italian":"Spada laser"
},
"description": {
"english":"Antique Rebel Lightsaber",
"spanish":"Sable de luz rebelde antiguo",
"italian":"Antica spada laser ribelle"
},
"price": 99995.95,
"image": {
"english":"https://galactic-empire-merch.com/images/a-r-lightsaber.jpg",
"spanish":"https://galactic-empire-merch.com/es/images/a-r-lightsaber.jpg",
"italian":"https://galactic-empire-merch.com/it/images/a-r-lightsaber.jpg"
},
"url": {
"english":"https://galactic-empire-merch.com/antique-rebel-lightsaber",
"spanish":"https://galactic-empire-merch.com/es/antique-rebel-lightsaber",
"italian":"https://galactic-empire-merch.com/it/antique-rebel-lightsaber"
},
"brand": "Je'daii",
"categories": [987, 654],
"created_at": 1199145600,
"color_names": ["Green","Red"],
"color_codes": ["#7CFC00","#FF3131"],
"reviews_amount": 164,
"reviews_avg": 4.8
},
{
"id": 261,
"name": {
"english":"Death Star Deluxe",
"spanish":"Estrella de la Muerte de lujo",
"italian":"La Morte Nera Deluxe"
},
"description": {
"english":"Death Star - Guaranteed idiot proof",
"spanish":"Estrella de la Muerte: a prueba de idiotas garantizada",
"italian":"Morte Nera - A prova di idiota garantita"
},
"price": 99999999999999.95,
"image": {
"english":"https://galactic-empire-merch.com/images/death-star.jpg",
"spanish":"https://galactic-empire-merch.com/es/images/death-star.jpg",
"italian":"https://galactic-empire-merch.com/it/images/death-star.jpg"
},
"url": {
"english":"https://galactic-empire-merch.com/death-star",
"spanish":"https://galactic-empire-merch.com/es/death-star",
"italian":"https://galactic-empire-merch.com/it/death-star"
},
"brand": "Imperial Inc.",
"categories": [345678],
"created_at": 1197565600
}
]
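If translations are stored per language in your system, the language-keyed objects can be assembled as in the sketch below. The function name to_multi_language is hypothetical, and the list of translated attributes simply mirrors the example above (name, description, image, url), so adjust it to your own data.

```python
# Assumption: the attributes that carry one value per language, as in the example above.
TEXT_ATTRIBUTES = ("name", "description", "image", "url")


def to_multi_language(base, translations):
    """base: language-neutral attributes; translations: {language: {attribute: value}}."""
    product = dict(base)
    for attribute in TEXT_ATTRIBUTES:
        product[attribute] = {
            language: values[attribute] for language, values in translations.items()
        }
    return product


base = {"id": 135, "price": 99995.95, "categories": [987, 654], "created_at": 1199145600}
translations = {
    "english": {
        "name": "Lightsaber",
        "description": "Antique Rebel Lightsaber",
        "image": "https://galactic-empire-merch.com/images/a-r-lightsaber.jpg",
        "url": "https://galactic-empire-merch.com/antique-rebel-lightsaber",
    },
    "spanish": {
        "name": "Sable de luz",
        "description": "Sable de luz rebelde antiguo",
        "image": "https://galactic-empire-merch.com/es/images/a-r-lightsaber.jpg",
        "url": "https://galactic-empire-merch.com/es/antique-rebel-lightsaber",
    },
}
print(to_multi_language(base, translations)["name"])
```

A missing translation raises a KeyError here, which surfaces gaps early, since every text attribute needs a key for every language.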
Example call #
curl -X GET \
https://api.clerk.io/v2/recommendations/popular \
-H 'Content-Type: application/json' \
-d 'key=your_store_public_key&limit=10&language=italian'
Supported languages #
The language must be specified with its exact name. If your language is not in the list below, choose a language using the same root, or simply “english”. It will still work, but grammar-neutralisation in Search will be less effective.
- danish
- dutch
- english
- finnish
- french
- german
- italian
- norwegian
- portuguese
- russian
- spanish
- swedish
Multiple Feeds #

This is the recommended approach, as it is efficient for your server and offers the highest degree of control.
With this approach, you build individual feed files for each of your object types. This uses the sync method called Clerk.io JSON Feed V2.
These support content-type: application/x-ndjson or application/json.
Each feed should contain an array of objects.
URL #
https://awsumstuff.com/feed/products.json
Output #
[
{
"id": 135,
"name": "Lightsaber",
"description": "Antique Rebel Lightsaber",
"price": 99995.95
},
...
]
Pagination #
This is an optional feature that allows you to paginate results by coding your feed to accept the following query parameters:
- limit: The number of objects to return per page.
- offset: The index of the first object to return on a page.
Clerk.io’s importer can be configured to send these parameters to your feed code. You simply have to select the number of objects you want to fetch per page.
When you configure your feed URL, you can then use {{limit}} and {{offset}} to append the data as query parameters.
{{limit}} will contain the number you configure in the importer settings. {{offset}} will start at 0 in the first call, and grow by limit with each subsequent call.
For example:
- Call 1: limit=100&offset=0
- Call 2: limit=100&offset=100
- Call 3: limit=100&offset=200
The stopping condition is when your feed returns an empty array.
URL #
https://awsumstuff.com/feed/products.json?limit={{limit}}&offset={{offset}}
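The pagination contract described above can be sketched in a few lines. The function feed_page stands in for your feed code, and simulate_importer mimics the described behaviour of requesting pages until an empty array is returned. Both names are hypothetical, and the importer's real implementation is not shown in this guide.

```python
def feed_page(all_objects, limit, offset):
    """Return one page of the feed; an empty list signals the end."""
    return all_objects[offset:offset + limit]


def simulate_importer(all_objects, limit):
    """Mimic the described loop: grow offset by limit until a page comes back empty."""
    imported, offset = [], 0
    while True:
        page = feed_page(all_objects, limit, offset)
        if not page:
            break  # stopping condition: empty array
        imported.extend(page)
        offset += limit
    return imported


products = [{"id": i} for i in range(250)]
print(len(feed_page(products, limit=100, offset=200)))
print(len(simulate_importer(products, limit=100)))
```

In a real feed, limit and offset would typically be passed to the database query so that only the requested page is loaded, instead of slicing a full list in memory.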
Increments #
Using this feature means that Clerk.io will stop deleting objects when importing, so you need to use CRUD API calls to remove objects from Clerk.io’s database.
The multi-feed solution supports the optional feature of sending only the data that has changed since a chosen number of days, rather than sending all data every time.
To do this, start by ensuring that your feed is configured to only return objects that have been changed within a specified number of days when the request includes the query parameter modified_after.
Then, add a number of days in the field labeled Incremental time {{modified_after}} found in Clerk.io’s importer settings.
This will cause Clerk.io’s importer to keep all data in the database, and only update objects that are included in the feeds.
To use the number of days you have configured, add the modified_after query parameter to your feed URL and include the tag that will insert the number of days you have configured. For example:
https://awsumstuff.com/feed/products.json?modified_after={{modified_after}}&limit={{limit}}&offset={{offset}}
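On the feed side, handling modified_after could look roughly like the sketch below. It assumes that your system tracks a last-changed UNIX timestamp for each object. The updated_at field and the function name changed_since are hypothetical and not part of the feed format.

```python
import time

SECONDS_PER_DAY = 86400


def changed_since(objects, modified_after_days, now=None):
    """Return the objects changed within the last modified_after_days days."""
    now = time.time() if now is None else now
    cutoff = now - modified_after_days * SECONDS_PER_DAY
    return [obj for obj in objects if obj["updated_at"] >= cutoff]


now = 1_000_000
objects = [
    {"id": 1, "updated_at": now},                        # changed just now
    {"id": 2, "updated_at": now - 3 * SECONDS_PER_DAY},  # changed three days ago
]
print(changed_since(objects, modified_after_days=2, now=now))
```

The filtered list can then be paginated with limit and offset in the same way as a full feed.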
Security #
We recommend that the JSON feed only accepts an SSL encrypted connection and uses HTTP Authentication if possible.
In addition, from the importer settings, you can activate Token Authentication. Clerk.io will then include an authorisation header on every HTTP request, that you need to verify before returning the data:
X-Clerk-Authorization: Bearer THE_TOKEN
You can verify the token with a POST request to the token/verify endpoint:
curl -X POST \
https://api.clerk.io/v2/token/verify \
-H 'Content-Type: application/json' \
-d '{"token": "THE_TOKEN", "key": "your_store_public_key"}'
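The token check above can be prepared in Python as in the following sketch. It reads the token from the X-Clerk-Authorization header and builds the same POST request as the curl example, without sending it. The function names are hypothetical, the headers argument is assumed to be a plain dict with that exact key, and the response format of the endpoint is not described in this guide, so interpreting the response is left out.

```python
import json
import urllib.request

VERIFY_URL = "https://api.clerk.io/v2/token/verify"


def extract_token(headers):
    """Return the bearer token from the X-Clerk-Authorization header, or None."""
    value = headers.get("X-Clerk-Authorization", "")
    scheme, _, token = value.partition(" ")
    token = token.strip()
    if scheme != "Bearer" or not token:
        return None
    return token


def build_verify_request(token, public_key):
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"token": token, "key": public_key}).encode("utf-8")
    return urllib.request.Request(
        VERIFY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


token = extract_token({"X-Clerk-Authorization": "Bearer THE_TOKEN"})
request = build_verify_request(token, "your_store_public_key")
print(request.get_method(), request.full_url)
```

The request could then be sent with urllib.request.urlopen(request). Requests without a valid header can be rejected before any feed data is generated.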
Single Feed #

Parameters #
Apart from the objects themselves, this approach supports two additional parameters:
- created: A UNIX timestamp of when the feed was last updated. Clerk.io’s importer uses this to identify whether new data should be fetched.
- strict: When true, all data will be imported as-is. When false, Clerk.io will attempt to clean up the data, for example by removing duplicate products or categories, and converting stringified numbers into integers or floats.
Example Feed #
{
"products": [ ... ],
"categories": [ ... ],
"orders": [ ... ],
"customers": [ ... ],
"pages": [ ... ],
"config": {
"created": 1567069830,
"strict": false
}
}
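A feed with this shape could be assembled as in the sketch below, which follows the layout of the example feed with created and strict placed under config. The function name build_single_feed is hypothetical.

```python
import json
import time


def build_single_feed(products, categories, orders, customers, pages,
                      strict=False, now=None):
    """Combine all object lists into one feed dict with a config section."""
    created = int(time.time() if now is None else now)
    return {
        "products": products,
        "categories": categories,
        "orders": orders,
        "customers": customers,
        "pages": pages,
        "config": {"created": created, "strict": strict},
    }


feed = build_single_feed(
    products=[{"id": 135, "name": "Lightsaber"}],
    categories=[], orders=[], customers=[], pages=[],
    now=1567069830,
)
print(json.dumps(feed["config"]))
```

Setting created only when the underlying data has changed, rather than on every request, lets the importer skip feeds that contain nothing new.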
Security #
Your data is extremely business-sensitive so security is of the highest priority!
We recommend that the JSON feed only accepts an SSL encrypted connection and uses HTTP Authentication if possible.
In addition, Clerk provides an additional layer of security by letting you verify that a feed request is from a trusted source (i.e. us).
The system is based on a shared secret; a Private API key which can be created in my.clerk.io under Settings > API Keys.
All Clerk.io requests via HTTP or HTTPS include two query parameters, hash and salt.
salt is just a random string used to salt the hash function, while hash is a SHA512 hash computed from the Private API Key in the following way:
hash = SHA512(salt + private_key + str(int(floor(unix_timestamp() / 100))))
An example request could be the following URL:
https://example.com/clerk-product-feed.php?salt=f4Ke...A02X&hash=4DFF...340F
By fetching both the salt and hash parameters from the request, you can do the same computation on your server, and compare the hash values to confirm that they are the same, meaning the request comes from Clerk.io.
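The computation above translates to Python as in the following sketch. The function names are hypothetical, and two details are assumptions rather than documented behaviour: the concatenated string is encoded as UTF-8 before hashing, and the hex digest is compared case-insensitively because the example URL shows uppercase characters.

```python
import hashlib
import hmac
import math
import time


def expected_hash(salt, private_key, timestamp=None):
    """SHA512(salt + private_key + str(int(floor(unix_timestamp / 100)))) as hex."""
    timestamp = time.time() if timestamp is None else timestamp
    window = str(int(math.floor(timestamp / 100)))
    return hashlib.sha512((salt + private_key + window).encode("utf-8")).hexdigest()


def is_clerk_request(salt, received_hash, private_key, timestamp=None):
    """Compare the received hash with the locally computed one in constant time."""
    computed = expected_hash(salt, private_key, timestamp).encode("ascii")
    received = received_hash.lower().encode("utf-8")
    return hmac.compare_digest(computed, received)


digest = expected_hash("f4Ke", "my_private_key", timestamp=1567069830)
print(is_clerk_request("f4Ke", digest.upper(), "my_private_key", timestamp=1567069830))
```

Because the timestamp is divided by 100, the expected hash changes every 100 seconds. A request that arrives just after a window boundary may therefore fail the check. Also accepting the hash for the previous window is one possible way to handle this, though it is not described in this guide.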