Data Feeds

Overview #

Regardless of your eCommerce platform, and whether we have an integration or not, you can always sync data with Clerk.io through one or more feeds in JSON format.

We support two different variations of the feeds:

  • Multiple files for different objects
  • A single file containing all objects

The two solutions use the same object structure but offer different features for securing and importing the feeds, which are outlined in this guide.

All object types except orders are mirrored from the feeds to Clerk.io’s database. If you remove an object from the feed, Clerk.io will remove it from the database when it’s imported. Orders are logged and kept in the database.

We recommend generating the JSON feed(s) at least once a day, but ideally more often. They can also be generated on demand when Clerk.io’s importer requests them.

The feed(s) should be available at a URL that is accessible from Clerk.io’s servers.

https://your-website.com/json-feed.json

Data Types #

We support attributes of the types: int, float, str, array, bool.

Null values #

Unchecked null values are a sure way for errors to sneak in over time. If an attribute does not exist for a given product, category or order, simply omit the attribute.
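As a minimal sketch of this rule in Python, attributes with no value can be dropped before the object is serialised, so they never show up as null in the feed. The helper name drop_nulls and the sample record are illustrative only and not part of Clerk.io.

```python
import json


def drop_nulls(obj: dict) -> dict:
    """Return a copy of obj without attributes whose value is None."""
    return {key: value for key, value in obj.items() if value is not None}


# Hypothetical product record where the platform has no brand stored.
product = {"id": 135, "name": "Lightsaber", "brand": None, "price": 99995.95}

# The brand attribute is omitted instead of being written as null.
print(json.dumps(drop_nulls(product)))
```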

ID value types #

We highly recommend using integers as IDs, but it is possible to use strings as well. You must always commit to one type in your feed, meaning all IDs for your objects must be of the same type.

Attribute names #

Attribute names can only contain alphanumeric characters (A-Z, 0-9) and underscores.
Thus, a valid attribute name could be brand_name but not läbel-mærke.

Using dashes or special characters in the attribute names will cause them to be ignored in the sync.
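The naming rule can be checked before a feed is published. The sketch below is one possible check in Python; the function name is hypothetical, and it assumes lowercase letters are allowed because the guide's own valid example is brand_name.

```python
import re

# Letters, digits and underscores only. An explicit character class is used
# because Python's \w would also accept letters such as "ä" and "æ".
VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def invalid_attribute_names(obj: dict) -> list:
    """Return the attribute names in obj that do not follow the naming rule."""
    return [name for name in obj if not VALID_NAME.fullmatch(name)]


print(invalid_attribute_names({"brand_name": "Je'daii", "läbel-mærke": "x"}))
# ['läbel-mærke']
```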

Objects Structure #

JSON feeds consist of one list of objects, with a range of fields that make up their data.

At a minimum, objects must contain the required fields for their type for Clerk.io’s AI to function properly. Optionally, they can contain any extra attributes available in the eCommerce platform.

Products #

Each object represents a single product. If you have configurable products, we recommend sending just the parent product, and including attributes that describe the children, such as color, size, material, etc.

Below you can see the required fields and examples of optional ones that are often used by eCommerce stores.

| Attribute | Importance | Type | Description |
| --- | --- | --- | --- |
| id | Required | int/str | The product ID, which should be unique for each product. |
| name | Required | str | The product name. |
| description | Required | str | The product description. |
| price | Required | float | The product’s current selling price. |
| list_price | Optional | float | The product’s original list price. Useful to show discounts. |
| image | Required | str | The full URL for the product image. When used for thumbnails we recommend a maximum image size of 200x200px. |
| url | Required | str | The product URL. |
| categories | Required | array | An array of category IDs that the product belongs to. |
| created_at | Required | int | The UNIX timestamp of when the product was created. |
| brand | Optional | str | The product’s brand. |
| color_names | Optional | array | An array of color names for the product. |
| color_codes | Optional | array | An array of color codes for the product. |
| reviews_amount | Optional | int | The number of reviews for the product. |
| reviews_avg | Optional | float | The average review score for the product. |

Example JSON #

[
  {
    "id": 135,
    "name": "Lightsaber",
    "description": "Antique Rebel Lightsaber",
    "price": 99995.95,
    "image": "https://galactic-empire-merch.com/images/a-r-lightsaber.jpg",
    "url": "https://galactic-empire-merch.com/antique-rebel-lightsaber",
    "brand": "Je'daii",
    "categories": [987, 654],
    "created_at": 1199145600,
    "color_names": ["Green","Red"],
    "color_codes": ["#7CFC00","#FF3131"],
    "reviews_amount": 164,
    "reviews_avg": 4.8
  },
  {
    "id": 261,
    "name": "Death Star Deluxe",
    "description": "Death Star - Guaranteed idiot proof",
    "price": 99999999999999.95,
    "image": "https://galactic-empire-merch.com/images/death-star.jpg",
    "url": "https://galactic-empire-merch.com/death-star",
    "brand": "Imperial Inc.",
    "categories": [345678],
    "created_at": 1197565600
  }
]
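Before publishing a product feed, it can help to confirm that every object carries the required fields from the table above. The following Python sketch shows one way to do that; the function name is an assumption for illustration, and the check only looks at presence, not at types.

```python
# Required product attributes, as listed in the table above.
REQUIRED_PRODUCT_FIELDS = (
    "id", "name", "description", "price",
    "image", "url", "categories", "created_at",
)


def missing_product_fields(product: dict) -> list:
    """Return the required attributes that are absent from a product object."""
    return [field for field in REQUIRED_PRODUCT_FIELDS if field not in product]


# Illustrative product that lacks image, url, categories and created_at.
incomplete = {
    "id": 261,
    "name": "Death Star Deluxe",
    "description": "Death Star - Guaranteed idiot proof",
    "price": 99999999999999.95,
}
print(missing_product_fields(incomplete))
```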

Keep Products Without Indexing #

For some setups, you might want to keep products in Clerk.io’s database without showing them in any results.

If you sell tickets or used items that are only available for a limited time and never come back, it’s a good idea to keep the history of these products intact, so Clerk.io can use it to improve results.

To do this, add the special attribute index: false to the product objects that should be kept without being indexed. Clerk.io will then use their sales history when generating results, but the products themselves will never be returned by any API call.

For other products, simply leave the attribute out or set it to index: true.
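As a rough illustration in Python, the attribute can be set only on the products that are no longer for sale. The function name and the still_for_sale flag are hypothetical; how a shop decides that a product is gone depends on the platform.

```python
import json


def with_index_flag(product: dict, still_for_sale: bool) -> dict:
    """Return a copy of product, adding index: false when it is no longer sold."""
    result = dict(product)
    if not still_for_sale:
        # Kept in the database for its history, but never shown in results.
        result["index"] = False
    return result


# Hypothetical ticket for an event that has already taken place.
print(json.dumps(with_index_flag({"id": 1}, still_for_sale=False)))
```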

Categories #

Each object represents a single category. Clerk.io builds an internal category tree based on the subcategories provided for each category.

Below you can see the required fields and examples of optional ones that are often used by eCommerce stores.

| Attribute | Importance | Type | Description |
| --- | --- | --- | --- |
| id | Required | int/str | The unique ID for the category. |
| name | Required | str | The category name. |
| url | Required | str | The category URL. |
| subcategories | Required | array | An array of category IDs that are subcategories of this category. Can be an empty list for categories without subcategories. |
| image | Optional | str | Full URL for the category image. |
| description | Optional | str | The category description. |

Example JSON #

[
  {
    "id": 1,
    "name": "Imperial Goods",
    "subcategories": [42, 25],
    "url": "https://galactic-empire-merch.com/imperial-goods"
  },
  {
    "id": 42,
    "name": "Tatooine",
    "subcategories": [],
    "url": "https://galactic-empire-merch.com/imperial-goods/tatooine"
  },
  {
    "id": 25,
    "name": "Coruscant",
    "subcategories": [],
    "url": "https://galactic-empire-merch.com/imperial-goods/coruscant"
  }
]
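Many platforms store a parent reference on each category rather than a list of children. The Python sketch below shows one way to derive the subcategories arrays from such rows; the parent_id field and the function name are assumptions about the source data, not part of the feed format.

```python
from collections import defaultdict


def build_category_feed(rows: list) -> list:
    """Turn rows that carry a parent_id into objects with a subcategories array."""
    children = defaultdict(list)
    for row in rows:
        if row.get("parent_id") is not None:
            children[row["parent_id"]].append(row["id"])
    return [
        {
            "id": row["id"],
            "name": row["name"],
            "url": row["url"],
            # Categories without children get an empty list.
            "subcategories": children.get(row["id"], []),
        }
        for row in rows
    ]


base = "https://galactic-empire-merch.com/imperial-goods"
rows = [
    {"id": 1, "name": "Imperial Goods", "url": base, "parent_id": None},
    {"id": 42, "name": "Tatooine", "url": base + "/tatooine", "parent_id": 1},
    {"id": 25, "name": "Coruscant", "url": base + "/coruscant", "parent_id": 1},
]
feed = build_category_feed(rows)
```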

Orders #

Orders are logged and are not deleted when removed from the feed. They generally only have to be sent during the first import and can then be removed again to save server capacity. They can be deleted via our CRUD API.

Parcels data can currently only be synced via the CRUD API. See the CRUD API documentation for details.

Each object represents a single order. Clerk.io uses the product IDs and email address/customer ID inside orders to analyze customer behaviour and identify trends. Along with products, it is the most important object type.

Below you can see the required fields and optional fields. It is not possible to send additional attributes for orders.

| Attribute | Importance | Type | Description |
| --- | --- | --- | --- |
| id | Required | int/str | The order ID, which should be unique for each order. |
| products | Required | array | The products in the order. Each product is an object with an ID, quantity, and unit price. |
| time | Required | int (Unix timestamp) | The time the order was placed, as a Unix timestamp. |
| customer | Optional | int/str | The customer ID. |
| email | Optional | str | The customer email. Needed for using our Auto-Email and Audience products. |

Example JSON #

[
  {
    "id": 123458,
    "customer": 789,
    "email": "vader@the-death-star.com",
    "products": [{"id": 456, "quantity": 1, "price": 200.00}, {"id": 789, "quantity": 2, "price": 120.00}],
    "time": 1389871120
  },
  {
    "id": 123456,
    "customer": 456,
    "email": "obi.wan@kenobi.me",
    "products": [{"id": 456, "quantity": 1, "price": 200.00}, {"id": 789, "quantity": 2, "price": 120.00}, {"id": 123, "quantity": 2, "price": 60.00}],
    "time": 1389870977
  },
  {
    "id": 123457,
    "customer": "",
    "products": [{"id": 789, "quantity": 2, "price": 120.00}],
    "time": 1389871090
  }
]
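The order structure above could be assembled from platform data along these lines. This is a sketch in Python under stated assumptions: the function name and the (product ID, quantity, unit price) tuples are hypothetical, and the optional customer and email attributes are simply left out when they are unknown.

```python
from datetime import datetime, timezone


def to_order_object(order_id, placed_at: datetime, lines, customer=None, email=None) -> dict:
    """Build an order object; lines holds (product_id, quantity, unit_price) tuples."""
    order = {
        "id": order_id,
        "products": [
            {"id": product_id, "quantity": quantity, "price": unit_price}
            for product_id, quantity, unit_price in lines
        ],
        # Pass a timezone-aware datetime; a naive one is read as local time.
        "time": int(placed_at.timestamp()),
    }
    if customer is not None:
        order["customer"] = customer
    if email:
        order["email"] = email
    return order


placed = datetime(2014, 1, 16, 11, 18, 40, tzinfo=timezone.utc)
order = to_order_object(123458, placed, [(456, 1, 200.00), (789, 2, 120.00)],
                        customer=789, email="vader@the-death-star.com")
```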

Customers #

Each object represents a single customer. The attributes provided are merged with the customer’s email or customer ID from orders to create a single customer profile for use with Audience segmentation.

Below you can see the required fields and examples of optional ones that are often used by eCommerce stores.

| Attribute | Importance | Type | Description |
| --- | --- | --- | --- |
| id | Required | int/str | The customer ID, which should be unique for each customer. |
| name | Required | str | The customer’s full name. |
| email | Required | str | The customer’s email. |
| subscribed | Required | bool | Boolean indicating whether the customer has subscribed to newsletters. This must be true for Clerk.io to send marketing emails to this customer. |
| zip | Optional | str | The customer’s zip code. |
| gender | Optional | str | The customer’s gender. |
| age | Optional | int | The customer’s age. |
| is_b2b | Optional | bool | Boolean indicating whether the customer is a business customer. |

Example JSON #

[
  {
    "id": 135,
    "name": "Luke Skywalker",
    "email": "luke@rebels.com",
    "subscribed": true,
    "gender": "male",
    "zip": "1134",
    "is_b2b": false
  },
  {
    "id": 165,
    "name": "Leia Organa",
    "email": "leia@royalty.org",
    "subscribed": false,
    "gender": "female",
    "age": 19,
    "interests": ["politics", "outlaws"],
    "is_b2b": true
  }
]

Pages #

Each object represents a single page. Pages are generally all types of eCommerce content that are not classified as a product or category. These could be articles, blog posts, landing pages, brand pages and other types of written content.

Below you can see the required fields and examples of optional ones that are often used by eCommerce stores.

| Attribute | Importance | Type | Description |
| --- | --- | --- | --- |
| id | Required | int/str | Page ID, which should be unique for each page. |
| type | Required | str | Type of the content. Used to separate pages such as CMS pages, blog posts and landing pages. |
| url | Required | str | Full URL of the page. |
| title | Required | str | Title of the page. |
| text | Required | str | Full body of text for the page. |
| image | Optional | str | The full URL for the page image. |

Example JSON #

[
  {
    "id": 135,
    "type": "cms",
    "url": "https://galactic-empire-merch.com/imperial-goods/tatooine",
    "title": "Open Hours",
    "text": "The main text about our opening hours..."
  },
  {
    "id": 1354,
    "type": "blog",
    "url": "https://galactic-empire-merch.com/imperial-goods/tatooine",
    "title": "New Blog Post",
    "text": "The main text of the blog post...",
    "keywords": ["blog", "post", "new"]
  }
]

Multi-language #

Clerk.io works best when you create separate Stores for each language. Each Store can be configured with the language of the content, which makes Search understand grammar and spelling mistakes much better.

Further, customers from different regions or countries tend to have different preferences and search patterns, which means that it works best to separate the order data into different Stores as well.

An alternative to this is to build multi-language JSON feeds, where all text attributes are provided as objects with language codes as keys, and their translations as values.

All text attributes must have language keys even if the content inside them is the same, to make sure they are searchable for the language.

When making API calls, include the parameter language and the matching language key, to fetch the correct data.

Example Multi-language JSON #

[
  {
    "id": 135,
    "name": {
      "english": "Lightsaber",
      "spanish": "Sable de luz",
      "italian": "Spada laser"
    },
    "description": {
      "english": "Antique Rebel Lightsaber",
      "spanish": "Sable de luz rebelde antiguo",
      "italian": "Antica spada laser ribelle"
    },
    "price": 99995.95,
    "image": {
      "english": "https://galactic-empire-merch.com/images/a-r-lightsaber.jpg",
      "spanish": "https://galactic-empire-merch.com/es/images/a-r-lightsaber.jpg",
      "italian": "https://galactic-empire-merch.com/it/images/a-r-lightsaber.jpg"
    },
    "url": {
      "english": "https://galactic-empire-merch.com/antique-rebel-lightsaber",
      "spanish": "https://galactic-empire-merch.com/es/antique-rebel-lightsaber",
      "italian": "https://galactic-empire-merch.com/it/antique-rebel-lightsaber"
    },
    "brand": "Je'daii",
    "categories": [987, 654],
    "created_at": 1199145600,
    "color_names": ["Green", "Red"],
    "color_codes": ["#7CFC00", "#FF3131"],
    "reviews_amount": 164,
    "reviews_avg": 4.8
  },
  {
    "id": 261,
    "name": {
      "english": "Death Star Deluxe",
      "spanish": "Estrella de la Muerte de lujo",
      "italian": "La Morte Nera Deluxe"
    },
    "description": {
      "english": "Death Star - Guaranteed idiot proof",
      "spanish": "Estrella de la Muerte: a prueba de idiotas garantizada",
      "italian": "Morte Nera - A prova di idiota garantita"
    },
    "price": 99999999999999.95,
    "image": {
      "english": "https://galactic-empire-merch.com/images/death-star.jpg",
      "spanish": "https://galactic-empire-merch.com/es/images/death-star.jpg",
      "italian": "https://galactic-empire-merch.com/it/images/death-star.jpg"
    },
    "url": {
      "english": "https://galactic-empire-merch.com/death-star",
      "spanish": "https://galactic-empire-merch.com/es/death-star",
      "italian": "https://galactic-empire-merch.com/it/death-star"
    },
    "brand": "Imperial Inc.",
    "categories": [345678],
    "created_at": 1197565600
  }
]
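If translations are stored per language in your platform, the language-keyed objects shown above can be assembled with a small helper. The Python sketch below is one way to do it; the function name and the input layout (one dictionary of texts per language) are assumptions, and the shared attributes must not overlap with the translated ones.

```python
def build_multilanguage_product(shared: dict, per_language: dict) -> dict:
    """Combine shared attributes with {language: {attribute: text}} translations."""
    product = dict(shared)
    for language, texts in per_language.items():
        for attribute, text in texts.items():
            # Each translated attribute becomes an object keyed by language.
            product.setdefault(attribute, {})[language] = text
    return product


shared = {"id": 135, "price": 99995.95, "categories": [987, 654]}
per_language = {
    "english": {"name": "Lightsaber", "description": "Antique Rebel Lightsaber"},
    "italian": {"name": "Spada laser", "description": "Antica spada laser ribelle"},
}
product = build_multilanguage_product(shared, per_language)
```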

Example call #

curl -G \
  https://api.clerk.io/v2/recommendations/popular \
  -d 'key=your_store_public_key&limit=10&language=italian'

Supported languages #

The language must be specified with its exact name. If your language is not on the list below, choose a language with the same root, or simply “english”. It will still work, but grammar-neutralisation in Search will be less effective.

  • danish
  • dutch
  • english
  • finnish
  • french
  • german
  • italian
  • norwegian
  • portuguese
  • russian
  • spanish
  • swedish

Multiple Feeds #


This is the recommended approach, as it is efficient for your server and offers the highest degree of control.

With this approach, you build individual feed files for each of your objects. This uses the sync method called Clerk.io JSON Feed V2.

These feeds can be served with the content type application/x-ndjson or application/json.

Each feed should contain an array of objects.

URL #

https://awsumstuff.com/feed/products.json

Output #

[
  {
    "id": 135,
    "name": "Lightsaber",
    "description": "Antique Rebel Lightsaber",
    "price": 99995.95
  },
  ...
]

Pagination #

This is an optional feature that allows you to paginate results by coding your feed to accept the following query parameters:

  • limit: The number of objects to return per page.
  • offset: The index of the first object to return on a page.

Clerk.io’s importer can be configured to send these parameters to your feed code. You simply have to select the number of objects you want to fetch per page.

When you configure your feed URL, you can then use {{limit}} and {{offset}} to append the values as query parameters.

{{limit}} will contain the number you configure in the importer settings. {{offset}} will start at 0 in the first call and increase by limit with each subsequent call.

E.g.

  • Call 1: limit=100&offset=0
  • Call 2: limit=100&offset=100
  • Call 3: limit=100&offset=200

The importer stops requesting pages when your feed returns an empty array.

URL #

https://awsumstuff.com/feed/products.json?limit={{limit}}&offset={{offset}}
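On the feed side, handling these parameters can be as small as slicing the full list of objects. The Python sketch below assumes the query string has already been extracted from the request by your web framework; the function name is hypothetical, and a real feed would normally apply limit and offset in the database query rather than in memory.

```python
from urllib.parse import parse_qs


def page_from_query(objects: list, query_string: str) -> list:
    """Return the page of objects selected by the limit and offset parameters."""
    params = parse_qs(query_string)
    limit = int(params["limit"][0])
    offset = int(params["offset"][0])
    # Slicing past the end yields an empty list, which ends the import.
    return objects[offset:offset + limit]


# 250 placeholder objects requested in pages of 100.
all_products = [{"id": i} for i in range(250)]
third_page = page_from_query(all_products, "limit=100&offset=200")
fourth_page = page_from_query(all_products, "limit=100&offset=300")
```

Here third_page holds the last 50 objects and fourth_page is empty, so the importer would stop after the fourth call.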

Increments #

Using this feature means that Clerk.io will stop deleting objects when importing, so you need to use CRUD API calls to remove objects from Clerk.io’s database.

The multi-feed solution supports the optional feature of sending only the data that has changed within a chosen number of days, rather than sending all data every time.

To do this, start by ensuring that your feed only returns objects that have changed within the specified number of days when the request includes the query parameter modified_after.

Then, add a number of days in the field labeled Incremental time {{modified_after}} found in Clerk.io’s importer settings.

This will cause Clerk.io’s importer to keep all data in the database, and only update objects that are included in the feeds.

To use the number of days you have configured, add the modified_after query parameter to your feed URL and include the tag that inserts the configured number of days. For example:

https://awsumstuff.com/feed/products.json?modified_after={{modified_after}}&limit={{limit}}&offset={{offset}}
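A feed could apply the modified_after value roughly as follows. This Python sketch assumes that the value arrives as a number of days, as described above, and that each record carries a hypothetical updated_at Unix timestamp maintained by your platform; neither the field name nor the function name comes from Clerk.io.

```python
import time

SECONDS_PER_DAY = 86400


def changed_within_days(objects: list, modified_after_days: int, now=None) -> list:
    """Return the objects whose updated_at falls within the last N days."""
    now = time.time() if now is None else now
    cutoff = now - modified_after_days * SECONDS_PER_DAY
    return [obj for obj in objects if obj["updated_at"] >= cutoff]


# Two placeholder records: one changed six days ago, one changed a day ago.
now = 1_700_000_000
records = [
    {"id": 1, "updated_at": now - 6 * SECONDS_PER_DAY},
    {"id": 2, "updated_at": now - 1 * SECONDS_PER_DAY},
]
recent = changed_within_days(records, modified_after_days=2, now=now)
```

Objects that were deleted in the platform never appear in such a result, which is why removals have to go through the CRUD API when increments are enabled.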

Security #

We recommend that the JSON feed only accepts an SSL encrypted connection and uses HTTP Authentication if possible.

In addition, you can activate Token Authentication from the importer settings. Clerk.io will then include an authorisation header on every HTTP request, which you need to verify before returning the data:

X-Clerk-Authorization: Bearer THE_TOKEN

You can verify the token with a POST request to the token/verify endpoint:

curl -X POST \
  https://api.clerk.io/v2/token/verify \
  -H 'Content-Type: application/json' \
  -d '{"token": "THE_TOKEN", "key": "your_store_public_key"}'
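The same check can be prepared from feed code. The Python sketch below extracts the token from the header and builds the verification request shown in the curl example; the function names are hypothetical, and the headers dictionary is assumed to use the exact header casing shown above, which depends on your web framework.

```python
import json
import urllib.request

VERIFY_URL = "https://api.clerk.io/v2/token/verify"


def extract_token(headers: dict):
    """Return the token from an X-Clerk-Authorization header, or None."""
    value = headers.get("X-Clerk-Authorization", "")
    scheme, _, token = value.partition(" ")
    if scheme != "Bearer" or not token.strip():
        return None
    return token.strip()


def build_verify_request(token: str, public_key: str) -> urllib.request.Request:
    """Build the POST request for the token/verify endpoint."""
    body = json.dumps({"token": token, "key": public_key}).encode("utf-8")
    return urllib.request.Request(
        VERIFY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


token = extract_token({"X-Clerk-Authorization": "Bearer THE_TOKEN"})
request = build_verify_request(token, "your_store_public_key")
```

The request can then be sent with urllib.request.urlopen(request). The response format is not described in this guide, so check the API documentation before relying on specific response fields.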

Single Feed #

With this approach, you assemble all of your objects into a single JSON file. This uses the sync method called Clerk.io JSON Feed.

Parameters #

Apart from the objects themselves, this approach supports two additional parameters:

  • created: A Unix timestamp of when the feed was last updated. Clerk.io’s importer uses this to identify whether new data should be fetched.
  • strict: When true, all data will be imported as-is. When false, Clerk.io will attempt to clean up the data, for example by removing duplicate products or categories, and converting stringified numbers into integers or floats.

Example Feed #

{
  "products": [ ... ],
  "categories": [ ... ],
  "orders": [ ... ],
  "customers": [ ... ],
  "pages": [ ... ],

  "config": {
    "created": 1567069830,
    "strict": false
  }
}
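A single feed with this layout could be assembled as in the following Python sketch. The function name is hypothetical, and the created value defaults to the current time on the assumption that the feed is regenerated whenever this code runs.

```python
import json
import time


def build_single_feed(products, categories, orders, customers, pages,
                      strict=False, created=None) -> dict:
    """Assemble all object lists and the config block into one feed."""
    return {
        "products": products,
        "categories": categories,
        "orders": orders,
        "customers": customers,
        "pages": pages,
        "config": {
            # Unix timestamp of when the feed was last updated.
            "created": int(time.time()) if created is None else created,
            "strict": strict,
        },
    }


feed = build_single_feed([], [], [], [], [], created=1567069830)
feed_json = json.dumps(feed)
```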

Security #

Your data is highly business-sensitive, so security is of the highest priority.

We recommend that the JSON feed only accepts an SSL encrypted connection and uses HTTP Authentication if possible.

In addition, Clerk.io provides an extra layer of security by letting you verify that a feed request comes from a trusted source (i.e. Clerk.io).

The system is based on a shared secret: a Private API key, which can be created in my.clerk.io under Settings > API Keys.

All Clerk.io requests via HTTP or HTTPS include two query parameters: hash and salt.

salt is a random string used to salt the hash function, while hash is a SHA512 hash computed from the Private API Key in the following way:

hash = SHA512(salt + private_key + str(int(floor(unix_timestamp() / 100))))

An example request could be the following URL:

https://example.com/clerk-product-feed.php?salt=f4Ke...A02X&hash=4DFF...340F

By fetching both the salt and hash parameters from the request, you can do the same computation on your server and compare the hash values. If they are the same, the request comes from Clerk.io.
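The comparison could be implemented along these lines in Python. This is a sketch under stated assumptions, not Clerk.io's reference implementation: it assumes the strings are UTF-8 encoded, that hash is a hexadecimal digest compared case-insensitively, and it also accepts the previous 100-second window so that a request sent just before a window boundary is not rejected. The function names are hypothetical.

```python
import hashlib
import hmac
import math
import time


def expected_hash(salt: str, private_key: str, unix_time: float) -> str:
    """Compute SHA512(salt + private_key + floor(unix_time / 100)) as lowercase hex."""
    window = str(int(math.floor(unix_time / 100)))
    payload = (salt + private_key + window).encode("utf-8")
    return hashlib.sha512(payload).hexdigest()


def is_request_from_clerk(salt: str, received_hash: str, private_key: str, now=None) -> bool:
    """Compare the received hash with the one computed on this server."""
    now = time.time() if now is None else now
    received = received_hash.lower().encode("utf-8")
    # Assumption: tolerate the previous 100-second window as well.
    candidates = (
        expected_hash(salt, private_key, now),
        expected_hash(salt, private_key, now - 100),
    )
    # compare_digest avoids leaking information through timing differences.
    return any(hmac.compare_digest(c.encode("utf-8"), received) for c in candidates)


# Simulated request using a placeholder salt and key.
sample_hash = expected_hash("f4Ke", "my_private_key", 1_000_000)
accepted = is_request_from_clerk("f4Ke", sample_hash.upper(), "my_private_key", now=1_000_050)
```

If the check fails, the feed should respond with an error instead of returning data.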