
Rebuilding the search

The shutdown of the zanox Shop@ network in November of last year made searching for products and AdMedia through the API much more complex. Without the pool of freely available ads and products, each search query now requires a list of IDs of the programs in which the publisher is confirmed. No results are returned if this list is not provided. This is often cumbersome, as getting your confirmed program IDs requires another API call, or several of them if you are accepted to more programs than the maximum result page size.
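To illustrate why this adds up, here is a minimal sketch of the paging loop a client needs just to collect its confirmed program IDs. The `fetch_page` callable is a hypothetical stand-in for the real programs call (its name, signature and return shape are assumptions made for this example), and `fake_fetch_page` only simulates a publisher confirmed in 120 programs.

```python
def collect_confirmed_program_ids(fetch_page, page_size):
    """Collect all program IDs by walking through result pages.

    fetch_page(page, page_size) is a hypothetical callable that returns
    (ids_on_this_page, total_number_of_programs). Pages are zero-based.
    Returns the collected IDs and the number of calls that were needed.
    """
    program_ids = []
    page = 0
    calls = 0
    while True:
        batch, total = fetch_page(page, page_size)
        calls += 1
        program_ids.extend(batch)
        page += 1
        # Stop once the pages requested so far cover the reported total.
        if page * page_size >= total:
            break
    return program_ids, calls


def fake_fetch_page(page, page_size):
    """Simulated API call: a publisher confirmed in 120 programs."""
    all_ids = list(range(1000, 1120))
    start = page * page_size
    return all_ids[start:start + page_size], len(all_ids)


if __name__ == "__main__":
    ids, calls = collect_confirmed_program_ids(fake_fetch_page, page_size=50)
    print(len(ids), "program IDs collected in", calls, "calls")
```

With a page size of 50, the simulated publisher needs three extra calls before the first product search can even be sent.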

Based on this feedback we decided to improve a thing or two to make the lives of our developers easier. As it happens, a small feature request evolved into a full-scale rebuild of the zanox search infrastructure, and today I am happy to announce that we are nearly there!

Without going into too much detail, I can still provide some insight for the tech-savvy. In parallel to the older version of Lucene that drives our current product search, we have built a fully automated Solr 4 environment.

So what can you expect? Well, several things, actually.

Near real-time product availability in search

The first thing we improved is the delay between importing products from the Advertiser and their availability in search results. While the current system refreshes its index once a day, at midnight, the new search continuously indexes new products as they come in. The maximum delay therefore drops from 24 hours to around 45 minutes, which is how long it takes us to index our biggest product feed. Average delays should be much lower, around 10 to 20 minutes.

Search in all confirmed programs without providing program IDs

The default behaviour of search has changed: if no program IDs are provided in the search query, all confirmed programs are searched automatically. This way, developers save the aforementioned API call(s) to retrieve all program IDs.
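As a rough sketch of what this means for client code, the helper below builds a product search URL and simply leaves out the programs parameter when no IDs are given. The base URL and the parameter names connectid, q and programs are taken from the example request in this post. The function name and the comma-separated format for several IDs are assumptions for illustration, so check the documentation for the exact multi-ID syntax.

```python
from urllib.parse import urlencode

# Resource URL as it appears in the example request in this post.
BASE_URL = "http://api.zanox.com/xml/2011-03-01/products"


def build_search_url(connect_id, query, program_ids=None):
    """Build a product search URL.

    If program_ids is empty or None, the "programs" parameter is omitted,
    which the new search treats as "search all confirmed programs".
    """
    params = {"connectid": connect_id, "q": query}
    if program_ids:
        # Assumption: several IDs are sent as one comma-separated value.
        params["programs"] = ",".join(str(pid) for pid in program_ids)
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # "YOUR_CONNECT_ID" is a placeholder, not a real credential.
    print(build_search_url("YOUR_CONNECT_ID", "nike"))
    print(build_search_url("YOUR_CONNECT_ID", "nike", [7408]))
```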

Search products of all zanox programs without applying

Have you ever wondered which programs have products relevant to you? Now you can search within all programs with the URL query parameter partnership=all. Note that no tracking links will be returned for programs that have not confirmed you.

Fixing known bugs

Many of you are familiar with the API returning fewer than 50 items in one page, even though the total number of results is much bigger than 50. This behaviour is also described in the known bugs section of the product search documentation.

<page>0</page>
<items>4</items>
<total>21528</total>

This issue has been addressed in the new search version.
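If you want to check whether a response you received shows this symptom, a small sketch along the following lines may help. It assumes the three elements sit directly under a single root element without XML namespaces. The `response` wrapper used here is made up for the example, and real responses may well be structured differently. The page size of 50 is taken from the description above.

```python
import xml.etree.ElementTree as ET

# The three values from the snippet above, wrapped in a made-up root element.
SAMPLE = """<response>
<page>0</page>
<items>4</items>
<total>21528</total>
</response>"""


def is_short_page(xml_text, page_size=50):
    """Return True if the page holds fewer items than it should.

    A page is "short" when more results remain than were returned
    and the page is not full. Pages are assumed to be zero-based.
    """
    root = ET.fromstring(xml_text)
    page = int(root.findtext("page"))
    items = int(root.findtext("items"))
    total = int(root.findtext("total"))
    remaining = total - page * page_size
    expected = max(0, min(page_size, remaining))
    return items < expected


if __name__ == "__main__":
    print(is_short_page(SAMPLE))
```

Note that a last page with fewer than 50 items is not flagged, because there the item count matches what is left of the total.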

One important feature that is NOT in the current Beta is product categories. We are still working on improved machine learning algorithms that categorize the products, and we will add this feature in the near future.

We see the above features as must-haves for the rollout. But the new technology will enable us to add more bells and whistles in future development iterations. What we have in the pipeline is improved product categorization, better performance and, in the long term, moving admedia search and program search onto the new backend.

Looks good, so when is it going into production? Well, actually it is live already, initially accessible via our REST API. The main stream of requests is still channelled to the old search, but if you append the query parameter solr=true to the resource URL, your results will come from the Solr cluster. The developer team is still measuring and tuning the performance, writing automated tests and fixing any "teething issues", but it is there for everyone who wants to test it out.

When we are sure that the new search meets our quality expectations, we will start switching more and more requests to the new backend. The good news for you, the developers? You don't have to add a single line of code, because the API interface remains the same. If you want to switch over to the new search before we do it for you, you can use the solr=true parameter. Additionally, you don't have to provide any program IDs unless you want to limit results to particular programs. Here's an example that limits results to a single program:

http://api.zanox.com/xml/2011-03-01/products?connectid=43EEF0445509C7205827&programs=7408&q=nike&solr=true
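If your application already builds its request URLs, opting in can be as small as the following sketch, which adds or replaces a single query parameter on an existing URL. The helper name `with_query_param` is made up for this example. Only the solr=true parameter itself comes from this post.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_param(url, name, value):
    """Return url with the query parameter name set to value.

    Existing parameters keep their order. If name is already present,
    it is replaced, so calling the helper twice does not duplicate it.
    """
    parts = urlsplit(url)
    params = [
        (key, val)
        for key, val in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    params.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    old_url = (
        "http://api.zanox.com/xml/2011-03-01/products"
        "?connectid=43EEF0445509C7205827&programs=7408&q=nike"
    )
    # Route this request to the new Solr-based search.
    print(with_query_param(old_url, "solr", "true"))
```

Keeping the switch in one place also makes it easy to remove again once all requests are served by the new backend.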

Having said that, we would be very happy to hear from early adopters trying out our search. What are your impressions? What works, what doesn't, and what can we improve? Head to the documentation section to read about the new parameters and their usage.

 
