Powering E-commerce (Product Based) WebSites With Solr Search

20 Jul


Apache Solr/Lucene is a popular open source search engine widely used to power search across websites. A sizeable number of such websites are e-commerce sites (as expected!).

 

When building the search infrastructure of an e-commerce site with Solr, one finds that Solr provides an excellent, tunable platform, and in no time the site is up and running with Solr-powered search. Then comes the interesting and challenging part: making that search ‘Highly Relevant’.

 

This article illustrates relevancy problems commonly found when using Solr out of the box. In a follow-up article, I will illustrate techniques that can be used to solve these problems.

 

Search relevancy, in simple terms, is a measure of how closely the search results match the user’s expectations. The goal of an e-commerce site is naturally user conversion, and user conversion is driven by the accuracy of the search results.

 

When using Solr for an e-commerce product site, there are some common relevancy issues that I have observed:

 

#1 When searching for a product, accessories for the product rank higher than the product itself.

 

As an illustration, consider this example:

 

Search Term:  “iPhone”

Search Results

 

#1 Black Liner for iPhone

 

#2 Callmate iPhone Car Charger

 

#3 iPhone and iPad Touch QuickSteps (Book)

 

#4 Apple iPhone 4S mobile

 

#5 …

 

 

 

 

As you can see, the actual ‘iPhone’ is in 4th place in the search results in this case. I have found cases where the actual product appears on the 5th page of the search results or later.
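
One way to put a number on this problem is to record the position of the product the user actually wanted. The sketch below is a minimal illustration in Python and is not part of Solr; the function names `rank_of` and `reciprocal_rank`, and the choice of reciprocal rank as the measure, are assumptions made for this example.

```python
def rank_of(expected, results):
    """Return the 1-based position of `expected` in `results`, or None if absent."""
    for position, title in enumerate(results, start=1):
        if title == expected:
            return position
    return None


def reciprocal_rank(expected, results):
    """Return 1/rank of the expected product, or 0.0 when it is not in the results."""
    rank = rank_of(expected, results)
    return 0.0 if rank is None else 1.0 / rank


# Result list copied from the "iPhone" example above.
iphone_results = [
    "Black Liner for iPhone",
    "Callmate iPhone Car Charger",
    "iPhone and iPad Touch QuickSteps (Book)",
    "Apple iPhone 4S mobile",
]

print(rank_of("Apple iPhone 4S mobile", iphone_results))          # 4
print(reciprocal_rank("Apple iPhone 4S mobile", iphone_results))  # 0.25
```

A score of 1.0 would mean the product the user wanted is the first result; the further down the list it sits, the closer the score gets to 0.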

 

 

#2 Using descriptive terms in the search query produces irrelevant search results.

 

As an illustration, consider this example:

 

 

Search Term:  “Loreal Hair Color”

Search Results

 

#1 Loreal Paris Color Protect Conditioner

 

#2 Loreal Paris Color Protect Shampoo

 

#3 Loreal Crème Gloss Chocolate Color

 

#4 …

 

#5 …

 

 

 

As you can see, the most relevant product is in 3rd place in the search results. Note that the title of the actual hair color product, ‘Loreal Crème Gloss Chocolate Color’, does not contain the descriptive term the user searched for, ‘Hair Color’.
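
To see why the title text alone cannot separate these products, it helps to count how many query terms each title contains. The sketch below is a deliberate simplification written for this example: the helpers `tokens` and `overlap` are hypothetical, and plain term counting is an assumption that stands in for, and is much cruder than, Solr’s real scoring.

```python
import re


def tokens(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def overlap(query, title):
    """Count the distinct query terms that also appear in the title."""
    return len(tokens(query) & tokens(title))


query = "Loreal Hair Color"

# Product titles copied from the example above.
titles = [
    "Loreal Paris Color Protect Conditioner",
    "Loreal Paris Color Protect Shampoo",
    "Loreal Crème Gloss Chocolate Color",
]

for title in titles:
    # Every title matches "loreal" and "color" but none matches "hair".
    print(overlap(query, title), title)
```

All three titles share exactly two terms with the query, so on title text alone the conditioner and the shampoo look just as relevant as the hair color.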

 

Consider an alternative example:

 

Search Term:  “Leather Wallet”

Search Results

 

#1 Keychain cum Wallet

 

#2 Khaki Neck Wallet

 

#3 Designer Ladies Wallet

 

#4 Roller Pen & Leather Wallet Set

 

#5 …

 

 

 

When the search query includes the material of the product (leather in this case), the user expects the search results to show only products made of that material. However, the actual results shown above include wallets whose titles do not mention leather at all.
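
The gap between what the user expects and what is returned can be pictured as the difference between matching any query term and matching every query term. The sketch below assumes the results came from an any-term (OR) match on the product title; that assumption, and the helper names `match_any` and `match_all`, are illustrative only and say nothing about how a particular Solr installation is configured.

```python
import re


def tokens(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def match_any(query, titles):
    """Keep titles that contain at least one query term."""
    wanted = tokens(query)
    return [title for title in titles if wanted & tokens(title)]


def match_all(query, titles):
    """Keep titles that contain every query term."""
    wanted = tokens(query)
    return [title for title in titles if wanted <= tokens(title)]


# Product titles copied from the "Leather Wallet" example above.
wallet_titles = [
    "Keychain cum Wallet",
    "Khaki Neck Wallet",
    "Designer Ladies Wallet",
    "Roller Pen & Leather Wallet Set",
]

print(match_any("Leather Wallet", wallet_titles))  # all four titles
print(match_all("Leather Wallet", wallet_titles))  # only the Roller Pen set
```

Under the any-term assumption, every title containing ‘Wallet’ qualifies, which is consistent with the mixed results shown above.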

 

 

#3 Different terms used to search for the same product produce different search results.

 

As an illustration, consider the following queries:

‘Cell Phone’ and ‘Mobile Phone’

‘ipad’, ‘i pad’ and ‘i-pad’

‘Video Camera’ and ‘Camcorder’

‘TV’ and ‘Television’

 

Some of these are acronyms, some are synonyms, and some are simply different terms that different users use to describe the same product.
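
A quick way to see why these variants behave differently is to compare the terms each query breaks into. The sketch below uses a simple lowercase word splitter as a stand-in for a search engine’s text analysis; the tokenizer and the helper `shared_terms` are assumptions made for this example, not Solr’s actual analysis chain.

```python
import re


def tokens(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def shared_terms(query_a, query_b):
    """Return the terms that two queries have in common."""
    return tokens(query_a) & tokens(query_b)


# Query pairs taken from the list above.
pairs = [
    ("Cell Phone", "Mobile Phone"),
    ("ipad", "i pad"),
    ("Video Camera", "Camcorder"),
    ("TV", "Television"),
]

for query_a, query_b in pairs:
    # An empty set means the two queries have no term in common.
    print(query_a, "|", query_b, "->", shared_terms(query_a, query_b))
```

Only the first pair shares a term (‘phone’); the other pairs have nothing in common at the term level, so without extra handling they are matched against different words in the product data.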

 

 

 

Stay tuned for the sequel to this article, where I will describe some neat techniques to solve the above relevancy issues…

 

copyright 2012 10jumps Llc.
