By: Ram on Sep 04, 2022
The objective of relevance ranking is to position relevant products at the top. When measuring relevance ranking, we need to consider the distribution of clicks: if more of the clicks fall on the top products, the ranking is likely to be more relevant to customers or users.
Multiple metrics are used to measure the effectiveness of relevance ranking, and one of them is Normalised Discounted Cumulative Gain (NDCG).
NDCG is a commonly used metric for evaluating relevance ranking in product recommendations, search ranking, and the ranking of products on category or brand listing pages in ecommerce. The metric is also used for information retrieval systems and web search.
Consider an ecommerce platform whose category page shows 10 products to customers, where we want to show the most relevant products at the top. We grade the relevance of each product on 3 levels: 3 = most relevant, 2 = relevant, and 1 = not relevant or low relevance.
| Products | Position | Ranking (relevance level) |
| --- | --- | --- |
| Arrow Orange Regular Fit Check Cotton Shirt - Flexi Collection-Arrow-Apparel | 1 | 3 |
| Levi's Peacoat Blue Polka Dot Shirt-Levi's-Apparel | 2 | 2 |
| Pepe Jeans Mustard Regular Fit Checks Shirt-Pepe Jeans-Apparel | 3 | 1 |
| Levi's Burnt Orange Printed Shirt-Levi's-Apparel | 4 | 3 |
| WES Casuals by Westside White Striped Relaxed-Fit Shirt-WES Casuals-Apparel | 5 | 2 |
| ETA by Westside White Hooded Resort-Fit Shirt-ETA-Apparel | 6 | 1 |
| Mufti Cream Cotton Slim Fit Printed Shirt-Mufti-Apparel | 7 | 2 |
| ETA by Westside Indigo Geometrical Printed Resort-Fit Shirt-ETA-Apparel | 8 | 1 |
| Mufti Blue Solid Slim Fit Shirt-Mufti-Apparel | 9 | 3 |
| Spykar Blue Cotton Slim Fit Shirt-Spykar-Apparel | 10 | 1 |
Now we want to combine these ranks into a single metric that measures how good the ranking is.
One simple way to capture relevance is to sum the ranks of the top N products; a higher value indicates that better products are shown. This is the Cumulative Gain (CG). For the first N products it is formulated as below, where R_i is the rank of the product in the i-th place:

$$CG_N = \sum_{i=1}^{N} R_i$$
For the example above:
CG10 = 3 + 2 + 1 + 3 + 2 + 1 + 2 + 1 + 3 + 1 = 19
Cumulative Gain does not consider the position at which each product is shown, so it does not reward showing the best products first. A list of merely relevant products, spread across the positions in any order, can still get a high score.
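The order-blindness of Cumulative Gain can be checked with a short sketch. The function name `cumulative_gain` and the variable names are illustrative assumptions, not part of any library; the relevance values are the ones from the example table.

```python
# Relevance levels of the 10 products in the example table, in display order.
relevance = [3, 2, 1, 3, 2, 1, 2, 1, 3, 1]


def cumulative_gain(scores, n):
    """Sum the relevance levels of the first n positions (hypothetical helper)."""
    return sum(scores[:n])


cg_10 = cumulative_gain(relevance, 10)

# Reordering the same products so the best ones come first does not change CG.
best_first = sorted(relevance, reverse=True)
cg_10_best_first = cumulative_gain(best_first, 10)

print(cg_10, cg_10_best_first)
```

Both values are the same because a plain sum ignores position, which is the gap the position weights are meant to close.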
A natural enhancement is to give higher importance to scenarios where the most relevant products are shown at the top. One way to do this is to assign a weight to each position, with the weight going down as the position number increases. The weight is the inverse of the base-2 logarithm of (1 + position).
The weight of position 1 is 1/ Log2 (1+position) = 1/ Log2 (1+1) = 1/1 = 1
The weight of position 2 is lower: 1/ Log2 (1+position) = 1/ Log2 (1+2) = 1/1.5849625 = 0.63093
The weight of a position goes down at an inverse-log rate as the position number increases.
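To make the decay concrete, the following sketch lists the weights for positions 1 to 10 using the 1/log2(1 + position) rule described above. The helper name `position_weight` is an assumption made for illustration.

```python
import math


def position_weight(position):
    """Weight of a 1-based position: 1 / log2(1 + position)."""
    return 1.0 / math.log2(1 + position)


# Weights for the 10 positions on the category page.
weights = [position_weight(p) for p in range(1, 11)]

for p, w in enumerate(weights, start=1):
    print(f"position {p:2d}: weight {w:.5f}")
```

The weights start at 1 for the first position and shrink steadily, with the largest drops happening in the first few positions.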
The logarithmic reduction factor produces a smooth reduction curve. DCG is calculated over the top N positions by accumulating the discounted gain of each position.
Discounted Cumulative Gain (DCG) for the above scenario with N products is

$$DCG_N = \sum_{i=1}^{N} \frac{R_i}{\log_2(i + 1)}$$
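As a rough sketch of this calculation on the example table, assuming the rank is used directly as the gain and each position is discounted by log2(1 + position), DCG can be computed as follows. The function name `dcg` is illustrative.

```python
import math

# Relevance levels of the 10 products in the example table, in display order.
relevance = [3, 2, 1, 3, 2, 1, 2, 1, 3, 1]


def dcg(scores, n):
    """Discounted Cumulative Gain over the first n positions (1-based)."""
    return sum(r / math.log2(i + 1) for i, r in enumerate(scores[:n], start=1))


dcg_10 = dcg(relevance, 10)

# Unlike CG, DCG rewards placing the most relevant products first.
dcg_10_best_first = dcg(sorted(relevance, reverse=True), 10)

print(round(dcg_10, 3), round(dcg_10_best_first, 3))
```

The same products sorted by relevance give a higher DCG than the order in the table, which is the behaviour that plain Cumulative Gain lacks.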
A different formulation is very common in the industry, among web search companies and in competitions on the Kaggle platform. The formulation is as shown below, and it is theoretically sound. It places a much stronger focus on showing the more relevant products at the top. It is also called Discounted Cumulative Gain.

$$DCG_N = \sum_{i=1}^{N} \frac{2^{R_i} - 1}{\log_2(i + 1)}$$
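The sketch below assumes the alternative formulation is the widely used exponential-gain variant, where the gain of a product is 2^R - 1 instead of R, and compares it with the linear version on the example table. The function names are hypothetical.

```python
import math

# Relevance levels of the 10 products in the example table, in display order.
relevance = [3, 2, 1, 3, 2, 1, 2, 1, 3, 1]


def dcg_linear(scores, n):
    """DCG with the rank itself as the gain."""
    return sum(r / math.log2(i + 1) for i, r in enumerate(scores[:n], start=1))


def dcg_exponential(scores, n):
    """DCG with gain 2**rank - 1 (assumed form of the alternative formulation)."""
    return sum(
        (2 ** r - 1) / math.log2(i + 1) for i, r in enumerate(scores[:n], start=1)
    )


linear_10 = dcg_linear(relevance, 10)
exponential_10 = dcg_exponential(relevance, 10)

print(round(linear_10, 3), round(exponential_10, 3))
```

With the exponential gain, a product at level 3 contributes 7 units before discounting while a product at level 1 contributes only 1, so misplacing a highly relevant product costs much more than in the linear version.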
One of the challenges with Discounted Cumulative Gain (DCG) is that it is not standardised across queries when the number of items returned is not the same. Hence, DCG values are not comparable across queries whose result sets differ in length. Also, DCG is not bounded to a fixed range, which makes it difficult to judge how good a given value is.
To convert the DCG value to a range between 0 and 1, where 1 is the value of a perfect ranking for a given output list (e.g., a list of product recommendations or a retrieved document list), an ideal ranking is defined. The output results are sorted in descending order of relevance and the Discounted Cumulative Gain of this sorted list is calculated. This is called the Ideal Discounted Cumulative Gain (IDCG).
The ratio of DCG to IDCG is referred to as Normalised Discounted Cumulative Gain (NDCG). IDCG provides a convenient normalisation factor.

$$NDCG_N = \frac{DCG_N}{IDCG_N}$$
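Putting the pieces together, a minimal sketch of NDCG for the example table could look like the following. It assumes the linear-gain DCG (rank divided by log2(1 + position)) and builds the ideal ranking by sorting the same list in descending order of relevance; the names `dcg` and `ndcg` are illustrative.

```python
import math

# Relevance levels of the 10 products in the example table, in display order.
relevance = [3, 2, 1, 3, 2, 1, 2, 1, 3, 1]


def dcg(scores, n):
    """Discounted Cumulative Gain over the first n positions (1-based)."""
    return sum(r / math.log2(i + 1) for i, r in enumerate(scores[:n], start=1))


def ndcg(scores, n):
    """DCG divided by the DCG of the ideal (relevance-sorted) ordering."""
    ideal = sorted(scores, reverse=True)
    idcg = dcg(ideal, n)
    # Guard against a list with no relevant items at all.
    return dcg(scores, n) / idcg if idcg > 0 else 0.0


ndcg_10 = ndcg(relevance, 10)
ndcg_ideal = ndcg(sorted(relevance, reverse=True), 10)

print(round(ndcg_10, 4), ndcg_ideal)
```

An already ideal ordering scores 1, and the ordering in the table scores somewhat below 1 because some level 3 products appear at positions 4 and 9 instead of at the top.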
There are a few limitations with NDCG Metric