Ramgopal Prajapat:

Learnings and Views

Relevance Ranking Metric-NDCG

By: Ram on Sep 04, 2022

The objective of relevance ranking is to position the most relevant products at the top. When relevance ranking is measured, we need to consider the distribution of clicks: if most clicks fall on the top products, the ranking is likely to be more relevant to the customers or users.

Several metrics are used to measure the effectiveness of relevance ranking, and one of them is Normalised Discounted Cumulative Gain (NDCG).

NDCG is a commonly used metric for evaluating relevance ranking in product recommendations, search ranking, and the ranking of products on category or brand listing pages in ecommerce. It is also used for information retrieval systems and web search.

Scenario

Consider an ecommerce platform whose category page shows 10 products to customers, and we want to show the most relevant products at the top. Each product is assigned one of 3 relevance levels: 3: Most Relevant, 2: Relevant, and 1: Not Relevant or Low Relevance.

 

| Product | Position | Relevance |
|---|---|---|
| Arrow Orange Regular Fit Check Cotton Shirt - Flexi Collection-Arrow-Apparel | 1 | 3 |
| Levi's Peacoat Blue Polka Dot Shirt-Levi's-Apparel | 2 | 2 |
| Pepe Jeans Mustard Regular Fit Checks Shirt-Pepe Jeans-Apparel | 3 | 1 |
| Levi's Burnt Orange Printed Shirt-Levi's-Apparel | 4 | 3 |
| WES Casuals by Westside White Striped Relaxed-Fit Shirt-WES Casuals-Apparel | 5 | 2 |
| ETA by Westside White Hooded Resort-Fit Shirt-ETA-Apparel | 6 | 1 |
| Mufti Cream Cotton Slim Fit Printed Shirt-Mufti-Apparel | 7 | 2 |
| ETA by Westside Indigo Geometrical Printed Resort-Fit Shirt-ETA-Apparel | 8 | 1 |
| Mufti Blue Solid Slim Fit Shirt-Mufti-Apparel | 9 | 3 |
| Spykar Blue Cotton Slim Fit Shirt-Spykar-Apparel | 10 | 1 |

 

Now, we want to combine these relevance levels into a single metric that measures the goodness of the ranking.

Cumulative Ranks or Cumulative Gains (CG)

One simple way to capture relevance is to sum the relevance levels of the top N products; a higher value indicates that better products are shown. For the first N products this can be formulated as below, where R_i is the relevance level of the product in the i-th position:

$$CG_N = \sum_{i=1}^{N} R_i$$

 

For the example above:

CG10 = 3+2+1+3+2+1+2+1+3+1 = 19
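The sum above can be reproduced with a few lines of Python. This is only a minimal sketch: the list `relevance` holds the relevance levels from the scenario table in display order, and the function name `cumulative_gain` is an illustrative choice, not part of any library.

```python
# Relevance levels of the 10 products in the order they are displayed
# (3 = most relevant, 2 = relevant, 1 = not relevant or low relevance).
relevance = [3, 2, 1, 3, 2, 1, 2, 1, 3, 1]


def cumulative_gain(relevance, n=None):
    """Sum the relevance levels of the first n items (all items if n is None)."""
    return sum(relevance[:n])


cg_10 = cumulative_gain(relevance, 10)
print(cg_10)  # 19
```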

Challenges with Cumulative Gain

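It does not consider the position of each product and does not reward showing the best products first. Any reordering of the same products gives the same score, so a list that leads with merely relevant products can score as high as one that leads with the most relevant products.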
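A small sketch makes this concrete. It assumes the relevance levels from the scenario and compares three orderings of the same 10 products; the variable names are illustrative only.

```python
def cumulative_gain(relevance):
    """Sum of relevance levels; the order of the items plays no role."""
    return sum(relevance)


shown = [3, 2, 1, 3, 2, 1, 2, 1, 3, 1]     # order from the scenario
best_first = sorted(shown, reverse=True)   # most relevant products on top
worst_first = sorted(shown)                # least relevant products on top

# All three orderings receive exactly the same Cumulative Gain.
print(cumulative_gain(shown), cumulative_gain(best_first), cumulative_gain(worst_first))  # 19 19 19
```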

 


Discounted Cumulative Gain (DCG)

A natural enhancement is to give higher importance to rankings in which the most relevant products are shown at the top. One way is to assign a weight to each position, with the weight decreasing as the position number increases. The weight used here is the inverse of the base-2 logarithm of (1 + position).

The weight of position 1 is 1/log2(1 + position) = 1/log2(1 + 1) = 1/1 = 1.

The weight of position 2 is lower: 1/log2(1 + position) = 1/log2(1 + 2) = 1/1.5849625 = 0.63093.

The position weights fall off at an inverse-logarithmic rate, as plotted below.

[Figure: line chart of the position weight 1/log2(1 + position) against position]

The logarithmic reduction factor produces a smooth reduction curve. DCG is calculated over the top N positions by accumulating the discounted gain at each position.
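The position weights can be tabulated with a short Python sketch. The helper name `position_weight` is an assumption made for illustration; positions are 1-based, as in the scenario.

```python
import math


def position_weight(position):
    """Discount weight 1 / log2(1 + position) for a 1-based position."""
    return 1.0 / math.log2(1 + position)


# Print the weight for each of the 10 positions on the category page.
for position in range(1, 11):
    print(position, round(position_weight(position), 5))
```

The first two printed weights match the hand calculation above (1 for position 1 and 0.63093 for position 2), and every later position receives a smaller weight than the one before it.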

Discounted Cumulative Gain (DCG) for the above scenario with N products is

$$DCG_N = \sum_{i=1}^{N} \frac{R_i}{\log_2(1 + i)}$$
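As a sketch of this calculation for the scenario, the snippet below multiplies each relevance level by its position weight 1/log2(1 + position) and adds up the results. The function name `dcg` and the variable names are illustrative assumptions.

```python
import math

# Relevance levels of the 10 products in display order (from the scenario).
relevance = [3, 2, 1, 3, 2, 1, 2, 1, 3, 1]


def dcg(relevance, n=None):
    """Discounted Cumulative Gain over the first n items, with 1-based positions."""
    items = relevance[:n]
    return sum(r / math.log2(1 + i) for i, r in enumerate(items, start=1))


dcg_10 = dcg(relevance, 10)
print(round(dcg_10, 3))  # approximately 9.358
```

Unlike Cumulative Gain, this score changes when the same products are shown in a different order.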

 

Industry DCG

A different formulation is very common in industry, across web search companies and in competitions on the Kaggle platform. The formulation is shown below, and it is theoretically sound. It places a much stronger emphasis on the more relevant products at the top. It is also called Discounted Cumulative Gain.

$$DCG_N = \sum_{i=1}^{N} \frac{2^{R_i} - 1}{\log_2(1 + i)}$$
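A minimal sketch of this variant is given below, assuming the widely used exponential gain 2^R - 1 in place of the raw relevance level R, with the same 1/log2(1 + position) discount. The function name `industry_dcg` is a hypothetical label chosen for this example.

```python
import math

# Relevance levels of the 10 products in display order (from the scenario).
relevance = [3, 2, 1, 3, 2, 1, 2, 1, 3, 1]


def industry_dcg(relevance, n=None):
    """DCG with exponential gain (2**r - 1) and a log2(1 + position) discount."""
    items = relevance[:n]
    return sum((2 ** r - 1) / math.log2(1 + i) for i, r in enumerate(items, start=1))


print(round(industry_dcg(relevance, 10), 3))  # approximately 17.636
```

With this gain, a level-3 product contributes 7 before discounting while a level-1 product contributes only 1, which is why highly relevant products near the top dominate the score.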

 

Normalised Discounted Cumulative Gain (NDCG)

One challenge with Discounted Cumulative Gain (DCG) is that it is not standardised across queries when the number of items returned differs, so DCG values are not comparable across queries whose result sets have different lengths. Also, DCG is not bounded to a fixed range, which makes it difficult to judge the degree of goodness.

Ideal Discounted Cumulative Gain (IDCG)

To convert the DCG value to a range between 0 and 1, where 1 is the value for a perfect ranking of a given output list (e.g., a list of product recommendations or retrieved documents), an ideal ranking is defined. The output list is sorted by relevance in descending order, and the Discounted Cumulative Gain of this sorted list is called the Ideal Discounted Cumulative Gain (IDCG).
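The ideal ordering can be sketched in Python by sorting the relevance levels of the shown products in descending order and reusing the DCG calculation with the 1/log2(1 + position) weights. The names `dcg` and `idcg` are illustrative assumptions.

```python
import math

# Relevance levels of the 10 products in display order (from the scenario).
relevance = [3, 2, 1, 3, 2, 1, 2, 1, 3, 1]


def dcg(relevance):
    """Discounted Cumulative Gain with 1-based positions."""
    return sum(r / math.log2(1 + i) for i, r in enumerate(relevance, start=1))


def idcg(relevance):
    """DCG of the same items re-ordered from most to least relevant."""
    return dcg(sorted(relevance, reverse=True))


print(sorted(relevance, reverse=True))  # [3, 3, 3, 2, 2, 2, 1, 1, 1, 1]
print(round(idcg(relevance), 3))        # approximately 9.979
```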

 


The ratio of DCG to IDCG is referred to as Normalised Discounted Cumulative Gain (NDCG); IDCG provides a convenient normalization factor:

$$NDCG_N = \frac{DCG_N}{IDCG_N}$$
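Putting the pieces together for the scenario, a minimal sketch of NDCG might look as follows. It assumes the linear-gain DCG with 1/log2(1 + position) weights, and returning 0.0 when IDCG is zero is a choice made for this example rather than something stated in the text.

```python
import math

# Relevance levels of the 10 products in display order (from the scenario).
relevance = [3, 2, 1, 3, 2, 1, 2, 1, 3, 1]


def dcg(relevance):
    """Discounted Cumulative Gain with 1-based positions."""
    return sum(r / math.log2(1 + i) for i, r in enumerate(relevance, start=1))


def ndcg(relevance):
    """DCG of the shown order divided by the DCG of the ideal (sorted) order."""
    ideal = dcg(sorted(relevance, reverse=True))
    if ideal == 0:
        return 0.0  # assumption: no gain is possible, so report 0
    return dcg(relevance) / ideal


print(round(ndcg(relevance), 3))  # approximately 0.938
```

A list that is already sorted from most to least relevant scores exactly 1, and any other ordering of the same items scores lower.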

 


 

NDCG has a few limitations:

  • A longer list padded with non-relevant products is not penalized.
  • It does not account for relevant products that are missing from the recommendation list or the shown product list.
  • When only a few products are shown (for example, only 2 recommended products), the measure may not be effective.
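The first two limitations can be illustrated with a small sketch. It assumes, for this example only, that a completely irrelevant product carries a relevance of 0 (the scenario above uses 1 as its lowest level), and that IDCG is computed from the shown list, as described earlier.

```python
import math


def dcg(relevance):
    """Discounted Cumulative Gain with 1-based positions."""
    return sum(r / math.log2(1 + i) for i, r in enumerate(relevance, start=1))


def ndcg(relevance):
    """DCG divided by the DCG of the same items sorted by relevance."""
    ideal = dcg(sorted(relevance, reverse=True))
    return dcg(relevance) / ideal if ideal > 0 else 0.0


# Limitation 1: padding the list with irrelevant (relevance 0) products
# leaves the score unchanged.
short_list = [3, 2]
padded_list = [3, 2, 0, 0, 0]
print(ndcg(short_list), ndcg(padded_list))  # 1.0 1.0

# Limitation 2: a level-3 product that exists in the catalogue but is never
# shown does not lower the score, because IDCG only uses the shown items.
shown_without_best = [2, 1]
print(ndcg(shown_without_best))  # 1.0
```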

 
