Ramgopal Prajapat:

Learnings and Views

Data Science and ML Ranking and Recommendation Case Studies

By: Ram on Nov 16, 2022

Data Science and Machine Learning project to learn Ranking Algorithms - Learning to Rank

Project: FIFA Soccer Rankings

Original data Source: https://www.kaggle.com/datasets/tadhgfitzgerald/fifa-international-soccer-mens-ranking-1993now


"The world football governing body FIFA has been ranking international teams since 1992. This dataset contains all available FIFA men's international soccer rankings from August 1993 to April 2018. The rankings and points have been scraped from the official FIFA website."

For this project, only rankings between Aug-2011 and Apr-2018 are included in the dataset below.


Develop an ML-based ranking model that ranks each team/country based on its performance parameters.

Historical parameters and features are available for the ML model to learn from. The ranking is estimated on the rank date.
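Because the ranking is estimated on each rank date, model scores have to be converted to ranks separately for every date. Below is a minimal sketch of that step, assuming the model outputs one score per team per rank date with a higher score meaning a better team. The function name, the tuple layout, and the sample scores are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict

def rank_within_date(rows):
    """rows: iterable of (rank_date, team, score); higher score = better.
    Returns {(rank_date, team): rank}, with rank 1 for the highest score."""
    by_date = defaultdict(list)
    for rank_date, team, score in rows:
        by_date[rank_date].append((team, score))

    ranks = {}
    for rank_date, entries in by_date.items():
        # Sort by descending score; ties are broken by team name for determinism.
        ordered = sorted(entries, key=lambda e: (-e[1], e[0]))
        for position, (team, _score) in enumerate(ordered, start=1):
            ranks[(rank_date, team)] = position
    return ranks

# Hypothetical model scores, for illustration only.
predicted = [
    ("2018-04-12", "Germany", 0.92),
    ("2018-04-12", "Brazil", 0.95),
    ("2018-04-12", "Belgium", 0.81),
    ("2018-03-15", "Germany", 0.97),
    ("2018-03-15", "Brazil", 0.90),
]
ranks = rank_within_date(predicted)
```

The predicted ranks can then be compared with the published ranks date by date.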

Dataset:  https://ramgopalprajapat.com/static/files/2022/11/16/fifa_ranking.csv

Project: WTA Matches and Rankings

Original Data Source: https://www.kaggle.com/datasets/joaoevangelista/wta-matches-and-rankings


The WTA (Women's Tennis Association) is the principal organizing body of women's professional tennis and governs its own worldwide tour. Its website provides a lot of data about individual players as well as the tour matches, including results and the players' ranks at the time of each match.



Develop an ML-based ranking model that ranks players based on their performance. Currently, each player is assigned points based on their performance in matches.

Two files are provided: player rankings (rankings_v1.csv) and match history (player_matches_lost_won_6months.csv). Use the available data to predict the ranking of the players. The features currently available are matches won/lost by a player in the latest 6 months and average rank differences between players. The ranking is estimated on the rank date.
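One common way to frame this task is pairwise learning to rank: for each rank date, build pairs of players and label which of the two is ranked better. The sketch below shows only the pair construction. The dictionary keys, the two-feature layout (matches won and lost), and the sample values are assumptions made for illustration and do not reflect the actual column names in the files.

```python
from itertools import combinations

def make_pairs(players):
    """players: list of dicts with 'features' (list of floats) and 'rank'
    (1 = best), all taken from the same rank date.
    Returns (feature_difference, label) pairs; label is 1 when the first
    player of the pair is ranked better than the second, else 0."""
    pairs = []
    for a, b in combinations(players, 2):
        if a["rank"] == b["rank"]:
            continue  # skip ties: there is no preference to learn from
        diff = [x - y for x, y in zip(a["features"], b["features"])]
        label = 1 if a["rank"] < b["rank"] else 0
        pairs.append((diff, label))
    return pairs

# Hypothetical rows: features = [matches_won, matches_lost] in the latest 6 months.
players = [
    {"name": "A", "features": [20.0, 3.0], "rank": 1},
    {"name": "B", "features": [15.0, 6.0], "rank": 2},
    {"name": "C", "features": [9.0, 10.0], "rank": 3},
]
pairs = make_pairs(players)
```

The resulting pairs can be fed to any binary classifier. At prediction time, players are ordered by the score the classifier assigns to their features.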


Ranking Dataset


Match History Dataset:  https://ramgopalprajapat.com/static/files/2022/11/16/player_matches_lost_won_6months.zip


Project: Spotify Music Rankings

Source: https://www.kaggle.com/datasets/edumucelli/spotifys-worldwide-daily-song-ranking



Spotify is the biggest music streaming player, with 365M users and 165M subscribers. Its key competitors include Apple (Apple Music), Amazon (Amazon Music), and Google (YouTube Music), and it has maintained its leading position.

Spotify shows the top songs to its customers in each country. This dataset contains the daily ranking of the 200 most-listened songs. We have filtered it to a single country and consolidated the features that may help in ranking the songs.


Link: https://ramgopalprajapat.com/static/files/2022/11/17/spotify_ec_ranking.zip

It has data for 341 days, with the rank (Position) of each track on each day. The columns are described below.

Track Name        : Music Track Name

last3days             : Avg Stream Volume in the latest 3 days

last7days             : Avg Stream Volume in the latest 7 days

last15days           : Avg Stream Volume in the latest 15 days

last30days           : Avg Stream Volume in the latest 30 days

c_date                  : Ranking Date

Position                : Position rank

max_stream_artist         : Artist Max Stream Volume Historically in the country

avg_stream_artist           : Artist Average Stream Volume Historically in the country

days_stream_artist         : Historical number of days of streaming in the country
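The last3days to last30days columns are trailing averages of stream volume. The sketch below shows one way such a feature could be computed. It assumes the window covers the days before the ranking date and excludes the ranking date itself, which the dataset description does not state. The function name and the sample values are hypothetical.

```python
def trailing_mean(daily_streams, window):
    """Mean of up to `window` values before each day (current day excluded).
    Returns None for days that have no history yet."""
    out = []
    for i in range(len(daily_streams)):
        history = daily_streams[max(0, i - window):i]
        out.append(sum(history) / len(history) if history else None)
    return out

streams = [100, 200, 300, 400, 500]  # hypothetical daily stream counts
last3days = trailing_mean(streams, 3)
```

Excluding the current day keeps the feature free of information from the date being ranked.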



Develop an ML-based ranking model that ranks tracks based on their performance features. Currently, each track is assigned a rank for each date.
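A ranking model for this task is usually evaluated one date at a time with a list-wise metric such as NDCG. The sketch below is a plain-Python NDCG@k. It assumes relevance is derived linearly from the chart position (the top position gets the highest relevance), which is a modelling choice and not something the dataset prescribes. The function names and sample scores are hypothetical.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a relevance list in ranked order."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg_at_k(true_positions, predicted_scores, k):
    """true_positions: actual chart positions (1 = top) for one date.
    predicted_scores: model scores for the same tracks (higher = better)."""
    n = len(true_positions)
    # Assumption: relevance grows linearly as the chart position improves.
    relevance = [n - p + 1 for p in true_positions]
    order = sorted(range(n), key=lambda i: -predicted_scores[i])
    predicted = [relevance[i] for i in order][:k]
    ideal = sorted(relevance, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(predicted) / ideal_dcg if ideal_dcg > 0 else 0.0

positions = [1, 2, 3, 4]
perfect = ndcg_at_k(positions, [0.9, 0.7, 0.5, 0.1], k=3)  # same order as the chart
swapped = ndcg_at_k(positions, [0.7, 0.9, 0.5, 0.1], k=3)  # top two tracks swapped
```

Averaging this value over all dates gives a single score for comparing models.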


Project: Book Relevance Ranking

Original Data Source: https://www.kaggle.com/datasets/sp1thas/book-depository-dataset


Users can browse books on bookdepository.com. The site has a list of categories and thousands of books in each of these categories.

We will use book metadata such as title, description, dimensions, category, rating, reviews, cover image, and others to find the ranking of the books on the website. Since we do not have labelled data, we will use the Bestsellers ranking as the relevance ranking. A small dataset for these categories is used for the ML ranking model development.

The data sample uses the following categories: Anthologies (non-poetry), Biography: General, Children's Fiction, and Contemporary Fiction. The 1000 top-ranked books for these categories are included in the sample.


Develop an ML model that uses book metadata and user reviews to rank the books within each category.




isbn10              : ISBN Number of a book

publication-date          : Publication Date

rating-avg                   : Average Rating

rating-count                : Number of Ratings

title                              : Title of Book

categories                   : Category of a book and ranking is within this category

edition                         : Edition Type 

rank                             : Rank of the book
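Since the Bestsellers rank stands in for a relevance label, it often helps to convert it into a small number of graded relevance levels before training a ranking model. The sketch below shows one possible mapping, assuming equal-sized buckets and five grades. Both choices, and the function name, are illustrative and not part of the dataset.

```python
def graded_relevance(rank, n_items, n_grades=5):
    """Map a 1-based rank within a category to a grade in 0..n_grades-1,
    where the best-ranked bucket gets the highest grade."""
    if not 1 <= rank <= n_items:
        raise ValueError("rank must be between 1 and n_items")
    bucket = (rank - 1) * n_grades // n_items  # 0 for the top bucket
    return n_grades - 1 - bucket

# Example with 1000 ranked books in a category.
grades = [graded_relevance(r, n_items=1000) for r in (1, 200, 201, 1000)]
```

With 1000 books and five grades, ranks 1 to 200 receive the top grade and ranks 801 to 1000 receive grade 0.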
