Ramgopal Prajapat:

Learnings and Views

Blog Recommendation Engine

Jul 18, 2020

Project Summary/Context

Recommendation is one of the common Data Science & Machine Learning topic. And used extensively by Amazon for Product Recommendations, Netflix for Movie Recommendation, LinkedIn for Job or Company Recommendations and many more.

 

Typically, the algorithm recommendations are based on User Similarity or Content Similarity.

For example, if user A watched the movie M and user B has similar movie liking (based on some measures or means) and did not watch this movie, then recommend this movie to user B.  This approach is called Collaborative Filtering.

In another approach, if we know user A read a blog on “Data Science & Python”, it makes sense to recommend a blog that is related to Data Science and Python. If when a recommendation is driven based on contents, the approach is called Content-Based Filtering.

In this blog, we want to discuss different approaches to “recommend relevant next set of blogs” when someone is reading a blog on a website.

We can use the “Collaborative Filtering” approach if the blog website captures, the user clicks, and profile data.

In this blog, we focus on the “Content-Based Filtering” approach for finding the recommended set of blogs or posts for the readers. One of the limitations of this approach is that it will recommend the same set of blogs when someone is reading the same blog.

High Level Steps

A blog or post has a few important attributes – title, tags, category, body text, and author.  Some or all of this information can be used for finding similar blogs and recommending the most similar blogs.

In this project, we are going to use “full blog body content” for finding the most similar blogs.

  1. From each of the blogs, we found the “Bag of words” – text processing steps were similar to those used in “Word Cloud” - http://ramgopalprajapat.com/posts/wonderful-word-cloud-python/.  For a simple version, we can be used tags for the post to find similar blogs.
  2. Stored the all the list of distinct words in a table
  3. Mapped word ids and post id in a separate table
  4. Written a blog recommendation function which does the following steps
    1. Find word ids for the current post
    2. For each of the blogs (from the same category), find the word ids
    3. Based on the current post words, and for a blog in step, find Cosine Similarity metric

    1. Store blog id and a corresponding similarity value
  1. Sort the post based on similarity measure value and recommend n value.  In this website, we are showing only 3 posts

Business Outcome

Improved user engagement on the website.