By: Ram on Jan 09, 2023
Context:
Return to origin (RTO) is a common term in e-commerce. A delivery partner marks a delivery as RTO when the order cannot be delivered because of a problem with the delivery address or because the buyer is not responding.
“For some e-commerce market players, this RTO’s can go upto 30% if not kept in check” - source
RTO % is the ratio of orders that could not be delivered to the total number of orders. It is a very important metric for e-commerce, both for marketplaces like Flipkart and Amazon and for D2C businesses.
RTO % increases costs and hits the bottom line of e-commerce companies. Some of the cost components are:
Products or orders returned by the customer after delivery are NOT counted as RTO in the scenario below.
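As a quick illustration of the ratio above, here is a minimal sketch in Python. The function name and the order counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def rto_percent(undelivered_orders: int, total_orders: int) -> float:
    """RTO % = orders that could not be delivered / total orders * 100."""
    if total_orders <= 0:
        raise ValueError("total_orders must be positive")
    return 100.0 * undelivered_orders / total_orders


# Hypothetical month: 1,000 orders shipped, 300 came back as RTO.
print(rto_percent(300, 1000))  # 30.0
```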
There are two strong indicators of RTO before an order is processed for delivery:
In this blog, we discuss the steps to develop an NLP-based model to predict or identify addresses that are incomplete or gibberish.
Overall Approach
Hands-on Text Classification Model for Categorising Addresses as Gibberish or Genuine
Read the prepared data
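A minimal sketch of this step with pandas is shown below. The column names `address` and `label`, the helper `read_addresses`, and the sample rows are assumptions for illustration; the actual data set used in this post is not shown.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the prepared file (hypothetical rows).
SAMPLE_CSV = """address,label
"12 MG Road, Indiranagar, Bengaluru 560038",Genuine
"asdf qwer zxcv",Gibberish
"near temple",Gibberish
"""


def read_addresses(source) -> pd.DataFrame:
    """Load labelled addresses and drop rows with a missing address or label."""
    df = pd.read_csv(source)
    df = df.dropna(subset=["address", "label"]).copy()
    df["address"] = df["address"].astype(str).str.strip()
    return df.reset_index(drop=True)


df = read_addresses(io.StringIO(SAMPLE_CSV))
print(df["label"].value_counts())
```

In practice `source` would be the path to the prepared CSV file rather than an in-memory string.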
The Universal Sentence Encoder encodes text into high-dimensional vectors. These vectors are used as input to the text classification model. The pre-trained Universal Sentence Encoder is publicly available on TensorFlow Hub, and it is used here to encode the addresses.
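The encoding step could look roughly like the sketch below. It assumes version 4 of the encoder on TensorFlow Hub (the post does not state which version was used), and the helper names `load_encoder` and `encode_addresses` are hypothetical.

```python
from typing import Callable, Iterable, List

# Public TF-Hub handle for the Universal Sentence Encoder. Version 4 is an
# assumption; the post does not say which version was used.
USE_URL = "https://tfhub.dev/google/universal-sentence-encoder/4"


def load_encoder(url: str = USE_URL):
    """Load the pre-trained encoder (downloads the model on first use)."""
    # Imported lazily because it needs tensorflow and tensorflow_hub installed.
    import tensorflow_hub as hub

    return hub.load(url)


def encode_addresses(addresses: Iterable[str], encoder: Callable):
    """Return one embedding per address (512 floats each for USE v4)."""
    cleaned: List[str] = [str(a).strip() for a in addresses]
    return encoder(cleaned)
```

Calling `encode_addresses(list_of_addresses, load_encoder())` returns one vector per address, which can then be fed to a classifier.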
Split the input data into train (used for developing the model) and test (used for validating the model)
A random 30% of the addresses is used as the test sample and the remaining 70% for training the model.
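One common way to make a random 70/30 split is scikit-learn's `train_test_split`, sketched below. The sample addresses, labels and the `random_state` value are hypothetical; the post does not say which tool or seed was used.

```python
from sklearn.model_selection import train_test_split

# Hypothetical labelled sample: 10 addresses with their classes.
addresses = [f"address {i}" for i in range(10)]
labels = ["Genuine", "Gibberish"] * 5

# test_size=0.3 holds out a random 30% for validation;
# random_state only makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    addresses, labels, test_size=0.3, random_state=42
)

print(len(X_train), len(X_test))  # 7 3
```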
Multiple deep learning layers from Keras are used to define the architecture of the text classification model. A Lambda layer is used to create a custom embedding based on the Universal Sentence Encoder model.
The input to the model is text and the output has 2 categories: whether the address is Genuine or Gibberish.
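The sketch below shows one possible classifier with a two-class output. Note the swap: instead of wrapping the encoder in a Lambda layer as described above, this version takes precomputed 512-dimensional sentence embeddings as input, which keeps the example short and independent of the encoder download. The layer sizes, dropout rate and the name `build_classifier` are assumptions, not the architecture actually used in this post.

```python
import tensorflow as tf


def build_classifier(embedding_dim: int = 512, num_classes: int = 2) -> tf.keras.Model:
    """Small dense network on top of sentence embeddings (assumed layout)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(embedding_dim,)),
        tf.keras.layers.Dense(256, activation="relu"),  # hidden size is an assumption
        tf.keras.layers.Dropout(0.3),                   # assumed regularisation
        # One probability per class: Genuine or Gibberish.
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",  # expects one-hot encoded labels
        metrics=["accuracy"],
    )
    return model
```

Calling `build_classifier().summary()` prints the layer stack and parameter counts.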
Fitting the model
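A minimal sketch of the fitting step follows. It assumes a compiled Keras model with a softmax output and categorical cross-entropy loss, inputs that are already numeric embeddings, and integer labels (0 = Genuine, 1 = Gibberish). The helper name `fit_classifier`, the epoch count, the batch size and the 10% validation split are all hypothetical choices.

```python
import numpy as np
import tensorflow as tf


def fit_classifier(model, x_train, y_train, epochs=10, batch_size=32):
    """Train a compiled Keras classifier on embeddings and integer labels."""
    num_classes = model.output_shape[-1]
    # One-hot encode the integer labels to match the softmax output.
    y_onehot = tf.keras.utils.to_categorical(y_train, num_classes=num_classes)
    return model.fit(
        np.asarray(x_train, dtype="float32"),
        y_onehot,
        epochs=epochs,
        batch_size=batch_size,
        validation_split=0.1,  # hold out 10% of the training data to watch for overfitting
        verbose=0,
    )
```

The returned history object exposes `history["loss"]` and `history["val_loss"]` per epoch, which helps when deciding how many epochs to train.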
The output is decoded and a confusion matrix is created.
The model correctly flags 91% of Incomplete/Gibberish orders as Incomplete/Gibberish.
We tried the model on a few randomly chosen addresses and the results look reasonable.
There are multiple other approaches or models that can be used to improve the model performance.
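Decoding and the confusion matrix can be sketched as below, using NumPy and scikit-learn. The probabilities and true labels are made-up values for illustration, not results from this post, and the class order (index 0 = Genuine, index 1 = Gibberish) is an assumption.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

CLASSES = ["Genuine", "Gibberish"]  # assumed order of the model's output columns


def decode(probabilities):
    """Map each row of class probabilities to the most likely class name."""
    return [CLASSES[i] for i in np.argmax(probabilities, axis=1)]


# Made-up model outputs for five test addresses.
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7], [0.55, 0.45]])
y_true = ["Genuine", "Gibberish", "Genuine", "Gibberish", "Gibberish"]
y_pred = decode(probs)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=CLASSES)
print(cm)

# Share of actual Gibberish addresses that were flagged as Gibberish.
gibberish_recall = cm[1, 1] / cm[1].sum()
print(f"Gibberish flagged correctly: {gibberish_recall:.0%}")
```

The last figure is the kind of number to read off the matrix when asking how many gibberish addresses the model catches.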
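A spot check of this kind could be sketched as follows. The helper `classify_addresses` is hypothetical; it assumes `encoder` is a callable that turns a list of strings into embeddings and `model` is any object with a Keras-style `predict` method that returns one probability per class in the order Genuine, Gibberish.

```python
import numpy as np

CLASSES = ["Genuine", "Gibberish"]  # assumed order of the model's output columns


def classify_addresses(addresses, encoder, model):
    """Return (address, predicted class, confidence) for each address."""
    texts = [str(a).strip() for a in addresses]
    embeddings = np.asarray(encoder(texts))
    probabilities = np.asarray(model.predict(embeddings))
    results = []
    for text, probs in zip(texts, probabilities):
        best = int(np.argmax(probs))
        results.append((text, CLASSES[best], float(probs[best])))
    return results
```

Printing the returned tuples for a handful of hand-picked addresses, some clearly valid and some clearly junk, gives a quick sanity check before relying on the aggregate metrics.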