Ramgopal Prajapat:

Learnings and Views

CLTV Models for Acquisition

By: Ram on Jul 09, 2022

It is clear that not all customers contribute the same value to an Ecommerce company. We have seen that the top 30% of customers contribute over 120% of the profit, while the bottom 20% of customers only add cost to the organization. The top group's share exceeds 100% because loss-making customers offset part of the profit.

Based on historical data, we can identify the profiles of customers who add long-term value and develop acquisition strategies that focus on acquiring this type of customer.

The long-term value of a customer can be defined as customer lifetime value (CLTV). In this example, we take the accumulated profit over 2 years as the customer's lifetime value.

We predict the customer lifetime value when a customer makes their first purchase.

The following attributes or characteristics are available at the time of predicting customer lifetime value:

  • Order Information – Total paid amount, # of Products in an order
  • Product Category - Brand and Category of Products Purchased
  • Visits and Browsing Behaviours
  • Acquisition Channels - e.g., Paid Search, Paid Marketing, Social

For this version, not all of the above data may be available, so we proceed with the data that is available.

 

Overall Approach

  1. Define Target or Response Variable – Accumulated profit of each customer in the 2 years after their first purchase
  2. Prepare Feature Dataset – Order attributes, product brand and category details
  3. EDA and Feature Engineering
  4. Model Development
  5. Analysis and Insights

 

1. Data

We have 3 data files, each containing attributes of the customers and their first order.

  1. CLTV - Customer level Customer Lifetime Value and customers' first order (Target/Outcome Variable at a customer level)
  2. Orders - Order Attributes
  3. Products - Product Brand and Category Information

The diagram below shows how the tables relate to each other and how they are joined.

CLTV Data Model

Read Data

All 3 data files are CSV files, so we read them into Python and then explore the data.
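The post does not show the loading code. Here is a minimal sketch of that step. It assumes pandas is used, and the file names in the usage comment are hypothetical placeholders.

```python
import pandas as pd


def load_cltv_data(cltv_path, orders_path, products_path):
    """Read the CLTV, Orders and Products CSV files into DataFrames.

    The paths are placeholders: pass the actual locations of the three files.
    """
    cltv = pd.read_csv(cltv_path)
    orders = pd.read_csv(orders_path)
    products = pd.read_csv(products_path)
    return cltv, orders, products


# Usage (hypothetical file names):
# cltv, orders, products = load_cltv_data("cltv.csv", "orders.csv", "products.csv")
# print(cltv.shape, orders.shape, products.shape)
# print(orders.head())
```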

 


Target or Response Variable

The “cltv” file contains customer value at the customer level. To build a stable model, customers who made their first purchase across different months are used as the customer base; this information is useful only for preparing the data. For modelling, we only need three columns: the customer number (customer_number), the order number (order_number) for joining with the other data files, and the response variable (cltv).
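Keeping only the join keys and the target can be sketched as below. This assumes pandas and the column names quoted in the text. The one-row-per-customer check is an added safeguard, not a step shown in the post.

```python
import pandas as pd


def select_target(cltv: pd.DataFrame) -> pd.DataFrame:
    """Keep the join keys and the response variable from the cltv file."""
    target = cltv[["customer_number", "order_number", "cltv"]].copy()
    # The file is at customer level, so each customer should appear once.
    if not target["customer_number"].is_unique:
        raise ValueError("cltv file should have one row per customer")
    return target
```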

 


First Order Data

We can explore the data structure of the first order information. If a customer purchased multiple products in an order, there is one row per product for that order number.

  • Amount Paid - The amount paid for a product on the order. It is recorded per product, so it must be summed across products to get the order value.
  • Products Purchased on Promotion - If a product was purchased on promotion, the promo flag (promo_product_flag) is 1. To find the number of products bought on promotion in an order, we sum the flag by order number.
  • Payment mode (payment_mode) - How the payment was made (e.g., Credit Card, UPI, Debit Card, Net Banking). Even though an order can contain multiple products, the payment mode takes only one value per order.
  • Order channel - The channel used to place the order. It takes 2 values: App (Android or iOS) or Web (website). All products on an order have the same value.
  • Campaign Type - How the customer came to make the purchase. They may have seen an ad on Facebook or Google Search and then visited the App/Web, or they may have opened the App/Website directly and placed the order.
  • Tracking Code Flag - Indicates whether the customer's visit is attributed to a source tracking code.
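The list above claims that payment mode, order channel and campaign type take a single value per order. That claim is cheap to verify before relying on it. A minimal pandas sketch follows. The column names payment_mode and campaign_type come from the text, while order_channel is an assumed spelling of "Order channel".

```python
import pandas as pd

# Columns described as order-level (one value per order); order_channel is assumed.
ORDER_LEVEL_COLS = ["payment_mode", "order_channel", "campaign_type"]


def check_order_level_columns(orders: pd.DataFrame, cols=ORDER_LEVEL_COLS) -> list:
    """Return order numbers where an order-level column varies across product rows."""
    n_unique = orders.groupby("order_number")[cols].nunique()
    # Any column with more than one distinct value breaks the one-value-per-order assumption.
    return n_unique[(n_unique > 1).any(axis=1)].index.tolist()
```

An empty list means taking the first row of each order for these columns loses no information.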

 


 

Products

Products are purchased across orders, and each product is linked to certain attributes. The two key attributes considered in this example are category and brand. The category hierarchy has level 0 (lob) at the top, followed by level 1 (cat_l1), level 2 (cat_l2) and level 3 (cat_l3).
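The post does not name the key that links order rows to the Products file. The sketch below assumes a hypothetical `product_id` column in both files and attaches the brand and category levels to each order row with pandas.

```python
import pandas as pd

# Category hierarchy (lob -> cat_l1 -> cat_l2 -> cat_l3) plus brand, as described in the text.
PRODUCT_ATTRS = ["lob", "cat_l1", "cat_l2", "cat_l3", "brand"]


def attach_product_attributes(orders: pd.DataFrame, products: pd.DataFrame,
                              key: str = "product_id") -> pd.DataFrame:
    """Left-join brand and category levels onto each order row.

    `key` is a hypothetical product identifier shared by both files.
    validate="many_to_one" raises if the products file has duplicate keys.
    """
    return orders.merge(products[[key] + PRODUCT_ATTRS], on=key,
                        how="left", validate="many_to_one")
```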

 


 

2. EDA and Feature Engineering

For the order information, we aggregate the numeric attributes at the order number level. After aggregation, there is one row per order.

Aggregate for numeric features
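The aggregation can be sketched with pandas named aggregation. Only `promo_product_flag` and `order_number` are named in the post. `amount_paid` and `tracking_code_flag` are assumed column names for the paid amount and tracking code flag.

```python
import pandas as pd


def aggregate_numeric(orders: pd.DataFrame) -> pd.DataFrame:
    """Collapse product-level rows to one row per order (assumed column names)."""
    return (
        orders.groupby("order_number")
        .agg(
            paid_amount=("amount_paid", "sum"),             # order value
            promo_products=("promo_product_flag", "sum"),   # products bought on promotion
            n_products=("amount_paid", "count"),            # product rows in the order
            tracking_codes=("tracking_code_flag", "sum"),   # rows with a tracking code
        )
        .reset_index()
    )
```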


Selecting First Value for Categorical features

As explained above, payment_mode, order_channel and campaign_type take the same value across all rows of an order, so we can pick the first value for each order.
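Picking the first value per order is a one-line `groupby(...).first()` in pandas. The sketch below assumes the three column names given in the text. Note that `first()` skips missing values within a group.

```python
import pandas as pd

ORDER_LEVEL_COLS = ["payment_mode", "order_channel", "campaign_type"]


def first_categorical(orders: pd.DataFrame, cols=ORDER_LEVEL_COLS) -> pd.DataFrame:
    """Take the first (non-null) value of each order-level categorical column."""
    return orders.groupby("order_number")[cols].first().reset_index()
```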


Feature Engineering - Categorical Variables

We create dummy variables for the categorical variables in the data.
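One way to create the dummies is `pandas.get_dummies`, which adds one 0/1 column per category value (for example `order_channel_App`). This is a sketch under the same assumed column names, not necessarily the author's exact code. Tree models do not need a reference level dropped, so `drop_first` is left at its default.

```python
import pandas as pd

ORDER_LEVEL_COLS = ["payment_mode", "order_channel", "campaign_type"]


def encode_categoricals(order_cat: pd.DataFrame, cols=ORDER_LEVEL_COLS) -> pd.DataFrame:
    """Replace each categorical column with one integer 0/1 column per value."""
    return pd.get_dummies(order_cat, columns=cols, dtype=int)
```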


Feature Engineering – Numeric Variables

The distribution of the promo flag count has a long tail, so we bucket the values into 3 groups: no promo products, 1 promo product, and more than 1 product on promotion.
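A sketch of the 3-group bucketing with `pandas.cut` follows. It assumes the per-order promo count has already been computed by summing `promo_product_flag`, and the label strings are illustrative. With right-closed bins, 0 falls in (-inf, 0], 1 in (0, 1], and anything larger in (1, inf).

```python
import numpy as np
import pandas as pd


def bucket_zero_one_many(counts: pd.Series, labels=("0", "1", ">1")) -> pd.Series:
    """Bucket a non-negative count into 3 ordered groups: 0, 1 and more than 1."""
    return pd.cut(counts, bins=[-np.inf, 0, 1, np.inf], labels=list(labels))


# Usage (hypothetical column names):
# order_num["promo_bucket"] = bucket_zero_one_many(
#     order_num["promo_products"], labels=("No promo", "1 promo", "More than 1 promo"))
```

The same helper fits the 3-level tracking code feature, since it follows the identical 0 / 1 / more-than-1 pattern.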

The number of products purchased on the first order also has a long tail, so we bucket the values into 2 groups: Single Product or Multiple Products.
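The two-group split reduces to a binary flag. This sketch assumes a per-order product count such as the number of product rows in the order.

```python
import pandas as pd


def multi_product_flag(n_products: pd.Series) -> pd.Series:
    """1 if the first order contains more than one product (Multiple Products), else 0."""
    return (n_products > 1).astype(int)
```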


For the tracking code, we create 3 levels: no tracking code, 1 tracking code, or more than 1.


Paid Amount is a continuous variable, so we plot a histogram to see its distribution. A few orders have very high amounts (outliers), so we cap the paid amount at 20,000: any paid amount above 20,000 is set to 20,000.
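Capping (winsorizing at the top) maps directly to `Series.clip(upper=...)` in pandas. The sketch uses the 20,000 threshold from the text. The helper that reports how many orders the cap touches is an added sanity check.

```python
import pandas as pd

PAID_AMOUNT_CAP = 20_000  # threshold stated in the text


def cap_paid_amount(paid: pd.Series, cap: float = PAID_AMOUNT_CAP) -> pd.Series:
    """Set every paid amount above `cap` to `cap`; smaller values are unchanged."""
    return paid.clip(upper=cap)


def share_above_cap(paid: pd.Series, cap: float = PAID_AMOUNT_CAP) -> float:
    """Fraction of orders affected by the cap, to check the threshold is not too low."""
    return float((paid > cap).mean())
```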


Product Category and Brand Data

We create dummy variables for the category and brand of each product purchased in an order.


An order can contain multiple products of the same brand or category, so we aggregate the rows to get one row per order.
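Creating the brand and category dummies per product row and then summing them by order can be sketched as below. The post does not say which category level is used, so `cat_l1` here is an assumption. Summing gives a count of products per brand or category in the order. Applying `.clip(upper=1)` afterwards would turn the counts into presence flags instead.

```python
import pandas as pd


def product_dummies_per_order(order_lines: pd.DataFrame,
                              cols=("cat_l1", "brand")) -> pd.DataFrame:
    """One row per order with counts of products per category/brand value."""
    cols = list(cols)
    # One 0/1 column per category/brand value, still one row per product line.
    dummies = pd.get_dummies(order_lines[cols], prefix=cols, dtype=int)
    dummies["order_number"] = order_lines["order_number"].to_numpy()
    # Several products of the same brand/category can sit on one order: sum them.
    return dummies.groupby("order_number").sum().reset_index()
```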


We then merge the dummy features for categories and brands.


Combine (merge) with first order data
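Combining the order-level feature frames with the target can be sketched as a chain of left joins on `order_number`. This assumes every feature frame already has one row per order. `validate="one_to_one"` makes pandas raise if a duplicate slipped through an earlier step.

```python
from functools import reduce

import pandas as pd


def build_model_table(target: pd.DataFrame, *feature_frames: pd.DataFrame) -> pd.DataFrame:
    """Left-join each order-level feature frame onto the cltv target by order_number."""
    def join(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
        # Raises MergeError if either side has duplicate order numbers.
        return left.merge(right, on="order_number", how="left", validate="one_to_one")

    return reduce(join, feature_frames, target)
```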

Now, the data is ready for the modelling.

3. Model Data and Development

We create feature and label objects from the data frames above and split them into train and test samples to develop the model.
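Here is a sketch of the feature/label split using scikit-learn's `train_test_split`. The 30% test share and the fixed `random_state` are illustrative choices, not values from the post. The identifier columns are dropped from the features because they carry no predictive meaning.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_features_label(model_table: pd.DataFrame, label: str = "cltv",
                         id_cols=("customer_number", "order_number"),
                         test_size: float = 0.3, random_state: int = 42):
    """Return X_train, X_test, y_train, y_test (test share and seed are assumptions)."""
    X = model_table.drop(columns=[label, *id_cols])
    y = model_table[label]
    return train_test_split(X, y, test_size=test_size, random_state=random_state)
```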

Model: Decision Tree – Regression

Define the parameters of the Decision Tree, such as depth, minimum split size and minimum leaf size. Since the target variable (cltv) is continuous, the decision tree is a regression tree.

Min_samples_leaf: Minimum number of rows (cases) required in a leaf node. If it is set too high, the algorithm cannot partition the data further and the model performs poorly. If it is set too low, the decision tree may overfit the training sample.

Min_samples_split: Minimum number of rows a node must have to be considered for a split.

Depth: Maximum depth (number of levels) of the decision tree. If a high value is selected, the tree becomes very bushy.
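The three parameters correspond to `max_depth`, `min_samples_split` and `min_samples_leaf` of scikit-learn's `DecisionTreeRegressor`. The sketch below assumes that library. The default values are placeholders to tune, not the author's settings.

```python
from sklearn.tree import DecisionTreeRegressor


def fit_cltv_tree(X_train, y_train, max_depth: int = 4, min_samples_split: int = 200,
                  min_samples_leaf: int = 100, random_state: int = 42) -> DecisionTreeRegressor:
    """Fit a regression tree on the CLTV target (parameter values are placeholders)."""
    tree = DecisionTreeRegressor(
        max_depth=max_depth,                  # Depth: maximum number of levels
        min_samples_split=min_samples_split,  # rows a node needs before it can be split
        min_samples_leaf=min_samples_leaf,    # rows every leaf must keep
        random_state=random_state,
    )
    return tree.fit(X_train, y_train)
```

After fitting, `tree.score(X_test, y_test)` returns the R² on the test sample. Comparing it with the training R² is a quick way to see whether the chosen depth and leaf sizes overfit.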


 

Output Decision Tree
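The tree diagram from the original post is not available here. As an alternative, the fitted tree can be printed as text rules with scikit-learn's `export_text`, assuming a scikit-learn model like the one described above. Each path ends in a `value`, which is the mean cltv of the training customers in that leaf.

```python
from sklearn.tree import DecisionTreeRegressor, export_text


def tree_rules(tree: DecisionTreeRegressor, feature_names) -> str:
    """Text rendering of the split rules and leaf values of a fitted regression tree."""
    return export_text(tree, feature_names=list(feature_names), decimals=0)


# Usage: print(tree_rules(tree, X_train.columns))
```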


Now, we can interpret the decision tree results and find the customer segments with high value. These segments can help define the characteristics to target when acquiring similar customers through social media campaigns. Additionally, it helps in estimating the CLTV that gives lever in the level of customer
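To find the high-value segments, treat each leaf of the fitted tree as a segment. Customers can be assigned to leaves with `apply` and ranked by mean CLTV. This is a sketch assuming a fitted scikit-learn tree. The top rows are the segments whose split rules describe the customers worth targeting.

```python
import numpy as np
import pandas as pd


def leaf_segments(tree, X, y) -> pd.DataFrame:
    """Customers per leaf (segment) and their mean cltv, highest-value segment first."""
    seg = pd.DataFrame({"leaf": tree.apply(X), "cltv": np.asarray(y)})
    return (
        seg.groupby("leaf")["cltv"]
        .agg(customers="count", mean_cltv="mean")
        .sort_values("mean_cltv", ascending=False)
        .reset_index()
    )
```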

For better accuracy in predicting CLTV, we explore other techniques and algorithms as
