Ramgopal Prajapat:

Learnings and Views

Steps for RFM Segmentation in Python

By: Ram on Jul 03, 2022

In the previous blog - RFM based Transactional Segmentations for Ecommerce - I explained RFM Segmentation and its applications in Retail/Ecommerce scenarios.

Customer segmentation enables marketing teams to reach out to customers in a personalized way. They can develop focused communications and creatives based on the customer profile of each segment, which helps improve response rates.

Customer segments can be created based on transactional behaviour (transactional segmentation), the customers' geo location (geo segmentation), demographic attributes (demographic segmentation), or customers' lifestyles/opinions/attitudes (psychographic segmentation). In this blog, we will focus on transactional behaviour segmentation.

Transactional segmentation is important for understanding customers' transactional behaviour, and RFM is one of the fundamental approaches to segmenting customers on this basis.

RFM stands for (R)ecency, (F)requency, and (M)onetary, and it captures customer purchase behaviour along these three dimensions.

The RFM methodology is a simple and effective way of developing a customer segmentation, and we will explain the steps below.

1. Data

The source data table has information at a transaction level: who bought (customer), what was purchased (product), when it was purchased (order date), and how much was paid (monetary value).

  • Customer_id : Customer ID
  • Order_ID : Order ID or Order Number
  • Product_ID : Product ID
  • Order_Date : Order Date
  • Paid_Amount : Amount paid for each product

From this transaction-level data, we have aggregated to the customer level - one customer, one row - and taken a random sample of 50k customers.

Now the dataset has the following attributes.

 

[Screenshot: customer-level dataset attributes]

2. Data Preparation and EDA

 

For the key analysis variables, we need to check the distributions and summary statistics. Some visualizations may also help in understanding these variables.
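A minimal sketch of such a check using pandas' `describe()`. The DataFrame and the customer-level column names (`order_count`, `paid_amount`, `last_order_date`) are hypothetical, since the exact aggregated column names are not shown above:

```python
import pandas as pd

# Hypothetical customer-level data in the shape described above
ecommerce_df = pd.DataFrame({
    "Customer_id": [1, 2, 3, 4],
    "order_count": [1, 3, 1, 10],
    "paid_amount": [250.0, 1200.0, 99.0, 56000.0],
    "last_order_date": pd.to_datetime(
        ["2022-06-20", "2022-05-02", "2021-12-15", "2022-06-30"]),
})

# Summary statistics (count, mean, quartiles, min/max) for the key variables
summary = ecommerce_df[["order_count", "paid_amount"]].describe()
print(summary)
```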

[Table: summary statistics of the key variables]

Recency

To calculate Recency, we compute the number of days between a cut-off date and the last purchase date for each customer.
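The calculation can be sketched as follows; the cut-off date and the column names here are assumptions for illustration:

```python
import pandas as pd

# Hypothetical customer-level data; last_order_date is the most recent purchase
ecommerce_df = pd.DataFrame({
    "Customer_id": [1, 2, 3],
    "last_order_date": pd.to_datetime(["2022-06-30", "2022-04-01", "2021-10-15"]),
})

# Cut-off date: typically just after the last transaction date in the data
cutoff_date = pd.Timestamp("2022-07-01")

# Recency = days between the cut-off date and each customer's last purchase
ecommerce_df["recency_days"] = (cutoff_date - ecommerce_df["last_order_date"]).dt.days
print(ecommerce_df["recency_days"].tolist())  # [1, 91, 259]
```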

 

 


Now we can check the distribution of days since last purchase by plotting a histogram.

[Histogram: days since last purchase]

Observation: A good number of customers have purchased in the recent period, which is a positive sign, as recency of purchase indicates customer engagement.


Frequency

There are a few customers who placed a very large number of orders. We can treat these as outliers and group them together: any customer with more than 10 orders is capped at 10. Now we can plot the order frequency.
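The capping step can be sketched with pandas' `clip` (the column name is an assumption):

```python
import pandas as pd

# Hypothetical order counts per customer
ecommerce_df = pd.DataFrame({"order_count": [1, 2, 15, 42, 3]})

# Cap the outliers: any customer with more than 10 orders is treated as 10
ecommerce_df["order_count_capped"] = ecommerce_df["order_count"].clip(upper=10)
print(ecommerce_df["order_count_capped"].tolist())  # [1, 2, 10, 10, 3]
```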

[Histogram: order frequency after capping]

Monetary Amount

Customer spend has a skewed distribution with a long positive tail, and a few customers have spent less than zero.

  • Outlier treatment - if spend is more than 100,000, cap it at 100,000
  • Remove customers with negative spend - these may be data anomalies or return cases
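Both treatments can be sketched as follows (the column name is an assumption):

```python
import pandas as pd

# Hypothetical customer spend values, including an outlier and a negative amount
ecommerce_df = pd.DataFrame({"paid_amount": [250.0, -40.0, 150000.0, 9800.0]})

# Cap spend above 100,000 at 100,000
ecommerce_df["paid_amount"] = ecommerce_df["paid_amount"].clip(upper=100_000)

# Drop customers with negative spend (possible data anomalies or returns)
ecommerce_df = ecommerce_df[ecommerce_df["paid_amount"] >= 0].reset_index(drop=True)
print(ecommerce_df["paid_amount"].tolist())  # [250.0, 100000.0, 9800.0]
```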

[Histogram: customer spend after outlier treatment]

 

3. Define Recency, Frequency and Monetary Scores

We have looked at the variable distributions, and now we need to create groups for each of the recency, frequency and monetary variables. We could consider the top 30%, middle 40% and bottom 30% as three groups for each variable. But this is a challenge for the frequency variable (order counts), as most customers have only 1 order. So I have created more logical segments for each variable.

The overall approach is: based on the Recency, Frequency and Monetary features, create groups for each of them by following the steps below.

  • Create Bins
  • Find Counts for each bin
  • Combine bins into a few logical groups
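The three steps above can be sketched for a recency variable. The bin boundaries and score mapping here are illustrative, not the exact cut-offs used in this analysis:

```python
import pandas as pd

# Hypothetical recency values (days since last purchase)
recency = pd.Series([5, 12, 40, 75, 120, 200, 310, 365, 20, 90])

# Step 1: create bins (fixed day ranges here; quantile bins via pd.qcut also work)
bins = [0, 30, 90, 180, 400]
labels = ["0-30", "31-90", "91-180", "181+"]
recency_bin = pd.cut(recency, bins=bins, labels=labels)

# Step 2: find counts for each bin
counts = recency_bin.value_counts().sort_index()
print(counts)

# Step 3: combine bins into a few logical groups, e.g. a 3-point recency score
score_map = {"0-30": 3, "31-90": 3, "91-180": 2, "181+": 1}
recency_score = recency_bin.astype(str).map(score_map)
```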

Recency

[Output: recency bins and counts]

We have a good count of customers in each of these groups. We can also look at the visual distribution.

[Bar chart: recency group distribution]

Frequency

We can check the key statistics for frequency - order counts per customer.

[Table: order count summary statistics]

At least 50% of customers have placed only 1 order, so we can use the following grouping of order counts.

[Output: order frequency groups]

Monetary

 

We can again create groups based on quantile values; here I have preferred simple and logical group boundaries.
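Quantile-based groups with roughly equal counts can be created with `pd.qcut`; a sketch with synthetic spend values (the data and variable names are illustrative):

```python
import pandas as pd
import numpy as np

# Hypothetical spend values for 1,000 customers
rng = np.random.default_rng(0)
paid_amount = pd.Series(rng.uniform(10, 5000, size=1000))

# Deciles: pd.qcut puts roughly equal counts in each of the 10 groups
monetary_bin = pd.qcut(paid_amount, q=10, labels=list(range(1, 11)))

counts = monetary_bin.value_counts()
print(counts)
```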

[Output: monetary groups]

You can see almost equal counts in each of the 10 groups.

[Bar chart: monetary group counts]

4. RFM Scores and Segmentation

 

Having many groups for each of the Recency, Frequency and Monetary variables can be useful if the customer base is huge and we want a sharper focus. Otherwise, we can create 3 groups for each, which still gives 3*3*3 = 27 segments to manage.


The R, F and M scores are combined to get the RFM score. There are still 27 segments, which may be too many to form a clear understanding of these customers. So we have combined similar segments into 7 key segments.
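Creating the combined `rfm` code used in the snippet below can be sketched as follows; the score column names are assumptions, since they are not shown above:

```python
import pandas as pd

# Hypothetical per-customer scores, each on a 1-3 scale
ecommerce_df = pd.DataFrame({
    "r_score": [3, 2, 1],
    "f_score": [3, 1, 2],
    "m_score": [3, 2, 1],
})

# Concatenate the three digits into a single RFM code, e.g. "333"
ecommerce_df["rfm"] = (
    ecommerce_df["r_score"].astype(str)
    + ecommerce_df["f_score"].astype(str)
    + ecommerce_df["m_score"].astype(str)
)
print(ecommerce_df["rfm"].tolist())  # ['333', '212', '121']
```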

 

import numpy as np

rfm_str = ecommerce_df['rfm'].astype(str)

ecommerce_df['rfm_group'] = np.where(rfm_str == '333', 'Super Stars',
                            # Recently purchased - high frequency or high monetary value
                            np.where(rfm_str.str.contains('332|331|323|313'), 'Approaching Stars',
                            # Recently purchased - low/medium frequency or monetary value
                            np.where(rfm_str.str.contains('321|322|311|312'), 'Aspiring Stars',
                            # Purchased 3 to 6 months back with high frequency and value
                            np.where(rfm_str.str.contains('233|223|232'), 'Low Engaged 1',
                            # Purchased 3 to 6 months back but not top frequency or value
                            np.where(rfm_str.str.contains('213|212|231|211|221|222'), 'Low Engaged 2',
                            # Lost 1 - no purchase in the last 6 months, but high frequency or value earlier
                            # Lost 2 - no purchase in the last 6 months, and low engagement earlier
                            np.where(rfm_str.str.contains('132|123|113|133|131'), 'Lost 1', 'Lost 2'))))))

ecommerce_df['rfm_group'].value_counts().sort_index()

 

We have labelled these customers based on their transactional behaviours - you can give these segments your own interesting names. Here is the distribution of these segments.

[Output: segment counts]

Improved visualizations of the segment distributions.

[Charts: segment distribution - table, bubble chart, and treemap]
