By: Ram on Nov 11, 2022
Market Basket Analysis is a common Machine Learning Application in Retail and Ecommerce. In the previous blog, we have explained the applications and role of Market Basket Analysis (MBA.
In this blog, our plan is to explain Apriori Algorithm steps with an example. The algorithm has two steps 1) Candidate Generation and 2) Pruning or Rejecting candidates. The candidates here are the list of valid combinations.
For a retail scenario, the focus is to find the combinations of products in a basket or the products those are bought together. And the candidates are nothing but product combinations.
Below is the list of 5 transactions or orders and we will use this as an example to show the steps of Apriori Algorithm.
As a process, first identify the candidates and the candidate can be 1 product at a time, 2 product combinations, 3 product combinations and so on.
List of products purchased across these transactions
Single Item Candidates |
Eggs |
Coke |
Beer |
Bread |
Milk |
Diapers |
Scan all the transactions and find the list of products purchased and then scan to find the frequency/count (Support) of these products.
Now, we apply a limit on minimum support (or count) required for any candidate and assume it to be 3. Only 4 products are meeting the support 3 requited, so the pruned list of candidates are only 4 and this list will be used for identifying next set of combinations.
Based on Apriori Principle, only Frequent itemsets are generated that satisfy support condition. Or all in the infrequent itemsets are ignored. Based on the 1 candidate only 4 products – Beer, Bread, Milk and Diapers satisfy minimum support of 3. So, 2 product combinations are generated for these 4 products only.
2 Item Candidates |
Beer, Bread |
Beer, Milk |
Beer, Diapers |
Bread, Milk |
Bread, Diapers |
Milk, Diapers |
Now, the transactions are scanned to find the support for these 4C2 (6) combinations.
Again, same rule is applied to prune the candidates based on the minimum support required for the candidates.
Only products in these 4 candidate combinations are used for candidate generation for 3 product combinations. And the process continues.
If we do not used the apriori principal, we will consider all the combinations are each step and the total candidates will be for the 6 products are
As a result of Apriori Principal based Pruning, we will have following candidates.
As a result, a net 68% reduction in the number of candidates itemset even in this simple example
https://www-users.cse.umn.edu/~kumar001/dmbook/ch5_association_analysis.pdf