Definition: Association Rule Mining is a Data Mining technique that finds patterns in data.
It is a rule-based, unsupervised machine learning task for discovering interesting relations between variables in large databases or sets of transactions.
Given a set of transactions, it finds rules that predict the occurrence of an item based on the occurrences of other items in the transaction.
The patterns found by Association Rule Mining represent relationships between items. When this is used with sales data, it is referred to as Market Basket Analysis.
Why Association Rules?
It helps businesses build sales strategies.
- Ultimately, the main objective of any business is to become profitable. This means attracting more customers and improving sales.
- By identifying products that sell better together, they can build better strategies. For instance, knowing that people who buy fries almost always buy Coke can be exploited to drive up sales.
- Attracting customers is a very important part of any business. Knowledge of what products sell together and which products don’t is key in building marketing strategies.
- This includes the planning of sales and advertisements as well as targeted marketing. For example, the knowledge that some ornaments do not sell as well as others during Christmas may help the manager offer a sale on the slower-selling ornaments.
It helps shelf-life planning.
- Knowledge of association rules can enable store managers to plan their inventory as well as ensure that they don’t lose out by overstocking low-selling perishables.
For instance, if olives don’t sell very often, the manager will not stock up on them, but will still want the existing stock to sell before its expiration date. With the knowledge that people who buy pizza dough tend to buy olives, the olives can be offered at a lower price in combination with the pizza dough.
It helps in-store organization.
- Products which are known to drive the sales of other products can be moved closer together in the store.
For instance, if the sale of butter is driven by the sale of bread, they can be moved to the same aisle in the store.
Concepts used in Association Rule Mining:
Itemset (I): A collection of one or more items in a transaction is known as an itemset.
e.g., {milk, bread, jam}
Support Count (Freq): The number of transactions in which an itemset occurs is known as its support count.
e.g. Freq({Milk, Bread}) = 3, Freq({Soda, Chips}) = 4
Support (Supp(I)): The fraction of transactions that contain an itemset is known as its support.
e.g. Supp({Milk, Bread}) = 3/8, Supp({Soda, Chips}) = 4/8
Frequent Itemset: An itemset whose support is greater than or equal to a minimum threshold is known as a frequent itemset.
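The three definitions above can be sketched in a few lines of Python. The eight transactions below are invented purely for illustration (the data set behind the quoted figures is not shown); they were chosen so that the counts come out to 3 and 4 out of 8. The function names and the 0.5 threshold are likewise assumptions, not part of any library.

```python
# Hypothetical transactions, invented for illustration only.
TRANSACTIONS = [
    {"Milk", "Bread", "Jam"},
    {"Milk", "Bread"},
    {"Soda", "Chips"},
    {"Milk", "Bread", "Soda", "Chips"},
    {"Soda", "Chips", "Jam"},
    {"Bread", "Jam"},
    {"Soda", "Chips", "Milk"},
    {"Jam"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)


def is_frequent(itemset, transactions, min_support):
    """True when the itemset's support meets the minimum threshold."""
    return support(itemset, transactions) >= min_support


print(support_count({"Milk", "Bread"}, TRANSACTIONS))     # 3
print(support({"Soda", "Chips"}, TRANSACTIONS))           # 0.5
print(is_frequent({"Milk", "Bread"}, TRANSACTIONS, 0.5))  # False
```

With an assumed minimum support of 0.5, {Soda, Chips} counts as frequent while {Milk, Bread} does not.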
Apriori Algorithm: the most basic Association Rule Mining algorithm
- The Apriori algorithm is considered one of the most basic Association Rule Mining algorithms.
- It works on the principle that “Having prior knowledge of frequent itemsets can generate strong association rules.”
- Apriori finds the frequent itemsets by a process called candidate itemset generation.
- This is an iterative approach, where k-itemsets are used to explore (k+1)-itemsets.
- First, the set of frequent 1-itemsets is found, then, frequent 2-itemsets, and so on, until no more frequent k-itemsets can be found.
- A Candidate k-itemset is an itemset with k items in it. Example: Candidate 2-itemset can be [bread, butter].
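The level-wise search described in the list above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `frequent_itemsets`, the sample baskets, and the 3/8 threshold are all assumptions made for the example. Candidates are formed by joining pairs of frequent k-itemsets, and every candidate is simply counted against the data; no further pruning is attempted here.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search: frequent k-itemsets seed the (k+1)-candidates.

    Returns a dict mapping each frequent itemset (frozenset) to its support.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def keep_frequent(candidates):
        # Count each candidate and keep those meeting the threshold.
        kept = {}
        for c in candidates:
            supp = sum(1 for b in baskets if c <= b) / n
            if supp >= min_support:
                kept[c] = supp
        return kept

    # Level 1: every distinct item is a candidate 1-itemset.
    current = keep_frequent({frozenset([i]) for b in baskets for i in b})
    found = dict(current)
    k = 1
    while current:
        # Join step: unions of two frequent k-itemsets that have k+1 items.
        candidates = {
            a | b for a, b in combinations(current, 2) if len(a | b) == k + 1
        }
        current = keep_frequent(candidates)
        found.update(current)
        k += 1
    return found


# Hypothetical baskets, invented for illustration only.
BASKETS = [
    {"Milk", "Bread", "Jam"},
    {"Milk", "Bread"},
    {"Soda", "Chips"},
    {"Milk", "Bread", "Soda", "Chips"},
    {"Soda", "Chips", "Jam"},
    {"Bread", "Jam"},
    {"Soda", "Chips", "Milk"},
    {"Jam"},
]

result = frequent_itemsets(BASKETS, min_support=3 / 8)
for itemset, supp in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
```

With these made-up baskets and a threshold of 3/8, the search finds the five single items plus {Milk, Bread} and {Soda, Chips}, then stops at level 3 because no candidate 3-itemset can be formed.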
To improve the efficiency of the level-wise generation of frequent itemsets, an important property called the Apriori property is used to reduce the search space.
- The Apriori Property states that “All non-empty subsets of a frequent itemset must also be frequent.”
- This means that if an itemset is frequent, then all of its subsets are frequent too. Equivalently, if an itemset is not frequent, none of its supersets can be frequent, so they need not be counted.
For instance, if [Bread, Butter] is a frequent itemset, it means that [Bread] and [Butter] must individually be frequent too.
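A short sketch of how the Apriori property can reduce the search space: before counting a candidate (k+1)-itemset against the data, check that each of its k-item subsets is already known to be frequent, and discard the candidate otherwise. The helper names and the sample itemsets below are hypothetical, chosen only to illustrate the check.

```python
from itertools import combinations


def has_infrequent_subset(candidate, frequent_k):
    """True if some k-item subset of a (k+1)-item candidate is not frequent."""
    k = len(candidate) - 1
    return any(
        frozenset(subset) not in frequent_k
        for subset in combinations(candidate, k)
    )


def prune(candidates, frequent_k):
    """Keep only candidates whose k-item subsets are all frequent."""
    return {c for c in candidates if not has_infrequent_subset(c, frequent_k)}


# Hypothetical frequent 2-itemsets, invented for illustration only.
frequent_2 = {
    frozenset({"Bread", "Butter"}),
    frozenset({"Bread", "Milk"}),
    frozenset({"Butter", "Milk"}),
    frozenset({"Bread", "Jam"}),
}

candidates_3 = {
    frozenset({"Bread", "Butter", "Milk"}),  # all three 2-subsets are frequent
    frozenset({"Bread", "Butter", "Jam"}),   # {Butter, Jam} is not frequent
}

survivors = prune(candidates_3, frequent_2)
print([sorted(c) for c in survivors])  # [['Bread', 'Butter', 'Milk']]
```

Only the surviving candidates need to be counted against the transactions, which is where the savings come from on large data sets.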