Joseph Simone
11/11/19
Co-Occurrence Grouping and Association Discovery attempts to find associations between entities based on transactions involving them.
“If A occurs then B is likely to occur as well”
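Association Discovery is commonly referred to as Market Basket Analysis (MBA).
MBA helps us understand which items are likely to be purchased together.
The Lift of the co-occurrence of A and B is the probability that we actually see the two together, compared to the probability that we would see them together if they were unrelated to (independent of) each other. A lift greater than 1 means A and B appear together more often than chance alone would predict.
The Leverage measures the difference of these two quantities rather than their ratio.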
\(Lift(A,B)=\frac{p(A,B)}{p(A)*p(B)}\)
\(Leverage(A,B)= p(A,B) - p(A)*p(B)\)
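The two definitions above can be sketched as small Python functions. This is a minimal illustration, not code from the original analysis; the function names `lift` and `leverage` and the sample probabilities are assumptions chosen for the example.

```python
def lift(p_ab: float, p_a: float, p_b: float) -> float:
    """Observed joint probability divided by the joint probability
    expected if A and B were independent."""
    if p_a <= 0 or p_b <= 0:
        raise ValueError("p_a and p_b must be positive")
    return p_ab / (p_a * p_b)


def leverage(p_ab: float, p_a: float, p_b: float) -> float:
    """Observed joint probability minus the joint probability
    expected if A and B were independent."""
    return p_ab - p_a * p_b


# Hypothetical independent items: each appears in half of all
# transactions, and both appear together in a quarter of them.
print(lift(0.25, 0.5, 0.5))      # 1.0
print(leverage(0.25, 0.5, 0.5))  # 0.0
```

For independent items the lift is 1 and the leverage is 0, which is the baseline the two measures are compared against.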
We operate a small convenience store where people buy groceries, liquor, lottery tickets, etc.
30% of all transactions involve beer.
40% of all transactions involve lottery tickets.
20% of all the transactions include both beer and lottery tickets.
\(p(beer) = 0.3\)
\(p(lotterytickets) = 0.4\)
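If beer and lottery tickets were unrelated, the probability of seeing both in one transaction would be the product of their individual probabilities:
## [1] 0.12
\(p(beer)*p(lotterytickets)=0.12\)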
As mentioned before, 20% of the transactions involve both.
Therefore, \(p(lotterytickets, beer) = 0.2\)
\(Lift(A,B)=\frac{p(A,B)}{p(A)*p(B)}\)
## [1] 1.666667
This means that buying lottery tickets and beer together is about \(1\frac{2}{3}\) times more likely than we would expect by chance.
\(Leverage(A,B)= p(A,B) - p(A)*p(B)\)
\(p(lotterytickets,beer)-p(lotterytickets)*p(beer)\)
## [1] 0.08
Whatever is driving the co-occurrence results in an 8 percentage point increase in the probability of buying both together, over what we would expect simply because both are popular items.
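The convenience store arithmetic can be reproduced with a few lines of Python. This is a sketch for checking the numbers, using the percentages stated above; the variable names are illustrative assumptions.

```python
# Probabilities taken from the convenience store example.
p_beer = 0.3
p_lottery = 0.4
p_both = 0.2

# Joint probability we would expect if the two purchases were independent.
expected_if_independent = p_beer * p_lottery

store_lift = p_both / expected_if_independent
store_leverage = p_both - expected_if_independent

print(f"expected if independent: {expected_if_independent:.2f}")  # 0.12
print(f"lift: {store_lift:.4f}")                                  # 1.6667
print(f"leverage: {store_leverage:.2f}")                          # 0.08
```

The values are rounded only when printed, since binary floating point does not store decimals such as 0.08 exactly.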
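We can also describe the association by its support, the probability that both items appear in the same transaction:
\(p(lotterytickets, beer)\)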
## [1] 0.2
\(Support =\) 20%
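The strength of the rule “customers who buy beer also buy lottery tickets” is the conditional probability of lottery tickets given beer:
\(p(lotterytickets|beer)=\frac{p(lotterytickets,beer)}{p(beer)}\)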
## [1] 0.6666667
\(Strength =\) 67%
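Support and strength can also be counted directly from transaction records. The sketch below assumes a hypothetical list of ten baskets, constructed only so that its proportions match the example (30% beer, 40% lottery tickets, 20% both); the item names other than beer and lottery tickets, and the helper `probability`, are invented for illustration.

```python
# Hypothetical baskets built to match the example's percentages.
transactions = [
    {"beer", "lottery tickets"},
    {"beer", "lottery tickets", "chips"},
    {"beer"},
    {"lottery tickets"},
    {"lottery tickets", "milk"},
    {"milk"},
    {"bread"},
    {"chips"},
    {"milk", "bread"},
    {"coffee"},
]


def probability(items, baskets):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


support = probability({"beer", "lottery tickets"}, transactions)

# Strength of "beer -> lottery tickets": p(lottery tickets | beer).
strength_beer_to_lottery = support / probability({"beer"}, transactions)

# Strength of the reverse rule: p(beer | lottery tickets).
strength_lottery_to_beer = support / probability({"lottery tickets"}, transactions)

print(f"support: {support:.0%}")                                    # 20%
print(f"strength beer -> lottery: {strength_beer_to_lottery:.0%}")  # 67%
print(f"strength lottery -> beer: {strength_lottery_to_beer:.0%}")  # 50%
```

Support is symmetric in the two items, while strength depends on the direction of the rule: with these numbers, beer buyers pick up lottery tickets 67% of the time, but lottery ticket buyers pick up beer only 50% of the time.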
“12. Other Data Science Tasks and Techniques.” Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking, by Foster Provost and Tom Fawcett, O’Reilly, 2013, pp. 292–295.