What analytical elements should one consider when running empirical analysis with association rules?

When running empirical analysis with association rules, there are several analytical elements to consider:

Support and Confidence: Support is the proportion of transactions in the data set that contain a particular item set, calculated as the number of transactions containing the item set divided by the total number of transactions. Confidence measures the strength of a rule A → B and is defined as the number of transactions containing both item sets A and B divided by the number of transactions containing A. In other words, it estimates the conditional probability of finding item set B in a transaction given that item set A is present in the same transaction. Typically, the user sets a minimum support threshold and a minimum confidence threshold. The Apriori algorithm first generates all item sets whose support meets or exceeds the minimum support (the frequent item sets), and then derives from them the association rules whose confidence meets or exceeds the minimum confidence. The resulting rules can be further analysed and visualised to gain insights into the data set.
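As a minimal sketch of these two definitions, the snippet below computes support and confidence for the rule bread → butter. The transactions, item names, and the helper functions `support` and `confidence` are hypothetical and chosen only for illustration; they are not taken from the referenced paper or from any library.

```python
# Hypothetical toy transactions; item names are illustrative only.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"milk", "eggs"},
    {"bread", "milk"},
    {"eggs"},
    {"bread", "butter", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and B) / support(A).

    Assumes the antecedent occurs at least once, otherwise this divides by zero.
    """
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


supp_ab = support({"bread", "butter"}, transactions)
conf_ab = confidence({"bread"}, {"butter"}, transactions)
print(f"support = {supp_ab:.3f}, confidence = {conf_ab:.3f}")
```

In this toy data, 4 of the 8 transactions contain both items and 5 contain bread, so the rule has support 4/8 and confidence 4/5.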

Lift: Lift measures the strength of the association between the antecedent and the consequent of a rule. It is defined as the ratio of the observed frequency of co-occurrence of the two item sets to the frequency expected if they were independent of each other, that is, the support of A and B together divided by the product of the support of A and the support of B. A lift value greater than 1 indicates a positive association, meaning that the item sets occur together more frequently than would be expected by chance. A lift value of 1 indicates that they are independent, and a lift value less than 1 indicates a negative association, meaning that they occur together less frequently than would be expected by chance. Lift is used in conjunction with support and confidence to identify the most interesting rules in the data set. High lift values indicate strong associations that may be particularly useful for decision-making or targeted marketing campaigns, but they should be read alongside support, because lift can be very large for item sets that occur in only a handful of transactions.
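The following sketch computes lift for a hypothetical rule bread → butter in two equivalent ways: as observed over expected co-occurrence, and as the rule's confidence divided by the support of the consequent. The data and the helper `count` are assumptions made for illustration only.

```python
# Hypothetical toy transactions; item names are illustrative only.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"milk", "eggs"},
    {"bread", "milk"},
    {"eggs"},
    {"bread", "butter", "eggs"},
]
n = len(transactions)


def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


supp_a = count({"bread"}) / n             # support of the antecedent
supp_b = count({"butter"}) / n            # support of the consequent
supp_ab = count({"bread", "butter"}) / n  # support of both together

# Observed co-occurrence divided by the co-occurrence expected under independence.
lift_from_supports = supp_ab / (supp_a * supp_b)

# Equivalent form: confidence of the rule divided by the consequent's support.
lift_from_confidence = (supp_ab / supp_a) / supp_b

print(f"lift = {lift_from_supports:.2f}")
```

Here the lift is above 1, so in this toy data bread and butter appear together more often than independence would predict.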

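Lift Ratio: Lift ratio is the same quantity as lift, expressed as the confidence of a rule divided by its benchmark confidence. The benchmark confidence is the confidence the rule would have if the antecedent and consequent were independent, which is simply the support of the consequent.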

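Correlation: Correlation measures the degree of linear relationship between two variables. It is most often applied to continuous variables, but in association rule analysis each item is a binary variable (present or absent in a transaction), and the Pearson correlation of two such variables is the phi coefficient. It ranges from -1 to 1, with 0 indicating no association.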
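For items recorded as present or absent, the Pearson correlation of the two 0/1 indicators reduces to the phi coefficient of their 2x2 contingency table. The sketch below computes it for a hypothetical pair of items; the transactions and variable names are illustrative assumptions.

```python
import math

# Hypothetical toy transactions; item names are illustrative only.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"milk", "eggs"},
    {"bread", "milk"},
    {"eggs"},
    {"bread", "butter", "eggs"},
]

x, y = "bread", "butter"

# Cells of the 2x2 contingency table for the two items.
a = sum(1 for t in transactions if x in t and y in t)          # both present
b = sum(1 for t in transactions if x in t and y not in t)      # only x present
c = sum(1 for t in transactions if x not in t and y in t)      # only y present
d = sum(1 for t in transactions if x not in t and y not in t)  # neither present

# Phi coefficient: Pearson correlation of two binary indicators.
# Assumes every row and column total is non-zero.
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
print(f"phi = {phi:.3f}")
```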

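Conviction: Conviction measures how strongly the consequent depends on the antecedent, taking into account how often the antecedent occurs without the consequent. It is defined as (1 - support of B) divided by (1 - confidence of A → B). A value of 1 indicates independence, and larger values indicate that the rule is wrong less often than would be expected by chance.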
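Conviction is commonly defined as (1 - support of the consequent) divided by (1 - confidence of the rule). The sketch below applies that definition; the function name `conviction` and the input values are hypothetical and used only for illustration.

```python
import math


def conviction(consequent_support, rule_confidence):
    """Conviction of a rule A -> B: (1 - supp(B)) / (1 - conf(A -> B)).

    A rule that is never wrong (confidence 1) has infinite conviction.
    """
    if rule_confidence >= 1.0:
        return math.inf
    return (1.0 - consequent_support) / (1.0 - rule_confidence)


# Hypothetical rule: consequent support 0.5, rule confidence 0.8.
value = conviction(0.5, 0.8)
print(f"conviction = {value:.2f}")

# When confidence equals the consequent's support, the two sides are independent.
independent = conviction(0.5, 0.5)
```

A conviction of 2.5 means the rule would make wrong predictions 2.5 times as often if the antecedent and consequent were independent.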

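Statistical Significance: It is important to test the statistical significance of association rules to understand whether they reflect a real pattern or could have arisen by chance. Because items are categorical, this is typically done with a chi-squared test or Fisher's exact test on the 2x2 contingency table of antecedent and consequent, rather than with tests designed for continuous data such as t-tests.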
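As an illustration, the sketch below runs a Pearson chi-squared test of independence on a 2x2 table of antecedent versus consequent, using only the standard library. The counts are hypothetical, no continuity correction is applied, and the p-value formula is valid only for one degree of freedom, so this is a rough sketch rather than a substitute for a statistics package.

```python
import math

# Hypothetical counts from 1000 transactions (illustrative only).
a = 120  # antecedent and consequent both present
b = 80   # antecedent present, consequent absent
c = 180  # antecedent absent, consequent present
d = 620  # neither present
n = a + b + c + d

# Pearson chi-squared statistic for a 2x2 table (no continuity correction).
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# With 1 degree of freedom, chi2 is the square of a standard normal variable,
# so the upper-tail probability is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.2f}, p = {p_value:.3g}")
```

The chi-squared approximation is unreliable when expected cell counts are small (a common rule of thumb is below 5), in which case Fisher's exact test is usually preferred. When many rules are tested at once, some will appear significant by chance alone.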

By considering these analytical elements, one can ensure that the association rules generated are meaningful and useful for decision-making.

References

Khanam, Iffat & Gupta, Avadhesh & Kannapiran, Thirunavukkarasu. (2013). Empirical Analysis of Association Rules in Reference with WEKA Framework. International Journal of Computer Science and Systems.