Association rule mining is a data mining technique used to find relationships or patterns between items in large datasets. The goal is to discover if-then rules of the form:

If (Condition A) → Then (Condition B)

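Here:

A is the antecedent (left-hand side): the condition or set of items that is observed.
B is the consequent (right-hand side): the condition or set of items that tends to occur together with A.

Key metrics:

Support: the fraction of all records that contain both A and B.
Confidence: the fraction of records containing A that also contain B.
Lift: the confidence divided by the overall frequency of B. Values above 1 indicate that A and B occur together more often than expected if they were independent.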
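
To make the metrics concrete, here is a minimal sketch in base R that computes support, confidence, and lift for a single rule by hand. The data frame obs and its columns snow and cold are hypothetical toy values, not taken from the traffic dataset.

```r
# Hypothetical toy data: one row per hourly observation.
obs <- data.frame(
  snow = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  cold = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
)

# Rule under test: {snow} => {cold}
support_a  <- mean(obs$snow)             # share of rows where A holds
support_b  <- mean(obs$cold)             # share of rows where B holds
support_ab <- mean(obs$snow & obs$cold)  # share of rows where both hold

confidence <- support_ab / support_a     # how often B holds when A holds
lift       <- confidence / support_b     # confidence relative to B's base rate

round(c(support = support_ab, confidence = confidence, lift = lift), 3)
```

A lift above 1 in this toy example means that cold observations are more common among snowy hours than among all hours.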

How Association Rules Help Analyze Traffic Patterns

Association rule mining allows you to uncover hidden relationships in traffic patterns that may not be immediately obvious. For example, a rule can link a weather condition such as heavy snow to low temperatures or heavy cloud cover.

For this project I used the Metro Interstate Traffic Volume dataset, available at https://archive.ics.uci.edu/dataset/492/metro+interstate+traffic+volume, and the Apriori algorithm to mine association rules from it.

Loading the libraries and dataset

library(arules)
## Loading required package: Matrix
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
library(arulesViz)
library(openxlsx)

# Load Traffic Data from a CSV File
traffic_data <- read.csv("/Users/ashutoshverma/Downloads/Metro_Interstate_Traffic_Volume.csv", stringsAsFactors = TRUE)

Convert Relevant Columns to Factors

traffic_data$holiday <- as.factor(traffic_data$holiday)
traffic_data$weather_main <- as.factor(traffic_data$weather_main)
traffic_data$weather_description <- as.factor(traffic_data$weather_description)
traffic_data$traffic_volume <- as.factor(traffic_data$traffic_volume)
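
One point worth noting: as.factor() on a numeric column makes every distinct value its own level, so individual traffic counts are unlikely to reach any minimum support threshold. A possible alternative, sketched below under assumptions, is to bin the counts into a few ranges with base R's cut(). The sample values, the cut points, and the labels are all hypothetical and would need to be chosen from the real data.

```r
# Hypothetical hourly vehicle counts standing in for traffic_data$traffic_volume.
traffic_volume <- c(250, 900, 1800, 3200, 4700, 5600, 6800)

# Assumed cut points and labels; in practice derive them from the data
# (for example from quantiles or from domain knowledge).
breaks <- c(0, 1000, 3000, 5000, Inf)
labels <- c("low", "moderate", "high", "very_high")

# right = FALSE makes each bin closed on the left: [0, 1000), [1000, 3000), ...
volume_level <- cut(traffic_volume, breaks = breaks, labels = labels,
                    include.lowest = TRUE, right = FALSE)

table(volume_level)
```

A binned factor like volume_level has only a handful of levels, so each level can appear often enough to take part in rules.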

Convert Data to Transaction Format

traffic_data_trans <- as(traffic_data, "transactions")
## Warning: Column(s) 2, 3, 4, 5 not logical or factor. Applying default
## discretization (see '? discretizeDF').
## Warning in discretize(x = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, : The calculated breaks are: 0, 0, 0, 9831.3
##   Only unique breaks are used reducing the number of intervals. Look at ? discretize for details.
## Warning in discretize(x = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, : The calculated breaks are: 0, 0, 0, 0.51
##   Only unique breaks are used reducing the number of intervals. Look at ? discretize for details.
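
The breaks reported in these warnings (0, 0, 0, 9831.3) are consistent with equal-frequency binning of a column that is almost entirely zero. The sketch below reproduces the effect in base R with hypothetical rainfall values; it illustrates the idea and is not the internal code of discretize().

```r
# Hypothetical rainfall column: mostly zero, with a few wet hours.
rain_1h <- c(rep(0, 95), 0.3, 1.2, 2.5, 10.6, 55.6)

# Equal-frequency breaks for three intervals are the 0, 1/3, 2/3 and 1 quantiles.
breaks <- quantile(rain_1h, probs = c(0, 1/3, 2/3, 1))
print(unname(breaks))

# Duplicate breaks collapse, leaving a single interval that covers every row.
unique_breaks <- unique(unname(breaks))
print(unique_breaks)
```

With only one interval left, the resulting item is present in every transaction and carries no information, which matches the rain_1h and snow_1h items with support 1 in the rules.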

Apply Apriori Algorithm

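The Apriori algorithm is applied to discover association rules with a minimum support of 0.01 (a rule must hold in at least 1% of the transactions) and a minimum confidence of 0.5 (the consequent must appear in at least 50% of the transactions that contain the antecedent).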

The result is stored in the rules object.

rules <- apriori(
  traffic_data_trans,
  parameter = list(supp = 0.01, conf = 0.5, target = "rules")
)
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5    0.01      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 482 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[47348 item(s), 48204 transaction(s)] done [0.05s].
## sorting and recoding items ... [34 item(s)] done [0.00s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 4 5 6 7 done [0.00s].
## writing ... [5220 rule(s)] done [0.00s].
## creating S4 object  ... done [0.01s].
rules_df <- as(rules, "data.frame") #Extract the rules data (support, confidence, lift, etc.)
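
For intuition about what apriori() does, here is a simplified level-wise search in base R on hypothetical toy transactions. It relies on the same idea as Apriori (an itemset can only be frequent if its items were frequent at the previous level), but the candidate generation is deliberately naive and is not the arules implementation.

```r
# Hypothetical toy transactions: each element is a set of items seen together.
transactions <- list(
  c("snow", "cold", "low_volume"),
  c("snow", "cold"),
  c("clear", "warm", "high_volume"),
  c("clear", "cold", "high_volume"),
  c("snow", "cold", "low_volume")
)
min_support <- 0.4

# Fraction of transactions that contain every item of the itemset.
support_of <- function(itemset) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# Level 1: frequent single items.
items <- sort(unique(unlist(transactions)))
frequent <- Filter(function(s) support_of(s) >= min_support, as.list(items))
all_frequent <- frequent

# Level k: build candidates only from items that survived level k - 1.
while (length(frequent) > 1) {
  survivors <- sort(unique(unlist(frequent)))
  k <- length(frequent[[1]]) + 1
  if (length(survivors) < k) break
  candidates <- combn(survivors, k, simplify = FALSE)
  frequent <- Filter(function(s) support_of(s) >= min_support, candidates)
  all_frequent <- c(all_frequent, frequent)
}

# Print each frequent itemset with its support.
for (s in all_frequent) cat(paste(s, collapse = ", "), ":", support_of(s), "\n")
```

Rules are then formed by splitting each frequent itemset into an antecedent and a consequent and keeping the splits that meet the confidence threshold.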

Note: I exported all of the generated rules to a file, which was attached to the email. The first six rows of rules_df are shown here.

head(rules_df)
##                                                       rules    support
## 1                                      {} => {holiday=None} 0.99873454
## 2                              {} => {rain_1h=[0,9.83e+03]} 1.00000000
## 3                                  {} => {snow_1h=[0,0.51]} 1.00000000
## 4   {weather_description=heavy snow} => {weather_main=Snow} 0.01277902
## 5        {weather_description=heavy snow} => {temp=[0,275)} 0.01188698
## 6 {weather_description=heavy snow} => {clouds_all=[90,100]} 0.01143059
##   confidence   coverage      lift count
## 1  0.9987345 1.00000000  1.000000 48143
## 2  1.0000000 1.00000000  1.000000 48204
## 3  1.0000000 1.00000000  1.000000 48204
## 4  1.0000000 0.01277902 16.760779   616
## 5  0.9301948 0.01277902  2.796502   573
## 6  0.8944805 0.01277902  2.471627   551
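
The first three rules above have an empty antecedent and a lift of 1, so they only restate how frequent an item is. A minimal sketch of filtering such rows out of a rules data frame follows. The six rows are re-entered by hand from the output above, and the name rules_head is an assumption used for illustration. In arules itself, adding minlen = 2 to the apriori() parameter list is another way to exclude rules with an empty antecedent.

```r
# First six rules as printed above, re-entered by hand for illustration.
rules_head <- data.frame(
  rules = c(
    "{} => {holiday=None}",
    "{} => {rain_1h=[0,9.83e+03]}",
    "{} => {snow_1h=[0,0.51]}",
    "{weather_description=heavy snow} => {weather_main=Snow}",
    "{weather_description=heavy snow} => {temp=[0,275)}",
    "{weather_description=heavy snow} => {clouds_all=[90,100]}"
  ),
  support    = c(0.99873454, 1, 1, 0.01277902, 0.01188698, 0.01143059),
  confidence = c(0.9987345, 1, 1, 1, 0.9301948, 0.8944805),
  lift       = c(1, 1, 1, 16.760779, 2.796502, 2.471627)
)

# Keep rules with a non-empty antecedent and lift above 1, strongest first.
has_antecedent <- !startsWith(rules_head$rules, "{} =>")
interesting <- rules_head[has_antecedent & rules_head$lift > 1, ]
interesting <- interesting[order(-interesting$lift), ]

print(interesting[, c("rules", "confidence", "lift")])
```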

Visualize Rules

Scatter Plot of Support vs. Confidence

This plot visualizes the rules, showing the relationship between support and confidence. It helps identify rules with both high support and high confidence.

plot(rules, method = "scatterplot", measure = c("support", "confidence"), main = "Support vs Confidence")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

Network Graph of Rules

A graph-based visualization shows the relationships among items in the rules. Each node represents an item, and edges represent association rules.

# Plot only the 100 rules with the highest lift to keep the graph readable
top_rules <- head(sort(rules, by = "lift"), 100)
plot(top_rules, method = "graph")

Conclusion

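Association rule mining helps uncover actionable insights from traffic data, enabling better decision-making for urban planners and traffic management authorities. For example, for the rule {weather_description=heavy snow} => {weather_main=Snow} we can see:

Support = 0.0128: heavy snow together with weather_main=Snow appears in 616 of the 48,204 records.
Confidence = 1.0: every record described as heavy snow also has weather_main=Snow.
Lift = 16.76: weather_main=Snow is about 16.8 times more frequent among heavy-snow records than in the dataset overall.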
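
The metrics reported for this rule can be cross-checked with simple arithmetic. The sketch below uses only the values printed in the rules table and the transaction count from the Apriori log; the variable names are assumptions chosen for readability.

```r
# Values copied from the output above for the heavy snow => Snow rule.
n_transactions <- 48204
support_rule   <- 0.01277902
confidence     <- 1.0
lift           <- 16.760779

# Support times the number of transactions recovers the rule's count.
count_rule <- round(support_rule * n_transactions)

# Lift is confidence divided by the consequent's support, so the
# overall share of weather_main=Snow can be recovered from it.
baseline_snow <- confidence / lift
count_snow    <- round(baseline_snow * n_transactions)

cat("Rule count:", count_rule, "\n")
cat("Share of records with weather_main=Snow:", round(baseline_snow, 4), "\n")
cat("Implied number of Snow records:", count_snow, "\n")
```

The recovered rule count matches the count column in the table, and the implied baseline shows that snow of any kind is present in roughly 6% of the records.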

Similar insights can be drawn for other combinations of conditions from their support, confidence, lift, coverage, and count values. These insights can inform policy decisions and traffic prediction.