Discovering Traffic Patterns Using Association Rule Mining and Visualization

Association rule mining is a data mining technique for finding relationships or patterns between items in large datasets. The goal is to discover if-then rules of the form:

If (Condition A) → Then (Condition B)

Here:

Condition A (Antecedent): Items or events that occur together (e.g., holiday = Yes and weather_main = Rain).
Condition B (Consequent): The result or outcome associated with Condition A (e.g., traffic_volume = High).

Key metrics:

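Support: The fraction of all observations in which both Condition A and Condition B occur.
Confidence: The fraction of observations matching Condition A that also match Condition B.
Lift: Confidence divided by the overall frequency of Condition B. Values above 1 mean that A and B occur together more often than would be expected if they were independent.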
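
As a minimal sketch of how these three metrics are computed, the following base R snippet evaluates the rule {rain} => {high_volume} on a small made-up table. The data frame obs and its column names are hypothetical and are not part of the traffic dataset.

```r
# Hypothetical toy observations; the values are illustrative only.
obs <- data.frame(
  rain        = c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE),
  high_volume = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
)

n <- nrow(obs)
support_a  <- sum(obs$rain) / n                    # share of rows matching the antecedent
support_b  <- sum(obs$high_volume) / n             # share of rows matching the consequent
support_ab <- sum(obs$rain & obs$high_volume) / n  # support of the rule
confidence <- support_ab / support_a               # P(consequent | antecedent)
lift       <- confidence / support_b               # confidence relative to the baseline

c(support = support_ab, confidence = confidence, lift = lift)
```

In this toy table the rule has a lift above 1, so high volume is more frequent on rainy rows than overall.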

How Association Rules Help Analyze Traffic Patterns

Association rule mining allows you to uncover hidden relationships in traffic patterns that may not be immediately obvious. For example:

Traffic Impact Factors: Determine how variables such as holidays, weather, and specific times of day affect traffic volume.
Pattern Prediction: Identify situations where traffic is likely to be high or low based on historical data.
Optimization: Enable better resource allocation, such as adjusting traffic signals, deploying road personnel, or planning infrastructure improvements.

For this project, I used the Metro Interstate Traffic Volume dataset, available at https://archive.ics.uci.edu/dataset/492/metro+interstate+traffic+volume, and applied the Apriori algorithm to mine association rules from it.

Loading the libraries and dataset

library(arules)

## Loading required package: Matrix

## 
## Attaching package: 'arules'

## The following objects are masked from 'package:base':
## 
##     abbreviate, write

library(arulesViz)
library(openxlsx)

# Load Traffic Data from a CSV File
traffic_data <- read.csv("/Users/ashutoshverma/Downloads/Metro_Interstate_Traffic_Volume.csv", stringsAsFactors = TRUE)

Convert Relevant Columns to Factors

traffic_data$holiday <- as.factor(traffic_data$holiday)
traffic_data$weather_main <- as.factor(traffic_data$weather_main)
traffic_data$weather_description <- as.factor(traffic_data$weather_description)
traffic_data$traffic_volume <- as.factor(traffic_data$traffic_volume)
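
Note that as.factor() on a numeric column creates one level per distinct value, so each individual traffic count becomes its own item. One possible alternative, sketched below, is to bin the counts into a few ordered levels with cut() before the conversion, so that items such as traffic_volume=High can reach the minimum support. The vector volume and the thresholds 2000 and 5000 are hypothetical choices for illustration, not values taken from this analysis.

```r
# Hypothetical hourly volumes standing in for traffic_data$traffic_volume.
volume <- c(120, 450, 980, 2100, 3300, 4700, 5200, 6100, 6900, 7200)

# Illustrative thresholds only; suitable cut points depend on the data.
volume_level <- cut(
  volume,
  breaks = c(0, 2000, 5000, Inf),
  labels = c("Low", "Medium", "High"),
  include.lowest = TRUE
)

table(volume_level)
```

With three levels, each level covers a sizeable share of the rows instead of a single observation.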

Convert Data to Transaction Format

traffic_data_trans <- as(traffic_data, "transactions")

## Warning: Column(s) 2, 3, 4, 5 not logical or factor. Applying default
## discretization (see '? discretizeDF').

## Warning in discretize(x = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, : The calculated breaks are: 0, 0, 0, 9831.3
##   Only unique breaks are used reducing the number of intervals. Look at ? discretize for details.

## Warning in discretize(x = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, : The calculated breaks are: 0, 0, 0, 0.51
##   Only unique breaks are used reducing the number of intervals. Look at ? discretize for details.
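
The warnings above appear consistent with equal-frequency binning into three intervals on columns that are almost entirely zero. The base R sketch below reproduces the effect on a made-up vector: when most values are 0, the lower quantiles coincide and only one interval is left. The vector rain is hypothetical; only its maximum, 9831.3, is taken from the warning text.

```r
# Hypothetical zero-heavy vector standing in for rain_1h.
rain <- c(rep(0, 95), 0.25, 0.5, 1.2, 3.0, 9831.3)

# Candidate break points for three equal-frequency bins.
breaks <- quantile(rain, probs = c(0, 1/3, 2/3, 1))

unname(breaks)          # the first three break points are all 0
unique(unname(breaks))  # two distinct break points, hence a single interval
```

A column reduced to a single interval yields an item that is present in every transaction, which is why rules such as {} => {rain_1h=[0,9.83e+03]} show a support of 1 further down.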

Apply Apriori Algorithm

The Apriori algorithm is applied to discover association rules with:

Support (supp): Minimum fraction of transactions containing an itemset (set at 0.01 or 1%).
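Confidence (conf): Minimum confidence a rule must reach, i.e. the minimum share of transactions matching the antecedent that also match the consequent (set at 0.5 or 50%).
Target (target): Set to “rules” so that the algorithm returns association rules rather than frequent itemsets.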

The result is stored in the rules object.

rules <- apriori(
  traffic_data_trans,
  parameter = list(supp = 0.01, conf = 0.5, target = "rules")
)

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5    0.01      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 482 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[47348 item(s), 48204 transaction(s)] done [0.05s].
## sorting and recoding items ... [34 item(s)] done [0.00s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 4 5 6 7 done [0.00s].
## writing ... [5220 rule(s)] done [0.00s].
## creating S4 object  ... done [0.01s].
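
The log reports 47348 items in the transactions but only 34 after the "sorting and recoding items" step. A plausible reading is that items whose support falls below the minimum (482 transactions here) are discarded before rules are built. The sketch below imitates that filter in base R on a hypothetical toy data frame; the column names and values are assumptions made for illustration.

```r
# Hypothetical toy data: one low-cardinality column and one column of unique values.
n <- 1000
toy <- data.frame(
  weather = rep(c("Clear", "Rain", "Snow", "Mist"), times = c(600, 250, 100, 50)),
  reading = as.character(seq_len(n))  # every value occurs exactly once
)

min_support <- 0.01

# Relative support of every single item (a column=value pair).
item_support <- unlist(lapply(names(toy), function(col) {
  s <- table(toy[[col]]) / n
  setNames(as.numeric(s), paste0(col, "=", names(s)))
}))

frequent_items <- item_support[item_support >= min_support]
length(item_support)   # number of candidate items
names(frequent_items)  # items that meet the minimum support
```

Only the weather items survive in this toy case, because each unique reading appears in just one of the 1000 rows.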

rules_df <- as(rules, "data.frame")  # Extract the rules and their measures (support, confidence, lift, etc.)

Note: I exported all generated rules to a file, which was attached to the email. The first six rows of rules_df are shown here.

head(rules_df)

##                                                       rules    support
## 1                                      {} => {holiday=None} 0.99873454
## 2                              {} => {rain_1h=[0,9.83e+03]} 1.00000000
## 3                                  {} => {snow_1h=[0,0.51]} 1.00000000
## 4   {weather_description=heavy snow} => {weather_main=Snow} 0.01277902
## 5        {weather_description=heavy snow} => {temp=[0,275)} 0.01188698
## 6 {weather_description=heavy snow} => {clouds_all=[90,100]} 0.01143059
##   confidence   coverage      lift count
## 1  0.9987345 1.00000000  1.000000 48143
## 2  1.0000000 1.00000000  1.000000 48204
## 3  1.0000000 1.00000000  1.000000 48204
## 4  1.0000000 0.01277902 16.760779   616
## 5  0.9301948 0.01277902  2.796502   573
## 6  0.8944805 0.01277902  2.471627   551
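
The first three rows have an empty antecedent and a lift of 1, so they only restate how frequent a single item is. A minimal base R sketch of filtering such rows out and ranking the rest by lift is given below. The data frame shown re-enters the six rows above by hand for illustration; in practice the same steps could be applied to rules_df, or empty-antecedent rules could be avoided by setting minlen = 2 in the apriori() parameter list.

```r
# The six rows shown above, re-entered by hand for illustration.
shown <- data.frame(
  rules = c(
    "{} => {holiday=None}",
    "{} => {rain_1h=[0,9.83e+03]}",
    "{} => {snow_1h=[0,0.51]}",
    "{weather_description=heavy snow} => {weather_main=Snow}",
    "{weather_description=heavy snow} => {temp=[0,275)}",
    "{weather_description=heavy snow} => {clouds_all=[90,100]}"
  ),
  confidence = c(0.9987345, 1, 1, 1, 0.9301948, 0.8944805),
  lift = c(1, 1, 1, 16.760779, 2.796502, 2.471627)
)

# Drop rules with an empty antecedent, then rank the remainder by lift.
informative <- shown[!startsWith(shown$rules, "{} =>"), ]
informative <- informative[order(-informative$lift), ]
informative$rules
```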

Visualize Rules

Scatter Plot of Support vs. Confidence

This plot visualizes the rules, showing the relationship between support and confidence. It helps identify rules with both high support and high confidence.

plot(rules, method = "scatterplot", measure = c("support", "confidence"), main = "Support vs Confidence")

## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

Network Graph of Rules

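A graph-based visualization shows the relationships among items in the rules. In the default arulesViz graph layout, items and rules both appear as nodes, with arrows connecting each rule to the items in its antecedent and consequent. In the run shown here, the type and main arguments were not recognized and were ignored, and only the 100 rules with the highest lift were plotted, as the warnings report.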

plot(rules, method = "graph", control = list(type = "items"), main = "Network Graph of Association Rules")

## Warning: Unknown control parameters: type, main

## Available control parameters (with default values):
## layout    =  stress
## circular  =  FALSE
## ggraphdots    =  NULL
## edges     =  <environment>
## nodes     =  <environment>
## nodetext  =  <environment>
## colors    =  c("#EE0000FF", "#EEEEEEFF")
## engine    =  ggplot2
## max   =  100
## verbose   =  FALSE

## Warning: Too many rules supplied. Only plotting the best 100 using 'lift'
## (change control parameter max if needed).

Conclusion

Association rule mining helps uncover actionable insights from traffic data, enabling better decision-making for urban planners and traffic management authorities. For example, for the rule {weather_description=heavy snow} => {weather_main=Snow} we can see:

Support (0.0128 or ~1.28%): About 1.28% of all observations have both “heavy snow” as the weather description and “Snow” as the main weather category.
Confidence (1 or 100%): In every observation where weather_description = heavy snow, weather_main = Snow also holds.
Coverage (0.0128 or ~1.28%): The proportion of transactions that match the antecedent (weather_description = heavy snow). It equals the support here because the confidence is 1.
Lift (16.76): weather_main = Snow is 16.76 times more frequent among heavy-snow observations than in the dataset as a whole.
Count (616): The rule holds in 616 transactions, that is, 616 observations where “heavy snow” and “Snow” occur together.
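
The reported measures can be cross-checked against each other with a few lines of arithmetic. The sketch below uses only numbers from the output above (48204 transactions, count 616, confidence 1, lift 16.760779); the variable names are chosen here for illustration.

```r
n_transactions <- 48204   # from the Apriori log
count      <- 616         # transactions containing both sides of the rule
confidence <- 1
lift       <- 16.760779

support  <- count / n_transactions          # should match the reported 0.01277902
coverage <- support / confidence            # support of the antecedent alone
consequent_support <- confidence / lift     # implied share of weather_main=Snow
snow_rows <- round(consequent_support * n_transactions)  # implied number of Snow observations

c(support = support, coverage = coverage,
  consequent_support = consequent_support, snow_rows = snow_rows)
```

The implied consequent support of roughly 6% is the baseline that the lift compares against.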

Similar interpretations can be made for other combinations of conditions using their support, confidence, lift, coverage, and count values. Such insights can inform policy decisions and traffic prediction.

Ashutosh Kumar Verma (Student ID 475852)

2025-01-25