I obtained the dataset from the IMF here. It records the number of climate-related disasters per country and year between 1980 and 2023. It contains the following data series:
• Climate related disasters frequency, Number of Disasters: TOTAL
• Climate related disasters frequency, Number of Disasters: Drought
• Climate related disasters frequency, Number of Disasters: Extreme temperature
• Climate related disasters frequency, Number of Disasters: Flood
• Climate related disasters frequency, Number of Disasters: Landslide
• Climate related disasters frequency, Number of Disasters: Storm
• Climate related disasters frequency, Number of Disasters: Wildfire
The goal of this project was to identify an interesting and appropriate dataset for an association rule mining task and to extract meaningful (and possibly unexpected) insights from it using the Apriori and Eclat algorithms.
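Both algorithms find frequent itemsets but search for them differently. Apriori works level by level (breadth-first): it builds candidate itemsets of size k + 1 from the frequent itemsets of size k, prunes every candidate that has an infrequent subset, and counts the support of the remaining candidates in the data. This is effective but can be slow when many candidates are generated. Eclat searches depth-first over a vertical data layout: it stores, for each item, the list of IDs of the transactions that contain it and obtains the support of a larger itemset by intersecting these lists. This is often faster, particularly on dense datasets.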
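To make the vertical layout concrete, here is a minimal base-R sketch on a hypothetical four-transaction toy dataset (not the IMF data). Each item is mapped to the IDs of the transactions that contain it, and the support of an itemset is the size of the intersection of those ID lists divided by the number of transactions. It illustrates the idea behind Eclat's support counting; it is not the arules implementation, and the names `trans`, `tidlists` and `support_vertical` are made up for this example.

```r
# Hypothetical toy transactions, for illustration only (not the IMF data)
trans <- list(
  c("Flood", "Storm"),
  c("Flood", "Drought"),
  c("Flood", "Storm", "Drought"),
  c("Storm")
)
items <- sort(unique(unlist(trans)))

# Vertical layout: for each item, the IDs of the transactions that contain it
tidlists <- lapply(setNames(items, items), function(it) {
  which(vapply(trans, function(tr) it %in% tr, logical(1)))
})

# Support of an itemset = size of the intersection of its items' tid-lists / number of transactions
support_vertical <- function(itemset) {
  tids <- Reduce(intersect, tidlists[itemset])
  length(tids) / length(trans)
}

support_vertical(c("Flood", "Storm"))             # transactions 1 and 3 -> 0.5
support_vertical(c("Flood", "Storm", "Drought"))  # transaction 3 only   -> 0.25
```

Extending an itemset by one item only requires intersecting its tid-list with that item's tid-list, which is why a depth-first search fits this layout naturally.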
library(tidyr)
library(dplyr)
library(readr)
library(readxl)
library(arules)
library(arulesViz)
library(arulesCBA)
library(RColorBrewer)
The dataset was not in a format that the association rule mining algorithms could use directly, and it contained entries that were irrelevant or incomplete for this task. To prepare it, I split the indicator name to extract the disaster type, dropped the columns that were no longer needed, removed the TOTAL rows, kept only the countries that have a row for every disaster type, and replaced missing counts with zero.
I also limited the analysis to four years (columns), from 2020 to 2023.
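# `Indicator` holds the table imported from the IMF file; the import step
# (which keeps only the 2020 to 2023 year columns) is not shown
climate <- Indicator
# Split "Climate related disasters frequency, Number of Disasters: Flood" at the colon
# to obtain the disaster type, then drop the columns that are no longer needed
climate <- climate %>%
  separate(col = Indicator, into = c("Indicator", "Disaster"), sep = ":") %>%
  select(-Indicator, -ObjectId)
# Drop the TOTAL rows (note the leading space left by separate()) and keep only
# countries that have a row for all six disaster types
climate_filtered <- climate %>%
  filter(Disaster != " TOTAL") %>%
  group_by(Country) %>%
  filter(n() >= 6)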
head(climate)
## # A tibble: 6 × 6
## Country Disaster `2020` `2021` `2022` `2023`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan, Islamic Rep. of " Drought" NA 1 NA NA
## 2 Afghanistan, Islamic Rep. of " Extreme temperatur… NA NA NA 1
## 3 Afghanistan, Islamic Rep. of " Flood" 5 2 5 2
## 4 Afghanistan, Islamic Rep. of " Landslide" 1 1 1 NA
## 5 Afghanistan, Islamic Rep. of " Storm" 1 NA NA NA
## 6 Afghanistan, Islamic Rep. of " TOTAL" 7 4 6 3
climate_filtered <- climate_filtered %>%
replace_na(list(`2020` = 0, `2021` = 0, `2022` = 0, `2023` = 0))
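After removing the unnecessary columns and rows, I reshaped the data so that each row is one country-year and each disaster type is a column: `pivot_longer()` first stacks the four year columns into `Year`/`Value` pairs, and `pivot_wider()` then spreads the disaster types into separate columns. This is the layout that the transaction conversion for Apriori and Eclat needs.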
climate_long <- climate_filtered %>%
pivot_longer(cols = `2020`:`2023`, names_to = "Year", values_to = "Value")
climate_wide <- climate_long %>%
pivot_wider(names_from = Disaster, values_from = Value)
head(climate_wide)
## # A tibble: 6 × 8
## # Groups: Country [2]
## Country Year ` Drought` ` Extreme temperature` ` Flood` ` Landslide` ` Storm`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghan… 2020 0 0 5 1 1
## 2 Afghan… 2021 1 0 2 1 0
## 3 Afghan… 2022 0 0 5 1 0
## 4 Afghan… 2023 0 1 2 0 0
## 5 Albania 2020 0 0 0 0 0
## 6 Albania 2021 0 0 1 0 0
## # ℹ 1 more variable: ` Wildfire` <dbl>
climate_ready <- climate_wide
# Discretize the yearly counts into item labels (column names keep the leading space from separate())
climate_ready$` Drought` <- ifelse(climate_ready$` Drought` >= 1, "Drought", "No Drought")
climate_ready$` Extreme temperature` <- ifelse(climate_ready$` Extreme temperature` >= 2, "Extr Temp 2", ifelse(climate_ready$` Extreme temperature` == 1, "Extr Temp 1", "No Extr Temp"))
# Counts above 1 are grouped as "a few"; to split off larger counts, test the higher threshold first
climate_ready$` Flood` <- ifelse(climate_ready$` Flood` == 1, "Flood 1", ifelse(climate_ready$` Flood` > 1, "Flood a few", "No Flood"))
climate_ready$` Landslide` <- ifelse(climate_ready$` Landslide` == 1, "Landslide 1", ifelse(climate_ready$` Landslide` > 1, "Landslide a few", "No Landslide"))
climate_ready$` Storm` <- ifelse(climate_ready$` Storm` == 1, "Storm 1", ifelse(climate_ready$` Storm` > 1, "Storm a few", "No Storm"))
climate_ready$` Wildfire` <- ifelse(climate_ready$` Wildfire` == 1, "Wildfire 1", ifelse(climate_ready$` Wildfire` > 1, "Wildfire a few", "No Wildfire"))
data <- climate_ready[,3:8]
head(data)
## # A tibble: 6 × 6
## ` Drought` ` Extreme temperature` ` Flood` ` Landslide` ` Storm` ` Wildfire`
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 No Drought No Extr Temp Flood a f… Landslide 1 Storm 1 No Wildfire
## 2 Drought No Extr Temp Flood a f… Landslide 1 No Storm No Wildfire
## 3 No Drought No Extr Temp Flood a f… Landslide 1 No Storm No Wildfire
## 4 No Drought Extr Temp 1 Flood a f… No Landslide No Storm No Wildfire
## 5 No Drought No Extr Temp No Flood No Landslide No Storm No Wildfire
## 6 No Drought No Extr Temp Flood 1 No Landslide No Storm No Wildfire
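The nested `ifelse()` calls above encode the bins implicitly, so the order of the tests decides which label a count receives. A hedged alternative is base R's `cut()`, where the bin edges are written out once. The sketch below assumes the three levels used for floods (none, exactly one, more than one); `bin_counts` is a hypothetical helper name and the counts are toy values.

```r
# Hypothetical helper: turn a vector of counts into labelled bins with explicit edges.
# cut() uses right-closed intervals: (-Inf, 0] -> none, (0, 1] -> one, (1, Inf] -> a few
bin_counts <- function(x, name) {
  labels <- c(paste("No", name), paste(name, "1"), paste(name, "a few"))
  as.character(cut(x, breaks = c(-Inf, 0, 1, Inf), labels = labels))
}

floods <- c(5, 2, 0, 1)  # toy counts
bin_counts(floods, "Flood")
# "Flood a few" "Flood a few" "No Flood" "Flood 1"
```

Adding an "a lot" level is then a matter of inserting one more break (for example at 2) and a fourth label, with no risk of an earlier test shadowing a later one.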
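# Write the labelled table to CSV and read it back in basket format, so each row
# becomes one transaction of plain item labels (skip = 1 drops the header line)
write.csv(data, file = "DataClm.csv", row.names = FALSE)
data_tran <- read.transactions("DataClm.csv", format = "basket", sep = ",", skip = 1)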
data_tran
## transactions in sparse format with
## 104 transactions (rows) and
## 17 items (columns)
size(data_tran)
## [1] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
## [38] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
## [75] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
length(data_tran)
## [1] 104
LIST(head(data_tran))
## [[1]]
## [1] "Flood a few" "Landslide 1" "No Drought" "No Extr Temp" "No Wildfire"
## [6] "Storm 1"
##
## [[2]]
## [1] "Drought" "Flood a few" "Landslide 1" "No Extr Temp" "No Storm"
## [6] "No Wildfire"
##
## [[3]]
## [1] "Flood a few" "Landslide 1" "No Drought" "No Extr Temp" "No Storm"
## [6] "No Wildfire"
##
## [[4]]
## [1] "Extr Temp 1" "Flood a few" "No Drought" "No Landslide" "No Storm"
## [6] "No Wildfire"
##
## [[5]]
## [1] "No Drought" "No Extr Temp" "No Flood" "No Landslide" "No Storm"
## [6] "No Wildfire"
##
## [[6]]
## [1] "Flood 1" "No Drought" "No Extr Temp" "No Landslide" "No Storm"
## [6] "No Wildfire"
round(itemFrequency(data_tran),3)
## Drought Extr Temp 1 Extr Temp 2 Flood 1 Flood a few
## 0.183 0.125 0.010 0.250 0.538
## Landslide 1 Landslide a few No Drought No Extr Temp No Flood
## 0.135 0.029 0.817 0.865 0.212
## No Landslide No Storm No Wildfire Storm 1 Storm a few
## 0.837 0.423 0.740 0.231 0.346
## Wildfire 1 Wildfire a few
## 0.221 0.038
itemFrequency(data_tran, type="absolute")
## Drought Extr Temp 1 Extr Temp 2 Flood 1 Flood a few
## 19 13 1 26 56
## Landslide 1 Landslide a few No Drought No Extr Temp No Flood
## 14 3 85 90 22
## No Landslide No Storm No Wildfire Storm 1 Storm a few
## 87 44 77 24 36
## Wildfire 1 Wildfire a few
## 23 4
ctab <- crossTable(data_tran, measure="count", sort=TRUE)
stab <- crossTable(data_tran, measure="support", sort=TRUE)
ltab <- crossTable(data_tran, measure="lift", sort=TRUE)
ctab
## No Extr Temp No Landslide No Drought No Wildfire Flood a few
## No Extr Temp 90 74 74 68 47
## No Landslide 74 87 72 62 41
## No Drought 74 72 85 64 42
## No Wildfire 68 62 64 77 44
## Flood a few 47 41 42 44 56
## No Storm 41 39 39 30 17
## Storm a few 29 30 25 26 27
## Flood 1 23 25 24 18 0
## Storm 1 20 18 21 21 12
## Wildfire 1 20 21 20 0 9
## No Flood 20 21 19 15 0
## Drought 16 15 0 13 14
## Landslide 1 13 0 10 12 12
## Extr Temp 1 0 12 10 8 8
## Wildfire a few 2 4 1 0 3
## Landslide a few 3 0 3 3 3
## Extr Temp 2 0 1 1 1 1
## No Storm Storm a few Flood 1 Storm 1 Wildfire 1 No Flood
## No Extr Temp 41 29 23 20 20 20
## No Landslide 39 30 25 18 21 21
## No Drought 39 25 24 21 20 19
## No Wildfire 30 26 18 21 0 15
## Flood a few 17 27 0 12 9 0
## No Storm 44 0 11 0 13 16
## Storm a few 0 36 6 0 7 3
## Flood 1 11 6 26 9 8 0
## Storm 1 0 0 9 24 3 3
## Wildfire 1 13 7 8 3 23 6
## No Flood 16 3 0 3 6 22
## Drought 5 11 2 3 3 3
## Landslide 1 5 4 1 5 2 1
## Extr Temp 1 3 6 3 4 3 2
## Wildfire a few 1 3 0 0 0 1
## Landslide a few 0 2 0 1 0 0
## Extr Temp 2 0 1 0 0 0 0
## Drought Landslide 1 Extr Temp 1 Wildfire a few Landslide a few
## No Extr Temp 16 13 0 2 3
## No Landslide 15 0 12 4 0
## No Drought 0 10 10 1 3
## No Wildfire 13 12 8 0 3
## Flood a few 14 12 8 3 3
## No Storm 5 5 3 1 0
## Storm a few 11 4 6 3 2
## Flood 1 2 1 3 0 0
## Storm 1 3 5 4 0 1
## Wildfire 1 3 2 3 0 0
## No Flood 3 1 2 1 0
## Drought 19 4 3 3 0
## Landslide 1 4 14 1 0 0
## Extr Temp 1 3 1 13 2 0
## Wildfire a few 3 0 2 4 0
## Landslide a few 0 0 0 0 3
## Extr Temp 2 0 0 0 0 0
## Extr Temp 2
## No Extr Temp 0
## No Landslide 1
## No Drought 1
## No Wildfire 1
## Flood a few 1
## No Storm 0
## Storm a few 1
## Flood 1 0
## Storm 1 0
## Wildfire 1 0
## No Flood 0
## Drought 0
## Landslide 1 0
## Extr Temp 1 0
## Wildfire a few 0
## Landslide a few 0
## Extr Temp 2 1
stab
## No Extr Temp No Landslide No Drought No Wildfire Flood a few
## No Extr Temp 0.86538462 0.711538462 0.711538462 0.653846154 0.451923077
## No Landslide 0.71153846 0.836538462 0.692307692 0.596153846 0.394230769
## No Drought 0.71153846 0.692307692 0.817307692 0.615384615 0.403846154
## No Wildfire 0.65384615 0.596153846 0.615384615 0.740384615 0.423076923
## Flood a few 0.45192308 0.394230769 0.403846154 0.423076923 0.538461538
## No Storm 0.39423077 0.375000000 0.375000000 0.288461538 0.163461538
## Storm a few 0.27884615 0.288461538 0.240384615 0.250000000 0.259615385
## Flood 1 0.22115385 0.240384615 0.230769231 0.173076923 0.000000000
## Storm 1 0.19230769 0.173076923 0.201923077 0.201923077 0.115384615
## Wildfire 1 0.19230769 0.201923077 0.192307692 0.000000000 0.086538462
## No Flood 0.19230769 0.201923077 0.182692308 0.144230769 0.000000000
## Drought 0.15384615 0.144230769 0.000000000 0.125000000 0.134615385
## Landslide 1 0.12500000 0.000000000 0.096153846 0.115384615 0.115384615
## Extr Temp 1 0.00000000 0.115384615 0.096153846 0.076923077 0.076923077
## Wildfire a few 0.01923077 0.038461538 0.009615385 0.000000000 0.028846154
## Landslide a few 0.02884615 0.000000000 0.028846154 0.028846154 0.028846154
## Extr Temp 2 0.00000000 0.009615385 0.009615385 0.009615385 0.009615385
## No Storm Storm a few Flood 1 Storm 1 Wildfire 1
## No Extr Temp 0.394230769 0.278846154 0.221153846 0.192307692 0.19230769
## No Landslide 0.375000000 0.288461538 0.240384615 0.173076923 0.20192308
## No Drought 0.375000000 0.240384615 0.230769231 0.201923077 0.19230769
## No Wildfire 0.288461538 0.250000000 0.173076923 0.201923077 0.00000000
## Flood a few 0.163461538 0.259615385 0.000000000 0.115384615 0.08653846
## No Storm 0.423076923 0.000000000 0.105769231 0.000000000 0.12500000
## Storm a few 0.000000000 0.346153846 0.057692308 0.000000000 0.06730769
## Flood 1 0.105769231 0.057692308 0.250000000 0.086538462 0.07692308
## Storm 1 0.000000000 0.000000000 0.086538462 0.230769231 0.02884615
## Wildfire 1 0.125000000 0.067307692 0.076923077 0.028846154 0.22115385
## No Flood 0.153846154 0.028846154 0.000000000 0.028846154 0.05769231
## Drought 0.048076923 0.105769231 0.019230769 0.028846154 0.02884615
## Landslide 1 0.048076923 0.038461538 0.009615385 0.048076923 0.01923077
## Extr Temp 1 0.028846154 0.057692308 0.028846154 0.038461538 0.02884615
## Wildfire a few 0.009615385 0.028846154 0.000000000 0.000000000 0.00000000
## Landslide a few 0.000000000 0.019230769 0.000000000 0.009615385 0.00000000
## Extr Temp 2 0.000000000 0.009615385 0.000000000 0.000000000 0.00000000
## No Flood Drought Landslide 1 Extr Temp 1 Wildfire a few
## No Extr Temp 0.192307692 0.15384615 0.125000000 0.000000000 0.019230769
## No Landslide 0.201923077 0.14423077 0.000000000 0.115384615 0.038461538
## No Drought 0.182692308 0.00000000 0.096153846 0.096153846 0.009615385
## No Wildfire 0.144230769 0.12500000 0.115384615 0.076923077 0.000000000
## Flood a few 0.000000000 0.13461538 0.115384615 0.076923077 0.028846154
## No Storm 0.153846154 0.04807692 0.048076923 0.028846154 0.009615385
## Storm a few 0.028846154 0.10576923 0.038461538 0.057692308 0.028846154
## Flood 1 0.000000000 0.01923077 0.009615385 0.028846154 0.000000000
## Storm 1 0.028846154 0.02884615 0.048076923 0.038461538 0.000000000
## Wildfire 1 0.057692308 0.02884615 0.019230769 0.028846154 0.000000000
## No Flood 0.211538462 0.02884615 0.009615385 0.019230769 0.009615385
## Drought 0.028846154 0.18269231 0.038461538 0.028846154 0.028846154
## Landslide 1 0.009615385 0.03846154 0.134615385 0.009615385 0.000000000
## Extr Temp 1 0.019230769 0.02884615 0.009615385 0.125000000 0.019230769
## Wildfire a few 0.009615385 0.02884615 0.000000000 0.019230769 0.038461538
## Landslide a few 0.000000000 0.00000000 0.000000000 0.000000000 0.000000000
## Extr Temp 2 0.000000000 0.00000000 0.000000000 0.000000000 0.000000000
## Landslide a few Extr Temp 2
## No Extr Temp 0.028846154 0.000000000
## No Landslide 0.000000000 0.009615385
## No Drought 0.028846154 0.009615385
## No Wildfire 0.028846154 0.009615385
## Flood a few 0.028846154 0.009615385
## No Storm 0.000000000 0.000000000
## Storm a few 0.019230769 0.009615385
## Flood 1 0.000000000 0.000000000
## Storm 1 0.009615385 0.000000000
## Wildfire 1 0.000000000 0.000000000
## No Flood 0.000000000 0.000000000
## Drought 0.000000000 0.000000000
## Landslide 1 0.000000000 0.000000000
## Extr Temp 1 0.000000000 0.000000000
## Wildfire a few 0.000000000 0.000000000
## Landslide a few 0.028846154 0.000000000
## Extr Temp 2 0.000000000 0.009615385
ltab
## No Extr Temp No Landslide No Drought No Wildfire Flood a few
## No Extr Temp NA 0.9828863 1.0060131 1.0204906 0.9698413
## No Landslide 0.9828863 NA 1.0125761 0.9625317 0.8752053
## No Drought 1.0060131 1.0125761 NA 1.0169595 0.9176471
## No Wildfire 1.0204906 0.9625317 1.0169595 NA 1.0612245
## Flood a few 0.9698413 0.8752053 0.9176471 1.0612245 NA
## No Storm 1.0767677 1.0595611 1.0844920 0.9208973 0.7175325
## Storm a few 0.9308642 0.9961686 0.8496732 0.9754690 1.3928571
## Flood 1 1.0222222 1.1494253 1.1294118 0.9350649 0.0000000
## Storm 1 0.9629630 0.8965517 1.0705882 1.1818182 0.9285714
## Wildfire 1 1.0048309 1.0914543 1.0639386 0.0000000 0.7267081
## No Flood 1.0505051 1.1410658 1.0566845 0.9208973 0.0000000
## Drought 0.9730994 0.9437387 0.0000000 0.9241285 1.3684211
## Landslide 1 1.0730159 0.0000000 0.8739496 1.1576994 1.5918367
## Extr Temp 1 0.0000000 1.1034483 0.9411765 0.8311688 1.1428571
## Wildfire a few 0.5777778 1.1954023 0.3058824 0.0000000 1.3928571
## Landslide a few 1.1555556 0.0000000 1.2235294 1.3506494 1.8571429
## Extr Temp 2 0.0000000 1.1954023 1.2235294 1.3506494 1.8571429
## No Storm Storm a few Flood 1 Storm 1 Wildfire 1 No Flood
## No Extr Temp 1.0767677 0.9308642 1.0222222 0.9629630 1.0048309 1.0505051
## No Landslide 1.0595611 0.9961686 1.1494253 0.8965517 1.0914543 1.1410658
## No Drought 1.0844920 0.8496732 1.1294118 1.0705882 1.0639386 1.0566845
## No Wildfire 0.9208973 0.9754690 0.9350649 1.1818182 0.0000000 0.9208973
## Flood a few 0.7175325 1.3928571 0.0000000 0.9285714 0.7267081 0.0000000
## No Storm NA 0.0000000 1.0000000 0.0000000 1.3359684 1.7190083
## Storm a few 0.0000000 NA 0.6666667 0.0000000 0.8792271 0.3939394
## Flood 1 1.0000000 0.6666667 NA 1.5000000 1.3913043 0.0000000
## Storm 1 0.0000000 0.0000000 1.5000000 NA 0.5652174 0.5909091
## Wildfire 1 1.3359684 0.8792271 1.3913043 0.5652174 NA 1.2332016
## No Flood 1.7190083 0.3939394 0.0000000 0.5909091 1.2332016 NA
## Drought 0.6220096 1.6725146 0.4210526 0.6842105 0.7139588 0.7464115
## Landslide 1 0.8441558 0.8253968 0.2857143 1.5476190 0.6459627 0.3376623
## Extr Temp 1 0.5454545 1.3333333 0.9230769 1.3333333 1.0434783 0.7272727
## Wildfire a few 0.5909091 2.1666667 0.0000000 0.0000000 0.0000000 1.1818182
## Landslide a few 0.0000000 1.9259259 0.0000000 1.4444444 0.0000000 0.0000000
## Extr Temp 2 0.0000000 2.8888889 0.0000000 0.0000000 0.0000000 0.0000000
## Drought Landslide 1 Extr Temp 1 Wildfire a few
## No Extr Temp 0.9730994 1.0730159 0.0000000 0.5777778
## No Landslide 0.9437387 0.0000000 1.1034483 1.1954023
## No Drought 0.0000000 0.8739496 0.9411765 0.3058824
## No Wildfire 0.9241285 1.1576994 0.8311688 0.0000000
## Flood a few 1.3684211 1.5918367 1.1428571 1.3928571
## No Storm 0.6220096 0.8441558 0.5454545 0.5909091
## Storm a few 1.6725146 0.8253968 1.3333333 2.1666667
## Flood 1 0.4210526 0.2857143 0.9230769 0.0000000
## Storm 1 0.6842105 1.5476190 1.3333333 0.0000000
## Wildfire 1 0.7139588 0.6459627 1.0434783 0.0000000
## No Flood 0.7464115 0.3376623 0.7272727 1.1818182
## Drought NA 1.5639098 1.2631579 4.1052632
## Landslide 1 1.5639098 NA 0.5714286 0.0000000
## Extr Temp 1 1.2631579 0.5714286 NA 4.0000000
## Wildfire a few 4.1052632 0.0000000 4.0000000 NA
## Landslide a few 0.0000000 0.0000000 0.0000000 0.0000000
## Extr Temp 2 0.0000000 0.0000000 0.0000000 0.0000000
## Landslide a few Extr Temp 2
## No Extr Temp 1.155556 0.000000
## No Landslide 0.000000 1.195402
## No Drought 1.223529 1.223529
## No Wildfire 1.350649 1.350649
## Flood a few 1.857143 1.857143
## No Storm 0.000000 0.000000
## Storm a few 1.925926 2.888889
## Flood 1 0.000000 0.000000
## Storm 1 1.444444 0.000000
## Wildfire 1 0.000000 0.000000
## No Flood 0.000000 0.000000
## Drought 0.000000 0.000000
## Landslide 1 0.000000 0.000000
## Extr Temp 1 0.000000 0.000000
## Wildfire a few 0.000000 0.000000
## Landslide a few NA 0.000000
## Extr Temp 2 0.000000 NA
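These three tables can all be derived from a binary transaction-by-item matrix. As a base-R sketch on a hypothetical toy matrix `X` (rows are transactions, columns are items): `crossprod(X)` gives the co-occurrence counts (as in `ctab`), dividing by the number of transactions gives supports (as in `stab`), and lift divides each pairwise support by the product of the two single-item supports (as in `ltab`, whose diagonal is also `NA`). This illustrates the definitions; it is not how `crossTable()` is implemented.

```r
# Hypothetical toy data: 5 transactions (rows) x 3 items (columns), 1 = item present
X <- matrix(c(1, 1, 0,
              1, 0, 1,
              1, 1, 1,
              0, 1, 0,
              1, 0, 0),
            nrow = 5, byrow = TRUE,
            dimnames = list(NULL, c("A", "B", "C")))
n <- nrow(X)

counts    <- crossprod(X)      # counts[i, j] = number of transactions containing both i and j
pair_supp <- counts / n        # pairwise support; the diagonal holds single-item support
item_supp <- diag(pair_supp)
lift <- pair_supp / outer(item_supp, item_supp)  # supp(i, j) / (supp(i) * supp(j))
diag(lift) <- NA               # lift of an item with itself is not meaningful
lift
```

Applied to the counts printed above, the lift of `Drought` with `Wildfire a few` is (3/104) / ((19/104) × (4/104)) = 312/76 ≈ 4.105, the value shown in `ltab`; note that it rests on only three co-occurrences.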
After examining these tables, I realized that the insights I had hoped for might not be as meaningful as expected. The dataset seemed to lack the complexity needed to uncover deeper patterns. The most frequent items are the absence labels (No Extr Temp in 86.5%, No Landslide in 83.7% and No Drought in 81.7% of transactions), so the strongest co-occurrences are largely combinations of disasters that did not happen, and the lift values among these frequent items stay close to 1. Although the data did not offer the deeper insights I was aiming for, I still proceeded with the analysis to explore the basic patterns it could provide.
freq.items <- eclat(data_tran, parameter=list(supp=0.25, maxlen=15))
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.25 1 15 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -2 TRUE
##
## Absolute minimum support count: 26
##
## create itemset ...
## set transactions ...[17 item(s), 104 transaction(s)] done [0.00s].
## sorting and recoding items ... [8 item(s)] done [0.00s].
## creating bit matrix ... [8 row(s), 104 column(s)] done [0.00s].
## writing ... [45 set(s)] done [0.00s].
## Creating S4 object ... done [0.00s].
inspect(freq.items)
## items support count
## [1] {No Extr Temp, Storm a few} 0.2788462 29
## [2] {No Landslide, Storm a few} 0.2884615 30
## [3] {No Wildfire, Storm a few} 0.2500000 26
## [4] {Flood a few, Storm a few} 0.2596154 27
## [5] {No Extr Temp, No Storm, No Wildfire} 0.2692308 28
## [6] {No Landslide, No Storm, No Wildfire} 0.2500000 26
## [7] {No Drought, No Storm, No Wildfire} 0.2500000 26
## [8] {No Drought, No Extr Temp, No Landslide, No Storm} 0.3076923 32
## [9] {No Drought, No Extr Temp, No Storm} 0.3461538 36
## [10] {No Drought, No Landslide, No Storm} 0.3365385 35
## [11] {No Extr Temp, No Landslide, No Storm} 0.3461538 36
## [12] {No Extr Temp, No Storm} 0.3942308 41
## [13] {No Landslide, No Storm} 0.3750000 39
## [14] {No Drought, No Storm} 0.3750000 39
## [15] {No Storm, No Wildfire} 0.2884615 30
## [16] {Flood a few, No Drought, No Extr Temp, No Wildfire} 0.2692308 28
## [17] {Flood a few, No Extr Temp, No Wildfire} 0.3557692 37
## [18] {Flood a few, No Landslide, No Wildfire} 0.2980769 31
## [19] {Flood a few, No Drought, No Wildfire} 0.3365385 35
## [20] {Flood a few, No Drought, No Extr Temp} 0.3365385 35
## [21] {Flood a few, No Drought, No Landslide} 0.2884615 30
## [22] {Flood a few, No Extr Temp, No Landslide} 0.3173077 33
## [23] {Flood a few, No Extr Temp} 0.4519231 47
## [24] {Flood a few, No Landslide} 0.3942308 41
## [25] {Flood a few, No Drought} 0.4038462 42
## [26] {Flood a few, No Wildfire} 0.4230769 44
## [27] {No Drought, No Extr Temp, No Landslide, No Wildfire} 0.4423077 46
## [28] {No Drought, No Extr Temp, No Wildfire} 0.5384615 56
## [29] {No Drought, No Landslide, No Wildfire} 0.5096154 53
## [30] {No Extr Temp, No Landslide, No Wildfire} 0.5192308 54
## [31] {No Extr Temp, No Wildfire} 0.6538462 68
## [32] {No Landslide, No Wildfire} 0.5961538 62
## [33] {No Drought, No Wildfire} 0.6153846 64
## [34] {No Drought, No Extr Temp, No Landslide} 0.5961538 62
## [35] {No Drought, No Extr Temp} 0.7115385 74
## [36] {No Drought, No Landslide} 0.6923077 72
## [37] {No Extr Temp, No Landslide} 0.7115385 74
## [38] {No Extr Temp} 0.8653846 90
## [39] {No Landslide} 0.8365385 87
## [40] {No Drought} 0.8173077 85
## [41] {No Wildfire} 0.7403846 77
## [42] {Flood a few} 0.5384615 56
## [43] {No Storm} 0.4230769 44
## [44] {Storm a few} 0.3461538 36
## [45] {Flood 1} 0.2500000 26
round(support(items(freq.items), data_tran), 2)
## [1] 0.28 0.29 0.25 0.26 0.27 0.25 0.25 0.31 0.35 0.34 0.35 0.39 0.38 0.38 0.29
## [16] 0.27 0.36 0.30 0.34 0.34 0.29 0.32 0.45 0.39 0.40 0.42 0.44 0.54 0.51 0.52
## [31] 0.65 0.60 0.62 0.60 0.71 0.69 0.71 0.87 0.84 0.82 0.74 0.54 0.42 0.35 0.25
freq.rules <- ruleInduction(freq.items, data_tran, confidence=0.9)
freq.rules
## set of 5 rules
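`ruleInduction()` turns the frequent itemsets into rules and keeps those above the confidence threshold. The base-R sketch below shows the underlying arithmetic for one itemset, considering only single-item consequents (as in the Apriori rules later in this report): for an itemset I and an item y in I, the rule (I \ {y}) => {y} has confidence count(I) / count(I \ {y}). It is a simplified illustration, not a reimplementation of `ruleInduction()`; the helper name is hypothetical, while the counts are copied from the Eclat output above.

```r
# Hypothetical helper (simplified sketch, not arules' ruleInduction()):
# derive rules X => {y} with a single-item consequent from one frequent itemset
# of size >= 2. `count` maps a sorted, comma-joined itemset to its absolute count.
single_consequent_rules <- function(itemset, count, min_conf) {
  key <- function(s) paste(sort(s), collapse = ",")
  n_itemset <- count[[key(itemset)]]
  rules <- do.call(rbind, lapply(itemset, function(y) {
    lhs <- setdiff(itemset, y)
    data.frame(lhs = key(lhs), rhs = y,
               confidence = n_itemset / count[[key(lhs)]],
               stringsAsFactors = FALSE)
  }))
  rules[rules$confidence >= min_conf, ]
}

# Absolute counts taken from the Eclat output above
count <- c("No Extr Temp" = 90, "No Storm" = 44, "No Extr Temp,No Storm" = 41)
single_consequent_rules(c("No Extr Temp", "No Storm"), count, min_conf = 0.9)
```

Only {No Storm} => {No Extr Temp} passes, with confidence 41/44 ≈ 0.932, the same value that Apriori reports for this rule (0.9318182); the reverse rule has confidence 41/90 ≈ 0.456.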
To choose the support and confidence thresholds for Apriori, I first mined a large set of rules with very low thresholds (support 0.001, confidence 0.5) and plotted their support against their confidence, shaded by lift. The plot shows how the number and strength of the rules change with the two parameters, which helped me pick thresholds that balance the number of rules generated against their relevance.
rules_AP <- apriori(data_tran, parameter = list(support = 0.001, confidence = 0.5))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.001 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 0
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[17 item(s), 104 transaction(s)] done [0.00s].
## sorting and recoding items ... [17 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.00s].
## writing ... [2122 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
plot(rules_AP, measure = c("support", "confidence"), shading = "lift")
Based on the plot, a minimum support of 0.2 and a minimum confidence of 0.8 seemed the most suitable.
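To complement the scatter plot, the threshold choice can also be made quantitative by counting how many rules survive each pair of minimum support and minimum confidence. A minimal base-R sketch follows; the helper name `rules_per_threshold` and the four toy rules are hypothetical. It assumes a data frame with `support` and `confidence` columns, which is the shape of the quality table that arules returns for a rule set via `quality()`, for example `quality(rules_AP)`.

```r
# Hypothetical helper: number of rules passing each (min support, min confidence) pair.
# `q` must be a data frame with numeric `support` and `confidence` columns.
rules_per_threshold <- function(q, supports, confidences) {
  grid <- expand.grid(min_support = supports, min_confidence = confidences)
  grid$n_rules <- mapply(function(s, conf) sum(q$support >= s & q$confidence >= conf),
                         grid$min_support, grid$min_confidence)
  grid
}

# Toy quality measures for four hypothetical rules
q <- data.frame(support    = c(0.25, 0.15, 0.05, 0.30),
                confidence = c(0.96, 0.85, 1.00, 0.60))
rules_per_threshold(q, supports = c(0.1, 0.2), confidences = c(0.5, 0.8))
```

Because `rules_AP` was mined with very low thresholds, stricter pairs can be examined this way from its quality table without re-running Apriori for each combination.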
rules.ap_norm <- apriori(data_tran, parameter=list(supp=0.2, conf=0.8))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.2 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 20
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[17 item(s), 104 transaction(s)] done [0.00s].
## sorting and recoding items ... [11 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [78 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules.by.conf <- sort(rules.ap_norm, by="confidence", decreasing=TRUE)
inspect(head(rules.by.conf))
## lhs rhs support confidence coverage
## [1] {Flood 1} => {No Landslide} 0.2403846 0.9615385 0.2500000
## [2] {Flood 1, No Drought} => {No Landslide} 0.2211538 0.9583333 0.2307692
## [3] {Flood 1, No Extr Temp} => {No Landslide} 0.2115385 0.9565217 0.2211538
## [4] {No Flood} => {No Landslide} 0.2019231 0.9545455 0.2115385
## [5] {No Storm, No Wildfire} => {No Extr Temp} 0.2692308 0.9333333 0.2884615
## [6] {No Storm} => {No Extr Temp} 0.3942308 0.9318182 0.4230769
## lift count
## [1] 1.149425 25
## [2] 1.145594 23
## [3] 1.143428 22
## [4] 1.141066 21
## [5] 1.078519 28
## [6] 1.076768 41
rules.by.lift <- sort(rules.ap_norm, by="lift", decreasing=TRUE)
inspect(head(rules.by.lift))
## lhs rhs support confidence coverage
## [1] {Storm 1} => {No Wildfire} 0.2019231 0.8750000 0.2307692
## [2] {Flood 1} => {No Landslide} 0.2403846 0.9615385 0.2500000
## [3] {Flood 1, No Drought} => {No Landslide} 0.2211538 0.9583333 0.2307692
## [4] {Flood 1, No Extr Temp} => {No Landslide} 0.2115385 0.9565217 0.2211538
## [5] {No Flood} => {No Landslide} 0.2019231 0.9545455 0.2115385
## [6] {Flood 1} => {No Drought} 0.2307692 0.9230769 0.2500000
## lift count
## [1] 1.181818 21
## [2] 1.149425 25
## [3] 1.145594 23
## [4] 1.143428 22
## [5] 1.141066 21
## [6] 1.129412 24
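The two rankings differ because lift also accounts for how common the consequent is. With N = 104 transactions and the counts printed above, the standard definitions give, for the top rule of each ranking:

$$
\operatorname{supp}(X \Rightarrow Y) = \frac{\operatorname{count}(X \cup Y)}{N}, \quad
\operatorname{conf}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}, \quad
\operatorname{lift}(X \Rightarrow Y) = \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)}
$$

$$
\{\text{Flood 1}\} \Rightarrow \{\text{No Landslide}\}: \quad
\operatorname{conf} = \tfrac{25}{26} \approx 0.962, \quad
\operatorname{lift} = \frac{25/26}{87/104} \approx 1.149
$$

$$
\{\text{Storm 1}\} \Rightarrow \{\text{No Wildfire}\}: \quad
\operatorname{conf} = \tfrac{21}{24} = 0.875, \quad
\operatorname{lift} = \frac{21/24}{77/104} \approx 1.182
$$

So {Storm 1} => {No Wildfire} has lower confidence but higher lift, because No Wildfire (77 of 104 transactions) is a less common consequent than No Landslide (87 of 104). Both lifts are only slightly above 1, meaning weak associations.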
# Mine only rules whose right-hand side is "Flood 1"
rules.flood1 <- apriori(data = data_tran,
                        parameter = list(supp = 0.02, conf = 0.6),
                        appearance = list(default = "lhs", rhs = "Flood 1"),
                        control = list(verbose = FALSE))
rules.flood1.byconf <- sort(rules.flood1, by="confidence", decreasing=TRUE)
inspect(head(rules.flood1.byconf))
## lhs rhs support confidence coverage lift count
## [1] {Storm 1,
## Wildfire 1} => {Flood 1} 0.02884615 1 0.02884615 4 3
## [2] {No Drought,
## Storm 1,
## Wildfire 1} => {Flood 1} 0.02884615 1 0.02884615 4 3
## [3] {No Landslide,
## Storm 1,
## Wildfire 1} => {Flood 1} 0.02884615 1 0.02884615 4 3
## [4] {No Drought,
## No Landslide,
## Storm 1,
## Wildfire 1} => {Flood 1} 0.02884615 1 0.02884615 4 3
trans.sel <- data_tran[, itemFrequency(data_tran) > 0.05]
d.jac.i <- dissimilarity(trans.sel, which="items")
plot(hclust(d.jac.i, method="ward.D2"), main="Dendrogram for Items")
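The dendrogram is built from Jaccard dissimilarities between items (the default method of arules' `dissimilarity()`): one minus the number of transactions containing both items divided by the number containing at least one of them. In base R the same distance between the columns of a binary matrix is available through `dist(method = "binary")`. A minimal sketch on a hypothetical toy matrix, using the same Ward linkage as above:

```r
# Hypothetical toy data: 6 transactions (rows) x 4 items (columns), 1 = item present
X <- matrix(c(1, 1, 0, 0,
              1, 1, 0, 1,
              0, 0, 1, 1,
              0, 0, 1, 1,
              1, 0, 0, 0,
              0, 1, 1, 0),
            nrow = 6, byrow = TRUE,
            dimnames = list(NULL, c("Flood", "Storm", "Drought", "Wildfire")))

# Jaccard distance between items: dist() compares rows, so transpose first
d_items <- dist(t(X), method = "binary")

# Ward clustering on these distances, then cut the tree into two groups
hc <- hclust(d_items, method = "ward.D2")
groups <- cutree(hc, k = 2)
groups
```

Here Flood and Storm share 2 of the 4 transactions in which either occurs (distance 0.5), as do Drought and Wildfire, so the two-group cut pairs them; `plot(hc)` would draw the corresponding dendrogram.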
colors <- brewer.pal(8, "Pastel1")
itemFrequencyPlot(data_tran, topN = 10, type = "absolute", col = colors, main = "Item Frequency")
plot(rules.ap_norm, method = "graph", col = colors)
plot(rules.ap_norm, method="paracoord", col = colors, control=list(reorder=TRUE))
By applying both algorithms, I was able to compare them and look for patterns. On a dataset this small (104 transactions, 17 items) both ran almost instantly (every step reported 0.00s), so I observed no meaningful difference in their performance. Nor did the analysis yield strong or surprising rules: the top Apriori rules by lift have lifts of only about 1.1 to 1.2, and the higher lifts, such as the lift of 4 for the rules predicting Flood 1, rest on only three transactions. That said, the process allowed me to visualize and analyze some dependencies in the data. For instance, over the four years studied, extreme temperature events were recorded in only about 13% of country-years, whereas at least one flood was recorded in about 79% and at least one storm in about 58%. This observation, though simple, gives a glimpse of trends and anomalies that could be explored further with more refined data or different methods.