Dataset

I obtained the dataset from the IMF. It records the number of climate-related natural disasters per country and year between 1980 and 2023. Data series:
• Climate related disasters frequency, Number of Disasters: TOTAL
• Climate related disasters frequency, Number of Disasters: Drought
• Climate related disasters frequency, Number of Disasters: Extreme temperature
• Climate related disasters frequency, Number of Disasters: Flood
• Climate related disasters frequency, Number of Disasters: Landslide
• Climate related disasters frequency, Number of Disasters: Storm
• Climate related disasters frequency, Number of Disasters: Wildfire

Introduction

My goal in this project was to identify an interesting and appropriate dataset for an association rule mining task and to extract meaningful (and maybe unexpected) insights using the Apriori and Eclat algorithms.

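Both algorithms find frequent itemsets but search for them differently. Apriori works level by level (breadth-first): it generates candidate itemsets of size k from the frequent itemsets of size k-1, prunes candidates that contain an infrequent subset, and rescans the data to count support, which makes it effective but sometimes slow. Eclat searches depth-first over a vertical layout, in which each item is mapped to the list of transactions that contain it, and obtains support by intersecting those lists, which often makes it faster in practice.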
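
To make the difference in data layout concrete, here is a minimal base-R sketch. It is a toy example and not part of the analysis: the four baskets and the helper name `support_vertical` are assumptions made for illustration. It builds the vertical layout (item to transaction-ID list) that Eclat works on and computes the support of an itemset as the size of the intersection of its items' ID lists.

```r
# Toy transactions in the usual horizontal layout (one basket per entry).
# These four baskets are made up for illustration only.
baskets <- list(
  c("Flood", "Storm"),
  c("Flood", "Drought"),
  c("Flood", "Storm", "Wildfire"),
  c("Storm")
)

# Vertical layout: for each item, the IDs of the baskets that contain it.
items <- sort(unique(unlist(baskets)))
tid_lists <- lapply(items, function(item) {
  which(vapply(baskets, function(b) item %in% b, logical(1)))
})
names(tid_lists) <- items

# Support of an itemset = size of the intersection of its ID lists,
# divided by the number of baskets (hypothetical helper).
support_vertical <- function(itemset) {
  common <- Reduce(intersect, tid_lists[itemset])
  length(common) / length(baskets)
}

support_vertical(c("Flood", "Storm"))  # baskets 1 and 3 contain both items
```

Apriori would reach the same support by scanning the baskets and counting matches; the vertical layout replaces that scan with set intersections.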

Loading Libraries

library(tidyr)
library(dplyr)
library(readr)
library(readxl)
library(arules)
library(arulesViz)
library(arulesCBA)
library(RColorBrewer)

Data Cleaning & Preprocessing

The dataset was not in an ideal format for applying association rule mining directly. It contained irrelevant and incomplete entries that had to be addressed before analysis. To make the data suitable, I removed unnecessary rows and columns and filtered out entries that were not applicable to the task, such as the TOTAL series and countries with fewer than six disaster-type rows. Missing values, duplicates, and irrelevant information would otherwise have hindered the performance of the Apriori and Eclat algorithms.

I also limited my focus to four years (columns): 2020 to 2023.

Filtering Dataset

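# `Indicator` is the previously loaded data frame (loading step not shown).
climate <- Indicator
# Split the Indicator column at ":" into the series name and the disaster type.
# The part after ":" keeps its leading space, e.g. " TOTAL".
climate <- climate %>% 
  separate(col = Indicator, into = c("Indicator", "Disaster"), sep = ":") %>%
  select(-Indicator, -ObjectId)

# Drop the TOTAL series and keep countries with at least six disaster rows.
climate_filtered <- climate %>%
  filter(Disaster != " TOTAL") %>%
  group_by(Country) %>%
  filter(n() >= 6)
# Preview of the unfiltered table (TOTAL rows are still present here).
head(climate)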
## # A tibble: 6 × 6
##   Country                      Disaster              `2020` `2021` `2022` `2023`
##   <chr>                        <chr>                  <dbl>  <dbl>  <dbl>  <dbl>
## 1 Afghanistan, Islamic Rep. of " Drought"                NA      1     NA     NA
## 2 Afghanistan, Islamic Rep. of " Extreme temperatur…     NA     NA     NA      1
## 3 Afghanistan, Islamic Rep. of " Flood"                   5      2      5      2
## 4 Afghanistan, Islamic Rep. of " Landslide"               1      1      1     NA
## 5 Afghanistan, Islamic Rep. of " Storm"                   1     NA     NA     NA
## 6 Afghanistan, Islamic Rep. of " TOTAL"                   7      4      6      3

Replacing Missing Values

climate_filtered <- climate_filtered %>% 
  replace_na(list(`2020` = 0, `2021` = 0, `2022` = 0, `2023` = 0))

After removing unnecessary columns and rows, I reshaped the dataset so that each row is one country-year and each disaster type is its own column. This layout is easy to encode as transactions for both Apriori and Eclat.

Pivoting Data

climate_long <- climate_filtered %>%
  pivot_longer(cols = `2020`:`2023`, names_to = "Year", values_to = "Value")

climate_wide <- climate_long %>%
  pivot_wider(names_from = Disaster, values_from = Value)

head(climate_wide)
## # A tibble: 6 × 8
## # Groups:   Country [2]
##   Country Year  ` Drought` ` Extreme temperature` ` Flood` ` Landslide` ` Storm`
##   <chr>   <chr>      <dbl>                  <dbl>    <dbl>        <dbl>    <dbl>
## 1 Afghan… 2020           0                      0        5            1        1
## 2 Afghan… 2021           1                      0        2            1        0
## 3 Afghan… 2022           0                      0        5            1        0
## 4 Afghan… 2023           0                      1        2            0        0
## 5 Albania 2020           0                      0        0            0        0
## 6 Albania 2021           0                      0        1            0        0
## # ℹ 1 more variable: ` Wildfire` <dbl>

Encoding Data

climate_ready <- climate_wide
# Column names keep the leading space left by separate(), e.g. " Drought".
# Each count is recoded into a label that will become one item per transaction.
# Note: only a count of exactly 1 is labelled "Drought".
climate_ready$' Drought' <- ifelse(climate_ready$' Drought' == 1, "Drought", "No Drought")
# Note: only counts of exactly 1 or 2 get an "Extr Temp" label.
climate_ready$' Extreme temperature' <- ifelse(climate_ready$' Extreme temperature' == 2, "Extr Temp 2", ifelse(climate_ready$' Extreme temperature' == 1, "Extr Temp 1", "No Extr Temp"))
# For the remaining types, counts of 2 or more are all labelled "a few".
climate_ready$' Flood' <- ifelse(climate_ready$' Flood' == 1, "Flood 1", ifelse(climate_ready$' Flood' > 1, "Flood a few", "No Flood"))
climate_ready$' Landslide' <- ifelse(climate_ready$' Landslide' == 1, "Landslide 1", ifelse(climate_ready$' Landslide' > 1, "Landslide a few", "No Landslide"))
climate_ready$' Storm' <- ifelse(climate_ready$' Storm' == 1, "Storm 1", ifelse(climate_ready$' Storm' > 1, "Storm a few", "No Storm"))
climate_ready$' Wildfire' <- ifelse(climate_ready$' Wildfire' == 1, "Wildfire 1", ifelse(climate_ready$' Wildfire' > 1, "Wildfire a few", "No Wildfire"))
# Keep only the six encoded disaster columns (drop Country and Year).
data <- climate_ready[,3:8]


head(data)
## # A tibble: 6 × 6
##   ` Drought` ` Extreme temperature` ` Flood`   ` Landslide` ` Storm` ` Wildfire`
##   <chr>      <chr>                  <chr>      <chr>        <chr>    <chr>      
## 1 No Drought No Extr Temp           Flood a f… Landslide 1  Storm 1  No Wildfire
## 2 Drought    No Extr Temp           Flood a f… Landslide 1  No Storm No Wildfire
## 3 No Drought No Extr Temp           Flood a f… Landslide 1  No Storm No Wildfire
## 4 No Drought Extr Temp 1            Flood a f… No Landslide No Storm No Wildfire
## 5 No Drought No Extr Temp           No Flood   No Landslide No Storm No Wildfire
## 6 No Drought No Extr Temp           Flood 1    No Landslide No Storm No Wildfire
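
As an alternative to nested `ifelse()` calls, counts can be binned with base R's `cut()`, which states the category boundaries explicitly and does not depend on the order in which conditions are tested. This is only a sketch: the helper name `bin_counts`, the example vector, and the four-level scheme (including an "a lot" level for counts above 2) are assumptions, and using it would change the item labels and therefore the results reported in this analysis.

```r
# Hypothetical helper: bin disaster counts into labelled categories.
# Intervals are right-closed: (-Inf, 0], (0, 1], (1, 2], (2, Inf].
bin_counts <- function(x, name) {
  labels <- c(paste("No", name), paste(name, "1"),
              paste(name, "a few"), paste(name, "a lot"))
  as.character(cut(x, breaks = c(-Inf, 0, 1, 2, Inf), labels = labels))
}

flood_counts <- c(0, 1, 2, 5)  # made-up example values
bin_counts(flood_counts, "Flood")
```

With this scheme a count of exactly 2 maps to "a few" and anything larger to "a lot"; moving a boundary only requires changing the `breaks` vector.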

Exporting the Data before Applying the Algorithms

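# Write the six encoded columns to CSV, then read the file back as transactions.
# skip = 1 drops the header row so column names are not treated as items.
write.csv(data, file="DataClm.csv", row.names=FALSE)
data_tran <- read.transactions("DataClm.csv", format = "basket", sep = ",", skip = 1)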

data_tran
## transactions in sparse format with
##  104 transactions (rows) and
##  17 items (columns)

Inspecting the Data before Applying the Algorithms

size(data_tran)
##   [1] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
##  [38] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
##  [75] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
length(data_tran)
## [1] 104
LIST(head(data_tran))
## [[1]]
## [1] "Flood a few"  "Landslide 1"  "No Drought"   "No Extr Temp" "No Wildfire" 
## [6] "Storm 1"     
## 
## [[2]]
## [1] "Drought"      "Flood a few"  "Landslide 1"  "No Extr Temp" "No Storm"    
## [6] "No Wildfire" 
## 
## [[3]]
## [1] "Flood a few"  "Landslide 1"  "No Drought"   "No Extr Temp" "No Storm"    
## [6] "No Wildfire" 
## 
## [[4]]
## [1] "Extr Temp 1"  "Flood a few"  "No Drought"   "No Landslide" "No Storm"    
## [6] "No Wildfire" 
## 
## [[5]]
## [1] "No Drought"   "No Extr Temp" "No Flood"     "No Landslide" "No Storm"    
## [6] "No Wildfire" 
## 
## [[6]]
## [1] "Flood 1"      "No Drought"   "No Extr Temp" "No Landslide" "No Storm"    
## [6] "No Wildfire"
round(itemFrequency(data_tran),3)
##         Drought     Extr Temp 1     Extr Temp 2         Flood 1     Flood a few 
##           0.183           0.125           0.010           0.250           0.538 
##     Landslide 1 Landslide a few      No Drought    No Extr Temp        No Flood 
##           0.135           0.029           0.817           0.865           0.212 
##    No Landslide        No Storm     No Wildfire         Storm 1     Storm a few 
##           0.837           0.423           0.740           0.231           0.346 
##      Wildfire 1  Wildfire a few 
##           0.221           0.038
itemFrequency(data_tran, type="absolute")
##         Drought     Extr Temp 1     Extr Temp 2         Flood 1     Flood a few 
##              19              13               1              26              56 
##     Landslide 1 Landslide a few      No Drought    No Extr Temp        No Flood 
##              14               3              85              90              22 
##    No Landslide        No Storm     No Wildfire         Storm 1     Storm a few 
##              87              44              77              24              36 
##      Wildfire 1  Wildfire a few 
##              23               4

Cross Table Analysis

ctab <- crossTable(data_tran, measure="count", sort=TRUE)
stab <- crossTable(data_tran, measure="support", sort=TRUE)
ltab <- crossTable(data_tran, measure="lift", sort=TRUE)
ctab
##                 No Extr Temp No Landslide No Drought No Wildfire Flood a few
## No Extr Temp              90           74         74          68          47
## No Landslide              74           87         72          62          41
## No Drought                74           72         85          64          42
## No Wildfire               68           62         64          77          44
## Flood a few               47           41         42          44          56
## No Storm                  41           39         39          30          17
## Storm a few               29           30         25          26          27
## Flood 1                   23           25         24          18           0
## Storm 1                   20           18         21          21          12
## Wildfire 1                20           21         20           0           9
## No Flood                  20           21         19          15           0
## Drought                   16           15          0          13          14
## Landslide 1               13            0         10          12          12
## Extr Temp 1                0           12         10           8           8
## Wildfire a few             2            4          1           0           3
## Landslide a few            3            0          3           3           3
## Extr Temp 2                0            1          1           1           1
##                 No Storm Storm a few Flood 1 Storm 1 Wildfire 1 No Flood
## No Extr Temp          41          29      23      20         20       20
## No Landslide          39          30      25      18         21       21
## No Drought            39          25      24      21         20       19
## No Wildfire           30          26      18      21          0       15
## Flood a few           17          27       0      12          9        0
## No Storm              44           0      11       0         13       16
## Storm a few            0          36       6       0          7        3
## Flood 1               11           6      26       9          8        0
## Storm 1                0           0       9      24          3        3
## Wildfire 1            13           7       8       3         23        6
## No Flood              16           3       0       3          6       22
## Drought                5          11       2       3          3        3
## Landslide 1            5           4       1       5          2        1
## Extr Temp 1            3           6       3       4          3        2
## Wildfire a few         1           3       0       0          0        1
## Landslide a few        0           2       0       1          0        0
## Extr Temp 2            0           1       0       0          0        0
##                 Drought Landslide 1 Extr Temp 1 Wildfire a few Landslide a few
## No Extr Temp         16          13           0              2               3
## No Landslide         15           0          12              4               0
## No Drought            0          10          10              1               3
## No Wildfire          13          12           8              0               3
## Flood a few          14          12           8              3               3
## No Storm              5           5           3              1               0
## Storm a few          11           4           6              3               2
## Flood 1               2           1           3              0               0
## Storm 1               3           5           4              0               1
## Wildfire 1            3           2           3              0               0
## No Flood              3           1           2              1               0
## Drought              19           4           3              3               0
## Landslide 1           4          14           1              0               0
## Extr Temp 1           3           1          13              2               0
## Wildfire a few        3           0           2              4               0
## Landslide a few       0           0           0              0               3
## Extr Temp 2           0           0           0              0               0
##                 Extr Temp 2
## No Extr Temp              0
## No Landslide              1
## No Drought                1
## No Wildfire               1
## Flood a few               1
## No Storm                  0
## Storm a few               1
## Flood 1                   0
## Storm 1                   0
## Wildfire 1                0
## No Flood                  0
## Drought                   0
## Landslide 1               0
## Extr Temp 1               0
## Wildfire a few            0
## Landslide a few           0
## Extr Temp 2               1
stab
##                 No Extr Temp No Landslide  No Drought No Wildfire Flood a few
## No Extr Temp      0.86538462  0.711538462 0.711538462 0.653846154 0.451923077
## No Landslide      0.71153846  0.836538462 0.692307692 0.596153846 0.394230769
## No Drought        0.71153846  0.692307692 0.817307692 0.615384615 0.403846154
## No Wildfire       0.65384615  0.596153846 0.615384615 0.740384615 0.423076923
## Flood a few       0.45192308  0.394230769 0.403846154 0.423076923 0.538461538
## No Storm          0.39423077  0.375000000 0.375000000 0.288461538 0.163461538
## Storm a few       0.27884615  0.288461538 0.240384615 0.250000000 0.259615385
## Flood 1           0.22115385  0.240384615 0.230769231 0.173076923 0.000000000
## Storm 1           0.19230769  0.173076923 0.201923077 0.201923077 0.115384615
## Wildfire 1        0.19230769  0.201923077 0.192307692 0.000000000 0.086538462
## No Flood          0.19230769  0.201923077 0.182692308 0.144230769 0.000000000
## Drought           0.15384615  0.144230769 0.000000000 0.125000000 0.134615385
## Landslide 1       0.12500000  0.000000000 0.096153846 0.115384615 0.115384615
## Extr Temp 1       0.00000000  0.115384615 0.096153846 0.076923077 0.076923077
## Wildfire a few    0.01923077  0.038461538 0.009615385 0.000000000 0.028846154
## Landslide a few   0.02884615  0.000000000 0.028846154 0.028846154 0.028846154
## Extr Temp 2       0.00000000  0.009615385 0.009615385 0.009615385 0.009615385
##                    No Storm Storm a few     Flood 1     Storm 1 Wildfire 1
## No Extr Temp    0.394230769 0.278846154 0.221153846 0.192307692 0.19230769
## No Landslide    0.375000000 0.288461538 0.240384615 0.173076923 0.20192308
## No Drought      0.375000000 0.240384615 0.230769231 0.201923077 0.19230769
## No Wildfire     0.288461538 0.250000000 0.173076923 0.201923077 0.00000000
## Flood a few     0.163461538 0.259615385 0.000000000 0.115384615 0.08653846
## No Storm        0.423076923 0.000000000 0.105769231 0.000000000 0.12500000
## Storm a few     0.000000000 0.346153846 0.057692308 0.000000000 0.06730769
## Flood 1         0.105769231 0.057692308 0.250000000 0.086538462 0.07692308
## Storm 1         0.000000000 0.000000000 0.086538462 0.230769231 0.02884615
## Wildfire 1      0.125000000 0.067307692 0.076923077 0.028846154 0.22115385
## No Flood        0.153846154 0.028846154 0.000000000 0.028846154 0.05769231
## Drought         0.048076923 0.105769231 0.019230769 0.028846154 0.02884615
## Landslide 1     0.048076923 0.038461538 0.009615385 0.048076923 0.01923077
## Extr Temp 1     0.028846154 0.057692308 0.028846154 0.038461538 0.02884615
## Wildfire a few  0.009615385 0.028846154 0.000000000 0.000000000 0.00000000
## Landslide a few 0.000000000 0.019230769 0.000000000 0.009615385 0.00000000
## Extr Temp 2     0.000000000 0.009615385 0.000000000 0.000000000 0.00000000
##                    No Flood    Drought Landslide 1 Extr Temp 1 Wildfire a few
## No Extr Temp    0.192307692 0.15384615 0.125000000 0.000000000    0.019230769
## No Landslide    0.201923077 0.14423077 0.000000000 0.115384615    0.038461538
## No Drought      0.182692308 0.00000000 0.096153846 0.096153846    0.009615385
## No Wildfire     0.144230769 0.12500000 0.115384615 0.076923077    0.000000000
## Flood a few     0.000000000 0.13461538 0.115384615 0.076923077    0.028846154
## No Storm        0.153846154 0.04807692 0.048076923 0.028846154    0.009615385
## Storm a few     0.028846154 0.10576923 0.038461538 0.057692308    0.028846154
## Flood 1         0.000000000 0.01923077 0.009615385 0.028846154    0.000000000
## Storm 1         0.028846154 0.02884615 0.048076923 0.038461538    0.000000000
## Wildfire 1      0.057692308 0.02884615 0.019230769 0.028846154    0.000000000
## No Flood        0.211538462 0.02884615 0.009615385 0.019230769    0.009615385
## Drought         0.028846154 0.18269231 0.038461538 0.028846154    0.028846154
## Landslide 1     0.009615385 0.03846154 0.134615385 0.009615385    0.000000000
## Extr Temp 1     0.019230769 0.02884615 0.009615385 0.125000000    0.019230769
## Wildfire a few  0.009615385 0.02884615 0.000000000 0.019230769    0.038461538
## Landslide a few 0.000000000 0.00000000 0.000000000 0.000000000    0.000000000
## Extr Temp 2     0.000000000 0.00000000 0.000000000 0.000000000    0.000000000
##                 Landslide a few Extr Temp 2
## No Extr Temp        0.028846154 0.000000000
## No Landslide        0.000000000 0.009615385
## No Drought          0.028846154 0.009615385
## No Wildfire         0.028846154 0.009615385
## Flood a few         0.028846154 0.009615385
## No Storm            0.000000000 0.000000000
## Storm a few         0.019230769 0.009615385
## Flood 1             0.000000000 0.000000000
## Storm 1             0.009615385 0.000000000
## Wildfire 1          0.000000000 0.000000000
## No Flood            0.000000000 0.000000000
## Drought             0.000000000 0.000000000
## Landslide 1         0.000000000 0.000000000
## Extr Temp 1         0.000000000 0.000000000
## Wildfire a few      0.000000000 0.000000000
## Landslide a few     0.028846154 0.000000000
## Extr Temp 2         0.000000000 0.009615385
ltab
##                 No Extr Temp No Landslide No Drought No Wildfire Flood a few
## No Extr Temp              NA    0.9828863  1.0060131   1.0204906   0.9698413
## No Landslide       0.9828863           NA  1.0125761   0.9625317   0.8752053
## No Drought         1.0060131    1.0125761         NA   1.0169595   0.9176471
## No Wildfire        1.0204906    0.9625317  1.0169595          NA   1.0612245
## Flood a few        0.9698413    0.8752053  0.9176471   1.0612245          NA
## No Storm           1.0767677    1.0595611  1.0844920   0.9208973   0.7175325
## Storm a few        0.9308642    0.9961686  0.8496732   0.9754690   1.3928571
## Flood 1            1.0222222    1.1494253  1.1294118   0.9350649   0.0000000
## Storm 1            0.9629630    0.8965517  1.0705882   1.1818182   0.9285714
## Wildfire 1         1.0048309    1.0914543  1.0639386   0.0000000   0.7267081
## No Flood           1.0505051    1.1410658  1.0566845   0.9208973   0.0000000
## Drought            0.9730994    0.9437387  0.0000000   0.9241285   1.3684211
## Landslide 1        1.0730159    0.0000000  0.8739496   1.1576994   1.5918367
## Extr Temp 1        0.0000000    1.1034483  0.9411765   0.8311688   1.1428571
## Wildfire a few     0.5777778    1.1954023  0.3058824   0.0000000   1.3928571
## Landslide a few    1.1555556    0.0000000  1.2235294   1.3506494   1.8571429
## Extr Temp 2        0.0000000    1.1954023  1.2235294   1.3506494   1.8571429
##                  No Storm Storm a few   Flood 1   Storm 1 Wildfire 1  No Flood
## No Extr Temp    1.0767677   0.9308642 1.0222222 0.9629630  1.0048309 1.0505051
## No Landslide    1.0595611   0.9961686 1.1494253 0.8965517  1.0914543 1.1410658
## No Drought      1.0844920   0.8496732 1.1294118 1.0705882  1.0639386 1.0566845
## No Wildfire     0.9208973   0.9754690 0.9350649 1.1818182  0.0000000 0.9208973
## Flood a few     0.7175325   1.3928571 0.0000000 0.9285714  0.7267081 0.0000000
## No Storm               NA   0.0000000 1.0000000 0.0000000  1.3359684 1.7190083
## Storm a few     0.0000000          NA 0.6666667 0.0000000  0.8792271 0.3939394
## Flood 1         1.0000000   0.6666667        NA 1.5000000  1.3913043 0.0000000
## Storm 1         0.0000000   0.0000000 1.5000000        NA  0.5652174 0.5909091
## Wildfire 1      1.3359684   0.8792271 1.3913043 0.5652174         NA 1.2332016
## No Flood        1.7190083   0.3939394 0.0000000 0.5909091  1.2332016        NA
## Drought         0.6220096   1.6725146 0.4210526 0.6842105  0.7139588 0.7464115
## Landslide 1     0.8441558   0.8253968 0.2857143 1.5476190  0.6459627 0.3376623
## Extr Temp 1     0.5454545   1.3333333 0.9230769 1.3333333  1.0434783 0.7272727
## Wildfire a few  0.5909091   2.1666667 0.0000000 0.0000000  0.0000000 1.1818182
## Landslide a few 0.0000000   1.9259259 0.0000000 1.4444444  0.0000000 0.0000000
## Extr Temp 2     0.0000000   2.8888889 0.0000000 0.0000000  0.0000000 0.0000000
##                   Drought Landslide 1 Extr Temp 1 Wildfire a few
## No Extr Temp    0.9730994   1.0730159   0.0000000      0.5777778
## No Landslide    0.9437387   0.0000000   1.1034483      1.1954023
## No Drought      0.0000000   0.8739496   0.9411765      0.3058824
## No Wildfire     0.9241285   1.1576994   0.8311688      0.0000000
## Flood a few     1.3684211   1.5918367   1.1428571      1.3928571
## No Storm        0.6220096   0.8441558   0.5454545      0.5909091
## Storm a few     1.6725146   0.8253968   1.3333333      2.1666667
## Flood 1         0.4210526   0.2857143   0.9230769      0.0000000
## Storm 1         0.6842105   1.5476190   1.3333333      0.0000000
## Wildfire 1      0.7139588   0.6459627   1.0434783      0.0000000
## No Flood        0.7464115   0.3376623   0.7272727      1.1818182
## Drought                NA   1.5639098   1.2631579      4.1052632
## Landslide 1     1.5639098          NA   0.5714286      0.0000000
## Extr Temp 1     1.2631579   0.5714286          NA      4.0000000
## Wildfire a few  4.1052632   0.0000000   4.0000000             NA
## Landslide a few 0.0000000   0.0000000   0.0000000      0.0000000
## Extr Temp 2     0.0000000   0.0000000   0.0000000      0.0000000
##                 Landslide a few Extr Temp 2
## No Extr Temp           1.155556    0.000000
## No Landslide           0.000000    1.195402
## No Drought             1.223529    1.223529
## No Wildfire            1.350649    1.350649
## Flood a few            1.857143    1.857143
## No Storm               0.000000    0.000000
## Storm a few            1.925926    2.888889
## Flood 1                0.000000    0.000000
## Storm 1                1.444444    0.000000
## Wildfire 1             0.000000    0.000000
## No Flood               0.000000    0.000000
## Drought                0.000000    0.000000
## Landslide 1            0.000000    0.000000
## Extr Temp 1            0.000000    0.000000
## Wildfire a few         0.000000    0.000000
## Landslide a few              NA    0.000000
## Extr Temp 2            0.000000          NA
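
The lift entries can be checked by hand from the count table. As a sketch (the helper name `pair_lift` is hypothetical), the lift of a pair of items is their joint support divided by the product of their individual supports; the counts below are read from `ctab`.

```r
n <- 104  # number of transactions

# Hypothetical helper: lift of an item pair computed from raw counts.
pair_lift <- function(n_both, n_a, n_b, n) {
  (n_both / n) / ((n_a / n) * (n_b / n))
}

# "Drought" (19 transactions) with "Wildfire a few" (4), co-occurring in 3.
pair_lift(3, 19, 4, n)    # about 4.105, matching ltab

# "Flood a few" (56) with "Storm a few" (36), co-occurring in 27.
pair_lift(27, 56, 36, n)  # about 1.393, matching ltab
```

A lift near 1 means the two items co-occur about as often as expected under independence. The large value for the first pair rests on only three transactions, so it should be read with caution.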

After examining the dataset, I realized that the insights I was hoping for might not be as meaningful as I expected. The dataset seemed to lack the complexity needed to uncover deeper patterns. Most lift values between the more frequent items are close to 1, which indicates near-independence, and the largest values (for example about 4.1 for Drought with Wildfire a few) rest on only three co-occurrences out of 104 transactions. Although the data didn’t offer the advanced insights I was aiming for, I still proceeded with the analysis to explore the basic patterns it could provide.

ECLAT Algorithm

freq.items <- eclat(data_tran, parameter=list(supp=0.25, maxlen=15))
## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE    0.25      1     15 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 26 
## 
## create itemset ... 
## set transactions ...[17 item(s), 104 transaction(s)] done [0.00s].
## sorting and recoding items ... [8 item(s)] done [0.00s].
## creating bit matrix ... [8 row(s), 104 column(s)] done [0.00s].
## writing  ... [45 set(s)] done [0.00s].
## Creating S4 object  ... done [0.00s].
inspect(freq.items)
##      items                                                 support   count
## [1]  {No Extr Temp, Storm a few}                           0.2788462 29   
## [2]  {No Landslide, Storm a few}                           0.2884615 30   
## [3]  {No Wildfire, Storm a few}                            0.2500000 26   
## [4]  {Flood a few, Storm a few}                            0.2596154 27   
## [5]  {No Extr Temp, No Storm, No Wildfire}                 0.2692308 28   
## [6]  {No Landslide, No Storm, No Wildfire}                 0.2500000 26   
## [7]  {No Drought, No Storm, No Wildfire}                   0.2500000 26   
## [8]  {No Drought, No Extr Temp, No Landslide, No Storm}    0.3076923 32   
## [9]  {No Drought, No Extr Temp, No Storm}                  0.3461538 36   
## [10] {No Drought, No Landslide, No Storm}                  0.3365385 35   
## [11] {No Extr Temp, No Landslide, No Storm}                0.3461538 36   
## [12] {No Extr Temp, No Storm}                              0.3942308 41   
## [13] {No Landslide, No Storm}                              0.3750000 39   
## [14] {No Drought, No Storm}                                0.3750000 39   
## [15] {No Storm, No Wildfire}                               0.2884615 30   
## [16] {Flood a few, No Drought, No Extr Temp, No Wildfire}  0.2692308 28   
## [17] {Flood a few, No Extr Temp, No Wildfire}              0.3557692 37   
## [18] {Flood a few, No Landslide, No Wildfire}              0.2980769 31   
## [19] {Flood a few, No Drought, No Wildfire}                0.3365385 35   
## [20] {Flood a few, No Drought, No Extr Temp}               0.3365385 35   
## [21] {Flood a few, No Drought, No Landslide}               0.2884615 30   
## [22] {Flood a few, No Extr Temp, No Landslide}             0.3173077 33   
## [23] {Flood a few, No Extr Temp}                           0.4519231 47   
## [24] {Flood a few, No Landslide}                           0.3942308 41   
## [25] {Flood a few, No Drought}                             0.4038462 42   
## [26] {Flood a few, No Wildfire}                            0.4230769 44   
## [27] {No Drought, No Extr Temp, No Landslide, No Wildfire} 0.4423077 46   
## [28] {No Drought, No Extr Temp, No Wildfire}               0.5384615 56   
## [29] {No Drought, No Landslide, No Wildfire}               0.5096154 53   
## [30] {No Extr Temp, No Landslide, No Wildfire}             0.5192308 54   
## [31] {No Extr Temp, No Wildfire}                           0.6538462 68   
## [32] {No Landslide, No Wildfire}                           0.5961538 62   
## [33] {No Drought, No Wildfire}                             0.6153846 64   
## [34] {No Drought, No Extr Temp, No Landslide}              0.5961538 62   
## [35] {No Drought, No Extr Temp}                            0.7115385 74   
## [36] {No Drought, No Landslide}                            0.6923077 72   
## [37] {No Extr Temp, No Landslide}                          0.7115385 74   
## [38] {No Extr Temp}                                        0.8653846 90   
## [39] {No Landslide}                                        0.8365385 87   
## [40] {No Drought}                                          0.8173077 85   
## [41] {No Wildfire}                                         0.7403846 77   
## [42] {Flood a few}                                         0.5384615 56   
## [43] {No Storm}                                            0.4230769 44   
## [44] {Storm a few}                                         0.3461538 36   
## [45] {Flood 1}                                             0.2500000 26
round(support(items(freq.items), data_tran), 2)
##  [1] 0.28 0.29 0.25 0.26 0.27 0.25 0.25 0.31 0.35 0.34 0.35 0.39 0.38 0.38 0.29
## [16] 0.27 0.36 0.30 0.34 0.34 0.29 0.32 0.45 0.39 0.40 0.42 0.44 0.54 0.51 0.52
## [31] 0.65 0.60 0.62 0.60 0.71 0.69 0.71 0.87 0.84 0.82 0.74 0.54 0.42 0.35 0.25
freq.rules <- ruleInduction(freq.items, data_tran, confidence=0.9)
freq.rules
## set of 5 rules

Apriori Algorithm

Before settling on thresholds, I generated a large number of rules with deliberately low limits (support 0.001, confidence 0.5) and plotted them by support and confidence. The plot shows how different combinations of the two parameters affect the resulting rules, which helped me pick thresholds that balance the number of rules against their relevance.

rules_AP <- apriori(data_tran, parameter = list(support = 0.001, confidence = 0.5))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 0 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[17 item(s), 104 transaction(s)] done [0.00s].
## sorting and recoding items ... [17 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.00s].
## writing ... [2122 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
plot(rules_AP, measure = c("support", "confidence"), shading = "lift")

Sorting and Displaying Rules

Based on the plot, a minimum support of 0.2 and a minimum confidence of 0.8 seemed most suitable.

rules.ap_norm <- apriori(data_tran, parameter=list(supp=0.2, conf=0.8))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5     0.2      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 20 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[17 item(s), 104 transaction(s)] done [0.00s].
## sorting and recoding items ... [11 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [78 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules.by.conf <- sort(rules.ap_norm, by="confidence", decreasing=TRUE)
inspect(head(rules.by.conf))
##     lhs                        rhs            support   confidence coverage 
## [1] {Flood 1}               => {No Landslide} 0.2403846 0.9615385  0.2500000
## [2] {Flood 1, No Drought}   => {No Landslide} 0.2211538 0.9583333  0.2307692
## [3] {Flood 1, No Extr Temp} => {No Landslide} 0.2115385 0.9565217  0.2211538
## [4] {No Flood}              => {No Landslide} 0.2019231 0.9545455  0.2115385
## [5] {No Storm, No Wildfire} => {No Extr Temp} 0.2692308 0.9333333  0.2884615
## [6] {No Storm}              => {No Extr Temp} 0.3942308 0.9318182  0.4230769
##     lift     count
## [1] 1.149425 25   
## [2] 1.145594 23   
## [3] 1.143428 22   
## [4] 1.141066 21   
## [5] 1.078519 28   
## [6] 1.076768 41
rules.by.lift <- sort(rules.ap_norm, by="lift", decreasing=TRUE)
inspect(head(rules.by.lift))
##     lhs                        rhs            support   confidence coverage 
## [1] {Storm 1}               => {No Wildfire}  0.2019231 0.8750000  0.2307692
## [2] {Flood 1}               => {No Landslide} 0.2403846 0.9615385  0.2500000
## [3] {Flood 1, No Drought}   => {No Landslide} 0.2211538 0.9583333  0.2307692
## [4] {Flood 1, No Extr Temp} => {No Landslide} 0.2115385 0.9565217  0.2211538
## [5] {No Flood}              => {No Landslide} 0.2019231 0.9545455  0.2115385
## [6] {Flood 1}               => {No Drought}   0.2307692 0.9230769  0.2500000
##     lift     count
## [1] 1.181818 21   
## [2] 1.149425 25   
## [3] 1.145594 23   
## [4] 1.143428 22   
## [5] 1.141066 21   
## [6] 1.129412 24
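
The quality measures in these tables follow directly from the item counts reported earlier. A small sketch, with variable names chosen only for illustration, reproduces the row for {Flood 1} => {No Landslide}:

```r
n         <- 104  # transactions
n_flood1  <- 26   # transactions containing "Flood 1"
n_no_land <- 87   # transactions containing "No Landslide"
n_both    <- 25   # transactions containing both (from ctab)

support    <- n_both / n            # share of all transactions with both items
confidence <- n_both / n_flood1     # share of "Flood 1" rows with "No Landslide"
coverage   <- n_flood1 / n          # support of the left-hand side
lift       <- confidence / (n_no_land / n)

round(c(support = support, confidence = confidence,
        coverage = coverage, lift = lift), 7)
```

The confidence is high mainly because "No Landslide" is present in 87 of the 104 transactions to begin with, which is why the lift stays close to 1.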

Generating Specific Rules

# Mine only rules whose right-hand side is "Flood 1".
rules.flood1 <- apriori(data=data_tran, parameter=list(supp=0.02, conf = 0.6), appearance=list(default="lhs", rhs="Flood 1"), control=list(verbose=FALSE))
rules.flood1.byconf <- sort(rules.flood1, by="confidence", decreasing=TRUE)
inspect(head(rules.flood1.byconf))
##     lhs                rhs          support confidence   coverage lift count
## [1] {Storm 1,                                                               
##      Wildfire 1}    => {Flood 1} 0.02884615          1 0.02884615    4     3
## [2] {No Drought,                                                            
##      Storm 1,                                                               
##      Wildfire 1}    => {Flood 1} 0.02884615          1 0.02884615    4     3
## [3] {No Landslide,                                                          
##      Storm 1,                                                               
##      Wildfire 1}    => {Flood 1} 0.02884615          1 0.02884615    4     3
## [4] {No Drought,                                                            
##      No Landslide,                                                          
##      Storm 1,                                                               
##      Wildfire 1}    => {Flood 1} 0.02884615          1 0.02884615    4     3

Visualization of Items with a Dendrogram

trans.sel <- data_tran[, itemFrequency(data_tran) > 0.05]
d.jac.i <- dissimilarity(trans.sel, which="items")
plot(hclust(d.jac.i, method="ward.D2"), main="Dendrogram for Items")
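
For intuition about what the dendrogram clusters, here is a hand computation of the Jaccard dissimilarity between two items. I believe Jaccard is the default measure used by `dissimilarity()` in arules, but treat that as an assumption. The helper name `jaccard_dissim` is hypothetical; the counts come from the cross table.

```r
# Jaccard dissimilarity between two items from transaction counts:
# 1 - (transactions with both) / (transactions with at least one)
jaccard_dissim <- function(n_both, n_a, n_b) {
  n_union <- n_a + n_b - n_both
  1 - n_both / n_union
}

# "Flood a few" (56) and "Storm a few" (36) co-occur in 27 transactions.
jaccard_dissim(27, 56, 36)

# "No Storm" (44) and "Storm a few" (36) never co-occur, so they are
# maximally dissimilar.
jaccard_dissim(0, 44, 36)
```

Items with a small dissimilarity are merged early in the tree, while mutually exclusive labels of the same disaster type can only join near the top.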

Other Visualizations

colors <- brewer.pal(8, "Pastel1")
itemFrequencyPlot(data_tran, topN = 10, type = "absolute", col = colors, main = "Item Frequency")

plot(rules.ap_norm, method="graph", col = colors, control=list(type="items"))
## Available control parameters (with default values):
## layout    =  stress
## circular  =  FALSE
## ggraphdots    =  NULL
## edges     =  <environment>
## nodes     =  <environment>
## nodetext  =  <environment>
## colors    =  c("#EE0000FF", "#EEEEEEFF")
## engine    =  ggplot2
## max   =  100
## verbose   =  FALSE

plot(rules.ap_norm, method="paracoord", col = colors, control=list(reorder=TRUE))

Conclusion

By applying both algorithms, I was able to compare them and look for patterns. However, on a dataset this small (104 transactions, 17 items) both ran in negligible time, so I didn’t observe any significant difference in their performance, and the rules I found were mostly weak, with lift close to 1. That said, the process allowed me to visualize and analyze some simple dependencies in the data. For instance, between 2020 and 2023 extreme temperature events were recorded in only 14 of the 104 country-years, while floods were recorded in 82 and storms in 60. This observation, though simple, provided a glimpse into certain trends and anomalies that could be further explored with more refined data or different methods.