Introduction

In this analysis, I applied association rule mining with the Apriori algorithm to four economic indicators: GDP, Foreign Direct Investment (FDI), inflation, and unemployment. The objective was to examine what role foreign investment plays in a country’s economy.

The data was downloaded from https://databank.worldbank.org/ for all countries, with GDP, FDI, inflation, and unemployment as the variables.

Data preparation

if (!require(arules)) install.packages("arules", dependencies=TRUE)
## Loading required package: arules
## Loading required package: Matrix
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
if (!require(arulesViz)) install.packages("arulesViz", dependencies=TRUE)
## Loading required package: arulesViz
# Load required libraries
library(arules)
library(arulesViz)

Loading and cleaning the data

To avoid bias and misinterpretation, I kept only complete cases, i.e. rows with no missing value in any column.

data <- read.csv("gdpData.csv", stringsAsFactors = FALSE)

# Replace ".." with NA for missing values
data[data == ".."] <- NA

# Convert necessary columns to numeric
data[, 5:8] <- lapply(data[, 5:8], as.numeric)

# Remove rows with missing values
data_complete <- na.omit(data)

# Discretize each numeric variable into three equal-width bins (cut() with breaks = 3)
data_complete$GDP <- cut(data_complete[,5], breaks=3, labels=c("Low GDP", "Medium GDP", "High GDP"))
data_complete$FDI <- cut(data_complete[,6], breaks=3, labels=c("Low FDI", "Medium FDI", "High FDI"))
data_complete$Inflation <- cut(data_complete[,7], breaks=3, labels=c("Low Inflation", "Medium Inflation", "High Inflation"))
data_complete$Unemployment <- cut(data_complete[,8], breaks=3, labels=c("Low Unemployment", "Medium Unemployment", "High Unemployment"))

head(data_complete)
##    Time Time.Code           Country.Name Country.Code
## 2  2023    YR2023                Albania          ALB
## 7  2023    YR2023                Belarus          BLR
## 9  2023    YR2023 Bosnia and Herzegovina          BIH
## 10 2023    YR2023               Botswana          BWA
## 11 2023    YR2023                 Brazil          BRA
## 12 2023    YR2023           Burkina Faso          BFA
##    GDP..current.US....NY.GDP.MKTP.CD.
## 2                        2.354718e+10
## 7                        7.185738e+10
## 9                        2.751478e+10
## 10                       1.939608e+10
## 11                       2.173666e+12
## 12                       2.032462e+10
##    Foreign.direct.investment..net.inflows..BoP..current.US....BX.KLT.DINV.CD.WD.
## 2                                                                     1620982551
## 7                                                                     1992107728
## 9                                                                     1035178227
## 10                                                                     665417580
## 11                                                                   64227330466
## 12                                                                       5174270
##    Inflation..consumer.prices..annual.....FP.CPI.TOTL.ZG.
## 2                                               4.7597642
## 7                                               5.0005990
## 9                                               6.1059011
## 10                                              5.0676155
## 11                                              4.5935628
## 12                                              0.7429104
##    Unemployment..total....of.total.labor.force...national.estimate...SL.UEM.TOTL.NE.ZS.
## 2                                                                                10.669
## 7                                                                                 3.461
## 9                                                                                10.668
## 10                                                                               23.381
## 11                                                                                7.947
## 12                                                                                5.348
##        GDP        FDI     Inflation      Unemployment
## 2  Low GDP Medium FDI Low Inflation  Low Unemployment
## 7  Low GDP Medium FDI Low Inflation  Low Unemployment
## 9  Low GDP Medium FDI Low Inflation  Low Unemployment
## 10 Low GDP Medium FDI Low Inflation High Unemployment
## 11 Low GDP Medium FDI Low Inflation  Low Unemployment
## 12 Low GDP Medium FDI Low Inflation  Low Unemployment
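
A note on the binning: `cut()` with `breaks=3` divides the observed range into three intervals of equal width. For strongly right-skewed variables such as GDP and FDI, a few very large values stretch the range, so almost every country falls into the lowest interval (in the output above, even Brazil is labelled "Low GDP"). The sketch below illustrates the effect on made-up numbers and contrasts it with tercile breaks from `quantile()`. The vector `x` is hypothetical and is not taken from the dataset; quantile binning is shown only as one possible alternative, not as the method used in this analysis.

```r
# Hypothetical right-skewed values standing in for GDP (not from the dataset)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 1000)
bin_labels <- c("Low", "Medium", "High")

# Equal-width bins: the range of x is cut into three intervals of equal width
equal_width <- cut(x, breaks = 3, labels = bin_labels)

# Tercile bins: break points chosen so each bin holds about a third of the values
tercile_breaks <- quantile(x, probs = c(0, 1/3, 2/3, 1))
equal_freq <- cut(x, breaks = tercile_breaks, labels = bin_labels,
                  include.lowest = TRUE)

# Compare how many observations land in each bin
print(table(equal_width))
print(table(equal_freq))
```

With equal-width bins, nine of the ten toy values end up in "Low", whereas the tercile bins are roughly balanced. Switching the binning in the analysis itself would change every downstream count and rule, so the results below apply to the equal-width bins only.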

Converting our Data into Transactions

# Selecting categorical columns for analysis
data_trans <- data_complete[, c("GDP", "FDI", "Inflation", "Unemployment")]

# Split the data frame into a list with one single-row data frame per country
data_list <- split(data_trans, rownames(data_trans))

# Flatten each row into a character vector of item labels
data_list <- lapply(data_list, function(x) as.character(unlist(x)))

# Convert to transactions
transactions <- as(data_list, "transactions")

# Display a summary of the transactions
summary(transactions)
## transactions as itemMatrix in sparse format with
##  86 rows (elements/itemsets/transactions) and
##  11 columns (items) and a density of 0.3636364 
## 
## most frequent items:
##             Low GDP       Low Inflation          Medium FDI    Low Unemployment 
##                  85                  82                  82                  78 
## Medium Unemployment             (Other) 
##                   6                  11 
## 
## element (itemset/transaction) length distribution:
## sizes
##  4 
## 86 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       4       4       4       4       4       4 
## 
## includes extended item information - examples:
##           labels
## 1       High FDI
## 2       High GDP
## 3 High Inflation
## 
## includes extended transaction information - examples:
##   transactionID
## 1            10
## 2           103
## 3           106

Association rules using the Apriori algorithm

The minimum support is set to 5% and the minimum confidence to 80%.
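
As a reminder of what these two thresholds measure, here is a minimal base-R sketch on a hypothetical toy table (the data frame `toy` and its columns are invented for illustration and are not the World Bank data). Support is the share of all transactions that contain every item of the rule; confidence is the share of transactions containing the left-hand side that also contain the right-hand side.

```r
# Hypothetical toy transactions, one row per country (not the real dataset)
toy <- data.frame(
  low_gdp    = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  medium_fdi = c(TRUE, TRUE, TRUE, FALSE, FALSE)
)

# support({low_gdp, medium_fdi}): share of all rows containing both items
support_ab <- mean(toy$low_gdp & toy$medium_fdi)

# confidence(low_gdp => medium_fdi): among rows with low_gdp, share with medium_fdi
confidence_ab <- support_ab / mean(toy$low_gdp)

# A rule is kept only if it clears both thresholds used in this analysis
keep <- support_ab >= 0.05 && confidence_ab >= 0.8

print(c(support = support_ab, confidence = confidence_ab))
print(keep)
```

In this toy case the rule clears the support threshold but not the confidence threshold, so it would be dropped.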

rules <- apriori(transactions, parameter=list(support=0.05, confidence=0.8))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5    0.05      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 4 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[11 item(s), 86 transaction(s)] done [0.00s].
## sorting and recoding items ... [5 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [44 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
summary(rules)
## set of 44 rules
## 
## rule length distribution (lhs + rhs):sizes
##  1  2  3  4 
##  4 15 18  7 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   2.636   3.000   4.000 
## 
## summary of quality measures:
##     support          confidence        coverage            lift       
##  Min.   :0.05814   Min.   :0.8333   Min.   :0.05814   Min.   :0.8740  
##  1st Qu.:0.06977   1st Qu.:0.9144   1st Qu.:0.06977   1st Qu.:0.9981  
##  Median :0.86047   Median :0.9595   Median :0.90698   Median :1.0049  
##  Mean   :0.66068   Mean   :0.9511   Mean   :0.69345   Mean   :0.9963  
##  3rd Qu.:0.90698   3rd Qu.:1.0000   3rd Qu.:0.95349   3rd Qu.:1.0118  
##  Max.   :0.98837   Max.   :1.0000   Max.   :1.00000   Max.   :1.0488  
##      count      
##  Min.   : 5.00  
##  1st Qu.: 6.00  
##  Median :74.00  
##  Mean   :56.82  
##  3rd Qu.:78.00  
##  Max.   :85.00  
## 
## mining info:
##          data ntransactions support confidence
##  transactions            86    0.05        0.8
##                                                                              call
##  apriori(data = transactions, parameter = list(support = 0.05, confidence = 0.8))
rules_df <- as(rules, "data.frame")

Visualization of the Association Rules

## Warning: Unknown control parameters: type
## Available control parameters (with default values):
## layout    =  stress
## circular  =  FALSE
## ggraphdots    =  NULL
## edges     =  <environment>
## nodes     =  <environment>
## nodetext  =  <environment>
## colors    =  c("#EE0000FF", "#EEEEEEFF")
## engine    =  ggplot2
## max   =  100
## verbose   =  FALSE

Scatter plot of the rules

## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

From the plot above we can observe that the rules with high support and high confidence, i.e. the most reliable ones for describing the relationship between GDP, FDI, inflation, and unemployment, are clustered in the top-right corner of the plot.

Inspecting the rules sorted by lift

cat("Top 10 Association Rules:\n")
## Top 10 Association Rules:
inspect(sort(rules, by="lift")[1:10])
##      lhs                      rhs             support confidence   coverage     lift count
## [1]  {Medium Unemployment} => {Medium FDI} 0.06976744  1.0000000 0.06976744 1.048780     6
## [2]  {Low Inflation,                                                                      
##       Medium Unemployment} => {Medium FDI} 0.05813953  1.0000000 0.05813953 1.048780     5
## [3]  {Low GDP,                                                                            
##       Medium Unemployment} => {Medium FDI} 0.06976744  1.0000000 0.06976744 1.048780     6
## [4]  {Low GDP,                                                                            
##       Low Inflation,                                                                      
##       Medium Unemployment} => {Medium FDI} 0.05813953  1.0000000 0.05813953 1.048780     5
## [5]  {Medium Unemployment} => {Low GDP}    0.06976744  1.0000000 0.06976744 1.011765     6
## [6]  {Medium FDI}          => {Low GDP}    0.95348837  1.0000000 0.95348837 1.011765    82
## [7]  {Low GDP}             => {Medium FDI} 0.95348837  0.9647059 0.98837209 1.011765    82
## [8]  {Low Inflation,                                                                      
##       Medium Unemployment} => {Low GDP}    0.05813953  1.0000000 0.05813953 1.011765     5
## [9]  {Medium FDI,                                                                         
##       Medium Unemployment} => {Low GDP}    0.06976744  1.0000000 0.06976744 1.011765     6
## [10] {Low Unemployment,                                                                   
##       Medium FDI}          => {Low GDP}    0.86046512  1.0000000 0.86046512 1.011765    74
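
The lift values in this table can be reproduced from the counts alone, since lift is the rule's confidence divided by the overall frequency of its right-hand side. The sketch below does this for rules 1 and 6, using only numbers printed in the outputs above; the variable names are chosen here for illustration.

```r
n <- 86                  # total number of transactions
count_medium_fdi <- 82   # transactions containing "Medium FDI"
count_low_gdp    <- 85   # transactions containing "Low GDP"

# Rule 1: {Medium Unemployment} => {Medium FDI}
# 6 transactions contain the left-hand side, and all 6 contain "Medium FDI"
conf_rule1 <- 6 / 6
lift_rule1 <- conf_rule1 / (count_medium_fdi / n)

# Rule 6: {Medium FDI} => {Low GDP}
# 82 transactions contain the left-hand side, and all 82 contain "Low GDP"
conf_rule6 <- 82 / 82
lift_rule6 <- conf_rule6 / (count_low_gdp / n)

print(round(c(rule1 = lift_rule1, rule6 = lift_rule6), 6))
```

Both rules have a confidence of 1, yet their lift stays close to 1 because the right-hand side items already occur in almost every transaction: knowing the left-hand side adds very little information.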

Conclusion

The rule with the highest lift links Medium Unemployment to Medium FDI with a confidence of 1.0. This may be reasonable, since employment is driven by capital, and capital comes from investment. However, the association is weak: the lift is only about 1.049, slightly above 1, and the rule is supported by just 6 of the 86 countries. The data therefore confirm that the relationship exists but do not indicate a major impact.

Moreover, rule 6 ({Medium FDI} => {Low GDP}) suggests that FDI is not one of the most important drivers of GDP growth: 95.35% of cases combine Low GDP with Medium FDI. While Medium FDI is most often received by countries with Low GDP, investment does not automatically lead to economic growth.