Objective: In this research project, I aim to analyze the factors influencing the indicator of the Sustainable Development Goals related to “Zero Hunger,” which focuses on ending hunger, achieving food security, improving nutrition, and promoting sustainable agriculture. I will consider countries and years as observations to understand what drives this indicator’s variation.

Research Questions:

What are the key factors that influence progress towards achieving “Zero Hunger” in different countries over the years? Can we use machine learning methods to predict and understand the factors affecting “Zero Hunger” progress?

Methodology:

Step 1: Data Preparation:

Import the dataset containing information about countries, years, and the “Zero Hunger” indicator. Clean the data by addressing missing values or inconsistencies. Step 2: Feature Selection:

Identify a set of observable variables (features) that I believe may have a significant impact on the “Zero Hunger” indicator based on my research questions. Step 3: Data Analysis:

Perform exploratory data analysis (EDA) to gain insights into the relationships between the selected features and the “Zero Hunger” indicator. Utilize data visualization techniques to visually represent these relationships. Step 4: Machine Learning Methods:

Apply at least three machine learning methods to predict and interpret the factors influencing the “Zero Hunger” indicator. I’ll use R programming and common machine learning methods such as Linear Regression, Decision Trees, and Random Forests. Step 5: Model Evaluation:

Split the dataset into training and testing subsets to evaluate the performance of each machine learning method. Utilize metrics like Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared (R2) to assess model performance. Step 6: Identifying Important Variables:

Analyze feature importance or coefficients for each machine learning method to identify the most influential observable variables that affect the “Zero Hunger” indicator. Step 7: Report Writing:

1.0 Load the libraries:

library(ggplot2)  # for graphics
library(dplyr) # For piping function
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(rpart)        # For decision trees
library(class)        # For k-NN
library(neuralnet)    # For neural networks
## 
## Attaching package: 'neuralnet'
## The following object is masked from 'package:dplyr':
## 
##     compute
#install.packages('caTools')
#install.packages('party')
#install.packages('magrittr')
#install.packages('rpart.plot')
library(caTools)
library(party)
## Loading required package: grid
## Loading required package: mvtnorm
## Loading required package: modeltools
## Loading required package: stats4
## Loading required package: strucchange
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: sandwich
## 
## Attaching package: 'party'
## The following object is masked from 'package:dplyr':
## 
##     where
library(magrittr)
library(rpart.plot)
library(caret)
## Loading required package: lattice
#install.packages(c('neuralnet','keras','tensorflow'),dependencies = T)

2.0 Load dataset

df <- read.csv("AWU.csv",sep = "," ,header = TRUE)
df <- as.data.frame(df)
head(df)
##   Years                              Target Austria Belgium Bulgaria Cyprus
## 1  2009 agricultural factor income per AWU     84.2    76.5     89.5   89.6
## 2  2010 agricultural factor income per AWU    100.0   100.0    100.0  100.0
## 3  2011 agricultural factor income per AWU    113.3    90.4    115.5  102.3
## 4  2012 agricultural factor income per AWU    106.0   108.3    133.7  105.0
## 5  2013 agricultural factor income per AWU     94.0    88.5    161.2  113.6
## 6  2014 agricultural factor income per AWU     87.2    84.2    171.1  109.0
##   Croatia Slovakia Slovenia Spain Estonia Finland France Greece Hungary Ireland
## 1   109.4     65.6     86.1  94.3    60.6    90.9   70.8   96.7    85.0    88.4
## 2   100.0    100.0    100.0 100.0   100.0   100.0  100.0  100.0   100.0   100.0
## 3    95.4    118.6    114.0 101.2   124.3    87.6  104.6   89.1   149.3   127.5
## 4    81.7    133.6     90.7 102.7   143.2    89.2  105.3   91.5   137.9   114.6
## 5    90.5    130.3     91.1 112.9   132.6    86.2   89.5   84.3   151.6   118.8
## 6    78.3    143.3    103.2 118.6   123.7    85.0  101.8   90.0   161.1   122.5
##   Italy Latvia Lithuania Luxembourg Malta Netherlands Poland Portugal
## 1 110.3   78.2      83.8       86.0 107.8        78.0   85.6     85.7
## 2 100.0  100.0     100.0      100.0 100.0       100.0  100.0    100.0
## 3 117.1   95.7     125.9      101.1  87.9        85.6  131.6     86.0
## 4 124.9  115.1     156.8      105.9  83.0        92.3  122.7     92.4
## 5 149.0  103.7     138.4       90.7  80.2       103.6  134.2    105.9
## 6 136.2  115.3     125.8      118.8  78.9        99.5  110.5    107.2
##   CzechRepublic Romania Sweden Iceland UnitedKingdom Switzerland
## 1          84.8    77.9   77.3   143.6         102.6       103.0
## 2         100.0   100.0  100.0   100.0         100.0       100.0
## 3         134.8   129.0  102.9   109.4         116.5       106.1
## 4         133.7    96.1  102.2   120.7         110.4       105.0
## 5         135.1   113.5   92.5    97.8         119.0       110.9
## 6         155.4   123.9  101.5   155.5         117.1       121.6

3.0 Clean data by removing all empty cell

# remove all rows with (x   Not available   )
df <- subset(df, !apply(df == "x", 1, any))

df["Target"] <- as.factor(x=df$Target)
str(df)  # check the structure of the cleaned data
## 'data.frame':    34 obs. of  30 variables:
##  $ Years        : int  2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 ...
##  $ Target       : Factor w/ 3 levels "agricultural factor income per AWU ",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Austria      : num  84.2 100 113.3 106 94 ...
##  $ Belgium      : num  76.5 100 90.4 108.3 88.5 ...
##  $ Bulgaria     : num  89.5 100 115.5 133.7 161.2 ...
##  $ Cyprus       : num  89.6 100 102.3 105 113.6 ...
##  $ Croatia      : num  109.4 100 95.4 81.7 90.5 ...
##  $ Slovakia     : num  65.6 100 118.6 133.6 130.3 ...
##  $ Slovenia     : num  86.1 100 114 90.7 91.1 ...
##  $ Spain        : num  94.3 100 101.2 102.7 112.9 ...
##  $ Estonia      : num  60.6 100 124.3 143.2 132.6 ...
##  $ Finland      : num  90.9 100 87.6 89.2 86.2 85 68.4 71.6 72.8 74.8 ...
##  $ France       : num  70.8 100 104.6 105.3 89.5 ...
##  $ Greece       : num  96.7 100 89.1 91.5 84.3 ...
##  $ Hungary      : num  85 100 149 138 152 ...
##  $ Ireland      : num  88.4 100 127.5 114.6 118.8 ...
##  $ Italy        : num  110 100 117 125 149 ...
##  $ Latvia       : num  78.2 100 95.7 115.1 103.7 ...
##  $ Lithuania    : num  83.8 100 125.9 156.8 138.4 ...
##  $ Luxembourg   : num  86 100 101.1 105.9 90.7 ...
##  $ Malta        : num  107.8 100 87.9 83 80.2 ...
##  $ Netherlands  : num  78 100 85.6 92.3 103.6 ...
##  $ Poland       : num  85.6 100 131.6 122.7 134.2 ...
##  $ Portugal     : num  85.7 100 86 92.4 105.9 ...
##  $ CzechRepublic: num  84.8 100 134.8 133.7 135.1 ...
##  $ Romania      : num  77.9 100 129 96.1 113.5 ...
##  $ Sweden       : num  77.3 100 102.9 102.2 92.5 ...
##  $ Iceland      : num  143.6 100 109.4 120.7 97.8 ...
##  $ UnitedKingdom: num  103 100 116 110 119 ...
##  $ Switzerland  : num  103 100 106 105 111 ...
sum(is.na(df))
## [1] 0

4.0 Statistical summaries

summary(df)
##      Years     
##  Min.   :2005  
##  1st Qu.:2011  
##  Median :2014  
##  Mean   :2014  
##  3rd Qu.:2017  
##  Max.   :2020  
##                                                          Target  
##  agricultural factor income per AWU                         :12  
##  area under organic farming                                 :13  
##  government support to agricultural research and development: 9  
##                                                                  
##                                                                  
##                                                                  
##     Austria           Belgium          Bulgaria          Cyprus       
##  Min.   :  4.700   Min.   : 16.70   Min.   :  1.70   Min.   :  0.100  
##  1st Qu.:  6.425   1st Qu.: 20.55   1st Qu.:  5.35   1st Qu.:  2.325  
##  Median : 37.300   Median : 37.20   Median : 19.05   Median :  5.950  
##  Mean   : 45.788   Mean   : 50.54   Mean   : 66.47   Mean   : 41.312  
##  3rd Qu.: 89.675   3rd Qu.: 83.72   3rd Qu.:129.15   3rd Qu.:102.125  
##  Max.   :113.300   Max.   :108.30   Max.   :244.70   Max.   :128.400  
##     Croatia          Slovakia          Slovenia          Spain       
##  Min.   :  0.90   Min.   :  4.900   Min.   :  4.60   Min.   :  4.60  
##  1st Qu.:  3.45   1st Qu.:  7.425   1st Qu.:  7.50   1st Qu.:  8.95  
##  Median :  5.75   Median : 10.900   Median :  9.85   Median :107.80  
##  Mean   : 41.03   Mean   : 58.921   Mean   : 43.39   Mean   :159.61  
##  3rd Qu.: 94.17   3rd Qu.:127.375   3rd Qu.: 95.97   3rd Qu.:308.68  
##  Max.   :144.00   Max.   :207.100   Max.   :135.60   Max.   :717.80  
##     Estonia          Finland           France           Greece      
##  Min.   :  3.10   Min.   :  7.20   Min.   :  6.30   Min.   :  1.70  
##  1st Qu.:  7.00   1st Qu.: 16.50   1st Qu.: 10.05   1st Qu.:  4.70  
##  Median :  9.00   Median : 72.20   Median :100.90   Median : 29.80  
##  Mean   : 42.15   Mean   : 56.81   Mean   :134.58   Mean   : 45.29  
##  3rd Qu.: 95.35   3rd Qu.: 87.40   3rd Qu.:232.45   3rd Qu.: 89.78  
##  Max.   :143.20   Max.   :100.00   Max.   :459.00   Max.   :133.60  
##     Hungary           Ireland           Italy             Latvia      
##  Min.   :  6.500   Min.   :  1.80   Min.   :  0.800   Min.   :  4.90  
##  1st Qu.:  8.125   1st Qu.:  2.75   1st Qu.:  1.625   1st Qu.:  7.95  
##  Median : 33.850   Median : 91.05   Median :126.900   Median : 12.10  
##  Mean   : 64.765   Mean   : 70.19   Mean   :123.865   Mean   : 50.48  
##  3rd Qu.:146.450   3rd Qu.:117.45   3rd Qu.:220.100   3rd Qu.:102.78  
##  Max.   :193.900   Max.   :155.00   Max.   :440.700   Max.   :191.30  
##    Lithuania        Luxembourg         Malta          Netherlands    
##  Min.   :  6.00   Min.   :  0.10   Min.   :  0.300   Min.   :  0.10  
##  1st Qu.:  8.60   1st Qu.:  0.95   1st Qu.:  1.725   1st Qu.:  0.40  
##  Median : 11.60   Median :  6.40   Median :  3.400   Median : 92.90  
##  Mean   : 51.87   Mean   : 39.31   Mean   : 30.821   Mean   : 71.53  
##  3rd Qu.:110.83   3rd Qu.: 95.42   3rd Qu.: 77.000   3rd Qu.:104.05  
##  Max.   :189.90   Max.   :122.60   Max.   :107.800   Max.   :237.30  
##      Poland           Portugal       CzechRepublic       Romania      
##  Min.   :  2.500   Min.   :  1.000   Min.   :  4.30   Min.   :  7.10  
##  1st Qu.:  2.775   1st Qu.:  3.775   1st Qu.:  6.55   1st Qu.: 13.78  
##  Median : 57.650   Median : 17.750   Median : 41.60   Median : 21.30  
##  Mean   : 63.018   Mean   : 46.174   Mean   : 62.56   Mean   : 52.96  
##  3rd Qu.:111.550   3rd Qu.: 98.100   3rd Qu.:134.53   3rd Qu.:110.12  
##  Max.   :183.800   Max.   :142.900   Max.   :163.80   Max.   :140.90  
##      Sweden          Iceland       UnitedKingdom    Switzerland     
##  Min.   :  0.70   Min.   :  3.40   Min.   :  4.2   Min.   :  2.600  
##  1st Qu.:  2.10   1st Qu.: 11.38   1st Qu.:  5.1   1st Qu.:  3.425  
##  Median : 47.45   Median : 18.75   Median :104.4   Median :108.500  
##  Mean   : 48.79   Mean   : 56.30   Mean   :107.5   Mean   :157.056  
##  3rd Qu.: 96.35   3rd Qu.:117.80   3rd Qu.:153.5   3rd Qu.:272.050  
##  Max.   :114.00   Max.   :174.70   Max.   :341.0   Max.   :482.200

4.1 Graphs For Few Selected countries

4.1.1 Visualize the trend of indicators over time for Austria
ggplot(df, aes(x=Years , y = Austria, color = Target)) + geom_line() +
  labs(title = "Sustainable Development Goal 2 For Austria Over The Years", x = "Years", y = "Selected Indicators")

4.1.2 Visualize the trend of indicators over time for Belgium
ggplot(df, aes(x=Years , y =Belgium, color = Target)) + geom_line() +
  labs(title = "Sustainable Development Goal 2 For Belgium Over The Years", x = "Years", y = "Selected Indicators")

4.1.3 Visualize the trend of indicators over time for United Kingdom
ggplot(df, aes(x=Years , y=UnitedKingdom, color = Target)) + geom_line() +
  labs(title = "Sustainable Development Goal 2 For United Kingdom Over The Years", x = "Years", y = "Selected Indicators")

4.1.4 Visualize the trend of indicators over time for Spain
ggplot(df, aes(x=Years , y =Spain, color = Target)) + geom_line() +
  labs(title = "Sustainable Development Goal 2 For Spain Over The Years", x = "Years", y = "Selected Indicators")

4.1.5 Visualize the trend of indicators over time for Hungary
ggplot(df, aes(x=Years , y =Hungary, color = Target)) + geom_line() +
  labs(title = "Sustainable Development Goal 2 For Hungary Over The Years", x = "Years", y = "Selected Indicators")

4.1.6 Visualize the trend of indicators over time for Finland
ggplot(df, aes(x=Years , y =Finland, color = Target)) + geom_line() +
  labs(title = "Sustainable Development Goal 2 For Finland Over The Years", x = "Years", y = "Selected Indicators")

4.1.7 Visualize the trend of indicators over time for France
ggplot(df, aes(x=Years , y =France, color = Target)) + geom_line() +
  labs(title = "Sustainable Development Goal 2 For France Over The Years", x = "Years", y = "Selected Indicators")

4.1.8 Visualize the trend of indicators over time for Portugal
ggplot(df, aes(x=Years , y =Portugal, color = Target)) + geom_line() +
  labs(title = "Sustainable Development Goal 2 For Portugal Over The Years", x = "Years", y = "Selected Indicators")

4.1.9 Visualize the trend of indicators over time for CzechRepublic
ggplot(df, aes(x=Years , y =CzechRepublic, color = Target)) + geom_line() +
  labs(title = "Sustainable Development Goal 2 For CzechRepublic Over The Years", x = "Years", y = "Selected Indicators")

4.1.10 Visualize the trend of indicators over time for Poland
ggplot(df, aes(x=Years , y =Poland, color = Target)) + geom_line()+
  labs(title = "Sustainable Development Goal 2 For Poland Over The Years", x = "Years", y = "Selected Indicators")

5.0 Model Training

set.seed(123)  # For reproducibility
train_indices <- sample(1:nrow(df))
df <- df[train_indices,2:30]  # shuffle the dataset excluding the years

# Split data into training and testing sets
create_train_test <- function(data, size = 0.8, train = TRUE) {
    n_row = nrow(data)
    total_row = size * n_row
    train_sample <- 1: total_row
    if (train == TRUE) {
        return (data[train_sample, ])
    } else {
        return (data[-train_sample, ])
    }
}



# feature scaling 
train_data <- create_train_test(df, 0.8, train = TRUE)
test_data <- create_train_test(df, 0.8, train = FALSE)


train_data[,2:29] <- as.data.frame(scale(train_data[,2:29]))
test_data[,2:29] <- as.data.frame(scale(test_data[,2:29]))


# Check the dimension of the data 
dim(train_data)
## [1] 27 29
dim(test_data)
## [1]  7 29
prop.table(table(train_data$Target))
## 
##                         agricultural factor income per AWU  
##                                                   0.3333333 
##                                 area under organic farming  
##                                                   0.3703704 
## government support to agricultural research and development 
##                                                   0.2962963
# Check for possible NA's 
sum(is.na(train_data))
## [1] 0
sum(is.na(test_data))
## [1] 0
head(train_data)
##                                                         Target    Austria
## 31 government support to agricultural research and development -0.3356672
## 15                                 area under organic farming  -1.0247729
## 19                                 area under organic farming  -1.0018027
## 14                                 area under organic farming  -1.0298774
## 3                          agricultural factor income per AWU   1.7367542
## 10                         agricultural factor income per AWU   1.3079774
##       Belgium   Bulgaria     Cyprus    Croatia   Slovakia   Slovenia      Spain
## 31 -0.4166382 -0.5941732 -0.6519674 -0.6103252 -0.6419443 -0.7117207  1.4427283
## 15 -1.0582658 -0.7603215 -0.7529896 -0.7465624 -0.7050414 -0.7284143 -0.9298614
## 19 -1.0135753 -0.7315891 -0.7374477 -0.6972011 -0.6858379 -0.6845935 -0.9166637
## 14 -1.0678423 -0.7640692 -0.7568750 -0.7524857 -0.7036697 -0.7263276 -0.9322610
## 3   1.2847921  0.6525634  1.2285998  1.1074484  0.8531823  1.5231395 -0.3581603
## 10  1.0230336  1.8868076  1.6268604  1.6958352  1.9916733  1.9738676 -0.2057866
##       Estonia    Finland     France     Greece    Hungary    Ireland
## 31 -0.6861565  0.5847645  1.6413096 -0.2494989 -0.4558857  0.4548326
## 15 -0.7381847 -1.4031516 -0.9028668 -1.0050137 -0.8327199 -1.2478083
## 19 -0.6778319 -1.1973023 -0.8867888 -0.9612862 -0.8266906 -1.2350749
## 14 -0.7444281 -1.3766852 -0.9047961 -1.0098723 -0.8236759 -1.2423512
## 3   1.7654154  0.9170641 -0.2726107  1.1133429  1.3122205  1.0387511
## 10  0.8726103  0.5406539 -0.1555632  1.4850270  1.6136879  1.1096945
##          Italy     Latvia  Lithuania Luxembourg      Malta Netherlands
## 31  1.30514701 -0.6393022 -0.7114225 -0.7535354 -0.7280323   0.5856206
## 15 -1.09031200 -0.6944610 -0.7044497 -0.6628822 -0.6700548  -1.2033882
## 19 -1.08856221 -0.6464194 -0.6730719 -0.6381586 -0.6498888  -1.2068121
## 14 -1.09118689 -0.6944610 -0.6817880 -0.6834852 -0.6750964  -1.2051001
## 3  -0.07456039  0.8677797  1.3490492  1.3273669  1.4801518   0.2569223
## 10  0.11791623  1.4923201  0.9882052  1.7703313  1.3641969   0.3921680
##        Poland   Portugal CzechRepublic    Romania      Sweden    Iceland
## 31 -0.1882219 -0.4917611    -0.3441579 -0.5123669  0.07890057 -0.8000068
## 15 -1.0911320 -0.8183061    -0.9457780 -0.8251643 -1.13026166 -0.7286643
## 19 -1.0875632 -0.7619382    -0.9630659 -0.7208985 -1.10407901 -0.6191619
## 14 -1.0911320 -0.8338559    -0.9302189 -0.8448371 -1.13502214 -0.7734607
## 3   1.2125378  0.8183061     1.2757214  1.5513090  1.29520352  0.9221677
## 10  1.4855521  1.6929801     1.4486007  1.7342659  1.05717946  1.2241290
##    UnitedKingdom Switzerland
## 31   1.330612821   1.4615733
## 15  -1.025557436  -0.8910270
## 19  -1.020966301  -0.8936698
## 14  -1.029230344  -0.8926127
## 3    0.001020252  -0.3497863
## 10  -0.093557120  -0.2514749

5.1 Decision Tree

# Decision Tree
tree_model <- rpart(Target~., data = train_data , control = rpart.control(maxdepth = 30,minsplit = 1))
rpart.plot(tree_model)

summary(tree_model)
## Call:
## rpart(formula = Target ~ ., data = train_data, control = rpart.control(maxdepth = 30, 
##     minsplit = 1))
##   n= 27 
## 
##          CP nsplit rel error    xerror      xstd
## 1 0.5294118      0 1.0000000 1.2941176 0.1187314
## 2 0.4705882      1 0.4705882 0.6470588 0.1501846
## 3 0.0100000      2 0.0000000 0.0000000 0.0000000
## 
## Variable importance
##  Austria  Belgium Bulgaria   Cyprus  Finland    Spain  Croatia Slovakia 
##       17       17       17       17        9        9        8        8 
## 
## Node number 1: 27 observations,    complexity param=0.5294118
##   predicted class=area under organic farming                                   expected loss=0.6296296  P(node) =1
##     class counts:     9    10     8
##    probabilities: 0.333 0.370 0.296 
##   left son=2 (17 obs) right son=3 (10 obs)
##   Primary splits:
##       Austria  < -0.6457648  to the right, improve=9.455338, (0 missing)
##       Belgium  < -0.6416867  to the right, improve=9.455338, (0 missing)
##       Bulgaria < -0.6491396  to the right, improve=9.455338, (0 missing)
##       Cyprus   < -0.6743088  to the right, improve=9.455338, (0 missing)
##       Spain    < -0.6512096  to the right, improve=9.455338, (0 missing)
##   Surrogate splits:
##       Belgium  < -0.6416867  to the right, agree=1, adj=1, (0 split)
##       Bulgaria < -0.6491396  to the right, agree=1, adj=1, (0 split)
##       Cyprus   < -0.6743088  to the right, agree=1, adj=1, (0 split)
##       Spain    < -0.6512096  to the right, agree=1, adj=1, (0 split)
##       Finland  < -0.4959443  to the right, agree=1, adj=1, (0 split)
## 
## Node number 2: 17 observations,    complexity param=0.4705882
##   predicted class=agricultural factor income per AWU                           expected loss=0.4705882  P(node) =0.6296296
##     class counts:     9     0     8
##    probabilities: 0.529 0.000 0.471 
##   left son=4 (9 obs) right son=5 (8 obs)
##   Primary splits:
##       Austria  < 0.4670132   to the right, improve=8.470588, (0 missing)
##       Belgium  < 0.6447606   to the right, improve=8.470588, (0 missing)
##       Bulgaria < -0.07636528 to the right, improve=8.470588, (0 missing)
##       Cyprus   < 0.1989504   to the right, improve=8.470588, (0 missing)
##       Croatia  < 0.1429284   to the right, improve=8.470588, (0 missing)
##   Surrogate splits:
##       Belgium  < 0.6447606   to the right, agree=1, adj=1, (0 split)
##       Bulgaria < -0.07636528 to the right, agree=1, adj=1, (0 split)
##       Cyprus   < 0.1989504   to the right, agree=1, adj=1, (0 split)
##       Croatia  < 0.1429284   to the right, agree=1, adj=1, (0 split)
##       Slovakia < -0.2160389  to the right, agree=1, adj=1, (0 split)
## 
## Node number 3: 10 observations
##   predicted class=area under organic farming                                   expected loss=0  P(node) =0.3703704
##     class counts:     0    10     0
##    probabilities: 0.000 1.000 0.000 
## 
## Node number 4: 9 observations
##   predicted class=agricultural factor income per AWU                           expected loss=0  P(node) =0.3333333
##     class counts:     9     0     0
##    probabilities: 1.000 0.000 0.000 
## 
## Node number 5: 8 observations
##   predicted class=government support to agricultural research and development  expected loss=0  P(node) =0.2962963
##     class counts:     0     0     8
##    probabilities: 0.000 0.000 1.000
tree_model$variable.importance
##   Austria   Belgium  Bulgaria    Cyprus   Finland     Spain   Croatia  Slovakia 
## 17.925926 17.925926 17.925926 17.925926  9.455338  9.455338  8.470588  8.470588
# Performance Evaluation
tree_predictions <- predict(tree_model, test_data, type = "class")
tree_accuracy <- (sum(tree_predictions == test_data$Target) / nrow(test_data)) * 100

 
# Print the metrics
cat("Decision Tree Accuracy:", tree_accuracy, "% \n")
## Decision Tree Accuracy: 100 %

5.2. K nearest neighbors

# Choose the  K-Value for model accuracy.

trainControl <- trainControl(method="repeatedcv", number=10, repeats=3)
metric <- "Accuracy"

set.seed(7)
grid <- expand.grid(.k=seq(1,25,by=1))
fit.knn <- train(Target~., data=train_data, method="knn", 
                 metric=metric, tuneGrid=grid, trControl=trainControl)
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.00180270425553,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-1.02477289269996, -1.00180270425553,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-1.00180270425553, -1.02987737902094,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-1.02477289269996, -1.00180270425553,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
knn.k2 <- fit.knn$bestTune

print(fit.knn)
## k-Nearest Neighbors 
## 
## 27 samples
## 28 predictors
##  3 classes: 'agricultural factor income per AWU ', 'area under organic farming ', 'government support to agricultural research and development' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 25, 25, 24, 24, 24, 24, ... 
## Resampling results across tuning parameters:
## 
##   k   Accuracy   Kappa     
##    1  1.0000000  1.00000000
##    2  1.0000000  1.00000000
##    3  1.0000000  1.00000000
##    4  1.0000000  1.00000000
##    5  1.0000000  1.00000000
##    6  1.0000000  1.00000000
##    7  1.0000000  1.00000000
##    8  1.0000000  1.00000000
##    9  1.0000000  1.00000000
##   10  1.0000000  1.00000000
##   11  1.0000000  1.00000000
##   12  1.0000000  1.00000000
##   13  1.0000000  1.00000000
##   14  0.8611111  0.78333333
##   15  0.7166667  0.55000000
##   16  0.6833333  0.50555556
##   17  0.6666667  0.48333333
##   18  0.7000000  0.52777778
##   19  0.6666667  0.48333333
##   20  0.6666667  0.48333333
##   21  0.6666667  0.48333333
##   22  0.6833333  0.50555556
##   23  0.5388889  0.29444444
##   24  0.4166667  0.07777778
##   25  0.3833333  0.00000000
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 13.
plot(fit.knn)

I found optimal k= 13, the number of closest instances to collect in order to make a prediction.

Using the fit model to predict class for our test set, and print out the confusion matrix:

set.seed(7)
prediction <- predict(fit.knn, newdata = test_data)
cf <- confusionMatrix(prediction, test_data$Target)
print(cf)
## Confusion Matrix and Statistics
## 
##                                                              Reference
## Prediction                                                    agricultural factor income per AWU 
##   agricultural factor income per AWU                                                            3
##   area under organic farming                                                                    0
##   government support to agricultural research and development                                   0
##                                                              Reference
## Prediction                                                    area under organic farming 
##   agricultural factor income per AWU                                                    0
##   area under organic farming                                                            3
##   government support to agricultural research and development                           0
##                                                              Reference
## Prediction                                                    government support to agricultural research and development
##   agricultural factor income per AWU                                                                                    0
##   area under organic farming                                                                                            0
##   government support to agricultural research and development                                                           1
## 
## Overall Statistics
##                                      
##                Accuracy : 1          
##                  95% CI : (0.5904, 1)
##     No Information Rate : 0.4286     
##     P-Value [Acc > NIR] : 0.002656   
##                                      
##                   Kappa : 1          
##                                      
##  Mcnemar's Test P-Value : NA         
## 
## Statistics by Class:
## 
##                      Class: agricultural factor income per AWU 
## Sensitivity                                              1.0000
## Specificity                                              1.0000
## Pos Pred Value                                           1.0000
## Neg Pred Value                                           1.0000
## Prevalence                                               0.4286
## Detection Rate                                           0.4286
## Detection Prevalence                                     0.4286
## Balanced Accuracy                                        1.0000
##                      Class: area under organic farming 
## Sensitivity                                      1.0000
## Specificity                                      1.0000
## Pos Pred Value                                   1.0000
## Neg Pred Value                                   1.0000
## Prevalence                                       0.4286
## Detection Rate                                   0.4286
## Detection Prevalence                             0.4286
## Balanced Accuracy                                1.0000
##                      Class: government support to agricultural research and development
## Sensitivity                                                                      1.0000
## Specificity                                                                      1.0000
## Pos Pred Value                                                                   1.0000
## Neg Pred Value                                                                   1.0000
## Prevalence                                                                       0.1429
## Detection Rate                                                                   0.1429
## Detection Prevalence                                                             0.1429
## Balanced Accuracy                                                                1.0000

With k = 13, the accuracy of model is 100%

5.3 Neural Networks Model

set.seed(1234)
nn_model = neuralnet(Target~., data=train_data, linear.output = FALSE
)

plot(nn_model,rep = "best")

pred <- predict(nn_model, test_data)
labels <- c("Awu", "Auof", "Rd")
prediction_label <- data.frame(max.col(pred)) %>%     
mutate(pred=labels[max.col.pred.]) %>%
select(2) %>%unlist()

table(test_data$Target, prediction_label)
##                                                              prediction_label
##                                                               Auof Awu Rd
##   agricultural factor income per AWU                             0   3  0
##   area under organic farming                                     3   0  0
##   government support to agricultural research and development    0   0  1
check = as.numeric(test_data$Target) == max.col(pred)
accuracy = (sum(check)/nrow(test_data))*100

# Print the metrics
cat("Neural Network Accuracy is :",accuracy, "% \n")
## Neural Network Accuracy is : 100 %