Objective: In this research project, I aim to analyze the factors influencing the indicator of the Sustainable Development Goals related to “Zero Hunger,” which focuses on ending hunger, achieving food security, improving nutrition, and promoting sustainable agriculture. I will consider countries and years as observations to understand what drives this indicator’s variation.
What are the key factors that influence progress towards achieving “Zero Hunger” in different countries over the years? Can we use machine learning methods to predict and understand the factors affecting “Zero Hunger” progress?
Step 1: Data Preparation:
Import the dataset containing information about countries, years, and the “Zero Hunger” indicator. Clean the data by addressing missing values or inconsistencies. Step 2: Feature Selection:
Identify a set of observable variables (features) that I believe may have a significant impact on the “Zero Hunger” indicator based on my research questions. Step 3: Data Analysis:
Perform exploratory data analysis (EDA) to gain insights into the relationships between the selected features and the “Zero Hunger” indicator. Utilize data visualization techniques to visually represent these relationships. Step 4: Machine Learning Methods:
Apply at least three machine learning methods to predict and interpret the factors influencing the “Zero Hunger” indicator. I’ll use R programming and common machine learning methods such as Linear Regression, Decision Trees, and Random Forests. Step 5: Model Evaluation:
Split the dataset into training and testing subsets to evaluate the performance of each machine learning method. Utilize metrics like Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared (R2) to assess model performance. Step 6: Identifying Important Variables:
Analyze feature importance or coefficients for each machine learning method to identify the most influential observable variables that affect the “Zero Hunger” indicator. Step 7: Report Writing:
library(ggplot2) # for graphics
library(dplyr) # For piping function
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(rpart) # For decision trees
library(class) # For k-NN
library(neuralnet) # For neural networks
##
## Attaching package: 'neuralnet'
## The following object is masked from 'package:dplyr':
##
## compute
#install.packages('caTools')
#install.packages('party')
#install.packages('magrittr')
#install.packages('rpart.plot')
library(caTools)
library(party)
## Loading required package: grid
## Loading required package: mvtnorm
## Loading required package: modeltools
## Loading required package: stats4
## Loading required package: strucchange
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: sandwich
##
## Attaching package: 'party'
## The following object is masked from 'package:dplyr':
##
## where
library(magrittr)
library(rpart.plot)
library(caret)
## Loading required package: lattice
#install.packages(c('neuralnet','keras','tensorflow'),dependencies = T)
df <- read.csv("AWU.csv",sep = "," ,header = TRUE)
df <- as.data.frame(df)
head(df)
## Years Target Austria Belgium Bulgaria Cyprus
## 1 2009 agricultural factor income per AWU 84.2 76.5 89.5 89.6
## 2 2010 agricultural factor income per AWU 100.0 100.0 100.0 100.0
## 3 2011 agricultural factor income per AWU 113.3 90.4 115.5 102.3
## 4 2012 agricultural factor income per AWU 106.0 108.3 133.7 105.0
## 5 2013 agricultural factor income per AWU 94.0 88.5 161.2 113.6
## 6 2014 agricultural factor income per AWU 87.2 84.2 171.1 109.0
## Croatia Slovakia Slovenia Spain Estonia Finland France Greece Hungary Ireland
## 1 109.4 65.6 86.1 94.3 60.6 90.9 70.8 96.7 85.0 88.4
## 2 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
## 3 95.4 118.6 114.0 101.2 124.3 87.6 104.6 89.1 149.3 127.5
## 4 81.7 133.6 90.7 102.7 143.2 89.2 105.3 91.5 137.9 114.6
## 5 90.5 130.3 91.1 112.9 132.6 86.2 89.5 84.3 151.6 118.8
## 6 78.3 143.3 103.2 118.6 123.7 85.0 101.8 90.0 161.1 122.5
## Italy Latvia Lithuania Luxembourg Malta Netherlands Poland Portugal
## 1 110.3 78.2 83.8 86.0 107.8 78.0 85.6 85.7
## 2 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
## 3 117.1 95.7 125.9 101.1 87.9 85.6 131.6 86.0
## 4 124.9 115.1 156.8 105.9 83.0 92.3 122.7 92.4
## 5 149.0 103.7 138.4 90.7 80.2 103.6 134.2 105.9
## 6 136.2 115.3 125.8 118.8 78.9 99.5 110.5 107.2
## CzechRepublic Romania Sweden Iceland UnitedKingdom Switzerland
## 1 84.8 77.9 77.3 143.6 102.6 103.0
## 2 100.0 100.0 100.0 100.0 100.0 100.0
## 3 134.8 129.0 102.9 109.4 116.5 106.1
## 4 133.7 96.1 102.2 120.7 110.4 105.0
## 5 135.1 113.5 92.5 97.8 119.0 110.9
## 6 155.4 123.9 101.5 155.5 117.1 121.6
# remove all rows with (x Not available )
df <- subset(df, !apply(df == "x", 1, any))
df["Target"] <- as.factor(x=df$Target)
str(df) # check the structure of the cleaned data
## 'data.frame': 34 obs. of 30 variables:
## $ Years : int 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 ...
## $ Target : Factor w/ 3 levels "agricultural factor income per AWU ",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Austria : num 84.2 100 113.3 106 94 ...
## $ Belgium : num 76.5 100 90.4 108.3 88.5 ...
## $ Bulgaria : num 89.5 100 115.5 133.7 161.2 ...
## $ Cyprus : num 89.6 100 102.3 105 113.6 ...
## $ Croatia : num 109.4 100 95.4 81.7 90.5 ...
## $ Slovakia : num 65.6 100 118.6 133.6 130.3 ...
## $ Slovenia : num 86.1 100 114 90.7 91.1 ...
## $ Spain : num 94.3 100 101.2 102.7 112.9 ...
## $ Estonia : num 60.6 100 124.3 143.2 132.6 ...
## $ Finland : num 90.9 100 87.6 89.2 86.2 85 68.4 71.6 72.8 74.8 ...
## $ France : num 70.8 100 104.6 105.3 89.5 ...
## $ Greece : num 96.7 100 89.1 91.5 84.3 ...
## $ Hungary : num 85 100 149 138 152 ...
## $ Ireland : num 88.4 100 127.5 114.6 118.8 ...
## $ Italy : num 110 100 117 125 149 ...
## $ Latvia : num 78.2 100 95.7 115.1 103.7 ...
## $ Lithuania : num 83.8 100 125.9 156.8 138.4 ...
## $ Luxembourg : num 86 100 101.1 105.9 90.7 ...
## $ Malta : num 107.8 100 87.9 83 80.2 ...
## $ Netherlands : num 78 100 85.6 92.3 103.6 ...
## $ Poland : num 85.6 100 131.6 122.7 134.2 ...
## $ Portugal : num 85.7 100 86 92.4 105.9 ...
## $ CzechRepublic: num 84.8 100 134.8 133.7 135.1 ...
## $ Romania : num 77.9 100 129 96.1 113.5 ...
## $ Sweden : num 77.3 100 102.9 102.2 92.5 ...
## $ Iceland : num 143.6 100 109.4 120.7 97.8 ...
## $ UnitedKingdom: num 103 100 116 110 119 ...
## $ Switzerland : num 103 100 106 105 111 ...
sum(is.na(df))
## [1] 0
summary(df)
## Years
## Min. :2005
## 1st Qu.:2011
## Median :2014
## Mean :2014
## 3rd Qu.:2017
## Max. :2020
## Target
## agricultural factor income per AWU :12
## area under organic farming :13
## government support to agricultural research and development: 9
##
##
##
## Austria Belgium Bulgaria Cyprus
## Min. : 4.700 Min. : 16.70 Min. : 1.70 Min. : 0.100
## 1st Qu.: 6.425 1st Qu.: 20.55 1st Qu.: 5.35 1st Qu.: 2.325
## Median : 37.300 Median : 37.20 Median : 19.05 Median : 5.950
## Mean : 45.788 Mean : 50.54 Mean : 66.47 Mean : 41.312
## 3rd Qu.: 89.675 3rd Qu.: 83.72 3rd Qu.:129.15 3rd Qu.:102.125
## Max. :113.300 Max. :108.30 Max. :244.70 Max. :128.400
## Croatia Slovakia Slovenia Spain
## Min. : 0.90 Min. : 4.900 Min. : 4.60 Min. : 4.60
## 1st Qu.: 3.45 1st Qu.: 7.425 1st Qu.: 7.50 1st Qu.: 8.95
## Median : 5.75 Median : 10.900 Median : 9.85 Median :107.80
## Mean : 41.03 Mean : 58.921 Mean : 43.39 Mean :159.61
## 3rd Qu.: 94.17 3rd Qu.:127.375 3rd Qu.: 95.97 3rd Qu.:308.68
## Max. :144.00 Max. :207.100 Max. :135.60 Max. :717.80
## Estonia Finland France Greece
## Min. : 3.10 Min. : 7.20 Min. : 6.30 Min. : 1.70
## 1st Qu.: 7.00 1st Qu.: 16.50 1st Qu.: 10.05 1st Qu.: 4.70
## Median : 9.00 Median : 72.20 Median :100.90 Median : 29.80
## Mean : 42.15 Mean : 56.81 Mean :134.58 Mean : 45.29
## 3rd Qu.: 95.35 3rd Qu.: 87.40 3rd Qu.:232.45 3rd Qu.: 89.78
## Max. :143.20 Max. :100.00 Max. :459.00 Max. :133.60
## Hungary Ireland Italy Latvia
## Min. : 6.500 Min. : 1.80 Min. : 0.800 Min. : 4.90
## 1st Qu.: 8.125 1st Qu.: 2.75 1st Qu.: 1.625 1st Qu.: 7.95
## Median : 33.850 Median : 91.05 Median :126.900 Median : 12.10
## Mean : 64.765 Mean : 70.19 Mean :123.865 Mean : 50.48
## 3rd Qu.:146.450 3rd Qu.:117.45 3rd Qu.:220.100 3rd Qu.:102.78
## Max. :193.900 Max. :155.00 Max. :440.700 Max. :191.30
## Lithuania Luxembourg Malta Netherlands
## Min. : 6.00 Min. : 0.10 Min. : 0.300 Min. : 0.10
## 1st Qu.: 8.60 1st Qu.: 0.95 1st Qu.: 1.725 1st Qu.: 0.40
## Median : 11.60 Median : 6.40 Median : 3.400 Median : 92.90
## Mean : 51.87 Mean : 39.31 Mean : 30.821 Mean : 71.53
## 3rd Qu.:110.83 3rd Qu.: 95.42 3rd Qu.: 77.000 3rd Qu.:104.05
## Max. :189.90 Max. :122.60 Max. :107.800 Max. :237.30
## Poland Portugal CzechRepublic Romania
## Min. : 2.500 Min. : 1.000 Min. : 4.30 Min. : 7.10
## 1st Qu.: 2.775 1st Qu.: 3.775 1st Qu.: 6.55 1st Qu.: 13.78
## Median : 57.650 Median : 17.750 Median : 41.60 Median : 21.30
## Mean : 63.018 Mean : 46.174 Mean : 62.56 Mean : 52.96
## 3rd Qu.:111.550 3rd Qu.: 98.100 3rd Qu.:134.53 3rd Qu.:110.12
## Max. :183.800 Max. :142.900 Max. :163.80 Max. :140.90
## Sweden Iceland UnitedKingdom Switzerland
## Min. : 0.70 Min. : 3.40 Min. : 4.2 Min. : 2.600
## 1st Qu.: 2.10 1st Qu.: 11.38 1st Qu.: 5.1 1st Qu.: 3.425
## Median : 47.45 Median : 18.75 Median :104.4 Median :108.500
## Mean : 48.79 Mean : 56.30 Mean :107.5 Mean :157.056
## 3rd Qu.: 96.35 3rd Qu.:117.80 3rd Qu.:153.5 3rd Qu.:272.050
## Max. :114.00 Max. :174.70 Max. :341.0 Max. :482.200
ggplot(df, aes(x=Years , y = Austria, color = Target)) + geom_line() +
labs(title = "Sustainable Development Goal 2 For Austria Over The Years", x = "Years", y = "Selected Indicators")
ggplot(df, aes(x=Years , y =Belgium, color = Target)) + geom_line() +
labs(title = "Sustainable Development Goal 2 For Belgium Over The Years", x = "Years", y = "Selected Indicators")
ggplot(df, aes(x=Years , y=UnitedKingdom, color = Target)) + geom_line() +
labs(title = "Sustainable Development Goal 2 For United Kingdom Over The Years", x = "Years", y = "Selected Indicators")
ggplot(df, aes(x=Years , y =Spain, color = Target)) + geom_line() +
labs(title = "Sustainable Development Goal 2 For Spain Over The Years", x = "Years", y = "Selected Indicators")
ggplot(df, aes(x=Years , y =Hungary, color = Target)) + geom_line() +
labs(title = "Sustainable Development Goal 2 For Hungary Over The Years", x = "Years", y = "Selected Indicators")
ggplot(df, aes(x=Years , y =Finland, color = Target)) + geom_line() +
labs(title = "Sustainable Development Goal 2 For Finland Over The Years", x = "Years", y = "Selected Indicators")
ggplot(df, aes(x=Years , y =France, color = Target)) + geom_line() +
labs(title = "Sustainable Development Goal 2 For France Over The Years", x = "Years", y = "Selected Indicators")
ggplot(df, aes(x=Years , y =Portugal, color = Target)) + geom_line() +
labs(title = "Sustainable Development Goal 2 For Portugal Over The Years", x = "Years", y = "Selected Indicators")
ggplot(df, aes(x=Years , y =CzechRepublic, color = Target)) + geom_line() +
labs(title = "Sustainable Development Goal 2 For CzechRepublic Over The Years", x = "Years", y = "Selected Indicators")
ggplot(df, aes(x=Years , y =Poland, color = Target)) + geom_line()+
labs(title = "Sustainable Development Goal 2 For Poland Over The Years", x = "Years", y = "Selected Indicators")
set.seed(123) # For reproducibility
train_indices <- sample(1:nrow(df))
df <- df[train_indices,2:30] # shuffle the dataset excluding the years
# Split data into training and testing sets
create_train_test <- function(data, size = 0.8, train = TRUE) {
n_row = nrow(data)
total_row = size * n_row
train_sample <- 1: total_row
if (train == TRUE) {
return (data[train_sample, ])
} else {
return (data[-train_sample, ])
}
}
# feature scaling
train_data <- create_train_test(df, 0.8, train = TRUE)
test_data <- create_train_test(df, 0.8, train = FALSE)
train_data[,2:29] <- as.data.frame(scale(train_data[,2:29]))
test_data[,2:29] <- as.data.frame(scale(test_data[,2:29]))
# Check the dimension of the data
dim(train_data)
## [1] 27 29
dim(test_data)
## [1] 7 29
prop.table(table(train_data$Target))
##
## agricultural factor income per AWU
## 0.3333333
## area under organic farming
## 0.3703704
## government support to agricultural research and development
## 0.2962963
# Check for possible NA's
sum(is.na(train_data))
## [1] 0
sum(is.na(test_data))
## [1] 0
head(train_data)
## Target Austria
## 31 government support to agricultural research and development -0.3356672
## 15 area under organic farming -1.0247729
## 19 area under organic farming -1.0018027
## 14 area under organic farming -1.0298774
## 3 agricultural factor income per AWU 1.7367542
## 10 agricultural factor income per AWU 1.3079774
## Belgium Bulgaria Cyprus Croatia Slovakia Slovenia Spain
## 31 -0.4166382 -0.5941732 -0.6519674 -0.6103252 -0.6419443 -0.7117207 1.4427283
## 15 -1.0582658 -0.7603215 -0.7529896 -0.7465624 -0.7050414 -0.7284143 -0.9298614
## 19 -1.0135753 -0.7315891 -0.7374477 -0.6972011 -0.6858379 -0.6845935 -0.9166637
## 14 -1.0678423 -0.7640692 -0.7568750 -0.7524857 -0.7036697 -0.7263276 -0.9322610
## 3 1.2847921 0.6525634 1.2285998 1.1074484 0.8531823 1.5231395 -0.3581603
## 10 1.0230336 1.8868076 1.6268604 1.6958352 1.9916733 1.9738676 -0.2057866
## Estonia Finland France Greece Hungary Ireland
## 31 -0.6861565 0.5847645 1.6413096 -0.2494989 -0.4558857 0.4548326
## 15 -0.7381847 -1.4031516 -0.9028668 -1.0050137 -0.8327199 -1.2478083
## 19 -0.6778319 -1.1973023 -0.8867888 -0.9612862 -0.8266906 -1.2350749
## 14 -0.7444281 -1.3766852 -0.9047961 -1.0098723 -0.8236759 -1.2423512
## 3 1.7654154 0.9170641 -0.2726107 1.1133429 1.3122205 1.0387511
## 10 0.8726103 0.5406539 -0.1555632 1.4850270 1.6136879 1.1096945
## Italy Latvia Lithuania Luxembourg Malta Netherlands
## 31 1.30514701 -0.6393022 -0.7114225 -0.7535354 -0.7280323 0.5856206
## 15 -1.09031200 -0.6944610 -0.7044497 -0.6628822 -0.6700548 -1.2033882
## 19 -1.08856221 -0.6464194 -0.6730719 -0.6381586 -0.6498888 -1.2068121
## 14 -1.09118689 -0.6944610 -0.6817880 -0.6834852 -0.6750964 -1.2051001
## 3 -0.07456039 0.8677797 1.3490492 1.3273669 1.4801518 0.2569223
## 10 0.11791623 1.4923201 0.9882052 1.7703313 1.3641969 0.3921680
## Poland Portugal CzechRepublic Romania Sweden Iceland
## 31 -0.1882219 -0.4917611 -0.3441579 -0.5123669 0.07890057 -0.8000068
## 15 -1.0911320 -0.8183061 -0.9457780 -0.8251643 -1.13026166 -0.7286643
## 19 -1.0875632 -0.7619382 -0.9630659 -0.7208985 -1.10407901 -0.6191619
## 14 -1.0911320 -0.8338559 -0.9302189 -0.8448371 -1.13502214 -0.7734607
## 3 1.2125378 0.8183061 1.2757214 1.5513090 1.29520352 0.9221677
## 10 1.4855521 1.6929801 1.4486007 1.7342659 1.05717946 1.2241290
## UnitedKingdom Switzerland
## 31 1.330612821 1.4615733
## 15 -1.025557436 -0.8910270
## 19 -1.020966301 -0.8936698
## 14 -1.029230344 -0.8926127
## 3 0.001020252 -0.3497863
## 10 -0.093557120 -0.2514749
# Decision Tree
tree_model <- rpart(Target~., data = train_data , control = rpart.control(maxdepth = 30,minsplit = 1))
rpart.plot(tree_model)
summary(tree_model)
## Call:
## rpart(formula = Target ~ ., data = train_data, control = rpart.control(maxdepth = 30,
## minsplit = 1))
## n= 27
##
## CP nsplit rel error xerror xstd
## 1 0.5294118 0 1.0000000 1.2941176 0.1187314
## 2 0.4705882 1 0.4705882 0.6470588 0.1501846
## 3 0.0100000 2 0.0000000 0.0000000 0.0000000
##
## Variable importance
## Austria Belgium Bulgaria Cyprus Finland Spain Croatia Slovakia
## 17 17 17 17 9 9 8 8
##
## Node number 1: 27 observations, complexity param=0.5294118
## predicted class=area under organic farming expected loss=0.6296296 P(node) =1
## class counts: 9 10 8
## probabilities: 0.333 0.370 0.296
## left son=2 (17 obs) right son=3 (10 obs)
## Primary splits:
## Austria < -0.6457648 to the right, improve=9.455338, (0 missing)
## Belgium < -0.6416867 to the right, improve=9.455338, (0 missing)
## Bulgaria < -0.6491396 to the right, improve=9.455338, (0 missing)
## Cyprus < -0.6743088 to the right, improve=9.455338, (0 missing)
## Spain < -0.6512096 to the right, improve=9.455338, (0 missing)
## Surrogate splits:
## Belgium < -0.6416867 to the right, agree=1, adj=1, (0 split)
## Bulgaria < -0.6491396 to the right, agree=1, adj=1, (0 split)
## Cyprus < -0.6743088 to the right, agree=1, adj=1, (0 split)
## Spain < -0.6512096 to the right, agree=1, adj=1, (0 split)
## Finland < -0.4959443 to the right, agree=1, adj=1, (0 split)
##
## Node number 2: 17 observations, complexity param=0.4705882
## predicted class=agricultural factor income per AWU expected loss=0.4705882 P(node) =0.6296296
## class counts: 9 0 8
## probabilities: 0.529 0.000 0.471
## left son=4 (9 obs) right son=5 (8 obs)
## Primary splits:
## Austria < 0.4670132 to the right, improve=8.470588, (0 missing)
## Belgium < 0.6447606 to the right, improve=8.470588, (0 missing)
## Bulgaria < -0.07636528 to the right, improve=8.470588, (0 missing)
## Cyprus < 0.1989504 to the right, improve=8.470588, (0 missing)
## Croatia < 0.1429284 to the right, improve=8.470588, (0 missing)
## Surrogate splits:
## Belgium < 0.6447606 to the right, agree=1, adj=1, (0 split)
## Bulgaria < -0.07636528 to the right, agree=1, adj=1, (0 split)
## Cyprus < 0.1989504 to the right, agree=1, adj=1, (0 split)
## Croatia < 0.1429284 to the right, agree=1, adj=1, (0 split)
## Slovakia < -0.2160389 to the right, agree=1, adj=1, (0 split)
##
## Node number 3: 10 observations
## predicted class=area under organic farming expected loss=0 P(node) =0.3703704
## class counts: 0 10 0
## probabilities: 0.000 1.000 0.000
##
## Node number 4: 9 observations
## predicted class=agricultural factor income per AWU expected loss=0 P(node) =0.3333333
## class counts: 9 0 0
## probabilities: 1.000 0.000 0.000
##
## Node number 5: 8 observations
## predicted class=government support to agricultural research and development expected loss=0 P(node) =0.2962963
## class counts: 0 0 8
## probabilities: 0.000 0.000 1.000
tree_model$variable.importance
## Austria Belgium Bulgaria Cyprus Finland Spain Croatia Slovakia
## 17.925926 17.925926 17.925926 17.925926 9.455338 9.455338 8.470588 8.470588
# Performance Evaluation
tree_predictions <- predict(tree_model, test_data, type = "class")
tree_accuracy <- (sum(tree_predictions == test_data$Target) / nrow(test_data)) * 100
# Print the metrics
cat("Decision Tree Accuracy:", tree_accuracy, "% \n")
## Decision Tree Accuracy: 100 %
# Choose the K-Value for model accuracy.
trainControl <- trainControl(method="repeatedcv", number=10, repeats=3)
metric <- "Accuracy"
set.seed(7)
grid <- expand.grid(.k=seq(1,25,by=1))
fit.knn <- train(Target~., data=train_data, method="knn",
metric=metric, tuneGrid=grid, trControl=trainControl)
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.00180270425553,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-1.02477289269996, -1.00180270425553,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-1.00180270425553, -1.02987737902094,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-1.02477289269996, -1.00180270425553,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
## Warning in knn3Train(train = structure(c(-0.335667239366992, -1.02477289269996,
## : k = 25 exceeds number 24 of patterns
knn.k2 <- fit.knn$bestTune
print(fit.knn)
## k-Nearest Neighbors
##
## 27 samples
## 28 predictors
## 3 classes: 'agricultural factor income per AWU ', 'area under organic farming ', 'government support to agricultural research and development'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 25, 25, 24, 24, 24, 24, ...
## Resampling results across tuning parameters:
##
## k Accuracy Kappa
## 1 1.0000000 1.00000000
## 2 1.0000000 1.00000000
## 3 1.0000000 1.00000000
## 4 1.0000000 1.00000000
## 5 1.0000000 1.00000000
## 6 1.0000000 1.00000000
## 7 1.0000000 1.00000000
## 8 1.0000000 1.00000000
## 9 1.0000000 1.00000000
## 10 1.0000000 1.00000000
## 11 1.0000000 1.00000000
## 12 1.0000000 1.00000000
## 13 1.0000000 1.00000000
## 14 0.8611111 0.78333333
## 15 0.7166667 0.55000000
## 16 0.6833333 0.50555556
## 17 0.6666667 0.48333333
## 18 0.7000000 0.52777778
## 19 0.6666667 0.48333333
## 20 0.6666667 0.48333333
## 21 0.6666667 0.48333333
## 22 0.6833333 0.50555556
## 23 0.5388889 0.29444444
## 24 0.4166667 0.07777778
## 25 0.3833333 0.00000000
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 13.
plot(fit.knn)
I found optimal k= 13, the number of closest instances to collect in
order to make a prediction.
Using the fit model to predict class for our test set, and print out the confusion matrix:
set.seed(7)
prediction <- predict(fit.knn, newdata = test_data)
cf <- confusionMatrix(prediction, test_data$Target)
print(cf)
## Confusion Matrix and Statistics
##
## Reference
## Prediction agricultural factor income per AWU
## agricultural factor income per AWU 3
## area under organic farming 0
## government support to agricultural research and development 0
## Reference
## Prediction area under organic farming
## agricultural factor income per AWU 0
## area under organic farming 3
## government support to agricultural research and development 0
## Reference
## Prediction government support to agricultural research and development
## agricultural factor income per AWU 0
## area under organic farming 0
## government support to agricultural research and development 1
##
## Overall Statistics
##
## Accuracy : 1
## 95% CI : (0.5904, 1)
## No Information Rate : 0.4286
## P-Value [Acc > NIR] : 0.002656
##
## Kappa : 1
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: agricultural factor income per AWU
## Sensitivity 1.0000
## Specificity 1.0000
## Pos Pred Value 1.0000
## Neg Pred Value 1.0000
## Prevalence 0.4286
## Detection Rate 0.4286
## Detection Prevalence 0.4286
## Balanced Accuracy 1.0000
## Class: area under organic farming
## Sensitivity 1.0000
## Specificity 1.0000
## Pos Pred Value 1.0000
## Neg Pred Value 1.0000
## Prevalence 0.4286
## Detection Rate 0.4286
## Detection Prevalence 0.4286
## Balanced Accuracy 1.0000
## Class: government support to agricultural research and development
## Sensitivity 1.0000
## Specificity 1.0000
## Pos Pred Value 1.0000
## Neg Pred Value 1.0000
## Prevalence 0.1429
## Detection Rate 0.1429
## Detection Prevalence 0.1429
## Balanced Accuracy 1.0000
With k = 13, the accuracy of model is 100%
set.seed(1234)
nn_model = neuralnet(Target~., data=train_data, linear.output = FALSE
)
plot(nn_model,rep = "best")
pred <- predict(nn_model, test_data)
labels <- c("Awu", "Auof", "Rd")
prediction_label <- data.frame(max.col(pred)) %>%
mutate(pred=labels[max.col.pred.]) %>%
select(2) %>%unlist()
table(test_data$Target, prediction_label)
## prediction_label
## Auof Awu Rd
## agricultural factor income per AWU 0 3 0
## area under organic farming 3 0 0
## government support to agricultural research and development 0 0 1
check = as.numeric(test_data$Target) == max.col(pred)
accuracy = (sum(check)/nrow(test_data))*100
# Print the metrics
cat("Neural Network Accuracy is :",accuracy, "% \n")
## Neural Network Accuracy is : 100 %