Data Preparation

We replace missing values with the column mean. Missing LnGDPpc values are instead filled with the log of the (imputed) GDPpc.

# Read CSV
whr.df <- read.csv(file.choose(), header = TRUE)

# Read the CSV again to keep the country names for use as row names
whr.df.names <- read.csv(file.choose(), header = TRUE)

head(whr.df)
##       country LifeLadder   LnGDPpc   SocSupp  LifeExp LifeChoice
## 1 Afghanistan   4.220169  7.497288 0.5590718 49.87127  0.5225662
## 2     Albania   4.511101  9.282300 0.6384115 68.69838  0.7298189
## 3     Algeria   5.388171  9.549138 0.7481497 64.82995         NA
## 4   Argentina   6.427221        NA 0.8828191 67.44399  0.8477022
## 5     Armenia   4.325472  8.989569 0.7092183 65.40947  0.6109869
## 6   Australia   7.250080 10.696281 0.9423342 72.52163  0.9223157
##    Generosity Corruption     GDPpc
## 1  0.05739315  0.7932456  1803.145
## 2 -0.01792729  0.9010708 10746.120
## 3          NA         NA 14032.594
## 4          NA  0.8509245        NA
## 5 -0.15581442  0.9214211  8018.998
## 6  0.22308631  0.3985451 44191.221
str(whr.df)
## 'data.frame':    141 obs. of  9 variables:
##  $ country   : Factor w/ 141 levels "Afghanistan",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ LifeLadder: num  4.22 4.51 5.39 6.43 4.33 ...
##  $ LnGDPpc   : num  7.5 9.28 9.55 NA 8.99 ...
##  $ SocSupp   : num  0.559 0.638 0.748 0.883 0.709 ...
##  $ LifeExp   : num  49.9 68.7 64.8 67.4 65.4 ...
##  $ LifeChoice: num  0.523 0.73 NA 0.848 0.611 ...
##  $ Generosity: num  0.0574 -0.0179 NA NA -0.1558 ...
##  $ Corruption: num  0.793 0.901 NA 0.851 0.921 ...
##  $ GDPpc     : num  1803 10746 14033 NA 8019 ...
# Drop the country column so that only numeric variables remain
whr.df <- whr.df[,-1]

# Replacing missing data with average
whr.df$LifeExp<- ifelse(is.na(whr.df$LifeExp), mean(whr.df$LifeExp, na.rm=TRUE), whr.df$LifeExp)
whr.df$LifeChoice<- ifelse(is.na(whr.df$LifeChoice), mean(whr.df$LifeChoice, na.rm=TRUE), whr.df$LifeChoice)
whr.df$Generosity<- ifelse(is.na(whr.df$Generosity), mean(whr.df$Generosity, na.rm=TRUE), whr.df$Generosity)
whr.df$Corruption<- ifelse(is.na(whr.df$Corruption), mean(whr.df$Corruption, na.rm=TRUE), whr.df$Corruption)
whr.df$GDPpc<- ifelse(is.na(whr.df$GDPpc), mean(whr.df$GDPpc, na.rm=TRUE), whr.df$GDPpc)
whr.df$LnGDPpc<- ifelse(is.na(whr.df$LnGDPpc), log(whr.df$GDPpc), whr.df$LnGDPpc)
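
The repeated `ifelse()` calls above could also be written with a small helper. The following is only a sketch: `impute_mean` is a hypothetical function that is not part of the original notebook, and `toy` is a made-up data frame standing in for `whr.df`.

```r
# Hypothetical helper (an assumption, not in the original notebook):
# replace NA values in a numeric vector with the mean of the observed values.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Toy data standing in for whr.df; the values are made up for illustration.
toy <- data.frame(
  LifeExp    = c(50, NA, 70),
  Generosity = c(0.1, 0.3, NA)
)

# Apply the helper to the selected columns in one step.
cols <- c("LifeExp", "Generosity")
toy[cols] <- lapply(toy[cols], impute_mean)
toy
```

Listing the columns in one vector keeps the imputation rule in a single place, which makes it harder to forget a variable.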


# Plot to see the relationships between input variables
# ggcorr() and ggpairs() come from the GGally package
library(GGally)
ggcorr(whr.df[, c(1:8)], label=TRUE, cex=3)

ggpairs(whr.df, columns= c(1:8), upper = list(continuous = wrap("cor", size = 3)))

Data Transformation

From the plots, we can see that GDP per capita is strongly positively skewed, while social support, life choice and corruption are strongly negatively skewed. To bring the distributions closer to normal, we use a logarithmic transformation for GDP per capita and power transformations (square and cube) for social support, life choice and corruption.

We apply the power transformations below and inspect a histogram of each result. The original data is very skewed, which would affect the accuracy of the results.

whr.df$SocSupp_sq<-whr.df$SocSupp^2
hist(whr.df$SocSupp_sq)

whr.df$SocSupp_3<-whr.df$SocSupp^3
hist(whr.df$SocSupp_3)

whr.df$LifeChoice_sq<-whr.df$LifeChoice^2
hist(whr.df$LifeChoice_sq)

whr.df$Corruption_sq<-whr.df$Corruption^2
hist(whr.df$Corruption_sq)

whr.df$Corruption_3<-whr.df$Corruption^3
hist(whr.df$Corruption_3)

ggpairs(whr.df, columns= c(1,2,4,6,10,11,13), upper = list(continuous = wrap("cor", size = 3)))

As a result, the data looks more normally distributed.
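
The visual impression from the histograms can also be checked numerically. The sketch below uses a hand-written `skewness` helper (base R has no built-in one) and simulated left-skewed data, not the actual survey columns, so it only illustrates the idea that squaring or cubing a left-skewed variable on (0, 1) tends to pull its skewness toward zero.

```r
# Sample skewness (third standardized moment). This is a hand-rolled helper
# introduced for illustration; it is not part of the original notebook.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

set.seed(1)
# Simulated left-skewed scores in (0, 1), standing in for a column such as SocSupp.
x <- rbeta(500, shape1 = 5, shape2 = 1)

# Skewness before and after the power transformations.
c(raw = skewness(x), squared = skewness(x^2), cubed = skewness(x^3))
```

A value closer to zero indicates a more symmetric distribution, so the same comparison on the real columns would show which power works best for each variable.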

Variable Selection

From the correlations between variables, we can observe that, apart from the 0.783 correlation between life expectancy and Ln GDP per capita, there is no significant collinearity among the other factors once Life Ladder is excluded. Therefore, we omit Life Ladder and include the remaining 6 factors in our model: Ln GDP per capita, life expectancy, generosity, social support cubed, life choice squared and corruption cubed.

Normalize input variables

We standardize the data so that every variable has mean 0 and standard deviation 1. Without this step, variables measured on larger scales would dominate the distance calculations and affect the accuracy of the results.
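
To make the standardization step concrete, the sketch below compares `scale()` with the same computation done by hand on a small made-up matrix (`m` is illustrative and is not the notebook's data).

```r
# Toy matrix standing in for whr.df.new; the values are made up.
m <- cbind(a = c(1, 2, 3, 4), b = c(10, 20, 40, 80))

# scale() subtracts each column mean and divides by the column standard deviation.
z_scale  <- scale(m)
z_manual <- sweep(sweep(m, 2, colMeans(m)), 2, apply(m, 2, sd), "/")

# The two results agree up to attributes that scale() attaches.
all.equal(z_scale, z_manual, check.attributes = FALSE)

# After scaling, every column has mean (approximately) 0 and standard deviation 1.
colMeans(z_scale)
apply(z_scale, 2, sd)
```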

whr.df.new<-whr.df[,c(1,2,4,6,10,11,13)]
# Normalize input variables
whr.df.norm <- sapply(whr.df.new, scale)
whr.df.norm
##         LifeLadder     LnGDPpc      LifeExp  Generosity   SocSupp_3
##   [1,] -1.03039735 -1.56706372 -1.622661723  0.47297884 -1.83285739
##   [2,] -0.77623324 -0.02933493  0.725175776 -0.07477943 -1.43787696
##   [3,] -0.01000758  0.20053722  0.242762618  0.00000000 -0.70496288
##   [4,]  0.89772756  0.45934963  0.568747307  0.00000000  0.53971620
##   [5,] -0.93840232 -0.28151321  0.315031874 -1.07754599 -0.99167937
##   [6,]  1.61659329  1.18876314  1.201954597  1.67796116  1.22724540
##   [7,]  1.44011489  1.18731063  1.092715836  0.57584020  1.03336958
##   [8,] -0.08363252  0.32291661 -0.104543974 -1.60416156 -0.47002416
##   [9,]  0.67272866  1.19780440  0.384174750  0.00000000  0.32720279
##  [10,] -0.73688529 -1.04735632 -0.104386513 -0.52487060 -1.37635351
##  [11,] -0.19370480  0.31731769  0.359339716 -0.91176431  1.03613689
##  [12,]  1.35350824  1.13354043  1.077255187 -0.41168620  1.06493693
##  [13,] -1.21631324 -1.46003885 -1.510679053 -0.22454491 -2.08733444
##  [14,]  0.32332440 -0.44732595 -0.353867632 -0.23453736 -0.30967289
##  [15,] -0.19111371 -0.06561825  0.610098122  1.07902301 -0.20494834
##  [16,] -1.66047984  0.26564246 -0.958936540 -1.82374746 -0.54429230
##  [17,]  0.85194645  0.18018310  0.242884400 -0.74671969  0.87087191
##  [18,] -0.49103143  0.38767269  0.666732620 -1.21119952  1.03000897
##  [19,] -1.04309454 -1.67144851 -1.468559013  0.17897251 -0.57606675
##  [20,] -0.81977568 -1.00763560 -0.542008031  0.62191748 -0.72236205
##  [21,] -0.50966439 -1.11675093 -1.813832886  0.08586072 -1.31591944
##  [22,]  1.61202055  1.17001024  1.174104498  1.48168203  1.01049993
##  [23,] -2.36450865  0.45934963 -2.351942913  0.00000000 -2.52761613
##  [24,] -1.19710000 -1.49215029 -2.216647487  0.35646344 -1.55906522
##  [25,]  1.03037359  0.59758506  1.115846870  0.70825081  0.11264662
##  [26,] -0.06523327  0.21076644  0.771107699 -1.72773524 -0.75456901
##  [27,]  0.72867668  0.14634473  0.140019231 -0.72834582  0.52979743
##  [28,] -1.11834912 -0.52137879 -1.120399470 -0.79401811 -1.56304009
##  [29,] -0.76676784 -2.31167980 -1.570433273  0.10096201  0.34223804
##  [30,]  1.51659670  0.25227899  0.845315959 -0.17001126  0.73691402
##  [31,]  0.34507333  0.86729869  1.216957612 -0.21155116 -0.39231431
##  [32,]  1.16715703  0.86750655  0.815256652 -1.42149763  1.08446245
##  [33,]  1.88540832  1.17740920  1.001668312  1.01065234  1.37837805
##  [34,] -0.14058949  0.19156333  0.053068066 -0.54911485  0.67044487
##  [35,]  0.62534707 -0.08041206  0.533579802 -0.07109489  0.12212001
##  [36,] -0.73636124 -0.06023046 -0.179827820 -1.03745613 -0.19123108
##  [37,]  0.64665235 -0.25742264  0.152852045 -1.35837755 -0.32980983
##  [38,]  0.21844805  0.77867633  0.560487991 -1.08514251  1.17064623
##  [39,] -0.96253447 -1.66999517 -0.852263771  0.43584300 -0.92452458
##  [40,]  1.97457065  1.07891005  1.059923086 -0.20529951  1.37192582
##  [41,]  0.93965035  1.05089900  1.211416514 -0.68177813  0.56250676
##  [42,] -0.49609531  0.46855597 -0.844516796 -1.66056962 -0.44666865
##  [43,] -0.83102194 -0.13552633  0.371107886 -1.67596398 -1.93904394
##  [44,]  1.28783529  1.19555743  0.995136783  1.05153112  0.79720654
##  [45,] -0.77334097 -0.86615661 -1.145720208  0.74839865 -1.38692132
##  [46,] -0.08474686  0.68811427  1.043393036 -1.89171012 -0.25078758
##  [47,]  0.83805494 -0.35528097 -0.089264095  0.12792696 -0.17287121
##  [48,] -1.56969495 -1.95484397 -1.545115866  0.02612367 -1.21618873
##  [49,] -1.78858437 -1.64238094 -1.206583787  2.21992071 -1.72114259
##  [50,]  0.21711960 -0.70965295  0.087647116  0.64402691 -0.49805985
##  [51,]  0.08630899  1.36276354  1.635687167  0.64935118  0.02225864
##  [52,]  0.04304816  0.69535549  0.551946102 -1.37028392  0.72354808
##  [53,]  1.84369464  1.18251600  1.143581847  2.04813594  1.77594913
##  [54,] -1.06620817 -0.51963134 -0.440665142  0.31102744 -1.57309127
##  [55,] -0.23002467 -0.02493055 -0.300036981  3.58944069 -0.34575187
##  [56,] -0.65250232  0.37256341  0.345910358  1.19629560 -1.80120584
##  [57,] -0.86234042  0.23291238 -0.253100831 -0.54739469 -0.92281924
##  [58,]  1.43370214  1.36525286  1.062588977  1.30488357  1.42520323
##  [59,]  1.53703342  0.90549790  1.175094931  1.12239693  0.61423014
##  [60,]  0.48476988  0.96095886  1.227635449 -0.57911360  1.04402006
##  [61,] -0.74876225 -0.99833473 -2.185324213 -0.13796790 -1.55275621
##  [62,]  0.48488069  1.01434128  1.522100396 -0.49424170  0.72649144
##  [63,] -0.11212163 -0.06582446  0.177707061 -0.32944375 -0.09253708
##  [64,]  0.11700011  0.66351269  0.296245997 -0.33211564  1.05115173
##  [65,] -0.87667599 -1.13000924 -1.035128520  2.34908568 -1.01456666
##  [66,]  0.31431639 -0.13118028 -0.077050659  0.91584641 -0.05640099
##  [67,]  0.47836672  1.52777033  0.291897035 -0.76684085  0.15045439
##  [68,] -0.47445594 -1.05495985 -0.036138451  0.65133121  0.89308896
##  [69,]  0.47247136  0.63454288  0.329425024 -1.12731301  0.92446664
##  [70,] -0.11261152  0.12103744  0.746857790  0.30668120 -0.01778467
##  [71,] -1.39029699 -1.25622203 -2.432480525 -0.62613004 -0.29117056
##  [72,] -1.78650879 -2.26964927 -1.413419679  0.67887831 -1.41396568
##  [73,]  0.02966571  0.26621920 -0.197752658 -0.88598938  0.46728894
##  [74,]  0.40704252  0.77511405  0.271449970 -1.93612447  1.17257611
##  [75,]  1.36958677  1.84603397  1.248805597  0.08354195  1.21403997
##  [76,] -0.04707062  0.14641785  0.351103234  0.61788819  0.41591996
##  [77,] -1.51707578 -1.79440628 -0.778495068 -0.36184526 -0.71776221
##  [78,] -1.68008737 -1.98366261 -0.947900014  0.52418180 -1.97438680
##  [79,] -1.20873867 -1.34343994 -1.657706495 -0.49161223  0.06255836
##  [80,]  1.04067005  0.45934963  1.093782383  0.00000000  1.08177897
##  [81,] -0.81026196 -0.91936650 -1.185876508 -1.10508535 -0.40610751
##  [82,]  0.18378990  0.46151932  0.324108465  0.99572309  0.06040241
##  [83,]  1.24451239  0.35167609  0.634336613 -1.08570764  0.65646708
##  [84,]  0.15564238 -0.72928245  0.127816736  0.09706734  0.07291777
##  [85,] -0.29932515  0.01829577  0.005412162  0.59863195  1.29107113
##  [86,] -0.08348297  0.28909714  0.453839959 -0.58998024  0.35872442
##  [87,] -0.01163556 -0.35223545  0.154091751 -1.75893968 -1.33923105
##  [88,] -0.67837118  0.45934963 -0.705092189  0.00000000 -0.33153889
##  [89,] -0.26216130 -1.35761265 -0.227822431  1.39433385  0.07021773
##  [90,]  1.87063951  1.24119995  1.045526130  1.69974885  1.02891686
##  [91,]  1.59528385  0.99178457  1.071206030  1.91849480  1.15710090
##  [92,]  0.53562820 -0.68066671  0.401758027  0.36803341  0.22521531
##  [93,] -1.01774974 -2.15731754 -1.213557254  0.10577985 -1.16898045
##  [94,] -0.15730245 -0.60065757 -2.171449112  0.29240813 -0.23143153
##  [95,]  0.37347412  0.45934963  0.000000000  0.00000000 -0.20508260
##  [96,]  1.91908544  1.50947914  0.985337109  0.82311224  1.44559059
##  [97,]  0.13006638 -0.71311582 -0.684033833  0.65076423 -1.50165320
##  [98,] -0.43070146 -0.73616465  0.008946228 -0.90725950 -0.11274885
##  [99,]  0.62726956  0.58041366  0.616137765 -0.66014394  0.53583929
## [100,]  0.35098035 -0.20247433  0.065838562 -0.36887991  1.19694411
## [101,]  0.26296194  0.06039408  0.297325863 -0.99173131 -0.24854858
## [102,]  0.02726291 -0.36683455 -0.410113473 -0.49855075 -0.07989352
## [103,]  0.66609179  0.71816114  0.595952341 -0.67853330  0.92825638
## [104,]  0.04106985  0.76864428  0.981806373 -1.63508109  0.78136406
## [105,]  0.49730335  0.53955262  0.506950380 -0.80448872 -0.19113474
## [106,]  0.39777624  0.64261470 -0.084270544 -1.34580744  0.85326311
## [107,] -1.80545418 -1.60329569 -0.971757937  0.30982972 -1.28046059
## [108,]  0.93852560  1.29781439  0.120665354 -1.09109921  0.61721150
## [109,] -0.70334439 -1.33234602 -0.622969978 -0.44020725  0.08921446
## [110,]  0.30850018  0.14476734  0.397089387 -0.43052224  0.67201599
## [111,] -0.58241868 -1.69116469 -2.353499918  0.85729909 -1.33138744
## [112,]  0.55374793  1.70885010  1.686692086  0.96337847  1.01922424
## [113,]  0.51852574  0.80201600  0.658442857 -0.49077137  1.26238154
## [114,]  0.46930456  0.83648778  0.987297045 -0.39153504  1.13142513
## [115,] -0.63921443  0.45934963 -1.910543764  0.00000000 -1.66977810
## [116,] -0.55028122  0.08528649 -1.581808505 -0.53145350  0.46009913
## [117,]  0.49878303  0.99206308  1.492828868  0.16263462 -0.17352822
## [118,] -2.19410807 -1.59807838 -1.681356056  0.20255994 -1.94400529
## [119,]  0.80284441  0.95747545  1.381292417 -0.37322239  1.21989376
## [120,]  1.72026074  1.23013273  1.132097201  1.01186855  0.86631978
## [121,]  1.79869045  1.37934506  1.256260007  0.62326302  1.04897500
## [122,]  0.97253515  0.45934963  1.009705947  0.00000000  0.67306322
## [123,] -0.25850836 -1.23142615 -0.140194803  0.16396372  0.26527187
## [124,] -2.18133403 -1.24826646 -0.799298442  1.30809383 -1.44157797
## [125,]  0.58883183  0.29593169  0.363677261  2.57120141  0.81447346
## [126,] -1.32881728 -1.77616685 -1.316057551  0.05137243 -2.02943511
## [127,] -0.76718900 -0.02416979  0.209324801 -1.24553608 -1.04273033
## [128,] -0.06412726  0.48025661  0.338447384  0.00000000  0.50928888
## [129,]  0.42582461  0.31945620 -0.538856437 -0.10874718  1.06575297
## [130,] -1.01895947 -1.59459661 -1.424297792  1.11111995 -0.66282190
## [131,] -1.19767654 -0.33328045  0.041196190  0.28316569  0.56292424
## [132,]  1.25043316  1.54610587  0.701728869  0.82205788  0.19184570
## [133,]  1.24460904  1.08112415  1.029528877  1.79996526  1.37353786
## [134,]  1.22653929  1.34677764  0.903126417  0.98684568  0.69267314
## [135,]  0.67431164  0.50703532  0.669586893 -0.56653763  0.73331125
## [136,]  0.43061854 -0.53781983 -0.281918529  1.54637787  1.26142836
## [137,] -1.18682228  0.18754673  0.232700351 -1.44356295  0.75097286
## [138,] -0.29472326 -0.53775575  0.381458433 -0.58749907  0.47003265
## [139,] -1.37507328  0.45934963 -1.077530183  0.00000000 -0.48560260
## [140,] -0.91911992 -0.96287790 -1.246294345  0.89047959 -0.55455764
## [141,] -1.45390061 -1.63165631 -1.391128753 -0.33035335 -0.54328856
##        LifeChoice_sq Corruption_3
##   [1,] -1.7753148416  0.095099243
##   [2,] -0.3633544510  1.132109681
##   [3,] -0.0878906972 -0.280463287
##   [4,]  0.6482521868  0.616986539
##   [5,] -1.2300840159  1.358265328
##   [6,]  1.3666739356 -1.849142112
##   [7,]  1.0337071401 -1.491026094
##   [8,] -0.4986707865 -1.134984861
##   [9,]  1.0354191493 -0.280463287
##  [10,]  0.9012141074 -0.679711939
##  [11,] -0.9039118969 -0.825249965
##  [12,]  0.8165582001 -1.585023817
##  [13,]  0.0470521080  0.490968525
##  [14,]  0.9685584570  0.633185798
##  [15,] -1.0779944794  1.782157753
##  [16,]  0.6851621296 -0.402055711
##  [17,]  0.2781195316 -0.005679438
##  [18,] -0.5932622256  1.526411195
##  [19,] -0.9999296777 -0.462734586
##  [20,]  1.7298128713  0.516418192
##  [21,] -0.4991836782  0.902778250
##  [22,]  1.2679472758 -1.876788095
##  [23,] -1.1422772732  0.696705665
##  [24,] -1.7601766666  0.326183087
##  [25,] -0.9462513971  0.687352254
##  [26,] -0.0878906972 -0.280463287
##  [27,]  0.5316736253  1.094044216
##  [28,]  0.0991030604  0.071103353
##  [29,] -1.0509438921  0.856940794
##  [30,]  0.8847808923 -0.010009792
##  [31,] -0.1499258813  1.094967739
##  [32,]  0.6725090578  1.125160063
##  [33,]  1.6303686649 -2.090288580
##  [34,]  0.8823164468 -0.344423425
##  [35,]  0.6356650628 -0.062392415
##  [36,] -0.9209505679  0.305899618
##  [37,]  0.2193563423  0.129518768
##  [38,]  0.6029027008 -0.967136074
##  [39,] -0.2471694014 -0.582465876
##  [40,]  1.6318297829 -2.062120605
##  [41,]  0.1065795941 -1.054435916
##  [42,] -0.6033334030  0.297288549
##  [43,] -1.2600092997 -1.344241083
##  [44,]  0.8614790076 -1.735986353
##  [45,] -0.1913570253  1.055401851
##  [46,] -1.9990036017  1.103940268
##  [47,]  0.7875679778  0.257058466
##  [48,] -0.3960842341  0.176367678
##  [49,] -2.7595880791  0.498558680
##  [50,]  0.6699045838  0.091981708
##  [51,]  0.2184549299 -1.839973287
##  [52,] -1.5915191971  1.389773572
##  [53,]  1.6652897637 -0.471355344
##  [54,]  0.3975515125 -0.136563002
##  [55,]  0.4861678260  1.009869229
##  [56,] -0.0878906972 -0.280463287
##  [57,] -0.8467709999  0.142768958
##  [58,]  0.9001545974 -1.849143821
##  [59,] -0.0162534632  0.187385233
##  [60,] -1.1444146885  1.150948622
##  [61,] -0.0456615999 -0.192911065
##  [62,]  0.5416592299 -0.616863254
##  [63,] -0.0242030658 -0.280463287
##  [64,]  0.0726407303 -0.588172707
##  [65,] -0.2130579930  0.404554104
##  [66,]  0.4632407948  1.584277327
##  [67,]  0.5863838707 -0.280463287
##  [68,]  0.3430676342  1.307404297
##  [69,] -0.7060661383  0.782160757
##  [70,] -0.9101478347  0.638262544
##  [71,] -0.3659667871 -0.302718340
##  [72,] -0.0899495595  1.134247415
##  [73,]  0.4182446661 -0.280463287
##  [74,] -1.2084068328  1.685831495
##  [75,]  0.9744757293 -1.929698624
##  [76,] -0.5480169515  0.803160342
##  [77,] -1.4955969531  0.747357069
##  [78,]  0.3072478592  0.360756191
##  [79,] -0.6256049203  0.728962461
##  [80,]  1.3037514363 -0.624306996
##  [81,] -2.0766580481  0.529846947
##  [82,]  0.3895874270  1.020303642
##  [83,] -0.1877215766  0.226737965
##  [84,] -1.5708576991  1.933336190
##  [85,] -0.1208983910  1.125392247
##  [86,] -1.5018616098  0.598057907
##  [87,]  0.3662783225 -0.484775513
##  [88,]  0.9278138464 -1.132441469
##  [89,]  0.5728662408  0.302211698
##  [90,]  1.2173246758 -1.768623014
##  [91,]  1.4095224817 -2.035415002
##  [92,] -0.4678769732 -0.385687644
##  [93,] -0.5805867340  0.278867093
##  [94,]  0.2006157183  1.171778409
##  [95,]  0.1879853623 -0.788701820
##  [96,]  1.6937270041 -1.824836358
##  [97,] -1.0729669266  0.089079508
##  [98,] -1.2520768449  0.260895991
##  [99,]  0.9947980376  0.484034433
## [100,]  0.7022246437 -0.203159564
## [101,]  0.4852852391  0.764866711
## [102,]  1.2201457665  0.084309657
## [103,]  0.8633020042  0.586377807
## [104,]  0.5599158441  1.367036866
## [105,]  0.4123040007  1.681633453
## [106,] -0.4906556261  1.404389520
## [107,]  1.2512109582 -2.113741441
## [108,]  0.0003236584 -0.280463287
## [109,] -0.2518484824  0.104444175
## [110,] -1.2075280439  1.010799353
## [111,] -0.7365224736  0.738308954
## [112,]  1.1821108388 -2.131066064
## [113,] -0.5945344770  1.303876559
## [114,]  1.1802966082  0.498099925
## [115,]  1.3167084686 -1.749456522
## [116,] -0.0007823621  0.264381321
## [117,] -1.3610531632  0.723886832
## [118,] -2.2080342795  0.029004507
## [119,] -0.0508084825  0.315134401
## [120,]  1.3238329348 -2.064980843
## [121,]  1.4841243683 -2.009200681
## [122,] -0.4492071383  0.243766879
## [123,] -0.5721829267 -1.006035512
## [124,]  0.0105849970 -0.329368309
## [125,]  1.3850550306  0.887565656
## [126,] -0.3596385581  0.283755786
## [127,] -1.2070766264  0.245742489
## [128,] -1.0036789661 -0.144500379
## [129,] -0.2130895489 -0.280463287
## [130,] -0.2867015166  0.248594895
## [131,] -1.8869783609  1.024698442
## [132,]  1.6395436572 -0.280463287
## [133,]  0.4075811215 -1.702087132
## [134,] -0.1361537133 -0.331763936
## [135,]  1.0130253263 -0.752181229
## [136,]  2.0042321151 -0.280463287
## [137,] -2.1217019031  1.014609520
## [138,]  1.0903184421  0.145962311
## [139,] -1.7156114998 -0.280463287
## [140,]  0.3221578235 -0.089860065
## [141,] -0.3382686322 -0.441315811
# Add row names:
#row.names(whr.df.norm) <- row.names(whr.df)
row.names(whr.df.norm) <- whr.df.names[,1]

Exclude “LifeLadder” Column

We first exclude the variable LifeLadder (the first column of whr.df.norm) when performing clustering.

K-means clustering model

We use the elbow method to find the best number of clusters.

set.seed(123)

# Initialize total within sum of squares error: wss
wss <- 0

# For 1 to 15 cluster centers
for (i in 1:15) {
  km.out <- kmeans(whr.df.norm[,-1], centers = i, nstart=20)

  # Save total within sum of squares to wss variable
  wss[i] <- km.out$tot.withinss
}

# Plot total within sum of squares vs. number of clusters
plot(1:15, wss, type = "b", 
     xlab = "Number of Clusters", 
     ylab = "Within groups sum of squares")

From the plot, the elbow suggests that the optimal number of clusters is 3. This is consistent with our judgment, as we believe “happiness” depends a lot on wealth. An indicator of wealth is GDPpc, which can be separated into three categories: high, medium and low. Therefore, having three clusters makes intuitive sense to our group.
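
Reading an elbow off a plot is somewhat subjective, so one possible complement is to look at how much of the remaining within-cluster sum of squares each extra cluster removes. The sketch below does this on simulated data with three well-separated groups (not the notebook's data); `rel_drop` is a name introduced here for illustration.

```r
set.seed(123)
# Simulated data with three well-separated groups (not the WHR data).
centers <- rbind(c(0, 0), c(6, 6), c(0, 6))
toy <- do.call(rbind, lapply(1:3, function(g) {
  cbind(rnorm(40, centers[g, 1]), rnorm(40, centers[g, 2]))
}))

# Total within-cluster sum of squares for k = 1..8.
wss <- sapply(1:8, function(k) kmeans(toy, centers = k, nstart = 20)$tot.withinss)

# Fraction of the remaining WSS removed by each additional cluster:
# element j compares k = j with k = j + 1.
rel_drop <- -diff(wss) / head(wss, -1)
round(rel_drop, 2)
```

In this toy setting the drop is large up to k = 3 and small afterwards, which is the numeric counterpart of the elbow. On real data the pattern is usually less clear-cut.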

Next, we set k to 3 and start running the model.

# Set k equal to the number of clusters corresponding to the elbow location
k <- 3

# Build model with k=3 clusters
km.out <- kmeans(whr.df.norm[,-1], centers = k, nstart = 25, iter.max = 50)
km.out$cluster
##                         Afghanistan                             Albania 
##                                   2                                   1 
##                             Algeria                           Argentina 
##                                   1                                   1 
##                             Armenia                           Australia 
##                                   1                                   3 
##                             Austria                          Azerbaijan 
##                                   3                                   1 
##                             Bahrain                          Bangladesh 
##                                   3                                   2 
##                             Belarus                             Belgium 
##                                   1                                   3 
##                               Benin                             Bolivia 
##                                   2                                   1 
##              Bosnia and Herzegovina                            Botswana 
##                                   1                                   1 
##                              Brazil                            Bulgaria 
##                                   1                                   1 
##                        Burkina Faso                            Cambodia 
##                                   2                                   2 
##                            Cameroon                              Canada 
##                                   2                                   3 
##            Central African Republic                                Chad 
##                                   2                                   2 
##                               Chile                               China 
##                                   1                                   1 
##                            Colombia                 Congo (Brazzaville) 
##                                   1                                   2 
##                    Congo (Kinshasa)                          Costa Rica 
##                                   2                                   1 
##                              Cyprus                      Czech Republic 
##                                   1                                   1 
##                             Denmark                  Dominican Republic 
##                                   3                                   1 
##                             Ecuador                               Egypt 
##                                   1                                   1 
##                         El Salvador                             Estonia 
##                                   1                                   1 
##                            Ethiopia                             Finland 
##                                   2                                   3 
##                              France                               Gabon 
##                                   3                                   1 
##                             Georgia                             Germany 
##                                   1                                   3 
##                               Ghana                              Greece 
##                                   2                                   1 
##                           Guatemala                              Guinea 
##                                   1                                   2 
##                               Haiti                            Honduras 
##                                   2                                   2 
##                           Hong Kong                             Hungary 
##                                   3                                   1 
##                             Iceland                               India 
##                                   3                                   2 
##                           Indonesia                                Iran 
##                                   2                                   2 
##                                Iraq                             Ireland 
##                                   1                                   3 
##                              Israel                               Italy 
##                                   3                                   1 
##                         Ivory Coast                               Japan 
##                                   2                                   3 
##                              Jordan                          Kazakhstan 
##                                   1                                   1 
##                               Kenya                              Kosovo 
##                                   2                                   1 
##                              Kuwait                          Kyrgyzstan 
##                                   1                                   1 
##                              Latvia                             Lebanon 
##                                   1                                   1 
##                             Lesotho                             Liberia 
##                                   2                                   2 
##                               Libya                           Lithuania 
##                                   1                                   1 
##                          Luxembourg                           Macedonia 
##                                   3                                   1 
##                          Madagascar                              Malawi 
##                                   2                                   2 
##                                Mali                               Malta 
##                                   2                                   3 
##                          Mauritania                           Mauritius 
##                                   2                                   1 
##                              Mexico                             Moldova 
##                                   1                                   1 
##                            Mongolia                          Montenegro 
##                                   1                                   1 
##                             Morocco                             Myanmar 
##                                   1                                   1 
##                               Nepal                         Netherlands 
##                                   2                                   3 
##                         New Zealand                           Nicaragua 
##                                   3                                   1 
##                               Niger                             Nigeria 
##                                   2                                   2 
## Turkish Republic of Northern Cyprus                              Norway 
##                                   1                                   3 
##                            Pakistan                           Palestine 
##                                   2                                   1 
##                              Panama                            Paraguay 
##                                   1                                   1 
##                                Peru                         Philippines 
##                                   1                                   1 
##                              Poland                            Portugal 
##                                   1                                   1 
##                             Romania                              Russia 
##                                   1                                   1 
##                              Rwanda                        Saudi Arabia 
##                                   2                                   1 
##                             Senegal                              Serbia 
##                                   2                                   1 
##                        Sierra Leone                           Singapore 
##                                   2                                   3 
##                            Slovakia                            Slovenia 
##                                   1                                   1 
##                             Somalia                        South Africa 
##                                   2                                   1 
##                         South Korea                         South Sudan 
##                                   1                                   2 
##                               Spain                              Sweden 
##                                   1                                   3 
##                         Switzerland                              Taiwan 
##                                   3                                   1 
##                          Tajikistan                            Tanzania 
##                                   2                                   2 
##                            Thailand                                Togo 
##                                   3                                   2 
##                             Tunisia                              Turkey 
##                                   1                                   1 
##                        Turkmenistan                              Uganda 
##                                   1                                   2 
##                             Ukraine                United Arab Emirates 
##                                   1                                   3 
##                      United Kingdom                       United States 
##                                   3                                   3 
##                             Uruguay                          Uzbekistan 
##                                   3                                   3 
##                           Venezuela                             Vietnam 
##                                   1                                   1 
##                               Yemen                              Zambia 
##                                   2                                   2 
##                            Zimbabwe 
##                                   2
# plot the clusters
plot(whr.df.norm, col = km.out$cluster, main = "k-means with 3 clusters", xlab = "", ylab = "")

#centroid plot
plot(c(0), xaxt = 'n', ylab = "", type = "l", ylim = c(min(km.out$centers), max(km.out$centers)), xlim = c(0,6))

# Label x-axes
axis(1, at = c(1:6), labels = names(whr.df.new[,-1]), cex.axis = 0.7)

# Plot Centroids
for (i in c(1:k))
lines(km.out$centers[i,], lty = i, lwd = 2, col = ifelse(i %in% c(2), "black", "dark gray"))  


# Name the clusters
text(x = 0.5, y = km.out$centers[,1], labels = paste("Cluster", c(1:k)))

#Plot the clusters
clusplot(whr.df.norm, km.out$cluster, main = "Cluster Plot with K-means excluding LifeLadder", color = TRUE, shade = TRUE, labels = 2, lines = 0, cex = 0.7)

From the graph, we can see the characteristics of the three clusters. The labels below follow the cluster assignments printed above. Cluster 3 has high GDP per capita and high scores for life expectancy, generosity, and social support, with very low corruption. Cluster 2, by contrast, has very low GDP per capita and low life expectancy, social support, and life choice, but markedly high corruption and surprisingly high generosity. Cluster 1 has medium GDP per capita, social support, and life choice, but the highest corruption and lowest generosity among the three clusters.

A handful of countries from each cluster is listed below.

Cluster 1: Greece, Hungary, China, Russia and Mexico

Cluster 2: Rwanda, Kenya, Malawi, Haiti, and Ivory Coast

Cluster 3: Luxembourg, Finland, Switzerland, New Zealand and Denmark
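
Reading cluster profiles and member lists off a plot is easy to get wrong, because k-means labels are arbitrary and can change from run to run. The sketch below shows one way to pull the centroids and a few members per cluster straight from the fitted object. It uses the built-in `USArrests` data as a stand-in (an assumption, since `whr.df.norm` is not reproduced here), and the names `demo.norm`, `demo.km` and `members` are hypothetical.

```r
# Stand-in data: built-in USArrests, standardized (assumption; the notebook uses whr.df.norm)
set.seed(1)
demo.norm <- scale(USArrests)

# Fit k-means with 3 clusters, mirroring the settings used in the notebook
demo.km <- kmeans(demo.norm, centers = 3, nstart = 25, iter.max = 50)

# Centroids in standardized units: one row per cluster, one column per variable
round(demo.km$centers, 2)

# Group the row names by cluster label and show up to five members of each
members <- split(rownames(demo.norm), demo.km$cluster)
lapply(members, head, n = 5)
```

Applied to the notebook's own objects, the same two calls on `km.out` would tie each cluster number to its profile and its countries in one place.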

#Hierarchical clustering model introduction (how to measure the distance between clusters)

#compare different measurement of distance

#plot(hc.out.complete)
hc.out.complete<- hclust(dist(whr.df.norm), method = "complete")
plot(hc.out.complete)

#plot(hc.out.single)
hc.out.single<- hclust(dist(whr.df.norm), method = "single")
plot(hc.out.single)

#plot(hc.out.average)
hc.out.average<- hclust(dist(whr.df.norm), method = "average")
plot(hc.out.average)

#plot(hc.out.centroid)
hc.out.centroid<- hclust(dist(whr.df.norm), method = "centroid")
plot(hc.out.centroid)

#"complete" gives the most balanced clustering

Different linkage methods, that is, different ways of measuring the distance between clusters, give different results. We choose the most balanced one, which is the “complete” method.
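
"Balanced" can also be checked numerically rather than by eye on the dendrograms. The sketch below cuts each linkage's tree into three clusters and tabulates the cluster sizes. It is only an illustration on the built-in `USArrests` data (an assumption standing in for `whr.df.norm`), and `demo.dist`, `linkages` and `sizes` are hypothetical names.

```r
# Stand-in data: built-in USArrests, standardized (assumption; the notebook uses whr.df.norm)
demo.dist <- dist(scale(USArrests))

# Linkage methods compared in the notebook
linkages <- c("complete", "single", "average", "centroid")

# For each linkage, cut the tree into k = 3 clusters and record the cluster sizes
sizes <- sapply(linkages, function(m) {
  hc <- hclust(demo.dist, method = m)
  as.vector(table(cutree(hc, k = 3)))
})
rownames(sizes) <- paste("cluster", 1:3)

# One column per linkage; very uneven columns indicate unbalanced clusterings
sizes
```

A linkage that puts almost every observation into a single cluster shows up immediately as one large entry in its column.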

Now, with k = 3, we will start building our hierarchical model

#cut the dendrogram into k=3 clusters
cut.whr<-cutree(hc.out.complete,k= 3)

# plot heatmap
heatmap(whr.df.norm[,-1], Colv = NA, hclustfun = hclust, 
        col=rev(paste("gray",1:99,sep="")), cexRow = 0.2, cexCol = 0.9)

# Plot the clusters
clusplot(whr.df.norm, cut.whr, main = "Cluster Plot for Hierarchical model without LifeLadder", color = TRUE, shade = TRUE, labels = 2, lines = 0, cex = 0.7)

From the heatmap, we can see that the first cluster is characterized by high GDP per capita, long life expectancy, strong social support, and more freedom of life choices, while people are less generous and government corruption is not widespread. The second cluster includes countries with GDP per capita and life expectancy similar to those of the first cluster, but with less social support, very little freedom of life choices, and very severe corruption. Countries in the third cluster generally have lower GDP per capita, shorter life expectancy, little social support, and widespread corruption, but their people are very generous and have more freedom of life choices than those in the second cluster.

Comparing cluster results from k-means and hierarchical

From the plots, we see a significant difference between the two models.

# comparing cluster results
table(km.out$cluster)
## 
##  1  2  3 
## 71 42 28
table(cut.whr)
## cut.whr
##  1  2  3 
## 25 79 37

As we can see, the hierarchical model provides some conflicting results. In cluster 1 of the hierarchical model, Finland, Belgium and Switzerland appear in the same cluster as Rwanda and Kenya. This may be due to the hierarchical model's sensitivity to outliers.
Therefore, for this case, we choose k-means as our best model.
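
The two size tables show how many observations fall in each cluster, but not which observations the two models group differently. A cross-tabulation of the two label vectors makes that visible. The sketch below is an illustration on the built-in `USArrests` data (an assumption standing in for `whr.df.norm`); `demo.km`, `demo.cut` and `agreement` are hypothetical names.

```r
# Stand-in data: built-in USArrests, standardized (assumption; the notebook uses whr.df.norm)
set.seed(1)
demo.norm <- scale(USArrests)

# k-means labels and complete-linkage hierarchical labels, both with 3 clusters
demo.km  <- kmeans(demo.norm, centers = 3, nstart = 25)
demo.cut <- cutree(hclust(dist(demo.norm), method = "complete"), k = 3)

# Rows: k-means cluster, columns: hierarchical cluster.
# Cluster numbers are arbitrary, so look for one dominant cell per row
# rather than for matching numbers on the diagonal.
agreement <- table(kmeans = demo.km$cluster, hclust = demo.cut)
agreement
```

With the notebook's objects, the equivalent call would be `table(km.out$cluster, cut.whr)`.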

#Include “LifeLadder” Column

Now we will look at the models when the variable LifeLadder is included.

#Hierarchical Model

# As seen above in the hierarchical model, the "complete" linkage gave the most balanced clustering, so we use the same method
hc.out.complete.1<- hclust(dist(whr.df.norm), method = "complete")
plot(hc.out.complete.1)

# Cut the dendrogram into k=3 clusters
cut.whr.1<-cutree(hc.out.complete.1,k= 3)

# Plot heatmap
heatmap(whr.df.norm, Colv = NA, hclustfun = hclust, 
        col=rev(paste("gray",1:99,sep="")), cexRow = 0.2, cexCol = 0.9)

# Plot the clusters
clusplot(whr.df.norm, cut.whr.1, main = "Cluster Plot for Hierarchical Model including LifeLadder", color = TRUE, shade = TRUE, labels = 2, lines = 0, cex = 0.7)

From the heatmap, we can see the characteristics of the clusters. The cluster with the lowest LifeLadder has low GDP per capita with very high generosity and corruption. The cluster with medium LifeLadder shows many conflicting characteristics within the cluster itself. For example, although countries in this cluster share similar GDP per capita, their generosity and life choice vary. The cluster with the highest LifeLadder has very high GDP per capita as well as high scores for the other factors and very little corruption.

Countries for clusters

Cluster 1: Haiti, Ghana, Ivory Coast, Mali, Central African Republic

Cluster 2: Greece, Russia, China, Spain, Italy

Cluster 3: Luxembourg, Finland, Switzerland, New Zealand, Kenya

K-means Model

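# Build model with k=3 clusters
km.out.1 <- kmeans(whr.df.norm, centers = k, nstart = 25, iter.max = 50)
# The listing below prints the assignments stored in km.out; use km.out.1$cluster to inspect the new fit
km.out$cluster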
##                         Afghanistan                             Albania 
##                                   2                                   1 
##                             Algeria                           Argentina 
##                                   1                                   1 
##                             Armenia                           Australia 
##                                   1                                   3 
##                             Austria                          Azerbaijan 
##                                   3                                   1 
##                             Bahrain                          Bangladesh 
##                                   3                                   2 
##                             Belarus                             Belgium 
##                                   1                                   3 
##                               Benin                             Bolivia 
##                                   2                                   1 
##              Bosnia and Herzegovina                            Botswana 
##                                   1                                   1 
##                              Brazil                            Bulgaria 
##                                   1                                   1 
##                        Burkina Faso                            Cambodia 
##                                   2                                   2 
##                            Cameroon                              Canada 
##                                   2                                   3 
##            Central African Republic                                Chad 
##                                   2                                   2 
##                               Chile                               China 
##                                   1                                   1 
##                            Colombia                 Congo (Brazzaville) 
##                                   1                                   2 
##                    Congo (Kinshasa)                          Costa Rica 
##                                   2                                   1 
##                              Cyprus                      Czech Republic 
##                                   1                                   1 
##                             Denmark                  Dominican Republic 
##                                   3                                   1 
##                             Ecuador                               Egypt 
##                                   1                                   1 
##                         El Salvador                             Estonia 
##                                   1                                   1 
##                            Ethiopia                             Finland 
##                                   2                                   3 
##                              France                               Gabon 
##                                   3                                   1 
##                             Georgia                             Germany 
##                                   1                                   3 
##                               Ghana                              Greece 
##                                   2                                   1 
##                           Guatemala                              Guinea 
##                                   1                                   2 
##                               Haiti                            Honduras 
##                                   2                                   2 
##                           Hong Kong                             Hungary 
##                                   3                                   1 
##                             Iceland                               India 
##                                   3                                   2 
##                           Indonesia                                Iran 
##                                   2                                   2 
##                                Iraq                             Ireland 
##                                   1                                   3 
##                              Israel                               Italy 
##                                   3                                   1 
##                         Ivory Coast                               Japan 
##                                   2                                   3 
##                              Jordan                          Kazakhstan 
##                                   1                                   1 
##                               Kenya                              Kosovo 
##                                   2                                   1 
##                              Kuwait                          Kyrgyzstan 
##                                   1                                   1 
##                              Latvia                             Lebanon 
##                                   1                                   1 
##                             Lesotho                             Liberia 
##                                   2                                   2 
##                               Libya                           Lithuania 
##                                   1                                   1 
##                          Luxembourg                           Macedonia 
##                                   3                                   1 
##                          Madagascar                              Malawi 
##                                   2                                   2 
##                                Mali                               Malta 
##                                   2                                   3 
##                          Mauritania                           Mauritius 
##                                   2                                   1 
##                              Mexico                             Moldova 
##                                   1                                   1 
##                            Mongolia                          Montenegro 
##                                   1                                   1 
##                             Morocco                             Myanmar 
##                                   1                                   1 
##                               Nepal                         Netherlands 
##                                   2                                   3 
##                         New Zealand                           Nicaragua 
##                                   3                                   1 
##                               Niger                             Nigeria 
##                                   2                                   2 
## Turkish Republic of Northern Cyprus                              Norway 
##                                   1                                   3 
##                            Pakistan                           Palestine 
##                                   2                                   1 
##                              Panama                            Paraguay 
##                                   1                                   1 
##                                Peru                         Philippines 
##                                   1                                   1 
##                              Poland                            Portugal 
##                                   1                                   1 
##                             Romania                              Russia 
##                                   1                                   1 
##                              Rwanda                        Saudi Arabia 
##                                   2                                   1 
##                             Senegal                              Serbia 
##                                   2                                   1 
##                        Sierra Leone                           Singapore 
##                                   2                                   3 
##                            Slovakia                            Slovenia 
##                                   1                                   1 
##                             Somalia                        South Africa 
##                                   2                                   1 
##                         South Korea                         South Sudan 
##                                   1                                   2 
##                               Spain                              Sweden 
##                                   1                                   3 
##                         Switzerland                              Taiwan 
##                                   3                                   1 
##                          Tajikistan                            Tanzania 
##                                   2                                   2 
##                            Thailand                                Togo 
##                                   3                                   2 
##                             Tunisia                              Turkey 
##                                   1                                   1 
##                        Turkmenistan                              Uganda 
##                                   1                                   2 
##                             Ukraine                United Arab Emirates 
##                                   1                                   3 
##                      United Kingdom                       United States 
##                                   3                                   3 
##                             Uruguay                          Uzbekistan 
##                                   3                                   3 
##                           Venezuela                             Vietnam 
##                                   1                                   1 
##                               Yemen                              Zambia 
##                                   2                                   2 
##                            Zimbabwe 
##                                   2
# Plot the clusters from the new fit
plot(whr.df.norm, col = km.out.1$cluster, main = "k-means with 3 clusters", xlab = "", ylab = "")

#centroid plot
# Scatter Plot
plot(c(0), xaxt = 'n', ylab = "", type = "l", ylim = c(min(km.out.1$centers), max(km.out.1$centers)), xlim = c(0,7))

# Label x-axes
axis(1, at = c(1:7), labels = names(whr.df.new), cex.axis = 0.7)

# Plot Centroids
for (i in c(1:k))
lines(km.out.1$centers[i,], lty = i, lwd = 2, col = ifelse(i %in% c(2), "black", "dark gray"))  


# Name the clusters
text(x = 0.5, y = km.out.1$centers[,1], labels = paste("Cluster", c(1:k)))

# Plot the clusters 
clusplot(whr.df.norm, km.out.1$cluster, main = "Cluster Plot for K-means including LifeLadder", color = TRUE, shade = TRUE, labels = 2, lines = 0, cex = 0.7)

From the graph, we can see the characteristics of the three clusters. Cluster 2, with the highest LifeLadder score, has high GDP per capita and high scores for life expectancy, generosity, and social support, with very low corruption. Cluster 1, with the lowest LifeLadder score, has very low GDP per capita and low life expectancy, social support, and life choice, but markedly high corruption and surprisingly high generosity. Cluster 3, with a medium LifeLadder score, has medium GDP per capita, social support, and life choice, but the highest corruption and lowest generosity among the three clusters.

Countries for clusters

Cluster 1: Rwanda, Kenya, Malawi, Haiti, Ivory Coast

Cluster 2: Luxembourg, Finland, New Zealand, Austria, Switzerland

Cluster 3: Greece, Hungary, Russia, Italy, Spain

# comparing cluster results on K-means
table(km.out.1$cluster)
## 
##  1  2  3 
## 41 29 71
# Comparison on Hierarchical Model
table(cut.whr.1)
## cut.whr.1
##  1  2  3 
## 25 79 37

As we can see, the hierarchical model provides some conflicting results. In cluster 3 of the hierarchical model, Finland, Belgium and Switzerland appear in the same cluster as Rwanda and Kenya. This may be due to the hierarchical model's sensitivity to outliers.
Therefore, for this case, we choose k-means as our best model.

K-Means Model

Advantages:

1) When the number of variables is large, K-Means is usually computationally faster than hierarchical clustering, provided k is kept small
2) K-Means tends to produce tighter clusters than hierarchical clustering
3) Easy to implement

Disadvantages:

1) Strong sensitivity to outliers and noise
2) The number of clusters and the initial seed values need to be specified beforehand
3) The order of the data has an impact on the final results
4) Selection of the optimal number of clusters is difficult
5) Not recommended if the dataset has many categorical variables
6) Assumes that clusters are spherical, distinct, and approximately equal in size, so it does not work well with non-circular cluster shapes

Hierarchical Model

Advantages:

1) It is easier to decide on the number of clusters by looking at the dendrogram
2) No prior information about the number of clusters is required
3) Dendrograms are great for visualization
4) Only a distance or proximity matrix is required to compute the hierarchical clustering

Disadvantages:

1) Time complexity: not suitable for large datasets
2) Merges are greedy: once two clusters have been joined, the step cannot be undone later
3) The order of the data has an impact on the final results
4) Very sensitive to outliers
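
The dependence of K-means on its initial seeds can be checked empirically. The sketch below runs k-means from a single random start under ten different seeds and compares the total within-cluster sum of squares with a run that uses 25 starts, which is the role `nstart = 25` plays in the models above. It is an illustration on the built-in `USArrests` data (an assumption standing in for `whr.df.norm`); `single.start` and `multi.start` are hypothetical names.

```r
# Stand-in data: built-in USArrests, standardized (assumption; the notebook uses whr.df.norm)
demo.norm <- scale(USArrests)

# Total within-cluster sum of squares from a single random start, for ten seeds
single.start <- sapply(1:10, function(s) {
  set.seed(s)
  kmeans(demo.norm, centers = 3, nstart = 1)$tot.withinss
})

# The same quantity when the best of 25 random starts is kept
set.seed(1)
multi.start <- kmeans(demo.norm, centers = 3, nstart = 25)$tot.withinss

# A wide range across seeds means single-start results depend on the seed
range(single.start)
multi.start
```

If the single-start values spread widely while the multi-start value sits near their minimum, the clustering is seed-sensitive and a larger `nstart` is worth its cost.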