In the current information era, understanding the factors that influence global happiness is increasingly important for policymakers, researchers, and the general public. The World Happiness Report (WHR) is one of the main data sources on happiness levels across countries. It is based on indicators such as GDP per capita, social support, healthy life expectancy, and freedom to make life choices.
Clustering is a useful way to analyze this data and group countries with similar happiness profiles. One popular clustering technique is k-means clustering.
library(tidyverse) # data manipulation
library(cluster) # clustering algorithms
library(factoextra) # clustering algorithms & visualization
library(PerformanceAnalytics)
library(ggpubr)
library(tibble)
library(MVN)
data <- read.csv(file = "C:/Users/acer/Downloads/archive (11)/2019.csv", header = TRUE, sep = ",")
head(data)
## Overall.rank Country.or.region Score GDP.per.capita Social.support
## 1 1 Finland 7.769 1.340 1.587
## 2 2 Denmark 7.600 1.383 1.573
## 3 3 Norway 7.554 1.488 1.582
## 4 4 Iceland 7.494 1.380 1.624
## 5 5 Netherlands 7.488 1.396 1.522
## 6 6 Switzerland 7.480 1.452 1.526
## Healthy.life.expectancy Freedom.to.make.life.choices Generosity
## 1 0.986 0.596 0.153
## 2 0.996 0.592 0.252
## 3 1.028 0.603 0.271
## 4 1.026 0.591 0.354
## 5 0.999 0.557 0.322
## 6 1.052 0.572 0.263
## Perceptions.of.corruption
## 1 0.393
## 2 0.410
## 3 0.341
## 4 0.118
## 5 0.298
## 6 0.343
The data are secondary data obtained from Kaggle. They contain the variables that contribute to the global happiness score for 2019.
# select the numeric indicator variables (columns 4 to 9)
index <- data[, 4:9]
# optional: label the rows by country
# rownames(index) <- data$Country.or.region
# check whether the data contain any NA values
index %>%
  anyNA()
## [1] FALSE
# count the NA values in each column
index %>%
  is.na() %>%
  colSums()
## GDP.per.capita Social.support
## 0 0
## Healthy.life.expectancy Freedom.to.make.life.choices
## 0 0
## Generosity Perceptions.of.corruption
## 0 0
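# correlation matrix of the indicators (named so it does not mask the cor() function)
cor_matrix <- cor(index)
# the VIF of each variable is the matching diagonal element of the inverse correlation matrix
vif <- diag(solve(cor_matrix))
vif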
## GDP.per.capita Social.support
## 4.115838 2.735651
## Healthy.life.expectancy Freedom.to.make.life.choices
## 3.572728 1.575090
## Generosity Perceptions.of.corruption
## 1.224101 1.431594
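The `diag(solve(...))` shortcut used above works because the j-th diagonal element of the inverse correlation matrix equals 1 / (1 - R_j^2), where R_j^2 comes from regressing variable j on all the other variables. The following is a minimal sketch that checks this identity. It assumes the built-in `mtcars` data as a stand-in (not the WHR data), and the four columns are an arbitrary choice for illustration.

```r
# Illustrative check on built-in data; the column selection is an assumption
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# VIF via the inverse of the correlation matrix
vif_inverse <- diag(solve(cor(vars)))

# VIF via the definition: regress each variable on the remaining ones
vif_regression <- sapply(names(vars), function(v) {
  others <- setdiff(names(vars), v)
  fit <- lm(reformulate(others, response = v), data = vars)
  1 / (1 - summary(fit)$r.squared)
})

# TRUE when both approaches agree up to numerical tolerance
vif_match <- isTRUE(all.equal(unname(vif_inverse), unname(vif_regression)))
vif_match
```

A common rule of thumb treats VIF values above 5 or 10 as a sign of problematic multicollinearity; all six values reported above are below 5.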
fviz_nbclust(index, kmeans, method="silhouette")
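With `method = "silhouette"`, `fviz_nbclust()` plots the average silhouette width for each candidate number of clusters and marks the maximum. The following is a rough sketch of the same computation done by hand with `cluster::silhouette()`. It assumes the built-in `iris` measurements as a stand-in dataset, and the candidate range of 2 to 6 clusters is an arbitrary choice.

```r
library(cluster)

set.seed(100)
toy <- iris[, 1:4]          # stand-in data, not the WHR indicators
d <- dist(toy)              # Euclidean distances between observations
candidates <- 2:6           # assumed range of cluster counts to compare

# average silhouette width for each candidate number of clusters
avg_sil <- sapply(candidates, function(k) {
  km <- kmeans(toy, centers = k, nstart = 25)
  mean(silhouette(km$cluster, d)[, "sil_width"])
})
names(avg_sil) <- candidates

# the candidate with the largest average silhouette width
best_k <- as.integer(names(which.max(avg_sil)))
avg_sil
best_k
```

The same pattern should apply to `index` in place of `toy`, which is how a choice such as two clusters can be read off the plot.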
#### Clustering and Visualization
# k-means with 2 clusters
RNGkind(sample.kind = "Rounding")
## Warning in RNGkind(sample.kind = "Rounding"): non-uniform 'Rounding' sampler
## used
set.seed(100)
# fit k-means with 2 centers and 50 random starts, keeping the best solution
k2 <- kmeans(x = index, centers = 2, nstart = 50)
# visualize the clusters
fviz_cluster(k2, data = index)
The 156 countries were clustered into 2 groups. Each group has its own characteristics, and its members are similar to one another on the six indicators used.
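To see what characterizes each group, one option is to compare cluster sizes and per-cluster means of the variables. The following is a minimal sketch of that summary. It assumes the built-in `iris` measurements as a stand-in for the indicator table, and the names `toy`, `km`, `cluster_sizes`, and `cluster_profile` are hypothetical.

```r
set.seed(100)
toy <- iris[, 1:4]                               # stand-in data, not the WHR indicators
km <- kmeans(toy, centers = 2, nstart = 50)

# number of observations assigned to each cluster
cluster_sizes <- table(km$cluster)

# mean of every variable within each cluster
cluster_profile <- aggregate(toy, by = list(cluster = km$cluster), FUN = mean)

cluster_sizes
cluster_profile
```

Applied to the fitted k-means object and `index` from the analysis above, a table like `cluster_profile` would show which indicators differ most between the two groups of countries.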