The world happiness report adalah survei tentang keadaan kebahagiaan secara global. Laporan pertama diterbitkan pada 2012, kedua pada 2013, ketiga pada 2015, dan keempat pada pembaharuan 2016. Laporan ini mendapat pengakuan global karena pemerintah, organisasi, dan masyarakat sipil semakin menggunakan indikator kebahagiaan untuk menginformasikan keputusan pembuatan kebijakan mereka. Laporan meninjau keadaan kebahagiaana di dunia saat ini.
Skor dan peringkat kebahagiaan menggunakan data dari Gallup World Poll. Skor didasarkan pada jawaban atas pertanyaan evaluasi kehidupan utama yang diajukan dalam jajak pendapat. Pertanyaan ini, yang dikenal sebagai tangga Cantril, meminta responden untuk memikirkan tangga dengan kemungkinan kehidupan terbaik bagi mereka adalah 10 dan kemungkinan kehidupan terburuk adalah 0 dan untuk menilai kehidupan mereka sendiri saat ini pada skala itu.
Kolom yang mengikuti skor kebahagiaan memperkirakan sejauh mana masing-masing dari enam faktor - produksi ekonomi, dukungan sosial, harapan hidup, kebebasan, tidak adanya korupsi, dan kemurahan hati - berkontribusi membuat evaluasi kehidupan lebih tinggi di setiap negara daripada di Dystopia, sebuah negara dengan hipotesis yang memiliki nilai sama dengan rata-rata nasional terendah di dunia untuk masing-masing dari enam faktor. Mereka tidak berdampak pada skor total yang dilaporkan untuk setiap negara, tetapi mereka menjelaskan mengapa beberapa negara memiliki peringkat lebih tinggi daripada yang lain.
Pada mini project ini, Saya akan menggunakan dataset laporan tahun 2019.
Faktor apa yang paling mempengaruhi tingkat kebahagiaan di dunia pada tahun 2019?
# Import Library
library(heatmaply)
## Warning: package 'heatmaply' was built under R version 4.0.5
## Loading required package: plotly
## Warning: package 'plotly' was built under R version 4.0.5
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.0.5
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
## Loading required package: viridis
## Warning: package 'viridis' was built under R version 4.0.5
## Loading required package: viridisLite
## Warning: package 'viridisLite' was built under R version 4.0.5
##
## ======================
## Welcome to heatmaply version 1.3.0
##
## Type citation('heatmaply') for how to cite the package.
## Type ?heatmaply for the main documentation.
##
## The github page is: https://github.com/talgalili/heatmaply/
## Please submit your suggestions and bug-reports at: https://github.com/talgalili/heatmaply/issues
## You may ask questions at stackoverflow, use the r and heatmaply tags:
## https://stackoverflow.com/questions/tagged/heatmaply
## ======================
library(visdat)
## Warning: package 'visdat' was built under R version 4.0.5
library(reshape2)
## Warning: package 'reshape2' was built under R version 4.0.5
library(tidyr)
## Warning: package 'tidyr' was built under R version 4.0.5
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:reshape2':
##
## smiths
library(ggplot2)
library(psych)
## Warning: package 'psych' was built under R version 4.0.5
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(DataExplorer)
## Warning: package 'DataExplorer' was built under R version 4.0.5
library(plotly)
library(httr)
## Warning: package 'httr' was built under R version 4.0.5
##
## Attaching package: 'httr'
## The following object is masked from 'package:plotly':
##
## config
library(lubridate)
## Warning: package 'lubridate' was built under R version 4.0.5
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.5
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(rjson)
## Warning: package 'rjson' was built under R version 4.0.3
library(leaflet)
## Warning: package 'leaflet' was built under R version 4.0.5
library(RCurl)
## Warning: package 'RCurl' was built under R version 4.0.5
##
## Attaching package: 'RCurl'
## The following object is masked from 'package:tidyr':
##
## complete
# Mengakses Data Set
df<- read.csv("D:/1. KULIAH/Lain-lain/PSDS/Kelas Dasar 3.0/World_Happiness_Report_2019.csv")
df[df==""] <- NA
df[0:10,]
## Overall.rank Country.or.region Score GDP.per.capita Social.support
## 1 1 Finland 7.769 1.340 1.587
## 2 2 Denmark 7.600 1.383 1.573
## 3 3 Norway 7.554 1.488 1.582
## 4 4 Iceland 7.494 1.380 1.624
## 5 5 Netherlands 7.488 1.396 1.522
## 6 6 Switzerland 7.480 1.452 1.526
## 7 7 Sweden 7.343 1.387 1.487
## 8 8 New Zealand 7.307 1.303 1.557
## 9 9 Canada 7.278 1.365 1.505
## 10 10 Austria 7.246 1.376 1.475
## Healthy.life.expectancy Freedom.to.make.life.choices Generosity
## 1 0.986 0.596 0.153
## 2 0.996 0.592 0.252
## 3 1.028 0.603 0.271
## 4 1.026 0.591 0.354
## 5 0.999 0.557 0.322
## 6 1.052 0.572 0.263
## 7 1.009 0.574 0.267
## 8 1.026 0.585 0.330
## 9 1.039 0.584 0.285
## 10 1.016 0.532 0.244
## Perceptions.of.corruption
## 1 0.393
## 2 0.410
## 3 0.341
## 4 0.118
## 5 0.298
## 6 0.343
## 7 0.373
## 8 0.380
## 9 0.308
## 10 0.226
Melakukan analisis untuk memperoleh faktor yang paling mempengaruhi tingkat kebahagiaan.
Variabel-variabel yang terdapat pada dataset ini adalah sebagai berikut:
## DIMENSI DATA
dim(df)
## [1] 156 9
Artinya kita memiliki data dengan 9 kolom dan 156 baris.
## VARIABEL DATA SET
names(df)
## [1] "Overall.rank" "Country.or.region"
## [3] "Score" "GDP.per.capita"
## [5] "Social.support" "Healthy.life.expectancy"
## [7] "Freedom.to.make.life.choices" "Generosity"
## [9] "Perceptions.of.corruption"
# Mengubah nama setiap kolom terkhusus kolom yang akan dianalisis
names(df)[1]<-paste("Rank")
names(df)[2]<-paste("Country")
names(df)[6]<-paste("Life Expectancy")
names(df)[7]<-paste("Freedom")
names(df)
## [1] "Rank" "Country"
## [3] "Score" "GDP.per.capita"
## [5] "Social.support" "Life Expectancy"
## [7] "Freedom" "Generosity"
## [9] "Perceptions.of.corruption"
str(df)
## 'data.frame': 156 obs. of 9 variables:
## $ Rank : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Country : chr "Finland" "Denmark" "Norway" "Iceland" ...
## $ Score : num 7.77 7.6 7.55 7.49 7.49 ...
## $ GDP.per.capita : num 1.34 1.38 1.49 1.38 1.4 ...
## $ Social.support : num 1.59 1.57 1.58 1.62 1.52 ...
## $ Life Expectancy : num 0.986 0.996 1.028 1.026 0.999 ...
## $ Freedom : num 0.596 0.592 0.603 0.591 0.557 0.572 0.574 0.585 0.584 0.532 ...
## $ Generosity : num 0.153 0.252 0.271 0.354 0.322 0.263 0.267 0.33 0.285 0.244 ...
## $ Perceptions.of.corruption: num 0.393 0.41 0.341 0.118 0.298 0.343 0.373 0.38 0.308 0.226 ...
## MENGECEK MISSING DATA
sapply(df, function(x) sum(is.na(x)))
## Rank Country Score
## 0 0 0
## GDP.per.capita Social.support Life Expectancy
## 0 0 0
## Freedom Generosity Perceptions.of.corruption
## 0 0 0
heatmaply_na(
df[1:20,],
showticklabels = c(TRUE, FALSE)
)
vis_miss(df)
Dari ketiga hasil di atas diperoleh bahwa tidak ada kolom dengan missing data. Jadi tidak perlu melakukan step hapus data.
## MENGECEK OUTLIER
num_cols <- unlist(lapply(df, is.numeric)) #Memilih kolom bertipe numerik
df_num <- df[ , num_cols]
boxplot(df_num)
Titik lingkaran di luar boxplot adalah outlier. Sehingga kolom yang mengandung outlier adalah Social.support, Life.Expectancy, Freedom, Generosity, Perceptions.
## MELIHAT KORELASI DATA
plot_correlation(df_num)
## MELIHAT STATISTIK DATA
summary(df)
## Rank Country Score GDP.per.capita
## Min. : 1.00 Length:156 Min. :2.853 Min. :0.0000
## 1st Qu.: 39.75 Class :character 1st Qu.:4.545 1st Qu.:0.6028
## Median : 78.50 Mode :character Median :5.380 Median :0.9600
## Mean : 78.50 Mean :5.407 Mean :0.9051
## 3rd Qu.:117.25 3rd Qu.:6.184 3rd Qu.:1.2325
## Max. :156.00 Max. :7.769 Max. :1.6840
## Social.support Life Expectancy Freedom Generosity
## Min. :0.000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.056 1st Qu.:0.5477 1st Qu.:0.3080 1st Qu.:0.1087
## Median :1.272 Median :0.7890 Median :0.4170 Median :0.1775
## Mean :1.209 Mean :0.7252 Mean :0.3926 Mean :0.1848
## 3rd Qu.:1.452 3rd Qu.:0.8818 3rd Qu.:0.5072 3rd Qu.:0.2482
## Max. :1.624 Max. :1.1410 Max. :0.6310 Max. :0.5660
## Perceptions.of.corruption
## Min. :0.0000
## 1st Qu.:0.0470
## Median :0.0855
## Mean :0.1106
## 3rd Qu.:0.1412
## Max. :0.4530
d <- melt(df_num)
## No id variables; using all as measure variables
ggplot(d,aes(x = value)) +
facet_wrap(~variable,scales = "free_x") +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## PAIR PLOT
pairs.panels(df_num,
method = "pearson", # correlation method
hist.col = "#00AFBB",
density = TRUE, # show density plots
ellipses = TRUE # show correlation ellipses
)
# SELEKSI KOLOM
df$Rank<- NULL
df$Score<- NULL
df$year<- NULL
head(df)
## Country GDP.per.capita Social.support Life Expectancy Freedom Generosity
## 1 Finland 1.340 1.587 0.986 0.596 0.153
## 2 Denmark 1.383 1.573 0.996 0.592 0.252
## 3 Norway 1.488 1.582 1.028 0.603 0.271
## 4 Iceland 1.380 1.624 1.026 0.591 0.354
## 5 Netherlands 1.396 1.522 0.999 0.557 0.322
## 6 Switzerland 1.452 1.526 1.052 0.572 0.263
## Perceptions.of.corruption
## 1 0.393
## 2 0.410
## 3 0.341
## 4 0.118
## 5 0.298
## 6 0.343
# MENGHAPUS OUTLIER
is_outlier <- function(x, na.rm = FALSE) {
qs = quantile(x, probs = c(0.25, 0.75), na.rm = na.rm)
lowerq <- qs[1]
upperq <- qs[2]
iqr = upperq - lowerq
extreme.threshold.upper = (iqr * 3) + upperq
extreme.threshold.lower = lowerq - (iqr * 3)
# Return logical vector
x > extreme.threshold.upper | x < extreme.threshold.lower
}
remove_outliers <- function(df, cols = names(df)) {
for (col in cols) {
cat("Removing outliers in column: ", col, " \n")
df<- df[!is_outlier(df[[col]]),]
}
df
}
vars_of_interest <- c("Social.support", "Life.Expectancy", "Freedom", "Generosity","Perceptions")
df_filtered <- remove_outliers(df, vars_of_interest)
## Removing outliers in column: Social.support
## Removing outliers in column: Life.Expectancy
## Removing outliers in column: Freedom
## Removing outliers in column: Generosity
## Removing outliers in column: Perceptions
boxplot(df_filtered)
# NORMALISASI DATA
unit_length <- function(x) {
x / sqrt(sum(x^2))
}
unit_length_df <- as.data.frame(lapply(df_num, unit_length))
head(unit_length_df)
## Rank Score GDP.per.capita Social.support Life.Expectancy
## 1 0.0008846904 0.1126892 0.1085420 0.10205270 0.1032818
## 2 0.0017693809 0.1102379 0.1120250 0.10115243 0.1043293
## 3 0.0026540713 0.1095707 0.1205302 0.10173118 0.1076813
## 4 0.0035387617 0.1087004 0.1117820 0.10443200 0.1074718
## 5 0.0044234522 0.1086133 0.1130781 0.09787285 0.1046436
## 6 0.0053081426 0.1084973 0.1176141 0.09813007 0.1101952
## Freedom Generosity Perceptions.of.corruption
## 1 0.1142277 0.05894826 0.21654835
## 2 0.1134611 0.09709125 0.22591558
## 3 0.1155693 0.10441162 0.18789564
## 4 0.1132694 0.13639009 0.06501961
## 5 0.1067531 0.12406104 0.16420206
## 6 0.1096279 0.10132936 0.18899767
library(plotly)
fig <- plot_ly(x = df['Country'],
y = df['GDP.per.capita'],
name = "GDP per capita",
line = list(color = 'rgb(22, 96, 167)'),
marker = list(color = 'rgb(22, 96, 167)'),
type = 'scatter', mode = 'lines+markers')
fig <- fig %>% add_trace (x = df['Country'],
y = df['Social.support'],
name = "Social support",
line = list(color = 'rgb(96, 167, 22)'),
marker = list(color = 'rgb(96, 167, 22)'),
type = 'scatter', mode = 'lines+markers')
fig <- fig %>% add_trace(x = df['Country'],
y = df['Freedom'],
name = "Freedom",
line = list(color = 'rgb(255, 0, 0)'),
marker = list(color = 'rgb(255, 0, 0)'),
type = 'scatter', mode = 'lines+markers')
fig <- fig %>% add_trace(x = df['Country'],
y = df['Generosity'],
name = "Generosity",
line = list(color = 'rgb(255, 255, 0)'),
marker = list(color = 'rgb(255, 255, 0)'),
type = 'scatter', mode = 'lines+markers')
fig <-fig %>% add_trace(x = df['Country'],
y = df['Perceptions.of.corruption'],
name = "Perceptions of corruption",
line = list(color = 'rgb(0, 0, 128)'),
marker = list(color = 'rgb(0, 0, 128)'),
type = 'scatter', mode = 'lines+markers')
fig <- fig %>% layout(title = "Faktor Yang Mempengaruhi Tingkat Kebahagiaan di Dunia",
xaxis = list(title = "Country"),
yaxis = list (title = "Happiness"))
fig