Task: Retrieve GDP per capita and population data from the World Bank (WB) database, and healthy life expectancy (HALE) data from the WHO. Join the two tables to describe the relationship between GDP per capita and healthy life expectancy.

# Get population, GDP, and GDP per capita data from the WB
library(WDI)
library(tidyverse)
library(ggrepel)
library(ggthemes)
mydf <- WDI(country = "all", 
            start = 2015, 
            end = 2015, 
            cache = NULL, 
            indicator = c("SP.POP.TOTL", 
                          # Total GDP (current US$): 
                          "NY.GDP.MKTP.CD", 
                          # Life expectancy at birth: 
                          "SP.DYN.LE00.IN", 
                          # GDP per capita, PPP (current international $): 
                          "NY.GDP.PCAP.PP.CD",
                          # Fertility rate (births per woman): 
                          "SP.DYN.TFRT.IN",
                          # Poverty rate: 
                          "SI.POV.2DAY"))

d <- WDIcache() 
d2 <- data.frame(d[[2]])
# Keep only the year 2015: 
nam2015 <- mydf %>% filter(year == 2015)
# mydf identifies countries by iso2c codes, while d2 carries both iso2c and iso3c codes. Both data frames share the variable country, so they can be joined on it. The WHO data identifies countries by iso3c codes. To join the WB and WHO data we need a common key, here iso3c, so mydf must first gain an iso3c column.
# Inspect the structure of mydf and d2
str(mydf) # country in mydf is of type chr
## 'data.frame':    264 obs. of  9 variables:
##  $ iso2c            : chr  "1A" "1W" "4E" "7E" ...
##  $ country          : chr  "Arab World" "World" "East Asia & Pacific (excluding high income)" "Europe & Central Asia (excluding high income)" ...
##  $ year             : num  2015 2015 2015 2015 2015 ...
##  $ SP.POP.TOTL      : num  3.92e+08 7.35e+09 2.04e+09 4.11e+08 1.74e+09 ...
##  $ NY.GDP.MKTP.CD   : num  2.56e+12 7.43e+13 1.33e+13 2.93e+12 2.68e+12 ...
##  $ SP.DYN.LE00.IN   : num  70.8 71.7 74.2 72.2 68.4 ...
##  $ NY.GDP.PCAP.PP.CD: num  16452 15691 12968 18441 5664 ...
##  $ SP.DYN.TFRT.IN   : num  3.33 2.45 1.82 1.93 2.52 ...
##  $ SI.POV.2DAY      : num  NA NA NA NA NA NA NA NA NA NA ...
str(d2) # country in d2 is a Factor
## 'data.frame':    304 obs. of  9 variables:
##  $ iso3c    : Factor w/ 304 levels "ABW","AFG","AFR",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ iso2c    : Factor w/ 304 levels "1A","1W","4E",..: 25 16 13 20 18 14 147 1 15 21 ...
##  $ country  : Factor w/ 304 levels "Afghanistan",..: 13 1 2 8 3 7 6 10 290 11 ...
##  $ region   : Factor w/ 8 levels "Aggregates","East Asia & Pacific",..: 4 7 1 8 3 3 1 1 5 4 ...
##  $ capital  : Factor w/ 212 levels "","Abu Dhabi",..: 134 80 1 100 191 10 1 1 2 43 ...
##  $ longitude: Factor w/ 212 levels "","-0.126236",..: 47 195 1 94 126 70 1 1 187 27 ...
##  $ latitude : Factor w/ 212 levels "","-0.229498",..: 58 122 1 42 146 151 1 1 100 31 ...
##  $ income   : Factor w/ 5 levels "Aggregates","High income",..: 2 3 1 5 5 2 1 1 2 5 ...
##  $ lending  : Factor w/ 5 levels "Aggregates","Blend",..: 5 4 1 3 3 5 1 1 5 3 ...
# To join the two data frames, the shared key country must have the same type, so convert country in d2 to chr
d2$country <- as.character(d2$country)
# Join the two data frames mydf and d2
wbdata <- right_join(mydf, d2, by = "country")
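The join above can be sketched on toy data. The tables and values below are illustrative stand-ins for mydf and d2, not real WDI output. The sketch shows two effects that appear in wbdata later on: right_join keeps every row of the right-hand table and fills unmatched rows with NA, and a non-key column present in both tables (here iso2c) is split into .x and .y versions. Joining on iso2c instead is one possible way to avoid the suffixes.

```r
library(dplyr)

# Toy stand-ins for mydf and d2 (illustrative values, not real WDI data)
indicators <- data.frame(
  iso2c   = c("AF", "AO"),
  country = c("Afghanistan", "Angola"),
  pop     = c(100, 200),
  stringsAsFactors = FALSE
)
meta <- data.frame(
  iso3c   = c("AFG", "AGO", "AFR"),
  iso2c   = c("AF", "AO", "A9"),
  country = c("Afghanistan", "Angola", "Africa"),
  stringsAsFactors = FALSE
)

# Join on country: all rows of meta are kept, and iso2c is duplicated
by_name <- right_join(indicators, meta, by = "country")
names(by_name)

# Join on iso2c instead: no duplicated iso2c column
by_code <- right_join(select(indicators, -country), meta, by = "iso2c")
names(by_code)
```

In both results the row for "Africa" has NA in pop, because it exists only in the right-hand table.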
## Get the Healthy life expectancy data from the WHO
path <- "http://apps.who.int/gho/athena/data/GHO/WHOSIS_000002,WHOSIS_000007?filter=COUNTRY:*&x-sideaxis=COUNTRY&x-topaxis=GHO;SEX;YEAR&profile=verbose&format=csv"
whodata <- read.csv(path)
# Inspect the data just downloaded
dim(whodata)
## [1] 2196   23
str(whodata)
## 'data.frame':    2196 obs. of  23 variables:
##  $ GHO..CODE.            : Factor w/ 2 levels "WHOSIS_000002",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ GHO..DISPLAY.         : Factor w/ 2 levels "Healthy life expectancy (HALE) at age 60 (years)",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ GHO..URL.             : Factor w/ 1 level "http://apps.who.int/gho/indicatorregistry/App_Main/view_indicator.aspx?iid=66": 1 1 1 1 1 1 1 1 1 1 ...
##  $ PUBLISHSTATE..CODE.   : Factor w/ 1 level "PUBLISHED": 1 1 1 1 1 1 1 1 1 1 ...
##  $ PUBLISHSTATE..DISPLAY.: Factor w/ 1 level "Published": 1 1 1 1 1 1 1 1 1 1 ...
##  $ PUBLISHSTATE..URL.    : logi  NA NA NA NA NA NA ...
##  $ YEAR..CODE.           : int  2000 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##  $ YEAR..DISPLAY.        : int  2000 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##  $ YEAR..URL.            : logi  NA NA NA NA NA NA ...
##  $ REGION..CODE.         : Factor w/ 6 levels "AFR","AMR","EMR",..: 3 1 5 5 4 4 4 3 3 2 ...
##  $ REGION..DISPLAY.      : Factor w/ 6 levels "Africa","Americas",..: 3 1 5 5 4 4 4 3 3 2 ...
##  $ REGION..URL.          : logi  NA NA NA NA NA NA ...
##  $ COUNTRY..CODE.        : Factor w/ 183 levels "AFG","AGO","ALB",..: 4 14 15 15 16 16 16 17 17 18 ...
##  $ COUNTRY..DISPLAY.     : Factor w/ 183 levels "Afghanistan",..: 172 26 13 13 25 25 25 12 12 11 ...
##  $ COUNTRY..URL.         : logi  NA NA NA NA NA NA ...
##  $ SEX..CODE.            : Factor w/ 3 levels "BTSX","FMLE",..: 3 1 2 1 3 2 1 3 2 3 ...
##  $ SEX..DISPLAY.         : Factor w/ 3 levels "Both sexes","Female",..: 3 1 2 1 3 2 1 3 2 3 ...
##  $ SEX..URL.             : logi  NA NA NA NA NA NA ...
##  $ Display.Value         : num  64.8 52.6 62.9 62.4 63.8 69.2 66.4 66.9 67 64.8 ...
##  $ Numeric               : num  64.8 52.6 62.9 62.4 63.8 ...
##  $ Low                   : logi  NA NA NA NA NA NA ...
##  $ High                  : logi  NA NA NA NA NA NA ...
##  $ Comments              : logi  NA NA NA NA NA NA ...
View(whodata)
# This data covers more than one year (the str() output above shows 2000 and 2015) and three sex categories (male, female, both sexes). Here we only need the 2015 figures for both sexes, so filter the data accordingly.
whodata <- whodata %>% filter(YEAR..CODE. == 2015, SEX..CODE. == "BTSX")


# The two data frames to join, wbdata and whodata, both carry iso3c country codes, but in whodata the column is named COUNTRY..CODE., so rename it to iso3c to match wbdata
whodata <- whodata %>% rename(iso3c = COUNTRY..CODE.)
# Inspect the structure of whodata
str(whodata)
## 'data.frame':    366 obs. of  23 variables:
##  $ GHO..CODE.            : Factor w/ 2 levels "WHOSIS_000002",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ GHO..DISPLAY.         : Factor w/ 2 levels "Healthy life expectancy (HALE) at age 60 (years)",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ GHO..URL.             : Factor w/ 1 level "http://apps.who.int/gho/indicatorregistry/App_Main/view_indicator.aspx?iid=66": 1 1 1 1 1 1 1 1 1 1 ...
##  $ PUBLISHSTATE..CODE.   : Factor w/ 1 level "PUBLISHED": 1 1 1 1 1 1 1 1 1 1 ...
##  $ PUBLISHSTATE..DISPLAY.: Factor w/ 1 level "Published": 1 1 1 1 1 1 1 1 1 1 ...
##  $ PUBLISHSTATE..URL.    : logi  NA NA NA NA NA NA ...
##  $ YEAR..CODE.           : int  2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##  $ YEAR..DISPLAY.        : int  2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##  $ YEAR..URL.            : logi  NA NA NA NA NA NA ...
##  $ REGION..CODE.         : Factor w/ 6 levels "AFR","AMR","EMR",..: 1 5 4 1 1 6 1 4 4 1 ...
##  $ REGION..DISPLAY.      : Factor w/ 6 levels "Africa","Americas",..: 1 5 4 1 1 6 1 4 4 1 ...
##  $ REGION..URL.          : logi  NA NA NA NA NA NA ...
##  $ iso3c                 : Factor w/ 183 levels "AFG","AGO","ALB",..: 14 15 16 34 35 58 59 80 81 99 ...
##  $ COUNTRY..DISPLAY.     : Factor w/ 183 levels "Afghanistan",..: 26 13 25 30 46 107 60 74 80 93 ...
##  $ COUNTRY..URL.         : logi  NA NA NA NA NA NA ...
##  $ SEX..CODE.            : Factor w/ 3 levels "BTSX","FMLE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ SEX..DISPLAY.         : Factor w/ 3 levels "Both sexes","Female",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ SEX..URL.             : logi  NA NA NA NA NA NA ...
##  $ Display.Value         : num  52.6 62.4 66.4 50.3 51.8 62.5 57.2 72.7 72.8 46.6 ...
##  $ Numeric               : num  52.6 62.4 66.4 50.3 51.8 ...
##  $ Low                   : logi  NA NA NA NA NA NA ...
##  $ High                  : logi  NA NA NA NA NA NA ...
##  $ Comments              : logi  NA NA NA NA NA NA ...
str(wbdata)
## 'data.frame':    304 obs. of  17 variables:
##  $ iso2c.x          : chr  "AW" "AF" NA "AO" ...
##  $ country          : chr  "Aruba" "Afghanistan" "Africa" "Angola" ...
##  $ year             : num  2015 2015 NA 2015 2015 ...
##  $ SP.POP.TOTL      : num  103889 32526562 NA 25021974 2889167 ...
##  $ NY.GDP.MKTP.CD   : num  NA 1.93e+10 NA 1.03e+11 1.14e+10 ...
##  $ SP.DYN.LE00.IN   : num  75.6 60.7 NA 52.7 78 ...
##  $ NY.GDP.PCAP.PP.CD: num  NA 1925 NA 7387 11479 ...
##  $ SP.DYN.TFRT.IN   : num  1.65 4.65 NA 6 1.79 ...
##  $ SI.POV.2DAY      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ iso3c            : Factor w/ 304 levels "ABW","AFG","AFR",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ iso2c.y          : Factor w/ 304 levels "1A","1W","4E",..: 25 16 13 20 18 14 147 1 15 21 ...
##  $ region           : Factor w/ 8 levels "Aggregates","East Asia & Pacific",..: 4 7 1 8 3 3 1 1 5 4 ...
##  $ capital          : Factor w/ 212 levels "","Abu Dhabi",..: 134 80 1 100 191 10 1 1 2 43 ...
##  $ longitude        : Factor w/ 212 levels "","-0.126236",..: 47 195 1 94 126 70 1 1 187 27 ...
##  $ latitude         : Factor w/ 212 levels "","-0.229498",..: 58 122 1 42 146 151 1 1 100 31 ...
##  $ income           : Factor w/ 5 levels "Aggregates","High income",..: 2 3 1 5 5 2 1 1 2 5 ...
##  $ lending          : Factor w/ 5 levels "Aggregates","Blend",..: 5 4 1 3 3 5 1 1 5 3 ...
View(whodata)
View(wbdata)
# iso3c has the same type (factor) in both data frames, so join them
wbwhodata <- right_join(whodata, wbdata, by = "iso3c")
View(wbwhodata)
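Although iso3c is a factor in both inputs, the two factors have different level sets (183 versus 304 levels), and the str(wbwhodata) output further down shows the key as chr after the join. How factor keys are combined may depend on the dplyr version, so one option is to convert the key to character on both sides before joining. A minimal sketch, with small toy tables standing in for whodata and wbdata:

```r
library(dplyr)

# Toy stand-ins for whodata and wbdata, with factor keys of different levels
who_toy <- data.frame(iso3c = factor(c("AFG", "AGO")),
                      hale  = c(52.3, 45.9))
wb_toy  <- data.frame(iso3c  = factor(c("ABW", "AFG", "AGO")),
                      gdp_pc = c(NA, 1925, 7387))

# Make the key type explicit on both sides
who_toy <- mutate(who_toy, iso3c = as.character(iso3c))
wb_toy  <- mutate(wb_toy,  iso3c = as.character(iso3c))

joined <- right_join(who_toy, wb_toy, by = "iso3c")
str(joined)
```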
# Draw a simple plot
library(ggplot2)
p <- wbwhodata %>% 
  ggplot(aes(NY.GDP.PCAP.PP.CD, Display.Value, size = SP.POP.TOTL, colour = income)) +
  geom_point(alpha = 0.4, show.legend = FALSE) + 
  scale_size(range = c(1, 20))
p

## The Healthy life expectancy data looks unreliable: a group of countries at the bottom of the plot has fewer than 20, or even fewer than 10, healthy years, including middle-income (purple) and high-income (yellow) countries. What is going on here?

The issue is that the HALE data contains two indicators: at birth, the number of healthy years counted from birth, and at age 60, the number of healthy years remaining from age 60 onward. The plot above is therefore misleading unless it makes clear which points show HALE at birth and which show HALE at age 60. Below we filter the data and redraw the plot for HALE at birth only.
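A quick way to spot this kind of problem is to count rows per country before plotting. The sketch below uses a four-row toy table in the shape of the joined data (the column names match whodata; the two countries and their values are taken from the str() output in this document) to show the check and the filter.

```r
library(dplyr)

# Toy table: each country appears once per HALE indicator
toy <- data.frame(
  iso3c = c("AFG", "AFG", "AGO", "AGO"),
  GHO..DISPLAY. = c(
    "Healthy life expectancy (HALE) at birth (years)",
    "Healthy life expectancy (HALE) at age 60 (years)",
    "Healthy life expectancy (HALE) at birth (years)",
    "Healthy life expectancy (HALE) at age 60 (years)"
  ),
  Display.Value = c(52.3, 11.3, 45.9, 11.7),
  stringsAsFactors = FALSE
)

# More than one row per country signals mixed indicators
rows_per_country <- count(toy, iso3c)

# Keep only the at-birth indicator
at_birth <- filter(toy, grepl("at birth", GHO..DISPLAY., fixed = TRUE))
```

After filtering, each country contributes a single point, and the implausibly low values (the at-age-60 rows) are gone.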

str(wbwhodata)
## 'data.frame':    487 obs. of  39 variables:
##  $ GHO..CODE.            : Factor w/ 2 levels "WHOSIS_000002",..: NA 1 2 NA 1 2 1 2 NA NA ...
##  $ GHO..DISPLAY.         : Factor w/ 2 levels "Healthy life expectancy (HALE) at age 60 (years)",..: NA 2 1 NA 2 1 2 1 NA NA ...
##  $ GHO..URL.             : Factor w/ 1 level "http://apps.who.int/gho/indicatorregistry/App_Main/view_indicator.aspx?iid=66": NA 1 1 NA 1 1 1 1 NA NA ...
##  $ PUBLISHSTATE..CODE.   : Factor w/ 1 level "PUBLISHED": NA 1 1 NA 1 1 1 1 NA NA ...
##  $ PUBLISHSTATE..DISPLAY.: Factor w/ 1 level "Published": NA 1 1 NA 1 1 1 1 NA NA ...
##  $ PUBLISHSTATE..URL.    : logi  NA NA NA NA NA NA ...
##  $ YEAR..CODE.           : int  NA 2015 2015 NA 2015 2015 2015 2015 NA NA ...
##  $ YEAR..DISPLAY.        : int  NA 2015 2015 NA 2015 2015 2015 2015 NA NA ...
##  $ YEAR..URL.            : logi  NA NA NA NA NA NA ...
##  $ REGION..CODE.         : Factor w/ 6 levels "AFR","AMR","EMR",..: NA 3 3 NA 1 1 4 4 NA NA ...
##  $ REGION..DISPLAY.      : Factor w/ 6 levels "Africa","Americas",..: NA 3 3 NA 1 1 4 4 NA NA ...
##  $ REGION..URL.          : logi  NA NA NA NA NA NA ...
##  $ iso3c                 : chr  "ABW" "AFG" "AFG" "AFR" ...
##  $ COUNTRY..DISPLAY.     : Factor w/ 183 levels "Afghanistan",..: NA 1 1 NA 4 4 2 2 NA NA ...
##  $ COUNTRY..URL.         : logi  NA NA NA NA NA NA ...
##  $ SEX..CODE.            : Factor w/ 3 levels "BTSX","FMLE",..: NA 1 1 NA 1 1 1 1 NA NA ...
##  $ SEX..DISPLAY.         : Factor w/ 3 levels "Both sexes","Female",..: NA 1 1 NA 1 1 1 1 NA NA ...
##  $ SEX..URL.             : logi  NA NA NA NA NA NA ...
##  $ Display.Value         : num  NA 52.3 11.3 NA 45.9 11.7 68.8 16.9 NA NA ...
##  $ Numeric               : num  NA 52.3 11.3 NA 45.9 ...
##  $ Low                   : logi  NA NA NA NA NA NA ...
##  $ High                  : logi  NA NA NA NA NA NA ...
##  $ Comments              : logi  NA NA NA NA NA NA ...
##  $ iso2c.x               : chr  "AW" "AF" "AF" NA ...
##  $ country               : chr  "Aruba" "Afghanistan" "Afghanistan" "Africa" ...
##  $ year                  : num  2015 2015 2015 NA 2015 ...
##  $ SP.POP.TOTL           : num  103889 32526562 32526562 NA 25021974 ...
##  $ NY.GDP.MKTP.CD        : num  NA 1.93e+10 1.93e+10 NA 1.03e+11 ...
##  $ SP.DYN.LE00.IN        : num  75.6 60.7 60.7 NA 52.7 ...
##  $ NY.GDP.PCAP.PP.CD     : num  NA 1925 1925 NA 7387 ...
##  $ SP.DYN.TFRT.IN        : num  1.65 4.65 4.65 NA 6 ...
##  $ SI.POV.2DAY           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ iso2c.y               : Factor w/ 304 levels "1A","1W","4E",..: 25 16 16 13 20 20 18 18 14 147 ...
##  $ region                : Factor w/ 8 levels "Aggregates","East Asia & Pacific",..: 4 7 7 1 8 8 3 3 3 1 ...
##  $ capital               : Factor w/ 212 levels "","Abu Dhabi",..: 134 80 80 1 100 100 191 191 10 1 ...
##  $ longitude             : Factor w/ 212 levels "","-0.126236",..: 47 195 195 1 94 94 126 126 70 1 ...
##  $ latitude              : Factor w/ 212 levels "","-0.229498",..: 58 122 122 1 42 42 146 146 151 1 ...
##  $ income                : Factor w/ 5 levels "Aggregates","High income",..: 2 3 3 1 5 5 5 5 2 1 ...
##  $ lending               : Factor w/ 5 levels "Aggregates","Blend",..: 5 4 4 1 3 3 3 3 5 1 ...
library(tidyverse)
# Keep only the rows for HALE at birth
wbwhodata1 <- filter(wbwhodata, GHO..DISPLAY. == "Healthy life expectancy (HALE) at birth (years)")
library(ggplot2)
library(ggrepel)
note <- c("Vietnam", "China", "India", "Japan", "Thailand", 
          "Philippines", "United States", "Indonesia", "Malaysia", "France", "Ukraine", 
          "Singapore", "Spain", "Australia", "Russian Federation", "Sweden")
p <- wbwhodata1 %>% 
  ggplot(aes(NY.GDP.PCAP.PP.CD, Display.Value, size = SP.POP.TOTL, colour = income)) +
  geom_point(alpha = 0.4, show.legend = FALSE) + 
  scale_size(range = c(1, 30)) +
  labs(x = "GDP per capita", 
       y = "Healthy Life expectancy at birth", 
       title = "The relationship between Healthy Life Expectancy and GDP per capita", 
       caption = "Data Source: The World Bank and WHO")

p

library(plotly)
ggplotly(p)
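The note vector and the ggrepel package are set up above but not used in the plot. One plausible use, shown here only as a sketch, is to label the selected countries with geom_text_repel. The toy data frame, its values, and the names toy, labelled, and p_toy are hypothetical.

```r
library(ggplot2)
library(ggrepel)

# Made-up values for illustration only
toy <- data.frame(
  country = c("Vietnam", "Japan", "Sweden", "Chad"),
  gdp_pc  = c(6000, 40000, 48000, 2000),
  hale    = c(67, 75, 72, 46),
  stringsAsFactors = FALSE
)
note <- c("Vietnam", "Japan", "Sweden")

# Label only the countries listed in note
labelled <- subset(toy, country %in% note)

p_toy <- ggplot(toy, aes(gdp_pc, hale)) +
  geom_point(alpha = 0.4) +
  geom_text_repel(data = labelled, aes(label = country))
p_toy
```

Passing a filtered data frame to the label layer keeps every point in the scatter while labelling only the chosen subset.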