Stat 579 - Homework #8

Human Development Index

The United Nations Development Programme publishes an annual report on human development across the globe. An interactive tool to visualize some of the data is available at http://hdr.undp.org/en/data

2.

Download the data for 2018 from the left side tab stating Download all 2018 Human Development data. This will lead you to an Excel document. Open it and comment on the structure. Focus on table 2: what problems do you see with the file format? List three problems and explain.

The Table 2 contains Countries and their HDI overal rank as well as their HDI over time and changes in their classification. One main problem is that the data is messy and the countries with the same overal Classification are in the same category, for example Very High HDI , High HDI, etc. ;However, their classification over time has not been considered. For instance, Ireland belongs to High HDI in 1990 and to Very High HDI in 2017, but it is in Very HDI category.

3.

There is an R package available online (on CRAN) that allows to read Excel files. Find the package, (install it if necessary), load it and try to read the second table of the file “2018_statistical_annex_all.xlsx” into R (make use of the available parameters). Compare the result in R with what is visible in Excel. Keep track of and comment on all steps in the process. For the write-up assume, that the Excel file is in the same folder as the homework Rmd file. Also assume, that both the TA and I have an updated version of the package installed, i.e. only load the package but do not install it in the code.

library(readxl)
d<- read_excel(path = "2018_statistical_annex_all.xlsx", sheet = "Table 2",
               range = cell_limits(c(4, "A"), c(205,"Q")))%>%
  select("Country",  "Value", "X__2", "X__4", "X__6", "X__8", "X__10", "X__12",
         "X__14" )

## Warning in cell_limits(c(4, "A"), c(205, "Q")): NAs introduced by coercion

## Warning in cell_limits(c(4, "A"), c(205, "Q")): NAs introduced by coercion

d%>%head()

## # A tibble: 6 x 9
##   Country    Value   X__2    X__4    X__6    X__8    X__10   X__12  X__14 
##   <chr>      <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  <chr> 
## 1 <NA>       1990    2000    2010    2012    2014    2015    2016   2017  
## 2 VERY HIGH~ <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>   <NA>  
## 3 Norway     0.8500~ 0.9166~ 0.9415~ 0.9421~ 0.9456~ 0.9480~ 0.951~ 0.952~
## 4 Switzerla~ 0.8319~ 0.8890~ 0.9319~ 0.9351~ 0.9393~ 0.9418~ 0.943~ 0.943~
## 5 Australia  0.8662~ 0.8982~ 0.9230~ 0.9287~ 0.9332~ 0.9356~ 0.937~ 0.938~
## 6 Ireland    0.7634~ 0.8566~ 0.9094~ 0.9017~ 0.9211~ 0.9292~ 0.934~ 0.938~

In reading the excel file into R, the difference is that any empty column in the excel sheet is reported as NA in R output. HDI are considred characters in this data set. Column names are different from the excel file and it is difficult to read them in R.

4.

Introduce names to the data extracted for Table 2.

names(d)<-c("Country", 1990, 2000, 2010, 2012, 2014, 2015,2016,
            "2017")

d%>% head()

## # A tibble: 6 x 9
##   Country   `1990`  `2000`  `2010`  `2012`  `2014`  `2015`  `2016`  `2017`
##   <chr>     <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr> 
## 1 <NA>      1990    2000    2010    2012    2014    2015    2016    2017  
## 2 VERY HIG~ <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>  
## 3 Norway    0.8500~ 0.9166~ 0.9415~ 0.9421~ 0.9456~ 0.9480~ 0.9512~ 0.952~
## 4 Switzerl~ 0.8319~ 0.8890~ 0.9319~ 0.9351~ 0.9393~ 0.9418~ 0.9431~ 0.943~
## 5 Australia 0.8662~ 0.8982~ 0.9230~ 0.9287~ 0.9332~ 0.9356~ 0.9378~ 0.938~
## 6 Ireland   0.7634~ 0.8566~ 0.9094~ 0.9017~ 0.9211~ 0.9292~ 0.9344~ 0.938~

5.

Use tidyr and dplyr functionality to get the data set into the following format (Classification of HDI is very high for [0.8,1], high for [0.7, .8), medium for [.55, .7) and low below .55):

d<-d[-c(1,2, 62,116,156,195),]

d<-d%>%
  gather(key=Year, value = HDI, 2:9 )
#make classification
Classification<-d$HDI
Classification[(Classification<1) & (Classification >=0.8)]<-"very high"
Classification[(Classification<.8) & (Classification >=0.7)]<-"high"
Classification[(Classification<.7) & (Classification >=0.55)]<-"medium"
Classification[Classification<0.55]<-"low"

d<-d%>%
  mutate(Classification= as.factor(Classification),
         HDI = as.numeric(HDI))

## Warning in evalq(as.numeric(HDI), <environment>): NAs introduced by
## coercion

d%>% head()

## # A tibble: 6 x 4
##   Country     Year    HDI Classification
##   <chr>       <chr> <dbl> <fct>         
## 1 Norway      1990  0.850 very high     
## 2 Switzerland 1990  0.832 very high     
## 3 Australia   1990  0.866 very high     
## 4 Ireland     1990  0.763 high          
## 5 Germany     1990  0.801 very high     
## 6 Iceland     1990  0.802 very high

6.

Use ggplot to draw a plot of HDI over time for the US, Norway, Germany, and Canada.

d%>%
  filter(Country %in% c( "United States", "Norway", "Germany", "Canada"))%>%
  ggplot(aes(x = `Year`, y = `HDI`, color=Country, group= Country))+
  geom_point()+
  geom_line()

7.

How many countries switched classifications between 2000 and 2017? How many countries went up, how many countries went down?

d1<-d%>%
  filter(Year %in% c(2000, 2017))%>%
  select(-HDI)%>%
  spread(key = Year, value = Classification)%>%
  filter(!`2000`== `2017`)
d1%>%head()

## # A tibble: 6 x 3
##   Country             `2000` `2017`   
##   <chr>               <fct>  <fct>    
## 1 Albania             medium high     
## 2 Algeria             medium high     
## 3 Andorra             high   very high
## 4 Angola              low    medium   
## 5 Antigua and Barbuda low    high     
## 6 Argentina           high   very high

d1%>%tally()

## # A tibble: 1 x 1
##       n
##   <int>
## 1    95

Ther are 95 countries that switched classification between the years 2000 and 2017

d%>%
  filter(Year %in% c(2000, 2017))%>%
  select(-HDI)%>%
  spread(key = Year, value = Classification)%>%
  filter(!`2000`== `2017`)%>%
  filter(`2000` %in% c("very high", "high", "medium") &
           !`2017` %in% c("very high", "high", "medium"))

## # A tibble: 1 x 3
##   Country              `2000` `2017`
##   <chr>                <fct>  <fct> 
## 1 Syrian Arab Republic medium low

There is only one country which went down from the Medium classification for HDI to Low classification.

d2<-d%>%
  filter(Year %in% c(2000, 2017))%>%
  select(-HDI)%>%
  spread(key = Year, value = Classification)%>%
  filter(!`2000`== `2017`)%>%
  filter(`2000` %in% c("low", "medium", "high") &
           `2017` %in% c( "medium", "high", "very high"))
d2%>%head()

## # A tibble: 6 x 3
##   Country             `2000` `2017`   
##   <chr>               <fct>  <fct>    
## 1 Albania             medium high     
## 2 Algeria             medium high     
## 3 Andorra             high   very high
## 4 Angola              low    medium   
## 5 Antigua and Barbuda low    high     
## 6 Argentina           high   very high

d2%>%tally()

## # A tibble: 1 x 1
##       n
##   <int>
## 1    94

94 countries went up in HDI classification between 2000 and 2017.

Stat 579 - Homework #8

Amin Shirazi

10/24/2018

Human Development Index

2.

3.

4.

5.

6.

7.