This dataset was posted by a classmate, Erinda Budo. It contains annual unemployment rates for 84 series from 1990 through 2017: 72 countries plus 12 regional, income-group, and world aggregates. The analysis below investigates the annual unemployment rates from 2011 to 2015 to answer the following questions:

(1) For the five-year period from 2011 to 2015, what is the average annual unemployment rate for each country?
(2) For the five-year period from 2011 to 2015, what is the distribution of the average annual unemployment rate?
(3) For the five-year period from 2011 to 2015, what is the overall trend of the world’s annual unemployment rate?

Load the original data file from GitHub.

UnemploymentRate <- read.csv("https://raw.githubusercontent.com/SieSiongWong/DATA-607/master/UnemploymentRate.csv", header = TRUE, sep = ",")

Load the packages required to tidy and transform the data.

library(dplyr)
library(tidyr)
library(ggplot2)
library(reshape2)
library(stringr)

Review the dataset.

head(UnemploymentRate)
##      X Advanced.Economies Argentina Australia  Austria  Belgium Bulgaria
## 1 1990           5.800582        NA  6.943297 5.373002 6.550260       NA
## 2 1991           6.728688        NA  9.614137 5.823096 6.439812       NA
## 3 1992           7.511064        NA 10.750080 5.941711 7.088092 13.23500
## 4 1993           7.936175        NA 10.866170 6.811381 8.619130 15.85583
## 5 1994           7.715897        NA  9.705695 6.545480 9.753554 14.06583
## 6 1995           7.264255        NA  8.471058 6.589767 9.674164 11.38583
##   Bahrain Belarus Brazil    Canada Switzerland Chile China Colombia Cyprus
## 1      NA      NA     NA  8.150000    0.501328    NA    NA       NA     NA
## 2      NA      NA     NA 10.316670    1.090451    NA    NA       NA     NA
## 3      NA      NA     NA 11.216670    2.563105    NA    NA       NA     NA
## 4      NA      NA     NA 11.375000    4.516116    NA    NA       NA     NA
## 5      NA      NA     NA 10.391670    4.718465    NA    NA       NA     NA
## 6      NA      NA     NA  9.466667    4.232892    NA    NA       NA     NA
##   Czech.Republic  Germany Denmark Dominican.Republic Algeria
## 1             NA       NA      NA                 NA    25.0
## 2             NA 4.864885      NA                 NA    25.0
## 3             NA 5.764563      NA                 NA    27.0
## 4       4.333333 6.931370      NA                 NA    23.2
## 5       4.283333 7.340639      NA                 NA    24.4
## 6       4.033333 7.091997      NA                 NA    28.1
##   EMDE.East.Asia...Pacific EMDE.Europe...Central.Asia Ecuador
## 1                       NA                         NA      NA
## 2                       NA                         NA      NA
## 3                       NA                         NA      NA
## 4                       NA                         NA      NA
## 5                       NA                         NA      NA
## 6                       NA                   9.683161      NA
##   Egypt..Arab.Rep. Emerging.Market.and.Developing.Economies..EMDEs.
## 1               NA                                               NA
## 2               NA                                               NA
## 3               NA                                               NA
## 4               NA                                               NA
## 5               NA                                               NA
## 6               NA                                               NA
##      Spain Estonia   Finland France United.Kingdom Greece
## 1 15.48333   0.650  3.103129  7.625       7.091667     NA
## 2 15.51667   1.475  6.666424  7.800       8.825000     NA
## 3 17.06667   3.725 11.796830  8.650       9.966667     NA
## 4 20.83333   6.550 16.384210  9.650      10.400000     NA
## 5 22.05000   7.550 16.534420 10.250       9.500000     NA
## 6 20.79167   9.750 15.426480  9.675       8.658333     NA
##   High.Income.Countries Hong.Kong.SAR..China Croatia Hungary India
## 1              5.619945             1.318868      NA      NA    NA
## 2              6.771918             1.750180      NA      NA    NA
## 3              7.693434             1.946343      NA      NA    NA
## 4              8.192391             1.979785      NA      NA    NA
## 5              8.052012             1.911372      NA      NA    NA
## 6              7.566117             3.040433      NA      NA    NA
##    Ireland Iceland Israel Italy Jordan    Japan Kazakhstan Korea..Rep.
## 1 13.41667      NA     NA    NA     NA 2.108117         NA          NA
## 2 14.73333      NA     NA    NA     NA 2.099018         NA          NA
## 3 15.40000      NA     NA    NA     NA 2.151389         NA          NA
## 4 15.63333      NA     NA    NA   19.7 2.503291         NA          NA
## 5 14.35000      NA     NA    NA   15.8 2.890953         NA          NA
## 6 12.28333      NA     NA    NA   15.3 3.153574         NA          NA
##   EMDE.Latin.America...Caribbean Low.Income.Countries..LIC. Sri.Lanka
## 1                             NA                         NA      15.9
## 2                             NA                         NA      14.7
## 3                             NA                         NA      14.6
## 4                             NA                         NA      13.8
## 5                             NA                         NA      13.1
## 6                             NA                         NA      12.3
##   Lithuania Luxembourg   Latvia Morocco Moldova..Rep. Mexico
## 1        NA         NA       NA      NA            NA     NA
## 2        NA         NA       NA      NA            NA     NA
## 3        NA         NA       NA      NA            NA     NA
## 4  4.191667         NA 4.658333      NA            NA     NA
## 5  3.625000         NA 6.358333      NA            NA     NA
## 6  6.116667   2.600765 6.350000      NA            NA     NA
##   Middle.Income.Countries..MIC. North.Macedonia Malta
## 1                            NA              NA    NA
## 2                            NA              NA    NA
## 3                            NA              NA    NA
## 4                            NA              NA    NA
## 5                            NA              NA    NA
## 6                            NA              NA    NA
##   EMDE.Middle.East...N..Africa Netherlands   Norway New.Zealand Pakistan
## 1                           NA          NA 5.783333    7.984591     3.13
## 2                           NA          NA 6.041667   10.611440     6.28
## 3                           NA          NA 6.550000   10.644730     5.85
## 4                           NA          NA 6.608333    9.800159     4.73
## 5                           NA          NA 6.000000    8.342465     4.84
## 6                           NA          NA 5.441667    6.451948     5.37
##   Peru Philippines    Poland Portugal   Romania Russian.Federation
## 1   NA          NA  3.441667       NA        NA                 NA
## 2   NA      10.475  9.008333       NA        NA                 NA
## 3   NA       9.850 12.933330       NA  5.450000                 NA
## 4   NA       9.350 15.033330       NA  9.208333                 NA
## 5   NA       9.550 16.508330       NA 10.975000           7.006540
## 6   NA       9.500 15.225000 7.150996  9.975000           8.308334
##   EMDE.South.Asia Saudi.Arabia Singapore EMDE.Sub.Saharan.Africa Slovakia
## 1              NA           NA        NA                      NA       NA
## 2              NA           NA     1.750                      NA  7.05000
## 3              NA           NA     1.800                      NA 11.31833
## 4              NA           NA     1.675                      NA 12.85500
## 5              NA           NA     1.725                      NA 14.62917
## 6              NA           NA     1.725                      NA 13.68083
##   Slovenia    Sweden Thailand Tunisia Turkey Taiwan..China Uruguay
## 1       NA  2.239701       NA      NA     NA      1.658333      NA
## 2       NA  4.005607       NA      NA     NA      1.533333      NA
## 3 11.56667  7.110956       NA      NA     NA      1.500000      NA
## 4 14.57500 11.146890       NA      NA     NA      1.425000      NA
## 5 14.55000 10.766190       NA      NA     NA      1.566667      NA
## 6 14.04167 10.421390       NA      NA     NA      1.808333      NA
##   United.States Venezuela..RB Vietnam World..WBG.members. South.Africa
## 1      5.616667            NA      NA                  NA           NA
## 2      6.850000            NA      NA                  NA           NA
## 3      7.491667            NA      NA                  NA           NA
## 4      6.908333            NA      NA                  NA           NA
## 5      6.100000            NA      NA                  NA           NA
## 6      5.591667            NA      NA                  NA           NA

Clean the data.

## Remove the 12 non-country aggregate columns (Advanced Economies, EMDEs and their six regional groups, the high-, low- and middle-income groups, and World).
UnemploymentRate <- UnemploymentRate[, -c(2,22,23,26,33,46,47,55,58,69,72,84)]

## Rename the year column.
UnemploymentRate <- UnemploymentRate %>% rename(Year = X)

## Keep only the rows for 2011 through 2015 (selected by year value rather than row position).
UnemploymentRate <- UnemploymentRate %>% filter(Year >= 2011, Year <= 2015)

## Renumber the rows.
rownames(UnemploymentRate) <- 1:nrow(UnemploymentRate)
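The column indices above depend on the exact column order of this particular CSV. A minimal base-R sketch of a name-based alternative follows. It assumes the aggregate series can be recognized by the name fragments visible in the `head()` output (`Economies`, `EMDE`, `Income.Countries`, `World`); on the column names printed above, this pattern matches the same 12 columns that the index vector removes. The data frame `toy` is hypothetical, with placeholder values rather than real rates.

```r
# Hypothetical wide data frame mimicking the layout of UnemploymentRate.csv.
# The numbers are placeholders, not real unemployment rates.
toy <- data.frame(
  X                     = 2009:2016,
  Advanced.Economies    = 1:8,
  Argentina             = 11:18,
  High.Income.Countries = 21:28,
  South.Africa          = 31:38,
  World..WBG.members.   = 41:48
)

# Assumed name fragments that identify non-country aggregate series.
aggregate_pattern <- "Economies|EMDE|Income\\.Countries|World"

# Keep the year column plus every column whose name does not match the pattern.
keep <- names(toy) == "X" | !grepl(aggregate_pattern, names(toy))
countries_only <- toy[, keep]

# Rename the year column, then keep 2011-2015 by value instead of row position.
names(countries_only)[names(countries_only) == "X"] <- "Year"
countries_only <- countries_only[countries_only$Year >= 2011 &
                                   countries_only$Year <= 2015, ]
rownames(countries_only) <- NULL

print(countries_only)
```

Selecting by name and by year value keeps the cleaning step correct even if the source file gains columns or years; the trade-off is that the pattern has to be checked whenever new aggregate series appear.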

Reshape the clean data.

## Convert the dataset to long form and drop NAs (all columns other than Year are melted).
UnemploymentRate <- UnemploymentRate %>% melt(id.vars = "Year", variable.name = "Country", value.name = "Unemployment_Rate", na.rm = TRUE)

## Turn the dots in the country names back into readable text:
## a double dot becomes ", " and every remaining dot followed by a letter becomes a space.
UnemploymentRate$Country <- sub("\\.{2}", ", ", UnemploymentRate$Country)
UnemploymentRate$Country <- gsub("\\.([A-Za-z])", " \\1", UnemploymentRate$Country)

head(UnemploymentRate,n=15)
##    Year   Country Unemployment_Rate
## 1  2011 Argentina          7.153890
## 2  2012 Argentina          7.214755
## 3  2013 Argentina          7.076472
## 4  2014 Argentina          7.270868
## 5  2015 Argentina          6.611389
## 6  2011 Australia          5.084689
## 7  2012 Australia          5.223705
## 8  2013 Australia          5.671072
## 9  2014 Australia          6.087930
## 10 2015 Australia          6.056494
## 11 2011   Austria          6.724852
## 12 2012   Austria          6.981256
## 13 2013   Austria          7.615858
## 14 2014   Austria          8.366916
## 15 2015   Austria          9.103351
tail(UnemploymentRate,n=15)
##     Year       Country Unemployment_Rate
## 346 2011 Venezuela, RB          8.186364
## 347 2012 Venezuela, RB          7.805915
## 348 2013 Venezuela, RB          7.525147
## 349 2014 Venezuela, RB          6.933782
## 350 2015 Venezuela, RB          6.832846
## 351 2011       Vietnam          2.220000
## 352 2012       Vietnam          1.960000
## 353 2013       Vietnam          2.180000
## 354 2014       Vietnam          2.100000
## 355 2015       Vietnam          2.330000
## 356 2011  South Africa         24.788550
## 357 2012  South Africa         24.880960
## 358 2013  South Africa         24.733880
## 359 2014  South Africa         25.104750
## 360 2015  South Africa         25.339290
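The country names come from `read.csv()`, which turns spaces and punctuation into dots, so `Hong Kong SAR, China` arrives as `Hong.Kong.SAR..China`. A single `sub()` replaces only the first match, so names with several dots need `gsub()`. The sketch below wraps the cleaning in a hypothetical helper, `clean_country()`, and checks it on a few column names taken from the `head()` output.

```r
# Convert make.names()-style column names back into readable country names.
# Step 1: the first double dot (originally ", ") becomes a comma and a space.
# Step 2: every remaining dot followed by a letter (originally a space) becomes
#         a space; a trailing dot, as in "Rep.", is left unchanged.
clean_country <- function(x) {
  x <- sub("\\.{2}", ", ", as.character(x))
  gsub("\\.([A-Za-z])", " \\1", x)
}

raw_names <- c("Hong.Kong.SAR..China", "Egypt..Arab.Rep.", "Korea..Rep.",
               "Venezuela..RB", "Czech.Republic", "South.Africa")
print(clean_country(raw_names))
```

The `as.character()` call lets the helper accept the factor column that `melt()` produces as well as a plain character vector.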

Analyze the clean data.

## Average annual unemployment rate for each country between 2011 and 2015: Table 1.
UnemploymentRate_Mean <- UnemploymentRate %>% group_by(Country) %>% summarize(Average=round(mean(Unemployment_Rate), digits=2))

## Mean of the country averages.
UnemploymentRate_AvgMean <- mean(UnemploymentRate_Mean$Average)

## Standard deviation of the country averages.
UnemploymentRate_StdMean <- sd(UnemploymentRate_Mean$Average)
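For readers without dplyr, a base-R sketch of the same two summaries (the five-year average per country, then the mean and standard deviation of those averages) is shown below. It runs on a hypothetical long-format data frame with made-up values and the same column names as the cleaned `UnemploymentRate`. Unlike the dplyr code, it rounds only when printing, so the overall mean and standard deviation come from unrounded averages; with two-decimal rounding the difference is at most 0.005.

```r
# Hypothetical long-format data: two made-up countries, five years each.
long <- data.frame(
  Year = rep(2011:2015, times = 2),
  Country = rep(c("Country A", "Country B"), each = 5),
  Unemployment_Rate = c(5, 6, 7, 8, 9, 20, 21, 22, 23, 24)
)

# Five-year average per country (question 1); tapply returns a named vector.
country_avg <- tapply(long$Unemployment_Rate, long$Country, mean)

# Centre and spread of the country averages (question 2).
overall_mean <- mean(country_avg)
overall_sd <- sd(country_avg)

print(round(country_avg, 2))
print(c(mean = overall_mean, sd = overall_sd))
```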

## Plot a histogram of the country averages to show their distribution: Figure 1.
hist(UnemploymentRate_Mean$Average, main="Average Annual Unemployment Rate Distribution", xlab="Average unemployment rate, 2011-2015 (%)", ylab="Frequency", ylim=c(0,25), col="hotpink", breaks = 15)

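## Plot a density histogram and overlay a normal curve with the same mean and standard deviation as the country averages: Figure 2.
hist(UnemploymentRate_Mean$Average, main="Average Annual Unemployment Rate Distribution", xlab="Average unemployment rate, 2011-2015 (%)", ylab="Density", col="hotpink", breaks = 15, probability=TRUE)
## Use a fine grid that spans the full data range so the curve is smooth and not cut off at 20.
x <- seq(0, max(UnemploymentRate_Mean$Average), length.out = 200)
y <- dnorm(x=x, mean=UnemploymentRate_AvgMean, sd=UnemploymentRate_StdMean)
lines(x=x, y=y, col="blue")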

## Plot a normal Q-Q plot to check further whether the country averages follow a normal distribution: Figure 3.
qqnorm(UnemploymentRate_Mean$Average)
qqline(UnemploymentRate_Mean$Average)
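The histogram and Q-Q plot are visual checks. As a numerical complement, the sketch below computes a moment-based skewness coefficient (a hypothetical helper, `sample_skewness()`; other skewness definitions with small-sample corrections also exist) and runs base R's `shapiro.test()`. The vector `avg` is made up and only stands in for `UnemploymentRate_Mean$Average`, to which the same two calls could be applied.

```r
# Moment-based sample skewness: mean cubed deviation divided by the cubed
# standard deviation (computed with denominator n). Positive means right skew.
sample_skewness <- function(v) {
  v <- v[!is.na(v)]
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Made-up, right-skewed stand-in for the country averages.
avg <- c(2.1, 3.4, 4.0, 4.5, 5.2, 5.8, 6.3, 7.0, 7.9, 9.6, 13.5, 25.0)

print(sample_skewness(avg))  # positive for this right-skewed vector

# Shapiro-Wilk test of normality from the stats package (attached by default).
# A small p-value is evidence against a normal distribution.
print(shapiro.test(avg))
```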

Conclusions:

Figure 1 shows that the distribution of the average annual unemployment rate is right skewed. Figure 2 shows that a normal curve with the same mean and standard deviation does not fit the histogram well. In a normal Q-Q plot, the points of a normally distributed sample should fall close to the theoretical line; in Figure 3, many points depart from the line and show a clear curvature, which is consistent with the right skew seen in Figure 1. Therefore, the average annual unemployment rate does not appear to come from a normal distribution.
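The right skew means that most countries' averages lie toward the lower end of the scale, with a long tail of a few countries with much higher rates, such as South Africa at about 25% in every year. This distribution describes differences between countries rather than changes over time, so on its own it does not show whether the world's annual unemployment rate was declining between 2011 and 2015; answering question (3) requires comparing the rates year by year.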
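Question (3) concerns change over time, which the distribution of five-year averages does not capture. One way to look at it, sketched here under stated assumptions, is to average the country rates within each year and fit a straight line to those yearly means; the sign of the slope gives the direction of the trend. The data frame `long` is hypothetical, with made-up values and the same column names as the cleaned `UnemploymentRate`. An unweighted cross-country mean treats every country equally regardless of size, so the `World..WBG.members.` column removed during cleaning is another candidate series for this question.

```r
# Hypothetical long-format data: three made-up countries, 2011-2015.
long <- data.frame(
  Year = rep(2011:2015, times = 3),
  Country = rep(c("Country A", "Country B", "Country C"), each = 5),
  Unemployment_Rate = c(8.0, 7.8, 7.5, 7.1, 6.9,
                        5.2, 5.0, 5.1, 4.8, 4.6,
                        12.0, 11.5, 11.9, 11.0, 10.4)
)

# Unweighted cross-country mean for each year.
yearly <- aggregate(Unemployment_Rate ~ Year, data = long, FUN = mean)
print(yearly)

# Straight-line trend: the slope is the average change in percentage points per year.
fit <- lm(Unemployment_Rate ~ Year, data = yearly)
slope <- unname(coef(fit)["Year"])
print(slope)
```

With only five yearly points, the slope is a rough summary rather than a formal test; plotting `yearly` with `plot()` and adding the fitted line with `abline(fit)` would show how closely the yearly means follow it.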