This dataset was posted by a classmate (Erinda Budo). It contains unemployment rates for 88 countries from 1990 through 2017. The analysis examines annual unemployment rates from 2011 to 2015 to answer the following questions:
(1) For the five-year period from 2011 to 2015, what is the average annual unemployment rate for each country?
(2) For the five-year period from 2011 to 2015, what is the distribution of the average annual unemployment rate?
(3) For the five-year period from 2011 to 2015, what is the overall trend of the world’s annual unemployment rate?
Load the original data file from GitHub.
UnemploymentRate <- read.csv("https://raw.githubusercontent.com/SieSiongWong/DATA-607/master/UnemploymentRate.csv", header=TRUE, sep=",")
Review the dataset.
head(UnemploymentRate)
## X Advanced.Economies Argentina Australia Austria Belgium Bulgaria
## 1 1990 5.800582 NA 6.943297 5.373002 6.550260 NA
## 2 1991 6.728688 NA 9.614137 5.823096 6.439812 NA
## 3 1992 7.511064 NA 10.750080 5.941711 7.088092 13.23500
## 4 1993 7.936175 NA 10.866170 6.811381 8.619130 15.85583
## 5 1994 7.715897 NA 9.705695 6.545480 9.753554 14.06583
## 6 1995 7.264255 NA 8.471058 6.589767 9.674164 11.38583
## Bahrain Belarus Brazil Canada Switzerland Chile China Colombia Cyprus
## 1 NA NA NA 8.150000 0.501328 NA NA NA NA
## 2 NA NA NA 10.316670 1.090451 NA NA NA NA
## 3 NA NA NA 11.216670 2.563105 NA NA NA NA
## 4 NA NA NA 11.375000 4.516116 NA NA NA NA
## 5 NA NA NA 10.391670 4.718465 NA NA NA NA
## 6 NA NA NA 9.466667 4.232892 NA NA NA NA
## Czech.Republic Germany Denmark Dominican.Republic Algeria
## 1 NA NA NA NA 25.0
## 2 NA 4.864885 NA NA 25.0
## 3 NA 5.764563 NA NA 27.0
## 4 4.333333 6.931370 NA NA 23.2
## 5 4.283333 7.340639 NA NA 24.4
## 6 4.033333 7.091997 NA NA 28.1
## EMDE.East.Asia...Pacific EMDE.Europe...Central.Asia Ecuador
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA 9.683161 NA
## Egypt..Arab.Rep. Emerging.Market.and.Developing.Economies..EMDEs.
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 NA NA
## Spain Estonia Finland France United.Kingdom Greece
## 1 15.48333 0.650 3.103129 7.625 7.091667 NA
## 2 15.51667 1.475 6.666424 7.800 8.825000 NA
## 3 17.06667 3.725 11.796830 8.650 9.966667 NA
## 4 20.83333 6.550 16.384210 9.650 10.400000 NA
## 5 22.05000 7.550 16.534420 10.250 9.500000 NA
## 6 20.79167 9.750 15.426480 9.675 8.658333 NA
## High.Income.Countries Hong.Kong.SAR..China Croatia Hungary India
## 1 5.619945 1.318868 NA NA NA
## 2 6.771918 1.750180 NA NA NA
## 3 7.693434 1.946343 NA NA NA
## 4 8.192391 1.979785 NA NA NA
## 5 8.052012 1.911372 NA NA NA
## 6 7.566117 3.040433 NA NA NA
## Ireland Iceland Israel Italy Jordan Japan Kazakhstan Korea..Rep.
## 1 13.41667 NA NA NA NA 2.108117 NA NA
## 2 14.73333 NA NA NA NA 2.099018 NA NA
## 3 15.40000 NA NA NA NA 2.151389 NA NA
## 4 15.63333 NA NA NA 19.7 2.503291 NA NA
## 5 14.35000 NA NA NA 15.8 2.890953 NA NA
## 6 12.28333 NA NA NA 15.3 3.153574 NA NA
## EMDE.Latin.America...Caribbean Low.Income.Countries..LIC. Sri.Lanka
## 1 NA NA 15.9
## 2 NA NA 14.7
## 3 NA NA 14.6
## 4 NA NA 13.8
## 5 NA NA 13.1
## 6 NA NA 12.3
## Lithuania Luxembourg Latvia Morocco Moldova..Rep. Mexico
## 1 NA NA NA NA NA NA
## 2 NA NA NA NA NA NA
## 3 NA NA NA NA NA NA
## 4 4.191667 NA 4.658333 NA NA NA
## 5 3.625000 NA 6.358333 NA NA NA
## 6 6.116667 2.600765 6.350000 NA NA NA
## Middle.Income.Countries..MIC. North.Macedonia Malta
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## EMDE.Middle.East...N..Africa Netherlands Norway New.Zealand Pakistan
## 1 NA NA 5.783333 7.984591 3.13
## 2 NA NA 6.041667 10.611440 6.28
## 3 NA NA 6.550000 10.644730 5.85
## 4 NA NA 6.608333 9.800159 4.73
## 5 NA NA 6.000000 8.342465 4.84
## 6 NA NA 5.441667 6.451948 5.37
## Peru Philippines Poland Portugal Romania Russian.Federation
## 1 NA NA 3.441667 NA NA NA
## 2 NA 10.475 9.008333 NA NA NA
## 3 NA 9.850 12.933330 NA 5.450000 NA
## 4 NA 9.350 15.033330 NA 9.208333 NA
## 5 NA 9.550 16.508330 NA 10.975000 7.006540
## 6 NA 9.500 15.225000 7.150996 9.975000 8.308334
## EMDE.South.Asia Saudi.Arabia Singapore EMDE.Sub.Saharan.Africa Slovakia
## 1 NA NA NA NA NA
## 2 NA NA 1.750 NA 7.05000
## 3 NA NA 1.800 NA 11.31833
## 4 NA NA 1.675 NA 12.85500
## 5 NA NA 1.725 NA 14.62917
## 6 NA NA 1.725 NA 13.68083
## Slovenia Sweden Thailand Tunisia Turkey Taiwan..China Uruguay
## 1 NA 2.239701 NA NA NA 1.658333 NA
## 2 NA 4.005607 NA NA NA 1.533333 NA
## 3 11.56667 7.110956 NA NA NA 1.500000 NA
## 4 14.57500 11.146890 NA NA NA 1.425000 NA
## 5 14.55000 10.766190 NA NA NA 1.566667 NA
## 6 14.04167 10.421390 NA NA NA 1.808333 NA
## United.States Venezuela..RB Vietnam World..WBG.members. South.Africa
## 1 5.616667 NA NA NA NA
## 2 6.850000 NA NA NA NA
## 3 7.491667 NA NA NA NA
## 4 6.908333 NA NA NA NA
## 5 6.100000 NA NA NA NA
## 6 5.591667 NA NA NA NA
Clean the data.
## Load the packages used below: dplyr (%>%, rename, group_by, summarize) and reshape2 (melt is assumed to come from reshape2).
library(dplyr)
library(reshape2)
## Remove the non-country (aggregate) columns.
UnemploymentRate <- UnemploymentRate[, -c(2,22,23,26,33,46,47,55,58,69,72,84)]
## Rename the first column from X to Year.
UnemploymentRate <- UnemploymentRate %>% rename("Year"="X")
## Remove the rows for years before 2011 and after 2015.
UnemploymentRate <- UnemploymentRate[-c(1:21,27:31),]
## Reset the row numbers of the data frame.
rownames(UnemploymentRate) <- 1:nrow(UnemploymentRate)
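The positional indices used above only work while the CSV keeps its exact column and row order. As a minimal sketch of a name-based alternative, the helper below (`clean_by_name`, a hypothetical name) drops columns whose names contain the fragments seen in the aggregate columns of the `head()` output and filters rows by the year value. The name pattern is an assumption based on that output, and `toy` is a small stand-in built from a few of the displayed values, not the full file.

```r
# Toy stand-in for the raw file: the year column X plus a few columns
# and values copied from the head() output above.
toy <- data.frame(
  X = 1990:1992,
  Advanced.Economies = c(5.800582, 6.728688, 7.511064),
  Australia = c(6.943297, 9.614137, 10.750080),
  EMDE.South.Asia = c(NA, NA, NA),
  High.Income.Countries = c(5.619945, 6.771918, 7.693434),
  Canada = c(8.150000, 10.316670, 11.216670)
)

# Hypothetical helper: remove aggregate columns by name and keep a year range.
clean_by_name <- function(df, first_year, last_year) {
  # Assumption: aggregate columns can be recognized by these name fragments.
  is_aggregate <- grepl("EMDE|Economies|Income\\.Countries|World", names(df))
  df <- df[, !is_aggregate, drop = FALSE]
  names(df)[names(df) == "X"] <- "Year"
  df <- df[df$Year >= first_year & df$Year <= last_year, , drop = FALSE]
  rownames(df) <- NULL  # reset row numbers to 1..n
  df
}

cleaned <- clean_by_name(toy, 1991, 1992)
```

On the real data frame the call would be `clean_by_name(UnemploymentRate, 2011, 2015)`, applied to the data as loaded and before any renaming.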
Reshape the cleaned data.
## Convert the dataset to long form and drop the NAs.
UnemploymentRate <- melt(UnemploymentRate, id.vars="Year", measure.vars=2:ncol(UnemploymentRate), variable.name="Country", value.name="Unemployment_Rate", na.rm=TRUE)
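## Restore readable country names: the first double dot becomes ", " and every remaining dot between words becomes a space (a trailing dot, as in "Rep.", is kept).
UnemploymentRate$Country <- sub("\\.{2}",", ", UnemploymentRate$Country)
UnemploymentRate$Country <- gsub("\\.(?=[^.])"," ", UnemploymentRate$Country, perl=TRUE)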
head(UnemploymentRate,n=15)
## Year Country Unemployment_Rate
## 1 2011 Argentina 7.153890
## 2 2012 Argentina 7.214755
## 3 2013 Argentina 7.076472
## 4 2014 Argentina 7.270868
## 5 2015 Argentina 6.611389
## 6 2011 Australia 5.084689
## 7 2012 Australia 5.223705
## 8 2013 Australia 5.671072
## 9 2014 Australia 6.087930
## 10 2015 Australia 6.056494
## 11 2011 Austria 6.724852
## 12 2012 Austria 6.981256
## 13 2013 Austria 7.615858
## 14 2014 Austria 8.366916
## 15 2015 Austria 9.103351
tail(UnemploymentRate,n=15)
## Year Country Unemployment_Rate
## 346 2011 Venezuela, RB 8.186364
## 347 2012 Venezuela, RB 7.805915
## 348 2013 Venezuela, RB 7.525147
## 349 2014 Venezuela, RB 6.933782
## 350 2015 Venezuela, RB 6.832846
## 351 2011 Vietnam 2.220000
## 352 2012 Vietnam 1.960000
## 353 2013 Vietnam 2.180000
## 354 2014 Vietnam 2.100000
## 355 2015 Vietnam 2.330000
## 356 2011 South Africa 24.788550
## 357 2012 South Africa 24.880960
## 358 2013 South Africa 24.733880
## 359 2014 South Africa 25.104750
## 360 2015 South Africa 25.339290
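Column names with more than one dot, such as `Hong.Kong.SAR..China`, are a useful check for the name clean-up step. The sketch below wraps one possible replacement rule in a hypothetical helper, `clean_country`, and applies it to a few column names taken from the `head()` output. It assumes that a double dot stood for ", " in the original header and that a single dot between words stood for a space.

```r
# Hypothetical helper: turn read.csv()-style column names into readable labels.
clean_country <- function(x) {
  # Assumption: the first double dot was ", " in the original header.
  x <- sub("\\.{2}", ", ", x)
  # Every remaining dot that is followed by another character becomes a space;
  # a dot at the very end of the name (as in "Rep.") is left in place.
  gsub("\\.(?=[^.])", " ", x, perl = TRUE)
}

examples <- c("South.Africa", "Venezuela..RB", "Hong.Kong.SAR..China", "Korea..Rep.")
cleaned_names <- clean_country(examples)
# cleaned_names is:
# "South Africa" "Venezuela, RB" "Hong Kong SAR, China" "Korea, Rep."
```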
Analyze the clean data.
## Average annual unemployment rate for each country between year 2011 and 2015: Table 1.
UnemploymentRate_Mean <- UnemploymentRate %>% group_by(Country) %>% summarize(Average=round(mean(Unemployment_Rate), digits=2))
## The average of the overall country means.
UnemploymentRate_AvgMean <- mean(UnemploymentRate_Mean$Average)
## Standard deviation of the overall country means.
UnemploymentRate_StdMean <- sd(UnemploymentRate_Mean$Average)
## Plot a histogram to show the distribution of the average annual unemployment rate (Figure 1).
hist(UnemploymentRate_Mean$Average, main="Average Annual Unemployment Rate Distribution", xlab="Mean", ylab="Frequency", ylim=c(0,25), col="hotpink", breaks = 15)
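The skew visible in a histogram can also be summarized as a single number. The following is a minimal sketch, not part of the original analysis: `sample_skewness` is a hypothetical helper that computes the moment coefficient of skewness, and `avg_rates` holds only the five-year averages of the six countries visible in the `head()` and `tail()` output above, so the result is illustrative rather than representative of the full table.

```r
# Five-year averages (2011-2015), rounded to two digits, for the six countries
# shown in the head()/tail() output: Argentina, Australia, Austria,
# Venezuela, Vietnam, South Africa.
avg_rates <- c(7.07, 5.62, 7.76, 7.46, 2.16, 24.97)

# Hypothetical helper: moment coefficient of skewness,
# i.e. the third central moment divided by the 1.5th power of the second.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# A positive value indicates a longer right tail.
skew <- sample_skewness(avg_rates)
```

For the full set of country means, the call would be `sample_skewness(UnemploymentRate_Mean$Average)`.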

## Use a density histogram to overlay a normal curve with the sample mean and standard deviation on the histogram (Figure 2).
hist(UnemploymentRate_Mean$Average, main="Average Annual Unemployment Rate Distribution", xlab="Mean", ylab="Density", col="hotpink", breaks = 15, probability=TRUE)
## Evaluate the curve on a fine grid that covers the full range of the country averages.
x <- seq(0, max(UnemploymentRate_Mean$Average), length.out=200)
y <- dnorm(x=x, mean=UnemploymentRate_AvgMean, sd=UnemploymentRate_StdMean)
lines(x=x,y=y, col="blue")

## Plot a normal Q-Q plot to check further whether the distribution of the average annual unemployment rate departs from normality (Figure 3).
qqnorm(UnemploymentRate_Mean$Average)
qqline(UnemploymentRate_Mean$Average)
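A Q-Q plot is a visual check, and a formal test can complement it. The sketch below is an addition rather than part of the original analysis. It runs the Shapiro-Wilk test from base R on the five-year averages of the six countries visible in the `head()` and `tail()` output. With so few values the test has little power, so this only illustrates the call.

```r
# Five-year averages (2011-2015), rounded to two digits, for the six countries
# shown in the head()/tail() output: Argentina, Australia, Austria,
# Venezuela, Vietnam, South Africa.
avg_rates <- c(7.07, 5.62, 7.76, 7.46, 2.16, 24.97)

# Shapiro-Wilk test: the null hypothesis is that the values come from
# a normal distribution, so a small p-value speaks against normality.
sw <- shapiro.test(avg_rates)
w_statistic <- unname(sw$statistic)
p_value <- sw$p.value
```

For the full set of country means, the call would be `shapiro.test(UnemploymentRate_Mean$Average)`.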

Conclusions:
Figure 1 shows that the distribution of the average annual unemployment rate is right skewed. In Figure 2, the fitted normal curve does not match the histogram well. In a Q-Q plot, the points have to fall close to the theoretical line for the data to be considered normally distributed, but in Figure 3 many points depart from the line and show a clear curvature. This curvature is consistent with the right skew seen in the histogram. Therefore, the average annual unemployment rate does not appear to come from a normal distribution.
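The right-skewed distribution means that most countries' average annual unemployment rates fall toward the lower side of the scale, while a few countries have much higher rates. This describes how the rates are spread across countries rather than how they change over time, so the conclusion that the world’s annual unemployment rate was declining over the period should be confirmed by comparing the yearly averages from 2011 to 2015.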
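Question (3) concerns change over time, which can be examined by averaging across countries within each year. The sketch below does this for the six countries whose rows appear in the `head()` and `tail()` output above, so it illustrates the method and is not a result for the whole dataset. It also assumes that an unweighted mean across countries is an acceptable stand-in for a world rate, which ignores differences in labor force size.

```r
# Long-form rows copied from the head()/tail() output above (six countries only).
long <- data.frame(
  Year = rep(2011:2015, times = 6),
  Country = rep(c("Argentina", "Australia", "Austria",
                  "Venezuela, RB", "Vietnam", "South Africa"), each = 5),
  Unemployment_Rate = c(
    7.153890, 7.214755, 7.076472, 7.270868, 6.611389,       # Argentina
    5.084689, 5.223705, 5.671072, 6.087930, 6.056494,       # Australia
    6.724852, 6.981256, 7.615858, 8.366916, 9.103351,       # Austria
    8.186364, 7.805915, 7.525147, 6.933782, 6.832846,       # Venezuela, RB
    2.220000, 1.960000, 2.180000, 2.100000, 2.330000,       # Vietnam
    24.788550, 24.880960, 24.733880, 25.104750, 25.339290   # South Africa
  )
)

# Unweighted mean across countries for each year.
yearly <- aggregate(Unemployment_Rate ~ Year, data = long, FUN = mean)

# Slope of a straight-line fit through the yearly means:
# a negative slope indicates a decline, a positive slope an increase.
trend <- lm(Unemployment_Rate ~ Year, data = yearly)
slope <- unname(coef(trend)["Year"])
```

On the full long-form data, `long` would be replaced by `UnemploymentRate`, and plotting `yearly` as a line chart would show the trend directly.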