Untidy unemployment rates data

John Kellogg

2019-10-06


Untidy unemployment rates data

Found by: Erinda Budo

Client Comments:

The dataset I chose was taken from the World Bank Global Economic Monitor. This dataset contains unemployment rates from 88 countries from year 1990 through 2017.It can be found: https://github.com/ErindaB/Other The data will be transformed from wide to long format.Many columns need to be renamed and some blank ones need to be removed.The analysis will investigate annual unemployment rates from 2011 to 2015 of 71 countries and it will asnwer questions:1)For the five year period from 2011 to 2015, what’s the average annual unemployment rate of each country?2)For the five year period from 2011 to 2015, what’s the distribution of the average annual unemployment rate?3)For the five year period from 2011 to 2015, what’s the overall trend of the world’s annual unemployment rate?

Analyst Actions:

The data came in an Excel XLSX file with two tabs. I wanted those imported seperately into R. Prior to the first chunk below, I saved each tab into their own CSV. No other modifications were made to the data file.

As a backup to the file provided by the client, I found the dataset on the World Bank website (https://datacatalog.worldbank.org/dataset/global-economic-monitor)

##      X Advanced.Economies Argentina Australia  Austria  Belgium Bulgaria
## 1   NA                 NA        NA        NA       NA       NA       NA
## 2 1990           5.800582        NA  6.943297 5.373002 6.550260       NA
## 3 1991           6.728688        NA  9.614137 5.823096 6.439812       NA
## 4 1992           7.511064        NA 10.750080 5.941711 7.088092 13.23500
## 5 1993           7.936175        NA 10.866170 6.811381 8.619130 15.85583
##   Bahrain Belarus Brazil   Canada Switzerland Chile China Colombia Cyprus
## 1      NA      NA     NA       NA          NA    NA    NA       NA     NA
## 2      NA      NA     NA  8.15000    0.501328    NA    NA       NA     NA
## 3      NA      NA     NA 10.31667    1.090451    NA    NA       NA     NA
## 4      NA      NA     NA 11.21667    2.563105    NA    NA       NA     NA
## 5      NA      NA     NA 11.37500    4.516116    NA    NA       NA     NA
##   Czech.Republic  Germany Denmark Dominican.Republic Algeria
## 1             NA       NA      NA                 NA      NA
## 2             NA       NA      NA                 NA    25.0
## 3             NA 4.864885      NA                 NA    25.0
## 4             NA 5.764563      NA                 NA    27.0
## 5       4.333333 6.931370      NA                 NA    23.2
##   EMDE.East.Asia...Pacific EMDE.Europe...Central.Asia Ecuador
## 1                       NA                         NA      NA
## 2                       NA                         NA      NA
## 3                       NA                         NA      NA
## 4                       NA                         NA      NA
## 5                       NA                         NA      NA
##   Egypt..Arab.Rep. Emerging.Market.and.Developing.Economies..EMDEs.
## 1               NA                                               NA
## 2               NA                                               NA
## 3               NA                                               NA
## 4               NA                                               NA
## 5               NA                                               NA
##      Spain Estonia   Finland France United.Kingdom Greece
## 1       NA      NA        NA     NA             NA     NA
## 2 15.48333   0.650  3.103129  7.625       7.091667     NA
## 3 15.51667   1.475  6.666424  7.800       8.825000     NA
## 4 17.06667   3.725 11.796830  8.650       9.966667     NA
## 5 20.83333   6.550 16.384210  9.650      10.400000     NA
##   High.Income.Countries Hong.Kong.SAR..China Croatia Hungary India
## 1                    NA                   NA      NA      NA    NA
## 2              5.619945             1.318868      NA      NA    NA
## 3              6.771918             1.750180      NA      NA    NA
## 4              7.693434             1.946343      NA      NA    NA
## 5              8.192391             1.979785      NA      NA    NA
##    Ireland Iceland Israel Italy Jordan    Japan Kazakhstan Korea..Rep.
## 1       NA      NA     NA    NA     NA       NA         NA          NA
## 2 13.41667      NA     NA    NA     NA 2.108117         NA          NA
## 3 14.73333      NA     NA    NA     NA 2.099018         NA          NA
## 4 15.40000      NA     NA    NA     NA 2.151389         NA          NA
## 5 15.63333      NA     NA    NA   19.7 2.503291         NA          NA
##   EMDE.Latin.America...Caribbean Low.Income.Countries..LIC. Sri.Lanka
## 1                             NA                         NA        NA
## 2                             NA                         NA      15.9
## 3                             NA                         NA      14.7
## 4                             NA                         NA      14.6
## 5                             NA                         NA      13.8
##   Lithuania Luxembourg   Latvia Morocco Moldova..Rep. Mexico
## 1        NA         NA       NA      NA            NA     NA
## 2        NA         NA       NA      NA            NA     NA
## 3        NA         NA       NA      NA            NA     NA
## 4        NA         NA       NA      NA            NA     NA
## 5  4.191667         NA 4.658333      NA            NA     NA
##   Middle.Income.Countries..MIC. North.Macedonia Malta
## 1                            NA              NA    NA
## 2                            NA              NA    NA
## 3                            NA              NA    NA
## 4                            NA              NA    NA
## 5                            NA              NA    NA
##   EMDE.Middle.East...N..Africa Netherlands   Norway New.Zealand Pakistan
## 1                           NA          NA       NA          NA       NA
## 2                           NA          NA 5.783333    7.984591     3.13
## 3                           NA          NA 6.041667   10.611440     6.28
## 4                           NA          NA 6.550000   10.644730     5.85
## 5                           NA          NA 6.608333    9.800159     4.73
##   Peru Philippines    Poland Portugal  Romania Russian.Federation
## 1   NA          NA        NA       NA       NA                 NA
## 2   NA          NA  3.441667       NA       NA                 NA
## 3   NA      10.475  9.008333       NA       NA                 NA
## 4   NA       9.850 12.933330       NA 5.450000                 NA
## 5   NA       9.350 15.033330       NA 9.208333                 NA
##   EMDE.South.Asia Saudi.Arabia Singapore EMDE.Sub.Saharan.Africa Slovakia
## 1              NA           NA        NA                      NA       NA
## 2              NA           NA        NA                      NA       NA
## 3              NA           NA     1.750                      NA  7.05000
## 4              NA           NA     1.800                      NA 11.31833
## 5              NA           NA     1.675                      NA 12.85500
##   Slovenia    Sweden Thailand Tunisia Turkey Taiwan..China Uruguay
## 1       NA        NA       NA      NA     NA            NA      NA
## 2       NA  2.239701       NA      NA     NA      1.658333      NA
## 3       NA  4.005607       NA      NA     NA      1.533333      NA
## 4 11.56667  7.110956       NA      NA     NA      1.500000      NA
## 5 14.57500 11.146890       NA      NA     NA      1.425000      NA
##   United.States Venezuela..RB Vietnam World..WBG.members. South.Africa
## 1            NA            NA      NA                  NA           NA
## 2      5.616667            NA      NA                  NA           NA
## 3      6.850000            NA      NA                  NA           NA
## 4      7.491667            NA      NA                  NA           NA
## 5      6.908333            NA      NA                  NA           NA
##         X Advanced.Economies Argentina Australia  Austria  Belgium
## 1                         NA        NA        NA       NA       NA
## 2 1990M01           5.810216        NA  6.213296 5.306584 6.831543
## 3 1990M02           5.745122        NA  6.406219 5.145882 6.749932
## 4 1990M03           5.622835        NA  6.226480 4.967532 6.670389
## 5 1990M04           5.750094        NA  6.348445 5.055378 6.574611
##   Bulgaria Bahrain Belarus Brazil Canada Switzerland Chile China Colombia
## 1       NA      NA      NA     NA     NA          NA    NA    NA       NA
## 2       NA      NA      NA     NA    7.9    0.468379    NA    NA       NA
## 3       NA      NA      NA     NA    7.7    0.456800    NA    NA       NA
## 4       NA      NA      NA     NA    7.3    0.445170    NA    NA       NA
## 5       NA      NA      NA     NA    7.6    0.438875    NA    NA       NA
##   Cyprus Czech.Republic Germany Denmark Dominican.Republic Algeria
## 1     NA             NA      NA      NA                 NA      NA
## 2     NA             NA      NA      NA                 NA      25
## 3     NA             NA      NA      NA                 NA      25
## 4     NA             NA      NA      NA                 NA      25
## 5     NA             NA      NA      NA                 NA      25
##   EMDE.East.Asia...Pacific EMDE.Europe...Central.Asia Ecuador
## 1                       NA                         NA      NA
## 2                       NA                         NA      NA
## 3                       NA                         NA      NA
## 4                       NA                         NA      NA
## 5                       NA                         NA      NA
##   Egypt..Arab.Rep. Emerging.Market.and.Developing.Economies..EMDEs. Spain
## 1               NA                                               NA    NA
## 2               NA                                               NA  15.9
## 3               NA                                               NA  15.8
## 4               NA                                               NA  15.6
## 5               NA                                               NA  15.6
##   Estonia  Finland   France United.Kingdom Greece High.Income.Countries
## 1      NA       NA       NA             NA     NA                    NA
## 2     0.7 2.856077 7.751852            6.9     NA              5.482747
## 3     0.7 2.615326 7.696296            6.9     NA              5.452225
## 4     0.7 3.135250 7.651852            6.9     NA              5.383208
## 5     0.6 3.842323 7.618519            6.9     NA              5.539168
##   Hong.Kong.SAR..China Croatia Hungary India Ireland Iceland Israel Italy
## 1                   NA      NA      NA    NA      NA      NA     NA    NA
## 2             1.241434      NA      NA    NA    13.8      NA     NA    NA
## 3             1.323013      NA      NA    NA    13.7      NA     NA    NA
## 4             1.346259      NA      NA    NA    13.4      NA     NA    NA
## 5             1.302749      NA      NA    NA    13.2      NA     NA    NA
##   Jordan    Japan Kazakhstan Korea..Rep. EMDE.Latin.America...Caribbean
## 1     NA       NA         NA          NA                             NA
## 2     NA 2.228275         NA          NA                             NA
## 3     NA 2.224679         NA          NA                             NA
## 4     NA 1.991986         NA          NA                             NA
## 5     NA 2.155126         NA          NA                             NA
##   Low.Income.Countries..LIC. Sri.Lanka Lithuania Luxembourg Latvia Morocco
## 1                         NA        NA        NA         NA     NA      NA
## 2                         NA      15.9        NA         NA     NA      NA
## 3                         NA      15.9        NA         NA     NA      NA
## 4                         NA      15.9        NA         NA     NA      NA
## 5                         NA      15.9        NA         NA     NA      NA
##   Moldova..Rep. Mexico Middle.Income.Countries..MIC. North.Macedonia Malta
## 1            NA     NA                            NA              NA    NA
## 2            NA     NA                            NA              NA    NA
## 3            NA     NA                            NA              NA    NA
## 4            NA     NA                            NA              NA    NA
## 5            NA     NA                            NA              NA    NA
##   EMDE.Middle.East...N..Africa Netherlands Norway New.Zealand Pakistan
## 1                           NA          NA     NA          NA       NA
## 2                           NA          NA    6.0    7.185925     3.13
## 3                           NA          NA    5.9    7.161031     3.13
## 4                           NA          NA    5.8    7.200287     3.13
## 5                           NA          NA    5.6    7.597992     3.13
##   Peru Philippines Poland Portugal Romania Russian.Federation
## 1   NA          NA     NA       NA      NA                 NA
## 2   NA         9.6    0.1       NA      NA                 NA
## 3   NA         9.6    0.7       NA      NA                 NA
## 4   NA         9.6    1.6       NA      NA                 NA
## 5   NA          NA    2.3       NA      NA                 NA
##   EMDE.South.Asia Saudi.Arabia Singapore EMDE.Sub.Saharan.Africa Slovakia
## 1              NA           NA        NA                      NA       NA
## 2              NA           NA        NA                      NA       NA
## 3              NA           NA        NA                      NA       NA
## 4              NA           NA        NA                      NA       NA
## 5              NA           NA        NA                      NA       NA
##   Slovenia   Sweden Thailand Tunisia Turkey Taiwan..China Uruguay
## 1       NA       NA       NA      NA     NA            NA      NA
## 2       NA 2.182138       NA      NA     NA           1.5      NA
## 3       NA 1.959847       NA      NA     NA           1.5      NA
## 4       NA 1.976685       NA      NA     NA           1.6      NA
## 5       NA 1.847875       NA      NA     NA           1.5      NA
##   United.States Venezuela..RB Vietnam World..WBG.members. South.Africa
## 1            NA            NA      NA                  NA           NA
## 2           5.4            NA      NA                  NA           NA
## 3           5.3            NA      NA                  NA           NA
## 4           5.2            NA      NA                  NA           NA
## 5           5.4            NA      NA                  NA           NA

Tidy the data

Client Comments:

The data will be transformed from wide to long format. Many columns need to be renamed and some blank ones need to be removed.

Analyst Actions:

Both raw sets were processed through tidyr procedures switching them from wide to long formats. Blank column on row 2 of files were removed. The import of the data did not keep the names accurate. They will be renamed as they are the region or country names. We will need to define the Aggregates vs. Countries.

  • Chunk for initial tidy of the Year data
# rename the column names
colnames(data_raw_year) <-c("Year","Advanced.Economies","Argentina","Australia","Austria","Belgium","Bulgaria","Bahrain","Belarus","Brazil","Canada","Switzerland","Chile","China","Colombia","Cyprus","Czech.Republic","Germany","Denmark","Dominican.Republic","Algeria","EMDE.East.Asia_Pacific","EMDE.Europe_Central.Asia","Ecuador","Egypt_ Arab.Rep.","Emerging.Market.Developing.Economies","Spain","Estonia","Finland","France","United.Kingdom","Greece","High.Income.Countries#HIC","Hong.Kong.SAR_ China","Croatia","Hungary","India","Ireland","Iceland","Israel","Italy","Jordan","Japan","Kazakhstan","Korea.Rep.","EMDE.Latin.America_Caribbean","Low.Income.Countries#LIC","Sri.Lanka","Lithuania","Luxembourg","Latvia","Morocco","Moldova.Rep.","Mexico","Middle.Income.Countries#MIC","North.Macedonia","Malta","EMDE.Middle East_N.Africa","Netherlands","Norway","New.Zealand","Pakistan","Peru","Philippines","Poland","Portugal","Romania","Russian.Federation","EMDE.South.Asia","Saudi.Arabia","Singapore","EMDE.Sub-Saharan.Africa","Slovakia","Slovenia","Sweden","Thailand","Tunisia","Turkey","Taiwan_China","Uruguay","United States","Venezuela_ RB","Vietnam","World^WBG.members","South Africa")

# gather the data into a tidy form
Year_raw <- gather(data_raw_year, "Country", "Value", 2:85)

#Filter out the blank values due to the blank top row on import
Year_Data <- Year_raw %>% 
  filter(!is.na(Value))

#Change the Year to numeric  
Year_Data$Year<- as.numeric(Year_Data$Year)

# QA dataframe
head(Year_Data,10)
##    Year            Country    Value
## 1  1990 Advanced.Economies 5.800582
## 2  1991 Advanced.Economies 6.728688
## 3  1992 Advanced.Economies 7.511064
## 4  1993 Advanced.Economies 7.936175
## 5  1994 Advanced.Economies 7.715897
## 6  1995 Advanced.Economies 7.264255
## 7  1996 Advanced.Economies 7.233725
## 8  1997 Advanced.Economies 6.942783
## 9  1998 Advanced.Economies 6.575284
## 10 1999 Advanced.Economies 6.260355
  • Chunk for initial tidy of the Month data
# rename the column names
colnames(data_raw_month) <-c("Year_Month","Advanced.Economies","Argentina","Australia","Austria","Belgium","Bulgaria","Bahrain","Belarus","Brazil","Canada","Switzerland","Chile","China","Colombia","Cyprus","Czech.Republic","Germany","Denmark","Dominican.Republic","Algeria","EMDE.East.Asia_Pacific","EMDE.Europe_Central.Asia","Ecuador","Egypt_ Arab.Rep.","Emerging.Market.Developing.Economies","Spain","Estonia","Finland","France","United.Kingdom","Greece","High.Income.Countries#HIC","Hong.Kong.SAR_ China","Croatia","Hungary","India","Ireland","Iceland","Israel","Italy","Jordan","Japan","Kazakhstan","Korea.Rep.","EMDE.Latin.America_Caribbean","Low.Income.Countries#LIC","Sri.Lanka","Lithuania","Luxembourg","Latvia","Morocco","Moldova.Rep.","Mexico","Middle.Income.Countries#MIC","North.Macedonia","Malta","EMDE.Middle East_N.Africa","Netherlands","Norway","New.Zealand","Pakistan","Peru","Philippines","Poland","Portugal","Romania","Russian.Federation","EMDE.South.Asia","Saudi.Arabia","Singapore","EMDE.Sub-Saharan.Africa","Slovakia","Slovenia","Sweden","Thailand","Tunisia","Turkey","Taiwan_China","Uruguay","United States","Venezuela_ RB","Vietnam","World^WBG.members_","South Africa")

# Gather the data into tidy form
Month_raw <- gather(data_raw_month, "Country", "Value", 2:85)

# Rename the first column
colnames(Month_raw) [colnames(Month_raw)=='X'] <-"Year_Month"

# Seperate the first column into seperate and filter out blank values 
Month_Data <- Month_raw %>% 
  filter(!is.na(Value))%>%
  separate(Year_Month, c("Year", "Month"), sep = "M")  

#Change the Year to numeric  
Month_Data$Year<- as.numeric(Month_Data$Year)

# QA dataframe
head(Month_Data,10)
##    Year Month            Country    Value
## 1  1990    01 Advanced.Economies 5.810216
## 2  1990    02 Advanced.Economies 5.745122
## 3  1990    03 Advanced.Economies 5.622835
## 4  1990    04 Advanced.Economies 5.750094
## 5  1990    05 Advanced.Economies 5.764293
## 6  1990    06 Advanced.Economies 5.648854
## 7  1990    07 Advanced.Economies 5.803347
## 8  1990    08 Advanced.Economies 5.900060
## 9  1990    09 Advanced.Economies 6.037763
## 10 1990    10 Advanced.Economies 6.099891
Analyst Actions:

For clarity and ability to analize accuratly, I think the datasets need to be broken into 4 seperate subsets (Economies, EMDE, Income, countries). These actions will need to be done on both the Year and Month data.

  • Set 1
    • Advanced Economies
    • Emerging Market and Developing Economies (EMDEs)
  • Set 2
    • EMDE East Asia & Pacific
    • EMDE Europe & Central Asia
    • EMDE Latin America & Caribbean
    • EMDE Middle East & N. Africa
    • EMDE South Asia
    • EMDE Sub-Saharan Africa
    • World (WBG members)
  • Set 3
    • High Income Countries
    • Low-Income Countries (LIC) (No values Present)
    • Middle-Income Countries (MIC)
  • Set 4
    • All Countries

Client Comments:

The analysis will investigate annual unemployment rates from 2011 to 2015 of 71 countries and it will answer questions:

  1. For the five year period from 2011 to 2015, what’s the average annual unemployment rate of each country?
  2. For the five year period from 2011 to 2015, what’s the distribution of the average annual unemployment rate?
  3. For the five year period from 2011 to 2015, what’s the overall trend of the world’s annual unemployment rate? ##### Analyst Actions: Defining each question into a seperate chunk

Client Questions

1. For the five year period from 2011 to 2015, what’s the average annual unemployment rate of each country?

Analyst Actions and Comments:

I think in order to understand the average annual unemployment rate of each country, we need to know the averages by sector first. The numbers start to have more meaning.

  • Median and Mean of Economic groups
  • Median and Mean of Emerging Market and Developing Economies (EMDEs) groups
  • Median and Mean of Countries
## # A tibble: 70 x 3
##    Country     mean median
##    <chr>      <dbl>  <dbl>
##  1 Algeria   10.5   10.6  
##  2 Argentina  7.07   7.15 
##  3 Australia  5.62   5.67 
##  4 Austria    7.76   7.62 
##  5 Bahrain    3.86   3.8  
##  6 Belarus    0.650  0.633
##  7 Belgium    8.05   8.46 
##  8 Brazil     7.43   7.19 
##  9 Bulgaria  10.6   11.1  
## 10 Canada     7.15   7.1  
## # ... with 60 more rows

2. For the five year period from 2011 to 2015, what’s the distribution of the average annual unemployment rate?

Analyst Actions and Comments:

First, we need to see if the distrbution follows a normal distrbution.

  • According to the graph, it does not follow a normal distrbution. Next, we need to determine the spread across the countries.

  • North Macedonia, Greece, South Africa, Spain, Croatia make up the top 5 highest rates of unemployment. Lets compare the top5 with the world numbers of all members.

3. For the five year period from 2011 to 2015, what’s the overall trend of the world’s annual unemployment rate?

  • The Trend is going down (Right Skewed). Since we are looking at median numbers across 70 countries, a drop of .4% is pretty good over all. As a follow up question, what is the rate trend in the previously identified top 5 countries?

  • As we see the Top 5 does’t have the same shape. There is an upwards trend (left skewed) until 2013 than it starts to taper off.

Further analysis can happen on 2013 data from the Month data to pin point when and which country(s) may have caused the rise.