OECD Nations Unemployment & Union Membership

Author

Allan Maino Vieytes

## Introductory Essay

The data set I chose covers unemployment percentages in Organisation for Economic Co-operation and Development (OECD) nations, with Male, Female, and Total as the three main variables. I hope to explore the relationship between male and female unemployment, most notably any major discrepancies between the two and any major events that may have caused them. The data comes directly from the OECD. I also plan to incorporate union participation data, also from the OECD, in the hope of finding a correlation between the two.

## Loading in Packages and Data

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(viridis)

Loading required package: viridisLite

library(ggforce)

Warning: package 'ggforce' was built under R version 4.3.3

library(wesanderson)

Warning: package 'wesanderson' was built under R version 4.3.3

setwd( "E:/data-110" )
union <- read_csv( "union.mem.csv" )

Rows: 1643 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): location
dbl (2): year, union.per

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

unemp <- read_csv( "unemp.OECD.csv" )

Rows: 14958 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): location, measure.type, measure, Sex, une/pop, Unit multiplier, Dec...
dbl (4): year, unemp.per, UNIT_MULT, DECIMALS

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

egaldem <- read_csv("egal.dem.OECD.csv")

Rows: 33808 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): location, Code, region
dbl (2): year, egaldem.per

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Joining Unemployment and Union Data-frames

Joined the union and unemployment data frames with left_join, matching on location and year, so that every row of the union data is kept even where no unemployment record matches.
Then joined the result with a new data set of egalitarian democracy index scores from around the world.

union.unemp <- left_join( union, unemp, by = c( "location", "year" )) # Joins the union and unemployment data by location and year
union.unemp.egal <- left_join( union.unemp, egaldem, by = c( "location", "year" )) # Joins the result with the egaldem data
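
One thing worth checking after a join like this is how the row count changes: left_join() keeps every row of the left table, but it also repeats a left row once for each matching row on the right. The sketch below uses small made-up tables (toy.union and toy.unemp are hypothetical stand-ins, not the project data, and the Sweden value is invented) to show that effect and to list keys without a match using anti_join().

```r
library(dplyr)

# Hypothetical toy tables that mimic the shape of the real data
toy.union <- tibble(
  location  = c("Finland", "Finland", "Sweden"),
  year      = c(1963, 1964, 1963),
  union.per = c(37.4, 37.0, 60.0)
)
toy.unemp <- tibble(
  location  = c("Finland", "Finland", "Finland", "Finland"),
  year      = c(1963, 1963, 1964, 1964),
  Sex       = c("Female", "Male", "Female", "Male"),
  unemp.per = c(0.861, 2.02, 0.747, 2.14)
)

# Each Finland row matches two unemployment rows (Female and Male),
# so it appears twice; Sweden has no match and is kept once with NA
toy.joined <- left_join(toy.union, toy.unemp, by = c("location", "year"))
nrow(toy.union)
nrow(toy.joined)

# Rows of the left table that found no partner in the right table
unmatched <- anti_join(toy.union, toy.unemp, by = c("location", "year"))
unmatched
```

Comparing nrow() before and after the join, and inspecting the anti_join() result, is a quick way to see how much of the joined table is repetition and how much is missing.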

Filtering & Creating New Data-frame

Filtered for selected countries, kept only the Female and Male rows, and kept only the percentage measure, since the data also contains population counts.

union.unemp.2 <- union.unemp %>%
  filter( Sex %in% c( "Female", "Male" ), # White-listing Male and Female 
          measure == "percentage", # White-listing percentage
          location %in% c( "United States", "OECD", "Sweden", "Italy", "Netherlands", "Finland" ) ) # Selecting desired locations
head(union.unemp.2)

# A tibble: 6 × 12
  location  year union.per measure.type measure    Sex    `une/pop` unemp.per
  <chr>    <dbl>     <dbl> <chr>        <chr>      <chr>  <chr>         <dbl>
1 Finland   1963      37.4 UNE_RATE     percentage Female UNE           0.861
2 Finland   1963      37.4 UNE_RATE     percentage Male   UNE           2.02 
3 Finland   1964      37   UNE_RATE     percentage Female UNE           0.747
4 Finland   1964      37   UNE_RATE     percentage Male   UNE           2.14 
5 Finland   1965      38.3 UNE_RATE     percentage Female UNE           1.05 
6 Finland   1965      38.3 UNE_RATE     percentage Male   UNE           1.65 
# ℹ 4 more variables: UNIT_MULT <dbl>, `Unit multiplier` <chr>, DECIMALS <dbl>,
#   Decimals <chr>

Correlation Coefficient

cor( union.unemp.egal$union.per, union.unemp.egal$egaldem.per, use = "complete.obs" ) #Provides the correlation Coefficient

[1] 0.3516027

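We find a positive correlation of about 0.35, weak to moderate in strength, between union participation and the egalitarian democracy score.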
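
Because the joined table repeats each union and democracy value once per matching unemployment row, the coefficient above is computed over repeated pairs. A minimal sketch of one way to check this, on made-up data (toy and its values are hypothetical), is to reduce the table to distinct location-year pairs before calling cor(). Whether this changes the result for the real data is not something the output above can tell us.

```r
library(dplyr)

# Hypothetical toy data: the (union.per, egaldem.per) pair for each
# location-year is repeated an uneven number of times, as a join can do
toy <- tibble(
  location    = c("A", "A", "A", "B", "C", "C", "D"),
  year        = c(2000, 2000, 2000, 2000, 2000, 2000, 2000),
  union.per   = c(10, 10, 10, 30, 50, 50, 70),
  egaldem.per = c(0.60, 0.60, 0.60, 0.80, 0.70, 0.70, 0.90)
)

# Correlation over every row, repeats included
r.all <- cor(toy$union.per, toy$egaldem.per, use = "complete.obs")

# Correlation over one row per location-year pair
toy.pairs <- toy %>%
  distinct(location, year, union.per, egaldem.per)
r.pairs <- cor(toy.pairs$union.per, toy.pairs$egaldem.per, use = "complete.obs")

c(all.rows = r.all, distinct.pairs = r.pairs)
```

In this toy case the two coefficients differ because location A is counted three times and location C twice.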

Regression Model

fit1 <- lm(egaldem.per ~ union.per, data = union.unemp.egal) #Fits the LR Model
summary(fit1) # Summary of the model


Call:
lm(formula = egaldem.per ~ union.per, data = union.unemp.egal)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.70734 -0.02406  0.02774  0.05633  0.12244 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 6.923e-01  1.891e-03  366.06   <2e-16 ***
union.per   1.757e-03  4.556e-05   38.57   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1004 on 10544 degrees of freedom
  (1417 observations deleted due to missingness)
Multiple R-squared:  0.1236,    Adjusted R-squared:  0.1235 
F-statistic:  1487 on 1 and 10544 DF,  p-value: < 2.2e-16

\[Y_{i}=\beta_{0}+\beta_{1}X_{i}\]

where \(Y_{i}\) is the egalitarian democracy index score of the \(i^{th}\) observation, \(\beta_{0}\) is the intercept, \(\beta_{1}\) is the slope, and \(X_{i}\) is the union participation percentage of the \(i^{th}\) observation.

Substituting the estimated coefficients from the summary above gives the fitted line:

\[\hat{Y}_{i}=0.6923+0.001757\,X_{i}\]
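
To make the fitted line concrete, the short sketch below plugs the two coefficients reported in the summary into a helper function (predict_egaldem is a hypothetical name introduced here for illustration) and evaluates it at two example union participation levels. It also squares the correlation coefficient reported earlier, which for a one-predictor linear model should agree with the Multiple R-squared.

```r
# Coefficients copied from the summary(fit1) output
b0 <- 0.6923     # intercept
b1 <- 0.001757   # slope per percentage point of union participation

# Hypothetical helper: predicted egalitarian democracy score
predict_egaldem <- function(union_per) {
  b0 + b1 * union_per
}

# Predicted scores at 10% and 50% union participation
predict_egaldem(c(10, 50))

# Correlation coefficient from cor(), squared
r <- 0.3516027
r^2
```

Each additional percentage point of union participation is associated with a rise of about 0.0018 in the predicted score, so moving from 10% to 50% participation shifts the prediction by roughly 0.07.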

Plotting it all out: Area Plot

p.1 <- ggplot( data = union.unemp.2, # Loaded union.unemp.2 into ggplot
              mapping = aes( x = year, #  Applied year to the X-axis
                            y = unemp.per, # Applied unemployment percentage to the Y-axis
                            fill = Sex )) + # Applied Sex to fill in the area
    geom_area ( color = wes_palette( "GrandBudapest2", 1 ) ) + # Adds the area plot layer
    facet_wrap( ~ factor( location, levels = c( "OECD", "United States", "Finland", # creates the facets based on location
                                                "Italy", "Netherlands", "Sweden"))) + # levels allows the ordering
    labs( title = "Male vs. Female Unemployment: Women Have it Worse", # Labels title
        x = "Year (1960-2021)", # Labels x axis
        y = "Unemployment (Percent)", # Labels y axis
        caption = "Source: Organisation for Economic Co-operation and Development (OECD)" ) + # Adds Source
    theme_linedraw() + # sets the theme for the graphs, it is the reason they are dark
    theme(
        aspect.ratio = 0.8, # Sets the height-to-width ratio of each panel
        axis.title.x = element_text( size=14 ), # Changes size of X-axis Label
        axis.title.y = element_text( size=14 ), # Changes size of Y-axis Label
        axis.text = element_text( size = 9 ), # changes axes text sizes
        legend.background = element_blank(), # Makes the background of the legend box blank
        legend.box.background = element_rect( color = "black" ),
        legend.position = c( .768, 0.305 ), # Changes legend position
        legend.title = element_text( size = 8.5, face = "bold" ), # Changes the legend title text size
        legend.text=element_text( size = 8.5 ),# Changes the legend text size
        plot.title = element_text( size = 17, # Changes size of Title
                                   face = "bold", # Boldens Title
                                   hjust = 0.5 ), # Centers the title over the plots
        plot.caption = element_text( hjust = 0.5, # Centers the caption to the plots
                                    face = "italic" ), # Italicizes the caption
        panel.spacing = unit( 1, "lines" ), # Spreads out the facets plots
        strip.background = element_rect( color = "black", fill = wes_palette( "Chevalier1" ), linetype="solid" )) # Colors the name plates above each graph with a Wes Anderson color palette
       
p.1

## Final Thoughts

I found I did not need to clean the data much in R. I spent a good amount of time manually trimming unnecessary columns in Excel and then filtered the data further in R. The visualization demonstrates the persistence of gender gaps in unemployment, especially in the case of Italy.

An interesting pattern I found concerns Sweden and Finland, which both experienced a very similar uptick in overall unemployment at roughly the same time (the mid-1990s). After further investigation I found a research article on this phenomenon, which was apparently caused by unfavorable policy and by unemployment getting too close to the non-accelerating inflation rate of unemployment (NAIRU). Another discovery came from the union participation data set: Estonia's extremely high participation rate (~95%) fell incredibly rapidly, within a few years, to a mere ~5%. This was in part due to the dissolution of the USSR, amid other issues.

I would have loved to make more visualizations, but I was heavily focused on making the one. I also attempted to make another that put multiple variables, both of them percentages, on the same graph, but could not find a reliable way of doing so. This is something I would like to explore in the future, alongside mastering facet plots. I would also have loved to include a timeline of major events that may have caused certain changes (such as Ronald Reagan being elected, or his anti-union policies and firings). Overall I thoroughly enjoyed working on this project and am excited for more in the future.
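
One common way to put two percentage variables on the same graph is to reshape the data to long format, so that a single y aesthetic carries both series and a color aesthetic tells them apart. The sketch below is only an illustration on made-up numbers (toy, toy.long, p.toy, and the values are hypothetical), not a plot of the project data.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical toy data with two percentage columns per year
toy <- tibble(
  year      = c(1990, 1995, 2000),
  union.per = c(72, 80, 74),
  unemp.per = c(3, 15, 10)
)

# Stack the two percentage columns into one "percent" column,
# with a "series" column recording which variable each value came from
toy.long <- toy %>%
  pivot_longer(cols = c(union.per, unemp.per),
               names_to = "series",
               values_to = "percent")

# One line per series, sharing the same percentage axis
p.toy <- ggplot(toy.long, aes(x = year, y = percent, color = series)) +
  geom_line() +
  labs(x = "Year", y = "Percent", color = "Series")
p.toy
```

The same long-format table also works with facet_wrap(), so the approach can be combined with the per-country facets used earlier.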

Filtering & Creating New Dataframe #2

union.unemp.3 <- union.unemp.egal %>%
  mutate(egaldem.per = ( egaldem.per * 100)) %>% # Multiplies the egal-dem decimal notation by 100 to change to percentage
  filter( Sex %in% c( "Female", "Male" ), # White-listing Male and Female 
          measure == "percentage", # White-listing percentage
          location %in% c( "United States", "OECD", "Sweden", "Italy", "Estonia", "Finland" ) )  # Selecting desired locations
head(union.unemp.3)

# A tibble: 6 × 15
  location  year union.per measure.type measure    Sex    `une/pop` unemp.per
  <chr>    <dbl>     <dbl> <chr>        <chr>      <chr>  <chr>         <dbl>
1 Finland   1963      37.4 UNE_RATE     percentage Female UNE           0.861
2 Finland   1963      37.4 UNE_RATE     percentage Male   UNE           2.02 
3 Finland   1964      37   UNE_RATE     percentage Female UNE           0.747
4 Finland   1964      37   UNE_RATE     percentage Male   UNE           2.14 
5 Finland   1965      38.3 UNE_RATE     percentage Female UNE           1.05 
6 Finland   1965      38.3 UNE_RATE     percentage Male   UNE           1.65 
# ℹ 7 more variables: UNIT_MULT <dbl>, `Unit multiplier` <chr>, DECIMALS <dbl>,
#   Decimals <chr>, Code <chr>, egaldem.per <dbl>, region <chr>

Plotting it all out 2: Union Participation

union.unemp.4 <- union.unemp.egal %>%
  filter( year %in% c( 1990, 1995, 2000, 2005, 2010, 2015 )) # year is numeric, so it is matched against numbers rather than strings

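ggplot( union.unemp.4, mapping = aes( x = egaldem.per, y = union.per)) + # Egal-dem score on the X-axis, union participation on the Y-axis
  geom_point() + # Adds the scatter plot layer
  facet_wrap(vars(year)) # One panel per selected year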

Warning: Removed 207 rows containing missing values (`geom_point()`).
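
The warning above means that some rows had a missing value in at least one of the plotted columns, so geom_point() skipped them. A minimal sketch of how the missing values could be counted and dropped explicitly before plotting, using made-up data (toy and its values are hypothetical), might look like this:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with missing values in the plotted columns
toy <- tibble(
  year        = c(1990, 1990, 1995, 1995),
  union.per   = c(20, NA, 35, 40),
  egaldem.per = c(0.70, 0.80, NA, 0.85)
)

# Count the missing values in each plotted column
colSums(is.na(toy[, c("union.per", "egaldem.per")]))

# Keep only rows where both plotted columns are present
toy.complete <- toy %>%
  drop_na(union.per, egaldem.per)

# Number of rows that would have been dropped by the plot
nrow(toy) - nrow(toy.complete)
```

Dropping the rows up front makes the loss of data visible in the code rather than only in a plot warning.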