The dataset I chose covers unemployment percentages in Organisation for Economic Co-operation and Development (OECD) nations, with Male, Female and Total as the three main variables. I hope to explore the relationship between male and female unemployment, most notably any major discrepancies between the two and any major events that may have caused them. The data comes directly from the OECD. I also plan to incorporate union participation data, also from the OECD, in the hope of finding a correlation between the two.

## Loading in Packages and Data
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(viridis)
Loading required package: viridisLite
library(ggforce)
Warning: package 'ggforce' was built under R version 4.3.3
library(wesanderson)
Warning: package 'wesanderson' was built under R version 4.3.3
Rows: 1643 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): location
dbl (2): year, union.per
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
unemp <- read_csv("unemp.OECD.csv")
Rows: 14958 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): location, measure.type, measure, Sex, une/pop, Unit multiplier, Dec...
dbl (4): year, unemp.per, UNIT_MULT, DECIMALS
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
egaldem <- read_csv("egal.dem.OECD.csv")
Rows: 33808 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): location, Code, region
dbl (2): year, egaldem.per
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Joining Unemployment and Union Data Frames
I joined the union and unemployment data frames with left_join, matching on location and year. A left join keeps every row of the union data, including rows with no matching unemployment record.
I then joined a third dataset, on egalitarian democracy scores around the world, in the same way.
union.unemp <- left_join(union, unemp, by = c("location", "year")) # Joining by location and year
union.unemp.egal <- left_join(union.unemp, egaldem, by = c("location", "year")) # Joined the new data frame with egaldem data
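One thing worth checking after a left join is whether the row count changed. The sketch below uses small made-up tables (the `_toy` names and the location "Examplia" are hypothetical, not part of the project data) to show that a left row is repeated once per matching right row, and that unmatched left rows are kept with `NA`.

```r
library(dplyr)

# Toy stand-ins for the real tables; "Examplia" and its value are invented
union_toy <- data.frame(
  location  = c("Finland", "Finland", "Examplia"),
  year      = c(1963, 1964, 1963),
  union.per = c(37.4, 37.0, 50.0)
)
unemp_toy <- data.frame(
  location  = c("Finland", "Finland", "Finland", "Finland"),
  year      = c(1963, 1963, 1964, 1964),
  Sex       = c("Female", "Male", "Female", "Male"),
  unemp.per = c(0.861, 2.02, 0.747, 2.14)
)

joined_toy <- left_join(union_toy, unemp_toy, by = c("location", "year"))

nrow(union_toy)   # 3 rows before the join
nrow(joined_toy)  # 5 rows after: each Finland row matches a Female and a Male row

# Left-table rows that found no partner (kept in joined_toy with NA)
unmatched_toy <- anti_join(union_toy, unemp_toy, by = c("location", "year"))
unmatched_toy
```

In the real data, the regression output further down reports 10544 residual degrees of freedom plus 1417 dropped observations, far more than the 1643 rows in the union table. That suggests the same kind of row multiplication; if so, each union and democracy value is counted several times in `cor()` and `lm()`.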
## Filtering & Creating New Data Frame
I filtered for selected countries, kept only the Female and Male rows, and kept only the percentage measure, since the data also contains population metrics.
# A tibble: 6 × 12
location year union.per measure.type measure Sex `une/pop` unemp.per
<chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <dbl>
1 Finland 1963 37.4 UNE_RATE percentage Female UNE 0.861
2 Finland 1963 37.4 UNE_RATE percentage Male UNE 2.02
3 Finland 1964 37 UNE_RATE percentage Female UNE 0.747
4 Finland 1964 37 UNE_RATE percentage Male UNE 2.14
5 Finland 1965 38.3 UNE_RATE percentage Female UNE 1.05
6 Finland 1965 38.3 UNE_RATE percentage Male UNE 1.65
# ℹ 4 more variables: UNIT_MULT <dbl>, `Unit multiplier` <chr>, DECIMALS <dbl>,
# Decimals <chr>
## Correlation Coefficient
cor(union.unemp.egal$union.per, union.unemp.egal$egaldem.per, use = "complete.obs") # Provides the correlation coefficient
[1] 0.3516027
We find a weak-to-moderate positive correlation (r ≈ 0.35) between union participation and the egalitarian democracy score.
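The `use = "complete.obs"` argument matters here because the joined data contains missing values. As a minimal sketch with hypothetical toy vectors (not the project data), it restricts the calculation to positions where both variables are present:

```r
# Hypothetical toy vectors; NA marks a missing value
x <- c(1, 2, 3, NA, 5)
y <- c(2, 4, NA, 8, 10)

# Positions where both x and y are observed
complete_pairs <- complete.cases(x, y)
sum(complete_pairs)  # number of usable pairs

# cor() with complete.obs drops the incomplete pairs before computing r
r_complete <- cor(x, y, use = "complete.obs")

# Same result when the incomplete pairs are removed by hand
r_manual <- cor(x[complete_pairs], y[complete_pairs])
```

Without the `use` argument, `cor()` would return `NA` for these vectors.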
## Regression Model
fit1 <- lm(egaldem.per ~ union.per, data = union.unemp.egal) # Fits the linear regression model
summary(fit1) # Summary of the model
Call:
lm(formula = egaldem.per ~ union.per, data = union.unemp.egal)
Residuals:
Min 1Q Median 3Q Max
-0.70734 -0.02406 0.02774 0.05633 0.12244
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.923e-01 1.891e-03 366.06 <2e-16 ***
union.per 1.757e-03 4.556e-05 38.57 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1004 on 10544 degrees of freedom
(1417 observations deleted due to missingness)
Multiple R-squared: 0.1236, Adjusted R-squared: 0.1235
F-statistic: 1487 on 1 and 10544 DF, p-value: < 2.2e-16
\[Y_{i}=\beta_{0}+\beta_{1}X_{i}+\varepsilon_{i}\]
where \(Y_{i}\) is the egalitarian democracy index score of the \(i^{th}\) observation, \(\beta_{0}\) is the intercept, \(\beta_{1}\) is the slope, \(X_{i}\) is the union participation percentage of the \(i^{th}\) observation, and \(\varepsilon_{i}\) is the error term. Using the estimates from the summary, the fitted line is
\[\hat{Y}_{i}=0.6923+0.001757\,X_{i}\]
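As a rough worked check, using the rounded coefficients from the summary: for an observation with union participation of 37.4% (the Finland 1963 value shown in the earlier table), the predicted index score \(\hat{Y}\) would be approximately
\[\hat{Y}=0.6923+0.001757\times 37.4\approx 0.6923+0.0657=0.7580\]
This is a fitted value, not the observed score for that row. Read the other way, a difference of 10 percentage points in union participation corresponds to a predicted difference of about \(0.001757\times 10\approx 0.018\) in the index. The reported \(R^{2}\) also agrees with the correlation coefficient computed earlier, since in a regression with one predictor \(R^{2}=r^{2}\):
\[r^{2}=0.3516027^{2}\approx 0.1236\]
so union participation accounts for roughly 12% of the variation in the index in this data.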
## Plotting it all out: Area Plot
p.1 <- ggplot(data = union.unemp.2, # Loaded union.unemp.2 into ggplot
              mapping = aes(x = year, # Applied year to the X-axis
                            y = unemp.per, # Applied unemployment percentage to the Y-axis
                            fill = Sex)) + # Applied Sex to fill in the area
  geom_area(color = wes_palette("GrandBudapest2", 1)) + # Adds the area plot layer
  facet_wrap(~ factor(location, levels = c("OECD", "United States", "Finland", # Creates the facets based on location
                                           "Italy", "Netherlands", "Sweden"))) + # levels sets the facet order
  labs(title = "Male vs. Female Unemployment: Women Have it Worse", # Labels title
       x = "Year (1960-2021)", # Labels x axis
       y = "Unemployment (Percent)", # Labels y axis
       caption = "Source: Organisation for Economic Co-operation and Development (OECD)") + # Adds source
  theme_linedraw() + # Sets the theme for the graphs; it is the reason they are dark
  theme(aspect.ratio = 0.8, # Sets each panel's height-to-width ratio
        axis.title.x = element_text(size = 14), # Changes size of X-axis label
        axis.title.y = element_text(size = 14), # Changes size of Y-axis label
        axis.text = element_text(size = 9), # Changes axes text sizes
        legend.background = element_blank(), # Makes the background of the legend box blank
        legend.box.background = element_rect(color = "black"), # Draws a black border around the legend box
        legend.position = c(.768, 0.305), # Changes legend position
        legend.title = element_text(size = 8.5, face = "bold"), # Changes the legend title text size
        legend.text = element_text(size = 8.5), # Changes the legend text size
        plot.title = element_text(size = 17, # Changes size of title
                                  face = "bold", # Boldens title
                                  hjust = 0.5), # Centers the title over the plots
        plot.caption = element_text(hjust = 0.5, # Centers the caption under the plots
                                    face = "italic"), # Italicizes the caption
        panel.spacing = unit(1, "lines"), # Spreads out the facet plots
        strip.background = element_rect(color = "black",
                                        fill = wes_palette("Chevalier1"),
                                        linetype = "solid")) # Colors the name plates above each graph with a Wes Anderson color palette
p.1
## Final Thoughts

I found that I did not need to clean the data much in R, although I spent a good amount of time manually trimming unnecessary columns in Excel before filtering the data further in R. The visualization shows persistent gender gaps in unemployment, especially in Italy. An interesting pattern appeared in Sweden and Finland: both experienced a very similar uptick in overall unemployment at roughly the same time (mid-1990s). After further investigation, I found a research article on this phenomenon, which attributes it to unfavorable policy and to unemployment getting too close to the non-accelerating inflation rate of unemployment (NAIRU).

Another discovery came from the union participation dataset: Estonia's extremely high participation rate (~95%) fell very rapidly, within a few years, to a mere ~5%. This was due in part to the dissolution of the USSR, amid other issues.

I would have loved to make more visualizations, but I was heavily focused on making the one. I also attempted to make another that put multiple variables, both percentages, on the same graph, but could not find a reliable way of doing so. This is something I would like to explore in the future, alongside mastering facet plots. I would also have liked to include a timeline of major events that may have caused certain changes (such as Ronald Reagan being elected, or his anti-union policies and firings). Overall, I thoroughly enjoyed working on this project and am excited for more in the future.
## Filtering & Creating New Data Frame #2
union.unemp.3 <- union.unemp.egal %>%
  mutate(egaldem.per = egaldem.per * 100) %>% # Multiplies the egal-dem decimal notation by 100 to change to percentage
  filter(Sex %in% c("Female", "Male"), # Whitelisting Male and Female
         measure == "percentage", # Whitelisting percentage
         location %in% c("United States", "OECD", "Sweden", "Italy", "Estonia", "Finland")) # Selecting desired locations
head(union.unemp.3)
# A tibble: 6 × 15
location year union.per measure.type measure Sex `une/pop` unemp.per
<chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <dbl>
1 Finland 1963 37.4 UNE_RATE percentage Female UNE 0.861
2 Finland 1963 37.4 UNE_RATE percentage Male UNE 2.02
3 Finland 1964 37 UNE_RATE percentage Female UNE 0.747
4 Finland 1964 37 UNE_RATE percentage Male UNE 2.14
5 Finland 1965 38.3 UNE_RATE percentage Female UNE 1.05
6 Finland 1965 38.3 UNE_RATE percentage Male UNE 1.65
# ℹ 7 more variables: UNIT_MULT <dbl>, `Unit multiplier` <chr>, DECIMALS <dbl>,
# Decimals <chr>, Code <chr>, egaldem.per <dbl>, region <chr>
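For the goal mentioned in Final Thoughts of putting two percentage variables on the same graph, one common approach is to reshape the data to long format so that both variables share a single value column. The following is only a sketch on hypothetical toy data (`toy`, `toy_long`, `p_toy`, and locations "A" and "B" are invented names and values), not the project's actual plot.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical toy data: one row per location-year, two percentage columns
toy <- data.frame(
  location    = rep(c("A", "B"), each = 3),
  year        = rep(2000:2002, times = 2),
  union.per   = c(30, 29, 28, 70, 69, 68),
  egaldem.per = c(80, 81, 82, 88, 88, 89)
)

# Stack the two percentage columns into a single value column,
# with a second column recording which variable each value came from
toy_long <- toy %>%
  pivot_longer(
    cols      = c(union.per, egaldem.per),
    names_to  = "variable",
    values_to = "percent"
  )

# One line per variable, one facet per location
p_toy <- ggplot(toy_long, aes(x = year, y = percent, color = variable)) +
  geom_line() +
  facet_wrap(~ location) +
  labs(x = "Year", y = "Percent", color = "Variable")

p_toy
```

Applied to union.unemp.3, this would presumably first need the data reduced to one row per location and year (for example with `distinct()`), since the Female and Male rows repeat the same union and democracy values.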