Project 2 DATA 110

Author

Emma Poch

Source: Labour Rights Index, 2022

The data set I selected comes from the GWIP (Global Work-Injury Policy Database), collected in 2020, and describes the protections afforded to injured or otherwise incapacitated laborers. The primary variables for this project are replacement rate (the percentage of an employee’s pre-tax salary paid as compensation after temporary or permanent incapacitation), duration of compensation (how long these benefits are offered), and overall work injury coverage (the percentage of the working population that benefits from these laws, as calculated by SIED). I chose the replacement rates and durations for permanently disabled workers, as I found these to be a better estimate of the extent to which these labor laws promote (or hinder) social welfare.

I believe that protecting labor rights and welfare is a valuable component of supporting human rights, and I was particularly interested in this data set because it provides historical and colonial context for the countries involved. It is no secret that the labor system under which the world currently operates has been heavily shaped by colonialism; Ashiagbor argues that dominant colonial countries such as the UK were only able to achieve their present levels of social welfare because of the sheer amount of wealth they derived from exploiting the global south (Ashiagbor, 2020). Other nuances that the author discusses, such as the mistreatment of migrant workers even within countries offering better labor protections, are unfortunately outside the scope of the data set, but are still worth keeping in mind. Knowing which countries have historically been exploited (or benefited from the exploitation of others) is valuable when studying a topic influenced by so many globalized factors.
Most of my cleaning involved getting rid of variables that either had too many NAs or were too similar to other variables (but provided less comprehensive data).
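As a rough illustration of the NA screening described above, the sketch below measures the share of missing values in each column of a small made-up data frame and keeps only the columns under a cutoff. The data frame `toy`, its values, and the 50% cutoff are assumptions for illustration only; they are not taken from gwip20.csv and do not reproduce the column selection actually used in this project.

```r
# Hypothetical toy data; all values are invented, not from gwip20.csv
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  mostly_missing = c(NA, NA, NA, 0.4),
  complete = c(0.2, 0.5, 0.7, 0.9),
  stringsAsFactors = FALSE
)

# Share of missing values in each column
na_share <- colMeans(is.na(toy))
print(na_share)

# Keep columns where at most half of the values are missing (assumed cutoff)
toy_clean <- toy[, na_share <= 0.5, drop = FALSE]
print(names(toy_clean))
```

A screen like this only flags candidates; whether a column is redundant with another still has to be judged by reading the variable descriptions.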

setwd("C:/Users/emmap/Downloads/DATA110")
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(viridis)
Warning: package 'viridis' was built under R version 4.3.3
Loading required package: viridisLite
library(plotly)
Warning: package 'plotly' was built under R version 4.3.3

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
insurance <- read_csv("gwip20.csv")
Rows: 189 Columns: 26
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (8): country_name, ISO3c, labor_workinjury_firstnat_carriedover, labor_...
dbl (18): cow_code, independence, labor_workinjury_firstlaw, labor_workinjur...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Clean the data set: drop columns that have an excessive number of NAs or are repetitive/unnecessary, and make variables more readable.
insurance2 <- insurance |>
  select(!c(year, RRgross_Sing, RRgross_Fam, labor_workinjury_first_fund, cow_code,
            workinjury_coverage_SIED_harmo, workinjury_replacement_rate_single_SIED_harmo)) |>
  mutate(
    # Strip the "months" unit from the duration columns
    duration_perm_sspw = str_replace_all(duration_perm_sspw, "months", ""),
    duration_temp_sspw = str_replace_all(duration_temp_sspw, "months", ""),
    # Recode the 1/0 global south indicator as yes/no
    global_south = str_replace_all(global_south, "1", "yes"),
    global_south = str_replace_all(global_south, "0", "no"),
    # Standardize the program type label
    labor_workinjury_firstlaw_programtype = str_replace_all(labor_workinjury_firstlaw_programtype, "Employer-liability", "Employer liability")
  )
summary(insurance2)
 country_name          ISO3c            independence  labor_workinjury_firstlaw
 Length:189         Length:189         Min.   :1783   Min.   :1854             
 Class :character   Class :character   1st Qu.:1903   1st Qu.:1917             
 Mode  :character   Mode  :character   Median :1956   Median :1929             
                                       Mean   :1929   Mean   :1932             
                                       3rd Qu.:1967   3rd Qu.:1946             
                                       Max.   :2011   Max.   :2004             
                                                                               
 labor_workinjury_firstlaw_sspw labor_workinjury_firstins
 Min.   :1883                   Min.   :1854             
 1st Qu.:1916                   1st Qu.:1924             
 Median :1938                   Median :1951             
 Mean   :1938                   Mean   :1948             
 3rd Qu.:1957                   3rd Qu.:1969             
 Max.   :2006                   Max.   :2006             
 NA's   :15                     NA's   :6                
 labor_workinjury_firstnat_carriedover labor_workinjury_firstlaw_programtype
 Length:189                            Length:189                           
 Class :character                      Class :character                     
 Mode  :character                      Mode  :character                     
                                                                            
                                                                            
                                                                            
                                                                            
 labor_workinjury_firstlaw_bluecollar_fullcoverage replacement_rate_perm_sspw
 Min.   :1880                                      Min.   : 20.00            
 1st Qu.:1927                                      1st Qu.: 70.00            
 Median :1950                                      Median : 80.00            
 Mean   :1950                                      Mean   : 77.78            
 3rd Qu.:1970                                      3rd Qu.: 90.00            
 Max.   :2017                                      Max.   :150.00            
                                                   NA's   :22                
 duration_perm_sspw replacement_rate_temp_sspw duration_temp_sspw
 Length:189         Min.   :  0.00             Length:189        
 Class :character   1st Qu.: 66.70             Class :character  
 Mode  :character   Median : 75.00             Mode  :character  
                    Mean   : 77.55                               
                    3rd Qu.:100.00                               
                    Max.   :100.00                               
                    NA's   :24                                   
    Region          global_south       colonial_history  
 Length:189         Length:189         Length:189        
 Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character  
                                                         
                                                         
                                                         
                                                         
 workinjury_coverage_ILO workinjury_coverage_SIED_harmo_full
 Min.   :0.0140          Min.   :0.0270                     
 1st Qu.:0.1960          1st Qu.:0.2400                     
 Median :0.4780          Median :0.5040                     
 Mean   :0.4704          Mean   :0.5227                     
 3rd Qu.:0.7005          3rd Qu.:0.8050                     
 Max.   :1.0000          Max.   :1.0000                     
 NA's   :22                                                 
 workinjury_replacement_rate_single_SIED_harmo_full
 Min.   :0.100                                     
 1st Qu.:0.992                                     
 Median :1.042                                     
 Mean   :1.037                                     
 3rd Qu.:1.158                                     
 Max.   :1.500                                     
                                                   
# Testing the relationship between average replacement rate (the percentage of a worker's pre-tax income that they receive compensation for) for permanent injuries against the percentage of workers in the labor force covered by work injury laws
# lm() drops rows with missing values by default, so no na.rm argument is needed
model1 <- lm(workinjury_coverage_SIED_harmo_full ~ replacement_rate_perm_sspw, data = insurance2)
summary(model1)

Call:
lm(formula = workinjury_coverage_SIED_harmo_full ~ replacement_rate_perm_sspw, 
    data = insurance2)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.47787 -0.29809 -0.02033  0.27434  0.47813 

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)    
(Intercept)                 0.589976   0.104561   5.642 7.15e-08 ***
replacement_rate_perm_sspw -0.000681   0.001308  -0.521    0.603    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3121 on 165 degrees of freedom
  (22 observations deleted due to missingness)
Multiple R-squared:  0.00164,   Adjusted R-squared:  -0.00441 
F-statistic: 0.2711 on 1 and 165 DF,  p-value: 0.6033
plot(model1$residuals)

The linear equation derived from this model is workinjurycoverage = 0.589976 - 0.000681(replacementrate). The predicted share of workers covered when the replacement rate is 0 is therefore about 59%, decreasing by about 0.07 percentage points with each one-point increase in replacement rate. Although the residuals plot shows no distinct pattern, indicating that the model is reasonable to use for the data, the p-value is quite high (0.603) and the R-squared values are very low (a multiple R-squared of 0.00164 and an adjusted R-squared of -0.00441, implying that less than 1% of the variation in work injury coverage can be explained by replacement rate). The relationship between the two variables is therefore not statistically significant.
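To make the size of the fitted slope concrete, the sketch below plugs a few replacement rates into the equation using the coefficients reported in the summary output. The chosen rates (0, 50, and 100) are arbitrary evaluation points picked for illustration, and the calculation does not refit the model.

```r
# Coefficients copied from the summary(model1) output above
intercept <- 0.589976
slope <- -0.000681

# Replacement rates (in %) at which to evaluate the fitted line (arbitrary choices)
replacement_rate <- c(0, 50, 100)

# Predicted proportion of the workforce covered at each rate
predicted_coverage <- intercept + slope * replacement_rate
print(round(predicted_coverage, 3))
```

Moving from a replacement rate of 0 to 100 lowers the predicted coverage by only about 0.068, which is small next to the residual standard error of 0.3121 and consistent with the non-significant slope.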

insurance2 |>
  ggplot(aes(x = independence, y = workinjury_coverage_ILO, col = colonial_history))+
  geom_point()+
  labs(x = "Year Country Gained Independence", y = "Proportion of Workforce Covered for Injuries")
Warning: Removed 22 rows containing missing values (`geom_point()`).

insurance2 |>
  ggplot(aes(x = labor_workinjury_firstlaw_programtype))+
  geom_bar(stat = "count")+
  labs(x = "Type of Program for First Work Injury Law")

plot1 <- insurance2 |>
  mutate(percentage = workinjury_coverage_SIED_harmo_full*100) |>
  ggplot(aes(
    x = colonial_history,
    y = percentage,
    col = replacement_rate_perm_sspw,
    # Tooltip text shown by ggplotly() for each point
    text = paste0(
      "Country: ", country_name,
      "\n Replacement Rate: ", replacement_rate_perm_sspw, "%",
      "\n Payment Duration: ", duration_perm_sspw,
      "\n Colonizing Country: ", colonial_history,
      "\n Proportion of Workforce Covered: ", percentage, "%"
    )
  ))+
  theme_minimal()+
  geom_jitter(alpha = 0.65)+
  scale_color_viridis(option = "inferno")+
  theme(axis.text.x = element_text(angle = 90), text = element_text(family = "serif"))+
  labs(
    x = "Country of Colonization",
    y = "% of Workforce Covered by Labor Laws",
    col = "% of Income Compensated \n for Permanent Injuries",
    title = "Labor Regulation Coverage and Generosity for Countries of the World",
    caption = "Source: Global Work-Injury Policy Database"
  )
plot2 <- ggplotly(plot1, tooltip = "text")
plot2

This visualization yielded a lot of interesting information to consider. Although my previous linear model had indicated that labor law coverage and generosity did not have a very strong relationship, I was still surprised by the number of countries with minimal coverage that offered high levels of compensation (such as Burundi, with a 100% replacement rate but only 4.4% of the workforce covered), or vice versa. At a future point, it might be interesting to compare the inequality of wealth distribution within these countries to determine whether that factor is influential; is wealth inequality responsible for only a select few receiving reasonable payouts? Unsurprisingly, the countries that had not previously been subjected to colonization numbered among the highest in both coverage and generosity, supporting my initial belief that colonial history would affect the data. I’d originally intended to create a map to display the information, but I was unable to find any CSVs containing latitude and longitude information that would have joined cleanly with the ISO3 codes in my data set. I would certainly be interested in attempting another map at a later date, if I were able to find a better data set to join, or in exploring other historical factors, given that incorporating colonial history proved fruitful.
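Cases like Burundi could also be listed directly rather than spotted in the plot. The sketch below is a minimal example on a made-up data frame: only the Burundi row reflects figures quoted above, while "Country X", "Country Y", their values, and both thresholds (a replacement rate of at least 90% and coverage under 10%) are assumptions for illustration.

```r
# Toy data: the Burundi values come from the text above;
# "Country X" and "Country Y" are hypothetical placeholders
toy_gap <- data.frame(
  country_name = c("Burundi", "Country X", "Country Y"),
  replacement_rate_perm_sspw = c(100, 80, 50),
  workinjury_coverage_SIED_harmo_full = c(0.044, 0.90, 0.10),
  stringsAsFactors = FALSE
)

# Assumed thresholds for "generous but narrow" coverage
high_low <- subset(
  toy_gap,
  replacement_rate_perm_sspw >= 90 & workinjury_coverage_SIED_harmo_full < 0.10
)
print(high_low$country_name)
```

The same condition could be passed to `filter()` on `insurance2`, with the thresholds adjusted to whatever counts as high generosity and low coverage for the question at hand.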