Preface: This report presents an analysis of gender equality in the European labor market, focusing on the gender pay gap across various countries and industries from 2010 to 2021. The dataset, “pay_gap_Europe.csv,” provides a detailed examination of the unadjusted gender pay gap (GPG), which measures the difference between the average gross hourly earnings of male and female paid employees as a percentage of male earnings. The data also includes supplementary information such as GDP per capita, urbanization rates, and economic sector breakdowns, offering a holistic perspective on the economic and social factors that may influence gender-based pay disparities. This analysis seeks to enhance understanding of gender equity issues in the workplace, promoting efforts toward building more inclusive and equitable societies.
Objective: The primary objective of this data analysis is to explore and understand the factors contributing to the gender pay gap in Europe between 2010 and 2021. By analyzing the unadjusted gender pay gap alongside other socioeconomic indicators like GDP per capita and urbanization rates, the report aims to identify patterns and correlations that may explain the persistent disparities in pay between male and female employees. This analysis will serve as a basis for informing policies and initiatives aimed at reducing the gender pay gap and fostering greater gender equality in the labor market across Europe.
Data dictionary:
| Variable | Class | Description |
|---|---|---|
| Country | character | Countries in Europe |
| Year | double | Years from 2010 to 2021 |
| GDP | double | GDP per capita (euros) |
| Urban_population | double | Urban population (%) |
| Industry | double | Pay gap in Industry, construction and services (%) |
| Business | double | Pay gap in Business economy (%) |
| Mining | double | Pay gap in Mining and quarrying (%) |
| Manufacturing | double | Pay gap in Manufacturing (%) |
| Electricity_supply | double | Pay gap in Electricity, gas, steam and air conditioning supply (%) |
| Water_supply | double | Pay gap in Water supply; sewerage, waste management and remediation activities (%) |
| Construction | double | Pay gap in Construction (%) |
| Retail trade | double | Pay gap in Wholesale and retail trade; repair of motor vehicles and motorcycles (%) |
| Transportation | double | Pay gap in Transportation and storage (%) |
| Accommodation | double | Pay gap in Accommodation and food service activities (%) |
| Information | double | Pay gap in Information and communication (%) |
| Financial | double | Pay gap in Financial and insurance activities (%) |
| Real estate | double | Pay gap in Real estate activities (%) |
| Professional_scientific | double | Pay gap in Professional, scientific and technical activities (%) |
| Administrative | double | Pay gap in Administrative and support service activities (%) |
| Public_administration | double | Pay gap in Public administration and defence; compulsory social security (%) |
| Education | double | Pay gap in Education (%) |
| Human_health | double | Pay gap in Human health and social work activities (%) |
| Arts | double | Pay gap in Arts, entertainment and recreation (%) |
| Other | double | Pay gap in Other service activities (%) |
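
Every sector column in the dictionary above is an unadjusted gap as defined in the preface: the difference between average male and female gross hourly earnings, as a percentage of male earnings. The sketch below works through that arithmetic in base R. The function name `gender_pay_gap` and the earnings figures are hypothetical and are not part of the dataset.

```r
# Unadjusted gender pay gap (%), following the definition in the preface:
# (mean male hourly earnings - mean female hourly earnings) / mean male hourly earnings * 100
gender_pay_gap <- function(male_hourly, female_hourly) {
  m <- mean(male_hourly, na.rm = TRUE)
  f <- mean(female_hourly, na.rm = TRUE)
  (m - f) / m * 100
}

# Hypothetical gross hourly earnings in euros (illustrative values only)
male   <- c(20, 22, 18)
female <- c(17, 19, 15)

gpg <- gender_pay_gap(male, female)
print(gpg)

# A negative value means women earn more than men on average
reverse_gap <- gender_pay_gap(10, 12)
print(reverse_gap)
```

Negative values in the sector columns can be read the same way as `reverse_gap`: average female earnings exceed average male earnings in that country, year, and sector.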
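library(tidyverse) # read_csv(), dplyr verbs, pivot_longer(), ggplot2
library(corrplot)  # corrplot()
pay_gap_Europe <- read_csv("~/Documents/SIMMONS/Micro-Internship/2. Jul - Sep 2024/pay_gap_Europe.csv")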
## Rows: 324 Columns: 24
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Country
## dbl (23): Year, GDP, Urban_population, Industry, Business, Mining, Manufactu...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(pay_gap_Europe,5)
glimpse(pay_gap_Europe)
## Rows: 324
## Columns: 24
## $ Country <chr> "Austria", "Austria", "Austria", "Austria", "A…
## $ Year <dbl> 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017…
## $ GDP <dbl> 35390, 36300, 36390, 36180, 36130, 36140, 3639…
## $ Urban_population <dbl> 57.40, 57.12, 57.15, 57.34, 57.53, 57.72, 57.9…
## $ Industry <dbl> 24.0, 23.5, 22.9, 22.3, 22.2, 21.8, 20.8, 20.7…
## $ Business <dbl> 25.2, 24.7, 24.3, 23.8, 23.8, 23.4, 22.3, 22.3…
## $ Mining <dbl> 18.3, NA, NA, NA, 15.9, 13.7, 14.4, 10.9, 7.9,…
## $ Manufacturing <dbl> 24.4, NA, NA, NA, 23.0, 22.7, 21.9, 21.7, 21.4…
## $ Electricity_supply <dbl> 23.6, NA, NA, NA, 19.8, 17.6, 13.2, 13.0, 14.4…
## $ Water_supply <dbl> 12.2, NA, NA, NA, 10.0, 9.3, 8.2, 8.4, 8.1, 8.…
## $ Construction <dbl> 9.9, NA, NA, NA, 8.2, 8.2, 8.3, 8.3, 8.3, 8.2,…
## $ `Retail trade` <dbl> 27.5, NA, NA, NA, 23.4, 23.3, 23.3, 23.2, 23.2…
## $ Transportation <dbl> 7.3, NA, NA, NA, 10.6, 11.6, 14.5, 12.4, 11.7,…
## $ Accommodation <dbl> 9.9, NA, NA, NA, 7.4, 6.4, 5.9, 5.7, 5.4, 4.9,…
## $ Information <dbl> 21.2, NA, NA, NA, 22.9, 22.4, 20.9, 20.6, 20.7…
## $ Financial <dbl> 30.3, NA, NA, NA, 30.4, 30.3, 27.1, 28.4, 28.2…
## $ `Real estate` <dbl> 27.0, NA, NA, NA, 27.8, 28.0, 28.7, 29.0, 29.2…
## $ Professional_scientific <dbl> 34.0, NA, NA, NA, 31.5, 31.3, 30.4, 29.4, 28.3…
## $ Administrative <dbl> 22.5, NA, NA, NA, 19.5, 20.0, 17.8, 17.4, 17.1…
## $ Public_administration <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ Education <dbl> 27.8, NA, NA, NA, 24.3, 24.2, 24.3, 23.7, 23.6…
## $ Human_health <dbl> 12.0, NA, NA, NA, 12.8, 12.9, 14.5, 15.0, 15.3…
## $ Arts <dbl> 34.0, NA, NA, NA, 26.6, 26.2, 20.8, 19.1, 18.3…
## $ Other <dbl> 32.0, NA, NA, NA, 28.8, 28.3, 27.8, 26.9, 26.4…
A glimpse at the dataset shows that every column except Country, which
is character, is stored as a double. This makes sense, since the dataset
compares the gender pay gap, in percent, across economic sectors. Many
of the sector columns also contain NA values, so the next step is to
count them and clean the data.
# check for NA value in the dataset
colSums(is.na(pay_gap_Europe))
## Country Year GDP
## 0 0 0
## Urban_population Industry Business
## 0 3 4
## Mining Manufacturing Electricity_supply
## 21 6 23
## Water_supply Construction Retail trade
## 12 12 7
## Transportation Accommodation Information
## 6 9 9
## Financial Real estate Professional_scientific
## 6 13 6
## Administrative Public_administration Education
## 9 66 9
## Human_health Arts Other
## 6 13 11
# Replace each NA with the mean of the observed values in its column
replace_na_with_mean <- function(x) {
  if (is.numeric(x)) {
    return(ifelse(is.na(x), mean(x, na.rm = TRUE), x))
  } else {
    return(x)
  }
}
# Apply the function to all columns in the dataset
pay_gap_Europe <- pay_gap_Europe %>%
  mutate(across(everything(), replace_na_with_mean))
# Check if there are any remaining NA values
colSums(is.na(pay_gap_Europe))
## Country Year GDP
## 0 0 0
## Urban_population Industry Business
## 0 0 0
## Mining Manufacturing Electricity_supply
## 0 0 0
## Water_supply Construction Retail trade
## 0 0 0
## Transportation Accommodation Information
## 0 0 0
## Financial Real estate Professional_scientific
## 0 0 0
## Administrative Public_administration Education
## 0 0 0
## Human_health Arts Other
## 0 0 0
# View the first few rows of the cleaned dataset
head(pay_gap_Europe)
Twenty of the 23 numeric columns have NA values, ranging from 3 to 66 per column. The code above applies a helper function to every column: in numeric columns, each NA is replaced by the mean of the observed values in that column, and non-numeric columns are returned unchanged.
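Mean imputation keeps each column's mean unchanged but reduces its spread, because every filled value sits exactly at the centre. The toy example below, in base R with hypothetical numbers, applies the same rule as `replace_na_with_mean()` to a single vector and compares the standard deviation before and after.

```r
# Toy column with missing values (hypothetical numbers, not from the dataset)
gap <- c(10, 20, NA, 30, NA, 40)

# Same rule as replace_na_with_mean(): fill NA with the mean of the observed values
imputed <- ifelse(is.na(gap), mean(gap, na.rm = TRUE), gap)

sd_observed <- sd(gap, na.rm = TRUE) # spread of the 4 observed values
sd_imputed  <- sd(imputed)           # spread after 2 values are filled with the mean

print(imputed)
print(c(mean_observed = mean(gap, na.rm = TRUE), mean_imputed = mean(imputed)))
print(c(sd_observed = sd_observed, sd_imputed = sd_imputed))
```

The effect is strongest in columns with many missing values, such as Public_administration with 66 NAs, so measures of spread for those columns should be read with some caution.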
# Summary Statistics
summary(pay_gap_Europe)
## Country Year GDP Urban_population
## Length:324 Min. :2010 Min. : 5080 Min. :52.66
## Class :character 1st Qu.:2013 1st Qu.:13045 1st Qu.:65.62
## Mode :character Median :2016 Median :22330 Median :73.28
## Mean :2016 Mean :28012 Mean :73.46
## 3rd Qu.:2018 3rd Qu.:36382 3rd Qu.:84.89
## Max. :2021 Max. :84750 Max. :98.12
## Industry Business Mining Manufacturing
## Min. :-0.200 Min. : 5.40 Min. :-26.600 Min. : 1.70
## 1st Qu.: 9.675 1st Qu.:13.80 1st Qu.: 3.875 1st Qu.:14.28
## Median :14.500 Median :16.10 Median : 9.600 Median :20.05
## Mean :13.862 Mean :16.61 Mean : 9.529 Mean :19.26
## 3rd Qu.:17.625 3rd Qu.:19.90 3rd Qu.: 16.500 3rd Qu.:24.00
## Max. :29.900 Max. :30.20 Max. : 43.700 Max. :33.60
## Electricity_supply Water_supply Construction Retail trade
## Min. :-2.000 Min. :-33.200 Min. :-28.3000 Min. : 7.00
## 1st Qu.: 7.375 1st Qu.: -2.600 1st Qu.: -8.4000 1st Qu.:16.57
## Median :11.512 Median : 2.500 Median : 0.5500 Median :20.66
## Mean :11.512 Mean : 2.211 Mean : -0.6875 Mean :20.66
## 3rd Qu.:16.200 3rd Qu.: 8.000 3rd Qu.: 8.0000 3rd Qu.:24.60
## Max. :49.200 Max. : 20.900 Max. : 23.5000 Max. :38.50
## Transportation Accommodation Information Financial
## Min. :-25.100 Min. : 0.40 Min. : 7.30 Min. : 4.90
## 1st Qu.: 0.675 1st Qu.: 7.60 1st Qu.:14.40 1st Qu.:23.18
## Median : 5.400 Median :10.00 Median :18.40 Median :28.10
## Mean : 4.345 Mean :10.89 Mean :19.22 Mean :27.94
## 3rd Qu.: 10.400 3rd Qu.:13.43 3rd Qu.:24.43 3rd Qu.:32.02
## Max. : 36.800 Max. :29.70 Max. :33.40 Max. :45.10
## Real estate Professional_scientific Administrative
## Min. :-47.90 Min. :-1.80 Min. :-33.200
## 1st Qu.: 9.50 1st Qu.:15.00 1st Qu.: 6.500
## Median : 14.20 Median :19.30 Median : 9.600
## Mean : 13.26 Mean :19.13 Mean : 8.116
## 3rd Qu.: 18.82 3rd Qu.:23.90 3rd Qu.: 14.100
## Max. : 44.80 Max. :36.20 Max. : 27.700
## Public_administration Education Human_health Arts
## Min. :-5.500 Min. :-3.00 Min. :-6.80 Min. :-16.80
## 1st Qu.: 6.275 1st Qu.: 7.30 1st Qu.:13.30 1st Qu.: 10.20
## Median : 9.401 Median :11.27 Median :18.95 Median : 15.25
## Mean : 9.401 Mean :11.27 Mean :18.50 Mean : 17.50
## 3rd Qu.:13.400 3rd Qu.:14.65 3rd Qu.:24.60 3rd Qu.: 20.30
## Max. :23.300 Max. :36.00 Max. :37.60 Max. : 68.60
## Other
## Min. :-11.90
## 1st Qu.: 11.18
## Median : 17.11
## Mean : 17.11
## 3rd Qu.: 23.12
## Max. : 48.10
Based on the summary statistics, the dataset shows substantial gender pay gaps across most sectors in Europe from 2010 to 2021. The Financial sector has the highest median gap (28.10%), while Construction has the lowest mean (-0.69%); its first quartile of -8.40% means that at least a quarter of Construction observations show a gap in favour of women. Variability is considerable: Real estate (-47.90% to 44.80%) and Water supply (-33.20% to 20.90%) range from strongly negative to strongly positive values, suggesting complex dynamics in gender pay disparities. Among the public sector columns, Public administration (mean 9.40%) and Education (mean 11.27%) sit below most private sector industries, but Human health (mean 18.50%) does not. The Arts sector has the highest maximum gap (68.60%), indicating extreme disparities in some contexts. GDP per capita ranges from 5,080 to 84,750 euros, so the countries studied differ widely in economic terms, which may influence pay gap trends. Because missing values were replaced with column means, some medians coincide with the mean (for example, 9.401 for Public administration). Overall, the data indicate that gender pay gaps vary substantially by sector, country, and time, which calls for sector-level analysis.
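Claims such as "highest median" or "lowest mean" are easier to check from a ranked table than from the wide `summary()` printout. The sketch below builds such a table in base R. The data frame `toy` and its values are hypothetical stand-ins for three sector columns, not the real data.

```r
# Hypothetical stand-in for three sector columns (illustrative values only)
toy <- data.frame(
  Financial    = c(25, 28, 31),
  Construction = c(-8, 0.5, 8),
  Education    = c(7, 11, 15)
)

# One row per sector with the statistics discussed in the text
sector_summary <- data.frame(
  Sector = names(toy),
  Median = sapply(toy, median),
  Mean   = sapply(toy, mean),
  Min    = sapply(toy, min),
  Max    = sapply(toy, max),
  row.names = NULL
)

# Rank sectors from the highest to the lowest median gap
sector_summary <- sector_summary[order(-sector_summary$Median), ]
print(sector_summary)
```

Applied to the sector columns of `pay_gap_Europe`, the same steps would give one sortable row per sector.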
# Select only numeric columns
numeric_columns <- pay_gap_Europe %>%
select_if(is.numeric)
# Calculate the correlation matrix
cor_matrix <- cor(numeric_columns, use = "complete.obs")
# Create a correlation plot
corrplot(cor_matrix, method = "circle",
type = "upper",
tl.col = "black",
tl.srt = 45,
tl.cex = 0.45, # Adjust text size
mar = c(0,0,2,0))
#print(cor_matrix)
The correlation matrix suggests several patterns. Countries with higher GDP per capita and a larger urban population share tend to have higher pay gaps. The Industry and Business columns are strongly correlated with many other sector columns, which is expected because both are aggregates that span several of those sectors; this is an association, not evidence that gaps in one sector drive gaps in another. In contrast, Public administration, Education, and Human health have weaker correlations with the rest, suggesting they may be influenced by different factors. Some sectors, such as Mining and Manufacturing, are negatively correlated with GDP, meaning their pay gaps tend to be smaller in wealthier countries. Overall, these findings suggest that the gender pay gap is shaped by different factors across sectors.
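The correlation plot is dense, so it can help to pull out a single row of the matrix and sort it. The sketch below does this for GDP in base R. The data frame `toy` is hypothetical and only illustrates the mechanics.

```r
# Hypothetical country-level values (illustrative only)
toy <- data.frame(
  GDP           = c(10000, 20000, 30000, 40000, 50000),
  Financial     = c(20, 24, 25, 29, 33),
  Manufacturing = c(26, 25, 22, 21, 18)
)

cor_toy <- cor(toy)

# Correlation of each sector with GDP, sorted from most negative to most positive
gdp_cor <- sort(cor_toy["GDP", setdiff(colnames(cor_toy), "GDP")])
print(round(gdp_cor, 2))
```

A sorted vector like `gdp_cor` shows directly which sectors are negatively or positively associated with GDP. It still describes association only.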
# Calculate average pay gaps by sector
sector_gaps <- pay_gap_Europe %>%
  select(-Country, -Year, -GDP, -Urban_population) %>%
  # anonymous function instead of across(..., mean, na.rm = TRUE), deprecated in dplyr 1.1.0
  summarise(across(everything(), \(x) mean(x, na.rm = TRUE))) %>%
  pivot_longer(cols = everything(), names_to = "Sector", values_to = "AvgGap") %>%
  arrange(desc(AvgGap))
# Visualize top and bottom 5 sectors
ggplot(sector_gaps %>% slice(c(1:5, (n()-4):n())), aes(x = reorder(Sector, AvgGap), y = AvgGap)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = "Top 5 and Bottom 5 Sectors by Average Pay Gap", x = "Sector", y = "Average Pay Gap (%)")
The graph shows the top five and bottom five sectors by average pay gap. The Financial sector has the largest average gap, followed by Retail trade, Manufacturing, Information, and Professional and scientific services, indicating significant gender disparities in these industries. At the other end, Construction, Water supply, and Transportation have much smaller average gaps, with the Construction average close to zero. That average hides a wide spread, from -28.3% to 23.5%, so it does not mean pay is equal in every country and year. The results suggest that the financial and retail industries may need more targeted interventions, while the low-gap sectors are worth examining to see whether their results reflect effective equality policies or other factors.
# Categorize sectors
public_sectors <- c("Public_administration", "Education", "Human_health")
private_sectors <- setdiff(names(pay_gap_Europe), c("Country", "Year", "GDP", "Urban_population", public_sectors))
# Compare public vs private
pay_gap_Europe %>%
  pivot_longer(cols = c(all_of(public_sectors), all_of(private_sectors)), names_to = "Sector", values_to = "Gap") %>%
  mutate(SectorType = ifelse(Sector %in% public_sectors, "Public", "Private")) %>%
  group_by(Year, SectorType) %>%
  # .groups = "drop" returns an ungrouped tibble and silences the grouping message
  summarise(AvgGap = mean(Gap, na.rm = TRUE), .groups = "drop") %>%
  ggplot(aes(x = Year, y = AvgGap, color = SectorType)) +
  geom_line() +
  labs(title = "Public vs Private Sector Pay Gap Over Time", y = "Average Pay Gap (%)")
The graph compares the average public and private sector pay gaps over time. Both have declined gradually since 2010: the private sector average falls from around 15% to just above 12% by 2020, and the public sector average from about 14% to about 13%. On these figures the private sector starts higher but narrows faster, so the two series end within roughly one percentage point of each other. The difference between the two sector types is therefore modest compared with the differences between individual sectors.
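The size of each decline can be tabulated rather than read off the plot. The sketch below does this in base R. The data frame `trend` is hypothetical and uses the rounded start and end values quoted in the paragraph above, not the exact plotted values.

```r
# Approximate start and end values quoted in the text (rounded, illustrative)
trend <- data.frame(
  Year       = c(2010, 2020, 2010, 2020),
  SectorType = c("Private", "Private", "Public", "Public"),
  AvgGap     = c(15, 12, 14, 13)
)

# Split into first and last year, then join them side by side per sector type
first <- trend[trend$Year == min(trend$Year), ]
last  <- trend[trend$Year == max(trend$Year), ]
change <- merge(first, last, by = "SectorType", suffixes = c("_start", "_end"))

# Change in percentage points over the period (negative = gap narrowed)
change$Change <- change$AvgGap_end - change$AvgGap_start
print(change[, c("SectorType", "AvgGap_start", "AvgGap_end", "Change")])
```

Running the same steps on the yearly averages computed in the code above would give the exact changes for each sector type.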
# Correlation analysis
cor_matrix <- cor(pay_gap_Europe[, c("GDP", "Urban_population", private_sectors, public_sectors)], use = "complete.obs")
# Scatter plot for GDP vs overall pay gap
pay_gap_Europe %>%
mutate(OverallGap = rowMeans(select(., all_of(c(private_sectors, public_sectors))), na.rm = TRUE)) %>%
ggplot(aes(x = GDP, y = OverallGap)) +
geom_point() +
geom_smooth(method = "lm") +
labs(title = "GDP vs Overall Pay Gap", x = "GDP", y = "Overall Pay Gap (%)")
## `geom_smooth()` using formula = 'y ~ x'
The scatter plot of GDP per capita against the overall pay gap shows a slight positive correlation. Countries with higher GDP per capita tend to have a somewhat wider pay gap, although the points are widely scattered around the fitted line. This suggests that economic development alone does not necessarily lead to greater pay equality, and that more targeted interventions may be necessary to address gender pay disparities in wealthier nations.
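"Slight positive correlation" can be quantified with the slope of the fitted line and the correlation coefficient. The sketch below uses base R on a hypothetical data frame `toy`; its values are illustrative and do not come from the dataset.

```r
# Hypothetical GDP per capita (euros) and overall pay gap (%) for six countries
toy <- data.frame(
  GDP        = c(8000, 15000, 22000, 30000, 41000, 60000),
  OverallGap = c(11.0, 13.5, 12.0, 14.5, 13.0, 15.5)
)

# Same model that geom_smooth(method = "lm") fits: OverallGap ~ GDP
fit <- lm(OverallGap ~ GDP, data = toy)

# Express the slope per 10,000 euros so it is easier to read
slope_per_10k <- unname(coef(fit)["GDP"]) * 10000

# Pearson correlation; its square is the share of variance explained by GDP
r <- cor(toy$GDP, toy$OverallGap)
print(c(slope_per_10k = slope_per_10k, r = r, r_squared = r^2))
```

A low `r_squared` alongside a positive slope would match the pattern described above: an upward tendency with wide variation around it.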
# Overall trend
pay_gap_Europe %>%
pivot_longer(cols = c(all_of(private_sectors), all_of(public_sectors)), names_to = "Sector", values_to = "Gap") %>%
group_by(Year) %>%
summarise(AvgGap = mean(Gap, na.rm = TRUE)) %>%
ggplot(aes(x = Year, y = AvgGap)) +
geom_line() +
labs(title = "Overall Pay Gap Trend Over Time", y = "Average Pay Gap (%)")
The graph depicting the Overall Pay Gap Trend Over Time reveals a gradual decrease in the average gender pay gap across Europe from 2010 to 2020. The gap has reduced from about 15% to approximately 12% over this period. This trend indicates slow but steady progress in reducing gender pay disparities, possibly reflecting the impact of policy measures and societal changes aimed at promoting gender equality in the workplace.
# Boxplot to visualize outliers
pay_gap_Europe %>%
pivot_longer(cols = c(all_of(private_sectors), all_of(public_sectors)), names_to = "Sector", values_to = "Gap") %>%
ggplot(aes(x = Sector, y = Gap)) +
geom_boxplot() +
coord_flip() +
labs(title = "Pay Gap Distribution and Outliers by Sector", y = "Pay Gap (%)")
The boxplot displaying Pay Gap Distribution and Outliers by Sector shows significant variation in pay gaps across different sectors. Financial and Professional_scientific sectors tend to have higher median pay gaps, while sectors like Construction and Transportation show lower gaps. Several sectors, particularly Water Supply and Construction, exhibit notable outliers on both ends of the spectrum, suggesting complex dynamics within these industries that may require further investigation to understand the extreme cases of both high and low (or negative) pay gaps.
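The points that a boxplot draws as outliers follow the usual 1.5 × IQR rule: values more than 1.5 interquartile ranges below the first quartile or above the third. The sketch below applies that rule in base R to a hypothetical vector `gap`, which could stand in for a single sector column.

```r
# Hypothetical pay gap values for one sector (illustrative only)
gap <- c(-28, -8, -2, 0, 1, 3, 5, 8, 9, 40)

# Quartiles and interquartile range
q   <- quantile(gap, c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Fences: values outside these limits are flagged as outliers
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

outliers <- gap[gap < lower | gap > upper]
print(c(lower = lower, upper = upper))
print(outliers)
```

Listing the flagged values together with their Country and Year would show which observations drive the extreme cases.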
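# Sectors with negative correlation to GDP (sector columns only)
gdp_cor <- cor_matrix["GDP", c(private_sectors, public_sectors)]
negative_cor_sectors <- names(gdp_cor)[gdp_cor < 0]
# Trend analysis for these sectors: average across countries per year,
# so each facet draws one line instead of connecting every country's points
pay_gap_Europe %>%
  pivot_longer(cols = all_of(negative_cor_sectors), names_to = "Sector", values_to = "Gap") %>%
  group_by(Year, Sector) %>%
  summarise(Gap = mean(Gap, na.rm = TRUE), .groups = "drop") %>%
  ggplot(aes(x = Year, y = Gap, color = Sector)) +
  geom_line() +
  facet_wrap(~Sector, scales = "free_y") +
  labs(title = "Trends in Sectors with Negative GDP Correlation", y = "Average Pay Gap (%)")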
The faceted line graph shows how the pay gap has changed over time in the sectors that are negatively correlated with GDP, such as Manufacturing, Retail trade, and Water supply. In these sectors the pay gap tends to be lower where GDP per capita is higher, in contrast to the slight positive relationship between GDP and the overall pay gap. This suggests that these sectors may become more equitable in terms of gender pay as economies develop. However, the trends are not uniform across all negatively correlated sectors, indicating that sector-specific factors play a significant role in determining pay gap dynamics.
Persistent Gender Pay Gap: A gender pay gap of around 12.5% remains across Europe as of 2020, despite some progress.
Sector Variations: The highest average gaps are in finance, retail trade, and manufacturing, while construction, water supply, and transportation have the lowest; among public sector activities, public administration and education have smaller gaps than human health.
Economic Growth and Pay Gap: Higher GDP per capita is linked to a slightly wider overall pay gap, though some sectors show smaller gaps in wealthier countries.
Public vs Private Sector: The private sector average starts above the public sector average in 2010; both decline, and the difference between them is small, at about one percentage point.
Urbanization: More urbanized countries tend to have wider pay gaps; because the data are at country level, this does not by itself show urban-rural differences within countries.
Complex Sector Dynamics: Certain sectors show extreme variations in pay gaps, indicating complex internal factors.
Slow Progress: Pay gaps are narrowing, but change is slow, highlighting the need for stronger policies.
Intersectional Factors: Addressing the pay gap requires considering industry norms, policies, and cultural differences across sectors and countries.
Targeted Interventions: Sectors with high pay gaps, such as finance, need focused efforts, while practices from low-gap sectors could be adapted.
Data-Driven Policy: Ongoing data collection is essential to refine policies for achieving gender pay equity across Europe.