Labour in Ethiopia

Author

Sarah Abdela

Image Source: Image by Peggy_Marco from Pixabay (used under Pixabay Content License)

https://ilostat.ilo.org/

https://rshiny.ilo.org/dataexplorer54/?lang=en&segment=indicator&id=EAP_TEAP_SEX_AGE_GEO_NB_A&channel=ilostat


Introduction

The dataset I am working with focuses on the labour market in Ethiopia. It includes 12 variables: ten character columns (such as gender and reference area), one quantitative variable representing labour force values, and a time variable. For this analysis, I selected the most relevant variables and renamed them to simpler, more readable terms. I also filtered the dataset to include only observations for Ethiopia and removed the aggregated "Total" gender category so that male and female values can be compared directly. Since the Ethiopian observations all come from 2021, the study examines differences in labour force values across groups rather than changes over time. I was also interested in whether the COVID-19 pandemic may have affected the labour market during this period. This topic is personally meaningful to me: I plan to return to Ethiopia after completing my studies, and I would like to better understand the labour market conditions that may influence my future career opportunities.

library(tidyverse)

Warning: package 'tidyverse' was built under R version 4.5.3

Warning: package 'readr' was built under R version 4.5.3

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(readr)
library(shiny)

Warning: package 'shiny' was built under R version 4.5.3

library(plotly)


Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout

ethiopia <- readr::read_csv("C:/Users/ss671/OneDrive/Documents/ethiopia_labour_fixed.csv")

Rows: 22036 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (10): ref_area.label, source.label, indicator.label, sex.label, classif1...
dbl  (2): time, obs_value

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(ethiopia)

# A tibble: 6 × 12
  ref_area.label source.label           indicator.label sex.label classif1.label
  <chr>          <chr>                  <chr>           <chr>     <chr>         
1 Afghanistan    LFS - Labour Force Su… Labour force b… Total     Age (Youth, a…
2 Afghanistan    LFS - Labour Force Su… Labour force b… Total     Age (Youth, a…
3 Afghanistan    LFS - Labour Force Su… Labour force b… Total     Age (Youth, a…
4 Afghanistan    LFS - Labour Force Su… Labour force b… Total     Age (Youth, a…
5 Afghanistan    LFS - Labour Force Su… Labour force b… Total     Age (Youth, a…
6 Afghanistan    LFS - Labour Force Su… Labour force b… Total     Age (Youth, a…
# ℹ 7 more variables: classif2.label <chr>, time <dbl>, obs_value <dbl>,
#   obs_status.label <chr>, note_classif.label <chr>,
#   note_indicator.label <chr>, note_source.label <chr>

The first rows belong to Afghanistan, which shows that the dataset contains labour-related information for many countries and needs to be filtered down to Ethiopia.

This project compares labour force values in Ethiopia in 2021 across different groups, such as gender. It helps show whether there are differences between males and females. The dataset covers multiple countries, so I used the filter() function to keep only Ethiopia. I also renamed the variables to simpler names so they are easier to use in the analysis. Note that the area variable comes from ref_area.label, so it holds the country name; a comparison between urban and rural areas would need a separate classification column.

ethiopia_clean <- ethiopia %>%
  select(time, obs_value, sex.label, ref_area.label) %>%
  rename(
    year = time,
    labour_force = obs_value,
    gender = sex.label,
    area = ref_area.label
  ) %>%
  filter(area == "Ethiopia", gender != "Total") %>%
  filter(!is.na(labour_force))

head(ethiopia_clean)

# A tibble: 6 × 4
   year labour_force gender area    
  <dbl>        <dbl> <chr>  <chr>   
1  2021       22064. Male   Ethiopia
2  2021       17137. Male   Ethiopia
3  2021        4927. Male   Ethiopia
4  2021       20729. Male   Ethiopia
5  2021       15934. Male   Ethiopia
6  2021        4795. Male   Ethiopia
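
The first rows printed above hint that some aggregate rows may remain after the gender filter: 17137 + 4927 equals 22064, and 15934 + 4795 equals 20729, which is what a total followed by its two components would look like. Below is a minimal sketch of how such rows could be checked and dropped. The toy data frame, the classif2.label strings, and the values are hypothetical and are not taken from the ILOSTAT file.

```r
library(dplyr)

# Hypothetical miniature of the ILOSTAT layout; label strings are assumptions.
toy <- data.frame(
  sex.label      = rep(c("Male", "Female"), each = 3),
  classif2.label = rep(c("Area type: National",
                         "Area type: Rural",
                         "Area type: Urban"), times = 2),
  obs_value      = c(100, 70, 30, 90, 65, 25)
)

# Count rows per level to spot aggregates that would double count people.
toy %>% count(classif2.label)

# Keep only the disaggregated rows (drop the assumed national aggregate).
toy_disagg <- toy %>%
  filter(!grepl("National", classif2.label))

toy_disagg
```

Running count() on the real classification columns before modelling would show whether any aggregate levels are still present.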

# "Total" was already removed in ethiopia_clean, so only the year filter is needed
ethiopia_filtered <- ethiopia_clean %>%
  filter(year >= 2000)

head(ethiopia_filtered)

# A tibble: 6 × 4
   year labour_force gender area    
  <dbl>        <dbl> <chr>  <chr>   
1  2021       22064. Male   Ethiopia
2  2021       17137. Male   Ethiopia
3  2021        4927. Male   Ethiopia
4  2021       20729. Male   Ethiopia
5  2021       15934. Male   Ethiopia
6  2021        4795. Male   Ethiopia

summary(ethiopia_filtered$labour_force)

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
   47.93  1095.48  3665.20  5585.09  6478.01 22064.31

Exploration

Labour force by area

ggplot(ethiopia_filtered, aes(x = area, y = labour_force, fill = area)) +
  geom_boxplot() +
  labs(
    title = "Labour Force Distribution by Area",
    x = "Area",
    y = "Labour Force"
  )

Main plot

ggplot(ethiopia_filtered, aes(x = gender, y = labour_force, fill = gender)) +
  geom_boxplot() +
  labs(
    title = "Labour Force Distribution by Gender in Ethiopia",
    x = "Gender",
    y = "Labour Force",
    fill = "Gender",
    caption = "Source: Ethiopia Labour Dataset"
  ) +
  scale_fill_brewer(palette = "Set1") +
  theme_minimal()

The boxplot shows the distribution of labour force values for males and females in Ethiopia for the year 2021. Each box represents the middle 50% of the data, and the line inside the box shows the median. Any points drawn beyond the whiskers are outliers: observations more than 1.5 times the interquartile range away from the box. The labour force values for both genders have a wide spread, meaning there is a lot of variation within each group. The median for males appears slightly higher than for females, but the two boxes overlap considerably, which suggests that the difference between males and females is not very strong. Overall, the boxplot shows both the variation within each group and the comparison between genders in a clear way.
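
The quantities a boxplot draws can also be computed directly. The sketch below uses a small hypothetical data frame (not the Ethiopia data) to calculate the quartiles, the interquartile range, and the number of points outside the 1.5 × IQR fences for each group.

```r
library(dplyr)

# Hypothetical values chosen so that one group has a single outlier.
toy <- data.frame(
  gender       = rep(c("Female", "Male"), each = 5),
  labour_force = c(10, 20, 30, 40, 200, 15, 25, 35, 45, 55)
)

box_stats <- toy %>%
  group_by(gender) %>%
  summarise(
    q1  = unname(quantile(labour_force, 0.25)),  # lower edge of the box
    med = median(labour_force),                  # line inside the box
    q3  = unname(quantile(labour_force, 0.75)),  # upper edge of the box
    iqr = q3 - q1,                               # height of the box
    # Points beyond 1.5 * IQR from the box are drawn individually.
    n_outliers = sum(labour_force > q3 + 1.5 * iqr |
                     labour_force < q1 - 1.5 * iqr)
  )

box_stats
```

Applying the same summarise() call to ethiopia_filtered would give numbers to quote alongside the plot instead of reading them off the graph.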

model <- lm(labour_force ~ gender, data = ethiopia_filtered)
summary(model)


Call:
lm(formula = labour_force ~ gender, data = ethiopia_filtered)

Residuals:
    Min      1Q  Median      3Q     Max 
-6247.4 -4325.8 -1892.7   948.7 15684.5 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   4790.4      873.8   5.482 3.52e-07 ***
genderMale    1589.4     1235.8   1.286    0.202    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6054 on 94 degrees of freedom
Multiple R-squared:  0.01729,   Adjusted R-squared:  0.006838 
F-statistic: 1.654 on 1 and 94 DF,  p-value: 0.2016

plot(model)

The regression model tests whether the mean labour force value differs by gender in Ethiopia for the year 2021. The intercept (about 4790) is the mean for females, and the coefficient for males is about 1589, so the mean for males is about 6380. However, the p-value for gender is 0.202, which is greater than 0.05, so the difference is not statistically significant at the 5% level and could be due to chance. The R² value is very low (about 0.017), which shows that the model explains only about 1.7% of the variation in labour force, so it is a weak model. Overall, this suggests that in 2021, gender alone does not strongly explain differences in labour force values in Ethiopia.
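
A regression with a single two-level factor is the same comparison as a two-sample t-test with equal variances: the intercept is the mean of the reference group, and the coefficient is the difference between the two group means. The sketch below illustrates this on hypothetical numbers, not on the Ethiopia data.

```r
# Hypothetical values: Female mean is 40, Male mean is 70.
toy <- data.frame(
  gender       = rep(c("Female", "Male"), each = 4),
  labour_force = c(10, 30, 50, 70, 20, 40, 90, 130)
)

fit <- lm(labour_force ~ gender, data = toy)

# Group means computed directly for comparison with the coefficients.
group_means <- tapply(toy$labour_force, toy$gender, mean)
mean_diff   <- unname(group_means["Male"] - group_means["Female"])

# The slope equals the difference in means; the intercept is the Female mean.
coef(fit)

# The pooled-variance t-test gives the same p-value as the regression slope.
tt   <- t.test(labour_force ~ gender, data = toy, var.equal = TRUE)
p_lm <- summary(fit)$coefficients["genderMale", "Pr(>|t|)"]
c(lm = p_lm, t_test = tt$p.value)
```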

The Shiny app was not working at first, so I used the exists() function to check that ethiopia_filtered was present in memory.

exists("ethiopia_filtered")

[1] TRUE

The diagnostic plots help evaluate whether the regression model is appropriate. Because gender is the only predictor, the model produces just two fitted values (one per gender), so the Residuals vs Fitted plot shows the residuals in two groups; the wide spread within each group suggests the model does not capture much of the variation in the data. The Q-Q plot shows that the residuals do not follow the straight reference line closely, which suggests they are not normally distributed. This is consistent with the residual summary above, where the median is about -1893 but the maximum is about 15685, a sign of right skew. The Scale-Location plot indicates that the spread of residuals is not constant, suggesting unequal variance. Finally, the Residuals vs Factor Levels plot shows the variation within each gender group. Overall, these plots suggest that the model is not a very strong fit and that gender alone does not fully explain labour force differences in Ethiopia.
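
The visual checks can be backed up with simple numbers. The sketch below is one possible approach, using simulated right-skewed data rather than the Ethiopia dataset: a Shapiro-Wilk test on the residuals as a numerical companion to the Q-Q plot, and the residual variance in each group as a rough companion to the Scale-Location plot. The simulated values and the choice of these two checks are assumptions made for illustration.

```r
set.seed(1)

# Simulated, right-skewed values standing in for labour force counts.
toy <- data.frame(
  gender       = rep(c("Female", "Male"), each = 48),
  labour_force = rlnorm(96, meanlog = 8, sdlog = 1)
)

fit <- lm(labour_force ~ gender, data = toy)
res <- residuals(fit)

# Shapiro-Wilk test: a small p-value is evidence against normal residuals.
sw <- shapiro.test(res)
sw$p.value

# Residual variance per group: very different values suggest unequal variance.
var_by_group <- tapply(res, toy$gender, var)
var_by_group
```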

library(shiny)

# UI
ui <- fluidPage(
  
  titlePanel("Ethiopia Labour Force"),
  
  sidebarLayout(
    
    sidebarPanel(
      
      selectInput(
        inputId = "gender",
        label = "Select Gender:",
        choices = unique(ethiopia_filtered$gender)
      )
      
    ),
    
    mainPanel(
      plotOutput(outputId = "distPlot")
    )
    
  )
)

# Server
server <- function(input, output) {
  
  output$distPlot <- renderPlot({
    
    selected_data <- ethiopia_filtered %>%
      filter(gender == input$gender)
    
    ggplot(selected_data, aes(x = gender, y = labour_force, fill = gender)) +
      geom_boxplot() +
      labs(
        title = "Labour Force Distribution by Gender in Ethiopia",
        x = "Gender",
        y = "Labour Force"
      )
    
  })
}

# Run app
shinyApp(ui = ui, server = server)

Shiny applications not supported in static R Markdown documents

R Shiny is used in this project to make the analysis interactive instead of static. In this case, the Shiny app allows the user to select a gender and view the corresponding labour force distribution for Ethiopia. This makes it easier to focus on one group at a time and better understand the differences in labour force participation. Instead of just looking at one fixed graph, the user can interact with the data, which makes the results clearer and more engaging.
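
The filtering step inside the server can be tried outside Shiny, which is a convenient way to debug when the app does not run. The sketch below wraps that step in a hypothetical helper, filter_by_gender(), and applies it to a small made-up data frame; neither is part of the app above.

```r
# Hypothetical helper: the same row selection the server performs,
# written as a plain function so it can be run without launching the app.
filter_by_gender <- function(data, selected) {
  data[data$gender == selected, , drop = FALSE]
}

# Made-up rows standing in for ethiopia_filtered.
toy <- data.frame(
  gender       = c("Female", "Male", "Male"),
  labour_force = c(10, 20, 30)
)

male_rows <- filter_by_gender(toy, "Male")
male_rows
```

If the helper returns the expected rows, any remaining problem is more likely in the UI or rendering code than in the data.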

cor(ethiopia_clean$year, ethiopia_clean$labour_force)

Warning in cor(ethiopia_clean$year, ethiopia_clean$labour_force): the standard
deviation is zero

[1] NA

I attempted to use correlation to examine the relationship between variables in the dataset, specifically between year and labour force. However, since my dataset only includes data from one year (2021), there is no variation in the time variable. As a result, the standard deviation is zero, and the correlation cannot be calculated, which is why it returns an undefined value (NA). This showed me that correlation is not meaningful in this case, so I focused my analysis on comparing labour force differences across groups such as gender instead.
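
The reason for the NA is visible in the formula for the correlation coefficient, r = cov(x, y) / (sd(x) × sd(y)): when one variable is constant, its standard deviation is zero and the division is undefined. The sketch below reproduces this with made-up numbers and adds a simple check that could be run before calling cor().

```r
# Made-up data: a constant year, like a dataset covering only 2021.
year         <- rep(2021, 5)
labour_force <- c(10, 20, 30, 40, 50)

# The standard deviation of a constant vector is zero.
sd_year <- sd(year)

# cor() returns NA (with a warning) when either standard deviation is zero.
r <- suppressWarnings(cor(year, labour_force))

# A simple guard: only compute the correlation when both variables vary.
has_variation <- sd_year > 0 && sd(labour_force) > 0
if (!has_variation) {
  message("Correlation is undefined: at least one variable is constant.")
}
```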

Problems I faced

During this project, I faced some challenges while working with the data and visualizations. Initially, I attempted to create a line graph to show trends over time, but the graph appeared as a straight vertical line. After investigating the issue, I realized that the dataset only contained data for one year (2021), which made it impossible to analyze changes over time. Because of this limitation, I adjusted my approach and focused on comparing labour force values across categories instead of trends. I changed my visualization to a boxplot, which compares the genders at a single point in time (2021) and gives a more accurate and meaningful representation of the data.

Reflection

I cleaned the data by first loading the dataset using read_csv(). Then, I selected only the relevant variables such as time, labour force values, gender, and area. I renamed these variables to simpler names to make them easier to work with in R. After that, I filtered the dataset to include only observations for Ethiopia and removed the “Total” category so that I could focus on comparing groups more clearly. This made the dataset cleaner and ready for analysis.

For the visualization, I used a boxplot where each box represents the distribution of labour force values for each gender. The colors represent different gender groups, making it easier to compare them. Since the dataset only included data for the year 2021, I was not able to analyze trends over time, so I focused on comparing differences between groups instead.

The regression results examine the relationship between gender and labour force. The model shows that males have a higher average labour force value than females; however, the p-value is greater than 0.05, which means this difference is not statistically significant. The adjusted R-squared value is very low, indicating that gender explains only a small portion of the variation in labour force. Overall, this suggests that gender alone does not strongly explain differences in labour force in Ethiopia for 2021.

One limitation of this analysis is that the dataset only includes one year (2021), which prevents analyzing trends over time. This limits the ability to fully understand how the labour market has changed. In future work, I would like to include more detailed variables, such as age groups or employment sectors, to better understand how different factors affect labour force participation. This would provide deeper insight into the labour market and help explain the variation more clearly.

I used AI as a learning tool to better understand key concepts such as p-values, R-squared, and regression, which helped me interpret my results more clearly and confidently. It provided simple explanations and examples that made complex statistical ideas easier to understand. I was able to connect these concepts directly to my own dataset and results. This helped me explain what the regression output means instead of just reporting numbers. Overall, it improved my understanding of the analysis and made my interpretations more accurate.