Does an Education Imply a Sustainable Income?

An investigation into whether a longer education provides a platform for higher long-term earnings

Zack Seath (s3843672)

Last updated: 28 May, 2023

Introduction

Problem Statement

Data Description

Data Preprocessing

# Import the data
Slid <- read.csv("SLID.csv", header = TRUE)

# Check the data
names(Slid)

# Drop the 1st column as it is just the index
Slid <- Slid[-c(1)]

# Check summary statistics
summary(Slid)

# Remove incomplete observations: na.omit() drops every row with an NA in any column, not only wages
Slid2 <- na.omit(Slid)

# Make sure the two factor variables are factors
Slid2$sex <- as.factor(Slid2$sex)
Slid2$language <- as.factor(Slid2$language)

# Final Check of Summary Statistics
summary(Slid2)
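
The effect of na.omit() in the preprocessing step can be checked with a small sketch. The data frame below is a hypothetical toy example that only borrows the column names of the data set; its values are made up for illustration and are not taken from SLID.csv.

```r
# Hypothetical toy data with the same column names as the data set
toy <- data.frame(
  wages     = c(10.5, NA, 17.2, 8.0, NA),
  education = c(12, 15, NA, 10, 14),
  age       = c(30, 45, 52, 23, 61),
  sex       = c("Male", "Female", "Female", "Male", "Female"),
  language  = c("English", "French", "Other", "English", "English")
)

# Count missing values in each column before dropping anything
na_per_column <- colSums(is.na(toy))

# na.omit() keeps only rows that are complete in every column
toy_clean <- na.omit(toy)

# Number of rows lost to missing values
rows_dropped <- nrow(toy) - nrow(toy_clean)

print(na_per_column)
print(rows_dropped)
```

Running the same two checks on Slid before and after na.omit() would show how many observations were removed and which variables caused the removal.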

Data Preprocessing Explained

Descriptive Statistics

library(vtable)  # provides sumtable()
sumtable(Slid2)
Summary Statistics
Variable      N      Mean   Std. Dev.   Min   Pctl. 25   Pctl. 75   Max
wages         3987   16     7.9         2.3   9.2        20         50
education     3987   13     3           0     12         15         20
age           3987   37     12          16    28         46         69
sex           3987
… Female      2001   50%
… Male        1986   50%
language      3987
… English     3244   81%
… French      259    6%
… Other       484    12%
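
As a quick consistency check, the category percentages in the summary table can be recomputed from the counts it reports. This is a minimal sketch that uses only the numbers printed in the table; the variable names are chosen here for illustration.

```r
# Counts copied from the summary table
n_total         <- 3987
sex_counts      <- c(Female = 2001, Male = 1986)
language_counts <- c(English = 3244, French = 259, Other = 484)

# Each factor's category counts should add up to the total sample size
stopifnot(sum(sex_counts) == n_total, sum(language_counts) == n_total)

# Percentages rounded to whole numbers, as shown in the table
sex_pct      <- round(100 * sex_counts / n_total)
language_pct <- round(100 * language_counts / n_total)

print(sex_pct)
print(language_pct)
```

The rounded language percentages add up to 99 rather than 100, which is a rounding effect and not a sign of missing observations.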

Descriptive Statistics Explained

Wages and Education Visualisation

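# Scatter plot of wages against education with the least-squares line overlaid
# (plot() and abline() draw directly and return nothing useful, so no assignment is needed)
plot(Slid2$education, Slid2$wages, xlab = "Education", ylab = "Wages")
abline(lm(wages ~ education, data = Slid2))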

Visualisation Explained

Hypothesis Testing

\[H_0: \rho = 0\]

\[H_A: \rho \ne 0\]

Hypothesis Testing Continued

# Fit a simple linear regression of wages on education and test the slope
hyptest <- lm(Slid2$wages ~ Slid2$education, data = Slid2)
summary(hyptest)
## 
## Call:
## lm(formula = Slid2$wages ~ Slid2$education, data = Slid2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.688  -5.822  -1.039   4.148  34.190 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      4.97169    0.53429   9.305   <2e-16 ***
## Slid2$education  0.79231    0.03906  20.284   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.492 on 3985 degrees of freedom
## Multiple R-squared:  0.09359,    Adjusted R-squared:  0.09336 
## F-statistic: 411.4 on 1 and 3985 DF,  p-value: < 2.2e-16
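
The figures in this output are linked by simple arithmetic, which the sketch below reproduces from the printed (rounded) values alone. It assumes the standard identities for a regression with a single predictor: the t value is the estimate divided by its standard error, the F-statistic is the square of that t value, and the same t value can be recovered from the sample correlation. The variable names are illustrative.

```r
# Values copied from the printed summary() output
estimate  <- 0.79231   # slope for education
std_error <- 0.03906   # standard error of the slope
r_squared <- 0.09359   # Multiple R-squared
resid_df  <- 3985      # residual degrees of freedom (n - 2)

# t value: estimate divided by its standard error
t_value <- estimate / std_error

# With one predictor, the F-statistic equals the squared t value
f_stat <- t_value^2

# Sample correlation implied by R-squared (positive, since the slope is positive)
r <- sqrt(r_squared)

# The same t value recovered from the correlation alone
t_from_r <- r * sqrt(resid_df) / sqrt(1 - r_squared)

print(round(c(t_value = t_value, f_stat = f_stat, r = r, t_from_r = t_from_r), 3))
```

Because the inputs are rounded, the results agree with the printed t value and F-statistic only up to small rounding differences. The implied correlation of roughly 0.31 is another way of reading the R-squared: education accounts for about 9% of the variation in wages.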

Hypothesis Testing Explained

Discussion - Limitations

Discussion - Findings and Conclusion

References

Singh, U. (2023) Survey of Labour and Income Dynamics. Kaggle. Viewed 27 May 2023, https://www.kaggle.com/datasets/utkarshx27/survey-of-labour-and-income-dynamics?resource=download

Fox, J. (2016) Applied Regression Analysis and Generalized Linear Models, Third Edition. Sage.

Fox, J. and Weisberg, S. (2019) An R Companion to Applied Regression, Third Edition. Sage.