Relationship Between Health and Study

Exploration of the Study and Work Report

Sheryl Maher (S3869791) and Shaun Sturgess (S3869765)

Last updated: 23 October, 2020

Introduction

The Australian Bureau of Statistics has recently released the Qualifications and Work Survey 2018-19 (Qualifications and Work, 2018-19 | Australian Bureau of Statistics, 2020). This survey explores, at a population level, the relationship between people's work and their education level. As part of the survey, participants are asked to evaluate their own health and report on their academic qualifications.

This presentation explores the relationship between health and highest attained qualification amongst the Australian population.

Problem Statement

\(H_0\): There is no association in the population between health and highest qualification level.

\(H_A\): There is an association in the population between health and highest qualification level.

The association between health and highest qualification level for the Australian working population will be determined using a Chi-square test of association.
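For reference, the Pearson chi-square statistic compares each observed count \(O_{ij}\) in an \(r \times c\) table with the count \(E_{ij}\) expected if the two variables were independent. Here \(R_i\) and \(C_j\) are the row and column totals and \(n\) is the grand total:

```latex
E_{ij} = \frac{R_i \, C_j}{n}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
df = (r - 1)(c - 1)
```

With \(r = 4\) qualification levels and \(c = 2\) health categories, \(df = (4 - 1)(2 - 1) = 3\).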

Data

The Qualifications and Work Survey 2018-19, conducted by the Australian Bureau of Statistics, provides suitable data for this investigation.

As part of the Qualifications and Work Survey 2018-19 (Qualifications and Work, 2018-19 | Australian Bureau of Statistics, 2020), respondents are asked to rate their overall health and to answer a series of questions about their qualifications. This data set therefore allows the stated problem to be explored using statistical methods.

Variables

Since the data is already aggregated in the data source, 'Health' is imported as one column per factor level (Good Health and Poor Health), with highest qualification as the rows. Values represent the estimated total number of Australians in each category, in 1000s.

Load and Preprocess

Load the data file, assigning Excel-style letters (A to G) to the dataframe columns for easy reference, then subset to the required rows and columns.

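library(readxl)  # read_excel()

# Read cells A13:G44 of Table 9; because col_names is supplied, row 13 is read as data
health_data <- read_excel("Qualifications and work 2018-19 Data Tables.xlsx",
                          range = "Table 9!A13:G44",
                          col_names = LETTERS[1:7])
# Keep the qualification rows of interest: labels (column A) and the two health columns (F, G)
health_data <- health_data[c(1:3, 28:30, 32), c("A", "F", "G")]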
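Because `col_names` is given as a character vector, `read_excel()` reads the first row of the range (spreadsheet row 13) as data rather than as a header. Row `i` of the tibble therefore comes from spreadsheet row `i + 12`. The helper `excel_row()` below is hypothetical and is not part of the original workflow. It is a small sketch that makes that offset explicit:

```r
# Hypothetical helper (not part of readxl): map a tibble row index back to the
# spreadsheet row it was read from. The range "Table 9!A13:G44" starts at row 13.
excel_row <- function(i, first_row = 13) {
  i + first_row - 1
}

# Row indices kept by the subsetting step above
kept_rows <- c(1:3, 28:30, 32)
excel_row(kept_rows)
# [1] 13 14 15 40 41 42 44
```

The subset therefore keeps spreadsheet rows 13 to 15, 40 to 42 and 44, which can be checked against Table 9 in the workbook.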

Sum the values for each qualification level and health status, then order the qualification levels from lowest to highest.

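library(dplyr)     # group_by(), summarise(), arrange()
library(magrittr)  # %<>% assignment pipe

# Sum rows sharing a qualification label (column A); G holds poor health, F good health
health_data %<>%
  group_by(Qualification = A) %>% 
  summarise(`Poor Health` = sum(G),
            `Good Health` = sum(F))
# Declare the qualification levels in order from lowest to highest
health_data$Qualification %<>% factor(
  levels = c("No non-school qualification", 
             "Below bachelor degree", 
             "Bachelor degree", 
             "Above bachelor degree"),
  ordered = TRUE)
health_data %<>% arrange(Qualification)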
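`arrange()` on an ordered factor sorts by the declared levels, not alphabetically. The base-R sketch below illustrates that behaviour using only the four qualification labels from the code above and no data:

```r
# Qualification levels from lowest to highest, as in the factor() call above
qual_levels <- c("No non-school qualification",
                 "Below bachelor degree",
                 "Bachelor degree",
                 "Above bachelor degree")

# Start from the labels in plain alphabetical order
alpha_labels <- sort(qual_levels)
alpha_labels[1]              # "Above bachelor degree" sorts first alphabetically

# As an ordered factor, sorting follows the declared level order instead
qual <- factor(alpha_labels, levels = qual_levels, ordered = TRUE)
as.character(sort(qual))     # "No non-school qualification" now comes first
as.character(max(qual))      # "Above bachelor degree" is the highest level
```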

Load and Preprocess Cont.

Since the data is already aggregated into a frequency table, use the qualification levels as row names and convert the data to a numeric matrix, which prop.table() and chisq.test() handle directly.

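library(tibble)  # column_to_rownames()

# Move the qualification labels into row names, leaving a numeric 4 x 2 table
health_data %<>% column_to_rownames(var = "Qualification")
health_data %<>% as.matrix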

View the preprocessed data, which is ready for analysis.

library(knitr)       # kable()
library(kableExtra)  # kable_paper()

health_data %>%
  kable(caption = "Highest qualification and health status (1000s of people)") %>%
  kable_paper("striped", full_width = FALSE)
Highest qualification and health status (1000s of people)

                              Poor Health  Good Health
No non-school qualification         257.7       3201.4
Below bachelor degree               261.2       3620.0
Bachelor degree                     112.6       2398.7
Above bachelor degree                50.7       1481.5
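Readers without the Excel file can reproduce the rest of the analysis from the table above. The sketch below builds the same 4 x 2 matrix directly from the displayed values (in 1000s) using base R only. The name `health_counts` is hypothetical and avoids clashing with `health_data`. If the spreadsheet stores more decimal places than are displayed, results computed from these transcribed values may differ slightly from the output shown in this report.

```r
# Observed estimates (1000s of people), transcribed from the table above.
# health_counts is a hypothetical name used only in this sketch.
health_counts <- matrix(
  c(257.7, 3201.4,
    261.2, 3620.0,
    112.6, 2398.7,
     50.7, 1481.5),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("No non-school qualification", "Below bachelor degree",
      "Bachelor degree", "Above bachelor degree"),
    c("Poor Health", "Good Health")
  )
)

dim(health_counts)       # 4 rows (qualification) x 2 columns (health status)
colSums(health_counts)   # Poor Health 682.2, Good Health 10701.6
sum(health_counts)       # 11383.8 thousand people in total
```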

Descriptive Statistics and Visualisation

Calculate, within each health status, the proportion of people at each qualification level, so that each column sums to 1.

# Column proportions (margin = 2): each health-status column sums to 1
health_prop <- health_data %>% prop.table(margin = 2)
health_prop %>% round(3) %>%
  kable(caption = "Highest qualification and health status (proportion)") %>%
  kable_paper("striped", full_width = FALSE)
Highest qualification and health status (proportion)

                              Poor Health  Good Health
No non-school qualification         0.378        0.299
Below bachelor degree               0.383        0.338
Bachelor degree                     0.165        0.224
Above bachelor degree               0.074        0.138
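`prop.table(margin = 2)` divides each count by its column total. The base-R sketch below performs the same calculation with `sweep()` on the counts transcribed from the preprocessed table and checks that the two agree:

```r
# Observed estimates (1000s of people) from the preprocessed table
health_counts <- matrix(
  c(257.7, 3201.4, 261.2, 3620.0, 112.6, 2398.7, 50.7, 1481.5),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("No non-school qualification", "Below bachelor degree",
      "Bachelor degree", "Above bachelor degree"),
    c("Poor Health", "Good Health")
  )
)

# Divide every count by its column (health status) total; MARGIN = 2 means columns
by_hand <- sweep(health_counts, MARGIN = 2, STATS = colSums(health_counts), FUN = "/")

all.equal(by_hand, prop.table(health_counts, margin = 2))  # TRUE
colSums(by_hand)                                           # each column sums to 1
round(by_hand, 3)
```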

Descriptive Statistics Cont.

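A bar chart of the proportion of people at each qualification level, within each health status, suggests a relationship between these variables. People in poor health are more often found at the two lower qualification levels, while people in good health are more often found at bachelor level and above. (Graph on following slide)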

library(RColorBrewer)  # brewer.pal()

barplot(health_prop,
        ylab = "Proportion Within Health Status",
        ylim = c(0, 1.0),
        legend.text = rownames(health_prop),  # one legend entry per qualification level
        beside = TRUE,
        args.legend = list(title = "Highest Qualification"),
        col = brewer.pal(nrow(health_prop), name = "Blues"),
        main = "Australians in Good Health Tend to Hold Higher Qualifications",
        sub = "ABS Qualifications and Work 2018-19")

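Descriptive Statistics Cont.

[Figure: grouped bar chart of the proportion within each health status (Poor Health, Good Health) by highest qualification; ABS Qualifications and Work 2018-19]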

Hypothesis Testing

The chi-square test of association will be used to explore the relationship between health and highest qualification.
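The chi-square test assumes that no more than 25% of cells have expected counts below 5 and that all expected counts are at least 1. This holds for this data: every expected count is well above 5, and the smallest is 91.8 (thousand).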
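The assumption can be checked directly from the expected counts \(E_{ij} = R_i C_j / n\). The base-R sketch below uses the observed values transcribed from the preprocessed table:

```r
health_counts <- matrix(
  c(257.7, 3201.4, 261.2, 3620.0, 112.6, 2398.7, 50.7, 1481.5),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("No non-school qualification", "Below bachelor degree",
      "Bachelor degree", "Above bachelor degree"),
    c("Poor Health", "Good Health")
  )
)

# Expected counts under H0: (row total x column total) / grand total
expected <- outer(rowSums(health_counts), colSums(health_counts)) / sum(health_counts)

share_below_5 <- mean(expected < 5)  # proportion of cells with expected count below 5
smallest <- min(expected)            # smallest expected count (1000s)

# Rule as stated above: at most 25% of cells below 5 and none below 1
assumption_met <- share_below_5 <= 0.25 && all(expected >= 1)
assumption_met
```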

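Calculate the critical value at the 0.05 level of significance, with \((r - 1)(c - 1) = (4 - 1)(2 - 1) = 3\) degrees of freedom:

# df = (rows - 1) * (columns - 1)
(critical_value <- qchisq(p = 0.95, df = prod(dim(health_data) - 1)))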
## [1] 7.814728

Perform a chi-square test of association:

(chi <- chisq.test(health_data))
## 
##  Pearson's Chi-squared test
## 
## data:  health_data
## X-squared = 46.521, df = 3, p-value = 4.394e-10
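As a cross-check, the statistic and \(p\)-value can be computed from \(\chi^2 = \sum (O - E)^2 / E\) and compared with `chisq.test()`. This is a base-R sketch that again uses the observed values transcribed from the preprocessed table:

```r
health_counts <- matrix(
  c(257.7, 3201.4, 261.2, 3620.0, 112.6, 2398.7, 50.7, 1481.5),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("No non-school qualification", "Below bachelor degree",
      "Bachelor degree", "Above bachelor degree"),
    c("Poor Health", "Good Health")
  )
)

expected <- outer(rowSums(health_counts), colSums(health_counts)) / sum(health_counts)

# Pearson statistic: sum over all cells of (O - E)^2 / E
chi_sq <- sum((health_counts - expected)^2 / expected)
dof <- (nrow(health_counts) - 1) * (ncol(health_counts) - 1)   # 3
p_value <- pchisq(chi_sq, df = dof, lower.tail = FALSE)        # upper-tail probability

# chisq.test() only applies a continuity correction to 2 x 2 tables,
# so for this 4 x 2 table the manual and built-in results should agree
builtin <- chisq.test(health_counts)
all.equal(chi_sq, unname(builtin$statistic))
all.equal(p_value, builtin$p.value)
```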

Hypothesis Testing Cont.

Check the observed and expected values:

results_combined <- cbind(chi$observed, round(chi$expected,1))
results_combined %>% kable %>% kable_paper("striped", full_width = FALSE) %>%
  add_header_above(c(" " = 1, "Observed (1000s)" = 2, "Expected (1000s)" = 2))
                                      Observed (1000s)          Expected (1000s)
                              Poor Health  Good Health  Poor Health  Good Health
No non-school qualification         257.7       3201.4        207.3       3251.8
Below bachelor degree               261.2       3620.0        232.6       3648.6
Bachelor degree                     112.6       2398.7        150.5       2360.8
Above bachelor degree                50.7       1481.5         91.8       1440.4
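Comparing observed and expected values cell by cell shows where the association comes from. The base-R sketch below illustrates one common summary, each cell's contribution \((O - E)^2 / E\) to the total \(\chi^2\). It also shows the signed Pearson residuals \((O - E)/\sqrt{E}\), which `chisq.test()` returns as `residuals` and which indicate the direction of each difference.

```r
health_counts <- matrix(
  c(257.7, 3201.4, 261.2, 3620.0, 112.6, 2398.7, 50.7, 1481.5),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("No non-school qualification", "Below bachelor degree",
      "Bachelor degree", "Above bachelor degree"),
    c("Poor Health", "Good Health")
  )
)

chi_result <- chisq.test(health_counts)

# Each cell's contribution to the chi-square statistic: (O - E)^2 / E
contributions <- (chi_result$observed - chi_result$expected)^2 / chi_result$expected
round(contributions, 2)

# The contributions add up to the reported statistic
all.equal(sum(contributions), unname(chi_result$statistic))

# Signed Pearson residuals: positive means more people than expected under H0
round(chi_result$residuals, 2)

# Cell with the largest contribution (row = qualification, col = health status)
which(contributions == max(contributions), arr.ind = TRUE)
```

With these values, the largest contribution comes from people in poor health with an above-bachelor qualification, where the observed 50.7 thousand is well below the expected 91.8 thousand.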

Hypothesis Testing Cont.

Key values from the chi-square test of association are \(\chi^2 = 46.521\), \(df = 3\), critical value \(= 7.815\) and \(p = 4.394 \times 10^{-10}\).

As \(\chi^2\) is larger than the critical value, and the \(p\)-value is less than the 0.05 level of significance, \(H_0\) was rejected. There was a statistically significant association between health and highest qualification.
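The decision rule above can be written out explicitly. This sketch uses only the values reported by `chisq.test()` (\(\chi^2 = 46.521\), \(df = 3\)) and the 0.05 level of significance. The two criteria always agree, because \(\chi^2\) exceeds the critical value exactly when the upper-tail probability falls below \(\alpha\).

```r
alpha  <- 0.05     # level of significance
chi_sq <- 46.521   # X-squared reported by chisq.test()
dof    <- 3        # (4 - 1) * (2 - 1)

critical_value <- qchisq(1 - alpha, df = dof)            # 7.814728
p_value <- pchisq(chi_sq, df = dof, lower.tail = FALSE)  # roughly 4.4e-10

# Critical-value rule and p-value rule lead to the same decision
reject_by_critical <- chi_sq > critical_value
reject_by_p <- p_value < alpha
c(critical_value_rule = reject_by_critical, p_value_rule = reject_by_p)  # both TRUE
```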

Discussion

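Australians who report poor health tend to hold lower-level qualifications than those who report good health. Among people in poor health, 76% have no non-school qualification or a qualification below bachelor level, compared with 64% of people in good health.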

The chi-square test of association explored the relationship between self-reported health status and highest qualification. The result of the test was statistically significant (\(p\) < 0.05), so the null hypothesis was rejected. The results of this analysis suggest there is an association in the population between health and highest qualification.

Future investigations could use a different section of the Qualifications and Work 2018-19 data source to investigate whether Australians’ health and qualification level are associated with how relevant their qualification is to their current job.

References

Qualifications and work, 2018-19 | Australian Bureau of Statistics. (2020, September 29). https://www.abs.gov.au/statistics/people/education/qualifications-and-work/latest-release