Hypothesis Testing

Effect of gender on unemployment rate in Australia

Vishwa Gandhi (3714805), Vikas Virani (3715555), Jigar Mangukiya (3715807)

Last updated: 28 October, 2018

Introduction

Problem Statement

Data

  1. Unemployment rate - female - 25-54
  2. Unemployment rate - male - 25-54

Data ref.

  1. Database Location - https://www.ilo.org/ilostat/faces/wcnav_defaultSelection
  2. Female Unemployment Rate Dataset download location - https://docs.google.com/spreadsheet/pub?key=r9StWVETzyX9Lv-r4-2sh6w&output=xlsx
  3. Male Unemployment Rate Dataset download location - https://docs.google.com/spreadsheet/pub?key=rjkDFSPV2pw9Pbnz2kpiqPQ&output=xlsx

Data Cont.

  1. Year [86-05] - levels - 1986, 1987, …, 2005
  2. Unemployment rate - % 0 to 100
  3. Gender [Male, Female] - levels - Male, Female
# Packages: readr (read_csv), tidyr (gather), dplyr (pipes, verbs), ggplot2 (plots), car (qqPlot, leveneTest)
library(readr)
library(tidyr)
library(dplyr)
library(ggplot2)
library(car)
Unemployement_female <- read_csv("indicator_f 25-54 unemploy.csv")
head(Unemployement_female)
Unemployement_male <- read_csv("indicator_m 25-54 unemploy.csv")
head(Unemployement_male)
Unemployement_female <- Unemployement_female %>% gather(`1981`:`2005`,key = "Year",value = "Unemployment Rate")
Unemployement_male <- Unemployement_male %>% gather(`1981`:`2005`,key = "Year",value = "Unemployment Rate")

# Year %in% c('2000':'2005')

Unemployement_female_filtered <- Unemployement_female %>%
  filter(`Female 25-54 unemployment (%)` == 'Australia' & Year > 1985) %>%
  select(-X27, -`Female 25-54 unemployment (%)`) %>%
  mutate(Gender = "Female")

Unemployement_male_filtered <- Unemployement_male %>%
  filter(`Male 25-54 unemployment (%)` == 'Australia' & Year > 1985) %>%
  select(-X27, -`Male 25-54 unemployment (%)`) %>%
  mutate(Gender = "Male")

Combined_data <- rbind(Unemployement_female_filtered,Unemployement_male_filtered)
Combined_data$Year <- factor(Combined_data$Year)
Combined_data$Gender <- factor(Combined_data$Gender)
head(Combined_data)
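
In current tidyr, gather() is superseded by pivot_longer(). The sketch below shows the same wide-to-long reshaping on a toy table; the values are made up and the column name Country is a hypothetical stand-in for the first column of the source files. Converting Year to integer while reshaping makes the Year > 1985 filter a numeric comparison rather than a string comparison.

```r
library(tidyr)
library(dplyr)

# Toy wide table with made-up values; one row per country, one column per year.
wide <- tibble::tibble(
  Country = c("Australia", "Austria"),
  `1985` = c(5.0, 3.0),
  `1986` = c(5.5, 3.2),
  `1987` = c(5.3, 3.1)
)

long <- wide %>%
  pivot_longer(
    cols = `1985`:`1987`,
    names_to = "Year",
    values_to = "Unemployment Rate",
    names_transform = list(Year = as.integer)  # Year stored as integer, not character
  ) %>%
  filter(Country == "Australia", Year > 1985)  # numeric comparison on Year

print(long)
```

Year can still be converted to a factor afterwards for plotting, as is done for Combined_data.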

Descriptive Statistics

Combined_data %>% group_by(Gender) %>% summarise(Min = min(`Unemployment Rate`,na.rm = TRUE),
                                           Q1 = quantile(`Unemployment Rate`,probs = .25,na.rm = TRUE),
                                           Median = median(`Unemployment Rate`, na.rm = TRUE),
                                           Q3 = quantile(`Unemployment Rate`,probs = .75,na.rm = TRUE),
                                           Max = max(`Unemployment Rate`,na.rm = TRUE),
                                           Mean = mean(`Unemployment Rate`, na.rm = TRUE),
                                           SD = sd(`Unemployment Rate`, na.rm = TRUE),
                                           n = n(),
                                           Missing = sum(is.na(`Unemployment Rate`))) -> table
knitr::kable(table)
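| Gender | Min | Q1    | Median | Q3   | Max | Mean | SD        | n  | Missing |
|:-------|----:|------:|-------:|-----:|----:|-----:|----------:|---:|--------:|
| Female | 4.2 | 5.175 | 5.80   | 6.40 | 7.6 | 5.82 | 0.9479063 | 20 | 0       |
| Male   | 3.6 | 4.775 | 5.55   | 6.65 | 8.9 | 5.84 | 1.4744847 | 20 | 0       |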

Descriptive Statistics Visualization

ggplot(data = Combined_data,
       aes(x = Year, y = `Unemployment Rate`, group = Gender, colour = Gender)) +
  geom_line() +
  geom_point() +
  ggtitle("Line Plot of Unemployment Rate Over 20 years for Male and Female in Australia") +
  theme(axis.text.x = element_text(size = 6, angle = 45, vjust = 1, hjust = 1),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.grid.major.y = element_blank()) +
  theme(legend.text = element_text(size = 8, face = "italic"),
        legend.title = element_blank(),
        legend.position = c(0.1, 0.9))

Descriptive Statistics Cont.

Combined_data %>% boxplot(`Unemployment Rate` ~ Gender, data = .,ylab="Unemployment rate", main = "Boxplot - Australian Unemployment rate by Gender for 1986 - 2005")

Hypothesis Testing

Null hypothesis: The mean unemployment rates of females (\(\mu_1\)) and males (\(\mu_2\)) are equal.

\[H_0: \mu_1 - \mu_2 = 0\]

Alternative hypothesis: The mean unemployment rates of females and males differ.

\[H_a: \mu_1 - \mu_2 \ne 0\]

  1. Independence
  2. Normality of data

Hypothesis Testing - Checking Normality

Unemployement_female_filtered$`Unemployment Rate` %>% qqPlot(dist="norm",ylab="Unemployment rate", main = "QQPlot - Australian Unemployment rate (Female)")

Unemployement_male_filtered$`Unemployment Rate` %>% qqPlot(dist="norm",ylab="Unemployment rate", main = "QQPlot - Australian Unemployment rate (Male)")
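
A Q-Q plot is a visual check, so a numeric test can be read alongside it. The sketch below runs the Shapiro-Wilk test from base R on simulated values standing in for one gender's 20 yearly rates; the simulated data, the 0.05 cut-off and the variable names are assumptions for illustration, not part of the original analysis.

```r
# Simulated stand-in for 20 yearly unemployment rates (NOT the real data).
set.seed(1)
rates <- rnorm(20, mean = 5.8, sd = 1)

# Shapiro-Wilk test: the null hypothesis is that the sample comes from a normal distribution.
sw <- shapiro.test(rates)
print(sw)

# A p-value below the chosen level (0.05 here) would suggest a departure from normality.
looks_normal <- sw$p.value > 0.05
```

With the real data, the same call could be applied to Unemployement_female_filtered$`Unemployment Rate` and to the male counterpart.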

Hypothesis Testing - Checking equal variance

\[H_0: \sigma_1 = \sigma_2\]

\[H_a: \sigma_1 \ne \sigma_2 \]

leveneTest(Combined_data$`Unemployment Rate` ~ Combined_data$Gender)
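
To make clear what the test above measures: with its default median centring, leveneTest() is a one-way ANOVA on the absolute deviations of each observation from its group median. The sketch below reproduces that calculation in base R on simulated data; the data frame toy and its columns are hypothetical stand-ins for Combined_data.

```r
# Simulated stand-in data (NOT the real rates): 20 values per gender.
set.seed(42)
toy <- data.frame(
  rate = c(rnorm(20, mean = 5.8, sd = 0.9), rnorm(20, mean = 5.8, sd = 1.5)),
  gender = factor(rep(c("Female", "Male"), each = 20))
)

# Absolute deviation of each value from its own group's median.
toy$abs_dev <- abs(toy$rate - ave(toy$rate, toy$gender, FUN = median))

# One-way ANOVA on the deviations; its F test is the median-centred Levene test.
levene_by_hand <- anova(lm(abs_dev ~ gender, data = toy))
print(levene_by_hand)
```

A small p-value here would point to unequal variances, which is the case the Welch test (var.equal = FALSE) is designed to handle.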

Hypothesis Testing Cont.

t.test(
  `Unemployment Rate` ~ Gender,
  data = Combined_data,
  var.equal = FALSE,
  alternative = "two.sided"
  )
## 
##  Welch Two Sample t-test
## 
## data:  Unemployment Rate by Gender
## t = -0.051026, df = 32.414, p-value = 0.9596
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.8179941  0.7779942
## sample estimates:
## mean in group Female   mean in group Male 
##                 5.82                 5.84
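
As a cross-check, the Welch statistic, degrees of freedom, p-value and confidence interval can be recomputed from the summary statistics in the Descriptive Statistics table alone. The sketch below assumes the means and SDs exactly as reported there (so the last digits may differ slightly from the t.test output); the variable names are illustrative.

```r
# Summary statistics copied from the Descriptive Statistics table.
mean_f <- 5.82; sd_f <- 0.9479063; n_f <- 20
mean_m <- 5.84; sd_m <- 1.4744847; n_m <- 20

# Variance of each group's sample mean.
v_f <- sd_f^2 / n_f
v_m <- sd_m^2 / n_m

# Standard error of the difference and the Welch t statistic (Female minus Male).
se <- sqrt(v_f + v_m)
t_stat <- (mean_f - mean_m) / se

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (v_f + v_m)^2 / (v_f^2 / (n_f - 1) + v_m^2 / (n_m - 1))

# Two-sided p-value and 95% confidence interval for the difference in means.
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
ci <- (mean_f - mean_m) + c(-1, 1) * qt(0.975, df = df_welch) * se

print(c(t = t_stat, df = df_welch, p = p_value))
print(ci)
```

Because the p-value is far above 0.05 and the interval contains 0, the null hypothesis of equal means is not rejected at the 5% level.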

Discussion

  1. Considering data for several countries, to check whether the result is the same across all of them or differs with a country's economic status (first world, third world, etc.).

  2. Considering other factors that affect a country's unemployment rate, e.g. economic downturns and financial crises caused by natural or man-made events such as disasters or wars. These could be used to adjust the rate and obtain a more accurate result.