Let’s import relevant libraries and load the data:
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.2
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.6.2
load("jtrain2.RData")
data %>%
summarise(trained = sum(train == 1), total = n(), fraction_of_men_trained = sum(train) / n())
## trained total fraction_of_men_trained
## 1 185 445 0.4157303
The fraction of men who received job training is 0.4157 (185 of 445).
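Because train is a 0/1 indicator, the same fraction can also be obtained as its mean. The following is a minimal sketch that rebuilds the indicator from the counts printed above instead of loading jtrain2.RData; the vector name train_indicator is a hypothetical stand-in for the train column.

```r
# Rebuild a 0/1 training indicator from the counts reported above:
# 185 trained men out of 445 in total, so 260 untrained men.
# (train_indicator is an illustrative stand-in for data$train.)
train_indicator <- rep(c(1L, 0L), times = c(185L, 445L - 185L))

# The mean of a 0/1 indicator equals the share of ones.
fraction_trained <- mean(train_indicator)
fraction_trained
```

With the real data, the equivalent call would be mean(data$train).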
data %>%
select(train,re78) %>%
group_by(train) %>%
summarise(Mean = mean(re78))
## # A tibble: 2 x 2
## train Mean
## <int> <dbl>
## 1 0 4.55
## 2 1 6.35
The average of re78 for the men who received job training was about 6.35. The average of re78 for the men who did not receive job training was about 4.55. The difference is economically large: about 1.8 thousand 1982 dollars.
total_trained <- data %>%
filter(train == 1) %>%
summarise(n())
total_trained
## n()
## 1 185
total_unemp_trained <- data %>%
filter(unem78 == 1,train == 1) %>%
summarise(n())
total_unemp_trained
## n()
## 1 45
fraction_unemployed_trained <- total_unemp_trained/total_trained
fraction_unemployed_trained
## n()
## 1 0.2432432
The fraction of trained men who were unemployed is 24.32%.
total_unemp_untrained <- data %>%
filter(unem78 == 1,train == 0) %>%
summarise(n())
total_unemp_untrained
## n()
## 1 92
total_untrained <- data %>%
filter(train == 0) %>%
summarise(n())
total_untrained
## n()
## 1 260
fraction_unemployed_untrained <- total_unemp_untrained/total_untrained
fraction_unemployed_untrained
## n()
## 1 0.3538462
The fraction of untrained men who were unemployed is 35.38%.
Aggregate table:
data %>%
group_by(train,unem78)%>%
summarise(n())
## # A tibble: 4 x 3
## # Groups: train [2]
## train unem78 `n()`
## <int> <int> <int>
## 1 0 0 168
## 2 0 1 92
## 3 1 0 140
## 4 1 1 45
About 35.38% of untrained men were unemployed, which is considerably higher than the 24.32% of trained men who were unemployed. In other words, trained men were more likely to be employed.
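Both unemployment fractions can be computed in a single grouped summary rather than with four separate filters. The sketch below works from the cell counts in the aggregate table, so it does not need jtrain2.RData; the object names cells and unemployment_by_group are illustrative and do not appear in the original analysis.

```r
library(dplyr)

# Cell counts copied from the aggregate table above.
cells <- data.frame(
  train  = c(0L, 0L, 1L, 1L),
  unem78 = c(0L, 1L, 0L, 1L),
  n      = c(168L, 92L, 140L, 45L)
)

# Within each training group: total men, unemployed men, and their ratio.
unemployment_by_group <- cells %>%
  group_by(train) %>%
  summarise(
    total               = sum(n),
    unemployed          = sum(n[unem78 == 1]),
    fraction_unemployed = unemployed / total
  )
unemployment_by_group
```

On the full data set, the same fractions should come from data %>% group_by(train) %>% summarise(mean(unem78)), since unem78 is also a 0/1 indicator.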
Yes. Based on (ii) and (iii), the job training program appears to have been effective: trained men earned more on average and were more likely to be employed than untrained men.
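The comparison of unemployment rates rests on raw percentages only. One possible way to check whether a gap of this size could plausibly be sampling noise is a two-sample test of proportions. This is a supplementary sketch, not part of the original answer; it uses prop.test from base R's stats package and the counts from the aggregate table, and the object names are illustrative.

```r
# Unemployed men and group sizes, taken from the aggregate table:
# untrained: 92 of 260; trained: 45 of 185.
unemployed  <- c(untrained = 92, trained = 45)
group_sizes <- c(untrained = 260, trained = 185)

# Two-sample test of equal proportions (continuity correction is the default).
unemployment_test <- prop.test(x = unemployed, n = group_sizes)

unemployment_test$estimate  # the two sample proportions
unemployment_test$p.value   # p-value for H0: equal unemployment rates
```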
ggplot(data = data, aes(train,re78)) +
geom_point()
The scatter plot separates the sample into two groups: untrained men (train = 0) and trained men (train = 1). It suggests that trained men tended to have higher earnings than untrained men.