Let’s import the relevant libraries and load the data:

library(dplyr)
library(ggplot2)
load("jtrain2.RData")  # should create the data frame `data` used in the code below

i) Use the indicator variable train to determine the fraction of men receiving job training.

data %>%
  summarise(trained = sum(train == 1), total = n(), fraction_of_men_trained = sum(train) / n())
##   trained total fraction_of_men_trained
## 1     185   445               0.4157303

The fraction of men receiving job training is 185/445 ≈ 0.416, i.e., about 41.6%.
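Because train is a 0/1 indicator, its sample mean is exactly the share of ones, so `mean(data$train)` gives the same answer as the summary above. A minimal self-contained check, rebuilding an indicator vector from the counts reported above (185 trained out of 445) rather than loading the data file:

```r
# Rebuild a 0/1 indicator from the counts in the summary above:
# 185 trained men and 445 - 185 = 260 untrained men.
train <- c(rep(1, 185), rep(0, 445 - 185))

# The mean of an indicator equals the share of observations equal to 1.
frac_trained <- mean(train)
frac_trained
## [1] 0.4157303
```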

ii) The variable re78 is earnings from 1978, measured in thousands of 1982 dollars. Find the averages of re78 for the sample of men receiving job training and the sample not receiving job training. Is the difference economically large?

data %>%
  select(train, re78) %>%
  group_by(train) %>%
  summarise(Mean = mean(re78))
## # A tibble: 2 x 2
##   train  Mean
##   <int> <dbl>
## 1     0  4.55
## 2     1  6.35

The average of re78 for men who received job training is about 6.35 (thousands of 1982 dollars), compared with about 4.55 for men who did not receive training. The difference of roughly 1.79 thousand 1982 dollars is economically large: it is about 39% of the average 1978 earnings of the untrained group.
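The difference in group means is also the OLS slope from regressing re78 on the dummy train, which is convenient because `lm()` then reports a standard error for it. The sketch below checks that equivalence on a small toy data frame; the values are made up for illustration and are not from jtrain2, and `diff_in_means()` is a hypothetical helper introduced here.

```r
# Hypothetical helper: mean re78 of trained men minus mean re78 of untrained men.
diff_in_means <- function(df) {
  mean(df$re78[df$train == 1]) - mean(df$re78[df$train == 0])
}

# Toy data for illustration only (NOT the jtrain2 values).
toy <- data.frame(
  train = c(0, 0, 0, 1, 1, 1),
  re78  = c(2.0, 4.0, 6.0, 5.0, 7.0, 9.0)
)

d <- diff_in_means(toy)  # untrained mean 4, trained mean 7
slope <- unname(coef(lm(re78 ~ train, data = toy))["train"])
c(difference = d, ols_slope = slope)  # both equal 3 (up to rounding)
```

On the real data, `diff_in_means(data)` should reproduce the difference computed above, and `t.test(re78 ~ train, data = data)` tests whether it differs from zero.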

iii) The variable unem78 is an indicator of whether a man is unemployed or not in 1978. What fraction of the men who received job training are unemployed? What about for men who did not receive job training? Comment on the difference.

# Number of men who received training
total_trained <- data %>%
  filter(train == 1) %>%
  summarise(n())
total_trained
##   n()
## 1 185
# Number of trained men who were unemployed in 1978
total_unemp_trained <- data %>%
  filter(unem78 == 1, train == 1) %>%
  summarise(n())
total_unemp_trained
##   n()
## 1  45
# Share of trained men who were unemployed (a 1 x 1 data frame)
fraction_unemployed_trained <- total_unemp_trained / total_trained
fraction_unemployed_trained
##         n()
## 1 0.2432432

The fraction of trained men who were unemployed in 1978 is 45/185 ≈ 0.243, i.e., 24.32%.

# Number of untrained men who were unemployed in 1978
total_unemp_untrained <- data %>%
  filter(unem78 == 1, train == 0) %>%
  summarise(n())
total_unemp_untrained
##   n()
## 1  92
# Number of men who did not receive training
total_untrained <- data %>%
  filter(train == 0) %>%
  summarise(n())
total_untrained
##   n()
## 1 260
# Share of untrained men who were unemployed (a 1 x 1 data frame)
fraction_unemployed_untrained <- total_unemp_untrained / total_untrained
fraction_unemployed_untrained
##         n()
## 1 0.3538462

The fraction of untrained men who were unemployed in 1978 is 92/260 ≈ 0.354, i.e., 35.38%.

The same counts can be obtained at once by tabulating train against unem78:

data %>%
  group_by(train, unem78) %>%
  summarise(n())
## # A tibble: 4 x 3
## # Groups:   train [2]
##   train unem78 `n()`
##   <int>  <int> <int>
## 1     0      0   168
## 2     0      1    92
## 3     1      0   140
## 4     1      1    45

The unemployment rate among untrained men (35.38%) is about 11 percentage points higher than among trained men (24.32%). This suggests that men who received training were more likely to be employed in 1978.
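Both fractions can also be read off the aggregate table in one step by converting the counts into row proportions. A minimal sketch using base R's `xtabs()` and `prop.table()`, with the four counts copied from the train-by-unem78 table above instead of the raw data:

```r
# Counts from the train-by-unem78 table above.
counts <- data.frame(
  train  = c(0, 0, 1, 1),
  unem78 = c(0, 1, 0, 1),
  n      = c(168, 92, 140, 45)
)

# Cross-tabulate the counts, then divide each row by its row total.
tab <- xtabs(n ~ train + unem78, data = counts)
row_props <- prop.table(tab, margin = 1)

# Column "1" is the share unemployed in 1978 within each training group:
# about 0.354 for untrained (train = 0) and 0.243 for trained (train = 1).
frac_unem <- row_props[, "1"]
frac_unem

# Gap in percentage points (untrained minus trained), about 11.
gap_pp <- 100 * (frac_unem[["0"]] - frac_unem[["1"]])
gap_pp
```

On the full data, `data %>% group_by(train) %>% summarise(frac_unem = mean(unem78))` gives the same two shares, since unem78 is a 0/1 indicator.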

iv) From parts (ii) and (iii), does it appear that the job training program was effective? What would make our conclusions more convincing?

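Yes. Based on (ii) and (iii), the job training program appears to have been effective: trained men earned more on average in 1978 and were less likely to be unemployed than men who were not trained. These conclusions would be more convincing if we knew that participation in the program was randomly assigned, so that the two groups were comparable before training; otherwise, we would want to control for other factors (such as education, age, and prior earnings) that could affect both participation and 1978 outcomes. It would also help to check whether the differences are statistically significant.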
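One way to check whether the unemployment gap from (iii) could plausibly be sampling noise is a two-sample test of equal proportions. A minimal sketch using base R's `prop.test()` on the counts from (iii); it tests only the raw difference and does not by itself show that training caused it:

```r
# Unemployed men and group sizes from part (iii): trained first, then untrained.
unemployed <- c(trained = 45, untrained = 92)
group_size <- c(trained = 185, untrained = 260)

# Two-sample test of equal proportions (continuity correction on by default).
res <- prop.test(x = unemployed, n = group_size)

res$estimate  # sample unemployment rate in each group
res$conf.int  # 95% confidence interval for (trained rate - untrained rate)
res$p.value
```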

v) Scatter plot of re78 against train:

ggplot(data = data, aes(x = train, y = re78)) +
  geom_point()

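The plot shows two vertical columns of points, one for untrained men (train = 0) and one for trained men (train = 1). Because many observations overlap within each column, the plot gives only a rough comparison; it is consistent with the finding in (ii) that trained men tend to earn more, but the group means are a more reliable summary.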
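Because train takes only the values 0 and 1, plain `geom_point()` stacks many observations on top of each other. A common alternative is to treat train as a factor and overlay jittered points on box plots. A minimal sketch, assuming ggplot2 is installed; `plot_re78_by_train()` is a hypothetical helper, and the toy data frame (made-up values, not jtrain2) only exercises the function:

```r
library(ggplot2)

# Hypothetical helper: box plots of 1978 earnings by training status, with
# horizontally jittered points so overlapping observations stay visible.
plot_re78_by_train <- function(df) {
  ggplot(df, aes(x = factor(train, levels = c(0, 1), labels = c("Untrained", "Trained")),
                 y = re78)) +
    geom_boxplot(outlier.shape = NA) +  # outliers hidden; the jittered points show them
    geom_jitter(width = 0.15, height = 0, alpha = 0.4) +
    labs(x = "Job training", y = "Earnings in 1978 (thousands of 1982 dollars)")
}

# Toy data for illustration only (NOT the jtrain2 values).
toy <- data.frame(train = c(0, 0, 1, 1), re78 = c(0, 3.5, 0, 8.2))
p <- plot_re78_by_train(toy)
```

With the real data, `plot_re78_by_train(data)` would draw the same comparison as the scatter plot above, with the overlap reduced.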