I am using the BankWages dataset from the AER package for this homework. Its structure is shown below:
data("BankWages", package = "AER")  # load the data without attaching the package
str(BankWages)
## 'data.frame': 474 obs. of 4 variables:
## $ job : Ord.factor w/ 3 levels "custodial"<"admin"<..: 3 2 2 2 2 2 2 2 2 2 ...
## $ education: int 15 16 12 8 15 15 15 12 15 12 ...
## $ gender : Factor w/ 2 levels "male","female": 1 1 2 2 1 1 1 2 2 2 ...
## $ minority : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
The data frame contains 474 observations of 4 variables. job is an ordered factor giving the job category (custodial < admin < manage). education is an integer variable recording years of education. gender and minority are two-level factors.
xtabs(~ education + job, data = BankWages)
## job
## education custodial admin manage
## 8 13 40 0
## 12 13 176 1
## 14 0 6 0
## 15 1 111 4
## 16 0 24 35
## 17 0 3 8
## 18 0 2 7
## 19 0 1 26
## 20 0 0 2
## 21 0 0 1
eduCat <- factor(BankWages$education)
levels(eduCat)[3:10] <- rep(c("14-15", "16-18", "19-21"), c(2, 3, 3)) #merge some education levels
tab <- xtabs(~ eduCat + job, data = BankWages)
#Table
prop.table(tab, 1)
## job
## eduCat custodial admin manage
## 8 0.245283019 0.754716981 0.000000000
## 12 0.068421053 0.926315789 0.005263158
## 14-15 0.008196721 0.959016393 0.032786885
## 16-18 0.000000000 0.367088608 0.632911392
## 19-21 0.000000000 0.033333333 0.966666667
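The table above gives row proportions, that is, the share of each job category within each education group. Employees with 16 or more years of education are mostly in managerial positions (63% of the 16-18 group and 97% of the 19-21 group). Most employees with 12 to 15 years of education are in administrative positions (93% of the 12 group and 96% of the 14-15 group). Custodial jobs are concentrated among the least educated: 25% of employees with 8 years of education, against under 7% in every other group.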
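The merging step and the row proportions can be checked without the AER package. The sketch below is only an illustration: it rebuilds one row per employee from the counts printed in the first cross-tabulation, then applies the same `levels<-` assignment used for eduCat. The names `counts` and `long` are introduced here and are not part of the original analysis.

```r
# Counts copied from the xtabs(~ education + job) output above
counts <- matrix(
  c(13,  40,  0,
    13, 176,  1,
     0,   6,  0,
     1, 111,  4,
     0,  24, 35,
     0,   3,  8,
     0,   2,  7,
     0,   1, 26,
     0,   0,  2,
     0,   0,  1),
  ncol = 3, byrow = TRUE,
  dimnames = list(education = as.character(c(8, 12, 14:21)),
                  job = c("custodial", "admin", "manage"))
)

# Expand the table to one row per employee (hypothetical helper data frame)
long <- as.data.frame(as.table(counts))
long <- long[rep(seq_len(nrow(long)), long$Freq), c("education", "job")]

# Assigning the same label to several levels merges them into one level
eduCat <- long$education
levels(eduCat)[3:10] <- rep(c("14-15", "16-18", "19-21"), c(2, 3, 3))

# Row proportions: each row sums to 1
tab <- table(eduCat, job = long$job)
round(prop.table(tab, 1), 3)
```

Because duplicated labels in a `levels<-` assignment collapse the corresponding levels, the ten education values reduce to five groups, and the proportions match the table above.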
#Plot
plot(job ~ eduCat, data = BankWages,
main = "Job category by education",
xlab = "Years of education",
ylab = "Job category",
cex.main = 2,
cex.lab = 1.5,
font = 2,
col = c("#2CA25F", "#99D8C9" , "#E5F5F9")
)
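When both the response and the predictor are factors, `plot()` draws a spine plot: bar widths are proportional to the size of each education group, and the segments within a bar show the job shares for that group. As a rough sketch, the same figure can be drawn directly with `spineplot()` from the merged counts. The matrix below is derived by summing rows of the first cross-tabulation, and the names `merged` and `widths` are introduced here for illustration only.

```r
# Merged counts, obtained by summing the education rows printed earlier
merged <- as.table(matrix(
  c(13,  40,  0,
    13, 176,  1,
     1, 117,  4,
     0,  29, 50,
     0,   1, 29),
  ncol = 3, byrow = TRUE,
  dimnames = list(eduCat = c("8", "12", "14-15", "16-18", "19-21"),
                  job = c("custodial", "admin", "manage"))
))

# Relative bar widths: share of all employees in each education group
widths <- rowSums(merged) / sum(merged)
round(widths, 3)

# Spine plot of job category by education group
spineplot(merged,
          main = "Job category by education",
          xlab = "Years of education",
          ylab = "Job category",
          col = c("#2CA25F", "#99D8C9", "#E5F5F9"))
```

The widths make it clear that the 12 and 14-15 groups account for most employees, which is why the administrative category dominates the plot.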