library(dplyr)
library(ggplot2)
library(knitr)
names(opm94)
## [1] "x" "sal" "grade" "patco" "major" "age"
## [7] "male" "vet" "handvet" "hand" "yos" "edyrs"
## [13] "promo" "exit" "supmgr" "race" "minority" "grade4"
## [19] "promo01" "supmgr01" "male01" "exit01" "vet01"
str(opm94)
## 'data.frame': 1000 obs. of 23 variables:
## $ x : int 1 2 3 4 5 6 7 8 9 10 ...
## $ sal : int 26045 37651 64926 18588 19573 28648 27805 16560 40440 24285 ...
## $ grade : int 7 9 14 4 3 9 7 3 11 6 ...
## $ patco : Factor w/ 5 levels "Administrative",..: 1 4 4 2 2 4 5 2 1 2 ...
## $ major : Factor w/ 23 levels " ","AGRIC",..: 16 11 10 1 1 11 1 1 1 6 ...
## $ age : int 52 34 37 26 51 44 50 37 59 57 ...
## $ male : Factor w/ 2 levels "female","male": 1 1 1 1 1 1 1 1 1 1 ...
## $ vet : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 2 1 ...
## $ handvet : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ hand : Factor w/ 2 levels "no","yes": 2 1 1 1 1 1 1 1 1 1 ...
## $ yos : int 6 4 3 6 14 1 7 5 13 6 ...
## $ edyrs : int 16 16 16 12 12 16 14 12 12 14 ...
## $ promo : Factor w/ 2 levels "no","yes": 2 1 1 1 NA 1 1 1 1 1 ...
## $ exit : Factor w/ 2 levels "no","yes": 1 1 1 1 2 1 1 1 1 1 ...
## $ supmgr : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ race : Factor w/ 5 levels "American Indian",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ minority: int 1 1 1 1 1 1 1 1 1 1 ...
## $ grade4 : Factor w/ 4 levels "grades 1 to 4",..: 3 4 2 1 1 4 3 1 4 3 ...
## $ promo01 : num 1 0 0 0 NA 0 0 0 0 0 ...
## $ supmgr01: num 0 0 0 0 0 0 0 0 0 0 ...
## $ male01 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ exit01 : num 0 0 0 0 1 0 0 0 0 0 ...
## $ vet01 : num 0 0 0 0 0 0 0 0 1 0 ...
Check the values of the `patco` variable:
levels(opm94$patco)
## [1] "Administrative" "Clerical" "Other" "Professional"
## [5] "Technical"
1. What type is `patco`? What values does it have?
`patco` is a nominal categorical variable, stored in R as a factor. Its values are unordered categories that describe a qualitative attribute, here the type of occupation.
The `patco` variable has 5 levels: "Administrative", "Clerical", "Other", "Professional", and "Technical".
In an even broader classification than occupational family, all white-collar occupations fall into one of five occupational categories: Professional, Administrative, Technical, Clerical, or “Other” (referred to as PATCO).
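A quick way to confirm that a variable is nominal rather than ordinal is to inspect it directly. The sketch below uses a small made-up factor, `patco_toy`, as a stand-in for `opm94$patco`; the name and its values are assumptions for illustration only.

```r
# Toy stand-in for opm94$patco; the values are made up for illustration.
patco_toy <- factor(c("Clerical", "Administrative", "Technical",
                      "Professional", "Other", "Clerical"))

class(patco_toy)       # storage class of the variable
is.ordered(patco_toy)  # FALSE for a nominal (unordered) factor
levels(patco_toy)      # factor() sorts levels alphabetically by default
table(patco_toy)       # number of observations in each category
```

The same four calls can be run on `opm94$patco` itself.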
Regress sal on patco:
lm(sal ~ patco, data = opm94) %>% summary()
##
## Call:
## lm(formula = sal ~ patco, data = opm94)
##
## Residuals:
## Min 1Q Median 3Q Max
## -33977 -6905 -1555 4899 66721
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 49808.2 698.3 71.333 <2e-16 ***
## patcoClerical -27546.1 1265.5 -21.767 <2e-16 ***
## patcoOther -23811.6 2345.3 -10.153 <2e-16 ***
## patcoProfessional 3076.2 1066.3 2.885 0.004 **
## patcoTechnical -20616.7 1071.3 -19.245 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12670 on 990 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.4891, Adjusted R-squared: 0.487
## F-statistic: 236.9 on 4 and 990 DF, p-value: < 2.2e-16
a) What is the reference group?
The reference group is Administrative, the first level of `patco`. It is the omitted category against which every other occupational category is compared.
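By default, R uses the first factor level as the reference group, and `relevel()` can change it. The sketch below illustrates this on a tiny made-up data frame (`toy`, with hypothetical salaries), not on `opm94`.

```r
# Hypothetical toy data: two employees in each of three categories.
toy <- data.frame(
  sal   = c(50000, 52000, 22000, 24000, 30000, 28000),
  patco = factor(c("Administrative", "Administrative",
                   "Clerical", "Clerical",
                   "Technical", "Technical"))
)

# Default: the first level ("Administrative") is the reference group,
# so the intercept is the mean salary of that group.
fit_default <- lm(sal ~ patco, data = toy)
levels(toy$patco)[1]
coef(fit_default)

# Make "Clerical" the reference group instead; the intercept becomes
# the Clerical mean and the other coefficients are differences from it.
toy$patco2 <- relevel(toy$patco, ref = "Clerical")
fit_releveled <- lm(sal ~ patco2, data = toy)
coef(fit_releveled)
```

Changing the reference group changes the coefficients but not the fitted values: both models imply the same mean salary for each category.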
b) Interpret the intercept:
The intercept is the expected salary for the reference group: in this sample, employees in Administrative occupations have an expected (mean) salary of 49808.2.
c) Interpret the coefficient on `patcoClerical`:
The coefficient on `patcoClerical` is -27546.1, which means that, in this sample, employees in Clerical occupations are expected to earn 27546.1 less than employees in Administrative occupations, whose expected salary is 49808.2. The expected Clerical salary is therefore 49808.2 - 27546.1 = 22262.1.
d) Interpret the coefficient of `patcoProfessional`:
The coefficient on `patcoProfessional` is 3076.2, which means that, in this sample, employees in Professional occupations are expected to earn 3076.2 more than employees in Administrative occupations, whose expected salary is 49808.2. The expected Professional salary is therefore 49808.2 + 3076.2 = 52884.4.
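The same arithmetic applies to every category: the expected salary is the intercept plus that category's coefficient. A minimal sketch, with the estimates copied by hand from the regression output above (the object names `intercept`, `offsets`, and `group_means` are illustrative):

```r
# Estimates copied from the summary() output of lm(sal ~ patco).
intercept <- 49808.2
offsets <- c(Administrative = 0,        # reference group: no offset
             Clerical       = -27546.1,
             Other          = -23811.6,
             Professional   = 3076.2,
             Technical      = -20616.7)

# Expected salary for each occupational category.
group_means <- intercept + offsets
group_means
```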
Regress sal on minority and grade
lm(sal ~ minority + grade, data = opm94) %>% summary()
##
## Call:
## lm(formula = sal ~ minority + grade, data = opm94)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12972 -4789 -534 3567 45132
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4643.83 760.73 -6.104 1.48e-09 ***
## minority -862.87 534.12 -1.615 0.107
## grade 4752.52 70.49 67.426 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7286 on 992 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.8306, Adjusted R-squared: 0.8302
## F-statistic: 2432 on 2 and 992 DF, p-value: < 2.2e-16
a) What is the reference group?
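Both predictors are numeric, so there are no factor levels here. `minority` is a 0/1 indicator, and its reference group is `minority` = 0, that is, non-minority employees.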
b) Interpret the intercept:
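The intercept is the expected salary for a non-minority employee (`minority` = 0) at grade 0, which is -4643.83 in this sample. A grade of 0 lies outside the grade scale, so the intercept is an extrapolation with no substantive meaning on its own.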
c) Interpret the coefficient on `minority`:
In this sample, minority employees earn on average 862.87 less than non-minority employees of the same grade. The difference is not statistically significant at the 0.05 level (p = 0.107).
d) Interpret the coefficient on `grade`:
In this sample, each one-grade increase is associated with an expected salary increase of 4752.52, holding minority status constant.
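These interpretations can be checked by plugging values into the fitted equation. The sketch below hard-codes the estimates reported above in a hypothetical helper, `predict_sal()`; with the data loaded, `predict()` on the fitted model would serve the same purpose.

```r
# Fitted equation from lm(sal ~ minority + grade), estimates copied by hand.
# predict_sal() is an illustrative helper, not part of the original analysis.
predict_sal <- function(minority, grade) {
  -4643.83 - 862.87 * minority + 4752.52 * grade
}

# Two employees at the same grade who differ only in minority status:
predict_sal(minority = 0, grade = 9)
predict_sal(minority = 1, grade = 9)

# The gap between them is the coefficient on minority.
predict_sal(1, 9) - predict_sal(0, 9)

# One extra grade, minority status held constant: the coefficient on grade.
predict_sal(0, 10) - predict_sal(0, 9)
```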
Regress sal on minority:
lm(sal ~ minority, data = opm94) %>% summary()
##
## Call:
## lm(formula = sal ~ minority, data = opm94)
##
## Residuals:
## Min 1Q Median 3Q Max
## -28240 -13169 -2282 10818 78126
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 43294 639 67.75 < 2e-16 ***
## minority -9250 1227 -7.54 1.06e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17210 on 993 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.05415, Adjusted R-squared: 0.0532
## F-statistic: 56.85 on 1 and 993 DF, p-value: 1.058e-13
a) Interpret the intercept:
The intercept is the expected (mean) salary for non-minority employees (`minority` = 0), which is 43294 in this sample.
b) Interpret the coefficient on `minority`:
In this sample, minority employees earn on average 9250 less than non-minority employees, for an expected salary of 43294 - 9250 = 34044.
Regress sal on grade:
lm(sal ~ grade, data = opm94) %>% summary()
##
## Call:
## lm(formula = sal ~ grade, data = opm94)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12775 -4778 -505 3413 45197
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5132.8 698.5 -7.348 4.19e-13 ***
## grade 4779.0 68.6 69.662 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7292 on 993 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.8301, Adjusted R-squared: 0.83
## F-statistic: 4853 on 1 and 993 DF, p-value: < 2.2e-16
c) Why is the coefficient on `minority` different in this regression compared to the previous one (with `grade` included)?
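In the regression of `sal` on `minority` alone, grade is not held constant. Grade is a very strong predictor of salary (on its own it gives an adjusted R-squared of 0.83), so when it is omitted, the coefficient on `minority` also picks up the salary difference that comes from minority and non-minority employees being in different grades. Once grade is included, the coefficient shrinks from -9250 to -862.87, which suggests that most of the raw salary gap reflects minority employees being, on average, in lower grades rather than being paid less within the same grade.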
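This is the omitted-variable pattern: the coefficient from the short regression equals the coefficient from the long regression plus the grade coefficient times the average grade gap between the two groups. The sketch below reproduces the pattern on simulated data; every number in the simulation is an assumption chosen only to resemble the setup, and `toy` is not `opm94`.

```r
set.seed(1)
n <- 500

# Simulated data (all parameters are made up): minority employees sit
# about two grades lower on average, and salary depends mostly on grade.
minority <- rbinom(n, 1, 0.3)
grade    <- pmin(15, pmax(1, round(9 - 2 * minority + rnorm(n, sd = 3))))
sal      <- -5000 + 4800 * grade - 900 * minority + rnorm(n, sd = 7000)
toy      <- data.frame(sal, minority, grade)

fit_short <- lm(sal ~ minority, data = toy)          # grade omitted
fit_long  <- lm(sal ~ minority + grade, data = toy)  # grade held constant
fit_gap   <- lm(grade ~ minority, data = toy)        # grade gap by group

b_short <- unname(coef(fit_short)["minority"])
b_long  <- unname(coef(fit_long)["minority"])
b_grade <- unname(coef(fit_long)["grade"])
gap     <- unname(coef(fit_gap)["minority"])

# In-sample identity: short = long + (grade coefficient) * (grade gap).
c(short = b_short, long_plus_bias = b_long + b_grade * gap)

# Applying the same identity to the estimates reported above gives the
# implied average grade gap, assuming both models use the same 995 rows.
implied_gap <- (-9250 - (-862.87)) / 4752.52
implied_gap
```

Under that assumption, the reported estimates imply that minority employees are, on average, a little under two grades below non-minority employees.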