Load Libraries

library(dplyr)
library(ggplot2)
library(knitr)

Show variable names and values

names(opm94)
##  [1] "x"        "sal"      "grade"    "patco"    "major"    "age"     
##  [7] "male"     "vet"      "handvet"  "hand"     "yos"      "edyrs"   
## [13] "promo"    "exit"     "supmgr"   "race"     "minority" "grade4"  
## [19] "promo01"  "supmgr01" "male01"   "exit01"   "vet01"
str(opm94)
## 'data.frame':    1000 obs. of  23 variables:
##  $ x       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ sal     : int  26045 37651 64926 18588 19573 28648 27805 16560 40440 24285 ...
##  $ grade   : int  7 9 14 4 3 9 7 3 11 6 ...
##  $ patco   : Factor w/ 5 levels "Administrative",..: 1 4 4 2 2 4 5 2 1 2 ...
##  $ major   : Factor w/ 23 levels "     ","AGRIC",..: 16 11 10 1 1 11 1 1 1 6 ...
##  $ age     : int  52 34 37 26 51 44 50 37 59 57 ...
##  $ male    : Factor w/ 2 levels "female","male": 1 1 1 1 1 1 1 1 1 1 ...
##  $ vet     : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 2 1 ...
##  $ handvet : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ hand    : Factor w/ 2 levels "no","yes": 2 1 1 1 1 1 1 1 1 1 ...
##  $ yos     : int  6 4 3 6 14 1 7 5 13 6 ...
##  $ edyrs   : int  16 16 16 12 12 16 14 12 12 14 ...
##  $ promo   : Factor w/ 2 levels "no","yes": 2 1 1 1 NA 1 1 1 1 1 ...
##  $ exit    : Factor w/ 2 levels "no","yes": 1 1 1 1 2 1 1 1 1 1 ...
##  $ supmgr  : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ race    : Factor w/ 5 levels "American Indian",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ minority: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ grade4  : Factor w/ 4 levels "grades 1 to 4",..: 3 4 2 1 1 4 3 1 4 3 ...
##  $ promo01 : num  1 0 0 0 NA 0 0 0 0 0 ...
##  $ supmgr01: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ male01  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ exit01  : num  0 0 0 0 1 0 0 0 0 0 ...
##  $ vet01   : num  0 0 0 0 0 0 0 0 1 0 ...

OPM CODEBOOK

Multiple Regression

Check the value of patco variable

levels(opm94$patco)
## [1] "Administrative" "Clerical"       "Other"          "Professional"  
## [5] "Technical"

Question 1

1. What type is `patco`? What values does it have?

`patco` is a nominal categorical variable, stored in R as a factor: its values are unordered categories that describe a qualitative attribute, here the type of occupation.

The `patco` variable has five levels: "Administrative", "Clerical", "Other", "Professional", and "Technical".

In an even broader classification than occupational family, all white-collar occupations fall into one of five occupational categories: Professional, Administrative, Technical, Clerical, or “Other” (referred to as PATCO). 
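
Because `patco` is a factor, `lm()` expands it into 0/1 indicator columns, one for each level except the first. The sketch below illustrates that coding on a made-up five-element factor; the objects `toy`, `X`, and `toy2` are hypothetical and are not part of `opm94`.

```r
# Hypothetical toy factor with the same five level names as opm94$patco
toy <- factor(c("Administrative", "Clerical", "Other",
                "Professional", "Technical"))

# Levels are sorted alphabetically, so "Administrative" comes first
levels(toy)

# model.matrix() shows the 0/1 columns that lm() would build;
# the first level gets no column of its own and acts as the reference group
X <- model.matrix(~ toy)
colnames(X)

# relevel() selects a different reference group, e.g. "Clerical"
toy2 <- relevel(toy, ref = "Clerical")
levels(toy2)[1]
```

The row for "Administrative" has zeros in every indicator column, which is why that category ends up being described by the intercept alone.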

Regress sal on patco:

lm(sal ~ patco, data = opm94) %>% summary()
## 
## Call:
## lm(formula = sal ~ patco, data = opm94)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -33977  -6905  -1555   4899  66721 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        49808.2      698.3  71.333   <2e-16 ***
## patcoClerical     -27546.1     1265.5 -21.767   <2e-16 ***
## patcoOther        -23811.6     2345.3 -10.153   <2e-16 ***
## patcoProfessional   3076.2     1066.3   2.885    0.004 ** 
## patcoTechnical    -20616.7     1071.3 -19.245   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12670 on 990 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.4891, Adjusted R-squared:  0.487 
## F-statistic: 236.9 on 4 and 990 DF,  p-value: < 2.2e-16

Question 2

a) What is the reference group?

The reference group, the group against which the other categories are compared, is Administrative, the first level of `patco`.

b) Interpret the intercept:

The intercept is the expected salary, in this sample, for an employee in an Administrative occupation: 49808.2.

c) Interpret the coefficient on `patcoClerical`:

The coefficient on `patcoClerical` is -27546.1: in this sample, an employee in a Clerical occupation is expected to earn 27546.1 less than an employee in an Administrative occupation, whose expected salary is 49808.2 (49808.2 - 27546.1 = 22262.1).

d) Interpret the coefficient of `patcoProfessional`:

The coefficient on `patcoProfessional` is 3076.2: in this sample, an employee in a Professional occupation is expected to earn 3076.2 more than an employee in an Administrative occupation, whose expected salary is 49808.2 (49808.2 + 3076.2 = 52884.4).
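
The same arithmetic extends to every category: the expected salary is the intercept plus that category's coefficient, with the reference group adding zero. A minimal sketch, with the coefficients copied by hand from the `sal ~ patco` output above (the names `intercept`, `shift`, and `expected_sal` are illustrative only):

```r
# Coefficients copied from the summary() output of lm(sal ~ patco)
intercept <- 49808.2
shift <- c(Administrative = 0,         # reference group: no coefficient
           Clerical       = -27546.1,
           Other          = -23811.6,
           Professional   = 3076.2,
           Technical      = -20616.7)

# Expected salary for each category = intercept + that category's coefficient
expected_sal <- intercept + shift
round(expected_sal, 1)
```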

All Variable Types

Regress sal on minority and grade:

lm(sal ~ minority + grade, data = opm94) %>% summary()
## 
## Call:
## lm(formula = sal ~ minority + grade, data = opm94)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -12972  -4789   -534   3567  45132 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4643.83     760.73  -6.104 1.48e-09 ***
## minority     -862.87     534.12  -1.615    0.107    
## grade        4752.52      70.49  67.426  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7286 on 992 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.8306, Adjusted R-squared:  0.8302 
## F-statistic:  2432 on 2 and 992 DF,  p-value: < 2.2e-16

Question 3

a) What is the reference group?

`minority` is a 0/1 indicator, so the reference group is `minority` = 0, the non-minority employees. `grade` is numeric and has no reference group.

b) Interpret the intercept:

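The intercept is the expected salary, in this sample, for a non-minority employee (`minority` = 0) at grade 0: -4643.83. Grade 0 lies outside the observed grades (the lowest `grade4` level is "grades 1 to 4"), so this value is an extrapolation with no practical meaning.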

c) Interpret the coefficient on `minority`:

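In this sample, minority employees (`minority` = 1) earn on average 862.87 less than non-minority employees (`minority` = 0) of the same grade. This difference is not statistically significant at the 5% level (p = 0.107).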

d) Interpret the coefficient on `grade`:

In this sample, each one-grade increase is associated with an increase of 4752.52 in expected salary, holding minority status constant.
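
To make "holding the other variable constant" concrete, the fitted equation can be evaluated by hand. This is a sketch that assumes `minority` is coded 1 for minority and 0 for non-minority employees; the coefficients are copied from the output above, and `predict_sal()` is a hypothetical helper, not a function from any package.

```r
# Coefficients copied from the summary() output of lm(sal ~ minority + grade)
b0      <- -4643.83   # intercept
b_min   <- -862.87    # minority (assumed coding: 1 = minority, 0 = non-minority)
b_grade <- 4752.52    # grade

# Hypothetical helper: expected salary for a given minority status and grade
predict_sal <- function(minority, grade) {
  b0 + b_min * minority + b_grade * grade
}

predict_sal(minority = 0, grade = 9)   # non-minority employee at grade 9
predict_sal(minority = 1, grade = 9)   # minority employee at grade 9

# At any fixed grade the gap equals the minority coefficient
predict_sal(1, 9) - predict_sal(0, 9)
predict_sal(1, 14) - predict_sal(0, 14)

# One extra grade adds the grade coefficient, whatever the minority status
predict_sal(0, 10) - predict_sal(0, 9)
```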

Regress sal on minority:

lm(sal ~ minority, data = opm94) %>% summary()
## 
## Call:
## lm(formula = sal ~ minority, data = opm94)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -28240 -13169  -2282  10818  78126 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    43294        639   67.75  < 2e-16 ***
## minority       -9250       1227   -7.54 1.06e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17210 on 993 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.05415,    Adjusted R-squared:  0.0532 
## F-statistic: 56.85 on 1 and 993 DF,  p-value: 1.058e-13
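
When the only regressor is a 0/1 indicator, the fitted line just reproduces the two group means: the intercept is the mean of the 0 group and the slope is the difference between the group means. The sketch below checks this on a small made-up data frame (`toy` is hypothetical and unrelated to `opm94`).

```r
# Hypothetical toy data: three employees coded 0 and two coded 1
toy <- data.frame(
  sal      = c(40000, 44000, 48000, 30000, 36000),
  minority = c(0, 0, 0, 1, 1)
)

fit <- lm(sal ~ minority, data = toy)
coef(fit)   # intercept = mean of group 0, slope = mean(group 1) - mean(group 0)

# Group means computed directly, for comparison
group_means <- tapply(toy$sal, toy$minority, mean)
group_means
```

Applied to the output above, the same logic gives an expected salary of 43294 for the `minority` = 0 group and 43294 - 9250 = 34044 for the `minority` = 1 group.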

Question 4

a) Interpret the intercept:

The intercept is the expected salary, in this sample, for non-minority employees (`minority` = 0): 43294.

b) Interpret the coefficient on `minority`:

In this sample, minority employees earn on average 9250 less than non-minority employees, for an expected salary of 43294 - 9250 = 34044.

Regress sal on grade:

lm(sal ~ grade, data = opm94) %>% summary()
## 
## Call:
## lm(formula = sal ~ grade, data = opm94)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -12775  -4778   -505   3413  45197 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -5132.8      698.5  -7.348 4.19e-13 ***
## grade         4779.0       68.6  69.662  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7292 on 993 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.8301, Adjusted R-squared:   0.83 
## F-statistic:  4853 on 1 and 993 DF,  p-value: < 2.2e-16

c) Why is the coefficient on `minority` different in this regression compared to the previous one (with `grade` included)?

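In this regression, grade is not held constant. Grade is a strong predictor of salary (the grade-only regression has an adjusted R-squared of 0.83), so when it is omitted, the `minority` coefficient also picks up the part of the salary gap that comes from minority employees being, on average, in lower grades. Once grade is controlled for, the estimated gap shrinks from 9250 to 862.87 and is no longer statistically significant (p = 0.107).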
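
This is the usual omitted-variable pattern, and a small simulation can illustrate it. The sketch below uses entirely made-up data (the variable names mirror `opm94`, but the numbers are assumptions chosen for illustration) and checks the in-sample identity: short-regression coefficient = long-regression coefficient + (grade coefficient) x (gap in average grade).

```r
set.seed(1)
n <- 500

# Simulated data: the minority group sits about two grades lower on average
minority <- rbinom(n, 1, 0.3)
grade    <- round(9 - 2 * minority + rnorm(n, sd = 3))
sal      <- -5000 + 4800 * grade - 900 * minority + rnorm(n, sd = 7000)
toy      <- data.frame(sal, minority, grade)

short <- coef(lm(sal ~ minority, data = toy))          # grade omitted
long  <- coef(lm(sal ~ minority + grade, data = toy))  # grade included
aux   <- coef(lm(grade ~ minority, data = toy))        # gap in average grade

# The short coefficient equals the long one plus the grade effect
# multiplied by the difference in average grade between the groups
implied <- long["minority"] + long["grade"] * aux["minority"]
c(short = unname(short["minority"]), implied = unname(implied))
```

If the three regressions reported above use the same 995 rows, the same identity would imply that minority employees in `opm94` are about (-9250 + 862.87) / 4752.52, or roughly 1.76 grades, lower on average.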