Load Libraries
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.6.2
DATA
setwd("C:/Users/ramin/Desktop/2020 winter/Data Analysis/Problem Set 8/Dataset")
load("OPM94.RData")
names(opm94)
## [1] "x" "sal" "grade" "patco" "major" "age"
## [7] "male" "vet" "handvet" "hand" "yos" "edyrs"
## [13] "promo" "exit" "supmgr" "race" "minority" "grade4"
## [19] "promo01" "supmgr01" "male01" "exit01"
str(opm94)
## 'data.frame': 1000 obs. of 22 variables:
## $ x : int 1 2 3 4 5 6 7 8 9 10 ...
## $ sal : int 26045 37651 64926 18588 19573 28648 27805 16560 40440 24285 ...
## $ grade : int 7 9 14 4 3 9 7 3 11 6 ...
## $ patco : Factor w/ 5 levels "Administrative",..: 1 4 4 2 2 4 5 2 1 2 ...
## $ major : Factor w/ 23 levels " ","AGRIC",..: 16 11 10 1 1 11 1 1 1 6 ...
## $ age : int 52 34 37 26 51 44 50 37 59 57 ...
## $ male : Factor w/ 2 levels "female","male": 1 1 1 1 1 1 1 1 1 1 ...
## $ vet : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 2 1 ...
## $ handvet : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ hand : Factor w/ 2 levels "no","yes": 2 1 1 1 1 1 1 1 1 1 ...
## $ yos : int 6 4 3 6 14 1 7 5 13 6 ...
## $ edyrs : int 16 16 16 12 12 16 14 12 12 14 ...
## $ promo : Factor w/ 2 levels "no","yes": 2 1 1 1 NA 1 1 1 1 1 ...
## $ exit : Factor w/ 2 levels "no","yes": 1 1 1 1 2 1 1 1 1 1 ...
## $ supmgr : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ race : Factor w/ 5 levels "American Indian",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ minority: int 1 1 1 1 1 1 1 1 1 1 ...
## $ grade4 : Factor w/ 4 levels "grades 1 to 4",..: 3 4 2 1 1 4 3 1 4 3 ...
## $ promo01 : num 1 0 0 0 NA 0 0 0 0 0 ...
## $ supmgr01: num 0 0 0 0 0 0 0 0 0 0 ...
## $ male01 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ exit01 : num 0 0 0 0 1 0 0 0 0 0 ...
MULTIPLE REGRESSION
Check the values of the patco variable using the levels(opm94$patco) command:
levels(opm94$patco)
## [1] "Administrative" "Clerical" "Other" "Professional"
## [5] "Technical"
Questions
What type is this variable? What values does it have?
Patco is a nominal categorical variable, stored in R as a factor. Its values are unordered categories (occupation types) rather than quantities. It has five levels: Administrative, Clerical, Other, Professional, and Technical.
Regress sal on patco
lm(sal ~ patco, data = opm94) %>% summary()
What is the reference group?
The reference group is Administrative, the first level of patco. Every other patco coefficient is a comparison against this group.
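As a minimal sketch of how R picks the reference group: under the default treatment contrasts, lm() uses the first factor level as the baseline, and relevel() changes it. The toy factor below is made up for illustration and only borrows the patco level names; it is not the opm94 data.

```r
# Toy factor reusing the patco level names (illustrative values, not opm94).
occ <- factor(c("Clerical", "Administrative", "Technical",
                "Professional", "Other", "Clerical"))

# Levels are sorted alphabetically by default; the first one is the baseline.
levels(occ)
ref_default <- levels(occ)[1]

# Treatment (dummy) coding: one 0/1 column per non-reference level.
dummy_cols <- colnames(contrasts(occ))

# relevel() moves a chosen level to the front, making it the new baseline.
occ2 <- relevel(occ, ref = "Clerical")
ref_new <- levels(occ2)[1]
```

With a different baseline the fitted values would stay the same; only the group that each coefficient is compared against would change.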
Interpret the intercept:
The expected (mean) salary for employees in an Administrative occupation is 49808.2.
Interpret the coefficient on patcoClerical:
The coefficient for patcoClerical is -27546.1. This means that a person in a Clerical job can expect a salary 27546.1 lower than a person in an Administrative job (49808.2 - 27546.1 = 22262.1).
Interpret the coefficient of patcoProfessional:
The coefficient for patcoProfessional is 3076.2. This means that a person in a Professional job can expect a salary 3076.2 higher than the baseline expected for someone in an Administrative job (49808.2 + 3076.2 = 52884.4).
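The arithmetic behind these interpretations can be written out directly. The sketch below hard-codes the three estimates quoted in the answers above (the intercept and the patcoClerical and patcoProfessional coefficients). The vector names `b` and `expected` are illustrative choices, not objects from the original analysis.

```r
# Estimates quoted in the answers above (sal ~ patco).
b <- c(intercept = 49808.2, clerical = -27546.1, professional = 3076.2)

# Expected salary for a group = intercept + that group's coefficient.
# The reference group (Administrative) has no coefficient of its own.
expected <- c(
  Administrative = unname(b["intercept"]),
  Clerical       = unname(b["intercept"] + b["clerical"]),
  Professional   = unname(b["intercept"] + b["professional"])
)
expected
```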
All Variable Types
lm(sal ~ minority + grade, data = opm94) %>% summary()
##
## Call:
## lm(formula = sal ~ minority + grade, data = opm94)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12972 -4789 -534 3567 45132
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4643.83 760.73 -6.104 1.48e-09 ***
## minority -862.87 534.12 -1.615 0.107
## grade 4752.52 70.49 67.426 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7286 on 992 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.8306, Adjusted R-squared: 0.8302
## F-statistic: 2432 on 2 and 992 DF, p-value: < 2.2e-16
What is the reference group?
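The reference group is non-minority employees (minority = 0). Minority is a 0/1 indicator rather than a race factor, and grade is numeric, so it has no reference group.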
Interpret the intercept:
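The expected salary for a non-minority employee (minority = 0) at grade 0 is -4643.83. A grade of 0 appears to fall outside the data (the lowest grade4 category is "grades 1 to 4"), so the negative value is an extrapolation rather than a realistic salary.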
Interpret the coefficient on minority:
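Holding grade constant, minority employees (minority = 1) are expected to earn 862.87 less than non-minority employees of the same grade. This difference is not statistically significant at the 0.05 level (p = 0.107).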
Interpret the coefficient on grade:
Holding minority status constant, each one-grade increase is associated with an increase of 4752.52 in expected salary.
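To make the "holding the other variable constant" reading concrete, the sketch below plugs the estimates from the summary output above into the fitted equation. The helper name `predict_sal` and the example grades (9 and 10) are illustrative assumptions.

```r
# Coefficients copied from the summary output of sal ~ minority + grade.
b0 <- -4643.83
b_min <- -862.87
b_grade <- 4752.52

# Hypothetical helper: fitted salary for a given minority status and grade.
predict_sal <- function(minority, grade) {
  b0 + b_min * minority + b_grade * grade
}

# Same grade, different minority status: the gap is the minority coefficient.
gap_minority <- predict_sal(1, 9) - predict_sal(0, 9)

# Same minority status, one grade apart: the gap is the grade coefficient.
gap_grade <- predict_sal(0, 10) - predict_sal(0, 9)
```

Because the model has no interaction term, these two gaps would come out the same at any other grade or minority status.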
Regress sal on minority:
lm(sal ~ minority, data = opm94) %>% summary()
##
## Call:
## lm(formula = sal ~ minority, data = opm94)
##
## Residuals:
## Min 1Q Median 3Q Max
## -28240 -13169 -2282 10818 78126
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 43294 639 67.75 < 2e-16 ***
## minority -9250 1227 -7.54 1.06e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17210 on 993 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.05415, Adjusted R-squared: 0.0532
## F-statistic: 56.85 on 1 and 993 DF, p-value: 1.058e-13
Question 4
Interpret the intercept:
The expected (mean) salary for non-minority employees (minority = 0) is 43294.
Interpret the coefficient on minority:
On average, minority employees earn 9250 less than non-minority employees (43294 - 9250 = 34044).
Why is the coefficient on minority different in this regression compared to the previous one (with grade included)?
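In this regression grade is not held constant. Grade is a strong predictor of salary: the model with minority and grade has an R-squared of 0.83, compared with 0.054 for minority alone. When grade is omitted, the minority coefficient also absorbs the salary difference that comes from minority and non-minority employees being distributed differently across grades. Once grade is controlled for, the estimated gap shrinks from 9250 to 862.87.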
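The change in the coefficient can be checked against the standard omitted-variable relationship for OLS. When both models are fit on the same observations, the minority coefficient without grade equals the coefficient with grade plus the grade coefficient times the slope from regressing grade on minority. The sketch below verifies this on simulated data; the sample size, effect sizes, and object names are made up for illustration and are not taken from opm94.

```r
set.seed(1)
n <- 500
minority <- rbinom(n, 1, 0.3)

# Simulated grades that are lower on average for the minority group.
grade <- round(9 - 2 * minority + rnorm(n, sd = 3))
sal <- -4000 - 900 * minority + 4700 * grade + rnorm(n, sd = 7000)

long  <- coef(lm(sal ~ minority + grade))        # grade held constant
short <- coef(lm(sal ~ minority))                # grade omitted
delta <- coef(lm(grade ~ minority))["minority"]  # grade gap between groups

# Identity: short coefficient = long coefficient + grade effect * delta.
implied <- long["minority"] + long["grade"] * delta
c(short = unname(short["minority"]), implied = unname(implied))
```

Applying the same relationship to the reported estimates gives (-9250 - (-862.87)) / 4752.52, or about -1.76. Assuming both regressions used the same 995 observations, this would suggest that minority employees sit roughly 1.8 grades lower on average.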