Load Libraries

library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.6.2

DATA

# Point this at the folder that contains OPM94.RData on your machine
setwd("C:/Users/ramin/Desktop/2020 winter/Data Analysis/Problem Set 8/Dataset")

load("OPM94.RData")

names(opm94)
##  [1] "x"        "sal"      "grade"    "patco"    "major"    "age"     
##  [7] "male"     "vet"      "handvet"  "hand"     "yos"      "edyrs"   
## [13] "promo"    "exit"     "supmgr"   "race"     "minority" "grade4"  
## [19] "promo01"  "supmgr01" "male01"   "exit01"
str(opm94)
## 'data.frame':    1000 obs. of  22 variables:
##  $ x       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ sal     : int  26045 37651 64926 18588 19573 28648 27805 16560 40440 24285 ...
##  $ grade   : int  7 9 14 4 3 9 7 3 11 6 ...
##  $ patco   : Factor w/ 5 levels "Administrative",..: 1 4 4 2 2 4 5 2 1 2 ...
##  $ major   : Factor w/ 23 levels "     ","AGRIC",..: 16 11 10 1 1 11 1 1 1 6 ...
##  $ age     : int  52 34 37 26 51 44 50 37 59 57 ...
##  $ male    : Factor w/ 2 levels "female","male": 1 1 1 1 1 1 1 1 1 1 ...
##  $ vet     : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 2 1 ...
##  $ handvet : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ hand    : Factor w/ 2 levels "no","yes": 2 1 1 1 1 1 1 1 1 1 ...
##  $ yos     : int  6 4 3 6 14 1 7 5 13 6 ...
##  $ edyrs   : int  16 16 16 12 12 16 14 12 12 14 ...
##  $ promo   : Factor w/ 2 levels "no","yes": 2 1 1 1 NA 1 1 1 1 1 ...
##  $ exit    : Factor w/ 2 levels "no","yes": 1 1 1 1 2 1 1 1 1 1 ...
##  $ supmgr  : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ race    : Factor w/ 5 levels "American Indian",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ minority: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ grade4  : Factor w/ 4 levels "grades 1 to 4",..: 3 4 2 1 1 4 3 1 4 3 ...
##  $ promo01 : num  1 0 0 0 NA 0 0 0 0 0 ...
##  $ supmgr01: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ male01  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ exit01  : num  0 0 0 0 1 0 0 0 0 0 ...

MULTIPLE REGRESSION

Check the values (levels) of the patco variable using the levels(opm94$patco) command:

levels(opm94$patco)
## [1] "Administrative" "Clerical"       "Other"          "Professional"  
## [5] "Technical"
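When a factor such as patco enters lm(), R replaces it with 0/1 indicator columns and drops the first level, which becomes the reference group. A minimal sketch using a small hand-made factor that reuses the five patco level names (the toy values are for illustration only, not taken from opm94):

```r
# Toy factor with the same five level names as opm94$patco (illustrative values)
occ <- factor(c("Administrative", "Clerical", "Other", "Professional", "Technical"))

# levels() lists categories alphabetically; the first one is the reference
levels(occ)

# model.matrix() shows the design matrix lm() builds:
# an intercept plus one 0/1 column for every level except the first
X <- model.matrix(~ occ)
print(X)

# relevel() changes which category is the omitted reference group
occ_cler <- relevel(occ, ref = "Clerical")
colnames(model.matrix(~ occ_cler))
```

With the default ordering, Administrative has no column of its own, so every patco coefficient is read as a difference from Administrative.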

Questions

  1. What type is this variable? What values does it have?

    patco is a nominal categorical variable (stored as a factor): its values are unordered categories that record a qualitative attribute, the employee's occupation type. It has five levels: Administrative, Clerical, Other, Professional, and Technical.

  2. Regress sal on patco

    lm(sal ~ patco, data = opm94) %>% summary()

  1. What is the reference group?

    The reference (omitted) group is Administrative, the first level of patco. Every other patco coefficient is measured relative to it.

  2. Interpret the intercept:

    The expected salary for employees in Administrative occupations (the reference group) is 49808.2.

  3. Interpret the coefficient on patcoClerical:

    The coefficient on patcoClerical is -27546.1: an employee in a Clerical job has an expected salary 27546.1 lower than an employee in an Administrative job (49808.2 - 27546.1 = 22262.1).

  4. Interpret the coefficient of patcoProfessional:

    The coefficient on patcoProfessional is 3076.2: an employee in a Professional job has an expected salary 3076.2 higher than an employee in an Administrative job (49808.2 + 3076.2 = 52884.4).
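When the only regressor is a factor, the intercept equals the mean salary of the reference group and each dummy coefficient equals that group's mean minus the reference mean, which is why adding a coefficient to the intercept gives a group's expected salary. A minimal sketch on simulated salaries for three hypothetical groups (made-up numbers, not the OPM94 data):

```r
set.seed(1)
# Simulated salaries for three hypothetical occupation groups (not OPM94 data)
grp <- factor(rep(c("Administrative", "Clerical", "Professional"), each = 50))
sal <- c(rnorm(50, 50000, 8000), rnorm(50, 22000, 4000), rnorm(50, 53000, 9000))

fit <- lm(sal ~ grp)
b <- coef(fit)

# Sample mean salary in each group
grp_means <- tapply(sal, grp, mean)

# Intercept = mean of the reference group (Administrative)
all.equal(unname(b["(Intercept)"]), unname(grp_means["Administrative"]))

# A dummy coefficient = group mean minus reference-group mean
all.equal(unname(b["grpClerical"]),
          unname(grp_means["Clerical"] - grp_means["Administrative"]))

# Intercept + coefficient recovers that group's expected salary
all.equal(unname(b["(Intercept)"] + b["grpProfessional"]),
          unname(grp_means["Professional"]))
```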

All Variable Types

  1. Regress sal on minority and grade:
lm(sal ~ minority + grade, data = opm94) %>% summary()
## 
## Call:
## lm(formula = sal ~ minority + grade, data = opm94)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -12972  -4789   -534   3567  45132 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4643.83     760.73  -6.104 1.48e-09 ***
## minority     -862.87     534.12  -1.615    0.107    
## grade        4752.52      70.49  67.426  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7286 on 992 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.8306, Adjusted R-squared:  0.8302 
## F-statistic:  2432 on 2 and 992 DF,  p-value: < 2.2e-16
  1. What is the reference group?

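    The reference group is non-minority employees (minority = 0). minority is a 0/1 indicator, so its coefficient compares minority employees with this group; grade is numeric and has no reference group.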

  2. Interpret the intercept:

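    The intercept, -4643.83, is the predicted salary of a non-minority employee (minority = 0) at grade 0. Grades start at 1, so this is an extrapolation outside the data with no substantive meaning; it only anchors the regression line.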

  3. Interpret the coefficient on minority:

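    Holding grade constant, minority employees earn on average 862.87 less than non-minority employees in the same grade. This difference is not statistically significant at the 5% level (p = 0.107).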

  4. Interpret the coefficient on grade:

    Holding minority status constant, each one-grade increase is associated with a 4752.52 increase in expected salary (p < 2e-16).
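Plugging values into the fitted equation makes the "same grade" comparison concrete. A small sketch using the coefficients printed in the sal ~ minority + grade output; grade 9 is an arbitrary illustrative choice, and pred_sal is a hypothetical helper, not part of any package:

```r
# Coefficients copied from the sal ~ minority + grade output
b0      <- -4643.83  # intercept
b_min   <- -862.87   # minority
b_grade <- 4752.52   # grade

# Hypothetical helper: predicted salary from the fitted linear equation
pred_sal <- function(minority, grade) b0 + b_min * minority + b_grade * grade

pred_sal(0, 9)                    # non-minority employee at grade 9
pred_sal(1, 9)                    # minority employee at grade 9
pred_sal(1, 9) - pred_sal(0, 9)   # gap at the same grade = minority coefficient
pred_sal(0, 10) - pred_sal(0, 9)  # one more grade adds the grade coefficient
```

At grade 9 the predictions are 38128.85 (non-minority) and 37265.98 (minority); the gap is the minority coefficient at every grade because the model has no interaction term.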

Regress sal on minority:

lm(sal ~ minority, data = opm94) %>% summary()
## 
## Call:
## lm(formula = sal ~ minority, data = opm94)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -28240 -13169  -2282  10818  78126 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    43294        639   67.75  < 2e-16 ***
## minority       -9250       1227   -7.54 1.06e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17210 on 993 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.05415,    Adjusted R-squared:  0.0532 
## F-statistic: 56.85 on 1 and 993 DF,  p-value: 1.058e-13
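With minority as the only regressor, the intercept is the mean salary of the minority = 0 group and the slope is the raw difference in group means; the slope's t test is the pooled-variance two-sample t test. A minimal sketch on simulated salaries (the group sizes and salary parameters are made up, not OPM94 values):

```r
set.seed(7)
# Simulated salaries (not OPM94) for a 0/1 group indicator
minority <- rep(c(0, 1), times = c(700, 300))
sal <- 43000 - 9000 * minority + rnorm(1000, 0, 17000)

fit <- lm(sal ~ minority)
tt  <- t.test(sal[minority == 1], sal[minority == 0], var.equal = TRUE)

# Slope = difference in group means (minority minus non-minority)
coef(fit)["minority"]
mean(sal[minority == 1]) - mean(sal[minority == 0])

# The regression t value matches the pooled two-sample t statistic
summary(fit)$coefficients["minority", "t value"]
unname(tt$statistic)
```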

Question 4

  1. Interpret the intercept:

    The expected salary for non-minority employees (minority = 0) is 43294.

  2. Interpret the coefficient on minority:

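    Minority employees earn on average 9250 less than non-minority employees (expected salary 43294 - 9250 = 34044), a difference that is statistically significant (p = 1.06e-13).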

  3. Why is the coefficient on minority different in this regression compared to the previous one (with grade included)?

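    Grade is not held constant in this regression, so the minority coefficient also absorbs differences in grade between the two groups (omitted variable bias). Grade strongly determines salary: each grade adds about 4752.52, and R-squared rises from 0.054 to 0.83 when grade is added. Because the gap shrinks from -9250 to -862.87 once grade is controlled, minority employees must tend to hold lower grades, so most of the raw salary gap reflects differences in grade rather than pay differences between employees in the same grade.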
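The change in the minority coefficient follows an exact OLS identity: the short-regression slope equals the long-regression slope plus the grade coefficient times the slope from regressing grade on minority. A sketch on simulated data (the parameters are invented, not OPM94 estimates), followed by the implied grade gap from the reported coefficients, assuming both regressions use the same 995 observations:

```r
set.seed(42)
n <- 1000
# Simulated data (not OPM94): the minority group sits lower on the grade scale
minority <- rbinom(n, 1, 0.3)
grade    <- round(9 - 2 * minority + rnorm(n, 0, 2))
sal      <- -4000 - 800 * minority + 4750 * grade + rnorm(n, 0, 7000)

long  <- lm(sal ~ minority + grade)  # grade held constant
short <- lm(sal ~ minority)          # grade omitted
aux   <- lm(grade ~ minority)        # grade gap by minority status

# Omitted-variable identity (exact for OLS fitted on the same sample):
# short slope = long slope + (grade coefficient) * (minority slope in aux)
lhs <- coef(short)["minority"]
rhs <- coef(long)["minority"] + coef(long)["grade"] * coef(aux)["minority"]
all.equal(unname(lhs), unname(rhs))

# Same identity applied to the reported estimates: implied average grade gap
implied_gap <- (-9250 - (-862.87)) / 4752.52
implied_gap
```

The implied gap is about -1.76 grades; it is approximate because the printed -9250 is rounded.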