December 8, 2015

Introduction

This report covers Task Set 1 of the final exam for the Workforce Education and Development course (WFED540): Gender Discrimination in University Professors' Salaries?

R Packages Required

require (dplyr)
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
require (magrittr)
## Loading required package: magrittr

require (ggvis)
## Loading required package: ggvis

Reading the Data File and Creating the Data Frame

The data come from http://www.personal.psu.edu/dlp/w540/sexdisc.csv

## Source: local data frame [52 x 6]
## 
##       sx    rk    yr    dg    yd    sl
##    (int) (int) (int) (int) (int) (int)
## 1      0     3    25     1    35 36350
## 2      0     3    13     1    22 35350
## 3      0     3    10     1    23 28200
## 4      1     3     7     1    27 26775
## 5      0     3    19     0    30 33696
## 6      0     3    16     1    21 28516
## 7      1     3     0     0    32 24900
## 8      0     3    16     1    18 31909
## 9      0     3    13     0    30 31850
## 10     0     3    13     0    31 32850
## ..   ...   ...   ...   ...   ...   ...
## Observations: 52
## Variables: 6
## $ sx (int) 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ rk (int) 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 2, 3, 3, 3, 2, 2, 3,...
## $ yr (int) 25, 13, 10, 7, 19, 16, 0, 16, 13, 13, 12, 15, 9, 9, 9, 7, 1...
## $ dg (int) 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0,...
## $ yd (int) 35, 22, 23, 27, 30, 21, 32, 18, 30, 31, 22, 19, 17, 27, 24,...
## $ sl (int) 36350, 35350, 28200, 26775, 33696, 28516, 24900, 31909, 318...
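
The code that produced the output above is not shown. As a minimal sketch, assuming the file is an ordinary comma-separated file with a header row, the data frame could be built as follows. To keep the example runnable offline, it types in only the ten rows printed above rather than the full 52; the commented-out line is an assumption about how the real file would be read.

```r
# The ten rows printed above, entered as CSV text so the sketch runs offline.
csv_text <- "sx,rk,yr,dg,yd,sl
0,3,25,1,35,36350
0,3,13,1,22,35350
0,3,10,1,23,28200
1,3,7,1,27,26775
0,3,19,0,30,33696
0,3,16,1,21,28516
1,3,0,0,32,24900
0,3,16,1,18,31909
0,3,13,0,30,31850
0,3,13,0,31,32850"

Profess_Salary <- read.csv(text = csv_text)

# Assumption: with network access, the full file would be read like this.
# Profess_Salary <- read.csv("http://www.personal.psu.edu/dlp/w540/sexdisc.csv")

# Compact overview of the columns and their types
str(Profess_Salary)
```

The printed layout in this report looks like the output of dplyr's tbl_df() and glimpse(), but that is a guess; str() gives a similar overview in base R.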

1. Using ggvis, construct

a.i. boxplots for sl by sx.

a.ii. boxplots for sl by dg.
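
The plots for these two items did not survive in this copy. A minimal sketch of how they might be drawn with ggvis is given below. It uses only the ten rows printed earlier (not the full data set), and the object names d, box_sx, and box_dg are made up for illustration.

```r
library(ggvis)

# First ten rows of the data, as printed earlier in this report
d <- data.frame(
  sx = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0),
  dg = c(1, 1, 1, 1, 0, 1, 0, 1, 0, 0),
  sl = c(36350, 35350, 28200, 26775, 33696, 28516, 24900, 31909, 31850, 32850)
)

# Treat the 0/1 codes as categories so each level gets its own box
box_sx <- d %>% ggvis(~factor(sx), ~sl) %>% layer_boxplots()
box_dg <- d %>% ggvis(~factor(dg), ~sl) %>% layer_boxplots()

# Type box_sx or box_dg at an interactive prompt to render the plot
```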

b. scatterplots of points, with a smooth line among points

b.i. for sl by yd.

b.ii. for sl by yr.
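
The scatterplots themselves are missing from this copy. As a hedged sketch, points with a smooth line can be layered in ggvis as shown below. The data are only the ten rows printed earlier, so the smooth would be rough; the names d, smooth_yd, and smooth_yr are illustrative.

```r
library(ggvis)

# First ten rows of the data, as printed earlier in this report
d <- data.frame(
  yr = c(25, 13, 10, 7, 19, 16, 0, 16, 13, 13),
  yd = c(35, 22, 23, 27, 30, 21, 32, 18, 30, 31),
  sl = c(36350, 35350, 28200, 26775, 33696, 28516, 24900, 31909, 31850, 32850)
)

# Points first, then a smooth curve drawn through them
smooth_yd <- d %>% ggvis(~yd, ~sl) %>% layer_points() %>% layer_smooths()
smooth_yr <- d %>% ggvis(~yr, ~sl) %>% layer_points() %>% layer_smooths()

# Type smooth_yd or smooth_yr at an interactive prompt to render the plot
```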

c. a scatterplot of points, plotted with a linear model, and 95% confidence interval for the model, for sl by yd.

Plot
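
The ggvis plot for this item is missing from this copy. One way it might be produced is sketched below, assuming layer_model_predictions() with model = "lm" and se = TRUE, which draws the fitted line with a band at the default 95% level. Only the ten rows printed earlier are used, and the names d and lm_plot are illustrative.

```r
library(ggvis)

# First ten rows of the data, as printed earlier in this report
d <- data.frame(
  yd = c(35, 22, 23, 27, 30, 21, 32, 18, 30, 31),
  sl = c(36350, 35350, 28200, 26775, 33696, 28516, 24900, 31909, 31850, 32850)
)

# Scatterplot plus a fitted linear model with its confidence band
lm_plot <- d %>%
  ggvis(~yd, ~sl) %>%
  layer_points() %>%
  layer_model_predictions(model = "lm", se = TRUE)

# Type lm_plot at an interactive prompt to render the plot
```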

Model

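# Fit a simple linear regression of salary (sl) on years since degree (yd)
m1 <- lm(sl ~ yd, data = Profess_Salary)
# Scatterplot of the raw data with the fitted line overlaid
plot(Profess_Salary$yd, Profess_Salary$sl)
abline(m1)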

summary(m1)
## 
## Call:
## lm(formula = sl ~ yd, data = Profess_Salary)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9703.5 -2319.5  -437.1  2631.8 11167.3 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 17502.26    1149.70  15.223  < 2e-16 ***
## yd            390.65      60.41   6.466  4.1e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4410 on 50 degrees of freedom
## Multiple R-squared:  0.4554, Adjusted R-squared:  0.4445 
## F-statistic: 41.82 on 1 and 50 DF,  p-value: 4.102e-08
confint(m1)
##                  2.5 %     97.5 %
## (Intercept) 15193.0161 19811.4987
## yd            269.3063   511.9839
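
To see where the confint() interval for yd comes from, the following sketch rebuilds it by hand from the rounded values in the summary(m1) output: the estimate plus or minus the critical t value times the standard error. Because the inputs are rounded, the result agrees with confint() only to about two decimal places.

```r
# Values copied from the summary(m1) output above
estimate <- 390.65   # slope for yd
std_err  <- 60.41    # standard error of the slope
df_resid <- 50       # residual degrees of freedom

# Critical value for a two-sided 95% interval
t_crit <- qt(0.975, df_resid)

# Lower and upper limits: estimate -/+ t_crit * standard error
ci <- estimate + c(-1, 1) * t_crit * std_err
print(round(ci, 2))
```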

d. a scatterplot of points of sl by yr grouped by rk.

2. Compute a simple linear regression with sl as the dependent variable and sx, yr, dg, yd, and a recoded rk variable as independent variables. The recode of the rk variable should result in a new categorical variable that allows a mean salary comparison of full professors with another group composed of both assistant professors and associate professors.

## Source: local data frame [52 x 7]
## 
##       sx    rk    yr    dg    yd    sl rk_dummy
##    (int) (int) (int) (int) (int) (int)    (dbl)
## 1      0     3    25     1    35 36350        1
## 2      0     3    13     1    22 35350        1
## 3      0     3    10     1    23 28200        1
## 4      1     3     7     1    27 26775        1
## 5      0     3    19     0    30 33696        1
## 6      0     3    16     1    21 28516        1
## 7      1     3     0     0    32 24900        1
## 8      0     3    16     1    18 31909        1
## 9      0     3    13     0    30 31850        1
## 10     0     3    13     0    31 32850        1
## ..   ...   ...   ...   ...   ...   ...      ...
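
The recoding step is not shown in this copy. A minimal sketch follows, assuming that rk = 3 denotes full professors, which is consistent with rk = 3 mapping to rk_dummy = 1 in the rows printed above. The rk values are the first twenty from the glimpse output earlier in the report.

```r
# First twenty rk values, as printed in the glimpse output earlier
rk <- c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 2, 3, 3, 3, 2, 2, 3)

# 1 = full professor (assumed to be rk == 3), 0 = assistant or associate
rk_dummy <- ifelse(rk == 3, 1, 0)

# Cross-tabulate to confirm the recode did what was intended
print(table(rk, rk_dummy))

# A dplyr equivalent on the full data frame might look like:
# Profess_Salary <- Profess_Salary %>% mutate(rk_dummy = ifelse(rk == 3, 1, 0))
```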

a. Report a test of the null hypothesis that sl, the dependent variable, is not related to the entire set of independent variables (sx, yr, dg, yd, and the recoded rk variable).

## 
## Call:
## lm(formula = sl ~ sx + yr + dg + yd + rk_dummy, data = Profess_Salary)
## 
## Coefficients:
## (Intercept)           sx           yr           dg           yd  
##    17761.82      -547.47       356.25      -559.33        77.37  
##    rk_dummy  
##     6856.45

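The p-value for the overall F-test is 2.461e-14, which is less than α = .05, so we reject the null hypothesis (H0) that the academic year salary (sl) is unrelated to the set of independent variables. In the fitted model, the coefficients for sex (sx) and highest degree (dg) are negative, while those for number of years in current rank (yr), number of years since the highest degree was earned (yd), and the recoded academic rank variable (rk_dummy) are positive. The overall test shows only that at least one of these coefficients differs from zero; it does not by itself establish that each individual relationship is significant.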
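
The overall p-value comes from the model's F statistic, which is not printed for this model. As a sketch of the arithmetic, the example below uses the F statistic that is printed earlier for the sl ~ yd model (41.82 on 1 and 50 degrees of freedom) and recovers its reported p-value of about 4.1e-08. The same calculation would apply to the five-predictor model.

```r
# F statistic and degrees of freedom from the summary of the sl ~ yd model
f_value <- 41.82
df1 <- 1    # numerator df: number of predictors
df2 <- 50   # denominator df: residual degrees of freedom

# Upper-tail probability of the F distribution gives the overall p-value
p_value <- pf(f_value, df1, df2, lower.tail = FALSE)
print(signif(p_value, 3))

# For any fitted lm object `fit`, the three inputs are stored in
# summary(fit)$fstatistic (value, numdf, dendf).
```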

b. Report a test of the null hypothesis that sl is not related to sx.

c. Report and interpret a 95% confidence interval around the regression coefficient for sx

##                2.5 %    97.5 %
## (Intercept) 22812.81 26580.773
## sx          -6970.55   291.257

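According to my results, the 95% confidence interval for the sx regression coefficient is [-6970.55, 291.257]. This means we are 95% confident that the true population coefficient for sex lies between -6970.55 and 291.257. Because the interval contains 0, we cannot conclude at α = .05 that salary differs by sex. Note that the interval is centered at about -3339.65, which matches the sx coefficient in the regression with sx as the sole predictor (-3340) rather than the -547.47 estimated in the five-predictor model, so this output appears to come from the single-predictor model.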

3. Compute and report a new regression equation with sl as the dependent variable and sx as the sole independent variable. Then, compute a t–test of the difference in mean sl by sx. Describe whether and how the results about the relationship between sl and sx from the regression analysis and from the t–test are similar.

Regression

rg2<-lm(sl ~ sx, data = Profess_Salary)
rg2
## 
## Call:
## lm(formula = sl ~ sx, data = Profess_Salary)
## 
## Coefficients:
## (Intercept)           sx  
##       24697        -3340

t-test

t.test(Profess_Salary$sl~Profess_Salary$sx)
## 
##  Welch Two Sample t-test
## 
## data:  Profess_Salary$sl by Profess_Salary$sx
## t = 1.7744, df = 21.591, p-value = 0.09009
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -567.8539 7247.1471
## sample estimates:
## mean in group 0 mean in group 1 
##        24696.79        21357.14

Results of Question 3

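According to the t-test, the p-value of 0.09 is greater than α = .05, so we fail to reject the null hypothesis (H0) of equal mean salaries: the data do not show a statistically significant difference in academic year salary (sl) between male and female professors in this study. Failing to reject H0 does not prove that no difference exists. The regression analysis leads to a similar conclusion. Its sx coefficient, -3340, equals the difference between the two group means (21357.14 - 24696.79 = -3339.65), and its intercept, 24697, is the mean of group 0. The t-test results were t = 1.7744, df = 21.591, p-value = 0.09, 95% CI [-567.8539, 7247.1471]. Because t.test() defaults to the Welch test, which does not assume equal variances, its results can differ slightly from the regression's test of the sx coefficient, which corresponds to a pooled-variance t-test.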
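
The link between the two analyses can be checked directly. The sketch below uses only the ten rows printed at the start of this report, so its numbers will not match the full-data results; the point is that the regression slope equals the difference in group means, and the regression t statistic matches the pooled-variance t-test up to sign. The object names are illustrative.

```r
# First ten rows of sx and sl, as printed earlier in this report
d <- data.frame(
  sx = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0),
  sl = c(36350, 35350, 28200, 26775, 33696, 28516, 24900, 31909, 31850, 32850)
)

fit <- lm(sl ~ sx, data = d)

# Slope for sx versus the raw difference in group means (group 1 minus group 0)
slope <- unname(coef(fit)["sx"])
mean_diff <- mean(d$sl[d$sx == 1]) - mean(d$sl[d$sx == 0])

# t statistic from the regression versus two flavors of t-test
t_lm     <- summary(fit)$coefficients["sx", "t value"]
t_pooled <- unname(t.test(sl ~ sx, data = d, var.equal = TRUE)$statistic)
t_welch  <- unname(t.test(sl ~ sx, data = d)$statistic)

# t.test() reports group 0 minus group 1, so its sign is flipped
print(c(slope = slope, mean_diff = mean_diff))
print(c(t_lm = t_lm, t_pooled = t_pooled, t_welch = t_welch))
```

Passing var.equal = TRUE to t.test() is what makes it line up exactly with the regression; the default Welch version generally gives a somewhat different t value and degrees of freedom.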