Analyze whether there is gender discrimination in University Professors’ salaries.
First, I load the packages that I will need: dplyr, ggvis, and magrittr.
Next, I read the data file into R and then convert the data frame to a tbl_df (dplyr's table data frame) using the dplyr function tbl_df.
## Observations: 52
## Variables: 6
## $ sx (int) 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ rk (int) 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 2, 3, 3, 3, 2, 2, 3,...
## $ yr (int) 25, 13, 10, 7, 19, 16, 0, 16, 13, 13, 12, 15, 9, 9, 9, 7, 1...
## $ dg (int) 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0,...
## $ yd (int) 35, 22, 23, 27, 30, 21, 32, 18, 30, 31, 22, 19, 17, 27, 24,...
## $ sl (int) 36350, 35350, 28200, 26775, 33696, 28516, 24900, 31909, 318...
## sx rk yr dg
## Min. :0.0000 Min. :1.000 Min. : 0.000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:1.000 1st Qu.: 3.000 1st Qu.:0.0000
## Median :0.0000 Median :2.000 Median : 7.000 Median :1.0000
## Mean :0.2692 Mean :2.038 Mean : 7.481 Mean :0.6538
## 3rd Qu.:1.0000 3rd Qu.:3.000 3rd Qu.:11.000 3rd Qu.:1.0000
## Max. :1.0000 Max. :3.000 Max. :25.000 Max. :1.0000
## yd sl
## Min. : 1.00 Min. :15000
## 1st Qu.: 6.75 1st Qu.:18247
## Median :15.50 Median :23719
## Mean :16.12 Mean :23798
## 3rd Qu.:23.25 3rd Qu.:27258
## Max. :35.00 Max. :38045
a. Boxplots
For sl (academic year salary) by sx (sex, coded 0 for male and 1 for female)
For sl (academic year salary) by dg (Highest degree, coded 0 for masters and 1 for doctoral degree)
b. Scatterplots of points, with a smooth line among points
Scatterplot: sl (academic year salary) by yd (Number of years since highest degree was earned)
Scatterplot: sl (academic year salary) by yr (Number of years in current rank)
c. Scatterplot of points, plotted with a linear model, and 95% confidence interval for the model, for sl (academic year salary) by yd (Number of years since highest degree was earned).
d. Scatterplot of points of sl (academic year salary) by yr (Number of years in current rank) grouped by rk (academic rank, coded 1 for assistant professor, 2 for associate professor, and 3 for full professor).
In this scatterplot, data points are noted as follows:
Assistant professors (1) - blue
Associate professors (2) - orange
Full professors (3) - green
##
## 1 2 3
## 18 14 20
##
## 0 1
## 32 20
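The two frequency tables above are consistent with rkrecode being a 0/1 indicator for full professor (rk = 3), since 18 + 14 = 32 and 20 remain. The recoding code itself is not shown in this report, so the following is only a minimal base R sketch under that assumption; the vector `rk` is rebuilt from the reported counts rather than read from the data file.

```r
# Rebuild a rank vector from the reported counts (18, 14, 20).
# This stands in for the real rk column, which is not reproduced here.
rk <- rep(c(1L, 2L, 3L), times = c(18, 14, 20))

# Assumed recoding: 1 = full professor, 0 = assistant or associate professor.
rkrecode <- as.integer(rk == 3L)

table(rk)        # counts by original rank
table(rkrecode)  # counts by recoded rank
```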
##
## Call:
## lm(formula = sl ~ sx + yr + dg + yd + rkrecode, data = sexdiscrimination)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6066.3 -1719.5 -452.5 957.8 9826.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17761.82 1429.16 12.428 2.62e-16 ***
## sx -547.47 1018.44 -0.538 0.59347
## yr 356.25 109.64 3.249 0.00216 **
## dg -559.33 1204.37 -0.464 0.64454
## yd 77.37 76.84 1.007 0.31930
## rkrecode 6856.45 1186.70 5.778 6.23e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2880 on 46 degrees of freedom
## Multiple R-squared: 0.7863, Adjusted R-squared: 0.763
## F-statistic: 33.84 on 5 and 46 DF, p-value: 2.461e-14
## 2.5 % 97.5 %
## (Intercept) 14885.07648 20638.5722
## sx -2597.47771 1502.5290
## yr 135.56889 576.9402
## dg -2983.60125 1864.9356
## yd -77.31372 232.0466
## rkrecode 4467.75405 9245.1439
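The 95% confidence intervals above can be recovered, up to rounding, from the coefficient table: each interval is the estimate plus or minus the t critical value on 46 residual degrees of freedom times the standard error. The sketch below uses only the rounded numbers printed in the regression output; the object names (`est`, `se`, `ci`) are illustrative.

```r
# Estimates and standard errors copied from the coefficient table above.
est <- c(sx = -547.47, yr = 356.25, dg = -559.33, yd = 77.37, rkrecode = 6856.45)
se  <- c(sx = 1018.44, yr = 109.64, dg = 1204.37, yd = 76.84, rkrecode = 1186.70)

df_resid <- 46                    # residual degrees of freedom from the output
t_crit   <- qt(0.975, df_resid)   # two-sided 95% critical value

# Interval = estimate +/- t_crit * standard error
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
round(ci, 1)
```

The interval for sx spans zero, which agrees with its p-value of 0.59 in the coefficient table.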
##
## Call:
## lm(formula = sexdiscrimination$sl ~ sexdiscrimination$sx)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8602.8 -4296.6 -100.8 3513.1 16687.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24697 938 26.330 <2e-16 ***
## sexdiscrimination$sx -3340 1808 -1.847 0.0706 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5782 on 50 degrees of freedom
## Multiple R-squared: 0.0639, Adjusted R-squared: 0.04518
## F-statistic: 3.413 on 1 and 50 DF, p-value: 0.0706
Is sl related to sx?
I read from the findings that:
## Multiple R-squared: 0.0639, Adjusted R-squared: 0.04518
## F-statistic: 3.413 on 1 and 50 DF, p-value: 0.0706
If α = .05, then the p-value, 0.07 (rounded), is greater than α. Therefore, I fail to reject the null hypothesis that there is no relationship between sl (academic year salary) and sx (sex). In other words, this small sample of university faculty does not provide statistically significant evidence of a relationship between academic year salary and sex.
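As a quick arithmetic check, the t statistic and two-sided p-value for sx can be recomputed from the rounded estimate (-3340) and standard error (1808) printed in the simple regression output. This is a base R sketch; the names `est`, `se`, and `df_resid` are illustrative, and small differences from the printed values come from rounding.

```r
est      <- -3340   # slope for sx from the simple regression output
se       <- 1808    # its standard error
df_resid <- 50      # residual degrees of freedom

t_stat <- est / se                          # t statistic for the slope
p_val  <- 2 * pt(-abs(t_stat), df_resid)    # two-sided p-value

c(t = t_stat, p = p_val)
p_val > 0.05   # TRUE means fail to reject the null hypothesis at alpha = .05
```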
For this t-test of mean academic year salary by sex, I have the following hypotheses:
\(H_{0}: \mu_{Male} - \mu_{Female} = 0\)
Alternatively
\(H_{1}: \mu_{Male} - \mu_{Female} \neq 0\)
I set \(\alpha\) = 0.05. (Note: Male = 0 and Female = 1.)
##
## Two Sample t-test
##
## data: sl by sx
## t = 1.8474, df = 50, p-value = 0.0706
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -291.257 6970.550
## sample estimates:
## mean in group 0 mean in group 1
## 24696.79 21357.14
The results of the regression with sl as the dependent variable and sx as the sole independent variable are the same as those of the t-test of the difference in mean sl by sx. Both yield a p-value of 0.07 (rounded). Since my α = .05, the p-value is greater than α. Therefore, both tests fail to reject the null hypothesis that there is no difference in mean academic year salary between men and women. In other words, this small sample of university faculty does not provide statistically significant evidence of a relationship between academic year salary and sex.
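The agreement between the two tests is not specific to this dataset: a pooled-variance two-sample t-test and a simple regression on a 0/1 indicator always give the same p-value, with t statistics of equal magnitude and opposite sign (the t-test reports group 0 minus group 1, while the slope is group 1 minus group 0). The sketch below illustrates this on simulated data only; the data frame `sim` is hypothetical and is not the faculty salary data, although the group sizes (38 and 14) follow from the reported mean of sx (0.2692 × 52 = 14).

```r
# Simulated data for illustration only; NOT the salaries analyzed above.
set.seed(1)
sim <- data.frame(
  sx = rep(c(0, 1), times = c(38, 14)),
  sl = c(rnorm(38, mean = 24700, sd = 5800),
         rnorm(14, mean = 21400, sd = 5800))
)

# Pooled-variance two-sample t-test (var.equal = TRUE gives "Two Sample t-test").
tt <- t.test(sl ~ sx, data = sim, var.equal = TRUE)

# Simple regression of salary on the 0/1 sex indicator.
fit   <- summary(lm(sl ~ sx, data = sim))
slope <- fit$coefficients["sx", ]

# t statistics: equal magnitude, opposite sign.
c(t_test = unname(tt$statistic), regression = unname(slope["t value"]))

# p-values: identical.
c(t_test = tt$p.value, regression = unname(slope["Pr(>|t|)"]))
```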
Analyze whether there are associations among U.S. state-level indicators.
I first read the state-level indicators data file into R and then convert the data frame to a tbl_df (dplyr's table data frame) using the dplyr function tbl_df.
## Observations: 50
## Variables: 7
## $ stateNames (fctr) Alabama, Alaska, Arizona, Arkansas, California, Co...
## $ Population (int) 3615, 365, 2212, 2110, 21198, 2541, 3100, 579, 8277...
## $ Income (int) 3624, 6315, 4530, 3378, 5114, 4884, 5348, 4809, 481...
## $ Illiteracy (dbl) 2.1, 1.5, 1.8, 1.9, 1.1, 0.7, 1.1, 0.9, 1.3, 2.0, 1...
## $ LifeExp (dbl) 69.05, 69.31, 70.55, 70.66, 71.71, 72.06, 72.48, 70...
## $ Murder (dbl) 15.1, 11.3, 7.8, 10.1, 10.3, 6.8, 3.1, 6.2, 10.7, 1...
## $ HSGrad (dbl) 41.3, 66.7, 58.1, 39.9, 62.6, 63.9, 56.0, 54.6, 52....
## stateNames Population Income Illiteracy
## Alabama : 1 Min. : 365 Min. :3098 Min. :0.500
## Alaska : 1 1st Qu.: 1080 1st Qu.:3993 1st Qu.:0.625
## Arizona : 1 Median : 2838 Median :4519 Median :0.950
## Arkansas : 1 Mean : 4246 Mean :4436 Mean :1.170
## California: 1 3rd Qu.: 4968 3rd Qu.:4814 3rd Qu.:1.575
## Colorado : 1 Max. :21198 Max. :6315 Max. :2.800
## (Other) :44
## LifeExp Murder HSGrad
## Min. :67.96 Min. : 1.400 Min. :37.80
## 1st Qu.:70.12 1st Qu.: 4.350 1st Qu.:48.05
## Median :70.67 Median : 6.850 Median :53.25
## Mean :70.88 Mean : 7.378 Mean :53.11
## 3rd Qu.:71.89 3rd Qu.:10.675 3rd Qu.:59.15
## Max. :73.60 Max. :15.100 Max. :67.30
##
Then I load the R packages I'll need, as well as the cormat functions.
Next, I estimate Pearson product-moment correlations for the six numeric variables in the state-level indicators dataset we were provided. First, I must select all columns except the stateNames variable.
## Source: local data frame [50 x 6]
##
## Population Income Illiteracy LifeExp Murder HSGrad
## (int) (int) (dbl) (dbl) (dbl) (dbl)
## 1 3615 3624 2.1 69.05 15.1 41.3
## 2 365 6315 1.5 69.31 11.3 66.7
## 3 2212 4530 1.8 70.55 7.8 58.1
## 4 2110 3378 1.9 70.66 10.1 39.9
## 5 21198 5114 1.1 71.71 10.3 62.6
## 6 2541 4884 0.7 72.06 6.8 63.9
## 7 3100 5348 1.1 72.48 3.1 56.0
## 8 579 4809 0.9 70.06 6.2 54.6
## 9 8277 4815 1.3 70.66 10.7 52.6
## 10 4931 4091 2.0 68.54 13.9 40.6
## .. ... ... ... ... ... ...
## $r
## LifeExp Income HSGrad Population Illiteracy Murder
## LifeExp 1
## Income 0.34 1
## HSGrad 0.58 0.62 1
## Population -0.068 0.21 -0.098 1
## Illiteracy -0.59 -0.44 -0.66 0.11 1
## Murder -0.78 -0.23 -0.49 0.34 0.7 1
##
## $p
## LifeExp Income HSGrad Population Illiteracy Murder
## LifeExp 0
## Income 0.016 0
## HSGrad 9.2e-06 1.6e-06 0
## Population 0.64 0.15 0.5 0
## Illiteracy 7e-06 0.0015 2.2e-07 0.46 0
## Murder 2.3e-11 0.11 0.00032 0.015 1.3e-08 0
##
## $sym
## LifeExp Income HSGrad Population Illiteracy Murder
## LifeExp 1
## Income . 1
## HSGrad . , 1
## Population 1
## Illiteracy . . , 1
## Murder , . . , 1
## attr(,"legend")
## [1] 0 ' ' 0.3 '.' 0.6 ',' 0.8 '+' 0.9 '*' 0.95 'B' 1
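The cormat helper functions used for the output above are not shown in this report. As a rough base R equivalent, the sketch below assumes that the provided file matches R's built-in `state.x77` matrix (the first rows printed above agree with it) and uses `cor()`, `cor.test()`, and `symnum()` in place of the helpers. The renamed columns and the object names `states`, `r`, and `ct` are illustrative.

```r
# Assumption: the built-in state.x77 stands in for the provided data file.
states <- as.data.frame(state.x77)[, c("Life Exp", "Income", "HS Grad",
                                       "Population", "Illiteracy", "Murder")]
names(states) <- c("LifeExp", "Income", "HSGrad",
                   "Population", "Illiteracy", "Murder")

# Pearson product-moment correlation matrix, rounded for display.
r <- cor(states, method = "pearson")
round(r, 2)

# Significance test for a single pair, here life expectancy and murder rate.
ct <- cor.test(states$LifeExp, states$Murder, method = "pearson")
ct$estimate
ct$p.value

# Symbolic summary using the same cutpoints as the legend above.
symnum(r)
```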
a. Plots that demonstrate the relationship between
i. HSGrad and Income
b. A scatterplot of Murder by Illiteracy grouped by HSGrad
##
## 37.8 38.5 39.9 40.6 41 41.3 41.6 41.8 42.2 46.4 47.4 47.8 48.8 50.2 50.3
## 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1
## 51.6 52.3 52.5 52.6 52.7 52.8 52.9 53.2 53.3 54.5 54.6 54.7 55.2 56 57.1
## 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1
## 57.6 58.1 58.5 59 59.2 59.3 59.5 59.9 60 61.9 62.6 62.9 63.5 63.9 65.2
## 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 66.7 67.3
## 1 1
##
## 30 40 50 60
## 4 10 28 8
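The second table above groups HSGrad into four bins labeled 30, 40, 50, and 60. The binning code is not shown, but the counts are consistent with right-closed intervals (30, 40], (40, 50], (50, 60], and (60, 70], each labeled by its lower bound. The sketch below reproduces that grouping with `cut()`, again assuming the built-in `state.x77` matches the provided file; `hs_bin` is an illustrative name and the base R plot is a stand-in for the original graphic.

```r
# Assumption: the built-in state.x77 stands in for the provided data file.
hs <- state.x77[, "HS Grad"]

# Right-closed bins (cut's default), labeled by their lower bound.
hs_bin <- cut(hs, breaks = c(30, 40, 50, 60, 70),
              labels = c("30", "40", "50", "60"))
table(hs_bin)

# Murder by Illiteracy, with point color set by the HSGrad bin.
plot(state.x77[, "Illiteracy"], state.x77[, "Murder"],
     col = as.integer(hs_bin), pch = 19,
     xlab = "Illiteracy", ylab = "Murder")
legend("topleft", legend = levels(hs_bin),
       col = seq_along(levels(hs_bin)), pch = 19, title = "HSGrad bin")
```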