I note the significant outlier for females at around $38,000.
This will be important for later analysis.
December 3, 2015
This plot also includes a linear model with a 95% CI.
This plot is grouped by rk, the current academic rank of faculty.
However, by also including linear models for each group, the plot becomes easier to read and more useful…
This plot includes a linear model for each group.
The null hypothesis states that there is no relationship between the variable sl (Salary) and the entire set of variables sx, yr, dg, yd, and rk (recoded).
In other words, in a multiple linear regression of sl on this entire set of variables, all of the regression coefficients would be equal to 0.
I have set my acceptable level of Type I error to 0.05 (\(\alpha\) = 0.05).
My alternative hypothesis states that there is some relationship between the variable sl (Salary) and the entire set of variables sx, yr, dg, yd, and rk (recoded); that is, at least one of the regression coefficients is not equal to 0.
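Written out in symbols, the two hypotheses look like this. This assumes a linear model with an intercept and one slope per predictor; the \(\beta\) notation is introduced here for illustration and does not appear in the regression output:

```latex
\begin{aligned}
\text{sl} &= \beta_0 + \beta_1\,\text{sx} + \beta_2\,\text{yr} + \beta_3\,\text{dg} + \beta_4\,\text{yd} + \beta_5\,\text{rk} + \varepsilon \\
H_0 &: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0 \\
H_1 &: \beta_j \neq 0 \ \text{for at least one } j \in \{1, \dots, 5\}
\end{aligned}
```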
I read from my calculations that:
Multiple R-squared: 0.7863, Adjusted R-squared: 0.763
F-statistic: 33.84 on 5 and 46 DF, p-value: 2.461e-14
The F-statistic of 33.84 tests all of the regression coefficients at once; it is not itself a regression coefficient. With \(\alpha\) = 0.05, the p-value, 2.461e-14, is less than \(\alpha\). Therefore, I reject the null hypothesis that there is no relationship between sl and the entire set of independent variables, and conclude that at least one of the regression coefficients is not equal to 0.
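As a quick check that 33.84 is the overall F-statistic rather than a coefficient, it can be recomputed from the reported Multiple R-squared with the standard formula F = (R^2 / k) / ((1 - R^2) / (n - k - 1)). A minimal Python sketch, assuming k = 5 predictors and 46 residual degrees of freedom as in the output above; the function names are hypothetical:

```python
# Recompute the overall F-statistic and adjusted R-squared from the reported
# Multiple R-squared. Assumes a model with an intercept and k predictors.

def overall_f(r_squared: float, k: int, df_resid: int) -> float:
    """F = (R^2 / k) / ((1 - R^2) / df_resid), where df_resid = n - k - 1."""
    return (r_squared / k) / ((1.0 - r_squared) / df_resid)


def adjusted_r_squared(r_squared: float, k: int, df_resid: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    n_minus_1 = df_resid + k
    return 1.0 - (1.0 - r_squared) * n_minus_1 / df_resid


# Values copied from the output above: 5 predictors, 46 residual DF.
f_stat = overall_f(0.7863, k=5, df_resid=46)
adj_r2 = adjusted_r_squared(0.7863, k=5, df_resid=46)
print(f"F = {f_stat:.2f}, adjusted R^2 = {adj_r2:.3f}")
```

This gives about 33.85 and 0.763, matching the output up to the rounding of R-squared.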
The regression output also lets me test each independent variable's relationship to sl individually.
The regression coefficient for the independent variable sx is:
Estimate: -547.47
Std. Error: 1018.44
t value: -0.538
Pr(>|t|): 0.59347
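The t value in this output is the estimate divided by its standard error, and Pr(>|t|) is the two-sided tail probability of a t distribution with the model's 46 residual degrees of freedom. A minimal standard-library Python sketch; it integrates the t density numerically with Simpson's rule instead of calling a statistics library, so the helper names and the step count are assumptions:

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    norm = math.exp(log_norm) / math.sqrt(df * math.pi)
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value P(|T| >= |t|), using Simpson's rule (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3  # P(0 <= T <= |t|)
    return 2 * (0.5 - area_0_to_t)


# Values for sx from the output above.
estimate, std_error, df_resid = -547.47, 1018.44, 46
t_value = estimate / std_error
p_value = t_two_sided_p(t_value, df_resid)
print(f"t = {t_value:.3f}, p = {p_value:.4f}")
```

Because the estimate and standard error are rounded, the result agrees with the output to roughly three decimal places.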
Again, with \(\alpha\) = 0.05, our p-value of 0.59347 is greater than \(\alpha\). In this instance I fail to reject the null hypothesis that there is no relationship between salary (sl) and sex (sx).
The reported regression coefficient for sex is -547.47; however, the 95% confidence interval for this coefficient also leads to a failure to reject the null hypothesis for this variable. It shows:
2.5% CI = -2597.48 (rounded)
97.5% CI = 1502.53 (rounded)
Since this interval contains 0, I again fail to reject the null hypothesis using the current data.
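The interval is the estimate plus or minus a t critical value times the standard error. A sketch in standard-library Python, assuming a t distribution with 46 residual degrees of freedom; the critical value is found by bisection on a numerically integrated t CDF, and all function names are hypothetical:

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    norm = math.exp(log_norm) / math.sqrt(df * math.pi)
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def t_area_from_zero(x: float, df: int, steps: int = 1000) -> float:
    """P(0 <= T <= x) for x >= 0, using Simpson's rule (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3


def t_quantile(p: float, df: int) -> float:
    """Quantile of the t distribution for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 0.5 + t_area_from_zero(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def confidence_interval(estimate: float, std_error: float, df: int, level: float = 0.95):
    """Return estimate +/- (t critical value) * standard error."""
    t_crit = t_quantile(1 - (1 - level) / 2, df)
    return estimate - t_crit * std_error, estimate + t_crit * std_error


# Values for sx from the full model above (46 residual degrees of freedom).
lower, upper = confidence_interval(-547.47, 1018.44, df=46)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

The critical value comes out near 2.0129, and the interval matches the rounded values above to within a few cents.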
Using sx as the sole independent variable, with sl as the dependent variable, gives the following:
Multiple R-squared: 0.0639, Adjusted R-squared: 0.04518
F-statistic: 3.413 on 1 and 50 DF, p-value: 0.0706
Again, with \(\alpha\) = 0.05, our p-value of 0.0706 is still greater than \(\alpha\). In this instance I again fail to reject the null hypothesis that there is no relationship between salary (sl) and sex (sx).
A two-sample t-test comparing the mean salaries of men and women gives results consistent with this regression. Its 95% confidence interval for the difference in means is -291.257 to 6970.550; the regression analysis showed the same values with the signs reversed, because the two analyses take the difference between the groups in opposite directions.
The t-test also gives the same p-value found in the regression analysis, 0.0706, which is greater than \(\alpha\). Therefore, the t-test also fails to reject the null hypothesis that there is no difference in the mean salary for men and women.
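The regression and the t-test agree because, with a single 0/1 predictor, the OLS slope equals the difference in group means and its t statistic equals the equal-variance (pooled) two-sample t statistic. The sketch below demonstrates this on small hypothetical salary lists invented for illustration (not the assignment's data); the function names are also hypothetical. A Welch t-test, which does not pool variances, would generally give a slightly different p-value.

```python
import math
from statistics import mean, variance

# Hypothetical salaries, invented for illustration only (NOT the assignment's data).
men = [21000, 24500, 26000, 23000, 29000, 25500]
women = [18000, 20500, 22000, 19500, 21000]


def pooled_t(group0, group1):
    """Equal-variance two-sample t statistic for mean(group1) - mean(group0)."""
    n0, n1 = len(group0), len(group1)
    pooled_var = ((n0 - 1) * variance(group0) + (n1 - 1) * variance(group1)) / (n0 + n1 - 2)
    std_error = math.sqrt(pooled_var * (1 / n0 + 1 / n1))
    return (mean(group1) - mean(group0)) / std_error


def dummy_regression(group0, group1):
    """OLS fit of y = b0 + b1 * x with x = 0 for group0 and x = 1 for group1.

    Returns the slope b1 and its t statistic.
    """
    y = list(group0) + list(group1)
    x = [0] * len(group0) + [1] * len(group1)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_error = math.sqrt(rss / (len(y) - 2) / sxx)
    return slope, slope / std_error


slope, t_regression = dummy_regression(men, women)
print(f"slope = {slope:.2f}, difference in means = {mean(women) - mean(men):.2f}")
print(f"t (regression) = {t_regression:.4f}, t (pooled t-test) = {pooled_t(men, women):.4f}")
# Swapping the groups flips the sign of the difference and of t.
print(f"t (groups swapped) = {pooled_t(women, men):.4f}")
```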
However, I also wondered if the outlier I noted previously in the assignment was affecting the analysis.
I created a histogram of the relevant data for the entire population.
This showed the outlier, but it did not seem significant for the entire set of data.
However, a similar histogram for women alone showed that it was significant.
A histogram did not show the same issue for men's salary.
Removing this outlier from the data set gave me a more representative sample for my analysis.
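The outlier above was identified visually from histograms. A common numeric alternative, swapped in here for illustration and not the method used in this analysis, is Tukey's fences: flag any value more than 1.5 interquartile ranges below the first quartile or above the third. A minimal sketch using Python's `statistics.quantiles`, run on hypothetical female salaries invented for this example:

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical female salaries, for illustration only.
women = [15000, 17000, 18000, 19000, 20000, 21000, 22000, 38000]
print(tukey_outliers(women))  # [38000]
```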
With the cleaned data, I can reject my null hypothesis that there is no difference in the mean salary for men and women. The mean salaries for men and women, and the p-value for the difference, are:
Men = $24,696.79
Women = $20,073.46
p-value = 0.009018
Since our p-value is less than \(\alpha\), I reject the null hypothesis and accept my alternative hypothesis that sex is related to a difference in average salary.
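Assuming sx is coded 0 for men and 1 for women (consistent with the negative sex coefficient reported for the cleaned data), the regression coefficient on sx should equal the women-minus-men difference in these means. The arithmetic, as a short Python check:

```python
# Group means reported above for the cleaned data.
mean_men = 24696.79
mean_women = 20073.46

# Difference taken as women minus men, i.e. men as the reference (0) group.
# The 0/1 coding of sx is an assumption, not stated in the output.
difference = mean_women - mean_men
print(f"women - men = {difference:.2f}")
```

This gives -4,623.33, which agrees with the regression estimate of -4,623.30 to within a few cents of rounding.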
A regression analysis on the cleaned data also shows that we can reject the original null hypothesis that there is no relationship between salary and sex.
Multiple R-squared: 0.1311, Adjusted R-squared: 0.1134
F-statistic: 7.396 on 1 and 49 DF, p-value: 0.009018
With \(\alpha\) = 0.05, the p-value, 0.009018, is less than \(\alpha\), so the regression coefficient for sx is significantly different from 0. Therefore, I reject the null hypothesis that there is no relationship between sl and sx.
Further, the regression analysis gives an estimated coefficient of -4,623.30 for sx. This means that there is a negative relationship between sl and sx: females earn less than males. Put another way, women earn on average $4,623.30 less per year than men.
The 95% CI shows:
2.5% CI = -8039.66 (rounded)
97.5% CI = -1206.995 (rounded)
Our best estimate is that, on average, women earn $4,623.30 less per year than men, and we are 95% confident that the true difference is between $1,206.99 and $8,039.66.
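With a single predictor, the F-statistic is the square of the coefficient's t value, so the standard error and the confidence interval can be reconstructed from the output for the cleaned data. A sketch in standard-library Python, assuming 49 residual degrees of freedom and the reported estimate of -4,623.30; the derived standard error is not part of the reported output, and the helper names are hypothetical:

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    norm = math.exp(log_norm) / math.sqrt(df * math.pi)
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def t_area_from_zero(x: float, df: int, steps: int = 1000) -> float:
    """P(0 <= T <= x) for x >= 0, using Simpson's rule (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3


def t_quantile(p: float, df: int) -> float:
    """Quantile of the t distribution for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 0.5 + t_area_from_zero(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Values from the cleaned-data regression output.
f_stat, df_resid, estimate = 7.396, 49, -4623.30

# One predictor: F = t^2, and t carries the sign of the estimate.
t_value = math.copysign(math.sqrt(f_stat), estimate)
std_error = estimate / t_value  # derived, not reported

t_crit = t_quantile(0.975, df_resid)
lower = estimate - t_crit * std_error
upper = estimate + t_crit * std_error
print(f"SE = {std_error:.2f}, 95% CI: ({lower:.2f}, {upper:.2f})")
```

This reproduces the interval of roughly -8,039.6 to -1,207.0, with the small differences coming from the rounded F-statistic.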