Report the results of each of these three analyses and provide an interpretation of the results (including tests of the key assumptions). Your results should be written as if you were preparing them for inclusion in a research paper.
Independent Sample t-tests
Paired sample t-tests
One-way ANOVA
One sample t-tests: Comparing a sample to a population.
A.K.A single sample or one sample t-test.
We are testing whether the sample came from a population with a known mean.
Example: Do psychology majors have higher GPAs than all IIT students?
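A one-sample t-test of this kind could be sketched in base R as follows. The GPA values and the campus-wide mean are hypothetical numbers made up for illustration, not data from this project, and base R's t.test() is used here instead of the lessR function used later in this report.

```r
# Hypothetical GPAs for a sample of psychology majors (illustrative values only)
psych_gpa <- c(3.4, 3.1, 3.7, 2.9, 3.5, 3.3, 3.8, 3.0, 3.6, 3.2)

# Hypothetical campus-wide mean GPA, assumed to be known
campus_mean <- 3.0

# One-sample t-test: is the sample mean higher than the population mean?
result <- t.test(psych_gpa, mu = campus_mean, alternative = "greater")
print(result)
```

The one-sided alternative matches the wording of the example question ("higher GPAs"); a two-sided test would drop the alternative argument.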
Independent sample (between groups): Comparing two different samples of individuals.
A.K.A independent groups, or between groups, t-test
The population mean and SD are unknown.
Variances should be equal. If they aren’t, then use Welch’s t-test, which does not assume equal variances.
Example: Do women have higher GPAs than men?
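For reference, the two versions of the independent sample t statistic can be written out as below. This is the standard textbook form, with \bar{x}_i, s_i, and n_i standing for the mean, standard deviation, and size of group i; the symbols are notation chosen here and are not taken from the course slides.

```latex
% Equal variances assumed (pooled)
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
df = n_1 + n_2 - 2

% Equal variances not assumed (Welch)
t' = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}},
\qquad
df' = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}
           {\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}
```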
Paired sample (within groups): Comparing the same sample of individuals on two occasions.
A.K.A within-groups, or repeated measures, t-test.
Example: Do individuals’ attitudes toward the President of the U.S. change over time?
ANOVA: An ANOVA (‘analysis of variance’) test is used to test differences among more than 2 means. It is a way to find out whether the differences among the group means in an experiment are significant or not.
Run the following independent sample t-tests:
# Code
# Create a subset of the data set that only includes rows with the job title Principal or Assistant Principal
indepentenddata1 <- subset(Proj3, Job.Title == "Principal" | Job.Title == "Assistant Principal")
# Compare annual salary by job title ("Principal" vs. "Assistant Principal")
ttest(Annual.Salary19 ~ Job.Title, data = indepentenddata1)
##
## Compare Annual.Salary19 across Job.Title levels Principal and Assistant Principal
## --------------------------------------------------------------
##
##
## ------ Description ------
##
## Annual.Salary19 for Job.Title Principal: n.miss = 0, n = 377, mean = 147641.180, sd = 8947.850
## Annual.Salary19 for Job.Title Assistant Principal: n.miss = 0, n = 413, mean = 116026.995, sd = 8310.191
##
## Sample Mean Difference of Annual.Salary19: 31614.185
##
## Within-group Standard Deviation: 8620.340
##
##
## ------ Assumptions ------
##
## Note: These hypothesis tests can perform poorly, and the
## t-test is typically robust to violations of assumptions.
## Use as heuristic guides instead of interpreting literally.
##
## Null hypothesis, for each group, is a normal distribution of Annual.Salary19.
## Group Principal: Sample mean assumed normal because n>30, so no test needed.
## Group Assistant Principal: Sample mean assumed normal because n>30, so no test needed.
##
## Null hypothesis is equal variances of Annual.Salary19, i.e., homogeneous.
## Variance Ratio test: F = 80064019.308/69059267.408 = 1.159, df = 376;412, p-value = 0.142
## Levene's test, Brown-Forsythe: t = 2.522, df = 788, p-value = 0.012
##
##
## ------ Inference ------
##
## --- Assume equal population variances of Annual.Salary19 for each Job.Title
##
## t-cutoff: tcut = 1.963
## Standard Error of Mean Difference: SE = 614.034
##
## Hypothesis Test of 0 Mean Diff: t = 51.486, df = 788, p-value = 0.000
##
## Margin of Error for 95% Confidence Level: 1205.335
## 95% Confidence Interval for Mean Difference: 30408.850 to 32819.521
##
##
## --- Do not assume equal population variances of Annual.Salary19 for each Job.Title
##
## t-cutoff: tcut = 1.963
## Standard Error of Mean Difference: SE = 616.105
##
## Hypothesis Test of 0 Mean Diff: t = 51.313, df = 767.159, p-value = 0.000
##
## Margin of Error for 95% Confidence Level: 1209.451
## 95% Confidence Interval for Mean Difference: 30404.734 to 32823.637
##
##
## ------ Effect Size ------
##
## --- Assume equal population variances of Annual.Salary19 for each Job.Title
##
## Standardized Mean Difference of Annual.Salary19, Cohen's d: 3.667
##
##
## ------ Practical Importance ------
##
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
##
##
## ------ Graphics Smoothing Parameter ------
##
## Density bandwidth for Job.Title Principal: 3113.308
## Density bandwidth for Job.Title Assistant Principal: 1957.381
The mean salary for the job title ‘Assistant Principal’ is 116026.995. The standard deviation for the job title ‘Assistant Principal’ is 8310.191. The mean salary for the job title ‘Principal’ is 147641.180. The standard deviation for the job title ‘Principal’ is 8947.850. The degrees of freedom are 788. The t-value is 51.486. The p-value is reported as 0.000, which means p < .001.
The null hypothesis is that the two groups have the same mean. We test the null hypothesis using the results from the t-test. If p is less than 0.05, then we reject the null hypothesis. If p is greater than 0.05, then we fail to reject it: we don’t have enough evidence to say the means are statistically different.
The p-value of 0.000 is less than 0.05, so we reject the null hypothesis. This means that Assistant Principals are not paid the same as Principals on average. Or more accurately, Principals are paid more than Assistant Principals on average (a mean difference of $31,614.19).
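As a check, the equal-variance t-value can be recomputed by hand from the summary statistics printed in the output. The sketch below uses only base R and the group sizes, means, and standard deviations reported above; the variable names are chosen here for illustration.

```r
# Summary statistics copied from the ttest() output
n1 <- 377; m1 <- 147641.180; sd1 <- 8947.850   # Principal
n2 <- 413; m2 <- 116026.995; sd2 <- 8310.191   # Assistant Principal

# Degrees of freedom for the equal-variance test
df <- n1 + n2 - 2

# Pooled (within-group) standard deviation
sd_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / df)

# Standard error of the mean difference and the t statistic
se <- sd_pooled * sqrt(1 / n1 + 1 / n2)
t_value <- (m1 - m2) / se

# Two-tailed p-value and Cohen's d
p_value <- 2 * pt(-abs(t_value), df)
cohens_d <- (m1 - m2) / sd_pooled

print(round(c(sd_pooled = sd_pooled, se = se, t = t_value, d = cohens_d), 3))
```

These values should agree, up to rounding, with the within-group standard deviation, standard error, t, and Cohen's d shown in the output.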
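Normality: the null hypothesis, for each group, is a normal distribution of Annual.Salary19. Both groups have n > 30 (Principal n = 377, Assistant Principal n = 413), so the sample means are assumed normal and no test is needed.
Equal variances: the null hypothesis is that the variances of Annual.Salary19 are equal, i.e., homogeneous. The variance ratio test does not reject this hypothesis, F = 80064019.308/69059267.408 = 1.159, df = 376;412, p-value = 0.142, but Levene’s test (Brown-Forsythe) does, t = 2.522, df = 788, p-value = 0.012. Because the two tests disagree, the result that does not assume equal variances should also be checked: t = 51.313, df = 767.159, p-value = 0.000. It leads to the same conclusion as the equal-variance test.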
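The variance ratio and the unequal-variance (Welch) version of the test can also be recomputed from the summary statistics. This is a rough base R sketch using the values printed in the output; the two-sided p-value calculation is an assumption about how the variance ratio test was carried out, so it is shown without an expected value.

```r
# Summary statistics copied from the ttest() output
n1 <- 377; m1 <- 147641.180; sd1 <- 8947.850   # Principal
n2 <- 413; m2 <- 116026.995; sd2 <- 8310.191   # Assistant Principal

# Variance ratio with the larger variance on top, as in the output
f_ratio <- sd1^2 / sd2^2

# Two-sided p-value from the F distribution (assumed form of the test)
p_lower <- pf(f_ratio, n1 - 1, n2 - 1)
p_two_sided <- 2 * min(p_lower, 1 - p_lower)

# Welch standard error and t statistic (variances are not pooled)
v1 <- sd1^2 / n1
v2 <- sd2^2 / n2
se_welch <- sqrt(v1 + v2)
t_welch <- (m1 - m2) / se_welch

# Welch-Satterthwaite degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

print(round(c(F = f_ratio, se = se_welch, t = t_welch, df = df_welch), 3))
print(p_two_sided)
```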
Principals had significantly higher annual salaries (M = 147,641.18, SD = 8,947.85) than Assistant Principals (M = 116,027.00, SD = 8,310.19), t(788) = 51.49, p < .001 (two-tailed), Cohen’s d = 3.67.
Run the following paired sample t-tests:
# Code
paireddata1 <- subset(Proj3, Job.Title == "Regular Teacher")
ttest(Annual.Salary18, Annual.Salary19, data = paireddata1, paired = TRUE)
##
##
## ------ Description ------
##
## Difference: n.miss = 0, n = 10032, mean = 1240.013, sd = 1649.945
##
##
## ------ Normality Assumption ------
##
## Sample mean assumed normal because n>30, so no test needed.
##
##
## ------ Inference ------
##
## t-cutoff: tcut = 1.960
## Standard Error of Mean: SE = 16.473
##
## Hypothesized Value H0: mu = 0
## Hypothesis Test of Mean: t-value = 75.275, df = 10031, p-value = 0.000
##
## Margin of Error for 95% Confidence Level: 32.291
## 95% Confidence Interval for Mean: 1207.723 to 1272.304
##
##
## ------ Effect Size ------
##
## Distance of sample mean from hypothesized: 1240.013
## Standardized Distance, Cohen's d: 0.752
##
##
## ------ Graphics Smoothing Parameter ------
##
## Density bandwidth for 297.540
## --------------------------------------------------
The n is 10,032. The mean difference is 1,240.013. The standard deviation of the differences is 1,649.945. The degrees of freedom are 10,031. The t-value is 75.275. The p-value is reported as 0.000, which means p < .001.
The null hypothesis is that the two separate times have the same mean, so the mean difference is 0. We test the null hypothesis using the results from the t-test. If p is less than 0.05, then we reject the null hypothesis. If p is greater than 0.05, then we fail to reject it: we don’t have enough evidence to say the means are statistically different.
The mean difference from 2018 to 2019 is $1,240.01, so Regular Teachers made $1,240.01 more on average in 2019 than in 2018.
The p-value of 0.000 is less than 0.05, so we reject the null hypothesis. This means that Regular Teachers made a statistically different annual salary in 2019 than in 2018. More accurately, Regular Teachers made $1,240.01 more on average in 2019 than they did in 2018, 95% CI from 1,207.72 to 1,272.30.
Normality Assumption: The sample mean is assumed normal because n>30. This means that no test is needed.
Regular Teachers received significantly higher annual salaries in 2019 than they did in 2018 (mean difference = 1,240.01, SD = 1,649.95), t(10,031) = 75.28, p < .001 (two-tailed), Cohen’s d = 0.75.
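The paired sample results can be reproduced from the summary of the difference scores, because a paired t-test is a one-sample t-test on the differences. The sketch below uses base R and the n, mean, and standard deviation printed in the output; the variable names are chosen here for illustration.

```r
# Summary of the difference scores copied from the ttest() output
n <- 10032
mean_diff <- 1240.013
sd_diff <- 1649.945

# Standard error of the mean difference and the t statistic (H0: mu = 0)
se <- sd_diff / sqrt(n)
df <- n - 1
t_value <- mean_diff / se

# 95% confidence interval for the mean difference
t_cut <- qt(0.975, df)
ci <- mean_diff + c(-1, 1) * t_cut * se

# Cohen's d: mean difference in standard deviation units
cohens_d <- mean_diff / sd_diff

print(round(c(se = se, t = t_value, lower = ci[1], upper = ci[2], d = cohens_d), 3))
```

Up to rounding, these values should match the standard error, t-value, confidence interval, and Cohen's d in the output.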
Run the following one-way ANOVA test:
# Code
anovadata1 <- subset(Proj3, Job.Title == "Principal" | Job.Title == "Assistant Principal" | Job.Title == "Regular Teacher" | Job.Title == "Special Education Teacher" | Job.Title == "Custodial Worker")
ANOVA(Annual.Salary19 ~ Job.Title, data = anovadata1)
##
## >>> Note: Converting Job.Title to a factor for this analysis only.
## BACKGROUND
##
## Response Variable: Annual.Salary19
##
## Factor Variable: Job.Title
## Levels: Assistant Principal Custodial Worker Principal Regular Teacher Special Education Teacher
##
## Number of cases (rows) of data: 14125
## Number of cases retained for analysis: 14125
##
##
## DESCRIPTIVE STATISTICS
##
##                               n      mean       sd       min       max
## Assistant Principal         413 116027.00  8310.19  62139.00 137409.00
## Custodial Worker            547  35362.15  4336.20  28323.00  50509.00
## Principal                   377 147641.18  8947.85 128750.00 167417.00
## Regular Teacher           10032  80068.26 14752.45  11533.00 149329.00
## Special Education Teacher  2756  79522.61 14703.75  11663.00 117648.00
##
## Grand Mean: 81085.459
##
##
## BASIC ANALYSIS
##
##               df           Sum Sq         Mean Sq F-value p-value
## Job.Title      4 3334900909599.48 833725227399.87 4134.15  0.0000
## Residuals  14120 2847547864600.48    201667695.79
##
##
## R Squared: 0.54
## R Sq Adjusted: 0.54
## Omega Squared: 0.54
##
## Cohen's f: 1.08
##
##
## TUKEY MULTIPLE COMPARISONS OF MEANS
##
## Family-wise Confidence Level:
## -----------------------------------------------------------------------------------
##                                                    diff       lwr       upr p adj
## Custodial Worker-Assistant Principal          -80664.85 -83190.36 -78139.34  0.00
## Principal-Assistant Principal                  31614.19  28854.56  34373.81  0.00
## Regular Teacher-Assistant Principal           -35958.73 -37903.95 -34013.51  0.00
## Special Education Teacher-Assistant Principal -36504.38 -38548.61 -34460.16  0.00
## Principal-Custodial Worker                    112279.03 109685.72 114872.34  0.00
## Regular Teacher-Custodial Worker               44706.12  43005.07  46407.17  0.00
## Special Education Teacher-Custodial Worker     44160.47  42347.02  45973.91  0.00
## Regular Teacher-Principal                     -67572.92 -69605.38 -65540.45  0.00
## Special Education Teacher-Principal           -68118.57 -70245.99 -65991.15  0.00
## Special Education Teacher-Regular Teacher       -545.65  -1378.85    287.55  0.38
##
##
## RESIDUALS
##
## Fitted Values, Residuals, Standardized Residuals
## [sorted by Standardized Residuals, ignoring + or - sign]
## [res_rows = 20, out of 14125 cases (rows) of data, or res_rows="all"]
## -----------------------------------------------------------------------------
##       Job.Title                 Annual.Salary19   fitted  residual z-resid
##  6671 Regular Teacher                    149329 80068.26  69260.74    4.88
## 10699 Regular Teacher                     11533 80068.26 -68535.26   -4.83
## 10702 Regular Teacher                     11773 80068.26 -68295.26   -4.81
## 24674 Special Education Teacher           11663 79522.61 -67859.61   -4.78
##  8095 Regular Teacher                     12386 80068.26 -67682.26   -4.77
##  9544 Regular Teacher                     12386 80068.26 -67682.26   -4.77
##  8096 Regular Teacher                     12753 80068.26 -67315.26   -4.74
##  8097 Regular Teacher                     12883 80068.26 -67185.26   -4.73
##  9368 Regular Teacher                     12883 80068.26 -67185.26   -4.73
##  9545 Regular Teacher                     12883 80068.26 -67185.26   -4.73
##  9546 Regular Teacher                     12883 80068.26 -67185.26   -4.73
##  9874 Regular Teacher                     14316 80068.26 -65752.26   -4.63
## 24675 Special Education Teacher           14495 79522.61 -65027.61   -4.58
## 24676 Special Education Teacher           15361 79522.61 -64161.61   -4.52
## 24677 Special Education Teacher           16117 79522.61 -63405.61   -4.47
## 24673 Special Education Teacher           16166 79522.61 -63356.61   -4.46
## 11101 Regular Teacher                     17652 80068.26 -62416.26   -4.40
##  9119 Regular Teacher                     17733 80068.26 -62335.26   -4.39
## 10230 Regular Teacher                     17750 80068.26 -62318.26   -4.39
##  9367 Regular Teacher                     18061 80068.26 -62007.26   -4.37
##
##
## ----------------------------------------
## Plot 1: Scatterplot with Cell Means
## Plot 2: 95% family-wise confidence level
## ----------------------------------------
The descriptive statistics for the groups are listed above. For Job.Title, the degrees of freedom are 4, the sum of squares is 3334900909599.48, and the mean square is 833725227399.87. The residual degrees of freedom are 14120. The F-value is 4134.15. The p-value is 0.0000.
The null hypothesis is that the jobs Principal, Assistant Principal, Regular Teacher, Special Education Teacher, and Custodial Worker have the same mean. We test the null hypothesis using the results of the ANOVA. If p is less than 0.05, then we reject the null hypothesis. If p is greater than 0.05, then we fail to reject it: we don’t have enough evidence to say the means are statistically different.
If p is less than 0.05, then we need to find out which groups differ. This is done with a Tukey post hoc test that compares each pair of groups. If any pair has an adjusted p less than 0.05, then we reject the null hypothesis of equal means for that pair.
A one-way analysis of variance was done between the jobs of Assistant Principal, Custodial Worker, Principal, Regular Teacher, and Special Education Teacher. The analysis resulted in an F-value of 4134.15. This is the ratio of the variance between groups to the variance within groups.
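The F-value and the effect sizes can be recomputed from the sums of squares in the ANOVA table. The sketch below uses base R and the values printed in the output; R squared and Cohen's f are computed with their usual formulas, which is assumed to be how the output calculated them.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table
ss_between <- 3334900909599.48; df_between <- 4       # Job.Title
ss_within  <- 2847547864600.48; df_within  <- 14120   # Residuals

# Mean squares: sum of squares divided by degrees of freedom
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F is the ratio of between-group to within-group variance
f_value <- ms_between / ms_within
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

# Proportion of variance explained and Cohen's f
r_squared <- ss_between / (ss_between + ss_within)
cohens_f <- sqrt(r_squared / (1 - r_squared))

print(round(c(F = f_value, R2 = r_squared, f = cohens_f), 2))
```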
The p value is 0.0000, so we reject the null hypothesis. This means that there is a significant difference between at least two of the groups. This is important because now we need to identify where the difference lies.
The Tukey multiple comparisons of means indicate that every pair of groups has an adjusted p-value of 0.00 except for Special Education Teacher vs. Regular Teacher.
Special Education Teacher vs. Regular Teacher has an adjusted p-value of 0.38. This is higher than 0.05, so we fail to reject the null hypothesis for this pair. This means that there is no significant difference between Special Education Teachers and Regular Teachers. The observed difference is $-545.65. The upper limit is 287.55, and the lower limit is -1378.85, so the confidence interval includes 0.
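Another way to read the Tukey table is to check which confidence intervals contain 0, since a pair whose interval contains 0 is not significantly different. The sketch below types the differences and limits from the output into a data frame in base R; the data frame and its column names are made up here for illustration.

```r
# Differences and confidence limits copied from the Tukey output
tukey <- data.frame(
  comparison = c("Custodial Worker-Assistant Principal",
                 "Principal-Assistant Principal",
                 "Regular Teacher-Assistant Principal",
                 "Special Education Teacher-Assistant Principal",
                 "Principal-Custodial Worker",
                 "Regular Teacher-Custodial Worker",
                 "Special Education Teacher-Custodial Worker",
                 "Regular Teacher-Principal",
                 "Special Education Teacher-Principal",
                 "Special Education Teacher-Regular Teacher"),
  diff = c(-80664.85, 31614.19, -35958.73, -36504.38, 112279.03,
           44706.12, 44160.47, -67572.92, -68118.57, -545.65),
  lwr = c(-83190.36, 28854.56, -37903.95, -38548.61, 109685.72,
          43005.07, 42347.02, -69605.38, -70245.99, -1378.85),
  upr = c(-78139.34, 34373.81, -34013.51, -34460.16, 114872.34,
          46407.17, 45973.91, -65540.45, -65991.15, 287.55)
)

# Flag the pairs whose interval includes 0 (no significant difference)
tukey$includes_zero <- tukey$lwr <= 0 & tukey$upr >= 0

print(tukey[tukey$includes_zero, c("comparison", "diff", "lwr", "upr")])
```

Only the Special Education Teacher vs. Regular Teacher interval runs from a negative lower limit to a positive upper limit.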
“The distribution is unimodal” (From Chapter 13 slides)
“The distribution has a lower limit of 0 which means it’s positively skewed” (From Chapter 13 slides)
“Has two different degrees of freedom” (From Chapter 13 slides)
A one-way analysis of variance was used to compare annual salary across five job titles. Significant differences were observed among the means of the five groups (F(4, 14120) = 4134.15, p < .001). Tukey post hoc comparisons indicated that all pairs of job titles had significantly different annual salaries except Special Education Teachers and Regular Teachers, who did not differ significantly from each other (p = .38).