Background

At the end of the semester, an “Introduction to Statistics” instructor wanted to gain insight into his students’ performance by analyzing gradebook data. The instructor taught 3 large lecture sections offered at different times during the day. Since each section, depending on the time it was taught, attracted different types of students (in terms of major, age, full-time/part-time, etc.) the instructor chose a random sample of 35 students from each section to insure proper representation.

Raw Data

The data observes the scores of individual statistics students, over the course of the academic term.

rm(list=ls())
load("gradebook.RData")
x<-data
head(x)
##   Midterm1 Midterm2 Diff_Mid Extra_Credit Final Class
## 1       83       77        6            1    81     4
## 2       72       77       -5            0    76     1
## 3       71       63        8            1    77     2
## 4       74       77       -3            0    70     1
## 5       94       85        9            1    93     2
## 6       80       81       -1            0    77     1

Q1. Do the data provide evidence that the students who did not do the extra credit assignment (group 1) performed significantly worse on the final than those who did (group 2)?

plot(factor(x$Extra_Credit), x$Final, main="Final Score Distribution", xlab="Turned in Extra Credit", 
     ylab="Final Score", names=c("No", "Yes"))

tapply(x$Final, factor(x$Extra_Credit), summary)
## $`0`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   63.00   70.00   74.50   73.86   76.00   83.00 
## 
## $`1`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   75.00   78.00   82.00   84.08   88.00   94.00

Visually, we see that the entire distribution of final scores for students who turned in extra credit, is higher than for those who didn’t. The mean for extra credit students is 84, which is a full letter grade higher than the non extra credit of 74. Additionally, the minimum extra credit final score (75) is greater than half of the non extra credit final scores (median = 74.5). We may now proceed to conduct a more formal analysis.

#t.test(x$Extra_Credit, x$Final, alternative = "less")
t.test(x$Final~x$Extra_Credit, alternative="less")
## 
##  Welch Two Sample t-test
## 
## data:  x$Final by x$Extra_Credit
## t = -8.7839, df = 64.585, p-value = 6.271e-13
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##      -Inf -8.27294
## sample estimates:
## mean in group 0 mean in group 1 
##        73.86364        84.07692

We performed a 2 sample t test (which may account for unequal sample distribution variance):

  • Null Hypothesis: mean final score (no extra credit) == mean final score (extra credit)
  • Alternative Hypothesis: mean final score (no extra credit) < mean final score (extra credit)
  • Test Statistic: t = -8.784
  • P Value: p = 6.27e-13, <.0000001

Given our extremely low p value, we are able to reject the null hypothesis, and conclude that the performance on the final of students who did not do the extra credit assignment is significantly worse than the performance of the students who did do the extra credit assignment.

However, given the nature of this study as observational, we are unable to take this result and draw a causal relationship between extra credit completion and final score.

Q2. The material covered by the second midterm is harder than the material covered in the first. Is this reflected by the students’ grades?

It is given that the second midterm is harder. We will explore the students’ grades to see if the test scores reflect this. If true, we would expect the mean of the differences in M1-M2 to be positive.

hist(x$Diff_Mid, main="Difference in Midterm scores (M1-M2)", xlab="M1-M2", ylab="Difference")

summary(x$Diff_Mid)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -11.0000  -3.0000   0.0000   0.4095   4.0000  14.0000

We see that visually, the data is approximately symmetric and centered close to 0, so at a first glance, it doesn’t appear that there is a significant difference. We will perform a more robust analysis to confirm this.

t.test(x$Midterm1 - x$Midterm2, alternative="greater")
## 
##  One Sample t-test
## 
## data:  x$Midterm1 - x$Midterm2
## t = 0.83584, df = 104, p-value = 0.2026
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
##  -0.4036208        Inf
## sample estimates:
## mean of x 
## 0.4095238

We performed a paired t test:

  • Null Hypothesis: mean midterm1 - mean midterm2 = 0
  • Alternative Hypothesis: mean midterm1 - mean midterm2 > 0
  • Test Statistic: t = .8
  • P Value: p = .2

Given our high p value (>.05), we cannot reject the null hypothesis, and we fail to conclude that students performed worse on the second midterm than the first.

Q3. Do the data provide evidence for a significant “class effect” on final performance?

plot(factor(x$Class), x$Final, names=c("Freshman", "Sophomore", "Junior", "Senior"), 
     main="Final Score distribution by Class", xlab="Class", ylab="Final Score")

tapply(x$Final, factor(x$Class), summary)
## $`1`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   63.00   72.75   76.00   77.27   82.00   93.00 
## 
## $`2`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   63.00   75.25   77.00   78.77   86.00   94.00 
## 
## $`3`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   72.00   75.00   80.00   80.53   83.00   94.00 
## 
## $`4`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   63.00   69.00   75.00   73.43   77.00   81.00

We see that the means for Freshman, Sophomores, and Juniors (77, 78, 80) are relatively consistent, with about a 3 point spread. However, Seniors lag slightly with a mean of 73. We may also observe that Juniors and Seniors have tighter spreads when compared to the underclassmen.

To explore this question in more depth, we can apply ANOVA procedures.

anova(lm(x$Final~factor(x$Class)))
## Analysis of Variance Table
## 
## Response: x$Final
##                  Df Sum Sq Mean Sq F value  Pr(>F)  
## factor(x$Class)   3  429.9 143.300  2.8303 0.04219 *
## Residuals       101 5113.8  50.631                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Null Hypothesis: all class means are equal
  • Alternative Hypothesis: at least one class mean is different
  • Test Statistic: F = 2.8
  • P Value: p = .0423

Given p<.05, we may reject the null hypothesis, and conclude that there are significant differences in performance on the final across class groups. We can surmise that the most pronounced difference occurs between Seniors and Juniors, based off our visual inspection and descriptive statistics.