Use the file Data for Assignment 02.csv for this assignment. Make sure that for each question (or each part of a question) you provide one or two (at most) plots that allow you to visualize and explain your results effectively. You can and must use the appropriate inferential statistical analysis for this assignment. You need to explain your results well by interpreting the statistical significance of the results using the p-values. These are simulated data for three quiz scores for students in three sections of the same course.
How would you describe the performance of the students over the three quizzes?
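# stringsAsFactors = TRUE keeps Section and Sex as factors (not the default since R 4.0); the grouped plots below rely on this
mydf <- read.csv('/Users/jackcarlson/Downloads/Data for Assignment 02.csv', stringsAsFactors = TRUE)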
avgperf <- subset(mydf, select = c(Quiz.1,Quiz.2,Quiz.3))
boxplot(avgperf,
xlab = "Quiz", ylab = "Score")
The boxplots suggest that scores tend to increase from Quiz 1 to Quiz 2 and again from Quiz 2 to Quiz 3.
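To put numbers next to the boxplots, per-quiz summary statistics can be tabulated. The sketch below uses a simulated data frame, `toy`, as a hypothetical stand-in for `mydf`, because the assignment file is not reproduced here. Only the column names are taken from the analysis above.

```r
# Hypothetical stand-in for mydf: simulated scores, not the assignment data
set.seed(5)
toy <- data.frame(
  Quiz.1 = round(rnorm(38, mean = 14, sd = 3)),
  Quiz.2 = round(rnorm(38, mean = 15, sd = 3)),
  Quiz.3 = round(rnorm(38, mean = 16, sd = 3))
)

# One column per quiz; rows are the mean, median and standard deviation
quiz_summary <- sapply(toy, function(x) c(mean = mean(x), median = median(x), sd = sd(x)))
round(quiz_summary, 2)
```

Running the same `sapply()` call on `avgperf` would give the corresponding table for the real scores.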
Specifically, is there an average improvement of Quiz 2 over Quiz 1 and of Quiz 3 over Quiz 2?
H0: mudiff = 0
HA: mudiff != 0
H0: The average difference between Quiz 1 and Quiz 2 is 0
HA: The average difference between Quiz 1 and Quiz 2 is not 0
Q2subQ1 <- mydf$Quiz.2-mydf$Quiz.1
t.test(Q2subQ1, mu=0, alternative = "two.sided")
##
## One Sample t-test
##
## data: Q2subQ1
## t = 1.8061, df = 37, p-value = 0.07904
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.08016026 1.39594973
## sample estimates:
## mean of x
## 0.6578947
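The average difference (mean of x) between Quiz 2 and Quiz 1 is 0.6579, which is the average increase from Quiz 1 to Quiz 2. Since the p-value (0.079) is greater than 0.05, we fail to reject the null hypothesis; this difference is not statistically significant at the 5% level. This agrees with the 95% confidence interval, which contains 0.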
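A one-sample t-test on the per-student differences is the same test as a paired t-test on the two quiz columns. The sketch below checks this on a simulated data frame, `toy`, a hypothetical stand-in for `mydf` with 38 students to match the 37 degrees of freedom reported above.

```r
# Hypothetical stand-in for mydf: 38 simulated students, not the assignment data
set.seed(1)
toy <- data.frame(Quiz.1 = round(rnorm(38, mean = 14, sd = 3)))
toy$Quiz.2 <- toy$Quiz.1 + round(rnorm(38, mean = 0.7, sd = 2))

# One-sample t-test on the differences, as in the analysis above
one_sample <- t.test(toy$Quiz.2 - toy$Quiz.1, mu = 0)

# Paired t-test on the two columns: same t statistic, df and p-value
paired <- t.test(toy$Quiz.2, toy$Quiz.1, paired = TRUE)

all.equal(unname(one_sample$statistic), unname(paired$statistic))
all.equal(one_sample$p.value, paired$p.value)
```

Either form is fine. The paired call makes it explicit that the same students took both quizzes.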
H0: mudiff = 0
HA: mudiff != 0
H0: The average difference between Quiz 3 and Quiz 2 is 0
HA: The average difference between Quiz 3 and Quiz 2 is not 0
Q3subQ2 <- mydf$Quiz.3-mydf$Quiz.2
t.test(Q3subQ2, mu=0, alternative = "two.sided")
##
## One Sample t-test
##
## data: Q3subQ2
## t = 4.7316, df = 37, p-value = 3.223e-05
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.8877513 2.2175118
## sample estimates:
## mean of x
## 1.552632
The average difference (mean of x) between Quiz 3 and Quiz 2 is 1.553, which is the average increase from Quiz 2 to Quiz 3. Since the p-value is much less than 0.05, we reject the null hypothesis; this difference is statistically significant.
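The question asks about improvement, so a one-sided alternative (HA: mudiff > 0) is arguably closer to it than the two-sided tests used above. As a sketch, the one-sided p-values can be recovered from the reported t statistics and degrees of freedom. The variable names are illustrative only.

```r
# t statistics and degrees of freedom reported in the two t-tests above
t_q2_q1 <- 1.8061
t_q3_q2 <- 4.7316
df <- 37

# Upper-tail probability P(T > t): the p-value for HA: mudiff > 0
p_q2_q1 <- pt(t_q2_q1, df = df, lower.tail = FALSE)
p_q3_q2 <- pt(t_q3_q2, df = df, lower.tail = FALSE)

# Each is half the two-sided p-value because both t statistics are positive
round(c(Q2_vs_Q1 = p_q2_q1, Q3_vs_Q2 = p_q3_q2), 5)
```

For Quiz 2 versus Quiz 1 this halves 0.07904 to about 0.0395, which is below 0.05 under the directional hypothesis. The two-sided conclusions stated above are unchanged. The same test could be run directly with `alternative = "greater"` in `t.test()`.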
plot(mydf$Quiz.1~mydf$Section,
main= "Quiz 1",
xlab = "Section", ylab = "Score")
Section B performed best on Quiz 1.
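The boxplot comparison can be backed by an inferential test. One common choice, not part of the original analysis, is a one-way ANOVA of the quiz score on section. The sketch below runs it on a simulated data frame, `toy`, a hypothetical stand-in for `mydf`. The section labels and group sizes are assumptions.

```r
# Hypothetical stand-in for mydf: simulated scores, not the assignment data
set.seed(2)
toy <- data.frame(
  Section = factor(rep(c("A", "B", "C"), times = c(13, 13, 12))),
  Quiz.1  = round(rnorm(38, mean = 14, sd = 3))
)

# One-way ANOVA: H0 is that all three section means are equal
fit <- aov(Quiz.1 ~ Section, data = toy)
summary(fit)

# p-value for the Section term (first row of the ANOVA table)
p_section <- summary(fit)[[1]][["Pr(>F)"]][1]
p_section

# If the overall test is significant, Tukey's HSD shows which pairs differ
TukeyHSD(fit)
```

A p-value below 0.05 for `Section` would support the claim that at least one section's mean differs. The same call with `Quiz.2` or `Quiz.3` as the response covers the other two quizzes.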
plot(mydf$Quiz.2~mydf$Section,
main= "Quiz 2",
xlab = "Section", ylab = "Score")
Section C performed best on Quiz 2.
plot(mydf$Quiz.3~mydf$Section,
main= "Quiz 3",
xlab = "Section", ylab = "Score")
Section B performed best on Quiz 3.
ii. Does the sex of the student make a difference when it comes to average performance in each of the quizzes?
plot(mydf$Quiz.1~mydf$Sex,
main= "Quiz 1",
xlab = "Sex", ylab = "Score")
The median score was about the same for both sexes, with higher variation for females on Quiz 1.
plot(mydf$Quiz.2~mydf$Sex,
main= "Quiz 2",
xlab = "Sex", ylab = "Score")
As with Quiz 1, the median score was about the same for both sexes, with higher variation for females on Quiz 2.
plot(mydf$Quiz.3~mydf$Sex,
main= "Quiz 3",
xlab = "Sex", ylab = "Score")
Females performed better overall on Quiz 3.
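To test whether the difference between the sexes is statistically significant, a two-sample t-test is one option. R's default is the Welch version, which does not assume equal variances. The sketch below uses a simulated data frame, `toy`, a hypothetical stand-in for `mydf`. The labels "F" and "M" are an assumption about how Sex is coded.

```r
# Hypothetical stand-in for mydf: simulated scores, not the assignment data
set.seed(3)
toy <- data.frame(
  Sex    = factor(rep(c("F", "M"), each = 19)),
  Quiz.3 = round(rnorm(38, mean = 16, sd = 3))
)

# Welch two-sample t-test: H0 is that the mean Quiz 3 score is equal for both sexes
res <- t.test(Quiz.3 ~ Sex, data = toy)
res

# p-value to compare against 0.05
res$p.value
```

Repeating the call with `Quiz.1` and `Quiz.2` as the response gives one test per quiz, matching the three boxplots.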
Does the section and sex make a difference when it comes to average performance over Quizzes 1, 2 and 3?
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
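# as_tibble() replaces the deprecated tbl_df()
mydf1 <- as_tibble(mydf)
# Reshape to long format: one row per student per quiz (columns named instead of selected by position)
partc <- gather(mydf1, QuizName, Scores, Quiz.1, Quiz.2, Quiz.3)
interaction.plot(partc$Sex, partc$Section, partc$Scores,
xlab = 'Sex', ylab = 'Avg Quiz Score', fun = mean, type = "b",
col = c("blue","orange","black"), pch = c(1,2,3), fixed = TRUE, leg.bty = "o")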
The interaction plot suggests that both section and sex are associated with differences in average quiz performance. Section B had the highest average overall, and females scored higher than males in Sections A and B.
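The interaction plot is descriptive. A two-way ANOVA with an interaction term is one way to attach p-values to it. The sketch below first averages the three quizzes per student so that each student contributes one observation, then fits the model. The data frame `toy` is a simulated, hypothetical stand-in for `mydf`, and the Sex and Section labels are assumptions.

```r
# Hypothetical stand-in for mydf: simulated scores, not the assignment data
set.seed(4)
toy <- data.frame(
  Section = factor(rep(c("A", "B", "C"), times = c(13, 13, 12))),
  Sex     = factor(rep(c("F", "M"), length.out = 38)),
  Quiz.1  = round(rnorm(38, mean = 14, sd = 3)),
  Quiz.2  = round(rnorm(38, mean = 15, sd = 3)),
  Quiz.3  = round(rnorm(38, mean = 16, sd = 3))
)

# Average score over the three quizzes for each student
toy$Avg <- rowMeans(toy[, c("Quiz.1", "Quiz.2", "Quiz.3")])

# Two-way ANOVA with main effects for Sex and Section and their interaction
fit <- aov(Avg ~ Sex * Section, data = toy)
summary(fit)
```

The `Sex:Section` row tests whether the effect of sex differs between sections, which is what non-parallel lines in the interaction plot hint at. With unequal group sizes, `aov()` reports sequential sums of squares, so the main-effect p-values can depend on the order of the terms.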