The purpose of this document is to:
I store all marks for the course in an Excel spreadsheet. This allows me to combine marks from different sources (Canvas and the Peer Evaluation system) and to conduct simple transformations to get the required distributions.
I use tidyverse functions in this analysis (filter, mutate, and ggplot), so I have to load the tidyverse library. Similarly, I want to use the read_excel function, so readxl has to be loaded.
library(tidyverse)
library(readxl)
mk <- read_excel("<marks file>.xlsx", sheet = "Marks", range = "B2:N100") #just the part of the page with marks data
view(mk)
I read all the way down to row 100 of my spreadsheet to ensure I do not miss any students. But now I want to filter out all the empty rows. To do this, I filter out all rows in which the “Student No” column is blank. This can be done in R with the is.na() function.
I then transform the “Section” column into a factor. This lets me use formula notation later when running t-tests. It also changes what summary() reports: for a factor, it shows the number of students in each section.
Finally, I use the attach(mk) function to save myself some typing later. attach() tells R to look for column names in the mk (“marks”) data frame.
mk <- mk %>% filter(!is.na(`Student No`))
mk <- mk %>% mutate(Section = as_factor(Section))
summary(mk)
## Student No Name Section Quizzes
## Min. :301235277 Length:60 BUS462 E100:30 Min. :0.4822
## 1st Qu.:301274552 Class :character BUS462 D100:30 1st Qu.:0.7119
## Median :301289749 Mode :character Median :0.8373
## Mean :301293852 Mean :0.7988
## 3rd Qu.:301308482 3rd Qu.:0.8960
## Max. :301414781 Max. :0.9732
## Power Assignments Peer Final Project Adjusted Final Exam
## Min. :0.8187 Min. :0.5735 Min. :0.6250 Min. :0.4098
## 1st Qu.:0.9101 1st Qu.:0.7385 1st Qu.:0.8250 1st Qu.:0.6803
## Median :0.9371 Median :0.7924 Median :0.8250 Median :0.7377
## Mean :0.9253 Mean :0.7938 Mean :0.8133 Mean :0.7318
## 3rd Qu.:0.9488 3rd Qu.:0.8558 3rd Qu.:0.8500 3rd Qu.:0.8033
## Max. :0.9683 Max. :0.9890 Max. :0.9250 Max. :0.9262
## Course Mark Local Fudge Adjusted Course Mark
## Min. :0.6574 Min. :0 Min. :0.6574
## 1st Qu.:0.7773 1st Qu.:0 1st Qu.:0.7773
## Median :0.8366 Median :0 Median :0.8366
## Mean :0.8237 Mean :0 Mean :0.8237
## 3rd Qu.:0.8718 3rd Qu.:0 3rd Qu.:0.8718
## Max. :0.9336 Max. :0 Max. :0.9336
attach(mk)
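The summary lists E100 before D100 because as_factor() keeps the levels of a character column in order of first appearance, whereas base factor() sorts them alphabetically. The order matters later: t.test() reports the difference as first level minus second level. The following is a minimal sketch with made-up section labels. It assumes the E100 rows come first in the spreadsheet, which the output suggests but does not confirm.

```r
library(forcats)

# Hypothetical section labels, in the order they are assumed to appear in the sheet
section <- c("BUS462 E100", "BUS462 E100", "BUS462 D100", "BUS462 D100")

lv_appearance <- levels(as_factor(section))  # order of first appearance
lv_sorted     <- levels(factor(section))     # alphabetical order

# If D100 should be the first (reference) level, move it to the front
section_f <- fct_relevel(as_factor(section), "BUS462 D100")
levels(section_f)
```

Changing the level order only flips the sign of the t statistic and the confidence interval; the p-value of a two-sided test is unaffected.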
I start by plotting participation, exam, and course marks for the whole course (both sections combined):
ggplot(data=mk) +
geom_histogram(mapping=aes(x=`Peer`), bins=10, fill="blue") +
ggtitle("Distribution of Net Participation")
ggplot(data=mk) +
geom_histogram(mapping=aes(x=`Adjusted Final Exam`), bins=10, fill="blue") +
ggtitle("Distribution of Adjusted Exam Mark")
ggplot(data=mk) +
geom_histogram(mapping=aes(x=`Course Mark`), bins=10, fill="blue") +
ggtitle("Distribution of Course Mark")
A question that may arise is whether the two sections performed the same. I can redo these plots stacking them by Section:
ggplot(data=mk) +
geom_histogram(mapping=aes(x=`Adjusted Final Exam`, y=after_stat(density)), bins=10, fill="blue") +
ggtitle("Distribution of Final Exam Mark") +
facet_wrap( ~ Section, nrow=2)
ggplot(data=mk) +
geom_histogram(mapping=aes(x=`Course Mark`, y=after_stat(density)), bins=10, fill="blue") +
ggtitle("Distribution of Course Mark") +
facet_wrap( ~ Section, nrow=2)
Or as boxplots:
ggplot(data=mk) +
geom_boxplot(mapping=aes(x=Section, y=`Adjusted Final Exam`))
ggplot(data=mk) +
geom_boxplot(mapping=aes(x=Section, y=`Course Mark`))
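The faceted histograms above repeat the same call with a different column each time. One way to cut the repetition is a small helper that takes the column name as a string. This is only a sketch: plot_by_group() is a hypothetical function that is not part of the original analysis, and the demo data frame is simulated rather than taken from the marks file.

```r
library(ggplot2)

# Hypothetical helper: density histogram of one column, one panel per group
plot_by_group <- function(data, column, group = "Section", bins = 10) {
  ggplot(data) +
    geom_histogram(aes(x = .data[[column]], y = after_stat(density)),
                   bins = bins, fill = "blue") +
    facet_wrap(vars(.data[[group]]), nrow = 2) +
    labs(title = paste("Distribution of", column), x = column)
}

# Simulated stand-in for the marks data frame (values are made up)
set.seed(3)
demo <- data.frame(
  Section = rep(c("BUS462 E100", "BUS462 D100"), each = 30),
  `Course Mark` = runif(60, 0.65, 0.95),
  check.names = FALSE  # keep the space in the column name
)

p <- plot_by_group(demo, "Course Mark")
```

Printing p draws the plot. The .data pronoun lets the column be chosen by name at run time, so column names containing spaces need no backticks.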
Unfortunately, the graphical analysis does not really tell us if the two sections are different. For this, we need a t-test. Before running the test, however, I need to know whether the variances of the two samples (D100 and E100) are the same or different:
var.test(`Adjusted Final Exam` ~ Section, mk)
##
## F test to compare two variances
##
## data: Adjusted Final Exam by Section
## F = 0.59292, num df = 29, denom df = 29, p-value = 0.1653
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.2822097 1.2457254
## sample estimates:
## ratio of variances
## 0.5929214
var.test(`Course Mark` ~ Section, mk)
##
## F test to compare two variances
##
## data: Course Mark by Section
## F = 1.0836, num df = 29, denom df = 29, p-value = 0.8303
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.5157477 2.2766050
## sample estimates:
## ratio of variances
## 1.083584
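The F statistic reported by var.test() is the ratio of the two sample variances, and the two-sided p-value is twice the smaller tail probability of the F distribution. The sketch below reproduces this by hand on simulated data (the vectors a and b are made up, not the course marks), then applies the same formula to the F statistic reported above for the final exam.

```r
set.seed(1)
# Simulated marks for two hypothetical sections of 30 students each
a <- rnorm(30, mean = 0.73, sd = 0.10)
b <- rnorm(30, mean = 0.73, sd = 0.13)

f_stat <- var(a) / var(b)   # ratio of sample variances
df1 <- length(a) - 1        # numerator degrees of freedom
df2 <- length(b) - 1        # denominator degrees of freedom

# Two-sided p-value: twice the smaller of the two tail probabilities
p_lower <- pf(f_stat, df1, df2)
p_value <- 2 * min(p_lower, 1 - p_lower)

# 95% confidence interval for the true ratio of variances
ci <- c(f_stat / qf(0.975, df1, df2), f_stat / qf(0.025, df1, df2))

# Built-in test for comparison
res <- var.test(a, b)

# Same formula applied to the reported final exam result (F = 0.5929214, df = 29, 29);
# compare with the reported p-value of 0.1653
p_doc <- 2 * pf(0.5929214, 29, 29)
```

Note that this F test assumes the marks in each section are roughly normally distributed.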
For both the final exam and the course mark, the p-values (0.17 and 0.83) are far too large to reject the null hypothesis that the two variances are equal. This does not prove the variances are equal, but it gives no reason to assume they differ.
This means I should use the equal variances version of the t-test:
t.test(`Adjusted Final Exam` ~ Section, mk, var.equal=TRUE)
##
## Two Sample t-test
##
## data: Adjusted Final Exam by Section
## t = -0.0093522, df = 58, p-value = 0.9926
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.05875318 0.05820673
## sample estimates:
## mean in group BUS462 E100 mean in group BUS462 D100
## 0.7316940 0.7319672
t.test(`Course Mark` ~ Section, mk, var.equal=TRUE)
##
## Two Sample t-test
##
## data: Course Mark by Section
## t = 0.485, df = 58, p-value = 0.6295
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.02522289 0.04135392
## sample estimates:
## mean in group BUS462 E100 mean in group BUS462 D100
## 0.8277204 0.8196549
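To show what var.equal=TRUE does, the sketch below computes the pooled-variance t statistic by hand and compares it with t.test(). The data are simulated (two made-up groups of 30), not the course marks. It also runs the default Welch test for comparison: with equal group sizes the two versions give the same t statistic and differ only in their degrees of freedom.

```r
set.seed(2)
# Simulated marks for two hypothetical sections of 30 students each
a <- rnorm(30, mean = 0.73, sd = 0.10)
b <- rnorm(30, mean = 0.73, sd = 0.12)
n1 <- length(a)
n2 <- length(b)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(a) + (n2 - 1) * var(b)) / (n1 + n2 - 2)

# t statistic: difference in means over its pooled standard error
t_stat <- (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df <- n1 + n2 - 2                      # 58 for two groups of 30
p_value <- 2 * pt(-abs(t_stat), df)    # two-sided p-value

pooled <- t.test(a, b, var.equal = TRUE)  # equal-variance version
welch  <- t.test(a, b)                    # default: Welch's unequal-variance version
```

Because both sections here have 30 students, the choice between the two versions has little effect on the conclusion.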
As the sample means show, the two sections performed almost identically this semester on both the final exam and the overall course mark, and neither t-test finds a significant difference (p = 0.99 and p = 0.63). The boxplots suggested some difference in spread, but the F tests show that this difference is not significant either.