Summary of Marks in Bus 462

Purpose

The purpose of this document is to:

examine the distribution of marks in Bus 462
confirm that no systematic differences exist between sections

Data preparation

I store all marks for the course in an Excel spreadsheet. This allows me to combine marks from different sources (Canvas and the Peer Evaluation system) and to conduct simple transformations to get the required distributions.

Load libraries and data

I use tidyverse functions in this analysis (filter, mutate, and ggplot), so I have to load the tidyverse library. I also want to use the read_excel function. The readxl package is installed with the tidyverse but is not attached by library(tidyverse), so it has to be loaded separately.

library(tidyverse)
library(readxl)
mk <- read_excel("<marks file>.xlsx", sheet = "Marks", range = "B2:N100") # just the part of the sheet with marks data
view(mk)

Clean and transform

I read all the way down to row 100 of my spreadsheet to ensure I do not miss any students, so I now need to remove the empty rows. To do this, I keep only the rows in which the “Student No” column is not blank, which I check in R with the is.na() function.

I then convert the “Section” column into a factor. This lets me use formula notation later when running the t-tests. It also changes what summary reports for that column: the number of students in each section.

Finally, I use the attach(mk) function to save myself some typing later. Calling attach tells R to look for column names in the mk (“marks”) data frame.

mk <- mk %>% filter(!is.na(`Student No`))
mk <- mk %>% mutate(Section = as_factor(Section))

summary(mk)

##    Student No            Name                  Section      Quizzes      
##  Min.   :301235277   Length:60          BUS462 E100:30   Min.   :0.4822  
##  1st Qu.:301274552   Class :character   BUS462 D100:30   1st Qu.:0.7119  
##  Median :301289749   Mode  :character                    Median :0.8373  
##  Mean   :301293852                                       Mean   :0.7988  
##  3rd Qu.:301308482                                       3rd Qu.:0.8960  
##  Max.   :301414781                                       Max.   :0.9732  
##  Power Assignments      Peer        Final Project    Adjusted Final Exam
##  Min.   :0.8187    Min.   :0.5735   Min.   :0.6250   Min.   :0.4098     
##  1st Qu.:0.9101    1st Qu.:0.7385   1st Qu.:0.8250   1st Qu.:0.6803     
##  Median :0.9371    Median :0.7924   Median :0.8250   Median :0.7377     
##  Mean   :0.9253    Mean   :0.7938   Mean   :0.8133   Mean   :0.7318     
##  3rd Qu.:0.9488    3rd Qu.:0.8558   3rd Qu.:0.8500   3rd Qu.:0.8033     
##  Max.   :0.9683    Max.   :0.9890   Max.   :0.9250   Max.   :0.9262     
##   Course Mark      Local Fudge Adjusted Course Mark
##  Min.   :0.6574   Min.   :0    Min.   :0.6574      
##  1st Qu.:0.7773   1st Qu.:0    1st Qu.:0.7773      
##  Median :0.8366   Median :0    Median :0.8366      
##  Mean   :0.8237   Mean   :0    Mean   :0.8237      
##  3rd Qu.:0.8718   3rd Qu.:0    3rd Qu.:0.8718      
##  Max.   :0.9336   Max.   :0    Max.   :0.9336

attach(mk)
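
The cleaning steps can be tried on a small made-up table. This is only a sketch: the toy tibble and its student numbers are assumptions for illustration, not the course data.

```r
library(dplyr)
library(forcats)

# Made-up stand-in for the spreadsheet range: four real rows plus two
# blank rows, as happens when reading past the last student.
toy <- tibble::tibble(
  `Student No` = c(1001, 1002, NA, 1003, 1004, NA),
  Section      = c("BUS462 E100", "BUS462 D100", NA,
                   "BUS462 E100", "BUS462 D100", NA)
)

toy_clean <- toy %>%
  filter(!is.na(`Student No`)) %>%      # keep rows that have a student number
  mutate(Section = as_factor(Section))  # levels in order of first appearance

nrow(toy_clean)            # number of rows left after dropping blanks
levels(toy_clean$Section)  # factor levels, in the order they first occur
```

Note that as_factor() keeps levels in the order they first appear, whereas base factor() sorts them alphabetically. That is consistent with the summary output above listing E100 before D100. The first level is also the first group in the formula-based tests later on.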

Graphical analysis

Whole course plots

I start by plotting peer (participation), exam, and course marks for the whole course (both sections combined):

ggplot(data=mk) +
  geom_histogram(mapping=aes(x=`Peer`), bins=10, fill="blue") +
  ggtitle("Distribution of Net Participation")

ggplot(data=mk) +
  geom_histogram(mapping=aes(x=`Adjusted Final Exam`), bins=10, fill="blue") +
  ggtitle("Distribution of Adjusted Exam Mark")

ggplot(data=mk) +
  geom_histogram(mapping=aes(x=`Course Mark`), bins=10, fill="blue") +
  ggtitle("Distribution of Course Mark")

By section

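A question that may arise is whether the two sections performed the same. I can redo the exam and course mark plots, split by Section with one panel stacked above the other and with density rather than count on the vertical axis: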

ggplot(data=mk) +
  geom_histogram(mapping=aes(x=`Adjusted Final Exam`, y=after_stat(density)), bins=10, fill="blue") +
  ggtitle("Distribution of Final Exam Mark") +
  facet_wrap( ~ Section, nrow=2)

ggplot(data=mk) +
  geom_histogram(mapping=aes(x=`Course Mark`, y=after_stat(density)), bins=10, fill="blue") +
  ggtitle("Distribution of Course Mark") +
  facet_wrap( ~ Section, nrow=2)

Or as boxplots:

ggplot(data=mk) +
  geom_boxplot(mapping=aes(x=Section, y=`Adjusted Final Exam`))

ggplot(data=mk) +
  geom_boxplot(mapping=aes(x=Section, y=`Course Mark`))
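
The boxplots can be read alongside a small table of per-section statistics. The following is a minimal sketch on made-up marks. The toy data and the summary column names (n_students, mean_mark, median_mark, sd_mark) are assumptions for illustration only.

```r
library(dplyr)

# Made-up marks for illustration only; not the course data.
toy <- tibble::tibble(
  Section       = factor(c("D100", "D100", "D100", "E100", "E100", "E100")),
  `Course Mark` = c(0.70, 0.80, 0.90, 0.75, 0.80, 0.85)
)

# One row per section: group size, centre, and spread
by_section <- toy %>%
  group_by(Section) %>%
  summarise(
    n_students  = n(),
    mean_mark   = mean(`Course Mark`),
    median_mark = median(`Course Mark`),
    sd_mark     = sd(`Course Mark`)
  )

by_section
```

In this toy data the two groups share a mean but differ in spread, which is the kind of pattern a boxplot shows as boxes of different heights around the same median.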

Statistical analysis

Unfortunately, the graphical analysis cannot tell us whether the two sections differ by more than chance. For this, we need a t-test. Before running the test, however, I need to know whether the variances of the two samples (D100 and E100) can be treated as equal:

var.test(`Adjusted Final Exam` ~ Section, mk)

## 
##  F test to compare two variances
## 
## data:  Adjusted Final Exam by Section
## F = 0.59292, num df = 29, denom df = 29, p-value = 0.1653
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.2822097 1.2457254
## sample estimates:
## ratio of variances 
##          0.5929214

var.test(`Course Mark` ~ Section, mk)

## 
##  F test to compare two variances
## 
## data:  Course Mark by Section
## F = 1.0836, num df = 29, denom df = 29, p-value = 0.8303
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.5157477 2.2766050
## sample estimates:
## ratio of variances 
##           1.083584
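
The F statistic in this output is the ratio of the two sample variances, with n - 1 degrees of freedom for each group (29 and 29 here, from 30 students per section). As a sketch of that arithmetic, the following reproduces var.test by hand on two made-up vectors. The vectors a and b are assumptions for illustration, not the course data.

```r
# Made-up marks for two groups; not the course data.
a <- c(0.62, 0.71, 0.75, 0.80, 0.88, 0.93)
b <- c(0.66, 0.70, 0.74, 0.77, 0.81, 0.84)

f_stat <- var(a) / var(b)   # ratio of sample variances
df1 <- length(a) - 1        # numerator degrees of freedom
df2 <- length(b) - 1        # denominator degrees of freedom

# Two-sided p-value: twice the smaller tail probability of the F distribution
p_lower <- pf(f_stat, df1, df2)
p_value <- 2 * min(p_lower, 1 - p_lower)

# Compare with the built-in test
builtin <- var.test(a, b)
all.equal(f_stat, unname(builtin$statistic))
all.equal(p_value, builtin$p.value)
```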

For both the final exam and the final course mark, the p-values (0.1653 and 0.8303) are too large to reject the null hypothesis that the variances are equal. This does not prove the variances are equal, but it gives no evidence against treating them as equal.

This means I should use the equal variances version of the t-test:

t.test(`Adjusted Final Exam` ~ Section, mk, var.equal=TRUE)

## 
##  Two Sample t-test
## 
## data:  Adjusted Final Exam by Section
## t = -0.0093522, df = 58, p-value = 0.9926
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.05875318  0.05820673
## sample estimates:
## mean in group BUS462 E100 mean in group BUS462 D100 
##                 0.7316940                 0.7319672

t.test(`Course Mark` ~ Section, mk, var.equal=TRUE)

## 
##  Two Sample t-test
## 
## data:  Course Mark by Section
## t = 0.485, df = 58, p-value = 0.6295
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.02522289  0.04135392
## sample estimates:
## mean in group BUS462 E100 mean in group BUS462 D100 
##                 0.8277204                 0.8196549
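
The equal variances t-test pools the two sample variances into one estimate and uses n1 + n2 - 2 degrees of freedom (58 here, from 30 + 30 - 2). The sketch below works through that calculation on made-up data and checks it against t.test. The vectors a and b are assumptions for illustration, not the course data.

```r
# Made-up marks for two groups; not the course data.
a <- c(0.62, 0.71, 0.75, 0.80, 0.88, 0.93)
b <- c(0.66, 0.70, 0.74, 0.77, 0.81, 0.84)

n1 <- length(a)
n2 <- length(b)

# Pooled variance: sample variances weighted by their degrees of freedom
sp2 <- ((n1 - 1) * var(a) + (n2 - 1) * var(b)) / (n1 + n2 - 2)

se     <- sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of the difference in means
t_stat <- (mean(a) - mean(b)) / se
df     <- n1 + n2 - 2
p_value <- 2 * pt(-abs(t_stat), df)      # two-sided p-value

# Compare with the built-in test
builtin <- t.test(a, b, var.equal = TRUE)
all.equal(t_stat, unname(builtin$statistic))
all.equal(p_value, builtin$p.value)
```

Without var.equal = TRUE, t.test runs the Welch version, which does not pool the variances and adjusts the degrees of freedom instead.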

Conclusions

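As we see from the sample means for both the final exam and the overall course mark, the two sections this semester performed almost identically, and the t-tests find no significant difference between them (p = 0.9926 for the final exam and p = 0.6295 for the course mark). There were some differences in variances (as shown by the boxplots), but the F tests show that these differences are not significant.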