This data analysis report evaluates relationships between formative assessments (i.e., low-stakes quizzes, reading assignments, and in-class polling questions) and a summative assessment (i.e., the Midterm Exam) in an Introductory Microbiology Lab course. In addition, scores on a Pre-Test summative assessment administered on the first day of class were compared to those of the Midterm Exam. Similar types of data analysis have been carried out in courses that I have taught at Miami Dade College, with the main goals of identifying areas in which students may be struggling and/or learning practices that may need to be modified. In addition, comparing scores in earlier formative examinations with recent ones will provide an initial assessment of the students' progress in the course.
The main questions that this data analysis report aims to answer are the following:
In addition, the other question of relevance at this point was:
This data analysis was performed with R version 3.2.5. To see the complete code used in this report, visit the GitHub repository of this data project here.
To begin this data analysis, the datasets for Reef Polling, Reading Assignment, Quiz, and Midterm Exam scores will be loaded. No information that can identify students was included in the datasets. Specific column indexes were stated in the code to exclude variables with no observations at this point of the semester.
# load datasets
## set working directory
setwd("C:/Users/Felix/Dropbox/Teaching/1_Miami-Dade College/Statistical assessments")
## load each Excel dataset, excluding the columns with student names
library(xlsx)
### dataset of Reef Polling's scores
rp <- read.xlsx ("MCB2010L-2015-3-903895-RF.xlsx", 1, colIndex = 2:5)
### dataset of Reading Assignments' scores
ra <- read.xlsx ("MCB2010L-2015-3-RAs.xlsx", 1, colIndex = 3:7)
### dataset of Quizzes' scores
qz <- read.xlsx ("MCB2010L-2015-3-quizzes.xlsx", 1, colIndex = 2:6)
### dataset of Midterm Exams' Scores
ex <- read.xlsx ("MCB2010L-2015-3-903895-exams.xlsx", 1, colIndex = 2)
## examine the dataset: variables' names and classes
str (rp); str (ra); str (qz); str (ex)
## 'data.frame': 9 obs. of 4 variables:
## $ RF1: num 33 66 NA NA 33 66 66 100 66
## $ RF2: num 33 NA NA 75 33 0 33 66 66
## $ RF3: num 0 50 50 NA 50 50 100 100 100
## $ RF4: num 0 80 NA NA NA 80 50 75 25
## 'data.frame': 10 obs. of 5 variables:
## $ RA2: num 65 95 NA 65 30 ...
## $ RA3: num 70 60 70 80 20 ...
## $ RA4: num 80 80 NA NA 40 ...
## $ RA5: num 60 NA NA NA NA 90 60 60 NA 67.5
## $ RA6: num 80 100 NA NA NA 70 70 90 70 80
## 'data.frame': 10 obs. of 5 variables:
## $ Quiz1 : num 70 NA 40 50 NA 70 90 60 100 80
## $ Quiz2 : num 90 90 70 NA NA 65 120 85 100 120
## $ Quiz3 : num 40 60 NA 30 NA 30 40 60 100 NA
## $ Quiz4.: num 60 80 NA NA NA 20 80 100 100 100
## $ Quiz5 : num 40 100 NA NA NA NA 90 70 70 NA
## 'data.frame': 6 obs. of 1 variable:
## $ Midterm: num 52 92 76 44 96 84
After an initial evaluation, the classes for each variable within all of the datasets are correctly coded as numeric.
Given that all variables were correctly coded, a column with the mean score for each assessment will be added to each dataset; this won’t be done for the Midterm Exam dataset because it only contains one column. In addition, observations that correspond to students who are not currently enrolled or who did not take the Midterm Exam were removed. After creating the mean column in each dataset, it will be extracted and merged into a single data frame.
#--------------------------------------
# Clean the datasets, then create one dataset with all assessments and one tidy dataset
## Remove rows containing Ferreira, Fontes, Gonzalez, Lopez in
## Reef Polling, Reading Assignments, and Quizzes
rp1 <- rp[-c(3:5),]; ra1 <- ra[-c(3:5, 10),]; qz1 <- qz[-c(3:6),]
## calculate row means for Reef Polling, Reading Assignments, and Quizzes
rp1$avgrp <- apply (rp1, 1, mean, na.rm=TRUE)
ra1$avgra <- apply (ra1, 1, mean, na.rm=TRUE)
qz1$avgqz <- apply (qz1, 1, mean, na.rm=TRUE)
## Combine means of all assessments
assessmavg <- cbind(rp1$avgrp, ra1$avgra, qz1$avgqz, ex)
head (assessmavg)
## rp1$avgrp ra1$avgra qz1$avgqz Midterm
## 1 16.50000 71.00 60.0 52
## 2 65.33333 83.75 82.5 92
## 3 49.00000 76.00 84.0 76
## 4 62.25000 64.00 75.0 44
## 5 85.25000 86.00 94.0 96
## 6 64.25000 70.00 100.0 84
## Rename variables
names(assessmavg) <- c("RP", "RAs", "QZs", "MT")
## tidy dataset with all assessments
library (tidyr)
assessmtidy <- gather (assessmavg, Assessments, Scores, RP:MT)
## re-order the Assessments variable to plot them with
## RAs first, RP second, QZs third, and MT fourth
assessmtidy$Assessments <- factor (assessmtidy$Assessments, ordered = TRUE,
levels = c("RAs", "RP", "QZs", "MT"))
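As a quick cross-check of the row-mean step above, the following sketch rebuilds the Reef Polling data frame from the values printed in the `str()` output and recomputes the per-student averages. It uses `rowMeans()`, which should give the same result as `apply(..., 1, mean, na.rm = TRUE)` on an all-numeric data frame. The object names (`rp_demo`, `rp_demo1`, `avg_rowmeans`, `avg_apply`) are illustrative assumptions and are not part of the original script.

```r
# Reef Polling scores copied from the str() output above (9 students, 4 polls)
rp_demo <- data.frame(
  RF1 = c(33, 66, NA, NA, 33, 66, 66, 100, 66),
  RF2 = c(33, NA, NA, 75, 33, 0, 33, 66, 66),
  RF3 = c(0, 50, 50, NA, 50, 50, 100, 100, 100),
  RF4 = c(0, 80, NA, NA, NA, 80, 50, 75, 25)
)

# Drop rows 3 to 5, as in the cleaning step of the report
rp_demo1 <- rp_demo[-c(3:5), ]

# Per-student mean that ignores missing polls (two equivalent ways)
avg_rowmeans <- rowMeans(rp_demo1, na.rm = TRUE)
avg_apply <- apply(rp_demo1, 1, mean, na.rm = TRUE)

print(round(avg_rowmeans, 2))
```

The means are computed before any average column is attached to the data frame, so the new column cannot leak into its own calculation.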
An initial comparison of the descriptive statistics for each assessment will be performed.
# Exploratory Analysis
## descriptive statistics for all assessments
summary(assessmavg)
## RP RAs QZs MT
## Min. :16.50 Min. :64.00 Min. : 60.00 Min. :44
## 1st Qu.:52.31 1st Qu.:70.25 1st Qu.: 76.88 1st Qu.:58
## Median :63.25 Median :73.50 Median : 83.25 Median :80
## Mean :57.10 Mean :75.12 Mean : 82.58 Mean :74
## 3rd Qu.:65.06 3rd Qu.:81.81 3rd Qu.: 91.50 3rd Qu.:90
## Max. :85.25 Max. :86.00 Max. :100.00 Max. :96
From the descriptive examination above, all the descriptive statistics are lowest for the Reef Polling question scores. Interestingly, the 1st Quartiles of the Reef Polling (52.31) and Midterm Exam (58.00) scores are comparable and are the lowest among the four assessments. On the other hand, the 3rd Quartiles of the Quizzes (91.50) and Midterm Exam (90.00) scores are similar and are the highest among the four assessments. This suggests that the distribution of Midterm Exam scores will overlap the distributions of scores from all other assessments, and that the distribution of Reef Polling scores will be lower than those of the Reading Assignments and Quizzes.
To confirm the differences in distributions mentioned above, each assessment will be compared in a boxplot, with scores plotted in the following order: RAs, RP, QZs, and MT.
As expected, Figure 1 shows that the distribution of the scores in the Reef Polling questions was the lowest and that it was overlapped only by scores in the Midterm Exam. This means that at least 75% of the scores in the Reading Assignments and the Quizzes were higher than at least 75% of the scores in the Reef Polling questions.
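The figure and its plotting code are not reproduced in this section. The following is a minimal sketch of how a comparable boxplot could be drawn in base R from the six per-student means printed in the `head (assessmavg)` output, together with a numeric check of the quartile comparison described above. The object names (`avg_demo`, `rp_q3`, `ra_q1`, `qz_q1`) are assumptions for illustration, and the original report may have used a different plotting function.

```r
# Per-student mean scores copied from the head(assessmavg) output
avg_demo <- data.frame(
  RP  = c(16.5, 65.33333, 49, 62.25, 85.25, 64.25),
  RAs = c(71, 83.75, 76, 64, 86, 70),
  QZs = c(60, 82.5, 84, 75, 94, 100),
  MT  = c(52, 92, 76, 44, 96, 84)
)

# Boxplot with the assessments in the order used in the report
boxplot(avg_demo[, c("RAs", "RP", "QZs", "MT")],
        xlab = "Assessments", ylab = "Scores")

# Quartiles behind the comparison in the text
rp_q3 <- quantile(avg_demo$RP, 0.75)   # 3rd quartile of Reef Polling
ra_q1 <- quantile(avg_demo$RAs, 0.25)  # 1st quartile of Reading Assignments
qz_q1 <- quantile(avg_demo$QZs, 0.25)  # 1st quartile of Quizzes

# TRUE when the lower quartile of RAs and QZs lies above the upper quartile of RP
print(c(RAs = unname(ra_q1 > rp_q3), QZs = unname(qz_q1 > rp_q3)))
```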
Due to the small sample size, a non-parametric test will be performed to determine whether there are statistically significant differences between the median scores of the assessments.
##
## Kruskal-Wallis rank sum test
##
## data: Scores by Assessments
## Kruskal-Wallis chi-squared = 4.3488, df = 3, p-value = 0.2262
After performing a Kruskal-Wallis rank sum test, it was determined that there were no statistically significant differences (p = 0.23) between the medians of the four assessments. Although no statistical significance was found, it is important to note that the distribution of scores was higher for Reading Assignments and Quizzes when compared to Reef Polling questions.
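The call that produced the test output is not shown in this section. A minimal sketch, assuming the test was run on the tidy dataset with `Scores` as the response and `Assessments` as the grouping variable (as the output header "Scores by Assessments" suggests), is given below. The data are the per-student means printed earlier, and the names `scores_demo` and `kw` are illustrative.

```r
# Tidy (long) version of the per-student means printed in head(assessmavg)
scores_demo <- data.frame(
  Scores = c(
    71, 83.75, 76, 64, 86, 70,                # RAs
    16.5, 65.33333, 49, 62.25, 85.25, 64.25,  # RP
    60, 82.5, 84, 75, 94, 100,                # QZs
    52, 92, 76, 44, 96, 84                    # MT
  ),
  Assessments = factor(
    rep(c("RAs", "RP", "QZs", "MT"), each = 6),
    levels = c("RAs", "RP", "QZs", "MT")
  )
)

# Kruskal-Wallis rank sum test of Scores across the four assessments
kw <- kruskal.test(Scores ~ Assessments, data = scores_demo)
print(kw)
```

With four groups the test has three degrees of freedom, which matches the `df = 3` in the output above.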
To answer the questions stated in the summary section, linear regression models will be built. From Figure 1, the expectation is that linear regression models that include Reading Assignments and/or Quizzes may yield a better fit than those that include Reef Polling questions.
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| assessmavg$RAs | 2.135 | 0.6731 | 3.171 | 0.03382 |
| (Intercept) | -86.37 | 50.84 | -1.699 | 0.1646 |

| Observations | Residual Std. Error | \(R^2\) | Adjusted \(R^2\) |
|---|---|---|---|
| 6 | 12.78 | 0.7154 | 0.6443 |

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| assessmavg$RP | 0.5888 | 0.3602 | 1.635 | 0.1775 |
| (Intercept) | 40.38 | 21.92 | 1.842 | 0.1392 |

| Observations | Residual Std. Error | \(R^2\) | Adjusted \(R^2\) |
|---|---|---|---|
| 6 | 18.55 | 0.4005 | 0.2506 |

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| assessmavg$QZs | 1.147 | 0.493 | 2.327 | 0.08051 |
| (Intercept) | -20.75 | 41.21 | -0.5035 | 0.6411 |

| Observations | Residual Std. Error | \(R^2\) | Adjusted \(R^2\) |
|---|---|---|---|
| 6 | 15.62 | 0.5752 | 0.4689 |

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| assessmavg$RAs | 1.666 | 0.326 | 5.109 | 0.01451 |
| assessmavg$QZs | 0.7937 | 0.1954 | 4.061 | 0.02692 |
| (Intercept) | -116.7 | 24.21 | -4.82 | 0.01701 |

| Observations | Residual Std. Error | \(R^2\) | Adjusted \(R^2\) |
|---|---|---|---|
| 6 | 5.789 | 0.9562 | 0.927 |

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| assessmavg$RAs | 1.8 | 0.2786 | 6.463 | 0.02311 |
| assessmavg$QZs | 1.101 | 0.2503 | 4.398 | 0.048 |
| assessmavg$RP | -0.2569 | 0.1616 | -1.59 | 0.2528 |
| (Intercept) | -137.5 | 23.66 | -5.812 | 0.02835 |

| Observations | Residual Std. Error | \(R^2\) | Adjusted \(R^2\) |
|---|---|---|---|
| 6 | 4.712 | 0.9807 | 0.9516 |
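The model-fitting code is not shown in this section. The following sketch, under the assumption that the Midterm Exam mean (`MT`) was the response in every model, refits two of the models above with `lm()` on the per-student means printed earlier. The row labels in the tables suggest the original calls referenced columns as `assessmavg$RAs`; the `data =` form used here is a stylistic substitution, and the names `avg_demo`, `fit_ra`, and `fit_ra_qz` are illustrative.

```r
# Per-student mean scores copied from the head(assessmavg) output
avg_demo <- data.frame(
  RP  = c(16.5, 65.33333, 49, 62.25, 85.25, 64.25),
  RAs = c(71, 83.75, 76, 64, 86, 70),
  QZs = c(60, 82.5, 84, 75, 94, 100),
  MT  = c(52, 92, 76, 44, 96, 84)
)

# Midterm Exam regressed on Reading Assignments only
fit_ra <- lm(MT ~ RAs, data = avg_demo)

# Midterm Exam regressed on Reading Assignments and Quizzes
fit_ra_qz <- lm(MT ~ RAs + QZs, data = avg_demo)

# Coefficient tables and R-squared values to compare with the tables above
print(round(coef(summary(fit_ra)), 4))
print(round(coef(summary(fit_ra_qz)), 4))
print(round(c(RAs = summary(fit_ra)$r.squared,
              RAs_QZs = summary(fit_ra_qz)$r.squared), 4))
```

With only six observations, each added predictor uses up a large share of the residual degrees of freedom, so the adjusted \(R^2\) is the more cautious figure to compare across models.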
## Question 2: Is there an increase in scores from the Pre-Test to the Midterm Exam?
### load dataset
pt <- read.xlsx ("MCB2010L-2015-3-pretest.xlsx", 1, colIndex= 2)
### remove observations from students not currently enrolled
pt1 <- pt[-c(3:6),]
### combine the scores of the Midterm Exam and Pre-Test into a dataframe
ptmt <- data.frame (assessmavg$MT, pt1)
### Rename the variables
names (ptmt) <- c("MT", "PT")
### summary statistics
summary (ptmt)
## MT PT
## Min. :44 Min. :61.50
## 1st Qu.:58 1st Qu.:71.12
## Median :80 Median :76.90
## Mean :74 Mean :78.18
## 3rd Qu.:90 3rd Qu.:88.45
## Max. :96 Max. :92.30
From the descriptive statistics above for the Pre-Test and the Midterm Exam, only the minimum and the 1st Quartile show a marked decrease in the Midterm Exam. This suggests that the distribution of scores was more spread out in the Midterm Exam than in the Pre-Test. To further evaluate this observation, the distributions of Pre-Test and Midterm Exam scores will be plotted.
As seen in the descriptive statistics, the distribution of scores is wider in the Midterm Exam, but not higher (Figure 2). This finding suggests that some students obtained lower scores in the Midterm Exam compared to the Pre-Test while others obtained comparable ones: at least 75% of the scores in the Pre-Test were higher than 65, which was not the case in the Midterm Exam.
In this data analysis report, the relationship between the Midterm Exam, which was administered to students recently, and the formative assessments (i.e., low-stakes polling questions, reading assignments, and quizzes) included in the learning plans of a Microbiology Lab course was evaluated. In addition, the differences between the Midterm Exam and a Pre-Test administered on the first day of the course were examined. In-class polling questions had the lowest scores among all assessments, which suggests that students may experience difficulties in correctly answering these questions in class. Nevertheless, the purpose of these questions is for the professor and the students to identify areas in which they may need more work, and thus it may make sense for students to score lower on these questions. On the other hand, it seems that Reading Assignments and Quizzes may be serving their purpose as knowledge rehearsal tools, given that the linear regression model in which both of these variables were included showed a high predictive value. Lastly, Midterm Exam scores may suggest that the progress of some students may need to be revisited given that