Note- We generally work through a similar analysis of test results with the students as a way of demonstrating many of the techniques of statistical analysis and data visualisation that are relevant to their course materials. As such, several parts of this report also have didactic value.
The AS Psychology Midterm examination consisted of 15 questions in 5 categories, worth a total of 50 points, covering Research Methods and the three core studies of “The Biological Approach”. The questions and their point values are shown in the table below:
```r
## Knitted table of questions and point values
library(dplyr)       # %>% pipe
library(kableExtra)  # kable_styling(), pack_rows(), save_kable()

QuestionsTable <-
  MidtermGradesA %>%
  head(15) %>%                                   # the first 15 rows hold one row per question
  subset(select = c(QNum, Question, Value)) %>%
  setNames(c("Number", "Question", "Value")) %>%
  knitr::kable(caption = "AS Psychology- Questions and Values", format = "html", row.names = FALSE) %>%
  kable_styling(full_width = FALSE) %>%
  # Group the rows by section, each with a dark red header row
  pack_rows("The Biological Approach (5 Marks Total)", 1, 2, label_row_css = "background-color: #8b0000; color: #fff;") %>%
  pack_rows("Canli et al. (10 Marks Total)", 3, 7, label_row_css = "background-color: #8b0000; color: #fff;") %>%
  pack_rows("Dement & Kleitman (16 Marks Total)", 8, 10, label_row_css = "background-color: #8b0000; color: #fff;") %>%
  pack_rows("Schachter & Singer (8 Marks Total)", 11, 12, label_row_css = "background-color: #8b0000; color: #fff;") %>%
  pack_rows("Research Methods (11 Marks Total)", 13, 15, label_row_css = "background-color: #8b0000; color: #fff;")
save_kable(QuestionsTable, "QuestionsTable.png")
QuestionsTable
```
| Number | Question | Value |
|---|---|---|
| The Biological Approach (5 Marks Total) | ||
| 1A | Identify two differences between a neurotransmitter and a hormone | 2 |
| 1B | Explain one assumption of the biological approach, including an example that illustrates it | 3 |
| Canli et al. (10 Marks Total) | ||
| 2AI | | 2 |
| 2AII | | 1 |
| 2B | Explain one weakness of using brain scans in this study | 2 |
| 2C | Outline the scale that was used by participants to indicate their emotional arousal to each picture | 2 |
| 2D | Describe what happened to a participant during the ‘recognition test’ in this study | 3 |
| Dement & Kleitman (16 Marks Total) | ||
| 3A | Explain why Study 3 from Dement & Kleitman was not an experiment | 2 |
| 3B | Explain why Dement + Kleitman suggested that dreams do not occur during nREM sleep | 4 |
| 3C | Evaluate this study in terms of two strengths and two weaknesses. At least one of your evaluation points must be about extraneous variables | 10 |
| Schachter & Singer (8 Marks Total) | ||
| 4A | Explain what is meant by the ‘physiological component’ of an emotional response, using an example | 2 |
| 4B | Alan and June are discussing the ethics of the study by Schachter & Singer. Alan thinks the study is ethical, but June thinks it is unethical. Explain one reason why Alan is correct and one reason by June is correct, using evidence from the study | 6 |
| Research Methods (11 Marks Total) | ||
| 5A | Outline what is meant by the term ‘generalisation’ in psychology | 2 |
| 5B | From each of the studies in the biological approach, identify one issue related to generalization from the study | 3 |
| 5C | Discuss the benefits and problems of making generalisations based on psychological research | 6 |
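As a quick consistency check on the table, the point values can be summed by section in base R. This is a minimal sketch: the vector is typed in by hand from the table rather than read from `MidtermGradesA`, and grouping questions by the leading digit of their number is an assumption about how the numbering is organised.

```r
# Point values copied by hand from the question table above
values <- c("1A" = 2, "1B" = 3,
            "2AI" = 2, "2AII" = 1, "2B" = 2, "2C" = 2, "2D" = 3,
            "3A" = 2, "3B" = 4, "3C" = 10,
            "4A" = 2, "4B" = 6,
            "5A" = 2, "5B" = 3, "5C" = 6)

# The leading digit of each question number identifies its section
section <- substr(names(values), 1, 1)

# Marks available per section, and for the whole paper
section_totals <- tapply(values, section, sum)
print(section_totals)
print(sum(values))
```

The section totals should match the headers in the table, and the paper total should match the 50 points stated above.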
Generally, the first thing we want to look at in one of these analyses is the overall distribution of grades, which is shown in the histogram below. (Recall that you can tell this is a histogram rather than a bar chart because the dependent variable (grades) is on the x axis, divided into continuous bins.)
```r
# Single histogram of overall midterm grades
library(ggplot2)
library(here)

ggplot(data = MidtermTotals, aes(x = Grade)) +
  geom_histogram(aes(y = after_stat(density)), alpha = 0.6, position = "identity") +
  labs(x = "Grade (Percentage)", y = "Density") +
  # Overlay a normal curve using the mean and SD of the same variable that is plotted
  stat_function(fun = dnorm,
                args = list(mean = mean(MidtermTotals$Grade), sd = sd(MidtermTotals$Grade)),
                color = "black", linewidth = 1.4) +
  scale_x_continuous(limits = c(0, 100)) +
  theme_alan() +
  ggtitle("Histogram of Grades in AS Psychology Midterm")

ggsave(here("Midterm 1 Histogram.png"), plot = last_plot(),
       width = 8, height = 4, units = "in", dpi = 600)
```
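The first thing we can note is that the distribution is very close to normal (Shapiro-Wilk normality test: W = 0.99, p = 0.998; a p-value this high means there is no evidence of a departure from normality). In most methods of assessment, a normal distribution of grades is ultimately desirable.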
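For students who want to try this test themselves, a minimal base-R sketch follows. The simulated vector `grades` is a stand-in for `MidtermTotals$Grade` (illustration only, not the real data), and `interpret_sw` is a hypothetical helper that phrases the result cautiously.

```r
# Simulated grades stand in for MidtermTotals$Grade (illustration only, not the real data)
set.seed(1)
grades <- rnorm(30, mean = 60, sd = 18)

sw <- shapiro.test(grades)
sw$statistic   # W: values close to 1 indicate a shape close to normal
sw$p.value     # chance of a W this far from 1 if the population really were normal

# Hypothetical helper: a high p-value is "no evidence against normality", not proof of it
interpret_sw <- function(p, alpha = 0.05) {
  if (p < alpha) "evidence of non-normality" else "no evidence against normality"
}
interpret_sw(sw$p.value)
```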
Overall performance was around what we expected, with a mean score of 60.69% and a standard deviation of 17.9 percentage points (range = 18% to 98%).
Obviously a grade of 18% was far too low, while we were surprised to see a student score as high as 98%; this student had clearly put an incredible amount of work into studying for the exam.
Next we looked at the test on a question-by-question basis, which you can see in the bar chart below. (Recall that you can tell this is a bar chart and not a histogram because the dependent variable (grades) is on the y axis, and because there are gaps between the levels of the factor (question number) on the x axis, which means they represent separate, discrete categories.)
```r
# Per-question mean, SD and standard error for each class
MidtermSummary <- Rmisc::summarySE(MidtermGrades, measurevar = "Grade",
                                   groupvars = c("Teacher", "QNum"))

# Dodge bars and error bars by the same amount
# (assumed value: `pd` is not defined in the code shown in this report)
pd <- position_dodge(width = 0.8)

ggplot(data = MidtermSummary, aes(x = QNum, y = Grade, fill = Teacher)) +
  geom_col(position = pd, width = 0.8) +   # values are already class means
  geom_errorbar(aes(ymin = Grade - se, ymax = Grade + se),
                width = 0.2, linewidth = 1, position = pd) +
  labs(x = "Question", y = "Average Grade (Value)") +
  theme_alan() +
  scale_fill_manual(values = c("darkred", "darkgrey")) +
  ggtitle("Bar Plot of Grades on AS Psychology Midterm")

ggsave(here("MidtermGrades.png"), plot = last_plot(),
       width = 8, height = 4, units = "in", dpi = 600)
```
This is one way to visualise the results, but without a point of reference it isn’t very informative. What it does reveal, however, is a difference in performance between June’s and Alan’s classes. We will return to question-by-question performance later in this report, but for now let’s explore the differences between the classes.
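The error bars in the bar chart above run from one standard error below each mean to one standard error above it. `Rmisc::summarySE` returns, among other columns, `N`, `sd` and `se`, where `se` is the standard deviation divided by sqrt(N). The base-R sketch below reproduces that calculation on a small made-up data set; the column names mirror `MidtermGrades`, but every value is hypothetical.

```r
# Hypothetical long-format scores: one row per student per question (not the real data)
set.seed(2)
scores <- data.frame(
  Teacher = rep(c("Alan", "June"), each = 10),
  QNum    = rep(c("1A", "1B"), times = 10),
  Grade   = sample(0:2, 20, replace = TRUE)
)

# One vector of scores per Teacher x Question cell
cells <- split(scores$Grade, list(scores$Teacher, scores$QNum))

# Mean and standard error of the mean for each cell
summ <- data.frame(
  cell = names(cells),
  N    = vapply(cells, length, integer(1)),
  mean = vapply(cells, mean, numeric(1)),
  se   = vapply(cells, function(x) sd(x) / sqrt(length(x)), numeric(1))
)

# Each error bar spans mean - se to mean + se
summ$lower <- summ$mean - summ$se
summ$upper <- summ$mean + summ$se
summ
```

Because the standard error shrinks as N grows, bars for small classes will tend to have wider error bars even when the spread of scores is similar.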
Most usefully, we can look at overall performance differences between the two classes, then return to any notable differences on individual questions. You can see the distribution of grades in each class in the histogram below:
```r
# Histograms by class
JuneMTGrades <- subset(MidtermTotals, Teacher == "June")
AlanMTGrades <- subset(MidtermTotals, Teacher == "Alan")

ggplot(data = MidtermTotals, aes(x = Grade, fill = Teacher)) +
  geom_histogram(aes(y = after_stat(density)), alpha = 0.6, position = "identity") +
  labs(x = "Grade (Percentage)", y = "Density") +
  # One normal curve per class, colored to match that class's bars
  stat_function(fun = dnorm, args = list(mean = mean(JuneMTGrades$Grade), sd = sd(JuneMTGrades$Grade)),
                color = "darkgrey", linewidth = 1.2) +
  stat_function(fun = dnorm, args = list(mean = mean(AlanMTGrades$Grade), sd = sd(AlanMTGrades$Grade)),
                color = "darkred", linewidth = 1.2) +
  scale_fill_manual(values = c("darkred", "darkgrey")) +
  scale_x_continuous(limits = c(0, 100)) +
  theme_alan() +
  ggtitle("Histogram of Grades in AS Psychology Midterm - Comparison of Classes")

ggsave(here("Midterm 1 Class Histograms.png"), plot = last_plot(),
       width = 8, height = 4, units = "in", dpi = 600)

# Welch two-sample t-test (t.test's default, var.equal = FALSE);
# warning/message are knitr chunk options, not t.test() arguments
testcomp <- t.test(JuneMTGrades$Grade, AlanMTGrades$Grade)
```
We can observe two main things from the above histogram. First, June’s class had a much wider range of performance, from a low of 18% to a high of 98%, while Alan’s class had a narrower range of 34% to 76%; Alan’s students were in fact tightly clustered around their mean. Although it’s hard to draw conclusions about normality from very small data sets, the distribution of grades in Alan’s class (Shapiro-Wilk normality test: W = 0.98, p = 0.924) appears closer to normal than that in June’s class (Shapiro-Wilk normality test: W = 0.89, p = 0.097), which can also be seen in the histograms.
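Why is it hard to judge normality from small data sets? A short simulation illustrates the point: it repeatedly draws samples from a strongly skewed (exponential) distribution and counts how often `shapiro.test()` correctly flags them as non-normal. The sample sizes (12 and 100) and the exponential distribution are arbitrary choices for illustration, not properties of these classes.

```r
set.seed(3)

# Share of simulated samples in which shapiro.test() rejects normality at alpha = 0.05.
# Samples come from an exponential distribution, which is strongly skewed,
# so every rejection here is a correct one.
rejection_rate <- function(n, reps = 500, alpha = 0.05) {
  p_values <- replicate(reps, shapiro.test(rexp(n))$p.value)
  mean(p_values < alpha)
}

small <- rejection_rate(12)    # a small, class-sized sample
large <- rejection_rate(100)
c(n_12 = small, n_100 = large)
```

With small samples the test misses genuinely non-normal data much more often, so a non-significant Shapiro-Wilk result in a class of this size is weak evidence of normality.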
The second is that the distribution of grades in June’s class is shifted to the right. In other words, June’s class performed significantly better on the test (M = 69.1, SD = 21.2) than Alan’s class did (M = 53.9, SD = 11.3; Welch’s t(17.4) = 2.33, p = 0.03).
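The fractional degrees of freedom (17.4) appear because `t.test()` runs Welch’s test by default, which does not assume the two classes have equal variances (relevant here, with SDs of 21.2 and 11.3). The sketch below computes Welch’s t and the Welch-Satterthwaite degrees of freedom by hand and compares them with `t.test()`; the two simulated classes are hypothetical stand-ins, not the real grades.

```r
set.seed(4)
# Hypothetical stand-ins for the two classes' grades (not the real data)
june <- rnorm(12, mean = 69, sd = 21)
alan <- rnorm(15, mean = 54, sd = 11)

# Welch's t statistic and Welch-Satterthwaite degrees of freedom, computed by hand
welch <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  t  <- (mean(x) - mean(y)) / sqrt(vx + vy)
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  c(t = t, df = df)
}

manual  <- welch(june, alan)
builtin <- t.test(june, alan)   # var.equal = FALSE (Welch) is the default
rbind(manual = manual, t.test = c(builtin$statistic, builtin$parameter))
```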
We could explore a number of reasons why such a difference might be observed, but the first we should try to rule out is the possibility that participant variables alone account for it, i.e. that June’s class simply has stronger students than Alan’s class.
We can do this first by correlating performance on this midterm with last year’s (S1) grades for the students in each class. (Note that we cannot make any causal claims here, because we are simply correlating two co-variables; we haven’t manipulated an independent variable as we would in an experiment.) NB- These “S1 grades” are based on grade boundaries and lack much distinction: e.g. every grade of “C” was scored as 65, every “B” as 75, and so on. Exploring the actual grades might reveal more interesting trends! (Students should think especially about what this means whenever we use constructs like rating scales in studies, and especially when we average scores across multiple scales: how much data are we losing?)
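The question of how much data we lose by coarsening can be explored with a small simulation. The sketch below invents continuous scores for two related assessments, then collapses one of them into letter-grade bands each scored at a single value (like the S1 grades), and compares the correlations. The bands, the cut points, and all of the data are assumptions made for illustration.

```r
set.seed(5)
n <- 200

# Hypothetical scores on two assessments that share an underlying ability
ability  <- rnorm(n)
prev_raw <- 80 + 8 * ability + rnorm(n, sd = 4)
midterm  <- 60 + 15 * ability + rnorm(n, sd = 8)

# Collapse the previous-year scores into letter-grade bands, each scored at a single value
# (assumed bands, e.g. every "C" becomes 65 and every "B" becomes 75)
prev_coarse <- as.numeric(as.character(
  cut(prev_raw, breaks = c(-Inf, 70, 80, 90, Inf),
      labels = c(65, 75, 85, 95), right = FALSE)
))

c(continuous = cor(prev_raw, midterm),
  coarsened  = cor(prev_coarse, midterm))
```

In runs like this the coarsened correlation usually comes out somewhat lower, because every student within a band is treated as identical.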
```r
# Attach each student's 2019 average, matched by name
# (students with no 2019 record end up as NA after as.numeric())
PrevYearGrades <-
  MidtermTotals %>%
  mutate(PrevYear = as.numeric(plyr::mapvalues(Student,
                                               from = Grades2019$Name,
                                               to   = Grades2019$Average))) %>%
  subset(Student != "Echo Lin")

JuneMT <- subset(PrevYearGrades, Teacher == "June")
AlanMT <- subset(PrevYearGrades, Teacher == "Alan")

# Pearson correlations, rounded for display.
# Labels use a plain "=" because annotate() is not parsing plotmath here.
JuneCor <- round(cor.test(JuneMT$Grade, JuneMT$PrevYear)$estimate, 2)
JuneCorLabel <- paste0("June's Class: r = ", JuneCor)
AlanCor <- round(cor.test(AlanMT$Grade, AlanMT$PrevYear)$estimate, 2)
AlanCorLabel <- paste0("Alan's Class: r = ", AlanCor)
Cor <- round(cor.test(PrevYearGrades$Grade, PrevYearGrades$PrevYear)$estimate, 2)
CorLabel <- paste0("Overall r = ", Cor)

ggplot(data = PrevYearGrades, aes(x = Grade, y = PrevYear)) +
  geom_point(aes(color = Teacher), size = 2) +
  # One regression line per class, plus a black line for both classes combined
  geom_smooth(method = lm, formula = y ~ x, aes(color = Teacher), se = FALSE, linewidth = 1.2) +
  geom_smooth(method = lm, formula = y ~ x, color = "black", se = FALSE) +
  scale_color_manual(values = c("darkred", "darkgrey")) +
  annotate("text", x = 70, y = 65, label = "Correlations", hjust = 0, fontface = "bold") +
  annotate("text", x = 70, y = 61, label = JuneCorLabel, hjust = 0, color = "darkgrey") +
  annotate("text", x = 70, y = 57, label = AlanCorLabel, hjust = 0, color = "darkred") +
  annotate("text", x = 70, y = 53, label = CorLabel, hjust = 0, color = "black") +
  scale_x_continuous(limits = c(25, 100)) +
  scale_y_continuous(limits = c(50, 100)) +
  labs(x = "Midterm Grade", y = "2019 Average") +
  theme_alan() +
  ggtitle("Scatterplot of Test Scores (Midterm vs 2019 Average) in AS Psychology")

ggsave(here("ASCorrelationMidterm2.png"), plot = last_plot(),
       width = 7, height = 6, units = "in", dpi = 600)
```
We can see a strong overall correlation here of r = 0.7, shown by the black line above. (Recall that the strength of a correlation reflects how closely the individual points cluster around this line; here the black line cuts neatly through our data.) This is a strong positive correlation: students who performed better in their 2019 classes also performed better on the AS Psychology Midterm.
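To make “how closely the points cluster around the line” concrete: Pearson’s r is the covariance of the two variables scaled by both standard deviations, and for a straight-line fit r squared is the proportion of the variance in one variable that the line accounts for (so r = 0.7 corresponds to about 49%). A minimal sketch on hypothetical data:

```r
set.seed(6)
# Hypothetical data, not the real grades
x <- rnorm(25, mean = 60, sd = 15)          # midterm grades
y <- 50 + 0.4 * x + rnorm(25, sd = 5)       # previous-year averages

# Pearson's r: covariance scaled by both standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))

# For a simple straight-line fit, R-squared equals r^2:
# the share of the variance in y that the line accounts for
fit <- lm(y ~ x)
r_squared <- summary(fit)$r.squared

c(r = r_manual, r_from_cor = cor(x, y), r_squared = r_squared)
```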
But the overall correlation does not tell the whole story. The correlations within June’s class (r = 0.76) and Alan’s class (r = 0.79) are both stronger than the correlation for the two classes combined, which lends further support to the idea that there is a true difference in midterm performance between the two classes that cannot be traced to participant variables alone. We can also see this by noting that while June’s class performed only slightly better in 2019 (M = 84%) than Alan’s class (M = 82%), the gap on the midterm was much larger (69.1% vs 53.9%).
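This reasoning rests on a general property of correlation: when two groups have similar scores on one variable but are shifted apart on the other, pooling them weakens the overall correlation even if the relationship within each group is strong. The constructed example below uses toy numbers chosen to make the effect obvious (not the real data): both groups have identical previous-year grades, and one group’s midterm scores are shifted up by 15 points.

```r
# Constructed toy data: both groups have identical previous-year grades,
# but group B's midterm scores are 15 points higher
prev    <- c(75:84, 75:84)
group   <- rep(c("A", "B"), each = 10)
midterm <- ifelse(group == "A", prev - 25, prev - 10)

# Correlation within each group
within <- sapply(split(data.frame(prev, midterm), group),
                 function(d) cor(d$prev, d$midterm))

# Correlation with both groups pooled together
pooled <- cor(prev, midterm)

within
pooled
```

Each group is perfectly correlated on its own, yet the pooled correlation is much weaker, because the shift between groups adds midterm variance that previous-year grades cannot explain.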
Perhaps we should instead see how the midterm results correlate with performance on the first in-class quiz for AS Psychology. You can see the results of that correlation below.
```r
library(magrittr)   # %<>% assignment pipe

# Attach each student's September (Quiz 1) grade, matched by name
MidtermTotals %<>%
  mutate(SeptGrade = as.numeric(plyr::mapvalues(Student,
                                                from = SeptGrades$Student,
                                                to   = SeptGrades$Revised.Grade)))

JuneMT <- subset(MidtermTotals, Teacher == "June")
AlanMT <- subset(MidtermTotals, Teacher == "Alan")

# Pearson correlations, rounded for display (plain "=" since labels are not parsed)
JuneCor <- round(cor.test(JuneMT$Grade, JuneMT$SeptGrade)$estimate, 2)
JuneCorLabel <- paste0("June's Class: r = ", JuneCor)
AlanCor <- round(cor.test(AlanMT$Grade, AlanMT$SeptGrade)$estimate, 2)
AlanCorLabel <- paste0("Alan's Class: r = ", AlanCor)
Cor <- round(cor.test(MidtermTotals$Grade, MidtermTotals$SeptGrade)$estimate, 2)
CorLabel <- paste0("Overall r = ", Cor)

ggplot(data = MidtermTotals, aes(x = Grade, y = SeptGrade)) +
  geom_point(aes(color = Teacher), size = 2) +
  # One regression line per class, plus a black line for both classes combined
  geom_smooth(method = lm, formula = y ~ x, aes(color = Teacher), se = FALSE, linewidth = 1.2) +
  geom_smooth(method = lm, formula = y ~ x, color = "black", se = FALSE) +
  scale_color_manual(values = c("darkred", "darkgrey")) +
  annotate("text", x = 70, y = 45, label = "Correlations", hjust = 0, fontface = "bold") +
  annotate("text", x = 70, y = 40, label = JuneCorLabel, hjust = 0, color = "darkgrey") +
  annotate("text", x = 70, y = 35, label = AlanCorLabel, hjust = 0, color = "darkred") +
  annotate("text", x = 70, y = 30, label = CorLabel, hjust = 0, color = "black") +
  scale_x_continuous(limits = c(0, 100)) +
  scale_y_continuous(limits = c(0, 100)) +
  labs(x = "Midterm Grade", y = "September Quiz Grade") +
  theme_alan() +
  ggtitle("Scatterplot of Test Scores (Midterm vs September Quiz) in AS Psychology")

ggsave(here("ASCorrelationMidterm.png"), plot = last_plot(),
       width = 7, height = 6, units = "in", dpi = 600)
```
This correlation actually ends up telling us less: June’s class scored 15.2 percentage points higher on the midterm, but it also scored a similar amount higher (12.8 percentage points) on the first in-class quiz.
Students were asked to consider what we have learned about extraneous variables, especially confounding variables, and to come up with explanations for why we might observe this difference. They came up with many options, some of which are shown below:
Although it’s likely that all of these variables affected the results (along with many more that we couldn’t think of; remember that we can always try to control extraneous variables, but the “random effects” will almost always be larger than the “treatment effects”), the final one seems particularly likely. Students generally report lower levels of satisfaction with online lessons, and even when the content is equivalent and they are given many resources (e.g. all of Alan’s lectures were recorded and shared with students in both video and MP3 format), students may fail to engage with the materials or with their instructor to the same degree.
To that end, we decided that if students in Alan’s class can demonstrate performance on subsequent assessments that matches the performance of June’s students, their midterm grades will be retroactively increased. That is, if Alan’s students (who are now learning in person under the same conditions as June’s) can raise their performance on Quiz 2 and some additional practice quizzes, we will happily assume that their lower Midterm 1 scores were primarily a reflection of the problems with online learning. As it would be unfair to penalise students for being taught in a different format, we would happily “correct” their grades.
The average performance of students in each class in 2019, on Quiz 1, and on the Midterm is shown in the table below.
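```r
# Average grade per class on each assessment
ClassGrades2019 <-
  PrevYearGrades %>%
  group_by(Teacher) %>%
  dplyr::summarise(`2019 Average` = mean(PrevYear, na.rm = TRUE))
ClassGradesQuiz1 <-
  MidtermTotals %>%
  group_by(Teacher) %>%
  dplyr::summarise(`2020 Quiz 1` = mean(SeptGrade, na.rm = TRUE))
# `ClassGrades` is not defined in the code shown here, so the midterm averages are computed directly
ClassGradesMidterm <-
  MidtermTotals %>%
  group_by(Teacher) %>%
  dplyr::summarise(`2020 Midterm` = mean(Grade))

# Join on Teacher (rather than binding columns) so rows cannot be misaligned
ClassGradesOverall <-
  ClassGrades2019 %>%
  left_join(ClassGradesQuiz1, by = "Teacher") %>%
  left_join(ClassGradesMidterm, by = "Teacher")

ClassGradesTable <-
  ClassGradesOverall %>%
  mutate(across(where(is.numeric), ~ round(.x, 1))) %>%
  knitr::kable(caption = "AS Psychology- Grades by Class", format = "html", row.names = FALSE) %>%
  kable_styling(full_width = FALSE)
save_kable(ClassGradesTable, "ClassGradesTable.png")
ClassGradesTable
```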
| Teacher | 2019 Average | 2020 Quiz 1 | 2020 Midterm |
|---|---|---|---|
| Alan | 82 | 64.7 | 53.9 |
| June | 84 | 77.5 | 69.1 |
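The gaps between the classes can be read straight off this table. The short sketch below copies the class averages from the table and computes June’s advantage on each assessment in percentage points; it simply restates the arithmetic behind the comparison above.

```r
# Class averages copied from the table above (percent)
class_avgs <- data.frame(
  Teacher = c("Alan", "June"),
  Avg2019 = c(82, 84),
  Quiz1   = c(64.7, 77.5),
  Midterm = c(53.9, 69.1)
)

# June's advantage over Alan on each assessment, in percentage points
is_june <- class_avgs$Teacher == "June"
gaps <- sapply(class_avgs[, -1], function(col) col[is_june] - col[!is_june])
round(gaps, 1)
```

The 2-point gap in 2019 is small next to the 12.8- and 15.2-point gaps on this year’s assessments.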
Looking at differences between classes gives us one measure of performance, but ultimately, as teachers, we also want to be able to identify problem areas for our students, both in terms of content and in terms of the types of questions they are struggling with. Let’s look again at the by-question performance shown earlier:
```r
# Same per-question bar chart as above (pd is the dodge defined for the first chart)
ggplot(data = MidtermSummary, aes(x = QNum, y = Grade, fill = Teacher)) +
  geom_col(position = pd, width = 0.8) +
  geom_errorbar(aes(ymin = Grade - se, ymax = Grade + se),
                width = 0.2, linewidth = 1, position = pd) +
  labs(x = "Question", y = "Average Grade (Value)") +
  theme_alan() +
  scale_fill_manual(values = c("darkred", "darkgrey")) +
  ggtitle("Bar Plot of Grades on AS Psychology Midterm")
```
A few questions stand out immediately as the most different between the classes: notably, students in Alan’s class did much worse on Questions 1A, 1B, 2D, 3B, and 3C.
The first two of these were introductory questions on the Biological Approach: foundational questions about what we are attempting to study in the biological approach and about the relationship between biology and psychology. The results of the first question were particularly surprising, because students in Alan’s class were specifically taught the differences between hormones and neurotransmitters twice, and also received a “study card” from the teacher on this topic. We have already talked to students about the importance of making use of resources and of “taking hints” from their instructors.
The second question concerned the fundamental assumptions of the biological approach. Alan has now made it very clear to students that they need to understand these assumptions very well indeed, and has recapped the fundamental assumptions of the cognitive approach (the unit currently under study) multiple times over the last week.
Question 2D concerned the procedure of one of the studies, and students in Alan’s class were likely let down by their English skills. It is not a very well-worded question, but it was chosen specifically because it is an actual Cambridge question and students need to be prepared for the occasional badly worded question. In both classes we will focus on providing additional training in how to interpret questions.
Question 3B had very low performance overall: the question was worth a maximum of 4 marks, and almost no students scored higher than 2/4. You can see the distribution of those scores below:
```r
# Distribution of scores on Question 3B, treating each score as a category
MidtermGrades$GradeF <- factor(MidtermGrades$Grade)
ggplot(data = subset(MidtermGrades, QNum == "3B"), aes(x = GradeF)) +
  geom_bar(fill = "darkred") +   # geom_bar() counts the rows at each score
  labs(x = "Score", y = "Count") +
  theme_alan() +
  theme(text = element_text(size = 24)) +
  ggtitle("AS Psychology Midterm - Q 3B")
```
We knew this question would be hard for students: a complete answer required them to report not only the main finding of the study but also a follow-up analysis. However, because almost no students answered it well, we opted to give all students an additional 2 marks on the test, and in the follow-up to the test we gave them further clarification about why this question is important for understanding the material generally.
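One detail worth knowing when reading these per-question charts: ggplot2 drops unused factor levels from a discrete axis by default, so a score that nobody obtained on a question does not appear as an empty bar. One way to show the full mark range is to set the levels explicitly (and, in ggplot2, add `scale_x_discrete(drop = FALSE)`). The base-R sketch below, on made-up scores for a hypothetical 4-mark question, shows the difference in the underlying counts.

```r
# Hypothetical scores on a 4-mark question (not the real 3B data); nobody scored 4
scores_q <- c(0, 1, 1, 2, 2, 2, 1, 0, 3, 2, 1, 2)

# factor() with default levels keeps only the scores that actually occur
table(factor(scores_q))

# Supplying the full mark range keeps unobtained scores as zero counts
counts <- table(factor(scores_q, levels = 0:4))
counts
```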
Question 3C was the “evaluate” question for this exam: the 10-point question that students fear the most. The distribution of scores on this question looked like this:
```r
# Distribution of scores on Question 3C, treating each score as a category
MidtermGrades$GradeF <- factor(MidtermGrades$Grade)
ggplot(data = subset(MidtermGrades, QNum == "3C"), aes(x = GradeF)) +
  geom_bar(fill = "darkred") +   # geom_bar() counts the rows at each score
  labs(x = "Score", y = "Count") +
  theme_alan() +
  theme(text = element_text(size = 24)) +
  ggtitle("AS Psychology Midterm - Q 3C")
```
Scores on this question clustered around 6-7, although we were fairly lenient here, and Cambridge would more likely have scored these responses at 5-6. We will continue to work on these evaluate questions, as they make up a very large proportion of the marks in AS Psychology exams. Students are encouraged to tackle as many of these questions from the various test banks as they possibly can in their free time.
Finally, students across the board performed very poorly on Question 5C, about the benefits and problems of making generalisations in psychology. There are, we think, two reasons for this: first, many students simply ran out of time; second, neither of us adequately stressed the benefits of making generalisations. As these benefits were worth 2 of the 6 marks, we awarded all students an additional 2 marks on the exam.
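Taken together, the two adjustments described in this section (2 marks for Question 3B and 2 marks for Question 5C) add 4 marks to a 50-mark paper, which is worth 8 percentage points for every student. A minimal sketch of that arithmetic follows; the `adjust_percent` helper and its cap at the paper maximum are hypothetical, since the report does not say how a student already near full marks was handled.

```r
total_marks <- 50
bonus_marks <- 2 + 2   # 2 marks for Question 3B plus 2 marks for Question 5C

# Percentage-point value of the combined bonus on a 50-mark paper
bonus_points <- bonus_marks / total_marks * 100
bonus_points

# Hypothetical helper: add the bonus to raw marks and convert to a percentage,
# capping at the paper maximum (the cap is an assumption, not stated in the report)
adjust_percent <- function(raw, bonus = bonus_marks, total = total_marks) {
  pmin(raw + bonus, total) / total * 100
}
adjust_percent(c(9, 30, 48))
```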
```r
# Distribution of scores on Question 5C, treating each score as a category
MidtermGrades$GradeF <- factor(MidtermGrades$Grade)
ggplot(data = subset(MidtermGrades, QNum == "5C"), aes(x = GradeF)) +
  geom_bar(fill = "darkred") +   # geom_bar() counts the rows at each score
  labs(x = "Score", y = "Count") +
  theme_alan() +
  theme(text = element_text(size = 24)) +
  ggtitle("AS Psychology Midterm - Q 5C")
```
Overall, student performance on this test was a mixed bag, and we were quite surprised by some of the results. Some of the questions we assumed would be easiest (e.g. 1A, 1B) were answered very poorly, while some of the longer-format questions (e.g. 4B) actually had the highest performance of all.
We anticipated that students would have trouble with the “evaluate” style questions - these are, after all, the questions that set the very best students apart from the rest of the pack on Cambridge exams. In addition to requiring students to understand the material well, they also require strong English proficiency.
Students who are struggling should take the following general steps:
If you have any questions or comments about student performance in the class, please don’t hesitate to contact Alan Nielsen or June Zhu by email.
This report was generated using R Markdown.