Basics of A2 Psychology Midterm

Note: We generally work through a similar analysis of test results with the students as a way of demonstrating many of the techniques of statistical analysis and data visualisation that are relevant to their course materials. As such, several parts of this report have didactic value as well.

The A2 Psychology Midterm examination consisted of 8 questions from Sections 7.1 to 7.4 of Consumer Decision Making. Because students had previously been tested on Sections 7.1 and 7.2 in their first term quiz, these sections were each worth only 15% of the grade on this midterm, with the bulk of the content coming from Sections 7.3 and 7.4.

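## Knitted table of questions and point values
# %>% comes from dplyr (magrittr); kable_styling() and save_kable() from kableExtra
library(dplyr)
library(kableExtra)

QuestionsTable <- 
  MidtermGrades %>%
    head(8) %>%   # the first 8 rows are assumed to hold one row per question
    subset(select = c(Qnum, Question, Value)) %>%
    setNames(c("Number", "Question", "Value")) %>%
    knitr::kable(caption = "A2 Psychology- Questions and Values", row.names = FALSE) %>%
    kable_styling(full_width = FALSE) 

# Saving an HTML table as an image relies on a webshot backend being installed
save_kable(QuestionsTable, "QuestionsTable.png")

QuestionsTable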
A2 Psychology - Questions and Values

| Number | Question | Value |
|---|---|---|
| 1 | Explain what is meant by “playground design” as a casino design. | 2 |
| 2 | Describe the study by Chebat & Michon on odour and shopper emotion | 4 |
| 3 | Explain what is meant by the behaviour constraint model of personal space | 2 |
| 4 | Identify the extent to which Dayan & Bar-Hillel (2001) found support for a primacy-recency effect in their study (menu item position) | 4 |
| 5 | Outline two aims of the study by Wansink et al. (1998; Purchase Anchors) on consumer decision making | 4 |
| 6 | Explain two practical applications of the theory of planned behavior that would help a company to sell products | 6 |
| 7 | Describe what psychologists have discovered about “selling the product” | 8 |
| 8 | Evaluate what psychologists have discovered about intuitive thinking and its imperfections in consumer decision-making, including a discussion about determinism | 10 |

Overall Performance

Generally, the first thing we want to look at in one of these analyses is the overall distribution of grades, which we can see in the histogram below. (Recall that you can tell this is a histogram rather than a bar chart because the dependent variable (grades) is on the x axis.)

# Single histogram of total grades with a fitted normal curve
# theme_alan() is the report's custom theme, assumed to be defined elsewhere
ggplot(data = MidtermTotals, aes(x = Grade)) +
  geom_histogram(aes(y = after_stat(density)), alpha = 0.6, position = "identity") +
  labs(x = "Grade (Percentage)", y = "Density") +
  stat_function(fun = dnorm, args = list(mean = mean(MidtermTotals$Grade), sd = sd(MidtermTotals$Grade)), 
                color = "black", size = 1.4) +
  scale_x_continuous(limits = c(0, 100)) +
  theme_alan() +
  ggtitle("Histogram of Grades in A2 Psychology Midterm")

# units must be a single value
ggsave(here("Midterm 1 Histogram.png"), plot = last_plot(),
       width = 8, height = 4, units = "in",
       dpi = 600)

The first thing we can note is that the grades do not depart significantly from a normal distribution (Shapiro-Wilk normality test: W = 0.96, p = 0.449). In most methods of assessment, an approximately normal distribution of grades is desirable.
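For students who want to try the normality check themselves, here is a minimal sketch using base R's `shapiro.test()`. The `grades` vector is made up for illustration and is not the real class data.

```r
# Hypothetical percentage grades (illustrative only, not the real class data)
grades <- c(22.5, 40, 47.5, 52.5, 55, 60, 62.5, 65, 67.5, 70,
            72.5, 75, 77.5, 80, 85, 90, 95)

# H0: the sample was drawn from a normal distribution
sw <- shapiro.test(grades)

w_stat  <- unname(sw$statistic)  # W is close to 1 when the data look normal
p_value <- sw$p.value

# p > .05 means we fail to reject normality; it does not prove normality
cat(sprintf("W = %.2f, p = %.3f\n", w_stat, p_value))
```

Note that with small samples the test has little power, so a non-significant result should be read alongside the histogram rather than instead of it.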

Overall performance was around what we expected with an average score of 64.2% and a standard deviation of 19 (Range = 22.5% to 95%).

Obviously a grade of 22.5% was far too low, while we were pleasantly surprised to see a student with a grade as high as 95%, who had clearly put in an incredible amount of work studying for the exam.

Next, we decided to take a look at the test on a question-by-question basis, which you can see in the bar chart below. (Recall that you can tell this is a bar chart and not a histogram because the dependent variable (grades) is on the y axis, and because there are spaces between the levels of the factor (question number) on the x axis, which means they represent independent data points.)
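As background for the error bars on a chart like this: each whisker spans one standard error of the mean either side of the bar, where the standard error is the standard deviation divided by the square root of the number of scores. The sketch below computes this per question in base R on made-up marks; the `marks` data frame and its values are hypothetical, and the output columns simply mirror the kind of summary a helper such as `summarySE()` is expected to return.

```r
# Hypothetical per-student marks on two questions (not the real data)
marks <- data.frame(
  Qnum  = rep(1:2, each = 5),
  Grade = c(2, 2, 1, 0, 2,  4, 3, 2, 4, 1)
)

# Mean, SD, n and standard error of the mean for each question
summ <- do.call(rbind, lapply(split(marks, marks$Qnum), function(d) {
  data.frame(Qnum  = d$Qnum[1],
             N     = nrow(d),
             Grade = mean(d$Grade),
             sd    = sd(d$Grade),
             se    = sd(d$Grade) / sqrt(nrow(d)))
}))

print(summ)
```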

# Per-teacher, per-question means with standard errors
MidtermSummary <- summarySE(MidtermGrades, measurevar = "Grade", groupvars = c("Teacher", "Qnum"))

# pd is assumed to be defined earlier in the report, e.g. pd <- position_dodge(width = 0.9)
# The data are already summarised, so the bars are drawn with stat = "identity"
ggplot(data=MidtermSummary, aes(x=Qnum, y= Grade, fill = Teacher)) +
  geom_bar(stat= "identity", position = pd, width = 0.8) +
  geom_errorbar(aes(ymin= Grade - se, ymax= Grade + se), width= 0.2, size = 1, position= pd)+
  #annotate("text", x=11.5, y=7.2, label = "Average Grades", hjust = 0, fontface = "bold") +
  #annotate("text", x=11.5, y=6.7, label = "Alan's Class = 52.3%", hjust = 0, color = "darkred") +
  #annotate("text", x=11.5, y=6.2, label = "June's Class = 67.2%", hjust = 0, color = "darkgrey") +
  labs(x="Question", y="Average Grade (Value)") +
  theme_alan() +
  #scale_y_continuous(limits = c(0,3), breaks=c(0:3)) +
  scale_x_continuous(breaks = 1:8) +
  scale_color_manual(values= c("darkred", "darkgrey")) +
  scale_fill_manual(values= c("darkred", "darkgrey")) +
  ggtitle("Bar Plot of Grades on A2 Psychology Midterm") 

ggsave(here("MidtermGrades.png"), plot = last_plot(),
       width = 8, height = 4, units = "in",
       dpi = 600)

This is one way to visualize the results, but without a point of reference it isn’t very informative. What it does reveal, however, is the difference in performance between June’s and Alan’s classes. We will delve into the question-by-question performance later in this report, but for the time being let’s explore the differences between classes.

Between Class Performance

Most usefully, we can look at overall performance differences between the two classes, then return to any performance differences on individual questions that seem notable. You can see the distribution of performance in the two classes in the histogram below:

# Histograms by class
JuneMTGrades <- subset(MidtermTotals, Teacher == "June")
AlanMTGrades <- subset(MidtermTotals, Teacher == "Alan")

# Overlaid density histograms, each with its own fitted normal curve
ggplot(data=MidtermTotals, aes(x=Grade, fill = Teacher)) +
  geom_histogram(aes(y=after_stat(density)), alpha = 0.6, position = "identity") +
  labs(x="Grade (Percentage)", y="Density") +
  # geom_text(x= 180, y = .075, label = str_wrap(testreport, 20)) +
  stat_function(fun=dnorm, args = list(mean=mean(JuneMTGrades$Grade), sd=sd(JuneMTGrades$Grade)), color="darkgrey", size = 1.2) +
  stat_function(fun=dnorm, args = list(mean=mean(AlanMTGrades$Grade), sd=sd(AlanMTGrades$Grade)), color="darkred", size = 1.2) +
  scale_fill_manual(values= c("darkred", "darkgrey")) +
  scale_x_continuous(limits = c(0,100)) +
  theme_alan() +
  ggtitle("Histogram of Grades in A2 Psychology Midterm - Comparison of Classes")

ggsave(here("Midterm 1 Class Histograms.png"), plot = last_plot(),
       width = 8, height = 4, units = "in",
       dpi = 600)

# Welch two-sample t-test (the default for t.test())
testcomp <- t.test(JuneMTGrades$Grade, AlanMTGrades$Grade)

We can observe two main things from the above histogram. First, June’s class had a much wider range of performance, from a low of 22.5% to a high of 90%, while Alan’s class had a narrower range of 42.5% to 95%. Although it’s hard to draw conclusions about normality from very small sets of data, the grades in neither class departed significantly from a normal distribution (Alan’s class: Shapiro-Wilk normality test: W = 0.95, p = 0.611; June’s class: Shapiro-Wilk normality test: W = 0.97, p = 0.848).

The second is that the distribution of grades in Alan’s class is shifted to the right. In more common parlance, Alan’s class performed better on the test (M = 72.7, SD = 16.3) than did June’s class (M = 58.7, SD = 18.9), a difference that sits right at the conventional .05 threshold for significance (Welch’s t(23.7) = -2.09, p = 0.05).
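The fractional degrees of freedom in that result (23.7) come from Welch's version of the t-test, which does not assume equal variances and is what `t.test()` runs by default. The sketch below, on made-up grades, shows the test alongside a by-hand calculation of the same statistic and degrees of freedom; `class_a` and `class_b` are hypothetical and do not reproduce the real result.

```r
# Hypothetical grades for two small classes (not the real data)
class_a <- c(42.5, 55, 60, 70, 72.5, 80, 85, 90, 95)
class_b <- c(22.5, 40, 45, 50, 55, 60, 62.5, 65, 70, 75, 80, 90)

# Welch's t-test is the default in t.test() (var.equal = FALSE)
res <- t.test(class_b, class_a)

# The same statistic and degrees of freedom computed by hand
va <- var(class_a) / length(class_a)   # squared standard error, class A
vb <- var(class_b) / length(class_b)   # squared standard error, class B
t_manual  <- (mean(class_b) - mean(class_a)) / sqrt(va + vb)
df_manual <- (va + vb)^2 /
  (va^2 / (length(class_a) - 1) + vb^2 / (length(class_b) - 1))

cat(sprintf("t(%.1f) = %.2f, p = %.3f\n",
            res$parameter, res$statistic, res$p.value))
```

The sign of t depends only on the order of the two arguments: putting the lower-scoring class first gives a negative value, as in the report.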

We could explore a number of reasons why such a difference might be observed in our results, but the first we should attempt to exclude is the possibility that participant variables alone account for this difference - i.e. that Alan’s class simply has stronger students than does June’s Class.

We can do this first by attempting to correlate performance on this midterm with last year’s (S2) grades for the students in each class. (Note that we cannot make any causal claims here, because we are simply correlating two covariables - we haven’t actually manipulated an independent variable as we would in an experiment.) (NB: These “S2 Grades” are based on mass data of grade boundaries and lack much distinction - e.g. all grades of “C” were scored as a 65, all grades of “B” as a 75, and so on. Exploration of actual grades might reveal more interesting trends!) (Students should consider especially what this means whenever we use constructs like rating scales in studies, and especially when we average scores across multiple scales - how much data are we losing?)
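To see how much information this kind of banding can throw away, here is a small simulation. Everything in it is hypothetical: the simulated scores, the 10-point bands, and the midpoint coding (chosen so that, as above, a 60-something becomes 65 and a 70-something becomes 75).

```r
set.seed(1)  # make the simulated example reproducible

# Simulated raw percentage grades and a correlated second score (hypothetical)
prev_true <- runif(200, min = 50, max = 100)
midterm   <- prev_true + rnorm(200, sd = 10)

# Collapse each grade to the midpoint of its 10-point band, e.g. 60-69.9 becomes 65
# (the band edges are an assumption made for illustration)
prev_binned <- floor(prev_true / 10) * 10 + 5

r_true   <- cor(midterm, prev_true)
r_binned <- cor(midterm, prev_binned)

cat(sprintf("distinct values: %d raw vs %d binned\n",
            length(unique(prev_true)), length(unique(prev_binned))))
cat(sprintf("r with raw grades = %.2f, r with binned grades = %.2f\n",
            r_true, r_binned))
```

Two hundred distinct scores collapse into a handful of values, so any differences between students within the same band can no longer contribute to the correlation.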

PrevYearGrades <- 
  MidtermTotals %>%
    mutate(PrevYear = as.numeric(plyr::mapvalues(Student,
                                        from = Grades2019$Name,
                                        to= Grades2019$Average))) %>%
    subset(Student != "Echo Lin")
  

JuneMT <- subset(PrevYearGrades, Teacher == "June")
AlanMT <- subset(PrevYearGrades, Teacher == "Alan")

# Correlations (Pearson's r) for each class and overall, rounded for the plot labels

JuneCor <- round(cor.test(JuneMT$Grade, JuneMT$PrevYear)$estimate,2)
JuneCorLabel <- paste("June's Class: r = ", JuneCor, sep = "")

AlanCor <- round(cor.test(AlanMT$Grade, AlanMT$PrevYear)$estimate,2)
AlanCorLabel <- paste("Alan's Class: r = ", AlanCor, sep = "")

Cor <- round(cor.test(PrevYearGrades$Grade, PrevYearGrades$PrevYear)$estimate,2)
CorLabel <- paste("Overall r = ", Cor, sep = "")


ggplot(data=PrevYearGrades, aes(x=Grade, y = PrevYear)) +
  geom_point(aes(color = Teacher), size = 2) +
  geom_smooth(method = lm, aes(color = Teacher), se = FALSE, size = 1.2) +
  geom_smooth(method = lm, color = "Black", se = FALSE) +
  scale_color_manual(values= c("darkred", "darkgrey")) +
  annotate("text", x=70, y=65, label = "Correlations", hjust = 0, fontface = "bold") +
  annotate("text", x=70, y=61, label = JuneCorLabel, hjust = 0, color = "darkgrey") +
  annotate("text", x=70, y=57, label = AlanCorLabel, hjust = 0, color = "darkred") +
  annotate("text", x=70, y=53, label = CorLabel, hjust = 0, color = "black") +
  scale_x_continuous(limits = c(25,100)) +
  scale_y_continuous(limits = c(50,100)) +
  labs(x="Midterm Grade", y="2019 Average") +
  theme_alan() +
  ggtitle("Scatterplot of Test Scores (Midterm vs 2019 Average) in A2 Psychology")

ggsave(here("ASCorrelationMidterm2.png"), plot = last_plot(),
       width = 7, height = 6, units = "in",
       dpi = 600)

We can see a fairly strong overall correlation here of r = 0.7, which is shown by the black line above. (Recall that the strength of a correlation refers to the average distance of the individual points of the distribution away from this line - the black line cuts neatly through our data.) This is a strong positive correlation - students who performed better in their 2019 classes also performed better on the A2 Psychology Midterm.

We can see that the overall correlation is stronger than the individual correlations within both June’s class (r = 0.68) and Alan’s class (r = 0.61). This is to be expected from the larger data set - in this case we’d only have gained additional information if the correlation were stronger within classes than when combining the two. So what this correlation tells us is that much of the variance in class test performance is indeed likely to be due to participant variables. Alan’s class performed better on the Midterm, but also performed better in their 2019 S2 classes (M = 80.6%) than June’s class (M = 74.4%).
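One way to put a number on "much of the variance" is to square each correlation: r squared is the proportion of variance in one variable that is linearly associated with the other. A minimal sketch using the three r values reported above (the vector names are just labels chosen here):

```r
# Correlations reported in the text
r <- c(overall = 0.70, june = 0.68, alan = 0.61)

# r squared: proportion of variance shared between midterm and 2019 grades
r_squared <- r^2

print(round(r_squared, 2))
```

On these figures roughly half of the variance in midterm grades is shared with the 2019 grades overall. As with the correlation itself, this describes an association and not a cause.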

By-Question Performance

Looking at differences between classes can provide us with one metric of performance, but ultimately as teachers we also want to be able to identify problem areas for our students, both in terms of content and in terms of the types of questions that students are struggling with. The by-question performance was shown above, but let’s take another look at it below:

# Same by-question bar chart as above; pd is assumed to be a position_dodge() object defined earlier
ggplot(data=MidtermSummary, aes(x=Qnum, y= Grade, fill = Teacher)) +
  geom_bar(stat= "identity", position = pd, width = 0.8) +
  geom_errorbar(aes(ymin= Grade - se, ymax= Grade + se), width= 0.2, size = 1, position= pd)+
  #annotate("text", x=11.5, y=7.2, label = "Average Grades", hjust = 0, fontface = "bold") +
  #annotate("text", x=11.5, y=6.7, label = "Alan's Class = 52.3%", hjust = 0, color = "darkred") +
  #annotate("text", x=11.5, y=6.2, label = "June's Class = 67.2%", hjust = 0, color = "darkgrey") +
  labs(x="Question", y="Average Grade (Value)") +
  theme_alan() +
  #scale_y_continuous(limits = c(0,3), breaks=c(0:3)) +
  scale_x_continuous(breaks = 1:8) +
  scale_color_manual(values= c("darkred", "darkgrey")) +
  scale_fill_manual(values= c("darkred", "darkgrey")) +
  ggtitle("Bar Plot of Grades on A2 Psychology Midterm") 

Not very much stands out here worth mentioning.

Alan’s class performed better by similar amounts on basically all questions except question 2, where both classes performed very similarly.

There were no questions where the overall performance was very poor - i.e. no questions that we should remove from the test because they were too difficult for even the best students to answer. Instead, individual students appear to have done well on different parts of the test, which is probably to be expected when students are tested on a broad range of materials. There were quite a few students who left some responses blank, which means we should focus on ensuring that our students continue receiving sufficient practice to complete their actual Cambridge exams in the allotted time.

Let’s see if we can spot anything else by looking at some by-question histograms of responses:

# Treat each score as a category so every awarded mark gets its own bar
MidtermGrades$GradeF <- factor(MidtermGrades$Grade)

# geom_bar() counts the rows at each score by default
ggplot(data=MidtermGrades, aes(x=GradeF)) +
  geom_bar(fill = "darkred") +
  labs(x="Score", y="Count") +
  theme_alan() +
  facet_wrap(~Qnum, scales = "free_x", ncol = 4)+
  theme(text = element_text(size=24)) +
  ggtitle("A2 Psychology Midterm - By Question Grade Distributions")

ggsave(here("A2ByQuestion.png"), plot = last_plot(),
       width = 12, height = 6, units = "in",
       dpi = 600)

Nothing stands out here - there are very few questions where a large number of students scored zero, and also only a few where there was a performance ceiling, e.g. question 1, where almost all students who earned any marks earned full marks.

The high-value questions are also approximately normally distributed around a mean of 6/8 (question 7) and 6/10 (question 8). The overall lower performance on question 8 is due partially to the fact that many students failed to actually address the named issue (determinism), which under Cambridge guidelines means they could not score more than 5 marks on the question.

On Grading

All tests were originally blind-graded by Dr. Nielsen, who obscured the names of the students before marking the tests and graded them in random order so that he would be blind to which students’ tests he was grading (except in cases of students with distinctive handwriting…). From these tests, ten were selected and given to June for secondary grading to ensure the two teachers agreed on the application of the marking scheme.

A few points of contention arose with how certain questions were dealt with, but after the resolution of these issues the teachers were found to have an inter-rater reliability exceeding 90% when considered on a by-item basis. All grades were then finalised after a final pass through the tests to ensure the newly agreed upon marking scheme was applied evenly to all students, not just the ten students who had been graded twice.
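The report does not say which reliability statistic was used, so the following is only one plausible reading: simple percent agreement, i.e. the share of individual question marks on which both raters gave exactly the same score. The marks are randomly generated and the three disagreements are planted by hand; only the per-question maximum marks come from the table at the top of this report.

```r
# Maximum marks for the 8 questions (from the questions table)
max_marks <- c(2, 4, 2, 4, 4, 6, 8, 10)

# Hypothetical marks from the first rater: 10 scripts (rows) x 8 questions (columns)
set.seed(42)
rater1 <- sapply(max_marks, function(m) sample(0:m, 10, replace = TRUE))

# Second rater agrees everywhere except three planted one-mark disagreements
rater2  <- rater1
changed <- cbind(row = c(1, 4, 7), col = c(7, 8, 6))
rater2[changed] <- ifelse(rater1[changed] > 0,
                          rater1[changed] - 1,
                          rater1[changed] + 1)

agree <- rater1 == rater2
overall_agreement <- mean(agree)      # proportion of identical item marks
by_question       <- colMeans(agree)  # agreement for each question

cat(sprintf("Overall by-item agreement: %.1f%%\n", 100 * overall_agreement))
```

Percent agreement is easy to explain but does not correct for agreement that would occur by chance, so a chance-corrected statistic or a correlation between raters may be preferable for a more formal report.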

Conclusions

Overall, the results of this test followed those of the first quiz: they were around where we expected them to be, but there was a clear performance difference between the A2 classes. Although it is likely that a large number of extraneous variables affect those differences, the largest and easiest to understand seems simply to be that Alan’s class has stronger students to begin with, and that those students continue to perform better in A2 Psychology, as they did on the initial quiz. This may of course be exacerbated by the fact that Alan’s A2 class has far fewer students than June’s, which means that students work with each other more closely and that the strongest students are best able to help the other students with their performance.

If you have any questions or comments about student performance in the class, please don’t hesitate to get in touch via email to Alan Nielsen or June Zhu.

This report was generated using R Markdown.