Extended Time Analysis: Part 1 - Percent Correct Winter 2018

Data

allData is a list of 20 objects: 4 are the Winter 2018 exam files, 14 are past exam files, and 2 hold gender and time cluster information for Winter 2018.

allData_Elements	Name
1	WN2018_E4
2	WN2018_E3
3	WN2018_E2
4	WN2018_E1
5	WN2016_E1
6	WN2015_E4
7	WN2015_E3
8	WN2015_E2
9	WN2015_E1
10	WN2014_E4
11	WN2014_E3
12	WN2014_E2
13	WN2014_E1
14	WN2013_E4
15	WN2013_E3
16	WN2013_E2
17	WN2013_E1
18	FA2004_E3
19	WN2018_gender_3cluster
20	WN2018_gender_2cluster

41 Repeated Questions - matched to WN2018 term and original term

Number	WN2018_Info	OriginalTerm_Info
1	WN2018_E4_Q1	WN2013_E4_Q6
2	WN2018_E4_Q5	WN2013_E4_Q4
3	WN2018_E4_Q7	WN2015_E4_Q3
4	WN2018_E4_Q13	WN2013_E4_Q7
5	WN2018_E4_Q14	WN2013_E4_Q10
6	WN2018_E4_Q16	WN2015_E4_Q17
7	WN2018_E4_Q17	WN2015_E4_Q14
8	WN2018_E4_Q18	WN2015_E4_Q15
9	WN2018_E4_Q19	WN2014_E4_Q12
10	WN2018_E4_Q20	WN2013_E4_Q19
11	WN2018_E4_Q21	WN2013_E4_Q24
12	WN2018_E4_Q22	WN2015_E4_Q19
13	WN2018_E3_Q4	WN2015_E3_Q4
14	WN2018_E3_Q5	WN2013_E3_Q5
15	WN2018_E3_Q6	FA2004_E3_Q5
16	WN2018_E3_Q7	WN2014_E3_Q7
17	WN2018_E3_Q9	WN2015_E3_Q9
18	WN2018_E3_Q12	FA2004_E3_Q12
19	WN2018_E3_Q13	WN2013_E3_Q13
20	WN2018_E3_Q17	FA2004_E3_Q17
21	WN2018_E3_Q18	WN2013_E3_Q18
22	WN2018_E2_Q1	WN2013_E2_Q1
23	WN2018_E2_Q2	WN2015_E2_Q2
24	WN2018_E2_Q4	WN2014_E2_Q4
25	WN2018_E2_Q7	WN2014_E2_Q7
26	WN2018_E2_Q8	WN2013_E2_Q8
27	WN2018_E2_Q11	WN2013_E2_Q11
28	WN2018_E2_Q12	WN2013_E2_Q13
29	WN2018_E2_Q13	WN2014_E2_Q13
30	WN2018_E2_Q14	WN2013_E2_Q14
31	WN2018_E2_Q19	WN2015_E2_Q19
32	WN2018_E1_Q2	WN2016_E1_Q2
33	WN2018_E1_Q3	WN2014_E4_Q4
34	WN2018_E1_Q4	WN2013_E1_Q4
35	WN2018_E1_Q5	WN2014_E1_Q5
36	WN2018_E1_Q6	WN2014_E4_Q6
37	WN2018_E1_Q8	WN2013_E1_Q8
38	WN2018_E1_Q9	WN2016_E1_Q8
39	WN2018_E1_Q12	WN2015_E1_Q12
40	WN2018_E1_Q16	WN2013_E1_Q16
41	WN2018_E1_Q19	WN2014_E1_Q19

Analysis - Percent Correct Differences

Percent correct for 41 repeated questions

library(boot)  # provides boot(), used below

percent_Correct_WN2018 <- percent_Correct_OG <- c()
stdError_pCorrect_WN2018 <- stdError_pCorrect_OG <- c()

# Bootstrap - sample with replacement many times, recalculate percent correct each time. 
# To determine R = 10000 for the boot() function, I kept changing the R value for a specific question until the hundredth place of the standard error did not change. 

bootfunc <- function(d, i) {
  d1 <- d[i]
  return(mean(d1))
}
# Do these commands for all 41 questions...
# Question 1: allData[[1]] is WN2018_E4 and allData[[14]] is WN2013_E4;
# the second index picks the column holding that question's scores.
percent_Correct_WN2018[1] <- mean(allData[[1]][9][[1]])
stdError_pCorrect_WN2018[1] <- sd(boot(allData[[1]][9][[1]], bootfunc, 10000)$t)

percent_Correct_OG[1] <- mean(allData[[14]][14][[1]])
stdError_pCorrect_OG[1] <- sd(boot(allData[[14]][14][[1]], bootfunc, 10000)$t)
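
The "do these commands for all 41 questions" step could be written as a loop over a lookup table instead of 41 hand-edited copies. The sketch below is only an illustration on simulated data: `toyData`, `make_exam`, `lookup`, `summarise_question`, and the column offset of 8 are assumptions (the offset is inferred from Q1 sitting in column 9 and Q6 in column 14 above), and R is reduced to 2000 to keep the toy run fast.

```r
library(boot)

# Statistic for boot(): mean of the resampled 0/1 scores.
bootfunc <- function(d, i) mean(d[i])

# Toy stand-in for allData (hypothetical layout): 8 non-question columns,
# then one 0/1 column per question.
set.seed(1)
make_exam <- function(n, n_q) {
  meta <- as.data.frame(matrix(0, nrow = n, ncol = 8))
  qs <- as.data.frame(matrix(rbinom(n * n_q, 1, 0.7), nrow = n))
  names(qs) <- paste0("Q", seq_len(n_q))
  cbind(meta, qs)
}
toyData <- list(make_exam(200, 10), make_exam(250, 10))

# Hypothetical lookup table: list element and question number for each
# repeated question, in the recent term and in the original term.
lookup <- data.frame(
  Number      = 1:2,
  new_element = c(1, 1), new_q = c(1, 5),
  og_element  = c(2, 2), og_q  = c(6, 4)
)
col_offset <- 8  # assumption: question k is stored in column k + 8

# Percent correct and bootstrap standard error for one question.
summarise_question <- function(data, element, q, R = 2000) {
  x <- data[[element]][[q + col_offset]]
  c(pct = mean(x), se = sd(boot(x, bootfunc, R)$t))
}

new_stats <- t(mapply(summarise_question,
                      element = lookup$new_element, q = lookup$new_q,
                      MoreArgs = list(data = toyData)))
og_stats <- t(mapply(summarise_question,
                     element = lookup$og_element, q = lookup$og_q,
                     MoreArgs = list(data = toyData)))

results <- data.frame(
  Number  = lookup$Number,
  pct_new = new_stats[, "pct"], se_new = new_stats[, "se"],
  pct_og  = og_stats[, "pct"],  se_og  = og_stats[, "se"]
)
results$diff <- results$pct_new - results$pct_og
print(results)
```

With the real data, `lookup` would hold all 41 rows of the repeated-question table and `toyData` would be replaced by `allData`.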

Number	Percent Correct WN2018 Term	Standard Error WN2018 Term	Percent Correct Original Term	Standard Error Original Term	Percent Correct Difference (WN2018 - Original)
1	0.7536232	0.0169661	0.6293605	0.0187286	0.1242627
2	0.6988728	0.0184550	0.6540698	0.0181234	0.0448030
3	0.8599034	0.0140002	0.7593496	0.0173605	0.1005538
4	0.7777778	0.0167628	0.8430233	0.0137352	-0.0652455
5	0.7842190	0.0164800	0.7863372	0.0157601	-0.0021182
6	0.7262480	0.0179759	0.6325203	0.0193025	0.0937277
7	0.8421900	0.0148623	0.7593496	0.0172501	0.0828404
8	0.8792271	0.0131503	0.8016260	0.0159578	0.0776010
9	0.8325282	0.0149568	0.8629032	0.0138018	-0.0303750
10	0.6537842	0.0190724	0.7281977	0.0170455	-0.0744135
11	0.5169082	0.0199587	0.6191860	0.0187723	-0.1022778
12	0.5362319	0.0200336	0.6243902	0.0193795	-0.0881584
13	0.7929374	0.0161274	0.7777778	0.0169865	0.0151596
14	0.8170144	0.0155208	0.7503650	0.0165763	0.0666495
15	0.5345104	0.0200333	0.4850575	0.0238474	0.0494530
16	0.6869984	0.0185154	0.7335474	0.0178428	-0.0465490
17	0.7415730	0.0172459	0.6094771	0.0197528	0.1320959
18	0.7174960	0.0180781	0.6091954	0.0234581	0.1083006
19	0.6099518	0.0195422	0.7007299	0.0175317	-0.0907781
20	0.3772071	0.0194826	0.2896552	0.0216083	0.0875519
21	0.4991974	0.0200391	0.4656934	0.0191107	0.0335040
22	0.7445483	0.0173632	0.6847978	0.0175396	0.0597505
23	0.9065421	0.0114969	0.7622821	0.0170522	0.1442600
24	0.8068536	0.0154699	0.7291982	0.0172619	0.0776554
25	0.8333333	0.0147333	0.6717095	0.0181446	0.1616238
26	0.9065421	0.0114281	0.8284519	0.0140708	0.0780902
27	0.7274143	0.0177250	0.6652720	0.0175615	0.0621424
28	0.7056075	0.0179726	0.8730823	0.0122269	-0.1674748
29	0.5342679	0.0194465	0.5975794	0.0190363	-0.0633115
30	0.6417445	0.0189417	0.7726639	0.0158207	-0.1309193
31	0.6137072	0.0190935	0.5198098	0.0202127	0.0938973
32	0.7939394	0.0156617	0.7756410	0.0169048	0.0182984
33	0.8060606	0.0154830	0.8451613	0.0145866	-0.0391007
34	0.8000000	0.0156790	0.5076709	0.0184042	0.2923291
35	0.6863636	0.0178351	0.7319277	0.0172859	-0.0455641
36	0.7833333	0.0161162	0.7790323	0.0166706	0.0043011
37	0.7863636	0.0160673	0.8019526	0.0148541	-0.0155889
38	0.7287879	0.0174603	0.6971154	0.0182426	0.0316725
39	0.9015152	0.0114976	0.6069731	0.0195205	0.2945421
40	0.6000000	0.0191950	0.6080893	0.0179578	-0.0080893
41	0.8000000	0.0154638	0.7515060	0.0168846	0.0484940
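
As a sanity check on the standard errors in the table, the bootstrap estimate for a 0/1 score can be compared with the binomial formula sqrt(p(1 - p)/n), and the choice of R can be checked by watching the estimate settle as R grows. This is a sketch on simulated scores, not the exam data; the sample size and success rate below are made up.

```r
library(boot)

bootfunc <- function(d, i) mean(d[i])

set.seed(42)
x <- rbinom(600, 1, 0.75)  # simulated 0/1 scores (hypothetical n and rate)

p <- mean(x)
se_analytic <- sqrt(p * (1 - p) / length(x))  # binomial SE of a proportion

# Bootstrap SE for increasing numbers of resamples.
R_values <- c(100, 1000, 10000)
se_boot <- sapply(R_values, function(R) sd(boot(x, bootfunc, R)$t))

print(data.frame(R = R_values, se_boot = se_boot, se_analytic = se_analytic))
```

If the two columns agree closely at the largest R, the bootstrap is behaving as expected for a simple proportion.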

Plot proportion of students that answered each question correctly.

Red corresponds to the performance of students in the original term for each question, ranging from Fall 2004 to Winter 2016. Blue corresponds to student performance for the Winter 2018 term. A later plot shows the difference in these proportions between the original terms and the recent term.

ggplot(percentCorrectTable) +
  geom_point(aes(Number, `Percent Correct`, color = Term), position = position_dodge(width = 1)) +
  geom_errorbar(aes(Number, ymin = `Percent Correct` - `Standard Error`, 
                    ymax = `Percent Correct` + `Standard Error`, color = Term), 
                position = position_dodge(width = 1)) +
  labs(title = "Average grade for each repeated question for WN2018 and its original term") +
  geom_vline(xintercept = c(5.5, 10.5, 15.5, 20.5, 25.5, 30.5, 35.5, 40.5), 
             linetype = "dashed", color = "gray")

The above plot shows information for all 41 questions. To make it easier to read, these points have been displayed below with only one question in each plot.

This plot reorders the x axis so the questions are in the order they appeared on the Winter 2018 exams. The order of the repeated questions on the Winter 2018 midterm exams very closely follows the original order of the questions. The order of the questions on the Winter 2018 final is a little different from the original order. All exams were combined, so if two exams share a question number, the earlier exam is placed first. Example: Exams 1, 3, and 4 all had a repeated question as their 5th question, so they are plotted in the order E1_Q5, E3_Q5, E4_Q5.

Plot difference in proportion of students that answered each question correctly between the original term and the Winter 2018 term.

The blue point at the far right represents the average percent correct difference between the original and Winter 2018 terms. On average, students scored about 3 percentage points higher on the repeated questions in Winter 2018 than in the original terms.

ggplot(percentCorrectDifference) + geom_point(aes(Number, `Percent Correct Difference (WN2018 - Original)`)) +
  geom_errorbar(aes(Number, ymin = `Percent Correct Difference (WN2018 - Original)` - 
                      `Standard Error Difference`, ymax = `Percent Correct Difference (WN2018 - Original)` +
                      `Standard Error Difference`)) +
  geom_hline(yintercept = 0, color = c("#009E73")) + 
  geom_point(aes(45, avgDiff), color = c("#0072B2")) + 
  geom_errorbar(aes(45, ymin = avgDiff - avgDiffError, ymax = avgDiff + avgDiffError), color = c("#0072B2")) +
  geom_vline(xintercept = c(5.5, 10.5, 15.5, 20.5, 25.5, 30.5, 35.5, 40.5, 44), 
             linetype = "dashed", color = "gray")
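
The plot code uses `Standard Error Difference`, `avgDiff`, and `avgDiffError`, which are not defined in this section. One plausible way to obtain them is sketched below using the first five rows of the table above. Both the quadrature sum (which treats the two terms as independent samples) and the propagated error on the unweighted mean are assumptions, not necessarily how the original values were computed.

```r
# First five rows of the percent correct table above.
pct_new <- c(0.7536232, 0.6988728, 0.8599034, 0.7777778, 0.7842190)
se_new  <- c(0.0169661, 0.0184550, 0.0140002, 0.0167628, 0.0164800)
pct_og  <- c(0.6293605, 0.6540698, 0.7593496, 0.8430233, 0.7863372)
se_og   <- c(0.0187286, 0.0181234, 0.0173605, 0.0137352, 0.0157601)

pct_diff <- pct_new - pct_og

# Assumption: independent samples, so the errors add in quadrature.
se_diff <- sqrt(se_new^2 + se_og^2)

# Assumption: the summary point is the unweighted mean of the differences,
# with its error propagated from the per-question errors.
avgDiff <- mean(pct_diff)
avgDiffError <- sqrt(sum(se_diff^2)) / length(pct_diff)

print(data.frame(Number = 1:5, pct_diff, se_diff))
print(c(avgDiff = avgDiff, avgDiffError = avgDiffError))
```

An alternative for `avgDiffError` is the standard error of the mean across questions, `sd(pct_diff) / sqrt(length(pct_diff))`, which also reflects question-to-question variation.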

Average percent correct difference (in orange) after not counting the two questions with almost 30% shifts.

Excluding those two questions (34 and 39), the average percent correct difference is about 2 percentage points.

In the OGRank and WN2018Rank, the rankings were determined by the percent correct for each question from the particular term. 1 is the hardest (lowest percent correct), 41 is the easiest. The rankings were similar, but not the same, for the OG terms and Winter 2018.
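
A ranking of this kind can be built with `rank()`, and "similar, but not the same" can be quantified with a Spearman rank correlation. The sketch below uses only the first eight rows of the percent correct table, so the ranks run from 1 to 8 rather than 1 to 41. The tie-handling rule (`ties.method = "min"`) is an assumption, since the source does not say how ties were resolved.

```r
# First eight rows of the percent correct table.
pct_new <- c(0.7536232, 0.6988728, 0.8599034, 0.7777778,
             0.7842190, 0.7262480, 0.8421900, 0.8792271)
pct_og  <- c(0.6293605, 0.6540698, 0.7593496, 0.8430233,
             0.7863372, 0.6325203, 0.7593496, 0.8016260)

# Rank 1 = hardest (lowest percent correct); ties share the lower rank.
WN2018Rank <- rank(pct_new, ties.method = "min")
OGRank     <- rank(pct_og,  ties.method = "min")

print(data.frame(Number = 1:8, OGRank, WN2018Rank))

# Spearman correlation: 1 means identical ordering, 0 means unrelated.
print(cor(pct_og, pct_new, method = "spearman"))
```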

Distribution of percent correct difference from original terms to Winter 2018 term.

The distribution appears wider than the error bars in the difference plots above. This suggests that factors other than sampling error contribute to the spread of percent correct differences. These could be differences in instruction, student preparation, student ability (standardized test scores), or other variables.

ggplot(percentCorrectDifference) + 
  geom_histogram(aes(`Percent Correct Difference (WN2018 - Original)`), 
                 fill = c("#D55E00"), alpha = .6, binwidth = .04) +
  geom_vline(xintercept = 0, linetype = "longdash", alpha = .6) +
  geom_vline(xintercept = mean(percentCorrectDifference$`Percent Correct Difference (WN2018 - Original)`),
             linetype = "dashed", color = c("#0072B2"))
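
The comparison between the width of the distribution and the error bars can be made numeric: compare the observed standard deviation of the differences with the spread expected from sampling error alone. The sketch below uses the first eight rows of the percent correct table and assumes the per-question error of a difference is the quadrature sum of the two terms' standard errors.

```r
# First eight rows: differences (WN2018 - Original) and the two standard errors.
pct_diff <- c(0.1242627, 0.0448030, 0.1005538, -0.0652455,
              -0.0021182, 0.0937277, 0.0828404, 0.0776010)
se_new <- c(0.0169661, 0.0184550, 0.0140002, 0.0167628,
            0.0164800, 0.0179759, 0.0148623, 0.0131503)
se_og  <- c(0.0187286, 0.0181234, 0.0173605, 0.0137352,
            0.0157601, 0.0193025, 0.0172501, 0.0159578)

se_diff <- sqrt(se_new^2 + se_og^2)  # assumption: independent samples

observed_sd <- sd(pct_diff)             # actual spread across questions
expected_sd <- sqrt(mean(se_diff^2))    # spread from sampling error alone

# Spread left over after removing the sampling-error contribution.
excess_sd <- sqrt(max(0, observed_sd^2 - expected_sd^2))

print(c(observed = observed_sd, expected = expected_sd, excess = excess_sd))
```

An observed spread well above the expected one points the same way as the histogram: something beyond sampling noise is shifting individual questions.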

Analysis - Winter 2018 Gender

Percent correct by gender for 41 repeated questions in Winter 2018

Number	Percent Correct	Female Percent Correct	Female Standard Error	Male Percent Correct	Male Standard Error	Percent Correct Difference (F-M)
1	0.7536232	0.7021277	0.0294723	0.7853403	0.0213252	-0.0832127
2	0.6988728	0.6851064	0.0303891	0.7094241	0.0231463	-0.0243177
3	0.8599034	0.8170213	0.0251430	0.8848168	0.0163881	-0.0677955
4	0.7777778	0.7234043	0.0292279	0.8115183	0.0200602	-0.0881141
5	0.7842190	0.6893617	0.0303359	0.8455497	0.0185254	-0.1561880
6	0.7262480	0.7234043	0.0295959	0.7277487	0.0228273	-0.0043444
7	0.8421900	0.8468085	0.0234486	0.8376963	0.0189204	0.0091122
8	0.8792271	0.8553191	0.0228453	0.8952880	0.0156623	-0.0399688
9	0.8325282	0.7276596	0.0290371	0.8979058	0.0154595	-0.1702462
10	0.6537842	0.6000000	0.0320695	0.6858639	0.0236135	-0.0858639
11	0.5169082	0.4255319	0.0322050	0.5759162	0.0253432	-0.1503843
12	0.5362319	0.4723404	0.0320372	0.5732984	0.0255538	-0.1009580
13	0.7929374	0.7805907	0.0269842	0.8036649	0.0201992	-0.0230742
14	0.8170144	0.7594937	0.0276276	0.8507853	0.0182779	-0.0912917
15	0.5345104	0.3924051	0.0317596	0.6204188	0.0248303	-0.2280138
16	0.6869984	0.6666667	0.0305534	0.7041885	0.0237250	-0.0375218
17	0.7415730	0.6877637	0.0299926	0.7748691	0.0212387	-0.0871054
18	0.7174960	0.6582278	0.0309355	0.7513089	0.0222558	-0.0930811
19	0.6099518	0.5907173	0.0322789	0.6178010	0.0248229	-0.0270837
20	0.3772071	0.3248945	0.0304215	0.4083770	0.0253039	-0.0834824
21	0.4991974	0.4641350	0.0324737	0.5235602	0.0254816	-0.0594252
22	0.7445483	0.7581967	0.0273296	0.7328244	0.0221507	0.0253723
23	0.9065421	0.8934426	0.0197492	0.9134860	0.0142161	-0.0200434
24	0.8068536	0.7745902	0.0196377	0.8320611	0.0188579	-0.0574709
25	0.8333333	0.8606557	0.0222643	0.8193384	0.0194512	0.0413173
26	0.9065421	0.9098361	0.0180802	0.9083969	0.0146703	0.0014391
27	0.7274143	0.7418033	0.0285250	0.7201018	0.0226964	0.0217015
28	0.7056075	0.6639344	0.0302549	0.7353690	0.0223523	-0.0714345
29	0.5342679	0.4672131	0.0318076	0.5776081	0.0251077	-0.1103950
30	0.6417445	0.6311475	0.0309919	0.6513995	0.0241163	-0.0202520
31	0.6137072	0.6188525	0.0309016	0.6106870	0.0248960	0.0081654
32	0.7939394	0.7800000	0.0263325	0.8019802	0.0198158	-0.0219802
33	0.8060606	0.7880000	0.0253385	0.8193069	0.0191517	-0.0313069
34	0.8000000	0.7760000	0.0264033	0.8168317	0.0194113	-0.0408317
35	0.6863636	0.5680000	0.0315425	0.7574257	0.0216400	-0.1894257
36	0.7833333	0.8040000	0.0249706	0.7722772	0.0210344	0.0317228
37	0.7863636	0.7240000	0.0285411	0.8242574	0.0188573	-0.1002574
38	0.7287879	0.7360000	0.0277420	0.7252475	0.0219598	0.0107525
39	0.9015152	0.8520000	0.0224068	0.9306931	0.0126083	-0.0786931
40	0.6000000	0.6280000	0.0303796	0.5841584	0.0245712	0.0438416
41	0.8000000	0.7960000	0.0253619	0.8044554	0.0198442	-0.0084554
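
The per-gender columns above could be produced by splitting one question's scores by gender and applying the same bootstrap to each group. The sketch below runs on simulated students: the data frame `students`, its column names, the group sizes, and the success rates are all hypothetical, and the layout of the real gender file is not shown in this document.

```r
library(boot)

bootfunc <- function(d, i) mean(d[i])

set.seed(7)
# Hypothetical data: one row per student, a gender label and a 0/1 score.
students <- data.frame(
  gender = rep(c("F", "M"), times = c(240, 380)),
  score  = c(rbinom(240, 1, 0.70), rbinom(380, 1, 0.78))
)

by_gender <- split(students$score, students$gender)

pct <- sapply(by_gender, mean)
se  <- sapply(by_gender, function(x) sd(boot(x, bootfunc, 2000)$t))

diff_FM <- pct[["F"]] - pct[["M"]]
se_diff_FM <- sqrt(se[["F"]]^2 + se[["M"]]^2)  # assumes independent groups

print(rbind(pct, se))
print(c(diff_FM = diff_FM, se_diff_FM = se_diff_FM))
```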

Reordered plot by order of appearance in Winter 2018 exams.

Plot difference in proportion of students that answered each question correctly between males and females in Winter 2018.

The orange point at the far right represents the average percent correct difference between females and males (F - M). On average, female students scored about 5 percentage points lower than male students on the repeated questions in Winter 2018.

Reordered plot by order of appearance in Winter 2018 exams.

The rankings here are the same ones as before. 1 is the hardest (lowest percent correct), 41 is the easiest.

Distribution of percent correct difference between females and males in the Winter 2018 term.

Once we have the gender data from the original terms, we can compare to see if this distribution is consistent with or different from the previous terms.

Next Steps

There was an overall shift of about 3 percentage points in performance from the original terms to Winter 2018. How much of this shift can be accounted for by the extended time on exams?

- Look at whether differences in performance show up across class sections, or across years for the same instructor.
- Look at standardized test scores to see whether student abilities are different.
  - Can determine the (linear) relationship between standardized scores and the probability of getting these exam questions correct
  - Can match students by standardized test scores to compare performance
- Look at gain scores to see how much scores shifted relative to how much they could have shifted. For example, an OG percent correct of 80% can go up by at most 20 percentage points in Winter 2018. How much of that did it actually go up?
  - (final score - initial score) / (100 - initial score) for each question and compare
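
The gain score formula above can be applied directly to the percent correct table. The sketch below uses rows 34, 28, and 5 as examples. Because the table stores proportions rather than percentages, the denominator is `1 - initial` instead of `100 - initial`.

```r
# Rows 34, 28 and 5 of the percent correct table (proportions, not percents).
Number  <- c(34, 28, 5)
pct_og  <- c(0.5076709, 0.8730823, 0.7863372)  # initial score (original term)
pct_new <- c(0.8000000, 0.7056075, 0.7842190)  # final score (Winter 2018)

# Fraction of the available headroom that was actually gained.
gain <- (pct_new - pct_og) / (1 - pct_og)

print(data.frame(Number, pct_og, pct_new, gain))
```

When a score drops, the gain is negative and can fall below -1, because the denominator measures only the room to improve. A question that starts near 100% and then declines will therefore produce a large negative value, which is worth keeping in mind when comparing gains across questions.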