available at: http://rpubs.com/kirstenz/24661
with Kay and Louise Friday 8 August
Students who report looking at lecture recordings, do they?
Student plans don’t come to fruition (ref)
-> so if students didn’t say they use lecture recordings and now plan to (in response to weakness or Mid-Sem), does access change? For how long?
=> by eye, lectures were tues arvo and wed morning and access peaks tue, wed and thurs (can visualise with calendarHeat figures)
how do students prepare for classes => do students look at prac videos
=> by eye Thurs/Fri for Fri prac,
how many of the 5 pracs with videos (as how many times/weeks they access)
are there changes over time ie students stop looking when they realise not so useful
actually figures give how many students access how often first, but calculations do determine which students look, and when
data in “Folder access across semester.xls” moved to “LectAccess.csv”
clean - remove name column, remove empty rows (233-678)
move total column and total row to new vectors, and remove
clean - keep only consenting students; output reports number of students by number of variables
## [1] 230 116
## [1] 99 116
clean - De-ID students so can push to html
basic structure of data
## StudentID X4.03.14 X5.03.14 X6.03.14 X7.03.14
## 1 S6089847 1 1 0 0
## 3 S8117889 0 1 4 0
## 4 S8118323 2 0 0 0
## 5 S8152093 0 0 0 0
## 6 S8239113 1 1 0 0
## 7 S8283571 0 2 0 0
## 9 S8395419 1 0 0 0
## 12 S8407099 3 8 4 0
## 13 S8408815 0 0 0 0
## 14 S8465063 0 0 1 0
dimensions (rows by columns)
## [1] 99 116
Number of students who looked at lectures x number of times
clipped x axis at 100 access clicks to zoom into lower end
Converted x axis to log to spread clumped data into normal-ish curve
NB Log scale:
0 = 1
1 ~ 3
2 ~ 7
3 ~ 20
4 ~ 55
5 ~ 150
6 ~ 403
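The tick values above are consistent with a natural-log axis. A quick check, assuming natural log was used (the plotting code itself is not shown, and the names `log_ticks` and `raw_counts` are made up for this sketch):

```r
# Back-transform natural-log axis ticks to raw access counts.
log_ticks  <- 0:6
raw_counts <- round(exp(log_ticks))
data.frame(log_ticks, raw_counts)
```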
Working out viewings by day - number of times folder accessed per day (access.day), number of students who access each day (stud.day)…
Working out number of times (access.stud) and number of days (days.stud) each student accessed…
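A minimal sketch of how these four vectors could be derived, using a made-up 3-student table (the real table has 99 students and 114 days; the code that produced the results here is not shown, so this is only one plausible way to do it):

```r
# Toy access-count table: one row per student, one column per day.
# Values are made up for illustration.
la <- data.frame(
  StudentID = c("S1", "S2", "S3"),
  X4.03.14  = c(1, 0, 2),
  X5.03.14  = c(1, 1, 0),
  X6.03.14  = c(0, 4, 0)
)
counts <- as.matrix(la[, -1])       # drop the ID column, keep the daily counts

access.day  <- colSums(counts)      # folder openings per day
stud.day    <- colSums(counts > 0)  # students with at least one opening per day
access.stud <- rowSums(counts)      # folder openings per student
days.stud   <- rowSums(counts > 0)  # days with at least one opening per student
```

Summing by day or by student counts the same cells, so `sum(access.day)` equals `sum(access.stud)`, and `sum(stud.day)` equals `sum(days.stud)`.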
Loading ‘describe’ function to get descriptive stats…
Descriptive stats for viewings by day and by student:
Number of times lecture recording folder was accessed per day
## min max median mean SD SEM n NAs sum
## 0.0 162.0 30.0 40.1 31.3 2.9 114.0 0.0 4570.0
Number of students who accessed lecture recordings each day
## min max median mean SD SEM n NAs sum
## 0.0 43.0 13.0 14.9 9.9 0.9 114.0 0.0 1697.0
Number of times each student accessed the lecture recordings
## min max median mean SD SEM n NAs sum
## 4.0 194.0 35.0 46.2 35.0 3.5 99.0 0.0 4570.0
Number of days each student accessed the lecture recordings
## min max median mean SD SEM n NAs sum
## 3.0 41.0 16.0 17.1 8.5 0.9 99.0 0.0 1697.0
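The ‘describe’ helper itself is not shown. A hypothetical stand-in that returns the same columns, assuming SEM = SD / sqrt(n) (which agrees with the figures above, e.g. 31.3 / sqrt(114) is about 2.9):

```r
# Hypothetical stand-in for the 'describe' helper (the original is not shown).
sem <- function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))

describe <- function(x) {
  round(c(
    min    = min(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    mean   = mean(x, na.rm = TRUE),
    SD     = sd(x, na.rm = TRUE),
    SEM    = sem(x),
    n      = sum(!is.na(x)),   # assumption: n counts non-missing values
    NAs    = sum(is.na(x)),
    sum    = sum(x, na.rm = TRUE)
  ), 1)
}

d <- describe(c(3, 5, 7, NA, 9))
d
```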
Useful conclusions: 114 days (16 weeks, 2 days) in data for 99 consenting students (cohort 231)
There is a large range in the number of access hits recorded for each student each day (0 to 22). Overall, the number of access hits per day is 2-3x the number of students who access per day, and the number of access hits per student is also 2-3x the number of days a student accesses the folder.
Since we don’t really know how the number of folder openings is tracked by Blackboard (could be refreshings), the number of students is probably a better way of looking at the data than number of times the folder is ‘opened’.
On average 15 +/- 1 (mean+/-SEM) students accessed each day, with a max of 43 students one day (1/5/14).
On average students accessed lecture recordings on 17 days, with a max of 41 days and a minimum of 3 days. So there were no students who didn’t access lecture recordings at all?
## days.stud
## 3 4 5 6 7 8 9 10 12 13 14 15 16 17 18 19 20 21 22 23 25 26 27 28 29
## 1 1 3 3 6 1 5 8 5 5 5 6 1 6 6 1 5 3 4 3 3 4 1 1 1
## 30 31 32 33 34 35 41
## 2 1 1 1 3 2 1
So, only 1 student looked on three days… only 1 student looked on 41 days, the majority looked on 7-26 days. Or as a histogram:
Transposed data (lat) so we can get real dates…
## [1] 114 100
## Date S8530605 S8636955 S8475915 S8645607
## 1 2014-03-04 0 1 3 2
## 2 2014-03-05 1 2 8 1
## 3 2014-03-06 0 4 0 0
## 4 2014-03-07 0 2 0 0
## 5 2014-03-08 0 2 0 0
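One way to get real dates from the column names, sketched under the assumption that the names are `X` plus day.month.two-digit-year as in the earlier output (`day_names` is a made-up vector for illustration):

```r
# Column names as read by read.csv(): an "X" prefix plus d.mm.yy.
day_names <- c("X4.03.14", "X5.03.14", "X25.06.14")

# Strip the prefix, then parse day.month.two-digit-year into Date objects.
dates <- as.Date(sub("^X", "", day_names), format = "%d.%m.%y")
format(dates)

# Span of the log in days, counting both end dates.
as.numeric(diff(range(dates))) + 1
```

The last line gives the number of days from 4 March to 25 June 2014 inclusive, which can be compared with the 114 rows of the transposed data.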
Summed numbers of times the lecture recording folder was accessed and number of students who accessed lecture recording folder per day…
Loaded calendarHeat function…
Calendar of number of times lecture recording folder was accessed each day
Calendar of number of students who accessed lecture recordings each day
Built data frame with student ID and T/F for access each day… NB created 2 data frames: la.norm 0 = not accessed, 1 = accessed; la.norm2 1 = accessed, NA = not accessed (NA = missing value), but cluster analysis errors with NA => don’t use data with missing values for cluster analysis
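A minimal sketch of the two encodings on made-up counts. It uses plain matrices rather than the data frames with a StudentID column described above, so treat it as an illustration only:

```r
# Toy counts (made up): rows are students, columns are days.
counts <- matrix(c(1, 0, 2,
                   0, 4, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("S1", "S2"), c("d1", "d2", "d3")))

la.norm  <- (counts > 0) * 1            # 1 = accessed, 0 = not accessed
la.norm2 <- ifelse(counts > 0, 1, NA)   # 1 = accessed, NA = not accessed

dist(la.norm)   # complete 0/1 data gives a usable distance matrix
```

Keeping the 0/1 version avoids missing values in the distance matrix, which is what the clustering step needs.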
Use la.norm to cluster lecture recording access
distances = dist(la.norm[2:115], method = "euclidean")
clusterLA = hclust(distances, method = "ward.D")  # "ward" was renamed "ward.D" in R 3.1.0; same algorithm
plot(clusterLA)
clusterGroups3 = cutree(clusterLA, k = 3)
la.norm$cluster3 = clusterGroups3
dim(la.norm)
## [1] 99 116
la.norm[1:5,1:5]
## StudentID X4.03.14 X5.03.14 X6.03.14 X7.03.14
## 1 S6089847 1 1 0 0
## 3 S8117889 0 1 1 0
## 4 S8118323 1 0 0 0
## 5 S8152093 0 0 0 0
## 6 S8239113 1 1 0 0
la.norm[1:5,110:116]
## X20.06.14 X21.06.14 X22.06.14 X23.06.14 X24.06.14 X25.06.14 cluster3
## 1 0 0 0 0 0 0 1
## 3 1 1 1 1 1 0 2
## 4 1 0 0 0 0 0 1
## 5 0 1 0 0 0 0 3
## 6 0 0 0 0 0 0 1
## [1] "The number of students in each cluster by the the number of variables"
## [1] 26 116
## [1] 36 116
## [1] 37 116
## [1] 0 116
## [1] 0 116
So what are the characteristics of the clusters - how often do students view lecture recordings and when:
## min max median mean SD SEM n NAs sum
## 15.0 41.0 25.5 25.7 7.4 1.5 26.0 0.0 668.0
## min max median mean SD SEM n NAs sum
## 12.0 30.0 18.0 19.1 5.3 0.9 36.0 0.0 687.0
## min max median mean SD SEM n NAs sum
## 3.0 17.0 9.0 9.2 3.5 0.6 37.0 0.0 342.0
## Warning: no non-missing arguments to min; returning Inf
## Warning: no non-missing arguments to max; returning -Inf
## min max median mean SD SEM n NAs sum
## Inf -Inf NA NaN NA NA 0 0 0
## Warning: no non-missing arguments to min; returning Inf
## Warning: no non-missing arguments to max; returning -Inf
## min max median mean SD SEM n NAs sum
## Inf -Inf NA NaN NA NA 0 0 0
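The Inf/-Inf rows and warnings above look like summaries of clusters 4 and 5, which are empty in a 3-cluster solution (0 rows in the dimension check). A minimal sketch, on made-up data, of summarising only the clusters that actually occur:

```r
# Toy data (made up): days accessed per student and a 3-cluster label.
days.stud <- c(25, 30, 18, 20, 9, 7)
cluster3  <- c(1, 1, 2, 2, 3, 3)

# split() only creates groups for labels that occur, so no empty
# subset is ever passed to min() or max().
by_cluster <- split(days.stud, cluster3)
cluster_summary <- sapply(by_cluster, function(x) {
  c(min = min(x), max = max(x), mean = mean(x), n = length(x))
})
cluster_summary
```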
To see ‘when’ need to get cluster groups into lat (transposed version)
Then run calendarHeat for all 3 clusters…
Load in qualitative coding “pattern of lecture recording use ML” -> “qual.csv”
clean - de-identify
clean - capitalisation, converted “no info”, “deferred” and “” to NA (ie missing)
Data structure
## [1] 99 10
## StudentID ML1.previous ML2.planMS ML3.usedMS ML4.planEOS total.no
## 1 S6089847 No Yes No No 3
## 2 S8117889 No No No No 4
## 3 S8118323 Yes No No No 3
## 4 S8152093 Maybe No Maybe No 2
## 5 S8239113 Yes No Yes No 2
## total.yes total.maybe total.noinfo access
## 1 1 0 0 21
## 2 0 0 0 75
## 3 1 0 0 122
## 4 0 2 0 13
## 5 2 0 0 89
## ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
## Maybe:18 Maybe: 5 Maybe: 4 Maybe: 3
## No :52 No :77 No :69 No :71
## Yes :27 Yes :15 Yes :23 Yes :17
## NA's : 2 NA's : 2 NA's : 3 NA's : 8
Patterns of self-reported lecture recording use
##
## Maybe No Yes
## 18 52 27
## [1] "ML1.previous"
##
## Maybe No Yes
## 5 77 15
## [1] "ML2.planMS"
##
## Maybe No Yes
## 4 69 23
## [1] "ML3.usedMS"
##
## Maybe No Yes
## 3 71 17
## [1] "ML4.planEOS"
## ML2.planMS
## ML1.previous Maybe No Yes Sum
## Maybe 1 16 0 17
## No 2 40 9 51
## Yes 2 19 6 27
## Sum 5 75 15 95
## ML3.usedMS
## ML2.planMS Maybe No Yes Sum
## Maybe 0 4 0 4
## No 4 57 14 75
## Yes 0 6 9 15
## Sum 4 67 23 94
Concl:
Most students (52/99, i.e. 53%) report that they don’t usually use lecture recordings. Even more didn’t plan to use lecture recordings for the mid-semester exam (77/99), a similar number didn’t use them for the mid-semester exam (69/99), and much the same held for the end of semester exam (71/99).
This seems inconsistent with the number of students who do use lecture recordings (all 99 at some point), and the majority used lecture recordings on 7-26 days, which is still half to twice the number of weeks in semester so ~ once/fortnight to twice/week.
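The per-week rate behind that estimate can be worked through directly, using the 114 days in the data and the 7-26 day range (variable names are made up for this sketch):

```r
days_in_data <- 114                  # days covered by the access log
weeks <- days_in_data / 7            # a little over 16 weeks
per_week <- c(low = 7, high = 26) / weeks
round(per_week, 2)                   # access days per week at each end of the range
```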
What are the patterns of No, No, No, No etc, similar to what Kay calculated as number of No’s, Yes’, Maybe’s (tables have the number of no’s 0-4 in header row, then frequency (number of students) in 2nd row)
##
## 0 1 2 3 4
## 2 15 21 32 29
## [1] "total.no"
##
## 0 1 2 3 4
## 52 27 7 11 2
## [1] "total.yes"
##
## 0 1 2
## 72 24 3
## [1] "total.maybe"
Most frequent patterns of response:
##
##         pattern  n
##  No Yes Yes Yes  3
##   Yes No No Yes  3
##  Yes Yes Yes No  3
##    No Yes No No  4
##  Yes No Yes Yes  4
##    Yes No No No  8
##  Maybe No No No  9
##     No No No No 29
everything else was reported by 2 or fewer students.
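A sketch of how such pattern counts could be built, assuming the pattern is simply the four answers pasted together (toy responses, using the column names from the data structure shown earlier):

```r
# Toy responses (made up) using the document's column names.
qual <- data.frame(
  ML1.previous = c("No", "Yes", "No", "No"),
  ML2.planMS   = c("No", "No",  "No", "Yes"),
  ML3.usedMS   = c("No", "No",  "No", "No"),
  ML4.planEOS  = c("No", "No",  "No", "No")
)

# Concatenate the four answers into one pattern string per student,
# then count how many students share each pattern (least common first).
qual$pattern <- with(qual, paste(ML1.previous, ML2.planMS, ML3.usedMS, ML4.planEOS))
pattern_counts <- sort(table(qual$pattern))
pattern_counts
```

Note that `paste()` turns a missing answer into the string "NA", so students with missing responses form their own patterns.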
So there is definitely a group of 29 students who never report using lecture recordings (LR). There are 27 students who report that they usually used LR.
Of these, 6 plan to use LR for mid-sem, 2 maybes, and 19 don’t mention LR for mid-sem prep. There are 52 students who don’t report usually using LR (51 in the cross-table, since one has no ML2 response). Of these, 9 plan to use LR for mid-sem, 2 maybes, and 40 don’t mention LR for mid-sem prep.
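Row totals can differ between a one-way count and a cross-table because `table()` drops missing values by default. A small made-up example of that behaviour:

```r
# Toy data (made up): one student has no ML2 response.
ML1.previous <- c("No", "No", "No", "Yes", "Yes")
ML2.planMS   <- c("No", "Yes", NA,  "No",  "Yes")

one_way <- table(ML1.previous)                       # counts all five students
xtab    <- addmargins(table(ML1.previous, ML2.planMS))
one_way
xtab                                                 # the student with NA is dropped
```

Passing `useNA = "ifany"` to `table()` would keep the missing response as its own column instead.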
## [1] 99 116
## StudentID X4.03.14 X5.03.14 X6.03.14 X7.03.14
## 1 S6089847 1 1 0 0
## 3 S8117889 0 1 1 0
## 4 S8118323 1 0 0 0
## 5 S8152093 0 0 0 0
## 6 S8239113 1 1 0 0
## X20.06.14 X21.06.14 X22.06.14 X23.06.14 X24.06.14 X25.06.14 cluster3
## 1 0 0 0 0 0 0 1
## 3 1 1 1 1 1 0 2
## 4 1 0 0 0 0 0 1
## 5 0 1 0 0 0 0 3
## 6 0 0 0 0 0 0 1
## [1] 99 126
## StudentID X4.03.14 X5.03.14 X6.03.14 X7.03.14
## 1 S6089847 1 1 0 0
## 2 S8117889 0 1 1 0
## 3 S8118323 1 0 0 0
## 4 S8152093 0 0 0 0
## 5 S8239113 1 1 0 0
## X25.06.14 cluster3 ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
## 1 0 1 No Yes No No
## 2 0 2 No No No No
## 3 0 1 Yes No No No
## 4 0 3 Maybe No Maybe No
## 5 0 1 Yes No Yes No
## total.no total.yes total.maybe total.noinfo access pattern
## 1 3 1 0 0 21 No Yes No No
## 2 4 0 0 0 75 No No No No
## 3 3 1 0 0 122 Yes No No No
## 4 2 0 2 0 13 Maybe No Maybe No
## 5 2 2 0 0 89 Yes No Yes No
## [1] 99 128
## X25.06.14 cluster3 ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
## 1 0 1 No Yes No No
## 2 0 2 No No No No
## 3 0 1 Yes No No No
## 4 0 3 Maybe No Maybe No
## 5 0 1 Yes No Yes No
## total.no total.yes total.maybe total.noinfo access pattern
## 1 3 1 0 0 21 No Yes No No
## 2 4 0 0 0 75 No No No No
## 3 3 1 0 0 122 Yes No No No
## 4 2 0 2 0 13 Maybe No Maybe No
## 5 2 2 0 0 89 Yes No Yes No
## prevLR access.days
## 1 No 15
## 2 No 26
## 3 Yes 33
## 4 Yes 10
## 5 Yes 28
Statistical tests:
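Wilcoxon rank sum test (i.e. the non-parametric counterpart of the unpaired t test, for ordinal or non-normally distributed data)
Do students who report usually using LR access LR more? First as number of folder openings, then as number of days (order is test, mean, sem). NB prevLR groups ML1.previous ‘Yes’ and ‘Maybe’ together as Yes (45 students) against No (52 students).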
##
## Wilcoxon rank sum test with continuity correction
##
## data: access by prevLR
## W = 739, p-value = 0.00184
## alternative hypothesis: true location shift is not equal to 0
## No Yes
## 37.9 56.2
## [1] 4.796
## [1] 5.033
## No Yes
## 4.8 5.0
##
## Wilcoxon rank sum test with continuity correction
##
## data: access.days by prevLR
## W = 790, p-value = 0.005989
## alternative hypothesis: true location shift is not equal to 0
## No Yes
## 14.96 19.82
## [1] 1.078
## [1] 1.312
## No Yes
## 1.078 1.312
Do students who report usually using LR, fall into different clusters? (order is test, table, mean, sem)
##
## Wilcoxon rank sum test with continuity correction
##
## data: cluster3 by prevLR
## W = 1474, p-value = 0.01951
## alternative hypothesis: true location shift is not equal to 0
## cluster3
## prevLR 1 2 3 Sum
## No 9 19 24 52
## Yes 16 17 12 45
## Sum 25 36 36 97
## No Yes
## 2.29 1.91
## [1] 0.104
## [1] 0.1182
## No Yes
## 0.10 0.12
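A minimal sketch of the test/mean/sem sequence on made-up data. The `toy` data frame and this `sem` helper are assumptions for illustration, not the original code:

```r
# Toy data (made up): total folder openings and a two-level grouping factor.
set.seed(1)
toy <- data.frame(
  access = c(rpois(20, 35), rpois(20, 55)),
  prevLR = factor(rep(c("No", "Yes"), each = 20))
)

sem <- function(x) sd(x) / sqrt(length(x))   # standard error of the mean

# Two-sided rank sum test; exact = FALSE uses the normal approximation
# with continuity correction, which also copes with tied values.
wt <- wilcox.test(access ~ prevLR, data = toy, exact = FALSE)
wt
with(toy, tapply(access, prevLR, mean))      # group means
with(toy, tapply(access, prevLR, sem))       # group SEMs
```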
1 Previous and did >3 y: yyyy ynyy yyny yyyn ynyn
2 Previous/intended, but did not: ynnn yynn ynny -> 3
3 No previous use, but then did or intended: nnyy nyyy nnny -> 2
4 No previous use, intention but not: nyny nynn
5 No report: nnnn
0 Don’t fit?
Load “Kay.gp.index.csv” (fixed for paper version of group names, i.e. 2 and 3 swapped)
clean - de-identified
merge into la.norm.qual
## X25.06.14 cluster3 ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
## 1 0 1 No Yes No No
## 2 0 2 No No No No
## 3 0 1 Yes No No No
## 4 0 3 Maybe No Maybe No
## 5 0 1 Yes No Yes No
## total.no total.yes total.maybe total.noinfo access pattern
## 1 3 1 0 0 21 No Yes No No
## 2 4 0 0 0 75 No No No No
## 3 3 1 0 0 122 Yes No No No
## 4 2 0 2 0 13 Maybe No Maybe No
## 5 2 2 0 0 89 Yes No Yes No
## prevLR access.days Kay.pattern
## 1 No 15 4
## 2 No 26 5
## 3 Yes 33 3
## 4 Yes 10 0
## 5 Yes 28 1
##
## Maybe Maybe No No Maybe NA No No Maybe No Maybe No Maybe No NA No
## 0 1 1 2 1
## 1 0 0 0 0
## 2 0 0 0 0
## 3 0 0 0 0
## 4 0 0 0 0
## 5 0 0 0 0
##
## Maybe No No NA Maybe No No No Maybe No Yes NA Maybe No Yes No
## 0 1 0 0 0
## 1 0 0 1 2
## 2 0 0 0 0
## 3 0 0 0 0
## 4 0 0 0 0
## 5 0 9 0 0
##
## NA No No No NA No No Yes No Maybe No No No NA No No No No Maybe No
## 0 1 1 2 0 0
## 1 0 0 0 0 0
## 2 0 0 0 0 0
## 3 0 0 0 0 0
## 4 0 0 0 0 0
## 5 0 0 0 1 1
##
## No No Maybe Yes No No NA No No No No NA No No No No No No No Yes
## 0 0 1 0 0 0
## 1 0 0 0 0 0
## 2 1 0 0 0 1
## 3 0 0 0 0 0
## 4 0 0 0 0 0
## 5 0 0 2 29 0
##
## No No Yes NA No No Yes No No No Yes Yes No Yes No Maybe No Yes No No
## 0 0 0 0 0 0
## 1 0 0 0 0 0
## 2 2 2 1 0 0
## 3 0 0 0 0 0
## 4 0 0 0 1 4
## 5 0 0 0 0 0
##
## No Yes Yes Maybe No Yes Yes Yes Yes Maybe NA No Yes Maybe No No
## 0 0 0 1 1
## 1 0 0 0 0
## 2 1 3 0 0
## 3 0 0 0 0
## 4 0 0 0 0
## 5 0 0 0 0
##
## Yes No No Maybe Yes No No NA Yes No No No Yes No No Yes Yes No Yes NA
## 0 0 1 0 0 1
## 1 0 0 0 0 0
## 2 0 0 0 0 0
## 3 1 0 8 3 0
## 4 0 0 0 0 0
## 5 0 0 0 0 0
##
## Yes No Yes No Yes No Yes Yes Yes Yes No Yes Yes Yes Yes No
## 0 0 0 0 0
## 1 1 4 1 3
## 2 0 0 0 0
## 3 0 0 0 0
## 4 0 0 0 0
## 5 0 0 0 0
##
## Yes Yes Yes Yes
## 0 0
## 1 2
## 2 0
## 3 0
## 4 0
## 5 0
Alignment Kay gp with cluster3
## Kay.pattern
## cluster3 0 1 2 3 4 5
## 1 3 5 3 5 1 9
## 2 2 8 6 4 4 12
## 3 10 1 2 3 0 21
What do the 3 clusters look like?
##
## 1 2 3
## 26 36 37
Alignment between 3 clusters and self report
## ML1.previous
## cluster3 Maybe No Yes
## 1 6 9 10
## 2 5 19 12
## 3 7 24 5
## ML2.planMS
## cluster3 Maybe No Yes
## 1 0 20 6
## 2 1 28 7
## 3 4 29 2
## ML3.usedMS
## cluster3 Maybe No Yes
## 1 0 18 8
## 2 2 21 13
## 3 2 30 2
## ML4.planEOS
## cluster3 Maybe No Yes
## 1 2 14 6
## 2 1 24 8
## 3 0 33 3
## total.no
## cluster3 0 1 2 3 4 Sum
## 1 0.50000 0.16667 0.11905 0.15625 0.06897 0.13131
## 2 0.00000 0.23333 0.23810 0.17188 0.13793 0.18182
## 3 0.00000 0.10000 0.14286 0.17188 0.29310 0.18687
## Sum 0.50000 0.50000 0.50000 0.50000 0.50000 0.50000
##
## cluster3 FALSE TRUE Sum
## 1 12 14 26
## 2 17 19 36
## 3 9 28 37
## Sum 38 61 99
## total.yes
## cluster3 0 1 2 3 4
## 1 10 9 2 3 2
## 2 12 14 4 6 0
## 3 30 4 1 2 0
Kay’s rules for 3 groups:
1 = 3-4 y
2 = any 2 y + 2 n, 2n+y+m, 2y+n+m
3 = 3-4 n
0 = no info
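These rules could be coded roughly as below. This is a hypothetical sketch: `kay3` is a made-up name, it assumes that anything that is not 3-4 yes or 3-4 no falls into group 2, and it does not handle the no-info group.

```r
# Hypothetical coding of the three-group rules from the yes/no tallies.
# Assumption: whatever is not 3-4 "yes" or 3-4 "no" is group 2;
# the "no info" group (0) is not handled here.
kay3 <- function(total.yes, total.no) {
  ifelse(total.yes >= 3, 1,
         ifelse(total.no >= 3, 3, 2))
}

groups <- kay3(total.yes = c(1, 0, 1, 0, 2, 4),
               total.no  = c(3, 4, 3, 2, 2, 0))
groups
```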
## [1] "3" "3" "3" "" "" "3" "3" "1" "3" "3" "3" "" "1" "" "3"
##
## 1 2 3 Sum
## 2 13 23 61 99
## [1] 23 130
## ML1.previous ML2.planMS ML3.usedMS ML4.planEOS total.no total.yes
## 4 Maybe No Maybe No 2 0
## 5 Yes No Yes No 2 2
## 12 Yes No Yes <NA> 1 2
## 14 Maybe Maybe No No 2 0
## 16 No Yes Yes Maybe 1 2
## 18 Maybe No Yes No 2 1
## 19 Yes No No Yes 2 2
## 23 Maybe No Yes No 2 1
## 25 Maybe No Maybe No 2 0
## 33 Maybe <NA> No No 2 0
## 42 No Yes No Maybe 2 1
## 45 Yes No No Yes 2 2
## 46 No No Yes <NA> 2 1
## 53 <NA> No No Yes 2 1
## 59 Maybe No No <NA> 2 0
## 60 Yes No No <NA> 2 1
## 63 Yes No No Maybe 2 1
## 70 No No Maybe Yes 2 1
## 71 Yes Maybe No No 2 1
## 74 No No Yes <NA> 2 1
## 79 Yes No No Yes 2 2
## 82 Maybe No <NA> No 2 0
## 99 No No Yes Yes 2 2
## [1] 8636869
## [1] 60
## StudentID ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
## 60 S8636869 Yes No No <NA>
## ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
## 1 No Yes No No
## 2 No No No No
## 3 Yes No No No
## 4 Maybe No Maybe No
## 5 Yes No Yes No
## 6 No No No No
## 7 No No No No
## 8 No Yes Yes Yes
## 9 Yes No No No
## 10 No Yes No No
## 11 Maybe No No No
## 12 Yes No Yes <NA>
## 13 Yes Yes Yes Yes
## 14 Maybe Maybe No No
## 15 No No No No
## 16 No Yes Yes Maybe
## 17 Yes No No No
## 18 Maybe No Yes No
## 19 Yes No No Yes
## 20 Yes No No No
##
## FALSE TRUE
## 381 15
##
## FALSE TRUE
## 351 30
## cluster3
## Kay3 1 2 3 Sum
## 1 0 1 2
## 1 5 6 2 13
## 2 6 11 6 23
## 3 14 19 28 61
## Sum 26 36 37 99
Trying a 2 cluster solution
Use la.norm to cluster lecture recording access
distances = dist(la.norm[2:115], method = "euclidean")
clusterLA = hclust(distances, method = "ward.D")  # "ward" was renamed "ward.D" in R 3.1.0; same algorithm
plot(clusterLA)
clusterGroups2 = cutree(clusterLA, k = 2)
la.norm$cluster2 = clusterGroups2
dim(la.norm)
## [1] 99 117
la.norm[1:5,1:5]
## StudentID X4.03.14 X5.03.14 X6.03.14 X7.03.14
## 1 S6089847 1 1 0 0
## 3 S8117889 0 1 1 0
## 4 S8118323 1 0 0 0
## 5 S8152093 0 0 0 0
## 6 S8239113 1 1 0 0
la.norm[1:5,ncol(la.norm)]
## [1] 1 1 1 2 1
la.norm[1:5,115:117]
## X25.06.14 cluster3 cluster2
## 1 0 1 1
## 3 0 2 1
## 4 0 1 1
## 5 0 3 2
## 6 0 1 1
addmargins(with(la.norm, table(cluster3, cluster2)))
## cluster2
## cluster3 1 2 Sum
## 1 26 0 26
## 2 36 0 36
## 3 0 37 37
## Sum 62 37 99
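The cross-table shows the 2-cluster solution simply merging clusters 1 and 2 of the 3-cluster solution. That is expected when both cuts come from the same tree. A sketch on random toy data (not the real access matrix):

```r
# Toy 0/1 access matrix (random, for illustration only).
set.seed(42)
toy <- matrix(rbinom(30 * 20, size = 1, prob = 0.3), nrow = 30)

hc <- hclust(dist(toy, method = "euclidean"), method = "ward.D")
k3 <- cutree(hc, k = 3)
k2 <- cutree(hc, k = 2)

nesting <- table(k3, k2)
nesting   # each 3-cluster row has students in exactly one 2-cluster column
```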
moving cluster2 over to la.norm.qual
Kay.office = la.norm
df = Kay.office
dim(df)
## [1] 99 117
df = cbind(df$StudentID, df[117])
dim(df)
## [1] 99 2
df[1:5,]
## df$StudentID cluster2
## 1 S6089847 1
## 3 S8117889 1
## 4 S8118323 1
## 5 S8152093 2
## 6 S8239113 1
Kay.office = df
dim(Kay.office)
## [1] 99 2
Kay.office[1:5,]
## df$StudentID cluster2
## 1 S6089847 1
## 3 S8117889 1
## 4 S8118323 1
## 5 S8152093 2
## 6 S8239113 1
names(Kay.office) = c("StudentID", "cluster2")
la.norm.qual = merge(la.norm.qual, Kay.office, by="StudentID")
dim(la.norm.qual)
## [1] 99 131
la.norm.qual[1:5,1:5]
## StudentID X4.03.14 X5.03.14 X6.03.14 X7.03.14
## 1 S6089847 1 1 0 0
## 2 S8117889 0 1 1 0
## 3 S8118323 1 0 0 0
## 4 S8152093 0 0 0 0
## 5 S8239113 1 1 0 0
la.norm.qual[1:5,125:131]
## access pattern prevLR access.days Kay.pattern Kay3 cluster2
## 1 21 No Yes No No No 15 4 3 1
## 2 75 No No No No No 26 5 3 1
## 3 122 Yes No No No Yes 33 3 3 1
## 4 13 Maybe No Maybe No Yes 10 0 2 2
## 5 89 Yes No Yes No Yes 28 1 2 1
with(la.norm.qual, table(Kay3, cluster2))
## cluster2
## Kay3 1 2
## 1 1
## 1 11 2
## 2 17 6
## 3 33 28
addmargins(with(la.norm.qual, table(total.yes, cluster2)))
## cluster2
## total.yes 1 2 Sum
## 0 22 30 52
## 1 23 4 27
## 2 6 1 7
## 3 9 2 11
## 4 2 0 2
## Sum 62 37 99
with(la.norm.qual, tapply(access, cluster2, mean))
## 1 2
## 62.50 18.78
with(la.norm.qual, tapply(access, cluster2, sem))
## [1] 4.276
## [1] 2.205
## 1 2
## 4.276 2.205
with(la.norm.qual, tapply(access.days, cluster2, mean))
## 1 2
## 21.855 9.243
with(la.norm.qual, tapply(access.days, cluster2, sem))
## [1] 0.8915
## [1] 0.57
## 1 2
## 0.8915 0.5700
la.norm.qual[1:5,115:131]
## X25.06.14 cluster3 ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
## 1 0 1 No Yes No No
## 2 0 2 No No No No
## 3 0 1 Yes No No No
## 4 0 3 Maybe No Maybe No
## 5 0 1 Yes No Yes No
## total.no total.yes total.maybe total.noinfo access pattern
## 1 3 1 0 0 21 No Yes No No
## 2 4 0 0 0 75 No No No No
## 3 3 1 0 0 122 Yes No No No
## 4 2 0 2 0 13 Maybe No Maybe No
## 5 2 2 0 0 89 Yes No Yes No
## prevLR access.days Kay.pattern Kay3 cluster2
## 1 No 15 4 3 1
## 2 No 26 5 3 1
## 3 Yes 33 3 3 1
## 4 Yes 10 0 2 2
## 5 Yes 28 1 2 1
addmargins(with(la.norm.qual, table(ML1.previous, cluster2)))
## cluster2
## ML1.previous 1 2 Sum
## Maybe 11 7 18
## No 28 24 52
## Yes 22 5 27
## Sum 61 36 97
addmargins(with(la.norm.qual, table(ML2.planMS, cluster2)))
## cluster2
## ML2.planMS 1 2 Sum
## Maybe 1 4 5
## No 48 29 77
## Yes 13 2 15
## Sum 62 35 97
addmargins(with(la.norm.qual, table(ML3.usedMS, cluster2)))
## cluster2
## ML3.usedMS 1 2 Sum
## Maybe 2 2 4
## No 39 30 69
## Yes 21 2 23
## Sum 62 34 96
addmargins(with(la.norm.qual, table(ML4.planEOS, cluster2)))
## cluster2
## ML4.planEOS 1 2 Sum
## Maybe 3 0 3
## No 38 33 71
## Yes 14 3 17
## Sum 55 36 91
addmargins(with(la.norm.qual, table(ML1.previous == "Yes", cluster2)))
## cluster2
## 1 2 Sum
## FALSE 39 31 70
## TRUE 22 5 27
## Sum 61 36 97
addmargins(with(la.norm.qual, table(ML2.planMS == "Yes", cluster2)))
## cluster2
## 1 2 Sum
## FALSE 49 33 82
## TRUE 13 2 15
## Sum 62 35 97
addmargins(with(la.norm.qual, table(ML3.usedMS == "Yes", cluster2)))
## cluster2
## 1 2 Sum
## FALSE 41 32 73
## TRUE 21 2 23
## Sum 62 34 96
addmargins(with(la.norm.qual, table(ML4.planEOS == "Yes", cluster2)))
## cluster2
## 1 2 Sum
## FALSE 41 33 74
## TRUE 14 3 17
## Sum 55 36 91
Then run calendarHeat for both clusters…
## [1] 62 131
## [1] 37 131
## [1] 62 115
## StudentID X4.03.14 X5.03.14 X6.03.14 X7.03.14
## 1 S6089847 1 1 0 0
## 2 S8117889 0 1 1 0
## 3 S8118323 1 0 0 0
## 5 S8239113 1 1 0 0
## 6 S8283571 0 1 0 0
## X20.06.14 X21.06.14 X22.06.14 X23.06.14 X24.06.14 X25.06.14
## 1 0 0 0 0 0 0
## 2 1 1 1 1 1 0
## 3 1 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 1 0
## [1] "matrix"
## [1] 115 62
## 1 2 3 5 6
## StudentID "S6089847" "S8117889" "S8118323" "S8239113" "S8283571"
## X4.03.14 "1" "0" "1" "1" "0"
## X5.03.14 "1" "1" "0" "1" "1"
## X6.03.14 "0" "1" "0" "0" "0"
## X7.03.14 "0" "0" "0" "0" "0"
## 79 81 86 88 90
## StudentID "S8643917" "S8644267" "S8646161" "S8646489" "S8647069"
## X4.03.14 "0" "0" "1" "1" "1"
## X5.03.14 "1" "0" "1" "1" "1"
## X6.03.14 "1" "1" "1" "0" "0"
## X7.03.14 "0" "0" "0" "0" "0"
## 95 98 99
## StudentID "S8648397" "S8651655" "S8651793"
## X4.03.14 "0" "0" "1"
## X5.03.14 "0" "0" "0"
## X6.03.14 "0" "1" "1"
## X7.03.14 "0" "0" "0"
## [1] "data.frame"
## [1] 115 62
## 1 2 3 5 6
## StudentID S6089847 S8117889 S8118323 S8239113 S8283571
## X4.03.14 1 0 1 1 0
## X5.03.14 1 1 0 1 1
## X6.03.14 0 1 0 0 0
## X7.03.14 0 0 0 0 0
## 79 81 86 88 90 95 98
## StudentID S8643917 S8644267 S8646161 S8646489 S8647069 S8648397 S8651655
## X4.03.14 0 0 1 1 1 0 0
## X5.03.14 1 0 1 1 1 0 0
## X6.03.14 1 1 1 0 0 0 1
## X7.03.14 0 0 0 0 0 0 0
## 99
## StudentID S8651793
## X4.03.14 1
## X5.03.14 0
## X6.03.14 1
## X7.03.14 0
## 79 81 86 88 90 95 98 99 Dates
## X4.03.14 0 0 1 1 1 0 0 1 X4.03.14
## X5.03.14 1 0 1 1 1 0 0 0 X5.03.14
## X6.03.14 1 1 1 0 0 0 1 1 X6.03.14
## X7.03.14 0 0 0 0 0 0 0 0 X7.03.14
## [1] "data.frame"
## [1] 114 63
## 1 2 3 5 6
## X4.03.14 1 0 1 1 0
## X5.03.14 1 1 0 1 1
## X6.03.14 0 1 0 0 0
## X7.03.14 0 0 0 0 0
## X8.03.14 0 0 0 1 0
## 79 81 86 88 90 95 98 99 Dates
## X4.03.14 0 0 1 1 1 0 0 1 X4.03.14
## X5.03.14 1 0 1 1 1 0 0 0 X5.03.14
## X6.03.14 1 1 1 0 0 0 1 1 X6.03.14
## X7.03.14 0 0 0 0 0 0 0 0 X7.03.14
## X8.03.14 0 0 0 0 0 0 0 0 X8.03.14
## X9.03.14 0 0 0 0 0 0 0 0 X9.03.14
## X10.03.14 0 1 0 0 1 0 0 0 X10.03.14
## X11.03.14 0 1 0 1 0 0 0 0 X11.03.14
## X12.03.14 1 0 0 0 0 0 0 0 X12.03.14
## X13.03.14 0 0 0 0 0 0 1 0 X13.03.14
## X14.03.14 1 0 0 0 0 0 0 0 X14.03.14
## X15.03.14 0 0 0 0 0 0 0 0 X15.03.14
## X16.03.14 0 0 0 0 0 0 0 0 X16.03.14
## X17.03.14 0 0 0 0 0 0 0 0 X17.03.14
## X18.03.14 1 0 0 0 0 1 0 0 X18.03.14
## 79 81 86 88 90 95 98 99 Dates Dates2
## X4.03.14 0 0 1 1 1 0 0 1 X4.03.14 4.03.14
## X5.03.14 1 0 1 1 1 0 0 0 X5.03.14 5.03.14
## X6.03.14 1 1 1 0 0 0 1 1 X6.03.14 6.03.14
## X7.03.14 0 0 0 0 0 0 0 0 X7.03.14 7.03.14
## X8.03.14 0 0 0 0 0 0 0 0 X8.03.14 8.03.14
## X9.03.14 0 0 0 0 0 0 0 0 X9.03.14 9.03.14
## X10.03.14 0 1 0 0 1 0 0 0 X10.03.14 10.03.14
## X11.03.14 0 1 0 1 0 0 0 0 X11.03.14 11.03.14
## X12.03.14 1 0 0 0 0 0 0 0 X12.03.14 12.03.14
## X13.03.14 0 0 0 0 0 0 1 0 X13.03.14 13.03.14
## X14.03.14 1 0 0 0 0 0 0 0 X14.03.14 14.03.14
## X15.03.14 0 0 0 0 0 0 0 0 X15.03.14 15.03.14
## X16.03.14 0 0 0 0 0 0 0 0 X16.03.14 16.03.14
## X17.03.14 0 0 0 0 0 0 0 0 X17.03.14 17.03.14
## X18.03.14 1 0 0 0 0 1 0 0 X18.03.14 18.03.14
## chr [1:114] "4.03.14" "5.03.14" "6.03.14" "7.03.14" ...
## 79 81 86 88 90 95 98 99 Dates Dates2
## X4.03.14 0 0 1 1 1 0 0 1 X4.03.14 2014-03-04
## X5.03.14 1 0 1 1 1 0 0 0 X5.03.14 2014-03-05
## X6.03.14 1 1 1 0 0 0 1 1 X6.03.14 2014-03-06
## X7.03.14 0 0 0 0 0 0 0 0 X7.03.14 2014-03-07
## X8.03.14 0 0 0 0 0 0 0 0 X8.03.14 2014-03-08
## X9.03.14 0 0 0 0 0 0 0 0 X9.03.14 2014-03-09
## X10.03.14 0 1 0 0 1 0 0 0 X10.03.14 2014-03-10
## X11.03.14 0 1 0 1 0 0 0 0 X11.03.14 2014-03-11
## X12.03.14 1 0 0 0 0 0 0 0 X12.03.14 2014-03-12
## X13.03.14 0 0 0 0 0 0 1 0 X13.03.14 2014-03-13
## X14.03.14 1 0 0 0 0 0 0 0 X14.03.14 2014-03-14
## X15.03.14 0 0 0 0 0 0 0 0 X15.03.14 2014-03-15
## X16.03.14 0 0 0 0 0 0 0 0 X16.03.14 2014-03-16
## X17.03.14 0 0 0 0 0 0 0 0 X17.03.14 2014-03-17
## X18.03.14 1 0 0 0 0 1 0 0 X18.03.14 2014-03-18
## 79 81 86 88 90 95 98 99 Dates Dates2
## X4.03.14 0 0 1 1 1 0 0 1 X4.03.14 2014-03-04
## X5.03.14 1 0 1 1 1 0 0 0 X5.03.14 2014-03-05
## X6.03.14 1 1 1 0 0 0 1 1 X6.03.14 2014-03-06
## X7.03.14 0 0 0 0 0 0 0 0 X7.03.14 2014-03-07
## X8.03.14 0 0 0 0 0 0 0 0 X8.03.14 2014-03-08
## X9.03.14 0 0 0 0 0 0 0 0 X9.03.14 2014-03-09
## X10.03.14 0 1 0 0 1 0 0 0 X10.03.14 2014-03-10
## X11.03.14 0 1 0 1 0 0 0 0 X11.03.14 2014-03-11
## X12.03.14 1 0 0 0 0 0 0 0 X12.03.14 2014-03-12
## X13.03.14 0 0 0 0 0 0 1 0 X13.03.14 2014-03-13
## X14.03.14 1 0 0 0 0 0 0 0 X14.03.14 2014-03-14
## X15.03.14 0 0 0 0 0 0 0 0 X15.03.14 2014-03-15
## X16.03.14 0 0 0 0 0 0 0 0 X16.03.14 2014-03-16
## X17.03.14 0 0 0 0 0 0 0 0 X17.03.14 2014-03-17
## X18.03.14 1 0 0 0 0 1 0 0 X18.03.14 2014-03-18
## 'data.frame': 114 obs. of 5 variables:
## $ 1: chr "1" "1" "0" "0" ...
## $ 2: chr "0" "1" "1" "0" ...
## $ 3: chr "1" "0" "0" "0" ...
## $ 5: chr "1" "1" "0" "0" ...
## $ 6: chr "0" "1" "0" "0" ...
## 79 81 86 88 90 95 98 99 Dates Dates2
## X4.03.14 0 0 1 1 1 0 0 1 X4.03.14 2014-03-04
## X5.03.14 1 0 1 1 1 0 0 0 X5.03.14 2014-03-05
## X6.03.14 1 1 1 0 0 0 1 1 X6.03.14 2014-03-06
## X7.03.14 0 0 0 0 0 0 0 0 X7.03.14 2014-03-07
## X8.03.14 0 0 0 0 0 0 0 0 X8.03.14 2014-03-08
## X9.03.14 0 0 0 0 0 0 0 0 X9.03.14 2014-03-09
## X10.03.14 0 1 0 0 1 0 0 0 X10.03.14 2014-03-10
## X11.03.14 0 1 0 1 0 0 0 0 X11.03.14 2014-03-11
## X12.03.14 1 0 0 0 0 0 0 0 X12.03.14 2014-03-12
## X13.03.14 0 0 0 0 0 0 1 0 X13.03.14 2014-03-13
## X14.03.14 1 0 0 0 0 0 0 0 X14.03.14 2014-03-14
## X15.03.14 0 0 0 0 0 0 0 0 X15.03.14 2014-03-15
## X16.03.14 0 0 0 0 0 0 0 0 X16.03.14 2014-03-16
## X17.03.14 0 0 0 0 0 0 0 0 X17.03.14 2014-03-17
## X18.03.14 1 0 0 0 0 1 0 0 X18.03.14 2014-03-18
## 'data.frame': 114 obs. of 5 variables:
## $ 1: num 1 1 0 0 0 0 0 0 0 1 ...
## $ 2: num 0 1 1 0 0 1 0 0 1 0 ...
## $ 3: num 1 0 0 0 0 1 1 1 0 1 ...
## $ 5: num 1 1 0 0 1 0 0 0 0 0 ...
## $ 6: num 0 1 0 0 0 0 0 0 0 0 ...
## [1] 114 65
## 79 81 86 88 90 95 98 99 Dates Dates2 Total
## X4.03.14 0 0 1 1 1 0 0 1 X4.03.14 2014-03-04 29
## X5.03.14 1 0 1 1 1 0 0 0 X5.03.14 2014-03-05 32
## X6.03.14 1 1 1 0 0 0 1 1 X6.03.14 2014-03-06 17
## X7.03.14 0 0 0 0 0 0 0 0 X7.03.14 2014-03-07 6
## X8.03.14 0 0 0 0 0 0 0 0 X8.03.14 2014-03-08 10
## X9.03.14 0 0 0 0 0 0 0 0 X9.03.14 2014-03-09 11
## X10.03.14 0 1 0 0 1 0 0 0 X10.03.14 2014-03-10 15
## X11.03.14 0 1 0 1 0 0 0 0 X11.03.14 2014-03-11 21
## X12.03.14 1 0 0 0 0 0 0 0 X12.03.14 2014-03-12 16
## X13.03.14 0 0 0 0 0 0 1 0 X13.03.14 2014-03-13 7
## X14.03.14 1 0 0 0 0 0 0 0 X14.03.14 2014-03-14 7
## X15.03.14 0 0 0 0 0 0 0 0 X15.03.14 2014-03-15 6
## X16.03.14 0 0 0 0 0 0 0 0 X16.03.14 2014-03-16 10
## X17.03.14 0 0 0 0 0 0 0 0 X17.03.14 2014-03-17 17
## X18.03.14 1 0 0 0 0 1 0 0 X18.03.14 2014-03-18 19
## [1] 114 39
## 94 96 97 Dates Dates2
## X4.03.14 1 0 1 X4.03.14 2014-03-04
## X5.03.14 0 0 0 X5.03.14 2014-03-05
## X6.03.14 0 0 0 X6.03.14 2014-03-06
## X7.03.14 0 0 0 X7.03.14 2014-03-07
## X8.03.14 0 0 0 X8.03.14 2014-03-08
## [1] 114 40
## 94 96 97 Dates Dates2 Total
## X4.03.14 1 0 1 X4.03.14 2014-03-04 10
## X5.03.14 0 0 0 X5.03.14 2014-03-05 8
## X6.03.14 0 0 0 X6.03.14 2014-03-06 2
## X7.03.14 0 0 0 X7.03.14 2014-03-07 2
## X8.03.14 0 0 0 X8.03.14 2014-03-08 1
## X9.03.14 0 0 0 X9.03.14 2014-03-09 2
## X10.03.14 0 0 0 X10.03.14 2014-03-10 3
## X11.03.14 0 0 0 X11.03.14 2014-03-11 3
## X12.03.14 1 0 0 X12.03.14 2014-03-12 9
## X13.03.14 0 0 1 X13.03.14 2014-03-13 3
## X14.03.14 0 0 0 X14.03.14 2014-03-14 1
## X15.03.14 0 0 0 X15.03.14 2014-03-15 0
## X16.03.14 0 0 0 X16.03.14 2014-03-16 1
## X17.03.14 0 0 0 X17.03.14 2014-03-17 4
## X18.03.14 0 0 0 X18.03.14 2014-03-18 7
Kay’s email Wed 13 Aug 2014
Low: n = 37, access 18.8 +/- 2.2, days 9.2 +/- 0.57
Meta-learning response (did and/or intended to access): n, mean
yes: 47, 66 +/- 5.6
0 yes: 52, 28.23 +/- 2.5**
dim(la.norm.qual)
## [1] 99 131
with(la.norm.qual, tapply(access, total.yes == 0, mean))
## FALSE TRUE
## 66.00 28.23
with(la.norm.qual, tapply(access, total.yes == 0, sem))
## [1] 5.591
## [1] 2.542
## FALSE TRUE
## 5.591 2.542
with(la.norm.qual, tapply(access.days, total.yes == 0, mean))
## FALSE TRUE
## 21.30 13.38
with(la.norm.qual, tapply(access.days, total.yes == 0, sem))
## [1] 1.273
## [1] 0.8848
## FALSE TRUE
## 1.2727 0.8848
with(la.norm.qual, table(total.yes == 0, cluster2))
## cluster2
## 1 2
## FALSE 40 7
## TRUE 22 30
Days before for ML1-4 and Ass, plus AcP for course (and Ass), plus qual categories 3 and 5 (up to 4 types of each, so ordinal data). Uses MLsub, which has ML1-4 submission and due dates and time differences (should be loaded into global; if not, there are some indications of code in markup v1).
clean - consent
dim(ci)
## [1] 231 2
MLsub = NULL
MLsub = read.csv("MLsub.csv")
dim(MLsub)
## [1] 876 11
MLsub[,11] = NULL
MLsub[,1] = NULL
str(MLsub)
## 'data.frame': 876 obs. of 9 variables:
## $ StudentID: Factor w/ 230 levels "s3044923","s361850",..: 56 89 31 148 19 198 136 206 137 68 ...
## $ Date : Factor w/ 33 levels "1/06/14","10/04/14",..: 19 20 20 19 21 18 14 16 22 16 ...
## $ Submitted: Factor w/ 866 levels "0:01:00","0:03:09",..: 586 304 427 490 144 244 571 260 218 114 ...
## $ Duration : Factor w/ 785 levels "","0:00:42","0:01:07",..: 73 220 718 697 207 684 541 280 611 435 ...
## $ MLtask : Factor w/ 4 levels "ML1","ML2","ML3",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Open : Factor w/ 4 levels "19/03/14","27/05/14",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Due : Factor w/ 4 levels "14/05/14","16/04/14",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ SubDT : Factor w/ 850 levels "1/06/14 0:28",..: 476 491 494 468 510 441 338 386 543 378 ...
## $ DueDT : Factor w/ 4 levels "14/05/14 17:00",..: 3 3 3 3 3 3 3 3 3 3 ...
MLsub$SubDT = as.character(MLsub$SubDT)
MLsub$DueDT = as.character(MLsub$DueDT)
str(MLsub)
## 'data.frame': 876 obs. of 9 variables:
## $ StudentID: Factor w/ 230 levels "s3044923","s361850",..: 56 89 31 148 19 198 136 206 137 68 ...
## $ Date : Factor w/ 33 levels "1/06/14","10/04/14",..: 19 20 20 19 21 18 14 16 22 16 ...
## $ Submitted: Factor w/ 866 levels "0:01:00","0:03:09",..: 586 304 427 490 144 244 571 260 218 114 ...
## $ Duration : Factor w/ 785 levels "","0:00:42","0:01:07",..: 73 220 718 697 207 684 541 280 611 435 ...
## $ MLtask : Factor w/ 4 levels "ML1","ML2","ML3",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Open : Factor w/ 4 levels "19/03/14","27/05/14",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Due : Factor w/ 4 levels "14/05/14","16/04/14",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ SubDT : chr "23/03/14 20:43" "24/03/14 14:31" "24/03/14 17:06" "23/03/14 18:28" ...
## $ DueDT : chr "26/03/14 17:00" "26/03/14 17:00" "26/03/14 17:00" "26/03/14 17:00" ...
#MLsub[1:3,]
#dtp = as.POSIXct(dt, format = "%d/%m/%Y %H:%M:%S", tz="UTC")
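# Name the format argument explicitly; the second positional argument of as.POSIXct() is tz
MLsub$SubDT = as.POSIXct(MLsub$SubDT, format = "%d/%m/%y %H:%M", tz="UTC")
MLsub$DueDT = as.POSIXct(MLsub$DueDT, format = "%d/%m/%y %H:%M", tz="UTC")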
#MLsub[1:3,]
MLsub$Earliness = difftime(MLsub$DueDT, MLsub$SubDT)
#MLsub[1:3,]
#MLsub[1:5,1:5]
MLsub.names = names(MLsub)
MLsub.names
## [1] "StudentID" "Date" "Submitted" "Duration" "MLtask"
## [6] "Open" "Due" "SubDT" "DueDT" "Earliness"
MLsub.names[1] = "StudentID"
names(MLsub) = MLsub.names
#MLsub[1:5,1:5]
clean - De-ID
## [1] 876 10
## StudentID Date Submitted Duration MLtask Open Due
## 1 S8579275 23/03/14 20:43:42 0:09:16 ML1 19/03/14 26/03/14
## 2 S8587419 24/03/14 14:31:37 0:16:02 ML1 19/03/14 26/03/14
## 3 S8530605 24/03/14 17:06:10 47:58:06 ML1 19/03/14 26/03/14
## 4 S8636955 23/03/14 18:28:04 4:40:22 ML1 19/03/14 26/03/14
## 5 S8475915 25/03/14 12:00:45 0:15:28 ML1 19/03/14 26/03/14
## SubDT DueDT Earliness
## 1 2014-03-23 20:43:00 2014-03-26 17:00:00 4097 mins
## 2 2014-03-24 14:31:00 2014-03-26 17:00:00 3029 mins
## 3 2014-03-24 17:06:00 2014-03-26 17:00:00 2874 mins
## 4 2014-03-23 18:28:00 2014-03-26 17:00:00 4232 mins
## 5 2014-03-25 12:00:00 2014-03-26 17:00:00 1740 mins
str(MLsub)
## 'data.frame': 876 obs. of 10 variables:
## $ StudentID: chr "S8579275" "S8587419" "S8530605" "S8636955" ...
## $ Date : Factor w/ 33 levels "1/06/14","10/04/14",..: 19 20 20 19 21 18 14 16 22 16 ...
## $ Submitted: Factor w/ 866 levels "0:01:00","0:03:09",..: 586 304 427 490 144 244 571 260 218 114 ...
## $ Duration : Factor w/ 785 levels "","0:00:42","0:01:07",..: 73 220 718 697 207 684 541 280 611 435 ...
## $ MLtask : Factor w/ 4 levels "ML1","ML2","ML3",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Open : Factor w/ 4 levels "19/03/14","27/05/14",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Due : Factor w/ 4 levels "14/05/14","16/04/14",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ SubDT : POSIXct, format: "2014-03-23 20:43:00" "2014-03-24 14:31:00" ...
## $ DueDT : POSIXct, format: "2014-03-26 17:00:00" "2014-03-26 17:00:00" ...
## $ Earliness:Class 'difftime' atomic [1:876] 4097 3029 2874 4232 1740 ...
## .. ..- attr(*, "tzone")= chr "UTC"
## .. ..- attr(*, "units")= chr "mins"
dim(MLsub)
## [1] 876 10
require(lubridate)
## Loading required package: lubridate
##
## Attaching package: 'lubridate'
##
## The following objects are masked from 'package:chron':
##
## days, hours, minutes, seconds, years
mean(MLsub$Earliness)
## Time difference of 5747 mins
mean(difftime(MLsub$DueDT, MLsub$SubDT, units = "hours"))
## Time difference of 95.78 hours
sem(difftime(MLsub$DueDT, MLsub$SubDT, units = "hours"))
## [1] 2.028
mean(difftime(MLsub$DueDT, MLsub$SubDT, units = "days"))
## Time difference of 3.991 days
sem(difftime(MLsub$DueDT, MLsub$SubDT, units = "days"))
## [1] 0.08452
MLsub[1:5,1:3]
## StudentID Date Submitted
## 1 S8579275 23/03/14 20:43:42
## 2 S8587419 24/03/14 14:31:37
## 3 S8530605 24/03/14 17:06:10
## 4 S8636955 23/03/14 18:28:04
## 5 S8475915 25/03/14 12:00:45
correlations within ML submission to check
want ML1 vs 2 vs 3 vs 4 for Earliness
so ML1…4 need to be columns and StudentID needs to be rows -> too hard to transform data, try boxplot for consistency instead
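The long-to-wide transformation described in the note above can be sketched in base R. This is a minimal sketch on made-up toy data standing in for `MLsub`; averaging repeated submissions by the same student is an assumption, not something the notes specify.

```r
# Toy long-format data standing in for MLsub (values are made up)
toy <- data.frame(
  StudentID    = c("S1", "S1", "S2", "S2", "S3", "S3", "S3"),
  MLtask       = c("ML1", "ML2", "ML1", "ML2", "ML1", "ML1", "ML2"),
  Early.hr.num = c(10, 20, 30, 50, 40, 60, 90),
  stringsAsFactors = FALSE
)

# One row per student, one column per task.
# Repeated submissions (S3 has two for ML1) are averaged: an assumption.
wide <- with(toy, tapply(Early.hr.num, list(StudentID, MLtask), mean))
wide

# Correlations between tasks, ignoring students with a missing task
cor(wide, use = "pairwise.complete.obs")
```

`tapply` returns a matrix with `NA` where a student has no submission for a task, so `use = "pairwise.complete.obs"` keeps those students in the other comparisons.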
str(MLsub)
## 'data.frame': 876 obs. of 10 variables:
## $ StudentID: chr "S8579275" "S8587419" "S8530605" "S8636955" ...
## $ Date : Factor w/ 33 levels "1/06/14","10/04/14",..: 19 20 20 19 21 18 14 16 22 16 ...
## $ Submitted: Factor w/ 866 levels "0:01:00","0:03:09",..: 586 304 427 490 144 244 571 260 218 114 ...
## $ Duration : Factor w/ 785 levels "","0:00:42","0:01:07",..: 73 220 718 697 207 684 541 280 611 435 ...
## $ MLtask : Factor w/ 4 levels "ML1","ML2","ML3",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Open : Factor w/ 4 levels "19/03/14","27/05/14",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Due : Factor w/ 4 levels "14/05/14","16/04/14",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ SubDT : POSIXct, format: "2014-03-23 20:43:00" "2014-03-24 14:31:00" ...
## $ DueDT : POSIXct, format: "2014-03-26 17:00:00" "2014-03-26 17:00:00" ...
## $ Earliness:Class 'difftime' atomic [1:876] 4097 3029 2874 4232 1740 ...
## .. ..- attr(*, "tzone")= chr "UTC"
## .. ..- attr(*, "units")= chr "mins"
MLsub$Early.hr = difftime(MLsub$DueDT, MLsub$SubDT, units = "hours")
MLsub$Early.hr[1:5]
## Time differences in hours
## [1] 68.28 50.48 47.90 70.53 29.00
MLsub$Early.hr.num = as.numeric(MLsub$Early.hr)
boxplot(Early.hr.num ~ MLtask, data=MLsub)
#check if there is a difference in earliness between ML tasks...
aov.out = NULL
aov.out = aov(Early.hr.num ~ MLtask * StudentID + Error(StudentID), data=MLsub)
summary(aov.out)
##
## Error: StudentID
## Df Sum Sq Mean Sq
## MLtask 3 3420 1140
## StudentID 226 1885492 8343
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## MLtask 3 88890 29630 10.79 0.013 *
## MLtask:StudentID 638 1162107 1821 0.66 0.815
## Residuals 5 13736 2747
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#now sig difference...
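For comparison, a more conventional way to write a repeated-measures model in `aov` nests the task inside the student in the `Error()` term. The sketch below uses simulated toy data (6 students by 4 tasks, random values), so its numbers say nothing about the real submissions; it only illustrates the formula.

```r
set.seed(1)
# Balanced toy design: every student has exactly one value per task
toy <- expand.grid(StudentID = paste0("S", 1:6),
                   MLtask    = paste0("ML", 1:4))
toy$StudentID    <- factor(toy$StudentID)
toy$MLtask       <- factor(toy$MLtask)
toy$Early.hr.num <- rnorm(nrow(toy), mean = 90, sd = 30)  # simulated values

# MLtask is tested within students; StudentID must be a factor
fit <- aov(Early.hr.num ~ MLtask + Error(StudentID / MLtask), data = toy)
summary(fit)
```

With real data this specification assumes one submission per student per task; duplicated or missing submissions would need handling first.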
library(car)
with(MLsub, pairwise.t.test(Early.hr.num, MLtask, p.adjust.method = "bonferroni"))
##
## Pairwise comparisons using t tests with pooled SD
##
## data: Early.hr.num and MLtask
##
## ML1 ML2 ML3
## ML2 0.241 - -
## ML3 1.000 0.148 -
## ML4 0.027 8.5e-06 0.053
##
## P value adjustment method: bonferroni
with(MLsub, tapply(Early.hr.num, MLtask, mean))
## ML1 ML2 ML3 ML4
## 94.34 82.72 95.53 110.45
with(MLsub, tapply(Early.hr.num, MLtask, sem))
## [1] 3.686
## [1] 3.858
## [1] 3.887
## [1] 4.57
## ML1 ML2 ML3 ML4
## 3.686 3.858 3.887 4.570
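The `sem` helper used above is defined elsewhere in the notebook and is not shown in this section. A minimal sketch of a standard-error-of-the-mean function, under a hypothetical name so it does not overwrite the original, might look like this (the toy values are made up):

```r
# Hypothetical helper; named sem_sketch so it does not replace the
# notebook's own sem(), whose definition is not shown here.
sem_sketch <- function(x) {
  x <- as.numeric(x)        # also accepts difftime input
  x <- x[!is.na(x)]         # drop missing values
  sd(x) / sqrt(length(x))   # standard deviation over root n
}

toy <- data.frame(
  MLtask       = c("ML1", "ML1", "ML1", "ML2", "ML2", "ML2"),
  Early.hr.num = c(1, 2, 3, 10, 20, 30)
)
sem_by_task <- with(toy, tapply(Early.hr.num, MLtask, sem_sketch))
sem_by_task
```

The extra `## [1]` lines in the output above suggest the original `sem` also prints each value as it is computed; this sketch only returns it.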
add MLsub to la.norm.qual
ML1 = subset(MLsub, MLtask =="ML1")
ML2 = subset(MLsub, MLtask =="ML2")
ML3 = subset(MLsub, MLtask =="ML3")
ML4 = subset(MLsub, MLtask =="ML4")
dim(la.norm.qual)
## [1] 99 131
la.norm.qual[1:5,125:131]
## access pattern prevLR access.days Kay.pattern Kay3 cluster2
## 1 21 No Yes No No No 15 4 3 1
## 2 75 No No No No No 26 5 3 1
## 3 122 Yes No No No Yes 33 3 3 1
## 4 13 Maybe No Maybe No Yes 10 0 2 2
## 5 89 Yes No Yes No Yes 28 1 2 1
all = NULL
dim(ML1)
## [1] 225 12
ML1[1:5,]
## StudentID Date Submitted Duration MLtask Open Due
## 1 S8579275 23/03/14 20:43:42 0:09:16 ML1 19/03/14 26/03/14
## 2 S8587419 24/03/14 14:31:37 0:16:02 ML1 19/03/14 26/03/14
## 3 S8530605 24/03/14 17:06:10 47:58:06 ML1 19/03/14 26/03/14
## 4 S8636955 23/03/14 18:28:04 4:40:22 ML1 19/03/14 26/03/14
## 5 S8475915 25/03/14 12:00:45 0:15:28 ML1 19/03/14 26/03/14
## SubDT DueDT Earliness Early.hr
## 1 2014-03-23 20:43:00 2014-03-26 17:00:00 4097 mins 68.28 hours
## 2 2014-03-24 14:31:00 2014-03-26 17:00:00 3029 mins 50.48 hours
## 3 2014-03-24 17:06:00 2014-03-26 17:00:00 2874 mins 47.90 hours
## 4 2014-03-23 18:28:00 2014-03-26 17:00:00 4232 mins 70.53 hours
## 5 2014-03-25 12:00:00 2014-03-26 17:00:00 1740 mins 29.00 hours
## Early.hr.num
## 1 68.28
## 2 50.48
## 3 47.90
## 4 70.53
## 5 29.00
all = merge(la.norm.qual, ML1[, c(1, 12)], by="StudentID")
dim(all)
## [1] 99 132
all[1:5, 125:132]
## access pattern prevLR access.days Kay.pattern Kay3 cluster2
## 1 21 No Yes No No No 15 4 3 1
## 2 21 No Yes No No No 15 4 3 1
## 3 75 No No No No No 26 5 3 1
## 4 122 Yes No No No Yes 33 3 3 1
## 5 13 Maybe No Maybe No Yes 10 0 2 2
## Early.hr.num
## 1 139.867
## 2 8.733
## 3 19.867
## 4 43.200
## 5 117.333
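Rows 1 and 2 of `all` above share every `la.norm.qual` value and differ only in `Early.hr.num`, which suggests one student has two ML1 submissions and is therefore repeated by `merge`. A minimal sketch of checking for this before merging, on made-up toy data; keeping the earliest submission is only one possible rule and is an assumption here.

```r
# Toy stand-ins for la.norm.qual and ML1 (values are made up)
students <- data.frame(StudentID = c("S1", "S2", "S3"),
                       access    = c(21, 75, 122),
                       stringsAsFactors = FALSE)
ml1 <- data.frame(StudentID    = c("S1", "S1", "S2", "S3"),
                  Early.hr.num = c(139.9, 8.7, 19.9, 43.2),
                  stringsAsFactors = FALSE)

# IDs that occur more than once would each add extra rows in merge()
dup_ids <- unique(ml1$StudentID[duplicated(ml1$StudentID)])
dup_ids

# One possible rule (an assumption): keep the earliest submission,
# i.e. the largest earliness, for each student
ml1_one <- aggregate(Early.hr.num ~ StudentID, data = ml1, FUN = max)
merged  <- merge(students, ml1_one, by = "StudentID")
nrow(merged)
```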
all.names = names(all)
all.names[132] = "ML1earliness"
names(all) = all.names
all[1:5,130:132]
## Kay3 cluster2 ML1earliness
## 1 3 1 139.867
## 2 3 1 8.733
## 3 3 1 19.867
## 4 3 1 43.200
## 5 2 2 117.333
all = merge(all, ML2[, c(1, 12)], by="StudentID")
all = merge(all, ML3[, c(1, 12)], by="StudentID")
all = merge(all, ML4[, c(1, 12)], by="StudentID")
dim(all)
## [1] 97 135
all[1:5,130:135]
## Kay3 cluster2 ML1earliness Early.hr.num.x Early.hr.num.y Early.hr.num
## 1 3 1 139.867 163.900 140.3 115.9
## 2 3 1 8.733 163.900 140.3 115.9
## 3 3 1 19.867 47.367 163.0 186.3
## 4 3 1 43.200 66.950 19.2 186.7
## 5 2 2 117.333 5.467 0.5 168.6
all.names = names(all)
all.names[133] = "ML2earliness"
all.names[134] = "ML3earliness"
all.names[135] = "ML4earliness"
names(all) = all.names
all[1:5,130:135]
## Kay3 cluster2 ML1earliness ML2earliness ML3earliness ML4earliness
## 1 3 1 139.867 163.900 140.3 115.9
## 2 3 1 8.733 163.900 140.3 115.9
## 3 3 1 19.867 47.367 163.0 186.3
## 4 3 1 43.200 66.950 19.2 186.7
## 5 2 2 117.333 5.467 0.5 168.6
cor(all[,132:135])
## ML1earliness ML2earliness ML3earliness ML4earliness
## ML1earliness 1.0000 0.4466 0.4805 0.4446
## ML2earliness 0.4466 1.0000 0.5419 0.3660
## ML3earliness 0.4805 0.5419 1.0000 0.5333
## ML4earliness 0.4446 0.3660 0.5333 1.0000
add in assignment submission
ass = read.csv("Ass.csv")
dim(ass)
## [1] 220 4
str(ass)
## 'data.frame': 220 obs. of 4 variables:
## $ StudentID: Factor w/ 220 levels "s3044923","s361850",..: 144 105 56 90 197 87 213 19 65 52 ...
## $ Ass.mark : int 82 75 86 83 92 94 81 93 83 78 ...
## $ Sub.Date : Factor w/ 12 levels "12/05/2014","13/05/2014",..: 1 2 3 4 4 4 5 5 5 6 ...
## $ Sub.Time : Factor w/ 220 levels "0:07:45","0:21:45",..: 42 88 16 211 103 137 197 80 101 219 ...
clean - consent
dim(ci)
## [1] 231 2
df = merge(ass, ci, by ="StudentID")
dim(df) #219 rows after the merge (ass has 220, ci has 231)
## [1] 219 5
#df[1:10,1:10]
#df[1:5,110:116]
df = subset(df, Consent == "Yes")
dim(df)
## [1] 96 5
ass = df
dim(ass)
## [1] 96 5
str(ass)
## 'data.frame': 96 obs. of 5 variables:
## $ StudentID: Factor w/ 220 levels "s3044923","s361850",..: 1 3 4 5 6 7 9 10 11 12 ...
## $ Ass.mark : int 86 75 94 82 91 83 77 88 83 87 ...
## $ Sub.Date : Factor w/ 12 levels "12/05/2014","13/05/2014",..: 8 10 8 10 10 9 7 8 8 9 ...
## $ Sub.Time : Factor w/ 220 levels "0:07:45","0:21:45",..: 125 210 149 64 21 107 81 86 190 202 ...
## $ Consent : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
clean - De-ID
## [1] 96 5
## StudentID Ass.mark Sub.Date Sub.Time Consent
## 1 S6089847 86 19/05/2014 20:49:26 Yes
## 3 S8117889 75 21/05/2014 9:35:45 Yes
## 4 S8118323 94 19/05/2014 22:45:19 Yes
## 5 S8152093 82 21/05/2014 11:54:37 Yes
## 6 S8239113 91 21/05/2014 10:17:41 Yes
merging ass into all
Assignment due 12 noon 21/05/14
ass[1:5,]
## StudentID Ass.mark Sub.Date Sub.Time Consent
## 1 S6089847 86 19/05/2014 20:49:26 Yes
## 3 S8117889 75 21/05/2014 9:35:45 Yes
## 4 S8118323 94 19/05/2014 22:45:19 Yes
## 5 S8152093 82 21/05/2014 11:54:37 Yes
## 6 S8239113 91 21/05/2014 10:17:41 Yes
str(ass)
## 'data.frame': 96 obs. of 5 variables:
## $ StudentID: chr "S6089847" "S8117889" "S8118323" "S8152093" ...
## $ Ass.mark : int 86 75 94 82 91 83 77 88 83 87 ...
## $ Sub.Date : Factor w/ 12 levels "12/05/2014","13/05/2014",..: 8 10 8 10 10 9 7 8 8 9 ...
## $ Sub.Time : Factor w/ 220 levels "0:07:45","0:21:45",..: 125 210 149 64 21 107 81 86 190 202 ...
## $ Consent : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
ass$Sub.Date = as.character(ass$Sub.Date)
ass$Sub.Time = as.character(ass$Sub.Time)
str(ass)
## 'data.frame': 96 obs. of 5 variables:
## $ StudentID: chr "S6089847" "S8117889" "S8118323" "S8152093" ...
## $ Ass.mark : int 86 75 94 82 91 83 77 88 83 87 ...
## $ Sub.Date : chr "19/05/2014" "21/05/2014" "19/05/2014" "21/05/2014" ...
## $ Sub.Time : chr "20:49:26" "9:35:45" "22:45:19" "11:54:37" ...
## $ Consent : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
ass$Sub.Ass = paste(ass$Sub.Date, ass$Sub.Time)
dim(ass)
## [1] 96 6
ass[1:5,]
## StudentID Ass.mark Sub.Date Sub.Time Consent Sub.Ass
## 1 S6089847 86 19/05/2014 20:49:26 Yes 19/05/2014 20:49:26
## 3 S8117889 75 21/05/2014 9:35:45 Yes 21/05/2014 9:35:45
## 4 S8118323 94 19/05/2014 22:45:19 Yes 19/05/2014 22:45:19
## 5 S8152093 82 21/05/2014 11:54:37 Yes 21/05/2014 11:54:37
## 6 S8239113 91 21/05/2014 10:17:41 Yes 21/05/2014 10:17:41
str(ass[,6])
## chr [1:96] "19/05/2014 20:49:26" "21/05/2014 9:35:45" ...
ass$Sub.Ass = as.POSIXct(ass$Sub.Ass, format = "%d/%m/%Y %H:%M:%S")
str(ass[,6])
## POSIXct[1:96], format: "2014-05-19 20:49:26" "2014-05-21 09:35:45" ...
ass$Due = as.POSIXct("2014-05-21 12:00:00", tz="UCT")
tz(ass$Sub.Ass)
## [1] ""
ass$Sub.Ass = force_tz(ass$Sub.Ass, "UTC")
tz(ass$Sub.Ass)
## [1] "UTC"
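The time zone steps above can be condensed by supplying `tz` when the string is parsed, so both timestamps share a zone from the start. A minimal sketch using one submission time and the due time shown in this section:

```r
# Parse directly in UTC so no later force_tz() call is needed
sub <- as.POSIXct("21/05/2014 9:35:45",
                  format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
due <- as.POSIXct("2014-05-21 12:00:00", tz = "UTC")

# Hours between submission and the deadline (positive = early)
early <- difftime(due, sub, units = "hours")
early
as.numeric(early)
```

Fixing `units` in `difftime` avoids the automatic unit choice, which otherwise depends on the size of the differences.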
dim(ass)
## [1] 96 7
ass[1:5,]
## StudentID Ass.mark Sub.Date Sub.Time Consent Sub.Ass
## 1 S6089847 86 19/05/2014 20:49:26 Yes 2014-05-19 20:49:26
## 3 S8117889 75 21/05/2014 9:35:45 Yes 2014-05-21 09:35:45
## 4 S8118323 94 19/05/2014 22:45:19 Yes 2014-05-19 22:45:19
## 5 S8152093 82 21/05/2014 11:54:37 Yes 2014-05-21 11:54:37
## 6 S8239113 91 21/05/2014 10:17:41 Yes 2014-05-21 10:17:41
## Due
## 1 2014-05-21 12:00:00
## 3 2014-05-21 12:00:00
## 4 2014-05-21 12:00:00
## 5 2014-05-21 12:00:00
## 6 2014-05-21 12:00:00
str(ass)
## 'data.frame': 96 obs. of 7 variables:
## $ StudentID: chr "S6089847" "S8117889" "S8118323" "S8152093" ...
## $ Ass.mark : int 86 75 94 82 91 83 77 88 83 87 ...
## $ Sub.Date : chr "19/05/2014" "21/05/2014" "19/05/2014" "21/05/2014" ...
## $ Sub.Time : chr "20:49:26" "9:35:45" "22:45:19" "11:54:37" ...
## $ Consent : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ Sub.Ass : POSIXct, format: "2014-05-19 20:49:26" "2014-05-21 09:35:45" ...
## $ Due : POSIXct, format: "2014-05-21 12:00:00" "2014-05-21 12:00:00" ...
ass$Ass.earliness = difftime(ass$Due, ass$Sub.Ass, units="hours")
str(ass)
## 'data.frame': 96 obs. of 8 variables:
## $ StudentID : chr "S6089847" "S8117889" "S8118323" "S8152093" ...
## $ Ass.mark : int 86 75 94 82 91 83 77 88 83 87 ...
## $ Sub.Date : chr "19/05/2014" "21/05/2014" "19/05/2014" "21/05/2014" ...
## $ Sub.Time : chr "20:49:26" "9:35:45" "22:45:19" "11:54:37" ...
## $ Consent : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ Sub.Ass : POSIXct, format: "2014-05-19 20:49:26" "2014-05-21 09:35:45" ...
## $ Due : POSIXct, format: "2014-05-21 12:00:00" "2014-05-21 12:00:00" ...
## $ Ass.earliness:Class 'difftime' atomic [1:96] 39.1761 2.4042 37.2447 0.0897 1.7053 ...
## .. ..- attr(*, "tzone")= chr "UCT"
## .. ..- attr(*, "units")= chr "hours"
ass[1:5,]
## StudentID Ass.mark Sub.Date Sub.Time Consent Sub.Ass
## 1 S6089847 86 19/05/2014 20:49:26 Yes 2014-05-19 20:49:26
## 3 S8117889 75 21/05/2014 9:35:45 Yes 2014-05-21 09:35:45
## 4 S8118323 94 19/05/2014 22:45:19 Yes 2014-05-19 22:45:19
## 5 S8152093 82 21/05/2014 11:54:37 Yes 2014-05-21 11:54:37
## 6 S8239113 91 21/05/2014 10:17:41 Yes 2014-05-21 10:17:41
## Due Ass.earliness
## 1 2014-05-21 12:00:00 39.17611 hours
## 3 2014-05-21 12:00:00 2.40417 hours
## 4 2014-05-21 12:00:00 37.24472 hours
## 5 2014-05-21 12:00:00 0.08972 hours
## 6 2014-05-21 12:00:00 1.70528 hours
dim(ass)
## [1] 96 8
dim(all)
## [1] 97 135
all = merge(all, ass[, c(1:2, 8)], by="StudentID")
dim(all)
## [1] 94 137
all[1:5,131:ncol(all)]
## cluster2 ML1earliness ML2earliness ML3earliness ML4earliness Ass.mark
## 1 1 8.733 163.900 140.3 115.9 86
## 2 1 139.867 163.900 140.3 115.9 86
## 3 1 19.867 47.367 163.0 186.3 75
## 4 1 43.200 66.950 19.2 186.7 94
## 5 2 117.333 5.467 0.5 168.6 82
## Ass.earliness
## 1 39.17611 hours
## 2 39.17611 hours
## 3 2.40417 hours
## 4 37.24472 hours
## 5 0.08972 hours
Academic performance as course grade (access vs performance -> AcP.csv)
AcP = read.csv("AcP.csv")
#De-ID: recode StudentIDs of the form s4123456
df = AcP
df$StudID = substr(df$StudentID, start=2, stop=8)
df$StudID = as.numeric(df$StudID)
df$StudID = df$StudID*2+1
df$StudID = as.character(df$StudID)
df$StudID = paste("S", df$StudID, sep="")
df$StudentID = df$StudID
df$StudID = NULL
AcP = df
AcP[1:5,]
## StudentID Course.grade
## 1 S8529183 40.5
## 2 S8636687 47.2
## 3 S8624451 47.8
## 4 S8633919 51.9
## 5 S8583807 52.5
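The same recoding steps are repeated for each file that is read in. They could be wrapped in one helper; the sketch below uses a hypothetical function name, `deid_sketch`, and applies the steps shown above to the example ID from the comment.

```r
# Hypothetical helper wrapping the recoding steps shown above
deid_sketch <- function(id) {
  num <- as.numeric(substr(as.character(id), start = 2, stop = 8))
  paste("S", num * 2 + 1, sep = "")
}

deid_sketch("s4123456")
deid_sketch(factor(c("s4123456", "s4000000")))   # factors work too
```

A data frame column would then be recoded with something like `df$StudentID = deid_sketch(df$StudentID)`.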
all = merge(all, AcP, by="StudentID")
dim(all)
## [1] 94 138
all[1:5,131:138]
## cluster2 ML1earliness ML2earliness ML3earliness ML4earliness Ass.mark
## 1 1 8.733 163.900 140.3 115.9 86
## 2 1 139.867 163.900 140.3 115.9 86
## 3 1 19.867 47.367 163.0 186.3 75
## 4 1 43.200 66.950 19.2 186.7 94
## 5 2 117.333 5.467 0.5 168.6 82
## Ass.earliness Course.grade
## 1 39.17611 hours 66.6
## 2 39.17611 hours 66.6
## 3 2.40417 hours 64.7
## 4 37.24472 hours 78.0
## 5 0.08972 hours 80.7
correlations (ass.early.num = hours before Assignment due date 12 noon)
str(all[131:ncol(all)])
## 'data.frame': 94 obs. of 8 variables:
## $ cluster2 : int 1 1 1 1 2 1 1 1 1 1 ...
## $ ML1earliness : num 8.73 139.87 19.87 43.2 117.33 ...
## $ ML2earliness : num 163.9 163.9 47.37 66.95 5.47 ...
## $ ML3earliness : num 140.3 140.3 163 19.2 0.5 ...
## $ ML4earliness : num 116 116 186 187 169 ...
## $ Ass.mark : int 86 86 75 94 82 91 83 77 88 83 ...
## $ Ass.earliness:Class 'difftime' atomic [1:94] 39.1761 39.1761 2.4042 37.2447 0.0897 ...
## .. ..- attr(*, "units")= chr "hours"
## $ Course.grade : num 66.6 66.6 64.7 78 80.7 68.4 82.3 92.2 84.9 81.7 ...
all$ass.early.num = as.numeric(all$Ass.earliness)
dim(all)
## [1] 94 139
all[1:5,131:ncol(all)]
## cluster2 ML1earliness ML2earliness ML3earliness ML4earliness Ass.mark
## 1 1 8.733 163.900 140.3 115.9 86
## 2 1 139.867 163.900 140.3 115.9 86
## 3 1 19.867 47.367 163.0 186.3 75
## 4 1 43.200 66.950 19.2 186.7 94
## 5 2 117.333 5.467 0.5 168.6 82
## Ass.earliness Course.grade ass.early.num
## 1 39.17611 hours 66.6 39.17611
## 2 39.17611 hours 66.6 39.17611
## 3 2.40417 hours 64.7 2.40417
## 4 37.24472 hours 78.0 37.24472
## 5 0.08972 hours 80.7 0.08972
cor(all[c(132:136, 138, ncol(all))])
## ML1earliness ML2earliness ML3earliness ML4earliness Ass.mark
## ML1earliness 1.00000 0.44635 0.49048 0.45386 0.04824
## ML2earliness 0.44635 1.00000 0.53738 0.35783 0.08964
## ML3earliness 0.49048 0.53738 1.00000 0.52623 0.06346
## ML4earliness 0.45386 0.35783 0.52623 1.00000 0.02611
## Ass.mark 0.04824 0.08964 0.06346 0.02611 1.00000
## Course.grade 0.18798 0.24793 0.18772 0.14718 0.24003
## ass.early.num 0.27721 0.16755 0.18932 0.17576 0.06852
## Course.grade ass.early.num
## ML1earliness 0.1880 0.27721
## ML2earliness 0.2479 0.16755
## ML3earliness 0.1877 0.18932
## ML4earliness 0.1472 0.17576
## Ass.mark 0.2400 0.06852
## Course.grade 1.0000 0.28559
## ass.early.num 0.2856 1.00000
cor(all[c(131:136, 138, ncol(all))])
## cluster2 ML1earliness ML2earliness ML3earliness ML4earliness
## cluster2 1.00000 0.27705 0.06319 0.23329 0.20728
## ML1earliness 0.27705 1.00000 0.44635 0.49048 0.45386
## ML2earliness 0.06319 0.44635 1.00000 0.53738 0.35783
## ML3earliness 0.23329 0.49048 0.53738 1.00000 0.52623
## ML4earliness 0.20728 0.45386 0.35783 0.52623 1.00000
## Ass.mark 0.06075 0.04824 0.08964 0.06346 0.02611
## Course.grade 0.10996 0.18798 0.24793 0.18772 0.14718
## ass.early.num -0.01725 0.27721 0.16755 0.18932 0.17576
## Ass.mark Course.grade ass.early.num
## cluster2 0.06075 0.1100 -0.01725
## ML1earliness 0.04824 0.1880 0.27721
## ML2earliness 0.08964 0.2479 0.16755
## ML3earliness 0.06346 0.1877 0.18932
## ML4earliness 0.02611 0.1472 0.17576
## Ass.mark 1.00000 0.2400 0.06852
## Course.grade 0.24003 1.0000 0.28559
## ass.early.num 0.06852 0.2856 1.00000
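`cor` reports the coefficients only. As a rough guide to which of them are distinguishable from zero, a coefficient can be converted to a t statistic by hand. The sketch below takes r for Course.grade vs ass.early.num and n from the output above, and assumes the 94 rows are independent, which may not hold exactly because the first two rows of `all` appear to belong to the same student.

```r
r <- 0.28559   # Course.grade vs ass.early.num, from the matrix above
n <- 94        # rows in `all`; independence is an assumption

# t statistic for a Pearson correlation, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)   # two-sided p-value

round(t_stat, 2)
p_val
```

`cor.test(x, y)` performs the same calculation directly from the two columns.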
Organisation qual coded as categories 3 and 5
org.qual = read.csv("ML1-4qual.csv")
dim(org.qual)
## [1] 99 5
#org.qual[1:5,]
org.qual.names = c("StudentID", "Cat3", "Cat5", "Cat3or5", "Sum.Cat3and5")
names(org.qual) = org.qual.names
#De-ID: recode StudentIDs of the form s4123456
df = org.qual
df$StudID = substr(df$StudentID, start=2, stop=8)
df$StudID = as.numeric(df$StudID)
df$StudID = df$StudID*2+1
df$StudID = as.character(df$StudID)
df$StudID = paste("S", df$StudID, sep="")
df$StudentID = df$StudID
df$StudID = NULL
org.qual = df
org.qual[1:5,]
## StudentID Cat3 Cat5 Cat3or5 Sum.Cat3and5
## 1 S8646489 4 1 2 5
## 2 S8283571 3 1 2 4
## 3 S8586369 4 0 1 4
## 4 S8641669 3 1 2 4
## 5 S8152093 2 1 2 3
all = merge(all, org.qual, by="StudentID")
dim(all)
## [1] 94 143
all[1:2,c(131:136, 138, 141:ncol(all))]
## cluster2 ML1earliness ML2earliness ML3earliness ML4earliness Ass.mark
## 1 1 8.733 163.9 140.3 115.9 86
## 2 1 139.867 163.9 140.3 115.9 86
## Course.grade Cat5 Cat3or5 Sum.Cat3and5
## 1 66.6 0 0 0
## 2 66.6 0 0 0
str(all[1:2,c(131:136, 138, 141:ncol(all))])
## 'data.frame': 2 obs. of 10 variables:
## $ cluster2 : int 1 1
## $ ML1earliness: num 8.73 139.87
## $ ML2earliness: num 164 164
## $ ML3earliness: num 140 140
## $ ML4earliness: num 116 116
## $ Ass.mark : int 86 86
## $ Course.grade: num 66.6 66.6
## $ Cat5 : int 0 0
## $ Cat3or5 : num 0 0
## $ Sum.Cat3and5: int 0 0
cor(all[c(131:136, 138, 141:ncol(all))])
## cluster2 ML1earliness ML2earliness ML3earliness ML4earliness
## cluster2 1.00000 0.27705 0.063187 0.23329 0.20728
## ML1earliness 0.27705 1.00000 0.446352 0.49048 0.45386
## ML2earliness 0.06319 0.44635 1.000000 0.53738 0.35783
## ML3earliness 0.23329 0.49048 0.537383 1.00000 0.52623
## ML4earliness 0.20728 0.45386 0.357834 0.52623 1.00000
## Ass.mark 0.06075 0.04824 0.089638 0.06346 0.02611
## Course.grade 0.10996 0.18798 0.247928 0.18772 0.14718
## Cat5 0.14579 0.07512 -0.006347 -0.02654 0.10722
## Cat3or5 0.01898 -0.07669 -0.160959 -0.11521 0.04154
## Sum.Cat3and5 0.01351 0.01142 -0.179925 -0.07817 0.01499
## Ass.mark Course.grade Cat5 Cat3or5 Sum.Cat3and5
## cluster2 0.06075 0.10996 0.145787 0.01898 0.01351
## ML1earliness 0.04824 0.18798 0.075120 -0.07669 0.01142
## ML2earliness 0.08964 0.24793 -0.006347 -0.16096 -0.17993
## ML3earliness 0.06346 0.18772 -0.026537 -0.11521 -0.07817
## ML4earliness 0.02611 0.14718 0.107219 0.04154 0.01499
## Ass.mark 1.00000 0.24003 -0.031987 -0.08725 -0.10659
## Course.grade 0.24003 1.00000 0.103260 -0.07779 -0.10408
## Cat5 -0.03199 0.10326 1.000000 0.64094 0.54055
## Cat3or5 -0.08725 -0.07779 0.640939 1.00000 0.86549
## Sum.Cat3and5 -0.10659 -0.10408 0.540547 0.86549 1.00000
all$MLearliness = (rowSums(all[,132:135]))/4
dim(all)
## [1] 94 144
all[1:5,131:ncol(all)]
## cluster2 ML1earliness ML2earliness ML3earliness ML4earliness Ass.mark
## 1 1 8.733 163.900 140.3 115.9 86
## 2 1 139.867 163.900 140.3 115.9 86
## 3 1 19.867 47.367 163.0 186.3 75
## 4 1 43.200 66.950 19.2 186.7 94
## 5 2 117.333 5.467 0.5 168.6 82
## Ass.earliness Course.grade ass.early.num Cat3 Cat5 Cat3or5 Sum.Cat3and5
## 1 39.17611 hours 66.6 39.17611 0 0 0 0
## 2 39.17611 hours 66.6 39.17611 0 0 0 0
## 3 2.40417 hours 64.7 2.40417 2 0 1 2
## 4 37.24472 hours 78.0 37.24472 0 1 1 1
## 5 0.08972 hours 80.7 0.08972 2 1 2 3
## MLearliness
## 1 107.20
## 2 139.98
## 3 104.13
## 4 79.02
## 5 72.97
cor(all[c(131:136, 138, 140:ncol(all))])
## cluster2 ML1earliness ML2earliness ML3earliness ML4earliness
## cluster2 1.00000 0.27705 0.063187 0.23329 0.20728
## ML1earliness 0.27705 1.00000 0.446352 0.49048 0.45386
## ML2earliness 0.06319 0.44635 1.000000 0.53738 0.35783
## ML3earliness 0.23329 0.49048 0.537383 1.00000 0.52623
## ML4earliness 0.20728 0.45386 0.357834 0.52623 1.00000
## Ass.mark 0.06075 0.04824 0.089638 0.06346 0.02611
## Course.grade 0.10996 0.18798 0.247928 0.18772 0.14718
## Cat3 -0.06553 -0.02848 -0.209608 -0.07777 -0.04222
## Cat5 0.14579 0.07512 -0.006347 -0.02654 0.10722
## Cat3or5 0.01898 -0.07669 -0.160959 -0.11521 0.04154
## Sum.Cat3and5 0.01351 0.01142 -0.179925 -0.07817 0.01499
## MLearliness 0.25008 0.75391 0.747977 0.82090 0.77709
## Ass.mark Course.grade Cat3 Cat5 Cat3or5
## cluster2 0.06075 0.10996 -0.06553 0.145787 0.01898
## ML1earliness 0.04824 0.18798 -0.02848 0.075120 -0.07669
## ML2earliness 0.08964 0.24793 -0.20961 -0.006347 -0.16096
## ML3earliness 0.06346 0.18772 -0.07777 -0.026537 -0.11521
## ML4earliness 0.02611 0.14718 -0.04222 0.107219 0.04154
## Ass.mark 1.00000 0.24003 -0.10838 -0.031987 -0.08725
## Course.grade 0.24003 1.00000 -0.18106 0.103260 -0.07779
## Cat3 -0.10838 -0.18106 1.00000 0.081062 0.66686
## Cat5 -0.03199 0.10326 0.08106 1.000000 0.64094
## Cat3or5 -0.08725 -0.07779 0.66686 0.640939 1.00000
## Sum.Cat3and5 -0.10659 -0.10408 0.88236 0.540547 0.86549
## MLearliness 0.07208 0.24652 -0.11484 0.050721 -0.09463
## Sum.Cat3and5 MLearliness
## cluster2 0.01351 0.25008
## ML1earliness 0.01142 0.75391
## ML2earliness -0.17993 0.74798
## ML3earliness -0.07817 0.82090
## ML4earliness 0.01499 0.77709
## Ass.mark -0.10659 0.07208
## Course.grade -0.10408 0.24652
## Cat3 0.88236 -0.11484
## Cat5 0.54055 0.05072
## Cat3or5 0.86549 -0.09463
## Sum.Cat3and5 1.00000 -0.07299
## MLearliness -0.07299 1.00000
Wilcoxon rank sum tests for 2 clusters
dim(all)
## [1] 94 144
wilcox.test(MLearliness ~ cluster2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: MLearliness by cluster2
## W = 706, p-value = 0.01748
## alternative hypothesis: true location shift is not equal to 0
round(with(all, calc(mean, MLearliness, cluster2)),1)
## 1 2
## 94.2 117.3
round(with(all, calc.sem(sem, MLearliness, cluster2)),1)
## [1] 5.814
## [1] 6.657
## 1 2
## 5.8 6.7
wilcox.test(ass.early.num ~ cluster2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: ass.early.num by cluster2
## W = 998, p-value = 0.9495
## alternative hypothesis: true location shift is not equal to 0
round(with(all, calc(mean, ass.early.num, cluster2)),1)
## 1 2
## 29.9 28.1
round(with(all, calc.sem(sem, ass.early.num, cluster2)),1)
## [1] 6.842
## [1] 6.428
## 1 2
## 6.8 6.4
wilcox.test(Course.grade ~ cluster2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Course.grade by cluster2
## W = 889, p-value = 0.354
## alternative hypothesis: true location shift is not equal to 0
round(with(all, calc(mean, Course.grade, cluster2)),1)
## 1 2
## 77.8 79.8
round(with(all, calc.sem(sem, Course.grade, cluster2)),1)
## [1] 1.236
## [1] 1.227
## 1 2
## 1.2 1.2
wilcox.test(Sum.Cat3and5 ~ cluster2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Sum.Cat3and5 by cluster2
## W = 1002, p-value = 0.9701
## alternative hypothesis: true location shift is not equal to 0
round(with(all, calc(mean, Sum.Cat3and5, cluster2)),1)
## 1 2
## 1.2 1.2
round(with(all, calc.sem(sem, Sum.Cat3and5, cluster2)),1)
## [1] 0.1451
## [1] 0.1983
## 1 2
## 0.1 0.2
anovas for Cat3and5
aov.cat = aov(Course.grade ~ Sum.Cat3and5, data=all)
summary(aov.cat)
## Df Sum Sq Mean Sq F value Pr(>F)
## Sum.Cat3and5 1 79 78.8 1.01 0.32
## Residuals 92 7193 78.2
aov.cat2 = aov(MLearliness ~ Sum.Cat3and5, data=all)
summary(aov.cat2)
## Df Sum Sq Mean Sq F value Pr(>F)
## Sum.Cat3and5 1 969 969 0.49 0.48
## Residuals 92 180914 1966
aov.cat3 = aov(ass.early.num ~ Sum.Cat3and5, data=all)
summary(aov.cat3)
## Df Sum Sq Mean Sq F value Pr(>F)
## Sum.Cat3and5 1 23 23 0.01 0.92
## Residuals 92 215028 2337
using more data for clusters
distances.all = dist(all[c(2:115, 117:120, 125, 132:136, 137, 141:143)], method = "euclidean")
## Warning: NAs introduced by coercion
cluster.all = hclust(distances.all, method = "ward")
plot(cluster.all)
distances.all2 = dist(all[c(117:120, 125, 132:136, 137, 141:143)], method = "euclidean")
## Warning: NAs introduced by coercion
cluster.all2 = hclust(distances.all2, method = "ward")
plot(cluster.all2)
cluster.all2.groups = cutree(cluster.all2, k = 2)
all$cluster.all2 = cluster.all2.groups
with(all, table(cluster2, cluster.all2))
## cluster.all2
## cluster2 1 2
## 1 50 11
## 2 31 2
cluster.all.groups = cutree(cluster.all, k = 2)
all$cluster.all = cluster.all.groups
with(all, table(cluster2, cluster.all))
## cluster.all
## cluster2 1 2
## 1 50 11
## 2 31 2
with(all, table(cluster.all, cluster.all2))
## cluster.all2
## cluster.all 1 2
## 1 81 0
## 2 0 13
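The "NAs introduced by coercion" warnings above probably come from non-numeric columns (for example the Yes/No/Maybe responses) being passed to `dist`. A minimal sketch of the same clustering steps restricted to numeric columns, on made-up toy data. Scaling the columns is an added assumption and is not done in the original; `"ward.D"` is the name used since R 3.1.0 for the method previously called `"ward"`.

```r
# Toy data: one non-numeric column and two numeric ones (made up)
toy <- data.frame(
  answer = factor(c("No", "Yes", "No", "Yes", "No", "Yes")),
  access = c(1, 1.2, 0.9, 10, 10.5, 9.8),
  early  = c(2, 2.1, 1.9, 20, 21, 19.5)
)

num <- toy[sapply(toy, is.numeric)]             # keep numeric columns only
d   <- dist(scale(num), method = "euclidean")   # scaled so no column dominates
hc  <- hclust(d, method = "ward.D")
groups <- cutree(hc, k = 2)
table(groups)
```

Without scaling, columns with large ranges such as earliness in hours carry most of the Euclidean distance, which may or may not be what is wanted.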
dim(all)
## [1] 94 146
all[1:2,115:146]
## X25.06.14 cluster3 ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
## 1 0 1 No Yes No No
## 2 0 1 No Yes No No
## total.no total.yes total.maybe total.noinfo access pattern prevLR
## 1 3 1 0 0 21 No Yes No No No
## 2 3 1 0 0 21 No Yes No No No
## access.days Kay.pattern Kay3 cluster2 ML1earliness ML2earliness
## 1 15 4 3 1 8.733 163.9
## 2 15 4 3 1 139.867 163.9
## ML3earliness ML4earliness Ass.mark Ass.earliness Course.grade
## 1 140.3 115.9 86 39.18 hours 66.6
## 2 140.3 115.9 86 39.18 hours 66.6
## ass.early.num Cat3 Cat5 Cat3or5 Sum.Cat3and5 MLearliness cluster.all2
## 1 39.18 0 0 0 0 107.2 1
## 2 39.18 0 0 0 0 140.0 1
## cluster.all
## 1 1
## 2 1
all$total.yes.gp = ifelse(all$total.yes == 0, 2, 1)
table(all$total.yes.gp)
##
## 1 2
## 47 47
wilcox.test(access.days ~ total.yes.gp, data=all)
## Warning: cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: access.days by total.yes.gp
## W = 1660, p-value = 2.654e-05
## alternative hypothesis: true location shift is not equal to 0
#ML3 submission date
dim(MLsub)
## [1] 876 12
MLsub[1:5,]
## StudentID Date Submitted Duration MLtask Open Due
## 1 S8579275 23/03/14 20:43:42 0:09:16 ML1 19/03/14 26/03/14
## 2 S8587419 24/03/14 14:31:37 0:16:02 ML1 19/03/14 26/03/14
## 3 S8530605 24/03/14 17:06:10 47:58:06 ML1 19/03/14 26/03/14
## 4 S8636955 23/03/14 18:28:04 4:40:22 ML1 19/03/14 26/03/14
## 5 S8475915 25/03/14 12:00:45 0:15:28 ML1 19/03/14 26/03/14
## SubDT DueDT Earliness Early.hr
## 1 2014-03-23 20:43:00 2014-03-26 17:00:00 4097 mins 68.28 hours
## 2 2014-03-24 14:31:00 2014-03-26 17:00:00 3029 mins 50.48 hours
## 3 2014-03-24 17:06:00 2014-03-26 17:00:00 2874 mins 47.90 hours
## 4 2014-03-23 18:28:00 2014-03-26 17:00:00 4232 mins 70.53 hours
## 5 2014-03-25 12:00:00 2014-03-26 17:00:00 1740 mins 29.00 hours
## Early.hr.num
## 1 68.28
## 2 50.48
## 3 47.90
## 4 70.53
## 5 29.00
ML3[1:5,]
## StudentID Date Submitted Duration MLtask Open Due
## 441 S8587419 10/05/14 18:10:01 0:22:41 ML3 7/05/14 14/05/14
## 442 S8530605 12/05/14 15:09:04 0:19:05 ML3 7/05/14 14/05/14
## 443 S8636955 12/05/14 21:42:34 0:45:45 ML3 7/05/14 14/05/14
## 444 S8475915 12/05/14 11:57:22 0:21:43 ML3 7/05/14 14/05/14
## 445 S8645607 7/05/14 17:30:54 0:16:33 ML3 7/05/14 14/05/14
## SubDT DueDT Earliness Early.hr
## 441 2014-05-10 18:10:00 2014-05-14 17:00:00 5690 mins 94.83 hours
## 442 2014-05-12 15:09:00 2014-05-14 17:00:00 2991 mins 49.85 hours
## 443 2014-05-12 21:42:00 2014-05-14 17:00:00 2598 mins 43.30 hours
## 444 2014-05-12 11:57:00 2014-05-14 17:00:00 3183 mins 53.05 hours
## 445 2014-05-07 17:30:00 2014-05-14 17:00:00 10050 mins 167.50 hours
## Early.hr.num
## 441 94.83
## 442 49.85
## 443 43.30
## 444 53.05
## 445 167.50
table(ML3$Date)
##
## 1/06/14 10/04/14 10/05/14 11/04/14 11/05/14 12/04/14 12/05/14 13/04/14
## 0 0 11 0 26 0 28 0
## 13/05/14 14/04/14 14/05/14 15/04/14 16/04/14 19/03/14 2/06/14 20/03/14
## 29 0 20 0 0 0 0 0
## 21/03/14 22/03/14 23/03/14 24/03/14 25/03/14 26/03/14 27/05/14 28/05/14
## 0 0 0 0 0 0 0 0
## 29/05/14 3/06/14 30/05/14 31/05/14 4/06/14 7/05/14 8/05/14 9/04/14
## 0 0 0 0 0 44 37 0
## 9/05/14
## 24
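The table above lists every factor level, including dates with no ML3 submissions, in alphabetical rather than calendar order. Converting to `Date` before tabulating gives only the dates that occur, in order. A minimal sketch on a made-up factor with one unused level:

```r
# Toy factor of submission dates with one unused level (made up)
dates <- factor(c("7/05/14", "10/05/14", "7/05/14"),
                levels = c("1/06/14", "10/05/14", "7/05/14"))
table(dates)   # unused level appears with count 0

# Convert to Date first: only observed dates, in calendar order
d   <- as.Date(as.character(dates), format = "%d/%m/%y")
tab <- table(d)
tab
```

`table(droplevels(dates))` would also remove the empty levels, but keeps the alphabetical ordering of the original strings.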
#check submission time clusters against lecture recording clusters
all[1:2,117:ncol(all)]
## ML1.previous ML2.planMS ML3.usedMS ML4.planEOS total.no total.yes
## 1 No Yes No No 3 1
## 2 No Yes No No 3 1
## total.maybe total.noinfo access pattern prevLR access.days
## 1 0 0 21 No Yes No No No 15
## 2 0 0 21 No Yes No No No 15
## Kay.pattern Kay3 cluster2 ML1earliness ML2earliness ML3earliness
## 1 4 3 1 8.733 163.9 140.3
## 2 4 3 1 139.867 163.9 140.3
## ML4earliness Ass.mark Ass.earliness Course.grade ass.early.num Cat3 Cat5
## 1 115.9 86 39.18 hours 66.6 39.18 0 0
## 2 115.9 86 39.18 hours 66.6 39.18 0 0
## Cat3or5 Sum.Cat3and5 MLearliness cluster.all2 cluster.all total.yes.gp
## 1 0 0 107.2 1 1 1
## 2 0 0 140.0 1 1 1
with(all, table(cluster2, cluster.all))
## cluster.all
## cluster2 1 2
## 1 50 11
## 2 31 2
with(all, table(cluster2, cluster.all2))
## cluster.all2
## cluster2 1 2
## 1 50 11
## 2 31 2
with(all, tapply(access, cluster2, mean))
## 1 2
## 63.44 18.94
with(all, tapply(access, cluster.all, mean))
## 1 2
## 45.58 61.77
with(all, tapply(access, cluster.all2, mean))
## 1 2
## 45.58 61.77
with(all, tapply(access.days, cluster.all, mean))
## 1 2
## 17.05 20.08
with(all, tapply(MLearliness, cluster.all, mean))
## 1 2
## 115.36 20.93
with(all, tapply(ass.early.num, cluster.all, mean))
## 1 2
## 32.20 10.89
names(all[c(117:120, 125, 132:136, 137, 141:143)])
## [1] "ML1.previous" "ML2.planMS" "ML3.usedMS" "ML4.planEOS"
## [5] "access" "ML1earliness" "ML2earliness" "ML3earliness"
## [9] "ML4earliness" "Ass.mark" "Ass.earliness" "Cat5"
## [13] "Cat3or5" "Sum.Cat3and5"
with(all, tapply(Ass.mark, cluster.all, mean))
## 1 2
## 84.62 80.54
with(all, tapply(Course.grade, cluster.all, mean))
## 1 2
## 80.17 68.26
CONCLUSIONS:
clusters based on ML responses, lect recording access, earliness, ass mark and Cat3/5 have:
huge difference in ML earliness
substantial diff in ass earliness
no diff in ass mark
very large diff in course grade
wilcox.test(Course.grade ~ cluster.all, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Course.grade by cluster.all
## W = 931, p-value = 9.634e-06
## alternative hypothesis: true location shift is not equal to 0
checking impact of qual in cluster.all determination
with(all, tapply(Cat3or5, cluster.all, mean))
## 1 2
## 0.8272 1.0769
with(all, tapply(Sum.Cat3and5, cluster.all, mean))
## 1 2
## 1.148 1.462
addmargins(with(all, table(ML1.previous, cluster.all)))
## cluster.all
## ML1.previous 1 2 Sum
## Maybe 14 2 16
## No 44 6 50
## Yes 23 5 28
## Sum 81 13 94
with(all, table(ML2.planMS, cluster.all))
## cluster.all
## ML2.planMS 1 2
## Maybe 4 1
## No 64 9
## Yes 12 3
with(all, table(ML3.usedMS, cluster.all))
## cluster.all
## ML3.usedMS 1 2
## Maybe 4 0
## No 61 3
## Yes 14 10
with(all, table(ML4.planEOS, cluster.all))
## cluster.all
## ML4.planEOS 1 2
## Maybe 2 0
## No 60 7
## Yes 13 4
qual categories and phases
cate = read.csv("strat.csv")
#str(cate)
#for s4123456
df = cate
df$StudID = substr(df$StudentID, start=2, stop=8)
df$StudID = as.numeric(df$StudID)
df$StudID = df$StudID*2+1
df$StudID = as.character(df$StudID)
df$StudID = paste("S", df$StudID, sep="")
df$StudentID = df$StudID
df$StudID = NULL
cate = df
cate[1:5,]
## StudentID strat1 strat2 strat3 strat4 strat5 strat6 strat7 strat8 strat9
## 1 S8152093 0 0 2 0 1 1 1 0 1
## 2 S8469547 0 0 2 1 0 1 1 0 0
## 3 S8522577 0 0 1 0 2 1 1 0 0
## 4 S8533121 0 0 2 0 0 1 0 0 1
## 5 S8575195 0 0 1 1 2 1 2 0 2
## strat10 foretht perf eval phases
## 1 0 0 4 1 2
## 2 0 0 4 0 1
## 3 0 0 4 0 1
## 4 0 0 2 1 2
## 5 0 0 5 1 2
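The de-identification steps above could be wrapped in one helper, roughly as sketched here. `deid_student` is a hypothetical name (not in the original script), and the sketch assumes every ID is one letter followed by seven digits, as in the `#for s4123456` comment.

```r
# Hypothetical helper; the name is illustrative only.
# Assumes IDs look like "s4123456": one letter followed by 7 digits.
deid_student <- function(id) {
  digits <- as.numeric(substr(id, start = 2, stop = 8))  # drop the leading letter
  paste0("S", digits * 2 + 1)                            # same transform as the steps above
}

deid_student("s4123456")
```

Applied as `cate$StudentID = deid_student(cate$StudentID)`, this should give the same IDs as the step-by-step version, without the temporary `StudID` column.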
all = merge(all, cate, by="StudentID")
dim(all)
## [1] 94 161
addmargins(with(all, table(foretht, cluster.all)))
## cluster.all
## foretht 1 2 Sum
## 0 74 13 87
## 1 6 0 6
## 2 1 0 1
## Sum 81 13 94
addmargins(with(all, table(foretht, cluster2)))
## cluster2
## foretht 1 2 Sum
## 0 56 31 87
## 1 4 2 6
## 2 1 0 1
## Sum 61 33 94
addmargins(with(all, table(perf, cluster.all)))
## cluster.all
## perf 1 2 Sum
## 1 5 0 5
## 2 18 2 20
## 3 29 4 33
## 4 21 4 25
## 5 8 1 9
## 6 0 2 2
## Sum 81 13 94
addmargins(with(all, table(perf, cluster2)))
## cluster2
## perf 1 2 Sum
## 1 3 2 5
## 2 14 6 20
## 3 20 13 33
## 4 16 9 25
## 5 6 3 9
## 6 2 0 2
## Sum 61 33 94
addmargins(with(all, table(eval, cluster.all)))
## cluster.all
## eval 1 2 Sum
## 0 27 4 31
## 1 40 8 48
## 2 14 1 15
## Sum 81 13 94
addmargins(with(all, table(eval, cluster2)))
## cluster2
## eval 1 2 Sum
## 0 18 13 31
## 1 34 14 48
## 2 9 6 15
## Sum 61 33 94
addmargins(with(all, table(phases, cluster.all)))
## cluster.all
## phases 1 2 Sum
## 1 25 4 29
## 2 51 9 60
## 3 5 0 5
## Sum 81 13 94
addmargins(with(all, table(phases, cluster2)))
## cluster2
## phases 1 2 Sum
## 1 17 12 29
## 2 40 20 60
## 3 4 1 5
## Sum 61 33 94
5 cluster and 3 cluster solutions for cluster.all2 check
cluster.all2.groups.k5 = cutree(cluster.all2, k = 5)
all$cluster.all2k5 = cluster.all2.groups.k5
with(all, tapply(Course.grade, cluster.all2k5, mean))
## 1 2 3 4 5
## 79.15 80.55 77.99 83.91 68.26
cluster.all2.groups.k3 = cutree(cluster.all2, k = 3)
all$cluster.all2k3 = cluster.all2.groups.k3
with(all, tapply(Course.grade, cluster.all2k3, mean))
## 1 2 3
## 79.94 80.39 68.26
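A minimal sketch of how `cutree` gives several solutions from one tree, using simulated data. The `hclust` call that produced `cluster.all2` is not shown in this section, so the distance measure and linkage below are assumptions, and `toy`, `hc` and `groups` are illustrative names.

```r
set.seed(1)
# Simulated two-blob data, only to have something to cluster
toy <- data.frame(x = c(rnorm(20, 0), rnorm(20, 5)),
                  y = c(rnorm(20, 0), rnorm(20, 5)))

# Assumed settings: Euclidean distance on scaled data, Ward linkage
hc <- hclust(dist(scale(toy)), method = "ward.D2")

# Passing a vector of k returns one column of memberships per k
groups <- cutree(hc, k = c(2, 3, 5))

# Each k = 5 group sits inside exactly one k = 2 group (cuts are nested)
table(k5 = groups[, "5"], k2 = groups[, "2"])
```

Because the cuts are nested, a small group that splits off early can stay intact at k = 2, 3 and 5, which would be consistent with the same 68.26 mean appearing in all three solutions.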
with(all, tapply(Course.grade, cluster.all2, mean))
## 1 2
## 80.17 68.26
with(all, tapply(Course.grade, cluster.all2, sem))
## [1] 0.8933
## [1] 1.812
## 1 2
## 0.8933 1.8121
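`sem` is defined earlier in the document, outside this section. A minimal sketch, assuming it is the usual standard error of the mean (sd / sqrt(n)); `sem_sketch` is an illustrative name, and the optional `na.rm` argument is an addition of this sketch. The `[1]` lines in the output suggest the original also prints each value, which this version does not.

```r
# Assumed definition: standard error of the mean = sd / sqrt(n)
sem_sketch <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]   # drop missing values before counting n
  sd(x) / sqrt(length(x))
}

sem_sketch(c(2, 4, 4, 6))
sem_sketch(c(2, 4, 4, 6, NA), na.rm = TRUE)   # same value as the line above
```

The assumption can be checked against numbers reported further down: sd 42.24 with n = 81 gives 42.24 / 9 = 4.693, the sem reported for cluster 1 Ass.earliness.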
wilcox.test(Course.grade ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Course.grade by cluster.all2
## W = 931, p-value = 9.634e-06
## alternative hypothesis: true location shift is not equal to 0
with(all, tapply(access, cluster.all2, mean))
## 1 2
## 45.58 61.77
with(all, tapply(access, cluster.all2, sem))
## [1] 4.046
## [1] 6.871
## 1 2
## 4.046 6.871
wilcox.test(access ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: access by cluster.all2
## W = 330, p-value = 0.03178
## alternative hypothesis: true location shift is not equal to 0
with(all, tapply(access.days, cluster.all2, mean))
## 1 2
## 17.05 20.08
with(all, tapply(access.days, cluster.all2, sem))
## [1] 0.967
## [1] 2.077
## 1 2
## 0.967 2.077
wilcox.test(access.days ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: access.days by cluster.all2
## W = 401.5, p-value = 0.1722
## alternative hypothesis: true location shift is not equal to 0
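The mean / sem / `wilcox.test` pattern is repeated for each variable, so it could be collected into one table. A sketch on simulated data; `compare_groups` and `toy` are hypothetical names, and `exact = FALSE` is chosen here so the normal approximation with continuity correction is used, as in the output above.

```r
# Hypothetical helper: group means plus Wilcoxon rank sum test for one variable
compare_groups <- function(df, var, group) {
  parts <- split(df[[var]], df[[group]])        # one vector per group
  stopifnot(length(parts) == 2)                 # only two-group comparisons
  test <- wilcox.test(parts[[1]], parts[[2]], exact = FALSE)
  data.frame(variable = var,
             mean1 = mean(parts[[1]], na.rm = TRUE),
             mean2 = mean(parts[[2]], na.rm = TRUE),
             W = unname(test$statistic),
             p = test$p.value)
}

set.seed(42)
toy <- data.frame(grp = rep(1:2, times = c(30, 10)),
                  a = c(rnorm(30, 50, 10), rnorm(10, 60, 10)),
                  b = rnorm(40))

res <- do.call(rbind, lapply(c("a", "b"), compare_groups, df = toy, group = "grp"))
res
```

On the real data the call would be along the lines of `lapply(c("access", "access.days", "total.yes"), compare_groups, df = all, group = "cluster.all2")`.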
all$ML1.previous[1:5]
## [1] No No No Yes Maybe
## Levels: Maybe No Yes
#with(all, tapply(ML1.previous, cluster.all2, mean))
#with(all, tapply(ML1.previous, cluster.all2, sem))
#wilcox.test(ML1.previous ~ cluster.all2, data=all)
#not numerical
with(all, tapply(total.yes, cluster.all2, mean))
## 1 2
## 0.7654 1.6923
with(all, tapply(total.yes, cluster.all2, sem))
## [1] 0.1182
## [1] 0.3469
## 1 2
## 0.1182 0.3469
wilcox.test(total.yes ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: total.yes by cluster.all2
## W = 304, p-value = 0.008412
## alternative hypothesis: true location shift is not equal to 0
addmargins(with(all, table(ML1.previous, cluster.all2)))
## cluster.all2
## ML1.previous 1 2 Sum
## Maybe 14 2 16
## No 44 6 50
## Yes 23 5 28
## Sum 81 13 94
addmargins(with(all, table(ML2.planMS, cluster.all2)))
## cluster.all2
## ML2.planMS 1 2 Sum
## Maybe 4 1 5
## No 64 9 73
## Yes 12 3 15
## Sum 80 13 93
addmargins(with(all, table(ML3.usedMS, cluster.all2)))
## cluster.all2
## ML3.usedMS 1 2 Sum
## Maybe 4 0 4
## No 61 3 64
## Yes 14 10 24
## Sum 79 13 92
addmargins(with(all, table(ML4.planEOS, cluster.all2)))
## cluster.all2
## ML4.planEOS 1 2 Sum
## Maybe 2 0 2
## No 60 7 67
## Yes 13 4 17
## Sum 75 11 86
str(all$ML1earliness)
## num [1:94] 8.73 139.87 19.87 43.2 117.33 ...
with(all, tapply(ML1earliness, cluster.all2, mean))
## 1 2
## 107.94 32.47
with(all, tapply(ML1earliness, cluster.all2, sem))
## [1] 5.194
## [1] 7.732
## 1 2
## 5.194 7.732
wilcox.test(ML1earliness ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: ML1earliness by cluster.all2
## W = 947, p-value = 4.223e-06
## alternative hypothesis: true location shift is not equal to 0
with(all, tapply(ML2earliness, cluster.all2, mean))
## 1 2
## 101.3 16.4
with(all, tapply(ML2earliness, cluster.all2, sem))
## [1] 5.697
## [1] 4.362
## 1 2
## 5.697 4.362
wilcox.test(ML2earliness ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: ML2earliness by cluster.all2
## W = 973, p-value = 1.035e-06
## alternative hypothesis: true location shift is not equal to 0
with(all, tapply(ML3earliness, cluster.all2, mean))
## 1 2
## 118.89 19.33
with(all, tapply(ML3earliness, cluster.all2, sem))
## [1] 5.167
## [1] 5.197
## 1 2
## 5.167 5.197
wilcox.test(ML3earliness ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: ML3earliness by cluster.all2
## W = 1020, p-value = 6.678e-08
## alternative hypothesis: true location shift is not equal to 0
with(all, tapply(ML4earliness, cluster.all2, mean))
## 1 2
## 133.28 15.52
with(all, tapply(ML4earliness, cluster.all2, sem))
## [1] 5.958
## [1] 4.307
## 1 2
## 5.958 4.307
wilcox.test(ML4earliness ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: ML4earliness by cluster.all2
## W = 1018, p-value = 7.541e-08
## alternative hypothesis: true location shift is not equal to 0
with(all, tapply(MLearliness, cluster.all2, mean))
## 1 2
## 115.36 20.93
with(all, tapply(MLearliness, cluster.all2, sem))
## [1] 3.52
## [1] 3.298
## 1 2
## 3.520 3.298
wilcox.test(MLearliness ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: MLearliness by cluster.all2
## W = 1053, p-value = 8.36e-09
## alternative hypothesis: true location shift is not equal to 0
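The W reported by `wilcox.test` counts the (group 1, group 2) pairs in which the group 1 value is larger (ties count half), so it can be rescaled to an effect size between -1 and 1. A sketch of the rank-biserial correlation; `rank_biserial` is an illustrative name, and the group sizes 81 and 13 assume no missing values for the variable.

```r
# Rank-biserial correlation from the Wilcoxon W statistic
# W = number of pairs where group 1 > group 2; n1 * n2 = total number of pairs
rank_biserial <- function(W, n1, n2) {
  2 * W / (n1 * n2) - 1
}

rank_biserial(W = 1053, n1 = 81, n2 = 13)   # MLearliness: 81 * 13 = 1053 pairs
rank_biserial(W = 931,  n1 = 81, n2 = 13)   # Course.grade, for comparison
```

For MLearliness, W equals the total number of pairs, i.e. every cluster 1 student has a larger MLearliness than every cluster 2 student (assuming no missing values).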
115/24
## [1] 4.792
.791667*24
## [1] 19
with(all, table(Ass.earliness, cluster.all2))
## cluster.all2
## Ass.earliness 1 2
## -143.96 0 1
## -0.0652777777777778 1 0
## 0.0736111111111111 1 0
## 0.0897222222222222 1 0
## 0.143055555555556 1 0
## 0.147222222222222 1 0
## 0.203611111111111 1 0
## 0.265277777777778 1 0
## 0.303888888888889 0 1
## 0.516111111111111 0 1
## 0.546111111111111 0 1
## 0.555277777777778 0 1
## 0.681944444444444 1 0
## 0.687777777777778 1 0
## 0.713055555555556 1 0
## 0.771111111111111 1 0
## 0.895555555555556 1 0
## 0.977222222222222 0 1
## 1.51083333333333 0 1
## 1.70527777777778 1 0
## 1.89444444444444 1 0
## 1.91222222222222 1 0
## 2.05 1 0
## 2.40416666666667 1 0
## 2.57666666666667 0 1
## 2.65722222222222 1 0
## 2.66472222222222 1 0
## 2.79722222222222 1 0
## 2.81888888888889 1 0
## 2.9925 1 0
## 3.26722222222222 1 0
## 3.47472222222222 1 0
## 4.72 1 0
## 6.31416666666667 0 1
## 9.57305555555556 1 0
## 9.71305555555556 1 0
## 10.8986111111111 1 0
## 11.6347222222222 1 0
## 11.6375 1 0
## 11.8708333333333 1 0
## 12.3691666666667 1 0
## 12.7441666666667 0 1
## 12.7788888888889 1 0
## 12.9455555555556 1 0
## 13.0108333333333 1 0
## 13.0341666666667 1 0
## 13.2358333333333 1 0
## 13.64 1 0
## 13.7522222222222 1 0
## 14.5788888888889 1 0
## 14.8811111111111 1 0
## 15.5013888888889 1 0
## 15.5069444444444 2 0
## 16.6544444444444 1 0
## 19.9622222222222 0 1
## 21.5005555555556 1 0
## 22.4719444444444 1 0
## 22.9075 0 1
## 23.2275 1 0
## 23.2405555555556 1 0
## 23.6263888888889 1 0
## 24.3558333333333 1 0
## 24.59 1 0
## 26.6661111111111 1 0
## 27.5688888888889 1 0
## 28.4233333333333 1 0
## 37.2447222222222 1 0
## 37.8005555555556 1 0
## 39.1761111111111 2 0
## 40.1525 1 0
## 40.8602777777778 1 0
## 42.7508333333333 1 0
## 43.4602777777778 1 0
## 46.4233333333333 1 0
## 47.0469444444444 1 0
## 47.6402777777778 1 0
## 51.3877777777778 1 0
## 51.4061111111111 1 0
## 71.1552777777778 1 0
## 71.3391666666667 1 0
## 73.3827777777778 1 0
## 75.9794444444444 1 0
## 84.4688888888889 1 0
## 90.9058333333333 1 0
## 119.182222222222 1 0
## 123.219722222222 1 0
## 134.078888888889 1 0
## 137.369722222222 1 0
## 146.389722222222 1 0
## 178.101388888889 1 0
## 189.9875 1 0
## 216.656111111111 0 1
#figuring out why Ass.earliness errors (it is a difftime, not a plain numeric)
str(all$Ass.earliness)
## Class 'difftime' atomic [1:94] 39.1761 39.1761 2.4042 37.2447 0.0897 ...
## ..- attr(*, "units")= chr "hours"
all$Ass.earliness.num = as.numeric(all$Ass.earliness)
with(all, tapply(Ass.earliness, cluster.all2, mean))
## 1 2
## 32.20 10.89
with(all, tapply(Ass.earliness, cluster.all2, sem))
## [1] 4.693
## [1] 20.76
## 1 2
## 4.693 20.764
wilcox.test(Ass.earliness.num ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Ass.earliness.num by cluster.all2
## W = 754, p-value = 0.01291
## alternative hypothesis: true location shift is not equal to 0
with(all, tapply(Ass.earliness, cluster.all2, median))
## 1 2
## 14.881 1.511
with(all, boxplot(Ass.earliness.num ~ cluster.all2))
range(all$Ass.earliness.num)
## [1] -144.0 216.7
sort(all$Ass.earliness.num)
## [1] -143.96000 -0.06528 0.07361 0.08972 0.14306 0.14722
## [7] 0.20361 0.26528 0.30389 0.51611 0.54611 0.55528
## [13] 0.68194 0.68778 0.71306 0.77111 0.89556 0.97722
## [19] 1.51083 1.70528 1.89444 1.91222 2.05000 2.40417
## [25] 2.57667 2.65722 2.66472 2.79722 2.81889 2.99250
## [31] 3.26722 3.47472 4.72000 6.31417 9.57306 9.71306
## [37] 10.89861 11.63472 11.63750 11.87083 12.36917 12.74417
## [43] 12.77889 12.94556 13.01083 13.03417 13.23583 13.64000
## [49] 13.75222 14.57889 14.88111 15.50139 15.50694 15.50694
## [55] 16.65444 19.96222 21.50056 22.47194 22.90750 23.22750
## [61] 23.24056 23.62639 24.35583 24.59000 26.66611 27.56889
## [67] 28.42333 37.24472 37.80056 39.17611 39.17611 40.15250
## [73] 40.86028 42.75083 43.46028 46.42333 47.04694 47.64028
## [79] 51.38778 51.40611 71.15528 71.33917 73.38278 75.97944
## [85] 84.46889 90.90583 119.18222 123.21972 134.07889 137.36972
## [91] 146.38972 178.10139 189.98750 216.65611
#with(all, sort(table(cluster.all2, Ass.earliness.num)))
with(all, boxplot(log(Ass.earliness.num) ~ cluster.all2))
## Warning: NaNs produced
which.min(all$Ass.earliness.num)
## [1] 19
dim(all)
## [1] 94 164
all[19,c(143:148, 164)]
## Sum.Cat3and5 MLearliness cluster.all2 cluster.all total.yes.gp strat1
## 19 2 1.021 2 2 1 0
## Ass.earliness.num
## 19 -144
with(all, table(cluster.all2))
## cluster.all2
## 1 2
## 81 13
143.9600/24
## [1] 5.998
.998333*24
## [1] 23.96
.95999*60
## [1] 57.6
.5994*60
## [1] 35.96
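The by-hand conversions above (hours to days, hours, minutes, seconds) can be done in one step. A small sketch; `hours_to_dhms` is an illustrative name, and rounding to whole seconds is a choice made here.

```r
# Split a duration in hours into days / hours / minutes / seconds
hours_to_dhms <- function(hours) {
  total_sec <- round(abs(hours) * 3600)     # work in whole seconds
  c(days    = total_sec %/% 86400,
    hours   = (total_sec %% 86400) %/% 3600,
    minutes = (total_sec %% 3600) %/% 60,
    seconds = total_sec %% 60)
}

hours_to_dhms(143.96)   # 5 days, 23 hours, 57 minutes, 36 seconds
hours_to_dhms(115)      # 4 days, 19 hours
```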
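#removing value for student who had an extension (row 19, Ass.earliness.num = -143.96)
#NB assigning "" turns the whole column into character, hence the as.numeric() conversion further down; assigning NA would keep it numeric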
all[19,164] = ""
all[15:20, 160:164]
## eval phases cluster.all2k5 cluster.all2k3 Ass.earliness.num
## 15 0 1 2 1 119.182222222222
## 16 2 2 4 2 73.3827777777778
## 17 2 2 2 1 21.5005555555556
## 18 1 2 4 2 -0.0652777777777778
## 19 1 2 5 3
## 20 2 3 1 1 14.8811111111111
str(all$Ass.earliness.num)
## chr [1:94] "39.1761111111111" "39.1761111111111" ...
all$Ass.earliness.num = as.numeric(all$Ass.earliness.num)
str(all$Ass.earliness.num)
## num [1:94] 39.1761 39.1761 2.4042 37.2447 0.0897 ...
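A toy illustration of why the column turned into `chr` after the blank was assigned, and of `NA` as an alternative that avoids the round trip through `as.numeric`. `toy`, `blank` and `miss` are made-up names and values.

```r
toy <- data.frame(id = 1:3, earliness = c(39.2, -143.96, 2.4))

blank <- toy
blank[2, "earliness"] <- ""     # a character value coerces the whole column
class(blank$earliness)          # "character"

miss <- toy
miss[2, "earliness"] <- NA      # NA keeps the column numeric
class(miss$earliness)           # "numeric"

mean(miss$earliness, na.rm = TRUE)   # mean of the two remaining values
```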
wilcox.test(Ass.earliness.num ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Ass.earliness.num by cluster.all2
## W = 673, p-value = 0.03257
## alternative hypothesis: true location shift is not equal to 0
with(all, tapply(Ass.earliness.num, cluster.all2, mean, na.rm=T))
## 1 2
## 32.2 23.8
#with(all, tapply(Ass.earliness.num, cluster.all2, sem, na.rm=T))
with(all, tapply(Ass.earliness.num, cluster.all2, sd, na.rm=T))
## 1 2
## 42.24 61.26
61.25978/sqrt(12)
## [1] 17.68
32/24
## [1] 1.333
.3*24
## [1] 7.2
#Emailed to Kay
#With that student removed:
#p-value = 0.03257
#High performing cluster: 32.20305 +/- 4.692823 hours
#Low performing cluster: 23.79752 +/- 17.68418 hours
#but actually should have just converted to the time before the new deadline for their extension (0.04 hours, about 2.5 min)
all$Ass.earliness.num[19]
## [1] NA
6*24 -143.96
## [1] 0.04
all$Ass.earliness.num[19] = 0.04
all$Ass.earliness.num[18:20]
## [1] -0.06528 0.04000 14.88111
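The same adjustment written as a function, in case more than one student had an extension. `adjust_for_extension` is a hypothetical name, and the 6-day extension is taken from the `6*24 -143.96` calculation above.

```r
# Earliness relative to an extended deadline:
# add the extension (in days, converted to hours) to the original earliness
adjust_for_extension <- function(earliness_hours, extension_days) {
  earliness_hours + extension_days * 24
}

adjust_for_extension(-143.96, extension_days = 6)   # about 0.04 hours early
```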
str(all$Ass.earliness.num)
## num [1:94] 39.1761 39.1761 2.4042 37.2447 0.0897 ...
wilcox.test(Ass.earliness.num ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Ass.earliness.num by cluster.all2
## W = 753, p-value = 0.01331
## alternative hypothesis: true location shift is not equal to 0
with(all, tapply(Ass.earliness.num, cluster.all2, mean))
## 1 2
## 32.20 21.97
with(all, tapply(Ass.earliness.num, cluster.all2, sem))
## [1] 4.693
## [1] 16.37
## 1 2
## 4.693 16.369
#checking if organisation qual differs between high and low performing clusters
with(all, tapply(Cat3, cluster.all2, mean))
## 1 2
## 0.8272 1.1538
with(all, tapply(Cat3, cluster.all2, sem))
## [1] 0.1066
## [1] 0.2493
## 1 2
## 0.1066 0.2493
wilcox.test(Cat3 ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Cat3 by cluster.all2
## W = 405, p-value = 0.1565
## alternative hypothesis: true location shift is not equal to 0
with(all, tapply(Cat5, cluster.all2, mean))
## 1 2
## 0.3210 0.3077
with(all, tapply(Cat5, cluster.all2, sem))
## [1] 0.06042
## [1] 0.1332
## 1 2
## 0.06042 0.13323
wilcox.test(Cat5 ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Cat5 by cluster.all2
## W = 520, p-value = 0.9336
## alternative hypothesis: true location shift is not equal to 0
cluster.all2 independent variable check
names(all[c(117:120, 125, 132:136, 137, 141:143)])
## [1] "ML1.previous" "ML2.planMS" "ML3.usedMS" "ML4.planEOS"
## [5] "access" "ML1earliness" "ML2earliness" "ML3earliness"
## [9] "ML4earliness" "Ass.mark" "Ass.earliness" "Cat5"
## [13] "Cat3or5" "Sum.Cat3and5"
cluster.all check - do we use timing or not?
with(all, table(cluster.all, cluster.all2))
## cluster.all2
## cluster.all 1 2
## 1 81 0
## 2 0 13
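The cross-table shows the two solutions are the same partition. A sketch of checking this numerically, using made-up label vectors of the same sizes (81 and 13); `a`, `b` and `b_swapped` are illustrative names.

```r
a <- rep(c(1, 2), times = c(81, 13))
b <- a                                  # identical labelling
tab <- table(a, b)
sum(diag(tab)) / sum(tab)               # proportion on the diagonal: 1

# Cluster numbers are arbitrary: swapped labels give the same partition
b_swapped <- 3 - a
tab2 <- table(a, b_swapped)
sum(diag(tab2)) / sum(tab2)             # 0, although the grouping is unchanged
all(rowSums(tab2 > 0) == 1)             # TRUE: each group maps to one group
```

The last line is the safer check, since it does not depend on how the clusters happen to be numbered.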
access related to academic performance
dim(all)
## [1] 94 164
names(all[120:163])
## [1] "ML4.planEOS" "total.no" "total.yes" "total.maybe"
## [5] "total.noinfo" "access" "pattern" "prevLR"
## [9] "access.days" "Kay.pattern" "Kay3" "cluster2"
## [13] "ML1earliness" "ML2earliness" "ML3earliness" "ML4earliness"
## [17] "Ass.mark" "Ass.earliness" "Course.grade" "ass.early.num"
## [21] "Cat3" "Cat5" "Cat3or5" "Sum.Cat3and5"
## [25] "MLearliness" "cluster.all2" "cluster.all" "total.yes.gp"
## [29] "strat1" "strat2" "strat3" "strat4"
## [33] "strat5" "strat6" "strat7" "strat8"
## [37] "strat9" "strat10" "foretht" "perf"
## [41] "eval" "phases" "cluster.all2k5" "cluster.all2k3"
all$total.yes.gp[1:10]
## [1] 1 1 2 1 2 1 2 2 1 1
with(all, table(total.yes, total.yes.gp))
## total.yes.gp
## total.yes 1 2
## 0 0 47
## 1 26 0
## 2 7 0
## 3 12 0
## 4 2 0
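`total.yes.gp` is created earlier in the document. Going by the table above, group 1 is any "Yes" response and group 2 is none, which could be derived as sketched here. The `total.yes` values below are made up.

```r
total.yes <- c(0, 1, 2, 3, 4, 0)

# Assumed coding, read off the table: 1 = at least one "Yes", 2 = no "Yes"
total.yes.gp <- ifelse(total.yes == 0, 2, 1)

table(total.yes, total.yes.gp)
```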
aov.acp.lr = aov(Course.grade ~ total.yes, data=all)
summary(aov.acp.lr)
## Df Sum Sq Mean Sq F value Pr(>F)
## total.yes 1 7 7.1 0.09 0.76
## Residuals 92 7265 79.0
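With `total.yes` stored as a number, `aov` fits a single linear trend (1 Df, as in the summary above), not a comparison of the five groups. A sketch of the difference on simulated data; `toy`, `fit_num` and `fit_fac` are illustrative names and the grades are random.

```r
set.seed(7)
toy <- data.frame(total.yes = rep(0:4, length.out = 94),
                  Course.grade = rnorm(94, mean = 78, sd = 9))

fit_num <- aov(Course.grade ~ total.yes, data = toy)           # slope only
fit_fac <- aov(Course.grade ~ factor(total.yes), data = toy)   # one mean per level

summary(fit_num)   # total.yes has 1 Df
summary(fit_fac)   # factor(total.yes) has 4 Df
```

Which version is appropriate depends on whether the number of "Yes" responses is meant as a count with a linear effect or as separate categories.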
wilcox.test(Course.grade ~ total.yes.gp, data=all)
## Warning: cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: Course.grade by total.yes.gp
## W = 962.5, p-value = 0.2846
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(access.days ~ total.yes.gp, data=all)
## Warning: cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: access.days by total.yes.gp
## W = 1660, p-value = 2.654e-05
## alternative hypothesis: true location shift is not equal to 0