available at: http://rpubs.com/kirstenz/24661
With Kay and Louise, Friday 8 August
Students who report looking at lecture recordings - do they actually?
Student plans don't come to fruition (ref)
-> so if students didn't say they use lecture recordings and now plan to (in response to a weakness or the Mid-Sem), does access change? For how long?
=> by eye, lectures were Tuesday afternoon and Wednesday morning, and access peaks Tue, Wed and Thu (can visualise with the calendarHeat figures)
How do students prepare for classes => do students look at the prac videos?
=> by eye, Thu/Fri for the Friday prac
How many of the 5 pracs with videos do they watch (i.e. how many times/weeks they access)?
Are there changes over time, i.e. do students stop looking once they realise the videos are not so useful?
Actually, the figures show first how many students access and how often, but the calculations do determine which students look, and when.
data in “Folder access across semester.xls” moved to “LectAccess.csv”
clean - remove name column, remove empty rows (233-678)
move total column and total row to new vectors, and remove
clean - keep only consenting students; report the dimensions (number of students by number of variables)
## [1] 230 116
## [1] 99 116
clean - de-ID students so the output can be pushed to HTML
basic structure of data
## StudentID X4.03.14 X5.03.14 X6.03.14 X7.03.14
## 1 S6089847 1 1 0 0
## 3 S8117889 0 1 4 0
## 4 S8118323 2 0 0 0
## 5 S8152093 0 0 0 0
## 6 S8239113 1 1 0 0
## 7 S8283571 0 2 0 0
## 9 S8395419 1 0 0 0
## 12 S8407099 3 8 4 0
## 13 S8408815 0 0 0 0
## 14 S8465063 0 0 1 0
dimensions (rows by columns)
## [1] 99 116
Number of students by the number of times they looked at lecture recordings.
Clipped the x axis at 100 access clicks to zoom in on the lower end.
Converted the x axis to log to spread the clumped data into a roughly normal curve (a sketch of the plot call follows the legend below).
NB log scale (approximate back-transform to counts): 0 = 1, 1 ~ 3, 2 ~ 7, 3 ~ 20, 4 ~ 55, 5 ~ 150, 6 ~ 403
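A quick sketch of that log histogram (a minimal example, assuming la is the cleaned access data frame with StudentID in column 1 and one count column per day):
hist(log(rowSums(la[ , -1])), breaks = 20,
     xlab = "log(folder accesses per student)",
     main = "Lecture recording access per student")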
Working out viewings by day - number of times folder accessed per day (access.day), number of students who access each day (stud.day)…
Working out number of times (access.stud) and number of days (days.stud) each student accessed…
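A sketch of how those four summaries could be computed (same assumption about la as above):
counts = la[ , -1]                       # daily access counts, one column per day
access.day  = colSums(counts)            # folder openings per day, summed over students
stud.day    = colSums(counts > 0)        # number of students accessing each day
access.stud = rowSums(counts)            # folder openings per student over the semester
days.stud   = rowSums(counts > 0)        # number of days each student accessed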
Loading ‘describe’ function to get descriptive stats…
Descriptive stats for viewings by day and by student:
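The describe and sem helpers aren't echoed; a minimal sketch that would give the same columns (min, max, median, mean, SD, SEM, n, NAs, sum) is below - an assumption about their shape, not the actual source:
sem = function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))   # standard error of the mean
describe = function(x) {
  ok = x[!is.na(x)]
  round(c(min = min(ok), max = max(ok), median = median(ok), mean = mean(ok),
          SD = sd(ok), SEM = sem(x), n = length(ok), NAs = sum(is.na(x)), sum = sum(ok)), 1)
}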
Number of times lecture recording folder was accessed per day
## min max median mean SD SEM n NAs sum
## 0.0 162.0 30.0 40.1 31.3 2.9 114.0 0.0 4570.0
Number of students who accessed lecture recordings each day
## min max median mean SD SEM n NAs sum
## 0.0 43.0 13.0 14.9 9.9 0.9 114.0 0.0 1697.0
Number of times each student accessed the lecture recordings
## min max median mean SD SEM n NAs sum
## 4.0 194.0 35.0 46.2 35.0 3.5 99.0 0.0 4570.0
Number of days each student accessed the lecture recordings
## min max median mean SD SEM n NAs sum
## 3.0 41.0 16.0 17.1 8.5 0.9 99.0 0.0 1697.0
Useful conclusions: 114 days (16 weeks, 2 days) of data for 99 consenting students (from a cohort of 231).
There is a large range in the number of access hits (0-22) recorded for each student each day. Overall, the number of access hits per day is 2-3x the number of students who access per day, and the number of access hits per student is also 2-3x the number of days a student accesses the folder.
Since we don't really know how the number of folder openings is tracked by Blackboard (it could include page refreshes), the number of students is probably a better way of looking at the data than the number of times the folder is 'opened'.
On average 15 +/- 1 (mean +/- SEM) students accessed each day, with a max of 43 students on one day (1/5/14).
On average students accessed lecture recordings on 17 days, with a max of 41 days and a minimum of 3 days - so no consenting student failed to access the lecture recordings at all.
## days.stud
## 3 4 5 6 7 8 9 10 12 13 14 15 16 17 18 19 20 21 22 23 25 26 27 28 29
## 1 1 3 3 6 1 5 8 5 5 5 6 1 6 6 1 5 3 4 3 3 4 1 1 1
## 30 31 32 33 34 35 41
## 2 1 1 1 3 2 1
So only 1 student looked on as few as 3 days and only 1 on as many as 41 days; the majority looked on 7-26 days. Or, as a histogram:
Transposed the data (lat) so we can get real dates (a sketch of this step follows the printout below)…
## [1] 114 100
## Date S8530605 S8636955 S8475915 S8645607
## 1 2014-03-04 0 1 3 2
## 2 2014-03-05 1 2 8 1
## 3 2014-03-06 0 4 0 0
## 4 2014-03-07 0 2 0 0
## 5 2014-03-08 0 2 0 0
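A sketch of that transpose and date conversion (assuming la with X-prefixed day columns like X4.03.14):
lat = as.data.frame(t(la[ , -1]))                               # days become rows, students become columns
names(lat) = as.character(la$StudentID)
lat = data.frame(Date = as.Date(sub("^X", "", rownames(lat)), format = "%d.%m.%y"),
                 lat, check.names = FALSE)                      # parse 4.03.14 etc. as real dates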
Summed numbers of times the lecture recording folder was accessed and number of students who accessed lecture recording folder per day…
Loaded calendarHeat function…
## Loading required package: chron
Calendar of number of times lecture recording folder was accessed each day
Calendar of number of students who accessed lecture recordings each day
Built a data frame with student ID and a 0/1 indicator for access each day… NB created 2 data frames: la.norm (0 = not accessed, 1 = accessed) and la.norm2 (1 = accessed, NA = not accessed, i.e. missing value). Cluster analysis errors with NAs, so don't use the version with missing values for clustering.
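A sketch of the two binarised data frames described above:
la.norm = la
la.norm[ , -1] = ifelse(la[ , -1] > 0, 1, 0)      # 1 = accessed that day, 0 = not
la.norm2 = la
la.norm2[ , -1] = ifelse(la[ , -1] > 0, 1, NA)    # NA = not accessed; dist()/hclust() fail on NA, so not used for clustering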
Use la.norm to cluster lecture recording access:
distances = dist(la.norm[2:115], method = "euclidean")  # Euclidean distances between students over the 114 daily 0/1 columns
clusterLA = hclust(distances, method = "ward.D")         # hierarchical clustering, Ward's method
plot(clusterLA)                                          # dendrogram
clusterGroups3 = cutree(clusterLA, k = 3)                # cut the tree into 3 clusters
la.norm$cluster3 = clusterGroups3                        # attach cluster membership to la.norm
dim(la.norm)
## [1] 99 116
la.norm[1:5,1:5]
## StudentID X4.03.14 X5.03.14 X6.03.14 X7.03.14
## 1 S6089847 1 1 0 0
## 3 S8117889 0 1 1 0
## 4 S8118323 1 0 0 0
## 5 S8152093 0 0 0 0
## 6 S8239113 1 1 0 0
la.norm[1:5,110:116]
## X20.06.14 X21.06.14 X22.06.14 X23.06.14 X24.06.14 X25.06.14 cluster3
## 1 0 0 0 0 0 0 1
## 3 1 1 1 1 1 0 2
## 4 1 0 0 0 0 0 1
## 5 0 1 0 0 0 0 3
## 6 0 0 0 0 0 0 1
## [1] "The number of students in each cluster by the the number of variables"
## [1] 26 116
## [1] 36 116
## [1] 37 116
## [1] 0 116
## [1] 0 116
So what are the characteristics of the clusters - how often do students view lecture recordings and when:
## min max median mean SD SEM n NAs sum
## 15.0 41.0 25.5 25.7 7.4 1.5 26.0 0.0 668.0
## min max median mean SD SEM n NAs sum
## 12.0 30.0 18.0 19.1 5.3 0.9 36.0 0.0 687.0
## min max median mean SD SEM n NAs sum
## 3.0 17.0 9.0 9.2 3.5 0.6 37.0 0.0 342.0
## Warning in min(x, na.rm = T): no non-missing arguments to min; returning
## Inf
## Warning in max(x, na.rm = T): no non-missing arguments to max; returning -
## Inf
## min max median mean SD SEM n NAs sum
## Inf -Inf NA NaN NA NA 0 0 0
## Warning in min(x, na.rm = T): no non-missing arguments to min; returning
## Inf
## Warning in max(x, na.rm = T): no non-missing arguments to max; returning -
## Inf
## min max median mean SD SEM n NAs sum
## Inf -Inf NA NaN NA NA 0 0 0
To see ‘when’, we need to get the cluster groups into lat (the transposed version).
Then run calendarHeat for all 3 clusters…
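A sketch for one cluster (assuming the sourced calendarHeat has the usual calendarHeat(dates, values, varname = ...) signature, and lat as built above):
c1.ids   = la.norm$StudentID[la.norm$cluster3 == 1]              # students in cluster 1
c1.daily = rowSums(lat[ , as.character(c1.ids)] > 0)             # how many of them accessed each day
calendarHeat(lat$Date, c1.daily, varname = "Cluster 1: students accessing per day")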
Load in qualitative coding “pattern of lecture recording use ML” -> “qual.csv”
clean - de-identify
clean - fixed capitalisation; converted "no info", "deferred" and "" to NA (i.e. missing)
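A sketch of that cleaning step (qual is my assumed name for the loaded data frame; the column names are as shown below):
qual = read.csv("qual.csv", stringsAsFactors = FALSE)
ml.cols = c("ML1.previous", "ML2.planMS", "ML3.usedMS", "ML4.planEOS")
clean.resp = function(x) {
  x = trimws(tolower(as.character(x)))
  x[x %in% c("no info", "deferred", "")] = NA                    # treat as missing
  x = ifelse(is.na(x), NA, paste0(toupper(substring(x, 1, 1)), substring(x, 2)))
  factor(x, levels = c("Maybe", "No", "Yes"))                    # tidy capitalisation -> Yes/No/Maybe
}
qual[ml.cols] = lapply(qual[ml.cols], clean.resp)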
Data structure
## [1] 99 10
## StudentID ML1.previous ML2.planMS ML3.usedMS ML4.planEOS total.no
## 1 S6089847 No Yes No No 3
## 2 S8117889 No No No No 4
## 3 S8118323 Yes No No No 3
## 4 S8152093 Maybe No Maybe No 2
## 5 S8239113 Yes No Yes No 2
## total.yes total.maybe total.noinfo access
## 1 1 0 0 21
## 2 0 0 0 75
## 3 1 0 0 122
## 4 0 2 0 13
## 5 2 0 0 89
## ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
## Maybe:18 Maybe: 5 Maybe: 4 Maybe: 3
## No :52 No :77 No :69 No :71
## Yes :27 Yes :15 Yes :23 Yes :17
## NA's : 2 NA's : 2 NA's : 3 NA's : 8
Patterns of self-reported lecture recording use
##
## Maybe No Yes
## 18 52 27
## [1] "ML1.previous"
##
## Maybe No Yes
## 5 77 15
## [1] "ML2.planMS"
##
## Maybe No Yes
## 4 69 23
## [1] "ML3.usedMS"
##
## Maybe No Yes
## 3 71 17
## [1] "ML4.planEOS"
## ML2.planMS
## ML1.previous Maybe No Yes Sum
## Maybe 1 16 0 17
## No 2 40 9 51
## Yes 2 19 6 27
## Sum 5 75 15 95
## ML3.usedMS
## ML2.planMS Maybe No Yes Sum
## Maybe 0 4 0 4
## No 4 57 14 75
## Yes 0 6 9 15
## Sum 4 67 23 94
Conclusions:
Most students (52/99, i.e. 53%) report that they don't usually use lecture recordings. Even more didn't plan to use lecture recordings for the mid-semester exam (77/99), a similar number reported not using lecture recordings for the mid-semester exam (69/99), and much the same held for the end-of-semester exam (71/99).
This seems inconsistent with the access data: all 99 students used lecture recordings at some point, and the majority used them on 7-26 days, which is half to twice the number of weeks in semester, i.e. roughly once a fortnight to twice a week.
What are the patterns of responses (No, No, No, No etc.), similar to what Kay calculated as the number of No's, Yes's and Maybe's? (In the tables below, the header row gives the count 0-4 and the second row gives the frequency, i.e. number of students.)
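A sketch of how those per-student counts could be built (an assumption about the calculation, using the qual frame from above):
resp = qual[ , c("ML1.previous", "ML2.planMS", "ML3.usedMS", "ML4.planEOS")]
qual$total.no     = rowSums(resp == "No",    na.rm = TRUE)
qual$total.yes    = rowSums(resp == "Yes",   na.rm = TRUE)
qual$total.maybe  = rowSums(resp == "Maybe", na.rm = TRUE)
qual$total.noinfo = rowSums(is.na(resp))
table(qual$total.no)                                             # counts of 0-4 No's, as tabulated below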
##
## 0 1 2 3 4
## 2 15 21 32 29
## [1] "total.no"
##
## 0 1 2 3 4
## 52 27 7 11 2
## [1] "total.yes"
##
## 0 1 2
## 72 24 3
## [1] "total.maybe"
Most frequent patterns of response:
##
## No Yes Yes Yes Yes No No Yes Yes Yes Yes No No Yes No No Yes No Yes Yes
## 3 3 3 4 4
## Yes No No No Maybe No No No No No No No
## 8 9 29
Everything else was reported by 2 or fewer students.
So there is definitely a group of 29 students who never report using lecture recordings (LR). There are 27 students who report that they usually used LR.
Of these, 6 plan to use LR for mid-sem, 2 maybes, and 19 don’t mention LR for mid-sem prep. There are 52 (51?) students who don’t report usually using LR. Of these, 9 plan to use LR for mid-sem, 2 maybes, and 40 don’t mention LR for mid-sem prep.
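A sketch of how the response patterns could be tabulated (the pattern column turns up in the merged data further down):
qual$pattern = with(qual, paste(ML1.previous, ML2.planMS, ML3.usedMS, ML4.planEOS))
sort(table(qual$pattern), decreasing = TRUE)                     # most frequent patterns first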
## [1] 99 116
## StudentID X4.03.14 X5.03.14 X6.03.14 X7.03.14
## 1 S6089847 1 1 0 0
## 3 S8117889 0 1 1 0
## 4 S8118323 1 0 0 0
## 5 S8152093 0 0 0 0
## 6 S8239113 1 1 0 0
## X20.06.14 X21.06.14 X22.06.14 X23.06.14 X24.06.14 X25.06.14 cluster3
## 1 0 0 0 0 0 0 1
## 3 1 1 1 1 1 0 2
## 4 1 0 0 0 0 0 1
## 5 0 1 0 0 0 0 3
## 6 0 0 0 0 0 0 1
## [1] 99 126
## StudentID X4.03.14 X5.03.14 X6.03.14 X7.03.14
## 1 S6089847 1 1 0 0
## 2 S8117889 0 1 1 0
## 3 S8118323 1 0 0 0
## 4 S8152093 0 0 0 0
## 5 S8239113 1 1 0 0
## X25.06.14 cluster3 ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
## 1 0 1 No Yes No No
## 2 0 2 No No No No
## 3 0 1 Yes No No No
## 4 0 3 Maybe No Maybe No
## 5 0 1 Yes No Yes No
## total.no total.yes total.maybe total.noinfo access pattern
## 1 3 1 0 0 21 No Yes No No
## 2 4 0 0 0 75 No No No No
## 3 3 1 0 0 122 Yes No No No
## 4 2 0 2 0 13 Maybe No Maybe No
## 5 2 2 0 0 89 Yes No Yes No
## [1] 99 128
## X25.06.14 cluster3 ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
## 1 0 1 No Yes No No
## 2 0 2 No No No No
## 3 0 1 Yes No No No
## 4 0 3 Maybe No Maybe No
## 5 0 1 Yes No Yes No
## total.no total.yes total.maybe total.noinfo access pattern
## 1 3 1 0 0 21 No Yes No No
## 2 4 0 0 0 75 No No No No
## 3 3 1 0 0 122 Yes No No No
## 4 2 0 2 0 13 Maybe No Maybe No
## 5 2 2 0 0 89 Yes No Yes No
## prevLR access.days
## 1 No 15
## 2 No 26
## 3 Yes 33
## 4 Yes 10
## 5 Yes 28
Statistical tests:
Wilcoxon rank-sum test (the non-parametric analogue of an unpaired t-test, suitable for non-normal data)
Do students who report usually using LR access LR more? First as the number of folder openings, then as the number of days. (Order of output: test, means, SEMs.)
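The calls aren't echoed; from the output labels they were presumably of this form (a reconstruction, not the echoed source):
wilcox.test(access ~ prevLR, data = la.norm.qual)                # folder openings by self-reported previous use
with(la.norm.qual, tapply(access, prevLR, mean))
with(la.norm.qual, tapply(access, prevLR, sem))
wilcox.test(access.days ~ prevLR, data = la.norm.qual)           # days accessed by self-reported previous use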
##
## Wilcoxon rank sum test with continuity correction
##
## data: access by prevLR
## W = 739, p-value = 0.00184
## alternative hypothesis: true location shift is not equal to 0
## No Yes
## 37.9 56.2
## [1] 4.7961
## [1] 5.032717
## No Yes
## 4.8 5.0
##
## Wilcoxon rank sum test with continuity correction
##
## data: access.days by prevLR
## W = 790, p-value = 0.005989
## alternative hypothesis: true location shift is not equal to 0
## No Yes
## 14.96154 19.82222
## [1] 1.077664
## [1] 1.311676
## No Yes
## 1.077664 1.311676
Do students who report usually using LR fall into different clusters? (Order of output: test, table, means, SEMs.)
##
## Wilcoxon rank sum test with continuity correction
##
## data: cluster3 by prevLR
## W = 1473.5, p-value = 0.01951
## alternative hypothesis: true location shift is not equal to 0
## cluster3
## prevLR 1 2 3 Sum
## No 9 19 24 52
## Yes 16 17 12 45
## Sum 25 36 36 97
## No Yes
## 2.29 1.91
## [1] 0.1039801
## [1] 0.1181602
## No Yes
## 0.10 0.12
Kay's pattern groups (y/n/m for ML1-ML4):
1 = Previous and did (>3 y): yyyy, ynyy, yyny, yyyn, ynyn
2 = Previous/intended, but did not: ynnn, yynn, ynny -> 3
3 = No previous use, but then did or intended: nnyy, nyyy, nnny -> 2
4 = No previous use, intention but did not: nyny, nynn
5 = No report: nnnn
0 = Don't fit?
Load “Kay.gp.index.csv”, fixed for the paper version of the group names (i.e. 2 and 3 swapped); clean - de-identified
merge into la.norm.qual
## X25.06.14 cluster3 ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
## 1 0 1 No Yes No No
## 2 0 2 No No No No
## 3 0 1 Yes No No No
## 4 0 3 Maybe No Maybe No
## 5 0 1 Yes No Yes No
## total.no total.yes total.maybe total.noinfo access pattern
## 1 3 1 0 0 21 No Yes No No
## 2 4 0 0 0 75 No No No No
## 3 3 1 0 0 122 Yes No No No
## 4 2 0 2 0 13 Maybe No Maybe No
## 5 2 2 0 0 89 Yes No Yes No
## prevLR access.days Kay.pattern
## 1 No 15 4
## 2 No 26 5
## 3 Yes 33 3
## 4 Yes 10 0
## 5 Yes 28 1
##
## Maybe Maybe No No Maybe NA No No Maybe No Maybe No Maybe No NA No
## 0 1 1 2 1
## 1 0 0 0 0
## 2 0 0 0 0
## 3 0 0 0 0
## 4 0 0 0 0
## 5 0 0 0 0
##
## Maybe No No NA Maybe No No No Maybe No Yes NA Maybe No Yes No
## 0 1 0 0 0
## 1 0 0 1 2
## 2 0 0 0 0
## 3 0 0 0 0
## 4 0 0 0 0
## 5 0 9 0 0
##
## NA No No No NA No No Yes No Maybe No No No NA No No No No Maybe No
## 0 1 1 2 0 0
## 1 0 0 0 0 0
## 2 0 0 0 0 0
## 3 0 0 0 0 0
## 4 0 0 0 0 0
## 5 0 0 0 1 1
##
## No No Maybe Yes No No NA No No No No NA No No No No No No No Yes
## 0 0 1 0 0 0
## 1 0 0 0 0 0
## 2 1 0 0 0 1
## 3 0 0 0 0 0
## 4 0 0 0 0 0
## 5 0 0 2 29 0
##
## No No Yes NA No No Yes No No No Yes Yes No Yes No Maybe No Yes No No
## 0 0 0 0 0 0
## 1 0 0 0 0 0
## 2 2 2 1 0 0
## 3 0 0 0 0 0
## 4 0 0 0 1 4
## 5 0 0 0 0 0
##
## No Yes Yes Maybe No Yes Yes Yes Yes Maybe NA No Yes Maybe No No
## 0 0 0 1 1
## 1 0 0 0 0
## 2 1 3 0 0
## 3 0 0 0 0
## 4 0 0 0 0
## 5 0 0 0 0
##
## Yes No No Maybe Yes No No NA Yes No No No Yes No No Yes Yes No Yes NA
## 0 0 1 0 0 1
## 1 0 0 0 0 0
## 2 0 0 0 0 0
## 3 1 0 8 3 0
## 4 0 0 0 0 0
## 5 0 0 0 0 0
##
## Yes No Yes No Yes No Yes Yes Yes Yes No Yes Yes Yes Yes No
## 0 0 0 0 0
## 1 1 4 1 3
## 2 0 0 0 0
## 3 0 0 0 0
## 4 0 0 0 0
## 5 0 0 0 0
##
## Yes Yes Yes Yes
## 0 0
## 1 2
## 2 0
## 3 0
## 4 0
## 5 0
Alignment of Kay's groups (Kay.pattern) with cluster3
## Kay.pattern
## cluster3 0 1 2 3 4 5
## 1 3 5 3 5 1 9
## 2 2 8 6 4 4 12
## 3 10 1 2 3 0 21
What do the 3 clusters look like?
##
## 1 2 3
## 26 36 37
Alignment between 3 clusters and self report
## ML1.previous
## cluster3 Maybe No Yes
## 1 6 9 10
## 2 5 19 12
## 3 7 24 5
## ML2.planMS
## cluster3 Maybe No Yes
## 1 0 20 6
## 2 1 28 7
## 3 4 29 2
## ML3.usedMS
## cluster3 Maybe No Yes
## 1 0 18 8
## 2 2 21 13
## 3 2 30 2
## ML4.planEOS
## cluster3 Maybe No Yes
## 1 2 14 6
## 2 1 24 8
## 3 0 33 3
## total.no
## cluster3 0 1 2 3 4 Sum
## 1 0.50000000 0.16666667 0.11904762 0.15625000 0.06896552 0.13131313
## 2 0.00000000 0.23333333 0.23809524 0.17187500 0.13793103 0.18181818
## 3 0.00000000 0.10000000 0.14285714 0.17187500 0.29310345 0.18686869
## Sum 0.50000000 0.50000000 0.50000000 0.50000000 0.50000000 0.50000000
##
## cluster3 FALSE TRUE Sum
## 1 12 14 26
## 2 17 19 36
## 3 9 28 37
## Sum 38 61 99
## total.yes
## cluster3 0 1 2 3 4
## 1 10 9 2 3 2
## 2 12 14 4 6 0
## 3 30 4 1 2 0
Kay’s rules for 3 groups:
1 = 3-4 yes
2 = any 2 yes + 2 no, 2 no + yes + maybe, 2 yes + no + maybe
3 = 3-4 no
0 = no info
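A sketch of that rule applied to the total.* counts (my reading of the rule, so an assumption; it reproduces the 13/23/61 split below, with 2 left blank as the no-info cases):
qual$Kay3 = with(qual, ifelse(total.yes >= 3, "1",
                       ifelse(total.no  >= 3, "3",
                       ifelse(total.noinfo >= 3, "", "2"))))
addmargins(table(qual$Kay3))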
## [1] "3" "3" "3" "" "" "3" "3" "1" "3" "3" "3" "" "1" "" "3"
##
## 1 2 3 Sum
## 2 13 23 61 99
## [1] 23 130
## ML1.previous ML2.planMS ML3.usedMS ML4.planEOS total.no total.yes
## 4 Maybe No Maybe No 2 0
## 5 Yes No Yes No 2 2
## 12 Yes No Yes <NA> 1 2
## 14 Maybe Maybe No No 2 0
## 16 No Yes Yes Maybe 1 2
## 18 Maybe No Yes No 2 1
## 19 Yes No No Yes 2 2
## 23 Maybe No Yes No 2 1
## 25 Maybe No Maybe No 2 0
## 33 Maybe <NA> No No 2 0
## 42 No Yes No Maybe 2 1
## 45 Yes No No Yes 2 2
## 46 No No Yes <NA> 2 1
## 53 <NA> No No Yes 2 1
## 59 Maybe No No <NA> 2 0
## 60 Yes No No <NA> 2 1
## 63 Yes No No Maybe 2 1
## 70 No No Maybe Yes 2 1
## 71 Yes Maybe No No 2 1
## 74 No No Yes <NA> 2 1
## 79 Yes No No Yes 2 2
## 82 Maybe No <NA> No 2 0
## 99 No No Yes Yes 2 2
## [1] 8636869
## [1] 60
## StudentID ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
## 60 S8636869 Yes No No <NA>
## ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
## 1 No Yes No No
## 2 No No No No
## 3 Yes No No No
## 4 Maybe No Maybe No
## 5 Yes No Yes No
## 6 No No No No
## 7 No No No No
## 8 No Yes Yes Yes
## 9 Yes No No No
## 10 No Yes No No
## 11 Maybe No No No
## 12 Yes No Yes <NA>
## 13 Yes Yes Yes Yes
## 14 Maybe Maybe No No
## 15 No No No No
## 16 No Yes Yes Maybe
## 17 Yes No No No
## 18 Maybe No Yes No
## 19 Yes No No Yes
## 20 Yes No No No
##
## FALSE TRUE
## 381 15
##
## FALSE TRUE
## 351 30
## cluster3
## Kay3 1 2 3 Sum
## 1 0 1 2
## 1 5 6 2 13
## 2 6 11 6 23
## 3 14 19 28 61
## Sum 26 36 37 99
Trying a 2-cluster solution. Use la.norm to cluster lecture recording access:
distances = dist(la.norm[2:115], method = "euclidean")  # same distances as for the 3-cluster solution
clusterLA = hclust(distances, method = "ward.D")
plot(clusterLA)
clusterGroups2 = cutree(clusterLA, k = 2)                # cut the same tree into 2 clusters
la.norm$cluster2 = clusterGroups2
dim(la.norm)
## [1] 99 117
la.norm[1:5,1:5]
## StudentID X4.03.14 X5.03.14 X6.03.14 X7.03.14
## 1 S6089847 1 1 0 0
## 3 S8117889 0 1 1 0
## 4 S8118323 1 0 0 0
## 5 S8152093 0 0 0 0
## 6 S8239113 1 1 0 0
la.norm[1:5,ncol(la.norm)]
## [1] 1 1 1 2 1
la.norm[1:5,115:117]
## X25.06.14 cluster3 cluster2
## 1 0 1 1
## 3 0 2 1
## 4 0 1 1
## 5 0 3 2
## 6 0 1 1
addmargins(with(la.norm, table(cluster3, cluster2)))
## cluster2
## cluster3 1 2 Sum
## 1 26 0 26
## 2 36 0 36
## 3 0 37 37
## Sum 62 37 99
moving cluster2 over to la.norm.qual
Kay.office = la.norm
df = Kay.office
dim(df)
## [1] 99 117
df = cbind(df$StudentID, df[117])
dim(df)
## [1] 99 2
df[1:5,]
## df$StudentID cluster2
## 1 S6089847 1
## 3 S8117889 1
## 4 S8118323 1
## 5 S8152093 2
## 6 S8239113 1
Kay.office = df
dim(Kay.office)
## [1] 99 2
Kay.office[1:5,]
## df$StudentID cluster2
## 1 S6089847 1
## 3 S8117889 1
## 4 S8118323 1
## 5 S8152093 2
## 6 S8239113 1
names(Kay.office) = c("StudentID", "cluster2")
la.norm.qual = merge(la.norm.qual, Kay.office, by="StudentID")
dim(la.norm.qual)
## [1] 99 131
la.norm.qual[1:5,1:5]
## StudentID X4.03.14 X5.03.14 X6.03.14 X7.03.14
## 1 S6089847 1 1 0 0
## 2 S8117889 0 1 1 0
## 3 S8118323 1 0 0 0
## 4 S8152093 0 0 0 0
## 5 S8239113 1 1 0 0
la.norm.qual[1:5,125:131]
## access pattern prevLR access.days Kay.pattern Kay3 cluster2
## 1 21 No Yes No No No 15 4 3 1
## 2 75 No No No No No 26 5 3 1
## 3 122 Yes No No No Yes 33 3 3 1
## 4 13 Maybe No Maybe No Yes 10 0 2 2
## 5 89 Yes No Yes No Yes 28 1 2 1
with(la.norm.qual, table(Kay3, cluster2))
## cluster2
## Kay3 1 2
## 1 1
## 1 11 2
## 2 17 6
## 3 33 28
addmargins(with(la.norm.qual, table(total.yes, cluster2)))
## cluster2
## total.yes 1 2 Sum
## 0 22 30 52
## 1 23 4 27
## 2 6 1 7
## 3 9 2 11
## 4 2 0 2
## Sum 62 37 99
with(la.norm.qual, tapply(access, cluster2, mean))
## 1 2
## 62.50000 18.78378
with(la.norm.qual, tapply(access, cluster2, sem))
## [1] 4.276394
## [1] 2.204669
## 1 2
## 4.276394 2.204669
with(la.norm.qual, tapply(access.days, cluster2, mean))
## 1 2
## 21.854839 9.243243
with(la.norm.qual, tapply(access.days, cluster2, sem))
## [1] 0.89148
## [1] 0.570029
## 1 2
## 0.891480 0.570029
la.norm.qual[1:5,115:131]
## X25.06.14 cluster3 ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
## 1 0 1 No Yes No No
## 2 0 2 No No No No
## 3 0 1 Yes No No No
## 4 0 3 Maybe No Maybe No
## 5 0 1 Yes No Yes No
## total.no total.yes total.maybe total.noinfo access pattern
## 1 3 1 0 0 21 No Yes No No
## 2 4 0 0 0 75 No No No No
## 3 3 1 0 0 122 Yes No No No
## 4 2 0 2 0 13 Maybe No Maybe No
## 5 2 2 0 0 89 Yes No Yes No
## prevLR access.days Kay.pattern Kay3 cluster2
## 1 No 15 4 3 1
## 2 No 26 5 3 1
## 3 Yes 33 3 3 1
## 4 Yes 10 0 2 2
## 5 Yes 28 1 2 1
addmargins(with(la.norm.qual, table(ML1.previous, cluster2)))
## cluster2
## ML1.previous 1 2 Sum
## Maybe 11 7 18
## No 28 24 52
## Yes 22 5 27
## Sum 61 36 97
addmargins(with(la.norm.qual, table(ML2.planMS, cluster2)))
## cluster2
## ML2.planMS 1 2 Sum
## Maybe 1 4 5
## No 48 29 77
## Yes 13 2 15
## Sum 62 35 97
addmargins(with(la.norm.qual, table(ML3.usedMS, cluster2)))
## cluster2
## ML3.usedMS 1 2 Sum
## Maybe 2 2 4
## No 39 30 69
## Yes 21 2 23
## Sum 62 34 96
addmargins(with(la.norm.qual, table(ML4.planEOS, cluster2)))
## cluster2
## ML4.planEOS 1 2 Sum
## Maybe 3 0 3
## No 38 33 71
## Yes 14 3 17
## Sum 55 36 91
addmargins(with(la.norm.qual, table(ML1.previous == "Yes", cluster2)))
## cluster2
## 1 2 Sum
## FALSE 39 31 70
## TRUE 22 5 27
## Sum 61 36 97
addmargins(with(la.norm.qual, table(ML2.planMS == "Yes", cluster2)))
## cluster2
## 1 2 Sum
## FALSE 49 33 82
## TRUE 13 2 15
## Sum 62 35 97
addmargins(with(la.norm.qual, table(ML3.usedMS == "Yes", cluster2)))
## cluster2
## 1 2 Sum
## FALSE 41 32 73
## TRUE 21 2 23
## Sum 62 34 96
addmargins(with(la.norm.qual, table(ML4.planEOS == "Yes", cluster2)))
## cluster2
## 1 2 Sum
## FALSE 41 33 74
## TRUE 14 3 17
## Sum 55 36 91
Then run calendarHeat for both clusters…
## [1] 62 131
## [1] 37 131
## [1] 62 115
## StudentID X4.03.14 X5.03.14 X6.03.14 X7.03.14
## 1 S6089847 1 1 0 0
## 2 S8117889 0 1 1 0
## 3 S8118323 1 0 0 0
## 5 S8239113 1 1 0 0
## 6 S8283571 0 1 0 0
## X20.06.14 X21.06.14 X22.06.14 X23.06.14 X24.06.14 X25.06.14
## 1 0 0 0 0 0 0
## 2 1 1 1 1 1 0
## 3 1 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 1 0
## [1] "matrix"
## [1] 115 62
## 1 2 3 5 6
## StudentID "S6089847" "S8117889" "S8118323" "S8239113" "S8283571"
## X4.03.14 "1" "0" "1" "1" "0"
## X5.03.14 "1" "1" "0" "1" "1"
## X6.03.14 "0" "1" "0" "0" "0"
## X7.03.14 "0" "0" "0" "0" "0"
## 79 81 86 88 90
## StudentID "S8643917" "S8644267" "S8646161" "S8646489" "S8647069"
## X4.03.14 "0" "0" "1" "1" "1"
## X5.03.14 "1" "0" "1" "1" "1"
## X6.03.14 "1" "1" "1" "0" "0"
## X7.03.14 "0" "0" "0" "0" "0"
## 95 98 99
## StudentID "S8648397" "S8651655" "S8651793"
## X4.03.14 "0" "0" "1"
## X5.03.14 "0" "0" "0"
## X6.03.14 "0" "1" "1"
## X7.03.14 "0" "0" "0"
## [1] "data.frame"
## [1] 115 62
## 1 2 3 5 6
## StudentID S6089847 S8117889 S8118323 S8239113 S8283571
## X4.03.14 1 0 1 1 0
## X5.03.14 1 1 0 1 1
## X6.03.14 0 1 0 0 0
## X7.03.14 0 0 0 0 0
## 79 81 86 88 90 95 98
## StudentID S8643917 S8644267 S8646161 S8646489 S8647069 S8648397 S8651655
## X4.03.14 0 0 1 1 1 0 0
## X5.03.14 1 0 1 1 1 0 0
## X6.03.14 1 1 1 0 0 0 1
## X7.03.14 0 0 0 0 0 0 0
## 99
## StudentID S8651793
## X4.03.14 1
## X5.03.14 0
## X6.03.14 1
## X7.03.14 0
## 79 81 86 88 90 95 98 99 Dates
## X4.03.14 0 0 1 1 1 0 0 1 X4.03.14
## X5.03.14 1 0 1 1 1 0 0 0 X5.03.14
## X6.03.14 1 1 1 0 0 0 1 1 X6.03.14
## X7.03.14 0 0 0 0 0 0 0 0 X7.03.14
## [1] "data.frame"
## [1] 114 63
## 1 2 3 5 6
## X4.03.14 1 0 1 1 0
## X5.03.14 1 1 0 1 1
## X6.03.14 0 1 0 0 0
## X7.03.14 0 0 0 0 0
## X8.03.14 0 0 0 1 0
## 79 81 86 88 90 95 98 99 Dates
## X4.03.14 0 0 1 1 1 0 0 1 X4.03.14
## X5.03.14 1 0 1 1 1 0 0 0 X5.03.14
## X6.03.14 1 1 1 0 0 0 1 1 X6.03.14
## X7.03.14 0 0 0 0 0 0 0 0 X7.03.14
## X8.03.14 0 0 0 0 0 0 0 0 X8.03.14
## X9.03.14 0 0 0 0 0 0 0 0 X9.03.14
## X10.03.14 0 1 0 0 1 0 0 0 X10.03.14
## X11.03.14 0 1 0 1 0 0 0 0 X11.03.14
## X12.03.14 1 0 0 0 0 0 0 0 X12.03.14
## X13.03.14 0 0 0 0 0 0 1 0 X13.03.14
## X14.03.14 1 0 0 0 0 0 0 0 X14.03.14
## X15.03.14 0 0 0 0 0 0 0 0 X15.03.14
## X16.03.14 0 0 0 0 0 0 0 0 X16.03.14
## X17.03.14 0 0 0 0 0 0 0 0 X17.03.14
## X18.03.14 1 0 0 0 0 1 0 0 X18.03.14
## 79 81 86 88 90 95 98 99 Dates Dates2
## X4.03.14 0 0 1 1 1 0 0 1 X4.03.14 4.03.14
## X5.03.14 1 0 1 1 1 0 0 0 X5.03.14 5.03.14
## X6.03.14 1 1 1 0 0 0 1 1 X6.03.14 6.03.14
## X7.03.14 0 0 0 0 0 0 0 0 X7.03.14 7.03.14
## X8.03.14 0 0 0 0 0 0 0 0 X8.03.14 8.03.14
## X9.03.14 0 0 0 0 0 0 0 0 X9.03.14 9.03.14
## X10.03.14 0 1 0 0 1 0 0 0 X10.03.14 10.03.14
## X11.03.14 0 1 0 1 0 0 0 0 X11.03.14 11.03.14
## X12.03.14 1 0 0 0 0 0 0 0 X12.03.14 12.03.14
## X13.03.14 0 0 0 0 0 0 1 0 X13.03.14 13.03.14
## X14.03.14 1 0 0 0 0 0 0 0 X14.03.14 14.03.14
## X15.03.14 0 0 0 0 0 0 0 0 X15.03.14 15.03.14
## X16.03.14 0 0 0 0 0 0 0 0 X16.03.14 16.03.14
## X17.03.14 0 0 0 0 0 0 0 0 X17.03.14 17.03.14
## X18.03.14 1 0 0 0 0 1 0 0 X18.03.14 18.03.14
## chr [1:114] "4.03.14" "5.03.14" "6.03.14" "7.03.14" ...
## 79 81 86 88 90 95 98 99 Dates Dates2
## X4.03.14 0 0 1 1 1 0 0 1 X4.03.14 2014-03-04
## X5.03.14 1 0 1 1 1 0 0 0 X5.03.14 2014-03-05
## X6.03.14 1 1 1 0 0 0 1 1 X6.03.14 2014-03-06
## X7.03.14 0 0 0 0 0 0 0 0 X7.03.14 2014-03-07
## X8.03.14 0 0 0 0 0 0 0 0 X8.03.14 2014-03-08
## X9.03.14 0 0 0 0 0 0 0 0 X9.03.14 2014-03-09
## X10.03.14 0 1 0 0 1 0 0 0 X10.03.14 2014-03-10
## X11.03.14 0 1 0 1 0 0 0 0 X11.03.14 2014-03-11
## X12.03.14 1 0 0 0 0 0 0 0 X12.03.14 2014-03-12
## X13.03.14 0 0 0 0 0 0 1 0 X13.03.14 2014-03-13
## X14.03.14 1 0 0 0 0 0 0 0 X14.03.14 2014-03-14
## X15.03.14 0 0 0 0 0 0 0 0 X15.03.14 2014-03-15
## X16.03.14 0 0 0 0 0 0 0 0 X16.03.14 2014-03-16
## X17.03.14 0 0 0 0 0 0 0 0 X17.03.14 2014-03-17
## X18.03.14 1 0 0 0 0 1 0 0 X18.03.14 2014-03-18
## 79 81 86 88 90 95 98 99 Dates Dates2
## X4.03.14 0 0 1 1 1 0 0 1 X4.03.14 2014-03-04
## X5.03.14 1 0 1 1 1 0 0 0 X5.03.14 2014-03-05
## X6.03.14 1 1 1 0 0 0 1 1 X6.03.14 2014-03-06
## X7.03.14 0 0 0 0 0 0 0 0 X7.03.14 2014-03-07
## X8.03.14 0 0 0 0 0 0 0 0 X8.03.14 2014-03-08
## X9.03.14 0 0 0 0 0 0 0 0 X9.03.14 2014-03-09
## X10.03.14 0 1 0 0 1 0 0 0 X10.03.14 2014-03-10
## X11.03.14 0 1 0 1 0 0 0 0 X11.03.14 2014-03-11
## X12.03.14 1 0 0 0 0 0 0 0 X12.03.14 2014-03-12
## X13.03.14 0 0 0 0 0 0 1 0 X13.03.14 2014-03-13
## X14.03.14 1 0 0 0 0 0 0 0 X14.03.14 2014-03-14
## X15.03.14 0 0 0 0 0 0 0 0 X15.03.14 2014-03-15
## X16.03.14 0 0 0 0 0 0 0 0 X16.03.14 2014-03-16
## X17.03.14 0 0 0 0 0 0 0 0 X17.03.14 2014-03-17
## X18.03.14 1 0 0 0 0 1 0 0 X18.03.14 2014-03-18
## 'data.frame': 114 obs. of 5 variables:
## $ 1: chr "1" "1" "0" "0" ...
## $ 2: chr "0" "1" "1" "0" ...
## $ 3: chr "1" "0" "0" "0" ...
## $ 5: chr "1" "1" "0" "0" ...
## $ 6: chr "0" "1" "0" "0" ...
## 79 81 86 88 90 95 98 99 Dates Dates2
## X4.03.14 0 0 1 1 1 0 0 1 X4.03.14 2014-03-04
## X5.03.14 1 0 1 1 1 0 0 0 X5.03.14 2014-03-05
## X6.03.14 1 1 1 0 0 0 1 1 X6.03.14 2014-03-06
## X7.03.14 0 0 0 0 0 0 0 0 X7.03.14 2014-03-07
## X8.03.14 0 0 0 0 0 0 0 0 X8.03.14 2014-03-08
## X9.03.14 0 0 0 0 0 0 0 0 X9.03.14 2014-03-09
## X10.03.14 0 1 0 0 1 0 0 0 X10.03.14 2014-03-10
## X11.03.14 0 1 0 1 0 0 0 0 X11.03.14 2014-03-11
## X12.03.14 1 0 0 0 0 0 0 0 X12.03.14 2014-03-12
## X13.03.14 0 0 0 0 0 0 1 0 X13.03.14 2014-03-13
## X14.03.14 1 0 0 0 0 0 0 0 X14.03.14 2014-03-14
## X15.03.14 0 0 0 0 0 0 0 0 X15.03.14 2014-03-15
## X16.03.14 0 0 0 0 0 0 0 0 X16.03.14 2014-03-16
## X17.03.14 0 0 0 0 0 0 0 0 X17.03.14 2014-03-17
## X18.03.14 1 0 0 0 0 1 0 0 X18.03.14 2014-03-18
## 'data.frame': 114 obs. of 5 variables:
## $ 1: num 1 1 0 0 0 0 0 0 0 1 ...
## $ 2: num 0 1 1 0 0 1 0 0 1 0 ...
## $ 3: num 1 0 0 0 0 1 1 1 0 1 ...
## $ 5: num 1 1 0 0 1 0 0 0 0 0 ...
## $ 6: num 0 1 0 0 0 0 0 0 0 0 ...
## [1] 114 65
## 79 81 86 88 90 95 98 99 Dates Dates2 Total
## X4.03.14 0 0 1 1 1 0 0 1 X4.03.14 2014-03-04 29
## X5.03.14 1 0 1 1 1 0 0 0 X5.03.14 2014-03-05 32
## X6.03.14 1 1 1 0 0 0 1 1 X6.03.14 2014-03-06 17
## X7.03.14 0 0 0 0 0 0 0 0 X7.03.14 2014-03-07 6
## X8.03.14 0 0 0 0 0 0 0 0 X8.03.14 2014-03-08 10
## X9.03.14 0 0 0 0 0 0 0 0 X9.03.14 2014-03-09 11
## X10.03.14 0 1 0 0 1 0 0 0 X10.03.14 2014-03-10 15
## X11.03.14 0 1 0 1 0 0 0 0 X11.03.14 2014-03-11 21
## X12.03.14 1 0 0 0 0 0 0 0 X12.03.14 2014-03-12 16
## X13.03.14 0 0 0 0 0 0 1 0 X13.03.14 2014-03-13 7
## X14.03.14 1 0 0 0 0 0 0 0 X14.03.14 2014-03-14 7
## X15.03.14 0 0 0 0 0 0 0 0 X15.03.14 2014-03-15 6
## X16.03.14 0 0 0 0 0 0 0 0 X16.03.14 2014-03-16 10
## X17.03.14 0 0 0 0 0 0 0 0 X17.03.14 2014-03-17 17
## X18.03.14 1 0 0 0 0 1 0 0 X18.03.14 2014-03-18 19
## [1] 114 39
## 94 96 97 Dates Dates2
## X4.03.14 1 0 1 X4.03.14 2014-03-04
## X5.03.14 0 0 0 X5.03.14 2014-03-05
## X6.03.14 0 0 0 X6.03.14 2014-03-06
## X7.03.14 0 0 0 X7.03.14 2014-03-07
## X8.03.14 0 0 0 X8.03.14 2014-03-08
## [1] 114 40
## 94 96 97 Dates Dates2 Total
## X4.03.14 1 0 1 X4.03.14 2014-03-04 10
## X5.03.14 0 0 0 X5.03.14 2014-03-05 8
## X6.03.14 0 0 0 X6.03.14 2014-03-06 2
## X7.03.14 0 0 0 X7.03.14 2014-03-07 2
## X8.03.14 0 0 0 X8.03.14 2014-03-08 1
## X9.03.14 0 0 0 X9.03.14 2014-03-09 2
## X10.03.14 0 0 0 X10.03.14 2014-03-10 3
## X11.03.14 0 0 0 X11.03.14 2014-03-11 3
## X12.03.14 1 0 0 X12.03.14 2014-03-12 9
## X13.03.14 0 0 1 X13.03.14 2014-03-13 3
## X14.03.14 0 0 0 X14.03.14 2014-03-14 1
## X15.03.14 0 0 0 X15.03.14 2014-03-15 0
## X16.03.14 0 0 0 X16.03.14 2014-03-16 1
## X17.03.14 0 0 0 X17.03.14 2014-03-17 4
## X18.03.14 0 0 0 X18.03.14 2014-03-18 7
Kay’s email Wed 13 Aug 2014:
Low cluster: n = 37, access 18.8 +/- 2.2, access days 9.2 +/- 0.57
Meta-learning response (did and/or intended to access), n and mean access:
yes: n = 47, 66 +/- 5.6
no (0 yes): n = 52, 28.23 +/- 2.5**
dim(la.norm.qual)
## [1] 99 131
with(la.norm.qual, tapply(access, total.yes == 0, mean))
## FALSE TRUE
## 66.00000 28.23077
with(la.norm.qual, tapply(access, total.yes == 0, sem))
## [1] 5.590563
## [1] 2.54175
## FALSE TRUE
## 5.590563 2.541750
with(la.norm.qual, tapply(access.days, total.yes == 0, mean))
## FALSE TRUE
## 21.29787 13.38462
with(la.norm.qual, tapply(access.days, total.yes == 0, sem))
## [1] 1.272681
## [1] 0.8848285
## FALSE TRUE
## 1.2726813 0.8848285
with(la.norm.qual, table(total.yes == 0, cluster2))
## cluster2
## 1 2
## FALSE 40 7
## TRUE 22 30
Days before the deadline for ML1-4 and the Assignment, plus AcP for the course (and the Assignment), plus qual categories 3 and 5 (up to 4 instances of each, so ordinal data). Using MLsub, which has the ML1-4 submission and due dates and the time differences (it should already be loaded into the global environment - if not, there are some indications of the code in markup v1).
clean - consent
dim(ci)
## [1] 231 2
MLsub = NULL
MLsub = read.csv("MLsub.csv")
dim(MLsub)
## [1] 876 11
MLsub[,11] = NULL
MLsub[,1] = NULL
str(MLsub)
## 'data.frame': 876 obs. of 9 variables:
## $ StudentID: Factor w/ 230 levels "s3044923","s361850",..: 56 89 31 148 19 198 136 206 137 68 ...
## $ Date : Factor w/ 33 levels "1/06/14","10/04/14",..: 19 20 20 19 21 18 14 16 22 16 ...
## $ Submitted: Factor w/ 866 levels "0:01:00","0:03:09",..: 586 304 427 490 144 244 571 260 218 114 ...
## $ Duration : Factor w/ 785 levels "","0:00:42","0:01:07",..: 73 220 718 697 207 684 541 280 611 435 ...
## $ MLtask : Factor w/ 4 levels "ML1","ML2","ML3",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Open : Factor w/ 4 levels "19/03/14","27/05/14",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Due : Factor w/ 4 levels "14/05/14","16/04/14",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ SubDT : Factor w/ 850 levels "1/06/14 0:28",..: 476 491 494 468 510 441 338 386 543 378 ...
## $ DueDT : Factor w/ 4 levels "14/05/14 17:00",..: 3 3 3 3 3 3 3 3 3 3 ...
MLsub$SubDT = as.character(MLsub$SubDT)
MLsub$DueDT = as.character(MLsub$DueDT)
str(MLsub)
## 'data.frame': 876 obs. of 9 variables:
## $ StudentID: Factor w/ 230 levels "s3044923","s361850",..: 56 89 31 148 19 198 136 206 137 68 ...
## $ Date : Factor w/ 33 levels "1/06/14","10/04/14",..: 19 20 20 19 21 18 14 16 22 16 ...
## $ Submitted: Factor w/ 866 levels "0:01:00","0:03:09",..: 586 304 427 490 144 244 571 260 218 114 ...
## $ Duration : Factor w/ 785 levels "","0:00:42","0:01:07",..: 73 220 718 697 207 684 541 280 611 435 ...
## $ MLtask : Factor w/ 4 levels "ML1","ML2","ML3",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Open : Factor w/ 4 levels "19/03/14","27/05/14",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Due : Factor w/ 4 levels "14/05/14","16/04/14",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ SubDT : chr "23/03/14 20:43" "24/03/14 14:31" "24/03/14 17:06" "23/03/14 18:28" ...
## $ DueDT : chr "26/03/14 17:00" "26/03/14 17:00" "26/03/14 17:00" "26/03/14 17:00" ...
#MLsub[1:3,]
#dtp = as.POSIXct(dt, format = "%d/%m/%Y %H:%M:%S", tz="UTC")
MLsub$SubDT = as.POSIXct(MLsub$SubDT, "%d/%m/%y %H:%M", tz="UTC")
MLsub$DueDT = as.POSIXct(MLsub$DueDT, "%d/%m/%y %H:%M", tz="UTC")
#MLsub[1:3,]
MLsub$Earliness = difftime(MLsub$DueDT, MLsub$SubDT)
#MLsub[1:3,]
#MLsub[1:5,1:5]
MLsub.names = names(MLsub)
MLsub.names
## [1] "StudentID" "Date" "Submitted" "Duration" "MLtask"
## [6] "Open" "Due" "SubDT" "DueDT" "Earliness"
MLsub.names[1] = "StudentID"
names(MLsub) = MLsub.names
#MLsub[1:5,1:5]
clean - De-ID
## [1] 876 10
## StudentID Date Submitted Duration MLtask Open Due
## 1 S8579275 23/03/14 20:43:42 0:09:16 ML1 19/03/14 26/03/14
## 2 S8587419 24/03/14 14:31:37 0:16:02 ML1 19/03/14 26/03/14
## 3 S8530605 24/03/14 17:06:10 47:58:06 ML1 19/03/14 26/03/14
## 4 S8636955 23/03/14 18:28:04 4:40:22 ML1 19/03/14 26/03/14
## 5 S8475915 25/03/14 12:00:45 0:15:28 ML1 19/03/14 26/03/14
## SubDT DueDT Earliness
## 1 2014-03-23 20:43:00 2014-03-26 17:00:00 4097 mins
## 2 2014-03-24 14:31:00 2014-03-26 17:00:00 3029 mins
## 3 2014-03-24 17:06:00 2014-03-26 17:00:00 2874 mins
## 4 2014-03-23 18:28:00 2014-03-26 17:00:00 4232 mins
## 5 2014-03-25 12:00:00 2014-03-26 17:00:00 1740 mins
str(MLsub)
## 'data.frame': 876 obs. of 10 variables:
## $ StudentID: chr "S8579275" "S8587419" "S8530605" "S8636955" ...
## $ Date : Factor w/ 33 levels "1/06/14","10/04/14",..: 19 20 20 19 21 18 14 16 22 16 ...
## $ Submitted: Factor w/ 866 levels "0:01:00","0:03:09",..: 586 304 427 490 144 244 571 260 218 114 ...
## $ Duration : Factor w/ 785 levels "","0:00:42","0:01:07",..: 73 220 718 697 207 684 541 280 611 435 ...
## $ MLtask : Factor w/ 4 levels "ML1","ML2","ML3",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Open : Factor w/ 4 levels "19/03/14","27/05/14",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Due : Factor w/ 4 levels "14/05/14","16/04/14",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ SubDT : POSIXct, format: "2014-03-23 20:43:00" "2014-03-24 14:31:00" ...
## $ DueDT : POSIXct, format: "2014-03-26 17:00:00" "2014-03-26 17:00:00" ...
## $ Earliness:Class 'difftime' atomic [1:876] 4097 3029 2874 4232 1740 ...
## .. ..- attr(*, "units")= chr "mins"
dim(MLsub)
## [1] 876 10
#install.packages("lubridate")
require(lubridate)
## Loading required package: lubridate
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:chron':
##
## days, hours, minutes, seconds, years
## The following object is masked from 'package:base':
##
## date
mean(MLsub$Earliness)
## Time difference of 5746.612 mins
mean(difftime(MLsub$DueDT, MLsub$SubDT, units = "hours"))
## Time difference of 95.77686 hours
sem(difftime(MLsub$DueDT, MLsub$SubDT, units = "hours"))
## [1] 2.028385
mean(difftime(MLsub$DueDT, MLsub$SubDT, units = "days"))
## Time difference of 3.990703 days
sem(difftime(MLsub$DueDT, MLsub$SubDT, units = "days"))
## [1] 0.08451603
MLsub[1:5,1:3]
## StudentID Date Submitted
## 1 S8579275 23/03/14 20:43:42
## 2 S8587419 24/03/14 14:31:37
## 3 S8530605 24/03/14 17:06:10
## 4 S8636955 23/03/14 18:28:04
## 5 S8475915 25/03/14 12:00:45
Correlations within the ML submissions, as a check: want ML1 vs ML2 vs ML3 vs ML4 for Earliness.
ML1…4 would need to be columns with StudentID as rows -> too hard to transform the data here (one possible sketch is below), so try a boxplot for consistency instead.
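For the record, a sketch of how the wide ML1-ML4 earliness table could be built directly (hedged: duplicate submissions per student and task would need handling first; the notes below take the boxplot route and later build these columns by merging):
MLsub.1 = MLsub[!duplicated(MLsub[ , c("StudentID", "MLtask")]), ]       # keep the first recorded submission per student and task
MLsub.1$Early = as.numeric(MLsub.1$Earliness)                            # minutes before the deadline, as plain numbers
early.wide = reshape(MLsub.1[ , c("StudentID", "MLtask", "Early")],
                     idvar = "StudentID", timevar = "MLtask", direction = "wide")
head(early.wide)                                                          # columns Early.ML1 ... Early.ML4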
str(MLsub)
## 'data.frame': 876 obs. of 10 variables:
## $ StudentID: chr "S8579275" "S8587419" "S8530605" "S8636955" ...
## $ Date : Factor w/ 33 levels "1/06/14","10/04/14",..: 19 20 20 19 21 18 14 16 22 16 ...
## $ Submitted: Factor w/ 866 levels "0:01:00","0:03:09",..: 586 304 427 490 144 244 571 260 218 114 ...
## $ Duration : Factor w/ 785 levels "","0:00:42","0:01:07",..: 73 220 718 697 207 684 541 280 611 435 ...
## $ MLtask : Factor w/ 4 levels "ML1","ML2","ML3",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Open : Factor w/ 4 levels "19/03/14","27/05/14",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Due : Factor w/ 4 levels "14/05/14","16/04/14",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ SubDT : POSIXct, format: "2014-03-23 20:43:00" "2014-03-24 14:31:00" ...
## $ DueDT : POSIXct, format: "2014-03-26 17:00:00" "2014-03-26 17:00:00" ...
## $ Earliness:Class 'difftime' atomic [1:876] 4097 3029 2874 4232 1740 ...
## .. ..- attr(*, "units")= chr "mins"
MLsub$Early.hr = difftime(MLsub$DueDT, MLsub$SubDT, units = "hours")
MLsub$Early.hr[1:5]
## Time differences in hours
## [1] 68.28333 50.48333 47.90000 70.53333 29.00000
MLsub$Early.hr.num = as.numeric(MLsub$Early.hr)
boxplot(Early.hr.num ~ MLtask, data=MLsub)
#check if there is a difference in earliness between ML tasks...
aov.out = NULL
aov.out = aov(Early.hr.num ~ MLtask * StudentID + Error(StudentID), data=MLsub)
summary(aov.out)
##
## Error: StudentID
## Df Sum Sq Mean Sq
## MLtask 3 3420 1140
## StudentID 226 1885492 8343
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## MLtask 3 88890 29630 10.786 0.0127 *
## MLtask:StudentID 638 1162107 1821 0.663 0.8149
## Residuals 5 13736 2747
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#now sig difference...
#install.packages("car")
library(car)
with(MLsub, pairwise.t.test(Early.hr.num, MLtask, p.adjust.method = "bonferroni"))
##
## Pairwise comparisons using t tests with pooled SD
##
## data: Early.hr.num and MLtask
##
## ML1 ML2 ML3
## ML2 0.241 - -
## ML3 1.000 0.148 -
## ML4 0.027 8.5e-06 0.053
##
## P value adjustment method: bonferroni
with(MLsub, tapply(Early.hr.num, MLtask, mean))
## ML1 ML2 ML3 ML4
## 94.34444 82.71682 95.53326 110.44762
with(MLsub, tapply(Early.hr.num, MLtask, sem))
## [1] 3.686183
## [1] 3.857621
## [1] 3.887297
## [1] 4.569826
## ML1 ML2 ML3 ML4
## 3.686183 3.857621 3.887297 4.569826
kw = kruskal.test(Early.hr.num ~ MLtask, data=MLsub)
kw
##
## Kruskal-Wallis rank sum test
##
## data: Early.hr.num by MLtask
## Kruskal-Wallis chi-squared = 28.324, df = 3, p-value = 3.106e-06
#Testing For Homogeneity of Variance
bartlett.test(Early.hr.num ~ MLtask, data=MLsub)
##
## Bartlett test of homogeneity of variances
##
## data: Early.hr.num by MLtask
## Bartlett's K-squared = 10.966, df = 3, p-value = 0.01191
MLsub$Early.hr.log = log(MLsub$Early.hr.num)
bartlett.test(Early.hr.log ~ MLtask, data=MLsub)
##
## Bartlett test of homogeneity of variances
##
## data: Early.hr.log by MLtask
## Bartlett's K-squared = 15.65, df = 3, p-value = 0.001338
ML1 = subset(MLsub, MLtask =="ML1")
ML2 = subset(MLsub, MLtask =="ML2")
ML3 = subset(MLsub, MLtask =="ML3")
ML4 = subset(MLsub, MLtask =="ML4")
m <- list(ML1, ML2, ML3, ML4)
for (i in 1:4)
hist(m[[i]][,12])
#install.packages("ggplot2")
require(ggplot2)
df = MLsub
#install.packages("devtools")
library(devtools)
#install_github("easyGgplot2", "kassambara") #got warning, maybe need install_github("kassambara/easyGgplot2")
library(easyGgplot2)
ggplot2.histogram(data=df, xName='Early.hr.num', groupName='MLtask', alpha=0.5, xtitle="Before deadline (hr)")
add MLsub to la.norm.qual
dim(la.norm.qual)
## [1] 99 131
la.norm.qual[1:5,125:131]
## access pattern prevLR access.days Kay.pattern Kay3 cluster2
## 1 21 No Yes No No No 15 4 3 1
## 2 75 No No No No No 26 5 3 1
## 3 122 Yes No No No Yes 33 3 3 1
## 4 13 Maybe No Maybe No Yes 10 0 2 2
## 5 89 Yes No Yes No Yes 28 1 2 1
all = NULL
dim(ML1)
## [1] 225 13
ML1[1:5,]
## StudentID Date Submitted Duration MLtask Open Due
## 1 S8579275 23/03/14 20:43:42 0:09:16 ML1 19/03/14 26/03/14
## 2 S8587419 24/03/14 14:31:37 0:16:02 ML1 19/03/14 26/03/14
## 3 S8530605 24/03/14 17:06:10 47:58:06 ML1 19/03/14 26/03/14
## 4 S8636955 23/03/14 18:28:04 4:40:22 ML1 19/03/14 26/03/14
## 5 S8475915 25/03/14 12:00:45 0:15:28 ML1 19/03/14 26/03/14
## SubDT DueDT Earliness Early.hr
## 1 2014-03-23 20:43:00 2014-03-26 17:00:00 4097 mins 68.28333 hours
## 2 2014-03-24 14:31:00 2014-03-26 17:00:00 3029 mins 50.48333 hours
## 3 2014-03-24 17:06:00 2014-03-26 17:00:00 2874 mins 47.90000 hours
## 4 2014-03-23 18:28:00 2014-03-26 17:00:00 4232 mins 70.53333 hours
## 5 2014-03-25 12:00:00 2014-03-26 17:00:00 1740 mins 29.00000 hours
## Early.hr.num Early.hr.log
## 1 68.28333 4.223666
## 2 50.48333 3.921643
## 3 47.90000 3.869116
## 4 70.53333 4.256085
## 5 29.00000 3.367296
all = merge(la.norm.qual, ML1[, c(1, 12)], by="StudentID")
dim(all)
## [1] 99 132
all[1:5, 125:132]
## access pattern prevLR access.days Kay.pattern Kay3 cluster2
## 1 21 No Yes No No No 15 4 3 1
## 2 21 No Yes No No No 15 4 3 1
## 3 75 No No No No No 26 5 3 1
## 4 122 Yes No No No Yes 33 3 3 1
## 5 13 Maybe No Maybe No Yes 10 0 2 2
## Early.hr.num
## 1 139.866667
## 2 8.733333
## 3 19.866667
## 4 43.200000
## 5 117.333333
all.names = names(all)
all.names[132] = "ML1earliness"
names(all) = all.names
all[1:5,130:132]
## Kay3 cluster2 ML1earliness
## 1 3 1 139.866667
## 2 3 1 8.733333
## 3 3 1 19.866667
## 4 3 1 43.200000
## 5 2 2 117.333333
all = merge(all, ML2[, c(1, 12)], by="StudentID")
all = merge(all, ML3[, c(1, 12)], by="StudentID")
all = merge(all, ML4[, c(1, 12)], by="StudentID")
dim(all)
## [1] 97 135
all[1:5,130:135]
## Kay3 cluster2 ML1earliness Early.hr.num.x Early.hr.num.y Early.hr.num
## 1 3 1 139.866667 163.900000 140.2667 115.9000
## 2 3 1 8.733333 163.900000 140.2667 115.9000
## 3 3 1 19.866667 47.366667 162.9833 186.3167
## 4 3 1 43.200000 66.950000 19.2000 186.7333
## 5 2 2 117.333333 5.466667 0.5000 168.6000
all.names = names(all)
all.names[133] = "ML2earliness"
all.names[134] = "ML3earliness"
all.names[135] = "ML4earliness"
names(all) = all.names
all[1:5,130:135]
## Kay3 cluster2 ML1earliness ML2earliness ML3earliness ML4earliness
## 1 3 1 139.866667 163.900000 140.2667 115.9000
## 2 3 1 8.733333 163.900000 140.2667 115.9000
## 3 3 1 19.866667 47.366667 162.9833 186.3167
## 4 3 1 43.200000 66.950000 19.2000 186.7333
## 5 2 2 117.333333 5.466667 0.5000 168.6000
cor(all[,132:135])
## ML1earliness ML2earliness ML3earliness ML4earliness
## ML1earliness 1.0000000 0.4466270 0.4804985 0.4446415
## ML2earliness 0.4466270 1.0000000 0.5419437 0.3660182
## ML3earliness 0.4804985 0.5419437 1.0000000 0.5333472
## ML4earliness 0.4446415 0.3660182 0.5333472 1.0000000
add in assignment submission
ass = read.csv("Ass.csv")
dim(ass)
## [1] 220 4
str(ass)
## 'data.frame': 220 obs. of 4 variables:
## $ StudentID: Factor w/ 220 levels "s3044923","s361850",..: 144 105 56 90 197 87 213 19 65 52 ...
## $ Ass.mark : int 82 75 86 83 92 94 81 93 83 78 ...
## $ Sub.Date : Factor w/ 12 levels "12/05/2014","13/05/2014",..: 1 2 3 4 4 4 5 5 5 6 ...
## $ Sub.Time : Factor w/ 220 levels "0:07:45","0:21:45",..: 42 88 16 211 103 137 197 80 101 219 ...
clean - consent
dim(ci)
## [1] 231 2
df = merge(ass, ci, by ="StudentID")
dim(df) #drops from 231 to 230 because uqlipitt removed
## [1] 219 5
#df[1:10,1:10]
#df[1:5,110:116]
df = subset(df, Consent == "Yes")
dim(df)
## [1] 96 5
ass = df
dim(ass)
## [1] 96 5
str(ass)
## 'data.frame': 96 obs. of 5 variables:
## $ StudentID: Factor w/ 220 levels "s3044923","s361850",..: 1 3 4 5 6 7 9 10 11 12 ...
## $ Ass.mark : int 86 75 94 82 91 83 77 88 83 87 ...
## $ Sub.Date : Factor w/ 12 levels "12/05/2014","13/05/2014",..: 8 10 8 10 10 9 7 8 8 9 ...
## $ Sub.Time : Factor w/ 220 levels "0:07:45","0:21:45",..: 125 210 149 64 21 107 81 86 190 202 ...
## $ Consent : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
clean - De-ID
## [1] 96 5
## StudentID Ass.mark Sub.Date Sub.Time Consent
## 1 S6089847 86 19/05/2014 20:49:26 Yes
## 3 S8117889 75 21/05/2014 9:35:45 Yes
## 4 S8118323 94 19/05/2014 22:45:19 Yes
## 5 S8152093 82 21/05/2014 11:54:37 Yes
## 6 S8239113 91 21/05/2014 10:17:41 Yes
Merging ass into all. The Assignment was due at 12 noon on 21/05/14.
ass[1:5,]
## StudentID Ass.mark Sub.Date Sub.Time Consent
## 1 S6089847 86 19/05/2014 20:49:26 Yes
## 3 S8117889 75 21/05/2014 9:35:45 Yes
## 4 S8118323 94 19/05/2014 22:45:19 Yes
## 5 S8152093 82 21/05/2014 11:54:37 Yes
## 6 S8239113 91 21/05/2014 10:17:41 Yes
str(ass)
## 'data.frame': 96 obs. of 5 variables:
## $ StudentID: chr "S6089847" "S8117889" "S8118323" "S8152093" ...
## $ Ass.mark : int 86 75 94 82 91 83 77 88 83 87 ...
## $ Sub.Date : Factor w/ 12 levels "12/05/2014","13/05/2014",..: 8 10 8 10 10 9 7 8 8 9 ...
## $ Sub.Time : Factor w/ 220 levels "0:07:45","0:21:45",..: 125 210 149 64 21 107 81 86 190 202 ...
## $ Consent : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
ass$Sub.Date = as.character(ass$Sub.Date)
ass$Sub.Time = as.character(ass$Sub.Time)
str(ass)
## 'data.frame': 96 obs. of 5 variables:
## $ StudentID: chr "S6089847" "S8117889" "S8118323" "S8152093" ...
## $ Ass.mark : int 86 75 94 82 91 83 77 88 83 87 ...
## $ Sub.Date : chr "19/05/2014" "21/05/2014" "19/05/2014" "21/05/2014" ...
## $ Sub.Time : chr "20:49:26" "9:35:45" "22:45:19" "11:54:37" ...
## $ Consent : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
ass$Sub.Ass = paste(ass$Sub.Date, ass$Sub.Time)
dim(ass)
## [1] 96 6
ass[1:5,]
## StudentID Ass.mark Sub.Date Sub.Time Consent Sub.Ass
## 1 S6089847 86 19/05/2014 20:49:26 Yes 19/05/2014 20:49:26
## 3 S8117889 75 21/05/2014 9:35:45 Yes 21/05/2014 9:35:45
## 4 S8118323 94 19/05/2014 22:45:19 Yes 19/05/2014 22:45:19
## 5 S8152093 82 21/05/2014 11:54:37 Yes 21/05/2014 11:54:37
## 6 S8239113 91 21/05/2014 10:17:41 Yes 21/05/2014 10:17:41
str(ass[,6])
## chr [1:96] "19/05/2014 20:49:26" "21/05/2014 9:35:45" ...
ass$Sub.Ass = as.POSIXct(ass$Sub.Ass, format = "%d/%m/%Y %H:%M:%S")
str(ass[,6])
## POSIXct[1:96], format: "2014-05-19 20:49:26" "2014-05-21 09:35:45" ...
ass$Due = as.POSIXct("2014-05-21 12:00:00", tz="UCT")
tz(ass$Sub.Ass)
## [1] ""
ass$Sub.Ass = force_tz(ass$Sub.Ass, "UTC")
tz(ass$Sub.Ass)
## [1] "UTC"
dim(ass)
## [1] 96 7
ass[1:5,]
## StudentID Ass.mark Sub.Date Sub.Time Consent Sub.Ass
## 1 S6089847 86 19/05/2014 20:49:26 Yes 2014-05-19 20:49:26
## 3 S8117889 75 21/05/2014 9:35:45 Yes 2014-05-21 09:35:45
## 4 S8118323 94 19/05/2014 22:45:19 Yes 2014-05-19 22:45:19
## 5 S8152093 82 21/05/2014 11:54:37 Yes 2014-05-21 11:54:37
## 6 S8239113 91 21/05/2014 10:17:41 Yes 2014-05-21 10:17:41
## Due
## 1 2014-05-21 12:00:00
## 3 2014-05-21 12:00:00
## 4 2014-05-21 12:00:00
## 5 2014-05-21 12:00:00
## 6 2014-05-21 12:00:00
str(ass)
## 'data.frame': 96 obs. of 7 variables:
## $ StudentID: chr "S6089847" "S8117889" "S8118323" "S8152093" ...
## $ Ass.mark : int 86 75 94 82 91 83 77 88 83 87 ...
## $ Sub.Date : chr "19/05/2014" "21/05/2014" "19/05/2014" "21/05/2014" ...
## $ Sub.Time : chr "20:49:26" "9:35:45" "22:45:19" "11:54:37" ...
## $ Consent : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ Sub.Ass : POSIXct, format: "2014-05-19 20:49:26" "2014-05-21 09:35:45" ...
## $ Due : POSIXct, format: "2014-05-21 12:00:00" "2014-05-21 12:00:00" ...
ass$Ass.earliness = difftime(ass$Due, ass$Sub.Ass, units="hours")
str(ass)
## 'data.frame': 96 obs. of 8 variables:
## $ StudentID : chr "S6089847" "S8117889" "S8118323" "S8152093" ...
## $ Ass.mark : int 86 75 94 82 91 83 77 88 83 87 ...
## $ Sub.Date : chr "19/05/2014" "21/05/2014" "19/05/2014" "21/05/2014" ...
## $ Sub.Time : chr "20:49:26" "9:35:45" "22:45:19" "11:54:37" ...
## $ Consent : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ Sub.Ass : POSIXct, format: "2014-05-19 20:49:26" "2014-05-21 09:35:45" ...
## $ Due : POSIXct, format: "2014-05-21 12:00:00" "2014-05-21 12:00:00" ...
## $ Ass.earliness:Class 'difftime' atomic [1:96] 39.1761 2.4042 37.2447 0.0897 1.7053 ...
## .. ..- attr(*, "units")= chr "hours"
ass[1:5,]
## StudentID Ass.mark Sub.Date Sub.Time Consent Sub.Ass
## 1 S6089847 86 19/05/2014 20:49:26 Yes 2014-05-19 20:49:26
## 3 S8117889 75 21/05/2014 9:35:45 Yes 2014-05-21 09:35:45
## 4 S8118323 94 19/05/2014 22:45:19 Yes 2014-05-19 22:45:19
## 5 S8152093 82 21/05/2014 11:54:37 Yes 2014-05-21 11:54:37
## 6 S8239113 91 21/05/2014 10:17:41 Yes 2014-05-21 10:17:41
## Due Ass.earliness
## 1 2014-05-21 12:00:00 39.17611111 hours
## 3 2014-05-21 12:00:00 2.40416667 hours
## 4 2014-05-21 12:00:00 37.24472222 hours
## 5 2014-05-21 12:00:00 0.08972222 hours
## 6 2014-05-21 12:00:00 1.70527778 hours
dim(ass)
## [1] 96 8
dim(all)
## [1] 97 135
all = merge(all, ass[, c(1:2, 8)], by="StudentID")
dim(all)
## [1] 94 137
all[1:5,131:ncol(all)]
## cluster2 ML1earliness ML2earliness ML3earliness ML4earliness Ass.mark
## 1 1 8.733333 163.900000 140.2667 115.9000 86
## 2 1 139.866667 163.900000 140.2667 115.9000 86
## 3 1 19.866667 47.366667 162.9833 186.3167 75
## 4 1 43.200000 66.950000 19.2000 186.7333 94
## 5 2 117.333333 5.466667 0.5000 168.6000 82
## Ass.earliness
## 1 39.17611111 hours
## 2 39.17611111 hours
## 3 2.40416667 hours
## 4 37.24472222 hours
## 5 0.08972222 hours
Academic performance as course grade (access vs performance -> AcP.csv; a sketch of the merge follows the printout below)
## StudentID Course.grade
## 1 S8529183 40.5
## 2 S8636687 47.2
## 3 S8624451 47.8
## 4 S8633919 51.9
## 5 S8583807 52.5
## [1] 94 138
## cluster2 ML1earliness ML2earliness ML3earliness ML4earliness Ass.mark
## 1 1 8.733333 163.900000 140.2667 115.9000 86
## 2 1 139.866667 163.900000 140.2667 115.9000 86
## 3 1 19.866667 47.366667 162.9833 186.3167 75
## 4 1 43.200000 66.950000 19.2000 186.7333 94
## 5 2 117.333333 5.466667 0.5000 168.6000 82
## Ass.earliness Course.grade
## 1 39.17611111 hours 66.6
## 2 39.17611111 hours 66.6
## 3 2.40416667 hours 64.7
## 4 37.24472222 hours 78.0
## 5 0.08972222 hours 80.7
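The load and merge step isn't echoed; a sketch consistent with the output above (assuming AcP.csv supplies StudentID and Course.grade):
AcP = read.csv("AcP.csv")
all = merge(all, AcP[ , c("StudentID", "Course.grade")], by = "StudentID")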
correlations (ass.early.num = hours before the Assignment due date, 12 noon)
str(all[131:ncol(all)])
## 'data.frame': 94 obs. of 8 variables:
## $ cluster2 : int 1 1 1 1 2 1 1 1 1 1 ...
## $ ML1earliness : num 8.73 139.87 19.87 43.2 117.33 ...
## $ ML2earliness : num 163.9 163.9 47.37 66.95 5.47 ...
## $ ML3earliness : num 140.3 140.3 163 19.2 0.5 ...
## $ ML4earliness : num 116 116 186 187 169 ...
## $ Ass.mark : int 86 86 75 94 82 91 83 77 88 83 ...
## $ Ass.earliness:Class 'difftime' atomic [1:94] 39.1761 39.1761 2.4042 37.2447 0.0897 ...
## .. ..- attr(*, "units")= chr "hours"
## $ Course.grade : num 66.6 66.6 64.7 78 80.7 68.4 82.3 92.2 84.9 81.7 ...
all$ass.early.num = as.numeric(all$Ass.earliness)
dim(all)
## [1] 94 139
all[1:5,131:ncol(all)]
## cluster2 ML1earliness ML2earliness ML3earliness ML4earliness Ass.mark
## 1 1 8.733333 163.900000 140.2667 115.9000 86
## 2 1 139.866667 163.900000 140.2667 115.9000 86
## 3 1 19.866667 47.366667 162.9833 186.3167 75
## 4 1 43.200000 66.950000 19.2000 186.7333 94
## 5 2 117.333333 5.466667 0.5000 168.6000 82
## Ass.earliness Course.grade ass.early.num
## 1 39.17611111 hours 66.6 39.17611111
## 2 39.17611111 hours 66.6 39.17611111
## 3 2.40416667 hours 64.7 2.40416667
## 4 37.24472222 hours 78.0 37.24472222
## 5 0.08972222 hours 80.7 0.08972222
#cor(all[c(132:134, 136:ncol(all))])
cor(all[c(131:136, 138:ncol(all))])
## cluster2 ML1earliness ML2earliness ML3earliness
## cluster2 1.00000000 0.27704850 0.06318748 0.23328512
## ML1earliness 0.27704850 1.00000000 0.44635213 0.49048355
## ML2earliness 0.06318748 0.44635213 1.00000000 0.53738345
## ML3earliness 0.23328512 0.49048355 0.53738345 1.00000000
## ML4earliness 0.20727516 0.45386072 0.35783384 0.52623056
## Ass.mark 0.06074930 0.04823572 0.08963771 0.06346325
## Course.grade 0.10995817 0.18798308 0.24792821 0.18772086
## ass.early.num -0.01724803 0.27721408 0.16754510 0.18931786
## ML4earliness Ass.mark Course.grade ass.early.num
## cluster2 0.20727516 0.06074930 0.1099582 -0.01724803
## ML1earliness 0.45386072 0.04823572 0.1879831 0.27721408
## ML2earliness 0.35783384 0.08963771 0.2479282 0.16754510
## ML3earliness 0.52623056 0.06346325 0.1877209 0.18931786
## ML4earliness 1.00000000 0.02611241 0.1471807 0.17576185
## Ass.mark 0.02611241 1.00000000 0.2400318 0.06852016
## Course.grade 0.14718068 0.24003185 1.0000000 0.28558545
## ass.early.num 0.17576185 0.06852016 0.2855855 1.00000000
Organisation qual coded as categories 3 and 5
## [1] 99 5
## StudentID Cat3 Cat5 Cat3or5 Sum.Cat3and5
## 1 S8646489 4 1 2 5
## 2 S8283571 3 1 2 4
## 3 S8586369 4 0 1 4
## 4 S8641669 3 1 2 4
## 5 S8152093 2 1 2 3
## [1] 94 143
## cluster2 ML1earliness ML2earliness ML3earliness ML4earliness Ass.mark
## 1 1 8.733333 163.9 140.2667 115.9 86
## 2 1 139.866667 163.9 140.2667 115.9 86
## Course.grade Cat5 Cat3or5 Sum.Cat3and5
## 1 66.6 0 0 0
## 2 66.6 0 0 0
## 'data.frame': 2 obs. of 10 variables:
## $ cluster2 : int 1 1
## $ ML1earliness: num 8.73 139.87
## $ ML2earliness: num 164 164
## $ ML3earliness: num 140 140
## $ ML4earliness: num 116 116
## $ Ass.mark : int 86 86
## $ Course.grade: num 66.6 66.6
## $ Cat5 : int 0 0
## $ Cat3or5 : num 0 0
## $ Sum.Cat3and5: int 0 0
## cluster2 ML1earliness ML2earliness ML3earliness
## cluster2 1.00000000 0.27704850 0.063187482 0.23328512
## ML1earliness 0.27704850 1.00000000 0.446352129 0.49048355
## ML2earliness 0.06318748 0.44635213 1.000000000 0.53738345
## ML3earliness 0.23328512 0.49048355 0.537383445 1.00000000
## ML4earliness 0.20727516 0.45386072 0.357833836 0.52623056
## Ass.mark 0.06074930 0.04823572 0.089637706 0.06346325
## Course.grade 0.10995817 0.18798308 0.247928207 0.18772086
## Cat5 0.14578738 0.07512043 -0.006346801 -0.02653697
## Cat3or5 0.01898149 -0.07668516 -0.160959046 -0.11521145
## Sum.Cat3and5 0.01351256 0.01142279 -0.179925085 -0.07817110
## ML4earliness Ass.mark Course.grade Cat5
## cluster2 0.20727516 0.06074930 0.10995817 0.145787375
## ML1earliness 0.45386072 0.04823572 0.18798308 0.075120429
## ML2earliness 0.35783384 0.08963771 0.24792821 -0.006346801
## ML3earliness 0.52623056 0.06346325 0.18772086 -0.026536971
## ML4earliness 1.00000000 0.02611241 0.14718068 0.107219094
## Ass.mark 0.02611241 1.00000000 0.24003185 -0.031986505
## Course.grade 0.14718068 0.24003185 1.00000000 0.103259521
## Cat5 0.10721909 -0.03198650 0.10325952 1.000000000
## Cat3or5 0.04154490 -0.08725225 -0.07779212 0.640939203
## Sum.Cat3and5 0.01498710 -0.10658540 -0.10408124 0.540546730
## Cat3or5 Sum.Cat3and5
## cluster2 0.01898149 0.01351256
## ML1earliness -0.07668516 0.01142279
## ML2earliness -0.16095905 -0.17992509
## ML3earliness -0.11521145 -0.07817110
## ML4earliness 0.04154490 0.01498710
## Ass.mark -0.08725225 -0.10658540
## Course.grade -0.07779212 -0.10408124
## Cat5 0.64093920 0.54054673
## Cat3or5 1.00000000 0.86549100
## Sum.Cat3and5 0.86549100 1.00000000
## [1] 94 144
## cluster2 ML1earliness ML2earliness ML3earliness ML4earliness Ass.mark
## 1 1 8.733333 163.900000 140.2667 115.9000 86
## 2 1 139.866667 163.900000 140.2667 115.9000 86
## 3 1 19.866667 47.366667 162.9833 186.3167 75
## 4 1 43.200000 66.950000 19.2000 186.7333 94
## 5 2 117.333333 5.466667 0.5000 168.6000 82
## Ass.earliness Course.grade ass.early.num Cat3 Cat5 Cat3or5
## 1 39.17611111 hours 66.6 39.17611111 0 0 0
## 2 39.17611111 hours 66.6 39.17611111 0 0 0
## 3 2.40416667 hours 64.7 2.40416667 2 0 1
## 4 37.24472222 hours 78.0 37.24472222 0 1 1
## 5 0.08972222 hours 80.7 0.08972222 2 1 2
## Sum.Cat3and5 MLearliness
## 1 0 107.20000
## 2 0 139.98333
## 3 2 104.13333
## 4 1 79.02083
## 5 3 72.97500
## cluster2 ML1earliness ML2earliness ML3earliness
## cluster2 1.00000000 0.27704850 0.063187482 0.23328512
## ML1earliness 0.27704850 1.00000000 0.446352129 0.49048355
## ML2earliness 0.06318748 0.44635213 1.000000000 0.53738345
## ML3earliness 0.23328512 0.49048355 0.537383445 1.00000000
## ML4earliness 0.20727516 0.45386072 0.357833836 0.52623056
## Ass.mark 0.06074930 0.04823572 0.089637706 0.06346325
## Course.grade 0.10995817 0.18798308 0.247928207 0.18772086
## Cat3 -0.06553437 -0.02848423 -0.209608252 -0.07776687
## Cat5 0.14578738 0.07512043 -0.006346801 -0.02653697
## Cat3or5 0.01898149 -0.07668516 -0.160959046 -0.11521145
## Sum.Cat3and5 0.01351256 0.01142279 -0.179925085 -0.07817110
## MLearliness 0.25007838 0.75390591 0.747976825 0.82090373
## ML4earliness Ass.mark Course.grade Cat3
## cluster2 0.20727516 0.06074930 0.10995817 -0.06553437
## ML1earliness 0.45386072 0.04823572 0.18798308 -0.02848423
## ML2earliness 0.35783384 0.08963771 0.24792821 -0.20960825
## ML3earliness 0.52623056 0.06346325 0.18772086 -0.07776687
## ML4earliness 1.00000000 0.02611241 0.14718068 -0.04221520
## Ass.mark 0.02611241 1.00000000 0.24003185 -0.10838137
## Course.grade 0.14718068 0.24003185 1.00000000 -0.18106140
## Cat3 -0.04221520 -0.10838137 -0.18106140 1.00000000
## Cat5 0.10721909 -0.03198650 0.10325952 0.08106184
## Cat3or5 0.04154490 -0.08725225 -0.07779212 0.66685730
## Sum.Cat3and5 0.01498710 -0.10658540 -0.10408124 0.88236300
## MLearliness 0.77708920 0.07208286 0.24651552 -0.11483597
## Cat5 Cat3or5 Sum.Cat3and5 MLearliness
## cluster2 0.145787375 0.01898149 0.01351256 0.25007838
## ML1earliness 0.075120429 -0.07668516 0.01142279 0.75390591
## ML2earliness -0.006346801 -0.16095905 -0.17992509 0.74797682
## ML3earliness -0.026536971 -0.11521145 -0.07817110 0.82090373
## ML4earliness 0.107219094 0.04154490 0.01498710 0.77708920
## Ass.mark -0.031986505 -0.08725225 -0.10658540 0.07208286
## Course.grade 0.103259521 -0.07779212 -0.10408124 0.24651552
## Cat3 0.081061839 0.66685730 0.88236300 -0.11483597
## Cat5 1.000000000 0.64093920 0.54054673 0.05072051
## Cat3or5 0.640939203 1.00000000 0.86549100 -0.09463110
## Sum.Cat3and5 0.540546730 0.86549100 1.00000000 -0.07298578
## MLearliness 0.050720510 -0.09463110 -0.07298578 1.00000000
Tests for the 2 clusters (Wilcoxon rank-sum tests rather than t-tests):
dim(all)
## [1] 94 144
wilcox.test(MLearliness ~ cluster2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: MLearliness by cluster2
## W = 706, p-value = 0.01748
## alternative hypothesis: true location shift is not equal to 0
round(with(all, calc(mean, MLearliness, cluster2)),1)
## 1 2
## 94.2 117.3
round(with(all, calc.sem(sem, MLearliness, cluster2)),1)
## [1] 5.813671
## [1] 6.657489
## 1 2
## 5.8 6.7
wilcox.test(ass.early.num ~ cluster2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: ass.early.num by cluster2
## W = 998, p-value = 0.9495
## alternative hypothesis: true location shift is not equal to 0
round(with(all, calc(mean, ass.early.num, cluster2)),1)
## 1 2
## 29.9 28.1
round(with(all, calc.sem(sem, ass.early.num, cluster2)),1)
## [1] 6.842439
## [1] 6.427785
## 1 2
## 6.8 6.4
wilcox.test(Course.grade ~ cluster2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Course.grade by cluster2
## W = 889, p-value = 0.354
## alternative hypothesis: true location shift is not equal to 0
round(with(all, calc(mean, Course.grade, cluster2)),1)
## 1 2
## 77.8 79.8
round(with(all, calc.sem(sem, Course.grade, cluster2)),1)
## [1] 1.236442
## [1] 1.226599
## 1 2
## 1.2 1.2
wilcox.test(Sum.Cat3and5 ~ cluster2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Sum.Cat3and5 by cluster2
## W = 1001.5, p-value = 0.9701
## alternative hypothesis: true location shift is not equal to 0
round(with(all, calc(mean, Sum.Cat3and5, cluster2)),1)
## 1 2
## 1.2 1.2
round(with(all, calc.sem(sem, Sum.Cat3and5, cluster2)),1)
## [1] 0.1450611
## [1] 0.1982766
## 1 2
## 0.1 0.2
ANOVAs for Sum.Cat3and5
aov.cat = aov(Course.grade ~ Sum.Cat3and5, data=all)
summary(aov.cat)
## Df Sum Sq Mean Sq F value Pr(>F)
## Sum.Cat3and5 1 79 78.78 1.008 0.318
## Residuals 92 7193 78.19
aov.cat2 = aov(MLearliness ~ Sum.Cat3and5, data=all)
summary(aov.cat2)
## Df Sum Sq Mean Sq F value Pr(>F)
## Sum.Cat3and5 1 969 968.9 0.493 0.484
## Residuals 92 180914 1966.5
aov.cat3 = aov(ass.early.num ~ Sum.Cat3and5, data=all)
summary(aov.cat3)
## Df Sum Sq Mean Sq F value Pr(>F)
## Sum.Cat3and5 1 23 23.4 0.01 0.921
## Residuals 92 215028 2337.3
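In the ANOVAs above Sum.Cat3and5 enters as a numeric predictor, so each test is a 1-df linear-trend test. If differences between the individual 0/1/2/3 counts are of interest instead, the same call can be run with the predictor as a factor; a minimal sketch (not part of the original analysis):
aov.cat.f = aov(Course.grade ~ factor(Sum.Cat3and5), data = all)  # categorical version of aov.cat
summary(aov.cat.f)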
Using more variables for the clustering: cluster.all includes the day-by-day access columns (2:115), cluster.all2 uses only the summary variables
distances.all = dist(all[c(2:115, 117:120, 125, 132:136, 137, 141:143)], method = "euclidean")
cluster.all = hclust(distances.all, method = "ward.D2")
plot(cluster.all)
distances.all2 = dist(all[c(117:120, 125, 132:136, 137, 141:143)], method = "euclidean")
cluster.all2 = hclust(distances.all2, method = "ward.D2")
plot(cluster.all2)
cluster.all2.groups = cutree(cluster.all2, k = 2)
all$cluster.all2 = cluster.all2.groups
with(all, table(cluster2, cluster.all2))
## cluster.all2
## cluster2 1 2
## 1 50 11
## 2 31 2
cluster.all.groups = cutree(cluster.all, k = 2)
all$cluster.all = cluster.all.groups
with(all, table(cluster2, cluster.all))
## cluster.all
## cluster2 1 2
## 1 50 11
## 2 31 2
with(all, table(cluster.all, cluster.all2))
## cluster.all2
## cluster.all 1 2
## 1 81 0
## 2 0 13
dim(all)
## [1] 94 146
all[1:2,115:146]
## X25.06.14 cluster3 ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
## 1 0 1 No Yes No No
## 2 0 1 No Yes No No
## total.no total.yes total.maybe total.noinfo access pattern prevLR
## 1 3 1 0 0 21 No Yes No No No
## 2 3 1 0 0 21 No Yes No No No
## access.days Kay.pattern Kay3 cluster2 ML1earliness ML2earliness
## 1 15 4 3 1 8.733333 163.9
## 2 15 4 3 1 139.866667 163.9
## ML3earliness ML4earliness Ass.mark Ass.earliness Course.grade
## 1 140.2667 115.9 86 39.17611 hours 66.6
## 2 140.2667 115.9 86 39.17611 hours 66.6
## ass.early.num Cat3 Cat5 Cat3or5 Sum.Cat3and5 MLearliness cluster.all2
## 1 39.17611 0 0 0 0 107.2000 1
## 2 39.17611 0 0 0 0 139.9833 1
## cluster.all
## 1 1
## 2 1
all$total.yes.gp = ifelse(all$total.yes == 0, 2, 1)
table(all$total.yes.gp)
##
## 1 2
## 47 47
wilcox.test(access.days ~ total.yes.gp, data=all)
## Warning in wilcox.test.default(x = c(15, 15, 33, 28, 41, 31, 13, 17, 25, :
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: access.days by total.yes.gp
## W = 1660, p-value = 2.654e-05
## alternative hypothesis: true location shift is not equal to 0
#ML3 submission date
dim(MLsub)
## [1] 876 13
MLsub[1:5,]
## StudentID Date Submitted Duration MLtask Open Due
## 1 S8579275 23/03/14 20:43:42 0:09:16 ML1 19/03/14 26/03/14
## 2 S8587419 24/03/14 14:31:37 0:16:02 ML1 19/03/14 26/03/14
## 3 S8530605 24/03/14 17:06:10 47:58:06 ML1 19/03/14 26/03/14
## 4 S8636955 23/03/14 18:28:04 4:40:22 ML1 19/03/14 26/03/14
## 5 S8475915 25/03/14 12:00:45 0:15:28 ML1 19/03/14 26/03/14
## SubDT DueDT Earliness Early.hr
## 1 2014-03-23 20:43:00 2014-03-26 17:00:00 4097 mins 68.28333 hours
## 2 2014-03-24 14:31:00 2014-03-26 17:00:00 3029 mins 50.48333 hours
## 3 2014-03-24 17:06:00 2014-03-26 17:00:00 2874 mins 47.90000 hours
## 4 2014-03-23 18:28:00 2014-03-26 17:00:00 4232 mins 70.53333 hours
## 5 2014-03-25 12:00:00 2014-03-26 17:00:00 1740 mins 29.00000 hours
## Early.hr.num Early.hr.log
## 1 68.28333 4.223666
## 2 50.48333 3.921643
## 3 47.90000 3.869116
## 4 70.53333 4.256085
## 5 29.00000 3.367296
ML3[1:5,]
## StudentID Date Submitted Duration MLtask Open Due
## 441 S8587419 10/05/14 18:10:01 0:22:41 ML3 7/05/14 14/05/14
## 442 S8530605 12/05/14 15:09:04 0:19:05 ML3 7/05/14 14/05/14
## 443 S8636955 12/05/14 21:42:34 0:45:45 ML3 7/05/14 14/05/14
## 444 S8475915 12/05/14 11:57:22 0:21:43 ML3 7/05/14 14/05/14
## 445 S8645607 7/05/14 17:30:54 0:16:33 ML3 7/05/14 14/05/14
## SubDT DueDT Earliness Early.hr
## 441 2014-05-10 18:10:00 2014-05-14 17:00:00 5690 mins 94.83333 hours
## 442 2014-05-12 15:09:00 2014-05-14 17:00:00 2991 mins 49.85000 hours
## 443 2014-05-12 21:42:00 2014-05-14 17:00:00 2598 mins 43.30000 hours
## 444 2014-05-12 11:57:00 2014-05-14 17:00:00 3183 mins 53.05000 hours
## 445 2014-05-07 17:30:00 2014-05-14 17:00:00 10050 mins 167.50000 hours
## Early.hr.num Early.hr.log
## 441 94.83333 4.552121
## 442 49.85000 3.909018
## 443 43.30000 3.768153
## 444 53.05000 3.971235
## 445 167.50000 5.120983
table(ML3$Date)
##
## 1/06/14 10/04/14 10/05/14 11/04/14 11/05/14 12/04/14 12/05/14 13/04/14
## 0 0 11 0 26 0 28 0
## 13/05/14 14/04/14 14/05/14 15/04/14 16/04/14 19/03/14 2/06/14 20/03/14
## 29 0 20 0 0 0 0 0
## 21/03/14 22/03/14 23/03/14 24/03/14 25/03/14 26/03/14 27/05/14 28/05/14
## 0 0 0 0 0 0 0 0
## 29/05/14 3/06/14 30/05/14 31/05/14 4/06/14 7/05/14 8/05/14 9/04/14
## 0 0 0 0 0 44 37 0
## 9/05/14
## 24
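The zero counts and text-style ordering above come from ML3$Date keeping the factor levels of the full MLsub table and sorting them as character strings. A small sketch of a cleaner tabulation, assuming the d/m/y format shown in the printout:
ML3.dates = as.Date(as.character(ML3$Date), format = "%d/%m/%y")  # drops unused levels, sorts chronologically
table(ML3.dates)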
#check submission time clusters against lecture recording clusters
all[1:2,117:ncol(all)]
## ML1.previous ML2.planMS ML3.usedMS ML4.planEOS total.no total.yes
## 1 No Yes No No 3 1
## 2 No Yes No No 3 1
## total.maybe total.noinfo access pattern prevLR access.days
## 1 0 0 21 No Yes No No No 15
## 2 0 0 21 No Yes No No No 15
## Kay.pattern Kay3 cluster2 ML1earliness ML2earliness ML3earliness
## 1 4 3 1 8.733333 163.9 140.2667
## 2 4 3 1 139.866667 163.9 140.2667
## ML4earliness Ass.mark Ass.earliness Course.grade ass.early.num Cat3
## 1 115.9 86 39.17611 hours 66.6 39.17611 0
## 2 115.9 86 39.17611 hours 66.6 39.17611 0
## Cat5 Cat3or5 Sum.Cat3and5 MLearliness cluster.all2 cluster.all
## 1 0 0 0 107.2000 1 1
## 2 0 0 0 139.9833 1 1
## total.yes.gp
## 1 1
## 2 1
with(all, table(cluster2, cluster.all))
## cluster.all
## cluster2 1 2
## 1 50 11
## 2 31 2
with(all, table(cluster2, cluster.all2))
## cluster.all2
## cluster2 1 2
## 1 50 11
## 2 31 2
with(all, tapply(access, cluster2, mean))
## 1 2
## 63.44262 18.93939
with(all, tapply(access, cluster.all, mean))
## 1 2
## 45.58025 61.76923
with(all, tapply(access, cluster.all2, mean))
## 1 2
## 45.58025 61.76923
with(all, tapply(access.days, cluster.all, mean))
## 1 2
## 17.04938 20.07692
with(all, tapply(MLearliness, cluster.all, mean))
## 1 2
## 115.35874 20.92981
with(all, tapply(ass.early.num, cluster.all, mean))
## 1 2
## 32.20305 10.89310
names(all[c(117:120, 125, 132:136, 137, 141:143)])
## [1] "ML1.previous" "ML2.planMS" "ML3.usedMS" "ML4.planEOS"
## [5] "access" "ML1earliness" "ML2earliness" "ML3earliness"
## [9] "ML4earliness" "Ass.mark" "Ass.earliness" "Cat5"
## [13] "Cat3or5" "Sum.Cat3and5"
with(all, tapply(Ass.mark, cluster.all, mean))
## 1 2
## 84.61728 80.53846
with(all, tapply(Course.grade, cluster.all, mean))
## 1 2
## 80.17160 68.26154
CONCLUSIONS:
Clusters based on ML responses, lecture recording access, earliness, assignment mark and Cat3/5 show: a huge difference in ML earliness; a substantial difference in assignment earliness; no difference in assignment mark; and a very large difference in course grade.
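The same comparison template (group means plus a Wilcoxon test) is repeated for each outcome here and later in the document, so it could be run in one pass; a minimal sketch using plain tapply in place of the calc/calc.sem helpers:
outcomes = c("MLearliness", "ass.early.num", "Ass.mark", "Course.grade")
sapply(outcomes, function(v) {
  p = wilcox.test(all[[v]] ~ all$cluster.all)$p.value          # same test as below, per outcome
  means = tapply(all[[v]], all$cluster.all, mean, na.rm = TRUE)
  round(c(means, p = p), 3)
})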
wilcox.test(Course.grade ~ cluster.all, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Course.grade by cluster.all
## W = 931, p-value = 9.634e-06
## alternative hypothesis: true location shift is not equal to 0
Checking the impact of the qualitative variables (Cat3/5 counts and ML responses) on the cluster.all determination
with(all, tapply(Cat3or5, cluster.all, mean))
## 1 2
## 0.8271605 1.0769231
with(all, tapply(Sum.Cat3and5, cluster.all, mean))
## 1 2
## 1.148148 1.461538
addmargins(with(all, table(ML1.previous, cluster.all)))
## cluster.all
## ML1.previous 1 2 Sum
## Maybe 14 2 16
## No 44 6 50
## Yes 23 5 28
## Sum 81 13 94
with(all, table(ML2.planMS, cluster.all))
## cluster.all
## ML2.planMS 1 2
## Maybe 4 1
## No 64 9
## Yes 12 3
with(all, table(ML3.usedMS, cluster.all))
## cluster.all
## ML3.usedMS 1 2
## Maybe 4 0
## No 61 3
## Yes 14 10
with(all, table(ML4.planEOS, cluster.all))
## cluster.all
## ML4.planEOS 1 2
## Maybe 2 0
## No 60 7
## Yes 13 4
Qualitative strategy categories and phases (forethought, performance, evaluation)
## StudentID strat1 strat2 strat3 strat4 strat5 strat6 strat7 strat8 strat9
## 1 S8152093 0 0 2 0 1 1 1 0 1
## 2 S8469547 0 0 2 1 0 1 1 0 0
## 3 S8522577 0 0 1 0 2 1 1 0 0
## 4 S8533121 0 0 2 0 0 1 0 0 1
## 5 S8575195 0 0 1 1 2 1 2 0 2
## strat10 foretht perf eval phases
## 1 0 0 4 1 2
## 2 0 0 4 0 1
## 3 0 0 4 0 1
## 4 0 0 2 1 2
## 5 0 0 5 1 2
## [1] 94 161
## cluster.all
## foretht 1 2 Sum
## 0 74 13 87
## 1 6 0 6
## 2 1 0 1
## Sum 81 13 94
## cluster2
## foretht 1 2 Sum
## 0 56 31 87
## 1 4 2 6
## 2 1 0 1
## Sum 61 33 94
## cluster.all
## perf 1 2 Sum
## 1 5 0 5
## 2 18 2 20
## 3 29 4 33
## 4 21 4 25
## 5 8 1 9
## 6 0 2 2
## Sum 81 13 94
## cluster2
## perf 1 2 Sum
## 1 3 2 5
## 2 14 6 20
## 3 20 13 33
## 4 16 9 25
## 5 6 3 9
## 6 2 0 2
## Sum 61 33 94
## cluster.all
## eval 1 2 Sum
## 0 27 4 31
## 1 40 8 48
## 2 14 1 15
## Sum 81 13 94
## cluster2
## eval 1 2 Sum
## 0 18 13 31
## 1 34 14 48
## 2 9 6 15
## Sum 61 33 94
## cluster.all
## phases 1 2 Sum
## 1 25 4 29
## 2 51 9 60
## 3 5 0 5
## Sum 81 13 94
## cluster2
## phases 1 2 Sum
## 1 17 12 29
## 2 40 20 60
## 3 4 1 5
## Sum 61 33 94
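These cross-tabs are only inspected by eye; a Fisher's exact test would attach a p-value to any association between the qualitative phases and the clusters (a sketch on the same columns, not run in the original):
fisher.test(with(all, table(phases, cluster.all)))   # phases vs high/low performing cluster
fisher.test(with(all, table(phases, cluster2)))      # phases vs the lecture-recording clusters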
Checking 5-cluster and 3-cluster solutions for cluster.all2
cluster.all2.groups.k5 = cutree(cluster.all2, k = 5)
all$cluster.all2k5 = cluster.all2.groups.k5
with(all, tapply(Course.grade, cluster.all2k5, mean))
## 1 2 3 4 5
## 80.70000 78.41500 81.62857 80.10588 68.26154
cluster.all2.groups.k3 = cutree(cluster.all2, k = 3)
all$cluster.all2k3 = cluster.all2.groups.k3
with(all, tapply(Course.grade, cluster.all2k3, mean))
## 1 2 3
## 80.18906 80.10588 68.26154
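Rather than repeating cutree() for each k, the grade means can be pulled out for a range of cluster counts in one call; a minimal sketch on the same dendrogram:
lapply(2:5, function(k) {
  grp = cutree(cluster.all2, k = k)                  # k-group cut of the same tree
  round(tapply(all$Course.grade, grp, mean), 1)
})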
with(all, tapply(Course.grade, cluster.all2, mean))
## 1 2
## 80.17160 68.26154
with(all, tapply(Course.grade, cluster.all2, sem))
## [1] 0.8932916
## [1] 1.812051
## 1 2
## 0.8932916 1.8120515
wilcox.test(Course.grade ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Course.grade by cluster.all2
## W = 931, p-value = 9.634e-06
## alternative hypothesis: true location shift is not equal to 0
with(all, tapply(access, cluster.all2, mean))
## 1 2
## 45.58025 61.76923
with(all, tapply(access, cluster.all2, sem))
## [1] 4.045759
## [1] 6.870742
## 1 2
## 4.045759 6.870742
wilcox.test(access ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: access by cluster.all2
## W = 330, p-value = 0.03178
## alternative hypothesis: true location shift is not equal to 0
with(all, tapply(access.days, cluster.all2, mean))
## 1 2
## 17.04938 20.07692
with(all, tapply(access.days, cluster.all2, sem))
## [1] 0.967034
## [1] 2.076923
## 1 2
## 0.967034 2.076923
wilcox.test(access.days ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: access.days by cluster.all2
## W = 401.5, p-value = 0.1722
## alternative hypothesis: true location shift is not equal to 0
all$ML1.previous[1:5]
## [1] No No No Yes Maybe
## Levels: Maybe No Yes
#with(all, tapply(ML1.previous, cluster.all2, mean))
#with(all, tapply(ML1.previous, cluster.all2, sem))
#wilcox.test(ML1.previous ~ cluster.all2, data=all)
#not numerical
with(all, tapply(total.yes, cluster.all2, mean))
## 1 2
## 0.7654321 1.6923077
with(all, tapply(total.yes, cluster.all2, sem))
## [1] 0.1182063
## [1] 0.3468654
## 1 2
## 0.1182063 0.3468654
wilcox.test(total.yes ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: total.yes by cluster.all2
## W = 304, p-value = 0.008412
## alternative hypothesis: true location shift is not equal to 0
addmargins(with(all, table(ML1.previous, cluster.all2)))
## cluster.all2
## ML1.previous 1 2 Sum
## Maybe 14 2 16
## No 44 6 50
## Yes 23 5 28
## Sum 81 13 94
addmargins(with(all, table(ML2.planMS, cluster.all2)))
## cluster.all2
## ML2.planMS 1 2 Sum
## Maybe 4 1 5
## No 64 9 73
## Yes 12 3 15
## Sum 80 13 93
addmargins(with(all, table(ML3.usedMS, cluster.all2)))
## cluster.all2
## ML3.usedMS 1 2 Sum
## Maybe 4 0 4
## No 61 3 64
## Yes 14 10 24
## Sum 79 13 92
addmargins(with(all, table(ML4.planEOS, cluster.all2)))
## cluster.all2
## ML4.planEOS 1 2 Sum
## Maybe 2 0 2
## No 60 7 67
## Yes 13 4 17
## Sum 75 11 86
str(all$ML1earliness)
## num [1:94] 8.73 139.87 19.87 43.2 117.33 ...
with(all, tapply(ML1earliness, cluster.all2, mean))
## 1 2
## 107.94156 32.46923
with(all, tapply(ML1earliness, cluster.all2, sem))
## [1] 5.19381
## [1] 7.732229
## 1 2
## 5.193810 7.732229
wilcox.test(ML1earliness ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: ML1earliness by cluster.all2
## W = 947, p-value = 4.223e-06
## alternative hypothesis: true location shift is not equal to 0
with(all, tapply(ML2earliness, cluster.all2, mean))
## 1 2
## 101.3214 16.4000
with(all, tapply(ML2earliness, cluster.all2, sem))
## [1] 5.696691
## [1] 4.361642
## 1 2
## 5.696691 4.361642
wilcox.test(ML2earliness ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: ML2earliness by cluster.all2
## W = 973, p-value = 1.035e-06
## alternative hypothesis: true location shift is not equal to 0
with(all, tapply(ML3earliness, cluster.all2, mean))
## 1 2
## 118.88745 19.32692
with(all, tapply(ML3earliness, cluster.all2, sem))
## [1] 5.166531
## [1] 5.197067
## 1 2
## 5.166531 5.197067
wilcox.test(ML3earliness ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: ML3earliness by cluster.all2
## W = 1020, p-value = 6.678e-08
## alternative hypothesis: true location shift is not equal to 0
with(all, tapply(ML4earliness, cluster.all2, mean))
## 1 2
## 133.28457 15.52308
with(all, tapply(ML4earliness, cluster.all2, sem))
## [1] 5.957776
## [1] 4.306585
## 1 2
## 5.957776 4.306585
wilcox.test(ML4earliness ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: ML4earliness by cluster.all2
## W = 1018, p-value = 7.541e-08
## alternative hypothesis: true location shift is not equal to 0
with(all, tapply(MLearliness, cluster.all2, mean))
## 1 2
## 115.35874 20.92981
with(all, tapply(MLearliness, cluster.all2, sem))
## [1] 3.520184
## [1] 3.298145
## 1 2
## 3.520184 3.298145
wilcox.test(MLearliness ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: MLearliness by cluster.all2
## W = 1053, p-value = 8.36e-09
## alternative hypothesis: true location shift is not equal to 0
115/24
## [1] 4.791667
.791667*24
## [1] 19.00001
with(all, table(Ass.earliness, cluster.all2))
## cluster.all2
## Ass.earliness 1 2
## -143.96 0 1
## -0.0652777777777778 1 0
## 0.0736111111111111 1 0
## 0.0897222222222222 1 0
## 0.143055555555556 1 0
## 0.147222222222222 1 0
## 0.203611111111111 1 0
## 0.265277777777778 1 0
## 0.303888888888889 0 1
## 0.516111111111111 0 1
## 0.546111111111111 0 1
## 0.555277777777778 0 1
## 0.681944444444444 1 0
## 0.687777777777778 1 0
## 0.713055555555556 1 0
## 0.771111111111111 1 0
## 0.895555555555556 1 0
## 0.977222222222222 0 1
## 1.51083333333333 0 1
## 1.70527777777778 1 0
## 1.89444444444444 1 0
## 1.91222222222222 1 0
## 2.05 1 0
## 2.40416666666667 1 0
## 2.57666666666667 0 1
## 2.65722222222222 1 0
## 2.66472222222222 1 0
## 2.79722222222222 1 0
## 2.81888888888889 1 0
## 2.9925 1 0
## 3.26722222222222 1 0
## 3.47472222222222 1 0
## 4.72 1 0
## 6.31416666666667 0 1
## 9.57305555555556 1 0
## 9.71305555555556 1 0
## 10.8986111111111 1 0
## 11.6347222222222 1 0
## 11.6375 1 0
## 11.8708333333333 1 0
## 12.3691666666667 1 0
## 12.7441666666667 0 1
## 12.7788888888889 1 0
## 12.9455555555556 1 0
## 13.0108333333333 1 0
## 13.0341666666667 1 0
## 13.2358333333333 1 0
## 13.64 1 0
## 13.7522222222222 1 0
## 14.5788888888889 1 0
## 14.8811111111111 1 0
## 15.5013888888889 1 0
## 15.5069444444444 2 0
## 16.6544444444444 1 0
## 19.9622222222222 0 1
## 21.5005555555556 1 0
## 22.4719444444444 1 0
## 22.9075 0 1
## 23.2275 1 0
## 23.2405555555556 1 0
## 23.6263888888889 1 0
## 24.3558333333333 1 0
## 24.59 1 0
## 26.6661111111111 1 0
## 27.5688888888889 1 0
## 28.4233333333333 1 0
## 37.2447222222222 1 0
## 37.8005555555556 1 0
## 39.1761111111111 2 0
## 40.1525 1 0
## 40.8602777777778 1 0
## 42.7508333333333 1 0
## 43.4602777777778 1 0
## 46.4233333333333 1 0
## 47.0469444444444 1 0
## 47.6402777777778 1 0
## 51.3877777777778 1 0
## 51.4061111111111 1 0
## 71.1552777777778 1 0
## 71.3391666666667 1 0
## 73.3827777777778 1 0
## 75.9794444444444 1 0
## 84.4688888888889 1 0
## 90.9058333333333 1 0
## 119.182222222222 1 0
## 123.219722222222 1 0
## 134.078888888889 1 0
## 137.369722222222 1 0
## 146.389722222222 1 0
## 178.101388888889 1 0
## 189.9875 1 0
## 216.656111111111 0 1
#figuring out why Ass.earliness errors (it is a difftime, not a numeric)
str(all$Ass.earliness)
## Class 'difftime' atomic [1:94] 39.1761 39.1761 2.4042 37.2447 0.0897 ...
## ..- attr(*, "units")= chr "hours"
all$Ass.earliness.num = as.numeric(all$Ass.earliness)
with(all, tapply(Ass.earliness, cluster.all2, mean))
## 1 2
## 32.20305 10.89310
with(all, tapply(Ass.earliness, cluster.all2, sem))
## [1] 4.692823
## [1] 20.76396
## 1 2
## 4.692823 20.763958
wilcox.test(Ass.earliness.num ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Ass.earliness.num by cluster.all2
## W = 754, p-value = 0.01291
## alternative hypothesis: true location shift is not equal to 0
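A note on the conversion above: as.numeric() on a difftime simply drops the class and keeps whatever units are stored in the object (hours here). Being explicit about the units guards against the attribute being something else; a one-line sketch equivalent to the conversion already done:
all$Ass.earliness.num = as.numeric(all$Ass.earliness, units = "hours")  # explicit units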
with(all, tapply(Ass.earliness, cluster.all2, median))
## 1 2
## 14.881111 1.510833
with(all, boxplot(Ass.earliness.num ~ cluster.all2))
range(all$Ass.earliness.num)
## [1] -143.9600 216.6561
sort(all$Ass.earliness.num)
## [1] -143.96000000 -0.06527778 0.07361111 0.08972222 0.14305556
## [6] 0.14722222 0.20361111 0.26527778 0.30388889 0.51611111
## [11] 0.54611111 0.55527778 0.68194444 0.68777778 0.71305556
## [16] 0.77111111 0.89555556 0.97722222 1.51083333 1.70527778
## [21] 1.89444444 1.91222222 2.05000000 2.40416667 2.57666667
## [26] 2.65722222 2.66472222 2.79722222 2.81888889 2.99250000
## [31] 3.26722222 3.47472222 4.72000000 6.31416667 9.57305556
## [36] 9.71305556 10.89861111 11.63472222 11.63750000 11.87083333
## [41] 12.36916667 12.74416667 12.77888889 12.94555556 13.01083333
## [46] 13.03416667 13.23583333 13.64000000 13.75222222 14.57888889
## [51] 14.88111111 15.50138889 15.50694444 15.50694444 16.65444444
## [56] 19.96222222 21.50055556 22.47194444 22.90750000 23.22750000
## [61] 23.24055556 23.62638889 24.35583333 24.59000000 26.66611111
## [66] 27.56888889 28.42333333 37.24472222 37.80055556 39.17611111
## [71] 39.17611111 40.15250000 40.86027778 42.75083333 43.46027778
## [76] 46.42333333 47.04694444 47.64027778 51.38777778 51.40611111
## [81] 71.15527778 71.33916667 73.38277778 75.97944444 84.46888889
## [86] 90.90583333 119.18222222 123.21972222 134.07888889 137.36972222
## [91] 146.38972222 178.10138889 189.98750000 216.65611111
#with(all, sort(table(cluster.all2, Ass.earliness.num)))
with(all, boxplot(log(Ass.earliness.num) ~ cluster.all2))
## Warning in log(Ass.earliness.num): NaNs produced
which.min(all$Ass.earliness.num)
## [1] 19
dim(all)
## [1] 94 164
all[19,c(143:148, 164)]
## Sum.Cat3and5 MLearliness cluster.all2 cluster.all total.yes.gp strat1
## 19 2 1.020833 2 2 1 0
## Ass.earliness.num
## 19 -143.96
with(all, table(cluster.all2))
## cluster.all2
## 1 2
## 81 13
143.9600/24
## [1] 5.998333
.998333*24
## [1] 23.95999
.95999*60
## [1] 57.5994
.5994*60
## [1] 35.964
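The repeated by-hand conversions of decimal hours into days, hours and minutes (here, at 115/24 above and 32/24 below) could be wrapped in a small helper; a sketch, with the function name hours.to.dhm being ours rather than part of the original analysis:
hours.to.dhm = function(h) {
  d  = floor(h / 24)                      # whole days
  hr = floor(h - d * 24)                  # remaining whole hours
  m  = round((h - d * 24 - hr) * 60)      # remaining minutes
  c(days = d, hours = hr, minutes = m)
}
hours.to.dhm(143.96)   # the extension student's gap: 5 days 23 h 58 min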
#removing value for student who had an extension
#Ass.earliness.num
#19 -143.96
all[19,164] = ""
all[15:20, 160:164]
## eval phases cluster.all2k5 cluster.all2k3 Ass.earliness.num
## 15 0 1 4 2 119.182222222222
## 16 2 2 3 1 73.3827777777778
## 17 2 2 4 2 21.5005555555556
## 18 1 2 3 1 -0.0652777777777778
## 19 1 2 5 3
## 20 2 3 1 1 14.8811111111111
str(all$Ass.earliness.num)
## chr [1:94] "39.1761111111111" "39.1761111111111" ...
all$Ass.earliness.num = as.numeric(all$Ass.earliness.num)
str(all$Ass.earliness.num)
## num [1:94] 39.1761 39.1761 2.4042 37.2447 0.0897 ...
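The detour through a character column above (the chr in the first str()) happens because assigning "" to a numeric column coerces the whole column to character; assigning NA directly keeps it numeric and skips the extra as.numeric() step. A one-line sketch of the simpler route:
all$Ass.earliness.num[19] = NA   # stays numeric, no re-conversion needed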
wilcox.test(Ass.earliness.num ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Ass.earliness.num by cluster.all2
## W = 673, p-value = 0.03257
## alternative hypothesis: true location shift is not equal to 0
with(all, tapply(Ass.earliness.num, cluster.all2, mean, na.rm=T))
## 1 2
## 32.20305 23.79752
#with(all, tapply(Ass.earliness.num, cluster.all2, sem, na.rm=T))
with(all, tapply(Ass.earliness.num, cluster.all2, sd, na.rm=T))
## 1 2
## 42.23540 61.25978
61.25978/sqrt(12)
## [1] 17.68418
32/24
## [1] 1.333333
.3*24
## [1] 7.2
#Emailed to Kay
#With that student removed:
#p-value = 0.03257
#High performing cluster: 32.20305 +/- 4.692823 hours
#Low performing cluster: 23.79752 +/- 17.68418 hours
#but actually should just have converted it to 0.04 h (about 2.4 min) before the new deadline for their extension
all$Ass.earliness.num[19]
## [1] NA
6*24 -143.96
## [1] 0.04
all$Ass.earliness.num[19] = 0.04
all$Ass.earliness.num[18:20]
## [1] -0.06527778 0.04000000 14.88111111
str(all$Ass.earliness.num)
## num [1:94] 39.1761 39.1761 2.4042 37.2447 0.0897 ...
wilcox.test(Ass.earliness.num ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Ass.earliness.num by cluster.all2
## W = 753, p-value = 0.01331
## alternative hypothesis: true location shift is not equal to 0
with(all, tapply(Ass.earliness.num, cluster.all2, mean))
## 1 2
## 32.20305 21.97002
with(all, tapply(Ass.earliness.num, cluster.all2, sem))
## [1] 4.692823
## [1] 16.36941
## 1 2
## 4.692823 16.369409
#checking if the organisation-related qualitative codes (Cat3, Cat5) differ between the high and low performing clusters
with(all, tapply(Cat3, cluster.all2, mean))
## 1 2
## 0.8271605 1.1538462
with(all, tapply(Cat3, cluster.all2, sem))
## [1] 0.1065597
## [1] 0.2492593
## 1 2
## 0.1065597 0.2492593
wilcox.test(Cat3 ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Cat3 by cluster.all2
## W = 405, p-value = 0.1565
## alternative hypothesis: true location shift is not equal to 0
with(all, tapply(Cat5, cluster.all2, mean))
## 1 2
## 0.3209877 0.3076923
with(all, tapply(Cat5, cluster.all2, sem))
## [1] 0.06041819
## [1] 0.1332347
## 1 2
## 0.06041819 0.13323468
wilcox.test(Cat5 ~ cluster.all2, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Cat5 by cluster.all2
## W = 520, p-value = 0.9336
## alternative hypothesis: true location shift is not equal to 0
Check of the variables used to build cluster.all2
names(all[c(117:120, 125, 132:136, 137, 141:143)])
## [1] "ML1.previous" "ML2.planMS" "ML3.usedMS" "ML4.planEOS"
## [5] "access" "ML1earliness" "ML2earliness" "ML3earliness"
## [9] "ML4earliness" "Ass.mark" "Ass.earliness" "Cat5"
## [13] "Cat3or5" "Sum.Cat3and5"
cluster.all check - does including the day-by-day access (timing) data change the clusters?
with(all, table(cluster.all, cluster.all2))
## cluster.all2
## cluster.all 1 2
## 1 81 0
## 2 0 13
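The cross-tab shows the two solutions place every student in the same group, so the day-by-day access columns make no difference to the split. A quick programmatic check (sketch; adjustedRandIndex needs the mclust package, which the original analysis does not load):
mean(all$cluster.all == all$cluster.all2)   # 1 = identical memberships
# label-swap-proof alternative: mclust::adjustedRandIndex(all$cluster.all, all$cluster.all2)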
Is lecture recording access related to academic performance?
dim(all)
## [1] 94 164
names(all[120:163])
## [1] "ML4.planEOS" "total.no" "total.yes" "total.maybe"
## [5] "total.noinfo" "access" "pattern" "prevLR"
## [9] "access.days" "Kay.pattern" "Kay3" "cluster2"
## [13] "ML1earliness" "ML2earliness" "ML3earliness" "ML4earliness"
## [17] "Ass.mark" "Ass.earliness" "Course.grade" "ass.early.num"
## [21] "Cat3" "Cat5" "Cat3or5" "Sum.Cat3and5"
## [25] "MLearliness" "cluster.all2" "cluster.all" "total.yes.gp"
## [29] "strat1" "strat2" "strat3" "strat4"
## [33] "strat5" "strat6" "strat7" "strat8"
## [37] "strat9" "strat10" "foretht" "perf"
## [41] "eval" "phases" "cluster.all2k5" "cluster.all2k3"
all$total.yes.gp[1:10]
## [1] 1 1 2 1 2 1 2 2 1 1
with(all, table(total.yes, total.yes.gp))
## total.yes.gp
## total.yes 1 2
## 0 0 47
## 1 26 0
## 2 7 0
## 3 12 0
## 4 2 0
aov.acp.lr = aov(Course.grade ~ total.yes, data=all)
summary(aov.acp.lr)
## Df Sum Sq Mean Sq F value Pr(>F)
## total.yes 1 7 7.10 0.09 0.765
## Residuals 92 7265 78.97
wilcox.test(Course.grade ~ total.yes.gp, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Course.grade by total.yes.gp
## W = 962.5, p-value = 0.2846
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(access.days ~ total.yes.gp, data=all)
##
## Wilcoxon rank sum test with continuity correction
##
## data: access.days by total.yes.gp
## W = 1660, p-value = 2.654e-05
## alternative hypothesis: true location shift is not equal to 0
Post-review: relating earliness of ML submission to the newly coded (post inter-rater reliability) organisation strategies (Strategies 2 and 5)
## StudentID ML1earliness ML2earliness ML3earliness ML4earliness Ass.mark
## 20 S8533573 72.95000 100.50000 139.66667 117.4333 86
## 21 S8565657 15.96667 75.10000 114.50000 17.7500 97
## 30 S8578997 68.48333 151.21667 166.26667 164.4333 83
## 40 S8596773 119.93333 44.08333 43.51667 153.7167 83
## 90 S8648397 52.30000 140.26667 144.03333 128.6500 84
## Ass.earliness Course.grade MLearliness Strat2 Strat5 Strat2.5
## 20 14.88111 hours 83.2 107.63750 1 0 Strat2
## 21 75.97944 hours 91.4 55.82917 2 0 Strat2
## 30 13.03417 hours 69.9 137.60000 1 1 both
## 40 146.38972 hours 96.4 90.31250 1 2 both
## 90 12.94556 hours 81.0 116.31250 1 1 both
## StudentID ML1earliness ML2earliness ML3earliness ML4earliness Ass.mark
## 1 S6089847 8.733333 163.900000 140.26667 115.900000 86
## 3 S8117889 19.866667 47.366667 162.98333 186.316667 75
## 4 S8118323 43.200000 66.950000 19.20000 186.733333 94
## 5 S8152093 117.333333 5.466667 0.50000 168.600000 82
## 6 S8239113 164.033333 66.500000 124.88333 166.466667 91
## 21 S8565657 15.966667 75.100000 114.50000 17.750000 97
## 22 S8575195 99.216667 119.050000 146.56667 113.750000 88
## 23 S8575571 4.366667 18.166667 29.71667 4.283333 86
## 24 S8576691 150.233333 151.266667 90.40000 150.550000 96
## Ass.earliness Course.grade MLearliness Strat2 Strat5 Strat2.5
## 1 39.17611111 hours 66.6 107.20000 0 0 none
## 3 2.40416667 hours 64.7 104.13333 0 0 none
## 4 37.24472222 hours 78.0 79.02083 0 1 Strat5
## 5 0.08972222 hours 80.7 72.97500 0 1 Strat5
## 6 1.70527778 hours 68.4 130.47083 0 0 none
## 21 75.97944444 hours 91.4 55.82917 2 0 Strat2
## 22 2.81888889 hours 90.7 119.64583 0 2 Strat5
## 23 1.51083333 hours 68.4 14.13333 0 0 none
## 24 13.23583333 hours 89.3 135.61250 0 0 none
##
## both none Strat2 Strat5
## 3 63 2 24
##
## Not Organisers
## 63 29
##
## Welch Two Sample t-test
##
## data: Course.grade by org
## t = -1.1785, df = 61.502, p-value = 0.2431
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -5.943255 1.535155
## sample estimates:
## mean in group Not mean in group Organisers
## 77.82698 80.03103
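The code behind this comparison is not echoed above; a plausible reconstruction (a sketch only; dat stands in for the merged data frame printed above, whose real name is not shown, and org collapses Strat2.5 into organisers vs not):
dat$org = ifelse(dat$Strat2.5 == "none", "Not", "Organisers")   # 63 vs 29, matching the table above
table(dat$org)
t.test(Course.grade ~ org, data = dat)   # Welch test, as in the output above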
all strategies recoded
all.strat = read.csv("All new strategies coded.csv", header=T, stringsAsFactors=F)
cor(all.strat[,2:15])
## course.mark X1 X2 X3
## course.mark 1.000000000 0.03275715 0.19154943 -0.2018816967
## X1 0.032757148 1.00000000 0.31999507 0.1255206442
## X2 0.191549431 0.31999507 1.00000000 -0.0370540279
## X3 -0.201881697 0.12552064 -0.03705403 1.0000000000
## X4 -0.006170415 -0.08858909 -0.02444770 -0.0836891433
## X5 0.051955886 0.10443246 0.13615500 0.1116975269
## X6 0.064109286 -0.12306464 -0.09031165 -0.3442794733
## X7 0.101618505 -0.08781616 0.10169791 -0.1637664344
## X8 -0.061660518 -0.10063949 -0.08178829 -0.1161892905
## X9 0.091629805 0.08076773 0.03750570 -0.0733046862
## X10 0.121862666 -0.05201565 -0.03817196 -0.1948123689
## Forethought 0.152014462 0.66487864 0.88228155 0.0003859795
## Perf -0.074095021 -0.09719023 0.01295356 0.1282175287
## Self.reflect 0.127133507 0.01807959 -0.06563776 -0.1832439851
## X4 X5 X6 X7
## course.mark -0.006170415 0.051955886 0.06410929 0.101618505
## X1 -0.088589094 0.104432460 -0.12306464 -0.087816158
## X2 -0.024447700 0.136155003 -0.09031165 0.101697908
## X3 -0.083689143 0.111697527 -0.34427947 -0.163766434
## X4 1.000000000 0.046777419 0.09820317 0.244027344
## X5 0.046777419 1.000000000 -0.25727484 0.005412309
## X6 0.098203173 -0.257274838 1.00000000 0.130611091
## X7 0.244027344 0.005412309 0.13061109 1.000000000
## X8 0.013296562 -0.264383446 0.12908879 0.061323326
## X9 -0.137310715 0.058654311 -0.16509993 -0.100806052
## X10 -0.031530616 -0.183374195 0.30941262 0.017227186
## Forethought -0.024447700 0.192837558 -0.13105690 -0.002141009
## Perf 0.603128803 0.221803167 0.23073862 0.505264464
## Self.reflect -0.077028115 -0.233662427 0.13825486 -0.111591227
## X8 X9 X10 Forethought Perf
## course.mark -0.06166052 0.09162980 0.12186267 0.1520144620 -0.07409502
## X1 -0.10063949 0.08076773 -0.05201565 0.6648786449 -0.09719023
## X2 -0.08178829 0.03750570 -0.03817196 0.8822815534 0.01295356
## X3 -0.11618929 -0.07330469 -0.19481237 0.0003859795 0.12821753
## X4 0.01329656 -0.13731072 -0.03153062 -0.0244477000 0.60312880
## X5 -0.26438345 0.05865431 -0.18337419 0.1928375576 0.22180317
## X6 0.12908879 -0.16509993 0.30941262 -0.1310569034 0.23073862
## X7 0.06132333 -0.10080605 0.01722719 -0.0021410086 0.50526446
## X8 1.00000000 -0.11377992 0.23252609 -0.1346780586 0.30022220
## X9 -0.11377992 1.00000000 0.03939345 0.0996946457 -0.17733845
## X10 0.23252609 0.03939345 1.00000000 -0.0381719626 -0.01455959
## Forethought -0.13467806 0.09969465 -0.03817196 1.0000000000 -0.04833890
## Perf 0.30022220 -0.17733845 -0.01455959 -0.0483388991 1.00000000
## Self.reflect 0.10132603 0.52102121 0.68736266 -0.0112201297 -0.18548301
## Self.reflect
## course.mark 0.12713351
## X1 0.01807959
## X2 -0.06563776
## X3 -0.18324399
## X4 -0.07702811
## X5 -0.23366243
## X6 0.13825486
## X7 -0.11159123
## X8 0.10132603
## X9 0.52102121
## X10 0.68736266
## Forethought -0.01122013
## Perf -0.18548301
## Self.reflect 1.00000000
with(all.strat, cor.test(course.mark, X3))
##
## Pearson's product-moment correlation
##
## data: course.mark and X3
## t = -2.0091, df = 95, p-value = 0.04737
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.385793314 -0.002538571
## sample estimates:
## cor
## -0.2018817
sapply(all.strat[3:12], sum)
## X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
## 1 7 83 47 36 100 56 63 108 20
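Only X3 is singled out for a cor.test above; to screen every strategy against course.mark in one pass (with a multiple-testing correction, since ten correlations are being inspected), a sketch:
strat.cols = paste0("X", 1:10)
res = sapply(strat.cols, function(v) {
  ct = cor.test(all.strat$course.mark, all.strat[[v]])
  c(r = unname(ct$estimate), p = ct$p.value)
})
round(rbind(res, p.holm = p.adjust(res["p", ], method = "holm")), 3)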
grouping based on ML submission earliness
names(all.e)
## [1] "StudentID" "ML1earliness" "ML2earliness" "ML3earliness"
## [5] "ML4earliness" "Ass.mark" "Ass.earliness" "Course.grade"
## [9] "MLearliness"
head(all.e)
## StudentID ML1earliness ML2earliness ML3earliness ML4earliness Ass.mark
## 1 S6089847 8.733333 163.900000 140.2667 115.9000 86
## 2 S6089847 139.866667 163.900000 140.2667 115.9000 86
## 3 S8117889 19.866667 47.366667 162.9833 186.3167 75
## 4 S8118323 43.200000 66.950000 19.2000 186.7333 94
## 5 S8152093 117.333333 5.466667 0.5000 168.6000 82
## 6 S8239113 164.033333 66.500000 124.8833 166.4667 91
## Ass.earliness Course.grade MLearliness
## 1 39.17611111 hours 66.6 107.20000
## 2 39.17611111 hours 66.6 139.98333
## 3 2.40416667 hours 64.7 104.13333
## 4 37.24472222 hours 78.0 79.02083
## 5 0.08972222 hours 80.7 72.97500
## 6 1.70527778 hours 68.4 130.47083
write.csv(all.e, file = "ReviewData.csv")
MLearliness.cor = cor(all.e[,2:5])
write.csv(MLearliness.cor, file = "MLearlinessCorrelations.csv")
require(reshape2)
## Loading required package: reshape2
all.melt = melt(all.e[,1:5])
## Using StudentID as id variables
all.melt[seq(1, 376, 10),]
## StudentID variable value
## 1 S6089847 ML1earliness 8.733333
## 11 S8465063 ML1earliness 102.500000
## 21 S8565657 ML1earliness 15.966667
## 31 S8582165 ML1earliness 149.850000
## 41 S8599407 ML1earliness 43.200000
## 51 S8635621 ML1earliness 144.550000
## 61 S8639321 ML1earliness 168.566667
## 71 S8643279 ML1earliness 28.666667
## 81 S8646161 ML1earliness 147.750000
## 91 S8650953 ML1earliness 167.933333
## 101 S8283571 ML2earliness 51.433333
## 111 S8526677 ML2earliness 164.583333
## 121 S8577829 ML2earliness 125.283333
## 131 S8587857 ML2earliness 28.616667
## 141 S8634295 ML2earliness 143.050000
## 151 S8638219 ML2earliness 124.966667
## 161 S8641369 ML2earliness 137.550000
## 171 S8644267 ML2earliness 115.500000
## 181 S8647667 ML2earliness 162.166667
## 191 S8117889 ML3earliness 162.983333
## 201 S8471971 ML3earliness 4.233333
## 211 S8575571 ML3earliness 29.716667
## 221 S8582465 ML3earliness 171.900000
## 231 S8603333 ML3earliness 70.000000
## 241 S8635995 ML3earliness 51.383333
## 251 S8639711 ML3earliness 19.466667
## 261 S8643377 ML3earliness 128.600000
## 271 S8646489 ML3earliness 97.666667
## 281 S8651655 ML3earliness 161.516667
## 291 S8407099 ML4earliness 190.900000
## 301 S8533121 ML4earliness 1.483333
## 311 S8578719 ML4earliness 141.483333
## 321 S8593947 ML4earliness 164.550000
## 331 S8635555 ML4earliness 162.850000
## 341 S8638579 ML4earliness 13.683333
## 351 S8641669 ML4earliness 22.400000
## 361 S8645607 ML4earliness 195.666667
## 371 S8648209 ML4earliness 160.366667
str(all.melt)
## 'data.frame': 376 obs. of 3 variables:
## $ StudentID: chr "S6089847" "S6089847" "S8117889" "S8118323" ...
## $ variable : Factor w/ 4 levels "ML1earliness",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ value : num 8.73 139.87 19.87 43.2 117.33 ...
library(lattice)
histogram(~ value | variable, data = all.melt)
library(ggplot2)
ggplot(all.melt, aes(value, fill = variable)) + geom_histogram(binwidth = 24) + facet_grid(variable ~ ., margins = TRUE, scales = "free") + xlim(0, 225)