Meta-Learning Analytics

available at: http://rpubs.com/kirstenz/24661

Research Question Brainstorm

with Kay and Louise Friday 8 August

Students who report looking at lecture recordings, do they?

Student plans don’t come to fruition (ref)

-> so if students didn’t say they use lecture recordings and now plan to (in response to weakness or Mid-Sem), does access change? For how long?

control: if said do, and plan to, then lecture recording viewing should be stable, but will give insights into normal semester/lecturer fluctuations

=> by eye, lectures were Tues afternoon and Wed morning, and access peaks Tues, Wed and Thurs (can visualise with calendarHeat figures)

how do students prepare for classes => do students look at prac videos

=> by eye Thurs/Fri for Fri prac,

how many of the 5 pracs with videos do they access (as how many times/weeks they access)?
are there changes over time, i.e. do students stop looking when they realise the videos are not so useful?

Which students look at lecture recordings

actually figures give how many students access how often first, but calculations do determine which students look, and when

data in “Folder access across semester.xls” moved to “LectAccess.csv”

clean - remove name column, remove empty rows (233-678)
move total column and total row to new vectors, and remove
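
The cleaning steps above could look roughly like the sketch below. The toy data frame and the column names `Name` and `Total` are assumptions for illustration only; the real layout of “LectAccess.csv” is not shown in these notes.

```r
# Toy stand-in for the raw export (column names and values are assumed)
raw <- data.frame(
  StudentID = c("A", "B", NA, "Total"),
  Name      = c("x", "y", NA, NA),
  X4.03.14  = c(1, 0, NA, 1),
  X5.03.14  = c(2, 3, NA, 5),
  Total     = c(3, 3, NA, 6),
  stringsAsFactors = FALSE
)

# Remove the name column
la <- raw[, names(raw) != "Name"]

# Remove rows that are completely empty
la <- la[rowSums(!is.na(la)) > 0, ]

# Move the total row and total column into their own vectors, then drop them
is.total.row <- la$StudentID == "Total"
total.row <- unlist(la[is.total.row, c("X4.03.14", "X5.03.14")])
la <- la[!is.total.row, ]
total.col <- la$Total
la$Total <- NULL

dim(la)
```

Keeping the totals in separate vectors means they can still be used later as a cross-check against recomputed row and column sums.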

clean - keep only consenting students
(output reports number of students by number of variables, before then after)

## [1] 230 116

## [1]  99 116

clean - De-ID students so can push to html

basic structure of data

##    StudentID X4.03.14 X5.03.14 X6.03.14 X7.03.14
## 1   S6089847        1        1        0        0
## 3   S8117889        0        1        4        0
## 4   S8118323        2        0        0        0
## 5   S8152093        0        0        0        0
## 6   S8239113        1        1        0        0
## 7   S8283571        0        2        0        0
## 9   S8395419        1        0        0        0
## 12  S8407099        3        8        4        0
## 13  S8408815        0        0        0        0
## 14  S8465063        0        0        1        0

dimensions (rows by columns)

## [1]  99 116

Number of students who looked at lectures x number of times
plot of chunk unnamed-chunk-7

Clipped x axis at 100 access clicks to zoom into lower end
plot of chunk unnamed-chunk-8

Converted x axis to log to spread clumped data into normal-ish curve
plot of chunk unnamed-chunk-9

NB Log scale:
0 = 1
1 ~ 3
2 ~ 7
3 ~ 20
4 ~ 55
5 ~ 150
6 ~ 403
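
The approximate values above are consistent with a natural-log axis. That is an assumption here, since the plotting code is not shown. Under that assumption, the tick values can be back-transformed to raw access counts like this:

```r
# Back-transform natural-log axis ticks to raw access counts
ticks <- 0:6
back <- data.frame(log.value = ticks, raw.clicks = round(exp(ticks)))
back
```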

Working out viewings by day - number of times folder accessed per day (access.day), number of students who access each day (stud.day)…

Working out number of times (access.stud) and number of days (days.stud) each student accessed…
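
A minimal sketch of how these four vectors can be derived from a students-by-days table. The toy values are invented, and the column sums and row sums used here are one straightforward way to do it, not necessarily the code that was run.

```r
# Toy access table: rows are students, columns are days (values are invented)
la <- data.frame(
  StudentID = c("A", "B", "C"),
  X4.03.14  = c(1, 0, 2),
  X5.03.14  = c(0, 0, 4),
  X6.03.14  = c(3, 1, 0)
)
counts <- la[, -1]                    # drop the ID column, keep the day columns

access.day  <- colSums(counts)        # times the folder was accessed per day
stud.day    <- colSums(counts > 0)    # students who accessed on each day
access.stud <- rowSums(counts)        # times each student accessed
days.stud   <- rowSums(counts > 0)    # days on which each student accessed
```

Because the by-day and by-student vectors summarise the same table, their totals must agree, which is why the descriptive statistics report the same sums for each pair (4570 and 1697).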

Loading ‘describe’ function to get descriptive stat’s…
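
The ‘describe’ function itself is not shown in these notes. A minimal sketch that returns the same nine statistics might look like this; the name `describe.sketch` and the rounding to one decimal place are assumptions.

```r
# Hypothetical stand-in for the 'describe' helper (not the original code)
describe.sketch <- function(x) {
  n.na <- sum(is.na(x))          # count missing values before dropping them
  x <- x[!is.na(x)]
  n <- length(x)
  round(c(min = min(x), max = max(x), median = median(x), mean = mean(x),
          SD = sd(x), SEM = sd(x) / sqrt(n), n = n, NAs = n.na, sum = sum(x)), 1)
}

res <- describe.sketch(c(2, 4, 4, 6, NA))
res
```

SEM here is SD divided by the square root of n. That matches the reported figures, e.g. 31.3 / sqrt(114) is about 2.9 for folder accesses per day.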

Descriptive stat’s for viewings by day and by student:
Number of times lecture recording folder was accessed per day

##    min    max median   mean     SD    SEM      n    NAs    sum 
##    0.0  162.0   30.0   40.1   31.3    2.9  114.0    0.0 4570.0

Number of students who accessed lecture recordings each day

##    min    max median   mean     SD    SEM      n    NAs    sum 
##    0.0   43.0   13.0   14.9    9.9    0.9  114.0    0.0 1697.0

Number of times each student accessed the lecture recordings

##    min    max median   mean     SD    SEM      n    NAs    sum 
##    4.0  194.0   35.0   46.2   35.0    3.5   99.0    0.0 4570.0

Number of days each student accessed the lecture recordings

##    min    max median   mean     SD    SEM      n    NAs    sum 
##    3.0   41.0   16.0   17.1    8.5    0.9   99.0    0.0 1697.0

Useful conclusions: 114 days (16 weeks, 2 days) in data for 99 consenting students (cohort 231)
Large range in the number of access hits (0 to 22) recorded for each student each day. Overall, the number of access hits per day is 2-3x the number of students who access per day, and the number of access hits per student is also 2-3x the number of days a student accesses the folder.
Since we don’t really know how the number of folder openings is tracked by Blackboard (could be refreshings), the number of students is probably a better way of looking at the data than number of times the folder is ‘opened’.

On average 15 +/- 1 (mean+/-SEM) students accessed each day, with a max of 43 students one day (1/5/14).

On average students accessed lecture recordings on 17 days, with a max of 41 days and a minimum of 3 days. So there were no students who didn’t access lecture recordings at all?

## days.stud
##  3  4  5  6  7  8  9 10 12 13 14 15 16 17 18 19 20 21 22 23 25 26 27 28 29 
##  1  1  3  3  6  1  5  8  5  5  5  6  1  6  6  1  5  3  4  3  3  4  1  1  1 
## 30 31 32 33 34 35 41 
##  2  1  1  1  3  2  1

So, only 1 student looked on three days… only 1 student looked on 41 days, the majority looked on 7-26 days. Or as a histogram:

plot of chunk unnamed-chunk-18

Transposed data (lat) so we can get real dates…

## [1] 114 100

##         Date S8530605 S8636955 S8475915 S8645607
## 1 2014-03-04        0        1        3        2
## 2 2014-03-05        1        2        8        1
## 3 2014-03-06        0        4        0        0
## 4 2014-03-07        0        2        0        0
## 5 2014-03-08        0        2        0        0
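
A sketch of one way to do this transposition, assuming the day columns are named like `X4.03.14` (day.month.year) as in the earlier output. Leaving the StudentID column out of the transpose keeps the counts numeric.

```r
# Toy access data in the wide layout (one row per student, one column per day)
la <- data.frame(
  StudentID = c("S1", "S2"),
  X4.03.14  = c(0, 1),
  X5.03.14  = c(1, 2),
  stringsAsFactors = FALSE
)

# Transpose only the day columns so the result stays numeric
lat <- as.data.frame(t(la[, -1]))
names(lat) <- la$StudentID

# Turn row names such as "X4.03.14" into real dates
dates <- as.Date(sub("^X", "", rownames(lat)), format = "%d.%m.%y")
lat <- data.frame(Date = dates, lat)
rownames(lat) <- NULL
lat
```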

Summed numbers of times the lecture recording folder was accessed and number of students who accessed lecture recording folder per day…

Loaded calendarHeat function…

Calendar of number of times lecture recording folder was accessed each day
plot of chunk unnamed-chunk-22

Calendar of number of students who accessed lecture recordings each day
plot of chunk unnamed-chunk-23

Cluster analysis

Built data frame with student ID and T/F for access each day…
NB created 2 data frames:
la.norm: 0 = not accessed, 1 = accessed
la.norm2: 1 = accessed, NA = not accessed (NA = missing value)
Cluster analysis errors with NA => don’t use data with missing values for cluster analysis
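
The two encodings can be sketched as follows. The toy values are invented and the original construction code is not shown, so this is only one possible way to build them.

```r
# Toy raw counts (values invented)
la <- data.frame(StudentID = c("S1", "S2"),
                 X4.03.14 = c(0, 3),
                 X5.03.14 = c(8, 0),
                 stringsAsFactors = FALSE)
day.cols <- names(la)[-1]

# la.norm: 1 = accessed that day, 0 = not accessed
la.norm <- la
la.norm[day.cols] <- lapply(la[day.cols], function(x) as.numeric(x > 0))

# la.norm2: 1 = accessed, NA = not accessed
la.norm2 <- la.norm
la.norm2[day.cols] <- lapply(la.norm[day.cols], function(x) ifelse(x == 1, 1, NA))
```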

Use la.norm to cluster lecture recording access

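# Columns 2 to 115 hold the 114 daily access indicators (column 1 is StudentID)
distances = dist(la.norm[2:115], method = "euclidean")
# "ward.D" is the current name of the method that older R versions called "ward"
clusterLA = hclust(distances, method = "ward.D")
plot(clusterLA)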

plot of chunk unnamed-chunk-25

clusterGroups3 = cutree(clusterLA, k = 3)
la.norm$cluster3 = clusterGroups3
dim(la.norm)

## [1]  99 116

la.norm[1:5,1:5]

##   StudentID X4.03.14 X5.03.14 X6.03.14 X7.03.14
## 1  S6089847        1        1        0        0
## 3  S8117889        0        1        1        0
## 4  S8118323        1        0        0        0
## 5  S8152093        0        0        0        0
## 6  S8239113        1        1        0        0

la.norm[1:5,110:116]

##   X20.06.14 X21.06.14 X22.06.14 X23.06.14 X24.06.14 X25.06.14 cluster3
## 1         0         0         0         0         0         0        1
## 3         1         1         1         1         1         0        2
## 4         1         0         0         0         0         0        1
## 5         0         1         0         0         0         0        3
## 6         0         0         0         0         0         0        1

## [1] "The number of students in each cluster by the number of variables"

## [1]  26 116
## [1]  36 116
## [1]  37 116
## [1]   0 116
## [1]   0 116

So what are the characteristics of the clusters - how often do students view lecture recordings and when:

##    min    max median   mean     SD    SEM      n    NAs    sum 
##   15.0   41.0   25.5   25.7    7.4    1.5   26.0    0.0  668.0 
##    min    max median   mean     SD    SEM      n    NAs    sum 
##   12.0   30.0   18.0   19.1    5.3    0.9   36.0    0.0  687.0 
##    min    max median   mean     SD    SEM      n    NAs    sum 
##    3.0   17.0    9.0    9.2    3.5    0.6   37.0    0.0  342.0

## Warning: no non-missing arguments to min; returning Inf
## Warning: no non-missing arguments to max; returning -Inf

##    min    max median   mean     SD    SEM      n    NAs    sum 
##    Inf   -Inf     NA    NaN     NA     NA      0      0      0

## Warning: no non-missing arguments to min; returning Inf
## Warning: no non-missing arguments to max; returning -Inf

##    min    max median   mean     SD    SEM      n    NAs    sum 
##    Inf   -Inf     NA    NaN     NA     NA      0      0      0
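
The warnings and the empty summaries above come from summarising cluster labels that contain no students (n = 0). A sketch that only summarises the labels actually present is given below; the toy values are invented.

```r
# Toy data: days of access per student and a 3-group cluster label (invented)
days.stud <- c(25, 30, 18, 20, 9, 7, 11)
cluster3  <- c(1, 1, 2, 2, 3, 3, 3)

# split() creates one group per label that occurs, so no empty groups arise
by.cluster <- lapply(split(days.stud, cluster3), function(x) {
  c(min = min(x), max = max(x), median = median(x), mean = mean(x), n = length(x))
})
by.cluster
```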

plot of chunk unnamed-chunk-27

To see ‘when’ need to get cluster groups into lat (transposed version)

Then run calendarHeat for all 3 clusters…
plot of chunk unnamed-chunk-29

Load in qualitative coding “pattern of lecture recording use ML” -> “qual.csv”
clean - de-identify

clean - capitalisation, converted “no info”, “deferred” and “” to NA (ie missing)

Data structure

## [1] 99 10

##   StudentID ML1.previous ML2.planMS ML3.usedMS ML4.planEOS total.no
## 1  S6089847           No        Yes         No          No        3
## 2  S8117889           No         No         No          No        4
## 3  S8118323          Yes         No         No          No        3
## 4  S8152093        Maybe         No      Maybe          No        2
## 5  S8239113          Yes         No        Yes          No        2
##   total.yes total.maybe total.noinfo access
## 1         1           0            0     21
## 2         0           0            0     75
## 3         1           0            0    122
## 4         0           2            0     13
## 5         2           0            0     89

##  ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
##  Maybe:18     Maybe: 5   Maybe: 4   Maybe: 3   
##  No   :52     No   :77   No   :69   No   :71   
##  Yes  :27     Yes  :15   Yes  :23   Yes  :17   
##  NA's : 2     NA's : 2   NA's : 3   NA's : 8

Patterns of self-reported lecture recording use

## 
## Maybe    No   Yes 
##    18    52    27 
## [1] "ML1.previous"
## 
## Maybe    No   Yes 
##     5    77    15 
## [1] "ML2.planMS"
## 
## Maybe    No   Yes 
##     4    69    23 
## [1] "ML3.usedMS"
## 
## Maybe    No   Yes 
##     3    71    17 
## [1] "ML4.planEOS"

##             ML2.planMS
## ML1.previous Maybe No Yes Sum
##        Maybe     1 16   0  17
##        No        2 40   9  51
##        Yes       2 19   6  27
##        Sum       5 75  15  95

##           ML3.usedMS
## ML2.planMS Maybe No Yes Sum
##      Maybe     0  4   0   4
##      No        4 57  14  75
##      Yes       0  6   9  15
##      Sum       4 67  23  94

Concl:
Most students (52/99, i.e. 53%) report that they don’t usually use lecture recordings. Even more didn’t plan to use lecture recordings for the mid-semester exam (77/99), a similar number didn’t use them for the mid-semester exam (69/99), and this was much the same for the end of semester exam (71/99).
This seems inconsistent with the number of students who do use lecture recordings (all 99 at some point), and the majority used lecture recordings on 7-26 days, which is still half to twice the number of weeks in semester so ~ once/fortnight to twice/week.

What are the patterns of No, No, No, No etc, similar to what Kay calculated as number of No’s, Yes’, Maybe’s (tables have the number of no’s 0-4 in header row, then frequency (number of students) in 2nd row)

## 
##  0  1  2  3  4 
##  2 15 21 32 29 
## [1] "total.no"
## 
##  0  1  2  3  4 
## 52 27  7 11  2 
## [1] "total.yes"
## 
##  0  1  2 
## 72 24  3 
## [1] "total.maybe"

Most frequent patterns of response:

## 
##        pattern  students
## No Yes Yes Yes         3
## Yes No No Yes          3
## Yes Yes Yes No         3
## No Yes No No           4
## Yes No Yes Yes         4
## Yes No No No           8
## Maybe No No No         9
## No No No No           29

Everything else was reported by 2 or fewer students.
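
A sketch of how such a pattern count can be produced from the four self-report columns. The toy responses are invented and the original code is not shown.

```r
# Toy self-report data using the same column names as qual.csv (values invented)
qual <- data.frame(
  ML1.previous = c("No", "No", "Yes", "No"),
  ML2.planMS   = c("No", "No", "No", "Yes"),
  ML3.usedMS   = c("No", "No", "No", "No"),
  ML4.planEOS  = c("No", "No", "No", "No"),
  stringsAsFactors = FALSE
)

# Paste the four responses into one pattern string per student
qual$pattern <- with(qual, paste(ML1.previous, ML2.planMS, ML3.usedMS, ML4.planEOS))

# Frequency of each pattern, least to most common
pattern.freq <- sort(table(qual$pattern))
pattern.freq
```

Note that `paste()` turns a missing response into the text "NA", which is how patterns such as "Maybe No NA No" can appear in the pattern tables.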

So there is definitely a group of 29 students who never report using lecture recordings (LR). There are 27 students who report that they usually used LR.
Of these, 6 plan to use LR for mid-sem, 2 maybes, and 19 don’t mention LR for mid-sem prep. There are 52 students who don’t report usually using LR (51 in the cross-tab, because one of them has no ML2.planMS response). Of these 51, 9 plan to use LR for mid-sem, 2 maybes, and 40 don’t mention LR for mid-sem prep.

How often do these groups of students use LR?

## [1]  99 116

##   StudentID X4.03.14 X5.03.14 X6.03.14 X7.03.14
## 1  S6089847        1        1        0        0
## 3  S8117889        0        1        1        0
## 4  S8118323        1        0        0        0
## 5  S8152093        0        0        0        0
## 6  S8239113        1        1        0        0

##   X20.06.14 X21.06.14 X22.06.14 X23.06.14 X24.06.14 X25.06.14 cluster3
## 1         0         0         0         0         0         0        1
## 3         1         1         1         1         1         0        2
## 4         1         0         0         0         0         0        1
## 5         0         1         0         0         0         0        3
## 6         0         0         0         0         0         0        1

## [1]  99 126

##   StudentID X4.03.14 X5.03.14 X6.03.14 X7.03.14
## 1  S6089847        1        1        0        0
## 2  S8117889        0        1        1        0
## 3  S8118323        1        0        0        0
## 4  S8152093        0        0        0        0
## 5  S8239113        1        1        0        0

##   X25.06.14 cluster3 ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
## 1         0        1           No        Yes         No          No
## 2         0        2           No         No         No          No
## 3         0        1          Yes         No         No          No
## 4         0        3        Maybe         No      Maybe          No
## 5         0        1          Yes         No        Yes          No
##   total.no total.yes total.maybe total.noinfo access           pattern
## 1        3         1           0            0     21      No Yes No No
## 2        4         0           0            0     75       No No No No
## 3        3         1           0            0    122      Yes No No No
## 4        2         0           2            0     13 Maybe No Maybe No
## 5        2         2           0            0     89     Yes No Yes No

## [1]  99 128

##   X25.06.14 cluster3 ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
## 1         0        1           No        Yes         No          No
## 2         0        2           No         No         No          No
## 3         0        1          Yes         No         No          No
## 4         0        3        Maybe         No      Maybe          No
## 5         0        1          Yes         No        Yes          No
##   total.no total.yes total.maybe total.noinfo access           pattern
## 1        3         1           0            0     21      No Yes No No
## 2        4         0           0            0     75       No No No No
## 3        3         1           0            0    122      Yes No No No
## 4        2         0           2            0     13 Maybe No Maybe No
## 5        2         2           0            0     89     Yes No Yes No
##   prevLR access.days
## 1     No          15
## 2     No          26
## 3    Yes          33
## 4    Yes          10
## 5    Yes          28

Statistical tests:
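Wilcoxon rank sum test (i.e. the non-parametric counterpart of the unpaired t test, comparing a numeric or ordinal outcome between two groups)
Do students who report usually using LR access LR more? First as number of folder openings, then as number of days. (Order is test, mean, SEM.) NB prevLR appears to group “Maybe” with “Yes” for ML1.previous (18 + 27 = 45 students) versus “No” (52).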

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  access by prevLR
## W = 739, p-value = 0.00184
## alternative hypothesis: true location shift is not equal to 0

##   No  Yes 
## 37.9 56.2

## [1] 4.796
## [1] 5.033

##  No Yes 
## 4.8 5.0

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  access.days by prevLR
## W = 790, p-value = 0.005989
## alternative hypothesis: true location shift is not equal to 0

##    No   Yes 
## 14.96 19.82

## [1] 1.078
## [1] 1.312

##    No   Yes 
## 1.078 1.312
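
The test, mean and SEM sequence used above can be sketched on toy data as follows. The values are invented, and `sem` is written out here as SD / sqrt(n), which is an assumption about how the original helper is defined.

```r
# Invented toy data: total folder openings for two self-report groups
toy <- data.frame(
  prevLR = factor(rep(c("No", "Yes"), each = 6)),
  access = c(12, 20, 25, 31, 38, 40, 22, 45, 51, 60, 64, 80)
)

# Standard error of the mean (assumed definition)
sem <- function(x) sd(x) / sqrt(length(x))

# Two-sample Wilcoxon rank sum test (also known as the Mann-Whitney U test)
wt <- wilcox.test(access ~ prevLR, data = toy)
wt$p.value

# Group means and SEMs, reported alongside the test
grp.mean <- with(toy, tapply(access, prevLR, mean))
grp.sem  <- with(toy, tapply(access, prevLR, sem))
```

The formula interface `outcome ~ group` requires exactly two groups, which is why the three ML1.previous levels have to be collapsed into a two-level variable before testing.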

Do students who report usually using LR, fall into different clusters? (order is test, table, mean, sem)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  cluster3 by prevLR
## W = 1474, p-value = 0.01951
## alternative hypothesis: true location shift is not equal to 0

##       cluster3
## prevLR  1  2  3 Sum
##    No   9 19 24  52
##    Yes 16 17 12  45
##    Sum 25 36 36  97

##   No  Yes 
## 2.29 1.91

## [1] 0.104
## [1] 0.1182

##   No  Yes 
## 0.10 0.12

Kay’s patterns

1 Previous and did >3 y: yyyy ynyy yyny yyyn ynyn
2 Previous/intended, but did not: ynnn yynn ynny -> 3
3 No previous use, but then did or intended: nnyy nyyy nnny -> 2
4 No previous use, intention but not: nyny nynn
5 No report: nnnn
0 Don’t fit?

Load “Kay.gp.index.csv” (fixed for paper version of group names, i.e. 2 and 3 swapped)
clean - de-identified
merge into la.norm.qual

##   X25.06.14 cluster3 ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
## 1         0        1           No        Yes         No          No
## 2         0        2           No         No         No          No
## 3         0        1          Yes         No         No          No
## 4         0        3        Maybe         No      Maybe          No
## 5         0        1          Yes         No        Yes          No
##   total.no total.yes total.maybe total.noinfo access           pattern
## 1        3         1           0            0     21      No Yes No No
## 2        4         0           0            0     75       No No No No
## 3        3         1           0            0    122      Yes No No No
## 4        2         0           2            0     13 Maybe No Maybe No
## 5        2         2           0            0     89     Yes No Yes No
##   prevLR access.days Kay.pattern
## 1     No          15           4
## 2     No          26           5
## 3    Yes          33           3
## 4    Yes          10           0
## 5    Yes          28           1

##    
##     Maybe Maybe No No Maybe NA No No Maybe No Maybe No Maybe No NA No
##   0                 1              1                 2              1
##   1                 0              0                 0              0
##   2                 0              0                 0              0
##   3                 0              0                 0              0
##   4                 0              0                 0              0
##   5                 0              0                 0              0
##    
##     Maybe No No NA Maybe No No No Maybe No Yes NA Maybe No Yes No
##   0              1              0               0               0
##   1              0              0               1               2
##   2              0              0               0               0
##   3              0              0               0               0
##   4              0              0               0               0
##   5              0              9               0               0
##    
##     NA No No No NA No No Yes No Maybe No No No NA No No No No Maybe No
##   0           1            1              2           0              0
##   1           0            0              0           0              0
##   2           0            0              0           0              0
##   3           0            0              0           0              0
##   4           0            0              0           0              0
##   5           0            0              0           1              1
##    
##     No No Maybe Yes No No NA No No No No NA No No No No No No No Yes
##   0               0           1           0           0            0
##   1               0           0           0           0            0
##   2               1           0           0           0            1
##   3               0           0           0           0            0
##   4               0           0           0           0            0
##   5               0           0           2          29            0
##    
##     No No Yes NA No No Yes No No No Yes Yes No Yes No Maybe No Yes No No
##   0            0            0             0               0            0
##   1            0            0             0               0            0
##   2            2            2             1               0            0
##   3            0            0             0               0            0
##   4            0            0             0               1            4
##   5            0            0             0               0            0
##    
##     No Yes Yes Maybe No Yes Yes Yes Yes Maybe NA No Yes Maybe No No
##   0                0              0               1               1
##   1                0              0               0               0
##   2                1              3               0               0
##   3                0              0               0               0
##   4                0              0               0               0
##   5                0              0               0               0
##    
##     Yes No No Maybe Yes No No NA Yes No No No Yes No No Yes Yes No Yes NA
##   0               0            1            0             0             1
##   1               0            0            0             0             0
##   2               0            0            0             0             0
##   3               1            0            8             3             0
##   4               0            0            0             0             0
##   5               0            0            0             0             0
##    
##     Yes No Yes No Yes No Yes Yes Yes Yes No Yes Yes Yes Yes No
##   0             0              0              0              0
##   1             1              4              1              3
##   2             0              0              0              0
##   3             0              0              0              0
##   4             0              0              0              0
##   5             0              0              0              0
##    
##     Yes Yes Yes Yes
##   0               0
##   1               2
##   2               0
##   3               0
##   4               0
##   5               0

Alignment Kay gp with cluster3

##         Kay.pattern
## cluster3  0  1  2  3  4  5
##        1  3  5  3  5  1  9
##        2  2  8  6  4  4 12
##        3 10  1  2  3  0 21

What do the 3 clusters look like?

## 
##  1  2  3 
## 26 36 37

Alignment between 3 clusters and self report

##         ML1.previous
## cluster3 Maybe No Yes
##        1     6  9  10
##        2     5 19  12
##        3     7 24   5

##         ML2.planMS
## cluster3 Maybe No Yes
##        1     0 20   6
##        2     1 28   7
##        3     4 29   2

##         ML3.usedMS
## cluster3 Maybe No Yes
##        1     0 18   8
##        2     2 21  13
##        3     2 30   2

##         ML4.planEOS
## cluster3 Maybe No Yes
##        1     2 14   6
##        2     1 24   8
##        3     0 33   3

##         total.no
## cluster3       0       1       2       3       4     Sum
##      1   0.50000 0.16667 0.11905 0.15625 0.06897 0.13131
##      2   0.00000 0.23333 0.23810 0.17188 0.13793 0.18182
##      3   0.00000 0.10000 0.14286 0.17188 0.29310 0.18687
##      Sum 0.50000 0.50000 0.50000 0.50000 0.50000 0.50000

##         
## cluster3 FALSE TRUE Sum
##      1      12   14  26
##      2      17   19  36
##      3       9   28  37
##      Sum    38   61  99

##         total.yes
## cluster3  0  1  2  3  4
##        1 10  9  2  3  2
##        2 12 14  4  6  0
##        3 30  4  1  2  0

Kay’s rules for 3 groups:
1 = 3-4 y
2 = any 2 y + 2 n, 2n+y+m, 2y+n+m
3 = 3-4 n
0 = no info
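
These rules can be sketched as a small function of the response counts. The helper name `kay3.sketch` is hypothetical, and two points are assumptions rather than something stated in the rules: a student with no usable responses at all is coded 0, and every remaining mixed pattern is placed in group 2.

```r
# Hypothetical helper: assign the 3-group code from response counts.
# Assumptions: no usable responses -> 0; anything not in groups 1 or 3 -> 2.
kay3.sketch <- function(total.yes, total.no, total.maybe) {
  answered <- total.yes + total.no + total.maybe
  ifelse(answered == 0, 0,
  ifelse(total.yes >= 3, 1,
  ifelse(total.no >= 3, 3, 2)))
}

groups <- kay3.sketch(total.yes   = c(4, 2, 0, 1, 0),
                      total.no    = c(0, 2, 4, 3, 0),
                      total.maybe = c(0, 0, 0, 0, 0))
groups
```

With four questions per student, 3 or more Yes and 3 or more No cannot both occur, so the order of those two checks does not matter.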

##  [1] "3" "3" "3" ""  ""  "3" "3" "1" "3" "3" "3" ""  "1" ""  "3"

## 
##       1   2   3 Sum 
##   2  13  23  61  99

plot of chunk unnamed-chunk-44

## [1]  23 130

##    ML1.previous ML2.planMS ML3.usedMS ML4.planEOS total.no total.yes
## 4         Maybe         No      Maybe          No        2         0
## 5           Yes         No        Yes          No        2         2
## 12          Yes         No        Yes        <NA>        1         2
## 14        Maybe      Maybe         No          No        2         0
## 16           No        Yes        Yes       Maybe        1         2
## 18        Maybe         No        Yes          No        2         1
## 19          Yes         No         No         Yes        2         2
## 23        Maybe         No        Yes          No        2         1
## 25        Maybe         No      Maybe          No        2         0
## 33        Maybe       <NA>         No          No        2         0
## 42           No        Yes         No       Maybe        2         1
## 45          Yes         No         No         Yes        2         2
## 46           No         No        Yes        <NA>        2         1
## 53         <NA>         No         No         Yes        2         1
## 59        Maybe         No         No        <NA>        2         0
## 60          Yes         No         No        <NA>        2         1
## 63          Yes         No         No       Maybe        2         1
## 70           No         No      Maybe         Yes        2         1
## 71          Yes      Maybe         No          No        2         1
## 74           No         No        Yes        <NA>        2         1
## 79          Yes         No         No         Yes        2         2
## 82        Maybe         No       <NA>          No        2         0
## 99           No         No        Yes         Yes        2         2

## [1] 8636869

## [1] 60

##    StudentID ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
## 60  S8636869          Yes         No         No        <NA>

##    ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
## 1            No        Yes         No          No
## 2            No         No         No          No
## 3           Yes         No         No          No
## 4         Maybe         No      Maybe          No
## 5           Yes         No        Yes          No
## 6            No         No         No          No
## 7            No         No         No          No
## 8            No        Yes        Yes         Yes
## 9           Yes         No         No          No
## 10           No        Yes         No          No
## 11        Maybe         No         No          No
## 12          Yes         No        Yes        <NA>
## 13          Yes        Yes        Yes         Yes
## 14        Maybe      Maybe         No          No
## 15           No         No         No          No
## 16           No        Yes        Yes       Maybe
## 17          Yes         No         No          No
## 18        Maybe         No        Yes          No
## 19          Yes         No         No         Yes
## 20          Yes         No         No          No

## 
## FALSE  TRUE 
##   381    15

## 
## FALSE  TRUE 
##   351    30

##      cluster3
## Kay3   1  2  3 Sum
##        1  0  1   2
##   1    5  6  2  13
##   2    6 11  6  23
##   3   14 19 28  61
##   Sum 26 36 37  99

Trying a 2 cluster solution
Use la.norm to cluster lecture recording access

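# Columns 2 to 115 are the daily indicators; this leaves out StudentID and cluster3
distances = dist(la.norm[2:115], method = "euclidean")
# "ward.D" is the current name of the method that older R versions called "ward"
clusterLA = hclust(distances, method = "ward.D")
plot(clusterLA)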

plot of chunk unnamed-chunk-45

clusterGroups2 = cutree(clusterLA, k = 2)
la.norm$cluster2 = clusterGroups2
dim(la.norm)

## [1]  99 117

la.norm[1:5,1:5]

##   StudentID X4.03.14 X5.03.14 X6.03.14 X7.03.14
## 1  S6089847        1        1        0        0
## 3  S8117889        0        1        1        0
## 4  S8118323        1        0        0        0
## 5  S8152093        0        0        0        0
## 6  S8239113        1        1        0        0

la.norm[1:5,ncol(la.norm)]

## [1] 1 1 1 2 1

la.norm[1:5,115:117]

##   X25.06.14 cluster3 cluster2
## 1         0        1        1
## 3         0        2        1
## 4         0        1        1
## 5         0        3        2
## 6         0        1        1

addmargins(with(la.norm, table(cluster3, cluster2)))

##         cluster2
## cluster3  1  2 Sum
##      1   26  0  26
##      2   36  0  36
##      3    0 37  37
##      Sum 62 37  99

moving cluster2 over to la.norm.qual

Kay.office = la.norm
df = Kay.office 
dim(df)

## [1]  99 117

df = cbind(df$StudentID, df[117])
dim(df)

## [1] 99  2

df[1:5,]

##   df$StudentID cluster2
## 1     S6089847        1
## 3     S8117889        1
## 4     S8118323        1
## 5     S8152093        2
## 6     S8239113        1

Kay.office = df
dim(Kay.office)

## [1] 99  2

Kay.office[1:5,]

##   df$StudentID cluster2
## 1     S6089847        1
## 3     S8117889        1
## 4     S8118323        1
## 5     S8152093        2
## 6     S8239113        1

names(Kay.office) = c("StudentID", "cluster2")

la.norm.qual = merge(la.norm.qual, Kay.office, by="StudentID")

dim(la.norm.qual)

## [1]  99 131
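
The copy, cbind, rename and merge steps above can be shortened by selecting the two columns by name. This is an alternative sketch on invented toy data, not the code that was run.

```r
# Toy stand-ins for la.norm and la.norm.qual (values invented)
la.norm <- data.frame(StudentID = c("S1", "S2", "S3"),
                      X4.03.14  = c(1, 0, 1),
                      cluster2  = c(1, 2, 1),
                      stringsAsFactors = FALSE)
la.norm.qual <- data.frame(StudentID = c("S3", "S1", "S2"),
                           access    = c(40, 21, 13),
                           stringsAsFactors = FALSE)

# Select the key and the new column by name, then merge on the shared key
merged <- merge(la.norm.qual, la.norm[, c("StudentID", "cluster2")], by = "StudentID")
merged
```

Selecting by name avoids the awkward `df$StudentID` column name produced by `cbind`, and it keeps working if the column positions change.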

la.norm.qual[1:5,1:5]

##   StudentID X4.03.14 X5.03.14 X6.03.14 X7.03.14
## 1  S6089847        1        1        0        0
## 2  S8117889        0        1        1        0
## 3  S8118323        1        0        0        0
## 4  S8152093        0        0        0        0
## 5  S8239113        1        1        0        0

la.norm.qual[1:5,125:131]

##   access           pattern prevLR access.days Kay.pattern Kay3 cluster2
## 1     21      No Yes No No     No          15           4    3        1
## 2     75       No No No No     No          26           5    3        1
## 3    122      Yes No No No    Yes          33           3    3        1
## 4     13 Maybe No Maybe No    Yes          10           0    2        2
## 5     89     Yes No Yes No    Yes          28           1    2        1

with(la.norm.qual, table(Kay3, cluster2))

##     cluster2
## Kay3  1  2
##       1  1
##    1 11  2
##    2 17  6
##    3 33 28

addmargins(with(la.norm.qual, table(total.yes, cluster2)))

##          cluster2
## total.yes  1  2 Sum
##       0   22 30  52
##       1   23  4  27
##       2    6  1   7
##       3    9  2  11
##       4    2  0   2
##       Sum 62 37  99

with(la.norm.qual, tapply(access, cluster2, mean))

##     1     2 
## 62.50 18.78

with(la.norm.qual, tapply(access, cluster2, sem))

## [1] 4.276
## [1] 2.205

##     1     2 
## 4.276 2.205

with(la.norm.qual, tapply(access.days, cluster2, mean))

##      1      2 
## 21.855  9.243

with(la.norm.qual, tapply(access.days, cluster2, sem))

## [1] 0.8915
## [1] 0.57

##      1      2 
## 0.8915 0.5700

la.norm.qual[1:5,115:131]

##   X25.06.14 cluster3 ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
## 1         0        1           No        Yes         No          No
## 2         0        2           No         No         No          No
## 3         0        1          Yes         No         No          No
## 4         0        3        Maybe         No      Maybe          No
## 5         0        1          Yes         No        Yes          No
##   total.no total.yes total.maybe total.noinfo access           pattern
## 1        3         1           0            0     21      No Yes No No
## 2        4         0           0            0     75       No No No No
## 3        3         1           0            0    122      Yes No No No
## 4        2         0           2            0     13 Maybe No Maybe No
## 5        2         2           0            0     89     Yes No Yes No
##   prevLR access.days Kay.pattern Kay3 cluster2
## 1     No          15           4    3        1
## 2     No          26           5    3        1
## 3    Yes          33           3    3        1
## 4    Yes          10           0    2        2
## 5    Yes          28           1    2        1

addmargins(with(la.norm.qual, table(ML1.previous, cluster2)))

##             cluster2
## ML1.previous  1  2 Sum
##        Maybe 11  7  18
##        No    28 24  52
##        Yes   22  5  27
##        Sum   61 36  97

addmargins(with(la.norm.qual, table(ML2.planMS, cluster2)))

##           cluster2
## ML2.planMS  1  2 Sum
##      Maybe  1  4   5
##      No    48 29  77
##      Yes   13  2  15
##      Sum   62 35  97

addmargins(with(la.norm.qual, table(ML3.usedMS, cluster2)))

##           cluster2
## ML3.usedMS  1  2 Sum
##      Maybe  2  2   4
##      No    39 30  69
##      Yes   21  2  23
##      Sum   62 34  96

addmargins(with(la.norm.qual, table(ML4.planEOS, cluster2)))

##            cluster2
## ML4.planEOS  1  2 Sum
##       Maybe  3  0   3
##       No    38 33  71
##       Yes   14  3  17
##       Sum   55 36  91

addmargins(with(la.norm.qual, table(ML1.previous == "Yes", cluster2)))

##        cluster2
##          1  2 Sum
##   FALSE 39 31  70
##   TRUE  22  5  27
##   Sum   61 36  97

addmargins(with(la.norm.qual, table(ML2.planMS == "Yes", cluster2)))

##        cluster2
##          1  2 Sum
##   FALSE 49 33  82
##   TRUE  13  2  15
##   Sum   62 35  97

addmargins(with(la.norm.qual, table(ML3.usedMS == "Yes", cluster2)))

##        cluster2
##          1  2 Sum
##   FALSE 41 32  73
##   TRUE  21  2  23
##   Sum   62 34  96

addmargins(with(la.norm.qual, table(ML4.planEOS == "Yes", cluster2)))

##        cluster2
##          1  2 Sum
##   FALSE 41 33  74
##   TRUE  14  3  17
##   Sum   55 36  91
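The raw counts above are easier to compare as proportions within each cluster. A minimal sketch, re-entering the `ML3.usedMS == "Yes"` counts by hand; the object names `tab` and `col_prop` are illustrative and not from the original analysis:

```r
# Counts copied from the ML3.usedMS == "Yes" table above (without the Sum margins)
tab <- matrix(c(41, 21, 32, 2), nrow = 2,
              dimnames = list(usedMS.yes = c("FALSE", "TRUE"),
                              cluster2   = c("1", "2")))

# Proportion of each cluster that answered "Yes" (columns sum to 1)
col_prop <- prop.table(tab, margin = 2)
round(col_prop, 3)
```

On the real data the same call would be `prop.table(with(la.norm.qual, table(ML3.usedMS == "Yes", cluster2)), margin = 2)`.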

Then run calendarHeat for both clusters…

## [1]  62 131

## [1]  37 131

## [1]  62 115

##   StudentID X4.03.14 X5.03.14 X6.03.14 X7.03.14
## 1  S6089847        1        1        0        0
## 2  S8117889        0        1        1        0
## 3  S8118323        1        0        0        0
## 5  S8239113        1        1        0        0
## 6  S8283571        0        1        0        0

##   X20.06.14 X21.06.14 X22.06.14 X23.06.14 X24.06.14 X25.06.14
## 1         0         0         0         0         0         0
## 2         1         1         1         1         1         0
## 3         1         0         0         0         0         0
## 5         0         0         0         0         0         0
## 6         0         0         0         0         1         0

## [1] "matrix"

## [1] 115  62

##           1          2          3          5          6         
## StudentID "S6089847" "S8117889" "S8118323" "S8239113" "S8283571"
## X4.03.14  "1"        "0"        "1"        "1"        "0"       
## X5.03.14  "1"        "1"        "0"        "1"        "1"       
## X6.03.14  "0"        "1"        "0"        "0"        "0"       
## X7.03.14  "0"        "0"        "0"        "0"        "0"

##           79         81         86         88         90        
## StudentID "S8643917" "S8644267" "S8646161" "S8646489" "S8647069"
## X4.03.14  "0"        "0"        "1"        "1"        "1"       
## X5.03.14  "1"        "0"        "1"        "1"        "1"       
## X6.03.14  "1"        "1"        "1"        "0"        "0"       
## X7.03.14  "0"        "0"        "0"        "0"        "0"       
##           95         98         99        
## StudentID "S8648397" "S8651655" "S8651793"
## X4.03.14  "0"        "0"        "1"       
## X5.03.14  "0"        "0"        "0"       
## X6.03.14  "0"        "1"        "1"       
## X7.03.14  "0"        "0"        "0"

## [1] "data.frame"

## [1] 115  62

##                  1        2        3        5        6
## StudentID S6089847 S8117889 S8118323 S8239113 S8283571
## X4.03.14         1        0        1        1        0
## X5.03.14         1        1        0        1        1
## X6.03.14         0        1        0        0        0
## X7.03.14         0        0        0        0        0

##                 79       81       86       88       90       95       98
## StudentID S8643917 S8644267 S8646161 S8646489 S8647069 S8648397 S8651655
## X4.03.14         0        0        1        1        1        0        0
## X5.03.14         1        0        1        1        1        0        0
## X6.03.14         1        1        1        0        0        0        1
## X7.03.14         0        0        0        0        0        0        0
##                 99
## StudentID S8651793
## X4.03.14         1
## X5.03.14         0
## X6.03.14         1
## X7.03.14         0

##          79 81 86 88 90 95 98 99    Dates
## X4.03.14  0  0  1  1  1  0  0  1 X4.03.14
## X5.03.14  1  0  1  1  1  0  0  0 X5.03.14
## X6.03.14  1  1  1  0  0  0  1  1 X6.03.14
## X7.03.14  0  0  0  0  0  0  0  0 X7.03.14

## [1] "data.frame"

## [1] 114  63

##          1 2 3 5 6
## X4.03.14 1 0 1 1 0
## X5.03.14 1 1 0 1 1
## X6.03.14 0 1 0 0 0
## X7.03.14 0 0 0 0 0
## X8.03.14 0 0 0 1 0

##           79 81 86 88 90 95 98 99     Dates
## X4.03.14   0  0  1  1  1  0  0  1  X4.03.14
## X5.03.14   1  0  1  1  1  0  0  0  X5.03.14
## X6.03.14   1  1  1  0  0  0  1  1  X6.03.14
## X7.03.14   0  0  0  0  0  0  0  0  X7.03.14
## X8.03.14   0  0  0  0  0  0  0  0  X8.03.14
## X9.03.14   0  0  0  0  0  0  0  0  X9.03.14
## X10.03.14  0  1  0  0  1  0  0  0 X10.03.14
## X11.03.14  0  1  0  1  0  0  0  0 X11.03.14
## X12.03.14  1  0  0  0  0  0  0  0 X12.03.14
## X13.03.14  0  0  0  0  0  0  1  0 X13.03.14
## X14.03.14  1  0  0  0  0  0  0  0 X14.03.14
## X15.03.14  0  0  0  0  0  0  0  0 X15.03.14
## X16.03.14  0  0  0  0  0  0  0  0 X16.03.14
## X17.03.14  0  0  0  0  0  0  0  0 X17.03.14
## X18.03.14  1  0  0  0  0  1  0  0 X18.03.14

##           79 81 86 88 90 95 98 99     Dates   Dates2
## X4.03.14   0  0  1  1  1  0  0  1  X4.03.14  4.03.14
## X5.03.14   1  0  1  1  1  0  0  0  X5.03.14  5.03.14
## X6.03.14   1  1  1  0  0  0  1  1  X6.03.14  6.03.14
## X7.03.14   0  0  0  0  0  0  0  0  X7.03.14  7.03.14
## X8.03.14   0  0  0  0  0  0  0  0  X8.03.14  8.03.14
## X9.03.14   0  0  0  0  0  0  0  0  X9.03.14  9.03.14
## X10.03.14  0  1  0  0  1  0  0  0 X10.03.14 10.03.14
## X11.03.14  0  1  0  1  0  0  0  0 X11.03.14 11.03.14
## X12.03.14  1  0  0  0  0  0  0  0 X12.03.14 12.03.14
## X13.03.14  0  0  0  0  0  0  1  0 X13.03.14 13.03.14
## X14.03.14  1  0  0  0  0  0  0  0 X14.03.14 14.03.14
## X15.03.14  0  0  0  0  0  0  0  0 X15.03.14 15.03.14
## X16.03.14  0  0  0  0  0  0  0  0 X16.03.14 16.03.14
## X17.03.14  0  0  0  0  0  0  0  0 X17.03.14 17.03.14
## X18.03.14  1  0  0  0  0  1  0  0 X18.03.14 18.03.14

##  chr [1:114] "4.03.14" "5.03.14" "6.03.14" "7.03.14" ...

##           79 81 86 88 90 95 98 99     Dates     Dates2
## X4.03.14   0  0  1  1  1  0  0  1  X4.03.14 2014-03-04
## X5.03.14   1  0  1  1  1  0  0  0  X5.03.14 2014-03-05
## X6.03.14   1  1  1  0  0  0  1  1  X6.03.14 2014-03-06
## X7.03.14   0  0  0  0  0  0  0  0  X7.03.14 2014-03-07
## X8.03.14   0  0  0  0  0  0  0  0  X8.03.14 2014-03-08
## X9.03.14   0  0  0  0  0  0  0  0  X9.03.14 2014-03-09
## X10.03.14  0  1  0  0  1  0  0  0 X10.03.14 2014-03-10
## X11.03.14  0  1  0  1  0  0  0  0 X11.03.14 2014-03-11
## X12.03.14  1  0  0  0  0  0  0  0 X12.03.14 2014-03-12
## X13.03.14  0  0  0  0  0  0  1  0 X13.03.14 2014-03-13
## X14.03.14  1  0  0  0  0  0  0  0 X14.03.14 2014-03-14
## X15.03.14  0  0  0  0  0  0  0  0 X15.03.14 2014-03-15
## X16.03.14  0  0  0  0  0  0  0  0 X16.03.14 2014-03-16
## X17.03.14  0  0  0  0  0  0  0  0 X17.03.14 2014-03-17
## X18.03.14  1  0  0  0  0  1  0  0 X18.03.14 2014-03-18

##           79 81 86 88 90 95 98 99     Dates     Dates2
## X4.03.14   0  0  1  1  1  0  0  1  X4.03.14 2014-03-04
## X5.03.14   1  0  1  1  1  0  0  0  X5.03.14 2014-03-05
## X6.03.14   1  1  1  0  0  0  1  1  X6.03.14 2014-03-06
## X7.03.14   0  0  0  0  0  0  0  0  X7.03.14 2014-03-07
## X8.03.14   0  0  0  0  0  0  0  0  X8.03.14 2014-03-08
## X9.03.14   0  0  0  0  0  0  0  0  X9.03.14 2014-03-09
## X10.03.14  0  1  0  0  1  0  0  0 X10.03.14 2014-03-10
## X11.03.14  0  1  0  1  0  0  0  0 X11.03.14 2014-03-11
## X12.03.14  1  0  0  0  0  0  0  0 X12.03.14 2014-03-12
## X13.03.14  0  0  0  0  0  0  1  0 X13.03.14 2014-03-13
## X14.03.14  1  0  0  0  0  0  0  0 X14.03.14 2014-03-14
## X15.03.14  0  0  0  0  0  0  0  0 X15.03.14 2014-03-15
## X16.03.14  0  0  0  0  0  0  0  0 X16.03.14 2014-03-16
## X17.03.14  0  0  0  0  0  0  0  0 X17.03.14 2014-03-17
## X18.03.14  1  0  0  0  0  1  0  0 X18.03.14 2014-03-18

## 'data.frame':    114 obs. of  5 variables:
##  $ 1: chr  "1" "1" "0" "0" ...
##  $ 2: chr  "0" "1" "1" "0" ...
##  $ 3: chr  "1" "0" "0" "0" ...
##  $ 5: chr  "1" "1" "0" "0" ...
##  $ 6: chr  "0" "1" "0" "0" ...

##           79 81 86 88 90 95 98 99     Dates     Dates2
## X4.03.14   0  0  1  1  1  0  0  1  X4.03.14 2014-03-04
## X5.03.14   1  0  1  1  1  0  0  0  X5.03.14 2014-03-05
## X6.03.14   1  1  1  0  0  0  1  1  X6.03.14 2014-03-06
## X7.03.14   0  0  0  0  0  0  0  0  X7.03.14 2014-03-07
## X8.03.14   0  0  0  0  0  0  0  0  X8.03.14 2014-03-08
## X9.03.14   0  0  0  0  0  0  0  0  X9.03.14 2014-03-09
## X10.03.14  0  1  0  0  1  0  0  0 X10.03.14 2014-03-10
## X11.03.14  0  1  0  1  0  0  0  0 X11.03.14 2014-03-11
## X12.03.14  1  0  0  0  0  0  0  0 X12.03.14 2014-03-12
## X13.03.14  0  0  0  0  0  0  1  0 X13.03.14 2014-03-13
## X14.03.14  1  0  0  0  0  0  0  0 X14.03.14 2014-03-14
## X15.03.14  0  0  0  0  0  0  0  0 X15.03.14 2014-03-15
## X16.03.14  0  0  0  0  0  0  0  0 X16.03.14 2014-03-16
## X17.03.14  0  0  0  0  0  0  0  0 X17.03.14 2014-03-17
## X18.03.14  1  0  0  0  0  1  0  0 X18.03.14 2014-03-18

## 'data.frame':    114 obs. of  5 variables:
##  $ 1: num  1 1 0 0 0 0 0 0 0 1 ...
##  $ 2: num  0 1 1 0 0 1 0 0 1 0 ...
##  $ 3: num  1 0 0 0 0 1 1 1 0 1 ...
##  $ 5: num  1 1 0 0 1 0 0 0 0 0 ...
##  $ 6: num  0 1 0 0 0 0 0 0 0 0 ...

## [1] 114  65

##           79 81 86 88 90 95 98 99     Dates     Dates2 Total
## X4.03.14   0  0  1  1  1  0  0  1  X4.03.14 2014-03-04    29
## X5.03.14   1  0  1  1  1  0  0  0  X5.03.14 2014-03-05    32
## X6.03.14   1  1  1  0  0  0  1  1  X6.03.14 2014-03-06    17
## X7.03.14   0  0  0  0  0  0  0  0  X7.03.14 2014-03-07     6
## X8.03.14   0  0  0  0  0  0  0  0  X8.03.14 2014-03-08    10
## X9.03.14   0  0  0  0  0  0  0  0  X9.03.14 2014-03-09    11
## X10.03.14  0  1  0  0  1  0  0  0 X10.03.14 2014-03-10    15
## X11.03.14  0  1  0  1  0  0  0  0 X11.03.14 2014-03-11    21
## X12.03.14  1  0  0  0  0  0  0  0 X12.03.14 2014-03-12    16
## X13.03.14  0  0  0  0  0  0  1  0 X13.03.14 2014-03-13     7
## X14.03.14  1  0  0  0  0  0  0  0 X14.03.14 2014-03-14     7
## X15.03.14  0  0  0  0  0  0  0  0 X15.03.14 2014-03-15     6
## X16.03.14  0  0  0  0  0  0  0  0 X16.03.14 2014-03-16    10
## X17.03.14  0  0  0  0  0  0  0  0 X17.03.14 2014-03-17    17
## X18.03.14  1  0  0  0  0  1  0  0 X18.03.14 2014-03-18    19
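The code that produced the outputs above is not echoed. A sketch of steps that would give the same sequence of shapes (transpose, drop the StudentID row, convert to numeric, parse the dates, add a daily total), shown on toy data; all object names here are assumptions:

```r
# Toy wide data: one row per student, one column per day (names as in the report)
wide <- data.frame(StudentID = c("S1", "S2", "S3"),
                   X4.03.14 = c(1, 0, 1),
                   X5.03.14 = c(1, 1, 0),
                   X6.03.14 = c(0, 1, 0),
                   stringsAsFactors = FALSE)

# t() on a mixed data frame returns a character matrix, as seen in the output above
m <- t(wide)

# Drop the StudentID row and use the IDs as column names instead
long <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(long) <- wide$StudentID

# Convert the character columns back to numbers
long[] <- lapply(long, as.numeric)

# Keep the raw label, then strip the leading "X" and parse day.month.year
long$Dates  <- rownames(long)
long$Dates2 <- as.Date(sub("^X", "", long$Dates), format = "%d.%m.%y")

# Daily total across students (only the student columns are summed)
long$Total <- rowSums(long[, wide$StudentID])
long
```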

[Figure: calendarHeat plot of daily Total access, cluster 1]

## [1] 114  39

##          94 96 97    Dates     Dates2
## X4.03.14  1  0  1 X4.03.14 2014-03-04
## X5.03.14  0  0  0 X5.03.14 2014-03-05
## X6.03.14  0  0  0 X6.03.14 2014-03-06
## X7.03.14  0  0  0 X7.03.14 2014-03-07
## X8.03.14  0  0  0 X8.03.14 2014-03-08

## [1] 114  40

##           94 96 97     Dates     Dates2 Total
## X4.03.14   1  0  1  X4.03.14 2014-03-04    10
## X5.03.14   0  0  0  X5.03.14 2014-03-05     8
## X6.03.14   0  0  0  X6.03.14 2014-03-06     2
## X7.03.14   0  0  0  X7.03.14 2014-03-07     2
## X8.03.14   0  0  0  X8.03.14 2014-03-08     1
## X9.03.14   0  0  0  X9.03.14 2014-03-09     2
## X10.03.14  0  0  0 X10.03.14 2014-03-10     3
## X11.03.14  0  0  0 X11.03.14 2014-03-11     3
## X12.03.14  1  0  0 X12.03.14 2014-03-12     9
## X13.03.14  0  0  1 X13.03.14 2014-03-13     3
## X14.03.14  0  0  0 X14.03.14 2014-03-14     1
## X15.03.14  0  0  0 X15.03.14 2014-03-15     0
## X16.03.14  0  0  0 X16.03.14 2014-03-16     1
## X17.03.14  0  0  0 X17.03.14 2014-03-17     4
## X18.03.14  0  0  0 X18.03.14 2014-03-18     7

[Figure: calendarHeat plot of daily Total access, cluster 2]

Kay’s email Wed 13 Aug 2014
Low: n = 37, 18.8 ± 2.2, 9.2 ± 0.57
Meta-learning response (did and/or intended to access): n, mean ± SEM
yes: 47, 66 ± 5.6
0 yes: 52, 28.23 ± 2.5 **

dim(la.norm.qual)

## [1]  99 131

with(la.norm.qual, tapply(access, total.yes == 0, mean))

## FALSE  TRUE 
## 66.00 28.23

with(la.norm.qual, tapply(access, total.yes == 0, sem))

## [1] 5.591
## [1] 2.542

## FALSE  TRUE 
## 5.591 2.542

with(la.norm.qual, tapply(access.days, total.yes == 0, mean))

## FALSE  TRUE 
## 21.30 13.38

with(la.norm.qual, tapply(access.days, total.yes == 0, sem))

## [1] 1.273
## [1] 0.8848

##  FALSE   TRUE 
## 1.2727 0.8848

with(la.norm.qual, table(total.yes == 0, cluster2))

##        cluster2
##          1  2
##   FALSE 40  7
##   TRUE  22 30

Correlations

Days before the due date for ML1-4 and Ass, plus AcP for the course (and Ass), plus qual categories 3 and 5 (up to 4 types of each, so ordinal data). Uses MLsub, which has the ML1-4 submission and due dates and the time differences (should be loaded into the global environment; if not, there are some indications of the code in markup v1).
clean - consent

dim(ci)

## [1] 231   2

MLsub = NULL
MLsub = read.csv("MLsub.csv")
dim(MLsub)

## [1] 876  11

MLsub[,11] = NULL
MLsub[,1] = NULL

str(MLsub)

## 'data.frame':    876 obs. of  9 variables:
##  $ StudentID: Factor w/ 230 levels "s3044923","s361850",..: 56 89 31 148 19 198 136 206 137 68 ...
##  $ Date     : Factor w/ 33 levels "1/06/14","10/04/14",..: 19 20 20 19 21 18 14 16 22 16 ...
##  $ Submitted: Factor w/ 866 levels "0:01:00","0:03:09",..: 586 304 427 490 144 244 571 260 218 114 ...
##  $ Duration : Factor w/ 785 levels "","0:00:42","0:01:07",..: 73 220 718 697 207 684 541 280 611 435 ...
##  $ MLtask   : Factor w/ 4 levels "ML1","ML2","ML3",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Open     : Factor w/ 4 levels "19/03/14","27/05/14",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Due      : Factor w/ 4 levels "14/05/14","16/04/14",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ SubDT    : Factor w/ 850 levels "1/06/14 0:28",..: 476 491 494 468 510 441 338 386 543 378 ...
##  $ DueDT    : Factor w/ 4 levels "14/05/14 17:00",..: 3 3 3 3 3 3 3 3 3 3 ...

MLsub$SubDT = as.character(MLsub$SubDT)
MLsub$DueDT = as.character(MLsub$DueDT)

str(MLsub)

## 'data.frame':    876 obs. of  9 variables:
##  $ StudentID: Factor w/ 230 levels "s3044923","s361850",..: 56 89 31 148 19 198 136 206 137 68 ...
##  $ Date     : Factor w/ 33 levels "1/06/14","10/04/14",..: 19 20 20 19 21 18 14 16 22 16 ...
##  $ Submitted: Factor w/ 866 levels "0:01:00","0:03:09",..: 586 304 427 490 144 244 571 260 218 114 ...
##  $ Duration : Factor w/ 785 levels "","0:00:42","0:01:07",..: 73 220 718 697 207 684 541 280 611 435 ...
##  $ MLtask   : Factor w/ 4 levels "ML1","ML2","ML3",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Open     : Factor w/ 4 levels "19/03/14","27/05/14",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Due      : Factor w/ 4 levels "14/05/14","16/04/14",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ SubDT    : chr  "23/03/14 20:43" "24/03/14 14:31" "24/03/14 17:06" "23/03/14 18:28" ...
##  $ DueDT    : chr  "26/03/14 17:00" "26/03/14 17:00" "26/03/14 17:00" "26/03/14 17:00" ...

#MLsub[1:3,]

#dtp = as.POSIXct(dt, format = "%d/%m/%Y %H:%M:%S", tz="UTC")
#name the format argument: the second positional argument of as.POSIXct is tz
MLsub$SubDT = as.POSIXct(MLsub$SubDT, format = "%d/%m/%y %H:%M", tz="UTC")
MLsub$DueDT = as.POSIXct(MLsub$DueDT, format = "%d/%m/%y %H:%M", tz="UTC")
#MLsub[1:3,]

MLsub$Earliness = difftime(MLsub$DueDT, MLsub$SubDT)
#MLsub[1:3,]

#MLsub[1:5,1:5]
MLsub.names = names(MLsub)
MLsub.names

##  [1] "StudentID" "Date"      "Submitted" "Duration"  "MLtask"   
##  [6] "Open"      "Due"       "SubDT"     "DueDT"     "Earliness"

MLsub.names[1] = "StudentID"
names(MLsub) = MLsub.names
#MLsub[1:5,1:5]

clean - De-ID

## [1] 876  10

##   StudentID     Date Submitted Duration MLtask     Open      Due
## 1  S8579275 23/03/14  20:43:42  0:09:16    ML1 19/03/14 26/03/14
## 2  S8587419 24/03/14  14:31:37  0:16:02    ML1 19/03/14 26/03/14
## 3  S8530605 24/03/14  17:06:10 47:58:06    ML1 19/03/14 26/03/14
## 4  S8636955 23/03/14  18:28:04  4:40:22    ML1 19/03/14 26/03/14
## 5  S8475915 25/03/14  12:00:45  0:15:28    ML1 19/03/14 26/03/14
##                 SubDT               DueDT Earliness
## 1 2014-03-23 20:43:00 2014-03-26 17:00:00 4097 mins
## 2 2014-03-24 14:31:00 2014-03-26 17:00:00 3029 mins
## 3 2014-03-24 17:06:00 2014-03-26 17:00:00 2874 mins
## 4 2014-03-23 18:28:00 2014-03-26 17:00:00 4232 mins
## 5 2014-03-25 12:00:00 2014-03-26 17:00:00 1740 mins

str(MLsub)

## 'data.frame':    876 obs. of  10 variables:
##  $ StudentID: chr  "S8579275" "S8587419" "S8530605" "S8636955" ...
##  $ Date     : Factor w/ 33 levels "1/06/14","10/04/14",..: 19 20 20 19 21 18 14 16 22 16 ...
##  $ Submitted: Factor w/ 866 levels "0:01:00","0:03:09",..: 586 304 427 490 144 244 571 260 218 114 ...
##  $ Duration : Factor w/ 785 levels "","0:00:42","0:01:07",..: 73 220 718 697 207 684 541 280 611 435 ...
##  $ MLtask   : Factor w/ 4 levels "ML1","ML2","ML3",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Open     : Factor w/ 4 levels "19/03/14","27/05/14",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Due      : Factor w/ 4 levels "14/05/14","16/04/14",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ SubDT    : POSIXct, format: "2014-03-23 20:43:00" "2014-03-24 14:31:00" ...
##  $ DueDT    : POSIXct, format: "2014-03-26 17:00:00" "2014-03-26 17:00:00" ...
##  $ Earliness:Class 'difftime'  atomic [1:876] 4097 3029 2874 4232 1740 ...
##   .. ..- attr(*, "tzone")= chr "UTC"
##   .. ..- attr(*, "units")= chr "mins"

dim(MLsub)

## [1] 876  10

require(lubridate)

## Loading required package: lubridate
## 
## Attaching package: 'lubridate'
## 
## The following objects are masked from 'package:chron':
## 
##     days, hours, minutes, seconds, years

mean(MLsub$Earliness)

## Time difference of 5747 mins

mean(difftime(MLsub$DueDT, MLsub$SubDT, units = "hours"))

## Time difference of 95.78 hours

sem(difftime(MLsub$DueDT, MLsub$SubDT, units = "hours"))

## [1] 2.028

mean(difftime(MLsub$DueDT, MLsub$SubDT, units = "days"))

## Time difference of 3.991 days

sem(difftime(MLsub$DueDT, MLsub$SubDT, units = "days"))

## [1] 0.08452
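When no `units` argument is given, `difftime` chooses the unit from the data, which is why `Earliness` is in minutes while the later calls ask for hours or days. A small sketch using the first ML1 row shown earlier (submitted 23/03/14 20:43, due 26/03/14 17:00); the object names are illustrative:

```r
sub <- as.POSIXct("23/03/14 20:43", format = "%d/%m/%y %H:%M", tz = "UTC")
due <- as.POSIXct("26/03/14 17:00", format = "%d/%m/%y %H:%M", tz = "UTC")

# Asking for the unit explicitly keeps results comparable across data sets
early_mins  <- as.numeric(difftime(due, sub, units = "mins"))
early_hours <- as.numeric(difftime(due, sub, units = "hours"))

early_mins    # 4097, matching the Earliness column for this row
early_hours   # about 68.28, matching Early.hr
```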

MLsub[1:5,1:3]

##   StudentID     Date Submitted
## 1  S8579275 23/03/14  20:43:42
## 2  S8587419 24/03/14  14:31:37
## 3  S8530605 24/03/14  17:06:10
## 4  S8636955 23/03/14  18:28:04
## 5  S8475915 25/03/14  12:00:45

correlations within ML submission to check
want ML1 vs 2 vs 3 vs 4 for Earliness
so ML1…4 need to be columns and StudentID the rows -> too hard to transform data, try boxplot for consistency instead
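One possible way to get the wide layout described in the note is `tapply` with two grouping variables, which returns a StudentID by MLtask matrix directly. This is a sketch on toy data, not the approach used in the report; taking the mean where a student has several submissions for one task is an assumption:

```r
# Toy long data: one row per submission
toy <- data.frame(
  StudentID    = rep(c("S1", "S2", "S3", "S4"), each = 2),
  MLtask       = rep(c("ML1", "ML2"), times = 4),
  Early.hr.num = c(68.3, 50.5, 47.9, 70.5, 10.0, 20.0, 100.0, 90.0),
  stringsAsFactors = FALSE
)

# Rows = students, columns = ML tasks; missing combinations become NA
wide <- with(toy, tapply(Early.hr.num, list(StudentID, MLtask), mean))
wide

# Correlations between tasks, ignoring students with a missing task pairwise
cor(wide, use = "pairwise.complete.obs")
```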

str(MLsub)

## 'data.frame':    876 obs. of  10 variables:
##  $ StudentID: chr  "S8579275" "S8587419" "S8530605" "S8636955" ...
##  $ Date     : Factor w/ 33 levels "1/06/14","10/04/14",..: 19 20 20 19 21 18 14 16 22 16 ...
##  $ Submitted: Factor w/ 866 levels "0:01:00","0:03:09",..: 586 304 427 490 144 244 571 260 218 114 ...
##  $ Duration : Factor w/ 785 levels "","0:00:42","0:01:07",..: 73 220 718 697 207 684 541 280 611 435 ...
##  $ MLtask   : Factor w/ 4 levels "ML1","ML2","ML3",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Open     : Factor w/ 4 levels "19/03/14","27/05/14",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Due      : Factor w/ 4 levels "14/05/14","16/04/14",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ SubDT    : POSIXct, format: "2014-03-23 20:43:00" "2014-03-24 14:31:00" ...
##  $ DueDT    : POSIXct, format: "2014-03-26 17:00:00" "2014-03-26 17:00:00" ...
##  $ Earliness:Class 'difftime'  atomic [1:876] 4097 3029 2874 4232 1740 ...
##   .. ..- attr(*, "tzone")= chr "UTC"
##   .. ..- attr(*, "units")= chr "mins"

MLsub$Early.hr = difftime(MLsub$DueDT, MLsub$SubDT, units = "hours")
MLsub$Early.hr[1:5]

## Time differences in hours
## [1] 68.28 50.48 47.90 70.53 29.00

MLsub$Early.hr.num = as.numeric(MLsub$Early.hr)

boxplot(Early.hr.num ~ MLtask, data=MLsub)

[Figure: boxplot of Early.hr.num by MLtask]

#check if there is a difference in earliness between ML tasks...

aov.out = NULL
aov.out = aov(Early.hr.num ~ MLtask * StudentID + Error(StudentID), data=MLsub)
summary(aov.out)

## 
## Error: StudentID
##            Df  Sum Sq Mean Sq
## MLtask      3    3420    1140
## StudentID 226 1885492    8343
## 
## Error: Within
##                   Df  Sum Sq Mean Sq F value Pr(>F)  
## MLtask             3   88890   29630   10.79  0.013 *
## MLtask:StudentID 638 1162107    1821    0.66  0.815  
## Residuals          5   13736    2747                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
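In the table above StudentID appears both as a model term and as the error stratum, which leaves only 5 residual degrees of freedom in the Within stratum. A more conventional repeated-measures specification, offered here as an alternative sketch on simulated data rather than as the author's method, keeps StudentID only in the `Error()` term:

```r
set.seed(1)
# Simulated balanced data: 6 students, each with one submission per ML task
toy <- data.frame(
  StudentID    = factor(rep(paste0("S", 1:6), each = 4)),
  MLtask       = factor(rep(paste0("ML", 1:4), times = 6)),
  Early.hr.num = round(runif(24, 0, 170), 1)
)

# MLtask is the within-student factor; students define the error strata
fit <- aov(Early.hr.num ~ MLtask + Error(StudentID / MLtask), data = toy)
summary(fit)
```

This layout assumes one observation per student per task, so duplicate submissions would need to be resolved first.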

#now sig difference...
library(car)

with(MLsub, pairwise.t.test(Early.hr.num, MLtask, p.adjust.method = "bonferroni"))

## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  Early.hr.num and MLtask 
## 
##     ML1   ML2     ML3  
## ML2 0.241 -       -    
## ML3 1.000 0.148   -    
## ML4 0.027 8.5e-06 0.053
## 
## P value adjustment method: bonferroni

with(MLsub, tapply(Early.hr.num, MLtask, mean))

##    ML1    ML2    ML3    ML4 
##  94.34  82.72  95.53 110.45

with(MLsub, tapply(Early.hr.num, MLtask, sem))

## [1] 3.686
## [1] 3.858
## [1] 3.887
## [1] 4.57

##   ML1   ML2   ML3   ML4 
## 3.686 3.858 3.887 4.570

add MLsub to la.norm.qual

ML1 = subset(MLsub, MLtask =="ML1")
ML2 = subset(MLsub, MLtask =="ML2")
ML3 = subset(MLsub, MLtask =="ML3")
ML4 = subset(MLsub, MLtask =="ML4")

dim(la.norm.qual)

## [1]  99 131

la.norm.qual[1:5,125:131]

##   access           pattern prevLR access.days Kay.pattern Kay3 cluster2
## 1     21      No Yes No No     No          15           4    3        1
## 2     75       No No No No     No          26           5    3        1
## 3    122      Yes No No No    Yes          33           3    3        1
## 4     13 Maybe No Maybe No    Yes          10           0    2        2
## 5     89     Yes No Yes No    Yes          28           1    2        1

all = NULL
dim(ML1)

## [1] 225  12

ML1[1:5,]

##   StudentID     Date Submitted Duration MLtask     Open      Due
## 1  S8579275 23/03/14  20:43:42  0:09:16    ML1 19/03/14 26/03/14
## 2  S8587419 24/03/14  14:31:37  0:16:02    ML1 19/03/14 26/03/14
## 3  S8530605 24/03/14  17:06:10 47:58:06    ML1 19/03/14 26/03/14
## 4  S8636955 23/03/14  18:28:04  4:40:22    ML1 19/03/14 26/03/14
## 5  S8475915 25/03/14  12:00:45  0:15:28    ML1 19/03/14 26/03/14
##                 SubDT               DueDT Earliness    Early.hr
## 1 2014-03-23 20:43:00 2014-03-26 17:00:00 4097 mins 68.28 hours
## 2 2014-03-24 14:31:00 2014-03-26 17:00:00 3029 mins 50.48 hours
## 3 2014-03-24 17:06:00 2014-03-26 17:00:00 2874 mins 47.90 hours
## 4 2014-03-23 18:28:00 2014-03-26 17:00:00 4232 mins 70.53 hours
## 5 2014-03-25 12:00:00 2014-03-26 17:00:00 1740 mins 29.00 hours
##   Early.hr.num
## 1        68.28
## 2        50.48
## 3        47.90
## 4        70.53
## 5        29.00

all = merge(la.norm.qual, ML1[, c(1, 12)], by="StudentID")
dim(all)

## [1]  99 132

all[1:5, 125:132]

##   access           pattern prevLR access.days Kay.pattern Kay3 cluster2
## 1     21      No Yes No No     No          15           4    3        1
## 2     21      No Yes No No     No          15           4    3        1
## 3     75       No No No No     No          26           5    3        1
## 4    122      Yes No No No    Yes          33           3    3        1
## 5     13 Maybe No Maybe No    Yes          10           0    2        2
##   Early.hr.num
## 1      139.867
## 2        8.733
## 3       19.867
## 4       43.200
## 5      117.333
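In the merged output above, rows 1 and 2 carry identical access values but different `Early.hr.num` (139.867 and 8.733), which suggests that a student with two ML1 submissions is duplicated by `merge`. A sketch of checking for repeated keys and keeping one row per student before merging; the rule used here (keep the latest submission, i.e. the smallest earliness) is an assumption:

```r
# Toy ML1 subset with one student submitting twice
ml1 <- data.frame(StudentID    = c("S1", "S1", "S2"),
                  Early.hr.num = c(139.867, 8.733, 19.867),
                  stringsAsFactors = FALSE)

# Which students appear more than once?
dup_ids <- unique(ml1$StudentID[duplicated(ml1$StudentID)])
dup_ids

# Sort so the latest submission comes first within each student, then keep it
ml1_one <- ml1[order(ml1$StudentID, ml1$Early.hr.num), ]
ml1_one <- ml1_one[!duplicated(ml1_one$StudentID), ]
ml1_one
```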

all.names = names(all)
all.names[132] = "ML1earliness"
names(all) = all.names
all[1:5,130:132]

##   Kay3 cluster2 ML1earliness
## 1    3        1      139.867
## 2    3        1        8.733
## 3    3        1       19.867
## 4    3        1       43.200
## 5    2        2      117.333

all = merge(all, ML2[, c(1, 12)], by="StudentID")
all = merge(all, ML3[, c(1, 12)], by="StudentID")
all = merge(all, ML4[, c(1, 12)], by="StudentID")

dim(all)

## [1]  97 135

all[1:5,130:135]

##   Kay3 cluster2 ML1earliness Early.hr.num.x Early.hr.num.y Early.hr.num
## 1    3        1      139.867        163.900          140.3        115.9
## 2    3        1        8.733        163.900          140.3        115.9
## 3    3        1       19.867         47.367          163.0        186.3
## 4    3        1       43.200         66.950           19.2        186.7
## 5    2        2      117.333          5.467            0.5        168.6

all.names = names(all)
all.names[133] = "ML2earliness"
all.names[134] = "ML3earliness"
all.names[135] = "ML4earliness"
names(all) = all.names
all[1:5,130:135]

##   Kay3 cluster2 ML1earliness ML2earliness ML3earliness ML4earliness
## 1    3        1      139.867      163.900        140.3        115.9
## 2    3        1        8.733      163.900        140.3        115.9
## 3    3        1       19.867       47.367        163.0        186.3
## 4    3        1       43.200       66.950         19.2        186.7
## 5    2        2      117.333        5.467          0.5        168.6
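Renaming columns by position (132 to 135) depends on the column order staying fixed. A sketch of an alternative that names each earliness column before merging, so no `.x`/`.y` suffixes appear; the data and object names are toy examples:

```r
base <- data.frame(StudentID = c("S1", "S2", "S3"),
                   access    = c(21, 75, 122),
                   stringsAsFactors = FALSE)
sub <- data.frame(StudentID    = rep(c("S1", "S2", "S3"), times = 2),
                  MLtask       = rep(c("ML1", "ML2"), each = 3),
                  Early.hr.num = c(68.3, 50.5, 47.9, 12.0, 30.5, 99.1),
                  stringsAsFactors = FALSE)

out <- base
for (task in unique(sub$MLtask)) {
  # Take one task, rename its earliness column, then merge it in
  piece <- sub[sub$MLtask == task, c("StudentID", "Early.hr.num")]
  names(piece)[2] <- paste0(task, "earliness")
  out <- merge(out, piece, by = "StudentID")
}
names(out)
```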

cor(all[,132:135])

##              ML1earliness ML2earliness ML3earliness ML4earliness
## ML1earliness       1.0000       0.4466       0.4805       0.4446
## ML2earliness       0.4466       1.0000       0.5419       0.3660
## ML3earliness       0.4805       0.5419       1.0000       0.5333
## ML4earliness       0.4446       0.3660       0.5333       1.0000

add in assignment submission

ass = read.csv("Ass.csv")
dim(ass)

## [1] 220   4

str(ass)

## 'data.frame':    220 obs. of  4 variables:
##  $ StudentID: Factor w/ 220 levels "s3044923","s361850",..: 144 105 56 90 197 87 213 19 65 52 ...
##  $ Ass.mark : int  82 75 86 83 92 94 81 93 83 78 ...
##  $ Sub.Date : Factor w/ 12 levels "12/05/2014","13/05/2014",..: 1 2 3 4 4 4 5 5 5 6 ...
##  $ Sub.Time : Factor w/ 220 levels "0:07:45","0:21:45",..: 42 88 16 211 103 137 197 80 101 219 ...

clean - consent

dim(ci)

## [1] 231   2

df = merge(ass, ci, by ="StudentID")
dim(df) #219 of the 220 rows in ass have a matching StudentID in ci

## [1] 219   5

#df[1:10,1:10]
#df[1:5,110:116]
df = subset(df, Consent == "Yes")
dim(df)

## [1] 96  5

ass = df
dim(ass)

## [1] 96  5

str(ass)

## 'data.frame':    96 obs. of  5 variables:
##  $ StudentID: Factor w/ 220 levels "s3044923","s361850",..: 1 3 4 5 6 7 9 10 11 12 ...
##  $ Ass.mark : int  86 75 94 82 91 83 77 88 83 87 ...
##  $ Sub.Date : Factor w/ 12 levels "12/05/2014","13/05/2014",..: 8 10 8 10 10 9 7 8 8 9 ...
##  $ Sub.Time : Factor w/ 220 levels "0:07:45","0:21:45",..: 125 210 149 64 21 107 81 86 190 202 ...
##  $ Consent  : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...

clean - De-ID

## [1] 96  5

##   StudentID Ass.mark   Sub.Date Sub.Time Consent
## 1  S6089847       86 19/05/2014 20:49:26     Yes
## 3  S8117889       75 21/05/2014  9:35:45     Yes
## 4  S8118323       94 19/05/2014 22:45:19     Yes
## 5  S8152093       82 21/05/2014 11:54:37     Yes
## 6  S8239113       91 21/05/2014 10:17:41     Yes

merging ass into all (assignment due 12 noon, 21/05/14)

ass[1:5,]

##   StudentID Ass.mark   Sub.Date Sub.Time Consent
## 1  S6089847       86 19/05/2014 20:49:26     Yes
## 3  S8117889       75 21/05/2014  9:35:45     Yes
## 4  S8118323       94 19/05/2014 22:45:19     Yes
## 5  S8152093       82 21/05/2014 11:54:37     Yes
## 6  S8239113       91 21/05/2014 10:17:41     Yes

str(ass)

## 'data.frame':    96 obs. of  5 variables:
##  $ StudentID: chr  "S6089847" "S8117889" "S8118323" "S8152093" ...
##  $ Ass.mark : int  86 75 94 82 91 83 77 88 83 87 ...
##  $ Sub.Date : Factor w/ 12 levels "12/05/2014","13/05/2014",..: 8 10 8 10 10 9 7 8 8 9 ...
##  $ Sub.Time : Factor w/ 220 levels "0:07:45","0:21:45",..: 125 210 149 64 21 107 81 86 190 202 ...
##  $ Consent  : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...

ass$Sub.Date = as.character(ass$Sub.Date)
ass$Sub.Time = as.character(ass$Sub.Time)
str(ass)

## 'data.frame':    96 obs. of  5 variables:
##  $ StudentID: chr  "S6089847" "S8117889" "S8118323" "S8152093" ...
##  $ Ass.mark : int  86 75 94 82 91 83 77 88 83 87 ...
##  $ Sub.Date : chr  "19/05/2014" "21/05/2014" "19/05/2014" "21/05/2014" ...
##  $ Sub.Time : chr  "20:49:26" "9:35:45" "22:45:19" "11:54:37" ...
##  $ Consent  : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...

ass$Sub.Ass = paste(ass$Sub.Date, ass$Sub.Time)
dim(ass)

## [1] 96  6

ass[1:5,]

##   StudentID Ass.mark   Sub.Date Sub.Time Consent             Sub.Ass
## 1  S6089847       86 19/05/2014 20:49:26     Yes 19/05/2014 20:49:26
## 3  S8117889       75 21/05/2014  9:35:45     Yes  21/05/2014 9:35:45
## 4  S8118323       94 19/05/2014 22:45:19     Yes 19/05/2014 22:45:19
## 5  S8152093       82 21/05/2014 11:54:37     Yes 21/05/2014 11:54:37
## 6  S8239113       91 21/05/2014 10:17:41     Yes 21/05/2014 10:17:41

str(ass[,6])

##  chr [1:96] "19/05/2014 20:49:26" "21/05/2014 9:35:45" ...

ass$Sub.Ass = as.POSIXct(ass$Sub.Ass, format = "%d/%m/%Y %H:%M:%S")
str(ass[,6])

##  POSIXct[1:96], format: "2014-05-19 20:49:26" "2014-05-21 09:35:45" ...

ass$Due = as.POSIXct("2014-05-21 12:00:00", tz="UCT")

tz(ass$Sub.Ass)

## [1] ""

ass$Sub.Ass = force_tz(ass$Sub.Ass, "UTC")
tz(ass$Sub.Ass)

## [1] "UTC"
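The submission times above are first parsed in the session's local time zone (hence the empty `tz`) and then relabelled with `force_tz`. A base R sketch of an alternative, assuming the same formats, that sets the zone at parse time so both timestamps are in UTC from the start:

```r
# First row of ass: submitted 19/05/2014 20:49:26, due 12 noon 21/05/2014
sub <- as.POSIXct("19/05/2014 20:49:26",
                  format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
due <- as.POSIXct("2014-05-21 12:00:00", tz = "UTC")

# Hours before the deadline
early <- difftime(due, sub, units = "hours")
early   # about 39.18 hours, matching Ass.earliness for this student
```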

dim(ass)

## [1] 96  7

ass[1:5,]

##   StudentID Ass.mark   Sub.Date Sub.Time Consent             Sub.Ass
## 1  S6089847       86 19/05/2014 20:49:26     Yes 2014-05-19 20:49:26
## 3  S8117889       75 21/05/2014  9:35:45     Yes 2014-05-21 09:35:45
## 4  S8118323       94 19/05/2014 22:45:19     Yes 2014-05-19 22:45:19
## 5  S8152093       82 21/05/2014 11:54:37     Yes 2014-05-21 11:54:37
## 6  S8239113       91 21/05/2014 10:17:41     Yes 2014-05-21 10:17:41
##                   Due
## 1 2014-05-21 12:00:00
## 3 2014-05-21 12:00:00
## 4 2014-05-21 12:00:00
## 5 2014-05-21 12:00:00
## 6 2014-05-21 12:00:00

str(ass)

## 'data.frame':    96 obs. of  7 variables:
##  $ StudentID: chr  "S6089847" "S8117889" "S8118323" "S8152093" ...
##  $ Ass.mark : int  86 75 94 82 91 83 77 88 83 87 ...
##  $ Sub.Date : chr  "19/05/2014" "21/05/2014" "19/05/2014" "21/05/2014" ...
##  $ Sub.Time : chr  "20:49:26" "9:35:45" "22:45:19" "11:54:37" ...
##  $ Consent  : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Sub.Ass  : POSIXct, format: "2014-05-19 20:49:26" "2014-05-21 09:35:45" ...
##  $ Due      : POSIXct, format: "2014-05-21 12:00:00" "2014-05-21 12:00:00" ...

ass$Ass.earliness = difftime(ass$Due, ass$Sub.Ass, units="hours")
str(ass)

## 'data.frame':    96 obs. of  8 variables:
##  $ StudentID    : chr  "S6089847" "S8117889" "S8118323" "S8152093" ...
##  $ Ass.mark     : int  86 75 94 82 91 83 77 88 83 87 ...
##  $ Sub.Date     : chr  "19/05/2014" "21/05/2014" "19/05/2014" "21/05/2014" ...
##  $ Sub.Time     : chr  "20:49:26" "9:35:45" "22:45:19" "11:54:37" ...
##  $ Consent      : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Sub.Ass      : POSIXct, format: "2014-05-19 20:49:26" "2014-05-21 09:35:45" ...
##  $ Due          : POSIXct, format: "2014-05-21 12:00:00" "2014-05-21 12:00:00" ...
##  $ Ass.earliness:Class 'difftime'  atomic [1:96] 39.1761 2.4042 37.2447 0.0897 1.7053 ...
##   .. ..- attr(*, "tzone")= chr "UCT"
##   .. ..- attr(*, "units")= chr "hours"

ass[1:5,]

##   StudentID Ass.mark   Sub.Date Sub.Time Consent             Sub.Ass
## 1  S6089847       86 19/05/2014 20:49:26     Yes 2014-05-19 20:49:26
## 3  S8117889       75 21/05/2014  9:35:45     Yes 2014-05-21 09:35:45
## 4  S8118323       94 19/05/2014 22:45:19     Yes 2014-05-19 22:45:19
## 5  S8152093       82 21/05/2014 11:54:37     Yes 2014-05-21 11:54:37
## 6  S8239113       91 21/05/2014 10:17:41     Yes 2014-05-21 10:17:41
##                   Due  Ass.earliness
## 1 2014-05-21 12:00:00 39.17611 hours
## 3 2014-05-21 12:00:00  2.40417 hours
## 4 2014-05-21 12:00:00 37.24472 hours
## 5 2014-05-21 12:00:00  0.08972 hours
## 6 2014-05-21 12:00:00  1.70528 hours

dim(ass)

## [1] 96  8

dim(all)

## [1]  97 135

all = merge(all, ass[, c(1:2, 8)], by="StudentID")
dim(all)

## [1]  94 137
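The merge above reduces `all` from 97 to 94 rows because, by default, `merge` keeps only StudentIDs present in both tables. A sketch on toy data of listing which IDs would be dropped, and of keeping them with `all.x = TRUE` instead; the object names are illustrative:

```r
left  <- data.frame(StudentID = c("S1", "S2", "S3", "S4"),
                    access    = c(21, 75, 122, 13),
                    stringsAsFactors = FALSE)
right <- data.frame(StudentID = c("S1", "S3", "S5"),
                    Ass.mark  = c(86, 75, 94),
                    stringsAsFactors = FALSE)

# IDs in the left table with no match in the right table
dropped <- setdiff(left$StudentID, right$StudentID)
dropped

merged   <- merge(left, right, by = "StudentID")                # inner join
kept_all <- merge(left, right, by = "StudentID", all.x = TRUE)  # keeps unmatched rows as NA
```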

all[1:5,131:ncol(all)]

##   cluster2 ML1earliness ML2earliness ML3earliness ML4earliness Ass.mark
## 1        1        8.733      163.900        140.3        115.9       86
## 2        1      139.867      163.900        140.3        115.9       86
## 3        1       19.867       47.367        163.0        186.3       75
## 4        1       43.200       66.950         19.2        186.7       94
## 5        2      117.333        5.467          0.5        168.6       82
##    Ass.earliness
## 1 39.17611 hours
## 2 39.17611 hours
## 3  2.40417 hours
## 4 37.24472 hours
## 5  0.08972 hours

Academic performance as course grade (access vs performance -> AcP.csv)

AcP = read.csv("AcP.csv")

#De-ID: for an ID like s4123456, take the 7 digits, compute 2n+1 and prefix with "S"
df = AcP
df$StudID = substr(df$StudentID, start=2, stop=8)
df$StudID = as.numeric(df$StudID)
df$StudID = df$StudID*2+1
df$StudID = as.character(df$StudID)
df$StudID = paste("S", df$StudID, sep="")
df$StudentID = df$StudID
df$StudID = NULL
AcP = df

AcP[1:5,]

##   StudentID Course.grade
## 1  S8529183         40.5
## 2  S8636687         47.2
## 3  S8624451         47.8
## 4  S8633919         51.9
## 5  S8583807         52.5
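
The ID recoding above is applied separately to each input file. A minimal sketch of the same steps wrapped in one function; `recode_id` is a hypothetical name, not something defined in this analysis.

```r
# Hypothetical helper wrapping the recoding steps used in this document:
# drop the leading letter, compute 2 * number + 1, and prefix "S".
recode_id <- function(id) {
  num <- as.numeric(substr(as.character(id), start = 2, stop = 8))
  sprintf("S%.0f", num * 2 + 1)
}

# Toy input (invented IDs and grades) in the same shape as AcP.csv
toy <- data.frame(StudentID = c("s4123456", "s4000001"),
                  Course.grade = c(40.5, 47.2),
                  stringsAsFactors = FALSE)
toy$StudentID <- recode_id(toy$StudentID)
toy
```

Using one function for every file keeps the recoded IDs consistent, which the later `merge(..., by="StudentID")` calls rely on.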

all = merge(all, AcP, by="StudentID")
dim(all)

## [1]  94 138

all[1:5,131:138]

##   cluster2 ML1earliness ML2earliness ML3earliness ML4earliness Ass.mark
## 1        1        8.733      163.900        140.3        115.9       86
## 2        1      139.867      163.900        140.3        115.9       86
## 3        1       19.867       47.367        163.0        186.3       75
## 4        1       43.200       66.950         19.2        186.7       94
## 5        2      117.333        5.467          0.5        168.6       82
##    Ass.earliness Course.grade
## 1 39.17611 hours         66.6
## 2 39.17611 hours         66.6
## 3  2.40417 hours         64.7
## 4 37.24472 hours         78.0
## 5  0.08972 hours         80.7

correlations (ass.early.num = hours before the assignment due date, 12 noon)

str(all[131:ncol(all)])

## 'data.frame':    94 obs. of  8 variables:
##  $ cluster2     : int  1 1 1 1 2 1 1 1 1 1 ...
##  $ ML1earliness : num  8.73 139.87 19.87 43.2 117.33 ...
##  $ ML2earliness : num  163.9 163.9 47.37 66.95 5.47 ...
##  $ ML3earliness : num  140.3 140.3 163 19.2 0.5 ...
##  $ ML4earliness : num  116 116 186 187 169 ...
##  $ Ass.mark     : int  86 86 75 94 82 91 83 77 88 83 ...
##  $ Ass.earliness:Class 'difftime'  atomic [1:94] 39.1761 39.1761 2.4042 37.2447 0.0897 ...
##   .. ..- attr(*, "units")= chr "hours"
##  $ Course.grade : num  66.6 66.6 64.7 78 80.7 68.4 82.3 92.2 84.9 81.7 ...

all$ass.early.num = as.numeric(all$Ass.earliness)
dim(all)

## [1]  94 139

all[1:5,131:ncol(all)]

##   cluster2 ML1earliness ML2earliness ML3earliness ML4earliness Ass.mark
## 1        1        8.733      163.900        140.3        115.9       86
## 2        1      139.867      163.900        140.3        115.9       86
## 3        1       19.867       47.367        163.0        186.3       75
## 4        1       43.200       66.950         19.2        186.7       94
## 5        2      117.333        5.467          0.5        168.6       82
##    Ass.earliness Course.grade ass.early.num
## 1 39.17611 hours         66.6      39.17611
## 2 39.17611 hours         66.6      39.17611
## 3  2.40417 hours         64.7       2.40417
## 4 37.24472 hours         78.0      37.24472
## 5  0.08972 hours         80.7       0.08972

cor(all[c(132:136, 138, ncol(all))])

##               ML1earliness ML2earliness ML3earliness ML4earliness Ass.mark
## ML1earliness       1.00000      0.44635      0.49048      0.45386  0.04824
## ML2earliness       0.44635      1.00000      0.53738      0.35783  0.08964
## ML3earliness       0.49048      0.53738      1.00000      0.52623  0.06346
## ML4earliness       0.45386      0.35783      0.52623      1.00000  0.02611
## Ass.mark           0.04824      0.08964      0.06346      0.02611  1.00000
## Course.grade       0.18798      0.24793      0.18772      0.14718  0.24003
## ass.early.num      0.27721      0.16755      0.18932      0.17576  0.06852
##               Course.grade ass.early.num
## ML1earliness        0.1880       0.27721
## ML2earliness        0.2479       0.16755
## ML3earliness        0.1877       0.18932
## ML4earliness        0.1472       0.17576
## Ass.mark            0.2400       0.06852
## Course.grade        1.0000       0.28559
## ass.early.num       0.2856       1.00000

cor(all[c(131:136, 138, ncol(all))])

##               cluster2 ML1earliness ML2earliness ML3earliness ML4earliness
## cluster2       1.00000      0.27705      0.06319      0.23329      0.20728
## ML1earliness   0.27705      1.00000      0.44635      0.49048      0.45386
## ML2earliness   0.06319      0.44635      1.00000      0.53738      0.35783
## ML3earliness   0.23329      0.49048      0.53738      1.00000      0.52623
## ML4earliness   0.20728      0.45386      0.35783      0.52623      1.00000
## Ass.mark       0.06075      0.04824      0.08964      0.06346      0.02611
## Course.grade   0.10996      0.18798      0.24793      0.18772      0.14718
## ass.early.num -0.01725      0.27721      0.16755      0.18932      0.17576
##               Ass.mark Course.grade ass.early.num
## cluster2       0.06075       0.1100      -0.01725
## ML1earliness   0.04824       0.1880       0.27721
## ML2earliness   0.08964       0.2479       0.16755
## ML3earliness   0.06346       0.1877       0.18932
## ML4earliness   0.02611       0.1472       0.17576
## Ass.mark       1.00000       0.2400       0.06852
## Course.grade   0.24003       1.0000       0.28559
## ass.early.num  0.06852       0.2856       1.00000
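
`cor()` reports coefficients only. A minimal sketch, on simulated data, of getting a p-value for one pair with `cor.test()`, alongside a rank-based (Spearman) version that may be a better fit if the earliness values are skewed; the variable names and values below are invented.

```r
set.seed(1)
n <- 94  # same number of rows as `all`; the values below are simulated
earliness <- rexp(n, rate = 1 / 100)                  # invented hours before deadline
grade     <- 70 + 0.03 * earliness + rnorm(n, sd = 8) # invented grades

pearson  <- cor.test(earliness, grade)                # default method is Pearson
spearman <- cor.test(earliness, grade, method = "spearman",
                     exact = FALSE)                   # rank-based alternative

c(pearson_r    = unname(pearson$estimate),  pearson_p  = pearson$p.value,
  spearman_rho = unname(spearman$estimate), spearman_p = spearman$p.value)
```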

Organisation qual coded as categories 3 and 5

org.qual = read.csv("ML1-4qual.csv")

dim(org.qual)

## [1] 99  5

#org.qual[1:5,]

org.qual.names = c("StudentID", "Cat3", "Cat5", "Cat3or5", "Sum.Cat3and5")
names(org.qual) = org.qual.names

# De-identify IDs of the form s4123456: keep the 7 digits, compute 2*n + 1, prefix "S"
df = org.qual
df$StudID = substr(df$StudentID, start=2, stop=8)
df$StudID = as.numeric(df$StudID)
df$StudID = df$StudID*2+1
df$StudID = as.character(df$StudID)
df$StudID = paste("S", df$StudID, sep="")
df$StudentID = df$StudID
df$StudID = NULL
org.qual = df

org.qual[1:5,]

##   StudentID Cat3 Cat5 Cat3or5 Sum.Cat3and5
## 1  S8646489    4    1       2            5
## 2  S8283571    3    1       2            4
## 3  S8586369    4    0       1            4
## 4  S8641669    3    1       2            4
## 5  S8152093    2    1       2            3

all =  merge(all, org.qual, by="StudentID")

dim(all)

## [1]  94 143

all[1:2,c(131:136, 138, 141:ncol(all))]

##   cluster2 ML1earliness ML2earliness ML3earliness ML4earliness Ass.mark
## 1        1        8.733        163.9        140.3        115.9       86
## 2        1      139.867        163.9        140.3        115.9       86
##   Course.grade Cat5 Cat3or5 Sum.Cat3and5
## 1         66.6    0       0            0
## 2         66.6    0       0            0

str(all[1:2,c(131:136, 138, 141:ncol(all))])

## 'data.frame':    2 obs. of  10 variables:
##  $ cluster2    : int  1 1
##  $ ML1earliness: num  8.73 139.87
##  $ ML2earliness: num  164 164
##  $ ML3earliness: num  140 140
##  $ ML4earliness: num  116 116
##  $ Ass.mark    : int  86 86
##  $ Course.grade: num  66.6 66.6
##  $ Cat5        : int  0 0
##  $ Cat3or5     : num  0 0
##  $ Sum.Cat3and5: int  0 0

cor(all[c(131:136, 138, 141:ncol(all))])

##              cluster2 ML1earliness ML2earliness ML3earliness ML4earliness
## cluster2      1.00000      0.27705     0.063187      0.23329      0.20728
## ML1earliness  0.27705      1.00000     0.446352      0.49048      0.45386
## ML2earliness  0.06319      0.44635     1.000000      0.53738      0.35783
## ML3earliness  0.23329      0.49048     0.537383      1.00000      0.52623
## ML4earliness  0.20728      0.45386     0.357834      0.52623      1.00000
## Ass.mark      0.06075      0.04824     0.089638      0.06346      0.02611
## Course.grade  0.10996      0.18798     0.247928      0.18772      0.14718
## Cat5          0.14579      0.07512    -0.006347     -0.02654      0.10722
## Cat3or5       0.01898     -0.07669    -0.160959     -0.11521      0.04154
## Sum.Cat3and5  0.01351      0.01142    -0.179925     -0.07817      0.01499
##              Ass.mark Course.grade      Cat5  Cat3or5 Sum.Cat3and5
## cluster2      0.06075      0.10996  0.145787  0.01898      0.01351
## ML1earliness  0.04824      0.18798  0.075120 -0.07669      0.01142
## ML2earliness  0.08964      0.24793 -0.006347 -0.16096     -0.17993
## ML3earliness  0.06346      0.18772 -0.026537 -0.11521     -0.07817
## ML4earliness  0.02611      0.14718  0.107219  0.04154      0.01499
## Ass.mark      1.00000      0.24003 -0.031987 -0.08725     -0.10659
## Course.grade  0.24003      1.00000  0.103260 -0.07779     -0.10408
## Cat5         -0.03199      0.10326  1.000000  0.64094      0.54055
## Cat3or5      -0.08725     -0.07779  0.640939  1.00000      0.86549
## Sum.Cat3and5 -0.10659     -0.10408  0.540547  0.86549      1.00000

all$MLearliness = (rowSums(all[,132:135]))/4
dim(all)

## [1]  94 144

all[1:5,131:ncol(all)]

##   cluster2 ML1earliness ML2earliness ML3earliness ML4earliness Ass.mark
## 1        1        8.733      163.900        140.3        115.9       86
## 2        1      139.867      163.900        140.3        115.9       86
## 3        1       19.867       47.367        163.0        186.3       75
## 4        1       43.200       66.950         19.2        186.7       94
## 5        2      117.333        5.467          0.5        168.6       82
##    Ass.earliness Course.grade ass.early.num Cat3 Cat5 Cat3or5 Sum.Cat3and5
## 1 39.17611 hours         66.6      39.17611    0    0       0            0
## 2 39.17611 hours         66.6      39.17611    0    0       0            0
## 3  2.40417 hours         64.7       2.40417    2    0       1            2
## 4 37.24472 hours         78.0      37.24472    0    1       1            1
## 5  0.08972 hours         80.7       0.08972    2    1       2            3
##   MLearliness
## 1      107.20
## 2      139.98
## 3      104.13
## 4       79.02
## 5       72.97
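
MLearliness is the mean of the four task earliness columns, selected by position (132:135). A minimal sketch of the same calculation with the columns selected by name, using the first three rows printed above (rounded values, so results can differ in the last digit).

```r
# First three rows of the ML earliness columns as printed above (rounded)
ml <- data.frame(ML1earliness = c(8.733, 139.867, 19.867),
                 ML2earliness = c(163.900, 163.900, 47.367),
                 ML3earliness = c(140.3, 140.3, 163.0),
                 ML4earliness = c(115.9, 115.9, 186.3))

cols <- c("ML1earliness", "ML2earliness", "ML3earliness", "ML4earliness")
by_sum  <- rowSums(ml[, cols]) / 4   # the calculation used in this document
by_mean <- rowMeans(ml[, cols])      # same result, columns picked by name
round(by_mean, 2)
```

Selecting by name avoids silently averaging the wrong columns if earlier merges change the column order. `rowMeans(..., na.rm = TRUE)` would average over the available tasks if a student missed one.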

cor(all[c(131:136, 138, 140:ncol(all))])

##              cluster2 ML1earliness ML2earliness ML3earliness ML4earliness
## cluster2      1.00000      0.27705     0.063187      0.23329      0.20728
## ML1earliness  0.27705      1.00000     0.446352      0.49048      0.45386
## ML2earliness  0.06319      0.44635     1.000000      0.53738      0.35783
## ML3earliness  0.23329      0.49048     0.537383      1.00000      0.52623
## ML4earliness  0.20728      0.45386     0.357834      0.52623      1.00000
## Ass.mark      0.06075      0.04824     0.089638      0.06346      0.02611
## Course.grade  0.10996      0.18798     0.247928      0.18772      0.14718
## Cat3         -0.06553     -0.02848    -0.209608     -0.07777     -0.04222
## Cat5          0.14579      0.07512    -0.006347     -0.02654      0.10722
## Cat3or5       0.01898     -0.07669    -0.160959     -0.11521      0.04154
## Sum.Cat3and5  0.01351      0.01142    -0.179925     -0.07817      0.01499
## MLearliness   0.25008      0.75391     0.747977      0.82090      0.77709
##              Ass.mark Course.grade     Cat3      Cat5  Cat3or5
## cluster2      0.06075      0.10996 -0.06553  0.145787  0.01898
## ML1earliness  0.04824      0.18798 -0.02848  0.075120 -0.07669
## ML2earliness  0.08964      0.24793 -0.20961 -0.006347 -0.16096
## ML3earliness  0.06346      0.18772 -0.07777 -0.026537 -0.11521
## ML4earliness  0.02611      0.14718 -0.04222  0.107219  0.04154
## Ass.mark      1.00000      0.24003 -0.10838 -0.031987 -0.08725
## Course.grade  0.24003      1.00000 -0.18106  0.103260 -0.07779
## Cat3         -0.10838     -0.18106  1.00000  0.081062  0.66686
## Cat5         -0.03199      0.10326  0.08106  1.000000  0.64094
## Cat3or5      -0.08725     -0.07779  0.66686  0.640939  1.00000
## Sum.Cat3and5 -0.10659     -0.10408  0.88236  0.540547  0.86549
## MLearliness   0.07208      0.24652 -0.11484  0.050721 -0.09463
##              Sum.Cat3and5 MLearliness
## cluster2          0.01351     0.25008
## ML1earliness      0.01142     0.75391
## ML2earliness     -0.17993     0.74798
## ML3earliness     -0.07817     0.82090
## ML4earliness      0.01499     0.77709
## Ass.mark         -0.10659     0.07208
## Course.grade     -0.10408     0.24652
## Cat3              0.88236    -0.11484
## Cat5              0.54055     0.05072
## Cat3or5           0.86549    -0.09463
## Sum.Cat3and5      1.00000    -0.07299
## MLearliness      -0.07299     1.00000

Wilcoxon rank-sum tests for 2 clusters

dim(all)

## [1]  94 144

wilcox.test(MLearliness ~ cluster2, data=all)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  MLearliness by cluster2
## W = 706, p-value = 0.01748
## alternative hypothesis: true location shift is not equal to 0

round(with(all, calc(mean, MLearliness, cluster2)),1)

##     1     2 
##  94.2 117.3

round(with(all, calc.sem(sem, MLearliness, cluster2)),1)

## [1] 5.814
## [1] 6.657

##   1   2 
## 5.8 6.7

wilcox.test(ass.early.num ~ cluster2, data=all)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  ass.early.num by cluster2
## W = 998, p-value = 0.9495
## alternative hypothesis: true location shift is not equal to 0

round(with(all, calc(mean, ass.early.num, cluster2)),1)

##    1    2 
## 29.9 28.1

round(with(all, calc.sem(sem, ass.early.num, cluster2)),1)

## [1] 6.842
## [1] 6.428

##   1   2 
## 6.8 6.4

wilcox.test(Course.grade ~ cluster2, data=all)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  Course.grade by cluster2
## W = 889, p-value = 0.354
## alternative hypothesis: true location shift is not equal to 0

round(with(all, calc(mean, Course.grade, cluster2)),1)

##    1    2 
## 77.8 79.8

round(with(all, calc.sem(sem, Course.grade, cluster2)),1)

## [1] 1.236
## [1] 1.227

##   1   2 
## 1.2 1.2

wilcox.test(Sum.Cat3and5 ~ cluster2, data=all)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  Sum.Cat3and5 by cluster2
## W = 1002, p-value = 0.9701
## alternative hypothesis: true location shift is not equal to 0

round(with(all, calc(mean, Sum.Cat3and5, cluster2)),1)

##   1   2 
## 1.2 1.2

round(with(all, calc.sem(sem, Sum.Cat3and5, cluster2)),1)

## [1] 0.1451
## [1] 0.1983

##   1   2 
## 0.1 0.2
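
`calc`, `calc.sem` and `sem` are defined outside this section. A minimal sketch of how the same three numbers (group means, group SEMs, Wilcoxon p-value) could be produced in one call; `sem_sketch` and `compare_groups` are hypothetical names, and the standard-error definition is an assumption about what `sem` computes.

```r
# Assumed definition: standard error of the mean, ignoring missing values
sem_sketch <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

# Hypothetical helper: means, SEMs and Wilcoxon p-value for one variable
compare_groups <- function(data, variable, group) {
  values <- data[[variable]]
  groups <- data[[group]]
  test <- wilcox.test(values ~ groups, exact = FALSE)
  list(mean = tapply(values, groups, mean, na.rm = TRUE),
       sem  = tapply(values, groups, sem_sketch),
       p    = test$p.value)
}

set.seed(2)
# Group sizes follow the cluster2 tables in this document; values are invented
toy <- data.frame(cluster2 = rep(1:2, times = c(61, 33)),
                  MLearliness = c(rnorm(61, 95, 45), rnorm(33, 115, 40)))
res <- compare_groups(toy, "MLearliness", "cluster2")
lapply(res, round, 3)
```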

ANOVAs for Sum.Cat3and5

aov.cat = aov(Course.grade ~ Sum.Cat3and5, data=all)
summary(aov.cat)

##              Df Sum Sq Mean Sq F value Pr(>F)
## Sum.Cat3and5  1     79    78.8    1.01   0.32
## Residuals    92   7193    78.2

aov.cat2 = aov(MLearliness ~ Sum.Cat3and5, data=all)
summary(aov.cat2)

##              Df Sum Sq Mean Sq F value Pr(>F)
## Sum.Cat3and5  1    969     969    0.49   0.48
## Residuals    92 180914    1966

aov.cat3 = aov(ass.early.num ~ Sum.Cat3and5, data=all)
summary(aov.cat3)

##              Df Sum Sq Mean Sq F value Pr(>F)
## Sum.Cat3and5  1     23      23    0.01   0.92
## Residuals    92 215028    2337
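
The Df of 1 in each table above shows that Sum.Cat3and5 entered the models as a numeric predictor, so each test is of a linear trend rather than of differences between category means. A minimal sketch of the two forms side by side, on invented data.

```r
set.seed(3)
toy <- data.frame(Sum.Cat3and5 = sample(0:5, 94, replace = TRUE),  # invented codes
                  Course.grade = rnorm(94, mean = 78, sd = 9))     # invented grades

# Numeric predictor: a single slope, hence 1 degree of freedom
fit_numeric <- aov(Course.grade ~ Sum.Cat3and5, data = toy)

# Factor predictor: one mean per observed score, hence (levels - 1) degrees of freedom
fit_factor <- aov(Course.grade ~ factor(Sum.Cat3and5), data = toy)

summary(fit_numeric)
summary(fit_factor)
```

Which form is appropriate depends on whether the summed code is meant as a quantity or as a label; that choice is left to the analysis.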

using more data for clusters

distances.all = dist(all[c(2:115, 117:120, 125, 132:136, 137, 141:143)], method = "euclidean")

## Warning: NAs introduced by coercion

cluster.all = hclust(distances.all, method = "ward") 
plot(cluster.all)

Figure: dendrogram of cluster.all

distances.all2 = dist(all[c(117:120, 125, 132:136, 137, 141:143)], method = "euclidean")

## Warning: NAs introduced by coercion

cluster.all2 = hclust(distances.all2, method = "ward") 
plot(cluster.all2)

Figure: dendrogram of cluster.all2
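
The "NAs introduced by coercion" warnings suggest that the factor columns (the Yes/No/Maybe responses) could not be converted to numbers; `dist()` excludes missing values from each pairwise distance, so those columns probably did not contribute. A minimal sketch of one way to include them, assuming indicator coding and column scaling are acceptable (neither is used in the analysis above); the data are invented.

```r
set.seed(4)
toy <- data.frame(
  ML3.usedMS  = factor(sample(c("No", "Yes", "Maybe"), 20, replace = TRUE)),
  access      = rpois(20, 45),      # invented access counts
  MLearliness = runif(20, 0, 170)   # invented hours
)

# Expand factors into 0/1 indicator columns; numeric columns pass through
x <- model.matrix(~ . - 1, data = toy)

# Put every column on a comparable scale before computing distances
x <- scale(x)

d      <- dist(x, method = "euclidean")
tree   <- hclust(d, method = "ward.D")  # "ward" was renamed "ward.D" in R 3.1.0
groups <- cutree(tree, k = 2)
table(groups)
```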

cluster.all2.groups = cutree(cluster.all2, k = 2)
all$cluster.all2 = cluster.all2.groups

with(all, table(cluster2, cluster.all2))

##         cluster.all2
## cluster2  1  2
##        1 50 11
##        2 31  2

cluster.all.groups = cutree(cluster.all, k = 2)
all$cluster.all = cluster.all.groups

with(all, table(cluster2, cluster.all))

##         cluster.all
## cluster2  1  2
##        1 50 11
##        2 31  2

with(all, table(cluster.all, cluster.all2))

##            cluster.all2
## cluster.all  1  2
##           1 81  0
##           2  0 13

for paper edits

dim(all)

## [1]  94 146

all[1:2,115:146]

##   X25.06.14 cluster3 ML1.previous ML2.planMS ML3.usedMS ML4.planEOS
## 1         0        1           No        Yes         No          No
## 2         0        1           No        Yes         No          No
##   total.no total.yes total.maybe total.noinfo access      pattern prevLR
## 1        3         1           0            0     21 No Yes No No     No
## 2        3         1           0            0     21 No Yes No No     No
##   access.days Kay.pattern Kay3 cluster2 ML1earliness ML2earliness
## 1          15           4    3        1        8.733        163.9
## 2          15           4    3        1      139.867        163.9
##   ML3earliness ML4earliness Ass.mark Ass.earliness Course.grade
## 1        140.3        115.9       86   39.18 hours         66.6
## 2        140.3        115.9       86   39.18 hours         66.6
##   ass.early.num Cat3 Cat5 Cat3or5 Sum.Cat3and5 MLearliness cluster.all2
## 1         39.18    0    0       0            0       107.2            1
## 2         39.18    0    0       0            0       140.0            1
##   cluster.all
## 1           1
## 2           1

all$total.yes.gp = ifelse(all$total.yes == 0, 2, 1)
table(all$total.yes.gp)

## 
##  1  2 
## 47 47

wilcox.test(access.days ~ total.yes.gp, data=all)

## Warning: cannot compute exact p-value with ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  access.days by total.yes.gp
## W = 1660, p-value = 2.654e-05
## alternative hypothesis: true location shift is not equal to 0

#ML3 submission date
dim(MLsub)

## [1] 876  12

MLsub[1:5,]

##   StudentID     Date Submitted Duration MLtask     Open      Due
## 1  S8579275 23/03/14  20:43:42  0:09:16    ML1 19/03/14 26/03/14
## 2  S8587419 24/03/14  14:31:37  0:16:02    ML1 19/03/14 26/03/14
## 3  S8530605 24/03/14  17:06:10 47:58:06    ML1 19/03/14 26/03/14
## 4  S8636955 23/03/14  18:28:04  4:40:22    ML1 19/03/14 26/03/14
## 5  S8475915 25/03/14  12:00:45  0:15:28    ML1 19/03/14 26/03/14
##                 SubDT               DueDT Earliness    Early.hr
## 1 2014-03-23 20:43:00 2014-03-26 17:00:00 4097 mins 68.28 hours
## 2 2014-03-24 14:31:00 2014-03-26 17:00:00 3029 mins 50.48 hours
## 3 2014-03-24 17:06:00 2014-03-26 17:00:00 2874 mins 47.90 hours
## 4 2014-03-23 18:28:00 2014-03-26 17:00:00 4232 mins 70.53 hours
## 5 2014-03-25 12:00:00 2014-03-26 17:00:00 1740 mins 29.00 hours
##   Early.hr.num
## 1        68.28
## 2        50.48
## 3        47.90
## 4        70.53
## 5        29.00

ML3[1:5,]

##     StudentID     Date Submitted Duration MLtask    Open      Due
## 441  S8587419 10/05/14  18:10:01  0:22:41    ML3 7/05/14 14/05/14
## 442  S8530605 12/05/14  15:09:04  0:19:05    ML3 7/05/14 14/05/14
## 443  S8636955 12/05/14  21:42:34  0:45:45    ML3 7/05/14 14/05/14
## 444  S8475915 12/05/14  11:57:22  0:21:43    ML3 7/05/14 14/05/14
## 445  S8645607  7/05/14  17:30:54  0:16:33    ML3 7/05/14 14/05/14
##                   SubDT               DueDT  Earliness     Early.hr
## 441 2014-05-10 18:10:00 2014-05-14 17:00:00  5690 mins  94.83 hours
## 442 2014-05-12 15:09:00 2014-05-14 17:00:00  2991 mins  49.85 hours
## 443 2014-05-12 21:42:00 2014-05-14 17:00:00  2598 mins  43.30 hours
## 444 2014-05-12 11:57:00 2014-05-14 17:00:00  3183 mins  53.05 hours
## 445 2014-05-07 17:30:00 2014-05-14 17:00:00 10050 mins 167.50 hours
##     Early.hr.num
## 441        94.83
## 442        49.85
## 443        43.30
## 444        53.05
## 445       167.50

table(ML3$Date)

## 
##  1/06/14 10/04/14 10/05/14 11/04/14 11/05/14 12/04/14 12/05/14 13/04/14 
##        0        0       11        0       26        0       28        0 
## 13/05/14 14/04/14 14/05/14 15/04/14 16/04/14 19/03/14  2/06/14 20/03/14 
##       29        0       20        0        0        0        0        0 
## 21/03/14 22/03/14 23/03/14 24/03/14 25/03/14 26/03/14 27/05/14 28/05/14 
##        0        0        0        0        0        0        0        0 
## 29/05/14  3/06/14 30/05/14 31/05/14  4/06/14  7/05/14  8/05/14  9/04/14 
##        0        0        0        0        0       44       37        0 
##  9/05/14 
##       24
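
The table above lists every date level from all four tasks (mostly zeros) in text order rather than date order. A minimal sketch of a chronological table, assuming the dates are stored as a factor in dd/mm/yy form as the output suggests; the example values are a small invented subset.

```r
# A few dates in the dd/mm/yy form shown above; the factor also carries
# a level from another task, as ML3$Date appears to
dates <- factor(c("10/05/14", "7/05/14", "9/05/14", "7/05/14", "14/05/14"),
                levels = c("10/05/14", "14/05/14", "19/03/14", "7/05/14", "9/05/14"))

# Drop unused levels, then parse so the table sorts chronologically
parsed <- as.Date(as.character(droplevels(dates)), format = "%d/%m/%y")
counts <- table(parsed)
counts
```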

#check submission time clusters against lecture recording clusters
all[1:2,117:ncol(all)]

##   ML1.previous ML2.planMS ML3.usedMS ML4.planEOS total.no total.yes
## 1           No        Yes         No          No        3         1
## 2           No        Yes         No          No        3         1
##   total.maybe total.noinfo access      pattern prevLR access.days
## 1           0            0     21 No Yes No No     No          15
## 2           0            0     21 No Yes No No     No          15
##   Kay.pattern Kay3 cluster2 ML1earliness ML2earliness ML3earliness
## 1           4    3        1        8.733        163.9        140.3
## 2           4    3        1      139.867        163.9        140.3
##   ML4earliness Ass.mark Ass.earliness Course.grade ass.early.num Cat3 Cat5
## 1        115.9       86   39.18 hours         66.6         39.18    0    0
## 2        115.9       86   39.18 hours         66.6         39.18    0    0
##   Cat3or5 Sum.Cat3and5 MLearliness cluster.all2 cluster.all total.yes.gp
## 1       0            0       107.2            1           1            1
## 2       0            0       140.0            1           1            1

with(all, table(cluster2, cluster.all))

##         cluster.all
## cluster2  1  2
##        1 50 11
##        2 31  2

with(all, table(cluster2, cluster.all2))

##         cluster.all2
## cluster2  1  2
##        1 50 11
##        2 31  2

with(all, tapply(access, cluster2, mean))

##     1     2 
## 63.44 18.94

with(all, tapply(access, cluster.all, mean))

##     1     2 
## 45.58 61.77

with(all, tapply(access, cluster.all2, mean))

##     1     2 
## 45.58 61.77

with(all, tapply(access.days, cluster.all, mean))

##     1     2 
## 17.05 20.08

with(all, tapply(MLearliness, cluster.all, mean))

##      1      2 
## 115.36  20.93

with(all, tapply(ass.early.num, cluster.all, mean))

##     1     2 
## 32.20 10.89

names(all[c(117:120, 125, 132:136, 137, 141:143)])

##  [1] "ML1.previous"  "ML2.planMS"    "ML3.usedMS"    "ML4.planEOS"  
##  [5] "access"        "ML1earliness"  "ML2earliness"  "ML3earliness" 
##  [9] "ML4earliness"  "Ass.mark"      "Ass.earliness" "Cat5"         
## [13] "Cat3or5"       "Sum.Cat3and5"

with(all, tapply(Ass.mark, cluster.all, mean))

##     1     2 
## 84.62 80.54

with(all, tapply(Course.grade, cluster.all, mean))

##     1     2 
## 80.17 68.26

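CONCLUSIONS:
clusters based on ML responses, lecture recording access, earliness, assignment mark and Cat3/5 (cluster.all, group 1 vs group 2) show:
a huge difference in ML earliness (115.4 vs 20.9 hours)
a substantial difference in assignment earliness (32.2 vs 10.9 hours)
no clear difference in assignment mark (84.6 vs 80.5)
a very large difference in course grade (80.2 vs 68.3)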

wilcox.test(Course.grade ~ cluster.all, data=all)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  Course.grade by cluster.all
## W = 931, p-value = 9.634e-06
## alternative hypothesis: true location shift is not equal to 0

checking impact of qual in cluster.all determination

with(all, tapply(Cat3or5, cluster.all, mean))

##      1      2 
## 0.8272 1.0769

with(all, tapply(Sum.Cat3and5, cluster.all, mean))

##     1     2 
## 1.148 1.462

addmargins(with(all, table(ML1.previous, cluster.all)))

##             cluster.all
## ML1.previous  1  2 Sum
##        Maybe 14  2  16
##        No    44  6  50
##        Yes   23  5  28
##        Sum   81 13  94

with(all, table(ML2.planMS, cluster.all))

##           cluster.all
## ML2.planMS  1  2
##      Maybe  4  1
##      No    64  9
##      Yes   12  3

with(all, table(ML3.usedMS, cluster.all))

##           cluster.all
## ML3.usedMS  1  2
##      Maybe  4  0
##      No    61  3
##      Yes   14 10

with(all, table(ML4.planEOS, cluster.all))

##            cluster.all
## ML4.planEOS  1  2
##       Maybe  2  0
##       No    60  7
##       Yes   13  4
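
The tables above give counts only. A minimal sketch of how one of them could be tested and expressed as within-cluster proportions, using the ML3.usedMS counts printed above; Fisher's exact test is an assumption here, chosen because several cells in cluster 2 are small.

```r
# Counts copied from the ML3.usedMS by cluster.all table above
used_ms <- matrix(c( 4,  0,
                    61,  3,
                    14, 10),
                  ncol = 2, byrow = TRUE,
                  dimnames = list(ML3.usedMS = c("Maybe", "No", "Yes"),
                                  cluster.all = c("1", "2")))

# Fisher's exact test copes with the small cell counts in cluster 2
ft <- fisher.test(used_ms)
ft$p.value

# Share of each cluster giving each response
round(prop.table(used_ms, margin = 2), 2)
```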

qual categories and phases

cate = read.csv("strat.csv")
#str(cate)


# De-identify IDs of the form s4123456: keep the 7 digits, compute 2*n + 1, prefix "S"
df = cate
df$StudID = substr(df$StudentID, start=2, stop=8)
df$StudID = as.numeric(df$StudID)
df$StudID = df$StudID*2+1
df$StudID = as.character(df$StudID)
df$StudID = paste("S", df$StudID, sep="")
df$StudentID = df$StudID
df$StudID = NULL
cate = df

cate[1:5,]

##   StudentID strat1 strat2 strat3 strat4 strat5 strat6 strat7 strat8 strat9
## 1  S8152093      0      0      2      0      1      1      1      0      1
## 2  S8469547      0      0      2      1      0      1      1      0      0
## 3  S8522577      0      0      1      0      2      1      1      0      0
## 4  S8533121      0      0      2      0      0      1      0      0      1
## 5  S8575195      0      0      1      1      2      1      2      0      2
##   strat10 foretht perf eval phases
## 1       0       0    4    1      2
## 2       0       0    4    0      1
## 3       0       0    4    0      1
## 4       0       0    2    1      2
## 5       0       0    5    1      2

all = merge(all, cate, by="StudentID")

dim(all)

## [1]  94 161

addmargins(with(all, table(foretht, cluster.all)))

##        cluster.all
## foretht  1  2 Sum
##     0   74 13  87
##     1    6  0   6
##     2    1  0   1
##     Sum 81 13  94

addmargins(with(all, table(foretht, cluster2)))

##        cluster2
## foretht  1  2 Sum
##     0   56 31  87
##     1    4  2   6
##     2    1  0   1
##     Sum 61 33  94

addmargins(with(all, table(perf, cluster.all)))

##      cluster.all
## perf   1  2 Sum
##   1    5  0   5
##   2   18  2  20
##   3   29  4  33
##   4   21  4  25
##   5    8  1   9
##   6    0  2   2
##   Sum 81 13  94

addmargins(with(all, table(perf, cluster2)))

##      cluster2
## perf   1  2 Sum
##   1    3  2   5
##   2   14  6  20
##   3   20 13  33
##   4   16  9  25
##   5    6  3   9
##   6    2  0   2
##   Sum 61 33  94

addmargins(with(all, table(eval, cluster.all)))

##      cluster.all
## eval   1  2 Sum
##   0   27  4  31
##   1   40  8  48
##   2   14  1  15
##   Sum 81 13  94

addmargins(with(all, table(eval, cluster2)))

##      cluster2
## eval   1  2 Sum
##   0   18 13  31
##   1   34 14  48
##   2    9  6  15
##   Sum 61 33  94

addmargins(with(all, table(phases, cluster.all)))

##       cluster.all
## phases  1  2 Sum
##    1   25  4  29
##    2   51  9  60
##    3    5  0   5
##    Sum 81 13  94

addmargins(with(all, table(phases, cluster2)))

##       cluster2
## phases  1  2 Sum
##    1   17 12  29
##    2   40 20  60
##    3    4  1   5
##    Sum 61 33  94

check: 5-cluster and 3-cluster solutions for cluster.all2

cluster.all2.groups.k5 = cutree(cluster.all2, k = 5)
all$cluster.all2k5 = cluster.all2.groups.k5

with(all, tapply(Course.grade, cluster.all2k5, mean))

##     1     2     3     4     5 
## 79.15 80.55 77.99 83.91 68.26

cluster.all2.groups.k3 = cutree(cluster.all2, k = 3)
all$cluster.all2k3 = cluster.all2.groups.k3

with(all, tapply(Course.grade, cluster.all2k3, mean))

##     1     2     3 
## 79.94 80.39 68.26
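
The lowest group mean (68.26) is the same in the 2-, 3- and 5-cluster solutions, which suggests that group stays intact while the larger group splits. A minimal sketch of checking this directly: `cutree()` accepts a vector of k values and returns one membership column per k, which can then be cross-tabulated. The data below are invented.

```r
set.seed(5)
x <- matrix(rnorm(94 * 4), ncol = 4)    # invented data, 94 rows as in `all`
tree <- hclust(dist(x), method = "ward.D")

# One column of memberships per requested number of clusters
memberships <- cutree(tree, k = c(2, 3, 5))
head(memberships)

# Cross-tabulate the 2- and 5-cluster solutions to see how groups split
table(k2 = memberships[, "2"], k5 = memberships[, "5"])
```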

cluster.all2 stats for paper

with(all, tapply(Course.grade, cluster.all2, mean))

##     1     2 
## 80.17 68.26

with(all, tapply(Course.grade, cluster.all2, sem))

## [1] 0.8933
## [1] 1.812

##      1      2 
## 0.8933 1.8121

wilcox.test(Course.grade ~ cluster.all2, data=all)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  Course.grade by cluster.all2
## W = 931, p-value = 9.634e-06
## alternative hypothesis: true location shift is not equal to 0

with(all, tapply(access, cluster.all2, mean))

##     1     2 
## 45.58 61.77

with(all, tapply(access, cluster.all2, sem))

## [1] 4.046
## [1] 6.871

##     1     2 
## 4.046 6.871

wilcox.test(access ~ cluster.all2, data=all)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  access by cluster.all2
## W = 330, p-value = 0.03178
## alternative hypothesis: true location shift is not equal to 0

with(all, tapply(access.days, cluster.all2, mean))

##     1     2 
## 17.05 20.08

with(all, tapply(access.days, cluster.all2, sem))

## [1] 0.967
## [1] 2.077

##     1     2 
## 0.967 2.077

wilcox.test(access.days ~ cluster.all2, data=all)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  access.days by cluster.all2
## W = 401.5, p-value = 0.1722
## alternative hypothesis: true location shift is not equal to 0

all$ML1.previous[1:5]

## [1] No    No    No    Yes   Maybe
## Levels: Maybe No Yes

#with(all, tapply(ML1.previous, cluster.all2, mean))
#with(all, tapply(ML1.previous, cluster.all2, sem))
#wilcox.test(ML1.previous ~ cluster.all2, data=all)
#not numerical

with(all, tapply(total.yes, cluster.all2, mean))

##      1      2 
## 0.7654 1.6923

with(all, tapply(total.yes, cluster.all2, sem))

## [1] 0.1182
## [1] 0.3469

##      1      2 
## 0.1182 0.3469

wilcox.test(total.yes ~ cluster.all2, data=all)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  total.yes by cluster.all2
## W = 304, p-value = 0.008412
## alternative hypothesis: true location shift is not equal to 0

addmargins(with(all, table(ML1.previous, cluster.all2)))

##             cluster.all2
## ML1.previous  1  2 Sum
##        Maybe 14  2  16
##        No    44  6  50
##        Yes   23  5  28
##        Sum   81 13  94

addmargins(with(all, table(ML2.planMS, cluster.all2)))

##           cluster.all2
## ML2.planMS  1  2 Sum
##      Maybe  4  1   5
##      No    64  9  73
##      Yes   12  3  15
##      Sum   80 13  93

addmargins(with(all, table(ML3.usedMS, cluster.all2)))

##           cluster.all2
## ML3.usedMS  1  2 Sum
##      Maybe  4  0   4
##      No    61  3  64
##      Yes   14 10  24
##      Sum   79 13  92

addmargins(with(all, table(ML4.planEOS, cluster.all2)))

##            cluster.all2
## ML4.planEOS  1  2 Sum
##       Maybe  2  0   2
##       No    60  7  67
##       Yes   13  4  17
##       Sum   75 11  86

str(all$ML1earliness)

##  num [1:94] 8.73 139.87 19.87 43.2 117.33 ...

with(all, tapply(ML1earliness, cluster.all2, mean))

##      1      2 
## 107.94  32.47

with(all, tapply(ML1earliness, cluster.all2, sem))

## [1] 5.194
## [1] 7.732

##     1     2 
## 5.194 7.732

wilcox.test(ML1earliness ~ cluster.all2, data=all)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  ML1earliness by cluster.all2
## W = 947, p-value = 4.223e-06
## alternative hypothesis: true location shift is not equal to 0

with(all, tapply(ML2earliness, cluster.all2, mean))

##     1     2 
## 101.3  16.4

with(all, tapply(ML2earliness, cluster.all2, sem))

## [1] 5.697
## [1] 4.362

##     1     2 
## 5.697 4.362

wilcox.test(ML2earliness ~ cluster.all2, data=all)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  ML2earliness by cluster.all2
## W = 973, p-value = 1.035e-06
## alternative hypothesis: true location shift is not equal to 0

with(all, tapply(ML3earliness, cluster.all2, mean))

##      1      2 
## 118.89  19.33

with(all, tapply(ML3earliness, cluster.all2, sem))

## [1] 5.167
## [1] 5.197

##     1     2 
## 5.167 5.197

wilcox.test(ML3earliness ~ cluster.all2, data=all)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  ML3earliness by cluster.all2
## W = 1020, p-value = 6.678e-08
## alternative hypothesis: true location shift is not equal to 0

with(all, tapply(ML4earliness, cluster.all2, mean))

##      1      2 
## 133.28  15.52

with(all, tapply(ML4earliness, cluster.all2, sem))

## [1] 5.958
## [1] 4.307

##     1     2 
## 5.958 4.307

wilcox.test(ML4earliness ~ cluster.all2, data=all)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  ML4earliness by cluster.all2
## W = 1018, p-value = 7.541e-08
## alternative hypothesis: true location shift is not equal to 0

with(all, tapply(MLearliness, cluster.all2, mean))

##      1      2 
## 115.36  20.93

with(all, tapply(MLearliness, cluster.all2, sem))

## [1] 3.52
## [1] 3.298

##     1     2 
## 3.520 3.298

wilcox.test(MLearliness ~ cluster.all2, data=all)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  MLearliness by cluster.all2
## W = 1053, p-value = 8.36e-09
## alternative hypothesis: true location shift is not equal to 0
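
Nine Wilcoxon tests are run against cluster.all2 above. A minimal sketch of adjusting their p-values for multiple comparisons with `p.adjust()`; the choice of Holm's method is an assumption, not something this analysis specifies.

```r
# p-values copied from the Wilcoxon tests on cluster.all2 above
p_raw <- c(Course.grade = 9.634e-06, access = 0.03178, access.days = 0.1722,
           total.yes = 0.008412, ML1earliness = 4.223e-06,
           ML2earliness = 1.035e-06, ML3earliness = 6.678e-08,
           ML4earliness = 7.541e-08, MLearliness = 8.36e-09)

# Holm's method controls the family-wise error rate
p_holm <- p.adjust(p_raw, method = "holm")
signif(cbind(raw = p_raw, holm = p_holm), 3)
```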

115/24

## [1] 4.792

.791667*24

## [1] 19
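
The two calculations above convert about 115 hours of mean ML earliness into days and hours. A minimal sketch of the same conversion in one step; `hours_to_days` is a hypothetical helper name.

```r
# Hypothetical helper: express a number of hours as whole days plus hours
hours_to_days <- function(hours) {
  c(days = hours %/% 24, hours = hours %% 24)
}

hours_to_days(115)   # 4 days and 19 hours
```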

with(all, table(Ass.earliness, cluster.all2))

##                      cluster.all2
## Ass.earliness         1 2
##   -143.96             0 1
##   -0.0652777777777778 1 0
##   0.0736111111111111  1 0
##   0.0897222222222222  1 0
##   0.143055555555556   1 0
##   0.147222222222222   1 0
##   0.203611111111111   1 0
##   0.265277777777778   1 0
##   0.303888888888889   0 1
##   0.516111111111111   0 1
##   0.546111111111111   0 1
##   0.555277777777778   0 1
##   0.681944444444444   1 0
##   0.687777777777778   1 0
##   0.713055555555556   1 0
##   0.771111111111111   1 0
##   0.895555555555556   1 0
##   0.977222222222222   0 1
##   1.51083333333333    0 1
##   1.70527777777778    1 0
##   1.89444444444444    1 0
##   1.91222222222222    1 0
##   2.05                1 0
##   2.40416666666667    1 0
##   2.57666666666667    0 1
##   2.65722222222222    1 0
##   2.66472222222222    1 0
##   2.79722222222222    1 0
##   2.81888888888889    1 0
##   2.9925              1 0
##   3.26722222222222    1 0
##   3.47472222222222    1 0
##   4.72                1 0
##   6.31416666666667    0 1
##   9.57305555555556    1 0
##   9.71305555555556    1 0
##   10.8986111111111    1 0
##   11.6347222222222    1 0
##   11.6375             1 0
##   11.8708333333333    1 0
##   12.3691666666667    1 0
##   12.7441666666667    0 1
##   12.7788888888889    1 0
##   12.9455555555556    1 0
##   13.0108333333333    1 0
##   13.0341666666667    1 0
##   13.2358333333333    1 0
##   13.64               1 0
##   13.7522222222222    1 0
##   14.5788888888889    1 0
##   14.8811111111111    1 0
##   15.5013888888889    1 0
##   15.5069444444444    2 0
##   16.6544444444444    1 0
##   19.9622222222222    0 1
##   21.5005555555556    1 0
##   22.4719444444444    1 0
##   22.9075             0 1
##   23.2275             1 0
##   23.2405555555556    1 0
##   23.6263888888889    1 0
##   24.3558333333333    1 0
##   24.59               1 0
##   26.6661111111111    1 0
##   27.5688888888889    1 0
##   28.4233333333333    1 0
##   37.2447222222222    1 0
##   37.8005555555556    1 0
##   39.1761111111111    2 0
##   40.1525             1 0
##   40.8602777777778    1 0
##   42.7508333333333    1 0
##   43.4602777777778    1 0
##   46.4233333333333    1 0
##   47.0469444444444    1 0
##   47.6402777777778    1 0
##   51.3877777777778    1 0
##   51.4061111111111    1 0
##   71.1552777777778    1 0
##   71.3391666666667    1 0
##   73.3827777777778    1 0
##   75.9794444444444    1 0
##   84.4688888888889    1 0
##   90.9058333333333    1 0
##   119.182222222222    1 0
##   123.219722222222    1 0
##   134.078888888889    1 0
##   137.369722222222    1 0
##   146.389722222222    1 0
##   178.101388888889    1 0
##   189.9875            1 0
##   216.656111111111    0 1

# Figuring out why Ass.earliness throws errors: it is a difftime, not a plain numeric
str(all$Ass.earliness)

## Class 'difftime'  atomic [1:94] 39.1761 39.1761 2.4042 37.2447 0.0897 ...
##   ..- attr(*, "units")= chr "hours"

all$Ass.earliness.num = as.numeric(all$Ass.earliness)
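
`as.numeric()` on a difftime returns the value in whatever units the object currently carries (hours here, per the `str()` output). A minimal sketch of asking for the units explicitly so the numeric copy does not depend on that; the two timestamps are made up for illustration and are not from the study data.

```r
# Illustrative timestamps only (not the study data)
submitted <- as.POSIXct("2014-05-01 09:00:00", tz = "UTC")
due       <- as.POSIXct("2014-05-02 15:30:00", tz = "UTC")

# difftime with units fixed at creation
early <- difftime(due, submitted, units = "hours")

# Explicit units on conversion, so the result is unambiguous
early_hours <- as.numeric(early, units = "hours")
early_days  <- as.numeric(early, units = "days")
```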

with(all, tapply(Ass.earliness, cluster.all2, mean))

##     1     2 
## 32.20 10.89

with(all, tapply(Ass.earliness, cluster.all2, sem))

## [1] 4.693
## [1] 20.76

##      1      2 
##  4.693 20.764

wilcox.test(Ass.earliness.num ~ cluster.all2, data=all)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  Ass.earliness.num by cluster.all2
## W = 754, p-value = 0.01291
## alternative hypothesis: true location shift is not equal to 0

with(all, tapply(Ass.earliness, cluster.all2, median))

##      1      2 
## 14.881  1.511

with(all, boxplot(Ass.earliness.num ~ cluster.all2))

Figure (chunk unnamed-chunk-69): boxplot of Ass.earliness.num by cluster.all2

range(all$Ass.earliness.num)

## [1] -144.0  216.7

sort(all$Ass.earliness.num)

##  [1] -143.96000   -0.06528    0.07361    0.08972    0.14306    0.14722
##  [7]    0.20361    0.26528    0.30389    0.51611    0.54611    0.55528
## [13]    0.68194    0.68778    0.71306    0.77111    0.89556    0.97722
## [19]    1.51083    1.70528    1.89444    1.91222    2.05000    2.40417
## [25]    2.57667    2.65722    2.66472    2.79722    2.81889    2.99250
## [31]    3.26722    3.47472    4.72000    6.31417    9.57306    9.71306
## [37]   10.89861   11.63472   11.63750   11.87083   12.36917   12.74417
## [43]   12.77889   12.94556   13.01083   13.03417   13.23583   13.64000
## [49]   13.75222   14.57889   14.88111   15.50139   15.50694   15.50694
## [55]   16.65444   19.96222   21.50056   22.47194   22.90750   23.22750
## [61]   23.24056   23.62639   24.35583   24.59000   26.66611   27.56889
## [67]   28.42333   37.24472   37.80056   39.17611   39.17611   40.15250
## [73]   40.86028   42.75083   43.46028   46.42333   47.04694   47.64028
## [79]   51.38778   51.40611   71.15528   71.33917   73.38278   75.97944
## [85]   84.46889   90.90583  119.18222  123.21972  134.07889  137.36972
## [91]  146.38972  178.10139  189.98750  216.65611

#with(all, sort(table(cluster.all2, Ass.earliness.num)))

with(all, boxplot(log(Ass.earliness.num) ~ cluster.all2))

## Warning: NaNs produced
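
The NaNs come from the two negative values in the sorted output above (-143.96 and -0.06528), since `log()` is undefined for negatives. One possible workaround, offered only as a sketch and not as the author's method, is a signed log transform that keeps the sign and compresses the magnitude; `signed_log` is a hypothetical helper.

```r
# Hypothetical helper: sign-preserving log transform, defined for negatives and zero
signed_log <- function(x) sign(x) * log1p(abs(x))

# A few values copied from the sorted Ass.earliness.num output above
x <- c(-143.96, -0.06528, 0.07361, 14.88111, 216.65611)

# Plain log gives NaN for the negatives (warning suppressed for the demo)
plain <- suppressWarnings(log(x))

# Signed log keeps every value and preserves the ordering
slog <- signed_log(x)
```

With the numeric column this could be tried as `boxplot(signed_log(Ass.earliness.num) ~ cluster.all2)` inside `with(all, ...)`; the axis is then in signed log1p units rather than hours.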