We focused on answering the question: “What is the trend for students’ interaction with the VLE according to the time of the semester?”
RMIT University reached out to us to help make sense of data retrieved from their students’ interactions. The data set covers the interactions of 32,593 students with their Virtual Learning Environment (VLE). You can download the dataset from here.
We set out to understand the dataset, particularly the trends in students’ interactions with the VLE system according to the time of the year/semester. We focused on the subjects AAA and CCC for the years 2013 and 2014. In this report, we discuss our hypotheses, our methodology, our findings and conclusion, and the factors that may have affected the results.
For this analysis, we used the studentVle.csv file, which includes the course period (code_presentation), the student id, the date, the sum of clicks (sum_click), the code module (code_module) and the site id. The packages used were ggplot2, dplyr and knitr (for its kable() function). We’ll outline the method used to obtain the results for subject AAA; the process was the same for all subjects.
First, we used dplyr::filter() to filter the data on the code_module column, so that only the clicks registered in course AAA, across all the semesters it ran in, remained.
studentVle2 <- studentVle %>% filter(code_module == "AAA")
Then we used dplyr::group_by() and dplyr::tally() to group the rows by date and semester and to sum the clicks for each date in the course. We then used ggplot2::geom_line() to visualise the result as a line graph showing the sum of clicks per date in the course.
Vle <- studentVle2 %>% group_by(date, code_presentation) %>% tally(sum_click)
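To make the filtering and aggregation steps concrete, here is a minimal sketch on a made-up data frame. The name toy_vle, the presentation labels and all click values are assumptions for illustration only; they are not taken from the dataset.

```r
library(dplyr)

# Toy stand-in for studentVle; every value here is invented for illustration
toy_vle <- tibble(
  code_module       = c("AAA", "AAA", "AAA", "AAA", "CCC"),
  code_presentation = c("2013J", "2013J", "2013J", "2014J", "2014J"),
  date              = c(-10, -10, 19, 19, 19),
  sum_click         = c(4, 6, 3, 5, 7)
)

daily_clicks <- toy_vle %>%
  filter(code_module == "AAA") %>%        # keep only module AAA
  group_by(date, code_presentation) %>%   # one group per day and semester
  tally(sum_click)                        # n = total of sum_click, not a row count

print(daily_clicks)
```

Note that tally() with a weight column adds up that column, so n is the total number of clicks on each day rather than the number of rows.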
We changed the size and angle of the x-axis text and added titles to the graph’s x and y axes. Finally, we used ggplot2::facet_wrap() to split the data into two separate graphs, showing how many clicks were recorded in each semester the course ran.
library(dplyr)
library(ggplot2)
# Keep only the clicks recorded for module AAA
studentVle2 <- studentVle %>% filter(code_module == "AAA")
# Sum the clicks for each day within each presentation (semester)
Vle <- studentVle2 %>% group_by(date, code_presentation) %>% tally(sum_click)
ggplot(Vle) + geom_line(aes(x = date, y = n)) +
  scale_x_continuous(breaks = c(19, 54, 117, 166, 215, 250)) +
  theme(axis.text.x = element_text(size = 12, angle = 45, hjust = 1)) +
  labs(title = "Students' click sum in course AAA", x = "Date of assessments in the course", y = "Click sum") +
  facet_wrap(~ code_presentation)
# Keep only the clicks recorded for module CCC
studentVle2 <- studentVle %>% filter(code_module == "CCC")
# Sum the clicks for each day within each presentation (semester)
Vle <- studentVle2 %>% group_by(date, code_presentation) %>% tally(sum_click)
# colour is set outside aes() so the line is drawn in red without adding a legend
ggplot(Vle) + geom_line(aes(x = date, y = n), colour = "red") +
  scale_x_continuous(breaks = c(19, 54, 117, 166, 215, 250)) +
  theme(axis.text.x = element_text(size = 12, angle = 45, hjust = 1)) +
  labs(title = "Students' sum of clicks in course CCC", x = "Date of assessments in the course", y = "Sum of clicks") +
  facet_wrap(~ code_presentation)
Student syndrome is a real issue that affects a lot of people. It refers to planned procrastination, where a person only applies themselves to a task at the last possible moment before the deadline (Blackstone, Cox, & Schleier, 2009). Based on this widely presumed issue, we as a team hypothesised that we’d see dramatic peaks within the data. That is, students’ interaction level would mostly peak at certain points in time, each corresponding to an assessment deadline.
We also hypothesised that there’d be a large number of clicks at the start of the course, as students would likely interact with the system to get an understanding of what is expected.
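One way the deadline hypothesis could be checked numerically is sketched below. It is not part of our analysis: the 3-day window, the helper is_near_deadline() and the daily totals are all assumptions made for illustration. Only the assessment days come from the axis breaks used in our plots.

```r
library(dplyr)

# Assessment days, taken from the axis breaks used in the plots
assessment_days <- c(19, 54, 117, 166, 215, 250)
window <- 3  # assumption: "near a deadline" means on the day or up to 3 days before

# Invented daily click totals, for illustration only
daily <- tibble(
  date   = c(10, 17, 19, 30, 52, 54, 80),
  clicks = c(100, 300, 500, 120, 280, 450, 90)
)

# Hypothetical helper: TRUE when a day is on or within `window` days before an assessment
is_near_deadline <- function(day) {
  gap <- assessment_days - day
  any(gap >= 0 & gap <= window)
}

comparison <- daily %>%
  mutate(near_deadline = vapply(date, is_near_deadline, logical(1))) %>%
  group_by(near_deadline) %>%
  summarise(mean_clicks = mean(clicks), days = n())

print(comparison)
```

If the hypothesis holds, the mean for days near a deadline should be clearly higher than for the remaining days.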
Possible things that could have affected the result:
A major factor that could affect the results is students downloading the content to access it offline. We are exploring how often students interact with the VLE system at a particular time, and a student with offline copies of the materials has less need to return to the system, so their recorded clicks may understate their actual engagement. For this exploration of the data we ignore this issue, while recognising that the results may have been affected by it.
Another factor is that we didn’t analyse all the subjects. Due to time constraints, we opted to limit the analysis to two subjects instead of the seven provided. The team did this by selecting individual modules. The code can be seen below.
studentVle2 <- studentVle %>% filter(code_module == "AAA")
studentVle2 <- studentVle %>% filter(code_module == "CCC")
For the findings, we analyse each subject separately, in the following order: AAA, then CCC. We also analyse only two semesters of each course.
For this subject, it seems our team’s hypotheses were not entirely accurate. We hypothesised that we would see dramatic peaks and troughs throughout the semester, which did not appear. However, we do see a high number of clicks at the beginning of the subject, which is in line with our hypotheses.
For the years 2013 and 2014, subject AAA seemed to follow the same trend. If you examine the graph, you’ll see similar peaks and troughs. For example, on day 19/20 there is a peak, followed by a dramatic trough and then another peak at around the 50th day. There is also a major drop in clicks at around the 80th day.
For the year 2013, about 10 days before the course starts (-10), approximately 12,000 clicks were recorded. This is the highest number of clicks for the entire semester. As mentioned above, students could have downloaded a lot of content to access offline, which may be why there aren’t as many clicks later in the semester. The year 2014 shows a similar trend, but with the clicks first starting twenty to twenty-five days (-20/-25) before the course starts. As you can see from the graph above, the stream of clicks starts at the -20 mark, indicating twenty days before the subject started.
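Readings like these can also be taken from the aggregated data rather than by eye. The sketch below finds the busiest day in each semester; the data frame toy_daily, the presentation labels and the click values are invented for illustration, loosely echoing the shape described above.

```r
library(dplyr)

# Invented daily totals per semester, for illustration only
toy_daily <- tibble(
  code_presentation = c("2013J", "2013J", "2013J", "2014J", "2014J", "2014J"),
  date              = c(-10, 19, 80, -20, 20, 80),
  clicks            = c(12000, 4000, 900, 9000, 5000, 1100)
)

# For each semester, keep the row with the highest click total
peak_days <- toy_daily %>%
  group_by(code_presentation) %>%
  slice_max(clicks, n = 1) %>%
  ungroup()

print(peak_days)
```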
For both years we see a rapid drop after the initial spike in clicks before the subject begins. The high number of clicks before the subject starts may be the result of the VLE system becoming available to students, who may not have had access beforehand. As the semester progresses, we see an influx of clicks around 10 to 20 days after the subject begins. After this short influx, the number of clicks stays fairly steady until about 80 days after the subject started. There is then a drop in the number of clicks, which could be due to a break during the semester. As expected, there are slight rises and dips, but nothing drastic until the end of the semester, where the change may be due to the exams.
# Keep only the clicks recorded for module AAA
studentVle2 <- studentVle %>% filter(code_module == "AAA")
# Sum the clicks for each day within each presentation (semester)
Vle <- studentVle2 %>% group_by(date, code_presentation) %>% tally(sum_click)
ggplot(Vle) + geom_line(aes(x = date, y = n)) +
  scale_x_continuous(breaks = c(19, 54, 117, 166, 215, 250)) +
  theme(axis.text.x = element_text(size = 12, angle = 45, hjust = 1)) +
  labs(title = "Students' click sum in course AAA", x = "Date of assessments in the course", y = "Click sum") +
  facet_wrap(~ code_presentation)
For this subject, it seems our team’s hypotheses were quite accurate. We hypothesised that we would see dramatic peaks and troughs throughout the semester, which is what we see in subject CCC’s graphs. We also see a high number of clicks at the beginning of the subject, which further supports our hypothesis.
The trend is almost identical in both semesters. It starts off with a high number of clicks at the beginning of the course and then declines rapidly. For most of the semester, it is easy to tell when something was due or when there may have been an online test: if you follow the peaks, it’s evident that something caused the students to click far more often than usual at those points.
# Keep only the clicks recorded for module CCC
studentVle2 <- studentVle %>% filter(code_module == "CCC")
# Sum the clicks for each day within each presentation (semester)
Vle <- studentVle2 %>% group_by(date, code_presentation) %>% tally(sum_click)
# colour is set outside aes() so the line is drawn in red without adding a legend
ggplot(Vle) + geom_line(aes(x = date, y = n), colour = "red") +
  scale_x_continuous(breaks = c(19, 54, 117, 166, 215, 250)) +
  theme(axis.text.x = element_text(size = 12, angle = 45, hjust = 1)) +
  labs(title = "Students' sum of clicks in course CCC", x = "Date of assessments in the course", y = "Sum of clicks") +
  facet_wrap(~ code_presentation)
We hypothesised that students’ interaction with the VLE system would be much greater at the beginning of the semester, followed by peaks and troughs throughout the remainder of the semester. Our findings above show that our hypotheses were partially accurate. One of the subjects analysed (CCC) behaved as we expected and followed the trend of our hypotheses. AAA, on the other hand, supported only part of our hypotheses: there weren’t many dramatic peaks other than at the beginning and end of the subject.
For a firmer conclusion, we’d need to analyse more of the subjects to reduce any bias that may be present in the two subjects above. Incorporating more data would also make the analysis better justified.
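As a starting point for that wider analysis, the same steps could be applied to every module at once by adding code_module to the grouping instead of filtering one module at a time. The sketch below is only an illustration under assumptions: toy_vle, the module and presentation labels and the click values are made up, and the real studentVle data would take its place.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for studentVle; all values are invented for illustration
one_student <- tibble(
  code_module       = rep(c("AAA", "BBB", "CCC"), each = 4),
  code_presentation = rep(c("2013J", "2013J", "2014J", "2014J"), times = 3),
  date              = rep(c(0, 7), times = 6),
  sum_click         = 1:12
)
# Two identical made-up students, so each daily total sums two rows
toy_vle <- bind_rows(one_student, one_student)

# Total clicks per module, semester and day, with no per-module filter
all_daily <- toy_vle %>%
  group_by(code_module, code_presentation, date) %>%
  summarise(total_clicks = sum(sum_click), .groups = "drop")

# One panel per module and semester; free y scales because module sizes differ
p <- ggplot(all_daily, aes(x = date, y = total_clicks)) +
  geom_line() +
  facet_wrap(~ code_module + code_presentation, scales = "free_y") +
  labs(x = "Day of the course", y = "Sum of clicks")

print(p)
```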