Intan Dea Yutami
22 February 2018
This application is based on the Open University Learning Analytics Dataset, obtained from: https://archive.ics.uci.edu/ml/datasets/Open+University+Learning+Analytics+dataset#
The dataset contains data about courses, students, and their interactions with the Virtual Learning Environment (VLE) for seven selected courses (called modules).
This application reads the file studentInfo.csv, which stores the final results of the students enrolled in those seven courses.
The first rows of studentInfo.csv, with its columns, are shown below.
studInfo <- read.csv("studentInfo.csv")
head(studInfo)
##   code_module code_presentation id_student gender               region
## 1         AAA             2013J      11391      M  East Anglian Region
## 2         AAA             2013J      28400      F             Scotland
## 3         AAA             2013J      30268      F North Western Region
## 4         AAA             2013J      31604      F    South East Region
## 5         AAA             2013J      32885      F West Midlands Region
## 6         AAA             2013J      38053      M                Wales
##       highest_education imd_band age_band num_of_prev_attempts
## 1      HE Qualification  90-100%     55<=                    0
## 2      HE Qualification   20-30%    35-55                    0
## 3 A Level or Equivalent   30-40%    35-55                    0
## 4 A Level or Equivalent   50-60%    35-55                    0
## 5    Lower Than A Level   50-60%     0-35                    0
## 6 A Level or Equivalent   80-90%    35-55                    0
##   studied_credits disability final_result
## 1             240          N         Pass
## 2              60          N         Pass
## 3              60          Y    Withdrawn
## 4              60          N         Pass
## 5              60          N         Pass
## 6              60          N         Pass
This tab shows the relationship between the number of students passing the assessment and two other variables selected by the user. The user also enters a sample size, which is used to draw a random subset of the dataset.
A student is considered to have passed if the final result is either ‘Pass’ or ‘Distinction’. Below is an example plot built from 1,000 randomly sampled rows of the dataset.
Choose a sample size in the left panel, then select two variables to explore the relationship.
library(dplyr)    # group_by(), summarize(), %>%
library(ggplot2)  # ggplot() and its layers
# Draw 1000 random rows and flag passing students (Pass or Distinction)
new_df <- studInfo[sample(nrow(studInfo), 1000), ]
new_df$isPassed <- ifelse(new_df$final_result %in% c("Distinction", "Pass"), 1, 0)
new_df$gender <- as.factor(new_df$gender)
new_df$region <- as.factor(new_df$region)
# Count passing students for each gender/region combination
sum_table <- group_by(new_df, gender, region) %>% summarize(isPassed = sum(isPassed))
p <- ggplot(sum_table, aes(region, isPassed, fill = region)) + geom_bar(stat = "identity")
input_x1 <- toupper("Region")  # legend title
p <- p + ggtitle("Total Passing Students by Gender and Region") +
  facet_grid(. ~ gender) +
  scale_x_discrete(labels = abbreviate) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(y = "Total Students Passed", x = "") +
  guides(fill = guide_legend(title = input_x1))
print(p)
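The sampling and pass-counting steps above can be generalized to any sample size and any pair of grouping columns, which is what the tab's inputs suggest. The following is a minimal base R sketch under stated assumptions, not the application's actual code: the helper name `count_passed()` and the toy data frame are hypothetical and exist only for illustration.

```r
# Hypothetical helper: sample n rows, flag passing students, and count
# them for each combination of two grouping columns.
# Assumes `df` has a `final_result` column, as studentInfo.csv does.
count_passed <- function(df, n, var1, var2) {
  stopifnot(n <= nrow(df),
            all(c(var1, var2, "final_result") %in% names(df)))
  smp <- df[sample(nrow(df), n), , drop = FALSE]
  # A student passes with either "Pass" or "Distinction"
  smp$isPassed <- as.integer(smp$final_result %in% c("Pass", "Distinction"))
  aggregate(smp["isPassed"], by = smp[c(var1, var2)], FUN = sum)
}

# Toy rows for illustration only (not taken from the dataset)
toy <- data.frame(
  gender = c("M", "F", "F", "M", "F", "M"),
  region = c("Wales", "Wales", "Scotland", "Scotland", "Wales", "Wales"),
  final_result = c("Pass", "Distinction", "Withdrawn", "Fail", "Pass", "Pass"),
  stringsAsFactors = FALSE
)

# Using every row makes the counts independent of the random draw
res <- count_passed(toy, n = nrow(toy), var1 = "gender", var2 = "region")
print(res)
```

With the real data, the call would look like `count_passed(studInfo, 1000, "gender", "region")`, and the returned table could be passed to `ggplot()` in place of `sum_table`.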
This tab shows the relationship between the final result of the assessment and a variable picked by the user, for a specific course. Below is an example plot.
To see the plot, choose a course and a variable to relate to the final result.
# Keep only the students of course CCC and cross-tabulate result by education
infoModule <- studInfo[studInfo$code_module == "CCC", ]
count_res <- table(infoModule$final_result, infoModule$highest_education)
# One color per final_result value
colorList <- c("darkblue", "red", "darkgreen", "gold")
# Widen the bottom and right margins; xpd = TRUE lets the legend sit outside the plot region
par(mar = c(10.1, 4.1, 2.1, 8.1), xpd = TRUE)
barplot(count_res,
        main = "Final Result Distribution by Highest Education from course CCC",
        ylab = "Number of Students", las = 2,
        col = colorList)
legend("topright", inset = c(-0.1, 0),
       legend = rownames(count_res),
       fill = colorList)
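The filter-then-tabulate step behind this plot can be written once for any course and any column. The sketch below is an assumption about how that might look, not the application's actual code: `result_by()` and the toy rows are hypothetical names and data used only for illustration.

```r
# Hypothetical helper: cross-tabulate final_result against one column
# for a single course (module).
result_by <- function(df, module, var) {
  stopifnot(all(c("code_module", "final_result", var) %in% names(df)))
  sub <- df[df$code_module == module, , drop = FALSE]
  table(sub$final_result, sub[[var]])
}

# Toy rows for illustration only (not taken from the dataset)
toy <- data.frame(
  code_module = c("CCC", "CCC", "CCC", "AAA", "CCC"),
  highest_education = c("HE Qualification", "A Level or Equivalent",
                        "HE Qualification", "HE Qualification",
                        "A Level or Equivalent"),
  final_result = c("Pass", "Fail", "Pass", "Pass", "Withdrawn"),
  stringsAsFactors = FALSE
)

counts <- result_by(toy, "CCC", "highest_education")
print(counts)

# Column-wise shares make groups of different sizes comparable
shares <- prop.table(counts, margin = 2)
print(shares)
```

The `counts` table has the same shape as `count_res`, so it can be passed directly to `barplot()`. Plotting `shares` instead would show the proportion of each result within every group rather than raw counts.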
GitHub: https://github.com/intandeay/ddp_oulad
Shiny application: https://intandea.shinyapps.io/ddp_oulad/
Open University Learning Analytics Dataset: https://analyse.kmi.open.ac.uk/open_dataset