Open University Learning Analytics Dataset Exploration

Intan Dea Yutami

22 Februari 2018

Brief Description of App

This application is based on Open University Learning Dataset. The dataset is taken from: https://archive.ics.uci.edu/ml/datasets/Open+University+Learning+Analytics+dataset#

The dataset contains data about courses, students and their interactions with Virtual Learning Environment (VLE) for seven selected courses (called modules).

This application reads the file: studentInfo.csv, which stores the information of final result of the assessment from the seven courses, done by students enrolled in those courses.

The following result is the columns inside studentInfo.csv.

#studInfo <- read.csv("studentInfo.csv")
head(studInfo)
##   code_module code_presentation id_student gender               region
## 1         AAA             2013J      11391      M  East Anglian Region
## 2         AAA             2013J      28400      F             Scotland
## 3         AAA             2013J      30268      F North Western Region
## 4         AAA             2013J      31604      F    South East Region
## 5         AAA             2013J      32885      F West Midlands Region
## 6         AAA             2013J      38053      M                Wales
##       highest_education imd_band age_band num_of_prev_attempts
## 1      HE Qualification  90-100%     55<=                    0
## 2      HE Qualification   20-30%    35-55                    0
## 3 A Level or Equivalent   30-40%    35-55                    0
## 4 A Level or Equivalent   50-60%    35-55                    0
## 5    Lower Than A Level   50-60%     0-35                    0
## 6 A Level or Equivalent   80-90%    35-55                    0
##   studied_credits disability final_result
## 1             240          N         Pass
## 2              60          N         Pass
## 3              60          Y    Withdrawn
## 4              60          N         Pass
## 5              60          N         Pass
## 6              60          N         Pass

Tab 1 : General Passing Rate

This tab is basically showing the relationship between number of students passing the assessment and two other variables selected by the user. User also enters a specific number as sample size to subset the dataset.

Student is considered passed if his/her final result is either ‘Passed’ or ‘Distinction’. Below is the example of plot from 500 random samples taken from the dataset.

Just choose a number to set as sample size on the left panel, and select two other variables to explore the relationship!

    new_df <- studInfo[sample(nrow(studInfo), 1000), ]
    new_df$isPassed <- ifelse(new_df$final_result == "Distinction" |
                                            new_df$final_result == "Pass", 1, 0)
    new_df$gender <- as.factor(new_df$gender)
    new_df$region <- as.factor(new_df$region)
    sum_table <- group_by(new_df, gender, region) %>% summarize(isPassed = sum(isPassed))
    
    p <- ggplot(sum_table, aes(region, isPassed, fill = region)) + geom_bar(stat = "identity")
    input_x1 <- toupper("Region")
    p <- p + ggtitle("Total Passing Students by Gender and Region ") + facet_grid( . ~ gender) + scale_x_discrete(labels = abbreviate) + theme(axis.text.x = element_text(angle = 90, hjust = 1))  + labs(y = "Total Students Passed", x = "") + guides(fill=guide_legend(title=paste("", input_x1)))
    print(p)

Tab 2 : Course-Based Passing Result

This tab is showing the relationship between final result of the assessment and a variable picked by user from a specific course. Below is an example of the plot.

To see the plot, choose a course and a parameter to relate to final result of the assessment.

    infoModule <- studInfo[studInfo$code_module == "CCC",]
    count_res <- table(infoModule$final_result, infoModule$highest_education)
    colorList <- c("darkblue", "red", "darkgreen", "gold")                                      
    par(mar=c(10.1, 4.1, 2.1, 8.1), xpd=TRUE)
    barplot(count_res, 
            main="Final Result Distribution by Highest Education from course CCC", 
            ylab = "Number of Students", las = 2,
            col=colorList
    )
    legend("topright", inset=c(-0.1,0),
           legend = rownames(count_res),
           fill = colorList
    )