Goal:
Give an overview of the distance learning classes currently offered at Austin Community College (ACC). This is done in several steps:
Step 1:
Scraped ACC’s distance learning webpage to acquire:
- The broad categories of subjects that are being taught and
- The complete list of courses offered.
# `cat` holds the raw scraped category headings
trimws(cat[5], which = "both")  # inspect one of the raw headings
Categories <- cat[-c(1, 5, 10, 12, 14, 37, 60)]  # drop the rows that were scraped incorrectly
# Some raw headings still contain \r, \n and \t characters from the page layout
# (e.g. "Social Media Communication: Degree & Certificate Courses"); strip them
Categories <- gsub("[\r\n\t]", "", Categories)
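The cleaning steps above (drop the mis-scraped rows, strip layout characters, list alphabetically) can be sketched end to end in base R. The input vector and the `bad_rows` positions are hypothetical toy values, not the ACC data, and `clean_categories()` is an illustrative helper; its extra `trimws()` pass is an addition that the original code does not perform.

```r
# Toy stand-in for the scraped category headings (hypothetical values, not ACC data)
raw <- c("Menu", "\r\n\tAccounting\r\n\t", "  Biology ", "Sign in",
         "\r\n\tArt\r\n\t", "Chemistry")

# Positions of entries assumed to be scraped incorrectly (chosen by inspection)
bad_rows <- c(1, 4)

# Hypothetical helper: drop bad rows, strip layout characters, sort alphabetically
clean_categories <- function(x, drop) {
  x <- x[-drop]                  # drop mis-scraped entries
  x <- gsub("[\r\n\t]", "", x)   # strip carriage returns, newlines and tabs
  x <- trimws(x)                 # remove leftover leading/trailing spaces
  sort(x)                        # alphabetical order
}

categories <- clean_categories(raw, bad_rows)
categories
length(categories)  # number of categories found
```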
Findings:
There are approximately 55 course categories currently offered. The complete list, in alphabetical order, is shown below:
# Alphabetical list of the cleaned categories
categClean2 <- data.frame(Category = sort(Categories))
categClean2
Step 2:
Taking the exploration one step further, ACC’s website is scraped again, this time in more detail.
# Scrape the course titles (each title heads one or more sections)
library(rvest)  # read_html(), html_nodes(), html_text(); also re-exports %>%
acc <- read_html('http://www6.austincc.edu/schedule/index.php?op=browse&opclass=ViewSched_location&term=218S000&ct=CC&locationid=DIL&reporting_year=2018',
                 options = "HUGE")
classes <- acc %>%
  html_nodes("h4 a") %>%  # links inside <h4> headings hold the course titles
  html_text()
head(classes)
# Save the scraped data to a CSV file so there is a stored copy of the dataset
#write.table(classes, file = "classes.csv", sep = ",", row.names = FALSE)
# Import the stored dataset, included in the zip folder as registration.csv
# (the path assumes the file is in the current working directory)
registration <- read.csv("registration.csv")
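The `html_nodes("h4 a")` call above selects every link inside an `<h4>` heading. As a network-free illustration, the sketch below approximates that selection with a base R regular expression on a small, hypothetical HTML fragment. The markup is invented, and a regex only copes with simple, predictable markup like this; for the real page, the rvest approach above remains the right tool.

```r
# Toy HTML fragment mimicking a schedule page (hypothetical markup, not ACC's)
page <- paste0(
  "<h4><a href='#c1'>English Composition I</a></h4>",
  "<p>Section 001</p>",
  "<h4><a href='#c2'>English Composition II</a></h4>",
  "<a href='#x'>Not a course</a>"
)

# Match an <h4> tag immediately followed by an <a> tag and capture the link text;
# in this simple markup that is roughly what the CSS selector "h4 a" picks out
pattern <- "<h4[^>]*>\\s*<a[^>]*>([^<]*)</a>"
matches <- regmatches(page, gregexpr(pattern, page, perl = TRUE))[[1]]

# Keep only the captured link text of each match
course_titles <- trimws(sub(pattern, "\\1", matches, perl = TRUE))
course_titles
```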
Findings:
There are about 430 different courses with a total of 754 course sections. The course with the most sections is “English Composition II”, with 11.
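Counts like these come from tabulating one row per section. A minimal base R sketch, assuming a data frame with a `course_name` column and one row per section; the `sections` data below are hypothetical, and this is only one way an object such as `class_with_most_sections` could be derived, not necessarily how it was built here.

```r
# Hypothetical section-level records: one row per course section (toy data)
sections <- data.frame(
  course_name = c("English Composition II", "English Composition II",
                  "English Composition II", "Biology I", "Biology I",
                  "US History I"),
  section_id  = c("001", "002", "003", "001", "002", "001"),
  stringsAsFactors = FALSE
)

sections_per_course <- table(sections$course_name)   # number of sections per course

n_courses  <- length(sections_per_course)            # number of distinct courses
n_sections <- sum(sections_per_course)               # total number of sections
most       <- names(which.max(sections_per_course))  # course with the most sections

c(courses = n_courses, sections = n_sections)
most
max(sections_per_course)
```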
class_with_most_sections$course_name
library(dplyr)  # group_by(), summarise(), %>%
# total enrollment and total capacity for each course, summed across its sections
d3 <- as.data.frame(final %>%
  group_by(course_name, id) %>%
  summarise(total_enrolled = sum(enrolled), total_capacity = sum(class_size)))
View(d3)
# d4: d3 sorted from largest to smallest total enrollment
d4 <- d3[order(-d3$total_enrolled), ]
View(d4)
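The dplyr aggregation above sums enrollment and capacity over each course's sections and then sorts by total enrollment. The same idea can be sketched in base R with `aggregate()`, on a hypothetical `final_toy` data frame that assumes the `course_name`, `enrolled` and `class_size` columns used above. It groups by course name only, since the meaning of `id` is not shown here.

```r
# Hypothetical section-level enrollment data (toy values, not ACC data)
final_toy <- data.frame(
  course_name = c("Biology I", "Biology I", "Art History", "Calculus I"),
  enrolled    = c(20, 25, 30, 10),
  class_size  = c(30, 30, 35, 25),
  stringsAsFactors = FALSE
)

# Sum enrollment and capacity over all sections of each course
totals <- aggregate(cbind(enrolled, class_size) ~ course_name,
                    data = final_toy, FUN = sum)
names(totals) <- c("course_name", "total_enrolled", "total_capacity")

# Sort from largest to smallest total enrollment
totals <- totals[order(-totals$total_enrolled), ]
totals
```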
The top 10% of courses (43 courses) by total enrollment across all sections are listed below:
# keep the top 10% (430 * 10 / 100 = 43) of courses by total registered students
s <- head(d4, n = 430 * 10 / 100)
s
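Taking the top 10% amounts to computing 10% of the number of courses and keeping that many rows of the sorted table; with 430 courses this gives 430 * 10 / 100 = 43. A sketch on hypothetical sorted totals, where `ceiling()` is an added safeguard (not in the original) so that a course count that is not a multiple of 10 still rounds up to a whole number of rows:

```r
# Hypothetical per-course totals, already sorted by enrollment (toy data)
totals_sorted <- data.frame(
  course_name    = paste("Course", 1:20),
  total_enrolled = seq(400, 20, by = -20),
  stringsAsFactors = FALSE
)

# Number of rows that make up the top 10%, rounded up to a whole row
n_top <- ceiling(nrow(totals_sorted) * 10 / 100)
top_courses <- head(totals_sorted, n = n_top)
top_courses

ceiling(430 * 10 / 100)  # with 430 courses, the top 10% is 43 rows
```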
I plot the 15 courses with the largest total capacity, showing for each its total enrollment and its total capacity:
library(reshape2)  # melt()
library(ggplot2)
# the 15 courses with the largest total capacity
d3a <- head(d3[order(d3$total_capacity, decreasing = TRUE), ], 15)
# keep course_name, total_enrolled and total_capacity, then reshape to long format
d3.plottable <- d3a[, c("course_name", "total_enrolled", "total_capacity")]
d3.plottable <- melt(d3.plottable, id.vars = "course_name")
g <- ggplot(d3.plottable, aes(x = course_name, y = value)) +
  geom_bar(aes(fill = variable), position = position_dodge(), stat = "identity") +
  coord_flip() +
  theme(legend.position = "top") +
  labs(x = "Course Name", y = "Number of Students")
g
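`melt()` turns the two total columns into long format (`course_name`, `variable`, `value`) so that ggplot2 can draw one dodged bar per variable. The same reshape can be sketched with base R's `stack()`; the `wide` data frame is a hypothetical stand-in for `d3a`.

```r
# Hypothetical per-course totals in wide format (toy data)
wide <- data.frame(
  course_name    = c("Biology I", "Art History"),
  total_enrolled = c(45, 30),
  total_capacity = c(60, 35),
  stringsAsFactors = FALSE
)

# stack() concatenates the selected columns one after another and records the
# source column in 'ind'; course names are repeated to line up with that order
value_cols <- c("total_enrolled", "total_capacity")
long <- stack(wide[, value_cols])
long$course_name <- rep(wide$course_name, times = length(value_cols))

# Rename and reorder to match melt()'s default output columns
names(long)[names(long) == "values"] <- "value"
names(long)[names(long) == "ind"] <- "variable"
long <- long[, c("course_name", "variable", "value")]
long
```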

