For this project I chose to find data on the five most popular history classes offered here at Sewanee. I am very interested in history and am thinking of minoring in it. This matters to me because it shows which classes I should consider taking when I register.
I acquired all of this data from the University Registrar as a CSV file. I loaded the tidyverse and read the file with read_csv():
library(tidyverse)
course_data_raw <- read_csv("data/course_data.csv")
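Before filtering, it can help to confirm that the file has the columns the later steps rely on. The sketch below writes a tiny hypothetical CSV to a temporary file (the course titles and numbers are invented, not registrar data), reads it back with read_csv(), and stops with a message if any of the five columns used later is missing. The names toy_raw, needed, and missing_cols are made up for this illustration.

```r
library(readr)

# Hypothetical stand-in for data/course_data.csv; all values are invented
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "title,subj,limit,enrolled,available",
  "Course A,HIST,25,24,1",
  "Course B,ENGL,20,12,8",
  "Course C,HIST,30,30,0"
), tmp)

toy_raw <- read_csv(tmp, show_col_types = FALSE)

# Columns that the filter, select, and mutate steps depend on
needed <- c("title", "subj", "limit", "enrolled", "available")
missing_cols <- setdiff(needed, names(toy_raw))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}
```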
For this project I needed to keep only a few of the columns (variables) and only the rows for history courses.
## Filter
I used filter() to keep only the rows whose subject is History (subj == "HIST").
more_filter <- filter(course_data_raw, subj == "HIST")
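A minimal sketch of what filter() does, on an invented tibble and assuming the registrar codes history exactly as "HIST". The lowercase "hist" row is a made-up example showing that == is case-sensitive.

```r
library(dplyr)
library(tibble)

# Invented rows; only the subj column matters here
toy <- tibble(
  title = c("Course A", "Course B", "Course C", "Course D"),
  subj  = c("HIST", "ENGL", "HIST", "hist")
)

# filter() keeps rows where the condition is TRUE; == is case-sensitive,
# so the lowercase "hist" row is dropped along with "ENGL"
hist_only <- filter(toy, subj == "HIST")
hist_only$title
```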
## Select
I also realized I had far too many columns, so I used select() to shrink my table down to only the variables I needed.
ten_select <- select(more_filter, title, subj, limit, enrolled, available)
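To illustrate select() on its own, here is a small sketch with an invented table. The extra columns crn and room are hypothetical placeholders for the registrar columns that get dropped; select() keeps only the named columns, in the order listed.

```r
library(dplyr)
library(tibble)

# Invented table; crn and room are hypothetical columns we do not need
toy <- tibble(
  crn       = c(101, 102),
  title     = c("Course A", "Course C"),
  subj      = c("HIST", "HIST"),
  room      = c("Room 1", "Room 2"),
  limit     = c(25, 30),
  enrolled  = c(24, 30),
  available = c(1, 0)
)

# select() keeps only the named columns, in the order given
slim <- select(toy, title, subj, limit, enrolled, available)
names(slim)
```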
## Arrange and Slice
I then used arrange() and slice() to narrow the data down to the most popular classes. Sorting in descending order with desc() puts the largest enrollments at the top, which makes the table easier to read. I originally tried two separate arrange() calls, one on enrolled and one on available, but a second arrange() re-sorts the whole table and undoes the first. A single call with both keys sorts by enrollment first and uses the fewest open seats to break ties. slice() then keeps the first ten rows.
more_arrange <- arrange(ten_select, desc(enrolled), available)
ten_slice <- slice(more_arrange, 1:10)
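The difference between one arrange() call with two keys and two chained arrange() calls is easiest to see on a few invented rows. This sketch assumes nothing about the real registrar data; slice_head(n = 2) is shown as an alternative way to write slice(1:2).

```r
library(dplyr)
library(tibble)

# Invented numbers; Course B and Course C tie on enrolled
toy <- tibble(
  title     = c("Course A", "Course B", "Course C", "Course D"),
  enrolled  = c(20, 30, 30, 10),
  available = c(5, 2, 0, 15)
)

# One call, two keys: most enrolled first, ties broken by fewest open seats
one_call <- arrange(toy, desc(enrolled), available)
one_call$title

# Two calls: the second arrange() re-sorts every row by available,
# so it decides the final order
two_calls <- toy |> arrange(desc(enrolled)) |> arrange(desc(available))
two_calls$title

# Keep the top rows of the one-call ordering
top_two <- slice_head(one_call, n = 2)
```

Here one_call starts with the two fully enrolled classes, while two_calls starts with the class that has the most empty seats, which is the opposite of "most popular".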
## Mutate
Once I had an easier-to-read data set, I used mutate() to add a new column, empty_seats, computed as limit minus enrolled. I then dropped the original available column so that only the columns I needed remained.
ten_mutate <- ten_slice %>%
  mutate(empty_seats = limit - enrolled)
mutate_ten <- ten_mutate %>% select(-available)
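A minimal sketch of the mutate-then-drop step on invented rows. The sanity check assumes (this is not confirmed by the registrar data) that available means limit minus enrolled; if the check holds, dropping available loses no information.

```r
library(dplyr)
library(tibble)

# Invented rows
toy <- tibble(
  title     = c("Course A", "Course C"),
  limit     = c(25, 30),
  enrolled  = c(24, 30),
  available = c(1, 0)
)

# mutate() adds a column computed row by row from existing columns
with_seats <- mutate(toy, empty_seats = limit - enrolled)

# Assumption: available should equal limit - enrolled
all(with_seats$empty_seats == with_seats$available)

# Drop the now-redundant column
final <- select(with_seats, -available)
```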
## Group by and Summarize
Finally, I ran into a problem. I was originally trying to find the ten most popular classes, but some classes appeared in more than one row, and the graph added those rows together, so the bars were not accurate. Grouping by title and summarizing gives one row per class and cuts the data down. This step would usually come between select and arrange, but because I had not run into the problem yet, it was the last change I made to my data. (I first tried to pipe final_ten into group_by(), which cannot work because final_ten does not exist until this line creates it. The input should be the prepared history data, mutate_ten, because course_data_raw still contains every subject.)
final_ten <- mutate_ten |>
  group_by(title) |>
  summarize(limit_max = max(limit))
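This sketch shows why repeated titles cause trouble and how grouping fixes it. In the invented data, "Course A" has two rows (for example, two sections), so a bar chart with stat = "identity" would stack them into one bar of height 45. After group_by() and summarize() there is one row per title. The sections column, counted with n(), is a hypothetical extra added only to make the collapse visible.

```r
library(dplyr)
library(tibble)

# Invented data: "Course A" appears twice
toy <- tibble(
  title    = c("Course A", "Course A", "Course B"),
  limit    = c(25, 20, 30),
  enrolled = c(24, 19, 28)
)

# One row per title: the largest limit and how many rows were combined
per_class <- toy |>
  group_by(title) |>
  summarize(limit_max = max(limit), sections = n())
per_class
```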
## Bar Chart
I was finally able to start on my bar chart! It shows the name of each class and the maximum number of students that class can hold (its enrollment limit). A taller bar means a larger class. Because every class shown was almost entirely full, a taller bar also means more students enrolled, which is why I read a higher number as a more popular class.
ggplot(final_ten, aes(x = reorder(title, -limit_max), y = limit_max, fill = title)) +
  geom_col() +
  labs(
    x = "Name of the class",
    y = "Maximum number of students the class holds",
    title = "Top 5 most popular history classes at Sewanee",
    subtitle = "Enrollment limit for each class; every class shown was almost full",
    caption = "Data extracted in CSV format from the University Registrar"
  ) +
  theme(text = element_text(size = 12), legend.position = "none")
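For a bar chart where the bar heights are already in the data, geom_col() is the idiomatic choice (it is shorthand for geom_bar(stat = "identity")), and the class name belongs on the axis, not inside reorder() as a sort key for the numbers. The sketch below uses an invented summary table; reorder(title, -limit_max) orders the bars from largest to smallest limit.

```r
library(ggplot2)
library(tibble)

# Invented summary table with one row per class
toy_summary <- tibble(
  title     = c("Course A", "Course B", "Course C"),
  limit_max = c(25, 30, 20)
)

# reorder(title, -limit_max) puts the largest class first on the x axis;
# geom_col() draws bar heights straight from limit_max
p <- ggplot(toy_summary, aes(x = reorder(title, -limit_max), y = limit_max)) +
  geom_col() +
  labs(x = "Name of the class", y = "Maximum number of students")

levels(reorder(toy_summary$title, -toy_summary$limit_max))
```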
## Conclusion
In conclusion, this was a very insightful class. As a first-year student, I found it especially helpful because I have not yet gone through the process of registering for classes. This project gives me more insight and helps me feel better prepared to choose the best classes for me. As I said earlier, I am interested in history, so this shows me the most popular history classes, whether because of a good professor or because the material is very interesting. I now know which of these classes to take first if I am interested in taking any of them.