Open the assign08.qmd file and complete the exercises.
The Grades.sqlite file is preloaded into your working directory. If there are any issues, you can also download it. It is up to you how much you do directly in SQL versus in R to complete the exercises below. Note: you will receive deductions for not using tidyverse syntax when applicable in this assignment. That includes the use of filter, mutate, and the up-to-date pipe operator |>.
The Grading Rubric is available at the end of this document.
Exercises
We will start by connecting to the database and loading packages we may want to use.
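The setup chunk itself is not shown here, so the following is only a minimal sketch of what it might look like. The dbConnect() call matches the one used later in Exercise 3; the exact list of packages is an assumption based on the functions used in the exercises.

```r
# Assumed package list: DBI/RSQLite for the database, tidyverse for dplyr and ggplot2
library(DBI)
library(RSQLite)
library(tidyverse)

# Connect to the SQLite file in the working directory
db <- dbConnect(RSQLite::SQLite(), "Grades.sqlite")

# List the tables to confirm the connection works
dbListTables(db)
```

Calling dbDisconnect(db) at the end of the document closes the connection cleanly.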
Exercise 1
Recreate the graph below showing the total students by course in Spring 2015.
spring_2015_counts <- dbGetQuery(db, "
  SELECT name AS course_name,
         COUNT(DISTINCT student_id) AS total_students
  FROM grades
  INNER JOIN sections USING (section_id)
  WHERE semester = 'Spring' AND year = 2015
  GROUP BY name
")
Warning: Closing open result set, pending rows
ggplot(spring_2015_counts, aes(x = course_name, y = total_students)) +
  geom_col(fill = "grey") +
  labs(
    title = "Total Students by Course (Spring 2015)",
    x = "Course",
    y = "Number of Students"
  ) +
  theme_minimal()
Exercise 2
Show enrollments by section for the entire year 2015. Make sure you include year, semester, course name, section_id and the number of students in each section. Arrange the table by semester so that all of the Fall sections are listed first.
dbGetQuery(conn = db, "
  SELECT year, semester, name AS course_name, section_id,
         COUNT(DISTINCT student_id) AS num_students
  FROM grades
  INNER JOIN sections USING (section_id)
  WHERE year = 2015
  GROUP BY year, semester, name, section_id
  ORDER BY CASE
             WHEN LOWER(semester) = 'fall' THEN 1
             WHEN LOWER(semester) = 'summer' THEN 2
             WHEN LOWER(semester) = 'spring' THEN 3
             ELSE 4
           END,
           name, section_id;
")
  year semester course_name section_id num_students
1 2015     Fall     BUS 377      68813           36
2 2015     Fall     MBA 676      38737           33
3 2015     Fall     MBA 676      86362           39
4 2015   Spring     BUS 345      25822           31
5 2015   Spring     MBA 674      29369           24
6 2015   Spring     MBA 674      42666           40
Exercise 3
Recreate the graph below showing average final grade by section for 2015. The vertical red line showing the final average across all sections for the year is added using geom_vline().
db <- dbConnect(RSQLite::SQLite(), "Grades.sqlite")

# Average final grade per section for 2015
avg_grades <- dbGetQuery(db, "
  SELECT name AS course_name, section_id,
         ROUND(AVG(final_avg), 2) AS avg_grade
  FROM grades
  INNER JOIN sections USING (section_id)
  WHERE year = 2015
  GROUP BY course_name, section_id
")

# Unweighted mean of the section averages
overall_avg <- mean(avg_grades$avg_grade, na.rm = TRUE)

avg_grades$section_label <- paste(avg_grades$course_name, avg_grades$section_id, sep = " - ")

# geom_hline() is drawn before coord_flip(), so the red line appears vertical in the final plot
ggplot(avg_grades, aes(x = reorder(section_label, avg_grade), y = avg_grade)) +
  geom_col(fill = "blue") +
  geom_hline(yintercept = overall_avg, color = "red", linetype = "solid") +
  coord_flip() +
  labs(
    title = "Average Final Grade by Course Section (2015)",
    x = "Course Section",
    y = "Average Final Grade"
  ) +
  theme_minimal()
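The exercise text mentions geom_vline(), while the code above reaches the same picture with geom_hline() plus coord_flip(). As a rough sketch of the geom_vline() route, the grade can be mapped to the x axis directly so no flip is needed. The data frame here is hypothetical toy data, not values from Grades.sqlite.

```r
library(ggplot2)

# Hypothetical section averages, for illustration only
avg_grades <- data.frame(
  section_label = c("Course A - 1", "Course B - 2", "Course C - 3"),
  avg_grade = c(72.5, 80.1, 68.9)
)

# Unweighted mean of the section averages
overall_avg <- mean(avg_grades$avg_grade)

# Grade on x, section on y: the bars are horizontal without coord_flip()
p <- ggplot(avg_grades, aes(x = avg_grade, y = reorder(section_label, avg_grade))) +
  geom_col(fill = "blue") +
  geom_vline(xintercept = overall_avg, color = "red") +
  labs(x = "Average Final Grade", y = "Course Section") +
  theme_minimal()

p
```

Either approach should give the same layout; the choice mostly comes down to whether the axes are easier to reason about before or after flipping.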
Exercise 4
Display a list of students (student_id, last_name, first_name) for all students that failed (i.e., final_avg < 65) MBA 674 in the Spring of 2015.
# SQLite compares text case-sensitively by default, so match 'Spring' as stored in the data
dbGetQuery(conn = db, "
  SELECT student_id, last_name, first_name
  FROM grades
  INNER JOIN sections USING (section_id)
  INNER JOIN students USING (student_id)
  WHERE name = 'MBA 674'
    AND semester = 'Spring'
    AND year = 2015
    AND final_avg < 65
  ORDER BY last_name, first_name;
")
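Since the assignment asks for tidyverse syntax where applicable, the same filter can also be sketched with dplyr verbs and the |> pipe instead of a single SQL statement. The example below runs against a small in-memory database with hypothetical rows; the table and column names follow the query above, but which table holds each column is an assumption.

```r
library(DBI)
library(dplyr)

# Hypothetical toy tables standing in for Grades.sqlite
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "students", data.frame(
  student_id = c(1, 2),
  last_name = c("Adams", "Baker"),
  first_name = c("Ann", "Bob")
))
dbWriteTable(con, "sections", data.frame(
  section_id = 10, name = "MBA 674", semester = "Spring", year = 2015
))
dbWriteTable(con, "grades", data.frame(
  student_id = c(1, 2), section_id = c(10, 10), final_avg = c(60, 80)
))

# Read the tables into R, join them, then filter with dplyr
failed <- dbReadTable(con, "grades") |>
  inner_join(dbReadTable(con, "sections"), by = "section_id") |>
  inner_join(dbReadTable(con, "students"), by = "student_id") |>
  filter(name == "MBA 674", semester == "Spring", year == 2015, final_avg < 65) |>
  select(student_id, last_name, first_name) |>
  arrange(last_name, first_name)

dbDisconnect(con)
failed
```

With the toy rows, only the student whose final_avg is below 65 remains. Reading whole tables into R is fine for a small file like this one; for larger databases, filtering in SQL first would avoid transferring unneeded rows.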