---
title: "Week 6 Data Product"
author: "Aine Doyle"
date: "September 26, 2022"
output:
  html_document:
    code_folding: hide
    code_download: TRUE
---
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
data_to_viz <- read_csv("data/data-to-explore.csv")
## Rows: 943 Columns: 34
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (8): student_id, subject, semester, section, gender, enrollment_reason...
## dbl (23): total_points_possible, total_points_earned, proportion_earned, ti...
## dttm (3): date_x, date_y, date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Histogram of proportion_earned, one panel per gender
ggplot(data_to_viz) +
  geom_histogram(mapping = aes(x = proportion_earned, fill = gender)) +
  labs(
    title = "Proportion of points earned by gender",
    caption = "Which gender earned a higher proportion of points?"
  ) +
  facet_wrap(~gender)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 226 rows containing non-finite values (stat_bin).
The histogram above shows the number of students at each proportion of points earned, split into one panel per gender. The bars are taller for women because more women are enrolled in the online courses, so the difference in bar height reflects enrollment counts rather than a higher proportion of points earned. Most male and female students earned somewhere between 80% and 100% of their points. Note that 226 rows with non-finite values were dropped from the plot.
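Because raw counts depend on how many students are in each group, a per-gender summary is a more direct way to compare the proportion of points earned. The sketch below is one possible approach, not part of the original analysis. The helper `summarise_by_gender()` and the `toy` data frame are hypothetical: the toy rows and the gender labels "F" and "M" are made up for illustration and are not taken from `data-to-explore.csv`.

```r
library(dplyr)

# Hypothetical helper: per-gender count, mean, and median of proportion_earned.
# Non-finite values (NA, NaN, Inf) are dropped first, mirroring what
# geom_histogram() does when it warns about removed rows.
summarise_by_gender <- function(df) {
  df %>%
    filter(is.finite(proportion_earned)) %>%
    group_by(gender) %>%
    summarise(
      n_students = n(),
      mean_proportion = mean(proportion_earned),
      median_proportion = median(proportion_earned),
      .groups = "drop"
    )
}

# Toy rows for illustration only (assumed values, not the course data).
toy <- data.frame(
  gender = c("F", "F", "F", "M", "M"),
  proportion_earned = c(0.90, 0.80, NA, 0.85, 0.95)
)

gender_summary <- summarise_by_gender(toy)
print(gender_summary)
```

With the real data, the same helper could be called as `summarise_by_gender(data_to_viz)`, assuming the `gender` and `proportion_earned` columns are as used in the plot. Comparing the means or medians, rather than bar heights, separates "which group is larger" from "which group scored higher".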