---
title: "Week 6 Data Product"
author: "Aine Doyle"
date: "September 26, 2022"
output:
  html_document:
    code_folding: hide
    code_download: TRUE
---

Points Proportion by Gender

    library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

    data_to_viz <- read_csv("data/data-to-explore.csv")

## Rows: 943 Columns: 34
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (8): student_id, subject, semester, section, gender, enrollment_reason...
## dbl  (23): total_points_possible, total_points_earned, proportion_earned, ti...
## dttm  (3): date_x, date_y, date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

    ggplot(data_to_viz) +
      geom_histogram(mapping = aes(x = proportion_earned, fill = gender)) +
      labs(
        title = "Proportion of points earned by gender",
        caption = "Which gender earned a higher proportion of points?"
      ) +
      facet_wrap(~gender)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Warning: Removed 226 rows containing non-finite values (stat_bin).

The histogram above shows how many students earned each proportion of points, faceted by gender. Because more women than men are enrolled in the online courses (based on count), the counts for female students are higher; this reflects the larger group size, not a higher proportion of points earned. Most male and female students earned between 80% and 100% of their points. Note that 226 rows with non-finite (for example, missing) values of proportion_earned were dropped from the plot.
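
Bar heights in a count histogram depend on how many rows each group has, so a per-gender summary is one way to compare the groups directly. The sketch below is not part of the original analysis: `summarise_by_gender()` is a hypothetical helper, and it assumes only that the data frame has the `gender` and `proportion_earned` columns shown in the `read_csv()` output above.

```r
library(dplyr)

# Hypothetical helper (an illustration, not the author's method):
# summarise proportion_earned within each gender so that groups of
# different sizes can be compared on the same footing.
summarise_by_gender <- function(df) {
  df %>%
    # Drop rows with a missing proportion; the histogram drops these too
    filter(!is.na(proportion_earned)) %>%
    group_by(gender) %>%
    summarise(
      n_rows = n(),                                   # rows left per group
      median_proportion = median(proportion_earned),  # typical value
      mean_proportion = mean(proportion_earned),
      .groups = "drop"
    )
}
```

Calling `summarise_by_gender(data_to_viz)` should return one row per value of `gender`, including a row for missing gender if any exists. Comparing `median_proportion` across rows answers the caption's question more directly than comparing bar heights.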