How do discussion post factors influence final grades?

---
title: "How do discussion post factors influence final grades?"
output:
  flexdashboard::flex_dashboard:
    theme: 
      version: 4
      bootswatch: pulse
    source_code: embed
---

```{r setup, include=FALSE}

library(flexdashboard)
library(tidyverse)
library(janitor)

course_data <- read_csv("data/course-data.csv") |>
  clean_names()

data_to_viz1 <- course_data |>
  select(course_id, 
         gender,
         time_spent_hours, 
         final_grade) |>
  separate(course_id, c("subject", "semester", "section")) |>
  mutate(subject = recode(subject, 
                          "AnPhA" = "Anatomy",
                          "BioA" = "Biology", 
                          "FrScA" = "Forensics", 
                          "OcnA" =  "Oceanography", 
                          "PhysA" = "Physics"))

liwc_data <- read_csv("data/liwc-data.csv") |>
  clean_names()

data_to_viz2 <- liwc_data |>
  select(course_id,
         negemo,
         posemo,
         anger,
         tone,
         wc) |>
  separate(course_id, c("subject", "semester", "section")) |>
  mutate(subject = recode(subject, 
                          "AnPhA" = "Anatomy",
                          "BioA" = "Biology", 
                          "FrScA" = "Forensics", 
                          "OcnA" =  "Oceanography", 
                          "PhysA" = "Physics"))
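# Join on the course identifiers the two tables share, stated explicitly.
# These keys identify a course section, not an individual student.
data_to_viz <- inner_join(data_to_viz1, data_to_viz2,
                          by = c("subject", "semester", "section"))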
```

## Inputs {.sidebar}
No, word count does not appear to influence final grades.

In the scatterplot, each point pairs the number of words a student used in their discussion posts with their final grade. There does not appear to be any correlation, positive or negative, between the two, except for a slight negative trend at extremely high word counts on the right side of the scatterplot.

The boxplots depict the distribution of discussion post word count in each course, and the histograms show the distribution of final grades for each course. The longest posts were in Oceanography and the shortest were in Physics, yet these two courses also had the highest median word count, at around 60 words per post. The histograms show that the highest final grades were in Oceanography and the lowest were in Forensics, which had the second-lowest word count after Biology.

If discussion posts influence final grades in these courses, the effect likely comes from the quality of the posts rather than their quantity or length.
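
The visual impressions above could also be checked numerically. The following is a minimal sketch, not part of the original dashboard: `summarise_wc_vs_grade()` is a hypothetical helper, the toy rows are invented purely so the block runs on its own, and Spearman's rank correlation is one reasonable choice here rather than the method behind the plots.

```r
library(dplyr)

# Hypothetical helper (an assumption, not from the original analysis):
# per-course medians of word count and final grade, plus their
# rank correlation within each course.
summarise_wc_vs_grade <- function(df) {
  df |>
    filter(!is.na(wc), !is.na(final_grade)) |>
    group_by(subject) |>
    summarise(
      n_rows       = n(),
      median_wc    = median(wc),
      median_grade = median(final_grade),
      # Spearman's rho is less sensitive to a few very long posts
      rho          = cor(wc, final_grade, method = "spearman"),
      .groups      = "drop"
    )
}

# Toy rows for illustration only; these are NOT values from the course data.
toy <- tibble(
  subject     = rep(c("Physics", "Biology"), each = 4),
  wc          = c(10, 20, 30, 40, 15, 25, 35, 45),
  final_grade = c(60, 70, 80, 90, 90, 80, 70, 60)
)

toy_summary <- summarise_wc_vs_grade(toy)
print(toy_summary)

# With the dashboard's own data, the call would be:
# summarise_wc_vs_grade(data_to_viz)
```

A value of `rho` near zero for a course would be consistent with the reading that word count alone does not track final grade.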

## Column {data-width="600"}

### Relationship Between Word Count and Final Grade

```{r}

data_to_viz |>
  ggplot() +
  # Word count goes on x and final grade on y, matching the axis labels
  # and the limits set below (grades 0-100, word counts 0-150).
  geom_point(mapping = aes(x = wc,
                           y = final_grade),
             alpha = .5) +
  geom_smooth(mapping = aes(x = wc,
                            # color = subject,
                            y = final_grade),
              color = "gray",
              method = "loess",
              se = FALSE) +
  ylim(0, 100) +
  xlim(0, 150) +
  # facet_wrap(~subject, ncol = 3) +
  labs(
    title = "Will a higher word count on discussion posts result in higher final grades?",
    y = "Final Grade",
    x = "Average word count",
    caption = "There does not seem to be a correlation between discussion post word counts and students' final grades."
  ) +
  theme_minimal() +
  theme(legend.position = "bottom",
        panel.grid.minor = element_blank()) +
  scale_color_brewer(palette = "Set1",
                     name = "Subject")

```

## Column {data-width="400"}

### Quartiles, Median, and Outliers for Word Count

```{r}
data_to_viz  %>% 
  ggplot() +
  geom_boxplot(mapping = aes(x = wc,
                       color = subject),
             alpha = .25) +
  facet_wrap(~subject, ncol = 1) +
  labs(title = "How does discussion post word count vary by course?",
       # subtitle = "Spoiler Alert... Yes, to an extent.",
       # caption = "Fine print: Time spent online does not necessarily account for all time students spent on the course, e.g., studying offline.",
       y = "Course Subject",
       x = "Word count") +
  theme_void() +
  theme(legend.position = "none",
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        axis.title.x = element_text(),
        axis.text.x = element_text()) +
  scale_color_brewer(palette = "Set1",
                     name = "Subject") +
  scale_x_continuous(breaks = seq(0, 160, by = 10))
 # scale_y_discrete(limits=rev)
```

### Distribution of Final Grade by Number of Students

```{r}
data_to_viz |>
  ggplot() +
  geom_histogram(mapping = aes(
                       x = final_grade,
                       #y = stat(count/sum(count)),
                       color = subject),
                 fill = NA,
                 bins = 30  # ggplot2's default bin count, stated explicitly
                       ) +
  facet_wrap(~subject, ncol = 1) +
  labs(# title = "Will spending more time in an online course land me a better grade?",
       # subtitle = "Spoiler Alert... Yes, to an extent.",
       # caption = "Fine print: Time spent online does not necessarily account for all time students spent on the course, e.g., studying offline.",
       # The bars show raw counts, so the y axis is a number of students, not a percentage.
       y = "Number of Students",
       x = "Final Grades") +
  theme_void() +
  theme(legend.position = "none",
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        axis.title.x = element_text(),
        axis.text.x = element_text()) +
  scale_color_brewer(palette = "Set1",
                     name = "Subject") +
  scale_x_continuous(breaks = seq(0, 100, by = 10))
```