DAM1_Exemplar_Fall2024

Author

EDST0213a

Exemplar

The following is an exemplar of a Data-Analytic Memo (DAM1) for the Fall 2024 semester. This exemplar is a combination of everyone’s code.

Please don’t use this as the “right” answer. But you can find good ideas from everyone. We will continue to hone our skills – and our expectations will rise as we progress.

Purpose of this Data-Analytic Memo

Here we conduct analyses on behalf of a trio of 3rd-grade teachers at Lindquist Elementary School in Hometown, USA. The Principal of the school has requested analysis of some of their preliminary in-house assessments. Ms. Affolter, Mr. Miller-Lane, and Ms. Weston teach 3rd grade, and they plan their instruction together often. After one month of school, they compiled scores for a comprehensive 64-item spelling test for the words they’ve agreed to teach (16 words each week). And they administered the first unit test of 40 items from their mathematics textbook. Ms. Weston keyed in the scores into a Google sheet for all the students on the 3rd grade team and shared them with the Principal.

Data set

The data are contained in a Google Sheet:

https://docs.google.com/spreadsheets/d/1778B79za0VK7-OnSNGeybGRR0F_4AlSxd2n64Nh9LiE/edit?usp=sharing

Ms. Weston included the following details.

Boys were assigned as 1; girls were assigned as 2. However, she left three of the students’ gender blank, as Morgan, Paige, and River are to be categorized as nonbinary. The teachers ask that you work out how to make this happen in the data.

The three classrooms are labeled 1 for Ms. Affolter’s class, 2 for Mr. Miller-Lane’s class, and 3 for Ms. Weston’s class.

Data Analysis

Task 1: Data Import

Load the packages we’re going to use

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(googlesheets4)

Read the Google file from the teachers

gs4_deauth() # deauthorize Google Sheet so that anyone can access it
data1 <- read_sheet("https://docs.google.com/spreadsheets/d/1778B79za0VK7-OnSNGeybGRR0F_4AlSxd2n64Nh9LiE/edit?usp=sharing")
✔ Reading from "DAM1_Data_Fall2024".
✔ Range 'Sheet1'.

Print the data to see what it looks like

print(data1, n=75) # Print all first 75 rows of the data 
# A tibble: 75 × 5
   Name      Gender Classroom Spelling  Math
   <chr>      <dbl>     <dbl>    <dbl> <dbl>
 1 Avery          1         1       38    24
 2 Austin         1         1       48    27
 3 Beth           2         1       53    28
 4 Charlotte      2         1       58    32
 5 Dennis         1         1       59    26
 6 Ezra           1         1       60    30
 7 Gray           1         1       37    22
 8 Heather        1         1       51    26
 9 Iris           2         1       44    24
10 Jessica        2         1       63    25
11 Josephine      2         1       52    28
12 Julie          2         1       63    33
13 Kevin          1         1       43    18
14 Laverne        2         1       62    37
15 Leslie         2         1       60    24
16 Mary           2         1       64    27
17 Morgan        NA         1       60    23
18 Nathan         1         1       41    17
19 Oprah          2         1       54    23
20 Patrick        1         1       57    35
21 Rahim          1         1       61    33
22 Sam            1         1       55    23
23 Thomas         1         1       61    28
24 Victor         1         1       64    30
25 Walter         1         1       54    18
26 Aiden          1         2       45    28
27 Bailey         2         2       39    26
28 Chloe          2         2       64    38
29 Dakota         2         2       42    25
30 Debra          2         2       45    31
31 Dominik        1         2       61    23
32 Eleanor        2         2       56    26
33 Felicity       2         2       50    28
34 Francis        2         2       47    28
35 Grace          2         2       55    35
36 Guido          1         2       45    19
37 Hillary        2         2       56    36
38 Jill           2         2       62    34
39 Jordan         1         2       57    38
40 Kareem         2         2       60    39
41 Laura          2         2       53    27
42 Molly          2         2       50    25
43 Nancy          2         2       50    33
44 Petra          2         2       62    40
45 Rachel         2         2       47    25
46 Rhonda         2         2       43    20
47 Sally          2         2       44    22
48 Tara           2         2       59    36
49 Thelma         2         2       58    34
50 Abigail        2         3       64    40
51 Caden          2         3       41    29
52 Dana           1         3       48    21
53 Darcy          2         3       63    31
54 Erik           1         3       51    34
55 Erin           2         3       56    19
56 Fernando       1         3       63    31
57 Gail           2         3       64    28
58 Gemma          2         3       52    29
59 Hermione       2         3       47    19
60 Iara           2         3       53    27
61 Iman           2         3       41    17
62 Jack           1         3       46    24
63 James          1         3       40    24
64 Jon            2         3       53    29
65 Julia          2         3       46    29
66 Lawrence       1         3       59    32
67 Mo             1         3       43    26
68 Otis           1         3       42    21
69 Paige         NA         3       39    16
70 Patrick        1         3       63    32
71 Quincy         1         3       49    25
72 Quinn          2         3       48    25
73 River         NA         3       59    30
74 Wesley         1         3       55    26
75 William        1         3       52    28

Task 2 - Clean the data

  • Produce suitable labels for the assigned classrooms (Affolter, Miller-Lane, Weston).

  • Produce suitable labels for student gender.

  • Print the data set with labels for classroom and gender.

Mo and Ged submitted the following code, which works!

I noticed that they used “old” code that still uses what’s called the magrittr pipe.

From now on, I encourage everyone to substitute |> where you see %>%

data2 <- data1 |>
  mutate(Classroom = factor(Classroom, levels=c("1", "2", "3"),
                            labels=c("Affolter", "Miller-Lane", "Weston")))

# Class labels
data2 <- data1 |>
  mutate(Classroom = factor(Classroom, levels=c("1", "2", "3"),
                            labels=c("Affolter", "Miller-Lane", "Weston")))

# Gender labels

data3 <- data2 %>%
  mutate(across(everything(), ~ replace_na(., 3)))

# NOTE: Here you can skip this step and make it work as one pipe

data3 <- data3 |>
  mutate(Gender = factor(Gender, levels=c("1", "2", "3"),
                         labels=c("Male", "Female", "Non-Binary")))
print(data3, n=75)
# A tibble: 75 × 5
   Name      Gender     Classroom   Spelling  Math
   <chr>     <fct>      <fct>          <dbl> <dbl>
 1 Avery     Male       Affolter          38    24
 2 Austin    Male       Affolter          48    27
 3 Beth      Female     Affolter          53    28
 4 Charlotte Female     Affolter          58    32
 5 Dennis    Male       Affolter          59    26
 6 Ezra      Male       Affolter          60    30
 7 Gray      Male       Affolter          37    22
 8 Heather   Male       Affolter          51    26
 9 Iris      Female     Affolter          44    24
10 Jessica   Female     Affolter          63    25
11 Josephine Female     Affolter          52    28
12 Julie     Female     Affolter          63    33
13 Kevin     Male       Affolter          43    18
14 Laverne   Female     Affolter          62    37
15 Leslie    Female     Affolter          60    24
16 Mary      Female     Affolter          64    27
17 Morgan    Non-Binary Affolter          60    23
18 Nathan    Male       Affolter          41    17
19 Oprah     Female     Affolter          54    23
20 Patrick   Male       Affolter          57    35
21 Rahim     Male       Affolter          61    33
22 Sam       Male       Affolter          55    23
23 Thomas    Male       Affolter          61    28
24 Victor    Male       Affolter          64    30
25 Walter    Male       Affolter          54    18
26 Aiden     Male       Miller-Lane       45    28
27 Bailey    Female     Miller-Lane       39    26
28 Chloe     Female     Miller-Lane       64    38
29 Dakota    Female     Miller-Lane       42    25
30 Debra     Female     Miller-Lane       45    31
31 Dominik   Male       Miller-Lane       61    23
32 Eleanor   Female     Miller-Lane       56    26
33 Felicity  Female     Miller-Lane       50    28
34 Francis   Female     Miller-Lane       47    28
35 Grace     Female     Miller-Lane       55    35
36 Guido     Male       Miller-Lane       45    19
37 Hillary   Female     Miller-Lane       56    36
38 Jill      Female     Miller-Lane       62    34
39 Jordan    Male       Miller-Lane       57    38
40 Kareem    Female     Miller-Lane       60    39
41 Laura     Female     Miller-Lane       53    27
42 Molly     Female     Miller-Lane       50    25
43 Nancy     Female     Miller-Lane       50    33
44 Petra     Female     Miller-Lane       62    40
45 Rachel    Female     Miller-Lane       47    25
46 Rhonda    Female     Miller-Lane       43    20
47 Sally     Female     Miller-Lane       44    22
48 Tara      Female     Miller-Lane       59    36
49 Thelma    Female     Miller-Lane       58    34
50 Abigail   Female     Weston            64    40
51 Caden     Female     Weston            41    29
52 Dana      Male       Weston            48    21
53 Darcy     Female     Weston            63    31
54 Erik      Male       Weston            51    34
55 Erin      Female     Weston            56    19
56 Fernando  Male       Weston            63    31
57 Gail      Female     Weston            64    28
58 Gemma     Female     Weston            52    29
59 Hermione  Female     Weston            47    19
60 Iara      Female     Weston            53    27
61 Iman      Female     Weston            41    17
62 Jack      Male       Weston            46    24
63 James     Male       Weston            40    24
64 Jon       Female     Weston            53    29
65 Julia     Female     Weston            46    29
66 Lawrence  Male       Weston            59    32
67 Mo        Male       Weston            43    26
68 Otis      Male       Weston            42    21
69 Paige     Non-Binary Weston            39    16
70 Patrick   Male       Weston            63    32
71 Quincy    Male       Weston            49    25
72 Quinn     Female     Weston            48    25
73 River     Non-Binary Weston            59    30
74 Wesley    Male       Weston            55    26
75 William   Male       Weston            52    28

Task 3 - Describe the Spelling scores

We produced a histogram fora ll 75 scores on the Spelling test of 64 words.

Ella, Maeve, Reese, and Jane submitted the following code with colors!

ggplot(data3, aes(x = Spelling)) + 
  geom_histogram(binwidth = 2, fill = "blue", color = "green")

Anna and Felix submitted the following code to find the median, min and max of the Spelling scores.

median(data3$Spelling)
[1] 53
min(data3$Spelling)
[1] 37
max(data3$Spelling)
[1] 64

We calculated the median, minimum, and maximum Spelling Scores for all 75 students. The median is 53, the minimum is 37, and the maximum is 64.

Mo and Ged stated: The graph shows a negative skew, this means that the Spelling scores piled up on the high end of the graph which implies most of the students did relatively well. This graph can also be described as having a ceiling effect.

Islam, Mark, and Zamzama produced a density plot of Spelling scores using the following code:

# Creating a Density Plot for Spellng scores
ggplot(
  data = data3,
  mapping = aes(x = Spelling)) +
  geom_density()

The density plot shows a broad upward curve of spelling scores. The density plot is an efficient method of viewing important trends in the data regarding spelling scores. While the histogram provides more exact information regarding individual student performance, the density plot is more suited to the purposes of our colleagues. For its clarity, we recommend that our third-grade teachers utilize a density plot for parsing the data we are working with.

Task 4 - Describe the Math scores

Ella, Jane, Maeve, and Reese produced a histogram for all 75 scores on the Math test of 40 items with this code:

ggplot(data3, aes(x = Math)) + 
   geom_histogram(binwidth = 2, fill = "red", color = "black")

Anna and Felix submitted the following code to find the median, min and max of the Math scores.

median(data3$Math)
[1] 27
min(data3$Math)
[1] 16
max(data3$Math)
[1] 40

We calculated the median, minimum, and maximum Math Scores for all 75 students. The median is 27, the minimum is 16, and the maximum is 40.

Islam, Mark, and Zamzama produced a density plot of Math scores using the following code:

# Creating a Density Plot for Math scores
ggplot(
  data = data3,
  mapping = aes(x = Math)) +
  geom_density()

Task 5 - A scatterplot

We produced a scatterplot showing the relationship between Spelling (on the x-axis) and Math (on the y-axis).

# Make a scatterplot with Spelling on X and Math on Y
ggplot(
  data = data3,
  mapping = aes(x = Spelling, y = Math)) +
  geom_point() +
labs(
  title = "Relationship between spelling and math",
  subtitle = "By Islam",
  x = "Spelling",
  y = "Math")

Task 6 - Analyze the scatterplot

It seems that students who performed well on this mathematics assessment performed similarly proficiently on the spelling assessment. We will have to make further visualizations to make accurate judgements about these trends.

Task 7 - Scatterplots by Classroom

We produced a scatterplot showing the relationship between Spelling (on the x-axis) and Math (on the y-axis), by Classroom.

ggplot(
  data = data3,
  mapping = aes(x = Spelling, y = Math)) +
  geom_point(aes(color = Classroom, shape = Classroom)) +
  labs(
    title = "Relationship between spelling and math",
    subtitle = "By Islam",
    x = "Spelling",
    y = "Math",
    color = "Classroom",
    shape = "Classroom")

Task 8 - Boxplots of Spelling scores by Classroom

Ella, Jane, Maeve, and Reese produced boxplots of Spelling scores by Classroom with something like the following code:

ggplot(data3, aes(x = Classroom, y = Spelling)) +
  geom_boxplot(color = 'black') +
  labs(x = 'Classroom', y = 'Spelling Scores', title = 'Boxplots of Spelling Scores by Classroom')

Task 9 - Describing the boxplots

I like what Anna and Felix did here: They calculated the median, 25th and 75th percentiles, and the interquartile range of each boxplot by subsetting the data and naming these smaller datasets. It’s not super efficient, but it’s clear and easy to understand.

Affolter<- data2 |>
  filter(Classroom == "Affolter")
median(Affolter$Spelling)
[1] 57
IQR(Affolter$Spelling)
[1] 10
summary(Affolter)
     Name               Gender            Classroom     Spelling    
 Length:25          Min.   :1.000   Affolter   :25   Min.   :37.00  
 Class :character   1st Qu.:1.000   Miller-Lane: 0   1st Qu.:51.00  
 Mode  :character   Median :1.000   Weston     : 0   Median :57.00  
                    Mean   :1.417                    Mean   :54.48  
                    3rd Qu.:2.000                    3rd Qu.:61.00  
                    Max.   :2.000                    Max.   :64.00  
                    NA's   :1                                       
      Math      
 Min.   :17.00  
 1st Qu.:23.00  
 Median :26.00  
 Mean   :26.44  
 3rd Qu.:30.00  
 Max.   :37.00  
                

The 25th percentile of data regarding Ms. Affolter’s kids’ spelling results is 51, the median is 57, and the 75th percentile is 61. The IQR is 10. No outliers to speak of.

NOTE: I think that they should have used IQR() function to calculate the IQR for all three classrooms. (Perhaps they can SEE the IQR by eyeballing and calculating 75th - 25th ???)

Miller_lane <- data2 |>
  filter(Classroom == "Miller-Lane")
summary(Miller_lane)
     Name               Gender            Classroom     Spelling    
 Length:24          Min.   :1.000   Affolter   : 0   Min.   :39.00  
 Class :character   1st Qu.:2.000   Miller-Lane:24   1st Qu.:45.00  
 Mode  :character   Median :2.000   Weston     : 0   Median :51.50  
                    Mean   :1.833                    Mean   :52.08  
                    3rd Qu.:2.000                    3rd Qu.:58.25  
                    Max.   :2.000                    Max.   :64.00  
      Math      
 Min.   :19.00  
 1st Qu.:25.00  
 Median :28.00  
 Mean   :29.83  
 3rd Qu.:35.25  
 Max.   :40.00  

The 25th percentile of data regarding Mr. Miller-Lane’s kids’ spelling results is 45, the median is 51.5, the 75th percentile is 58.25, meaning the IQR range is 13.25. Still no outliers to speak of.

Weston <- data2 |>
  filter(Classroom == "Weston")
summary(Weston)
     Name               Gender          Classroom     Spelling    
 Length:26          Min.   :1.0   Affolter   : 0   Min.   :39.00  
 Class :character   1st Qu.:1.0   Miller-Lane: 0   1st Qu.:46.00  
 Mode  :character   Median :1.5   Weston     :26   Median :51.50  
                    Mean   :1.5                    Mean   :51.42  
                    3rd Qu.:2.0                    3rd Qu.:58.25  
                    Max.   :2.0                    Max.   :64.00  
                    NA's   :2                                     
      Math      
 Min.   :16.00  
 1st Qu.:24.00  
 Median :27.50  
 Mean   :26.62  
 3rd Qu.:29.75  
 Max.   :40.00  
                

The 25th percentile of data regarding Ms. Weston’s kids’ spelling results is 46, the median is 51.5, the 75th percentile is 58.25, meaning the IQR range is 12.25. Again no outliers to speak of.

The median and 75th percentile are the exact same for Mr. Miller Lane and Ms. Weston’s classes, which is a little odd but not crazy. It is a lot easier to analyze the data this way with concrete numbers as opposed to trying to make out trends from on a graph.

Task 10 - Density plots of Spelling by Classroom

While many of you produced density plots of Spelling scores by Classroom, I think there’s something to be said for adding fill = Classroom to the aes() function. This will make the plot easier to read and understand. Here’s the code:

ggplot(
  data = data3,
  mapping = aes(x = Spelling, color = Classroom, fill = Classroom)) +
  geom_density(alpha = 0.5)