DAM 3 Key and notes about reliability quiz

Author

EDST0213, especially Ella & Maeve – also Mark and Zamzama

Task 1

Load these packages:

library(tidyverse) # load the tidyverse
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl) #load the readxl package to read excel
library(kableExtra) # load the kableExtra package

Attaching package: 'kableExtra'

The following object is masked from 'package:dplyr':

    group_rows

Import data:

math8 <- read_excel("Grade8NAEP.xls", range = "A9:E19")
print(math8, n = 10) # display the data
# A tibble: 10 × 5
    Year Jurisdiction `All students` `Average scale score` `Standard deviation`
   <dbl> <chr>        <chr>                          <dbl>                <dbl>
 1  2022 National     All students                    274.                 38.9
 2  2019 National     All students                    282.                 39.7
 3  2017 National     All students                    283.                 38.8
 4  2015 National     All students                    282.                 36.9
 5  2013 National     All students                    285.                 36.5
 6  2011 National     All students                    284.                 36.2
 7  2009 National     All students                    283.                 36.4
 8  2007 National     All students                    281.                 36.1
 9  2005 National     All students                    279.                 36.3
10  2003 National     All students                    278.                 36.2

Task 2

Produce a table for the data from task 1 so that the data is shared in chronological order.

math8 |>
  arrange(math8, Year) |> # arrange by year 
  kable(digits = 0) |> # set to zero decimal places
  kable_styling()
Year Jurisdiction All students Average scale score Standard deviation
2003 National All students 278 36
2005 National All students 279 36
2007 National All students 281 36
2009 National All students 283 36
2011 National All students 284 36
2013 National All students 285 37
2015 National All students 282 37
2017 National All students 283 39
2019 National All students 282 40
2022 National All students 274 39

Task 3

Produce a plot of average achievement in 8th grade mathematics on the main NAEP test similar to what was produced on the NAEP website.

ggplot(data = math8) + 
  geom_line(mapping = aes(x = Year, y = `Average scale score`))  + # line graph
  labs(
    title = "Trend in Grade 8 Mathematics Average Scale Scores",
    subtitle = "Main NAEP: 2003 - 2022"
  )

Task 4

Produce a similar plot adding horizontal lines indicating the cut scores for Basic, Proficient, and Advanced on this plot.

# Task 4
# Produce a similar plot adding horizontal lines indicating the cut scores for Basic, Proficient, and Advanced on this plot.
# https://nces.ed.gov/nationsreportcard/mathematics/achieve.aspx#grade8 
ggplot(
  data = math8,
  mapping = aes(x = Year, y = `Average scale score`)) +
  geom_line() +
  labs(
    title = "Trend in Grade 8 Mathematics Average Score",
    subtitle = "Main NAEP: 2003 - 2022"
  ) +
  ylim(260, 335) +
  geom_line(mapping = aes(x=Year, y = 262, color = "Basic"), linetype=2) +
  geom_line(mapping = aes(x=Year, y = 299, color = "Proficient"), linetype=2) +
  geom_line(mapping = aes(x=Year, y = 333, color = "Advanced"), linetype=2) +
  theme_gray() +
  theme(legend.position = "right") +
  scale_color_manual(breaks=c("Advanced", "Proficient", "Basic"), labels=c("Advanced", "Proficient", "Basic"), values=c("#0047AB", "#228B22", "#FDDA0D"))

Another approach – Zamzama & Mark

Label the horizontal lines with the cut scores for Basic, Proficient, and Advanced.

#Add a text label above basic, proficient, and advanced lines
ggplot(data = math8) + 
  geom_line(mapping = aes(x = Year, y = `Average scale score`))  +
  labs(
    title = "Trend in Grade 8 Mathematics Average Scores",
    subtitle = "Main NAEP: 2003 - 2022"
  ) + 
  geom_hline(yintercept = 262, linetype = "dashed", color = "red") +
  geom_hline(yintercept = 299, linetype = "dashed", color = "blue") +
  geom_hline(yintercept = 333, linetype = "dashed", color = "green") +
  #Using 'annotate' add text labels above the lines
  annotate("text", x= 2022, y = 262, label = "Basic", vjust = -1 , hjust = +1) +
  annotate("text", x = 2022, y = 299, label = "Proficient", vjust = -1 , hjust = +1) +
  annotate("text", x = 2022, y = 333, label = "Advanced", vjust = -.3 , hjust = +1)

A solid answer by Ella & Maeve: The addition of lines indicating advanced, proficient, and basic performance on the Grade 8 mathematics test allows someone with limited knowledge of scoring for this test to understand whether or not the average score can be considered good. With the original graph (lines not included) it is difficult to understand what a score of 285 implies, you can only deduce that over time the average scores have decreased. With the lines, however, you can see that a score of 285 is still well below proficient, making it more impactful that overtime these scores have regressed.

Task 5

Add a column that shows the difference in scale scores between the Year in question and 2003.

# Task 5
math8 <- math8 |>
  mutate(diff_scale_score = `Average scale score` - `Average scale score`[Year == 2003]) # add a column that shows the difference in scale scores between the year in question and 2003
math8 |>
  select(Year, diff_scale_score)
# A tibble: 10 × 2
    Year diff_scale_score
   <dbl>            <dbl>
 1  2022            -3.30
 2  2019             4.44
 3  2017             5.28
 4  2015             4.66
 5  2013             7.04
 6  2011             6.29
 7  2009             5.35
 8  2007             3.78
 9  2005             1.26
10  2003             0   

For this task, we mutated the data set to include a new column that shows the difference in scale scores between the year in question and 2003. This allows us to see how the average scale scores have changed over time. The code was to subtract the average scale score in 2003 from the average scale score in the year in question.

Task 6

Add another column that shows the difference in Math measured in standard deviations since 2003.

math8 <- math8 |>
  mutate(diff_SD = (`Average scale score` - `Average scale score`[Year == 2003]) / `Standard deviation`)
math8 |>
  select(Year, diff_scale_score, diff_SD)
# A tibble: 10 × 3
    Year diff_scale_score diff_SD
   <dbl>            <dbl>   <dbl>
 1  2022            -3.30 -0.0848
 2  2019             4.44  0.112 
 3  2017             5.28  0.136 
 4  2015             4.66  0.126 
 5  2013             7.04  0.193 
 6  2011             6.29  0.174 
 7  2009             5.35  0.147 
 8  2007             3.78  0.105 
 9  2005             1.26  0.0346
10  2003             0     0     

For this task, we mutated the data set to include another new column that shows the difference in math measured in standard deviations since 2003. This allows us to see how the average scale scores have changed over time in terms of standard deviations. The code was to subtract the average scale score in 2003 from the average scale score in the year in question and then divide by the standard deviation.

Task 7

Produce the plot that looks like Figure 5.5 from Measuring Up.

ggplot(data = math8) + 
  geom_line(mapping = aes(x = Year, y = diff_SD)) + # line graph
  labs(
    title = "Trend in Grade 8 Mathematics Average Scale Scores",
    subtitle = "Main NAEP: 2003 - 2022",
    y = "Difference in Standard Deviations from 2003",
  )

Task 8

It appears from the graph that average grade 8 math scores have decreased significantly from 2003 to 2022. Overall, there has been a decrease of 0.1 standard deviations in average math scores for grade 8 - in other words, in 2022, the average math score was 0.1 standard deviations below the average in 2003. While this may seem like a marginal difference, the overall decrease in average test scores implies that there is an issue within math education. It had appeared that from 2003 to around 2012, average scores were displaying an upward trend, with 2012 scores being a whole 0.2 standard deviations above the average in 2003. This upward progression is ideal, however, we see a sharp downward turn by 2015, and a continuous steep downward trend from 2019 to 2023. This particular drop in scores suggests that perhaps these math issues began around 2019. Given the timeframe it is possible that this drop in scores is attributable to the affects of Covid, however, it is important to work towards solutions so that we may see a reappearance of upwards trends in these test scores.

Task 9

mean_2022 <- math8$`Average scale score`[1] #mean in 2022
stddev_2022 <- math8$`Standard deviation`[1] #standard deviation in 2022
cut_scores <- c(262, 299, 333) #cut scores for basic,prof,adv 
labels <- c("Basic", "Proficient", "Advanced")
cut_data <- data.frame(Score = cut_scores, Label = labels) #data frame for the cut scores
x_values <- seq(mean_2022 - 3 * stddev_2022, mean_2022 + 3 * stddev_2022, length.out = 100) #x values to make a perfect bell curve
y_values <- dnorm(x_values, mean = mean_2022, sd = stddev_2022) #y values for the bell curve
data <- data.frame(Score = x_values, Density = y_values) #data frame for the bell curve
# Create the plot
ggplot(data, aes(x = Score, y = Density)) +  # Bell curve
  geom_line(color = "blue", size = 1) +
  geom_vline(data = cut_data, aes(xintercept = Score, color = Label), linetype = "dashed", size = 0.8) +  # Add vertical lines with labels
  labs(
    title = "Normal Distribution of 8th Grade Math Scores (2022)",
    x = "Score",
    y = "Density"
  ) +
  scale_color_manual(values = c("red", "darkgreen", "purple")) +  # Specify colors for the legend
  theme(legend.title = element_blank())  # Remove legend title
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.

Reliability Quizzes

Hoffman’s work in front of the class on Tuesday

# Libaries
library(tidyverse)

# Data

Pop1 <- c(6, 9, 8, 6, 8, 8, 9, 7, 7, 5, 9)
Pop2 <- c(10, 6, 6, 6, 8, 7, 10, 9, 8, 7, 6)

# Make a table for these two quizzes

df <- data.frame(Pop1, Pop2)

df
   Pop1 Pop2
1     6   10
2     9    6
3     8    6
4     6    6
5     8    8
6     8    7
7     9   10
8     7    9
9     7    8
10    5    7
11    9    6
# make a scatterplot of df

# improve this scatterplot by adding jitter

ggplot(df, aes(x = jitter(Pop1), y = jitter(Pop2))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Scatterplot of Pop quiz #1 and Pop quiz #2",
       x = "Pop1",
       y = "Pop2") +
  theme_minimal() +
  theme(aspect.ratio = 1) +
  scale_x_continuous(limits = c(4, 10)) +
  scale_y_continuous(limits = c(4, 10))
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 1 row containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).

# calculate the standard deviation of both quizzes and summarize

sd(df$Pop1)
[1] 1.368476
sd(df$Pop2)
[1] 1.572491
summary(df)
      Pop1            Pop2       
 Min.   :5.000   Min.   : 6.000  
 1st Qu.:6.500   1st Qu.: 6.000  
 Median :8.000   Median : 7.000  
 Mean   :7.455   Mean   : 7.545  
 3rd Qu.:8.500   3rd Qu.: 8.500  
 Max.   :9.000   Max.   :10.000  

calculate the correlation between the two quizzes

cor(df$Pop1, df$Pop2)
[1] -0.1267369

Also, Canvas calculated the coefficent alpha (Chronbach’s alpha) to be 0.30 for Quiz 1 and 0.31 for Quiz 2.

And I let GitHub CoPilot write a summary (which I have problems with)

“This is a low reliability score, which suggests that the quizzes are not reliable measures of the same construct. This is further supported by the low correlation coefficient of 0.31. This suggests that the quizzes are not measuring the same thing, and that the quizzes are not reliable measures of the same construct.”