DAM 3 Key and notes about reliability quiz

Author

EDST0213, especially Ella & Maeve – also Mark and Zamzama

Task 1

Load these packages:

library(tidyverse) # load the tidyverse

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(readxl) #load the readxl package to read excel
library(kableExtra) # load the kableExtra package


Attaching package: 'kableExtra'

The following object is masked from 'package:dplyr':

    group_rows

Import data:

math8 <- read_excel("Grade8NAEP.xls", range = "A9:E19")
print(math8, n = 10) # display the data

# A tibble: 10 × 5
    Year Jurisdiction `All students` `Average scale score` `Standard deviation`
   <dbl> <chr>        <chr>                          <dbl>                <dbl>
 1  2022 National     All students                    274.                 38.9
 2  2019 National     All students                    282.                 39.7
 3  2017 National     All students                    283.                 38.8
 4  2015 National     All students                    282.                 36.9
 5  2013 National     All students                    285.                 36.5
 6  2011 National     All students                    284.                 36.2
 7  2009 National     All students                    283.                 36.4
 8  2007 National     All students                    281.                 36.1
 9  2005 National     All students                    279.                 36.3
10  2003 National     All students                    278.                 36.2

Task 2

Produce a table for the data from task 1 so that the data is shared in chronological order.

math8 |>
  arrange(math8, Year) |> # arrange by year 
  kable(digits = 0) |> # set to zero decimal places
  kable_styling()

Year	Jurisdiction	All students	Average scale score	Standard deviation
2003	National	All students	278	36
2005	National	All students	279	36
2007	National	All students	281	36
2009	National	All students	283	36
2011	National	All students	284	36
2013	National	All students	285	37
2015	National	All students	282	37
2017	National	All students	283	39
2019	National	All students	282	40
2022	National	All students	274	39

Task 3

Produce a plot of average achievement in 8th grade mathematics on the main NAEP test similar to what was produced on the NAEP website.

ggplot(data = math8) + 
  geom_line(mapping = aes(x = Year, y = `Average scale score`))  + # line graph
  labs(
    title = "Trend in Grade 8 Mathematics Average Scale Scores",
    subtitle = "Main NAEP: 2003 - 2022"
  )

Task 4

Produce a similar plot adding horizontal lines indicating the cut scores for Basic, Proficient, and Advanced on this plot.

# Task 4
# Produce a similar plot adding horizontal lines indicating the cut scores for Basic, Proficient, and Advanced on this plot.
# https://nces.ed.gov/nationsreportcard/mathematics/achieve.aspx#grade8 
ggplot(
  data = math8,
  mapping = aes(x = Year, y = `Average scale score`)) +
  geom_line() +
  labs(
    title = "Trend in Grade 8 Mathematics Average Score",
    subtitle = "Main NAEP: 2003 - 2022"
  ) +
  ylim(260, 335) +
  geom_line(mapping = aes(x=Year, y = 262, color = "Basic"), linetype=2) +
  geom_line(mapping = aes(x=Year, y = 299, color = "Proficient"), linetype=2) +
  geom_line(mapping = aes(x=Year, y = 333, color = "Advanced"), linetype=2) +
  theme_gray() +
  theme(legend.position = "right") +
  scale_color_manual(breaks=c("Advanced", "Proficient", "Basic"), labels=c("Advanced", "Proficient", "Basic"), values=c("#0047AB", "#228B22", "#FDDA0D"))

Another approach – Zamzama & Mark

Label the horizontal lines with the cut scores for Basic, Proficient, and Advanced.

#Add a text label above basic, proficient, and advanced lines
ggplot(data = math8) + 
  geom_line(mapping = aes(x = Year, y = `Average scale score`))  +
  labs(
    title = "Trend in Grade 8 Mathematics Average Scores",
    subtitle = "Main NAEP: 2003 - 2022"
  ) + 
  geom_hline(yintercept = 262, linetype = "dashed", color = "red") +
  geom_hline(yintercept = 299, linetype = "dashed", color = "blue") +
  geom_hline(yintercept = 333, linetype = "dashed", color = "green") +
  #Using 'annotate' add text labels above the lines
  annotate("text", x= 2022, y = 262, label = "Basic", vjust = -1 , hjust = +1) +
  annotate("text", x = 2022, y = 299, label = "Proficient", vjust = -1 , hjust = +1) +
  annotate("text", x = 2022, y = 333, label = "Advanced", vjust = -.3 , hjust = +1)

A solid answer by Ella & Maeve: The addition of lines indicating advanced, proficient, and basic performance on the Grade 8 mathematics test allows someone with limited knowledge of scoring for this test to understand whether or not the average score can be considered good. With the original graph (lines not included) it is difficult to understand what a score of 285 implies, you can only deduce that over time the average scores have decreased. With the lines, however, you can see that a score of 285 is still well below proficient, making it more impactful that overtime these scores have regressed.

Task 5

Add a column that shows the difference in scale scores between the Year in question and 2003.

# Task 5
math8 <- math8 |>
  mutate(diff_scale_score = `Average scale score` - `Average scale score`[Year == 2003]) # add a column that shows the difference in scale scores between the year in question and 2003
math8 |>
  select(Year, diff_scale_score)

# A tibble: 10 × 2
    Year diff_scale_score
   <dbl>            <dbl>
 1  2022            -3.30
 2  2019             4.44
 3  2017             5.28
 4  2015             4.66
 5  2013             7.04
 6  2011             6.29
 7  2009             5.35
 8  2007             3.78
 9  2005             1.26
10  2003             0

For this task, we mutated the data set to include a new column that shows the difference in scale scores between the year in question and 2003. This allows us to see how the average scale scores have changed over time. The code was to subtract the average scale score in 2003 from the average scale score in the year in question.

Task 6

Add another column that shows the difference in Math measured in standard deviations since 2003.

math8 <- math8 |>
  mutate(diff_SD = (`Average scale score` - `Average scale score`[Year == 2003]) / `Standard deviation`)
math8 |>
  select(Year, diff_scale_score, diff_SD)

# A tibble: 10 × 3
    Year diff_scale_score diff_SD
   <dbl>            <dbl>   <dbl>
 1  2022            -3.30 -0.0848
 2  2019             4.44  0.112 
 3  2017             5.28  0.136 
 4  2015             4.66  0.126 
 5  2013             7.04  0.193 
 6  2011             6.29  0.174 
 7  2009             5.35  0.147 
 8  2007             3.78  0.105 
 9  2005             1.26  0.0346
10  2003             0     0

For this task, we mutated the data set to include another new column that shows the difference in math measured in standard deviations since 2003. This allows us to see how the average scale scores have changed over time in terms of standard deviations. The code was to subtract the average scale score in 2003 from the average scale score in the year in question and then divide by the standard deviation.

Task 7

Produce the plot that looks like Figure 5.5 from Measuring Up.

ggplot(data = math8) + 
  geom_line(mapping = aes(x = Year, y = diff_SD)) + # line graph
  labs(
    title = "Trend in Grade 8 Mathematics Average Scale Scores",
    subtitle = "Main NAEP: 2003 - 2022",
    y = "Difference in Standard Deviations from 2003",
  )

Task 8

It appears from the graph that average grade 8 math scores have decreased significantly from 2003 to 2022. Overall, there has been a decrease of 0.1 standard deviations in average math scores for grade 8 - in other words, in 2022, the average math score was 0.1 standard deviations below the average in 2003. While this may seem like a marginal difference, the overall decrease in average test scores implies that there is an issue within math education. It had appeared that from 2003 to around 2012, average scores were displaying an upward trend, with 2012 scores being a whole 0.2 standard deviations above the average in 2003. This upward progression is ideal, however, we see a sharp downward turn by 2015, and a continuous steep downward trend from 2019 to 2023. This particular drop in scores suggests that perhaps these math issues began around 2019. Given the timeframe it is possible that this drop in scores is attributable to the affects of Covid, however, it is important to work towards solutions so that we may see a reappearance of upwards trends in these test scores.

Task 9

mean_2022 <- math8$`Average scale score`[1] #mean in 2022
stddev_2022 <- math8$`Standard deviation`[1] #standard deviation in 2022
cut_scores <- c(262, 299, 333) #cut scores for basic,prof,adv 
labels <- c("Basic", "Proficient", "Advanced")
cut_data <- data.frame(Score = cut_scores, Label = labels) #data frame for the cut scores
x_values <- seq(mean_2022 - 3 * stddev_2022, mean_2022 + 3 * stddev_2022, length.out = 100) #x values to make a perfect bell curve
y_values <- dnorm(x_values, mean = mean_2022, sd = stddev_2022) #y values for the bell curve
data <- data.frame(Score = x_values, Density = y_values) #data frame for the bell curve
# Create the plot
ggplot(data, aes(x = Score, y = Density)) +  # Bell curve
  geom_line(color = "blue", size = 1) +
  geom_vline(data = cut_data, aes(xintercept = Score, color = Label), linetype = "dashed", size = 0.8) +  # Add vertical lines with labels
  labs(
    title = "Normal Distribution of 8th Grade Math Scores (2022)",
    x = "Score",
    y = "Density"
  ) +
  scale_color_manual(values = c("red", "darkgreen", "purple")) +  # Specify colors for the legend
  theme(legend.title = element_blank())  # Remove legend title

Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.

Reliability Quizzes

Hoffman’s work in front of the class on Tuesday

# Libaries
library(tidyverse)

# Data

Pop1 <- c(6, 9, 8, 6, 8, 8, 9, 7, 7, 5, 9)
Pop2 <- c(10, 6, 6, 6, 8, 7, 10, 9, 8, 7, 6)

# Make a table for these two quizzes

df <- data.frame(Pop1, Pop2)

df

   Pop1 Pop2
1     6   10
2     9    6
3     8    6
4     6    6
5     8    8
6     8    7
7     9   10
8     7    9
9     7    8
10    5    7
11    9    6

# make a scatterplot of df

# improve this scatterplot by adding jitter

ggplot(df, aes(x = jitter(Pop1), y = jitter(Pop2))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Scatterplot of Pop quiz #1 and Pop quiz #2",
       x = "Pop1",
       y = "Pop2") +
  theme_minimal() +
  theme(aspect.ratio = 1) +
  scale_x_continuous(limits = c(4, 10)) +
  scale_y_continuous(limits = c(4, 10))

`geom_smooth()` using formula = 'y ~ x'

Warning: Removed 1 row containing non-finite outside the scale range
(`stat_smooth()`).

Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).

# calculate the standard deviation of both quizzes and summarize

sd(df$Pop1)

[1] 1.368476

sd(df$Pop2)

[1] 1.572491

summary(df)

      Pop1            Pop2       
 Min.   :5.000   Min.   : 6.000  
 1st Qu.:6.500   1st Qu.: 6.000  
 Median :8.000   Median : 7.000  
 Mean   :7.455   Mean   : 7.545  
 3rd Qu.:8.500   3rd Qu.: 8.500  
 Max.   :9.000   Max.   :10.000

calculate the correlation between the two quizzes

cor(df$Pop1, df$Pop2)

[1] -0.1267369

Also, Canvas calculated the coefficent alpha (Chronbach’s alpha) to be 0.30 for Quiz 1 and 0.31 for Quiz 2.

And I let GitHub CoPilot write a summary (which I have problems with)

“This is a low reliability score, which suggests that the quizzes are not reliable measures of the same construct. This is further supported by the low correlation coefficient of 0.31. This suggests that the quizzes are not measuring the same thing, and that the quizzes are not reliable measures of the same construct.”