DAM5a Assignment Fall 2024

Purpose of the Data-Analytic Memo

Here we will continue where we left off on DAM 3 using public data to better understand educational achievement in the USA. Again, we will use the National Assessment of Educational Progress (NAEP) Data Explorer at nationsreportcard.gov/ndecore/landing. And once again we will focus on 8th grade math. But this time, you are asked to produce a report for your home state – Connecticut, Kentucky, Massachusetts, New Hampshire, New Jersey, New York, South Carolina, or Vermont.

This DAM is a solo project, and each student in our class is responsible for producing their own report.

Hoffman will provide you with links to the data for your state in this document. And he will show sample code for Wisconsin, where he was a Middle School Principal – 20 years ago.

You are welcome to use GitHub Copilot and ChatGPT, or to ask questions on websites like Stack Overflow, during your analysis, but much of your analysis can be completed by following the sample code in this document.

State Snapshot Report

NAEP provides a State Snapshot Report for each state. The report provides a summary of the most recent NAEP results for the state and compares the state’s overall performance to the nation. The report also provides information about the demographic characteristics of the state’s students. The report is available for each state that participated in the most recent NAEP administration.

Here is Wisconsin’s State Snapshot Report for 8th Grade Mathematics:

Downloading a copy of your state’s report will provide you with a good overview of some of the data you will be analyzing. It also provides language that NAEP uses to describe overall results and results by student groups. Here are links to the reports for the states in question:

Connecticut: https://nces.ed.gov/nationsreportcard/subject/publications/stt2022/pdf/2023011CT8.pdf

Kentucky: https://nces.ed.gov/nationsreportcard/subject/publications/stt2022/pdf/2023011KY8.pdf

Massachusetts: https://nces.ed.gov/nationsreportcard/subject/publications/stt2022/pdf/2023011MA8.pdf

New Hampshire: https://nces.ed.gov/nationsreportcard/subject/publications/stt2022/pdf/2023011NH8.pdf

New Jersey: https://nces.ed.gov/nationsreportcard/subject/publications/stt2022/pdf/2023011NJ8.pdf

New York: https://nces.ed.gov/nationsreportcard/subject/publications/stt2022/pdf/2023011NY8.pdf

South Carolina: https://nces.ed.gov/nationsreportcard/subject/publications/stt2022/pdf/2023011SC8.pdf

Vermont: https://nces.ed.gov/nationsreportcard/subject/publications/stt2022/pdf/2023011VT8.pdf

Wisconsin: https://nces.ed.gov/nationsreportcard/subject/publications/stt2022/pdf/2023011WI8.pdf

An anomaly in the NAEP snapshots!

Some of the states show a line graph of “Average Scores for State/Jurisdiction and the Nation (Public)” with an x-axis that runs from Year ’03 through ’22 (including Hoffman’s example from Wisconsin). Other states, including many of the states that you are analyzing, show an x-axis from Year ’00 through ’22.

For consistency, conduct your analysis focused on 2003 through 2022.

Additionally, note that while we’ve discussed “all students” in our previous analyses, here we will focus on public school students, both in the Nation and in your state.
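
If your state’s sheet also includes a 2000 row, the restriction to 2003 through 2022 could be applied with a filter. This is a minimal sketch, assuming a data frame shaped like the Wisconsin example later in this document. The name `state_data` and every value in it are made up for illustration and are not NAEP results.

```r
library(tidyverse)

# Hypothetical data: all scores here are invented placeholders, not NAEP results
state_data <- tibble(
  Year = c(2022, 2022, 2003, 2003, 2000, 2000),
  Jurisdiction = rep(c("National public", "Example state"), times = 3),
  `Average scale score` = c(273, 281, 276, 284, 270, 275)
)

# Keep only the assessment years from 2003 onward
state_2003_on <- state_data |>
  filter(Year >= 2003)

state_2003_on
```

After the filter, only the 2003 and 2022 rows of this toy data frame remain.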

Spreadsheet for your state

Here are the links to the Google Sheets for each state. Hoffman curated them for you by downloading the data from the NAEP Data Explorer as Excel spreadsheets and converting them to Google Sheets:

Connecticut:

https://docs.google.com/spreadsheets/d/1nSeCk7c6xwF-90UJGdCnNaDEgG-5ZqDiMZWxcKyGNPM/edit?usp=sharing

Kentucky:

https://docs.google.com/spreadsheets/d/1pX2XyUnOZCFyg3ZcQl39yWCCX-ObnKVhxFmp5lE60xw/edit?usp=sharing

Massachusetts:

https://docs.google.com/spreadsheets/d/1Tg-YnDYMad4Nc4EOzMGuKMqJseD1YRriy0pactc-Nus/edit?usp=sharing

New Hampshire:

https://docs.google.com/spreadsheets/d/1zH6Qjzx476AbGWKJzPovYbvnmy7rC10FPaBU7dHc78s/edit?usp=sharing

New Jersey:

https://docs.google.com/spreadsheets/d/1HPyiBSL2xG2J_Wbr0kUgU5ZS3jvjheYrFhIavrxuMMM/edit?usp=sharing

New York:

https://docs.google.com/spreadsheets/d/194iPXDb31qtwXtVWBvgvv_feiAVk-kLCuVSi3wKz10M/edit?usp=sharing

South Carolina:

https://docs.google.com/spreadsheets/d/1H8wyi0ZIozZj9CHqNmjNcaU5uyfg_O2tu0kE7DEobFk/edit?usp=sharing

Vermont:

https://docs.google.com/spreadsheets/d/10KvdRfUCIzLFTjC9iITCBLIaMUwzTinPlpW_63mXkOM/edit?usp=sharing

Wisconsin:

https://docs.google.com/spreadsheets/d/1vn3tMGYhkz5TebJfnJdQ_Jt49ahtFXY1ShcBaa9bGT0/edit?usp=sharing

Figures from Measuring Up

As before, you will produce plots similar to those that Professor Koretz showed in Measuring Up.

In Chapter 5, several plots show trends in NAEP scores since the start of the test now termed the “Long-term trend” test, as well as since the first record of what is now the “main” NAEP. Both tests show substantial gains in mathematics scores, measured as “Gain in standard deviations”, which you are now well-equipped to understand.

Task 1 – Organize and Download Data

  • Open a new RStudio Project someplace where you will be able to find it. (Hoffman created a project called DAM5_Fall2024, saved in a desktop folder.)

  • Recommended: Open an R script for testing your code and saving your work for future reference.

  • Start a new Quarto Document and save it with your name on it. This is the document you will submit by Friday, November 15.

  • Recommended: Copy the following code that Hoffman used to download the data for Wisconsin and paste it into your R script used to keep track of useful code. You will need to modify it for your state.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(googlesheets4)
library(kableExtra)

Attaching package: 'kableExtra'

The following object is masked from 'package:dplyr':

    group_rows
library(ggthemes) # for a colorblind safe color palette

# Follow the instructions from R for Data Science, Chapter 20
# https://r4ds.hadley.nz/spreadsheets#google-sheets 

# deauthorize the googlesheets4 package
gs4_deauth()

# identify a sheet by its ID
Wisconsin_url <- "https://docs.google.com/spreadsheets/d/1vn3tMGYhkz5TebJfnJdQ_Jt49ahtFXY1ShcBaa9bGT0/edit?usp=sharing"

# Load the state data by reading in the sheet starting at row 9
wisc <- read_sheet(Wisconsin_url, range = "A9:E29")
✔ Reading from "Wisconsin_Grade8_Math".
✔ Range 'A9:E29'.
# View the data
wisc
# A tibble: 20 × 5
    Year Jurisdiction  `All students` `Average scale score` `Standard deviation`
   <dbl> <chr>         <chr>                          <dbl>                <dbl>
 1  2022 National pub… All students                    273.                 39.0
 2  2022 Wisconsin     All students                    281.                 38.3
 3  2019 National pub… All students                    281.                 39.8
 4  2019 Wisconsin     All students                    289.                 39.0
 5  2017 National pub… All students                    282.                 38.9
 6  2017 Wisconsin     All students                    288.                 37.7
 7  2015 National pub… All students                    281.                 36.9
 8  2015 Wisconsin     All students                    289.                 36.2
 9  2013 National pub… All students                    284.                 36.4
10  2013 Wisconsin     All students                    289.                 35.3
11  2011 National pub… All students                    283.                 36.3
12  2011 Wisconsin     All students                    289.                 34.9
13  2009 National pub… All students                    282.                 36.4
14  2009 Wisconsin     All students                    288.                 34.2
15  2007 National pub… All students                    280.                 36.1
16  2007 Wisconsin     All students                    286.                 35.1
17  2005 National pub… All students                    278.                 36.3
18  2005 Wisconsin     All students                    285.                 34.4
19  2003 National pub… All students                    276.                 36.2
20  2003 Wisconsin     All students                    284.                 34.8
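
To adapt the Wisconsin download code for another state, the read step could be wrapped in a small helper. This is only a sketch: `read_state_sheet` and its `cell_range` argument are names made up here, and the default range "A9:E29" is simply the one used for the Wisconsin sheet above. A sheet that also includes 2000 data may have more rows, so check your own sheet before relying on that range. The sketch also assumes, as the sample code does, that the sheet can be read without logging in.

```r
library(googlesheets4)

# Hypothetical helper (not part of googlesheets4): read one state's sheet.
# cell_range defaults to the Wisconsin range; adjust it if your sheet differs.
read_state_sheet <- function(sheet_url, cell_range = "A9:E29") {
  gs4_deauth()  # skip Google login, as in the sample code above
  read_sheet(sheet_url, range = cell_range)
}

# Example call (needs an internet connection), using the Connecticut link above:
# conn <- read_state_sheet(
#   "https://docs.google.com/spreadsheets/d/1nSeCk7c6xwF-90UJGdCnNaDEgG-5ZqDiMZWxcKyGNPM/edit?usp=sharing"
# )
```

The example call is left commented out so the block can be sourced without touching the network.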

Task 2 – A nice looking table

Produce a table of the data in chronological order, rounded to zero decimal places, using the kableExtra package.

wisc |>
  arrange(Year) |>
  kable(digits = 0) |>
  kable_styling()
Year  Jurisdiction     All students  Average scale score  Standard deviation
2003  National public  All students                  276                  36
2003  Wisconsin        All students                  284                  35
2005  National public  All students                  278                  36
2005  Wisconsin        All students                  285                  34
2007  National public  All students                  280                  36
2007  Wisconsin        All students                  286                  35
2009  National public  All students                  282                  36
2009  Wisconsin        All students                  288                  34
2011  National public  All students                  283                  36
2011  Wisconsin        All students                  289                  35
2013  National public  All students                  284                  36
2013  Wisconsin        All students                  289                  35
2015  National public  All students                  281                  37
2015  Wisconsin        All students                  289                  36
2017  National public  All students                  282                  39
2017  Wisconsin        All students                  288                  38
2019  National public  All students                  281                  40
2019  Wisconsin        All students                  289                  39
2022  National public  All students                  273                  39
2022  Wisconsin        All students                  281                  38

Task 3 – A Plot of Average Scores

Produce a plot of average achievement in 8th grade mathematics on the main NAEP test, similar to the one shown on the State Snapshot Report.

wisc |>
  ggplot(aes(x = Year, y = `Average scale score`)) +
  geom_point(aes(shape = Jurisdiction)) +
  geom_line(aes(color = Jurisdiction)) +
  labs(title = "Average Scores for Wisconsin and the Nation (public)",
       x = "Year",
       y = "Average Score") +
  scale_color_colorblind()
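
Because the assessment years in this data are 2003 through 2019 in odd years and then 2022, the default x-axis breaks may not line up with them. The following is a minimal sketch of one way to label exactly those years with `scale_x_continuous()`. The data frame `wisc_rounded` is an assumption built here from the rounded values in the Task 2 table, so the block runs without the Google Sheet. Unlike the sample code, it also maps color to the points.

```r
library(tidyverse)
library(ggthemes)

# Assessment years that appear in the Wisconsin data
years <- c(2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017, 2019, 2022)

# Rounded averages copied from the Task 2 table (illustration only)
wisc_rounded <- tibble(
  Year = rep(years, times = 2),
  Jurisdiction = rep(c("National public", "Wisconsin"), each = length(years)),
  `Average scale score` = c(
    276, 278, 280, 282, 283, 284, 281, 282, 281, 273,  # National public
    284, 285, 286, 288, 289, 289, 289, 288, 289, 281   # Wisconsin
  )
)

p <- ggplot(wisc_rounded,
            aes(x = Year, y = `Average scale score`, color = Jurisdiction)) +
  geom_point(aes(shape = Jurisdiction)) +
  geom_line() +
  scale_x_continuous(breaks = years) +  # one tick per assessment year
  scale_color_colorblind() +
  labs(title = "Average Scores for Wisconsin and the Nation (public)",
       x = "Year",
       y = "Average Score")

p
```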

Task 4 - Plot showing Basic and Proficient cuts

Produce a similar plot, adding horizontal lines that indicate the cut scores for Basic and Proficient (not Advanced).

wisc |>
  ggplot(aes(x = Year, y = `Average scale score`)) +
  geom_point(aes(shape = Jurisdiction)) +
  geom_line(aes(color = Jurisdiction)) +
  scale_y_continuous(limits = c(260, 310)) +
  labs(title = "Average Scores for Wisconsin and the Nation (public)",
       x = "Year",
       y = "Average Score") +
  scale_color_colorblind() + 
  geom_hline(yintercept = 262, linetype = "dashed", color = "red") +
  geom_hline(yintercept = 299, linetype = "dashed", color = "blue") +
  annotate("text", x= 2022, y = 262, label = "Basic", vjust = -1 , hjust = +1) +
  annotate("text", x = 2022, y = 299, label = "Proficient", vjust = -1 , hjust = +1) 

Task 5 - Add Columns

Add columns to the data frame for the difference in Standard Deviations for Wisconsin and the Nation. To align with our previous work, we will use the 2003 standard deviation for the Nation as our baseline.

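# Average scale score for the Nation in 2003 is 277.56
Avg_Nation_2003 <- 277.56
# Standard deviation for the Nation in 2003 is 36.23
SD_Nation_2003 <- 36.23
# diff_scale_score: each average minus the 2003 national average
# diff_SD: that difference divided by the row's own `Standard deviation`
# (SD_Nation_2003 is defined above but is not used in this calculation)
wisc <- wisc |>
  mutate(diff_scale_score = `Average scale score` - Avg_Nation_2003,
         diff_SD = diff_scale_score / `Standard deviation`)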
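
The text above names the 2003 national standard deviation as the baseline, while the sample code divides by each row’s own `Standard deviation`. The following is a minimal sketch of the fixed-baseline variant, in which every difference is divided by `SD_Nation_2003`. The two scores in `example` are back-computed from the Task 6 table below (277.56 plus the printed difference), so treat them as approximate. The column name `diff_SD_baseline` is made up here.

```r
library(tidyverse)

Avg_Nation_2003 <- 277.56
SD_Nation_2003 <- 36.23

# Approximate Wisconsin averages, back-computed from the Task 6 table
example <- tibble(
  Year = c(2003, 2022),
  Jurisdiction = c("Wisconsin", "Wisconsin"),
  `Average scale score` = c(283.92, 281.14)
)

# Divide every difference by the same 2003 national standard deviation
example_baseline <- example |>
  mutate(diff_scale_score = `Average scale score` - Avg_Nation_2003,
         diff_SD_baseline = diff_scale_score / SD_Nation_2003)

example_baseline
```

With a single fixed divisor, a given point difference maps to the same number of standard deviations in every year. With the row-by-row divisor, the result also shifts as the spread of scores changes.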

Task 6 - A table comparing to 2003

Produce a nice-looking table showing the year, the jurisdiction, the difference in scale score from our baseline (the National Average Scale Score in 2003), and that same difference expressed in standard deviation units.

wisc |>
  arrange(Year) |>
  select(Year, Jurisdiction, diff_scale_score, diff_SD) |>
  kable(digits = 2) |>
  kable_styling()
Year  Jurisdiction     diff_scale_score  diff_SD
2003  National public             -1.44    -0.04
2003  Wisconsin                    6.36     0.18
2005  National public             -0.04     0.00
2005  Wisconsin                    6.98     0.20
2007  National public              2.61     0.07
2007  Wisconsin                    8.06     0.23
2009  National public              4.11     0.11
2009  Wisconsin                   10.58     0.31
2011  National public              5.17     0.14
2011  Wisconsin                   11.11     0.32
2013  National public              6.06     0.17
2013  Wisconsin                   11.19     0.32
2015  National public              3.72     0.10
2015  Wisconsin                   11.52     0.32
2017  National public              4.40     0.11
2017  Wisconsin                   10.58     0.28
2019  National public              3.43     0.09
2019  Wisconsin                   11.10     0.28
2022  National public             -4.43    -0.11
2022  Wisconsin                    3.58     0.09

Task 7 - A paragraph similar to your Snapshot

Using the table that you produced in Task 6, write a paragraph, similar to the first two bullet points under “Overall Results” (top left) in your State Snapshot Report, that describes the trends in the difference in scale scores between your state and the Nation. Additionally, include a description of the trends in the difference in standard deviations between your state and the Nation.
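
To describe the gap between your state and the Nation, it may help to place the two jurisdictions side by side for each year. This is a minimal sketch using `pivot_wider()`. The data frame `scores` is an assumption built from the rounded 2003 and 2022 values in the Task 2 table, and `gap_points` is a column name made up here.

```r
library(tidyverse)

# Rounded averages copied from the Task 2 table (illustration only)
scores <- tibble(
  Year = c(2003, 2003, 2022, 2022),
  Jurisdiction = c("National public", "Wisconsin",
                   "National public", "Wisconsin"),
  `Average scale score` = c(276, 284, 273, 281)
)

# One row per year, one column per jurisdiction, then the state-minus-nation gap
gap <- scores |>
  pivot_wider(names_from = Jurisdiction,
              values_from = `Average scale score`) |>
  mutate(gap_points = Wisconsin - `National public`)

gap
```

With these rounded inputs the gap is 8 points in both years. Running the same steps on the unrounded averages in your sheet will give more precise values.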

Task 8 - A plot in standard deviations

Produce a plot of the difference in standard deviations between your state and the Nation over time. (Your plot should be similar to Figure 5.5 in Measuring Up.)

wisc |>
  ggplot(aes(x = Year, y = diff_SD)) +
  geom_point(aes(shape = Jurisdiction)) +
  geom_line(aes(color = Jurisdiction)) +
  labs(title = "Differences in Standard Deviations for Wisconsin and the Nation (public)",
       x = "Year",
       y = "Differences in Standard Deviations") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  scale_color_colorblind()