Here we will continue where we left off on DAM 3 using public data to better understand educational achievement in the USA. Again, we will use the National Assessment of Educational Progress (NAEP) Data Explorer at nationsreportcard.gov/ndecore/landing. And once again we will focus on 8th grade math. But this time, you are asked to produce a report for your home state – Connecticut, Kentucky, Massachusetts, New Hampshire, New Jersey, New York, South Carolina, or Vermont.
This DAM is a solo project, and each student in our class is responsible for producing their own report.
Hoffman will provide you with links to the data for your state in this document. And he will show sample code for Wisconsin, where he was a Middle School Principal – 20 years ago.
You are welcome to use GitHub Copilot and ChatGPT, or to ask questions on websites like Stack Overflow, during your analysis, but much of your analysis can be completed by following the sample code in this document.
State Snapshot Report
NAEP provides a State Snapshot Report for each state. The report provides a summary of the most recent NAEP results for the state and compares the state’s overall performance to the nation. The report also provides information about the demographic characteristics of the state’s students. The report is available for each state that participated in the most recent NAEP administration.
Here is Wisconsin’s State Snapshot Report for 8th Grade Mathematics:
Downloading a copy of your state’s report will provide you with a good overview of some of the data you will be analyzing. It also provides language that NAEP uses to describe overall results and results by student groups. Here are links to the states in question:
Some of the State Snapshot Reports (including Hoffman’s example from Wisconsin) show a line graph of “Average Scores for State/Jurisdiction and the Nation (Public)” with an x-axis running from ’03 through ’22. Others, including many of the states that you are analyzing, show an x-axis from ’00 through ’22.
For consistency, conduct your analysis focused on 2003 through 2022.
Additionally, note that while we’ve discussed “all students” in our previous analyses, here we will also focus on public school students, both in the Nation and in your state.
Spreadsheet for your state
Here are the links to the Google Sheets for your state. Hoffman curated them for you by downloading the data from the NAEP Data Explorer as Excel spreadsheets and converting them to Google Sheets:
As before, you will produce plots similar to those that Professor Koretz showed in Measuring Up.
In Chapter 5, several plots show trends in NAEP scores since the start of the test now termed the “Long-term trend” test, as well as since the first record of what is now the “main” NAEP. Both tests show substantial gains in mathematics scores, measured as “Gain in standard deviations”, which you are now well-equipped to understand.
Task 1 – Organize and Download Data
Open a new RStudio Project someplace where you will be able to find it. (Hoffman created a project called DAM5_Fall2024, saved in a desktop folder.)
Recommended: Open an R script for testing your code and saving your work for future reference.
Start a new Quarto Document and save it with your name on it. This is the document you will submit by Friday, November 15.
Recommended: Copy the following code that Hoffman used to download the data for Wisconsin and paste it into your R script used to keep track of useful code. You will need to modify it for your state.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(googlesheets4)
library(kableExtra)
Attaching package: 'kableExtra'
The following object is masked from 'package:dplyr':
group_rows
library(ggthemes) # for a colorblind safe color palette
# Follow the instructions from R for Data Science, Chapter 20
# https://r4ds.hadley.nz/spreadsheets#google-sheets
# deauthorize the googlesheets4 package
gs4_deauth()
# identify a sheet by its ID
Wisconsin_url <- "https://docs.google.com/spreadsheets/d/1vn3tMGYhkz5TebJfnJdQ_Jt49ahtFXY1ShcBaa9bGT0/edit?usp=sharing"
# Load the state data by reading in the sheet starting at row 9
wisc <- read_sheet(Wisconsin_url, range = "A9:E29")
✔ Reading from "Wisconsin_Grade8_Math".
✔ Range 'A9:E29'.
# View the data
wisc
# A tibble: 20 × 5
Year Jurisdiction `All students` `Average scale score` `Standard deviation`
<dbl> <chr> <chr> <dbl> <dbl>
1 2022 National pub… All students 273. 39.0
2 2022 Wisconsin All students 281. 38.3
3 2019 National pub… All students 281. 39.8
4 2019 Wisconsin All students 289. 39.0
5 2017 National pub… All students 282. 38.9
6 2017 Wisconsin All students 288. 37.7
7 2015 National pub… All students 281. 36.9
8 2015 Wisconsin All students 289. 36.2
9 2013 National pub… All students 284. 36.4
10 2013 Wisconsin All students 289. 35.3
11 2011 National pub… All students 283. 36.3
12 2011 Wisconsin All students 289. 34.9
13 2009 National pub… All students 282. 36.4
14 2009 Wisconsin All students 288. 34.2
15 2007 National pub… All students 280. 36.1
16 2007 Wisconsin All students 286. 35.1
17 2005 National pub… All students 278. 36.3
18 2005 Wisconsin All students 285. 34.4
19 2003 National pub… All students 276. 36.2
20 2003 Wisconsin All students 284. 34.8
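The range A9:E29 used for Wisconsin already covers 2003 through 2022. If the sheet for your state also contains a row for 2000, one possible way to trim it is sketched below. This is a minimal sketch, not Hoffman's code: `state_raw` and `state_trimmed` are hypothetical names, the first three scores are the rounded Wisconsin values printed above, and the 2000 row is a placeholder with a missing score.

```r
library(dplyr)

# Hypothetical example: a state sheet that also includes a 2000 row.
# The 2000 score is left missing on purpose; it is only a placeholder.
state_raw <- tibble::tibble(
  Year = c(2022, 2019, 2003, 2000),
  Jurisdiction = "Wisconsin",
  `Average scale score` = c(281, 289, 284, NA)
)

# Keep only the 2003-2022 window used throughout this DAM
state_trimmed <- state_raw |>
  filter(between(Year, 2003, 2022))

state_trimmed
```

Adjusting the `range` argument of `read_sheet()` so that it stops at the 2003 rows would have the same effect.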
Task 2 – A nice looking table
Produce a table for the data in chronological order, set to zero decimal places, using the kableExtra package.
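One way to approach such a table is sketched below, assuming `kbl()` and `kable_styling()` from kableExtra. The object names `wisc_demo` and `wisc_table` and the caption are hypothetical, and the three rows are rounded values copied from the printed tibble for illustration only. With your own data you would start from the full data frame instead.

```r
library(dplyr)
library(kableExtra)

# Three Wisconsin rows, rounded as printed in the tibble (illustrative only)
wisc_demo <- tibble::tribble(
  ~Year, ~Jurisdiction, ~`Average scale score`, ~`Standard deviation`,
  2022,  "Wisconsin",   281,                    38.3,
  2019,  "Wisconsin",   289,                    39.0,
  2003,  "Wisconsin",   284,                    34.8
)

wisc_table <- wisc_demo |>
  arrange(Year) |>   # chronological order, oldest year first
  kbl(digits = 0,    # round numeric columns to zero decimal places
      caption = "Grade 8 NAEP mathematics, Wisconsin (illustrative rows)") |>
  kable_styling(full_width = FALSE)

wisc_table
```

Sorting with `arrange(Year, Jurisdiction)` instead keeps the two jurisdictions next to each other within each year.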
Produce a plot of average achievement in 8th grade mathematics on the main NAEP test, similar to the one shown on the State Snapshot Report.
wisc |>
  group_by(Jurisdiction) |>
  ggplot(aes(x = Year, y = `Average scale score`)) +
  geom_point(aes(shape = Jurisdiction)) +
  geom_line(aes(color = Jurisdiction)) +
  labs(title = "Average Scores for Wisconsin and the Nation (public)",
       x = "Year",
       y = "Average Score") +
  scale_color_colorblind()
Task 4 - Plot showing Basic and Proficient cuts
Produce a similar plot adding horizontal lines indicating the cut scores for Basic and Proficient (not Advanced) on this plot.
wisc |>
  group_by(Jurisdiction) |>
  ggplot(aes(x = Year, y = `Average scale score`)) +
  geom_point(aes(shape = Jurisdiction)) +
  geom_line(aes(color = Jurisdiction)) +
  scale_y_continuous(limits = c(260, 310)) +
  labs(title = "Average Scores for Wisconsin and the Nation (public)",
       x = "Year",
       y = "Average Score") +
  scale_color_colorblind() +
  # Horizontal reference lines at the Basic and Proficient cut scores
  geom_hline(yintercept = 262, linetype = "dashed", color = "red") +
  geom_hline(yintercept = 299, linetype = "dashed", color = "blue") +
  annotate("text", x = 2022, y = 262, label = "Basic", vjust = -1, hjust = +1) +
  annotate("text", x = 2022, y = 299, label = "Proficient", vjust = -1, hjust = +1)
Task 5 - Add Columns
Add columns to the data frame for the difference in Standard Deviations for Wisconsin and the Nation. To align with our previous work, we will use the 2003 standard deviation for the Nation as our baseline.
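# Average scale score for the Nation in 2003 is 277.56
Avg_Nation_2003 <- 277.56
# Standard deviation for the Nation in 2003 is 36.23
SD_Nation_2003 <- 36.23
# Express each score as a difference from the 2003 national average,
# in scale-score points and in units of the 2003 national standard deviation
wisc <- wisc |>
  mutate(diff_scale_score = `Average scale score` - Avg_Nation_2003,
         diff_SD = diff_scale_score / SD_Nation_2003)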
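Written as a formula, the baseline comparison described in Task 5 looks like the following. The symbols are introduced here for illustration only: the numerator is the average scale score for jurisdiction j in year t minus the 2003 national average, and the denominator is the 2003 national standard deviation. The worked line uses the rounded Wisconsin 2003 score of 284 from the printed tibble, so the result is approximate.

```latex
\[
\text{diff\_SD}_{j,t}
  = \frac{\bar{X}_{j,t} - \bar{X}_{\text{Nation},\,2003}}{s_{\text{Nation},\,2003}}
  = \frac{\bar{X}_{j,t} - 277.56}{36.23}
\]
\[
\text{Wisconsin, 2003:}\quad
\frac{284 - 277.56}{36.23} = \frac{6.44}{36.23} \approx 0.18
\]
```

A value of about 0.18 means the Wisconsin average in 2003 sat a little under one fifth of a standard deviation above the 2003 national average.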
Task 6 - A table comparing to 2003
Produce a nice looking table showing the year, the jurisdiction, the difference in scale score from our baseline (the National Average Scale Score in 2003), and that same difference expressed in standard deviation units.
Using the table that you produced in Task 6, write a paragraph, similar to the first two bullet points under “Overall Results” (top left) in your State Snapshot Report, that describes the trends in the difference in scale scores between your state and the Nation. Additionally, describe the trends in the difference in standard deviations between your state and the Nation.
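To describe the gap between your state and the Nation year by year, it can help to place the two jurisdictions side by side. The sketch below is one possible approach using `pivot_wider()` from tidyr and is not part of Hoffman's sample code. The names `scores`, `gaps`, `gap_points`, and `gap_SD` are hypothetical, "Nation (public)" is a stand-in label because the printed jurisdiction name is truncated, and the scores are the rounded values from the printed tibble.

```r
library(dplyr)
library(tidyr)

# Rounded values as printed in the tibble; "Nation (public)" is a stand-in label
scores <- tibble::tribble(
  ~Year, ~Jurisdiction,     ~score,
  2022,  "Nation (public)", 273,
  2022,  "Wisconsin",       281,
  2003,  "Nation (public)", 276,
  2003,  "Wisconsin",       284
)

SD_Nation_2003 <- 36.23  # baseline standard deviation from Task 5

gaps <- scores |>
  # One row per year, one column per jurisdiction
  pivot_wider(names_from = Jurisdiction, values_from = score) |>
  mutate(gap_points = Wisconsin - `Nation (public)`,  # state minus Nation
         gap_SD = gap_points / SD_Nation_2003) |>     # same gap in baseline SD units
  arrange(Year)

gaps
```

When both jurisdictions are scaled by the same baseline standard deviation, `gap_SD` equals the difference between their two `diff_SD` values for that year.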
Task 8 - A plot in standard deviations
Produce a plot of the difference in standard deviations between your state and the Nation over time. (Your plot should be similar to Figure 5.5 in Measuring Up.)
wisc |>
  ggplot(aes(x = Year, y = diff_SD)) +
  geom_point(aes(shape = Jurisdiction)) +
  geom_line(aes(color = Jurisdiction)) +
  labs(title = "Differences in Standard Deviations for Wisconsin and the Nation (public)",
       x = "Year",
       y = "Differences in Standard Deviations") +
  # Dashed line at zero marks the 2003 national baseline
  geom_hline(yintercept = 0, linetype = "dashed") +
  scale_color_colorblind()