Here we will show off the skills that you’ve acquired on DAMs 3 & 5 – using public data to better understand educational achievement in the USA. Again, we will use the National Assessment of Educational Progress (NAEP) Data Explorer at nationsreportcard.gov/ndecore/landing. And once again we will focus on 8th grade math.
This final DAM focuses on students with disabilities (SD) and those not identified as students with disabilities in the Nation.
This DAM is a solo project, and each student in our class is responsible for producing their own report.
You are welcome to use GitHub Copilot and ChatGPT, or to ask questions on websites like Stack Overflow, during your analysis, but much of your analysis can be completed by following the sample code in this document.
For consistency, conduct your analysis focused on 2003 through 2022.
Spreadsheet for you to analyze
Here is the link to the Google Sheet for this Part 3 of your Final Project, which Hoffman curated for you by downloading the data from the NAEP Data Explorer as Excel spreadsheets and converting them to Google Sheets:
A student with a disability may need specially designed instruction to meet his or her learning goals. A student with a disability will usually have an Individualized Education Plan (IEP), which guides his or her special education instruction. Students with disabilities are often referred to as special education students and may be classified by their school as learning disabled (LD) or emotionally disturbed (ED). The goal of NAEP is that students who are capable of participating meaningfully in the assessment are assessed, but some students with disabilities selected by NAEP may not be able to participate, even with the accommodations provided.
Figures from Measuring Up
As before, you will produce plots similar to those that Professor Koretz showed in Measuring Up.
In Chapter 5, several plots show trends in NAEP scores since the start of the test now termed the “Long-term trend” test, as well as since the first record of what is now the “main” NAEP. Both tests show substantial gains in mathematics scores, measured as “Gain in standard deviations”, which you are now well-equipped to understand.
Task 1 – Organize and Download Data
Open an RStudio Project somewhere you will be able to find it.
Recommended: Open up an R script for testing your code and saving your work for future reference.
Start a new Quarto Document and save it with your name on it. This is the document you will submit by December 16, 2024.
Recommended: Copy the following code that Hoffman used to download the data and paste it into your R script used to keep track of useful code.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(googlesheets4)
library(kableExtra)
Attaching package: 'kableExtra'
The following object is masked from 'package:dplyr':
group_rows
library(ggthemes) # for a colorblind safe color palette

# Follow the instructions from R for Data Science, Chapter 20
# https://r4ds.hadley.nz/spreadsheets#google-sheets

# deauthorize the googlesheets4 package
gs4_deauth()

# identify a sheet by its ID
Disability_Status_url <- "https://docs.google.com/spreadsheets/d/14BWIEEdWtz1vWF0ZxTRGXYx-abRhpUm6VpqYXoZMFU8/edit?usp=sharing"

# Load the national data by reading in the sheet starting at row 9
disab <- read_sheet(Disability_Status_url, range = "A9:E29")
✔ Reading from "NAEP_Grade 8_Disability_status".
✔ Range 'A9:E29'.
# Simplify the 3rd column by renaming it to "Disability_status"
disab <- disab |>
  rename(Disability_status = `Disability status of student, including those with 504 plan`)

# View the data
disab
# A tibble: 20 × 5
Year Jurisdiction Disability_status `Average scale score`
<dbl> <chr> <chr> <dbl>
1 2022 National Identified as students with disabil… 243.
2 2022 National Not identified as students with dis… 279.
3 2019 National Identified as students with disabil… 247.
4 2019 National Not identified as students with dis… 287.
5 2017 National Identified as students with disabil… 247.
6 2017 National Not identified as students with dis… 288.
7 2015 National Identified as students with disabil… 247.
8 2015 National Not identified as students with dis… 287.
9 2013 National Identified as students with disabil… 249.
10 2013 National Not identified as students with dis… 289.
11 2011 National Identified as students with disabil… 250.
12 2011 National Not identified as students with dis… 288.
13 2009 National Identified as students with disabil… 249.
14 2009 National Not identified as students with dis… 287.
15 2007 National Identified as students with disabil… 246.
16 2007 National Not identified as students with dis… 285.
17 2005 National Identified as students with disabil… 245.
18 2005 National Not identified as students with dis… 283.
19 2003 National Identified as students with disabil… 242.
20 2003 National Not identified as students with dis… 282.
# ℹ 1 more variable: `Standard deviation` <dbl>
Task 2 – A nice looking table
Produce a table for the data in chronological order, set to zero decimal places, using the kableExtra package.
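One possible approach to this task is sketched below. To keep the example self-contained, it uses a small hand-typed stand-in called `disab_sample` (a hypothetical name) holding only the rounded 2003 and 2022 averages from the printout above. In your report you would start from the full `disab` data frame instead. The caption text is also just an illustration.

```r
library(dplyr)
library(kableExtra)

# Hypothetical stand-in for `disab`: rounded values copied from the printout above
disab_sample <- tibble::tribble(
  ~Year, ~Jurisdiction, ~Disability_status, ~`Average scale score`,
  2022, "National", "Identified as students with disabilities", 243,
  2022, "National", "Not identified as students with disabilities", 279,
  2003, "National", "Identified as students with disabilities", 242,
  2003, "National", "Not identified as students with disabilities", 282
)

# Sort from oldest to newest; the second key keeps the two groups
# in the same order within each year
disab_sorted <- disab_sample |>
  arrange(Year, Disability_status)

# digits = 0 rounds the numeric columns to whole numbers in the rendered table
disab_sorted |>
  kbl(digits = 0, caption = "NAEP Grade 8 math, by disability status") |>
  kable_styling(full_width = FALSE)
```

Sorting before calling `kbl()` keeps the data step separate from the formatting step, which makes it easier to check the row order before you style the table.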
Produce a plot of average achievement in 8th grade mathematics with two trend lines – one for students with disabilities and one for students not identified as disabled, adding horizontal lines indicating the cut scores for Basic and Proficient (not Advanced) on this plot.
disab |>
  group_by(Disability_status) |>
  ggplot(aes(x = Year, y = `Average scale score`)) +
  geom_point(aes(shape = Disability_status)) +
  geom_line(aes(color = Disability_status)) +
  labs(
    title = "Average Scores by Disability Status",
    x = "Year",
    y = "Average Score"
  ) +
  scale_color_colorblind() +
  geom_hline(yintercept = 262, linetype = "dashed", color = "red") +
  geom_hline(yintercept = 299, linetype = "dashed", color = "blue") +
  annotate("text", x = 2022, y = 262, label = "Basic", vjust = -1, hjust = +1) +
  annotate("text", x = 2022, y = 299, label = "Proficient", vjust = -1, hjust = +1)
Task 4 - Add Columns
Add columns to the data frame for the difference in Standard Deviations. To align with our previous work, we will use the 2003 standard deviation for the Nation as our baseline.
# Average scale score for the Nation in 2003 is 277.56
Avg_Nation_2003 <- 277.56

# Standard deviation for the Nation in 2003 is 36.23
SD_Nation_2003 <- 36.23

disab <- disab |>
  mutate(
    diff_scale_score = `Average scale score` - Avg_Nation_2003,
    # Divide by the 2003 national SD so every year shares the same baseline
    diff_SD = diff_scale_score / SD_Nation_2003
  )
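As a quick check on the arithmetic, the sketch below works through the 2022 values by hand: each average minus the 2003 national average, divided by the 2003 national standard deviation. It assumes the rounded 2022 averages shown in the printed tibble (243 and 279); the sheet itself holds more decimal places, so your results may differ slightly. The names `score_sd_2022` and `score_not_sd_2022` are made up for this illustration.

```r
# Baseline values stated in the text above
Avg_Nation_2003 <- 277.56
SD_Nation_2003 <- 36.23

# Rounded 2022 averages from the printed tibble (assumption: the sheet has more decimals)
score_sd_2022 <- 243      # identified as students with disabilities
score_not_sd_2022 <- 279  # not identified as students with disabilities

# Difference from the 2003 national average, in 2003 standard deviation units
diff_sd_2022 <- (score_sd_2022 - Avg_Nation_2003) / SD_Nation_2003
diff_not_sd_2022 <- (score_not_sd_2022 - Avg_Nation_2003) / SD_Nation_2003

round(c(diff_sd_2022, diff_not_sd_2022), 2)
# about -0.95 and 0.04
```

Read this way, the 2022 average for students with disabilities sits a little under one 2003 standard deviation below the 2003 national average, while the average for students not identified as having disabilities sits just above it.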
Task 5 - A table comparing to 2003
Produce a nice looking table showing the year, jurisdiction, the difference in scale score compared to our baseline of the National Average Scale Score in 2003, and the difference between each data point and our baseline of the National Average Scale Score in 2003 in standard deviation units.
Using the table that you produced in Task 5, write a paragraph describing the score gaps between students with disabilities and those not identified as disabled in the Nation over time.
Task 7 - A plot in standard deviations
Produce a plot of the difference in standard deviations between each group (students with disabilities and students not identified as having disabilities) and the 2003 National average over time. (Your plot should be similar to Figure 5.5 in Measuring Up.)
disab |>
  ggplot(aes(x = Year, y = diff_SD)) +
  geom_point(aes(shape = Disability_status)) +
  geom_line(aes(color = Disability_status)) +
  labs(
    title = "Differences in Standard Deviations, by Disability Status",
    x = "Year",
    y = "Differences in Standard Deviations"
  ) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  scale_color_colorblind()
Task 8 - A paragraph about differences in SD
Write a paragraph explaining the plot you produced in Task 7. What does it show about the differences in standard deviations for students, by disability status?
Task 9 - A table showing the gap
Following the directions in R for Data Science, Chapter 5.4, produce a “wide” table of average scale scores, and then mutate a column showing the gap between students not identified as students with disabilities and students identified as students with disabilities.
# Pivot wider, following https://r4ds.hadley.nz/data-tidy.html#widening-data
disab_wide <- disab |>
  select(Year, Disability_status, `Average scale score`) |>
  pivot_wider(names_from = Disability_status, values_from = `Average scale score`) |>
  mutate(gap = `Not identified as students with disabilities` - `Identified as students with disabilities`)

# display disab_wide
disab_wide
# A tibble: 10 × 4
Year `Identified as students with disabilities` Not identified as st…¹ gap
<dbl> <dbl> <dbl> <dbl>
1 2022 243. 279. 36.3
2 2019 247. 287. 39.9
3 2017 247. 288. 40.8
4 2015 247. 287. 40.1
5 2013 249. 289. 40.3
6 2011 250. 288. 38.0
7 2009 249. 287. 37.5
8 2007 246. 285. 38.2
9 2005 245. 283. 38.1
10 2003 242. 282. 39.4
# ℹ abbreviated name: ¹`Not identified as students with disabilities`
Produce a table for the data in chronological order, set to zero decimal places, using the kableExtra package.
Task 11: A paragraph about the gap
Write a paragraph explaining the table you produced in Task 10. What does it show about the gap in average scale scores for students, by disability status? Did it improve or worsen over time? Was the gap in 2019 (right before the pandemic) larger or smaller than in 2003? What happened between 2019 and 2022?
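To ground your paragraph in numbers, you might compute how the gap changed between the years the questions ask about. The sketch below is one minimal way to do that, assuming the rounded `gap` values printed in the wide table above (39.4 for 2003, 39.9 for 2019, 36.3 for 2022). The vector `gap` and the two `change_` names are hypothetical; with the real data you would pull these values from `disab_wide`.

```r
# Gaps in scale-score points, copied from the `gap` column printed above
gap <- c(`2003` = 39.4, `2019` = 39.9, `2022` = 36.3)

# Positive change = the gap widened; negative change = the gap narrowed
change_2003_to_2019 <- gap[["2019"]] - gap[["2003"]]
change_2019_to_2022 <- gap[["2022"]] - gap[["2019"]]

round(c(change_2003_to_2019, change_2019_to_2022), 1)
# 0.5 and -3.6
```

When you interpret a narrowing gap, look back at the average scores for both groups as well, since a gap can shrink when both averages fall by different amounts.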