Final Project Part 3 Fall 2024

Purpose of the Data-Analytic Memo

Here you will show off the skills that you’ve acquired in DAMs 3 & 5: using public data to better understand educational achievement in the USA. Again, we will use the National Assessment of Educational Progress (NAEP) Data Explorer at nationsreportcard.gov/ndecore/landing, and once again we will focus on 8th grade math.

This final DAM focuses on students with disabilities (SD) and those not identified as students with disabilities in the Nation.

This DAM is a solo project, and each student in our class is responsible for producing their own report.

You are welcome to use GitHub Copilot and ChatGPT, or to ask questions on websites like Stack Overflow, during your analysis, but much of your analysis can be completed by following the sample code in this document.

For consistency, conduct your analysis focused on 2003 through 2022.

Spreadsheet for you to analyze

Here is the link to the Google Sheet for Part 3 of your Final Project. Hoffman curated it for you by downloading the data from the NAEP Data Explorer as Excel spreadsheets and converting them to Google Sheets:

https://docs.google.com/spreadsheets/d/14BWIEEdWtz1vWF0ZxTRGXYx-abRhpUm6VpqYXoZMFU8/edit?usp=sharing

A note about Students with disabilities (SD)

https://www.nationsreportcard.gov/mathematics/nation/groups/?grade=8

A student with a disability may need specially designed instruction to meet his or her learning goals. A student with a disability will usually have an Individualized Education Plan (IEP), which guides his or her special education instruction. Students with disabilities are often referred to as special education students and may be classified by their school as learning disabled (LD) or emotionally disturbed (ED). The goal of NAEP is that students who are capable of participating meaningfully in the assessment are assessed, but some students with disabilities selected by NAEP may not be able to participate, even with the accommodations provided.

Figures from Measuring Up

As before, you will produce plots similar to those that Professor Koretz showed in Measuring Up.

In Chapter 5, several plots show trends in NAEP scores since the start of the test now termed the “Long-term trend” test, as well as since the first record of what is now the “main” NAEP. Both tests show substantial gains in mathematics scores, measured as “Gain in standard deviations”, which you are now well-equipped to understand.

Task 1 – Organize and Download Data

  • Open an RStudio Project someplace where you will be able to find it.

  • Recommended: Open up an R script for testing your code and saving your work for future reference.

  • Start a new Quarto Document and save it with your name on it. This is the document you will submit by December 16, 2024.

  • Recommended: Copy the following code that Hoffman used to download the data and paste it into your R script used to keep track of useful code.

library(tidyverse)
library(googlesheets4)
library(kableExtra)

library(ggthemes) # for a colorblind-safe color palette

# Follow the instructions from R for Data Science, Chapter 20
# https://r4ds.hadley.nz/spreadsheets#google-sheets 

# deauthorize the googlesheets4 package
gs4_deauth()

# identify a sheet by its ID
Disability_Status_url <- "https://docs.google.com/spreadsheets/d/14BWIEEdWtz1vWF0ZxTRGXYx-abRhpUm6VpqYXoZMFU8/edit?usp=sharing"

# Load the national data by reading in the sheet range that starts at row 9
disab <- read_sheet(Disability_Status_url, range = "A9:E29")
✔ Reading from "NAEP_Grade 8_Disability_status".
✔ Range 'A9:E29'.
# Simplify the 3rd column by renaming it to "Disability_status"

disab <- disab |>
  rename(Disability_status = `Disability status of student, including those with 504 plan`)

# View the data
disab
# A tibble: 20 × 5
    Year Jurisdiction Disability_status                    `Average scale score`
   <dbl> <chr>        <chr>                                                <dbl>
 1  2022 National     Identified as students with disabil…                  243.
 2  2022 National     Not identified as students with dis…                  279.
 3  2019 National     Identified as students with disabil…                  247.
 4  2019 National     Not identified as students with dis…                  287.
 5  2017 National     Identified as students with disabil…                  247.
 6  2017 National     Not identified as students with dis…                  288.
 7  2015 National     Identified as students with disabil…                  247.
 8  2015 National     Not identified as students with dis…                  287.
 9  2013 National     Identified as students with disabil…                  249.
10  2013 National     Not identified as students with dis…                  289.
11  2011 National     Identified as students with disabil…                  250.
12  2011 National     Not identified as students with dis…                  288.
13  2009 National     Identified as students with disabil…                  249.
14  2009 National     Not identified as students with dis…                  287.
15  2007 National     Identified as students with disabil…                  246.
16  2007 National     Not identified as students with dis…                  285.
17  2005 National     Identified as students with disabil…                  245.
18  2005 National     Not identified as students with dis…                  283.
19  2003 National     Identified as students with disabil…                  242.
20  2003 National     Not identified as students with dis…                  282.
# ℹ 1 more variable: `Standard deviation` <dbl>
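The printout above truncates the Disability_status labels and hides the Standard deviation column. A minimal sketch of two ways to see every column follows. The name disab_preview is a hypothetical stand-in that holds only the two 2022 rows, with values rounded as they appear in this document; in your own work you would call the same functions on disab.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for `disab`: the two 2022 rows, rounded as shown in this document
disab_preview <- tribble(
  ~Year, ~Jurisdiction, ~Disability_status, ~`Average scale score`, ~`Standard deviation`,
  2022, "National", "Identified as students with disabilities", 243, 36,
  2022, "National", "Not identified as students with disabilities", 279, 37
)

# Print every column at full width instead of truncating to the console width
print(disab_preview, width = Inf)

# One line per column, showing its type and first few values
glimpse(disab_preview)
```

Either call is a quick way to confirm that all five columns arrived before you start building tables and plots.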

Task 2 – A nice looking table

Produce a table for the data in chronological order, set to zero decimal places, using the kableExtra package.

disab |>
  # the pipe already supplies disab, so arrange() only needs the sort columns
  arrange(Year, Disability_status) |>
  kable(digits = 0) |>
  kable_styling()
Year  Jurisdiction  Disability_status                             Average scale score  Standard deviation
2003  National      Identified as students with disabilities      242                  36
2003  National      Not identified as students with disabilities  282                  34
2005  National      Identified as students with disabilities      245                  37
2005  National      Not identified as students with disabilities  283                  34
2007  National      Identified as students with disabilities      246                  38
2007  National      Not identified as students with disabilities  285                  34
2009  National      Identified as students with disabilities      249                  38
2009  National      Not identified as students with disabilities  287                  34
2011  National      Identified as students with disabilities      250                  37
2011  National      Not identified as students with disabilities  288                  34
2013  National      Identified as students with disabilities      249                  37
2013  National      Not identified as students with disabilities  289                  34
2015  National      Identified as students with disabilities      247                  36
2015  National      Not identified as students with disabilities  287                  34
2017  National      Identified as students with disabilities      247                  36
2017  National      Not identified as students with disabilities  288                  36
2019  National      Identified as students with disabilities      247                  38
2019  National      Not identified as students with disabilities  287                  37
2022  National      Identified as students with disabilities      243                  36
2022  National      Not identified as students with disabilities  279                  37

Task 3 – A Plot of Average Scores

Produce a plot of average achievement in 8th grade mathematics with two trend lines – one for students with disabilities and one for students not identified as disabled, adding horizontal lines indicating the cut scores for Basic and Proficient (not Advanced) on this plot.

disab |>
  group_by(Disability_status) |>
  ggplot(aes(x = Year, y = `Average scale score`)) +
  geom_point(aes(shape = Disability_status)) +
  geom_line(aes(color = Disability_status)) +
  labs(title = "Average Scores by Disability Status",
       x = "Year",
       y = "Average Score") +
  scale_color_colorblind() +
  geom_hline(yintercept = 262, linetype = "dashed", color = "red") +
  geom_hline(yintercept = 299, linetype = "dashed", color = "blue") +
  annotate("text", x= 2022, y = 262, label = "Basic", vjust = -1 , hjust = +1) +
  annotate("text", x = 2022, y = 299, label = "Proficient", vjust = -1 , hjust = +1) 

Task 4 - Add Columns

Add two columns to the data frame: the difference in scale score from the baseline, and that difference expressed in standard deviation units. To align with our previous work, we will use the 2003 standard deviation for the Nation as our baseline.

# Average scale score for the Nation in 2003 is 277.56
Avg_Nation_2003 <- 277.56
# Standard deviation for the Nation in 2003 is 36.23
SD_Nation_2003 <- 36.23
disab <- disab |>
  mutate(diff_scale_score = `Average scale score` - Avg_Nation_2003,
         diff_SD = diff_scale_score / `Standard deviation`)
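Note that the code above divides by each row's own `Standard deviation` column, so SD_Nation_2003 is defined but never used. If you want every row scaled by the single 2003 national standard deviation, as the task text describes, a minimal sketch might look like the following. The names disab_2003, diff_SD_own, and diff_SD_baseline are hypothetical, and the two stand-in rows use the rounded 2003 values from the Task 2 table, so the results will differ slightly from those computed on the unrounded sheet.

```r
library(tibble)
library(dplyr)

Avg_Nation_2003 <- 277.56
SD_Nation_2003 <- 36.23

# Hypothetical stand-in for `disab`: the two 2003 rows, rounded as in the Task 2 table
disab_2003 <- tribble(
  ~Year, ~Disability_status, ~`Average scale score`, ~`Standard deviation`,
  2003, "Identified as students with disabilities", 242, 36,
  2003, "Not identified as students with disabilities", 282, 34
)

disab_2003 <- disab_2003 |>
  mutate(
    diff_scale_score = `Average scale score` - Avg_Nation_2003,
    # scaled by each row's own standard deviation (what the code above does)
    diff_SD_own = diff_scale_score / `Standard deviation`,
    # scaled by the one 2003 national standard deviation (what the task text describes)
    diff_SD_baseline = diff_scale_score / SD_Nation_2003
  )

disab_2003
```

Using one fixed baseline keeps the unit of measurement the same for every year and both groups, which makes the lines in a trend plot directly comparable.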

Task 5 - A table comparing to 2003

Produce a nice looking table showing the year, jurisdiction, the difference in scale score compared to our baseline of the National Average Scale Score in 2003, and that same difference expressed in standard deviation units.

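disab |>
  arrange(Year, Disability_status) |>
  # keep Jurisdiction and Disability_status so each row can be identified
  select(Year, Jurisdiction, Disability_status, diff_scale_score, diff_SD) |>
  kable(digits = 2) |>
  kable_styling()
Year  Jurisdiction  Disability_status                             diff_scale_score  diff_SD
2003  National      Identified as students with disabilities      -35.28            -0.97
2003  National      Not identified as students with disabilities  4.09              0.12
2005  National      Identified as students with disabilities      -33.05            -0.89
2005  National      Not identified as students with disabilities  5.03              0.15
2007  National      Identified as students with disabilities      -31.14            -0.82
2007  National      Not identified as students with disabilities  7.09              0.21
2009  National      Identified as students with disabilities      -28.53            -0.76
2009  National      Not identified as students with disabilities  9.01              0.26
2011  National      Identified as students with disabilities      -27.87            -0.75
2011  National      Not identified as students with disabilities  10.10             0.30
2013  National      Identified as students with disabilities      -28.66            -0.78
2013  National      Not identified as students with disabilities  11.60             0.34
2015  National      Identified as students with disabilities      -30.71            -0.85
2015  National      Not identified as students with disabilities  9.43              0.27
2017  National      Identified as students with disabilities      -30.43            -0.83
2017  National      Not identified as students with disabilities  10.33             0.28
2019  National      Identified as students with disabilities      -30.21            -0.80
2019  National      Not identified as students with disabilities  9.65              0.26
2022  National      Identified as students with disabilities      -34.57            -0.95
2022  National      Not identified as students with disabilities  1.73              0.05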

Task 6 - A paragraph

Using the table that you produced in Task 5, write a paragraph describing the score gaps between students with disabilities and those not identified as disabled in the Nation over time.

Task 7 - A plot in standard deviations

Produce a plot of the difference, in standard deviation units, between each group's average score and the 2003 National baseline over time. (Your plot should be similar to Figure 5.5 in Measuring Up.)

disab |>
  ggplot(aes(x = Year, y = diff_SD)) +
  geom_point(aes(shape = Disability_status)) +
  geom_line(aes(color = Disability_status)) +
  labs(title = "Differences in Standard Deviations, by Disability Status",
       x = "Year",
       y = "Differences in Standard Deviations") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  scale_color_colorblind()

Task 8 - A paragraph about differences in SD

Write a paragraph explaining the plot you produced in Task 7. What does it show about the differences in standard deviations for students, by disability status?

Task 9 - A table showing the gap

Following the directions in R for Data Science, Chapter 5.4, produce a “wide” table of average scale scores, and then mutate a column showing the gap between students not identified as students with disabilities and students identified as students with disabilities.

# Pivot wider, following https://r4ds.hadley.nz/data-tidy.html#widening-data 
disab_wide <- disab |>
  select(Year, Disability_status, `Average scale score`) |> 
  pivot_wider(names_from = Disability_status, values_from = `Average scale score`) |>
  mutate(gap = `Not identified as students with disabilities` - `Identified as students with disabilities`)

# display disab_wide
disab_wide
# A tibble: 10 × 4
    Year `Identified as students with disabilities` Not identified as st…¹   gap
   <dbl>                                      <dbl>                  <dbl> <dbl>
 1  2022                                       243.                   279.  36.3
 2  2019                                       247.                   287.  39.9
 3  2017                                       247.                   288.  40.8
 4  2015                                       247.                   287.  40.1
 5  2013                                       249.                   289.  40.3
 6  2011                                       250.                   288.  38.0
 7  2009                                       249.                   287.  37.5
 8  2007                                       246.                   285.  38.2
 9  2005                                       245.                   283.  38.1
10  2003                                       242.                   282.  39.4
# ℹ abbreviated name: ¹​`Not identified as students with disabilities`

Task 10: Produce a table using kableExtra

Produce a table for the wide data in chronological order, set to zero decimal places, using the kableExtra package.

disab_wide |>
  # the pipe already supplies disab_wide, and Year is unique, so sort by Year alone
  arrange(Year) |>
  kable(digits = 0) |>
  kable_styling()
Year  Identified as students with disabilities  Not identified as students with disabilities  gap
2003  242                                       282                                           39
2005  245                                       283                                           38
2007  246                                       285                                           38
2009  249                                       287                                           38
2011  250                                       288                                           38
2013  249                                       289                                           40
2015  247                                       287                                           40
2017  247                                       288                                           41
2019  247                                       287                                           40
2022  243                                       279                                           36

Task 11: A paragraph about the gap

Write a paragraph explaining the table you produced in Task 10. What does it show about the gap in average scale scores for students, by disability status? Did it improve or worsen over time? Was the gap in 2019 (right before the pandemic) larger or smaller than in 2003? What happened between 2019 and 2022?
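If you would like numbers to back up your paragraph, one possible approach is to compute how the gap changed between the years the questions mention. This is only a sketch: gap_checkpoints and gap_changes are hypothetical names, and the three gap values are copied from the Task 9 printout rather than read from the sheet. In your own work you could apply the same mutate() to the full disab_wide.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for `disab_wide`: three gap values copied from the Task 9 printout
gap_checkpoints <- tribble(
  ~Year, ~gap,
  2003, 39.4,
  2019, 39.9,
  2022, 36.3
)

gap_changes <- gap_checkpoints |>
  arrange(Year) |>
  mutate(
    # change relative to the earliest year in the table
    change_from_2003 = gap - first(gap),
    # change relative to the row just above (NA for the first row)
    change_from_previous_row = gap - dplyr::lag(gap)
  )

gap_changes
```

A positive change means the gap widened and a negative change means it narrowed. Remember that a narrower gap can come from either group's scores moving, so check the average scale scores in the Task 10 table before describing the change as an improvement.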