Final Project Part 3 Fall 2024

Purpose of the Data-Analytic Memo

Here you will show off the skills that you’ve acquired in DAMs 3 & 5: using public data to better understand educational achievement in the USA. Again, we will use the National Assessment of Educational Progress (NAEP) Data Explorer at nationsreportcard.gov/ndecore/landing, and once again we will focus on 8th grade math.

This final DAM focuses on students with disabilities (SD) and those not identified as students with disabilities in the Nation.

This DAM is a solo project, and each student in our class is responsible for producing their own report.

You are welcome to use GitHub Copilot and ChatGPT, or to ask questions on websites like Stack Overflow, during your analysis, but much of your analysis can be completed by following the sample code in this document.

For consistency, conduct your analysis focused on 2003 through 2022.

Spreadsheet for you to analyze

Here is the link to the Google Sheet for Part 3 of your Final Project. Hoffman curated it for you by downloading the data from the NAEP Data Explorer as Excel spreadsheets and converting them to Google Sheets:

https://docs.google.com/spreadsheets/d/14BWIEEdWtz1vWF0ZxTRGXYx-abRhpUm6VpqYXoZMFU8/edit?usp=sharing

A note about Students with disabilities (SD)

https://www.nationsreportcard.gov/mathematics/nation/groups/?grade=8

A student with a disability may need specially designed instruction to meet his or her learning goals. A student with a disability will usually have an Individualized Education Plan (IEP), which guides his or her special education instruction. Students with disabilities are often referred to as special education students and may be classified by their school as learning disabled (LD) or emotionally disturbed (ED). The goal of NAEP is that students who are capable of participating meaningfully in the assessment are assessed, but some students with disabilities selected by NAEP may not be able to participate, even with the accommodations provided.

Figures from Measuring Up

As before, you will produce plots similar to those that Professor Koretz showed in Measuring Up.

In Chapter 5, several plots show trends in NAEP scores since the start of the test now termed the “Long-term trend” test, as well as since the first record of what is now the “main” NAEP. Both tests show substantial gains in mathematics scores, measured as “Gain in standard deviations” – which you are now well-equipped to understand.

Task 1 – Organize and Download Data

Open an RStudio Project someplace where you will be able to find it.
Recommended: Open up an R script for testing your code, saving your work for future reference
Start a new Quarto Document and save it with your name on it. This is the document you will submit by December 16, 2024.
Recommended: Copy the following code that Hoffman used to download the data and paste it into your R script used to keep track of useful code.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(googlesheets4)
library(kableExtra)


Attaching package: 'kableExtra'

The following object is masked from 'package:dplyr':

    group_rows

library(ggthemes) # for a colorblind safe color palette

# Follow the instructions from R for Data Science, Chapter 20
# https://r4ds.hadley.nz/spreadsheets#google-sheets 

# deauthorize the googlesheets4 package
gs4_deauth()

# identify the sheet by its URL
Disability_Status_url <- "https://docs.google.com/spreadsheets/d/14BWIEEdWtz1vWF0ZxTRGXYx-abRhpUm6VpqYXoZMFU8/edit?usp=sharing"

# Load the national data by reading cells A9:E29 of the sheet (the column names are in row 9)
disab <- read_sheet(Disability_Status_url, range = "A9:E29")

✔ Reading from "NAEP_Grade 8_Disability_status".
✔ Range 'A9:E29'.

# Simplify the 3rd column by renaming it to "Disability_status"

disab <- disab |>
  rename(Disability_status = `Disability status of student, including those with 504 plan`)

# View the data
disab

# A tibble: 20 × 5
    Year Jurisdiction Disability_status                    `Average scale score`
   <dbl> <chr>        <chr>                                                <dbl>
 1  2022 National     Identified as students with disabil…                  243.
 2  2022 National     Not identified as students with dis…                  279.
 3  2019 National     Identified as students with disabil…                  247.
 4  2019 National     Not identified as students with dis…                  287.
 5  2017 National     Identified as students with disabil…                  247.
 6  2017 National     Not identified as students with dis…                  288.
 7  2015 National     Identified as students with disabil…                  247.
 8  2015 National     Not identified as students with dis…                  287.
 9  2013 National     Identified as students with disabil…                  249.
10  2013 National     Not identified as students with dis…                  289.
11  2011 National     Identified as students with disabil…                  250.
12  2011 National     Not identified as students with dis…                  288.
13  2009 National     Identified as students with disabil…                  249.
14  2009 National     Not identified as students with dis…                  287.
15  2007 National     Identified as students with disabil…                  246.
16  2007 National     Not identified as students with dis…                  285.
17  2005 National     Identified as students with disabil…                  245.
18  2005 National     Not identified as students with dis…                  283.
19  2003 National     Identified as students with disabil…                  242.
20  2003 National     Not identified as students with dis…                  282.
# ℹ 1 more variable: `Standard deviation` <dbl>

Task 2 – A nice looking table

Produce a table for the data in chronological order, set to zero decimal places, using the kableExtra package.

disab |>
  arrange(Year, Disability_status) |>
  kable(digits = 0) |>
  kable_styling()

Year	Jurisdiction	Disability_status	Average scale score	Standard deviation
2003	National	Identified as students with disabilities	242	36
2003	National	Not identified as students with disabilities	282	34
2005	National	Identified as students with disabilities	245	37
2005	National	Not identified as students with disabilities	283	34
2007	National	Identified as students with disabilities	246	38
2007	National	Not identified as students with disabilities	285	34
2009	National	Identified as students with disabilities	249	38
2009	National	Not identified as students with disabilities	287	34
2011	National	Identified as students with disabilities	250	37
2011	National	Not identified as students with disabilities	288	34
2013	National	Identified as students with disabilities	249	37
2013	National	Not identified as students with disabilities	289	34
2015	National	Identified as students with disabilities	247	36
2015	National	Not identified as students with disabilities	287	34
2017	National	Identified as students with disabilities	247	36
2017	National	Not identified as students with disabilities	288	36
2019	National	Identified as students with disabilities	247	38
2019	National	Not identified as students with disabilities	287	37
2022	National	Identified as students with disabilities	243	36
2022	National	Not identified as students with disabilities	279	37

Task 3 – A Plot of Average Scores

Produce a plot of average achievement in 8th grade mathematics with two trend lines – one for students with disabilities and one for students not identified as disabled, adding horizontal lines indicating the cut scores for Basic and Proficient (not Advanced) on this plot.

disab |>
  group_by(Disability_status) |>
  ggplot(aes(x = Year, y = `Average scale score`)) +
  geom_point(aes(shape = Disability_status)) +
  geom_line(aes(color = Disability_status)) +
  labs(title = "Average Scores by Disability Status",
       x = "Year",
       y = "Average Score") +
  scale_color_colorblind() +
  geom_hline(yintercept = 262, linetype = "dashed", color = "red") +
  geom_hline(yintercept = 299, linetype = "dashed", color = "blue") +
  annotate("text", x= 2022, y = 262, label = "Basic", vjust = -1 , hjust = +1) +
  annotate("text", x = 2022, y = 299, label = "Proficient", vjust = -1 , hjust = +1)
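
To read the plot against the cut scores, it can help to check a few numbers directly. The following is a small base R sketch, offered as an illustration rather than a required step: it compares the rounded 2022 averages from the Task 2 table (243 and 279) with the Basic (262) and Proficient (299) cut scores used in the plot. The names basic_cut, proficient_cut, avg_2022, and level are invented for this example, and the labels describe where a group average falls, not the achievement level of any individual student.

```r
# Cut scores used for the horizontal lines in the plot above.
basic_cut <- 262
proficient_cut <- 299

# Rounded 2022 averages copied from the Task 2 table (hypothetical vector name).
avg_2022 <- c(identified = 243, not_identified = 279)

# Intervals are closed on the left, so a score of exactly 262 counts as Basic.
level <- cut(avg_2022,
             breaks = c(-Inf, basic_cut, proficient_cut, Inf),
             labels = c("Below Basic", "At or above Basic", "At or above Proficient"),
             right = FALSE)

print(data.frame(group = names(avg_2022), average = avg_2022,
                 level = level, row.names = NULL))
```

The same comparison can be repeated for any other year by swapping in that year's two averages.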

Task 4 – Add Columns

Add two columns to the data frame: the difference in scale score from our baseline, and that difference expressed in standard deviation units. To align with our previous work, we will use the Nation in 2003 (its average scale score and its standard deviation) as our baseline.

# Average scale score for the Nation in 2003 is 277.56
Avg_Nation_2003 <- 277.56
# Standard deviation for the Nation in 2003 is 36.23
SD_Nation_2003 <- 36.23
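# Note: diff_SD below divides by each row's own `Standard deviation`;
# divide by SD_Nation_2003 instead to use the 2003 national SD as the baseline.
disab <- disab |>
  mutate(diff_scale_score = `Average scale score` - Avg_Nation_2003,
         diff_SD = diff_scale_score / `Standard deviation`)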
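
The baseline arithmetic can be checked by hand. The sketch below is a minimal base R illustration, not part of the required code: it takes the two 2003 differences in scale score that appear in the Task 5 table (-35.28 and 4.09) and divides them by the 2003 national standard deviation of 36.23. The names diff_2003 and diff_SD_2003 are made up for this example.

```r
# Values copied from this memo; diff_2003 is a hypothetical name for illustration.
SD_Nation_2003 <- 36.23
diff_2003 <- c(identified = -35.28, not_identified = 4.09)

# Express each difference in units of the 2003 national standard deviation.
diff_SD_2003 <- round(diff_2003 / SD_Nation_2003, 2)
print(diff_SD_2003)
# identified: -0.97, not_identified: 0.11
```

Dividing by each group's own standard deviation, as the mutate() call above does, gives slightly different values (for example, 0.12 rather than 0.11 for the second row), so it is worth stating in your memo which divisor you used.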

Task 5 – A table comparing to 2003

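Produce a nice looking table showing the year, the disability status, the difference in scale score between each data point and our baseline (the National Average Scale Score in 2003), and that same difference in standard deviation units.

disab |>
  arrange(Year, Disability_status) |>
  select(Year, Disability_status, diff_scale_score, diff_SD) |>
  kable(digits = 2) |>
  kable_styling()

Year	Disability_status	diff_scale_score	diff_SD
2003	Identified as students with disabilities	-35.28	-0.97
2003	Not identified as students with disabilities	4.09	0.12
2005	Identified as students with disabilities	-33.05	-0.89
2005	Not identified as students with disabilities	5.03	0.15
2007	Identified as students with disabilities	-31.14	-0.82
2007	Not identified as students with disabilities	7.09	0.21
2009	Identified as students with disabilities	-28.53	-0.76
2009	Not identified as students with disabilities	9.01	0.26
2011	Identified as students with disabilities	-27.87	-0.75
2011	Not identified as students with disabilities	10.10	0.30
2013	Identified as students with disabilities	-28.66	-0.78
2013	Not identified as students with disabilities	11.60	0.34
2015	Identified as students with disabilities	-30.71	-0.85
2015	Not identified as students with disabilities	9.43	0.27
2017	Identified as students with disabilities	-30.43	-0.83
2017	Not identified as students with disabilities	10.33	0.28
2019	Identified as students with disabilities	-30.21	-0.80
2019	Not identified as students with disabilities	9.65	0.26
2022	Identified as students with disabilities	-34.57	-0.95
2022	Not identified as students with disabilities	1.73	0.05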

Task 6 – A paragraph

Using the table that you produced in Task 5, write a paragraph describing the score gaps between students with disabilities and those not identified as disabled in the Nation over time.

Task 7 – A plot in standard deviations

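Produce a plot of the difference, in standard deviations, between each group's average score and the 2003 National average over time, with one line for students with disabilities and one for students not identified as students with disabilities. (Your plot should be similar to Figure 5.5 in Measuring Up.)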

disab |>
  ggplot(aes(x = Year, y = diff_SD)) +
  geom_point(aes(shape = Disability_status)) +
  geom_line(aes(color = Disability_status)) +
  labs(title = "Differences in Standard Deviations, by Disability Status",
       x = "Year",
       y = "Differences in Standard Deviations") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  scale_color_colorblind()

Task 8 – A paragraph about differences in SD

Write a paragraph explaining the plot you produced in Task 7. What does it show about the differences in standard deviations for students, by disability status?

Task 9 – A table showing the gap

Following the directions in R for Data Science, Chapter 5.4, produce a “wide” table of average scale scores, and then mutate a column showing the gap between students not identified as students with disabilities and students identified as students with disabilities.

# Pivot wider, following https://r4ds.hadley.nz/data-tidy.html#widening-data 
disab_wide <- disab |>
  select(Year, Disability_status, `Average scale score`) |> 
  pivot_wider(names_from = Disability_status, values_from = `Average scale score`) |>
  mutate(gap = `Not identified as students with disabilities` - `Identified as students with disabilities`)

# display disab_wide
disab_wide

# A tibble: 10 × 4
    Year `Identified as students with disabilities` Not identified as st…¹   gap
   <dbl>                                      <dbl>                  <dbl> <dbl>
 1  2022                                       243.                   279.  36.3
 2  2019                                       247.                   287.  39.9
 3  2017                                       247.                   288.  40.8
 4  2015                                       247.                   287.  40.1
 5  2013                                       249.                   289.  40.3
 6  2011                                       250.                   288.  38.0
 7  2009                                       249.                   287.  37.5
 8  2007                                       246.                   285.  38.2
 9  2005                                       245.                   283.  38.1
10  2003                                       242.                   282.  39.4
# ℹ abbreviated name: ¹`Not identified as students with disabilities`
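
One way to see how the gap moved from one assessment to the next is to difference it. The sketch below uses base R only and retypes the gap column printed above in chronological order, so the vectors year, gap, and gap_change are stand-ins for this example rather than objects created by the sample code.

```r
# Gap values copied from the printed disab_wide tibble, reordered 2003 -> 2022.
year <- c(2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017, 2019, 2022)
gap  <- c(39.4, 38.1, 38.2, 37.5, 38.0, 40.3, 40.1, 40.8, 39.9, 36.3)

# Change in the gap since the previous assessment (NA for the first year).
gap_change <- c(NA, round(diff(gap), 1))

print(data.frame(year, gap, gap_change))
```

A negative gap_change means the gap narrowed relative to the previous assessment year. A narrower gap does not by itself mean that either group's scores rose, so read it alongside the averages.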

Task 10 – A table using kableExtra

Produce a table of the wide data in chronological order, set to zero decimal places, using the kableExtra package.

disab_wide |>
  arrange(Year) |>
  kable(digits = 0) |>
  kable_styling()

Year	Identified as students with disabilities	Not identified as students with disabilities	gap
2003	242	282	39
2005	245	283	38
2007	246	285	38
2009	249	287	38
2011	250	288	38
2013	249	289	40
2015	247	287	40
2017	247	288	41
2019	247	287	40
2022	243	279	36

Task 11 – A paragraph about the gap

Write a paragraph explaining the table you produced in Task 10. What does it show about the gap in average scale scores for students, by disability status? Did the gap improve or worsen over time? Was the gap in 2019 (right before the pandemic) larger or smaller than in 2003? What happened between 2019 and 2022?