DAM5b Assignment Fall 2024

Purpose of the Data-Analytic Memo

Here we continue where we left off in DAM5a, using public data to better understand educational achievement in the USA. Again, you are asked to produce a report for your home state: Connecticut, Kentucky, Massachusetts, New Hampshire, New Jersey, New York, South Carolina, or Vermont.

This analysis will explore the differences in achievement in 8th grade math between students who are eligible for the National School Lunch Program (low income students) and those who are not eligible for NSLP (not low income students).

This DAM is a solo project, and each student in our class is responsible for producing their own report.

Hoffman will again provide links to the data for your state in this document. He will also show sample code for Wisconsin, where he was a middle school principal more than 20 years ago.

You are welcome to use GitHub Copilot and ChatGPT, or to ask questions on websites like Stack Overflow, during your analysis, but much of your analysis can be completed by following the sample code in this document.

National School Lunch Program (NSLP)

According to the National Center for Education Statistics, the National School Lunch Program (NSLP) is a federally assisted meal program that provides low-cost or free lunches to eligible students. It is sometimes referred to as the free/reduced-price lunch program. Free lunches are offered to those students whose family incomes are at or below 130 percent of the poverty level; reduced-price lunches are offered to those students whose family incomes are between 130 percent and 185 percent of the poverty level.

https://nces.ed.gov/nationsreportcard/glossary.aspx#national_school_lunch_program
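
The income thresholds quoted above can be written as a small classification rule. This is only an illustrative sketch: `nslp_category()` is a hypothetical helper, not something NCES or this assignment provides, and it assumes that a family at exactly 185 percent of the poverty level counts as reduced-price. The "Eligible" group in the NAEP data presumably combines the free and reduced-price categories.

```r
# Hypothetical helper: classify a household by its income expressed as a
# percent of the poverty level, using the 130 and 185 percent thresholds.
# Assumption: exactly 185 percent is treated as reduced-price.
nslp_category <- function(income_pct_poverty) {
  cut(
    income_pct_poverty,
    breaks = c(-Inf, 130, 185, Inf),
    labels = c("Free", "Reduced-price", "Not eligible"),
    right = TRUE  # intervals: (-Inf, 130], (130, 185], (185, Inf)
  )
}

nslp_category(c(100, 130, 150, 200))
```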

State Snapshot Report

Here is Wisconsin’s State Snapshot Report for 8th Grade Mathematics:

I assume that you’ve already downloaded a copy of your state’s report for DAM5a.

Here again are links to the states in question:

Connecticut: https://nces.ed.gov/nationsreportcard/subject/publications/stt2022/pdf/2023011CT8.pdf

Kentucky: https://nces.ed.gov/nationsreportcard/subject/publications/stt2022/pdf/2023011KY8.pdf

Massachusetts: https://nces.ed.gov/nationsreportcard/subject/publications/stt2022/pdf/2023011MA8.pdf

New Hampshire: https://nces.ed.gov/nationsreportcard/subject/publications/stt2022/pdf/2023011NH8.pdf

New Jersey: https://nces.ed.gov/nationsreportcard/subject/publications/stt2022/pdf/2023011NJ8.pdf

New York: https://nces.ed.gov/nationsreportcard/subject/publications/stt2022/pdf/2023011NY8.pdf

South Carolina: https://nces.ed.gov/nationsreportcard/subject/publications/stt2022/pdf/2023011SC8.pdf

Vermont: https://nces.ed.gov/nationsreportcard/subject/publications/stt2022/pdf/2023011VT8.pdf

Wisconsin: https://nces.ed.gov/nationsreportcard/subject/publications/stt2022/pdf/2023011WI8.pdf

Additionally, here is a link to the National Snapshot for 8th Grade Mathematics: https://nces.ed.gov/nationsreportcard/subject/publications/stt2022/pdf/2023011NP8.pdf

Spreadsheet for your state

Here are new links to the Google Sheets for your state. Hoffman curated them by downloading the data from the NAEP Data Explorer as Excel spreadsheets and converting them to Google Sheets:

Connecticut:

https://docs.google.com/spreadsheets/d/138Q8EJiQJC9V0DzQdkdKhaZe66nUUFagEY1RssrW7OA/edit?usp=sharing

Kentucky:

https://docs.google.com/spreadsheets/d/1-qA_Oml2Ccv5fUDOyPVkdBtOkh3dXgolKrfnQvNf26U/edit?usp=sharing

Massachusetts:

https://docs.google.com/spreadsheets/d/1OjDrqiAxvoEl4arRCdBQ5kE3bMGbnwrGUepqIZCtafE/edit?usp=sharing

New Hampshire:

https://docs.google.com/spreadsheets/d/1Is1q3mIZvFzJ0ncFkFedhAhaqStQxnZ9gN-6Q_1DM8Y/edit?usp=sharing

New Jersey:

https://docs.google.com/spreadsheets/d/1sg4YrKq1_PcWSmLb1f6PQBLzrF9dpVZ8_jXUSzCnSjQ/edit?usp=sharing

New York:

https://docs.google.com/spreadsheets/d/1srAnIlbpFffzL73gf23XamlPf5CKlGQuc5BUdS8erb8/edit?usp=sharing

South Carolina:

https://docs.google.com/spreadsheets/d/1n-KZe0H-eLmnB0OQPoleNzPcYNBPVwUB-_N6k2CD-6c/edit?usp=sharing

Vermont:

https://docs.google.com/spreadsheets/d/18wbcH9ronfrbZVsxB5adHvB61jezojXZPj_lQIQMP5E/edit?usp=sharing

Wisconsin:

https://docs.google.com/spreadsheets/d/1XcwnldyOtfy6jVrlWh6_CsOHA8NIrwB8aIuEuLHth6s/edit?usp=sharing

Task 1 – Organize and Download Data

  • Open a new RStudio Project someplace where you will be able to find it. (Hoffman created a project called DAM5_Fall2024, saved in a desktop folder.)

  • Recommended: Open an R script for testing your code and saving your work for future reference.

  • Start a new Quarto Document and save it with your name on it. This is the document you will submit by Wednesday, November 20.

  • Recommended: Copy the following code that Hoffman used to download the data for Wisconsin and paste it into your R script used to keep track of useful code. You will need to modify it for your state.

library(tidyverse)
library(googlesheets4)
library(kableExtra)

library(ggthemes) # for a colorblind-safe color palette

# Follow the instructions from R for Data Science, Chapter 20
# https://r4ds.hadley.nz/spreadsheets#google-sheets 

# deauthorize the googlesheets4 package
gs4_deauth()

# identify the sheet by its URL
Wisconsin_NSLP <- "https://docs.google.com/spreadsheets/d/1XcwnldyOtfy6jVrlWh6_CsOHA8NIrwB8aIuEuLHth6s/edit?usp=sharing"

# Load the state data by reading the sheet from row 9 through row 39
wisc_nslp <- read_sheet(Wisconsin_NSLP, range = "A9:E39")
✔ Reading from "Wisconsin_NSLP_Math8".
✔ Range 'A9:E39'.
# View the data
wisc_nslp
# A tibble: 30 × 5
    Year Jurisdiction National School Lunch Program elig…¹ `Average scale score`
   <dbl> <chr>        <chr>                                <list>               
 1  2022 Wisconsin    Eligible                             <dbl [1]>            
 2  2022 Wisconsin    Not eligible                         <dbl [1]>            
 3  2022 Wisconsin    Information not available            <chr [1]>            
 4  2019 Wisconsin    Eligible                             <dbl [1]>            
 5  2019 Wisconsin    Not eligible                         <dbl [1]>            
 6  2019 Wisconsin    Information not available            <chr [1]>            
 7  2017 Wisconsin    Eligible                             <dbl [1]>            
 8  2017 Wisconsin    Not eligible                         <dbl [1]>            
 9  2017 Wisconsin    Information not available            <dbl [1]>            
10  2015 Wisconsin    Eligible                             <dbl [1]>            
# ℹ 20 more rows
# ℹ abbreviated name:
#   ¹​`National School Lunch Program eligibility, 3 categories`
# ℹ 1 more variable: `Standard deviation` <list>
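
To adapt the download step for another state, one option is to wrap it in a small function. The sketch below is not part of Hoffman's sample code: `read_state_nslp()` is a hypothetical helper, and it assumes that every state's sheet uses the same layout as Wisconsin's, so that the range `A9:E39` still applies. Open your own sheet and confirm the rows before relying on that default.

```r
# Hypothetical helper (not part of the original sample code).
# Assumption: each state's sheet has the same layout as Wisconsin's,
# so the same cell range works. Check your sheet first.
read_state_nslp <- function(sheet_url, range = "A9:E39") {
  googlesheets4::gs4_deauth()  # the sheets are public, so no login is needed
  googlesheets4::read_sheet(sheet_url, range = range)
}

# Example call (requires an internet connection), using the Vermont link above:
# vt_nslp <- read_state_nslp(
#   "https://docs.google.com/spreadsheets/d/18wbcH9ronfrbZVsxB5adHvB61jezojXZPj_lQIQMP5E/edit?usp=sharing"
# )
```

Calling the functions with the `googlesheets4::` prefix keeps the helper usable even if the package has not been attached with `library()`.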

New issues to deal with

If you have examined the Google sheet for your state, you will notice that there are some issues that need to be addressed. For example, the data for the National School Lunch Program eligibility is not in a format that is easy to work with. You will need to rename the column and filter out rows where the data is not available. You will also need to convert the data in the columns for Average scale score and Standard deviation into numeric variables.

# Rename `National School Lunch Program eligibility, 3 categories` to NSLP

wisc_nslp <- wisc_nslp |>
  rename(NSLP = `National School Lunch Program eligibility, 3 categories`)

# Keep only rows where the NSLP is not "Information not available"

wisc_nslp <- wisc_nslp |>
  filter(NSLP != "Information not available")

# Mutate `Average scale score` and `Standard deviation` into numeric variables

wisc_nslp <- wisc_nslp |>
  mutate(`Average scale score` = as.numeric(`Average scale score`)) |>
  mutate(`Standard deviation` = as.numeric(`Standard deviation`))

wisc_nslp
# A tibble: 20 × 5
    Year Jurisdiction NSLP         `Average scale score` `Standard deviation`
   <dbl> <chr>        <chr>                        <dbl>                <dbl>
 1  2022 Wisconsin    Eligible                      263.                 35.9
 2  2022 Wisconsin    Not eligible                  293.                 35.3
 3  2019 Wisconsin    Eligible                      269.                 36.8
 4  2019 Wisconsin    Not eligible                  301.                 34.9
 5  2017 Wisconsin    Eligible                      268.                 35.2
 6  2017 Wisconsin    Not eligible                  300.                 33.7
 7  2015 Wisconsin    Eligible                      269.                 34.0
 8  2015 Wisconsin    Not eligible                  300.                 31.9
 9  2013 Wisconsin    Eligible                      274.                 34.6
10  2013 Wisconsin    Not eligible                  299.                 32.0
11  2011 Wisconsin    Eligible                      269.                 34.1
12  2011 Wisconsin    Not eligible                  299.                 30.7
13  2009 Wisconsin    Eligible                      269.                 34.2
14  2009 Wisconsin    Not eligible                  297.                 30.4
15  2007 Wisconsin    Eligible                      266.                 34.3
16  2007 Wisconsin    Not eligible                  293.                 32.2
17  2005 Wisconsin    Eligible                      263.                 34.1
18  2005 Wisconsin    Not eligible                  292.                 31.1
19  2003 Wisconsin    Eligible                      259.                 34.5
20  2003 Wisconsin    Not eligible                  292.                 31.0
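
The `<list>` columns in the first printout are what `read_sheet()` returns when a column mixes numbers and text. The toy example below, with invented values and a made-up text cell, sketches the same situation on a small scale. It uses `purrr::map_dbl()` rather than the `as.numeric()` call in the sample code. That is a swapped-in alternative, not the method used above, chosen because it stops with an error if any cell does not hold exactly one number.

```r
library(tibble)
library(dplyr)
library(purrr)

# Toy data, invented for illustration: one cell holds text instead of a
# number, so the whole column has to be stored as a list.
toy <- tibble(
  NSLP  = c("Eligible", "Not eligible", "Information not available"),
  score = list(263.2, 293.1, "not reported")
)

toy_clean <- toy |>
  filter(NSLP != "Information not available") |>  # drop the row with the text cell
  mutate(score = map_dbl(score, as.numeric))      # one number per remaining cell

toy_clean
```

Filtering first matters here: once the text cell is gone, every remaining cell is a single number and the conversion is safe.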

Write a paragraph explaining what this code does and why it is necessary.

Task 2 – A nice looking table

Produce a table for the data in chronological order, set to zero decimal places, using the kableExtra package.

wisc_nslp |>
  arrange(Year) |>
  kable(digits = 0) |>
  kable_styling()
Year  Jurisdiction  NSLP          Average scale score  Standard deviation
2003  Wisconsin     Eligible                      259                  34
2003  Wisconsin     Not eligible                  292                  31
2005  Wisconsin     Eligible                      263                  34
2005  Wisconsin     Not eligible                  292                  31
2007  Wisconsin     Eligible                      266                  34
2007  Wisconsin     Not eligible                  293                  32
2009  Wisconsin     Eligible                      269                  34
2009  Wisconsin     Not eligible                  297                  30
2011  Wisconsin     Eligible                      269                  34
2011  Wisconsin     Not eligible                  299                  31
2013  Wisconsin     Eligible                      274                  35
2013  Wisconsin     Not eligible                  299                  32
2015  Wisconsin     Eligible                      269                  34
2015  Wisconsin     Not eligible                  300                  32
2017  Wisconsin     Eligible                      268                  35
2017  Wisconsin     Not eligible                  300                  34
2019  Wisconsin     Eligible                      269                  37
2019  Wisconsin     Not eligible                  301                  35
2022  Wisconsin     Eligible                      263                  36
2022  Wisconsin     Not eligible                  293                  35

Task 3 – A Plot of Average Scores

Produce a plot of average achievement in 8th grade mathematics with two trend lines: one for Wisconsin students eligible for NSLP and one for those not eligible. Add horizontal lines indicating the cut scores for Basic and Proficient (not Advanced) on this plot.

wisc_nslp |>
  group_by(NSLP) |>
  ggplot(aes(x = Year, y = `Average scale score`)) +
  geom_point(aes(shape = NSLP)) +
  geom_line(aes(color = NSLP)) +
  labs(title = "Average Scores for Wisconsin, by lunch eligibility" ,
       x = "Year",
       y = "Average Score") +
  scale_color_colorblind() +
  geom_hline(yintercept = 262, linetype = "dashed", color = "red") +
  geom_hline(yintercept = 299, linetype = "dashed", color = "blue") +
  annotate("text", x= 2022, y = 262, label = "Basic", vjust = -1 , hjust = +1) +
  annotate("text", x = 2022, y = 299, label = "Proficient", vjust = -1 , hjust = +1) 
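
As a numeric companion to the plot, the sketch below labels a few averages by where they fall relative to the two cut scores drawn as horizontal lines (262 and 299). The scores are the rounded Wisconsin values from the Task 2 table. The names `scores`, `basic_cut`, `proficient_cut`, and `band` are made up for this illustration.

```r
library(tibble)
library(dplyr)

# Rounded Wisconsin averages copied from the Task 2 table
scores <- tibble(
  Year = c(2019, 2019, 2022, 2022),
  NSLP = c("Eligible", "Not eligible", "Eligible", "Not eligible"),
  avg  = c(269, 301, 263, 293)
)

basic_cut      <- 262  # same value as the first geom_hline()
proficient_cut <- 299  # same value as the second geom_hline()

scores_banded <- scores |>
  mutate(band = case_when(
    avg >= proficient_cut ~ "At or above Proficient",
    avg >= basic_cut      ~ "Basic",
    TRUE                  ~ "Below Basic"
  ))

scores_banded
```

These bands describe group averages, not individual students: an average in the Basic range does not mean every student in the group scored at the Basic level.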

Task 4 - Add Columns

Add columns to the data frame for the difference in Standard Deviations for Wisconsin students, by eligibility for NSLP. To align with our previous work, we will use the 2003 standard deviation for the Nation as our baseline.

# Add columns to the data frame for the difference in scale scores and in standard deviations
# for Wisconsin students, by eligibility for NSLP.
# To align with our previous work, we will use the 2003 standard deviation for the Nation as our baseline.

# Average scale score for the Nation in 2003 is 277.56
Avg_Nation_2003 <- 277.56
# Standard deviation for the Nation in 2003 is 36.23
SD_Nation_2003 <- 36.23
wisc_nslp <- wisc_nslp |>
  mutate(diff_scale_score = `Average scale score` - Avg_Nation_2003,
         # Divide by the 2003 national SD (the stated baseline), not each row's own SD
         diff_SD = diff_scale_score / SD_Nation_2003)
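
As a worked check of the arithmetic, the sketch below applies the baseline described in the task text (the 2003 national mean and standard deviation) to the rounded 2003 Wisconsin averages from the Task 2 table. The object names `wi_2003` and `wi_2003_std` are made up for this illustration. Because the inputs are rounded, the results are approximate: about half a standard deviation below the national baseline for eligible students and about 0.4 above it for students who are not eligible.

```r
library(tibble)
library(dplyr)

Avg_Nation_2003 <- 277.56  # national average scale score in 2003
SD_Nation_2003  <- 36.23   # national standard deviation in 2003

# Rounded 2003 Wisconsin averages copied from the Task 2 table
wi_2003 <- tibble(
  NSLP = c("Eligible", "Not eligible"),
  avg  = c(259, 292)
)

wi_2003_std <- wi_2003 |>
  mutate(diff_SD = (avg - Avg_Nation_2003) / SD_Nation_2003)

wi_2003_std
```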

Task 5 - A plot in standard deviations

Produce a plot of the difference in standard deviations for your state, by eligibility for NSLP. Include an annotation indicating that the baseline shows the average math score for all students in the nation in 2003.

wisc_nslp |>
  ggplot(aes(x = Year, y = diff_SD)) +
  geom_point(aes(shape = NSLP)) +
  geom_line(aes(color = NSLP)) +
  labs(title = "Differences in Standard Deviations for Wisconsin, by NSLP eligibility",
       x = "Year",
       y = "Differences in Standard Deviations") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  scale_color_colorblind() +
  annotate("text", x= 2003, y = 0, label = "Average math score for all students in the nation in 2003", vjust = -1 , hjust = 0)

Task 6 - A paragraph about differences in SD

Write a paragraph explaining the plot you produced in Task 5. What does it show about the differences in standard deviations for students in your state, by eligibility for NSLP?

Task 7 - A table showing the gap

Following the directions in R for Data Science, Chapter 5.4, produce a “wide” table of average scale scores, and then mutate a column showing the gap between students eligible for NSLP and students not eligible.

# Pivot wider, following https://r4ds.hadley.nz/data-tidy.html#widening-data 
wisc_nslp_wide <- wisc_nslp |>
  select(Year, NSLP, `Average scale score`) |> 
  pivot_wider(names_from = NSLP, values_from = `Average scale score`) |>
  mutate(gap = `Not eligible` - `Eligible`)

# Produce a table using kableExtra
wisc_nslp_wide |>
  arrange(Year) |>
  kable(digits = 0) |>
  kable_styling()
Year  Eligible  Not eligible  gap
2003       259           292   33
2005       263           292   29
2007       266           293   28
2009       269           297   28
2011       269           299   30
2013       274           299   25
2015       269           300   31
2017       268           300   32
2019       269           301   32
2022       263           293   29

Task 8 - A paragraph about the gap

Write a paragraph explaining the table you produced in Task 7. What does it show about the gap in average scale scores for students in your state, by eligibility for NSLP? Did it improve or worsen over time? Was the gap in 2019 (right before the pandemic) larger or smaller than in 2003? What happened between 2019 and 2022?
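
The year-to-year comparisons asked for above can be computed rather than read off the table by eye. The sketch below is one possible approach, using the Wisconsin gap values for 2003, 2019, and 2022 copied from the Task 7 table. The names `gaps`, `gap_changes`, and `change` are made up for this illustration.

```r
library(tibble)
library(dplyr)

# Gap values copied from the Wisconsin table in Task 7
gaps <- tibble(
  Year = c(2003, 2019, 2022),
  gap  = c(33, 32, 29)
)

gap_changes <- gaps |>
  arrange(Year) |>
  mutate(change = gap - lag(gap))  # negative values mean the gap narrowed

gap_changes
```

A narrower gap is not automatically good news. Check the average scores for both groups to see whether the gap closed because one group rose or because the other fell.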

Task 9 - A Conclusion

Note the percentage of students eligible for the National School Lunch Program in your state. (In Wisconsin, for example, 38% of students are eligible, which is below the national average of 50%.) Nationally, in 2022, students who were eligible for NSLP had an average score that was 27 points lower than that for students who were not eligible. This national gap was not significantly different from that in 2003 (30 points). https://nces.ed.gov/nationsreportcard/subject/publications/stt2022/pdf/2023011NP8.pdf

Given this national information and the information you’ve produced for your state, write a brief conclusion to your report. Be sure to note the NSLP eligibility rate in your state compared to the nation. What did you learn about the differences in achievement in 8th grade math between students who are eligible for the National School Lunch Program (low income students) and those who are not eligible for NSLP (not low income students) in your state?