Here we will continue where we left off on DAM 5a using public data to better understand educational achievement in the USA. Again, you are asked to produce a report for your home state – Connecticut, Kentucky, Massachusetts, New Hampshire, New Jersey, New York, South Carolina, or Vermont.
This analysis will explore the differences in achievement in 8th grade math between students who are eligible for the National School Lunch Program (low income students) and those who are not eligible for NSLP (not low income students).
This DAM is a solo project, and each student in our class is responsible for producing their own report.
Hoffman will again provide you with links to the data for your state in this document. And he will show sample code for Wisconsin, where he was a Middle School Principal – more than 20 years ago.
You are welcome to use GitHub Copilot and ChatGPT, or to ask questions on websites like Stack Overflow, during your analysis, but much of your analysis can be completed by following the sample code in this document.
National School Lunch Program (NSLP)
According to the National Center for Education Statistics, the National School Lunch Program (NSLP) is a federally assisted meal program that provides low-cost or free lunches to eligible students. It is sometimes referred to as the free/reduced-price lunch program. Free lunches are offered to those students whose family incomes are at or below 130 percent of the poverty level; reduced-price lunches are offered to those students whose family incomes are between 130 percent and 185 percent of the poverty level.
Here are new links to the Google Sheets for your state. Hoffman curated them for you by downloading the data from the NAEP Data Explorer as Excel spreadsheets and converting them to Google Sheets:
Open a new RStudio Project someplace where you will be able to find it. (Hoffman created a project called DAM5_Fall2024, saved in a desktop folder.)
Recommended: Open an R script for testing your code and saving your work for future reference.
Start a new Quarto Document and save it with your name on it. This is the document you will submit by Wednesday, November 20.
Recommended: Copy the following code that Hoffman used to download the data for Wisconsin and paste it into your R script used to keep track of useful code. You will need to modify it for your state.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(googlesheets4)
library(kableExtra)
Attaching package: 'kableExtra'
The following object is masked from 'package:dplyr':
group_rows
library(ggthemes) # for a colorblind safe color palette
# Follow the instructions from R for Data Science, Chapter 20
# https://r4ds.hadley.nz/spreadsheets#google-sheets
# deauthorize the googlesheets4 package
gs4_deauth()
# identify a sheet by its URL
Wisconsin_NSLP <- "https://docs.google.com/spreadsheets/d/1XcwnldyOtfy6jVrlWh6_CsOHA8NIrwB8aIuEuLHth6s/edit?usp=sharing"
# Load the state data by reading in the sheet starting at row 9 and ending at row 39
wisc_nslp <- read_sheet(Wisconsin_NSLP, range = "A9:E39")
✔ Reading from "Wisconsin_NSLP_Math8".
✔ Range 'A9:E39'.
✖ Request failed [429]. Retry 1 happens in 4.7 seconds ...
✖ Request failed [429]. Retry 2 happens in 4.3 seconds ...
✖ Request failed [429]. Retry 3 happens in 2.9 seconds ...
✖ Request failed [429]. Retry 4 happens in 39.3 seconds ...
# View the data
wisc_nslp
# A tibble: 30 × 5
Year Jurisdiction National School Lunch Program elig…¹ `Average scale score`
<dbl> <chr> <chr> <list>
1 2022 Wisconsin Eligible <dbl [1]>
2 2022 Wisconsin Not eligible <dbl [1]>
3 2022 Wisconsin Information not available <chr [1]>
4 2019 Wisconsin Eligible <dbl [1]>
5 2019 Wisconsin Not eligible <dbl [1]>
6 2019 Wisconsin Information not available <chr [1]>
7 2017 Wisconsin Eligible <dbl [1]>
8 2017 Wisconsin Not eligible <dbl [1]>
9 2017 Wisconsin Information not available <dbl [1]>
10 2015 Wisconsin Eligible <dbl [1]>
# ℹ 20 more rows
# ℹ abbreviated name:
# ¹`National School Lunch Program eligibility, 3 categories`
# ℹ 1 more variable: `Standard deviation` <list>
New issues to deal with
If you have examined the Google sheet for your state, you will notice that there are some issues that need to be addressed. For example, the data for the National School Lunch Program eligibility is not in a format that is easy to work with. You will need to rename the column and filter out rows where the data is not available. You will also need to convert the data in the columns for Average scale score and Standard deviation into numeric variables.
# Rename `National School Lunch Program eligibility, 3 categories` to NSLP
wisc_nslp <- wisc_nslp |>
  rename(NSLP = `National School Lunch Program eligibility, 3 categories`)
# Keep only rows where the NSLP is not "Information not available"
wisc_nslp <- wisc_nslp |>
  filter(NSLP != "Information not available")
# Mutate `Average scale score` and `Standard deviation` into numeric variables
wisc_nslp <- wisc_nslp |>
  mutate(`Average scale score` = as.numeric(`Average scale score`)) |>
  mutate(`Standard deviation` = as.numeric(`Standard deviation`))
wisc_nslp
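To see why the filtering and conversion steps matter, here is a minimal sketch on made-up data. It assumes the list-columns arise because a sheet column mixes numbers with a text marker, which matches the `<dbl [1]>` and `<chr [1]>` entries in the tibble printed earlier. The tibble `toy`, its values, and the `"n/a"` marker are hypothetical and are not taken from the Wisconsin sheet.

```r
library(dplyr)

# Toy data (invented for illustration): a list-column mixing numbers and a text marker
toy <- tibble(
  NSLP  = c("Eligible", "Not eligible", "Information not available"),
  score = list(263.4, 292.8, "n/a")
)

toy_clean <- toy |>
  filter(NSLP != "Information not available") |>  # drop the row holding the text marker
  mutate(score = as.numeric(score))               # list of single numbers -> numeric vector

toy_clean
```

Filtering first means every remaining list element is a single number, so `as.numeric()` can flatten the column without producing missing values.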
Produce a plot of average achievement in 8th grade mathematics with two trend lines: one for Wisconsin students eligible for NSLP and one for students not eligible. Add horizontal lines indicating the cut scores for Basic and Proficient (not Advanced) to this plot.
wisc_nslp |>
  group_by(NSLP) |>
  ggplot(aes(x = Year, y = `Average scale score`)) +
  geom_point(aes(shape = NSLP)) +
  geom_line(aes(color = NSLP)) +
  labs(title = "Average Scores for Wisconsin, by lunch eligibility",
       x = "Year",
       y = "Average Score") +
  scale_color_colorblind() +
  # dashed reference lines at the Basic (262) and Proficient (299) cut scores
  geom_hline(yintercept = 262, linetype = "dashed", color = "red") +
  geom_hline(yintercept = 299, linetype = "dashed", color = "blue") +
  annotate("text", x = 2022, y = 262, label = "Basic", vjust = -1, hjust = +1) +
  annotate("text", x = 2022, y = 299, label = "Proficient", vjust = -1, hjust = +1)
Task 4 - Add Columns
Add columns to the data frame for the difference in Standard Deviations for Wisconsin students, by eligibility for NSLP. To align with our previous work, we will use the 2003 standard deviation for the Nation as our baseline.
# Add columns to the data frame for the difference in scale scores and in Standard Deviations
# for Wisconsin students, by eligibility for NSLP
# To align with our previous work, we will use the 2003 standard deviation for the Nation as our baseline.
# Average scale score for the Nation in 2003 is 277.56
Avg_Nation_2003 <- 277.56
# Standard deviation for the Nation in 2003 is 36.23
SD_Nation_2003 <- 36.23
wisc_nslp <- wisc_nslp |>
  mutate(diff_scale_score = `Average scale score` - Avg_Nation_2003,
         # divide by the national 2003 SD (the stated baseline), not each row's own SD
         diff_SD = diff_scale_score / SD_Nation_2003)
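As a quick check on the arithmetic, the sketch below applies the baseline described in this task (the 2003 national average and standard deviation) to two rounded Wisconsin values. The helper name `to_sd_units` is hypothetical, and the inputs 259 and 292 are the rounded 2003 averages shown in the Task 7 table, so the results are approximate.

```r
# National 2003 baseline values quoted in this document
Avg_Nation_2003 <- 277.56
SD_Nation_2003 <- 36.23

# Hypothetical helper: distance from the 2003 national average, in national SD units
to_sd_units <- function(score) {
  (score - Avg_Nation_2003) / SD_Nation_2003
}

# Rounded 2003 Wisconsin averages: Eligible = 259, Not eligible = 292
sd_2003 <- to_sd_units(c(Eligible = 259, `Not eligible` = 292))
sd_2003
```

With these rounded inputs, eligible students come out roughly half a standard deviation below the 2003 national average and non-eligible students roughly 0.4 above it.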
Task 5 - A plot in standard deviations
Produce a plot of the difference in standard deviations for your state, by eligibility for NSLP. Include an annotation indicating that the baseline shows the average math score for all students in the nation in 2003.
wisc_nslp |>
  ggplot(aes(x = Year, y = diff_SD)) +
  geom_point(aes(shape = NSLP)) +
  geom_line(aes(color = NSLP)) +
  labs(title = "Differences in Standard Deviations for Wisconsin, by NSLP eligibility",
       x = "Year",
       y = "Differences in Standard Deviations") +
  # dashed baseline at 0 = the 2003 national average
  geom_hline(yintercept = 0, linetype = "dashed") +
  scale_color_colorblind() +
  annotate("text", x = 2003, y = 0, label = "Average math score for all students in the nation in 2003", vjust = -1, hjust = 0)
Task 6 - A paragraph about differences in SD
Write a paragraph explaining the plot you produced in Task 5. What does it show about the differences in standard deviations for students in your state, by eligibility for NSLP?
Task 7 - A table showing the gap
Following the directions in R for Data Science, Chapter 5.4, produce a “wide” table of average scale scores, and then mutate a column showing the gap between students eligible for NSLP and students not eligible.
# Pivot wider, following https://r4ds.hadley.nz/data-tidy.html#widening-data
wisc_nslp_wide <- wisc_nslp |>
  select(Year, NSLP, `Average scale score`) |>
  pivot_wider(names_from = NSLP, values_from = `Average scale score`) |>
  mutate(gap = `Not eligible` - `Eligible`)
# Produce a table using kableExtra, sorted by year
wisc_nslp_wide |>
  arrange(Year) |>
  kable(digits = 0) |>
  kable_styling()
Year  Eligible  Not eligible  gap
2003       259           292   33
2005       263           292   29
2007       266           293   28
2009       269           297   28
2011       269           299   30
2013       274           299   25
2015       269           300   31
2017       268           300   32
2019       269           301   32
2022       263           293   29
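The widening step can be tried on a small scale before running it on a full state data set. This is a minimal sketch: the tibble `toy_long` is a hypothetical stand-in that holds only the rounded 2003 and 2019 Wisconsin averages from the table above, and the column name `score` is an assumption used in place of `Average scale score`.

```r
library(dplyr)
library(tidyr)

# Rounded Wisconsin averages for two years, copied from the table above
toy_long <- tibble(
  Year  = c(2003, 2003, 2019, 2019),
  NSLP  = c("Eligible", "Not eligible", "Eligible", "Not eligible"),
  score = c(259, 292, 269, 301)
)

toy_wide <- toy_long |>
  pivot_wider(names_from = NSLP, values_from = score) |>  # one column per NSLP group
  mutate(gap = `Not eligible` - Eligible) |>              # positive gap = not eligible scored higher
  arrange(Year)

toy_wide
```

Each year becomes a single row with one column per eligibility group, which is what makes the `gap` subtraction possible.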
Task 8 - A paragraph about the gap
Write a paragraph explaining the table you produced in Task 7. What does it show about the gap in average scale scores for students in your state, by eligibility for NSLP? Did it improve or worsen over time? Was the gap in 2019 (right before the pandemic) larger or smaller than in 2003? What happened between 2019 and 2022?
Task 9 - A Conclusion
Note the percentage of students eligible for the National School Lunch Program in your state. (In Wisconsin, for example, 38% of students are eligible, which is below the national average of 50%.) Nationally, in 2022, students who were eligible for NSLP had an average score that was 27 points lower than that for students who were not eligible. This national gap was not significantly different from that in 2003 (30 points). Source: https://nces.ed.gov/nationsreportcard/subject/publications/stt2022/pdf/2023011NP8.pdf
Given this national information and the information you’ve produced for your state, write a brief conclusion to your report. Be sure to note the NSLP eligibility rate in your state compared to the nation. What did you learn about the differences in achievement in 8th grade math between students who are eligible for the National School Lunch Program (low income students) and those who are not eligible for NSLP (not low income students) in your state?