One of my goals for this course is to give you an opportunity to use public data to better understand educational achievement in the USA. To that end, we’ve looked at many of the websites describing the National Assessment of Educational Progress, including “A Technical History of NAEP” and the Reading and Mathematics Frameworks for NAEP.
The front-facing website “The Nation’s Report Card” https://www.nationsreportcard.gov links any interested researcher to NAEP data that provide estimates of what American students know and are able to do in Mathematics and Reading, as well as Arts, Civics, Economics, Geography, Science, Technology, U.S. History, and Writing.
The Nation’s Report Card
On October 24, 2022, the National Center for Education Statistics (NCES) released scores on the “main” National Assessment of Educational Progress for 4th and 8th grade students in Math and Reading (https://www.nationsreportcard.gov/). Many within the educational community are concerned about the reported declines in test scores nationally, particularly in Mathematics.
As part of the goals for this course, this assignment is designed to hone your expertise and refine your understanding of what these public data can tell us about educational achievement in the USA.
Constructing useful plots
The headline on October 24, 2022 screamed “Largest score declines in NAEP mathematics at grades 4 and 8 since initial assessments in 1990” along with two plots showing declines in average math scores for 4th and 8th grade American students between 2019 and 2022, declines associated with the COVID pandemic and the resulting school shutdowns. https://www.nationsreportcard.gov/highlights/mathematics/2022/
In this DAM, you will first of all replicate plots like these – producing similar plots using data pulled from the NAEP website.
Figures from Measuring Up
Additionally, you will produce plots similar to those that Professor Koretz showed in Measuring Up.
In Chapter 5, several plots show trends in NAEP scores since the start of the test now termed the “Long-term trend” test, as well as since the first record of what is now the “main” NAEP. Both tests show substantial gains in mathematics scores, measured as “Gain in standard deviations”, which you are now well-equipped to understand.
And in Chapter 11, Professor Koretz used “simulated data to explain the relationship between the selectivity of admissions and adverse impact” (p. 267).
Cut scores identifying “Basic”, “Proficient”, and “Advanced” levels of performance on the Mathematics NAEP function similarly, and we will also produce plots to explore how these cut scores affect public understanding of achievement levels of American K12 students.
Focus on 8th grade math
Here you will conduct analyses in response to the October 24, 2022 release of trend scores in math for 8th grade students and the substantial gap between students eligible to participate in the National School Lunch Program and those who do not receive this government help.
Remember that this assignment is all about collaborative learning. It is an opportunity to try out your skills as a coder and your understanding of educational testing. Please ensure that you engage in a full, fair, and mutually-agreeable collaboration with your partners. Do not simply divide the work. Discuss and plan your analyses together; debate what you have found with each other; collaborate on the writing.
You are welcome to employ ChatGPT or ask questions on websites like Stack Overflow during your analysis. Cite them appropriately.
Task 1
Download an Excel spreadsheet from the NAEP Data Explorer at nationsreportcard.gov/ndecore/landing. Select Mathematics, Grade 8, National (not public nor private).
Select the years 2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017, 2019, and 2022
For “Variable” select “All students”
For “Statistic”, choose “Average scale scores” and “Standard deviations”
“Create Report”
Export Data Options “XLS” and “Export”
Read this spreadsheet into R
Display this data table in your report
Sample packages to load. Note that “kableExtra” is new and will need to be installed (one time) on your own computer.
## LOAD PACKAGES
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.3 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.3 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
library(kableExtra)
Attaching package: 'kableExtra'
The following object is masked from 'package:dplyr':
group_rows
# NOTE:
# You likely will need to install kableExtra
# install.packages("kableExtra")
Sample code for Task 1
NOTE: The names of your Excel file will differ from this sample code. Be sure to use the correct file name!
# Read cells A9 through E19 from the Excel file downloaded for Task 1
data1 <- read_excel("NDECoreExcel_Mathematics, Grade 8, All students_20231106185824.Xls", range = "A9:E19")
data1
# A tibble: 10 × 5
Year Jurisdiction `All students` `Average scale score` `Standard deviation`
<dbl> <chr> <chr> <dbl> <dbl>
1 2022 National All students 274. 38.9
2 2019 National All students 282. 39.7
3 2017 National All students 283. 38.8
4 2015 National All students 282. 36.9
5 2013 National All students 285. 36.5
6 2011 National All students 284. 36.2
7 2009 National All students 283. 36.4
8 2007 National All students 281. 36.1
9 2005 National All students 279. 36.3
10 2003 National All students 278. 36.2
Task 2
Use the package called “kableExtra” to produce a table for the data from task 1 that you’ve provided; however, produce this table so that the data is shared in chronological order (from 2003 to 2022, rather than the reverse)
Essentially, we’re looking for a nicer table, formatted by year from 2003 through 2022, with zero decimal places.
# Task 2
data1 |>
  arrange(Year) |>      # arrange by year (data1 is already supplied by the pipe)
  kable(digits = 0) |>  # set to zero decimal places
  kable_styling()
Year  Jurisdiction  All students  Average scale score  Standard deviation
2003  National      All students  278                  36
2005  National      All students  279                  36
2007  National      All students  281                  36
2009  National      All students  283                  36
2011  National      All students  284                  36
2013  National      All students  285                  37
2015  National      All students  282                  37
2017  National      All students  283                  39
2019  National      All students  282                  40
2022  National      All students  274                  39
Task 3
Produce a plot of average achievement in 8th grade mathematics on the main NAEP test similar to what was produced on the NAEP website.
# Task 3
ggplot(data = data1) +
  geom_line(mapping = aes(x = Year, y = `Average scale score`)) +
  labs(
    title = "Trend in Grade 8 Mathematics Average Scores",
    subtitle = "Main NAEP: 2003 - 2022"
  )
Task 4
Produce a similar plot adding horizontal lines indicating the cut scores for Basic, Proficient, and Advanced on this plot.
Include the usual nice things that a good analyst would include, like titles, subtitles, appropriate axis titles, and perhaps a legend.
And write a short narrative explaining how this plot might improve the readability and interpretability of this data.
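As a starting point, here is a minimal sketch of one way to add the cut-score lines. It is not the only approach. The tibbles `scores` and `cuts` are illustrative names; the scores are the rounded values from the Task 1 table (use your own `data1` instead), and the cut scores of 262, 299, and 333 are assumed to be the Grade 8 mathematics achievement levels, so please verify them on the NAEP website.

```r
library(tidyverse)

# Rounded values copied from the Task 1 table; use your own data1 instead.
scores <- tibble(
  Year = c(2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017, 2019, 2022),
  `Average scale score` = c(278, 279, 281, 283, 284, 285, 282, 283, 282, 274)
)

# Assumed Grade 8 mathematics cut scores; verify against the NAEP website.
cuts <- tibble(
  Level = factor(c("Basic", "Proficient", "Advanced"),
                 levels = c("Advanced", "Proficient", "Basic")),
  Cut = c(262, 299, 333)
)

p4 <- ggplot(scores, aes(x = Year, y = `Average scale score`)) +
  geom_line() +
  geom_point() +
  # One dashed horizontal line per achievement level, colored for the legend
  geom_hline(data = cuts, aes(yintercept = Cut, color = Level),
             linetype = "dashed") +
  labs(
    title = "Trend in Grade 8 Mathematics Average Scores",
    subtitle = "Main NAEP: 2003 - 2022, with achievement-level cut scores",
    x = "Assessment year",
    y = "Average scale score",
    color = "Cut score"
  )
p4
```

Mapping the level to color is what produces the legend; drawing the lines with a fixed `yintercept` would work too, but the lines would then need text labels.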
Task 5
Produce another plot, similar to the dotted line for the “Main NAEP” represented on Figure 5.5 from Measuring Up. However, in this plot start the “Gain in standard deviations” from 2003.
Add a column (mutate) that shows the difference in scale scores between the Year in question and 2003. (You should see numbers between -3.3 and about 7.)
Add another column that shows the difference in Math measured in standard deviations since 2003. (This column should be 0.0 in 2003 and should be about -0.085 for 2022.)
Include the usual nice things that a good analyst would include, like titles, subtitles, appropriate axis titles, and perhaps a legend. I also think that using the term “Difference in standard deviations” rather than “Gain in standard deviations” is appropriate, since we witnessed a substantial loss in average scores since 2019.
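The two new columns could be sketched as follows. This is a hedged example, not the required solution. It assumes that the difference is divided by each year's own standard deviation, which is what the target of about -0.085 for 2022 suggests (-3.3 / 38.9 ≈ -0.085). The names `scores`, `gains`, `DiffScore`, and `DiffSD` are illustrative. Because the averages below are the rounded values printed in Task 1, the results will differ slightly from the figures you get with the unrounded spreadsheet data.

```r
library(tidyverse)

# Rounded values copied from the Task 1 output; use your own data1 instead.
scores <- tibble(
  Year = c(2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017, 2019, 2022),
  `Average scale score` = c(278, 279, 281, 283, 284, 285, 282, 283, 282, 274),
  `Standard deviation` = c(36.2, 36.3, 36.1, 36.4, 36.2, 36.5, 36.9, 38.8, 39.7, 38.9)
)

gains <- scores |>
  arrange(Year) |>  # 2003 must come first so that first() is the baseline
  mutate(
    # Difference in scale-score points relative to 2003
    DiffScore = `Average scale score` - first(`Average scale score`),
    # Assumption: express the difference in units of that year's own SD
    DiffSD = DiffScore / `Standard deviation`
  )

p5 <- ggplot(gains, aes(x = Year, y = DiffSD)) +
  geom_hline(yintercept = 0, color = "grey50") +
  geom_line(linetype = "dotted") +
  geom_point() +
  labs(
    title = "Grade 8 Mathematics: Change in Average Score Since 2003",
    subtitle = "Main NAEP, 2003 - 2022",
    x = "Assessment year",
    y = "Difference in standard deviations"
  )
p5
```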
Task 6
Describe the trend since 2003 of 8th grade math scores as portrayed on your distribution of scale scores in math for a nationally representative sample of 8th grade students in the USA. You might find it helpful to review the prose in Measuring Up around page 94.
Task 7
Download a second Excel spreadsheet from the NAEP Data Explorer at nationsreportcard.gov/ndecore/landing.
Select Mathematics, Grade 8, National (not public nor private).
Select the years 2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017, 2019, and 2022
For “Variable” select “National School Lunch Program eligibility, 3 categories”
For “Statistic”, choose “Average scale scores” and “Standard deviations”
“Create Report”
THEN, on the NDE website, click “Show Report Data”, “Edit Table Layout”, move “National School Lunch Program eligibility” from Rows to Columns and APPLY. (It might take a minute to load on your screen.)
Export Data Options “XLS” and “Export”
Read this spreadsheet into R (with a different name like “Task7” or something)
Notice that this time we must fuss with the names of the columns, as the Excel spreadsheet includes a line (probably line 9 on your Excel file) where the data is identified by NSLP eligibility.
Display this data table in your report
Sample code for Task 7
This sample code below provides you with one way to overcome this issue, although there likely are more elegant solutions. (Bonus points available here if you come up with a better approach!)
NOTE: The names of your Excel file will differ from this sample code. Be sure to use the correct file name!
# Read cells A10 through H20 from the Excel file downloaded for Task 7
data7 <- read_excel("NDECoreExcel_Mathematics, Grade 8, National School Lunch Progra_20231106225343.Xls", range = "A10:H20")
# A tibble: 10 × 8
Year Jurisdiction Eligible EligibleSD NotEligible NotEligibleSD NoInfo
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2022 National 260. 35.4 287. 37.6 285.
2 2019 National 266. 36.3 296. 37.3 294.
3 2017 National 267. 35.0 296. 36.7 296.
4 2015 National 268. 34.0 296. 34.2 294.
5 2013 National 270. 34.0 297. 33.6 297.
6 2011 National 269. 33.8 296. 33.6 296.
7 2009 National 266. 34.0 294. 33.7 295.
8 2007 National 265. 33.8 291. 33.5 291.
9 2005 National 262. 34.2 288. 33.6 289.
10 2003 National 259. 34.6 287. 33.0 285.
# ℹ 1 more variable: NoInfoSD <dbl>
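The column names in the output above (Eligible, EligibleSD, and so on) do not come straight from the spreadsheet, so a renaming step is needed after `read_excel()`. One possible way to do it, sketched here on a small stand-in tibble, is to rename by column position. The stand-in `raw7` and its placeholder names `V1` to `V8` are hypothetical; the sketch assumes the columns arrive in the order shown in the output.

```r
library(tidyverse)

# Stand-in for the tibble returned by read_excel(); the names V1 to V8 are
# hypothetical placeholders, and only two years are included for brevity.
raw7 <- tibble(
  V1 = c(2022, 2019), V2 = c("National", "National"),
  V3 = c(260, 266),   V4 = c(35.4, 36.3),
  V5 = c(287, 296),   V6 = c(37.6, 37.3),
  V7 = c(285, 294),   V8 = c(NA_real_, NA_real_)  # NoInfoSD is not shown above
)

# Rename by column position so the result does not depend on the header text
data7 <- raw7 |>
  rename(
    Year = 1, Jurisdiction = 2,
    Eligible = 3, EligibleSD = 4,
    NotEligible = 5, NotEligibleSD = 6,
    NoInfo = 7, NoInfoSD = 8
  )
data7
```

Renaming by position is fragile if the table layout changes, so check the first few rows against the spreadsheet before moving on.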
Task 8
Use the package called “kableExtra” to produce a table for the data from task 7 that you’ve provided; however, produce this table so that the data is shared in chronological order (from 2003 to 2022, rather than the reverse); AND, show ONLY the columns for students who are eligible or not eligible for the National School Lunch Program (leaving off the columns where we have No Information about the sampled students).
Task 9
Produce a similar plot to Task 4 but for both students Eligible for NSLP and students not eligible for NSLP, with horizontal lines indicating the cut scores for Basic, Proficient, and Advanced on this plot. (Do not include students where we have No Information about the sampled students.)
Include the usual nice things that a good analyst would include, like titles, subtitles, appropriate axis titles, and perhaps a legend.
And write a short narrative explaining how this plot might improve the readability and interpretability of this data.
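One way to draw both groups on the same plot is to reshape the data so that each row holds one year and one group. The sketch below is only a suggestion. It uses the rounded averages from the Task 7 output (use your own data instead), the illustrative names `nslp` and `nslp_long`, and the assumed Grade 8 mathematics cut scores of 262, 299, and 333, which you should verify on the NAEP website.

```r
library(tidyverse)

# Rounded values copied from the Task 7 output; use your own data instead.
nslp <- tibble(
  Year = c(2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017, 2019, 2022),
  Eligible = c(259, 262, 265, 266, 269, 270, 268, 267, 266, 260),
  NotEligible = c(287, 288, 291, 294, 296, 297, 296, 296, 296, 287)
)

# Reshape to one row per year and group so each group gets its own line
nslp_long <- nslp |>
  pivot_longer(c(Eligible, NotEligible),
               names_to = "Group", values_to = "Score")

# Assumed cut scores; verify against the NAEP website.
cuts <- c(Basic = 262, Proficient = 299, Advanced = 333)

p9 <- ggplot(nslp_long, aes(x = Year, y = Score, color = Group)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = cuts, linetype = "dashed", color = "grey40") +
  # Label each cut-score line just above its left end
  annotate("text", x = 2003, y = cuts + 2, label = names(cuts),
           hjust = 0, size = 3) +
  labs(
    title = "Grade 8 Mathematics Average Scores by NSLP Eligibility",
    subtitle = "Main NAEP: 2003 - 2022, with achievement-level cut scores",
    x = "Assessment year",
    y = "Average scale score",
    color = "NSLP eligibility"
  )
p9
```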
Task 10
Switching back to your first dataset (perhaps called data1), assume that these scores are distributed in a normal (bell-shaped) manner each year. For the distribution of 8th grade Math scores, produce a plot that shows what this distribution looks like (a bell curve) for Grade 8 Math scores from 2022.
This is a theoretical plot, looking like a perfect bell curve.
Include a title and appropriate axis titles.
Include vertical lines showing the cut scores marking Basic, Proficient, and Advanced.
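A theoretical bell curve needs only a mean and a standard deviation. Here is a minimal sketch using `stat_function()` with `dnorm()`. It assumes the rounded 2022 average (274) and standard deviation (38.9) from Task 1, plus the Grade 8 mathematics cut scores of 262, 299, and 333, which you should verify on the NAEP website. The object names are illustrative.

```r
library(tidyverse)

mean_2022 <- 274   # rounded 2022 average scale score from Task 1
sd_2022 <- 38.9    # 2022 standard deviation from Task 1

# Assumed cut scores; verify against the NAEP website.
cuts <- tibble(
  Level = factor(c("Basic", "Proficient", "Advanced"),
                 levels = c("Basic", "Proficient", "Advanced")),
  Cut = c(262, 299, 333)
)

# Two endpoints are enough: stat_function() fills in the curve between them
score_range <- tibble(Score = mean_2022 + c(-4, 4) * sd_2022)

p10 <- ggplot(score_range, aes(x = Score)) +
  stat_function(fun = dnorm, args = list(mean = mean_2022, sd = sd_2022)) +
  geom_vline(data = cuts, aes(xintercept = Cut, color = Level),
             linetype = "dashed") +
  labs(
    title = "Theoretical Distribution of Grade 8 Mathematics Scores, 2022",
    x = "Scale score",
    y = "Density",
    color = "Cut score"
  )
p10

# Share of this theoretical distribution that falls below the Basic cut score
share_below_basic <- pnorm(262, mean = mean_2022, sd = sd_2022)
share_below_basic
```

The last two lines are optional, but they show how a cut score turns the same curve into a statement about the share of students below a level.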
Task 11
Similarly with your second dataset, assume that the scores for students who are Eligible for NSLP are distributed normally (a bell curve centered around the average score for those Eligible for NSLP), and the scores for students who are Not Eligible for NSLP are distributed normally. Produce a plot showing both distributions for Grade 8 Math scores from 2022 for the two samples.
These are theoretical plots, looking like perfect bell curves.
Include a title and appropriate axis titles.
Include vertical lines showing the cut scores marking Basic, Proficient, and Advanced.
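For two curves on one plot, one option is to compute each group's normal density on a shared grid of scores and then draw the result with `geom_line()`. This is a sketch under assumptions: the means and standard deviations are the rounded 2022 values from the Task 7 output, the cut scores (262, 299, 333) should be verified on the NAEP website, and the names `groups` and `curves` are illustrative.

```r
library(tidyverse)

# Rounded 2022 values from the Task 7 output; use your own data instead.
groups <- tibble(
  Group = c("Eligible", "Not eligible"),
  Mean = c(260, 287),
  SD = c(35.4, 37.6)
)

# Evaluate each group's normal density on a common grid of scale scores
curves <- expand_grid(groups, Score = seq(120, 440, by = 1)) |>
  mutate(Density = dnorm(Score, mean = Mean, sd = SD))

p11 <- ggplot(curves, aes(x = Score, y = Density, color = Group)) +
  geom_line() +
  # Assumed cut scores for Basic, Proficient, and Advanced
  geom_vline(xintercept = c(262, 299, 333), linetype = "dashed",
             color = "grey40") +
  labs(
    title = "Theoretical Distributions of Grade 8 Mathematics Scores, 2022",
    subtitle = "By National School Lunch Program eligibility",
    x = "Scale score",
    y = "Density",
    color = "NSLP eligibility"
  )
p11
```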
Task 12 (bonus points)
Professor Koretz’s plot in Figure 11.1 doesn’t show perfect bell curves. Rather, he simulated the data to show some slight messiness. He also tried to make the samples realistic, with the group of Black students smaller than the group of White students (~30 years ago). Similarly, I ask that the sizes of your Eligible and Not Eligible groups reflect their proportions in the population of 8th grade students in the USA. (See page 267 of Measuring Up for more info about Koretz’s thoughts. See the NAEP website for the sample size of students Eligible for NSLP.)
Produce a plot (two distributions) that mimics these aspects of the population of 8th graders in the USA in 2022.
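Simulated data of this kind can be drawn with `rnorm()`. The sketch below is one possible starting point, not Koretz's actual method. The total sample size and especially `prop_eligible` are hypothetical placeholders: replace the proportion with the share of NSLP-eligible students reported on the NAEP website. The means and standard deviations are the rounded 2022 values from the Task 7 output, and the cut scores (262, 299, 333) should be verified on the NAEP website.

```r
library(tidyverse)

set.seed(2022)           # makes the simulated "messiness" reproducible

n_total <- 10000         # hypothetical number of simulated students
prop_eligible <- 0.5     # placeholder only; replace with the NAEP-reported share
n_eligible <- round(n_total * prop_eligible)
n_not_eligible <- n_total - n_eligible

# Draw scores for each group from a normal distribution
sim <- tibble(
  Group = rep(c("Eligible", "Not eligible"),
              times = c(n_eligible, n_not_eligible)),
  Score = c(rnorm(n_eligible, mean = 260, sd = 35.4),
            rnorm(n_not_eligible, mean = 287, sd = 37.6))
)

# Counts (not densities) on the y axis, so a smaller group looks smaller
p12 <- ggplot(sim, aes(x = Score, fill = Group)) +
  geom_histogram(binwidth = 5, alpha = 0.5, position = "identity") +
  geom_vline(xintercept = c(262, 299, 333), linetype = "dashed") +
  labs(
    title = "Simulated Grade 8 Mathematics Scores, 2022",
    subtitle = "By National School Lunch Program eligibility",
    x = "Scale score",
    y = "Number of simulated students",
    fill = "NSLP eligibility"
  )
p12
```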