DAM4 Assignment Fall2023

Author

Hoffman

Purpose of the Data-Analytic Memo

One of my goals for this course is to give you an opportunity to use public data to better understand educational achievement in the USA. To that end, we’ve looked at many of the websites describing the National Assessment of Educational Progress, including “A Technical History of NAEP” and the Reading and Mathematics Frameworks for NAEP.

The front-facing website “The Nation’s Report Card” https://www.nationsreportcard.gov links any interested researcher to NAEP data that provide estimates of what American students know and are able to do in Mathematics and Reading, as well as Arts, Civics, Economics, Geography, Science, Technology, U.S. History, and Writing.

The Nation’s Report Card

On October 24, 2022, the National Center for Education Statistics (NCES) released scores on the “main” National Assessment of Educational Progress for 4th and 8th grade students in Math and Reading. https://www.nationsreportcard.gov/ Many within the educational community are concerned about the reported declines in test scores nationally, particularly in Mathematics.

As part of the goals for this course, this assignment is designed to hone your expertise and refine your understanding of what these public data can tell us about educational achievement in the USA.

Constructing useful plots

The headline on October 24, 2022 screamed “Largest score declines in NAEP mathematics at grades 4 and 8 since initial assessments in 1990” along with two plots showing declines in average math scores for 4th and 8th grade American students between 2019 and 2022 – declines associated with the COVID pandemic and the resulting school shutdowns. https://www.nationsreportcard.gov/highlights/mathematics/2022/

In this DAM, you will first of all replicate plots like these – producing similar plots using data pulled from the NAEP website.

Figures from Measuring Up

Additionally, you will produce plots similar to those that Professor Koretz showed in Measuring Up.

In Chapter 5, several plots show trends in NAEP scores since the start of the test now termed the “Long-term trend” test, as well as since the first record of what is now the “main” NAEP. Both tests show substantial gains in mathematics scores, measured as “Gain in standard deviations” – which you are now well-equipped to understand.

And in Chapter 11, Professor Koretz used “simulated data to explain the relationship between the selectivity of admissions and adverse impact” (p. 267).

Cut scores identifying “Basic”, “Proficient”, and “Advanced” levels of performance on the Mathematics NAEP function similarly, and we will also produce plots to explore how these cut scores affect public understanding of achievement levels of American K12 students.

Focus on 8th grade math

Here you will conduct analyses in response to the October 24, 2022 release of trend scores in math for 8th grade students and the substantial gap between students eligible to participate in the National School Lunch Program and those who do not receive this government help.

Remember that this assignment is all about collaborative learning. It is an opportunity to try out your skills as a coder and your understanding of educational testing. Please ensure that you engage in a full, fair, and mutually-agreeable collaboration with your partners. Do not simply divide the work. Discuss and plan your analyses together; debate what you have found with each other; collaborate on the writing.

You are welcome to use ChatGPT or to ask questions on websites like Stack Overflow during your analysis. Cite them appropriately.

Task 1

Download an Excel spreadsheet from the NAEP Data Explorer at nationsreportcard.gov/ndecore/landing. Select Mathematics, Grade 8, National (not public nor private).

Select the years 2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017, 2019, and 2022
For “Variable” select “All students”
For “Statistic”, choose “Average scale scores” and “Standard deviations”
“Create Report”
Export Data Options “XLS” and “Export”
Read this spreadsheet into R
Display this data table in your report

Sample packages to load. Note that “kableExtra” is new and will need to be installed (one time) on your own computer.

## LOAD PACKAGES
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(readxl)
library(kableExtra)


Attaching package: 'kableExtra'

The following object is masked from 'package:dplyr':

    group_rows

# NOTE: 
# You likely will need to install kableExtra
# install.packages("kableExtra")

Sample code for Task 1

NOTE: The name of your Excel file will differ from this sample code. Be sure to use the correct file name!

# read in cells A9 through E19 (header row plus ten years of data) from the Task 1 Excel file
data1 <- read_excel("NDECoreExcel_Mathematics, Grade 8, All students_20231106185824.Xls", 
                    range = "A9:E19")

data1

# A tibble: 10 × 5
    Year Jurisdiction `All students` `Average scale score` `Standard deviation`
   <dbl> <chr>        <chr>                          <dbl>                <dbl>
 1  2022 National     All students                    274.                 38.9
 2  2019 National     All students                    282.                 39.7
 3  2017 National     All students                    283.                 38.8
 4  2015 National     All students                    282.                 36.9
 5  2013 National     All students                    285.                 36.5
 6  2011 National     All students                    284.                 36.2
 7  2009 National     All students                    283.                 36.4
 8  2007 National     All students                    281.                 36.1
 9  2005 National     All students                    279.                 36.3
10  2003 National     All students                    278.                 36.2

Task 2

Use the package called “kableExtra” to produce a table for the data from task 1 that you’ve provided; however, produce this table so that the data is shared in chronological order (from 2003 to 2022, rather than the reverse)

Essentially, we’re looking for a nicer table, formatted by year from 2003 through 2022, with zero decimal places.

# Task 2
data1 |>
  arrange(Year) |> # arrange by year (data1 is already supplied by the pipe)
  kable(digits = 0) |> # set to zero decimal places
  kable_styling()

Year	Jurisdiction	All students	Average scale score	Standard deviation
2003	National	All students	278	36
2005	National	All students	279	36
2007	National	All students	281	36
2009	National	All students	283	36
2011	National	All students	284	36
2013	National	All students	285	37
2015	National	All students	282	37
2017	National	All students	283	39
2019	National	All students	282	40
2022	National	All students	274	39

Task 3

Produce a plot of average achievement in 8th grade mathematics on the main NAEP test similar to what was produced on the NAEP website.

# Task 3
ggplot(data = data1) + 
  geom_line(mapping = aes(x = Year, y = `Average scale score`))  +
  labs(
    title = "Trend in Grade 8 Mathematics Average Scores",
    subtitle = "Main NAEP: 2003 - 2022"
  )

Task 4

Produce a similar plot adding horizontal lines indicating the cut scores for Basic, Proficient, and Advanced on this plot.

Include the usual nice things that a good analyst would include, like titles, subtitles, appropriate axis titles, and perhaps a legend.
And write a short narrative explaining how this plot might improve the readability and interpretability of this data.
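A minimal sketch of one way to add the cut-score lines follows. It is an illustration, not a model answer: the tibble `naep8` is typed in from the rounded averages printed under Task 1 (your own `data1` holds more precise values), and the cut scores 262, 299, and 333 are assumed to be the Grade 8 mathematics achievement-level cut scores, so confirm them on the NAEP website before using them.

```r
library(tidyverse)

# Rounded averages copied from the table printed under Task 1
# (assumption: your own data1 carries more decimal places).
naep8 <- tibble(
  Year = c(2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017, 2019, 2022),
  `Average scale score` = c(278, 279, 281, 283, 284, 285, 282, 283, 282, 274)
)

# Assumed Grade 8 mathematics cut scores; verify against the NAEP website.
cuts <- tibble(
  Level = factor(c("Basic", "Proficient", "Advanced"),
                 levels = c("Basic", "Proficient", "Advanced")),
  Score = c(262, 299, 333)
)

p4 <- ggplot(naep8, aes(x = Year, y = `Average scale score`)) +
  geom_line() +
  geom_point() +
  # one dashed horizontal line per achievement level, identified in the legend
  geom_hline(data = cuts,
             aes(yintercept = Score, colour = Level),
             linetype = "dashed") +
  labs(
    title = "Trend in Grade 8 Mathematics Average Scores",
    subtitle = "Main NAEP: 2003 - 2022, with achievement-level cut scores",
    x = "Assessment year",
    y = "Average scale score",
    colour = "Cut score"
  )

p4
```

Mapping the level to `colour` lets ggplot build the legend automatically; labelling the lines directly with `annotate()` would be an equally reasonable choice.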

Task 5

Produce another plot, similar to the dotted line for the “Main NAEP” represented on Figure 5.5 from Measuring Up. However, in this plot start the “Gain in standard deviations” from 2003.

Add a column (mutate) that shows the difference in scale scores between the Year in question and 2003. (You should see numbers between -3.3 and about 7.)
Add another column that shows the difference in Math measured in standard deviations since 2003. (This column should be 0.0 in 2003 and should be about -0.085 for 2022.)
Include the usual nice things that a good analyst would include, like titles, subtitles, appropriate axis titles, and perhaps a legend. I also think that using the term “Difference in standard deviations” rather than “Gain in standard deviations” is appropriate, since we have witnessed a substantial loss in average scores since 2019.
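The two `mutate` steps can be sketched as follows. This is a hedged illustration under stated assumptions: the values are typed in from the rounded table printed under Task 1, so the results will not match the hinted -3.3 and -0.085 exactly, and dividing by each year's own standard deviation is an assumption that appears consistent with the hinted value (using the 2003 standard deviation as a fixed baseline is another defensible choice).

```r
library(tidyverse)

# Rounded values copied from the table printed under Task 1 (assumption).
naep8 <- tibble(
  Year = c(2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017, 2019, 2022),
  `Average scale score` = c(278, 279, 281, 283, 284, 285, 282, 283, 282, 274),
  `Standard deviation` = c(36.2, 36.3, 36.1, 36.4, 36.2, 36.5, 36.9, 38.8, 39.7, 38.9)
)

gain <- naep8 |>
  arrange(Year) |>
  mutate(
    # difference from the 2003 average, in scale-score points
    diff_score = `Average scale score` - `Average scale score`[Year == 2003],
    # the same difference in standard deviation units
    # (assumption: divide by that year's own standard deviation)
    diff_sd = diff_score / `Standard deviation`
  )

p5 <- ggplot(gain, aes(x = Year, y = diff_sd)) +
  geom_hline(yintercept = 0, colour = "grey60") +  # reference line at no change
  geom_line(linetype = "dotted") +
  geom_point() +
  labs(
    title = "Grade 8 Mathematics: Difference from 2003",
    subtitle = "Main NAEP, 2003 - 2022",
    x = "Assessment year",
    y = "Difference in standard deviations"
  )

p5
```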

Task 6

Describe the trend since 2003 of 8th grade math scores as portrayed on your distribution of scale scores in math for a nationally representative sample of 8th grade students in the USA. You might find it helpful to review the prose in Measuring Up around page 94.

Task 7

Download a second Excel spreadsheet from the NAEP Data Explorer at nationsreportcard.gov/ndecore/landing.

Select Mathematics, Grade 8, National (not public nor private).
Select the years 2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017, 2019, and 2022
For “Variable” select “National School Lunch Program eligibility, 3 categories”
For “Statistic”, choose “Average scale scores” and “Standard deviations”
“Create Report”
THEN, on the NDE website, click “Show Report Data”, “Edit Table Layout”, move “National School Lunch Program eligibility” from Rows to Columns and APPLY. (It might take a minute to load on your screen.)
Export Data Options “XLS” and “Export”
Read this spreadsheet into R (with a different name like “Task7” or something)
Notice that this time we must fuss with the names of the columns, as the Excel spreadsheet includes a line (probably line 9 on your Excel file) where the data is identified by NSLP eligibility.
Display this data table in your report

Sample code for Task 7

This sample code below provides you with one way to overcome this issue, although there likely are more elegant solutions. (Bonus points available here if you come up with a better approach!)

NOTE: The name of your Excel file will differ from this sample code. Be sure to use the correct file name!

# read in cells A10 through H20 (header row plus ten years of data) from the Task 7 Excel file
data7 <- read_excel("NDECoreExcel_Mathematics, Grade 8, National School Lunch Progra_20231106225343.Xls", range = "A10:H20")

New names:
• `Average scale score` -> `Average scale score...3`
• `Standard deviation` -> `Standard deviation...4`
• `Average scale score` -> `Average scale score...5`
• `Standard deviation` -> `Standard deviation...6`
• `Average scale score` -> `Average scale score...7`
• `Standard deviation` -> `Standard deviation...8`

# give the duplicated column names readable labels in a single rename() call
data7 <- data7 |>
  rename(
    Eligible      = `Average scale score...3`,
    EligibleSD    = `Standard deviation...4`,
    NotEligible   = `Average scale score...5`,
    NotEligibleSD = `Standard deviation...6`,
    NoInfo        = `Average scale score...7`,
    NoInfoSD      = `Standard deviation...8`
  )

data7

# A tibble: 10 × 8
    Year Jurisdiction Eligible EligibleSD NotEligible NotEligibleSD NoInfo
   <dbl> <chr>           <dbl>      <dbl>       <dbl>         <dbl>  <dbl>
 1  2022 National         260.       35.4        287.          37.6   285.
 2  2019 National         266.       36.3        296.          37.3   294.
 3  2017 National         267.       35.0        296.          36.7   296.
 4  2015 National         268.       34.0        296.          34.2   294.
 5  2013 National         270.       34.0        297.          33.6   297.
 6  2011 National         269.       33.8        296.          33.6   296.
 7  2009 National         266.       34.0        294.          33.7   295.
 8  2007 National         265.       33.8        291.          33.5   291.
 9  2005 National         262.       34.2        288.          33.6   289.
10  2003 National         259.       34.6        287.          33.0   285.
# ℹ 1 more variable: NoInfoSD <dbl>

Task 8

Use the package called “kableExtra” to produce a table for the data from task 7 that you’ve provided; however, produce this table so that the data is shared in chronological order (from 2003 to 2022, rather than the reverse); AND, show ONLY the columns for students who are eligible or not eligible for the National School Lunch Program (leaving off the columns where we have No Information about the sampled students).
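As a hedged sketch of the column selection and ordering, the example below uses a three-row stand-in tibble (`nslp`, typed in from the printed Task 7 output) rather than the full `data7`. The column names follow the renaming in the Task 7 sample code; the `NoInfoSD` value is a placeholder because that column is not shown in the printed output.

```r
library(tidyverse)
library(kableExtra)

# Three rows copied from the printed Task 7 output, for illustration only.
nslp <- tibble(
  Year = c(2022, 2019, 2003),
  Jurisdiction = "National",
  Eligible = c(260, 266, 259),
  EligibleSD = c(35.4, 36.3, 34.6),
  NotEligible = c(287, 296, 287),
  NotEligibleSD = c(37.6, 37.3, 33.0),
  NoInfo = c(285, 294, 285),
  NoInfoSD = NA_real_  # placeholder: not visible in the printed output
)

nslp_table <- nslp |>
  arrange(Year) |>                   # chronological order
  select(-starts_with("NoInfo"))     # drop both "No Information" columns

nslp_table |>
  kable(digits = 0) |>               # zero decimal places, as in Task 2
  kable_styling()
```

Using `select(-starts_with("NoInfo"))` removes both the average and the standard deviation columns in one step; listing the columns to keep by name would work just as well.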

Task 9

Produce a similar plot to Task 4 but for both students Eligible for NSLP and students not eligible for NSLP, with horizontal lines indicating the cut scores for Basic, Proficient, and Advanced on this plot. (Do not include students where we have No Information about the sampled students.)

Include the usual nice things that a good analyst would include, like titles, subtitles, appropriate axis titles, and perhaps a legend.
And write a short narrative explaining how this plot might improve the readability and interpretability of this data.
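One possible approach, sketched under assumptions, is to reshape the data to long form so that ggplot can draw one line per group. The averages below are typed in from the rounded Task 7 output, and the cut scores (262, 299, 333) are assumed Grade 8 mathematics values that should be checked against the NAEP website.

```r
library(tidyverse)

# Rounded group averages copied from the printed Task 7 output (assumption).
nslp <- tibble(
  Year = c(2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017, 2019, 2022),
  Eligible = c(259, 262, 265, 266, 269, 270, 268, 267, 266, 260),
  NotEligible = c(287, 288, 291, 294, 296, 297, 296, 296, 296, 287)
)

# One row per year and group, so colour can be mapped to NSLP status.
nslp_long <- nslp |>
  pivot_longer(c(Eligible, NotEligible),
               names_to = "NSLP", values_to = "Score")

# Assumed cut scores; verify against the NAEP website.
cuts <- c(Basic = 262, Proficient = 299, Advanced = 333)

p9 <- ggplot(nslp_long, aes(x = Year, y = Score, colour = NSLP)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = cuts, linetype = "dashed", colour = "grey40") +
  # label each cut-score line just above its left end
  annotate("text", x = 2003, y = cuts, label = names(cuts),
           hjust = 0, vjust = -0.5) +
  labs(
    title = "Grade 8 Mathematics Average Scores by NSLP Eligibility",
    subtitle = "Main NAEP: 2003 - 2022, with achievement-level cut scores",
    x = "Assessment year",
    y = "Average scale score",
    colour = "NSLP eligibility"
  )

p9
```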

Task 10

Switching back to your first dataset (perhaps called data1), assume that these scores are distributed in a normal (bell-shaped) manner each year. For the distribution of 8th grade Math scores, produce a plot that shows what this distribution looks like (a bell curve) for Grade 8 Math scores from 2022.

This is a theoretical plot, looking like a perfect bell curve.
Include a title and appropriate axis titles.
Include vertical lines showing the cut scores marking Basic, Proficient, and Advanced.
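A theoretical bell curve can be drawn with `stat_function()` and `dnorm()`. The sketch below assumes the rounded 2022 mean (274) and standard deviation (38.9) printed under Task 1, and assumed cut scores of 262, 299, and 333 that should be confirmed on the NAEP website.

```r
library(tidyverse)

# Rounded 2022 values from the Task 1 output (assumption: data1 is more precise).
mean_2022 <- 274
sd_2022 <- 38.9

# Assumed Grade 8 mathematics cut scores; verify against the NAEP website.
cuts <- c(Basic = 262, Proficient = 299, Advanced = 333)

# The data frame only sets the x-axis range: four SDs either side of the mean.
x_range <- tibble(x = mean_2022 + c(-4, 4) * sd_2022)

p10 <- ggplot(x_range, aes(x = x)) +
  # theoretical normal density with the 2022 mean and SD
  stat_function(fun = dnorm, args = list(mean = mean_2022, sd = sd_2022)) +
  geom_vline(xintercept = cuts, linetype = "dashed") +
  # rotated labels running up each cut-score line
  annotate("text", x = cuts, y = 0, label = names(cuts),
           angle = 90, hjust = 0, vjust = -0.5) +
  labs(
    title = "Theoretical Distribution of Grade 8 Mathematics Scores, 2022",
    x = "NAEP scale score",
    y = "Density"
  )

p10
```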

Task 11

Similarly with your second dataset, assume that the scores for students who are Eligible for NSLP are distributed normally (a bell curve centered around the average score for those Eligible for NSLP), and the scores for students who are Not Eligible for NSLP are distributed normally. Produce a plot showing both distributions for Grade 8 Math scores from 2022 for the two samples.

These are theoretical plots, looking like perfect bell curves.
Include a title and appropriate axis titles.
Include vertical lines showing the cut scores marking Basic, Proficient, and Advanced.
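For two curves on one plot, one option is to evaluate each group's normal density on a shared grid of scores and draw the results with `geom_line()`. This is a sketch only: the 2022 means and standard deviations are the rounded values printed under Task 7, the grid from 100 to 450 is an arbitrary choice, and the cut scores are assumed values to be checked against the NAEP website.

```r
library(tidyverse)

# Rounded 2022 values copied from the printed Task 7 output (assumption).
groups <- tibble(
  NSLP = c("Eligible", "Not eligible"),
  Mean = c(260, 287),
  SD = c(35.4, 37.6)
)

# Evaluate each group's normal density on a common grid of scale scores.
curves <- expand_grid(groups, Score = seq(100, 450, by = 1)) |>
  mutate(Density = dnorm(Score, mean = Mean, sd = SD))

# Assumed Grade 8 mathematics cut scores; verify against the NAEP website.
cuts <- c(Basic = 262, Proficient = 299, Advanced = 333)

p11 <- ggplot(curves, aes(x = Score, y = Density, colour = NSLP)) +
  geom_line() +
  geom_vline(xintercept = cuts, linetype = "dashed") +
  annotate("text", x = cuts, y = 0, label = names(cuts),
           angle = 90, hjust = 0, vjust = -0.5) +
  labs(
    title = "Theoretical Distributions of Grade 8 Mathematics Scores, 2022",
    subtitle = "By National School Lunch Program eligibility",
    x = "NAEP scale score",
    y = "Density",
    colour = "NSLP eligibility"
  )

p11
```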

Task 12 (bonus points)

Professor Koretz’s plot in Figure 11.1 doesn’t show perfect bell curves. Rather, he simulated the data to show some slight messiness. He also tried to make the samples realistic, with the group of Black students smaller than the group of White students (~30 years ago). Similarly, I ask that the sizes of your Eligible and Not Eligible groups reflect their proportions in the population of 8th grade students in the USA. (See page 267 of Measuring Up for more info about Koretz’s thoughts, and see the NAEP website for the sample size of students Eligible for NSLP.)

Produce a plot (two distributions) that mimics these aspects of the population of 8th graders in the USA in 2022.
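A simulation along these lines could start from `rnorm()`. The sketch below is hedged throughout: `n_total` is a hypothetical simulated sample size, `prop_eligible` is a placeholder that you must replace with the share reported on the NAEP website, and the means and standard deviations are the rounded 2022 values printed under Task 7.

```r
library(tidyverse)

set.seed(2022)  # make the simulated draws reproducible

n_total <- 10000       # hypothetical simulated sample size
prop_eligible <- 0.5   # PLACEHOLDER ONLY: replace with the NAEP-reported share
n_elig <- round(n_total * prop_eligible)
n_not <- n_total - n_elig

# Draw scores for each group from a normal distribution with that group's
# rounded 2022 mean and SD (assumption: copied from the Task 7 output).
sim <- tibble(
  NSLP = rep(c("Eligible", "Not eligible"), times = c(n_elig, n_not)),
  Score = c(rnorm(n_elig, mean = 260, sd = 35.4),
            rnorm(n_not, mean = 287, sd = 37.6))
)

p12 <- ggplot(sim, aes(x = Score, colour = NSLP)) +
  # counts rather than densities, so the curve heights reflect group sizes
  geom_freqpoly(binwidth = 5) +
  labs(
    title = "Simulated Grade 8 Mathematics Scores, 2022",
    subtitle = "By National School Lunch Program eligibility",
    x = "NAEP scale score",
    y = "Number of simulated students",
    colour = "NSLP eligibility"
  )

p12
```

Plotting counts rather than densities keeps the relative sizes of the two groups visible, which is the feature of Figure 11.1 this task asks you to mimic.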