DAM4 Assignment Fall2023

Author

Hoffman

Purpose of the Data-Analytic Memo

One of my goals for this course is to give you an opportunity to use public data to better understand educational achievement in the USA. To that end, we’ve looked at many of the websites describing the National Assessment of Educational Progress, including “A Technical History of NAEP” and the Reading and Mathematics Frameworks for NAEP.

The front-facing website “The Nation’s Report Card” https://www.nationsreportcard.gov links any interested researcher – and you count as a researcher! – to NAEP data that describes estimates of what American students know and are able to do in Mathematics, Reading – as well as Arts, Civics, Economics, Geography, Science, Technology, U.S. History, and Writing.

The Nation’s Report Card

On October 24, 2022, the National Center for Education Statistics (NCES) released scores on the “main” National Assessment of Educational Progress for 4th and 8th grade students in Math and Reading. https://www.nationsreportcard.gov/

Many within the educational community are concerned about the reported declines in test scores nationally, particularly in Mathematics.

As part of the goals for this course, this assignment is designed to hone your expertise and refine your understanding of what these public data can tell us about educational achievement in the USA.

Constructing useful plots

The headline on October 24, 2022 screamed “Largest score declines in NAEP mathematics at grades 4 and 8 since initial assessments in 1990” along with two plots showing declines in average math scores for 4th and 8th grade American students between 2019 and 2022 – declines associated with the COVID pandemic and the related school shutdowns. https://www.nationsreportcard.gov/highlights/mathematics/2022/

In this DAM, you will first of all replicate plots like these – producing similar plots using data pulled from the NAEP website.

Figures from Measuring Up

Additionally, you will produce plots similar to those that Professor Koretz showed in Measuring Up.

In Chapter 5, several plots show trends in NAEP scores since the start of the test now termed the “Long-term trend” test, as well as since the first record of what is now the “main” NAEP. Both tests show substantial gains in mathematics scores, measured as “Gain in standard deviations” – which you are now well-equipped to understand.

Cut scores identifying “Basic”, “Proficient”, and “Advanced” levels of performance on the Mathematics NAEP function similarly, and we will also produce plots to explore how these cut scores affect public understanding of achievement levels of American K12 students.

Focus on 8th grade math

Here you will conduct analyses in response to the October 24, 2022 release of trend scores in math for 8th grade students.

Remember that this assignment is all about collaborative learning. It is an opportunity to try out your skills as a coder and your understanding of educational testing. Please ensure that you engage in a full, fair, and mutually-agreeable collaboration with your partners. Do not simply divide the work. Discuss and plan your analyses together; debate what you have found with each other; collaborate on the writing.

You are welcome to use GitHub Copilot and ChatGPT, or to ask questions on websites like Stack Overflow, during your analysis.

Task 1

Download an Excel spreadsheet from the NAEP Data Explorer at nationsreportcard.gov/ndecore/landing. Select Mathematics, Grade 8, National (not public nor private).

  • Select the years 2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017, 2019, and 2022

  • For “Variable” select “All students”

  • For “Statistic,” choose “Average scale scores” and “Standard deviations”

  • “Create Report”

  • Export Data Options “XLS” and “Export”

  • Read this spreadsheet into R

  • Display this data table in your report

Sample packages to load. Note that “kableExtra” is new and will need to be installed (one time) on your own computer.

## LOAD PACKAGES
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
library(kableExtra)

Attaching package: 'kableExtra'

The following object is masked from 'package:dplyr':

    group_rows
# NOTE: 
# You likely will need to install kableExtra
# install.packages("kableExtra")

Sample code for Task 1

NOTE: The name of your Excel file will differ from the one in this sample code. Be sure to use the correct file name!

# Read cells A9 through E19 of the downloaded Excel file (the data for Task 1)
data1 <- read_excel("NDECoreExcel_Mathematics, Grade 8, All students_20231106185824.Xls", 
                    range = "A9:E19")

data1
# A tibble: 10 × 5
    Year Jurisdiction `All students` `Average scale score` `Standard deviation`
   <dbl> <chr>        <chr>                          <dbl>                <dbl>
 1  2022 National     All students                    274.                 38.9
 2  2019 National     All students                    282.                 39.7
 3  2017 National     All students                    283.                 38.8
 4  2015 National     All students                    282.                 36.9
 5  2013 National     All students                    285.                 36.5
 6  2011 National     All students                    284.                 36.2
 7  2009 National     All students                    283.                 36.4
 8  2007 National     All students                    281.                 36.1
 9  2005 National     All students                    279.                 36.3
10  2003 National     All students                    278.                 36.2

Task 2

Use the package called “kableExtra” to produce a table of the data from Task 1. This time, present the data in chronological order (from 2003 to 2022, rather than the reverse).

Essentially, we’re looking for a nicer table, formatted by year from 2003 through 2022, with zero decimal places.

# Task 2
data1 |>
  arrange(Year) |> # sort chronologically (data1 is already piped in)
  kable(digits = 0) |> # set to zero decimal places
  kable_styling()
Year Jurisdiction All students Average scale score Standard deviation
2003 National All students 278 36
2005 National All students 279 36
2007 National All students 281 36
2009 National All students 283 36
2011 National All students 284 36
2013 National All students 285 37
2015 National All students 282 37
2017 National All students 283 39
2019 National All students 282 40
2022 National All students 274 39

Task 3

Produce a plot of average achievement in 8th grade mathematics on the main NAEP test similar to what was produced on the NAEP website.

# Task 3
ggplot(data = data1) + 
  geom_line(mapping = aes(x = Year, y = `Average scale score`))  +
  labs(
    title = "Trend in Grade 8 Mathematics Average Scores",
    subtitle = "Main NAEP: 2003 - 2022"
  )

Task 4

Produce a similar plot adding horizontal lines indicating the cut scores for Basic, Proficient, and Advanced on this plot.

  • Include the usual nice things that a good analyst would include, like titles, subtitles, appropriate axis titles, and perhaps a legend.

  • Write a short narrative explaining how this plot might improve the readability and interpretability of these data.
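One possible shape for the Task 4 plot is sketched below. It is only a sketch under stated assumptions: the `scores` tibble is a rounded stand-in built from the printed table above (use your own `data1` instead), the object names `scores`, `cuts`, and `p_cuts` are illustrative, and the cut scores of 262, 299, and 333 are the values I understand NAEP publishes for Grade 8 mathematics, so please confirm them on the NAEP website before relying on them.

```r
library(ggplot2)
library(tibble)

# Rounded stand-in for data1, copied from the printed table above.
# In your report, use the data1 you read from Excel instead.
scores <- tibble(
  Year = c(2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017, 2019, 2022),
  `Average scale score` = c(278, 279, 281, 283, 284, 285, 282, 283, 282, 274)
)

# ASSUMPTION: Grade 8 mathematics achievement-level cut scores.
# Verify these on the NAEP website.
cuts <- tibble(
  level = factor(c("Basic", "Proficient", "Advanced"),
                 levels = c("Advanced", "Proficient", "Basic")),
  cut = c(262, 299, 333)
)

p_cuts <- ggplot(scores, aes(x = Year, y = `Average scale score`)) +
  geom_line() +
  geom_point() +
  # One horizontal line per achievement level; linetype creates the legend
  geom_hline(data = cuts, aes(yintercept = cut, linetype = level)) +
  labs(
    title = "Trend in Grade 8 Mathematics Average Scores",
    subtitle = "Main NAEP: 2003 - 2022, with achievement-level cut scores",
    x = "Year",
    y = "Average scale score",
    linetype = "Achievement level"
  )

p_cuts
```

Because the cut lines stretch the y-axis well beyond the range of the averages, the year-to-year changes look smaller than in the Task 3 plot. That contrast is worth discussing in your narrative.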

Producing a plot like the dotted-line portion of Figure 5.5, starting in 2003 and ending in 2022

Produce another plot, similar to the dotted line for the “Main NAEP” in Figure 5.5 of Measuring Up. In this plot, however, measure the “Gain in standard deviations” starting from 2003.

Task 5

Add a column (using mutate) that shows the difference in average scale score between each year and 2003. You should see numbers between -3.3 and about 7, and the value for 2003 should be 0.0. Briefly explain in your report how you accomplished this.
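A minimal sketch of one way to build this column follows. It assumes a rounded stand-in for `data1` taken from the printed table above, and the names `scores`, `scores_diff`, and `diff_from_2003` are illustrative. Because the stand-in values are rounded to whole numbers, the 2022 difference comes out as -4 here rather than the -3.3 you should see with the unrounded Excel data.

```r
library(dplyr)
library(tibble)

# Rounded stand-in for data1, copied from the printed table above.
# In your report, use the data1 you read from Excel instead.
scores <- tibble(
  Year = c(2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017, 2019, 2022),
  `Average scale score` = c(278, 279, 281, 283, 284, 285, 282, 283, 282, 274)
)

scores_diff <- scores |>
  arrange(Year) |>
  mutate(
    # Subtract the 2003 average (picked out by a logical index) from every year
    diff_from_2003 = `Average scale score` - `Average scale score`[Year == 2003]
  )

scores_diff
```

Indexing with `Year == 2003` avoids relying on the row order, so the column is correct whether or not the table has been sorted.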

Task 6

Add another column that shows the difference in Math scores since 2003, measured in standard deviations. (This column should be 0.0 in 2003 and about -0.085 for 2022.) Again, briefly explain in your report how you accomplished this.
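The task does not say which standard deviation to divide by, so the sketch below shows two candidates side by side. Dividing by each year's own standard deviation appears to match the hint, since -3.3 / 38.9 is roughly -0.085, but that reading is my assumption. The tibble is a stand-in built from the printed table (averages rounded to whole numbers), and the names `scores`, `scores_sd`, `diff_points`, `diff_sd_own`, and `diff_sd_2003` are illustrative.

```r
library(dplyr)
library(tibble)

# Stand-in for data1, copied from the printed table above.
# Averages are rounded, so results differ slightly from the unrounded data.
scores <- tibble(
  Year = c(2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017, 2019, 2022),
  avg  = c(278, 279, 281, 283, 284, 285, 282, 283, 282, 274),
  sd   = c(36.2, 36.3, 36.1, 36.4, 36.2, 36.5, 36.9, 38.8, 39.7, 38.9)
)

scores_sd <- scores |>
  mutate(
    diff_points  = avg - avg[Year == 2003],         # difference in scale-score points
    diff_sd_own  = diff_points / sd,                # ASSUMPTION: that year's own SD
    diff_sd_2003 = diff_points / sd[Year == 2003]   # alternative: fixed 2003 SD
  )

scores_sd
```

Whichever denominator you choose, say so in your report, because the two columns diverge as the standard deviation grows in later years.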

Task 7

Produce the plot that looks like Figure 5.5 from Measuring Up.

Include the usual nice things that a good analyst would include, like titles, subtitles, appropriate axis titles, and perhaps a legend. I also think that using the term “Difference in standard deviations” rather than “Gain in standard deviations” is appropriate, since we have witnessed a substantial loss in average scores since 2019.
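A plot in the spirit of the dotted “Main NAEP” line could be sketched as follows. This is not a reproduction of Figure 5.5. It assumes a stand-in tibble built from the printed table (averages rounded), it assumes the difference is divided by each year's own standard deviation, and the names `scores`, `trend`, `diff_sd`, and `p_trend` are illustrative.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Stand-in for data1, copied from the printed table above (averages rounded).
scores <- tibble(
  Year = c(2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017, 2019, 2022),
  avg  = c(278, 279, 281, 283, 284, 285, 282, 283, 282, 274),
  sd   = c(36.2, 36.3, 36.1, 36.4, 36.2, 36.5, 36.9, 38.8, 39.7, 38.9)
)

# ASSUMPTION: difference since 2003 divided by each year's own SD
trend <- scores |>
  mutate(diff_sd = (avg - avg[Year == 2003]) / sd)

p_trend <- ggplot(trend, aes(x = Year, y = diff_sd)) +
  geom_hline(yintercept = 0, colour = "grey60") +  # the 2003 baseline
  geom_line(linetype = "dotted") +
  geom_point() +
  scale_x_continuous(breaks = trend$Year) +        # label only assessment years
  labs(
    title = "Grade 8 Mathematics: Change in Average Score Since 2003",
    subtitle = "Main NAEP, 2003 - 2022",
    x = "Year",
    y = "Difference in standard deviations"
  )

p_trend
```

The grey baseline at zero makes it easy to see that the 2022 average falls below the 2003 starting point.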

Task 8

Describe the trend since 2003 in 8th grade math scores, as portrayed in your plot of scale scores in math for a nationally representative sample of 8th grade students in the USA. You might find it helpful to review the prose in Measuring Up around page 94.

Task 9

Assume that these scores are distributed in a normal (bell-shaped) manner each year. For the distribution of 8th grade Math scores, produce a plot that shows what this distribution looks like (a bell curve) for Grade 8 Math scores from 2022.

  • This is a theoretical plot, looking like a perfect bell curve.

  • Include a title and appropriate axis titles.

  • Include vertical lines showing the cut scores marking Basic, Proficient, and Advanced.
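A theoretical bell curve of this kind can be sketched with `geom_function()` and `dnorm()`. The sketch assumes a 2022 mean of 274 (rounded from the printed table) and a standard deviation of 38.9, and it assumes cut scores of 262, 299, and 333, which you should confirm on the NAEP website. The names `mean_2022`, `sd_2022`, `cuts`, and `p_bell` are illustrative.

```r
library(ggplot2)

# Values taken from the printed table above; the mean is rounded.
mean_2022 <- 274
sd_2022   <- 38.9

# ASSUMPTION: Grade 8 mathematics cut scores. Verify on the NAEP website.
cuts <- c(Basic = 262, Proficient = 299, Advanced = 333)

p_bell <- ggplot() +
  # Show four standard deviations on either side of the mean
  xlim(mean_2022 - 4 * sd_2022, mean_2022 + 4 * sd_2022) +
  # Normal density with the 2022 mean and standard deviation
  geom_function(fun = dnorm, args = list(mean = mean_2022, sd = sd_2022)) +
  # One dashed vertical line per cut score, labelled along the line
  geom_vline(xintercept = cuts, linetype = "dashed") +
  annotate("text", x = cuts, y = 0, label = names(cuts),
           angle = 90, hjust = 0, vjust = -0.5) +
  labs(
    title = "Theoretical Distribution of Grade 8 Mathematics Scores, 2022",
    subtitle = "Normal curve with the 2022 mean and standard deviation",
    x = "NAEP scale score",
    y = "Density"
  )

p_bell
```

The curve is theoretical: it is drawn from the mean and standard deviation alone, not from individual student scores.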