Assessment declaration checklist

Please carefully read the statements below and check each box if you agree with the declaration. If you do not check all boxes, your assignment will not be marked. If you make a false declaration on any of these points, you may be investigated for academic misconduct. Students found to have breached academic integrity may receive official warnings and/or serious academic penalties. Please read more about academic integrity here. If you are unsure about any of these points or feel your assessment might breach academic integrity, please contact your course coordinator for support. It is important that you DO NOT submit any assessment until you can complete the declaration truthfully.

By checking the boxes below, I declare the following:

I understand that:

I agree and acknowledge that:

Deconstruct

Original

The original data visualisation selected for the assignment was as follows:


Source: https://dqydj.com/2018-average-median-top-individual-income-percentiles/


Objective and Audience

The objective and audience of the original data visualisation chosen can be summarised as follows:

Objective To compare USA household income thresholds at important points in two different years. We look at the income levels. These levels mark specific income percentiles. This shows income changes over time.

Audience This content is for people interested in income distribution. It is for students, journalists, or the general public.

Improvements

The original data visualisation chosen could be improved in the three following ways:

  • First, I will check that the years are correct and consistent. For example, use either 2016 or 2018 , not both. I will define the dollar cutoff. The dollar cutoff is the minimum household income to be at or above a certain percentile.
  • I will remove the “window” border, use clean grid lines, right aligned currency labels on the y-axis, and show data labels only for the top three percentiles to avoid clutter.
  • I will add a short subtitle and include a note on whether inflation is calculated using nominal USD or CPI-adjusted figures.

Reconstruct

Code

The following code was used to improve the original.

library(tidyverse)
library(scales)     
library(plotly) 

df <- tribble(
  ~percentile, ~year, ~cutoff,
  "10%", 2017, 14280,  "10%", 2018, 13488,
  "25%", 2017, 30001,  "25%", 2018, 29010,
  "50%", 2017, 61822,  "50%", 2018, 56849,
  "75%", 2017,111160,  "75%", 2018,105680,
  "90%", 2017,178793,  "90%", 2018,170432,
  "95%", 2017,236360,  "95%", 2018,225186,
  "99%", 2017,434455,  "99%", 2018,430600
) %>%
  mutate(percentile = factor(percentile, levels = c("10%","25%","50%","75%","90%","95%","99%")))

wide <- df %>%
  pivot_wider(names_from = year, values_from = cutoff) %>%
  mutate(delta = `2018` - `2017`,
         dir   = ifelse(delta >= 0, "Up", "Down"))

p_new <- ggplot(wide, aes(y = percentile)) +
  geom_segment(aes(x = `2017`, xend = `2018`, yend = percentile),
               linewidth = 2, colour = "grey70") +
  geom_point(aes(x = `2017`), colour = "blue", size = 2.6) +  # slightly smaller
  geom_point(aes(x = `2018`), colour = "red", size = 2.6) +
  geom_text(aes(x = pmax(`2017`, `2018`) + 10000,                 # was +15000
                label = scales::dollar(delta, accuracy = 1),
                colour = dir),
            hjust = 0, size = 2.8, show.legend = FALSE) +         # smaller text
  scale_colour_manual(values = c("Up"="red","Down"="blue"), guide = "none") +
  scale_x_continuous(
    breaks = seq(0, 450000, 100000),                              # roomy ticks
    labels = scales::dollar_format(accuracy = 1),
    limits = c(0, 460000),                                        # a touch more room
    expand = expansion(mult = c(0, 0.02))
  ) +
  coord_cartesian(clip = "off") +
  labs(
    title = "US Household Income Cut-offs by Percentile",
    subtitle = "Comparison of 2017 (blue) vs 2018 (red); label shows change (2018–2017)",
    x = "Dollar cutoff", y = "Percentile",
    caption = "Source: IPUMS-CPS ASEC (University of Minnesota). Weighted household income percentiles."
  ) +
  theme_minimal(base_size = 12) +
  theme(panel.grid.minor = element_blank(),
        plot.margin = margin(10, 90, 10, 10))

p_interactive <- ggplotly(p_new, tooltip = c("x","y")) %>%
  layout(
    xaxis = list(tickformat = "$~s", dtick = 100000, automargin = TRUE),
    margin = list(l = 70, r = 120, b = 70, t = 60)
  )

for (i in seq_along(p_interactive$x$data)) {
  if (p_interactive$x$data[[i]]$mode == "text") {
    p_interactive$x$data[[i]]$showlegend <- FALSE
  }
}

Reconstruction

The following plot improves the original data visualisation in the three ways previously explained.

References

The reference to the original data visualisation choose, the data source(s) used for the reconstruction and any other sources used for this assignment are as follows: