Assignment 2

Assessment declaration checklist

Please carefully read the statements below and check each box if you agree with the declaration. If you do not check all boxes, your assignment will not be marked. If you make a false declaration on any of these points, you may be investigated for academic misconduct. Students found to have breached academic integrity may receive official warnings and/or serious academic penalties. Please read more about academic integrity here. If you are unsure about any of these points or feel your assessment might breach academic integrity, please contact your course coordinator for support. It is important that you DO NOT submit any assessment until you can complete the declaration truthfully.

By checking the boxes below, I declare the following:

I have not impersonated, or allowed myself to be impersonated by, any person for the purposes of this assessment
This assessment is my original work and no part of it has been copied from any other source except where due acknowledgement is made. Due acknowledgement means the following:
- The source is correctly referenced in a reference list
- The work has been paraphrased or directly quoted
- A citation to the original work’s reference has been included where the copied work appears in the assessment.
No part of this assessment has been written for me by any other person except where such collaboration has been authorised by the lecturer/teacher concerned.
I have not used generative “AI” tools for the purposes of this assessment.
Where this work is being submitted for individual assessment, I declare that it is my original work and that no part has been contributed by, produced by or in conjunction with another student.
I give permission for my assessment response to be reproduced, communicated, compared and archived for the purposes of detecting plagiarism.
I give permission for a copy of my assessment to be retained by the university for review and comparison, including review by external examiners.

I understand that:

Plagiarism is the presentation of the work, idea or creation of another person or machine as though it is your own. It is a form of cheating and is a very serious academic offence that may lead to exclusion from the University. Plagiarised material can be drawn from, and presented in, written, graphic and visual form, including electronic data and oral presentations. Plagiarism occurs when the origin of the material used is not appropriately cited.
Plagiarism includes the act of assisting or allowing another person to plagiarise or to copy my work.

I agree and acknowledge that:

I have read and understood the Declaration and Statement of Authorship above.
If I do not agree to the Declaration and Statement of Authorship in this context and all boxes are not checked, the assessment outcome is not valid for assessment purposes and will not be included in my final result for this course.

Deconstruct

Original

The original data visualisation selected for the assignment was as follows:

Source: https://dqydj.com/2018-average-median-top-individual-income-percentiles/

Objective and Audience

The objective and audience of the original data visualisation chosen can be summarised as follows:

Objective To compare USA household income thresholds at important points in two different years. We look at the income levels. These levels mark specific income percentiles. This shows income changes over time.

Audience This content is for people interested in income distribution. It is for students, journalists, or the general public.

Improvements

The original data visualisation chosen could be improved in the three following ways:

First, I will check that the years are correct and consistent. For example, use either 2016 or 2018 , not both. I will define the dollar cutoff. The dollar cutoff is the minimum household income to be at or above a certain percentile.
I will remove the “window” border, use clean grid lines, right aligned currency labels on the y-axis, and show data labels only for the top three percentiles to avoid clutter.
I will add a short subtitle and include a note on whether inflation is calculated using nominal USD or CPI-adjusted figures.

Reconstruct

Code

The following code was used to improve the original.

library(tidyverse)
library(scales)     
library(plotly) 

df <- tribble(
  ~percentile, ~year, ~cutoff,
  "10%", 2017, 14280,  "10%", 2018, 13488,
  "25%", 2017, 30001,  "25%", 2018, 29010,
  "50%", 2017, 61822,  "50%", 2018, 56849,
  "75%", 2017,111160,  "75%", 2018,105680,
  "90%", 2017,178793,  "90%", 2018,170432,
  "95%", 2017,236360,  "95%", 2018,225186,
  "99%", 2017,434455,  "99%", 2018,430600
) %>%
  mutate(percentile = factor(percentile, levels = c("10%","25%","50%","75%","90%","95%","99%")))

wide <- df %>%
  pivot_wider(names_from = year, values_from = cutoff) %>%
  mutate(delta = `2018` - `2017`,
         dir   = ifelse(delta >= 0, "Up", "Down"))

p_new <- ggplot(wide, aes(y = percentile)) +
  geom_segment(aes(x = `2017`, xend = `2018`, yend = percentile),
               linewidth = 2, colour = "grey70") +
  geom_point(aes(x = `2017`), colour = "blue", size = 2.6) +  # slightly smaller
  geom_point(aes(x = `2018`), colour = "red", size = 2.6) +
  geom_text(aes(x = pmax(`2017`, `2018`) + 10000,                 # was +15000
                label = scales::dollar(delta, accuracy = 1),
                colour = dir),
            hjust = 0, size = 2.8, show.legend = FALSE) +         # smaller text
  scale_colour_manual(values = c("Up"="red","Down"="blue"), guide = "none") +
  scale_x_continuous(
    breaks = seq(0, 450000, 100000),                              # roomy ticks
    labels = scales::dollar_format(accuracy = 1),
    limits = c(0, 460000),                                        # a touch more room
    expand = expansion(mult = c(0, 0.02))
  ) +
  coord_cartesian(clip = "off") +
  labs(
    title = "US Household Income Cut-offs by Percentile",
    subtitle = "Comparison of 2017 (blue) vs 2018 (red); label shows change (2018–2017)",
    x = "Dollar cutoff", y = "Percentile",
    caption = "Source: IPUMS-CPS ASEC (University of Minnesota). Weighted household income percentiles."
  ) +
  theme_minimal(base_size = 12) +
  theme(panel.grid.minor = element_blank(),
        plot.margin = margin(10, 90, 10, 10))

p_interactive <- ggplotly(p_new, tooltip = c("x","y")) %>%
  layout(
    xaxis = list(tickformat = "$~s", dtick = 100000, automargin = TRUE),
    margin = list(l = 70, r = 120, b = 70, t = 60)
  )

for (i in seq_along(p_interactive$x$data)) {
  if (p_interactive$x$data[[i]]$mode == "text") {
    p_interactive$x$data[[i]]$showlegend <- FALSE
  }
}

Reconstruction

The following plot improves the original data visualisation in the three ways previously explained.

References

The reference to the original data visualisation choose, the data source(s) used for the reconstruction and any other sources used for this assignment are as follows:

DQYDJ. (2019). Household income percentile calculator for 2017–2018. website: https://dqydj.com/2018-average-median-top-individual-income-percentiles/