My recent endeavor has been to investigate the hypothesis that the middle class is shrinking in the United States. My primary tool in this analysis has been an extensive examination of census data sets from 1969 to 2022. This data includes crucial information such as the monetary boundaries defining each quantile and the corresponding percentage of wealth each quantile possesses. By meticulously dissecting these data sets, I aimed to uncover trends and patterns that could confirm or refute the notion of a contracting middle class. This longitudinal data is critical, as it allows for observing changes over a significant period, offering a comprehensive view of the economic shifts affecting various income groups.
To illustrate these trends effectively, I utilized line graphs and treemaps in my analysis. Line graphs are handy for showcasing the evolution of income distribution over time, allowing us to observe any significant shifts in the wealth of the middle class relative to other income groups. On the other hand, treemaps offer a more nuanced visual representation of wealth distribution across different quantiles. They provide an intuitive understanding of how the economic landscape has transformed, highlighting the disparities in wealth accumulation among these groups. My ultimate goal is to develop a dynamic treemap within a Shiny application. This interactive platform will enable users to explore the data in real-time, observing how the percentage of wealth held by each quantile has evolved year by year, thus offering a compelling and user-friendly way to engage with the data and understand the economic changes impacting the middle class.
# Load required Libraries
library(ggplot2)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(knitr)
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(ggrepel)
library(RColorBrewer)
library(reshape2)
##
## Attaching package: 'reshape2'
##
## The following object is masked from 'package:tidyr':
##
## smiths
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
library(dplyr)
library(hexbin)
library(lubridate)
library(geojsonio)
## Registered S3 method overwritten by 'geojsonsf':
## method from
## print.geojson geojson
##
## Attaching package: 'geojsonio'
##
## The following object is masked from 'package:base':
##
## pretty
library(sf)
## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(leaflet)
library(shiny)
library(d3Tree)
library(treemap)
library(tidycensus)
library(maps)
##
## Attaching package: 'maps'
##
## The following object is masked from 'package:purrr':
##
## map
library(tigris)
## To enable caching of data, set `options(tigris_use_cache = TRUE)`
## in your R script or .Rprofile.
library(shinyWidgets)
library(rlang)
##
## Attaching package: 'rlang'
##
## The following objects are masked from 'package:purrr':
##
## %@%, flatten, flatten_chr, flatten_dbl, flatten_int, flatten_lgl,
## flatten_raw, invoke, splice
library(viridis)
## Loading required package: viridisLite
##
## Attaching package: 'viridis'
##
## The following object is masked from 'package:maps':
##
## unemp
##
## The following object is masked from 'package:scales':
##
## viridis_pal
df <- read.csv("~/Desktop/JHU/Semester 1/Data Visualization/Class Project/Module 12/Income Dispersion 1967 to 2022 cleaned.csv")
#Familiarize myself with Data
head(df)
## Year Measures.of.income.dispersion X X.1
## 1 NA Mean household income of quintiles
## 2 NA
## 3 NA Lowest quintile Second quintile Third quintile
## 4 2022 16,120 43,850 74,730
## 5 2021 16,020 44,240 76,430
## 6 2020 16,530 44,800 76,920
## X.2 X.3 X.4
## 1
## 2
## 3 Fourth quintile Highest quintile Top 5 percent
## 4 119,900 277,300 499,900
## 5 124,500 290,400 517,800
## 6 124,300 286,800 504,500
## X.5 X.6 X.7
## 1 Shares of household income quintiles
## 2
## 3 Lowest quintile Second quintile Third quintile
## 4 3.0 8.2 14.0
## 5 2.9 8.0 13.9
## 6 3.0 8.2 14.0
## X.8 X.9 X.10
## 1
## 2
## 3 Fourth quintile Highest quintile Top 5 percent
## 4 22.5 52.1 23.5
## 5 22.6 52.7 23.5
## 6 22.6 52.2 23.0
## X.11 X.12 X.13
## 1 Summary measures
## 2 Gini index of income inequality Mean logarithmic deviation of income Theil
## 3
## 4 0.488 0.637 0.440
## 5 0.494 0.634 0.448
## 6 0.488 0.617 0.437
## X.14 X.15 X.16
## 1
## 2 Atkinson
## 3 e=0.25 e=0.50 e=0.75
## 4 0.106 0.207 0.315
## 5 0.108 0.211 0.320
## 6 0.105 0.206 0.313
#remove 1st two rows in the header
df <- df[-c(1, 2), ]
When I first loaded the dataset from the CSV, it presented a challenge due to its disorganized format and extraneous columns. The initial structure included a mix of numeric and character types, with multiple columns labeled generically as ‘X,’ ‘X.1’, ‘X.2’, and so forth, making it difficult to decipher the data meaningfully. It was evident that considerable cleaning and reformatting was necessary to make this dataset usable for my analysis.
To address these issues, I implemented a series of steps in R. First, I set the third row, which contained the relevant headers, as the new header for the dataset. This involved storing the third row as ‘new_header’ and removing the first few rows containing incorrect headers and the row used for the new headers. Subsequently, I assigned these new headers as column names, reset the row names to align correctly, and checked the first few rows to ensure the changes were correctly applied. Continuing the cleanup process, I meticulously renamed each column to reflect the data it contained accurately. This step was crucial for clarity and ease of analysis. Columns were renamed to represent different income quantiles in dollars and percentages and summary measures such as the Gini index of income inequality, mean logarithmic deviation of income, and the Theil index. Through these steps, I transformed a disordered and challenging dataset into a structured and analyzable format, setting the stage for my investigation into the state of the middle class.
head(df)
## Year Measures.of.income.dispersion X X.1
## 3 NA Lowest quintile Second quintile Third quintile
## 4 2022 16,120 43,850 74,730
## 5 2021 16,020 44,240 76,430
## 6 2020 16,530 44,800 76,920
## 7 2019 17,410 46,300 78,520
## 8 2018 15,920 43,110 73,480
## X.2 X.3 X.4 X.5
## 3 Fourth quintile Highest quintile Top 5 percent Lowest quintile
## 4 119,900 277,300 499,900 3.0
## 5 124,500 290,400 517,800 2.9
## 6 124,300 286,800 504,500 3.0
## 7 126,600 289,800 513,800 3.1
## 8 117,400 270,300 481,400 3.1
## X.6 X.7 X.8 X.9
## 3 Second quintile Third quintile Fourth quintile Highest quintile
## 4 8.2 14.0 22.5 52.1
## 5 8.0 13.9 22.6 52.7
## 6 8.2 14.0 22.6 52.2
## 7 8.3 14.1 22.7 51.9
## 8 8.3 14.1 22.6 52.0
## X.10 X.11 X.12 X.13 X.14 X.15 X.16
## 3 Top 5 percent e=0.25 e=0.50 e=0.75
## 4 23.5 0.488 0.637 0.440 0.106 0.207 0.315
## 5 23.5 0.494 0.634 0.448 0.108 0.211 0.320
## 6 23.0 0.488 0.617 0.437 0.105 0.206 0.313
## 7 23.0 0.484 0.590 0.432 0.104 0.203 0.306
## 8 23.1 0.486 0.616 0.436 0.105 0.205 0.311
# Store the third row as the new header
new_header <- as.character(df[1,])
# Remove the first row of the dataframe (which includes the incorrect headers and the row used for the new headers)
df <- df[-(1:1), ]
# Set the new headers as column names
colnames(df) <- new_header
# Reset the row names in case they are now out of sync
rownames(df) <- NULL
# Check the first few rows to ensure changes have been applied
head(df)
## NA Lowest quintile Second quintile Third quintile Fourth quintile
## 1 2022 16,120 43,850 74,730 119,900
## 2 2021 16,020 44,240 76,430 124,500
## 3 2020 16,530 44,800 76,920 124,300
## 4 2019 17,410 46,300 78,520 126,600
## 5 2018 15,920 43,110 73,480 117,400
## 6 2017 15,630 41,740 72,590 116,800
## Highest quintile Top 5 percent Lowest quintile Second quintile
## 1 277,300 499,900 3.0 8.2
## 2 290,400 517,800 2.9 8.0
## 3 286,800 504,500 3.0 8.2
## 4 289,800 513,800 3.1 8.3
## 5 270,300 481,400 3.1 8.3
## 6 261,600 454,300 3.1 8.2
## Third quintile Fourth quintile Highest quintile Top 5 percent
## 1 14.0 22.5 52.1 23.5 0.488 0.637
## 2 13.9 22.6 52.7 23.5 0.494 0.634
## 3 14.0 22.6 52.2 23.0 0.488 0.617
## 4 14.1 22.7 51.9 23.0 0.484 0.590
## 5 14.1 22.6 52.0 23.1 0.486 0.616
## 6 14.3 23.0 51.5 22.3 0.482 0.609
## e=0.25 e=0.50 e=0.75
## 1 0.440 0.106 0.207 0.315
## 2 0.448 0.108 0.211 0.320
## 3 0.437 0.105 0.206 0.313
## 4 0.432 0.104 0.203 0.306
## 5 0.436 0.105 0.205 0.311
## 6 0.424 0.103 0.202 0.307
# Set the name of the columns"
colnames(df)[1] <- "Year"
colnames(df)[2] <- "Lowest quintile $"
colnames(df)[3] <- "Second quintile $"
colnames(df)[4] <- "Third quintile $"
colnames(df)[5] <- "Fourth quintile $"
colnames(df)[6] <- "Highest quintile $"
colnames(df)[7] <- "Top 5 percent $"
colnames(df)[8] <- "Lowest quintile %"
colnames(df)[9] <- "Second quintile %"
colnames(df)[10] <- "Third quintile %"
colnames(df)[11] <- "Fourth quintile %"
colnames(df)[12] <- "Highest quintile %"
colnames(df)[13] <- "Top 5 percent %"
head(df)
## Year Lowest quintile $ Second quintile $ Third quintile $ Fourth quintile $
## 1 2022 16,120 43,850 74,730 119,900
## 2 2021 16,020 44,240 76,430 124,500
## 3 2020 16,530 44,800 76,920 124,300
## 4 2019 17,410 46,300 78,520 126,600
## 5 2018 15,920 43,110 73,480 117,400
## 6 2017 15,630 41,740 72,590 116,800
## Highest quintile $ Top 5 percent $ Lowest quintile % Second quintile %
## 1 277,300 499,900 3.0 8.2
## 2 290,400 517,800 2.9 8.0
## 3 286,800 504,500 3.0 8.2
## 4 289,800 513,800 3.1 8.3
## 5 270,300 481,400 3.1 8.3
## 6 261,600 454,300 3.1 8.2
## Third quintile % Fourth quintile % Highest quintile % Top 5 percent %
## 1 14.0 22.5 52.1 23.5 0.488
## 2 13.9 22.6 52.7 23.5 0.494
## 3 14.0 22.6 52.2 23.0 0.488
## 4 14.1 22.7 51.9 23.0 0.484
## 5 14.1 22.6 52.0 23.1 0.486
## 6 14.3 23.0 51.5 22.3 0.482
## e=0.25 e=0.50 e=0.75
## 1 0.637 0.440 0.106 0.207 0.315
## 2 0.634 0.448 0.108 0.211 0.320
## 3 0.617 0.437 0.105 0.206 0.313
## 4 0.590 0.432 0.104 0.203 0.306
## 5 0.616 0.436 0.105 0.205 0.311
## 6 0.609 0.424 0.103 0.202 0.307
# Set the name of the remaining columns"
colnames(df)[14] <- "Gini index of income inequality"
colnames(df)[15] <- "Mean logarithmic deviation of income"
colnames(df)[16] <- "Theil"
head(df)
## Year Lowest quintile $ Second quintile $ Third quintile $ Fourth quintile $
## 1 2022 16,120 43,850 74,730 119,900
## 2 2021 16,020 44,240 76,430 124,500
## 3 2020 16,530 44,800 76,920 124,300
## 4 2019 17,410 46,300 78,520 126,600
## 5 2018 15,920 43,110 73,480 117,400
## 6 2017 15,630 41,740 72,590 116,800
## Highest quintile $ Top 5 percent $ Lowest quintile % Second quintile %
## 1 277,300 499,900 3.0 8.2
## 2 290,400 517,800 2.9 8.0
## 3 286,800 504,500 3.0 8.2
## 4 289,800 513,800 3.1 8.3
## 5 270,300 481,400 3.1 8.3
## 6 261,600 454,300 3.1 8.2
## Third quintile % Fourth quintile % Highest quintile % Top 5 percent %
## 1 14.0 22.5 52.1 23.5
## 2 13.9 22.6 52.7 23.5
## 3 14.0 22.6 52.2 23.0
## 4 14.1 22.7 51.9 23.0
## 5 14.1 22.6 52.0 23.1
## 6 14.3 23.0 51.5 22.3
## Gini index of income inequality Mean logarithmic deviation of income Theil
## 1 0.488 0.637 0.440
## 2 0.494 0.634 0.448
## 3 0.488 0.617 0.437
## 4 0.484 0.590 0.432
## 5 0.486 0.616 0.436
## 6 0.482 0.609 0.424
## e=0.25 e=0.50 e=0.75
## 1 0.106 0.207 0.315
## 2 0.108 0.211 0.320
## 3 0.105 0.206 0.313
## 4 0.104 0.203 0.306
## 5 0.105 0.205 0.311
## 6 0.103 0.202 0.307
I aim to transform and visualize the income data across different quintiles over time. Firstly, I ensure that the ‘Year’ column is correctly formatted as a numerical value, and then I convert the income columns to numeric, stripping any commas for accurate numerical representation. By using tidyr::pivot_longer, I reshape the data from a wide format to a long format, which is more suitable for the ggplot2 library in R, allowing me to create a visually appealing and informative line graph. This graph, enhanced with the viridis color palette for clear differentiation between quintiles, is designed to display trends in mean household income by income quintile over the years. Finally, I convert the static ggplot into an interactive plotly object, which enhances the user experience by enabling interactive exploration of the data points through tooltips.
# Create income column
income_columns <- c("Lowest quintile $", "Second quintile $", "Third quintile $", "Fourth quintile $", "Highest quintile $", "Top 5 percent $")
# Ensuring that the Year column is treated as a numerical value if it does not contain any non-numeric characters
df$Year <- as.numeric(gsub("[^0-9]", "", df$Year))
#Use the actual column names.
df[income_columns] <- lapply(df[income_columns], function(x) as.numeric(gsub(",", "", as.character(x))))
# Use tidyr::pivot_longer instead of gather
df_long <- df %>%
pivot_longer(cols = all_of(income_columns), names_to = "Quintile", values_to = "Income")
# Plot the data with ggplot2 using the viridis color palette
p <- ggplot(df_long, aes(x = Year, y = Income, color = Quintile, group = Quintile)) +
geom_line(size = 1.2) +
geom_point(size = 1.5) +
scale_color_viridis(discrete = TRUE, option = "D") +
labs(title = "Income Trend Analysis by Quintile",
subtitle = "Trends in mean household income across different income quintiles",
x = "Year",
y = "Mean Household Income",
color = "Income Quintile") +
theme_minimal(base_size = 14) +
theme(legend.position = "bottom",
plot.title = element_text(face = "bold", size = 16),
plot.subtitle = element_text(size = 14)) +
scale_y_continuous(labels = scales::dollar_format(prefix = "$", suffix = "K", scale = 1e-3))
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
# Convert to an interactive plotly object
ggplotly(p, tooltip = c("y", "x"))
Analyzing the “Income Trend Analysis by Quintile chart,” I can depict mean household income trends across various income quintiles from 1969 to 2022. The chart shows that the highest quintile and the top 5 percent have experienced significant income growth over this period, which is indicated by the steep upward trajectory of their lines. In contrast, the lowest three quintiles have seen only modest increases in mean household income, as their lines are relatively flat, suggesting that income growth has been disproportionately concentrated at the top, which could indicate widening income inequality and potentially a shrinking middle class.
The chart reveals a concerning trend where income growth is stagnating for the lower and middle quintiles and becoming more disparate over time, with the gap between the top earners and the rest widening. The highest quintile’s and the top 5 percent’s lines ascend more sharply and diverge further from the rest as time progresses, highlighting an economy increasingly skewed towards the wealthy. This divergence is particularly evident from the early 2000s onwards, coinciding with economic policies and market changes that may have favored capital over labor, technology over traditional industries, and investment income over wages. The relatively stagnant growth lines of the lower quintiles indicate that despite overall economic growth, the benefits have not been evenly distributed, which points towards a shrinking middle class, as their share of the pie is not keeping pace with the upper quintiles. This visual evidence aligns with the data I’ve been analyzing, where the Gini coefficient, a measure of income inequality, has generally increased, reinforcing the suggestion that the economic growth experienced over the last few decades has not been equitably shared.
# Create a wealth_percentage_columns as a vector containing the names of columns with percentages
wealth_percentage_columns <- names(df)[grepl("%", names(df))]
# Convert percentage columns to numeric
df_long <- df %>%
pivot_longer(cols = all_of(wealth_percentage_columns), names_to = "Quintile", values_to = "Wealth Distribution Percentage") %>%
mutate(`Wealth Distribution Percentage` = as.numeric(`Wealth Distribution Percentage`)) %>% # Convert to numeric
mutate(`Wealth Distribution Percentage` = `Wealth Distribution Percentage` / 100) # Then divide by 100
# Calculate year-over-year change for each quintile
df_long <- df_long %>%
arrange(Year, Quintile) %>%
group_by(Quintile) %>% # Group only by Quintile to calculate the change year-over-year
mutate(Change = `Wealth Distribution Percentage` - lag(`Wealth Distribution Percentage`))
# Create custom hover text including the year and the percentage change
df_long <- df_long %>%
mutate(hover_text = paste0(
"Year: ", Year, "<br>",
"Quintile: ", Quintile, "<br>",
"Wealth Distribution Percentage: ", scales::percent(`Wealth Distribution Percentage`), "<br>",
"Change from previous year: ",
ifelse(is.na(Change), "NA", paste0(sprintf("%.2f%%", Change * 100)))
))
# Plot the data with ggplot2
p <- ggplot(df_long, aes(x = Year, y = `Wealth Distribution Percentage`, text = hover_text, color = Quintile, group = Quintile)) +
geom_line(size = 1.2) +
geom_point(size = 1.5) +
scale_color_viridis(discrete = TRUE, option = "D") +
labs(title = "Wealth Distribution Trend Analysis by Quintile",
subtitle = "Year-over-year change in wealth distribution percentages",
x = "Year",
y = "Wealth Distribution Percentage",
color = "Quintile") +
theme_minimal(base_size = 14) +
theme(legend.position = "bottom",
plot.title = element_text(face = "bold", size = 16),
plot.subtitle = element_text(size = 14)) +
scale_y_continuous(labels = scales::label_percent(), limits = c(0, 0.55)) # Correct y-axis scale
# Convert to an interactive plotly object with custom hover text
ggplotly(p, tooltip = "hover_text")
The “Wealth Distribution Trend Analysis by Quintile” chart provides a stark visual representation of wealth disparity across different income quintiles over time. It is immediately apparent that the highest quintile, along with the top 5 percent, has a growing share of wealth, with the highest quintile’s share surpassing 50% and the top 5 percent’s share significantly higher than any of the other lower quintiles. The lower and middle quintiles, represented by the lowest three lines, show a relatively flat, if not slightly decreasing trend, indicating that their share of total wealth has generally stayed the same over time if not decreased slightly.
What stands out is the apparent slight decrease in the lower quintiles’ wealth share, which suggests that there has been little to no relative improvement in their wealth position despite the overall economic growth. This stability contrasts sharply with the dynamic growth of the highest quintile, reinforcing the notion of an economic environment that increasingly favors the accumulation of wealth at the top. The incremental year-over-year changes, as indicated by the code’s calculation of percentage changes, would reveal the nuances of these trends, but the chart suggests that any such changes have not altered the overarching pattern of wealth concentration.
Given this analysis, the chart corroborates the hypothesis of a shrinking middle class in terms of wealth share. It illustrates that while the wealth of the highest earners continues to grow, the middle class needs to see a proportionate increase in wealth, which could result in financial instability and decreased economic mobility for middle and lower-income earners. This unchecked trend may lead to further economic stratification and a hollowing out of the middle class.
head(df, n = 57)
## Year Lowest quintile $ Second quintile $ Third quintile $ Fourth quintile $
## 1 2022 16120 43850 74730 119900
## 2 2021 16020 44240 76430 124500
## 3 2020 16530 44800 76920 124300
## 4 2019 17410 46300 78520 126600
## 5 2018 15920 43110 73480 117400
## 6 2017 15630 41740 72590 116800
## 7 2016 15530 41400 70970 114200
## 8 2015 15080 39500 68790 111400
## 9 2014 14120 37600 65370 106200
## 10 2013 14220 37800 65930 106100
## 11 2013 14290 37430 64190 102500
## 12 2012 14270 36870 63550 101900
## 13 2011 14220 36960 63080 101400
## 14 2010 14350 37230 64160 102900
## 15 2009 15280 38700 65530 104100
## 16 2008 15360 38890 66050 105100
## 17 2007 15780 40210 68250 108100
## 18 2006 15900 40310 67550 106900
## 19 2005 15360 39440 66740 105000
## 20 2004 15200 38880 65870 103900
## 21 2003 15200 39040 66270 104900
## 22 2002 15510 39420 66430 104500
## 23 2001 15930 40020 66990 105000
## 24 2000 16320 40750 67860 105500
## 25 1999 16480 40470 67740 105400
## 26 1998 15650 39510 66120 102300
## 27 1997 15200 38010 63940 99030
## 28 1996 15100 37060 62340 96480
## 29 1995 15050 36780 61500 94530
## 30 1994 14240 35480 59770 93020
## 31 1993 13880 35190 58980 91660
## 32 1992 14030 35140 59200 90880
## 33 1991 14320 35970 59750 91080
## 34 1990 14720 37030 61170 92220
## 35 1989 15060 37480 62300 94230
## 36 1988 14530 36690 61360 92750
## 37 1987 14300 36330 60750 91820
## 38 1986 13920 35800 59910 90250
## 39 1985 13750 34860 57860 87030
## 40 1984 13770 34300 56840 85580
## 41 1983 13340 33500 55390 83120
## 42 1982 13170 33330 55180 82060
## 43 1981 13420 33430 55390 82560
## 44 1980 13770 34270 56560 83320
## 45 1979 14210 35350 58280 85530
## 46 1978 14310 35150 58040 85020
## 47 1977 13840 34060 56350 82560
## 48 1976 13930 34090 56070 81420
## 49 1975 13590 33370 54750 79550
## 50 1974 14080 34990 56390 81440
## 51 1973 14110 35470 58180 83690
## 52 1972 13460 34770 56740 81460
## 53 1971 12750 33690 54620 77650
## 54 1970 12650 34280 55200 77810
## 55 1969 12870 34750 55530 77780
## 56 1968 12540 33630 53340 74400
## 57 1967 11550 32010 51100 71500
## Highest quintile $ Top 5 percent $ Lowest quintile % Second quintile %
## 1 277300 499900 3.0 8.2
## 2 290400 517800 2.9 8.0
## 3 286800 504500 3.0 8.2
## 4 289800 513800 3.1 8.3
## 5 270300 481400 3.1 8.3
## 6 261600 454300 3.1 8.2
## 7 256700 450100 3.1 8.3
## 8 245000 424700 3.1 8.2
## 9 234700 402000 3.1 8.2
## 10 237200 410300 3.1 8.2
## 11 227200 395400 3.2 8.4
## 12 225900 394900 3.2 8.3
## 13 225300 394200 3.2 8.4
## 14 221000 374800 3.3 8.5
## 15 226000 390800 3.4 8.6
## 16 225400 388300 3.4 8.6
## 17 229400 392300 3.4 8.7
## 18 235600 416600 3.4 8.6
## 19 230000 405300 3.4 8.6
## 20 224600 391400 3.4 8.7
## 21 223600 385000 3.4 8.7
## 22 223100 389600 3.5 8.8
## 23 229400 409300 3.5 8.7
## 24 228600 405600 3.6 8.9
## 25 224800 390800 3.6 8.9
## 26 216400 377100 3.6 9.0
## 27 211100 370500 3.6 8.9
## 28 202900 353500 3.6 9.0
## 29 197300 340500 3.7 9.1
## 30 195500 337800 3.6 8.9
## 31 191000 327800 3.6 9.0
## 32 176100 279500 3.8 9.4
## 33 174700 272600 3.8 9.6
## 34 179000 285000 3.8 9.6
## 35 184200 297600 3.8 9.5
## 36 177100 279300 3.8 9.6
## 37 174600 275100 3.8 9.6
## 38 170800 267600 3.8 9.7
## 39 162600 250800 3.9 9.8
## 40 157400 237600 4.0 9.9
## 41 152600 230600 4.0 9.9
## 42 150500 227400 4.0 10.0
## 43 147500 219100 4.1 10.1
## 44 148600 222000 4.2 10.2
## 45 153500 234100 4.1 10.2
## 46 151800 231000 4.2 10.2
## 47 147200 225200 4.2 10.2
## 48 144100 219600 4.3 10.3
## 49 140400 213100 4.3 10.4
## 50 144300 219300 4.3 10.6
## 51 149700 230700 4.2 10.4
## 52 146200 226700 4.1 10.4
## 53 137600 211000 4.1 10.6
## 54 137700 211400 4.1 10.8
## 55 136700 210400 4.1 10.9
## 56 129300 197700 4.2 11.1
## 57 128700 203000 4.0 10.8
## Third quintile % Fourth quintile % Highest quintile % Top 5 percent %
## 1 14.0 22.5 52.1 23.5
## 2 13.9 22.6 52.7 23.5
## 3 14.0 22.6 52.2 23.0
## 4 14.1 22.7 51.9 23.0
## 5 14.1 22.6 52.0 23.1
## 6 14.3 23.0 51.5 22.3
## 7 14.2 22.9 51.5 22.6
## 8 14.3 23.2 51.1 22.1
## 9 14.3 23.2 51.2 21.9
## 10 14.3 23.0 51.4 22.2
## 11 14.4 23.0 51.0 22.2
## 12 14.4 23.0 51.0 22.3
## 13 14.3 23.0 51.1 22.3
## 14 14.6 23.4 50.3 21.3
## 15 14.6 23.2 50.3 21.7
## 16 14.7 23.3 50.0 21.5
## 17 14.8 23.4 49.7 21.2
## 18 14.5 22.9 50.5 22.3
## 19 14.6 23.0 50.4 22.2
## 20 14.7 23.2 50.1 21.8
## 21 14.8 23.4 49.8 21.4
## 22 14.8 23.3 49.7 21.7
## 23 14.6 23.0 50.1 22.4
## 24 14.8 23.0 49.8 22.1
## 25 14.9 23.2 49.4 21.5
## 26 15.0 23.2 49.2 21.4
## 27 15.0 23.2 49.4 21.7
## 28 15.1 23.3 49.0 21.4
## 29 15.2 23.3 48.7 21.0
## 30 15.0 23.4 49.1 21.2
## 31 15.1 23.5 48.9 21.0
## 32 15.8 24.2 46.9 18.6
## 33 15.9 24.2 46.5 18.1
## 34 15.9 24.0 46.6 18.5
## 35 15.8 24.0 46.8 18.9
## 36 16.0 24.2 46.3 18.3
## 37 16.1 24.3 46.2 18.2
## 38 16.2 24.3 46.1 18.0
## 39 16.2 24.4 45.6 17.6
## 40 16.3 24.6 45.2 17.1
## 41 16.4 24.6 45.1 17.0
## 42 16.5 24.5 45.0 17.0
## 43 16.7 24.8 44.3 16.5
## 44 16.8 24.7 44.1 16.5
## 45 16.8 24.6 44.2 16.9
## 46 16.8 24.7 44.1 16.8
## 47 16.9 24.7 44.0 16.8
## 48 17.0 24.7 43.7 16.6
## 49 17.0 24.7 43.6 16.5
## 50 17.0 24.6 43.5 16.5
## 51 17.0 24.5 43.9 16.9
## 52 17.0 24.5 43.9 17.0
## 53 17.3 24.5 43.5 16.7
## 54 17.4 24.5 43.3 16.6
## 55 17.5 24.5 43.0 16.6
## 56 17.6 24.5 42.6 16.3
## 57 17.3 24.2 43.6 17.2
## Gini index of income inequality Mean logarithmic deviation of income Theil
## 1 0.488 0.637 0.440
## 2 0.494 0.634 0.448
## 3 0.488 0.617 0.437
## 4 0.484 0.590 0.432
## 5 0.486 0.616 0.436
## 6 0.482 0.609 0.424
## 7 0.481 0.601 0.426
## 8 0.479 0.596 0.420
## 9 0.480 0.611 0.419
## 10 0.482 0.606 0.428
## 11 0.476 0.578 0.415
## 12 0.477 0.586 0.423
## 13 0.477 0.585 0.422
## 14 0.470 0.574 0.400
## 15 0.468 0.550 0.403
## 16 0.466 0.541 0.398
## 17 0.463 0.532 0.391
## 18 0.470 0.543 0.417
## 19 0.469 0.545 0.411
## 20 0.466 0.543 0.406
## 21 0.464 0.530 0.397
## 22 0.462 0.514 0.398
## 23 0.466 0.515 0.413
## 24 0.462 0.490 0.404
## 25 0.458 0.476 0.386
## 26 0.456 0.488 0.389
## 27 0.459 0.484 0.396
## 28 0.455 0.464 0.389
## 29 0.450 0.452 0.378
## 30 0.456 0.471 0.387
## 31 0.454 0.467 0.385
## 32 0.433 0.417 0.324
## 33 0.428 0.411 0.313
## 34 0.428 0.402 0.317
## 35 0.431 0.406 0.324
## 36 0.426 0.401 0.314
## 37 0.426 0.408 0.314
## 38 0.425 0.416 0.310
## 39 0.419 0.403 0.300
## 40 0.415 0.391 0.290
## 41 0.414 0.397 0.288
## 42 0.412 0.401 0.287
## 43 0.406 0.387 0.277
## 44 0.403 0.375 0.274
## 45 0.404 0.369 0.279
## 46 0.402 0.363 0.275
## 47 0.402 0.364 0.276
## 48 0.398 0.361 0.271
## 49 0.397 0.361 0.270
## 50 0.395 0.352 0.267
## 51 0.400 0.360 0.275
## 52 0.401 0.371 0.279
## 53 0.396 0.370 0.273
## 54 0.394 0.370 0.271
## 55 0.391 0.357 0.268
## 56 0.386 0.352 0.261
## 57 0.397 0.377 0.280
## e=0.25 e=0.50 e=0.75
## 1 0.106 0.207 0.315
## 2 0.108 0.211 0.320
## 3 0.105 0.206 0.313
## 4 0.104 0.203 0.306
## 5 0.105 0.205 0.311
## 6 0.103 0.202 0.307
## 7 0.103 0.201 0.305
## 8 0.101 0.199 0.303
## 9 0.102 0.200 0.307
## 10 0.103 0.202 0.307
## 11 0.100 0.196 0.298
## 12 0.101 0.198 0.300
## 13 0.101 0.198 0.300
## 14 0.097 0.191 0.293
## 15 0.097 0.190 0.288
## 16 0.096 0.188 0.285
## 17 0.095 0.185 0.281
## 18 0.099 0.192 0.289
## 19 0.098 0.192 0.289
## 20 0.097 0.190 0.286
## 21 0.095 0.187 0.283
## 22 0.095 0.186 0.279
## 23 0.098 0.189 0.282
## 24 0.096 0.185 0.275
## 25 0.092 0.180 0.268
## 26 0.093 0.181 0.271
## 27 0.094 0.183 0.272
## 28 0.093 0.179 0.266
## 29 0.090 0.175 0.261
## 30 0.092 0.179 0.268
## 31 0.092 0.178 0.266
## 32 0.080 0.160 0.243
## 33 0.078 0.156 0.237
## 34 0.078 0.156 0.236
## 35 0.080 0.158 0.239
## 36 0.078 0.155 0.236
## 37 0.078 0.155 0.237
## 38 0.077 0.155 0.237
## 39 0.075 0.151 0.231
## 40 0.073 0.147 0.225
## 41 0.072 0.147 0.226
## 42 0.072 0.146 0.226
## 43 0.070 0.141 0.220
## 44 0.069 0.140 0.216
## 45 0.070 0.141 0.216
## 46 0.069 0.139 0.213
## 47 0.069 0.139 0.213
## 48 0.068 0.137 0.211
## 49 0.067 0.136 0.210
## 50 0.067 0.134 0.207
## 51 0.069 0.139 0.213
## 52 0.070 0.140 0.216
## 53 0.068 0.138 0.214
## 54 0.068 0.138 0.214
## 55 0.067 0.135 0.209
## 56 0.065 0.133 0.206
## 57 0.070 0.141 0.218
str(df_long)
## gropd_df [342 × 17] (S3: grouped_df/tbl_df/tbl/data.frame)
## $ Year : num [1:342] 1967 1967 1967 1967 1967 ...
## $ Lowest quintile $ : num [1:342] 11550 11550 11550 11550 11550 ...
## $ Second quintile $ : num [1:342] 32010 32010 32010 32010 32010 ...
## $ Third quintile $ : num [1:342] 51100 51100 51100 51100 51100 ...
## $ Fourth quintile $ : num [1:342] 71500 71500 71500 71500 71500 71500 74400 74400 74400 74400 ...
## $ Highest quintile $ : num [1:342] 128700 128700 128700 128700 128700 ...
## $ Top 5 percent $ : num [1:342] 203000 203000 203000 203000 203000 ...
## $ Gini index of income inequality : chr [1:342] "0.397" "0.397" "0.397" "0.397" ...
## $ Mean logarithmic deviation of income: chr [1:342] "0.377" "0.377" "0.377" "0.377" ...
## $ Theil : chr [1:342] "0.280" "0.280" "0.280" "0.280" ...
## $ e=0.25 : chr [1:342] "0.070" "0.070" "0.070" "0.070" ...
## $ e=0.50 : chr [1:342] "0.141" "0.141" "0.141" "0.141" ...
## $ e=0.75 : chr [1:342] "0.218" "0.218" "0.218" "0.218" ...
## $ Quintile : chr [1:342] "Fourth quintile %" "Highest quintile %" "Lowest quintile %" "Second quintile %" ...
## $ Wealth Distribution Percentage : num [1:342] 0.242 0.436 0.04 0.108 0.173 0.172 0.245 0.426 0.042 0.111 ...
## $ Change : num [1:342] NA NA NA NA NA ...
## $ hover_text : chr [1:342] "Year: 1967<br>Quintile: Fourth quintile %<br>Wealth Distribution Percentage: 24.20%<br>Change from previous year: NA" "Year: 1967<br>Quintile: Highest quintile %<br>Wealth Distribution Percentage: 43.60%<br>Change from previous year: NA" "Year: 1967<br>Quintile: Lowest quintile %<br>Wealth Distribution Percentage: 4.00%<br>Change from previous year: NA" "Year: 1967<br>Quintile: Second quintile %<br>Wealth Distribution Percentage: 10.80%<br>Change from previous year: NA" ...
## - attr(*, "groups")= tibble [6 × 2] (S3: tbl_df/tbl/data.frame)
## ..$ Quintile: chr [1:6] "Fourth quintile %" "Highest quintile %" "Lowest quintile %" "Second quintile %" ...
## ..$ .rows : list<int> [1:6]
## .. ..$ : int [1:57] 1 7 13 19 25 31 37 43 49 55 ...
## .. ..$ : int [1:57] 2 8 14 20 26 32 38 44 50 56 ...
## .. ..$ : int [1:57] 3 9 15 21 27 33 39 45 51 57 ...
## .. ..$ : int [1:57] 4 10 16 22 28 34 40 46 52 58 ...
## .. ..$ : int [1:57] 5 11 17 23 29 35 41 47 53 59 ...
## .. ..$ : int [1:57] 6 12 18 24 30 36 42 48 54 60 ...
## .. ..@ ptype: int(0)
## ..- attr(*, ".drop")= logi TRUE
My Animation Gini Ratio Overtime
# Load required libraries
library(gganimate)
# Convert the 'Gini index of income inequality' column to numeric if it's not already
df$`Gini index of income inequality` <- as.numeric(as.character(df$`Gini index of income inequality`))
# Create the plot
p <- ggplot(df, aes(x = Year, y = `Gini index of income inequality`, group = 1)) +
geom_line(size = 1.2, color = "#0073C2FF") +
geom_point(size = 2, color = "#0073C2FF") +
labs(title = 'Gini Index of Income Inequality by Year', y = 'Gini Index', x = 'Year') +
theme_minimal() +
ylim(0.35, 0.5) # Make sure this range includes all your data points
# Animate the plot
animated_plot <- p +
transition_time(Year) +
shadow_mark(past = TRUE, size = 1, color = 'blue') +
ease_aes('linear')
# Save the animation
anim_save("gini_ratio_animation.gif", animation = animated_plot, fps = 10, end_pause = 50, width = 800, height = 600)
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
My Animation Quantile Wealth Distribution Overtime
# Create the plot
p <- ggplot(df_long, aes(x = Year, y = `Wealth Distribution Percentage`, color = Quintile, group = Quintile)) +
geom_line(size = 1.2) +
geom_point(size = 1.5) +
scale_color_viridis(discrete = TRUE, option = "D") +
labs(title = 'Wealth Distribution by Quintile',
subtitle = "Year-over-year change in wealth distribution percentages",
x = "Year", y = "Wealth Distribution Percentage",
color = "Quintile") +
theme_minimal(base_size = 14) +
theme(legend.position = "bottom",
plot.title = element_text(face = "bold", size = 16),
plot.subtitle = element_text(size = 14)) +
scale_y_continuous(labels = scales::label_percent(), limits = c(0, .55))
# Animate the plot with shadow lines that match the color of the dots
anim <- p + transition_time(Year) +
shadow_mark(past = TRUE, size = 0.8) + # Maintain the color of the Quintile
ease_aes('linear')
# Save the animation
anim_save("wealth_distribution_animation.gif", animation = anim, nframes = 200, duration = 20, end_pause = 50)
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
My goal now is to create a dynamic treemap to visualize the proportion of wealth held by different quintiles at a glance. A treemap can efficiently represent the hierarchical structure of wealth distribution, with each block’s size reflecting the wealth share of a particular quintile. This visualization will provide a clear and immediate understanding of the relative wealth sizes and changes over time. By comparing treemaps from 1969 and 2022, we can visually observe the shifts in wealth concentration, making it easier to communicate the complex data story of wealth disparity and the middle class’s changing economic status.
The process begins with data preparation, ensuring the dataset is clean and ready for visualization. Initial steps include using head(df, n = 10) to preview the dataset and summarizing NA counts across each column to identify and address missing values. Checking the structure of the data frame with str(pdf) and ensuring that all percentage columns are correctly formatted as numerical values are crucial steps in data cleaning. Once the dataset is in good shape, I will select the necessary columns and reshape the data with pivot_longer, preparing it for the treemap. For the treemap of the year 2022, I will filter the data for that specific year, create the visualization with plot_ly, using the type = “treemap” parameter, and apply the Viridis color scale for aesthetic coherence and clarity. This treemap will serve as a powerful snapshot of the current state of wealth distribution, illustrating the percentage of wealth each quintile holds, and will be a pivotal part of the analysis and discussion around the shifts in wealth distribution over more than five decades.
# Counting NAs in each column
na_count <- df %>% summarize(across(everything(), ~ sum(is.na(.))))
# Print the count of NAs
print(na_count)
## Year Lowest quintile $ Second quintile $ Third quintile $ Fourth quintile $
## 1 0 0 0 0 0
## Highest quintile $ Top 5 percent $ Lowest quintile % Second quintile %
## 1 0 0 0 0
## Third quintile % Fourth quintile % Highest quintile % Top 5 percent %
## 1 0 0 0 0
## Gini index of income inequality Mean logarithmic deviation of income Theil
## 1 0 0 0
## e=0.25 e=0.50 e=0.75
## 1 0 0 0
# Prepare the data
df1 <- df %>%
select(Year, `Lowest quintile %`, `Second quintile %`, `Third quintile %`, `Fourth quintile %`, `Highest quintile %`, `Top 5 percent %`) %>%
pivot_longer(-Year, names_to = "Quintile", values_to = "Percentage") %>%
mutate(Percentage = as.numeric(Percentage))
# Counting NAs in each column
na_count <- df1 %>% summarize(across(everything(), ~ sum(is.na(.))))
# Print the count of NAs
print(na_count)
## # A tibble: 1 × 3
## Year Quintile Percentage
## <int> <int> <int>
## 1 0 0 0
# Checking the structure of the dataframe to see data types
str(df)
## 'data.frame': 57 obs. of 19 variables:
## $ Year : num 2022 2021 2020 2019 2018 ...
## $ Lowest quintile $ : num 16120 16020 16530 17410 15920 ...
## $ Second quintile $ : num 43850 44240 44800 46300 43110 ...
## $ Third quintile $ : num 74730 76430 76920 78520 73480 ...
## $ Fourth quintile $ : num 119900 124500 124300 126600 117400 ...
## $ Highest quintile $ : num 277300 290400 286800 289800 270300 ...
## $ Top 5 percent $ : num 499900 517800 504500 513800 481400 ...
## $ Lowest quintile % : chr "3.0" "2.9" "3.0" "3.1" ...
## $ Second quintile % : chr "8.2" "8.0" "8.2" "8.3" ...
## $ Third quintile % : chr "14.0" "13.9" "14.0" "14.1" ...
## $ Fourth quintile % : chr "22.5" "22.6" "22.6" "22.7" ...
## $ Highest quintile % : chr "52.1" "52.7" "52.2" "51.9" ...
## $ Top 5 percent % : chr "23.5" "23.5" "23.0" "23.0" ...
## $ Gini index of income inequality : num 0.488 0.494 0.488 0.484 0.486 0.482 0.481 0.479 0.48 0.482 ...
## $ Mean logarithmic deviation of income: chr "0.637" "0.634" "0.617" "0.590" ...
## $ Theil : chr "0.440" "0.448" "0.437" "0.432" ...
## $ e=0.25 : chr "0.106" "0.108" "0.105" "0.104" ...
## $ e=0.50 : chr "0.207" "0.211" "0.206" "0.203" ...
## $ e=0.75 : chr "0.315" "0.320" "0.313" "0.306" ...
# Checking the class of the Percentage column specifically
class(df$`Percentage`)
## [1] "NULL"
# Checking the structure of the dataframe to see data types
str(df1)
## tibble [342 × 3] (S3: tbl_df/tbl/data.frame)
## $ Year : num [1:342] 2022 2022 2022 2022 2022 ...
## $ Quintile : chr [1:342] "Lowest quintile %" "Second quintile %" "Third quintile %" "Fourth quintile %" ...
## $ Percentage: num [1:342] 3 8.2 14 22.5 52.1 23.5 2.9 8 13.9 22.6 ...
# Checking the class of the Percentage column specifically
class(df1$`Percentage`)
## [1] "numeric"
# Filter the data for the year 2022
df_2022 <- df1 %>% filter(Year == 2022)
# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
type = "treemap",
labels = ~Quintile,
parents = ~Year,
values = ~Percentage,
textinfo = "label+value",
hoverinfo = 'text',
hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
layout(colorway = viridis_pal(option = "D")(7)) # Set Viridis color scale
# Show the treemap
p
# Filter the data for the year 2021
df_2022 <- df1 %>% filter(Year == 2021)
# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
type = "treemap",
labels = ~Quintile,
parents = ~Year,
values = ~Percentage,
textinfo = "label+value",
hoverinfo = 'text',
hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
layout(colorway = viridis_pal(option = "D")(7)) # Set Viridis color scale
# Show the treemap
p
# Filter the data for the year 2020
df_2022 <- df1 %>% filter(Year == 2020)
# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
type = "treemap",
labels = ~Quintile,
parents = ~Year,
values = ~Percentage,
textinfo = "label+value",
hoverinfo = 'text',
hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
layout(colorway = viridis_pal(option = "D")(7)) # Set Viridis color scale
# Show the treemap
p
# Filter the data for the year 2019
df_2022 <- df1 %>% filter(Year == 2019)
# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
type = "treemap",
labels = ~Quintile,
parents = ~Year,
values = ~Percentage,
textinfo = "label+value",
hoverinfo = 'text',
hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
layout(colorway = viridis_pal(option = "D")(7)) # Set Viridis color scale
# Show the treemap
p
# Filter the data for the year 2018
df_2022 <- df1 %>% filter(Year == 2018)
# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
type = "treemap",
labels = ~Quintile,
parents = ~Year,
values = ~Percentage,
textinfo = "label+value",
hoverinfo = 'text',
hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
layout(colorway = viridis_pal(option = "D")(7)) # Set Viridis color scale
# Show the treemap
p
# Filter the data for the year 2017
df_2022 <- df1 %>% filter(Year == 2017)
# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
type = "treemap",
labels = ~Quintile,
parents = ~Year,
values = ~Percentage,
textinfo = "label+value",
hoverinfo = 'text',
hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
layout(colorway = viridis_pal(option = "D")(7)) # Set Viridis color scale
# Show the treemap
p
# Filter the data for the year 2016
df_2022 <- df1 %>% filter(Year == 2016)
# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
type = "treemap",
labels = ~Quintile,
parents = ~Year,
values = ~Percentage,
textinfo = "label+value",
hoverinfo = 'text',
hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
layout(colorway = viridis_pal(option = "D")(7)) # Set Viridis color scale
# Show the treemap
p
# Filter the data for the year 2015
df_2022 <- df1 %>% filter(Year == 2015)
# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
type = "treemap",
labels = ~Quintile,
parents = ~Year,
values = ~Percentage,
textinfo = "label+value",
hoverinfo = 'text',
hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
layout(colorway = viridis_pal(option = "D")(7)) # Set Viridis color scale
# Show the treemap
p
# Filter the data for the year 2014
df_2022 <- df1 %>% filter(Year == 2014)
# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
type = "treemap",
labels = ~Quintile,
parents = ~Year,
values = ~Percentage,
textinfo = "label+value",
hoverinfo = 'text',
hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
layout(colorway = viridis_pal(option = "D")(7)) # Set Viridis color scale
# Show the treemap
p
# Filter the data for the year 2013
df_2022 <- df1 %>% filter(Year == 2013)
# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
type = "treemap",
labels = ~Quintile,
parents = ~Year,
values = ~Percentage,
textinfo = "label+value",
hoverinfo = 'text',
hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
layout(colorway = viridis_pal(option = "D")(7)) # Set Viridis color scale
# Show the treemap
p
# Filter the data for the year 2012
df_2022 <- df1 %>% filter(Year == 2012)
# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
type = "treemap",
labels = ~Quintile,
parents = ~Year,
values = ~Percentage,
textinfo = "label+value",
hoverinfo = 'text',
hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
layout(colorway = viridis_pal(option = "D")(7)) # Set Viridis color scale
# Show the treemap
p
# Filter the data for the year 2011
df_2022 <- df1 %>% filter(Year == 2011)
# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
type = "treemap",
labels = ~Quintile,
parents = ~Year,
values = ~Percentage,
textinfo = "label+value",
hoverinfo = 'text',
hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
layout(colorway = viridis_pal(option = "D")(7)) # Set Viridis color scale
# Show the treemap
p
# Filter the data for the year 2010
df_2022 <- df1 %>% filter(Year == 2010)
# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
type = "treemap",
labels = ~Quintile,
parents = ~Year,
values = ~Percentage,
textinfo = "label+value",
hoverinfo = 'text',
hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
layout(colorway = viridis_pal(option = "D")(7)) # Set Viridis color scale
# Show the treemap
p
# Filter the data for the year 2009
df_2022 <- df1 %>% filter(Year == 2009)
# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
type = "treemap",
labels = ~Quintile,
parents = ~Year,
values = ~Percentage,
textinfo = "label+value",
hoverinfo = 'text',
hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
layout(colorway = viridis_pal(option = "D")(7)) # Set Viridis color scale
# Show the treemap
p
# Filter the data for the year 2008
df_2022 <- df1 %>% filter(Year == 2008)
# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
type = "treemap",
labels = ~Quintile,
parents = ~Year,
values = ~Percentage,
textinfo = "label+value",
hoverinfo = 'text',
hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
layout(colorway = viridis_pal(option = "D")(7)) # Set Viridis color scale
# Show the treemap
p
# Filter the data for the year 2007
df_2022 <- df1 %>% filter(Year == 2007)
# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
type = "treemap",
labels = ~Quintile,
parents = ~Year,
values = ~Percentage,
textinfo = "label+value",
hoverinfo = 'text',
hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
layout(colorway = viridis_pal(option = "D")(7)) # Set Viridis color scale
# Show the treemap
p
# Filter the data for the year 2006
df_2022 <- df1 %>% filter(Year == 2006)
# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
type = "treemap",
labels = ~Quintile,
parents = ~Year,
values = ~Percentage,
textinfo = "label+value",
hoverinfo = 'text',
hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
layout(colorway = viridis_pal(option = "D")(7)) # Set Viridis color scale
# Show the treemap
p
The treemaps presented for the years 2005 to 2022 offer a vivid portrayal of the wealth distribution across different quintiles. In the 2022 treemap, the highest quintile holds more than half of the wealth, represented by the most significant area on the map, a clear visual testament to the wealth concentration at the top. This trend is consistent with the preceding years, with 2019 showing a slightly smaller but dominant share for the highest quintile and the top 5 percent. The consistency of the large area representing the highest quintile across all treemaps underlines the persistence of wealth inequality.
Furthermore, these treemaps illustrate the relatively minimal shares the lower quintiles hold. In 2022, the lowest quintile’s share is a mere 3%, which has not seen any substantial growth in the preceding years. The second and third quintiles combined account for a significantly smaller proportion of the wealth than the fourth quintile, dwarfing the highest quintile’s share. This disparity, barely changing through the years, supports the hypothesis of a shrinking middle class when considering wealth distribution. The middle quintiles, supposedly representative of the middle class, do not exhibit growth in wealth share, indicating that the economic gains over these years have not been equitably distributed across society. These treemaps, therefore, are not just representations of data; they are a stark illustration of the growing economic divide.
head(df1, n = 57)
## # A tibble: 57 × 3
## Year Quintile Percentage
## <dbl> <chr> <dbl>
## 1 2022 Lowest quintile % 3
## 2 2022 Second quintile % 8.2
## 3 2022 Third quintile % 14
## 4 2022 Fourth quintile % 22.5
## 5 2022 Highest quintile % 52.1
## 6 2022 Top 5 percent % 23.5
## 7 2021 Lowest quintile % 2.9
## 8 2021 Second quintile % 8
## 9 2021 Third quintile % 13.9
## 10 2021 Fourth quintile % 22.6
## # ℹ 47 more rows
My Goal was to make an interactive Tree map: but for some reason I was unable to complete this task.
library(shiny)
library(plotly)
library(dplyr)
library(tidyr)
library(viridisLite) # For the viridis color palette
# Prepare the data - select the years 2020 to 2022
df1 <- df %>%
filter(Year >= 2020 & Year <= 2022) %>%
select(Year, `Lowest quintile %`, `Second quintile %`, `Third quintile %`, `Fourth quintile %`, `Highest quintile %`, `Top 5 percent %`) %>%
pivot_longer(-Year, names_to = "Quintile", values_to = "Percentage") %>%
mutate(Percentage = as.numeric(Percentage)) %>%
mutate(Year = as.factor(Year), Quintile = as.factor(Quintile)) # Ensure these are factors
# Define the UI
ui <- fluidPage(
titlePanel("Treemap of Wealth Distribution"),
plotlyOutput("treemapPlot")
)
server <- function(input, output, session) {
# Use this to trigger a breakpoint in the Shiny app and inspect variables
# browser()
output$treemapPlot <- renderPlotly({
# Check if the data frame has rows
if(nrow(df1) == 0) {
stop("Data frame 'df1' is empty. Check your data filtering steps.")
}
# Check the structure of df1
print(str(df1))
# Create the treemap outside of Shiny to see if it renders
p <- plot_ly(
data = df1,
labels = ~Quintile,
values = ~Percentage,
type = 'treemap'
) %>%
layout(title = "Wealth Distribution Over Time")
# Print the plot
print(p)
})
}
# Run the application
#shinyApp(ui = ui, server = server)
Creating an interactive treemap to visualize wealth distribution across different quintiles over time was an ambitious and insightful goal. I envisioned a tool that could dynamically show changes from 1969 to 2022, allowing users to explore the nuances of economic inequality. Using a Shiny app was crucial for interactivity, where a slider could adjust the year and dynamically update the treemap to reflect that period’s data.
The struggles began with the complexity of the task. Implementing an
interactive slider that adjusted the data year by year required a static
visualization and a reactive one that could update with user input. The
R code for generating the treemap using the treemap and plotly packages
was theoretically sound, but the devil was in the details. The error
messages provided clues to the underlying issues. The first challenge
was ensuring the data frame, df, lacked missing values, as these would
break the treemap function. This necessitated meticulous data cleaning,
which often feels like chasing shadows as one missing value resolved can
lead to another issue elsewhere. Another error, “Can’t combine Quintile
The visualization also displayed a single quintile and its percentage value rather than a full spectrum for each year. This suggested an issue with how the data was being grouped or indexed in the treemap function. The labels, quintiles, and corresponding values did not align as intended.
In the face of these challenges and the looming deadline, I had to make a tough decision. After hours of debugging, searching for solutions, and consulting documentation, it became clear that meeting the deadline with a fully interactive treemap was impossible. I had to abandon the interactive element, perhaps settling for a static representation or a more straightforward visualization.