Wankelman Module 12.3

My recent endeavor has been to investigate the hypothesis that the middle class is shrinking in the United States. My primary tool in this analysis has been an extensive examination of census data sets from 1969 to 2022. This data includes crucial information such as the monetary boundaries defining each quantile and the corresponding percentage of wealth each quantile possesses. By meticulously dissecting these data sets, I aimed to uncover trends and patterns that could confirm or refute the notion of a contracting middle class. This longitudinal data is critical, as it allows for observing changes over a significant period, offering a comprehensive view of the economic shifts affecting various income groups.

To illustrate these trends effectively, I utilized line graphs and treemaps in my analysis. Line graphs are handy for showcasing the evolution of income distribution over time, allowing us to observe any significant shifts in the wealth of the middle class relative to other income groups. On the other hand, treemaps offer a more nuanced visual representation of wealth distribution across different quantiles. They provide an intuitive understanding of how the economic landscape has transformed, highlighting the disparities in wealth accumulation among these groups. My ultimate goal is to develop a dynamic treemap within a Shiny application. This interactive platform will enable users to explore the data in real-time, observing how the percentage of wealth held by each quantile has evolved year by year, thus offering a compelling and user-friendly way to engage with the data and understand the economic changes impacting the middle class.

# Load required Libraries
library(ggplot2)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(knitr)
library(plotly)

## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

library(ggrepel)
library(RColorBrewer)
library(reshape2)

## 
## Attaching package: 'reshape2'
## 
## The following object is masked from 'package:tidyr':
## 
##     smiths

library(scales)

## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor

library(dplyr)
library(hexbin)
library(lubridate)
library(geojsonio)

## Registered S3 method overwritten by 'geojsonsf':
##   method        from   
##   print.geojson geojson
## 
## Attaching package: 'geojsonio'
## 
## The following object is masked from 'package:base':
## 
##     pretty

library(sf)

## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE

library(leaflet)
library(shiny)
library(d3Tree)
library(treemap)
library(tidycensus)
library(maps)

## 
## Attaching package: 'maps'
## 
## The following object is masked from 'package:purrr':
## 
##     map

library(tigris)

## To enable caching of data, set `options(tigris_use_cache = TRUE)`
## in your R script or .Rprofile.

library(shinyWidgets)
library(rlang)

## 
## Attaching package: 'rlang'
## 
## The following objects are masked from 'package:purrr':
## 
##     %@%, flatten, flatten_chr, flatten_dbl, flatten_int, flatten_lgl,
##     flatten_raw, invoke, splice

library(viridis)

## Loading required package: viridisLite
## 
## Attaching package: 'viridis'
## 
## The following object is masked from 'package:maps':
## 
##     unemp
## 
## The following object is masked from 'package:scales':
## 
##     viridis_pal

df <- read.csv("~/Desktop/JHU/Semester 1/Data Visualization/Class Project/Module 12/Income Dispersion 1967 to 2022 cleaned.csv")

#Familiarize myself with Data
head(df)

##   Year      Measures.of.income.dispersion               X            X.1
## 1   NA Mean household income of quintiles                               
## 2   NA                                                                  
## 3   NA                    Lowest quintile Second quintile Third quintile
## 4 2022                             16,120          43,850         74,730
## 5 2021                             16,020          44,240         76,430
## 6 2020                             16,530          44,800         76,920
##               X.2              X.3            X.4
## 1                                                
## 2                                                
## 3 Fourth quintile Highest quintile Top 5 percent 
## 4         119,900          277,300        499,900
## 5         124,500          290,400        517,800
## 6         124,300          286,800        504,500
##                                    X.5             X.6            X.7
## 1 Shares of household income quintiles                               
## 2                                                                    
## 3                      Lowest quintile Second quintile Third quintile
## 4                                  3.0             8.2           14.0
## 5                                  2.9             8.0           13.9
## 6                                  3.0             8.2           14.0
##               X.8              X.9           X.10
## 1                                                
## 2                                                
## 3 Fourth quintile Highest quintile Top 5 percent 
## 4            22.5             52.1           23.5
## 5            22.6             52.7           23.5
## 6            22.6             52.2           23.0
##                              X.11                                 X.12  X.13
## 1                Summary measures                                           
## 2 Gini index of income inequality Mean logarithmic deviation of income Theil
## 3                                                                           
## 4                           0.488                                0.637 0.440
## 5                           0.494                                0.634 0.448
## 6                           0.488                                0.617 0.437
##       X.14   X.15   X.16
## 1                       
## 2 Atkinson              
## 3   e=0.25 e=0.50 e=0.75
## 4    0.106  0.207  0.315
## 5    0.108  0.211  0.320
## 6    0.105  0.206  0.313

#remove 1st two rows in the header
df <- df[-c(1, 2), ]

When I first loaded the dataset from the CSV, it presented a challenge due to its disorganized format and extraneous columns. The initial structure included a mix of numeric and character types, with multiple columns labeled generically as ‘X,’ ‘X.1’, ‘X.2’, and so forth, making it difficult to decipher the data meaningfully. It was evident that considerable cleaning and reformatting was necessary to make this dataset usable for my analysis.

To address these issues, I implemented a series of steps in R. First, I set the third row, which contained the relevant headers, as the new header for the dataset. This involved storing the third row as ‘new_header’ and removing the first few rows containing incorrect headers and the row used for the new headers. Subsequently, I assigned these new headers as column names, reset the row names to align correctly, and checked the first few rows to ensure the changes were correctly applied. Continuing the cleanup process, I meticulously renamed each column to reflect the data it contained accurately. This step was crucial for clarity and ease of analysis. Columns were renamed to represent different income quantiles in dollars and percentages and summary measures such as the Gini index of income inequality, mean logarithmic deviation of income, and the Theil index. Through these steps, I transformed a disordered and challenging dataset into a structured and analyzable format, setting the stage for my investigation into the state of the middle class.

head(df)

##   Year Measures.of.income.dispersion               X            X.1
## 3   NA               Lowest quintile Second quintile Third quintile
## 4 2022                        16,120          43,850         74,730
## 5 2021                        16,020          44,240         76,430
## 6 2020                        16,530          44,800         76,920
## 7 2019                        17,410          46,300         78,520
## 8 2018                        15,920          43,110         73,480
##               X.2              X.3            X.4             X.5
## 3 Fourth quintile Highest quintile Top 5 percent  Lowest quintile
## 4         119,900          277,300        499,900             3.0
## 5         124,500          290,400        517,800             2.9
## 6         124,300          286,800        504,500             3.0
## 7         126,600          289,800        513,800             3.1
## 8         117,400          270,300        481,400             3.1
##               X.6            X.7             X.8              X.9
## 3 Second quintile Third quintile Fourth quintile Highest quintile
## 4             8.2           14.0            22.5             52.1
## 5             8.0           13.9            22.6             52.7
## 6             8.2           14.0            22.6             52.2
## 7             8.3           14.1            22.7             51.9
## 8             8.3           14.1            22.6             52.0
##             X.10  X.11  X.12  X.13   X.14   X.15   X.16
## 3 Top 5 percent                    e=0.25 e=0.50 e=0.75
## 4           23.5 0.488 0.637 0.440  0.106  0.207  0.315
## 5           23.5 0.494 0.634 0.448  0.108  0.211  0.320
## 6           23.0 0.488 0.617 0.437  0.105  0.206  0.313
## 7           23.0 0.484 0.590 0.432  0.104  0.203  0.306
## 8           23.1 0.486 0.616 0.436  0.105  0.205  0.311

# Store the third row as the new header
new_header <- as.character(df[1,])

# Remove the first row of the dataframe (which includes the incorrect headers and the row used for the new headers)
df <- df[-(1:1), ]

# Set the new headers as column names
colnames(df) <- new_header

# Reset the row names in case they are now out of sync
rownames(df) <- NULL

# Check the first few rows to ensure changes have been applied
head(df)

##     NA Lowest quintile Second quintile Third quintile Fourth quintile
## 1 2022          16,120          43,850         74,730         119,900
## 2 2021          16,020          44,240         76,430         124,500
## 3 2020          16,530          44,800         76,920         124,300
## 4 2019          17,410          46,300         78,520         126,600
## 5 2018          15,920          43,110         73,480         117,400
## 6 2017          15,630          41,740         72,590         116,800
##   Highest quintile Top 5 percent  Lowest quintile Second quintile
## 1          277,300        499,900             3.0             8.2
## 2          290,400        517,800             2.9             8.0
## 3          286,800        504,500             3.0             8.2
## 4          289,800        513,800             3.1             8.3
## 5          270,300        481,400             3.1             8.3
## 6          261,600        454,300             3.1             8.2
##   Third quintile Fourth quintile Highest quintile Top 5 percent             
## 1           14.0            22.5             52.1           23.5 0.488 0.637
## 2           13.9            22.6             52.7           23.5 0.494 0.634
## 3           14.0            22.6             52.2           23.0 0.488 0.617
## 4           14.1            22.7             51.9           23.0 0.484 0.590
## 5           14.1            22.6             52.0           23.1 0.486 0.616
## 6           14.3            23.0             51.5           22.3 0.482 0.609
##         e=0.25 e=0.50 e=0.75
## 1 0.440  0.106  0.207  0.315
## 2 0.448  0.108  0.211  0.320
## 3 0.437  0.105  0.206  0.313
## 4 0.432  0.104  0.203  0.306
## 5 0.436  0.105  0.205  0.311
## 6 0.424  0.103  0.202  0.307

# Set the name of the columns"
colnames(df)[1] <- "Year"
colnames(df)[2] <- "Lowest quintile $"
colnames(df)[3] <- "Second quintile $"
colnames(df)[4] <- "Third quintile $"
colnames(df)[5] <- "Fourth quintile $"
colnames(df)[6] <- "Highest quintile $"
colnames(df)[7] <- "Top 5 percent $"
colnames(df)[8] <- "Lowest quintile %"
colnames(df)[9] <- "Second quintile %"
colnames(df)[10] <- "Third quintile %"
colnames(df)[11] <- "Fourth quintile %"
colnames(df)[12] <- "Highest quintile %"
colnames(df)[13] <- "Top 5 percent %"
head(df)

##   Year Lowest quintile $ Second quintile $ Third quintile $ Fourth quintile $
## 1 2022            16,120            43,850           74,730           119,900
## 2 2021            16,020            44,240           76,430           124,500
## 3 2020            16,530            44,800           76,920           124,300
## 4 2019            17,410            46,300           78,520           126,600
## 5 2018            15,920            43,110           73,480           117,400
## 6 2017            15,630            41,740           72,590           116,800
##   Highest quintile $ Top 5 percent $ Lowest quintile % Second quintile %
## 1            277,300         499,900               3.0               8.2
## 2            290,400         517,800               2.9               8.0
## 3            286,800         504,500               3.0               8.2
## 4            289,800         513,800               3.1               8.3
## 5            270,300         481,400               3.1               8.3
## 6            261,600         454,300               3.1               8.2
##   Third quintile % Fourth quintile % Highest quintile % Top 5 percent %      
## 1             14.0              22.5               52.1            23.5 0.488
## 2             13.9              22.6               52.7            23.5 0.494
## 3             14.0              22.6               52.2            23.0 0.488
## 4             14.1              22.7               51.9            23.0 0.484
## 5             14.1              22.6               52.0            23.1 0.486
## 6             14.3              23.0               51.5            22.3 0.482
##               e=0.25 e=0.50 e=0.75
## 1 0.637 0.440  0.106  0.207  0.315
## 2 0.634 0.448  0.108  0.211  0.320
## 3 0.617 0.437  0.105  0.206  0.313
## 4 0.590 0.432  0.104  0.203  0.306
## 5 0.616 0.436  0.105  0.205  0.311
## 6 0.609 0.424  0.103  0.202  0.307

# Set the name of the remaining columns"
colnames(df)[14] <- "Gini index of income inequality"
colnames(df)[15] <- "Mean logarithmic deviation of income"
colnames(df)[16] <- "Theil"
head(df)

##   Year Lowest quintile $ Second quintile $ Third quintile $ Fourth quintile $
## 1 2022            16,120            43,850           74,730           119,900
## 2 2021            16,020            44,240           76,430           124,500
## 3 2020            16,530            44,800           76,920           124,300
## 4 2019            17,410            46,300           78,520           126,600
## 5 2018            15,920            43,110           73,480           117,400
## 6 2017            15,630            41,740           72,590           116,800
##   Highest quintile $ Top 5 percent $ Lowest quintile % Second quintile %
## 1            277,300         499,900               3.0               8.2
## 2            290,400         517,800               2.9               8.0
## 3            286,800         504,500               3.0               8.2
## 4            289,800         513,800               3.1               8.3
## 5            270,300         481,400               3.1               8.3
## 6            261,600         454,300               3.1               8.2
##   Third quintile % Fourth quintile % Highest quintile % Top 5 percent %
## 1             14.0              22.5               52.1            23.5
## 2             13.9              22.6               52.7            23.5
## 3             14.0              22.6               52.2            23.0
## 4             14.1              22.7               51.9            23.0
## 5             14.1              22.6               52.0            23.1
## 6             14.3              23.0               51.5            22.3
##   Gini index of income inequality Mean logarithmic deviation of income Theil
## 1                           0.488                                0.637 0.440
## 2                           0.494                                0.634 0.448
## 3                           0.488                                0.617 0.437
## 4                           0.484                                0.590 0.432
## 5                           0.486                                0.616 0.436
## 6                           0.482                                0.609 0.424
##   e=0.25 e=0.50 e=0.75
## 1  0.106  0.207  0.315
## 2  0.108  0.211  0.320
## 3  0.105  0.206  0.313
## 4  0.104  0.203  0.306
## 5  0.105  0.205  0.311
## 6  0.103  0.202  0.307

I aim to transform and visualize the income data across different quintiles over time. Firstly, I ensure that the ‘Year’ column is correctly formatted as a numerical value, and then I convert the income columns to numeric, stripping any commas for accurate numerical representation. By using tidyr::pivot_longer, I reshape the data from a wide format to a long format, which is more suitable for the ggplot2 library in R, allowing me to create a visually appealing and informative line graph. This graph, enhanced with the viridis color palette for clear differentiation between quintiles, is designed to display trends in mean household income by income quintile over the years. Finally, I convert the static ggplot into an interactive plotly object, which enhances the user experience by enabling interactive exploration of the data points through tooltips.

# Create income column
income_columns <- c("Lowest quintile $", "Second quintile $", "Third quintile $", "Fourth quintile $", "Highest quintile $", "Top 5 percent $")

# Ensuring that the Year column is treated as a numerical value if it does not contain any non-numeric characters
df$Year <- as.numeric(gsub("[^0-9]", "", df$Year))

#Use the actual column names.
df[income_columns] <- lapply(df[income_columns], function(x) as.numeric(gsub(",", "", as.character(x))))


# Use tidyr::pivot_longer instead of gather
df_long <- df %>%
  pivot_longer(cols = all_of(income_columns), names_to = "Quintile", values_to = "Income")

# Plot the data with ggplot2 using the viridis color palette
p <- ggplot(df_long, aes(x = Year, y = Income, color = Quintile, group = Quintile)) +
  geom_line(size = 1.2) + 
  geom_point(size = 1.5) +  
  scale_color_viridis(discrete = TRUE, option = "D") +
  labs(title = "Income Trend Analysis by Quintile",
       subtitle = "Trends in mean household income across different income quintiles",
       x = "Year",
       y = "Mean Household Income",
       color = "Income Quintile") +
  theme_minimal(base_size = 14) +
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold", size = 16),
        plot.subtitle = element_text(size = 14)) +
  scale_y_continuous(labels = scales::dollar_format(prefix = "$", suffix = "K", scale = 1e-3))

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

# Convert to an interactive plotly object
ggplotly(p, tooltip = c("y", "x"))

Analyzing the “Income Trend Analysis by Quintile chart,” I can depict mean household income trends across various income quintiles from 1969 to 2022. The chart shows that the highest quintile and the top 5 percent have experienced significant income growth over this period, which is indicated by the steep upward trajectory of their lines. In contrast, the lowest three quintiles have seen only modest increases in mean household income, as their lines are relatively flat, suggesting that income growth has been disproportionately concentrated at the top, which could indicate widening income inequality and potentially a shrinking middle class.

The chart reveals a concerning trend where income growth is stagnating for the lower and middle quintiles and becoming more disparate over time, with the gap between the top earners and the rest widening. The highest quintile’s and the top 5 percent’s lines ascend more sharply and diverge further from the rest as time progresses, highlighting an economy increasingly skewed towards the wealthy. This divergence is particularly evident from the early 2000s onwards, coinciding with economic policies and market changes that may have favored capital over labor, technology over traditional industries, and investment income over wages. The relatively stagnant growth lines of the lower quintiles indicate that despite overall economic growth, the benefits have not been evenly distributed, which points towards a shrinking middle class, as their share of the pie is not keeping pace with the upper quintiles. This visual evidence aligns with the data I’ve been analyzing, where the Gini coefficient, a measure of income inequality, has generally increased, reinforcing the suggestion that the economic growth experienced over the last few decades has not been equitably shared.

# Create a wealth_percentage_columns as a vector containing the names of columns with percentages
wealth_percentage_columns <- names(df)[grepl("%", names(df))]

# Convert percentage columns to numeric
df_long <- df %>%
  pivot_longer(cols = all_of(wealth_percentage_columns), names_to = "Quintile", values_to = "Wealth Distribution Percentage") %>%
  mutate(`Wealth Distribution Percentage` = as.numeric(`Wealth Distribution Percentage`)) %>% # Convert to numeric
  mutate(`Wealth Distribution Percentage` = `Wealth Distribution Percentage` / 100) # Then divide by 100

# Calculate year-over-year change for each quintile
df_long <- df_long %>%
  arrange(Year, Quintile) %>%
  group_by(Quintile) %>%  # Group only by Quintile to calculate the change year-over-year
  mutate(Change = `Wealth Distribution Percentage` - lag(`Wealth Distribution Percentage`))

# Create custom hover text including the year and the percentage change
df_long <- df_long %>%
  mutate(hover_text = paste0(
    "Year: ", Year, "<br>",
    "Quintile: ", Quintile, "<br>",
    "Wealth Distribution Percentage: ", scales::percent(`Wealth Distribution Percentage`), "<br>",
    "Change from previous year: ", 
    ifelse(is.na(Change), "NA", paste0(sprintf("%.2f%%", Change * 100)))
  ))

# Plot the data with ggplot2
p <- ggplot(df_long, aes(x = Year, y = `Wealth Distribution Percentage`, text = hover_text, color = Quintile, group = Quintile)) +
  geom_line(size = 1.2) +
  geom_point(size = 1.5) +
  scale_color_viridis(discrete = TRUE, option = "D") +
  labs(title = "Wealth Distribution Trend Analysis by Quintile",
       subtitle = "Year-over-year change in wealth distribution percentages",
       x = "Year",
       y = "Wealth Distribution Percentage",
       color = "Quintile") +
  theme_minimal(base_size = 14) +
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold", size = 16),
        plot.subtitle = element_text(size = 14)) +
  scale_y_continuous(labels = scales::label_percent(), limits = c(0, 0.55)) # Correct y-axis scale

# Convert to an interactive plotly object with custom hover text
ggplotly(p, tooltip = "hover_text")

The “Wealth Distribution Trend Analysis by Quintile” chart provides a stark visual representation of wealth disparity across different income quintiles over time. It is immediately apparent that the highest quintile, along with the top 5 percent, has a growing share of wealth, with the highest quintile’s share surpassing 50% and the top 5 percent’s share significantly higher than any of the other lower quintiles. The lower and middle quintiles, represented by the lowest three lines, show a relatively flat, if not slightly decreasing trend, indicating that their share of total wealth has generally stayed the same over time if not decreased slightly.

What stands out is the apparent slight decrease in the lower quintiles’ wealth share, which suggests that there has been little to no relative improvement in their wealth position despite the overall economic growth. This stability contrasts sharply with the dynamic growth of the highest quintile, reinforcing the notion of an economic environment that increasingly favors the accumulation of wealth at the top. The incremental year-over-year changes, as indicated by the code’s calculation of percentage changes, would reveal the nuances of these trends, but the chart suggests that any such changes have not altered the overarching pattern of wealth concentration.

Given this analysis, the chart corroborates the hypothesis of a shrinking middle class in terms of wealth share. It illustrates that while the wealth of the highest earners continues to grow, the middle class needs to see a proportionate increase in wealth, which could result in financial instability and decreased economic mobility for middle and lower-income earners. This unchecked trend may lead to further economic stratification and a hollowing out of the middle class.

head(df, n = 57)

##    Year Lowest quintile $ Second quintile $ Third quintile $ Fourth quintile $
## 1  2022             16120             43850            74730            119900
## 2  2021             16020             44240            76430            124500
## 3  2020             16530             44800            76920            124300
## 4  2019             17410             46300            78520            126600
## 5  2018             15920             43110            73480            117400
## 6  2017             15630             41740            72590            116800
## 7  2016             15530             41400            70970            114200
## 8  2015             15080             39500            68790            111400
## 9  2014             14120             37600            65370            106200
## 10 2013             14220             37800            65930            106100
## 11 2013             14290             37430            64190            102500
## 12 2012             14270             36870            63550            101900
## 13 2011             14220             36960            63080            101400
## 14 2010             14350             37230            64160            102900
## 15 2009             15280             38700            65530            104100
## 16 2008             15360             38890            66050            105100
## 17 2007             15780             40210            68250            108100
## 18 2006             15900             40310            67550            106900
## 19 2005             15360             39440            66740            105000
## 20 2004             15200             38880            65870            103900
## 21 2003             15200             39040            66270            104900
## 22 2002             15510             39420            66430            104500
## 23 2001             15930             40020            66990            105000
## 24 2000             16320             40750            67860            105500
## 25 1999             16480             40470            67740            105400
## 26 1998             15650             39510            66120            102300
## 27 1997             15200             38010            63940             99030
## 28 1996             15100             37060            62340             96480
## 29 1995             15050             36780            61500             94530
## 30 1994             14240             35480            59770             93020
## 31 1993             13880             35190            58980             91660
## 32 1992             14030             35140            59200             90880
## 33 1991             14320             35970            59750             91080
## 34 1990             14720             37030            61170             92220
## 35 1989             15060             37480            62300             94230
## 36 1988             14530             36690            61360             92750
## 37 1987             14300             36330            60750             91820
## 38 1986             13920             35800            59910             90250
## 39 1985             13750             34860            57860             87030
## 40 1984             13770             34300            56840             85580
## 41 1983             13340             33500            55390             83120
## 42 1982             13170             33330            55180             82060
## 43 1981             13420             33430            55390             82560
## 44 1980             13770             34270            56560             83320
## 45 1979             14210             35350            58280             85530
## 46 1978             14310             35150            58040             85020
## 47 1977             13840             34060            56350             82560
## 48 1976             13930             34090            56070             81420
## 49 1975             13590             33370            54750             79550
## 50 1974             14080             34990            56390             81440
## 51 1973             14110             35470            58180             83690
## 52 1972             13460             34770            56740             81460
## 53 1971             12750             33690            54620             77650
## 54 1970             12650             34280            55200             77810
## 55 1969             12870             34750            55530             77780
## 56 1968             12540             33630            53340             74400
## 57 1967             11550             32010            51100             71500
##    Highest quintile $ Top 5 percent $ Lowest quintile % Second quintile %
## 1              277300          499900               3.0               8.2
## 2              290400          517800               2.9               8.0
## 3              286800          504500               3.0               8.2
## 4              289800          513800               3.1               8.3
## 5              270300          481400               3.1               8.3
## 6              261600          454300               3.1               8.2
## 7              256700          450100               3.1               8.3
## 8              245000          424700               3.1               8.2
## 9              234700          402000               3.1               8.2
## 10             237200          410300               3.1               8.2
## 11             227200          395400               3.2               8.4
## 12             225900          394900               3.2               8.3
## 13             225300          394200               3.2               8.4
## 14             221000          374800               3.3               8.5
## 15             226000          390800               3.4               8.6
## 16             225400          388300               3.4               8.6
## 17             229400          392300               3.4               8.7
## 18             235600          416600               3.4               8.6
## 19             230000          405300               3.4               8.6
## 20             224600          391400               3.4               8.7
## 21             223600          385000               3.4               8.7
## 22             223100          389600               3.5               8.8
## 23             229400          409300               3.5               8.7
## 24             228600          405600               3.6               8.9
## 25             224800          390800               3.6               8.9
## 26             216400          377100               3.6               9.0
## 27             211100          370500               3.6               8.9
## 28             202900          353500               3.6               9.0
## 29             197300          340500               3.7               9.1
## 30             195500          337800               3.6               8.9
## 31             191000          327800               3.6               9.0
## 32             176100          279500               3.8               9.4
## 33             174700          272600               3.8               9.6
## 34             179000          285000               3.8               9.6
## 35             184200          297600               3.8               9.5
## 36             177100          279300               3.8               9.6
## 37             174600          275100               3.8               9.6
## 38             170800          267600               3.8               9.7
## 39             162600          250800               3.9               9.8
## 40             157400          237600               4.0               9.9
## 41             152600          230600               4.0               9.9
## 42             150500          227400               4.0              10.0
## 43             147500          219100               4.1              10.1
## 44             148600          222000               4.2              10.2
## 45             153500          234100               4.1              10.2
## 46             151800          231000               4.2              10.2
## 47             147200          225200               4.2              10.2
## 48             144100          219600               4.3              10.3
## 49             140400          213100               4.3              10.4
## 50             144300          219300               4.3              10.6
## 51             149700          230700               4.2              10.4
## 52             146200          226700               4.1              10.4
## 53             137600          211000               4.1              10.6
## 54             137700          211400               4.1              10.8
## 55             136700          210400               4.1              10.9
## 56             129300          197700               4.2              11.1
## 57             128700          203000               4.0              10.8
##    Third quintile % Fourth quintile % Highest quintile % Top 5 percent %
## 1              14.0              22.5               52.1            23.5
## 2              13.9              22.6               52.7            23.5
## 3              14.0              22.6               52.2            23.0
## 4              14.1              22.7               51.9            23.0
## 5              14.1              22.6               52.0            23.1
## 6              14.3              23.0               51.5            22.3
## 7              14.2              22.9               51.5            22.6
## 8              14.3              23.2               51.1            22.1
## 9              14.3              23.2               51.2            21.9
## 10             14.3              23.0               51.4            22.2
## 11             14.4              23.0               51.0            22.2
## 12             14.4              23.0               51.0            22.3
## 13             14.3              23.0               51.1            22.3
## 14             14.6              23.4               50.3            21.3
## 15             14.6              23.2               50.3            21.7
## 16             14.7              23.3               50.0            21.5
## 17             14.8              23.4               49.7            21.2
## 18             14.5              22.9               50.5            22.3
## 19             14.6              23.0               50.4            22.2
## 20             14.7              23.2               50.1            21.8
## 21             14.8              23.4               49.8            21.4
## 22             14.8              23.3               49.7            21.7
## 23             14.6              23.0               50.1            22.4
## 24             14.8              23.0               49.8            22.1
## 25             14.9              23.2               49.4            21.5
## 26             15.0              23.2               49.2            21.4
## 27             15.0              23.2               49.4            21.7
## 28             15.1              23.3               49.0            21.4
## 29             15.2              23.3               48.7            21.0
## 30             15.0              23.4               49.1            21.2
## 31             15.1              23.5               48.9            21.0
## 32             15.8              24.2               46.9            18.6
## 33             15.9              24.2               46.5            18.1
## 34             15.9              24.0               46.6            18.5
## 35             15.8              24.0               46.8            18.9
## 36             16.0              24.2               46.3            18.3
## 37             16.1              24.3               46.2            18.2
## 38             16.2              24.3               46.1            18.0
## 39             16.2              24.4               45.6            17.6
## 40             16.3              24.6               45.2            17.1
## 41             16.4              24.6               45.1            17.0
## 42             16.5              24.5               45.0            17.0
## 43             16.7              24.8               44.3            16.5
## 44             16.8              24.7               44.1            16.5
## 45             16.8              24.6               44.2            16.9
## 46             16.8              24.7               44.1            16.8
## 47             16.9              24.7               44.0            16.8
## 48             17.0              24.7               43.7            16.6
## 49             17.0              24.7               43.6            16.5
## 50             17.0              24.6               43.5            16.5
## 51             17.0              24.5               43.9            16.9
## 52             17.0              24.5               43.9            17.0
## 53             17.3              24.5               43.5            16.7
## 54             17.4              24.5               43.3            16.6
## 55             17.5              24.5               43.0            16.6
## 56             17.6              24.5               42.6            16.3
## 57             17.3              24.2               43.6            17.2
##    Gini index of income inequality Mean logarithmic deviation of income Theil
## 1                            0.488                                0.637 0.440
## 2                            0.494                                0.634 0.448
## 3                            0.488                                0.617 0.437
## 4                            0.484                                0.590 0.432
## 5                            0.486                                0.616 0.436
## 6                            0.482                                0.609 0.424
## 7                            0.481                                0.601 0.426
## 8                            0.479                                0.596 0.420
## 9                            0.480                                0.611 0.419
## 10                           0.482                                0.606 0.428
## 11                           0.476                                0.578 0.415
## 12                           0.477                                0.586 0.423
## 13                           0.477                                0.585 0.422
## 14                           0.470                                0.574 0.400
## 15                           0.468                                0.550 0.403
## 16                           0.466                                0.541 0.398
## 17                           0.463                                0.532 0.391
## 18                           0.470                                0.543 0.417
## 19                           0.469                                0.545 0.411
## 20                           0.466                                0.543 0.406
## 21                           0.464                                0.530 0.397
## 22                           0.462                                0.514 0.398
## 23                           0.466                                0.515 0.413
## 24                           0.462                                0.490 0.404
## 25                           0.458                                0.476 0.386
## 26                           0.456                                0.488 0.389
## 27                           0.459                                0.484 0.396
## 28                           0.455                                0.464 0.389
## 29                           0.450                                0.452 0.378
## 30                           0.456                                0.471 0.387
## 31                           0.454                                0.467 0.385
## 32                           0.433                                0.417 0.324
## 33                           0.428                                0.411 0.313
## 34                           0.428                                0.402 0.317
## 35                           0.431                                0.406 0.324
## 36                           0.426                                0.401 0.314
## 37                           0.426                                0.408 0.314
## 38                           0.425                                0.416 0.310
## 39                           0.419                                0.403 0.300
## 40                           0.415                                0.391 0.290
## 41                           0.414                                0.397 0.288
## 42                           0.412                                0.401 0.287
## 43                           0.406                                0.387 0.277
## 44                           0.403                                0.375 0.274
## 45                           0.404                                0.369 0.279
## 46                           0.402                                0.363 0.275
## 47                           0.402                                0.364 0.276
## 48                           0.398                                0.361 0.271
## 49                           0.397                                0.361 0.270
## 50                           0.395                                0.352 0.267
## 51                           0.400                                0.360 0.275
## 52                           0.401                                0.371 0.279
## 53                           0.396                                0.370 0.273
## 54                           0.394                                0.370 0.271
## 55                           0.391                                0.357 0.268
## 56                           0.386                                0.352 0.261
## 57                           0.397                                0.377 0.280
##    e=0.25 e=0.50 e=0.75
## 1   0.106  0.207  0.315
## 2   0.108  0.211  0.320
## 3   0.105  0.206  0.313
## 4   0.104  0.203  0.306
## 5   0.105  0.205  0.311
## 6   0.103  0.202  0.307
## 7   0.103  0.201  0.305
## 8   0.101  0.199  0.303
## 9   0.102  0.200  0.307
## 10  0.103  0.202  0.307
## 11  0.100  0.196  0.298
## 12  0.101  0.198  0.300
## 13  0.101  0.198  0.300
## 14  0.097  0.191  0.293
## 15  0.097  0.190  0.288
## 16  0.096  0.188  0.285
## 17  0.095  0.185  0.281
## 18  0.099  0.192  0.289
## 19  0.098  0.192  0.289
## 20  0.097  0.190  0.286
## 21  0.095  0.187  0.283
## 22  0.095  0.186  0.279
## 23  0.098  0.189  0.282
## 24  0.096  0.185  0.275
## 25  0.092  0.180  0.268
## 26  0.093  0.181  0.271
## 27  0.094  0.183  0.272
## 28  0.093  0.179  0.266
## 29  0.090  0.175  0.261
## 30  0.092  0.179  0.268
## 31  0.092  0.178  0.266
## 32  0.080  0.160  0.243
## 33  0.078  0.156  0.237
## 34  0.078  0.156  0.236
## 35  0.080  0.158  0.239
## 36  0.078  0.155  0.236
## 37  0.078  0.155  0.237
## 38  0.077  0.155  0.237
## 39  0.075  0.151  0.231
## 40  0.073  0.147  0.225
## 41  0.072  0.147  0.226
## 42  0.072  0.146  0.226
## 43  0.070  0.141  0.220
## 44  0.069  0.140  0.216
## 45  0.070  0.141  0.216
## 46  0.069  0.139  0.213
## 47  0.069  0.139  0.213
## 48  0.068  0.137  0.211
## 49  0.067  0.136  0.210
## 50  0.067  0.134  0.207
## 51  0.069  0.139  0.213
## 52  0.070  0.140  0.216
## 53  0.068  0.138  0.214
## 54  0.068  0.138  0.214
## 55  0.067  0.135  0.209
## 56  0.065  0.133  0.206
## 57  0.070  0.141  0.218

str(df_long)

## gropd_df [342 × 17] (S3: grouped_df/tbl_df/tbl/data.frame)
##  $ Year                                : num [1:342] 1967 1967 1967 1967 1967 ...
##  $ Lowest quintile $                   : num [1:342] 11550 11550 11550 11550 11550 ...
##  $ Second quintile $                   : num [1:342] 32010 32010 32010 32010 32010 ...
##  $ Third quintile $                    : num [1:342] 51100 51100 51100 51100 51100 ...
##  $ Fourth quintile $                   : num [1:342] 71500 71500 71500 71500 71500 71500 74400 74400 74400 74400 ...
##  $ Highest quintile $                  : num [1:342] 128700 128700 128700 128700 128700 ...
##  $ Top 5 percent $                     : num [1:342] 203000 203000 203000 203000 203000 ...
##  $ Gini index of income inequality     : chr [1:342] "0.397" "0.397" "0.397" "0.397" ...
##  $ Mean logarithmic deviation of income: chr [1:342] "0.377" "0.377" "0.377" "0.377" ...
##  $ Theil                               : chr [1:342] "0.280" "0.280" "0.280" "0.280" ...
##  $ e=0.25                              : chr [1:342] "0.070" "0.070" "0.070" "0.070" ...
##  $ e=0.50                              : chr [1:342] "0.141" "0.141" "0.141" "0.141" ...
##  $ e=0.75                              : chr [1:342] "0.218" "0.218" "0.218" "0.218" ...
##  $ Quintile                            : chr [1:342] "Fourth quintile %" "Highest quintile %" "Lowest quintile %" "Second quintile %" ...
##  $ Wealth Distribution Percentage      : num [1:342] 0.242 0.436 0.04 0.108 0.173 0.172 0.245 0.426 0.042 0.111 ...
##  $ Change                              : num [1:342] NA NA NA NA NA ...
##  $ hover_text                          : chr [1:342] "Year: 1967<br>Quintile: Fourth quintile %<br>Wealth Distribution Percentage: 24.20%<br>Change from previous year: NA" "Year: 1967<br>Quintile: Highest quintile %<br>Wealth Distribution Percentage: 43.60%<br>Change from previous year: NA" "Year: 1967<br>Quintile: Lowest quintile %<br>Wealth Distribution Percentage: 4.00%<br>Change from previous year: NA" "Year: 1967<br>Quintile: Second quintile %<br>Wealth Distribution Percentage: 10.80%<br>Change from previous year: NA" ...
##  - attr(*, "groups")= tibble [6 × 2] (S3: tbl_df/tbl/data.frame)
##   ..$ Quintile: chr [1:6] "Fourth quintile %" "Highest quintile %" "Lowest quintile %" "Second quintile %" ...
##   ..$ .rows   : list<int> [1:6] 
##   .. ..$ : int [1:57] 1 7 13 19 25 31 37 43 49 55 ...
##   .. ..$ : int [1:57] 2 8 14 20 26 32 38 44 50 56 ...
##   .. ..$ : int [1:57] 3 9 15 21 27 33 39 45 51 57 ...
##   .. ..$ : int [1:57] 4 10 16 22 28 34 40 46 52 58 ...
##   .. ..$ : int [1:57] 5 11 17 23 29 35 41 47 53 59 ...
##   .. ..$ : int [1:57] 6 12 18 24 30 36 42 48 54 60 ...
##   .. ..@ ptype: int(0) 
##   ..- attr(*, ".drop")= logi TRUE

My Animation Gini Ratio Overtime

# Load required libraries
library(gganimate)


# Convert the 'Gini index of income inequality' column to numeric if it's not already
df$`Gini index of income inequality` <- as.numeric(as.character(df$`Gini index of income inequality`))

# Create the plot
p <- ggplot(df, aes(x = Year, y = `Gini index of income inequality`, group = 1)) +
    geom_line(size = 1.2, color = "#0073C2FF") +
    geom_point(size = 2, color = "#0073C2FF") +
    labs(title = 'Gini Index of Income Inequality by Year', y = 'Gini Index', x = 'Year') +
    theme_minimal() +
    ylim(0.35, 0.5)  # Make sure this range includes all your data points

# Animate the plot
animated_plot <- p + 
    transition_time(Year) +
    shadow_mark(past = TRUE, size = 1, color = 'blue') +
    ease_aes('linear')

# Save the animation
anim_save("gini_ratio_animation.gif", animation = animated_plot, fps = 10, end_pause = 50, width = 800, height = 600)

## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?

My Animation Quantile Wealth Distribution Overtime

# Create the plot
p <- ggplot(df_long, aes(x = Year, y = `Wealth Distribution Percentage`, color = Quintile, group = Quintile)) +
  geom_line(size = 1.2) +
  geom_point(size = 1.5) +
  scale_color_viridis(discrete = TRUE, option = "D") +
  labs(title = 'Wealth Distribution by Quintile',
       subtitle = "Year-over-year change in wealth distribution percentages",
       x = "Year", y = "Wealth Distribution Percentage",
       color = "Quintile") +
  theme_minimal(base_size = 14) +
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold", size = 16),
        plot.subtitle = element_text(size = 14)) +
  scale_y_continuous(labels = scales::label_percent(), limits = c(0, .55))

# Animate the plot with shadow lines that match the color of the dots
anim <- p + transition_time(Year) +
  shadow_mark(past = TRUE, size = 0.8) +  # Maintain the color of the Quintile
  ease_aes('linear')

# Save the animation
anim_save("wealth_distribution_animation.gif", animation = anim, nframes = 200, duration = 20, end_pause = 50)

## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?

My goal now is to create a dynamic treemap to visualize the proportion of wealth held by different quintiles at a glance. A treemap can efficiently represent the hierarchical structure of wealth distribution, with each block’s size reflecting the wealth share of a particular quintile. This visualization will provide a clear and immediate understanding of the relative wealth sizes and changes over time. By comparing treemaps from 1969 and 2022, we can visually observe the shifts in wealth concentration, making it easier to communicate the complex data story of wealth disparity and the middle class’s changing economic status.

The process begins with data preparation, ensuring the dataset is clean and ready for visualization. Initial steps include using head(df, n = 10) to preview the dataset and summarizing NA counts across each column to identify and address missing values. Checking the structure of the data frame with str(pdf) and ensuring that all percentage columns are correctly formatted as numerical values are crucial steps in data cleaning. Once the dataset is in good shape, I will select the necessary columns and reshape the data with pivot_longer, preparing it for the treemap. For the treemap of the year 2022, I will filter the data for that specific year, create the visualization with plot_ly, using the type = “treemap” parameter, and apply the Viridis color scale for aesthetic coherence and clarity. This treemap will serve as a powerful snapshot of the current state of wealth distribution, illustrating the percentage of wealth each quintile holds, and will be a pivotal part of the analysis and discussion around the shifts in wealth distribution over more than five decades.

# Counting NAs in each column
na_count <- df %>% summarize(across(everything(), ~ sum(is.na(.))))

# Print the count of NAs
print(na_count)

##   Year Lowest quintile $ Second quintile $ Third quintile $ Fourth quintile $
## 1    0                 0                 0                0                 0
##   Highest quintile $ Top 5 percent $ Lowest quintile % Second quintile %
## 1                  0               0                 0                 0
##   Third quintile % Fourth quintile % Highest quintile % Top 5 percent %
## 1                0                 0                  0               0
##   Gini index of income inequality Mean logarithmic deviation of income Theil
## 1                               0                                    0     0
##   e=0.25 e=0.50 e=0.75
## 1      0      0      0

# Prepare the data
df1 <- df %>% 
  select(Year, `Lowest quintile %`, `Second quintile %`, `Third quintile %`, `Fourth quintile %`, `Highest quintile %`, `Top 5 percent %`) %>%
  pivot_longer(-Year, names_to = "Quintile", values_to = "Percentage") %>%
  mutate(Percentage = as.numeric(Percentage))


# Counting NAs in each column
na_count <- df1 %>% summarize(across(everything(), ~ sum(is.na(.))))

# Print the count of NAs
print(na_count)

## # A tibble: 1 × 3
##    Year Quintile Percentage
##   <int>    <int>      <int>
## 1     0        0          0

# Checking the structure of the dataframe to see data types
str(df)

## 'data.frame':    57 obs. of  19 variables:
##  $ Year                                : num  2022 2021 2020 2019 2018 ...
##  $ Lowest quintile $                   : num  16120 16020 16530 17410 15920 ...
##  $ Second quintile $                   : num  43850 44240 44800 46300 43110 ...
##  $ Third quintile $                    : num  74730 76430 76920 78520 73480 ...
##  $ Fourth quintile $                   : num  119900 124500 124300 126600 117400 ...
##  $ Highest quintile $                  : num  277300 290400 286800 289800 270300 ...
##  $ Top 5 percent $                     : num  499900 517800 504500 513800 481400 ...
##  $ Lowest quintile %                   : chr  "3.0" "2.9" "3.0" "3.1" ...
##  $ Second quintile %                   : chr  "8.2" "8.0" "8.2" "8.3" ...
##  $ Third quintile %                    : chr  "14.0" "13.9" "14.0" "14.1" ...
##  $ Fourth quintile %                   : chr  "22.5" "22.6" "22.6" "22.7" ...
##  $ Highest quintile %                  : chr  "52.1" "52.7" "52.2" "51.9" ...
##  $ Top 5 percent %                     : chr  "23.5" "23.5" "23.0" "23.0" ...
##  $ Gini index of income inequality     : num  0.488 0.494 0.488 0.484 0.486 0.482 0.481 0.479 0.48 0.482 ...
##  $ Mean logarithmic deviation of income: chr  "0.637" "0.634" "0.617" "0.590" ...
##  $ Theil                               : chr  "0.440" "0.448" "0.437" "0.432" ...
##  $ e=0.25                              : chr  "0.106" "0.108" "0.105" "0.104" ...
##  $ e=0.50                              : chr  "0.207" "0.211" "0.206" "0.203" ...
##  $ e=0.75                              : chr  "0.315" "0.320" "0.313" "0.306" ...

# Checking the class of the Percentage column specifically
class(df$`Percentage`)

## [1] "NULL"

# Checking the structure of the dataframe to see data types
str(df1)

## tibble [342 × 3] (S3: tbl_df/tbl/data.frame)
##  $ Year      : num [1:342] 2022 2022 2022 2022 2022 ...
##  $ Quintile  : chr [1:342] "Lowest quintile %" "Second quintile %" "Third quintile %" "Fourth quintile %" ...
##  $ Percentage: num [1:342] 3 8.2 14 22.5 52.1 23.5 2.9 8 13.9 22.6 ...

# Checking the class of the Percentage column specifically
class(df1$`Percentage`)

## [1] "numeric"

# Filter the data for the year 2022
df_2022 <- df1 %>% filter(Year == 2022)

# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
             type = "treemap",
             labels = ~Quintile,
             parents = ~Year,
             values = ~Percentage,
             textinfo = "label+value",
             hoverinfo = 'text',
             hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
  layout(colorway = viridis_pal(option = "D")(7))  # Set Viridis color scale

# Show the treemap
p

# Filter the data for the year 2021
df_2022 <- df1 %>% filter(Year == 2021)

# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
             type = "treemap",
             labels = ~Quintile,
             parents = ~Year,
             values = ~Percentage,
             textinfo = "label+value",
             hoverinfo = 'text',
             hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
  layout(colorway = viridis_pal(option = "D")(7))  # Set Viridis color scale

# Show the treemap
p

# Filter the data for the year 2020
df_2022 <- df1 %>% filter(Year == 2020)

# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
             type = "treemap",
             labels = ~Quintile,
             parents = ~Year,
             values = ~Percentage,
             textinfo = "label+value",
             hoverinfo = 'text',
             hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
  layout(colorway = viridis_pal(option = "D")(7))  # Set Viridis color scale

# Show the treemap
p

# Filter the data for the year 2019
df_2022 <- df1 %>% filter(Year == 2019)

# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
             type = "treemap",
             labels = ~Quintile,
             parents = ~Year,
             values = ~Percentage,
             textinfo = "label+value",
             hoverinfo = 'text',
             hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
  layout(colorway = viridis_pal(option = "D")(7))  # Set Viridis color scale

# Show the treemap
p

# Filter the data for the year 2018
df_2022 <- df1 %>% filter(Year == 2018)

# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
             type = "treemap",
             labels = ~Quintile,
             parents = ~Year,
             values = ~Percentage,
             textinfo = "label+value",
             hoverinfo = 'text',
             hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
  layout(colorway = viridis_pal(option = "D")(7))  # Set Viridis color scale

# Show the treemap
p

# Filter the data for the year 2017
df_2022 <- df1 %>% filter(Year == 2017)

# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
             type = "treemap",
             labels = ~Quintile,
             parents = ~Year,
             values = ~Percentage,
             textinfo = "label+value",
             hoverinfo = 'text',
             hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
  layout(colorway = viridis_pal(option = "D")(7))  # Set Viridis color scale

# Show the treemap
p

# Filter the data for the year 2016
df_2022 <- df1 %>% filter(Year == 2016)

# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
             type = "treemap",
             labels = ~Quintile,
             parents = ~Year,
             values = ~Percentage,
             textinfo = "label+value",
             hoverinfo = 'text',
             hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
  layout(colorway = viridis_pal(option = "D")(7))  # Set Viridis color scale

# Show the treemap
p

# Filter the data for the year 2015
df_2022 <- df1 %>% filter(Year == 2015)

# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
             type = "treemap",
             labels = ~Quintile,
             parents = ~Year,
             values = ~Percentage,
             textinfo = "label+value",
             hoverinfo = 'text',
             hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
  layout(colorway = viridis_pal(option = "D")(7))  # Set Viridis color scale

# Show the treemap
p

# Filter the data for the year 2014
df_2022 <- df1 %>% filter(Year == 2014)

# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
             type = "treemap",
             labels = ~Quintile,
             parents = ~Year,
             values = ~Percentage,
             textinfo = "label+value",
             hoverinfo = 'text',
             hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
  layout(colorway = viridis_pal(option = "D")(7))  # Set Viridis color scale

# Show the treemap
p

# Filter the data for the year 2013
df_2022 <- df1 %>% filter(Year == 2013)

# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
             type = "treemap",
             labels = ~Quintile,
             parents = ~Year,
             values = ~Percentage,
             textinfo = "label+value",
             hoverinfo = 'text',
             hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
  layout(colorway = viridis_pal(option = "D")(7))  # Set Viridis color scale

# Show the treemap
p

# Filter the data for the year 2012
df_2022 <- df1 %>% filter(Year == 2012)

# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
             type = "treemap",
             labels = ~Quintile,
             parents = ~Year,
             values = ~Percentage,
             textinfo = "label+value",
             hoverinfo = 'text',
             hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
  layout(colorway = viridis_pal(option = "D")(7))  # Set Viridis color scale

# Show the treemap
p

# Filter the data for the year 2011
df_2022 <- df1 %>% filter(Year == 2011)

# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
             type = "treemap",
             labels = ~Quintile,
             parents = ~Year,
             values = ~Percentage,
             textinfo = "label+value",
             hoverinfo = 'text',
             hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
  layout(colorway = viridis_pal(option = "D")(7))  # Set Viridis color scale

# Show the treemap
p

# Filter the data for the year 2010
df_2022 <- df1 %>% filter(Year == 2010)

# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
             type = "treemap",
             labels = ~Quintile,
             parents = ~Year,
             values = ~Percentage,
             textinfo = "label+value",
             hoverinfo = 'text',
             hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
  layout(colorway = viridis_pal(option = "D")(7))  # Set Viridis color scale

# Show the treemap
p

# Filter the data for the year 2009
df_2022 <- df1 %>% filter(Year == 2009)

# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
             type = "treemap",
             labels = ~Quintile,
             parents = ~Year,
             values = ~Percentage,
             textinfo = "label+value",
             hoverinfo = 'text',
             hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
  layout(colorway = viridis_pal(option = "D")(7))  # Set Viridis color scale

# Show the treemap
p

# Filter the data for the year 2008
df_2022 <- df1 %>% filter(Year == 2008)

# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
             type = "treemap",
             labels = ~Quintile,
             parents = ~Year,
             values = ~Percentage,
             textinfo = "label+value",
             hoverinfo = 'text',
             hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
  layout(colorway = viridis_pal(option = "D")(7))  # Set Viridis color scale

# Show the treemap
p

# Filter the data for the year 2007
df_2022 <- df1 %>% filter(Year == 2007)

# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
             type = "treemap",
             labels = ~Quintile,
             parents = ~Year,
             values = ~Percentage,
             textinfo = "label+value",
             hoverinfo = 'text',
             hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
  layout(colorway = viridis_pal(option = "D")(7))  # Set Viridis color scale

# Show the treemap
p

# Filter the data for the year 2006
df_2022 <- df1 %>% filter(Year == 2006)

# Create the treemap with Viridis color scale
p <- plot_ly(data = df_2022,
             type = "treemap",
             labels = ~Quintile,
             parents = ~Year,
             values = ~Percentage,
             textinfo = "label+value",
             hoverinfo = 'text',
             hovertext = ~paste('Year:', Year, '<br>Label:', Quintile, '<br>Percentage:', Percentage, '%')) %>%
  layout(colorway = viridis_pal(option = "D")(7))  # Set Viridis color scale

# Show the treemap
p

The treemaps presented for the years 2005 to 2022 offer a vivid portrayal of the wealth distribution across different quintiles. In the 2022 treemap, the highest quintile holds more than half of the wealth, represented by the most significant area on the map, a clear visual testament to the wealth concentration at the top. This trend is consistent with the preceding years, with 2019 showing a slightly smaller but dominant share for the highest quintile and the top 5 percent. The consistency of the large area representing the highest quintile across all treemaps underlines the persistence of wealth inequality.

Furthermore, these treemaps illustrate the relatively minimal shares the lower quintiles hold. In 2022, the lowest quintile’s share is a mere 3%, which has not seen any substantial growth in the preceding years. The second and third quintiles combined account for a significantly smaller proportion of the wealth than the fourth quintile, dwarfing the highest quintile’s share. This disparity, barely changing through the years, supports the hypothesis of a shrinking middle class when considering wealth distribution. The middle quintiles, supposedly representative of the middle class, do not exhibit growth in wealth share, indicating that the economic gains over these years have not been equitably distributed across society. These treemaps, therefore, are not just representations of data; they are a stark illustration of the growing economic divide.

head(df1, n = 57)

## # A tibble: 57 × 3
##     Year Quintile           Percentage
##    <dbl> <chr>                   <dbl>
##  1  2022 Lowest quintile %         3  
##  2  2022 Second quintile %         8.2
##  3  2022 Third quintile %         14  
##  4  2022 Fourth quintile %        22.5
##  5  2022 Highest quintile %       52.1
##  6  2022 Top 5 percent %          23.5
##  7  2021 Lowest quintile %         2.9
##  8  2021 Second quintile %         8  
##  9  2021 Third quintile %         13.9
## 10  2021 Fourth quintile %        22.6
## # ℹ 47 more rows

My Goal was to make an interactive Tree map: but for some reason I was unable to complete this task.

library(shiny)
library(plotly)
library(dplyr)
library(tidyr)
library(viridisLite) # For the viridis color palette

# Prepare the data - select the years 2020 to 2022
df1 <- df %>%
  filter(Year >= 2020 & Year <= 2022) %>% 
  select(Year, `Lowest quintile %`, `Second quintile %`, `Third quintile %`, `Fourth quintile %`, `Highest quintile %`, `Top 5 percent %`) %>%
  pivot_longer(-Year, names_to = "Quintile", values_to = "Percentage") %>%
  mutate(Percentage = as.numeric(Percentage)) %>%
  mutate(Year = as.factor(Year), Quintile = as.factor(Quintile))  # Ensure these are factors

# Define the UI
ui <- fluidPage(
  titlePanel("Treemap of Wealth Distribution"),
  plotlyOutput("treemapPlot")
)

server <- function(input, output, session) {
  # Use this to trigger a breakpoint in the Shiny app and inspect variables
  # browser()

  output$treemapPlot <- renderPlotly({
    # Check if the data frame has rows
    if(nrow(df1) == 0) {
      stop("Data frame 'df1' is empty. Check your data filtering steps.")
    }

    # Check the structure of df1
    print(str(df1))

    # Create the treemap outside of Shiny to see if it renders
    p <- plot_ly(
      data = df1,
      labels = ~Quintile,
      values = ~Percentage,
      type = 'treemap'
    ) %>%
      layout(title = "Wealth Distribution Over Time")
    
    # Print the plot
    print(p)
  })
}


# Run the application
#shinyApp(ui = ui, server = server)

Creating an interactive treemap to visualize wealth distribution across different quintiles over time was an ambitious and insightful goal. I envisioned a tool that could dynamically show changes from 1969 to 2022, allowing users to explore the nuances of economic inequality. Using a Shiny app was crucial for interactivity, where a slider could adjust the year and dynamically update the treemap to reflect that period’s data.

The struggles began with the complexity of the task. Implementing an interactive slider that adjusted the data year by year required a static visualization and a reactive one that could update with user input. The R code for generating the treemap using the treemap and plotly packages was theoretically sound, but the devil was in the details. The error messages provided clues to the underlying issues. The first challenge was ensuring the data frame, df, lacked missing values, as these would break the treemap function. This necessitated meticulous data cleaning, which often feels like chasing shadows as one missing value resolved can lead to another issue elsewhere. Another error, “Can’t combine Quintile and Percentage ,” hinted at a type mismatch within the data frame. The pivot_longer function was reshaping the data, but the expected numeric percentages were being interpreted or combined with character data somewhere along the line, causing a conflict.

The visualization also displayed a single quintile and its percentage value rather than a full spectrum for each year. This suggested an issue with how the data was being grouped or indexed in the treemap function. The labels, quintiles, and corresponding values did not align as intended.

In the face of these challenges and the looming deadline, I had to make a tough decision. After hours of debugging, searching for solutions, and consulting documentation, it became clear that meeting the deadline with a fully interactive treemap was impossible. I had to abandon the interactive element, perhaps settling for a static representation or a more straightforward visualization.

Wankelman Module 12.3

2023-11-29