TidyTuesday Project (210810)


This notebook is about measuring infrastructure in the Bureau of Economic Analysis National Economic Accounts.

Introduction

Infrastructure provides critical support for economic activity and therefore contributes significantly to our living standards. This notebook analyzes trends in infrastructure investment over the years and decades, using the infrastructure measurements in the U.S. National Economic Accounts (NEAs).

For this notebook, I will use the chain_investment.csv dataset. Because chained dollars are adjusted for inflation, I think they will give us better insight into how large the investment in a given year or decade was compared with today's infrastructure investment.
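To make the idea of inflation-adjusted dollars concrete, here is a minimal sketch with made-up numbers. It deflates nominal spending with a price index rebased to a reference year. The figures, the `price_index` column, and the helper `to_real()` are hypothetical. BEA's chained-dollar series are built from chain-type Fisher indexes, which this simple fixed-base deflation only approximates.

```r
# Toy data: nominal investment and a hypothetical price index (not BEA figures).
toy <- data.frame(
  year        = c(1950, 1980, 2012),
  nominal     = c(10, 60, 200),
  price_index = c(20, 60, 100)   # reference year 2012 = 100
)

# Hypothetical helper: express spending in reference-year prices.
to_real <- function(nominal, price_index, ref_index = 100) {
  nominal * ref_index / price_index
}

toy$real <- to_real(toy$nominal, toy$price_index)
print(toy)
```

In nominal terms the 1950 outlay looks 20 times smaller than the 2012 one. In real terms it is only 4 times smaller, which is the kind of comparison the chained series makes possible.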

Why is it important?

Studying these figures is a challenge, but it helps us understand the nature of the infrastructure around us and how it behaves as a connected network of networks.

A glance at the data

Following the Bureau of Economic Analysis paper, there are three main categories of infrastructure: basic, social and digital.

That being said, let’s start working with those three main categories from the chain_investment.csv.

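# Assumes the tidyverse (dplyr, ggplot2) and ggpubr are attached, and that
# chinvestment holds chain_investment.csv, loaded earlier in the notebook.
# Keep the three top-level totals and give each a short label and a color.
df_main_cat <- chinvestment %>%
  filter(group_num == 1) %>%
  mutate(
    label = case_when(
      category == "Total basic infrastructure" ~ "Basic",
      category == "Total digital infrastructure" ~ "Digital",
      category == "Total social infrastructure" ~ "Social"
    ),
    colorL = case_when(
      category == "Total basic infrastructure" ~ "#ef476f",
      category == "Total digital infrastructure" ~ "#ffd166",
      category == "Total social infrastructure" ~ "#06d6a0"
    )
  )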
# Line chart: one line per main category.
chart_lines <- df_main_cat %>%
  ggplot(aes(x = year, y = gross_inv_chain)) +
  geom_line(aes(color = category)) +
  geom_point(aes(color = category)) +
  # guides(... = FALSE) is deprecated in recent ggplot2; use "none" instead
  guides(fill = "none", color = "none") +
  theme_minimal() +
  scale_color_manual(breaks = df_main_cat$category, values = df_main_cat$colorL) +
  ylab("Total $ in Infrastructure Investment") +
  theme(legend.position = "none", legend.title = element_blank())


# Stacked area chart of the same series. The annotations label y = 600000 as
# $600 billion, so gross_inv_chain is read here as millions of dollars.
# fontfamily is assumed to be defined earlier in the notebook.
chart_area <- df_main_cat %>%
  ggplot(aes(x = year, y = gross_inv_chain, fill = category)) +
  geom_area() +
  scale_fill_manual(breaks = df_main_cat$category, values = df_main_cat$colorL) +
  ylab("Infrastructure Investment ($) per category") +
  guides(fill = "none", color = "none") +
  theme_minimal() +
  theme(legend.position = "none", legend.title = element_blank()) +
  annotate(geom = "segment", x = 1947, xend = 2017, y = 600000, yend = 600000, color = "#343a40") +
  annotate(geom = "segment", x = 1947, xend = 2017, y = 300000, yend = 300000, color = "#343a40") +
  annotate(geom = "text", x = 1947, y = 630000, label = "$600 billion", color = "#343a40",
           hjust = 0, size = 6, family = fontfamily) +
  annotate(geom = "text", x = 1947, y = 330000, label = "$300 billion", color = "#343a40",
           hjust = 0, size = 6, family = fontfamily)
  
  

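# Place the two charts side by side; with a single row, widths and horizontal
# alignment are the relevant settings.
ggarrange(chart_area, chart_lines, ncol = 2, widths = c(1, 1), align = "h")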

Infrastructure Investment Breakdown (Sankey)

We already know how the money is split among the three main categories (basic, social and digital), so let’s break it down further to see how much is invested in each subcategory.

Sankey charts are a viable option for that.

Following the Google Charts definition, a sankey diagram is a visualization used to depict a flow from one set of values to another. The things being connected are called nodes and the connections are called links. Sankeys are best used when you want to show a many-to-many mapping between two domains.
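As a minimal sketch of that vocabulary, the toy example below stores a two-level flow as a vector of node labels plus a table of links. The figures are invented and the object names (`nodes`, `links`) are illustrative only; they are not taken from the dataset.

```r
# Nodes: every distinct thing that money flows from or to.
nodes <- c("Total", "Basic", "Social", "Power", "Health")

# Links: one row per flow, identified by node name, with its size.
links <- data.frame(
  from  = c("Total", "Total", "Basic", "Social"),
  to    = c("Basic", "Social", "Power", "Health"),
  value = c(70, 30, 70, 30)
)

# Plotting code usually needs node positions rather than names.
# match() looks up each name in the node vector (1-based positions).
links$source <- match(links$from, nodes)
links$target <- match(links$to, nodes)
print(links)
```

Keeping names in the link table and deriving positions from them makes it easy to spot a link that points at a missing node, because `match()` returns `NA` for it.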

Generating the Sankey chart requires splitting the dataset into levels (spoiler alert: the investments have three of them). In order to do that:

options(dplyr.summarise.inform = FALSE)

# Remember: We need to remove the fund sources
dfch_grp <- chinvestment %>%
  filter(!(category %in% c("Federal", "S&L", "Private")) & (group_num != 14)) %>%
  group_by(category, meta_cat, group_num) %>%
  summarise(inv = sum(gross_inv_chain))

# Shorten the "Total ... infrastructure" names so that parent and child labels match
dfch_grp <- dfch_grp %>%
  mutate(
    category = case_when(
      category == "Total basic infrastructure" ~ "Basic",
      category == "Total digital infrastructure" ~ "Digital",
      category == "Total social infrastructure" ~ "Social",
      TRUE ~ category
    ),
    meta_cat = case_when(
      meta_cat == "Total basic infrastructure" ~ "Basic",
      TRUE ~ meta_cat
    )
  )

# dfch_grp$category <- as.factor(dfch_grp$category)
# dfch_grp$meta_cat <- as.factor(dfch_grp$meta_cat)
# Level 1: Total infrastructure => Total basic/social/digital infrastructure
lvl1 <- dfch_grp %>%
  filter(meta_cat == "Total infrastructure")

# Level 2: Basic/Digital/Social => their subcategories
lvl2 <- dfch_grp %>%
  filter(meta_cat %in% c("Basic", "Digital", "Social"))

# Level 3: level-2 subcategories => their own components
lvl3 <- dfch_grp %>%
  filter(meta_cat %in% lvl2$category)
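# Node labels, then link endpoints expressed as positions in labelsV.
# match() returns 1-based positions; plotting libraries that expect 0-based
# node indices (plotly and networkD3 do) need 1 subtracted.
labelsV <- c(unique(lvl1$meta_cat), unique(lvl2$meta_cat), unique(lvl2$category), unique(lvl3$category))

# Look up positions in labelsV (not in base R's labels() function)
sourceV <- c(match(lvl1$meta_cat, labelsV), match(lvl2$meta_cat, labelsV), match(lvl3$meta_cat, labelsV))
targetV <- c(match(lvl1$category, labelsV), match(lvl2$category, labelsV), match(lvl3$category, labelsV))
valueV <- c(lvl1$inv, lvl2$inv, lvl3$inv)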
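
The level-by-level split above follows a parent/child pattern: each level keeps the rows whose meta_cat appears as a category in the previous level. As a rough sketch of that idea, the hypothetical helper `split_levels()` below walks a toy table from its root. The data are invented, the helper is not part of the notebook, and it assumes the hierarchy has no cycles (the `max_depth` cap is only a safety net).

```r
# Toy parent/child table using the same column names as the notebook.
toy <- data.frame(
  meta_cat = c("Total", "Total", "Basic", "Basic", "Power"),
  category = c("Basic", "Social", "Power", "Water", "Electric"),
  inv      = c(80, 20, 50, 30, 50)
)

# Hypothetical helper: start at the root and repeatedly keep the rows whose
# parent (meta_cat) is one of the previous level's children (category).
split_levels <- function(df, root, max_depth = 10) {
  levels <- list()
  parents <- root
  for (i in seq_len(max_depth)) {
    rows <- df[df$meta_cat %in% parents, ]
    if (nrow(rows) == 0) break
    levels[[length(levels) + 1]] <- rows
    parents <- unique(rows$category)
  }
  levels
}

lv <- split_levels(toy, "Total")
length(lv)  # number of levels found in the toy table
```

Writing the three levels out by hand, as the notebook does, is easier to read for a fixed depth; a loop like this only pays off if the depth is unknown.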
             

# lvl1
# lvl2
# lvl3
