Transportation Economic Trends: Shiny Trains, Planes, and Automobiles

Author

Alexandra Veremeychik

Source: US Department of Transportation

Since 2016, the Bureau of Transportation Statistics (BTS) has released an annual publication of statistics that highlight transportation’s role in the economy. It’s objectives are to observe changes over time, to produce interactive charts, and to explain related concepts and data sources for a general audience. This project will be an attempt to expand upon this work. It is also available as a presentation with embedded Shiny applications here.

Variables to Explore

The BTS has compiled this data from several different federal offices, in addition to their own collection efforts. The sources can be viewed alongside their variables below. Citations of specific databases, tables, and releases can be found at the end of the document. Collection occurred in the form of surveys, state reports, and federal records.

Variable Source
Percent of GDP U.S. Department of Commerce, Bureau of Economic Analysis
State Expenditures Allocated to Transportation U.S. Department of Commerce, Census Bureau
State Highway Expenditures (billions of dollars) U.S. Department of Commerce, Census Bureau
Average Hourly Compensation (Wages and Benefits) U.S. Department of Labor, Bureau of Labor Statistics
Cost of Transportation: Costs Faced by Households (Consumer Price Index) U.S. Department of Labor, Bureau of Labor Statistics
Per-Mile Costs of Owning and Operating an Automobile (current dollars) U.S. Department of Transportation, Bureau of Transportation Statistics

Average Amtrak Fare,

Average Commuter Rail Fare,

Average Domestic Airfare

U.S. Department of Transportation, Bureau of Transportation Statistics
Average Individual Household Expenditures U.S. Department of Labor, Bureau of Labor Statistics, Consumer Expenditure Survey
Public and Private Investment in Transportation U.S. Department of Commerce, Bureau of Economic Analysis

This dataset was chosen due to the critical nature of the transportation sector. It is the roads, rails, and airways that connect us to loved ones, allow us to make a living, and give us access to goods and services. It is important to maintain and explore a robust amount of data on the subject, in order asses public policy, allocate funds appropriately, and adjust to observed trends.

I also just really like trains.

Source. U.S. Department of Transportation

Libraries:

library(tidyverse)
library(plotly)
library(ggfortify)
library(highcharter)
library(alluvial)
library(ggalluvial)
tet <- read_csv("Transportation_Economic_Trends__TET__data.csv", col_types = cols())

Color palette and Themes:

# Palette
colors <- c("#4A7F8A",
            "#312416",
            "#C08226",
            "#122C40",
            "#664729",
            "#31393A",
            "#962E02")

# Highcharter Themes
custom_hc_theme <- hc_theme(
  colors = c("#4A7F8A", "#312416", "#C08226", "#122C40",
             "#664729", "#31393A", "#962E02"),
  chart = list(
    backgroundColor = "transparent",
    divBackgroundImage = "https://i.ibb.co/xm0wF8L/background08-20240508042638.png"
  ),
  title = list(
    style = list(
      color = "#333333",
      fontFamily = "Arial"
    )
  ),
  subtitle = list(
    style = list(
      color = "#666666",
      fontFamily = "Arial"
    )
  ),
  caption = list(
    style = list(
      color = "#666666",
      fontFamily = "Arial"
    )
  ),
  legend = list(
    itemStyle = list(
      fontFamily = "Arial",
      color = "#333333"
    ),
    itemHoverStyle = list(
      color = "gray"
    )
  )
)

# GGPlot Theme
custom_gg_theme <- theme(
  panel.background = element_rect(fill = "#f8ece0", color = "#da9c60"),
  panel.grid.major = element_line(color = "#F2CFA7", linetype = "dashed"),
  panel.grid.minor = element_line(color = "#F2CFA7", linetype = "dashed"),
  plot.background = element_rect(fill = "#ECBB83", colour = "#E1943D"),
  plot.title = element_text(size = 14)
  
)

Clean-up and EDA

summary(tet$Year)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1947    2010    2015    2013    2019    2032 

The first quartile of observation years is 2010, so although the dataset ranges from the years 1947 to 2032, clearly most of the data comes from the 2000s. Any row pertaining to a year after 2024 appears to be a projection.

length(which(tet$Year > 2024))
[1] 231

There are only 231 of these projections out of 86,885 data points, and will most likely be ignored.

Percent of U.S. GDP per Sector

Source: U.S. Department of Commerce, Bureau of Economic Analysis, National Income and Product Accounts Tables, tables 1.1.4, 2.4.4, 3.11.4, 3.15.4, 4.2.4, 5.4.4, 5.5.4 and 5.7.4B (price deflators); 1.1.5, 2.4.5, 3.11.5, 3.15.5, 4.2.5, 5.4.5, 5.5.5 and 5.7.5B (current dollars); 1.1.6, 2.4.6, 3.11.6, 3.15.6, 4.2.6, 5.4.6, 5.5.6 and 5.7.6B (chained dollars), available at apps.bea.gov/iTable/index_nipa.cfm

# Sector percentages of the GDP
tet_gdp <- tet |>
  filter(Metric == "Percent of U.S. GDP")

tet_gdp_housing <- tet_gdp |>
  filter(Measure == "Housing") |>
  select(Year, Value) |>
  rename(Housing = Value)

tet_gdp_healthcare <- tet_gdp |>
  filter(Measure == "Healthcare") |>
  select(Year, Value) |>
  rename(Healthcare = Value)

tet_gdp_food <- tet_gdp |>
  filter(Measure == "Food") |>
  select(Year, Value) |>
  rename(Food = Value)

tet_gdp_transport <- tet_gdp |>
  filter(Measure == "Transportation") |>
  select(Year, Value) |>
  rename(Transportation = Value)

tet_gdp_edu <- tet_gdp |>
  filter(Measure == "Education") |>
  select(Year, Value) |>
  rename(Education = Value)

tet_gdp_other <- tet_gdp |>
  filter(Measure == "Other") |>
  select(Year, Value) |>
  rename(Other = Value)

tet_gdp_percent <- merge(tet_gdp_housing, tet_gdp_healthcare, by = "Year") |>
  merge(tet_gdp_food, by = "Year") |>
  merge(tet_gdp_transport, by = "Year") |>
  merge(tet_gdp_edu, by = "Year") |>
  merge(tet_gdp_other, by = "Year")

# GDP by dollars per sector
tet_gdp_dollar <- tet |>
  filter(Table == "2.1") |>
  filter(Metric == "Current dollars (billions)") |>
  filter(Measure != "GDP" & Measure != "Other") |>
  select(Year, Value, Measure) |>
  rename(`GDP (billions of dollars)` = Value) |>
  rename(Sector = Measure)
ggplot(tet_gdp, aes (x = Year, y = Value, fill = Measure)) +
  geom_col(position = "fill") +
  scale_fill_manual(values = colors) +
  custom_gg_theme

Productivity

https://www.bls.gov/productivity/

tet_productivity <- tet |>
  filter(Table == "5.1")

State Expenditures Allocated to Transportation

U.S. Department of Commerce, Census Bureau, Census of State and Local Governments, 2007, available at https://www.census.gov/programs-surveys/gov-finances.html

tet_state <- tet |>
  filter(Table == "7.1") |>
  filter(Measure != "United States") |>
  select(Year, Value, Measure) |>
  rename(`State Expenditures Allocated to Transportation (billions of dollars)` = Value) |>
  rename(State = Measure)

A Shiny app comparing state expenditures can be found here.

State Highway Expenditures (billions of dollars)

U.S. Department of Commerce, Census Bureau, Census of State and Local Governments, 2007, available at https://www.census.gov/programs-surveys/gov-finances.html

tet_expend_highway <- tet |>
  filter(Table == "7.3") |>
  filter(Measure != "United States") |>
  filter(MeasureNum == 1) |>
  select(Year, Value, Measure) |>
  rename(State = Measure) |>
  rename(`Highway Expenditures (billions of dollars)` = Value)

Average Hourly Compensation (Wages and Benefits)

U.S. Department of Labor, Bureau of Labor Statistics, Employer Costs for Compensation, CMU1010000000000D available at http://www.bls.gov/ncs/ect/#tables The Bureau of Labor Statistics conducts the National Compensation Survey which provides quarterly data measuring level of average costs per hour worked. NCS is a survey of employers, which gathers data on both wages and benefits, including health insurance, pensions and paid leave. It is used to generate data on benefits, and to estimate the Employment Cost Index, which is used as an indicator by the Federal Reserve Board, as well as to generate data on changes in employer costs.

# For comparing quarter-to-quarter
tet_compensation <- tet |>
  filter(Table == "4.5") |>
  filter(Measure == "All occupations") |>
  mutate(date = str_replace_all(date, " 12:00:00 AM", "")) |>
  mutate(date = as.Date(date, "%m/%d/%Y")) |>
  select(Year, date, Value) |> 
  rename(Date = date) |>
  rename(`Avg Compensation` = Value)

# Average quarters to compare yearly
tet_compensation_year <- tet_compensation |>
  group_by(Year) |>
  summarise(`Avg Hourly Compensation` = mean(`Avg Compensation`)) |>
  select(Year, `Avg Hourly Compensation`)

Federal Minimum Wage

Source: Federal Reserve Bank

min_wage_year <- read_csv("STTMINWGFG.csv", col_types = cols()) |>
  mutate(Year = str_replace_all(DATE, "-01-01", "")) |>
  rename(`Federal Minimum Wage` = STTMINWGFG) |>
  select(Year, `Federal Minimum Wage`) |>
  mutate(Year = as.numeric(Year)) |>
  filter(Year >= 1990 & Year <= 2022)

Cost of Transportation: Costs Faced by Households (Consumer Price Index)

U.S. Department of Labor, Bureau of Labor Statistics, Consumer Price Index, CUUR0000SETA available at https://www.bls.gov/cpi

tet_cost <- tet |>
  filter(Table == "3.8")

tet_cost_overall <- tet |>
  filter(Measure == "Overall transportation")

tet_cost_public <- tet_cost |>
  filter(Measure == "Public transportation") |>
  select(Year, Value) |>
  rename(`Public Transportation` = Value)

tet_cost_insurance <- tet_cost |>
  filter(Measure == "Motor vehicle insurance") |>
  select(Year, Value) |>
  rename(`Motor Vehicl Insurance` = Value)

tet_cost_gas <- tet_cost |>
  filter(Measure == "Gasoline (all types)") |>
  select(Year, Value) |>
  rename(`Gasoline` = Value)

tet_cost_parking <- tet_cost |>
  filter(Measure == "Parking fees and tolls (1)(2)") |>
  select(Year, Value) |>
  rename(`Parking Fees and Tolls` = Value)

tet_cost_fees <- tet_cost |>
  filter(Measure == "Motor vehicle fees (1)") |>
  select(Year, Value) |>
  rename(`Motor Vehicl Fees` = Value)

tet_cost_car <- tet_cost |>
  filter(Measure == "New and used motor vehicles (1)") |>
  select(Year, Value) |>
  rename(`New and Used Motor Vehicles` = Value)

tet_cost_new <- tet_cost |>
  filter(Measure == "New vehicles") |>
  select(Year, Value) |>
  rename(`New vehicles` = Value)

tet_cost_used <- tet_cost |>
  filter(Measure == "Used cars and trucks (1)(2)") |>
  select(Year, Value) |>
  rename(`Used cars and trucks` = Value)

tet_cost_maint <- tet_cost |>
  filter(Measure == "Motor vehicle maintenance and repair") |>
  select(Year, Value) |>
  rename(`Motor Vehicle Maintenance and Repair` = Value)

Per-Mile Costs of Owning and Operating an Automobile (current dollars)

U.S. Department of Transportation, Bureau of Transportation Statistics, National Transportation Statistics, Table 3-17: Average Cost of Owning and Operating an Automobile, available at https://www.bts.gov/content/average-cost-owning-and-operating-automobilea-assuming-15000-vehicle-miles-year Year-over-year change not shown due to changes in methodology in 2004 and 2017 that make estimates before and after not comparable. Ownership costs include insurance, license, registration, taxes, depreciation, and finance charges. Estimates are for new vehicles driven 15,000 miles a year for 5 years. Estimates not available for used vehicles.

tet_cost_mile <- tet |>
  filter(Table == 6.6) |>
  filter(Measure == "Average total cost") |>
  filter(Metric == "Current dollars") |>
  select(Year, Value) |>
  rename(`Per-Mile Costs of Owning and Operating an Automobile (current dollars)` = Value)

Average Amtrak Fare (current dollars)

tet_amtrak <- tet |>
  filter(Table == "3.5") |>
  filter(Metric == "Current dollars") |>
  select(Year, Value) |>
  # Adding a column to count how many hours are needed to afford(min. wage)
  mutate(`Hrs to Work Min Wage (Amtrak)` = 
                      case_when(Year == 1990 ~ Value/3.8,
                                Year >= 1991 & Year < 1996 ~ Value/4.25,
                                Year == 1996 ~ Value/4.75,
                                Year >= 1997 & Year < 2007 ~ Value/5.15,
                                Year == 2007 ~ Value/5.85,
                                Year == 2008 ~ Value/6.55,
                                Year >= 2009 ~ Value/7.25)) |>
  rename(`Avg Amtrak Fare` = Value)

Average Commuter Rail Fare

tet_commuter <- tet |>
  filter(Table == "3.6") |>
  filter(Metric == "Current dollars") |>
  select(Year, Value) |>
  # Adding a column to count how many hours are needed to afford(min. wage)
  mutate(`Hrs to Work Min Wage(Commuter Rail)` = case_when(Year == 1990 ~ Value/3.8,
                                Year >= 1991 & Year < 1996 ~ Value/4.25,
                                Year == 1996 ~ Value/4.75,
                                Year >= 1997 & Year < 2007 ~ Value/5.15,
                                Year == 2007 ~ Value/5.85,
                                Year == 2008 ~ Value/6.55,
                                Year >= 2009 ~ Value/7.25)) |>
  rename(`Avg Commuter Rail Fare` = Value)

Domestic Average Airfare

https://www.bts.gov/content/annual-us-domestic-average-itinerary-fare-current-and-constant-dollars

tet_airfare <- tet |>
  filter(Table == "3.4") |>
  filter(Metric == "Current dollars") |>
  select(Year, Value) |>
  # Adding a column to count how many hours are needed to afford(min. wage)
  mutate(hours_earn = case_when(Year == 1990 ~ Value/3.8,
                                Year >= 1991 & Year < 1996 ~ Value/4.25,
                                Year == 1996 ~ Value/4.75,
                                Year >= 1997 & Year < 2007 ~ Value/5.15,
                                Year == 2007 ~ Value/5.85,
                                Year == 2008 ~ Value/6.55,
                                Year >= 2009 ~ Value/7.25))

Average Individual Household Expenditures (Consumer Expenditure Survey)

U.S. Department of Labor, Bureau of Labor Statistics, Consumer Expenditure Survey, available at https://www.bls.gov/cex “Other expenditures” include alcoholic beverages purchased for off-premises consumption; furnishings, household equipment, and routine household maintenance; education; accommodations; financial services (excluding pension funds); other goods and services; net foreign travel and expenditures abroad by U.S. residents; and final consumption expenditures of nonprofit institutions serving households.

tet_expenditures_household <- tet |>
  filter(Table == "6.3")

tet_expenditures_household_total <- tet_expenditures_household |>
  filter(Measure == "Total expenditures") |>
  select(Year, Value) |>
  rename(`Total expenditures` = Value)

tet_expenditures_household_housing <- tet_expenditures_household |>
  filter(Measure == "Housing") |>
  select(Year, Value) |>
  rename(`Housing` = Value)

tet_expenditures_household_transport <- tet_expenditures_household |>
  filter(Measure == "Transportation") |>
  select(Year, Value) |>
  rename(`Transportation` = Value)

tet_expenditures_household_food <- tet_expenditures_household |>
  filter(Measure == "Food") |>
  select(Year, Value) |>
  rename(`Food` = Value)

tet_expenditures_household_insurance <- tet_expenditures_household |>
  filter(Measure == "Insurance and pensions") |>
  select(Year, Value)  |>
  rename(`Insurance and Pensions` = Value)

tet_expenditures_household_health <- tet_expenditures_household |>
  filter(Measure == "Health") |>
  select(Year, Value) |>
  rename(`Health` = Value)

tet_expenditures_household_apparel <- tet_expenditures_household |>
  filter(Measure == "Apparel and services") |>
  select(Year, Value) |>
  rename(`Apparel and Services` = Value)

tet_expenditures_household_other <- tet_expenditures_household |>
  filter(Measure == "Other") |>
  select(Year, Value) |>
  rename(`Other` = Value)

# New dataframe out of these variables
tet_expenditures_household_full <- merge(tet_expenditures_household_transport,
                                         tet_expenditures_household_housing,
                                         by = "Year") |>
  merge(tet_expenditures_household_food, by = "Year") |>
  merge(tet_expenditures_household_insurance, by = "Year") |>
  merge(tet_expenditures_household_health, by = "Year") |>
  merge(tet_expenditures_household_apparel, by = "Year")
tet_expenditures_household_long <- pivot_longer(tet_expenditures_household_full,
                                                cols = 2:7,
                                                names_to = "Expense",
                                                values_to = "Dollars")
ggplot(tet_expenditures_household_long,
                aes(x = Year, y = Dollars)) +
           geom_col(aes(fill = Expense)) +
           scale_fill_manual(values = colors) +
           custom_gg_theme

Detailed Average Individual Household Transportation Expenditures (Consumer Expenditure Survey)

“Calculations by U.S. Department of Transportation, Bureau of Transportation Statistics from U.S. Department of Labor, Bureau of Labor Statistics, Consumer Expenditure Survey microdata, available at www.bls.gov/cex” “Public transportation not on trips includes public transportation not taken as part of a trip or vacation. A trip or vacation includes trips to visit relatives or friends, business trips, recreational trips, other trips overnight or longer, and day trips of at least 75 miles away from home. Local transportation includes intracity mass transit and local for-hire taxicabs and ride-hailing services. Amounts are calculated by the Bureau of Transportation Statistics using public-use microdata and may differ slightly from amounts calculated using original data.”

tet_expenditures_transport <- tet |>
  filter(Table == "6.4")

Public and Private Investment in Transportation

tet_investment <- tet |>
  filter(Table == "8.6")

tet_investment_public <- tet_investment |>
  filter(Measure == "Total public investment in transportation")

tet_investment_public <- tet_investment |>
  filter(Measure == "Total private investment in transportation")

Public Investment in New Transportation Infastructure in Current Dollars (billions)

“U.S. Department of Commerce, Bureau of Economic Analysis, National Income and Product Account Tables, Real Private Fixed Investment in Structures by Type, table 5.4.5 and 5.4.6 (millions), Real Private Fixed Investment in Equipment by Type, table 5.5.5 and 5.5.6 (millions), and Real Gross Government Fixed Investment by Type, table 5.9.5 and table 5.9.6 (millions), available at https://www.bea.gov/iTable/index_nipa.cfm” “U.S. Department of Commerce, Bureau of Economic Analysis, National Income and Product Account Tables, Real Gross Government Fixed Investment by Type, table 5.9.5 and table 5.9.6 (millions), available at https://www.bea.gov/iTable/index_nipa.cfm”

tet_public_infa <- tet |>
  filter(Table == "8.2") |>
  filter(Metric == "Current dollars (billions)")

tet_public_infa_total <- tet_public_infa |>
  filter(Measure == "Total public investment in transportation")

# Add together for total investment
tet_public_infa_streets <- tet_public_infa |>
  filter(Measure == "Government investment in highways and streets: federal" |
         Measure == "Government investment in highways and streets: state and local") |>
  select(Year, Value) |>
  group_by(Year) |>
  summarise(Investment = sum(Value))

# Add together for total investment
tet_public_infa_other <- tet_public_infa |>
  filter(Measure == "Government investment in other transportation infrastructure: federal" |
         Measure == "Government investment in other transportation infrastructure: state and local") |>
  select(Year, Value) |>
  group_by(Year) |>
  summarise(Investment = sum(Value))

Private (Business) Investment in New Transportation Infrastructure and Equipment in Current Dollars (billions)

“U.S. Department of Commerce, Bureau of Economic Analysis, National Income and Product Account Tables, Real Private Fixed Investment in Structures by Type, table 5.4.5 and 5.4.6 (millions), Real Private Fixed Investment in Equipment by Type, table 5.5.5 and 5.5.6 (millions), and Real Gross Government Fixed Investment by Type, table 5.9.5 and table 5.9.6 (millions), available at https://www.bea.gov/iTable/index_nipa.cfm”

tet_private_infa <- tet |>
  filter(Table == "8.3") |>
  filter(Measure == "Total private investment in transportation") |>
  filter(Metric == "Current dollars (billions)")

Value of Construction Put in Place (in billions of dollars)

U.S. Department of Commerce, Census Bureau, Construction Spending Survey, available at https://www.census.gov/construction/c30/historical_data.html (Annual, Total

tet_value <- tet |>
  filter(Table == "8.4")

# Separate total and parts 
tet_value_total <- tet_value |>
  filter(Measure == "Total") |>
  select(Year, Value)

tet_value_private <- tet_value |>
  filter(Measure == "Private transportation construction") |>
  select(Year, Value)

tet_value_public <- tet_value |>
  filter(Measure == "Public air, land, and water transportation facilities") |>
  select(Year, Value)

tet_value_streets <- tet_value |>
  filter(Measure == "Public highway and streets") |>
  select(Year, Value) |>
  rename(`Value of Construction (billions of dollars)` = Value)

Regression Analysis

Is there a relationship between how much money the public invests in transportation infrastructure and the value of the construction of transportation infrastructure. One would hope so.

Linear Regression Model

# New dataframe to compare investment and value
tet_invest_value <- merge(tet_public_infa_streets, tet_value_streets, by = "Year")

# Linear model
fit1 <- lm(`Value of Construction (billions of dollars)` ~ Investment, data = tet_invest_value)
summary(fit1)

Call:
lm(formula = `Value of Construction (billions of dollars)` ~ 
    Investment, data = tet_invest_value)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.6843 -2.6454 -0.6806  3.0022  5.6728 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  9.67049    3.68975   2.621   0.0168 *  
Investment   0.85800    0.04273  20.080 2.96e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.22 on 19 degrees of freedom
Multiple R-squared:  0.955, Adjusted R-squared:  0.9526 
F-statistic: 403.2 on 1 and 19 DF,  p-value: 2.962e-14
# Save the predicted values
tet_invest_value$predicted <- predict(fit1)
tet_invest_value$residuals <- residuals(fit1)

Model equation:

Value of Construction = 0.85800(Investment) + 9.67049

For each additional billion dollars invested in street infrastructure, there is a predicted increase of 0.858 billions of dollars in value.

The p-value is very small, suggesting investment is a meaningful variable to predict value.

95.26% of the variation in the data is likely explained by this model. The adjusted R-squared is alarmingly high.

Diagnostic Plots

autoplot(fit1, 1:4, nrow = 2, ncol = 2) +
  custom_gg_theme

These plots suggest that a linear model may not be ideal for these variables. The points stray away from the the Normal Q-Q line. Furthermore, looking at the Residuals vs Fitted plot, the points form a peak as opposed to a straight line.

ggplot(tet_invest_value, aes(x = Investment, y = `Value of Construction (billions of dollars)`)) +
  geom_smooth(method = "lm", se = FALSE, color = "#C08226") +
  geom_segment(aes(xend = Investment, yend = predicted), alpha = 0.2) +
  geom_point(aes(color = residuals)) +
  scale_color_gradient2(low = "#122C40", mid = "white", high = "#962E02") +
  guides(alpha = FALSE, color = FALSE) +
  geom_point(aes(y = predicted), shape = 1) +
  custom_gg_theme
Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
of ggplot2 3.3.4.
`geom_smooth()` using formula = 'y ~ x'

The residuals here appear to form a curve.

Perhaps a local polynomial regression fitting is more suitable?

fit2 <- loess(`Value of Construction (billions of dollars)` ~ Investment, data = tet_invest_value)
# Save the predicted values

tet_invest_value$predicted2 <- predict(fit2)
tet_invest_value$residuals2 <- residuals(fit2)

ggplot(tet_invest_value, aes(x = Investment, y = `Value of Construction (billions of dollars)`)) +
  # Change method from "lm"
  geom_smooth(method = "loess", formula = "y ~ x", se = FALSE, color = "#C08226") +
  geom_segment(aes(xend = Investment, yend = predicted2), alpha = 0.2) +
  geom_point(aes(color = residuals2)) +
  scale_color_gradient2(low = "#122C40", mid = "white", high = "#962E02") +
  guides(alpha = FALSE, color = FALSE) +
  geom_point(aes(y = predicted2), shape = 1) +
  custom_gg_theme

The residuals of this model appear smaller on average than those of the previous linear model. While this model may not be ideal, it illustrates that a non-linear model may be more appropriate.

Regression Conclusions

More data is needed for this model to be robust; 21 data points makes it weak. Hopefully the BTS will continue to update this data set in the future. It is important for tax dollars to translate efficiently to valuable infrastructure. A stronger model could be used to monitor the value gained through public investment, to protect taxpayers from inefficient use of their contributions.

Rates and Rails

# Merge variables from different dataframes
tet_rates <- merge(tet_amtrak, tet_commuter, by = "Year") |>
  merge(min_wage_year, by = "Year") |>
  merge(tet_compensation_year, by = "Year") |>
  pivot_longer(cols = 2:7,
               names_to = "Wages/Rates/Fares",
               values_to = "Dollars")
highchart() |>
  hc_add_series(data = tet_rates,
                type = "line",
                hcaes(x = Year, y = Dollars, group = `Wages/Rates/Fares`)) |>
  hc_title(text = "Rates and Rails: Fares and the Wages Needed to Afford Them") |>
  hc_xAxis(title = list(text="Year")) |>
  hc_yAxis(title = list(text="Dollars, Hours, Dollars per Hour")) |>
  hc_caption(text = "Source: U.S. Bureau of Transportation Statistics") |>
  hc_add_theme(hc_theme(
  colors = c("#4A7F8A", "#312416", "#C08226", "#122C40",
             "#664729", "#31393A", "#962E02"),
  chart = list(
    backgroundColor = "transparent",
    divBackgroundImage = "https://i.ibb.co/RN6G92Y/background07-20240508042552.png"
  ),
  title = list(
    style = list(
      color = "#333333",
      fontFamily = "Arial"
    )
  ),
  subtitle = list(
    style = list(
      color = "#666666",
      fontFamily = "Arial"
    )
  ),
  caption = list(
    style = list(
      color = "#666666",
      fontFamily = "Arial"
    )
  ),
  legend = list(
    itemStyle = list(
      fontFamily = "Arial",
      color = "#333333"
    ),
    itemHoverStyle = list(
      color = "gray"
    )
  )
))

The plot above illustrates how six metrics have changed over time. Each of the three vertical lines demarcate a year in which the federal minimum wage was increased. While the Amtrak fare increase appears the sharpest, the number of hours needed to work at minimum wage to afford it has remained fairly stable. There is about a 15 minute difference between 2004 and 2022. The average hourly compensation for United States citizens demonstrates a steady increase. The federal minimum wage has stagnated since its last update in 2010. Average commuter rail fares have doubled between 2004 and 2022. The number of hours needed to work at minimum wage to afford it has remained below one.

A Shiny app version is available here.

Growth of Domestic Product

Gross Domestic Product Growth by Economic Sector

ggplotly(ggplot(tet_gdp_dollar, aes (x = Year, y = `GDP (billions of dollars)`, alluvium = Sector)) +
  geom_alluvium(aes(fill = Sector),
                width = 0.3,
                decreasing = FALSE,
                curve_type = "cubic",
                color = "white") +
    labs(title = "GDP Growth by Sector: Billions of Dollars over the Years",
         caption = "Source: U.S. Bureau of Transportation Statistics") +
    scale_fill_manual(values = colors) +
    custom_gg_theme) |>
# Plotly caption work-around
layout(annotations = list(x = 1, y = -0.1, text = "Source: U.S. Bureau of Transportation Statistics", 
      showarrow = F, xref='paper', yref='paper', 
      xanchor='right', yanchor='auto', xshift=-350, yshift=-2,
      font=list(size=8, color="#333333")),
      images = list(
    source =  "background03 (20240506113504).png",
    xref = "x",
    yref = "y",
    x = 0,
    y = 2,
    sizex = 100,
    sizey = 4,
    sizing = "stretch",
    opacity = 0.8,
    layer = "below"
  )

 )

The purpose of an alluvial chart is to observe changes in ranks and groupings over time. The above example illustrates the growth of the U.S. GDP from 1975 to 2022. The transportation and food sectors of the economy have oscillated between the third and fourth position from the top. Interestingly, it highlights that from around 1985 to 1991 the healthcare sector grew from the fourth largest, to the second largest, with a sizeable lead over transportation. It has remained at the second position ever since.

A Shiny app version is available here.

Expense None the Richer

Average Annual Household Expenditures

# Ensure the correct order
tet_expenditures_household_long$Expense <- factor(tet_expenditures_household_long$Expense,
                                                  levels = 
                                                    c("Apparel and Services",
                                                      "Health", 
                                                      "Insurance and Pensions", 
                                                      "Food",
                                                      "Transportation",
                                                      "Housing"))

highchart() |>
  hc_add_series(data = tet_expenditures_household_long,
                hcaes(x = Year, y = Dollars, group = Expense),
                type = "area") |>
  hc_chart(options3d = list(enabled = TRUE,
                            beta = 55, 
                            alpha = 7, 
                            depth = 360
                            )) |>
  hc_title(text = "Average Annual Household Expenditures over Time") |>
  hc_caption(text = "Source: U.S. Bureau of Transportation Statistics") |>
  hc_xAxis(title = list(text = "Year",
                        style = list(fontSize = 50)),
           gridLineColor = "#666666",
           labels = list(skew3d = TRUE,
                         position3d = "chart",
                         style = list(fontSize = 34),
                         step = 2)) |>
  hc_yAxis(title = list(text = "Dollars spent"),
           gridLineColor = "#666666") |>
  hc_tooltip(pointFormat = 
               "{point.Year}: ${point.Dollars}",
             headerFormat = " ") |>
  hc_plotOptions(area = list(depth = 60)) |>
  hc_add_theme(custom_hc_theme)

What is the average household spending money on? The area chart above serves to illustrate the mounting costs citizens face on an annual level. The most dramatic rise is in the amount of money being spent on housing. Americans have been paying more for their shelter every single year, with a sharp trend upward around 2020. While the pandemic caused a dip in how much was spent on food and transportation, the rate at which the cost of housing is growing has also risen.