BANA 7050: Assignment 2

Time Series Data Description, Exploratory Analysis, and Decomposition

Author

Andrew Grant

Section 1

The ‘Total Construction Spending: Residential in the United States’ time series is available from the Federal Reserve Economic Data (FRED) website. The data are collected by the U.S. Census Bureau and record total residential construction spending, in millions of dollars, at a monthly frequency from January 2002 through May 2025. The series is not seasonally adjusted.

The data generating process is the Value of Construction Put in Place Survey (VIP), which provides monthly estimates of the total dollar value of construction work done in the U.S. Sources of variation in the series include the cost of raw materials, the demand for residential construction, and the overall state of the economy. With all of these factors playing a role, the series could be difficult to forecast.

Section 2

Code
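# Convert the raw data to a monthly tsibble indexed by year-month
construction <- construction %>%
  mutate(Month = yearmonth(Date)) %>%  # monthly index built from the Date column
  select(-Date) %>%
  as_tsibble(index = Month)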
Code
construction %>%
  ggplot(aes(x=Month,y=Construction_Spending))+
  geom_line()+
  ggtitle("Residential Construction Spending Over Time")+
  xlab("Year Month")+
  ylab("Construction Spending")

The plot above shows residential construction spending over time. Prior to the financial crisis of 2008, spending tended to increase through the year and then decrease around the winter. After the financial crisis, spending followed a downward trend and did not start increasing again until around 2013. The pattern of rising spending with winter dips then continued until around January 2020. From then through May 2025 the seasonal pattern is still apparent, but spending rose significantly above historical values.

Code
construction %>%
  ggplot()+
  geom_boxplot(aes(" ",Construction_Spending))+
  ggtitle("Boxplot of Construction Spending ")+
  xlab("Construction Spending")+
  ylab('Value')

The box plot above indicates that the data are skewed to the right: the outliers, drawn as dots, all sit at the top of the plot. Because the outliers are large values, a right skew is what we would expect.

Code
construction %>%
  ggplot()+
  geom_histogram(aes(Construction_Spending))+
  ggtitle("Histogram of Construction Spending")+
  # axis breaks at round values: 10,000, 20,000, ..., 90,000
  scale_x_continuous(breaks=seq(10000,90000,10000))+
  xlab("Construction Spending")+
  ylab('Count')

The histogram above is consistent with a right-skewed distribution. The data are not normally distributed, so transforming the variable with a log or Box-Cox transformation could help mitigate this issue.
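The log and Box-Cox transformations mentioned above can be sketched in base R. This is only an illustration: the helper names `box_cox_transform` and `upper_to_lower` and the lambda values are assumptions, and the three inputs are the minimum, median, and maximum reported in the summary statistics of this section. In practice lambda would be estimated from the full series rather than picked by hand.

```r
# Box-Cox transform: log(y) when lambda is 0, otherwise (y^lambda - 1) / lambda.
# box_cox_transform is a hypothetical helper name, not a package function.
box_cox_transform <- function(y, lambda) {
  stopifnot(all(y > 0))  # the transform needs strictly positive values
  if (abs(lambda) < 1e-8) {
    log(y)
  } else {
    (y^lambda - 1) / lambda
  }
}

# Minimum, median, and maximum spending reported in this section
spending <- c(16241, 41530, 89189)

log_scale  <- box_cox_transform(spending, lambda = 0)    # plain log
sqrt_scale <- box_cox_transform(spending, lambda = 0.5)  # illustrative lambda

# Gap above the median divided by the gap below it.
# A ratio closer to 1 means the upper tail is stretched less.
upper_to_lower <- function(x) (x[3] - x[2]) / (x[2] - x[1])

upper_to_lower(spending)
upper_to_lower(sqrt_scale)
upper_to_lower(log_scale)
```

Comparing the three ratios gives a rough sense of how much each transformation pulls in the upper tail. It is not a substitute for re-plotting the histogram of the transformed series.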

Code
construction %>%
  ggplot()+
  geom_density(aes(Construction_Spending))+
  ggtitle("Density Plot of Construction Spending")+
  xlab("Construction Spending")+
  ylab('Density')

The density plot above is a smoothed, single-line version of the histogram: the curve estimates the density of the data rather than counting observations in bars. This plot further supports that the data are right skewed. It also shows that the data are bimodal, with two peaks around 32,500 and 80,000.

Code
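# Construction_sum is not defined in the code shown here; it is assumed to be
# a plain (non-tsibble) copy of the construction data.
stats <- Construction_sum %>%
  summarise(
    Size = length(Construction_Spending),
    Mean = mean(Construction_Spending),
    Median = median(Construction_Spending),
    # range() returns two values (min and max), which duplicated the summary row,
    # so the minimum and maximum are reported as separate columns instead
    Min = min(Construction_Spending),
    Max = max(Construction_Spending),
    Stan_Dev = sd(Construction_Spending)
  )

kable(stats, format = "html", align = NULL)
Size Mean Median Min Max Stan_Dev
281 44206.33 41530 16241 89189 18288.11

The above table gives the size of the time series and the mean, median, minimum, maximum, and standard deviation of residential construction spending. The mean exceeds the median, which is consistent with the right skew seen in the plots above.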

Section 3

Code
# Three 13-month moving averages that differ only in window alignment
construction_ma <- construction %>%
  arrange(Month) %>%
  mutate(
    # trailing window: current month and the 12 months before it
    ma_right = slide_dbl(Construction_Spending, mean, .before = 12, .after = 0, .complete = TRUE),
    # centered window: 6 months on either side of the current month
    ma_center = slide_dbl(Construction_Spending, mean, .before = 6, .after = 6, .complete = TRUE),
    # leading window: current month and the 12 months after it
    ma_left = slide_dbl(Construction_Spending, mean, .before = 0, .after = 12, .complete = TRUE)
  )

construction_ma %>%
  ggplot() +
  geom_line(aes(Month, Construction_Spending), linewidth = 1) +
  geom_line(aes(Month, ma_right), linewidth = 1, color = "red") +
  geom_line(aes(Month, ma_center), linewidth = 1, color = "blue") +
  geom_line(aes(Month, ma_left), linewidth = 1, color = "green") +
  theme_bw() +
  ggtitle("Moving Averages of Construction Spending") +
  xlab("Year Month") +
  ylab("Construction Spending")

Code
construction_ma%>%
  mutate(resid = Construction_Spending- ma_center)%>%
  ggplot()+
  geom_line(aes(Month,Construction_Spending))+
  geom_line(aes(Month, ma_center), color = "blue")+
  geom_line(aes(Month,resid), color = "red")+
  ggtitle("Moving Average and Remainder of Residential Construction Spending")+
  xlab("Year Month")+
  ylab("Construction Spending")

The remainder does not offer many additional insights into the patterns in the data, but it does support the seasonality described earlier in this analysis.
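The difference between the trailing, centered, and leading windows used in this section, and how the remainder is formed, can be seen on a toy series. The sketch below is base R only; `window_mean` is a hypothetical helper written for illustration (it is not the `slide_dbl` implementation), and the five values are made up rather than taken from the construction data.

```r
# Mean over a window of `before` past and `after` future points.
# Returns NA wherever the full window is not available, mirroring
# the effect of .complete = TRUE. window_mean is a hypothetical helper.
window_mean <- function(y, before, after) {
  n <- length(y)
  out <- rep(NA_real_, n)
  for (t in seq_len(n)) {
    lo <- t - before
    hi <- t + after
    if (lo >= 1 && hi <= n) {
      out[t] <- mean(y[lo:hi])
    }
  }
  out
}

# Hypothetical toy series, not the construction data
y <- c(10, 20, 60, 40, 50)

trailing <- window_mean(y, before = 2, after = 0)  # NA NA 30 40 50
centered <- window_mean(y, before = 1, after = 1)  # NA 30 40 50 NA
leading  <- window_mean(y, before = 0, after = 2)  # 30 40 50 NA NA

# Remainder: observed value minus the centered moving average
remainder <- y - centered                          # NA -10 20 -10 NA
```

All three windows produce the same averages, shifted in time. Only the centered version lines up with the month it summarizes, which is why the remainder above is computed against `ma_center`.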

Code
construction%>%
  model(classical_decomposition(Construction_Spending))%>%
  components()%>%
  autoplot()

The classical decomposition above supports the same conclusion: the seasonal component shows a strong, regular seasonal pattern.
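To make the three components concrete, here is a minimal base R sketch of the textbook additive procedure: a centered moving-average trend, seasonal indices averaged by position in the cycle, and whatever is left over as the remainder. It is not claimed to reproduce the internals of `classical_decomposition()`. The function name `classical_additive` and the quarterly toy series are assumptions made for illustration.

```r
# Textbook additive decomposition: y = trend + seasonal + remainder.
# classical_additive is a hypothetical helper; m is the seasonal period.
classical_additive <- function(y, m) {
  n <- length(y)
  # Trend: centered moving average (a 2 x m average when m is even)
  w <- if (m %% 2 == 0) c(0.5, rep(1, m - 1), 0.5) / m else rep(1, m) / m
  trend <- as.numeric(stats::filter(y, w, sides = 2))
  # Seasonal: average the detrended values at each position in the cycle
  season_id <- (seq_len(n) - 1) %% m + 1
  raw <- tapply(y - trend, season_id, mean, na.rm = TRUE)
  seasonal_index <- raw - mean(raw)  # shift the indices so they average to zero
  seasonal <- as.numeric(seasonal_index[season_id])
  data.frame(y = y, trend = trend, seasonal = seasonal,
             remainder = y - trend - seasonal)
}

# Hypothetical quarterly toy series: a linear trend plus a fixed seasonal pattern
toy <- 1:12 + rep(c(2, -1, -3, 2), times = 3)
dec <- classical_additive(toy, m = 4)
dec
```

Because the toy series is built from an exact linear trend and a fixed pattern, the recovered seasonal component matches the pattern and the remainder is zero wherever the trend is defined. Real data such as the construction series will leave a non-zero remainder.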

Code
construction%>%
  model(STL(Construction_Spending))%>%
  components()%>%
  autoplot()

The same conclusions can be drawn from the STL decomposition as the classical decomposition.

Section 4

Code
construction%>%
  model(Naive = SNAIVE(Construction_Spending))%>%
  forecast(h = 6)%>%
  autoplot(construction, level = NULL, size = 1)+
  geom_vline(aes(xintercept=ymd("2025-05-01")),color = "red", linetype="dashed")+
  ggtitle("Six Month Forecast of Residential Construction Spending")+
  xlab("Year Month")+
  ylab("Construction Spending")

The 6-month seasonal naive (SNAIVE) forecast sets each forecast month equal to the value observed in the same month of the previous year. As a result, the forecast stays at the higher level seen after 2020 and repeats the most recent seasonal shape. It makes sense that, starting in June 2025, the value remains relatively stable and then drops as the winter months approach.
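The seasonal naive rule itself is simple enough to sketch in base R. The function name `snaive_forecast` and the quarterly toy values below are hypothetical; the sketch only illustrates the rule of repeating the last observed value from the same season, not the internals of `SNAIVE()`.

```r
# Seasonal naive forecast: each future point repeats the most recent
# observation from the same season. snaive_forecast is a hypothetical helper.
snaive_forecast <- function(y, m, h) {
  n <- length(y)
  stopifnot(n >= m)                 # need at least one full season of history
  k <- (seq_len(h) - 1) %/% m + 1   # number of whole seasons to step back
  y[n + seq_len(h) - m * k]
}

# Hypothetical quarterly toy series (two years), not the construction data
y <- c(5, 7, 9, 6, 6, 8, 10, 7)

snaive_forecast(y, m = 4, h = 6)    # 6 8 10 7 6 8
```

With monthly data the period would be `m = 12`, so a 6-step forecast simply copies the corresponding six months from the final year of the series.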