Time Series Data Description, Exploratory Analysis, and Decomposition
The ‘Total Construction Spending: Residential in the United States’ time series data set is available on the Federal Reserve Economic Data (FRED) website. The data was collected by the U.S. Census Bureau and contains total residential construction spending, in millions of dollars, at a monthly frequency from January 2002 through May 2025. The series is not seasonally adjusted.
The data generating process consists of monthly estimates of the total dollar value of construction work done in the U.S., provided by the Value of Construction Put in Place Survey (VIP). Sources of variation in this series include the cost of raw materials, the demand for residential construction, and the overall state of the economy. With all of these factors playing a role, the series could be difficult to forecast.
# Assumed setup: fpp3 attaches dplyr, ggplot2, lubridate, tsibble, feasts, and fable.
library(fpp3)
library(slider)  # slide_dbl()
library(knitr)   # kable()

# `construction` is assumed to be loaded with columns Date and Construction_Spending.
construction <- construction %>%
  mutate(Month = yearmonth(Date)) %>%
  select(-Date) %>%
  as_tsibble(index = Month)

construction %>%
  ggplot(aes(x = Month, y = Construction_Spending)) +
  geom_line() +
  ggtitle("Residential Construction Spending Over Time") +
  xlab("Year Month") +
  ylab("Construction Spending")

The plot above shows residential construction spending over time. Before the financial crisis of 2008, spending tended to rise through the year and then fall around the winter. Once the financial crisis occurred, spending trended downward and did not start increasing again until around 2013. The pattern of rising spending with winter dips continued until around January 2020. From then through May 2025 the seasonal pattern is still apparent, but spending rose significantly above historical values.
construction %>%
  ggplot() +
  geom_boxplot(aes(" ", Construction_Spending)) +
  ggtitle("Boxplot of Construction Spending") +
  xlab("Construction Spending") +
  ylab("Value")

The box plot above indicates that the data is skewed to the right: the outliers, drawn as dots, all sit at the top of the plot. Because the outliers are large values, they pull the upper tail out, which is consistent with a right skew.
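The dots in a box plot come from a simple rule, which can be sketched in base R. The example below is a minimal illustration on a hypothetical sample (not the FRED series); it applies the 1.5 × IQR convention that `geom_boxplot()` uses by default and compares the mean with the median as a second, rough indicator of right skew.

```r
# Hypothetical right-skewed sample; these are NOT values from the FRED series.
spending <- c(20, 22, 25, 27, 30, 31, 33, 35, 38, 40, 85, 90)

# Quartiles and interquartile range.
q <- unname(quantile(spending, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Points beyond 1.5 * IQR from the quartiles are drawn as individual dots.
upper_fence <- q[2] + 1.5 * iqr
lower_fence <- q[1] - 1.5 * iqr

high_outliers <- spending[spending > upper_fence]
low_outliers <- spending[spending < lower_fence]

# Outliers only on the high side, and a mean above the median,
# both point toward a right skew.
print(high_outliers)
print(length(low_outliers))
print(mean(spending) > median(spending))
```

Applied to the real series, the same check would use `Construction_Spending` in place of the hypothetical `spending` vector.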
construction %>%
  ggplot() +
  geom_histogram(aes(Construction_Spending)) +
  ggtitle("Histogram of Construction Spending") +
  scale_x_continuous(breaks = seq(1000, 90000, 10000)) +
  xlab("Construction Spending") +
  ylab("Count")

The histogram above supports the conclusion that the data is right skewed. The data is not normally distributed, so a log or Box-Cox transformation of the variable could reduce the skew.
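To make the transformation idea concrete, here is a minimal base R sketch of the Box-Cox family, where `lambda = 0` corresponds to the natural log. The function names `box_cox_sketch` and `skewness_sketch` and the sample values are hypothetical, chosen only for illustration; they are not part of the original analysis.

```r
# Box-Cox transform: log(y) when lambda is 0, (y^lambda - 1) / lambda otherwise.
# Hypothetical helper for illustration; requires strictly positive values.
box_cox_sketch <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Moment-based sample skewness: positive values indicate a right skew.
skewness_sketch <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

# Hypothetical sample spaced evenly on the log scale, so it is
# right-skewed on the original scale and symmetric after a log transform.
spending <- exp(seq(9.5, 11.5, length.out = 50))

skew_raw <- skewness_sketch(spending)
skew_log <- skewness_sketch(box_cox_sketch(spending, lambda = 0))

print(skew_raw)  # positive: right-skewed
print(skew_log)  # approximately zero: symmetric
```

On real data the skew would rarely vanish completely; the point of the sketch is only that a log or Box-Cox transform compresses the upper tail.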
construction %>%
  ggplot() +
  geom_density(aes(Construction_Spending)) +
  ggtitle("Density Plot of Construction Spending") +
  xlab("Construction Spending") +
  ylab("Density")

The density plot above is a smoothed counterpart to the histogram: a single curve traces the estimated density in place of the bar heights. This plot further supports the conclusion that the data is right skewed. It also shows that the data is bimodal, with two peaks occurring around 32,500 and 80,000.
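# Construction_sum is not defined in this section; it is assumed to be a plain
# tibble copy of `construction` created earlier.
stats <- Construction_sum %>%
  summarise(
    Size = length(Construction_Spending),
    Mean = mean(Construction_Spending),
    Median = median(Construction_Spending),
    # range() returns both the minimum and the maximum, which produces two
    # rows inside summarise(); report them as separate columns instead.
    Min = min(Construction_Spending),
    Max = max(Construction_Spending),
    Stan_Dev = sd(Construction_Spending)
  )
kable(stats, format = "html", align = NULL)

| Size | Mean | Median | Min | Max | Stan_Dev |
|---|---|---|---|---|---|
| 281 | 44206.33 | 41530 | 16241 | 89189 | 18288.11 |

The above table gives the size of the time series and the mean, median, minimum, maximum, and standard deviation of residential construction spending, in millions of dollars.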
construction_ma <- construction %>%
  arrange(Month) %>%
  mutate(
    # Trailing window: the current month and the 12 months before it.
    ma_right = slide_dbl(Construction_Spending, mean, .before = 12, .after = 0, .complete = TRUE),
    # Centered window: 6 months on either side of the current month.
    ma_center = slide_dbl(Construction_Spending, mean, .before = 6, .after = 6, .complete = TRUE),
    # Leading window: the current month and the 12 months after it.
    ma_left = slide_dbl(Construction_Spending, mean, .before = 0, .after = 12, .complete = TRUE)
  )

construction_ma %>%
  ggplot() +
  geom_line(aes(Month, Construction_Spending), size = 1) +
  geom_line(aes(Month, ma_right), size = 1, color = "red") +
  geom_line(aes(Month, ma_center), size = 1, color = "blue") +
  geom_line(aes(Month, ma_left), size = 1, color = "green") +
  theme_bw() +
  ggtitle("Moving Averages of Construction Spending") +
  xlab("Year Month") +
  ylab("Construction Spending")

construction_ma %>%
  mutate(resid = Construction_Spending - ma_center) %>%
  ggplot() +
  geom_line(aes(Month, Construction_Spending)) +
  geom_line(aes(Month, ma_center), color = "blue") +
  geom_line(aes(Month, resid), color = "red") +
  ggtitle("Moving Average and Remainder of Residential Construction Spending") +
  xlab("Year Month") +
  ylab("Construction Spending")

The remainder, computed as the series minus the centered moving average, offers few additional insights into the data's patterns, but it does support the seasonal component described above.
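One detail worth illustrating is how the window length interacts with a 12-month season. A 13-month simple centered window counts one calendar month twice, so a little seasonality leaks into the smoothed line, whereas a 2×12 moving average (half weights on the two end months) is a common alternative for monthly data. The sketch below uses base R's `stats::filter()` on a hypothetical series with a known trend and a zero-sum seasonal pattern; none of the values come from the FRED data.

```r
# Hypothetical monthly series: linear trend plus a seasonal pattern summing to zero.
season_pattern <- c(-6, -4, -2, 0, 2, 4, 6, 4, 2, 0, -2, -4)
t_index <- 1:36
trend <- 100 + 2 * t_index
y <- trend + rep(season_pattern, times = 3)

# 13-month simple centered moving average (6 months on either side).
ma_13 <- as.numeric(stats::filter(y, rep(1 / 13, 13), sides = 2))

# 2x12 centered moving average: half weight on the first and last month.
w_2x12 <- c(0.5, rep(1, 11), 0.5) / 12
ma_2x12 <- as.numeric(stats::filter(y, w_2x12, sides = 2))

# Both filters leave the first and last 6 months undefined (NA).
interior <- 7:30

# Largest deviation from the true trend over the interior months.
err_13 <- max(abs(ma_13[interior] - trend[interior]))
err_2x12 <- max(abs(ma_2x12[interior] - trend[interior]))

print(err_13)    # nonzero: some seasonality remains
print(err_2x12)  # effectively zero: the season cancels exactly
```

For this idealized series the 2×12 average recovers the trend exactly; on real data the difference between the two windows is usually small but visible in the remainder.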
construction %>%
  model(classical_decomposition(Construction_Spending)) %>%
  components() %>%
  autoplot()

The classical decomposition above supports the seasonality argument: the seasonal component shows a strong, regular annual pattern.
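For readers who want to see what an additive classical decomposition produces, the sketch below uses base R's `decompose()` on a hypothetical monthly series with a known seasonal pattern. It is meant as an illustration of the idea rather than a reproduction of the `classical_decomposition()` call above, and the data are invented.

```r
# Hypothetical monthly series: linear trend plus a zero-sum seasonal pattern.
season_pattern <- c(-6, -4, -2, 0, 2, 4, 6, 4, 2, 0, -2, -4)
y <- 100 + 2 * (1:36) + rep(season_pattern, times = 3)
y_ts <- ts(y, start = c(2002, 1), frequency = 12)

# Additive classical decomposition: series = trend + seasonal + remainder.
fit <- decompose(y_ts, type = "additive")

# fit$figure holds the 12 estimated seasonal effects, January through December.
seasonal_effects <- as.numeric(fit$figure)
print(round(seasonal_effects, 6))

# The remainder is defined wherever the moving-average trend is defined.
remainder <- as.numeric(fit$random)
print(max(abs(remainder), na.rm = TRUE))
```

Because the hypothetical series has no noise, the estimated seasonal effects match the pattern that was built in and the remainder is essentially zero. With real data the remainder would carry whatever the trend and seasonal components do not explain.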
construction %>%
  model(STL(Construction_Spending)) %>%
  components() %>%
  autoplot()

The STL decomposition leads to the same conclusions as the classical decomposition.
construction %>%
  model(Naive = SNAIVE(Construction_Spending)) %>%
  forecast(h = 6) %>%
  autoplot(construction, level = NULL, size = 1) +
  # Dashed red line marks the last observed month (May 2025).
  geom_vline(aes(xintercept = ymd("2025-05-01")), color = "red", linetype = "dashed") +
  ggtitle("Six Month Forecast of Residential Construction Spending") +
  xlab("Year Month") +
  ylab("Construction Spending")

The six-month seasonal naive (SNAIVE) forecast follows the pattern present in the time series after 2020, with the value of residential construction spending decreasing in the winter months. This is expected, because a seasonal naive forecast repeats the value observed in the same month of the previous year. Starting in June 2025, the forecast remains relatively stable and then drops as winter approaches.
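The mechanics of a seasonal naive forecast are simple enough to write by hand. The sketch below is a minimal base R version, assuming a plain numeric vector of monthly observations; the function name `snaive_sketch` and the toy input are hypothetical and are not part of the fable API.

```r
# Seasonal naive point forecast: each future step reuses the observation
# from the same season in the most recent full cycle.
# Hypothetical helper for illustration; not the fable SNAIVE() implementation.
snaive_sketch <- function(y, h, period = 12) {
  n <- length(y)
  stopifnot(n >= period, h >= 1)
  last_season <- y[(n - period + 1):n]
  # Step k maps to position ((k - 1) mod period) + 1 within the last cycle.
  last_season[((seq_len(h) - 1) %% period) + 1]
}

# Toy series of 24 "months" so the mapping is easy to follow.
y <- 1:24

six_step <- snaive_sketch(y, h = 6)
print(six_step)

# Horizons longer than one cycle wrap around and repeat the same values.
fourteen_step <- snaive_sketch(y, h = 14)
print(fourteen_step)
```

With a series ending in May, the first forecast step is June, so it reuses the previous June's value, the second step reuses the previous July's value, and so on.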