My personal time series for this course is the Milk Price dataset. It shows the variation in the average price of fresh, whole, fortified milk per gallon in the United States over a few decades. The dataset is provided by the Federal Reserve Bank of St. Louis and published publicly on its FRED website: https://fred.stlouisfed.org/series/APU0000709112.
I expect the milk price series to follow increases in demand, to change with economic stability and the time of year, and to rise gradually overall because of inflation. These factors will drive variation in the milk price variable. I think it will be a hard variable to forecast because of the influence of politics and the strength of the economy. Many factors influence milk prices.
The visualization above is a line graph that shows the progression of milk prices over the decades. It shows that milk prices are trending upward despite the fluctuations.
### Boxplot
ggplot(milk, aes(x = factor(1), y = milk_price)) +
  geom_boxplot() +
  labs(title = 'Boxplot of Milk Prices', x = '', y = 'Milk Price') +
  theme_minimal() +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
The visualization above is a boxplot that shows the distribution of milk prices within the milk dataset. It shows the median, the interquartile range, the whiskers (which reach the maximum and minimum values when there are no outliers), and any outliers. The median is around 3.2, there are no major outliers, the maximum is around 4.2, and the minimum is around 2.5.
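The numbers a boxplot draws can also be read off directly. The sketch below uses base R's `boxplot.stats()` on a small made-up vector; the values are illustrative only and are not the milk data. With the real data, `prices` would be replaced by `milk$milk_price`.

```r
# Illustrative prices only; replace with milk$milk_price for the real data.
prices <- c(2.5, 2.7, 2.9, 3.1, 3.2, 3.3, 3.6, 3.9, 4.2)

# boxplot.stats() returns the five numbers the boxplot draws
# (lower whisker, lower hinge, median, upper hinge, upper whisker)
# plus any points that fall beyond the whiskers.
bs <- boxplot.stats(prices)

five_numbers <- setNames(
  bs$stats,
  c("lower_whisker", "lower_hinge", "median", "upper_hinge", "upper_whisker")
)

# Empty when no point lies more than 1.5 times the hinge spread beyond a hinge.
outliers <- bs$out

print(five_numbers)
print(outliers)
```

When `outliers` is empty, the whiskers coincide with the minimum and maximum of the data, which is the situation described for the milk prices.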
### Histogram
hist(milk$milk_price, breaks = 10, col = "skyblue", border = "black",
     main = "Histogram of Milk Prices", xlab = "Price", ylab = "Frequency")
The visualization above is a histogram that shows the frequency of each milk price range. It shows that milk prices are heavily concentrated between $2.60 and $3.60.
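How concentrated the prices are in a given band can be checked with a one-line proportion. This is a minimal sketch on made-up values (not the milk data); the band limits of 2.6 and 3.6 come from the text, and the 0.2-wide bins are an assumption chosen for illustration.

```r
# Illustrative prices only; replace with milk$milk_price for the real data.
prices <- c(2.5, 2.7, 2.9, 3.1, 3.2, 3.3, 3.6, 3.9, 4.2)

# Proportion of observations between $2.60 and $3.60 (inclusive).
in_band <- prices >= 2.6 & prices <= 3.6
share_in_band <- mean(in_band)

# Counts per bin, similar to what a histogram draws (bin width is an assumption).
bin_counts <- table(cut(prices, breaks = seq(2.4, 4.4, by = 0.2)))

print(share_in_band)
print(bin_counts)
```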
### Density Plot
ggplot(milk, aes(x = milk_price)) +
  geom_density(fill = "skyblue", color = "blue", alpha = 0.7) +
  labs(title = 'Density Plot of Milk Prices', x = 'Milk Prices') +
  theme_minimal()
The visualization above is a density plot that shows where the milk prices are most concentrated, as well as the skewness or symmetry of their distribution. It shows that milk prices are heavily concentrated between $2.60 and $3.60 and that the distribution is skewed to the right.
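Right skew can also be checked numerically rather than by eye. The sketch below defines a hypothetical helper, `sample_skewness()`, that computes moment-based skewness in base R, and applies it to made-up values (not the milk data).

```r
# Illustrative prices only; replace with milk$milk_price for the real data.
prices <- c(2.5, 2.6, 2.7, 2.8, 3.0, 3.2, 3.5, 3.9, 4.2)

# Moment-based sample skewness: the third central moment divided by
# the second central moment raised to the power 3/2.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  m2 <- mean(centered^2)
  m3 <- mean(centered^3)
  m3 / m2^(3 / 2)
}

skew <- sample_skewness(prices)

# A positive value indicates a longer right tail;
# a mean above the median points the same way.
print(skew)
print(mean(prices) > median(prices))
```

In the summary table for the milk data, the mean (3.200) sits slightly above the median (3.178), which is consistent with a mild right skew.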
### Milk Table
#### Summary Table
##### Milk Price
milk_price_summary <- summary(milk$milk_price)
##### Function to Calculate Mode
calculate_mode <- function(x) {
  uniq_x <- unique(x)
  uniq_x[which.max(tabulate(match(x, uniq_x)))]
}
##### Create the Table
summary_table <- data.frame(
  Mean = mean(milk$milk_price),
  Median = median(milk$milk_price),
  Mode = calculate_mode(milk$milk_price),
  Standard_Deviation = sd(milk$milk_price),
  Range = diff(range(milk$milk_price)),
  Minimum = min(milk$milk_price),
  Maximum = max(milk$milk_price)
)
##### Print the nicely formatted summary table with a title
kable(summary_table, caption = "Milk Price Summary") %>%
  kable_styling(full_width = FALSE)
Milk Price Summary

| Mean | Median | Mode | Standard_Deviation | Range | Minimum | Maximum |
|---|---|---|---|---|---|---|
| 3.200443 | 3.178 | 2.666 | 0.4241547 | 1.759 | 2.459 | 4.218 |
After a complete initial analysis of the milk data, there are no clear outliers. All values fall within a plausible range and are consistent with the ebbs and flows of milk prices. The only somewhat questionable increase in milk prices occurred in 2005, but I would not classify it as a clear outlier. I have also noticed that milk prices are increasing overall as the years go on, despite the fluctuations. Milk prices are the highest they have ever been, and I expect them to continue to increase overall as time goes on.
Section 3
### Moving Average and Remainder Visualizations
library(zoo)  # provides rollmean()
#### Date Conversion
milk$date <- as.Date(milk$date)
#### Sorting of Data
milk <- milk[order(milk$date), ]
#### Moving Average
window_size <- 7
milk$moving_average <- rollmean(milk$milk_price, k = window_size, fill = NA)
#### Visualization of Moving Average (linewidth replaces the deprecated size aesthetic for lines)
ggplot(milk, aes(x = date)) +
  geom_line(aes(y = milk_price), color = 'blue', linetype = 'solid', linewidth = 1, alpha = 0.8) +
  geom_line(aes(y = moving_average), color = 'green', linetype = 'solid', linewidth = 1) +
  labs(title = 'Time Series and Moving Average Visualization', x = 'Date', y = 'Milk Price') +
  theme_minimal()
After comparing the remainder visualization to the moving average visualization, I noticed a larger-than-normal difference between the original time series and the moving average around 2005 that was not as clear in the moving average visualization.
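The remainder referred to here is the original series minus its moving average. A minimal sketch of that calculation follows, run on a simulated series rather than the milk data. It uses base R's `stats::filter()` for the centered 7-point average, which for an odd window matches a centered rolling mean padded with `NA`. The 2-standard-deviation threshold for flagging unusual months is an assumption for illustration, not something taken from the analysis above.

```r
# Simulated monthly series for illustration only (not the milk data).
set.seed(1)
dates <- seq(as.Date("2000-01-01"), by = "month", length.out = 36)
price <- 3 + 0.01 * seq_along(dates) + rnorm(length(dates), sd = 0.05)
toy <- data.frame(date = dates, milk_price = price)

# Centered 7-point moving average; the first and last 3 values are NA.
window_size <- 7
toy$moving_average <- as.numeric(
  stats::filter(toy$milk_price, rep(1 / window_size, window_size), sides = 2)
)

# Remainder: what is left after removing the smoothed level.
toy$remainder <- toy$milk_price - toy$moving_average

# Flag months whose remainder is unusually large (assumed cutoff: 2 SDs).
threshold <- 2 * sd(toy$remainder, na.rm = TRUE)
unusual <- toy[!is.na(toy$remainder) & abs(toy$remainder) > threshold, ]

plot(toy$date, toy$remainder, type = "h",
     main = "Remainder (Price minus Moving Average)",
     xlab = "Date", ylab = "Remainder")
abline(h = c(-threshold, threshold), lty = 2)
```

Applied to the milk data, rows in `unusual` would point to periods, such as the one around 2005, where the price departs most from its smoothed level.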
### Determining Seasonality Using Time Series Decomposition
ts_data <- ts(milk$milk_price, frequency = 12)
decomposition <- decompose(ts_data)
autoplot(decomposition)
Since the repeating pattern in the seasonal panel is prominent across time, there appears to be strong seasonality. This does not match my expectations, because I did not expect seasonality in milk prices. In retrospect, however, it makes sense: certain periods of the year have heavier milk usage than others.
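Because `decompose()` always returns a repeating seasonal component, a numeric check can help judge how strong the seasonality really is. One commonly used measure compares the variance of the remainder with the variance of the seasonal component plus the remainder. The sketch below applies it to a simulated monthly series (not the milk data); the trend, amplitude, and noise level are assumptions chosen only to make the example run.

```r
# Simulated monthly series for illustration only (not the milk data):
# a mild trend, a 12-month cycle, and noise.
set.seed(42)
n <- 120
month_index <- seq_len(n)
toy_ts <- ts(
  3 + 0.005 * month_index + 0.10 * sin(2 * pi * month_index / 12) +
    rnorm(n, sd = 0.05),
  frequency = 12
)

dec <- decompose(toy_ts)

# Strength of seasonality: 1 - Var(remainder) / Var(seasonal + remainder),
# floored at 0. Values near 1 suggest strong seasonality, near 0 weak.
keep <- !is.na(dec$random)  # decompose() leaves NAs at both ends
seasonal_strength <- max(
  0,
  1 - var(dec$random[keep]) / var(dec$seasonal[keep] + dec$random[keep])
)

# Peak-to-trough size of the seasonal component, in the units of the series.
seasonal_range <- diff(range(dec$seasonal))

print(seasonal_strength)
print(seasonal_range)
```

For the milk data, `toy_ts` would be replaced by `ts_data`. Comparing `seasonal_range` with the overall price range (1.759 in the summary table) gives a sense of how much of the movement is seasonal.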
Section 4
### Naive Forecast
#### Manually Create the Naive Forecast
last_price <- tail(milk$milk_price, 1)
last_date <- tail(milk$date, 1)
#### Forecast dates are the 6 months after the last observation
naive_forecast <- data.frame(
  date = seq(last_date, by = "month", length.out = 7)[-1],
  milk_price = rep(last_price, 6)
)
#### Plot Original Time Series (x-axis extended so the forecast is visible)
plot(milk$date, milk$milk_price, type = "l",
     xlim = range(c(milk$date, naive_forecast$date)),
     main = "Naive Forecast of Milk Prices", ylab = "Milk Price", xlab = "Date")
#### Add the Naive Forecast Line, starting from the last observation
lines(c(last_date, naive_forecast$date), c(last_price, naive_forecast$milk_price), col = "red")
#### Add Legend
legend("topleft", legend = c("Original", "Naive Forecast"), col = c("black", "red"), lty = 1)
### Naive Forecast with Drift
#### Manually Create a Drift Model (linear trend of price on date)
drift_lm <- lm(milk_price ~ as.numeric(date), data = milk)
#### Make Predictions Using the Drift Model
milk$pred <- predict(drift_lm, newdata = milk)
#### Naive Forecast with Drift (repeats the last fitted trend value, so the forecast line is flat)
naive_forecast_drift <- rep(tail(milk$pred, 1), 6)
#### Sequence of Dates (the 6 months after the last observation)
forecast_dates <- seq(tail(milk$date, 1), by = "month", length.out = 7)[-1]
#### Plot (x-axis extended so the forecast is visible)
plot(milk$date, milk$milk_price, type = "l",
     xlim = range(c(milk$date, forecast_dates)),
     main = "Naive Forecast with Drift for Milk Prices", ylab = "Milk Price", xlab = "Date")
#### Add the Naive Forecast with Drift Line
lines(c(tail(milk$date, 1), forecast_dates), c(tail(milk$milk_price, 1), naive_forecast_drift), col = "red")
#### Add Legend
legend("topleft", legend = c("Original", "Naive Forecast with Drift"), col = c("black", "red"), lty = 1)
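The code above holds the forecast flat at the last fitted value of a linear trend. For comparison, the textbook drift method starts from the last observed value and adds the average historical change at each step, which is equivalent to extending the line through the first and last observations. A minimal sketch on made-up values (not the milk data) follows; it is offered as an alternative formulation, not as the method used in this report.

```r
# Illustrative prices only; replace with milk$milk_price for the real data.
prices <- c(3.00, 3.10, 3.05, 3.20, 3.30)
horizon <- 6

# Naive forecast: repeat the last observed value.
naive_fc <- rep(tail(prices, 1), horizon)

# Drift forecast: start from the last observed value and add the average
# historical change, (last - first) / (n - 1), for each step ahead.
n <- length(prices)
average_change <- (prices[n] - prices[1]) / (n - 1)
drift_fc <- prices[n] + seq_len(horizon) * average_change

print(naive_fc)
print(drift_fc)
```

Unlike the flat line produced above, `drift_fc` keeps rising (or falling) with the forecast horizon, so the two approaches can lead to different conclusions when compared against the plain naive forecast.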
Based on the two forecast plots above, I believe that a naive forecast would do better than a naive forecast with drift. Given the dynamic nature of milk prices, the naive forecast does a decent job of representing the recent behavior of the data. However, because so many factors influence the price of milk, I do not believe that naive forecasting does a good job overall: too many factors and variables are left out for it to be a viable option. Something major appears to be missing, and I would recommend further investigation into the matter. As stated at the start of this assignment, milk prices are simply very volatile.
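One way to put the comparison on firmer ground is to hold out the last few months, forecast them with each method, and compare the errors. The sketch below does this on a simulated series (not the milk data). The 6-month holdout and the use of mean absolute error are assumptions made for illustration, and the drift forecast here is the textbook version based on the first and last training values.

```r
# Simulated monthly prices for illustration only (not the milk data).
set.seed(7)
prices <- 3 + cumsum(rnorm(60, mean = 0.01, sd = 0.05))

# Hold out the last 6 observations as a test set.
horizon <- 6
n_train <- length(prices) - horizon
train <- prices[seq_len(n_train)]
test <- prices[(n_train + 1):length(prices)]

# Naive: repeat the last training value.
naive_fc <- rep(train[n_train], horizon)

# Drift: extend the line through the first and last training values.
drift_fc <- train[n_train] +
  seq_len(horizon) * (train[n_train] - train[1]) / (n_train - 1)

# Mean absolute error of each method on the held-out months.
mae <- c(
  naive = mean(abs(test - naive_fc)),
  drift = mean(abs(test - drift_fc))
)
print(mae)
```

With the real data, `prices` would be `milk$milk_price` sorted by date. The method with the lower error on the held-out months would be the better of the two simple baselines for this series.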