Set working directory
Read Dataset
wpi <- read.csv("Wholesal_price_index.csv")
Load Libraries
library(tseries)
library(Hmisc)
library(psych)
library(ggplot2)
View the Structure of the dataset
str(wpi)
## 'data.frame': 113 obs. of 2 variables:
## $ Monthly : chr "2012M04" "2012M05" "2012M06" "2012M07" ...
## $ wholesale.price.index: num 105 105 105 106 107 ...
View(wpi)
Check for missing values
- Identify the rows with missing values.
missing_row_index <- which(is.na(wpi$wholesale.price.index))
print(missing_row_index)
## [1] 101
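The document does not show how row 101 was handled. The training series printed further down shows 114.3 at position 101, which equals the median reported by boxplot.stats, so the gap may have been filled with the median. The following is a minimal sketch of that approach under that assumption; the vector `toy` and its values are made up for illustration.

```r
# Hypothetical toy vector standing in for wpi$wholesale.price.index
toy <- c(119.3, 121.0, NA, 122.9, 123.6)

# Locate the missing entries, as in the check above
missing_idx <- which(is.na(toy))

# Replace each missing entry with the median of the observed values
toy_filled <- toy
toy_filled[missing_idx] <- median(toy, na.rm = TRUE)

print(missing_idx)
print(toy_filled)
```

For a trending series, a global median can sit far from the neighbouring values, so interpolating between neighbours (for example with `approx()`) may be worth considering instead.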
Create a Time series plot
- Convert the column to a numeric vector
- Create a time series object
- Plot the time series
wpi_numeric <- as.numeric(wpi$wholesale.price.index)
x <- ts(wpi_numeric, start = c(2012, 4), frequency = 12, end = c(2021, 8))
ts.plot(x, main = "Wholesale Price Index", xlab = "Year", ylab = "Percentage")

- From 2012 to 2021, the wholesale price index went through several
distinct phases. It rose gradually from about 105 in April 2012 to about
117 in mid-2014, indicating higher costs for products and services. It
then fell to about 107 by early 2016, a period of relative affordability
for firms. From 2016 onwards the index climbed again, most steeply from
mid-2020, reaching 135.9 in August 2021. Rising wholesale costs of this
kind can affect profitability and pricing strategies, so enterprises
should watch wholesale price variations regularly in order to adjust
their strategies and remain competitive in changeable market
conditions.
Identify outliers using a boxplot
boxplot(x)

boxplot.stats(x)
## $stats
## [1] 104.7 110.8 114.3 120.1 133.7
##
## $n
## [1] 113
##
## $conf
## [1] 112.9177 115.6823
##
## $out
## [1] 134.5 135.9
The boxplot and statistical summary describe the distribution of
the wholesale price index. Here is the interpretation:
- The median wholesale price index, 114.3, is a key reference
point for studying the price distribution over time.
- The interquartile range (IQR) spans 110.8 to 120.1, a width of 9.3
index points, and covers the middle 50% of the data.
- The values 134.5 and 135.9 are flagged as outliers. They are the two
most recent observations (July and August 2021), so they reflect the
recent upward trend rather than isolated anomalies, and they do not have
a significant impact on the analysis.
An examination like this can aid businesses in the
following ways:
- Understanding the average range of wholesale prices aids in
developing competitive pricing strategies and assessing price variations
in relation to industry standards.
- Monitoring outliers assists in identifying unexpected market
situations or interruptions, allowing for rapid adjustments to supply
chain or pricing plans.
- Understanding the central tendency and spread makes it easier to
estimate future price fluctuations, allowing you to make more educated
judgements about inventory management, production planning, and
financial projections.
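The outlier flags can be reproduced by hand from the printed summary. The sketch below assumes the default boxplot rule (points more than 1.5 times the IQR beyond the hinges are flagged) and uses only the numbers shown in the output above.

```r
# Five-number summary as printed under $stats above
stats <- c(104.7, 110.8, 114.3, 120.1, 133.7)
q1 <- stats[2]
q3 <- stats[4]
iqr <- q3 - q1                  # width of the middle 50% of the data

# Fences at 1.5 * IQR beyond the hinges (the default coef in boxplot.stats)
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

flagged <- c(134.5, 135.9)      # values printed under $out
print(c(iqr = iqr, lower = lower_fence, upper = upper_fence))
print(flagged > upper_fence)    # both values lie above the upper fence
```

The upper whisker value of 133.7 sits just below the upper fence, which is why it is not flagged while the two later observations are.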
Autocorrelation Function (ACF) and Partial Autocorrelation Function
(PACF) Plots
# type takes a single value; "correlation" is the default
acf(x, lag.max = 200, type = "correlation", plot = TRUE, na.action = na.fail, demean = TRUE)

- The chart indicates that the wholesale price index is positively
correlated with its own recent values. This means that firms can use
recent levels to anticipate what will happen in the near future.
However, the association weakens as we look further back in time. As a
result, relying too much on old data to forecast prices may prove
ineffective. It serves as a reminder to firms to be adaptable and make
decisions based on current data.
- The term “lag” refers to how far back in time we look when
comparing earlier observations with recent ones. A lag of 0 compares
each observation with itself, a lag of 1 compares each observation with
the one right before it, and so on. The autocorrelation function (ACF)
measures how strongly each observation in a time series is associated
with its previous values at each lag. An “initial positive correlation”
therefore means a positive association between observations and their
immediate predecessors: when the index was high last month, it tends to
be high this month.
- However, as the lag grows, this correlation declines, implying that
older data is less valuable for predicting present prices. A slow,
gradual decline is also typical of a trending, non-stationary series.
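To make the definition of lag concrete, the sketch below computes the sample autocorrelation by hand and compares it with `acf()`. It uses the first ten index values as they appear in the printed training data; the helper name `manual_acf` is introduced here for illustration only.

```r
# First ten values of the series, copied from the printed training data
y <- c(104.7, 105.3, 105.3, 106.2, 106.9, 107.6, 107.4, 107.3, 107.1, 108.0)

# Sample autocorrelation at lag k: covariance between the series and its
# k-step-shifted copy, divided by the overall variance
manual_acf <- function(series, k) {
  n <- length(series)
  centred <- series - mean(series)
  sum(centred[(k + 1):n] * centred[1:(n - k)]) / sum(centred^2)
}

lags <- 0:3
by_hand <- sapply(lags, function(k) manual_acf(y, k))
built_in <- as.numeric(acf(y, lag.max = 3, plot = FALSE)$acf)

print(round(rbind(by_hand, built_in), 4))
```

The lag 0 value is always 1, since each observation is compared with itself.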
pacf(x, lag.max = 50)

- The partial autocorrelation function (PACF) measures the correlation
between the series and each lag after removing the effect of the shorter
lags. This helps identify the lags that have a direct impact on the
present value, which is useful when choosing the order of a forecasting
model. Here the PACF cuts off after lag 1, but we cannot yet use it to
decide which model to use because the series has not been shown to be
stationary.
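Whether a PACF spike counts as a cut-off is usually judged against the dashed band drawn on the plot. The sketch below assumes the usual white-noise approximation for that band (about 1.96 divided by the square root of the series length) and demonstrates it on a simulated random walk, not on the WPI data. The helper name `significant_pacf_lags` is hypothetical.

```r
# Lags whose sample PACF falls outside the approximate 95% white-noise band
significant_pacf_lags <- function(series, lag.max = 20) {
  p <- pacf(series, lag.max = lag.max, plot = FALSE)
  limit <- qnorm(0.975) / sqrt(length(series))
  which(abs(as.numeric(p$acf)) > limit)   # pacf lags start at 1, so index = lag
}

n <- 113                                  # number of observations in x
band <- qnorm(0.975) / sqrt(n)
print(round(band, 3))

set.seed(1)
random_walk <- cumsum(rnorm(n))           # simulated series, NOT the WPI data
flagged <- significant_pacf_lags(random_walk)
print(flagged)
```

With 113 observations the band is roughly plus or minus 0.18, so only spikes larger than that in absolute value would be treated as significant.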
adf.test(x)
##
## Augmented Dickey-Fuller Test
##
## data: x
## Dickey-Fuller = -1.1302, Lag order = 4, p-value = 0.9142
## alternative hypothesis: stationary
- The Augmented Dickey-Fuller (ADF) test checks whether a time series
is stationary, that is, whether its statistical properties such as the
mean and variance stay stable over time. The null hypothesis is that the
series has a unit root (is non-stationary). Here the Dickey-Fuller
statistic is -1.1302 and the p-value is 0.9142, far above the usual 0.05
significance level, so we cannot reject the null hypothesis. There is no
evidence that the series is stationary, which is consistent with the
changing trends seen in the time series plot. The series therefore needs
to be transformed, for example by differencing, before it is
modelled.
Differencing: from non-stationary to stationary
dx <- diff(x, 1)
ts.plot(dx)

- The diff() function in R calculates the differences between
consecutive observations, a standard way to remove a trend from a
non-stationary series. The plot shows these month-to-month changes from
2012 to 2021. A difference of -5 means that the index fell by 5 points
compared with the prior observation, a difference of 0 denotes no
change, and a difference of 5 means that the index rose by 5
points.
- The plot shows that the differences fluctuate around a roughly
constant level rather than following a trend. Whether the differenced
series is stationary is checked formally with the ADF test that
follows.
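As a small worked example of first differencing, the sketch below applies `diff()` to the first five index values from the printed data and confirms that it matches subtracting each value from its successor.

```r
# First five index values, copied from the printed training data
first_vals <- c(104.7, 105.3, 105.3, 106.2, 106.9)

# diff() with lag 1 subtracts each value from the one that follows it
by_diff <- diff(first_vals, lag = 1)
by_hand <- first_vals[-1] - first_vals[-length(first_vals)]

print(by_diff)

# Differencing is reversible: the first value plus the cumulative changes
# rebuilds the original series
rebuilt <- cumsum(c(first_vals[1], by_diff))
print(rebuilt)
```

The differenced series is one observation shorter than the original, so 113 index values yield 112 month-to-month changes.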
adf.test(dx)
## Warning in adf.test(dx): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: dx
## Dickey-Fuller = -4.4256, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
- The Dickey-Fuller statistic (-4.4256) is calculated using the data differences (dx).
- The lag order, or the number of lags utilized in the test, is four.
- The p-value for the test is reported as 0.01.
The warning indicates that the true p-value is smaller than the
printed value of 0.01. This happens when the p-value is very low,
indicating strong evidence against the null hypothesis. In this
scenario, because the p-value is below the usual significance level of
0.05, we reject the null hypothesis that the series is non-stationary.
Instead, we conclude that the differenced series is stationary, which
means that it retains consistent statistical properties over
time.
For organisations, this result suggests that the month-to-month
changes (dx) behave consistently over time, which makes them suitable
for modelling and forecasting. It also indicates that a single round of
differencing is enough, which corresponds to d = 1 in an ARIMA
model.
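The two ADF results can be read with the same decision rule. The sketch below applies it to the p-values printed above; the function name `adf_decision` and the 0.05 threshold are choices made here for illustration.

```r
# Decision rule for the ADF test: the null hypothesis is a unit root
# (non-stationary), so a small p-value is evidence of stationarity
adf_decision <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) {
    "reject null: evidence of stationarity"
  } else {
    "fail to reject null: no evidence of stationarity"
  }
}

# p-values as printed by adf.test() for the original and differenced series
p_values <- c(x = 0.9142, dx = 0.01)
decisions <- sapply(p_values, adf_decision)
print(decisions)
```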
Create training dataset
x1<-x[1:110]
x1
## [1] 104.7 105.3 105.3 106.2 106.9 107.6 107.4 107.3 107.1 108.0 108.4 108.6
## [13] 108.6 108.6 110.1 111.2 112.9 114.3 114.6 114.3 113.4 113.6 113.6 114.3
## [25] 114.1 114.8 115.2 116.7 117.2 116.4 115.6 114.1 112.1 110.8 109.6 109.9
## [37] 110.2 111.4 111.8 111.1 110.0 109.9 110.1 109.9 109.4 108.0 107.1 107.7
## [49] 109.0 110.4 111.7 111.8 111.2 111.4 111.5 111.9 111.7 112.6 113.0 113.2
## [61] 113.2 112.9 112.7 113.9 114.8 114.9 115.6 116.4 115.7 116.0 116.1 116.3
## [73] 117.3 118.3 119.1 119.9 120.1 120.9 122.0 121.6 119.7 119.2 119.5 119.9
## [85] 121.1 121.6 121.5 121.3 121.5 121.3 122.0 122.3 123.0 123.4 122.2 120.4
## [97] 119.2 117.5 119.3 121.0 114.3 122.9 123.6 125.1 125.4 126.5 128.1 129.9
## [109] 132.0 132.9
ARIMA Model
result <- arima(x1, order = c(1,1,0))
tsdiag(result)

The ARIMA(1,1,0) model is fitted to the first 110 observations and
helps in understanding and predicting wholesale price variations over
time. The diagnostic plots tell us the following:
Standardized residuals: these are the scaled differences between
actual prices and the model's fitted values. Ideally, they should be
random and near zero. While most of them are, some deviate far from
zero, indicating that the model does not fit every period well.
ACF of Residuals: this checks whether any autocorrelation remains
in the differences between fitted and actual prices. We found a
noticeable pattern at about every five months, indicating that the model
is not capturing all of the relevant information.
Ljung-Box Statistic: this tests whether the residual
autocorrelations up to a given lag are jointly zero. Low p-values at
some lags point to remaining structure, indicating that the model may be
missing some key components.
While the ARIMA model provides a fair overall picture of how
wholesale prices vary over time, these plots show that it overlooks some
structure in the data. This suggests that the model may need to be
modified to produce more accurate forecasts.
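The Ljung-Box panel can also be obtained as a single test with `Box.test()`. The sketch below runs it on a simulated series, not on the WPI data, and assumes `fitdf = 1` to account for the one estimated AR coefficient; the choice of 10 lags is arbitrary.

```r
set.seed(42)

# Simulated stand-in for the training series (NOT the WPI data):
# an AR(1) process accumulated into an integrated series
sim <- cumsum(arima.sim(model = list(ar = 0.5), n = 110))

fit <- arima(sim, order = c(1, 1, 0))

# Ljung-Box test on the residuals; fitdf removes one degree of freedom
# for the fitted AR coefficient
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)
print(lb$p.value)
```

A p-value above 0.05 would suggest that no significant autocorrelation remains in the residuals up to the chosen lag; a value below it would point to structure the model has missed.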
Testing the model
predict(result, 3)
## $pred
## Time Series:
## Start = 111
## End = 113
## Frequency = 1
## [1] 132.8450 132.8483 132.8481
##
## $se
## Time Series:
## Start = 111
## End = 113
## Frequency = 1
## [1] 1.373933 1.884582 2.286675
- The ARIMA model forecasts the wholesale price index for the
following three time periods (observations 111 to 113) and includes an
uncertainty measure for each prediction. The estimated wholesale price
index values are 132.85, 132.85, and 132.85, with standard errors of
1.37, 1.88, and 2.29, respectively. The forecasts are almost flat,
staying close to the last training value of 132.9, while the standard
errors grow with the horizon, reflecting increasing uncertainty.
Forecasts of this kind help businesses anticipate probable wholesale
price variations when making decisions about manufacturing, pricing,
inventory management, and budgeting. The standard errors allow
organisations to assess the trustworthiness of these forecasts and to
make informed decisions in the face of uncertainty.
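The three forecasts are nearly identical because an ARIMA(1,1,0) model without drift forecasts each new value as the previous value plus a fraction of the previous change. The sketch below reproduces that recursion. The AR coefficient is not printed in the document, so the value of `phi` used here is an assumption, backed out from the first reported forecast and the last two training values.

```r
# Last two training observations, copied from the printed series
y_prev <- 132.0
y_last <- 132.9

# ASSUMPTION: phi is inferred from the first reported forecast (132.8450),
# because the fitted coefficient is not shown in the document
phi <- (132.8450 - y_last) / (y_last - y_prev)

# ARIMA(1,1,0) point forecasts: next = last + phi * (last - previous)
forecast_arima110 <- function(y_prev, y_last, phi, h) {
  out <- numeric(h)
  for (i in seq_len(h)) {
    y_next <- y_last + phi * (y_last - y_prev)
    out[i] <- y_next
    y_prev <- y_last
    y_last <- y_next
  }
  out
}

preds <- forecast_arima110(y_prev, y_last, phi, 3)
print(round(phi, 3))
print(round(preds, 3))
```

Because the inferred coefficient is small in magnitude, the successive changes shrink quickly and the forecasts settle at a constant level, so this specification cannot follow a sustained rise in the index.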
x[111:113]
## [1] 133.7 134.5 135.9
- The actual wholesale price index values for these three periods
(June to August 2021) are 133.7, 134.5, and 135.9, respectively.
Compared with the forecasts (132.85, 132.85, and 132.85), the model
underestimates all three values, by roughly 0.9, 1.7, and 3.1 index
points: the flat forecast misses the continuing rise. Each actual value
still lies within two standard errors of its forecast. This comparison
gives enterprises a way to assess the forecasting model’s reliability
and adjust their strategy and operations accordingly. It emphasizes the
significance of constantly evaluating and refining forecasting processes
in order to improve decision-making and respond to market changes.
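The comparison of actual and forecast values can be summarised with standard error measures. The sketch below uses only the numbers printed above; the choice of MAE and RMSE as summary measures is an addition made here for illustration.

```r
# Actual values, forecasts and standard errors, as printed above
actual    <- c(133.7, 134.5, 135.9)
predicted <- c(132.8450, 132.8483, 132.8481)
se        <- c(1.373933, 1.884582, 2.286675)

errors <- actual - predicted           # positive = model under-forecasts
mae    <- mean(abs(errors))            # mean absolute error
rmse   <- sqrt(mean(errors^2))         # root mean squared error
z      <- errors / se                  # error in units of standard error

print(round(errors, 2))
print(round(c(mae = mae, rmse = rmse), 2))
print(round(z, 2))
```

All three errors are positive and grow with the horizon, which is the pattern expected when a flat forecast is compared with a rising series.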
Forecasting Wholesale Price Index
result_1 <- arima(x, order = c(1,1,0))
predict(result_1, 3)
## $pred
## Sep Oct Nov
## 2021 135.8319 135.8352 135.8350
##
## $se
## Sep Oct Nov
## 2021 1.366955 1.886703 2.293286
- The forecasted wholesale price index values for September,
October, and November 2021 are 135.83, 135.84, and 135.84, respectively,
with standard errors of 1.37, 1.89, and 2.29. As with the test
forecasts, the values stay close to the last observation (135.9) and the
uncertainty widens with the horizon. These projections help
decision-makers adjust price plans, inventory levels, and resource
allocation, but they should be read together with their standard errors
and updated as new data arrives.
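The standard errors can be turned into approximate prediction intervals. The sketch below assumes normally distributed forecast errors, which the document does not verify, and uses the forecasts and standard errors printed above.

```r
# Forecasts and standard errors for Sep to Nov 2021, as printed above
pred <- c(Sep = 135.8319, Oct = 135.8352, Nov = 135.8350)
se   <- c(1.366955, 1.886703, 2.293286)

# Approximate 95% interval: forecast plus or minus about 1.96 standard
# errors (ASSUMES normal forecast errors)
z <- qnorm(0.975)
intervals <- data.frame(
  forecast = pred,
  lower    = pred - z * se,
  upper    = pred + z * se
)

print(round(intervals, 2))
```

The intervals widen from September to November, so the later forecasts should carry less weight in planning than the earlier ones.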
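x2 <- x[1:113]
forecast <- predict(result_1, n.ahead = 5)
# x2 is a plain vector, so the x-axis is the observation index (1 = April 2012);
# xlim is extended so that all five forecast points (114 to 118) are visible
plot(x2, xlim = c(1, 118), main = "Wholesale Price Index", xlab = "Observation (month index)", ylab = "Percentages")
lines(114:118, forecast$pred, type = "o", col = "red")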

- The wholesale price index is plotted against the observation index,
with values ranging from about 105 to 136. The series runs from April
2012 (observation 1) to August 2021 (observation 113), with each point
representing one month. The index starts near 105 and rises to a local
peak of about 117 in August 2014 (observation 29). It then declines to
about 107 in February 2016 (observation 47), followed by a gradual rise
to about 123 in early 2020 (observation 94). After a dip in mid-2020 it
climbs steeply to 135.9. The red points show the forecasts for the next
five months, which extend the plot beyond the observed data. These
forecasts give firms an indication of probable near-term levels, which
helps with strategic planning and decision-making.
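An alternative that keeps calendar dates on the x-axis is to leave the data as a ts object, since `predict()` then returns forecasts that carry their own dates. The sketch below shows this on a simulated monthly series, not on the WPI data; the names `sim_x`, `fit`, and `fc` are introduced here for illustration.

```r
set.seed(1)

# Simulated monthly stand-in for x (NOT the WPI data), same span as the report
sim_x <- ts(100 + cumsum(rnorm(113, mean = 0.3)),
            start = c(2012, 4), frequency = 12)

fit <- arima(sim_x, order = c(1, 1, 0))
fc  <- predict(fit, n.ahead = 5)

# fc$pred is a ts that starts the month after the data ends, so ts.plot()
# can draw both series on one calendar axis
ts.plot(sim_x, fc$pred, col = c("black", "red"),
        main = "Simulated index with 5-month forecast", xlab = "Year")

print(start(fc$pred))
print(end(fc$pred))
```

With this approach there is no need to compute index positions such as 114:118 by hand.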