2026-01-31

Introduction

Amazon is one of the largest and most traded companies in the stock market. Predicting stock prices is important for investors and financial analysts. Statistical models help us understand how different factors affect stock prices. In this project, we use regression analysis to predict Amazon’s stock price.

Data Description

We use historical Amazon stock data obtained from Yahoo Finance. The dataset includes daily stock prices and trading volume. In this analysis, the closing price is used as the variable we want to predict.
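The R code later in this document refers to a data frame `df` with columns `close`, `volume`, and `time_years`. How that data frame was built is not shown, so the following is only a minimal sketch under assumptions: the raw column names (`Date`, `Close`, `Volume`) mimic a typical Yahoo Finance export, the three rows are made-up placeholder values rather than real Amazon prices, and time is measured in years since the first observation.

```r
# Placeholder rows in the shape of a Yahoo Finance export.
# These values are invented for illustration, NOT real Amazon data.
raw <- data.frame(
  Date   = c("2024-01-02", "2024-01-03", "2024-01-04"),
  Close  = c(100.0, 101.5, 99.8),
  Volume = c(4.7e7, 4.9e7, 5.6e7)
)

# Keep only the variables used in the analysis, with lower-case names
df <- data.frame(
  date   = as.Date(raw$Date),
  close  = raw$Close,
  volume = raw$Volume
)

# Sort chronologically and express time as years since the first trading day
# (assumed definition of time_years; 365.25 days per year)
df <- df[order(df$date), ]
df$time_years <- as.numeric(df$date - min(df$date)) / 365.25

df
```

Measuring time in years keeps the time coefficient interpretable as an average price change per year, holding volume fixed.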

Amazon Stock Price Over Time

Note: The brown line shows a smoothed trend highlighting the overall movement of Amazon’s stock price.

Price and Trading Volume

Note: A log scale is used for trading volume to reduce the effect of extreme values.

Regression Model

We model Amazon’s closing stock price using a multiple linear regression model.

\[ Price = \beta_0 + \beta_1\, Volume + \beta_2 \, Time + \varepsilon \] where \(\beta_0\) is the intercept, \(\beta_1\) and \(\beta_2\) are the regression coefficients for trading volume and time, and \(\varepsilon\) represents random error.
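The same model can be written in matrix form, which is the notation used for estimation. As a notational sketch, assuming \(n\) daily observations indexed by \(i = 1, \dots, n\) (the index \(i\) and the count \(n\) are introduced here for illustration):

\[ y = X\beta + \varepsilon, \qquad y = \begin{pmatrix} Price_1 \\ \vdots \\ Price_n \end{pmatrix}, \qquad X = \begin{pmatrix} 1 & Volume_1 & Time_1 \\ \vdots & \vdots & \vdots \\ 1 & Volume_n & Time_n \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix} \]

The column of ones in \(X\) carries the intercept \(\beta_0\), so each row of \(X\beta\) reproduces \(\beta_0 + \beta_1\, Volume_i + \beta_2\, Time_i\).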

Estimating the Model

The regression coefficients are estimated using the least squares method.

\[ \hat{\beta} = (X^\top X)^{-1} X^\top y \]

This formula finds the values of the coefficients that minimize the sum of squared errors between the observed and predicted stock prices.
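To make the formula concrete, the sketch below solves the normal equations directly and compares the result with `lm()`. It runs on simulated stand-in data, not real Amazon prices; the true coefficients, the noise level, and the choice to express volume in millions of shares are all assumptions made for illustration.

```r
# Simulated stand-in data (NOT real Amazon prices)
set.seed(1)
n <- 200
sim <- data.frame(
  volume     = runif(n, 1, 10),            # assumed unit: millions of shares
  time_years = seq(0, 5, length.out = n)
)
sim$close <- 50 + 2 * sim$volume + 10 * sim$time_years + rnorm(n, sd = 5)

# Design matrix: a column of ones for the intercept, then the two predictors
X <- cbind(1, sim$volume, sim$time_years)
y <- sim$close

# Solve (X'X) beta = X'y instead of forming the inverse explicitly
beta_hat <- solve(crossprod(X), crossprod(X, y))

# lm() fits the same least squares problem (via a QR decomposition)
fit <- lm(close ~ volume + time_years, data = sim)

# Side-by-side comparison of the two sets of estimates
cbind(normal_equations = drop(beta_hat), lm = coef(fit))
```

The two columns should agree up to floating-point error. In practice `lm()` is preferable because its QR-based solver is numerically more stable than working with \(X^\top X\), especially when predictors such as raw share volume are on a very large scale.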

Prediction on Recent Data

To evaluate prediction performance, the model is trained on the earlier historical data and tested on the most recent five years of observations, which are held out from fitting.
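A chronological split of this kind can be sketched as follows. The data are simulated (not real Amazon prices), the date range and coefficients are arbitrary, and root mean squared error (RMSE) is used here as one possible accuracy measure; the original analysis does not state which metric it reports.

```r
# Simulated daily data (NOT real Amazon prices); date range is arbitrary
set.seed(42)
dates <- seq(as.Date("2010-01-01"), as.Date("2024-12-31"), by = "day")
n <- length(dates)
sim <- data.frame(
  date       = dates,
  volume     = runif(n, 1, 10),   # assumed unit: millions of shares
  time_years = as.numeric(dates - min(dates)) / 365.25
)
sim$close <- 20 + 1.5 * sim$volume + 8 * sim$time_years + rnorm(n, sd = 5)

# Cutoff date: five years before the last observation
cutoff <- seq(max(sim$date), by = "-5 years", length.out = 2)[2]

# Split by time, never at random, so the test set lies strictly in the future
train <- sim[sim$date <= cutoff, ]
test  <- sim[sim$date >  cutoff, ]

# Fit on the training period only, then predict the held-out period
fit  <- lm(close ~ volume + time_years, data = train)
pred <- predict(fit, newdata = test)

# Out-of-sample root mean squared error, in the same units as the price
rmse <- sqrt(mean((test$close - pred)^2))
rmse
```

Splitting by date rather than sampling rows at random avoids leaking future information into the training set, which matters because the time trend means the model must extrapolate beyond the years it was fitted on.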

3D Visualization of the Regression Model

R Code Used for 3D Visualization

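# plotly provides plot_ly(), add_markers(), add_surface(), and the %>% pipe
library(plotly)

# Fit regression model
model <- lm(close ~ volume + time_years, data = df)

# Create grid for prediction
vol_seq <- seq(min(df$volume), max(df$volume), length.out = 30)
t_seq <- seq(min(df$time_years), max(df$time_years), length.out = 30)

grid <- expand.grid(volume = vol_seq, time_years = t_seq)
grid$close_pred <- predict(model, newdata = grid)

# Reshape predictions for the surface: rows follow y (time), columns follow x (volume)
z_mat <- matrix(grid$close_pred, nrow = length(t_seq), byrow = TRUE)

# Observed prices as points, fitted regression plane as a surface
plot_ly() %>%
  add_markers(
    x = df$volume,
    y = df$time_years,
    z = df$close,
    marker = list(size = 1),
    name = "Observed Prices") %>%
  add_surface(
    x = vol_seq,
    y = t_seq,
    z = z_mat,
    opacity = 0.6,
    showscale = FALSE,
    name = "Fitted Plane") %>%
  layout(
    scene = list(
      xaxis = list(title = "Trading Volume"),
      yaxis = list(title = "Time (years)"),
      zaxis = list(title = "Closing Price")
    )
  )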

Conclusion

This project used regression analysis to model and predict Amazon’s stock price. The results show that trading volume and time help explain price movements. Out-of-sample prediction demonstrates how the model performs on unseen data. Overall, regression provides a useful statistical framework for stock analysis.