A good dataset for Multiple Linear Regression (MLR) should have:
• One continuous dependent variable (response variable).
• Two or more independent variables (predictors).
• Enough observations to identify relationships.
As in our previous exercise, we use the Zee Entertainment Enterprises
Ltd (ZEEL) stock price data.
Dependent Variable (Y)
• Close
Independent Variables (X)
• Prev Close
• Open
• High
• Low
• Volume
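With these variables, the model to be fitted can be written out as follows. The coefficient symbols and the error term are standard regression notation and do not appear in the original; the index i is assumed to run over the rows (trading days) of the dataset.

```latex
\[
\text{Close}_i = \beta_0 + \beta_1\,\text{PrevClose}_i + \beta_2\,\text{Open}_i
  + \beta_3\,\text{High}_i + \beta_4\,\text{Low}_i + \beta_5\,\text{Volume}_i
  + \varepsilon_i
\]
```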
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.1 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
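# Load the data; read.csv() turns a header such as "Prev Close" into Prev.Close
ZEEL <- read.csv("ZEEL.csv")
# Fit the multiple linear regression of Close on the five predictors
model <- lm(
  Close ~ Prev.Close + Open + High + Low + Volume,
  data = ZEEL
)
# Residual summary, coefficient table, R-squared, and overall F-test
summary(model)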
##
## Call:
## lm(formula = Close ~ Prev.Close + Open + High + Low + Volume,
## data = ZEEL)
##
## Residuals:
## Min 1Q Median 3Q Max
## -92.735 -1.512 -0.285 1.253 67.805
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.500e-01 1.329e-01 3.387 0.000712 ***
## Prev.Close -6.050e-02 1.145e-02 -5.285 1.31e-07 ***
## Open -4.981e-01 1.390e-02 -35.830 < 2e-16 ***
## High 7.368e-01 9.363e-03 78.689 < 2e-16 ***
## Low 8.209e-01 7.854e-03 104.523 < 2e-16 ***
## Volume 2.585e-08 8.287e-09 3.119 0.001823 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.843 on 5300 degrees of freedom
## Multiple R-squared: 0.9992, Adjusted R-squared: 0.9992
## F-statistic: 1.401e+06 on 5 and 5300 DF, p-value: < 2.2e-16
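The numbers printed by summary() can also be pulled out as R objects, which helps when they need to be quoted or reused. The sketch below is only illustrative: it runs on synthetic data (not the ZEEL file), and the names sim, fit, and coef_table are made up for the example.

```r
# Synthetic stand-in for the ZEEL data (NOT real prices); column names mirror the model above.
set.seed(1)
n <- 500
prev_close <- 300 + cumsum(rnorm(n, sd = 2))      # random-walk price level
open_px    <- prev_close + rnorm(n, sd = 1)
close_px   <- open_px + rnorm(n, sd = 2)
sim <- data.frame(
  Prev.Close = prev_close,
  Open   = open_px,
  High   = pmax(open_px, close_px) + abs(rnorm(n)),  # at least max(Open, Close)
  Low    = pmin(open_px, close_px) - abs(rnorm(n)),  # at most min(Open, Close)
  Close  = close_px,
  Volume = round(exp(rnorm(n, mean = 14, sd = 0.5)))
)

fit <- lm(Close ~ Prev.Close + Open + High + Low + Volume, data = sim)
s   <- summary(fit)

coef_table <- coef(s)           # matrix: Estimate, Std. Error, t value, Pr(>|t|)
r_squared  <- s$r.squared       # multiple R-squared
adj_r2     <- s$adj.r.squared   # adjusted R-squared
rse        <- s$sigma           # residual standard error
n_used     <- nobs(fit)         # rows used = residual df + number of coefficients

print(round(coef_table, 4))
cat("R-squared:", round(r_squared, 4), " RSE:", round(rse, 3), " n:", n_used, "\n")
```

Applied to the output above, the same relationship implies that 5300 residual degrees of freedom plus 6 estimated coefficients correspond to 5306 rows used in the fit.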
A multiple linear regression model was fitted to examine the
relationship between the closing stock price (Close) of Zee
Entertainment Enterprises Ltd (ZEEL) and the predictor variables
Prev.Close, Open, High, Low, and Volume. All five predictors are
statistically significant at the 5% level (every p-value is below 0.01).
Holding the other predictors fixed, High (0.737) and Low (0.821) have
large positive coefficients, while Prev.Close (-0.0605) and Open
(-0.498) have negative coefficients.
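Written out with the estimates from the coefficient table (rounded), the fitted equation is as follows. The hat notation is standard and is not part of the original output.

```latex
\[
\widehat{\text{Close}} = 0.450 - 0.0605\,\text{Prev.Close} - 0.498\,\text{Open}
  + 0.737\,\text{High} + 0.821\,\text{Low}
  + 2.585 \times 10^{-8}\,\text{Volume}
\]
```

The four price coefficients sum to about 0.999 (-0.0605 - 0.4981 + 0.7368 + 0.8209), so the fitted Close is close to a weighted combination of the other prices recorded for the same day.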
Trading volume also has a positive and statistically significant
coefficient (2.585 × 10⁻⁸). The estimate is tiny because Volume takes
very large values: an increase of one million units of volume is
associated with an increase of only about 0.026 in Close, holding the
price variables fixed. The model's R-squared is 0.9992, meaning that
approximately 99.92% of the variation in the closing price is
explained by the independent variables included in the model.
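The effect of the scale of Volume on its coefficient can be checked directly: dividing a predictor by one million multiplies its coefficient by one million and leaves the fit unchanged. The sketch below uses synthetic data (not the ZEEL file); sim and Volume_mn are hypothetical names introduced for the example.

```r
# Synthetic stand-in for the ZEEL data (NOT real prices).
set.seed(2)
n <- 500
prev_close <- 300 + cumsum(rnorm(n, sd = 2))
open_px    <- prev_close + rnorm(n, sd = 1)
close_px   <- open_px + rnorm(n, sd = 2)
sim <- data.frame(
  Prev.Close = prev_close,
  Open   = open_px,
  High   = pmax(open_px, close_px) + abs(rnorm(n)),
  Low    = pmin(open_px, close_px) - abs(rnorm(n)),
  Close  = close_px,
  Volume = round(exp(rnorm(n, mean = 14, sd = 0.5)))
)
sim$Volume_mn <- sim$Volume / 1e6   # volume expressed in millions

fit_raw <- lm(Close ~ Prev.Close + Open + High + Low + Volume,    data = sim)
fit_mn  <- lm(Close ~ Prev.Close + Open + High + Low + Volume_mn, data = sim)

b_raw <- unname(coef(fit_raw)["Volume"])
b_mn  <- unname(coef(fit_mn)["Volume_mn"])

# The rescaled coefficient equals the raw one times 1e6 (up to rounding error)
same_coef <- isTRUE(all.equal(b_mn, b_raw * 1e6, tolerance = 1e-6))
# Rescaling a predictor does not change R-squared
same_r2 <- isTRUE(all.equal(summary(fit_raw)$r.squared,
                            summary(fit_mn)$r.squared))
print(c(same_coef = same_coef, same_r2 = same_r2))
```

Rescaling in this way only changes how the coefficient is reported; the t value and p-value for volume stay the same.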
Furthermore, the overall model is highly significant (F-statistic =
1.401 × 10⁶, p-value < 2.2 × 10⁻¹⁶), indicating that the predictors
collectively have a significant relationship with the closing price.
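These results show that the model fits the observed ZEEL closing
prices very closely. Note that this is an in-sample fit, and that High
and Low are only known once the trading day is over, so the result
should not be read as evidence that the model can forecast the closing
price in advance.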
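An R-squared computed on the same rows used for fitting does not by itself measure performance on unseen data. One common check, sketched here under stated assumptions, is a chronological train/test split. The sketch assumes the rows are in date order and runs on synthetic data (not the ZEEL file); sim, train, and test are hypothetical names.

```r
# Synthetic stand-in for the ZEEL data (NOT real prices), rows assumed to be in date order.
set.seed(3)
n <- 500
prev_close <- 300 + cumsum(rnorm(n, sd = 2))
open_px    <- prev_close + rnorm(n, sd = 1)
close_px   <- open_px + rnorm(n, sd = 2)
sim <- data.frame(
  Prev.Close = prev_close,
  Open   = open_px,
  High   = pmax(open_px, close_px) + abs(rnorm(n)),
  Low    = pmin(open_px, close_px) - abs(rnorm(n)),
  Close  = close_px,
  Volume = round(exp(rnorm(n, mean = 14, sd = 0.5)))
)

# Fit on the first 80% of rows, evaluate on the most recent 20%
n_train <- floor(0.8 * n)
train <- sim[seq_len(n_train), ]
test  <- sim[(n_train + 1):n, ]

fit  <- lm(Close ~ Prev.Close + Open + High + Low + Volume, data = train)
pred <- predict(fit, newdata = test)

rmse_train <- sqrt(mean(residuals(fit)^2))        # in-sample error
rmse_test  <- sqrt(mean((test$Close - pred)^2))   # out-of-sample error
print(c(rmse_train = rmse_train, rmse_test = rmse_test))
```

A test error much larger than the training error would indicate that the in-sample fit overstates how well the model generalises.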
However, because the price predictors (Prev.Close, Open, High, and
Low) are likely to be highly correlated with one another,
multicollinearity may be present, which can make individual coefficient
estimates less stable and more difficult to interpret even though the
overall fit remains very close. The negative signs on Prev.Close and
Open may themselves be a symptom of this.
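Multicollinearity can be quantified with variance inflation factors (VIF). For predictor j, VIF is 1 / (1 - R²ⱼ), where R²ⱼ comes from regressing that predictor on all the others. The sketch below computes this in base R on synthetic data (not the ZEEL file); sim and vif are hypothetical names, and the frequently quoted thresholds of 5 or 10 are rules of thumb rather than strict tests.

```r
# Synthetic stand-in for the ZEEL data (NOT real prices).
set.seed(4)
n <- 500
prev_close <- 300 + cumsum(rnorm(n, sd = 2))
open_px    <- prev_close + rnorm(n, sd = 1)
close_px   <- open_px + rnorm(n, sd = 2)
sim <- data.frame(
  Prev.Close = prev_close,
  Open   = open_px,
  High   = pmax(open_px, close_px) + abs(rnorm(n)),
  Low    = pmin(open_px, close_px) - abs(rnorm(n)),
  Close  = close_px,
  Volume = round(exp(rnorm(n, mean = 14, sd = 0.5)))
)

predictors <- c("Prev.Close", "Open", "High", "Low", "Volume")

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the rest
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  aux <- lm(reformulate(others, response = p), data = sim)
  1 / (1 - summary(aux)$r.squared)
})
print(round(vif, 2))
```

In this synthetic example the price columns share a common trend and so have much larger VIFs than Volume; running the same calculation on the ZEEL data frame would show whether the real predictors behave similarly.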