HW: Find a data set to which you can fit a multiple linear regression model, and interpret your results.

A good dataset for Multiple Linear Regression (MLR) should have:

- One continuous dependent variable (response variable).
- Two or more independent variables (predictors).
- Enough observations to estimate the relationships reliably.
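
As a rough sketch, the three criteria above can be checked in R before fitting anything. The helper `check_mlr_data()` is hypothetical (it is not part of the original analysis), and treating a numeric response with more than 10 distinct values as "continuous" is an assumed heuristic, not a formal test.

```r
# Hypothetical helper: rough checks of the three MLR data criteria listed above
check_mlr_data <- function(data, response, predictors) {
  stopifnot(all(c(response, predictors) %in% names(data)))
  y <- data[[response]]
  # Rows with no missing values in the variables used; lm() drops the others
  n_complete <- sum(complete.cases(data[, c(response, predictors), drop = FALSE]))
  c(
    # Heuristic (assumption): numeric with more than 10 distinct values
    continuous_response = is.numeric(y) && length(unique(y)) > 10,
    two_or_more_predictors = length(predictors) >= 2,
    # Bare minimum: more complete rows than coefficients (predictors + intercept)
    enough_observations = n_complete > length(predictors) + 1
  )
}

# Usage with the ZEEL data (column names as used later in this homework):
# check_mlr_data(ZEEL, "Close", c("Prev.Close", "Open", "High", "Low", "Volume"))
```

The "enough observations" check is only the minimum needed for `lm()` to estimate a residual variance; in practice far more rows than coefficients are desirable.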

Let’s use the Zee Entertainment Enterprises Ltd (ZEEL) stock price data, as in our previous work.

Dependent Variable (Y)

- Close

Independent Variables (X)

- Prev Close
- Open
- High
- Low
- Volume
library(tidyverse)
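# read.csv() converts non-syntactic column names by default (check.names = TRUE),
# e.g. a header "Prev Close" becomes Prev.Close
ZEEL <- read.csv("ZEEL.csv")

# Regress the closing price on the five predictors
model <- lm(
  Close ~ Prev.Close + Open + High + Low + Volume,
  data = ZEEL
)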

summary(model)
## 
## Call:
## lm(formula = Close ~ Prev.Close + Open + High + Low + Volume, 
##     data = ZEEL)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -92.735  -1.512  -0.285   1.253  67.805 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.500e-01  1.329e-01   3.387 0.000712 ***
## Prev.Close  -6.050e-02  1.145e-02  -5.285 1.31e-07 ***
## Open        -4.981e-01  1.390e-02 -35.830  < 2e-16 ***
## High         7.368e-01  9.363e-03  78.689  < 2e-16 ***
## Low          8.209e-01  7.854e-03 104.523  < 2e-16 ***
## Volume       2.585e-08  8.287e-09   3.119 0.001823 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.843 on 5300 degrees of freedom
## Multiple R-squared:  0.9992, Adjusted R-squared:  0.9992 
## F-statistic: 1.401e+06 on 5 and 5300 DF,  p-value: < 2.2e-16
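
Reading off the Estimate column of the output above (rounded), the fitted regression equation is:

$$
\widehat{\text{Close}} = 0.450 - 0.0605\,\text{Prev.Close} - 0.498\,\text{Open} + 0.737\,\text{High} + 0.821\,\text{Low} + 2.585 \times 10^{-8}\,\text{Volume}
$$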

A multiple linear regression model was fitted to 5,306 observations to examine the relationship between the closing stock price (Close) of Zee Entertainment Enterprises Ltd (ZEEL) and the predictors Prev.Close, Open, High, Low, and Volume. All predictors are statistically significant at the 5% level, since each p-value is below 0.05. High (0.737) and Low (0.821) have large positive coefficients: holding the other predictors constant, a one-unit increase in the day's high price is associated with an increase of about 0.74 in the closing price, and a one-unit increase in the day's low price with an increase of about 0.82. Prev.Close (−0.0605) and Open (−0.498) have negative coefficients after controlling for the other variables.
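
The significance and sign statements above can also be read off the coefficient table programmatically. This is a minimal base-R sketch: `coef(summary(fit))` returns the coefficient matrix that `summary()` prints, while the function name `significant_terms()` and the 0.05 default are illustrative choices, not part of the original analysis.

```r
# Hypothetical helper: tabulate each predictor's estimate, p-value,
# significance at level alpha, and the sign of its coefficient
significant_terms <- function(fit, alpha = 0.05) {
  tab <- coef(summary(fit))  # columns: Estimate, Std. Error, t value, Pr(>|t|)
  tab <- tab[rownames(tab) != "(Intercept)", , drop = FALSE]
  data.frame(
    term        = rownames(tab),
    estimate    = tab[, "Estimate"],
    p_value     = tab[, "Pr(>|t|)"],
    significant = tab[, "Pr(>|t|)"] < alpha,
    sign        = ifelse(tab[, "Estimate"] > 0, "positive", "negative"),
    row.names   = NULL
  )
}

# Usage with the model fitted above:
# significant_terms(model)
```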

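Trading volume also has a positive and statistically significant coefficient (p = 0.0018), although its magnitude is tiny because volume values are very large (counts of shares traded): 2.585 × 10⁻⁸ per share corresponds to roughly 0.026 in the closing price per additional million shares. The model achieved an R-squared of 0.9992 (adjusted R-squared also 0.9992), meaning that approximately 99.92% of the variation in the closing price is explained by the independent variables in the model. The overall model is also highly significant (F-statistic = 1.401 × 10⁶ on 5 and 5300 degrees of freedom, p-value < 2.2 × 10⁻¹⁶), indicating that the predictors jointly have a significant relationship with the closing price. These results show that the model fits the observed ZEEL closing prices extremely closely. This is an in-sample fit rather than a forecast, however: High and Low are only known once the trading day has ended, so they cannot be used to predict that day's close in advance.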
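
To see where R-squared, adjusted R-squared and the F-statistic come from, they can be recomputed from the residual and total sums of squares. This sketch assumes an unweighted `lm()` model with an intercept and no aliased (NA) coefficients, which holds for the model above; `fit_stats()` is a hypothetical name.

```r
# Recompute R-squared, adjusted R-squared and the overall F-statistic
# from the residuals of a fitted lm() model (with an intercept)
fit_stats <- function(fit) {
  y      <- model.response(model.frame(fit))
  rss    <- sum(residuals(fit)^2)   # residual sum of squares
  tss    <- sum((y - mean(y))^2)    # total sum of squares
  p      <- length(coef(fit)) - 1   # number of predictors
  df_res <- df.residual(fit)        # n - p - 1
  r2     <- 1 - rss / tss
  c(
    r_squared     = r2,
    adj_r_squared = 1 - (1 - r2) * (length(y) - 1) / df_res,
    f_statistic   = (r2 / p) / ((1 - r2) / df_res)
  )
}

# Usage with the model fitted above:
# fit_stats(model)
```

For the ZEEL model, p = 5 and the residual degrees of freedom are 5300, matching the "5 and 5300 DF" reported by `summary()`.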

However, the price variables (Prev.Close, Open, High, Low, and Close) are very likely to be highly correlated; for example, the close always lies between the day's low and high. Such multicollinearity can make individual coefficient estimates unstable and hard to interpret (the negative signs on Prev.Close and Open may partly reflect it), even though the overall fit remains very good.
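
One way to check for multicollinearity is with variance inflation factors (VIFs). The sketch below computes them by hand in base R, assuming the predictor columns have no missing values; `vif_manual()` is a hypothetical name, and the `car` package's `vif()` function computes the same quantity for a model with numeric predictors like this one.

```r
# Variance inflation factors computed by hand: for each predictor, regress it on
# the other predictors and take VIF = 1 / (1 - R^2) of that auxiliary regression
vif_manual <- function(data, predictors) {
  sapply(predictors, function(v) {
    aux <- lm(reformulate(setdiff(predictors, v), response = v), data = data)
    1 / (1 - summary(aux)$r.squared)
  })
}

# Usage with the ZEEL data; a commonly quoted rule of thumb flags VIF values above 10
# vif_manual(ZEEL, c("Prev.Close", "Open", "High", "Low", "Volume"))
```

A VIF of 1 means a predictor is uncorrelated with the others; large values mean its coefficient's variance is inflated by overlap with the other predictors.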