library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
library(ggplot2)
laptop_prices <- read.csv("/Users/revathiyajjavarapu/Documents/statistics(1)/laptop_prices.csv")
initial_model <- lm(Price_euros ~ Ram, data = laptop_prices)
summary(initial_model)
##
## Call:
## lm(formula = Price_euros ~ Ram, data = laptop_prices)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2813.72 -297.59 -94.07 244.39 2859.29
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 276.03 25.54 10.81 <2e-16 ***
## Ram 101.76 2.59 39.29 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 471.3 on 1273 degrees of freedom
## Multiple R-squared: 0.548, Adjusted R-squared: 0.5477
## F-statistic: 1544 on 1 and 1273 DF, p-value: < 2.2e-16
model <- lm(Price_euros ~ Ram + Inches + PrimaryStorage + Ram:PrimaryStorage, data = laptop_prices)
summary(model)
##
## Call:
## lm(formula = Price_euros ~ Ram + Inches + PrimaryStorage + Ram:PrimaryStorage,
## data = laptop_prices)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2392.79 -276.11 -80.72 207.91 2944.02
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 987.010924 137.453940 7.181 1.18e-12 ***
## Ram 115.412695 4.969542 23.224 < 2e-16 ***
## Inches -48.332840 9.841642 -4.911 1.02e-06 ***
## PrimaryStorage -0.063155 0.072046 -0.877 0.3809
## Ram:PrimaryStorage -0.019055 0.007681 -2.481 0.0132 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 457.4 on 1270 degrees of freedom
## Multiple R-squared: 0.5753, Adjusted R-squared: 0.574
## F-statistic: 430.1 on 4 and 1270 DF, p-value: < 2.2e-16
vif_values <- vif(model, type = "predictor")
## GVIFs computed for predictors
vif_values
##                    GVIF Df GVIF^(1/(2*Df)) Interacts With    Other Predictors
## Ram            1.205243  3        1.031603 PrimaryStorage              Inches
## Inches         1.205243  1        1.097836             -- Ram, PrimaryStorage
## PrimaryStorage 1.205243  3        1.031603            Ram              Inches
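RAM is a primary determinant of laptop performance and therefore a natural predictor of price. On its own it explains about 55% of the variance in Price_euros (R-squared = 0.548 in the initial model).
Screen size (Inches) is added because screen dimensions may affect price, and larger screens are often assumed to be more expensive. In the fitted model, however, the Inches coefficient is negative (about -48 euros per inch), so for a given RAM and storage configuration, larger laptops tend to be cheaper in this data.
Storage capacity can significantly influence a laptop’s price, as more storage usually costs more, particularly for SSDs. It is worth examining whether PrimaryStorage has a direct effect or an effect that depends on Ram. Here the PrimaryStorage main effect (its slope when Ram is 0) is not significant (p = 0.38).
The interaction term Ram:PrimaryStorage tests whether laptops with high RAM but limited storage (or vice versa) are priced differently, which would indicate an interdependent effect between RAM and storage. Its coefficient is small and negative (-0.019, p = 0.013), meaning the price increase per unit of Ram shrinks slightly as storage grows.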
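To make the interaction concrete, the sketch below computes the implied slope of Ram at a few storage sizes, using the two coefficients copied from the summary(model) output. The storage levels (128 to 1024) are illustrative values chosen here, and treating PrimaryStorage as gigabytes is an assumption, not something the output confirms.

```r
# Coefficients copied from the summary(model) output above
b_ram         <- 115.412695   # Ram main effect
b_interaction <- -0.019055    # Ram:PrimaryStorage interaction

# Slope of Price_euros with respect to Ram at a given PrimaryStorage value
ram_slope <- function(storage) b_ram + b_interaction * storage

# Illustrative storage levels (assumed to be in GB; hypothetical choice)
storage_levels <- c(128, 256, 512, 1024)
slopes <- data.frame(PrimaryStorage = storage_levels,
                     ram_slope      = ram_slope(storage_levels))
print(slopes)
```

Under these assumptions the Ram slope falls by roughly 17 euros between the smallest and largest storage level, which is small relative to the Ram main effect of about 115.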
The GVIF values for Ram, Inches, and PrimaryStorage are all about 1.21, indicating minimal multicollinearity. A GVIF close to 1 means a predictor is nearly uncorrelated with the others.
The adjusted value, GVIF^(1/(2*Df)), accounts for each predictor’s degrees of freedom, which matters here because Ram and PrimaryStorage also enter the model through the interaction. All adjusted values (1.03 to 1.10) are well below 2, suggesting acceptable levels of multicollinearity.
Ram and PrimaryStorage interact but still show low multicollinearity once the degrees of freedom of the interaction are taken into account. Inches has an adjusted GVIF of about 1.10, so it is largely independent of the other terms.
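For intuition about what a variance inflation factor measures, here is a minimal base-R sketch of the classical VIF for a main-effects model: each predictor is regressed on the others and VIF = 1 / (1 - R^2). It uses the built-in mtcars data as a stand-in, not the laptop data, and it is not the generalized (GVIF) calculation that car::vif(type = "predictor") performs for models with interactions.

```r
# Stand-in data: built-in mtcars, NOT the laptop data set
predictors <- c("wt", "hp", "disp")

# Classical VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
# predictor j on all the other predictors
vif_by_hand <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  aux <- lm(reformulate(others, response = p), data = mtcars)
  1 / (1 - summary(aux)$r.squared)
})
print(round(vif_by_hand, 2))
```

A value of 1 means the predictor is uncorrelated with the rest. The larger the value, the more of the predictor’s variance is already explained by the other predictors.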
par(mfrow = c(2, 2))
plot(model)
The first plot is Residuals vs Fitted. The spread of the residuals widens at higher fitted values, suggesting possible heteroscedasticity (non-constant variance) or a poorer model fit for higher-priced laptops.
The second plot is the Normal Q-Q plot. The points deviate from the reference line in both tails, indicating that the residuals are not perfectly normal, especially at the upper and lower extremes.
The third plot is Scale-Location. It checks for homoscedasticity by plotting the square root of the absolute standardized residuals against the fitted values; a roughly flat trend with even spread would support constant variance.
The fourth plot is Residuals vs Leverage. It shows the standardized residuals against leverage (a measure of how far an observation’s predictor values are from their means). A few points lie outside the Cook’s distance lines, meaning they may have a strong influence on the model’s fit.
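The visual impressions from the four plots can be backed up with numbers. The sketch below uses base R only, with a stand-in model fitted to the built-in mtcars data (not the laptop model). It runs a Breusch-Pagan-style check in its studentized n * R^2 form, a Shapiro-Wilk test on the residuals, and a Cook’s distance screen. The 4/n cutoff is a common rule of thumb and an assumption here, not a threshold taken from the analysis above.

```r
# Stand-in model on built-in mtcars, NOT the laptop model
fit <- lm(mpg ~ wt + hp, data = mtcars)
n   <- nobs(fit)

# Heteroscedasticity: regress squared residuals on the predictors;
# n * R^2 is compared with a chi-square (df = number of predictors)
sq_res  <- residuals(fit)^2
aux     <- lm(sq_res ~ wt + hp, data = mtcars)
bp_stat <- n * summary(aux)$r.squared
bp_p    <- pchisq(bp_stat, df = 2, lower.tail = FALSE)

# Normality of residuals: Shapiro-Wilk test (needs 3 <= n <= 5000)
sw <- shapiro.test(residuals(fit))

# Influence: flag observations whose Cook's distance exceeds 4/n
cd          <- cooks.distance(fit)
influential <- names(cd)[cd > 4 / n]

cat("BP-style p-value:    ", format.pval(bp_p), "\n")
cat("Shapiro-Wilk p-value:", format.pval(sw$p.value), "\n")
cat("Flagged observations:", length(influential), "\n")
```

To apply the same checks to the laptop model, replace fit with model and use its own predictors in the auxiliary regression, with df set to the number of terms in that regression. Small p-values in the first two checks would support the heteroscedasticity and non-normality suggested by the plots.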