library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(car) # for VIF
## Loading required package: carData
##
## Attaching package: 'car'
##
## The following object is masked from 'package:dplyr':
##
## recode
##
## The following object is masked from 'package:purrr':
##
## some
library(broom) # for tidy model output
laptop_prices <- read.csv("~/Documents/statistics(1)/annotated-laptop_prices_reverted.csv")
lm_price <- lm(Price_euros ~ Inches + Ram + CPU_freq, data = laptop_prices)
# Check VIF
vif(lm_price)
## Inches Ram CPU_freq
## 1.126600 1.180069 1.225516
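All three VIFs are close to 1, so the predictors are only weakly correlated with one another and multicollinearity is unlikely to be inflating the standard errors. As a rough cross-check, the same quantity can be computed by hand from its definition, VIF_j = 1 / (1 - R_j^2). This is a sketch, not how `car::vif()` is implemented internally; the simulated stand-in data frame is an assumption added only so the snippet runs on its own.

```r
# Hypothetical stand-in data so the snippet runs by itself; in this report
# laptop_prices is already loaded and this branch is skipped.
if (!exists("laptop_prices")) {
  set.seed(1)
  laptop_prices <- data.frame(
    Inches   = runif(200, 11, 18),
    Ram      = sample(c(4, 8, 16, 32), 200, replace = TRUE),
    CPU_freq = runif(200, 1, 3.5)
  )
}

predictors <- c("Inches", "Ram", "CPU_freq")

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
# predictor j on the remaining predictors.
vif_by_hand <- sapply(predictors, function(p) {
  aux <- lm(reformulate(setdiff(predictors, p), response = p),
            data = laptop_prices)
  1 / (1 - summary(aux)$r.squared)
})
vif_by_hand
```

For a model with only numeric predictors and no missing values, these values should agree with the `vif(lm_price)` output above.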
summary(lm_price)
##
## Call:
## lm(formula = Price_euros ~ Inches + Ram + CPU_freq, data = laptop_prices)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2432.6 -260.5 -71.2 218.7 2954.2
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 859.383 133.205 6.452 1.57e-10 ***
## Inches -83.581 9.201 -9.084 < 2e-16 ***
## Ram 96.092 2.641 36.390 < 2e-16 ***
## CPU_freq 312.691 27.227 11.485 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 442.3 on 1271 degrees of freedom
## Multiple R-squared: 0.6026, Adjusted R-squared: 0.6016
## F-statistic: 642.3 on 3 and 1271 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(lm_price)
Residuals vs. Fitted Plot: The residuals show a slight pattern, and their spread widens as the fitted values increase. This suggests heteroscedasticity, meaning the residual variance is not constant across the range of fitted values. Heteroscedasticity leaves the coefficient estimates unbiased but makes the usual standard errors, and therefore the confidence intervals and p-values, less reliable.
Q-Q Plot: The residuals deviate from the reference line, particularly in the tails, so they are not normally distributed. The summary output points the same way: the largest residuals (-2432.6 and 2954.2) are more than five times the residual standard error of 442.3. With 1,275 observations the coefficient tests are fairly robust to this, but prediction intervals that assume normal errors may be inaccurate.
Scale-Location Plot: The upward trend in this plot is consistent with heteroscedasticity, as the spread of the standardized residuals grows with the fitted values. In other words, laptops with higher predicted prices have more variable residuals than those with lower predicted prices.
Residuals vs. Leverage Plot: A few data points have noticeably higher leverage (e.g., observation 1067) and could pull on the fitted coefficients. However, none of them falls outside the Cook’s distance contours on the plot, so no single observation appears to be unduly influential, although these points are still worth inspecting.
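The visual impressions above can be backed by numbers. The sketch below computes a studentized (Koenker-style) Breusch-Pagan statistic by regressing the squared residuals on the model's predictors, and lists the largest Cook's distances. The `4 / n` cutoff is a common rule of thumb rather than a threshold taken from this analysis, and the simulated stand-in model is an assumption added only so the snippet runs on its own.

```r
# Hypothetical stand-in model so the snippet runs by itself; in this report
# lm_price already exists and this branch is skipped.
if (!exists("lm_price")) {
  set.seed(1)
  n_sim <- 300
  sim <- data.frame(Inches   = runif(n_sim, 11, 18),
                    Ram      = sample(c(4, 8, 16, 32), n_sim, replace = TRUE),
                    CPU_freq = runif(n_sim, 1, 3.5))
  sim$Price_euros <- 800 - 80 * sim$Inches + 95 * sim$Ram +
    300 * sim$CPU_freq + rnorm(n_sim, sd = 20 * sim$Ram)
  lm_price <- lm(Price_euros ~ Inches + Ram + CPU_freq, data = sim)
}

n <- nobs(lm_price)

# Studentized Breusch-Pagan: regress squared residuals on the predictors;
# n * R^2 is approximately chi-squared with one df per predictor.
X_pred  <- model.matrix(lm_price)[, -1, drop = FALSE]  # drop the intercept
aux     <- lm(residuals(lm_price)^2 ~ X_pred)
bp_stat <- n * summary(aux)$r.squared
bp_p    <- pchisq(bp_stat, df = ncol(X_pred), lower.tail = FALSE)
cat("BP statistic:", bp_stat, " p-value:", bp_p, "\n")

# Cook's distance: the five largest values, plus counts above two cutoffs.
cooks <- cooks.distance(lm_price)
head(sort(cooks, decreasing = TRUE), 5)
sum(cooks > 4 / n)   # rule-of-thumb cutoff (assumption)
sum(cooks > 0.5)     # inner contour drawn by plot.lm
```

A small p-value would support the heteroscedasticity reading of the Residuals vs. Fitted and Scale-Location plots; a count of zero above 0.5 would match the observation that no point crosses the Cook's distance contours.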
coef_summary <- summary(lm_price)$coefficients
coef_summary
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 859.38304 133.205151 6.451575 1.569956e-10
## Inches -83.58084 9.201119 -9.083769 3.907376e-19
## Ram 96.09189 2.640589 36.390318 2.941810e-199
## CPU_freq 312.69129 27.226564 11.484787 4.063479e-29
ram_coef <- coef(lm_price)["Ram"]
cat("For each additional GB of RAM, the laptop's price is expected to increase by", ram_coef, "euros.")
## For each additional GB of RAM, the laptop's price is expected to increase by 96.09189 euros.
Intercept: The predicted price is approximately 859.38 euros when all predictors equal zero. No laptop has a 0-inch screen, 0 GB of RAM, and a 0 GHz CPU, so this value anchors the regression line but has no practical interpretation.
Inches: Each additional inch of screen size is associated with a decrease of about 83.58 euros in price, holding RAM and CPU frequency constant. The effect is statistically significant (p < 0.001). The negative sign describes laptops with the same RAM and CPU frequency; it does not mean larger laptops are cheaper overall.
Ram: Each additional GB of RAM is associated with an increase of about 96.09 euros in price, holding screen size and CPU frequency constant. This coefficient is highly significant (p < 0.001, t = 36.39), and its sign agrees with the expectation that more RAM generally raises laptop prices.
CPU Frequency (CPU_freq): A 1 GHz increase in CPU frequency is associated with a 312.69-euro increase in price, holding screen size and RAM constant, which is also statistically significant (p < 0.001).
In summary, the interpretation of the RAM coefficient is: for each additional GB of RAM, the laptop’s price is expected to increase by approximately 96.09 euros, holding screen size and CPU frequency constant. This is an association in the data rather than a direct measure of what extra memory costs.
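Because the diagnostic plots point to heteroscedasticity, it may be worth checking how much the uncertainty around the RAM coefficient changes under heteroscedasticity-robust standard errors. The sketch below builds HC1 ("sandwich") standard errors with base R and compares the resulting interval with the classical one from `confint()`. Choosing HC1 is an assumption made for illustration, not something taken from the original analysis, and the simulated stand-in model exists only so the snippet runs on its own.

```r
# Hypothetical stand-in model so the snippet runs by itself; in this report
# lm_price already exists and this branch is skipped.
if (!exists("lm_price")) {
  set.seed(1)
  n_sim <- 300
  sim <- data.frame(Inches   = runif(n_sim, 11, 18),
                    Ram      = sample(c(4, 8, 16, 32), n_sim, replace = TRUE),
                    CPU_freq = runif(n_sim, 1, 3.5))
  sim$Price_euros <- 800 - 80 * sim$Inches + 95 * sim$Ram +
    300 * sim$CPU_freq + rnorm(n_sim, sd = 20 * sim$Ram)
  lm_price <- lm(Price_euros ~ Inches + Ram + CPU_freq, data = sim)
}

X <- model.matrix(lm_price)   # design matrix, including the intercept
e <- residuals(lm_price)
n <- nrow(X)
k <- ncol(X)

# Sandwich estimator: (X'X)^-1 [X' diag(e^2) X] (X'X)^-1, scaled by n / (n - k)
bread    <- solve(crossprod(X))
meat     <- crossprod(X * e)  # each row of X is scaled by its own residual
vcov_hc1 <- (n / (n - k)) * bread %*% meat %*% bread
robust_se <- sqrt(diag(vcov_hc1))

# 95% intervals based on the robust standard errors
est <- coef(lm_price)
t_crit <- qt(0.975, df = n - k)
ci <- cbind(estimate = est,
            lower    = est - t_crit * robust_se,
            upper    = est + t_crit * robust_se)

ci["Ram", ]               # robust interval for the RAM coefficient
confint(lm_price, "Ram")  # classical interval, for comparison
```

If the two intervals for `Ram` are similar, the conclusion about RAM is not sensitive to the non-constant variance; a much wider robust interval would suggest the classical p-value overstates the precision.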