Load the data
library(dplyr)  # provides glimpse(), mutate(), and %>%
phone <- read.csv('https://raw.githubusercontent.com/Kingtilon1/DATA607/main/Cellphone.csv')
glimpse(phone)
## Rows: 161
## Columns: 14
## $ Product_id <int> 203, 880, 40, 99, 880, 947, 774, 947, 99, 1103, 289, 605,…
## $ Price <int> 2357, 1749, 1916, 1315, 1749, 2137, 1238, 2137, 1315, 258…
## $ Sale <int> 10, 10, 10, 11, 11, 12, 13, 13, 14, 15, 16, 16, 16, 16, 1…
## $ weight <dbl> 135.0, 125.0, 110.0, 118.5, 125.0, 150.0, 134.1, 150.0, 1…
## $ resoloution <dbl> 5.2, 4.0, 4.7, 4.0, 4.0, 5.5, 4.0, 5.5, 4.0, 5.1, 5.3, 5.…
## $ ppi <int> 424, 233, 312, 233, 233, 401, 233, 401, 233, 432, 277, 20…
## $ cpu.core <int> 8, 2, 4, 2, 2, 4, 2, 4, 2, 4, 8, 8, 4, 4, 4, 4, 4, 4, 4, …
## $ cpu.freq <dbl> 1.350, 1.300, 1.200, 1.300, 1.300, 2.300, 1.200, 2.300, 1…
## $ internal.mem <dbl> 16, 4, 8, 4, 4, 16, 8, 16, 4, 16, 32, 4, 16, 32, 16, 8, 1…
## $ ram <dbl> 3.000, 1.000, 1.500, 0.512, 1.000, 2.000, 1.000, 2.000, 0…
## $ RearCam <dbl> 13.00, 3.15, 13.00, 3.15, 3.15, 16.00, 2.00, 16.00, 3.15,…
## $ Front_Cam <dbl> 8.0, 0.0, 5.0, 0.0, 0.0, 8.0, 0.0, 8.0, 0.0, 2.0, 8.0, 0.…
## $ battery <int> 2610, 1700, 2000, 1400, 1700, 2500, 1560, 2500, 1400, 280…
## $ thickness <dbl> 7.4, 9.9, 7.6, 11.0, 9.9, 9.5, 11.7, 9.5, 11.0, 8.1, 7.7,…
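# Dichotomous term: flag phones priced above the sample mean
# Note: high_price is derived from Price (the response), so it is not an independent predictor
phone <- phone %>%
  mutate(high_price = ifelse(Price > mean(Price), 1, 0))
# Quadratic term for weight
phone$weight_squared <- phone$weight^2
# Dichotomous-by-quantitative interaction term
phone$battery_interaction <- phone$high_price * phone$battery
model <- lm(Price ~ weight + weight_squared + high_price + battery + battery_interaction, data = phone)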
summary(model)
##
## Call:
## lm(formula = Price ~ weight + weight_squared + high_price + battery +
## battery_interaction, data = phone)
##
## Residuals:
## Min 1Q Median 3Q Max
## -758.30 -207.85 -8.25 237.59 824.01
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 749.117736 117.655399 6.367 2.08e-09 ***
## weight 1.001932 1.521148 0.659 0.511
## weight_squared -0.007187 0.001676 -4.289 3.14e-05 ***
## high_price 719.758634 149.853704 4.803 3.66e-06 ***
## battery 0.397754 0.060165 6.611 5.82e-10 ***
## battery_interaction 0.040680 0.052486 0.775 0.439
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 313.6 on 155 degrees of freedom
## Multiple R-squared: 0.8386, Adjusted R-squared: 0.8334
## F-statistic: 161.1 on 5 and 155 DF, p-value: < 2.2e-16
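The squared and interaction columns above are built by hand. R's formula interface can express the same terms directly with `I()` and `:`. The sketch below assumes simulated stand-in data: the `demo` data frame and the coefficients used to generate it are invented for illustration and are not taken from Cellphone.csv.

```r
# Simulated stand-in data (hypothetical values, not the Cellphone.csv data)
set.seed(1)
n <- 100
demo <- data.frame(
  weight     = runif(n, 100, 200),
  battery    = runif(n, 1500, 4000),
  high_price = rbinom(n, 1, 0.5)
)
demo$Price <- 500 + 2 * demo$weight - 0.005 * demo$weight^2 +
  600 * demo$high_price + 0.4 * demo$battery +
  0.05 * demo$high_price * demo$battery + rnorm(n, sd = 50)

# Approach used above: precompute the extra columns
demo$weight_squared <- demo$weight^2
demo$battery_interaction <- demo$high_price * demo$battery
fit_manual <- lm(Price ~ weight + weight_squared + high_price + battery +
                   battery_interaction, data = demo)

# Same model written with formula operators
fit_formula <- lm(Price ~ weight + I(weight^2) + high_price + battery +
                    high_price:battery, data = demo)

# Compare coefficient estimates (names differ, so compare unnamed values)
all.equal(unname(coef(fit_manual)), unname(coef(fit_formula)))
```

Writing the terms in the formula keeps the data frame free of helper columns and lets `predict()` rebuild them automatically for new data.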
The linear model may not be fully appropriate. The residuals vs. fitted plot suggests heteroscedasticity: the spread of the residuals is not constant across price ranges, with errors that look more consistent for mid-range smartphones than for high-end ones such as Apple products. The Q-Q plot, on the other hand, shows residuals that are approximately normally distributed, which is consistent with the linear regression assumptions. Because the model does not account for the unequal error variance across price categories, its real-world price predictions may be less reliable, particularly for premium products.
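The visual impressions described above can be complemented with formal tests. The sketch below writes a studentized (Koenker) Breusch-Pagan test in base R and pairs it with the Shapiro-Wilk test from `stats`. The helper name `bp_test_sketch` and the simulated data are assumptions made for illustration, and the helper assumes the fitted model includes an intercept.

```r
# Studentized (Koenker) Breusch-Pagan test written in base R.
# bp_test_sketch is a hypothetical helper, not part of any package.
bp_test_sketch <- function(fit) {
  X   <- model.matrix(fit)        # same predictors as the fitted model
  e2  <- residuals(fit)^2         # squared residuals
  aux <- lm.fit(X, e2)            # auxiliary regression of e^2 on the predictors
  r2  <- 1 - sum(aux$residuals^2) / sum((e2 - mean(e2))^2)
  stat <- length(e2) * r2         # LM statistic: n * R^2
  df   <- ncol(X) - 1             # predictors, excluding the intercept
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df, lower.tail = FALSE))
}

# Simulated example (hypothetical data): error spread grows with x
set.seed(42)
x <- runif(200, 1, 10)
y <- 3 + 2 * x + rnorm(200, sd = 0.5 * x)
demo_fit <- lm(y ~ x)

bp <- bp_test_sketch(demo_fit)     # a small p-value suggests non-constant variance
bp
shapiro.test(residuals(demo_fit))  # Shapiro-Wilk normality test on the residuals
```

For the model fitted earlier, `bp_test_sketch(model)` and `shapiro.test(residuals(model))` would give numeric counterparts to the two diagnostic plots.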
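library(ggplot2)
library(broom)  # provides augment()
# Residuals against fitted values: checks for constant error variance
residuals_vs_fitted <- ggplot(data = augment(model), aes(.fitted, .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  xlab("Fitted values") +
  ylab("Residuals") +
  ggtitle("Residuals vs Fitted")
# Normal Q-Q plot: checks whether the residuals are approximately normal
qq_plot <- ggplot(data = augment(model), aes(sample = .resid)) +
  stat_qq() +
  stat_qq_line() +
  ggtitle("Normal Q-Q Plot of Residuals")
# grid.arrange() draws the two panels side by side and returns a TableGrob
residual_plots <- gridExtra::grid.arrange(residuals_vs_fitted, qq_plot, ncol = 2)
residual_plots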
## TableGrob (1 x 2) "arrange": 2 grobs
## z cells name grob
## 1 1 (1-1,1-1) arrange gtable[layout]
## 2 2 (1-1,2-2) arrange gtable[layout]