library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
laptop_prices <- read.csv("/Users/revathiyajjavarapu/Documents/statistics(1)/laptop_prices.csv")
laptop_prices <- laptop_prices %>% filter(!is.na(Price_euros), !is.na(PrimaryStorageType))
# ANOVA
anova_result <- aov(Price_euros ~ PrimaryStorageType, data = laptop_prices)
summary(anova_result)
## Df Sum Sq Mean Sq F value Pr(>F)
## PrimaryStorageType 3 162036534 54012178 148.1 <2e-16 ***
## Residuals 1271 463566354 364726
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
boxplot(Price_euros ~ PrimaryStorageType,
        data = laptop_prices,
        main = "Price Distribution by Primary Storage Type",
        xlab = "Primary Storage Type",
        ylab = "Price in Euros",
        col = c("lightblue", "lightgreen", "lightcoral", "lightyellow"),
        border = "darkblue")
grid(nx = NULL, ny = NULL)
Df: PrimaryStorageType has 3 degrees of freedom, meaning there are 4 levels (categories) in PrimaryStorageType. The residuals have 1271 degrees of freedom, which is what remains after accounting for the predictor.
Sum Sq: 162,036,534 is the variation in Price_euros that is explained by PrimaryStorageType. 463,566,354 is the residual variation in Price_euros that the model does not explain.
Mean Sq: 54,012,178 is the explained variation per degree of freedom (Sum Sq divided by Df). 364,726 is the unexplained variation per residual degree of freedom (residual Sum Sq divided by residual Df).
The F-statistic is the ratio of the two mean squares (54,012,178 / 364,726 ≈ 148.1). It compares the model with the predictor against a model that uses only the overall mean. A large F value means the variation between storage types is large relative to the variation within them.
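As a quick check, the F value and its p-value can be reproduced from the table entries alone. This is a minimal sketch that hard-codes the sums of squares and degrees of freedom printed by summary(anova_result); the variable names are illustrative and not part of the original analysis.

```r
# Values copied from the printed ANOVA table
ss_storage  <- 162036534   # Sum Sq for PrimaryStorageType
ss_residual <- 463566354   # Sum Sq for Residuals
df_storage  <- 3
df_residual <- 1271

# Mean squares are sums of squares divided by their degrees of freedom
ms_storage  <- ss_storage / df_storage
ms_residual <- ss_residual / df_residual

# F is the ratio of the two mean squares
f_value <- ms_storage / ms_residual

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_value, df_storage, df_residual, lower.tail = FALSE)

print(round(f_value, 1))
print(p_value)
```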
The extremely small p-value (< 2e-16) indicates that mean Price_euros differs significantly across the PrimaryStorageType levels. We can conclude that price is associated with storage type (e.g., SSD, HDD, etc.), although the ANOVA by itself does not show which types differ from each other.
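A common follow-up to a significant one-way ANOVA is a pairwise comparison of the group means, for example with Tukey's HSD. The sketch below runs on made-up toy data (the group labels A to D and the prices are invented for illustration and do not come from laptop_prices.csv); on the real data the equivalent call would presumably be TukeyHSD(anova_result).

```r
# Toy data only: labels and prices are invented for illustration
set.seed(1)
toy <- data.frame(
  storage = factor(rep(c("A", "B", "C", "D"), each = 30)),
  price   = c(rnorm(30, mean = 600,  sd = 150),
              rnorm(30, mean = 900,  sd = 150),
              rnorm(30, mean = 1200, sd = 150),
              rnorm(30, mean = 1250, sd = 150))
)

# One-way ANOVA on the toy data, same form as the model above
toy_fit <- aov(price ~ storage, data = toy)

# Tukey's HSD: one row per pair of groups, with the difference in means,
# a confidence interval, and an adjusted p-value
toy_tukey <- TukeyHSD(toy_fit)
print(toy_tukey$storage)
```

With 4 groups there are 6 pairwise comparisons, so the result has 6 rows.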
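The F-statistic (148.1) and the large Sum Sq for PrimaryStorageType indicate that storage type is a meaningful predictor of laptop prices. The residual Sum Sq is still larger than the explained Sum Sq, so storage type alone leaves most of the price variation unexplained.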
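One way to put a number on the strength of the predictor is eta-squared, the share of the total sum of squares attributed to the factor. The original analysis does not compute it; this is a small sketch using the two Sum Sq values from the table, with illustrative variable names.

```r
# Sums of squares copied from the printed ANOVA table
ss_storage  <- 162036534   # explained by PrimaryStorageType
ss_residual <- 463566354   # unexplained (Residuals)

# Eta-squared: explained variation as a fraction of total variation
eta_squared <- ss_storage / (ss_storage + ss_residual)

print(round(eta_squared, 3))
```

The result is roughly 0.26, meaning storage type accounts for about a quarter of the variation in price.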
cor(laptop_prices$Ram, laptop_prices$Price_euros) # Correlation between RAM and Price
## [1] 0.7402865
cor(laptop_prices$Inches, laptop_prices$Price_euros) # Correlation between Screen Size and Price
## [1] 0.06660794
cor(laptop_prices$Weight, laptop_prices$Price_euros) # Correlation between Weight and Price
## [1] 0.2118834
cor(laptop_prices$CPU_freq, laptop_prices$Price_euros) # Correlation between CPU Frequency and Price
## [1] 0.4288472
cor(laptop_prices$PrimaryStorage, laptop_prices$Price_euros) # Correlation between Primary Storage and Price
## [1] -0.1247752
model <- lm(Price_euros ~ Ram, data = laptop_prices)
summary(model)
##
## Call:
## lm(formula = Price_euros ~ Ram, data = laptop_prices)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2813.72 -297.59 -94.07 244.39 2859.29
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 276.03 25.54 10.81 <2e-16 ***
## Ram 101.76 2.59 39.29 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 471.3 on 1273 degrees of freedom
## Multiple R-squared: 0.548, Adjusted R-squared: 0.5477
## F-statistic: 1544 on 1 and 1273 DF, p-value: < 2.2e-16
plot(laptop_prices$Ram, laptop_prices$Price_euros,
     main = "Price vs RAM with Regression Line",
     xlab = "RAM (GB)",
     ylab = "Price in Euros",
     pch = 19, col = "blue")
abline(lm(Price_euros ~ Ram, data = laptop_prices), col = "red", lwd = 2)
grid(nx = NULL, ny = NULL)
PrimaryStorageType is a significant predictor in the ANOVA, but among the numeric columns Ram has the highest correlation with Price_euros (0.74).
From the model coefficients (Intercept and Ram): when Ram is 0 GB (a hypothetical scenario), the estimated base price of a laptop is €276.03. For each additional GB of Ram, the predicted price increases by €101.76 on average.
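To make the coefficients concrete, the fitted line can be evaluated by hand for a couple of RAM sizes. This sketch hard-codes the two estimates from summary(model); predict_price is a hypothetical helper name, and the RAM values are chosen only as examples.

```r
# Estimates copied from the printed regression summary
intercept <- 276.03
slope_ram <- 101.76

# Hypothetical helper: predicted price in euros for a given amount of RAM
predict_price <- function(ram_gb) {
  intercept + slope_ram * ram_gb
}

print(predict_price(8))    # 276.03 + 101.76 * 8
print(predict_price(16))   # 276.03 + 101.76 * 16
```

With the fitted model object available, predict(model, newdata = data.frame(Ram = c(8, 16))) should give the same values up to rounding of the coefficients.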
R-squared (0.548): approximately 54.8% of the variability in Price_euros is explained by Ram. The adjusted R-squared (0.5477) is very close to the R-squared, which is expected for a model with a single predictor and a large sample.
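In a simple linear regression with one predictor, R-squared equals the squared correlation between predictor and response, and the adjusted value follows from the sample size. The sketch below checks both against the printed output; the sample size of 1275 is inferred from the 1273 residual degrees of freedom plus the 2 estimated coefficients.

```r
# Correlation between Ram and Price_euros, copied from the cor() output
r <- 0.7402865

# With one predictor, R-squared is the squared correlation
r_squared <- r^2

# Sample size inferred from residual df (1273) plus 2 coefficients
n <- 1273 + 2
p <- 1   # number of predictors

# Adjusted R-squared penalises for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

print(round(r_squared, 3))
print(round(adj_r_squared, 4))
```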
The extremely low p-value (< 2e-16) indicates that the relationship between Ram and Price_euros is statistically significant.
In the scatter plot, the regression line slopes clearly upward, consistent with the positive Ram coefficient.