library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
laptop_prices <- read.csv("/Users/revathiyajjavarapu/Documents/statistics(1)/laptop_prices.csv")

ANOVA test

laptop_prices <- laptop_prices %>% filter(!is.na(Price_euros), !is.na(PrimaryStorageType))

# ANOVA
anova_result <- aov(Price_euros ~ PrimaryStorageType, data = laptop_prices)

summary(anova_result)
##                      Df    Sum Sq  Mean Sq F value Pr(>F)    
## PrimaryStorageType    3 162036534 54012178   148.1 <2e-16 ***
## Residuals          1271 463566354   364726                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Visualization

boxplot(Price_euros ~ PrimaryStorageType, 
        data = laptop_prices,
        main = "Price Distribution by Primary Storage Type",
        xlab = "Primary Storage Type",
        ylab = "Price in Euros",
        col = c("lightblue", "lightgreen", "lightcoral", "lightyellow"),
        border = "darkblue")

grid(nx = NULL, ny = NULL)

Df: PrimaryStorageType has 3 degrees of freedom, meaning the variable has 4 levels (categories). The residual Df of 1271 is what remains after accounting for the predictor: 1275 observations minus 4 groups.

Sum Sq: 162,036,534 is the variation in Price_euros that is explained by PrimaryStorageType (variation between storage types). 463,566,354 is the residual variation in Price_euros that is not explained by the model (variation within storage types).

Mean Sq: 54,012,178 is the mean square for PrimaryStorageType (Sum Sq divided by Df, 162,036,534 / 3). 364,726 is the residual mean square (residual Sum Sq divided by residual Df, 463,566,354 / 1271).

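The F-statistic is the ratio of the two mean squares (54,012,178 / 364,726 ≈ 148.1). It compares the variation between storage types with the variation within them. A large F value means the group means differ by more than would be expected by chance if PrimaryStorageType had no effect.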
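As a quick check, the Mean Sq, F value and p-value in the ANOVA table can be reproduced from the Sum Sq and Df columns alone. This is a minimal sketch that hard-codes the numbers printed by summary(anova_result); the variable names (ss_between, df_resid, and so on) are chosen here for illustration and are not part of the original analysis.

```r
# Values copied from the summary(anova_result) output above
ss_between <- 162036534   # Sum Sq for PrimaryStorageType
ss_resid   <- 463566354   # Sum Sq for Residuals
df_between <- 3           # number of groups - 1
df_resid   <- 1271        # number of observations - number of groups

# Mean squares: sum of squares divided by degrees of freedom
ms_between <- ss_between / df_between
ms_resid   <- ss_resid / df_resid

# F statistic: between-group mean square over residual mean square
f_stat <- ms_between / ms_resid

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_stat, df_between, df_resid, lower.tail = FALSE)

# Sample size implied by the degrees of freedom
n_obs <- df_between + df_resid + 1

print(round(c(ms_between = ms_between, ms_resid = ms_resid, f_stat = f_stat), 1))
print(p_value < 2e-16)
print(n_obs)
```

The implied sample size (1275) agrees with the regression output, where 1273 residual degrees of freedom plus 2 estimated coefficients give the same count.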

The extremely small p-value (< 2e-16) indicates that PrimaryStorageType has a statistically significant effect on Price_euros. We can conclude that the mean price differs across storage types (e.g., SSD, HDD, etc.). The test alone does not show which storage types differ from each other.

The F-statistic (148.1) and the large Sum Sq for PrimaryStorageType indicate that storage type is a strong predictor of laptop prices: it accounts for about 26% of the total variation in price (162,036,534 out of 625,602,888).
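One way to put a number on "strong predictor" is the share of total variation explained by storage type, often called eta squared. The sketch below computes it from the Sum Sq column of the ANOVA table; eta squared was not computed in the original analysis, and the variable names are illustrative.

```r
# Sums of squares copied from the ANOVA table above
ss_between <- 162036534   # explained by PrimaryStorageType
ss_resid   <- 463566354   # unexplained (residual)

# Total variation in Price_euros
ss_total <- ss_between + ss_resid

# Eta squared: proportion of total variation explained by the factor
eta_squared <- ss_between / ss_total

print(round(eta_squared, 3))
```

For a one-way ANOVA this proportion is the same quantity as the R-squared of the equivalent linear model, so it can be compared directly with the R-squared of the RAM regression further down.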

Correlation coefficients

cor(laptop_prices$Ram, laptop_prices$Price_euros)               # Correlation between RAM and Price
## [1] 0.7402865
cor(laptop_prices$Inches, laptop_prices$Price_euros)            # Correlation between Screen Size and Price
## [1] 0.06660794
cor(laptop_prices$Weight, laptop_prices$Price_euros)            # Correlation between Weight and Price
## [1] 0.2118834
cor(laptop_prices$CPU_freq, laptop_prices$Price_euros)          # Correlation between CPU Frequency and Price
## [1] 0.4288472
cor(laptop_prices$PrimaryStorage, laptop_prices$Price_euros)    # Correlation between Primary Storage and Price
## [1] -0.1247752

Linear regression model with RAM as the predictor

model <- lm(Price_euros ~ Ram, data = laptop_prices)

summary(model)
## 
## Call:
## lm(formula = Price_euros ~ Ram, data = laptop_prices)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2813.72  -297.59   -94.07   244.39  2859.29 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   276.03      25.54   10.81   <2e-16 ***
## Ram           101.76       2.59   39.29   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 471.3 on 1273 degrees of freedom
## Multiple R-squared:  0.548,  Adjusted R-squared:  0.5477 
## F-statistic:  1544 on 1 and 1273 DF,  p-value: < 2.2e-16
plot(laptop_prices$Ram, laptop_prices$Price_euros, 
     main = "Price vs RAM with Regression Line",
     xlab = "RAM (GB)", 
     ylab = "Price in Euros",
     pch = 19, col = "blue")

# Reuse the model fitted above instead of fitting it a second time
abline(model, col = "red", lwd = 2)

grid(nx = NULL, ny = NULL)

Of the numeric columns, RAM has the highest correlation with price (r = 0.74), which is why it was chosen as the predictor for the regression, even though primary storage type is also a significant predictor.

The model has two coefficients, the intercept and Ram. When Ram is 0 GB (a hypothetical scenario), the estimated base price of a laptop is €276.03. For each additional GB of Ram, the estimated price of the laptop increases by €101.76 on average.
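To make the coefficients concrete, the fitted line can be evaluated for a few RAM sizes. This sketch hard-codes the rounded estimates from the summary output, so results may differ from the fitted model by a few cents; predict_price is a hypothetical helper name, and the RAM sizes are arbitrary examples.

```r
# Rounded coefficient estimates copied from summary(model)
intercept <- 276.03   # estimated price at 0 GB of RAM
slope_ram <- 101.76   # estimated price change per additional GB of RAM

# Hypothetical helper: predicted price in euros for a given amount of RAM
predict_price <- function(ram_gb) {
  intercept + slope_ram * ram_gb
}

ram_sizes <- c(4, 8, 16)
predicted <- predict_price(ram_sizes)

print(data.frame(ram_gb = ram_sizes, predicted_price_eur = predicted))
```

With the fitted model object available, predict(model, newdata = data.frame(Ram = c(4, 8, 16))) should give the same predictions using the unrounded coefficients.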

R-squared (0.548): approximately 54.8% of the variability in Price_euros is explained by Ram. The adjusted R-squared (0.5477) is very close to the R-squared, which is expected for a model with a single predictor and a large sample; the penalty for the number of predictors is negligible here.
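In a regression with a single predictor, R-squared equals the squared correlation between the predictor and the response, so the value above can be cross-checked against the correlation computed earlier. The sketch below also applies the standard adjusted R-squared formula; the sample size of 1275 is inferred from the 1273 residual degrees of freedom plus 2 estimated coefficients, and the variable names are illustrative.

```r
# Correlation between Ram and Price_euros, copied from the cor() output above
r_ram_price <- 0.7402865

# Sample size inferred from the regression output (1273 residual df + 2 coefficients)
n_obs  <- 1275
n_pred <- 1   # number of predictors in the model

# With one predictor, R-squared is the squared correlation
r_squared <- r_ram_price^2

# Adjusted R-squared penalizes for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_pred - 1)

print(round(c(r_squared = r_squared, adj_r_squared = adj_r_squared), 4))
```

Both values round to the figures reported by summary(model), 0.548 and 0.5477.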

The extremely low p-value (< 2.2e-16) indicates that the relationship between Ram and Price_euros is statistically significant.

In the scatter plot, the regression line slopes clearly upward, consistent with the positive Ram coefficient.