Main Objective-

-The primary objective of this analysis is to understand the key factors that influence laptop prices.

-A second objective is to develop a predictive model that can estimate the price of a laptop based on its specifications and features.

#Benefit-

-This analysis helps manufacturers and retailers understand how a laptop's features influence its price, and set competitive prices according to specifications. It also helps marketing teams identify key selling points.

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
library(corrplot)
## corrplot 0.95 loaded
laptop_data <- read.csv("~/Documents/statistics(1)/annotated-laptop_prices_reverted.csv")

Initial EDA

print("Summary of the dataset:")
## [1] "Summary of the dataset:"
summary(laptop_data)
##    Company            Product            TypeName             Inches     
##  Length:1275        Length:1275        Length:1275        Min.   :10.10  
##  Class :character   Class :character   Class :character   1st Qu.:14.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :15.60  
##                                                           Mean   :15.02  
##                                                           3rd Qu.:15.60  
##                                                           Max.   :18.40  
##       Ram              OS                Weight       Price_euros  
##  Min.   : 2.000   Length:1275        Min.   :0.690   Min.   : 174  
##  1st Qu.: 4.000   Class :character   1st Qu.:1.500   1st Qu.: 609  
##  Median : 8.000   Mode  :character   Median :2.040   Median : 989  
##  Mean   : 8.441                      Mean   :2.041   Mean   :1135  
##  3rd Qu.: 8.000                      3rd Qu.:2.310   3rd Qu.:1496  
##  Max.   :64.000                      Max.   :4.700   Max.   :6099  
##     Screen             ScreenW        ScreenH     TouchscreenIPSpanel
##  Length:1275        Min.   :1366   Min.   : 768   Length:1275        
##  Class :character   1st Qu.:1920   1st Qu.:1080   Class :character   
##  Mode  :character   Median :1920   Median :1080   Mode  :character   
##                     Mean   :1900   Mean   :1074                      
##                     3rd Qu.:1920   3rd Qu.:1080                      
##                     Max.   :3840   Max.   :2160                      
##  RetinaDisplay      CPU_company           CPU_freq      CPU_model        
##  Length:1275        Length:1275        Min.   :0.900   Length:1275       
##  Class :character   Class :character   1st Qu.:2.000   Class :character  
##  Mode  :character   Mode  :character   Median :2.500   Mode  :character  
##                                        Mean   :2.303                     
##                                        3rd Qu.:2.700                     
##                                        Max.   :3.600                     
##  PrimaryStorage   SecondaryStorage PrimaryStorageType SecondaryStorageType
##  Min.   :   8.0   Min.   :   0.0   Length:1275        Length:1275         
##  1st Qu.: 256.0   1st Qu.:   0.0   Class :character   Class :character    
##  Median : 256.0   Median :   0.0   Mode  :character   Mode  :character    
##  Mean   : 444.5   Mean   : 176.1                                          
##  3rd Qu.: 512.0   3rd Qu.:   0.0                                          
##  Max.   :2048.0   Max.   :2048.0                                          
##  GPU_company         GPU_model         Touchscreen       
##  Length:1275        Length:1275        Length:1275       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
## 
# Check for missing values
print("Missing values per column:")
## [1] "Missing values per column:"
colSums(is.na(laptop_data))
##              Company              Product             TypeName 
##                    0                    0                    0 
##               Inches                  Ram                   OS 
##                    0                    0                    0 
##               Weight          Price_euros               Screen 
##                    0                    0                    0 
##              ScreenW              ScreenH  TouchscreenIPSpanel 
##                    0                    0                    0 
##        RetinaDisplay          CPU_company             CPU_freq 
##                    0                    0                    0 
##            CPU_model       PrimaryStorage     SecondaryStorage 
##                    0                    0                    0 
##   PrimaryStorageType SecondaryStorageType          GPU_company 
##                    0                    0                    0 
##            GPU_model          Touchscreen 
##                    0                    0
#  Data structure
print("Structure of the dataset:")
## [1] "Structure of the dataset:"
str(laptop_data)
## 'data.frame':    1275 obs. of  23 variables:
##  $ Company             : chr  "Apple" "Apple" "HP" "Apple" ...
##  $ Product             : chr  "MacBook Pro" "Macbook Air" "250 G6" "MacBook Pro" ...
##  $ TypeName            : chr  "Ultrabook" "Ultrabook" "Notebook" "Ultrabook" ...
##  $ Inches              : num  13.3 13.3 15.6 15.4 13.3 15.6 15.4 13.3 14 14 ...
##  $ Ram                 : int  8 8 8 16 8 4 16 8 16 8 ...
##  $ OS                  : chr  "macOS" "macOS" "No OS" "macOS" ...
##  $ Weight              : num  1.37 1.34 1.86 1.83 1.37 2.1 2.04 1.34 1.3 1.6 ...
##  $ Price_euros         : num  1340 899 575 2537 1804 ...
##  $ Screen              : chr  "Standard" "Standard" "Full HD" "Standard" ...
##  $ ScreenW             : int  2560 1440 1920 2880 2560 1366 2880 1440 1920 1920 ...
##  $ ScreenH             : int  1600 900 1080 1800 1600 768 1800 900 1080 1080 ...
##  $ TouchscreenIPSpanel : chr  "Yes" "No" "No" "Yes" ...
##  $ RetinaDisplay       : chr  "Yes" "No" "No" "Yes" ...
##  $ CPU_company         : chr  "Intel" "Intel" "Intel" "Intel" ...
##  $ CPU_freq            : num  2.3 1.8 2.5 2.7 3.1 3 2.2 1.8 1.8 1.6 ...
##  $ CPU_model           : chr  "Core i5" "Core i5" "Core i5 7200U" "Core i7" ...
##  $ PrimaryStorage      : int  128 128 256 512 256 500 256 256 512 256 ...
##  $ SecondaryStorage    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ PrimaryStorageType  : chr  "SSD" "Flash Storage" "SSD" "SSD" ...
##  $ SecondaryStorageType: chr  "No" "No" "No" "No" ...
##  $ GPU_company         : chr  "Intel" "Intel" "Intel" "AMD" ...
##  $ GPU_model           : chr  "Iris Plus Graphics 640" "HD Graphics 6000" "HD Graphics 620" "Radeon Pro 455" ...
##  $ Touchscreen         : chr  "Yes" "No" "No" "Yes" ...

#Univariate analysis

ggplot(laptop_data, aes(x = Price_euros)) +
  geom_histogram(bins = 30, fill = "blue", alpha = 0.7) +
  labs(title = "Distribution of Laptop Prices", x = "Price (Euros)", y = "Frequency")

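The distribution is right-skewed: most laptops are priced in the lower range, and progressively fewer fall in the higher price brackets. The summary above is consistent with this, since the mean price (1135 euros) exceeds the median (989 euros).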
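One way to put a number on the right skew described above is a moment-based skewness coefficient, compared before and after a log transform. The sketch below is illustrative only: the `prices` vector is made-up toy data (not taken from the dataset) and `skewness()` is a small helper defined here, not a base R function.

```r
# Toy right-skewed prices (hypothetical values, NOT from laptop_data)
prices <- c(300, 450, 500, 650, 700, 900, 1100, 1500, 2500, 6000)

# Moment-based sample skewness: positive values indicate a long right tail
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

skew_raw <- skewness(prices)       # skewness on the original scale
skew_log <- skewness(log(prices))  # skewness after a log transform

# In right-skewed data the mean is pulled above the median
mean(prices) > median(prices)
```

On the real data, the same helper could be applied to `laptop_data$Price_euros`; if the log-scale skewness is much closer to zero, modelling `log(Price_euros)` may be worth considering.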

# Boxplot
ggplot(laptop_data, aes(y = Price_euros)) +
  geom_boxplot() +
  labs(title = "Boxplot of Laptop Prices", y = "Price (Euros)")

The median price (989 euros in the summary above) sits well below the upper whisker and far below the maximum of 6099 euros, showing that most laptops are budget or mid-range models.

A significant number of outliers exist above the upper whisker, representing high-end laptops.

High-end outliers may be associated with specific brands (e.g., Apple, Dell) or features (e.g., gaming, ultrabook).
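The points drawn above the upper whisker follow the usual 1.5 × IQR rule, and base R's `boxplot.stats()` returns the same whisker ends and outliers without plotting. A minimal sketch on made-up toy prices (hypothetical values, not from the dataset):

```r
# Toy prices (hypothetical values, NOT from laptop_data)
prices <- c(300, 450, 500, 650, 700, 900, 1100, 1500, 2500, 6000)

# boxplot.stats() returns the five whisker/hinge values and any outliers
bs <- boxplot.stats(prices)

upper_whisker <- bs$stats[5]                      # largest non-outlying value
high_outliers <- bs$out[bs$out > upper_whisker]   # points above the upper whisker

upper_whisker
high_outliers
```

Applied to `laptop_data$Price_euros`, the rows behind `high_outliers` could then be tabulated by `Company` or `TypeName` to check whether the outliers really cluster in particular brands or types.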

ggplot(laptop_data, aes(x = TypeName)) +
  geom_bar(fill = "orange", alpha = 0.7) +
  labs(title = "Frequency of Laptop Types", x = "Laptop Type", y = "Count") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Notebooks are by far the most common type of laptop (707 of 1275), followed by Gaming laptops (205) and Ultrabooks (194).

2 in 1 Convertibles (117), Workstations (29), and Netbooks (23) are less common.

#Bivariate analysis

Price vs Screen Size (Inches)

ggplot(laptop_data, aes(x = Inches, y = Price_euros)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(title = "Screen Size vs Price", x = "Screen Size (Inches)", y = "Price (Euros)")
## `geom_smooth()` using formula = 'y ~ x'

# Correlation coefficient
cor(laptop_data$Inches, laptop_data$Price_euros, use = "complete.obs")
## [1] 0.06660794

The relationship between screen size and price is positive but very weak (r ≈ 0.07). Larger screens are associated with only slightly higher prices on average, and the scatter around the fitted line is large, so screen size on its own says little about price.
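To make "very weak" concrete, the reported correlation can be converted into a share of variance and a t-statistic using only the numbers printed above (r = 0.06660794, n = 1275). This is plain arithmetic on the reported values, and it agrees with the R-squared and t value that the simple regression in Hypothesis 3 reports.

```r
r <- 0.06660794   # correlation reported above
n <- 1275         # number of laptops in the dataset

# Squared correlation: share of price variance linearly associated with screen size
r_squared <- r^2

# t-statistic for testing whether the correlation differs from zero
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

round(r_squared, 6)
round(t_stat, 3)
```

So less than half a percent of the variation in price is linearly associated with screen size, even though the association is detectable with a sample of this size.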

Price Vs Laptop type

ggplot(laptop_data, aes(x = TypeName, y = Price_euros)) +
  geom_boxplot(fill = "orange", alpha = 0.7) +
  labs(title = "Price by Laptop Type", x = "Laptop Type", y = "Price (Euros)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Grouped summary statistics
laptop_data %>%
  group_by(TypeName) %>%
  summarise(
    Mean_Price = mean(Price_euros, na.rm = TRUE),
    Median_Price = median(Price_euros, na.rm = TRUE),
    Count = n()
  )
## # A tibble: 6 × 4
##   TypeName           Mean_Price Median_Price Count
##   <chr>                   <dbl>        <dbl> <int>
## 1 2 in 1 Convertible      1290.        1199    117
## 2 Gaming                  1731.        1493.   205
## 3 Netbook                  673.         355     23
## 4 Notebook                 789.         695    707
## 5 Ultrabook               1557.        1499    194
## 6 Workstation             2280.        2065.    29

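Workstations have the highest median price (about 2065 euros), followed by Ultrabooks (1499) and Gaming laptops (about 1493), which are nearly tied. By mean price, Workstations (about 2280) and Gaming laptops (about 1731) rank highest; both means sit well above their medians, which points to a long upper tail in those categories.

Netbooks and Notebooks have the lowest prices (medians of 355 and 695 euros), indicating these are the more budget-friendly options.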

TouchScreen Vs Laptop Type

table(laptop_data$Touchscreen, laptop_data$TypeName)
##      
##       2 in 1 Convertible Gaming Netbook Notebook Ultrabook Workstation
##   No                  56    117      19      594       114          18
##   Yes                 61     88       4      113        80          11
# Stacked bar chart
ggplot(laptop_data, aes(x = TypeName, fill = Touchscreen)) +
  geom_bar(position = "fill") +
  labs(title = "Proportion of Touchscreens by Laptop Type", x = "Laptop Type", y = "Proportion") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

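Just over half of 2-in-1 Convertible laptops are touchscreen (61 of 117, about 52%), the highest share of any type.

Gaming (88 of 205, about 43%), Ultrabook (80 of 194, about 41%), and Workstation (11 of 29, about 38%) laptops show a substantial mix of touchscreen and non-touchscreen models.

Notebooks (113 of 707, about 16%) and Netbooks (4 of 23, about 17%) have the lowest touchscreen shares.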
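The shares behind the stacked bar chart can be read directly from the contingency table with `prop.table()`. The sketch below re-enters the counts printed above as a matrix so that it runs on its own; with the data loaded, `table(laptop_data$Touchscreen, laptop_data$TypeName)` could be used in place of `counts`. The chi-squared test at the end is an added suggestion, not part of the original analysis.

```r
# Counts copied from the table() output above
counts <- matrix(
  c(56, 117, 19, 594, 114, 18,
    61,  88,  4, 113,  80, 11),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Touchscreen = c("No", "Yes"),
    TypeName = c("2 in 1 Convertible", "Gaming", "Netbook",
                 "Notebook", "Ultrabook", "Workstation")
  )
)

# margin = 2 makes each column (laptop type) sum to 1
touch_share <- round(prop.table(counts, margin = 2)["Yes", ], 3)
touch_share

# Test whether touchscreen share is independent of laptop type
chi <- chisq.test(counts)
chi$p.value
```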

Hypothesis 1

Laptops with touchscreens are priced higher than laptops without touchscreens

t_test_touchscreen <- t.test(Price_euros ~ Touchscreen, data = laptop_data)
print(t_test_touchscreen)
## 
##  Welch Two Sample t-test
## 
## data:  Price_euros by Touchscreen
## t = -8.8671, df = 598.24, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group No and group Yes is not equal to 0
## 95 percent confidence interval:
##  -477.8107 -304.5334
## sample estimates:
##  mean in group No mean in group Yes 
##          1025.441          1416.613
#box plot
ggplot(laptop_data, aes(x = Touchscreen, y = Price_euros, fill = Touchscreen)) +
  geom_boxplot(alpha = 0.7) +
  labs(title = "Price Distribution: Touchscreen vs Non-Touchscreen", 
       x = "Touchscreen", y = "Price (Euros)") +
  theme_minimal()

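The difference in mean prices (non-touchscreen minus touchscreen) lies between -477.81 and -304.53 euros with 95% confidence. In other words, touchscreen laptops cost roughly 305 to 478 euros more on average.

p-value: p < 2.2e-16, much smaller than α = 0.05, indicating a significant difference in mean prices. The test shown is two-sided; because the observed difference is in the hypothesized direction, a one-sided test would give an even smaller p-value.

This plot visually confirms that touchscreen laptops are positioned in a higher price range, with a higher median and more variability.

The presence of outliers in both groups suggests premium models or specific configurations driving up prices.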
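The sign of the confidence interval is easy to misread, so it helps to check which way the difference runs. The sketch below uses only the numbers printed in the Welch t-test output above: with a formula interface, `t.test()` reports the first group minus the second, here No minus Yes.

```r
# Values copied from the Welch t-test output above
mean_no  <- 1025.441
mean_yes <- 1416.613
t_stat   <- -8.8671
ci       <- c(-477.8107, -304.5334)

# Point estimate of the difference, in the order t.test() uses (No - Yes)
diff_no_minus_yes <- mean_no - mean_yes

# The confidence interval is centred on that point estimate
ci_midpoint <- mean(ci)

# Implied standard error of the difference
std_error <- diff_no_minus_yes / t_stat

round(c(difference = diff_no_minus_yes,
        ci_midpoint = ci_midpoint,
        std_error = std_error), 2)
```

Since the hypothesis is directional, a one-sided version could be requested with `t.test(Price_euros ~ Touchscreen, data = laptop_data, alternative = "less")`, assuming the group order stays No, Yes.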

Hypothesis 2

Certain laptop brands are priced higher than others

# One-way ANOVA for brand effect on price
anova_company <- aov(Price_euros ~ Company, data = laptop_data)
summary(anova_company)
##               Df    Sum Sq Mean Sq F value Pr(>F)    
## Company       23 102648747 4462989   10.68 <2e-16 ***
## Residuals   1251 522954141  418029                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Bar plot
avg_prices <- laptop_data %>%
  group_by(Company) %>%
  summarise(Average_Price = mean(Price_euros, na.rm = TRUE)) %>%
  arrange(desc(Average_Price))

ggplot(avg_prices, aes(x = reorder(Company, -Average_Price), y = Average_Price, fill = Company)) +
  geom_bar(stat = "identity", alpha = 0.7) +
  labs(title = "Average Price by Laptop Brand", 
       x = "Brand", y = "Average Price (Euros)") +
  # Apply the complete theme first; calling theme_minimal() last would reset the rotated axis labels
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

The very small p-value (< 2e-16, F = 10.68 on 23 and 1251 degrees of freedom) indicates that brand has a statistically significant effect on laptop prices. The ANOVA itself does not say which brands differ.

Following the ANOVA, Tukey’s Honestly Significant Difference (HSD) test compares the mean prices of each pair of laptop brands to identify which pairs differ significantly.

Brands like Razer and Mediacom stand out as having higher variability in pricing, with Mediacom often appearing in non-significant comparisons.
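The Tukey HSD step is described in the text, but its call does not appear in the code above. A minimal sketch of how it is usually run on an `aov` fit follows, using a small made-up data frame so that it is self-contained; the brands `A`, `B`, `C` and their prices are hypothetical, not values from the dataset.

```r
# Toy data: three hypothetical brands, four laptops each (NOT from laptop_data)
toy <- data.frame(
  brand = factor(rep(c("A", "B", "C"), each = 4)),
  price = c(500, 550, 600, 650,
            900, 950, 1000, 1050,
            1500, 1600, 1700, 1800)
)

# One-way ANOVA, then all pairwise comparisons with family-wise error control
fit   <- aov(price ~ brand, data = toy)
tukey <- TukeyHSD(fit)

# One row per brand pair: difference in means, CI bounds, adjusted p-value
tukey$brand
```

On the real data the equivalent call would be `TukeyHSD(anova_company)`. With 24 brands that produces 276 pairwise comparisons, so filtering the result to rows with a small adjusted p-value is likely more readable than printing the full table.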

Hypothesis 3

Screen size influences laptop prices

model_screen <- lm(Price_euros ~ Inches, data = laptop_data)
summary(model_screen)
## 
## Call:
## lm(formula = Price_euros ~ Inches, data = laptop_data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -954.8 -540.3 -146.8  375.8 4889.7 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   644.43     206.88   3.115  0.00188 **
## Inches         32.65      13.71   2.382  0.01737 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 699.5 on 1273 degrees of freedom
## Multiple R-squared:  0.004437,   Adjusted R-squared:  0.003655 
## F-statistic: 5.673 on 1 and 1273 DF,  p-value: 0.01737
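# Bin screen sizes into groups. cut() uses right-closed intervals by default,
# so a laptop of exactly 13 inches falls in the first group and 15 in the second.
laptop_data <- laptop_data %>%
  mutate(Screen_Size_Group = cut(Inches, 
                                 breaks = c(0, 13, 15, 17, Inf), 
                                 labels = c("<13\"", "13-15\"", "15-17\"", ">17\"")))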

# Boxplot for screen size groups
ggplot(laptop_data, aes(x = Screen_Size_Group, y = Price_euros, fill = Screen_Size_Group)) +
  geom_boxplot(alpha = 0.7) +
  labs(title = "Price Distribution by Screen Size Group", 
       x = "Screen Size Group", y = "Price (Euros)") +
  theme_minimal()

The median price increases with screen size. Laptops in the >17" group have the highest median prices.

The p-value (0.017) indicates that screen size has a statistically significant association with price at the 5% significance level.

The R-squared value is 0.0044, meaning only about 0.44% of the variability in price is explained by screen size alone. This suggests other factors (e.g., brand, features) play a larger role.

The positive and significant coefficient (about 32.65 euros per additional inch) indicates that larger screens are associated with higher prices on average. It does not by itself show that screen size causes the difference.

The low R-squared indicates screen size alone is not sufficient to explain price variation, highlighting the need to include other predictors.
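A natural next step is to compare the single-predictor model against one with additional predictors. The sketch below shows the mechanics on simulated data: the `toy` data frame, its coefficients, and the noise level are all assumptions chosen for illustration, and nothing in it reflects results from the laptop dataset. Only the column names mirror the real data.

```r
set.seed(1)
n <- 200

# Simulated data (hypothetical): price depends weakly on Inches, strongly on Ram
toy <- data.frame(
  Inches = runif(n, 11, 18),
  Ram    = sample(c(4, 8, 16, 32), n, replace = TRUE)
)
toy$Price_euros <- 300 + 10 * toy$Inches + 60 * toy$Ram + rnorm(n, sd = 150)

# Nested models: screen size alone vs screen size plus RAM
fit_simple <- lm(Price_euros ~ Inches, data = toy)
fit_multi  <- lm(Price_euros ~ Inches + Ram, data = toy)

r2_simple <- summary(fit_simple)$r.squared
r2_multi  <- summary(fit_multi)$r.squared
c(simple = r2_simple, multi = r2_multi)

# F-test for whether the added predictor improves the fit
anova(fit_simple, fit_multi)
```

On the real data, the same pattern would be `lm(Price_euros ~ Inches + Ram + ..., data = laptop_data)`, with the choice of extra predictors (for example `Ram`, `CPU_freq`, `TypeName`) left to the analysis.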