Main Objective
- The primary objective of this analysis is to understand the key
factors that influence laptop prices.
- A second objective is to develop a predictive model that can estimate
the price of a laptop based on its specifications and features.
Benefit
- This analysis helps manufacturers and retailers understand how a
laptop's features influence its price, and set competitive prices
according to specifications. It also helps marketing teams identify key
selling points.
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(corrplot)
## corrplot 0.95 loaded
laptop_data <- read.csv("~/Documents/statistics(1)/annotated-laptop_prices_reverted.csv")
Initial EDA
print("Summary of the dataset:")
## [1] "Summary of the dataset:"
summary(laptop_data)
## Company Product TypeName Inches
## Length:1275 Length:1275 Length:1275 Min. :10.10
## Class :character Class :character Class :character 1st Qu.:14.00
## Mode :character Mode :character Mode :character Median :15.60
## Mean :15.02
## 3rd Qu.:15.60
## Max. :18.40
## Ram OS Weight Price_euros
## Min. : 2.000 Length:1275 Min. :0.690 Min. : 174
## 1st Qu.: 4.000 Class :character 1st Qu.:1.500 1st Qu.: 609
## Median : 8.000 Mode :character Median :2.040 Median : 989
## Mean : 8.441 Mean :2.041 Mean :1135
## 3rd Qu.: 8.000 3rd Qu.:2.310 3rd Qu.:1496
## Max. :64.000 Max. :4.700 Max. :6099
## Screen ScreenW ScreenH TouchscreenIPSpanel
## Length:1275 Min. :1366 Min. : 768 Length:1275
## Class :character 1st Qu.:1920 1st Qu.:1080 Class :character
## Mode :character Median :1920 Median :1080 Mode :character
## Mean :1900 Mean :1074
## 3rd Qu.:1920 3rd Qu.:1080
## Max. :3840 Max. :2160
## RetinaDisplay CPU_company CPU_freq CPU_model
## Length:1275 Length:1275 Min. :0.900 Length:1275
## Class :character Class :character 1st Qu.:2.000 Class :character
## Mode :character Mode :character Median :2.500 Mode :character
## Mean :2.303
## 3rd Qu.:2.700
## Max. :3.600
## PrimaryStorage SecondaryStorage PrimaryStorageType SecondaryStorageType
## Min. : 8.0 Min. : 0.0 Length:1275 Length:1275
## 1st Qu.: 256.0 1st Qu.: 0.0 Class :character Class :character
## Median : 256.0 Median : 0.0 Mode :character Mode :character
## Mean : 444.5 Mean : 176.1
## 3rd Qu.: 512.0 3rd Qu.: 0.0
## Max. :2048.0 Max. :2048.0
## GPU_company GPU_model Touchscreen
## Length:1275 Length:1275 Length:1275
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
# Check for missing values
print("Missing values per column:")
## [1] "Missing values per column:"
colSums(is.na(laptop_data))
## Company Product TypeName
## 0 0 0
## Inches Ram OS
## 0 0 0
## Weight Price_euros Screen
## 0 0 0
## ScreenW ScreenH TouchscreenIPSpanel
## 0 0 0
## RetinaDisplay CPU_company CPU_freq
## 0 0 0
## CPU_model PrimaryStorage SecondaryStorage
## 0 0 0
## PrimaryStorageType SecondaryStorageType GPU_company
## 0 0 0
## GPU_model Touchscreen
## 0 0
# Data structure
print("Structure of the dataset:")
## [1] "Structure of the dataset:"
str(laptop_data)
## 'data.frame': 1275 obs. of 23 variables:
## $ Company : chr "Apple" "Apple" "HP" "Apple" ...
## $ Product : chr "MacBook Pro" "Macbook Air" "250 G6" "MacBook Pro" ...
## $ TypeName : chr "Ultrabook" "Ultrabook" "Notebook" "Ultrabook" ...
## $ Inches : num 13.3 13.3 15.6 15.4 13.3 15.6 15.4 13.3 14 14 ...
## $ Ram : int 8 8 8 16 8 4 16 8 16 8 ...
## $ OS : chr "macOS" "macOS" "No OS" "macOS" ...
## $ Weight : num 1.37 1.34 1.86 1.83 1.37 2.1 2.04 1.34 1.3 1.6 ...
## $ Price_euros : num 1340 899 575 2537 1804 ...
## $ Screen : chr "Standard" "Standard" "Full HD" "Standard" ...
## $ ScreenW : int 2560 1440 1920 2880 2560 1366 2880 1440 1920 1920 ...
## $ ScreenH : int 1600 900 1080 1800 1600 768 1800 900 1080 1080 ...
## $ TouchscreenIPSpanel : chr "Yes" "No" "No" "Yes" ...
## $ RetinaDisplay : chr "Yes" "No" "No" "Yes" ...
## $ CPU_company : chr "Intel" "Intel" "Intel" "Intel" ...
## $ CPU_freq : num 2.3 1.8 2.5 2.7 3.1 3 2.2 1.8 1.8 1.6 ...
## $ CPU_model : chr "Core i5" "Core i5" "Core i5 7200U" "Core i7" ...
## $ PrimaryStorage : int 128 128 256 512 256 500 256 256 512 256 ...
## $ SecondaryStorage : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PrimaryStorageType : chr "SSD" "Flash Storage" "SSD" "SSD" ...
## $ SecondaryStorageType: chr "No" "No" "No" "No" ...
## $ GPU_company : chr "Intel" "Intel" "Intel" "AMD" ...
## $ GPU_model : chr "Iris Plus Graphics 640" "HD Graphics 6000" "HD Graphics 620" "Radeon Pro 455" ...
## $ Touchscreen : chr "Yes" "No" "No" "Yes" ...
# Univariate analysis
ggplot(laptop_data, aes(x = Price_euros)) +
geom_histogram(bins = 30, fill = "blue", alpha = 0.7) +
labs(title = "Distribution of Laptop Prices", x = "Price (Euros)", y = "Frequency")

The distribution is right-skewed: most laptops are
priced in the lower range, while fewer laptops fall in the higher price
brackets.
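As a rough numerical check of the skew, the quartiles printed by summary() for Price_euros can be plugged into Bowley's quartile skewness. This is only a sketch: the inputs are the rounded values from the summary output, and the variable names are illustrative.

```r
# Rounded values copied from the summary(laptop_data) output for Price_euros
q1 <- 609
med <- 989
q3 <- 1496
mean_price <- 1135

# Bowley's quartile skewness: positive values indicate a longer right tail
bowley <- (q3 + q1 - 2 * med) / (q3 - q1)

# In a right-skewed distribution the mean is typically pulled above the median
mean_above_median <- mean_price > med

print(round(bowley, 3))
print(mean_above_median)
```

A positive Bowley value together with a mean above the median is consistent with the right skew seen in the histogram.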
# Boxplot
ggplot(laptop_data, aes(y = Price_euros)) +
geom_boxplot() +
labs(title = "Boxplot of Laptop Prices", y = "Price (Euros)")

The median price (about 989 euros) sits in the lower part of the
overall price range, showing that most laptops are mid-range or
budget-friendly.
A significant number of outliers exist above the
upper whisker, representing high-end laptops.
These high-end outliers may be associated with specific brands (e.g.,
Apple, Dell) or laptop types (e.g., gaming, ultrabook).
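By default, geom_boxplot() draws points lying more than 1.5 times the interquartile range beyond the hinges as outliers. The sketch below estimates where that cutoff falls, assuming the hinges are close to the rounded quartiles shown in the summary() output, so the fences are approximate.

```r
# Rounded quartiles of Price_euros from the summary() output
q1 <- 609
q3 <- 1496
min_price <- 174

iqr <- q3 - q1

# Points beyond these fences are drawn individually as outliers
upper_fence <- q3 + 1.5 * iqr
lower_fence <- q1 - 1.5 * iqr

print(c(lower = lower_fence, upper = upper_fence))

# The lower fence is below the cheapest laptop, so outliers can only appear on the high side
print(lower_fence < min_price)
```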
ggplot(laptop_data, aes(x = TypeName)) +
geom_bar(fill = "orange", alpha = 0.7) +
labs(title = "Frequency of Laptop Types", x = "Laptop Type", y = "Count") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))

Notebooks are the most common type of laptop (707 of 1275),
followed by Gaming laptops (205) and Ultrabooks (194).
2-in-1 Convertibles (117), Workstations (29), and Netbooks (23) have
fewer laptops.
# Bivariate analysis
Price vs Screen Size (Inches)
ggplot(laptop_data, aes(x = Inches, y = Price_euros)) +
geom_point(alpha = 0.6) +
geom_smooth(method = "lm", color = "red", se = FALSE) +
labs(title = "Screen Size vs Price", x = "Screen Size (Inches)", y = "Price (Euros)")
## `geom_smooth()` using formula = 'y ~ x'

# Correlation coefficient
cor(laptop_data$Inches, laptop_data$Price_euros, use = "complete.obs")
## [1] 0.06660794
There is a positive but very weak relationship between screen size
and price (r ≈ 0.07). The fitted line slopes upward, but screen size on
its own explains almost none of the variation in price.
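To see how weak this relationship is, the printed correlation can be converted into an R-squared value and a significance test by hand. This sketch uses only the reported coefficient and the sample size (1275 rows, no missing values); the results should agree with the simple regression of price on screen size reported under Hypothesis 3.

```r
# Values reported above: Pearson correlation and number of complete rows
r <- 0.06660794
n <- 1275

# Share of price variance explained by screen size alone
r_squared <- r^2

# t statistic and two-sided p-value for H0: correlation = 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

print(c(r_squared = r_squared, t = t_stat, p = p_value))
```

With a sample this large, even a correlation this small can pass a 5% significance test, which is why the effect size matters more than the p-value here.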
Price vs Laptop Type
ggplot(laptop_data, aes(x = TypeName, y = Price_euros)) +
geom_boxplot(fill = "orange", alpha = 0.7) +
labs(title = "Price by Laptop Type", x = "Laptop Type", y = "Price (Euros)") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Grouped summary statistics
laptop_data %>%
group_by(TypeName) %>%
summarise(
Mean_Price = mean(Price_euros, na.rm = TRUE),
Median_Price = median(Price_euros, na.rm = TRUE),
Count = n()
)
## # A tibble: 6 × 4
## TypeName Mean_Price Median_Price Count
## <chr> <dbl> <dbl> <int>
## 1 2 in 1 Convertible 1290. 1199 117
## 2 Gaming 1731. 1493. 205
## 3 Netbook 673. 355 23
## 4 Notebook 789. 695 707
## 5 Ultrabook 1557. 1499 194
## 6 Workstation 2280. 2065. 29
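Workstation laptops have the highest prices (median about 2065 euros).
Ultrabooks (1499) and Gaming laptops (about 1493) have nearly identical
medians, although Gaming laptops have the higher mean (about 1731 vs
1557 euros).
Netbooks (median 355) and Notebooks (median 695) have the
lowest prices, indicating these are more budget-friendly options.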
Touchscreen vs Laptop Type
table(laptop_data$Touchscreen, laptop_data$TypeName)
##
## 2 in 1 Convertible Gaming Netbook Notebook Ultrabook Workstation
## No 56 117 19 594 114 18
## Yes 61 88 4 113 80 11
# Stacked bar chart
ggplot(laptop_data, aes(x = TypeName, fill = Touchscreen)) +
geom_bar(position = "fill") +
labs(title = "Proportion of Touchscreens by Laptop Type", x = "Laptop Type", y = "Proportion") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))

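2-in-1 Convertible laptops have the highest share of touchscreen
models (61 of 117, about 52%), as expected, although nearly half are
listed as non-touchscreen.
Gaming (about 43%), Ultrabook (about 41%), and Workstation (about 38%)
laptops show a mix of touchscreen and non-touchscreen models.
Netbooks (about 17%) and Notebooks (about 16%) have the fewest
touchscreen models.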
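The touchscreen shares can be read directly from the contingency table printed above. The sketch below rebuilds that table from the reported counts and computes column proportions with prop.table(). The chi-squared test of independence at the end is an optional addition that is not part of the original analysis.

```r
# Counts copied from table(laptop_data$Touchscreen, laptop_data$TypeName)
touch_by_type <- matrix(
  c(56, 117, 19, 594, 114, 18,
    61,  88,  4, 113,  80, 11),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Touchscreen = c("No", "Yes"),
    TypeName = c("2 in 1 Convertible", "Gaming", "Netbook",
                 "Notebook", "Ultrabook", "Workstation")
  )
)

# Column proportions: share of touchscreen models within each laptop type
col_share <- prop.table(touch_by_type, margin = 2)
touch_share <- round(100 * col_share["Yes", ], 1)
print(sort(touch_share, decreasing = TRUE))

# Optional addition: test whether touchscreen status is independent of type
chi <- chisq.test(touch_by_type)
print(chi)
```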
Hypothesis 1
Laptops with touchscreens are priced higher than non-touchscreen laptops
t_test_touchscreen <- t.test(Price_euros ~ Touchscreen, data = laptop_data)
print(t_test_touchscreen)
##
## Welch Two Sample t-test
##
## data: Price_euros by Touchscreen
## t = -8.8671, df = 598.24, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group No and group Yes is not equal to 0
## 95 percent confidence interval:
## -477.8107 -304.5334
## sample estimates:
## mean in group No mean in group Yes
## 1025.441 1416.613
#box plot
ggplot(laptop_data, aes(x = Touchscreen, y = Price_euros, fill = Touchscreen)) +
geom_boxplot(alpha = 0.7) +
labs(title = "Price Distribution: Touchscreen vs Non-Touchscreen",
x = "Touchscreen", y = "Price (Euros)") +
theme_minimal()

The 95% confidence interval for the difference in mean prices (No minus
Yes) is -477.81 to -304.53 euros, so touchscreen laptops cost on
average about 305 to 478 euros more.
The p-value (p < 2.2e-16) is much smaller than α = 0.05, indicating a
significant difference in mean prices.
The plot visually confirms that touchscreen laptops are positioned
in a higher price range, with a higher median and greater variability.
The presence of outliers in both groups suggests premium models or
specific configurations driving up prices.
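The confidence interval can be reproduced from the other numbers in the Welch output, which also makes the sign convention (No minus Yes) explicit. The sketch below uses only the reported group means, t statistic, and degrees of freedom. Because the hypothesis is directional, it also computes a one-sided p-value. On the real data, the equivalent would presumably be to pass alternative = "less" to t.test(), which is an assumption about how the analysis might be extended.

```r
# Values copied from the Welch two-sample t-test output
mean_no <- 1025.441
mean_yes <- 1416.613
t_stat <- -8.8671
dof <- 598.24

# Difference is "No" minus "Yes", so a negative value means touchscreens cost more
diff_means <- mean_no - mean_yes

# Standard error recovered from the t statistic
se_diff <- diff_means / t_stat

# Two-sided 95% confidence interval for the difference in means
margin <- qt(0.975, dof) * se_diff
ci <- c(lower = diff_means - margin, upper = diff_means + margin)
print(round(ci, 2))

# One-sided p-value for H1: mean(No) < mean(Yes)
p_one_sided <- pt(t_stat, dof)
print(p_one_sided)
```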
Hypothesis 2
Certain laptop brands are priced higher than others
# One-way ANOVA for brand effect on price
anova_company <- aov(Price_euros ~ Company, data = laptop_data)
summary(anova_company)
## Df Sum Sq Mean Sq F value Pr(>F)
## Company 23 102648747 4462989 10.68 <2e-16 ***
## Residuals 1251 522954141 418029
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
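The ANOVA table reports significance but not effect size. As a sketch, eta-squared (the share of total price variance attributable to brand) can be derived from the sums of squares printed above. Eta-squared is an addition here, not something the original analysis reports.

```r
# Values copied from the summary(anova_company) table
ss_company <- 102648747
ss_resid <- 522954141
df_company <- 23
df_resid <- 1251

# Eta-squared: between-brand sum of squares over total sum of squares
eta_sq <- ss_company / (ss_company + ss_resid)

# F statistic recomputed from the mean squares as a consistency check
f_value <- (ss_company / df_company) / (ss_resid / df_resid)

print(round(c(eta_squared = eta_sq, F = f_value), 3))
```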
#Bar plot
avg_prices <- laptop_data %>%
group_by(Company) %>%
summarise(Average_Price = mean(Price_euros, na.rm = TRUE)) %>%
arrange(desc(Average_Price))
ggplot(avg_prices, aes(x = reorder(Company, -Average_Price), y = Average_Price, fill = Company)) +
geom_bar(stat = "identity", alpha = 0.7) +
labs(title = "Average Price by Laptop Brand",
x = "Brand", y = "Average Price (Euros)") +
# Apply the complete theme first; adding theme_minimal() last would reset the rotated labels
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))

The very small p-value (p < 2e-16) indicates that brand has a
statistically significant effect on laptop prices.
Following the ANOVA, Tukey's Honestly Significant Difference (HSD) test
compares the mean prices of each pair of laptop brands to identify
which pairs differ significantly.
Brands like Razer and Mediacom stand out as
having higher variability in pricing, with Mediacom often appearing in
non-significant comparisons.
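A minimal sketch of how a Tukey HSD comparison can be run in base R follows. The toy data frame, brand names, and prices are made up purely so the example runs on its own; they are not the laptop dataset. On the real data the call would presumably be TukeyHSD() on the fitted aov object, with Company converted to a factor first.

```r
# Made-up toy data: three hypothetical brands, 20 laptops each
set.seed(42)
toy <- data.frame(
  Company = factor(rep(c("BrandA", "BrandB", "BrandC"), each = 20)),
  Price_euros = c(rnorm(20, mean = 800, sd = 150),
                  rnorm(20, mean = 1200, sd = 150),
                  rnorm(20, mean = 1250, sd = 150))
)

# One-way ANOVA, then all pairwise comparisons with family-wise 95% confidence
toy_anova <- aov(Price_euros ~ Company, data = toy)
tukey_res <- TukeyHSD(toy_anova, "Company", conf.level = 0.95)

# One row per brand pair: mean difference, CI bounds, and adjusted p-value
pairs_mat <- tukey_res$Company
print(pairs_mat[order(pairs_mat[, "p adj"]), ])
```

Pairs whose interval (lwr, upr) excludes zero differ significantly. With 24 brands there are 276 pairs, so sorting by adjusted p-value makes the output easier to read.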
Hypothesis 3
Screen size influences laptop prices
model_screen <- lm(Price_euros ~ Inches, data = laptop_data)
summary(model_screen)
##
## Call:
## lm(formula = Price_euros ~ Inches, data = laptop_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -954.8 -540.3 -146.8 375.8 4889.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 644.43 206.88 3.115 0.00188 **
## Inches 32.65 13.71 2.382 0.01737 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 699.5 on 1273 degrees of freedom
## Multiple R-squared: 0.004437, Adjusted R-squared: 0.003655
## F-statistic: 5.673 on 1 and 1273 DF, p-value: 0.01737
laptop_data <- laptop_data %>%
mutate(Screen_Size_Group = cut(Inches,
breaks = c(0, 13, 15, 17, Inf),
labels = c("<13\"", "13-15\"", "15-17\"", ">17\"")))
# Boxplot for screen size groups
ggplot(laptop_data, aes(x = Screen_Size_Group, y = Price_euros, fill = Screen_Size_Group)) +
geom_boxplot(alpha = 0.7) +
labs(title = "Price Distribution by Screen Size Group",
x = "Screen Size Group", y = "Price (Euros)") +
theme_minimal()

The median price increases with screen size. Laptops in
the >17" group have the highest median prices.
The p-value (0.017) indicates that screen size has a statistically
significant effect on price at the 5% significance level.
The R-squared value is 0.0044, meaning only about 0.44% of the
variability in price is explained by screen size alone. This suggests
other factors (e.g., brand, features) play a larger role.
The positive and significant coefficient (about 32.65 euros per
additional inch) shows that larger screen sizes are associated with
slightly higher prices on average.
Because screen size alone is not sufficient to explain price variation,
other predictors need to be included in the model.
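One way to include other predictors is a multiple linear regression. The sketch below is illustrative only: fit_price_model is a hypothetical helper, the choice of predictors is an assumption, and the toy rows are randomly generated so the example runs without the CSV file. Only the column names come from the dataset.

```r
# Hypothetical helper: regress price on several specification columns at once
fit_price_model <- function(df) {
  lm(Price_euros ~ Inches + Ram + Weight + CPU_freq + TypeName, data = df)
}

# Made-up toy rows (not the real data), using the dataset's column names
set.seed(1)
n <- 60
toy <- data.frame(
  Inches = runif(n, 11, 17),
  Ram = sample(c(4, 8, 16), n, replace = TRUE),
  Weight = runif(n, 1, 3),
  CPU_freq = runif(n, 1.5, 3),
  TypeName = sample(c("Notebook", "Gaming", "Ultrabook"), n, replace = TRUE)
)
toy$Price_euros <- 300 + 60 * toy$Ram + rnorm(n, sd = 100)

fit <- fit_price_model(toy)

# Adjusted R-squared penalises extra predictors, so it is the fairer comparison
print(summary(fit)$adj.r.squared)

# Predicted prices for the same rows
predicted <- predict(fit, newdata = toy)
print(head(predicted))
```

On the real data the call would be fit_price_model(laptop_data). Comparing its adjusted R-squared with the 0.0037 of the screen-size-only model would show how much the additional predictors contribute.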