Dataset Description:

1. The dataset used for this analysis contains information about laptop prices and various specifications, including storage, weight, RAM, processor type, brand, and other relevant features that might influence laptop pricing.

2. This dataset includes both numeric and categorical variables.

3. It was sourced from Kaggle and can be accessed at https://www.kaggle.com/datasets/owm4096/laptop-prices.
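
As a quick way to see which columns fall into each group, the sketch below separates numeric from non-numeric columns. The `toy` data frame is a hypothetical stand-in with made-up values that only borrows a few column names from the dataset; the same two lines could be pointed at the real data frame instead.

```r
# Hypothetical stand-in for the laptop data (values are made up)
toy <- data.frame(
  Company     = c("BrandA", "BrandB", "BrandC"),
  Ram         = c(8L, 4L, 16L),
  Weight      = c(1.2, 1.9, 2.3),
  Price_euros = c(1100, 550, 1400),
  Touchscreen = c("No", "No", "Yes"),
  stringsAsFactors = FALSE
)

# TRUE for numeric (integer or double) columns, FALSE otherwise
is_num <- sapply(toy, is.numeric)

numeric_cols     <- names(toy)[is_num]
categorical_cols <- names(toy)[!is_num]

numeric_cols
categorical_cols
```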

The main goal:

- The goal of this project is to identify the key factors that influence laptop pricing.

- We aim to determine which specifications (e.g., storage capacity, weight, RAM size, brand) are the most significant predictors of a laptop’s price.

- By doing so, we hope to build a predictive model that can estimate the price of a laptop based on its features and help consumers make informed purchase decisions.

Visualizations for at least two interesting aspects of the data

Two pairs of numeric variables are examined: Price_euros vs. Weight, and Price_euros vs. Price_per_GB.

# Load necessary libraries
library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
data <- read.csv("~/Documents/statistics(1)/laptop_prices.csv")
str(data)
## 'data.frame':    1275 obs. of  23 variables:
##  $ Company             : chr  "Apple" "Apple" "HP" "Apple" ...
##  $ Product             : chr  "MacBook Pro" "Macbook Air" "250 G6" "MacBook Pro" ...
##  $ TypeName            : chr  "Ultrabook" "Ultrabook" "Notebook" "Ultrabook" ...
##  $ Inches              : num  13.3 13.3 15.6 15.4 13.3 15.6 15.4 13.3 14 14 ...
##  $ Ram                 : int  8 8 8 16 8 4 16 8 16 8 ...
##  $ OS                  : chr  "macOS" "macOS" "No OS" "macOS" ...
##  $ Weight              : num  1.37 1.34 1.86 1.83 1.37 2.1 2.04 1.34 1.3 1.6 ...
##  $ Price_euros         : num  1340 899 575 2537 1804 ...
##  $ Screen              : chr  "Standard" "Standard" "Full HD" "Standard" ...
##  $ ScreenW             : int  2560 1440 1920 2880 2560 1366 2880 1440 1920 1920 ...
##  $ ScreenH             : int  1600 900 1080 1800 1600 768 1800 900 1080 1080 ...
##  $ Touchscreen         : chr  "No" "No" "No" "No" ...
##  $ IPSpanel            : chr  "Yes" "No" "No" "Yes" ...
##  $ RetinaDisplay       : chr  "Yes" "No" "No" "Yes" ...
##  $ CPU_company         : chr  "Intel" "Intel" "Intel" "Intel" ...
##  $ CPU_freq            : num  2.3 1.8 2.5 2.7 3.1 3 2.2 1.8 1.8 1.6 ...
##  $ CPU_model           : chr  "Core i5" "Core i5" "Core i5 7200U" "Core i7" ...
##  $ PrimaryStorage      : int  128 128 256 512 256 500 256 256 512 256 ...
##  $ SecondaryStorage    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ PrimaryStorageType  : chr  "SSD" "Flash Storage" "SSD" "SSD" ...
##  $ SecondaryStorageType: chr  "No" "No" "No" "No" ...
##  $ GPU_company         : chr  "Intel" "Intel" "Intel" "AMD" ...
##  $ GPU_model           : chr  "Iris Plus Graphics 640" "HD Graphics 6000" "HD Graphics 620" "Radeon Pro 455" ...
# Create derived columns 'Total_Storage' and 'Price_per_GB'
data <- data %>%
  mutate(
    Total_Storage = PrimaryStorage + SecondaryStorage,
    Price_per_GB = Price_euros / Total_Storage
  )

# Display the first few rows of the relevant columns for verification
head(data %>% select(Price_euros, Weight, Total_Storage, Price_per_GB))
##   Price_euros Weight Total_Storage Price_per_GB
## 1     1339.69   1.37           128    10.466328
## 2      898.94   1.34           128     7.022969
## 3      575.00   1.86           256     2.246094
## 4     2537.45   1.83           512     4.955957
## 5     1803.60   1.37           256     7.045312
## 6      400.00   2.10           500     0.800000
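
Because Price_per_GB divides by Total_Storage, a row with zero recorded storage would produce `Inf`. Whether such rows exist in this dataset is not checked above, so the following is only a defensive sketch on a hypothetical `toy` data frame with made-up values. It is written in base R so that it runs without extra packages.

```r
# Hypothetical rows (made-up values); the second row has no storage recorded
toy <- data.frame(
  Price_euros      = c(1000, 800, 500),
  PrimaryStorage   = c(256, 0, 500),
  SecondaryStorage = c(0, 0, 1000)
)

toy$Total_Storage <- toy$PrimaryStorage + toy$SecondaryStorage

# Use NA rather than Inf when Total_Storage is zero
toy$Price_per_GB <- ifelse(
  toy$Total_Storage > 0,
  toy$Price_euros / toy$Total_Storage,
  NA_real_
)

# Count rows where the ratio could not be computed
n_missing_ratio <- sum(is.na(toy$Price_per_GB))
toy
```

Since `cor()` is already called with `use = "complete.obs"`, rows marked `NA` this way would simply be left out of the correlation.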

Visualization 1

plot(data$Weight, data$Price_euros, 
     xlab = "Weight (kg)", 
     ylab = "Price (Euros)", 
     main = "Relationship between Weight and Price",
     pch = 19, col = "blue")  
abline(lm(data$Price_euros ~ data$Weight), col = "red")  

cor_price_weight <- cor(data$Price_euros, data$Weight, use = "complete.obs")
cor_price_weight
## [1] 0.2118834

Price vs. Weight: A scatter plot was created to visualize the relationship between laptop Price_euros and Weight. A trend line was added to highlight the general trend in this relationship.

This visualization is interesting because it suggests a weak positive relationship between weight and price (r ≈ 0.21): heavier laptops tend to be slightly more expensive. However, the wide spread of data points indicates that weight alone is not a strong predictor of price, so other factors are likely more influential.
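
To put the correlation in perspective, squaring it gives the share of price variance that a straight-line fit on weight would account for: 0.2119² ≈ 0.045, or roughly 4.5%. The sketch below does that arithmetic and then shows how `cor.test()` could attach a confidence interval and p-value to a correlation. The second part runs on simulated data, not the laptop dataset, and the simulation parameters are arbitrary assumptions.

```r
# Part 1: variance explained, using the correlation reported above
r <- 0.2118834
r_squared <- r^2
round(r_squared, 3)   # 0.045

# Part 2: cor.test() on SIMULATED data (arbitrary parameters, illustration only)
set.seed(1)
weight <- runif(200, min = 1, max = 4)
price  <- 800 + 100 * weight + rnorm(200, sd = 600)

ct <- cor.test(price, weight)   # Pearson correlation by default
ct$estimate   # sample correlation
ct$conf.int   # 95% confidence interval
ct$p.value    # test of the null hypothesis that the correlation is zero
```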

Visualization 2

plot(data$Price_per_GB, data$Price_euros, 
     xlab = "Price per GB (Euros)", 
     ylab = "Price (Euros)", 
     main = "Relationship between Price per GB and Price",
     pch = 19, col = "red")  
abline(lm(data$Price_euros ~ data$Price_per_GB), col = "blue")  

cor_price_price_per_gb <- cor(data$Price_euros, data$Price_per_GB, use = "complete.obs")
cor_price_price_per_gb
## [1] 0.1206383

Price per GB and Price: A scatter plot of Price_euros against Price_per_GB was created, with a trend line added to observe the relationship.

- The upward trend suggests that laptops with a higher Price_per_GB tend to have higher overall prices. Part of this is expected by construction, since Price_per_GB is computed from Price_euros.

- However, the correlation coefficient for this relationship is low (r ≈ 0.12), indicating considerable variability and potential outliers.

- So while there is a positive trend, other factors such as brand, processor, or RAM may contribute significantly to price variation. Further analysis of these factors could reveal more about what drives higher laptop prices.
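
Since outliers are a concern here, one possible cross-check is a rank-based (Spearman) correlation, which is less sensitive to extreme values than the default Pearson correlation. The vectors below are made up purely to show the difference; they are not taken from the dataset.

```r
# Made-up values: y rises steadily with x, but x has one extreme point
x <- c(1, 2, 3, 4, 100)
y <- c(1, 2, 3, 4, 5)

pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")

pearson    # pulled below 1 by the extreme x value
spearman   # 1, because the rank order of x and y agrees exactly
```

On the real data, the same comparison could be made by adding `method = "spearman"` to the existing `cor()` call.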

Plan moving forward:

- Calculate correlation coefficients between Price_euros and other numeric variables such as Ram, Total_Storage, and CPU_freq, and analyze which have the strongest relationships with price.

- Examine how both categorical and numeric specifications relate to laptop price.

- Build a multiple regression model using key predictors (e.g., RAM, processor, brand) to estimate laptop prices and identify potential outliers in the dataset.
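
The plan above could be sketched roughly as follows. Everything here runs on simulated data: the `toy` data frame, its coefficients, and the cutoff of 3 for standardized residuals are assumptions chosen for illustration, not results or settings from this project.

```r
# SIMULATED data with a few column names borrowed from the dataset
set.seed(42)
n <- 120
toy <- data.frame(
  Ram           = sample(c(4, 8, 16, 32), n, replace = TRUE),
  Weight        = round(runif(n, 1, 3.5), 2),
  Total_Storage = sample(c(128, 256, 512, 1024), n, replace = TRUE),
  Company       = sample(c("A", "B", "C"), n, replace = TRUE)
)
toy$Price_euros <- 300 + 60 * toy$Ram + 0.3 * toy$Total_Storage +
  ifelse(toy$Company == "C", 400, 0) + rnorm(n, sd = 150)

# Step 1: correlation of each numeric predictor with price, strongest first
num_cols   <- names(toy)[sapply(toy, is.numeric)]
predictors <- setdiff(num_cols, "Price_euros")
cors <- sapply(toy[predictors],
               function(x) cor(x, toy$Price_euros, use = "complete.obs"))
cors_ranked <- cors[order(abs(cors), decreasing = TRUE)]
cors_ranked

# Step 2: multiple regression with numeric and categorical predictors
# (lm() creates dummy variables for the character column Company)
fit <- lm(Price_euros ~ Ram + Weight + Total_Storage + Company, data = toy)
summary(fit)

# Step 3: flag potential outliers via standardized residuals
# (the cutoff of 3 is a common rule of thumb, assumed here)
std_res  <- rstandard(fit)
outliers <- toy[abs(std_res) > 3, ]
nrow(outliers)
```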

Initial findings:

Hypothesis 1: Laptops with Touchscreen Features Are Priced Higher Compared to Non-Touchscreen Laptops

Visualization 1

# Box Plot for Price by Touchscreen Feature
ggplot(data, aes(x = Touchscreen, y = Price_euros, fill = Touchscreen)) +
  geom_boxplot() +
  labs(title = "Price Comparison: Touchscreen vs Non-Touchscreen Laptops",
       x = "Touchscreen Feature",
       y = "Price (Euros)") +
  theme_minimal() +
  scale_fill_manual(values = c("red", "blue"), labels = c("Non-Touchscreen", "Touchscreen"))

- This visualization indicates that touchscreen laptops tend to be priced higher than non-touchscreen laptops.
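
A box plot shows the pattern visually; a numeric comparison could back it up. The sketch below compares group medians and runs a Welch t-test and a Wilcoxon rank-sum test. The prices in `toy` are made up, so the output says nothing about the actual dataset.

```r
# Made-up prices for six laptops in each group
toy <- data.frame(
  Touchscreen = rep(c("No", "Yes"), each = 6),
  Price_euros = c(500, 650, 700, 820, 900, 1100,
                  800, 950, 1200, 1350, 1500, 1900)
)

# Median price per group
medians <- tapply(toy$Price_euros, toy$Touchscreen, median)
medians

# Welch two-sample t-test (does not assume equal variances)
tt <- t.test(Price_euros ~ Touchscreen, data = toy)

# Rank-based alternative, less sensitive to skewed prices
wt <- wilcox.test(Price_euros ~ Touchscreen, data = toy)

tt$p.value
wt$p.value
```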

Hypothesis 2: Certain Laptop Brands (Companies) Are More Expensive

Visualization 2

# Box plot for Price by Company
ggplot(data, aes(x = Company, y = Price_euros)) +
  geom_boxplot(fill = 'lightblue', color = 'darkblue', outlier.colour = 'red', outlier.shape = 16, outlier.size = 2) +
  labs(title = "Box Plot of Price by Company",
       x = "Company",
       y = "Price (Euros)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) 

This visualization indicates that Razer laptops are priced high relative to other brands. There are also some outliers that need further investigation.
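
One way to follow up is to tabulate the number of laptops and the median price per company, since a brand with only a handful of models can look expensive in a box plot without much evidence behind it. The sketch uses a hypothetical `toy` data frame with made-up brands and prices.

```r
# Made-up brands and prices
toy <- data.frame(
  Company     = c("A", "A", "A", "B", "B", "C"),
  Price_euros = c(600, 800, 1000, 1500, 2500, 3000),
  stringsAsFactors = FALSE
)

# Count and median per company (both come back in the same alphabetical order)
n_by      <- tapply(toy$Price_euros, toy$Company, length)
median_by <- tapply(toy$Price_euros, toy$Company, median)

summary_tbl <- data.frame(
  Company      = names(median_by),
  n            = as.vector(n_by),
  median_price = as.vector(median_by),
  stringsAsFactors = FALSE
)

# Most expensive brands first
summary_tbl <- summary_tbl[order(-summary_tbl$median_price), ]
summary_tbl
```

In this toy example the top brand has a single laptop, which is the kind of case where the count column helps temper the conclusion.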

Hypothesis 3: Laptops with More Weight Are More Expensive

Visualization 3

data <- data %>%
  mutate(Weight_Category = case_when(
    Weight < 1.5 ~ "Light",
    Weight >= 1.5 & Weight < 2.5 ~ "Medium",
    Weight >= 2.5 ~ "Heavy"
  ))

ggplot(data, aes(x = Weight_Category, y = Price_euros, fill = Weight_Category)) +
  geom_boxplot() +
  labs(title = "Price Distribution by Weight Category",
       x = "Weight Category",
       y = "Price (Euros)") +
  theme_minimal() +
  scale_fill_manual(values = c("lightblue", "lightgreen", "lightcoral"))

- Weight was categorized into three groups (Light, Medium, Heavy) and visualized with a box plot.

- The box plot suggests that heavy laptops tend to be more expensive.
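
A possible refinement, sketched below on made-up values: build the same three categories as an ordered factor so that they appear as Light, Medium, Heavy rather than alphabetically, then compare group medians and run a Kruskal-Wallis test. The cut points 1.5 and 2.5 match the ones used above. `cut()` is used here as a base R alternative to `case_when()`, and the choice of test is an assumption.

```r
# Made-up weights (kg) and prices
toy <- data.frame(
  Weight      = c(1.1, 1.3, 1.4, 1.6, 2.0, 2.4, 2.5, 3.0, 4.2),
  Price_euros = c(1200, 1500, 900, 600, 700, 800, 1400, 1800, 2500)
)

# right = FALSE gives intervals [a, b), so 1.5 is Medium and 2.5 is Heavy
toy$Weight_Category <- cut(
  toy$Weight,
  breaks = c(-Inf, 1.5, 2.5, Inf),
  labels = c("Light", "Medium", "Heavy"),
  right  = FALSE
)

# Median price per category, reported in factor-level order
medians <- tapply(toy$Price_euros, toy$Weight_Category, median)
medians

# Rank-based test for any difference in price across the three groups
kw <- kruskal.test(Price_euros ~ Weight_Category, data = toy)
kw$p.value
```

Comparing all three medians, not just the Heavy group, also shows whether price rises steadily with weight or whether the relationship is uneven across categories.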