library(tidyverse)Laptop Prices
Introduction
This project develops practical competency in transforming wide-format datasets into tidy formats suitable for downstream analysis. All transformations and data manipulations are performed using the `tidyr` and `dplyr` packages in R, with visualizations built using `ggplot2`.
Dataset used:
Uncleaned Laptop Price Dataset: https://www.kaggle.com/datasets/ehtishamsadiq/uncleaned-laptop-price-dataset
Approach
This dataset contains 1,303 laptop models covering specs like CPU, RAM, storage, GPU, screen resolution, OS, weight, and price. As the name suggests, the data comes as-is messy, mixed, and in need of some serious cleaning before anything useful can be done with it.
The first thing I’ll need to tackle is all the columns that are stuffed with multiple pieces of information. For example, RAM is stored as “8GB” instead of just 8, the CPU column bundles brand, model, and clock speed together, and Memory mixes storage size with storage type. I’ll use `stringr` to pull out the numeric values and separate the meaningful parts into their own columns.
Once the data is clean, I’ll reshape the specification columns into long format using `pivot_longer()`, which will make it much easier to compare specs across laptop types. From there I want to explore which laptop types give you the most RAM and storage for your money, essentially a value-for-price analysis, using laptop type as the main grouping variable rather than brand, since type is a better reflection of what you’re actually getting.
Base Code
Data Import
# Raw Data Import from my GitHub repository
raw_data <- read_csv(
"https://raw.githubusercontent.com/MKudanova/Data607/refs/heads/main/Project2/laptopData.csv")Rows: 1303 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (10): Company, TypeName, Inches, ScreenResolution, Cpu, Ram, Memory, Gpu...
dbl (2): Unnamed: 0, Price
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(raw_data)Rows: 1,303
Columns: 12
$ `Unnamed: 0` <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,…
$ Company <chr> "Apple", "Apple", "HP", "Apple", "Apple", "Acer", "Ap…
$ TypeName <chr> "Ultrabook", "Ultrabook", "Notebook", "Ultrabook", "U…
$ Inches <chr> "13.3", "13.3", "15.6", "15.4", "13.3", "15.6", "15.4…
$ ScreenResolution <chr> "IPS Panel Retina Display 2560x1600", "1440x900", "Fu…
$ Cpu <chr> "Intel Core i5 2.3GHz", "Intel Core i5 1.8GHz", "Inte…
$ Ram <chr> "8GB", "8GB", "8GB", "16GB", "8GB", "4GB", "16GB", "8…
$ Memory <chr> "128GB SSD", "128GB Flash Storage", "256GB SSD", "512…
$ Gpu <chr> "Intel Iris Plus Graphics 640", "Intel HD Graphics 60…
$ OpSys <chr> "macOS", "macOS", "No OS", "macOS", "macOS", "Windows…
$ Weight <chr> "1.37kg", "1.34kg", "1.86kg", "1.83kg", "1.37kg", "2.…
$ Price <dbl> 71378.68, 47895.52, 30636.00, 135195.34, 96095.81, 21…
head(raw_data)# A tibble: 6 × 12
`Unnamed: 0` Company TypeName Inches ScreenResolution Cpu Ram Memory Gpu
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 0 Apple Ultrabo… 13.3 IPS Panel Retin… Inte… 8GB 128GB… Inte…
2 1 Apple Ultrabo… 13.3 1440x900 Inte… 8GB 128GB… Inte…
3 2 HP Notebook 15.6 Full HD 1920x10… Inte… 8GB 256GB… Inte…
4 3 Apple Ultrabo… 15.4 IPS Panel Retin… Inte… 16GB 512GB… AMD …
5 4 Apple Ultrabo… 13.3 IPS Panel Retin… Inte… 8GB 256GB… Inte…
6 5 Acer Notebook 15.6 1366x768 AMD … 4GB 500GB… AMD …
# ℹ 3 more variables: OpSys <chr>, Weight <chr>, Price <dbl>
Data Cleaning
# Clean and standardize the raw laptop dataset
clean_data <- raw_data |>
# Remove unnecessary index column if present
select(-`Unnamed: 0`) |>
# Standardize column names to lowercase
rename_with(tolower) |>
# Trim whitespace and convert empty strings to NA in all character columns
mutate(
across(where(is.character), ~ na_if(str_trim(.x), "")),
# Convert obvious placeholder values to NA
across(c(memory, ram, weight, inches, opsys), ~ na_if(.x, "?"))
) |>
# Clean and transform variables
mutate(
# Convert embedded numeric text to numeric values
ram = parse_number(ram),
weight = parse_number(weight),
inches = parse_number(inches),
# Extract storage size in GB
storage_gb = parse_number(memory),
# Extract main storage type
storage_type = case_when(
str_detect(memory, regex("ssd", ignore_case = TRUE)) ~ "SSD",
str_detect(memory, regex("flash", ignore_case = TRUE)) ~ "Flash Storage",
str_detect(memory, regex("hdd", ignore_case = TRUE)) ~ "HDD",
str_detect(memory, regex("hybrid", ignore_case = TRUE)) ~ "Hybrid",
TRUE ~ NA_character_
),
# Standardize operating system categories
opsys = case_when(
str_detect(opsys, regex("windows", ignore_case = TRUE)) ~ "Windows",
str_detect(opsys, regex("mac", ignore_case = TRUE)) ~ "macOS",
str_detect(opsys, regex("linux", ignore_case = TRUE)) ~ "Linux",
str_detect(opsys, regex("chrome", ignore_case = TRUE)) ~ "Chrome OS",
str_detect(opsys, regex("no os", ignore_case = TRUE)) ~ "No OS",
TRUE ~ "Other"
),
# Convert price to USD using an approximate exchange rate
price_usd = round(price / 84)
) |>
# Remove original price column after conversion
select(-price)
# Reshape selected numeric specifications from wide to long format
tidy_data <- clean_data |>
pivot_longer(
cols = c(ram, weight, inches, storage_gb),
names_to = "specification",
values_to = "value"
) |>
# Remove rows only when the pivoted value is missing
drop_na(value)
# Inspect cleaned wide data
glimpse(clean_data)Rows: 1,303
Columns: 13
$ company <chr> "Apple", "Apple", "HP", "Apple", "Apple", "Acer", "Ap…
$ typename <chr> "Ultrabook", "Ultrabook", "Notebook", "Ultrabook", "U…
$ inches <dbl> 13.3, 13.3, 15.6, 15.4, 13.3, 15.6, 15.4, 13.3, 14.0,…
$ screenresolution <chr> "IPS Panel Retina Display 2560x1600", "1440x900", "Fu…
$ cpu <chr> "Intel Core i5 2.3GHz", "Intel Core i5 1.8GHz", "Inte…
$ ram <dbl> 8, 8, 8, 16, 8, 4, 16, 8, 16, 8, 4, 4, 16, 4, 8, 8, 8…
$ memory <chr> "128GB SSD", "128GB Flash Storage", "256GB SSD", "512…
$ gpu <chr> "Intel Iris Plus Graphics 640", "Intel HD Graphics 60…
$ opsys <chr> "macOS", "macOS", "No OS", "macOS", "macOS", "Windows…
$ weight <dbl> 1.37, 1.34, 1.86, 1.83, 1.37, 2.10, 2.04, 1.34, 1.30,…
$ storage_gb <dbl> 128, 128, 256, 512, 256, 500, 256, 256, 512, 256, 500…
$ storage_type <chr> "SSD", "Flash Storage", "SSD", "SSD", "SSD", "HDD", "…
$ price_usd <dbl> 850, 570, 365, 1609, 1144, 254, 1357, 735, 948, 488, …
# Inspect final tidy data
glimpse(tidy_data)Rows: 5,089
Columns: 11
$ company <chr> "Apple", "Apple", "Apple", "Apple", "Apple", "Apple",…
$ typename <chr> "Ultrabook", "Ultrabook", "Ultrabook", "Ultrabook", "…
$ screenresolution <chr> "IPS Panel Retina Display 2560x1600", "IPS Panel Reti…
$ cpu <chr> "Intel Core i5 2.3GHz", "Intel Core i5 2.3GHz", "Inte…
$ memory <chr> "128GB SSD", "128GB SSD", "128GB SSD", "128GB SSD", "…
$ gpu <chr> "Intel Iris Plus Graphics 640", "Intel Iris Plus Grap…
$ opsys <chr> "macOS", "macOS", "macOS", "macOS", "macOS", "macOS",…
$ storage_type <chr> "SSD", "SSD", "SSD", "SSD", "Flash Storage", "Flash S…
$ price_usd <dbl> 850, 850, 850, 850, 570, 570, 570, 570, 365, 365, 365…
$ specification <chr> "ram", "weight", "inches", "storage_gb", "ram", "weig…
$ value <dbl> 8.00, 1.37, 13.30, 128.00, 8.00, 1.34, 13.30, 128.00,…
# View first few rows of tidy output
head(tidy_data)# A tibble: 6 × 11
company typename screenresolution cpu memory gpu opsys storage_type
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Apple Ultrabook IPS Panel Retina Disp… Inte… 128GB… Inte… macOS SSD
2 Apple Ultrabook IPS Panel Retina Disp… Inte… 128GB… Inte… macOS SSD
3 Apple Ultrabook IPS Panel Retina Disp… Inte… 128GB… Inte… macOS SSD
4 Apple Ultrabook IPS Panel Retina Disp… Inte… 128GB… Inte… macOS SSD
5 Apple Ultrabook 1440x900 Inte… 128GB… Inte… macOS Flash Stora…
6 Apple Ultrabook 1440x900 Inte… 128GB… Inte… macOS Flash Stora…
# ℹ 3 more variables: price_usd <dbl>, specification <chr>, value <dbl>
Data Analysis: Top laptop brand affordability analysis
# Top laptop brand affordability analysis
# Calculate average price for each laptop brand
brand_price <- clean_data |>
filter(!is.na(company), !is.na(price_usd)) |> # remove rows with missing brand or price
group_by(company) |>
summarise(
avg_price = round(mean(price_usd), 2), # average price per brand
n_models = n(), # number of laptop models for each brand
.groups = "drop"
) |>
mutate(
# Create affordability tiers based on average price
affordability = case_when(
avg_price < 400 ~ "Affordable (0–400 USD)",
avg_price < 800 ~ "Mid-range (400–800 USD)",
avg_price < 1200 ~ "Premium (800–1200 USD)",
TRUE ~ "High-end (1200+ USD)"
),
# Order the tiers logically for plotting
affordability = factor(
affordability,
levels = c(
"Affordable (0–400 USD)",
"Mid-range (400–800 USD)",
"Premium (800–1200 USD)",
"High-end (1200+ USD)"
)
)
)
# Keep only brands with at least 5 models
brand_price_filtered <- brand_price |>
filter(n_models >= 5)
# Inspect the result
glimpse(brand_price_filtered)Rows: 12
Columns: 4
$ company <chr> "Acer", "Apple", "Asus", "Dell", "HP", "Lenovo", "MSI", …
$ avg_price <dbl> 397.55, 992.19, 702.01, 752.82, 681.12, 691.36, 1093.08,…
$ n_models <int> 103, 21, 156, 287, 266, 290, 53, 7, 6, 7, 9, 47
$ affordability <fct> Affordable (0–400 USD), Premium (800–1200 USD), Mid-rang…
ggplot(brand_price_filtered,
aes(x = reorder(company, avg_price),
y = avg_price,
fill = affordability)) +
geom_col() +
coord_flip() +
labs(
title = "Laptop Brand Affordability by Average Price",
subtitle = "Average laptop price by brand (USD)",
x = "Laptop Brand",
y = "Average Price (USD)",
fill = "Affordability Tier"
) +
theme_minimal()Conclusion
This analysis examined laptop brand affordability by calculating the average price of laptops for each manufacturer after converting prices to U.S. dollars. Brands were grouped into affordability tiers based on their average price, ranging from affordable to high-end.
The results show clear differences in market positioning across laptop brands. Mediacom and Acer fall into the affordable category with average prices below $400. Brands such as HP, Lenovo, Asus, Dell, and Toshiba appear in the mid-range segment, with average prices between $400 and $800. Apple, Microsoft, MSI, and Samsung occupy the premium category, with average prices between $800 and $1200. Razer stands out as the only brand classified as high-end, with an average price exceeding $2000.
These findings suggest that laptop manufacturers tend to occupy distinct price segments within the market. Some brands focus on budget-friendly devices, while others target premium or high-performance categories. Understanding these price tiers helps illustrate how different brands position their products within the broader laptop market.