Branded Bottoms

Author

Charanpreet Singh

Behind the Price Tag: A Data Study of Designer Fashion Resold Online

Introduction

This project, Branded Bottoms, explores a dataset of detailed listings for luxury and streetwear fashion items from high-end brands such as Acne Studios, Carhartt, Kapital, Levi’s, Rick Owens, and Yves Saint Laurent. The dataset appears to be sourced from the fashion resale marketplace Grailed.

The dataset includes 5,440 secondhand listings with information on each product’s price, condition, seller credibility, location, and more.

The variables central to this analysis are price, condition, user/seller_score/rating_average (average seller rating), location, and category.

This analysis investigates how resale prices vary based on item condition and seller location, providing insight into what factors may influence pricing strategies in secondhand fashion markets.

Loading Libraries & Dataset

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
Branded <- read_csv("updated_dataset.csv")
Rows: 5440 Columns: 24
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (17): designer_names, condition, location, category, type, category_pat...
dbl   (5): price, user/seller_score/rating_average, follower_count, created_...
lgl   (1): makeoffer
dttm  (1): created_at

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(Branded)

Cleaning the data

# Drop rows where seller rating or location is NA
df_cleaned <- Branded |>
  drop_na(`user/seller_score/rating_average`, location)

# Also remove any rows where price is NA or zero
df_cleaned <- df_cleaned |>
  filter(!is.na(price), price > 0)

# Select only relevant columns for analysis - The variables that I will utilize
df_cleaned <- df_cleaned |>
  select(
    designer_names,
    condition,
    price,
    `user/seller_score/rating_average`,
    follower_count,
    location,
    category,
    created_year,
    created_month
  )

# Use group_by and summarise for plotting
grouped_data <- df_cleaned |>
  group_by(condition, location) |>
  summarise(avg_price = mean(price), .groups = "drop")

Cite: ChatGPT - help in cleaning data

Plotting the data on a Bar Plot

# Custom color palette so that each location gets a distinct fill color
palette_colors <- c("steelblue", "orange", "red", "purple", "firebrick", "gold", "slateblue")

# Grouped bar plot of average price by condition, with one bar per location
ggplot(grouped_data, aes(x = condition, y = avg_price, fill = location)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = palette_colors) +
  labs(
    title = "Average Resale Price by Item Condition and Seller Location",
    x = "Item Condition",
    y = "Average Price (USD)",
    fill = "Location",
    caption = "Source: Grailed.com"
  ) +
  theme_minimal()

Cite: DataNovia.com - palette colors & plotting help

Scatterplot of Price vs. Seller Rating

# Plotting using 3 variables - Price, Condition, & Avg Seller rating
ggplot(df_cleaned, aes(x = `user/seller_score/rating_average`, y = price, color = condition)) +
  geom_point(alpha = 0.5) +
  scale_color_manual(values = c("darkgreen", "darkorange", "dodgerblue", "purple")) +
  labs(
    title = "Price vs. Seller Rating by Item Condition",
    x = "Seller Rating (Average)",
    y = "Price (USD)",
    color = "Condition",
    caption = "Source: Grailed.com"
  ) +
  theme_minimal()
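
One way to put a number on the pattern in this scatterplot is a rank correlation between seller rating and price. The sketch below is only an illustration: `toy_sellers` and its values are invented stand-ins, not rows from the Grailed dataset, and Spearman's rho is a suggested measure rather than something computed in the original analysis.

```r
# Toy stand-in for df_cleaned; these numbers are invented for illustration only
toy_sellers <- data.frame(
  rating = c(4.2, 4.5, 4.7, 4.8, 4.9, 5.0),
  price  = c(80, 120, 100, 300, 250, 900)
)

# Spearman's rho is computed on ranks, so it does not assume a straight-line
# relationship and is less affected by a few very expensive listings
rho_spearman <- cor(toy_sellers$rating, toy_sellers$price, method = "spearman")

# Pearson's r, shown for comparison, measures linear association only
r_pearson <- cor(toy_sellers$rating, toy_sellers$price, method = "pearson")

print(c(spearman = rho_spearman, pearson = r_pearson))

# On the real data the equivalent call would be (assuming df_cleaned from above):
# cor(df_cleaned[["user/seller_score/rating_average"]], df_cleaned$price,
#     method = "spearman")
```

A Spearman value well above the Pearson value would hint at a relationship that is monotonic but not linear, which is worth checking before reading too much into the point cloud.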

Reflection & Insights

The first visualization, a bar chart of average resale price by item condition and seller location, reveals that new items consistently command higher prices than gently used or used items, regardless of location. Sellers based in the United States tend to list items at slightly higher prices than sellers in Asia or Europe, particularly for new and gently used items. In Canada especially, new items tend to be priced higher than in many other locations. This suggests that geography plays a meaningful role in shaping resale prices on fashion platforms. In addition, recent economic policy changes, such as newly imposed tariffs, could shift these patterns substantially in future data.

An unexpected insight was the relatively narrow price gap between gently used and used items in some regions, indicating that buyers may not strongly distinguish between the two. It also raises the possibility that other factors like brand reputation, rarity, or trending aesthetics are influencing pricing more than condition alone.
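
Because the bar chart relies on group means, a few very expensive listings can distort the gaps between conditions. A minimal sketch of a robustness check is to report the median and the number of listings next to each mean. The `toy_listings` data frame and its condition and location labels below are invented for illustration and are not taken from the dataset.

```r
library(dplyr)

# Toy stand-in for df_cleaned; values and labels are invented for illustration
toy_listings <- data.frame(
  condition = c("new", "new", "new", "gently used", "gently used",
                "used", "used", "used"),
  location  = c("United States", "United States", "United States",
                "Europe", "Europe", "Europe", "United States", "United States"),
  price     = c(300, 400, 2000, 150, 170, 140, 90, 110)
)

# Mean, median, and listing count for each condition-by-location group
price_summary <- toy_listings |>
  group_by(condition, location) |>
  summarise(
    n_listings   = n(),
    mean_price   = mean(price),
    median_price = median(price),
    .groups = "drop"
  )

print(price_summary)
```

In this toy example the single 2000 listing pulls the mean for new items in the United States far above the median, and groups with only one or two listings are easy to spot in `n_listings`. Applying the same summary to `df_cleaned` would show whether the narrow gap between gently used and used items holds up under the median.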

The second visualization, a scatterplot comparing seller rating and item price and colored by item condition, offers another layer of insight. While there is some clustering of higher prices among highly rated sellers, the relationship is not perfectly linear. Interestingly, some new items from sellers with mid-level ratings still command high prices, implying that product condition and uniqueness might matter more than seller trust alone. The dense cluster of lower prices, even among top-rated sellers, also suggests strong competition or a flooded market for certain categories.

A limitation of this project was the inability to incorporate factors such as shipping cost, item traits like color, hashtags, country of origin, and exact item category depth, as these were stored in nested formats that were difficult to parse in the time allowed. In future versions, using a more analysis-friendly dataset or preprocessing the nested fields would provide more detailed insights. Additionally, cleaning and structuring the dataset required significant trial and error due to inconsistent formatting and deeply nested JSON-like entries.
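
As a rough idea of what preprocessing the nested fields could look like, the sketch below flattens a JSON string column into ordinary columns. Everything in it is hypothetical: the column name `traits_json`, the field names, and the values are invented, and it assumes the entries are valid JSON, which may not hold for the JSON-like strings in the actual file.

```r
library(dplyr)
library(purrr)
library(jsonlite)

# Hypothetical nested column; names and values are invented for illustration
toy_nested <- data.frame(
  listing_id  = c(1, 2, 3),
  traits_json = c(
    '{"color": "black", "country_of_origin": "Japan"}',
    '{"color": "indigo"}',
    NA
  ),
  stringsAsFactors = FALSE
)

# Parse one JSON string and return a single field as text,
# or NA when the string or the field is missing
extract_field <- function(json_string, field) {
  if (is.na(json_string)) return(NA_character_)
  value <- fromJSON(json_string)[[field]]
  if (is.null(value)) NA_character_ else as.character(value)
}

# Add one flat column per field of interest
toy_flat <- toy_nested |>
  mutate(
    color = map_chr(traits_json, \(x) extract_field(x, "color")),
    country_of_origin = map_chr(traits_json, \(x) extract_field(x, "country_of_origin"))
  )

print(toy_flat)
```

Returning `NA` for missing strings and fields keeps every listing in the table, so the flattened columns can be dropped into the same `group_by()` and `summarise()` steps used earlier. Entries that are not valid JSON would need cleaning before `fromJSON()` can read them.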

Despite the challenges, these visualizations show how condition, geography, and seller rating interact to shape pricing on fashion resale platforms, and they suggest several areas for further research and refinement.