Resale Value by Shoe Size - Does Size Matter?
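I loaded the dataset “stockx_data.csv” into a data frame named “stockx”.
# tidyverse attaches readr (read_csv), dplyr (%>%, mutate) and ggplot2
library(tidyverse)
stockx <- read_csv("stockx_data.csv")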
## Parsed with column specification:
## cols(
## order_date = col_character(),
## brand = col_character(),
## sneaker_name = col_character(),
## sale_price = col_double(),
## retail_price = col_double(),
## release_date = col_character(),
## shoe_size = col_double(),
## buyer_region = col_character()
## )
First, I used ggplot to make a histogram comparing retail price by brand. It showed that all Yeezy shoes retail for $220, while Off-White shoes range from $130 to $250. At first, I considered shrinking the dataset (it has almost 100,000 rows) by only looking at sneakers above, below, or within a certain price range. This graph showed me, however, that doing so would likely leave one of the brands unrepresented.
plot1 <- stockx %>%
  ggplot(aes(x = retail_price, fill = brand)) +
  geom_histogram(position = "identity", alpha = 0.4, binwidth = 5, color = "black") +
  scale_fill_discrete()
plot1
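The histogram answers the retail-price question visually; a grouped summary gives the exact numbers behind it. This is a minimal sketch that uses a few made-up rows in place of `stockx` (the column names match the dataset, the values do not come from it); with the real data, pipe `stockx` in instead of `toy`.

```r
library(dplyr)

# Hypothetical stand-in rows; swap in `stockx` to run this on the real data
toy <- tibble(
  brand = c("Yeezy", "Yeezy", "Off-White", "Off-White", "Off-White"),
  retail_price = c(220, 220, 130, 190, 250)
)

# One row per brand: number of sales, number of distinct retail prices, and their range
retail_by_brand <- toy %>%
  group_by(brand) %>%
  summarise(
    n_sales = n(),
    n_prices = n_distinct(retail_price),
    min_retail = min(retail_price),
    max_retail = max(retail_price)
  )
retail_by_brand
```

A brand whose `n_prices` is 1 has a single retail price, which is the pattern the histogram shows for Yeezy.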

Next, I used ggplot to make a histogram of shoe size by brand. It shows that sizes of roughly 9 to 12 were the most commonly bought during the period the data covers. There were more Yeezy sales in the dataset than Off-White sales, but both brands peak at similar sizes.
plot2 <- stockx %>%
  ggplot(aes(x = shoe_size, fill = brand)) +
  geom_histogram(position = "dodge", alpha = 0.4, binwidth = 0.5, color = "black") +
  scale_fill_discrete()
plot2
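To put numbers on "most commonly bought sizes", one option is to count sales per brand and size and keep the top few per brand. A minimal sketch on made-up rows standing in for `stockx`; the cut-off of three sizes is an arbitrary choice for illustration.

```r
library(dplyr)

# Hypothetical stand-in rows; swap in `stockx` to run this on the real data
toy <- tibble(
  brand = c("Yeezy", "Yeezy", "Yeezy", "Off-White", "Off-White"),
  shoe_size = c(10, 10, 11, 10, 9.5)
)

size_counts <- toy %>%
  count(brand, shoe_size, name = "sales") %>%  # sales per (brand, size)
  group_by(brand) %>%
  mutate(share = sales / sum(sales)) %>%       # share of that brand's sales
  slice_max(sales, n = 3) %>%                  # up to three most popular sizes per brand (ties kept)
  ungroup()
size_counts
```

Comparing the `share` column across brands makes the "common trend" comparison independent of Yeezy having more sales overall.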

Next, I made a histogram of sale price by brand. The scale makes it hard to see the data beyond about $1,500, but the plot shows that most shoes resold for under $1,000. Although it is hard to see, most of the shoes sold in the $2,000 to $4,000 range are Off-White.
plot3 <- stockx %>%
  ggplot(aes(x = sale_price, fill = brand)) +
  geom_histogram(position = "dodge", alpha = 0.4, binwidth = 500, color = "black") +
  scale_fill_discrete()
plot3
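Two ways to check the long right tail without squinting at the histogram: compute the shares directly, or put the x-axis on a log scale so the expensive sales get more room. A minimal sketch on made-up rows standing in for `stockx`; the $1,000 and $2,000 thresholds are the ones mentioned above.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in rows; swap in `stockx` to run this on the real data
toy <- tibble(
  brand = c("Yeezy", "Yeezy", "Off-White", "Off-White", "Off-White"),
  sale_price = c(300, 450, 800, 2500, 3100)
)

# Per brand: share of sales under $1,000 and number of sales at $2,000 or more
price_shares <- toy %>%
  group_by(brand) %>%
  summarise(
    share_under_1000 = mean(sale_price < 1000),
    n_2000_plus = sum(sale_price >= 2000)
  )

# Same histogram idea, but a log10 x-axis spreads out the high-price tail
log_hist <- ggplot(toy, aes(x = sale_price, fill = brand)) +
  geom_histogram(position = "dodge", bins = 30, color = "black") +
  scale_x_log10() +
  labs(x = "Sale price ($, log scale)")
```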

I then used mutate() to create a new variable, “profit,” by subtracting retail price from sale price, and made a histogram of profit by brand. In my opinion, this plot wasn't very useful: profit values span such a wide range that a histogram with profit on the x-axis and count on the y-axis doesn't reveal much.
plot4 <- stockx %>%
  mutate(profit = sale_price - retail_price) %>%
  ggplot(aes(x = profit, fill = brand)) +
  geom_histogram(position = "dodge", alpha = 0.4, binwidth = 500, color = "black") +
  scale_fill_discrete()
plot4
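When a distribution is this spread out, a few quantiles per brand can say more than a histogram. A minimal sketch on made-up rows standing in for `stockx`; the choice of median, 90th percentile and maximum is just one reasonable set of summaries.

```r
library(dplyr)

# Hypothetical stand-in rows; swap in `stockx` to run this on the real data
toy <- tibble(
  brand = c("Yeezy", "Yeezy", "Yeezy", "Off-White", "Off-White"),
  sale_price = c(250, 300, 400, 600, 2200),
  retail_price = c(220, 220, 220, 190, 250)
)

profit_summary <- toy %>%
  mutate(profit = sale_price - retail_price) %>%
  group_by(brand) %>%
  summarise(
    median_profit = median(profit),
    p90_profit = unname(quantile(profit, 0.9)),  # R's default (type 7) quantile
    max_profit = max(profit)
  )
profit_summary
```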

So, I made a scatterplot of profit against shoe size, colored by brand. I chose these variables because there was a clear preference for certain shoe sizes, and I was curious whether those popular sizes also fetched the highest “profit”. I again used mutate() to create the “profit” variable. Because the points were large, the scatterplot was a bit jumbled, but this was the style of graph I had settled on and I wanted to test it out roughly first.
plot5 <- stockx %>%
  mutate(profit = sale_price - retail_price) %>%
  ggplot(aes(x = shoe_size, y = profit, color = brand)) +
  geom_point()
plot5

Just in case, I also tried a bar plot with the same variables, and decided against it once I saw the results.
plot6 <- stockx %>%
  mutate(profit = sale_price - retail_price) %>%
  ggplot(aes(x = shoe_size, y = profit, fill = brand)) +
  geom_col(position = "dodge")
plot6
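Part of why the raw bar plot is hard to read is that geom_col() draws one bar per sale, so bars for the same size and brand end up drawn over one another. A bar plot generally works better once the data are collapsed to one value per bar. A minimal sketch, assuming the median is an acceptable "typical profit" and using made-up rows in place of `stockx`:

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in rows; swap in `stockx` to run this on the real data
toy <- tibble(
  brand = c("Yeezy", "Yeezy", "Yeezy", "Off-White", "Off-White", "Off-White"),
  shoe_size = c(9, 9, 10, 9, 10, 10),
  sale_price = c(300, 340, 280, 700, 900, 500),
  retail_price = c(220, 220, 220, 190, 190, 190)
)

# Collapse to one median profit per (size, brand) before plotting
profit_by_size <- toy %>%
  mutate(profit = sale_price - retail_price) %>%
  group_by(shoe_size, brand) %>%
  summarise(median_profit = median(profit), .groups = "drop")

bar_plot <- ggplot(profit_by_size,
                   aes(x = factor(shoe_size), y = median_profit, fill = brand)) +
  geom_col(position = "dodge") +
  labs(x = "Shoe Size", y = "Median profit ($)", fill = "Brand")
```

Treating size as a factor gives one evenly spaced slot per size that actually appears in the data.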

Going back to the scatterplot, I made the points smaller (geom_point(size = .1)) to see whether that made it more legible. It definitely did.
plot7 <- stockx %>%
  mutate(profit = sale_price - retail_price) %>%
  ggplot(aes(x = shoe_size, y = profit, color = brand)) +
  geom_point(size = .1)
plot7
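Shoe sizes only take half-size values, so even tiny points stack into vertical columns. A common complement to shrinking the points is a little horizontal jitter plus transparency. A minimal sketch on made-up rows standing in for `stockx`; the jitter width of 0.15 and alpha of 0.3 are arbitrary starting values, not tuned for this dataset.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in rows; swap in `stockx` to run this on the real data
toy <- tibble(
  brand = rep(c("Yeezy", "Off-White"), each = 4),
  shoe_size = c(9, 9, 10, 10, 9, 9, 10, 10),
  sale_price = c(300, 320, 280, 290, 700, 650, 900, 880),
  retail_price = rep(c(220, 190), each = 4)
)

jitter_plot <- toy %>%
  mutate(profit = sale_price - retail_price) %>%
  ggplot(aes(x = shoe_size, y = profit, color = brand)) +
  # Shift points randomly by up to 0.15 sizes left or right; height = 0 keeps profit exact
  geom_jitter(width = 0.15, height = 0, size = 0.1, alpha = 0.3)
```

Because `height = 0`, the vertical position of every point is still its true profit; only the x position is nudged.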

To prepare for making the visualization interactive, I loaded the plotly package.
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
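For my statistical analysis, I decided a linear regression wouldn't be the best fit here, since shoe size takes discrete half-size values rather than being a continuous variable. Instead, I used the summary() function, which gives a five-number summary (plus the mean) for each numeric column. Sale prices sit well above retail prices overall (a median sale price of $370 against a median retail price of $220). Note, though, that the minimum sale price ($186) is below the $220 Yeezy retail price, so summary() alone does not show whether every pair in the dataset resold for more than its retail price.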
summary(stockx)
## order_date brand sneaker_name sale_price
## Length:99956 Length:99956 Length:99956 Min. : 186.0
## Class :character Class :character Class :character 1st Qu.: 275.0
## Mode :character Mode :character Mode :character Median : 370.0
## Mean : 446.6
## 3rd Qu.: 540.0
## Max. :4050.0
## retail_price release_date shoe_size buyer_region
## Min. :130.0 Length:99956 Min. : 3.500 Length:99956
## 1st Qu.:220.0 Class :character 1st Qu.: 8.000 Class :character
## Median :220.0 Mode :character Median : 9.500 Mode :character
## Mean :208.6 Mean : 9.344
## 3rd Qu.:220.0 3rd Qu.:11.000
## Max. :250.0 Max. :17.000
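summary() reports each column on its own, so comparing sale and retail prices sale by sale needs the per-row difference. A minimal sketch on made-up rows standing in for `stockx` (one row is deliberately below retail to show what the check reports):

```r
library(dplyr)

# Hypothetical stand-in rows; swap in `stockx` to run this on the real data
toy <- tibble(
  brand = c("Yeezy", "Yeezy", "Off-White", "Off-White"),
  sale_price = c(200, 300, 400, 1200),
  retail_price = c(220, 220, 130, 250)
)

above_retail <- toy %>%
  mutate(profit = sale_price - retail_price) %>%
  summarise(
    min_profit = min(profit),               # negative means some pair sold below retail
    share_above_retail = mean(profit > 0),  # fraction of sales above retail
    n_at_or_below_retail = sum(profit <= 0)
  )
above_retail
```

If `n_at_or_below_retail` is 0 on the real data, every pair did resell above retail.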
To prepare for my final plot, I loaded the ggthemes package.
library(ggthemes)
I wanted to experiment with themes, colors, text and so on, and decide what I wanted before creating my final plot. So, again, I used mutate() to create the “profit” variable, then put shoe size on the x-axis, profit on the y-axis, and brand in the legend. I kept the points small with size = .1, changed the panel background to grey, and colored Off-White black and Yeezy green. I added axis labels, a legend title and a plot title. I changed the size and font of the legend text, axis titles and axis tick labels, and made the legend text and axis tick labels italic. I set the x-axis breaks to run from 3.5 to 17, the range of shoe sizes in the data, with a label every 1.5 sizes. Lastly, I made the legend background grey.
plot8 <- stockx %>%
  mutate(profit = sale_price - retail_price) %>%
  ggplot(aes(x = shoe_size, y = profit, color = brand)) +
  geom_point(size = .1) +
  scale_color_manual(values = c("black", "green3")) +
  scale_x_continuous(breaks = seq(3.5, 17, 1.5)) +
  labs(
    x = "Shoe Size",
    y = "Profit ($)",
    color = "Brand",
    title = "Resale Value by Shoe Size - Does Size Matter?"
  ) +
  theme(
    panel.background = element_rect(fill = "grey", colour = "white"),
    legend.text = element_text(size = 10, face = "italic", family = "sans"),
    legend.background = element_rect(fill = "grey"),
    axis.text = element_text(size = 8, family = "sans", face = "italic"),
    axis.title = element_text(size = 12, family = "sans")
  )
plot8

After experimenting with customizations, I was happy with the result and used plotly to make it interactive. Here is my final plot!
# plot9 is identical to plot8, so convert the existing plot instead of rebuilding it
plot9 <- ggplotly(plot8)
plot9
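With close to 100,000 points, the interactive version can get sluggish and its hover text crowded. plotly's ggplotly() accepts a `tooltip` argument to limit which aesthetics are shown on hover, and toWebGL() switches the traces to WebGL rendering, which is generally recommended for large scatterplots. A minimal sketch on a made-up two-row data frame; whether it noticeably helps with the full dataset is an assumption worth testing.

```r
library(ggplot2)
library(plotly)

# Hypothetical stand-in rows with a precomputed profit column
toy <- data.frame(
  brand = c("Yeezy", "Off-White"),
  shoe_size = c(10, 9.5),
  profit = c(120, 600)
)

p <- ggplot(toy, aes(x = shoe_size, y = profit, color = brand)) +
  geom_point(size = 0.1)

# Show only size, profit and brand on hover, then render the traces with WebGL
final_widget <- ggplotly(p, tooltip = c("x", "y", "colour")) %>%
  toWebGL()
final_widget
```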