Mario Kart Wii - Game Auctions

Introduction

For my project, I’ll be studying Mario Kart Wii listings on eBay from October 2009. The dataset includes information such as the number of bids, game condition, the starting auction price, total price, seller rating, and more. As a big Mario Kart fan, I was immediately drawn to this dataset. Mario Kart is a timeless classic for Nintendo, and it only grew more successful as the game, and Nintendo as a company, transformed over time. From the SNES to the Wii to the DS to Nintendo’s newest game system, the Switch, Mario Kart has remained one of Nintendo’s most successful games (1).

Mario Kart Wii was my first Mario Kart game, and I used to play it all the time as a kid with my two little brothers. We’d race against each other for hours on end, for years after my mom bought the console as a bundle with Mario Kart. My cousin had Mario Kart 8 on his DS and, whenever we saw each other, the four of us would fight over who got to play. Although my brothers and I got older, our shared love for Mario Kart came back when a mobile version was released in 2019. Fast forward to today, and I have my own Nintendo Switch where I get to relive my childhood memories on Mario Kart 8 Deluxe, which is a huge upgrade from the old-school Wii game I was used to. Anyway, I love Mario Kart. Let’s get into the data.

Mario Kart Wii was initially released on April 10, 2008 for $49.99 (2).

library(tidyverse)
library(RColorBrewer)
library(highcharter)

mariokart <- read_csv("mariokart.csv") #load in data

Rows: 143 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): cond, ship_sp, stock_photo, title
dbl (8): id, duration, n_bids, start_pr, ship_pr, total_pr, seller_rate, wheels

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(mariokart, 10) #preview first ten lines

# A tibble: 10 × 12
          id duration n_bids cond  start_pr ship_pr total_pr ship_sp seller_rate
       <dbl>    <dbl>  <dbl> <chr>    <dbl>   <dbl>    <dbl> <chr>         <dbl>
 1   1.50e11        3     20 new       0.99    4        51.6 standa…        1580
 2   2.60e11        7     13 used      0.99    3.99     37.0 firstC…         365
 3   3.20e11        3     16 new       0.99    3.5      45.5 firstC…         998
 4   2.80e11        3     18 new       0.99    0        44   standa…           7
 5   1.70e11        1     20 new       0.01    0        71   media           820
 6   3.60e11        3     19 new       0.99    4        45   standa…      270144
 7   1.20e11        1     13 used      0.01    0        37.0 standa…        7284
 8   3.00e11        1     15 new       1       2.99     54.0 upsGro…        4858
 9   2.00e11        3     29 used      0.99    4        47   priori…          27
10   3.30e11        7      8 used     20.0     4        50   firstC…         201
# ℹ 3 more variables: stock_photo <chr>, wheels <dbl>, title <chr>

Cleaning Up the Data

I plan to compare the total listing price variable; however, a few listings may be outliers. To clean up the data, I’ll exclude the listings that are more expensive because they include more than one game. Wii games at the time retailed for about $50 each, so I’ll keep only rows with a total price of $100 or less, since anything above that is roughly the price of two or more Wii games.

mariokart1 <- mariokart |>
  relocate(title, total_pr) |> #bring forward title and price columns
  arrange(desc(total_pr)) |> #sort price in descending order
  filter(!total_pr > 100) |> #keep listings with a total price of $100 or less
  select(!c(id, stock_photo)) #drop columns I won't use

head(mariokart1, 10)

# A tibble: 10 × 10
   title     total_pr duration n_bids cond  start_pr ship_pr ship_sp seller_rate
   <chr>        <dbl>    <dbl>  <dbl> <chr>    <dbl>   <dbl> <chr>         <dbl>
 1 NEW MARI…     75          1      3 new      70.0     0    standa…      118345
 2 BRAND NE…     71          1     20 new       0.01    0    media           820
 3 MARIO KA…     66.4        1      1 new      55.0    11.4  upsGro…      118345
 4 Mario Ka…     65.0        7      2 used     55       9.02 parcel           25
 5 MARIO KA…     65.0        1      1 new      65.0     0    standa…      118345
 6 NEW MARI…     65.0        1      1 new      65.0     0    standa…      118345
 7 MARIO KA…     65.0        1      1 new      65.0     0    standa…      118345
 8 Wii  MAR…     64.5        7     12 used      0.99    4    standa…         991
 9 Mario Ka…     64          3     14 new       0.99    4    standa…         127
10 BRAND NE…     64.0        1      9 new       1       2.99 upsGro…        4858
# ℹ 1 more variable: wheels <dbl>
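
One detail in the filter step is worth spelling out. In R, `!` binds more loosely than `>`, so `!total_pr > 100` is read as `!(total_pr > 100)`, which keeps a listing priced at exactly $100. Here is a minimal base R sketch of that behavior; the `prices` vector is made up for illustration and is not taken from the dataset.

```r
# Hypothetical prices, chosen only to illustrate the comparison
prices <- c(44, 100, 150)

keep_negated <- !prices > 100   # parsed as !(prices > 100)
keep_direct  <- prices <= 100   # the same condition, written directly

# Both forms select the same elements
identical(keep_negated, keep_direct)
prices[keep_direct]
```

Writing the condition as `total_pr <= 100` would behave the same way and may be easier to read at a glance.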

Exploratory Analysis

From all the data provided for each listing, I chose to explore the relationship between the total price of a listing and the number of bids it received. I hypothesize that listings with a lower total price will have more bids.

This density graph suggests that bid count is roughly bell-shaped, close to a normal distribution, making it a reasonable variable to assess.

#view distribution of num of bids
bidsgraph <- mariokart1 |>
  ggplot(aes(x = n_bids)) +
  geom_density() +
  labs(title = "Number of Bids Distribution")

bidsgraph
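
A density curve is only an eyeball check of shape. As a rough complement, a Shapiro-Wilk test and a Q-Q plot can be run in base R. The sketch below uses only the ten bid counts visible in the preview table earlier, so it is an illustration rather than a result for the full dataset; with the real data, `bids` would be replaced by `mariokart1$n_bids`. Bid counts are whole numbers with ties, so the test should be read loosely.

```r
# Bid counts from the ten previewed rows (not the full dataset)
bids <- c(20, 13, 16, 18, 20, 19, 13, 15, 29, 8)

# Shapiro-Wilk test: a small p-value is evidence against normality
sw <- shapiro.test(bids)
sw$p.value

# Q-Q plot: points close to the line are consistent with a normal shape
qqnorm(bids)
qqline(bids)
```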

Linear Model

Equation: y = -0.04979x + 15.74484, where y = predicted number of bids and x = total price in USD

#linear model
lm <- lm(mariokart1$n_bids ~ mariokart1$total_pr)
lm


Call:
lm(formula = mariokart1$n_bids ~ mariokart1$total_pr)

Coefficients:
        (Intercept)  mariokart1$total_pr  
           15.74484             -0.04979
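
To make the equation concrete, here is a small sketch that plugs prices into it. The coefficients are copied from the output above; the helper name `predict_bids` is my own and is not part of the original analysis.

```r
# Coefficients copied from the lm() output above
intercept <- 15.74484
slope     <- -0.04979

# Predicted number of bids for a given total price (USD)
predict_bids <- function(total_pr) intercept + slope * total_pr

predict_bids(50)                     # about 13.26 bids at the $50 retail price
predict_bids(60) - predict_bids(50)  # a $10 higher price changes this by about -0.5
```

In other words, the fitted line predicts roughly half a bid fewer for every extra $10 in total price, which is a very small effect.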

#graph the variables
mariokart1 |>
  ggplot(aes(total_pr, n_bids)) +
  geom_point() +
  stat_smooth(method = "lm", col = "blue")

`geom_smooth()` using formula = 'y ~ x'

The graphed regression line is nearly horizontal, which suggests that the total price of a listing has little to no linear relationship with the number of bids.

summary(lm)


Call:
lm(formula = mariokart1$n_bids ~ mariokart1$total_pr)

Residuals:
    Min      1Q  Median      3Q     Max 
-11.511  -3.857   0.247   3.314  15.844 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)         15.74484    2.58263   6.096 1.01e-08 ***
mariokart1$total_pr -0.04979    0.05348  -0.931    0.353    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.767 on 139 degrees of freedom
Multiple R-squared:  0.006199,  Adjusted R-squared:  -0.0009509 
F-statistic: 0.867 on 1 and 139 DF,  p-value: 0.3534

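The slope for total price has a p-value of 0.3534, which is also the p-value of the overall F-test since there is only one predictor. At the standard significance level of 0.05, the slope is not statistically significant, although the intercept is. The R-squared of about 0.006 also means that total price explains less than 1% of the variation in bids. Overall, this linear regression does not support my hypothesis. Based on this data, there is no evidence that the total price of a listing has the negative linear relationship with bids that I predicted.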
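
As a sanity check on where these numbers come from, the slope's t statistic, its two-sided p-value, and R-squared can be recomputed from the estimate, standard error, and residual degrees of freedom in the summary output. This is a sketch using the rounded values printed above, so the results should match the reported figures only approximately. The R-squared shortcut holds only for a model with a single predictor.

```r
# Values copied from the summary(lm) output above
estimate  <- -0.04979
std_error <- 0.05348
df_resid  <- 139

t_value   <- estimate / std_error                  # slope t statistic
p_value   <- 2 * pt(-abs(t_value), df = df_resid)  # two-sided p-value
r_squared <- t_value^2 / (t_value^2 + df_resid)    # valid for one predictor only

round(c(t = t_value, p = p_value, r2 = r_squared), 4)
```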

Other Factors

The next factor I’d like to explore is how the game’s condition may affect some of the other variables. Game condition is a character column with two values, new and used.

unique(mariokart1$cond)

[1] "new"  "used"

To compare and visualize the relationship between game condition and bids, I created two new variables: ‘bids_range’, which categorizes listings as having fewer than 10 bids, 10 to 20 bids, or more than 20 bids, and ‘auction_price’, which is the total price minus the shipping price.

#categorizing bids by count, assigning them to a new column called bids_range
mariokart1 <- mariokart1 |>
  mutate(bids_range = case_when(
    n_bids < 10 ~ "Less than 10 Bids", #bids < 10 
    n_bids >= 10 & n_bids <= 20 ~ "10 to 20 Bids", #bids 10-20
    n_bids > 20 ~ "More than 20 Bids" #bids >20
  )) |>
  mutate(auction_price = total_pr - ship_pr) # create column for auction price
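
For comparison, the same three bins can be sketched with base R's `cut()`, which returns a factor whose levels are already in ascending order. The example below uses only the ten bid counts from the preview table, and it assumes bid counts are whole numbers so that a break at 9 is equivalent to the `n_bids < 10` rule.

```r
# Bid counts from the ten previewed rows (not the full dataset)
n_bids <- c(20, 13, 16, 18, 20, 19, 13, 15, 29, 8)

# Intervals are (-Inf, 9], (9, 20], (20, Inf], which match the case_when()
# rules above for whole-number bid counts
bids_range <- cut(
  n_bids,
  breaks = c(-Inf, 9, 20, Inf),
  labels = c("Less than 10 Bids", "10 to 20 Bids", "More than 20 Bids")
)

table(bids_range)
```

`case_when()` reads more explicitly, while `cut()` is more compact when the bins are simple numeric ranges; either is a reasonable choice here.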

Game Condition and Bids

#ordering bids_range categories from most bids to fewest
mariokart1$bids_range <- factor(mariokart1$bids_range, levels = c("More than 20 Bids", "10 to 20 Bids", "Less than 10 Bids"))

#bar graph plotted by condition, color fill determined by bidding range
mariokart1 |>
  ggplot(aes(cond, fill = bids_range)) +
  geom_bar() +
  scale_fill_brewer(palette = "Set1") +
  labs(title = "Game Condition vs. Bids")

Unfortunately, the bar plot of game condition vs. bid range didn’t reveal any clear pattern.
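
When a bar plot is inconclusive, a numeric summary can help confirm whether there is anything to see. The sketch below compares mean bids by condition and runs a Welch two-sample t-test in base R. It uses only the ten rows from the preview table, so the numbers illustrate the approach and say nothing reliable about the full dataset; the variable names are my own.

```r
# Condition and bid counts from the ten previewed rows (not the full dataset)
cond   <- c("new", "used", "new", "new", "new", "new", "used", "new", "used", "used")
n_bids <- c(20, 13, 16, 18, 20, 19, 13, 15, 29, 8)

# Mean number of bids per condition
mean_bids <- tapply(n_bids, cond, mean)
mean_bids

# Welch two-sample t-test comparing bids for new vs. used listings
bids_test <- t.test(n_bids ~ cond)
bids_test$p.value
```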

Duration vs. Auction Price

#scatterplot of duration vs. auction price
mariokart1 |>
  ggplot(aes(duration, auction_price, color = cond)) +
  geom_point(size = 2.5) +
  scale_color_brewer(palette = "Set2") +
  geom_hline(yintercept = 50, alpha = 0.5) + #add line at the retail price point
  labs(y = "Game Price in USD",
       x = "Listing Duration in Days",
       title = "Mario Kart Wii Ebay Auction Data, October 2009",
       caption = "Source: Ebay") +
  scale_x_continuous(breaks = c(1, 3, 5, 7, 9)) #fixing scale to be integers and not decimals

This scatterplot shows how auction prices differ by game condition. The line at the $50 mark represents the game’s original retail price. Relative to that reference line, most used games seem to sell for less, while new games tend to sell for more. I was also curious whether lower-priced listings had shorter durations; however, this graph didn’t reveal a clear pattern there.
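
The comparison against the $50 line can also be expressed as numbers. The sketch below computes the auction price (total price minus shipping), then the median price and the share of listings above $50 for each condition. It uses only the ten rows from the preview table, with `total_pr` rounded as displayed there, so the results are illustrative only.

```r
# Values from the ten previewed rows; total_pr is rounded as displayed there
cond     <- c("new", "used", "new", "new", "new", "new", "used", "new", "used", "used")
total_pr <- c(51.6, 37.0, 45.5, 44, 71, 45, 37.0, 54.0, 47, 50)
ship_pr  <- c(4, 3.99, 3.5, 0, 0, 4, 0, 2.99, 4, 4)

# Auction price is the total price minus the shipping price
auction_price <- total_pr - ship_pr

# Median auction price per condition
median_price <- tapply(auction_price, cond, median)
median_price

# Share of listings in each condition with an auction price above $50
share_above_retail <- tapply(auction_price > 50, cond, mean)
share_above_retail
```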

Wii Wheels and Range of Bids

#bar graph by wii wheels included
mariokart1 |>
  filter(wheels < 3) |>
  ggplot(aes(wheels, fill = bids_range)) +
  geom_bar() +
  labs(x = "Number of Wii Wheels included",
       fill = " ",
       title = "Mario Kart Wii - Ebay Listings (October 2009)",
       caption = "Source: Ebay") +
  scale_fill_brewer(palette = "BuPu") +
  theme_light()

Most auctions included two or fewer Wii Wheels, so I drew a bar graph that includes only those auctions. With the bid ranges I created, you can see that most auctions received 10 to 20 bids regardless of the number of wheels included.

Final Visualization

#interactive line plot
mariokart1 |>
  hchart(type = "line", hcaes(x = total_pr, y = n_bids, group = ship_sp))|>
  hc_xAxis(title = list(text = "Total Listing Price in USD")) |>
  hc_yAxis(title = list(text = "Bids")) |>
  hc_title(text = "Mario Kart Wii Ebay Auctions (October 2009)") |>
  hc_caption(text = "Source: Ebay")

The final visualization is an interactive line plot of total listing price against number of bids, grouped by shipping method. Of the shipping methods included, listings with media, standard, and UPS Ground shipping tend to sell for higher prices. Overall, regardless of shipping method, bids seem to be highest for listings whose total price is close to the game’s original retail price. Because shipping prices are small and fall within a narrow range, this line graph makes these relationships easier to explore.

Sources

data: https://www.openintro.org/data/index.php?data=mariokart