Sales Analysis of N64 Games

This is my very first project in R. I purposely chose a small data set so I could get comfortable with the code, but I believe that if I can do it with a small data set, I can do it with a big one as well!

Here we use data cleaning and visualization tools to highlight how sales of games on the Nintendo 64 console differ from year to year between 1996 and 2000.

Some lines of code may look confusing since I’m still learning; my apologies.

Packages

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   1.0.0 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(ggplot2)
library(dplyr)
df <- read.csv('best-selling-nintendo64.csv')

Checking Consistency of Data

head(df)
##                                   Game   Developer.s. Publisher.s. Release.date
## 1                       Super Mario 64   Nintendo EAD     Nintendo   1996-06-23
## 2                        Mario Kart 64   Nintendo EAD     Nintendo   1996-12-14
## 3                        GoldenEye 007           Rare     Nintendo   1997-08-25
## 4 The Legend of Zelda: Ocarina of Time   Nintendo EAD     Nintendo   1998-11-21
## 5                    Super Smash Bros. HAL Laboratory     Nintendo   1999-01-21
## 6                      Pokémon Stadium   Nintendo EAD     Nintendo   1999-04-30
##      Sales
## 1 11910000
## 2  9870000
## 3  8090000
## 4  7600000
## 5  5550000
## 6  5460000
summary(df)
##      Game           Developer.s.       Publisher.s.       Release.date      
##  Length:46          Length:46          Length:46          Length:46         
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##      Sales         
##  Min.   : 1040000  
##  1st Qu.: 1417500  
##  Median : 2100000  
##  Mean   : 2956408  
##  3rd Qu.: 3295000  
##  Max.   :11910000
summary(is.na(df))
##     Game         Developer.s.    Publisher.s.    Release.date   
##  Mode :logical   Mode :logical   Mode :logical   Mode :logical  
##  FALSE:46        FALSE:46        FALSE:46        FALSE:46       
##    Sales        
##  Mode :logical  
##  FALSE:46
table(df$Game)
## 
##         007: The World Is Not Enough                   1080° Snowboarding 
##                                    1                                    1 
##                        Banjo-Kazooie                          Banjo-Tooie 
##                                    1                                    1 
##                          Cruis'n USA                    Diddy Kong Racing 
##                                    1                                    1 
##                       Donkey Kong 64                        Excitebike 64 
##                                    1                                    1 
##                 F-1 World Grand Prix                             F-Zero X 
##                                    1                                    1 
##                        GoldenEye 007                    Hey You, Pikachu! 
##                                    1                                    1 
##                     Jet Force Gemini         Kirby 64: The Crystal Shards 
##                                    1                                    1 
##         Kobe Bryant in NBA Courtside                           Mario Golf 
##                                    1                                    1 
##                        Mario Kart 64                          Mario Party 
##                                    1                                    1 
##                        Mario Party 2                        Mario Party 3 
##                                    1                                    1 
##                         Mario Tennis                      Namco Museum 64 
##                                    1                                    1 
##                          Paper Mario                         Perfect Dark 
##                                    1                                    1 
##                        Pilotwings 64             Pocket Monsters' Stadium 
##                                    1                                    1 
##                         Pokémon Snap                      Pokémon Stadium 
##                                    1                                    1 
##                    Pokémon Stadium 2                          Star Fox 64 
##                                    1                                    1 
##           Star Wars Episode I: Racer            Star Wars: Rogue Squadron 
##                                    1                                    1 
##     Star Wars: Shadows of the Empire                       Super Mario 64 
##                                    1                                    1 
##                    Super Smash Bros.   The Legend of Zelda: Majora's Mask 
##                                    1                                    1 
## The Legend of Zelda: Ocarina of Time               Tony Hawk's Pro Skater 
##                                    1                                    1 
##               Turok 2: Seeds of Evil               Turok: Dinosaur Hunter 
##                                    1                                    1 
##                         Wave Race 64              WCW vs. nWo: World Tour 
##                                    1                                    1 
##                      WCW/nWo Revenge                         WWF No Mercy 
##                                    1                                    1 
##                WWF WrestleMania 2000                        Yoshi's Story 
##                                    1                                    1
table(df$Publisher.s.)
## 
## Acclaim Entertainment            Activision       Electronic Arts 
##                     2                     1                     1 
##             LucasArts                 Namco              Nintendo 
##                     1                     1                    33 
##                  Rare                   THQ 
##                     3                     4
table(df$Developer.s.)
## 
##           AKI Corporation and Asmik Ace Entertainment 
##                                                     4 
##                                              Ambrella 
##                                                     1 
##                             Camelot Software Planning 
##                                                     2 
##                                       Edge of Reality 
##                                                     1 
##                                               Eurocom 
##                                                     1 
##                                Factor 5 and LucasArts 
##                                                     1 
##                                        HAL Laboratory 
##                                                     2 
##                      HAL Laboratory and Pax Softonica 
##                                                     1 
##                                           Hudson Soft 
##                                                     3 
##                                  Iguana Entertainment 
##                                                     2 
##                                   Intelligent Systems 
##                                                     1 
##                                Left Field Productions 
##                                                     2 
##                                             LucasArts 
##                                                     2 
##                                      Mass Media Games 
##                                                     1 
##                                          Nintendo EAD 
##                                                    11 
## Nintendo EAD / Nintendo R&D3 / Paradigm Entertainment 
##                                                     1 
##                       Nintendo EAD and HAL Laboratory 
##                                                     1 
##                                Paradigm Entertainment 
##                                                     1 
##                                                  Rare 
##                                                     7 
##                                              Williams 
##                                                     1
table(df$Release.date)
## 
## 1996-06-23 1996-09-27 1996-12-03 1996-12-14 1997-03-04 1997-04-27 1997-08-25 
##          2          1          2          1          1          1          1 
## 1997-11-14 1997-11-30 1997-12-21 1998-02-28 1998-04-27 1998-06-29 1998-07-14 
##          1          1          1          1          1          1          1 
## 1998-07-31 1998-08-01 1998-10-21 1998-10-26 1998-11-21 1998-12-07 1998-12-12 
##          1          1          1          1          1          1          1 
## 1998-12-18 1999-01-21 1999-03-21 1999-04-30 1999-06-11 1999-10-11 1999-10-12 
##          1          1          1          2          1          1          1 
## 1999-10-31 1999-11-22 1999-12-17 2000-02-29 2000-03-24 2000-04-27 2000-04-30 
##          1          1          1          1          1          1          1 
## 2000-05-22 2000-07-21 2000-08-11 2000-10-17 2000-11-17 2000-11-20 2000-12-07 
##          1          1          1          1          1          1          1 
## 2000-12-14 
##          1
table(df$Sales)
## 
##  1040000  1080000  1094765  1100000  1120000  1140000  1160000  1190000 
##        1        1        1        1        1        1        1        2 
##  1300000  1370000  1400000  1470000  1500000  1600000  1610000  1720000 
##        1        1        1        1        1        1        1        1 
##  1770000  1830000  1880000  1910000  2000000  2030000  2170000  2320000 
##        1        1        1        1        1        1        1        1 
##  2480000  2520000  2540000  2600000  2700000  2850000  2940000  3000000 
##        1        1        1        1        1        1        1        1 
##  3100000  3360000  3630000  3650000  4000000  4880000  5270000  5460000 
##        1        1        1        1        1        1        1        1 
##  5550000  7600000  8090000  9870000 11910000 
##        1        1        1        1        1
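The table() calls above mainly confirm that nothing is missing and that no game is listed twice. A more compact version of the same checks might look like the sketch below. It rebuilds the six rows printed by head(df) as a stand-in for the full file; the name `sample_df` is made up for illustration.

```r
# Stand-in for the full data set: the six rows printed by head(df) above
sample_df <- data.frame(
  Game = c("Super Mario 64", "Mario Kart 64", "GoldenEye 007",
           "The Legend of Zelda: Ocarina of Time", "Super Smash Bros.",
           "Pokémon Stadium"),
  Release.date = c("1996-06-23", "1996-12-14", "1997-08-25",
                   "1998-11-21", "1999-01-21", "1999-04-30"),
  Sales = c(11910000, 9870000, 8090000, 7600000, 5550000, 5460000)
)

# Is there any missing value anywhere in the table?
any_missing <- any(is.na(sample_df))

# Index of the first duplicated game name, or 0 if every name is unique
first_duplicate <- anyDuplicated(sample_df$Game)

# Do all release dates parse as valid YYYY-MM-DD dates?
dates_ok <- !any(is.na(as.Date(sample_df$Release.date, format = "%Y-%m-%d")))

print(c(any_missing = any_missing, dates_ok = dates_ok))
print(first_duplicate)
```

On the full data set, the same three checks would replace most of the long table() printouts.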

Quick Cleaning

colnames(df)[2] <- "Developper"
colnames(df)[3] <- "Publisher"
colnames(df)[4] <- "Release_date"
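Renaming by position works here, but it would silently rename the wrong column if the column order ever changed. A minimal sketch of renaming by name with dplyr's rename(), assuming the column names printed by head(df); `example_df` is a made-up one-row stand-in for the real data.

```r
library(dplyr)

# One-row stand-in with the column names that read.csv() produced above
example_df <- data.frame(
  Game = "Super Mario 64",
  Developer.s. = "Nintendo EAD",
  Publisher.s. = "Nintendo",
  Release.date = "1996-06-23",
  Sales = 11910000
)

# rename(new_name = old_name): columns are matched by name, not by index.
# The new names match the ones used in the rest of this analysis.
example_df <- example_df %>%
  rename(
    Developper = Developer.s.,
    Publisher = Publisher.s.,
    Release_date = Release.date
  )

print(colnames(example_df))
```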
  • Extracting the release year from each release date
df$Release_date <- as.Date(df$Release_date)
df$Year_Release <- as.numeric(format(df$Release_date, "%Y"))
print(df$Year_Release)
##  [1] 1996 1996 1997 1998 1999 1999 1999 1997 1997 1998 1999 2000 1999 2000 1996
## [16] 1997 1998 1996 2000 2000 1999 2000 1998 1998 2000 2000 1998 1998 2000 1996
## [31] 2000 1998 1997 1999 1998 2000 1997 1998 2000 1999 1999 1996 1998 1998 2000
## [46] 1999
Total_Games <- nrow(df)  # total number of games in the data set (46)
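The year extraction works in two steps: format() turns each date into a piece of text, and the text is then converted to a number. A small sketch with three release dates taken from the head(df) output; the variable names are only for illustration.

```r
# Three release dates taken from the head(df) output above
release_dates <- as.Date(c("1996-06-23", "1997-08-25", "1999-04-30"))

# format() returns the year as text ("1996"), so it still needs converting
years_as_text <- format(release_dates, "%Y")
years <- as.integer(years_as_text)

print(class(years_as_text))  # "character"
print(years)                 # 1996 1997 1999
```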

How Many Games Have Been Released?

ggplot(df, aes(x = Year_Release)) +
  geom_bar(color = 'black', fill = 'lightblue') +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  ggtitle("Number of games released per year on N64") + 
  geom_text(aes(label = after_stat(count)), stat = "count", vjust = 1.5, colour = "black")

We can notice several things:

  • The distribution is negatively skewed: more games were released in the later years than in the early ones.
  • 1998 and 2000 were the busiest years, with 12 games each.
  • Only 6 of the games in this data set were released in 1996, the year the N64 launched.
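The numbers behind these observations can be checked without the plot. The sketch below copies the release years printed by print(df$Year_Release) and counts them; the skewness formula is the plain third-moment version, written out by hand here rather than taken from a package.

```r
# Release years exactly as printed by print(df$Year_Release) above
year_release <- c(
  1996, 1996, 1997, 1998, 1999, 1999, 1999, 1997, 1997, 1998, 1999, 2000, 1999, 2000, 1996,
  1997, 1998, 1996, 2000, 2000, 1999, 2000, 1998, 1998, 2000, 2000, 1998, 1998, 2000, 1996,
  2000, 1998, 1997, 1999, 1998, 2000, 1997, 1998, 2000, 1999, 1999, 1996, 1998, 1998, 2000,
  1999
)

# Number of games per release year
games_per_year <- table(year_release)
print(games_per_year)

# Share of all games released in the two busiest years (1998 and 2000)
share_1998_2000 <- sum(games_per_year[c("1998", "2000")]) / length(year_release)
print(round(share_1998_2000, 2))

# Moment-based skewness: a negative value means a longer tail toward early years
centered <- year_release - mean(year_release)
skewness <- mean(centered^3) / mean(centered^2)^1.5
print(skewness < 0)
```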

Who Are the Main Publishers?

Number_Games_Released <- nrow(df) / n_distinct(df$Publisher)  # average number of games per publisher: 46 / 8 = 5.75
ggplot(df, aes(x = Publisher)) +
  geom_bar(color = 'black', fill = 'lightblue') +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  ggtitle("Number of games released per publisher on N64") + 
  geom_text(aes(label = after_stat(count)), stat = "count", vjust = -0.4, colour = "black") +
  geom_hline(yintercept = Number_Games_Released) +  # horizontal line at the average
  annotate("text", x = 0.5, y = Number_Games_Released, label = Number_Games_Released, vjust = -0.5, hjust = 0)

We can notice several things:

  • The average number of games released per publisher is 5.75 (46 games across 8 publishers).
  • This average is misleading because, of the 46 games, 33 were published by Nintendo itself, which makes the firm responsible for about 72% of the games in this data set.
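To see how much Nintendo pulls the average up, the counts from the publisher table in the consistency check can be typed in by hand and summarized. This is only a sketch; the variable names are made up for illustration.

```r
# Games per publisher, copied from the publisher table printed earlier
games_per_publisher <- c(
  "Acclaim Entertainment" = 2, "Activision" = 1, "Electronic Arts" = 1,
  "LucasArts" = 1, "Namco" = 1, "Nintendo" = 33, "Rare" = 3, "THQ" = 4
)

mean_all <- mean(games_per_publisher)      # 46 games / 8 publishers
median_all <- median(games_per_publisher)  # less sensitive to one large value
nintendo_share <- games_per_publisher[["Nintendo"]] / sum(games_per_publisher)
mean_without_nintendo <-
  mean(games_per_publisher[names(games_per_publisher) != "Nintendo"])

print(mean_all)                         # 5.75
print(median_all)                       # 1.5
print(round(nintendo_share, 3))         # 0.717
print(round(mean_without_nintendo, 2))  # 1.86
```

The median of 1.5 and the average of about 1.86 without Nintendo both describe a typical publisher better than 5.75 does.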

Focus on Sales: Comparing Nintendo and Other Publishers

show(df$Sales)
##  [1] 11910000  9870000  8090000  7600000  5550000  5460000  5270000  4880000
##  [9]  4000000  3650000  3630000  3360000  3100000  3000000  2940000  2850000
## [17]  2700000  2600000  2540000  2520000  2480000  2320000  2170000  2030000
## [25]  2000000  1910000  1880000  1830000  1770000  1720000  1610000  1600000
## [33]  1500000  1470000  1400000  1370000  1300000  1190000  1190000  1160000
## [41]  1140000  1120000  1100000  1094765  1080000  1040000
min(df$Sales) #1040000
## [1] 1040000
max(df$Sales) #11910000
## [1] 11910000

I struggled to create a single bar plot comparing Nintendo's yearly sales with the sales of all publishers, so I decided to build the two plots separately from grouped data.

# Total sales per publisher and per year (all eight publishers are kept)
Total_yearly_Sales <- df %>% 
  group_by(Publisher, Year_Release) %>% 
  summarize(Sales = sum(Sales, na.rm = TRUE), .groups = "drop")

Yearly sales for all publishers combined, from 1996 to 2000

ggplot(Total_yearly_Sales, aes(x = Year_Release, y = Sales)) +
  geom_bar(fill = 'steelblue', stat = 'identity')

Going through the same process, targeting only Nintendo

Nintendo_yearly_Sales <- df %>% 
  filter(Publisher == "Nintendo") %>% 
  group_by(Year_Release) %>% 
  summarize(Sales = sum(Sales, na.rm = TRUE))
# Yearly sales with only Nintendo as a publisher, from 1996 to 2000
ggplot(Nintendo_yearly_Sales, aes(x = Year_Release, y = Sales)) +
  geom_bar(fill = 'lightblue', stat = 'identity')

We can identify several things if we compare both plots:

  • It seems that only Nintendo published games in 1996.
  • 1996 and 1999 were the strongest years for game sales.
  • Nintendo's dominance in sales weakens from 1997 onward and reaches its lowest point in 1999.
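The two separate plots could also be combined into one by labelling every game as "Nintendo" or "Other publishers" before summing. The sketch below is one possible way to do it, not the approach used above. It uses only the ten best-selling games (the rows shown in the top 10 table of this analysis) as stand-in data, and the names `top10`, `Group`, `yearly_by_group` and `combined_plot` are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Stand-in data: the ten best-selling games listed in this analysis
top10 <- data.frame(
  Publisher = c("Nintendo", "Nintendo", "Nintendo", "Nintendo", "Nintendo",
                "Nintendo", "Nintendo", "Rare", "Nintendo", "Nintendo"),
  Year_Release = c(1996, 1996, 1997, 1998, 1999, 1999, 1999, 1997, 1997, 1998),
  Sales = c(11910000, 9870000, 8090000, 7600000, 5550000,
            5460000, 5270000, 4880000, 4000000, 3650000)
)

# Label each game as Nintendo or not, then total the sales per year and group
yearly_by_group <- top10 %>%
  mutate(Group = if_else(Publisher == "Nintendo", "Nintendo", "Other publishers")) %>%
  group_by(Year_Release, Group) %>%
  summarize(Sales = sum(Sales), .groups = "drop")

# One plot, two colours: side-by-side bars for each year
combined_plot <- ggplot(yearly_by_group,
                        aes(x = Year_Release, y = Sales, fill = Group)) +
  geom_col(position = position_dodge(preserve = "single")) +
  ggtitle("Yearly sales: Nintendo vs. other publishers (top 10 games only)")

print(combined_plot)
```

Running the same steps on the full df instead of `top10` would give the comparison for all 46 games.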

Focus on Sales: How Are Sales Distributed Among Publishers?

Analyzing the top 10 best-selling games

Top_10_Sales <- df %>% arrange(desc(Sales))
head(Top_10_Sales,10)
##                                    Game     Developper Publisher Release_date
## 1                        Super Mario 64   Nintendo EAD  Nintendo   1996-06-23
## 2                         Mario Kart 64   Nintendo EAD  Nintendo   1996-12-14
## 3                         GoldenEye 007           Rare  Nintendo   1997-08-25
## 4  The Legend of Zelda: Ocarina of Time   Nintendo EAD  Nintendo   1998-11-21
## 5                     Super Smash Bros. HAL Laboratory  Nintendo   1999-01-21
## 6                       Pokémon Stadium   Nintendo EAD  Nintendo   1999-04-30
## 7                        Donkey Kong 64           Rare  Nintendo   1999-11-22
## 8                     Diddy Kong Racing           Rare      Rare   1997-11-14
## 9                           Star Fox 64   Nintendo EAD  Nintendo   1997-04-27
## 10                        Banjo-Kazooie           Rare  Nintendo   1998-06-29
##       Sales Year_Release
## 1  11910000         1996
## 2   9870000         1996
## 3   8090000         1997
## 4   7600000         1998
## 5   5550000         1999
## 6   5460000         1999
## 7   5270000         1999
## 8   4880000         1997
## 9   4000000         1997
## 10  3650000         1998

In the top 10 best-selling games on the Nintendo 64 we can see:
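One way to summarize the top 10 table is to count how often each publisher and developer appears and to work out Nintendo's share of the top 10 sales. This is a sketch that retypes the ten rows shown above; `top10` and the other variable names are made up for illustration.

```r
# Developer, publisher and sales of the ten best-selling games, copied from the table above
top10 <- data.frame(
  Developper = c("Nintendo EAD", "Nintendo EAD", "Rare", "Nintendo EAD",
                 "HAL Laboratory", "Nintendo EAD", "Rare", "Rare",
                 "Nintendo EAD", "Rare"),
  Publisher = c("Nintendo", "Nintendo", "Nintendo", "Nintendo", "Nintendo",
                "Nintendo", "Nintendo", "Rare", "Nintendo", "Nintendo"),
  Sales = c(11910000, 9870000, 8090000, 7600000, 5550000,
            5460000, 5270000, 4880000, 4000000, 3650000)
)

# How many of the ten games belong to each publisher and each developer
publisher_counts <- table(top10$Publisher)
developer_counts <- sort(table(top10$Developper), decreasing = TRUE)
print(publisher_counts)
print(developer_counts)

# Share of the top 10 sales that comes from games published by Nintendo
nintendo_sales_share <-
  sum(top10$Sales[top10$Publisher == "Nintendo"]) / sum(top10$Sales)
print(round(nintendo_sales_share, 2))
```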

Creating a bar plot showing the yearly sales of each publisher individually, from 1996 to 2000:

ggplot(data = Total_yearly_Sales, aes(x = Year_Release, y = Sales, fill = Publisher)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  ggtitle("Total Yearly Sales per Publisher")

This last visualization helps us draw several conclusions: