Video games are an important part of the entertainment industry. Since a young age I have been fascinated by the world of video games and what it has to offer. Most of us have enjoyed playing video games since we were kids, and still do sometimes as adults. That is why I decided to analyze this gaming data. With it we will break down trends such as the most popular platform, the most popular genre and the biggest market. Unfortunately the data covers physical sales only and the most recent years are incomplete, so it has real limitations. Nevertheless, there is enough data to analyze, and we will explore it as far as possible in a simple and informative way.
We are using a data file that lists video games with their sales. The file, vgsales.csv, was generated by a scrape of vgchartz.com together with another web scrape of Metacritic, and we cite it as the source on all images.
data <- read.csv("/cloud/project/vgsales.csv", stringsAsFactors = FALSE)
library(tidyverse)
library(RColorBrewer)
library(ggplot2)
library(hrbrthemes)
library(viridis)
library(plotly)
library(ggthemes)
library(dplyr)
library(psych)
library(lubridate)
dim(data)
## [1] 16598 11
names(data)
## [1] "Rank" "Name" "Platform" "Year" "Genre"
## [6] "Publisher" "NA_Sales" "EU_Sales" "JP_Sales" "Other_Sales"
## [11] "Global_Sales"
From our inspection we can conclude:
The file is loaded and ready for analysis. As mentioned above, the most recent years (2017 and 2020 in this file) are incomplete, so removing them, along with the rows that have missing (N/A) values, will help the analysis.
#Checking the data
str(data)
## 'data.frame': 16598 obs. of 11 variables:
## $ Rank : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Name : chr "Wii Sports" "Super Mario Bros." "Mario Kart Wii" "Wii Sports Resort" ...
## $ Platform : chr "Wii" "NES" "Wii" "Wii" ...
## $ Year : chr "2006" "1985" "2008" "2009" ...
## $ Genre : chr "Sports" "Platform" "Racing" "Sports" ...
## $ Publisher : chr "Nintendo" "Nintendo" "Nintendo" "Nintendo" ...
## $ NA_Sales : num 41.5 29.1 15.8 15.8 11.3 ...
## $ EU_Sales : num 29.02 3.58 12.88 11.01 8.89 ...
## $ JP_Sales : num 3.77 6.81 3.79 3.28 10.22 ...
## $ Other_Sales : num 8.46 0.77 3.31 2.96 1 0.58 2.9 2.85 2.26 0.47 ...
## $ Global_Sales: num 82.7 40.2 35.8 33 31.4 ...
data[ , c("Platform", "Year", "Genre", "Publisher")] <- lapply(data[ , c("Platform", "Year", "Genre", "Publisher")], as.factor)
str(data)
## 'data.frame': 16598 obs. of 11 variables:
## $ Rank : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Name : chr "Wii Sports" "Super Mario Bros." "Mario Kart Wii" "Wii Sports Resort" ...
## $ Platform : Factor w/ 31 levels "2600","3DO","3DS",..: 26 12 26 26 6 6 5 26 26 12 ...
## $ Year : Factor w/ 40 levels "1980","1981",..: 27 6 29 30 17 10 27 27 30 5 ...
## $ Genre : Factor w/ 12 levels "Action","Adventure",..: 11 5 7 11 8 6 5 4 5 9 ...
## $ Publisher : Factor w/ 579 levels "10TACLE Studios",..: 369 369 369 369 369 369 369 369 369 369 ...
## $ NA_Sales : num 41.5 29.1 15.8 15.8 11.3 ...
## $ EU_Sales : num 29.02 3.58 12.88 11.01 8.89 ...
## $ JP_Sales : num 3.77 6.81 3.79 3.28 10.22 ...
## $ Other_Sales : num 8.46 0.77 3.31 2.96 1 0.58 2.9 2.85 2.26 0.47 ...
## $ Global_Sales: num 82.7 40.2 35.8 33 31.4 ...
# Recode the "N/A" strings as missing values, then count them per column
data[data == "N/A"] <- NA
colSums(is.na(data))
## Rank Name Platform Year Genre Publisher
## 0 0 0 271 0 58
## NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales
## 0 0 0 0 0
# Proportion of missing values per column
colSums(is.na(data))/nrow(data)
## Rank Name Platform Year Genre Publisher
## 0.000000000 0.000000000 0.000000000 0.016327268 0.000000000 0.003494397
## NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
#Dropping missing values
data <- data %>%
drop_na(Year, Publisher)
anyNA(data)
## [1] FALSE
# Drop the incomplete years 2017 and 2020, then remove the unused factor levels
data <- data[!(data$Year %in% c("2017", "2020")), ]
data$Year <- factor(data$Year)
The data are now cleaned and in the form we need for the analysis.
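The missing-value steps above can be tried on a small made-up data frame. The sketch below uses only base R; the `toy` data frame and its values are invented for illustration and are not taken from vgsales.csv, and `complete.cases()` stands in for `drop_na()`.

```r
# Toy data frame (hypothetical values) with "N/A" strings, as in the raw CSV
toy <- data.frame(
  Name      = c("Game A", "Game B", "Game C", "Game D"),
  Year      = c("2006", "N/A", "2010", "2011"),
  Publisher = c("Pub X", "Pub Y", "N/A", "Pub Z"),
  stringsAsFactors = FALSE
)

# Recode the "N/A" strings as real missing values
toy[toy == "N/A"] <- NA

# Count and proportion of missing values per column
na_count <- colSums(is.na(toy))
na_share <- na_count / nrow(toy)

# Keep only rows where both Year and Publisher are present
# (base-R stand-in for tidyr::drop_na(Year, Publisher))
toy_clean <- toy[complete.cases(toy[, c("Year", "Publisher")]), ]

print(na_count)
print(na_share)
print(toy_clean)
```

Recoding before counting matters here: until the "N/A" strings become real `NA` values, `is.na()` does not see them.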
#Describe Data
describe(data)
## vars n mean sd median trimmed mad min max
## Rank 1 16287 8288.97 4792.14 8291.00 8288.15 6157.24 1.00 16600.00
## Name* 2 16287 5707.19 3276.85 5776.00 5720.84 4219.48 1.00 11322.00
## Platform* 3 16287 16.73 8.27 17.00 16.68 10.38 1.00 31.00
## Year* 4 16287 27.40 5.83 28.00 27.86 5.93 1.00 37.00
## Genre* 5 16287 5.93 3.76 6.00 5.86 5.93 1.00 12.00
## Publisher* 6 16287 297.92 181.75 328.00 302.21 268.35 1.00 579.00
## NA_Sales 7 16287 0.27 0.82 0.08 0.13 0.12 0.00 41.49
## EU_Sales 8 16287 0.15 0.51 0.02 0.06 0.03 0.00 29.02
## JP_Sales 9 16287 0.08 0.31 0.00 0.02 0.00 0.00 10.22
## Other_Sales 10 16287 0.05 0.19 0.01 0.02 0.01 0.00 10.57
## Global_Sales 11 16287 0.54 1.57 0.17 0.28 0.21 0.01 82.74
## range skew kurtosis se
## Rank 16599.00 0.00 -1.20 37.55
## Name* 11321.00 -0.03 -1.21 25.68
## Platform* 30.00 -0.05 -0.99 0.06
## Year* 36.00 -1.01 1.85 0.05
## Genre* 11.00 0.07 -1.43 0.03
## Publisher* 578.00 -0.14 -1.40 1.42
## NA_Sales 41.49 18.74 642.49 0.01
## EU_Sales 29.02 18.77 745.95 0.00
## JP_Sales 10.22 11.12 191.08 0.00
## Other_Sales 10.57 24.10 1011.31 0.00
## Global_Sales 82.73 17.30 595.62 0.01
#Data summary
summary(data)
## Rank Name Platform Year
## Min. : 1 Length:16287 DS :2130 2009 :1431
## 1st Qu.: 4132 Class :character PS2 :2127 2008 :1428
## Median : 8291 Mode :character PS3 :1304 2010 :1257
## Mean : 8289 Wii :1290 2007 :1201
## 3rd Qu.:12438 X360 :1234 2011 :1136
## Max. :16600 PSP :1197 2006 :1008
## (Other):7005 (Other):8826
## Genre Publisher NA_Sales
## Action :3250 Electronic Arts : 1339 Min. : 0.0000
## Sports :2304 Activision : 966 1st Qu.: 0.0000
## Misc :1686 Namco Bandai Games : 928 Median : 0.0800
## Role-Playing:1468 Ubisoft : 917 Mean : 0.2657
## Shooter :1282 Konami Digital Entertainment: 823 3rd Qu.: 0.2400
## Adventure :1274 THQ : 712 Max. :41.4900
## (Other) :5023 (Other) :10602
## EU_Sales JP_Sales Other_Sales Global_Sales
## Min. : 0.0000 Min. : 0.00000 Min. : 0.00000 Min. : 0.010
## 1st Qu.: 0.0000 1st Qu.: 0.00000 1st Qu.: 0.00000 1st Qu.: 0.060
## Median : 0.0200 Median : 0.00000 Median : 0.01000 Median : 0.170
## Mean : 0.1478 Mean : 0.07885 Mean : 0.04844 Mean : 0.541
## 3rd Qu.: 0.1100 3rd Qu.: 0.04000 3rd Qu.: 0.04000 3rd Qu.: 0.480
## Max. :29.0200 Max. :10.22000 Max. :10.57000 Max. :82.740
##
Data Summary
We will plot some visuals to see where North America (NA) stands compared to the other markets.
# Summary of the platform, year, genre and sales columns
data %>%
select(Platform, Year, Genre, NA_Sales, EU_Sales, JP_Sales, Other_Sales, Global_Sales) %>% summary()
## Platform Year Genre NA_Sales
## DS :2130 2009 :1431 Action :3250 Min. : 0.0000
## PS2 :2127 2008 :1428 Sports :2304 1st Qu.: 0.0000
## PS3 :1304 2010 :1257 Misc :1686 Median : 0.0800
## Wii :1290 2007 :1201 Role-Playing:1468 Mean : 0.2657
## X360 :1234 2011 :1136 Shooter :1282 3rd Qu.: 0.2400
## PSP :1197 2006 :1008 Adventure :1274 Max. :41.4900
## (Other):7005 (Other):8826 (Other) :5023
## EU_Sales JP_Sales Other_Sales Global_Sales
## Min. : 0.0000 Min. : 0.00000 Min. : 0.00000 Min. : 0.010
## 1st Qu.: 0.0000 1st Qu.: 0.00000 1st Qu.: 0.00000 1st Qu.: 0.060
## Median : 0.0200 Median : 0.00000 Median : 0.01000 Median : 0.170
## Mean : 0.1478 Mean : 0.07885 Mean : 0.04844 Mean : 0.541
## 3rd Qu.: 0.1100 3rd Qu.: 0.04000 3rd Qu.: 0.04000 3rd Qu.: 0.480
## Max. :29.0200 Max. :10.22000 Max. :10.57000 Max. :82.740
##
market_means <- data.frame(Mean = c(mean(data$NA_Sales), mean(data$EU_Sales), mean(data$JP_Sales), mean(data$Other_Sales), mean(data$Global_Sales)))
row.names(market_means) <- c("North America", "Europe", "Japan", "Rest of the world", "Worldwide")
market_means$Mean_round <- round(market_means$Mean, digits = 2)
market_means
## Mean Mean_round
## North America 0.26569534 0.27
## Europe 0.14776754 0.15
## Japan 0.07884939 0.08
## Rest of the world 0.04843679 0.05
## Worldwide 0.54102229 0.54
theme_set(theme_bw())
ggplot(data = market_means, mapping = aes(x=row.names(market_means), y=Mean_round)) +
geom_boxplot() + geom_segment(aes(x=row.names(market_means),
xend=row.names(market_means),
y=0,
yend=Mean_round)) +
geom_label(mapping = aes(label=Mean_round), fill = "darkblue", size = 3.5, color = "white", fontface = "bold", hjust=.5) +
ggtitle("Sales share on the Markets") +
xlab("Markets") +
ylab("Mean of Sales") +
labs(caption="source: vgsales.csv") +
theme(
plot.title = element_text(size = 24, hjust = .5, face = "bold"),
axis.title.x = element_text(size = 18, hjust = .5, face = "italic"),
axis.title.y = element_text(size = 18, hjust = .5, face = "italic"),
axis.text.x = element_text(size = 10, face = "bold", angle = 0),
axis.text.y = element_text(size = 10, face = "bold"),
legend.position = "none")
The graph shows that NA accounts for about half of worldwide sales, followed by EU with roughly a quarter and JP with about one seventh.
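These fractions can be checked directly from the means. The sketch below is a minimal check, with the values copied by hand from the `market_means` table; the vector names are chosen here for illustration.

```r
# Mean sales per game, copied from the market_means table
market_mean <- c(NorthAmerica = 0.26569534,
                 Europe       = 0.14776754,
                 Japan        = 0.07884939,
                 RestOfWorld  = 0.04843679)
global_mean <- 0.54102229

# Each market's share of the worldwide mean
share <- market_mean / global_mean
print(round(share, 3))
```

The shares come out at roughly 0.49, 0.27, 0.15 and 0.09, which matches the "half, a quarter, a seventh" reading of the plot.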
We will construct a frequency distribution of the genres.
# Frequency distribution: the number of games in each Genre and its percentage of all games.
freq_genre <- data.frame(cbind(Frequency = table(data$Genre), Percent = prop.table(table(data$Genre)) * 100))
freq_genre <- freq_genre[order(freq_genre$Frequency, decreasing = T), ]
freq_genre
## Frequency Percent
## Action 3250 19.954565
## Sports 2304 14.146252
## Misc 1686 10.351814
## Role-Playing 1468 9.013324
## Shooter 1282 7.871308
## Adventure 1274 7.822189
## Racing 1225 7.521336
## Platform 875 5.372383
## Simulation 847 5.200467
## Fighting 836 5.132928
## Strategy 670 4.113710
## Puzzle 570 3.499724
# Plot
ggplot(data = freq_genre, mapping = aes(x = Frequency, y = row.names(freq_genre))) +
geom_bar(stat = "identity", mapping = aes(fill = row.names(freq_genre), color = row.names(freq_genre)), alpha = .7, size = 1.1) +
geom_label(mapping = aes(label=Frequency), fill = "darkblue", size = 3.5, color = "white", fontface = "bold", hjust=.5) +
ggtitle("Genre Frequency Distribution") +
xlab("Frequency") +
ylab("Genres") +
labs(caption="source: vgsales.csv") +
theme(
plot.title = element_text(size = 24, hjust = .5, face = "bold"),
axis.title.x = element_text(size = 18, hjust = .5, face = "italic"),
axis.title.y = element_text(size = 18, hjust = .5, face = "italic"),
axis.text.x = element_text(size = 10, face = "bold", angle = 0),
axis.text.y = element_text(size = 10, face = "bold"),
legend.position = "none")
We can see that Action is the most frequent genre, which agrees with our earlier summary. As a side note, some of the genres overlap and could be combined, but we will leave them as they are for now.
We will use a heat map to compare total sales by genre across the markets.
options(repr.plot.width = 20, repr.plot.height = 5)
sales_comp_gen <- data %>%
select(Genre, NA_Sales, EU_Sales, JP_Sales, Other_Sales) %>%
group_by(Genre) %>%
summarise(NA_Sales = sum(NA_Sales),
EU_Sales = sum(EU_Sales),
JP_Sales = sum(JP_Sales),
Other_Sales = sum(Other_Sales))
sales_comp_gen <- pivot_longer(data = sales_comp_gen,
cols = c("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"))
ggplot(data = sales_comp_gen, aes(x = name, y = Genre, fill = value))+
geom_tile(aes(fill = value))+
geom_text(aes(label = value), position = position_dodge(width = .1), color = "black")+
labs(title = "Sales Comparison by Genre",
subtitle = "Video Games Sales Data",
x = "Market",
y = NULL,
fill = NULL)+
theme_minimal()+ labs(caption="source: vgsales.csv") +
theme(legend.position = "right")+
scale_fill_distiller(palette = "Spectral")
We can see that Action has very high sales in NA and EU, followed by the Sports genre. NA leads in nearly every genre; the exception is Role-Playing, where JP sales are slightly higher than NA sales.
We will group the consoles by platform family and company to see which has the most titles.
# Construct a frequency distribution, sum of the numbers in each category (Platform) based on how many times it shows up.
freq_platform <- data.frame(cbind(Frequency = table(data$Platform), Percent = prop.table(table(data$Platform)) * 100))
freq_platform <- freq_platform[order(freq_platform$Frequency, decreasing = T), ]
freq_platform
## Frequency Percent
## DS 2130 13.077914901
## PS2 2127 13.059495303
## PS3 1304 8.006385461
## Wii 1290 7.920427335
## X360 1234 7.576594830
## PSP 1197 7.349419783
## PS 1189 7.300300853
## PC 938 5.759194450
## XB 803 4.930312519
## GBA 786 4.825934795
## GC 542 3.327807454
## 3DS 499 3.063793209
## PSV 408 2.505065390
## PS4 335 2.056855161
## N64 316 1.940197704
## SNES 239 1.467428010
## XOne 213 1.307791490
## SAT 173 1.062196844
## WiiU 143 0.878000860
## 2600 116 0.712224474
## NES 98 0.601706883
## GB 97 0.595567017
## DC 52 0.319273040
## GEN 27 0.165776386
## NG 12 0.073678394
## SCD 6 0.036839197
## WS 6 0.036839197
## 3DO 3 0.018419598
## TG16 2 0.012279732
## GG 1 0.006139866
## PCFX 1 0.006139866
# Regroup platform as Platform_type
# Take the platform codes from the row names so they always match the sorted rows
freq_platform$Platform <- row.names(freq_platform)
pc <- c("PC")
xbox <- c("X360", "XB", "XOne")
# SCD (Sega CD) is not a Nintendo system, so it falls under "Others"
nintendo <- c("Wii", "WiiU", "N64", "GC", "NES", "3DS", "DS", "SNES", "GBA", "GB")
playstation <- c("PS", "PS2", "PS3", "PS4", "PSP", "PSV")
platforms <- freq_platform %>%
mutate(Platform_type = ifelse(Platform %in% pc, "PC",
ifelse(Platform %in% xbox, "Xbox",
ifelse(Platform %in% nintendo, "Nintendo",
ifelse(Platform %in% playstation, "Playstation", "Others")))))
ggplot(data = platforms, mapping = aes(x = Frequency, y = Platform_type)) +
geom_bar(stat = "identity", mapping = aes(fill = Platform_type, color = Platform_type), alpha = 0.7, size = 0.3) +
ggtitle("Gaming Company Frequency Distribution") +
xlab("Frequency") +
ylab("Company") +
coord_flip() + labs(caption="source: vgsales.csv")+ theme(
plot.title = element_text(size = 19, hjust = .5, face = "bold"),
axis.title.x = element_text(size = 18, hjust = .5, face = "italic"),
axis.title.y = element_text(size = 18, hjust = .5, face = "italic"),
axis.text.x = element_text(size = 10, face = "bold", angle = 0),
axis.text.y = element_text(size = 10, face = "bold"),
legend.position = "none")
We can see that Playstation and Nintendo are very dominant in the market by number of titles. Nintendo trails Playstation by only a small margin, which looks about right given video game history.
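One way to map platform codes to families without depending on row order is a named lookup vector. The sketch below is an illustration only: it uses the ten largest platforms, with counts copied from the frequency table, and the `family` lookup is a hypothetical helper rather than part of the analysis code.

```r
# Title counts per platform, copied from the frequency table (top ten only)
counts <- c(DS = 2130, PS2 = 2127, PS3 = 1304, Wii = 1290, X360 = 1234,
            PSP = 1197, PS = 1189, PC = 938, XB = 803, GBA = 786)

# Hypothetical lookup table: platform code -> platform family
family <- c(DS = "Nintendo", Wii = "Nintendo", GBA = "Nintendo",
            PS = "Playstation", PS2 = "Playstation", PS3 = "Playstation",
            PSP = "Playstation", X360 = "Xbox", XB = "Xbox", PC = "PC")

# Look families up by name, so the result does not depend on row order
platform_family <- unname(family[names(counts)])

# Sum the title counts within each family
family_totals <- tapply(counts, platform_family, sum)
print(family_totals)
```

Because the lookup is keyed by name, reordering `counts` leaves every platform in the same family.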
Visualizing the top publishers
# Year is a factor; convert its labels to integer years
data$Year <- as.integer(as.character(data$Year))
data$Name <- as.character(data$Name)
data.publisher.sales <- aggregate(
Global_Sales~Publisher+Year,
data,
sum
)
data.publisher.sales.clean <- aggregate(
Global_Sales~Publisher,
data,
sum
)
data.publisher.sales.clean <- data.publisher.sales.clean[
order(data.publisher.sales.clean$Global_Sales, decreasing=T),
]
ggplot(data.publisher.sales,
aes(
x=Global_Sales,
y=reorder(Publisher, Global_Sales),
fill=Year
)
) +
geom_bar(stat="identity") +
scale_fill_continuous(low="red", high="blue") +
scale_y_discrete(limits=head(data.publisher.sales.clean, 10)$Publisher) + labs(caption="source: vgsales.csv")+
labs(
y="Publisher",
x="Global Sales")
## Warning: Removed 2054 rows containing missing values (position_stack).
The top publisher by global sales is Nintendo, with a variety of games spanning the 1980s to the 2000s. This makes sense given Nintendo's Wii games and the Super Mario series.
#Re-checking
total_sales_publisher <- aggregate.data.frame(x = list(Total_Sales = data$Global_Sales),
by = list(Publisher = data$Publisher),
FUN = sum)
total_sales_publisher <- total_sales_publisher[order(total_sales_publisher$Total_Sales, decreasing = T), ]
head(total_sales_publisher, 10)
## Publisher Total_Sales
## 368 Nintendo 1784.43
## 139 Electronic Arts 1093.39
## 17 Activision 721.41
## 463 Sony Computer Entertainment 607.28
## 531 Ubisoft 473.25
## 497 Take-Two Interactive 399.30
## 513 THQ 340.44
## 282 Konami Digital Entertainment 278.56
## 451 Sega 270.66
## 351 Namco Bandai Games 253.65
We re-checked the totals and they agree with the graph.
For the hypothesis tests, we will compare three of the markets: NA, JP and Global.
min(data$NA_Sales)
## [1] 0
max(data$NA_Sales)
## [1] 41.49
median(data$NA_Sales)
## [1] 0.08
# In the output below, genres with a p-value < 0.05 differ significantly from the baseline genre (Action); the rest do not.
fit <- lm( NA_Sales ~ Genre , data = data)
summary(fit)
##
## Call:
## lm(formula = NA_Sales ~ Genre, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.510 -0.235 -0.154 -0.015 41.199
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.265160 0.014323 18.513 < 2e-16 ***
## GenreAdventure -0.185152 0.026990 -6.860 7.14e-12 ***
## GenreFighting -0.001117 0.031665 -0.035 0.972
## GenreMisc -0.029739 0.024507 -1.213 0.225
## GenrePlatform 0.244543 0.031098 7.864 3.97e-15 ***
## GenrePuzzle -0.051107 0.037079 -1.378 0.168
## GenreRacing 0.026211 0.027375 0.957 0.338
## GenreRole-Playing -0.042749 0.025677 -1.665 0.096 .
## GenreShooter 0.183483 0.026930 6.813 9.87e-12 ***
## GenreSimulation -0.050862 0.031501 -1.615 0.106
## GenreSports 0.025678 0.022238 1.155 0.248
## GenreStrategy -0.163921 0.034645 -4.732 2.25e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8165 on 16275 degrees of freedom
## Multiple R-squared: 0.01519, Adjusted R-squared: 0.01452
## F-statistic: 22.82 on 11 and 16275 DF, p-value: < 2.2e-16
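For our first hypothesis, the overall F-test has a p-value below 0.05 (p < 2.2e-16), so NA_Sales does differ significantly by genre. Relative to the Action baseline, Adventure and Strategy sell significantly less and Platform and Shooter significantly more; the remaining genres are not significantly different. The effect is small, though: genre explains only about 1.5% of the variance in NA_Sales (R-squared 0.015). Platform is not part of this model, so the result says nothing about platform.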
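To read a `summary(lm(...))` table with a factor predictor, it helps to see that the intercept is the mean of the baseline level and each other coefficient is that level's mean minus the baseline mean. The sketch below shows this on a toy data frame; the values are invented for illustration and are not taken from vgsales.csv.

```r
# Toy data (hypothetical values): sales for three genres
toy <- data.frame(
  Genre = factor(rep(c("Action", "Platform", "Strategy"), each = 4)),
  Sales = c(0.2, 0.3, 0.3, 0.4,   # Action,   mean 0.30
            0.4, 0.5, 0.6, 0.7,   # Platform, mean 0.55
            0.1, 0.1, 0.2, 0.2)   # Strategy, mean 0.15
)

fit_toy <- lm(Sales ~ Genre, data = toy)
group_means <- tapply(toy$Sales, toy$Genre, mean)

# Intercept = mean of the baseline level (first factor level, "Action").
# Each other coefficient = that genre's mean minus the baseline mean.
print(coef(fit_toy))
print(group_means - group_means["Action"])

# Overall F-test: does Genre explain any of the variance in Sales?
print(anova(fit_toy))
```

A small p-value on a coefficient therefore means that genre's mean differs from the Action mean, and the F-test at the bottom of the summary is the test of whether genre matters at all.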
min(data$JP_Sales)
## [1] 0
max(data$JP_Sales)
## [1] 10.22
median(data$JP_Sales)
## [1] 0
# In the output below, genres with a p-value < 0.05 differ significantly from the baseline genre (Action); the rest do not.
fit <- lm( JP_Sales ~ Genre , data = data)
summary(fit)
##
## Call:
## lm(formula = JP_Sales ~ Genre, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.2386 -0.0633 -0.0488 -0.0288 9.9814
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.048812 0.005381 9.072 < 2e-16 ***
## GenreAdventure -0.008004 0.010139 -0.789 0.429900
## GenreFighting 0.055434 0.011895 4.660 3.19e-06 ***
## GenreMisc 0.014456 0.009206 1.570 0.116393
## GenrePlatform 0.100502 0.011683 8.603 < 2e-16 ***
## GenrePuzzle 0.050626 0.013929 3.635 0.000279 ***
## GenreRacing -0.002600 0.010284 -0.253 0.800406
## GenreRole-Playing 0.189778 0.009646 19.674 < 2e-16 ***
## GenreShooter -0.019031 0.010117 -1.881 0.059971 .
## GenreSimulation 0.026205 0.011834 2.214 0.026812 *
## GenreSports 0.009677 0.008354 1.158 0.246719
## GenreStrategy 0.024471 0.013015 1.880 0.060091 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3067 on 16275 degrees of freedom
## Multiple R-squared: 0.03354, Adjusted R-squared: 0.03289
## F-statistic: 51.35 on 11 and 16275 DF, p-value: < 2.2e-16
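For our second hypothesis, the overall F-test again has a p-value below 0.05 (p < 2.2e-16), so JP_Sales also differs significantly by genre. Relative to Action, the Fighting, Platform, Puzzle, Role-Playing and Simulation genres sell significantly more in Japan, with Role-Playing showing by far the largest difference (about +0.19 per game). Genre still explains only about 3.4% of the variance in JP_Sales.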
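Because each coefficient is an offset from the Action baseline, the per-genre mean sales for NA and JP can be rebuilt from the two regression summaries and compared. The sketch below does that with the estimates copied by hand from the output above; the variable names are chosen here for illustration.

```r
genres <- c("Action", "Adventure", "Fighting", "Misc", "Platform", "Puzzle",
            "Racing", "Role-Playing", "Shooter", "Simulation", "Sports",
            "Strategy")

# Estimates copied from summary(fit): intercept first, then genre offsets
na_coef <- c(0.265160, -0.185152, -0.001117, -0.029739, 0.244543, -0.051107,
             0.026211, -0.042749, 0.183483, -0.050862, 0.025678, -0.163921)
jp_coef <- c(0.048812, -0.008004, 0.055434, 0.014456, 0.100502, 0.050626,
             -0.002600, 0.189778, -0.019031, 0.026205, 0.009677, 0.024471)

# Mean per genre = intercept (Action mean) + that genre's offset
na_mean <- c(na_coef[1], na_coef[1] + na_coef[-1])
jp_mean <- c(jp_coef[1], jp_coef[1] + jp_coef[-1])
names(na_mean) <- genres
names(jp_mean) <- genres

print(round(rbind(NA_mean = na_mean, JP_mean = jp_mean), 3))

# Genres where the average game sells more in Japan than in North America
jp_leads <- genres[jp_mean > na_mean]
print(jp_leads)
```

Role-Playing is the only genre where the Japanese mean exceeds the North American one, which fits Japan's preference for that genre noted in the conclusion.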
min(data$Global_Sales)
## [1] 0.01
max(data$Global_Sales)
## [1] 82.74
median(data$Global_Sales)
## [1] 0.17
# In the output below, genres with a p-value < 0.05 differ significantly from the baseline genre (Action); the rest do not.
fit <- lm( Global_Sales ~ Genre , data = data)
summary(fit)
##
## Call:
## lm(formula = Global_Sales ~ Genre, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.938 -0.461 -0.310 -0.039 82.172
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.530102 0.027338 19.391 < 2e-16 ***
## GenreAdventure -0.345965 0.051516 -6.716 1.93e-11 ***
## GenreFighting 0.001059 0.060438 0.018 0.986
## GenreMisc -0.061614 0.046776 -1.317 0.188
## GenrePlatform 0.417476 0.059357 7.033 2.10e-12 ***
## GenrePuzzle -0.105172 0.070772 -1.486 0.137
## GenreRacing 0.063172 0.052251 1.209 0.227
## GenreRole-Playing 0.099183 0.049010 2.024 0.043 *
## GenreShooter 0.270366 0.051400 5.260 1.46e-07 ***
## GenreSimulation -0.070019 0.060125 -1.165 0.244
## GenreSports 0.038145 0.042445 0.899 0.369
## GenreStrategy -0.271490 0.066126 -4.106 4.05e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.559 on 16275 degrees of freedom
## Multiple R-squared: 0.01214, Adjusted R-squared: 0.01147
## F-statistic: 18.18 on 11 and 16275 DF, p-value: < 2.2e-16
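For our last hypothesis, the overall F-test has a p-value below 0.05 (p < 2.2e-16), so Global_Sales differs significantly by genre as well. Relative to Action, Adventure and Strategy sell significantly less, while Platform, Shooter and (marginally, p = 0.043) Role-Playing sell significantly more. Genre explains only about 1.2% of the variance in Global_Sales, and platform is again not part of the model.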
In conclusion, as a long-time gamer, I'm very impressed with the results we were able to see in this report. They seem to match the real world well, credit to those who worked on the data.
On the heat map, the Platform genre scores high. The name might cause some confusion: it refers to platform games (platformers such as Super Mario Bros.), not to consoles. Based on the sales figures, North America is the largest market and leads in nearly every genre. Most titles in the data were released on Playstation and Nintendo systems, and Action is the most dominant genre in every market except Japan, which seems to prefer the Role-Playing genre.
We also acknowledge that the claims of our analysis are limited. The data was narrowed down (for example, we removed rows with a missing publisher or year), and it was not complete to begin with: it is missing mobile phones as a platform. Because all of our observations rest on a few regional markets and platforms, the results would be expected to change if we used more regions and platforms. Thus our claims are not absolute for all games.
In the future, it would be interesting to study the number of sales on different platforms and regions.