A video game is an electronic game that involves interaction with a user interface to generate visual feedback on a two- or three-dimensional video display device such as a touchscreen, virtual reality headset or monitor/TV set. Since the 1980s, video games have become an increasingly important part of the entertainment industry, and whether they are also a form of art is a matter of dispute.
This dataset contains a list of video games with sales greater than 100,000 copies. It was generated by a scrape of vgchartz.com.
Fields include
- Rank - Ranking of overall sales
- Name - The game’s name
- Platform - Platform of the game’s release (e.g. PC, PS4)
- Year - Year of the game’s release
- Genre - Genre of the game
- Publisher - Publisher of the game
- NA_Sales - Sales in North America (in millions)
- EU_Sales - Sales in Europe (in millions)
- JP_Sales - Sales in Japan (in millions)
- Other_Sales - Sales in the rest of the world (in millions)
- Global_Sales - Total worldwide sales (in millions)
Total rows
## [1] 16598
Sample data
##   Rank                     Name Platform Year        Genre Publisher NA_Sales
## 1    1               Wii Sports      Wii 2006       Sports  Nintendo    41.49
## 2    2        Super Mario Bros.      NES 1985     Platform  Nintendo    29.08
## 3    3           Mario Kart Wii      Wii 2008       Racing  Nintendo    15.85
## 4    4        Wii Sports Resort      Wii 2009       Sports  Nintendo    15.75
## 5    5 Pokemon Red/Pokemon Blue       GB 1996 Role-Playing  Nintendo    11.27
## 6    6                   Tetris       GB 1989       Puzzle  Nintendo    23.20
##   EU_Sales JP_Sales Other_Sales Global_Sales
## 1    29.02     3.77        8.46        82.74
## 2     3.58     6.81        0.77        40.24
## 3    12.88     3.79        3.31        35.82
## 4    11.01     3.28        2.96        33.00
## 5     8.89    10.22        1.00        31.37
## 6     2.26     4.22        0.58        30.26
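As a quick sanity check on the field definitions, the four regional columns should add up to Global_Sales, apart from rounding. The following is a minimal sketch that uses only the six sample rows printed above, typed in by hand. The names `sample_rows`, `regional_total`, and `all_consistent`, and the 0.02 tolerance, are assumptions made for illustration and are not part of the original analysis.

```r
# The six sample rows shown above, entered by hand for illustration.
sample_rows <- data.frame(
  Name         = c("Wii Sports", "Super Mario Bros.", "Mario Kart Wii",
                   "Wii Sports Resort", "Pokemon Red/Pokemon Blue", "Tetris"),
  NA_Sales     = c(41.49, 29.08, 15.85, 15.75, 11.27, 23.20),
  EU_Sales     = c(29.02, 3.58, 12.88, 11.01, 8.89, 2.26),
  JP_Sales     = c(3.77, 6.81, 3.79, 3.28, 10.22, 4.22),
  Other_Sales  = c(8.46, 0.77, 3.31, 2.96, 1.00, 0.58),
  Global_Sales = c(82.74, 40.24, 35.82, 33.00, 31.37, 30.26)
)

# Sum the four regional columns for each game.
regional_total <- rowSums(
  sample_rows[, c("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales")]
)

# Difference between the reported global figure and the regional sum.
sample_rows$Diff <- round(sample_rows$Global_Sales - regional_total, 2)

# Hypothetical tolerance: every column is rounded to two decimals,
# so small discrepancies are expected.
all_consistent <- all(abs(sample_rows$Diff) <= 0.02)

print(sample_rows[, c("Name", "Global_Sales", "Diff")])
print(all_consistent)
```

On the full data the same check could be run on `vgs` directly, which would flag any row where the global figure and the regional figures disagree by more than rounding error.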
##         Rank         Name     Platform         Year        Genre    Publisher
##            0            0            0          271            0           58
##     NA_Sales     EU_Sales     JP_Sales  Other_Sales Global_Sales
##            0            0            0            0            0
Year has 271 missing values and Publisher has 58. Rows containing N/A are removed:
##         Rank         Name     Platform         Year        Genre    Publisher
##            0            0            0            0            0            0
##     NA_Sales     EU_Sales     JP_Sales  Other_Sales Global_Sales
##            0            0            0            0            0
## 'data.frame': 16291 obs. of 11 variables:
## $ Rank : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Name : Factor w/ 11493 levels "¡Shin Chan Flipa en colores!",..: 10991 9343 5531 10993 7364 9707 6648 10989 6651 2594 ...
## $ Platform : Factor w/ 31 levels "2600","3DO","3DS",..: 26 12 26 26 6 6 5 26 26 12 ...
## $ Year : Factor w/ 40 levels "1980","1981",..: 27 6 29 30 17 10 27 27 30 5 ...
## $ Genre : Factor w/ 12 levels "Action","Adventure",..: 11 5 7 11 8 6 5 4 5 9 ...
## $ Publisher : Factor w/ 579 levels "10TACLE Studios",..: 369 369 369 369 369 369 369 369 369 369 ...
## $ NA_Sales : num 41.5 29.1 15.8 15.8 11.3 ...
## $ EU_Sales : num 29.02 3.58 12.88 11.01 8.89 ...
## $ JP_Sales : num 3.77 6.81 3.79 3.28 10.22 ...
## $ Other_Sales : num 8.46 0.77 3.31 2.96 1 0.58 2.9 2.85 2.26 0.47 ...
## $ Global_Sales: num 82.7 40.2 35.8 33 31.4 ...
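The code for the missing-value step is not shown, so the following is only a sketch of one way it could be done in base R. A made-up toy data frame stands in for the scraped data, and the choice of `colSums(is.na(...))`, `na.omit()`, and `droplevels()` is an assumption rather than the original method. If the raw CSV stores missing values as the literal string "N/A", they would first need to be read as `NA`, for example with the `na.strings` argument of `read.csv()`.

```r
# Toy stand-in for the scraped data; all values are made up for illustration.
toy <- data.frame(
  Name      = c("Game A", "Game B", "Game C", "Game D"),
  Year      = c("2006", NA, "2008", "2009"),
  Publisher = c("Pub X", "Pub Y", NA, "Pub X"),
  stringsAsFactors = TRUE
)

# Count missing values per column.
na_before <- colSums(is.na(toy))
print(na_before)

# Drop every row that has at least one missing value.
toy_clean <- na.omit(toy)

# Drop factor levels that no longer occur after filtering.
toy_clean <- droplevels(toy_clean)

# Count again to confirm that nothing is missing any more.
na_after <- colSums(is.na(toy_clean))
print(na_after)
print(nrow(toy_clean))
```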
Convert year to date
Convert Name to character
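The two conversions above could look roughly like the sketch below. It uses base R only; since lubridate is attached in the original, the author may well have used a lubridate helper instead. The toy data frame, the `year_number` variable, and the choice of 1 January as the date for each year are assumptions made for illustration.

```r
# Toy stand-in with the same column types as the str() output above.
toy <- data.frame(
  Name = factor(c("Wii Sports", "Super Mario Bros.")),
  Year = factor(c("2006", "1985"))
)

# Name: factor to character.
toy$Name <- as.character(toy$Name)

# Year: go through as.character() first. Calling as.numeric() directly on a
# factor would return the internal level codes (1, 2, ...), not the years.
year_number <- as.numeric(as.character(toy$Year))

# Assumption: each year is represented by its first day (1 January).
toy$Year <- as.Date(paste0(year_number, "-01-01"))

str(toy)
```

Going through `as.character()` matters here because the str() output shows Year stored as a factor with 40 levels, so a direct numeric conversion would silently produce values from 1 to 40.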
Since the dataset is already ordered by sales, we only need to plot the first ten rows.
# Top ten games by global sales; rows are already sorted by rank.
ggplot(
  head(vgs, 10),
  aes(
    x=Global_Sales,
    y=reorder(Name, Global_Sales),
    fill=Year
  )
) +
  geom_bar(stat="identity") +
  labs(
    y="Name",
    x="Global Sales"
  )

As seen above, Wii Sports is the top-selling game, with Super Mario Bros. in second position and Mario Kart Wii in third.
The dataset contains sales figures for North America, Europe, Japan, and other regions, as well as overall (global) sales.
# Sum every sales column per release year.
vgs.sales.by.year <- aggregate(
  cbind(Global_Sales, Other_Sales, JP_Sales, EU_Sales, NA_Sales)~Year,
  vgs,
  sum
)
# Reshape to long format (tidyr): one row per year and sales type,
# with the column name in `name` and the total in `value`.
vgs.sales.by.year <- pivot_longer(
  vgs.sales.by.year,
  c('Global_Sales', 'Other_Sales', 'JP_Sales', 'EU_Sales', 'NA_Sales')
)

# One line per sales type.
ggplot(
  vgs.sales.by.year,
  aes(
    x=Year,
    y=value,
    color=factor(name)
  )
) +
  geom_line() +
  labs(
    color='Sales Type'
  )

North America leads in sales and grew significantly in the late 1990s, followed by Europe and Japan. Disclaimer: the drop in the data partly reflects incomplete and missing data in the dataset.
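# Total global sales per publisher, sorted from highest to lowest.
vgs.publisher.sales.clean <- aggregate(
  Global_Sales~Publisher,
  vgs,
  sum
)
vgs.publisher.sales.clean <- vgs.publisher.sales.clean[
  order(vgs.publisher.sales.clean$Global_Sales, decreasing=TRUE),
]
# Assumption: vgs.publisher.sales is not defined in the original. It is taken
# here to be sales per publisher and year, which fill=Year below requires.
vgs.publisher.sales <- aggregate(
  Global_Sales~Publisher+Year,
  vgs,
  sum
)
ggplot(
  vgs.publisher.sales,
  aes(
    x=Global_Sales,
    y=reorder(Publisher, Global_Sales),
    fill=Year
  )
) +
  geom_bar(stat="identity") +
  scale_fill_continuous(low="red", high="blue") +
  # Keep only the ten best-selling publishers on the y axis; rows for all
  # other publishers are dropped, which triggers the warning below.
  scale_y_discrete(limits=head(vgs.publisher.sales.clean, 10)$Publisher) +
  labs(
    y="Publisher",
    x="Global Sales"
  )
## Warning: Removed 2055 rows containing missing values (position_stack).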
Nintendo leads the top ten publishers by sales, with a variety of games spanning the 1980s to the 2000s. This makes sense, since the Wii titles and Super Mario Bros. at the top of the sales chart are Nintendo games.
The most active years for releases were 2008 and 2009, when the number of popular games published exceeded 1,250. The data goes back to the 1980s, with a sudden spike in published games early in the decade.
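The code behind the games-per-year figures is not shown. A minimal sketch of how such a count could be produced in base R follows; the toy data frame and the `games_per_year` name are made up for illustration, and `table()` is an assumption about the approach rather than the original method.

```r
# Toy stand-in: one row per game, with made-up release years.
toy <- data.frame(
  Name = paste("Game", 1:7),
  Year = c(2007, 2008, 2008, 2008, 2009, 2009, 1985)
)

# Count the number of games released in each year.
games_per_year <- as.data.frame(
  table(Year = toy$Year),
  responseName = "Games",
  stringsAsFactors = FALSE
)

# Order from the most to the least active year.
games_per_year <- games_per_year[
  order(games_per_year$Games, decreasing = TRUE),
]
print(games_per_year)
```

On the real data, replacing `toy` with `vgs` would give one row per release year, and the top rows would show the most active years.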
# Global sales per genre; geom_label_repel() comes from the ggrepel package.
ggplot(
  vgs,
  aes(
    x=Genre,
    y=Global_Sales
  )
) +
  geom_jitter(
    alpha=0.1,
    color="red"
  ) +
  # Label only the ten best-selling games.
  geom_label_repel(
    data=head(vgs, 10),
    aes(
      label=Name
    )
  ) +
  labs(
    y="Global Sales",
    x="Genre"
  )

Most games sit under 5 million in sales.
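The observation about the 5 million mark can be put into a number rather than read off the plot. The sketch below shows the idea on made-up values; `toy_sales`, `threshold`, and `share_below` are hypothetical names, and on the real data `vgs$Global_Sales` would take the place of `toy_sales`.

```r
# Toy stand-in: global sales in millions, made up for illustration.
toy_sales <- c(82.74, 40.24, 4.10, 2.50, 0.90, 0.35, 0.12, 0.05)

# Cut-off of 5 million copies, taken from the observation above.
threshold <- 5

# mean() of a logical vector gives the proportion of TRUE values,
# here the share of games that sold fewer than `threshold` million copies.
share_below <- mean(toy_sales < threshold)
print(share_below)
```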