library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.4 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.1 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(readr)
While there is a wealth of online discourse reviewing and ranking videogames across generations of consoles, there seems to be a lack of academic research into one of the most reputable hubs for videogame news and reviews, Metacritic. Further motivation for this research comes from the recent decline in public opinion of the videogame industry and its main actors. There are a couple of research questions I intend to analyze in this project, and I expect to uncover more as I continue.
This research will explore trends in Metacritic reviews compared to user reviews. By examining these trends, we may be able to make predictions about the future vitality of the videogame industry. It could also shed light on the shift in public opinion by determining whether that shift is supported by user and Metacritic reviews.
The data used in this research is openly available on Kaggle, provided by the user Deep Contractor, who originally collected it from the Metacritic website, specifically from a section containing the scores of console and PC games released from 2001 to 2021. The dataset contains 18,800 games and six variables: the name of the game, its platform, its release date, a summary of the game, its Metacritic score, and its user review. For the purposes of my research, I will mainly focus on release date, Metacritic score, and user review.
# Reading in Data
all_games <- read_csv("all_games.csv")
## Rows: 18800 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): name, platform, release_date, summary, user_review
## dbl (1): meta_score
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Summarizing data
summary(all_games)
## name platform release_date summary
## Length:18800 Length:18800 Length:18800 Length:18800
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## meta_score user_review
## Min. :20.00 Length:18800
## 1st Qu.:64.00 Class :character
## Median :72.00 Mode :character
## Mean :70.65
## 3rd Qu.:80.00
## Max. :99.00
# Range of release dates (release_date is a character column, so this is the alphabetical range, not the chronological one)
range(all_games$release_date)
## [1] "April 1, 2001" "September 9, 2021"
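Because `release_date` was read in as a character column, `range()` compares the strings alphabetically rather than by date. A minimal sketch of parsing the strings into `Date` values first, using a small made-up vector in place of the real column (the name `release_chr` and its values are illustrative assumptions, and an English locale is assumed for the month names):

```r
# Toy stand-in for all_games$release_date (made-up values, not from the dataset)
release_chr <- c("September 9, 2021", "April 1, 2001", "November 23, 1998")

# Character comparison is alphabetical, so this is not a chronological range
range(release_chr)

# Parse "Month day, year" strings into Date objects; %B matches full month
# names in the current locale, so an English locale is assumed here
release_parsed <- as.Date(release_chr, format = "%B %d, %Y")

# Chronological range of the parsed dates
range(release_parsed)
```

Applied to the real data, the same `as.Date()` call on `all_games$release_date` should give a chronological range, provided every entry follows the "Month day, year" pattern. Entries that do not match the format come back as `NA` and are worth checking.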
# Mean Metacritic score
mean(all_games$meta_score)
## [1] 70.64888
# Converting user review from a character vector to a numeric vector
user_score <- as.numeric(all_games$user_review)
## Warning: NAs introduced by coercion
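The coercion warning above means that some `user_review` entries are not numeric and were turned into `NA`. A minimal sketch of checking how many entries were affected and what they contained, using a made-up vector (the names `review_chr` and `review_num` and the placeholder value `"tbd"` are assumptions for illustration, not confirmed contents of the dataset):

```r
# Toy stand-in for all_games$user_review (made-up values; "tbd" is a
# hypothetical non-numeric placeholder)
review_chr <- c("9.1", "7.4", "tbd", "8.0", "tbd")

# Convert to numeric; the warning is suppressed because the NAs are
# inspected explicitly below
review_num <- suppressWarnings(as.numeric(review_chr))

# Number of entries that could not be converted
sum(is.na(review_num))

# The distinct original strings that failed to convert
unique(review_chr[is.na(review_num)])
```

Running the same two checks on `all_games$user_review` would show how many games are dropped by `na.rm = TRUE` in the summaries that follow.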
# Calculating mean user review
mean(user_score, na.rm = TRUE)
## [1] 6.990846
# Standard deviation and variance of Metacritic score
sd(all_games$meta_score)
## [1] 12.22501
var(all_games$meta_score)
## [1] 149.4508
# Standard deviation and variance of user reviews
sd(user_score, na.rm = TRUE)
## [1] 1.351554
var(user_score, na.rm = TRUE)
## [1] 1.826698
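The two sets of summary statistics above are on different scales: Metacritic scores run from 0 to 100, while user reviews run from 0 to 10. A minimal sketch of putting them on a common scale before comparing them, using a small made-up data frame (the values, the placeholder `"tbd"`, and the names `toy_games`, `user_score_100`, and `score_gap` are illustrative assumptions, not part of the dataset or the original analysis):

```r
# Toy stand-in for the two score columns (made-up values)
toy_games <- data.frame(
  meta_score  = c(85, 72, 60),
  user_review = c("8.1", "tbd", "7.5"),
  stringsAsFactors = FALSE
)

# Convert user reviews to numeric; non-numeric entries become NA
toy_games$user_score <- suppressWarnings(as.numeric(toy_games$user_review))

# Rescale user scores from 0-10 to 0-100 so both columns share a scale
toy_games$user_score_100 <- toy_games$user_score * 10

# Critic score minus rescaled user score, per game
toy_games$score_gap <- toy_games$meta_score - toy_games$user_score_100

# Mean gap across the games that have a numeric user score
mean(toy_games$score_gap, na.rm = TRUE)
```

A positive mean gap would indicate that critics score games higher than users on average, and a negative one the reverse.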