library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.4     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   2.0.1     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(readr)  # redundant: readr is already attached by library(tidyverse)

Research Question

While there is a wealth of online discourse reviewing and ranking videogames across generations of consoles, there seems to be little academic research on Metacritic, one of the most reputable hubs for videogame news and reviews. Further motivation for this research comes from the recent decline in public opinion of the videogame industry and its main actors. I intend to analyze a couple of research questions in this project, and I expect to uncover more as I continue.

Hypothesis

This research will explore trends in Metacritic scores compared to user reviews. By examining these trends, we may be able to make predictions about the future vitality of the videogame industry. It could also shed light on the shift in public opinion by showing whether that shift is reflected in user and Metacritic scores.

Descriptive Statistics

The data used in this research is publicly available on Kaggle, provided by the user Deep Contractor. He originally collected the data from the Metacritic website, specifically a section containing the scores of console and PC games released from 2001 to 2021. The dataset contains a total of 18,800 games with six variables: the name of the game, its platform, its release date, a summary of the game, its Metacritic score, and its user review score. For the purposes of my research, I will mainly focus on release date, Metacritic score, and user review score.

# Reading in Data
all_games <- read_csv("all_games.csv")
## Rows: 18800 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): name, platform, release_date, summary, user_review
## dbl (1): meta_score
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Summarizing data
summary(all_games)
##      name             platform         release_date         summary         
##  Length:18800       Length:18800       Length:18800       Length:18800      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    meta_score    user_review       
##  Min.   :20.00   Length:18800      
##  1st Qu.:64.00   Class :character  
##  Median :72.00   Mode  :character  
##  Mean   :70.65                     
##  3rd Qu.:80.00                     
##  Max.   :99.00
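# Range of release dates (release_date is a character column, so this is the
# alphabetical range of the strings, not the earliest and latest dates)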
range(all_games$release_date)
## [1] "April 1, 2001"     "September 9, 2021"
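Because `release_date` is read as a character column (see the column specification above), `range()` compares the strings alphabetically, which is why the result runs from "April" to "September". The following is a minimal sketch of a chronological range. It uses a small hypothetical vector in place of the real column and assumes every value follows the "Month day, year" pattern with English month names; the sample dates are made up for illustration and say nothing about the actual dataset.

```r
# Hypothetical sample in the same "Month day, year" format as release_date
sample_dates <- c("September 9, 2021", "April 1, 2001", "November 23, 1998")

# Character comparison is alphabetical, so the "April ..." string sorts first
range(sample_dates)

# Parse to Date before taking the range
# (assumes English month names in the current locale)
parsed_dates <- as.Date(sample_dates, format = "%B %d, %Y")

# Any value that fails to parse becomes NA, so check before trusting the range
sum(is.na(parsed_dates))

date_range <- range(parsed_dates, na.rm = TRUE)
date_range
```

Applied to the real data, the same idea would be `as.Date(all_games$release_date, format = "%B %d, %Y")`, provided the format assumption holds for every row.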
# Mean Metacritic score
mean(all_games$meta_score)
## [1] 70.64888
# Converting user review from a character vector to numeric
# (entries that are not valid numbers become NA, hence the warning below)
user_score <- as.numeric(all_games$user_review)
## Warning: NAs introduced by coercion
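The coercion warning means that some `user_review` entries are not numbers. Before dropping them with `na.rm = TRUE`, it may be worth checking what they are and how many there are. The sketch below uses a hypothetical toy vector; the `"tbd"` placeholder is an assumption about what a non-numeric entry might look like, not a value confirmed from the dataset.

```r
# Hypothetical stand-in for all_games$user_review
review_chr <- c("9.1", "7.4", "tbd", "8.0")

# Convert to numeric; suppressWarnings() hides the expected coercion warning
review_num <- suppressWarnings(as.numeric(review_chr))

# Entries that were present as text but could not be converted
failed_values <- unique(review_chr[is.na(review_num) & !is.na(review_chr)])
failed_values

# Number of scores lost to coercion
n_lost <- sum(is.na(review_num))
n_lost
```

Knowing the count matters here because every statistic computed with `na.rm = TRUE` describes only the games that have a numeric user score.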
# Calculating mean user review
mean(user_score, na.rm = TRUE)
## [1] 6.990846
# Standard deviation and variance of Metacritic score
sd(all_games$meta_score)
## [1] 12.22501
var(all_games$meta_score)
## [1] 149.4508
# Standard deviation and variance of user reviews
sd(user_score, na.rm = TRUE)
## [1] 1.351554
var(user_score, na.rm = TRUE)
## [1] 1.826698
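The two scores are on different scales: Metacritic scores run from 0 to 100 and user scores from 0 to 10. Multiplying the user figures above by 10 gives a mean of about 69.9 and a standard deviation of about 13.5, which can be set beside 70.65 and 12.23 for the Metacritic score. Note that the user figures cover only games with a numeric user score. The sketch below shows one possible way to make the comparison on the same set of games; the data are hypothetical and the variable names are chosen for illustration only.

```r
# Hypothetical scores standing in for meta_score and the numeric user score
critic <- c(91, 72, 64, 80)
user   <- c(8.7, 7.5, NA, 6.9)

# Put user scores on the 0-100 scale used by the critic scores
user_100 <- user * 10

# Keep only games that have a user score so both columns
# describe the same set of games
has_user <- !is.na(user_100)

comparison <- data.frame(
  source = c("critic", "user"),
  mean   = c(mean(critic[has_user]), mean(user_100[has_user])),
  sd     = c(sd(critic[has_user]),   sd(user_100[has_user]))
)
comparison
```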