Introduction

As an avid video gamer, I often rely on Metacritic to help me decide which games are worth buying. For each video game, Metacritic reports an aggregated score (the Metascore) that is a weighted average of the individual critic scores. An example page for the critically acclaimed Red Dead Redemption 2 can be viewed here: https://www.metacritic.com/game/playstation-4/red-dead-redemption-2

Metacritic contains an extensive library of reviewer scores, dating all the way back to the 90s, when the Nintendo 64 and the first Sony PlayStation were the state-of-the-art consoles. Some of the most debated topics in video gaming include:

  1. Who won the console war in each generation (i.e., did Nintendo, Sony, or Microsoft have the most popular console)?
  2. How has the popularity of video game genres evolved over time?
  3. What are the best games of all time for each console, genre, and developer?
  4. How differently are games for each console, genre, and developer reviewed (i.e. well, poorly, polarizing)?
  5. Is there a reviewer bias in games for certain consoles, genres, or developers? For example, it is rumored that GameSpot gives lower scores to Nintendo games than other top reviewers, such as IGN.

What if I could help settle these debates by scraping Metacritic for every single video game review and creating a Shiny app to visualize and uncover trends?

Web Scraping

I used scrapy to pull information for every video game review on Metacritic. This included fields specific to each game (name, console, developer, genre, and release date) and fields specific to each individual review (reviewer and score). Each of these fields was defined in the “items.py” script.
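
A minimal sketch of what such an item definition could look like, written as a plain dataclass (recent Scrapy versions accept dataclass objects as items, but the actual “items.py” may well use scrapy.Item instead). The field names follow the columns used later in preprocess.R; the class name and the sample values are made up for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass
class ReviewItem:  # hypothetical name, not taken from the original items.py
    # Fields describing the game (repeated on every review of that game)
    game: str
    console: str
    developer: str
    genre: str
    release_date: str
    # Fields describing one individual review
    reviewer: str
    score: int


# Made-up sample values, only to show the shape of one scraped row
example = ReviewItem(
    game="Example Game", console="n64", developer="ExampleDev",
    genre="Action, Platformer", release_date="01/31/2000",
    reviewer="Example Reviewer", score=85,
)
row = asdict(example)  # one flat dict per review, ready to be written to CSV
```

Keeping game-level and review-level fields in one flat record means every review row carries its game's metadata, which is what makes the later group-by operations straightforward.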

The next step, and the bulk of the work, was creating the spider. Thankfully, Metacritic did not have any security restrictions, so it was relatively easy to scrape. I constructed my spider in the following steps, checking each one with Chrome developer tools and scrapy shell to find the correct XPaths:

  1. In start_requests, initialize a starting URL for each console. Metacritic has a page listing the first 50 games for each console (ps, n64, etc.), sorted descending by aggregated review score (top games appear first). Yield the URL and also the console name (as a meta variable) to the next function, since we’ll need the console name at the end.
  2. In parse, I build the URL for each individual result page. Each console has N result pages, where N depends on the number of games reviewed for that console. Yield each URL and continue passing on the console name as meta.
  3. In parse_result_page, I go one level deeper by finding the URL for each individual game. Yield the URL and continue passing console name as meta.
  4. In parse_detail_page, I scrape the page yielded by parse_result_page for the name of the game, release date, developer, and genre. Then I find the URL to go one level deeper, which is the page that lists each individual review. Yield this URL, and pass on the console name, game name, release date, developer, and genre as meta.
  5. In parse_review_page, I scrape the reviewer and score for every single review. This is the last function, and all fields specified in items.py are yielded at the end.

In summary, the scraping flow is: Metacritic home page >> Page listing all games for each console >> Page with information for each game >> Page with information for each individual review
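
The callback chain described above can be modeled without any network access. The sketch below is a toy stand-in, not the actual spider and not Scrapy itself: the Request tuple, the FAKE_SITE dictionary, its URLs, and all sample values are hypothetical. It only illustrates how the console name and game details travel down the chain as meta until the last callback emits one record per review.

```python
from collections import deque
from typing import Callable, NamedTuple


class Request(NamedTuple):  # toy stand-in for a scrapy request
    url: str
    callback: Callable
    meta: dict


# Made-up miniature "site": URL -> already-parsed page content
FAKE_SITE = {
    "/browse/n64": {"result_pages": ["/browse/n64?page=0"]},
    "/browse/n64?page=0": {"game_urls": ["/game/n64/example-game"]},
    "/game/n64/example-game": {
        "name": "Example Game", "release_date": "01/31/2000",
        "developer": "ExampleDev", "genre": "Action, Platformer",
        "reviews_url": "/game/n64/example-game/critic-reviews",
    },
    "/game/n64/example-game/critic-reviews": {
        "reviews": [("Reviewer A", 90), ("Reviewer B", 80)],
    },
}


def start_requests():
    # Step 1: one starting URL per console, console name stored in meta
    for console in ["n64"]:
        yield Request(f"/browse/{console}", parse, {"console": console})


def parse(page, meta):
    # Step 2: one request per result page, meta passed along unchanged
    for url in page["result_pages"]:
        yield Request(url, parse_result_page, meta)


def parse_result_page(page, meta):
    # Step 3: one request per game listed on the result page
    for url in page["game_urls"]:
        yield Request(url, parse_detail_page, meta)


def parse_detail_page(page, meta):
    # Step 4: add the game-level fields to meta, then go to the review page
    details = {k: page[k] for k in ("name", "release_date", "developer", "genre")}
    yield Request(page["reviews_url"], parse_review_page, {**meta, **details})


def parse_review_page(page, meta):
    # Step 5: emit one complete record per individual review
    for reviewer, score in page["reviews"]:
        yield {**meta, "reviewer": reviewer, "score": score}


def crawl():
    # Tiny scheduler: follow requests until only finished records remain
    queue, items = deque(start_requests()), []
    while queue:
        out = queue.popleft()
        if isinstance(out, Request):
            queue.extend(out.callback(FAKE_SITE[out.url], out.meta))
        else:
            items.append(out)
    return items


items = crawl()
```

In a real spider the scheduler, downloading, and XPath extraction are handled by Scrapy; only the shape of the five callbacks carries over.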

Since there were so many reviews (over 25 million!), the spider took a few hours to crawl.

All scrapy files can be viewed on GitHub here in folder “metacritic”: https://github.com/nwtang/NYCDSA_scrapy

Data Preparation

The scraped data was thankfully relatively clean and easy to work with. I did some simple preprocessing to pull just the year from the release date for each entry and save it as its own field, and to rename the console names to a more readable form (e.g. “ps” to “PlayStation”). The developer field required a bit more thinking and even some “domain knowledge” - some big developers have merged or changed names over the years (e.g. Bandai and Namco, SquareSoft to Square Enix), so I did a manual replacement for these.
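
The actual preprocessing is done in R (see preprocess.R at the end of this post). Purely as an illustration of the three rules described here, the following Python sketch applies them to a single row. The function name, the shortened lookup tables, and the sample row are hypothetical; the date format is assumed to be month/day/year, matching the format string used in preprocess.R.

```python
from datetime import datetime

# Shortened lookup tables; the full mappings live in preprocess.R
CONSOLE_NAMES = {"ps": "Playstation", "n64": "N64", "wii-u": "Wii U"}
DEVELOPER_GROUPS = [
    ("Bandai", "BandaiNamcoGames"),
    ("Namco", "BandaiNamcoGames"),
    ("SquareSoft", "SquareEnix"),
]


def clean_row(row, date_format="%m/%d/%Y"):
    """Return a copy of row with a year field, a readable console name,
    and a consolidated developer name."""
    cleaned = dict(row)
    # Keep only the year of the release date as its own field
    cleaned["year"] = datetime.strptime(row["release_date"], date_format).year
    # Unknown console codes are left unchanged
    cleaned["console"] = CONSOLE_NAMES.get(row["console"], row["console"])
    # Any developer name containing a known fragment is merged into one label
    for fragment, canonical in DEVELOPER_GROUPS:
        if fragment in row["developer"]:
            cleaned["developer"] = canonical
            break
    return cleaned


# Made-up sample row
cleaned = clean_row(
    {"release_date": "01/31/2000", "console": "ps", "developer": "NamcoExample"}
)
```

Substring matching is what lets several spellings of one company collapse into a single label, but it can also catch unrelated names that happen to contain the fragment, so the fragments need to be chosen with care.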

The genre field was the trickiest, since Metacritic often lists multiple genres for each game. My goal was to pull out the most representative genre out of all listed genres. I noticed that some genres were more specific than others (e.g. the more specific “Fighting” genre may be listed with the more general “Action” genre), so for each game I tried to pull the more specific genres from the list first. Other examples of this include “Shooter”, “Sports”, “Platformer”, etc. This worked in classifying the majority of games, but some remained unclassified because they were only listed with the more general genres (“Action”, “Action-Adventure”, “Role-Playing”, etc.). For these games, I used the following hierarchy for assignment: first “Action RPG”, then “Role-Playing”, then “Action-Adventure”, then “Action”. The handful of games still unclassified after that were assigned to “Action”. The last step was consolidating some of the genres together (e.g. “Beat-em-up” was reclassified as “Fighting”).
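
The selection rule can be sketched as a small function. This is a Python illustration of the logic only (the real implementation is the case_when block in preprocess.R); the function name is hypothetical, the SPECIFIC and CONSOLIDATE tables are shortened, and the fallback order follows the comment in preprocess.R.

```python
# Shortened versions of the tables in preprocess.R
SPECIFIC = {"Platformer", "Shooter", "Fighting", "Beat-'Em-Up",
            "Sports", "Racing", "Driving", "Puzzle"}
FALLBACK_ORDER = ["Action RPG", "Role-Playing", "Action Adventure", "Action"]
CONSOLIDATE = {"Beat-'Em-Up": "Fighting", "Driving": "Racing"}


def pick_genre(listed):
    """Pick one representative genre from the genres listed for a game."""
    listed = [g.strip() for g in listed]
    # 1. Prefer the first listed genre that belongs to the specific set
    chosen = next((g for g in listed if g in SPECIFIC), None)
    # 2. Otherwise walk the general genres in priority order,
    #    defaulting to "Action" when nothing matches
    if chosen is None:
        chosen = next((g for g in FALLBACK_ORDER if g in listed), "Action")
    # 3. Merge near-duplicate genres into one label
    return CONSOLIDATE.get(chosen, chosen)
```

For example, pick_genre(["Action", "Fighting"]) returns "Fighting", while a game listed only under "Action" and "Action Adventure" falls through to "Action Adventure".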

Shiny App and Data Visualization

To look into the questions I listed in the Introduction section, I created a Shiny app to provide an easy-to-use UI for data visualization. Let’s use each of the five tabs to explore:

1. Who won the console war in each generation?

For this question, I used a filled timeseries plot showing the fraction of total reviews for each console by year. This helps visualize the evolution of how many games are being released and reviewed over time.
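
The quantity on the y-axis is simply each console's review count in a year divided by the total review count for that year. A minimal Python sketch of that calculation follows (the app itself computes it with dplyr in server.R); the function name and the toy input are made up.

```python
from collections import Counter


def review_share(reviews):
    """reviews: iterable of (year, console) pairs, one pair per review.
    Returns {(year, console): fraction of that year's reviews}."""
    reviews = list(reviews)
    per_year = Counter(year for year, _ in reviews)  # total reviews per year
    per_pair = Counter(reviews)                      # reviews per (year, console)
    return {(year, console): n / per_year[year]
            for (year, console), n in per_pair.items()}


# Toy input, not real data
toy = [(2000, "N64"), (2000, "Playstation"), (2000, "Playstation"),
       (2001, "Playstation 2")]
shares = review_share(toy)
```

Because the fractions within each year sum to 1, the stacked areas always fill the plot, which is why this chart shows relative rather than absolute review volume.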

We can make the following observations here:

  1. PC has always been relevant in the gaming world, and had a second “prime” from 2013-2016, possibly because both Sony and Microsoft were transitioning between consoles while Nintendo was stuck with the weak Wii U.

  2. Sony has had the most consistent presence in the gaming industry, with its PlayStation consoles taking at least 25% of total reviews year by year. Based on this plot, I’d say that Sony won the 2nd gen console war with the PS2, and so far is winning the 4th gen console war with the PS4. It is neck-and-neck in the 1st and 3rd gens with the N64 and Xbox 360, respectively.

  3. Nintendo has been up and down with the fraction of total reviews, and seems to have less of a presence than Sony and Microsoft. This is likely due to Nintendo’s tendency to favor a general audience with innovation and ease of use (such as motion controls with the Wii and TV/handheld flexibility with the Switch) over more hardcore gamers, causing big name games to only release on Sony and Microsoft consoles.

  4. I would conclude here that this plot is useful to visualize the evolution of gaming over time, but we would have to look at overall console sales to determine who won the console war. The total number of reviews is a better measure of the quantity of games/reviews for each console, but does not necessarily correlate with sales.

2. How has the popularity of video game genres evolved over time?

I used the same kind of plot to look into this question, but with the genre field as the fill parameter rather than console.

This plot is a bit more complicated to look at, so I’ll walk through a few observations I made here:

  1. The Action Adventure genre made a huge spike in 1998 with the release of Legend of Zelda: Ocarina of Time, which led to more and more games of the same playstyle. This genre has seen more and more games over the years with the introduction of “sandbox” style games in the mid 2000s, such as GTA and The Elder Scrolls series.

  2. Another genre that made a huge leap in the late 90s was Shooters. GoldenEye 007 (N64) was released in 1997. It was not the first FPS (first-person shooter), since PC titles such as Wolfenstein 3D and Doom came years earlier, but it showed that the genre could thrive on consoles. This helped FPSes become one of the most popular genres in the market (e.g. Halo, Call of Duty, etc.).

  3. Sports games made a huge leap in 2000 with the introduction of franchise modes in Madden, NBA Live, and FIFA, where a user can manage their own team over a 50-year simulation. Previously sports games only had one mode, which was just playing the matches themselves.

  4. RPGs have shifted more towards Action and Western and away from Japanese in recent years. JRPGs peaked in the late 90s with the release of Final Fantasy VII among other classics, and have died down in popularity in the US due to players favoring more action and open-world gameplay.

3. What are the best games of all time for each console, genre, and developer?

For this question, I simply created a DataTable listing the top-reviewed games, which can be filtered by year, console, genre, and developer. The user of the app is free to select their favorite console, genre, or developer, or any combination of the three. The table below shows the top 10 ranked games for the N64.

How does this offer an upgrade over what you can already view on Metacritic? While Metacritic is able to show a list of top ranked games by console, the user is not able to view an entire list of games from selected consoles, and is not able to filter the list by genre and developer. This table gives the user more flexibility in exploring top ranked games.
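
Behind the table is a plain group-and-average: reviews are filtered, averaged per game, and sorted by the average. A Python sketch of that idea is shown here (the app does this with dplyr in server.R); the function name and the toy reviews are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rank_games(reviews, console=None):
    """Average the review scores per (game, console) and sort descending.
    Passing console=None keeps all consoles."""
    scores = defaultdict(list)
    for r in reviews:
        if console is None or r["console"] == console:
            scores[(r["game"], r["console"])].append(r["score"])
    ranked = [(game, con, round(mean(s), 1)) for (game, con), s in scores.items()]
    return sorted(ranked, key=lambda t: t[2], reverse=True)


# Toy reviews, not real data
toy_reviews = [
    {"game": "Game A", "console": "N64", "score": 90},
    {"game": "Game A", "console": "N64", "score": 80},
    {"game": "Game B", "console": "N64", "score": 95},
    {"game": "Game C", "console": "PC", "score": 70},
]
top_n64 = rank_games(toy_reviews, console="N64")
```

Note that this is an unweighted mean of the scraped scores, so the ranking can differ slightly from the aggregated score shown on Metacritic itself.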

4. How differently are games for each console, genre, and developer reviewed (i.e. well, poorly, polarizing)?

To look into this question, I used boxplots to compare review scores across consoles, genres, developers, or reviewers (the user selects the variable plotted on the x-axis). Let’s first take a look at boxplots for all consoles:

One observation here is that Wii games were panned by reviewers compared to other consoles. But wasn’t the Wii one of the most successful consoles due to its appeal to casual gamers and good selection of games for more hardcore gamers? Let’s try to filter games by developer to weed out all of the “experimental” games that may have tanked the Wii’s average rating:

Here, I selected the top four Japanese developers known to publish games on Nintendo consoles. The Wii looks a lot more even with other consoles here, suggesting that the odd result we saw from the first plot may be due to smaller developers releasing experimental games targeted towards general audiences (such as horse breeding or fitness games) that ended up getting panned by reviewers.

Let’s take a look at genres now:

It’s clear that RPGs of all kinds are the highest rated, which is rewarding for the developers since RPGs tend to be the most complicated games that require the longest development period. Simulation and Action games tend to be the lowest rated; these genres also tend to have the highest fraction of “experimental” games.

Lastly, let’s pit some rival developers against each other:

To no surprise, Rockstar (with GTA and Red Dead Redemption) and Nintendo are way ahead of the pack.

5. Is there a reviewer bias in games for certain consoles, genres, or developers?

I used a scatterplot of review scores from Reviewer 1 vs. Reviewer 2 (e.g. GameSpot vs. IGN) to visualize trends here, and made reviews filterable by console, genre, and developer. The user is also able to change the color of points based on their selection of field. Let’s test some hypotheses, such as the rumor that GameSpot is biased against Nintendo. But first we should visualize a “control” scenario by plotting all GameSpot vs. IGN reviews against each other to confirm that on average, they seem to agree with each other:

The line running through the middle of the points represents a 1-to-1 line. Visually it looks like GameSpot is a bit tougher than IGN (with most points falling below the 1-1 line, i.e. IGN tends to review games more favorably). Now let’s take a look at games developed by Nintendo only, and color by console:

Now this is interesting - it is clear that GameSpot tends to review Nintendo games more poorly than IGN. Is this just a recent trend, or has GameSpot always reviewed Nintendo games like this? We could use the year slider here, but I think it’s easier to interpret what we already have with the points colored by console. It looks like GameSpot lightened up on Nintendo games a bit during the Wii era, but has continued to be tough during all other generations. How about by genre?

GameSpot seems to be more favorable towards Nintendo’s “Miscellaneous” and “Platformer” games (maybe Mario and Animal Crossing?). Most or all games from other genres still tend to be rated higher by IGN. So based on our findings here, we can support the rumor that GameSpot tends to be tougher on Nintendo games than IGN.
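
The visual comparison against the 1-to-1 line could also be summarized with two numbers: the mean score difference between the two reviewers on games they both reviewed, and the share of those games where one reviewer scored lower. The sketch below is one hypothetical way to compute this in Python; the function name, the reviewer labels, and the toy scores are made up, and it is not part of the app.

```python
from statistics import mean


def compare_reviewers(reviews, rev_a, rev_b):
    """Pair up scores from two reviewers on the same (game, console)."""
    by_game = {}
    for r in reviews:
        key = (r["game"], r["console"])
        by_game.setdefault(key, {})[r["reviewer"]] = r["score"]
    # Keep only games reviewed by both, as on the scatterplot
    diffs = [s[rev_a] - s[rev_b] for s in by_game.values()
             if rev_a in s and rev_b in s]
    if not diffs:
        return None
    return {
        "n_games": len(diffs),
        "mean_diff": mean(diffs),  # negative: rev_a scores lower on average
        "share_a_lower": sum(d < 0 for d in diffs) / len(diffs),
    }


# Toy scores, not real data
toy_scores = [
    {"game": "G1", "console": "Wii", "reviewer": "Reviewer A", "score": 70},
    {"game": "G1", "console": "Wii", "reviewer": "Reviewer B", "score": 80},
    {"game": "G2", "console": "Wii", "reviewer": "Reviewer A", "score": 90},
    {"game": "G2", "console": "Wii", "reviewer": "Reviewer B", "score": 85},
    {"game": "G3", "console": "N64", "reviewer": "Reviewer A", "score": 65},
    {"game": "G3", "console": "N64", "reviewer": "Reviewer B", "score": 75},
    {"game": "G4", "console": "N64", "reviewer": "Reviewer A", "score": 50},
]
summary = compare_reviewers(toy_scores, "Reviewer A", "Reviewer B")
```

Running the same summary on the full set of shared games and on the Nintendo-only subset would show whether the gap is really larger for Nintendo titles or just reflects one reviewer being tougher overall.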

Future Work

If I had more time, I would be interested in developing a model that predicts how a reviewer would receive a future game given the genre and developer. I would also dig into how to obtain additional features such as total sales and revenue for each game, total amount spent on advertising, and price history. Finally, I would run quantitative analyses (e.g. regression, correlation) on the existing data to bolster the arguments that currently rest on visualization alone.

Closing

The examples in this write-up barely scrape the surface of what users may uncover about the history of video gaming reviews. Metacritic is a great resource for digging into these topics, and scraping the site for all reviews allows us to present the data with more flexibility. Thank you for reading, and I encourage you to play around with the app to discover your own insights!

R Code

preprocess.R: takes raw scrapy output and formats it for the Shiny app.

library(dplyr)
library(tidyr)
library(stringr)
library(lubridate)

metacritic <- read.csv(file="./metacritic_reviews.csv")

# Create column for year, parse genre into three columns
metacritic <- metacritic %>% mutate(year=lubridate::year(as.POSIXct(release_date,format="%m/%d/%Y"))) %>%
  separate(genre,c("genre1","genre2","genre3"),",") %>% mutate(genre2=trimws(genre2),genre3=trimws(genre3)) %>%
  filter(year>1995)

# Assign a more general genre based on the following rules:
# Anything in genres_lv1 gets assigned if it's found in the three columns
# After that, assignments go in the following order if found: Action RPG, RPG, Action Adventure, Action
# Only 8 NAs are left after this, all can be assigned to Action
genres_lv1 <- c("Platformer","Shooter","Fighting","Beat-'Em-Up","Horror","Rhythm","Simulation",
                "Strategy","Racing","Driving","Sports","Miscellaneous","Adventure",
                "Massively Multiplayer Online","Massively Multiplayer","MOBA","Western-Style",
                "Japanese-Style","Console-style RPG","PC-style RPG","Puzzle")
# case_when() uses the first condition that is TRUE, so each rule below only
# applies to rows that none of the earlier rules classified
metacritic <- metacritic %>% mutate(genre_final = case_when(
  genre1 %in% genres_lv1 ~ genre1,
  genre2 %in% genres_lv1 ~ genre2,
  genre3 %in% genres_lv1 ~ genre3,
  genre1 == "Action RPG" | genre2 == "Action RPG" | genre3 == "Action RPG" ~ "Action RPG",
  genre1 == "Role-Playing" | genre2 == "Role-Playing" | genre3 == "Role-Playing" ~ "Role-Playing",
  genre1 == "Action Adventure" | genre2 == "Action Adventure" | genre3 == "Action Adventure" ~
    "Action Adventure",
  genre1 == "Action" | genre2 == "Action" | genre3 == "Action" ~ "Action"
))
metacritic$genre_final <- metacritic$genre_final %>% replace_na("Action")

# Rename some of the genres for grouping
metacritic <- metacritic %>% select(-genre1,-genre2,-genre3,genre=genre_final)
metacritic$genre <- recode(metacritic$genre, 
                           "Beat-'Em-Up"="Fighting",
                           "Massively Multiplayer Online"="Strategy",
                           "Massively Multiplayer"="Strategy",
                           "MOBA"="Strategy",
                           "Console-style RPG"="Japanese RPG",
                           "Japanese-Style"="Japanese RPG",
                           "Driving"="Racing",
                           "Western-Style"="Western RPG",
                           "PC-style RPG"="Western RPG",
                           "Role-Playing"="Western RPG",
                           "Rhythm"="Miscellaneous")

# Reformat console names
metacritic$console <- recode(metacritic$console,
                             "pc"="PC","n64"="N64","gamecube"="Gamecube","wii"="Wii","wii-u"="Wii U",
                             "switch"="Switch","dreamcast"="Dreamcast","ps"="Playstation",
                             "ps2"="Playstation 2","ps3"="Playstation 3",
                             "ps4"="Playstation 4","xbox"="Xbox",
                             "xbox360"="Xbox 360","xboxone"="Xbox One")

# Group duplicate developer names for big developers
dev_replace <- function(df,detected,replaced) {
  return(df %>% mutate(developer=replace(developer,str_detect(developer,detected),replaced)))
}

metacritic <- metacritic %>% 
  dev_replace("Nintendo","Nintendo") %>% 
  dev_replace("Bandai","BandaiNamcoGames") %>%
  dev_replace("Namco","BandaiNamcoGames") %>%
  dev_replace("EA","ElectronicArts") %>%
  dev_replace("Rockstar","RockstarGames") %>%
  dev_replace("Ubisoft","Ubisoft") %>%
  dev_replace("SquareEnix","SquareEnix") %>%
  dev_replace("SquareSoft","SquareEnix") %>%
  dev_replace("Sony","SonyInteractiveEntertainment") %>%
  dev_replace("SCE","SonyInteractiveEntertainment") %>%
  dev_replace("Sega","Sega") %>%
  dev_replace("Microsoft","MicrosoftGameStudios") %>%
  dev_replace("Konami","Konami") %>%
  dev_replace("Koei","Koei") %>%
  dev_replace("Capcom","Capcom") %>%
  dev_replace("Atlus","Atlus")

write.csv(metacritic,file="metacritic_reviews_final.csv")

ui.R

library(shiny)
library(shinydashboard)
library(dplyr)
library(tidyr)
library(ggplot2)
library(DT)


years <- c(1996:2019)
consoles <- c("PC","N64","Gamecube","Wii","Wii U",
              "Switch","Dreamcast","Playstation",
              "Playstation 2","Playstation 3",
              "Playstation 4","Xbox",
              "Xbox 360","Xbox One")
genres <- c("Action","Action Adventure","Adventure",
            "Fighting","Platformer","Shooter",
            "Horror","Action RPG",
            "Western RPG","Japanese RPG","Strategy",
            "MMO","Sports","Racing","Simulation",
            "Puzzle","Miscellaneous")
metacritic <- read.csv(file="./metacritic_reviews_final.csv")
developers <- metacritic %>% group_by(developer) %>% summarise(n=n()) %>% 
  arrange(desc(n)) %>% filter(n>850) %>% mutate_if(is.factor,as.character) %>% .$developer
reviewers <- metacritic %>% group_by(reviewer) %>% summarise(n=n()) %>% 
  arrange(desc(n)) %>% filter(n>1400) %>% mutate_if(is.factor,as.character) %>% .$reviewer

shinyUI(dashboardPage(
  dashboardHeader(title = "Metacritic Games"),
  dashboardSidebar(sidebarMenu(
    menuItem("Consoles over the Years",tabName="tsconsole",icon=icon("gamepad")),
    menuItem("Genres over the Years",tabName="tsgenre",icon=icon("shapes")),
    menuItem("Video Games Ranked",tabName="rankings",icon=icon("list-ol")),
    menuItem("Boxplots of Review Scores",tabName="boxplots",icon=icon("box")),
    menuItem("Reviewer vs. Reviewer Plots",tabName="scatter",icon=icon("chart-bar"))
  )),
  dashboardBody(
    tabItems(
      tabItem(tabName = "tsconsole",
              box(title="Percentage of Reviews by Console",status="primary",solidHeader=TRUE,
                  plotOutput("ts_console",height=800),width=12)
              ),
      tabItem(tabName = "tsgenre",
              box(title="Percentage of Reviews by Genre",status="primary",solidHeader=TRUE,
                  plotOutput("ts_genre",height=800),width=12)
      ),
      tabItem(tabName = "rankings",
              box(title="Filters",status="info",solidHeader=TRUE,
                  sliderInput("rank_year",label=h5(strong("Year")),min=1996,max=2019,
                              value=c(1996,2019),sep=""),
                  selectizeInput(inputId="rank_console",
                                 label="Console",
                                 choices=c("All",consoles)),
                  selectizeInput(inputId="rank_genre",
                                 label="Genre",
                                 choices=c("All",genres)),
                  selectizeInput(inputId="rank_dev",
                                 label="Developer",
                                 choices=c("All",developers),
                                 multiple=TRUE,
                                 options=list(maxItems=1),
                                 selected="All"),
                  width=3),
              box(title="Rankings",status="primary",solidHeader=TRUE,dataTableOutput("rank_table"),
                  width=9)
              ),
      tabItem(tabName = "boxplots",
              box(title="Filters",status="info",solidHeader=TRUE,
                  radioButtons("box_x","X-Axis Variable:",
                               c("Console"="con",
                                 "Genre"="gen",
                                 "Developer"="dev",
                                 "Reviewer"="rev")),
                  sliderInput("box_year",label=h5(strong("Year")),min=1996,max=2019,
                              value=c(1996,2019),sep=""),
                  selectizeInput(inputId="box_console",
                                 label="Console",
                                 choices=c(consoles),
                                 multiple=TRUE,
                                 options=list(maxItems=length(consoles),placeholder="Leave blank to plot all consoles")
                                 ),
                  selectizeInput(inputId="box_genre",
                                 label="Genre",
                                 choices=c(genres),
                                 multiple=TRUE,
                                 options=list(maxItems=length(genres),placeholder="Leave blank to plot all genres")
                                 ),
                  selectizeInput(inputId="box_dev",
                                 label="Developer",
                                 choices=c(developers),
                                 multiple=TRUE,
                                 options=list(maxItems=10,placeholder="Leave blank to plot only top developers")
                                 ),
                  selectizeInput(inputId="box_rev",
                                 label="Reviewer",
                                 choices=c(reviewers),
                                 multiple=TRUE,
                                 options=list(maxItems=10,placeholder="Leave blank to plot only most frequent reviewers")
                                 ),
                  width=3),
              box(title="Boxplots",status="primary",solidHeader=TRUE,plotOutput("boxplot",height=800),
                  width=9)),
      tabItem(tabName = "scatter",
              box(title="Filters",status="info",solidHeader=TRUE,
                  selectizeInput(inputId="scat_rev1",
                                 label="Reviewer 1",
                                 choices=reviewers,
                                 multiple=TRUE,
                                 options=list(maxItems=1),
                                 selected="IGN"),
                  selectizeInput(inputId="scat_rev2",
                                 label="Reviewer 2",
                                 choices=reviewers,
                                 multiple=TRUE,
                                 options=list(maxItems=1),
                                 selected="GameSpot"),
                  sliderInput("scat_year",label=h5(strong("Year")),min=1996,max=2019,
                              value=c(1996,2019),sep=""),
                  selectizeInput(inputId="scat_console",
                                 label="Console",
                                 choices=c(consoles),
                                 multiple=TRUE,
                                 options=list(maxItems=length(consoles),placeholder="Leave blank to plot all consoles")
                  ),
                  selectizeInput(inputId="scat_genre",
                                 label="Genre",
                                 choices=c(genres),
                                 multiple=TRUE,
                                 options=list(maxItems=length(genres),placeholder="Leave blank to plot all genres")
                  ),
                  selectizeInput(inputId="scat_dev",
                                 label="Developer",
                                 choices=c(developers),
                                 multiple=TRUE,
                                 options=list(maxItems=length(developers),placeholder="Leave blank to plot all developers")
                  ),
                  radioButtons("scat_color","Color by:",
                               c("None"="none",
                                 "Console"="con",
                                 "Genre"="gen"
                                 )
                               ),
                  width=3),
              box(title="Reviewer vs. Reviewer Scatterplot",status="primary",solidHeader=TRUE,plotOutput("scatter",height=800),
                  width=9)
      )
    )
  )
))

server.R

library(shiny)
library(shinydashboard)
library(dplyr)
library(tidyr)
library(ggplot2)
library(lubridate)
library(DT)
library(forcats)

metacritic <- read.csv(file="./metacritic_reviews_final.csv")
developers <- metacritic %>% group_by(developer) %>% summarise(n=n()) %>% 
  arrange(desc(n)) %>% filter(n>850) %>% mutate_if(is.factor,as.character) %>% .$developer
reviewers <- metacritic %>% group_by(reviewer) %>% summarise(n=n()) %>% 
  arrange(desc(n)) %>% filter(n>1400) %>% mutate_if(is.factor,as.character) %>% .$reviewer
# metacritic$console <- factor(metacritic$console,levels=c("PC","N64","Gamecube","Wii","Wii U",
#                                                          "Switch","Dreamcast","Playstation",
#                                                          "Playstation 2","Playstation 3",
#                                                          "Playstation 4","Xbox",
#                                                          "Xbox 360","Xbox One"))
# metacritic$genre <- factor(metacritic$genre,levels=c("Action","Action Adventure","Adventure",
#                                                      "Fighting","Platformer","Shooter",
#                                                      "Horror","Action RPG",
#                                                      "Western RPG","Japanese RPG","Strategy",
#                                                      "MMO","Sports","Racing","Simulation",
#                                                      "Puzzle","Miscellaneous"))

shinyServer(function(input, output) {
  # Plot for console timeseries
   output$ts_console <- renderPlot({
     metacritic_by_year <- metacritic %>% group_by(year) %>% summarise(n_year=n())
     metacritic_group <- metacritic %>% group_by(year,console) %>% summarise(n=n()) %>% 
       inner_join(metacritic_by_year,by="year") %>% mutate(pct = n/n_year) %>% select(-n,-n_year) %>% 
       spread(key=console,value=pct,fill=0) %>% gather(key=console,value=pct,-year)
     metacritic_group$console <- factor(metacritic_group$console,levels=c("PC","N64","Gamecube","Wii","Wii U",
                                                                          "Switch","Dreamcast","Playstation",
                                                                          "Playstation 2","Playstation 3",
                                                                          "Playstation 4","Xbox",
                                                                          "Xbox 360","Xbox One"))
     metacritic_group %>% 
       ggplot(aes(x=year,y=pct,group=console))+geom_area(aes(x=year,y=pct,fill=console))+
       labs(fill='console')+scale_x_continuous(name="Year",limits=c(1996,2019),breaks=seq(1996,2020,2))+
       scale_y_continuous(name="Fraction of Total Reviews",limits=c(0,1),breaks=seq(0,1,0.25))+
       theme(text = element_text(size=16))
   })
   
   # Plot for genre timeseries
   output$ts_genre <- renderPlot({
     metacritic_by_year <- metacritic %>% group_by(year) %>% summarise(n_year=n())
     metacritic_group <- metacritic %>% group_by(year,genre) %>% summarise(n=n()) %>% 
       inner_join(metacritic_by_year,by="year") %>% mutate(pct = n/n_year) %>% select(-n,-n_year) %>% 
       spread(key=genre,value=pct,fill=0) %>% gather(key=genre,value=pct,-year)
     metacritic_group$genre <- factor(metacritic_group$genre,levels=c("Action","Action Adventure","Adventure",
                                                                      "Fighting","Platformer","Shooter",
                                                                      "Horror","Action RPG",
                                                                      "Western RPG","Japanese RPG","Strategy",
                                                                      "MMO","Sports","Racing","Simulation",
                                                                      "Puzzle","Miscellaneous"))
     metacritic_group %>% 
       ggplot(aes(x=year,y=pct,group=genre))+geom_area(aes(x=year,y=pct,fill=genre))+
       labs(fill='genre')+scale_x_continuous(name="Year",limits=c(1996,2019),breaks=seq(1996,2020,2))+
       scale_y_continuous(name="Fraction of Total Reviews",limits=c(0,1),breaks=seq(0,1,0.25))+
       theme(text = element_text(size=16))
   })

   # Function to determine which variables to filter for rankings
   filter_rank <- function(df,vr,val) {
     # An emptied selectize box yields NULL, which would break the comparison
     if (is.null(val) || val=="All") {
       return(df)
     } else {
       return(df %>% filter(UQ(as.symbol(vr))==val))
     }
   }
   
   # Rankings table
   rank_start = reactive({input$rank_year[1]})
   rank_end = reactive({input$rank_year[2]})
   rank_console = reactive({input$rank_console})
   rank_genre = reactive({input$rank_genre})
   rank_dev = reactive({input$rank_dev})
   output$rank_table <- renderDataTable({datatable(
     metacritic %>% filter_rank("console",rank_console()) %>% filter_rank("genre",rank_genre()) %>% 
       filter_rank("developer",rank_dev()) %>% filter(year>=rank_start() & year<=rank_end()) %>%
       group_by(console,developer,game,year,genre) %>% 
       summarise(AvgScore = round(mean(score,na.rm=TRUE),1)) %>% arrange(desc(AvgScore)) %>%
       select(Year=year,Console=console,Game=game,Developer=developer,Genre=genre,AvgScore)
   )})
   
   # Function for filtering boxplots
   filter_box <- function(df,vr,val,limit=NULL,lf=NULL) {
     if (is.null(val)) {
       if (is.null(limit)) {
         return(df)
       } else {
         return(df %>% filter(UQ(as.symbol(vr)) %in% lf[1:limit]))
       }
     } else {
       return(df %>% filter(UQ(as.symbol(vr)) %in% val))
     }
   }
   
   # Boxplots
   box_start = reactive({input$box_year[1]})
   box_end = reactive({input$box_year[2]})
   box_console = reactive({input$box_console})
   box_genre = reactive({input$box_genre})
   box_dev = reactive({input$box_dev})
   box_rev = reactive({input$box_rev})
   output$boxplot <- renderPlot({
      metacritic %>% filter_box("console",box_console()) %>%
         filter_box("genre",box_genre()) %>%
         filter_box("developer",box_dev()) %>%
         filter_box("reviewer",box_rev()) %>%
         filter(year>=box_start() & year<=box_end()) %>%
         ggplot(aes(x=console,y=score,fill=console,alpha=0.5))+
         geom_boxplot(show.legend=FALSE)+
         xlab("Console")+ylab("Review Score")+
         theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust = 1), text = element_text(size=16))}) 
   observeEvent(c(input$box_console,input$box_genre,input$box_dev,input$box_rev,input$box_x), {
     output$boxplot <- renderPlot({
       switch(input$box_x,
              con = metacritic %>% filter_box("console",box_console()) %>%
                filter_box("genre",box_genre()) %>%
                filter_box("developer",box_dev()) %>%
                filter_box("reviewer",box_rev()) %>%
                filter(year>=box_start() & year<=box_end()) %>%
                ggplot(aes(x=console,y=score,fill=console,alpha=0.5))+
                geom_boxplot(show.legend=FALSE)+
                xlab("Console")+ylab("Review Score")+
                theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust = 1), text = element_text(size=16)),
              gen = metacritic %>% filter_box("console",box_console()) %>%
                filter_box("genre",box_genre()) %>%
                filter_box("developer",box_dev()) %>%
                filter_box("reviewer",box_rev()) %>%
                filter(year>=box_start() & year<=box_end()) %>%
                ggplot(aes(x=genre,y=score,fill=genre,alpha=0.5))+
                geom_boxplot(show.legend=FALSE)+
                xlab("Genre")+ylab("Review Score")+
                theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust = 1), text = element_text(size=16)),
              dev = metacritic %>% filter_box("console",box_console()) %>%
                filter_box("genre",box_genre()) %>%
                filter_box("developer",box_dev(),limit=10,lf=developers) %>%
                filter_box("reviewer",box_rev()) %>%
                filter(year>=box_start() & year<=box_end()) %>%
                ggplot(aes(x=developer,y=score,fill=developer,alpha=0.5))+
                geom_boxplot(show.legend=FALSE)+
                xlab("Developer")+ylab("Review Score")+
                theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust = 1), text = element_text(size=16)),
              rev = metacritic %>% filter_box("console",box_console()) %>%
                filter_box("genre",box_genre()) %>%
                filter_box("developer",box_dev()) %>%
                filter_box("reviewer",box_rev(),limit=10,lf=reviewers) %>%
                filter(year>=box_start() & year<=box_end()) %>%
                ggplot(aes(x=reviewer,y=score,fill=reviewer,alpha=0.5))+
                geom_boxplot(show.legend=FALSE)+
                xlab("Reviewer")+ylab("Review Score")+
                theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust = 1), text = element_text(size=16))
       )
     })
  })
   
   # Scatter plots
   scat_start = reactive({input$scat_year[1]})
   scat_end = reactive({input$scat_year[2]})
   scat_console = reactive({input$scat_console})
   scat_genre = reactive({input$scat_genre})
   scat_dev = reactive({input$scat_dev})
   scat_rev1 = reactive({input$scat_rev1})
   scat_rev2 = reactive({input$scat_rev2})
   output$scatter <- renderPlot({
      switch(input$scat_color,
             none=metacritic %>% filter_box("console",scat_console()) %>%
                filter_box("genre",scat_genre()) %>%
                filter_box("developer",scat_dev()) %>%
                filter_box("reviewer",c(scat_rev1(),scat_rev2())) %>%
                filter(year>=scat_start() & year<=scat_end()) %>%
                select(-X) %>% pivot_wider(names_from=reviewer,values_from=score,values_fn=list(score=mean)) %>%
                ggplot(aes(x=!!as.symbol(scat_rev1()),y=!!as.symbol(scat_rev2())))+geom_point(size=2.5)+geom_abline()+
                theme(text = element_text(size=16)),
            con=metacritic %>% filter_box("console",scat_console()) %>%
               filter_box("genre",scat_genre()) %>%
               filter_box("developer",scat_dev()) %>%
               filter_box("reviewer",c(scat_rev1(),scat_rev2())) %>%
               filter(year>=scat_start() & year<=scat_end()) %>%
               select(-X) %>% pivot_wider(names_from=reviewer,values_from=score,values_fn=list(score=mean)) %>%
               ggplot(aes(x=!!as.symbol(scat_rev1()),y=!!as.symbol(scat_rev2())))+
               geom_point(aes(color=console), size=2.5)+geom_abline()+
               theme(text = element_text(size=16)),
            gen=metacritic %>% filter_box("console",scat_console()) %>%
               filter_box("genre",scat_genre()) %>%
               filter_box("developer",scat_dev()) %>%
               filter_box("reviewer",c(scat_rev1(),scat_rev2())) %>%
               filter(year>=scat_start() & year<=scat_end()) %>%
               select(-X) %>% pivot_wider(names_from=reviewer,values_from=score,values_fn=list(score=mean)) %>%
               ggplot(aes(x=!!as.symbol(scat_rev1()),y=!!as.symbol(scat_rev2())))+
               geom_point(aes(color=genre), size=2.5)+geom_abline()+
               theme(text = element_text(size=16))
            )
   })
})