If a question asks for any calculations (means, medians, tables, proportions, etc.) or graphs, make sure they appear in the knitted document.

The final document should not show any warnings.
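
One way to meet the no-warnings requirement, assuming the document is knitted with R Markdown and knitr, is to turn off warning and message output in a setup chunk. This is only a sketch: it hides warnings rather than fixing their cause, so it is still worth running the code in the console and checking what it reports.

```r
# Assumes the knitr package is installed (it is needed to knit R Markdown).
# Placed in the first chunk, these options apply to every later chunk.
knitr::opts_chunk$set(
  warning = FALSE,  # hide warnings in the knitted output
  message = FALSE   # hide package start-up and other messages
)
```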

Data Description

The video game data set (vg) has 11 variables on over 64,000 video games. The important variables are:

  1. title: the name of the game
  2. genre: the type of game (Action/RPG/Shooter/Sports/Other)
  3. platform: The platform the game is available to play on (Nintendo, Playstation, Xbox, PC, Other)
  4. year: The year the game was released on the platform
  5. critic_score: The critical score for the game, on a scale from 1 to 10
  6. *_sales*: The number of games sold in the corresponding region (na = North America, jp = Japan, eu = Europe, other = none of the other listed locations)

Question 1) Wrangling the data set

Create a data set that only has games from 1997 to 2018, with the following columns:

  1. title: the name of the game
  2. genre: the type of game (Action/RPG/Shooter/Sports/Other)
  3. platform: The platform the game is available to play on (Nintendo, Playstation, Xbox, PC, Other)
  4. year: The year the game was released on the platform
  5. critic_score: The critical score for the game, on a scale from 1 to 10
  6. total_sales: The total number of copies of the game sold across all regions

It shouldn’t have any missing values for year or total_sales, but it is fine for critic_score to be missing.

Save it as vg1.

If done correctly, it should have 2,155 rows.

Display the results using tibble(vg1).

vg1 <- 
  vg |> 
  # Keeping just the games released between 1997 and 2018
  filter(between(year, 1997, 2018)) |> 
  # Creating the total sales column
  mutate(total_sales = na_sales + jp_sales + eu_sales + other_sales) |> 
  # Keeping just the specified columns
  dplyr::select(title, genre, platform, year, critic_score, total_sales) |> 
  # Removing the rows with the corresponding missing values
  filter(
    !is.na(total_sales),
    !is.na(year)
  )

tibble(vg1)
## # A tibble: 2,155 × 6
##    title                          genre  platform  year critic_score total_sales
##    <chr>                          <chr>  <chr>    <int>        <dbl>       <dbl>
##  1 Grand Theft Auto V             Action Playsta…  2013          9.4        20.3
##  2 Grand Theft Auto V             Action Playsta…  2014          9.7        19.4
##  3 Grand Theft Auto: Vice City    Action Playsta…  2002          9.6        16.2
##  4 Grand Theft Auto V             Action Xbox      2013         NA          15.9
##  5 Call of Duty: Black Ops 3      Shoot… Playsta…  2015          8.1        15.1
##  6 Call of Duty: Modern Warfare 3 Shoot… Xbox      2011          8.7        14.8
##  7 Call of Duty: Black Ops        Shoot… Xbox      2010          8.8        14.7
##  8 Red Dead Redemption 2          Action Playsta…  2018          9.8        13.9
##  9 Call of Duty: Black Ops II     Shoot… Xbox      2012          8.4        13.9
## 10 Call of Duty: Black Ops II     Shoot… Playsta…  2012          8          13.8
## # ℹ 2,145 more rows
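
The pipeline above depends on how R handles missing values: adding the regional sales gives NA whenever any region is NA, and filter() drops rows whose condition is NA. A minimal sketch on made-up data (the `toy` tibble and its values are hypothetical, not taken from vg) shows both effects:

```r
library(dplyr)

# Hypothetical toy data standing in for vg
toy <- tibble(
  title       = c("A", "B", "C", "D"),
  year        = c(2000L, 1990L, 2010L, NA),
  na_sales    = c(1.0, 2.0, NA, 0.5),
  jp_sales    = c(0.5, 0.1, 0.2, 0.5),
  eu_sales    = c(0.3, 0.4, 0.1, 0.5),
  other_sales = c(0.1, 0.1, 0.1, 0.5)
)

toy_clean <-
  toy |>
  # Drops B (released before 1997) and D (between() returns NA for a missing year)
  filter(between(year, 1997, 2018)) |>
  # C gets total_sales = NA because its na_sales is NA
  mutate(total_sales = na_sales + jp_sales + eu_sales + other_sales) |>
  # Drops C
  filter(!is.na(total_sales))

toy_clean
```

Only game A survives. Because the year filter already removes rows with a missing year, the explicit !is.na(year) check in the solution acts as a safeguard rather than a requirement.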

Question 2) Total Sales by Year and Platform

Using the vg1 data set, create a data frame that has the number of games (games), overall sales (overall_sales), and average critic score (critic_avg) for each year and platform combination. It should only have the three main video game consoles: Nintendo, Playstation, and Xbox (no PC or Other). Arrange the rows by year, then by platform.

Save the results as vg2. Display the results using tibble(vg2). If done correctly, it should have 61 rows.

vg2 <- 
  vg1 |> 
  # Keeping just the 3 specified platforms
  filter(
    platform %in% c("Nintendo", "Playstation", "Xbox")
  ) |> 
  # Summarizing the data by year and platform
  summarize(
    .by = c(year, platform),
    games = n(),
    overall_sales = sum(total_sales),
    critic_avg = mean(critic_score, na.rm = TRUE)
  ) |> 
  # Arranging the rows by year and platform
  arrange(year, platform) 

tibble(vg2)
## # A tibble: 61 × 5
##     year platform    games overall_sales critic_avg
##    <int> <chr>       <int>         <dbl>      <dbl>
##  1  1997 Nintendo        5          4.49       7.23
##  2  1997 Playstation    31         31.1        7.91
##  3  1998 Nintendo        8          6.07       7.85
##  4  1998 Playstation    31         29.2        7.66
##  5  1999 Nintendo        8          4.1        7.85
##  6  1999 Playstation    22         33.3        7.54
##  7  2000 Nintendo        7          5.76       8.62
##  8  2000 Playstation    42         60.2        7.89
##  9  2001 Nintendo       16         20.2        8.33
## 10  2001 Playstation    39         63.3        8.19
## # ℹ 51 more rows
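
A small sketch on made-up data (the `toy` tibble is hypothetical) illustrates what the summary computes per group, and one edge case worth knowing about: when every critic_score in a group is missing, mean(..., na.rm = TRUE) returns NaN rather than NA. It assumes dplyr 1.1.0 or later, which introduced the .by argument.

```r
library(dplyr)

# Hypothetical toy data standing in for vg1
toy <- tibble(
  year         = c(2000L, 2000L, 2001L),
  platform     = c("Xbox", "Xbox", "Xbox"),
  total_sales  = c(1, 2, 3),
  critic_score = c(8, NA, NA)
)

toy_summary <-
  toy |>
  summarize(
    .by = c(year, platform),
    games = n(),                                   # rows in the group
    overall_sales = sum(total_sales),              # total sales in the group
    critic_avg = mean(critic_score, na.rm = TRUE)  # NaN if all scores are NA
  ) |>
  arrange(year, platform)

toy_summary
```

The 2000 group has 2 games with an average score of 8 (the missing score is ignored), while the 2001 group has 1 game and a critic_avg of NaN.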

Question 3) Video game sales by year

Create the graph seen in Brightspace using vg2. It’s very similar to the graph made in homework 3.2, so you can use that homework as a reference.

# Create the graph described in the question using the vg2 data set
ggplot(
  data = vg2,
  mapping = aes(
    x = year,
    y = overall_sales,
    color = platform
  )
) +
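  # Drawing one line per platform
  geom_line(
    linewidth = 1
  ) +
  
  # Labeling each line at its first year, nudged left of the starting point
  geom_text(
    data = vg2 |> filter(.by = platform, year == min(year)),
    mapping = aes(label = platform),
    nudge_x = c(-1.25, -1.5, -0.75)
  ) +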
  
  theme_bw() +
  labs(
    x = "Year",
    y = "Sales (In Millions)",
    title = "Video Game Sales per Year by Platform",
    color = NULL
  ) +
  
  scale_color_manual(
    values = c("Nintendo" = "red",
               "Playstation" = "blue",
               "Xbox" = "green")
  ) +
  
  # Add this to your graph to change the tick marks on the x-axis
  scale_x_continuous(
    breaks = seq(1997, 2017, 2),
    minor_breaks = NULL,
    limits = c(1995, 2017)
  ) +
  
  # Add this to your graph to center the title and hide the legend (the lines are labeled directly)
  theme(
    plot.title = element_text(hjust = 0.5),
    legend.position = "none"
  )
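
The label layer in the graph uses a grouped filter to pick one row per platform, which is why nudge_x takes exactly three values. A minimal sketch on made-up data (the `toy` tibble and its numbers are hypothetical) shows what that filter returns, assuming dplyr 1.1.0 or later for the .by argument:

```r
library(dplyr)

# Hypothetical toy data standing in for vg2
toy <- tibble(
  year          = c(1997L, 1998L, 1997L, 1998L, 2002L, 2003L),
  platform      = c("Nintendo", "Nintendo", "Playstation",
                    "Playstation", "Xbox", "Xbox"),
  overall_sales = c(4, 6, 31, 29, 10, 12)
)

# Within each platform, keep only the row for its earliest year
label_rows <-
  toy |>
  filter(.by = platform, year == min(year))

label_rows
```

The result has one row each for Nintendo, Playstation, and Xbox, in the same order as the original rows, so the three nudge_x values line up with the platforms in that order.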