If a question asks for any calculations (means, medians, tables, proportions, etc.) or graphs, make sure they appear in the knitted document.
The final document should not show any warnings.
The video game data set (vg) has 11 variables on over 64,000 video games. The important variables are:
Create a data set that only has games from 1997 to 2018, with the following columns:
1) title: the name of the game
2) genre: the type of game (Action/RPG/Shooter/Sports/Other)
3) platform: the platform the game is available to play on (Nintendo, Playstation, Xbox, PC, Other)
4) year: the year the game was released on the platform
5) critic_score: the critical score for the game, on a scale from 1 to 10
6) total_sales: the total number of copies of the game sold across all regions
It shouldn’t have any missing values for year or total_sales, but it is fine for critic_score to be missing.
Save it as vg1.
If done correctly, it should have 2,155 rows.
Display the results using
tibble(vg1)
vg1 <-
  vg |>
  # Keeping just the games released between 1997 and 2018
  filter(between(year, 1997, 2018)) |>
  # Creating the total sales column
  mutate(total_sales = na_sales + jp_sales + eu_sales + other_sales) |>
  # Keeping just the specified columns
  dplyr::select(title, genre, platform, year, critic_score, total_sales) |>
  # Removing the rows with the corresponding missing values
  filter(
    !is.na(total_sales),
    !is.na(year)
  )
tibble(vg1)
## # A tibble: 2,155 × 6
##    title                          genre  platform  year critic_score total_sales
##    <chr>                          <chr>  <chr>    <int>        <dbl>       <dbl>
##  1 Grand Theft Auto V             Action Playsta…  2013          9.4        20.3
##  2 Grand Theft Auto V             Action Playsta…  2014          9.7        19.4
##  3 Grand Theft Auto: Vice City    Action Playsta…  2002          9.6        16.2
##  4 Grand Theft Auto V             Action Xbox      2013         NA          15.9
##  5 Call of Duty: Black Ops 3      Shoot… Playsta…  2015          8.1        15.1
##  6 Call of Duty: Modern Warfare 3 Shoot… Xbox      2011          8.7        14.8
##  7 Call of Duty: Black Ops        Shoot… Xbox      2010          8.8        14.7
##  8 Red Dead Redemption 2          Action Playsta…  2018          9.8        13.9
##  9 Call of Duty: Black Ops II     Shoot… Xbox      2012          8.4        13.9
## 10 Call of Duty: Black Ops II     Shoot… Playsta…  2012          8          13.8
## # ℹ 2,145 more rows
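The two filters above can be checked on a small made-up data set. This is only a sketch: vg_toy and its values are hypothetical (not taken from vg), and it uses just two of the regional sales columns to keep the example short. It shows that between() gives NA for a missing year, so filter() already drops that row, and that adding sales columns gives NA whenever one region is missing.

```r
library(dplyr)

# Hypothetical toy data, NOT the real vg data set
vg_toy <- tibble(
  title       = c("A", "B", "C", "D"),
  year        = c(1996L, 2005L, NA, 2018L),
  na_sales    = c(1.0, 2.0, 3.0, NA),
  other_sales = c(0.5, 0.5, 0.5, 0.5)
)

vg_toy_clean <-
  vg_toy |>
  # between() is NA for a missing year, and filter() drops NA conditions
  filter(between(year, 1997, 2018)) |>
  # The sum is NA whenever any of the added columns is NA
  mutate(total_sales = na_sales + other_sales) |>
  # Removing the rows with missing total sales
  filter(!is.na(total_sales))

vg_toy_clean
```

Only game B is kept: A is outside the year range, C has a missing year, and D has a missing total.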
Using the vg1 data set, create a data frame that has the number of games (games), overall sales (overall_sales), and average critic score (critic_avg) for each year and platform combination. It should only have the three main video game consoles: Nintendo, Playstation, and Xbox (no PC or Other). Make sure the rows are arranged by year, then by platform.
Save the results as vg2. Display the results using tibble(vg2). If done correctly, it should have 61 rows.
vg2 <-
  vg1 |>
  # Keeping just the 3 specified platforms
  filter(
    platform %in% c("Nintendo", "Playstation", "Xbox")
  ) |>
  # Summarizing the data by year and platform
  summarize(
    .by = c(year, platform),
    games = n(),
    overall_sales = sum(total_sales),
    # TRUE is safer than T, since T can be overwritten
    critic_avg = mean(critic_score, na.rm = TRUE)
  ) |>
  # Arranging the rows by year and platform
  arrange(year, platform)
tibble(vg2)
## # A tibble: 61 × 5
##     year platform    games overall_sales critic_avg
##    <int> <chr>       <int>         <dbl>      <dbl>
##  1  1997 Nintendo        5          4.49       7.23
##  2  1997 Playstation    31         31.1        7.91
##  3  1998 Nintendo        8          6.07       7.85
##  4  1998 Playstation    31         29.2        7.66
##  5  1999 Nintendo        8          4.1        7.85
##  6  1999 Playstation    22         33.3        7.54
##  7  2000 Nintendo        7          5.76       8.62
##  8  2000 Playstation    42         60.2        7.89
##  9  2001 Nintendo       16         20.2        8.33
## 10  2001 Playstation    39         63.3        8.19
## # ℹ 51 more rows
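The summary step can be tried on a small made-up data set to see how it treats missing critic scores. This is a sketch under assumptions: vg1_toy and its values are hypothetical stand-ins for vg1, and the .by argument needs dplyr 1.1.0 or later. It shows that a year and platform combination with no critic scores at all gets NaN for critic_avg, and that summarizing with .by returns an ungrouped result.

```r
library(dplyr)

# Hypothetical toy data standing in for vg1
vg1_toy <- tibble(
  year         = c(2001L, 2001L, 2001L, 2002L),
  platform     = c("Xbox", "Xbox", "Nintendo", "Xbox"),
  critic_score = c(8, NA, NA, 9),
  total_sales  = c(1.5, 0.5, 2.0, 3.0)
)

vg2_toy <-
  vg1_toy |>
  # Summarizing the data by year and platform
  summarize(
    .by = c(year, platform),
    games = n(),
    overall_sales = sum(total_sales),
    # na.rm = TRUE skips missing scores; an all-missing group gives NaN
    critic_avg = mean(critic_score, na.rm = TRUE)
  ) |>
  # Arranging the rows by year and platform
  arrange(year, platform)

vg2_toy

# .by only groups inside summarize(), so no ungroup() is needed afterwards
is_grouped_df(vg2_toy)
```

In the toy result, the 2001 Nintendo row counts one game but has NaN for critic_avg, while the 2001 Xbox row averages only its one non-missing score.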
Create the graph seen in Brightspace using vg2. It’s very similar to the graph made in homework 3.2, and you can use it as a reference.
# Create the graph described in the question using the vg2 data set
ggplot(
  data = vg2,
  mapping = aes(
    x = year,
    y = overall_sales,
    color = platform
  )
) +
  geom_line(
    linewidth = 1
  ) +
  # Label each line at the first year of its platform
  geom_text(
    data = vg2 |> filter(.by = platform, year == min(year)),
    mapping = aes(label = platform),
    nudge_x = c(-1.25, -1.5, -0.75)
  ) +
  theme_bw() +
  labs(
    x = "Year",
    y = "Sales (In Millions)",
    title = "Video Game Sales per Year by Platform",
    color = NULL
  ) +
  scale_color_manual(
    values = c("Nintendo" = "red",
               "Playstation" = "blue",
               "Xbox" = "green")
  ) +
  # Set the tick marks on the x-axis and leave room on the left for the labels
  scale_x_continuous(
    breaks = seq(1997, 2017, 2),
    minor_breaks = NULL,
    limits = c(1995, 2017)
  ) +
  # Center the title and hide the legend (the lines are labeled directly)
  theme(
    plot.title = element_text(hjust = 0.5),
    legend.position = "none"
  )
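The data passed to geom_text() can be looked at on its own to see which rows get a label. This is a sketch with made-up values: vg2_toy is hypothetical, including the first Xbox year. It shows that filtering with .by = platform keeps one row per platform (its earliest year) in the original row order, which appears to be what lets the three nudge_x values line up with Nintendo, Playstation, and Xbox.

```r
library(dplyr)

# Hypothetical toy data standing in for vg2 (already in year, platform order)
vg2_toy <- tibble(
  year          = c(1997L, 1997L, 1998L, 1998L, 2001L, 2002L),
  platform      = c("Nintendo", "Playstation", "Nintendo",
                    "Playstation", "Xbox", "Xbox"),
  overall_sales = c(4.5, 31.1, 6.1, 29.2, 1.0, 5.0)
)

# Keeping the first year of each platform; rows stay in their original order
label_rows <-
  vg2_toy |>
  filter(.by = platform, year == min(year))

label_rows
```

If vg2 were sorted differently, the label rows would come out in a different order and the nudge_x values would need to be reordered to match.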