607_Assignment1_DylanGold

Assignment 1

Approach:

I first browsed online for a data source in .csv format. Because I enjoy video games as a hobby, I picked a data set of video game sales from Kaggle. I will load this data set into a data frame and explore it further, for example by filtering to select games from certain publishers.

Source:

https://www.kaggle.com/datasets/gregorut/videogamesales?resource=download

Overview

This data set lists video games that sold more than 100,000 copies. I will load it into a data frame, condense it into a new data frame with fewer columns, and then filter it to show the games I am interested in.

Loading the Dataframe

We can assign the URL of the CSV file to a variable and then use that variable to create a data frame for later use. I will also show the head of the data frame to see some of the rows.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(gt)
url <- "https://raw.githubusercontent.com/DylanGoldJ/607-Assignment-1/refs/heads/main/vgsales.csv"

df <- read_csv(
  file = url,
  show_col_types = FALSE
)
head(df, 10)
# A tibble: 10 × 11
    Rank Name          Platform Year  Genre Publisher NA_Sales EU_Sales JP_Sales
   <dbl> <chr>         <chr>    <chr> <chr> <chr>        <dbl>    <dbl>    <dbl>
 1     1 Wii Sports    Wii      2006  Spor… Nintendo      41.5    29.0      3.77
 2     2 Super Mario … NES      1985  Plat… Nintendo      29.1     3.58     6.81
 3     3 Mario Kart W… Wii      2008  Raci… Nintendo      15.8    12.9      3.79
 4     4 Wii Sports R… Wii      2009  Spor… Nintendo      15.8    11.0      3.28
 5     5 Pokemon Red/… GB       1996  Role… Nintendo      11.3     8.89    10.2 
 6     6 Tetris        GB       1989  Puzz… Nintendo      23.2     2.26     4.22
 7     7 New Super Ma… DS       2006  Plat… Nintendo      11.4     9.23     6.5 
 8     8 Wii Play      Wii      2006  Misc  Nintendo      14.0     9.2      2.93
 9     9 New Super Ma… Wii      2009  Plat… Nintendo      14.6     7.06     4.7 
10    10 Duck Hunt     NES      1984  Shoo… Nintendo      26.9     0.63     0.28
# ℹ 2 more variables: Other_Sales <dbl>, Global_Sales <dbl>
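
The output above shows Year parsed as <chr> rather than as a number, which usually means some entries are not numeric. The following is a minimal sketch of one way to handle that at load time. It assumes the non-numeric entries are the literal string "N/A", which the rows shown here do not confirm, and it uses a small made-up CSV string (csv_text) instead of the real file.

```r
library(readr)

# Hypothetical two-row CSV; the "N/A" entry and "Some Game" are assumptions
# for illustration, not rows taken from the real data set.
csv_text <- I("Name,Year\nWii Sports,2006\nSome Game,N/A\n")

# Treat "N/A" as missing so that readr can guess a numeric type for Year.
toy_years <- read_csv(
  file = csv_text,
  na = c("", "NA", "N/A"),
  show_col_types = FALSE
)

# Inspect the column types that readr chose.
str(toy_years)
```

If the real file uses a different marker for missing years, that marker would need to be passed to the na argument instead.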

Reducing number of columns

The data is already sorted by Rank, which is determined by global sales. The ten best-selling games in this data set are all published by Nintendo. First, I will create a new data frame with only the Rank, Name, Year, Publisher, and Global_Sales columns.

df2 <- df[c("Rank", "Name", "Year", "Publisher", "Global_Sales")]
head(df2)
# A tibble: 6 × 5
   Rank Name                     Year  Publisher Global_Sales
  <dbl> <chr>                    <chr> <chr>            <dbl>
1     1 Wii Sports               2006  Nintendo          82.7
2     2 Super Mario Bros.        1985  Nintendo          40.2
3     3 Mario Kart Wii           2008  Nintendo          35.8
4     4 Wii Sports Resort        2009  Nintendo          33  
5     5 Pokemon Red/Pokemon Blue 1996  Nintendo          31.4
6     6 Tetris                   1989  Nintendo          30.3
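
The same column reduction can also be written with dplyr's select(), which is already loaded as part of the tidyverse. This is only a sketch of an alternative, not a replacement for the base R indexing above. The toy tibble is a hypothetical stand-in built from the first three rows of the printed output, with only some of the original columns.

```r
library(dplyr)

# Toy data: first three rows from the printed output, subset of columns.
toy <- tibble(
  Rank = c(1, 2, 3),
  Name = c("Wii Sports", "Super Mario Bros.", "Mario Kart Wii"),
  Platform = c("Wii", "NES", "Wii"),
  Year = c("2006", "1985", "2008"),
  Publisher = c("Nintendo", "Nintendo", "Nintendo"),
  NA_Sales = c(41.5, 29.1, 15.8),
  Global_Sales = c(82.7, 40.2, 35.8)
)

# Keep only the five columns of interest, in the order listed.
toy_small <- select(toy, Rank, Name, Year, Publisher, Global_Sales)

# Show which columns remain.
names(toy_small)
```

select() takes bare column names, so the quoted character vector used with square brackets is not needed.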

Filtering the data frame

I am interested in seeing games that are not published by Nintendo, so I will filter out the Nintendo titles and use head() to show the top 10 remaining games.

df3 <- filter(df2, Publisher != "Nintendo")
head(df3, 10)
# A tibble: 10 × 5
    Rank Name                           Year  Publisher             Global_Sales
   <dbl> <chr>                          <chr> <chr>                        <dbl>
 1    16 Kinect Adventures!             2010  Microsoft Game Studi…         21.8
 2    17 Grand Theft Auto V             2013  Take-Two Interactive          21.4
 3    18 Grand Theft Auto: San Andreas  2004  Take-Two Interactive          20.8
 4    24 Grand Theft Auto V             2013  Take-Two Interactive          16.4
 5    25 Grand Theft Auto: Vice City    2002  Take-Two Interactive          16.2
 6    29 Gran Turismo 3: A-Spec         2001  Sony Computer Entert…         15.0
 7    30 Call of Duty: Modern Warfare 3 2011  Activision                    14.8
 8    32 Call of Duty: Black Ops        2010  Activision                    14.6
 9    34 Call of Duty: Black Ops 3      2015  Activision                    14.2
10    35 Call of Duty: Black Ops II     2012  Activision                    14.0
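
The filter and head steps can also be chained into a single pipeline. The sketch below assumes R 4.1 or later for the native pipe |> and uses a hypothetical four-row tibble built from rows in the printed output rather than the full data set.

```r
library(dplyr)

# Toy data: four rows copied from the printed output above.
toy <- tibble(
  Rank = c(1, 2, 17, 30),
  Name = c(
    "Wii Sports",
    "Super Mario Bros.",
    "Grand Theft Auto V",
    "Call of Duty: Modern Warfare 3"
  ),
  Year = c("2006", "1985", "2013", "2011"),
  Publisher = c("Nintendo", "Nintendo", "Take-Two Interactive", "Activision"),
  Global_Sales = c(82.7, 40.2, 21.4, 14.8)
)

# Drop the Nintendo rows, then keep at most the first 10 remaining rows.
toy_top <- toy |>
  filter(Publisher != "Nintendo") |>
  slice_head(n = 10)

toy_top
```

Note that filter() keeps only rows where the condition is TRUE, so any row with a missing Publisher would also be dropped by this comparison.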

Conclusion

This was a good way to set up some of the tools needed for these assignments, such as RStudio and GitHub. I was also able to load a data set I found online into a data frame. I reduced the data frame to fewer columns to make it more concise and easier to read, and I used a simple filter to show different results in the top 10 rows.