I started by browsing online for a data source in .csv format. Video games are a hobby of mine, so I picked a Kaggle data set of video game sales. I will load it into a data frame and run a few operations to explore it, such as filtering for games from certain publishers.
The data set lists video games that sold more than 100,000 copies. I will load it into a data frame, condense it into a new data frame with fewer columns, and then transform it based on what I want to see.
Loading the data frame
We can assign the web address (URL) of the .csv file to a variable, then use that variable to create a data frame for future use. I will also show the head to see the first 10 rows.
library(tidyverse)
library(gt)
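The loading step described above might look roughly like the sketch below. The variable names `csv_address` and `games` are my own, not taken from the original code. So that the example runs without a network connection, a temporary file stands in for the real Kaggle address; it holds four rows copied from the printed output, limited to the columns that are displayed in full. Reading Year as character is an assumption made to mirror the `<chr>` type in that output.

```r
library(tidyverse)

# Stand-in for the real web address (assumption): a temporary .csv file with
# four rows copied from the printed output. Sales values are as displayed,
# so they are rounded.
csv_address <- tempfile(fileext = ".csv")
write_lines(c(
  "Rank,Name,Platform,Year,Publisher,NA_Sales,EU_Sales,JP_Sales",
  "1,Wii Sports,Wii,2006,Nintendo,41.5,29.0,3.77",
  "6,Tetris,GB,1989,Nintendo,23.2,2.26,4.22",
  "8,Wii Play,Wii,2006,Nintendo,14.0,9.2,2.93",
  "10,Duck Hunt,NES,1984,Nintendo,26.9,0.63,0.28"
), csv_address)

# Use the variable to create the data frame; Year is kept as text.
games <- read_csv(
  csv_address,
  col_types = cols(Year = col_character()),
  show_col_types = FALSE
)

# Show the first rows (up to 10).
head(games, 10)
```

With the real file, `csv_address` would simply hold the URL string, since `read_csv()` accepts a URL as well as a local path.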
# A tibble: 10 × 11
Rank Name Platform Year Genre Publisher NA_Sales EU_Sales JP_Sales
<dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 1 Wii Sports Wii 2006 Spor… Nintendo 41.5 29.0 3.77
2 2 Super Mario … NES 1985 Plat… Nintendo 29.1 3.58 6.81
3 3 Mario Kart W… Wii 2008 Raci… Nintendo 15.8 12.9 3.79
4 4 Wii Sports R… Wii 2009 Spor… Nintendo 15.8 11.0 3.28
5 5 Pokemon Red/… GB 1996 Role… Nintendo 11.3 8.89 10.2
6 6 Tetris GB 1989 Puzz… Nintendo 23.2 2.26 4.22
7 7 New Super Ma… DS 2006 Plat… Nintendo 11.4 9.23 6.5
8 8 Wii Play Wii 2006 Misc Nintendo 14.0 9.2 2.93
9 9 New Super Ma… Wii 2009 Plat… Nintendo 14.6 7.06 4.7
10 10 Duck Hunt NES 1984 Shoo… Nintendo 26.9 0.63 0.28
# ℹ 2 more variables: Other_Sales <dbl>, Global_Sales <dbl>
Reducing number of columns
The data is already sorted by Rank, which is determined by global sales. The top 10 best-selling games in this data are all published by Nintendo. First, I will create a new data frame with just the Rank, Name, Year, Publisher, and Global_Sales columns.
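A minimal sketch of this column reduction, assuming dplyr's `select()` is used. The names `games` and `games_small` are hypothetical, and the small `tribble()` is only a stand-in for the full data frame, built from two rows of the printed output.

```r
library(tidyverse)

# Illustrative stand-in for the full data frame (assumption): two rows copied
# from the printed output, with sales values as displayed (rounded).
games <- tribble(
  ~Rank, ~Name,        ~Platform, ~Year,  ~Publisher, ~NA_Sales, ~Global_Sales,
  1,     "Wii Sports", "Wii",     "2006", "Nintendo", 41.5,      82.7,
  6,     "Tetris",     "GB",      "1989", "Nintendo", 23.2,      30.3
)

# Keep only the five columns of interest in a new data frame.
games_small <- games |>
  select(Rank, Name, Year, Publisher, Global_Sales)

head(games_small)
```

Storing the result under a new name leaves the original data frame untouched, so the dropped columns are still available later if needed.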
# A tibble: 6 × 5
Rank Name Year Publisher Global_Sales
<dbl> <chr> <chr> <chr> <dbl>
1 1 Wii Sports 2006 Nintendo 82.7
2 2 Super Mario Bros. 1985 Nintendo 40.2
3 3 Mario Kart Wii 2008 Nintendo 35.8
4 4 Wii Sports Resort 2009 Nintendo 33
5 5 Pokemon Red/Pokemon Blue 1996 Nintendo 31.4
6 6 Tetris 1989 Nintendo 30.3
Filtering the data frame
I am interested in games that are not published by Nintendo, so I will filter out the Nintendo titles and use head() to show the top 10 that remain.
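One way to express this filter is sketched below, assuming dplyr's `filter()` with a `!=` comparison on Publisher. The names `games_small` and `non_nintendo` are hypothetical, and the sample rows are copied from the printed output, using only rows whose publisher name is shown in full.

```r
library(tidyverse)

# Illustrative stand-in for the reduced data frame (assumption): four rows
# copied from the printed output, with sales values as displayed (rounded).
games_small <- tribble(
  ~Rank, ~Name,                            ~Year,  ~Publisher,             ~Global_Sales,
  1,     "Wii Sports",                     "2006", "Nintendo",             82.7,
  6,     "Tetris",                         "1989", "Nintendo",             30.3,
  17,    "Grand Theft Auto V",             "2013", "Take-Two Interactive", 21.4,
  30,    "Call of Duty: Modern Warfare 3", "2011", "Activision",           14.8
)

# Keep only rows whose publisher is not Nintendo. Row order is preserved,
# so the result is still sorted by Rank.
non_nintendo <- games_small |>
  filter(Publisher != "Nintendo")

# Show the top rows (up to 10) of the filtered data.
head(non_nintendo, 10)
```

Note that `filter()` also drops any row where the condition evaluates to NA, so a game with a missing Publisher would not appear in the result.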
# A tibble: 10 × 5
Rank Name Year Publisher Global_Sales
<dbl> <chr> <chr> <chr> <dbl>
1 16 Kinect Adventures! 2010 Microsoft Game Studi… 21.8
2 17 Grand Theft Auto V 2013 Take-Two Interactive 21.4
3 18 Grand Theft Auto: San Andreas 2004 Take-Two Interactive 20.8
4 24 Grand Theft Auto V 2013 Take-Two Interactive 16.4
5 25 Grand Theft Auto: Vice City 2002 Take-Two Interactive 16.2
6 29 Gran Turismo 3: A-Spec 2001 Sony Computer Entert… 15.0
7 30 Call of Duty: Modern Warfare 3 2011 Activision 14.8
8 32 Call of Duty: Black Ops 2010 Activision 14.6
9 34 Call of Duty: Black Ops 3 2015 Activision 14.2
10 35 Call of Duty: Black Ops II 2012 Activision 14.0
Conclusion
This was a good way to set up the tools needed for these assignments, such as RStudio and GitHub. I was also able to load a data set I found online into a data frame. I condensed the data frame to make it easier to read and used a simple filter to show different results in the top 10 rows.