MATH2405 TP3, 2020

Setup

library(dplyr)
library(readr)
library(tidyr)
library(knitr)
library(magrittr)

Locate Data

This data set contains information related to e-sports, retrieved from Kaggle.

https://www.kaggle.com/rankirsh/esports-earnings/

Variables include:

Name of game
Release date of game
Genre of game
Total tournament earnings
Total number of e-sport players
Total number of e-sport tournaments

Read/Import Data

Esports <- read_csv("GeneralEsportData.csv", show_col_types = FALSE)
head(Esports)

Used read_csv function from package readr
Saved it as a data frame and named it Esports
Showed output using head()
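
The import step can be sketched on a small stand-in file, assuming the readr package is installed. The file contents and the name toy_path below are made up for illustration and are not the Kaggle data. The sketch declares column types explicitly with cols() rather than relying on type guessing.

```r
library(readr)

# Hypothetical two-row file standing in for GeneralEsportData.csv
toy_path <- tempfile(fileext = ".csv")
writeLines(c("Game,ReleaseDate,TotalEarnings",
             "Game A,1999,1500.50",
             "Game B,2005,0"),
           toy_path)

# Declare the expected type of each column instead of relying on guessing
toy <- read_csv(toy_path,
                col_types = cols(Game = col_character(),
                                 ReleaseDate = col_double(),
                                 TotalEarnings = col_double()))
head(toy)
```

Declaring the types up front means a column that arrives in an unexpected format is flagged at import rather than later in the analysis.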

Data description

This data set contains information related to e-sports.

Variables include:

Game - Name of game
ReleaseDate - Year that the game was released
Genre - Type of genre of game
TotalEarnings - Total earnings from e-sports tournaments of game
OnlineEarnings - Total earnings from e-sports online tournaments of game
TotalPlayers - Total number of professional players related to game
TotalTournaments - Total number of tournaments related to game

Source:

Ran, K, 2021, Esports Earnings 1998 - 2021, Kaggle, viewed 17 August 2021, <https://www.kaggle.com/rankirsh/esports-earnings/>

Inspect dataset and variables

# Check dimensions of the data frame
dim.data.frame(Esports)

## [1] 535   7

# Class of Variable "Game"
class(Esports$Game)

## [1] "character"

# Class of Variable "ReleaseDate"
class(Esports$ReleaseDate)

## [1] "numeric"

# Class of Variable "Genre"
class(Esports$Genre)

## [1] "character"

# Class of Variable "TotalEarnings"
class(Esports$TotalEarnings)

## [1] "numeric"

# Class of Variable "OnlineEarnings"
class(Esports$OnlineEarnings)

## [1] "numeric"

# Class of Variable "TotalPlayers"
class(Esports$TotalPlayers)

## [1] "numeric"

# Class of Variable "TotalTournaments"
class(Esports$TotalTournaments)

## [1] "numeric"

# Check column names in the data frame 
colnames(Esports)

## [1] "Game"             "ReleaseDate"      "Genre"            "TotalEarnings"   
## [5] "OnlineEarnings"   "TotalPlayers"     "TotalTournaments"

Checked dimensions of data frame with dim.data.frame()
Checked data types of all variables using class(). All variables in correct data type, no conversions needed.
No factor variables present/necessary
Checked column names using colnames()
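
The same checks can be done in fewer calls. The sketch below uses a small made-up data frame (the column names mirror the report, the values are invented) and is only one possible way to do it.

```r
# Small made-up data frame; values are invented for illustration
toy <- data.frame(Game = c("Game A", "Game B"),
                  ReleaseDate = c(1999, 2005),
                  Genre = c("Strategy", "Racing"),
                  stringsAsFactors = FALSE)

dim(toy)                           # number of rows and columns
col_classes <- sapply(toy, class)  # class of every column in one call
col_classes
str(toy)                           # compact overview of names, types and first values
```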

Tidy data

This data conforms to the tidy data principles. The data frame is in tidy format.

Each variable has its own column
Each observation has its own row
Each value has its own cell
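
For contrast, the sketch below shows what an untidy layout could look like and how tidyr's pivot_longer() would reshape it. The wide table is made up for illustration; the Esports data did not need this step.

```r
library(tidyr)

# Made-up wide table: one column per year, so the variable "Year" is stored in the headers
wide <- data.frame(Game = c("Game A", "Game B"),
                   `2019` = c(100, 50),
                   `2020` = c(200, 75),
                   check.names = FALSE)

# Reshape so that each row is one game-year observation
long <- pivot_longer(wide,
                     cols = c(`2019`, `2020`),
                     names_to = "Year",
                     values_to = "Earnings")
long
```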

Summary statistics

# Summary Stats of Total Earnings, grouped by Genre
Esports %>% 
  group_by(Genre) %>% 
  summarize(mean = mean(TotalEarnings),
            median = median(TotalEarnings),
            min = min(TotalEarnings),
            max = max(TotalEarnings),
            sd = sd(TotalEarnings))

# Summary Stats of Online Earnings, grouped by Genre
Esports %>% 
  group_by(Genre) %>% 
  summarize(mean = mean(OnlineEarnings),
            median = median(OnlineEarnings),
            min = min(OnlineEarnings),
            max = max(OnlineEarnings),
            sd = sd(OnlineEarnings))

# Summary Stats of Total Players, grouped by Genre
Esports %>% 
  group_by(Genre) %>% 
  summarize(mean = mean(TotalPlayers),
            median = median(TotalPlayers),
            min = min(TotalPlayers),
            max = max(TotalPlayers),
            sd = sd(TotalPlayers))

# Summary Stats of Total Tournaments, grouped by Genre
Esports %>% 
  group_by(Genre) %>% 
  summarize(mean = mean(TotalTournaments),
            median = median(TotalTournaments),
            min = min(TotalTournaments),
            max = max(TotalTournaments),
            sd = sd(TotalTournaments))

Grouped by the nominal variable “Genre” using group_by()
Provided summary statistics of numerical variables using summarize()
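
The four chunks above repeat the same pattern for each variable. A possible shortcut, assuming dplyr 1.0 or later, is across(), which applies the same functions to several columns in one summarize() call. The data frame toy and its values are invented for illustration.

```r
library(dplyr)

# Invented values; column names mirror the report
toy <- data.frame(Genre = c("A", "A", "B", "B"),
                  TotalEarnings = c(100, 300, 50, 150),
                  TotalPlayers = c(10, 30, 5, 15))

genre_stats <- toy %>%
  group_by(Genre) %>%
  summarize(across(c(TotalEarnings, TotalPlayers),
                   list(mean = function(x) mean(x, na.rm = TRUE),
                        sd   = function(x) sd(x, na.rm = TRUE)),
                   .names = "{.col}_{.fn}"))
genre_stats
```

The na.rm = TRUE argument is an addition that guards against missing values; the report's own chunks do not use it.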

Create a list

mylist <- list(c("Battle Royale", "Collectible Card Game", "Fighting Game",
                 "First-Person Shooter", "Multiplayer Online Battle Arena",
                 "Puzzle Game", "Racing", "Role-Playing Game", "Sports",
                 "Strategy", "Third-Person Shooter"),
               1:11)
mylist

## [[1]]
##  [1] "Battle Royale"                   "Collectible Card Game"          
##  [3] "Fighting Game"                   "First-Person Shooter"           
##  [5] "Multiplayer Online Battle Arena" "Puzzle Game"                    
##  [7] "Racing"                          "Role-Playing Game"              
##  [9] "Sports"                          "Strategy"                       
## [11] "Third-Person Shooter"           
## 
## [[2]]
##  [1]  1  2  3  4  5  6  7  8  9 10 11

Created a list using list() that holds the 11 genre names and a numeric ID (1 to 11) for each
This was done by combining two vectors in the list: one containing the genre names and one containing the numbers 1:11
Named it mylist and printed output
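
A variation on this step, shown only as a sketch: naming the list elements and generating the IDs with seq_along() instead of typing 1:11. The genre vector below is a shortened, illustrative subset and genre_list is a hypothetical name.

```r
# Shortened, illustrative subset of the genre names
genres <- c("Fighting Game", "Racing", "Strategy")

# Naming the elements makes later steps self-describing;
# seq_along() generates 1..n so the IDs always match the number of names
genre_list <- list(Genre = genres, GenreID = seq_along(genres))

genre_list$GenreID          # access an element by name
genre_list[["Genre"]][2]    # second genre name
as.data.frame(genre_list)   # columns are already named Genre and GenreID
```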

Join the list

mydf <- as.data.frame(mylist)
colnames(mydf) <- c("Genre", "GenreID")

Esports %<>%
  left_join(mydf)

## Joining, by = "Genre"

Esports

Created data frame from list and called it mydf
Gave mydf column names, “Genre” and “GenreID”
Joined mydf to original data frame Esports using left_join(), joining by “Genre”
Output shows Esports data frame with new column “GenreID”
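
A minimal sketch of the same kind of join with the key named explicitly and two quick checks afterwards. The data frames games and lookup are made up for illustration.

```r
library(dplyr)

# Made-up stand-ins for Esports and mydf
games <- data.frame(Game = c("Game A", "Game B", "Game C"),
                    Genre = c("Strategy", "Racing", "Strategy"))
lookup <- data.frame(Genre = c("Racing", "Strategy"),
                     GenreID = 1:2)

# Naming the key explicitly avoids relying on the automatic "Joining, by" guess
joined <- left_join(games, lookup, by = "Genre")

nrow(joined) == nrow(games)  # a left join on a unique key keeps the row count
anyNA(joined$GenreID)        # TRUE would flag genres missing from the lookup
joined
```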

Subsetting I

tenobs <- Esports[1:10, ]
my_mat <- as.matrix(tenobs)
str(my_mat)

##  chr [1:10, 1:8] "Age of Empires" "Age of Empires II" "Age of Empires III" ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:8] "Game" "ReleaseDate" "Genre" "TotalEarnings" ...

Subsetted Esports using the first 10 observations and named it tenobs
Converted tenobs into matrix using as.matrix() and named it my_mat
Checked structure of my_mat using str()
Found that my_mat is a character matrix
This is because a matrix is homogeneous: every element must have the same type. as.matrix() returns a character matrix whenever the data frame contains a column that is not numeric, logical or complex. Here the character columns Game and Genre force every value, including the numbers, to be stored as character
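
The coercion can be seen on a small made-up data frame: with a character column included the matrix is character, and with only numeric columns it stays numeric. The names toy, mixed_mat and numeric_mat are illustrative.

```r
# Invented values; one character column and two numeric columns
toy <- data.frame(Game = c("Game A", "Game B"),
                  ReleaseDate = c(1999, 2005),
                  TotalPlayers = c(120, 45))

# Includes the character column Game, so everything is coerced to character
mixed_mat <- as.matrix(toy)

# Numeric columns only, so the matrix keeps its numeric type
numeric_mat <- as.matrix(toy[, c("ReleaseDate", "TotalPlayers")])

typeof(mixed_mat)
typeof(numeric_mat)
```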

Subsetting II

twovar <- Esports[, c(1, 8)]
save(twovar, file = "First and Last Variable.RData")

Subsetted Esports using the first and last variable and named it twovar
Used save() function and named it “First and Last Variable.RData” to save it as a .RData file
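
A sketch of the same step that avoids hard-coding the position of the last column and confirms the saved file can be read back. The data frame and the temporary file path are made up for illustration.

```r
# Invented three-column data frame
toy <- data.frame(Game = c("Game A", "Game B"),
                  Genre = c("Strategy", "Racing"),
                  GenreID = c(2L, 1L))

# First and last column, without hard-coding the position of the last one
first_last <- toy[, c(1, ncol(toy))]

# Save to a temporary .RData file, then load it into a separate environment
rdata_path <- tempfile(fileext = ".RData")
save(first_last, file = rdata_path)

restored <- new.env()
load(rdata_path, envir = restored)
identical(restored$first_last, first_last)
```

Loading into a separate environment keeps the restored copy from overwriting the object of the same name in the workspace.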

Create a new Data Frame

Score1st <- c(98, 72, 65, 51, 6)
Grade <- factor(c("High Distinction", "Distinction", "Credit", "Pass", "Fail"),
                levels = c("Fail", "Pass", "Credit", "Distinction", "High Distinction"))
Results <- data.frame(Score1st, Grade)
Results

str(Results)

## 'data.frame':    5 obs. of  2 variables:
##  $ Score1st: num  98 72 65 51 6
##  $ Grade   : Factor w/ 5 levels "Fail","Pass",..: 5 4 3 2 1

Score2nd <- c(99, 73, 63, 52, 41)
Results2 <- cbind(Results, Score2nd)
Results2

str(Results2)

## 'data.frame':    5 obs. of  3 variables:
##  $ Score1st: num  98 72 65 51 6
##  $ Grade   : Factor w/ 5 levels "Fail","Pass",..: 5 4 3 2 1
##  $ Score2nd: num  99 73 63 52 41

dim.data.frame(Results2)

## [1] 5 3

Created vector and named it Score1st
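Created factor vector and named it Grade, with the levels argument setting the level order from “Fail” to “High Distinction” (str() shows a plain Factor, not an ordered factor)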
Created data frame using above vectors and named it Results
Showed structure of Results with str()
Created another numeric vector and named it Score2nd
Added Score2nd to Results using cbind(), named it Results2
Checked attributes and structure with str(Results2) and dim.data.frame(Results2)
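
If the grades should also be comparable by rank, one option is an ordered factor. This is a sketch of that alternative, not what the chunk above does; grade_ord and below_credit are hypothetical names.

```r
grade_levels <- c("Fail", "Pass", "Credit", "Distinction", "High Distinction")

# ordered = TRUE tells R the levels have a rank, not just a display order
grade_ord <- factor(c("High Distinction", "Pass", "Fail"),
                    levels = grade_levels,
                    ordered = TRUE)

is.ordered(grade_ord)
below_credit <- grade_ord < "Credit"   # comparisons are defined for ordered factors
below_credit
as.character(max(grade_ord))           # highest grade present
```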

Create another Data Frame

Create another data frame with a common variable to the dataset created in step 11.

Join the data frame to the dataset above and ensure that it is joined properly, so that the new categorical variable is carried over to the larger dataset. E.g. a dataset to join could be State, Abbreviation, Municipality, Prevailing Religion.
Provide the R codes with outputs and explain everything that you do in this step.

# Create another data frame that shares the Grade variable with Results2
StudentName <- c("Jessica", "Henry", "Peter", "Samantha", "Jeff")
Results3 <- data.frame(StudentName, Grade)
Results3

Results4 <- left_join(Results2, Results3)

## Joining, by = "Grade"

Results5 <- Results4[, c(4, 2, 1, 3)]
Results5

Created another vector StudentName containing names of various students
Created data frame and named it Results3, using vectors StudentName and Grade (Grade being the common variable also in step 11)
Joined using left_join(), joined by “Grade”. Named new data frame Results4
Ordered the columns to make it look nicer, named new data frame Results5
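
A sketch of how the join could be checked and the columns reordered by name rather than by position. The data frames scores and students and the student names are made up for illustration.

```r
library(dplyr)

# Made-up stand-ins for Results2 and Results3
scores <- data.frame(Score1st = c(98, 72, 65),
                     Grade = c("High Distinction", "Distinction", "Credit"))
students <- data.frame(StudentName = c("Student A", "Student B", "Student C"),
                       Grade = c("High Distinction", "Distinction", "Credit"))

# The join key should be unique in the right-hand table,
# otherwise left_join() duplicates rows of the left-hand table
stopifnot(anyDuplicated(students$Grade) == 0)

combined <- left_join(scores, students, by = "Grade") %>%
  select(StudentName, Grade, Score1st)   # reorder columns by name, not position
combined
```

Selecting by name keeps the reordering correct even if a column is later added to either table.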