Download and load the Superheroes Dataset (heroes_information.csv) into a data frame.
Use any of the tools you currently have from Base R, the tidyverse and ggplot2 to answer the questions below.
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.3
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.8
## v tidyr 1.2.0 v stringr 1.4.0
## v readr 2.1.2 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
heroes_info <- read.csv("heroes_information.csv")
#heroes_info
class(heroes_info)
## [1] "data.frame"
dim(heroes_info)
## [1] 734 11
summary(heroes_info)
## X name Gender Eye.color
## Min. : 0.0 Length:734 Length:734 Length:734
## 1st Qu.:183.2 Class :character Class :character Class :character
## Median :366.5 Mode :character Mode :character Mode :character
## Mean :366.5
## 3rd Qu.:549.8
## Max. :733.0
##
## Race Hair.color Height Publisher
## Length:734 Length:734 Min. :-99.0 Length:734
## Class :character Class :character 1st Qu.:-99.0 Class :character
## Mode :character Mode :character Median :175.0 Mode :character
## Mean :102.3
## 3rd Qu.:185.0
## Max. :975.0
##
## Skin.color Alignment Weight
## Length:734 Length:734 Min. :-99.00
## Class :character Class :character 1st Qu.:-99.00
## Mode :character Mode :character Median : 62.00
## Mean : 43.86
## 3rd Qu.: 90.00
## Max. :900.00
## NA's :2
str(heroes_info)
## 'data.frame': 734 obs. of 11 variables:
## $ X : int 0 1 2 3 4 5 6 7 8 9 ...
## $ name : chr "A-Bomb" "Abe Sapien" "Abin Sur" "Abomination" ...
## $ Gender : chr "Male" "Male" "Male" "Male" ...
## $ Eye.color : chr "yellow" "blue" "blue" "green" ...
## $ Race : chr "Human" "Icthyo Sapien" "Ungaran" "Human / Radiation" ...
## $ Hair.color: chr "No Hair" "No Hair" "No Hair" "No Hair" ...
## $ Height : num 203 191 185 203 -99 193 -99 185 173 178 ...
## $ Publisher : chr "Marvel Comics" "Dark Horse Comics" "DC Comics" "Marvel Comics" ...
## $ Skin.color: chr "-" "blue" "red" "-" ...
## $ Alignment : chr "good" "good" "good" "bad" ...
## $ Weight : num 441 65 90 441 -99 122 -99 88 61 81 ...
heroes_info <- as_tibble(heroes_info)
heroes_info <- select(heroes_info, -X)
sum(is.na(heroes_info))
## [1] 2
heroes_info[heroes_info == "-"] <- NA
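The summary above shows -99 as the minimum Height and Weight, which looks like a placeholder for a missing measurement rather than a real value. Below is a minimal sketch, assuming dplyr is available, of recoding both the "-" and -99 placeholders as NA in one pass. The `toy` tibble is made up for illustration and is not part of the dataset.

```r
library(dplyr)

# Hypothetical toy data mimicking the placeholders seen in heroes_information.csv
toy <- tibble(
  name   = c("A", "B", "C"),
  Race   = c("Human", "-", "Mutant"),
  Height = c(203, -99, 185),
  Weight = c(441, -99, 90)
)

toy_clean <- toy %>%
  # "-" marks a missing value in the character columns
  mutate(across(where(is.character), ~ na_if(.x, "-"))) %>%
  # -99 is assumed to mark a missing measurement in the numeric columns
  mutate(across(c(Height, Weight), ~ na_if(.x, -99)))

# Number of missing values per column after recoding
colSums(is.na(toy_clean))
```

Unlike filtering on `Height > 0`, this keeps every row, so later summaries would need `na.rm = TRUE`.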
group_race <- heroes_info %>%
group_by(Race) %>%
summarise(n = n())
group_race %>% arrange(desc(n))
## # A tibble: 62 x 2
## Race n
## <chr> <int>
## 1 <NA> 304
## 2 Human 208
## 3 Mutant 63
## 4 God / Eternal 14
## 5 Cyborg 11
## 6 Human / Radiation 11
## 7 Android 9
## 8 Symbiote 9
## 9 Alien 7
## 10 Kryptonian 7
## # ... with 52 more rows
table(heroes_info$Gender)
##
## Female Male
## 200 505
The table below shows that good superheroes outnumber bad ones among both males and females. Males also outnumber females in every alignment category (bad, good and neutral).
Gender_x_Alignment <- table(heroes_info$Gender, heroes_info$Alignment)
Gender_x_Alignment
##
## bad good neutral
## Female 35 161 4
## Male 165 316 18
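Because males outnumber females overall, the raw counts are hard to compare across genders. A minimal sketch, re-entering the counts from the table above by hand, converts each row to proportions with `prop.table`:

```r
# Counts copied from the Gender x Alignment table above
gender_alignment <- matrix(
  c(35, 161, 4,
    165, 316, 18),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Female", "Male"),
                  Alignment = c("bad", "good", "neutral"))
)

# margin = 1 makes each row (gender) sum to 1
gender_share <- round(prop.table(gender_alignment, margin = 1), 3)
gender_share
```

On these counts, about 80% of female characters are good compared with about 63% of male characters.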
# Drop rows where Height or Weight is missing or uses the -99 placeholder
heroes_info <- heroes_info %>% filter(Height > 0, Weight > 0)
heroes_info
## # A tibble: 490 x 10
## name Gender Eye.color Race Hair.color Height Publisher Skin.color Alignment
## <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr>
## 1 A-Bo~ Male yellow Human No Hair 203 Marvel C~ <NA> good
## 2 Abe ~ Male blue Icth~ No Hair 191 Dark Hor~ blue good
## 3 Abin~ Male blue Unga~ No Hair 185 DC Comics red good
## 4 Abom~ Male green Huma~ No Hair 203 Marvel C~ <NA> bad
## 5 Abso~ Male blue Human No Hair 193 Marvel C~ <NA> bad
## 6 Adam~ Male blue Human Blond 185 DC Comics <NA> good
## 7 Agen~ Female blue <NA> Blond 173 Marvel C~ <NA> good
## 8 Agen~ Male brown Human Brown 178 Marvel C~ <NA> good
## 9 Agen~ Male <NA> <NA> <NA> 191 Marvel C~ <NA> good
## 10 Air-~ Male blue <NA> White 188 Marvel C~ <NA> bad
## # ... with 480 more rows, and 1 more variable: Weight <dbl>
#Weight
mean(heroes_info$Weight, na.rm = TRUE)
## [1] 112.1796
median(heroes_info$Weight, na.rm = TRUE)
## [1] 81
sd(heroes_info$Weight, na.rm = TRUE)
## [1] 104.4227
#Height
mean(heroes_info$Height, na.rm = TRUE)
## [1] 187.1239
median(heroes_info$Height, na.rm = TRUE)
## [1] 183
sd(heroes_info$Height, na.rm = TRUE)
## [1] 58.99002
hist(heroes_info$Weight, main = "Distribution of Superhero Weights",
xlab = "Weight", xlim = c(0, 1100),
col = "lightskyblue"
)
plot(heroes_info$Weight, heroes_info$Height , main = "Height vs. Weight",
xlab = "Weight", ylab = "Height")
abline(lm(Height ~ Weight, data = heroes_info), col = "blue")
The boxplot of weight by alignment shows that good superheroes have a slightly lower median weight than bad and neutral ones. Weights are most dispersed for neutral superheroes and least dispersed for good ones, although the good group contains the largest outliers. The minimum weight among bad superheroes is higher than in the other groups.
fivenum(heroes_info$Height)
## [1] 15.2 173.0 183.0 188.0 975.0
fivenum(heroes_info$Weight)
## [1] 4 61 81 106 900
boxplot(Weight ~ Alignment, data = heroes_info)
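The group differences read off the boxplot can also be checked numerically. Below is a minimal sketch, assuming dplyr and using a made-up `toy` tibble in place of `heroes_info`, that computes the median, interquartile range and extremes of weight for each alignment.

```r
library(dplyr)

# Hypothetical data for illustration only, not taken from the dataset
toy <- tibble(
  Alignment = c("good", "good", "good", "bad", "bad", "bad", "neutral", "neutral"),
  Weight    = c(60, 70, 400, 80, 95, 110, 50, 300)
)

weight_by_alignment <- toy %>%
  group_by(Alignment) %>%
  summarise(
    n      = n(),            # group size
    median = median(Weight), # centre line of the box
    iqr    = IQR(Weight),    # height of the box
    min    = min(Weight),
    max    = max(Weight)
  )

weight_by_alignment
```

Running the same pipeline on `heroes_info` would give the figures behind each box, which is a more reliable basis for comparing spread than reading the plot by eye.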
#heroes_info %>% group_by(Publisher)
heroes_info %>% count(Publisher, sort = TRUE)
## # A tibble: 10 x 2
## Publisher n
## <chr> <int>
## 1 "Marvel Comics" 318
## 2 "DC Comics" 143
## 3 "Dark Horse Comics" 11
## 4 "George Lucas" 5
## 5 "Shueisha" 4
## 6 "Team Epic TV" 4
## 7 "Star Trek" 2
## 8 "" 1
## 9 "Image Comics" 1
## 10 "Sony Pictures" 1
# Keep only the two largest publishers; filter() has no na.rm argument
data_group <- filter(heroes_info, Publisher %in% c("Marvel Comics", "DC Comics"))
con_tab <- table(data_group$Publisher, data_group$Alignment)
con_tab
##
## bad good neutral
## DC Comics 40 94 8
## Marvel Comics 92 215 9
prop.table(con_tab, 1)
##
## bad good neutral
## DC Comics 0.28169014 0.66197183 0.05633803
## Marvel Comics 0.29113924 0.68037975 0.02848101