23,272 characters.
# Load packages
library(dplyr) # for dplyr functions such as glimpse(), mutate(), and filter()
library(ggplot2) # for ggplot2 functions such as ggplot()
# Import data
comics <- read.csv("/resources/rstudio/BusinessStatistics/Data/comics.csv",
                   stringsAsFactors = TRUE) # keep text columns as factors (no longer the default in R >= 4.0)
# Convert data to a tibble (as_tibble() replaces the deprecated tbl_df())
comics <- as_tibble(comics)
str(comics)
## Classes 'tbl_df', 'tbl' and 'data.frame': 23272 obs. of 11 variables:
## $ name : Factor w/ 23272 levels "'Spinner (Earth-616)",..: 19833 3335 22769 9647 20956 2220 17576 9346 18794 10957 ...
## $ id : Factor w/ 4 levels "No Dual","Public",..: 3 2 2 2 1 2 2 2 2 2 ...
## $ align : Factor w/ 4 levels "Bad","Good","Neutral",..: 2 2 3 2 2 2 2 2 3 2 ...
## $ eye : Factor w/ 26 levels "Amber Eyes","Auburn Hair",..: 11 5 5 5 5 5 6 6 6 5 ...
## $ hair : Factor w/ 28 levels "Auburn Hair",..: 7 27 3 3 4 14 7 7 7 4 ...
## $ gender : Factor w/ 3 levels "Female","Male",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ gsm : Factor w/ 6 levels "Bisexual Characters",..: NA NA NA NA NA NA NA NA NA NA ...
## $ alive : Factor w/ 2 levels "Deceased Characters",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ appearances : int 4043 3360 3061 2961 2258 2255 2072 2017 1955 1934 ...
## $ first_appear: Factor w/ 2328 levels "1-Apr","1-Aug",..: 1772 2074 2255 2089 2185 2192 2192 2139 2292 2192 ...
## $ publisher : Factor w/ 2 levels "dc","marvel": 2 2 2 2 2 2 2 2 2 2 ...
Good, Bad, Neutral, and Reformed Criminals.
# Create a 2-way contingency table
tab <- table(comics$align, comics$id)
# Print tab
tab
##
## No Dual Public Secret Unknown
## Bad 474 2172 4493 7
## Good 647 2930 2475 0
## Neutral 390 965 959 2
## Reformed Criminals 0 1 1 0
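Raw counts are hard to compare across rows because the alignment groups differ in size. As a minimal sketch, the printed table can be rebuilt by hand (so it runs without comics.csv) and converted to row proportions with prop.table(). The object names tab_counts and row_props are illustrative and not part of the original script.

```r
# Rebuild the align-by-id counts printed above (values copied from that output)
tab_counts <- as.table(matrix(
  c(474, 2172, 4493, 7,
    647, 2930, 2475, 0,
    390,  965,  959, 2,
      0,    1,    1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    align = c("Bad", "Good", "Neutral", "Reformed Criminals"),
    id    = c("No Dual", "Public", "Secret", "Unknown")
  )
))

# Share of each identity type within each alignment (each row sums to 1)
row_props <- prop.table(tab_counts, margin = 1)
round(row_props, 3)
```

With the real data, the same result should come from prop.table(tab, 1) applied to the table created above.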
1,964 males with blue eyes.
# Check the levels of gender
levels(comics$gender)
## [1] "Female" "Male" "Other"
# Create a 2-way contingency table
tab <- table(comics$gender, comics$eye)
# Print tab
tab
##
## Amber Eyes Auburn Hair Black Eyeballs Black Eyes Blue Eyes
## Female 6 6 2 221 1086
## Male 9 1 1 718 1964
## Other 0 0 0 5 1
##
## Brown Eyes Compound Eyes Gold Eyes Green Eyes Grey Eyes
## Female 828 0 8 436 33
## Male 1966 1 14 447 102
## Other 3 0 0 5 0
##
## Hazel Eyes Magenta Eyes Multiple Eyes No Eyes One Eye Orange Eyes
## Female 27 2 1 1 1 9
## Male 71 0 6 3 20 24
## Other 0 0 0 0 0 1
##
## Photocellular Eyes Pink Eyes Purple Eyes Red Eyes Silver Eyes
## Female 11 9 23 125 6
## Male 32 17 20 531 4
## Other 3 0 1 4 0
##
## Variable Eyes Violet Eyes White Eyes Yellow Eyeballs Yellow Eyes
## Female 7 19 116 2 73
## Male 28 4 374 4 239
## Other 5 0 10 0 5
Black Hair, with 5,329 characters (the most common value after the missing values).
comics %>%
  count(hair, sort = TRUE)
## # A tibble: 29 x 2
## hair n
## <fct> <int>
## 1 <NA> 6538
## 2 Black Hair 5329
## 3 Brown Hair 3487
## 4 Blond Hair 2326
## 5 No Hair 1176
## 6 White Hair 1100
## 7 Red Hair 1081
## 8 Bald 838
## 9 Grey Hair 688
## 10 Green Hair 159
## # ... with 19 more rows
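In the output above, the first row is <NA>, so the most common recorded hair color is on the second row. One possible way to avoid that, sketched here on a small made-up data frame (toy and hair_counts are hypothetical names, not part of the assignment), is to drop missing values before counting.

```r
library(dplyr)

# Hypothetical toy data standing in for the hair column of comics
toy <- tibble(hair = c("Black Hair", NA, "Brown Hair", "Black Hair", NA, NA))

# Remove missing values first so <NA> cannot rank at the top
hair_counts <- toy %>%
  filter(!is.na(hair)) %>%
  count(hair, sort = TRUE)
hair_counts
```

Applied to the real data, the same filter() step would go between comics and count().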
Good is the most common align value for female characters.
# Plot counts of align within each gender
ggplot(comics, aes(x = gender, fill = align)) +
  geom_bar()
Female
The code is the same as in Q4 with one difference: geom_bar(position = "fill").
# Plot proportion of align, conditional on gender
ggplot(comics, aes(x = gender, fill = align)) +
  geom_bar(position = "fill") # position = "fill" scales each bar to height 1, so segments show proportions
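The filled bar chart shows proportions only visually. A rough sketch of how the same conditional proportions could be computed as numbers follows, using a small invented data frame (toy and align_props are hypothetical names) rather than the comics data.

```r
library(dplyr)

# Hypothetical toy data with the same two columns used in the plot
toy <- tibble(
  gender = c("Female", "Female", "Female", "Male", "Male", "Male", "Male"),
  align  = c("Good", "Good", "Bad", "Good", "Bad", "Bad", "Bad")
)

# Count each gender/align pair, then divide by the total within each gender
align_props <- toy %>%
  count(gender, align) %>%
  group_by(gender) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()
align_props
```

Within each gender the prop values sum to 1, which is what position = "fill" draws as bar segments.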
Revise the filter code below.
# Original filter: keep only male and female characters
comics_filtered <-
  comics %>%
  filter(gender %in% c("Male", "Female"))
# Revised filter: keep only good and bad characters
comics_filtered <-
  comics %>%
  filter(align %in% c("Good", "Bad"))
# Plot proportion of gender, conditional on align
ggplot(comics_filtered, aes(x = align, fill = gender)) +
  geom_bar(position = "fill") # position = "fill" scales each bar to height 1, so segments show proportions
Make sure that you have the three listed variables in the chart. Swap variables to different axes and see if it tells you different stories. Briefly discuss the most interesting story you found about the three variables.
# Plot of alignment, filled by identity type, with one panel per gender
ggplot(comics, aes(x = align, fill = id)) +
  geom_bar() +
  facet_wrap(~ gender)
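To try the suggested swap of variables, one possible variant puts id on the x-axis and uses align as the fill, with position = "fill" so the panels compare proportions rather than counts. The sketch below uses a small invented data frame (toy is a hypothetical stand-in for comics) so that it runs on its own.

```r
library(ggplot2)

# Hypothetical toy data with the three variables of interest
toy <- data.frame(
  gender = c("Female", "Female", "Male", "Male", "Male", "Female"),
  align  = c("Good", "Bad", "Bad", "Good", "Bad", "Good"),
  id     = c("Public", "Secret", "Secret", "Public", "Secret", "Secret")
)

# Swapped roles: identity type on the x-axis, alignment as the fill
p <- ggplot(toy, aes(x = id, fill = align)) +
  geom_bar(position = "fill") +
  facet_wrap(~ gender)
p
```

Replacing toy with comics gives the swapped view of the real data.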