23,272 characters
# Load packages
library(dplyr) # for dplyr functions such as glimpse(), mutate(), and filter()
library(ggplot2) # for ggplot2 functions such as ggplot()
# Import data
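# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0)
comics <- read.csv("/resources/rstudio/Business Statistics/data/comics.csv", stringsAsFactors = TRUE)
# Convert data to a tibble (tbl_df() is deprecated in favor of as_tibble())
comics <- as_tibble(comics)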
str(comics)
## Classes 'tbl_df', 'tbl' and 'data.frame': 23272 obs. of 11 variables:
## $ name : Factor w/ 23272 levels "'Spinner (Earth-616)",..: 19833 3335 22769 9647 20956 2220 17576 9346 18794 10957 ...
## $ id : Factor w/ 4 levels "No Dual","Public",..: 3 2 2 2 1 2 2 2 2 2 ...
## $ align : Factor w/ 4 levels "Bad","Good","Neutral",..: 2 2 3 2 2 2 2 2 3 2 ...
## $ eye : Factor w/ 26 levels "Amber Eyes","Auburn Hair",..: 11 5 5 5 5 5 6 6 6 5 ...
## $ hair : Factor w/ 28 levels "Auburn Hair",..: 7 27 3 3 4 14 7 7 7 4 ...
## $ gender : Factor w/ 3 levels "Female","Male",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ gsm : Factor w/ 6 levels "Bisexual Characters",..: NA NA NA NA NA NA NA NA NA NA ...
## $ alive : Factor w/ 2 levels "Deceased Characters",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ appearances : int 4043 3360 3061 2961 2258 2255 2072 2017 1955 1934 ...
## $ first_appear: Factor w/ 2328 levels "1-Apr","1-Aug",..: 1772 2074 2255 2089 2185 2192 2192 2139 2292 2192 ...
## $ publisher : Factor w/ 2 levels "dc","marvel": 2 2 2 2 2 2 2 2 2 2 ...
Revise the level code below so that R returns all levels (values) in the id variable.
Bad, Good, Neutral, and Reformed Criminals
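The levels of a factor can be listed with `levels()`. A minimal sketch on made-up data (the `toy` data frame and its values are assumptions for illustration, not the real comics data):

```r
# Hypothetical toy data standing in for comics; values are illustrative only
toy <- data.frame(
  align = factor(c("Good", "Bad", "Neutral", "Good", "Reformed Criminals"))
)

# levels() returns every category of a factor, sorted alphabetically by default
align_levels <- levels(toy$align)
print(align_levels)
```

On the real data, the same call would be applied to the column of interest, for example `levels(comics$align)`.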
# Create a 2-way contingency table
tab <- table(comics$align, comics$id)
# Print tab
tab
##
##                       No Dual Public Secret Unknown
##   Bad                    474   2172   4493       7
##   Good                   647   2930   2475       0
##   Neutral                390    965    959       2
##   Reformed Criminals       0      1      1       0
Revise the table code below so that R returns the answer for the question.
1,964 male characters have blue eyes
# Check the levels of gender
levels(comics$gender)
## [1] "Female" "Male" "Other"
# Create a 2-way contingency table
tab <- table(comics$eye, comics$gender)
# Print tab
tab
##
##                       Female Male Other
##   Amber Eyes              6    9     0
##   Auburn Hair             6    1     0
##   Black Eyeballs          2    1     0
##   Black Eyes            221  718     5
##   Blue Eyes            1086 1964     1
##   Brown Eyes            828 1966     3
##   Compound Eyes           0    1     0
##   Gold Eyes               8   14     0
##   Green Eyes            436  447     5
##   Grey Eyes              33  102     0
##   Hazel Eyes             27   71     0
##   Magenta Eyes            2    0     0
##   Multiple Eyes           1    6     0
##   No Eyes                 1    3     0
##   One Eye                 1   20     0
##   Orange Eyes             9   24     1
##   Photocellular Eyes     11   32     3
##   Pink Eyes               9   17     0
##   Purple Eyes            23   20     1
##   Red Eyes              125  531     4
##   Silver Eyes             6    4     0
##   Variable Eyes           7   28     5
##   Violet Eyes            19    4     0
##   White Eyes            116  374    10
##   Yellow Eyeballs         2    4     0
##   Yellow Eyes            73  239     5
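Rather than scanning the printout, a single cell of a two-way table can be read by row and column name. A small sketch, assuming made-up data (`toy` and `tab_toy` are hypothetical names, not part of the original analysis):

```r
# Hypothetical toy data; not the real comics data
toy <- data.frame(
  eye    = factor(c("Blue Eyes", "Blue Eyes", "Brown Eyes", "Blue Eyes", "Green Eyes")),
  gender = factor(c("Male", "Female", "Male", "Male", "Female"))
)

# Rows are eye colors, columns are genders, as in the table above
tab_toy <- table(toy$eye, toy$gender)

# Index by row name and column name to pull out one count
blue_male <- tab_toy["Blue Eyes", "Male"]
print(blue_male)
```

Applied to the table printed above, `tab["Blue Eyes", "Male"]` would pick out the Blue Eyes / Male cell.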
# Remove align level
comics <- comics %>%
  filter(align != "Reformed Criminals") %>%
  droplevels()
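Filtering rows does not by itself remove a level from a factor, which is why `droplevels()` follows the filter. A minimal sketch of the difference, assuming made-up data and using base R subsetting in place of `filter()` so it runs without extra packages:

```r
# Hypothetical toy data; values are illustrative only
toy <- data.frame(
  align = factor(c("Good", "Bad", "Reformed Criminals", "Good")),
  n     = 1:4
)

# Remove the rows, but the now-unused level is still attached to the factor
kept <- toy[toy$align != "Reformed Criminals", ]
print(levels(kept$align))

# droplevels() discards levels that no longer occur in the data
cleaned <- droplevels(kept)
print(levels(cleaned$align))
```

Without this step, the empty level would still show up as an all-zero row in tables and as an empty category in plots.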
Revise the barchart code below to find the answer.
Black is the most common hair color
comics %>%
  count(hair, sort = TRUE)
## # A tibble: 29 x 2
## hair n
## <fct> <int>
## 1 <NA> 5233
## 2 Black Hair 4706
## 3 Brown Hair 3009
## 4 Blond Hair 2008
## 5 No Hair 1049
## 6 White Hair 975
## 7 Red Hair 933
## 8 Bald 748
## 9 Grey Hair 577
## 10 Green Hair 138
## # ... with 19 more rows
# Create plot of hair
ggplot(comics, aes(x = hair)) +
  geom_bar()
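The most common non-missing value can also be found without reading the chart. A sketch in base R on made-up data (the `hair` vector here is hypothetical), relying on the fact that `table()` leaves out `NA` by default:

```r
# Hypothetical toy vector with missing values; not the real comics data
hair <- factor(c("Black Hair", NA, "Brown Hair", "Black Hair", NA, NA, "Blond Hair"))

# table() counts each level and ignores NA unless told otherwise
counts <- table(hair)

# which.max() gives the position of the largest count; names() recovers its label
most_common <- names(which.max(counts))
print(most_common)
```

This sidesteps the `<NA>` row that tops the sorted count above, since missing values are not a hair color.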
Map gender to x-axis and align to color in the barchart code below.
Good is the most common align value in female characters
# Plot counts of align within each gender
ggplot(comics, aes(x = gender, fill = align)) +
  geom_bar()
The code should be the same as in Q4 with only one difference, geom_bar(position = "fill").
Female has the greatest proportion
# Plot proportion of align, conditional on gender
ggplot(comics, aes(x = gender, fill = align)) +
  geom_bar(position = "fill") # position = "fill" stacks the bars and rescales each to a total of 1
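The proportions that a filled bar chart displays can be checked numerically with `prop.table()`. A minimal sketch, assuming made-up data (`toy`, `tab_toy`, and `props` are hypothetical names):

```r
# Hypothetical toy data; not the real comics data
toy <- data.frame(
  gender = factor(c("Female", "Female", "Female", "Male", "Male", "Male", "Male")),
  align  = factor(c("Good", "Good", "Bad", "Good", "Bad", "Bad", "Bad"))
)

# Rows are align, columns are gender
tab_toy <- table(toy$align, toy$gender)

# margin = 2 divides each column by its total, so every gender column sums to 1
props <- prop.table(tab_toy, margin = 2)
print(round(props, 2))
```

Each column of `props` corresponds to one bar in the filled chart, and each row to one colored segment of that bar.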
Revise the filter code below.
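# Keep only male and female characters
comics_filtered <-
  comics %>%
  filter(gender %in% c("Male", "Female"))
# Of those, keep only good and bad characters
# (start from comics_filtered so the gender filter is not overwritten)
comics_filtered <-
  comics_filtered %>%
  filter(align %in% c("Good", "Bad"))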
# Plot proportion of gender, conditional on align
ggplot(comics_filtered, aes(x = align, fill = gender)) +
  geom_bar(position = "fill") # position = "fill" stacks the bars and rescales each to a total of 1
Make sure that you have the three listed variables in the chart. Swap variables to different axes and see if it tells you different stories. Briefly discuss the most interesting story you found about the three variables.
# Plot of alignment, filled by identity type and faceted by gender
ggplot(comics, aes(x = align, fill = id)) +
  geom_bar() +
  facet_wrap(~ gender)