Q1 How many comic book characters are there?

23,272 characters

# Load dplyr and ggplot2 packages
library(dplyr)   # dplyr functions such as glimpse(), mutate(), and filter()
library(ggplot2) # ggplot2 functions such as ggplot()

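# Import data; stringsAsFactors = TRUE keeps text columns as factors (the default
# became FALSE in R 4.0.0), which the later levels() and droplevels() calls rely on
comics <- read.csv("/resources/rstudio/Business Statistics/data/comics.csv", stringsAsFactors = TRUE)

# Convert data to a tibble (as_tibble() replaces the deprecated tbl_df())
comics <- as_tibble(comics)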
str(comics)
## Classes 'tbl_df', 'tbl' and 'data.frame':    23272 obs. of  11 variables:
##  $ name        : Factor w/ 23272 levels "'Spinner (Earth-616)",..: 19833 3335 22769 9647 20956 2220 17576 9346 18794 10957 ...
##  $ id          : Factor w/ 4 levels "No Dual","Public",..: 3 2 2 2 1 2 2 2 2 2 ...
##  $ align       : Factor w/ 4 levels "Bad","Good","Neutral",..: 2 2 3 2 2 2 2 2 3 2 ...
##  $ eye         : Factor w/ 26 levels "Amber Eyes","Auburn Hair",..: 11 5 5 5 5 5 6 6 6 5 ...
##  $ hair        : Factor w/ 28 levels "Auburn Hair",..: 7 27 3 3 4 14 7 7 7 4 ...
##  $ gender      : Factor w/ 3 levels "Female","Male",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ gsm         : Factor w/ 6 levels "Bisexual Characters",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ alive       : Factor w/ 2 levels "Deceased Characters",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ appearances : int  4043 3360 3061 2961 2258 2255 2072 2017 1955 1934 ...
##  $ first_appear: Factor w/ 2328 levels "1-Apr","1-Aug",..: 1772 2074 2255 2089 2185 2192 2192 2139 2292 2192 ...
##  $ publisher   : Factor w/ 2 levels "dc","marvel": 2 2 2 2 2 2 2 2 2 2 ...

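The same count can be read directly with nrow(), since the data has one row per character. A minimal sketch on a hypothetical three-row stand-in (the real CSV path only exists on the course server), so the names below are illustrative only:

```r
# Hypothetical stand-in for the comics data frame (not the real dataset)
toy <- data.frame(
  name      = c("Character A", "Character B", "Character C"),
  publisher = c("marvel", "dc", "marvel")
)

# One row per character, so the row count answers "how many characters?"
n_characters <- nrow(toy)
n_characters
```

On the real data, `nrow(comics)` should match the 23272 obs. reported by str().
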
Q2 List all values in the id variable.

Revise the level code below so that R returns all levels (values) in the id variable.

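No Dual, Public, Secret, and Unknown (the columns of the table below; Bad, Good, Neutral, and Reformed Criminals are the levels of align, shown as its rows)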

# Create a 2-way contingency table
tab <- table(comics$align, comics$id)

# Print tab
tab
##                     
##                      No Dual Public Secret Unknown
##   Bad                    474   2172   4493       7
##   Good                   647   2930   2475       0
##   Neutral                390    965    959       2
##   Reformed Criminals       0      1      1       0

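The id values appear above only as table column headers. The levels() call that the question asks for returns them directly. A sketch on a hypothetical toy factor standing in for comics$id; note that NA is never a level, so missing ids do not show up:

```r
# Hypothetical toy factor standing in for comics$id (not the real data)
id <- factor(c("Secret", "Public", "No Dual", "Public", "Unknown", NA))

# levels() lists each distinct non-missing value once, in sorted order
id_levels <- levels(id)
id_levels
```

On the real data the equivalent call is `levels(comics$id)`, whose result should match the column headers of the table above.
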
Q3 How many male characters have blue eyes?

Revise the table code below so that R returns the answer to the question.

1,964 male characters have blue eyes (Blue Eyes row, Male column of the table below)

# Check the levels of gender
levels(comics$gender)
## [1] "Female" "Male"   "Other"

# Create a 2-way contingency table
tab <- table(comics$eye, comics$gender)

# Print tab
tab
##                     
##                      Female Male Other
##   Amber Eyes              6    9     0
##   Auburn Hair             6    1     0
##   Black Eyeballs          2    1     0
##   Black Eyes            221  718     5
##   Blue Eyes            1086 1964     1
##   Brown Eyes            828 1966     3
##   Compound Eyes           0    1     0
##   Gold Eyes               8   14     0
##   Green Eyes            436  447     5
##   Grey Eyes              33  102     0
##   Hazel Eyes             27   71     0
##   Magenta Eyes            2    0     0
##   Multiple Eyes           1    6     0
##   No Eyes                 1    3     0
##   One Eye                 1   20     0
##   Orange Eyes             9   24     1
##   Photocellular Eyes     11   32     3
##   Pink Eyes               9   17     0
##   Purple Eyes            23   20     1
##   Red Eyes              125  531     4
##   Silver Eyes             6    4     0
##   Variable Eyes           7   28     5
##   Violet Eyes            19    4     0
##   White Eyes            116  374    10
##   Yellow Eyeballs         2    4     0
##   Yellow Eyes            73  239     5

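The answer can be pulled straight out of the table by indexing it with row and column names, or counted without building a table at all. A sketch of both options on hypothetical toy data (not the comics file):

```r
# Hypothetical toy data standing in for comics (not the real dataset)
toy <- data.frame(
  gender = factor(c("Male", "Male", "Female", "Male", NA)),
  eye    = factor(c("Blue Eyes", "Brown Eyes", "Blue Eyes", "Blue Eyes", "Blue Eyes"))
)

# Option 1: index the contingency table by row (eye) and column (gender) name
tab <- table(toy$eye, toy$gender)
blue_male_tab <- tab["Blue Eyes", "Male"]

# Option 2: count rows meeting both conditions; na.rm = TRUE skips the row
# whose gender is NA (table() also leaves NA out by default)
blue_male_sum <- sum(toy$gender == "Male" & toy$eye == "Blue Eyes", na.rm = TRUE)

c(blue_male_tab, blue_male_sum)
```

On the real data, `tab["Blue Eyes", "Male"]` applied to the table above should give the 1,964 reported.
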
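# Remove the rare "Reformed Criminals" align level (just 2 characters in the
# align-by-id table above). filter() keeps only rows where the condition is TRUE,
# so rows with a missing align are dropped as well; droplevels() then removes
# the factor levels that are no longer used.
comics <- comics %>%
  filter(align != "Reformed Criminals") %>%
  droplevels()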

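The removal step relies on two behaviors worth seeing in isolation: a comparison with NA yields NA, and dplyr's filter() keeps only rows where the condition is TRUE, so rows with a missing align are dropped too. The sketch below uses base R's subset(), which also treats NA as false, so it runs without dplyr; the toy factor is hypothetical:

```r
# Hypothetical toy align factor, including one missing value
align <- factor(c("Good", "Bad", "Reformed Criminals", NA, "Neutral"))

# Comparing NA with anything gives NA rather than TRUE or FALSE:
# keep is TRUE, TRUE, FALSE, NA, TRUE
keep <- align != "Reformed Criminals"
keep

# subset(), like dplyr::filter(), keeps only rows where the condition is TRUE,
# so both the "Reformed Criminals" row and the NA row are dropped
toy <- subset(data.frame(align = align), align != "Reformed Criminals")
nrow(toy)

# The emptied level is still attached to the factor until droplevels()
levels(toy$align)
toy <- droplevels(toy)
levels(toy$align)
```
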
Q4 What is the most common hair color?

Revise the bar chart code below to find the answer.

Black Hair is the most common recorded hair color (4,706 characters); only missing values (NA, 5,233 characters) are more frequent

# Count characters per hair color, most frequent first
comics %>%
  count(hair, sort = TRUE)
## # A tibble: 29 x 2
##    hair           n
##    <fct>      <int>
##  1 <NA>        5233
##  2 Black Hair  4706
##  3 Brown Hair  3009
##  4 Blond Hair  2008
##  5 No Hair     1049
##  6 White Hair   975
##  7 Red Hair     933
##  8 Bald         748
##  9 Grey Hair    577
## 10 Green Hair   138
## # ... with 19 more rows

# Bar chart of hair color (one bar per level, plus one for NA)
ggplot(comics, aes(x = hair)) + 
  geom_bar()

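In the default bar chart the hair levels appear alphabetically, which makes the tallest bar harder to spot. One way to order the bars by frequency, sketched on a hypothetical toy vector with base R only (the forcats package's fct_infreq() is a common alternative):

```r
# Hypothetical toy vector standing in for comics$hair (not the real data)
hair <- factor(c("Black Hair", "Brown Hair", "Black Hair", NA,
                 "Blond Hair", "Black Hair", "Brown Hair", NA))

# Count each level (table() leaves NA out by default) and sort, largest first
counts <- sort(table(hair), decreasing = TRUE)

# Rebuild the factor with its levels in that frequency order
hair_by_freq <- factor(hair, levels = names(counts))
levels(hair_by_freq)

# If ggplot2 is installed, build the bar chart with bars in frequency order
# (call print(p) to draw it); NA values still get their own bar at the end
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(hair = hair_by_freq), aes(x = hair)) +
    geom_bar()
}
```

On the real data, the same reordering applied to comics$hair would put Black Hair first among the non-missing levels, matching the count() output above.
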
Q5 What is the most common align value among female characters?

Map gender to the x-axis and align to the fill color in the bar chart code below.

Good is the most common align value among female characters

# Stacked bar chart: counts of each align value within each gender
ggplot(comics, aes(x = gender, fill = align)) + 
  geom_bar()

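The stacked bars make the comparison visual; the same answer can be read numerically from a gender-by-align table. A sketch on hypothetical toy data (not the comics file), using base R's table() and which.max():

```r
# Hypothetical toy data standing in for comics (not the real dataset)
toy <- data.frame(
  gender = factor(c("Female", "Female", "Female", "Male", "Male", "Female")),
  align  = factor(c("Good", "Bad", "Good", "Bad", "Bad", "Neutral"))
)

# Counts of align within each gender: the numbers behind the stacked bars
tab <- table(toy$gender, toy$align)
tab

# Most common align value among female characters (largest count in that row)
most_common_female <- names(which.max(tab["Female", ]))
most_common_female
```
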
Q6 In which gender value do good characters make up the greatest proportion?

The code should be the same as in Q5 with only one difference: geom_bar(position = "fill").

Female characters have the greatest proportion of good characters

# Plot proportion of align, conditional on gender
ggplot(comics, aes(x = gender, fill = align)) + 
  geom_bar(position = "fill") # position = "fill" stacks the bars and rescales each to height 1, so segments show proportions

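position = "fill" turns each bar into conditional proportions: the share of each align value within one gender. The same numbers can be computed with prop.table(), where margin = 1 makes every row (gender) sum to 1. A sketch on hypothetical toy data:

```r
# Hypothetical toy data standing in for comics (not the real dataset)
toy <- data.frame(
  gender = factor(c("Female", "Female", "Female", "Female", "Male", "Male", "Male")),
  align  = factor(c("Good", "Good", "Good", "Bad", "Good", "Bad", "Bad"))
)

# margin = 1 divides each row (gender) by its total, so each row sums to 1,
# which is what each bar shows under position = "fill"
props <- prop.table(table(toy$gender, toy$align), margin = 1)
round(props, 2)

# Gender in which good characters make up the largest share
top_good <- names(which.max(props[, "Good"]))
top_good
```
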
Q7 Simplify the chart in Q5 by keeping only Male and Female in gender and Good and Bad in align.

Revise the filter code below.

# Filter on both variables in one call; filtering comics twice and assigning
# each result to comics_filtered would keep only the second (align) filter
comics_filtered <- 
  comics %>% 
  filter(gender %in% c("Male", "Female"),
         align %in% c("Good", "Bad"))

# Plot proportion of gender, conditional on align
ggplot(comics_filtered, aes(x = align, fill = gender)) + 
  geom_bar(position = "fill") # position = "fill" stacks the bars and rescales each to height 1, so segments show proportions

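A quick check that the filtering kept only the intended groups is to tabulate the result. A sketch on hypothetical toy data, using base R's subset() with both conditions joined by & (like filter() with two conditions, a row survives only if both are TRUE) and droplevels() to remove the emptied levels:

```r
# Hypothetical toy data standing in for comics (not the real dataset)
toy <- data.frame(
  gender = factor(c("Female", "Male", "Other", "Male", "Female", NA)),
  align  = factor(c("Good", "Bad", "Good", "Neutral", "Bad", "Good"))
)

# Both conditions in one call; %in% returns FALSE (not NA) for a missing gender
toy_filtered <- droplevels(subset(
  toy,
  gender %in% c("Male", "Female") & align %in% c("Good", "Bad")
))

# Only Bad/Good rows and Female/Male columns should remain
check <- table(toy_filtered$align, toy_filtered$gender)
check
```
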
Q8 Map three variables in the chart below: gender, id, and align.

Make sure all three variables appear in the chart. Swap the variables among the x-axis, fill, and facets to see whether they tell different stories. Briefly discuss the most interesting story you found about the three variables.

# Counts of align, filled by id, with one panel per gender
ggplot(comics, aes(x = align, fill = id)) + 
  geom_bar() +
  facet_wrap(~ gender)
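
The faceted chart corresponds to a three-way contingency table, which can help put numbers on whatever story the chart suggests. A sketch on hypothetical toy data: prop.table() with margin = c(1, 3) gives the share of each id within every align-gender combination, the same quantity a faceted geom_bar(position = "fill") would show, and ftable() prints the 3-D result as a flat table:

```r
# Hypothetical toy data standing in for comics (not the real dataset)
toy <- data.frame(
  align  = factor(c("Good", "Good", "Bad", "Bad", "Good", "Bad", "Bad", "Good")),
  id     = factor(c("Public", "Secret", "Secret", "Secret", "Public", "Public", "Secret", "Secret")),
  gender = factor(c("Female", "Female", "Female", "Female", "Male", "Male", "Male", "Male"))
)

# Three-way table with dimensions align x id x gender (the x, fill, facet roles)
tab3 <- table(toy$align, toy$id, toy$gender)

# Share of each id within every align-gender combination; summing over id
# (dimension 2) gives 1 for each combination
props3 <- prop.table(tab3, margin = c(1, 3))

# ftable() flattens the 3-D table into a readable 2-D layout
ftable(props3)
```

Swapping which variables play the roles of x, fill, and facet in the chart corresponds to reordering the table's dimensions and changing margin accordingly.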