Q1 How many comic book characters are there?

23,272 characters.

# Load packages
library(dplyr) # for dplyr functions such as glimpse(), mutate(), and filter()
library(ggplot2) # for ggplot2 functions such as ggplot()

# Import data
comics <- read.csv("/resources/rstudio/BusinessStatistics/Data/comics.csv") 

# Convert data to a tibble (tbl_df() is deprecated in favor of as_tibble())
comics <- as_tibble(comics)
str(comics)
## Classes 'tbl_df', 'tbl' and 'data.frame':    23272 obs. of  11 variables:
##  $ name        : Factor w/ 23272 levels "'Spinner (Earth-616)",..: 19833 3335 22769 9647 20956 2220 17576 9346 18794 10957 ...
##  $ id          : Factor w/ 4 levels "No Dual","Public",..: 3 2 2 2 1 2 2 2 2 2 ...
##  $ align       : Factor w/ 4 levels "Bad","Good","Neutral",..: 2 2 3 2 2 2 2 2 3 2 ...
##  $ eye         : Factor w/ 26 levels "Amber Eyes","Auburn Hair",..: 11 5 5 5 5 5 6 6 6 5 ...
##  $ hair        : Factor w/ 28 levels "Auburn Hair",..: 7 27 3 3 4 14 7 7 7 4 ...
##  $ gender      : Factor w/ 3 levels "Female","Male",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ gsm         : Factor w/ 6 levels "Bisexual Characters",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ alive       : Factor w/ 2 levels "Deceased Characters",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ appearances : int  4043 3360 3061 2961 2258 2255 2072 2017 1955 1934 ...
##  $ first_appear: Factor w/ 2328 levels "1-Apr","1-Aug",..: 1772 2074 2255 2089 2185 2192 2192 2139 2292 2192 ...
##  $ publisher   : Factor w/ 2 levels "dc","marvel": 2 2 2 2 2 2 2 2 2 2 ...
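
The count can also be read off directly instead of from the str() header, since each row is one character. A minimal sketch on a small made-up data frame (toy_comics is hypothetical, not the real data):

```r
# Hypothetical stand-in for comics: one row per character
toy_comics <- data.frame(
  name = c("A", "B", "C"),
  publisher = c("marvel", "dc", "marvel")
)

# nrow() returns the number of rows, i.e. the number of characters
n_characters <- nrow(toy_comics)
n_characters
```

Applied to the real data, nrow(comics) should agree with the 23272 obs. reported by str().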

Q2 List all values in id variable.

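No Dual, Public, Secret, and Unknown. (Good, Bad, Neutral, and Reformed Criminals are the values of align, shown as the rows of the table below.)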

# Create a 2-way contingency table
tab <- table(comics$align, comics$id)

# Print tab
tab
##                     
##                      No Dual Public Secret Unknown
##   Bad                    474   2172   4493       7
##   Good                   647   2930   2475       0
##   Neutral                390    965    959       2
##   Reformed Criminals       0      1      1       0
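
The values of a factor can also be listed without building a table. A minimal sketch, assuming a made-up factor (toy_id is hypothetical) that mimics comics$id:

```r
# Hypothetical factor mimicking comics$id
toy_id <- factor(c("Secret", "Public", "Public", "No Dual", "Unknown"))

# levels() lists the distinct values of a factor
id_values <- levels(toy_id)
id_values
```

On the real data the equivalent call would be levels(comics$id).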

Q3 How many male characters that have blue eyes are there?

1,964 male characters have blue eyes.

# Check the levels of gender
levels(comics$gender)
## [1] "Female" "Male"   "Other"

# Create a 2-way contingency table
tab <- table(comics$gender, comics$eye)

# Print tab
tab
##         
##          Amber Eyes Auburn Hair Black Eyeballs Black Eyes Blue Eyes
##   Female          6           6              2        221      1086
##   Male            9           1              1        718      1964
##   Other           0           0              0          5         1
##         
##          Brown Eyes Compound Eyes Gold Eyes Green Eyes Grey Eyes
##   Female        828             0         8        436        33
##   Male         1966             1        14        447       102
##   Other           3             0         0          5         0
##         
##          Hazel Eyes Magenta Eyes Multiple Eyes No Eyes One Eye Orange Eyes
##   Female         27            2             1       1       1           9
##   Male           71            0             6       3      20          24
##   Other           0            0             0       0       0           1
##         
##          Photocellular Eyes Pink Eyes Purple Eyes Red Eyes Silver Eyes
##   Female                 11         9          23      125           6
##   Male                   32        17          20      531           4
##   Other                   3         0           1        4           0
##         
##          Variable Eyes Violet Eyes White Eyes Yellow Eyeballs Yellow Eyes
##   Female             7          19        116               2          73
##   Male              28           4        374               4         239
##   Other              5           0         10               0           5
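
Rather than scanning the printed table, a single cell can be pulled out by row and column name. A minimal sketch on made-up data (toy and toy_tab are hypothetical names):

```r
# Hypothetical stand-in for the gender and eye columns of comics
toy <- data.frame(
  gender = c("Male", "Male", "Female", "Male", "Female"),
  eye    = c("Blue Eyes", "Brown Eyes", "Blue Eyes", "Blue Eyes", "Green Eyes")
)

# Two-way contingency table: rows are gender, columns are eye color
toy_tab <- table(toy$gender, toy$eye)

# Index the table by row name and column name to get one count
male_blue <- toy_tab["Male", "Blue Eyes"]
male_blue
```

With the real table, tab["Male", "Blue Eyes"] would return the count used in the answer above.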

Q4 What is the most common hair color?

Black Hair, with 5,329 characters (ignoring the 6,538 missing values).

comics %>%
  count(hair, sort = TRUE) 
## # A tibble: 29 x 2
##    hair           n
##    <fct>      <int>
##  1 <NA>        6538
##  2 Black Hair  5329
##  3 Brown Hair  3487
##  4 Blond Hair  2326
##  5 No Hair     1176
##  6 White Hair  1100
##  7 Red Hair    1081
##  8 Bald         838
##  9 Grey Hair    688
## 10 Green Hair   159
## # ... with 19 more rows
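
Because the top row of the output is the missing-value group, it can help to rank only the non-missing values. A minimal base R sketch on a made-up factor (toy_hair is hypothetical); it relies on table() leaving out NA by default:

```r
# Hypothetical stand-in for comics$hair, with more NAs than any real value
toy_hair <- factor(c("Black Hair", NA, "Brown Hair", NA, "Black Hair", NA, NA))

# table() drops NA by default, so missing values are not counted
hair_counts <- sort(table(toy_hair), decreasing = TRUE)

# The first name is the most common non-missing value
most_common <- names(hair_counts)[1]
most_common
```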

Q5 What is the most common align value in female characters?

Good is the most common align value for female characters.

# Plot counts of align within each gender
ggplot(comics, aes(x = gender, fill = align)) + 
  geom_bar()

Q6 In which gender value do good characters make up the greatest proportion?

Female

The code is the same as in Q5 with only one difference: geom_bar(position = "fill").

# Plot proportion of align, conditional on gender
ggplot(comics, aes(x = gender, fill = align)) + 
  geom_bar(position = "fill") # position = "fill" scales each bar to 1, so segments show proportions
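
The proportions the chart displays can also be computed as numbers, which makes close cases easier to compare. A minimal sketch on made-up data (toy and props are hypothetical names), using prop.table() with margin = 1 so that each gender row sums to 1:

```r
# Hypothetical stand-in for the gender and align columns of comics
toy <- data.frame(
  gender = c("Female", "Female", "Female", "Male", "Male", "Male", "Male"),
  align  = c("Good", "Good", "Bad", "Good", "Bad", "Bad", "Bad")
)

# margin = 1 conditions on the rows: proportions of align within each gender
props <- prop.table(table(toy$gender, toy$align), margin = 1)
round(props, 2)
```

On the real data the same idea would be prop.table(table(comics$gender, comics$align), margin = 1).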

Q7 Simplify the chart in Q6 by filtering gender for male and female, and align for good and bad.

Revise the filter code below.

# Filter for male and female characters that are either good or bad.
# Both conditions go in one filter() call so that neither is overwritten.
comics_filtered <- 
  comics %>% 
  filter(gender %in% c("Male", "Female"),
         align %in% c("Good", "Bad"))

# Plot proportion of gender, conditional on align
ggplot(comics_filtered, aes(x = align, fill = gender)) + 
  geom_bar(position = "fill") # position = "fill" scales each bar to 1, so segments show proportions
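
Q7 asks for rows that satisfy both conditions at once. A minimal base R sketch on made-up data (toy, keep, and toy_filtered are hypothetical names) that combines the two tests with & and keeps only the rows passing both:

```r
# Hypothetical stand-in for the gender and align columns of comics
toy <- data.frame(
  gender = c("Male", "Female", "Other", "Male", "Female"),
  align  = c("Good", "Bad", "Good", "Neutral", "Good")
)

# A row is kept only if it passes the gender test AND the align test
keep <- toy$gender %in% c("Male", "Female") & toy$align %in% c("Good", "Bad")
toy_filtered <- toy[keep, ]
toy_filtered
```

Checking unique(toy_filtered$gender) and unique(toy_filtered$align) afterwards is a quick way to confirm that no unwanted values slipped through.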

Q8 Map three variables in the chart below: gender, id and align.

Make sure that you have the three listed variables in the chart. Swap variables to different axes and see if it tells you different stories. Briefly discuss the most interesting story you found about the three variables.

# Plot of alignment, colored by id and faceted by gender
ggplot(comics, aes(x = align, fill = id)) + 
  geom_bar() +
  facet_wrap(~ gender)
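
When comparing stories across the three variables, a three-way table of counts can back up what the facets suggest. A minimal sketch on made-up data (toy and three_way are hypothetical names), using table() with three factors and ftable() to print the result flat:

```r
# Hypothetical stand-in for the gender, align, and id columns of comics
toy <- data.frame(
  gender = c("Male", "Male", "Female", "Female", "Male"),
  align  = c("Good", "Bad", "Good", "Good", "Bad"),
  id     = c("Secret", "Secret", "Public", "Secret", "Public")
)

# Three-way contingency table; dnn names the dimensions
three_way <- table(toy$gender, toy$align, toy$id,
                   dnn = c("gender", "align", "id"))

# ftable() flattens the three dimensions into one printable table
ftable(three_way)
```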