Using the given code, answer the questions below.

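library(tidyverse)

# Read the roster from CSV and convert it to a tibble for compact printing.
# Before R 4.0.0, read.csv() converted text columns to factors by default
# (stringsAsFactors = TRUE); the <fct> columns below likely reflect that.
class_roster <- read.csv("~/R/Business Stats/data/classRoster02.csv") %>%
  as_tibble()
class_roster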
## # A tibble: 30 x 6
##        X Student Class     Major                   income fav_color
##    <int> <fct>   <fct>     <fct>                    <int> <fct>    
##  1     1 Scott   Sophomore Marketing                 1010 Green    
##  2     2 Colette Sophomore Business Administration    920 Blue     
##  3     3 Niti    Senior    Business Administration   1031 Green    
##  4     4 Tyler   Sophomore Management                1064 Red      
##  5     5 Ryan    Sophomore Undeclared                1021 Orange   
##  6     6 Jack    Sophomore Business Administration   1053 Orange   
##  7     7 Michael Sophomore Business Administration   1001 Red      
##  8     8 Brianna Sophomore Marketing                 1156 Blue     
##  9     9 Trevor  Sophomore Sports Management         1019 Blue     
## 10    10 Connor  Sophomore Sports Management          848 Blue     
## # ... with 20 more rows
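The CSV path above points to a file on the author's machine. To try the later steps without it, the sketch below rebuilds only the ten rows printed above with tibble::tribble(). The remaining 20 rows are not shown, so they are left out. The name roster_sample is hypothetical, and its text columns are character rather than factor.

```r
library(tibble)

# Hypothetical stand-in for class_roster: only the 10 rows printed above
roster_sample <- tribble(
  ~X,  ~Student,  ~Class,      ~Major,                    ~income, ~fav_color,
  1L,  "Scott",   "Sophomore", "Marketing",               1010L,   "Green",
  2L,  "Colette", "Sophomore", "Business Administration",  920L,   "Blue",
  3L,  "Niti",    "Senior",    "Business Administration", 1031L,   "Green",
  4L,  "Tyler",   "Sophomore", "Management",              1064L,   "Red",
  5L,  "Ryan",    "Sophomore", "Undeclared",              1021L,   "Orange",
  6L,  "Jack",    "Sophomore", "Business Administration", 1053L,   "Orange",
  7L,  "Michael", "Sophomore", "Business Administration", 1001L,   "Red",
  8L,  "Brianna", "Sophomore", "Marketing",               1156L,   "Blue",
  9L,  "Trevor",  "Sophomore", "Sports Management",       1019L,   "Blue",
  10L, "Connor",  "Sophomore", "Sports Management",        848L,   "Blue"
)
roster_sample
```

Any count or chart built from roster_sample describes these ten students only, not the full class of 30.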

Add a variable of your own to answer a research question you might have, e.g., What is the favorite number of PSU students? Do favorite numbers vary by major or class? (In this roster, the added variable is fav_color.)

Q1. What does the row represent?

Each row represents one specific student.
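One quick way to check that each row is one student is to compare the number of rows with the number of distinct names. This is a minimal sketch that uses only the ten names printed above, in a hypothetical roster_sample. In the full data two students could share a first name, so a student ID column (not present in this roster) would be a stronger key.

```r
library(dplyr)
library(tibble)

# Student names from the ten rows printed above (hypothetical subset)
roster_sample <- tibble(
  Student = c("Scott", "Colette", "Niti", "Tyler", "Ryan",
              "Jack", "Michael", "Brianna", "Trevor", "Connor")
)

# If every row is a different student, the row count equals the
# number of distinct names
check <- roster_sample %>%
  summarise(rows = n(), students = n_distinct(Student))
check
```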

Q2. What characteristics of students (variables) does the data describe?

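The data describe each student's name (Student), class standing (Class), major (Major), income (income), and favorite color (fav_color). The first column, X, appears to be a row number carried over from the CSV file rather than a characteristic of the student.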

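Q3. What type of data is the new variable you added (i.e., numeric, character, logical)?

The new variable, fav_color, holds categorical text (character) data. In this output, read.csv() stored it as a factor, as the <fct> under its name in the printed tibble shows.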
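The character-versus-factor distinction can be checked directly with class() and typeof(). The sketch below uses the favorite colors of the ten students printed above and converts them with factor(), which mimics what older versions of read.csv() did automatically.

```r
# Favorite colors of the first ten students printed above
fav_color <- c("Green", "Blue", "Green", "Red", "Orange",
               "Orange", "Red", "Blue", "Blue", "Blue")
class(fav_color)       # "character"

# read.csv() in R versions before 4.0.0 converted text columns like this
# to factors by default (stringsAsFactors = TRUE); factor() does the same
fav_factor <- factor(fav_color)
class(fav_factor)      # "factor"
levels(fav_factor)     # "Blue" "Green" "Orange" "Red"
typeof(fav_factor)     # "integer": each value is stored as a level code
```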

Q4. What type of R object is class_roster (i.e., vector, matrix, data frame, list)? And why?

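class_roster is a data frame. More precisely, as_tibble() made it a tibble, the tidyverse's version of a data frame. A data frame fits this data because each column can hold a different type (integers for X and income, factors for the text columns), whereas a matrix holds only one type.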
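The answer can be confirmed with class() and is.data.frame(). The sketch below uses a small hypothetical roster_sample built from the first three printed rows (text kept as character). It also shows that converting to a matrix forces every column into a single type.

```r
library(tibble)

# Small tibble with the same columns as class_roster
# (first three rows printed above; text kept as character here)
roster_sample <- tibble(
  X = 1:3,
  Student = c("Scott", "Colette", "Niti"),
  Class = c("Sophomore", "Sophomore", "Senior"),
  Major = c("Marketing", "Business Administration", "Business Administration"),
  income = c(1010L, 920L, 1031L),
  fav_color = c("Green", "Blue", "Green")
)

class(roster_sample)                # "tbl_df" "tbl" "data.frame"
is.data.frame(roster_sample)        # TRUE: a tibble is still a data frame
sapply(roster_sample, class)        # one type per column, and they differ
typeof(as.matrix(roster_sample))    # "character": a matrix forces one type
```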

Q5. Describe the first student (first row) using all variables.

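Hint: Use View().

The first student, Scott (X = 1), is a Sophomore majoring in Marketing, with an income of 1010 and a favorite color of Green.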
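View() opens an interactive viewer, so its result does not appear in a knitted document. Printing the row does. This sketch uses a hypothetical two-row roster_sample copied from the printed output and shows two ways to select the first row.

```r
library(dplyr)
library(tibble)

# First two rows printed above, enough to show row selection
roster_sample <- tibble(
  X = 1:2,
  Student = c("Scott", "Colette"),
  Class = c("Sophomore", "Sophomore"),
  Major = c("Marketing", "Business Administration"),
  income = c(1010L, 920L),
  fav_color = c("Green", "Blue")
)

# Two ways to pull out the first row with all variables
first_student <- roster_sample %>% slice(1)
roster_sample[1, ]

# glimpse() lists each variable of that row on its own line
glimpse(first_student)
```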

Q6. Count the number of values in your new variable.

Hint: Use count().

# Tally how many students chose each favorite color, most common first
class_roster %>% count(fav_color, sort = TRUE)
## # A tibble: 8 x 2
##   fav_color     n
##   <fct>     <int>
## 1 Blue         13
## 2 Green         4
## 3 Red           4
## 4 Pink          3
## 5 Navy          2
## 6 Orange        2
## 7 Black         1
## 8 Purple        1
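Raw counts are easier to compare as shares of the class. The sketch below copies the counts from the output above into a hypothetical color_counts tibble and adds a proportion column. On the real data, appending mutate(prop = n / sum(n)) to the count() call would do the same.

```r
library(dplyr)
library(tibble)

# Counts copied from the count(fav_color) output above (30 students)
color_counts <- tibble(
  fav_color = c("Blue", "Green", "Red", "Pink", "Navy", "Orange", "Black", "Purple"),
  n = c(13L, 4L, 4L, 3L, 2L, 2L, 1L, 1L)
)

# Share of the class choosing each color
color_shares <- color_counts %>%
  mutate(prop = n / sum(n))
color_shares
```

Blue, at 13 of 30 students (about 43%), is the most common choice.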

Q7. Plot your new variable.

Hint: Refer to the ggplot2 cheatsheet (search for it online) and see its One Variable section. It covers two cases, 1) Continuous and 2) Discrete; the type of chart you can use depends on which type of data your variable is.

# fav_color is discrete (categorical), so a bar chart of counts fits
class_roster %>% 
  ggplot(aes(fav_color)) + 
  geom_bar()
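By default geom_bar() orders the bars alphabetically. Ordering them by frequency makes the most popular color easier to spot. This sketch swaps in forcats::fct_infreq() and uses the favorite colors of the ten printed students in a hypothetical roster_sample.

```r
library(ggplot2)
library(forcats)
library(tibble)

# Favorite colors of the ten students printed above (hypothetical subset)
roster_sample <- tibble(
  fav_color = c("Green", "Blue", "Green", "Red", "Orange",
                "Orange", "Red", "Blue", "Blue", "Blue")
)

# fct_infreq() reorders the levels from most to least frequent,
# so geom_bar() draws the tallest bar first
p <- ggplot(roster_sample, aes(fct_infreq(fav_color))) +
  geom_bar() +
  labs(x = "Favorite color", y = "Number of students")
p
```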

Q8. Does your answer in Q6 vary by major?

Hint: Use dplyr::group_by in addition to count().

# Count each Major/fav_color combination; sort = TRUE lists the most common combinations first
class_roster %>% group_by(Major) %>% count(fav_color, sort = TRUE)
## # A tibble: 19 x 3
## # Groups:   Major [7]
##    Major                     fav_color     n
##    <fct>                     <fct>     <int>
##  1 Sports Management         Blue          4
##  2 Business Administration   Blue          3
##  3 Business Administration   Red           3
##  4 Marketing                 Blue          3
##  5 Management                Blue          2
##  6 Marketing                 Green         2
##  7 Accounting                Navy          1
##  8 Business Administration   Green         1
##  9 Business Administration   Orange        1
## 10 Business Administration   Pink          1
## 11 Interdisciplinary Studies Blue          1
## 12 Management                Red           1
## 13 Marketing                 Black         1
## 14 Marketing                 Navy          1
## 15 Marketing                 Pink          1
## 16 Sports Management         Pink          1
## 17 Sports Management         Purple        1
## 18 Undeclared                Green         1
## 19 Undeclared                Orange        1
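Majors differ in size, so raw counts are hard to compare across them. Shares within each major are easier to compare. This sketch uses the major and favorite color of the ten printed students in a hypothetical roster_sample. It passes both variables to count() and then normalizes the counts within each major.

```r
library(dplyr)
library(tibble)

# Major and favorite color for the ten students printed above
roster_sample <- tibble(
  Major = c("Marketing", "Business Administration", "Business Administration",
            "Management", "Undeclared", "Business Administration",
            "Business Administration", "Marketing", "Sports Management",
            "Sports Management"),
  fav_color = c("Green", "Blue", "Green", "Red", "Orange",
                "Orange", "Red", "Blue", "Blue", "Blue")
)

# count() takes both variables directly; then, within each major,
# turn counts into shares so majors of different sizes are comparable
major_shares <- roster_sample %>%
  count(Major, fav_color) %>%
  group_by(Major) %>%
  mutate(share = n / sum(n)) %>%
  ungroup() %>%
  arrange(Major, desc(share))
major_shares
```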

Q9. To answer Q8, add the second variable to your chart in Q7.

Hint: Use ggplot2::facet_wrap(). Refer to the Faceting section of the ggplot2 cheatsheet.

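# One bar chart of favorite colors per major; facet_wrap() wraps the
# seven panels into a grid instead of squeezing them into a single row
class_roster %>% 
  ggplot(aes(fav_color)) + 
  geom_bar() +
  facet_wrap(~ Major) +
  # Tilt the color labels so they stay readable in the narrow panels
  theme(axis.text.x = element_text(angle = 45, hjust = 1))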
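Faceting is what the hint asks for. Another option puts both variables in a single panel: a stacked bar chart with position = "fill", which draws one bar per major split by favorite color. This sketch uses the ten printed students in a hypothetical roster_sample.

```r
library(ggplot2)
library(tibble)

# Same ten-student subset as above (hypothetical stand-in for class_roster)
roster_sample <- tibble(
  Major = c("Marketing", "Business Administration", "Business Administration",
            "Management", "Undeclared", "Business Administration",
            "Business Administration", "Marketing", "Sports Management",
            "Sports Management"),
  fav_color = c("Green", "Blue", "Green", "Red", "Orange",
                "Orange", "Red", "Blue", "Blue", "Blue")
)

# position = "fill" stretches every bar to height 1, so each bar shows the
# mix of colors within a major regardless of how many students it has;
# coord_flip() keeps the long major names readable
p <- ggplot(roster_sample, aes(x = Major, fill = fav_color)) +
  geom_bar(position = "fill") +
  labs(y = "Share of students", fill = "Favorite color") +
  coord_flip()
p
```

The fill colors come from ggplot2's default palette, not from the color names themselves. The proportional bars also hide how few students some majors have, so the faceted count chart remains the better view of group sizes.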