Exploratory Data Analysis

Using the given code, answer the questions below.

library(tidyverse) 

class_roster <- read.csv("~/R/busStat/Data/classRoster02.csv") %>%
  as_tibble()
class_roster
## # A tibble: 30 x 6
##        X Student Class     Major                   income fav_carcompany
##    <int> <fct>   <fct>     <fct>                    <int> <fct>         
##  1     1 Scott   Sophomore Marketing                 1010 bmw           
##  2     2 Colette Sophomore Business Administration    920 ""            
##  3     3 Niti    Senior    Business Administration   1031 ""            
##  4     4 Tyler   Sophomore Management                1064 audi          
##  5     5 Ryan    Sophomore Undeclared                1021 tesla         
##  6     6 Jack    Sophomore Business Administration   1053 ford          
##  7     7 Michael Sophomore Business Administration   1001 mercedes      
##  8     8 Brianna Sophomore Marketing                 1156 pagani        
##  9     9 Trevor  Sophomore Sports Management         1019 mclaren       
## 10    10 Connor  Sophomore Sports Management          848 aston martin  
## # ... with 20 more rows

Add a variable of your own to answer a research question you might have, e.g., What is the favorite number of PSU students? Do favorite numbers vary by major or class?

What is each student's favorite car company?

Q1. What does the row represent?

Each row represents a student.

Q2. What characteristics of students (variables) does the data describe?

Name (Student), class, major, income, and favorite car company. The X column is only a row index.
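
The answers to Q1 and Q2 can be checked by asking R for the number of rows and the column names. The sketch below rebuilds only the first three rows of the printout above as a stand-in for the CSV file; the name `roster_sample` is hypothetical, and the real data has 30 rows.

```r
library(tidyverse)

# Stand-in for class_roster: the first three rows of the printout above.
roster_sample <- tibble(
  X = 1:3,
  Student = c("Scott", "Colette", "Niti"),
  Class = c("Sophomore", "Sophomore", "Senior"),
  Major = c("Marketing", "Business Administration", "Business Administration"),
  income = c(1010L, 920L, 1031L),
  fav_carcompany = c("bmw", "", "")
)

nrow(roster_sample)     # number of rows, one per student
names(roster_sample)    # one name per variable (column)
glimpse(roster_sample)  # compact overview: each variable with its type
```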

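Q3. What type of data is the new variable you added (e.g., numeric, character, logical)?

Character (text). Because read.csv() converted the text columns to factors on import, the tibble labels it <fct>.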
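
The `<fct>` tag under `fav_carcompany` in the printout means the column is stored as a factor, which is how `read.csv()` imported text columns by default before R 4.0. A minimal sketch of checking a variable's type and converting it back to text, using a hypothetical vector `fav` built from values in the printout:

```r
# Hypothetical stand-in for class_roster$fav_carcompany.
fav <- factor(c("bmw", "", "", "audi", "tesla"))

class(fav)         # "factor"
is.character(fav)  # FALSE
levels(fav)        # the distinct values the factor can take

# Convert to plain text when character data is needed.
fav_chr <- as.character(fav)
class(fav_chr)     # "character"
```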

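Q4. What type of R object is class_roster (e.g., vector, matrix, data frame, list)? And why?

class_roster is a data frame (more precisely a tibble, since it was passed through as_tibble()). It is rectangular, with one row per student, and each column can hold a different data type, which a vector or matrix cannot do.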
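
One way to confirm this is to inspect the object's class directly. The sketch below uses a two-row, two-column stand-in built from the printout (`roster_sample` is a hypothetical name), not the full roster.

```r
library(tidyverse)

# Stand-in with one text column and one integer column.
roster_sample <- tibble(
  Student = c("Scott", "Colette"),
  income = c(1010L, 920L)
)

class(roster_sample)          # includes "data.frame" alongside the tibble classes
is.data.frame(roster_sample)  # TRUE
is.matrix(roster_sample)      # FALSE
sapply(roster_sample, class)  # each column keeps its own type
```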

Q5. Describe the first student (first row) using all variables.

Hint: Use View(). Scott is a sophomore majoring in marketing with an income of $1010, and his favorite car company is BMW.
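
View() opens an interactive viewer, so it is mainly useful in an interactive session such as RStudio. As an alternative sketch, `dplyr::slice()` prints a chosen row in the console. The two-row `roster_sample` below is a hypothetical stand-in copied from the printout.

```r
library(tidyverse)

# Stand-in for class_roster: the first two rows of the printout above.
roster_sample <- tibble(
  X = 1:2,
  Student = c("Scott", "Colette"),
  Class = c("Sophomore", "Sophomore"),
  Major = c("Marketing", "Business Administration"),
  income = c(1010L, 920L),
  fav_carcompany = c("bmw", "")
)

# Keep only the first row, with every variable.
first_student <- roster_sample %>% slice(1)
first_student
```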

Q6. Count the number of values in your new variable.

Hint: Use count(). There are 23 distinct values in the new variable, counting the blank entry ("") as one of them.

class_roster %>% count(fav_carcompany, sort = TRUE)
## # A tibble: 23 x 2
##    fav_carcompany     n
##    <fct>          <int>
##  1 honda              3
##  2 ""                 2
##  3 audi               2
##  4 bmw                2
##  5 ford               2
##  6 subaru             2
##  7 acura              1
##  8 alfa romeo         1
##  9 aston martin       1
## 10 chevy              1
## # ... with 13 more rows
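
The count() output has one row per distinct value, and the empty string "" is one of those rows. If only named companies should be counted, the blanks can be filtered out first. A minimal sketch on the ten values visible in the roster printout (`roster_sample` is a hypothetical stand-in, so the totals differ from the full data):

```r
library(tidyverse)

# Favorite car companies of the first ten students in the printout.
roster_sample <- tibble(
  fav_carcompany = c("bmw", "", "", "audi", "tesla", "ford",
                     "mercedes", "pagani", "mclaren", "aston martin")
)

# Distinct values, blank included.
n_all <- n_distinct(roster_sample$fav_carcompany)

# Distinct values after dropping the blank answers.
n_named <- roster_sample %>%
  filter(fav_carcompany != "") %>%
  summarise(n = n_distinct(fav_carcompany)) %>%
  pull(n)

n_all
n_named
```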

Q7. Plot your new variable.

Hint: Refer to the ggplot2 cheatsheet. Google it. See the section for One Variable. Note that there are two different cases: 1) Continuous and 2) Discrete. The type of chart you can use depends on what type of data your variable is.

class_roster %>%
  ggplot(aes(fav_carcompany)) +
  geom_bar()
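
With more than twenty categories, the labels on the x-axis are likely to overlap. One possible refinement, sketched here on the first ten values of the printout, is to order the bars by frequency with `forcats::fct_infreq()` and flip the axes with `coord_flip()`. The names `roster_sample` and `p` are hypothetical.

```r
library(tidyverse)

# Favorite car companies of the first ten students in the printout.
roster_sample <- tibble(
  fav_carcompany = c("bmw", "", "", "audi", "tesla", "ford",
                     "mercedes", "pagani", "mclaren", "aston martin")
)

p <- roster_sample %>%
  # Order the categories from most to least frequent
  ggplot(aes(fct_infreq(fav_carcompany))) +
  geom_bar() +
  # Horizontal bars leave room for long category labels
  coord_flip() +
  labs(x = "fav_carcompany")

p
```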

Q8. Does your answer in Q6 vary by major?

Hint: Use dplyr::group_by in addition to count().

class_roster %>%
  group_by(Major) %>%
  count(fav_carcompany, sort = TRUE)
## # A tibble: 29 x 3
## # Groups:   Major [7]
##    Major                     fav_carcompany     n
##    <fct>                     <fct>          <int>
##  1 Business Administration   ""                 2
##  2 Accounting                honda              1
##  3 Business Administration   audi               1
##  4 Business Administration   chevy              1
##  5 Business Administration   ford               1
##  6 Business Administration   honda              1
##  7 Business Administration   land rover         1
##  8 Business Administration   mercedes           1
##  9 Business Administration   toyota             1
## 10 Interdisciplinary Studies ford               1
## # ... with 19 more rows
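
The grouped count lists every combination of major and company, which makes it hard to compare majors at a glance. One possible follow-up is to summarise the number of distinct values per major with `n_distinct()`. The sketch uses the first ten rows of the roster printout (`roster_sample` and `by_major` are hypothetical names), so the numbers do not match the full data.

```r
library(tidyverse)

# Major and favorite car company of the first ten students in the printout.
roster_sample <- tibble(
  Major = c("Marketing", "Business Administration", "Business Administration",
            "Management", "Undeclared", "Business Administration",
            "Business Administration", "Marketing", "Sports Management",
            "Sports Management"),
  fav_carcompany = c("bmw", "", "", "audi", "tesla", "ford",
                     "mercedes", "pagani", "mclaren", "aston martin")
)

# One row per major: how many distinct values (blank included) it contains.
by_major <- roster_sample %>%
  group_by(Major) %>%
  summarise(n_companies = n_distinct(fav_carcompany))

by_major
```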

Q9. To answer Q8, add the second variable to your chart in Q7.

Hint: Use ggplot2::facet_wrap. Refer to the ggplot2 cheatsheet. See the section for Faceting.

class_roster %>%
  # Same bar chart as in Q7, drawn in a separate panel for each major
  ggplot(aes(fav_carcompany)) +
  geom_bar() +
  facet_wrap(~ Major)