Import data

library(tidyverse)  # loads ggplot2 (plots) and dplyr (%>%, count())

salary <- read.csv("../00_data/Salaries.csv")
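Later chunks assume the CSV has the columns rank, sex, salary, yrs.since.phd and yrs.service. The sketch below assumes that layout. It stops early if one of those columns is missing and stores the two categorical columns as factors, because `read.csv()` keeps strings as character by default since R 4.0.0. `prepare_salary()` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: check that the columns used later exist and
# convert the categorical ones to factors. The column names are an
# assumption based on the chunks in this notebook.
prepare_salary <- function(df,
                           required = c("rank", "sex", "salary",
                                        "yrs.since.phd", "yrs.service"),
                           categorical = c("rank", "sex")) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  for (col in categorical) {
    df[[col]] <- factor(df[[col]])  # levels are sorted alphabetically
  }
  df
}
```

Usage: `salary <- prepare_salary(salary)` right after `read.csv()`.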

Introduction

Questions

Variation

Visualizing distributions

ggplot(data = salary) + 
    geom_bar(mapping = aes(x = rank)) 

salary %>% count(rank)
##        rank   n
## 1 AssocProf  64
## 2  AsstProf  67
## 3      Prof 266

# explicit binwidth (in the units of salary) instead of the default 30 bins
ggplot(data = salary) + 
    geom_histogram(mapping = aes(x = salary), binwidth = 10000)
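A histogram shows how many observations fall into each bin. The sketch below produces the same kind of counts as a table. It assumes fixed-width bins that are closed on the left. `bin_counts()` is a hypothetical helper, and ggplot2 may place its bin edges differently, so the numbers need not match the bars exactly.

```r
# Hypothetical helper: count observations in fixed-width bins [a, a + width).
bin_counts <- function(x, width) {
  x <- x[!is.na(x)]
  lower <- floor(min(x) / width) * width
  # one extra bin so the maximum always falls inside the last bin
  upper <- (floor(max(x) / width) + 1) * width
  breaks <- seq(lower, upper, by = width)
  # right = FALSE makes each bin closed on the left, open on the right;
  # dig.lab = 10 keeps large break values readable in the labels
  table(cut(x, breaks = breaks, right = FALSE, dig.lab = 10))
}
```

Usage: `bin_counts(salary$salary, width = 10000)`.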

ggplot(data = salary, mapping = aes(x = salary, color = rank)) + 
    geom_freqpoly(binwidth = 10000)
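The ranks differ a lot in size: the count above has 266 Prof rows against 64 AssocProf and 67 AsstProf. With raw counts, the Prof curve therefore dominates no matter what its shape is. One way to compare shapes is to scale each group to proportions. In ggplot2 this is usually done by mapping `y = after_stat(density)`. The sketch below does the same thing as a table in base R. `rel_freq_by_group()` is a hypothetical helper, and it assumes left-closed, fixed-width bins.

```r
# Hypothetical helper: share of each group's observations in each bin,
# so that groups of different sizes can be compared on one scale.
rel_freq_by_group <- function(x, group, width) {
  lower <- floor(min(x, na.rm = TRUE) / width) * width
  upper <- (floor(max(x, na.rm = TRUE) / width) + 1) * width
  bins <- cut(x, breaks = seq(lower, upper, by = width),
              right = FALSE, dig.lab = 10)
  # margin = 2 divides each column (one group) by that group's total
  prop.table(table(bins, group), margin = 2)
}
```

Usage: `rel_freq_by_group(salary$salary, salary$rank, width = 10000)`. Each column sums to 1.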

Typical values

salary %>%
    ggplot(aes(x = salary)) + 
    # narrower bins than above, to look for clusters of common values
    geom_histogram(binwidth = 5000)
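A few quantiles give the typical values as numbers to set beside the histogram. The sketch below is one way to compute them. `typical_values()` is a hypothetical helper, and the chosen probabilities are an arbitrary assumption.

```r
# Hypothetical helper: mean plus selected quantiles, ignoring NAs.
typical_values <- function(x, probs = c(0.1, 0.25, 0.5, 0.75, 0.9)) {
  x <- x[!is.na(x)]
  c(mean = mean(x), quantile(x, probs = probs))
}
```

Usage: `typical_values(salary$salary)`. The `50%` entry is the median.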

Unusual values

salary %>% 
    ggplot(aes(x = yrs.service)) + 
    geom_histogram(binwidth = 1) +
    # zoom in on the count axis so that bins holding only a few
    # (possibly unusual) observations become visible
    coord_cartesian(ylim = c(0, 5))
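Zooming in makes rare values visible but does not list them. A common convention is Tukey's rule. It flags values lying more than 1.5 times the IQR beyond the quartiles, which is the same rule ggplot2's boxplots use to draw individual points past the whiskers. The sketch below applies it. `flag_outliers()` is a hypothetical helper. A flagged row is only a candidate to inspect, not proof of an error.

```r
# Hypothetical helper: TRUE for values outside [Q1 - k*IQR, Q3 + k*IQR].
flag_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- k * (q[2] - q[1])
  # missing values are never flagged (FALSE & NA is FALSE)
  !is.na(x) & (x < q[1] - fence | x > q[2] + fence)
}
```

Usage: `salary[flag_outliers(salary$yrs.service), ]` lists the rows to look at.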

Missing values
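A first step with missing values is to count them in each column. The sketch below does that. `missing_summary()` is a hypothetical helper.

```r
# Hypothetical helper: number and percentage of NAs in each column.
missing_summary <- function(df) {
  n_missing <- colSums(is.na(df))
  data.frame(column = names(df),
             n_missing = as.integer(n_missing),
             pct_missing = 100 * as.numeric(n_missing) / nrow(df),
             row.names = NULL,
             stringsAsFactors = FALSE)
}
```

Usage: `missing_summary(salary)`. If every count is zero, there is nothing to handle. Otherwise, note that ggplot2 drops rows with missing values from a plot (with a warning), so it helps to know where they are.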

Covariation

A categorical and continuous variable

salary %>% 
    ggplot(aes(x = sex, y = salary)) +
    geom_boxplot()
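A boxplot does not show how many observations sit behind each box. The sketch below reports the count, median and IQR for each group next to the plot. `group_summary()` is a hypothetical helper.

```r
# Hypothetical helper: n, median and IQR of a numeric column per group.
group_summary <- function(df, value, group) {
  # split() gives one numeric vector per group; sapply() binds the
  # per-group summaries into a matrix (rows = statistics, cols = groups)
  sapply(split(df[[value]], df[[group]]), function(v) {
    c(n = sum(!is.na(v)),
      median = median(v, na.rm = TRUE),
      IQR = IQR(v, na.rm = TRUE))
  })
}
```

Usage: `group_summary(salary, "salary", "sex")`.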

Two categorical variables

salary %>% 
    count(rank, sex) %>% 
    ggplot(aes(x = rank, y = sex, fill = n)) +
    geom_tile()
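The tile colours encode raw counts, so the largest rank dominates the picture. Dividing each cell by its row total shows the mix of sex within each rank instead. The sketch below computes those shares. `row_shares()` is a hypothetical helper.

```r
# Hypothetical helper: cross-tabulate two categorical columns and divide
# each cell by its row total (margin = 1), so each row sums to 1.
row_shares <- function(df, row, col) {
  counts <- table(df[[row]], df[[col]], dnn = c(row, col))
  prop.table(counts, margin = 1)
}
```

Usage: `row_shares(salary, "rank", "sex")`.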

Two continuous variables

library(hexbin)  # geom_hex() needs the hexbin package installed
salary %>%
    ggplot(aes(x = yrs.since.phd, y = yrs.service)) + 
    geom_hex()
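A correlation coefficient summarises the relationship in the hex plot with one number per method. Spearman's coefficient works on ranks, so it also picks up relationships that are monotonic but not straight lines. The sketch below computes both. `pair_correlation()` is a hypothetical helper.

```r
# Hypothetical helper: Pearson and Spearman correlation between two
# numeric columns, using only rows where both values are present.
pair_correlation <- function(df, x, y) {
  c(pearson  = cor(df[[x]], df[[y]], use = "complete.obs", method = "pearson"),
    spearman = cor(df[[x]], df[[y]], use = "complete.obs", method = "spearman"))
}
```

Usage: `pair_correlation(salary, "yrs.since.phd", "yrs.service")`.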

Patterns and models
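One way to look for patterns is to fit a simple model for a strong relationship and then explore what is left over (the residuals). The sketch below does this under one assumption: that a straight line of salary on yrs.since.phd is a reasonable first model. `add_residuals_lm()` is a hypothetical helper. The modelr package's `add_residuals()` does a similar job.

```r
# Hypothetical helper: fit response ~ predictor with lm() and attach the
# residuals as a new column named "resid" (overwrites any existing one).
add_residuals_lm <- function(df, response, predictor) {
  fit <- lm(reformulate(predictor, response = response), data = df,
            na.action = na.exclude)  # keeps NA rows aligned with df
  df$resid <- residuals(fit)
  df
}
```

For example, run `salary_resid <- add_residuals_lm(salary, "salary", "yrs.since.phd")`. Then `ggplot(salary_resid, aes(x = sex, y = resid)) + geom_boxplot()` compares what remains of salary across sex once the straight-line trend in yrs.since.phd is removed.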

Need help with: patterns and models, missing values, and unusual values.