Import data

library(tidyverse)
data <- readxl::read_excel("../00_data/Data.xlsx")
## New names:
## • `` -> `...11`
## • `` -> `...12`
## • `` -> `...13`
## • `` -> `...14`
data
## # A tibble: 10,846 × 14
##    team    `Team City` Population team_name  year  total   home   away  week
##    <chr>   <chr>            <dbl> <chr>     <dbl>  <dbl>  <dbl>  <dbl> <dbl>
##  1 Arizona Phoenix        1608139 Cardinals  2000 893926 387475 506451     1
##  2 Arizona Phoenix        1608139 Cardinals  2000 893926 387475 506451     2
##  3 Arizona Phoenix        1608139 Cardinals  2000 893926 387475 506451     3
##  4 Arizona Phoenix        1608139 Cardinals  2000 893926 387475 506451     4
##  5 Arizona Phoenix        1608139 Cardinals  2000 893926 387475 506451     5
##  6 Arizona Phoenix        1608139 Cardinals  2000 893926 387475 506451     6
##  7 Arizona Phoenix        1608139 Cardinals  2000 893926 387475 506451     7
##  8 Arizona Phoenix        1608139 Cardinals  2000 893926 387475 506451     8
##  9 Arizona Phoenix        1608139 Cardinals  2000 893926 387475 506451     9
## 10 Arizona Phoenix        1608139 Cardinals  2000 893926 387475 506451    10
## # ℹ 10,836 more rows
## # ℹ 5 more variables: weekly_attendance <chr>, ...11 <lgl>, ...12 <chr>,
## #   ...13 <lgl>, ...14 <dbl>
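
The printout above shows two things that may need handling before plotting: four unnamed spreadsheet columns that were auto-named `...11` to `...14`, and `weekly_attendance` read as character rather than numeric. The sketch below shows one possible cleanup. It runs on a small made-up tibble (`toy`, hypothetical values) rather than the real workbook, and it assumes the text entries are literal "NA" strings, which the output above does not confirm.

```r
library(dplyr)

# Hypothetical stand-in for the imported data; all values are made up.
toy <- tibble(
    team_name         = c("Cardinals", "Cardinals", "Cardinals"),
    week              = c(1, 2, 3),
    weekly_attendance = c("77434", "NA", "66009"),
    `...11`           = c(NA, NA, NA),
    .name_repair      = "minimal"
)

toy_clean <- toy %>%
    # Drop every column whose name starts with "..." (the auto-named ones)
    select(-starts_with("...")) %>%
    # Turn the assumed "NA" strings into real missing values, then convert to numeric
    mutate(weekly_attendance = na_if(weekly_attendance, "NA"),
           weekly_attendance = as.numeric(weekly_attendance))

toy_clean
```

If the real column holds some other non-numeric text, the `na_if()` step would need to target that value instead.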

Introduction

Questions

Variation

Visualizing distributions

ggplot(data = data) +
    geom_bar(mapping = aes(x = team_name)) +
    coord_flip()

ggplot(data = data) +
    geom_histogram(mapping = aes(x = total), binwidth = 2000)

ggplot(data = data, mapping = aes(x = total, colour = team)) +
    geom_freqpoly(binwidth = 2000)

Typical values

Unusual values

Missing values
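
A minimal sketch of how missing values might be counted and handled, using a small made-up tibble (`toy`, hypothetical values) in place of the real data. Treating an attendance of 0 as implausible is an assumption made only for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data; all values are made up for illustration.
toy <- tibble(
    team_name         = c("Cardinals", "Cardinals", "Falcons", "Falcons"),
    week              = c(1, 2, 1, 2),
    weekly_attendance = c(77434, NA, 66009, 0)
)

# Count missing weekly attendance values per team
missing_by_team <- toy %>%
    group_by(team_name) %>%
    summarise(n_missing = sum(is.na(weekly_attendance)))
missing_by_team

# Recode implausible values (assumed here: 0 or less) as NA instead of dropping the row
toy2 <- toy %>%
    mutate(weekly_attendance = ifelse(weekly_attendance <= 0, NA, weekly_attendance))

# na.rm = TRUE silences the warning ggplot2 gives when it drops missing values
ggplot(data = toy2, mapping = aes(x = week, y = weekly_attendance)) +
    geom_point(na.rm = TRUE)
```

Recoding to `NA` keeps the rest of each row available for other summaries, which is why it is used here instead of filtering the rows out.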

Covariation

A categorical and continuous variable
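
One common way to compare a continuous variable across categories is a boxplot ordered by the median. The sketch below assumes weekly attendance by team is the comparison of interest and uses a made-up data frame (`toy`, hypothetical values) so that it runs on its own.

```r
library(ggplot2)

# Hypothetical data; all values are made up for illustration.
toy <- data.frame(
    team_name         = rep(c("Cardinals", "Falcons", "Ravens"), each = 4),
    weekly_attendance = c(60000, 62000, 64000, 66000,
                          50000, 52000, 54000, 56000,
                          68000, 69000, 70000, 71000)
)

# Order teams by median attendance so the boxes appear sorted
toy$team_ordered <- reorder(toy$team_name, toy$weekly_attendance, FUN = median)

ggplot(data = toy) +
    geom_boxplot(mapping = aes(x = team_ordered, y = weekly_attendance)) +
    labs(x = "team_name") +
    coord_flip()
```

As in the bar chart earlier, `coord_flip()` keeps the team labels readable when there are many categories.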

Two categorical variables

Two continuous variables

Patterns and models