Import data

library(tidyverse)  # ggplot2 and dplyr (ggplot(), count(), %>%)
library(readxl)     # read_excel()
# excel file
colony <- read_excel("../00_data/myData.xlsx")

Introduction

Questions

Variation

Visualizing distributions

# Bar chart: number of observations in each value of months
ggplot(data = colony) +
  geom_bar(mapping = aes(x = months))

# Histogram of colony_lost_pct in bins of width 0.5
ggplot(data = colony) +
  geom_histogram(mapping = aes(x = colony_lost_pct), binwidth = 0.5)
## Warning: Removed 54 rows containing non-finite values (`stat_bin()`).

# Same counts as the bar chart, as a table
colony %>% count(months)
## # A tibble: 4 × 2
##   months               n
##   <chr>            <int>
## 1 April-June         329
## 2 January-March      329
## 3 July-September     282
## 4 October-December   282

ggplot(data = colony, mapping = aes(x = colony_lost_pct, colour = months)) +
  geom_freqpoly(binwidth = 0.1)
## Warning: Removed 54 rows containing non-finite values (`stat_bin()`).

Typical values

ggplot(data = colony, mapping = aes(x = colony_lost_pct)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 54 rows containing non-finite values (`stat_bin()`).

Unusual values
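
One possible way to look for unusual values is sketched below. It runs on a small made-up stand-in for `colony` (the numbers are illustrative and not taken from myData.xlsx), and the cutoff of 50 is an assumption chosen only for the example. The idea is to zoom the y-axis with `coord_cartesian()` so that sparsely populated bins stay visible, then filter the rows beyond the cutoff for closer inspection.

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-in for `colony`; values are illustrative only
toy <- data.frame(
  months = rep(c("January-March", "April-June"), each = 5),
  colony_lost_pct = c(10, 12, 11, 9, 65, 8, 13, 12, NA, 10)
)

# Zoom in on the y-axis so bins with very few observations remain visible
p <- ggplot(data = toy, mapping = aes(x = colony_lost_pct)) +
  geom_histogram(binwidth = 5, na.rm = TRUE) +
  coord_cartesian(ylim = c(0, 2))

# Keep only the rows beyond the assumed cutoff (rows with NA are dropped)
unusual <- toy %>%
  filter(colony_lost_pct > 50) %>%
  arrange(colony_lost_pct)
```

On the real data, the cutoff would need to be read off the histogram rather than fixed in advance.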

Missing values
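
The plots in this document warn that rows with missing or non-finite values were removed. A minimal sketch of how those gaps could be counted and handled explicitly follows. It uses a made-up stand-in for `colony` (illustrative values only), and it assumes that silently dropping missing rows is acceptable for the plot.

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-in for `colony`; values are illustrative only
toy <- data.frame(
  colony_n        = c(100, 200, NA, 150, 300),
  colony_lost     = c(10, NA, 5, 30, 45),
  colony_lost_pct = c(10, NA, NA, 20, 15)
)

# Count the missing values in every column
na_counts <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# na.rm = TRUE drops the missing rows without emitting the warning
p <- ggplot(data = toy, mapping = aes(x = colony_lost_pct)) +
  geom_histogram(binwidth = 5, na.rm = TRUE)
```

Counting first makes it clear how many observations each plot is actually based on.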

Covariation

A categorical and continuous variable

ggplot(data = colony, mapping = aes(x = colony_lost_pct)) + 
  geom_freqpoly(mapping = aes(colour = months), binwidth = 40)
## Warning: Removed 54 rows containing non-finite values (`stat_bin()`).

ggplot(colony) + 
  geom_bar(mapping = aes(x = months))

Two categorical variables
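
A sketch of one common approach for two categorical variables: count each combination and draw the counts as tiles. Only `months` is confirmed as a categorical column in this document; the `year` column below is a hypothetical second variable, and the data frame is a made-up stand-in for `colony`.

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-in for `colony`; `year` is an assumed column
toy <- data.frame(
  year   = c(2015, 2015, 2015, 2016, 2016, 2016, 2016),
  months = c("January-March", "January-March", "April-June",
             "January-March", "April-June", "April-June", "April-June")
)

# Number of observations for each year/months combination
pair_counts <- toy %>% count(year, months)

# Tile plot: fill colour encodes the count of each combination
p <- ggplot(data = pair_counts, mapping = aes(x = factor(year), y = months)) +
  geom_tile(mapping = aes(fill = n))
```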

Two continuous variables

ggplot(data = colony) +
  geom_point(mapping = aes(x = colony_n, y = colony_lost))
## Warning: Removed 47 rows containing missing values (`geom_point()`).

Patterns and models

ggplot(data = colony) + 
  geom_point(mapping = aes(x = colony_lost_pct, y = colony_reno_pct))
## Warning: Removed 54 rows containing missing values (`geom_point()`).
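
If the scatterplot above suggests a pattern, one way to probe it is to fit a simple linear model and look at what remains once the pattern is removed. The sketch below is an illustration under assumptions, not the analysis performed on myData.xlsx: the data frame is a made-up stand-in for `colony`, and a straight-line relationship is assumed purely for the example.

```r
library(ggplot2)

# Hypothetical stand-in for `colony`; values are illustrative only
toy <- data.frame(
  colony_lost_pct = c(5, 10, 15, 20, NA, 30),
  colony_reno_pct = c(3, 6, 7, 9, 4, 13)
)

# lm() drops rows with missing values by default (na.action = na.omit)
fit <- lm(colony_reno_pct ~ colony_lost_pct, data = toy)

# Attach the residuals to the complete rows used in the fit
complete <- toy[complete.cases(toy), ]
complete$resid <- residuals(fit)

# Residuals against the predictor: leftover structure hints at a missed pattern
p <- ggplot(data = complete) +
  geom_point(mapping = aes(x = colony_lost_pct, y = resid))
```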