Import Data

# Load ggplot2 for the plots below, then read the Excel file
library(ggplot2)
data <- readxl::read_xlsx("../00_data/MyData.xlsx")

Introduction

Questions

Variation

Visualizing distributions

ggplot(data = data) +
  geom_bar(mapping = aes(x = year))

ggplot(data = data) +
  geom_histogram(mapping = aes(x = year), binwidth = 0.5)

# district is stored as a number, so wrap it in factor() to give ggplot a grouping;
# without this, ggplot drops the colour aesthetic with a warning
ggplot(data = data, mapping = aes(x = year, colour = factor(district))) +
  geom_freqpoly(binwidth = 0.1)

Typical values

ggplot(data = data, mapping = aes(x = candidatevotes)) +
  geom_histogram(binwidth = 10000)

Unusual values

ggplot(data = data) +
  geom_histogram(mapping = aes(x = candidatevotes), binwidth = 10000) +
  coord_cartesian(ylim = c(0, 100))
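
The zoomed histogram shows where the rare bars sit, but not which rows they come from. A minimal sketch of how those rows could be listed follows. It assumes dplyr is installed, and both the toy data frame and the two cutoffs are hypothetical stand-ins, not values taken from MyData.xlsx.

```r
library(dplyr)

# Toy stand-in for `data`; these values are invented for illustration only
toy <- data.frame(
  candidate      = c("A", "B", "C", "D"),
  candidatevotes = c(120000, 95000, 410000, 15)
)

# Hypothetical cutoffs; choose real ones by reading them off the histogram
unusual <- toy %>%
  filter(candidatevotes < 100 | candidatevotes > 300000) %>%
  arrange(candidatevotes)

unusual
```

Swapping `toy` for `data` and adjusting the cutoffs would list the actual unusual rows.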

Missing values

“Removing the outliers does not improve the analysis, so no code is included in this section.”

Covariation

A categorical and continuous variable

ggplot(data = data, mapping = aes(x = candidatevotes)) + 
  geom_freqpoly(mapping = aes(colour = state), binwidth = 10000)
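
With many states, overlapping frequency polygons can be hard to read. One possible alternative is a boxplot, which summarises the distribution of candidatevotes within each state. The sketch below assumes ggplot2 is installed and uses a made-up toy data frame in place of `data`.

```r
library(ggplot2)

# Toy stand-in for `data`; these values are invented for illustration only
toy <- data.frame(
  state          = c("AL", "AL", "AL", "TX", "TX", "TX"),
  candidatevotes = c(90000, 120000, 150000, 60000, 180000, 240000)
)

# One box per state, showing the median and quartiles of candidatevotes
p <- ggplot(data = toy, mapping = aes(x = state, y = candidatevotes)) +
  geom_boxplot()

p
```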

Two categorical variables

ggplot(data = data) +
  geom_count(mapping = aes(x = candidate, y = state))
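
Another way to look at two categorical variables is to count each combination first and then draw the counts as a heatmap. This is only a sketch: it assumes dplyr and ggplot2 are installed, and the toy data frame is hypothetical.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for `data`; these values are invented for illustration only
toy <- data.frame(
  candidate = c("A", "A", "B", "B", "B"),
  state     = c("AL", "AL", "AL", "TX", "TX")
)

# count() returns one row per candidate/state pair, with the tally in column n
pair_counts <- toy %>%
  count(candidate, state)

# Fill each tile according to how often the pair occurs
p <- ggplot(data = pair_counts, mapping = aes(x = candidate, y = state)) +
  geom_tile(mapping = aes(fill = n))

p
```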

Two continuous variables

ggplot(data = data) +
  geom_point(mapping = aes(x = year, y = candidatevotes))

Patterns and models

“The plot would be exactly the same as the one under Two continuous variables, so I decided against including it.”