Import data

library(tidyverse)  # provides read_csv() and ggplot()
Soccer <- read_csv("../00_data/myData.csv")
## Rows: 900 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (11): country, city, stage, home_team, away_team, outcome, win_conditio...
## dbl   (3): year, home_score, away_score
## date  (1): date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Introduction

Questions

Does playing at home rather than away give a team an advantage?

Which countries have hosted the World Cup?

Which teams usually advance to the later stages of the World Cup?

Variation

Visualizing distributions

ggplot(data = Soccer) +
  geom_bar(mapping = aes(x = year))

Typical values

Unusual values

Missing values

ggplot(data = Soccer, mapping = aes(x = stage, y = winning_team)) + 
  geom_point()

Each point marks a stage at which a team has won at least one match, so teams with points at the later stages are the ones that have advanced furthest.
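The scatter plot shows only whether a team has ever won at a given stage, not how often. A minimal sketch of a count table that could complement it is given below. The helper name `wins_by_stage` is hypothetical, and the code assumes that matches without a winner are recorded as `NA` in `winning_team`.

```r
library(dplyr)

# Hypothetical helper (not part of the original analysis): counts match wins
# per team within each stage. Assumes columns `stage` and `winning_team`.
wins_by_stage <- function(matches) {
  matches %>%
    filter(!is.na(winning_team)) %>%               # assumed: no winner is stored as NA
    count(stage, winning_team, name = "wins") %>%  # one row per stage/team pair
    arrange(stage, desc(wins))                     # most wins first within each stage
}

# Usage with the data imported above:
# wins_by_stage(Soccer)
```

Sorting within each stage makes it easy to read off which teams win most often in the later rounds.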

Covariation

A categorical and continuous variable

ggplot(Soccer) + 
  geom_bar(mapping = aes(x = outcome))

Home wins are more common than away wins in this plot, which suggests that teams have a home advantage.
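To put numbers behind the bar chart, the outcomes can be tabulated with their shares of all matches. This is a sketch rather than part of the original analysis: the helper name `outcome_shares` is hypothetical, and it makes no assumption about how the values of `outcome` are coded.

```r
library(dplyr)

# Hypothetical helper: number of matches and share of all matches per outcome.
# Assumes only that the data has an `outcome` column.
outcome_shares <- function(matches) {
  matches %>%
    count(outcome, name = "n_matches") %>%       # matches per outcome value
    mutate(share = n_matches / sum(n_matches))   # proportion of all matches
}

# Usage with the data imported above:
# outcome_shares(Soccer)
```

Comparing the share of home wins with the share of away wins gives a more direct reading of the home advantage than bar heights alone.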

Two categorical variables

ggplot(data = Soccer) +
  geom_count(mapping = aes(x = outcome, y = stage))

Two continuous variables

ggplot(data = Soccer) +
  geom_point(mapping = aes(x = winning_team, y = stage))
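The column specification lists `home_score` and `away_score` as numeric columns, so plotting one against the other is one way to look at two continuous variables in this data. The sketch below is an illustration, not the original analysis. The helper name `score_plot` is hypothetical, and `geom_count()` is used instead of `geom_point()` on the assumption that many matches share the same scoreline and would otherwise overplot.

```r
library(ggplot2)

# Hypothetical helper: home score against away score, with point size
# proportional to the number of matches at each scoreline.
# Assumes numeric columns `home_score` and `away_score`.
score_plot <- function(matches) {
  ggplot(data = matches) +
    geom_count(mapping = aes(x = home_score, y = away_score))
}

# Usage with the data imported above:
# score_plot(Soccer)
```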

Patterns and models

ggplot(data = Soccer) + 
  geom_point(mapping = aes(x = country, y = year))

This plot shows which countries have hosted the World Cup and in which years.
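The same information can be read as a table by keeping one row per tournament. A minimal sketch follows. The helper name `hosts_by_year` is hypothetical, and it assumes `country` holds the host country of each match, as the plot above implies.

```r
library(dplyr)

# Hypothetical helper: one row per year/host-country pair, in chronological order.
# Assumes columns `year` and `country`.
hosts_by_year <- function(matches) {
  matches %>%
    distinct(year, country) %>%  # collapse the many matches of a tournament
    arrange(year)
}

# Usage with the data imported above:
# hosts_by_year(Soccer)
```

A year that appears more than once in the result would indicate a tournament with more than one host country.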