Import Data

# excel filer
games <- read_excel("../00_data/MyData_charts.xlsx")
games
## # A tibble: 988 × 15
##     year country city    stage home_team away_team home_score away_score outcome
##    <dbl> <chr>   <chr>   <chr> <chr>     <chr>          <dbl>      <dbl> <chr>  
##  1  1930 Uruguay Montev… Grou… France    Mexico             4          1 H      
##  2  1930 Uruguay Montev… Grou… Belgium   United S…          0          3 A      
##  3  1930 Uruguay Montev… Grou… Brazil    Yugoslav…          1          2 A      
##  4  1930 Uruguay Montev… Grou… Peru      Romania            1          3 A      
##  5  1930 Uruguay Montev… Grou… Argentina France             1          0 H      
##  6  1930 Uruguay Montev… Grou… Chile     Mexico             3          0 H      
##  7  1930 Uruguay Montev… Grou… Bolivia   Yugoslav…          0          4 A      
##  8  1930 Uruguay Montev… Grou… Paraguay  United S…          0          3 A      
##  9  1930 Uruguay Montev… Grou… Uruguay   Peru               1          0 H      
## 10  1930 Uruguay Montev… Grou… Argentina Mexico             6          3 H      
## # ℹ 978 more rows
## # ℹ 6 more variables: win_conditions <chr>, winning_team <chr>,
## #   losing_team <chr>, date <dttm>, month <chr>, dayofweek <chr>

Clean Data

games1 <- na.omit(games[, c("year", "country", "city", "stage", "home_team", "away_team", "home_score", "away_score", "outcome", "winning_team", "losing_team", "date", "month", "dayofweek")])
games1
## # A tibble: 900 × 14
##     year country city    stage home_team away_team home_score away_score outcome
##    <dbl> <chr>   <chr>   <chr> <chr>     <chr>          <dbl>      <dbl> <chr>  
##  1  1930 Uruguay Montev… Grou… France    Mexico             4          1 H      
##  2  1930 Uruguay Montev… Grou… Belgium   United S…          0          3 A      
##  3  1930 Uruguay Montev… Grou… Brazil    Yugoslav…          1          2 A      
##  4  1930 Uruguay Montev… Grou… Peru      Romania            1          3 A      
##  5  1930 Uruguay Montev… Grou… Argentina France             1          0 H      
##  6  1930 Uruguay Montev… Grou… Chile     Mexico             3          0 H      
##  7  1930 Uruguay Montev… Grou… Bolivia   Yugoslav…          0          4 A      
##  8  1930 Uruguay Montev… Grou… Paraguay  United S…          0          3 A      
##  9  1930 Uruguay Montev… Grou… Uruguay   Peru               1          0 H      
## 10  1930 Uruguay Montev… Grou… Argentina Mexico             6          3 H      
## # ℹ 890 more rows
## # ℹ 5 more variables: winning_team <chr>, losing_team <chr>, date <dttm>,
## #   month <chr>, dayofweek <chr>

Chapter 13

What are primary keys in your data?

Primary key: A tibble: 0 × 6 home_team , home_score , away_team , away_score , date , n

Can you divide your data into two?

Divide it using dplyr::select in a way the two have a common variable, which you could use to join the two.

games1_1half <- games1 %>% select(year:away_score) %>% head(50)
games1_2half <- games1 %>% select(home_team:dayofweek) %>% head(50)
games1_1half
## # A tibble: 50 × 8
##     year country city       stage   home_team away_team    home_score away_score
##    <dbl> <chr>   <chr>      <chr>   <chr>     <chr>             <dbl>      <dbl>
##  1  1930 Uruguay Montevideo Group 1 France    Mexico                4          1
##  2  1930 Uruguay Montevideo Group 4 Belgium   United Stat…          0          3
##  3  1930 Uruguay Montevideo Group 2 Brazil    Yugoslavia            1          2
##  4  1930 Uruguay Montevideo Group 3 Peru      Romania               1          3
##  5  1930 Uruguay Montevideo Group 1 Argentina France                1          0
##  6  1930 Uruguay Montevideo Group 1 Chile     Mexico                3          0
##  7  1930 Uruguay Montevideo Group 2 Bolivia   Yugoslavia            0          4
##  8  1930 Uruguay Montevideo Group 4 Paraguay  United Stat…          0          3
##  9  1930 Uruguay Montevideo Group 3 Uruguay   Peru                  1          0
## 10  1930 Uruguay Montevideo Group 1 Argentina Mexico                6          3
## # ℹ 40 more rows
games1_2half
## # A tibble: 50 × 10
##    home_team away_team    home_score away_score outcome winning_team losing_team
##    <chr>     <chr>             <dbl>      <dbl> <chr>   <chr>        <chr>      
##  1 France    Mexico                4          1 H       France       Mexico     
##  2 Belgium   United Stat…          0          3 A       United Stat… Belgium    
##  3 Brazil    Yugoslavia            1          2 A       Yugoslavia   Brazil     
##  4 Peru      Romania               1          3 A       Romania      Peru       
##  5 Argentina France                1          0 H       Argentina    France     
##  6 Chile     Mexico                3          0 H       Chile        Mexico     
##  7 Bolivia   Yugoslavia            0          4 A       Yugoslavia   Bolivia    
##  8 Paraguay  United Stat…          0          3 A       United Stat… Paraguay   
##  9 Uruguay   Peru                  1          0 H       Uruguay      Peru       
## 10 Argentina Mexico                6          3 H       Argentina    Mexico     
## # ℹ 40 more rows
## # ℹ 3 more variables: date <dttm>, month <chr>, dayofweek <chr>

Can you join the two together?

Use tidyr::left_join or other joining functions.

games2 <- left_join(games1_1half, games1_2half)
## Joining with `by = join_by(home_team, away_team, home_score, away_score)`
games2
## # A tibble: 50 × 14
##     year country city    stage home_team away_team home_score away_score outcome
##    <dbl> <chr>   <chr>   <chr> <chr>     <chr>          <dbl>      <dbl> <chr>  
##  1  1930 Uruguay Montev… Grou… France    Mexico             4          1 H      
##  2  1930 Uruguay Montev… Grou… Belgium   United S…          0          3 A      
##  3  1930 Uruguay Montev… Grou… Brazil    Yugoslav…          1          2 A      
##  4  1930 Uruguay Montev… Grou… Peru      Romania            1          3 A      
##  5  1930 Uruguay Montev… Grou… Argentina France             1          0 H      
##  6  1930 Uruguay Montev… Grou… Chile     Mexico             3          0 H      
##  7  1930 Uruguay Montev… Grou… Bolivia   Yugoslav…          0          4 A      
##  8  1930 Uruguay Montev… Grou… Paraguay  United S…          0          3 A      
##  9  1930 Uruguay Montev… Grou… Uruguay   Peru               1          0 H      
## 10  1930 Uruguay Montev… Grou… Argentina Mexico             6          3 H      
## # ℹ 40 more rows
## # ℹ 5 more variables: winning_team <chr>, losing_team <chr>, date <dttm>,
## #   month <chr>, dayofweek <chr>