World Cup Trophy

This analysis is based on the Maven World Cup Challenge from Maven Analytics. We analyse historical data leading up to the 2022 FIFA World Cup in Qatar, including all matches from previous World Cups, all international matches for the qualified countries, and the groups and matches for the upcoming tournament.
Our case study country is Brazil. The Brazil national team, nicknamed the Seleção Canarinha, is one of the 32 footballing nations that will participate in this year's tournament in Qatar. We will dig into the data and tell a single-page story of the country's history with the World Cup, its road to Qatar, and its expectations for this year's tournament, and present our insights using data visualization.

Load the needed packages
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.2.2
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(ggplot2)
library(lubridate)
## 
## Attaching package: 'lubridate'
## 
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(maps)
## Warning: package 'maps' was built under R version 4.2.2
## 
## Attaching package: 'maps'
## 
## The following object is masked from 'package:purrr':
## 
##     map
Import the datasets
world_cup_matches <- read.csv("2022_world_cup_matches.csv")
world_cup_groups <- read.csv("2022_world_cup_groups.csv")
international_matches <- read.csv("international_matches.csv")
world_cup_games <- read.csv("world_cup_matches.csv")
world_cup <- read.csv("world_cups.csv")

Data preparation and cleaning

Brazil_international_matches <- international_matches %>% 
  filter(Home.Team == "Brazil" | Away.Team == "Brazil")
world_cup_v2 <- world_cup %>% 
  mutate(Runners.Up = recode(Runners.Up
                         ,"Germany FR" = "Germany"))
world_cup_v2 <- world_cup %>% 
  mutate(Third = recode(Third
                             ,"Germany FR" = "Germany"))
world_cup_v2 <- world_cup %>% 
  mutate(Winner = recode(Winner
                             ,"Germany FR" = "Germany"))
Brazil_international_matches <- Brazil_international_matches %>% 
  mutate(Tournament = recode(Tournament
                                ,"Germany FR" = "Germany"))
Brazil_wc_game <- world_cup_games %>% 
  filter(Home.Team == "Brazil" | Away.Team == "Brazil")
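
Each of the three `world_cup_v2` assignments above starts again from `world_cup`, so only the last recode (the `Winner` column) is kept. This is why "Germany FR" still appears in the runners-up and third-place tables further down. Below is a minimal sketch of one way to apply the same recode to all three columns in a single step with `across()`. The `wc_toy` data frame is a small hand-typed stand-in introduced here for illustration, not the challenge file.

```r
library(dplyr)

# Hand-typed stand-in for world_cup (illustration only, not the challenge data)
wc_toy <- data.frame(
  Year       = c(1954, 1966, 1970),
  Winner     = c("Germany FR", "England", "Brazil"),
  Runners.Up = c("Hungary", "Germany FR", "Italy"),
  Third      = c("Austria", "Portugal", "Germany FR")
)

# Recode all three podium columns in one mutate() so no earlier recode is lost
wc_toy_v2 <- wc_toy %>%
  mutate(across(c(Winner, Runners.Up, Third),
                ~ recode(.x, "Germany FR" = "Germany")))

wc_toy_v2
```

With the real data, the same `mutate(across(...))` call would replace the three separate assignments.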

Number of goals scored at home and number of matches played in every tournament

Brazil_international_matches %>% 
  group_by(Tournament) %>% 
  filter(Home.Team == "Brazil") %>% 
  summarise(total_home_goals = sum(Home.Goals),Num_of_match_played = n()) %>% 
  arrange(desc(total_home_goals)) 
## # A tibble: 15 × 3
##    Tournament                   total_home_goals Num_of_match_played
##    <chr>                                   <int>               <int>
##  1 Friendly                                  556                 228
##  2 Copa America                              349                 137
##  3 FIFA World Cup qualification              174                  63
##  4 Confederations Cup                         54                  18
##  5 Copa Roca                                  32                  13
##  6 Copa Oswaldo Cruz                          29                   9
##  7 Pan American Championship                  25                   9
##  8 Copa Rio Branco                            23                   9
##  9 Copa Bernardo O'Higgins                    11                   4
## 10 Gold Cup                                    8                   5
## 11 Atlantic Cup                                7                   2
## 12 Brazil Independence Cup                     5                   4
## 13 Mundialito                                  4                   1
## 14 Superclasico de las Americas                4                   3
## 15 USA Cup                                     4                   2

Number of goals scored away and number of matches played in every tournament

Brazil_international_matches %>% 
  group_by(Tournament) %>% 
  filter(Away.Team == "Brazil") %>% 
  summarise(total_away_goals = sum(Away.Goals),Num_of_match_played = n()) %>% 
  arrange(desc(total_away_goals))
## # A tibble: 18 × 3
##    Tournament                   total_away_goals Num_of_match_played
##    <chr>                                   <int>               <int>
##  1 Friendly                                  392                 201
##  2 FIFA World Cup qualification              105                  64
##  3 Copa America                               81                  54
##  4 Confederations Cup                         24                  15
##  5 Copa Oswaldo Cruz                          17                   7
##  6 Copa Roca                                  17                  10
##  7 Gold Cup                                   14                   9
##  8 Copa Rio Branco                            13                   9
##  9 Pan American Championship                  13                   7
## 10 King's Cup                                  7                   1
## 11 Lunar New Year Cup                          7                   1
## 12 Copa Bernardo O'Higgins                     6                   6
## 13 Tournoi de France                           5                   3
## 14 Superclasico de las Americas                4                   5
## 15 Rous Cup                                    3                   2
## 16 Atlantic Cup                                2                   3
## 17 Mundialito                                  2                   2
## 18 USA Cup                                     2                   1

Number of goals conceded away from home in all competitions

# Brazil is the away side in these rows, so the home team's goals are goals conceded
Brazil_international_matches %>% 
  filter(!Home.Team %in% c('Brazil'))%>% 
  summarise(goals_conceded = sum(Home.Goals))
##   goals_conceded
## 1            408

Number of goals conceded at home in all competitions

# Brazil is the home side in these rows, so the away team's goals are goals conceded
Brazil_international_matches %>% 
  filter(!Away.Team %in% c('Brazil'))%>% 
  summarise(goals_conceded = sum(Away.Goals))
##   goals_conceded
## 1            397
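
The separate home and away filters above can also be folded into one pass by working out, for each match, which side Brazil was on. This is a minimal sketch of that idea. `matches_toy` is invented data that only borrows the column names of `international_matches`, and the helper columns `venue`, `goals_for` and `goals_against` are names introduced here.

```r
library(dplyr)

# Invented matches, for illustration only
matches_toy <- data.frame(
  Home.Team  = c("Brazil", "Argentina", "Brazil"),
  Away.Team  = c("Chile", "Brazil", "Peru"),
  Home.Goals = c(3, 1, 0),
  Away.Goals = c(1, 2, 0)
)

brazil_summary <- matches_toy %>%
  filter(Home.Team == "Brazil" | Away.Team == "Brazil") %>%
  mutate(
    # Which side Brazil played on decides which goal column counts for or against
    venue         = if_else(Home.Team == "Brazil", "home", "away"),
    goals_for     = if_else(Home.Team == "Brazil", Home.Goals, Away.Goals),
    goals_against = if_else(Home.Team == "Brazil", Away.Goals, Home.Goals)
  ) %>%
  group_by(venue) %>%
  summarise(
    matches  = n(),
    scored   = sum(goals_for),
    conceded = sum(goals_against),
    .groups  = "drop"
  )

brazil_summary
```

Keeping goals scored and conceded in one table makes it harder to mix up the home and away columns.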

World Cup winning teams, matches played and the number of tournament goals

world_cup %>% 
  group_by(Year,Winner,Matches.Played) %>% 
  drop_na(Goals.Scored) %>% 
  summarise(num_of_goals = sum(Goals.Scored)) %>% 
  arrange(-Matches.Played)
## `summarise()` has grouped output by 'Year', 'Winner'. You can override using
## the `.groups` argument.
## # A tibble: 21 × 4
## # Groups:   Year, Winner [21]
##     Year Winner     Matches.Played num_of_goals
##    <int> <chr>               <int>        <int>
##  1  1998 France                 64          171
##  2  2002 Brazil                 64          161
##  3  2006 Italy                  64          147
##  4  2010 Spain                  64          145
##  5  2014 Germany                64          171
##  6  2018 France                 64          169
##  7  1982 Italy                  52          146
##  8  1986 Argentina              52          132
##  9  1990 Germany FR             52          115
## 10  1994 Brazil                 52          141
## # … with 11 more rows
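
The message printed above appears because `summarise()` drops only the last grouping variable, so the result stays grouped by `Year` and `Winner`. Below is a small sketch of passing `.groups = "drop"` to get an ungrouped table and silence the message. `cups_toy` holds two rows copied by hand from the table above, not the full file.

```r
library(dplyr)

# Two rows typed in from the printed table above (illustration only)
cups_toy <- data.frame(
  Year           = c(2010, 2014),
  Winner         = c("Spain", "Germany"),
  Matches.Played = c(64, 64),
  Goals.Scored   = c(145, 171)
)

goals_by_cup <- cups_toy %>%
  group_by(Year, Winner, Matches.Played) %>%
  # .groups = "drop" returns a plain, ungrouped tibble and suppresses the message
  summarise(num_of_goals = sum(Goals.Scored), .groups = "drop") %>%
  arrange(desc(Matches.Played), Year)

is_grouped_df(goals_by_cup)
```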

Teams that have participated in the World Cup

Hteam <- world_cup_games %>% 
  select(Home.Team)
Ateam <- world_cup_games %>% 
  select(Away.Team) 
Ateam <- rename(
  Ateam, Home.Team = Away.Team
)
total_teams <- bind_rows(Ateam,Hteam)

Total number of countries to make it to the World Cup

n_distinct(total_teams)
## [1] 81

Number of games played by each nation in the FIFA World Cup

total_teams %>% 
  group_by(Home.Team) %>% 
  summarise(num_of_app = n()) %>% 
  arrange(desc(num_of_app))
## # A tibble: 81 × 2
##    Home.Team num_of_app
##    <chr>          <int>
##  1 Brazil           109
##  2 Germany          109
##  3 Italy             83
##  4 Argentina         81
##  5 England           69
##  6 France            66
##  7 Spain             63
##  8 Mexico            57
##  9 Uruguay           56
## 10 Sweden            51
## # … with 71 more rows
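
The rename-and-bind steps above stack the home and away columns into a single column before counting. A compact sketch of the same idea uses `pivot_longer()` from tidyr followed by `count()`. `games_toy` is invented data, and the column names `side` and `team` are introduced here.

```r
library(dplyr)
library(tidyr)

# Invented fixtures, for illustration only
games_toy <- data.frame(
  Home.Team = c("Brazil", "Germany", "Brazil"),
  Away.Team = c("Italy", "Brazil", "Germany")
)

appearances <- games_toy %>%
  # Stack Home.Team and Away.Team into one 'team' column, one row per team per game
  pivot_longer(c(Home.Team, Away.Team), names_to = "side", values_to = "team") %>%
  count(team, name = "num_of_app", sort = TRUE)

appearances
n_distinct(appearances$team)
```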

Number of goals scored by Brazil in the world cup

Hgoal <- world_cup_games %>% 
  filter(Home.Team == 'Brazil') %>% 
  summarise(Num_of_Hgoal = sum(Home.Goals))
Agoal <- world_cup_games %>% 
  filter(Away.Team == 'Brazil') %>% 
  summarise(num_of_Agoal = sum(Away.Goals))
world_cup_games %>% 
  summarise(Brazil_wc_goals = sum(Hgoal+Agoal))
##   Brazil_wc_goals
## 1             229

Number of goals conceded by Brazil in the World Cup

# Goals scored by the home team when Brazil was the away side
Chgoal <- Brazil_wc_game %>% 
  filter(Home.Team != 'Brazil') %>% 
  summarise(Num_of_Hgoal = sum(Home.Goals))
# Goals scored by the away team when Brazil was the home side
Cagoal <- Brazil_wc_game %>% 
  filter(Away.Team != 'Brazil') %>% 
  summarise(num_of_Agoal = sum(Away.Goals))
Brazil_wc_game %>% 
  summarise(conceded_goals = sum(Chgoal+Cagoal))
##   conceded_goals
## 1            105

Number of World Cup tournaments held

world_cup_v2 %>% 
  drop_na(Goals.Scored) %>% 
  summarise(number_of_edition= n())
##   number_of_edition
## 1                21

Total number of goals scored in FIFA world cup

world_cup %>% 
  drop_na(Goals.Scored) %>% 
  summarise(total_goal = sum(Goals.Scored))
##   total_goal
## 1       2548

Average number of goals scored per FIFA World Cup tournament

world_cup %>% 
  drop_na(Goals.Scored) %>% 
  summarise(Avg_goal = mean(Goals.Scored))
##   Avg_goal
## 1 121.3333
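
The average above is the total goal count divided by the number of editions, both of which are printed in the earlier outputs. A quick arithmetic check that hard-codes those two printed values:

```r
# Values copied from the outputs above
total_goals <- 2548
editions    <- 21

# Mean goals per tournament
avg_goals <- total_goals / editions
round(avg_goals, 4)
```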

List of World Cup winners

world_cup_v2 %>% 
  group_by(Winner) %>% 
  drop_na(Goals.Scored) %>% 
  summarise(number_of_winners = n()) %>% 
  arrange(desc(number_of_winners))
## # A tibble: 8 × 2
##   Winner    number_of_winners
##   <chr>                 <int>
## 1 Brazil                    5
## 2 Germany                   4
## 3 Italy                     4
## 4 Argentina                 2
## 5 France                    2
## 6 Uruguay                   2
## 7 England                   1
## 8 Spain                     1

List of World Cup runners-up

world_cup_v2 %>% 
  group_by(Runners.Up) %>% 
  drop_na(Goals.Scored) %>% 
  summarise(Runners_up = n()) %>% 
  arrange(desc(Runners_up))
## # A tibble: 11 × 2
##    Runners.Up     Runners_up
##    <chr>               <int>
##  1 Argentina               3
##  2 Germany FR              3
##  3 Netherlands             3
##  4 Brazil                  2
##  5 Czechoslovakia          2
##  6 Hungary                 2
##  7 Italy                   2
##  8 Croatia                 1
##  9 France                  1
## 10 Germany                 1
## 11 Sweden                  1

List of World Cup third-place finishers

world_cup_v2 %>% 
  group_by(Third) %>% 
  drop_na(Goals.Scored) %>% 
  summarise(Third_place = n()) %>% 
  arrange(desc(Third_place))
## # A tibble: 15 × 2
##    Third       Third_place
##    <chr>             <int>
##  1 Germany               3
##  2 Brazil                2
##  3 France                2
##  4 Poland                2
##  5 Sweden                2
##  6 Austria               1
##  7 Belgium               1
##  8 Chile                 1
##  9 Croatia               1
## 10 Germany FR            1
## 11 Italy                 1
## 12 Netherlands           1
## 13 Portugal              1
## 14 Turkey                1
## 15 USA                   1

Visualization

Matches played and the number of tournament goals

world_cup %>% 
  group_by(Year,Winner,Matches.Played) %>% 
  drop_na(Goals.Scored) %>% 
  summarise(num_of_goals = sum(Goals.Scored)) %>% 
  # colour, not fill, is the aesthetic that shades the default point shape
  ggplot(aes(x = Matches.Played, y = num_of_goals, colour = Year, size = num_of_goals))+
  geom_point()+
  labs(title = "Number of goals scored in each FIFA World Cup by number of matches played", subtitle = "Total tournament goals against total matches played, coloured by year")+
  labs(y= "Total number of goals scored", x = "Total number of matches played")
## `summarise()` has grouped output by 'Year', 'Winner'. You can override using
## the `.groups` argument.

From the graph, as the tournament has modernized and become more competitive, the number of participating teams, and with it the number of matches played, has increased. This has gradually raised the total number of goals scored per tournament over the years.

Number of games played in the World Cup by the 20 most frequent participating nations

total_teams %>% 
  group_by(Home.Team) %>% 
  summarise(num_of_app = n()) %>% 
  arrange(desc(num_of_app)) %>% 
  head(20) %>%
  # reorder() sorts the bars by games played; arrange() alone does not change bar order
  ggplot(aes(x = reorder(Home.Team, num_of_app), y = num_of_app))+
  geom_col()+
  coord_flip() +
  # labels follow the aesthetics, so x still names the nations after coord_flip()
  labs(x = "Participating nations", y = "Number of games played")+
  labs(title = "Footballing nations and their total World Cup games")+
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

List of FIFA World Cup winners

world_cup_v2 %>% 
  group_by(Winner) %>% 
  drop_na(Goals.Scored) %>% 
  summarise(number_of_winnings = n()) %>% 
  ggplot(aes(x = Winner, y = number_of_winnings))+
  geom_col(fill = "green")+
  labs(y= "Number of titles", x = "Winning nations")

The Brazil national team has won the most FIFA World Cup titles, with a record five trophies. Germany and Italy are joint second with four trophies each, while Spain and England have won one trophy each.

List of World Cup runners-up

world_cup_v2 %>% 
  group_by(Runners.Up) %>% 
  drop_na(Goals.Scored) %>% 
  summarise(Runners_up = n()) %>% 
  arrange(desc(Runners_up)) %>% 
  ggplot(aes(x =Runners.Up, y =Runners_up ))+
  geom_col(fill = "red")+
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
  labs(y= "Number of runners up", x = "Runners up Nations")
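
In the chart above, `arrange()` sorts the rows of the table, but ggplot2 still draws a character x-axis in alphabetical order. One way to get bars sorted by count is to wrap the x variable in `reorder()`, as in this minimal sketch. `runners_toy` holds three rows typed in from the runners-up table shown earlier, and `bar_order` is a helper name introduced here.

```r
library(ggplot2)

# Three rows copied by hand from the runners-up table (illustration only)
runners_toy <- data.frame(
  Runners.Up = c("Brazil", "Croatia", "Netherlands"),
  Runners_up = c(2, 1, 3)
)

# reorder() turns the names into a factor whose levels follow the (negated) counts,
# so the largest count is drawn first
bar_order <- levels(reorder(runners_toy$Runners.Up, -runners_toy$Runners_up))

p <- ggplot(runners_toy, aes(x = reorder(Runners.Up, -Runners_up), y = Runners_up)) +
  geom_col(fill = "red") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  labs(y = "Number of runners-up finishes", x = "Runners-up nations")

p
```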