2023-09-23

What about the dataset

Relationship between Year and goals scored.

We use a bar plot to show how many goals were scored at each FIFA World Cup from 1990 to 2014. The totals per tournament are computed below. Note that head() prints only the first six rows, so the 2014 row does not appear in the output.

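library(dplyr)
library(ggplot2)

# Total goals per tournament, 1990-2014 (assumes FIFA, the match-level data frame, is already loaded)
goals_by_year <- FIFA %>%
  filter(Year >= 1990 & Year <= 2014) %>%
  group_by(Year) %>%
  summarize(Total_Goals = sum(Home.Team.Goals + Away.Team.Goals))
head(goals_by_year)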
## # A tibble: 6 × 2
##    Year Total_Goals
##   <int>       <int>
## 1  1990         115
## 2  1994         141
## 3  1998         171
## 4  2002         161
## 5  2006         147
## 6  2010         145
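Because each total is a sum over match rows, empty or repeated rows in the data would change the result. The following is only a minimal sketch of how one might guard against that, using a hypothetical toy data frame (not the real FIFA data). Whether the real dataset contains such rows is not checked here.

```r
library(dplyr)

# Hypothetical toy data: the 1994 match is entered twice and the last row is empty.
toy <- data.frame(
  Year = c(1990, 1990, 1994, 1994, NA),
  Home.Team.Goals = c(2, 1, 3, 3, NA),
  Away.Team.Goals = c(1, 0, 2, 2, NA)
)

clean_totals <- toy %>%
  filter(!is.na(Year)) %>%  # drop rows with no year
  distinct() %>%            # drop exact duplicate rows
  group_by(Year) %>%
  summarize(Total_Goals = sum(Home.Team.Goals + Away.Team.Goals), .groups = "drop")

print(clean_totals)
```

In the toy data the rows only have three columns, so distinct() would also merge two different matches with the same score. With real match rows that carry team names and dates, that is much less likely.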

Bar plot

# geom_col() draws bars whose heights are the values in the data
ggplot(goals_by_year, aes(x = Year, y = Total_Goals)) +
  geom_col(fill = "blue") +
  labs(
    title = "Total Goals Scored from 1990 to 2014",
    x = "Year",
    y = "Total Goals"
  ) +
  theme_minimal()

Conclusion

The dataset does not include goals per year, so I computed the totals from the home and away goals of each match and plotted them. The bar plot shows 200 total goals in 2014, which is the highest total among the tournaments plotted (1990 to 2014).
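The highest-scoring year can also be read off in code instead of from the plot. This is a small sketch, not part of the original analysis. The first six totals are copied from the table above and the 2014 value is the one stated in the conclusion.

```r
library(dplyr)

# Totals copied from the report: six rows from the table, 2014 from the conclusion
goals_by_year <- data.frame(
  Year = c(1990, 1994, 1998, 2002, 2006, 2010, 2014),
  Total_Goals = c(115, 141, 171, 161, 147, 145, 200)
)

# Keep the row with the largest total
top_year <- goals_by_year %>%
  slice_max(Total_Goals, n = 1)

print(top_year)
```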

Dot plot

A dot plot is a type of data visualization that displays data points as dots on a graph. Dot plots are useful for visualizing the distribution of a dataset and the density of data points along a single axis. Here we use a dot plot to show how many World Cup appearances 10 randomly chosen teams from the dataset have made. An appearance is one match played, since each match row counts once for the home team and once for the away team.

Data frame of how many times each team has appeared in the World Cup

# Pool home and away team names so each match counts once for each side
all_teams <- c(FIFA$Home.Team.Name, FIFA$Away.Team.Name)
team_appearances <- table(all_teams)
team_appearances_df <- data.frame(
  Team = names(team_appearances),
  Appearances = as.numeric(team_appearances)
)
# Sort teams from most to fewest appearances
team_appearances_df <- team_appearances_df %>%
  arrange(desc(Appearances))
head(team_appearances_df)
##         Team Appearances
## 1     Brazil         108
## 2      Italy          83
## 3  Argentina          81
## 4    England          62
## 5 Germany FR          62
## 6     France          61
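To make the counting step concrete, here is a minimal sketch on a hypothetical toy set of four matches (the rows are made up for illustration). Each row adds one to the home team and one to the away team, so the table counts matches played.

```r
# Hypothetical toy matches, not taken from the FIFA data
toy <- data.frame(
  Home.Team.Name = c("Brazil", "Italy", "Brazil", "Brazil"),
  Away.Team.Name = c("Italy", "France", "France", "Ghana"),
  stringsAsFactors = FALSE  # keep names as character so c() joins them as text
)

# Same approach as above: pool both columns, then tabulate
all_teams <- c(toy$Home.Team.Name, toy$Away.Team.Name)
counts <- table(all_teams)

print(counts)
```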

# Set the seed for reproducibility
set.seed(123)
# Randomly sample a subset of 10 teams
sampled_teams <- sample(team_appearances_df$Team, 10)
# Filter the data frame to include only the sampled teams
sampled_data <- team_appearances_df %>%
  filter(Team %in% sampled_teams)
# Use ggplot to create the dot plot
ggplot(sampled_data, aes(x = Appearances, y = Team)) +
  geom_point(size = 3, color = "blue") +
  labs(
    title = "Number of Appearances in the World Cup (Random Subset)",
    x = "Number of Appearances",
    y = "Team"
  ) +
  theme_minimal()

Conclusion

Among the sampled teams, Nigeria has more appearances than Ghana, and Ghana has far more than the UAE. Yugoslavia has many more appearances than I expected.
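Comparisons like these are easier to read when the teams are sorted by their counts. One possible way to do that, shown only as a sketch, is to reorder the team names before plotting. The six counts below are copied from the head(team_appearances_df) output, and the name top_teams is made up for this example.

```r
library(ggplot2)

# Counts copied from the head(team_appearances_df) output in this report
top_teams <- data.frame(
  Team = c("Brazil", "Italy", "Argentina", "England", "Germany FR", "France"),
  Appearances = c(108, 83, 81, 62, 62, 61),
  stringsAsFactors = FALSE
)

# reorder() makes Team a factor whose levels are sorted by Appearances
top_teams$Team <- reorder(top_teams$Team, top_teams$Appearances)

p <- ggplot(top_teams, aes(x = Appearances, y = Team)) +
  geom_point(size = 3, color = "blue") +
  labs(
    title = "Number of Appearances in the World Cup (Sorted)",
    x = "Number of Appearances",
    y = "Team"
  ) +
  theme_minimal()

print(p)
```

With the y axis sorted this way, the team with the most appearances sits at the top of the plot.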

Thank you