Choose one of David Robinson’s tidytuesday screencasts, watch the video, and summarise. https://www.youtube.com/channel/UCeiiqmVK07qhY-wvg3IZiZQ
Analyzing Women’s World Cup Data
July 22, 2019
Hint: What’s the source of the data? What does each row represent? How many observations are there? What are the variables, and what do they mean?
This data comes from data.world and includes the final score and win/loss status of every match in the 1991-2019 Women’s World Cups, along with the 2019 roster for each team. Each row represents one team’s result in a single match, so every game appears as two rows (team_num 1 and team_num 2) linked by yearly_game_id. The variables are year (year of the tournament), team (team abbreviation), score (goals scored by the team), round (round of the tournament), yearly_game_id (grouping variable that pairs team 1 and team 2 within a round and year), team_num (1 or 2), and win_status (won, lost, or tied). In 1991 there were 52 observations, and the count grew in later tournaments as more teams joined the World Cup.
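To make the row structure concrete, here is a minimal sketch in R, assuming the dplyr package, of how the two rows belonging to each game can be lined up into one match by joining on year, round, and yearly_game_id. The tibble is hypothetical: the team codes, scores, and win_status labels are made up, and only the column names follow the variable list above. Because every game contributes two rows, the 52 observations for 1991 correspond to 26 matches.

```r
library(dplyr)

# Hypothetical rows using the outcome table's column names.
# Values are made up for illustration; they are NOT real World Cup results.
outcomes <- tibble(
  year           = c(1991, 1991, 1991, 1991),
  team           = c("AAA", "BBB", "CCC", "DDD"),
  score          = c(2, 0, 1, 1),
  round          = c("Group", "Group", "Group", "Group"),
  yearly_game_id = c(1, 1, 2, 2),
  team_num       = c(1, 2, 1, 2),
  win_status     = c("Won", "Lost", "Tie", "Tie")
)

# Split each game into its "team 1" row and its "team 2" row
team1 <- outcomes %>% filter(team_num == 1)
team2 <- outcomes %>% filter(team_num == 2)

# Join the two halves on the columns that identify a game;
# shared non-key columns get _1 / _2 suffixes (team_1, score_2, ...)
matches <- team1 %>%
  inner_join(team2,
             by = c("year", "yearly_game_id", "round"),
             suffix = c("_1", "_2")) %>%
  select(year, round, yearly_game_id, team_1, score_1, team_2, score_2)

print(matches)
```

The result has one row per game instead of one row per team per game, which is the shape needed for questions such as goal difference or head-to-head records.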
Hint: For example, importing data, understanding the data, data exploration, etc.
He starts by creating a new R Markdown (Rmd) file and pasting in the code that reads the data into R. He then built a table called wc_outcomes using the forward pipe operator; summarising it showed that the World Cup happens every 4 years, along with each country’s win/loss record and the other outcome variables. Throughout the video Dave used many different functions to create graphs, including scatterplots, bivariate graphs, and grouped kernel density plots. These graphs gave Dave much more insight into Women’s World Cup outcomes as a whole.
Yes, David used the forward pipe operator (%>%). It let him perform several operations on the wc_outcomes data frame in sequence, passing the result of each step on to the next, so he could chain a series of calculations without nesting function calls or saving intermediate objects.
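The difference the pipe makes can be seen in a small sketch, assuming dplyr (which re-exports %>%). The wc_outcomes tibble here is a hypothetical stand-in with made-up teams and results, not the screencast's actual table. The same "wins per team" calculation is written once as nested calls and once as a pipe chain, and a second chain counts rows per tournament year, the kind of summary that reveals the 4-year spacing of tournaments.

```r
library(dplyr)

# Hypothetical stand-in for wc_outcomes (made-up teams and results)
wc_outcomes <- tibble(
  year       = c(1991, 1991, 1995, 1995, 1995, 1995),
  team       = c("AAA", "BBB", "AAA", "CCC", "BBB", "CCC"),
  win_status = c("Won", "Lost", "Tie", "Tie", "Won", "Lost")
)

# Without the pipe: the calls nest and must be read from the inside out
wins_nested <- arrange(
  summarise(group_by(filter(wc_outcomes, win_status == "Won"), team),
            wins = n()),
  desc(wins)
)

# With %>%: each result becomes the first argument of the next step,
# so the code reads top to bottom in the order the steps run
wins_piped <- wc_outcomes %>%
  filter(win_status == "Won") %>%
  group_by(team) %>%
  summarise(wins = n()) %>%
  arrange(desc(wins))

# Rows per tournament year; each game has two rows, so halve the count
games_per_year <- wc_outcomes %>%
  count(year, name = "rows") %>%
  mutate(games = rows / 2)

print(wins_piped)
print(games_per_year)
```

Both versions compute the same table; the piped one simply lets each step be read, edited, or commented out on its own line.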
A major finding from the analysis was that Rose Lavelle was involved in the most plays in the 2019 final game.
Since I play soccer, it was interesting to find out that the majority of the team’s clearances go to one side of the field (the left side).