library(readr)  # provides read_csv()
EPL <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/carricoc2_xavier_edu/EV0g3AUxTBtHjyXpnE1bWlQBKobrTshii8ovGC-e5XzIUQ?download=1")
Data wrangling with loops
with an introduction to Quarto
Introduction
I am going to analyze data on the English Premier League (EPL) from the 2020-2021 season. The data set records every player registered to the 20 teams that make up the EPL, with demographic information such as Name, Age, and Position. It also includes statistics on passes, goals, dribbles, fouls, penalties, and other performance indicators. You can learn more about the EPL data at this link: https://myxavier-my.sharepoint.com/:x:/g/personal/carricoc2_xavier_edu/EV0g3AUxTBtHjyXpnE1bWlQBKobrTshii8ovGC-e5XzIUQ?download=1
Research Question
“Did Chelsea players have a higher average xG than Manchester United, Arsenal, Tottenham Hotspur, and Liverpool players?” To answer this question, I followed a process similar to the one I used for my first hypothesis. I pulled from the EPL data set and examined the variables Club and xG. Club represents the team a player belongs to, and xG stands for expected goals per game. I formed this hypothesis because Chelsea won the Champions League final this season, so I assumed they would have a higher average xG than the other clubs in the EPL. The code I ran is similar to the first hypothesis’ code, except I used different variables and filtered by five clubs (Chelsea and the four it is compared against) instead of two positions. I then visualized the answer using a bar chart.
After running this code and visualizing the data, I find that my hypothesis is false. Chelsea does have a higher average xG than Arsenal, Manchester United, and Tottenham Hotspur, but a slightly lower average xG than Liverpool.
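The club comparison described above could be sketched roughly as follows. This is not the code from the original analysis, only a minimal illustration that assumes the data has columns named Club and xG. The helper name average_xg_by_club and the small toy data frame are hypothetical, and the toy values are made up purely to show the shape of the input; they are not real EPL figures.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper (not from the original report): mean xG per club
# for a chosen set of clubs. Assumes columns named Club and xG.
average_xg_by_club <- function(data, clubs) {
  data %>%
    filter(Club %in% clubs) %>%
    group_by(Club) %>%
    summarise(avg_xG = mean(xG, na.rm = TRUE), .groups = "drop") %>%
    arrange(desc(avg_xG))
}

# Made-up rows for illustration only; these are NOT real EPL values.
toy <- data.frame(
  Club = c("Chelsea", "Chelsea", "Liverpool", "Liverpool", "Everton"),
  xG   = c(0.10, 0.30, 0.20, 0.40, 0.50),
  stringsAsFactors = FALSE
)

# Keep only the clubs of interest and average xG within each club.
club_xg <- average_xg_by_club(toy, c("Chelsea", "Liverpool"))

# Bar chart of average xG, with clubs ordered from highest to lowest.
p <- ggplot(club_xg, aes(x = reorder(Club, -avg_xG), y = avg_xG)) +
  geom_col() +
  labs(x = "Club", y = "Average xG")
print(p)
```

With the real data, the same helper would be called on EPL with the five club names from the research question. The spellings passed in must match the values stored in the Club column exactly, which is worth checking with unique(EPL$Club) first.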