Introduction

This report focuses on Melbourne Victory, an Australian women’s football team. The team wants to become more prolific in front of goal, as its current goal output is below the standard expected of a side competing at the highest level of women’s domestic football. To address this, the report assesses the attacking talent currently within our squad and provides data-driven alternatives to help the coaching staff make informed decisions about squad selection moving forward. The data are taken from Melbourne Victory’s performances during the 2024/25 Women’s A-League campaign and from player statistics across multiple leagues around the world.

Methods

This section outlines the data collection, cleaning and manipulation protocols used to identify potential signings for the Melbourne Victory Women’s team. The analysis combined qualitative and quantitative approaches: contextual information was gathered through a Q&A session with the management, and performance data were extracted from six separate global football leagues. All coding, statistical processing and visualisations were produced in RStudio [1], and the code was annotated throughout to make the results easier to reproduce.

Prior to the quantitative analysis, the management held a Q&A meeting that provided important context for metric selection and the weighting of specific Key Performance Indicators (KPIs). The questions, listed in the table below, centred on three key points: the team’s formation and the striker’s role within it, the club’s stance on recruiting players with a history of lower limb injury, and the physical attributes the club values in a striker.

Summary of Pre-Analysis Q&A With Melbourne Victory Staff
Question_Number Question_Asked Answer_Given
1 Given the team’s current defensive solidity, what is the team’s current formation with and without possession, and how does the current striker fit within these systems respectively? We prefer to play 4-3-3 which may change to 4-1-4-1 out of possession. The current striker will mainly play as a poacher, with the midfield and wingers making the play and bringing the ball into the box.
2 Does the team recruit any individuals who have any history (large or small) with lower limb injuries? Yes, as long as they have recovered and have shown to be performing again.
3 Following on from where a striker fits within your current offensive and defensive set-up, what does your team deem important physiologically for a striker to be successful at your club? Tall, strong, good in the air.

This information guided the choice of metrics to prioritise when recruiting a new striker. For example, minutes played was used as a proxy for a player’s overall durability and availability. Additionally, information on the physical attributes requested was obtained from external sources to contextualise the demands placed upon strikers coming into the Melbourne Victory set-up.

Data Sources

Player performance data came from two datasets. The first was accessed through myplace and contained player statistics for the Melbourne Victory squad throughout the 2024-25 season [2]. The second was also accessed through myplace but was sourced from FBRef, which has been shown to provide consistent high-quality datasets across major women’s football leagues. The FBRef data were supplied as an Excel file with a separate sheet for each league. These leagues are shown below. All data were taken from the 2024-25 season (the 2024 season for the NWSL) to minimise the influence of external factors on selection.

  • National Women’s Soccer League (NWSL) (USA) [3]
  • A-League Women (Australia) [4]
  • FA Women’s Super League (WSL) (England) [5]
  • Frauen-Bundesliga (Germany) [6]
  • Liga F (Spain) [7]
  • Serie A Femminile (Italy) [8]

Each dataset contained 53 individual player performance metrics, including but not limited to playing time, progressive passes, yellow cards and goals plus assists (G+A).

Data Importing and Screening

All datasets were imported into RStudio [1]. The following packages were used for importing and analysing the data:

  • tidyverse
  • dplyr
  • readxl
  • ggplot2
  • knitr
  • kableExtra
  • tidytext
  • flextable
  • citr

Following importation, all datasets underwent a data cleaning process to ensure the data could be analysed effectively and to increase the validity of the results. This process involved checking for missing values, correcting any inconsistencies in variable names and ensuring positional accuracy.
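
The missing-value check described above can be sketched as a small helper. This is a minimal illustration in base R, not the exact screening code used for the report: the data frame `league`, its values and the function name `missing_summary` are all made up for the example.

```r
# Toy data frame standing in for one imported league sheet (values are made up)
league <- data.frame(
  Player = c("Player A", "Player B", "Player C"),
  Nation = c("AUS", NA, "ENG"),
  Age    = c(24, 27, NA),
  Min    = c(1200, 800, 450)
)

# Hypothetical helper: count missing values per column and
# keep only the columns that actually contain gaps
missing_summary <- function(df) {
  counts <- colSums(is.na(df))
  counts[counts > 0]
}

missing_summary(league)
```

Reporting only the affected columns makes it quick to judge whether a gap touches a variable the analysis depends on.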

The data were then filtered to suit the needs of this report. The 53 variables in the imported datasets were reduced to a base of 10 variables needed to profile a striker’s performance. A full list of these variables is shown below. To examine forward players only, the dataset was filtered to retain players listed as “FW” or “FW,MF” (as some players were performing in a hybrid role).

List of Variables Used in the Analysis
Variable
Player
Nation
Age
Matches Played
Starts
Minutes Played
Position
Goals
Assists
xG
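
The position filter described above can be sketched in base R as follows. The players and goal counts are placeholders, and the object names `players`, `forward_roles` and `forwards` are hypothetical; only the two position labels come from the report.

```r
# Placeholder data: names and goal counts are invented for illustration
players <- data.frame(
  Player = c("Player A", "Player B", "Player C", "Player D"),
  Pos    = c("FW", "FW,MF", "MF", "DF"),
  Gls    = c(10, 6, 3, 1)
)

# Position labels retained in the report: pure forwards and hybrid forwards
forward_roles <- c("FW", "FW,MF")

# Keep only rows whose position exactly matches one of the retained labels
forwards <- players[players$Pos %in% forward_roles, ]
forwards
```

Matching on the exact label means a player listed as “MF,FW” would not be retained, so the order of the labels in the source data matters.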

Additional metrics were added to the new dataset to mirror the existing dataset provided by the management team, using the equations listed below. With both datasets mirroring each other, composite scores could then be calculated to give a clear picture of a player’s value to the team. Following discussion with management, the KPIs were weighted according to the value assigned by the club. The emphasis is on overall output, so goals and xG are weighted more heavily than assists, although assists remain a key part of a striker’s ability.

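  • \(\text{Goals per 90} = \dfrac{\text{Goals}}{\text{Min}/90}\)

  • \(\text{Assists per 90} = \dfrac{\text{Assists}}{\text{Min}/90}\)

  • \(\text{xG per 90} = \dfrac{\text{xG}}{\text{Min}/90}\)

  • \(\text{Points} = \dfrac{x}{\text{Benchmark}} \times \text{Max Points}\)

  • \(\text{Total} = \text{Points}_{\text{Goals}} + \text{Points}_{\text{Assists}} + \text{Points}_{\text{xG}}\)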
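
The scoring equations above can be sketched in base R as follows. This is an illustration under stated assumptions rather than the report's actual scoring code. It assumes that x in the Points equation is the player's per-90 value. The benchmarks are Emily Gielnik's per-90 figures from Table 1. The `max_points` allocation is hypothetical and chosen only to weight goals and xG above assists. The function names are also invented for the example.

```r
# Benchmarks: Emily Gielnik's per-90 values from Table 1
benchmarks <- c(goals_p90 = 0.70, assists_p90 = 0.06, xg_p90 = 0.45)
# Hypothetical point allocation (assumption): goals and xG outweigh assists
max_points <- c(goals_p90 = 40, assists_p90 = 20, xg_p90 = 40)

per90 <- function(x, minutes) {
  # Return NA rather than Inf/NaN for players with no recorded minutes
  ifelse(minutes > 0, x / (minutes / 90), NA_real_)
}

composite_score <- function(goals, assists, xg, minutes) {
  rates <- cbind(
    goals_p90   = per90(goals, minutes),
    assists_p90 = per90(assists, minutes),
    xg_p90      = per90(xg, minutes)
  )
  # Points = (x / Benchmark) * Max Points, applied column by column
  points <- sweep(rates, 2, benchmarks[colnames(rates)], "/")
  points <- sweep(points, 2, max_points[colnames(rates)], "*")
  # Total = Points(Goals) + Points(Assists) + Points(xG)
  rowSums(points)
}

# A player exactly on every benchmark scores the sum of max_points
composite_score(goals = 7, assists = 0.6, xg = 4.5, minutes = 900)
```

Because each metric is scaled by its own benchmark, a score above the sum of the maximum points indicates a player who exceeds the benchmark overall, even if one metric falls short.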

Merging Data

Once filtering had taken place, the datasets were merged to produce a single striker dataset containing all forwards across the six leagues. The three per-90 metrics defined by the equations above were then calculated to account for playing time. This allowed for more meaningful comparisons across leagues with uneven fixture schedules.
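
As an alternative sketch of this step, the per-league tables can be stacked row-wise while recording which league each player came from. This is not the approach taken in the appendix, which combines the tables with `merge.data.frame()`. The two toy tables, their values and the `League` column are assumptions made for illustration.

```r
# Two toy league tables standing in for the filtered per-league data frames
bundesliga <- data.frame(Player = c("Player A", "Player B"),
                         Min = c(1800, 900), Gls = c(12, 3))
nwsl <- data.frame(Player = "Player C", Min = 2000, Gls = 15)

leagues <- list(Bundesliga = bundesliga, NWSL = nwsl)

# Tag every row with its league name, then stack the tables
tagged <- Map(function(df, name) cbind(League = name, df),
              leagues, names(leagues))
all_strikers <- do.call(rbind, tagged)
rownames(all_strikers) <- NULL

all_strikers
```

Keeping a league label would make it possible to compare players within their own competition later, which the merged table on its own does not allow.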

Data Visualisations

All visualisations in the results were produced in RStudio. Every figure was made directly from the cleaned dataset to reduce the potential for manual error and make the results easier to reproduce. Additional physiological data were taken from external sources to help contextualize the analytical findings, but these data were not merged with the datasets.

Results

This section will outline the key findings of the analysis from both internal and external sources. The internal analysis was done to ascertain standards within the current squad to be used for comparison. External sources will then be cross-examined against these standards to determine if players meet or exceed the prerequisites for playing for Melbourne Victory.

Table 1 - Top 5 Strikers – Melbourne Victory
Player Pos Age Min Gls Gls.p90 Ast Ast.p90 xG xG.p90
Emily Gielnik FW 32 1544 12 0.70 1 0.06 7.8 0.45
Nickoletta Flannery FW,MF 25 1807 5 0.25 2 0.10 3.6 0.18
Holly Furphy FW 22 516 3 0.52 0 0.00 0.7 0.12
Alex Chidiac MF 25 1919 2 0.09 4 0.19 1.7 0.08
Rachel Lowe FW,MF 23 2004 2 0.09 5 0.22 4.4 0.20

To establish an in-house reference point, the top five strikers within the team were identified and are shown in Table 1, which summarizes the players’ contributions across goals, assists, expected goals (xG) and minutes played. Table 1 demonstrates a clear lack of goal scoring throughout the squad, with one player, Emily Gielnik, contributing a major share of the team’s goals. Emily Gielnik therefore became the performance benchmark for recruitment: new signings should at minimum meet, and ideally exceed, the standard set by her performance across the 2024-25 season.

To understand the availability of high-performing strikers in the global market, the top strikers across the six leagues highlighted in the methods section were studied (Figure 1). For ease of interpretation, a traffic light system was used to rate individuals against the club’s standard: below the benchmark (red), level with the benchmark (yellow) and above the benchmark (green). Figure 1 shows that a substantial proportion of strikers are performing at the standard currently set within the Melbourne Victory team, with noteworthy examples being Ewa Pajor and Temwa Chawinga, who have scored a significantly higher number of goals than our best striker at the club.
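
A minimal sketch of the traffic light classification is given below. The benchmark of 12 goals is Emily Gielnik's total from Table 1. The `tolerance` band that defines "level with the benchmark" is an assumption, since the report does not state how close a value must be to count as level, and the function name is hypothetical.

```r
# Hypothetical classifier: tolerance is an assumed band around the benchmark
# (here +/-5%) within which a value counts as "level" (yellow)
classify_vs_benchmark <- function(value, benchmark, tolerance = 0.05) {
  ratio <- value / benchmark
  ifelse(ratio > 1 + tolerance, "green",
         ifelse(ratio < 1 - tolerance, "red", "yellow"))
}

# Goal totals taken from Tables 1 and 2: Pajor, Gielnik, Flannery
goals <- c(25, 12, 5)
classify_vs_benchmark(goals, benchmark = 12)
```

The same function can be reused for assists or xG by passing the corresponding benchmark value.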

Utilizing the same traffic light classification, further analysis was conducted on more metrics within the datasets. As with goals scored, the data suggest that current forwards within our squad lack creative output, as many players in the same position are providing more for their teammates. Players such as Ewa Pajor and Lineth Beerensteyn are performing at a level far beyond that currently within the team, which makes them particularly valuable for addressing a clear deficiency.

Figure 3 provides an overview of the goals, assists and xG of the top 25 strikers across the six leagues, with the top 5 players highlighted. The graph makes it easy to identify high-performing athletes on each metric. Figure 4 then presents the same metrics per 90 minutes, which provides a more robust comparison by accounting for differences in playing time.

Table 2 - Top 5 Strikers Across All Six Leagues
Player Pos Age Min Goals Gls.p90 Assists Ast.p90 xG xG.p90
Ewa Pajor FW 27 1991 25 1.13 10 0.45 23.9 1.08
Temwa Chaŵinga FW 25 2142 20 0.84 5 0.21 18.1 0.76
Cristiana Girelli FW 34 1776 19 0.96 2 0.10 11.0 0.56
Lineth Beerensteyn FW,MF 27 1514 17 1.01 1 0.06 15.2 0.90
Martina Piemonte FW 26 1833 17 0.83 3 0.15 14.0 0.69

Table 2 summarizes the players highlighted in Figure 4, compiling all of the metrics discussed above. It shows that these individuals are consistently performing at a high level across multiple measures, which begins to build a case for considering them for recruitment. To arrive at the final group of players to be considered, a composite scoring system was built. The score offers a more rounded depiction of a striker’s true value, emphasizing players who consistently contribute across multiple facets of the game. Figure 5 reveals an established group of players performing above the rest across the leagues analysed. It is from this group that the recruitment team proposes to recruit in order to improve the team’s current striker situation.

Discussion

The purpose of this report is to identify new players who can improve the attacking quality within the Melbourne Victory squad. The analysis identifies a group of players best suited for recruitment based on the club’s recruitment objectives. This discussion evaluates the analytical process and addresses important shortfalls, which can aid the recruitment process moving forward.

Among the KPIs valued, goals and expected goals (xG) became the most prominent for assessing a striker’s true value. Goals offer a concrete outcome measure, whereas xG reflects the likelihood that the chances a player receives lead to successful outcomes. This distinction is imperative in assessing the overall finishing ability of a striker [9], and the results show that a number of players were over-performing relative to their xG. This over-performance suggests real potential for use within the team’s current system and its ethos of having the striker act as a ‘poacher’.
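
The over-performance described above can be illustrated by subtracting xG from goals for the players in Table 2. This is a simple sketch using the table's values. The column name `GoalsMinusxG` is introduced here for illustration, and a single-season difference should be read cautiously.

```r
# Goals and xG values copied from Table 2
top5 <- data.frame(
  Player = c("Ewa Pajor", "Temwa Chawinga", "Cristiana Girelli",
             "Lineth Beerensteyn", "Martina Piemonte"),
  Goals  = c(25, 20, 19, 17, 17),
  xG     = c(23.9, 18.1, 11.0, 15.2, 14.0)
)

# Positive values mean the player scored more goals than her xG
top5$GoalsMinusxG <- round(top5$Goals - top5$xG, 1)

# Rank from largest to smallest over-performance
top5[order(-top5$GoalsMinusxG), c("Player", "GoalsMinusxG")]
```

All five players in Table 2 return a positive difference, which is the sense in which they are over-performing relative to their xG.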

However, despite these KPIs having a strong bearing on the value of potential recruits, the dataset has inherent drawbacks that prevent a full assessment of these players [10]. This is due to the lack of contextual information, such as shot location and defensive pressure. The academic literature states that without contextual variables, true assessments of player performance are difficult to obtain [11]. In practice, this means the xG values give only a simplified picture, which hinders the capacity to understand a player’s true finishing ability [12].

The composite scoring system provided a complete process for ranking the strikers across the leagues, and this multidimensional way of classifying strikers is supported by current academic research [13]. Current research on performance within women’s football recognises that although composite scores appear objective, they can still be affected by methodological choices [9]. Therefore, although the system identifies a clear group of individuals performing at the level needed to be considered for recruitment, these results are not conclusive and further investigation is required [14].

Alongside the KPIs outlined in the results, the management team gave clear indications of the physical characteristics they want in a striker. This mainly centred on the player’s height, so that she can act as a dominant figure in the air. For the players highlighted, external information from FotMob [15],[16],[17],[18],[19] indicates that 4 out of the 5 players selected are above what is considered average height for female football players [20]. The team’s ethos of using their striker as a ‘poacher’ links to academic literature suggesting that taller players are better able to contribute through set-piece dominance and hold-up play [21],[22]. However, the management team should be aware that players without this trait should still be considered for recruitment because, although physiological traits are an asset, they are not a direct predictor of player performance [23].

Injury within women’s football is a substantial field of academic research [24],[13]. To indirectly gauge the robustness of players in the dataset, their minutes played were examined. This value alone offers valuable insight but has limitations that must be considered. Research shows that being consistently available for selection is crucial [25],[24]. However, the true nature of a player’s robustness cannot be fully assessed without factors such as training load, GPS data and detailed injury history, a common concern in performance research in women’s football [26].

The absence of GPS data from the datasets used for analysis significantly limits analytical potential. It is commonly stated within women’s football research that monitoring external load is vital in performance profiling [27],[28]. Metrics from such monitoring could provide the contextual component that was missing from the original dataset. Incorporating these data would have allowed a far more complete assessment of each player’s suitability for the team [29].

Although the analysis gives strong data-driven recommendations, there is still an overarching hindrance: the clear imbalance in research depth between men’s and women’s football [30]. Women’s football research, despite its rapid expansion, remains underdeveloped in comparison to men’s football. It is hindered by smaller sample populations and significantly lower investment in scientific investigation [29],[24]. This imbalance has created a knowledge deficit where, despite known differences in biomechanics and physiology, male-derived models are often relied upon [31]. A clear example of this divide is high-speed running demands, which vary between the sexes, with women’s football exhibiting a variance in intensity influenced by contextual factors [32]. These inconsistencies show that evidence-based decision making is still imperfect when forming performance models that reflect the demands of women’s football.

To improve upon the analysis conducted in this report, future recruitment drives should consider the following components to produce a more holistic overview. Combining quantitative and qualitative measures would capture nuanced contextual dynamics that cannot be captured through data alone, and would help establish how well players align with current team dynamics [33]. This measure was not highlighted by management and is very important to understand moving forward. A further way to strengthen scouting reports is to expand the dataset used for analysis. Expanding across multiple seasons and international competitions would allow processes such as predictive modelling to play a far more robust part in player recruitment [34].

Conclusion

In conclusion, this analysis provides valuable insights into striker performance and supports evidence-informed recruitment and tactical decisions. The analysis is hampered by limitations within the dataset and by gaps in the current academic literature on women’s football. Nonetheless, the results offer an informed platform for identifying players who demonstrate the strong attacking output needed to benefit the Melbourne Victory team for the upcoming season.

References

[1]
“How to Cite R and R Packages,” doi: 10.59350/t79xt-tf203.
[2]
“Melbourne Victory Women Stats, All Competitions,” Available: https://fbref.com/en/squads/4f6d7ee7/Melbourne-Victory-Women-Stats
[3]
“2024 NWSL Stats,” Available: https://fbref.com/en/comps/182/2024/2024-NWSL-Stats
[4]
“2024-2025 A-League Women Stats,” Available: https://fbref.com/en/comps/196/2024-2025/2024-2025-A-League-Women-Stats
[5]
“2024-2025 womens super league stats,” FBref.com, Available: https://fbref.com/en/comps/189/2024-2025/2024-2025-Womens-Super-League-Stats
[6]
“2024-2025 frauen-bundesliga stats,” FBref.com, Available: https://fbref.com/en/comps/183/2024-2025/2024-2025-Frauen-Bundesliga-Stats
[7]
“2024-2025 liga f stats,” FBref.com, Available: https://fbref.com/en/comps/230/2024-2025/2024-2025-Liga-F-Stats
[8]
“2024-2025 serie a stats,” FBref.com, Available: https://fbref.com/en/comps/208/2024-2025/2024-2025-Serie-A-Stats
[9]
J. Georgieva et al., “The incidence and characteristics of heading in the 2019 FIFA Women’s World Cup™,” Science and Medicine in Football, vol. 9, no. 2, pp. 104–111, Apr. 2025, doi: 10.1080/24733938.2024.2305396.
[10]
P. Mourao, “Exploring determinants of international transfers of women soccer players in Portuguese football,” International Journal of Sports Science & Coaching, vol. 19, no. 1, pp. 152–161, Feb. 2024, doi: 10.1177/17479541221142928.
[11]
C. Meylan, J. Trewin, and K. McKean, “Quantifying Explosive Actions in International Women’s Soccer,” International Journal of Sports Physiology and Performance, vol. 12, no. 3, pp. 310–315, Mar. 2017, doi: 10.1123/ijspp.2015-0520.
[12]
J. D. Vescovi and O. Falenchuk, “Contextual factors on physical demands in professional women’s soccer: Female Athletes in Motion study,” European Journal of Sport Science, vol. 19, no. 2, pp. 141–146, Mar. 2019, doi: 10.1080/17461391.2018.1491628.
[13]
K. Okholm Kryger, A. Wang, R. Mehta, F. M. Impellizzeri, A. Massey, and A. McCall, “Research on women’s football: A scoping review,” Science and Medicine in Football, vol. 6, no. 5, pp. 549–558, Dec. 2022, doi: 10.1080/24733938.2020.1868560.
[14]
S. Auer, J. Demter, M. Martin, and J. Lehmann, “LODStats an extensible framework for high-performance dataset analytics,” vol. 7603, A. Ten Teije, J. Völker, S. Handschuh, H. Stuckenschmidt, M. Acquin, A. Nikolov, N. Aussenac-Gilles, and N. Hernandez, Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 353–362. Available: http://link.springer.com/10.1007/978-3-642-33876-2_31
[15]
“Ewa Pajor - stats, career and market value,” Available: https://www.fotmob.com/en-GB/players/734720/ewa-pajor
[16]
“Barbra Banda - stats, career and market value,” Available: https://www.fotmob.com/en-GB/players/1044266/barbra-banda
[17]
“Temwa Chawinga - stats, career and market value,” Available: https://www.fotmob.com/en-GB/players/1618199/temwa-chawinga
[18]
“Caroline Graham Hansen - stats, career and market value,” Available: https://www.fotmob.com/en-GB/players/215716/caroline-graham-hansen
[19]
“Signe Bruun - stats, career and market value,” Available: https://www.fotmob.com/en-GB/players/571261/signe-bruun
[20]
Z. Milanovic, G. Sporis, and N. Trajkovic, “Differences in Body Composite and Physical Match Performance in Female Soccer Players According to Team Position,” Journal of Human Sport and Exercise, vol. 7, no. 1Proc, pp. S67–S72, 2012, doi: 10.4100/jhse.2012.7.Proc1.08.
[21]
Y. Xing, T. Chamera, L. Chen, and S. Zhang, “Aerobic power across positions – an investigation into women’s soccer performance,” Annals of Agricultural and Environmental Medicine, vol. 30, no. 4, pp. 749–754, Dec. 2023, doi: 10.26444/aaem/169854.
[22]
L. Shaw, “VO2max and Playing Time in Female Collegiate Soccer Players,” vol. 209, 2023.
[23]
S. Narayanan and N. D. Pifer, “A data-driven framing of player and team performance in U.S. Women’s soccer,” Frontiers in Sports and Active Living, vol. 5, Mar. 2023, doi: 10.3389/fspor.2023.1125528.
[24]
D. Scott, J. Haigh, and R. Lovell, “Physical characteristics and match performances in women’s international versus domestic-level football players: A 2-year, league-wide study,” Science and Medicine in Football, vol. 4, no. 3, pp. 211–215, Jul. 2020, doi: 10.1080/24733938.2020.1745265.
[25]
E. I. Cantu, “Science Behind the Game: Exploring the Links Between Training Load, Physical Status and Performance in Division I Womens Soccer,” 2025, Available: https://www.proquest.com/openview/10e6f465b9cee1a6baefb110ddeed81e/1?pq-origsite=gscholar&cbl=18750&diss=y
[26]
J. K. Mara, K. G. Thompson, K. L. Pumpa, and S. Morgan, “The acceleration and deceleration profiles of elite female soccer players during competitive matches,” Journal of Science and Medicine in Sport, vol. 20, no. 9, pp. 867–872, Sep. 2017, doi: 10.1016/j.jsams.2016.12.078.
[27]
A. K. Winther et al., “Position specific physical performance and running intensity fluctuations in elite women’s football,” Scandinavian Journal of Medicine & Science in Sports, vol. 32, no. S1, pp. 105–114, 2022, doi: 10.1111/sms.14105.
[28]
D. Memmert, K. A. P. M. Lemmink, and J. Sampaio, “Current Approaches to Tactical Performance Analyses in Soccer Using Position Data,” Sports Medicine, vol. 47, no. 1, pp. 1–10, Jan. 2017, doi: 10.1007/s40279-016-0562-5.
[29]
J. D. Vescovi, E. Fernandes, and A. Klas, “Physical Demands of Women’s Soccer Matches: A Perspective Across the Developmental Spectrum,” Frontiers in Sports and Active Living, vol. 3, Apr. 2021, doi: 10.3389/fspor.2021.634696.
[30]
I. Mujika, J. Santisteban, F. M. Impellizzeri, and C. Castagna, “Fitness determinants of success in men’s and women’s football,” Journal of Sports Sciences, vol. 27, no. 2, pp. 107–114, Jan. 2009, doi: 10.1080/02640410802428071.
[31]
I. Baptista, A. K. Winther, D. Johansen, M. B. Randers, S. Pedersen, and S. A. Pettersen, “The variability of physical match demands in elite women’s football,” Science and Medicine in Football, vol. 6, no. 5, pp. 559–565, Dec. 2022, doi: 10.1080/24733938.2022.2027999.
[32]
J. K. Mara, K. G. Thompson, K. L. Pumpa, and N. B. Ball, “Periodization and Physical Performance in Elite Female Soccer Players,” International Journal of Sports Physiology and Performance, vol. 10, no. 5, pp. 664–669, Jul. 2015, doi: 10.1123/ijspp.2014-0345.
[33]
R. S. Weinberg and D. Gould, Foundations of sport and exercise psychology, Sixth edition. Champaign, Ill.: Human Kinetics, 2015.
[34]
G. Abeza, N. O’Reilly, J. Nadeau, and Y. Abdourazakou, “Big data in professional sport: The perspective of practitioners in the NFL, MLB, NBA, and NHL,” Journal of Strategic Marketing, vol. 31, no. 8, pp. 1413–1433, Nov. 2023, doi: 10.1080/0965254X.2022.2108881.

Appendix A - R Code in Full

knitr::opts_chunk$set(echo = TRUE)
rm(list=ls()) # This code clears the environment before importing data or any analysis can take place. 
library(tidyverse) # This package allows a multitude of functions to run including pipe operators  
library(dplyr) # This package helps with data manipulation 
library(ggplot2) # This library allows graphs to be produced
library(knitr) # This package allows incorporating code into text files
library(kableExtra) # This library allows tables to be rendered in html files
library(readxl) # This library allows the reading of Excel files
library(tidytext) # This library provides text-mining helpers such as reorder_within() for ordering plot categories
library(flextable) # This library helps with formatting tables
library(citr) # This library is to help with adding citations 


QnA <- data.frame(
  Question_Number = c(1, 2, 3),
  Question_Asked = c(
    "Given the team’s current defensive solidity, what is the team's current formation with and without possession, and how does the current striker fit within these systems respectively?",
    "Does the team recruit any individuals who have any history (large or small) with lower limb injuries?",
    "Following on from where a striker fits within your current offensive and defensive set-up, what does your team deem important physiologically for a striker to be successful at your club?"
  ),
  Answer_Given = c(
    "We prefer to play 4-3-3 which may change to 4-1-4-1 out of possession. The current striker will mainly play as a poacher, with the midfield and wingers making the play and bringing the ball into the box.",
    "Yes, as long as they have recovered and have shown to be performing again.",
    "Tall, strong, good in the air."
  )
)

kable(QnA, caption = "Summary of Pre-Analysis Q&A With Melbourne Victory Staff")




data_dir <- "C:/Users/adamb/OneDrive - University of Strathclyde/Attachments/Melbourne Victory"
fbref_file <- file.path(data_dir, "FBRef data.xlsx") # Single workbook holding one sheet per league

MelbourneVictory <- read.csv(file.path(data_dir, "Assessment 2 data Melbourne Victory.csv"))
# This is the dataset provided by the management team

# Importing data from the Excel file, keeping each sheet separate in its own data frame
NWSLData24 <- read_excel(fbref_file, sheet = "NWSL 2024")

ALeague24_25 <- read_excel(fbref_file, sheet = "A-League 2024_25")

WSL24_25 <- read_excel(fbref_file, sheet = "WSL 2024_25")

Bundesliga24_25 <- read_excel(fbref_file, sheet = "Bundesliga 24_25")

LaLiga24_25 <- read_excel(fbref_file, sheet = "La Liga 24_25")

SerieA24_25 <- read_excel(fbref_file, sheet = "Serie A 24_25")

head(NWSLData24)# This command is done to view the first few rows of the dataset to inspect it, this is repeated for every dataset 

str(NWSLData24) # This command is to view each variable independently, and if needed make changes needed for analysis, again this process is repeated for every dataset

head(ALeague24_25)

str(ALeague24_25)

head(WSL24_25)

str(WSL24_25)

head(Bundesliga24_25)

str(Bundesliga24_25)

head(LaLiga24_25) 

str(LaLiga24_25)
  
head(SerieA24_25)

str(SerieA24_25)

head(MelbourneVictory)

str(MelbourneVictory)

# From here we can see that minutes has been categorized as a character, this must be rectified in order to run analysis using the data. 

MelbourneVictory$Min <- as.numeric(gsub(",", "", MelbourneVictory$Min))
# Next each dataset must be screened to identify any cases where data has been missing, and if the missing data impacts on analysis then remove it accordingly 
ALeague24_25[!complete.cases(ALeague24_25),]
Bundesliga24_25[!complete.cases(Bundesliga24_25),]
LaLiga24_25[!complete.cases(LaLiga24_25),]
NWSLData24[!complete.cases(NWSLData24),]
SerieA24_25[!complete.cases(SerieA24_25),]
WSL24_25[!complete.cases(WSL24_25),]

# This code is to double check that no other missing data exists before moving onto filtering, this is repeated for each dataset respectively

cat("Total missing data","\n", sum(is.na(ALeague24_25)), "\n")
print(colSums(is.na(ALeague24_25)))

cat("Total missing data","\n", sum(is.na(Bundesliga24_25)), "\n")
print(colSums(is.na(Bundesliga24_25)))

cat("Total missing data","\n", sum(is.na(LaLiga24_25)), "\n")
print(colSums(is.na(LaLiga24_25)))

cat("Total missing data","\n", sum(is.na(NWSLData24)), "\n")
print(colSums(is.na(NWSLData24)))

cat("Total missing data","\n", sum(is.na(SerieA24_25)), "\n")
print(colSums(is.na(SerieA24_25)))

cat("Total missing data","\n", sum(is.na(WSL24_25)), "\n")
print(colSums(is.na(WSL24_25)))
# From this we can determine the only missing data-points are centered around missing nationality and age, this information does not directly impact the analysis and therefore we can proceed to filter the data appropriately
  
variables <- c(
  "Player",
  "Nation",
  "Age",
  "Matches Played",
  "Starts",
  "Minutes Played",
  "Position",
  "Goals",
  "Assists",
  "xG"
)

variables_df <- tibble(
  Variable = variables
)

kable(
  variables_df,
  format = "markdown",
  caption = "List of Variables Used in the Analysis"
)

Strikers <- MelbourneVictory %>%
  select(Player,Pos,Age,Starts,Min,Gls,Gls.p90,Ast,Ast.p90,xG) %>%
  filter(Pos == "FW"| Pos =="MF"| Pos == "FW,MF")

NWSLStriker <- NWSLData24 %>%
  select(Player,Nation,Age,MP,Starts,Min,Pos,Gls...12, Ast...13,xG...20)%>% # These are the metrics chosen for analysis 
  filter(Pos == "FW"| Pos == "FW,MF") # This function includes both Strikers and players who play in a hybrid role, this is applied to each dataset respectively

ALeagueStriker <- ALeague24_25 %>%
  select(Player,Nation,Age,MP,Starts,Min,Pos,Gls...12, Ast...13,xG...20)%>%
  filter(Pos == "FW"| Pos == "FW,MF")
  
WSLStriker <- WSL24_25 %>%
  select(Player,Nation,Age,MP,Starts,Min,Pos,Gls...12, Ast...13,xG...20)%>%
  filter(Pos == "FW"| Pos == "FW,MF")

BundesligaStriker <- Bundesliga24_25 %>%
  select(Player,Nation,Age,MP,Starts,Min,Pos,Gls...12, Ast...13,xG...20)%>%
  filter(Pos == "FW"| Pos == "FW,MF")


LaLigaStriker <- LaLiga24_25 %>%
  select(Player,Nation,Age,MP,Starts,Min,Pos,Gls...12, Ast...13,xG...20)%>%
  filter(Pos == "FW"| Pos == "FW,MF")

SerieAStriker <- SerieA24_25 %>%
  select(Player,Nation,Age,MP,Starts,Min,Pos,Gls...12, Ast...13,xG...20)%>%
  filter(Pos == "FW"| Pos == "FW,MF")


# After filtering by position, the league datasets are merged into one final dataset to simplify the analysis

StrikerData1 <- merge.data.frame(x=BundesligaStriker, y=ALeagueStriker, all=TRUE)

head(StrikerData1) # This command is to check the data has been merged successfully, afterwards we repeat the process until we are left with one final dataset

StrikerData2 <- merge.data.frame(x=LaLigaStriker, y=NWSLStriker, all=TRUE)
StrikerData3 <- merge.data.frame(x=WSLStriker, y=SerieAStriker, all=TRUE)
StrikerData4 <- merge.data.frame(x=StrikerData1, y=StrikerData2, all=TRUE)

StrikerDataFinal <- merge.data.frame(x=StrikerData4, y=StrikerData3, all=TRUE) # This is the final dataset for analysis
head(StrikerDataFinal) # Checking each variable has transferred successfully 
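# The merged data no longer records which league each player came from. If that is needed later,
# one option (a sketch, not the approach used above) is to stack the league tables with a label column.
# stack_leagues() and the "League" column name are hypothetical; dplyr::bind_rows() with .id is an existing dplyr call.
# Assumption: every league table has the same columns with compatible types.
stack_leagues <- function(league_list) {
  dplyr::bind_rows(league_list, .id = "League")
}
# Example usage: stack_leagues(list(WSL = WSLStriker, NWSL = NWSLStriker))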

StrikerDataFinal <- StrikerDataFinal |>
  mutate(Assistsper90 = (Ast...13/(Min/90))) # Add per-90 variables to the dataset for analysis


StrikerDataFinal <- StrikerDataFinal |>
  mutate(Goalsper90 = (Gls...12/(Min/90)))

StrikerDataFinal <- StrikerDataFinal |>
  mutate(xGper90 = (xG...20/(Min/90)))
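# Players with zero recorded minutes make the divisions above return NaN or Inf.
# safe_per90() is a hypothetical helper sketching one way to guard against that:
# it returns NA whenever minutes are missing or not positive.
safe_per90 <- function(stat, minutes) {
  ifelse(!is.na(minutes) & minutes > 0, stat / minutes * 90, NA_real_)
}
# Example: safe_per90(c(5, 2), c(450, 0)) gives 1 for the first player and NA for the second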



Top5MelbourneStrikers <- Strikers %>%
  mutate(
    Gls.p90 = (Gls / Min) * 90,
    Ast.p90 = (Ast / Min) * 90,
    xG.p90  = (xG / Min)  * 90
  ) %>%
  mutate(
    Gls.p90 = formatC(Gls.p90, format="f", digits=2), # Display every value to two decimal places (formatC() returns character strings)
    Ast.p90 = formatC(Ast.p90, format="f", digits=2),
    xG.p90 = formatC(xG.p90, format="f", digits=2)
  )%>%
  arrange(desc(Gls)) %>%
  slice(1:5) %>%
  select(Player, Pos, Age, Min, Gls, Gls.p90, Ast, Ast.p90, xG,xG.p90)

Top5MelbourneStrikers %>%
  kable(caption = "Table 1 - Top 5 Strikers – Melbourne Victory")%>%
  kable_styling(full_width = FALSE)

StrikerDataFinal %>%
    arrange(desc(Gls...12)) %>%     # Rank by goals scored
    slice(1:25) %>%                 # Select top 25 players
    ggplot(aes(x = reorder(Player, Gls...12), 
               y = Gls...12)) +
    
    # Traffic Light colour KPI system
    geom_point(aes(color = case_when(
      Gls...12 >= 15 ~ "Above Club Standard",
      Gls...12 >= 12 & Gls...12 < 15 ~ "At Club Standard",
      TRUE ~ "Below Club Standard"
    )), 
    size = 3, alpha = 0.8) +
    
    coord_flip() +
    theme_dark() +
    
    labs(title = "Figure 1 - Top 25 Goalscorers Across 6 Global Leagues",
         x = "Player Name",
         y = "Goals Scored",
         color = "Goal Scoring Standard") +
    
    scale_color_manual(
      values = c(
        "Above Club Standard" = "green",
        "At Club Standard" = "yellow",
        "Below Club Standard" = "red"
      ),
      labels = c(
        "Above Club Standard (≥15 Goals)",
        "At Club Standard (12–14 Goals)",
        "Below Club Standard (<12 Goals)"
      )
    )
  
StrikerDataFinal %>%
    filter(xG...20 > 8) %>%
    arrange(desc(xG...20)) %>%      # Rank by xG so the slice keeps the highest values
    slice(1:25) %>%
    ggplot(aes(x = reorder(Player, xG...20), y = xG...20)) +
    
    #Traffic Light colour system to KPI category
    geom_point(aes(color = case_when(
      xG...20 >= 7.8 ~ "Above Club Standard",
      xG...20 >= 1.5 & xG...20 < 5 ~ "At Club Standard",
      TRUE ~ "Below Club Standard"
    )), 
    size = 2, alpha = 0.6) +
    coord_flip() +
    theme_dark() +
    
    labs(title = "Figure 2 - Highest xG Across 6 Global Leagues",
         x = "Player Name",
         y = "Expected Goals Scored",
         color = "xG Standard") +  # Sets the axis labels and the legend title
    scale_color_manual(
      values = c(
        "Above Club Standard" = "green", # Traffic Light System for ease of reader interpretation
        "At Club Standard" = "yellow",
        "Below Club Standard" = "red"
      ),
      labels = c(
        "Above Club Standard (≥7.8 Expected Goals)", # These levels are determined based on the current strikers within the squad; the labels mirror the case_when() cut-offs above
        "At Club Standard (1.5–5 Expected Goals)",
        "Below Club Standard (all other values)"
      )
    )    
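# The traffic-light bands in Figures 1 and 2 are written out inside each plot, so cut-offs and legend text can drift apart.
# A small helper could keep the band logic in one place. This is a sketch only:
# classify_standard() and its arguments `at` and `above` are hypothetical names, not part of ggplot2 or dplyr.
classify_standard <- function(x, at, above) {
  ifelse(x >= above, "Above Club Standard",
         ifelse(x >= at, "At Club Standard", "Below Club Standard"))
}
# Example with the Figure 1 cut-offs: classify_standard(c(16, 12, 3), at = 12, above = 15)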

ComparisonDF<- StrikerDataFinal %>%
    rename(
      Goals = Gls...12,
      Assists = Ast...13,
      xG = xG...20
    )
  
  # Pick the top 10 overall players by total attacking output
  ComparisonDF$TotalOutput <- ComparisonDF$Goals + ComparisonDF$Assists + ComparisonDF$xG
  Top10StrikersDF <- ComparisonDF %>% arrange(desc(TotalOutput)) %>% slice(1:10)
  
  # Convert to long format
  df_long_totals <- Top10StrikersDF %>%
    pivot_longer(cols = c(Goals, Assists, xG),
                 names_to = "Metric", values_to = "Value") %>%
    group_by(Metric) %>%
    mutate(
      Rank = rank(-Value, ties.method = "first"),
      Highlight = ifelse(Rank <= 5, "Top 5", "Other")
    ) %>%
    ungroup()
  
  ggplot(df_long_totals,
         aes(x = reorder_within(Player, Value, Metric),
             y = Value, fill = Highlight)) +
    geom_col() +
    facet_grid(Metric ~ ., scales = "free_y", switch = "y") +
    coord_flip() +
    scale_x_reordered() +
    scale_fill_manual(values = c("Top 5" = "red", "Other" = "grey80")) +
    labs(title = "Figure 3 - Top 10 Players (Goals, Assists, xG)",
         x = "Player Name", y = "Value") +
    theme_minimal(base_size = 14)
  
 #  Create per-90 statistics
  ComparisonDF <- ComparisonDF %>%
    mutate(
      Goals.p90   = (Goals   / Min) * 90,
      Assists.p90 = (Assists / Min) * 90,
      xG.p90      = (xG      / Min) * 90
    )
  
  #  Pick top 10 players by total attacking output per 90mins
  ComparisonDF$TotalOutput.p90 <- ComparisonDF$Goals.p90 +
    ComparisonDF$Assists.p90 +
    ComparisonDF$xG.p90
  
  Top10StrikersDF <- ComparisonDF %>%
    arrange(desc(TotalOutput.p90)) %>%
    slice(1:10)
  
  #  Convert to long format
  df_long_p90 <- Top10StrikersDF %>%
    pivot_longer(
      cols = c(Goals.p90, Assists.p90, xG.p90),
      names_to = "Metric",
      values_to = "Value"
    ) %>%
    group_by(Metric) %>%
    mutate(
      Rank = rank(-Value, ties.method = "first"),
      Highlight = ifelse(Rank <= 5, "Top 5", "Other")
    ) %>%
    ungroup()
  
  # Plot the results
  ggplot(df_long_p90,
       aes(x = reorder_within(Player, Value, Metric),
           y = Value, fill = Highlight)) +
  geom_col() +
  facet_grid(Metric ~ ., scales = "free_y", switch = "y") +
  coord_flip() +
  scale_x_reordered() +
  scale_fill_manual(values = c("Top 5" = "blue", "Other" = "grey80")) +
  labs(title = "Figure 4 - Top 10 Players (G,A,xG) per 90",
       x = "Player Name", y = "Rate per 90") +
  theme_minimal(base_size = 14)
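# Per-90 rates can be inflated by players with very few minutes on the pitch.
# One common safeguard, shown here as a sketch only, is a minimum-minutes filter applied before ranking.
# keep_min_minutes() is a hypothetical helper, and the 450-minute default (five full matches) is an assumed value, not a club threshold.
keep_min_minutes <- function(df, min_minutes = 450) {
  df[!is.na(df$Min) & df$Min >= min_minutes, , drop = FALSE]
}
# Example usage: keep_min_minutes(ComparisonDF) before arranging by TotalOutput.p90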
  
Combined_Top5 <- StrikerDataFinal %>%
  # Rename the raw columns, then calculate per-90 stats
  rename(
    Goals   = Gls...12,
    Assists = Ast...13,
    xG      = xG...20
  ) %>%
  mutate(
    Gls.p90 = (Goals / Min) * 90,
    Ast.p90 = (Assists / Min) * 90,
    xG.p90  = (xG / Min) * 90
  ) %>%
  # Order by goals (or any metric you prefer)
  arrange(desc(Goals)) %>%
  # Keep the top 5 players across all leagues
  slice(1:5) %>%
  # Format values for presentation
  mutate(
    Gls.p90 = formatC(Gls.p90, format = "f", digits = 2),
    Ast.p90 = formatC(Ast.p90, format = "f", digits = 2),
    xG.p90      = formatC(xG.p90, format = "f", digits = 2)
  ) %>%
  # Select table columns
  select(Player, Pos, Age, Min, Goals, Gls.p90, Assists, Ast.p90, xG,xG.p90)

Combined_Top5 %>%
  kable(caption = "Table 2 - Top 5 Strikers Across All Six Leagues") %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped"))

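  # Create composite scores, with the highest weight placed on goals.
  # All three components are per-90 rates so the weights act on a common scale
  # (mixing Goalsper90 with season totals for xG and assists would let the totals dominate).
  df <- StrikerDataFinal %>%
    rename(
      Goals   = Gls...12,
      Assists = Ast...13,
      xG      = xG...20
    ) %>%
    mutate(
      Composite = 0.5 * Goalsper90 +
        0.3 * xGper90 +
        0.2 * Assistsper90
    )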
  
  # Select the top 5 players
  top5 <- df %>%
    arrange(desc(Composite)) %>%
    slice(1:5)
  
  # Plot the results
  ggplot(top5, aes(x = reorder(Player, Composite), y = Composite)) +
    geom_col(fill = "darkgreen") +
    coord_flip() +
    labs(
      title = "Figure 5 - Top 5 Players by Composite Score",
      x = "Player Name",
      y = "Composite Score"
    ) +
    theme_minimal(base_size = 14)