Preparing Visualisations
# Set working directory as the path to the data set
setwd("C:/Users/raghu/OneDrive/Documents/Statistics_with_R/Week 2 Data Dive")
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readr)
library(dplyr)
library(lubridate)
library(corrplot)
## corrplot 0.95 loaded
library(glue)
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
# Read the data set
player_data <- read_csv("fifa_players.csv")
## Rows: 17954 Columns: 51
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (9): name, full_name, birth_date, positions, nationality, preferred_foo...
## dbl (42): age, height_cm, weight_kgs, overall_rating, potential, value_euro,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
spec(player_data)
## cols(
## name = col_character(),
## full_name = col_character(),
## birth_date = col_character(),
## age = col_double(),
## height_cm = col_double(),
## weight_kgs = col_double(),
## positions = col_character(),
## nationality = col_character(),
## overall_rating = col_double(),
## potential = col_double(),
## value_euro = col_double(),
## wage_euro = col_double(),
## preferred_foot = col_character(),
## `international_reputation(1-5)` = col_double(),
## `weak_foot(1-5)` = col_double(),
## `skill_moves(1-5)` = col_double(),
## body_type = col_character(),
## release_clause_euro = col_double(),
## national_team = col_character(),
## national_rating = col_double(),
## national_team_position = col_character(),
## national_jersey_number = col_double(),
## crossing = col_double(),
## finishing = col_double(),
## heading_accuracy = col_double(),
## short_passing = col_double(),
## volleys = col_double(),
## dribbling = col_double(),
## curve = col_double(),
## freekick_accuracy = col_double(),
## long_passing = col_double(),
## ball_control = col_double(),
## acceleration = col_double(),
## sprint_speed = col_double(),
## agility = col_double(),
## reactions = col_double(),
## balance = col_double(),
## shot_power = col_double(),
## jumping = col_double(),
## stamina = col_double(),
## strength = col_double(),
## long_shots = col_double(),
## aggression = col_double(),
## interceptions = col_double(),
## positioning = col_double(),
## vision = col_double(),
## penalties = col_double(),
## composure = col_double(),
## marking = col_double(),
## standing_tackle = col_double(),
## sliding_tackle = col_double()
## )
Hypothesis 1:
Higher overall rating and potential lead to a higher market
value.
Visualisation:
# Scatter plot for Overall Rating vs Market Value
ggplot(player_data, aes(x = overall_rating, y = value_euro)) +
  geom_point(alpha = 0.5, color = "blue") +
  geom_smooth(method = "lm", color = "red", se = FALSE) + # Adds linear trend line
  labs(title = "Overall Rating vs Market Value", x = "Overall Rating", y = "Market Value (Euro)") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 255 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 255 rows containing missing values or values outside the scale range
## (`geom_point()`).
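The two warnings above most likely come from rows where `value_euro` (or `overall_rating`) is missing. Below is a minimal sketch of handling this explicitly, assuming `player_data` has been loaded as above. `drop_missing_xy` is a hypothetical helper introduced here for illustration and is not part of any package.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: keep only rows where both plotted columns are present
drop_missing_xy <- function(df, x, y) {
  filter(df, !is.na(.data[[x]]), !is.na(.data[[y]]))
}

# Guarded so the block also runs when player_data has not been loaded
if (exists("player_data")) {
  # Count the rows that ggplot2 would otherwise drop silently
  message(sum(is.na(player_data$value_euro)), " rows have no value_euro")

  plot_data <- drop_missing_xy(player_data, "overall_rating", "value_euro")

  # Same plot as above, drawn from the filtered data
  print(
    ggplot(plot_data, aes(x = overall_rating, y = value_euro)) +
      geom_point(alpha = 0.5, color = "blue") +
      geom_smooth(method = "lm", formula = y ~ x, color = "red", se = FALSE) +
      labs(title = "Overall Rating vs Market Value",
           x = "Overall Rating", y = "Market Value (Euro)") +
      theme_minimal()
  )
}
```

Filtering first keeps the number of excluded players visible in the report instead of leaving it buried in a warning.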
Interpretation
- The scatter plot shows an upward trend, so players with higher
overall ratings tend to have higher market values.
- The slope of the linear fit is not very steep, which suggests that
factors other than overall rating also influence market value.
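Hypothesis 1 also names potential, which the scatter plot does not show. One way to put numbers on both relationships is sketched below, assuming `player_data` is loaded as above. `cor_with_target` is a hypothetical helper written for this illustration. It reports Pearson (linear) and Spearman (rank-based) correlations; a Spearman value clearly above the Pearson value would hint at a monotone but non-linear relationship that a straight trend line understates.

```r
# Hypothetical helper: correlation of each predictor with a target column,
# using only the rows where both values are present
cor_with_target <- function(df, target, predictors, method = "pearson") {
  vapply(
    predictors,
    function(p) cor(df[[p]], df[[target]], use = "complete.obs", method = method),
    numeric(1)
  )
}

# Guarded so the block also runs when player_data has not been loaded
if (exists("player_data")) {
  predictors <- c("overall_rating", "potential")
  print(cor_with_target(player_data, "value_euro", predictors, method = "pearson"))
  print(cor_with_target(player_data, "value_euro", predictors, method = "spearman"))
}
```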
Hypothesis 2:
Physical traits such as height, stamina, and strength positively
influence market value.
Visualisation:
# Select numeric columns related to physical traits and market value
numeric_cols <- player_data |> select(value_euro, height_cm, stamina, strength)
# Compute correlation matrix
cor_matrix <- cor(numeric_cols, use = "complete.obs")
# Plot heatmap using corrplot with correlation values in the boxes
corrplot(cor_matrix, method = "color", type = "upper",
         tl.col = "black", tl.srt = 45, addCoef.col = "black") # Adds correlation values in boxes

Interpretation
- The heat map shows the correlation coefficients between market
value and the physical attributes height, stamina, and strength.
- Only stamina shows a weak correlation with market value
(value_euro); height and strength show no notable correlation with it,
so the data give little support to Hypothesis 2.
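To back the reading of the heat map with numbers, the sketch below tabulates each trait's correlation with market value together with a 95% confidence interval and p-value from `cor.test`. It assumes `player_data` is loaded as above. `trait_cor_table` is a hypothetical helper introduced for illustration only.

```r
# Hypothetical helper: one row per trait with its Pearson correlation to the
# target, the 95% confidence interval, and the p-value.
# cor.test() drops incomplete pairs per trait, so the values can differ slightly
# from a matrix computed with use = "complete.obs" across all columns at once.
trait_cor_table <- function(df, target, traits) {
  rows <- lapply(traits, function(tr) {
    ct <- cor.test(df[[tr]], df[[target]])
    data.frame(
      trait   = tr,
      r       = unname(ct$estimate),
      ci_low  = ct$conf.int[1],
      ci_high = ct$conf.int[2],
      p_value = ct$p.value
    )
  })
  do.call(rbind, rows)
}

# Guarded so the block also runs when player_data has not been loaded
if (exists("player_data")) {
  print(trait_cor_table(player_data, "value_euro",
                        c("height_cm", "stamina", "strength")))
}
```

With roughly 18,000 rows even a very small correlation can reach a low p-value, so the size of `r` is the more useful column when judging the hypothesis.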