R Assignment 6: World Football API

Introduction

The purpose of this guide is to use the football-data.org API as a tool to acquire information on world football.

Examples:

  1. See all upcoming matches for Real Madrid
  2. View the group stage results for last year’s UEFA Champions League
  3. Check match schedules for English Premier League matchweek 11
  4. Get all matches where Giorgi Mamardashvili was in the squad

For this guide, I will be doing a spin on example #1 by pulling up statistics and general information on Real Madrid’s previous matches from all competitions.

Downloading the Data

We will load the tidyverse so we can read our CSV into a data frame.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
football_df <- 
  read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/pelles1_xavier_edu/EUHhbxTpTUFFnJGo5JGHw40BvVZ3YGZfLTwsHPQe8nUIvQ?download=1")
Rows: 14 Columns: 38
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (20): status, stage, area.name, area.code, area.flag, competition.name,...
dbl  (12): id, matchday, area.id, competition.id, season.id, season.currentM...
lgl   (2): group, season.winner
dttm  (2): utcDate, lastUpdated
date  (2): season.startDate, season.endDate

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Analyzing the Data

Home vs Away goals scored by Real Madrid in all competitions.

football_df %>%
  filter(homeTeam.name == "Real Madrid CF") %>% 
  ggplot(aes(x = score.fullTime.home)) +
  geom_bar() +
  labs(x = "# of Goals Scored",
       y = "# of Games")

football_df %>%
  filter(awayTeam.name == "Real Madrid CF") %>% 
  ggplot(aes(x = score.fullTime.away)) +
  geom_bar() +
  labs(x = "# of Goals Scored",
       y = "# of Games")

While Real Madrid averaged roughly 2 goals a game, a few of their away performances were noticeably higher-scoring.
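To put a number on the home vs away comparison instead of reading it off the bar charts, the goals can be summarised by venue. This is a minimal sketch on a small made-up table (toy_df and its scores are invented for illustration, not the real results); it only assumes the same column names as football_df and that every row involves Real Madrid, which should hold for data pulled from the team endpoint.

```r
library(dplyr)

# Invented scores for illustration only; these are NOT the real results.
toy_df <- tibble(
  homeTeam.name       = c("Real Madrid CF", "Real Madrid CF", "Other FC"),
  awayTeam.name       = c("Other FC", "Another FC", "Real Madrid CF"),
  score.fullTime.home = c(2, 3, 1),
  score.fullTime.away = c(0, 1, 4)
)

goal_summary <- toy_df %>%
  # Label each match by where Real Madrid played, then pick their goals
  mutate(venue = if_else(homeTeam.name == "Real Madrid CF", "Home", "Away"),
         madrid_goals = if_else(venue == "Home",
                                score.fullTime.home,
                                score.fullTime.away)) %>%
  group_by(venue) %>%
  summarise(games = n(),
            mean_goals = mean(madrid_goals))

goal_summary
```

Swapping toy_df for football_df should give the real averages, as long as the column names match.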

How I Extracted This Data

Below is a step-by-step guide you can use as a model to gather your own data!

Loading Packages

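library(tidyverse) # dplyr, ggplot2, and the %>% pipe
library(rvest)     # HTML scraping (not used in the steps below)
library(xml2)      # XML parsing (not used in the steps below)
library(httr)      # GET(), add_headers(), content()
library(magrittr)  # use_series()
library(jsonlite)  # fromJSON()
library(readr)     # write_csv(); already attached by tidyverse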

Gaining Permission

Click the link below, create a free account, then check your email for the API token.

https://www.football-data.org/client/register

Establishing the Header

For this API, the token is not part of the URL. Instead, it is sent as a request header, which stays the same regardless of what specific information is requested.

football_header <- add_headers(
  "X-Auth-Token" = "<YOUR API TOKEN>")
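If you plan to share your script, one option is to keep the token out of the code and read it from an environment variable instead. This is only a sketch: the variable name FOOTBALL_DATA_TOKEN is my own choice (an assumption, not something the API requires), and the Sys.setenv() line is there just so the example runs on its own.

```r
library(httr)

# Hypothetical variable name. Normally you would set this once outside the
# script (for example in your .Renviron file) rather than in the code itself.
Sys.setenv(FOOTBALL_DATA_TOKEN = "<YOUR API TOKEN>")  # placeholder for the demo

api_token <- Sys.getenv("FOOTBALL_DATA_TOKEN")

# Fail early with a readable message if the variable is empty
if (!nzchar(api_token)) {
  stop("FOOTBALL_DATA_TOKEN is not set")
}

# Same header as before, but the token no longer appears in the script
football_header <- add_headers("X-Auth-Token" = api_token)
```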

Choosing the Endpoint

The endpoint is the part of the URL that contains the category of info you want to retrieve. The website lists numerous endpoints, but I will be looking at match data for a particular team.

For this guide, I will be getting data for a spin on example #1: all of Real Madrid’s matches that have already finished.

football_endpoint <- "https://api.football-data.org/v4/teams/86/matches?status=FINISHED"
  • “/teams” selects a team, with “/86” being the id for Real Madrid

  • “/matches” pulls matches involving Real Madrid

  • “?status=FINISHED” brings in matches that have already been completed
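The three pieces above can also be assembled by a small function, which makes it easier to swap in another team later. This is a sketch with a helper name I made up (team_matches_url is not part of any package); the only team id confirmed in this guide is 86, and other status values should be checked against the API documentation before use.

```r
# Hypothetical helper: build the matches endpoint for one team.
# team_id : numeric id of the team (86 is Real Madrid in this guide)
# status  : value for the "status" query filter
team_matches_url <- function(team_id, status = "FINISHED") {
  sprintf("https://api.football-data.org/v4/teams/%d/matches?status=%s",
          as.integer(team_id), status)
}

# Reproduces the endpoint used in this guide
football_endpoint <- team_matches_url(86)
football_endpoint
```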

Creating the URL

Here we make the actual request by passing the endpoint (already extended with the filter for exactly what we want) together with the header to GET().

football_response <- GET(football_endpoint, football_header)
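Before parsing the response, it can help to check whether the request actually succeeded. The sketch below uses a small helper I wrote for illustration (describe_status is hypothetical); the meanings given are the general HTTP ones, so treat them as a first guess rather than the API's official error list. The lines that touch football_response are commented out because they need a live request.

```r
# Hypothetical helper: turn an HTTP status code into a short diagnosis.
describe_status <- function(code) {
  if (code == 200) {
    "OK"
  } else if (code %in% c(401, 403)) {
    "token missing, invalid, or not permitted for this resource"
  } else if (code == 429) {
    "too many requests; wait before retrying"
  } else {
    paste("unexpected status", code)
  }
}

describe_status(200)
describe_status(429)

# With the real response from GET() (requires httr and a valid token):
# describe_status(httr::status_code(football_response))
# httr::stop_for_status(football_response)  # raises an R error on 4xx/5xx
```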

Making a Data Frame

football_api_data <- 
  football_response %>% 
  content(as = "text",
          encoding = "UTF-8") %>%
  fromJSON(flatten = TRUE) %>% 
  use_series(matches)
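To see what flatten = TRUE is doing in the pipeline above, here is a small sketch on made-up JSON. The structure only loosely imitates an API reply and the values are invented; the point is that nested objects become columns with dotted names such as score.fullTime.home. Note that $matches does the same job as use_series(matches).

```r
library(jsonlite)

# Made-up JSON for illustration; not a real API response.
toy_json <- '{
  "matches": [
    {"id": 1, "homeTeam": {"name": "Team A"},
     "score": {"fullTime": {"home": 2, "away": 1}}},
    {"id": 2, "homeTeam": {"name": "Team B"},
     "score": {"fullTime": {"home": 0, "away": 3}}}
  ]
}'

# Parse the text, flatten nested objects, then pull out the "matches" element
toy_matches <- fromJSON(toy_json, flatten = TRUE)$matches

# Nested fields now appear as ordinary columns with dotted names
names(toy_matches)
```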

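You can view the data frame right away. Before saving it, we drop the referees column, which holds nested data (a list for each match) that does not fit in a flat CSV file.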

football_api_data <- 
  football_api_data %>% select(-referees)
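If you pull a different endpoint, you may not know in advance which columns are nested. A quick way to find them is to test each column with is.list(). This sketch uses a tiny invented data frame (toy_api_data is hypothetical) rather than the real API data.

```r
# Invented data: "referees" is a list column holding one data frame per match.
toy_api_data <- data.frame(id = 1:2)
toy_api_data$referees <- list(
  data.frame(name = "Referee A"),
  data.frame(name = "Referee B")
)

# TRUE for every column that is a list (and so cannot go into a flat CSV)
is_list_col <- vapply(toy_api_data, is.list, logical(1))
names(toy_api_data)[is_list_col]

# Keep only the flat columns
toy_flat <- toy_api_data[, !is_list_col, drop = FALSE]
names(toy_flat)
```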

CSV Download

We will then save the data permanently by writing it to a CSV file.

write_csv(football_api_data, "football_api_data_cleaned.csv")
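To confirm the file was written correctly, you can read it straight back in and compare it with what you saved. The sketch below does this with an invented two-row table and a temporary file (toy_out and tmp_path are my own names), so it will not touch your real CSV.

```r
library(readr)

# Invented data standing in for the cleaned API data
toy_out <- data.frame(
  id = c(1, 2),
  homeTeam.name = c("Team A", "Team B")
)

# Write to a temporary file, then read it back
tmp_path <- tempfile(fileext = ".csv")
write_csv(toy_out, tmp_path)
toy_back <- read_csv(tmp_path, show_col_types = FALSE)

# Same shape and same column names as what we wrote
dim(toy_back)
names(toy_back)
```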