Gathering Data from Websites Using the R Programming Language

Scraping Spanish Premier Soccer League Data

John Karuitha

Tuesday, July 13, 2021

Background

Have you ever come across some valuable data on a website and wished you could use it in your research? [1] Did you get discouraged from using data from the internet because writing down the data by hand and transferring it to a spreadsheet seemed daunting? [2] Have you ever spent a substantial amount of time trying to copy and paste data from a website, with plenty of frustration and headache? If you have found yourself in any of these or similar situations, this write-up is for you. I will describe the basics of harvesting data from websites, commonly known as web scraping, using the R programming language (R Core Team 2021). I start with tabular data, which most financial analysts, economists, and other business professionals and academics use regularly. In future articles, I will delve deeper into scraping websites for text and other data types. The basics in this section should meet the needs of most users, but watch out for more advanced applications of web scraping on my blog and RPubs.

[1] The author is at Karatina University, School of Business, P.O. Box 1957-10101, Karatina, Kenya.

[2] The R code for this project is available in my GitHub account at https://github.com/Karuitha/scrapping_la_liga/blob/master/code/scraps.R. Copy and paste the address into a new browser tab.

R Core Team. 2021. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

The Target Website and Data

I have conveniently chosen the results of La Liga, the premier soccer league in Spain. The data comes from two sources. The first source is Wikipedia. From this site, we shall scrape the results of La Liga from 1929 to 2021. The page lists the top three teams in each season together with the names of the top scorers. [3] The second source is Sky Sports, which provides the more recent La Liga standings, from 2009 to 2021. I use this site to illustrate how to scrape data that spans multiple web pages. [4]

[3] You can view this page at https://en.wikipedia.org/wiki/List_of_Spanish_football_champions

[4] Again, you can view this data at https://www.skysports.com/la-liga-table/

Objectives and Caveats

In this article, I aim to:

  1. Demonstrate the basics of scraping data tables from a website using R and the rvest package from the tidyverse.
  2. Clean the data tables to generate an actionable set of data.

In my analysis, I assume basic knowledge of the R programming language and regular expressions (regex). Moreover, this article aims mainly at demonstrating web scraping and not the analysis of the data. However, I do a bit of data cleaning because it is a natural extension of web scraping. Data acquired through web scraping is rarely clean and hence requires a substantial amount of pre-processing to be useful for analysis. Finally, this project targets a general audience that may not appreciate the academic convention of inserting citations at the end of a sentence. However, I do include the references as side notes to the main text.

Web Scraping Intuition

It is important to grasp the fundamentals of XML, HTML, and CSS to appreciate the basics of web scraping in any programming language. Starting with William W. Tunnicliffe’s presentation in 1967, markup languages have evolved into various forms to suit different applications. Examples include TeX, HTML, XML, and XHTML. I highlight XML and HTML and how we can take advantage of each to identify relevant content (Ramasubramanian and Singh 2017).

Ramasubramanian, Karthik, and Abhishek Singh. 2017. Machine Learning Using R. 1. Springer.

XML Tags

Markup languages have two basic constructs: tags and content.

A tag usually begins with < and ends with >. Tags come in three flavours: a start tag such as <employee_name>, an end tag such as </employee_name>, and an empty-element tag such as <employee_name/>.

The wording between the < and > is the identifier of the content. We place the content between the start and end tags. For instance, we could capture an employee’s name using the following markup code.

<employee_name> James Peter Onyango Kamau </employee_name>

Hence, we can use the marker employee_name to identify the content we want to capture. We now extend the same idea to HTML.
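As a rough illustration of how a tag name locates content, the sketch below parses the employee example with the xml2 package. The choice of xml2 and the variable names are my assumptions for illustration; the text above does not prescribe a parser.

```r
library(xml2)

# Parse the one-element XML document from the example above
doc <- read_xml("<employee_name> James Peter Onyango Kamau </employee_name>")

# The tag name is the identifier; the text between the tags is the content
tag_name <- xml_name(doc)
content  <- xml_text(doc, trim = TRUE)

print(tag_name)
print(content)
```

The same idea, selecting by tag name and then reading the enclosed text, carries over to HTML pages.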

HTML

The Hypertext Markup Language (HTML) is the language that programmers use to create web pages. Usually, HTML is paired with CSS to create elegant web pages. To scrape web pages for data, we have to scan for the following five elements (Bradley and James 2019).

Bradley, Alex, and Richard JE James. 2019. “Web Scraping Using R.” Advances in Methods and Practices in Psychological Science 2 (3): 264–70.

Commonly, there are six header levels, h1 to h6, as follows (Michaud 2013).

<h1> Header 1 </h1>

<h2> Header 2 </h2>

<h3> Header 3 </h3>

<h4> Header 4 </h4>

<h5> Header 5 </h5>

<h6> Header 6 </h6>

Paragraphs sit inside <p> tags.

<p> Paragraph 1 </p>

<p> Paragraph 2 </p>

Tables combine several tags.

<table> tag that captures the main table structure.

<tbody> tag that specifies the body of the table.

<thead> tag that specifies the table header.

<tr> tag that specifies each of the rows of the table.

Links use the <a> tag, with the target address in the href attribute.

<a href="https://rpubs.com/Karuitha/karuitha_cars_pressure"> Click to see my project </a>

Michaud, Thomas. 2013. Foundations of Web Design: Introduction to HTML & CSS. New Riders.

Web Scraping in R
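As a small illustration of how these tags are located from R, the sketch below builds a tiny hand-written page and pulls out its header, paragraphs, and link with rvest. The page content is made up for the example, apart from the link address, which is the one shown above.

```r
library(rvest)

# A tiny hand-written page with a header, two paragraphs and a link
page <- read_html(paste0(
  "<html><body>",
  "<h1> Header 1 </h1>",
  "<p> Paragraph 1 </p>",
  "<p> Paragraph 2 </p>",
  "<a href=\"https://rpubs.com/Karuitha/karuitha_cars_pressure\"> Click to see my project </a>",
  "</body></html>"
))

# Select elements by tag name, then pull out their text
header_text    <- page %>% html_nodes("h1") %>% html_text(trim = TRUE)
paragraph_text <- page %>% html_nodes("p") %>% html_text(trim = TRUE)

# For links, the target address lives in the href attribute
link_target <- page %>% html_nodes("a") %>% html_attr("href")

print(header_text)
print(paragraph_text)
print(link_target)
```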

Part A: Scraping Data from a Webpage

In this section, I highlight how one can get data from a webpage. As noted earlier, I use the Wikipedia site. The data of interest is tabular. The steps in scraping this data using the rvest (Wickham 2021) package in R are as follows (Wickham et al. 2019).

  1. Read the HTML of the page.
  2. Capture the table nodes and convert them into data tables.
  3. Pick the table of interest, here the third table on the page.
  4. Clean the column names and the team names.

The code below reproduces these steps.

Because the table is long, I only present the first ten rows of the data. Please scroll right or left to view the entire set of variables in the table (Mailund 2017).

Wickham, Hadley. 2021. rvest: Easily Harvest (Scrape) Web Pages. https://CRAN.R-project.org/package=rvest.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Mailund, Thomas. 2017. Beginning Data Science in R. Springer.

## Load the packages used throughout this article
library(tidyverse)
library(rvest)

url2 <- "https://en.wikipedia.org/wiki/List_of_Spanish_football_champions"

read_html(url2) %>% 
        
        ## Capture the nodes for tables
        html_nodes("table") %>% 
        
        ## Capture the tables
        html_table() %>% 
        
        ## Capture the third table in the series
        .[[3]] %>% 
        
        ## Remove square brackets in names
        set_names(names(.) %>% str_remove_all("\\[|\\]|\\d")) %>% 
        
        ## Clean names by removing spaces
        janitor::clean_names() %>%
        
        ## Remove redundant columns
        select(-starts_with("x")) %>% 
        
        ## Clean the team names
        mutate(winners = str_remove_all(winners, "\\(\\d+\\)|\\*")) %>% 
        
        ## Pick the top 10
        head(10) %>% 
        
        ## make a nice table
        knitr::kable()

| season | winners | runners_up | third_place | top_scorer_s | top_scorers_club_s | goals |
|:--|:--|:--|:--|:--|:--|:--|
| 1929 | Barcelona | Real Madrid (1) | Athletic Bilbao | Paco Bienzobas | Real Sociedad | 14 |
| 1929–30 | Athletic Bilbao | Barcelona (1) | Arenas | Guillermo Gorostiza | Athletic Bilbao | 19 |
| 1930–31 | Athletic Bilbao | Racing Santander (1) | Real Sociedad | Bata | Athletic Bilbao | 27 |
| 1931–32 | Real Madrid | Athletic Bilbao (1) | Barcelona | Guillermo Gorostiza | Athletic Bilbao | 12 |
| 1932–33 | Real Madrid | Athletic Bilbao (2) | Espanyol | Manuel Olivares | Real Madrid | 16 |
| 1933–34 | Athletic Bilbao | Real Madrid (2) | Racing Santander | Isidro Lángara | Oviedo | 27 |
| 1934–35 | Real Betis | Real Madrid (3) | Oviedo | Isidro Lángara | Oviedo | 26 |
| 1935–36 | Athletic Bilbao | Real Madrid (4) | Oviedo | Isidro Lángara | Oviedo | 27 |
| 1936–37 | Spanish Civil War (League Cancelled) | Spanish Civil War (League Cancelled) | Spanish Civil War (League Cancelled) | Spanish Civil War (League Cancelled) | Spanish Civil War (League Cancelled) | Spanish Civil War (League Cancelled) |
| 1937–38 | Spanish Civil War (League Cancelled) | Spanish Civil War (League Cancelled) | Spanish Civil War (League Cancelled) | Spanish Civil War (League Cancelled) | Spanish Civil War (League Cancelled) | Spanish Civil War (League Cancelled) |

I follow the same steps on the Sky Sports website to generate the La Liga standings for the 2020-2021 season.

#########################################
## PART 2: SCRAPING ONE WEB PAGE

## Initial trial with one url for 2020/2021 season

## Define the url

url <- "https://www.skysports.com/la-liga-table/2020"

## Read the html
read_html(url) %>% 
        
        ## Capture the nodes for tables
        html_nodes("table") %>% 
        
        ## Capture the tables
        html_table() %>% 
        
        ## Capture the first table in the series
        .[[1]] %>%
        
        ## rename the position column
        rename(pos = `#`) %>% 
        
        ## Clean the column names 
        janitor::clean_names() %>% 
        
        ## Remove the redundant last_6 column
        select(-last_6) %>% 
        
        ## make a nice table
        knitr::kable(caption = "La Liga Standings 2020-2021 Season")

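La Liga Standings 2020-2021 Season

| pos | team | pl | w | d | l | f | a | gd | pts |
|--:|:--|--:|--:|--:|--:|--:|--:|--:|--:|
| 1 | Atletico Madrid | 38 | 26 | 8 | 4 | 67 | 25 | 42 | 86 |
| 2 | Real Madrid | 38 | 25 | 9 | 4 | 67 | 28 | 39 | 84 |
| 3 | Barcelona | 38 | 24 | 7 | 7 | 85 | 38 | 47 | 79 |
| 4 | Sevilla | 38 | 24 | 5 | 9 | 53 | 33 | 20 | 77 |
| 5 | Real Sociedad | 38 | 17 | 11 | 10 | 59 | 38 | 21 | 62 |
| 6 | Real Betis | 38 | 17 | 10 | 11 | 50 | 50 | 0 | 61 |
| 7 | Villarreal | 38 | 15 | 13 | 10 | 60 | 44 | 16 | 58 |
| 8 | Celta Vigo | 38 | 14 | 11 | 13 | 55 | 57 | -2 | 53 |
| 9 | Granada | 38 | 13 | 7 | 18 | 47 | 65 | -18 | 46 |
| 10 | Athletic Bilbao | 38 | 11 | 13 | 14 | 46 | 42 | 4 | 46 |
| 11 | Osasuna | 38 | 11 | 11 | 16 | 37 | 48 | -11 | 44 |
| 12 | Cadiz | 38 | 11 | 11 | 16 | 36 | 58 | -22 | 44 |
| 13 | Valencia | 38 | 10 | 13 | 15 | 50 | 53 | -3 | 43 |
| 14 | Levante | 38 | 9 | 14 | 15 | 46 | 57 | -11 | 41 |
| 15 | Getafe | 38 | 9 | 11 | 18 | 28 | 43 | -15 | 38 |
| 16 | Alaves | 38 | 9 | 11 | 18 | 36 | 57 | -21 | 38 |
| 17 | Elche | 38 | 8 | 12 | 18 | 34 | 55 | -21 | 36 |
| 18 | SD Huesca | 38 | 7 | 13 | 18 | 34 | 53 | -19 | 34 |
| 19 | Real Valladolid | 38 | 5 | 16 | 17 | 34 | 57 | -23 | 31 |
| 20 | Eibar | 38 | 6 | 12 | 20 | 29 | 52 | -23 | 30 |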
#########################################

So far, so good.
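The single-page workflow can also be tried offline. The sketch below is a minimal illustration, assuming a small hand-written table whose header mimics a Wikipedia footnote marker. The table contents are made up for the example, and base R `gsub()` stands in for the stringr and janitor helpers used in the article.

```r
library(rvest)

# Hypothetical page holding one small table; "Winners[5]" mimics a footnote marker
page <- read_html(paste0(
  "<html><body><table>",
  "<tr><th>Season</th><th>Winners[5]</th><th>Goals</th></tr>",
  "<tr><td>1929</td><td>Barcelona (1)</td><td>14</td></tr>",
  "<tr><td>1929-30</td><td>Athletic Bilbao (1)</td><td>19</td></tr>",
  "</table></body></html>"
))

# html_table() returns one data frame per <table> node; keep the first
raw <- html_table(html_nodes(page, "table"))[[1]]

# Strip square brackets and digits from the column names, then lower-case them
names(raw) <- tolower(gsub("\\[|\\]|\\d", "", names(raw)))

# Remove the "(n)" counts and any asterisks from the team names
raw$winners <- trimws(gsub("\\(\\d+\\)|\\*", "", raw$winners))

print(raw)
```

Trying the regular expressions on a toy table like this makes it easier to see what each pattern removes before pointing the code at a live page.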

Part B: Scraping Data from Multiple Webpages

What happens when the data of interest spans multiple pages of a website? For instance, on the Sky Sports website for the La Liga results, each season has its own page. However, the page addresses follow a consistent pattern: every address starts with https://www.skysports.com/la-liga-table/ and ends with the year in which the season started, so the 2020-2021 season is at https://www.skysports.com/la-liga-table/2020.

Hence, we can generate web addresses that match the seasons we are targeting. We shall revisit [5] this critical issue in a moment. First, I write a function that scrapes a page when provided with its URL.

[5] This reminds me of Kenya’s President Uhuru Kenyatta’s promise to “revisit” the judiciary. Both parties are now busy reviewing each other.

#########################################
## PART 3: SCRAPE MULTIPLE WEB PAGES

# Scrape the 12 pages of results for the seasons from 2009 to 2021.

# NB: Every page starts with "https://www.skysports.com/la-liga-table/"

# Then, for each season, the starting year is appended.

# For instance for 2020, the address is "https://www.skysports.com/la-liga-table/2020"

## Write a scraping function

scrapper <- function(url){
        
        ## Pause for two seconds before each request
        Sys.sleep(2)
        
        ## Read the html
        read_html(url) %>% 
                
                ## Capture the nodes for tables
                html_nodes("table") %>% 
                
                ## Capture the tables
                html_table() %>% 
                
                ## Capture the first table in the series
                .[[1]] %>%
                
                ## rename the position column
                rename(pos = `#`) %>% 
                
                ## Clean the column names 
                janitor::clean_names() %>% 
        
                ## Remove redundant last 6 column
                select(-last_6)
}

##########################################
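When looping over many pages, one failed request would stop the whole run. The sketch below is one possible guard, not part of the original code: `safe_scrape()` is a hypothetical helper that wraps any scraping function in `tryCatch()`, and `fake_scraper()` is a stand-in used only so the example runs without a network connection.

```r
# Hypothetical helper: wrap any scraping function so that a failed page
# yields NULL (with a message) instead of stopping the whole run
safe_scrape <- function(scrape_fun, url) {
  tryCatch(
    scrape_fun(url),
    error = function(e) {
      message("Skipping ", url, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in scraper for demonstration only; it fails for one made-up address
fake_scraper <- function(url) {
  if (grepl("bad", url)) stop("page not found")
  data.frame(url = url, rows = 20, stringsAsFactors = FALSE)
}

results <- lapply(
  c("https://example.com/good", "https://example.com/bad"),
  function(u) safe_scrape(fake_scraper, u)
)

# Drop the failed pages before further processing
results <- Filter(Negate(is.null), results)
print(length(results))
```

With the function defined in this article, the equivalent call would be `safe_scrape(scrapper, url)`.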

The following code generates the web addresses on the Sky Sports website that correspond to the twelve seasons from 2009-2010 to 2020-2021, each identified by its starting year. Notice that we now have all the URLs.

##########################################
## Capture all the urls for years 2009-2021

many_urls <- paste0("https://www.skysports.com/la-liga-table/", 2020:2009)

many_urls
##  [1] "https://www.skysports.com/la-liga-table/2020"
##  [2] "https://www.skysports.com/la-liga-table/2019"
##  [3] "https://www.skysports.com/la-liga-table/2018"
##  [4] "https://www.skysports.com/la-liga-table/2017"
##  [5] "https://www.skysports.com/la-liga-table/2016"
##  [6] "https://www.skysports.com/la-liga-table/2015"
##  [7] "https://www.skysports.com/la-liga-table/2014"
##  [8] "https://www.skysports.com/la-liga-table/2013"
##  [9] "https://www.skysports.com/la-liga-table/2012"
## [10] "https://www.skysports.com/la-liga-table/2011"
## [11] "https://www.skysports.com/la-liga-table/2010"
## [12] "https://www.skysports.com/la-liga-table/2009"
########################################
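Because each address carries only the starting year, it can help to attach the full season label to every URL. The sketch below is a small illustration; the names `start_years`, `season_labels`, and `labelled_urls` are hypothetical and do not appear in the original code.

```r
# Starting years of the twelve seasons, newest first (as in many_urls)
start_years <- 2020:2009

# Build labels such as "2020-2021" from each starting year
season_labels <- sprintf("%d-%d", start_years, start_years + 1L)

# Same addresses as before, now named by season
labelled_urls <- paste0("https://www.skysports.com/la-liga-table/", start_years)
names(labelled_urls) <- season_labels

print(head(labelled_urls, 3))
```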

Now, we run a loop over all URLs using the function defined earlier. The results are in the appendix.

#######################################
## Run a loop over all the web pages 

la_liga_09_2020 <- lapply(many_urls, scrapper)

## Give each table a name corresponding to the year

names(la_liga_09_2020) <- paste0("year", 2020:2009)

##########################################
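A list of twelve tables is convenient for printing, but a single table is often easier to analyse. As a sketch, assuming a list shaped like `la_liga_09_2020`, `dplyr::bind_rows()` can stack the tables and record the list names in a new column. The toy list below holds only two rows per season, with values copied from the appendix, so that the example runs without scraping.

```r
library(dplyr)

# Toy stand-in for the scraped list: two seasons, top two teams each
toy_list <- list(
  year2020 = data.frame(pos = 1:2, team = c("Atletico Madrid", "Real Madrid"),
                        pts = c(86L, 84L), stringsAsFactors = FALSE),
  year2019 = data.frame(pos = 1:2, team = c("Real Madrid", "Barcelona"),
                        pts = c(87L, 82L), stringsAsFactors = FALSE)
)

# Stack the tables; .id stores each list name in a new "season" column
all_seasons <- bind_rows(toy_list, .id = "season") %>%
  # Turn "year2020" into the integer 2020
  mutate(season = as.integer(sub("year", "", season)))

print(all_seasons)
```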

Basic Data Exploration

In this section, I do some data exploration using the La Liga data from 2009-2021. My question is, which team has won the most La Liga trophies over these seasons (2009-2021)? The visualization below tells it all.

##########################################
## Get the top team in each year

top_teams_2009_20 <- sapply(la_liga_09_2020, "[", 1, "team") %>% 
        
        ## Unlist to make one table
        unlist() %>% 
        
        ## Coerce to tibble
        tibble() %>% 
        
        ## Rename column
        rename(team = ".")

################################################################################
################################################################################
## Count the titles won by each team
top_teams_2009_20 %>% 
        
        ## Count how many times each team finished first
        
        count(team, name = "no") %>% 
        
        ## Plot the data
        
        ggplot(aes(x = fct_reorder(team, no), y = no, fill = team)) + 
        
        ## Add a geom
        
        geom_col(col = "black", show.legend = FALSE) + 
        
        ## Add text labels
        
        geom_text(aes(label = no), nudge_y = 0.3) +
        
        ## Add x and y labels and a title, subtitle
        
        labs(x = "Teams", y = "", title = "LA LIGA ANALYSIS", 
             
             subtitle = "La Liga Winners, 2009-2021", 
             
             caption = "Developed by John Karuitha using R and the ggplot2 package") + 
        
        ## Select bar colors (applied to the teams in alphabetical order)
        
        scale_fill_manual(values = c("red", "blue", "white")) + 
        
        ## Add a pleasant theme
        
        ggthemes::theme_clean()

############################################
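The champion extraction can also be written in base R. The sketch below is an alternative to the `sapply()` call above, not the author's method: it uses `vapply()` so that the result is guaranteed to be a character vector, and runs on a toy list whose first two rows per season are copied from the appendix.

```r
# Toy stand-in for the list of season tables (top two rows only)
toy_list <- list(
  year2020 = data.frame(pos = 1:2, team = c("Atletico Madrid", "Real Madrid")),
  year2019 = data.frame(pos = 1:2, team = c("Real Madrid", "Barcelona")),
  year2018 = data.frame(pos = 1:2, team = c("Barcelona", "Atletico Madrid")),
  year2017 = data.frame(pos = 1:2, team = c("Barcelona", "Atletico Madrid"))
)

# Take the team in the first row of every table
champions <- vapply(toy_list, function(tbl) as.character(tbl$team[1]), character(1))

# Count titles per team and sort from most to fewest
titles <- sort(table(champions), decreasing = TRUE)

print(titles)
```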

Conclusion

Web scraping is a valuable skill in every researcher’s toolkit, more so with the rise of social media research. However, scraped data is often messy and may require substantial cleaning effort. In this write-up, I used R to demonstrate web scraping. Other programming languages, such as Python, offer the same capabilities.

Appendix

Appendix 1: La Liga Standings 2009-2021

############################################

## Appendix: The full list of data scraped

la_liga_09_2020
## $year2020
## # A tibble: 20 x 10
##      pos team               pl     w     d     l     f     a    gd   pts
##    <int> <chr>           <int> <int> <int> <int> <int> <int> <int> <int>
##  1     1 Atletico Madrid    38    26     8     4    67    25    42    86
##  2     2 Real Madrid        38    25     9     4    67    28    39    84
##  3     3 Barcelona          38    24     7     7    85    38    47    79
##  4     4 Sevilla            38    24     5     9    53    33    20    77
##  5     5 Real Sociedad      38    17    11    10    59    38    21    62
##  6     6 Real Betis         38    17    10    11    50    50     0    61
##  7     7 Villarreal         38    15    13    10    60    44    16    58
##  8     8 Celta Vigo         38    14    11    13    55    57    -2    53
##  9     9 Granada            38    13     7    18    47    65   -18    46
## 10    10 Athletic Bilbao    38    11    13    14    46    42     4    46
## 11    11 Osasuna            38    11    11    16    37    48   -11    44
## 12    12 Cadiz              38    11    11    16    36    58   -22    44
## 13    13 Valencia           38    10    13    15    50    53    -3    43
## 14    14 Levante            38     9    14    15    46    57   -11    41
## 15    15 Getafe             38     9    11    18    28    43   -15    38
## 16    16 Alaves             38     9    11    18    36    57   -21    38
## 17    17 Elche              38     8    12    18    34    55   -21    36
## 18    18 SD Huesca          38     7    13    18    34    53   -19    34
## 19    19 Real Valladolid    38     5    16    17    34    57   -23    31
## 20    20 Eibar              38     6    12    20    29    52   -23    30
## 
## $year2019
## # A tibble: 20 x 10
##      pos team               pl     w     d     l     f     a    gd   pts
##    <int> <chr>           <int> <int> <int> <int> <int> <int> <int> <int>
##  1     1 Real Madrid        38    26     9     3    70    25    45    87
##  2     2 Barcelona          38    25     7     6    86    38    48    82
##  3     3 Atletico Madrid    38    18    16     4    51    27    24    70
##  4     4 Sevilla            38    19    13     6    54    34    20    70
##  5     5 Villarreal         38    18     6    14    63    49    14    60
##  6     6 Real Sociedad      38    16     8    14    56    48     8    56
##  7     7 Granada            38    16     8    14    52    45     7    56
##  8     8 Getafe             38    14    12    12    43    37     6    54
##  9     9 Valencia           38    14    11    13    46    53    -7    53
## 10    10 Osasuna            38    13    13    12    46    54    -8    52
## 11    11 Athletic Bilbao    38    13    12    13    41    38     3    51
## 12    12 Levante            38    14     7    17    47    53    -6    49
## 13    13 Real Valladolid    38     9    15    14    32    43   -11    42
## 14    14 Eibar              38    11     9    18    39    56   -17    42
## 15    15 Real Betis         38    10    11    17    48    60   -12    41
## 16    16 Alaves             38    10     9    19    34    59   -25    39
## 17    17 Celta Vigo         38     7    16    15    37    49   -12    37
## 18    18 Leganes            38     8    12    18    30    51   -21    36
## 19    19 Real Mallorca      38     9     6    23    40    65   -25    33
## 20    20 Espanyol           38     5    10    23    27    58   -31    25
## 
## $year2018
## # A tibble: 20 x 10
##      pos team               pl     w     d     l     f     a    gd   pts
##    <int> <chr>           <int> <int> <int> <int> <int> <int> <int> <int>
##  1     1 Barcelona          38    26     9     3    90    36    54    87
##  2     2 Atletico Madrid    38    22    10     6    55    29    26    76
##  3     3 Real Madrid        38    21     5    12    63    46    17    68
##  4     4 Valencia           38    15    16     7    51    35    16    61
##  5     5 Getafe             38    15    14     9    48    35    13    59
##  6     6 Sevilla            38    17     8    13    62    47    15    59
##  7     7 Espanyol           38    14    11    13    48    50    -2    53
##  8     8 Athletic Bilbao    38    13    14    11    41    45    -4    53
##  9     9 Real Sociedad      38    13    11    14    45    46    -1    50
## 10    10 Real Betis         38    14     8    16    44    52    -8    50
## 11    11 Alaves             38    13    11    14    39    50   -11    50
## 12    12 Eibar              38    11    14    13    46    50    -4    47
## 13    13 Leganes            38    11    12    15    37    43    -6    45
## 14    14 Villarreal         38    10    14    14    49    52    -3    44
## 15    15 Levante            38    11    11    16    59    66    -7    44
## 16    16 Real Valladolid    38    10    11    17    32    51   -19    41
## 17    17 Celta Vigo         38    10    11    17    53    62    -9    41
## 18    18 Girona             38     9    10    19    37    53   -16    37
## 19    19 SD Huesca          38     7    12    19    43    65   -22    33
## 20    20 Rayo Vallecano     38     8     8    22    41    70   -29    32
## 
## $year2017
## # A tibble: 20 x 10
##      pos team                   pl     w     d     l     f     a    gd   pts
##    <int> <chr>               <int> <int> <int> <int> <int> <int> <int> <int>
##  1     1 Barcelona              38    28     9     1    99    29    70    93
##  2     2 Atletico Madrid        38    23    10     5    58    22    36    79
##  3     3 Real Madrid            38    22    10     6    94    44    50    76
##  4     4 Valencia               38    22     7     9    65    38    27    73
##  5     5 Villarreal             38    18     7    13    57    50     7    61
##  6     6 Real Betis             38    18     6    14    60    61    -1    60
##  7     7 Sevilla                38    17     7    14    49    58    -9    58
##  8     8 Getafe                 38    15    10    13    42    33     9    55
##  9     9 Eibar                  38    14     9    15    44    50    -6    51
## 10    10 Girona                 38    14     9    15    50    59    -9    51
## 11    11 Espanyol               38    12    13    13    36    42    -6    49
## 12    12 Real Sociedad          38    14     7    17    66    59     7    49
## 13    13 Celta Vigo             38    13    10    15    59    60    -1    49
## 14    14 Alaves                 38    15     2    21    40    50   -10    47
## 15    15 Levante                38    11    13    14    44    58   -14    46
## 16    16 Athletic Bilbao        38    10    13    15    41    49    -8    43
## 17    17 Leganes                38    12     7    19    34    51   -17    43
## 18    18 Deportivo La Coruna    38     6    11    21    38    76   -38    29
## 19    19 Las Palmas             38     5     7    26    24    74   -50    22
## 20    20 Malaga                 38     5     5    28    24    61   -37    20
## 
## $year2016
## # A tibble: 20 x 10
##      pos team                   pl     w     d     l     f     a    gd   pts
##    <int> <chr>               <int> <int> <int> <int> <int> <int> <int> <int>
##  1     1 Real Madrid            38    29     6     3   106    41    65    93
##  2     2 Barcelona              38    28     6     4   116    37    79    90
##  3     3 Atletico Madrid        38    23     9     6    70    27    43    78
##  4     4 Sevilla                38    21     9     8    69    49    20    72
##  5     5 Villarreal             38    19    10     9    56    33    23    67
##  6     6 Real Sociedad          38    19     7    12    59    53     6    64
##  7     7 Athletic Bilbao        38    19     6    13    53    43    10    63
##  8     8 Espanyol               38    15    11    12    49    50    -1    56
##  9     9 Alaves                 38    14    13    11    41    43    -2    55
## 10    10 Eibar                  38    15     9    14    56    51     5    54
## 11    11 Malaga                 38    12    10    16    49    55    -6    46
## 12    12 Valencia               38    13     7    18    56    65    -9    46
## 13    13 Celta Vigo             38    13     6    19    53    69   -16    45
## 14    14 Las Palmas             38    10     9    19    53    74   -21    39
## 15    15 Real Betis             38    10     9    19    41    64   -23    39
## 16    16 Deportivo La Coruna    38     8    12    18    43    61   -18    36
## 17    17 Leganes                38     8    11    19    36    55   -19    35
## 18    18 Sporting Gijon         38     7    10    21    42    72   -30    31
## 19    19 Osasuna                38     4    10    24    40    94   -54    22
## 20    20 Granada                38     4     8    26    30    82   -52    20
## 
## $year2015
## # A tibble: 20 x 10
##      pos team                   pl     w     d     l     f     a    gd   pts
##    <int> <chr>               <int> <int> <int> <int> <int> <int> <int> <int>
##  1     1 Barcelona              38    29     4     5   112    29    83    91
##  2     2 Real Madrid            38    28     6     4   110    34    76    90
##  3     3 Atletico Madrid        38    28     4     6    63    18    45    88
##  4     4 Villarreal             38    18    10    10    44    35     9    64
##  5     5 Athletic Bilbao        38    18     8    12    58    45    13    62
##  6     6 Celta Vigo             38    17     9    12    51    59    -8    60
##  7     7 Sevilla                38    14    10    14    51    50     1    52
##  8     8 Malaga                 38    12    12    14    38    35     3    48
##  9     9 Real Sociedad          38    13     9    16    45    48    -3    48
## 10    10 Real Betis             38    11    12    15    34    52   -18    45
## 11    11 Las Palmas             38    12     8    18    45    53    -8    44
## 12    12 Valencia               38    11    11    16    46    48    -2    44
## 13    13 Espanyol               38    12     7    19    40    74   -34    43
## 14    14 Eibar                  38    11    10    17    49    61   -12    43
## 15    15 Deportivo La Coruna    38     8    18    12    45    61   -16    42
## 16    16 Granada                38    10     9    19    46    69   -23    39
## 17    17 Sporting Gijon         38    10     9    19    40    62   -22    39
## 18    18 Rayo Vallecano         38     9    11    18    52    73   -21    38
## 19    19 Getafe                 38     9     9    20    37    67   -30    36
## 20    20 Levante                38     8     8    22    37    70   -33    32
## 
## $year2014
## # A tibble: 20 x 10
##      pos team                   pl     w     d     l     f     a    gd   pts
##    <int> <chr>               <int> <int> <int> <int> <int> <int> <int> <int>
##  1     1 Barcelona              38    30     4     4   110    21    89    94
##  2     2 Real Madrid            38    30     2     6   118    38    80    92
##  3     3 Atletico Madrid        38    23     9     6    67    29    38    78
##  4     4 Valencia               38    22    11     5    70    32    38    77
##  5     5 Sevilla                38    23     7     8    71    45    26    76
##  6     6 Villarreal             38    16    12    10    48    37    11    60
##  7     7 Athletic Bilbao        38    15    10    13    42    41     1    55
##  8     8 Celta Vigo             38    13    12    13    47    44     3    51
##  9     9 Malaga                 38    14     8    16    42    48    -6    50
## 10    10 Espanyol               38    13    10    15    47    51    -4    49
## 11    11 Rayo Vallecano         38    15     4    19    46    68   -22    49
## 12    12 Real Sociedad          38    11    13    14    44    51    -7    46
## 13    13 Elche                  38    11     8    19    35    62   -27    41
## 14    14 Levante                38     9    10    19    34    67   -33    37
## 15    15 Getafe                 38    10     7    21    33    64   -31    37
## 16    16 Deportivo La Coruna    38     7    14    17    35    60   -25    35
## 17    17 Granada                38     7    14    17    29    64   -35    35
## 18    18 Eibar                  38     9     8    21    34    55   -21    35
## 19    19 Almeria                38     8     8    22    35    64   -29    32
## 20    20 Cordoba                38     3    11    24    22    68   -46    20
## 
## $year2013
## # A tibble: 20 x 10
##      pos team               pl     w     d     l     f     a    gd   pts
##    <int> <chr>           <int> <int> <int> <int> <int> <int> <int> <int>
##  1     1 Atletico Madrid    38    28     6     4    77    26    51    90
##  2     2 Barcelona          38    27     6     5   100    33    67    87
##  3     3 Real Madrid        38    27     6     5   104    38    66    87
##  4     4 Athletic Bilbao    38    20    10     8    66    39    27    70
##  5     5 Sevilla            38    18     9    11    69    52    17    63
##  6     6 Villarreal         38    17     8    13    60    44    16    59
##  7     7 Real Sociedad      38    16    11    11    62    55     7    59
##  8     8 Valencia           38    13    10    15    51    53    -2    49
##  9     9 Celta Vigo         38    14     7    17    49    54    -5    49
## 10    10 Levante            38    12    12    14    35    43    -8    48
## 11    11 Malaga             38    12     9    17    39    46    -7    45
## 12    12 Rayo Vallecano     38    13     4    21    46    80   -34    43
## 13    13 Getafe             38    11     9    18    35    54   -19    42
## 14    14 Espanyol           38    11     9    18    41    51   -10    42
## 15    15 Granada            38    12     5    21    32    56   -24    41
## 16    16 Elche              38     9    13    16    30    50   -20    40
## 17    17 Almeria            38    11     7    20    43    71   -28    40
## 18    18 Osasuna            38    10     9    19    32    62   -30    39
## 19    19 Real Valladolid    38     7    15    16    38    60   -22    36
## 20    20 Real Betis         38     6     7    25    36    78   -42    25
## 
## $year2012
## # A tibble: 20 x 10
##      pos team                   pl     w     d     l     f     a    gd   pts
##    <int> <chr>               <int> <int> <int> <int> <int> <int> <int> <int>
##  1     1 Barcelona              38    32     4     2   115    40    75   100
##  2     2 Real Madrid            38    26     7     5   103    42    61    85
##  3     3 Atletico Madrid        38    23     7     8    65    31    34    76
##  4     4 Real Sociedad          38    18    12     8    70    49    21    66
##  5     5 Valencia               38    19     8    11    67    54    13    65
##  6     6 Malaga                 38    16     9    13    53    50     3    57
##  7     7 Real Betis             38    16     8    14    57    56     1    56
##  8     8 Rayo Vallecano         38    16     5    17    50    66   -16    53
##  9     9 Sevilla                38    14     8    16    58    54     4    50
## 10    10 Getafe                 38    13     8    17    43    57   -14    47
## 11    11 Levante                38    12    10    16    40    57   -17    46
## 12    12 Athletic Bilbao        38    12     9    17    44    65   -21    45
## 13    13 Espanyol               38    11    11    16    43    52    -9    44
## 14    14 Real Valladolid        38    11    10    17    49    58    -9    43
## 15    15 Granada                38    11     9    18    37    54   -17    42
## 16    16 Osasuna                38    10     9    19    33    50   -17    39
## 17    17 Celta Vigo             38    10     7    21    37    52   -15    37
## 18    18 Real Mallorca          38     9     9    20    43    72   -29    36
## 19    19 Deportivo La Coruna    38     8    11    19    47    70   -23    35
## 20    20 Real Zaragoza          38     9     7    22    37    62   -25    34
## 
## $year2011
## # A tibble: 20 x 10
##      pos team                pl     w     d     l     f     a    gd   pts
##    <int> <chr>            <int> <int> <int> <int> <int> <int> <int> <int>
##  1     1 Real Madrid         38    32     4     2   121    32    89   100
##  2     2 Barcelona           38    28     7     3   114    29    85    91
##  3     3 Valencia            38    17    10    11    59    44    15    61
##  4     4 Malaga              38    17     7    14    54    53     1    58
##  5     5 Atletico Madrid     38    15    11    12    53    46     7    56
##  6     6 Levante             38    16     7    15    54    50     4    55
##  7     7 Osasuna             38    13    15    10    44    61   -17    54
##  8     8 Real Mallorca       38    14    10    14    42    46    -4    52
##  9     9 Sevilla             38    13    11    14    48    47     1    50
## 10    10 Athletic Bilbao     38    12    13    13    49    52    -3    49
## 11    11 Getafe              38    12    11    15    40    51   -11    47
## 12    12 Real Sociedad       38    12    11    15    46    52    -6    47
## 13    13 Real Betis          38    13     8    17    47    56    -9    47
## 14    14 Espanyol            38    12    10    16    46    56   -10    46
## 15    15 Rayo Vallecano      38    13     4    21    53    73   -20    43
## 16    16 Real Zaragoza       38    12     7    19    36    61   -25    43
## 17    17 Granada             38    12     6    20    35    56   -21    42
## 18    18 Villarreal          38     9    14    15    39    53   -14    41
## 19    19 Sporting Gijon      38    10     7    21    42    69   -27    37
## 20    20 Racing Santander    38     4    15    19    28    63   -35    27
## 
## $year2010
## # A tibble: 20 x 10
##      pos team                   pl     w     d     l     f     a    gd   pts
##    <int> <chr>               <int> <int> <int> <int> <int> <int> <int> <int>
##  1     1 Barcelona              38    30     6     2    95    21    74    96
##  2     2 Real Madrid            38    29     5     4   102    33    69    92
##  3     3 Valencia               38    21     8     9    64    44    20    71
##  4     4 Villarreal             38    18     8    12    54    44    10    62
##  5     5 Sevilla                38    17     7    14    62    61     1    58
##  6     6 Athletic Bilbao        38    18     4    16    59    55     4    58
##  7     7 Atletico Madrid        38    17     7    14    62    53     9    58
##  8     8 Espanyol               38    15     4    19    46    55    -9    49
##  9     9 Osasuna                38    13     8    17    45    46    -1    47
## 10    10 Sporting Gijon         38    11    14    13    35    42    -7    47
## 11    11 Malaga                 38    13     7    18    54    68   -14    46
## 12    12 Racing Santander       38    12    10    16    41    56   -15    46
## 13    13 Real Zaragoza          38    12     9    17    40    53   -13    45
## 14    14 Levante                38    12     9    17    41    52   -11    45
## 15    15 Real Sociedad          38    14     3    21    49    66   -17    45
## 16    16 Getafe                 38    12     8    18    49    60   -11    44
## 17    17 Real Mallorca          38    12     8    18    41    56   -15    44
## 18    18 Deportivo La Coruna    38    10    13    15    31    47   -16    43
## 19    19 Hercules               38     9     8    21    36    60   -24    35
## 20    20 Almeria                38     6    12    20    36    70   -34    30
## 
## $year2009
## # A tibble: 20 x 10
##      pos team                   pl     w     d     l     f     a    gd   pts
##    <int> <chr>               <int> <int> <int> <int> <int> <int> <int> <int>
##  1     1 Barcelona              38    31     6     1    98    24    74    99
##  2     2 Real Madrid            38    31     3     4   102    35    67    96
##  3     3 Valencia               38    21     8     9    59    40    19    71
##  4     4 Sevilla                38    19     6    13    65    49    16    63
##  5     5 Real Mallorca          38    18     8    12    59    44    15    62
##  6     6 Getafe                 38    17     7    14    58    48    10    58
##  7     7 Villarreal             38    16     8    14    58    57     1    56
##  8     8 Athletic Bilbao        38    15     9    14    50    53    -3    54
##  9     9 Atletico Madrid        38    13     8    17    57    61    -4    47
## 10    10 Deportivo La Coruna    38    13     8    17    35    49   -14    47
## 11    11 Espanyol               38    11    11    16    29    46   -17    44
## 12    12 Osasuna                38    11    10    17    37    46    -9    43
## 13    13 Almeria                38    10    12    16    43    55   -12    42
## 14    14 Real Zaragoza          38    10    11    17    46    64   -18    41
## 15    15 Sporting Gijon         38     9    13    16    36    51   -15    40
## 16    16 Racing Santander       38     9    12    17    42    59   -17    39
## 17    17 Malaga                 38     7    16    15    42    48    -6    37
## 18    18 Tenerife               38     9     9    20    40    74   -34    36
## 19    19 Real Valladolid        38     7    15    16    37    62   -25    36
## 20    20 Xerez                  38     8    10    20    38    66   -28    34
############################################
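
The season tables above are stored as a named list (`$year2012`, `$year2011`, and so on), which is awkward to plot or summarise. The following is a minimal sketch of one way to stack such a list into a single table with a `season` column. The object `seasons` and the helper `add_season()` are hypothetical names introduced here for illustration, and the two small tables are hand-typed from the first two rows of the 2010 and 2009 output rather than scraped.

```r
# Hypothetical stand-in for the scraped list of season tables.
# Values are the top two rows of the 2010 and 2009 tables printed above.
seasons <- list(
  year2010 = data.frame(
    pos = 1:2,
    team = c("Barcelona", "Real Madrid"),
    pts = c(96L, 92L),
    stringsAsFactors = FALSE
  ),
  year2009 = data.frame(
    pos = 1:2,
    team = c("Barcelona", "Real Madrid"),
    pts = c(99L, 96L),
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper: turn a list name such as "year2010" into an
# integer season column on the corresponding table.
add_season <- function(tbl, name) {
  tbl$season <- as.integer(sub("^year", "", name))
  tbl
}

# Apply the helper to every table, then stack the results row-wise.
all_seasons <- do.call(rbind, Map(add_season, seasons, names(seasons)))
rownames(all_seasons) <- NULL

print(all_seasons)
```

With the tidyverse loaded, `dplyr::bind_rows(seasons, .id = "season")` should give a similar result, with the list names kept as a character column.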

Appendix 2: La Liga Top 3 Teams and Top Scorers 1929-2021

season winners runners_up third_place top_scorer_s top_scorers_club_s goals
1929 Barcelona Real Madrid (1) Athletic Bilbao Paco Bienzobas Real Sociedad 14
1929–30 Athletic Bilbao Barcelona (1) Arenas Guillermo Gorostiza Athletic Bilbao 19
1930–31 Athletic Bilbao Racing Santander (1) Real Sociedad Bata Athletic Bilbao 27
1931–32 Real Madrid Athletic Bilbao (1) Barcelona Guillermo Gorostiza Athletic Bilbao 12
1932–33 Real Madrid Athletic Bilbao (2) Espanyol Manuel Olivares Real Madrid 16
1933–34 Athletic Bilbao Real Madrid (2) Racing Santander Isidro Lángara Oviedo 27
1934–35 Real Betis Real Madrid (3) Oviedo Isidro Lángara Oviedo 26
1935–36 Athletic Bilbao Real Madrid (4) Oviedo Isidro Lángara Oviedo 27
1936–37 Spanish Civil War (League Cancelled) Spanish Civil War (League Cancelled) Spanish Civil War (League Cancelled) Spanish Civil War (League Cancelled) Spanish Civil War (League Cancelled) Spanish Civil War (League Cancelled)
1937–38 Spanish Civil War (League Cancelled) Spanish Civil War (League Cancelled) Spanish Civil War (League Cancelled) Spanish Civil War (League Cancelled) Spanish Civil War (League Cancelled) Spanish Civil War (League Cancelled)
1938–39 Spanish Civil War (League Cancelled) Spanish Civil War (League Cancelled) Spanish Civil War (League Cancelled) Spanish Civil War (League Cancelled) Spanish Civil War (League Cancelled) Spanish Civil War (League Cancelled)
1939–40 Atlético Aviación[a] Sevilla (1) Athletic Bilbao Víctor Unamuno Athletic Bilbao 22
1940–41 Atlético Aviación[a] Athletic Bilbao (3) Valencia Pruden Atlético Aviación 30
1941–42 Valencia Real Madrid (5) Atlético Aviación Edmundo Suárez Valencia 27
1942–43 Athletic Bilbao Sevilla (2) Barcelona Mariano Martín Barcelona 32
1943–44 Valencia Atlético Aviación (1) Sevilla Edmundo Suárez Valencia 27
1944–45 Barcelona Real Madrid (6) Atlético Aviación Telmo Zarra Atlético Bilbao 19
1945–46 Sevilla Barcelona (2) Athletic Bilbao Telmo Zarra Atlético Bilbao 24
1946–47 Valencia Athletic Bilbao (4) Atlético Aviación Telmo Zarra Atlético Bilbao 34
1947–48 Barcelona Valencia (1) Atlético Madrid Pahiño Celta Vigo 23
1948–49 Barcelona Valencia (2) Real Madrid César Rodríguez Álvarez Barcelona 28
1949–50 Atlético Madrid Deportivo La Coruña (1) Valencia Telmo Zarra Athletic Bilbao 25
1950–51 Atlético Madrid Sevilla (3) Valencia Telmo Zarra Athletic Bilbao 38
1951–52 Barcelona Athletic Bilbao (5) Real Madrid Pahiño Real Madrid 28
1952–53 Barcelona Valencia (3) Real Madrid Telmo Zarra Athletic Bilbao 24
1953–54 Real Madrid Barcelona (3) Valencia Alfredo Di Stéfano Real Madrid 27
1954–55 Real Madrid Barcelona (4) Athletic Bilbao Juan Arza Sevilla 28
1955–56 Athletic Bilbao Barcelona (5) Real Madrid Alfredo Di Stéfano Real Madrid 24
1956–57 Real Madrid † Sevilla (4) Barcelona Alfredo Di Stéfano Real Madrid 31
1957–58 Real Madrid † Atlético Madrid (2) Barcelona Manuel Badenes / Alfredo Di Stéfano / Ricardo Valladolid / Real Madrid / Valencia 19
1958–59 Barcelona Real Madrid (7) Athletic Bilbao Alfredo Di Stéfano Real Madrid 23
1959–60 Barcelona Real Madrid (8) Athletic Bilbao Ferenc Puskás Real Madrid 26
1960–61 Real Madrid Atlético Madrid (3) Zaragoza Ferenc Puskás Real Madrid 27
1961–62 Real Madrid Barcelona (6) Atlético Madrid Juan Seminario Zaragoza 25
1962–63 Real Madrid Atlético Madrid (4) Oviedo Ferenc Puskás Real Madrid 26
1963–64 Real Madrid Barcelona (7) Real Betis Ferenc Puskás Real Madrid 20
1964–65 Real Madrid Atlético Madrid (5) Zaragoza Cayetano Ré Barcelona 25
1965–66 Atlético Madrid Real Madrid (9) Barcelona Vavá Elche 19
1966–67 Real Madrid Barcelona (8) Espanyol Waldo Machado Valencia 24
1967–68 Real Madrid Barcelona (9) Las Palmas Fidel Uriarte Athletic Bilbao 22
1968–69 Real Madrid Las Palmas (1) Barcelona Amancio Amaro / José Eulogio Gárate Real Madrid / Atlético Madrid 14
1969–70 Atlético Madrid Athletic Bilbao (6) Sevilla Amancio Amaro / Luis Aragonés / José Eulogio Gárate Real Madrid / Atlético Madrid / Atlético Madrid 16
1970–71 Valencia Barcelona (10) Atlético Madrid José Eulogio Gárate / Carles Rexach Atlético Madrid / Barcelona 17
1971–72 Real Madrid Valencia (4) Barcelona Enrique Porta Granada 20
1972–73 Atlético Madrid Barcelona (11) Espanyol Marianín Oviedo 19
1973–74 Barcelona Atlético Madrid (6) Zaragoza Quini Sporting Gijón 20
1974–75 Real Madrid Zaragoza (1) Barcelona Carlos Athletic Bilbao 19
1975–76 Real Madrid Barcelona (12) Atlético Madrid Quini Sporting Gijón 21
1976–77 Atlético Madrid Barcelona (13) Athletic Bilbao Mario Kempes Valencia 24
1977–78 Real Madrid Barcelona (14) Athletic Bilbao Mario Kempes Valencia 28
1978–79 Real Madrid Sporting Gijón (1) Atlético Madrid Hans Krankl Barcelona 29
1979–80 Real Madrid Real Sociedad (1) Sporting Gijón Quini Sporting Gijón 24
1980–81 Real Sociedad Real Madrid (10) Atlético Madrid Quini Barcelona 20
1981–82 Real Sociedad Barcelona (15) Real Madrid Quini Barcelona 26
1982–83 Athletic Bilbao Real Madrid (11) Atlético Madrid Hipólito Rincón Real Betis 20
1983–84 Athletic Bilbao Real Madrid (12) Barcelona Jorge da Silva / Juanito Valladolid / Real Madrid 17
1984–85 Barcelona Atlético Madrid (7) Athletic Bilbao Hugo Sánchez Atlético Madrid 19
1985–86 Real Madrid ‡ Barcelona (16) Athletic Bilbao Hugo Sánchez Real Madrid 22
1986–87 Real Madrid Barcelona (17) Espanyol Hugo Sánchez Real Madrid 34
1987–88 Real Madrid Real Sociedad (2) Atlético Madrid Hugo Sánchez Real Madrid 29
1988–89 Real Madrid Barcelona (18) Valencia Baltazar Atlético Madrid 35
1989–90 Real Madrid Valencia (5) Barcelona Hugo Sánchez Real Madrid 38
1990–91 Barcelona Atlético Madrid (8) Real Madrid Emilio Butragueño Real Madrid 19
1991–92 Barcelona † Real Madrid (13) Atlético Madrid Manolo Atlético Madrid 27
1992–93 Barcelona Real Madrid (14) Deportivo La Coruña Bebeto Deportivo La Coruña 29
1993–94 Barcelona Deportivo La Coruña (2) Zaragoza Romário Barcelona 30
1994–95 Real Madrid Deportivo La Coruña (3) Real Betis Iván Zamorano Real Madrid 28
1995–96 Atlético Madrid Valencia (6) Barcelona Juan Antonio Pizzi Tenerife 31
1996–97 Real Madrid Barcelona (19) Deportivo La Coruña Ronaldo Barcelona 34
1997–98 Barcelona Athletic Bilbao (7) Real Sociedad Christian Vieri Atlético Madrid 24
1998–99 Barcelona Real Madrid (15) Mallorca Raúl Real Madrid 25
1999–2000 Deportivo La Coruña Barcelona (20) Valencia Salva Ballesta Racing Santander 27
2000–01 Real Madrid Deportivo La Coruña (4) Mallorca Raúl Real Madrid 24
2001–02 Valencia Deportivo La Coruña (5) Real Madrid Diego Tristán Deportivo La Coruña 21
2002–03 Real Madrid Real Sociedad (3) Deportivo La Coruña Roy Makaay Deportivo La Coruña 29
2003–04 Valencia ‡ Barcelona (21) Deportivo La Coruña Ronaldo Real Madrid 25
2004–05 Barcelona Real Madrid (16) Villarreal Diego Forlán Villarreal 25
2005–06 Barcelona † Real Madrid (17) Valencia Samuel Eto’o Barcelona 26
2006–07 Real Madrid Barcelona (22) Sevilla Ruud van Nistelrooy Real Madrid 25
2007–08 Real Madrid Villarreal (1) Barcelona Daniel Güiza Mallorca 27
2008–09 Barcelona Real Madrid (18) Sevilla Diego Forlán Atlético Madrid 32
2009–10 Barcelona Real Madrid (19) Valencia Lionel Messi Barcelona 34
2010–11 Barcelona † Real Madrid (20) Valencia Cristiano Ronaldo Real Madrid 40
2011–12 Real Madrid Barcelona (23) Valencia Lionel Messi Barcelona 50
2012–13 Barcelona Real Madrid (21) Atlético Madrid Lionel Messi Barcelona 46
2013–14 Atlético Madrid Barcelona (24) Real Madrid Cristiano Ronaldo Real Madrid 31
2014–15 Barcelona Real Madrid (22) Atlético Madrid Cristiano Ronaldo Real Madrid 48
2015–16 Barcelona Real Madrid (23) Atlético Madrid Luis Suárez Barcelona 40
2016–17 Real Madrid † Barcelona (25) Atlético Madrid Lionel Messi Barcelona 37
2017–18 Barcelona Atlético Madrid (9) Real Madrid Lionel Messi Barcelona 34
2018–19 Barcelona Atlético Madrid (10) Real Madrid Lionel Messi Barcelona 36
2019–20 Real Madrid Barcelona (26) Atlético Madrid Lionel Messi Barcelona 25
2020–21 Atlético Madrid Real Madrid (24) Barcelona Lionel Messi Barcelona 30
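
The scraped table still carries some residue from the source page: a footnote marker (`[a]`), dagger symbols (`†`, `‡`) after some winners, and a number in parentheses after each runner-up, which appears to be a running count of that club's runner-up finishes. The following is a minimal sketch, in base R, of how such columns might be tidied before analysis. The vectors are a handful of values copied from the table above, and the object names are hypothetical.

```r
# A few raw values copied from the appendix table.
winners <- c("Atlético Aviación[a]", "Real Madrid †", "Valencia ‡", "Barcelona")
runners_up <- c("Sevilla (1)", "Sevilla (4)", "Barcelona (21)", "Real Madrid (24)")

# Drop bracketed footnote markers and dagger symbols, then trim
# any whitespace left behind.
clean_winners <- trimws(gsub("\\[[a-z]\\]|†|‡", "", winners))

# Split the runner-up column into the club name and the number
# that follows it in parentheses.
runner_club <- trimws(sub("\\([0-9]+\\)$", "", runners_up))
runner_count <- as.integer(sub("^.*\\(([0-9]+)\\)$", "\\1", runners_up))

print(clean_winners)
print(data.frame(runner_club, runner_count))
```

The same patterns could be applied to whole columns of the scraped data frame, for instance inside `dplyr::mutate()`.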