Scraping Data and Making it Look Pretty

I decided to dig into HTML data tables to see if I could make use of them. The rvest package is proving to be very useful for this.

If you couldn't care less about the code, scroll down to the bottom to see the graphs.

Scraping ESPN Stats

I feel like the quarterback position is always fun to look at; in the NFL, your success is heavily dependent upon your QB play. So why not see how the QBs stack up?

First I had to grab the data. With a few lines of code, I got a very clean data frame:

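library(rvest)  # provides read_html(), html_elements(), html_table() and the %>% pipe

webaddress <- "https://www.espn.com/nfl/stats/player/_/view/offense/table/passing/sort/passingYards/dir/desc"
dat <- read_html(webaddress)

# Parse every <table> element on the page into a list of tibbles
divs <- dat %>%
  html_elements("table") %>%
  html_table()
head(divs)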
## [[1]]
## # A tibble: 50 x 2
##       RK Name             
##    <int> <chr>            
##  1     1 Deshaun WatsonHOU
##  2     2 Patrick MahomesKC
##  3     3 Tom BradyTB      
##  4     4 Matt RyanATL     
##  5     5 Josh AllenBUF    
##  6     6 Justin HerbertLAC
##  7     7 Aaron RodgersGB  
##  8     8 Kirk CousinsMIN  
##  9     9 Russell WilsonSEA
## 10    10 Philip RiversIND 
## # … with 40 more rows
## 
## [[2]]
## # A tibble: 50 x 15
##    POS      GP   CMP   ATT `CMP%` YDS     AVG `YDS/G`   LNG    TD   INT  SACK
##    <chr> <int> <int> <int>  <dbl> <chr> <dbl>   <dbl> <int> <int> <int> <int>
##  1 QB       16   382   544   70.2 4,823   8.9    301.    77    33     7    49
##  2 QB       15   390   588   66.3 4,740   8.1    316     75    38     6    22
##  3 QB       16   401   610   65.7 4,633   7.6    290.    50    40    12    21
##  4 QB       16   407   626   65   4,581   7.3    286.    63    26    11    41
##  5 QB       16   396   572   69.2 4,544   7.9    284     55    37    10    26
##  6 QB       15   396   595   66.6 4,336   7.3    289.    72    31    10    32
##  7 QB       16   372   526   70.7 4,299   8.2    269.    78    48     5    20
##  8 QB       16   349   516   67.6 4,265   8.3    267.    71    35    13    39
##  9 QB       16   384   558   68.8 4,212   7.5    263.    62    40    13    47
## 10 QB       16   369   543   68   4,169   7.7    261.    55    24    11    19
## # … with 40 more rows, and 3 more variables: SYL <int>, QBR <dbl>, RTG <dbl>

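I refined the data a little bit (splitting the player name from the team abbreviation and turning the yardage into a plain number) and wound up with the data frame shown further down.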
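The cleanup code is not shown in this post, so here is a minimal sketch of what that refinement could look like in base R. The two small data frames are stand-ins for the scraped tables (values copied from the output above), and the regular expression assumes every name ends in a two or three letter team code, which may not hold for every player.

```r
# Stand-ins for the two scraped tables; values are copied from the output above.
names_tbl <- data.frame(
  RK   = 1:3,
  Name = c("Deshaun WatsonHOU", "Patrick MahomesKC", "Tom BradyTB"),
  stringsAsFactors = FALSE
)
stats_tbl <- data.frame(
  GP  = c(16L, 15L, 16L),
  YDS = c("4,823", "4,740", "4,633"),
  stringsAsFactors = FALSE
)

# The page comes back as two tables with matching rows, so bind them column-wise.
qb <- cbind(names_tbl, stats_tbl)

# Split "Deshaun WatsonHOU" into the player name and a trailing 2-3 letter team code.
# Assumption: the team code is always the final run of capital letters.
pattern       <- "^(.*?)([A-Z]{2,3})$"
full_name     <- sub(pattern, "\\1", qb$Name, perl = TRUE)
qb$team_name2 <- sub(pattern, "\\2", qb$Name, perl = TRUE)

# First word is the first name; everything after the first space is the last name.
qb$first_name <- sub(" .*$", "", full_name)
qb$last_name  <- sub("^[^ ]+ ", "", full_name)

# "4,823" is read as text because of the comma; strip it and convert to integer.
qb$YDS <- as.integer(gsub(",", "", qb$YDS, fixed = TRUE))

# Keep the cleaned columns and use last names as row names.
qb <- qb[, c("last_name", "first_name", "team_name2", "GP", "YDS")]
rownames(qb) <- qb$last_name
print(qb)
```

On the real scrape, the same steps would start from `cbind(divs[[1]], divs[[2]])` instead of the hand-typed stand-ins.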

I also grabbed the headshots of the QBs using nflfastR. This is a package I'd love to dive into more at a later time.

Finally, I merged the photo URLs with the rest of the data:

##         last_name first_name team_name2 GP CMP ATT CMP%  YDS AVG YDS/G LNG TD
## Watson     Watson    Deshaun        HOU 16 382 544 70.2 4823 8.9 301.4  77 33
## Mahomes   Mahomes    Patrick         KC 15 390 588 66.3 4740 8.1 316.0  75 38
## Brady       Brady        Tom         TB 16 401 610 65.7 4633 7.6 289.6  50 40
## Ryan         Ryan       Matt        ATL 16 407 626 65.0 4581 7.3 286.3  63 26
## Allen       Allen       Josh        BUF 16 396 572 69.2 4544 7.9 284.0  55 37
## Herbert   Herbert     Justin        LAC 15 396 595 66.6 4336 7.3 289.1  72 31
##         INT SACK SYL  QBR   RTG
## Watson    7   49 293 70.5 112.4
## Mahomes   6   22 147 82.9 108.2
## Brady    12   21 143 72.5 102.2
## Ryan     11   41 257 67.0  93.3
## Allen    10   26 159 81.7 107.2
## Herbert  10   32 218 69.5  98.3
##                                                                           headshot_url
## Watson  https://a.espncdn.com/combiner/i?img=/i/headshots/nfl/players/full/3122840.png
## Mahomes https://a.espncdn.com/combiner/i?img=/i/headshots/nfl/players/full/3139477.png
## Brady      https://a.espncdn.com/combiner/i?img=/i/headshots/nfl/players/full/2330.png
## Ryan      https://a.espncdn.com/combiner/i?img=/i/headshots/nfl/players/full/11237.png
## Allen   https://a.espncdn.com/combiner/i?img=/i/headshots/nfl/players/full/3918298.png
## Herbert https://a.espncdn.com/combiner/i?img=/i/headshots/nfl/players/full/4038941.png
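For anyone curious about the merge step, here is a rough sketch using base R's `merge()`. The `roster` data frame is a hand-typed stand-in for whatever nflfastR returned (its column names are my assumption), and the rows are copied from the output above.

```r
# Cleaned stats (subset of the columns shown above).
qb <- data.frame(
  first_name = c("Deshaun", "Patrick", "Tom"),
  last_name  = c("Watson", "Mahomes", "Brady"),
  team_name2 = c("HOU", "KC", "TB"),
  YDS        = c(4823L, 4740L, 4633L),
  stringsAsFactors = FALSE
)

# Hypothetical roster table with headshot URLs, deliberately in a different order.
roster <- data.frame(
  first_name   = c("Tom", "Deshaun", "Patrick"),
  last_name    = c("Brady", "Watson", "Mahomes"),
  headshot_url = c(
    "https://a.espncdn.com/combiner/i?img=/i/headshots/nfl/players/full/2330.png",
    "https://a.espncdn.com/combiner/i?img=/i/headshots/nfl/players/full/3122840.png",
    "https://a.espncdn.com/combiner/i?img=/i/headshots/nfl/players/full/3139477.png"
  ),
  stringsAsFactors = FALSE
)

# Left join on the name columns: every QB is kept even if no headshot is found.
merged <- merge(qb, roster, by = c("first_name", "last_name"), all.x = TRUE)

# merge() reorders rows by the key, so restore the passing-yards ordering.
merged <- merged[order(-merged$YDS), ]
rownames(merged) <- merged$last_name
print(merged[, c("first_name", "team_name2", "YDS", "headshot_url")])
```

Joining on names alone can go wrong when two players share a name, so adding the team to the `by` columns is a sensible safeguard if the roster table has one.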

The Results

This project started out as scraping ESPN stats. That was way easier than I expected, so I spent some more time playing with my new data set. Below are two plots. Raw numbers are one thing; seeing how those numbers line up against competitors is a whole different ball game.
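As an illustration of the kind of comparison plot this data supports, here is a small ggplot2 sketch. It is not the code behind the plots in this post; ggplot2 and the choice of yards per game are my assumptions, and the six rows are copied from the merged data frame above.

```r
library(ggplot2)

# Yards per game for the six QBs shown in the merged data frame above.
plot_dat <- data.frame(
  last_name    = c("Watson", "Mahomes", "Brady", "Ryan", "Allen", "Herbert"),
  yds_per_game = c(301.4, 316.0, 289.6, 286.3, 284.0, 289.1),
  stringsAsFactors = FALSE
)

# Horizontal bars sorted by value make the ranking easy to read at a glance.
p <- ggplot(plot_dat, aes(x = reorder(last_name, yds_per_game), y = yds_per_game)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    x = NULL,
    y = "Passing yards per game",
    title = "Passing yards per game, top six QBs by total yards"
  ) +
  theme_minimal()

print(p)
```

To put the headshots on the chart instead of plain bars, a package such as ggimage (its `geom_image()` layer takes an image URL or path per row) is one option.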

Analysis

Pretty cool to see this laid out. Here are my observations:

Patrick Mahomes is really good; there is no way around it.

Aaron Rodgers might be even better. This man throws TDs, gains yardage, doesn't throw incompletions, and doesn't turn the ball over.

Drew Lock sucks. I don't care about his “rocket arm”; he is literally the statistical anti-Rodgers. Brock Osweiler had a “rocket arm” too, and how did his career pan out?
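The Mahomes and Rodgers observations can be sanity-checked against the scraped table with a quick touchdown-to-interception ratio. This is a small sketch using only the seven QBs visible in the output near the top of the post (Drew Lock is not in those rows, so he is left out), and the column names are my own.

```r
# Completion %, TDs and INTs for the first seven rows of the scraped table.
qb <- data.frame(
  name    = c("Watson", "Mahomes", "Brady", "Ryan", "Allen", "Herbert", "Rodgers"),
  cmp_pct = c(70.2, 66.3, 65.7, 65.0, 69.2, 66.6, 70.7),
  td      = c(33, 38, 40, 26, 37, 31, 48),
  ints    = c(7, 6, 12, 11, 10, 10, 5),
  stringsAsFactors = FALSE
)

# Touchdowns thrown per interception: higher means more scoring per turnover.
qb$td_per_int <- round(qb$td / qb$ints, 2)

# Sort from best to worst ratio.
qb <- qb[order(-qb$td_per_int), ]
print(qb, row.names = FALSE)
```

In this subset, Rodgers comes out on top in both completion percentage and touchdowns per interception, with Mahomes second on the ratio.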

What are your conclusions? Let me know what stats you would like to see and I’ll pull something together.