This exercise requires web scraping to retrieve the data I will be working with. First, I loaded the packages needed to scrape, wrangle, and clean the data. The main ones are the XML and tidyverse packages; I also loaded rvest, RCurl, magrittr, and httr. Tidyverse in particular is critical for cleaning the data and building the visualizations.
We are using scoring data from last year's fantasy season in the PPR (points per reception) format. One thing to note is that players are listed under their current team: a player's stats appear under the team he is on for the upcoming year, whether he moved via trade or free agency. Players who are not currently on a team are listed as FA (free agent). We will work carefully with this data to try to draw meaningful conclusions through visualizations.
I went to the familiar website FantasyPros and scraped its table of 2019 fantasy data. Below I go step by step through getting the data into R. First, I created an object holding the web address. I then checked the page heading to make sure this was the data I wanted. Next, I created a second object holding the parsed page and read its tables into R. I checked how many tables had loaded and what they contained. Finally, I stored the table I wanted in a data frame.
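library(tidyverse)  # wrangling and plotting; also attaches %>%
library(rvest)      # read_html(), html_nodes(), html_text()
library(httr)       # GET()
library(XML)        # htmlParse(), readHTMLTable()
Players <- "https://www.fantasypros.com/nfl/reports/leaders/ppr.php?year=2019&start=1&end=17"
# Check the page heading to confirm this is the right report
Players %>%
  read_html() %>%
  html_nodes("h1") %>%
  html_text()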
## [1] "Fantasy Football Leaders (2019)"
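# Download the page and parse the raw response into an HTML document
url_1 <-
  GET(Players)$content %>%
  rawToChar() %>%
  htmlParse()
# Read every HTML table on the page, then count how many were found
test_tables <- readHTMLTable(url_1, stringsAsFactors = FALSE)
length(test_tables)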
## [1] 1
head(test_tables[[1]])
##   Rank              Player Team Position Points Games  Avg
## 1    1 Christian McCaffrey  CAR       RB  471.2    16 29.5
## 2    2       Lamar Jackson  BAL       QB  421.7    15 28.1
## 3    3      Michael Thomas   NO       WR  374.6    16 23.4
## 4    4        Dak Prescott  DAL       QB  348.9    16 21.8
## 5    5      Jameis Winston   NO       QB  335.2    16 21.0
## 6    6      Russell Wilson  SEA       QB  333.5    16 20.8
Fantasy <- data.frame(test_tables[[1]])
This data set is fairly clean overall, but some observations are missing critical data. Any observation without a player name or position will be dropped because we cannot verify it or use it properly in the analysis. There is also one outlier with the position S (safety). There are not enough instances of this position to make any statistical claims, so we will remove it as well. It will also help to convert variables like games, points, and average from character to numeric. Finally, some dummy variables should be useful later: I am going to create one for each conference (AFC, NFC) and one for each division within a conference (East, South, West, North). Grouping teams into conferences and divisions should allow more meaningful analysis of the players' fantasy data.
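The cleaning steps just described might look something like the following minimal sketch. It builds a small stand-in for the scraped `Fantasy` table so that it runs on its own: the first two rows come from the output above, and the remaining rows are invented to exercise the filters. The names `team_lookup`, `Fantasy_clean`, `Conference`, `Division`, `AFC`, `NFC`, and `East` are assumptions for illustration, and only four teams are mapped for brevity.

```r
library(dplyr)

# Toy stand-in for the scraped table; every column arrives as character.
# Rows 3-5 are invented for illustration only.
Fantasy <- data.frame(
  Player   = c("Christian McCaffrey", "Lamar Jackson", "Player C", "", "Player E"),
  Team     = c("CAR", "BAL", "BUF", "FA", "DAL"),
  Position = c("RB", "QB", "QB", "", "S"),
  Points   = c("471.2", "421.7", "100.0", "0.0", "1.0"),
  Games    = c("16", "15", "10", "0", "1"),
  Avg      = c("29.5", "28.1", "10.0", "0.0", "1.0"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table; a full version would list all 32 teams
team_lookup <- data.frame(
  Team       = c("CAR", "BAL", "BUF", "DAL"),
  Conference = c("NFC", "AFC", "AFC", "NFC"),
  Division   = c("South", "North", "East", "East"),
  stringsAsFactors = FALSE
)

Fantasy_clean <- Fantasy %>%
  # Drop rows with no player name or position, and the lone safety
  filter(!is.na(Player), Player != "", Position != "", Position != "S") %>%
  # Convert the scoring columns from character to numeric
  mutate(across(c(Points, Games, Avg), as.numeric)) %>%
  # Attach conference and division, then build 0/1 dummy variables
  left_join(team_lookup, by = "Team") %>%
  mutate(
    AFC  = as.integer(Conference == "AFC"),
    NFC  = as.integer(Conference == "NFC"),
    East = as.integer(Division == "East")
  )

Fantasy_clean
```

The other division dummies (South, West, North) would follow the same pattern as `East`. Free agents have no conference, so their dummy values would be `NA` unless they are filtered out or handled separately.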
A simple first question: which position scores the most points on average? We will use each player's average to try to prevent a large disparity between positions.
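The original plotting code is not shown, so here is one hedged sketch of how such a comparison could be drawn. It averages each player's `Avg` within a position and plots the result as a bar chart. The data frame `players` holds invented numbers that are not meant to mirror the real results, and `position_avg` and `Mean_Avg` are hypothetical names.

```r
library(dplyr)
library(ggplot2)

# Invented per-game averages, for illustration only
players <- data.frame(
  Position = c("QB", "QB", "RB", "RB", "WR", "WR"),
  Avg      = c(20, 10, 12, 8, 15, 9)
)

# Mean of the per-player averages within each position
position_avg <- players %>%
  group_by(Position) %>%
  summarise(Mean_Avg = mean(Avg), .groups = "drop")

# Bar chart with positions ordered from highest to lowest mean
p <- ggplot(position_avg, aes(x = reorder(Position, -Mean_Avg), y = Mean_Avg)) +
  geom_col() +
  labs(x = "Position", y = "Mean of player averages")
print(p)
```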
We can see from this graphic that wide receivers have the most points on average. This makes sense because the NFL has been trending toward a more pass-heavy league. We also need to remember that this is PPR data, so every reception is worth one point to the receiver. Further analysis could use a regression to compare PPR and Standard scoring for wide receivers, which would show how much PPR really affects the points that receivers collect.
Which conference has the best fantasy quarterbacks? We can look specifically at the quarterbacks, separate them by conference, and view the points of the QBs in each.
It appears that the NFC has an edge over the AFC in fantasy quarterback play. Many factors could contribute to this, but it makes sense given how many quality quarterbacks play in the NFC. One trend worth further analysis is how this stat changes over recent years, to see whether QBs switching conferences has an effect; Tom Brady moving from the AFC to the NFC is one example. Another factor to consider when deciding whether this stat is meaningful is the strength of schedule in each conference. A regression with the division dummy variables could shed some light on whether QBs in a specific conference had an advantage from playing weaker opposition.
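A minimal sketch of this comparison, assuming a `Conference` column has already been added to the data. The rows are only the top of the scraped table shown earlier, with the conference filled in by hand, so the summary here is far too small to be representative. The names `qbs` and `qb_summary` are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Top rows of the scraped table, plus a hand-added Conference column
players <- data.frame(
  Player     = c("Christian McCaffrey", "Lamar Jackson", "Dak Prescott",
                 "Jameis Winston", "Russell Wilson"),
  Position   = c("RB", "QB", "QB", "QB", "QB"),
  Points     = c(471.2, 421.7, 348.9, 335.2, 333.5),
  Conference = c("NFC", "AFC", "NFC", "NFC", "NFC")
)

# Keep only the quarterbacks
qbs <- players %>% filter(Position == "QB")

# Mean points and number of QBs per conference
qb_summary <- qbs %>%
  group_by(Conference) %>%
  summarise(Mean_Points = mean(Points), n = n(), .groups = "drop")

# One box per conference showing the spread of QB points
p <- ggplot(qbs, aes(x = Conference, y = Points)) +
  geom_boxplot() +
  labs(x = "Conference", y = "PPR fantasy points")
print(p)
```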
My favorite team, the Buffalo Bills, plays in the AFC East. The division was somewhat weak last year apart from the Bills and Dolphins. I want to see whether the fantasy points at each position relate to how the division finished. The final standings were Bills, Dolphins, Patriots, Jets. Do the fantasy stats show a disparity that makes sense given those standings?
I think this visual does a good job of showing just how important the quarterback position is and how quarterbacks' fantasy output could help explain the division standings. An easy thing to acknowledge is that the Jets almost didn’t win a single game and had abysmal fantasy stats across the board. The quarterback position is what seemed to separate the other three teams: their records were somewhat close to each other, yet the Bills had substantially more fantasy output at quarterback. Further analysis could involve a new dummy variable for division champions. Regressing QB fantasy output on that dummy could show how closely a QB's fantasy output and winning the division are related.
Which teams score the most? Can one player really affect how many fantasy points a team accumulates that much?
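One way to set up this comparison is sketched here, assuming the AFC East teams appear under the abbreviations BUF, MIA, NE, and NYJ. The point totals are invented for illustration, and `afc_east` and `Total_Points` are hypothetical names.

```r
library(dplyr)
library(ggplot2)

# Invented totals, for illustration only
players <- data.frame(
  Team     = c("BUF", "BUF", "MIA", "NE", "NYJ", "DAL"),
  Position = c("QB", "WR", "QB", "QB", "QB", "QB"),
  Points   = c(300, 200, 250, 240, 150, 349)
)

# Keep the four AFC East teams and total the points by team and position
afc_east <- players %>%
  filter(Team %in% c("BUF", "MIA", "NE", "NYJ")) %>%
  group_by(Team, Position) %>%
  summarise(Total_Points = sum(Points), .groups = "drop")

# Side-by-side bars: one bar per position within each team
p <- ggplot(afc_east, aes(x = Team, y = Total_Points, fill = Position)) +
  geom_col(position = "dodge") +
  labs(x = "Team", y = "Total fantasy points")
print(p)
```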
## # A tibble: 10 x 3
##    Team  Avg_Points Max_Points
##    <chr>      <dbl>      <dbl>
##  1 BUF        109.        298.
##  2 DAL        105.        349.
##  3 SEA        103.        334.
##  4 NO         103.        375.
##  5 LV          99.8       252.
##  6 JAC         98.1       235.
##  7 LAR         93.1       270.
##  8 DEN         92.6       222.
##  9 BAL         92.5       422.
## 10 CAR         84.0       471.
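A summary with this shape could be produced along the following lines. This is a sketch rather than the code actually used: it assumes the table was sorted by `Avg_Points`, and it uses a tiny data frame in which the 471.2, 421.7, 374.6, and 335.2 totals come from the scraped output while the two low totals are invented. The name `team_summary` is hypothetical.

```r
library(dplyr)

# Four totals from the scraped table plus two invented low scorers
players <- data.frame(
  Team   = c("CAR", "CAR", "BAL", "BAL", "NO", "NO"),
  Points = c(471.2, 50, 421.7, 80, 374.6, 335.2)
)

team_summary <- players %>%
  group_by(Team) %>%
  summarise(
    Avg_Points = mean(Points),  # average points per player on the team
    Max_Points = max(Points),   # the team's single highest scorer
    .groups = "drop"
  ) %>%
  arrange(desc(Avg_Points)) %>%
  head(10)                      # keep at most the top 10 teams

team_summary
```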
We can see here that the highest individual point totals all seemed to pull their teams into the top 10 in fantasy points. The teams of players like Christian McCaffrey and Lamar Jackson have lower averages than the rest, but those players have the highest point totals, so they seemed to be carrying the rest of their team. I found it interesting that the Bills, who didn’t have the highest fantasy scorer, led in average points by a fairly substantial margin. This brings up the dilemma of whether a team should spread the ball around or rely on one player who always seems to score. Further analysis could include a regression between each team's total points and its highest fantasy scorer, which could show how much the team really depends on that player. Some teams may also score at such a high rate, with so many different players contributing, that the average is brought down.
The final thing we want to look at is whether playing too many games leads to a player losing production. Does production decrease after a certain number of games? Using the average score, we hope to see whether players lose some of their production as they play more games.
From the linear regression, there appears to be a constant upward slope between games played and average points per game. The other fitted line shows more of a leveling off and possibly a small downward slope at the end. This suggests we need further analysis on this issue. A further regression that squares or possibly cubes the games variable could be helpful. Comparing these adjusted models could give us a better idea of which is more accurate and where there may be curves in the graph.
Fantasy football is a game enjoyed by so many people each fall. This data is very interesting to work with, and the trends in it may help me in my personal fantasy football endeavors. Other data that could be fun to work with in tandem includes team records and stats or Standard scoring data. Seeing the effect of fantasy play on overall standings would be a very interesting exercise, and comparing different scoring formats could help find the true value of each player.
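A plot of this kind could be sketched as follows, with a straight regression line and a second, more flexible fitted line over the same points. The second line is drawn with a loess smoother, which is a guess, since the original smoother is not stated. The data in `sim` is simulated purely so the example runs.

```r
library(ggplot2)

# Simulated games played and per-game averages, for illustration only
set.seed(42)
sim <- data.frame(Games = sample(1:16, 80, replace = TRUE))
sim$Avg <- 3 + 0.6 * sim$Games + rnorm(80, sd = 2)

p <- ggplot(sim, aes(x = Games, y = Avg)) +
  geom_point(alpha = 0.5) +
  # Straight-line fit
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "blue") +
  # Flexible local fit that can bend where the data does
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, colour = "red") +
  labs(x = "Games played", y = "Average points per game")
print(p)
```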
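The squared and cubed models suggested here could be compared with a sketch like this one, using base R's `lm()`. The data is simulated with a built-in curve, so the numbers say nothing about the real players. The object names `fit_linear`, `fit_quadratic`, and `fit_cubic` are hypothetical.

```r
# Simulated data with a mild leveling-off, for illustration only
set.seed(1)
sim <- data.frame(Games = sample(1:16, 80, replace = TRUE))
sim$Avg <- 2 + 1.2 * sim$Games - 0.05 * sim$Games^2 + rnorm(80, sd = 1.5)

# Nested models: linear, then adding squared and cubed terms
fit_linear    <- lm(Avg ~ Games, data = sim)
fit_quadratic <- lm(Avg ~ Games + I(Games^2), data = sim)
fit_cubic     <- lm(Avg ~ Games + I(Games^2) + I(Games^3), data = sim)

# F-tests for whether each added term improves the fit
anova(fit_linear, fit_quadratic, fit_cubic)

# AIC as a second comparison; lower values indicate a better trade-off
AIC(fit_linear, fit_quadratic, fit_cubic)
```

A significant F-test for the squared term, or a clearly lower AIC, would support the idea that production levels off rather than rising in a straight line.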