First, load the necessary libraries…
library(dplyr)
library(tidyr)
Starting from the scraped and cleaned data produced in tutorial #3, I will begin the manipulation process. First, I convert the existing data into long form with the gather function, which stacks columns 4 through 14 (REC through First.Dns) into a key column, Pro.Record, and a value column, Value…
Old…
receiving.df <- read.csv("receiving.df.csv", stringsAsFactors = FALSE)
head(receiving.df)
## PLAYER POS TEAM REC TAR YDS AVG TD LONG Twenty.Plus YDS.G
## 1 Steve Smith Sr. WR CAR 103 0 1563 15.2 12 80 22 97.7
## 2 Santana Moss WR WSH 84 0 1483 17.7 9 78 24 92.7
## 3 Chad Johnson WR CIN 97 0 1432 14.8 9 70 16 89.5
## 4 Larry Fitzgerald WR ARI 103 0 1409 13.7 10 47 27 88.1
## 5 Anquan Boldin WR ARI 102 0 1402 13.7 7 54 21 100.1
## 6 Torry Holt WR LA 102 0 1331 13.0 9 44 15 95.1
## FUM YAC First.Dns Year
## 1 1 0 70 2005
## 2 2 0 60 2005
## 3 1 0 74 2005
## 4 0 0 67 2005
## 5 2 0 68 2005
## 6 2 0 63 2005
New…
receiving.df1 <- receiving.df %>%
  gather("Pro.Record", "Value", 4:14)
head(receiving.df1)
## PLAYER POS TEAM Year Pro.Record Value
## 1 Steve Smith Sr. WR CAR 2005 REC 103
## 2 Santana Moss WR WSH 2005 REC 84
## 3 Chad Johnson WR CIN 2005 REC 97
## 4 Larry Fitzgerald WR ARI 2005 REC 103
## 5 Anquan Boldin WR ARI 2005 REC 102
## 6 Torry Holt WR LA 2005 REC 102
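For readers on a recent tidyr (1.0.0 or later), the same wide-to-long reshape can be sketched with pivot_longer, which the tidyr documentation recommends over gather for new code. The block below is only an illustration: `toy` is a hypothetical three-row stand-in for receiving.df with two stat columns, and `toy_long` is a name made up for this example.

```r
library(tidyr)

# Hypothetical stand-in for receiving.df, using values from the head() output above
toy <- data.frame(
  PLAYER = c("Steve Smith Sr.", "Santana Moss", "Chad Johnson"),
  POS    = c(" WR", " WR", " WR"),
  TEAM   = c("CAR", "WSH", "CIN"),
  REC    = c(103, 84, 97),
  FUM    = c(1, 2, 1),
  Year   = 2005,
  stringsAsFactors = FALSE
)

# Stack the stat columns into a name column and a value column
toy_long <- pivot_longer(
  toy,
  cols      = c(REC, FUM),
  names_to  = "Pro.Record",
  values_to = "Value"
)

print(toy_long)
```

One difference to keep in mind: gather stacks the data one column at a time (all REC rows, then all TAR rows, and so on), while pivot_longer keeps each player's rows together, so the row order differs even though the contents match.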
Now create a column that holds negative results - in this case fumbles - and title it “Con.Record”.
receiving.df1 <- receiving.df1 %>%
  mutate(Con.Record = "FUM")
head(receiving.df1)
## PLAYER POS TEAM Year Pro.Record Value Con.Record
## 1 Steve Smith Sr. WR CAR 2005 REC 103 FUM
## 2 Santana Moss WR WSH 2005 REC 84 FUM
## 3 Chad Johnson WR CIN 2005 REC 97 FUM
## 4 Larry Fitzgerald WR ARI 2005 REC 103 FUM
## 5 Anquan Boldin WR ARI 2005 REC 102 FUM
## 6 Torry Holt WR LA 2005 REC 102 FUM
Now that this Con.Record column has been created, set it to NA for every row whose Pro.Record is not “FUM”…
receiving.df1$Con.Record[receiving.df1$Pro.Record != "FUM"] <- NA
head(receiving.df1)
## PLAYER POS TEAM Year Pro.Record Value Con.Record
## 1 Steve Smith Sr. WR CAR 2005 REC 103 <NA>
## 2 Santana Moss WR WSH 2005 REC 84 <NA>
## 3 Chad Johnson WR CIN 2005 REC 97 <NA>
## 4 Larry Fitzgerald WR ARI 2005 REC 103 <NA>
## 5 Anquan Boldin WR ARI 2005 REC 102 <NA>
## 6 Torry Holt WR LA 2005 REC 102 <NA>
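The two steps above (fill the column with "FUM", then blank out the non-fumble rows) could also be folded into a single mutate. This is a minimal sketch on a hypothetical four-row data frame, `toy_long`, not the real receiving.df1; `toy_flagged` is likewise just an illustrative name.

```r
library(dplyr)

# Hypothetical long-form sample with the same column names as receiving.df1
toy_long <- data.frame(
  PLAYER     = c("Steve Smith Sr.", "Steve Smith Sr.", "Santana Moss", "Santana Moss"),
  Pro.Record = c("REC", "FUM", "REC", "FUM"),
  Value      = c(103, 1, 84, 2),
  stringsAsFactors = FALSE
)

# Flag fumble rows with "FUM" and leave every other row as a character NA
toy_flagged <- toy_long %>%
  mutate(Con.Record = ifelse(Pro.Record == "FUM", "FUM", NA_character_))

print(toy_flagged)
```

Using NA_character_ keeps the new column a character vector, which matches what the two-step version produces.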
Then sort the rows so that the fumble records come first (arrange places NA values last)…
receiving.df1 <- receiving.df1 %>% arrange(desc(Con.Record))
head(receiving.df1)
## PLAYER POS TEAM Year Pro.Record Value Con.Record
## 1 Steve Smith Sr. WR CAR 2005 FUM 1 FUM
## 2 Santana Moss WR WSH 2005 FUM 2 FUM
## 3 Chad Johnson WR CIN 2005 FUM 1 FUM
## 4 Larry Fitzgerald WR ARI 2005 FUM 0 FUM
## 5 Anquan Boldin WR ARI 2005 FUM 2 FUM
## 6 Torry Holt WR LA 2005 FUM 2 FUM
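Make a second value column, titled “Value2”, that holds the negated value for the “FUM” rows and the unchanged value for every other row. Because Con.Record contains real missing values rather than the string "NA", the test has to use is.na()…
receiving.df1 <- receiving.df1 %>%
  mutate(Value2 = ifelse(!is.na(Con.Record), -Value, Value)) %>%
  arrange(desc(Con.Record))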
head(receiving.df1)
## PLAYER POS TEAM Year Pro.Record Value Con.Record Value2
## 1 Steve Smith Sr. WR CAR 2005 FUM 1 FUM -1
## 2 Santana Moss WR WSH 2005 FUM 2 FUM -2
## 3 Chad Johnson WR CIN 2005 FUM 1 FUM -1
## 4 Larry Fitzgerald WR ARI 2005 FUM 0 FUM 0
## 5 Anquan Boldin WR ARI 2005 FUM 2 FUM -2
## 6 Torry Holt WR LA 2005 FUM 2 FUM -2
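Since head() only shows the fumble rows, it is worth checking what happens to the rows where Con.Record is missing. In R, comparing NA with any string yields NA rather than FALSE, and ifelse passes that NA through to its result. The vectors below (`con`, `val`, `naive`, `safe`) are hypothetical and exist only to show the difference between a string comparison and is.na().

```r
# Hypothetical sample: one fumble row and two non-fumble rows
con <- c("FUM", NA, NA)
val <- c(2, 103, 84)

# Comparing against the string "NA": the test is NA for missing entries,
# so ifelse returns NA there instead of the original value
naive <- ifelse(con != "NA", -val, val)

# Testing with is.na(): missing entries keep their original value
safe <- ifelse(!is.na(con), -val, val)

print(naive)  # -2 NA NA
print(safe)   # -2 103 84
```

A quick `sum(is.na(receiving.df1$Value2))` on the real data would confirm whether any values were lost in this step.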
After all of this we are essentially where we want to be; however, I only want to keep the rows for the relevant positions in the “POS” column (e.g. wide receivers, running backs, etc.). The code below does this.
# First, find out what POS are in this df with this code below...
Pro.Record.POS <- unique(receiving.df1[["POS"]])
print(Pro.Record.POS)
## [1] " WR" " TE" " RB" " FB" " CB" " QB" " LS" " S" " OL" " FS" " DE"
## [12] " DB" " OT" " DT" " LB"
# Then, filter for the desired positions under column "POS". (Note: the POS values printed above carry a leading space, e.g. " WR", which is probably why a plain filter on the string, such as POS == "WR", matched nothing; grepl matches the substring regardless of the padding.)
receiving.df11 <- receiving.df1 %>%
  filter(grepl("WR|RB|TE|FB", POS))
Pro.Record.POS <- unique(receiving.df11[["POS"]])
print(Pro.Record.POS)
## [1] " WR" " TE" " RB" " FB"
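The printed positions still carry their leading space (" WR" rather than "WR"). An alternative to pattern matching, sketched here on a hypothetical four-row data frame, is to strip the whitespace with base R's trimws and then filter on exact matches with %in%. The names `toy`, `wanted`, and `kept` are made up for this example.

```r
library(dplyr)

# Hypothetical sample with the same padded POS values seen in the output above
toy <- data.frame(
  POS   = c(" WR", " QB", " TE", " LB"),
  Value = c(103, 5, 40, 1),
  stringsAsFactors = FALSE
)

wanted <- c("WR", "RB", "TE", "FB")

# Trim the padding first, then keep only exact matches
kept <- toy %>%
  mutate(POS = trimws(POS)) %>%
  filter(POS %in% wanted)

print(kept)
```

Exact matching avoids accidentally keeping any position code that merely contains one of the patterns, and the cleaned POS column is easier to work with in later steps.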
Done! Now this data can be used for a variety of purposes including a weighting algorithm / function that I am planning to publish in a later tutorial.