Tutorial 1


For this example I will read a csv file into R to make a data frame (df), then alter that df by removing certain columns, reordering the remaining columns, and removing duplicate rows. I will be using one of my own data frames; however, the same steps work on any df. I am altering this particular df, titled “NFL.2015_DF,” simply because it has data that is unnecessary for a certain project; the trimmed result will be titled “NFL.2015_DF.NoCoords.” Moreover, once this extra data is removed, many of the remaining rows turn out to be duplicates of one another, so those need to go as well.

First, I’ll be using dplyr and tidyr, so I load both libraries as shown below>>>

library(dplyr)
library(tidyr)

Following this, I set my working directory to “/Users/mitchell/R/R.DFs” and then create the df titled “NFL.2015_DF” with the read.csv command, again depicted below>>>

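# Point R at the folder that holds the csv file
setwd("/Users/mitchell/R/R.DFs")
# Read the csv into a data frame, keeping text columns as character rather than factor
NFL.2015_DF <- read.csv("The.Table.csv", stringsAsFactors = FALSE)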
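As an aside, the same read can be done without changing the working directory by building the full path with file.path. The sketch below is only an illustration: the temporary folder, the file name "toy_table.csv" and its three columns are made-up stand-ins, not my real NFL data.

```r
# Hypothetical stand-in for the real folder and file: a temporary directory and a tiny csv
folder <- tempdir()
csv_path <- file.path(folder, "toy_table.csv")
writeLines(c("Player,Pos,Value", "A,QB,10", "B,WR,7"), csv_path)

# Read by full path; the working directory is left untouched
toy_df <- read.csv(csv_path, stringsAsFactors = FALSE)

dim(toy_df)   # number of rows, then number of columns
str(toy_df)   # column names and types
```

Checking dim and str right after a read is a quick way to confirm the df came in the way you expected before altering it.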

This is an extensive df. It is derived from a df that has every NFL player’s function records for 2015-2016, along with the coordinates that place each player in their respective position on the field. Below is an image of the df. Three columns in particular keep many rows from being exact duplicates: “Formation,” “X_Coords” and “Y_Coords.” That is, if I were to remove these three columns, thousands of rows would be exact replicas of others, and therefore unnecessary. Now, because I want to manipulate this data further and am not using the coordinates to display it at this time, I just want to remove these columns, save them in a csv file of their own, and join them back at a later time. Before I do this, take a look at the df I will be working on, while keeping in mind the effect that the columns “Formation,” “X_Coords” and “Y_Coords” have on the rest of the data.
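To see why dropping columns creates duplicates, here is a minimal sketch on a made-up three-row df. The values are invented purely for illustration; only the column names are borrowed from my df.

```r
# Toy data (invented): player A appears in two formations
toy <- data.frame(
  Player    = c("A", "A", "B"),
  Pos       = c("QB", "QB", "WR"),
  Formation = c("I", "Shotgun", "I"),
  X_Coords  = c(1, 2, 3),
  Y_Coords  = c(4, 5, 6),
  stringsAsFactors = FALSE
)

# With every column present, no row repeats an earlier one
dups_before <- sum(duplicated(toy))

# Drop the formation and coordinate columns, then count repeats again
trimmed <- toy[, !(names(toy) %in% c("Formation", "X_Coords", "Y_Coords"))]
dups_after <- sum(duplicated(trimmed))

dups_before   # 0
dups_after    # 1: the second row for player A now matches the first
```

The same count on a real df gives a rough idea of how many rows the later clean-up will remove.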


As you can see above, there are more columns and rows than necessary. Because the formation and coordinate columns aren’t needed for this particular project, they should be removed. Once they are removed, I will be left with a lot of identical rows, which of course are not needed either. As such, I want to remove these three columns and the resulting identical rows, while also arranging the columns in a more appropriate sequence. This is done quite easily by picking the desired columns with the select command and dropping repeated rows with the distinct command. The select command also handles the reordering: I simply enter the original column numbers in the order I want them presented in the new df (a descending range such as 17:15 keeps columns 17, 16 and 15 in that order).

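# Keep the wanted columns in their new order, then drop rows that are now identical.
# Left out: 12:14 (Formation, X_Coords, Y_Coords); 11 (Status) and 19 (SDCat) are also not listed.
NFL.2015_DF.NoCoords <- NFL.2015_DF %>%
  select(1, 4, 6, 9, 3, 10, 2, 8, 5, 7, 17:15, 18, 20:23) %>%
  distinct()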
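If you would rather not count column positions, select also accepts column names. The following is a sketch of that alternative on an invented toy df, not the code I actually ran; it drops the three columns by name and uses everything() to move one column to the front.

```r
library(dplyr)

# Toy data (invented) with the same three columns to remove
toy <- data.frame(
  Player    = c("A", "A", "B"),
  Pos       = c("QB", "QB", "WR"),
  Formation = c("I", "Shotgun", "I"),
  X_Coords  = c(1, 2, 3),
  Y_Coords  = c(4, 5, 6),
  stringsAsFactors = FALSE
)

toy_no_coords <- toy %>%
  select(-c(Formation, X_Coords, Y_Coords)) %>%  # drop columns by name
  select(Pos, everything()) %>%                  # put Pos first, keep the rest in order
  distinct()                                     # remove rows that are now identical

toy_no_coords
```

Selecting by name is more typing but keeps working if the column order of the source file ever changes; selecting by number, as I did, is quicker when the layout is fixed.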

This is what the resulting df will look like.



Then, I set the working directory to “/Users/mitchell/R/NFL.csv_folder” and store my new csv file there using the “write.csv” command.

setwd("/Users/mitchell/R/NFL.csv_folder")
# row.names = FALSE keeps R's row numbers out of the file
write.csv(NFL.2015_DF.NoCoords, "NFL.2015_DF.NoCoords.csv", row.names = FALSE)
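To show what the row.names argument changes, here is a small sketch that writes an invented two-row df both ways and reads each file back. The file names and the temporary folder are hypothetical.

```r
# Toy data (invented)
toy_df <- data.frame(Player = c("A", "B"), Pos = c("QB", "WR"), stringsAsFactors = FALSE)

with_rn    <- file.path(tempdir(), "toy_with_rownames.csv")
without_rn <- file.path(tempdir(), "toy_without_rownames.csv")

write.csv(toy_df, with_rn)                        # default: row names are written too
write.csv(toy_df, without_rn, row.names = FALSE)  # only the actual columns are written

cols_with    <- names(read.csv(with_rn))      # has an extra leading column for the old row names
cols_without <- names(read.csv(without_rn))   # only Player and Pos

cols_with
cols_without
```

Leaving the row names out means the csv reads back with exactly the columns it was written with, which matters when the file will be joined to other data later.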

I changed the working directory because NFL.csv_folder is the folder I keep all of my NFL csv files in. Which reminds me, I want to create a csv of just the formations and coordinates. (Note: I also want to include the player position column titled “Pos” because I will use it to join this df to other NFL data later. However, that is for a tutorial titled “Manipulating Data with dp.”) This, as you now know, can be easily done; however, because there are a lot of column names in my original df, I want to list them so I can choose them by their number. I do this with the following command>>>

names(NFL.2015_DF)
##  [1] "Player"        "Pos"           "Team.Type"     "Team"         
##  [5] "Quality"       "J.Number"      "Value"         "String"       
##  [9] "City"          "General"       "Status"        "Formation"    
## [13] "X_Coords"      "Y_Coords"      "Avg"           "Max"          
## [17] "Median"        "SD"            "SDCat"         "Grade"        
## [21] "Average.Grade" "Max.Avg"       "True.Grade"
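Rather than counting positions by eye, the numbers can also be looked up with match. The sketch below copies the column names from the names() output above into a plain character vector (so it runs without my data) and also checks which columns the earlier select call did not list.

```r
# Column names copied from the names() output above
cols <- c("Player", "Pos", "Team.Type", "Team", "Quality", "J.Number", "Value",
          "String", "City", "General", "Status", "Formation", "X_Coords",
          "Y_Coords", "Avg", "Max", "Median", "SD", "SDCat", "Grade",
          "Average.Grade", "Max.Avg", "True.Grade")

# Positions of the columns wanted for the formations-and-coordinates csv
wanted <- match(c("Pos", "Formation", "X_Coords", "Y_Coords"), cols)
wanted    # 2 12 13 14

# Columns not listed in the earlier select() call
kept <- c(1, 4, 6, 9, 3, 10, 2, 8, 5, 7, 17:15, 18, 20:23)
dropped <- cols[setdiff(seq_along(cols), kept)]
dropped   # "Status" "Formation" "X_Coords" "Y_Coords" "SDCat"
```

With the real df, names(NFL.2015_DF) can be passed in place of the copied vector.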

…which produces the above list. Now, very easily, I can choose the columns I want to include and do so with the following code>>>

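# Build the formations-and-coordinates df: Pos (2), Formation (12), X_Coords (13), Y_Coords (14)
NFL_Forms.and.Coords <- NFL.2015_DF %>%
  select(2, 12, 13, 14) %>%
  distinct()
# Write it out in a separate step; piping into write.csv would leave NULL in NFL_Forms.and.Coords
write.csv(NFL_Forms.and.Coords, "NFL_Forms.and.Coords.csv", row.names = FALSE)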
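For a rough idea of how the two pieces could be put back together later, here is a sketch using dplyr's left_join on the “Pos” column. Both toy data frames are invented, and joining on “Pos” alone is my assumption about how the join would be keyed.

```r
library(dplyr)

# Toy stand-in (invented) for the trimmed player data: one row per position
players <- data.frame(
  Player = c("A", "B"),
  Pos    = c("QB", "WR"),
  stringsAsFactors = FALSE
)

# Toy stand-in (invented) for the formations-and-coordinates csv
coords <- data.frame(
  Pos       = c("QB", "QB", "WR"),
  Formation = c("I", "Shotgun", "I"),
  X_Coords  = c(0, 0, 10),
  Y_Coords  = c(-5, -7, 0),
  stringsAsFactors = FALSE
)

# Each player row is repeated once for every formation row with the same Pos
joined <- left_join(players, coords, by = "Pos")
joined
```

Note that the join brings the extra rows back: each player gets one row per matching formation. With real data, where many players share a position, the row count grows accordingly, and recent dplyr versions may warn about that kind of many-to-many match.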

…and there you have it! Now, all the data I need for my new project can be pulled from my new csv titled “NFL.2015_DF.NoCoords.csv,” and I have all the positions, formations, and coordinates in a separate csv that can be joined with it when I need them. Done!
