IS 607 - Project 2 The goal of this assignment is to give you practice in preparing different datasets for downstream analysis work. Your task is to:

  1. Choose any three of the “wide” datasets identified in the Week 6 Discussion items. (You may use your own dataset; please don’t use my Sample Post dataset, since that was used in your Week 6 assignment!) For each of the three chosen datasets:

Create a .CSV file (or optionally, a MySQL database!) that includes all of the information included in the dataset. You’re encouraged to use a “wide” structure similar to how the information appears in the discussion item, so that you can practice tidying and transformations as described below. Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data. [Most of your grade will be based on this step!] Perform the analysis requested in the discussion item. Your code should be in an R Markdown file, posted to rpubs.com, and should include narrative descriptions of your data cleanup work, analysis, and conclusions.

  1. Please include in your homework submission, for each of the three chosen datasets: The URL to the .Rmd file in your GitHub repository, and The URL for your rpubs.com web page.

Data Acquired from Wikipedia:https://en.wikipedia.org/wiki/Aaron_Rodgers

For part A I’m looking at the career numbers for Aaron Rodgers. I’d like to compare his 2015, and 2016 number in a tabular model to see how they trajected in some of his best years.

Loading the packages

library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Loading the data into R

Rawwide_data <- read.csv(paste0("https://raw.githubusercontent.com/Fyoun123/Data607/master/Project%202/ARP2.csv"),stringsAsFactors = F); Rawwide_data
##    Year Team Games...GP Games...GS Passing...Comp Passing...Att
## 1  2005   GB          3          0              9            16
## 2  2006   GB          2          0              6            15
## 3  2007   GB          2          0             20            28
## 4  2008   GB         16         16            341           536
## 5  2009   GB         16         16            350           541
## 6  2010   GB         15         15            312           475
## 7  2011   GB         15         15            343           502
## 8  2012   GB         16         16            371           552
## 9  2013   GB          9          9            193           290
## 10 2014   GB         16         16            341           520
## 11 2015   GB         16         16            347           572
## 12 2016   GB         16         16            401           610
## 13 2017   GB          7          7            154           238
## 14 2018   GB          4          4             99           156
##    Passing...Pct Passing...Yds Passing...Avg Passing...TD Passing...Int
## 1           56.3            65           4.1            0             1
## 2           40.0            46           3.1            0             0
## 3           71.4           218           7.8            1             0
## 4           63.6          4038           7.5           28            13
## 5           64.7          4434           8.2           30             7
## 6           65.7          3922           8.3           28            11
## 7           68.3          4643           9.2           45             6
## 8           67.2          4295           7.8           39             8
## 9           66.6          2536           8.7           17             6
## 10          65.6          4381           8.4           38             5
## 11          60.7          3821           6.7           31             8
## 12          65.7          4428           7.3           40             7
## 13          64.7          1675           7.0           16             6
## 14          63.5          1130           7.2            7             1
##    Passing...Rate Passing...TD. Passing...Int. Rushing...Att Rushing...Yds
## 1            39.8           0.0            6.3             2             7
## 2            48.2           0.0            0.0             2            11
## 3           106.0           3.6            0.0             7            29
## 4            93.8           5.2            2.4            56           207
## 5           103.2           5.5            1.3            58           316
## 6           101.2           5.9            2.3            64           356
## 7           122.5           9.0            1.2            60           257
## 8           108.0           7.1            1.4            54           259
## 9           104.9           5.9            2.1            30           120
## 10          112.2           7.3            1.0            43           269
## 11           92.7           5.4            1.4            58           344
## 12          104.2           6.6            1.1            67           369
## 13           97.2           6.7            2.5            24           126
## 14           97.4           4.5            0.6            11            67
##    Rushing...Avg Rushing...TD
## 1            3.5            0
## 2            5.5            0
## 3            4.1            0
## 4            3.7            4
## 5            5.4            5
## 6            5.6            4
## 7            4.3            3
## 8            4.8            2
## 9            3.3            0
## 10           6.3            2
## 11           5.9            1
## 12           5.5            4
## 13           5.2            0
## 14           6.1            0

Selecting portions I want

DF2 <- Rawwide_data %>%
    select(Year,Passing...Comp,Passing...Att,Passing...Pct,Passing...Avg,Passing...TD,Passing...Int,Passing...Rate)
  DF2
##    Year Passing...Comp Passing...Att Passing...Pct Passing...Avg
## 1  2005              9            16          56.3           4.1
## 2  2006              6            15          40.0           3.1
## 3  2007             20            28          71.4           7.8
## 4  2008            341           536          63.6           7.5
## 5  2009            350           541          64.7           8.2
## 6  2010            312           475          65.7           8.3
## 7  2011            343           502          68.3           9.2
## 8  2012            371           552          67.2           7.8
## 9  2013            193           290          66.6           8.7
## 10 2014            341           520          65.6           8.4
## 11 2015            347           572          60.7           6.7
## 12 2016            401           610          65.7           7.3
## 13 2017            154           238          64.7           7.0
## 14 2018             99           156          63.5           7.2
##    Passing...TD Passing...Int Passing...Rate
## 1             0             1           39.8
## 2             0             0           48.2
## 3             1             0          106.0
## 4            28            13           93.8
## 5            30             7          103.2
## 6            28            11          101.2
## 7            45             6          122.5
## 8            39             8          108.0
## 9            17             6          104.9
## 10           38             5          112.2
## 11           31             8           92.7
## 12           40             7          104.2
## 13           16             6           97.2
## 14            7             1           97.4

Using gather to make the data long.

DF3<- DF2 %>% gather(Type,Number,Passing...Comp:Passing...Rate)
DF3
##    Year           Type Number
## 1  2005 Passing...Comp    9.0
## 2  2006 Passing...Comp    6.0
## 3  2007 Passing...Comp   20.0
## 4  2008 Passing...Comp  341.0
## 5  2009 Passing...Comp  350.0
## 6  2010 Passing...Comp  312.0
## 7  2011 Passing...Comp  343.0
## 8  2012 Passing...Comp  371.0
## 9  2013 Passing...Comp  193.0
## 10 2014 Passing...Comp  341.0
## 11 2015 Passing...Comp  347.0
## 12 2016 Passing...Comp  401.0
## 13 2017 Passing...Comp  154.0
## 14 2018 Passing...Comp   99.0
## 15 2005  Passing...Att   16.0
## 16 2006  Passing...Att   15.0
## 17 2007  Passing...Att   28.0
## 18 2008  Passing...Att  536.0
## 19 2009  Passing...Att  541.0
## 20 2010  Passing...Att  475.0
## 21 2011  Passing...Att  502.0
## 22 2012  Passing...Att  552.0
## 23 2013  Passing...Att  290.0
## 24 2014  Passing...Att  520.0
## 25 2015  Passing...Att  572.0
## 26 2016  Passing...Att  610.0
## 27 2017  Passing...Att  238.0
## 28 2018  Passing...Att  156.0
## 29 2005  Passing...Pct   56.3
## 30 2006  Passing...Pct   40.0
## 31 2007  Passing...Pct   71.4
## 32 2008  Passing...Pct   63.6
## 33 2009  Passing...Pct   64.7
## 34 2010  Passing...Pct   65.7
## 35 2011  Passing...Pct   68.3
## 36 2012  Passing...Pct   67.2
## 37 2013  Passing...Pct   66.6
## 38 2014  Passing...Pct   65.6
## 39 2015  Passing...Pct   60.7
## 40 2016  Passing...Pct   65.7
## 41 2017  Passing...Pct   64.7
## 42 2018  Passing...Pct   63.5
## 43 2005  Passing...Avg    4.1
## 44 2006  Passing...Avg    3.1
## 45 2007  Passing...Avg    7.8
## 46 2008  Passing...Avg    7.5
## 47 2009  Passing...Avg    8.2
## 48 2010  Passing...Avg    8.3
## 49 2011  Passing...Avg    9.2
## 50 2012  Passing...Avg    7.8
## 51 2013  Passing...Avg    8.7
## 52 2014  Passing...Avg    8.4
## 53 2015  Passing...Avg    6.7
## 54 2016  Passing...Avg    7.3
## 55 2017  Passing...Avg    7.0
## 56 2018  Passing...Avg    7.2
## 57 2005   Passing...TD    0.0
## 58 2006   Passing...TD    0.0
## 59 2007   Passing...TD    1.0
## 60 2008   Passing...TD   28.0
## 61 2009   Passing...TD   30.0
## 62 2010   Passing...TD   28.0
## 63 2011   Passing...TD   45.0
## 64 2012   Passing...TD   39.0
## 65 2013   Passing...TD   17.0
## 66 2014   Passing...TD   38.0
## 67 2015   Passing...TD   31.0
## 68 2016   Passing...TD   40.0
## 69 2017   Passing...TD   16.0
## 70 2018   Passing...TD    7.0
## 71 2005  Passing...Int    1.0
## 72 2006  Passing...Int    0.0
## 73 2007  Passing...Int    0.0
## 74 2008  Passing...Int   13.0
## 75 2009  Passing...Int    7.0
## 76 2010  Passing...Int   11.0
## 77 2011  Passing...Int    6.0
## 78 2012  Passing...Int    8.0
## 79 2013  Passing...Int    6.0
## 80 2014  Passing...Int    5.0
## 81 2015  Passing...Int    8.0
## 82 2016  Passing...Int    7.0
## 83 2017  Passing...Int    6.0
## 84 2018  Passing...Int    1.0
## 85 2005 Passing...Rate   39.8
## 86 2006 Passing...Rate   48.2
## 87 2007 Passing...Rate  106.0
## 88 2008 Passing...Rate   93.8
## 89 2009 Passing...Rate  103.2
## 90 2010 Passing...Rate  101.2
## 91 2011 Passing...Rate  122.5
## 92 2012 Passing...Rate  108.0
## 93 2013 Passing...Rate  104.9
## 94 2014 Passing...Rate  112.2
## 95 2015 Passing...Rate   92.7
## 96 2016 Passing...Rate  104.2
## 97 2017 Passing...Rate   97.2
## 98 2018 Passing...Rate   97.4

Filtering the data to only show 2015 and 2016

DF4 <- DF3 %>% filter(Year==2015|Year==2016)
DF4
##    Year           Type Number
## 1  2015 Passing...Comp  347.0
## 2  2016 Passing...Comp  401.0
## 3  2015  Passing...Att  572.0
## 4  2016  Passing...Att  610.0
## 5  2015  Passing...Pct   60.7
## 6  2016  Passing...Pct   65.7
## 7  2015  Passing...Avg    6.7
## 8  2016  Passing...Avg    7.3
## 9  2015   Passing...TD   31.0
## 10 2016   Passing...TD   40.0
## 11 2015  Passing...Int    8.0
## 12 2016  Passing...Int    7.0
## 13 2015 Passing...Rate   92.7
## 14 2016 Passing...Rate  104.2

Renaming the columns

colnames(DF4)[1] <- "2015/2016"
colnames(DF4)[2] <- "Type"

Using spread to make the model tabular for comparison.

DF4$Type <- factor(DF4$Type)
DF5 <- spread(DF4,"2015/2016","Number")
DF5
##             Type  2015  2016
## 1  Passing...Att 572.0 610.0
## 2  Passing...Avg   6.7   7.3
## 3 Passing...Comp 347.0 401.0
## 4  Passing...Int   8.0   7.0
## 5  Passing...Pct  60.7  65.7
## 6 Passing...Rate  92.7 104.2
## 7   Passing...TD  31.0  40.0