assignment

Kevin Potter

February 2, 2020

Introduction

Link to the dataset: https://github.com/fivethirtyeight/data/blob/master/nba-draymond/draymond.csv
Link to the data dictionary: https://github.com/fivethirtyeight/data/blob/master/nba-draymond/README.md

This dataset offers an alternative way to evaluate a defensive player's production. It looks at the space created between a player and the person he is guarding. In the new era of high-scoring basketball, it is quickly being adopted as a worthy metric for evaluation. The dataset provides the number of possessions played and the DRAYMOND rating for each NBA player since 2014. A DRAYMOND rating can be positive or negative relative to the league average.

Import / Load Data

The cell below loads the required library, downloads the data, and converts it into a data frame.

# load libraries
library(RCurl)

# download the dataset from the GitHub repo as text
csv_dl <- getURL('https://raw.githubusercontent.com/fivethirtyeight/data/master/nba-draymond/draymond.csv')
# convert to a data frame
df <- read.csv(text = csv_dl)
# print the first 6 rows
head(df)

##   season       player possessions   DRAYMOND
## 1   2017   AJ Hammons    331.0258 -0.1766801
## 2   2014     AJ Price    211.7156  5.9121720
## 3   2015     AJ Price    633.5186 -1.7909210
## 4   2014 Aaron Brooks   3257.9340 -0.9529003
## 5   2015 Aaron Brooks   3984.0440 -0.1861272
## 6   2016 Aaron Brooks   2276.0170  2.2965770

Exploratory Data Analysis

The cells below give an introductory look at the data. The last cell creates a subset containing only the 2014 season and prints summary statistics of DRAYMOND for that year.

# summary statistics of the DRAYMOND column
summary(df$DRAYMOND)

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -45.07268  -1.09302  -0.06452  -0.08387   0.97151  61.77634
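
The minimum and maximum above sit far from the quartiles. One possible explanation, which this analysis does not check, is that they belong to players with very few possessions. A minimal sketch of filtering on possessions follows. The cutoff `min_poss` of 1000 is an arbitrary assumption, and `sample_df` holds only the six rows printed by `head(df)`, so the full `df` would need to be substituted to get real results.

```r
# Six rows copied from the head(df) output; swap in the full `df` for real use
sample_df <- data.frame(
  season = c(2017, 2014, 2015, 2014, 2015, 2016),
  player = c("AJ Hammons", "AJ Price", "AJ Price",
             "Aaron Brooks", "Aaron Brooks", "Aaron Brooks"),
  possessions = c(331.0258, 211.7156, 633.5186, 3257.9340, 3984.0440, 2276.0170),
  DRAYMOND = c(-0.1766801, 5.9121720, -1.7909210, -0.9529003, -0.1861272, 2.2965770)
)

# Assumed cutoff: keep player-seasons with at least this many possessions
min_poss <- 1000
regulars <- subset(sample_df, possessions >= min_poss)

# summary statistics of DRAYMOND for the filtered rows only
summary(regulars$DRAYMOND)

# Alternative to a hard cutoff: weight each rating by possessions played
wm <- weighted.mean(sample_df$DRAYMOND, w = sample_df$possessions)
wm
```

A hard cutoff discards rows, while the weighted mean keeps every row but lets high-possession seasons count for more; which is more appropriate depends on the question being asked.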

# plots a histogram of DRAYMOND
hist(df$DRAYMOND,
     breaks = 10,
     xlab = 'DRAYMOND', 
     main =  'Distribution of DRAYMOND')

# creates a new df containing only the 2014 season
df_2014 <- subset(df, season == 2014)
# print summary stats of subset
summary(df_2014$DRAYMOND)

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -25.02556  -1.05128  -0.07111   0.07766   1.16220  32.00608
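
The single-season subset above could be extended to every season at once. The sketch below uses base R's `tapply()` to compute the mean DRAYMOND per season. It runs on `sample_df`, a hypothetical data frame built from the six rows printed by `head(df)`, so the numbers it produces describe only those rows and not the full dataset.

```r
# Six rows copied from the head(df) output; swap in the full `df` for real use
sample_df <- data.frame(
  season = c(2017, 2014, 2015, 2014, 2015, 2016),
  player = c("AJ Hammons", "AJ Price", "AJ Price",
             "Aaron Brooks", "Aaron Brooks", "Aaron Brooks"),
  possessions = c(331.0258, 211.7156, 633.5186, 3257.9340, 3984.0440, 2276.0170),
  DRAYMOND = c(-0.1766801, 5.9121720, -1.7909210, -0.9529003, -0.1861272, 2.2965770)
)

# mean DRAYMOND for each season (result is a named vector, one entry per season)
season_means <- tapply(sample_df$DRAYMOND, sample_df$season, mean)
season_means

# number of player rows behind each seasonal mean
season_counts <- table(sample_df$season)
season_counts
```

Printing the row counts next to the means helps flag seasons where the average rests on very few players.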

Conclusion

In conclusion, I might add further stats from the same seasons and see how the DRAYMOND score relates to other statistics such as steals or blocks. I might also group the stats by team and see whether they could be used to predict points allowed per game.