For our project, we chose to analyze NBA player data. The data set contains over two decades of data on every player who has been on an NBA team's roster.
The data set includes various stats for NBA players from 1996 to 2020.
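The stats that we use most are points (pts), rebounds (reb), and assists (ast) per game, games played (gp), net rating (net_rating), player height (player_height), and college (college).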
Best individual seasons
For this plot, our goal is to show only the player seasons that were above average in points, rebounds, and assists. The data was filtered with the following code:
library(dplyr)

# only keep player seasons that are above average in PTS, REB, and AST
# with net_rating > 2
plot1data <- stats %>% filter(
  pts > mean(pts, na.rm = TRUE),
  reb > mean(reb, na.rm = TRUE),
  ast > mean(ast, na.rm = TRUE),
  gp > 41,          # played more than half of an 82-game season
  net_rating > 2)   # clearly positive on-court impact
This 3D scatter plot shows every player season that was above average in PTS, REB, and AST, had a net rating above 2 while on the court, and included more than 41 games played.
Code for the previous plot:
library(plotly)

plot1 <- plot_ly(plot1data, x = ~pts, y = ~reb, z = ~ast,
                 text = ~paste(player_name, season)) %>%
  add_markers(marker = list(
    size = 5,
    color = ~year,
    showscale = TRUE,  # plotly.js hides the colorbar unless this is set
    colorbar = list(title = 'Year'),
    colorscale = 'Viridis',
    reversescale = TRUE)) %>%
  layout(title = "Best NBA seasons",
         scene = list(xaxis = list(title = 'PPG'),
                      yaxis = list(title = 'RPG'),
                      zaxis = list(title = 'APG')))
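To make the filter logic at the start of this section concrete, here is a minimal base-R sketch on a hypothetical four-row data frame (toy and all of its values are made up; only the column names follow the code above). It shows that each threshold is the mean over all rows, computed before any row is dropped, and that which() discards rows whose comparison is NA, as dplyr's filter() does.

```r
# Hypothetical toy data standing in for the real stats data frame
toy <- data.frame(
  player_name = c("A", "B", "C", "D"),
  pts = c(25, 10, 18, 5),
  reb = c(10, 3, 8, 2),
  ast = c(7, 2, 6, 1),
  gp = c(70, 80, 30, 60),
  net_rating = c(5, -1, 4, 0),
  stringsAsFactors = FALSE
)

# TRUE where a value beats the mean of the whole column (NAs ignored in the mean)
above_avg <- function(x) x > mean(x, na.rm = TRUE)

keep <- above_avg(toy$pts) & above_avg(toy$reb) & above_avg(toy$ast) &
  toy$gp > 41 & toy$net_rating > 2

# which() drops NA results, matching how dplyr::filter() treats them
best <- toy[which(keep), ]
print(best$player_name)
```

Player C is above average in all three categories and has a net rating above 2, but is excluded by the games-played cutoff.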
NBA player heights
This plot shows the distribution of player heights in each NBA season. It was produced with the following code:
plot2 <- plot_ly(stats, x = ~year, y = ~player_height,
                 type = "box",
                 marker = list(color = 'rgb(255,1,1)'),
                 line = list(color = 'rgb(0,0,0)'),
                 text = ~paste(player_name, season)) %>%
  layout(title = "NBA Player Height Each Year",
         yaxis = list(title = "Player Height (cm)"),
         xaxis = list(title = "Year"))
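The line inside each box is that season's median height. As a minimal sketch with hypothetical rows (the toy data frame and its heights are made up), base R's tapply() computes the per-year medians and player counts directly, which can help when checking values read off the plot.

```r
# Hypothetical toy rows standing in for stats (heights in cm)
toy <- data.frame(
  year = c(2018, 2018, 2018, 2019, 2019, 2019),
  player_height = c(190.5, 203.2, 210.8, 195.6, 200.7, 213.4)
)

# Median height for each year: the line drawn inside each box
medians <- tapply(toy$player_height, toy$year, median)
print(medians)

# Number of players behind each box
counts <- tapply(toy$player_height, toy$year, length)
print(counts)
```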
What colleges produced the most NBA players?
This plot shows which colleges the players on 2019 NBA rosters attended before entering the NBA.
To do this, we used the code below.
# get players from 2019
plot3data <- stats %>% filter(year == 2019)
# count how many players attended each college
plot3data <- plot3data %>% group_by(college) %>% mutate(count = n())
# keep one row per college
uniqueColleges <- plot3data %>% distinct(college, .keep_all = TRUE)
# only count colleges with at least 5 players
uniqueColleges <- uniqueColleges[uniqueColleges$count >= 5, ]
Below is the code that we used to produce the previous plot:
library(ggplot2)

plot3 <- ggplot(uniqueColleges, aes(x = "", y = count, fill = college)) +
  geom_bar(width = 1, stat = "identity", color = "white") +
  labs(x = "", y = "", title = "2019 NBA Players' Colleges \n",
       fill = "Colleges") +
  geom_text(aes(label = count),
            position = position_stack(vjust = 0.5)) +
  coord_polar(theta = "y")
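The group_by()/mutate()/distinct() steps in this section amount to counting players per college and keeping colleges at or above a threshold. A minimal base-R sketch of the same idea, using table() on a made-up vector of college names (all names and counts are hypothetical):

```r
# Hypothetical college names for one season's players
colleges_2019 <- c("Kentucky", "Duke", "Kentucky", "None", "Duke",
                   "Kentucky", "UCLA", "Kentucky", "Duke", "Kentucky")

# table() counts players per college, like group_by(college) + n()
counts <- table(colleges_2019)

# Keep colleges with at least min_players players (5 in the code above)
min_players <- 5
frequent <- counts[counts >= min_players]
print(frequent)
```

Any placeholder string used for players without a college (shown here as the made-up value "None") is counted like a real college, so it may need to be filtered out before plotting.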
Highest PPG for each year
# group by each year and keep the top player in PTS (ties are all kept)
plot4data <- stats %>% group_by(year) %>% top_n(1, pts)
plot4 <- ggplot(data = plot4data, aes(x = year, y = pts)) +
  geom_bar(stat = "identity", fill = "orange", width = 0.65) +
  geom_text(aes(label = pts), vjust = -0.3, size = 3.5) +
  ggtitle("Highest PPG for Each Season") +
  ylab("Points Per Game") +
  xlab("Year") +
  theme_minimal()
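Because top_n(1, pts) keeps every row that ties for the maximum, a season with a scoring tie would contribute two rows, and geom_bar() would stack them into one taller bar. A base-R sketch with hypothetical data (names and values are made up) that reproduces the tie-keeping selection with ave() and flags any tied years:

```r
# Hypothetical toy rows standing in for stats
toy <- data.frame(
  year = c(2018, 2018, 2019, 2019, 2019),
  player_name = c("A", "B", "C", "D", "E"),
  pts = c(30.4, 27.0, 36.1, 36.1, 20.0),
  stringsAsFactors = FALSE
)

# Each row's per-year maximum, then keep rows equal to it (ties included)
year_max <- ave(toy$pts, toy$year, FUN = max)
leaders <- toy[toy$pts == year_max, ]
print(leaders)

# Years with more than one leader would be stacked by geom_bar()
tied_years <- names(which(table(leaders$year) > 1))
print(tied_years)
```

If one bar per year is required, dplyr's slice_max(pts, n = 1, with_ties = FALSE) is one option, at the cost of keeping only one of the tied players.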
# pair each player's height with their points per game
height <- stats$player_height
pts <- stats$pts
traindata <- data.frame(height, pts)

# simple linear regression of points per game on height
model <- lm(pts ~ height, data = traindata)

traindata %>%
  plot_ly(x = ~height) %>%
  add_markers(y = ~pts) %>%
  add_lines(x = ~height, y = fitted(model)) %>%
  layout(title = 'Height vs. PTS',
         xaxis = list(title = 'Player Height (cm)'),
         yaxis = list(title = 'PTS'), showlegend = FALSE)
MultipleR
## [1] 0.003663154
AdjustedR
## [1] 0.003577983
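The multiple R-squared of about 0.0037 (adjusted R-squared about 0.0036) means that height explains well under 1% of the variation in points per game, so our data shows essentially no linear relationship between a player's height and scoring ability.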
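The two printed values match what summary() reports for an lm fit as the multiple R-squared and adjusted R-squared (this assumes that is where MultipleR and AdjustedR came from, since the code that created them is not shown). The sketch below uses made-up heights and scoring averages to show how both values are read from summary(), how the adjustment for one predictor works, and that with a single predictor R-squared equals the squared Pearson correlation.

```r
# Hypothetical toy data: heights (cm) and points per game
height <- c(185, 190, 196, 201, 206, 211, 216)
pts <- c(12.1, 8.4, 15.0, 6.2, 11.3, 9.8, 7.5)
model <- lm(pts ~ height)

s <- summary(model)
r2 <- s$r.squared          # "Multiple R-squared" in summary() output
adj_r2 <- s$adj.r.squared  # "Adjusted R-squared"

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), with p = 1 predictor
n <- length(pts)
p <- 1
adj_manual <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(c(r2 = r2, adj_r2 = adj_r2, adj_manual = adj_manual))

# With one predictor, R^2 equals the squared correlation of x and y
print(c(r2 = r2, cor_sq = cor(height, pts)^2))
```

Because the adjustment only subtracts a small penalty for each predictor, the two values stay very close when n is large, as they do in the output above.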