player_data <- read.csv('C:/Users/rohan/OneDrive/Desktop/INTRO TO STATISTICS IN R/DATA SETS/Datasets/Data/Nba_all_seasons_1996_2021.csv')
summary(player_data)
## X player_name team_abbreviation age
## Min. : 0 Length:12305 Length:12305 Min. :18.00
## 1st Qu.: 3076 Class :character Class :character 1st Qu.:24.00
## Median : 6152 Mode :character Mode :character Median :26.00
## Mean : 6152 Mean :27.08
## 3rd Qu.: 9228 3rd Qu.:30.00
## Max. :12304 Max. :44.00
## player_height player_weight college country
## Min. :160.0 Min. : 60.33 Length:12305 Length:12305
## 1st Qu.:193.0 1st Qu.: 90.72 Class :character Class :character
## Median :200.7 Median : 99.79 Mode :character Mode :character
## Mean :200.6 Mean :100.37
## 3rd Qu.:208.3 3rd Qu.:108.86
## Max. :231.1 Max. :163.29
## draft_year draft_round draft_number gp
## Length:12305 Length:12305 Length:12305 Min. : 1.00
## Class :character Class :character Class :character 1st Qu.:31.00
## Mode :character Mode :character Mode :character Median :57.00
## Mean :51.29
## 3rd Qu.:73.00
## Max. :85.00
## pts reb ast net_rating
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. :-250.000
## 1st Qu.: 3.600 1st Qu.: 1.800 1st Qu.: 0.600 1st Qu.: -6.400
## Median : 6.700 Median : 3.000 Median : 1.200 Median : -1.300
## Mean : 8.173 Mean : 3.559 Mean : 1.814 Mean : -2.256
## 3rd Qu.:11.500 3rd Qu.: 4.700 3rd Qu.: 2.400 3rd Qu.: 3.200
## Max. :36.100 Max. :16.300 Max. :11.700 Max. : 300.000
## oreb_pct dreb_pct usg_pct ts_pct
## Min. :0.00000 Min. :0.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.02100 1st Qu.:0.096 1st Qu.:0.1490 1st Qu.:0.4800
## Median :0.04100 Median :0.131 Median :0.1810 Median :0.5240
## Mean :0.05447 Mean :0.141 Mean :0.1849 Mean :0.5111
## 3rd Qu.:0.08400 3rd Qu.:0.180 3rd Qu.:0.2170 3rd Qu.:0.5610
## Max. :1.00000 Max. :1.000 Max. :1.0000 Max. :1.5000
## ast_pct season
## Min. :0.0000 Length:12305
## 1st Qu.:0.0660 Class :character
## Median :0.1030 Mode :character
## Mean :0.1314
## 3rd Qu.:0.1780
## Max. :1.0000
Converting the identified time column into a date format suitable for time series analysis. The season column (values such as "1996-97") is used as the time variable.
# Loading necessary libraries
library(readr)
## Warning: package 'readr' was built under R version 4.3.2
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate)
## Warning: package 'lubridate' was built under R version 4.3.2
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(tsibble)
## Warning: package 'tsibble' was built under R version 4.3.2
##
## Attaching package: 'tsibble'
## The following object is masked from 'package:lubridate':
##
## interval
## The following objects are masked from 'package:base':
##
## intersect, setdiff, union
library(ggplot2)
library(tidyr)
## Warning: package 'tidyr' was built under R version 4.3.2
library(forecast)
## Warning: package 'forecast' was built under R version 4.3.2
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
# season holds labels such as "1996-97": keep the starting year and anchor it
# to 1 January so the date does not depend on when the code is run
# (as.Date(season, format = "%Y") fills in today's month and day)
player_data$date <- as.Date(paste0(substr(player_data$season, 1, 4), "-01-01"))
head(player_data)
## X player_name team_abbreviation age player_height player_weight
## 1 0 Dennis Rodman CHI 36 198.12 99.79024
## 2 1 Dwayne Schintzius LAC 28 215.90 117.93392
## 3 2 Earl Cureton TOR 39 205.74 95.25432
## 4 3 Ed O'Bannon DAL 24 203.20 100.69742
## 5 4 Ed Pinckney MIA 34 205.74 108.86208
## 6 5 Eddie Johnson HOU 38 200.66 97.52228
## college country draft_year draft_round draft_number gp
## 1 Southeastern Oklahoma State USA 1986 2 27 55
## 2 Florida USA 1990 1 24 15
## 3 Detroit Mercy USA 1979 3 58 9
## 4 UCLA USA 1995 1 9 64
## 5 Villanova USA 1985 1 10 27
## 6 Illinois USA 1981 2 29 52
## pts reb ast net_rating oreb_pct dreb_pct usg_pct ts_pct ast_pct season
## 1 5.7 16.1 3.1 16.1 0.186 0.323 0.100 0.479 0.113 1996-97
## 2 2.3 1.5 0.3 12.3 0.078 0.151 0.175 0.430 0.048 1996-97
## 3 0.8 1.0 0.4 -2.1 0.105 0.102 0.103 0.376 0.148 1996-97
## 4 3.7 2.3 0.6 -8.7 0.060 0.149 0.167 0.399 0.077 1996-97
## 5 2.4 2.4 0.2 -11.2 0.109 0.179 0.127 0.611 0.040 1996-97
## 6 8.2 2.7 1.0 4.1 0.034 0.126 0.220 0.541 0.102 1996-97
## date
## 1 1996-01-01
## 2 1996-01-01
## 3 1996-01-01
## 4 1996-01-01
## 5 1996-01-01
## 6 1996-01-01
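Parsing a label such as "1996-97" with `as.Date(..., format = "%Y")` is fragile: R's `strptime` documentation states that an unspecified month or day is taken from the current date, so the resulting dates shift depending on when the notebook is run. Below is a minimal base-R sketch of a stricter parser, assuming every `season` value follows the "YYYY-YY" pattern visible in `head()`; the helper name `season_start_year()` and the sample labels are hypothetical.

```r
# Hypothetical helper: turn season labels such as "1996-97" into the
# integer calendar year in which each season started
season_start_year <- function(season) {
  # Fail loudly if any label does not look like "YYYY-YY"
  ok <- grepl("^[0-9]{4}-[0-9]{2}$", season)
  if (!all(ok)) {
    stop("Unexpected season label(s): ",
         paste(unique(season[!ok]), collapse = ", "))
  }
  as.integer(substr(season, 1, 4))
}

# Illustrative labels (not read from the CSV)
seasons <- c("1996-97", "1999-00", "2020-21")
start_year <- season_start_year(seasons)
start_year                              # 1996 1999 2020
as.Date(paste0(start_year, "-01-01"))   # deterministic: 1 January of each start year

# The fragile version: month and day are filled in from today's date
as.Date("1996", format = "%Y")
```

Applied to the real data, `season_start_year(player_data$season)` would also give an integer year column, which tsibble accepts as an index if a plain yearly index is preferred over a `Date`.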
data_agg <- player_data %>%
  group_by(date) %>%
  summarize(pts = mean(pts, na.rm = TRUE))
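This code groups the dataset by date (one date per season) and computes the mean of pts for each date, ignoring missing values. Because pts is each player's points per game, the result is the average of player scoring rates in each season rather than the league's team points per game. The result is stored in a new data frame called data_agg.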
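The same per-season averaging can be sketched in base R, shown here on a small hypothetical data frame rather than the real CSV so the result can be checked by hand. The games-played-weighted variant is an optional alternative added for comparison, not the method used above.

```r
# Hypothetical toy data standing in for player_data (values are made up)
toy <- data.frame(
  season_year = c(1996, 1996, 1996, 1997, 1997),
  pts         = c(10.0, 2.0, NA, 6.0, 4.0),
  gp          = c(80, 10, 5, 50, 50)
)

# Unweighted mean of player points per game for each season, ignoring NAs;
# this mirrors group_by() + summarize(mean(pts, na.rm = TRUE))
unweighted <- tapply(toy$pts, toy$season_year, mean, na.rm = TRUE)
unweighted  # 1996: 6, 1997: 5

# Optional alternative (an assumption, not the original method): weight each
# player by games played so that short stints count for less
weighted <- sapply(split(toy, toy$season_year), function(d) {
  weighted.mean(d$pts, d$gp, na.rm = TRUE)
})
weighted  # 1996: about 9.11, 1997: 5
```

In the dplyr pipeline, the weighted variant would correspond to `summarize(pts = weighted.mean(pts, gp, na.rm = TRUE))`; the toy 1996 values show how much the choice can move a season's average when a low-scoring player appeared in only a few games.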
time_series_data <- data_agg %>%
  as_tsibble(index = date)
The final product, time_series_data, is a tsibble object in which each row gives the average points per game for one season (one distinct date). A tsibble is the native input for tidyverts packages such as fable and feasts, and it makes visualising, summarising, and filtering temporal trends simple. The forecast package, by contrast, works with base R ts objects, so the data would need to be converted with ts() before using its functions.
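Because the forecast package (loaded above) operates on base R `ts` objects, a conversion step is needed before calling its functions. A minimal sketch, assuming one value per season, in order, with no missing seasons; the numbers are placeholders, not values from the dataset.

```r
# Placeholder season averages standing in for data_agg$pts (made-up values);
# assumes one value per season, in order, with no missing seasons
avg_pts <- c(7.9, 7.6, 7.4, 7.8, 8.0)
first_season <- 1996  # starting year of the first season (assumption)

# Annual series: frequency = 1 means one observation per season
pts_ts <- ts(avg_pts, start = first_season, frequency = 1)
pts_ts
time(pts_ts)  # 1996 1997 1998 1999 2000
```

With the real data, something like `ts(data_agg$pts, start = as.integer(format(min(data_agg$date), "%Y")), frequency = 1)` would build the series, after which `forecast::forecast(forecast::auto.arima(pts_ts), h = 3)` would produce a three-season-ahead forecast. Whether an ARIMA model suits a short annual series is a separate modelling question.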
ggplot(time_series_data, aes(x = date, y = pts)) +
  geom_line() +
  labs(title = "Points Per Game Over Time", x = "Year", y = "Points Per Game")
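Season-to-season swings can obscure the longer-run movement in this line. One common option is a centered moving average; the sketch below uses base R's `stats::filter()` with a three-season window (an arbitrary illustrative choice) on placeholder values. It calls `stats::filter()` explicitly because loading dplyr masks `filter`, as the startup messages above show.

```r
# Placeholder season averages (made-up values, not from the dataset)
avg_pts <- c(8, 7, 9, 8, 10)

# Centered 3-season moving average: each value is the mean of a season and
# its two neighbours; the first and last seasons lack a full window (NA)
smooth3 <- stats::filter(avg_pts, rep(1 / 3, 3), sides = 2)
as.numeric(smooth3)  # NA 8 8 9 NA (up to floating-point rounding)
```

On the real tsibble, the smoothed values could be added with `mutate(pts_ma3 = as.numeric(stats::filter(pts, rep(1 / 3, 3), sides = 2)))` and drawn with an extra `geom_line(aes(y = pts_ma3), linetype = "dashed")`; ggplot2 may warn about the NA endpoints it drops.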
The plot shows the average points per game (the mean of individual players' per-game averages) from roughly 1996 to 2021. Key observations:

Time Period: The x-axis covers more than 20 years, from just before 2000 to just after 2020.

Data Fluctuations: Average points per game, on the y-axis, change from season to season, reflecting differences in average player scoring across NBA seasons.

Initial Drop and Recovery: The series first falls noticeably, indicating a period of lower scoring, then recovers slowly, suggesting a move towards higher-scoring games.

Mid-Period Decline: A discernible decline begins in the mid-2000s. This may reflect rule changes that affected scoring or a period when defensive tactics were more prevalent.

Noteworthy Rise After 2010: Points per game increase markedly and peak around 2020. Possible contributors include rule changes that encourage offensive play, improvements in player skills and tactics, and a faster game tempo.

Volatility in Recent Years: The most recent seasons show a spike in volatility followed by a drop in scoring. Roster changes or other external factors affecting the league could explain this.

Potential Cyclicality: Scoring appears to rise and fall periodically, but without further analysis it is unclear whether this pattern is statistically meaningful or coincidental.

Rising Trend: Despite the fluctuations, there appears to be a general upward trend, indicating that average scoring has increased over time.
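The "rising trend" and "cyclicality" observations are visual impressions. One simple way to put numbers on them is to fit a linear trend and inspect the autocorrelation of its residuals. The sketch below runs on synthetic data with a known slope (0.1 points per season) and a repeating four-season pattern, so its output can be checked; it is an illustrative approach, not a formal test of either claim.

```r
# Synthetic series with a known structure (made-up values for illustration):
# a linear trend of 0.1 points per season plus a repeating 4-season pattern
year <- 1996:2020
cycle <- rep(c(0.3, 0, -0.3, 0), length.out = length(year))
avg_pts <- 7 + 0.1 * (year - 1996) + cycle

# Linear trend: the slope estimates the average change per season
trend_fit <- lm(avg_pts ~ year)
slope <- unname(coef(trend_fit)["year"])
slope  # 0.1 here, because this synthetic pattern is uncorrelated with year

# Autocorrelation of the detrended residuals; spikes at regular lags
# suggest a repeating pattern that may deserve a closer look
res_acf <- acf(residuals(trend_fit), lag.max = 8, plot = FALSE)
res_acf
```

For the real data, `year` would be the season start year and `avg_pts` would be `data_agg$pts`. `summary(trend_fit)` reports a standard error and p-value for the slope, although autocorrelated residuals can make that p-value overly optimistic, and with only one value per season any apparent cycle is best treated as a hypothesis.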