Loading data

player_data <- read.csv('C:/Users/rohan/OneDrive/Desktop/INTRO TO STATISTICS IN R/DATA SETS/Datasets/Data/Nba_all_seasons_1996_2021.csv')

summary(player_data)
##        X         player_name        team_abbreviation       age       
##  Min.   :    0   Length:12305       Length:12305       Min.   :18.00  
##  1st Qu.: 3076   Class :character   Class :character   1st Qu.:24.00  
##  Median : 6152   Mode  :character   Mode  :character   Median :26.00  
##  Mean   : 6152                                         Mean   :27.08  
##  3rd Qu.: 9228                                         3rd Qu.:30.00  
##  Max.   :12304                                         Max.   :44.00  
##  player_height   player_weight      college            country         
##  Min.   :160.0   Min.   : 60.33   Length:12305       Length:12305      
##  1st Qu.:193.0   1st Qu.: 90.72   Class :character   Class :character  
##  Median :200.7   Median : 99.79   Mode  :character   Mode  :character  
##  Mean   :200.6   Mean   :100.37                                        
##  3rd Qu.:208.3   3rd Qu.:108.86                                        
##  Max.   :231.1   Max.   :163.29                                        
##   draft_year        draft_round        draft_number             gp       
##  Length:12305       Length:12305       Length:12305       Min.   : 1.00  
##  Class :character   Class :character   Class :character   1st Qu.:31.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :57.00  
##                                                           Mean   :51.29  
##                                                           3rd Qu.:73.00  
##                                                           Max.   :85.00  
##       pts              reb              ast           net_rating      
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   :-250.000  
##  1st Qu.: 3.600   1st Qu.: 1.800   1st Qu.: 0.600   1st Qu.:  -6.400  
##  Median : 6.700   Median : 3.000   Median : 1.200   Median :  -1.300  
##  Mean   : 8.173   Mean   : 3.559   Mean   : 1.814   Mean   :  -2.256  
##  3rd Qu.:11.500   3rd Qu.: 4.700   3rd Qu.: 2.400   3rd Qu.:   3.200  
##  Max.   :36.100   Max.   :16.300   Max.   :11.700   Max.   : 300.000  
##     oreb_pct          dreb_pct        usg_pct           ts_pct      
##  Min.   :0.00000   Min.   :0.000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.02100   1st Qu.:0.096   1st Qu.:0.1490   1st Qu.:0.4800  
##  Median :0.04100   Median :0.131   Median :0.1810   Median :0.5240  
##  Mean   :0.05447   Mean   :0.141   Mean   :0.1849   Mean   :0.5111  
##  3rd Qu.:0.08400   3rd Qu.:0.180   3rd Qu.:0.2170   3rd Qu.:0.5610  
##  Max.   :1.00000   Max.   :1.000   Max.   :1.0000   Max.   :1.5000  
##     ast_pct          season         
##  Min.   :0.0000   Length:12305      
##  1st Qu.:0.0660   Class :character  
##  Median :0.1030   Mode  :character  
##  Mean   :0.1314                     
##  3rd Qu.:0.1780                     
##  Max.   :1.0000

Creating a time series column

Converting the season column into a date format suitable for time series analysis. The season column (for example, "1996-97") is used as the time index.

# Loading necessary libraries
library(readr)
## Warning: package 'readr' was built under R version 4.3.2
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(lubridate)
## Warning: package 'lubridate' was built under R version 4.3.2
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(tsibble)
## Warning: package 'tsibble' was built under R version 4.3.2
## 
## Attaching package: 'tsibble'
## The following object is masked from 'package:lubridate':
## 
##     interval
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, union
library(ggplot2)
library(tidyr)
## Warning: package 'tidyr' was built under R version 4.3.2
library(forecast)
## Warning: package 'forecast' was built under R version 4.3.2
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
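# "%Y" reads only the leading year of a label such as "1996-97"; as.Date() fills
# the month and day from the current date, hence the 11-14 in the output below
player_data$date <- as.Date(player_data$season, format = "%Y")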
head(player_data)
##   X       player_name team_abbreviation age player_height player_weight
## 1 0     Dennis Rodman               CHI  36        198.12      99.79024
## 2 1 Dwayne Schintzius               LAC  28        215.90     117.93392
## 3 2      Earl Cureton               TOR  39        205.74      95.25432
## 4 3       Ed O'Bannon               DAL  24        203.20     100.69742
## 5 4       Ed Pinckney               MIA  34        205.74     108.86208
## 6 5     Eddie Johnson               HOU  38        200.66      97.52228
##                       college country draft_year draft_round draft_number gp
## 1 Southeastern Oklahoma State     USA       1986           2           27 55
## 2                     Florida     USA       1990           1           24 15
## 3               Detroit Mercy     USA       1979           3           58  9
## 4                        UCLA     USA       1995           1            9 64
## 5                   Villanova     USA       1985           1           10 27
## 6                    Illinois     USA       1981           2           29 52
##   pts  reb ast net_rating oreb_pct dreb_pct usg_pct ts_pct ast_pct  season
## 1 5.7 16.1 3.1       16.1    0.186    0.323   0.100  0.479   0.113 1996-97
## 2 2.3  1.5 0.3       12.3    0.078    0.151   0.175  0.430   0.048 1996-97
## 3 0.8  1.0 0.4       -2.1    0.105    0.102   0.103  0.376   0.148 1996-97
## 4 3.7  2.3 0.6       -8.7    0.060    0.149   0.167  0.399   0.077 1996-97
## 5 2.4  2.4 0.2      -11.2    0.109    0.179   0.127  0.611   0.040 1996-97
## 6 8.2  2.7 1.0        4.1    0.034    0.126   0.220  0.541   0.102 1996-97
##         date
## 1 1996-11-14
## 2 1996-11-14
## 3 1996-11-14
## 4 1996-11-14
## 5 1996-11-14
## 6 1996-11-14
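
Because the month and day above come from the date on which the code was run, re-running the report on another day shifts every value in the date column. A minimal sketch of a run-independent conversion follows, assuming every season label starts with a four-digit year as in "1996-97". The names season_start_year and season_date are illustrative and not part of the original analysis, and anchoring each season at 1 January is an arbitrary choice.

```r
# Season labels in the form used by the dataset, e.g. "1996-97"
season <- c("1996-97", "1997-98", "2020-21")

# Take the first four characters as the season's starting year
season_start_year <- as.integer(substr(season, 1, 4))

# Anchor each season at 1 January of its starting year
# (an arbitrary but fixed choice, so results do not depend on the run date)
season_date <- as.Date(paste0(season_start_year, "-01-01"))

season_date
```

The same two steps could be applied to player_data$season in place of the as.Date() call with format = "%Y".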

Creating response variable

data_agg <- player_data %>%
  group_by(date) %>%
  summarize(pts = mean(pts, na.rm = TRUE))

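This code groups the dataset by date and computes the mean of pts for each date, ignoring missing values. Because every row of a season is assigned the same date, each value is the average points per game across all players in that season, with every player counted equally regardless of games played. The result is stored in a new data frame called data_agg.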
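
To make the aggregation concrete, the sketch below applies the same group_by() and summarize() pattern to the six rows printed by head(player_data) above. It also computes a mean weighted by games played (gp) as one possible alternative; the weighted version is an illustration, not something the original analysis does, and the names sample_rows, season_summary, pts_mean, pts_weighted, and n_players are hypothetical.

```r
library(dplyr)

# The six rows shown by head(player_data): season, points per game, games played
sample_rows <- data.frame(
  season = "1996-97",
  pts = c(5.7, 2.3, 0.8, 3.7, 2.4, 8.2),
  gp  = c(55, 15, 9, 64, 27, 52)
)

season_summary <- sample_rows %>%
  group_by(season) %>%
  summarize(
    pts_mean     = mean(pts, na.rm = TRUE),                # every player counts equally
    pts_weighted = weighted.mean(pts, gp, na.rm = TRUE),   # weighted by games played
    n_players    = n()                                     # rows contributing to the group
  )

season_summary
```

On these six rows the unweighted mean is lower than the weighted one, because the players with few games also have low scoring averages.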

Creating a tsibble object

time_series_data <- data_agg %>%
  as_tsibble(index = date)

The result, time_series_data, is a tsibble object in which each row holds the average points per game for one distinct date, so it is ready for time series analysis. A tsibble is especially useful with functions and models built on the tsibble infrastructure, such as those in the fable and feasts packages; functions in the forecast package generally expect a ts object instead. It simplifies handling time series data, including filtering, summarizing, and visualizing temporal trends.
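
One point to check is the interval that tsibble infers. With a Date index holding one value per year, the gaps between observations are 365 or 366 days, so tsibble is likely to treat the series as daily data with many implicit gaps. A minimal sketch of an alternative, assuming an integer starting year is used as the index, is shown below. The column name season_start_year is hypothetical, and the pts values are placeholders made up for illustration, not results from the dataset.

```r
library(tsibble)

# Placeholder per-season averages (made-up values, for illustration only)
season_avg <- data.frame(
  season_start_year = 1996:2000,
  pts = c(8.0, 8.1, 7.9, 8.2, 8.3)
)

# An integer year index gives one regularly spaced observation per season
ts_yearly <- as_tsibble(season_avg, index = season_start_year)

# Check that the series has no implicit gaps
has_gaps(ts_yearly)
```

Running has_gaps() on time_series_data in the same way would show whether the Date-based index introduces implicit gaps.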

ggplot(time_series_data, aes(x = date, y = pts)) +
  geom_line() +
  labs(title = "Points Per Game Over Time", x = "Year", y = "Points Per Game")

The plot shows the "Points Per Game" trend from roughly 1996 to 2021.

Time period: The x-axis runs from just before 2000 to just after 2020, covering more than 20 years.

Data fluctuations: The y-axis shows points per game, which change over time. These swings reflect season-to-season differences in the average scoring of NBA players.

Initial drop and recovery: Points per game decline significantly at the start of the series, indicating a period of lower league scoring. A recovery and a slow increase follow, suggesting a move towards higher-scoring games.

Mid-period decline: A discernible decline begins in the mid-2000s. This may reflect rule changes that affected scoring or a period in which defensive tactics were more prevalent.

Noteworthy rise after 2010: Points per game increase significantly and peak around 2020. Possible causes include rule changes that encourage offensive play, advances in player skills and tactics, or a faster tempo.

Volatility in recent years: Volatility rises markedly in the most recent seasons, followed by a drop in points per game. Recent events, roster changes, or other outside factors affecting the league could contribute to this.

Potential cyclicality: The series appears to rise and fall periodically. Without further investigation, however, it is unclear whether this pattern is statistically significant or coincidental.

Rising trend: Despite the fluctuations, there appears to be a general upward trend, indicating that scoring has increased over time.
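
The questions about trend and cyclicality raised above can be given a first, rough check with a linear fit and the sample autocorrelation. The sketch below is one possible approach and not part of the original analysis. The helper trend_check is hypothetical, and the series it is run on is synthetic (generated with a known slope of 0.05), not the NBA data. The p-value from lm() assumes independent residuals, which time series often violate, so it should be read as indicative only.

```r
# Hypothetical helper: fits pts ~ year by ordinary least squares and
# returns the slope, its p-value, and the sample autocorrelation of pts
trend_check <- function(year, pts) {
  fit <- lm(pts ~ year)
  list(
    slope   = unname(coef(fit)["year"]),
    p_value = summary(fit)$coefficients["year", "Pr(>|t|)"],
    acf     = acf(pts, plot = FALSE)
  )
}

# Synthetic series with a known upward slope of 0.05 per year (not real data)
set.seed(1)
year <- 1996:2020
pts  <- 8 + 0.05 * (year - 1996) + rnorm(length(year), sd = 0.1)

result <- trend_check(year, pts)
result$slope
result$p_value
```

With the real data, the call would take the season's starting year and time_series_data$pts; autocorrelation values in result$acf that stand out at regular lags would hint at a cyclic pattern.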