Data_Dive_Week

R Markdown

Loading data

player_data <- read.csv('C:/Users/rohan/OneDrive/Desktop/INTRO TO STATISTICS IN R/DATA SETS/Datasets/Data/Nba_all_seasons_1996_2021.csv')

summary(player_data)

##        X         player_name        team_abbreviation       age       
##  Min.   :    0   Length:12305       Length:12305       Min.   :18.00  
##  1st Qu.: 3076   Class :character   Class :character   1st Qu.:24.00  
##  Median : 6152   Mode  :character   Mode  :character   Median :26.00  
##  Mean   : 6152                                         Mean   :27.08  
##  3rd Qu.: 9228                                         3rd Qu.:30.00  
##  Max.   :12304                                         Max.   :44.00  
##  player_height   player_weight      college            country         
##  Min.   :160.0   Min.   : 60.33   Length:12305       Length:12305      
##  1st Qu.:193.0   1st Qu.: 90.72   Class :character   Class :character  
##  Median :200.7   Median : 99.79   Mode  :character   Mode  :character  
##  Mean   :200.6   Mean   :100.37                                        
##  3rd Qu.:208.3   3rd Qu.:108.86                                        
##  Max.   :231.1   Max.   :163.29                                        
##   draft_year        draft_round        draft_number             gp       
##  Length:12305       Length:12305       Length:12305       Min.   : 1.00  
##  Class :character   Class :character   Class :character   1st Qu.:31.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :57.00  
##                                                           Mean   :51.29  
##                                                           3rd Qu.:73.00  
##                                                           Max.   :85.00  
##       pts              reb              ast           net_rating      
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   :-250.000  
##  1st Qu.: 3.600   1st Qu.: 1.800   1st Qu.: 0.600   1st Qu.:  -6.400  
##  Median : 6.700   Median : 3.000   Median : 1.200   Median :  -1.300  
##  Mean   : 8.173   Mean   : 3.559   Mean   : 1.814   Mean   :  -2.256  
##  3rd Qu.:11.500   3rd Qu.: 4.700   3rd Qu.: 2.400   3rd Qu.:   3.200  
##  Max.   :36.100   Max.   :16.300   Max.   :11.700   Max.   : 300.000  
##     oreb_pct          dreb_pct        usg_pct           ts_pct      
##  Min.   :0.00000   Min.   :0.000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.02100   1st Qu.:0.096   1st Qu.:0.1490   1st Qu.:0.4800  
##  Median :0.04100   Median :0.131   Median :0.1810   Median :0.5240  
##  Mean   :0.05447   Mean   :0.141   Mean   :0.1849   Mean   :0.5111  
##  3rd Qu.:0.08400   3rd Qu.:0.180   3rd Qu.:0.2170   3rd Qu.:0.5610  
##  Max.   :1.00000   Max.   :1.000   Max.   :1.0000   Max.   :1.5000  
##     ast_pct          season         
##  Min.   :0.0000   Length:12305      
##  1st Qu.:0.0660   Class :character  
##  Median :0.1030   Mode  :character  
##  Mean   :0.1314                     
##  3rd Qu.:0.1780                     
##  Max.   :1.0000

Subsetting data

Creating two subsets of data for the same team in different seasons to perform tests.

# Data frame of Chicago Bulls players (all seasons)
set_CB <- subset(player_data, team_abbreviation == "CHI")

# Data frame of season 1996-97
set_s1 <- subset(set_CB, season == "1996-97")

# Data frame of season 2006-07
set_s2 <- subset(set_CB, season == "2006-07")
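The same filtering can be tried on a small stand-in data frame to confirm that it behaves as intended. This is only a sketch: `toy` and every value in it are made up for illustration and are not taken from the NBA file; only the column names `team_abbreviation`, `season`, and `pts` match the real data.

```r
# Toy stand-in for player_data (hypothetical rows, not real NBA values)
toy <- data.frame(
  team_abbreviation = c("CHI", "CHI", "CHI", "LAL", "CHI"),
  season            = c("1996-97", "1996-97", "2006-07", "1996-97", "2010-11"),
  pts               = c(10.2, 4.5, 7.1, 12.0, 3.3)
)

# Same filter as above, written as one combined condition per season
toy_s1 <- subset(toy, team_abbreviation == "CHI" & season == "1996-97")
toy_s2 <- subset(toy, team_abbreviation == "CHI" & season == "2006-07")

# Group sizes are worth checking before any test is run
group_sizes <- c(s1 = nrow(toy_s1), s2 = nrow(toy_s2))
print(group_sizes)
```

Running `nrow(set_s1)` and `nrow(set_s2)` on the real subsets gives the group sizes that every test below depends on.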

HYPOTHESIS

Is there a significant difference in the average points scored per game between players of one team in two different seasons?

H0: The average points scored per game is the same in both seasons.

HA: The average points scored per game differs between the two seasons.

Alpha Level (Significance Level): 0.05. The standard alpha level is often set at 0.05, indicating a 5% chance of rejecting the null hypothesis when it is true. This is a common and widely accepted level in statistical testing.

Power Level: 0.80. A power level of 0.80 is commonly used, indicating an 80% chance of detecting a true effect if it exists. This balance between alpha and power is often considered a reasonable compromise in statistical testing.

Minimum Effect Size: 0.3. The effect size represents the practical significance of the result. Choosing a minimum effect size of 0.3 means you are interested in detecting a moderate effect. This value may be determined based on domain knowledge or previous research.
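The three settings above jointly determine how many players per season are needed. The sketch below uses base R's `power.t.test()` and rests on two assumptions that are not stated in the report: the effect size of 0.3 is read as a standardized mean difference (`delta / sd`), and the group size of 14 is inferred from the degrees of freedom printed in the correlation output later on. `power.t.test()` also assumes equal group sizes and a pooled-variance test, so the numbers are approximate for the Welch test used here.

```r
# Sample size per group needed to detect a standardized difference of 0.3
# (delta / sd = 0.3) with alpha = 0.05 and power = 0.80, two-sided test.
needed <- power.t.test(delta = 0.3, sd = 1, sig.level = 0.05, power = 0.80)
n_per_group <- ceiling(needed$n)

# Power actually achieved with 14 players per group (assumed group size,
# inferred from the degrees of freedom reported further down).
achieved <- power.t.test(n = 14, delta = 0.3, sd = 1, sig.level = 0.05)

cat("players needed per group:", n_per_group, "\n")
cat("power with n = 14:", round(achieved$power, 3), "\n")
```

If the required group size is far above the roster size, a non-significant result says little about whether an effect of that size exists.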

t_test_result <- t.test(set_s1$pts, set_s2$pts)
print(t_test_result)

## 
##  Welch Two Sample t-test
## 
## data:  set_s1$pts and set_s2$pts
## t = 0.10152, df = 26.93, p-value = 0.9199
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5.196564  5.737517
## sample estimates:
## mean of x mean of y 
##  8.313333  8.042857
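As a check on how the output above is read, the p-value can be recomputed from the reported `t` and `df` and compared with the alpha level of 0.05. This is a minimal sketch that copies the printed (rounded) values, so the result matches the output only to about four decimal places.

```r
# Values copied from the Welch t-test output above
t_stat <- 0.10152
dof    <- 26.93
alpha  <- 0.05

# Two-sided p-value: probability of a |t| at least this large under H0
p_value <- 2 * pt(-abs(t_stat), dof)

cat("p-value:", round(p_value, 4), "\n")
if (p_value < alpha) {
  decision <- "reject H0"
} else {
  decision <- "fail to reject H0"
}
cat("decision:", decision, "\n")
```

The 95 percent confidence interval tells the same story: it contains 0, so the data are compatible with no difference in mean points per game.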

HYPOTHESIS

Is there a significant difference in the average assists per game between players of one team in two different seasons?

H0: The average assists per game is the same in both seasons.

HA: The average assists per game differs between the two seasons.

Alpha Level (Significance Level): 0.05. The standard alpha level is often set at 0.05, indicating a 5% chance of rejecting the null hypothesis when it is true. This is a common and widely accepted level in statistical testing.

Power Level: 0.80. A power level of 0.80 is commonly used, indicating an 80% chance of detecting a true effect if it exists. This balance between alpha and power is often considered a reasonable compromise in statistical testing.

Minimum Effect Size: 0.3. The effect size represents the practical significance of the result. Choosing a minimum effect size of 0.3 means you are interested in detecting a moderate effect. This value may be determined based on domain knowledge or previous research.

t_test_results <- t.test(set_s1$ast, set_s2$ast)
print(t_test_results)

## 
##  Welch Two Sample t-test
## 
## data:  set_s1$ast and set_s2$ast
## t = 0.52052, df = 26.388, p-value = 0.607
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.9540067  1.6016257
## sample estimates:
## mean of x mean of y 
##  2.166667  1.842857
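The confidence interval in this output can be rebuilt from the other printed quantities, which shows how the pieces relate: the standard error is the difference in means divided by `t`, and the interval is the difference plus or minus a critical `t` value times that standard error. The sketch below only reuses the rounded numbers printed above.

```r
# Values copied from the Welch t-test output for assists
mean_s1 <- 2.166667
mean_s2 <- 1.842857
t_stat  <- 0.52052
dof     <- 26.388

diff_means <- mean_s1 - mean_s2
std_error  <- diff_means / t_stat          # t = difference / standard error
t_crit     <- qt(0.975, dof)               # critical value for a 95% interval

ci <- diff_means + c(-1, 1) * t_crit * std_error
print(round(ci, 3))
```

Because the interval spans 0, the test does not reject H0 at the 0.05 level, which agrees with the printed p-value of 0.607.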

HYPOTHESIS 1

Is there a correlation between the points scored by a player and the assists made by that player?

H0: There is no linear correlation between a player's assists and points per game (the true correlation equals 0).

HA: There is a linear correlation between a player's assists and points per game (the true correlation is not equal to 0).

Alpha Level (Significance Level): 0.05. The standard alpha level is often set at 0.05, indicating a 5% chance of rejecting the null hypothesis when it is true. This is a common and widely accepted level in statistical testing.

Power Level: 0.80. A power level of 0.80 is commonly used, indicating an 80% chance of detecting a true effect if it exists. This balance between alpha and power is often considered a reasonable compromise in statistical testing.

Minimum Effect Size: 0.3. The effect size represents the practical significance of the result. Choosing a minimum effect size of 0.3 means you are interested in detecting a moderate effect. This value may be determined based on domain knowledge or previous research.
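For a correlation test, the same settings translate into a required number of players. The sketch below uses the standard Fisher z approximation and assumes the minimum effect size of 0.3 is meant as a Pearson correlation; that reading is an assumption, since the report does not say which effect-size measure is intended.

```r
alpha <- 0.05
power <- 0.80
r_min <- 0.3   # minimum correlation of interest (assumed to be Pearson's r)

# Fisher z approximation: n = ((z_alpha/2 + z_beta) / atanh(r))^2 + 3
z_alpha <- qnorm(1 - alpha / 2)
z_beta  <- qnorm(power)
n_needed <- ceiling(((z_alpha + z_beta) / atanh(r_min))^2 + 3)

cat("players needed to detect r = 0.3:", n_needed, "\n")
```

A single-season roster is much smaller than this, so only fairly strong correlations can be detected reliably with these subsets.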

correlation_test <- cor.test(set_s1$pts, set_s1$ast, method = 'pearson')
correlation_test_s2 <- cor.test(set_s2$pts, set_s2$ast, method = 'pearson')
print(correlation_test)

## 
##  Pearson's product-moment correlation
## 
## data:  set_s1$pts and set_s1$ast
## t = 5.0968, df = 13, p-value = 0.0002049
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5227137 0.9368499
## sample estimates:
##       cor 
## 0.8163777

print(correlation_test_s2)

## 
##  Pearson's product-moment correlation
## 
## data:  set_s2$pts and set_s2$ast
## t = 3.0117, df = 12, p-value = 0.01083
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1925575 0.8802540
## sample estimates:
##       cor 
## 0.6561047
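The `t` statistics and confidence intervals printed by `cor.test()` follow directly from the correlation and the sample size. The sketch below reproduces them from the reported values, taking `n = df + 2` (15 players in 1996-97 and 14 in 2006-07); it is a worked check, not a replacement for the calls above.

```r
# Reported correlations and sample sizes (n = df + 2)
r <- c(s1 = 0.8163777, s2 = 0.6561047)
n <- c(s1 = 15, s2 = 14)

# Test statistic: t = r * sqrt((n - 2) / (1 - r^2))
t_stat <- r * sqrt((n - 2) / (1 - r^2))

# 95% interval on the Fisher z scale, transformed back with tanh
z       <- atanh(r)
half    <- qnorm(0.975) / sqrt(n - 3)
ci_low  <- tanh(z - half)
ci_high <- tanh(z + half)

print(round(cbind(t_stat, ci_low, ci_high), 4))
```

Both p-values are below 0.05 and both intervals exclude 0, so the correlation between points and assists is significant in each season, although the intervals are wide because the samples are small.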

HYPOTHESIS 2

Is there a correlation between the points scored by a player and the rebounds made by that player?

H0: There is no linear correlation between a player's rebounds and points per game (the true correlation equals 0).

HA: There is a linear correlation between a player's rebounds and points per game (the true correlation is not equal to 0).

Alpha Level (Significance Level): 0.05. The standard alpha level is often set at 0.05, indicating a 5% chance of rejecting the null hypothesis when it is true. This is a common and widely accepted level in statistical testing.

Power Level: 0.80. A power level of 0.80 is commonly used, indicating an 80% chance of detecting a true effect if it exists. This balance between alpha and power is often considered a reasonable compromise in statistical testing.

Minimum Effect Size: 0.3. The effect size represents the practical significance of the result. Choosing a minimum effect size of 0.3 means you are interested in detecting a moderate effect. This value may be determined based on domain knowledge or previous research.

correlation_test <- cor.test(set_s1$pts, set_s1$reb, method = 'pearson')
correlation_test_s2 <- cor.test(set_s2$pts, set_s2$reb, method = 'pearson')
print(correlation_test)

## 
##  Pearson's product-moment correlation
## 
## data:  set_s1$pts and set_s1$reb
## t = 1.1051, df = 13, p-value = 0.2891
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2579343  0.7001993
## sample estimates:
##       cor 
## 0.2930491

print(correlation_test_s2)

## 
##  Pearson's product-moment correlation
## 
## data:  set_s2$pts and set_s2$reb
## t = 1.3867, df = 12, p-value = 0.1907
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1979722  0.7536202
## sample estimates:
##       cor 
## 0.3716449

FISHER’S TEST FOR SIGNIFICANCE

TEST 1

CT <- table(set_s1$pts, set_s1$ast)

fisher_test_results <- fisher.test(CT, simulate.p.value = FALSE)

p_value <- fisher_test_results$p.value

cat("p: ", p_value, "\n")

## p:  1

if (p_value < 0.05) {
  print("There is a significant association between Average points and Average assists")
} else {
  print("There is not enough evidence to conclude that there is a significant association between Average points and Average assists")
}

## [1] "There is not enough evidence to conclude that there is a significant association between Average points and Average assists"

TEST 2

CT <- table(set_s1$pts, set_s1$reb)

fisher_test_results <- fisher.test(CT, simulate.p.value = FALSE)

p_value <- fisher_test_results$p.value

cat("p: ", p_value, "\n")

## p:  1

if (p_value < 0.05) {
  print("There is a significant association between Average points and Average rebounds")
} else {
  print("There is not enough evidence to conclude that there is a significant association between Average points and Average rebounds")
}

## [1] "There is not enough evidence to conclude that there is a significant association between Average points and Average rebounds"
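Both tests return p = 1, which is what Fisher's exact test tends to produce when `table()` is given continuous values: almost every player has a distinct value, so nearly every row and column of the table sums to 1 and the table carries no information about association. Fisher's test is designed for counts of categorical outcomes. One possible workaround, sketched below, is to split each variable at its median first. The vectors are hypothetical and for illustration only (they are not the Bulls data), and the median cut point is an assumption; a median split also discards information compared with the correlation tests above.

```r
# Hypothetical per-player averages, for illustration only (not from the data)
pts <- c(2.1, 3.4, 5.0, 6.2, 8.8, 11.5, 14.3, 20.1)
ast <- c(0.4, 0.9, 1.1, 0.7, 2.5, 3.0, 2.2, 5.6)

# Tabulating raw numeric values: every cell count is 0 or 1
raw_table <- table(pts, ast)
p_raw <- fisher.test(raw_table)$p.value

# Median split: turn each variable into a two-level factor first
pts_high <- factor(pts > median(pts), levels = c(FALSE, TRUE), labels = c("low", "high"))
ast_high <- factor(ast > median(ast), levels = c(FALSE, TRUE), labels = c("low", "high"))
split_table <- table(pts_high, ast_high)
p_split <- fisher.test(split_table)$p.value

print(dim(raw_table))
print(split_table)
cat("p (raw values):", p_raw, "\n")
cat("p (median split):", p_split, "\n")
```

Applied to the real subsets, the same pattern would be `table(set_s1$pts > median(set_s1$pts), set_s1$ast > median(set_s1$ast))`, giving a 2 x 2 table that Fisher's test can meaningfully evaluate.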

VISUALIZATION 1

library(ggplot2)

# Create a scatter plot with a regression line
ggplot(set_s1, aes(x = ast, y = pts)) +
  geom_point() +
  geom_smooth(method = "lm", color = "blue", se = FALSE) +
  labs(
    title = "Relationship Between Assists and Points Per Game",
    x = "Assists",
    y = "Points Per Game"
  ) +
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

VISUALIZATION 2

library(ggplot2)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

# Summarise per player; bare column names are used so the grouping is respected
data_summary <- set_s1 %>%
  group_by(player_name) %>%
  summarise(TotalRebounds = sum(reb), TotalPointsPerGame = sum(pts))

# Create a bar chart from the summarised columns
ggplot(data_summary, aes(x = player_name, y = TotalPointsPerGame, fill = TotalRebounds)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(
    title = "Total Points Per Game vs. Total Rebounds by Player",
    x = "Player",
    y = "Total Points Per Game",
    fill = "Total Rebounds"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for better visibility

Data_Dive_Week_7

Rohan Royal

2023-10-09
