Loading data
player_data <- read.csv('C:/Users/rohan/OneDrive/Desktop/INTRO TO STATISTICS IN R/DATA SETS/Datasets/Data/Nba_all_seasons_1996_2021.csv')
summary(player_data)
## X player_name team_abbreviation age
## Min. : 0 Length:12305 Length:12305 Min. :18.00
## 1st Qu.: 3076 Class :character Class :character 1st Qu.:24.00
## Median : 6152 Mode :character Mode :character Median :26.00
## Mean : 6152 Mean :27.08
## 3rd Qu.: 9228 3rd Qu.:30.00
## Max. :12304 Max. :44.00
## player_height player_weight college country
## Min. :160.0 Min. : 60.33 Length:12305 Length:12305
## 1st Qu.:193.0 1st Qu.: 90.72 Class :character Class :character
## Median :200.7 Median : 99.79 Mode :character Mode :character
## Mean :200.6 Mean :100.37
## 3rd Qu.:208.3 3rd Qu.:108.86
## Max. :231.1 Max. :163.29
## draft_year draft_round draft_number gp
## Length:12305 Length:12305 Length:12305 Min. : 1.00
## Class :character Class :character Class :character 1st Qu.:31.00
## Mode :character Mode :character Mode :character Median :57.00
## Mean :51.29
## 3rd Qu.:73.00
## Max. :85.00
## pts reb ast net_rating
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. :-250.000
## 1st Qu.: 3.600 1st Qu.: 1.800 1st Qu.: 0.600 1st Qu.: -6.400
## Median : 6.700 Median : 3.000 Median : 1.200 Median : -1.300
## Mean : 8.173 Mean : 3.559 Mean : 1.814 Mean : -2.256
## 3rd Qu.:11.500 3rd Qu.: 4.700 3rd Qu.: 2.400 3rd Qu.: 3.200
## Max. :36.100 Max. :16.300 Max. :11.700 Max. : 300.000
## oreb_pct dreb_pct usg_pct ts_pct
## Min. :0.00000 Min. :0.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.02100 1st Qu.:0.096 1st Qu.:0.1490 1st Qu.:0.4800
## Median :0.04100 Median :0.131 Median :0.1810 Median :0.5240
## Mean :0.05447 Mean :0.141 Mean :0.1849 Mean :0.5111
## 3rd Qu.:0.08400 3rd Qu.:0.180 3rd Qu.:0.2170 3rd Qu.:0.5610
## Max. :1.00000 Max. :1.000 Max. :1.0000 Max. :1.5000
## ast_pct season
## Min. :0.0000 Length:12305
## 1st Qu.:0.0660 Class :character
## Median :0.1030 Mode :character
## Mean :0.1314
## 3rd Qu.:0.1780
## Max. :1.0000
Creating two subsets of the data: the same team (Chicago Bulls) in two different seasons, on which the tests are performed.
# Data frame of Chicago Bulls players (all seasons)
set_CB <- subset(player_data, team_abbreviation == "CHI")
# Data frame of the 1996-97 season
set_s1 <- subset(set_CB, season == "1996-97")
# Data frame of the 2006-07 season
set_s2 <- subset(set_CB, season == "2006-07")
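The tests below depend heavily on how many players each season contributes, so check the subset sizes before running them. The following is a minimal base-R sketch that builds both season subsets in one step with `%in%` and `split()` and then counts the rows in each. The data frame `toy` and its values are hypothetical stand-ins for `player_data`, used only so the example runs on its own.

```r
# Hypothetical toy rows standing in for player_data (illustrative values only)
toy <- data.frame(
  player_name = c("A", "B", "C", "D", "E", "F"),
  team_abbreviation = c("CHI", "CHI", "CHI", "BOS", "CHI", "CHI"),
  season = c("1996-97", "1996-97", "2006-07", "1996-97", "2006-07", "2001-02"),
  pts = c(10.2, 3.1, 7.4, 12.0, 15.3, 6.6),
  stringsAsFactors = FALSE
)

# Keep only Chicago rows from the two seasons being compared
chi_two <- subset(toy, team_abbreviation == "CHI" &
                    season %in% c("1996-97", "2006-07"))

# split() returns a named list with one data frame per season
by_season <- split(chi_two, chi_two$season)

# Number of players available in each season
sizes <- sapply(by_season, nrow)
print(sizes)
```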
Is there a significant difference in the average points scored per game between players of one team in different seasons?
H0: The mean points per game is the same in both seasons.
HA: The mean points per game differs between the two seasons.
Alpha level (significance level): 0.05. This is the conventional choice. It means accepting a 5% chance of rejecting the null hypothesis when it is actually true (a Type I error).
Power: 0.80. This is also conventional. It means an 80% chance of detecting an effect at least as large as the minimum effect size below, if such an effect exists. Combined with alpha = 0.05, it is widely regarded as a reasonable balance between Type I and Type II errors.
Minimum effect size: 0.3. The effect size expresses how large a difference must be to matter in practice. For a difference in means it is read here as Cohen's d = 0.3, a difference of 0.3 standard deviations, which falls between Cohen's conventional small (0.2) and medium (0.5) benchmarks. Ideally the value is justified by domain knowledge or previous research.
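These three settings together determine the sample size the test needs. A base-R sketch using `stats::power.t.test()` follows. It assumes the minimum effect size is Cohen's d, so with `sd = 1` the `delta` argument equals d. It computes the number of players per season required for 80% power, and then the power actually available with about 15 players per season, which is roughly the size of each Chicago subset.

```r
# Assumption: the minimum effect size 0.3 is Cohen's d (difference in means
# measured in standard deviations), so with sd = 1, delta equals d.
required <- power.t.test(delta = 0.3, sd = 1, sig.level = 0.05, power = 0.80,
                         type = "two.sample", alternative = "two.sided")
print(ceiling(required$n))  # players needed in EACH season

# Power available with roughly 15 players per season
available <- power.t.test(n = 15, delta = 0.3, sd = 1, sig.level = 0.05,
                          type = "two.sample", alternative = "two.sided")
print(available$power)
```

Under these assumptions the required sample is roughly 175 players per season, far more than a single team roster provides. A non-significant result from these tests therefore says little about whether a difference of this size exists.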
t_test_result <- t.test(set_s1$pts, set_s2$pts)
print(t_test_result)
##
## Welch Two Sample t-test
##
## data: set_s1$pts and set_s2$pts
## t = 0.10152, df = 26.93, p-value = 0.9199
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -5.196564 5.737517
## sample estimates:
## mean of x mean of y
## 8.313333 8.042857
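`t.test()` runs Welch's version by default (`var.equal = FALSE`), which explains the fractional degrees of freedom (26.93) in the output. Since p = 0.92 > 0.05, H0 is not rejected. The sketch below computes the Welch statistic, the Welch-Satterthwaite degrees of freedom and the two-sided p-value by hand, then compares them with `t.test()`. The vectors `x` and `y` are hypothetical values chosen for illustration.

```r
# Hypothetical per-player points for two seasons (illustrative values only)
x <- c(12.1, 3.4, 8.8, 20.5, 1.2, 6.0)
y <- c(9.7, 2.2, 15.1, 4.8, 7.5)

# Squared standard error of each sample mean (variances are NOT pooled)
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch's t statistic
t_manual <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom
df_manual <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-sided p-value
p_manual <- 2 * pt(-abs(t_manual), df_manual)

# Built-in Welch test for comparison
builtin <- t.test(x, y)
cat("t:", t_manual, "df:", df_manual, "p:", p_manual, "\n")
```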
Is there a significant difference in the average assists per game between players of one team in different seasons?
H0: The mean assists per game is the same in both seasons.
HA: The mean assists per game differs between the two seasons.
Alpha level: 0.05, power: 0.80 and minimum effect size: 0.3 (Cohen's d). These settings are the same as for the points test, for the same reasons.
t_test_results <- t.test(set_s1$ast, set_s2$ast)
print(t_test_results)
##
## Welch Two Sample t-test
##
## data: set_s1$ast and set_s2$ast
## t = 0.52052, df = 26.388, p-value = 0.607
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.9540067 1.6016257
## sample estimates:
## mean of x mean of y
## 2.166667 1.842857
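Here p = 0.607 > 0.05, so H0 is not rejected for assists either. To relate an observed difference to the stated minimum effect size of 0.3, compute Cohen's d directly. The sketch below uses a pooled standard deviation. The helper name `cohens_d` is hypothetical and written for this example, and the two vectors are made-up values. With the real data you would call it as `cohens_d(set_s1$ast, set_s2$ast)`.

```r
# Cohen's d: difference in means divided by the pooled standard deviation
# (cohens_d is a hypothetical helper written for this example)
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Hypothetical assists per game for two seasons (illustrative values only)
s1_ast <- c(1, 2, 3, 4, 5)
s2_ast <- c(2, 3, 4, 5, 6)

d <- cohens_d(s1_ast, s2_ast)
print(d)
```

In this toy example the means differ by 1 and both samples have variance 2.5, so d = -1/sqrt(2.5), about -0.63. Its absolute value is compared against the 0.3 threshold.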
Is there a correlation between the points a player scores and the assists the player makes?
H0: There is no linear correlation between points per game and assists per game (ρ = 0).
HA: Points per game and assists per game are linearly correlated (ρ ≠ 0).
Alpha level: 0.05 and power: 0.80, as for the t-tests. Minimum effect size: 0.3, read here as a correlation coefficient of r = 0.3, which Cohen's conventions classify as a medium effect. The test is run separately for each season.
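For correlations, the sample size needed to detect r = 0.3 can be approximated with the Fisher z-transformation: n = ((z_{1-α/2} + z_{power}) / atanh(r))² + 3. The following base-R sketch uses that normal approximation, not an exact power routine.

```r
# Approximate sample size to detect a correlation of r = 0.3 with a
# two-sided test, via the Fisher z-transformation (normal approximation)
r_min <- 0.3
alpha_level <- 0.05
target_power <- 0.80

n_needed <- ((qnorm(1 - alpha_level / 2) + qnorm(target_power)) / atanh(r_min))^2 + 3
print(ceiling(n_needed))  # 85
```

Each season contributes only about 15 players, so only correlations well above 0.3 are likely to be detected.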
# Pearson correlation between points and assists, separately for each season
correlation_test <- cor.test(set_s1$pts, set_s1$ast, method = "pearson")
correlation_test_s2 <- cor.test(set_s2$pts, set_s2$ast, method = "pearson")
print(correlation_test)
##
## Pearson's product-moment correlation
##
## data: set_s1$pts and set_s1$ast
## t = 5.0968, df = 13, p-value = 0.0002049
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5227137 0.9368499
## sample estimates:
## cor
## 0.8163777
print(correlation_test_s2)
##
## Pearson's product-moment correlation
##
## data: set_s2$pts and set_s2$ast
## t = 3.0117, df = 12, p-value = 0.01083
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1925575 0.8802540
## sample estimates:
## cor
## 0.6561047
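Both correlations are positive and significant at 0.05. For a Pearson test, `cor.test()` computes the statistic t = r·sqrt(n − 2)/sqrt(1 − r²) on n − 2 degrees of freedom, and builds the confidence interval on the Fisher z scale. The sketch below reproduces the 1996-97 result from the printed estimate r = 0.8163777. It assumes n = 15, which follows from the reported df = 13.

```r
# Recompute the 1996-97 points-assists test from the reported r and n
r <- 0.8163777  # sample estimate printed by cor.test
n <- 15         # assumption: df = 13 = n - 2

# Test statistic and two-sided p-value for H0: rho = 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_val <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% confidence interval via the Fisher z-transformation
z <- atanh(r)
half_width <- qnorm(0.975) / sqrt(n - 3)
ci <- tanh(c(z - half_width, z + half_width))

cat("t =", t_stat, " p =", p_val, " CI =", ci, "\n")
```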
Is there a correlation between the points a player scores and the rebounds the player collects?
H0: There is no linear correlation between points per game and rebounds per game (ρ = 0).
HA: Points per game and rebounds per game are linearly correlated (ρ ≠ 0).
Alpha level: 0.05 and power: 0.80, as for the earlier tests. Minimum effect size: 0.3, read as a correlation coefficient of r = 0.3 (a medium effect by Cohen's conventions). The test is run separately for each season.
# Pearson correlation between points and rebounds, separately for each season
correlation_test <- cor.test(set_s1$pts, set_s1$reb, method = "pearson")
correlation_test_s2 <- cor.test(set_s2$pts, set_s2$reb, method = "pearson")
print(correlation_test)
##
## Pearson's product-moment correlation
##
## data: set_s1$pts and set_s1$reb
## t = 1.1051, df = 13, p-value = 0.2891
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2579343 0.7001993
## sample estimates:
## cor
## 0.2930491
print(correlation_test_s2)
##
## Pearson's product-moment correlation
##
## data: set_s2$pts and set_s2$reb
## t = 1.3867, df = 12, p-value = 0.1907
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1979722 0.7536202
## sample estimates:
## cor
## 0.3716449
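Neither rebound correlation is significant (p = 0.29 and p = 0.19). Rearranging t = r·sqrt(n − 2)/sqrt(1 − r²) gives the smallest |r| that reaches p < 0.05 for a given sample size. The sketch below applies that formula to the two seasons, assuming n = 15 and n = 14 as implied by the reported degrees of freedom. The helper name `critical_r` is hypothetical.

```r
# Smallest |r| reaching two-sided significance at level alpha, obtained by
# solving t = r * sqrt(dof) / sqrt(1 - r^2) for r at the critical t value
# (critical_r is a hypothetical helper written for this example)
critical_r <- function(n, alpha = 0.05) {
  dof <- n - 2
  t_crit <- qt(1 - alpha / 2, dof)
  t_crit / sqrt(dof + t_crit^2)
}

# Assumed sample sizes: df = 13 and df = 12 in the outputs above
r_crit <- c(season_1996_97 = critical_r(15), season_2006_07 = critical_r(14))
print(round(r_crit, 3))
```

Both thresholds come out just above 0.5. The observed rebound correlations (0.29 and 0.37) fall below them, while the assist correlations (0.82 and 0.66) exceed them, which matches the p-values reported by `cor.test()`.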
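TEST 1: Fisher's exact test of association between points and assists (1996-97 season)
Caution: table() applied to raw numeric values treats every distinct value as a separate category, so the contingency table built here consists almost entirely of 0s and 1s. Fisher's test has essentially no power on such a table. A p-value of 1 is therefore not evidence of no association, particularly since the Pearson test found a strong correlation between the same two variables in this season.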
CT <- table(set_s1$pts, set_s1$ast)
fisher_test_results <- fisher.test(CT, simulate.p.value = FALSE)
p_value <- fisher_test_results$p.value
cat("p: ", p_value, "\n")
## p: 1
if (p_value < 0.05) {
  print("There is a significant association between Average points and Average assists")
} else {
  print("There is not enough evidence to conclude that there is a significant association between Average points and Average assists")
}
## [1] "There is not enough evidence to conclude that there is a significant association between Average points and Average assists"
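The sketch below shows why a raw-value table behaves this way. It uses hypothetical values made up for illustration, not taken from the dataset. When no two players share a value, `table()` produces a square table in which every row and column total is 1. Every table with those margins is a rearrangement of the same pattern and has the same probability under H0, so Fisher's exact test returns p = 1 however the values are paired.

```r
# Hypothetical per-game averages for eight players (illustrative values only)
pts <- c(3.1, 5.4, 7.9, 9.2, 12.6, 15.0, 18.3, 24.1)
ast <- c(0.4, 0.9, 1.1, 1.8, 2.5, 3.0, 4.2, 5.6)

# Every distinct value becomes its own row or column category
CT_raw <- table(pts, ast)
print(dim(CT_raw))      # 8 x 8
print(range(CT_raw))    # cells hold only 0 or 1
print(rowSums(CT_raw))  # each row contains exactly one player
```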
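TEST 2: Fisher's exact test of association between points and rebounds (1996-97 season)
As in Test 1, the table is built from raw numeric values, so each distinct value becomes its own category and the test has very little power.
CT <- table(set_s1$pts, set_s1$reb)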
fisher_test_results <- fisher.test(CT, simulate.p.value = FALSE)
p_value <- fisher_test_results$p.value
cat("p: ", p_value, "\n")
## p: 1
if (p_value < 0.05) {
  print("There is a significant association between Average points and Average rebounds")
} else {
  print("There is not enough evidence to conclude that there is a significant association between Average points and Average rebounds")
}
## [1] "There is not enough evidence to conclude that there is a significant association between Average points and Average rebounds"
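One common workaround is to dichotomise each variable at its median and run Fisher's test on the resulting 2 x 2 table. It is sketched here as an alternative, not as the method used above. The vectors are hypothetical. Dichotomising discards information, so for two continuous variables the Pearson test remains the more natural tool. The sketch mainly shows how to build a table on which `fisher.test()` is meaningful.

```r
# Hypothetical per-game averages for eight players (illustrative values only)
pts <- c(3.1, 5.4, 7.9, 9.2, 12.6, 15.0, 18.3, 24.1)
reb <- c(1.2, 0.8, 2.9, 2.1, 4.4, 3.6, 5.0, 6.3)

# Split each variable at its median into "low" and "high"
pts_level <- factor(ifelse(pts > median(pts), "high", "low"), levels = c("low", "high"))
reb_level <- factor(ifelse(reb > median(reb), "high", "low"), levels = c("low", "high"))

# 2 x 2 contingency table and Fisher's exact test
CT_binned <- table(points = pts_level, rebounds = reb_level)
print(CT_binned)

fisher_binned <- fisher.test(CT_binned)
print(fisher_binned$p.value)
```

In this toy data the split is perfect (4 low/low, 4 high/high), yet the two-sided p-value is only 2/70, about 0.029. With so few players, even a complete separation yields only a modest p-value.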
library(ggplot2)
# Create a scatter plot with a regression line
ggplot(set_s1, aes(x = ast, y = pts)) +
  geom_point() +
  geom_smooth(method = "lm", color = "blue", se = FALSE) +
  labs(
    title = "Relationship Between Assists and Points Per Game",
    x = "Assists",
    y = "Points Per Game"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
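As the message confirms, `geom_smooth(method = "lm")` fits `y ~ x`, here `pts ~ ast`. Fitting the same model with `lm()` gives the slope and R² behind the blue line. In simple linear regression, R² equals the squared Pearson correlation, so for 1996-97 it is about 0.816² ≈ 0.67. The vectors below are hypothetical values for illustration.

```r
# Hypothetical per-game averages (illustrative values only)
ast <- c(0.4, 0.9, 1.1, 1.8, 2.5, 3.0, 4.2, 5.6)
pts <- c(3.1, 7.9, 5.4, 9.2, 15.0, 12.6, 18.3, 24.1)

# Same model that geom_smooth(method = "lm") draws: pts ~ ast
fit <- lm(pts ~ ast)
print(coef(fit))  # intercept and extra points per additional assist

# For simple linear regression, R-squared equals the squared Pearson correlation
r_squared <- summary(fit)$r.squared
print(c(r_squared = r_squared, cor_squared = cor(ast, pts)^2))
```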
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Summarise per player. Refer to columns by name (not set_s1$...) so that
# sum() is evaluated within each player's group rather than over the whole column
data_summary <- set_s1 %>%
  group_by(player_name) %>%
  summarise(TotalRebounds = sum(reb), TotalPointsPerGame = sum(pts))

# Create a bar chart from the summarised columns
ggplot(data_summary, aes(x = player_name, y = TotalPointsPerGame, fill = TotalRebounds)) +
  geom_col() +
  labs(
    title = "Total Points Per Game vs. Total Rebounds by Player",
    x = "Player",
    y = "Total Points Per Game",
    fill = "Total Rebounds"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels for better visibility