Week 6 Data Dive - Confidence Intervals

For this week’s data dive I will be exploring confidence intervals. This will entail completing the following tasks:

Forming Two Pairs of Columns

My first pair contains the following:

Explanatory variable: FGA_per_100

Response Variable: PTS_per_100

Created Column: Assist_to_Turnover_Ratio (AST_per_100 / TOV_per_100)

I chose this because shot attempts are connected to scoring, so more shot attempts should be strongly associated with more scoring. Additionally, I chose to use the Assist and Turnover columns since the Assist to Turnover Ratio is a common advanced metric used to assess a player or team’s offensive ability.

My second pair contains the following:

Original Variable: X3pa_per_100 (3pt. attempts)

Created Column: Three_Point_Rate (X3pa_per_100 / FGA_per_100)

I chose this because the 3pt. rate for a given team can tell you a lot about the types of shots they take and give you a look into their overall shot diet. Additionally, we can see how a higher 3pt. rate may affect a team’s 3pt Percentage.

# Assist to Turnover Ratio
nba$Assist_to_Turnover_Ratio <- nba$AST_per_100 / nba$TOV_per_100

# 3-Point Attempt Rate
nba$Three_Point_Rate <- nba$X3pa_per_100 / nba$FGA_per_100

head(nba)
##    Season League                Team Abbreviation Playoffs Games Minutes_Played
## 1 2023-24    NBA       Atlanta Hawks          ATL    FALSE    82          19855
## 2 2023-24    NBA      Boston Celtics          BOS     TRUE    82          19830
## 3 2023-24    NBA       Brooklyn Nets          BRK    FALSE    82          19805
## 4 2023-24    NBA       Chicago Bulls          CHI    FALSE    82          19980
## 5 2023-24    NBA   Charlotte Hornets          CHO    FALSE    82          19730
## 6 2023-24    NBA Cleveland Cavaliers          CLE    FALSE    82          19805
##   FG_per_100 FGA_per_100 FG_Percent X3p_per_100 X3pa_per_100 X3p_Percent
## 1       42.6        91.6      0.465        13.6         37.3       0.364
## 2       44.9        92.1      0.487        16.8         43.4       0.388
## 3       41.7        91.4      0.456        13.6         37.6       0.362
## 4       43.0        91.6      0.470        11.7         32.8       0.358
## 5       41.0        89.2      0.460        12.4         34.9       0.355
## 6       42.7        89.2      0.479        13.8         37.6       0.367
##   X2p_per_100 X2pa_per_100 X2p_Percent FT_per_100 FTA_per_100 FT_Percent
## 1        29.0         54.3       0.535       18.4        23.0      0.797
## 2        28.0         48.8       0.575       16.6        20.6      0.807
## 3        28.1         53.7       0.522       16.2        21.4      0.756
## 4        31.3         58.7       0.532       17.1        21.6      0.791
## 5        28.7         54.3       0.528       14.9        18.9      0.786
## 6        28.9         51.5       0.561       15.9        20.8      0.765
##   ORB_per_100 DRB_per_100 TRB_per_100 AST_per_100 STL_per_100 BLK_per_100
## 1        12.4        31.9        44.2        26.3         7.4         4.5
## 2        10.9        36.4        47.3        27.5         6.9         6.7
## 3        11.7        33.4        45.2        26.3         7.0         5.3
## 4        11.4        33.4        44.8        25.6         8.0         4.9
## 5         9.6        31.7        41.3        25.4         7.0         4.6
## 6        10.1        34.2        44.3        28.7         7.5         4.7
##   TOV_per_100 PF_per_100 PTS_per_100 Assist_to_Turnover_Ratio Three_Point_Rate
## 1        13.4       18.4       117.2                 1.962687        0.4072052
## 2        12.2       16.5       123.2                 2.254098        0.4712269
## 3        13.5       19.0       113.2                 1.948148        0.4113786
## 4        12.5       19.2       114.9                 2.048000        0.3580786
## 5        14.1       18.4       109.3                 1.801418        0.3912556
## 6        13.9       17.9       115.2                 2.064748        0.4215247
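
As a quick sanity check on the two created columns, the sketch below rebuilds them from the six rows printed above. The data frame name `check` is only an illustrative placeholder, and the values are copied by hand from the head(nba) output rather than read from the full dataset.

```r
# Values copied from the head(nba) output above (first six rows only).
# `check` is a hypothetical name used just for this sanity check.
check <- data.frame(
  Abbreviation = c("ATL", "BOS", "BRK", "CHI", "CHO", "CLE"),
  AST_per_100  = c(26.3, 27.5, 26.3, 25.6, 25.4, 28.7),
  TOV_per_100  = c(13.4, 12.2, 13.5, 12.5, 14.1, 13.9),
  X3pa_per_100 = c(37.3, 43.4, 37.6, 32.8, 34.9, 37.6),
  FGA_per_100  = c(91.6, 92.1, 91.4, 91.6, 89.2, 89.2)
)

# Same formulas as the created columns in the main analysis
check$Assist_to_Turnover_Ratio <- check$AST_per_100 / check$TOV_per_100
check$Three_Point_Rate <- check$X3pa_per_100 / check$FGA_per_100

check[, c("Abbreviation", "Assist_to_Turnover_Ratio", "Three_Point_Rate")]
```

The two computed columns should agree with the last two columns of the printed output, which confirms that each ratio is a simple row-by-row division.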

Insights, Significance, and Questions

There aren’t really any insights to discuss until we get to the visualizations. All I will say is that I anticipate seeing positive associations for each of the pairs of columns. These pairs should shed some light on how important an assist to turnover ratio is and how a team’s 3pt. rate relates to their 3pt. percentage. I would be most curious to see if a team that has a higher 3pt. rate would also have a higher 3pt. percentage or if they will just shoot more 3s.

Visualizations

First Visualization: Illustrates how more shot attempts often lead to a higher scoring output. The positive correlation is not very strong, but there is still a positive association between shooting more and scoring more points. I would have expected a stronger positive association between putting up more shots and scoring more points. Perhaps this means that shot volume isn’t nearly as important as having a higher field goal percentage. I will also add that I don’t notice any obvious outliers in this visualization.

plot(nba$FGA_per_100, nba$PTS_per_100,
     main = "Shot Attempts vs Scoring Output",
     xlab = "Field Goal Attempts per 100",
     ylab = "Points per 100",
     pch = 19)

abline(lm(nba$PTS_per_100 ~ nba$FGA_per_100), col = "blue", lwd = 2)

Second Visualization: Connects with the first visualization by taking the response variable, Points per 100 possessions, and plotting it against the created column of the Assist to Turnover Ratio. In the visual we can see a fairly strong and positive association between a higher Assist to Turnover Ratio and scoring more points. This would make sense as it would essentially mean that teams that are more successful with their passing score more points. I will once again add that I don’t notice any obvious outliers.

plot(nba$Assist_to_Turnover_Ratio, nba$PTS_per_100,
     main = "Assist-to-Turnover Ratio vs Scoring",
     xlab = "Assist-to-Turnover Ratio",
     ylab = "Points per 100",
     pch = 19)

abline(lm(nba$PTS_per_100 ~ nba$Assist_to_Turnover_Ratio),
       col = "red", lwd = 2)

Third Visualization: This visualization moves on from the first pair and covers the second pair of 3pt. rate and 3pt. percentage. The visual illustrates a slightly positive association between taking more 3s leading to having a higher 3pt. percentage. However, I will point out that I believe the history of teams that didn’t shoot 3s back when the 3 pointer was first being introduced is skewing this analysis some. If these older teams were removed from the visual it would likely be less positive and would mean that typically there isn’t much of a reason to believe that shooting more 3s leads to a higher 3pt. percentage.

plot(nba$Three_Point_Rate, nba$X3p_Percent,
     main = "3-Point Rate vs 3-Point Percentage",
     xlab = "3-Point Attempt Rate",
     ylab = "3-Point Percentage",
     pch = 19)

abline(lm(nba$X3p_Percent ~ nba$Three_Point_Rate),
       col = "darkgreen", lwd = 2)

Insights, Significance, and Questions

Insights that I gather from these visualizations are that a higher Assist to Turnover Ratio is associated with scoring more points and that outliers can have a big impact on the analysis that we see. I would say the impact of outliers is very significant for not only the analysis but subsequently the conclusions that we draw. My question would be: How much would the third visualization be impacted if we removed any team from before 1990? How much if we remove every team before 2000?
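
One way to approach that question is to filter by the starting year of each season and recompute the correlation. The sketch below is only a rough outline: `cor_from_year` is a hypothetical helper, it assumes Season is stored as text like "2023-24", and the `toy` rows are made-up values used only so the code runs on its own. They are not real NBA data and say nothing about the actual result.

```r
# Hypothetical helper: correlation between 3-point rate and 3-point percentage
# using only seasons that start in or after `min_year`.
# Assumes Season looks like "2023-24", so the first 4 characters are the start year.
cor_from_year <- function(df, min_year) {
  start_year <- as.integer(substr(as.character(df$Season), 1, 4))
  recent <- df[start_year >= min_year, ]
  cor(recent$Three_Point_Rate, recent$X3p_Percent, use = "complete.obs")
}

# Made-up toy rows (NOT real NBA values), only to make the sketch runnable
toy <- data.frame(
  Season = c("1985-86", "1986-87", "1995-96", "2005-06", "2015-16", "2023-24"),
  Three_Point_Rate = c(0.03, 0.05, 0.20, 0.21, 0.28, 0.40),
  X3p_Percent = c(0.25, 0.29, 0.36, 0.35, 0.36, 0.36),
  stringsAsFactors = FALSE
)

cor_all  <- cor_from_year(toy, 1900)  # every season in the toy data
cor_1990 <- cor_from_year(toy, 1990)  # seasons starting in 1990 or later
c(all = cor_all, from_1990 = cor_1990)
```

With the real data, the same helper could be called as cor_from_year(nba, 1990) and cor_from_year(nba, 2000) to compare against the full-sample correlation.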

Correlation Coefficients

The correlation for the first visualization of field goal attempts and scoring is 0.1705605. This makes sense as it relates to what I said in the section above. There is some positive correlation, but it is weak.

The correlation for the second visualization of assist to turnover ratio and scoring is 0.7457319. This also makes sense when looking at the visual. There is certainly a strong correlation between having a higher assist to turnover ratio and scoring more points.

The correlation for the third visualization of 3pt. attempt rate and 3pt. percentage is 0.6569712. This does make sense based on the visual, but like I said before I would take this result with a grain of salt. I believe that the older teams that didn’t shoot many 3s are skewing the results of this visual.

# Correlation 1
cor_FGA_PTS <- cor(nba$FGA_per_100, nba$PTS_per_100, use = "complete.obs")
cor_FGA_PTS
## [1] 0.1705605
# Correlation 2
cor_ASTTOV_PTS <- cor(nba$Assist_to_Turnover_Ratio,
                      nba$PTS_per_100,
                      use = "complete.obs")
cor_ASTTOV_PTS
## [1] 0.7457319
# Correlation 3
cor_3Rate_3Pct <- cor(nba$Three_Point_Rate,
                      nba$X3p_Percent,
                      use = "complete.obs")
cor_3Rate_3Pct
## [1] 0.6569712
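
Since this week’s topic is confidence intervals, a correlation can be given an interval too. The sketch below uses the Fisher z-transformation, which is the same approach cor.test() uses for Pearson correlations. `cor_ci` is a hypothetical helper name, and n = 1402 is an assumption taken from the number of rows with a PTS_per_100 value; the actual number of complete pairs for each correlation could be smaller.

```r
# Hypothetical helper: approximate confidence interval for a Pearson
# correlation `r` computed from `n` complete pairs (Fisher z-transformation).
cor_ci <- function(r, n, conf_level = 0.95) {
  z <- atanh(r)                              # transform r to the z scale
  se <- 1 / sqrt(n - 3)                      # standard error on the z scale
  crit <- qnorm(1 - (1 - conf_level) / 2)    # normal critical value
  tanh(c(lower = z - crit * se, upper = z + crit * se))  # back to the r scale
}

# n = 1402 is an ASSUMPTION (rows with a PTS_per_100 value), not a confirmed count
cor_ci(0.7457319, n = 1402)
```

When the raw columns are available, cor.test(nba$Assist_to_Turnover_Ratio, nba$PTS_per_100) reports this kind of interval directly and handles the count of complete pairs itself.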

Insights, Significance, and Questions

The insights I gather from this are that there is a strong correlation between having a high Assist to Turnover Ratio and scoring more. Additionally, I noticed that there is a correlation between having a higher 3pt. rate and having a higher 3pt. percentage. However, as I have said, I believe this is skewed by the older NBA teams in the data. This is significant in showing that there is value in the analysis, but there can also be outliers that skew data and give you results that are less meaningful. This would lead me to ask: When does the correlation between shooting more 3s and having a higher 3pt. percentage stop being as strong?

Confidence Interval

The first confidence interval is for the response variable of PTS_per_100 (scoring). We are 95% confident that the true mean points per 100 possessions falls between 106.3453 and 106.8398. The sample estimates that the mean is 106.5925.

The second confidence interval is for the response variable of X3p_Percent (3pt. Percentage). We are 95% confident that the true mean 3pt. Percentage falls between 0.3317468 and 0.3368643. The sample estimates that the mean is 0.3343055 (or about 33.43%). This test has df = 1282 rather than 1401 because t.test() drops rows where X3p_Percent is missing.

t.test(nba$PTS_per_100, conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  nba$PTS_per_100
## t = 845.67, df = 1401, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  106.3453 106.8398
## sample estimates:
## mean of x 
##  106.5925
t.test(nba$X3p_Percent, conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  nba$X3p_Percent
## t = 256.32, df = 1282, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.3317468 0.3368643
## sample estimates:
## mean of x 
## 0.3343055
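
To show where the first interval comes from, the sketch below rebuilds it by hand from the numbers in the t.test() output above. Because the test is against a mean of 0, the standard error can be recovered as mean / t. The variable names are illustrative, and the result can differ from the printed interval in the last decimal place because the printed t statistic is rounded.

```r
# Numbers copied from the first t.test() output above
mean_pts <- 106.5925   # sample mean of PTS_per_100
t_stat   <- 845.67     # t statistic for H0: mean = 0 (rounded in the output)
df       <- 1401       # degrees of freedom, so n = df + 1 = 1402
n        <- df + 1

se     <- mean_pts / t_stat      # standard error, since t = (mean - 0) / se
margin <- qt(0.975, df) * se     # critical t value times the standard error
ci     <- c(lower = mean_pts - margin, upper = mean_pts + margin)
ci

# Sample standard deviation implied by the standard error: sd = se * sqrt(n)
sd_pts <- se * sqrt(n)
sd_pts
```

The interval is narrow because the standard error is the standard deviation divided by sqrt(n), and n is large here.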

Insights, Significance, and Questions

The insights I gather from this are that the intervals for the true mean of both scoring and 3pt. percentage are fairly narrow. This does not necessarily mean that there is little variance in the data, though. The width of a confidence interval depends on the standard error, which shrinks as the sample size grows, and both tests use well over 1,000 team seasons (df = 1401 and df = 1282). The confidence intervals are significant in this case specifically as they give us a look into the population and not just the dataset. I would be inclined to tie this all back to being a successful team. So I would be led to ask: What is the difference in Assist to Turnover Ratio or 3pt. Percentage for Playoff teams versus Non-Playoff teams?
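
A possible starting point for that question is a two-sample t-test, which gives a confidence interval for the difference in means between the two groups. The sketch below is only an outline: the `toy` rows are made-up values (not real NBA data) so that the code runs on its own, and it assumes Playoffs is a TRUE/FALSE column as in the head(nba) output.

```r
# Made-up toy rows (NOT real NBA values), only to make the sketch runnable
toy <- data.frame(
  Playoffs = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  Assist_to_Turnover_Ratio = c(2.25, 2.10, 2.05, 1.96, 1.95, 1.80)
)

# Welch two-sample t-test (the default for t.test with a formula)
res <- t.test(Assist_to_Turnover_Ratio ~ Playoffs, data = toy, conf.level = 0.95)

res$estimate   # mean ratio in each group (FALSE group first, then TRUE)
res$conf.int   # 95% interval for mean(FALSE group) minus mean(TRUE group)
```

With the real data, the same call with data = nba would compare playoff and non-playoff teams, and swapping in X3p_Percent on the left side of the formula would do the same for 3pt. percentage.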