Question 1

I first loaded the data for Kobe Bryant and looked at the first ten rows to get a sense of its structure.

load("~/Stat 363/Midterm_Project/kobe.RData")
kobe[1:10, ]

Since we are focused on Kobe Bryant’s streaks, I first displayed the outcomes of his first ten shots.

kobe$basket[1:10]
##  [1] "H" "M" "M" "H" "H" "M" "M" "M" "M" "H"

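Looking at the output above, the first ten shots give the streaks 1, 0, 2, 0, 0, 0, 1. A streak is the number of consecutive hits (“H”) before a miss (“M”), so every miss ends a streak, and a miss that directly follows another miss counts as a streak of 0. The final 1 is the hit on shot 10, which is still an open streak at this point in the record. Below is his shot record for all 133 shots.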

kobe$basket
##   [1] "H" "M" "M" "H" "H" "M" "M" "M" "M" "H" "H" "H" "M" "H" "H" "M" "M" "H"
##  [19] "H" "H" "M" "M" "H" "M" "H" "H" "H" "M" "M" "M" "M" "M" "M" "H" "M" "H"
##  [37] "M" "M" "H" "H" "H" "H" "M" "H" "M" "M" "H" "M" "M" "H" "M" "M" "H" "M"
##  [55] "H" "H" "M" "M" "H" "M" "H" "H" "M" "H" "M" "M" "M" "H" "M" "M" "M" "M"
##  [73] "H" "M" "H" "M" "M" "H" "M" "M" "H" "H" "M" "M" "M" "M" "H" "H" "H" "M"
##  [91] "M" "H" "M" "M" "H" "M" "H" "H" "M" "H" "M" "M" "H" "M" "M" "M" "H" "M"
## [109] "H" "H" "H" "M" "H" "H" "H" "M" "H" "M" "H" "M" "M" "M" "M" "M" "M" "H"
## [127] "M" "H" "M" "M" "M" "M" "H"

The first step is to count the streaks in Kobe’s data. I wrote a function, get_streak, that walks through the shot record and stores the length of each streak in a vector. That vector is then passed to a second function, streak_length, which builds a matrix holding how many times each streak length occurs.

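kobe_raw_shot = kobe$basket
## Gets the length of each streak: the number of hits before each miss
get_streak = function(shot_record) {
  ## use the length of the record instead of a hard-coded 133
  n_shots = length(shot_record)
  streak = c()
  i = 1
  num = 0
  while (i <= n_shots) {
    if (isTRUE(shot_record[i] == "M")) {
      ## a miss ends the current streak
      streak = append(streak, num)
      num = 0
    }
    if (isTRUE(shot_record[i] == "H")) {
      num = num + 1
      ## a hit on the last shot closes the final streak
      if (i == n_shots) {
        streak = append(streak, num)
      }
    }
    i = i + 1
  }
  return(streak)
}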

## Counts how many streaks there are of each length
streak_length = function(streak) {
  i = 0
  j = 1
  num = 0
  max_streak = max(streak)
  length_streak = length(streak)
  streak_matrix = matrix(, nrow = max_streak + 1, ncol = 2)
  colnames(streak_matrix) = c("Streak Length", "Number of Streaks")
  while (i <= max_streak) {
    while (j <= length_streak) {
      if (i == streak[j]) {
        num = num + 1
      }
      j = j + 1
    }
    streak_matrix[i + 1,] = c(i, num)
    num = 0
    j = 1
    i = i + 1
  }
  return(streak_matrix)
}

kobe_streak = get_streak(kobe_raw_shot)
kobe_matrix = streak_length(kobe_streak)

print(kobe_streak)
##  [1] 1 0 2 0 0 0 3 2 0 3 0 1 3 0 0 0 0 0 1 1 0 4 1 0 1 0 1 0 1 2 0 1 2 1 0 0 1 0
## [39] 0 0 1 1 0 1 0 2 0 0 0 3 0 1 0 1 2 1 0 1 0 0 1 3 3 1 1 0 0 0 0 0 1 1 0 0 0 1
print(kobe_matrix)
##      Streak Length Number of Streaks
## [1,]             0                39
## [2,]             1                24
## [3,]             2                 6
## [4,]             3                 6
## [5,]             4                 1
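
The two loops above can also be written without explicit indexing. The following is a minimal sketch, not the code used for the results in this report. The function name streaks_vectorized and the first_ten vector are introduced here for illustration, and the sketch assumes the record contains only "H" and "M". It relies on the fact that every miss closes a streak, so the streak lengths are the gaps between consecutive miss positions.

```r
## Sketch: streak lengths from the positions of the misses.
## Assumes shots contains only "H" and "M".
streaks_vectorized = function(shots) {
  shots = as.character(shots)
  n = length(shots)
  ## 0 acts as a boundary before the first shot
  bounds = c(0, which(shots == "M"))
  ## number of hits between two consecutive boundaries
  streak = diff(bounds) - 1
  ## hits after the last miss form a final, open streak
  last_bound = bounds[length(bounds)]
  if (n > last_bound) {
    streak = c(streak, n - last_bound)
  }
  streak
}

## First ten shots from the record above
first_ten = c("H", "M", "M", "H", "H", "M", "M", "M", "M", "H")
first_streaks = streaks_vectorized(first_ten)
first_streaks

## Counts per streak length, including lengths that never occur
streak_counts = table(factor(first_streaks, levels = 0:max(first_streaks)))
streak_counts
```

For the first ten shots this returns the streaks 1, 0, 2, 0, 0, 0, 1, and table() then plays the role of streak_length.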

The matrix shows that Kobe’s longest streak is 4 and his shortest is 0, where 0 means a miss that did not follow a hit. I then used the summary and quantile functions to see where his shooting streaks lie.

summary(kobe_streak)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.7632  1.0000  4.0000
quantile(kobe_streak)
##   0%  25%  50%  75% 100% 
##    0    0    0    1    4

The output above confirms a minimum streak of 0 and a maximum of 4. The mean streak is 0.7632, so on average his streaks are shorter than one made shot. The quartiles tell the same story: at least half of his streaks are 0 and at least three quarters are 0 or 1, so a streak of 0 or 1 is far more common than a streak longer than 1. To visualize this, I created a bar plot of the streak counts.

kobe_plot = barplot(height = kobe_matrix[,2], beside = TRUE, 
                    names.arg = kobe_matrix[,1],
                    ylim = c(0, 40), main = "Kobe Shooting Streak", 
                    xlab = "Shooting Streak", 
                    ylab = "Number per Shooting Streak", col = "orange")

As stated earlier, the mean is 0.7632, which makes sense because most of the streaks are 0 or 1. The plot also shows that the distribution is not symmetric: it is right skewed, with a long tail toward the longer streaks.
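
The mean can be checked by hand from the streak matrix. The sketch below retypes the counts from the matrix printed earlier (the variable names are introduced here for illustration) and computes the weighted average of the streak lengths.

```r
## Counts copied from the printed streak matrix
streak_len = 0:4
streak_cnt = c(39, 24, 6, 6, 1)

## Number of streaks and number of hits contained in them
total_streaks = sum(streak_cnt)
total_hits = sum(streak_len * streak_cnt)

## Weighted mean streak length
mean_streak = total_hits / total_streaks
round(mean_streak, 4)
```

This is 58 hits spread over 76 streaks, or about 0.7632. A mean above the median of 0 is what we would expect from a right-skewed distribution.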


Question 2

The second question is whether Kobe Bryant actually has hot hands. I first look at Kobe’s overall probability of making a shot, and then check whether that probability increases for the shot that follows a hit.

## number of shots made ("H"), regardless of streak
num_of_H = function(shot_record){
  i = 1
  num = 0
  ## loop over the record that was passed in, not the global kobe$basket
  while (i <= length(shot_record)){
    if (shot_record[i] == "H"){
      num = num + 1
    }
    i = i + 1
  }
  return(num)
}
## Calculates the percent of all shots that resulted in a basket
total_percent_shot_made = function(total_shots){
  total_h = num_of_H(total_shots)
  percent_of_shots = (total_h / length(total_shots)) * 100
  return(percent_of_shots)
}
## percent of Kobe's shots that resulted in a basket
kobe_percent_of_shots = total_percent_shot_made(kobe_raw_shot)
kobe_percent_of_shots
## [1] 43.60902
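
The same percentage can be obtained in one line, because the mean of a logical vector is the proportion of TRUE values. The sketch below is self-contained, so it uses a stand-in vector with 58 hits and 75 misses (the totals implied by the streak matrix and the 133 shots) rather than the real record; the order of the shots is not preserved and does not matter for this calculation.

```r
## Stand-in record with the same totals as Kobe's data (order not preserved)
stand_in_shots = rep(c("H", "M"), times = c(58, 75))

## Proportion of hits, expressed as a percent
pct_made = mean(stand_in_shots == "H") * 100
pct_made
```

With the real data, mean(kobe_raw_shot == "H") * 100 would replace both helper functions.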

So far, we have found that 43.61% of Kobe’s shots resulted in a basket. I was then curious about the probability of each streak length. I calculated each probability and added it as a column to the matrix from Question 1.

## get the probability of each streak length
get_probability = function(shooting_matrix) {
  N = sum(shooting_matrix[, 2])
  i = 1
  num = 0
  p = c()
  while (i <= nrow(shooting_matrix)) {
    num = shooting_matrix[i, 2] / N
    p = append(p, num)
    num = 0
    i = i + 1
  }
  shooting_matrix = cbind(shooting_matrix, Probability = p)
  row.names(shooting_matrix) = c(1:nrow(shooting_matrix))
  return(shooting_matrix)
}

kobe_prob = get_probability(kobe_matrix)
kobe_prob
##   Streak Length Number of Streaks Probability
## 1             0                39  0.51315789
## 2             1                24  0.31578947
## 3             2                 6  0.07894737
## 4             3                 6  0.07894737
## 5             4                 1  0.01315789

The matrix shows that Kobe is most likely to have a streak of 0 or 1, which is consistent with the bar plot from Question 1. To see whether making one shot makes a longer streak more likely, we can use conditional probability: the probability that the second shot of a streak goes in, given that the first shot was a hit. I used P(shot 2 = H | shot 1 = H) = (number of streaks where shot 2 = H) / (number of streaks that have a shot 2). The numerator is the number of streaks of length 2, 3, and 4. The denominator is the number of streaks of length 1, 2, 3, and 4, since those streaks look like HM, HHM, HHHM, and HHHHM, respectively, where the second shot is either an H or an M.

## Probability that the second shot is a hit
## P(shot 2 = H | shot 1 = H) = #shot 2 = H / #shot 2
## for the top, the number of shot 2 = H can be calculated by getting the sum
## of the streak 2, 3, and 4. For the bottom, you have to get the sum of streak
## 1, 2, 3, 4 since it is looking at HM, HHM, HHHM, HHHHM, where streak 0 is 
## just M

shot2_H = sum(kobe_matrix[3:5, 2])
total_shot2 = sum(kobe_matrix[2:5, 2])
prob_shot2_H = ((shot2_H / total_shot2) * 100)
prob_shot2_H
## [1] 35.13514
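
The row indices 3:5 and 2:5 only work when the longest streak is exactly 4. A minimal sketch of the same calculation that does not depend on the shape of the matrix is given below. It rebuilds a streak vector from the printed counts (the order is not preserved, which does not affect the result), and the variable names are introduced here for illustration.

```r
## Streak vector rebuilt from the printed counts (order not preserved)
streak_vec = rep(0:4, times = c(39, 24, 6, 6, 1))

## Streaks of length >= 1 have a hit as their first shot;
## streaks of length >= 2 also have a hit as their second shot
n_first_hit = sum(streak_vec >= 1)
n_second_hit = sum(streak_vec >= 2)

prob_second_hit = n_second_hit / n_first_hit * 100
prob_second_hit
```

This is 13 out of 37 streaks, which matches the 35.14% above.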

The probability that the second shot is an H given that the first shot is an H is 35.14%. Kobe’s overall shooting percentage is 43.61%, so a hit is actually less likely right after a hit than it is in general, and streaks longer than 1 are not very likely. Under the definition of hot hands used here, where making the first shot raises the probability of making the second, Kobe does not have hot hands: if he did, the probability of making the second shot should have gone up, and the data show it going down.

Because this conclusion depends on how we define “hot hands”, it would be better to compare Kobe’s shots to those of other basketball players or to a simulation, to see what shooting without hot hands looks like. A simulation gives us an independent shooter, and if Kobe’s data follows the same pattern, that supports the conclusion that each of his shots does not depend on the previous shot.

Simulation

## test for 1 player similar to Kobe
test_1_player = function(kobe_total_shot){
  test_shot = c()
  i = 1
  if (kobe_total_shot > 1){
    kobe_shot = kobe_total_shot / 100
  } else{
    kobe_shot = kobe_total_shot
  }
  outcomes = c("H", "M")
  while (i <= 133) {
    x = sample(outcomes, size = 1, replace = TRUE,
               prob = c(kobe_shot, 1 - kobe_shot))
    test_shot = append(test_shot, x)
    i = i + 1
  }
  return(test_shot)
}

test_shot = test_1_player(kobe_percent_of_shots)
test_shot
##   [1] "M" "H" "M" "M" "M" "M" "H" "M" "M" "M" "H" "M" "H" "M" "H" "M" "M" "M"
##  [19] "H" "M" "M" "H" "M" "M" "H" "M" "M" "M" "H" "M" "H" "M" "M" "M" "M" "H"
##  [37] "H" "M" "M" "M" "H" "M" "M" "M" "M" "H" "M" "M" "M" "M" "M" "M" "H" "M"
##  [55] "M" "M" "M" "M" "M" "H" "M" "H" "M" "M" "M" "M" "M" "M" "M" "M" "H" "H"
##  [73] "H" "M" "H" "M" "H" "H" "M" "H" "M" "H" "H" "H" "M" "M" "M" "M" "H" "M"
##  [91] "H" "M" "M" "M" "H" "H" "M" "H" "M" "M" "H" "M" "H" "H" "M" "M" "H" "M"
## [109] "M" "H" "H" "M" "M" "M" "M" "H" "M" "H" "H" "H" "M" "M" "H" "H" "H" "H"
## [127] "M" "M" "M" "M" "M" "M" "M"
test_streak = get_streak(test_shot)
test_streak
##  [1] 0 1 0 0 0 1 0 0 1 1 1 0 0 1 0 1 0 1 0 0 1 1 0 0 0 2 0 0 1 0 0 0 1 0 0 0 0 0
## [39] 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 3 1 2 1 3 0 0 0 1 1 0 0 2 1 0 1 2 0 1 0 2 0 0
## [77] 0 1 3 0 4 0 0 0 0 0 0
test_matrix = streak_length(test_streak)
test_matrix
##      Streak Length Number of Streaks
## [1,]             0                55
## [2,]             1                23
## [3,]             2                 5
## [4,]             3                 3
## [5,]             4                 1
test_prob_matrix = get_probability(test_matrix)
test_prob_matrix
##   Streak Length Number of Streaks Probability
## 1             0                55  0.63218391
## 2             1                23  0.26436782
## 3             2                 5  0.05747126
## 4             3                 3  0.03448276
## 5             4                 1  0.01149425
test_total_shots = total_percent_shot_made(test_shot)
test_total_shots
## [1] 34.58647
quantile(test_streak)
##   0%  25%  50%  75% 100% 
##    0    0    0    1    4
summary(test_streak)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.5287  1.0000  4.0000

The test_1_player function simulates 133 shots for one player whose probability of making each shot equals Kobe’s overall shooting percentage. I then reuse the earlier functions to get the streaks, the streak matrix, and the total percent of shots made. In the simulation shown above, 34.59% of the shots were made and the mean streak is 0.5287; these values change every time the simulation is rerun. The quantiles show that at least 75% of the streaks are either 0 or 1, which is similar to Kobe’s streaks. To get a better idea of whether Kobe’s shots are independent of one another, we should simulate a large number of players.
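
Because sample draws at random, the single-player results differ from run to run unless a seed is fixed. The sketch below shows one way to make the draw reproducible and to generate all 133 shots in a single call. The seed value 363 is an arbitrary assumption and was not used for the output in this report, so the numbers it produces will not match the ones printed above.

```r
## Arbitrary seed, chosen here only so the draw can be repeated
set.seed(363)

p_hit = 0.4360902   # Kobe's overall proportion of made shots
n_shots = 133

## All shots in one call; each shot is independent of the others
sim_shots = sample(c("H", "M"), size = n_shots, replace = TRUE,
                   prob = c(p_hit, 1 - p_hit))

## Percent of simulated shots that went in
sim_pct = mean(sim_shots == "H") * 100
sim_pct
```

Resetting the same seed before the call reproduces exactly the same shot record.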

sim_generation = function(N, m, perc){
  temp_matrix = matrix(, nrow = N, ncol = m)
  i = 1
  while (i <= N) {
    player_shot = test_1_player(perc)
    temp_matrix[i,] = player_shot
    i = i + 1
  }
  temp_data_frame = as.data.frame(temp_matrix)
  return(temp_data_frame)
}
test_sim = sim_generation(500, 133, .45)
test_sim[1:5,]

The sim_generation function takes the number of players (N), the number of shots per player (m), and the probability of making a shot (perc). It creates a matrix that temporarily stores the simulated data. On each pass through the loop it calls test_1_player, which returns a vector of H and M values, and stores that vector as one row of the matrix. Here it does this 500 times, then converts the matrix into a data frame and returns it. Note that test_1_player always simulates 133 shots, so m has to be 133.
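
Since every shot is drawn independently with the same probability, the whole simulation can also be drawn in one call and reshaped into a matrix. This is a sketch under that assumption, not the function used for the results in this report; the name sim_matrix_sketch is introduced here for illustration. Unlike the loop above, it uses m directly for the number of shots per player.

```r
## Sketch: simulate N independent players with m shots each.
## p_hit is the probability of a hit on every shot.
sim_matrix_sketch = function(N, m, p_hit) {
  draws = sample(c("H", "M"), size = N * m, replace = TRUE,
                 prob = c(p_hit, 1 - p_hit))
  ## one row per player, one column per shot
  matrix(draws, nrow = N, ncol = m)
}

sketch_sim = sim_matrix_sketch(500, 133, 0.45)
dim(sketch_sim)
```

The matrix is filled column by column, which is harmless here because all draws are independent and identically distributed.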

all_streak = function(sim_matrix){
  i = 1
  j = 1
  sim_streak = c()
  while (i <= nrow(sim_matrix)){
    sim_streak = get_streak(sim_matrix[i,])
    temp_matrix = streak_length(sim_streak)
    if (i == 1){
      temp_df = data.frame(num_streak = temp_matrix[,1],
                           total_streak = temp_matrix[,2])
    } else {
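      ## append this player's rows to the running data frame
      while (j <= nrow(temp_matrix)){
        temp_df = rbind(temp_df, temp_matrix[j,])
        j = j + 1
      }
      ## reset to the first row for the next player
      j = 1
      sim_streak = c()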
    }
    i = i + 1
  }
  return(temp_df)
}

total_streak_df = all_streak(test_sim)
total_streak_df[1:10,]

The all_streak function takes the data frame created by sim_generation, counts the streaks for each player, and stores the counts in another data frame. It was easier to first store each player’s streak counts in one long data frame and then write another function, grand_streak, that adds them up into a grand total for each streak length. With the grand totals I can compute the probability of each streak and create the same graph that I made for Kobe, for a side-by-side comparison.

grand_streak = function(all_streak_df){
  min_streak = min(all_streak_df[,1])
  max_streak = max(all_streak_df[,1])
  comp_streak = c(min_streak:max_streak)
  max_row = nrow(all_streak_df)
  i = 1
  j = 0
  num = 0
  while (j <= max_streak) {
    while (i <= max_row){
      if (all_streak_df[i, 1] == j){
        num = num + all_streak_df[i, 2]
      }
      i = i + 1
    }
    i = 1
    if (j == 0){
      temp_df = data.frame(num_streak = 0, total_streak = num)
    } else{
      temp_df = rbind(temp_df, c(j, num))
    }
    j = j + 1
    num = 0
  }
  return(temp_df)
}

grand_total_df = grand_streak(total_streak_df)
grand_total_df = get_probability(grand_total_df)
grand_total_df
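
The two aggregation steps above can be condensed by pooling the streaks of all players into one vector and tabulating it once. The sketch below illustrates the idea on a toy matrix of two players with four shots each; the helper streaks_of, the toy data, and the other names are introduced here for illustration and are not part of the original analysis.

```r
## Streak lengths for one player (assumes only "H" and "M")
streaks_of = function(shots) {
  bounds = c(0, which(shots == "M"))
  streak = diff(bounds) - 1
  last_bound = bounds[length(bounds)]
  ## hits after the last miss form a final streak
  if (length(shots) > last_bound) {
    streak = c(streak, length(shots) - last_bound)
  }
  streak
}

## Toy simulation: one row per player
toy_sim = rbind(c("H", "M", "H", "H"),
                c("M", "M", "H", "M"))

## Pool the streaks of every player into a single vector
pooled = unlist(lapply(seq_len(nrow(toy_sim)),
                       function(i) streaks_of(toy_sim[i, ])))

## Grand total and probability for each streak length
counts = table(factor(pooled, levels = 0:max(pooled)))
pooled_table = data.frame(num_streak = 0:max(pooled),
                          total_streak = as.vector(counts),
                          Probability = as.vector(counts) / sum(counts))
pooled_table
```

In the toy data the pooled streaks are 1, 2, 0, 0, 1, so streak lengths 0, 1, and 2 occur 2, 2, and 1 times.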

This is the final matrix for all 500 simulated players, which gives a more precise idea of the streak probabilities for an independent shooter and lets us check whether Kobe matches this trend. To visualize the data, I created the same style of bar plot for the 500 simulated players as I did for Kobe.

sim_plot = barplot(height = grand_total_df[,2], beside = TRUE, 
                    names.arg = grand_total_df[,1],
                    main = "Sim Shooting Streak", 
                    xlab = "Shooting Streak", 
                    ylab = "Number per Shooting Streak", col = "purple")


Conclusion

par(mfrow = c(1,2), mai = c(1, .5, .5, .1))
kobe_plot = barplot(height = kobe_matrix[,2], beside = TRUE, 
                    names.arg = kobe_matrix[,1],
                    ylim = c(0, 40), main = "Kobe Shooting Streak", 
                    xlab = "Shooting Streak", 
                    ylab = "Number per Shooting Streak", col = "orange")
sim_plot = barplot(height = grand_total_df[,2], beside = TRUE, 
                   names.arg = grand_total_df[,1],
                   ylim = c(0, max(grand_total_df[,2])), 
                   main = "Sim Shooting Streak", 
                   xlab = "Shooting Streak", 
                   ylab = "Number per Shooting Streak", col = "purple")


Looking at the graphs above, Kobe and the simulation follow the same general trend: a streak of 0 is the most common, followed by a streak of 1, and the probability of a longer streak drops off sharply. This is the opposite of what hot hands would predict, where making a first shot increases the probability of making a second. The simulated shots are independent by construction, so making a shot does not depend on whether the previous shot went in, and Kobe’s streaks look like those of this independent shooter with roughly the same shooting percentage (0.45 in the simulation versus Kobe’s 43.61%). This supports the claim that hot hands is not what is happening here, so we conclude that Kobe does not have hot hands.
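
For reference, the streak distribution of an independent shooter can also be written down directly instead of being simulated. If every shot is a hit with probability p, independently of the others, a streak of length k requires k hits followed by a miss, so P(streak = k) = p^k (1 - p). The sketch below places these values next to Kobe’s observed proportions from Question 2. It is an approximation that ignores the fact that the record stops after 133 shots, and the variable names are introduced here for illustration.

```r
p_hit = 0.4360902          # Kobe's overall proportion of made shots
streak_len = 0:4

## k hits in a row followed by one miss
expected_prob = p_hit^streak_len * (1 - p_hit)

## Observed proportions copied from the probability matrix in Question 2
observed_prob = c(0.51315789, 0.31578947, 0.07894737, 0.07894737, 0.01315789)

comparison = data.frame(streak_len,
                        observed_prob = round(observed_prob, 4),
                        expected_prob = round(expected_prob, 4))
comparison
```

Under independence the probabilities shrink by a factor of p with each additional hit, which is the rapid drop-off described above.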