---
title: "Market Size and NBA Success Analysis"
author: "David Sung"
output: 
  flexdashboard::flex_dashboard:
    theme: journal
    storyboard: true
    social: menu
    source: embed
---

```{r setup, include=FALSE}
library(sqldf)
library(plotly)
library(DT)

# NOTE: machine-specific path; point this at the folder that holds the CSV files.
setwd("C:\\Users\\David\\Desktop\\Market")

champlist <- read.csv('champlist.csv')

champlist$X <- NULL

champlist$drafttot <- champlist$d1 + champlist$d2 + champlist$d3 + 
                      champlist$d4 + champlist$d5 + champlist$d6 +
                      champlist$d7

champlist$drafttot_r <- champlist$d1_r + champlist$d2_r + champlist$d3_r + 
                      champlist$d4_r + champlist$d5_r + champlist$d6_r +
                      champlist$d7_r

champlist$drafttot3 <- champlist$d1 + champlist$d2 + champlist$d3

champlist$drafttot3_r <- champlist$d1_r + champlist$d2_r + champlist$d3_r

champlist$champion <- as.character(champlist$champion)
champlist$runnerup <- as.character(champlist$runnerup)

media_df <- read.csv('media.csv')

q = '
SELECT 
  year, champion, codec, runnerup, coder, p1, p2, p3, p4, p5, p6, p7, p1_r, p2_r, p3_r, p4_r, p5_r, p6_r, p7_r,
  d1, d2, d3, d4, d5, d6, d7, d1_r, d2_r, d3_r, d4_r, d5_r, d6_r, d7_r, drafttot3, drafttot3_r,
  drafttot, drafttot_r, m1.media as media, m2.media as media_r, m1.rank as rank, m2.rank as rank_r,
  m1.market as market, m2.market as market_r, m1.class as class, m2.class as class_r, m1.perbin as bin, m2.perbin as bin_r,
  m1.rankbin as rbin, m2.rankbin as rbin_r
FROM champlist
LEFT JOIN media_df m1 on m1.code = champlist.codec
LEFT JOIN media_df m2 on m2.code = champlist.coder  
'

champlist <- sqldf(q)

q = '
SELECT 
  year, champion, codec, p1, p2, p3, p4, p5, p6, p7, d1, d2, d3, d4, d5, d6, d7, drafttot, drafttot3,
  market, media, rank, class, bin, rbin
FROM champlist
UNION ALL
SELECT
  year, runnerup, coder, p1_r, p2_r, p3_r, p4_r, p5_r, p6_r, p7_r, d1_r, d2_r, d3_r, d4_r, d5_r, d6_r, d7_r, drafttot_r, 
  drafttot3_r, market_r, media_r, rank_r, class_r, bin_r, rbin_r
FROM champlist
'

plot <- sqldf(q)

plot$am <- plot$year - 1980


mean(plot$rank, na.rm = TRUE)

q = '
SELECT
  rbin,
  COUNT(class) as count,
  AVG(drafttot) AS avg
FROM plot
GROUP BY rbin
'
count <- sqldf(q)
count$percent <- count$count / 74

# check.names = FALSE keeps the spaced column headers (e.g. "Player 1") as written
table_view <- data.frame(Year = plot$year, Team = plot$champion, 'Player 1'= plot$p1, 'Player 2'= plot$p2,
                         'Player 3' = plot$p3, 'Player 4' = plot$p4, 'Player 5'= plot$p5, 'Player 6'= plot$p6,
                         'Player 7' = plot$p7, 'Number Drafted'= plot$drafttot,
                         check.names = FALSE)


# 3D scatter: year offset (x) vs. media population (y) vs. players drafted (z)
threedim <- plot_ly(data = plot, x = ~am, y = ~media, z = ~drafttot,
                  type = 'scatter3d', mode = 'markers',
                  text = ~paste0("Team: ", champion,
                                 "<br>Player 1: ", p1,
                                 "<br>Player 2: ", p2,
                                 "<br>Player 3: ", p3),
                  marker = list(size = 15, color = ~drafttot, showscale = TRUE)) %>%
                  layout(scene = list(yaxis = list(title = 'Media Population'),
                         zaxis = list(title = 'Players Drafted'),
                         xaxis = list(title = 'Year Reference')))
# 2D scatter: media population vs. players drafted among the top 7
twodim <- plot_ly(data = plot, x = ~media, y = ~drafttot, 
                  type = 'scatter', mode = 'markers',
                  text = ~paste0("Team: ", champion,
                                 "<br>Player 1: ", p1,
                                 "<br>Player 2: ", p2,
                                 "<br>Player 3: ", p3),
                  marker = list(size = 15, color = ~drafttot, showscale = TRUE)) %>%
                  layout(xaxis = list(title = 'Media Population'),
                         yaxis = list(title = 'Players Drafted'))

winbin_plt <- plot_ly(data = count, x = ~rbin, y = ~percent, 
                      type = 'scatter', mode = 'lines', 
                      name = 'Proportion of Championships',
                      line = list(shape = "spline")) %>%
  add_lines(y = ~avg, name = "Average Drafted", line = list(shape = "spline"),
            yaxis = 'y2') %>%
  layout(xaxis = list(title = 'Media Market Rank Bin', showgrid = FALSE),
         yaxis = list(title = 'Proportion of Championships',
                      range = c(0,.35),
                      showgrid = FALSE),
         yaxis2 = list(title = 'Average Drafted',
           overlaying = "y",
           side = "right",
           range = c(0,5),
           showgrid = FALSE)
         )

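# 2D scatter: media population vs. players drafted among the top 3
# (colored by drafttot3 so the color scale matches the y-axis)
twodim3 <- plot_ly(data = plot, x = ~media, y = ~drafttot3, 
                  type = 'scatter', mode = 'markers',
                  text = ~paste0("Team: ", champion,
                                 "<br>Player 1: ", p1,
                                 "<br>Player 2: ", p2,
                                 "<br>Player 3: ", p3),
                  marker = list(size = 15, color = ~drafttot3, showscale = TRUE)) %>%
  layout(xaxis = list(title = 'Media Population'),
         yaxis = list(title = 'Players Drafted'))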
```

### Two-dimensional graph of the number of players drafted vs. the media population

```{r}
twodim
```

***

For this analysis, I used media population (in thousands) as the measure of market size: a larger media population denotes a larger market. The largest market belongs to the New York Knicks, followed by the Brooklyn Nets, the Los Angeles Lakers and Clippers, and the Chicago Bulls to round out the top 5.

The only teams I included in this study were those that won a conference championship from 1980 onward. Complete data were not available for earlier years, mainly because of the NBA/ABA merger in 1976.

For each team included in the study, I gathered the 7 players with the most win shares on that team. I stopped at 7 because that seemed to be the minimum number of players in a coach's rotation. From there, I counted the players who were drafted by the team they won with. A player who left his original drafting team and later returned therefore still counts as 'drafted'. An example is LeBron James winning in Cleveland after playing for the Miami Heat.
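
The counting step can be sketched as follows. This is a minimal illustration, assuming each of the `d1` to `d7` columns is a 0/1 flag marking whether the corresponding top-7 player was drafted by that team (which is what the sums in the setup chunk suggest). The `toy` data frame and its values are made up purely for demonstration.

```r
# Hypothetical toy data: one row per team-season. d1..d7 are assumed to be
# 0/1 flags (1 = that top-7 win-share player was drafted by this team).
toy <- data.frame(
  team = c("Team A", "Team B"),
  d1 = c(1, 0), d2 = c(1, 0), d3 = c(0, 1), d4 = c(1, 0),
  d5 = c(0, 0), d6 = c(0, 1), d7 = c(1, 0)
)

# Count drafted players among the top 7 and among the top 3.
toy$drafttot  <- rowSums(toy[paste0("d", 1:7)])
toy$drafttot3 <- rowSums(toy[paste0("d", 1:3)])

print(toy[c("team", "drafttot", "drafttot3")])
```

Using `rowSums()` over the flag columns gives the same totals as adding the columns one by one, and it makes the top-3 variant a one-character change.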

The first plot shows the number of players drafted against the media population in thousands. A linear regression through the data has a slope close to 0 and an R^2 value of 0.0001, which indicates essentially no linear relationship between the number of players drafted and the team's market size.
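
A fit like this can be reproduced with base R's `lm()`. The sketch below is an assumption about how such a regression might be run, not the original code. `fit_draft_vs_media` is a hypothetical helper, and the toy numbers exist only to make the block runnable.

```r
# Hypothetical helper: regress players drafted on media population and
# return the slope and R^2 of the fit.
fit_draft_vs_media <- function(df, y = "drafttot") {
  fit <- lm(reformulate("media", response = y), data = df)
  c(slope = unname(coef(fit)["media"]),
    r_squared = summary(fit)$r.squared)
}

# Made-up illustrative values (not the study's data).
toy <- data.frame(
  media    = c(1500, 3000, 5200, 7400, 18000, 20000),
  drafttot = c(3, 2, 4, 3, 2, 4)
)

res <- fit_draft_vs_media(toy)
print(res)
```

In the dashboard, the same helper could be called as `fit_draft_vs_media(plot)` on the data frame built in the setup chunk, or with `y = "drafttot3"` for the top-3 count. `lm()` drops rows with missing `media` values by default.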


### Three-dimensional graph of the number of players drafted in the top 7 vs. the media population vs. the year

```{r}
threedim
```

***

This plot is similar to the two-dimensional one, except that I added a third axis (year) to separate the duplicates, since one team may have won multiple conference championships.


### Two-dimensional graph of the number of players drafted in the top 3 vs. the media population

```{r}
twodim3
```

***

The earlier result of no correlation surprised me. I then thought of the common saying that you only need 3 superstars to win, so extending the pool to 7 players may reach too far. I narrowed it to the draft status of the top 3 win-share contributors and plotted the data again.

This time the plot did show the expected negative correlation. However, the R^2 value is only 0.1161, which may be due to the discrete response and the limited number of distinct x-values. The relationship can still be viewed as meaningful, and it gives some grounds for the claim that a small-market team's best chance of winning is tanking.
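
Because the response takes only the values 0 to 3, one way to check whether a negative relationship is distinguishable from zero is a rank correlation test. The sketch below uses Spearman's rho via `cor.test()`. This is a swapped-in technique, not the method behind the R^2 quoted above, and the toy values are invented for illustration.

```r
# Made-up illustrative values (not the study's data): media population in
# thousands and the number of top-3 players drafted by the team.
toy <- data.frame(
  media     = c(1200, 1800, 2500, 3100, 4800, 7400, 9900, 18000, 20000, 20500),
  drafttot3 = c(3, 2, 3, 2, 2, 1, 2, 1, 0, 1)
)

# Spearman's rank correlation does not assume a linear relationship.
# exact = FALSE requests the asymptotic p-value, since ties are present.
test <- cor.test(toy$media, toy$drafttot3, method = "spearman", exact = FALSE)

cat("rho =", round(unname(test$estimate), 3),
    " p-value =", signif(test$p.value, 3), "\n")
```

On the real data, the same call would take `plot$media` and `plot$drafttot3`; a small p-value alongside a low R^2 would mean the trend is real but explains little of the variation.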


### Dual-axis plot of the proportion of championships and average number of players drafted vs. the media market bin

```{r}
winbin_plt
```

***

This plot shows the proportion of conference championships and the average number of players drafted against the media market bin. I binned the media markets into 7 equal-sized bins, with 1 being the smallest market and 7 the largest. The proportion curve is clearly increasing, while the average drafted shows a faint bell shape skewed to the right. Again, although the correlation between market size and players drafted is debatable, large-market teams have clearly dominated the conference championships.
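
The binning itself is precomputed in `media.csv` (the `rankbin` column), so the following is only a sketch of one way such bins could be built. It assumes that "equal-sized" means equal-width intervals over a market ranking, that there are 28 ranked markets, and that a higher rank value means a larger market; all three are assumptions, and the toy values are invented.

```r
# Made-up illustrative values (not the study's data): the market rank of each
# conference champion, where a higher value is assumed to mean a larger market.
toy <- data.frame(
  market_rank = c(1, 3, 5, 8, 12, 14, 15, 19, 22, 24, 25, 27, 28, 28)
)

# Assumption: 7 equal-width bins over ranks 1..28, i.e. 4 ranks per bin.
toy$rbin <- cut(toy$market_rank, breaks = seq(0, 28, by = 4), labels = FALSE)

# Proportion of conference championships falling in each bin (1 to 7).
prop <- table(factor(toy$rbin, levels = 1:7)) / nrow(toy)
print(round(prop, 3))
```

Wrapping `rbin` in `factor(..., levels = 1:7)` keeps empty bins in the table with a proportion of 0, so the plotted line would not silently skip them.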

There can be a number of reasons for this. One is that a single player can skew the results given the small sample of 26 years. For example, Michael Jordan accounts for 6 conference championships and LeBron James for 7.


### Table of data

```{r}
datatable(table_view)
```

***

This table shows most of the data from which the plots were made. The data were obtained from www.basketballreference.com. In conclusion, there is weak evidence that small-market teams must tank and rely on draft picks to win a championship because they lack the lure that big-market teams have for attracting big signings. However, I believe the history of the modern NBA is too short to support any definitive conclusion.

If I were to do this in more depth, I would classify the non-drafted players as signed or traded, which would give a better indication of team composition.