Data Description:

The lbj data set contains information about the 1703 games Lebron James has played in the NBA through the 2023/2024 season, including regular season and playoff games.

While there are 29 columns in the data set, we’ll be primarily interested in only a few of them:

  1. game_type: The type of game being played (regular season or playoff)

  2. age: His age at the time of the game

  3. team: The team James played for during the game (his first team was Cleveland (CLE), his second was Miami (MIA), and his last was the Lakers (LAL))

  4. opp: The abbreviation of the opponent in the game

  5. result: Whether James' team won or lost the game

Question 1: One Categorical Variable

Part 1a: Teams played for

Start by converting the team column to a factor whose levels (groups) are in the order CLE, MIA, LAL. Once finished, use the levels() function to display the 3 groups in order.

lbj$team <- 
  factor(
    x = lbj$team,
    levels = c("CLE", "MIA", "LAL")
  )


levels(lbj$team)
## [1] "CLE" "MIA" "LAL"
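The effect of the levels argument is easiest to see on a small vector. The sketch below uses a made-up character vector (team_chr is a hypothetical name, not part of the lbj data) to compare the default alphabetical ordering with the explicit one requested above.

```r
# Toy vector for illustration only (not the lbj data set)
team_chr <- c("LAL", "CLE", "MIA", "CLE")

# Without levels =, factor() sorts the unique values alphabetically
default_levels <- levels(factor(team_chr))
default_levels

# With levels =, the groups keep the order we supply
team_fct <- factor(team_chr, levels = c("CLE", "MIA", "LAL"))
levels(team_fct)

# Any value not listed in levels = is silently converted to NA
unlisted <- factor(c("CLE", "BOS"), levels = c("CLE", "MIA", "LAL"))
is.na(unlisted)
```

The level order matters later on: tables and bar charts built from a factor list the groups in level order rather than alphabetical order.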

Part 1b: Frequency Table for Teams played for

Create a frequency table for the teams Lebron has played for that has 2 columns: the team played for and the number of games played. Save it as teams_tab. Use the gt() function to display the frequency table (but don’t save the results of the gt() function).

teams_tab <- 
  lbj |> 
  count(
    team,
    name = "games"
  )


gt::gt(teams_tab)
team games
CLE 1001
MIA 381
LAL 321
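For readers wondering what count() does under the hood, it behaves like grouping by the column and counting the rows in each group. A minimal sketch on a made-up tibble (toy is a hypothetical stand-in for lbj), assuming dplyr is installed:

```r
library(dplyr)

# Toy data for illustration only (not the lbj data set)
toy <- tibble(
  team = factor(
    c("CLE", "CLE", "MIA", "LAL"),
    levels = c("CLE", "MIA", "LAL")
  )
)

# count() with a custom name for the count column
via_count <- toy |>
  count(team, name = "games")

# The longer equivalent: group, then count the rows per group
via_summarise <- toy |>
  group_by(team) |>
  summarise(games = n())

via_count
via_summarise
```

Both tables list the teams in factor level order (CLE, MIA, LAL), which is why the level order set in 1a carries through to the frequency table.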

Part 1c: Bar chart of counts for team

Create a bar chart displaying the three teams Lebron James has played for, with counts on the y-axis, using the lbj data set. Change the theme to theme_classic() and have the x-axis label read “Games Lebron James Played”.

Bonus: Color the bars with the teams’ primary colors using these hex codes: CLE = “#041E42”, MIA = “#98002E”, LAL = “#FDB927”

ggplot(
  data = lbj,
  mapping = aes(
    x = team
  )
) + 
  
  geom_bar(
    # Colors must follow the factor level order: CLE, MIA, LAL
    fill = c("#041E42", "#98002E", "#FDB927")
  ) + 
  
  labs(x = "Games Lebron James Played") +
  
  theme_classic() + 
  
  # Add this to the graph to have the bars sit on the x-axis
  scale_y_continuous(expand = c(0, 0, 0.05, 0))
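An unnamed fill vector is matched to the bars purely by position, so it silently breaks if the level order changes. One alternative, shown here as a sketch rather than the required solution, is to map fill to team and supply a named color vector through scale_fill_manual(). The data frame toy and the vector team_colors are hypothetical names used only for this illustration.

```r
library(ggplot2)

# Toy data for illustration only (not the lbj data set)
toy <- data.frame(
  team = factor(
    c("CLE", "CLE", "MIA", "LAL"),
    levels = c("CLE", "MIA", "LAL")
  )
)

# Named vector: each color is tied to a team, not to a position
team_colors <- c(CLE = "#041E42", MIA = "#98002E", LAL = "#FDB927")

p <- ggplot(
  data = toy,
  mapping = aes(x = team, fill = team)
) +
  # The legend would only repeat the x-axis, so hide it
  geom_bar(show.legend = FALSE) +
  scale_fill_manual(values = team_colors) +
  theme_classic()

p
```

Because the colors are looked up by name, reordering the factor levels moves the bars but each team keeps its own color.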

How does displaying the counts in the bar chart from 1c differ from displaying them in the frequency table from 1b?

Part 1d: Win and Loss Percentage

Create a relative frequency table for the result of each game that has 3 columns: 1) result 2) count 3) proportion (rounded to 3 decimal places). Save it as wins_prop. Display the results with the gt() function, but don’t save the result of gt().

wins_prop <- 
  lbj |> 
  count(
    result,
    name = "count"  
  ) |> 
  mutate(
    proportion = round(count/sum(count), digits = 3)
  )

gt::gt(wins_prop)
result count proportion
Lost 597 0.351
Won 1106 0.649
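The proportion column is each count divided by the total number of games. The arithmetic can be checked in base R with the two counts shown in the table above; counts, props, and pcts are illustrative names.

```r
# Counts copied from the wins_prop table above
counts <- c(Lost = 597, Won = 1106)

# Total number of games
total <- sum(counts)
total

# Proportion = count / total, rounded to 3 decimal places
props <- round(counts / total, digits = 3)
props

# Proportions can be read as percentages by multiplying by 100
pcts <- sprintf("%.1f%%", 100 * props)
pcts
```

The total also matches the 1703 games mentioned in the data description, which is a quick sanity check that no rows were dropped.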

What percentage of games has Lebron James won over the course of his career?

Part 1e: Bar chart with proportions

Create a bar chart of the win and loss proportions with game result on the x-axis and proportion on the y-axis. Add a relevant title and x-axis label, remove the label for the y-axis, and use theme_bw(). Color the bars with the hex code “#1D428A”.

ggplot(
  data = wins_prop,
  mapping = aes(
    x = result,
    y = proportion
  )
) + 
  
  geom_col(
    fill = "#1D428A"
  ) + 
  
  labs(
    title = "Result of Lebron James Games",
    x = "Game Result",
    y = NULL
  ) + 
  
  theme_bw() + 
  
  # Keep this at the bottom so the bars sit on the x-axis and the y-axis is
  # displayed as a percentage
  scale_y_continuous(
    expand = c(0, 0, 0.05, 0),
    labels = scales::label_percent()
  )
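The four numbers passed to expand can look cryptic. In ggplot2 they are read as (lower multiplicative, lower additive, upper multiplicative, upper additive) padding, and the expansion() helper builds the same vector from named arguments. A small sketch, assuming ggplot2 is installed:

```r
library(ggplot2)

# No padding below the bars, 5% of the axis range above them
padding <- expansion(mult = c(0, 0.05))
padding

# Same four values as the hand-written vector used in the plots
manual <- c(0, 0, 0.05, 0)
all.equal(as.numeric(padding), manual)
```

Writing scale_y_continuous(expand = expansion(mult = c(0, 0.05))) should therefore give the same plot as the vector form, with the intent spelled out.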

Question 2: Win percentage by team

Part 2a: Table

Create a table to show James’ win and loss proportions (rounded to 3 decimal places) for the three different teams he has played for and save it as team_win_tab. It should have four columns: 1) team 2) result 3) count 4) proportion. Display the results using the gt() function, but don’t save the results of gt().

team_win_tab <- 
  lbj |> 
  count(
    team,
    result,
    name = "count"
  ) |> 
  mutate(
    .by = team,
    proportion = round(count/sum(count), digits = 3)
  )

# Displaying the results
gt::gt(team_win_tab)
team result count proportion
CLE Lost 358 0.358
CLE Won 643 0.642
MIA Lost 107 0.281
MIA Won 274 0.719
LAL Lost 132 0.411
LAL Won 189 0.589
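The .by = team argument is what makes each team's proportions sum to 1; without it, sum(count) would be the total across all teams. The sketch below contrasts the two on a made-up table (toy is hypothetical) and assumes dplyr 1.1.0 or later, where .by is available.

```r
library(dplyr)

# Toy counts for illustration only (not the lbj data set)
toy <- tibble(
  team   = c("A", "A", "B", "B"),
  result = c("Lost", "Won", "Lost", "Won"),
  count  = c(1, 3, 2, 2)
)

# Proportions within each team: every team sums to 1
within_team <- toy |>
  mutate(
    .by = team,
    proportion = count / sum(count)
  )

# Without .by, the denominator is the grand total of all rows
overall <- toy |>
  mutate(proportion = count / sum(count))

within_team$proportion
overall$proportion
```

The within-team version answers "how often did he win with this team?", while the overall version answers "what share of all games was a win with this team?".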

With which team does Lebron have the highest win percentage?

With which team does Lebron have the lowest win percentage?

Part 2b: Side-by-Side bar chart for counts

Using the lbj data set, create a side-by-side bar chart that displays the game result for each team with the counts on the y-axis. Choose an appropriate theme, title, and axis labels.

ggplot(
  data = lbj,
  mapping = aes(
    x = team,
    fill = result
  )
) + 
  
  geom_bar(
    position = "dodge2"
  ) +  
  
  labs(
    x = "Teams played for",
    title = "Lebron James Wins and Losses by Team"
  ) + 
  
  theme_bw() + 

  scale_y_continuous(expand = c(0, 0, 0.05, 0))

Part 2c: Win and Loss Percentage by Team

Create a graph like the graph in 2b, but have the percentage appear on the y-axis and remove the y-axis label.

ggplot(
  data = team_win_tab,
  mapping = aes(
    x = team,
    y = proportion,
    fill = result
  )
) + 
  
  # Place the bars side by side, as in 2b, instead of stacking them
  geom_col(
    position = "dodge2"
  ) + 
  
  labs(
    title = "Lebron James Win and Loss Percentage by Team",
    x = "Team",
    y = NULL
  ) + 
  
  theme_bw() + 
  
  scale_y_continuous(
    expand = c(0, 0, 0.05, 0),
    labels = scales::label_percent()
  )

Does there appear to be a large difference in how often James wins for the three teams he has played for?