The lbj data set contains information about the 1703 games Lebron James has played in the NBA through the 2023/2024 season, including regular season and playoff games.
While there are 29 columns in the data set, we’ll be primarily interested in only a few of them:
age: His age at the time of the game
team: The team James played for during the game (First team was Cleveland (CLE), second was Miami (MIA), and last was the Lakers (LAL))
opp: The abbreviation of the opponent played against for the game
result: Whether the team James played on won or lost the game
Start by changing the team column to a factor with its levels (groups) ordered CLE, MIA, LAL. Once finished, use the levels() function to display the 3 groups in order.
lbj$team <-
  factor(
    x = lbj$team,
    levels = c("CLE", "MIA", "LAL")
  )
levels(lbj$team)
## [1] "CLE" "MIA" "LAL"
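To see why the levels argument matters, here is a minimal sketch using a small made-up vector (team_chr is hypothetical and only stands in for lbj$team). Without explicit levels, factor() sorts them alphabetically, which would put LAL ahead of MIA.

```r
# Toy vector standing in for lbj$team (hypothetical values)
team_chr <- c("LAL", "CLE", "MIA", "CLE", "LAL", "CLE")

# Without explicit levels, factor() sorts the levels alphabetically
default_fct <- factor(team_chr)
levels(default_fct)
# [1] "CLE" "LAL" "MIA"

# With explicit levels, the order is the one we supply
ordered_fct <- factor(team_chr, levels = c("CLE", "MIA", "LAL"))
levels(ordered_fct)
# [1] "CLE" "MIA" "LAL"

# table() reports its counts in level order: CLE, MIA, LAL
table(ordered_fct)
```

The level order is also the order in which ggplot2 draws the bars of a bar chart, so setting it once here carries through to the later tables and graphs.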
Create a frequency table for the teams Lebron has played for that has 2 columns: the team played for and the number of games played. Save it as teams_tab. Use the gt() function to display the frequency table (but don’t save the result of the gt() function).
teams_tab <-
  lbj |>
  count(
    team,
    name = "games"
  )
gt::gt(teams_tab)
team | games |
---|---|
CLE | 1001 |
MIA | 381 |
LAL | 321 |
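A quick sanity check on a frequency table is that the counts add up to the number of rows in the data. The sketch below retypes the three counts from the table by hand (the games vector is introduced here only for illustration) and compares their total with the 1703 games stated for the data set.

```r
# Counts copied from the frequency table above
games <- c(CLE = 1001, MIA = 381, LAL = 321)

# The total should match the number of games in the data set
total_games <- sum(games)
total_games
# [1] 1703
```

With the real data, the same check would be sum(teams_tab$games) == nrow(lbj).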
Create a bar chart displaying the three teams Lebron James has played for, with counts on the y-axis, using the lbj data set. Change the theme to theme_classic() and have the x-axis label read “Games Lebron James Played”.
Bonus: Color the bars using the teams’ primary colors, given as hex codes: CLE = “#041E42”, LAL = “#FDB927”, MIA = “#98002E”
ggplot(
  data = lbj,
  mapping = aes(
    x = team
  )
) +
  geom_bar(
    # Colors must follow the factor level order: CLE, MIA, LAL
    fill = c("#041E42", "#98002E", "#FDB927")
  ) +
  labs(x = "Games Lebron James Played") +
  theme_classic() +
  # Add this to the graph to have the bars sit on the x-axis
  scale_y_continuous(expand = c(0, 0, 0.05, 0))
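Passing a bare vector to fill only works when the colors are listed in the same order as the factor levels. One alternative, sketched here on a small made-up data frame (toy and team_colors are hypothetical names, not part of the assignment), is to map fill to team and supply a named vector to scale_fill_manual(), so that each hex code is matched to its team by name rather than by position.

```r
library(ggplot2)

# Toy data standing in for lbj (hypothetical rows)
toy <- data.frame(
  team = factor(
    c("CLE", "CLE", "MIA", "LAL", "LAL", "CLE"),
    levels = c("CLE", "MIA", "LAL")
  )
)

# Named vector: each hex code is tied to its team by name,
# so the order in which the colors are listed no longer matters
team_colors <- c(CLE = "#041E42", LAL = "#FDB927", MIA = "#98002E")

p <- ggplot(toy, aes(x = team, fill = team)) +
  geom_bar(show.legend = FALSE) +
  scale_fill_manual(values = team_colors) +
  theme_classic()

p
```

This is a different technique from the single fill argument used in the solution; it takes one extra layer but does not depend on how the levels are ordered.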
What is the difference between the graphs in 1b vs 1c of displaying the density vs the counts?
Create a relative frequency table for the result of each game and save it as wins_prop. It should have 3 columns: 1) result 2) count 3) proportion (rounded to 3 decimal places). Display the results with the gt() function, but don’t save the result of gt().
wins_prop <-
  lbj |>
  count(
    result,
    name = "count"
  ) |>
  mutate(
    proportion = round(count / sum(count), digits = 3)
  )
gt::gt(wins_prop)
result | count | proportion |
---|---|---|
Lost | 597 | 0.351 |
Won | 1106 | 0.649 |
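The proportion column is just each count divided by the total number of games. As a small worked check, the sketch below redoes that arithmetic with the two counts from the table typed in by hand (the counts vector is introduced only for this illustration).

```r
# Counts copied from the relative frequency table above
counts <- c(Lost = 597, Won = 1106)

# Each proportion is the count divided by the total (597 + 1106 = 1703)
proportions <- round(counts / sum(counts), digits = 3)
proportions
# Lost: 0.351, Won: 0.649

# Multiplying by 100 expresses the proportions as percentages
100 * proportions
# Lost: 35.1, Won: 64.9
```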
What percentage of games has Lebron James won over the course of his career?
Create a bar chart of win and loss proportions with game result on the x-axis and proportion on the y-axis. Add a relevant title and x-axis label, remove the label for the y-axis, and use theme_bw(). Color the bars with the hex code “#1D428A”.
ggplot(
  data = wins_prop,
  mapping = aes(
    x = result,
    y = proportion
  )
) +
  geom_col(
    fill = "#1D428A"
  ) +
  labs(
    title = "Result of Lebron James Games",
    x = "Game Result",
    y = NULL
  ) +
  theme_bw() +
  # Keep this at the bottom to have the bars sit on the x-axis and
  # display the y-axis as a percentage
  scale_y_continuous(
    expand = c(0, 0, 0.05, 0),
    labels = scales::label_percent()
  )
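The two arguments to scale_y_continuous() can be inspected on their own. The sketch below assumes ggplot2 and scales are installed; the names padding and fmt are introduced only for illustration. ggplot2's expansion() helper builds the same four-number vector from named arguments, and label_percent() returns a function that turns proportions into percentage labels. The accuracy = 0.1 argument is added here only so the labels show one decimal place; the solution above uses the default.

```r
library(ggplot2)

# expansion() builds the padding vector from named arguments:
# no padding below the bars, 5% multiplicative padding above them
padding <- expansion(mult = c(0, 0.05))
padding
# [1] 0.00 0.00 0.05 0.00

# label_percent() returns a formatting function; accuracy = 0.1 keeps
# one decimal place in the labels
fmt <- scales::label_percent(accuracy = 0.1)
fmt(c(0.351, 0.649))
# [1] "35.1%" "64.9%"
```

Writing expand = expansion(mult = c(0, 0.05)) is equivalent to expand = c(0, 0, 0.05, 0) and may be easier to read.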
Create a table to show James’ win and loss proportions (rounded to 3 decimal places) for the three different teams he has played for and save it as team_win_tab. It should have four columns: 1) team 2) result 3) count 4) proportion. Display the results using the gt() function, but don’t save the results of gt().
team_win_tab <-
  lbj |>
  count(
    team,
    result,
    name = "count"
  ) |>
  mutate(
    .by = team,
    proportion = round(count / sum(count), digits = 3)
  )
# Displaying the results
gt::gt(team_win_tab)
team | result | count | proportion |
---|---|---|---|
CLE | Lost | 358 | 0.358 |
CLE | Won | 643 | 0.642 |
MIA | Lost | 107 | 0.281 |
MIA | Won | 274 | 0.719 |
LAL | Lost | 132 | 0.411 |
LAL | Won | 189 | 0.589 |
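The .by = team argument is what makes sum(count) a per-team total rather than the total over all 1703 games. As a rough illustration of the same idea in base R (a swapped-in technique, not the solution's method), the sketch below retypes the counts from the table and uses ave() to compute each team's total; team_counts and team_totals are names introduced only for this example.

```r
# Counts copied from the table above
team_counts <- data.frame(
  team   = rep(c("CLE", "MIA", "LAL"), each = 2),
  result = rep(c("Lost", "Won"), times = 3),
  count  = c(358, 643, 107, 274, 132, 189)
)

# ave() applies sum() within each team and returns one value per row,
# playing the role of sum(count) under .by = team
team_totals <- ave(team_counts$count, team_counts$team, FUN = sum)
team_totals
# [1] 1001 1001  381  381  321  321

# Dividing by the per-team total gives proportions that sum to 1 within a team
team_counts$proportion <- round(team_counts$count / team_totals, digits = 3)
team_counts
```

Without the grouping, every count would be divided by 1703, and the proportions would describe each team-and-result combination's share of all games instead of the win rate within a team.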
With which team does Lebron have the highest win percentage?
With which team does Lebron have the lowest win percentage?
Using the lbj data set, create a side-by-side bar chart that displays the game result for each team with the counts on the y-axis. Provide a decent theme, title, and axis labels.
ggplot(
  data = lbj,
  mapping = aes(
    x = team,
    fill = result
  )
) +
  geom_bar(
    position = "dodge2"
  ) +
  labs(
    x = "Teams played for",
    title = "Lebron James Wins and Losses by Team"
  ) +
  theme_bw() +
  scale_y_continuous(expand = c(0, 0, 0.05, 0))
Create a graph like the graph in 2b, but have the percentage appear on the y-axis and remove the y-axis label.
ggplot(
  data = team_win_tab,
  mapping = aes(
    x = team,
    y = proportion,
    fill = result
  )
) +
  geom_col() +
  labs(
    title = "Lebron James Win and Loss Percentage by Team",
    x = "Team",
    y = NULL
  ) +
  theme_bw() +
  scale_y_continuous(
    expand = c(0, 0, 0.05, 0),
    labels = scales::label_percent()
  )
Does there appear to be a large difference in how often James wins for the three teams he has played for?