# Load the NBA player-season data (1996-2021) from a local CSV file
player_data <- read.csv('C:/Users/rohan/OneDrive/Desktop/INTRO TO STATISTICS IN R/DATA SETS/Datasets/Data/Nba_all_seasons_1996_2021.csv')
# Preview the first 10 rows
head(player_data, 10)
## X player_name team_abbreviation age player_height player_weight
## 1 0 Dennis Rodman CHI 36 198.12 99.79024
## 2 1 Dwayne Schintzius LAC 28 215.90 117.93392
## 3 2 Earl Cureton TOR 39 205.74 95.25432
## 4 3 Ed O'Bannon DAL 24 203.20 100.69742
## 5 4 Ed Pinckney MIA 34 205.74 108.86208
## 6 5 Eddie Johnson HOU 38 200.66 97.52228
## 7 6 Eddie Jones LAL 25 198.12 86.18248
## 8 7 Elden Campbell LAL 28 213.36 113.39800
## 9 8 Eldridge Recasner ATL 29 193.04 86.18248
## 10 9 Elliot Perry MIL 28 182.88 72.57472
## college country draft_year draft_round draft_number gp
## 1 Southeastern Oklahoma State USA 1986 2 27 55
## 2 Florida USA 1990 1 24 15
## 3 Detroit Mercy USA 1979 3 58 9
## 4 UCLA USA 1995 1 9 64
## 5 Villanova USA 1985 1 10 27
## 6 Illinois USA 1981 2 29 52
## 7 Temple USA 1994 1 10 80
## 8 Clemson USA 1990 1 27 77
## 9 Washington USA 1992 Undrafted Undrafted 71
## 10 Memphis USA 1991 2 37 82
## pts reb ast net_rating oreb_pct dreb_pct usg_pct ts_pct ast_pct season
## 1 5.7 16.1 3.1 16.1 0.186 0.323 0.100 0.479 0.113 1996-97
## 2 2.3 1.5 0.3 12.3 0.078 0.151 0.175 0.430 0.048 1996-97
## 3 0.8 1.0 0.4 -2.1 0.105 0.102 0.103 0.376 0.148 1996-97
## 4 3.7 2.3 0.6 -8.7 0.060 0.149 0.167 0.399 0.077 1996-97
## 5 2.4 2.4 0.2 -11.2 0.109 0.179 0.127 0.611 0.040 1996-97
## 6 8.2 2.7 1.0 4.1 0.034 0.126 0.220 0.541 0.102 1996-97
## 7 17.2 4.1 3.4 4.1 0.035 0.091 0.209 0.559 0.149 1996-97
## 8 14.9 8.0 1.6 3.3 0.095 0.183 0.222 0.520 0.087 1996-97
## 9 5.7 1.6 1.3 -0.3 0.036 0.076 0.172 0.539 0.141 1996-97
## 10 6.9 1.5 3.0 -1.2 0.018 0.081 0.177 0.557 0.262 1996-97
A list of at least 3 columns (or values) in your data which are unclear until you read the documentation.
# Keep only the three columns whose names are unclear without the documentation
columns_df <- player_data[c("net_rating", "usg_pct", "ts_pct")]
head(columns_df, 10)
## net_rating usg_pct ts_pct
## 1 16.1 0.100 0.479
## 2 12.3 0.175 0.430
## 3 -2.1 0.103 0.376
## 4 -8.7 0.167 0.399
## 5 -11.2 0.127 0.611
## 6 4.1 0.220 0.541
## 7 4.1 0.209 0.559
## 8 3.3 0.222 0.520
## 9 -0.3 0.172 0.539
## 10 -1.2 0.177 0.557
From the names of these three columns alone, it is difficult to tell what they stand for without consulting the documentation.
net_rating: Team’s point differential per 100 possessions while the player is on the court
usg_pct: Percentage of team plays used by the player while he was on the floor
ts_pct: Measure of the player’s shooting efficiency that takes into account free throws, two-point shots, and three-point shots
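To make the ts_pct definition concrete, here is a minimal sketch of the commonly used true shooting formula, TS% = PTS / (2 * (FGA + 0.44 * FTA)). The dataset does not include field goal attempts (FGA) or free throw attempts (FTA), so the helper name `true_shooting` and the input values are hypothetical, and it is an assumption that ts_pct in this file was computed this way.

```r
# Hypothetical helper: true shooting percentage from box-score totals.
# Assumes the standard formula PTS / (2 * (FGA + 0.44 * FTA)).
true_shooting <- function(pts, fga, fta) {
  pts / (2 * (fga + 0.44 * fta))
}

# Made-up inputs for illustration only (not taken from the dataset):
# 20 points on 15 field goal attempts and 5 free throw attempts
ts_example <- true_shooting(pts = 20, fga = 15, fta = 5)
print(round(ts_example, 3))
```

A value above roughly 0.5 means the player scored more than one point per shooting possession, which is how values such as those in the ts_pct column can be read.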
At least one element of your data that is unclear even after reading the documentation.
The one element I find unclear is draft_number. Rather than calling it unclear, it is more accurate to say that I think the column is unnecessary, since it does not help determine the things I want to find.
Building a visualization based on the column mentioned in the second question (draft_number).
# Load the ggplot2 package
library(ggplot2)
# Create a bar graph of points by draft number, one panel per season
ggplot(player_data, aes(x = draft_number, y = pts, fill = season)) +
  geom_col(position = "dodge") + # geom_col() is geom_bar(stat = "identity")
  labs(title = "Points Scored by Player Against Draft Number and Season",
       x = "Draft Number", y = "Points Scored") +
  theme_minimal() +
  theme(legend.title = element_blank()) +
  facet_wrap(~ season, scales = "free") # Group by season
## Observation
We can see that in every season the draft number does not determine the player’s season performance. I therefore think it is unnecessary to keep the draft number, since it would make no difference when finding the most points scored or when assessing a player’s overall performance.
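One way to back the observation with a number rather than a visual impression is a rank correlation between draft position and points. The following is a minimal sketch, not part of the original analysis: the helper name `draft_pts_correlation` and the toy rows are hypothetical, and it assumes draft_number stores pick numbers as text alongside the label "Undrafted", as the head() output suggests.

```r
# Hypothetical helper: Spearman rank correlation between draft pick and points.
# Assumes draft_number holds pick numbers as text plus the label "Undrafted";
# non-numeric entries become NA and are dropped from the calculation.
draft_pts_correlation <- function(df) {
  pick <- suppressWarnings(as.numeric(as.character(df$draft_number)))
  cor(pick, df$pts, method = "spearman", use = "complete.obs")
}

# Made-up toy rows, only to show the call; use player_data for the real check.
toy <- data.frame(
  draft_number = c("1", "5", "10", "30", "Undrafted"),
  pts = c(20.1, 15.3, 9.8, 4.2, 6.0)
)
toy_rho <- draft_pts_correlation(toy)
print(toy_rho)
```

On the real data, `draft_pts_correlation(player_data)` would return a value near 0 if draft position and scoring were unrelated. A clearly negative value would mean earlier picks tend to score more, which would be a reason to keep the column.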