Data_Dive_WeeK

Loading the data

player_data <- read.csv('C:/Users/rohan/OneDrive/Desktop/INTRO TO STATISTICS IN R/DATA SETS/Datasets/Data/Nba_all_seasons_1996_2021.csv')

head(player_data,10)

##    X       player_name team_abbreviation age player_height player_weight
## 1  0     Dennis Rodman               CHI  36        198.12      99.79024
## 2  1 Dwayne Schintzius               LAC  28        215.90     117.93392
## 3  2      Earl Cureton               TOR  39        205.74      95.25432
## 4  3       Ed O'Bannon               DAL  24        203.20     100.69742
## 5  4       Ed Pinckney               MIA  34        205.74     108.86208
## 6  5     Eddie Johnson               HOU  38        200.66      97.52228
## 7  6       Eddie Jones               LAL  25        198.12      86.18248
## 8  7    Elden Campbell               LAL  28        213.36     113.39800
## 9  8 Eldridge Recasner               ATL  29        193.04      86.18248
## 10 9      Elliot Perry               MIL  28        182.88      72.57472
##                        college country draft_year draft_round draft_number gp
## 1  Southeastern Oklahoma State     USA       1986           2           27 55
## 2                      Florida     USA       1990           1           24 15
## 3                Detroit Mercy     USA       1979           3           58  9
## 4                         UCLA     USA       1995           1            9 64
## 5                    Villanova     USA       1985           1           10 27
## 6                     Illinois     USA       1981           2           29 52
## 7                       Temple     USA       1994           1           10 80
## 8                      Clemson     USA       1990           1           27 77
## 9                   Washington     USA       1992   Undrafted    Undrafted 71
## 10                     Memphis     USA       1991           2           37 82
##     pts  reb ast net_rating oreb_pct dreb_pct usg_pct ts_pct ast_pct  season
## 1   5.7 16.1 3.1       16.1    0.186    0.323   0.100  0.479   0.113 1996-97
## 2   2.3  1.5 0.3       12.3    0.078    0.151   0.175  0.430   0.048 1996-97
## 3   0.8  1.0 0.4       -2.1    0.105    0.102   0.103  0.376   0.148 1996-97
## 4   3.7  2.3 0.6       -8.7    0.060    0.149   0.167  0.399   0.077 1996-97
## 5   2.4  2.4 0.2      -11.2    0.109    0.179   0.127  0.611   0.040 1996-97
## 6   8.2  2.7 1.0        4.1    0.034    0.126   0.220  0.541   0.102 1996-97
## 7  17.2  4.1 3.4        4.1    0.035    0.091   0.209  0.559   0.149 1996-97
## 8  14.9  8.0 1.6        3.3    0.095    0.183   0.222  0.520   0.087 1996-97
## 9   5.7  1.6 1.3       -0.3    0.036    0.076   0.172  0.539   0.141 1996-97
## 10  6.9  1.5 3.0       -1.2    0.018    0.081   0.177  0.557   0.262 1996-97

Question 1

A list of at least 3 columns (or values) in your data which are unclear until you read the documentation.

columns_df <- player_data[c("net_rating","usg_pct","ts_pct")]

head(columns_df,10)

##    net_rating usg_pct ts_pct
## 1        16.1   0.100  0.479
## 2        12.3   0.175  0.430
## 3        -2.1   0.103  0.376
## 4        -8.7   0.167  0.399
## 5       -11.2   0.127  0.611
## 6         4.1   0.220  0.541
## 7         4.1   0.209  0.559
## 8         3.3   0.222  0.520
## 9        -0.3   0.172  0.539
## 10       -1.2   0.177  0.557

Explanation

Looking at the three columns above, it is difficult to tell what their names stand for without consulting the documentation.

net_rating: Team’s point differential per 100 possessions while the player is on the court

usg_pct: Percentage of team plays used by the player while he was on the floor

ts_pct: Measure of the player’s shooting efficiency that takes into account free throws, 2 and 3 point shots
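The ts_pct definition above can be made concrete with a small worked example. This is only a sketch: it assumes the commonly used true shooting formula, PTS / (2 * (FGA + 0.44 * FTA)), which the data set documentation quoted here does not spell out, and the stat line is hypothetical because the data set has no columns for field goal or free throw attempts.

```r
# Hypothetical stat line for illustration only (not taken from the data set).
pts <- 20   # points scored
fga <- 15   # field goal attempts
fta <- 5    # free throw attempts

# True shooting percentage (assumed standard formula): points divided by
# twice the shot attempts, with each free throw attempt counted as 0.44
# of a shot attempt.
ts_pct <- pts / (2 * (fga + 0.44 * fta))
round(ts_pct, 3)   # 0.581
```

A result of 0.581 is on the same scale as the ts_pct values in the table above, which range from 0.376 to 0.611 in the first ten rows.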

Question 2

At least one element of your data that is unclear even after reading the documentation.

The one element I find unclear is draft_number. Rather than unclear, though, I think the column is unnecessary, since it does not help in determining the things I want to find.
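One property of draft_number that may add to the confusion is that it mixes pick numbers with the text "Undrafted" (row 9 in the output above), so the column holds text rather than numbers. The sketch below uses only the ten values shown in the head() output; the variable names draft_number_num and n_undrafted are made up for illustration.

```r
# Values copied from the draft_number column of the first ten rows shown above.
draft_number <- c("27", "24", "58", "9", "10", "29", "10", "27", "Undrafted", "37")

# Because of the "Undrafted" entry, the values are text, not numbers.
class(draft_number)

# Convert to numeric; "Undrafted" cannot be parsed and becomes NA.
# suppressWarnings() hides the expected "NAs introduced by coercion" warning.
draft_number_num <- suppressWarnings(as.numeric(draft_number))

# Count the undrafted players in this small sample.
n_undrafted <- sum(is.na(draft_number_num))
n_undrafted
```

Treating "Undrafted" as NA is one possible choice; whether it suits the analysis depends on whether undrafted players should be kept as their own group.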

Question 3

Building a visualization based on the column mentioned in the second question (draft_number).

# Load the ggplot2 package
library(ggplot2)

# Create a bar graph
ggplot(player_data, aes(x = draft_number, y = pts, fill = season)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Points Scored by Player Against Draft Number and Season",
       x = "Draft Number", y = "Points Scored") +
  theme_minimal() +
  theme(legend.title = element_blank()) +
  facet_wrap(~ season, scales = "free")  # Group by season

Observation

The plot suggests that, in every season, draft number does not determine the player’s season performance. I therefore think it is unnecessary to have the draft number, since it would make no difference when finding the most points scored or when determining the player’s overall performance.
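A rough numeric check of the relationship between draft number and points could look like the sketch below. It is not the method used in this report: the choice of a Spearman rank correlation is an assumption, and the data frame sample_rows contains only the ten rows printed at the top (all from 1996-97), which is far too few to support any conclusion.

```r
# First ten rows of the data shown at the top (1996-97 season only).
# "Undrafted" is entered as NA so draft_number can be treated as a number.
sample_rows <- data.frame(
  draft_number = c(27, 24, 58, 9, 10, 29, 10, 27, NA, 37),
  pts          = c(5.7, 2.3, 0.8, 3.7, 2.4, 8.2, 17.2, 14.9, 5.7, 6.9)
)

# Spearman rank correlation between pick number and pts, ignoring the
# undrafted player. Values near 0 would suggest little monotonic
# relationship; values near -1 would suggest earlier picks score more.
rho <- cor(sample_rows$draft_number, sample_rows$pts,
           method = "spearman", use = "complete.obs")
rho
```

Running the same calculation on the full player_data, season by season, would give a number to set beside the visual impression from the bar chart.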

Data_Dive_WeeK_5

Rohan Royal

2023-09-25
