Brief
This dataset was taken from Kaggle and contains information on NBA players. The sample size is 1340 observations with 21 features. The only categorical variable is the response variable, Target:
1 - the player's career lasted 5 years or more.
0 - the player's career was shorter than 5 years.
Other features:
Name: Name of the player
GamesPlayed: Total number of games played
MinutesPlayed: Total minutes a player played
PointsPerGame: Points scored per game
FieldGoalsMade: Field goals made by a player
FieldGoalsAttempt: Field goal attempts of a player
FieldGoalPercent: Field goal percentage of a player
3PointMade: Number of 3-pointers made by a player
3PointAttempt: Number of 3-point attempts by a player
3PointPercent: 3-point percentage of a player
FreeThrowMade: Free throws made by a player
FreeThrowAttempt: Free throw attempts of a player
FreeThrowPercent: Free throw percentage of a player
OffensiveRebounds: Offensive rebounds of a player
DefensiveRebounds: Defensive rebounds of a player
Rebounds: Total rebounds of a player
Assists: Assists of a player
Steals: Steals of a player
Blocks: Blocks of a player
Turnovers: Turnovers of a player
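As a quick sanity check, the raw file can be loaded and inspected along these lines. This is only a sketch: the file name nba_data.csv is a placeholder, not the actual path used in get-and-clean-data.R.

```r
# Minimal loading-and-inspection sketch; "nba_data.csv" is a placeholder
# file name, not the path used in get-and-clean-data.R.
# Note: read.csv() prefixes column names that start with a digit,
# so 3PointMade becomes X3PointMade, etc.
nba <- read.csv("nba_data.csv", stringsAsFactors = FALSE)

dim(nba)           # expect 1340 rows and 21 columns
str(nba)           # column types; Target should be coded 0/1
table(nba$Target)  # class balance of the response variable
```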
During data cleaning (get-and-clean-data.R), a Yeo-Johnson transformation was applied to the predictors. Since there was no meaningful difference between models tested on the original and the transformed data, I decided to keep the original dataset, which keeps the model more interpretable.
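A rough sketch of that comparison, assuming the cleaned data frame is called nba (the actual code lives in get-and-clean-data.R), could look like this:

```r
# Hedged sketch: estimate Yeo-Johnson transformations with caret's
# preProcess() on the numeric predictors only, leaving Target untouched.
library(caret)

num_cols <- setdiff(names(nba)[sapply(nba, is.numeric)], "Target")

yj <- preProcess(nba[, num_cols], method = "YeoJohnson")
nba_yj <- nba
nba_yj[, num_cols] <- predict(yj, nba[, num_cols])

# Fit the same models on `nba` and `nba_yj`; the results were close enough
# that the untransformed data was kept for interpretability.
```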
For model training (see the sketch below), logistic regression, random forest, and decision tree models were compared. The ROC, accuracy, and sensitivity results show that logistic regression is the best model for this dataset. The final model is named nba_model. The table after the sketch summarizes, for each value of GamesPlayed, the number of players (n), their average 3-pointers made, their average free throws made, and the model's average predicted probability of a career over 5 years.
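A hedged sketch of that comparison using caret, assuming nba has no missing values and Target is recoded as a two-level factor (this is not the exact training code):

```r
# Sketch of the model comparison with caret::train(); ROC is the
# selection metric, estimated by 10-fold cross-validation.
library(caret)

nba$Target <- factor(ifelse(nba$Target == 1, "Yes", "No"), levels = c("No", "Yes"))

ctrl <- trainControl(method = "cv", number = 10,
                     classProbs = TRUE, summaryFunction = twoClassSummary)

set.seed(123)
fit_glm  <- train(Target ~ . - Name, data = nba, method = "glm",
                  metric = "ROC", trControl = ctrl)
fit_rf   <- train(Target ~ . - Name, data = nba, method = "rf",
                  metric = "ROC", trControl = ctrl)
fit_cart <- train(Target ~ . - Name, data = nba, method = "rpart",
                  metric = "ROC", trControl = ctrl)

# Compare resampled ROC, sensitivity, and specificity across the models
summary(resamples(list(logistic = fit_glm, rf = fit_rf, cart = fit_cart)))

nba_model <- fit_glm  # logistic regression kept as the final model
```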
| GamesPlayed | n | Ave_3_PT | Ave_Free_Throw | Prob_career_over_5Yrs |
|---|---|---|---|---|
| 15 | 1 | 0.30 | 2.00 | 0.27 |
| 18 | 1 | 0.00 | 0.20 | 0.20 |
| 19 | 2 | 0.30 | 0.70 | 0.20 |
| 20 | 1 | 0.00 | 2.70 | 0.42 |
| 21 | 1 | 0.00 | 0.60 | 0.23 |
| 22 | 2 | 0.05 | 0.40 | 0.16 |
| 23 | 3 | 0.20 | 0.97 | 0.27 |
| 24 | 3 | 0.43 | 1.23 | 0.22 |
| 25 | 2 | 0.20 | 0.75 | 0.30 |
| 26 | 3 | 0.00 | 0.97 | 0.29 |
| 27 | 2 | 0.40 | 1.30 | 0.23 |
| 31 | 3 | 0.30 | 1.00 | 0.25 |
| 32 | 3 | 0.40 | 1.10 | 0.41 |
| 33 | 3 | 0.07 | 0.77 | 0.32 |
| 34 | 6 | 0.37 | 0.87 | 0.30 |
| 35 | 8 | 0.10 | 0.91 | 0.31 |
| 36 | 7 | 0.17 | 0.87 | 0.27 |
| 37 | 7 | 0.20 | 1.14 | 0.35 |
| 38 | 8 | 0.29 | 0.99 | 0.37 |
| 39 | 5 | 0.08 | 0.94 | 0.42 |
| 40 | 5 | 0.24 | 0.86 | 0.37 |
| 41 | 5 | 0.14 | 0.82 | 0.32 |
| 42 | 5 | 0.48 | 1.22 | 0.37 |
| 43 | 5 | 0.02 | 1.38 | 0.43 |
| 44 | 3 | 0.07 | 0.63 | 0.29 |
| 45 | 4 | 0.00 | 0.90 | 0.46 |
| 46 | 5 | 0.10 | 0.72 | 0.46 |
| 47 | 7 | 0.16 | 1.16 | 0.45 |
| 48 | 6 | 0.33 | 2.05 | 0.58 |
| 49 | 5 | 0.10 | 1.10 | 0.43 |
| 50 | 6 | 0.30 | 1.53 | 0.47 |
| 51 | 7 | 0.27 | 1.51 | 0.54 |
| 52 | 7 | 0.04 | 1.36 | 0.58 |
| 53 | 9 | 0.17 | 1.06 | 0.53 |
| 54 | 4 | 0.17 | 1.55 | 0.54 |
| 55 | 6 | 0.13 | 1.43 | 0.56 |
| 56 | 5 | 0.32 | 1.66 | 0.58 |
| 57 | 6 | 0.15 | 0.92 | 0.51 |
| 58 | 4 | 0.10 | 1.60 | 0.68 |
| 59 | 8 | 0.44 | 1.44 | 0.57 |
| 60 | 2 | 0.05 | 1.45 | 0.58 |
| 61 | 8 | 0.40 | 1.65 | 0.62 |
| 62 | 8 | 0.35 | 1.78 | 0.66 |
| 63 | 7 | 0.16 | 1.86 | 0.76 |
| 64 | 8 | 0.41 | 1.89 | 0.62 |
| 65 | 6 | 0.15 | 1.73 | 0.71 |
| 66 | 8 | 0.46 | 1.55 | 0.64 |
| 67 | 4 | 0.12 | 1.80 | 0.78 |
| 68 | 9 | 0.14 | 1.48 | 0.67 |
| 69 | 2 | 0.20 | 1.55 | 0.67 |
| 70 | 8 | 0.40 | 2.56 | 0.78 |
| 71 | 7 | 0.01 | 2.47 | 0.74 |
| 72 | 7 | 0.30 | 3.19 | 0.83 |
| 73 | 7 | 0.51 | 1.91 | 0.77 |
| 74 | 4 | 0.08 | 3.17 | 0.80 |
| 75 | 4 | 0.35 | 1.95 | 0.78 |
| 76 | 11 | 0.47 | 2.70 | 0.81 |
| 77 | 14 | 0.35 | 1.61 | 0.77 |
| 78 | 14 | 0.26 | 2.94 | 0.81 |
| 79 | 12 | 0.32 | 2.59 | 0.80 |
| 80 | 22 | 0.35 | 3.33 | 0.87 |
| 81 | 21 | 0.23 | 3.36 | 0.86 |
| 82 | 21 | 0.30 | 2.64 | 0.85 |
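For reference, a grouped summary like the table above could be reproduced along these lines. This is a sketch under the assumption that nba_model is the fitted caret model and nba is the data it was trained on (X3PointMade reflects the read.csv column naming from the loading sketch).

```r
# Sketch: average predicted probability of a 5+ year career by games played,
# together with average 3-pointers made and free throws made per group.
library(dplyr)

nba %>%
  mutate(prob = predict(nba_model, newdata = ., type = "prob")[, "Yes"]) %>%
  group_by(GamesPlayed) %>%
  summarise(n = n(),
            Ave_3_PT = round(mean(X3PointMade), 2),
            Ave_Free_Throw = round(mean(FreeThrowMade), 2),
            Prob_career_over_5Yrs = round(mean(prob), 2)) %>%
  arrange(GamesPlayed)
```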