TPOT Regressor vs. Neural Network Regressor
TPOT is an automated machine learning (AutoML) library for Python. It uses genetic programming to search for a well-performing scikit-learn pipeline for a given classification or regression problem.
TPOT documentation can be found here.
TPOT tutorials are offered through DataCamp and Kaggle.
More detailed explanations of TPOT can be found on Towards Data Science.
First, we will use TPOT's regressor to predict the number of wins an NBA team will have, and we will evaluate its performance. Then we will tackle the same problem with a Keras neural network and evaluate it the same way. This allows a comparison between TPOT's automated regression and deep learning on a regression problem.
Let’s begin with TPOT Regression.
Read in the data found here
import pandas as pd
dt = pd.read_csv('C:/Users/aengland/Documents/dt_NBA_reg.csv')
Get the names of the columns in the dataset
print(dt.columns)
## Index(['W', 'PTS', 'oppPTS', 'FG', 'FGA', '2P', '2PA', '3P', '3PA', 'FT',
## 'FTA', 'ORB', 'DRB', 'AST', 'STL', 'BLK', 'TOV'],
## dtype='object')
We will use PTS (points scored), oppPTS (opponents' points), FG (field goals made), FGA (field goal attempts), 2P (2-point field goals made), 2PA (2-point field goal attempts), 3P (3-point field goals made), 3PA (3-point field goal attempts), FT (free throws made), FTA (free throw attempts), ORB (offensive rebounds), DRB (defensive rebounds), AST (assists), STL (steals), BLK (blocks), and TOV (turnovers) to predict W (number of wins).
Preview the data
print(dt.head(5))
## W PTS oppPTS FG FGA 2P ... ORB DRB AST STL BLK TOV
## 0 44 8032 7999 3084 6644 2378 ... 758 2593 2007 664 369 1219
## 1 49 7944 7798 2942 6544 2314 ... 1047 2460 1668 599 391 1206
## 2 21 7661 8418 2823 6649 2354 ... 917 2389 1587 591 479 1153
## 3 45 7641 7615 2926 6698 2480 ... 1026 2514 1886 588 417 1171
## 4 24 7913 8297 2993 6901 2446 ... 1004 2359 1694 647 334 1149
##
## [5 rows x 17 columns]
Get a summary of the DataFrame (column types and non-null counts)
print(dt.info())
## <class 'pandas.core.frame.DataFrame'>
## RangeIndex: 863 entries, 0 to 862
## Data columns (total 17 columns):
## W 863 non-null int64
## PTS 863 non-null int64
## oppPTS 863 non-null int64
## FG 863 non-null int64
## FGA 863 non-null int64
## 2P 863 non-null int64
## 2PA 863 non-null int64
## 3P 863 non-null int64
## 3PA 863 non-null int64
## FT 863 non-null int64
## FTA 863 non-null int64
## ORB 863 non-null int64
## DRB 863 non-null int64
## AST 863 non-null int64
## STL 863 non-null int64
## BLK 863 non-null int64
## TOV 863 non-null int64
## dtypes: int64(17)
## memory usage: 114.7 KB
## None
Get descriptive statistics for each variable
print(dt.describe())
## W PTS ... BLK TOV
## count 863.000000 863.000000 ... 863.000000 863.000000
## mean 40.989571 8360.232908 ... 419.793743 1299.221321
## std 12.744268 577.260038 ... 81.956890 153.200143
## min 11.000000 6901.000000 ... 204.000000 931.000000
## 25% 31.000000 7930.500000 ... 359.000000 1192.000000
## 50% 42.000000 8296.000000 ... 410.000000 1280.000000
## 75% 50.500000 8769.000000 ... 468.500000 1391.500000
## max 72.000000 10371.000000 ... 716.000000 1873.000000
##
## [8 rows x 17 columns]
Check each column for the proportion of missing values
print(dt.isnull().sum()/dt.shape[0])
## W 0.0
## PTS 0.0
## oppPTS 0.0
## FG 0.0
## FGA 0.0
## 2P 0.0
## 2PA 0.0
## 3P 0.0
## 3PA 0.0
## FT 0.0
## FTA 0.0
## ORB 0.0
## DRB 0.0
## AST 0.0
## STL 0.0
## BLK 0.0
## TOV 0.0
## dtype: float64
Split the data into the predictors (X) and the target (y)
X = dt.drop('W', axis=1)
y = dt['W']
Create train and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
Instantiate a TPOTRegressor that optimizes R-squared and runs for at most 20 generations. It stops early if the best pipeline does not improve for 3 generations (early_stop=3).
from tpot import TPOTRegressor
model = TPOTRegressor(generations=20, verbosity=2, scoring='r2', early_stop=3)
Fit the model to the training data
model.fit(X_train, y_train)
Show the best pipeline
print(model.fitted_pipeline_)
## Pipeline(memory=None,
## steps=[('stackingestimator-1', StackingEstimator(estimator=LassoLarsCV(copy_X=True, cv=None, eps=2.220446049250313e-16,
## fit_intercept=True, max_iter=500, max_n_alphas=1000, n_jobs=1,
## normalize=True, positive=False, precompute='auto', verbose=False))), ('normalizer', Normalizer(copy=True,...x_n_alphas=1000, n_jobs=1,
## normalize=True, positive=False, precompute='auto', verbose=False))])
Get predictions on the testing data
predictions = model.predict(X_test)
Plot a scatterplot of the actual vs. predicted values with a trendline and a Pearson correlation (r)
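The plotting code for this step is not shown. Below is a minimal sketch of one way to draw it with matplotlib, NumPy and SciPy. The function name `plot_actual_vs_predicted` and the axis labels are hypothetical. The trendline is an ordinary least-squares line from `np.polyfit`, and r comes from `scipy.stats.pearsonr`. It is not necessarily the method used for the original figure.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import pearsonr


def plot_actual_vs_predicted(y_true, y_pred):
    """Scatter actual vs. predicted values with a least-squares trendline and Pearson r."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r, _ = pearsonr(y_true, y_pred)  # (correlation, p-value)
    # Fit a straight line (degree-1 polynomial) of predicted on actual
    slope, intercept = np.polyfit(y_true, y_pred, 1)
    xs = np.linspace(y_true.min(), y_true.max(), 100)
    fig, ax = plt.subplots()
    ax.scatter(y_true, y_pred, alpha=0.6)
    ax.plot(xs, slope * xs + intercept, color="red")
    ax.set_xlabel("Actual wins")
    ax.set_ylabel("Predicted wins")
    ax.set_title(f"Actual vs. predicted (r = {r:.3f})")
    return fig, r
```

Usage: `fig, r = plot_actual_vs_predicted(y_test, predictions)`, then `fig.savefig(...)` or `plt.show()`.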

Print the interpretation of the Pearson r correlation coefficient
## There is a very strong, positive linear relationship between the predicted and actual values.
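The code that produced this sentence is not shown. Here is a minimal sketch that maps r to a sentence of the same form. The strength cutoffs (0.8, 0.6, 0.4, 0.2) are an assumed rule of thumb, not necessarily the thresholds the original code used. The function name is hypothetical.

```python
def interpret_pearson_r(r):
    """Describe a Pearson r in words, using ASSUMED strength cutoffs."""
    # Hypothetical rule-of-thumb cutoffs on |r|, checked from strongest down
    strength_cutoffs = [(0.8, "very strong"), (0.6, "strong"), (0.4, "moderate"), (0.2, "weak")]
    strength = "very weak"
    for cutoff, label in strength_cutoffs:
        if abs(r) >= cutoff:
            strength = label
            break
    direction = "positive" if r >= 0 else "negative"
    return (f"There is a {strength}, {direction} linear relationship "
            "between the predicted and actual values.")
```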
Print the regression metrics
## MAE: 2.476
## MSE: 9.263
## RMSE: 3.043
## R-Squared: 0.946
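These four metrics can be computed with scikit-learn. A minimal sketch is below; the helper name `regression_metrics` is hypothetical. RMSE is the square root of MSE, which matches the output above: the square root of 9.263 is about 3.043.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R-squared, rounded to 3 decimals."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": round(mean_absolute_error(y_true, y_pred), 3),
        "MSE": round(mse, 3),
        "RMSE": round(float(np.sqrt(mse)), 3),  # RMSE = sqrt(MSE)
        "R-Squared": round(r2_score(y_true, y_pred), 3),
    }
```

Usage: `for name, value in regression_metrics(y_test, predictions).items(): print(f"{name}: {value}")`.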
Plot histogram of the residuals
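A minimal sketch of a residual histogram in matplotlib, assuming residuals are defined as actual minus predicted. The function name and bin count are hypothetical. It passes `density=True`, which replaced the `normed` argument deprecated in Matplotlib 2.1, so it avoids that deprecation warning.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_residual_histogram(y_true, y_pred, bins=20):
    """Plot a density-scaled histogram of residuals (actual - predicted)."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    fig, ax = plt.subplots()
    # density=True is the replacement for the deprecated normed=True
    ax.hist(residuals, bins=bins, density=True, edgecolor="black")
    ax.set_xlabel("Residual (actual - predicted wins)")
    ax.set_ylabel("Density")
    return fig, residuals
```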

Check the residuals for normality using the Shapiro-Wilk test
## Shapiro-Wilk test statistic: 1.0
## p-value: 0.67
## Fail to reject the null hypothesis. Data is normally distributed.
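A minimal sketch of this check with `scipy.stats.shapiro`, assuming a significance level of 0.05 (the original threshold is not shown). Two caveats apply. First, the W statistic is at most 1, so the printed 1.0 is a rounded value. Second, failing to reject the null only means the test found no evidence against normality; it does not prove the residuals are normal. The sketch words its verdict that way.

```python
from scipy.stats import shapiro


def shapiro_wilk_report(residuals, alpha=0.05):
    """Run the Shapiro-Wilk test; alpha=0.05 is an assumed significance level."""
    stat, p_value = shapiro(residuals)
    if p_value > alpha:
        verdict = "Fail to reject the null hypothesis: no evidence against normality."
    else:
        verdict = "Reject the null hypothesis: the residuals do not look normally distributed."
    return stat, p_value, verdict
```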
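How accurate was this model? The figure below is the mean absolute error (MAE). It is the average size of a miss, not a bound that every prediction falls within.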
## This model was able to predict within +/- 2.476 wins
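Because 2.476 is an average error, some predictions miss by more than that. The sketch below measures what share of predictions actually fall within a given tolerance; the helper name `share_within` is hypothetical. The toy data in the test has an MAE of 1.0, yet one prediction misses by 3.

```python
import numpy as np


def share_within(y_true, y_pred, tolerance):
    """Fraction of predictions whose absolute error is at most `tolerance`."""
    errors = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return float(np.mean(errors <= tolerance))
```

Usage: `share_within(y_test, predictions, mean_absolute_error(y_test, predictions))` reports how many test teams were predicted within one MAE of their actual wins.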
How long did it take to build this model?
## Time to complete the TPOT model: 11.67 min.
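The timing code is not shown, and it is not stated exactly which steps were timed. A minimal sketch of wall-clock timing with `time.perf_counter`, wrapped in a hypothetical `timer` context manager that prints minutes in the same format:

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label):
    """Time the enclosed block and print the elapsed minutes."""
    start = time.perf_counter()
    result = {}
    try:
        yield result
    finally:
        result["minutes"] = (time.perf_counter() - start) / 60
        print(f"Time to complete the {label}: {result['minutes']:.2f} min.")
```

Usage (assuming only the fit is timed): `with timer("TPOT model"): model.fit(X_train, y_train)`.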
Now, let’s use a Keras neural network to predict how many wins an NBA team will have.
We will use 19 nodes in the hidden layer. I ran a nested loop over 12 different models (with 12-24 nodes in the hidden layer), with 10 iterations each. The model with 19 nodes had the greatest mean R-squared and the lowest mean RMSE (for more information, see this article).
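The tuning loop itself is not shown. The sketch below shows its general shape: repeat an evaluation several times per hidden-layer size, average the scores, and keep the size with the highest mean R-squared. `score_fn` is a hypothetical callable. In practice it would build, train and evaluate a Keras model with the given number of nodes and return `(r2, rmse)`. This sketch does not reproduce the author's code.

```python
import statistics


def tune_hidden_units(score_fn, sizes, n_repeats=10):
    """Average (R-squared, RMSE) over repeats for each size; pick the best mean R-squared."""
    summary = {}
    for size in sizes:
        r2_scores, rmse_scores = [], []
        for repeat in range(n_repeats):
            r2, rmse = score_fn(size, repeat)  # hypothetical train-and-evaluate callable
            r2_scores.append(r2)
            rmse_scores.append(rmse)
        summary[size] = (statistics.mean(r2_scores), statistics.mean(rmse_scores))
    best_size = max(summary, key=lambda s: summary[s][0])
    return best_size, summary
```

Usage: `best, summary = tune_hidden_units(score_fn, range(12, 25), n_repeats=10)`, where `range(12, 25)` covers 12 through 24 nodes.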
Import the libraries used in this section
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint, EarlyStopping
Read in the data
import pandas as pd
dt = pd.read_csv('C:/Users/aengland/dt_NBA_reg.csv')
Standardize the predictor variables.
# Standardize predictor variables
scaler = StandardScaler()
# Fit scaler to the features
scaler.fit(dt.drop('W', axis = 1))
# Transform features to scaled version
scaled_features = scaler.transform(dt.drop('W', axis = 1))
# Save into data frame
dt_feat = pd.DataFrame(scaled_features, columns=dt.loc[:,dt.columns != 'W'].columns)
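`StandardScaler` rescales each column to z = (x - column mean) / column standard deviation, using the population standard deviation (ddof=0). The sketch below checks this on a small made-up frame; its values are hypothetical, not the NBA data. Note that the code above fits the scaler on all rows before the train/test split. Fitting it on `X_train` only and reusing it to transform `X_test` would keep test-set statistics out of training.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for the NBA frame
df = pd.DataFrame({"W": [40, 50, 30],
                   "PTS": [8000.0, 8400.0, 7900.0],
                   "TOV": [1200.0, 1300.0, 1250.0]})
features = df.drop("W", axis=1)

scaled = StandardScaler().fit_transform(features)
# Same transformation by hand: subtract the column mean, divide by the population std
manual = (features - features.mean()) / features.std(ddof=0)
```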
Split the data into the predictors (X) and the target (y)
X = dt_feat
y = dt['W']
Create train and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
Save the weights of the model with the lowest validation loss (val_loss)
filepath = 'C:/Users/aengland/model_weights.hdf5'
callbacks_list = [ModelCheckpoint(filepath, monitor='val_loss', verbose=True, save_best_only=True, mode='min')]
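# Feed-forward network: a Dense layer with one unit per predictor, then the 19-node hidden layer
model = Sequential()
# First Dense layer; input_shape tells Keras how many predictors each row has
model.add(Dense(units=len(X.columns), activation='relu', input_shape=(len(X.columns),)))
# Hidden layer with 19 nodes
model.add(Dense(units=19, activation='relu'))
# Output layer: a single linear unit for the predicted number of wins
model.add(Dense(units=1))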
model.compile(loss='mean_squared_error', optimizer='adam')
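Each Dense layer has one weight per (input, unit) pair plus one bias per unit. The sketch below applies that formula to this model: 16 predictors feed layers of 16, 19 and 1 units, which gives 272 + 323 + 20 = 615 trainable parameters. `model.summary()` in Keras should report the same total. The helper name is hypothetical.

```python
def dense_param_count(n_inputs, layer_units):
    """Trainable parameters in a stack of Dense layers: weights plus one bias per unit."""
    total = 0
    for units in layer_units:
        total += n_inputs * units + units
        n_inputs = units  # this layer's output feeds the next layer
    return total


# 16 predictors; layers of 16, 19 and 1 units, matching the model definition
n_params = dense_param_count(16, [16, 19, 1])
```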
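Fit the model on the training set only. Fitting on the full X would let rows from X_test influence the saved weights.
model.fit(X_train, y_train, validation_split=0.33, epochs=300, callbacks=callbacks_list, verbose=True)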
Load in the weights of the best model
model.load_weights(filepath)
Re-compile the model
model.compile(loss='mean_squared_error', optimizer='adam')
Re-fit the model
model.fit(X_train, y_train, validation_split=0.33, epochs=10, callbacks=[EarlyStopping(patience=3)], verbose=True)
## Train on 387 samples, validate on 191 samples
## Epoch 1/10
## 387/387 [==============================] - 0s 394us/step - loss: 9.6275 - val_loss: 9.0791
## Epoch 2/10
## 387/387 [==============================] - 0s 23us/step - loss: 9.5175 - val_loss: 9.1212
## Epoch 3/10
## 387/387 [==============================] - 0s 23us/step - loss: 9.4638 - val_loss: 9.0533
## Epoch 4/10
## 387/387 [==============================] - 0s 21us/step - loss: 9.2282 - val_loss: 9.0321
## Epoch 5/10
## 387/387 [==============================] - 0s 23us/step - loss: 9.0893 - val_loss: 9.0648
## Epoch 6/10
## 387/387 [==============================] - 0s 23us/step - loss: 9.1010 - val_loss: 9.1082
## Epoch 7/10
## 387/387 [==============================] - 0s 23us/step - loss: 9.1061 - val_loss: 9.2011
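The log shows why training stopped after epoch 7. The best val_loss (9.0321) came at epoch 4, and with patience=3, three epochs without improvement end the run. The sketch below replays the logged val_loss values through a simplified version of that rule (min_delta=0, lower is better). It is not Keras's own code. The epochs where val_loss improves are also the epochs where a ModelCheckpoint with save_best_only=True, as used in the first fit, would save weights.

```python
def replay_early_stopping(val_losses, patience):
    """Return (epochs where val_loss improved, epoch training stops or None)."""
    best = float("inf")
    wait = 0
    improved_epochs = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # improvement: record it and reset the patience counter
            best = loss
            wait = 0
            improved_epochs.append(epoch)
        else:
            wait += 1
            if wait >= patience:
                return improved_epochs, epoch
    return improved_epochs, None


# val_loss values copied from the log above
val_losses = [9.0791, 9.1212, 9.0533, 9.0321, 9.0648, 9.1082, 9.2011]
improved, stopped_at = replay_early_stopping(val_losses, patience=3)
```

By default, EarlyStopping leaves the model with the weights from the last epoch run (epoch 7), not the best one. Passing `restore_best_weights=True` rolls them back to epoch 4.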
Get predictions
predictions = model.predict(X_test)[:,0]
Plot a scatterplot of the actual vs. predicted values with a trendline and a Pearson correlation (r)

Print the interpretation of the Pearson r correlation coefficient
## There is a very strong, positive linear relationship between the predicted and actual values.
Print the regression metrics
## MAE: 2.314
## MSE: 8.779
## RMSE: 2.963
## R-Squared: 0.948
Plot histogram of the residuals

Check the residuals for normality using the Shapiro-Wilk test
## Shapiro-Wilk test statistic: 1.0
## p-value: 0.65
## Fail to reject the null hypothesis. Data is normally distributed.
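How accurate was this model? As before, the figure below is the mean absolute error (MAE). It is the average size of a miss, not a bound that every prediction falls within.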
## This model was able to predict within +/- 2.314 wins
How long did it take to build this model?
## Time to complete the neural network model: 0.15 min.
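As we can see, the TPOTRegressor and the Keras neural network produce similarly performing models:

- MAE: 2.476 vs. 2.314 wins
- R-squared: 0.946 vs. 0.948

However, the neural network took 0.15 minutes to build, compared with 11.67 minutes for TPOT. When time is a constraint, the Keras neural network regressor may be the easier choice.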