User Score Classification with Neural Network and Keras

1 Intro

1.1 What We’ll Do

We will try to classify whether a user will give a game an above-average score based on the content of their review. We will build a neural network with Keras, with features extracted from the reviews using a text mining approach. I have done a similar classification task with different features and models, using the sentiment value of the reviews; you can check it here. I also used the Naive Bayes and Random Forest methods to classify the score, which you can check here.

1.2 The Dataset

The dataset consists of user reviews of the 100 best PC games on the Metacritic website. I have already scraped the data, which you can download here.

2 Data Preparation

First, we load the required packages.
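A minimal sketch of the setup; the exact package list is an assumption based on the steps that follow:

# packages assumed for this walkthrough
library(data.table)  # fast data import with fread
library(textclean)   # text cleaning helpers
library(tm)          # document-term matrix
library(caret)       # down-sampling and confusion matrix
library(ROCR)        # ROC, sensitivity-specificity, and precision-recall curves
library(keras)       # neural network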

2.1 Import Data

We will import the dataset using fread for faster reading.
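A sketch of the import step; the file name here is a placeholder for the scraped dataset:

# read the scraped reviews (hypothetical file name)
reviews <- fread("metacritic_user_reviews.csv")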

2.2 Data PreProcessing

We want to clean the text by removing URLs and word elongations. We will also replace "?" with "questionmark" and "!" with "exclamationmark" to see whether these characters are useful for our analysis.
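One way to do this with textclean; the column name text is an assumption:

# remove URLs and word elongation (e.g. "goooood" -> "good"),
# then encode "?" and "!" as words so they survive tokenization
reviews$text <- replace_url(reviews$text)
reviews$text <- replace_word_elongation(reviews$text)
reviews$text <- gsub("\\?", " questionmark ", reviews$text)
reviews$text <- gsub("!", " exclamationmark ", reviews$text)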

Since we want to classify the score into above average or below average, we need to add the label into the data.
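One way to add the label, assuming the user score lives in a column called score and "above average" means above the mean score:

# label each review as Above or Below the average score
reviews$label <- ifelse(reviews$score > mean(reviews$score), "Above", "Below")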

Finally, we will build a document-term matrix, where each row represents a review and the columns consist of the top 1024 words across all reviews. We will use this matrix to classify whether a user will give an above-average score based on the appearance of one or more terms.
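A sketch of the document-term matrix construction with tm, keeping the 1024 most frequent terms; the preprocessing options are assumptions:

# build a corpus and a document-term matrix
corpus <- VCorpus(VectorSource(reviews$text))
dtm    <- DocumentTermMatrix(corpus, control = list(tolower = TRUE,
                                                    removePunctuation = TRUE))

# keep only the 1024 most frequent terms across all reviews
dtm_mat   <- as.matrix(dtm)
top_terms <- names(sort(colSums(dtm_mat), decreasing = TRUE))[1:1024]
dtm_mat   <- dtm_mat[, top_terms]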

3 Exploratory Data Analysis

We will check whether there is a class imbalance by looking at the proportion of the target variable.
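Assuming the label column from earlier, the class proportions can be checked with:

# proportion of each class
prop.table(table(reviews$label))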


    Above     Below 
0.6858394 0.3141606 

It turns out there is a class imbalance: the above-average class is roughly twice as big as the below-average class.

4 Modeling

4.1 Cross-Validation

We will split the data into a training set, a validation set, and a testing set. First, we split the data into the training and testing sets.
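A sketch of the first split; the 80/20 proportion is an assumption:

# 80/20 split into training and testing sets
set.seed(123)
train_idx <- sample(nrow(dtm_mat), size = 0.8 * nrow(dtm_mat))
train_x   <- dtm_mat[train_idx, ]
train_y   <- reviews$label[train_idx]
test_x    <- dtm_mat[-train_idx, ]
test_y    <- reviews$label[-train_idx]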

We will balance the classes in the training set and normalize all numeric features. Then we split the testing set into the validation set and the testing set itself.
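One way to do this, using caret's downSample for the class balancing and a z-score normalization; the details are assumptions:

# down-sample the majority class so both classes are 50/50
balanced <- downSample(x = as.data.frame(train_x), y = factor(train_y),
                       yname = "label")
train_y  <- balanced$label
train_x  <- as.matrix(balanced[, colnames(balanced) != "label"])

# normalize features using the training set statistics
train_x <- scale(train_x)
test_x  <- scale(test_x, center = attr(train_x, "scaled:center"),
                 scale = attr(train_x, "scaled:scale"))

# split the testing set in half into validation and final test sets
val_idx <- sample(nrow(test_x), size = 0.5 * nrow(test_x))
val_x   <- test_x[val_idx, ];  val_y  <- test_y[val_idx]
test_x  <- test_x[-val_idx, ]; test_y <- test_y[-val_idx]

prop.table(table(train_y))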


Above Below 
  0.5   0.5 

We adjust the data to get a proper structure before we feed it into Keras.
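A sketch of the final reshaping, encoding the labels as 0/1 for a sigmoid output:

# keras expects a numeric matrix and a numeric target
train_x <- as.matrix(train_x)
val_x   <- as.matrix(val_x)
train_y <- ifelse(train_y == "Above", 1, 0)
val_y   <- ifelse(val_y == "Above", 1, 0)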

4.2 Neural Network

We will build the neural network architecture.

Our model has several layers. The dense layers transform the data, using the ReLU activation function on the first and second dense layers. There are also dropout layers to prevent the model from overfitting. Finally, the output layer squashes the result into the range [0, 1] with the sigmoid function, giving the probability that a review belongs to a particular class. The number of epochs represents how many times the model performs feed-forward and back-propagation over the training data.
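A sketch of an architecture matching that description; the layer sizes, dropout rates, batch size, and epoch count are assumptions:

# two hidden dense layers with ReLU, dropout in between,
# and a sigmoid output for the class probability
model <- keras_model_sequential() %>%
  layer_dense(units = 512, activation = "relu", input_shape = ncol(train_x)) %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 256, activation = "relu") %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 1, activation = "sigmoid")

model %>% compile(
  optimizer = "adam",
  loss      = "binary_crossentropy",
  metrics   = "accuracy"
)

# each epoch is one full feed-forward / back-propagation pass over the data
history <- model %>% fit(
  train_x, train_y,
  epochs = 10, batch_size = 128,
  validation_data = list(val_x, val_y)
)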

Our model has an accuracy of 86.4% on the training set and 74.18% on the validation set at the end of the training phase. Since the gap between training and validation accuracy is not too big, we can conclude that our model is not overfitting.

5 Evaluation

5.1 Performance

We will check the confusion matrix for the training set and the testing set, shown below in that order.
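A sketch of how these matrices could be produced, assuming a 0.5 cutoff on the predicted probabilities:

# predicted classes from predicted probabilities (cutoff 0.5)
train_pred  <- ifelse(predict(model, train_x) > 0.5, "Above", "Below")
test_pred   <- ifelse(predict(model, test_x)  > 0.5, "Above", "Below")
train_truth <- ifelse(train_y == 1, "Above", "Below")

# confusion matrices: training set first, then testing set
table(Prediction = train_pred, Truth = train_truth)
table(Prediction = test_pred,  Truth = test_y)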

          Truth
Prediction Above Below
     Above  6894   689
     Below   627  6832
          Truth
Prediction Above Below
     Above  1555   260
     Below   497   679

We will check the performance of our model on the training set.

Next, we check the performance on the testing set.
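A sketch of the metric calculations with caret, treating "Above" as the positive class:

# accuracy, sensitivity/recall, specificity, and precision in one call
confusionMatrix(factor(train_pred), factor(train_truth), positive = "Above")
confusionMatrix(factor(test_pred),  factor(test_y),      positive = "Above")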

5.2 ROC Curve
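A sketch of the ROC curve with ROCR, using the predicted probabilities on the testing set:

# ROCR needs the raw probabilities and the true labels
test_prob <- predict(model, test_x)
pred_obj  <- prediction(as.numeric(test_prob), test_y == "Above")

# true positive rate vs. false positive rate
plot(performance(pred_obj, "tpr", "fpr"), main = "ROC Curve")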

5.3 Sensitivity-Specificity Curve
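The same ROCR object can draw the sensitivity-specificity curve:

# sensitivity vs. specificity, reusing pred_obj from the ROC sketch
plot(performance(pred_obj, "sens", "spec"),
     main = "Sensitivity-Specificity Curve")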

5.4 Precision-Recall Curve
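And likewise the precision-recall curve:

# precision vs. recall, reusing pred_obj from the ROC sketch
plot(performance(pred_obj, "prec", "rec"),
     main = "Precision-Recall Curve")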

6 Model Improvement

6.1 Only Change the Threshold

By changing the threshold, we have increased the precision (even if only by a small margin).
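A sketch of the threshold change; the 0.6 cutoff here is a hypothetical value:

# raise the cutoff so the model only predicts "Above" when it is more
# confident, trading some recall for precision (0.6 is hypothetical)
test_pred_t <- ifelse(test_prob > 0.6, "Above", "Below")
confusionMatrix(factor(test_pred_t), factor(test_y), positive = "Above")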

6.2 Tuning the Model Parameters

We will try to use tf-idf instead of the raw term frequency. Tf-idf (term frequency - inverse document frequency) represents how unique a term or word is across reviews.
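A sketch of swapping in tf-idf weighting with tm, assuming the same vocabulary as before; the rest of the pipeline stays the same:

# weight terms by tf-idf instead of raw counts
dtm_tfidf <- DocumentTermMatrix(corpus,
                                control = list(weighting = weightTfIdf,
                                               tolower = TRUE,
                                               removePunctuation = TRUE))
dtm_mat_tfidf <- as.matrix(dtm_tfidf)[, top_terms]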

The numeric features will be scaled using min-max scaling instead of normalization.
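A minimal min-max scaler; the split names train_x2 and test_x2 are hypothetical tf-idf splits built the same way as before, and applying the training set's minima and maxima to the other splits is an assumption:

# scale each column into [0, 1] using the training set's min and max
# (columns with zero range would need special handling)
min_max <- function(x, mins, maxs) sweep(sweep(x, 2, mins), 2, maxs - mins, "/")

mins <- apply(train_x2, 2, min)
maxs <- apply(train_x2, 2, max)
train_x2 <- min_max(train_x2, mins, maxs)
test_x2  <- min_max(test_x2,  mins, maxs)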


Above Below 
  0.5   0.5 

We build a new neural network architecture.

Our model has an accuracy of 88.21% on the training set and 75.7% on the validation set at the end of the training phase. Since the gap between training and validation accuracy is not too big, we can conclude that our model is not overfitting.
We then check the model performance.

6.2.1 ROC Curve

6.2.2 Sensitivity-Specificity Curve

6.2.3 Precision-Recall Curve

We will try to change the threshold to 0.65.

7 Conclusion

This is the summary of our model performance. Model 1 refers to the neural network that uses the term frequency, while Model 2 refers to the one that uses tf-idf.

There is no apparent difference in performance between the model using term frequency and the one using tf-idf.

2019-10-06