Decision Tree

This decision tree model predicts smartphone type (iPhone vs. Android) from several characteristics and personality traits: gender, age, honesty/humility, extraversion, conscientiousness, avoidance similarity, emotionality, agreeableness, socio-economic status, how long the current device has been owned, and whether the phone is perceived as a status object. The overall accuracy of this model is 78.3%, meaning that it correctly predicts the type of smartphone (Android / iPhone) about 78% of the time. Each row of the confusion matrix sums to 1, so its entries are proportions within each actual class rather than counts.

Overall accuracy = 0.783 

Confusion matrix 
      Predicted (cv)
Actual  [,1]  [,2]
  [1,] 0.658 0.342
  [2,] 0.129 0.871
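
As a minimal sketch of how the two summaries above relate, the following plain Python computes an overall accuracy and a row-normalized confusion matrix from paired actual and predicted labels. The function names and the ten example labels are invented for illustration and are not the study's data.

```python
from collections import Counter

def overall_accuracy(actual, predicted):
    """Share of observations whose predicted class equals the actual class."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def confusion_proportions(actual, predicted, classes):
    """Row-normalized confusion matrix: one row per actual class."""
    counts = Counter(zip(actual, predicted))
    matrix = []
    for a in classes:
        row_total = sum(counts[(a, p)] for p in classes)
        # Guard against a class that never occurs in `actual`.
        matrix.append([counts[(a, p)] / row_total if row_total else 0.0
                       for p in classes])
    return matrix

# Hypothetical labels (4 Android owners, 6 iPhone owners).
actual = ["Android"] * 4 + ["iPhone"] * 6
predicted = ["Android", "Android", "Android", "iPhone",
             "Android", "iPhone", "iPhone", "iPhone", "iPhone", "iPhone"]

accuracy = overall_accuracy(actual, predicted)
matrix = confusion_proportions(actual, predicted, ["Android", "iPhone"])
print("Overall accuracy =", accuracy)
for row in matrix:
    print([round(x, 3) for x in row])
```

Because each row is normalized separately, the overall accuracy is a class-size-weighted average of the diagonal entries, not their simple mean. This is why 0.783 does not equal the plain average of 0.658 and 0.871.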

Model Tuning

Decision tree model arguments can be tuned to reach a better accuracy. For instance, you can set the minimum number of items a group must contain before it can be split (as low as 1), or specify the smallest number of observations allowed in a leaf node at the bottom of the tree. You can also set a complexity argument for how much a split must improve the model before a leaf node is split. Fine-tuning the previous model raises accuracy from 78.3% to 83.7%, so the smartphone type is now correctly predicted about 84% of the time.

Overall accuracy = 0.837 

Confusion matrix 
      Predicted (cv)
Actual  [,1]  [,2]
  [1,] 0.808 0.192
  [2,] 0.142 0.858
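
The three tuning arguments described in this section can be pictured as three checks that a candidate split must pass. The sketch below is a simplified illustration in plain Python, not the routine that produced the output above: the names `min_split`, `min_bucket` and `cp` are hypothetical, the default values are arbitrary, and the improvement is measured as an absolute drop in Gini impurity, which is an assumption made for simplicity.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def accept_split(parent, left, right, min_split=20, min_bucket=7, cp=0.01):
    """Return True only if a candidate split passes all three checks.

    min_split  : smallest node size at which a split is attempted
    min_bucket : smallest number of observations allowed in a leaf
    cp         : smallest impurity improvement worth splitting for
    (all three names and defaults are hypothetical)
    """
    if len(parent) < min_split:
        return False
    if len(left) < min_bucket or len(right) < min_bucket:
        return False
    n = len(parent)
    child_impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
    return gini(parent) - child_impurity >= cp

# Hypothetical node with 10 iPhone and 10 Android owners.
parent = ["iPhone"] * 10 + ["Android"] * 10
left = ["iPhone"] * 8 + ["Android"] * 2
right = ["iPhone"] * 2 + ["Android"] * 8

print(accept_split(parent, left, right))                 # passes all checks
print(accept_split(parent, left, right, min_bucket=11))  # leaves too small
print(accept_split(parent, left, right, cp=0.2))         # gain too small
```

Loosening these settings lets the tree grow deeper and fit the training data more closely, while tightening them keeps it smaller, which is why the choice is usually evaluated on held-out or cross-validated accuracy.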

Random Forest

Combining the results of many small decision trees is described as an ‘ensemble’ method. When each tree is trained on a random resample of the data and a random subset of the variables, and the trees' predictions are combined by voting, you get what is referred to as a ‘random forest’ (an approach based on ‘bagging’ rather than ‘boosting’). The accuracy of this model is roughly 65%, lower than that of either single-tree model above. The forest also lets you understand which variables in the model are more important, based on how often they appear closer to the root of the tree. Here, the two most important variables for predicting one type of smartphone over the other appear to be status symbol and age.

Overall accuracy = 0.65 

Confusion matrix 
      Predicted (cv)
Actual  [,1]  [,2]
  [1,] 0.479 0.521
  [2,] 0.229 0.771
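
To make the idea of a forest concrete, here is a deliberately small sketch in plain Python. It assumes numeric features and simplifies each tree to a single split (a "stump"); every tree is fitted on a bootstrap resample using a random subset of the features, and the forest predicts by majority vote. All function names and the toy data are invented for illustration, and counting how often a feature is chosen at the root is only a crude stand-in for a real importance measure.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, features):
    """Fit a one-split tree on the best (feature, threshold) among `features`.

    Returns (feature, threshold, left_label, right_label); feature is None
    when no split is possible and the stump predicts a constant.
    """
    best = None
    for f in features:
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            l_lab, r_lab = majority(left), majority(right)
            errors = (sum(y != l_lab for y in left)
                      + sum(y != r_lab for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:
        label = majority(labels)
        return (None, None, label, label)
    return best[1:]

def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    if f is None:
        return l_lab
    return l_lab if row[f] <= t else r_lab

def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    all_features = list(range(len(rows[0])))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap resample
        feats = rng.sample(all_features, n_features)    # random feature subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Majority vote over the individual trees."""
    return majority([predict_stump(s, row) for s in forest])

# Hypothetical data: two numeric features, two well-separated classes.
rows = [(0, 1), (1, 0), (0, 0), (1, 1), (10, 11), (11, 10), (10, 10), (11, 11)]
labels = ["Android"] * 4 + ["iPhone"] * 4

forest = fit_forest(rows, labels)
predictions = [predict_forest(forest, r) for r in rows]
root_counts = Counter(stump[0] for stump in forest)  # feature used at each root
print(predictions)
print(root_counts)
```

Real forests grow deeper trees and draw a new feature subset at every split, but the two sources of randomness and the final vote are the same as in this sketch.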

Interaction Plot

An interaction plot shows the prediction of the forest for different values of the selected variables. You may want to select variables that are important according to the importance plot obtained previously, and see how they interact with other features. If the interaction is meaningful, you will see blocks of red/purple indicating the prediction in different regions. In the case of using the phone as a status symbol, you see that the lower a person scored on honesty/humility, the more likely that person is to choose a certain type of smartphone (an iPhone, in this case).
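
The grid behind such a plot can be sketched as follows: two variables are varied over chosen values while every other variable is held fixed, and the model's prediction is recorded for each combination. This is a minimal illustration in plain Python. The `toy_predict` rule, the variable names and the value ranges are all invented stand-ins for a fitted forest and are not results from the model above.

```python
def prediction_grid(predict, base_row, var_a, values_a, var_b, values_b):
    """Predict over every combination of two variables, others held fixed.

    Returns one list per value of var_a, each holding one prediction
    per value of var_b.
    """
    grid = []
    for a in values_a:
        grid_row = []
        for b in values_b:
            row = dict(base_row)  # copy so base_row is not modified
            row[var_a] = a
            row[var_b] = b
            grid_row.append(predict(row))
        grid.append(grid_row)
    return grid

def toy_predict(row):
    """Hypothetical stand-in for a fitted model (invented rule)."""
    if row["status_symbol"] >= 3 and row["honesty_humility"] <= 2:
        return "iPhone"
    return "Android"

base = {"age": 30, "status_symbol": 1, "honesty_humility": 1}
status_values = [1, 3, 5]
honesty_values = [1, 2, 3, 4]

grid = prediction_grid(toy_predict, base,
                       "status_symbol", status_values,
                       "honesty_humility", honesty_values)
for value, grid_row in zip(status_values, grid):
    print(value, grid_row)
```

Coloring each cell of this grid by its predicted class gives the blocks described above; a prediction that changes along one variable only for certain values of the other is what indicates an interaction.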