Load the libraries + functions

Load all the libraries and functions that you will use for the rest of the assignment. It is helpful to define your libraries and functions at the top of a report so that others know what they need for the report to compile correctly.

The data for this project has already been loaded. You will be distinguishing between the categories of nerd and geek to determine the influence of respective variables on their category definition.

library(Rling)
data(nerd)
head(nerd)
##   Noun Num Century Register    Eval
## 1 nerd  pl      XX     ACAD Neutral
## 2 nerd  pl     XXI     ACAD Neutral
## 3 nerd  pl      XX     ACAD Neutral
## 4 nerd  pl      XX     ACAD Neutral
## 5 nerd  pl      XX     ACAD Neutral
## 6 nerd  pl     XXI     ACAD Neutral
#install.packages("party")
library(party)
## Warning: package 'party' was built under R version 3.6.1
## Loading required package: grid
## Loading required package: mvtnorm
## Loading required package: modeltools
## Loading required package: stats4
## Loading required package: strucchange
## Warning: package 'strucchange' was built under R version 3.6.1
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: sandwich
## Warning: package 'sandwich' was built under R version 3.6.1
table(nerd$Noun)
## 
## geek nerd 
##  670  646

Description of the data

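Dependent variable: Noun, the word used (geek or nerd).

Independent variables: Num, Century (XX or XXI), Register, and Eval, the four predictors in the model formula.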
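
One way to support the description is to tabulate the levels of each variable. The sketch below rebuilds only the six rows printed by head(nerd) above so that it runs without Rling; describe_levels() is a hypothetical helper name, and on the full data one would pass nerd itself.

```r
# First six rows of `nerd`, copied from the head() output above.
# This is only a stand-in; the full data set has 1316 rows.
nerd_head <- data.frame(
  Noun     = rep("nerd", 6),
  Num      = rep("pl", 6),
  Century  = c("XX", "XXI", "XX", "XX", "XX", "XXI"),
  Register = rep("ACAD", 6),
  Eval     = rep("Neutral", 6),
  stringsAsFactors = TRUE
)

# Hypothetical helper: count how often each level of each column occurs.
describe_levels <- function(df) {
  lapply(df, function(column) table(column))
}

level_counts <- describe_levels(nerd_head)
level_counts$Century
```

Calling describe_levels(nerd) on the full data would list every level of every variable, which is what the description needs.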

Conditional inference model

set.seed(12345)
tree.output = ctree(Noun ~ Num + Century + Register + Eval, data = nerd)

Make a plot

plot(tree.output)

Interpret the categories

Conditional inference model predictiveness

Final_Result = table(predict(tree.output), nerd$Noun)
Final_Result
##       
##        geek nerd
##   geek  227   61
##   nerd  443  585
# Overall accuracy (%)
sum(diag(Final_Result)) / sum(Final_Result) * 100
## [1] 61.70213
# Accuracy for actual "geek" cases (%)
Final_Result[1, 1] / sum(Final_Result[, 1]) * 100
## [1] 33.8806
# Accuracy for actual "nerd" cases (%)
Final_Result[2, 2] / sum(Final_Result[, 2]) * 100
## [1] 90.55728
# Actual proportion of "geek" in the data
sum(Final_Result[, 1]) / sum(Final_Result)
## [1] 0.5091185
# Proportion of cases the tree predicts as "geek"
sum(Final_Result[1, ]) / sum(Final_Result)
## [1] 0.218845
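
The same figures can be collected in one step. This is a minimal sketch that rebuilds the confusion table from the counts printed above, so it runs without party or Rling; summarise_confusion() is a hypothetical helper name, not a function from either package.

```r
# Tree confusion matrix copied from the output above
# (rows = predicted, columns = actual; matrix() fills column by column).
tree_confusion <- matrix(
  c(227, 443, 61, 585),
  nrow = 2,
  dimnames = list(predicted = c("geek", "nerd"),
                  actual    = c("geek", "nerd"))
)

# Hypothetical helper: all values are percentages.
summarise_confusion <- function(confusion) {
  list(
    accuracy        = sum(diag(confusion)) / sum(confusion) * 100,
    class_accuracy  = diag(confusion) / colSums(confusion) * 100,
    predicted_share = rowSums(confusion) / sum(confusion) * 100,
    actual_share    = colSums(confusion) / sum(confusion) * 100
  )
}

tree_summary <- summarise_confusion(tree_confusion)
round(tree_summary$accuracy, 2)
round(tree_summary$class_accuracy, 2)
round(tree_summary$predicted_share, 2)
```

Comparing predicted_share with actual_share shows how strongly the tree over-predicts one category.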

Random forests

forest.output = cforest(Noun ~ Num + Century + Register + Eval, 
                        data = nerd,
                        controls = cforest_unbiased(ntree = 1000,
                                                    mtry = 2))

Variable importance

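The most important variables in the output are Eval (0.056) and Century (0.023); Num and Register have importances near zero.

Eval: the evaluation associated with the word in context (for example, Neutral).

Century: the time period of the text, 20th (XX) or 21st (XXI) century.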

forest.importance = varimp(forest.output,
                           conditional = TRUE)
round(forest.importance, 3)
##      Num  Century Register     Eval 
##   -0.002    0.023   -0.003    0.056
dotchart(sort(forest.importance),
         main = "Variable Importance")
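
A commonly used rule of thumb treats a variable as informative only if its importance exceeds the absolute value of the most negative importance, since negative scores reflect random variation around zero. The sketch below applies that heuristic to the rounded values printed above; it is an illustration of the rule, not part of the party output, and the object names are assumptions.

```r
# Rounded importances copied from the output above.
forest_importance <- c(Num = -0.002, Century = 0.023,
                       Register = -0.003, Eval = 0.056)

# Heuristic cut-off: size of the most negative score.
# This assumes at least one importance is negative, as is the case here.
noise_threshold <- abs(min(forest_importance))

# Keep the variables whose importance clears the cut-off.
informative <- names(forest_importance)[forest_importance > noise_threshold]
sort(forest_importance[informative], decreasing = TRUE)
```

With these values the cut-off is 0.003, so only Eval and Century clear it.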

Forest model predictiveness

The forest model, with an accuracy of 63.37%, is slightly more accurate than the tree model (61.70%) and much less biased toward predicting “nerd”. The tree model's geek:nerd prediction split was 22:78; the random forest's split of 37:63 is closer to the actual split of 51:49. Accuracy in predicting “geek” improves from 33.9% to 50.3%, at the cost of reducing accuracy for “nerd” from 90.6% to 76.9%.

forest.outcomes = table(predict(forest.output), nerd$Noun)
forest.outcomes
##       
##        geek nerd
##   geek  337  149
##   nerd  333  497
# Overall accuracy (%)
sum(diag(forest.outcomes)) / sum(forest.outcomes) * 100
## [1] 63.37386
# Accuracy for actual "geek" cases (%)
forest.outcomes[1, 1] / sum(forest.outcomes[, 1]) * 100
## [1] 50.29851
# Accuracy for actual "nerd" cases (%)
forest.outcomes[2, 2] / sum(forest.outcomes[, 2]) * 100
## [1] 76.93498
# Proportion of cases the forest predicts as "geek"
sum(forest.outcomes[1, ]) / sum(forest.outcomes)
## [1] 0.3693009
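
To put the two models side by side, the sketch below rebuilds both confusion tables from the counts printed in this report and adds a majority-class baseline (always guessing the more frequent noun). The helper names make_confusion() and summarise_model() are hypothetical, and the baseline is an addition for context rather than something computed in the original analysis.

```r
# Build a 2 x 2 table (rows = predicted, columns = actual)
# from counts given column by column.
make_confusion <- function(counts) {
  matrix(counts, nrow = 2,
         dimnames = list(predicted = c("geek", "nerd"),
                         actual    = c("geek", "nerd")))
}

tree_confusion   <- make_confusion(c(227, 443, 61, 585))
forest_confusion <- make_confusion(c(337, 333, 149, 497))

# All values are percentages.
summarise_model <- function(confusion) {
  c(accuracy       = sum(diag(confusion)) / sum(confusion) * 100,
    geek_accuracy  = confusion["geek", "geek"] / sum(confusion[, "geek"]) * 100,
    nerd_accuracy  = confusion["nerd", "nerd"] / sum(confusion[, "nerd"]) * 100,
    predicted_geek = sum(confusion["geek", ]) / sum(confusion) * 100)
}

comparison <- rbind(tree   = summarise_model(tree_confusion),
                    forest = summarise_model(forest_confusion))

# Accuracy of always predicting the more frequent noun.
baseline_accuracy <- max(colSums(tree_confusion)) / sum(tree_confusion) * 100

round(comparison, 2)
round(baseline_accuracy, 2)
```

Because geek and nerd are almost equally frequent, the baseline sits near 51%, so both models improve on it by roughly 11 to 12 percentage points.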

Thought question

Urban Dictionary humorously defines hot and cool nerds as rare and special creatures whose knowledge spans a wide variety of subjects. Hence it would be useful to examine how different authors define geek and nerd.