Load all the libraries or functions that you will use for the rest of the assignment. It is helpful to define your libraries and functions at the top of a report, so that others know what they need for the report to compile correctly.
The data for this project has already been loaded. You will distinguish between the nouns nerd and geek to determine how much each variable influences which of the two categories is used.
library(Rling)
library(party)
## Loading required package: grid
## Loading required package: mvtnorm
## Loading required package: modeltools
## Loading required package: stats4
## Loading required package: strucchange
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: sandwich
data(nerd)
head(nerd)
##   Noun Num Century Register    Eval
## 1 nerd  pl      XX     ACAD Neutral
## 2 nerd  pl     XXI     ACAD Neutral
## 3 nerd  pl      XX     ACAD Neutral
## 4 nerd  pl      XX     ACAD Neutral
## 5 nerd  pl      XX     ACAD Neutral
## 6 nerd  pl     XXI     ACAD Neutral
table(nerd$Noun)
##
## geek nerd
##  670  646
Dependent variable: Noun (nerd or geek)
Independent variables: Num, Century, Register, Eval
Use ctree() to create a conditional inference model.
set.seed(12345)
tree.output = ctree(Noun ~ Num + Century + Register + Eval, data = nerd)
plot(tree.output)
outcomes = table(predict(tree.output), nerd$Noun)
outcomes
##
##        geek nerd
##   geek  227   61
##   nerd  443  585
sum(diag(outcomes)) / sum(outcomes) * 100
## [1] 61.70213
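# Percent of observed "geek" tokens that the tree classified as geek
outcomes[1, 1] / sum(outcomes[, 1]) * 100
## [1] 33.8806
# Percent of observed "nerd" tokens that the tree classified as nerd
outcomes[2, 2] / sum(outcomes[, 2]) * 100
## [1] 90.55728
# Proportion of all tokens that are observed "geek"
sum(outcomes[, 1]) / sum(outcomes)
## [1] 0.5091185
# Proportion of all tokens that the tree predicted as "geek"
sum(outcomes[1, ]) / sum(outcomes)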
## [1] 0.218845
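The percentages above can be collected with one small helper. The block below is a minimal sketch, assuming the confusion table has predicted nouns in rows and observed nouns in columns, as `outcomes` does. The names `outcomes_demo`, `class_metrics`, and `tree_metrics` are hypothetical, and the counts are copied from the table printed above so that the block runs without the `party` or `Rling` packages.

```r
# Rebuild the ctree confusion table from the counts printed above
# (rows = predicted noun, columns = observed noun).
outcomes_demo <- matrix(c(227, 443, 61, 585), nrow = 2,
                        dimnames = list(predicted = c("geek", "nerd"),
                                        observed  = c("geek", "nerd")))

# Hypothetical helper: overall accuracy and per-noun accuracy, in percent.
class_metrics <- function(tab) {
  c(accuracy = sum(diag(tab)) / sum(tab) * 100,
    geek     = tab["geek", "geek"] / sum(tab[, "geek"]) * 100,
    nerd     = tab["nerd", "nerd"] / sum(tab[, "nerd"]) * 100)
}

tree_metrics <- class_metrics(outcomes_demo)
round(tree_metrics, 2)
```

Indexing by name rather than by position (`outcomes[1]`, `outcomes[4]`) is a design choice here: it keeps each figure tied to the noun it describes even if the level order changes.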
forest.output = cforest(Noun ~ Num + Century + Register + Eval,
                        data = nerd,
                        controls = cforest_unbiased(ntree = 1000,
                                                    mtry = 3))
forest.importance = varimp(forest.output,
                           conditional = TRUE)
round(forest.importance, 2)
##      Num  Century Register     Eval
##     0.00     0.02     0.00     0.05
dotchart(sort(forest.importance),
         main = "Conditional Importance of Variables")
forest.outcomes = table(predict(forest.output), nerd$Noun)
forest.outcomes
##
##        geek nerd
##   geek  365  175
##   nerd  305  471
sum(diag(forest.outcomes)) / sum(forest.outcomes) * 100
## [1] 63.52584
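# Percent of observed "geek" tokens that the forest classified as geek
forest.outcomes[1, 1] / sum(forest.outcomes[, 1]) * 100
## [1] 54.47761
# Percent of observed "nerd" tokens that the forest classified as nerd
forest.outcomes[2, 2] / sum(forest.outcomes[, 2]) * 100
## [1] 72.91022
# Proportion of all tokens that the forest predicted as "geek"
sum(forest.outcomes[1, ]) / sum(forest.outcomes)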
## [1] 0.4103343
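To read the two accuracy figures side by side, it may help to set them against a simple baseline: always predicting the more frequent noun (geek, 670 of 1316 tokens). The block below is a rough sketch with the counts copied from the two confusion tables printed in this report. The names `tree_tab`, `forest_tab`, `accuracy`, and `comparison` are hypothetical and are not part of `party`.

```r
# Confusion tables rebuilt from the printed output
# (rows = predicted noun, columns = observed noun).
labels <- list(predicted = c("geek", "nerd"), observed = c("geek", "nerd"))
tree_tab   <- matrix(c(227, 443, 61, 585), nrow = 2, dimnames = labels)
forest_tab <- matrix(c(365, 305, 175, 471), nrow = 2, dimnames = labels)

# Overall accuracy in percent: correct predictions sit on the diagonal.
accuracy <- function(tab) sum(diag(tab)) / sum(tab) * 100

# Baseline: share of the most frequent observed noun.
baseline <- max(colSums(tree_tab)) / sum(tree_tab) * 100

comparison <- c(baseline = baseline,
                ctree    = accuracy(tree_tab),
                cforest  = accuracy(forest_tab))
round(comparison, 2)
```

Both models are scored on the same data they were fit to, so these figures describe fit, not performance on unseen tokens.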
I think it would be helpful if we add context to the discussion.