This data set includes the votes of each member of the U.S. House of Representatives on the 16 key votes identified by the Congressional Quarterly Almanac (CQA). The CQA lists nine different types of votes: voted for, paired for, and announced for (these three simplified to yea); voted against, paired against, and announced against (these three simplified to nay); and voted present, voted present to avoid conflict of interest, and did not vote or otherwise make a position known (these three simplified to an unknown disposition).
A data frame with 435 observations on 17 variables:
1 Class Name: 2 (democrat, republican)
2 handicapped-infants: 2 (y,n)
3 water-project-cost-sharing: 2 (y,n)
4 adoption-of-the-budget-resolution: 2 (y,n)
5 physician-fee-freeze: 2 (y,n)
6 el-salvador-aid: 2 (y,n)
7 religious-groups-in-schools: 2 (y,n)
8 anti-satellite-test-ban: 2 (y,n)
9 aid-to-nicaraguan-contras: 2 (y,n)
10 mx-missile: 2 (y,n)
11 immigration: 2 (y,n)
12 synfuels-corporation-cutback: 2 (y,n)
13 education-spending: 2 (y,n)
14 superfund-right-to-sue: 2 (y,n)
15 crime: 2 (y,n)
16 duty-free-exports: 2 (y,n)
17 export-administration-act-south-africa: 2 (y,n)
Naïve Bayes implementations typically handle NA values in one of two ways: by dropping every record that contains an NA, or by skipping only the NA values themselves. In e1071, the formula interface of the naiveBayes function exposes this choice through the na.action argument: na.omit drops the record, while na.pass (the default) keeps the record and leaves its missing values out of the probability counts.
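The difference between the two strategies can be seen on a small example. The sketch below uses only base R and a made-up five-row data frame (`toy`, `vote1`, `vote2` are hypothetical names, not part of HouseVotes84). It imitates the two behaviours by hand and does not call naiveBayes itself.

```r
# Toy data (hypothetical, not taken from HouseVotes84): two vote columns with missing values
toy <- data.frame(
  party = factor(c("democrat", "democrat", "democrat", "republican", "republican")),
  vote1 = factor(c("y", NA, "y", "n", "n"), levels = c("n", "y")),
  vote2 = factor(c("y", "y", "n", "n", NA), levels = c("n", "y"))
)

# Strategy 1 (in the spirit of na.omit): drop every record that contains any NA
toy_omit <- toy[complete.cases(toy), ]
nrow(toy_omit)

# Conditional proportions of vote2 given party, using complete records only
prop_omit <- prop.table(table(toy_omit$party, toy_omit$vote2), margin = 1)

# Strategy 2 (in the spirit of na.pass): keep all records; table() leaves NA
# cells out by default, so vote2 uses every record where vote2 is observed
prop_pass <- prop.table(table(toy$party, toy$vote2), margin = 1)

prop_omit["democrat", "y"]   # based on 2 democrat records
prop_pass["democrat", "y"]   # based on 3 democrat records
```

In this toy case the second democrat record is discarded under the first strategy only because vote1 is missing, even though its vote2 value is known. With many sparsely missing votes, dropping whole records can throw away a large share of the data.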
library(e1071)
data(HouseVotes84, package = "mlbench")
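# Split into training and test sets:
# the first 75% of the 435 observations (rows 1-326) are used for training,
# the rest (rows 327-435) for testing.
# Column 1 (Class) is dropped so that only the 16 votes remain as predictors.
hv_train <- HouseVotes84[1:326, -1]
hv_test <- HouseVotes84[327:435, -1]
# Save the class labels (party) separately
hv_train_labels <- HouseVotes84[1:326, ]$Class
hv_test_labels <- HouseVotes84[327:435, ]$Class
# Fit the naive Bayes classifier and predict the party of each test record
hv_classifier <- naiveBayes(hv_train, hv_train_labels)
hv_test_pred <- predict(hv_classifier, hv_test)
head(hv_test_pred)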
[1] democrat republican democrat democrat
[5] republican democrat
Levels: democrat republican
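The split above takes rows 1:326 for training and rows 327:435 for testing, in file order. As a minimal sketch of where the number 326 comes from, and of how a random split of the same size could be drawn instead, the following base R code works on row indices only. The seed value and the `*_idx_*` names are arbitrary choices for illustration, and the random split is an alternative, not what the analysis here uses.

```r
n <- 435                          # number of observations in the data set
n_train <- floor(0.75 * n)        # 75% of 435, rounded down

# Ordered split, as in the text: first n_train rows train, remaining rows test
train_idx_ordered <- seq_len(n_train)
test_idx_ordered  <- setdiff(seq_len(n), train_idx_ordered)

# Random split of the same size (an alternative, not used in the text)
set.seed(123)                     # arbitrary seed, only for reproducibility
train_idx_random <- sample(n, size = n_train)
test_idx_random  <- setdiff(seq_len(n), train_idx_random)

n_train
length(test_idx_ordered)
length(test_idx_random)
```

Either index vector could then be used in place of the literal ranges, for example `HouseVotes84[train_idx_random, -1]`. A random split guards against any ordering in the file, at the cost of results that depend on the seed.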
library(gmodels)
# Cross-tabulate predicted vs. actual party, showing column proportions only
CrossTable(hv_test_pred, hv_test_labels,
           prop.chisq = FALSE, prop.t = FALSE, prop.r = FALSE,
           dnn = c('predicted', 'actual'))
   Cell Contents
|-------------------------|
|                       N |
|           N / Col Total |
|-------------------------|

Total Observations in Table:  109

             | actual
   predicted |   democrat | republican |  Row Total |
-------------|------------|------------|------------|
    democrat |         55 |          3 |         58 |
             |      0.833 |      0.070 |            |
-------------|------------|------------|------------|
  republican |         11 |         40 |         51 |
             |      0.167 |      0.930 |            |
-------------|------------|------------|------------|
Column Total |         66 |         43 |        109 |
             |      0.606 |      0.394 |            |
-------------|------------|------------|------------|
hv_test_pred
[1] democrat republican democrat democrat
[5] republican democrat democrat democrat
[9] democrat republican democrat democrat
[13] democrat republican republican democrat
[17] democrat republican democrat republican
[21] republican republican democrat republican
[25] democrat republican democrat republican
[29] democrat democrat republican republican
[33] democrat republican democrat democrat
[37] democrat republican republican republican
[41] democrat democrat democrat republican
[45] democrat democrat republican republican
[49] republican republican democrat republican
[53] republican republican democrat democrat
[57] republican democrat republican republican
[61] democrat democrat republican democrat
[65] republican democrat republican democrat
[69] democrat democrat democrat republican
[73] democrat republican republican republican
[77] democrat republican republican republican
[81] democrat republican democrat republican
[85] republican democrat republican republican
[89] democrat democrat republican democrat
[93] democrat democrat republican democrat
[97] democrat democrat democrat democrat
[101] democrat republican democrat democrat
[105] republican democrat republican republican
[109] republican
Levels: democrat republican
Accuracy = (55 + 40) / 109 ≈ 87%
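The same figure can be reproduced from the cross table rather than by hand. The sketch below re-enters the four cell counts reported above into a matrix (the name `conf` is just for illustration) and derives the overall accuracy and the per-party proportion of correct predictions, which correspond to the column proportions shown in the table.

```r
# Confusion matrix copied from the CrossTable output above
# (rows = predicted, columns = actual; matrix() fills column by column)
conf <- matrix(c(55, 11,    # actual democrat: predicted democrat, republican
                 3, 40),    # actual republican: predicted democrat, republican
               nrow = 2,
               dimnames = list(predicted = c("democrat", "republican"),
                               actual    = c("democrat", "republican")))

# Overall accuracy: correctly classified records / all records
accuracy <- sum(diag(conf)) / sum(conf)

# Share of each actual party that was predicted correctly
per_class <- diag(conf) / colSums(conf)

round(accuracy, 3)
round(per_class, 3)
```

The per-party values show that the errors are not symmetric: 11 of the 66 actual democrats are misclassified, against 3 of the 43 actual republicans.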
# Refit with Laplace smoothing (laplace = 3) and repeat the evaluation
hv_classifier2 <- naiveBayes(hv_train, hv_train_labels, laplace = 3)
hv_test_pred2 <- predict(hv_classifier2, hv_test)
CrossTable(hv_test_pred2, hv_test_labels,
           prop.chisq = FALSE, prop.t = FALSE, prop.r = FALSE,
           dnn = c('predicted', 'actual'))
   Cell Contents
|-------------------------|
|                       N |
|           N / Col Total |
|-------------------------|

Total Observations in Table:  109

             | actual
   predicted |   democrat | republican |  Row Total |
-------------|------------|------------|------------|
    democrat |         55 |          3 |         58 |
             |      0.833 |      0.070 |            |
-------------|------------|------------|------------|
  republican |         11 |         40 |         51 |
             |      0.167 |      0.930 |            |
-------------|------------|------------|------------|
Column Total |         66 |         43 |        109 |
             |      0.606 |      0.394 |            |
-------------|------------|------------|------------|
hv_test_pred2
[1] democrat republican democrat democrat
[5] republican democrat democrat democrat
[9] democrat republican democrat democrat
[13] democrat republican republican democrat
[17] democrat republican democrat republican
[21] republican republican democrat republican
[25] democrat republican democrat republican
[29] democrat democrat republican republican
[33] democrat republican democrat democrat
[37] democrat republican republican republican
[41] democrat democrat democrat republican
[45] democrat democrat republican republican
[49] republican republican democrat republican
[53] republican republican democrat democrat
[57] republican democrat republican republican
[61] democrat democrat republican democrat
[65] republican democrat republican democrat
[69] democrat democrat democrat republican
[73] democrat republican republican republican
[77] democrat republican republican republican
[81] democrat republican democrat republican
[85] republican democrat republican republican
[89] democrat democrat republican democrat
[93] democrat democrat republican democrat
[97] democrat democrat democrat democrat
[101] democrat republican democrat democrat
[105] republican democrat republican republican
[109] republican
Levels: democrat republican
Accuracy = (55 + 40) / 109 ≈ 87%, identical to the result without Laplace smoothing.
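One plausible reason the smoothed model gives the same predictions is that smoothing changes the estimated probabilities only slightly when the counts are large. The sketch below assumes the usual add-k form of Laplace smoothing, (count + k) / (total + k × number of levels), which is my understanding of what the laplace argument does. The counts and the helper `laplace_estimate` are hypothetical and are not taken from the fitted model.

```r
# Add-k (Laplace) estimate of P(vote = level | party) from raw counts.
# Assumed form: (count + k) / (total + k * number of levels)
laplace_estimate <- function(counts, laplace = 0) {
  (counts + laplace) / (sum(counts) + laplace * length(counts))
}

# Hypothetical counts of one vote among the training records of one party
counts <- c(n = 20, y = 180)

p0 <- laplace_estimate(counts, laplace = 0)   # unsmoothed proportions
p3 <- laplace_estimate(counts, laplace = 3)   # smoothed with k = 3

round(p0, 3)
round(p3, 3)

# Smoothing matters most when a level was never observed for a party:
# the unsmoothed estimate is exactly 0, the smoothed one is small but positive
zero_counts <- c(n = 0, y = 200)
laplace_estimate(zero_counts, laplace = 0)
laplace_estimate(zero_counts, laplace = 3)
```

With counts of this size the smoothed and unsmoothed estimates differ by roughly one percentage point, which is rarely enough to change which party has the higher posterior probability.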