This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:
library(randomForest)
## Warning: package 'randomForest' was built under R version 4.4.3
## randomForest 4.7-1.2
## Type rfNews() to see new features/changes/bug fixes.
library(caret)
## Warning: package 'caret' was built under R version 4.4.3
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.4.3
##
## Attaching package: 'ggplot2'
## The following object is masked from 'package:randomForest':
##
##     margin
## Loading required package: lattice
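# Load the income data; the path is specific to the author's machine.
income <- read.csv("C:/Users/singleton1097/Downloads/income(in).csv", stringsAsFactors = TRUE)
# Strip stray whitespace from column names and make sure the target is a factor.
names(income) <- trimws(names(income))
income$income <- as.factor(income$income)
# Fix the seed, then draw a 70/30 train/test split stratified on the target.
set.seed(123)
split <- createDataPartition(income$income, p = 0.7, list = FALSE)
train_data <- income[split, ]
test_data <- income[-split, ]
# Fit a 100-tree random forest on all predictors, keeping importance measures.
rf_model <- randomForest(income ~ ., data = train_data, ntree = 100, importance = TRUE)
# Predict on the held-out rows and compare against the true labels.
rf_predictions <- predict(rf_model, test_data)
confusionMatrix(rf_predictions, test_data$income)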
## Confusion Matrix and Statistics
##
##           Reference
## Prediction <=50K >50K
##      <=50K  6898  845
##      >50K    518 1507
##
## Accuracy : 0.8605
## 95% CI : (0.8534, 0.8673)
## No Information Rate : 0.7592
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.5993
##
## Mcnemar's Test P-Value : < 2.2e-16
##
## Sensitivity : 0.9302
## Specificity : 0.6407
## Pos Pred Value : 0.8909
## Neg Pred Value : 0.7442
## Prevalence : 0.7592
## Detection Rate : 0.7062
## Detection Prevalence : 0.7927
## Balanced Accuracy : 0.7854
##
## 'Positive' Class : <=50K
##
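The statistics above follow directly from the four cell counts of the confusion matrix. As a minimal sketch, the headline figures can be recomputed by hand. The names `tp`, `fp`, `fn`, `tn` and `metrics` are introduced here for illustration and are not part of the original code; `<=50K` is treated as the positive class, as caret reports.

```r
# Cell counts copied from the confusion matrix above.
# Rows are predictions, columns are the reference; '<=50K' is the positive class.
tp <- 6898  # predicted <=50K, actually <=50K
fp <- 845   # predicted <=50K, actually >50K
fn <- 518   # predicted >50K,  actually <=50K
tn <- 1507  # predicted >50K,  actually >50K

total <- tp + fp + fn + tn

accuracy    <- (tp + tn) / total
sensitivity <- tp / (tp + fn)  # share of true <=50K rows predicted correctly
specificity <- tn / (tn + fp)  # share of true >50K rows predicted correctly
ppv         <- tp / (tp + fp)  # Pos Pred Value
npv         <- tn / (tn + fn)  # Neg Pred Value
balanced    <- (sensitivity + specificity) / 2

metrics <- round(c(
  Accuracy         = accuracy,
  Sensitivity      = sensitivity,
  Specificity      = specificity,
  PosPredValue     = ppv,
  NegPredValue     = npv,
  BalancedAccuracy = balanced
), 4)
print(metrics)
```

Read this way, the gap between sensitivity (0.9302) and specificity (0.6407) means the model misclassifies 845 of the 2352 true `>50K` rows, which the overall accuracy of 0.8605 partly hides because `<=50K` makes up about 76% of the test set.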
importance(rf_model)
##                     <=50K      >50K MeanDecreaseAccuracy MeanDecreaseGini
## age             0.1957535 40.327014            32.263139        824.38636
## workclass      15.3980770  7.616782            17.932043        278.80339
## fnlwgt          1.9204707  2.670225             3.270909        767.73980
## education      28.4594978  2.480759            23.583517        449.62231
## education.num  16.3335259  8.971060            18.056995        509.39495
## marital.status 19.0719600 10.694842            16.528699        721.52352
## occupation     21.4450613 34.583053            42.756515        693.93104
## relationship   11.7943353 14.287248            14.858137        827.48942
## race            4.8658486  3.297146             6.519363         95.15221
## sex            11.4867963  3.241266            13.778542        102.55231
## capital.gain   56.9218284 76.675488            72.883574        865.40047
## capital.loss   26.1416187 34.952360            38.377818        219.41110
## hours.per.week  2.0200049 30.017817            20.553881        489.63551
## native.country  6.4682648 -1.568595             4.033609        198.94483
varImpPlot(rf_model)
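The importance table is easier to read once it is sorted. The sketch below ranks the predictors by each of the two overall measures. To stay self-contained it rebuilds the two columns from the printed output; with the fitted model in memory, `importance(rf_model)` would supply the same numbers. The names `imp`, `by_accuracy` and `by_gini` are illustrative and not part of the original code.

```r
# Values copied from the importance(rf_model) output above.
imp <- data.frame(
  variable = c("age", "workclass", "fnlwgt", "education", "education.num",
               "marital.status", "occupation", "relationship", "race", "sex",
               "capital.gain", "capital.loss", "hours.per.week", "native.country"),
  MeanDecreaseAccuracy = c(32.263139, 17.932043, 3.270909, 23.583517, 18.056995,
                           16.528699, 42.756515, 14.858137, 6.519363, 13.778542,
                           72.883574, 38.377818, 20.553881, 4.033609),
  MeanDecreaseGini = c(824.38636, 278.80339, 767.73980, 449.62231, 509.39495,
                       721.52352, 693.93104, 827.48942, 95.15221, 102.55231,
                       865.40047, 219.41110, 489.63551, 198.94483),
  stringsAsFactors = FALSE
)

# Rank predictors from most to least important under each measure.
by_accuracy <- imp[order(imp$MeanDecreaseAccuracy, decreasing = TRUE), ]
by_gini     <- imp[order(imp$MeanDecreaseGini, decreasing = TRUE), ]

head(by_accuracy$variable, 4)  # "capital.gain" "occupation" "capital.loss" "age"
head(by_gini$variable, 4)      # "capital.gain" "relationship" "age" "fnlwgt"
```

The two rankings agree on `capital.gain` but diverge elsewhere: `fnlwgt` ranks fourth by Gini decrease yet is the lowest of all predictors by accuracy decrease. One plausible reading is that Gini-based importance tends to favor continuous variables with many possible split points, so the permutation-based column is usually the safer one to interpret.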
You can also embed plots, for example:
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.