R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:

library(randomForest)
## Warning: package 'randomForest' was built under R version 4.4.3
## randomForest 4.7-1.2
## Type rfNews() to see new features/changes/bug fixes.
library(caret)
## Warning: package 'caret' was built under R version 4.4.3
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.4.3
## 
## Attaching package: 'ggplot2'
## The following object is masked from 'package:randomForest':
## 
##     margin
## Loading required package: lattice
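# Load the income data set; character columns are read as factors.
income <- read.csv("C:/Users/singleton1097/Downloads/income(in).csv", stringsAsFactors = TRUE)

# Strip stray whitespace from the column names.
names(income) <- trimws(names(income))

# Make sure the outcome is a factor so randomForest fits a classifier.
income$income <- as.factor(income$income)

# Fix the RNG seed, then make a 70/30 train/test split stratified on the outcome.
set.seed(123)
split <- createDataPartition(income$income, p = 0.7, list = FALSE)
train_data <- income[split, ]
test_data <- income[-split, ]

# Fit a 100-tree forest on all predictors; importance = TRUE also computes permutation importance.
rf_model <- randomForest(income ~ ., data = train_data, ntree = 100, importance = TRUE)

# Predict classes for the held-out rows.
rf_predictions <- predict(rf_model, test_data)

# Compare predictions (rows) with the true labels (columns).
confusionMatrix(rf_predictions, test_data$income)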
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  <=50K  >50K
##      <=50K   6898   845
##      >50K     518  1507
##                                           
##                Accuracy : 0.8605          
##                  95% CI : (0.8534, 0.8673)
##     No Information Rate : 0.7592          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.5993          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.9302          
##             Specificity : 0.6407          
##          Pos Pred Value : 0.8909          
##          Neg Pred Value : 0.7442          
##              Prevalence : 0.7592          
##          Detection Rate : 0.7062          
##    Detection Prevalence : 0.7927          
##       Balanced Accuracy : 0.7854          
##                                           
##        'Positive' Class :  <=50K          
## 
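The headline statistics can be re-derived by hand from the four cell counts, which makes it easier to see what each one measures. The sketch below is a minimal check, not part of the original analysis: it copies the counts from the table above into a matrix (the variable names are illustrative) and treats `<=50K` as the positive class, as caret reports.

```r
# Counts copied from the confusion matrix printed above
# (rows = predictions, columns = reference labels).
cm <- matrix(c(6898, 518, 845, 1507), nrow = 2,
             dimnames = list(Prediction = c("<=50K", ">50K"),
                             Reference  = c("<=50K", ">50K")))

tp <- cm["<=50K", "<=50K"]  # predicted <=50K, truly <=50K
fn <- cm[">50K",  "<=50K"]  # predicted >50K,  truly <=50K
fp <- cm["<=50K", ">50K"]   # predicted <=50K, truly >50K
tn <- cm[">50K",  ">50K"]   # predicted >50K,  truly >50K
n  <- sum(cm)

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)          # recall on the <=50K class
specificity <- tn / (tn + fp)          # recall on the >50K class
balanced    <- (sensitivity + specificity) / 2
nir         <- max(colSums(cm)) / n    # always predicting the majority class

# Cohen's kappa: agreement beyond what the row/column totals imply by chance.
expected    <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, balanced = balanced,
              nir = nir, kappa = cohen_kappa), 4))
```

The gap between sensitivity and specificity shows that the model recovers the `<=50K` class far more reliably than the `>50K` class, which is why balanced accuracy sits well below overall accuracy.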
importance(rf_model)
##                     <=50K      >50K MeanDecreaseAccuracy MeanDecreaseGini
## age             0.1957535 40.327014            32.263139        824.38636
## workclass      15.3980770  7.616782            17.932043        278.80339
## fnlwgt          1.9204707  2.670225             3.270909        767.73980
## education      28.4594978  2.480759            23.583517        449.62231
## education.num  16.3335259  8.971060            18.056995        509.39495
## marital.status 19.0719600 10.694842            16.528699        721.52352
## occupation     21.4450613 34.583053            42.756515        693.93104
## relationship   11.7943353 14.287248            14.858137        827.48942
## race            4.8658486  3.297146             6.519363         95.15221
## sex            11.4867963  3.241266            13.778542        102.55231
## capital.gain   56.9218284 76.675488            72.883574        865.40047
## capital.loss   26.1416187 34.952360            38.377818        219.41110
## hours.per.week  2.0200049 30.017817            20.553881        489.63551
## native.country  6.4682648 -1.568595             4.033609        198.94483
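The importance table is printed in column order, so the ranking is hard to read at a glance. As a small sketch, the MeanDecreaseAccuracy values can be sorted; here they are copied from the output above into a named vector (`mda` and `ranked` are illustrative names) so the snippet runs without the fitted model.

```r
# MeanDecreaseAccuracy values copied from the importance() output above.
mda <- c(age = 32.263139, workclass = 17.932043, fnlwgt = 3.270909,
         education = 23.583517, education.num = 18.056995,
         marital.status = 16.528699, occupation = 42.756515,
         relationship = 14.858137, race = 6.519363, sex = 13.778542,
         capital.gain = 72.883574, capital.loss = 38.377818,
         hours.per.week = 20.553881, native.country = 4.033609)

# Rank predictors from most to least important.
ranked <- sort(mda, decreasing = TRUE)
print(head(ranked, 5))

# With the fitted model in memory, the same ordering can be applied to the
# whole matrix:
#   imp <- importance(rf_model)
#   imp[order(imp[, "MeanDecreaseAccuracy"], decreasing = TRUE), ]
```

On this measure capital.gain, occupation, and capital.loss lead, while fnlwgt ranks last. fnlwgt nevertheless scores high on MeanDecreaseGini, a reminder that the two measures can disagree; Gini importance tends to favor continuous variables with many possible split points.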
varImpPlot(rf_model)

Including Plots

You can also embed plots, for example:

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.