A Kenyan entrepreneur has created an online cryptography course and wants to advertise it on her blog. She currently targets audiences from various countries. In the past, she ran ads for a related course on the same blog and collected data in the process. She would now like to employ our services as a Data Science Consultant to help her identify which individuals are most likely to click on her ads.
This project will be considered a success once we have thoroughly cleaned our data, performed both univariate and bivariate analysis, and summarised our dataset.
The dataset that we will be using is an advertisement dataset.
The following steps will be followed in conducting this study:
Define the question, the metric for success, the context, and the experimental design.
Read and explore the given dataset. Define the appropriateness of the available data to answer the given question.
Find and deal with outliers, anomalies, and missing data within the dataset.
Perform univariate and bivariate analysis and record our observations.
Provide a conclusion and recommendation based on our insights.
# we start by reading our dataset
df=read.csv("http://bit.ly/IPAdvertisingData")
head(df)
## Daily.Time.Spent.on.Site Age Area.Income Daily.Internet.Usage
## 1 68.95 35 61833.90 256.09
## 2 80.23 31 68441.85 193.77
## 3 69.47 26 59785.94 236.50
## 4 74.15 29 54806.18 245.89
## 5 68.37 35 73889.99 225.58
## 6 59.99 23 59761.56 226.74
## Ad.Topic.Line City Male Country
## 1 Cloned 5thgeneration orchestration Wrightburgh 0 Tunisia
## 2 Monitored national standardization West Jodi 1 Nauru
## 3 Organic bottom-line service-desk Davidton 0 San Marino
## 4 Triple-buffered reciprocal time-frame West Terrifurt 1 Italy
## 5 Robust logistical utilization South Manuel 0 Iceland
## 6 Sharable client-driven software Jamieberg 1 Norway
## Timestamp Clicked.on.Ad
## 1 2016-03-27 00:53:11 0
## 2 2016-04-04 01:39:02 0
## 3 2016-03-13 20:35:42 0
## 4 2016-01-10 02:31:19 0
## 5 2016-06-03 03:36:18 0
## 6 2016-05-19 14:30:17 0
# checking data composition
str(df)
## 'data.frame': 1000 obs. of 10 variables:
## $ Daily.Time.Spent.on.Site: num 69 80.2 69.5 74.2 68.4 ...
## $ Age : int 35 31 26 29 35 23 33 48 30 20 ...
## $ Area.Income : num 61834 68442 59786 54806 73890 ...
## $ Daily.Internet.Usage : num 256 194 236 246 226 ...
## $ Ad.Topic.Line : chr "Cloned 5thgeneration orchestration" "Monitored national standardization" "Organic bottom-line service-desk" "Triple-buffered reciprocal time-frame" ...
## $ City : chr "Wrightburgh" "West Jodi" "Davidton" "West Terrifurt" ...
## $ Male : int 0 1 0 1 0 1 0 1 1 1 ...
## $ Country : chr "Tunisia" "Nauru" "San Marino" "Italy" ...
## $ Timestamp : chr "2016-03-27 00:53:11" "2016-04-04 01:39:02" "2016-03-13 20:35:42" "2016-01-10 02:31:19" ...
## $ Clicked.on.Ad : int 0 0 0 0 0 0 0 1 0 0 ...
#checking dimension of our dataset
dim(df)
## [1] 1000 10
#confirming our dataset is a dataframe
class(df)
## [1] "data.frame"
sum(is.na(df))
## [1] 0
#there are no missing values
sum(duplicated(df))
## [1] 0
#there are no duplicates
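A single total can hide where a problem sits. As a minimal sketch on a small made-up data frame (`toy` and its columns are hypothetical, not part of the advertising data), `colSums(is.na(...))` reports missing values per column and `duplicated()` flags repeated rows:

```r
# Hypothetical toy data frame, used only to illustrate the checks
toy <- data.frame(
  age    = c(35, 31, NA, 31),
  income = c(61833.90, 68441.85, 59785.94, 68441.85)
)

# Missing values counted per column rather than as one grand total
na_per_column <- colSums(is.na(toy))

# Positions of rows that repeat an earlier row (row 4 repeats row 2 here)
duplicate_rows <- which(duplicated(toy))

print(na_per_column)
print(duplicate_rows)
```

On the real data the same two calls would be applied to `df` instead of `toy`.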
boxplot(df$`Area.Income`,main="Boxplot for Area.Income",col = "grey")
boxplot(df$`Age`,main="Boxplot for Age",col = "orange")
boxplot(df$`Daily.Time.Spent.on.Site`,main="Boxplot for Daily.Time.Spent.on.Site",col = "green")
boxplot(df$`Male`,main="Boxplot for Male",col = "blue")
boxplot(df$`Daily.Internet.Usage`,main="Boxplot for Daily.Internet.Usage",col = "yellow")
boxplot(df$`Clicked.on.Ad`,main="Boxplot for Clicked.on.Ad",col = "red")
# We don't have many outliers in our columns, so we will leave them as they are
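Reading outliers off a boxplot is subjective, so it may help to count them. The sketch below uses `boxplot.stats()`, which applies the 1.5 times box-length rule that `boxplot()` uses for its whiskers. The vector `x` and the helper name `count_outliers` are made up for illustration:

```r
# Hypothetical values; 200 is an obvious outlier
x <- c(10, 12, 11, 13, 12, 11, 200)

# Points lying beyond the whiskers
outliers <- boxplot.stats(x)$out
print(outliers)

# Illustrative helper: number of flagged points in each numeric column
count_outliers <- function(data) {
  numeric_cols <- data[sapply(data, is.numeric)]
  sapply(numeric_cols, function(col) length(boxplot.stats(col)$out))
}

counts <- count_outliers(data.frame(x = x, y = 1:7))
print(counts)
```

Applied to the advertising data, `count_outliers(df)` would give one count per numeric column, which makes a statement such as "not many outliers" checkable.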
summary(df)
## Daily.Time.Spent.on.Site Age Area.Income Daily.Internet.Usage
## Min. :32.60 Min. :19.00 Min. :13996 Min. :104.8
## 1st Qu.:51.36 1st Qu.:29.00 1st Qu.:47032 1st Qu.:138.8
## Median :68.22 Median :35.00 Median :57012 Median :183.1
## Mean :65.00 Mean :36.01 Mean :55000 Mean :180.0
## 3rd Qu.:78.55 3rd Qu.:42.00 3rd Qu.:65471 3rd Qu.:218.8
## Max. :91.43 Max. :61.00 Max. :79485 Max. :270.0
## Ad.Topic.Line City Male Country
## Length:1000 Length:1000 Min. :0.000 Length:1000
## Class :character Class :character 1st Qu.:0.000 Class :character
## Mode :character Mode :character Median :0.000 Mode :character
## Mean :0.481
## 3rd Qu.:1.000
## Max. :1.000
## Timestamp Clicked.on.Ad
## Length:1000 Min. :0.0
## Class :character 1st Qu.:0.0
## Mode :character Median :0.5
## Mean :0.5
## 3rd Qu.:1.0
## Max. :1.0
# summary() reports the mean, quartiles, median, maximum and minimum of each numeric column
cat("the range of age is",range(df$'Age'))
## the range of age is 19 61
cat("\n")
cat("the range of Area.Income is",range(df$'Area.Income'))
## the range of Area.Income is 13996.5 79484.8
cat("\n")
cat("the range of Daily.Time.Spent.on.Site is",range(df$'Daily.Time.Spent.on.Site'))
## the range of Daily.Time.Spent.on.Site is 32.6 91.43
cat("\n")
cat("the range of male is",range(df$'Male'))
## the range of male is 0 1
cat("\n")
cat("the range of Daily.Internet.Usage is",range(df$'Daily.Internet.Usage'))
## the range of Daily.Internet.Usage is 104.78 269.96
cat("\n")
cat("the standard deviation of age is",sd(df$'Age'))
## the standard deviation of age is 8.785562
cat("\n")
cat("the standard deviation of Area.Income is",sd(df$'Area.Income'))
## the standard deviation of Area.Income is 13414.63
cat("\n")
cat("the standard deviation of Daily.Time.Spent.on.Site is",sd(df$'Daily.Time.Spent.on.Site'))
## the standard deviation of Daily.Time.Spent.on.Site is 15.85361
cat("\n")
cat("the standard deviation of male is",sd(df$'Male'))
## the standard deviation of male is 0.4998889
cat("\n")
cat("the standard deviation of Daily.Internet.Usage is",sd(df$'Daily.Internet.Usage'))
## the standard deviation of Daily.Internet.Usage is 43.90234
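The repeated `cat()` calls can be condensed. As a sketch, assuming a data frame that holds only numeric columns (the `toy` frame and its values are made up for illustration), `sapply()` collects the minimum, maximum and standard deviation of every column into one table:

```r
# Hypothetical stand-in for the numeric columns of the dataset
toy <- data.frame(
  age   = c(19, 35, 61),
  usage = c(104.78, 183.13, 269.96)
)

# One row per column: minimum, maximum and standard deviation
spread <- t(sapply(toy, function(col) {
  c(min = min(col), max = max(col), sd = sd(col))
}))

print(round(spread, 2))
```

Computing every statistic in one pass also removes the chance of a label and its value drifting apart.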
hist(df$`Area.Income`,main="histogram for Area.Income",col = "grey")
hist(df$`Age`,main="histogram for Age",col = "orange")
hist(df$`Daily.Time.Spent.on.Site`,main="histogram for Daily.Time.Spent.on.Site",col = "green")
hist(df$`Male`,main="histogram for Male",col = "blue")
hist(df$`Daily.Internet.Usage`,main="histogram for Daily.Internet.Usage",col = "yellow")
hist(df$`Clicked.on.Ad`,main="histogram for Clicked.on.Ad",col = "red")
#assigning columns to respective variables
ts<-df$Daily.Time.Spent.on.Site
age<-df$Age
ai<-df$Area.Income
dis<-df$Daily.Internet.Usage
mal<-df$Male
ca<-df$Clicked.on.Ad
# var(x, y) with two vectors returns their covariance, so these values match the cov() calls further down
cat("the covariance between age and daily time spent on site is",var(ts,age))
## the covariance between age and daily time spent on site is -46.17415
cat("\n")
cat("the covariance between age and Area.Income is",var(age,ai))
## the covariance between age and Area.Income is -21520.93
cat("\n")
cat("the covariance between age and daily internet usage is",var(age,dis))
## the covariance between age and daily internet usage is -141.6348
cat("\n")
cat("the covariance between age and Clicked.on.Ad is",var(ca,age))
## the covariance between age and Clicked.on.Ad is 2.164665
cat("\n")
cat("the covariance between area income and daily time spent on site is",var(ts,ai))
## the covariance between area income and daily time spent on site is 66130.81
cat("\n")
cat("the covariance between daily internet usage and daily time spent on site is",var(ts,dis))
## the covariance between daily internet usage and daily time spent on site is 360.9919
cat("\n")
cat("the covariance between clicked on ad and daily time spent on site is",var(ts,ca))
## the covariance between clicked on ad and daily time spent on site is -5.933143
cat("\n")
cat("the covariance between daily internet usage and area income is",var(ai,dis))
## the covariance between daily internet usage and area income is 198762.5
cat("\n")
cat("the covariance between daily internet usage and clicked on ad is",var(ca,dis))
## the covariance between daily internet usage and clicked on ad is -17.27409
cat("\n")
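Called with two vectors, `var(x, y)` returns their covariance rather than a variance, which is why the numbers in this block reappear in the `cov()` output. A small check on made-up vectors (`x` and `y` are hypothetical):

```r
# Hypothetical vectors used only for the comparison
x <- c(2, 4, 6, 8)
y <- c(1, 3, 2, 5)

two_arg_var <- var(x, y)   # two-argument form of var()
covariance  <- cov(x, y)   # explicit covariance

# Sample covariance written out by hand
manual <- sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)

print(c(two_arg_var, covariance, manual))
```

All three values agree, so only one of the two blocks is needed in practice.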
cat("the correlation between age and daily time spent on site is",cor(ts,age))
## the correlation between age and daily time spent on site is -0.3315133
cat("\n")
cat("the correlation between age and Area.Income is",cor(age,ai))
## the correlation between age and Area.Income is -0.182605
cat("\n")
cat("the correlation between age and daily internet usage is",cor(age,dis))
## the correlation between age and daily internet usage is -0.3672086
cat("\n")
cat("the correlation between age and Clicked.on.Ad is",cor(ca,age))
## the correlation between age and Clicked.on.Ad is 0.4925313
cat("\n")
cat("the correlation between area income and daily time spent on site is",cor(ts,ai))
## the correlation between area income and daily time spent on site is 0.3109544
cat("\n")
cat("the correlation between daily internet usage and daily time spent on site is",cor(ts,dis))
## the correlation between daily internet usage and daily time spent on site is 0.5186585
cat("\n")
cat("the correlation between clicked on ad and daily time spent on site is",cor(ts,ca))
## the correlation between clicked on ad and daily time spent on site is -0.7481166
cat("\n")
cat("the correlation between daily internet usage and area income is",cor(ai,dis))
## the correlation between daily internet usage and area income is 0.3374955
cat("\n")
cat("the correlation between daily internet usage and clicked on ad is",cor(ca,dis))
## the correlation between daily internet usage and clicked on ad is -0.7865392
cat("\n")
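Typing each pair by hand makes it easy to mislabel one. As a sketch, `cor()` applied to a whole data frame returns every pairwise correlation at once. The `toy` frame, its column names and its values are made up for illustration:

```r
# Hypothetical numeric data frame with a binary outcome column
toy <- data.frame(
  time_on_site = c(70, 82, 45, 40, 75, 38),
  age          = c(30, 28, 45, 50, 33, 52),
  clicked      = c(0, 0, 1, 1, 0, 1)
)

# Full pairwise correlation matrix
cor_matrix <- cor(toy)
print(round(cor_matrix, 3))

# Correlations with the outcome, strongest first
outcome_cor <- cor_matrix[, "clicked"]
outcome_cor <- outcome_cor[names(outcome_cor) != "clicked"]
outcome_cor <- outcome_cor[order(abs(outcome_cor), decreasing = TRUE)]
print(round(outcome_cor, 3))
```

On the real data the equivalent call would be `cor()` on the numeric columns of `df`, with `Clicked.on.Ad` as the outcome column.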
cat("the covariance between age and daily time spent on site is",cov(ts,age))
## the covariance between age and daily time spent on site is -46.17415
cat("\n")
cat("the covariance between age and Area.Income is",cov(age,ai))
## the covariance between age and Area.Income is -21520.93
cat("\n")
cat("the covariance between age and daily internet usage is",cov(age,dis))
## the covariance between age and daily internet usage is -141.6348
cat("\n")
cat("the covariance between age and Clicked.on.Ad is",cov(ca,age))
## the covariance between age and Clicked.on.Ad is 2.164665
cat("\n")
cat("the covariance between area income and daily time spent on site is",cov(ts,ai))
## the covariance between area income and daily time spent on site is 66130.81
cat("\n")
cat("the covariance between daily internet usage and daily time spent on site is",cov(ts,dis))
## the covariance between daily internet usage and daily time spent on site is 360.9919
cat("\n")
cat("the covariance between clicked on ad and daily time spent on site is",cov(ts,ca))
## the covariance between clicked on ad and daily time spent on site is -5.933143
cat("\n")
cat("the covariance between daily internet usage and area income is",cov(ai,dis))
## the covariance between daily internet usage and area income is 198762.5
cat("\n")
cat("the covariance between daily internet usage and clicked on ad is",cov(ca,dis))
## the covariance between daily internet usage and clicked on ad is -17.27409
cat("\n")
plot(age, dis, xlab="age", ylab="daily internet usage",col = "orange")
plot(age,ai, xlab="age", ylab="area income",col="blue")
plot(age, ts, xlab="age", ylab="Time spent on site",col="red")
plot(age,ca, xlab="age", ylab="clicked on ad",col="yellow")
plot(ts,ai, xlab="Time spent on site", ylab="area income",col="pink")
plot(ts,dis, xlab="Time spent on site", ylab="daily internet usage",col="grey")
plot(ts,ca, xlab="Time spent on site", ylab="clicked on ad",col="green")
plot(ai,dis, xlab="area income", ylab="daily internet usage",col="purple")
plot(ca,dis, xlab="clicked on ad", ylab="daily internet usage",col="black")
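Looking at our bivariate analysis, Clicked.on.Ad is strongly negatively correlated with Daily.Internet.Usage (-0.79) and Daily.Time.Spent.on.Site (-0.75), and moderately positively correlated with Age (0.49). Daily.Internet.Usage and Daily.Time.Spent.on.Site are themselves positively correlated (0.52).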
#selecting numerical columns for our dataset
df1 <- df[,c(1,2,3,4,7,10)]
head(df1)
## Daily.Time.Spent.on.Site Age Area.Income Daily.Internet.Usage Male
## 1 68.95 35 61833.90 256.09 0
## 2 80.23 31 68441.85 193.77 1
## 3 69.47 26 59785.94 236.50 0
## 4 74.15 29 54806.18 245.89 1
## 5 68.37 35 73889.99 225.58 0
## 6 59.99 23 59761.56 226.74 1
## Clicked.on.Ad
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
## 6 0
set.seed(1234)
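# draw 240 uniform random numbers; order(random) is a permutation of 1:240,
# so the next line shuffles the first 240 rows of df1
random <- runif(240)
df1_random <- df1[order(random),]
# viewing the first rows of the shuffled subset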
head(df1_random)
## Daily.Time.Spent.on.Site Age Area.Income Daily.Internet.Usage Male
## 7 88.91 33 53852.85 208.36 0
## 64 86.06 32 61601.05 178.92 1
## 73 55.35 39 75509.61 153.17 1
## 186 46.88 54 43444.86 136.64 0
## 98 39.94 41 64927.19 156.30 0
## 222 75.83 27 67516.07 200.59 0
## Clicked.on.Ad
## 7 0
## 64 0
## 73 1
## 186 1
## 98 1
## 222 0
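Because `order(random)` only produces the indices 1 to 240, the step above reorders the first 240 rows rather than drawing from all 1000. A sketch of one alternative, assuming a random sample across the whole table is what was intended, uses `sample()` on the row indices. The `toy` data frame is a made-up stand-in for `df1`:

```r
set.seed(1234)

# Hypothetical stand-in with the same number of rows as the dataset
toy <- data.frame(id = 1:1000, value = rnorm(1000))

# order() of 240 random numbers is a permutation of 1:240
perm <- order(runif(240))
print(range(perm))

# sample() draws 240 distinct row indices from all rows
idx <- sample(nrow(toy), 240)
toy_sample <- toy[idx, ]
print(dim(toy_sample))
```

Switching to this approach would change which rows reach the model, so the results reported below would no longer match exactly.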
# min-max normalisation: rescales x to the [0, 1] range
normal <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}
normal(1:5)
## [1] 0.00 0.25 0.50 0.75 1.00
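# min-max scale every column except Male (column 5); the models below are fitted on df1_random, not df_new
df_new <- as.data.frame(lapply(df1_random[,-5], normal))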
summary(df_new)
## Daily.Time.Spent.on.Site Age Area.Income
## Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.2970 1st Qu.:0.2500 1st Qu.:0.4749
## Median :0.6016 Median :0.3750 Median :0.6722
## Mean :0.5457 Mean :0.4101 Mean :0.6281
## 3rd Qu.:0.7724 3rd Qu.:0.5500 3rd Qu.:0.7948
## Max. :1.0000 Max. :1.0000 Max. :1.0000
## Daily.Internet.Usage Clicked.on.Ad
## Min. :0.0000 Min. :0.0000
## 1st Qu.:0.2010 1st Qu.:0.0000
## Median :0.4194 Median :1.0000
## Mean :0.4421 Mean :0.5292
## 3rd Qu.:0.6608 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000
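Min-max scaling maps each column to the range 0 to 1, whereas the `preProcess = c("center", "scale")` option used in the model fitting produces z-scores with mean 0 and standard deviation 1. A short comparison on a made-up vector (`x` is hypothetical):

```r
# Hypothetical values used only to compare the two rescalings
x <- c(10, 20, 30, 40, 50)

# Min-max scaling: smallest value becomes 0, largest becomes 1
min_max <- (x - min(x)) / (max(x) - min(x))

# Centering and scaling: subtract the mean, divide by the standard deviation
z_score <- as.numeric(scale(x))

print(min_max)
print(round(z_score, 4))
```

Both are monotone rescalings of the same values, so applying min-max scaling before a model that already centers and scales adds little.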
#Loading libraries
library(rpart,quietly = TRUE)
library(caret,quietly = TRUE)
library(rpart.plot,quietly = TRUE)
library(rattle)
## Loading required package: tibble
## Loading required package: bitops
## Rattle: A free graphical interface for data science with R.
## Version 5.5.1 Copyright (c) 2006-2021 Togaware Pty Ltd.
## Type 'rattle()' to shake, rattle, and roll your data.
intrain <- createDataPartition(y = df1_random$Clicked.on.Ad, p= 0.7, list = FALSE)
training <- df1_random[intrain,]
testing <- df1_random[-intrain,]
#checking dimensions of our sets
dim(training);
## [1] 168 6
dim(testing);
## [1] 72 6
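`createDataPartition()` aims to keep the outcome distribution similar in the training and testing sets. The base-R sketch below illustrates that idea of a stratified 70/30 split; it is not caret's actual implementation, and the `outcome` vector is made up:

```r
set.seed(42)

# Hypothetical binary outcome: 60 zeros and 40 ones
outcome <- rep(c(0, 1), times = c(60, 40))

# Row indices grouped by class
rows_by_class <- split(seq_along(outcome), outcome)

# Within each class, keep 70% of the rows for training
train_idx <- unlist(lapply(rows_by_class, function(rows) {
  rows[sample.int(length(rows), size = round(0.7 * length(rows)))]
}), use.names = FALSE)

test_idx <- setdiff(seq_along(outcome), train_idx)

print(table(outcome[train_idx]))
print(table(outcome[test_idx]))
```

Both subsets keep the 60/40 class ratio of the full vector, which is the property a stratified split is meant to preserve.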
training[["Clicked.on.Ad"]] = factor(training[["Clicked.on.Ad"]])
#we train our model using 10-fold cross-validation repeated 3 times
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
svm_Linear <- train(Clicked.on.Ad ~., data = training, method = "svmLinear",
trControl=trctrl,
preProcess = c("center", "scale"),
tuneLength = 10)
#checking results of our model
svm_Linear
## Support Vector Machines with Linear Kernel
##
## 168 samples
## 5 predictor
## 2 classes: '0', '1'
##
## Pre-processing: centered (5), scaled (5)
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 151, 151, 151, 151, 151, 151, ...
## Resampling results:
##
## Accuracy Kappa
## 0.9783088 0.9566439
##
## Tuning parameter 'C' was held constant at a value of 1
test_pred <- predict(svm_Linear, newdata = testing)
test_pred
## [1] 0 1 1 0 0 0 0 1 1 0 1 1 0 0 1 1 0 1 0 1 0 1 0 1 1 0 1 1 1 0 0 1 1 1 1 1 0 1
## [39] 0 1 1 0 0 1 1 0 1 0 1 1 1 0 0 0 1 0 1 0 1 1 0 0 1 1 1 0 0 0 1 1 0 0
## Levels: 0 1
confusionMatrix(table(test_pred, testing$Clicked.on.Ad))
## Confusion Matrix and Statistics
##
##
## test_pred 0 1
## 0 33 0
## 1 1 38
##
## Accuracy : 0.9861
## 95% CI : (0.925, 0.9996)
## No Information Rate : 0.5278
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.9721
##
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.9706
## Specificity : 1.0000
## Pos Pred Value : 1.0000
## Neg Pred Value : 0.9744
## Prevalence : 0.4722
## Detection Rate : 0.4583
## Detection Prevalence : 0.4583
## Balanced Accuracy : 0.9853
##
## 'Positive' Class : 0
##
# we can see that our model has achieved a test accuracy of 98.6% (the 0.9721 figure is Kappa, not accuracy)
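The headline figures in the confusion matrix output can be reproduced by hand, which helps keep accuracy and Kappa apart. The sketch below re-enters the four counts reported above; the variable names are illustrative, and class 0 is treated as the positive class as in the output:

```r
# Counts copied from the confusion matrix (rows = predicted, columns = actual)
cm <- matrix(c(33, 1, 0, 38), nrow = 2,
             dimnames = list(predicted = c("0", "1"), actual = c("0", "1")))
n <- sum(cm)

# Share of correct predictions
accuracy <- sum(diag(cm)) / n

# Positive class is "0": sensitivity = correctly predicted 0s / actual 0s
sensitivity <- cm["0", "0"] / sum(cm[, "0"])
specificity <- cm["1", "1"] / sum(cm[, "1"])

# Cohen's kappa: agreement beyond what the marginal totals would give by chance
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa <- (accuracy - expected) / (1 - expected)

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, kappa = kappa), 4))
```

The results round to 0.9861, 0.9706, 1 and 0.9721, matching the reported Accuracy, Sensitivity, Specificity and Kappa.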
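# C must be positive: the C = 0 entry makes every resample fail (see the warnings and the NaN row below);
# tuneLength is ignored when tuneGrid is supplied
grid <- expand.grid(C = c(0,0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2,5))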
svm_Linear_Grid <- train(Clicked.on.Ad ~., data = training, method = "svmLinear",
trControl=trctrl,
preProcess = c("center", "scale"),
tuneGrid = grid,
tuneLength = 10)
## Warning: model fit failed for Fold01.Rep1: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold02.Rep1: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold03.Rep1: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold04.Rep1: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold05.Rep1: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold06.Rep1: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold07.Rep1: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold08.Rep1: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold09.Rep1: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold10.Rep1: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold01.Rep2: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold02.Rep2: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold03.Rep2: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold04.Rep2: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold05.Rep2: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold06.Rep2: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold07.Rep2: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold08.Rep2: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold09.Rep2: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold10.Rep2: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold01.Rep3: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold02.Rep3: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold03.Rep3: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold04.Rep3: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold05.Rep3: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold06.Rep3: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold07.Rep3: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold08.Rep3: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold09.Rep3: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold10.Rep3: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
## Warning in train.default(x, y, weights = w, ...): missing values found in
## aggregated results
svm_Linear_Grid
## Support Vector Machines with Linear Kernel
##
## 168 samples
## 5 predictor
## 2 classes: '0', '1'
##
## Pre-processing: centered (5), scaled (5)
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 151, 151, 152, 151, 151, 152, ...
## Resampling results across tuning parameters:
##
## C Accuracy Kappa
## 0.00 NaN NaN
## 0.01 0.9501225 0.9008833
## 0.05 0.9621324 0.9246358
## 0.10 0.9621324 0.9246358
## 0.25 0.9678922 0.9361013
## 0.50 0.9719363 0.9441760
## 0.75 0.9738971 0.9480294
## 1.00 0.9758578 0.9519921
## 1.25 0.9738971 0.9480294
## 1.50 0.9738971 0.9480294
## 1.75 0.9759804 0.9521960
## 2.00 0.9740196 0.9482880
## 5.00 0.9698529 0.9398203
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was C = 1.75.
plot(svm_Linear_Grid)
# we can see that our model gives the best cross-validated accuracy (0.976) when C = 1.75
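The best value of C can also be read off the resampling table programmatically instead of by eye. The sketch below re-enters the reported accuracies into a data frame (the name `results` is illustrative) and uses `which.max()`, which skips the `NaN` row produced by C = 0:

```r
# Accuracy per value of C, copied from the resampling results above
results <- data.frame(
  C = c(0, 0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 5),
  Accuracy = c(NaN, 0.9501225, 0.9621324, 0.9621324, 0.9678922, 0.9719363,
               0.9738971, 0.9758578, 0.9738971, 0.9738971, 0.9759804,
               0.9740196, 0.9698529)
)

# which.max() discards missing and NaN values
best <- results[which.max(results$Accuracy), ]
print(best)
```

The accuracies for C between 0.75 and 2 differ only in the third decimal place, so the choice among them is not critical on this data.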
test_pred_grid <- predict(svm_Linear_Grid, newdata = testing)
test_pred_grid
## [1] 0 1 1 0 0 0 0 1 1 0 1 1 0 0 1 1 0 1 0 1 0 1 0 1 1 0 1 1 1 0 0 1 1 1 1 1 0 1
## [39] 0 1 1 0 0 1 1 0 1 0 1 1 1 0 0 0 1 0 1 0 1 1 0 0 1 1 1 0 0 0 1 1 0 0
## Levels: 0 1
confusionMatrix(table(test_pred_grid, testing$Clicked.on.Ad))
## Confusion Matrix and Statistics
##
##
## test_pred_grid 0 1
## 0 33 0
## 1 1 38
##
## Accuracy : 0.9861
## 95% CI : (0.925, 0.9996)
## No Information Rate : 0.5278
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.9721
##
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.9706
## Specificity : 1.0000
## Pos Pred Value : 1.0000
## Neg Pred Value : 0.9744
## Prevalence : 0.4722
## Detection Rate : 0.4583
## Detection Prevalence : 0.4583
## Balanced Accuracy : 0.9853
##
## 'Positive' Class : 0
##
# here our test accuracy is 98.6%, identical to the previous model
Using a linear SVM, we reach about 98.6% accuracy on the 72 held-out test rows, and tuning C did not change the test-set predictions.