# Load the packages used in the analysis; all required libraries are listed here for clarity
library('ggplot2') # visualization
library('ggthemes') # visualization
library('scales') # visualization
library('plyr') # data manipulation (loaded before dplyr so it does not mask dplyr functions)
library('dplyr') # data manipulation
library('mice') # imputation
library('randomForest') # classification algorithm
library('rpart') # decision tree
library('rpart.plot')
library('ROCR')
library('corrr')
library('corrplot')
library('glue')
library('caTools')
library('data.table')
library('knitr')
library('geosphere')
library('gmapsdistance')
library('tidyr')
#source("distance.R")
library('car')
library('caret')
library('gclus')
library('visdat')
library('psych')
library('leaflet')
library('leaflet.extras')
library('PerformanceAnalytics')
library('GPArotation')
library('MVN')
library('MASS')
library('psy')
library('corpcor')
library('fastmatch')
library('ggcorrplot')
library('cluster')
We load the data and use a two-sample t-test to check whether sales differ between the old and new insurance premium schemes. Our hypotheses are:
H0: Mean sales under the old and new schemes are the same (the difference is zero).
H1: Mean sales under the old and new schemes differ (the difference is not zero).
mydatatitan <- read.csv('smsdm_group_assign_data.csv')
summary(mydatatitan)
## Salesperson Old.Scheme New.Scheme
## Min. : 1.00 Min. : 28.00 Min. : 32.00
## 1st Qu.: 8.25 1st Qu.: 54.00 1st Qu.: 55.00
## Median :15.50 Median : 67.00 Median : 74.00
## Mean :15.50 Mean : 68.03 Mean : 72.37
## 3rd Qu.:22.75 3rd Qu.: 81.50 3rd Qu.: 85.75
## Max. :30.00 Max. :110.00 Max. :122.00
Before running Student's t-test, we check whether the two samples have equal variances (homoskedasticity, i.e. homogeneity of variances) using Fisher's F-test. The result decides whether the equal-variance form of the t-test is appropriate. Here the p-value is 0.3337, which is greater than 0.05, so we do not reject the hypothesis of equal variances and treat the two variances as homogeneous.
myvar <- var.test(mydatatitan$Old.Scheme, mydatatitan$New.Scheme)
myvar
##
## F test to compare two variances
##
## data: mydatatitan$Old.Scheme and mydatatitan$New.Scheme
## F = 0.69553, num df = 29, denom df = 29, p-value = 0.3337
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.3310499 1.4613150
## sample estimates:
## ratio of variances
## 0.6955345
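As a quick cross-check, the p-value and confidence interval can be reproduced from the F statistic and degrees of freedom printed above, using base R's `pf()` and `qf()`. This is a minimal sketch that takes the printed numbers as given; the variable names are illustrative and not part of the original analysis.

```r
# Values copied from the var.test() output above
f_stat <- 0.6955345   # ratio of sample variances (Old.Scheme / New.Scheme)
df_num <- 29          # numerator degrees of freedom (n - 1)
df_den <- 29          # denominator degrees of freedom (n - 1)

# Two-sided p-value: twice the smaller tail probability of the F distribution
p_lower <- pf(f_stat, df_num, df_den)
p_value <- 2 * min(p_lower, 1 - p_lower)

# 95% confidence interval for the true ratio of variances
ci <- c(f_stat / qf(0.975, df_num, df_den),
        f_stat / qf(0.025, df_num, df_den))

print(round(p_value, 4))
print(round(ci, 4))
```

The printed values should agree with the p-value and interval reported by `var.test()`. Because the interval contains 1, it leads to the same conclusion as the p-value.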
Two-sample t-test with equal variances. We use a t-test rather than a z-test because the population variance is unknown and the sample size is small, so we rely on the samples to draw conclusions about the population. Since the F-test did not reject equality of variances, we set var.equal=TRUE. The p-value is 0.4604, which is greater than 0.05, so we fail to reject the null hypothesis H0 of equal means: the data show no statistically significant difference in mean sales between the two schemes.
myinsurance.ttest.equalvariance <- t.test(mydatatitan$Old.Scheme, mydatatitan$New.Scheme,
                                          var.equal = TRUE, paired = FALSE)
myinsurance.ttest.equalvariance
##
## Two Sample t-test
##
## data: mydatatitan$Old.Scheme and mydatatitan$New.Scheme
## t = -0.74314, df = 58, p-value = 0.4604
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -16.005633 7.338966
## sample estimates:
## mean of x mean of y
## 68.03333 72.36667
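To make the link between the test statistic, the p-value and the confidence interval explicit, the sketch below rebuilds the last two from the numbers printed above. It assumes only the reported t statistic, degrees of freedom and group means; the variable names are illustrative.

```r
# Values copied from the t.test() output above
t_stat   <- -0.74314
df       <- 58          # n1 + n2 - 2 = 30 + 30 - 2
mean_old <- 68.03333
mean_new <- 72.36667

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# Standard error implied by the statistic, since t = difference / SE
diff_means <- mean_old - mean_new
std_error  <- diff_means / t_stat

# 95% confidence interval for the difference in means
ci <- diff_means + c(-1, 1) * qt(0.975, df) * std_error

print(round(p_value, 4))
print(round(ci, 3))
```

The interval spans zero, which is consistent with a p-value above 0.05.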
We now repeat the two-sample t-test without assuming equal variances (Welch's test) and compare the results. Both tests give approximately the same p-value.
myinsurance.ttest.unequalvariance <- t.test(mydatatitan$Old.Scheme, mydatatitan$New.Scheme,
                                            var.equal = FALSE, paired = FALSE)
myinsurance.ttest.unequalvariance
##
## Welch Two Sample t-test
##
## data: mydatatitan$Old.Scheme and mydatatitan$New.Scheme
## t = -0.74314, df = 56.188, p-value = 0.4605
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -16.013652 7.346985
## sample estimates:
## mean of x mean of y
## 68.03333 72.36667
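The two tests differ only in their degrees of freedom. As a minimal sketch, the Welch-Satterthwaite degrees of freedom can be recovered from the variance ratio reported by the F-test, assuming 30 observations per group (as implied by the 29 degrees of freedom there). The variable names are illustrative.

```r
# Values copied from the outputs above
var_ratio <- 0.6955345   # s_old^2 / s_new^2 from var.test()
n         <- 30          # observations per group
t_stat    <- -0.74314

# Welch-Satterthwaite degrees of freedom; with equal group sizes the
# general formula reduces to (n - 1) * (r + 1)^2 / (r^2 + 1)
welch_df <- (n - 1) * (var_ratio + 1)^2 / (var_ratio^2 + 1)

# Two-sided p-values under the Welch and the pooled degrees of freedom
p_welch  <- 2 * pt(-abs(t_stat), welch_df)
p_pooled <- 2 * pt(-abs(t_stat), 2 * n - 2)

print(round(welch_df, 3))
print(round(c(welch = p_welch, pooled = p_pooled), 4))
```

With equal group sizes the pooled and Welch standard errors coincide, which is why both outputs show the same t statistic. Only the degrees of freedom change, and so the p-values differ only in the fourth decimal place.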