Load R packages
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.5 v dplyr 1.0.5
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## Warning: package 'ggplot2' was built under R version 4.0.5
## Warning: package 'tidyr' was built under R version 4.0.4
## Warning: package 'readr' was built under R version 4.0.4
## Warning: package 'purrr' was built under R version 4.0.4
## Warning: package 'dplyr' was built under R version 4.0.4
## Warning: package 'forcats' was built under R version 4.0.4
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggpubr)
## Warning: package 'ggpubr' was built under R version 4.0.5
library(rstatix)
## Warning: package 'rstatix' was built under R version 4.0.5
##
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
##
## filter
library(car)
## Warning: package 'car' was built under R version 4.0.5
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
Import Data
ICI.Count.Size <- read.csv("C:/Users/candy/OneDrive/Desktop/DATA501/Count Data/ICI.Count.Size.csv", row.names=1)
NORMALITY TEST
Check Data
head(ICI.Count.Size)
summary(ICI.Count.Size)
## Size.of.enterprise X1
## Length:302 Min. : 1.000
## Class :character 1st Qu.: 5.000
## Mode :character Median : 6.000
## Mean : 6.523
## 3rd Qu.: 7.750
## Max. :14.000
str(ICI.Count.Size)
## 'data.frame': 302 obs. of 2 variables:
## $ Size.of.enterprise: chr "ALL" "ALL" "ALL" "ALL" ...
## $ X1 : int 1 1 1 1 1 1 1 1 1 2 ...
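Because `str()` reports `Size.of.enterprise` as a character column, functions that expect a grouping factor will have to coerce it on the fly. A minimal sketch of making that conversion explicit is shown here. The data frame `toy` and its values are invented for illustration and are not the study data.

```r
# Hypothetical stand-in for ICI.Count.Size; the values are invented for illustration.
toy <- data.frame(
  Size.of.enterprise = c("ALL", "L", "M", "S", "ALL", "L", "M", "S"),
  X1 = c(1L, 5L, 6L, 7L, 2L, 6L, 5L, 8L),
  stringsAsFactors = FALSE
)

# Convert the grouping column to a factor once, up front.
toy$Size.of.enterprise <- factor(toy$Size.of.enterprise)

levels(toy$Size.of.enterprise)  # the distinct group labels
table(toy$Size.of.enterprise)   # number of observations per group
```

Applying the same `factor()` call to the real data frame would be one way to keep the grouping variable consistent across the tests that follow.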
Visual Normality Test
Create a histogram of the data, followed by a density plot and a Q-Q plot
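ggplot(data = ICI.Count.Size, aes(X1)) +
  # breaks run to 14 so that values up to the observed maximum (14) are not dropped
  geom_histogram(breaks = seq(0, 14, by = 1),
                 col = "red",
                 aes(fill = after_stat(count))) +
  scale_fill_gradient("Count", low = "green", high = "red")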
ggdensity(ICI.Count.Size$X1, fill = "lightblue")
ggqqplot(ICI.Count.Size$X1)
Non-Visual Normality Test
H0: The sample comes from a normally distributed population.
Ha: The sample does not come from a normally distributed population.
ICI.Count.Size %>% shapiro_test(X1)
Since the p-value is less than the significance level of 0.05, we reject the null hypothesis. There is therefore enough evidence to conclude that the data are not normally distributed.
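The decision rule above can be sketched with base R's `shapiro.test()`, which, as far as I know, is the function that `rstatix::shapiro_test()` wraps. The sample below is simulated and deliberately skewed; it is not the study data, and the seed is arbitrary.

```r
set.seed(501)  # arbitrary seed so the sketch is reproducible

# Simulated, clearly non-normal sample (exponential); not the study data.
skewed <- rexp(200, rate = 1)

res <- shapiro.test(skewed)  # base R Shapiro-Wilk test
alpha <- 0.05

res$statistic  # W statistic
res$p.value    # p-value to compare against alpha

# Decision rule used in the text: reject H0 when p < alpha.
if (res$p.value < alpha) {
  message("Reject H0: the sample is unlikely to come from a normal population.")
} else {
  message("Fail to reject H0: no evidence against normality.")
}
```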
HOMOGENEITY OF VARIANCE TEST
H0: The variance of the Impacts of Cyber Security Incidents is the same across enterprise-size groups.
Ha: The variance of the Impacts of Cyber Security Incidents differs significantly across enterprise-size groups.
leveneTest(X1 ~ Size.of.enterprise, data = ICI.Count.Size)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
The p-value is greater than the significance level (p = 0.7128 > 0.05). Therefore, we fail to reject the null hypothesis. This means there is not enough evidence to conclude that the variances of the Impacts of Cyber Security Incidents differ significantly between groups.
The assumption of equal variances is therefore reasonable.
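To make the mechanics of the test more concrete: with its default median centering, `leveneTest()` amounts to a one-way ANOVA on the absolute deviations of each observation from its group median. The sketch below reproduces that idea in base R on simulated groups. The data frame `toy`, its group labels, and the seed are assumptions for illustration only.

```r
set.seed(501)  # arbitrary seed so the sketch is reproducible

# Simulated groups with equal spread; not the study data.
toy <- data.frame(
  group = factor(rep(c("L", "M", "S"), each = 30)),
  y = c(rnorm(30, 5, 2), rnorm(30, 6, 2), rnorm(30, 7, 2))
)

# Absolute deviation of each observation from its group median.
toy$abs_dev <- abs(toy$y - ave(toy$y, toy$group, FUN = median))

# One-way ANOVA on the deviations; its F test is the Levene-type test.
fit <- anova(lm(abs_dev ~ group, data = toy))
f_value <- fit[["F value"]][1]
p_value <- fit[["Pr(>F)"]][1]

f_value
p_value  # a large p-value gives no evidence of unequal variances
```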
KRUSKAL-WALLIS TEST (NON-PARAMETRIC ALTERNATIVE TO ONE-WAY ANOVA)
HYPOTHESIS
H0: The size of an enterprise does not affect the Impacts of Cyber Security Incidents faced (the variables are independent).
Ha: The size of an enterprise affects the Impacts of Cyber Security Incidents faced (the variables are dependent).
kruskal.test(ICI.Count.Size$X1~ICI.Count.Size$Size.of.enterprise)
##
## Kruskal-Wallis rank sum test
##
## data: ICI.Count.Size$X1 by ICI.Count.Size$Size.of.enterprise
## Kruskal-Wallis chi-squared = 0.94747, df = 3, p-value = 0.814
pairwise.wilcox.test(ICI.Count.Size$X1,ICI.Count.Size$Size.of.enterprise,
p.adjust.method = "BH")
##
## Pairwise comparisons using Wilcoxon rank sum test with continuity correction
##
## data: ICI.Count.Size$X1 and ICI.Count.Size$Size.of.enterprise
##
## ALL L M
## L 0.81 - -
## M 0.94 0.81 -
## S 0.81 0.81 0.81
##
## P value adjustment method: BH
Kruskal-Wallis Test
As the p-value (0.814) is greater than the significance level of 0.05, we fail to reject the null hypothesis.
This means that there isn't enough evidence to conclude that the Impacts of Cyber Security Incidents differ significantly between enterprise sizes.
In conclusion, the data provide no evidence that the size of an enterprise affects the Impacts of Cyber Security Incidents faced. This is consistent with, but does not prove, independence of the two variables.
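The Kruskal-Wallis test compares groups by ranks, so it does not rely on the normality assumption rejected earlier. As a minimal sketch, the formula interface with a `data =` argument gives the same statistic as passing the vectors directly. The simulated Poisson counts, the group labels, and the seed below are illustrative assumptions, not the study data.

```r
set.seed(501)  # arbitrary seed so the sketch is reproducible

# Simulated counts drawn from the same distribution in every group.
toy <- data.frame(
  group = factor(rep(c("L", "M", "S"), each = 30)),
  y = rpois(90, lambda = 6)
)

# Formula interface: response ~ grouping factor.
res <- kruskal.test(y ~ group, data = toy)

# Equivalent call with the response vector and the grouping vector.
res2 <- kruskal.test(toy$y, toy$group)

res$statistic  # Kruskal-Wallis chi-squared
res$parameter  # degrees of freedom = number of groups - 1
res$p.value    # compare against the chosen significance level
```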
PAIRWISE COMPARISONS
The pairwise comparisons show that none of the enterprise-size groups differ significantly in the Impacts of Cyber Security Incidents: every BH-adjusted p-value is at least 0.81, well above 0.05.
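The `p.adjust.method = "BH"` argument applies the Benjamini-Hochberg correction to the six pairwise p-values. A small sketch of what that adjustment does, checked against base R's `p.adjust()`, follows. The raw p-values are hypothetical and are not the ones behind the table above.

```r
# Hypothetical raw p-values from six pairwise tests; invented for illustration.
raw_p <- c(0.01, 0.02, 0.03, 0.20, 0.50, 0.90)

# Built-in Benjamini-Hochberg adjustment.
adj_p <- p.adjust(raw_p, method = "BH")

# Manual version: sort from largest to smallest, scale each p-value by
# n / rank, then take a running minimum so adjusted values stay monotone.
n <- length(raw_p)
ord <- order(raw_p, decreasing = TRUE)
manual <- numeric(n)
manual[ord] <- pmin(1, cummin(raw_p[ord] * n / (n:1)))

adj_p
manual  # should agree with adj_p
```

Because the adjusted values are never smaller than the raw ones, a comparison that is non-significant before adjustment remains non-significant afterwards.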