Load R packages

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.5     v dplyr   1.0.5
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## Warning: package 'ggplot2' was built under R version 4.0.5
## Warning: package 'tidyr' was built under R version 4.0.4
## Warning: package 'readr' was built under R version 4.0.4
## Warning: package 'purrr' was built under R version 4.0.4
## Warning: package 'dplyr' was built under R version 4.0.4
## Warning: package 'forcats' was built under R version 4.0.4
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(ggpubr)
## Warning: package 'ggpubr' was built under R version 4.0.5
library(rstatix)
## Warning: package 'rstatix' was built under R version 4.0.5
## 
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
## 
##     filter
library(car)
## Warning: package 'car' was built under R version 4.0.5
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following object is masked from 'package:purrr':
## 
##     some

Import Data

ICI.Count.Size <- read.csv("C:/Users/candy/OneDrive/Desktop/DATA501/Count Data/ICI.Count.Size.csv", row.names=1)

NORMALITY TEST

Check Data

head(ICI.Count.Size)
summary(ICI.Count.Size)
##  Size.of.enterprise       X1        
##  Length:302         Min.   : 1.000  
##  Class :character   1st Qu.: 5.000  
##  Mode  :character   Median : 6.000  
##                     Mean   : 6.523  
##                     3rd Qu.: 7.750  
##                     Max.   :14.000
str(ICI.Count.Size)
## 'data.frame':    302 obs. of  2 variables:
##  $ Size.of.enterprise: chr  "ALL" "ALL" "ALL" "ALL" ...
##  $ X1                : int  1 1 1 1 1 1 1 1 1 2 ...

Visual Normality Test

Create a histogram for the data and include a density curve

ggplot(data=ICI.Count.Size, aes(X1)) + 
  geom_histogram(breaks=seq(0, 14, by=1),  # cover the full observed range (max of X1 is 14)
                 col="red", 
                 aes(fill=after_stat(count))) +
  scale_fill_gradient("Count", low="green", high="red")

ggdensity(ICI.Count.Size$X1, fill = "lightblue")

ggqqplot(ICI.Count.Size$X1)

Non-Visual Normality Test

H0: The sample comes from a normally distributed population.

Ha: The sample does not come from a normally distributed population.

ICI.Count.Size %>% shapiro_test(X1)

Since the p-value is less than the alpha value of 0.05, we reject the null hypothesis. Therefore, there is enough evidence to conclude that the data tested are not normally distributed.
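The decision rule above can be written out explicitly. The following is a minimal sketch using base R's shapiro.test() on simulated counts; the vector x_demo, the seed, and the variable names are illustrative assumptions and are not the ICI.Count.Size data.

```r
# Illustrative data only: simulated right-skewed counts, NOT the study data
set.seed(501)
x_demo <- rpois(300, lambda = 2)

alpha <- 0.05                   # significance level used in this report
sw <- shapiro.test(x_demo)      # Shapiro-Wilk test from base R (stats)

# Reject H0 (normality) when the p-value falls below alpha
reject_normality <- sw$p.value < alpha
print(sw)
print(reject_normality)
```

The same comparison applies to the p-value returned by shapiro_test() from rstatix, which reports it in a column named p.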

HOMOGENEITY TEST

H0: The variance of the Impacts of Cyber Security Incidents is the same across the enterprise-size groups.

Ha: The variance of the Impacts of Cyber Security Incidents differs for at least one enterprise-size group.

leveneTest(X1 ~ Size.of.enterprise, data = ICI.Count.Size)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

The p-value is greater than the alpha value (p = 0.7128 > 0.05). Therefore, we fail to reject the null hypothesis. This means there is not enough evidence to conclude that the variances of the Impacts of Cyber Security Incidents differ significantly across the groups.

Therefore, the assumption of equal variances can be treated as reasonable.
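The warning printed by leveneTest() arises because Size.of.enterprise is stored as a character column and is coerced to a factor inside the call. A minimal sketch of converting the grouping column first is shown here on a small made-up data frame; demo and its values are hypothetical, and the call to car::leveneTest() is guarded in case the package is not installed.

```r
# Hypothetical toy data with the same column names as ICI.Count.Size
demo <- data.frame(
  Size.of.enterprise = rep(c("ALL", "L", "M", "S"), each = 5),
  X1 = c(1, 3, 5, 6, 8,  2, 4, 5, 7, 9,  1, 2, 6, 7, 8,  3, 4, 5, 6, 10),
  stringsAsFactors = FALSE
)

# Convert the grouping variable explicitly so no coercion is needed later
demo$Size.of.enterprise <- factor(demo$Size.of.enterprise)
print(levels(demo$Size.of.enterprise))

# Run Levene's test only if the car package is available
if (requireNamespace("car", quietly = TRUE)) {
  print(car::leveneTest(X1 ~ Size.of.enterprise, data = demo))
}
```

Applying the same factor() conversion to ICI.Count.Size before the test should leave the result unchanged and only remove the coercion warning.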

KRUSKAL-WALLIS TEST (NON-PARAMETRIC ALTERNATIVE TO ANOVA)

HYPOTHESIS

H0: The size of an enterprise does not affect the Impacts of Cyber Security Incidents faced (the variables are independent).

Ha: The size of an enterprise affects the Impacts of Cyber Security Incidents faced (the variables are dependent).

kruskal.test(ICI.Count.Size$X1~ICI.Count.Size$Size.of.enterprise)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ICI.Count.Size$X1 by ICI.Count.Size$Size.of.enterprise
## Kruskal-Wallis chi-squared = 0.94747, df = 3, p-value = 0.814
pairwise.wilcox.test(ICI.Count.Size$X1,ICI.Count.Size$Size.of.enterprise,
                 p.adjust.method = "BH")
## 
##  Pairwise comparisons using Wilcoxon rank sum test with continuity correction 
## 
## data:  ICI.Count.Size$X1 and ICI.Count.Size$Size.of.enterprise 
## 
##   ALL  L    M   
## L 0.81 -    -   
## M 0.94 0.81 -   
## S 0.81 0.81 0.81
## 
## P value adjustment method: BH

Kruskal-Wallis Test

As the p-value (0.814) is greater than the significance level of 0.05, we fail to reject the null hypothesis.

This means that there isn’t enough evidence to conclude that there are significant differences between the Impacts of Cyber Security Incidents for different enterprise sizes.

In conclusion, the data provide no evidence that the size of an enterprise affects the Impacts of Cyber Security Incidents faced (the result is consistent with the variables being independent).
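The same decision can be reproduced programmatically by reading the statistic, degrees of freedom, and p-value from the object that kruskal.test() returns. This is a minimal sketch on simulated data; the data frame demo, its columns group and score, and the seed are hypothetical and do not represent the study data.

```r
# Hypothetical data: four groups drawn from the same distribution
set.seed(501)
demo <- data.frame(
  group = factor(rep(c("ALL", "L", "M", "S"), each = 30)),
  score = rpois(120, lambda = 6)
)

# Kruskal-Wallis rank sum test from base R (stats), formula interface
kw <- kruskal.test(score ~ group, data = demo)

alpha <- 0.05
decision <- if (kw$p.value < alpha) "reject H0" else "fail to reject H0"
cat("chi-squared =", round(unname(kw$statistic), 3),
    "df =", unname(kw$parameter),
    "p =", round(kw$p.value, 3), "->", decision, "\n")
```

The formula interface with a data argument is equivalent to passing the two columns with the $ operator, and it keeps the printed "data:" line shorter.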

PAIRWISE COMPARISONS

The pairwise comparisons show that none of the enterprise-size groups differ significantly in the Impacts of Cyber Security Incidents: every BH-adjusted p-value is 0.81 or higher.
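To illustrate what the BH (Benjamini-Hochberg) adjustment in pairwise.wilcox.test() does to the raw p-values, the sketch below applies p.adjust() to a made-up vector and repeats the calculation by hand. The values in p_raw are hypothetical and were not produced by the analysis above.

```r
# Hypothetical raw p-values for six pairwise comparisons (not from the study)
p_raw <- c(0.01, 0.04, 0.03, 0.20, 0.50, 0.81)

# Built-in Benjamini-Hochberg adjustment
p_bh <- p.adjust(p_raw, method = "BH")

# Same adjustment by hand: scale the i-th smallest p-value by m / i, then
# enforce monotonicity from the largest p-value downwards and cap at 1
m <- length(p_raw)
ord <- order(p_raw, decreasing = TRUE)
p_manual <- numeric(m)
p_manual[ord] <- pmin(1, cummin(p_raw[ord] * m / (m:1)))

print(p_bh)                        # 0.06 0.08 0.08 0.30 0.60 0.81
print(all.equal(p_bh, p_manual))   # TRUE
```

Because of the monotonicity step, adjusted p-values can be tied even when the raw ones differ, which is one reason several entries in a pairwise table can show the same value.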