The breaches data set resides in the Ecdat package for the R programming language. It contains information about cyber security breaches involving health care records that were reported to the U.S. Department of Health and Human Services as of June 27, 2014. The data set consists of 1055 observations of 13 variables.
#install.packages("Ecdat")
library("Ecdat")
## Loading required package: Ecfun
##
## Attaching package: 'Ecfun'
## The following object is masked from 'package:base':
##
## sign
##
## Attaching package: 'Ecdat'
## The following object is masked from 'package:datasets':
##
## Orange
# Store the breaches data set from Ecdat in B
B <- Ecdat::breaches
The structure of the breaches data set, as printed by str(B), is as follows.
## 'data.frame': 1055 obs. of 13 variables:
## $ Number : int 0 1 2 3 4 5 6 7 8 9 ...
## $ Name_of_Covered_Entity : Factor w/ 967 levels " DeKalb Medical Center, Inc. d/b/a DeKalb Medical Hillandale",..: 108 521 31 334 435 217 519 413 488 170 ...
## $ State : Factor w/ 52 levels "AK","AL","AR",..: 45 25 1 8 5 5 5 5 5 5 ...
## $ Business_Associate_Involved : Factor w/ 232 levels ""," Xand Corporation",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Individuals_Affected : int 1000 1000 501 3800 5257 857 6145 952 5166 5900 ...
## $ Date_of_Breach : chr "10/16/2009" "9/22/2009" "10/12/2009" "10/9/2009" ...
## $ Type_of_Breach : Factor w/ 29 levels "Hacking/IT Incident",..: 12 12 12 5 12 12 12 12 12 12 ...
## $ Location_of_Breached_Information: Factor w/ 41 levels "Desktop Computer",..: 41 31 37 17 1 1 1 1 1 17 ...
## $ Date_Posted_or_Updated : Date, format: "2014-06-30" "2014-05-30" ...
## $ Summary : chr "A binder containing the protected health information (PHI) of up to 1,272 individuals was stolen from a staff member's vehicle."| __truncated__ "Five desktop computers containing unencrypted electronic protected health information (e-PHI) were stolen from the covered enti"| __truncated__ "" "A laptop was lost by an employee while in transit on public transportation. The computer contained the protected health inform"| __truncated__ ...
## $ breach_start : Date, format: "2009-10-16" "2009-09-22" ...
## $ breach_end : Date, format: NA NA ...
## $ year : num 2009 2009 2009 2009 2009 ...
The 5 factors in this data set are the entity experiencing the security breach, the state where the breach occurred (including the District of Columbia (DC) and Puerto Rico (PR)), the business associate (subcontractor) involved in the breach, the type of breach, and the location (or source) of the breached information. The number of levels for each factor is:
Name_of_Covered_Entity: 967 levels
State: 52 levels
Business_Associate_Involved: 232 levels
Type_of_Breach: 29 levels
Location_of_Breached_Information: 41 levels
The only continuous variable in this data set is Individuals_Affected, the number of people whose records were compromised in the breach. Its minimum is 500, consistent with HHS publicly posting only breaches that affect 500 or more individuals.
We take Individuals_Affected as the response variable in this model because it is the only numeric measurement in the data set; the other numeric columns are Number, a row identifier, and year, the calendar year of the breach.
As mentioned above, the data set consists of 1055 observations of 13 variables.
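The level counts and the 500-person minimum stated above can be checked directly. This is a minimal sketch assuming the five columns are stored as factors or character vectors; wrapping each column in factor() counts the distinct values that actually occur, which can differ from nlevels() on the original column if it carries unused levels.

```r
# Assumes the Ecdat package is installed (see install.packages("Ecdat") above)
B <- Ecdat::breaches

factor_cols <- c("Name_of_Covered_Entity", "State", "Business_Associate_Involved",
                 "Type_of_Breach", "Location_of_Breached_Information")

# Distinct observed values per factor; factor() also works if a column is character
n_levels <- sapply(B[factor_cols], function(x) nlevels(factor(x)))
print(n_levels)

# Response: smallest and largest breach, and a check of the 500-person floor
y <- B$Individuals_Affected
print(range(y))
print(all(y >= 500))

# Overall dimensions: observations x variables
print(dim(B))
```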
The following is a general view of the data set:
head(B)
## Number Name_of_Covered_Entity State
## 1 0 Brooke Army Medical Center TX
## 2 1 Mid America Kidney Stone Association, LLC MO
## 3 2 Alaska Department of Health and Social Services AK
## 4 3 Health Services for Children with Special Needs, Inc. DC
## 5 4 L. Douglas Carlson, M.D. CA
## 6 5 David I. Cohen, MD CA
## Business_Associate_Involved Individuals_Affected Date_of_Breach
## 1 1000 10/16/2009
## 2 1000 9/22/2009
## 3 501 10/12/2009
## 4 3800 10/9/2009
## 5 5257 9/27/2009
## 6 857 9/27/2009
## Type_of_Breach Location_of_Breached_Information
## 1 Theft Paper
## 2 Theft Network Server
## 3 Theft Other Portable Electronic Device, Other
## 4 Loss Laptop
## 5 Theft Desktop Computer
## 6 Theft Desktop Computer
## Date_Posted_or_Updated
## 1 2014-06-30
## 2 2014-05-30
## 3 2014-01-23
## 4 2014-01-23
## 5 2014-01-23
## 6 2014-01-23
## Summary
## 1 A binder containing the protected health information (PHI) of up to 1,272 individuals was stolen from a staff member's vehicle. The PHI included names, telephone numbers, detailed treatment notes, and possibly social security numbers. In response to the breach, the covered entity (CE) sanctioned the workforce member and developed a new policy requiring on-call staff members to submit any information created during their shifts to the main office instead of adding it to the binder. Following OCR's investigation, the CE notified the local media about the breach.
## 2 Five desktop computers containing unencrypted electronic protected health information (e-PHI) were stolen from the covered entity (CE). Originally, the CE reported that over 500 persons were involved, but subsequent investigation showed that about 260 persons were involved. The ePHI included demographic and financial information. The CE provided breach notification to affected individuals and HHS. Following the breach, the CE improved physical security by installing motion detectors and alarm systems security monitoring. It improved technical safeguards by installing enhanced antivirus and encryption software. As a result of OCR's investigation the CE updated its computer password policy.
## 3
## 4 A laptop was lost by an employee while in transit on public transportation. The computer contained the protected health information of 3800 individuals. The protected health information involved in the breach included names, Medicaid ID numbers, dates of birth, and primary physicians. In response to this incident, the covered entity took steps to enforce the requirements of the Privacy & Security Rules. The covered entity has installed encryption software on all employee computers, strengthened access controls including passwords, reviewed and updated security policies and procedures, and updated it risk assessment. In addition, all employees received additional security training. \n\n
## 5 A shared Computer that was used for backup was stolen on 9/27/09 from the reception desk area of the covered entity. The Computer contained certain electronic protected health information (ePHI) of 5,257 individuals who were patients of the CE. The ePHI involved in the breach included names, dates of birth, and clinical information, but there were no social security numbers, financial information, addresses, phone numbers, or other ePHI in any of the reports on the disks or the hard drive on the stolen Computer. Following the breach, the covered entity notified all 5,257 affected individuals and the appropriate media; added technical safeguards of encryption for all ePHI stored on the USB flash drive or the CD used on the replacement computer; added physical safeguards by keeping new portable devices locked when not in use in a secure combination safe in doctor's private office or in a secure filing cabinet; and added administrative safeguards by requiring annual refresher retraining of CE staff for Privacy and Security Rules as well as requiring immediate retraining of cleaning staff in both Rules.\n\n
## 6 A shared Computer that was used for backup was stolen from the reception desk area, behind a locked desk area, probably while a cleaning crew had left the main door to the building open and the door to the suite was unlocked and perhaps ajar. The Computer contained certain electronic protected health information (ePHI) of 857 patients. The ePHI involved in the breach included names, dates of birth, and clinical information. Following the breach, the covered entity notified all affected individuals and the media, added technical safeguards of encryption for all ePHI stored on the USB flash drive or the CD used on the replacement computer, added physical safeguards by keeping new portable devices locked when not in use in a secure combination safe in doctor's private office or in a secure filing cabinet, and added administrative safeguards by requiring annual refresher retraining staff for Privacy and Security Rules as well as requiring immediate retraining of cleaning staff in both Rules, which has already taken place.\n\n
## breach_start breach_end year
## 1 2009-10-16 <NA> 2009
## 2 2009-09-22 <NA> 2009
## 3 2009-10-12 <NA> 2009
## 4 2009-10-09 <NA> 2009
## 5 2009-09-27 <NA> 2009
## 6 2009-09-27 <NA> 2009
tail(B)
## Number Name_of_Covered_Entity State
## 1050 1049 St. Francis Hospital GA
## 1051 1050 Puerto Rico Health Insurance PR
## 1052 1051 Hospitalists of Brandon, LLC FL
## 1053 1052 Santa Rosa Memorial Hospital CA
## 1054 1053 Group Health Plan of Hurley Medical Center MI
## 1055 1054 Abrham Tekola, M.D.,INC CA
## Business_Associate_Involved Individuals_Affected
## 1050 1175
## 1051 American Health Inc 28413
## 1052 Doctors First Choice Billings, Inc. 1831
## 1053 33702
## 1054 2289
## 1055 5471
## Date_of_Breach Type_of_Breach
## 1050 5/30/2014 Other
## 1051 9/20/2013 Theft
## 1052 2/11/2014 Hacking/IT Incident
## 1053 6/2/2014 Theft, Loss
## 1054 5/13/2014 Unauthorized Access/Disclosure
## 1055 5/27/2014 Theft
## Location_of_Breached_Information Date_Posted_or_Updated Summary
## 1050 E-mail 2014-06-18
## 1051 Other 2014-06-27
## 1052 Other 2014-06-27
## 1053 Other Portable Electronic Device 2014-06-27
## 1054 E-mail 2014-06-27
## 1055 Desktop Computer 2014-06-27
## breach_start breach_end year
## 1050 2014-05-30 <NA> 2014
## 1051 2013-09-20 <NA> 2013
## 1052 2014-02-11 <NA> 2014
## 1053 2014-06-02 <NA> 2014
## 1054 2014-05-13 <NA> 2014
## 1055 2014-05-27 <NA> 2014
summary(B)
## Number
## Min. : 0.0
## 1st Qu.: 263.5
## Median : 527.0
## Mean : 527.0
## 3rd Qu.: 790.5
## Max. :1054.0
##
## Name_of_Covered_Entity
## UnitedHealth Group health plan single affiliated covered entity: 7
## Cook County Health & Hospitals System : 4
## University of California, San Francisco : 4
## Walgreen Co. : 4
## Baptist Health System : 3
## County of Los Angeles : 3
## (Other) :1030
## State Business_Associate_Involved
## CA :113 :784
## TX : 83 MedAssets : 6
## FL : 66 StayWell Health Management, LLC : 5
## NY : 58 Clearpoint Design, Inc. : 4
## IL : 49 Futurity First Insurance Group : 4
## IN : 40 HealthPartners Administrators, Inc.: 3
## (Other):646 (Other) :249
## Individuals_Affected Date_of_Breach
## Min. : 500 Length:1055
## 1st Qu.: 1000 Class :character
## Median : 2300 Mode :character
## Mean : 30262
## 3rd Qu.: 6941
## Max. :4900000
##
## Type_of_Breach
## Theft :516
## Unauthorized Access/Disclosure:148
## Other : 91
## Loss : 85
## Hacking/IT Incident : 75
## Improper Disposal : 38
## (Other) :102
## Location_of_Breached_Information Date_Posted_or_Updated
## Paper :227 Min. :2014-01-23
## Laptop :217 1st Qu.:2014-01-23
## Other :116 Median :2014-01-23
## Desktop Computer :113 Mean :2014-02-23
## Network Server :107 3rd Qu.:2014-03-24
## Other Portable Electronic Device: 60 Max. :2014-06-30
## (Other) :215
## Summary breach_start breach_end
## Length:1055 Min. :1997-01-01 Min. :2007-06-14
## Class :character 1st Qu.:2010-11-08 1st Qu.:2012-04-22
## Mode :character Median :2012-01-11 Median :2012-10-29
## Mean :2011-12-09 Mean :2012-10-28
## 3rd Qu.:2013-03-07 3rd Qu.:2013-05-29
## Max. :2014-06-02 Max. :2013-11-30
## NA's :910
## year
## Min. :1997
## 1st Qu.:2010
## Median :2012
## Mean :2011
## 3rd Qu.:2013
## Max. :2014
##
For this experiment we use 4 factors with multiple levels and analyze their individual main effects and two-way interaction effects on the number of individuals affected by a breach. The null hypothesis is that the number of individuals affected by a breach does not depend on any of the four factors or on any two-way interaction between them.
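Written out for one pair of factors, these hypotheses correspond to the standard two-factor fixed-effects model that aov() fits for a formula such as y ~ A * B; the one-factor main-effect ANOVAs drop the beta and interaction terms. The notation (factor A with a levels indexed by i, factor B with b levels indexed by j, breaches within a cell indexed by k) is introduced here for illustration and is not used elsewhere in this analysis. With empty cells, only the estimable interaction contrasts are actually tested.

$$
\begin{aligned}
y_{ijk} &= \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \qquad \varepsilon_{ijk} \sim N(0, \sigma^2) \text{ independently},\\
H_0^{A}&: \alpha_1 = \cdots = \alpha_a = 0,\\
H_0^{B}&: \beta_1 = \cdots = \beta_b = 0,\\
H_0^{AB}&: (\alpha\beta)_{ij} = 0 \text{ for all } i, j.
\end{aligned}
$$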
The factors that we are looking at in this experiment are as follows:
State
Business_Associate_Involved
Type_of_Breach
Location_of_Breached_Information
We do not include the factor Name_of_Covered_Entity, because it has 967 levels among only 1055 observations: most entities appear only once, and a one-way ANOVA on this factor would leave just 1055 - 967 = 88 residual degrees of freedom, too few to estimate its effect reliably.
The data set exists so that the U.S. Department of Health and Human Services has a record of cyber security breaches involving health care records.
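How sparse Name_of_Covered_Entity is can be seen by tabulating it. A short sketch; as.character() is used so that unused factor levels are not counted.

```r
B <- Ecdat::breaches

# Count how many times each covered entity appears
entity_counts <- table(as.character(B$Name_of_Covered_Entity))
print(length(entity_counts))        # number of distinct entities
print(sum(entity_counts == 1))      # entities with a single breach

# Residual degrees of freedom a one-way ANOVA on this factor would leave
print(nrow(B) - length(entity_counts))
```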
Randomization: this data set has no randomization scheme, because it consists of observational records of reported breaches rather than a designed experiment.
Replication: there are no replicates or repeated measures in this data set.
Blocking: no blocking is used in this design. However, if the factor Name_of_Covered_Entity had fewer levels, or if there were more observations, we would most likely block on Name_of_Covered_Entity, because that would capture information specific to the entity undergoing the breach.
The following histogram of the response variable Individuals_Affected is used to assess visually whether the response is approximately normally distributed.
The histogram shows that the response variable does not follow a normal distribution.
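The code for the histogram is not shown above. A minimal sketch that would produce a comparable plot, with a second panel on a log10 scale added here as an illustration (it is not part of the original analysis):

```r
B <- Ecdat::breaches
y <- B$Individuals_Affected

op <- par(mfrow = c(1, 2))  # two panels side by side
hist(y, breaks = 50, main = "Individuals_Affected", xlab = "Individuals affected")
# On a log10 scale the long right tail is compressed; values start at log10(500) ~ 2.7
hist(log10(y), breaks = 50, main = "log10(Individuals_Affected)",
     xlab = "log10(individuals affected)")
par(op)  # restore the previous layout
```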
The following are the summary statistics of Individuals_Affected:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 500 1000 2300 30260 6941 4900000
The mean (30,260) is more than 13 times the median (2,300), which indicates a strongly right-skewed distribution with extreme outliers; the maximum is 4,900,000. To confirm this, we look at boxplots of Individuals_Affected for each factor. Note: the 30 largest outliers were removed to make the boxplots easier to read.
According to the boxplots, it is possible that the business associate involved and the location of the breached information do not have an effect on the response variable. ANOVA will be used to test these effects.
To determine whether the factors have a significant main effect or two-way interaction effect on the response variable, we conduct ANOVA. The following are the ANOVA results for the main effects.
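The boxplot code is not shown above. A sketch of one way to drop the 30 largest values and draw boxplots by factor; Type_of_Breach is used as the example, and the other factors can be swapped into the formula (Business_Associate_Involved, with 232 levels, will be hard to read on one axis).

```r
B <- Ecdat::breaches

# Rank breaches from largest to smallest; ties broken by row order
size_rank <- rank(-B$Individuals_Affected, ties.method = "first")
Btrim <- B[size_rank > 30, ]  # drop the 30 largest values, as in the text

# One box per breach type; las = 2 turns the axis labels vertical
boxplot(Individuals_Affected ~ Type_of_Breach, data = Btrim,
        las = 2, cex.axis = 0.6, ylab = "Individuals affected")
```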
main1 <- aov(B$Individuals_Affected ~ B$State)
anova(main1)
## Analysis of Variance Table
##
## Response: B$Individuals_Affected
## Df Sum Sq Mean Sq F value Pr(>F)
## B$State 51 1.7544e+12 3.4400e+10 0.6514 0.9725
## Residuals 1003 5.2969e+13 5.2811e+10
main2 <- aov(B$Individuals_Affected ~ B$Business_Associate_Involved)
anova(main2)
## Analysis of Variance Table
##
## Response: B$Individuals_Affected
## Df Sum Sq Mean Sq F value Pr(>F)
## B$Business_Associate_Involved 231 3.1831e+13 1.3780e+11 4.9538 < 2.2e-16
## Residuals 823 2.2893e+13 2.7816e+10
##
## B$Business_Associate_Involved ***
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
main3 <- aov(B$Individuals_Affected ~ B$Type_of_Breach)
anova(main3)
## Analysis of Variance Table
##
## Response: B$Individuals_Affected
## Df Sum Sq Mean Sq F value Pr(>F)
## B$Type_of_Breach 28 7.2064e+11 2.5737e+10 0.489 0.9887
## Residuals 1026 5.4003e+13 5.2635e+10
main4 <- aov(B$Individuals_Affected ~ B$Location_of_Breached_Information)
anova(main4)
## Analysis of Variance Table
##
## Response: B$Individuals_Affected
## Df Sum Sq Mean Sq F value
## B$Location_of_Breached_Information 40 2.0127e+12 5.0318e+10 0.968
## Residuals 1014 5.2711e+13 5.1983e+10
## Pr(>F)
## B$Location_of_Breached_Information 0.5285
## Residuals
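Based on the main-effects ANOVA results, we reject the null hypothesis only in the second test (Business_Associate_Involved, F = 4.9538, p < 2.2e-16), so the variation in the number of individuals affected could be attributed to the business associate involved. The other three factors are not significant (p = 0.9725 for State, 0.9887 for Type_of_Breach, and 0.5285 for Location_of_Breached_Information).
The following prints the integer code that R stores for each observation's factor level. This conversion is not required for the interaction ANOVA: aov() treats factors as categorical variables directly, and the interaction models below are fitted on the factors themselves.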
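The significant F test for Business_Associate_Involved rests on 231 numerator degrees of freedom, and many business associates appear only once, so a group mean can be a single breach. One way to look at what drives the result is to list the associates with the largest mean breach size together with their group sizes. This is an exploratory sketch, not part of the original analysis; it assumes, as the summary above shows, that breaches with no associate are coded as the empty string "".

```r
B <- Ecdat::breaches
ba <- factor(B$Business_Associate_Involved)  # drops any unused levels
y <- B$Individuals_Affected

ba_n    <- table(ba)               # breaches per business associate
ba_mean <- tapply(y, ba, mean)     # mean individuals affected per associate

# Associates with the largest mean breach size, with their group sizes
top <- names(sort(ba_mean, decreasing = TRUE))[1:5]
print(data.frame(mean = ba_mean[top], n = as.vector(ba_n[top])))

# Share of breaches with no business associate recorded (the "" level)
print(mean(ba == ""))
```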
as.numeric(B$State)
## [1] 45 25 1 8 5 5 5 5 5 5 39 44 35 5 28 23 23 5 20 20 8 8 5
## [24] 15 23 45 45 25 15 45 5 15 46 34 4 41 40 40 10 44 5 33 5 6 10 52
## [47] 50 28 35 45 10 49 10 7 2 45 4 39 23 15 5 30 45 35 50 7 20 41 42
## [70] 5 18 44 24 45 47 15 6 10 45 8 5 5 8 33 42 35 30 28 45 15 36 17
## [93] 10 24 35 23 18 44 44 36 35 45 11 34 21 4 5 44 5 35 35 10 33 6 47
## [116] 28 15 20 35 8 35 36 45 7 45 18 16 28 23 45 45 36 39 36 45 15 20 18
## [139] 25 39 28 14 5 49 44 7 18 15 38 33 20 32 32 35 35 9 47 24 34 41 13
## [162] 16 5 17 24 35 36 1 16 20 5 5 35 15 37 5 20 35 45 18 21 45 46 49
## [185] 3 44 5 40 7 26 40 7 30 35 16 25 23 10 32 5 37 39 35 44 19 45 40
## [208] 39 21 15 50 36 5 50 16 24 39 18 21 2 39 10 36 31 10 47 16 45 49 42
## [231] 23 42 44 49 45 45 44 6 18 27 49 25 35 5 15 51 4 40 23 35 6 10 20
## [254] 18 37 25 45 35 2 51 5 5 1 4 4 7 37 25 37 42 5 4 24 50 52 28
## [277] 15 15 4 15 15 36 31 45 20 16 25 39 40 40 4 10 5 45 35 35 21 40 23
## [300] 42 18 49 38 28 4 42 5 11 51 5 49 10 44 44 25 4 36 19 5 6 5 10
## [323] 35 2 23 40 11 20 11 47 5 44 15 18 45 38 5 20 16 32 32 32 32 49 9
## [346] 21 11 32 32 40 45 15 15 18 39 15 5 5 3 16 45 50 40 45 11 26 45 49
## [369] 39 15 45 39 35 39 44 40 44 30 30 23 16 20 10 5 24 24 20 24 21 33 10
## [392] 39 39 20 3 25 45 16 10 8 35 23 28 36 36 10 45 39 21 47 5 51 17 2
## [415] 29 45 18 44 10 16 39 15 15 30 11 49 5 44 49 45 20 10 28 39 16 23 31
## [438] 17 5 37 47 35 16 49 6 10 24 28 24 5 23 32 50 16 8 5 19 38 39 33
## [461] 11 40 21 45 1 18 16 40 40 40 40 5 35 40 5 40 11 45 15 20 28 5 24
## [484] 35 45 23 20 4 5 45 5 46 16 35 10 49 45 11 38 5 3 10 42 38 5 4
## [507] 45 23 49 2 45 51 13 25 24 5 24 19 24 5 28 10 30 20 36 5 11 7 45
## [530] 45 49 44 11 16 10 38 35 10 18 18 27 28 45 45 15 5 5 13 38 28 20 35
## [553] 16 15 16 4 15 15 7 7 16 38 5 16 45 8 5 10 39 45 15 10 16 11 34
## [576] 49 10 33 47 3 7 35 5 18 25 5 18 10 44 5 11 5 32 16 39 41 36 35
## [599] 15 10 19 41 6 45 5 12 7 33 20 16 11 7 10 36 10 18 15 28 5 10 44
## [622] 5 41 11 47 10 28 10 45 20 35 3 3 3 23 36 35 4 5 23 16 32 18 20
## [645] 47 45 35 5 20 34 21 4 32 20 19 20 35 25 21 39 45 35 45 50 45 15 2
## [668] 44 28 10 11 21 5 50 10 21 20 25 11 10 33 11 10 43 4 5 5 11 35 35
## [691] 15 10 39 22 5 46 36 7 10 10 21 16 16 16 10 38 20 26 46 45 41 49 5
## [714] 10 10 15 35 28 45 50 42 25 38 16 35 45 46 46 39 28 28 4 36 35 45 5
## [737] 49 44 28 44 28 16 35 36 39 16 16 3 2 35 5 10 36 20 10 5 28 38 25
## [760] 10 10 13 30 45 11 16 47 5 10 36 28 5 15 39 24 45 5 52 25 45 5 10
## [783] 35 19 38 6 5 3 45 44 36 16 16 25 44 5 45 29 10 15 6 45 5 11 25
## [806] 6 25 38 5 15 40 44 42 15 15 35 10 5 25 10 47 36 5 6 36 44 32 13
## [829] 11 5 39 49 39 28 35 50 25 42 23 16 6 15 48 1 10 47 35 45 45 35 35
## [852] 25 5 50 5 45 10 10 39 42 45 39 6 32 6 28 28 40 40 28 29 21 4 35
## [875] 10 23 11 50 5 5 16 49 39 49 15 15 15 11 11 15 45 10 11 18 5 30 32
## [898] 47 52 27 42 44 39 45 5 20 28 10 33 6 32 11 49 39 28 5 26 47 15 36
## [921] 5 40 40 2 36 50 45 9 45 5 5 11 36 44 7 23 17 35 35 15 49 35 10
## [944] 39 36 5 16 25 5 24 32 44 47 36 36 28 45 3 5 45 4 2 2 13 5 5
## [967] 39 5 23 45 6 4 45 35 18 24 24 24 24 5 18 27 10 47 39 18 49 5 46
## [990] 45 10 15 45 5 40 40 23 5 15 17 7 47 25 44 5 40 26 21 5 36 13 14
## [1013] 21 39 6 10 20 21 35 46 45 45 45 45 16 13 2 39 36 20 18 32 5 45 32
## [1036] 40 38 23 31 18 35 36 40 7 17 39 5 39 15 11 40 10 5 23 5
as.numeric(B$Business_Associate_Involved)
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 70 1 177 1 1
## [18] 1 1 1 184 135 1 220 1 1 1 1 1 1 1 1 95 1
## [35] 1 41 140 139 1 1 1 1 1 1 1 1 1 1 1 1 1
## [52] 1 1 1 1 58 1 1 1 1 1 128 1 1 1 1 1 1
## [69] 1 1 1 212 1 1 1 137 1 1 1 1 1 1 104 71 1
## [86] 1 68 1 1 1 1 1 1 1 188 1 1 1 71 1 1 1
## [103] 33 1 26 1 1 1 46 1 1 1 1 90 1 1 221 113 1
## [120] 1 1 1 1 1 1 127 1 1 133 1 1 1 1 1 1 1
## [137] 129 1 1 1 1 134 1 1 1 1 1 1 1 1 160 122 122
## [154] 1 1 25 1 1 1 1 1 1 1 1 47 1 1 17 1 1
## [171] 151 80 1 1 1 1 118 1 1 1 1 1 225 1 1 1 1
## [188] 216 1 1 132 1 1 1 1 1 1 1 167 1 1 1 1 1
## [205] 92 1 1 88 1 1 1 1 1 105 207 1 1 1 232 1 1
## [222] 1 1 1 1 1 1 1 1 1 1 213 115 1 1 1 1 1
## [239] 112 1 1 1 91 1 44 229 1 1 1 1 1 1 1 1 1
## [256] 62 1 1 1 1 1 1 218 1 1 93 1 1 1 1 108 1
## [273] 1 100 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [290] 1 1 1 1 1 193 1 45 1 15 1 1 1 125 1 1 1
## [307] 85 1 1 1 1 1 1 1 1 1 27 1 1 74 1 1 1
## [324] 1 30 1 1 1 31 1 181 180 1 1 1 1 1 1 32 131
## [341] 130 131 131 1 1 29 1 131 131 5 206 1 131 1 190 1 141
## [358] 1 1 1 1 124 1 1 1 1 1 1 34 1 1 20 1 86
## [375] 1 1 87 87 87 1 1 1 1 208 1 4 1 87 1 1 1
## [392] 1 1 1 1 106 196 1 1 1 1 209 1 161 161 1 1 1
## [409] 1 183 1 1 136 1 1 1 1 1 1 1 1 42 142 1 1
## [426] 1 1 1 1 1 1 1 16 1 178 227 1 1 1 1 6 1
## [443] 1 1 1 1 1 1 3 1 14 1 1 1 1 1 1 77 1
## [460] 35 49 203 1 116 1 1 1 201 201 172 204 1 1 172 1 202
## [477] 1 1 1 48 1 1 1 1 1 1 1 1 1 101 67 224 1
## [494] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [511] 1 1 1 1 1 1 3 1 1 1 1 1 1 1 65 1 1
## [528] 1 1 1 1 1 1 166 1 1 1 1 1 1 1 1 1 1
## [545] 1 156 155 1 1 1 1 1 51 1 1 1 1 1 81 81 51
## [562] 1 1 50 1 187 1 1 1 1 1 1 1 1 1 1 1 1
## [579] 1 79 1 1 1 1 169 1 1 1 1 1 1 1 1 1 1
## [596] 1 1 171 222 1 1 1 1 1 1 1 1 1 1 13 1 1
## [613] 1 1 12 9 40 1 1 12 11 7 1 10 1 9 1 1 1
## [630] 8 1 94 94 1 149 1 103 1 1 1 1 1 107 53 149 1
## [647] 1 1 53 1 228 1 148 53 1 53 1 1 1 1 1 43 1
## [664] 1 179 121 1 123 230 1 1 1 54 1 1 1 1 1 1 72
## [681] 1 1 73 1 1 1 1 1 194 97 162 1 1 1 146 89 1
## [698] 1 1 219 154 60 59 59 1 1 1 1 1 1 1 197 173 1
## [715] 1 1 164 1 1 211 1 1 1 1 1 1 1 1 231 1 1
## [732] 1 1 1 28 75 1 1 78 145 78 1 1 1 1 119 1 1
## [749] 200 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 18
## [766] 174 64 1 1 1 1 143 83 231 1 186 1 1 157 1 117 1
## [783] 1 1 1 1 1 98 1 126 1 1 24 111 1 1 1 1 1
## [800] 1 1 1 55 1 1 1 1 1 1 109 158 1 1 38 38 1
## [817] 1 1 111 1 1 1 1 147 1 1 1 1 153 1 1 1 1
## [834] 1 1 99 1 37 1 1 1 1 1 1 1 1 1 1 1 168
## [851] 168 182 1 1 1 1 191 1 1 1 1 1 1 1 1 1 1
## [868] 217 1 1 1 1 19 1 96 96 1 1 1 1 1 1 1 1
## [885] 1 170 1 1 1 1 1 1 1 1 1 1 114 138 1 1 69
## [902] 56 231 1 1 1 1 1 1 57 1 1 1 176 1 159 1 226
## [919] 1 1 1 215 215 36 1 223 1 1 1 1 1 1 1 1 1
## [936] 1 1 1 1 1 1 1 1 1 1 1 1 195 195 195 1 195
## [953] 1 152 1 1 1 66 1 192 1 165 165 1 199 1 1 1 1
## [970] 1 1 1 210 195 1 1 102 102 102 198 205 1 1 1 1 1
## [987] 1 1 1 175 163 1 82 1 1 1 1 199 1 1 52 23 1
## [1004] 1 1 22 1 185 61 1 214 1 1 1 1 1 1 110 1 189
## [1021] 1 1 1 1 84 1 1 1 150 1 120 1 63 1 2 1 1
## [1038] 39 1 1 1 1 1 144 1 1 1 1 1 1 21 76 1 1
## [1055] 1
as.numeric(B$Type_of_Breach)
## [1] 12 12 12 5 12 12 12 12 12 12 12 12 11 12 1 12 12 11 12 12 12 12 12
## [24] 12 12 5 12 12 12 12 12 12 12 12 12 11 12 12 12 12 11 11 12 12 12 24
## [47] 12 12 12 12 11 12 12 1 12 24 12 15 11 5 12 11 12 12 11 12 12 12 3
## [70] 12 12 5 12 6 12 12 12 5 12 19 12 12 12 12 12 11 12 5 3 12 12 12
## [93] 12 11 12 12 1 5 12 11 12 12 5 12 11 12 11 11 7 12 12 11 12 12 12
## [116] 12 11 5 12 12 12 3 1 3 12 11 1 12 5 12 12 11 11 12 12 15 11 12
## [139] 3 12 1 5 12 12 5 12 12 12 12 12 3 12 12 5 5 11 12 23 12 12 5
## [162] 12 12 12 11 12 3 12 12 3 12 12 12 3 1 12 3 12 12 12 11 12 11 1
## [185] 12 12 5 25 23 12 23 12 12 12 23 12 12 12 12 23 16 12 12 3 23 12 21
## [208] 23 1 12 12 3 12 23 12 12 23 15 12 12 5 12 12 1 12 12 1 15 12 1
## [231] 12 5 12 12 12 12 12 12 1 15 12 1 12 23 12 23 5 28 5 12 23 28 23
## [254] 12 5 23 28 12 12 23 12 12 12 12 12 5 12 23 12 3 28 12 5 23 1 1
## [277] 1 1 1 1 1 3 12 12 12 12 23 12 12 12 20 12 12 3 12 12 23 12 25
## [300] 12 15 28 12 28 6 12 5 12 23 12 5 23 12 12 28 23 12 12 23 5 12 23
## [323] 12 23 1 12 12 1 23 3 23 23 12 12 12 12 23 12 1 12 12 12 12 23 5
## [346] 1 12 12 12 12 12 23 12 12 20 12 23 12 1 12 12 1 12 12 12 12 25 12
## [369] 5 5 12 20 12 20 12 12 12 5 12 12 12 12 5 20 12 12 12 12 12 12 23
## [392] 12 12 12 5 5 15 23 12 5 12 12 28 12 12 12 12 3 12 5 12 15 27 12
## [415] 12 23 5 23 5 12 11 23 3 12 12 12 11 12 12 12 12 12 27 12 23 11 12
## [438] 12 11 5 27 12 20 12 2 12 11 12 12 11 11 12 12 1 23 11 5 20 12 12
## [461] 23 12 12 12 23 11 12 12 12 12 12 12 12 12 12 12 12 12 11 11 23 12 23
## [484] 23 12 23 5 5 23 23 12 1 3 12 11 12 3 29 12 12 23 12 23 12 12 25
## [507] 12 1 12 3 25 12 3 23 12 5 12 15 23 12 23 15 12 12 23 12 12 12 10
## [530] 12 12 5 5 12 12 12 12 12 12 12 12 23 12 23 12 9 18 12 26 5 12 12
## [553] 1 11 12 12 12 12 12 12 1 12 12 1 12 12 12 12 12 5 27 12 12 12 25
## [576] 12 27 1 12 5 12 12 12 23 3 12 5 21 12 12 12 12 12 1 25 12 3 12
## [599] 12 12 20 5 5 5 11 1 12 12 12 11 12 12 12 12 12 12 23 3 23 12 12
## [622] 20 12 12 5 12 1 12 23 12 12 11 11 11 12 12 23 12 23 11 12 13 1 1
## [645] 12 1 12 12 1 3 23 23 12 1 12 1 12 12 12 5 23 12 12 12 12 3 12
## [668] 12 12 12 5 12 19 23 11 12 12 12 12 23 12 23 23 12 12 12 12 12 12 12
## [691] 23 23 12 11 12 5 12 23 12 12 1 20 20 20 12 12 1 5 17 23 29 12 23
## [714] 20 22 12 12 20 12 23 5 12 12 23 12 11 23 12 5 14 12 12 12 5 5 3
## [737] 3 23 1 11 1 12 11 5 1 3 1 20 12 12 11 20 12 1 23 12 3 23 3
## [760] 20 23 10 5 12 12 11 12 11 12 11 11 23 11 5 5 3 23 12 23 23 21 23
## [783] 5 23 23 20 12 12 23 23 12 23 11 23 12 11 15 4 11 12 1 15 11 11 12
## [806] 12 12 23 12 1 12 5 3 1 1 12 28 23 23 20 1 12 12 12 12 12 12 23
## [829] 12 12 20 11 12 11 3 23 23 1 1 12 3 23 12 23 23 12 5 12 12 23 23
## [852] 23 12 12 12 12 3 23 23 12 11 5 20 12 19 23 23 23 23 11 19 23 12 5
## [875] 12 12 12 12 5 5 12 12 23 1 23 23 15 11 12 11 23 23 5 12 1 11 5
## [898] 23 23 23 12 12 5 11 12 23 11 23 12 23 12 5 23 11 23 12 1 27 11 28
## [921] 12 12 20 11 23 5 5 11 1 12 12 12 12 12 12 12 23 12 5 12 27 19 5
## [944] 23 11 1 12 23 23 23 12 23 12 27 11 15 12 11 3 1 11 1 1 11 12 12
## [967] 11 12 20 12 11 12 6 23 11 8 8 8 8 12 12 12 5 7 23 11 11 12 23
## [990] 23 23 1 12 1 12 12 12 12 11 5 5 12 23 23 5 12 12 25 20 5 11 12
## [1013] 12 12 1 20 12 23 23 1 1 1 1 1 1 23 12 5 11 23 11 20 11 12 11
## [1036] 23 23 25 12 12 23 23 23 12 23 8 12 11 12 11 12 1 15 23 12
as.numeric(B$Location_of_Breached_Information)
## [1] 41 31 37 17 1 1 1 1 1 17 17 17 41 35 6 36 18 12 38 35 41 41 37
## [24] 35 35 37 17 1 17 35 17 40 35 41 17 41 41 41 1 41 1 1 31 35 17 31
## [47] 17 41 17 37 41 17 17 31 17 12 36 41 35 41 17 41 17 35 35 17 17 41 41
## [70] 1 37 35 40 41 18 37 41 41 17 14 31 1 17 17 17 41 17 35 41 17 17 17
## [93] 17 41 35 17 31 17 17 12 19 31 35 31 12 17 41 11 37 37 36 41 17 1 36
## [116] 8 41 38 1 17 1 41 31 41 1 41 31 41 35 17 17 17 35 17 31 41 37 17
## [139] 41 17 1 35 1 1 37 17 17 17 23 37 41 37 36 37 36 31 17 15 41 41 36
## [162] 18 41 30 41 1 41 37 17 41 37 37 31 41 1 1 41 17 26 41 35 17 11 31
## [185] 37 35 41 31 31 1 37 17 1 17 41 17 17 41 31 41 41 1 1 41 41 1 18
## [208] 31 4 1 17 41 37 35 17 17 12 37 41 37 37 31 18 31 17 17 32 37 1 1
## [231] 18 37 35 37 17 37 1 41 31 17 41 1 39 35 37 31 37 1 37 41 41 35 1
## [254] 17 10 41 15 1 7 41 17 1 17 37 37 35 31 35 30 41 35 41 41 35 1 31
## [277] 19 19 19 19 19 35 17 17 17 17 35 41 1 1 41 40 17 35 41 41 41 41 31
## [300] 1 37 41 37 12 41 37 41 17 41 1 37 35 17 17 41 41 17 20 41 35 4 39
## [323] 17 41 31 1 41 31 31 41 41 41 41 17 17 1 41 36 31 37 37 37 37 41 37
## [346] 31 37 37 37 35 17 17 37 29 21 4 35 41 31 37 17 1 41 41 17 41 31 18
## [369] 37 41 17 30 41 41 41 35 37 37 37 1 17 37 35 41 25 17 17 35 41 41 15
## [392] 35 35 1 35 35 1 31 41 37 17 41 41 1 1 31 37 31 40 35 37 17 35 1
## [415] 17 16 36 41 35 35 41 31 41 41 17 35 41 28 17 17 17 4 32 17 35 41 41
## [438] 17 41 35 41 17 41 1 12 36 41 17 17 41 35 41 17 35 41 35 36 17 17 17
## [461] 35 17 31 35 1 41 17 17 28 17 17 41 17 17 41 28 17 36 31 41 35 41 35
## [484] 12 41 12 41 36 12 41 35 31 41 4 35 3 41 35 1 1 35 41 12 41 31 31
## [507] 41 31 31 41 31 41 41 12 17 36 17 17 35 35 35 17 17 17 35 41 16 1 17
## [530] 35 9 41 36 17 36 1 13 41 17 17 28 1 17 15 41 36 36 41 31 36 17 1
## [553] 35 31 35 31 28 41 17 17 35 35 1 35 16 17 1 15 17 36 41 17 36 17 31
## [576] 17 41 31 36 36 28 17 1 12 41 17 36 31 17 17 35 1 41 31 33 41 41 31
## [599] 41 15 15 35 41 36 35 1 36 17 1 35 41 1 1 17 1 1 31 41 41 1 1
## [622] 1 17 1 36 1 12 41 34 1 1 41 41 41 17 26 35 41 35 17 17 31 17 31
## [645] 17 1 41 1 31 41 41 41 17 31 17 31 17 17 17 35 15 41 17 41 1 41 1
## [668] 17 1 17 17 17 41 35 41 17 36 17 17 41 17 31 41 1 17 1 41 17 41 36
## [691] 35 15 41 12 17 36 1 31 17 17 31 31 31 31 11 3 31 17 41 41 41 17 41
## [714] 15 31 15 31 30 41 41 17 41 17 35 1 12 35 1 41 41 1 36 35 36 36 41
## [737] 41 12 31 1 31 17 12 17 31 35 1 35 17 17 35 15 12 1 1 17 41 41 12
## [760] 5 41 35 36 17 17 41 41 41 17 41 12 12 41 41 41 35 15 1 35 1 4 1
## [783] 36 1 35 31 17 17 36 31 17 15 35 41 17 41 36 8 41 1 1 17 31 12 28
## [806] 17 4 15 17 35 31 35 35 31 31 31 35 15 41 4 12 17 17 17 31 17 17 15
## [829] 12 30 41 41 17 12 41 41 12 31 31 17 41 12 17 12 1 16 36 1 17 31 31
## [852] 12 17 17 17 1 41 17 31 17 41 36 15 17 17 41 35 41 41 35 41 12 35 36
## [875] 17 17 17 36 36 36 1 17 15 1 31 12 17 1 17 41 35 1 17 36 31 12 36
## [898] 31 31 2 17 17 41 41 1 12 41 1 17 12 17 1 27 41 35 17 31 41 12 35
## [921] 17 31 31 35 41 36 36 17 31 36 17 17 31 17 41 31 36 11 36 39 41 17 41
## [944] 31 35 12 17 31 31 31 1 31 17 35 15 17 1 35 41 31 35 31 31 24 2 12
## [967] 35 1 41 35 18 17 36 31 15 22 22 22 22 1 17 41 36 36 12 12 12 36 12
## [990] 35 41 12 22 31 35 35 36 2 41 41 41 41 35 41 36 36 35 31 41 41 35 36
## [1013] 17 17 12 17 35 41 12 31 12 12 12 12 31 41 41 41 41 16 12 41 41 17 31
## [1036] 41 35 12 1 36 1 33 41 17 12 41 1 14 11 12 35 35 36 12 1
The following are the ANOVA results for the two-way interaction effects. So that the code would run in a reasonable time, the models were fitted on a subset that drops the 600 largest values of Individuals_Affected, keeping the 455 breaches that affected fewer than 1,846 individuals. This assumes that ANOVA on the subset gives results similar to ANOVA on the full data set; because the subset excludes every large breach, that assumption is untested.
Bsub <- subset(B, Individuals_Affected < 1846)
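The slowness comes from aov(y ~ a * b) building one model-matrix column for every possible combination of levels (52 x 232 = 12,064 columns for State and Business_Associate_Involved), even though most combinations never occur. A possible way to test the same interaction on all 1055 rows is to compare the additive model with a cell-means model that has one parameter per observed combination: the full interaction model and the cell-means model span the same fitted values, so the F test from this comparison matches the interaction row of the sequential ANOVA. The helper name additive_vs_cells() is hypothetical, and the small data set below is simulated only to check that equivalence.

```r
# Hypothetical helper (not from any package): tests the a:b interaction by
# comparing the additive model with a cell-means model. The additive model has
# 1 + (levels(a) - 1) + (levels(b) - 1) columns and the cell-means model one per
# observed cell, whereas aov(y ~ a * b) builds a column for every possible cell.
additive_vs_cells <- function(data, y, a, b) {
  d <- data.frame(y = data[[y]], a = factor(data[[a]]), b = factor(data[[b]]))
  d <- droplevels(d[complete.cases(d), ])
  # interaction(..., drop = TRUE) keeps only the (a, b) combinations that occur
  d$cell <- interaction(d$a, d$b, drop = TRUE)
  additive <- lm(y ~ a + b, data = d)
  cells    <- lm(y ~ cell, data = d)
  anova(additive, cells)  # second row: interaction Df, Sum of Sq, F, Pr(>F)
}

# Check on a small simulated data set (made up for illustration only)
set.seed(1)
toy <- data.frame(A = sample(c("a", "b", "c", "d"), 60, replace = TRUE),
                  B = sample(c("p", "q", "r", "s", "t"), 60, replace = TRUE),
                  y = rexp(60))
toy <- toy[!(toy$A == "d" & toy$B == "t"), ]  # leave one cell empty

res  <- additive_vs_cells(toy, "y", "A", "B")
full <- anova(aov(y ~ A * B, data = toy))
print(res)
print(full)
```

For the breaches data, additive_vs_cells(B, "Individuals_Affected", "State", "Business_Associate_Involved") would test that interaction on the full data set; no results from such a run are reported here.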
inter1 <- aov(Individuals_Affected ~ State * Business_Associate_Involved, data = Bsub)
anova(inter1)
## Analysis of Variance Table
##
## Response: Individuals_Affected
## Df Sum Sq Mean Sq F value Pr(>F)
## State 47 6028611 128268 0.9475 0.57420
## Business_Associate_Involved 89 13214151 148474 1.0968 0.28121
## State:Business_Associate_Involved 4 1242585 310646 2.2947 0.05923 .
## Residuals 314 42508025 135376
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
inter2 <- aov(Individuals_Affected ~ State * Type_of_Breach, data = Bsub)
anova(inter2)
## Analysis of Variance Table
##
## Response: Individuals_Affected
## Df Sum Sq Mean Sq F value Pr(>F)
## State 47 6028611 128268 0.9493 0.5709
## Type_of_Breach 16 1354983 84686 0.6267 0.8616
## State:Type_of_Breach 104 16829150 161819 1.1976 0.1245
## Residuals 287 38780628 135124
inter3 <- aov(Individuals_Affected ~ State * Location_of_Breached_Information, data = Bsub)
anova(inter3)
## Analysis of Variance Table
##
## Response: Individuals_Affected
## Df Sum Sq Mean Sq F value Pr(>F)
## State 47 6028611 128268 0.9068 0.6471
## Location_of_Breached_Information 24 2881454 120061 0.8487 0.6719
## State:Location_of_Breached_Information 142 19992140 140790 0.9953 0.5074
## Residuals 241 34091168 141457
inter4 <- aov(Individuals_Affected ~ Business_Associate_Involved * Type_of_Breach, data = Bsub)
anova(inter4)
## Analysis of Variance Table
##
## Response: Individuals_Affected
## Df Sum Sq Mean Sq F value
## Business_Associate_Involved 89 13188717 148188 1.0898
## Type_of_Breach 12 1422178 118515 0.8716
## Business_Associate_Involved:Type_of_Breach 1 519707 519707 3.8221
## Residuals 352 47862769 135974
## Pr(>F)
## Business_Associate_Involved 0.29127
## Type_of_Breach 0.57634
## Business_Associate_Involved:Type_of_Breach 0.05137 .
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
inter5 <- aov(Individuals_Affected ~ Business_Associate_Involved * Location_of_Breached_Information, data = Bsub)
anova(inter5)
## Analysis of Variance Table
##
## Response: Individuals_Affected
## Df Sum Sq Mean Sq F value Pr(>F)
## Business_Associate_Involved 89 13188717 148188 1.0956 0.2811
## Location_of_Breached_Information 23 3547693 154248 1.1404 0.2990
## Residuals 342 46256962 135254
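No interaction row appears in this table. From the degrees of freedom, the additive model has rank 1 + 89 + 23 = 113, and the full model has rank 455 - 342 = 113 (the subset has 455 rows). The rank of the full interaction model equals the number of occupied cells, so every observed combination of business associate and location is already accounted for by the main effects, the interaction has zero degrees of freedom, and aov() drops it. A sketch that counts the occupied cells directly:

```r
B <- Ecdat::breaches
Bsub <- subset(B, Individuals_Affected < 1846)

# Cross-tabulate the two factors; factor() drops levels absent from the subset
tab <- table(factor(Bsub$Business_Associate_Involved),
             factor(Bsub$Location_of_Breached_Information))
print(dim(tab))      # levels present for each factor
print(sum(tab > 0))  # combinations that contain at least one breach
# Upper bound on the additive model's rank (reached when the design is connected)
print(1 + (nrow(tab) - 1) + (ncol(tab) - 1))
```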
inter6 <- aov(Individuals_Affected ~ Type_of_Breach * Location_of_Breached_Information, data = Bsub)
anova(inter6)
## Analysis of Variance Table
##
## Response: Individuals_Affected
## Df Sum Sq Mean Sq
## Type_of_Breach 16 1608222 100514
## Location_of_Breached_Information 23 3303253 143620
## Type_of_Breach:Location_of_Breached_Information 35 4588848 131110
## Residuals 380 53493049 140771
## F value Pr(>F)
## Type_of_Breach 0.7140 0.7801
## Location_of_Breached_Information 1.0202 0.4381
## Type_of_Breach:Location_of_Breached_Information 0.9314 0.5839
## Residuals
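At the 5% level none of the interaction effects is significant, although State:Business_Associate_Involved (p = 0.05923) and Business_Associate_Involved:Type_of_Breach (p = 0.05137) are borderline. For Business_Associate_Involved and Location_of_Breached_Information, no interaction term could be estimated at all.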
qqnorm(residuals(inter1))
qqline(residuals(inter1))
qqnorm(residuals(inter2))
qqline(residuals(inter2))
qqnorm(residuals(inter3))
qqline(residuals(inter3))
qqnorm(residuals(inter4))
qqline(residuals(inter4))
qqnorm(residuals(inter5))
qqline(residuals(inter5))
qqnorm(residuals(inter6))
qqline(residuals(inter6))
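The QQ plots can be complemented with a formal normality test. A sketch using the Shapiro-Wilk test (shapiro.test() in R's stats package) on the residuals of the Type_of_Breach * Location_of_Breached_Information model, refitted here so the block runs on its own; the same call works for the other five models.

```r
B <- Ecdat::breaches
Bsub <- subset(B, Individuals_Affected < 1846)
fit <- aov(Individuals_Affected ~ Type_of_Breach * Location_of_Breached_Information,
           data = Bsub)

# shapiro.test() accepts between 3 and 5000 values; the subset's residuals qualify
sw <- shapiro.test(residuals(fit))
print(sw)
```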
Given this nonnormality of the residuals, a nonparametric test could be used instead. It would also be worth finding a way to run the complete interaction ANOVA on the full data set rather than on a subset.
Raw data: the breaches data set is documented in the Ecdat reference manual, https://cran.r-project.org/web/packages/Ecdat/Ecdat.pdf, and summary(Ecdat::breaches) gives a summary of the raw data.
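One nonparametric option is the Kruskal-Wallis rank sum test (kruskal.test() in R's stats package), which compares Individuals_Affected across the levels of a factor using ranks, so the 4,900,000-person breach simply receives the highest rank instead of dominating the sums of squares. It tests one factor at a time and does not cover interactions. The sketch below runs it on the full data set for each of the four factors; it is one possible choice, not the method used in this analysis.

```r
B <- Ecdat::breaches
factors <- c("State", "Business_Associate_Involved", "Type_of_Breach",
             "Location_of_Breached_Information")

# Kruskal-Wallis rank sum test of each factor on the full data set (no subsetting)
kw_p <- sapply(factors, function(f)
  kruskal.test(B$Individuals_Affected, factor(B[[f]]))$p.value)
print(signif(kw_p, 3))
```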