1. Setting

System under test

The breaches data set is part of the Ecdat package for the R programming language. It contains information about cyber security breaches involving health care records that were reported to the U.S. Department of Health and Human Services as of June 27, 2014. The breaches data set consists of 1055 observations of 13 variables.

#install.packages("Ecdat")
library("Ecdat")
## Loading required package: Ecfun
## 
## Attaching package: 'Ecfun'
## The following object is masked from 'package:base':
## 
##     sign
## 
## Attaching package: 'Ecdat'
## The following object is masked from 'package:datasets':
## 
##     Orange
# Place the breaches data set from the Ecdat package into B
B <- Ecdat::breaches

The following is the structure of the breaches data set.

## 'data.frame':    1055 obs. of  13 variables:
##  $ Number                          : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ Name_of_Covered_Entity          : Factor w/ 967 levels " DeKalb Medical Center, Inc. d/b/a DeKalb Medical Hillandale",..: 108 521 31 334 435 217 519 413 488 170 ...
##  $ State                           : Factor w/ 52 levels "AK","AL","AR",..: 45 25 1 8 5 5 5 5 5 5 ...
##  $ Business_Associate_Involved     : Factor w/ 232 levels ""," Xand Corporation",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Individuals_Affected            : int  1000 1000 501 3800 5257 857 6145 952 5166 5900 ...
##  $ Date_of_Breach                  : chr  "10/16/2009" "9/22/2009" "10/12/2009" "10/9/2009" ...
##  $ Type_of_Breach                  : Factor w/ 29 levels "Hacking/IT Incident",..: 12 12 12 5 12 12 12 12 12 12 ...
##  $ Location_of_Breached_Information: Factor w/ 41 levels "Desktop Computer",..: 41 31 37 17 1 1 1 1 1 17 ...
##  $ Date_Posted_or_Updated          : Date, format: "2014-06-30" "2014-05-30" ...
##  $ Summary                         : chr  "A binder containing the protected health information (PHI) of up to 1,272 individuals was stolen from a staff member's vehicle."| __truncated__ "Five desktop computers containing unencrypted electronic protected health information (e-PHI) were stolen from the covered enti"| __truncated__ "" "A laptop was lost by an employee while in transit on public transportation.  The computer contained the protected health inform"| __truncated__ ...
##  $ breach_start                    : Date, format: "2009-10-16" "2009-09-22" ...
##  $ breach_end                      : Date, format: NA NA ...
##  $ year                            : num  2009 2009 2009 2009 2009 ...

Factors and Levels

The 5 factors in this data set are the entities experiencing the security breach, the state where the breach occurred (this also includes the District of Columbia (DC) and Puerto Rico (PR)), the name of the subcontractor associated with the breach, the type of breach and the location (or source) of the breach. The following is information about the number of levels for each factor:
Name_of_Covered_Entity: 967 levels
State: 52 levels
Business_Associate_Involved: 232 levels
Type_of_Breach: 29 levels
Location_of_Breached_Information: 41 levels
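
The level counts listed above can be read off any data frame with base R. The sketch below is only an illustration: it uses a small data frame `toy` built from the first six rows printed by head(B) in this report, so that it runs without the Ecdat package, and the helper name `count_levels` is an assumption of this example.

```r
# Toy data frame built from the first six rows printed by head(B);
# `toy` and `count_levels` are illustrative names, not part of Ecdat.
toy <- data.frame(
  State = factor(c("TX", "MO", "AK", "DC", "CA", "CA")),
  Type_of_Breach = factor(c("Theft", "Theft", "Theft", "Loss", "Theft", "Theft")),
  Individuals_Affected = c(1000L, 1000L, 501L, 3800L, 5257L, 857L)
)

# Return the number of levels for every factor column of a data frame.
count_levels <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  vapply(df[is_fac], nlevels, integer(1))
}

level_counts <- count_levels(toy)
print(level_counts)
```

Applied to the full data set, count_levels(B) should return the five counts listed above.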

Continuous variables

The only continuous variable in this data set is Individuals_Affected, which is the number of people whose records were compromised in the breach. This number is 500 or greater.

Response variables

We take the response variable in this model to be Individuals_Affected. This is because it is the only numeric measurement in this data set; the other numeric columns are the id number (Number) and the year of the breach (year).

The Data: How is it organized and what does it look like?

As previously mentioned, the data set consists of 1055 observations and 13 variables.
The following is a general view of the data set:

head(B)
##   Number                                Name_of_Covered_Entity State
## 1      0                            Brooke Army Medical Center    TX
## 2      1             Mid America Kidney Stone Association, LLC    MO
## 3      2       Alaska Department of Health and Social Services    AK
## 4      3 Health Services for Children with Special Needs, Inc.    DC
## 5      4                              L. Douglas Carlson, M.D.    CA
## 6      5                                    David I. Cohen, MD    CA
##   Business_Associate_Involved Individuals_Affected Date_of_Breach
## 1                                             1000     10/16/2009
## 2                                             1000      9/22/2009
## 3                                              501     10/12/2009
## 4                                             3800      10/9/2009
## 5                                             5257      9/27/2009
## 6                                              857      9/27/2009
##   Type_of_Breach        Location_of_Breached_Information
## 1          Theft                                   Paper
## 2          Theft                          Network Server
## 3          Theft Other Portable Electronic Device, Other
## 4           Loss                                  Laptop
## 5          Theft                        Desktop Computer
## 6          Theft                        Desktop Computer
##   Date_Posted_or_Updated
## 1             2014-06-30
## 2             2014-05-30
## 3             2014-01-23
## 4             2014-01-23
## 5             2014-01-23
## 6             2014-01-23
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Summary
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         A binder containing the protected health information (PHI) of up to 1,272 individuals was stolen from a staff member's vehicle.  The PHI included names, telephone numbers, detailed treatment notes, and possibly social security numbers.  In response to the breach, the covered entity (CE) sanctioned the workforce member and developed a new policy requiring on-call staff members to submit any information created during their shifts to the main office instead of adding it to the binder.  Following OCR's investigation, the CE notified the local media about the breach.
## 2                                                                                                                                                                                                                                                                                                                                                                                                                                  Five desktop computers containing unencrypted electronic protected health information (e-PHI) were stolen from the covered entity (CE).  Originally, the CE reported that over 500 persons were involved, but subsequent investigation showed that about 260 persons were involved.  The ePHI included demographic and financial information. The CE provided breach notification to affected individuals and HHS.  Following the breach, the CE improved physical security by installing motion detectors and alarm systems security monitoring.  It improved technical safeguards by installing enhanced antivirus and encryption software.  As a result of OCR's investigation the CE updated its computer password policy.  
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                      A laptop was lost by an employee while in transit on public transportation.  The computer contained the protected health information of 3800 individuals.  The protected health information involved in the breach included names, Medicaid ID numbers, dates of birth, and primary physicians.  In response to this incident, the covered entity took steps to enforce the requirements of the Privacy & Security Rules.  The covered entity has installed encryption software on all employee computers, strengthened access controls including passwords, reviewed and updated security policies and procedures, and updated it risk assessment.  In addition, all employees received additional security training.  \n\n
## 5 A shared Computer that was used for backup was stolen on 9/27/09 from the reception desk area of the covered entity. The Computer contained certain electronic protected health information (ePHI) of 5,257 individuals who were patients of the CE.  The ePHI involved in the breach included names, dates of birth, and clinical information, but there were no social security numbers, financial information, addresses, phone numbers, or other ePHI in any of the reports on the disks or the hard drive on the stolen Computer. Following the breach, the covered entity notified all 5,257 affected individuals and the appropriate media; added technical safeguards of encryption for all ePHI stored on the USB flash drive or the CD used on the replacement computer; added physical safeguards by keeping new portable devices locked when not in use in a secure combination safe in doctor's private office or in a secure filing cabinet; and added administrative safeguards by requiring annual refresher retraining of CE staff for Privacy and Security Rules as well as requiring immediate retraining of cleaning staff in both Rules.\n\n
## 6                                                                                  A shared Computer that was used for backup was stolen from the reception desk area, behind a locked desk area, probably while a cleaning crew had left the main door to the building open and the door to the suite was unlocked and perhaps ajar.  The Computer contained certain electronic protected health information (ePHI) of 857 patients.  The ePHI involved in the breach included names, dates of birth, and clinical information.  Following the breach, the covered entity notified all affected individuals and the media, added technical safeguards of encryption for all ePHI stored on the USB flash drive or the CD used on the replacement computer, added physical safeguards by keeping new portable devices locked when not in use in a secure combination safe in doctor's private office or in a secure filing cabinet, and added administrative safeguards by requiring annual refresher retraining staff for Privacy and Security Rules as well as requiring immediate retraining of cleaning staff in both Rules, which has already taken place.\n\n
##   breach_start breach_end year
## 1   2009-10-16       <NA> 2009
## 2   2009-09-22       <NA> 2009
## 3   2009-10-12       <NA> 2009
## 4   2009-10-09       <NA> 2009
## 5   2009-09-27       <NA> 2009
## 6   2009-09-27       <NA> 2009
tail(B)
##      Number                     Name_of_Covered_Entity State
## 1050   1049                       St. Francis Hospital    GA
## 1051   1050              Puerto Rico Health Insurance     PR
## 1052   1051               Hospitalists of Brandon, LLC    FL
## 1053   1052              Santa Rosa Memorial Hospital     CA
## 1054   1053 Group Health Plan of Hurley Medical Center    MI
## 1055   1054                    Abrham Tekola, M.D.,INC    CA
##              Business_Associate_Involved Individuals_Affected
## 1050                                                     1175
## 1051                 American Health Inc                28413
## 1052 Doctors First Choice Billings, Inc.                 1831
## 1053                                                    33702
## 1054                                                     2289
## 1055                                                     5471
##      Date_of_Breach                 Type_of_Breach
## 1050      5/30/2014                          Other
## 1051      9/20/2013                          Theft
## 1052      2/11/2014            Hacking/IT Incident
## 1053       6/2/2014                    Theft, Loss
## 1054      5/13/2014 Unauthorized Access/Disclosure
## 1055      5/27/2014                          Theft
##      Location_of_Breached_Information Date_Posted_or_Updated Summary
## 1050                           E-mail             2014-06-18        
## 1051                            Other             2014-06-27        
## 1052                            Other             2014-06-27        
## 1053 Other Portable Electronic Device             2014-06-27        
## 1054                           E-mail             2014-06-27        
## 1055                 Desktop Computer             2014-06-27        
##      breach_start breach_end year
## 1050   2014-05-30       <NA> 2014
## 1051   2013-09-20       <NA> 2013
## 1052   2014-02-11       <NA> 2014
## 1053   2014-06-02       <NA> 2014
## 1054   2014-05-13       <NA> 2014
## 1055   2014-05-27       <NA> 2014
summary(B)
##      Number      
##  Min.   :   0.0  
##  1st Qu.: 263.5  
##  Median : 527.0  
##  Mean   : 527.0  
##  3rd Qu.: 790.5  
##  Max.   :1054.0  
##                  
##                                                      Name_of_Covered_Entity
##  UnitedHealth Group health plan single affiliated covered entity:   7      
##  Cook County Health & Hospitals System                          :   4      
##  University of California, San Francisco                        :   4      
##  Walgreen Co.                                                   :   4      
##  Baptist Health System                                          :   3      
##  County of Los Angeles                                          :   3      
##  (Other)                                                        :1030      
##      State                          Business_Associate_Involved
##  CA     :113                                      :784         
##  TX     : 83   MedAssets                          :  6         
##  FL     : 66   StayWell Health Management, LLC    :  5         
##  NY     : 58   Clearpoint Design, Inc.            :  4         
##  IL     : 49   Futurity First Insurance Group     :  4         
##  IN     : 40   HealthPartners Administrators, Inc.:  3         
##  (Other):646   (Other)                            :249         
##  Individuals_Affected Date_of_Breach    
##  Min.   :    500      Length:1055       
##  1st Qu.:   1000      Class :character  
##  Median :   2300      Mode  :character  
##  Mean   :  30262                        
##  3rd Qu.:   6941                        
##  Max.   :4900000                        
##                                         
##                         Type_of_Breach
##  Theft                         :516   
##  Unauthorized Access/Disclosure:148   
##  Other                         : 91   
##  Loss                          : 85   
##  Hacking/IT Incident           : 75   
##  Improper Disposal             : 38   
##  (Other)                       :102   
##                  Location_of_Breached_Information Date_Posted_or_Updated
##  Paper                           :227             Min.   :2014-01-23    
##  Laptop                          :217             1st Qu.:2014-01-23    
##  Other                           :116             Median :2014-01-23    
##  Desktop Computer                :113             Mean   :2014-02-23    
##  Network Server                  :107             3rd Qu.:2014-03-24    
##  Other Portable Electronic Device: 60             Max.   :2014-06-30    
##  (Other)                         :215                                   
##    Summary           breach_start          breach_end        
##  Length:1055        Min.   :1997-01-01   Min.   :2007-06-14  
##  Class :character   1st Qu.:2010-11-08   1st Qu.:2012-04-22  
##  Mode  :character   Median :2012-01-11   Median :2012-10-29  
##                     Mean   :2011-12-09   Mean   :2012-10-28  
##                     3rd Qu.:2013-03-07   3rd Qu.:2013-05-29  
##                     Max.   :2014-06-02   Max.   :2013-11-30  
##                                          NA's   :910         
##       year     
##  Min.   :1997  
##  1st Qu.:2010  
##  Median :2012  
##  Mean   :2011  
##  3rd Qu.:2013  
##  Max.   :2014  
## 

2. (Experimental) Design

How will the experiment be organized and conducted to test the hypothesis?

For this experiment we are utilizing 4 factors with multiple levels. We will analyze their individual main effects and their two-way interaction effects on the number of individuals affected by a breach. The null hypothesis is that the number of individuals affected by a breach doesn’t depend on any of the four factors or on any two-way interaction between them.
The factors that we are looking at in this experiment are as follows:
State
Business_Associate_Involved
Type_of_Breach
Location_of_Breached_Information
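We aren’t looking at the factor Name_of_Covered_Entity, since it has 967 levels across only 1055 observations. With nearly one level per observation, most entities appear only once, so there is too little replication within levels to estimate its effect reliably.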
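
One way to see why a factor with almost as many levels as observations is a poor candidate is to count how many observations fall in each level. The sketch below does this for a hypothetical vector `entity`; the helper `obs_per_level` is an illustrative name and the toy labels are invented, not taken from the data set.

```r
# Hypothetical entity labels: most levels occur exactly once,
# mimicking a factor with nearly one level per observation.
entity <- factor(c("A", "B", "C", "C", "D", "E", "F", "F", "F", "G"))

# Summarise replication: number of levels, number of observations,
# and the share of levels observed only once (no within-level replication).
obs_per_level <- function(f) {
  counts <- table(f)
  list(
    n_levels        = nlevels(f),
    n_obs           = length(f),
    singleton_share = mean(counts == 1)
  )
}

rep_info <- obs_per_level(entity)
str(rep_info)
```

Calling obs_per_level(B$Name_of_Covered_Entity) would give the corresponding figures for the real factor.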

What is the rationale for this design?

The rationale behind the data set is for the U.S. Department of Health and Human Services to have information about the cyber security breaches involving health care records.

Randomize: What is the Randomization Scheme?

This data set has no randomization scheme.

Replicate: Are there any replicates and/or repeated measures?

There are no replicates or repeated measures in this data set.

Block: Did you use blocking in the design?

There is no blocking utilized in this design. However, if there were fewer levels for the factor Name_of_Covered_Entity or more observations, we would most likely utilize blocking with regard to that factor, because it would give us specific information about the entity undergoing the breach.

3. (Statistical) Analysis

(Exploratory Data Analysis) Graphics and descriptive summary

The following is a histogram of our response variable Individuals_Affected, used to determine visually whether the response follows a normal distribution.

It’s apparent that the response variable doesn’t follow a normal distribution.
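
A minimal sketch of how such a histogram, and a quick numeric check of its skew, might be produced is shown below. It uses the twelve Individuals_Affected values printed by head(B) and tail(B) as a stand-in for the full column, and the object names `x`, `h` and `skew_ratio` are assumptions of this example.

```r
# Individuals_Affected values from the head(B) and tail(B) output above;
# with the Ecdat package loaded, use x <- B$Individuals_Affected instead.
x <- c(1000, 1000, 501, 3800, 5257, 857,
       1175, 28413, 1831, 33702, 2289, 5471)

# Compute the histogram bins without opening a graphics device;
# call hist(x) or plot(h) to actually draw it.
h <- hist(x, breaks = 10, plot = FALSE)
print(h$counts)

# A mean well above the median is a quick numeric sign of right skew.
skew_ratio <- mean(x) / median(x)
print(skew_ratio)
```

A ratio far above 1 points the same way as the histogram: a long right tail rather than a symmetric, bell-shaped distribution.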
The following are the summary statistics of Individuals_Affected:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     500    1000    2300   30260    6941 4900000

From the large gap between the median (2300) and the mean (30260) of the number of individuals affected, we can infer that there are extreme outliers in this data set. To confirm, let’s look at the boxplots of the variable Individuals_Affected. Note: 30 of the largest outliers were removed to make the boxplots easy to view.
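
The note above says the 30 largest values were dropped before plotting, but the code for that step is not shown. One plausible way to do it, assuming the rows were simply ranked by Individuals_Affected, is sketched below on a toy data frame built from the head(B) and tail(B) output; `drop_largest` is a hypothetical helper, not the author's code.

```r
# Toy data: State and Individuals_Affected from head(B) and tail(B).
toy <- data.frame(
  State = factor(c("TX", "MO", "AK", "DC", "CA", "CA",
                   "GA", "PR", "FL", "CA", "MI", "CA")),
  Individuals_Affected = c(1000, 1000, 501, 3800, 5257, 857,
                           1175, 28413, 1831, 33702, 2289, 5471)
)

# Remove the k rows with the largest response values (assumed approach),
# keeping the remaining rows in their original order.
drop_largest <- function(df, k, col = "Individuals_Affected") {
  if (k <= 0) return(df)
  keep <- order(df[[col]], decreasing = TRUE)[-seq_len(k)]
  df[sort(keep), , drop = FALSE]
}

trimmed <- drop_largest(toy, k = 2)

# Boxplot statistics per State without drawing; set plot = TRUE to view.
bp <- boxplot(Individuals_Affected ~ State, data = droplevels(trimmed),
              plot = FALSE)
print(max(trimmed$Individuals_Affected))
```

For the report's data the corresponding call would be drop_largest(B, k = 30), followed by one boxplot per factor.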

According to the boxplots, it’s possible that the business associate involved and the location of the breached information don’t have an effect on the response variable. To analyze these effects, ANOVA will be utilized.

Testing

In order to determine whether the factors have a significant main effect or two-way interaction effect on the response variable, ANOVA will be conducted. The following are the ANOVA results for the main effects.

main1 <- aov(B$Individuals_Affected~B$State)
anova(main1)
## Analysis of Variance Table
## 
## Response: B$Individuals_Affected
##             Df     Sum Sq    Mean Sq F value Pr(>F)
## B$State     51 1.7544e+12 3.4400e+10  0.6514 0.9725
## Residuals 1003 5.2969e+13 5.2811e+10
main2 <- aov(B$Individuals_Affected~B$Business_Associate_Involved)
anova(main2)
## Analysis of Variance Table
## 
## Response: B$Individuals_Affected
##                                Df     Sum Sq    Mean Sq F value    Pr(>F)
## B$Business_Associate_Involved 231 3.1831e+13 1.3780e+11  4.9538 < 2.2e-16
## Residuals                     823 2.2893e+13 2.7816e+10                  
##                                  
## B$Business_Associate_Involved ***
## Residuals                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
main3 <- aov(B$Individuals_Affected~B$Type_of_Breach)
anova(main3)
## Analysis of Variance Table
## 
## Response: B$Individuals_Affected
##                    Df     Sum Sq    Mean Sq F value Pr(>F)
## B$Type_of_Breach   28 7.2064e+11 2.5737e+10   0.489 0.9887
## Residuals        1026 5.4003e+13 5.2635e+10
main4 <- aov(B$Individuals_Affected~B$Location_of_Breached_Information)
anova(main4)
## Analysis of Variance Table
## 
## Response: B$Individuals_Affected
##                                      Df     Sum Sq    Mean Sq F value
## B$Location_of_Breached_Information   40 2.0127e+12 5.0318e+10   0.968
## Residuals                          1014 5.2711e+13 5.1983e+10        
##                                    Pr(>F)
## B$Location_of_Breached_Information 0.5285
## Residuals

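Based on the results of the ANOVA tests for main effects, we reject the null hypothesis only in the second test (Business_Associate_Involved, p < 2.2e-16). For State (p = 0.9725), Type_of_Breach (p = 0.9887) and Location_of_Breached_Information (p = 0.5285) we fail to reject it. The variation in the number of individuals affected could therefore be attributed to the business associate involved.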
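
The four calls above repeat the same pattern, so the p-values can also be collected in one pass. The sketch below shows one way to do that with the formula and data arguments of aov(). It runs on a small hypothetical data frame `toy` whose values are invented for illustration, and `main_effect_p` is an assumed helper name.

```r
# Hypothetical data: a numeric response and two candidate factors.
toy <- data.frame(
  y  = c(10, 12, 11, 30, 31, 29, 20, 21, 19),
  f1 = factor(rep(c("a", "b", "c"), each = 3)),
  f2 = factor(rep(c("u", "v", "w"), times = 3))
)

# Fit a one-way ANOVA of `response` on a single factor and return
# the p-value of the F test for that factor's main effect.
main_effect_p <- function(factor_name, data, response) {
  fit <- aov(reformulate(factor_name, response = response), data = data)
  summary(fit)[[1]][["Pr(>F)"]][1]
}

# One p-value per factor, named after the factor.
p_values <- vapply(c("f1", "f2"), main_effect_p, numeric(1),
                   data = toy, response = "y")
print(p_values)
```

For the report's data, a call along the lines of aov(Individuals_Affected ~ State, data = B) fits the same model as main1; only the labels in the printed table differ.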

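To determine the interaction effects, the factor levels are first converted to their integer codes with as.numeric(). Each value printed below is the position of that observation’s level within the factor’s levels; for State, for example, the first observation (TX) has code 45 and the third (AK) has code 1.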

as.numeric(B$State)
##    [1] 45 25  1  8  5  5  5  5  5  5 39 44 35  5 28 23 23  5 20 20  8  8  5
##   [24] 15 23 45 45 25 15 45  5 15 46 34  4 41 40 40 10 44  5 33  5  6 10 52
##   [47] 50 28 35 45 10 49 10  7  2 45  4 39 23 15  5 30 45 35 50  7 20 41 42
##   [70]  5 18 44 24 45 47 15  6 10 45  8  5  5  8 33 42 35 30 28 45 15 36 17
##   [93] 10 24 35 23 18 44 44 36 35 45 11 34 21  4  5 44  5 35 35 10 33  6 47
##  [116] 28 15 20 35  8 35 36 45  7 45 18 16 28 23 45 45 36 39 36 45 15 20 18
##  [139] 25 39 28 14  5 49 44  7 18 15 38 33 20 32 32 35 35  9 47 24 34 41 13
##  [162] 16  5 17 24 35 36  1 16 20  5  5 35 15 37  5 20 35 45 18 21 45 46 49
##  [185]  3 44  5 40  7 26 40  7 30 35 16 25 23 10 32  5 37 39 35 44 19 45 40
##  [208] 39 21 15 50 36  5 50 16 24 39 18 21  2 39 10 36 31 10 47 16 45 49 42
##  [231] 23 42 44 49 45 45 44  6 18 27 49 25 35  5 15 51  4 40 23 35  6 10 20
##  [254] 18 37 25 45 35  2 51  5  5  1  4  4  7 37 25 37 42  5  4 24 50 52 28
##  [277] 15 15  4 15 15 36 31 45 20 16 25 39 40 40  4 10  5 45 35 35 21 40 23
##  [300] 42 18 49 38 28  4 42  5 11 51  5 49 10 44 44 25  4 36 19  5  6  5 10
##  [323] 35  2 23 40 11 20 11 47  5 44 15 18 45 38  5 20 16 32 32 32 32 49  9
##  [346] 21 11 32 32 40 45 15 15 18 39 15  5  5  3 16 45 50 40 45 11 26 45 49
##  [369] 39 15 45 39 35 39 44 40 44 30 30 23 16 20 10  5 24 24 20 24 21 33 10
##  [392] 39 39 20  3 25 45 16 10  8 35 23 28 36 36 10 45 39 21 47  5 51 17  2
##  [415] 29 45 18 44 10 16 39 15 15 30 11 49  5 44 49 45 20 10 28 39 16 23 31
##  [438] 17  5 37 47 35 16 49  6 10 24 28 24  5 23 32 50 16  8  5 19 38 39 33
##  [461] 11 40 21 45  1 18 16 40 40 40 40  5 35 40  5 40 11 45 15 20 28  5 24
##  [484] 35 45 23 20  4  5 45  5 46 16 35 10 49 45 11 38  5  3 10 42 38  5  4
##  [507] 45 23 49  2 45 51 13 25 24  5 24 19 24  5 28 10 30 20 36  5 11  7 45
##  [530] 45 49 44 11 16 10 38 35 10 18 18 27 28 45 45 15  5  5 13 38 28 20 35
##  [553] 16 15 16  4 15 15  7  7 16 38  5 16 45  8  5 10 39 45 15 10 16 11 34
##  [576] 49 10 33 47  3  7 35  5 18 25  5 18 10 44  5 11  5 32 16 39 41 36 35
##  [599] 15 10 19 41  6 45  5 12  7 33 20 16 11  7 10 36 10 18 15 28  5 10 44
##  [622]  5 41 11 47 10 28 10 45 20 35  3  3  3 23 36 35  4  5 23 16 32 18 20
##  [645] 47 45 35  5 20 34 21  4 32 20 19 20 35 25 21 39 45 35 45 50 45 15  2
##  [668] 44 28 10 11 21  5 50 10 21 20 25 11 10 33 11 10 43  4  5  5 11 35 35
##  [691] 15 10 39 22  5 46 36  7 10 10 21 16 16 16 10 38 20 26 46 45 41 49  5
##  [714] 10 10 15 35 28 45 50 42 25 38 16 35 45 46 46 39 28 28  4 36 35 45  5
##  [737] 49 44 28 44 28 16 35 36 39 16 16  3  2 35  5 10 36 20 10  5 28 38 25
##  [760] 10 10 13 30 45 11 16 47  5 10 36 28  5 15 39 24 45  5 52 25 45  5 10
##  [783] 35 19 38  6  5  3 45 44 36 16 16 25 44  5 45 29 10 15  6 45  5 11 25
##  [806]  6 25 38  5 15 40 44 42 15 15 35 10  5 25 10 47 36  5  6 36 44 32 13
##  [829] 11  5 39 49 39 28 35 50 25 42 23 16  6 15 48  1 10 47 35 45 45 35 35
##  [852] 25  5 50  5 45 10 10 39 42 45 39  6 32  6 28 28 40 40 28 29 21  4 35
##  [875] 10 23 11 50  5  5 16 49 39 49 15 15 15 11 11 15 45 10 11 18  5 30 32
##  [898] 47 52 27 42 44 39 45  5 20 28 10 33  6 32 11 49 39 28  5 26 47 15 36
##  [921]  5 40 40  2 36 50 45  9 45  5  5 11 36 44  7 23 17 35 35 15 49 35 10
##  [944] 39 36  5 16 25  5 24 32 44 47 36 36 28 45  3  5 45  4  2  2 13  5  5
##  [967] 39  5 23 45  6  4 45 35 18 24 24 24 24  5 18 27 10 47 39 18 49  5 46
##  [990] 45 10 15 45  5 40 40 23  5 15 17  7 47 25 44  5 40 26 21  5 36 13 14
## [1013] 21 39  6 10 20 21 35 46 45 45 45 45 16 13  2 39 36 20 18 32  5 45 32
## [1036] 40 38 23 31 18 35 36 40  7 17 39  5 39 15 11 40 10  5 23  5
as.numeric(B$Business_Associate_Involved)
##    [1]   1   1   1   1   1   1   1   1   1   1   1   1  70   1 177   1   1
##   [18]   1   1   1 184 135   1 220   1   1   1   1   1   1   1   1  95   1
##   [35]   1  41 140 139   1   1   1   1   1   1   1   1   1   1   1   1   1
##   [52]   1   1   1   1  58   1   1   1   1   1 128   1   1   1   1   1   1
##   [69]   1   1   1 212   1   1   1 137   1   1   1   1   1   1 104  71   1
##   [86]   1  68   1   1   1   1   1   1   1 188   1   1   1  71   1   1   1
##  [103]  33   1  26   1   1   1  46   1   1   1   1  90   1   1 221 113   1
##  [120]   1   1   1   1   1   1 127   1   1 133   1   1   1   1   1   1   1
##  [137] 129   1   1   1   1 134   1   1   1   1   1   1   1   1 160 122 122
##  [154]   1   1  25   1   1   1   1   1   1   1   1  47   1   1  17   1   1
##  [171] 151  80   1   1   1   1 118   1   1   1   1   1 225   1   1   1   1
##  [188] 216   1   1 132   1   1   1   1   1   1   1 167   1   1   1   1   1
##  [205]  92   1   1  88   1   1   1   1   1 105 207   1   1   1 232   1   1
##  [222]   1   1   1   1   1   1   1   1   1   1 213 115   1   1   1   1   1
##  [239] 112   1   1   1  91   1  44 229   1   1   1   1   1   1   1   1   1
##  [256]  62   1   1   1   1   1   1 218   1   1  93   1   1   1   1 108   1
##  [273]   1 100   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
##  [290]   1   1   1   1   1 193   1  45   1  15   1   1   1 125   1   1   1
##  [307]  85   1   1   1   1   1   1   1   1   1  27   1   1  74   1   1   1
##  [324]   1  30   1   1   1  31   1 181 180   1   1   1   1   1   1  32 131
##  [341] 130 131 131   1   1  29   1 131 131   5 206   1 131   1 190   1 141
##  [358]   1   1   1   1 124   1   1   1   1   1   1  34   1   1  20   1  86
##  [375]   1   1  87  87  87   1   1   1   1 208   1   4   1  87   1   1   1
##  [392]   1   1   1   1 106 196   1   1   1   1 209   1 161 161   1   1   1
##  [409]   1 183   1   1 136   1   1   1   1   1   1   1   1  42 142   1   1
##  [426]   1   1   1   1   1   1   1  16   1 178 227   1   1   1   1   6   1
##  [443]   1   1   1   1   1   1   3   1  14   1   1   1   1   1   1  77   1
##  [460]  35  49 203   1 116   1   1   1 201 201 172 204   1   1 172   1 202
##  [477]   1   1   1  48   1   1   1   1   1   1   1   1   1 101  67 224   1
##  [494]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
##  [511]   1   1   1   1   1   1   3   1   1   1   1   1   1   1  65   1   1
##  [528]   1   1   1   1   1   1 166   1   1   1   1   1   1   1   1   1   1
##  [545]   1 156 155   1   1   1   1   1  51   1   1   1   1   1  81  81  51
##  [562]   1   1  50   1 187   1   1   1   1   1   1   1   1   1   1   1   1
##  [579]   1  79   1   1   1   1 169   1   1   1   1   1   1   1   1   1   1
##  [596]   1   1 171 222   1   1   1   1   1   1   1   1   1   1  13   1   1
##  [613]   1   1  12   9  40   1   1  12  11   7   1  10   1   9   1   1   1
##  [630]   8   1  94  94   1 149   1 103   1   1   1   1   1 107  53 149   1
##  [647]   1   1  53   1 228   1 148  53   1  53   1   1   1   1   1  43   1
##  [664]   1 179 121   1 123 230   1   1   1  54   1   1   1   1   1   1  72
##  [681]   1   1  73   1   1   1   1   1 194  97 162   1   1   1 146  89   1
##  [698]   1   1 219 154  60  59  59   1   1   1   1   1   1   1 197 173   1
##  [715]   1   1 164   1   1 211   1   1   1   1   1   1   1   1 231   1   1
##  [732]   1   1   1  28  75   1   1  78 145  78   1   1   1   1 119   1   1
##  [749] 200   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1  18
##  [766] 174  64   1   1   1   1 143  83 231   1 186   1   1 157   1 117   1
##  [783]   1   1   1   1   1  98   1 126   1   1  24 111   1   1   1   1   1
##  [800]   1   1   1  55   1   1   1   1   1   1 109 158   1   1  38  38   1
##  [817]   1   1 111   1   1   1   1 147   1   1   1   1 153   1   1   1   1
##  [834]   1   1  99   1  37   1   1   1   1   1   1   1   1   1   1   1 168
##  [851] 168 182   1   1   1   1 191   1   1   1   1   1   1   1   1   1   1
##  [868] 217   1   1   1   1  19   1  96  96   1   1   1   1   1   1   1   1
##  [885]   1 170   1   1   1   1   1   1   1   1   1   1 114 138   1   1  69
##  [902]  56 231   1   1   1   1   1   1  57   1   1   1 176   1 159   1 226
##  [919]   1   1   1 215 215  36   1 223   1   1   1   1   1   1   1   1   1
##  [936]   1   1   1   1   1   1   1   1   1   1   1   1 195 195 195   1 195
##  [953]   1 152   1   1   1  66   1 192   1 165 165   1 199   1   1   1   1
##  [970]   1   1   1 210 195   1   1 102 102 102 198 205   1   1   1   1   1
##  [987]   1   1   1 175 163   1  82   1   1   1   1 199   1   1  52  23   1
## [1004]   1   1  22   1 185  61   1 214   1   1   1   1   1   1 110   1 189
## [1021]   1   1   1   1  84   1   1   1 150   1 120   1  63   1   2   1   1
## [1038]  39   1   1   1   1   1 144   1   1   1   1   1   1  21  76   1   1
## [1055]   1
as.numeric(B$Type_of_Breach)
##    [1] 12 12 12  5 12 12 12 12 12 12 12 12 11 12  1 12 12 11 12 12 12 12 12
##   [24] 12 12  5 12 12 12 12 12 12 12 12 12 11 12 12 12 12 11 11 12 12 12 24
##   [47] 12 12 12 12 11 12 12  1 12 24 12 15 11  5 12 11 12 12 11 12 12 12  3
##   [70] 12 12  5 12  6 12 12 12  5 12 19 12 12 12 12 12 11 12  5  3 12 12 12
##   [93] 12 11 12 12  1  5 12 11 12 12  5 12 11 12 11 11  7 12 12 11 12 12 12
##  [116] 12 11  5 12 12 12  3  1  3 12 11  1 12  5 12 12 11 11 12 12 15 11 12
##  [139]  3 12  1  5 12 12  5 12 12 12 12 12  3 12 12  5  5 11 12 23 12 12  5
##  [162] 12 12 12 11 12  3 12 12  3 12 12 12  3  1 12  3 12 12 12 11 12 11  1
##  [185] 12 12  5 25 23 12 23 12 12 12 23 12 12 12 12 23 16 12 12  3 23 12 21
##  [208] 23  1 12 12  3 12 23 12 12 23 15 12 12  5 12 12  1 12 12  1 15 12  1
##  [231] 12  5 12 12 12 12 12 12  1 15 12  1 12 23 12 23  5 28  5 12 23 28 23
##  [254] 12  5 23 28 12 12 23 12 12 12 12 12  5 12 23 12  3 28 12  5 23  1  1
##  [277]  1  1  1  1  1  3 12 12 12 12 23 12 12 12 20 12 12  3 12 12 23 12 25
##  [300] 12 15 28 12 28  6 12  5 12 23 12  5 23 12 12 28 23 12 12 23  5 12 23
##  [323] 12 23  1 12 12  1 23  3 23 23 12 12 12 12 23 12  1 12 12 12 12 23  5
##  [346]  1 12 12 12 12 12 23 12 12 20 12 23 12  1 12 12  1 12 12 12 12 25 12
##  [369]  5  5 12 20 12 20 12 12 12  5 12 12 12 12  5 20 12 12 12 12 12 12 23
##  [392] 12 12 12  5  5 15 23 12  5 12 12 28 12 12 12 12  3 12  5 12 15 27 12
##  [415] 12 23  5 23  5 12 11 23  3 12 12 12 11 12 12 12 12 12 27 12 23 11 12
##  [438] 12 11  5 27 12 20 12  2 12 11 12 12 11 11 12 12  1 23 11  5 20 12 12
##  [461] 23 12 12 12 23 11 12 12 12 12 12 12 12 12 12 12 12 12 11 11 23 12 23
##  [484] 23 12 23  5  5 23 23 12  1  3 12 11 12  3 29 12 12 23 12 23 12 12 25
##  [507] 12  1 12  3 25 12  3 23 12  5 12 15 23 12 23 15 12 12 23 12 12 12 10
##  [530] 12 12  5  5 12 12 12 12 12 12 12 12 23 12 23 12  9 18 12 26  5 12 12
##  [553]  1 11 12 12 12 12 12 12  1 12 12  1 12 12 12 12 12  5 27 12 12 12 25
##  [576] 12 27  1 12  5 12 12 12 23  3 12  5 21 12 12 12 12 12  1 25 12  3 12
##  [599] 12 12 20  5  5  5 11  1 12 12 12 11 12 12 12 12 12 12 23  3 23 12 12
##  [622] 20 12 12  5 12  1 12 23 12 12 11 11 11 12 12 23 12 23 11 12 13  1  1
##  [645] 12  1 12 12  1  3 23 23 12  1 12  1 12 12 12  5 23 12 12 12 12  3 12
##  [668] 12 12 12  5 12 19 23 11 12 12 12 12 23 12 23 23 12 12 12 12 12 12 12
##  [691] 23 23 12 11 12  5 12 23 12 12  1 20 20 20 12 12  1  5 17 23 29 12 23
##  [714] 20 22 12 12 20 12 23  5 12 12 23 12 11 23 12  5 14 12 12 12  5  5  3
##  [737]  3 23  1 11  1 12 11  5  1  3  1 20 12 12 11 20 12  1 23 12  3 23  3
##  [760] 20 23 10  5 12 12 11 12 11 12 11 11 23 11  5  5  3 23 12 23 23 21 23
##  [783]  5 23 23 20 12 12 23 23 12 23 11 23 12 11 15  4 11 12  1 15 11 11 12
##  [806] 12 12 23 12  1 12  5  3  1  1 12 28 23 23 20  1 12 12 12 12 12 12 23
##  [829] 12 12 20 11 12 11  3 23 23  1  1 12  3 23 12 23 23 12  5 12 12 23 23
##  [852] 23 12 12 12 12  3 23 23 12 11  5 20 12 19 23 23 23 23 11 19 23 12  5
##  [875] 12 12 12 12  5  5 12 12 23  1 23 23 15 11 12 11 23 23  5 12  1 11  5
##  [898] 23 23 23 12 12  5 11 12 23 11 23 12 23 12  5 23 11 23 12  1 27 11 28
##  [921] 12 12 20 11 23  5  5 11  1 12 12 12 12 12 12 12 23 12  5 12 27 19  5
##  [944] 23 11  1 12 23 23 23 12 23 12 27 11 15 12 11  3  1 11  1  1 11 12 12
##  [967] 11 12 20 12 11 12  6 23 11  8  8  8  8 12 12 12  5  7 23 11 11 12 23
##  [990] 23 23  1 12  1 12 12 12 12 11  5  5 12 23 23  5 12 12 25 20  5 11 12
## [1013] 12 12  1 20 12 23 23  1  1  1  1  1  1 23 12  5 11 23 11 20 11 12 11
## [1036] 23 23 25 12 12 23 23 23 12 23  8 12 11 12 11 12  1 15 23 12
as.numeric(B$Location_of_Breached_Information)
##    [1] 41 31 37 17  1  1  1  1  1 17 17 17 41 35  6 36 18 12 38 35 41 41 37
##   [24] 35 35 37 17  1 17 35 17 40 35 41 17 41 41 41  1 41  1  1 31 35 17 31
##   [47] 17 41 17 37 41 17 17 31 17 12 36 41 35 41 17 41 17 35 35 17 17 41 41
##   [70]  1 37 35 40 41 18 37 41 41 17 14 31  1 17 17 17 41 17 35 41 17 17 17
##   [93] 17 41 35 17 31 17 17 12 19 31 35 31 12 17 41 11 37 37 36 41 17  1 36
##  [116]  8 41 38  1 17  1 41 31 41  1 41 31 41 35 17 17 17 35 17 31 41 37 17
##  [139] 41 17  1 35  1  1 37 17 17 17 23 37 41 37 36 37 36 31 17 15 41 41 36
##  [162] 18 41 30 41  1 41 37 17 41 37 37 31 41  1  1 41 17 26 41 35 17 11 31
##  [185] 37 35 41 31 31  1 37 17  1 17 41 17 17 41 31 41 41  1  1 41 41  1 18
##  [208] 31  4  1 17 41 37 35 17 17 12 37 41 37 37 31 18 31 17 17 32 37  1  1
##  [231] 18 37 35 37 17 37  1 41 31 17 41  1 39 35 37 31 37  1 37 41 41 35  1
##  [254] 17 10 41 15  1  7 41 17  1 17 37 37 35 31 35 30 41 35 41 41 35  1 31
##  [277] 19 19 19 19 19 35 17 17 17 17 35 41  1  1 41 40 17 35 41 41 41 41 31
##  [300]  1 37 41 37 12 41 37 41 17 41  1 37 35 17 17 41 41 17 20 41 35  4 39
##  [323] 17 41 31  1 41 31 31 41 41 41 41 17 17  1 41 36 31 37 37 37 37 41 37
##  [346] 31 37 37 37 35 17 17 37 29 21  4 35 41 31 37 17  1 41 41 17 41 31 18
##  [369] 37 41 17 30 41 41 41 35 37 37 37  1 17 37 35 41 25 17 17 35 41 41 15
##  [392] 35 35  1 35 35  1 31 41 37 17 41 41  1  1 31 37 31 40 35 37 17 35  1
##  [415] 17 16 36 41 35 35 41 31 41 41 17 35 41 28 17 17 17  4 32 17 35 41 41
##  [438] 17 41 35 41 17 41  1 12 36 41 17 17 41 35 41 17 35 41 35 36 17 17 17
##  [461] 35 17 31 35  1 41 17 17 28 17 17 41 17 17 41 28 17 36 31 41 35 41 35
##  [484] 12 41 12 41 36 12 41 35 31 41  4 35  3 41 35  1  1 35 41 12 41 31 31
##  [507] 41 31 31 41 31 41 41 12 17 36 17 17 35 35 35 17 17 17 35 41 16  1 17
##  [530] 35  9 41 36 17 36  1 13 41 17 17 28  1 17 15 41 36 36 41 31 36 17  1
##  [553] 35 31 35 31 28 41 17 17 35 35  1 35 16 17  1 15 17 36 41 17 36 17 31
##  [576] 17 41 31 36 36 28 17  1 12 41 17 36 31 17 17 35  1 41 31 33 41 41 31
##  [599] 41 15 15 35 41 36 35  1 36 17  1 35 41  1  1 17  1  1 31 41 41  1  1
##  [622]  1 17  1 36  1 12 41 34  1  1 41 41 41 17 26 35 41 35 17 17 31 17 31
##  [645] 17  1 41  1 31 41 41 41 17 31 17 31 17 17 17 35 15 41 17 41  1 41  1
##  [668] 17  1 17 17 17 41 35 41 17 36 17 17 41 17 31 41  1 17  1 41 17 41 36
##  [691] 35 15 41 12 17 36  1 31 17 17 31 31 31 31 11  3 31 17 41 41 41 17 41
##  [714] 15 31 15 31 30 41 41 17 41 17 35  1 12 35  1 41 41  1 36 35 36 36 41
##  [737] 41 12 31  1 31 17 12 17 31 35  1 35 17 17 35 15 12  1  1 17 41 41 12
##  [760]  5 41 35 36 17 17 41 41 41 17 41 12 12 41 41 41 35 15  1 35  1  4  1
##  [783] 36  1 35 31 17 17 36 31 17 15 35 41 17 41 36  8 41  1  1 17 31 12 28
##  [806] 17  4 15 17 35 31 35 35 31 31 31 35 15 41  4 12 17 17 17 31 17 17 15
##  [829] 12 30 41 41 17 12 41 41 12 31 31 17 41 12 17 12  1 16 36  1 17 31 31
##  [852] 12 17 17 17  1 41 17 31 17 41 36 15 17 17 41 35 41 41 35 41 12 35 36
##  [875] 17 17 17 36 36 36  1 17 15  1 31 12 17  1 17 41 35  1 17 36 31 12 36
##  [898] 31 31  2 17 17 41 41  1 12 41  1 17 12 17  1 27 41 35 17 31 41 12 35
##  [921] 17 31 31 35 41 36 36 17 31 36 17 17 31 17 41 31 36 11 36 39 41 17 41
##  [944] 31 35 12 17 31 31 31  1 31 17 35 15 17  1 35 41 31 35 31 31 24  2 12
##  [967] 35  1 41 35 18 17 36 31 15 22 22 22 22  1 17 41 36 36 12 12 12 36 12
##  [990] 35 41 12 22 31 35 35 36  2 41 41 41 41 35 41 36 36 35 31 41 41 35 36
## [1013] 17 17 12 17 35 41 12 31 12 12 12 12 31 41 41 41 41 16 12 41 41 17 31
## [1036] 41 35 12  1 36  1 33 41 17 12 41  1 14 11 12 35 35 36 12  1

The following are ANOVA results for the interaction effects. So that the code runs quickly, the analysis was done on a subset of the data set, obtained by removing the 600 observations with the largest values of Individuals_Affected. This assumes that the ANOVA results would be similar to those from the full data set.
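As a minimal sketch of the subsetting idea, the helper below drops the k rows with the largest values of a column. The function name drop_largest and the toy data are hypothetical and only illustrate the step; the report itself uses a fixed cutoff on Individuals_Affected, which can differ from this approach when values are tied at the cutoff.

```r
# Hypothetical helper: remove the k rows with the largest values in `column`.
drop_largest <- function(df, column, k) {
  ord <- order(df[[column]], decreasing = TRUE)
  # Row indices to keep, in their original order (also safe when k = 0)
  keep <- setdiff(seq_len(nrow(df)), ord[seq_len(k)])
  df[keep, , drop = FALSE]
}

# Toy data standing in for the breaches data set
toy <- data.frame(
  id = 1:10,
  affected = c(500, 12000, 750, 900, 64000, 1100, 530, 2300, 610, 880)
)

small <- drop_largest(toy, "affected", 3)
nrow(small)          # number of rows kept
max(small$affected)  # largest value remaining after the drop
```

On the report's data, comparing nrow(B) with the number of rows in the subset would confirm how many observations a given cutoff removes.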

#Keep only the breaches with fewer than 1846 individuals affected
Bsub <- subset(B, Individuals_Affected < 1846)
inter1 <- aov(Individuals_Affected ~ State * Business_Associate_Involved, data = Bsub)
anova(inter1)
## Analysis of Variance Table
## 
## Response: Individuals_Affected
##                                    Df   Sum Sq Mean Sq F value  Pr(>F)  
## State                              47  6028611  128268  0.9475 0.57420  
## Business_Associate_Involved        89 13214151  148474  1.0968 0.28121  
## State:Business_Associate_Involved   4  1242585  310646  2.2947 0.05923 .
## Residuals                         314 42508025  135376                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
inter2 <- aov(Individuals_Affected ~ State * Type_of_Breach, data = Bsub)
anova(inter2)
## Analysis of Variance Table
## 
## Response: Individuals_Affected
##                       Df   Sum Sq Mean Sq F value Pr(>F)
## State                 47  6028611  128268  0.9493 0.5709
## Type_of_Breach        16  1354983   84686  0.6267 0.8616
## State:Type_of_Breach 104 16829150  161819  1.1976 0.1245
## Residuals            287 38780628  135124
inter3 <- aov(Individuals_Affected ~ State * Location_of_Breached_Information, data = Bsub)
anova(inter3)
## Analysis of Variance Table
## 
## Response: Individuals_Affected
##                                         Df   Sum Sq Mean Sq F value Pr(>F)
## State                                   47  6028611  128268  0.9068 0.6471
## Location_of_Breached_Information        24  2881454  120061  0.8487 0.6719
## State:Location_of_Breached_Information 142 19992140  140790  0.9953 0.5074
## Residuals                              241 34091168  141457
inter4 <- aov(Individuals_Affected ~ Business_Associate_Involved * Type_of_Breach, data = Bsub)
anova(inter4)
## Analysis of Variance Table
## 
## Response: Individuals_Affected
##                                             Df   Sum Sq Mean Sq F value
## Business_Associate_Involved                 89 13188717  148188  1.0898
## Type_of_Breach                              12  1422178  118515  0.8716
## Business_Associate_Involved:Type_of_Breach   1   519707  519707  3.8221
## Residuals                                  352 47862769  135974        
##                                             Pr(>F)  
## Business_Associate_Involved                0.29127  
## Type_of_Breach                             0.57634  
## Business_Associate_Involved:Type_of_Breach 0.05137 .
## Residuals                                           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
inter5 <- aov(Individuals_Affected ~ Business_Associate_Involved * Location_of_Breached_Information, data = Bsub)
anova(inter5)
## Analysis of Variance Table
## 
## Response: Individuals_Affected
##                                   Df   Sum Sq Mean Sq F value Pr(>F)
## Business_Associate_Involved       89 13188717  148188  1.0956 0.2811
## Location_of_Breached_Information  23  3547693  154248  1.1404 0.2990
## Residuals                        342 46256962  135254
inter6 <- aov(Individuals_Affected ~ Type_of_Breach * Location_of_Breached_Information, data = Bsub)
anova(inter6)
## Analysis of Variance Table
## 
## Response: Individuals_Affected
##                                                  Df   Sum Sq Mean Sq
## Type_of_Breach                                   16  1608222  100514
## Location_of_Breached_Information                 23  3303253  143620
## Type_of_Breach:Location_of_Breached_Information  35  4588848  131110
## Residuals                                       380 53493049  140771
##                                                 F value Pr(>F)
## Type_of_Breach                                   0.7140 0.7801
## Location_of_Breached_Information                 1.0202 0.4381
## Type_of_Breach:Location_of_Breached_Information  0.9314 0.5839
## Residuals

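At the 0.05 level, none of the interaction effects is significant. Two are marginal: State:Business_Associate_Involved (p = 0.059) and Business_Associate_Involved:Type_of_Breach (p = 0.051). The table for inter5 contains no interaction row, so the Business_Associate_Involved:Location_of_Breached_Information interaction was not estimated on this subset.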
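Reading six tables by eye is error-prone, so the interaction p-values could be collected programmatically. The following is a sketch only: interaction_p is a hypothetical helper, not part of the original analysis, and it is demonstrated on simulated data so that it runs without the Ecdat package.

```r
# Hypothetical helper: return the interaction rows (term names containing ":")
# of the anova table for a fitted aov model, with their p-values.
interaction_p <- function(fit) {
  tab <- anova(fit)
  rows <- grepl(":", rownames(tab), fixed = TRUE)
  data.frame(term = rownames(tab)[rows],
             p_value = tab[rows, "Pr(>F)"],
             row.names = NULL)
}

# Simulated two-factor data (an assumption, used only to make the sketch runnable)
set.seed(1)
toy <- data.frame(
  y = rnorm(40),
  a = factor(rep(c("a1", "a2"), each = 20)),
  b = factor(rep(c("b1", "b2"), times = 20))
)

res <- interaction_p(aov(y ~ a * b, data = toy))
print(res)

# Applied to the six models fitted in this report:
# do.call(rbind, lapply(list(inter1, inter2, inter3, inter4, inter5, inter6),
#                       interaction_p))
```

A model whose table has no interaction row, such as inter5, would contribute zero rows to the combined result.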

Diagnostics/Model Adequacy Checking

qqnorm(residuals(inter1))
qqline(residuals(inter1))

qqnorm(residuals(inter2))
qqline(residuals(inter2))

qqnorm(residuals(inter3))
qqline(residuals(inter3))

qqnorm(residuals(inter4))
qqline(residuals(inter4))

qqnorm(residuals(inter5))
qqline(residuals(inter5))

qqnorm(residuals(inter6))
qqline(residuals(inter6))
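The QQ plots are judged by eye. A formal normality test and a residuals-versus-fitted plot could supplement them. The sketch below uses shapiro.test from base R on simulated, deliberately skewed data; the toy data frame is an assumption and none of this was part of the original analysis.

```r
# Simulated skewed response with one three-level factor (illustration only)
set.seed(2)
toy <- data.frame(
  y = rexp(60),
  g = factor(rep(c("g1", "g2", "g3"), each = 20))
)
fit <- aov(y ~ g, data = toy)

# Shapiro-Wilk test: a small p-value is evidence against normal residuals
sw <- shapiro.test(residuals(fit))
print(sw)

# Residuals against fitted values, to look for non-constant variance
plot(fitted(fit), residuals(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Applied to a model from this report, for example:
# shapiro.test(residuals(inter1))
```

shapiro.test accepts between 3 and 5000 observations, so it would apply to the subset used here.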

4. Contingencies

In the case of nonnormality, a nonparametric test could be used instead. A further step is to find a way to run the complete ANOVA on the full data set.
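One candidate nonparametric test is the Kruskal-Wallis rank sum test, available in base R as kruskal.test. It is named here as a possible choice, not as the method this report used, and it compares groups for a single factor only, so it does not test interactions. The sketch runs on simulated data with hypothetical column names.

```r
# Simulated skewed counts for three breach types (illustration only)
set.seed(3)
toy <- data.frame(
  affected = round(rexp(90, rate = 1 / 1000)),
  breach_type = factor(rep(c("Theft", "Loss", "Hacking"), each = 30))
)

# Kruskal-Wallis test: do the groups share the same distribution of `affected`?
kw <- kruskal.test(affected ~ breach_type, data = toy)
print(kw)

# Applied to the full breaches data set, one factor at a time, for example:
# kruskal.test(Individuals_Affected ~ Type_of_Breach, data = B)
```

Because the test works on ranks, it is cheap to run on all 1055 observations rather than on a subset.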

5. References to the literature

[1] https://cran.r-project.org/web/packages/Ecdat/Ecdat.pdf

6. Appendices

https://cran.r-project.org/web/packages/Ecdat/Ecdat.pdf
#A summary of, or pointer to, the raw data
summary(Ecdat::breaches)
#Complete and Documented R code