Introduction: The US Department of Health and Human Services (HHS) collects and publishes reports of breaches of protected health information, as mandated by law. Breaches affecting 500 or more individuals must be reported. Each record includes the state, the number of individuals affected, the covered entity (the organization responsible), the breach submission date, the type of breach, and the location of the breached information.
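The chunks in this report rely on dplyr verbs and on lubridate helpers such as `year()`, and they expect `Breach Submission Date` to be a date column. The following is only a minimal sketch of that setup, not the author's actual loading code: the object name `breach_sample` is made up for this example, and its three rows are copied from the Top 25 breaches table later in the report rather than read from the full HHS export. The `State` column is left out because that table does not show it.

```r
library(dplyr)
library(lubridate)

# Three rows copied from the "Top 25 breaches" table in this report.
# Hypothetical stand-in for the full data set, which is loaded elsewhere.
breach_sample <- tibble(
  `Name of Covered Entity` = c("Anthem (Working file)", "IBM", "AvMed, Inc."),
  `Covered Entity Type` = c("Health Plan", "Business Associate", "Health Plan"),
  `Individuals Affected` = c(78800000, 1900000, 1220000),
  `Breach Submission Date` = as.Date(c("2015-02-13", "2011-04-14", "2010-06-03")),
  `Type of Breach` = c("Hacking/IT Incident", "Unknown", "Theft"),
  `Location of Breached Information` = c("Network Server", "Other", "Laptop")
)

# year() needs a Date (or date-time) column, so check the type first
stopifnot(inherits(breach_sample$`Breach Submission Date`, "Date"))

# Count breaches per submission year
by_year <- breach_sample %>%
  mutate(year = year(`Breach Submission Date`)) %>%
  count(year)

by_year
```

If the date column arrives as text, it would need to be converted (for example with `as.Date()`) before any of the year, month, or weekday summaries will work.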

Total Affected By State

breach_archive %>% 
  group_by(`State`) %>% 
  summarize('Total-Affected' = sum(`Individuals Affected`)) %>% 
  knitr::kable()
State Total-Affected
AK 9053
AL 1081417
AR 114959
AZ 1137038
CA 3052133
CO 216750
CT 218414
DC 38519
DE 3562
FL 6001825
GA 611399
HI 14336
IA 42253
ID 14962
IL 4692107
IN 79576765
KS 88971
KY 197724
LA 114140
MA 181740
MD 350226
ME 2774
MI 213166
MN 184145
MO 151367
MS 99706
MT 1156519
NC 370661
ND 15102
NE 51317
NH 239339
NJ 3051796
NM 68082
NV 97492
NY 2782138
OH 707014
OK 332619
OR 300949
PA 1376521
PR 1704916
RI 45877
SC 765107
SD 23779
TN 1724277
TX 4040208
UT 876270
VA 5158001
VT 3215
WA 690054
WI 122555
WV 81035
WY 55384

This table shows the total number of individuals affected in each state, not the number of breaches. It can show HHS which states may need the most attention or reform, and it can prompt questions about why a state is so susceptible to breaches.

Instances of Breaches under 1000

breach_archive %>% 
  filter(`Individuals Affected` < 1000) %>% 
  # count() groups by the column itself, so a separate group_by() is not needed
  count(`Type of Breach`) %>%
  knitr::kable()
Type of Breach n
Hacking/IT Incident 31
Hacking/IT Incident, Other 1
Hacking/IT Incident, Theft, Unauthorized Access/Disclosure 1
Hacking/IT Incident, Unauthorized Access/Disclosure 1
Improper Disposal 13
Improper Disposal, Theft 1
Improper Disposal, Unauthorized Access/Disclosure 1
Loss 42
Loss, Theft 3
Loss, Unauthorized Access/Disclosure 2
Other 16
Other, Unauthorized Access/Disclosure 1
Theft 186
Theft, Unauthorized Access/Disclosure 6
Unauthorized Access/Disclosure 144
Unknown 4

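Most breaches affecting fewer than 1000 individuals involve theft (186) or unauthorized access/disclosure (144). Together these two single-category types account for 330 of the 453 breaches in this table.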
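The share behind this statement can be checked from the counts in the table above. This is a small sketch in base R, with the counts copied by hand from the table; the names `small_breaches` and `share` are made up for this example.

```r
# Counts copied from the table of breaches affecting fewer than 1000 individuals
small_breaches <- c(
  "Hacking/IT Incident" = 31,
  "Hacking/IT Incident, Other" = 1,
  "Hacking/IT Incident, Theft, Unauthorized Access/Disclosure" = 1,
  "Hacking/IT Incident, Unauthorized Access/Disclosure" = 1,
  "Improper Disposal" = 13,
  "Improper Disposal, Theft" = 1,
  "Improper Disposal, Unauthorized Access/Disclosure" = 1,
  "Loss" = 42,
  "Loss, Theft" = 3,
  "Loss, Unauthorized Access/Disclosure" = 2,
  "Other" = 16,
  "Other, Unauthorized Access/Disclosure" = 1,
  "Theft" = 186,
  "Theft, Unauthorized Access/Disclosure" = 6,
  "Unauthorized Access/Disclosure" = 144,
  "Unknown" = 4
)

# Percentage of all small breaches that falls under each label
share <- round(100 * small_breaches / sum(small_breaches), 1)

# Shares for the two most frequent single-category types
share[c("Theft", "Unauthorized Access/Disclosure")]
```

Combined labels such as "Theft, Unauthorized Access/Disclosure" are kept as separate rows here, so the two shares slightly understate how often theft or unauthorized access is involved.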

Affected Individuals by Breach Type

breach_archive %>% 
  group_by(`Type of Breach`) %>% 
  summarize('Total-Affected' = sum(`Individuals Affected`)) %>% 
  knitr::kable()
Type of Breach Total-Affected
Hacking/IT Incident 87385368
Hacking/IT Incident, Other 3720
Hacking/IT Incident, Other, Unauthorized Access/Disclosure 4354
Hacking/IT Incident, Theft 27800
Hacking/IT Incident, Theft, Unauthorized Access/Disclosure 13800
Hacking/IT Incident, Unauthorized Access/Disclosure 181253
Improper Disposal 889249
Improper Disposal, Loss 5690
Improper Disposal, Loss, Theft 53338
Improper Disposal, Theft 501
Improper Disposal, Theft, Unauthorized Access/Disclosure 17300
Improper Disposal, Unauthorized Access/Disclosure 727
Loss 7821407
Loss, Other 34534
Loss, Other, Theft 2600
Loss, Theft 98965
Loss, Unauthorized Access/Disclosure 3210
Loss, Unauthorized Access/Disclosure, Unknown 2533
Loss, Unknown 7335
Other 923010
Other, Theft 10259
Other, Theft, Unauthorized Access/Disclosure 28396
Other, Unauthorized Access/Disclosure 140544
Other, Unknown 317082
Theft 18552308
Theft, Unauthorized Access/Disclosure 242368
Unauthorized Access/Disclosure 5566337
Unknown 1915690

Hacking/IT incidents account for the most affected individuals (about 87.4 million), followed by theft (about 18.6 million) and loss (about 7.8 million).
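Because some breaches carry several labels (for example "Loss, Theft"), the totals per single category are spread over many rows. One possible way to fold them together is `tidyr::separate_rows()`. This sketch uses only four rows copied from the table above, and the names `by_type` and `by_single_type` are made up for this example.

```r
library(dplyr)
library(tidyr)

# Four rows copied from the table above
by_type <- tibble(
  `Type of Breach` = c("Hacking/IT Incident", "Hacking/IT Incident, Theft",
                       "Loss, Theft", "Theft"),
  `Total-Affected` = c(87385368, 27800, 98965, 18552308)
)

# Split combined labels into one row per category, then re-total.
# A breach with two labels is counted under both, so these totals overlap
# and should not be added together.
by_single_type <- by_type %>%
  separate_rows(`Type of Breach`, sep = ", ") %>%
  group_by(`Type of Breach`) %>%
  summarize(`Total-Affected` = sum(`Total-Affected`)) %>%
  arrange(desc(`Total-Affected`))

by_single_type
```

Applied to the full `breach_archive`, the same split would run on `Type of Breach` before the `group_by()` step.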

Individuals Affected by Year

breach_archive %>%
  mutate(year = year(`Breach Submission Date`)) %>% 
  group_by(`year`) %>%
  summarize('Total-Affected' = sum(`Individuals Affected`)) %>% 
  knitr::kable()
year Total-Affected
2009 134773
2010 5931508
2011 13149977
2012 2837677
2013 7007220
2014 8139638
2015 80142606
2016 6096750
2017 808480
2018 1049

2015 (about 80.1 million) and 2011 (about 13.1 million) had the most individuals affected.
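The 2015 total is dominated by a single entry in the Top 25 breaches table: the Anthem breach, submitted on 2015-02-13 with 78,800,000 individuals affected. This base R sketch, with the figures copied from the two tables and variable names made up for this example, shows how much of 2015 that one breach explains and how the years rank without it.

```r
# Yearly totals copied from the table above
total_affected <- c(
  "2009" = 134773, "2010" = 5931508, "2011" = 13149977, "2012" = 2837677,
  "2013" = 7007220, "2014" = 8139638, "2015" = 80142606, "2016" = 6096750,
  "2017" = 808480, "2018" = 1049
)

# Anthem breach, copied from the Top 25 breaches table
anthem <- 78800000

# Fraction of the 2015 total that comes from this one breach
anthem_share <- anthem / total_affected[["2015"]]
anthem_share

# Yearly totals with the Anthem breach removed, largest first
without_anthem <- total_affected
without_anthem["2015"] <- without_anthem["2015"] - anthem
sort(without_anthem, decreasing = TRUE)
```

A single outlier of this size is worth keeping in mind when reading any of the totals in this report that include 2015.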

Individuals Affected by Healthcare Data Breaches by Year (Plot)

breach_archive %>%
  mutate(year = year(`Breach Submission Date`)) %>% 
  ggplot(aes(x = `year`, y = `Individuals Affected`)) +
  geom_col()+
  labs(y = 'Total Affected')

The plot shows the same pattern as the yearly table: 2015 and 2011 had the most affected individuals.

Top 25 breaches

breach_archive %>% 
  arrange(desc(`Individuals Affected`)) %>% 
  select(`Name of Covered Entity`, `Covered Entity Type`, `Individuals Affected`, `Breach Submission Date`, `Type of Breach`, `Location of Breached Information`) %>% 
  head(25) %>% 
  knitr::kable()
| Name of Covered Entity | Covered Entity Type | Individuals Affected | Breach Submission Date | Type of Breach | Location of Breached Information |
|---|---|---|---|---|---|
| Anthem (Working file) | Health Plan | 78800000 | 2015-02-13 | Hacking/IT Incident | Network Server |
| Science Applications International Corporation (SA | Business Associate | 4900000 | 2011-11-04 | Loss | Other |
| Advocate Health and Hospitals Corporation, d/b/a Advocate Medical Group | Healthcare Provider | 4029530 | 2013-08-23 | Theft | Desktop Computer |
| 21st Century Oncology | Healthcare Provider | 2213597 | 2016-03-04 | Hacking/IT Incident | Network Server |
| Xerox State Healthcare, LLC | Business Associate | 2000000 | 2014-09-10 | Unauthorized Access/Disclosure | Desktop Computer, Email, Laptop, Network Server, Other, Other Portable Electronic Device |
| IBM | Business Associate | 1900000 | 2011-04-14 | Unknown | Other |
| GRM Information Management Services | Business Associate | 1700000 | 2011-02-11 | Theft | Electronic Medical Record, Other |
| AvMed, Inc. | Health Plan | 1220000 | 2010-06-03 | Theft | Laptop |
| Montana Department of Public Health & Human Services | Health Plan | 1062509 | 2014-07-07 | Hacking/IT Incident | Network Server |
| The Nemours Foundation | Healthcare Provider | 1055489 | 2011-10-07 | Loss | Other |
| BlueCross BlueShield of Tennessee, Inc. | Health Plan | 1023209 | 2010-11-01 | Theft | Other |
| Sutter Medical Foundation | Healthcare Provider | 943434 | 2011-11-17 | Theft | Desktop Computer |
| Valley Anesthesiology Consultants, Inc. d/b/a Valley Anesthesiology and Pain Consultants | Healthcare Provider | 882590 | 2016-08-12 | Hacking/IT Incident | Network Server |
| Horizon Healthcare Services, Inc., doing business as Horizon Blue Cross Blue Shield of New Jersey, and its affiliates | Business Associate | 839711 | 2014-01-03 | Theft | Laptop |
| Iron Mountain Data Products, Inc. (now known as | Business Associate | 800000 | 2010-07-19 | Loss | Electronic Medical Record, Other, Other Portable Electronic Device |
| Utah Department of Technology Services | Business Associate | 780000 | 2012-04-11 | Hacking/IT Incident | Network Server |
| AHMC Healthcare Inc. and affiliated Hospitals | Healthcare Provider | 729000 | 2013-10-25 | Theft | Laptop |
| EISENHOWER MEDICAL CENTER | Healthcare Provider | 514330 | 2011-03-30 | Theft | Desktop Computer |
| Radiology Regional Center, PA | Healthcare Provider | 483063 | 2016-02-12 | Loss | Paper/Films |
| Puerto Rico Department of Health - Triple S Management Corp. | Health Plan | 475000 | 2010-11-04 | Unauthorized Access/Disclosure | Network Server |
| St Joseph Health System | Healthcare Provider | 405000 | 2014-02-05 | Hacking/IT Incident | Network Server |
| Spartanburg Regional Healthcare System | Healthcare Provider | 400000 | 2011-05-27 | Theft | Desktop Computer |
| Triple-S Salud, Inc. - Breach Case#2 | Health Plan | 398000 | 2014-01-24 | Theft | Network Server |
| Triple-S Salud, Inc. | Health Plan | 398000 | 2010-11-18 | Theft | Network Server |
| Community Health Plan of Washington | Health Plan | 381504 | 2016-12-21 | Hacking/IT Incident | Network Server, Other |

This table lists the 25 largest breaches in the archive, ranked by individuals affected. The archive covers submissions from 2009 onward.

Top 10 Breached States

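breach_archive %>% 
  group_by(State) %>% 
  summarize(`Total Affected` = sum(`Individuals Affected`)) %>% 
  # slice_max() already returns rows ordered by `Total Affected`, so no arrange() is needed
  slice_max(`Total Affected`, n = 10) %>% 
  # reorder() sorts the bars by total instead of alphabetically by state
  ggplot(aes(x = reorder(State, -`Total Affected`), y = `Total Affected`)) +
  geom_col() +
  labs(title = 'Individuals affected by state', x = 'State')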

Hacking Incidents by Month

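breach_archive %>%
  mutate(`month` = month(`Breach Submission Date`),
         # Assumption: `Hacking` is not among the fields listed in the introduction,
         # so it is derived here as a flag for any breach type that mentions hacking
         `Hacking` = grepl('Hacking', `Type of Breach`)) %>%
  group_by(`month`) %>% 
  summarize(`Hacking total` = sum(`Hacking`)) %>% 
  ggplot(aes(x = `month`, y = `Hacking total`))+
  geom_col()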

Breaches by Covered Entity

Covered_Entity_Breach <- 
  breach_archive %>% 
  group_by(`Covered Entity Type`) %>% 
  summarize(`Total Affected` = sum(`Individuals Affected`))

Breaches by day of the week

breach_archive %>% 
  mutate(`wday` = wday(`Breach Submission Date`, label = TRUE)) %>% 
  group_by(`wday`) %>% 
  summarize(`Breaches` = n()) %>% 
  ggplot(aes(x = `wday`, y = `Breaches`))+
  geom_col()

Years with more than 50 business associate breaches and more than 150 healthcare provider breaches

breach_archive %>%
  mutate(year = year(`Breach Submission Date`)) %>% 
  group_by(year) %>% 
  summarize(type = sum(`Covered Entity Type` == 'Business Associate'),
            type2 = sum(`Covered Entity Type` == 'Healthcare Provider')) %>% 
  filter(type > 50) %>% 
  filter(type2 > 150) %>% 
  knitr::kable()
year type type2
2013 64 187
2014 67 179

Breaches by State and Location

I am interested in where the breached information was located in each state, because this can inform policy makers about data controls. For example, if most breaches in a state involve paper documents, procedures could be revised so that documents must be checked out, which adds accountability.

breach_archive %>% 
  group_by(`State`, `Location of Breached Information`) %>% 
  # .groups = 'drop' returns an ungrouped result and silences the summarise() grouping message
  summarize(`Breach` = n(), `Total Affected` = sum(`Individuals Affected`), .groups = 'drop') %>% 
  filter(Breach >= 5) %>% 
  knitr::kable()
State Location of Breached Information Breach Total Affected
AL Laptop 5 16397
AL Network Server 6 63191
AZ Laptop 6 12154
AZ Network Server 5 944126
AZ Other Portable Electronic Device 5 17387
AZ Paper/Films 11 56053
CA Desktop Computer 29 848887
CA Electronic Medical Record 7 14350
CA Email 19 43760
CA Laptop 31 1231289
CA Network Server 23 257537
CA Other 18 159926
CA Other Portable Electronic Device 10 119271
CA Other, Other Portable Electronic Device 6 51459
CA Paper/Films 48 226957
CO Email 5 19771
CO Laptop 5 11285
CO Paper/Films 9 13903
CT Laptop 5 77348
CT Paper/Films 5 8849
DC Paper/Films 5 9656
FL Desktop Computer 9 42777
FL Electronic Medical Record 11 217870
FL Email 9 27816
FL Laptop 13 1270873
FL Network Server 14 2618172
FL Other 12 1116298
FL Other Portable Electronic Device 6 6983
FL Paper/Films 36 638792
GA Email 5 15644
GA Laptop 13 43715
GA Network Server 7 47356
GA Other 5 333760
GA Paper/Films 13 46481
IL Email 8 29466
IL Laptop 12 54105
IL Network Server 11 57554
IL Other 5 16435
IL Paper/Films 30 241601
IN Laptop 11 297188
IN Network Server 9 78848789
IN Other 8 30045
IN Paper/Films 11 323737
KY Email 6 11714
KY Laptop 7 22969
KY Other Portable Electronic Device 5 10828
MA Desktop Computer 5 21817
MA Laptop 7 24318
MA Network Server 7 34859
MA Paper/Films 6 56705
MD Laptop 5 225067
MD Paper/Films 5 17815
MI Email 7 9640
MI Network Server 6 32651
MI Paper/Films 10 24061
MN Laptop 5 32729
MN Network Server 10 50897
MN Paper/Films 11 38851
MO Other 8 26764
MO Paper/Films 8 33978
NC Email 6 102794
NC Network Server 5 57701
NC Other 8 85957
NC Other Portable Electronic Device 5 18185
NC Paper/Films 12 29722
NM Laptop 7 25603
NM Paper/Films 5 19186
NY Desktop Computer 10 94080
NY Email 8 65115
NY Laptop 14 57818
NY Network Server 7 72717
NY Other Portable Electronic Device 12 46197
NY Paper/Films 15 46618
OH Laptop 9 167728
OH Network Server 6 326885
OH Other 7 22467
OH Paper/Films 19 145447
OK Email 5 18614
OR Laptop 7 62721
PA Email 5 6236
PA Laptop 8 110472
PA Network Server 8 106181
PA Other 11 164622
PA Paper/Films 15 44167
PR Laptop 5 48051
PR Network Server 5 1284000
PR Paper/Films 7 92238
RI Paper/Films 7 18276
SC Laptop 5 53896
SC Paper/Films 8 15221
TN Laptop 9 62460
TN Network Server 6 423113
TN Other 6 1137594
TN Paper/Films 12 25411
TX Desktop Computer 13 156636
TX Electronic Medical Record 6 52129
TX Email 11 52858
TX Laptop 30 122388
TX Network Server 23 907861
TX Other 16 356766
TX Other Portable Electronic Device 11 47238
TX Other, Other Portable Electronic Device 6 59730
TX Paper/Films 24 248123
VA Other Portable Electronic Device 5 14922
VA Paper/Films 7 186829
WA Laptop 7 17561
WA Network Server 5 37112
WA Paper/Films 10 17838
WI Laptop 5 14626

Paper/Films breaches are the most common, but breaches of digital records, such as network servers and laptops, affect the most individuals. This leads me to believe that tighter controls and security should be placed on digital records in larger states like California and Florida.
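The contrast between how often a location is breached and how many people each breach affects can be made explicit by dividing the two columns. This sketch uses only the California rows copied from the table above; the column names `location`, `breaches`, `total_affected`, and `affected_per_breach` are made up for this example.

```r
library(dplyr)

# California rows copied from the table above (locations with at least 5 breaches)
ca <- tibble(
  location = c("Desktop Computer", "Electronic Medical Record", "Email",
               "Laptop", "Network Server", "Other",
               "Other Portable Electronic Device",
               "Other, Other Portable Electronic Device", "Paper/Films"),
  breaches = c(29, 7, 19, 31, 23, 18, 10, 6, 48),
  total_affected = c(848887, 14350, 43760, 1231289, 257537, 159926,
                     119271, 51459, 226957)
)

# Average number of individuals affected per breach, largest first
ca_per_breach <- ca %>%
  mutate(affected_per_breach = round(total_affected / breaches)) %>%
  arrange(desc(affected_per_breach))

ca_per_breach
```

In this subset Paper/Films has the most breaches, while Laptop has the highest average number of individuals affected per breach. An average can be pulled up by one very large breach, so it is only a rough guide.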