Introduction: The US Department of Health and Human Services (HHS) collects and publishes reports of breaches of protected health information, as mandated by law. Breaches affecting 500 or more individuals must be reported. Each report includes the state, the number of individuals affected, the covered entity (the organization responsible), the breach submission date, the type of breach, and the location of the breached information.
breach_archive %>%
group_by(`State`) %>%
summarize('Total-Affected' = sum(`Individuals Affected`)) %>%
knitr::kable()
| State | Total-Affected |
|---|---|
| AK | 9053 |
| AL | 1081417 |
| AR | 114959 |
| AZ | 1137038 |
| CA | 3052133 |
| CO | 216750 |
| CT | 218414 |
| DC | 38519 |
| DE | 3562 |
| FL | 6001825 |
| GA | 611399 |
| HI | 14336 |
| IA | 42253 |
| ID | 14962 |
| IL | 4692107 |
| IN | 79576765 |
| KS | 88971 |
| KY | 197724 |
| LA | 114140 |
| MA | 181740 |
| MD | 350226 |
| ME | 2774 |
| MI | 213166 |
| MN | 184145 |
| MO | 151367 |
| MS | 99706 |
| MT | 1156519 |
| NC | 370661 |
| ND | 15102 |
| NE | 51317 |
| NH | 239339 |
| NJ | 3051796 |
| NM | 68082 |
| NV | 97492 |
| NY | 2782138 |
| OH | 707014 |
| OK | 332619 |
| OR | 300949 |
| PA | 1376521 |
| PR | 1704916 |
| RI | 45877 |
| SC | 765107 |
| SD | 23779 |
| TN | 1724277 |
| TX | 4040208 |
| UT | 876270 |
| VA | 5158001 |
| VT | 3215 |
| WA | 690054 |
| WI | 122555 |
| WV | 81035 |
| WY | 55384 |
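This table shows the total number of individuals affected by breaches in each state. Indiana's total of roughly 79.6 million is by far the largest. The totals can help HHS identify which states need the most attention or reform, and they offer a starting point for investigating why some states are more susceptible to breaches than others.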
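Because the table is sorted alphabetically by state, the largest totals are hard to spot. The following is a minimal sketch of how the totals could be ranked instead. It assumes `dplyr` is installed, and `state_totals` is a hypothetical name for a small data frame holding six rows copied from the table above rather than the full `breach_archive` data.

```r
library(dplyr)

# Illustrative subset: six rows copied from the state totals table above
state_totals <- tibble::tribble(
  ~State, ~`Total-Affected`,
  "CA", 3052133,
  "FL", 6001825,
  "IL", 4692107,
  "IN", 79576765,
  "TX", 4040208,
  "VA", 5158001
)

# Sort from most to least affected and keep the top three
top_states <- state_totals %>%
  arrange(desc(`Total-Affected`)) %>%
  slice_head(n = 3)

top_states
```

In the full analysis, the same `arrange(desc(...))` step could be placed just before `knitr::kable()`.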
breach_archive %>%
filter(`Individuals Affected` < 1000) %>%
group_by(`Type of Breach`) %>%
count(`Type of Breach`) %>%
knitr::kable()
| Type of Breach | n |
|---|---|
| Hacking/IT Incident | 31 |
| Hacking/IT Incident, Other | 1 |
| Hacking/IT Incident, Theft, Unauthorized Access/Disclosure | 1 |
| Hacking/IT Incident, Unauthorized Access/Disclosure | 1 |
| Improper Disposal | 13 |
| Improper Disposal, Theft | 1 |
| Improper Disposal, Unauthorized Access/Disclosure | 1 |
| Loss | 42 |
| Loss, Theft | 3 |
| Loss, Unauthorized Access/Disclosure | 2 |
| Other | 16 |
| Other, Unauthorized Access/Disclosure | 1 |
| Theft | 186 |
| Theft, Unauthorized Access/Disclosure | 6 |
| Unauthorized Access/Disclosure | 144 |
| Unknown | 4 |
Among breaches affecting fewer than 1,000 individuals, theft (186) and unauthorized access/disclosure (144) are by far the most common types.
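To put a number on "most", the share of small breaches attributed to theft or unauthorized access/disclosure alone can be computed directly from the counts in the table. This is a rough sketch in base R. `small_breaches` is a hypothetical name, the values are copied from the table above, and combined categories such as "Loss, Theft" are deliberately left out of the numerator.

```r
# Counts copied from the table above (breaches affecting fewer than 1,000 individuals)
small_breaches <- c(
  "Hacking/IT Incident" = 31,
  "Hacking/IT Incident, Other" = 1,
  "Hacking/IT Incident, Theft, Unauthorized Access/Disclosure" = 1,
  "Hacking/IT Incident, Unauthorized Access/Disclosure" = 1,
  "Improper Disposal" = 13,
  "Improper Disposal, Theft" = 1,
  "Improper Disposal, Unauthorized Access/Disclosure" = 1,
  "Loss" = 42,
  "Loss, Theft" = 3,
  "Loss, Unauthorized Access/Disclosure" = 2,
  "Other" = 16,
  "Other, Unauthorized Access/Disclosure" = 1,
  "Theft" = 186,
  "Theft, Unauthorized Access/Disclosure" = 6,
  "Unauthorized Access/Disclosure" = 144,
  "Unknown" = 4
)

# Total number of small breaches in the table
total_small <- sum(small_breaches)

# Breaches classified purely as theft or purely as unauthorized access/disclosure
theft_or_access <- small_breaches[["Theft"]] +
  small_breaches[["Unauthorized Access/Disclosure"]]

# Share of small breaches in those two categories, as a percentage
share_pct <- round(100 * theft_or_access / total_small, 1)
share_pct
```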
breach_archive %>%
group_by(`Type of Breach`) %>%
summarize('Total-Affected' = sum(`Individuals Affected`)) %>%
knitr::kable()
| Type of Breach | Total-Affected |
|---|---|
| Hacking/IT Incident | 87385368 |
| Hacking/IT Incident, Other | 3720 |
| Hacking/IT Incident, Other, Unauthorized Access/Disclosure | 4354 |
| Hacking/IT Incident, Theft | 27800 |
| Hacking/IT Incident, Theft, Unauthorized Access/Disclosure | 13800 |
| Hacking/IT Incident, Unauthorized Access/Disclosure | 181253 |
| Improper Disposal | 889249 |
| Improper Disposal, Loss | 5690 |
| Improper Disposal, Loss, Theft | 53338 |
| Improper Disposal, Theft | 501 |
| Improper Disposal, Theft, Unauthorized Access/Disclosure | 17300 |
| Improper Disposal, Unauthorized Access/Disclosure | 727 |
| Loss | 7821407 |
| Loss, Other | 34534 |
| Loss, Other, Theft | 2600 |
| Loss, Theft | 98965 |
| Loss, Unauthorized Access/Disclosure | 3210 |
| Loss, Unauthorized Access/Disclosure, Unknown | 2533 |
| Loss, Unknown | 7335 |
| Other | 923010 |
| Other, Theft | 10259 |
| Other, Theft, Unauthorized Access/Disclosure | 28396 |
| Other, Unauthorized Access/Disclosure | 140544 |
| Other, Unknown | 317082 |
| Theft | 18552308 |
| Theft, Unauthorized Access/Disclosure | 242368 |
| Unauthorized Access/Disclosure | 5566337 |
| Unknown | 1915690 |
The largest numbers of individuals affected come from hacking/IT incidents (about 87.4 million), theft (about 18.6 million), and loss (about 7.8 million).
breach_archive %>%
mutate(year = year(`Breach Submission Date`)) %>%
group_by(`year`) %>%
summarize('Total-Affected' = sum(`Individuals Affected`)) %>%
knitr::kable()
| year | Total-Affected |
|---|---|
| 2009 | 134773 |
| 2010 | 5931508 |
| 2011 | 13149977 |
| 2012 | 2837677 |
| 2013 | 7007220 |
| 2014 | 8139638 |
| 2015 | 80142606 |
| 2016 | 6096750 |
| 2017 | 808480 |
| 2018 | 1049 |
2015 (about 80.1 million) and 2011 (about 13.1 million) had the most individuals affected.
breach_archive %>%
mutate(year = year(`Breach Submission Date`)) %>%
ggplot(aes(x = `year`, y = `Individuals Affected`)) +
geom_col()+
labs(y = 'Total Affected')
The chart shows the same pattern: 2015 and 2011 had the most affected individuals.
breach_archive %>%
arrange(desc(`Individuals Affected`)) %>%
select(`Name of Covered Entity`, `Covered Entity Type`, `Individuals Affected`, `Breach Submission Date`, `Type of Breach`, `Location of Breached Information`) %>%
head(25) %>%
knitr::kable()
| Name of Covered Entity | Covered Entity Type | Individuals Affected | Breach Submission Date | Type of Breach | Location of Breached Information |
|---|---|---|---|---|---|
| Anthem (Working file) | Health Plan | 78800000 | 2015-02-13 | Hacking/IT Incident | Network Server |
| Science Applications International Corporation (SA | Business Associate | 4900000 | 2011-11-04 | Loss | Other |
| Advocate Health and Hospitals Corporation, d/b/a Advocate Medical Group | Healthcare Provider | 4029530 | 2013-08-23 | Theft | Desktop Computer |
| 21st Century Oncology | Healthcare Provider | 2213597 | 2016-03-04 | Hacking/IT Incident | Network Server |
| Xerox State Healthcare, LLC | Business Associate | 2000000 | 2014-09-10 | Unauthorized Access/Disclosure | Desktop Computer, Email, Laptop, Network Server, Other, Other Portable Electronic Device |
| IBM | Business Associate | 1900000 | 2011-04-14 | Unknown | Other |
| GRM Information Management Services | Business Associate | 1700000 | 2011-02-11 | Theft | Electronic Medical Record, Other |
| AvMed, Inc. | Health Plan | 1220000 | 2010-06-03 | Theft | Laptop |
| Montana Department of Public Health & Human Services | Health Plan | 1062509 | 2014-07-07 | Hacking/IT Incident | Network Server |
| The Nemours Foundation | Healthcare Provider | 1055489 | 2011-10-07 | Loss | Other |
| BlueCross BlueShield of Tennessee, Inc. | Health Plan | 1023209 | 2010-11-01 | Theft | Other |
| Sutter Medical Foundation | Healthcare Provider | 943434 | 2011-11-17 | Theft | Desktop Computer |
| Valley Anesthesiology Consultants, Inc. d/b/a Valley Anesthesiology and Pain Consultants | Healthcare Provider | 882590 | 2016-08-12 | Hacking/IT Incident | Network Server |
| Horizon Healthcare Services, Inc., doing business as Horizon Blue Cross Blue Shield of New Jersey, and its affiliates | Business Associate | 839711 | 2014-01-03 | Theft | Laptop |
| Iron Mountain Data Products, Inc. (now known as | Business Associate | 800000 | 2010-07-19 | Loss | Electronic Medical Record, Other, Other Portable Electronic Device |
| Utah Department of Technology Services | Business Associate | 780000 | 2012-04-11 | Hacking/IT Incident | Network Server |
| AHMC Healthcare Inc. and affiliated Hospitals | Healthcare Provider | 729000 | 2013-10-25 | Theft | Laptop |
| EISENHOWER MEDICAL CENTER | Healthcare Provider | 514330 | 2011-03-30 | Theft | Desktop Computer |
| Radiology Regional Center, PA | Healthcare Provider | 483063 | 2016-02-12 | Loss | Paper/Films |
| Puerto Rico Department of Health - Triple S Management Corp. | Health Plan | 475000 | 2010-11-04 | Unauthorized Access/Disclosure | Network Server |
| St Joseph Health System | Healthcare Provider | 405000 | 2014-02-05 | Hacking/IT Incident | Network Server |
| Spartanburg Regional Healthcare System | Healthcare Provider | 400000 | 2011-05-27 | Theft | Desktop Computer |
| Triple-S Salud, Inc. - Breach Case#2 | Health Plan | 398000 | 2014-01-24 | Theft | Network Server |
| Triple-S Salud, Inc. | Health Plan | 398000 | 2010-11-18 | Theft | Network Server |
| Community Health Plan of Washington | Health Plan | 381504 | 2016-12-21 | Hacking/IT Incident | Network Server, Other |
This table lists the 25 largest breaches reported since 2009, ranked by the number of individuals affected.
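The Anthem entry at the top of this table helps explain the 2015 spike in the yearly totals. As a quick check, the sketch below compares that single breach with the 2015 total. Both figures are copied from the tables in this report, and the variable names are only illustrative.

```r
# Individuals affected by the Anthem breach (submitted 2015-02-13), from the table above
anthem_affected <- 78800000

# Total individuals affected in 2015, from the yearly totals table
total_2015 <- 80142606

# Percentage of the 2015 total that comes from this one breach
anthem_share_pct <- round(100 * anthem_affected / total_2015, 1)

# Individuals affected in 2015 by all other breaches combined
other_2015 <- total_2015 - anthem_affected

anthem_share_pct
other_2015
```

Without that one report, 2015 would look much more like the surrounding years.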
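breach_archive %>%
group_by(State) %>%
summarize(`Total Affected` = sum(`Individuals Affected`)) %>%
# Keep the ten states with the highest totals (no separate arrange() is needed)
slice_max(`Total Affected`, n = 10) %>%
# Order the bars from largest to smallest instead of alphabetically
ggplot(aes(x = reorder(State, -`Total Affected`), y = `Total Affected`)) +
geom_col() +
labs(title = 'Individuals affected by state', x = 'State')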
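# Note: `Hacking` is not among the fields listed in the introduction; this chunk
# assumes it is a 0/1 indicator column created in an earlier step
breach_archive %>%
mutate(`month` = month(`Breach Submission Date`)) %>%
group_by(`month`) %>%
summarize(`Hacking total` = sum(`Hacking`)) %>%
ggplot(aes(x = `month`, y = `Hacking total`))+
geom_col()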
Covered_Entity_Breach <-
breach_archive %>%
group_by(`Covered Entity Type`) %>%
summarize(`Total Affected` = sum(`Individuals Affected`))
breach_archive %>%
mutate(`wday` = wday(`Breach Submission Date`, label = TRUE)) %>%
group_by(`wday`) %>%
summarize(`Breaches` = n()) %>%
ggplot(aes(x = `wday`, y = `Breaches`))+
geom_col()
# Years with more than 50 business associate breaches and more than 150 healthcare provider breaches
breach_archive %>%
mutate(year = year(`Breach Submission Date`)) %>%
group_by(year) %>%
summarize(type = sum(`Covered Entity Type` == 'Business Associate'),
type2 = sum(`Covered Entity Type` == 'Healthcare Provider')) %>%
filter(type > 50) %>%
filter(type2 > 150) %>%
knitr::kable()
| year | type | type2 |
|---|---|---|
| 2013 | 64 | 187 |
| 2014 | 67 | 179 |
breach_archive %>%
mutate(`year` = year(`Breach Submission Date`)) %>%
group_by(`year`, `Type of Breach`) %>%
# .groups = "drop" returns an ungrouped result and avoids the grouping message
summarize(`Breaches` = n(), .groups = "drop") %>%
# Map fill to the column (backticks), not to the constant string 'Type of Breach'
ggplot(aes(x = `year`, y = `Breaches`, fill = `Type of Breach`))+
geom_col()
I am interested in where the breached information was located, by state, because this can help inform policy makers about data control. For example, if most of a state's breaches involve paper documents, procedures could be revised so that documents must be checked out, which adds accountability.
breach_archive %>%
group_by(`State`, `Location of Breached Information`) %>%
summarize(`Breach` = n(), `Total Affected` = sum(`Individuals Affected`), .groups = "drop") %>%
# Keep only state/location pairs with at least five reported breaches
filter(Breach >= 5) %>%
knitr::kable()
| State | Location of Breached Information | Breach | Total Affected |
|---|---|---|---|
| AL | Laptop | 5 | 16397 |
| AL | Network Server | 6 | 63191 |
| AZ | Laptop | 6 | 12154 |
| AZ | Network Server | 5 | 944126 |
| AZ | Other Portable Electronic Device | 5 | 17387 |
| AZ | Paper/Films | 11 | 56053 |
| CA | Desktop Computer | 29 | 848887 |
| CA | Electronic Medical Record | 7 | 14350 |
| CA | | 19 | 43760 |
| CA | Laptop | 31 | 1231289 |
| CA | Network Server | 23 | 257537 |
| CA | Other | 18 | 159926 |
| CA | Other Portable Electronic Device | 10 | 119271 |
| CA | Other, Other Portable Electronic Device | 6 | 51459 |
| CA | Paper/Films | 48 | 226957 |
| CO | | 5 | 19771 |
| CO | Laptop | 5 | 11285 |
| CO | Paper/Films | 9 | 13903 |
| CT | Laptop | 5 | 77348 |
| CT | Paper/Films | 5 | 8849 |
| DC | Paper/Films | 5 | 9656 |
| FL | Desktop Computer | 9 | 42777 |
| FL | Electronic Medical Record | 11 | 217870 |
| FL | | 9 | 27816 |
| FL | Laptop | 13 | 1270873 |
| FL | Network Server | 14 | 2618172 |
| FL | Other | 12 | 1116298 |
| FL | Other Portable Electronic Device | 6 | 6983 |
| FL | Paper/Films | 36 | 638792 |
| GA | | 5 | 15644 |
| GA | Laptop | 13 | 43715 |
| GA | Network Server | 7 | 47356 |
| GA | Other | 5 | 333760 |
| GA | Paper/Films | 13 | 46481 |
| IL | | 8 | 29466 |
| IL | Laptop | 12 | 54105 |
| IL | Network Server | 11 | 57554 |
| IL | Other | 5 | 16435 |
| IL | Paper/Films | 30 | 241601 |
| IN | Laptop | 11 | 297188 |
| IN | Network Server | 9 | 78848789 |
| IN | Other | 8 | 30045 |
| IN | Paper/Films | 11 | 323737 |
| KY | | 6 | 11714 |
| KY | Laptop | 7 | 22969 |
| KY | Other Portable Electronic Device | 5 | 10828 |
| MA | Desktop Computer | 5 | 21817 |
| MA | Laptop | 7 | 24318 |
| MA | Network Server | 7 | 34859 |
| MA | Paper/Films | 6 | 56705 |
| MD | Laptop | 5 | 225067 |
| MD | Paper/Films | 5 | 17815 |
| MI | | 7 | 9640 |
| MI | Network Server | 6 | 32651 |
| MI | Paper/Films | 10 | 24061 |
| MN | Laptop | 5 | 32729 |
| MN | Network Server | 10 | 50897 |
| MN | Paper/Films | 11 | 38851 |
| MO | Other | 8 | 26764 |
| MO | Paper/Films | 8 | 33978 |
| NC | | 6 | 102794 |
| NC | Network Server | 5 | 57701 |
| NC | Other | 8 | 85957 |
| NC | Other Portable Electronic Device | 5 | 18185 |
| NC | Paper/Films | 12 | 29722 |
| NM | Laptop | 7 | 25603 |
| NM | Paper/Films | 5 | 19186 |
| NY | Desktop Computer | 10 | 94080 |
| NY | | 8 | 65115 |
| NY | Laptop | 14 | 57818 |
| NY | Network Server | 7 | 72717 |
| NY | Other Portable Electronic Device | 12 | 46197 |
| NY | Paper/Films | 15 | 46618 |
| OH | Laptop | 9 | 167728 |
| OH | Network Server | 6 | 326885 |
| OH | Other | 7 | 22467 |
| OH | Paper/Films | 19 | 145447 |
| OK | | 5 | 18614 |
| OR | Laptop | 7 | 62721 |
| PA | | 5 | 6236 |
| PA | Laptop | 8 | 110472 |
| PA | Network Server | 8 | 106181 |
| PA | Other | 11 | 164622 |
| PA | Paper/Films | 15 | 44167 |
| PR | Laptop | 5 | 48051 |
| PR | Network Server | 5 | 1284000 |
| PR | Paper/Films | 7 | 92238 |
| RI | Paper/Films | 7 | 18276 |
| SC | Laptop | 5 | 53896 |
| SC | Paper/Films | 8 | 15221 |
| TN | Laptop | 9 | 62460 |
| TN | Network Server | 6 | 423113 |
| TN | Other | 6 | 1137594 |
| TN | Paper/Films | 12 | 25411 |
| TX | Desktop Computer | 13 | 156636 |
| TX | Electronic Medical Record | 6 | 52129 |
| TX | | 11 | 52858 |
| TX | Laptop | 30 | 122388 |
| TX | Network Server | 23 | 907861 |
| TX | Other | 16 | 356766 |
| TX | Other Portable Electronic Device | 11 | 47238 |
| TX | Other, Other Portable Electronic Device | 6 | 59730 |
| TX | Paper/Films | 24 | 248123 |
| VA | Other Portable Electronic Device | 5 | 14922 |
| VA | Paper/Films | 7 | 186829 |
| WA | Laptop | 7 | 17561 |
| WA | Network Server | 5 | 37112 |
| WA | Paper/Films | 10 | 17838 |
| WI | Laptop | 5 | 14626 |
Paper/film breaches are the most common, but breaches of electronic records, such as those on network servers and laptops, affect far more individuals. This leads me to believe that tighter controls and security should be placed on digital records, particularly in larger states like California and Florida.
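One way to make this comparison concrete is to look at the average number of individuals affected per breach for each location. The sketch below assumes `dplyr` is installed and uses only six rows copied from the table above for California and Florida. `location_rows` and `Affected per Breach` are hypothetical names introduced for illustration.

```r
library(dplyr)

# Six rows copied from the state/location table above (CA and FL only)
location_rows <- tibble::tribble(
  ~State, ~Location,        ~Breach, ~`Total Affected`,
  "CA",   "Paper/Films",    48,      226957,
  "CA",   "Laptop",         31,      1231289,
  "CA",   "Network Server", 23,      257537,
  "FL",   "Paper/Films",    36,      638792,
  "FL",   "Laptop",         13,      1270873,
  "FL",   "Network Server", 14,      2618172
)

# Average individuals affected per breach, largest first
per_breach <- location_rows %>%
  mutate(`Affected per Breach` = `Total Affected` / Breach) %>%
  arrange(desc(`Affected per Breach`))

per_breach
```

In this subset, paper/film breaches are the most frequent in both states, yet they have the lowest average number of individuals affected per breach in each state.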