The data used throughout this analysis comes from the Office for Civil Rights (OCR) within the US Department of Health and Human Services (HHS). This office is responsible for collecting and reporting breaches of protected health information (PHI), as mandated by law. Part of the law requires that the OCR report cases where covered entities (CEs, the organizations responsible for protecting health information) have a breach that affects 500 or more individuals.
The data can be found at this link: http://asayanalytics.com/breach_archive-csv
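A minimal sketch of loading the CSV with only the Python standard library. It assumes the file has been saved locally and reads it from an open text stream. Besides the `Name of Covered Entity`, `State`, `Covered Entity Type` and `Individuals Affected` columns that appear in the tables in this post, it assumes columns named `Breach Submission Date` (in MM/DD/YYYY form) and `Type of Breach`. Those two names and the date format are assumptions, and the output keys are hypothetical.

```python
import csv
from datetime import datetime

# Assumed format of the "Breach Submission Date" column, e.g. 05/14/2015.
DATE_FORMAT = "%m/%d/%Y"


def load_breaches(stream):
    """Read breach rows from a CSV text stream into dicts with typed fields."""
    breaches = []
    for row in csv.DictReader(stream):
        breaches.append({
            "entity": row["Name of Covered Entity"],
            "state": row["State"],
            "entity_type": row["Covered Entity Type"],
            # Strip thousands separators in case the export includes them.
            "affected": int(row["Individuals Affected"].replace(",", "")),
            "submitted": datetime.strptime(
                row["Breach Submission Date"].strip(), DATE_FORMAT
            ).date(),
            "breach_type": row["Type of Breach"],
        })
    return breaches
```

For example, with a hypothetical local file name: `with open("breach_archive.csv", newline="") as f: breaches = load_breaches(f)`.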
This chart breaks down incidents at the state level. As expected, the larger states had more incidents, along with a wider range in the number of people affected by each incident. CA, for example, had 207 incidents and FL had 124. Because the larger states have both more incidents and more people, they also lead in total individuals affected. What is interesting is that IN and NJ have extremely large affected totals, even though they are not states I would have expected, off the top of my head, to lead in data breaches.
## # A tibble: 52 x 6
## State incident_by_state average_affected unauthorized_access_tot~ sd_affected
## <chr> <int> <dbl> <int> <dbl>
## 1 AK 6 1509. 2 864.
## 2 AL 25 43257. 6 187830.
## 3 AR 18 6387. 9 7989.
## 4 AZ 41 27733. 11 137332.
## 5 CA 207 14745. 50 63447.
## 6 CO 32 6773. 13 18580.
## 7 CT 25 8737. 8 20716.
## 8 DC 10 3852. 4 5111.
## 9 DE 2 1781 1 144.
## 10 FL 124 48402. 46 247108.
## # ... with 42 more rows, and 1 more variable: total_affected <dbl>
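The per-state columns in the table above amount to a group-by over the state column. The sketch below reproduces them with the standard library from `(state, individuals_affected, breach_type)` tuples. The tuple layout, the function name `summarize_by_state`, the key `unauthorized_access_count` (a stand-in for the truncated column header), and the `"Unauthorized Access/Disclosure"` label are all assumptions. The standard deviation is the sample (n - 1) version, which is also what R's `sd()` computes.

```python
import statistics
from collections import defaultdict

# Assumed label used in the "Type of Breach" column for unauthorized access.
UNAUTHORIZED = "Unauthorized Access/Disclosure"


def summarize_by_state(records):
    """Per-state summary from (state, individuals_affected, breach_type) tuples."""
    affected_by_state = defaultdict(list)
    unauthorized = defaultdict(int)
    for state, affected, breach_type in records:
        affected_by_state[state].append(affected)
        if breach_type == UNAUTHORIZED:
            unauthorized[state] += 1

    summary = {}
    for state, values in sorted(affected_by_state.items()):
        summary[state] = {
            "incident_by_state": len(values),
            "average_affected": statistics.mean(values),
            # Hypothetical name standing in for the truncated column header.
            "unauthorized_access_count": unauthorized[state],
            # Sample standard deviation (n - 1), like R's sd(); None for one incident.
            "sd_affected": statistics.stdev(values) if len(values) > 1 else None,
            "total_affected": sum(values),
        }
    return summary
```

Returning `None` for a single-incident state mirrors R, where `sd()` of one value is `NA`.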
Healthcare data breaches can be extremely serious. Information that people have entrusted to companies becomes available to those who may not have the best intentions. This graphic shows that breaches appear to have peaked in 2014 and have generally declined in the years since.
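One way to check the 2014 peak is to count breaches by calendar year and take the maximum. This sketch assumes each breach is represented by a `datetime.date` for its submission date; the function names are hypothetical.

```python
from collections import Counter


def breaches_per_year(submission_dates):
    """Count breaches by the calendar year of each submission date."""
    return Counter(d.year for d in submission_dates)


def peak_year(counts):
    """Year with the most breaches; ties go to the earliest year."""
    return min(counts, key=lambda year: (-counts[year], year))
```

Breaking ties toward the earliest year keeps the result deterministic when two years have the same count.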
Not all data breaches are the same; they vary enormously in how many people they affect. Below you will find the top 25 data breaches in this data set. Notice that the largest, Anthem, affected 78.8 million individuals, roughly 16 times as many as the next largest breach (4.9 million).
## # A tibble: 25 x 2
## `Individuals Affected` `Name of Covered Entity`
## <dbl> <chr>
## 1 78800000 Anthem, Inc. Affiliated Covered Entity
## 2 4900000 Science Applications International Corporation (SA
## 3 4029530 Advocate Health and Hospitals Corporation, d/b/a Advo~
## 4 2213597 21st Century Oncology
## 5 2000000 Xerox State Healthcare, LLC
## 6 1900000 IBM
## 7 1700000 GRM Information Management Services
## 8 1220000 AvMed, Inc.
## 9 1062509 Montana Department of Public Health & Human Services
## 10 1055489 The Nemours Foundation
## # ... with 15 more rows
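The top-25 list is a sort on `Individuals Affected` followed by taking the first 25 rows. Here is a heap-based sketch using `heapq.nlargest` over `(individuals_affected, entity_name)` pairs. The tuple layout and function names are assumptions. The second function quantifies how far the largest breach stands apart from the rest.

```python
import heapq


def top_breaches(records, n=25):
    """The n (individuals_affected, entity_name) pairs with the most people affected."""
    return heapq.nlargest(n, records, key=lambda record: record[0])


def largest_to_runner_up_ratio(records):
    """How many times larger the biggest breach is than the second biggest."""
    first, second = heapq.nlargest(2, records, key=lambda record: record[0])
    return first[0] / second[0]
```

With the first two rows of the table, 78,800,000 / 4,900,000 gives a ratio of about 16.1.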
The chart below gives some insight into which states have been hit hardest by their breaches. Note that Indiana has a far larger number of individuals affected than the other states. This is due to the Anthem breach, which affected just under 79 million people.
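To check how much of a state's total comes from a single incident, you can divide each state's largest breach by that state's total affected. The sketch below assumes `(state, individuals_affected)` pairs, and the function name is hypothetical. A share close to 1 means one breach dominates the state's total, which is the situation the paragraph above describes for Anthem and Indiana.

```python
from collections import defaultdict


def largest_breach_share(records):
    """For each state, the fraction of total affected due to its single largest breach."""
    by_state = defaultdict(list)
    for state, affected in records:
        by_state[state].append(affected)
    return {
        state: max(values) / sum(values)
        for state, values in by_state.items()
        # Skip states whose total is zero to avoid dividing by zero.
        if sum(values) > 0
    }
```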
Knowing when healthcare hacking incidents occur can be extremely useful in deterring these types of threats. May has the most healthcare hacking incidents, with April a close second. Overall, incidents appear to peak in every other quarter, with dips in between.
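A monthly count of hacking incidents can be sketched as a filter on breach type followed by a count per month. The label `"Hacking/IT Incident"` and the `(submission_date, breach_type)` pair layout are assumptions. The sketch counts by month number (1 to 12) so the result does not depend on the system locale; `calendar.month_name` can turn the numbers into names for display.

```python
from collections import Counter

# Assumed label for hacking incidents in the "Type of Breach" column.
HACKING_LABEL = "Hacking/IT Incident"


def hacking_by_month(records):
    """Count hacking incidents per month number from (submission_date, breach_type) pairs."""
    # Substring test so combined types such as "Hacking/IT Incident, Theft" count too.
    return Counter(
        submitted.month
        for submitted, breach_type in records
        if HACKING_LABEL in breach_type
    )
```

Calling `.most_common(2)` on the result gives the two busiest months.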
When examining breaches, it is useful to understand which type of entity has the most trouble keeping its data secure. This table shows that healthcare providers experience by far the most breaches: 1,220 of the 1,709 breaches in the table, or about 71%.
## # A tibble: 4 x 2
## `Covered Entity Type` number_of_breaches
## <chr> <int>
## 1 Business Associate 285
## 2 Health Plan 200
## 3 Healthcare Clearing House 4
## 4 Healthcare Provider 1220
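To put "by far" into numbers, here is a small sketch that converts the counts in the table above into shares of the total. The counts are copied from the table; the function name `shares` is hypothetical.

```python
def shares(counts):
    """Convert a mapping of category -> count into category -> fraction of total."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Counts copied from the covered entity type table above.
breaches_by_type = {
    "Business Associate": 285,
    "Health Plan": 200,
    "Healthcare Clearing House": 4,
    "Healthcare Provider": 1220,
}

# Print the categories from largest to smallest share.
for category, share in sorted(shares(breaches_by_type).items(), key=lambda kv: -kv[1]):
    print(f"{category:<26} {share:6.1%}")
```

Healthcare providers come out at roughly 71% of the breaches in the table.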
Even though breaches may occur earlier in the week, they tend to be reported just before the weekend. This could be because employees want to report an incident before the weekend so that there is some time for the heat to pass.
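Counting reports by weekday only needs the weekday of each report date. This sketch assumes the dates come from the submission date column, which records when a breach was reported rather than when it occurred. It uses `date.weekday()` with an explicit list of day names so the output does not depend on the system locale. The function name is hypothetical.

```python
from collections import Counter

# date.weekday() returns 0 for Monday through 6 for Sunday.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def reports_by_weekday(submission_dates):
    """Count report submissions per weekday, ordered Monday through Sunday."""
    counts = Counter(d.weekday() for d in submission_dates)
    return {WEEKDAYS[i]: counts.get(i, 0) for i in range(7)}
```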
As shown by the table below, only two years, 2013 and 2014, met the above criteria for both Business Associates and Healthcare Providers. The number of breaches required for Associates and Providers to appear on this list is relatively high, which explains why there are so few observations. Since these years had some of the most offenses, studying them could help us identify the practices involved and develop countermeasures.
## # A tibble: 6 x 3
## # Groups: Year [4]
## Year `Covered Entity Type` number_of_breaches
## <fct> <chr> <int>
## 1 2013 Business Associate 64
## 2 2013 Healthcare Provider 187
## 3 2014 Business Associate 67
## 4 2014 Healthcare Provider 179
## 5 2015 Healthcare Provider 155
## 6 2016 Healthcare Provider 182
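The filtering thresholds themselves are not restated in this section. This sketch therefore starts from the already-filtered rows in the table above and asks which years contain both a Business Associate row and a Healthcare Provider row. The function name is hypothetical.

```python
from collections import defaultdict


def years_with_all_types(rows, required_types):
    """Years whose rows include every required covered entity type."""
    types_by_year = defaultdict(set)
    for year, entity_type, _count in rows:
        types_by_year[year].add(entity_type)
    return sorted(year for year, types in types_by_year.items() if required_types <= types)


# (Year, Covered Entity Type, number_of_breaches) rows from the table above.
filtered_rows = [
    (2013, "Business Associate", 64),
    (2013, "Healthcare Provider", 187),
    (2014, "Business Associate", 67),
    (2014, "Healthcare Provider", 179),
    (2015, "Healthcare Provider", 155),
    (2016, "Healthcare Provider", 182),
]

print(years_with_all_types(filtered_rows, {"Business Associate", "Healthcare Provider"}))
```

This prints `[2013, 2014]`.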
This table shows how each type of breach has trended over the years, along with the number of breaches of each type per year. Overall, 2013 and 2014 had the most breaches. Theft appears to be the most common type of breach in total, with 765 theft breaches from 2009 through 2018, compared with 475 in the next-largest visible column, u_a_d_total.
## # A tibble: 10 x 9
## Year hack_it_total imp_disp_total loss_total theft_total u_a_d_total
## <fct> <int> <int> <int> <int> <int>
## 1 2009 0 0 1 15 0
## 2 2010 0 10 20 135 10
## 3 2011 0 7 18 122 34
## 4 2012 0 8 20 124 40
## 5 2013 0 13 24 131 73
## 6 2014 0 11 30 111 98
## 7 2015 0 6 23 64 80
## 8 2016 0 7 12 46 96
## 9 2017 0 4 9 17 43
## 10 2018 0 0 0 0 1
## # ... with 3 more variables: unknown_total <int>, other_total <int>,
## # total_breaches <int>
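The year-by-type table is a cross-tabulation of breach type against year. This sketch builds one from `(year, breach_type)` pairs and sums each column. The pair layout and function names are assumptions. The usage part feeds in the five columns visible in the printed table, with shortened names taken from its headers, to check the claim about theft.

```python
from collections import Counter, defaultdict


def pivot_by_year(records):
    """Cross-tabulate (year, breach_type) pairs into {year: Counter(type -> count)}."""
    table = defaultdict(Counter)
    for year, breach_type in records:
        table[year][breach_type] += 1
    return table


def column_totals(table):
    """Sum each breach type across all years."""
    totals = Counter()
    for row in table.values():
        totals.update(row)
    return totals


# The five columns visible in the printed table (other columns were truncated).
columns = ("hack_it", "imp_disp", "loss", "theft", "u_a_d")
printed_rows = {
    2009: (0, 0, 1, 15, 0),
    2010: (0, 10, 20, 135, 10),
    2011: (0, 7, 18, 122, 34),
    2012: (0, 8, 20, 124, 40),
    2013: (0, 13, 24, 131, 73),
    2014: (0, 11, 30, 111, 98),
    2015: (0, 6, 23, 64, 80),
    2016: (0, 7, 12, 46, 96),
    2017: (0, 4, 9, 17, 43),
    2018: (0, 0, 0, 0, 1),
}
table = {year: Counter(dict(zip(columns, values))) for year, values in printed_rows.items()}
totals = column_totals(table)
print(totals.most_common(2))
```

Among the visible columns, theft leads with 765 breaches, followed by u_a_d with 475.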
Just as we looked at Hacking/IT breaches by month, it is interesting to see how all breaches line up by month. The trends are extremely similar, with April and May still standing out, although the dips between April and October are far less pronounced. This suggests that total incidents follow a pattern similar to that of Hacking/IT incidents.
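One way to quantify "a similar pattern" is the Pearson correlation between the twelve monthly counts for Hacking/IT incidents and the twelve monthly counts for all breaches. Because correlation ignores scale, the much larger counts for all breaches do not swamp the comparison. The monthly counts are not printed in this section, so the sketch only defines the calculation. On Python 3.10 or later, `statistics.correlation` computes the same quantity.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 for the January-to-December counts would support the claim that the two series rise and fall together.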