The Office for Civil Rights (OCR) within the Department of Health and Human Services is responsible for collecting reports of breaches of Protected Health Information (PHI), as mandated by law. A Covered Entity (CE) is required to report any PHI breach that affects 500 or more individuals. A Covered Entity can be any organization that collects and stores Protected Health Information. This data set includes 1,709 breaches spanning 2009-2018. Data reported for each of these breaches includes:
• Name of the covered entity (Organization responsible for the PHI)
• State (US State where the breach was reported)
• Covered Entity Type (Type of organization responsible for the PHI)
• Individuals Affected (Number of records affected by the breach)
• Breach submission date (Date the breach was reported by the CE)
• Type of breach (how unauthorized access to the PHI was obtained)
• Location of breached information (Where was the PHI when unauthorized access was obtained)
• Business associate present (Was a business associate such as a consultant or contractor involved in the breach)
• Web description (An optional statement explaining what happened and the resolution)
The analysis contained in this document is within the scope of an assignment for an R-Studio Analytics Programming class. I have conducted both directed and self-guided analysis of this data to uncover trends, gain a deeper understanding of the data, and explore some questions that I find interesting.
The first task I conducted was to create some brief summary statistics in order to get an understanding of the data, as well as to investigate some questions that were personally interesting to me. The initial question I wanted to investigate was the distribution of breaches and individuals affected for each of the 4 Covered Entity Types in this data set.
| Covered Entity Type | Number of Breaches | Median Individuals Affected | Average Individuals Affected |
|---|---|---|---|
| Business Associate | 285 | 3164.0 | 59113.34 |
| Health Plan | 200 | 2807.0 | 430357.68 |
| Healthcare Clearing House | 4 | 3252.0 | 4438.50 |
| Healthcare Provider | 1220 | 1963.5 | 17469.74 |
I chose to look at the median and average individuals affected side by side to get a sense of the skew in Individuals Affected for each of these Entity Types. The biggest disparity between those 2 statistics is for Health Plans. This suggests that a few Health Plan breaches affected significantly more individuals than the rest, which pulls the average up. We can also see that Healthcare Providers are the number 1 source of breaches in terms of frequency.
The second set of investigative statistics I looked at was purely motivated by my personal experience. I have lived in 3 separate states up to this point in my life: North Carolina, California, and Ohio. Out of curiosity, I wanted to see the number of breaches and the median individuals affected for these 3 states. The results are as follows:
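The skew described above can be checked with a little arithmetic on the table itself. The following is a minimal Python sketch for illustration only, not the R code used for the original analysis. The names `summary`, `skew_ratio`, and `hp_mean_without_anthem` are assumptions introduced here. The leave-one-out step assumes the Anthem breach (78,800,000 individuals, listed as a Health Plan in the largest-breaches table later in this report) is one of the 200 Health Plan breaches, and it works from the rounded averages in the table, so the results are approximate.

```python
# Values copied from the Covered Entity Type summary table in this report.
summary = {
    "Business Associate": {"breaches": 285, "median": 3164.0, "mean": 59113.34},
    "Health Plan": {"breaches": 200, "median": 2807.0, "mean": 430357.68},
    "Healthcare Clearing House": {"breaches": 4, "median": 3252.0, "mean": 4438.50},
    "Healthcare Provider": {"breaches": 1220, "median": 1963.5, "mean": 17469.74},
}


def skew_ratio(row):
    """Mean divided by median; values far above 1 point to a long right tail."""
    return row["mean"] / row["median"]


ratios = {name: skew_ratio(row) for name, row in summary.items()}
most_skewed = max(ratios, key=ratios.get)

# Leave-one-out check (assumption: Anthem is counted among the Health Plans).
ANTHEM = 78_800_000
hp = summary["Health Plan"]
hp_total = hp["mean"] * hp["breaches"]  # approximate, since the mean is rounded
hp_mean_without_anthem = (hp_total - ANTHEM) / (hp["breaches"] - 1)

for name, ratio in ratios.items():
    print(f"{name}: mean/median = {ratio:.1f}")
print(f"Most skewed entity type: {most_skewed}")
print(f"Health Plan mean without Anthem: {hp_mean_without_anthem:,.0f}")
```

Under these assumptions, the Health Plan average falls from about 430,000 to roughly 36,500 once the single Anthem breach is removed, which is consistent with a few very large breaches driving the gap between the median and the average.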
| State | Number of Breaches | Median Individuals Affected |
|---|---|---|
| CA | 207 | 2250.0 |
| NC | 46 | 1777.5 |
| OH | 58 | 1200.0 |
Unsurprisingly, California has the most breaches, since it is a much larger state in terms of population. The median individuals affected is fairly close across these 3 states, rising in steps of roughly 500 from OH (1,200) to NC (1,777.5) to CA (2,250).
The first visualization that I made shows the distribution of the breaches reported in this data set over the years. Its main purpose is to put the data in context and to give an understanding of the time period we are dealing with. The visualization shows heightened levels of data breaches over the 2013-2014 time period, with significantly lower levels at the tails of the data for the years 2009, 2017, and 2018. There was also a fairly consistent level of data breaches from 2010-2012 (around 200) before the spike in 2013.
As was mentioned in the introduction to the data, breaches only need to be reported if there is a PHI breach of 500 individuals or more. The following table of the 25 largest breaches shows how much “or more” can be. Each individual affected represents someone’s private healthcare information, and seeing how many people can be affected by a single breach is quite appalling.
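The per-year tally behind a chart like this can be sketched as follows. This is an illustrative Python sketch rather than the R code used for the original analysis. The function name `breaches_per_year` is an assumption, and the input is only the submission dates from the first eight rows of the largest-breaches table in this report, not the full 1,709-row data set, so the counts do not match the visualization.

```python
from collections import Counter
from datetime import date

# Submission dates copied from the first eight rows of the largest-breaches
# table in this report (a tiny sample, used only to show the mechanics).
submission_dates = [
    "2015-03-13",
    "2011-11-04",
    "2013-08-23",
    "2016-03-04",
    "2014-09-10",
    "2011-04-14",
    "2011-02-11",
    "2010-06-03",
]


def breaches_per_year(iso_dates):
    """Count breaches by the year of their submission date, sorted by year."""
    years = (date.fromisoformat(d).year for d in iso_dates)
    return dict(sorted(Counter(years).items()))


per_year = breaches_per_year(submission_dates)
for year, count in per_year.items():
    print(year, count)
```

The same approach would apply to the full data set by passing every Breach submission date instead of the eight sample values.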
| Name of Covered Entity | Breach Submission Date | State | Covered Entity Type | Individuals Affected | Type of Breach |
|---|---|---|---|---|---|
| Anthem, Inc. Affiliated Covered Entity | 2015-03-13 | IN | Health Plan | 78800000 | Hacking/IT Incident |
| Science Applications International Corporation (SA | 2011-11-04 | VA | Business Associate | 4900000 | Loss |
| Advocate Health and Hospitals Corporation, d/b/a Advocate Medical Group | 2013-08-23 | IL | Healthcare Provider | 4029530 | Theft |
| 21st Century Oncology | 2016-03-04 | FL | Healthcare Provider | 2213597 | Hacking/IT Incident |
| Xerox State Healthcare, LLC | 2014-09-10 | TX | Business Associate | 2000000 | Unauthorized Access/Disclosure |
| IBM | 2011-04-14 | NY | Business Associate | 1900000 | Unknown |
| GRM Information Management Services | 2011-02-11 | NJ | Business Associate | 1700000 | Theft |
| AvMed, Inc. | 2010-06-03 | FL | Health Plan | 1220000 | Theft |
| Montana Department of Public Health & Human Services | 2014-07-07 | MT | Health Plan | 1062509 | Hacking/IT Incident |
| The Nemours Foundation | 2011-10-07 | FL | Healthcare Provider | 1055489 | Loss |
| BlueCross BlueShield of Tennessee, Inc. | 2010-11-01 | TN | Health Plan | 1023209 | Theft |
| Sutter Medical Foundation | 2011-11-17 | AL | Healthcare Provider | 943434 | Theft |
| Valley Anesthesiology Consultants, Inc. d/b/a Valley Anesthesiology and Pain Consultants | 2016-08-12 | AZ | Healthcare Provider | 882590 | Hacking/IT Incident |
| Horizon Healthcare Services, Inc., doing business as Horizon Blue Cross Blue Shield of New Jersey, and its affiliates | 2014-01-03 | NJ | Business Associate | 839711 | Theft |
| Iron Mountain Data Products, Inc. (now known as | 2010-07-19 | PA | Business Associate | 800000 | Loss |
| Utah Department of Technology Services | 2012-04-11 | UT | Business Associate | 780000 | Hacking/IT Incident |
| AHMC Healthcare Inc. and affiliated Hospitals | 2013-10-25 | CA | Healthcare Provider | 729000 | Theft |
| EISENHOWER MEDICAL CENTER | 2011-03-30 | CA | Healthcare Provider | 514330 | Theft |
| Radiology Regional Center, PA | 2016-02-12 | FL | Healthcare Provider | 483063 | Loss |
| Puerto Rico Department of Health - Triple S Management Corp. | 2010-11-04 | PR | Health Plan | 475000 | Unauthorized Access/Disclosure |
| St Joseph Health System | 2014-02-05 | TX | Healthcare Provider | 405000 | Hacking/IT Incident |
| Spartanburg Regional Healthcare System | 2011-05-27 | SC | Healthcare Provider | 400000 | Theft |
| Triple-S Salud, Inc. - Breach Case#2 | 2014-01-24 | PR | Health Plan | 398000 | Theft |
| Triple-S Salud, Inc. | 2010-11-18 | PR | Health Plan | 398000 | Theft |
| Community Health Plan of Washington | 2016-12-21 | WA | Health Plan | 381504 | Hacking/IT Incident |
The following visual shows, in descending order, the 10 states with the highest number of individuals affected. Notably, the y-axis of this graph uses a log10 scale, which makes the difference between bars significantly larger than it appears, especially towards the top. Indiana stands out with a significantly higher number of individuals affected than the rest of the states: its total is 79,576,765, while Florida, the next highest, has only 6,001,825. As we saw in the previous table, the Anthem breach, which affected 78,800,000 individuals, was reported in Indiana, which clearly explains the state's standing at the top of this list.
The following graph shows the number of breaches per month over the 2009-2018 time period. Interestingly, there seems to be a fairly consistent breach level across the months from April to August, with heightened breaches directly before and after. There is also a high level of breaches in December, around the holiday season. With this information, CEs could raise their security levels and emphasize to employees the need for care with PHI during these months.
There are 4 entity types covered in this data, and the following table shows the distribution of breaches in the data set across these 4 Covered Entity Types. Healthcare Clearing Houses, as defined by Smart Data Solutions, are “…the middleman between the healthcare providers and the insurance players.” These clearing houses get access to PHI in order to process medical claims. Unsurprisingly, Healthcare Providers have the highest number of breaches reported in this data set, as they work with PHI most consistently.
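To make the effect of the log scale concrete, the short Python sketch below works through the two state totals quoted above. It is an illustration only, not part of the original R analysis. The variable names are assumptions, and the Anthem figure is taken from the largest-breaches table (78,800,000).

```python
import math

# Totals quoted in the text; the Anthem figure comes from the largest-breaches table.
indiana_total = 79_576_765
florida_total = 6_001_825
anthem = 78_800_000

# On a log10 axis, bar height tracks the logarithm of the value, so the
# visual gap between two bars is the difference of their logs.
log_gap = math.log10(indiana_total) - math.log10(florida_total)

# The actual ratio between the two totals is 10 raised to that gap.
ratio = indiana_total / florida_total

# Share of Indiana's total that comes from the single Anthem breach.
anthem_share = anthem / indiana_total

print(f"log10 gap between bars: {log_gap:.2f}")
print(f"Indiana / Florida ratio: {ratio:.1f}x")
print(f"Anthem share of Indiana total: {anthem_share:.1%}")
```

A gap of only slightly more than one unit on the log axis corresponds to Indiana's total being more than 13 times Florida's, and the Anthem breach alone accounts for about 99% of Indiana's total.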
| Covered Entity Type | Number of Breaches |
|---|---|
| Business Associate | 285 |
| Health Plan | 200 |
| Healthcare Clearing House | 4 |
| Healthcare Provider | 1220 |
Friday is the day of the week with the highest number of reported breaches. Note that this is the day of the week on which the breach was reported, not the day of the actual breach: Covered Entities have a 60-day time frame to report a data breach affecting 500 or more people. This suggests that CEs tend to report their breaches on a Friday, before the end of the week.
One of the questions I investigated was which year(s) had at least 50 breaches from a ‘Business Associate’ Covered Entity Type and at least 150 breaches from a ‘Healthcare Provider’ Covered Entity Type. There are 2 years that meet these criteria: 2013 and 2014.
The following graph shows the number of breaches for each Type of Breach in the years that this data set covers. Theft was the primary culprit up until 2015, and by a significant margin for most of these years. As we’ve moved towards storing and sharing Protected Health Information more digitally, stealing physical copies has become harder and harder, and we can see this change in 2016, when Hacking/IT Incidents (digital theft) surpassed Theft. Unauthorized Access/Disclosure might also be attributed to this digital revolution, as employees are still getting used to sharing PHI digitally, where a mix-up could give an unauthorized person accidental access.
One of my interests in this field is this digital revolution of Protected Health Information, and how it has changed the way that breaches can occur. The following chart shows the frequency with which Electronic Medical Records have been involved in any breach. As expected, breaches involving Electronic Medical Records have increased significantly over time. What I find interesting is that there seems to be a trend where the number of breaches increases for a few years and then drops off: it rises in 2010 and 2011, then falls in 2012; it rises in 2015 and 2016, then falls again in 2017. This may suggest that after a few years healthcare organizations implement new technology or practices that make it more difficult for breaches to occur. This evolving state of Electronic Medical Records, and the ongoing balance between privacy and interoperability, is fascinating.
It is really interesting to me that there have been data breaches where the cause is entirely Unknown. This is painful from 2 perspectives. The first is the Covered Entity side: as the table below suggests, there can be thousands and thousands of individuals affected by a breach, and being unable to understand what happened means no action can be taken to prevent it in the future. The other is that when the healthcare organization notifies the people who have placed their private healthcare records in its hands, it can offer no explanation as to what happened. I can only imagine how frustrating it would be for both sides in this circumstance. Luckily, there are only 13 cases out of the 1,709 breaches in this data set where this has occurred. The following table shows which states have had the most individuals affected by a breach of Unknown origin.
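Deriving the day of the week from a submission date can be sketched as follows. This is an illustrative Python sketch, not the R code used for the original analysis. The function name `weekday_counts` is an assumption, and the input is only the submission dates from the first five rows of the largest-breaches table, so it shows the mechanics and is not evidence for the overall pattern.

```python
from collections import Counter
from datetime import date

# Fixed English names so the output does not depend on the system locale.
# date.weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Submission dates from the first five rows of the largest-breaches table.
submission_dates = [
    "2015-03-13",  # Anthem
    "2011-11-04",  # Science Applications International Corporation
    "2013-08-23",  # Advocate Health and Hospitals Corporation
    "2016-03-04",  # 21st Century Oncology
    "2014-09-10",  # Xerox State Healthcare
]


def weekday_counts(iso_dates):
    """Count reports by the weekday of their submission date."""
    names = (DAY_NAMES[date.fromisoformat(d).weekday()] for d in iso_dates)
    return Counter(names)


counts = weekday_counts(submission_dates)
for day, n in counts.most_common():
    print(day, n)
```

Applied to every Breach submission date in the data set, the same tally would produce the per-weekday totals discussed above.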
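A query of this shape, finding the years that satisfy a minimum count for several entity types at once, can be sketched as below. This is an illustrative Python sketch and not the R code used for the original analysis. The function name `years_meeting` is an assumption. The records are a handful of (year, Covered Entity Type) pairs taken from the largest-breaches table, so the thresholds in the example are deliberately tiny (2 and 1) rather than the 50 and 150 used for the real question, and the resulting years differ from the answer above.

```python
from collections import Counter

# (submission year, Covered Entity Type) for a few rows of the
# largest-breaches table; a small sample, not the full data set.
records = [
    (2010, "Business Associate"),   # Iron Mountain Data Products
    (2010, "Health Plan"),          # AvMed
    (2011, "Business Associate"),   # Science Applications International Corporation
    (2011, "Business Associate"),   # IBM
    (2011, "Business Associate"),   # GRM Information Management Services
    (2011, "Healthcare Provider"),  # The Nemours Foundation
    (2011, "Healthcare Provider"),  # Sutter Medical Foundation
    (2013, "Healthcare Provider"),  # Advocate Health and Hospitals Corporation
    (2013, "Healthcare Provider"),  # AHMC Healthcare
    (2014, "Business Associate"),   # Xerox State Healthcare
    (2014, "Business Associate"),   # Horizon Healthcare Services
    (2014, "Healthcare Provider"),  # St Joseph Health System
    (2014, "Health Plan"),          # Montana Department of Public Health
]


def years_meeting(rows, minimums):
    """Return the sorted years in which every entity type in `minimums`
    has at least the required number of breaches."""
    counts = Counter(rows)  # keys are (year, entity_type) pairs
    years = {year for year, _ in rows}
    return sorted(
        year for year in years
        if all(counts[(year, etype)] >= need for etype, need in minimums.items())
    )


# Toy thresholds sized for the sample above.
qualifying = years_meeting(
    records, {"Business Associate": 2, "Healthcare Provider": 1}
)
print(qualifying)
```

With the full data set, the same function would be called with `{"Business Associate": 50, "Healthcare Provider": 150}`.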
| State | Individuals Affected |
|---|---|
| NY | 1900000 |
| GA | 315000 |
| FL | 7366 |
| IA | 7335 |
| WA | 2700 |