Breaches in Protected Health Information

Pierce Jenkins

2023-02-28


Data Introduction

The Office for Civil Rights (OCR) within the Department of Health and Human Services is responsible for collecting reports of breaches of Protected Health Information (PHI), as mandated by law. A Covered Entity (CE) is required to report any PHI breach that affects 500 or more individuals. A Covered Entity can be any organization that collects and stores Protected Health Information. This data set includes 1,709 breaches spanning 2009 to 2018. Data reported for each of these breaches includes:

• Name of the covered entity (Organization responsible for the PHI)

• State (US State where the breach was reported)

• Covered Entity Type (Type of organization responsible for the PHI)

• Individuals Affected (Number of records affected by the breach)

• Breach submission date (Date the breach was reported by the CE)

• Type of breach (how unauthorized access to the PHI was obtained)

• Location of breached information (Where was the PHI when unauthorized access was obtained)

• Business associate present (Was a business associate such as a consultant or contractor involved in the breach)

• Web description (An optional statement explaining what happened and the resolution)

The analysis contained in this document is within the scope of an assignment for an R-Studio Analytics Programming class. I have conducted both directed and self-guided analysis into this data to uncover trends, gain a deeper understanding of this data, and to analyze some questions that I find interesting.

Summary Statistics

The first task I conducted was to create some brief summary statistics to get an understanding of the data and to investigate some questions that interested me. The initial question was how breaches and individuals affected are distributed across the 4 Covered Entity Types in this data set.

Covered Entity Type | Number of Breaches | Median Individuals Affected | Average Individuals Affected
Business Associate | 285 | 3164.0 | 59113.34
Health Plan | 200 | 2807.0 | 430357.68
Healthcare Clearing House | 4 | 3252.0 | 4438.50
Healthcare Provider | 1220 | 1963.5 | 17469.74

I looked at the median and average individuals affected together to get a sense of the skew in Individuals Affected for each Entity Type. The biggest disparity between the 2 statistics is for Health Plans, where the average (430,357.68) is roughly 150 times the median (2,807). This suggests that a few Health Plan breaches affected far more individuals than the rest, which pulls the average up. We can also see that Healthcare Providers are the number 1 source of breaches in terms of frequency.
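A rough back-of-the-envelope check, using only figures reported in this document, shows how much a single record can move the average. The sketch below assumes the Health Plan average of 430,357.68 over 200 breaches from the table above and the 78,800,000-individual Anthem breach (a Health Plan) listed in the Top 25 table. Because the reported average is rounded, the results are approximate.

```python
# Figures taken from the tables in this document. The average is rounded,
# so everything derived from it is approximate.
n_health_plan = 200            # number of Health Plan breaches
mean_health_plan = 430_357.68  # average individuals affected per breach
anthem = 78_800_000            # largest Health Plan breach (Top 25 table)

# Recover the approximate total, then recompute the mean without Anthem.
total = mean_health_plan * n_health_plan
mean_without_anthem = (total - anthem) / (n_health_plan - 1)
anthem_share = anthem / total

print(f"Approximate total individuals affected: {total:,.0f}")
print(f"Mean without the largest breach: {mean_without_anthem:,.2f}")
print(f"Share of the total from the largest breach: {anthem_share:.1%}")
```

Under these assumptions, the Health Plan average falls from about 430,000 to roughly 36,500 once that one breach is set aside, which is why the median is the more representative statistic here.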

The second set of statistics I looked at was motivated purely by my personal experience. I have lived in 3 separate states up to this point in my life: North Carolina, California, and Ohio. Out of curiosity, I wanted to see the number of breaches and the median individuals affected for these 3 states. The results are as follows:

State | Number of Breaches | Median Individuals Affected
CA | 207 | 2250.0
NC | 46 | 1777.5
OH | 58 | 1200.0

Unsurprisingly, California has the most breaches, since it has a much larger population than the other two states. The median individuals affected is fairly close across the 3 states, rising in steps of roughly 500 from OH to NC to CA.

Data Visualizations

Data Breaches by Year

The first visualization I made shows the distribution of the breaches reported in this data set over the years. Its main purpose is to put the data in context and to give an understanding of the time period we are dealing with. The visualization shows heightened levels of data breaches over the 2013-2014 time period, with significantly lower levels at the tails of the data in 2009, 2017, and 2018. There was also a fairly consistent level of data breaches from 2010 to 2012 (around 200 per year) before the spike in 2013.

Top 25 Largest Healthcare Data Breaches

As mentioned in the introduction to the data, breaches only need to be reported if they affect the PHI of 500 or more individuals. This table shows how much “or more” can be. Each individual affected represents someone’s private healthcare information, and seeing how many people can be affected by a single breach is quite appalling.

Name of Covered Entity | Breach Submission Date | State | Covered Entity Type | Individuals Affected | Type of Breach
Anthem, Inc. Affiliated Covered Entity | 2015-03-13 | IN | Health Plan | 78800000 | Hacking/IT Incident
Science Applications International Corporation (SA | 2011-11-04 | VA | Business Associate | 4900000 | Loss
Advocate Health and Hospitals Corporation, d/b/a Advocate Medical Group | 2013-08-23 | IL | Healthcare Provider | 4029530 | Theft
21st Century Oncology | 2016-03-04 | FL | Healthcare Provider | 2213597 | Hacking/IT Incident
Xerox State Healthcare, LLC | 2014-09-10 | TX | Business Associate | 2000000 | Unauthorized Access/Disclosure
IBM | 2011-04-14 | NY | Business Associate | 1900000 | Unknown
GRM Information Management Services | 2011-02-11 | NJ | Business Associate | 1700000 | Theft
AvMed, Inc. | 2010-06-03 | FL | Health Plan | 1220000 | Theft
Montana Department of Public Health & Human Services | 2014-07-07 | MT | Health Plan | 1062509 | Hacking/IT Incident
The Nemours Foundation | 2011-10-07 | FL | Healthcare Provider | 1055489 | Loss
BlueCross BlueShield of Tennessee, Inc. | 2010-11-01 | TN | Health Plan | 1023209 | Theft
Sutter Medical Foundation | 2011-11-17 | AL | Healthcare Provider | 943434 | Theft
Valley Anesthesiology Consultants, Inc. d/b/a Valley Anesthesiology and Pain Consultants | 2016-08-12 | AZ | Healthcare Provider | 882590 | Hacking/IT Incident
Horizon Healthcare Services, Inc., doing business as Horizon Blue Cross Blue Shield of New Jersey, and its affiliates | 2014-01-03 | NJ | Business Associate | 839711 | Theft
Iron Mountain Data Products, Inc. (now known as | 2010-07-19 | PA | Business Associate | 800000 | Loss
Utah Department of Technology Services | 2012-04-11 | UT | Business Associate | 780000 | Hacking/IT Incident
AHMC Healthcare Inc. and affiliated Hospitals | 2013-10-25 | CA | Healthcare Provider | 729000 | Theft
EISENHOWER MEDICAL CENTER | 2011-03-30 | CA | Healthcare Provider | 514330 | Theft
Radiology Regional Center, PA | 2016-02-12 | FL | Healthcare Provider | 483063 | Loss
Puerto Rico Department of Health - Triple S Management Corp. | 2010-11-04 | PR | Health Plan | 475000 | Unauthorized Access/Disclosure
St Joseph Health System | 2014-02-05 | TX | Healthcare Provider | 405000 | Hacking/IT Incident
Spartanburg Regional Healthcare System | 2011-05-27 | SC | Healthcare Provider | 400000 | Theft
Triple-S Salud, Inc. - Breach Case#2 | 2014-01-24 | PR | Health Plan | 398000 | Theft
Triple-S Salud, Inc. | 2010-11-18 | PR | Health Plan | 398000 | Theft
Community Health Plan of Washington | 2016-12-21 | WA | Health Plan | 381504 | Hacking/IT Incident

Individuals Affected in the 10 States with Highest Count of Individuals Affected

The following visual shows, in descending order, the 10 states with the highest number of individuals affected. Notably, the y-axis uses a log10 scale, so the differences between bars are much larger than they appear, especially towards the top. Indiana stands out with a significantly higher number of individuals affected than the rest of the states: its total is 79,576,765, while Florida, the next highest, has only 6,001,825. As we saw in the previous table, the Anthem breach, which affected 78,800,000 individuals, was reported in Indiana, which clearly explains its standing at the top of this list.
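The effect of the log scale can be checked with a little arithmetic on the two state totals quoted above. This is only an illustrative sketch: the variable names are my own, and the Anthem figure is the 78,800,000 listed in the Top 25 table.

```python
import math

# State totals quoted in the text, and the Anthem figure from the Top 25 table.
indiana_total = 79_576_765
florida_total = 6_001_825
anthem = 78_800_000

# How many times larger Indiana's total is than Florida's.
ratio = indiana_total / florida_total

# How far apart the two bars sit on a log10 axis.
log_gap = math.log10(indiana_total) - math.log10(florida_total)

# How much of Indiana's total comes from the single Anthem breach.
anthem_share_of_indiana = anthem / indiana_total

print(f"Indiana / Florida ratio: {ratio:.2f}")
print(f"Gap on a log10 axis: {log_gap:.2f}")
print(f"Anthem share of Indiana total: {anthem_share_of_indiana:.1%}")
```

Indiana's total is about 13 times Florida's, yet the two bars sit only about 1.1 units apart on a log10 axis, and the Anthem breach alone accounts for about 99% of Indiana's total.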

Dispersion of PHI Breaches by Month

The following graph shows the number of breaches per month over the 2009-2018 time period. Interestingly, the breach level is fairly consistent across the months from April to August, with heightened levels directly before and after. There is also a high level of breaches in December, around the holiday season. With this information, CEs could raise their security levels and emphasize to employees the need for care with PHI during these months.

Breaches by Covered Entity Type

There are 4 entity types in this data, and the following table shows how the breaches in the data set are distributed across them. Healthcare Clearing Houses, as defined by Smart Data Solutions, are “…the middleman between the healthcare providers and the insurance players.” These clearing houses get access to PHI in order to process medical claims. Unsurprisingly, Healthcare Providers have the highest number of breaches reported in this data set, as they handle PHI most consistently.

Covered Entity Type | Number of Breaches
Business Associate | 285
Health Plan | 200
Healthcare Clearing House | 4
Healthcare Provider | 1220

Breaches by Day of Week

Friday is the day of the week with the highest number of reported breaches. Note that this is the day on which the breach was reported, not the day it occurred: Covered Entities have a 60-day time frame to report a breach affecting 500 or more people. This suggests that CEs tend to report their breaches on a Friday, before the end of the week.
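As a minimal sketch of how the day of the week can be derived from a submission date, the example below uses the submission dates of the five largest breaches in the Top 25 table. The helper name `weekday_name` is my own, and five rows are far too few to support any conclusion; the point is only to show the mechanics.

```python
from collections import Counter
from datetime import date

# date.weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Submission dates of the five largest breaches in the Top 25 table.
submission_dates = [
    "2015-03-13",  # Anthem, Inc. Affiliated Covered Entity
    "2011-11-04",  # Science Applications International Corporation
    "2013-08-23",  # Advocate Health and Hospitals Corporation
    "2016-03-04",  # 21st Century Oncology
    "2014-09-10",  # Xerox State Healthcare, LLC
]

def weekday_name(iso_date):
    """Return the English weekday name for a YYYY-MM-DD string."""
    return DAY_NAMES[date.fromisoformat(iso_date).weekday()]

# Tally how many submissions fall on each day of the week.
weekday_counts = Counter(weekday_name(d) for d in submission_dates)
for day, count in weekday_counts.most_common():
    print(day, count)
```

Four of these five dates happen to fall on a Friday, which is in line with the pattern in the full data set.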

Business Associate AND Healthcare Data Breach in a Year

One of the questions I investigated was in which year(s) there were at least 50 breaches from a ‘Business Associate’ Covered Entity Type and at least 150 breaches from a ‘Healthcare Provider’ Covered Entity Type. There are 2 years that meet these criteria: 2013 and 2014.
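The logic behind this kind of two-condition filter can be sketched as follows. This is not the code used for the analysis: the function `years_meeting_both` and its parameters are hypothetical, and the input is only the Business Associate and Healthcare Provider rows of the Top 25 table, so the thresholds are lowered to 2 and 1 for illustration. On the full data set the thresholds would be 50 and 150.

```python
from collections import Counter

BA = "Business Associate"
HP = "Healthcare Provider"

# (submission year, covered entity type) for the Business Associate and
# Healthcare Provider rows of the Top 25 table, in table order.
records = [
    (2011, BA), (2014, BA), (2011, BA), (2011, BA),
    (2014, BA), (2010, BA), (2012, BA),
    (2013, HP), (2016, HP), (2011, HP), (2011, HP), (2016, HP),
    (2013, HP), (2011, HP), (2016, HP), (2014, HP), (2011, HP),
]

def years_meeting_both(records, min_associate, min_provider):
    """Return the years in which both entity types reach their thresholds."""
    counts = Counter(records)  # counts each (year, type) pair
    years = sorted({year for year, _ in records})
    return [
        year for year in years
        if counts[(year, BA)] >= min_associate
        and counts[(year, HP)] >= min_provider
    ]

# Thresholds lowered for the 17-row sample; the report uses 50 and 150.
qualifying_years = years_meeting_both(records, min_associate=2, min_provider=1)
print(qualifying_years)
```

Counting each (year, type) pair once and then testing both conditions per year keeps the two thresholds independent, so either can be changed without touching the other.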

How the Type of Breach has changed over time

The following graph shows the number of breaches for each Type of Breach in the years that this data set covers. Theft was the leading type through 2015, and by a significant margin for most of those years. As Protected Health Information is increasingly stored and shared digitally, stealing physical copies has become harder, and we can see this shift in 2016, when Hacking/IT Incidents (digital theft) surpassed Theft. Unauthorized Access/Disclosure might also be attributed to this digital shift: as employees get used to sharing PHI digitally, a mix-up could give an unauthorized person accidental access.
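A year-by-type tally like the one behind this graph can be sketched with a counter keyed on (year, type). The example below uses only the Theft and Hacking/IT Incident rows of the Top 25 table, so it illustrates the counting approach and is not a reproduction of the graph.

```python
from collections import Counter

# Submission years of the Top 25 rows, grouped by type of breach.
hacking_years = [2015, 2016, 2014, 2016, 2012, 2014, 2016]
theft_years = [2013, 2011, 2010, 2010, 2011, 2014,
               2013, 2011, 2011, 2014, 2010]

# Count breaches per (year, type) pair.
counts = Counter()
counts.update((year, "Hacking/IT Incident") for year in hacking_years)
counts.update((year, "Theft") for year in theft_years)

# Print one line per year; a missing pair counts as 0 in a Counter.
for year in sorted({year for year, _ in counts}):
    theft = counts[(year, "Theft")]
    hacking = counts[(year, "Hacking/IT Incident")]
    print(year, "Theft:", theft, "Hacking/IT Incident:", hacking)
```

Even in this small sample, the Theft rows fall in 2014 or earlier, while the 2015 and 2016 rows are all Hacking/IT Incidents.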

Data Breaches through Electronic Medical Records over time

One of my interests in this field is the digital revolution in Protected Health Information and how it has changed the way that breaches can occur. The following chart shows how frequently Electronic Medical Records have been involved in a breach. As expected, breaches involving Electronic Medical Records have increased significantly over time. What I find interesting is that the number of breaches seems to increase for a few years and then drop off: it rises in 2010 and 2011 and then falls in 2012, and it rises in 2015 and 2016 and then falls again in 2017. This may suggest that after a few years healthcare organizations implement new technology or practices that make it more difficult for breaches to occur. This evolving state of Electronic Medical Records, with its ongoing balance between privacy and interoperability, is fascinating.

Top 5 States with the Most Individuals Affected by Unknown Data Breach

It is really interesting to me that there have been data breaches where the cause is entirely Unknown. This is painful in 2 respects. The first is on the Covered Entity side: as this table suggests, thousands and thousands of individuals can be affected by such a breach, and being unable to understand what happened means no action can be taken to prevent it in the future. The second is that when the healthcare organization notifies the people who placed their private healthcare records in its hands, it has no explanation to offer. I can only imagine how frustrating this would be for both sides. Luckily, this has occurred in only 13 of the 1,709 cases in this data set. The following table shows which states have had the most individuals affected by a breach of Unknown origin.

State | Individuals Affected
NY | 1900000
GA | 315000
FL | 7366
IA | 7335
WA | 2700