Introduction to the data

I am an analyst working for the US Department of Health and Human Services (HHS) in the Office for Civil Rights (OCR). This office is responsible for collecting and reporting disclosures of protected health information (PHI) as mandated by law. Part of the law requires that the OCR report cases where covered entities (CEs, the organizations responsible for protecting health information) have a breach that affects 500 or more individuals. The data reported for each of these breaches include:

• Name of the covered entity (Organization responsible for the PHI)

• State (US State where the breach was reported)

• Covered Entity Type (Type of organization responsible for the PHI)

• Individuals Affected (Number of records affected by the breach)

• Breach submission date (Date the breach was reported by the CE)

• Type of breach (how unauthorized access to the PHI was obtained)

• Location of breached information (Where was the PHI when unauthorized access was obtained)

• Business associate present (Was a business associate such as a consultant or contractor involved in the breach)

• Web description (An optional statement explaining what happened and the resolution)

Summary Table

Below is a table summarizing the overall data. It reports the total number of people affected, along with the average number affected per breach and the standard deviation. I included the total to emphasize just how many people are affected by this issue. I included the average to show how many people are typically affected per breach; the standard deviation shows how dispersed the data are. I also included the total number of breaches and the largest and smallest breaches to show the variation in size throughout the data.

Average Affected | Standard Deviation of Affected | Total Affected | Total Breaches | Largest Group | Smallest Group
72703.15 | 1915658 | 124249678 | 1709 | 78800000 | 500

These summary statistics show that an average of 72,703 people are affected per breach. However, the standard deviation of 1,915,658 is more than 26 times the average, so the data are very dispersed and the average is pulled upward by a few very large breaches. The 1,709 total breaches show that there have been many separate incidents over the years covered. The dispersion is seen again in the largest and smallest groups: the largest breach affected 78,800,000 individuals and the smallest affected 500.
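The figures in the table can be derived from the Individuals Affected column alone. The sketch below is a minimal illustration in Python, not the code used to produce this report. The function name `summarize` and the three-value sample list are hypothetical, and the sketch assumes the sample standard deviation (n - 1 denominator) was used. The last lines check that the reported average agrees with the reported total divided by the reported number of breaches.

```python
import statistics

def summarize(affected):
    """Return the six summary figures for a list of per-breach counts."""
    return {
        "average": statistics.mean(affected),
        # Assumption: sample standard deviation (n - 1 denominator).
        "std_dev": statistics.stdev(affected),
        "total": sum(affected),
        "breaches": len(affected),
        "largest": max(affected),
        "smallest": min(affected),
    }

# Hypothetical toy values, not taken from the OCR data.
sample = [500, 1000, 1500]
summary = summarize(sample)
print(summary)

# Consistency check using figures reported in the table above:
# total affected divided by total breaches should give the average.
reported_average = round(124249678 / 1709, 2)
print(reported_average)
```

Running the same function over the full Individuals Affected column would be one way to regenerate the table.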

Number of Breaches Per Year

This bar plot shows how many breaches there were per year, which lets us see whether there were any spikes and whether recent years are consistent. There was a big spike in breaches from 2009 to 2010. After this large jump, the number of breaches held steady at around 200 for the next three years before jumping again in 2013. An even larger jump followed in 2015, after which the number of breaches declined, falling by almost 150 by 2017.
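The heights of the bars in a plot like this come from counting breaches by the year of the breach submission date. A minimal sketch of that count follows. The function name `breaches_per_year`, the sample dates, and the MM/DD/YYYY date format are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

def breaches_per_year(submission_dates, date_format="%m/%d/%Y"):
    """Count breaches by the year of their submission date."""
    years = (datetime.strptime(d, date_format).year for d in submission_dates)
    # Sort by year so the result reads left to right like the bar plot.
    return dict(sorted(Counter(years).items()))

# Hypothetical dates for illustration only.
dates = ["10/21/2009", "03/05/2010", "11/30/2010", "07/14/2015"]
per_year = breaches_per_year(dates)
print(per_year)
```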

Top 25 Largest Healthcare Data Breaches

This table shows the 25 Covered Entities (companies) with the largest breaches and how big each breach was. The Anthem breach was the largest by a wide margin, exceeding the second-largest breach, at Science Applications International Corporation, by 73,900,000 individuals affected.

Name of Covered Entity | Individuals Affected
Anthem, Inc. Affiliated Covered Entity | 78800000
Science Applications International Corporation (SA | 4900000
Advocate Health and Hospitals Corporation, d/b/a Advocate Medical Group | 4029530
21st Century Oncology | 2213597
Xerox State Healthcare, LLC | 2000000
IBM | 1900000
GRM Information Management Services | 1700000
AvMed, Inc. | 1220000
Montana Department of Public Health & Human Services | 1062509
The Nemours Foundation | 1055489
BlueCross BlueShield of Tennessee, Inc. | 1023209
Sutter Medical Foundation | 943434
Valley Anesthesiology Consultants, Inc. d/b/a Valley Anesthesiology and Pain Consultants | 882590
Horizon Healthcare Services, Inc., doing business as Horizon Blue Cross Blue Shield of New Jersey, and its affiliates | 839711
Iron Mountain Data Products, Inc. (now known as | 800000
Utah Department of Technology Services | 780000
AHMC Healthcare Inc. and affiliated Hospitals | 729000
EISENHOWER MEDICAL CENTER | 514330
Radiology Regional Center, PA | 483063
Puerto Rico Department of Health - Triple S Management Corp. | 475000
St Joseph Health System | 405000
Spartanburg Regional Healthcare System | 400000
Triple-S Salud, Inc. - Breach Case#2 | 398000
Triple-S Salud, Inc. | 398000
Community Health Plan of Washington | 381504
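A ranking like the one above amounts to selecting the N largest values of Individuals Affected. The sketch below shows one way to do that in Python, using a handful of rows copied from the table in shuffled order. The function name `largest_breaches` is hypothetical. It also recomputes the gap between the two largest breaches and Anthem's share of the 124,249,678 total individuals affected reported in the summary table.

```python
import heapq

# A few (name, individuals affected) rows copied from the table above,
# deliberately out of order.
breaches = [
    ("IBM", 1900000),
    ("Anthem, Inc. Affiliated Covered Entity", 78800000),
    ("AvMed, Inc.", 1220000),
    ("21st Century Oncology", 2213597),
    ("Xerox State Healthcare, LLC", 2000000),
]

def largest_breaches(rows, n):
    """Return the n rows with the most individuals affected, largest first."""
    return heapq.nlargest(n, rows, key=lambda row: row[1])

top_three = largest_breaches(breaches, 3)
print(top_three)

# Gap between the largest and second-largest breaches in the full table.
gap_to_second = 78_800_000 - 4_900_000

# Anthem's share (in percent) of all individuals affected.
anthem_share = 100 * 78_800_000 / 124_249_678
print(gap_to_second, round(anthem_share, 1))
```

By this calculation a single breach accounts for roughly 63 percent of all individuals affected, which is why the average per breach is so much larger than a typical breach.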

Total Healthcare Records exposed by State for the Top-10 States

This bar plot is dominated by Indiana, where Anthem, Inc. is headquartered. The remaining states are all similar in the number of individuals affected, with Florida second and Virginia a close third. Interestingly, states with notably larger cities are included, yet they are not even close to Indiana’s total. For example, California, Illinois, and New York have some of the most well-known cities in the world, but they are not at the top of the list. This suggests that the size of a state or its cities does not by itself determine how many records are exposed.
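State totals of this kind come from summing Individuals Affected within each state and keeping the largest totals. The following is a minimal sketch under stated assumptions: the function name `top_states_by_records`, the placeholder state labels, and the numbers are all invented for illustration and do not come from the OCR data.

```python
from collections import defaultdict

def top_states_by_records(rows, n=10):
    """Sum individuals affected per state and return the n largest totals."""
    totals = defaultdict(int)
    for state, affected in rows:
        totals[state] += affected
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

# Hypothetical (state, individuals affected) rows.
rows = [("State A", 1200), ("State B", 800), ("State A", 500), ("State C", 3000)]
ranking = top_states_by_records(rows, n=2)
print(ranking)
```

Because the totals are sums, one very large breach can put a state at the top regardless of how many breaches it had.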

Number of Healthcare Hacking Incidents by Month

The bar plot below shows all of the months of the year and how many hacking incidents were reported in each. There is a noticeable drop in hacking incidents in November and February. The number of hacking incidents also spikes in March, April, September, and December.
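A monthly count like this takes two steps: keep only the breaches whose type is a hacking incident, then count them by the month of the submission date. The sketch below illustrates those steps. The function name `hacking_by_month`, the sample records, the MM/DD/YYYY format, and the assumption that hacking breaches can be recognized by the word "hacking" in the Type of breach field are mine.

```python
import calendar
from collections import Counter
from datetime import datetime

def hacking_by_month(records, date_format="%m/%d/%Y"):
    """Count hacking incidents per calendar month across all years."""
    counts = Counter()
    for submitted, breach_type in records:
        # Assumption: hacking breaches contain "hacking" in their type label.
        if "hacking" in breach_type.lower():
            counts[datetime.strptime(submitted, date_format).month] += 1
    # Report all twelve months so months with zero incidents still appear.
    return {calendar.month_name[m]: counts.get(m, 0) for m in range(1, 13)}

# Hypothetical (submission date, type of breach) records.
records = [
    ("03/02/2016", "Hacking/IT Incident"),
    ("03/19/2015", "Theft"),
    ("09/10/2016", "Hacking/IT Incident"),
    ("03/28/2017", "Hacking/IT Incident"),
]
monthly = hacking_by_month(records)
print(monthly)
```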

Number of Breaches by Covered Entity Type

This table shows that healthcare providers account for far more breaches than any other covered entity type: 1,220, compared with 285 for business associates and 200 for health plans. This raises the question of how many individuals are affected by the breaches at each type of entity. Although healthcare providers have the largest number of breaches, they may not have the most affected individuals.

Covered Entity Type | Number of Breaches
Business Associate | 285
Health Plan | 200
Healthcare Clearing House | 4
Healthcare Provider | 1220
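To put the counts in the table on a common scale, they can be converted to shares of all breaches. The short calculation below uses only the four counts from the table; the variable names are my own.

```python
# Counts copied from the table above.
counts = {
    "Business Associate": 285,
    "Health Plan": 200,
    "Healthcare Clearing House": 4,
    "Healthcare Provider": 1220,
}

# The four counts should add up to the total number of breaches.
total = sum(counts.values())

# Percentage of all breaches for each covered entity type.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
print(total)
print(shares)
```

The counts sum to 1,709, matching the total number of breaches in the summary table, and healthcare providers account for about 71 percent of them.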

On what day of the week are breaches most often reported?

From this graph, we can see that breaches are most often reported on Fridays. One possible explanation is that people prefer to report a breach and then escape to the bliss of the weekend before having to deal with the repercussions on Monday. Saturday and Sunday are the two smallest bars, which is understandable because most providers aren’t operating on the weekend.
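The day of the week is not reported directly, so it has to be derived from the breach submission date. A minimal sketch of that derivation follows. The function name `breaches_by_weekday`, the sample dates, and the MM/DD/YYYY format are assumptions for illustration.

```python
import calendar
from collections import Counter
from datetime import datetime

def breaches_by_weekday(submission_dates, date_format="%m/%d/%Y"):
    """Count breaches by the weekday on which they were submitted."""
    counts = Counter(
        datetime.strptime(d, date_format).weekday() for d in submission_dates
    )
    # weekday() numbers Monday as 0 and Sunday as 6, matching calendar.day_name.
    return {calendar.day_name[i]: counts.get(i, 0) for i in range(7)}

# Hypothetical dates: two Fridays and one Monday in March 2016.
dates = ["03/04/2016", "03/11/2016", "03/07/2016"]
by_weekday = breaches_by_weekday(dates)
print(by_weekday)
```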

In which years were there at least 50 breaches from a ‘Business Associate’ covered entity type and at least 150 breaches from a ‘Healthcare Provider’ covered entity type?

From this table we can see that there were only two years in which Business Associates had at least 50 breaches and Healthcare Providers had at least 150 breaches. Interestingly, the number of breaches for each Covered Entity type was about the same in both years, and the two years were consecutive.

Year | Breaches at Business Associates (at least 50) | Breaches at Healthcare Providers (at least 150)
2013 | 64 | 187
2014 | 67 | 179
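The table is the result of applying both thresholds at once to per-year counts for each covered entity type. The sketch below shows that filter. The 2013 and 2014 counts are copied from the table; the 2099 row is a hypothetical non-qualifying year added only so the filter has something to exclude, and the function name `qualifying_years` is my own.

```python
def qualifying_years(counts_by_year, min_associate=50, min_provider=150):
    """Return the years that meet both thresholds, in ascending order."""
    return sorted(
        year
        for year, counts in counts_by_year.items()
        if counts.get("Business Associate", 0) >= min_associate
        and counts.get("Healthcare Provider", 0) >= min_provider
    )

counts_by_year = {
    2013: {"Business Associate": 64, "Healthcare Provider": 187},
    2014: {"Business Associate": 67, "Healthcare Provider": 179},
    # Hypothetical year that fails the Business Associate threshold.
    2099: {"Business Associate": 40, "Healthcare Provider": 200},
}
years = qualifying_years(counts_by_year)
print(years)
```

Both comparisons use "at least" (greater than or equal to), matching the wording of the question.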

How has the type of breach changed for each year?

This bar plot shows a large gap between the number of theft breaches and every other type. The only other type that came close to theft was Unauthorized Access/Disclosure. Also notable are a spike in Hacking incidents in 2016 and a spike in every type except theft in 2014.
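A plot of breach type by year rests on a two-way count: for each year, how many breaches of each type were reported. One way to build that table is sketched below. The function name `type_by_year` and the sample records are hypothetical.

```python
from collections import Counter, defaultdict

def type_by_year(records):
    """Build a {year: {breach type: count}} table from (year, type) pairs."""
    table = defaultdict(Counter)
    for year, breach_type in records:
        table[year][breach_type] += 1
    # Convert to plain dicts, ordered by year.
    return {year: dict(table[year]) for year in sorted(table)}

# Hypothetical (year, type of breach) records.
records = [
    (2016, "Hacking/IT Incident"),
    (2014, "Theft"),
    (2014, "Hacking/IT Incident"),
    (2014, "Theft"),
]
crosstab = type_by_year(records)
print(crosstab)
```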

Is there a correlation between the Covered Entity type and breach size?

This question follows from the one raised earlier in the Number of Breaches by Covered Entity Type section. I wanted to see whether the Covered Entity type with the most breaches also had the most individuals affected. The bar plot shows that the covered entity type with the most breaches did not have the most affected individuals. This is interesting because Healthcare Providers had almost 1,000 more breaches than any other Covered Entity type.
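The comparison here involves two different rankings of the same groups: one by number of breaches and one by total individuals affected. The sketch below computes both and shows how they can disagree. The function name `rank_entity_types`, the placeholder type labels, and the numbers are invented for illustration and say nothing about which type led in the real data.

```python
def rank_entity_types(rows):
    """Return per-type (count, total) plus the leader under each measure."""
    stats = {}
    for entity_type, affected in rows:
        count, total = stats.get(entity_type, (0, 0))
        stats[entity_type] = (count + 1, total + affected)
    most_breaches = max(stats, key=lambda name: stats[name][0])
    most_affected = max(stats, key=lambda name: stats[name][1])
    return stats, most_breaches, most_affected

# Hypothetical rows: Type A has many small breaches, Type B one large breach.
rows = [("Type A", 600), ("Type A", 900), ("Type A", 700), ("Type B", 50000)]
stats, most_breaches, most_affected = rank_entity_types(rows)
print(stats, most_breaches, most_affected)
```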

How many above-average breaches were there when a Business Associate was present?

This bar plot shows how many above-average breaches occurred when a Business Associate was and was not present. It helps us tell whether having a Business Associate present is associated with fewer above-average breaches. In this graph, fewer above-average breaches occur when a Business Associate is present.
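Counting above-average breaches requires a threshold first. The sketch below assumes that "above average" means strictly more individuals affected than the mean across all breaches, and then counts such breaches separately for each value of the Business associate present field. The function name `above_average_by_associate`, the Yes/No labels, and the sample rows are hypothetical.

```python
import statistics

def above_average_by_associate(rows):
    """Count breaches larger than the overall mean, split by associate flag."""
    # Assumption: the threshold is the mean over all breaches, not per group.
    threshold = statistics.mean([affected for _, affected in rows])
    counts = {flag: 0 for flag, _ in rows}
    for flag, affected in rows:
        if affected > threshold:
            counts[flag] += 1
    return threshold, counts

# Hypothetical (business associate present, individuals affected) rows.
rows = [("Yes", 500), ("No", 600), ("No", 10000), ("Yes", 700), ("No", 9000)]
threshold, counts = above_average_by_associate(rows)
print(threshold, counts)
```

Because the groups differ in size, comparing these counts as a proportion of each group's breaches may be more informative than comparing the raw counts.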

What was the total number of Individuals Affected when a Business Associate was present?

The number of breaches is higher when a Business Associate is not present, but that count alone does not tell us the total number of Individuals Affected. This bar plot shows that not only do the most breaches occur when a Business Associate is not present, but the most Individuals Affected also occur when a Business Associate is not present.

What we can conclude from these bar plots is that a Business Associate should be present at all times. This could not only reduce the number of above-average breaches but also cut down on the overall number of individuals affected. Adding a Business Associate will not completely rid a system of breaches, but based on these graphs it would be reasonable to assume that having one present will help.