We Are Vulnerable in a Data-Filled World
As technology advances, more and more of the data around us contains valuable information. This is especially true of health data: large volumes of personal and medical records exist to help improve future care. Unfortunately, this data is vulnerable to breaches, and our information can be stolen.
As part of the US Department of Health and Human Services, we examined data on these breaches to understand their context. The data comes from the Office for Civil Rights and contains information on cases in which a covered entity had a data breach that affected more than 500 individuals.
Understanding the Data
Yearly Trends
We start our analysis by looking at how the breaches behave over time. A good way to do this is to look for trends and patterns, so let’s begin with the number of healthcare data breaches by year.
2013 and 2014 saw a large number of breaches, while the years before and after were more stable. The last couple of years show far fewer breaches. This could reflect improved security, or the data for those years may simply be incomplete.
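A yearly count like this takes only a few lines to compute. The sketch below is illustrative only: it uses Python rather than whatever tool produced the chart, the dates are made-up sample values rather than rows from the dataset, and the `MM/DD/YYYY` format is an assumption about how the date column is stored.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample of breach dates (NOT rows from the real dataset).
# The MM/DD/YYYY format is an assumption about the date column.
sample_dates = [
    "03/14/2013", "07/02/2013", "11/20/2013",
    "01/09/2014", "05/30/2014",
    "08/18/2016",
]

def breaches_per_year(date_strings):
    """Return a {year: count} dict ordered by year."""
    years = (datetime.strptime(d, "%m/%d/%Y").year for d in date_strings)
    counts = Counter(years)
    return dict(sorted(counts.items()))

yearly_counts = breaches_per_year(sample_dates)
for year, n in yearly_counts.items():
    print(year, n)
```

Years with no breaches at all do not appear in the result, so a gap in the output would be worth checking against the raw data before reading it as a quiet year.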
Largest Breaches
Next, let’s look at the ten largest breaches in terms of individuals affected.
## # A tibble: 10 × 2
##    `Name of Covered Entity`                                     `Individuals A…`
##    <chr>                                                                   <dbl>
##  1 Anthem, Inc. Affiliated Covered Entity                               78800000
##  2 Science Applications International Corporation (SA                    4900000
##  3 Advocate Health and Hospitals Corporation, d/b/a Advocate M…          4029530
##  4 21st Century Oncology                                                 2213597
##  5 Xerox State Healthcare, LLC                                           2000000
##  6 IBM                                                                   1900000
##  7 GRM Information Management Services                                   1700000
##  8 AvMed, Inc.                                                           1220000
##  9 Montana Department of Public Health & Human Services                  1062509
## 10 The Nemours Foundation                                                1055489
As you can see, the largest breach in the data affected more than 78 million people, roughly 16 times as many as the second largest. Even so, each of the other breaches on the list affected more than a million people. This matters because breaches of this scale can do serious harm to the patients involved.
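For readers who want to reproduce a ranking like this, here is a minimal sketch in Python. The rows are a shuffled subset of the table above, with names kept exactly as printed (including the truncated one). The function name `largest_breaches` is made up for illustration and is not the code that generated the table.

```python
# (covered entity, individuals affected), copied from the table above.
# Names are kept as printed, including the truncated "(SA" entry.
breaches = [
    ("Xerox State Healthcare, LLC", 2000000),
    ("Anthem, Inc. Affiliated Covered Entity", 78800000),
    ("IBM", 1900000),
    ("21st Century Oncology", 2213597),
    ("Science Applications International Corporation (SA", 4900000),
]

def largest_breaches(rows, n):
    """Return the n rows with the most individuals affected, largest first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]

top3 = largest_breaches(breaches, 3)
for name, affected in top3:
    print(f"{name}: {affected:,}")

# How many times larger is the top breach than the runner-up?
ratio = top3[0][1] / top3[1][1]
print(f"Largest is {ratio:.1f}x the second largest")
```

The ratio makes the gap between the top breach and the rest concrete, which a ranked list alone does not.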
Geography
Another interesting aspect of this data is geography. Let’s look at the top 10 states by exposed healthcare records, measured in individuals affected, to see whether there is a regional trend within the US.
In this bar chart, Indiana accounts for the most individuals affected by data breaches, and it’s not close: almost 80 million people, while the other states are each around a few million. This is because the largest data breach (the one with 78 million individuals affected) happened in Indiana, so Indiana’s bar essentially reflects that single incident.
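One way to catch this kind of distortion is to report the number of breaches and the largest single breach next to each state's total. The sketch below assumes a simple list of (state, individuals affected) pairs. Only the Indiana figure of 78,800,000 comes from the data above; every other row is a made-up placeholder.

```python
from collections import defaultdict

# (state, individuals affected). Only the Indiana 78,800,000 figure comes
# from the text; every other row is a hypothetical placeholder.
rows = [
    ("IN", 78800000),
    ("IN", 1200),
    ("CA", 900000),
    ("CA", 750000),
    ("CA", 600000),
    ("TX", 1100000),
]

def summarize_by_state(rows):
    """Return {state: (breach_count, total_affected, largest_single_breach)}."""
    grouped = defaultdict(list)
    for state, affected in rows:
        grouped[state].append(affected)
    return {s: (len(v), sum(v), max(v)) for s, v in grouped.items()}

summary = summarize_by_state(rows)

# Rank states by total individuals affected, largest first.
ranked = sorted(summary.items(), key=lambda kv: kv[1][1], reverse=True)
for state, (count, total, largest) in ranked:
    share = largest / total  # fraction of the total due to one breach
    print(f"{state}: {count} breaches, {total:,} affected, "
          f"largest breach = {share:.0%} of total")
```

A state whose largest breach makes up nearly all of its total is being ranked on one incident rather than on a pattern.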
Hacking Incidents
The data also records how each breach occurred. Categories include hacking incidents, theft, loss, and others. Let’s look at monthly trends in hacking incidents.
March, April, September, and December have the most hacking incidents of any month. A few months have very few hacking incidents, but there doesn’t seem to be any seasonal trend in this type of breach.
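The monthly view amounts to filtering the records to hacking incidents and counting them by calendar month. The sketch below is a hedged illustration: the records are invented, the `MM/DD/YYYY` date format is assumed, and matching on the substring "hacking" is an assumption about how breach types are labeled.

```python
from collections import Counter
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical (date, breach type) pairs, not rows from the real dataset.
records = [
    ("03/05/2015", "Hacking/IT"),
    ("03/22/2016", "Hacking/IT"),
    ("04/11/2014", "Theft"),
    ("09/30/2016", "Hacking/IT"),
    ("12/01/2015", "Loss"),
]

def hacking_by_month(records):
    """Count hacking incidents per calendar month, including empty months."""
    counts = Counter(
        datetime.strptime(date, "%m/%d/%Y").month
        for date, kind in records
        if "hacking" in kind.lower()
    )
    return {MONTHS[m - 1]: counts.get(m, 0) for m in range(1, 13)}

monthly = hacking_by_month(records)
for month, n in monthly.items():
    print(month, n)
```

Including months with zero incidents keeps all twelve in the output, so quiet months are visible rather than silently missing.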
Covered Entity Types
The data also indicates which covered entity type each breach occurred under. These types include Health Plan, Business Associate, Healthcare Provider, and Healthcare Clearing House. Let’s look at the total number of breaches under each entity type.
## # A tibble: 4 × 2
##   `Covered Entity Type`     Number_of_Breaches
##   <chr>                                  <int>
## 1 Business Associate                       285
## 2 Health Plan                              200
## 3 Healthcare Clearing House                  4
## 4 Healthcare Provider                     1220
As you can see, Healthcare Provider is by far the most common covered entity type among these data breaches, accounting for 1,220 of the 1,709 breaches (about 71%). Patients should understand that it is important to choose a healthcare provider that protects their data.
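To put the counts in perspective, they can be converted to shares of the total. This short Python sketch uses the four counts from the table above; the variable names are chosen here for illustration.

```python
# Counts copied from the table above.
breaches_by_entity = {
    "Business Associate": 285,
    "Health Plan": 200,
    "Healthcare Clearing House": 4,
    "Healthcare Provider": 1220,
}

total = sum(breaches_by_entity.values())

# Percentage of all breaches for each entity type, rounded to one decimal.
shares = {
    entity: round(100 * count / total, 1)
    for entity, count in breaches_by_entity.items()
}

for entity, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{entity}: {pct}%")
```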
Analysis
Days of the Week
We have looked at both yearly and monthly trends, and the distribution of breaches seemed to be somewhat random. Now let’s look at day of the week to see if a particular day sees more breaches than others.
According to this bar chart, breaches are most common on Fridays. Organizations responsible for healthcare information should consider strengthening their protections toward the end of the week.
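A day-of-week breakdown can be derived from the same date column. The sketch below is an illustration in Python with invented dates and an assumed `MM/DD/YYYY` format. It uses a fixed list of English day names so the result does not depend on the machine's locale.

```python
from collections import Counter
from datetime import datetime

# date.weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical breach dates, not rows from the real dataset.
sample_dates = ["03/07/2014", "06/12/2015", "02/29/2016"]

def breaches_by_weekday(date_strings):
    """Count breaches per weekday, Monday first, including empty days."""
    counts = Counter(
        DAY_NAMES[datetime.strptime(d, "%m/%d/%Y").weekday()]
        for d in date_strings
    )
    return {day: counts.get(day, 0) for day in DAY_NAMES}

weekday_counts = breaches_by_weekday(sample_dates)
for day, n in weekday_counts.items():
    print(day, n)
```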
Breaches by Specific Entity Types
This next graphic shows the years in which a Business Associate entity type had at least 50 breaches, as well as years in which a Healthcare Provider entity type had at least 150 breaches.
Business Associate entities saw two years with at least 50 breaches: 2013 and 2014. Healthcare Provider entities had four years with at least 150 breaches: 2013, 2014, 2015, and 2016. Because the pattern recurs across several years, Healthcare Provider breaches deserve further investigation into why they are so frequent.
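The filter behind this graphic can be expressed as "count breaches per entity type and year, then keep the years that meet that type's threshold." The sketch below uses the two thresholds from the text. The records themselves are synthetic and chosen only to exercise the logic, and `years_meeting_threshold` is a hypothetical helper name.

```python
from collections import Counter

# Thresholds taken from the text above.
THRESHOLDS = {"Business Associate": 50, "Healthcare Provider": 150}

# Synthetic (entity type, year) records: the counts are made up and do
# not reflect the real dataset.
records = (
    [("Business Associate", 2013)] * 55
    + [("Business Associate", 2015)] * 30
    + [("Healthcare Provider", 2014)] * 160
    + [("Healthcare Provider", 2017)] * 90
    + [("Health Plan", 2014)] * 200
)

def years_meeting_threshold(records, thresholds):
    """Return {entity_type: [years with count >= that type's threshold]}."""
    counts = Counter(records)
    result = {entity: [] for entity in thresholds}
    for (entity, year), n in sorted(counts.items()):
        if entity in thresholds and n >= thresholds[entity]:
            result[entity].append(year)
    return result

flagged = years_meeting_threshold(records, THRESHOLDS)
print(flagged)
```

Entity types without a threshold (Health Plan in this sample) are ignored, which mirrors how the graphic covers only two of the four types.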
Type of Breach Trends
After looking at trends in covered entity types, we also want to look at breach types over the years. This table shows how many breaches of each type occurred in each year.
Based on the data, Hacking/IT has increased slowly over the years, with a sharp rise in 2016. Unauthorized Access or Disclosure also increased until 2014 and then leveled off for a few years. Improper Disposal is not very common, but 2013 and 2014 had the most breaches of this type. Theft is the most common type overall, with most of its cases falling in 2010-2014. Loss has seen small increases and decreases over the years. Other has not been recorded since 2014. 2009, 2017, and 2018 have far fewer instances in general than the other years.
Self-Directed Analysis
Locations of Breaches
We’ve looked at a handful of yearly trends across different variables, but we haven’t yet looked at the location of the breach. I want to know the most common location by year, which will give an idea of whether technology has played a larger role in breaches recently.
There isn’t a strong trend among the locations. Paper/Films is actually the most common breach location, and Laptop looks like the second most common, but neither shows a trend by year. As we established earlier, 2013 and 2014 had the most breaches, regardless of location.
Laptops vs Paper/Films
The next thing I want to look at is whether Laptop or Paper/Films breaches affect more people, since they are the two most common breach locations.
According to this analysis, Laptop breaches affect more people on average than Paper/Films breaches. So even though the latter are more frequent, the former are more damaging.
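A comparison like this comes down to grouping breaches by location and averaging the individuals affected. The sketch below is a hedged illustration with invented numbers, not the analysis that produced the result above. It also reports the median, because one very large breach can pull a mean upward.

```python
from statistics import mean, median

# Hypothetical (location, individuals affected) rows, not real data.
rows = [
    ("Laptop", 500000),
    ("Laptop", 1200),
    ("Laptop", 3000),
    ("Paper/Films", 900),
    ("Paper/Films", 1500),
    ("Paper/Films", 2500),
    ("Paper/Films", 700),
]

def summarize(rows, location):
    """Return the count, mean, and median of individuals affected."""
    values = [n for loc, n in rows if loc == location]
    return {"count": len(values), "mean": mean(values), "median": median(values)}

laptop = summarize(rows, "Laptop")
paper = summarize(rows, "Paper/Films")

for label, stats in (("Laptop", laptop), ("Paper/Films", paper)):
    print(f"{label}: n={stats['count']}, "
          f"mean={stats['mean']:,.0f}, median={stats['median']:,.0f}")
```

In this made-up sample the Laptop mean is far higher than its median because of one large breach. Checking both statistics on the real data would show whether the gap between the two locations is broad or comes from a few outliers.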