We Are Vulnerable in a Data-Filled World

As technology advances, we are increasingly surrounded by data that contains valuable information. Health care is a prime example: enormous amounts of data containing our personal and medical information are collected to help improve future care. Unfortunately, this data is vulnerable, and our information can be stolen through security breaches.

As part of the US Department of Health and Human Services, we looked at some of the data on these breaches to try to understand their context. The data comes from the Office for Civil Rights (OCR) and contains information on cases in which a covered entity had a data breach affecting 500 or more individuals.

Understanding the Data

Largest Breaches

Next, let’s look at the largest breaches in terms of Individuals Affected.

## # A tibble: 10 × 2
##    `Name of Covered Entity`                                     `Individuals A…`
##    <chr>                                                                   <dbl>
##  1 Anthem, Inc. Affiliated Covered Entity                               78800000
##  2 Science Applications International Corporation (SA                    4900000
##  3 Advocate Health and Hospitals Corporation, d/b/a Advocate M…          4029530
##  4 21st Century Oncology                                                 2213597
##  5 Xerox State Healthcare, LLC                                           2000000
##  6 IBM                                                                   1900000
##  7 GRM Information Management Services                                   1700000
##  8 AvMed, Inc.                                                           1220000
##  9 Montana Department of Public Health & Human Services                  1062509
## 10 The Nemours Foundation                                                1055489

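As the table shows, the largest breach in the data (Anthem, Inc. Affiliated Covered Entity) affected 78.8 million people. It is the largest by far, roughly 16 times the 4.9 million affected by the second-largest breach, yet every other breach in the top 10 still affected more than a million people. This matters because exposing personal and medical information on this scale can seriously harm the patients involved.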
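Producing a ranking like the one above amounts to sorting the rows by individuals affected and keeping the first ten. Here is a minimal standard-library Python sketch of that step, assuming each row has been reduced to an entity name and a count; the `Breach` record type and `largest_breaches` helper are hypothetical names, and the sample rows are four entries copied from the table.

```python
from typing import NamedTuple


class Breach(NamedTuple):
    """One breach report, reduced to the two columns used in the ranking."""
    entity: str       # "Name of Covered Entity"
    individuals: int  # individuals affected by the breach


def largest_breaches(rows, n=10):
    """Return the n breaches with the most individuals affected, largest first."""
    return sorted(rows, key=lambda r: r.individuals, reverse=True)[:n]


# Four rows from the table above, deliberately out of order.
rows = [
    Breach("IBM", 1_900_000),
    Breach("Anthem, Inc. Affiliated Covered Entity", 78_800_000),
    Breach("AvMed, Inc.", 1_220_000),
    Breach("21st Century Oncology", 2_213_597),
]

for breach in largest_breaches(rows, n=3):
    print(f"{breach.entity:<40} {breach.individuals:>12,}")
```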

Geography

Another interesting aspect of this data is geography. Let’s look at the top 10 states by number of individuals whose healthcare records were exposed, to see whether there is a trend by region of the US.

In this bar chart, Indiana stands out by a wide margin: breaches there affected almost 80 million people, while the other states are each at a few million. This is because the largest data breach (the one affecting 78.8 million individuals) happened in Indiana, so Indiana’s total is dominated by that single breach rather than reflecting a regional pattern.
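Because each bar is a sum over all breaches in a state, one very large breach can dominate a state’s total. The sketch below totals individuals affected by state and measures how much of a state’s total comes from its single largest breach. Apart from the 78.8 million figure taken from the table above, the rows are hypothetical, and the two-letter state codes and helper names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical (state, individuals affected) rows; only the 78.8 million
# figure comes from the table above.
breaches = [
    ("IN", 78_800_000),
    ("IN", 12_000),
    ("IN", 3_500),
    ("TX", 900_000),
    ("TX", 650_000),
    ("CA", 1_100_000),
]


def top_states(rows, n=10):
    """Rank states by total individuals affected, largest first."""
    totals = defaultdict(int)
    for state, affected in rows:
        totals[state] += affected
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def largest_share(rows, state):
    """Fraction of a state's total contributed by its single largest breach."""
    sizes = [affected for s, affected in rows if s == state]
    return max(sizes) / sum(sizes)


ranking = top_states(breaches)
print(ranking)
print(f"Share of IN total from one breach: {largest_share(breaches, 'IN'):.4f}")
```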

Hacking Incidents

The data also records how each breach occurred, with categories such as hacking incidents, theft, loss, and others. Let’s look at the monthly pattern of hacking incidents.

March, April, September, and December have the most hacking incidents of any month. A few months have very few hacking incidents, but there does not appear to be a seasonal trend for this type of breach.
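Counting hacking incidents by month is a filter followed by a group-and-count. Here is a minimal sketch, assuming the breach date is stored as an MM/DD/YYYY string and that hacking is labeled `Hacking/IT Incident` (both assumptions); the records themselves are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical (breach date, type of breach) records; the date format and the
# "Hacking/IT Incident" label are assumptions about the data.
records = [
    ("03/14/2014", "Hacking/IT Incident"),
    ("03/02/2015", "Hacking/IT Incident"),
    ("04/21/2013", "Theft"),
    ("09/30/2016", "Hacking/IT Incident"),
    ("12/05/2012", "Loss"),
]


def hacking_by_month(rows, label="Hacking/IT Incident"):
    """Count breaches of one type per calendar month (1 = January)."""
    counts = Counter()
    for date_str, kind in rows:
        if kind == label:
            counts[datetime.strptime(date_str, "%m/%d/%Y").month] += 1
    # Include zero-count months so quiet months show up as well.
    return {month: counts.get(month, 0) for month in range(1, 13)}


monthly = hacking_by_month(records)
print(monthly)
```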

Covered Entity Types

The data also indicates the type of covered entity involved in each breach: Health Plan, Business Associate, Healthcare Provider, or Healthcare Clearing House. Let’s look at the total number of breaches for each entity type.

## # A tibble: 4 × 2
##   `Covered Entity Type`     Number_of_Breaches
##   <chr>                                  <int>
## 1 Business Associate                       285
## 2 Health Plan                              200
## 3 Healthcare Clearing House                  4
## 4 Healthcare Provider                     1220

Healthcare Provider is by far the most common Covered Entity Type among these data breaches, accounting for 1,220 of the 1,709 breaches (about 71%). Patients should understand that it is important to choose a healthcare provider that protects their data.
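Each entity type’s share follows directly from the counts in the table. Here is a short Python check using those four numbers.

```python
# Breach counts per Covered Entity Type, copied from the table above.
counts = {
    "Business Associate": 285,
    "Health Plan": 200,
    "Healthcare Clearing House": 4,
    "Healthcare Provider": 1220,
}

total = sum(counts.values())  # 1,709 breaches in total
shares = {kind: n / total for kind, n in counts.items()}

# Print types from most to least common; Healthcare Provider comes first at 71.4%.
for kind, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{kind:<27} {share:6.1%}")
```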

Analysis

Days of the Week

We have looked at both yearly and monthly trends, and the distribution of breaches seemed somewhat random. Now let’s look at the day of the week to see whether a particular day sees more breaches than others.

According to this bar chart, breaches are most common on Fridays. Those responsible for protecting healthcare information should consider strengthening their safeguards toward the end of the week.
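Tallying breaches by weekday only requires parsing each date and taking its day of the week. Here is a minimal sketch, assuming MM/DD/YYYY date strings; the dates are hypothetical, and the text does not say which date column the chart used. Mapping `weekday()` indices to a fixed list of names, rather than formatting with `%A`, keeps the output independent of the system locale.

```python
from collections import Counter
from datetime import datetime

# weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def breaches_by_weekday(date_strings):
    """Count breaches per day of the week from MM/DD/YYYY date strings."""
    return Counter(
        DAY_NAMES[datetime.strptime(d, "%m/%d/%Y").weekday()]
        for d in date_strings
    )


# Hypothetical breach dates, not taken from the OCR data.
dates = ["01/03/2014", "01/10/2014", "02/05/2014", "03/17/2014"]
counts = breaches_by_weekday(dates)
print(counts.most_common(1))  # [('Friday', 2)]
```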

Breaches by Specific Entity Types

This next graphic shows the years in which a Business Associate entity type had at least 50 breaches, as well as years in which a Healthcare Provider entity type had at least 150 breaches.

The Business Associate entity type had two years with at least 50 breaches: 2013 and 2014. Healthcare Provider had four years with at least 150 breaches: 2013, 2014, 2015, and 2016. Because frequent Healthcare Provider breaches recur year after year, they warrant further investigation into why they happen so often.
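Selecting these years amounts to counting breaches per (year, entity type) pair and keeping the pairs that clear a type-specific threshold. Here is a minimal sketch with hypothetical rows; the thresholds mirror the 50 and 150 used in the graphic, and `busy_years` is an illustrative name.

```python
from collections import Counter

# Minimum breaches per year for each entity type of interest.
THRESHOLDS = {"Business Associate": 50, "Healthcare Provider": 150}


def busy_years(pairs, thresholds):
    """Return sorted (year, entity_type, count) tuples meeting each type's threshold."""
    counts = Counter(pairs)
    return sorted(
        (year, kind, n)
        for (year, kind), n in counts.items()
        if kind in thresholds and n >= thresholds[kind]
    )


# Hypothetical (year, entity type) rows: one tuple per breach.
rows = (
    [(2013, "Business Associate")] * 52
    + [(2014, "Business Associate")] * 60
    + [(2015, "Business Associate")] * 30
    + [(2012, "Healthcare Provider")] * 120
    + [(2013, "Healthcare Provider")] * 150
    + [(2014, "Health Plan")] * 80
)

print(busy_years(rows, THRESHOLDS))
```

Using `>=` rather than `>` matches the "at least" wording: the hypothetical 2013 Healthcare Provider count of exactly 150 is kept.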

Self-Directed Analysis

Locations of Breaches

We’ve looked at several yearly trends across different variables, but we haven’t yet looked at where the breached information was located. I want to know the most common location by year, which will indicate whether technology is playing a larger role in recent breaches.

There isn’t a strong trend among the locations. Paper/Films is actually the most common breach location, and Laptop appears to be the second most common, but neither shows a trend by year. As established earlier, 2013 and 2014 had the most breaches, regardless of location.
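Finding the most common location per year is a grouped mode. Here is a minimal sketch with hypothetical rows, assuming each row carries a single location value; a row that lists several locations would need to be split first. The helper name `top_location_by_year` is illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical (year, location of breached information) rows.
rows = [
    (2013, "Paper/Films"), (2013, "Paper/Films"), (2013, "Laptop"),
    (2014, "Laptop"), (2014, "Laptop"), (2014, "Email"),
]


def top_location_by_year(pairs):
    """Map each year to its (most common location, count)."""
    by_year = defaultdict(Counter)
    for year, location in pairs:
        by_year[year][location] += 1
    # most_common(1) returns a one-element list of (location, count).
    return {year: c.most_common(1)[0] for year, c in sorted(by_year.items())}


print(top_location_by_year(rows))
```

`Counter.most_common` orders equal counts by first appearance, so a tie goes to the location seen first in that year’s rows.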

Laptops vs Paper/Films

Next, I want to see whether Laptop or Paper/Films breaches affect more people, since these are the two most common breach locations.

According to this analysis, Laptop breaches affect more people on average than Paper/Films breaches. So even though Paper/Films breaches are more frequent, Laptop breaches are more detrimental.
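Comparing the two locations comes down to grouping individuals affected by location and averaging. Here is a minimal sketch with hypothetical numbers. It also reports the median, since a single very large breach (like the Anthem breach) can pull a mean far above the typical case.

```python
from statistics import mean, median

# Hypothetical (location, individuals affected) rows, not real OCR data.
rows = [
    ("Laptop", 4_000), ("Laptop", 25_000), ("Laptop", 1_000),
    ("Paper/Films", 800), ("Paper/Films", 1_500),
    ("Paper/Films", 600), ("Paper/Films", 2_100),
]


def summarize(pairs, location):
    """Count, mean, and median of individuals affected for one location."""
    sizes = [n for loc, n in pairs if loc == location]
    return {"count": len(sizes), "mean": mean(sizes), "median": median(sizes)}


for location in ("Laptop", "Paper/Films"):
    print(location, summarize(rows, location))
```

In this toy data the Laptop mean (10,000) is 2.5 times its median (4,000) because of one 25,000-person breach, which is the kind of gap worth checking in the real comparison.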