Introduction to the Data

The US Department of Health and Human Services (HHS), through its Office for Civil Rights (OCR), is mandated by law to collect and report disclosures of citizens' protected health information (PHI). The following tables and visualizations present several analyses of these data breaches and the individuals affected by them. Each record in the data contains the following fields:

• Name of the covered entity (Organization responsible for the PHI)

• State (US State where the breach was reported)

• Covered Entity Type (Type of organization responsible for the PHI)

• Individuals Affected (Number of records affected by the breach)

• Breach submission date (Date the breach was reported by the covered entity)

• Type of breach (How unauthorized access to the PHI was obtained)

• Location of breached information (Where the PHI was when unauthorized access was obtained)

• Business associate present (Whether a business associate, such as a consultant or contractor, was involved in the breach)

• Web description (An optional statement explaining what happened and the resolution)

Summary Statistics

The tables below show a few summary statistics I was interested in.

Number of Breaches in Ohio and Its Bordering States

This first table shows the number of breaches in Ohio and in each state that borders it. PA has the most with 64 breaches and WV has the fewest with 9. Ohio has the second most with 58.
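A table like this one can be produced by filtering to a fixed set of states and counting rows. The following is only a minimal sketch in R with dplyr, not the code used for this report: the data frame `breaches`, the column name `state`, and the rows themselves are made up for illustration.

```r
library(dplyr)

# Made-up rows for illustration; `state` is a hypothetical column name.
breaches <- tibble(
  state = c("OH", "PA", "PA", "WV", "CA", "OH", "KY", "TX")
)

# Ohio plus the five states that share a border with it.
ohio_region <- c("OH", "PA", "WV", "KY", "IN", "MI")

region_counts <- breaches %>%
  filter(state %in% ohio_region) %>%   # keep only Ohio and its neighbors
  count(state, name = "n_breaches") %>% # one row per state with its breach count
  arrange(desc(n_breaches))             # largest counts first

print(region_counts)
```

States in `ohio_region` with no rows in the data (IN and MI here) simply do not appear in the result.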

Number of Breaches by Location

The second table breaks down where the breached information was located: on a laptop, on paper/films, on a network server, in email, or in some other location. From this we can conclude that laptops are the most common location for a breach, with 274 instances. This is plausible given the malicious links that people can click or the malicious software that may end up on their laptops.

List of the 25 Smallest Healthcare Data Breaches

Looking at the smallest data breaches, the table shows that almost all of them affected exactly 500 people; only two affected 501 instead. The minimum number of people affected in this data set is 500 (OCR publicly posts only breaches affecting 500 or more individuals), so it makes sense that most of the smallest breaches sit at exactly 500.

Was there a Business Associate Present?

Next, I looked into whether a business associate, meaning another party such as a consultant or contractor, was involved in the breach. I found that, more often than not and by a large margin, no business associate was involved: 1,356 breaches did not involve a business associate, whereas 353 did.

10 States with the Fewest People Affected

Lastly, I looked at the states with the fewest people affected by breaches. Most of these are either smaller or more remote states. For instance, AK has the fourth fewest people affected at 9,053. The entry I found most interesting in this table was Washington, D.C. With the government and everything that goes on there, I would have expected it to be toward the top of the list rather than among those with the lowest counts.
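Ranking states by total people affected comes down to a grouped sum followed by a sort. The sketch below assumes hypothetical column names (`state`, `individuals_affected`) and uses made-up rows; it is an illustration of the approach rather than the code behind the table.

```r
library(dplyr)

# Made-up rows for illustration; column names are hypothetical.
breaches <- tibble(
  state = c("AK", "AK", "DC", "CA", "CA", "VT"),
  individuals_affected = c(500, 1200, 800, 50000, 700, 665)
)

fewest_affected <- breaches %>%
  group_by(state) %>%
  summarise(total_affected = sum(individuals_affected), .groups = "drop") %>%
  arrange(total_affected) %>%  # smallest totals first
  head(10)                     # keep at most the 10 lowest states

print(fewest_affected)
```

Sorting in descending order instead (`arrange(desc(total_affected))`) would give the states with the most people affected.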

Creating Visualizations and Tables

The output for every question in this section is either a visualization or a data table.

Number of Healthcare Data Breaches by Year

This graph shows that the most breaches occurred in 2014, with 2013 just below that. The fewest breaches occurred in 2018, and 2009 also had a low number of breaches. The other years (’10, ’11, ’12, ’15, ’16, ’17) are all relatively close to one another in the number of breaches.
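The yearly counts behind a graph like this can be derived from the submission date alone. The sketch below is an assumption about how that might look: `submission_date` is a hypothetical column name and the dates are made up.

```r
library(dplyr)

# Made-up dates for illustration; `submission_date` is a hypothetical column name.
breaches <- tibble(
  submission_date = as.Date(c("2013-03-01", "2014-07-15", "2014-11-02", "2009-12-20"))
)

by_year <- breaches %>%
  mutate(year = as.integer(format(submission_date, "%Y"))) %>%  # extract the calendar year
  count(year, name = "n_breaches")                               # breaches per year

print(by_year)
```

Replacing `"%Y"` with `"%m"` would give counts by month instead of by year.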

List of the Top 25 Largest Healthcare Data Breaches

## # A tibble: 25 x 16
##    `Name of Covered E~ State `Covered Entity ~ `Individuals Af~ `Breach Submiss~
##    <chr>               <chr> <chr>                        <dbl> <date>          
##  1 Anthem (Working fi~ IN    Health Plan               78800000 2015-02-13      
##  2 Science Applicatio~ VA    Business Associa~          4900000 2011-11-04      
##  3 Advocate Health an~ IL    Healthcare Provi~          4029530 2013-08-23      
##  4 21st Century Oncol~ FL    Healthcare Provi~          2213597 2016-03-04      
##  5 Xerox State Health~ TX    Business Associa~          2000000 2014-09-10      
##  6 IBM                 NY    Business Associa~          1900000 2011-04-14      
##  7 GRM Information Ma~ NJ    Business Associa~          1700000 2011-02-11      
##  8 AvMed, Inc.         FL    Health Plan                1220000 2010-06-03      
##  9 Montana Department~ MT    Health Plan                1062509 2014-07-07      
## 10 The Nemours Founda~ FL    Healthcare Provi~          1055489 2011-10-07      
## # ... with 15 more rows, and 11 more variables: Type of Breach <chr>,
## #   Location of Breached Information <chr>, Business Associate Present <chr>,
## #   Web Description <chr>, Hacking <lgl>, ImproperDisposal <lgl>, Loss <lgl>,
## #   Theft <lgl>, UnauthorizedAccess <lgl>, Unknown <lgl>, Other <lgl>

This table lists the 25 largest healthcare data breaches. The largest was the Anthem breach, which affected 78.8 million people in 2015. That is far larger than the next largest breach, which affected 4.9 million people in 2011. The latter is still a considerable number of people, but not when compared with Anthem’s breach. These breaches vary in state, year, type of breach, and location of the breached information, meaning that no single fix would have prevented all of them.

Total Healthcare Records Exposed by State for the Top 10 States

IN was the state with the most people affected by breaches, with just over 79.5 million. The main reason is that the Anthem breach, which affected 78.8 million people, happened in IN. Without this breach, IN would have only about 800,000 people affected. IN is the top state by far; the next state (FL) has just over 6 million people affected.

Number of Healthcare Hacking Incidents by Month

This graph shows the number of breaches by month. October has the most, with just under 120 breaches. March and April also had a relatively high number of breaches. The months with the fewest breaches were January and February, with around 90 each.

Number of Breaches by Covered Entity Type

Next, I looked at whether the number of breaches differed by the type of entity that was responsible. The covered entity type with the most breaches was healthcare provider, with 1,220 instances, while the type with the fewest was healthcare clearing house, with only 4. This is a very large gap between the highest and lowest, which makes one wonder why this is the case.

On What Day of the Week Are Breaches Most Often Reported?

Looking at the graph, Saturday and Sunday have the fewest breaches reported, and Friday has the most. This could be because companies audit their systems weekly and find breaches at the end of the week. Also, few people work on the weekend, which results in fewer breaches being reported then.
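A day-of-week count can be computed directly from the submission dates. As a small illustration, the sketch below uses the first five submission dates printed in the top-25 table earlier in this report; the column name `submission_date` and the helper vector `day_names` are assumptions introduced for this example, and the weekday is taken from `as.POSIXlt()` so the result does not depend on the system locale.

```r
library(dplyr)

# The first five submission dates shown in the top-25 table above.
# `submission_date` is a hypothetical column name.
breaches <- tibble(
  submission_date = as.Date(c("2015-02-13", "2011-11-04", "2013-08-23",
                              "2016-03-04", "2014-09-10"))
)

# as.POSIXlt()$wday returns 0 for Sunday through 6 for Saturday.
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")

by_weekday <- breaches %>%
  mutate(
    weekday = factor(day_names[as.POSIXlt(submission_date)$wday + 1],
                     levels = day_names)  # factor keeps calendar order
  ) %>%
  count(weekday, name = "n_breaches")

print(by_weekday)
```

Five rows are far too few to say anything about the full data set; the point is only to show the mechanics of the count.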

Years With at Least 50 Breaches From ‘Business Associate’ Covered Entities or at Least 150 Breaches From ‘Healthcare Provider’ Covered Entities

An earlier graph showed that 2013 and 2014 had the highest number of breaches. Given the table, it makes sense that these two years are the ones with at least 50 breaches from the business associate covered entity type and at least 150 breaches from the healthcare provider covered entity type.

Each Type of Breach Changing Over the Years

This graph shows the total number of breaches per year, broken down by type of breach. Over the years, breaches of the hacking and unauthorized access/disclosure types have increased. This makes sense given that technology is always changing and more and more people have become tech savvy over the years.
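The column listing in the top-25 table shows logical flag columns such as `Hacking` and `Theft`. Assuming those flags mark which breach types apply to each record, yearly totals per type could be computed by summing the flags within each year. This is a sketch under that assumption: the `year` column and all rows are made up, and a record is allowed to carry more than one flag.

```r
library(dplyr)

# Made-up rows for illustration. `Hacking` and `Theft` mirror the flag
# columns listed in the tibble output; `year` is a hypothetical column.
breaches <- tibble(
  year    = c(2010, 2010, 2016, 2016, 2016),
  Hacking = c(FALSE, FALSE, TRUE, TRUE, FALSE),
  Theft   = c(TRUE, TRUE, FALSE, TRUE, FALSE)
)

type_by_year <- breaches %>%
  group_by(year) %>%
  summarise(
    hacking = sum(Hacking),  # TRUE counts as 1, FALSE as 0
    theft   = sum(Theft),
    .groups = "drop"
  )

print(type_by_year)
```

The same pattern extends to the other flag columns by adding one `sum()` per flag.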

Self-Directed Analytical Questions

I started by looking at the number of breaches for each state. CA had the most with 207, followed by TX and then FL. ME and DE had the fewest, with only 2 each. After looking at those numbers, I looked at the average number of people affected per breach for each state. IN had the highest average because of its breach that affected 78.8 million people. Because this outlier skews the data, I also looked at the median for each state to get a better picture. By that measure, WV has the highest median at just over 9,000 people affected, and VT has the lowest at 665.
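The three per-state statistics discussed here (count, mean, and median) can be computed in one grouped summary, which also makes the effect of a single very large breach easy to see. The sketch below uses hypothetical column names and made-up rows, except that the 78.8 million figure for the IN outlier is taken from the text.

```r
library(dplyr)

# Made-up rows for illustration; column names are hypothetical.
# The 78,800,000 value reflects the Anthem breach size given in the text.
breaches <- tibble(
  state = c("IN", "IN", "IN", "VT", "VT", "VT"),
  individuals_affected = c(500, 1000, 78800000, 600, 665, 700)
)

state_summary <- breaches %>%
  group_by(state) %>%
  summarise(
    n_breaches      = n(),
    mean_affected   = mean(individuals_affected),
    median_affected = median(individuals_affected),
    .groups = "drop"
  ) %>%
  arrange(desc(mean_affected))

print(state_summary)
```

In this toy data the IN mean is in the tens of millions while its median is 1,000, which is why the median gives a better picture of a typical breach when one record dominates.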

Table of the Number of Breaches, Average People Affected, and Median Number Affected by State

Number of Breaches in Each State

Mean Number of People Affected by a Breach in Each State

Median Number of People Affected by a Breach in Each State

Deeper Dive into a State

After looking at the data above, I wanted to look into a state that seemed to have a problem. One state that might have issues is NH. It has the sixth highest average number of people affected, which I found interesting because it had only four breaches. One of those breaches affected 231,400 people, which raised the average. The four breaches occurred in 2011, 2012, and 2014. I found this interesting because only one of them fell in the years with the most breaches overall (’13 and ’14).

Number of Healthcare Data Breaches in New Hampshire

Number of People Affected by a Breach in New Hampshire

Table of Breaches that Happened in New Hampshire (2011)