Breach of Unsecured Protected Health Information

GJA

2023-02-23

Introduction

This data set is a collection of reported breaches of unsecured protected health information (PHI), which covered entities are required by law to disclose. The law also requires the Office for Civil Rights (OCR) to publish breaches in which a covered entity (CE, an organization responsible for protecting health information) has a breach that affects 500 or more individuals.

Number of health data breaches by year

This first graph gives an overview of the total number of breaches by year. The years 2013 and 2014 stand out compared with the rest of the dataset.

List of the top 25 largest data breaches by year

We wanted to find the 25 largest data breaches in the dataset. One breach, Anthem's at 78.8 million individuals affected, is far larger than the rest: roughly 16 times the second largest.

Name of Covered Entity | Individuals Affected
Anthem, Inc. Affiliated Covered Entity | 78800000
Science Applications International Corporation (SA | 4900000
Advocate Health and Hospitals Corporation, d/b/a Advocate Medical Group | 4029530
21st Century Oncology | 2213597
Xerox State Healthcare, LLC | 2000000
IBM | 1900000
GRM Information Management Services | 1700000
AvMed, Inc. | 1220000
Montana Department of Public Health & Human Services | 1062509
The Nemours Foundation | 1055489
BlueCross BlueShield of Tennessee, Inc. | 1023209
Sutter Medical Foundation | 943434
Valley Anesthesiology Consultants, Inc. d/b/a Valley Anesthesiology and Pain Consultants | 882590
Horizon Healthcare Services, Inc., doing business as Horizon Blue Cross Blue Shield of New Jersey, and its affiliates | 839711
Iron Mountain Data Products, Inc. (now known as | 800000
Utah Department of Technology Services | 780000
AHMC Healthcare Inc. and affiliated Hospitals | 729000
EISENHOWER MEDICAL CENTER | 514330
Radiology Regional Center, PA | 483063
Puerto Rico Department of Health - Triple S Management Corp. | 475000
St Joseph Health System | 405000
Spartanburg Regional Healthcare System | 400000
Triple-S Salud, Inc. - Breach Case#2 | 398000
Triple-S Salud, Inc. | 398000
Community Health Plan of Washington | 381504
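
A ranking like the one above can be produced by sorting on the number of individuals affected and keeping the first rows. The following is a minimal base R sketch, not the code used for this report: the data frame `breaches`, its column names, and the helper `top_n_breaches` are assumptions made for illustration, and only four rows from the table are included.

```r
# Toy data: names and counts copied from four rows of the table above.
# The column names are hypothetical, not the data set's real ones.
breaches <- data.frame(
  name = c("IBM", "Anthem, Inc. Affiliated Covered Entity",
           "AvMed, Inc.", "21st Century Oncology"),
  individuals_affected = c(1900000, 78800000, 1220000, 2213597),
  stringsAsFactors = FALSE
)

# Sort by individuals affected, largest first, and keep the first n rows.
top_n_breaches <- function(df, n = 25) {
  ranked <- df[order(df$individuals_affected, decreasing = TRUE), ]
  head(ranked, n)
}

top3 <- top_n_breaches(breaches, n = 3)
print(top3)
```

With the full data set, calling the helper with its default `n = 25` would give the 25 largest breaches.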

Total health records (individuals affected) exposed by state (Top 10)

We are looking at the 10 states that were impacted the most by the data breaches tracked in the dataset. Indiana is an enormous outlier, mainly due to the Anthem data breach (disclosed in 2015), which affected 78.8 million individuals.

Number of healthcare hacking incidents by month

This bar chart shows the total number of hacking/IT incidents against healthcare providers by month. The incidents are spread fairly evenly across the year, which suggests that hacking has no strong seasonal pattern and can happen at any time.

Number of breaches by covered entity types

This visualization shows the number of breaches by covered entity type. Healthcare providers account for the most breaches.

Which day of the week are breaches reported the most?

This code finds the day of the week on which breaches are most often reported. In the numbering used here, 1 is Sunday, 2 is Monday, and so on (this is, for example, the default of lubridate's wday(); base R's POSIXlt wday field instead starts at 0 for Sunday). The output is 6, so breaches are most often reported on a Friday.
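
As an illustration of the day-of-week numbering, here is a minimal base R sketch. It is not the code behind the result above: the dates are made up, and the variable names are hypothetical. It adds 1 to base R's 0-based weekday so that 1 means Sunday.

```r
# Hypothetical report dates (not taken from the data set).
# 2014-03-07, 2014-03-14 and 2014-03-21 are Fridays; 2014-03-10 is a Monday.
reported <- as.Date(c("2014-03-07", "2014-03-14", "2014-03-10", "2014-03-21"))

# as.POSIXlt()$wday numbers days 0 (Sunday) to 6 (Saturday);
# adding 1 gives the 1 = Sunday, ..., 7 = Saturday convention.
day_number <- as.POSIXlt(reported)$wday + 1

# Count the reports per day number and pick the most frequent one.
counts <- table(day_number)
most_common <- as.integer(names(counts)[which.max(counts)])
print(most_common)  # 6, i.e. Friday
```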

In which year (or years) were there at least 50 breaches from a ‘Business Associate’ covered entity type and at least 150 breaches from a healthcare provider covered entity type?

We want the years that have both 50 or more breaches from business associates and 150 or more breaches from healthcare providers. Those years are 2013 and 2014.

Year
2013
2014
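
One way to express "both conditions in the same year" is to find the years that satisfy each threshold separately and intersect them. The sketch below uses base R with invented counts; the data frame `counts` and its column names are assumptions for illustration, so its result differs from the table above.

```r
# Hypothetical per-year breach counts by entity type (illustrative only).
counts <- data.frame(
  year = c(2012, 2012, 2013, 2013, 2014, 2014),
  entity_type = rep(c("Business Associate", "Healthcare Provider"), 3),
  n = c(45, 160, 60, 170, 55, 140),
  stringsAsFactors = FALSE
)

# Years meeting each threshold on its own.
ba_years <- counts[counts$entity_type == "Business Associate" & counts$n >= 50, "year"]
hp_years <- counts[counts$entity_type == "Healthcare Provider" & counts$n >= 150, "year"]

# Years meeting both thresholds.
years_both <- intersect(ba_years, hp_years)
print(years_both)
```

In this toy data only 2013 passes both thresholds: 2012 has too few business associate breaches and 2014 too few healthcare provider breaches.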

How has the type of breach (hacking, improper disposal, loss, etc.) changed for each year?

We want to see how the number of breaches of each type changed from 2009 to 2018. Each breach type is stored as a dummy (0/1) variable, so I grouped the data by year and summed each dummy variable to get the yearly count for that type. In the table below, theft is the most common type every year through 2014; from 2015 on, unauthorized access is the most common.

Year | Hacks by year | Disposals by year | Unauthorized Access by year | Theft by year | Loss by year | Unknown by year
2009 | 0 | 0 | 0 | 15 | 1 | 0
2010 | 8 | 10 | 10 | 135 | 20 | 0
2011 | 17 | 7 | 34 | 122 | 18 | 7
2012 | 17 | 8 | 40 | 124 | 20 | 2
2013 | 27 | 13 | 73 | 131 | 24 | 3
2014 | 37 | 11 | 98 | 111 | 30 | 1
2015 | 25 | 6 | 80 | 64 | 23 | 0
2016 | 71 | 7 | 96 | 46 | 12 | 0
2017 | 32 | 4 | 43 | 17 | 9 | 0
2018 | 0 | 0 | 1 | 0 | 0 | 0
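
Summing dummy variables by year can be sketched in base R with `aggregate()`. This is an illustration under assumptions, not the original code: the toy rows and the column names `hack` and `theft` are hypothetical, and only two breach types are shown.

```r
# Toy data: one row per breach, with 0/1 indicator columns
# (hypothetical column names and values).
breaches <- data.frame(
  year  = c(2010, 2010, 2010, 2011, 2011),
  hack  = c(1, 0, 0, 1, 1),
  theft = c(0, 1, 1, 0, 0)
)

# Summing a 0/1 indicator within each year counts the breaches of that type.
by_year <- aggregate(cbind(hack, theft) ~ year, data = breaches, FUN = sum)
print(by_year)
```

Each row of `by_year` then holds one year and the number of breaches of each type in that year, which is the shape of the table above.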

How many individuals affected in Michigan?

Having been born and raised in Michigan, I wanted to know how many people in Michigan were affected by data breaches each year. The most interesting result is how large an outlier 2010 is: its total is about five times that of the next-highest year, 2014. I would like to look into the dataset to see which company was responsible for that breach and how it occurred.

State | Year | Total Individuals Affected in MI
MI | 2009 | 10646
MI | 2010 | 110493
MI | 2011 | 21440
MI | 2012 | 11661
MI | 2013 | 7894
MI | 2014 | 22633
MI | 2015 | 9157
MI | 2016 | 8913
MI | 2017 | 10329
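
The size of the 2010 outlier can be checked directly from the yearly totals in the table. The base R sketch below copies those totals into a data frame (the names `mi` and `individuals` are assumptions for illustration) and compares the largest year with the second largest.

```r
# Yearly totals for Michigan, copied from the table above.
mi <- data.frame(
  year = 2009:2017,
  individuals = c(10646, 110493, 21440, 11661, 7894, 22633, 9157, 8913, 10329)
)

# Rank the years from most to fewest individuals affected.
ranked <- mi[order(mi$individuals, decreasing = TRUE), ]

# Ratio of the largest yearly total to the second largest.
ratio <- ranked$individuals[1] / ranked$individuals[2]
print(ranked$year[1:2])
print(round(ratio, 2))
```

The two highest years are 2010 and 2014, and the ratio comes out just under 5.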

Total healthcare records exposed by state for the bottom 5 states

One of the questions in the assignment was to list the top 10 states affected by data breaches. In contrast, I wanted to know the five states least affected by data breaches. Those states are Maine, Delaware, Alaska, Vermont and Hawaii. To my surprise, Hawaii and Alaska are much higher than the other states listed. This may be because those states are geographically isolated from the mainland United States and operate their own entities.
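
Finding the least affected states follows the same pattern as the top 10, with the sort order reversed. The base R sketch below is illustrative only: the state codes and counts are invented placeholders, and the column names are assumptions rather than the data set's real ones.

```r
# Hypothetical breach rows (placeholder state codes and counts).
breaches <- data.frame(
  state = c("AA", "BB", "AA", "CC", "DD", "EE", "FF", "BB"),
  individuals_affected = c(500, 1200, 700, 300, 9000, 650, 4000, 100),
  stringsAsFactors = FALSE
)

# Total individuals affected per state.
state_totals <- aggregate(individuals_affected ~ state, data = breaches, FUN = sum)

# Sort ascending and keep the five smallest totals.
bottom5 <- head(state_totals[order(state_totals$individuals_affected), ], 5)
print(bottom5)
```

Replacing `head(..., 5)` with a descending sort and `head(..., 10)` would give the top 10 instead.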