Introduction
This data set is a collection of reported breaches of protected health information (PHI), which covered entities are required by law to disclose. Under that law, the Office for Civil Rights (OCR) at the U.S. Department of Health and Human Services publishes cases in which a covered entity (CE, an organization responsible for protecting health information) has a breach that affects 500 or more individuals.
Number of health data breaches by year
This first graph is an overview of the total number of breaches by year. The years 2013 and 2014 stand out compared with the rest of the dataset.
List of the 25 largest data breaches
We wanted to find the 25 largest data breaches in the dataset, ranked by the number of individuals affected. One breach is far larger than the rest: Anthem, at 78.8 million individuals, is roughly 16 times the size of the next-largest breach.
| Name of Covered Entity | Individuals Affected |
|---|---|
| Anthem, Inc. Affiliated Covered Entity | 78800000 |
| Science Applications International Corporation (SA | 4900000 |
| Advocate Health and Hospitals Corporation, d/b/a Advocate Medical Group | 4029530 |
| 21st Century Oncology | 2213597 |
| Xerox State Healthcare, LLC | 2000000 |
| IBM | 1900000 |
| GRM Information Management Services | 1700000 |
| AvMed, Inc. | 1220000 |
| Montana Department of Public Health & Human Services | 1062509 |
| The Nemours Foundation | 1055489 |
| BlueCross BlueShield of Tennessee, Inc. | 1023209 |
| Sutter Medical Foundation | 943434 |
| Valley Anesthesiology Consultants, Inc. d/b/a Valley Anesthesiology and Pain Consultants | 882590 |
| Horizon Healthcare Services, Inc., doing business as Horizon Blue Cross Blue Shield of New Jersey, and its affiliates | 839711 |
| Iron Mountain Data Products, Inc. (now known as | 800000 |
| Utah Department of Technology Services | 780000 |
| AHMC Healthcare Inc. and affiliated Hospitals | 729000 |
| EISENHOWER MEDICAL CENTER | 514330 |
| Radiology Regional Center, PA | 483063 |
| Puerto Rico Department of Health - Triple S Management Corp. | 475000 |
| St Joseph Health System | 405000 |
| Spartanburg Regional Healthcare System | 400000 |
| Triple-S Salud, Inc. - Breach Case#2 | 398000 |
| Triple-S Salud, Inc. | 398000 |
| Community Health Plan of Washington | 381504 |
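The ranking above amounts to sorting breaches by individuals affected and keeping the first few. The report's own R code is not shown, so the following is only a minimal Python sketch of that idea; the function name `largest_breaches` is hypothetical, and the sample rows are a handful copied from the table rather than the full dataset.

```python
import heapq

# A few (covered entity, individuals affected) rows copied from the table above.
breaches = [
    ("IBM", 1900000),
    ("Anthem, Inc. Affiliated Covered Entity", 78800000),
    ("21st Century Oncology", 2213597),
    ("Xerox State Healthcare, LLC", 2000000),
    ("Advocate Health and Hospitals Corporation, d/b/a Advocate Medical Group", 4029530),
]


def largest_breaches(rows, n):
    """Return the n rows with the most individuals affected, largest first."""
    return heapq.nlargest(n, rows, key=lambda row: row[1])


top3 = largest_breaches(breaches, 3)
for name, affected in top3:
    print(f"{affected:>12,}  {name}")
```

Calling `largest_breaches(rows, 25)` on the full dataset would give a table in the same shape as the one above.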
Total health records (individuals affected) exposed by state (Top 10)
We are looking at the 10 states with the most individuals affected by the data breaches tracked in the dataset. Indiana is an enormous outlier, mainly due to the Anthem data breach in 2013.
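A state ranking like this one comes from summing individuals affected per state and then sorting the totals. The sketch below shows that aggregation in Python rather than the report's R. The function names are hypothetical, and the state codes and counts are made-up placeholders, not values from the dataset.

```python
from collections import defaultdict


def total_by_state(rows):
    """Sum individuals affected for each state code."""
    totals = defaultdict(int)
    for state, affected in rows:
        totals[state] += affected
    return dict(totals)


def rank_states(rows, n, largest=True):
    """Return the n states with the largest (or smallest) totals."""
    totals = total_by_state(rows)
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=largest)
    return ranked[:n]


# Placeholder rows: (state code, individuals affected). Not real data.
toy_rows = [("AA", 500), ("BB", 1200), ("AA", 700), ("CC", 900), ("BB", 100)]

top_two = rank_states(toy_rows, 2)                    # most affected first
bottom_one = rank_states(toy_rows, 1, largest=False)  # least affected first
```

The same function with `largest=False` covers a "least affected states" question, so one aggregation step serves both ends of the ranking.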
Number of healthcare hacking incidents by month
This bar chart shows the total number of hacking/IT incidents affecting healthcare providers by month. The incidents are spread fairly evenly throughout the year, which suggests that hacking is not seasonal and can happen at any time.
Number of breaches by covered entity types
This visualization shows the number of breaches by covered entity type. Healthcare providers account for the most breaches.
Which day of the week are breaches reported the most?
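This code finds the day of the week on which breaches are most often reported. With the weekday numbering used here, 1 is Sunday, 2 is Monday, and so on (this is the default for lubridate's `wday()`; base R's `as.POSIXlt()$wday` instead starts at 0 for Sunday). The output is 6, which means the most common day of the week for breach reports is Friday.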
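Weekday numbering differs between tools, so it is worth making the convention explicit. The Python sketch below is not the report's R code: it converts Python's ISO numbering (1 = Monday through 7 = Sunday) to the Sunday-first numbering described above and then takes the most frequent value. The helper names are hypothetical and the four dates are illustrative, not drawn from the dataset.

```python
from collections import Counter
from datetime import date


def sunday_first_weekday(d):
    """Map a date to 1 = Sunday, 2 = Monday, ..., 7 = Saturday."""
    # date.isoweekday() returns 1 = Monday ... 7 = Sunday.
    return d.isoweekday() % 7 + 1


def most_common_weekday(dates):
    """Return the Sunday-first weekday number that occurs most often."""
    counts = Counter(sunday_first_weekday(d) for d in dates)
    return counts.most_common(1)[0][0]


# Illustrative report dates only (two Fridays, a Monday, a Wednesday).
sample_dates = [
    date(2014, 1, 3),
    date(2014, 1, 6),
    date(2014, 1, 10),
    date(2014, 1, 8),
]

print(most_common_weekday(sample_dates))  # 6, i.e. Friday
```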
In which year (or years) were there at least 50 breaches from a ‘Business Associate’ covered entity type and at least 150 breaches from a healthcare provider covered entity type?
We are trying to find the years that include both 50 or more breaches from a business associate and 150 or more breaches from a healthcare provider. We see that those years are 2013 and 2014.
| Year |
|---|
| 2013 |
| 2014 |
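This question combines two per-year counts with a logical AND. One way to express it, sketched in Python rather than the report's R, is to count breaches per (year, entity type) pair and keep the years that pass both thresholds. The function name is hypothetical, and the toy records and small thresholds below are placeholders chosen so the result is easy to check by hand.

```python
from collections import Counter


def years_meeting_both(records, min_associate, min_provider):
    """Years with enough breaches from both entity types.

    records is an iterable of (year, entity_type) pairs, one per breach.
    """
    records = list(records)
    counts = Counter(records)
    years = sorted({year for year, _ in records})
    return [
        year
        for year in years
        if counts[(year, "Business Associate")] >= min_associate
        and counts[(year, "Healthcare Provider")] >= min_provider
    ]


# Toy records with placeholder years; not real data.
toy_records = (
    [(2001, "Business Associate")] * 2
    + [(2001, "Healthcare Provider")] * 3
    + [(2002, "Business Associate")] * 1
    + [(2002, "Healthcare Provider")] * 5
    + [(2003, "Business Associate")] * 4
    + [(2003, "Healthcare Provider")] * 2
)

matching_years = years_meeting_both(toy_records, min_associate=2, min_provider=3)
```

On the real data the thresholds would be 50 and 150, matching the question above.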
How has the type of breach (hacking, improper disposal, loss, etc.) changed for each year?
We want to see how the number of breaches by type has changed from 2009 to 2018. I summed the dummy (0/1) variable for each breach type and grouped the results by year, so each row shows one year's counts and each column can be followed over time. Theft is the leading type through 2014; from 2015 onward unauthorized access leads, and hacking peaks at 71 in 2016.
| Year | Hacks by year | Disposals by year | Unauthorized Access by year | Theft by year | Loss by year | Unknown by year |
|---|---|---|---|---|---|---|
| 2009 | 0 | 0 | 0 | 15 | 1 | 0 |
| 2010 | 8 | 10 | 10 | 135 | 20 | 0 |
| 2011 | 17 | 7 | 34 | 122 | 18 | 7 |
| 2012 | 17 | 8 | 40 | 124 | 20 | 2 |
| 2013 | 27 | 13 | 73 | 131 | 24 | 3 |
| 2014 | 37 | 11 | 98 | 111 | 30 | 1 |
| 2015 | 25 | 6 | 80 | 64 | 23 | 0 |
| 2016 | 71 | 7 | 96 | 46 | 12 | 0 |
| 2017 | 32 | 4 | 43 | 17 | 9 | 0 |
| 2018 | 0 | 0 | 1 | 0 | 0 | 0 |
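Summing dummy variables by year works because each breach contributes 1 to every type column it is flagged with and 0 to the rest. The sketch below shows that grouping in Python rather than the report's R. The column names and the three toy rows are assumptions for illustration, not the dataset's actual fields or values.

```python
from collections import defaultdict

# Hypothetical dummy-variable column names, one per breach type.
BREACH_TYPES = ["hack", "disposal", "unauthorized_access", "theft", "loss", "unknown"]


def sum_dummies_by_year(rows):
    """Add up each 0/1 breach-type flag within every year."""
    totals = defaultdict(lambda: dict.fromkeys(BREACH_TYPES, 0))
    for row in rows:
        for breach_type in BREACH_TYPES:
            # A missing flag is treated as 0.
            totals[row["year"]][breach_type] += row.get(breach_type, 0)
    return dict(sorted(totals.items()))


# Toy rows: one dict per breach; a breach may carry more than one flag.
toy_rows = [
    {"year": 2010, "theft": 1},
    {"year": 2010, "theft": 1, "loss": 1},
    {"year": 2011, "hack": 1},
]

by_year = sum_dummies_by_year(toy_rows)
```

Because a single breach can be flagged with several types, the row totals in a table built this way can exceed the number of breaches in that year.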
How many individuals were affected in Michigan?
Having been born and raised in Michigan, I wanted to know how many people in the state were affected by data breaches each year. The most interesting result is how large an outlier 2010 is: its total of 110,493 is about five times that of the next-highest year (22,633 in 2014). I would like to look into the dataset to see which organization was responsible and how the breach occurred.
| State | Year | Total Individuals Affected in MI |
|---|---|---|
| MI | 2009 | 10646 |
| MI | 2010 | 110493 |
| MI | 2011 | 21440 |
| MI | 2012 | 11661 |
| MI | 2013 | 7894 |
| MI | 2014 | 22633 |
| MI | 2015 | 9157 |
| MI | 2016 | 8913 |
| MI | 2017 | 10329 |
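The size of the 2010 outlier can be checked directly from the table. The short Python sketch below (Python is used here only as a calculator, not as the report's original R code) ranks the yearly totals and divides the largest by the second largest.

```python
# Year -> total individuals affected in Michigan, copied from the table above.
mi_totals = {
    2009: 10646,
    2010: 110493,
    2011: 21440,
    2012: 11661,
    2013: 7894,
    2014: 22633,
    2015: 9157,
    2016: 8913,
    2017: 10329,
}

# Sort years by total, largest first, and compare the top two.
ranked = sorted(mi_totals.items(), key=lambda item: item[1], reverse=True)
(peak_year, peak_total), (runner_up_year, runner_up_total) = ranked[:2]
ratio = peak_total / runner_up_total

print(f"{peak_year} is {ratio:.1f}x {runner_up_year}")  # 2010 is 4.9x 2014
```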
Total healthcare records exposed by state for the bottom 5 states
One of the assignment questions asked for the top 10 states affected by data breaches. In contrast, I wanted to know the five states least affected. Those states are Maine, Delaware, Alaska, Vermont and Hawaii. To my surprise, Hawaii and Alaska are much higher than the other states listed. One possible explanation is that those states are geographically separated from the mainland United States and operate their own entities.