My Work with Data Breaches
I am an analyst working for the US Department of Health and Human Services (HHS) in the Office for Civil Rights (OCR). My office is responsible for collecting and reporting disclosures of protected health information (PHI) as mandated by law. Part of the law requires that the OCR report cases where covered entities (CEs, the organizations responsible for protecting health information) have a breach that affects 500 or more individuals. The data reported for each of these breaches include:
• Name of the covered entity (Organization responsible for the PHI)
• State (US state where the breach was reported)
• Covered Entity Type (Type of organization responsible for the PHI)
• Individuals Affected (Number of records affected by the breach)
• Breach submission date (Date the breach was reported by the CE)
• Type of breach (How unauthorized access to the PHI was obtained)
• Location of breached information (Where the PHI was when unauthorized access was obtained)
• Business associate present (Whether a business associate such as a consultant or contractor was involved in the breach)
• Web description (An optional statement explaining what happened and the resolution)
In this document, I work through several questions about these data breaches.
Which state had the most and least breaches?
The first thing I wanted to look at was which states had the most and the fewest breaches. In the data below, California had a whopping 207 data breaches over the 10-year span. At the other end, Maine and Delaware had only 2 data breaches each over the same period. California should not come as a shock: it has the largest population of any state and a heavy concentration of tech companies.
| State | total_state |
|---|---|
| CA | 207 |
| State | total_state |
|---|---|
| DE | 2 |
| ME | 2 |
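The lookup behind these two small tables can be sketched as follows. This is a minimal illustration, not the code used to produce the report: it works on a handful of per-state counts copied from the state table later in this document, and the helper name `states_with_count` is made up for the example. Returning every state that matches the extreme value is what lets Delaware and Maine both show up as a tie.

```python
# Breach counts for a few states, copied from the per-state table in this report.
# This is only a subset, chosen to include the highest and lowest counts.
state_counts = {"CA": 207, "TX": 154, "FL": 124, "DE": 2, "ME": 2}


def states_with_count(counts, target):
    """Return every state whose breach count equals target, sorted by name."""
    return sorted(state for state, n in counts.items() if n == target)


# Using max()/min() on the values and then collecting all matches keeps ties.
most = states_with_count(state_counts, max(state_counts.values()))
least = states_with_count(state_counts, min(state_counts.values()))

print("Most breaches:", most)
print("Fewest breaches:", least)
```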
Number of Breaches by Location of Breached Information
This table looks at the number of breaches by the location of the breached information. I also wanted to analyze the average number of individuals affected per breach for each location. What I found was that paper/films had the most breaches and laptops the second most over the past 10 years. However, neither came close to the highest average number of individuals affected: setting aside the catch-all Other category, which had the highest average of all, desktop computers and network servers led the way. Electronic medical records had the fewest breaches and the lowest average number of individuals affected.
| Location of Breached Information | Number of Breaches | avg_individuals_affected |
|---|---|---|
| Desktop Computer | 133 | 52654.549 |
| Electronic Medical Record | 64 | 6093.297 |
| | 142 | 6502.993 |
| Laptop | 274 | 19111.580 |
| Network Server | 212 | 47610.877 |
| Other | 165 | 68217.794 |
| Other Portable Electronic Device | 101 | 7692.416 |
| Paper/Films | 405 | 6906.886 |
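To double-check which locations lead on each measure, the rows of the table can be ranked two ways. The sketch below simply re-sorts the figures printed above (the row whose location label is missing is left out), so it is a reading aid rather than the original analysis code.

```python
# (location, number of breaches, average individuals affected),
# copied from the table above; the unlabeled row is omitted.
rows = [
    ("Desktop Computer", 133, 52654.549),
    ("Electronic Medical Record", 64, 6093.297),
    ("Laptop", 274, 19111.580),
    ("Network Server", 212, 47610.877),
    ("Other", 165, 68217.794),
    ("Other Portable Electronic Device", 101, 7692.416),
    ("Paper/Films", 405, 6906.886),
]

# Rank once by breach count and once by average individuals affected.
by_count = sorted(rows, key=lambda r: r[1], reverse=True)
by_average = sorted(rows, key=lambda r: r[2], reverse=True)

print("Most breaches:", [r[0] for r in by_count[:2]])
print("Highest averages:", [r[0] for r in by_average[:3]])
print("Lowest average:", by_average[-1][0])
```

Ranking by both columns makes it clear that a location can have many breaches while each one affects relatively few people, as with paper/films.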
Number of Breaches by Covered Entity Type
This table is also pretty self-explanatory, as it shows how many breaches there are by covered entity type. Not surprisingly, of the four covered entity types, healthcare providers account for by far the most breaches. However, it is fascinating to see that health plans and business associates are also responsible for a sizable number of breaches.
| Covered Entity Type | Number of Breaches |
|---|---|
| Business Associate | 285 |
| Health Plan | 200 |
| Healthcare Clearing House | 4 |
| Healthcare Provider | 1220 |
Does having a business associate present affect the number of individuals affected?
I wanted to see whether the involvement of a business associate, such as a consultant or contractor, led to more individuals being affected by a breach. While this summary statistic does not tell the whole story, the results were still intriguing. Around 79% of the breaches (1,355 of 1,708) did not have a business associate present. However, among those that did, the average number of individuals affected was over 50,000.
| Business Associate Present | Number of Breaches | avg_individuals_affected |
|---|---|---|
| No | 1355 | 19676.01 |
| Yes | 353 | 53225.73 |
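The share of breaches without a business associate, and the overall average implied by the two groups, can be recomputed directly from the table. This is a small arithmetic sketch using only the four numbers printed above; the variable names are chosen for the example.

```python
# Figures copied from the table above.
no_ba_count, no_ba_avg = 1355, 19676.01
yes_ba_count, yes_ba_avg = 353, 53225.73

total = no_ba_count + yes_ba_count

# Share of breaches with no business associate present.
share_no_ba = no_ba_count / total

# The overall average is the group averages weighted by group size,
# not the plain midpoint of the two averages.
overall_avg = (no_ba_count * no_ba_avg + yes_ba_count * yes_ba_avg) / total

print(f"Total breaches: {total}")
print(f"Share without a business associate: {share_no_ba:.1%}")
print(f"Overall average individuals affected: {overall_avg:,.2f}")
```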
Data Breaches By Year
Looking at which years had the most data breaches, this graph shows that 2013 and 2014 led the way. It is also very interesting to see that 2017 had significantly fewer data breaches.
Top 25 largest healthcare Data Breaches 2009-2018
This table looks at the 25 largest data breaches over this 10-year span. The biggest affected more than 78 million individuals and occurred at Anthem, Inc. in Indiana. The ones that really intrigue me are in smaller states and territories such as New Jersey and Puerto Rico, with small Puerto Rico having two data breaches that affected a combined total of 873,000 individuals. Puerto Rico, get your data secured!
## # A tibble: 25 × 16
## `Name of Covered E… State `Covered Entity … `Individuals Af… `Breach Submiss…
## <chr> <chr> <chr> <dbl> <date>
## 1 Anthem, Inc. Affil… IN Health Plan 78800000 2015-03-13
## 2 Science Applicatio… VA Business Associa… 4900000 2011-11-04
## 3 Advocate Health an… IL Healthcare Provi… 4029530 2013-08-23
## 4 21st Century Oncol… FL Healthcare Provi… 2213597 2016-03-04
## 5 Xerox State Health… TX Business Associa… 2000000 2014-09-10
## 6 IBM NY Business Associa… 1900000 2011-04-14
## 7 GRM Information Ma… NJ Business Associa… 1700000 2011-02-11
## 8 AvMed, Inc. FL Health Plan 1220000 2010-06-03
## 9 Montana Department… MT Health Plan 1062509 2014-07-07
## 10 The Nemours Founda… FL Healthcare Provi… 1055489 2011-10-07
## # … with 15 more rows, and 11 more variables: Type of Breach <chr>,
## # Location of Breached Information <chr>, Business Associate Present <chr>,
## # Web Description <chr>, hacking_IT <lgl>, improper_disposal <lgl>,
## # loss <lgl>, other <lgl>, theft <lgl>, unauthorized_access_disclosure <lgl>,
## # unknown <lgl>
What states have been exposed the most on paper?
Similar to the previous section, this graph looks at which states have had the most individuals affected by breaches. The top 10 states should not come as much of a shock: California, Florida, Illinois, New York, and Texas are among the most populous states, which makes them prone to data breaches. The others have each seen large data breaches, especially Indiana, where the graph shows just how huge the single Anthem breach was compared to the rest.
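The per-state totals behind a graph like this come from summing individuals affected within each state. As a rough sketch, the snippet below does that for only the ten rows visible in the top-25 table printed earlier, so the totals are partial and will not match the full graph; it is meant to show the aggregation, not reproduce the figure.

```python
from collections import defaultdict

# (state, individuals affected) for the ten rows shown in the top-25 table.
# The remaining 15 rows are not printed in this report, so these sums are partial.
visible_rows = [
    ("IN", 78800000), ("VA", 4900000), ("IL", 4029530), ("FL", 2213597),
    ("TX", 2000000), ("NY", 1900000), ("NJ", 1700000), ("FL", 1220000),
    ("MT", 1062509), ("FL", 1055489),
]

# Sum individuals affected per state.
totals = defaultdict(int)
for state, individuals in visible_rows:
    totals[state] += individuals

# Rank states from most to fewest individuals affected.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

for state, individuals in ranked:
    print(f"{state}: {individuals:,}")
```

Even in this partial view, Indiana's single Anthem breach outweighs every other state combined, while Florida climbs the ranking through three separate large breaches.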
Healthcare Hacking Incidents by Month
This graph looks at how many hacking or IT incidents occurred in this 10-year span and which months saw the most. The spring months of March and April, along with the fall months of September and October, see the most incidents. Another interesting takeaway is that the winter months see the fewest hacking incidents, except for December, which fits with the idea that organizations are short-staffed over the holiday season and more vulnerable to being hacked.
Number of Breaches by Covered Entity Type
Similar to the table I made earlier, this table goes a little further by adding the median number of individuals affected for each covered entity type. It ties back to the earlier look at business associates: while healthcare providers are responsible for most of these breaches, a typical business associate breach affects more individuals (median 3,164) than a typical healthcare provider breach (median 1,963.5). Healthcare clearing houses have the highest median, but with only 4 breaches that figure is not very reliable.
| Covered Entity Type | Number of Breaches | median_individuals_affected |
|---|---|---|
| Business Associate | 285 | 3164.0 |
| Health Plan | 200 | 2807.0 |
| Healthcare Clearing House | 4 | 3252.0 |
| Healthcare Provider | 1220 | 1963.5 |
On what day of the week are breaches often reported?
This graph looks at which days of the week these breaches are most often reported. It should not come as a shock that almost all of the breaches are reported on weekdays, with Saturday and Sunday having hardly any. I am surprised, though, that most of these breaches were reported on a Friday. Keep in mind that the date in the data is the submission date, not the date of the breach itself. That leaves one to consider whether organizations simply tend to discover and report breaches at the end of the work week, or whether people target this day to breach data on purpose.
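Counting reports by weekday only needs the submission date of each breach. The sketch below applies the idea to the ten submission dates visible in the top-25 table earlier in this report, which is a tiny and unrepresentative sample, so it illustrates the method rather than the graph's actual counts.

```python
from collections import Counter
from datetime import date

# Submission dates of the ten breaches shown in the top-25 table.
submission_dates = [
    date(2015, 3, 13), date(2011, 11, 4), date(2013, 8, 23), date(2016, 3, 4),
    date(2014, 9, 10), date(2011, 4, 14), date(2011, 2, 11), date(2010, 6, 3),
    date(2014, 7, 7), date(2011, 10, 7),
]

# date.weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

weekday_counts = Counter(DAY_NAMES[d.weekday()] for d in submission_dates)

# Print in calendar order so days with zero reports are still visible.
for day in DAY_NAMES:
    print(f"{day}: {weekday_counts[day]}")
```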
In which year (or years) were there at least 50 breaches from a ‘Business Associate’ covered entity type and at least 150 breaches from a healthcare provider covered entity type?
We can use the first graph on this page to help answer this question. The years 2013 and 2014 had the most data breaches, both well over 250 in total, while almost every other year had fewer than 200. It makes sense, then, that these are the two years in which there were at least 50 Business Associate breaches and at least 150 Healthcare Provider breaches.
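The graph gives a strong hint, but the question can also be answered directly by counting breaches per year and covered entity type and keeping the years that clear both thresholds. The sketch below shows one way to do that. The function name `years_meeting_thresholds` and the toy records are made up for illustration (with deliberately unrealistic years and small thresholds), so the printed result says nothing about the real data.

```python
from collections import Counter


def years_meeting_thresholds(records, ba_min=50, provider_min=150):
    """Return the years with at least ba_min Business Associate breaches
    and at least provider_min Healthcare Provider breaches.

    records is a list of (year, covered_entity_type) pairs, one per breach.
    """
    counts = Counter(records)  # counts each (year, type) combination
    years = sorted({year for year, _ in records})
    return [
        year for year in years
        if counts[(year, "Business Associate")] >= ba_min
        and counts[(year, "Healthcare Provider")] >= provider_min
    ]


# Made-up toy records, NOT the real breach data.
toy_records = (
    [(2001, "Business Associate")] * 3 + [(2001, "Healthcare Provider")] * 5
    + [(2002, "Business Associate")] * 1 + [(2002, "Healthcare Provider")] * 6
    + [(2003, "Business Associate")] * 4 + [(2003, "Healthcare Provider")] * 2
)

# Thresholds are scaled down to suit the toy data.
print(years_meeting_thresholds(toy_records, ba_min=2, provider_min=4))
```

With the real data, the default thresholds of 50 and 150 would be used instead of the scaled-down ones.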
The Development of Breaching
How has data breaching developed over time? This graph looks at how the methods behind breaches have evolved over this 10-year span. First off, theft has declined significantly since 2014, while the last two years suggest that hacking is on the rise. As more health information lives on networked computer systems, it makes sense that hacking is starting to overtake theft as the main way data is breached. The other breach type that has become more prominent is unauthorized access/disclosure, which accounts for a growing percentage of breaches over the span.
A Deeper Dive Into Data Breaches by State
Are the states with the most breaches also the states with the highest average number of individuals affected per breach? I am intrigued because I believe the biggest states will have the most breaches, but do those breaches impact the most people? After looking at the data, it is interesting to see that some of the bigger states with the most breaches do not impact many individuals on average. However, looking only at the mean can be misleading, because a single enormous breach can pull it far upward, so I added the median as well to get a better picture. My biggest takeaway is that Puerto Rico concerns me: it had 31 data breaches, its breaches affected about 55,000 individuals on average, and its median was almost 8,000.
| State | Number of Breaches | avg_individuals_affected | median_individuals_affected |
|---|---|---|---|
| AK | 6 | 1508.833 | 1628.0 |
| AL | 25 | 43256.680 | 2000.0 |
| AR | 18 | 6386.611 | 2363.0 |
| AZ | 41 | 27732.634 | 2100.0 |
| CA | 207 | 14744.604 | 2250.0 |
| CO | 32 | 6773.438 | 1884.0 |
| CT | 25 | 8736.560 | 1506.0 |
| DC | 10 | 3851.900 | 2308.0 |
| DE | 2 | 1781.000 | 1781.0 |
| FL | 124 | 48401.815 | 2137.5 |
| GA | 51 | 11988.216 | 3000.0 |
| HI | 4 | 3584.000 | 1741.5 |
| IA | 10 | 4225.300 | 2171.0 |
| ID | 3 | 4987.333 | 5500.0 |
| IL | 86 | 54559.384 | 1324.5 |
| IN | 50 | 15535.300 | 1927.0 |
| KS | 11 | 8088.273 | 1700.0 |
| KY | 35 | 5649.257 | 2027.0 |
| LA | 18 | 6341.111 | 4536.5 |
| MA | 40 | 4543.500 | 2430.0 |
| MD | 30 | 11674.200 | 1247.5 |
| ME | 2 | 1387.000 | 1387.0 |
| MI | 41 | 5199.171 | 2777.0 |
| MN | 38 | 4845.921 | 1975.5 |
| MO | 32 | 4730.219 | 2226.5 |
| MS | 10 | 9970.600 | 1898.5 |
| MT | 10 | 115651.900 | 4600.0 |
| NC | 46 | 8057.848 | 1777.5 |
| ND | 4 | 3775.500 | 2226.0 |
| NE | 10 | 5131.700 | 1998.5 |
| NH | 4 | 59834.750 | 3584.0 |
| NJ | 20 | 152589.800 | 3000.0 |
| NM | 19 | 3583.263 | 2365.0 |
| NV | 12 | 8124.333 | 6619.5 |
| NY | 84 | 33120.690 | 1757.5 |
| OH | 58 | 12189.897 | 1200.0 |
| OK | 17 | 19565.824 | 4278.0 |
| OR | 29 | 10377.552 | 1980.0 |
| PA | 64 | 21508.141 | 2052.0 |
| PR | 31 | 54997.290 | 7911.0 |
| RI | 10 | 4587.700 | 1158.0 |
| SC | 22 | 34777.591 | 3141.0 |
| SD | 4 | 5944.750 | 4834.0 |
| TN | 42 | 41054.214 | 2481.0 |
| TX | 154 | 26235.117 | 2462.0 |
| UT | 11 | 79660.909 | 2600.0 |
| VA | 27 | 191037.074 | 2739.0 |
| VT | 3 | 1071.667 | 665.0 |
| WA | 45 | 15334.533 | 2367.0 |
| WI | 15 | 8170.333 | 2734.0 |
| WV | 9 | 9003.889 | 3655.0 |
| WY | 7 | 7912.000 | 9023.0 |
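One quick way to see how much a few huge breaches distort a state's average is to divide its mean by its median. The sketch below does this for a handful of rows copied from the table above; the ratio is just an informal skew indicator chosen for this example, not a statistic used elsewhere in the report.

```python
# (state, mean individuals affected, median individuals affected),
# copied from the table above for a few states of interest.
rows = [
    ("VA", 191037.074, 2739.0),
    ("NJ", 152589.800, 3000.0),
    ("MT", 115651.900, 4600.0),
    ("PR", 54997.290, 7911.0),
    ("CA", 14744.604, 2250.0),
    ("WY", 7912.000, 9023.0),
]

# A ratio far above 1 means a few very large breaches dominate the mean.
ratios = {state: mean / median for state, mean, median in rows}

for state, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{state}: mean is {ratio:.1f}x the median")
```

Virginia, New Jersey, and Montana each have one of the ten largest breaches shown earlier, which is what drives their means so far above their medians. Puerto Rico stands out differently: its median is the second highest in the whole table, so its typical breach is large too.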
What is going on with Puerto Rico?
Now this is just wild. Based on the previous question and results, I wanted to look into Puerto Rico. Why is this small Caribbean island accounting for 31 data breaches and an average of 55,000 individuals affected? My first graph shows some good signs that most of Puerto Rico’s data breach troubles are behind it, as it has not had a reportable breach since 2015.
Next, I looked at how many individuals were affected by these breaches by year, and that is when I found 2010 Puerto Rico. In 2010, Puerto Rico had almost 1 million people affected by these data breaches. That is more than a quarter of their ENTIRE POPULATION!
Needing to figure out what in the world happened in 2010, I made a table to see what kind of big breach could have caused this. I had figured that it was the result of one big breach that affected a huge share of the island. I found out I was wrong: it was not one, but three DIFFERENT major data breach incidents that affected almost a million people combined. To make it even crazier, these three breaches were reported within a span of FOURTEEN DAYS! November 4th through November 18th in 2010 was not kind to Puerto Rico's health information data.
## # A tibble: 6 × 17
## `Name of Covered En… State `Covered Entity … `Individuals Af… `Breach Submiss…
## <chr> <chr> <chr> <dbl> <date>
## 1 Hospital Auxilio Mu… PR Healthcare Provi… 1000 2010-12-13
## 2 Triple-S Salud, Inc. PR Health Plan 398000 2010-11-18
## 3 Medical Card System… PR Business Associa… 115000 2010-11-09
## 4 Puerto Rico Departm… PR Health Plan 475000 2010-11-04
## 5 MSO of Puerto Rico,… PR Business Associa… 1907 2010-02-17
## 6 MSO of Puerto Rico PR Business Associa… 605 2010-02-17
## # … with 12 more variables: Type of Breach <chr>,
## # Location of Breached Information <chr>, Business Associate Present <chr>,
## # Web Description <chr>, hacking_IT <lgl>, improper_disposal <lgl>,
## # loss <lgl>, other <lgl>, theft <lgl>, unauthorized_access_disclosure <lgl>,
## # unknown <lgl>, Breach Submission Year <fct>
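The headline numbers for 2010 can be recomputed from the six rows in the table above. In this sketch, treating a breach as "major" when it affects at least 100,000 individuals is a cutoff assumed for the example; it is not a definition taken from the data.

```python
from datetime import date
from statistics import median

# (submission date, individuals affected) for Puerto Rico's six 2010 breaches,
# copied from the table above.
breaches_2010 = [
    (date(2010, 12, 13), 1000),
    (date(2010, 11, 18), 398000),
    (date(2010, 11, 9), 115000),
    (date(2010, 11, 4), 475000),
    (date(2010, 2, 17), 1907),
    (date(2010, 2, 17), 605),
]

total_affected = sum(n for _, n in breaches_2010)

# Assumed cutoff for a "major" breach in this example.
MAJOR_THRESHOLD = 100_000
major = [(d, n) for d, n in breaches_2010 if n >= MAJOR_THRESHOLD]
major_total = sum(n for _, n in major)

# Days between the first and last major breach submission dates.
major_dates = [d for d, _ in major]
span_days = (max(major_dates) - min(major_dates)).days

print(f"Total individuals affected in 2010: {total_affected:,}")
print(f"Major breaches: {len(major)}, affecting {major_total:,}")
print(f"Days between first and last major submission: {span_days}")
print(f"Median breach size in 2010: {median(n for _, n in breaches_2010):,}")
```

Note that the same person could appear in more than one breach, so the total counts affected records rather than unique individuals.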