Introduction to the data and its source

The Office for Civil Rights (OCR) within the US Department of Health and Human Services (HHS) is responsible for collecting and reporting disclosures of protected health information (PHI), as mandated by law. Part of the law requires the OCR to report cases where a covered entity (CE, an organization responsible for protecting health information) has a breach that affects 500 or more individuals.

Details of the dataset

The data for this analysis comes from the Breach Portal: Notice to the Secretary of HHS Breach of Unsecured Protected Health Information, published by the U.S. Department of Health and Human Services Office for Civil Rights. The data contains the 9 variables described below and consists of 1,709 rows. Each row represents a breach reported by a health care entity. Each column records an aspect of the breach, including state, entity type, individuals affected, breach submission date, type of breach, and more. The data has been reported on a clunky OCR data portal since 2009: https://ocrportal.hhs.gov/ocr/breach/breach_report.jsf.

The data reported for each of these breaches include:

• Name of the covered entity (Organization responsible for the PHI)
• State (US state where the breach was reported)
• Covered Entity Type (Type of organization responsible for the PHI)
• Individuals Affected (Number of records affected by the breach)
• Breach submission date (Date the breach was reported by the CE)
• Type of breach (How unauthorized access to the PHI was obtained)
• Location of breached information (Where the PHI was when unauthorized access was obtained)
• Business associate present (Whether a business associate, such as a consultant or contractor, was involved in the breach)
• Web description (An optional statement explaining what happened and the resolution)

Number of breaches by state

The following table shows the number of data breaches by state and the average number of individuals affected per breach. California has had 252 data breaches, the most of any state in the dataset.
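The per-state summary described above can be sketched as a simple group-by. This is a minimal illustration, not the code used for this report: the sample rows are hypothetical, and the keys `State` and `Individuals Affected` are assumed to match the variable names listed earlier.

```python
from collections import defaultdict

# Hypothetical sample rows; the real rows would come from the OCR portal export.
rows = [
    {"State": "CA", "Individuals Affected": 1000},
    {"State": "CA", "Individuals Affected": 3000},
    {"State": "TX", "Individuals Affected": 500},
]

def summarize_by_state(rows):
    """Return {state: (number_of_breaches, mean_individuals_affected)}."""
    totals = defaultdict(lambda: [0, 0])  # state -> [count, sum of individuals]
    for row in rows:
        entry = totals[row["State"]]
        entry[0] += 1
        entry[1] += row["Individuals Affected"]
    return {state: (count, total / count) for state, (count, total) in totals.items()}

summary = summarize_by_state(rows)

# Rank states by breach count, largest first.
ranked = sorted(summary.items(), key=lambda item: item[1][0], reverse=True)
```

Sorting on the count rather than the mean keeps the ranking aligned with the "most breaches" claim; a single very large breach can dominate a state's mean.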

##Breaches by entity type

The following table summarizes the number of breaches by covered entity type. Healthcare providers account for 71% of the 2,048 entries.

| Covered Entity Type | Number of breaches |
|---|---|
| Business Associate | 315 |
| Health Plan | 268 |
| Healthcare Clearing House | 4 |
| Healthcare Provider | 1459 |
| NA | 2 |
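The 71% figure can be checked directly from the counts in the table above. The short sketch below only redoes that arithmetic; the variable names are illustrative.

```python
# Counts copied from the table above.
breaches_by_type = {
    "Business Associate": 315,
    "Health Plan": 268,
    "Healthcare Clearing House": 4,
    "Healthcare Provider": 1459,
    "NA": 2,
}

total = sum(breaches_by_type.values())

# Percentage share of each entity type, rounded to one decimal place.
shares = {
    name: round(100 * count / total, 1)
    for name, count in breaches_by_type.items()
}

for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {share}%")
```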

##Summary of all Thefts

Below is a searchable table of all instances where an entity classified a data breach as a theft.

##Breaches with a business associate present for each entity type

This is a quick breakdown of the number of breaches with a business associate present.

| Covered Entity Type | Number of breaches |
|---|---|
| Business Associate | 308 |
| Health Plan | 39 |
| Healthcare Provider | 61 |
| NA | 2 |

##Summary of all breaches on paper by state

Here is a quick breakdown of the number of paper/film data breaches in the time frame. California has the most of any state, with 56 instances.

##Number of healthcare data breaches by year

Below is a bar chart showing the number of data breaches by year. As the graph shows, there was a significant decrease in the number of breaches in 2017.

##List of the top 25 largest healthcare data breaches

Here is a searchable list of the top 25 data breaches. The most common type of breach was a hacking/IT incident, but interestingly the fourth largest breach (about half a million people) was recorded as a loss of data.
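Selecting the largest breaches amounts to ranking rows by individuals affected and keeping the first 25. The sketch below is one way to do that, assuming each row is a dictionary with an `Individuals Affected` key; the entity names and numbers are hypothetical.

```python
import heapq

# Hypothetical sample rows for illustration only.
rows = [
    {"Name of Covered Entity": "Entity A", "Individuals Affected": 1200, "Type of Breach": "Theft"},
    {"Name of Covered Entity": "Entity B", "Individuals Affected": 90000, "Type of Breach": "Hacking/IT Incident"},
    {"Name of Covered Entity": "Entity C", "Individuals Affected": 45000, "Type of Breach": "Loss"},
]

def largest_breaches(rows, n=25):
    """Return the n rows with the most individuals affected, largest first."""
    return heapq.nlargest(n, rows, key=lambda row: row["Individuals Affected"])

top_two = largest_breaches(rows, n=2)
```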

##Visual: Total healthcare records (individuals affected) exposed by state for the top 10 states

Here we observe a large spike in the number of individuals affected in Indiana. This is due to a single outlier: Anthem, Inc. Affiliated Covered Entity accounts for 78,800,000 of the individuals affected. Strangely enough, even though California has had the highest number of breaches of any state, it has only the fifth largest total of individuals affected.

##Number of healthcare hacking incidents by month

Here we can see a bar chart of the number of healthcare hacking incidents by month. March and April have the highest counts of hacking incidents. Could this be related to tax season?
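A monthly count like the one charted above could be produced roughly as follows. This is a sketch under assumptions: the date format `MM/DD/YYYY` and the `Hacking/IT Incident` label are guesses about how the portal export looks, and the sample rows are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows; date format and breach labels are assumptions.
rows = [
    {"Breach Submission Date": "03/14/2016", "Type of Breach": "Hacking/IT Incident"},
    {"Breach Submission Date": "03/30/2017", "Type of Breach": "Hacking/IT Incident"},
    {"Breach Submission Date": "04/02/2015", "Type of Breach": "Theft"},
    {"Breach Submission Date": "11/20/2016", "Type of Breach": "Hacking/IT Incident, Theft"},
]

def hacking_by_month(rows):
    """Count hacking incidents per calendar month (1 = January)."""
    counts = Counter()
    for row in rows:
        # A row may list several breach types, so match on a substring.
        if "hacking" in row["Type of Breach"].lower():
            submitted = datetime.strptime(row["Breach Submission Date"], "%m/%d/%Y")
            counts[submitted.month] += 1
    return counts

monthly = hacking_by_month(rows)
```

Because the date is the breach submission date, the counts reflect when breaches were reported, which may differ from when they occurred.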

##Number of breaches by covered entity type

As the table below shows, between 2009 and 2018 Healthcare Providers had noticeably more breaches than the other entity types.

| Covered Entity Type | Number of breaches |
|---|---|
| Business Associate | 315 |
| Health Plan | 268 |
| Healthcare Clearing House | 4 |
| Healthcare Provider | 1459 |
| NA | 2 |

##Breaches during days of the week

The two lowest counts of breaches fall on the weekend, but we can also see a large spike in the number of reported breaches on Friday.

| Day | Number of breaches |
|---|---|
| Friday | 617 |
| Monday | 333 |
| Saturday | 32 |
| Sunday | 22 |
| Thursday | 366 |
| Tuesday | 340 |
| Wednesday | 338 |
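A day-of-week tally like the table above can be derived from the submission dates. The sketch below uses hypothetical dates and a hard-coded list of day names so the output is in calendar order and does not depend on the system locale.

```python
from collections import Counter
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical submission dates for illustration only.
submission_dates = [
    date(2016, 3, 11),
    date(2016, 3, 14),
    date(2016, 3, 18),
    date(2016, 3, 19),
]

def breaches_by_weekday(dates):
    """Return (day name, count) pairs in Monday-to-Sunday order."""
    counts = Counter(d.weekday() for d in dates)  # weekday(): Monday is 0
    return [(DAY_NAMES[i], counts[i]) for i in range(7)]

by_day = breaches_by_weekday(submission_dates)
```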

##How often were there 150 or more healthcare provider breaches and 50 or more business associate breaches?

2013 and 2014 were the only two years between 2009 and 2018 in which 150 or more Healthcare Providers and 50 or more Business Associates had data breaches.

| Year | Healthcare Provider | Business Associate |
|---|---|---|
| 2013 | 193 | 64 |
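The year filter behind this question can be sketched as a count per (year, entity type) pair followed by a threshold check. The records and the small thresholds in the usage line are hypothetical; the defaults mirror the 150 and 50 used in the heading.

```python
from collections import Counter

# Hypothetical (year, covered entity type) pairs for illustration only.
records = (
    [(2013, "Healthcare Provider")] * 3
    + [(2013, "Business Associate")] * 2
    + [(2014, "Healthcare Provider")] * 3
    + [(2014, "Business Associate")] * 1
    + [(2015, "Healthcare Provider")] * 1
)

def years_meeting_thresholds(records, min_providers=150, min_associates=50):
    """Return the years that meet both minimum counts, in ascending order."""
    counts = Counter(records)  # (year, entity type) -> number of breaches
    years = sorted({year for year, _ in counts})
    return [
        year
        for year in years
        if counts[(year, "Healthcare Provider")] >= min_providers
        and counts[(year, "Business Associate")] >= min_associates
    ]

matching = years_meeting_thresholds(records, min_providers=3, min_associates=2)
```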

##Breaches by year

The table below helps visualize how the number of breaches of each type changes from year to year. In the case of thefts, fewer have been reported every year since 2010. In 2016 there was an extreme increase in the number of hacking incidents compared to the other years in this time frame: 71 Hacking/IT incidents took place, while the largest count in prior years was 33, in 2014.
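A breach-type-by-year table of this kind is a cross-tabulation. The following sketch shows one way to build it with nested counters; the (year, type) pairs are hypothetical and the labels are assumptions about the values in the type-of-breach column.

```python
from collections import Counter, defaultdict

# Hypothetical (year, type of breach) pairs for illustration only.
rows = [
    (2014, "Theft"),
    (2014, "Hacking/IT Incident"),
    (2016, "Hacking/IT Incident"),
    (2016, "Hacking/IT Incident"),
    (2016, "Theft"),
]

def crosstab(rows):
    """Return {breach type: Counter({year: count})}."""
    table = defaultdict(Counter)
    for year, breach_type in rows:
        table[breach_type][year] += 1
    return table

table = crosstab(rows)
years = sorted({year for year, _ in rows})

# Print one line per breach type, with one count per year column.
for breach_type in sorted(table):
    print(breach_type, [table[breach_type][year] for year in years])
```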

Part two of this document covers curious findings in the data set that raise interesting questions and presents information that may be useful to the reader.

##Theft at work?

Between 2009 and 2018 there were 118 thefts that took place while a business associate was present. Interestingly, 108 of those 118 instances happened at business associate entities. That is 37% of all business associate entries in this database. By comparison, healthcare providers had only 6 entries of theft with a business associate present, representing only 0.5% of all healthcare provider entries. Thefts with a business associate present are therefore far more concentrated at business associate entities.
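The comparison above is a conditional share: for each entity type, the fraction of its entries that are thefts with a business associate present. A minimal sketch follows, assuming the `Business Associate Present` column holds `Yes`/`No` values; the rows are hypothetical and do not reproduce the figures quoted above.

```python
from collections import Counter

# Hypothetical sample rows; the "Yes"/"No" encoding is an assumption.
rows = [
    {"Covered Entity Type": "Business Associate", "Type of Breach": "Theft", "Business Associate Present": "Yes"},
    {"Covered Entity Type": "Business Associate", "Type of Breach": "Hacking/IT Incident", "Business Associate Present": "Yes"},
    {"Covered Entity Type": "Healthcare Provider", "Type of Breach": "Theft", "Business Associate Present": "No"},
    {"Covered Entity Type": "Healthcare Provider", "Type of Breach": "Theft", "Business Associate Present": "Yes"},
    {"Covered Entity Type": "Healthcare Provider", "Type of Breach": "Loss", "Business Associate Present": "No"},
    {"Covered Entity Type": "Healthcare Provider", "Type of Breach": "Improper Disposal", "Business Associate Present": "No"},
]

def theft_with_associate_share(rows):
    """Return {entity type: share of its entries that are thefts with an associate present}."""
    totals = Counter(row["Covered Entity Type"] for row in rows)
    matches = Counter(
        row["Covered Entity Type"]
        for row in rows
        if "theft" in row["Type of Breach"].lower()
        and row["Business Associate Present"] == "Yes"
    )
    return {entity_type: matches[entity_type] / totals[entity_type] for entity_type in totals}

theft_shares = theft_with_associate_share(rows)
```

Dividing by each entity type's own total, rather than by the overall number of thefts, is what makes the two percentages comparable across entity types of very different sizes.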

##Which business associate entities were they?

To support the figures above, here is a searchable table of the 108 business associate entities that reported a theft with a business associate present.

##Social Security Numbers… better keep them safe!

The table below lists the top 10 states by individuals affected in data breaches whose web description contained the keyword "social security number". California had the highest number of such breaches, at 49, but ranked only 7th in individuals affected. Across the entire data set, 450 entries contained the keyword "social security number", and 11,673,076 individuals were affected.

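| State | Number of breaches | Number of individuals affected |
|---|---|---|
| FL | 40 | 4022358 |
| NJ | 7 | 1752386 |
| UT | 3 | 786557 |
| TX | 39 | 749640 |
| SC | 9 | 730118 |
| WA | 16 | 597482 |
| CA | 49 | 509730 |
| PA | 20 | 312742 |
| IN | 12 | 274688 |
| PR | 14 | 186598 |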
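A keyword summary like the table above can be approximated with a case-insensitive substring match on the web description, followed by a per-state total. This is a sketch under assumptions: the rows are hypothetical, and a missing description is treated as an empty string because the web description is optional.

```python
from collections import defaultdict

KEYWORD = "social security number"

# Hypothetical sample rows for illustration only.
rows = [
    {"State": "FL", "Individuals Affected": 4000, "Web Description": "Files included Social Security numbers and addresses."},
    {"State": "FL", "Individuals Affected": 1000, "Web Description": "A laptop was stolen."},
    {"State": "NJ", "Individuals Affected": 2500, "Web Description": "Names and social security numbers were exposed."},
    {"State": "CA", "Individuals Affected": 300, "Web Description": None},
    {"State": "NJ", "Individuals Affected": 2000, "Web Description": "SOCIAL SECURITY NUMBER visible on a mailer."},
]

def keyword_summary(rows, keyword=KEYWORD, top_n=10):
    """Return (state, breaches, individuals) for matching rows, largest total first."""
    per_state = defaultdict(lambda: [0, 0])  # state -> [breach count, individuals]
    for row in rows:
        description = row["Web Description"] or ""
        if keyword in description.lower():
            per_state[row["State"]][0] += 1
            per_state[row["State"]][1] += row["Individuals Affected"]
    ranked = sorted(per_state.items(), key=lambda item: item[1][1], reverse=True)
    return [(state, count, total) for state, (count, total) in ranked[:top_n]]

top_states = keyword_summary(rows)
```

A plain substring match also catches the plural "social security numbers" but misses abbreviations such as "SSN", so the counts depend on how each description happens to be worded.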