Introduction to Data
This data set contains breaches of unsecured protected health information (PHI) collected by the Office for Civil Rights (OCR) of the US Department of Health and Human Services (HHS). The OCR is responsible for collecting and reporting disclosures of PHI as mandated by law. Part of the law requires the OCR to report cases where a covered entity (CE), an organization responsible for protecting health information, has a breach that affects 500 or more individuals. The data reported for each of these breaches include:
| Variable | Description |
|---|---|
| Name of the covered entity | Organization responsible for the PHI |
| State | US state where the breach was reported |
| Covered Entity Type | Type of organization responsible for the PHI |
| Individuals Affected | Number of records affected by the breach |
| Breach Submission Date | Date the breach was reported by the CE |
| Type of Breach | How unauthorized access to the PHI was obtained |
| Location of Breached Information | Where the PHI was when unauthorized access was obtained |
| Business Associate Present | Whether a business associate, such as a consultant or contractor, was involved in the breach |
| Web Description | An optional statement explaining what happened and the resolution |
The data has been reported on an OCR data portal since 2009: https://ocrportal.hhs.gov/ocr/breach/breach_report.jsf
Data Summary
This summary table reports the number of breaches and the number of records lost in each state. It also shows the median, largest, smallest, and mean number of individuals affected per breach in each state.
## # A tibble: 53 × 7
## State num_breaches total_records_lost median_record…¹ max_r…² min_r…³ mean_…⁴
## <chr> <int> <int> <dbl> <int> <int> <dbl>
## 1 "" 2 3121 1560. 2621 500 1560.
## 2 "AK" 9 24839 1556 14719 501 2760.
## 3 "AL" 31 1117381 1680 943434 550 36045.
## 4 "AR" 20 145759 2732. 27393 560 7288.
## 5 "AZ" 45 4772694 2291 3620000 500 106060.
## 6 "CA" 252 9198296 2217 4500000 500 36501.
## 7 "CO" 41 245175 1918 105470 508 5980.
## 8 "CT" 32 292432 1690 93500 500 9138.
## 9 "DC" 11 39719 2200 18000 540 3611.
## 10 "DE" 2 3562 1781 1883 1679 1781
## # … with 43 more rows, and abbreviated variable names ¹median_records_lost,
## # ²max_records_lost, ³min_records_lost, ⁴mean_records_lost
From this summary table, we can see which states have the most breaches and the most lost health information. This matters because it indicates where individuals are most at risk based on the state they reside in.
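The per-state summary above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code that produced the table. The field names `State` and `Individuals.Affected` are assumed from the column names in this report, and the sample records are illustrative: the two DE rows reuse the minimum and maximum shown for Delaware, while the AK rows are partly invented.

```python
from collections import defaultdict
from statistics import mean, median

# Illustrative records only, not an extract of the OCR data set.
breaches = [
    {"State": "AK", "Individuals.Affected": 501},
    {"State": "AK", "Individuals.Affected": 2000},    # invented value
    {"State": "AK", "Individuals.Affected": 14719},
    {"State": "DE", "Individuals.Affected": 1679},
    {"State": "DE", "Individuals.Affected": 1883},
]

def summarize_by_state(rows):
    """Group breach sizes by state and compute the summary columns."""
    sizes_by_state = defaultdict(list)
    for row in rows:
        sizes_by_state[row["State"]].append(row["Individuals.Affected"])
    return {
        state: {
            "num_breaches": len(sizes),
            "total_records_lost": sum(sizes),
            "median_records_lost": median(sizes),
            "max_records_lost": max(sizes),
            "min_records_lost": min(sizes),
            "mean_records_lost": mean(sizes),
        }
        for state, sizes in sorted(sizes_by_state.items())
    }

summary = summarize_by_state(breaches)
for state, stats in summary.items():
    print(state, stats)
```

With only two breaches in a state, as for Delaware, the median and the mean coincide, which is why both read 1781 in the table.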
Number of healthcare data breaches by year
This chart shows that the number of breaches peaked in 2014 (314 breaches) and, after a rebound in 2016 (307), fell to 209 in 2017. It gives a good idea of how many breaches are reported each year and a rough indication of whether organizations are getting better at preventing them.
List of the top 25 largest healthcare data breaches
## Individuals.Affected Covered.Entity.Type
## 1 78800000 Health Plan
## 2 11000000 Health Plan
## 3 10000000 Health Plan
## 4 4900000 Business Associate
## 5 4500000 Healthcare Provider
## 6 4500000 Business Associate
## 7 4500000 Business Associate
## 8 4029530 Healthcare Provider
## 9 3900000 Business Associate
## 10 3620000 Healthcare Provider
## 11 3466120 Business Associate
## 12 2213597 Healthcare Provider
## 13 2000000 Business Associate
## 14 1900000 Business Associate
## 15 1700000 Business Associate
## 16 1220000 Health Plan
## 17 1100000 Health Plan
## 18 1062509 Health Plan
## 19 1055489 Healthcare Provider
## 20 1023209 Health Plan
## 21 943434 Healthcare Provider
## 22 882590 Healthcare Provider
## 23 839711 Business Associate
## 24 800000 Business Associate
## 25 780000 Business Associate
This table shows the 25 largest healthcare data breaches since 2009. The largest affected 78.8 million people, more than seven times as many as the second largest (11 million). This gives a good perspective on how damaging a single data breach can be.
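A ranking like this can be produced without sorting the whole data set. The sketch below uses `heapq.nlargest` from the Python standard library. The field names are assumed from the column headers above, and the four sample records reuse sizes and entity types that appear in the table; the helper `largest_breaches` is a name chosen here for illustration.

```python
import heapq

# Four records whose sizes and types appear in the table above.
breaches = [
    {"Individuals.Affected": 11000000, "Covered.Entity.Type": "Health Plan"},
    {"Individuals.Affected": 780000, "Covered.Entity.Type": "Business Associate"},
    {"Individuals.Affected": 78800000, "Covered.Entity.Type": "Health Plan"},
    {"Individuals.Affected": 4500000, "Covered.Entity.Type": "Healthcare Provider"},
]

def largest_breaches(rows, n=25):
    """Return the n rows with the most individuals affected, largest first."""
    return heapq.nlargest(n, rows, key=lambda row: row["Individuals.Affected"])

top3 = largest_breaches(breaches, n=3)
for rank, row in enumerate(top3, start=1):
    print(rank, row["Individuals.Affected"], row["Covered.Entity.Type"])
```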
Total healthcare records (individuals affected) exposed by state for the top 10 states
## # A tibble: 10 × 2
## State Individuals_affected
## <chr> <int>
## 1 IN 83875415
## 2 NY 16671389
## 3 WA 11713021
## 4 TN 10766794
## 5 CA 9198296
## 6 FL 6130108
## 7 VA 5909244
## 8 AZ 4772694
## 9 IL 4717385
## 10 TX 4506695
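This table lists the ten states with the most health care records exposed by data breaches. Indiana leads the list with over 80,000,000 individuals affected. Because no other state's total reaches 78.8 million, the single largest breach in the data set must have been reported in Indiana, and it accounts for most of that state's total. This shows that the number of individuals affected does not always track state population: one very large breach can outweigh everything else.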
Number of healthcare hacking incidents by month
This chart shows the number of hacking incidents that led to health information data breaches, by month of the year. We can use it to identify the month in which hacking incidents happen most frequently and, conversely, the month in which they happen least frequently.
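The counts behind a chart like this can be sketched as a filter followed by a tally per month. The field names `Breach.Submission.Date` and `Type.of.Breach`, the `MM/DD/YYYY` date format, and the idea that one record may list several breach types in a single comma-separated string are all assumptions made for this example. The sample records are invented.

```python
from collections import Counter
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Invented records for illustration.
breaches = [
    {"Breach.Submission.Date": "01/15/2016", "Type.of.Breach": "Hacking/IT Incident"},
    {"Breach.Submission.Date": "01/28/2017", "Type.of.Breach": "Hacking/IT Incident, Theft"},
    {"Breach.Submission.Date": "06/03/2016", "Type.of.Breach": "Theft"},
    {"Breach.Submission.Date": "11/09/2015", "Type.of.Breach": "Hacking/IT Incident"},
]

def hacking_by_month(rows, date_format="%m/%d/%Y"):
    """Count breaches whose type mentions hacking, per calendar month."""
    counts = Counter(
        datetime.strptime(row["Breach.Submission.Date"], date_format).month
        for row in rows
        if "Hacking" in row["Type.of.Breach"]
    )
    # Keep months with zero incidents so a chart does not skip them.
    return {MONTHS[m - 1]: counts.get(m, 0) for m in range(1, 13)}

monthly = hacking_by_month(breaches)
print(monthly)
```

Keeping the zero-count months in the result matters when looking for the least frequent month, since a month with no incidents would otherwise be missing.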
Number of breaches by covered entity type
## # A tibble: 5 × 3
## Covered.Entity.Type num_breaches total_records_lost
## <chr> <int> <int>
## 1 "" 2 18599
## 2 "Business Associate" 315 33338961
## 3 "Health Plan" 267 110384390
## 4 "Healthcare Clearing House" 4 17754
## 5 "Healthcare Provider" 1459 34878870
This table shows the number of breaches and records lost by covered entity type. Healthcare providers experienced by far the most breaches (1,459), while health plans lost the most records (110,384,390).
On what day of the week are breaches most often reported?
This chart shows the number of breaches reported on each day of the week. From it we can determine that Saturday is the day of the week with the most breaches reported.
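A day-of-week tally can be sketched as follows. The `MM/DD/YYYY` date format is an assumption, the three dates are invented, and the day names are hard-coded so the result does not depend on the system locale.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Invented submission dates for illustration.
submission_dates = ["03/15/2014", "03/17/2014", "03/22/2014"]

def breaches_by_weekday(dates, date_format="%m/%d/%Y"):
    """Count submissions per weekday; weekday() returns 0 for Monday."""
    counts = Counter(
        DAYS[datetime.strptime(d, date_format).weekday()] for d in dates
    )
    return {day: counts.get(day, 0) for day in DAYS}

weekday_counts = breaches_by_weekday(submission_dates)
busiest_day = max(weekday_counts, key=weekday_counts.get)
print(weekday_counts)
print("Most submissions on:", busiest_day)
```

Note that this counts the day a breach was submitted, which is not necessarily the day it occurred.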
In which year (or years) were there at least 50 breaches from a ‘Business Associate’ covered entity type and at least 150 breaches from a healthcare provider covered entity type?
## # A tibble: 10 × 4
## Breach.Submission.Year Breaches `Business_Associate_Breaches > 50` Healthca…¹
## <dbl> <int> <int> <int>
## 1 2009 18 3 0
## 2 2010 199 44 0
## 3 2011 200 45 0
## 4 2012 218 40 0
## 5 2013 277 64 0
## 6 2014 314 77 0
## 7 2015 268 12 0
## 8 2016 307 18 0
## 9 2017 209 10 0
## 10 2018 37 2 0
## # … with abbreviated variable name ¹`Healthcare_Provider_Breaches > 150`
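This table counts the ‘Business Associate’ and ‘Healthcare Provider’ breaches for every year since 2009; despite the `> 50` and `> 150` in the column names, the columns hold counts rather than true/false flags. Only 2013 (64) and 2014 (77) have at least 50 breaches from a ‘Business Associate’ covered entity type. The ‘Healthcare Provider’ column reads 0 for every year, which is inconsistent with the 1,459 healthcare provider breaches in the entity type table, so that column appears to be miscounted and the 150 threshold cannot be checked from this output. 2013 and 2014 are therefore the only years that can satisfy both conditions, provided each also has at least 150 healthcare provider breaches.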
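One way to answer this question is to count breaches per year and entity type, then keep the years that clear both thresholds. The sketch below is an illustration rather than the code behind the table. It normalizes the entity type label (whitespace and case) before counting, since a label mismatch is one possible cause of a column that is zero everywhere; that diagnosis is a guess. The Business Associate counts for 2013 and 2015 come from the table, while the Healthcare Provider counts are hypothetical.

```python
from collections import Counter

def count_by_year_and_type(rows):
    """Count breaches per (year, normalized entity type)."""
    return Counter(
        (row["Breach.Submission.Year"], row["Covered.Entity.Type"].strip().lower())
        for row in rows
    )

def years_meeting_thresholds(rows, min_associate=50, min_provider=150):
    """Return the years that meet both minimum counts."""
    counts = count_by_year_and_type(rows)
    years = sorted({year for year, _ in counts})
    return [
        year for year in years
        if counts[(year, "business associate")] >= min_associate
        and counts[(year, "healthcare provider")] >= min_provider
    ]

def make_rows(year, entity_type, n):
    """Build n identical illustrative records."""
    return [{"Breach.Submission.Year": year, "Covered.Entity.Type": entity_type}] * n

rows = (
    make_rows(2013, "Business Associate", 64)       # count from the table
    + make_rows(2013, "Healthcare Provider", 160)   # hypothetical count
    + make_rows(2015, "Business Associate", 12)     # count from the table
    + make_rows(2015, "healthcare provider ", 170)  # hypothetical, messy label
)

qualifying_years = years_meeting_thresholds(rows)
print(qualifying_years)
```

With this sample only 2013 qualifies, because 2015 falls short of 50 Business Associate breaches even though its provider count is high.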
How has the type of breach (hacking, improper disposal, loss, etc.) changed for each year?
## # A tibble: 10 × 8
## Breach.Submission.Year Hacking/IT…¹ Impro…² Loss Theft Unaut…³ Other Unknown
## <dbl> <int> <int> <int> <int> <int> <int> <int>
## 1 2009 0 0 1 15 0 2 0
## 2 2010 8 10 20 135 10 23 0
## 3 2011 17 7 20 123 36 2 7
## 4 2012 17 8 21 132 43 20 3
## 5 2013 29 14 25 133 76 21 5
## 6 2014 39 12 31 129 106 29 1
## 7 2015 56 6 24 80 102 0 0
## 8 2016 102 7 14 59 125 0 0
## 9 2017 73 7 13 35 81 0 0
## 10 2018 13 1 1 3 19 0 0
## # … with abbreviated variable names ¹`Hacking/IT Incident`,
## # ²`Improper Disposal`, ³`Unauthorized Access/Disclosure`
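From this table we can see that Hacking/IT Incident breaches rose from 0 in 2009 to a peak of 102 in 2016, and Unauthorized Access/Disclosure rose similarly, to 125 in 2016. Loss stayed between 20 and 31 breaches per year from 2010 through 2015 before falling to 14 in 2016 and 13 in 2017. Theft held near 130 per year through 2014, then fell to 35 by 2017.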
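A year-by-type table of this kind is a cross-tabulation: count each (year, type) pair, then lay the counts out with one row per year and one column per type. The sketch below shows the idea with invented records; the field names are assumed from the column headers, and each record is assumed to carry a single breach type.

```python
from collections import Counter

# Invented records for illustration.
breaches = [
    {"Breach.Submission.Year": 2010, "Type.of.Breach": "Theft"},
    {"Breach.Submission.Year": 2010, "Type.of.Breach": "Theft"},
    {"Breach.Submission.Year": 2010, "Type.of.Breach": "Loss"},
    {"Breach.Submission.Year": 2016, "Type.of.Breach": "Hacking/IT Incident"},
    {"Breach.Submission.Year": 2016, "Type.of.Breach": "Theft"},
]

def type_by_year(rows):
    """Return {year: {breach type: count}}, filling missing pairs with 0."""
    counts = Counter(
        (row["Breach.Submission.Year"], row["Type.of.Breach"]) for row in rows
    )
    years = sorted({year for year, _ in counts})
    types = sorted({breach_type for _, breach_type in counts})
    return {
        year: {breach_type: counts[(year, breach_type)] for breach_type in types}
        for year in years
    }

table = type_by_year(breaches)
for year, row in table.items():
    print(year, row)
```

Filling absent pairs with 0 keeps every year on the same set of columns, which makes trends across years easier to compare.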
Where do the most breaches of health information come from?
## # A tibble: 69 × 2
## Location.of.Breached.Information num_b…¹
## <chr> <int>
## 1 Desktop Computer 148
## 2 Desktop Computer, Electronic Medical Record 4
## 3 Desktop Computer, Electronic Medical Record, Email, Laptop, Network … 1
## 4 Desktop Computer, Electronic Medical Record, Email, Laptop, Network … 1
## 5 Desktop Computer, Electronic Medical Record, Email, Laptop, Network … 1
## 6 Desktop Computer, Electronic Medical Record, Email, Laptop, Network … 1
## 7 Desktop Computer, Electronic Medical Record, Email, Laptop, Network … 2
## 8 Desktop Computer, Electronic Medical Record, Email, Laptop, Other Po… 1
## 9 Desktop Computer, Electronic Medical Record, Email, Laptop, Other, O… 1
## 10 Desktop Computer, Electronic Medical Record, Email, Network Server 2
## # … with 59 more rows, and abbreviated variable name ¹num_breaches
The printed rows show only the first 10 of 69 location combinations, in alphabetical order. From the full table we can conclude that paper/film is the location with the largest number of breaches. Laptop and Network Server form a second tier at around 300 breaches each, and Desktop Computer and Email are at around 200 each. To answer the question, most breaches in this data set come from paper files.
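Because one breach can list several locations in a single comma-separated value, a per-location total needs each value split into its parts before counting. The sketch below shows one way to do that; it is an assumption about how such totals could be obtained, not a description of the original analysis. The first three sample values appear in the table above with their counts reduced to one each, and the `Paper/Films` label is assumed.

```python
from collections import Counter

# One entry per breach; values are illustrative.
locations = [
    "Desktop Computer",
    "Desktop Computer, Electronic Medical Record",
    "Desktop Computer, Electronic Medical Record, Email, Network Server",
    "Paper/Films",   # assumed label
    "Paper/Films",
]

def count_single_locations(values):
    """Count how many breaches involve each individual location."""
    counts = Counter()
    for value in values:
        # A set ensures a location listed twice in one breach counts once.
        parts = {part.strip() for part in value.split(",") if part.strip()}
        counts.update(parts)
    return counts

location_counts = count_single_locations(locations)
for location, n in location_counts.most_common():
    print(location, n)
```

Totals computed this way add up to more than the number of breaches, since a breach with several locations is counted once under each of them.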
From which entities are most people affected by data breaches?
## # A tibble: 5 × 2
## Covered.Entity.Type Individuals_affected
## <chr> <int>
## 1 "Health Plan" 110384390
## 2 "Healthcare Provider" 34878870
## 3 "Business Associate" 33338961
## 4 "" 18599
## 5 "Healthcare Clearing House" 17754
From this table we can determine that people are most vulnerable to breaches at health plans, which account for about 110 million individuals affected. Healthcare providers (about 35 million) and business associates (about 33 million) are very similar to each other in the number of people affected.