Introduction to Data

This data set contains breaches of unsecured protected health information (PHI) compiled by the Office for Civil Rights (OCR) of the US Department of Health and Human Services (HHS). This office is responsible for collecting and reporting disclosures of PHI as mandated by law. Part of the law requires OCR to report every case in which a covered entity (CE), meaning an organization responsible for protecting health information, has a breach that affects 500 or more individuals. The data reported for each of these breaches include:

Variable | Description
Name of the covered entity | Organization responsible for the PHI
State | US state where the breach was reported
Covered Entity Type | Type of organization responsible for the PHI
Individuals Affected | Number of individuals (records) affected by the breach
Breach Submission Date | Date the CE reported the breach
Type of Breach | How unauthorized access to the PHI was obtained
Location of Breached Information | Where the PHI was stored when unauthorized access was obtained
Business Associate Present | Whether a business associate, such as a consultant or contractor, was involved in the breach
Web Description | An optional statement explaining what happened and how it was resolved

The data has been reported on an OCR data portal since 2009: https://ocrportal.hhs.gov/ocr/breach/breach_report.jsf

Data Summary

For each state, this summary table reports the number of breaches, the total number of records lost, and the median, maximum, minimum, and mean number of records lost per breach. The first row, with a blank State value, covers 2 breaches that have no state recorded.

## # A tibble: 53 × 7
##    State num_breaches total_records_lost median_record…¹ max_r…² min_r…³ mean_…⁴
##    <chr>        <int>              <int>           <dbl>   <int>   <int>   <dbl>
##  1 ""               2               3121           1560.    2621     500   1560.
##  2 "AK"             9              24839           1556    14719     501   2760.
##  3 "AL"            31            1117381           1680   943434     550  36045.
##  4 "AR"            20             145759           2732.   27393     560   7288.
##  5 "AZ"            45            4772694           2291  3620000     500 106060.
##  6 "CA"           252            9198296           2217  4500000     500  36501.
##  7 "CO"            41             245175           1918   105470     508   5980.
##  8 "CT"            32             292432           1690    93500     500   9138.
##  9 "DC"            11              39719           2200    18000     540   3611.
## 10 "DE"             2               3562           1781     1883    1679   1781 
## # … with 43 more rows, and abbreviated variable names ¹median_records_lost,
## #   ²max_records_lost, ³min_records_lost, ⁴mean_records_lost

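From this summary table we can see which states have reported the most breaches and lost the most health information. This can suggest where individuals may be at greater risk, with two caveats: the counts are not adjusted for state population, and State records where the breach was reported, which is not necessarily where the affected individuals live.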
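The per-state statistics above could be computed roughly as follows. This is a minimal Python sketch using only the standard library, not the author's original R code; the dictionary keys mirror the printed column names. The four input rows are not invented: Delaware and the blank-State group each contain exactly two breaches, so their values can be read directly from the table's minimum and maximum columns, and the sketch reproduces those two rows of the table.

```python
from collections import defaultdict
from statistics import mean, median

# Each group below has exactly two breaches, so both values can be read off the
# min/max columns of the summary table (DE: 1679 and 1883; blank State: 500 and 2621).
rows = [
    {"State": "DE", "Individuals.Affected": 1679},
    {"State": "DE", "Individuals.Affected": 1883},
    {"State": "", "Individuals.Affected": 500},
    {"State": "", "Individuals.Affected": 2621},
]


def summarize_by_state(rows):
    """Group breaches by State and compute per-state summary statistics."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["State"]].append(row["Individuals.Affected"])
    return {
        state: {
            "num_breaches": len(counts),
            "total_records_lost": sum(counts),
            "median_records_lost": median(counts),
            "max_records_lost": max(counts),
            "min_records_lost": min(counts),
            "mean_records_lost": mean(counts),
        }
        for state, counts in sorted(groups.items())
    }


summary = summarize_by_state(rows)
for state, stats in summary.items():
    print(repr(state), stats)
```

Keeping the blank State as its own group, as the table does, makes missing values visible instead of silently dropping them.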

Number of healthcare data breaches by year

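This chart shows that the number of reported breaches peaked in 2014 and has generally declined since, although 2016 (307 breaches) came close to the 2014 peak (314). The very low 2018 count (37) probably reflects a partial year of data rather than a sharp drop. The chart gives a good idea of how many breaches are reported each year; on its own, it does not measure how well organizations are preventing them.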
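Counting breaches per year only requires the year of each submission date. A minimal Python sketch, assuming Breach.Submission.Date is stored as an MM/DD/YYYY string (an assumption; adjust `fmt` to the actual format). The dates are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical submission dates; the MM/DD/YYYY format is an assumption.
dates = ["10/21/2009", "03/05/2014", "11/30/2014", "07/04/2016"]


def breaches_per_year(dates, fmt="%m/%d/%Y"):
    """Count breaches by the year of their submission date."""
    return Counter(datetime.strptime(d, fmt).year for d in dates)


per_year = breaches_per_year(dates)
for year in sorted(per_year):
    print(year, per_year[year])
```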

List of the top 25 largest healthcare data breaches

##    Individuals.Affected Covered.Entity.Type
## 1              78800000         Health Plan
## 2              11000000         Health Plan
## 3              10000000         Health Plan
## 4               4900000  Business Associate
## 5               4500000 Healthcare Provider
## 6               4500000  Business Associate
## 7               4500000  Business Associate
## 8               4029530 Healthcare Provider
## 9               3900000  Business Associate
## 10              3620000 Healthcare Provider
## 11              3466120  Business Associate
## 12              2213597 Healthcare Provider
## 13              2000000  Business Associate
## 14              1900000  Business Associate
## 15              1700000  Business Associate
## 16              1220000         Health Plan
## 17              1100000         Health Plan
## 18              1062509         Health Plan
## 19              1055489 Healthcare Provider
## 20              1023209         Health Plan
## 21               943434 Healthcare Provider
## 22               882590 Healthcare Provider
## 23               839711  Business Associate
## 24               800000  Business Associate
## 25               780000  Business Associate

This table lists the 25 largest healthcare data breaches since 2009. The largest, reported by a health plan, affected 78,800,000 individuals, roughly seven times the second largest (11,000,000). This gives good perspective on how damaging a single breach can be.
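A top-N list like this one can be produced without sorting the whole data set. This sketch uses Python's `heapq.nlargest` on a handful of (Individuals.Affected, Covered.Entity.Type) pairs copied from the table above and deliberately shuffled; it is an illustration, not the author's code.

```python
import heapq

# A few (Individuals.Affected, Covered.Entity.Type) pairs copied from the
# top-25 table above, deliberately listed out of order.
breaches = [
    (943434, "Healthcare Provider"),
    (78800000, "Health Plan"),
    (4900000, "Business Associate"),
    (11000000, "Health Plan"),
    (780000, "Business Associate"),
]


def largest_breaches(breaches, n=25):
    """Return the n breaches with the most individuals affected, largest first."""
    return heapq.nlargest(n, breaches, key=lambda b: b[0])


top3 = largest_breaches(breaches, n=3)
print(top3)
```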

Total healthcare records (individuals affected) exposed by state for the top 10 states

## # A tibble: 10 × 2
##    State Individuals_affected
##    <chr>                <int>
##  1 IN                83875415
##  2 NY                16671389
##  3 WA                11713021
##  4 TN                10766794
##  5 CA                 9198296
##  6 FL                 6130108
##  7 VA                 5909244
##  8 AZ                 4772694
##  9 IL                 4717385
## 10 TX                 4506695

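This table lists the ten states with the most healthcare records exposed by data breaches. Indiana leads with 83,875,415 individuals affected, about five times the total for New York in second place. Because no other state's total reaches 78,800,000, the single largest breach in the data set must have been reported in Indiana, and it accounts for most of Indiana's total. The table therefore shows that the number of individuals affected does not track state population: one very large breach can outweigh many smaller ones.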
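The ranking sums Individuals.Affected within each state and keeps the largest totals. A minimal sketch with hypothetical (State, Individuals.Affected) rows, using `collections.Counter.most_common`; note how a single large breach decides the ranking on its own.

```python
from collections import Counter

# Hypothetical (State, Individuals.Affected) rows for illustration only.
rows = [("IN", 9000000), ("IN", 1200), ("NY", 5000), ("CA", 900), ("CA", 2500)]


def top_states(rows, n=10):
    """Sum individuals affected per state and return the n largest totals."""
    totals = Counter()
    for state, affected in rows:
        totals[state] += affected
    return totals.most_common(n)


print(top_states(rows, n=2))
```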

Number of healthcare hacking incidents by month

This chart shows the number of hacking incidents that led to health information breaches, by month of the year. We can use it to identify the months in which hacking incidents are most frequent and, conversely, the months in which they are least frequent.
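The monthly counts can be sketched by filtering to hacking incidents and tallying the month of each submission date. The rows are hypothetical, the MM/DD/YYYY format is an assumption, and the filter assumes an exact match on the `Hacking/IT Incident` label that appears in the breach-type table later in this report.

```python
import calendar
from collections import Counter
from datetime import datetime

# Hypothetical (Type.of.Breach, Breach.Submission.Date) pairs; the MM/DD/YYYY
# date format is an assumption.
rows = [
    ("Hacking/IT Incident", "01/15/2016"),
    ("Theft", "01/20/2016"),
    ("Hacking/IT Incident", "01/28/2017"),
    ("Hacking/IT Incident", "06/02/2016"),
]


def hacking_by_month(rows, fmt="%m/%d/%Y"):
    """Count Hacking/IT Incident breaches by calendar month name."""
    counts = Counter()
    for breach_type, date in rows:
        if breach_type == "Hacking/IT Incident":
            month = datetime.strptime(date, fmt).month
            counts[calendar.month_name[month]] += 1
    return counts


print(hacking_by_month(rows))
```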

Number of breaches by covered entity type

## # A tibble: 5 × 3
##   Covered.Entity.Type         num_breaches total_records_lost
##   <chr>                              <int>              <int>
## 1 ""                                     2              18599
## 2 "Business Associate"                 315           33338961
## 3 "Health Plan"                        267          110384390
## 4 "Healthcare Clearing House"            4              17754
## 5 "Healthcare Provider"               1459           34878870

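This table shows the number of breaches and records lost by covered entity type. Healthcare providers experienced the most breaches (1,459), but health plans account for the most records lost (110,384,390) from only 267 breaches. The first row, with a blank entity type, covers 2 breaches with no type recorded.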
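The same grouping applies to covered entity type. This hypothetical sketch counts breaches and sums records per type, and gives blank entity types an explicit label (`(not recorded)` is an invented placeholder, not a value from the data) so unlabeled breaches do not appear as an empty string.

```python
from collections import defaultdict

# Hypothetical (Covered.Entity.Type, Individuals.Affected) rows for illustration only.
rows = [
    ("Healthcare Provider", 1200),
    ("Health Plan", 50000),
    ("", 700),
    ("Healthcare Provider", 800),
]


def summarize_by_entity(rows, missing_label="(not recorded)"):
    """Count breaches and sum individuals affected per covered entity type,
    labeling blank entity types explicitly instead of keeping an empty string."""
    summary = defaultdict(lambda: {"num_breaches": 0, "total_records_lost": 0})
    for entity, affected in rows:
        key = entity or missing_label
        summary[key]["num_breaches"] += 1
        summary[key]["total_records_lost"] += affected
    return dict(summary)


result = summarize_by_entity(rows)
print(result)
```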

On what day of the week are breaches most often reported?

This chart shows the number of breaches reported on each day of the week. From this chart we can determine that Saturday is the day on which the most breaches were reported.
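Finding the weekday of each submission is a small date computation. A minimal sketch, again assuming MM/DD/YYYY dates (the values are hypothetical); it maps `datetime.weekday()` (Monday = 0) to English names instead of relying on locale-dependent formatting. January 1 and January 8, 2000 both fell on a Saturday, and July 4, 2016 was a Monday.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical submission dates; the MM/DD/YYYY format is an assumption.
dates = ["01/01/2000", "07/04/2016", "01/08/2000"]


def breaches_by_weekday(dates, fmt="%m/%d/%Y"):
    """Count breaches by the weekday on which they were submitted."""
    return Counter(DAY_NAMES[datetime.strptime(d, fmt).weekday()] for d in dates)


by_day = breaches_by_weekday(dates)
print(by_day.most_common())
```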

In which year (or years) were there at least 50 breaches from a ‘Business Associate’ covered entity type and at least 150 breaches from a healthcare provider covered entity type?

## # A tibble: 10 × 4
##    Breach.Submission.Year Breaches `Business_Associate_Breaches > 50` Healthca…¹
##                     <dbl>    <int>                              <int>      <int>
##  1                   2009       18                                  3          0
##  2                   2010      199                                 44          0
##  3                   2011      200                                 45          0
##  4                   2012      218                                 40          0
##  5                   2013      277                                 64          0
##  6                   2014      314                                 77          0
##  7                   2015      268                                 12          0
##  8                   2016      307                                 18          0
##  9                   2017      209                                 10          0
## 10                   2018       37                                  2          0
## # … with abbreviated variable name ¹`Healthcare_Provider_Breaches > 150`

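This table displays, for every year since 2009, the total number of breaches along with columns intended to count ‘Business Associate’ and ‘Healthcare Provider’ breaches. The Business Associate column reaches at least 50 only in 2013 (64) and 2014 (77). The Healthcare Provider column, however, is 0 in every year, even though healthcare providers account for 1,459 breaches overall, so that column does not appear to have been computed as intended. The table therefore identifies 2013 and 2014 as the only candidate years, but it cannot confirm that either year also had at least 150 breaches from a ‘Healthcare Provider’ covered entity type.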
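A per-year threshold check works best if it counts each entity type first and only then compares the counts to the thresholds. The all-zero Healthcare Provider column may come from applying the comparison before counting, but that is only a guess, since the original code is not shown. This minimal sketch uses hypothetical rows and `>=` to match the question's "at least".

```python
from collections import Counter

# Hypothetical (year, Covered.Entity.Type) rows for illustration only.
rows = (
    [(2013, "Business Associate")] * 60
    + [(2013, "Healthcare Provider")] * 160
    + [(2015, "Business Associate")] * 12
    + [(2015, "Healthcare Provider")] * 200
)


def qualifying_years(rows, min_ba=50, min_hp=150):
    """Return years with at least min_ba Business Associate breaches
    and at least min_hp Healthcare Provider breaches."""
    counts = Counter(rows)  # keyed by (year, entity type)
    years = sorted({year for year, _ in rows})
    return [
        y
        for y in years
        if counts[(y, "Business Associate")] >= min_ba
        and counts[(y, "Healthcare Provider")] >= min_hp
    ]


print(qualifying_years(rows))
```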

How has the type of breach (hacking, improper disposal, loss, etc.) changed for each year?

## # A tibble: 10 × 8
##    Breach.Submission.Year Hacking/IT…¹ Impro…²  Loss Theft Unaut…³ Other Unknown
##                     <dbl>        <int>   <int> <int> <int>   <int> <int>   <int>
##  1                   2009            0       0     1    15       0     2       0
##  2                   2010            8      10    20   135      10    23       0
##  3                   2011           17       7    20   123      36     2       7
##  4                   2012           17       8    21   132      43    20       3
##  5                   2013           29      14    25   133      76    21       5
##  6                   2014           39      12    31   129     106    29       1
##  7                   2015           56       6    24    80     102     0       0
##  8                   2016          102       7    14    59     125     0       0
##  9                   2017           73       7    13    35      81     0       0
## 10                   2018           13       1     1     3      19     0       0
## # … with abbreviated variable names ¹`Hacking/IT Incident`,
## #   ²`Improper Disposal`, ³`Unauthorized Access/Disclosure`

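From this table we can see that Hacking/IT Incidents and Unauthorized Access/Disclosure have become much more frequent in recent years: in 2016 they reached 102 and 125 breaches respectively, both well above Theft (59). Loss stayed roughly steady at 20 to 31 breaches per year from 2010 through 2015, then fell to 14 and 13 in 2016 and 2017. Theft, the most common type through 2014, has declined every year since.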
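A year-by-type table like this one is a cross-tabulation. This sketch builds one from hypothetical (year, Type.of.Breach) pairs and fills combinations that never occur with 0, as the printed table does.

```python
from collections import Counter

# Hypothetical (year, Type.of.Breach) pairs for illustration only.
rows = [
    (2014, "Theft"),
    (2014, "Theft"),
    (2016, "Hacking/IT Incident"),
    (2016, "Theft"),
    (2016, "Hacking/IT Incident"),
]


def crosstab(rows):
    """Build a {year: {breach type: count}} table, with 0 for absent combinations."""
    pair_counts = Counter(rows)
    years = sorted({year for year, _ in rows})
    types = sorted({breach_type for _, breach_type in rows})
    return {y: {t: pair_counts[(y, t)] for t in types} for y in years}


table = crosstab(rows)
for year, by_type in table.items():
    print(year, by_type)
```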

Where do the most breaches of health information come from?

## # A tibble: 69 × 2
##    Location.of.Breached.Information                                      num_b…¹
##    <chr>                                                                   <int>
##  1 Desktop Computer                                                          148
##  2 Desktop Computer, Electronic Medical Record                                 4
##  3 Desktop Computer, Electronic Medical Record, Email, Laptop, Network …       1
##  4 Desktop Computer, Electronic Medical Record, Email, Laptop, Network …       1
##  5 Desktop Computer, Electronic Medical Record, Email, Laptop, Network …       1
##  6 Desktop Computer, Electronic Medical Record, Email, Laptop, Network …       1
##  7 Desktop Computer, Electronic Medical Record, Email, Laptop, Network …       2
##  8 Desktop Computer, Electronic Medical Record, Email, Laptop, Other Po…       1
##  9 Desktop Computer, Electronic Medical Record, Email, Laptop, Other, O…       1
## 10 Desktop Computer, Electronic Medical Record, Email, Network Server          2
## # … with 59 more rows, and abbreviated variable name ¹num_breaches

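Only the first 10 of the table's 69 location combinations are printed above. According to the full table, paper/film is the location with the most breaches. Laptop and Network Server form a second tier at around 300 breaches each, and Desktop Computer and Email are at around 200. Breaches that list several locations are counted as separate combinations, so a single-location count (for example, 148 for Desktop Computer alone) understates how often that location is involved. To answer the question, most breaches in this data set come from paper records.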
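Because Location.of.Breached.Information can list several locations separated by commas, one way to get per-location totals is to split each value and count every location it names. A minimal sketch with hypothetical values; the ", " separator is assumed from the printed table, and `Paper/Films` is an illustrative label. With this approach the location totals can exceed the number of breaches.

```python
from collections import Counter

# Hypothetical Location.of.Breached.Information values; combined locations
# are assumed to be separated by ", " as in the printed table.
locations = [
    "Paper/Films",
    "Desktop Computer",
    "Desktop Computer, Email",
    "Laptop, Network Server",
    "Paper/Films",
]


def count_locations(values, sep=", "):
    """Count each individual location, so a breach listing several
    locations adds one to every location it names."""
    counts = Counter()
    for value in values:
        for location in value.split(sep):
            counts[location.strip()] += 1
    return counts


counts = count_locations(locations)
print(counts.most_common())
```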

From which entities are most people affected by data breaches?

## # A tibble: 5 × 2
##   Covered.Entity.Type         Individuals_affected
##   <chr>                                      <int>
## 1 "Health Plan"                          110384390
## 2 "Healthcare Provider"                   34878870
## 3 "Business Associate"                    33338961
## 4 ""                                         18599
## 5 "Healthcare Clearing House"                17754

From this table we can determine that people are most affected by breaches at health plans: 110,384,390 individuals, more than healthcare providers (34,878,870) and business associates (33,338,961) combined. Most of the health plan total comes from the single largest breach (78,800,000 individuals). Healthcare providers and business associates affected very similar numbers of people.
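How much a single breach drives the health plan total can be checked with simple arithmetic on figures copied from the tables above; the 78,800,000 breach is listed as a Health Plan breach in the top-25 table.

```python
# Figures copied from the tables above.
health_plan_total = 110_384_390       # Individuals affected, Health Plan
largest_breach = 78_800_000           # Largest single breach, a Health Plan
provider_total = 34_878_870           # Individuals affected, Healthcare Provider
business_associate_total = 33_338_961 # Individuals affected, Business Associate

share = largest_breach / health_plan_total
remaining = health_plan_total - largest_breach

print(f"Largest breach share of health plan total: {share:.1%}")
print(f"Health plan total without it: {remaining:,}")
```

Without that one breach, the health plan total (31,584,390) would fall below both healthcare providers and business associates.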