Data
This data set is about healthcare data breaches in the United States.
Table of summary statistics
For each state, this table shows the average number of people affected per breach, the total number of people affected, the most common type of breach, and how many of the breaches were thefts. Rows are sorted by the number of thefts.
## # A tibble: 52 × 5
## State `Average number of people affected` Total number of p…¹ Max T…² Total…³
## <chr> <dbl> <dbl> <chr> <int>
## 1 CA 14745. 3052133 Unauth… 114
## 2 TX 26235. 4040208 Unknown 69
## 3 FL 48402. 6001825 Unknown 51
## 4 NY 33121. 2782138 Unknown 47
## 5 IL 54559. 4692107 Unauth… 35
## 6 PA 21508. 1376521 Unauth… 29
## 7 PR 54997. 1704916 Unauth… 24
## 8 OH 12190. 707014 Unknown 22
## 9 WA 15335. 690054 Unknown 22
## 10 IN 1560329. 79576765 Unauth… 21
## 11 GA 11988. 611399 Unauth… 20
## 12 MA 4544. 181740 Unauth… 19
## 13 AZ 27733. 1137038 Unauth… 18
## 14 KY 5649. 197724 Unauth… 18
## 15 TN 41054. 1724277 Unauth… 17
## 16 VA 191037. 5158001 Unauth… 16
## 17 OR 10378. 300949 Unauth… 15
## 18 CT 8737. 218414 Unauth… 13
## 19 MI 5199. 213166 Unauth… 13
## 20 NJ 152590. 3051796 Unauth… 13
## 21 CO 6773. 216750 Unauth… 12
## 22 NC 8058. 370661 Unknown 12
## 23 AL 43257. 1081417 Unauth… 11
## 24 MO 4730. 151367 Unknown 11
## 25 NM 3583. 68082 Unauth… 11
## 26 MD 11674. 350226 Unauth… 9
## 27 MN 4846. 184145 Unauth… 9
## 28 LA 6341. 114140 Unauth… 8
## 29 SC 34778. 765107 Unauth… 8
## 30 WI 8170. 122555 Theft 8
## 31 OK 19566. 332619 Unauth… 7
## 32 RI 4588. 45877 Theft 7
## 33 KS 8088. 88971 Unauth… 6
## 34 NV 8124. 97492 Unauth… 6
## 35 NE 5132. 51317 Unauth… 5
## 36 AK 1509. 9053 Unauth… 4
## 37 AR 6387. 114959 Unauth… 4
## 38 DC 3852. 38519 Unauth… 4
## 39 MS 9971. 99706 Unauth… 4
## 40 UT 79661. 876270 Unauth… 4
## 41 WV 9004. 81035 Unauth… 4
## 42 MT 115652. 1156519 Unauth… 3
## 43 NH 59835. 239339 Theft 3
## 44 ID 4987. 14962 Theft 2
## 45 ND 3776. 15102 Theft 2
## 46 VT 1072. 3215 Unauth… 2
## 47 IA 4225. 42253 Unauth… 1
## 48 SD 5945. 23779 Unauth… 1
## 49 WY 7912 55384 Unauth… 1
## 50 DE 1781 3562 Unauth… 0
## # … with 2 more rows, and abbreviated variable names
## # ¹`Total number of people affected`, ²`Max Type of Breach`,
## # ³`Total cases of Theft`
Unauthorized access and Unknown seem to be the most common types of breach. Also, California has the most cases of theft (114).
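The table lists theft counts but not the number of breaches per state. Because the average equals the total divided by the number of breaches, that number can be recovered as total / average. The following is a minimal Python sketch (not the code that produced this report) using three rows copied from the table. The helper name `implied_breach_count` is hypothetical, and the printed averages are rounded, so the results are approximate.

```python
# Rows copied from the summary table:
# (state, average people affected, total people affected, theft cases).
rows = [
    ("CA", 14745, 3052133, 114),
    ("TX", 26235, 4040208, 69),
    ("IN", 1560329, 79576765, 21),
]


def implied_breach_count(average, total):
    """Recover the number of breaches from total = average * count."""
    return round(total / average)


summary = {}
for state, average, total, thefts in rows:
    count = implied_breach_count(average, total)
    theft_share = thefts / count
    summary[state] = (count, theft_share)
    print(f"{state}: about {count} breaches, {theft_share:.0%} of them thefts")
```

Read this way, a high theft count can partly reflect a high number of breaches overall, so the theft share is a useful companion to the raw count.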
Breaches by year
How many breaches are there every year?
Breaches seem to fall off in 2017 and 2018. This may be because of the Data Protection Act of 2018.
Top 25 worst health care breaches
## # A tibble: 25 × 2
## `Name of Covered Entity` Indiv…¹
## <chr> <dbl>
## 1 Anthem, Inc. Affiliated Covered Entity 7.88e7
## 2 Science Applications International Corporation (SA 4.9 e6
## 3 Advocate Health and Hospitals Corporation, d/b/a Advocate Medical Gr… 4.03e6
## 4 21st Century Oncology 2.21e6
## 5 Xerox State Healthcare, LLC 2 e6
## 6 IBM 1.9 e6
## 7 GRM Information Management Services 1.7 e6
## 8 AvMed, Inc. 1.22e6
## 9 Montana Department of Public Health & Human Services 1.06e6
## 10 The Nemours Foundation 1.06e6
## # … with 15 more rows, and abbreviated variable name ¹`Individuals Affected`
The worst breach was at Anthem, Inc., which affected about 78.8 million individuals. Interestingly, that is not Anthem's only breach.
Records exposed by state
Indiana has by far the most records exposed (about 79.6 million). Maybe the state should add measures that better protect people from healthcare breaches.
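One way to put Indiana's total in context is to compare it with the largest single breach in the top-25 table. The Python sketch below does that arithmetic with values copied from the two tables. It rests on an assumption the report does not state: that the Anthem breach is one of the breaches recorded under Indiana. The Anthem figure is also rounded to three significant figures in the printout, so the result is approximate.

```python
# Values copied from the report's tables.
indiana_total = 79_576_765  # "Total number of people affected" for IN
anthem_affected = 7.88e7    # Anthem, Inc. row, rounded to three significant figures

# ASSUMPTION (not stated in the report): the Anthem breach is counted under Indiana.
anthem_share = anthem_affected / indiana_total
remaining = indiana_total - anthem_affected

print(f"Anthem share of Indiana's total: {anthem_share:.1%}")
print(f"Indiana's total without Anthem: about {remaining:,.0f}")
```

If the assumption holds, a single incident accounts for almost all of Indiana's exposed records, which matters when comparing states by total.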
How many hacking incidents are there per month?
The incidents seem to be more or less randomly distributed. November and February seem to have the fewest incidents, while March and September have the most.
Number of breaches by entity type
## # A tibble: 4 × 2
## `Covered Entity Type` count
## <chr> <int>
## 1 Business Associate 285
## 2 Health Plan 200
## 3 Healthcare Clearing House 4
## 4 Healthcare Provider 1220
Healthcare providers have the most breaches of any entity type (1,220), followed by business associates (285).
Which day of the week are breaches most often reported?
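The ranking of entity types can be checked directly from the counts in the table. This is a minimal Python sketch (not the code that produced this report) that finds the largest category and each category's share of all breaches.

```python
# Counts copied from the "Covered Entity Type" table.
counts = {
    "Business Associate": 285,
    "Health Plan": 200,
    "Healthcare Clearing House": 4,
    "Healthcare Provider": 1220,
}

total = sum(counts.values())
top_entity = max(counts, key=counts.get)
shares = {entity: n / total for entity, n in counts.items()}

# Print the categories from most to fewest breaches.
for entity, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    print(f"{entity}: {n} ({shares[entity]:.1%})")
```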
## # A tibble: 1 × 1
## # Groups: Breach Submission Date [1]
## `Breach Submission Date`
## <chr>
## 1 Wednesday
Breaches are most often reported on Wednesday.
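The general approach to this kind of question is to convert each submission date to a weekday name and count the names. The Python sketch below illustrates it. The dates are hypothetical placeholders, not values from the data set, and the ISO date format is an assumption about how the dates are stored.

```python
from collections import Counter
from datetime import date

# HYPOTHETICAL sample dates for illustration only; not taken from the data set.
submission_dates = [
    "2014-01-01",
    "2014-01-06",
    "2014-01-08",
    "2014-01-10",
    "2015-03-04",
]

# date.weekday() returns 0 for Monday through 6 for Sunday.
WEEKDAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

weekday_counts = Counter(
    WEEKDAY_NAMES[date.fromisoformat(text).weekday()] for text in submission_dates
)
most_common_day, occurrences = weekday_counts.most_common(1)[0]
print(most_common_day, occurrences)
```

Keeping the full `weekday_counts` tally, rather than only the top day, also shows how close the other days are.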
In which years were there at least 50 breaches from business associates and at least 150 breaches from healthcare providers?
## # A tibble: 6 × 3
## # Groups: year [4]
## year `Covered Entity Type` count
## <chr> <chr> <int>
## 1 2013 Business Associate 64
## 2 2013 Healthcare Provider 187
## 3 2014 Business Associate 67
## 4 2014 Healthcare Provider 179
## 5 2015 Healthcare Provider 155
## 6 2016 Healthcare Provider 182
Only 2013 and 2014 fit both conditions. In 2015 and 2016, only healthcare providers pass their threshold.
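The table lists every year and entity type pair that passed its threshold, so the remaining step is to keep the years in which both entity types appear. This is a minimal Python sketch (not the code that produced this report) of that step, using the rows from the table.

```python
from collections import defaultdict

# Rows copied from the filtered table: (year, covered entity type, count).
rows = [
    ("2013", "Business Associate", 64),
    ("2013", "Healthcare Provider", 187),
    ("2014", "Business Associate", 67),
    ("2014", "Healthcare Provider", 179),
    ("2015", "Healthcare Provider", 155),
    ("2016", "Healthcare Provider", 182),
]

# Collect the entity types that passed their threshold in each year.
types_by_year = defaultdict(set)
for year, entity, _count in rows:
    types_by_year[year].add(entity)

required = {"Business Associate", "Healthcare Provider"}
years_meeting_both = sorted(
    year for year, types in types_by_year.items() if required <= types
)
print(years_meeting_both)
```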
Theft used to be a big problem, but breaches have since shifted toward unauthorized access. This may be because more data is now stored on computers instead of on paper.
How many people are affected each year?
Slightly fewer people were affected in 2017 and 2018. This may show that the Data Protection Act of 2018 has been helping.
Where did the most breaches occur?
## # A tibble: 64 × 2
## `Location of Breached Information` n
## <chr> <int>
## 1 Paper/Films 405
## 2 Laptop 274
## 3 Network Server 213
## 4 Other 165
## 5 Email 142
## 6 Desktop Computer 133
## 7 Other Portable Electronic Device 101
## 8 Electronic Medical Record 64
## 9 Other, Other Portable Electronic Device 46
## 10 Laptop, Other Portable Electronic Device 16
## # … with 54 more rows
Paper and films account for the most breaches (405). This may be because a lot of data used to be stored on paper. Theft also used to be the most common form of breach, and paper records are easy to steal.
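Some rows in the table combine several locations in one label, such as "Laptop, Other Portable Electronic Device", so a breach involving a laptop can be counted outside the plain "Laptop" row. The Python sketch below splits the combined labels and tallies each location separately. It assumes the labels are separated by a comma and a space, and it uses only the 10 rows shown out of 64, so the tallies are lower bounds.

```python
from collections import Counter

# The 10 rows shown in the table: (location label, number of breaches).
top_rows = [
    ("Paper/Films", 405),
    ("Laptop", 274),
    ("Network Server", 213),
    ("Other", 165),
    ("Email", 142),
    ("Desktop Computer", 133),
    ("Other Portable Electronic Device", 101),
    ("Electronic Medical Record", 64),
    ("Other, Other Portable Electronic Device", 46),
    ("Laptop, Other Portable Electronic Device", 16),
]

# ASSUMPTION: combined labels are joined with ", ".
per_location = Counter()
for label, n in top_rows:
    for location in label.split(", "):
        per_location[location] += n

for location, n in per_location.most_common():
    print(f"{location}: {n}")
```

With these rows, paper and films still lead after splitting, though laptops and other portable devices gain from the combined labels.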