Data

This data set covers healthcare data breaches in the United States.

Table of summary statistics

For each state, this table shows the average number of people affected per breach, the total number of people affected, the breach type reported in the `Max Type of Breach` column, and the number of breaches that were thefts. Rows are sorted by the number of thefts, from most to fewest.

## # A tibble: 52 × 5
##    State `Average number of people affected` Total number of p…¹ Max T…² Total…³
##    <chr>                               <dbl>               <dbl> <chr>     <int>
##  1 CA                                 14745.             3052133 Unauth…     114
##  2 TX                                 26235.             4040208 Unknown      69
##  3 FL                                 48402.             6001825 Unknown      51
##  4 NY                                 33121.             2782138 Unknown      47
##  5 IL                                 54559.             4692107 Unauth…      35
##  6 PA                                 21508.             1376521 Unauth…      29
##  7 PR                                 54997.             1704916 Unauth…      24
##  8 OH                                 12190.              707014 Unknown      22
##  9 WA                                 15335.              690054 Unknown      22
## 10 IN                               1560329.            79576765 Unauth…      21
## 11 GA                                 11988.              611399 Unauth…      20
## 12 MA                                  4544.              181740 Unauth…      19
## 13 AZ                                 27733.             1137038 Unauth…      18
## 14 KY                                  5649.              197724 Unauth…      18
## 15 TN                                 41054.             1724277 Unauth…      17
## 16 VA                                191037.             5158001 Unauth…      16
## 17 OR                                 10378.              300949 Unauth…      15
## 18 CT                                  8737.              218414 Unauth…      13
## 19 MI                                  5199.              213166 Unauth…      13
## 20 NJ                                152590.             3051796 Unauth…      13
## 21 CO                                  6773.              216750 Unauth…      12
## 22 NC                                  8058.              370661 Unknown      12
## 23 AL                                 43257.             1081417 Unauth…      11
## 24 MO                                  4730.              151367 Unknown      11
## 25 NM                                  3583.               68082 Unauth…      11
## 26 MD                                 11674.              350226 Unauth…       9
## 27 MN                                  4846.              184145 Unauth…       9
## 28 LA                                  6341.              114140 Unauth…       8
## 29 SC                                 34778.              765107 Unauth…       8
## 30 WI                                  8170.              122555 Theft         8
## 31 OK                                 19566.              332619 Unauth…       7
## 32 RI                                  4588.               45877 Theft         7
## 33 KS                                  8088.               88971 Unauth…       6
## 34 NV                                  8124.               97492 Unauth…       6
## 35 NE                                  5132.               51317 Unauth…       5
## 36 AK                                  1509.                9053 Unauth…       4
## 37 AR                                  6387.              114959 Unauth…       4
## 38 DC                                  3852.               38519 Unauth…       4
## 39 MS                                  9971.               99706 Unauth…       4
## 40 UT                                 79661.              876270 Unauth…       4
## 41 WV                                  9004.               81035 Unauth…       4
## 42 MT                                115652.             1156519 Unauth…       3
## 43 NH                                 59835.              239339 Theft         3
## 44 ID                                  4987.               14962 Theft         2
## 45 ND                                  3776.               15102 Theft         2
## 46 VT                                  1072.                3215 Unauth…       2
## 47 IA                                  4225.               42253 Unauth…       1
## 48 SD                                  5945.               23779 Unauth…       1
## 49 WY                                  7912                55384 Unauth…       1
## 50 DE                                  1781                 3562 Unauth…       0
## # … with 2 more rows, and abbreviated variable names
## #   ¹​`Total number of people affected`, ²​`Max Type of Breach`,
## #   ³​`Total cases of Theft`

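Unauthorized access and Unknown are the values that appear most often in the `Max Type of Breach` column. This column should be read with caution: California's 114 thefts are more than half of its roughly 207 breaches (3,052,133 total affected divided by an average of 14,745), yet its listed type is unauthorized access. The column therefore probably does not hold the most frequent type; it may hold the alphabetically last type, which is what `max()` returns for text. California also has the most cases of theft.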
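A per-state summary like the table above could be computed with dplyr roughly as follows. This is a minimal sketch on invented toy rows, not the original analysis code; the data frame name `breaches` and the column names `Individuals Affected` and `Type of Breach` are assumptions. It also contrasts two ways of reading a "max type" column: the most frequent value versus `max()`, which for text returns the alphabetically last value.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration only.
breaches <- tibble(
  State = c("CA", "CA", "CA", "TX", "TX", "TX"),
  `Individuals Affected` = c(1000, 3000, 500, 2000, 4000, 600),
  `Type of Breach` = c("Theft", "Theft", "Unknown", "Unknown", "Unknown", "Theft")
)

state_summary <- breaches %>%
  group_by(State) %>%
  summarise(
    `Average number of people affected` = mean(`Individuals Affected`, na.rm = TRUE),
    `Total number of people affected` = sum(`Individuals Affected`, na.rm = TRUE),
    # Most frequent type: tabulate the values and take the largest count
    `Most common type` = names(which.max(table(`Type of Breach`))),
    # max() on text gives the alphabetically last value, not the most frequent one
    `Alphabetical max type` = max(`Type of Breach`),
    `Total cases of Theft` = sum(`Type of Breach` == "Theft", na.rm = TRUE)
  ) %>%
  arrange(desc(`Total cases of Theft`))

print(state_summary)
```

In the toy data, California's most common type is Theft while its alphabetical maximum is Unknown, which shows how the two definitions can disagree.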

Breaches by year

How many breaches are there every year?

Breaches appear to fall off in 2017 and 2018. This may be related to the Data Protection Act of 2018, although the counts alone cannot establish a cause.
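Counting breaches per year could look roughly like this. It is a sketch on invented rows; the `%m/%d/%Y` date format is an assumption about how `Breach Submission Date` is stored, and `breaches` is a hypothetical data frame name.

```r
library(dplyr)

# Hypothetical toy data; the dates are invented.
breaches <- tibble(
  `Breach Submission Date` = c("03/15/2013", "07/01/2013", "02/20/2014")
)

breaches_per_year <- breaches %>%
  # Parse the date string, then keep only the four-digit year as text
  mutate(year = format(as.Date(`Breach Submission Date`, format = "%m/%d/%Y"), "%Y")) %>%
  count(year, name = "breaches")

print(breaches_per_year)
```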

Top 25 worst health care breaches

## # A tibble: 25 × 2
##    `Name of Covered Entity`                                              Indiv…¹
##    <chr>                                                                   <dbl>
##  1 Anthem, Inc. Affiliated Covered Entity                                 7.88e7
##  2 Science Applications International Corporation (SA                     4.9 e6
##  3 Advocate Health and Hospitals Corporation, d/b/a Advocate Medical Gr…  4.03e6
##  4 21st Century Oncology                                                  2.21e6
##  5 Xerox State Healthcare, LLC                                            2   e6
##  6 IBM                                                                    1.9 e6
##  7 GRM Information Management Services                                    1.7 e6
##  8 AvMed, Inc.                                                            1.22e6
##  9 Montana Department of Public Health & Human Services                   1.06e6
## 10 The Nemours Foundation                                                 1.06e6
## # … with 15 more rows, and abbreviated variable name ¹​`Individuals Affected`

The worst breach was at Anthem, Inc. Interestingly, that is not the company's only breach in the data set.

Records exposed by state

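Indiana has by far the most records exposed, about 79.6 million. Almost all of these appear to come from the single Anthem breach of about 78.8 million records listed above, since no other state's total is large enough to contain it. The state may still want to consider measures that better protect people from healthcare breaches.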

How many hacking incidents are there per month?

The incidents seem to be more or less randomly distributed across months. November and February have the fewest incidents, while March and September have the most.
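A monthly count of hacking incidents could be sketched as below. The column name `Type of Breach`, the label `Hacking/IT Incident`, and the `%m/%d/%Y` date format are assumptions, and the rows are invented. The month is looked up in R's built-in `month.name` so that the result does not depend on the system locale.

```r
library(dplyr)

# Hypothetical toy data; dates and types are invented.
breaches <- tibble(
  `Breach Submission Date` = c("03/15/2013", "03/02/2014", "09/20/2014", "11/05/2015"),
  `Type of Breach` = c("Hacking/IT Incident", "Hacking/IT Incident", "Theft", "Hacking/IT Incident")
)

hacking_by_month <- breaches %>%
  # Keep rows whose type mentions hacking (combined types would also match)
  filter(grepl("Hacking", `Type of Breach`, fixed = TRUE)) %>%
  mutate(
    month_number = as.integer(format(as.Date(`Breach Submission Date`, format = "%m/%d/%Y"), "%m")),
    # A factor keeps the months in calendar order instead of alphabetical order
    month = factor(month.name[month_number], levels = month.name)
  ) %>%
  count(month)

print(hacking_by_month)
```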

Number of breaches by entity type

## # A tibble: 4 × 2
##   `Covered Entity Type`     count
##   <chr>                     <int>
## 1 Business Associate          285
## 2 Health Plan                 200
## 3 Healthcare Clearing House     4
## 4 Healthcare Provider        1220

Healthcare providers have by far the most breaches of any entity type (1,220), followed by business associates (285).

On which day of the week are breaches most often reported?

## # A tibble: 1 × 1
## # Groups:   Breach Submission Date [1]
##   `Breach Submission Date`
##   <chr>                   
## 1 Wednesday

Breaches are most often reported on Wednesday.
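Finding the most common reporting weekday could be done along these lines. This is a sketch on invented dates, assuming `Breach Submission Date` is stored as `%m/%d/%Y` text; note that base R's `weekdays()` returns day names in the system locale.

```r
library(dplyr)

# Hypothetical toy data; two of these dates fall on the same weekday.
breaches <- tibble(
  `Breach Submission Date` = c("01/01/2014", "01/08/2014", "01/02/2014")
)

top_day <- breaches %>%
  # Convert each date to its weekday name (locale-dependent)
  mutate(weekday = weekdays(as.Date(`Breach Submission Date`, format = "%m/%d/%Y"))) %>%
  # Count per weekday, most frequent first, and keep the top row
  count(weekday, sort = TRUE) %>%
  slice_head(n = 1)

print(top_day)
```

Storing the weekday in a new column, rather than overwriting the date column, keeps the original dates available for later steps.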

In which years were there at least 50 breaches from a business associate and at least 150 breaches from a healthcare provider?

## # A tibble: 6 × 3
## # Groups:   year [4]
##   year  `Covered Entity Type` count
##   <chr> <chr>                 <int>
## 1 2013  Business Associate       64
## 2 2013  Healthcare Provider     187
## 3 2014  Business Associate       67
## 4 2014  Healthcare Provider     179
## 5 2015  Healthcare Provider     155
## 6 2016  Healthcare Provider     182

Only 2013 and 2014 meet both conditions; 2015 and 2016 meet only the healthcare provider threshold.
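Instead of reading the qualifying years off the filtered table by eye, both conditions can be checked per year. The sketch below reuses the six rows printed above as input; treating the thresholds as "at least" (`>=`) is an assumption, as are the helper names `business_associate` and `healthcare_provider`.

```r
library(dplyr)

# The six year/entity-type counts shown in the output above.
yearly_counts <- tibble(
  year = c("2013", "2013", "2014", "2014", "2015", "2016"),
  `Covered Entity Type` = c(
    "Business Associate", "Healthcare Provider",
    "Business Associate", "Healthcare Provider",
    "Healthcare Provider", "Healthcare Provider"
  ),
  count = c(64L, 187L, 67L, 179L, 155L, 182L)
)

years_meeting_both <- yearly_counts %>%
  group_by(year) %>%
  summarise(
    # A missing entity type in a year sums to 0 and so fails its threshold
    business_associate = sum(count[`Covered Entity Type` == "Business Associate"]),
    healthcare_provider = sum(count[`Covered Entity Type` == "Healthcare Provider"])
  ) %>%
  filter(business_associate >= 50, healthcare_provider >= 150)

print(years_meeting_both)
```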

Theft used to be a big problem, but the most common type of breach has since shifted to unauthorized access. This may be because more data is now stored on computers instead of on paper.

How many people are affected by year

Slightly fewer people were affected in 2017 and 2018. This may suggest that the Data Protection Act of 2018 has been helping, although the totals alone cannot establish a cause.

Where did the most breaches occur

## # A tibble: 64 × 2
##    `Location of Breached Information`           n
##    <chr>                                    <int>
##  1 Paper/Films                                405
##  2 Laptop                                     274
##  3 Network Server                             213
##  4 Other                                      165
##  5 Email                                      142
##  6 Desktop Computer                           133
##  7 Other Portable Electronic Device           101
##  8 Electronic Medical Record                   64
##  9 Other, Other Portable Electronic Device     46
## 10 Laptop, Other Portable Electronic Device    16
## # … with 54 more rows

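Paper and films account for the most breaches (405). This may be because a lot of data used to be stored on paper. Paper records are also easy to steal, which fits with theft having been the most common form of breach. Note that breaches involving several locations, such as "Laptop, Other Portable Electronic Device", are counted as separate categories here, which is why the table has 64 rows.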
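The location table above counts combined entries such as "Other, Other Portable Electronic Device" as their own categories. One way to count each location separately is to split such entries before counting. The sketch below uses a few invented rows and assumes that a comma followed by a space only ever separates locations and never appears inside a single location name.

```r
library(dplyr)

# Hypothetical toy data; the location labels are taken from the table above.
locations <- c(
  "Paper/Films",
  "Laptop",
  "Laptop, Other Portable Electronic Device",
  "Other, Other Portable Electronic Device"
)

# Split each entry on commas so every individual location becomes its own element
split_locations <- unlist(strsplit(locations, ",\\s*"))

location_counts <- tibble(location = split_locations) %>%
  count(location, sort = TRUE)

print(location_counts)
```

With this approach a breach that involved two locations contributes to both counts, so the counts no longer sum to the number of breaches.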