Introduction

The data used throughout this analysis comes from the Office for Civil Rights (OCR) of the US Department of Health and Human Services (HHS). This office is responsible for collecting and reporting breaches of protected health information (PHI), as mandated by law. In particular, the law requires the OCR to report cases in which a covered entity (CE: an organization responsible for protecting health information) has a breach that affects more than 500 individuals.

The data can be found using this link: http://asayanalytics.com/breach_archive-csv

Summary Statistics

This table breaks down incidents at the state level. As expected, the larger states had more incidents, along with a wider range in how many people each incident affected. Because the larger states had both more incidents and more people affected, they lead by a wide margin in total individuals affected. What is interesting is that IN and NJ have extremely large total affected values, even though they are not states I would have expected to lead in data breaches.

## # A tibble: 52 x 6
##    State incident_by_state average_affected unauthorized_access_tot~ sd_affected
##    <chr>             <int>            <dbl>                    <int>       <dbl>
##  1 AK                    6            1509.                        2        864.
##  2 AL                   25           43257.                        6     187830.
##  3 AR                   18            6387.                        9       7989.
##  4 AZ                   41           27733.                       11     137332.
##  5 CA                  207           14745.                       50      63447.
##  6 CO                   32            6773.                       13      18580.
##  7 CT                   25            8737.                        8      20716.
##  8 DC                   10            3852.                        4       5111.
##  9 DE                    2            1781                         1        144.
## 10 FL                  124           48402.                       46     247108.
## # ... with 42 more rows, and 1 more variable: total_affected <dbl>
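A per-state summary like the one above could be computed along the following lines. This is a minimal Python sketch using only the standard library, not the code that produced the table. It assumes each row is a dict keyed by the archive's column names: "State" and "Individuals Affected" appear in the tables here, while "Type of Breach" and its "Unauthorized Access/Disclosure" label are assumptions. The key `unauthorized_access_count` is a hypothetical name, and the toy rows are invented for illustration.

```python
import statistics
from collections import defaultdict


def summarize_by_state(rows):
    """Per-state incident count, mean, sample SD and total of individuals affected.

    "State" and "Individuals Affected" match the tables in this report;
    "Type of Breach" is an assumed column name.
    """
    affected = defaultdict(list)
    unauthorized = defaultdict(int)
    for row in rows:
        state = row["State"]
        affected[state].append(int(row["Individuals Affected"]))
        # Substring match so combined labels are also counted (assumed label format)
        if "Unauthorized Access/Disclosure" in row["Type of Breach"]:
            unauthorized[state] += 1

    summary = {}
    for state in sorted(affected):
        values = affected[state]
        summary[state] = {
            "incident_by_state": len(values),
            "average_affected": statistics.mean(values),
            "unauthorized_access_count": unauthorized[state],  # hypothetical key name
            # Sample standard deviation (n - 1 denominator); undefined for one incident
            "sd_affected": statistics.stdev(values) if len(values) > 1 else None,
            "total_affected": sum(values),
        }
    return summary


# Toy rows for illustration only (not taken from the archive)
toy_rows = [
    {"State": "AK", "Individuals Affected": "600", "Type of Breach": "Theft"},
    {"State": "AK", "Individuals Affected": "1400",
     "Type of Breach": "Unauthorized Access/Disclosure"},
    {"State": "DE", "Individuals Affected": "900", "Type of Breach": "Loss"},
]
print(summarize_by_state(toy_rows)["AK"])
```

To run it on the downloaded archive, the rows could come from `csv.DictReader` over the CSV file. A state with a single incident gets no standard deviation, since a sample SD needs at least two values.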

Number of healthcare data breaches by year

Healthcare data breaches can be extremely serious: information that people have entrusted to companies becomes available to those who may not have the best intentions. This chart shows that overall breaches appear to have peaked in 2014 and have generally declined in the years since.
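Counting breaches per year and locating the peak can be sketched as follows. This is a minimal Python sketch, not the report's own code; the column name "Breach Submission Date" and its MM/DD/YYYY format are assumptions about the archive, and the toy dates are invented.

```python
from collections import Counter
from datetime import datetime

# Assumed column name and MM/DD/YYYY format for the submission date
DATE_COLUMN = "Breach Submission Date"
DATE_FORMAT = "%m/%d/%Y"


def breaches_by_year(rows):
    """Count reported breaches per calendar year of submission."""
    years = Counter(datetime.strptime(r[DATE_COLUMN], DATE_FORMAT).year for r in rows)
    return dict(sorted(years.items()))


def peak_year(counts):
    """Year with the most breaches (the earliest such year if there is a tie)."""
    return max(counts, key=counts.get)


# Toy rows for illustration only
toy_rows = [{DATE_COLUMN: "03/14/2013"}, {DATE_COLUMN: "07/01/2014"},
            {DATE_COLUMN: "11/20/2014"}]
counts = breaches_by_year(toy_rows)
print(counts, peak_year(counts))  # {2013: 1, 2014: 2} 2014
```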

Top 25 healthcare data breaches

Not all data breaches are the same; they vary enormously in size. Below are the 25 largest breaches in this data set. Notice that the largest, Anthem, is far larger than every other breach recorded: at 78.8 million individuals affected, it is roughly 16 times the size of the second-largest breach (4.9 million).

## # A tibble: 25 x 2
##    `Individuals Affected` `Name of Covered Entity`                              
##                     <dbl> <chr>                                                 
##  1               78800000 Anthem, Inc. Affiliated Covered Entity                
##  2                4900000 Science Applications International Corporation (SA    
##  3                4029530 Advocate Health and Hospitals Corporation, d/b/a Advo~
##  4                2213597 21st Century Oncology                                 
##  5                2000000 Xerox State Healthcare, LLC                           
##  6                1900000 IBM                                                   
##  7                1700000 GRM Information Management Services                   
##  8                1220000 AvMed, Inc.                                           
##  9                1062509 Montana Department of Public Health & Human Services  
## 10                1055489 The Nemours Foundation                                
## # ... with 15 more rows
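Selecting the largest breaches is a top-n query over "Individuals Affected". This is a minimal Python sketch, not the code behind the table; `heapq.nlargest` is a standard-library swap-in for sorting the full data set. The sample rows reuse four names and counts from the table above.

```python
import heapq


def top_breaches(rows, n=25):
    """The n largest breaches as (individuals affected, covered entity) pairs."""
    pairs = ((int(r["Individuals Affected"]), r["Name of Covered Entity"]) for r in rows)
    return heapq.nlargest(n, pairs)


def ratio_to_runner_up(top):
    """How many times larger the biggest breach is than the second biggest."""
    return top[0][0] / top[1][0]


# Four rows taken from the table above
rows = [
    {"Name of Covered Entity": "IBM", "Individuals Affected": "1900000"},
    {"Name of Covered Entity": "Anthem, Inc. Affiliated Covered Entity",
     "Individuals Affected": "78800000"},
    {"Name of Covered Entity": "Xerox State Healthcare, LLC", "Individuals Affected": "2000000"},
    {"Name of Covered Entity": "21st Century Oncology", "Individuals Affected": "2213597"},
]
top = top_breaches(rows, n=3)
print(top)
print(round(ratio_to_runner_up(top), 1))  # 35.6 within these four rows only
```

Within the full table, the same ratio between Anthem and the second-ranked breach is 78,800,000 / 4,900,000, or about 16.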

Total healthcare records exposed by state for the top 10 states

The chart below gives some insight into which states have been the most problematic with regard to breaches. Note that Indiana has a far larger number of individuals affected than the other states. This is due to the Anthem breach, which affected just under 79 million people.
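Ranking states by total individuals affected, and checking how much of a state's total comes from its single largest breach, can be sketched as below. This is a minimal Python sketch; `largest_breach_share` is a hypothetical helper added to show how one outlier such as Anthem can dominate a state's total, and the toy rows are invented.

```python
from collections import Counter


def top_states_by_total(rows, n=10):
    """States ranked by total individuals affected across all their breaches."""
    totals = Counter()
    for r in rows:
        totals[r["State"]] += int(r["Individuals Affected"])
    return totals.most_common(n)


def largest_breach_share(rows, state):
    """Fraction of a state's total that comes from its single largest breach."""
    values = [int(r["Individuals Affected"]) for r in rows if r["State"] == state]
    return max(values) / sum(values)


# Toy rows for illustration only
toy_rows = [
    {"State": "IN", "Individuals Affected": "9000"},
    {"State": "IN", "Individuals Affected": "1000"},
    {"State": "CA", "Individuals Affected": "3000"},
    {"State": "CA", "Individuals Affected": "3000"},
    {"State": "CA", "Individuals Affected": "2000"},
]
print(top_states_by_total(toy_rows))          # [('IN', 10000), ('CA', 8000)]
print(largest_breach_share(toy_rows, "IN"))   # 0.9
```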

Number of healthcare hacking incidents by month

Knowing when healthcare hacking incidents occur can be extremely useful when looking to deter these threats. May has the most healthcare hacking incidents, with April a close second. Overall, incidents appear to peak in every other quarter, with dips in between.
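A monthly count of hacking incidents, pooled across all years, might look like the sketch below. It is a minimal Python sketch under stated assumptions: the "Breach Submission Date" column, its MM/DD/YYYY format, and the "Hacking/IT Incident" label in a "Type of Breach" column are all assumed rather than confirmed by this report. Toy rows are invented.

```python
from collections import Counter
from datetime import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def hacking_by_month(rows, date_column="Breach Submission Date", fmt="%m/%d/%Y"):
    """Count Hacking/IT incidents per calendar month, pooling all years.

    The date column, its format and the "Hacking/IT Incident" label are assumptions.
    """
    counts = Counter(
        datetime.strptime(r[date_column], fmt).month
        for r in rows
        if "Hacking/IT Incident" in r["Type of Breach"]
    )
    # Include months with no incidents so all twelve months appear
    return {MONTHS[m - 1]: counts.get(m, 0) for m in range(1, 13)}


# Toy rows for illustration only
toy_rows = [
    {"Breach Submission Date": "05/02/2016", "Type of Breach": "Hacking/IT Incident"},
    {"Breach Submission Date": "05/20/2017", "Type of Breach": "Hacking/IT Incident"},
    {"Breach Submission Date": "04/11/2016", "Type of Breach": "Hacking/IT Incident"},
    {"Breach Submission Date": "05/01/2016", "Type of Breach": "Theft"},
]
by_month = hacking_by_month(toy_rows)
print(max(by_month, key=by_month.get))  # May
```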

Number of breaches by covered entity type

When examining breaches, it is useful to understand which type of entity has the most trouble keeping its data secure. This table shows that healthcare providers experience by far the most breaches: 1,220 of the 1,709 breaches in the table, or about 71%.

## # A tibble: 4 x 2
##   `Covered Entity Type`     number_of_breaches
##   <chr>                                  <int>
## 1 Business Associate                       285
## 2 Health Plan                              200
## 3 Healthcare Clearing House                  4
## 4 Healthcare Provider                     1220
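The counts and shares by covered entity type can be sketched as follows. This is a minimal Python sketch, not the report's code; to check the arithmetic it rebuilds one row per breach from the counts in the table above.

```python
from collections import Counter


def breaches_by_entity_type(rows):
    """Number and share of breaches for each covered entity type."""
    counts = Counter(r["Covered Entity Type"] for r in rows)
    total = sum(counts.values())
    return {etype: (n, n / total) for etype, n in sorted(counts.items())}


# Rebuild rows from the counts in the table above to check the shares
table = {"Business Associate": 285, "Health Plan": 200,
         "Healthcare Clearing House": 4, "Healthcare Provider": 1220}
rows = [{"Covered Entity Type": t} for t, n in table.items() for _ in range(n)]
n, share = breaches_by_entity_type(rows)["Healthcare Provider"]
print(n, round(share, 3))  # 1220 0.714
```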

Reported breaches by day of the week

Although breaches may occur earlier in the week, it is clear that they tend to be reported just before the weekend. This could be because employees want to report an incident before the weekend so that there is some time for the heat to die down.
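Because the distinction here is between when a breach happens and when it is reported, the weekday count should use the submission date. A minimal Python sketch, assuming a "Breach Submission Date" column in MM/DD/YYYY format (both assumptions); the toy dates are invented.

```python
from collections import Counter
from datetime import datetime

DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def reports_by_weekday(rows, date_column="Breach Submission Date", fmt="%m/%d/%Y"):
    """Count breach reports by the weekday of their (assumed) submission date."""
    # datetime.weekday() returns 0 for Monday through 6 for Sunday
    counts = Counter(datetime.strptime(r[date_column], fmt).weekday() for r in rows)
    return {DAYS[d]: counts.get(d, 0) for d in range(7)}


# Toy rows for illustration only: Jan 1, 2018 was a Monday
toy_rows = [{"Breach Submission Date": d} for d in ("01/01/2018", "01/05/2018", "01/12/2018")]
print(reports_by_weekday(toy_rows))
```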

Years in which there were 50 or more breaches by a Business Associate and 150 or more breaches by a Healthcare Provider

The table below lists, for each year, the entity types that met their threshold. Only 2013 and 2014 appear for both Business Associates and Healthcare Providers, so those are the only two years that meet both criteria. The thresholds are fairly demanding, which explains why there are so few observations. Knowing that these years had some of the most breaches, we can study the practices behind them and aim to develop countermeasures.

## # A tibble: 6 x 3
## # Groups:   Year [4]
##   Year  `Covered Entity Type` number_of_breaches
##   <fct> <chr>                              <int>
## 1 2013  Business Associate                    64
## 2 2013  Healthcare Provider                  187
## 3 2014  Business Associate                    67
## 4 2014  Healthcare Provider                  179
## 5 2015  Healthcare Provider                  155
## 6 2016  Healthcare Provider                  182

Type of breach by year

This table shows how each type of breach has trended over the years, along with the number of breaches of each type per year. Overall, 2013 and 2014 had the most breaches, and in total, theft seems to be the most common type of breach. Note that hack_it_total is zero in every year even though Hacking/IT incidents are charted by month elsewhere in this analysis, so that column should be read with caution.

## # A tibble: 10 x 9
##    Year  hack_it_total imp_disp_total loss_total theft_total u_a_d_total
##    <fct>         <int>          <int>      <int>       <int>       <int>
##  1 2009              0              0          1          15           0
##  2 2010              0             10         20         135          10
##  3 2011              0              7         18         122          34
##  4 2012              0              8         20         124          40
##  5 2013              0             13         24         131          73
##  6 2014              0             11         30         111          98
##  7 2015              0              6         23          64          80
##  8 2016              0              7         12          46          96
##  9 2017              0              4          9          17          43
## 10 2018              0              0          0           0           1
## # ... with 3 more variables: unknown_total <int>, other_total <int>,
## #   total_breaches <int>
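A year-by-type table with a total column can be built by tallying each row's breach type(s) per year. This is a minimal Python sketch, not the code behind the table. The category labels, the "Breach Submission Date" column and its MM/DD/YYYY format, and the ", " separator for rows listing several types are all assumptions about the archive; the toy rows are invented.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Assumed category labels in the "Type of Breach" column
BREACH_TYPES = ("Hacking/IT Incident", "Improper Disposal", "Loss", "Theft",
                "Unauthorized Access/Disclosure", "Unknown", "Other")


def breach_types_by_year(rows, date_column="Breach Submission Date", fmt="%m/%d/%Y"):
    """Year-by-type breach counts plus a total_breaches column.

    A row listing several types (assumed to be separated by ", ") adds one
    to each listed type but only one to total_breaches.
    """
    table = defaultdict(Counter)
    for r in rows:
        year = datetime.strptime(r[date_column], fmt).year
        for label in r["Type of Breach"].split(", "):
            table[year][label.strip()] += 1
        table[year]["total_breaches"] += 1
    columns = BREACH_TYPES + ("total_breaches",)
    return {year: {c: table[year][c] for c in columns} for year in sorted(table)}


# Toy rows for illustration only
toy_rows = [
    {"Breach Submission Date": "02/10/2013", "Type of Breach": "Theft"},
    {"Breach Submission Date": "06/03/2013",
     "Type of Breach": "Hacking/IT Incident, Unauthorized Access/Disclosure"},
    {"Breach Submission Date": "09/15/2014", "Type of Breach": "Loss"},
]
print(breach_types_by_year(toy_rows)[2013])
```

Splitting combined labels means the type columns can sum to more than total_breaches; counting only exact single-label matches is the alternative design.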

Breaches by month of the year

Having looked specifically at Hacking/IT breaches by month, it is interesting to see how all breaches line up by month. The trends are very similar, with April and May still standing out. However, the dips between April and October are far less pronounced. This suggests that total incidents follow a pattern broadly similar to that of Hacking/IT incidents.
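All breaches far outnumber Hacking/IT incidents, so comparing their monthly shapes is easier on a common scale. One option, sketched here in Python as an illustration rather than the report's method, is to convert each series to the share of its breaches falling in each month. The date column, its format and the "Type of Breach" labels are assumptions; the toy rows are invented.

```python
from collections import Counter
from datetime import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def monthly_shares(rows, breach_type=None, date_column="Breach Submission Date",
                   fmt="%m/%d/%Y"):
    """Fraction of breaches falling in each month, optionally for one breach type.

    Shares put all breaches and a much smaller subset on the same 0-1 scale.
    """
    counts = Counter(
        datetime.strptime(r[date_column], fmt).month
        for r in rows
        if breach_type is None or breach_type in r["Type of Breach"]
    )
    total = sum(counts.values())
    if total == 0:
        return {m: 0.0 for m in MONTHS}
    return {MONTHS[m - 1]: counts.get(m, 0) / total for m in range(1, 13)}


# Toy rows for illustration only
toy_rows = [
    {"Breach Submission Date": "04/01/2015", "Type of Breach": "Theft"},
    {"Breach Submission Date": "04/02/2015", "Type of Breach": "Hacking/IT Incident"},
    {"Breach Submission Date": "05/03/2015", "Type of Breach": "Theft"},
    {"Breach Submission Date": "10/04/2015", "Type of Breach": "Hacking/IT Incident"},
]
all_shares = monthly_shares(toy_rows)
hack_shares = monthly_shares(toy_rows, breach_type="Hacking/IT Incident")
print(all_shares["Apr"], hack_shares["Apr"])  # 0.5 0.5
```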

Unauthorized access breaches by year

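Unauthorized access breaches follow a rough n shape: there were few breaches of this type at the beginning of the period, the count peaked in 2014, and it declined after that. The u_a_d_total column in the type-of-breach table, which appears to count these breaches, shows a dip from 98 in 2014 to 80 in 2015 and a partial rebound to 96 in 2016 before the decline resumes. It may be that the number of breaches of this type came to the attention of authorities and measures were put in place to limit future breaches.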
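Whether a yearly series is cleanly n-shaped can be checked by finding the peak and listing any years that break the rise before it or the fall after it. This is a minimal Python sketch; `year_of` is a hypothetical caller-supplied function, the "Unauthorized Access/Disclosure" label is assumed, and the sample series is the u_a_d_total column from the type-of-breach table, assuming that column counts these breaches.

```python
from collections import Counter


def unauthorized_by_year(rows, year_of):
    """Count Unauthorized Access/Disclosure breaches per year (label assumed)."""
    return Counter(year_of(r) for r in rows
                   if "Unauthorized Access/Disclosure" in r["Type of Breach"])


def shape_report(counts):
    """Peak year, years that fell before the peak, and years that rose after it.

    A clean n shape has no entries in either list.
    """
    years = sorted(counts)
    values = [counts[y] for y in years]
    peak = values.index(max(values))
    dips_before = [years[i] for i in range(1, peak + 1) if values[i] < values[i - 1]]
    rebounds_after = [years[i] for i in range(peak + 1, len(years))
                      if values[i] > values[i - 1]]
    return years[peak], dips_before, rebounds_after


# u_a_d_total column from the type-of-breach table above
u_a_d = {2009: 0, 2010: 10, 2011: 34, 2012: 40, 2013: 73,
         2014: 98, 2015: 80, 2016: 96, 2017: 43, 2018: 1}
print(shape_report(u_a_d))  # (2014, [], [2016])
```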