Introduction

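The dataset lists health data breaches reported to the US Department of Health and Human Services (HHS). The fields used in this report include the name and type of the covered entity, the state, the date, the number of individuals affected, and the type of breach.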

The data is available from the HHS Office for Civil Rights (OCR) breach portal: https://ocrportal.hhs.gov/ocr/breach/breach_report.jsf

In this report, we look at several trends in the data using visualizations and tables.

Affected People per State

This table shows the average number of people affected per breach in each state, for the 20 states with the highest averages. Indiana has by far the highest average, about 1.45 million people per breach. This could be because Indiana has more companies than other states, or because of a single very large breach.

## # A tibble: 20 × 2
##    State AvgAffected
##    <chr>       <dbl>
##  1 IN       1446128.
##  2 WA        221000.
##  3 TN        203147.
##  4 VA        164146.
##  5 NY        143719.
##  6 NJ        115028.
##  7 AZ        106060.
##  8 MT         97042.
##  9 UT         73064.
## 10 PR         53526.
## 11 MD         52675.
## 12 IL         48137.
## 13 NH         48022 
## 14 FL         42868.
## 15 CA         36501.
## 16 AL         36045.
## 17 SC         33289.
## 18 GA         29007.
## 19 TX         25606.
## 20 KY         21757.
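A table like the one above could be produced along these lines. This is a minimal sketch rather than the report's actual code: it assumes dplyr, and it uses a small made-up data frame named `breaches` (a hypothetical name) with the `State` and `Individuals.Affected` columns that appear in this report's tables.

```r
library(dplyr)

# Toy stand-in for the HHS breach data (values are made up for illustration)
breaches <- data.frame(
  State = c("IN", "IN", "WA", "WA", "TN"),
  Individuals.Affected = c(1000, 3000, 500, 1500, 800),
  stringsAsFactors = FALSE
)

# Mean number of people affected per breach, by state, largest first
avg_by_state <- breaches %>%
  group_by(State) %>%
  summarise(AvgAffected = mean(Individuals.Affected, na.rm = TRUE)) %>%
  arrange(desc(AvgAffected)) %>%
  head(20)  # keep at most the top 20 states

print(avg_by_state)
```

Because this is a mean per breach, a state with few breaches and one very large one can rank above a state with many small breaches.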

Dog Days

This table shows the 20 days with the highest average number of people affected per breach. 2015-03-13 had two breaches: the Anthem breach (78,800,000 people) and a second breach that affected 4,000 people, which is why the average for that day is 39,402,000.

## # A tibble: 20 × 2
##    Date       AvgAffected
##    <date>           <dbl>
##  1 2015-03-13   39402000 
##  2 2015-03-17    5501952.
##  3 2015-09-09    5008054.
##  4 2014-08-20    4500000 
##  5 2014-08-21    4500000 
##  6 2015-07-17    4500000 
##  7 2013-08-23    4029530 
##  8 2015-07-23    3900000 
##  9 2016-08-03    3620000 
## 10 2011-11-04    2451380.
## 11 2016-08-09    1739678.
## 12 2010-06-03    1220000 
## 13 2010-11-01    1023209 
## 14 2011-02-11     851125 
## 15 2010-07-19     800000 
## 16 2012-04-11     780000 
## 17 2016-08-12     767280.
## 18 2015-03-12     697586 
## 19 2011-04-14     633927 
## 20 2015-05-20     550253
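Since the column is an average per breach, the top value can be checked by hand. The sketch below assumes the two breaches on 2015-03-13 are the 4,000-person breach mentioned in the text and the Anthem breach of 78,800,000 people listed in the Worst Companies table.

```r
# Sizes of the two breaches on 2015-03-13 (Anthem, plus the 4,000-person breach)
affected_2015_03_13 <- c(78800000, 4000)

# The mean of the two matches the first row of the table
avg_affected <- mean(affected_2015_03_13)
print(avg_affected)  # 39402000
```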

Breaches per Year

This graph shows the number of breaches per year. The number of breaches rises over time and then falls. The decline can likely be attributed to ongoing improvements in security standards.

Worst Companies

This table shows the 25 breaches with the most people affected, along with the covered entity and the type of breach. Anthem is first, with 78.8 million people affected.

##    Individuals.Affected Type.of.Breach                 Name.of.Covered.Entity
##  1             78800000 Hacking/IT Incident            Anthem, Inc. Affiliated Covered Entity
##  2             11000000 Hacking/IT Incident            Premera Blue Cross
##  3             10000000 Hacking/IT Incident            Excellus Health Plan, Inc.
##  4              4900000 Loss                           Science Applications International Corporation (SA
##  5              4500000 Hacking/IT Incident            University of California, Los Angeles Health
##  6              4500000 Hacking/IT Incident            Community Health Systems Professional Services Corporations
##  7              4500000 Theft                          Community Health Systems Professional Services Corporation
##  8              4029530 Theft                          Advocate Health and Hospitals Corporation, d/b/a Advocate Medical Group
##  9              3900000 Hacking/IT Incident            Medical Informatics Engineering
## 10              3620000 Hacking/IT Incident            Banner Health
## 11              3466120 Hacking/IT Incident            Newkirk Products, Inc.
## 12              2213597 Hacking/IT Incident            21st Century Oncology
## 13              2000000 Unauthorized Access/Disclosure Xerox State Healthcare, LLC
## 14              1900000 Unknown                        IBM
## 15              1700000 Theft                          GRM Information Management Services
## 16              1220000 Theft                          AvMed, Inc.
## 17              1100000 Hacking/IT Incident            CareFirst BlueCross BlueShield
## 18              1062509 Hacking/IT Incident            Montana Department of Public Health & Human Services
## 19              1055489 Loss                           The Nemours Foundation
## 20              1023209 Theft                          BlueCross BlueShield of Tennessee, Inc.
## 21               943434 Theft                          Sutter Medical Foundation
## 22               882590 Hacking/IT Incident            Valley Anesthesiology Consultants, Inc. d/b/a Valley Anesthesiology and Pain Consultants
## 23               839711 Theft                          Horizon Healthcare Services, Inc., doing business as Horizon Blue Cross Blue Shield of New Jersey, and its affiliates
## 24               800000 Loss                           Iron Mountain Data Products, Inc. (now known as
## 25               780000 Hacking/IT Incident            Utah Department of Technology Services

States with Most Total Affected

This graph shows the states with the highest total number of people affected. Indiana far exceeds the other states, mostly because of the Anthem breach.

Breaches per Month

Here we see the number of breaches per month. There is no clear difference between months, which suggests that breaches do not follow a seasonal pattern.

Breaches per Covered Entity Type

This table shows the number of breaches for each covered entity type. Healthcare providers had the most breaches by a wide margin: 1,459, compared with 315 for business associates, the next largest group. Two records have no entity type listed.

## # A tibble: 5 × 2
##   Covered.Entity.Type         Breaches
##   <chr>                          <int>
## 1 "Healthcare Provider"           1459
## 2 "Business Associate"             315
## 3 "Health Plan"                    267
## 4 "Healthcare Clearing House"        4
## 5 ""                                 2
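Counting breaches per entity type might look like the following. This is a sketch under assumptions: it uses dplyr and a made-up data frame named `breaches` (a hypothetical name), and only the column name `Covered.Entity.Type` is taken from the table above.

```r
library(dplyr)

# Toy data; the empty string stands for a record with no entity type listed
breaches <- data.frame(
  Covered.Entity.Type = c("Healthcare Provider", "Health Plan",
                          "Healthcare Provider", "Business Associate",
                          "Healthcare Provider", ""),
  stringsAsFactors = FALSE
)

# One row per entity type, with the number of breaches, largest first
by_type <- breaches %>%
  group_by(Covered.Entity.Type) %>%
  summarise(Breaches = n()) %>%
  arrange(desc(Breaches))

print(by_type)
```

Records with a blank type stay in the count here, as they do in the table above. Adding `filter(Covered.Entity.Type != "")` before grouping would drop them.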

Days with the Most Reports

Fridays have the most reports. In the tibble, day 6 (Friday, when the week starts on Sunday) has 617 reports, the most of any day.

## # A tibble: 7 × 2
##   day_of_week Breaches
##         <dbl>    <int>
## 1           6      617
## 2           5      366
## 3           3      339
## 4           4      338
## 5           2      333
## 6           7       32
## 7           1       22
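The day numbering above (1 = Sunday, 6 = Friday) can be reproduced as follows. This is a sketch, not the report's code: the `breaches` data frame and its `Date` column are assumptions, and the day number is derived in base R with `format(Date, "%w")`, which returns 0 for Sunday, so 1 is added.

```r
library(dplyr)

# Toy dates: 2015-03-13 was a Friday, 2015-03-17 a Tuesday, 2014-08-20 a Wednesday
breaches <- data.frame(
  Date = as.Date(c("2015-03-13", "2015-03-13", "2015-03-17", "2014-08-20"))
)

# "%w" gives the weekday as 0 (Sunday) to 6 (Saturday); add 1 so Sunday is 1
by_day <- breaches %>%
  mutate(day_of_week = as.integer(format(Date, "%w")) + 1) %>%
  group_by(day_of_week) %>%
  summarise(Breaches = n()) %>%
  arrange(desc(Breaches))

print(by_day)
```

In the toy data, Friday (day 6) comes first with two breaches.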

Years where Healthcare Provider Breaches are >150, and Business Associates are >50

Based on the table below, there are 4 years (2012 to 2015) in which healthcare provider breaches exceed 150, and 2 years (2013 and 2014) in which business associate breaches exceed 50.

## # A tibble: 15 × 3
## # Groups:   Breach_Year [8]
##    Breach_Year Covered.Entity.Type Breaches
##          <dbl> <chr>                  <int>
##  1        2014 Business Associate        77
##  2        2013 Business Associate        64
##  3        2011 Business Associate        45
##  4        2010 Business Associate        44
##  5        2012 Business Associate        40
##  6        2016 Business Associate        18
##  7        2015 Business Associate        12
##  8        2009 Business Associate         3
##  9        2015 Healthcare Provider      195
## 10        2014 Healthcare Provider      194
## 11        2013 Healthcare Provider      193
## 12        2012 Healthcare Provider      154
## 13        2011 Healthcare Provider      135
## 14        2010 Healthcare Provider      134
## 15        2009 Healthcare Provider       14
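A count per year and entity type, followed by a threshold filter, could be sketched like this. The data frame `breaches` is made up, and the threshold is scaled down to fit the toy data (the report uses >150 and >50). The column names `Breach_Year` and `Covered.Entity.Type` are taken from the table above.

```r
library(dplyr)

# Toy data: one row per breach
breaches <- data.frame(
  Breach_Year = c(2013, 2013, 2013, 2014, 2014, 2014),
  Covered.Entity.Type = c("Healthcare Provider", "Healthcare Provider",
                          "Business Associate", "Healthcare Provider",
                          "Business Associate", "Business Associate"),
  stringsAsFactors = FALSE
)

# Number of breaches for each year and entity type;
# .groups = "drop" returns an ungrouped result
by_year_type <- breaches %>%
  group_by(Breach_Year, Covered.Entity.Type) %>%
  summarise(Breaches = n(), .groups = "drop")

# Years in which healthcare provider breaches exceed the (toy) threshold of 1
provider_years <- by_year_type %>%
  filter(Covered.Entity.Type == "Healthcare Provider", Breaches > 1)

print(provider_years)
```

Counting the rows of the filtered result with `nrow(provider_years)` gives the number of qualifying years directly, which avoids counting them by eye.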

Is the general trend of breaches increasing or decreasing?

To answer this, I use a line chart of the number of breaches per year. The steps are: mutate the date into a year, group by year, summarize into “breachcount” (the number of entries per year), and then plot a line graph with the year on the x-axis and breachcount on the y-axis.
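The steps just described can be sketched as follows. This is an illustration under assumptions, not the original code: the data frame `breaches`, its `Date` column, and the variable name `per_year` are hypothetical, and base R `plot()` stands in for whichever plotting library was actually used.

```r
library(dplyr)

# Toy data: one row per breach, with made-up dates
breaches <- data.frame(
  Date = as.Date(c("2014-08-20", "2014-08-21", "2015-03-13",
                   "2015-03-17", "2015-09-09", "2016-08-03"))
)

# Mutate the date into a year, group by year, and count entries per year
per_year <- breaches %>%
  mutate(year = as.integer(format(Date, "%Y"))) %>%
  group_by(year) %>%
  summarise(breachcount = n())

# Line graph with the year on the x-axis and breachcount on the y-axis
plot(per_year$year, per_year$breachcount, type = "l",
     xlab = "Year", ylab = "Number of breaches")
```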

This line chart shows that the number of breaches increased from 2010 to 2016 and has generally declined since then. The decline could be due to better data protection practices, or to underreporting.