Introduction

This data comes from the US Department of Health and Human Services (HHS) and tracks reported health care information breaches. This analysis will help you better understand who is affected by these data breaches.

Required Packages:
readr
stringr
ggplot2
dplyr
lubridate

Summary Statistics

Which data loss types account for the highest percentage of cases?

## # A tibble: 1 × 7
##   `% Hacking` `% Disposal` `% Loss` `% Theft` `% Disclosure` `% Unknown` % Oth…¹
##         <dbl>        <dbl>    <dbl>     <dbl>          <dbl>       <dbl>   <dbl>
## 1        13.7         3.86     9.19      44.8           27.8       0.761    5.50
## # … with abbreviated variable name ¹​`% Other`

Theft accounts for the largest share of incidents (44.8%), followed by disclosure (27.8%) and hacking (13.7%)
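The percentages above can be reproduced from 0/1 indicator columns, one per breach type. The sketch below is a minimal illustration on made-up rows, not the report's actual code. The column names hackIT, theft and loss follow the indicator columns visible in the printed output further down, and the toy data frame is hypothetical. The printed percentages sum to roughly 105.6, which suggests a single breach can be flagged under more than one type.

```r
library(dplyr)

# Hypothetical toy data: one row per breach, one 0/1 indicator column per type.
toy <- tibble(
  hackIT = c(1, 0, 0, 0),
  theft  = c(0, 1, 1, 0),
  loss   = c(0, 0, 0, 1)
)

# The mean of a 0/1 column is the share of rows flagged,
# so multiplying by 100 gives a percentage.
pct <- toy %>%
  summarise(
    `% Hacking` = mean(hackIT) * 100,
    `% Theft`   = mean(theft) * 100,
    `% Loss`    = mean(loss) * 100
  )

print(pct)
```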

2.1.1

Data breaches per year

Breaches increased until 2015, when we saw a slight decrease

2.1.2

25 Largest Breaches

## # A tibble: 25 × 16
##    Name of Co…¹ State Cover…² Indiv…³ Breach S…⁴ Type …⁵ Locat…⁶ Busin…⁷ Web D…⁸
##    <chr>        <chr> <chr>     <dbl> <date>     <chr>   <chr>   <chr>   <chr>  
##  1 Anthem, Inc… IN    Health…  7.88e7 2015-03-13 Hackin… Networ… No      "On Fe…
##  2 Science App… VA    Busine…  4.9 e6 2011-11-04 Loss    Other   Yes     "\\N"  
##  3 Advocate He… IL    Health…  4.03e6 2013-08-23 Theft   Deskto… No      "Advoc…
##  4 21st Centur… FL    Health…  2.21e6 2016-03-04 Hackin… Networ… No      "Failu…
##  5 Xerox State… TX    Busine…  2   e6 2014-09-10 Unauth… Deskto… Yes     "\\N"  
##  6 IBM          NY    Busine…  1.9 e6 2011-04-14 Unknown Other   Yes     "\\N"  
##  7 GRM Informa… NJ    Busine…  1.7 e6 2011-02-11 Theft   Electr… Yes     "Unenc…
##  8 AvMed, Inc.  FL    Health…  1.22e6 2010-06-03 Theft   Laptop  No      "Two l…
##  9 Montana Dep… MT    Health…  1.06e6 2014-07-07 Hackin… Networ… No      "Monta…
## 10 The Nemours… FL    Health…  1.06e6 2011-10-07 Loss    Other   No      "A loc…
## # … with 15 more rows, 7 more variables: hackIT <dbl>, dispo <dbl>, loss <dbl>,
## #   theft <dbl>, unauth <dbl>, unknown <dbl>, other <dbl>, and abbreviated
## #   variable names ¹​`Name of Covered Entity`, ²​`Covered Entity Type`,
## #   ³​`Individuals Affected`, ⁴​`Breach Submission Date`, ⁵​`Type of Breach`,
## #   ⁶​`Location of Breached Information`, ⁷​`Business Associate Present`,
## #   ⁸​`Web Description`

2.1.3

10 Most Affected States

Indiana has the largest number of affected individuals

2.1.4

Hacking Incidents by Month

There is no consistent trend in which months have the most hacking incidents

2.1.5

Hacking Incidents by Covered Entity Type

## # A tibble: 4 × 2
##   `Covered Entity Type`     Breaches
##   <chr>                        <int>
## 1 Business Associate             285
## 2 Health Plan                    200
## 3 Healthcare Clearing House        4
## 4 Healthcare Provider           1220

Healthcare Providers account for the most breaches (1,220 of 1,709)

2.2.1

Which days have the highest breach incidents?

## # A tibble: 7 × 2
##   day   Breaches
##   <ord>    <int>
## 1 Sun         19
## 2 Mon        286
## 3 Tue        281
## 4 Wed        282
## 5 Thu        300
## 6 Fri        512
## 7 Sat         29

Far fewer breaches are reported on weekends (19 on Sundays, 29 on Saturdays), likely because most businesses are not operating. Monday through Thursday are consistent (281 to 300 each), while Friday has the most reports (512)
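A day-of-week table like the one above can be built with lubridate's wday() and dplyr's count(). This is a minimal sketch on four made-up submission dates. The column name `Breach Submission Date` comes from the printed output, while the toy data and the by_day name are assumptions for illustration.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: four submission dates.
toy <- tibble(
  `Breach Submission Date` = as.Date(
    c("2015-03-13", "2011-11-04", "2013-08-23", "2014-07-07")
  )
)

# wday(label = TRUE) returns an ordered factor of weekday names,
# which keeps the rows in calendar order after counting.
by_day <- toy %>%
  mutate(day = wday(`Breach Submission Date`, label = TRUE)) %>%
  count(day, name = "Breaches")

print(by_day)
```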

2.2.2

In which year (or years) were there at least 50 breaches from a ‘Business Associate’ covered entity type and at least 150 breaches from a healthcare provider covered entity type?

## # A tibble: 2 × 3
##    year   biz  care
##   <dbl> <int> <int>
## 1  2013    64   187
## 2  2014    67   179

2013 and 2014 were the only years with at least 50 Business Associate breaches and at least 150 Healthcare Provider breaches
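The year filter can be sketched as a grouped count followed by a filter on both counts. The example below uses made-up rows and toy thresholds of 1 and 2 in place of the report's 50 and 150. The biz and care names follow the printed output, and everything else is hypothetical.

```r
library(dplyr)

# Hypothetical toy data: one row per breach.
toy <- tibble(
  year = c(2013, 2013, 2013, 2014, 2014),
  `Covered Entity Type` = c(
    "Business Associate", "Healthcare Provider", "Healthcare Provider",
    "Business Associate", "Health Plan"
  )
)

# Count each entity type per year, then keep years meeting both thresholds.
qualifying <- toy %>%
  group_by(year) %>%
  summarise(
    biz  = sum(`Covered Entity Type` == "Business Associate"),
    care = sum(`Covered Entity Type` == "Healthcare Provider")
  ) %>%
  filter(biz >= 1, care >= 2)  # toy thresholds; the report uses 50 and 150

print(qualifying)
```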

2.2.3

How has the type of breach (hacking, improper disposal, loss, etc.) changed for each year?

## # A tibble: 1,709 × 9
## # Groups:   year [10]
##    `Breach Submission Date` hackIT dispo  loss theft unauth unknown other  year
##    <date>                    <dbl> <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl>
##  1 2018-03-12                    0     0     0     0      1       0     0  2018
##  2 2017-12-29                    0     0     0     1      0       0     0  2017
##  3 2017-12-21                    0     0     0     0      1       0     0  2017
##  4 2017-12-05                    0     0     0     0      1       0     0  2017
##  5 2017-12-05                    1     0     0     0      0       0     0  2017
##  6 2017-11-13                    0     1     0     0      0       0     0  2017
##  7 2017-10-24                    0     0     0     1      0       0     0  2017
##  8 2017-10-23                    0     0     0     0      1       0     0  2017
##  9 2017-10-20                    0     0     0     0      1       0     0  2017
## 10 2017-10-20                    0     0     1     0      0       0     0  2017
## # … with 1,699 more rows
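The output above lists one row per breach, so the yearly trend is easier to read after summing each indicator column within a year. The following is a minimal sketch of that aggregation on made-up rows, assuming the 0/1 indicator columns shown above. Only two of the seven type columns are included to keep it short.

```r
library(dplyr)

# Hypothetical toy data: a year plus two 0/1 breach-type indicators.
toy <- tibble(
  year   = c(2016, 2016, 2017, 2017, 2017),
  hackIT = c(1, 0, 1, 1, 0),
  theft  = c(0, 1, 0, 0, 1)
)

# Sum each indicator within a year to get per-type counts per year.
by_year <- toy %>%
  group_by(year) %>%
  summarise(across(c(hackIT, theft), sum))

print(by_year)
```

Comparing the rows of by_year then shows how each type's count moves from one year to the next.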

3.1

a.) How many breaches involve a laptop when a business associate is present? I am asking this to understand how mishandling company laptops can lead to a data breach

## # A tibble: 1 × 1
##   AssociatePresent
##              <int>
## 1               52

A business associate was present in 52 breaches that occurred through a laptop. Did these employees face criminal charges?
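A count like this can be obtained by filtering on both conditions and counting the remaining rows. The sketch below uses made-up rows. The column names come from the printed output, and matching "Laptop" with str_detect() is an assumption that the location field can list several locations in one string.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data: breach location and business associate flag.
toy <- tibble(
  `Location of Breached Information` = c(
    "Laptop", "Network Server", "Laptop, Email", "Laptop"
  ),
  `Business Associate Present` = c("Yes", "Yes", "No", "No")
)

# Keep rows whose location mentions a laptop and where an associate was present.
laptop_ba <- toy %>%
  filter(
    str_detect(`Location of Breached Information`, "Laptop"),
    `Business Associate Present` == "Yes"
  ) %>%
  summarise(AssociatePresent = n())

print(laptop_ba)
```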

b.) Which of these words regarding legal action appear most in the web description? I am curious to know if sanctions occur and how they talk about protected documents

## # A tibble: 1 × 3
##   sanctions counsel protected
##       <int>   <int>     <int>
## 1       173      40      1006

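"Protected" appears most often (1,006), followed by "sanctions" (173) and "counsel" (40). A mention of sanctions in a description does not by itself confirm that sanctions were imposed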
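One way to produce keyword counts like these is to test each description for the word and sum the matches. The sketch below counts descriptions that contain a word at least once, not total occurrences. Whether the report counted the same way is an assumption, and the toy descriptions are made up.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data: three web descriptions, one of them the "\N" placeholder.
toy <- tibble(
  `Web Description` = c(
    "The covered entity applied sanctions and retrained staff.",
    "Protected health information was stored on an unencrypted laptop.",
    "\\N"
  )
)

# Lowercase first so that "Protected" and "protected" both match.
word_counts <- toy %>%
  mutate(desc = str_to_lower(`Web Description`)) %>%
  summarise(
    sanctions = sum(str_detect(desc, "sanctions")),
    counsel   = sum(str_detect(desc, "counsel")),
    protected = sum(str_detect(desc, "protected"))
  )

print(word_counts)
```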