Purpose of Analysis: The purpose of this analysis is to explore and summarize data on breaches of unsecured protected health information reported by the Office for Civil Rights (OCR).
Explanation of Data: The data used in this analysis is collected by the OCR, which is responsible for collecting and reporting disclosures of protected health information (PHI) as mandated by law. In particular, the OCR publishes cases in which a covered entity (CE) has a breach that affects 500 or more individuals. The data reported for each breach includes State, Covered Entity Type, Breach Submission Date, Type of Breach, and other fields.
Proposed Approach/Analytical Technique: I will use this data to create visualizations that give a more in-depth understanding of the types of breach, the number of individuals affected, and the location of the breached information.
How will my analysis help? It will allow other users to identify trends in protected health information (PHI) data breaches and draw insights from them.
The packages that will be used during this analysis include:
dplyr: Provides a flexible grammar of data manipulation. It is similar to SQL in that it filters and reshapes datasets so the data is easy to use in R.
tidyverse: Makes it easy to install and load multiple tidyverse packages in a single step.
ggplot2: Lets you map variables to aesthetics, choose which graphical primitives to use, and plot data using different visualizations.
DT: Creates an HTML widget that displays data from your dataset using the JavaScript library DataTables.
skimr: Provides an alternative to the default summary functions in R, with human-readable output.
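A minimal sketch of loading these packages, assuming they are installed from CRAN. The `requireNamespace()` check and the names `pkgs` and `available` are added here for illustration and are not part of the original analysis.

```r
# Packages named in the list above; tidyverse also attaches dplyr and ggplot2.
pkgs <- c("tidyverse", "DT", "skimr")

# requireNamespace() returns TRUE/FALSE instead of stopping with an error,
# so missing packages can be reported before anything is attached.
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

if (any(!available)) {
  message("Not installed: ", paste(pkgs[!available], collapse = ", "))
}

# Attach only the packages that were found.
for (p in pkgs[available]) {
  library(p, character.only = TRUE)
}
```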
## -- Attaching packages -------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.0 v purrr 0.3.3
## v tibble 3.0.0 v dplyr 0.8.5
## v tidyr 1.0.0 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts ----------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Importing Data Sets and Creating the Table:
## Parsed with column specification:
## cols(
## `Name of Covered Entity` = col_character(),
## State = col_character(),
## `Covered Entity Type` = col_character(),
## `Individuals Affected` = col_double(),
## `Breach Submission Date` = col_character(),
## `Type of Breach` = col_character(),
## `Location of Breached Information` = col_character(),
## `Business Associate Present` = col_character(),
## `Web Description` = col_character()
## )
## Parsed with column specification:
## cols(
## `Name of Covered Entity` = col_character(),
## State = col_character(),
## `Covered Entity Type` = col_character(),
## `Individuals Affected` = col_double(),
## `Breach Submission Date` = col_character(),
## `Type of Breach` = col_character(),
## `Location of Breached Information` = col_character(),
## `Business Associate Present` = col_character(),
## `Web Description` = col_logical()
## )
## Name of Covered Entity State
## 0 3
## Covered Entity Type Individuals Affected
## 3 1
## Breach Submission Date Type of Breach
## 0 1
## Location of Breached Information Business Associate Present
## 0 0
## Web Description status
## 742 0
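The two column specifications above suggest that two files were read, and the `status` column in the counts suggests they were stacked with a flag. A minimal sketch of that step and of the per-column missing-value count follows. It uses small made-up data frames in place of the real files; the names `archived`, `current`, and `breaches`, all values, and the assignment of `status` to each file are assumptions.

```r
library(dplyr)

# Made-up stand-ins for the two imported files. In the real analysis these
# would come from reading the downloaded files (for example with readr::read_csv()).
archived <- tibble(
  `Name of Covered Entity` = c("Entity A", "Entity B"),
  State = c("CA", NA),
  `Individuals Affected` = c(1200, 560)
)
current <- tibble(
  `Name of Covered Entity` = "Entity C",
  State = "TX",
  `Individuals Affected` = NA_real_
)

# Flag each source (assumed: 0 = not under investigation, 1 = under
# investigation) and stack the two tables into one.
breaches <- bind_rows(
  mutate(archived, status = 0),
  mutate(current, status = 1)
)

# Count missing values per column.
na_counts <- colSums(is.na(breaches))
print(na_counts)
```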
2a. Removing Missing Values
2b. Removing Duplicates
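A minimal sketch of these two cleaning steps with dplyr. The source does not say which columns were checked for missing values, and later tables still contain `NA` entries, so dropping rows only when `Individuals Affected` is missing is a guess. The sample data and the name `cleaned` are hypothetical.

```r
library(dplyr)

# Hypothetical sample with one missing value and one exact duplicate row.
breaches <- tibble(
  `Name of Covered Entity` = c("Entity A", "Entity A", "Entity B", "Entity C"),
  `Individuals Affected` = c(1200, 1200, NA, 560),
  `Type of Breach` = c("Theft", "Theft", "Loss", "Other")
)

cleaned <- breaches %>%
  # 2a. Drop rows where the chosen key column is missing.
  filter(!is.na(`Individuals Affected`)) %>%
  # 2b. Keep one copy of each fully identical row.
  distinct()

print(cleaned)
```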
Name of Covered Entity: Organization responsible for the PHI
State: US state where the breach was reported
Covered Entity Type: Type of organization responsible for the PHI
Individuals Affected: Number of records affected by the breach
Breach Submission Date: Date the breach was reported by the CE
Type of Breach: How unauthorized access to the PHI was obtained
Location of Breached Information: Where the PHI was when unauthorized access was obtained
Business Associate Present: Whether a business associate, such as a consultant or contractor, was involved in the breach
Web Description: An optional statement explaining what happened and the resolution
Status: 1 if currently under investigation, 0 if not currently under investigation
Year: Year of the reported breach
NA = Data that is missing
There are a total of 2,402 observations in this data.
Number of Individuals Affected Summary
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 500 988 2286 76680 7920 78800000
Minimum Number of Individuals Affected: 500
Average Number of Individuals Affected: 76,680
Maximum Number of Individuals Affected: 78,800,000
Number of breaches that are under investigation (1) or not (0)
##
## 0 1
## 2004 398
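Output of the shape shown in the two summaries above can be produced with base R's `summary()` and `table()`. The sketch below runs them on made-up vectors; the values and the variable names are hypothetical, and the real inputs would be the `Individuals Affected` and `status` columns.

```r
# Hypothetical values standing in for the real columns.
individuals_affected <- c(500, 900, 2300, 8000, 150000)
status <- c(0, 0, 1, 0, 1)

# Minimum, quartiles, median, mean, and maximum in one call.
size_summary <- summary(individuals_affected)
print(size_summary)

# Number of breaches per status value
# (0 = not under investigation, 1 = under investigation).
status_counts <- table(status)
print(status_counts)
```

With a few very large breaches in the data, the mean sits far above the median, which is why the report shows both.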
Mean number of individuals affected, by type of breach
## # A tibble: 8 x 2
## `Type of Breach` mean
## <fct> <dbl>
## 1 Hacking/IT Incident 276568.
## 2 Improper Disposal 17784.
## 3 Loss 51767.
## 4 Multiple 12846.
## 5 Other 12268.
## 6 Theft 24645.
## 7 Unauthorized Access/Disclosure 13215.
## 8 Unknown 191669.
Hacking/IT Incident had the highest mean number of individuals affected (about 276,568), followed by Unknown (about 191,669).
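A grouped mean like the one in the tibble above can be sketched with `group_by()` and `summarise()`. The sample rows are made up, and the final `arrange()` is an addition for readability (the output above is ordered by factor level, not by mean).

```r
library(dplyr)

# Hypothetical sample rows; values are illustrative only.
breaches <- tibble(
  `Type of Breach` = factor(c("Theft", "Theft", "Hacking/IT Incident",
                              "Hacking/IT Incident", "Loss")),
  `Individuals Affected` = c(1000, 3000, 50000, 150000, 2000)
)

mean_by_type <- breaches %>%
  group_by(`Type of Breach`) %>%
  # na.rm = TRUE skips rows where the count is missing.
  summarise(mean = mean(`Individuals Affected`, na.rm = TRUE)) %>%
  # Largest mean first, so the top breach type is easy to read off.
  arrange(desc(mean))

print(mean_by_type)
```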
## Question 2: “Average Healthcare Data Breach Size by Year” (with the top 5% of outliers omitted)
| Name of Covered Entity | State | Individuals Affected | Status | Year |
|---|---|---|---|---|
| University of California, Los Angeles Health | CA | 4500000 | 0 | 2015 |
| Advocate Health and Hospitals Corporation, d/b/a Advocate Medical Group | IL | 4029530 | 0 | 2013 |
| Banner Health | AZ | 3620000 | 0 | 2016 |
| 21st Century Oncology | FL | 2213597 | 0 | 2016 |
| The Nemours Foundation | FL | 1055489 | 0 | 2011 |
| Sutter Medical Foundation | AL | 943434 | 0 | 2011 |
| Valley Anesthesiology Consultants, Inc. d/b/a Valley Anesthesiology and Pain Consultants | AZ | 882590 | 0 | 2016 |
| County of Los Angeles Departments of Health and Mental Health | CA | 749017 | 0 | 2016 |
| AHMC Healthcare Inc. and affiliated Hospitals | CA | 729000 | 0 | 2013 |
| Commonwealth Health Corporation | KY | 697800 | 0 | 2017 |
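
The heading for this question mentions omitting the top 5% of outliers before averaging by year. One way to do that, sketched here on made-up data, is to cut at the 95th percentile of `Individuals Affected` with `quantile()`. The date format (`%m/%d/%Y`), the exact cutoff rule, and the sample values are assumptions rather than details taken from the original analysis.

```r
library(dplyr)

# Hypothetical sample: 20 breaches, one of them extremely large.
breaches <- tibble(
  `Breach Submission Date` = c(rep("03/15/2015", 10), rep("07/01/2016", 10)),
  `Individuals Affected` = c(seq(500, 5000, by = 500),
                             seq(1000, 9000, by = 1000), 5000000)
)

# Values above the 95th percentile are treated as outliers.
cutoff <- quantile(breaches$`Individuals Affected`, 0.95, na.rm = TRUE)

mean_by_year <- breaches %>%
  # Derive the year from the character date column.
  mutate(Year = as.integer(format(
    as.Date(`Breach Submission Date`, format = "%m/%d/%Y"), "%Y"
  ))) %>%
  # Keep only breaches at or below the cutoff.
  filter(`Individuals Affected` <= cutoff) %>%
  group_by(Year) %>%
  summarise(mean_affected = mean(`Individuals Affected`))

print(mean_by_year)
```

In this sample only the 5,000,000-record breach is dropped, so the 2016 mean reflects the remaining nine breaches rather than the single outlier.
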
| Covered Entity Type | Number of Breaches |
|---|---|
| Business Associate | 342 |
| Health Plan | 317 |
| Healthcare Clearing House | 4 |
| Healthcare Provider | 1736 |
| NA | 3 |
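
A count per covered entity type, as in the table above, can be sketched with dplyr's `count()`. The sample rows are hypothetical.

```r
library(dplyr)

# Hypothetical sample rows; one entity type is missing (NA).
breaches <- tibble(
  `Covered Entity Type` = c("Healthcare Provider", "Health Plan",
                            "Healthcare Provider", "Business Associate", NA)
)

# count() keeps NA as its own group, which is how an NA row
# can end up in a table like the one above.
type_counts <- breaches %>%
  count(`Covered Entity Type`, name = "Number of Breaches")

print(type_counts)
```
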
Theft affected many individuals whether or not a business associate was present. For each type of breach, however, more individuals were affected when no business associate was present than when one was.
| State | Number Affected | Current Cases |
|---|---|---|
| AK | 75785 | 2 |
| AL | 1136366 | 3 |
| AR | 488643 | 6 |
| AZ | 4792005 | 7 |
| CA | 9950736 | 30 |
| CO | 283189 | 10 |
| CT | 310031 | 5 |
| DC | 40441 | 1 |
| DE | 49638 | 2 |
| FL | 6768288 | 18 |
| GA | 3021254 | 10 |
| HI | 54462 | 1 |
| IA | 1518108 | 6 |
| ID | 19786 | 1 |
| IL | 4847583 | 22 |
| IN | 84080000 | 13 |
| KS | 230376 | 9 |
| KY | 1042879 | 7 |
| LA | 160865 | 2 |
| MA | 405578 | 18 |
| MD | 2777562 | 7 |
| ME | 10063 | 1 |
| MI | 998365 | 20 |
| MN | 390000 | 10 |
| MO | 821007 | 17 |
| MS | 151580 | 2 |
| MT | 1174195 | 2 |
| NC | 518155 | 7 |
| ND | 17515 | 2 |
| NE | 194406 | 9 |
| NH | 256420 | 2 |
| NJ | 3263956 | 15 |
| NM | 76867 | 4 |
| NV | 117285 | 5 |
| NY | 17148970 | 20 |
| OH | 859262 | 11 |
| OK | 632455 | 2 |
| OR | 420707 | 8 |
| PA | 1850138 | 10 |
| PR | 1704916 | 0 |
| RI | 103750 | 4 |
| SC | 765657 | 0 |
| SD | 35640 | 1 |
| TN | 6960836 | 11 |
| TX | 4630630 | 23 |
| UT | 895632 | 4 |
| VA | 5926242 | 5 |
| VT | 6806 | 1 |
| WA | 11774508 | 5 |
| WI | 254458 | 14 |
| WV | 82653 | 0 |
| WY | 48532 | 2 |
| NA | 39426 | 1 |
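
A per-state table like this one can be sketched by summing `Individuals Affected` and counting rows with `status == 1` within each state. The sample data and the name `by_state` are hypothetical, and whether the original totals cover all cases or only current ones is not stated in the source.

```r
library(dplyr)

# Hypothetical sample rows; values are illustrative only.
breaches <- tibble(
  State = c("CA", "CA", "IN", "TX", "TX"),
  `Individuals Affected` = c(1000, 4000, 90000, 2500, 700),
  status = c(0, 1, 1, 0, 0)
)

by_state <- breaches %>%
  group_by(State) %>%
  summarise(
    # Total individuals affected across the state's breaches.
    `Number Affected` = sum(`Individuals Affected`, na.rm = TRUE),
    # Number of breaches still under investigation.
    `Current Cases` = sum(status == 1)
  )

print(by_state)
```
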
Indiana has by far the most individuals affected by current cases, because it has the largest single current case: the Anthem, Inc. Affiliated Covered Entity reported that about 78,800,000 individuals were affected. Otherwise, larger states usually have more individuals affected and smaller states fewer, although the table shows exceptions (for example, WA has 11,774,508 against 4,630,630 for TX).
By location of breached information, Paper/Films affected the most individuals. Regardless of location, breaches without a business associate present affected more individuals than breaches with one present.
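A comparison like this one can be sketched as a grouped bar chart with ggplot2. The sample data, the `Yes`/`No` coding of `Business Associate Present`, and the chart type are assumptions; the original plot is not shown in the source.

```r
library(dplyr)
library(ggplot2)

# Hypothetical sample rows; values are illustrative only.
breaches <- tibble(
  `Location of Breached Information` = c("Paper/Films", "Paper/Films",
                                         "Email", "Email", "Laptop"),
  `Business Associate Present` = c("No", "Yes", "No", "Yes", "No"),
  `Individuals Affected` = c(9000, 3000, 4000, 1500, 2500)
)

# Total individuals affected for each location and business-associate flag.
totals <- breaches %>%
  group_by(`Location of Breached Information`, `Business Associate Present`) %>%
  summarise(total = sum(`Individuals Affected`)) %>%
  ungroup()

# Side-by-side bars: one bar per flag value within each location.
p <- ggplot(totals, aes(x = `Location of Breached Information`,
                        y = total,
                        fill = `Business Associate Present`)) +
  geom_col(position = "dodge") +
  labs(x = "Location of breached information",
       y = "Individuals affected",
       fill = "Business associate present")

print(p)
```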