#The purpose is to investigate data breaches reported to the Office for Civil Rights in the US Department of Health and Human Services (HHS).
#We can analyze the cases that have been completed and also look at the cases that are still under investigation.
#The data identify the company that committed a breach and the type of entity or industry the company works in.
#The data cover both completed cases and cases currently under investigation by the HHS Office for Civil Rights. Each record gives the state the breach happened in, the entity type, how many individuals were affected, when the breach was submitted, the type of breach, the location of the breached information, whether a business associate was involved, and the web description.
#By using visualization, data wrangling, and comparative analysis, we can compare the cases under investigation to the completed cases and find similarities among these HHS breaches. This analysis will let consumers know about data breaches that most people were not aware of and what to look out for in the future.
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1 ✓ purrr 0.3.3
## ✓ tibble 2.1.3 ✓ dplyr 0.8.4
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.4.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
##
## Attaching package: 'dbplyr'
## The following objects are masked from 'package:dplyr':
##
## ident, sql
## Loading required package: gsubfn
## Loading required package: proto
## Could not load tcltk. Will use slower R code instead.
## Loading required package: RSQLite
#Data Source
HHS_completed <- read_csv("http://asayanalytics.com/breach_archive_csv")
## Parsed with column specification:
## cols(
## `Name of Covered Entity` = col_character(),
## State = col_character(),
## `Covered Entity Type` = col_character(),
## `Individuals Affected` = col_double(),
## `Breach Submission Date` = col_character(),
## `Type of Breach` = col_character(),
## `Location of Breached Information` = col_character(),
## `Business Associate Present` = col_character(),
## `Web Description` = col_character()
## )
HHS_investigating <- read_csv("https://asayanalytics.com/breach_investigation_csv")
## Parsed with column specification:
## cols(
## `Name of Covered Entity` = col_character(),
## State = col_character(),
## `Covered Entity Type` = col_character(),
## `Individuals Affected` = col_double(),
## `Breach Submission Date` = col_character(),
## `Type of Breach` = col_character(),
## `Location of Breached Information` = col_character(),
## `Business Associate Present` = col_character(),
## `Web Description` = col_logical()
## )
#DataWrangling
#I added a column indicating whether each case is still under investigation. I then combined the two data frames into one, since they share the same column headers, and removed the duplicated Anthem record. I also split both Location of Breached Information and Type of Breach into individual columns to get a better representation of the data.
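#A minimal sketch of the wrangling steps described above, run on toy data rather than the real files. The tibbles `completed` and `investigating`, the "Yes"/"No" values of `Complete`, and the use of `grepl()` to build the indicator columns are assumptions for illustration; the original wrangling code is not shown in this report.

```r
library(dplyr)

# Toy stand-ins for the two source files (illustrative values, not real data).
completed <- tibble(
  `Name of Covered Entity` = c("Clinic A", "Plan B", "Plan B"),
  `Type of Breach` = c("Theft, Loss", "Hacking/IT Incident", "Hacking/IT Incident")
)
investigating <- tibble(
  `Name of Covered Entity` = "Hospital C",
  `Type of Breach` = "Theft"
)

combined <- bind_rows(
  mutate(completed, Complete = "Yes"),      # assumed flag values
  mutate(investigating, Complete = "No")
) %>%
  distinct() %>%                            # drop exact duplicate rows
  mutate(
    # One logical indicator column per breach type of interest.
    Theft = grepl("Theft", `Type of Breach`, fixed = TRUE),
    Loss  = grepl("Loss", `Type of Breach`, fixed = TRUE)
  )
```

#Adding the flag before `bind_rows()` keeps track of which file each row came from after the two data frames are stacked.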
#DataDictionary
• Name of the covered entity (organization responsible for the PHI)
• State (US state where the breach was reported)
• Covered Entity Type (type of organization responsible for the PHI)
• Individuals Affected (number of records affected by the breach)
• Breach submission date (date the breach was reported by the CE)
• Type of breach (how unauthorized access to the PHI was obtained)
• Location of breached information (where the PHI was when unauthorized access was obtained)
• Business associate present (whether a business associate such as a consultant or contractor was involved in the breach)
• Web description (an optional statement explaining what happened and the resolution)
• Hackingor IT (was it a hacking or IT breach)
• Improperdisposal (was it an improper disposal breach)
• Loss (was it a loss breach)
• Theft (was it a theft breach)
• Unauthorizedaccessordisclosure (was it an unauthorized access/disclosure breach)
• Unknowed (was it an unknown breach)
• Other (was it another type of breach)
• desktop (was it a desktop computer location breach)
• electronicmedicalrecord (was it an electronic medical record location breach)
• Email (was it an email location breach)
• network (was it a network server location breach)
• otherportaleelectronics (was it an other portable electronics location breach)
• Paperorfilm (was it a paper or film location breach)
• otherslocations (was it another location breach)
#4.1
• Chart: “Number of Reported Breaches” (with the top 5% of outliers omitted)
• Chart: “Average Healthcare Data Breach Size by Year” (with the top 5% of outliers omitted)
• Table: “Largest healthcare data breaches” (including all breaches under investigation in 2017-18)
• Chart: “Hacking / IT Incidents by year”
• Table: “Breaches by Entity Type”
#I am curious why there were so many breaches between 2013 and 2017. I’m glad to see the number went down in 2018, and I’m curious what the 2019 results would be; hopefully the decline continued.
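#A hedged sketch of how breaches per year could be counted with the top 5% of outliers omitted. The toy values, the `%m/%d/%Y` date format, and the choice to define outliers by the 95th percentile of Individuals Affected are assumptions; the report does not show the code behind the chart.

```r
library(dplyr)

# Toy data (illustrative only).
breaches <- tibble(
  `Breach Submission Date` = c("01/15/2016", "03/02/2016", "07/21/2017",
                               "11/30/2017", "05/05/2018"),
  `Individuals Affected` = c(500, 1200, 800, 5000000, 650)
)

# 95th percentile of breach size; rows above it are treated as outliers.
cutoff <- quantile(breaches$`Individuals Affected`, 0.95, na.rm = TRUE)

per_year <- breaches %>%
  filter(`Individuals Affected` <= cutoff) %>%
  mutate(
    # Assumed date format; adjust if the source uses a different one.
    Onlyyear = format(as.Date(`Breach Submission Date`, format = "%m/%d/%Y"), "%Y")
  ) %>%
  count(Onlyyear, name = "Count")
```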
## # A tibble: 1 x 1
## Percentage
## <dbl>
## 1 2069198.
#I was surprised to see that health care is still heavily affected even when the top 5% of outliers are excluded. I didn’t realize how many individuals were affected.
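#As an illustration of the average breach size by year, the sketch below trims the top 5% of breaches and then averages within each year. The toy values and the result column name `Average` are hypothetical; only `Onlyyear` and Individuals Affected come from this report.

```r
library(dplyr)

# Toy data (illustrative only).
breaches <- tibble(
  Onlyyear = c(2016, 2016, 2017, 2017, 2018),
  `Individuals Affected` = c(500, 1500, 800, 5000000, 650)
)

avg_by_year <- breaches %>%
  # Keep rows at or below the 95th percentile of breach size.
  filter(`Individuals Affected` <=
           quantile(`Individuals Affected`, 0.95, na.rm = TRUE)) %>%
  group_by(Onlyyear) %>%
  summarise(Average = mean(`Individuals Affected`))
```

#Trimming before averaging matters because a single very large breach can dominate the mean for its year.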
#Largest healthcare data breaches
| companyname | Department | Complete | Onlyyear | Databreach |
|---|---|---|---|---|
| Anthem, Inc. Affiliated Covered Entity | Health Plan | No | 2015 | 78800000 |
| Premera Blue Cross | Health Plan | No | 2015 | 11000000 |
| Excellus Health Plan, Inc. | Health Plan | No | 2015 | 10000000 |
| Science Applications International Corporation (SA | Business Associate | No | 2011 | 4900000 |
| University of California, Los Angeles Health | Healthcare Provider | No | 2015 | 4500000 |
| Community Health Systems Professional Services Corporations | Business Associate | No | 2014 | 4500000 |
| Community Health Systems Professional Services Corporation | Business Associate | No | 2014 | 4500000 |
| Advocate Health and Hospitals Corporation, d/b/a Advocate Medical Group | Healthcare Provider | No | 2013 | 4029530 |
| Medical Informatics Engineering | Business Associate | No | 2015 | 3900000 |
| Banner Health | Healthcare Provider | No | 2016 | 3620000 |
| Newkirk Products, Inc. | Business Associate | No | 2016 | 3466120 |
| 21st Century Oncology | Healthcare Provider | No | 2016 | 2213597 |
| Xerox State Healthcare, LLC | Business Associate | No | 2014 | 2000000 |
| IBM | Business Associate | No | 2011 | 1900000 |
| GRM Information Management Services | Business Associate | No | 2011 | 1700000 |
| AvMed, Inc. | Health Plan | No | 2010 | 1220000 |
| CareFirst BlueCross BlueShield | Health Plan | No | 2015 | 1100000 |
| Montana Department of Public Health & Human Services | Health Plan | No | 2014 | 1062509 |
| The Nemours Foundation | Healthcare Provider | No | 2011 | 1055489 |
| BlueCross BlueShield of Tennessee, Inc. | Health Plan | No | 2010 | 1023209 |
| Sutter Medical Foundation | Healthcare Provider | No | 2011 | 943434 |
| Valley Anesthesiology Consultants, Inc. d/b/a Valley Anesthesiology and Pain Consultants | Healthcare Provider | No | 2016 | 882590 |
| Horizon Healthcare Services, Inc., doing business as Horizon Blue Cross Blue Shield of New Jersey, and its affiliates | Business Associate | No | 2014 | 839711 |
| Iron Mountain Data Products, Inc. (now known as | Business Associate | No | 2010 | 800000 |
| Utah Department of Technology Services | Business Associate | No | 2012 | 780000 |

#If I were a consumer doing business with these companies, I would be worried. Some of these are very big providers.
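#A table of the largest breaches like the one above can be produced by sorting on the number of individuals affected and keeping the top rows. This is a sketch on toy data; the entity names and values are made up, and `head(2)` stands in for whatever cutoff the real table used.

```r
library(dplyr)

# Toy data (illustrative only).
breaches <- tibble(
  companyname = c("Entity A", "Entity B", "Entity C", "Entity D"),
  Databreach = c(1200, 78000, 560, 9100)
)

largest <- breaches %>%
  arrange(desc(Databreach)) %>%  # largest breach first
  head(2)                        # keep the top rows
```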
#Hacking/IT incidents show what looks like exponential growth from 2010 to 2017. Luckily, the count declined in 2018.
#Breaches by Entity Type
| Covered Entity Type | Count |
|---|---|
| Business Associate | 355 |
| Health Plan | 325 |
| Healthcare Clearing House | 4 |
| Healthcare Provider | 1767 |

#It’s crazy to see the wide range in the number of breaches across different entity types.

| DayOnly | Count |
|---|---|
| Friday | 767 |
| Thursday | 434 |
| Tuesday | 407 |
| Monday | 394 |
| Wednesday | 384 |
| Saturday | 42 |
| Sunday | 26 |

#I’m actually not surprised that Friday had the most breaches; the one day everyone tries to leave early or not work has the most.

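#A sketch of how the day-of-week counts could be derived from the submission date. The toy dates, the `%m/%d/%Y` format, and the helper vector `day_names` are assumptions. The ISO weekday number (`%u`, Monday = 1) is used instead of `weekdays()` so the day names do not depend on the system locale.

```r
library(dplyr)

# Toy data (illustrative only).
breaches <- tibble(
  `Breach Submission Date` = c("06/01/2018", "06/04/2018", "06/08/2018")
)

# Hypothetical lookup from ISO weekday number to English day name.
day_names <- c("Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday")

by_day <- breaches %>%
  mutate(
    submitted = as.Date(`Breach Submission Date`, format = "%m/%d/%Y"),
    DayOnly = day_names[as.integer(format(submitted, "%u"))]
  ) %>%
  count(DayOnly, name = "Count", sort = TRUE)  # most frequent day first
```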
| BreachType | Onlyyear | Count |
|---|---|---|
| Hacking/IT Incident | 2010 | 8 |
| Hacking/IT Incident | 2011 | 15 |
| Hacking/IT Incident | 2012 | 10 |
| Hacking/IT Incident | 2013 | 27 |
| Hacking/IT Incident | 2014 | 35 |
| Hacking/IT Incident | 2015 | 56 |
| Hacking/IT Incident | 2016 | 113 |
| Hacking/IT Incident | 2017 | 150 |
| Hacking/IT Incident | 2018 | 112 |
| Improper Disposal | 2010 | 8 |
| Improper Disposal | 2011 | 6 |
| Improper Disposal | 2012 | 7 |
| Improper Disposal | 2013 | 12 |
| Improper Disposal | 2014 | 8 |
| Improper Disposal | 2015 | 6 |
| Improper Disposal | 2016 | 7 |
| Improper Disposal | 2017 | 11 |
| Improper Disposal | 2018 | 6 |
| Loss | 2009 | 1 |
| Loss | 2010 | 14 |
| Loss | 2011 | 15 |
| Loss | 2012 | 17 |
| Loss | 2013 | 20 |
| Loss | 2014 | 20 |
| Loss | 2015 | 24 |
| Loss | 2016 | 16 |
| Loss | 2017 | 16 |
| Loss | 2018 | 11 |
| Other | 2009 | 2 |
| Other | 2010 | 21 |
| Other | 2011 | 2 |
| Other | 2012 | 13 |
| Other | 2013 | 16 |
| Other | 2014 | 22 |
| Theft | 2009 | 15 |
| Theft | 2010 | 130 |
| Theft | 2011 | 114 |
| Theft | 2012 | 122 |
| Theft | 2013 | 119 |
| Theft | 2014 | 112 |
| Theft | 2015 | 80 |
| Theft | 2016 | 62 |
| Theft | 2017 | 56 |
| Theft | 2018 | 33 |
| Unauthorized Access/Disclosure | 2010 | 10 |
| Unauthorized Access/Disclosure | 2011 | 29 |
| Unauthorized Access/Disclosure | 2012 | 28 |
| Unauthorized Access/Disclosure | 2013 | 65 |
| Unauthorized Access/Disclosure | 2014 | 86 |
| Unauthorized Access/Disclosure | 2015 | 102 |
| Unauthorized Access/Disclosure | 2016 | 129 |
| Unauthorized Access/Disclosure | 2017 | 126 |
| Unauthorized Access/Disclosure | 2018 | 111 |
| Unknown | 2011 | 7 |
| Unknown | 2013 | 2 |
| Unknown | 2014 | 1 |
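
#A hedged sketch of counting breach types by year when one record can list several types. It assumes the types are comma-separated in a single column and uses `tidyr::separate_rows()` to give each type its own row; the report's own approach used indicator columns, so this is an alternative rather than the original method. The toy values are made up.

```r
library(dplyr)
library(tidyr)

# Toy data (illustrative only).
breaches <- tibble(
  Onlyyear = c(2016, 2016, 2017, 2017),
  BreachType = c("Theft, Loss", "Hacking/IT Incident",
                 "Hacking/IT Incident, Theft", "Theft")
)

type_by_year <- breaches %>%
  separate_rows(BreachType, sep = ",\\s*") %>%  # one row per listed type
  count(BreachType, Onlyyear, name = "Count")
```

#With this approach a record that lists two types is counted once under each type, so the counts can sum to more than the number of records.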
#Which healthcare provider has had the most breaches on Wednesdays in the last 4 years?
## Warning: Removed 1 rows containing missing values (geom_point).
#What is the most common breach type among Theft, Hacking, and Improper Disposal for all states in the United States?
| State | Total_Theft | Total_Hacking | Total_Impropperdispodal | Total_Loss |
|---|---|---|---|---|
| CA | 140 | 39 | 5 | 21 |
| TX | 78 | 52 | 10 | 15 |
| NY | 57 | 22 | 3 | 10 |
| FL | 56 | 31 | 3 | 11 |
| IL | 40 | 28 | 2 | 10 |
| PA | 34 | 15 | 2 | 9 |
| IN | 29 | 19 | 4 | 1 |
| PR | 25 | 0 | 0 | 0 |
| TN | 24 | 14 | 4 | 6 |
| WA | 24 | 14 | 2 | 2 |
| OH | 23 | 11 | 6 | 8 |
| GA | 22 | 21 | 3 | 5 |
| AZ | 20 | 12 | 3 | 6 |
| MA | 20 | 17 | 2 | 11 |
| KY | 19 | 11 | 1 | 3 |
| MI | 19 | 17 | 0 | 9 |
| VA | 19 | 7 | 2 | 4 |
| NJ | 16 | 12 | 2 | 2 |
| OR | 16 | 11 | 0 | 1 |
| CO | 15 | 12 | 2 | 3 |
| CT | 15 | 6 | 0 | 3 |
| NC | 14 | 14 | 2 | 6 |
| AL | 13 | 10 | 1 | 1 |
| MO | 13 | 15 | 4 | 0 |
| NM | 13 | 5 | 0 | 0 |
| LA | 11 | 6 | 1 | 2 |
| MD | 11 | 19 | 0 | 2 |
| MN | 11 | 11 | 3 | 6 |
| WI | 10 | 12 | 0 | 2 |
| OK | 9 | 8 | 1 | 3 |
| RI | 9 | 1 | 0 | 1 |
| SC | 8 | 2 | 4 | 1 |
| KS | 7 | 5 | 0 | 3 |
| NE | 7 | 7 | 1 | 1 |
| NV | 7 | 5 | 1 | 1 |
| DC | 5 | 1 | 0 | 1 |
| UT | 5 | 6 | 2 | 1 |
| AK | 4 | 4 | 0 | 0 |
| AR | 4 | 5 | 0 | 1 |
| MS | 4 | 6 | 1 | 2 |
| MT | 4 | 4 | 0 | 2 |
| WV | 4 | 2 | 1 | 1 |
| NH | 3 | 3 | 0 | 0 |
| VT | 3 | 1 | 0 | 0 |
| ID | 2 | 1 | 0 | 1 |
| ND | 2 | 2 | 2 | 0 |
| IA | 1 | 5 | 1 | 3 |
| ME | 1 | 2 | 0 | 0 |
| SD | 1 | 2 | 0 | 1 |
| WY | 1 | 2 | 0 | 1 |
| NA | 1 | 0 | 0 | 1 |
| DE | 0 | 2 | 0 | 0 |
| HI | 0 | 2 | 0 | 1 |
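
#A minimal sketch of how per-state totals like those above could be computed from logical indicator columns. The toy rows and the indicator column names `Theft`, `Hacking`, and `Improperdisposal` are assumptions for illustration.

```r
library(dplyr)

# Toy data (illustrative only): one row per breach, one flag per type.
breaches <- tibble(
  State = c("CA", "CA", "TX", "TX", "TX"),
  Theft = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  Hacking = c(FALSE, TRUE, TRUE, TRUE, FALSE),
  Improperdisposal = c(FALSE, FALSE, FALSE, FALSE, TRUE)
)

by_state <- breaches %>%
  group_by(State) %>%
  summarise(
    Total_Theft = sum(Theft),                      # TRUE counts as 1
    Total_Hacking = sum(Hacking),
    Total_Improperdisposal = sum(Improperdisposal)
  ) %>%
  arrange(desc(Total_Theft))
```

#Summing the totals across all states then shows which breach type is most common overall.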