Introduction

1.1-1.4)

The purpose of this document is to examine information about disclosures (breaches) of protected health information (PHI) as recorded by the US Department of Health and Human Services (HHS). The data is stored in two separate .csv files. One holds breach reports for which an HHS investigation has been completed, and the other holds breach reports that HHS is still investigating. By analyzing this data, I hope to identify trends in these investigations in order to better understand which variables are common among breaches. Knowing these trends can help HHS better prevent breaches in the future.

Packages Required

2.1)

All packages used are loaded upfront.

2.2)

Messages and warnings resulting from loading the packages are suppressed.

2.3)

The following packages are used in this analysis:

dplyr - supports SQL-like data manipulation (filtering, grouping, summarizing), which helps in drawing conclusions from the data.

tidyverse - provides ggplot2 (among other packages), which is used to create visualizations and graphs.

rmarkdown and knitr - allow the analysis to be rendered as a report.

DT and data.table - DT displays data as interactive tables; data.table provides fast manipulation of large tables.

sqldf - allows SQL queries to be run directly on R data frames.
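Here is a minimal sketch of 2.1 and 2.2. It assumes the package names listed above, with dplyr spelled correctly. Every package is loaded in one place, and startup messages and warnings are suppressed. In an R Markdown document, the same effect is often achieved with the knitr chunk options `message = FALSE` and `warning = FALSE`; this sketch does it in code instead. The names `pkgs` and `loaded` are illustrative.

```r
# Packages from 2.3; tidyverse also attaches dplyr and ggplot2.
pkgs <- c("tidyverse", "dplyr", "rmarkdown", "knitr", "DT", "data.table", "sqldf")

# Load each package upfront (2.1) with startup messages and warnings
# suppressed (2.2). require() returns FALSE rather than stopping when a
# package is not installed, so `loaded` records what is available.
loaded <- vapply(pkgs, function(p) {
  suppressWarnings(suppressPackageStartupMessages(
    require(p, character.only = TRUE, quietly = TRUE)
  ))
}, logical(1))

print(loaded)
```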

Data Preparation

3.1)

The .csv files were imported from OneDrive, where they are hosted. The file containing all breach reports for which an HHS investigation has been completed was named ‘complete investigations’. The file containing all breach reports that HHS is still investigating was named ‘ongoing_investigations’.
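The import step might look like the sketch below. The OneDrive links are not given in this document, so the sketch reads two small made-up CSV snippets through `read.csv(text = ...)`; with the real files you would pass their paths or URLs instead. Several details are assumptions:

- The object name `complete_investigations`, written with an underscore.
- That only the completed file has a Web.Description column.
- `stringsAsFactors = TRUE`. The per-value counts in the 3.4 summary suggest the text columns were read as factors, which has not been the read.csv default since R 4.0.0.

```r
# Made-up stand-ins for the two OneDrive exports (column names from the real data).
complete_csv <- 'Name.of.Covered.Entity,State,Individuals.Affected,Web.Description
"Entity A",CA,1200,"Laptop stolen from an employee vehicle."
"Entity B",TX,650,"Misdirected email containing PHI."'

ongoing_csv <- 'Name.of.Covered.Entity,State,Individuals.Affected
"Entity C",FL,3000'

# With the real data: read.csv("<path or URL of the export>", stringsAsFactors = TRUE)
complete_investigations <- read.csv(text = complete_csv, stringsAsFactors = TRUE)
ongoing_investigations  <- read.csv(text = ongoing_csv,  stringsAsFactors = TRUE)

str(complete_investigations)
str(ongoing_investigations)
```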

3.2)

After the two .csv files are imported, they are combined into one dataset using the rbind function. A new column, ‘is_ongoing’, indicates whether each row comes from a completed or an ongoing investigation. If a row's Web.Description value is NA, the investigation is ongoing and ‘is_ongoing’ is ‘TRUE’. Otherwise, the investigation is completed and ‘is_ongoing’ is ‘FALSE’.
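Here is a minimal sketch of the combining step, using small made-up data frames. It assumes the ongoing file has no Web.Description column, so one is added as NA before `rbind()`, which requires both data frames to have the same column names. The `is_ongoing` flag is stored as the strings "TRUE"/"FALSE" to match the character class reported for it in the 3.4 summary.

```r
# Made-up stand-ins for the two imported files.
complete_investigations <- data.frame(
  Name.of.Covered.Entity = c("Entity A", "Entity B"),
  Individuals.Affected   = c(1200, 650),
  Web.Description        = c("Laptop stolen from a vehicle.", "Misdirected email."),
  stringsAsFactors = FALSE
)
ongoing_investigations <- data.frame(
  Name.of.Covered.Entity = "Entity C",
  Individuals.Affected   = 3000,
  stringsAsFactors = FALSE
)

# Assumption: the ongoing file lacks Web.Description, so add it as NA.
# rbind() matches data frame columns by name and needs the same set in both.
ongoing_investigations$Web.Description <- NA_character_

combined <- rbind(complete_investigations, ongoing_investigations)

# An NA description marks an ongoing investigation; as.character() turns
# the logical result into the strings "TRUE"/"FALSE".
combined$is_ongoing <- as.character(is.na(combined$Web.Description))

print(combined[, c("Name.of.Covered.Entity", "is_ongoing")])
```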

3.4)

The following summary gives a general overview of the dataset. For each column, it lists the most frequent values and their counts (or, for numeric columns, the quartiles and mean), along with the number of NA values. This overview is placed before the General Data Cleaning section so that we can understand the data as a whole before any changes are made.

##                      Name.of.Covered.Entity     State     
##  Walgreen Co.                   :  10       CA     : 282  
##  Henry Ford Health System       :   6       TX     : 201  
##  StayWell Health Management, LLC:   6       FL     : 162  
##  Aetna Inc.                     :   4       NY     : 136  
##  Children's Mercy Hospital      :   4       IL     : 120  
##  Clearpoint Design, Inc.        :   4       PA     :  84  
##  (Other)                        :2421       (Other):1470  
##                 Covered.Entity.Type Individuals.Affected Breach.Submission.Date
##                           :   3     Min.   :     500     2/26/2016 :   9       
##  Business Associate       : 355     1st Qu.:     981     4/2/2015  :   8       
##  Health Plan              : 326     Median :    2262     11/29/2012:   7       
##  Healthcare Clearing House:   4     Mean   :  109117     12/6/2013 :   7       
##  Healthcare Provider      :1767     3rd Qu.:    7796     4/25/2014 :   7       
##                                     Max.   :78800000     11/5/2012 :   6       
##                                     NA's   :1            (Other)   :2411       
##                         Type.of.Breach Location.of.Breached.Information
##  Theft                         :843    Paper/Films     :531            
##  Unauthorized Access/Disclosure:686    Network Server  :395            
##  Hacking/IT Incident           :527    Laptop          :323            
##  Loss                          :154    Email           :309            
##  Other                         : 76    Other           :208            
##  Improper Disposal             : 71    Desktop Computer:161            
##  (Other)                       : 98    (Other)         :528            
##  Business.Associate.Present
##  No :1956                  
##  Yes: 499                  
##                            
##                            
##                            
##                            
##                            
##                            Web.Description
##  \\N                                                    : 532
##                                                         : 336
##  Triple-S Management Corporation (“TRIPLE-S”), ...      :   3
##  A bag containing a compact disk - read only memory ... :   2
##  Advocate Health Care Network (Advocate) has agreed ... :   2
##  (Other)                                                :1174
##  NA's                                                   : 406
##   is_ongoing       
##  Length:2455       
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
## 
3.3)

General Data Cleaning

Missing Values

The output below counts 407 NA values in total. As the per-column counts show, 406 of them are in the Web.Description column and the remaining one is in Individuals.Affected. This is fine for our analysis, because an NA in Web.Description shows which rows originally came from which file, as mentioned above.

## [1] 407
##           Name.of.Covered.Entity                            State 
##                                0                                0 
##              Covered.Entity.Type             Individuals.Affected 
##                                0                                1 
##           Breach.Submission.Date                   Type.of.Breach 
##                                0                                0 
## Location.of.Breached.Information       Business.Associate.Present 
##                                0                                0 
##                  Web.Description                       is_ongoing 
##                              406                                0
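
Counts like these come from summing is.na() over the whole data frame and over each column. A minimal base R sketch, assuming a small hypothetical data frame `toy` in place of the real data (the column names are taken from the output above; the values are invented):

```r
# Hypothetical toy data standing in for the combined dataset (values invented)
toy <- data.frame(
  Name.of.Covered.Entity = c("Clinic A", "Clinic B", "Plan C"),
  Individuals.Affected   = c(627L, NA, 6550L),
  Web.Description        = c(NA, "Description text", NA),
  stringsAsFactors = FALSE
)

# Total number of NA cells across the whole data frame
total_na <- sum(is.na(toy))

# NA count per column, as a named vector like the output above
na_by_column <- colSums(is.na(toy))

# An NA in Web.Description marks a row from the ongoing-investigations file
toy$is_ongoing <- as.character(is.na(toy$Web.Description))

print(total_na)
print(na_by_column)
```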

Obvious Duplicates

One duplicate row was removed from the data.
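
Exact duplicate rows can be found with base R's duplicated(), which flags every row that repeats an earlier row in all columns. A minimal sketch on hypothetical toy rows (the real data has more columns):

```r
# Hypothetical toy data with one exact duplicate row (values invented)
toy <- data.frame(
  Name.of.Covered.Entity = c("Clinic A", "Clinic B", "Clinic A"),
  State                  = c("WA", "OR", "WA"),
  Breach.Submission.Date = c("6/13/2018", "5/25/2018", "6/13/2018"),
  stringsAsFactors = FALSE
)

# TRUE for each row that repeats an earlier row
dup_flags <- duplicated(toy)
n_duplicates <- sum(dup_flags)

# Keep only the first occurrence of each row
deduped <- toy[!dup_flags, ]

print(n_duplicates)
print(nrow(deduped))
```

In dplyr, distinct() gives the same result on a data frame.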

Type of breach

I created a new column for each Type.of.Breach option. Each column holds that option's number if the row lists that type of breach, or NA if it does not: Hacking/IT Incident is 1, Improper Disposal is 2, Loss is 3, Theft is 4, Unauthorized Access/Disclosure is 5, Unknown is 6, and Other is 7. A new column, type.code, then combines these values so that all of a row's breach types can be seen together.

Location of breach

I did the same for each Location.of.Breached.Information option: each new column lists the option's number if the row lists that location, or NA if it does not. Desktop Computer is 1, Electronic Medical Record is 2, Email is 3, Laptop is 4, Network Server is 5, Other Portable Electronic Device is 6, Paper/Films is 7, and Other is 8. These values are likewise combined in a location.code column.
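
Both coding schemes follow the same pattern, so one helper can cover them. A minimal base R sketch, assuming each cell lists its categories separated by ", ". The helper `code_categories()` and the `other_location` column name are hypothetical, and the way the original type.code combines values is not shown, so plain concatenation is an assumption here. Matching whole items rather than substrings matters: a substring match for "Other" would also hit "Other Portable Electronic Device".

```r
# Hypothetical helper (not from the original analysis): for each named label,
# build a column holding the label's position as a string when the row lists
# that category, or NA when it does not, then add a combined code column.
# Assumes multiple categories in one cell are separated by ", ".
code_categories <- function(x, labels, combined_name) {
  items <- strsplit(as.character(x), ", ", fixed = TRUE)
  codes <- lapply(seq_along(labels), function(i) {
    # Whole-item match, so "Other" does not match "Other Portable Electronic Device"
    has_label <- vapply(items, function(v) labels[[i]] %in% v, logical(1))
    ifelse(has_label, as.character(i), NA_character_)
  })
  names(codes) <- names(labels)
  out <- as.data.frame(codes, stringsAsFactors = FALSE)
  # Concatenate each row's non-NA codes in ascending order, e.g. "3" and "4" give "34"
  out[[combined_name]] <- unname(apply(out, 1, function(r) paste(r[!is.na(r)], collapse = "")))
  out
}

type_labels <- c(hacking = "Hacking/IT Incident", improper = "Improper Disposal",
                 loss = "Loss", theft = "Theft",
                 unauthorized = "Unauthorized Access/Disclosure",
                 unknown = "Unknown", other = "Other")
location_labels <- c(desktop = "Desktop Computer", medical = "Electronic Medical Record",
                     email = "Email", laptop = "Laptop", network = "Network Server",
                     portable = "Other Portable Electronic Device",
                     paper = "Paper/Films", other_location = "Other")

types <- code_categories(c("Hacking/IT Incident", "Loss, Theft"),
                         type_labels, "type.code")
locations <- code_categories(c("Other Portable Electronic Device", "Email, Other"),
                             location_labels, "location.code")
print(types$type.code)
print(locations$location.code)
```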

3.5)

The following interactive data table can be used to search, sort, and filter the data.

3.6)

As shown in the ‘Missing Values’ portion above, apart from the expected NAs in Web.Description, only one value (in Individuals.Affected) is missing, so no specific variable raises concern. Therefore, we will move forward with our analysis as planned; if any issues come to my attention, they will be posted here.

Required Data Analysis

4.1)

Visual 1 “Number of Reported Breaches” (with the top 5% of outliers omitted)

First, I create a new column, Year, that lists only the year of each breach; it is the x variable for the visualization. I also make a new column, Affected, which holds the values of Individuals.Affected under a shorter name.
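
A sketch of the Year and Affected columns in base R, assuming Breach.Submission.Date uses the month/day/year format visible in the printed rows (e.g. 6/13/2018). The toy values are hypothetical; as.character() is applied because the date column is a factor in the printed tibble.

```r
# Hypothetical toy rows (values invented, date format as in the printed output)
toy <- data.frame(
  Breach.Submission.Date = factor(c("6/13/2018", "2/2/2018", "12/30/2017")),
  Individuals.Affected   = c(627L, 6781L, 1500L)
)

# Parse the submission date, then keep only the four-digit year as text
submitted <- as.Date(as.character(toy$Breach.Submission.Date), format = "%m/%d/%Y")
toy$Year <- format(submitted, "%Y")

# Copy Individuals.Affected under the shorter name used in the plots
toy$Affected <- toy$Individuals.Affected

print(toy)
```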

Next, I filter out the top 5% of outliers and produce the visualization.

Visual 2 “Average Healthcare Data Breach Size by Year” (with the top 5% of outliers omitted)

First, I want to find the cutoff that excludes the top 5% of outliers. After calculating the standard deviation and mean, I determined that there were no lower outliers (the lower cutoff would have been a negative number), but that 95% of the data was less than 4675593.

## [1] 2283702
## [1] 109160.9
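
The two printed values appear to be the standard deviation and the mean of Individuals.Affected. A hedged sketch of a mean-plus-or-minus-k-standard-deviations cutoff, with k = 2 as an assumption (the text does not state which multiplier produced 4675593), compared with the empirical 95th percentile from quantile(). The values are hypothetical; for right-skewed data such as breach sizes the two cutoffs can differ substantially, and the lower bound can easily come out negative, as it did here.

```r
# Hypothetical right-skewed breach sizes (values invented for illustration)
affected <- c(500, 600, 700, 800, 900, 1000, 1200, 1500, 2000, 5000000)

m <- mean(affected)
s <- sd(affected)

# Normal-approximation bounds; the multiplier 2 is an assumption
lower <- m - 2 * s
upper <- m + 2 * s

# Empirical 95th percentile of the same values, for comparison
p95 <- unname(quantile(affected, 0.95))

print(c(mean = m, sd = s, lower = lower, upper = upper, p95 = p95))
```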

Finally, I filter the data so that only breaches affecting fewer than 4675593 individuals are kept, and use ylim to limit the y-axis to that range.
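
A sketch of the filtering and per-year aggregation behind Visuals 1 and 2, using the 4675593 cutoff from the text and hypothetical rows with the Year and Affected columns described above:

```r
# Hypothetical toy rows (values invented)
toy <- data.frame(
  Year     = c("2016", "2016", "2017", "2017", "2017"),
  Affected = c(1000L, 3000L, 500L, 1500L, 9000000L),
  stringsAsFactors = FALSE
)

cutoff <- 4675593  # upper cutoff quoted in the text

# Drop breaches at or above the cutoff before aggregating
kept <- toy[toy$Affected < cutoff, ]

# Number of reported breaches per year (the quantity behind Visual 1)
counts_by_year <- table(kept$Year)

# Average breach size per year (the quantity behind Visual 2)
mean_by_year <- aggregate(Affected ~ Year, data = kept, FUN = mean)

print(counts_by_year)
print(mean_by_year)
```

Filtering explicitly keeps the choice visible. In ggplot2, ylim() drops observations outside the limits before any statistic is computed, whereas coord_cartesian(ylim = ...) only zooms the view and keeps all data in the calculation.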

Visual 3 Largest healthcare data breaches (including all breaches under investigation in 2017-2018)

I included all breaches that affected over a million people or that took place between 2017 and 2018.
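
The selection rule is an OR of two conditions. A base R sketch with hypothetical rows, assuming the Year column created for Visual 1:

```r
# Hypothetical toy rows (values invented)
toy <- data.frame(
  Name.of.Covered.Entity = c("Entity A", "Entity B", "Entity C", "Entity D"),
  Year     = c("2015", "2017", "2016", "2018"),
  Affected = c(2500000L, 800L, 1200L, 40000L),
  stringsAsFactors = FALSE
)

# Keep a row if it affected over a million people OR falls in 2017-2018
is_large  <- toy$Affected > 1000000
is_recent <- toy$Year %in% c("2017", "2018")
largest <- toy[is_large | is_recent, ]

# Sort the result from largest to smallest breach
largest <- largest[order(largest$Affected, decreasing = TRUE), ]

print(largest)
```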

Visual 4 Chart: “Hacking/IT Incidents by Year”

## # A tibble: 541 x 28
## # Groups:   Year [9]
##    Name.of.Covered… State Covered.Entity.… Individuals.Aff… Breach.Submissi…
##    <fct>            <fct> <fct>                       <int> <fct>           
##  1 "Kelley Imaging… WA    Business Associ…              627 6/13/2018       
##  2 "Care Partners … OR    Healthcare Prov…              600 5/25/2018       
##  3 "Elmcroft Senio… TX    Healthcare Prov…            10000 5/21/2018       
##  4 "Billings Clini… MT    Healthcare Prov…              949 4/27/2018       
##  5 "Prestera Cente… WV    Healthcare Prov…              670 3/20/2018       
##  6 "Serene Sedatio… MD    Healthcare Prov…             5207 3/14/2018       
##  7 "University of … VA    Healthcare Prov…             1882 2/21/2018       
##  8 "Jemison Intern… AL    Health Plan                  6550 2/16/2018       
##  9 "Ron's Pharmacy… CA    Healthcare Prov…             6781 2/2/2018        
## 10 "Robert Smith D… TN    Healthcare Prov…             1500 1/22/2018       
## # … with 531 more rows, and 23 more variables: Type.of.Breach <fct>,
## #   Location.of.Breached.Information <fct>, Business.Associate.Present <fct>,
## #   Web.Description <fct>, is_ongoing <chr>, hacking <chr>, improper <chr>,
## #   loss <chr>, theft <chr>, unauthorized <chr>, unknown <chr>, other <chr>,
## #   type.code <chr>, desktop <chr>, medical <chr>, email <chr>, laptop <chr>,
## #   network <chr>, portable <chr>, paper <chr>, location.code <chr>,
## #   Year <chr>, Affected <int>
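
The tibble above is the hacking subset grouped by Year. A base R sketch of the same filter-then-count step on hypothetical rows; a fixed-string match is safe here because "Hacking/IT Incident" is not part of any other breach-type label. The hacking indicator column created earlier (`!is.na(hacking)`) would work equally well as the filter.

```r
# Hypothetical toy rows (values invented)
toy <- data.frame(
  Type.of.Breach = c("Hacking/IT Incident", "Theft",
                     "Hacking/IT Incident, Unauthorized Access/Disclosure",
                     "Hacking/IT Incident", "Loss"),
  Year = c("2017", "2017", "2018", "2018", "2016"),
  stringsAsFactors = FALSE
)

# TRUE when "Hacking/IT Incident" appears among the row's breach types
is_hacking <- grepl("Hacking/IT Incident", toy$Type.of.Breach, fixed = TRUE)
hacking <- toy[is_hacking, ]

# Number of hacking/IT breaches per year
hacking_by_year <- table(hacking$Year)

print(hacking_by_year)
```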

Visual 5 Breaches by Entity Type

## # A tibble: 2,454 x 29
## # Groups:   total_investigations$type.code [31]
##    Name.of.Covered… State Covered.Entity.… Individuals.Aff… Breach.Submissi…
##    <fct>            <fct> <fct>                       <int> <fct>           
##  1 Kelley Imaging … WA    Business Associ…              627 6/13/2018       
##  2 Dino-Peds        CO    Healthcare Prov…             1357 5/30/2018       
##  3 Care Partners H… OR    Healthcare Prov…              600 5/25/2018       
##  4 Elmcroft Senior… TX    Healthcare Prov…            10000 5/21/2018       
##  5 UT Physicians    TX    Healthcare Prov…             2793 5/18/2018       
##  6 New York City H… NY    Health Plan                  2078 5/11/2018       
##  7 Billings Clinic  MT    Healthcare Prov…              949 4/27/2018       
##  8 MAXIMUS, Inc. /… VA    Business Associ…             3029 4/17/2018       
##  9 Chesapeake Regi… VA    Healthcare Prov…             2100 4/6/2018        
## 10 West Kendall Ba… FL    Healthcare Prov…             1480 4/2/2018        
## # … with 2,444 more rows, and 24 more variables: Type.of.Breach <fct>,
## #   Location.of.Breached.Information <fct>, Business.Associate.Present <fct>,
## #   Web.Description <fct>, is_ongoing <chr>, hacking <chr>, improper <chr>,
## #   loss <chr>, theft <chr>, unauthorized <chr>, unknown <chr>, other <chr>,
## #   type.code <chr>, desktop <chr>, medical <chr>, email <chr>, laptop <chr>,
## #   network <chr>, portable <chr>, paper <chr>, location.code <chr>,
## #   Year <chr>, Affected <int>, `total_investigations$type.code` <chr>
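
The tibble above is grouped by `total_investigations$type.code`, which dplyr adds as an extra 29th column and which holds breach-type codes rather than entity types; passing a bare column name to group_by() avoids the extra column. Assuming the intended grouping for this visual is Covered.Entity.Type, a base R sketch of the counts on hypothetical rows:

```r
# Hypothetical toy rows (values invented)
toy <- data.frame(
  Covered.Entity.Type = c("Healthcare Provider", "Health Plan", "Healthcare Provider",
                          "Business Associate", "Healthcare Provider"),
  Affected = c(627L, 2078L, 949L, 3029L, 1357L),
  stringsAsFactors = FALSE
)

# Number of breaches per covered-entity type
breaches_by_entity <- table(toy$Covered.Entity.Type)

# Total individuals affected per covered-entity type
affected_by_entity <- aggregate(Affected ~ Covered.Entity.Type, data = toy, FUN = sum)

print(breaches_by_entity)
print(affected_by_entity)
```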

4.2)

Question 1

On which day of the week (Sunday, Monday, etc.) are breaches most often reported? First, I made a new column containing the day of the week for each breach submission date.

Then, I created a bar chart showing the number of breaches reported on each day of the week. From the chart, it appears that Friday is the day on which the most breaches are reported.
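
A locale-independent sketch of the weekday column and its counts, using five submission dates taken from the printed rows above. weekdays() would also work, but its day names follow the session locale; mapping POSIXlt's wday number to fixed names avoids that.

```r
# Submission dates copied from the printed rows above
toy_dates <- c("6/13/2018", "5/25/2018", "5/21/2018", "4/27/2018", "2/2/2018")
submitted <- as.Date(toy_dates, format = "%m/%d/%Y")

# POSIXlt$wday is 0 for Sunday through 6 for Saturday
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
day_of_week <- factor(day_names[as.POSIXlt(submitted)$wday + 1],
                      levels = day_names)

# Breaches reported on each day of the week, in calendar order
counts_by_day <- table(day_of_week)

print(counts_by_day)
```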

Question 2

How has the type of breach (hacking, improper disposal, etc.) changed from year to year? For example, were hacking/IT incidents more prevalent in 2016 than in 2010?

I have included the code I would like to use for this visualization in my file, but for some reason it does not seem to run: I don't get an error, but the visualization never appears. I will still walk through my process. First, I used the new column containing the individual type codes as the variable to facet the graph by. I then used it in a geom_dotplot graph with Year on the x-axis. The goal was to show the count of investigations by year in a separate panel for each type code.
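
Independently of the plotting problem, the underlying counts can be checked as a year-by-type table. A base R sketch on hypothetical rows, assuming multi-type entries are separated by ", "; a breach that lists two types is counted once under each.

```r
# Hypothetical toy rows (values invented)
toy <- data.frame(
  Year = c("2010", "2010", "2016", "2016", "2016"),
  Type.of.Breach = c("Theft", "Loss, Theft", "Hacking/IT Incident",
                     "Hacking/IT Incident, Unauthorized Access/Disclosure", "Theft"),
  stringsAsFactors = FALSE
)

# Split multi-valued entries so each (row, breach type) pair is one record
types <- strsplit(toy$Type.of.Breach, ", ", fixed = TRUE)
long <- data.frame(
  Year = rep(toy$Year, lengths(types)),
  Type = unlist(types),
  stringsAsFactors = FALSE
)

# Rows: year, columns: breach type, cells: number of breaches
type_by_year <- table(long$Year, long$Type)

print(type_by_year)
```

A faceted chart like the one described above is essentially a plot of this table with one panel per breach type.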

5)

Question 1

Do breaches affect more people when a business associate is present? This matters because it may indicate whether human error plays a role in the size of a breach. If more breaches tend to occur with a business associate present, more in-person training may be needed to help mitigate these issues. If not, then technical errors are more likely the culprit.

Based on the visuals above, it appears that the vast majority of breaches, by both count and size, occurred without a business associate present.
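
A sketch of the comparison behind these visuals, assuming Business.Associate.Present holds "Yes"/"No" values; the rows are hypothetical. Reporting the median alongside the mean helps because a few very large breaches can dominate the mean.

```r
# Hypothetical toy rows (values invented)
toy <- data.frame(
  Business.Associate.Present = c("No", "No", "Yes", "No", "Yes"),
  Affected = c(627L, 10000L, 3029L, 949L, 1500L),
  stringsAsFactors = FALSE
)

# Count of breaches in each group
n_by_group <- table(toy$Business.Associate.Present)

# Typical and average breach size in each group
median_by_group <- tapply(toy$Affected, toy$Business.Associate.Present, median)
mean_by_group   <- tapply(toy$Affected, toy$Business.Associate.Present, mean)

print(n_by_group)
print(median_by_group)
print(mean_by_group)
```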

Question 2

Next, I thought it would be fun to use a filter to slice the data so that I can compare the number of breaches in Ohio (Go Xavier!) to my home state of Washington.

Based on the output, Washington actually has fewer breaches than Ohio, which surprised me. Since Washington is known for its big tech companies, I had assumed it would have more breaches. It would be interesting to research why this is in the future.
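
The state filter can be sketched with %in% on hypothetical rows. These are raw counts, so they do not account for differences in population or in the number of covered entities between the two states.

```r
# Hypothetical toy rows (values invented)
toy <- data.frame(
  State = c("OH", "WA", "OH", "TX", "OH", "WA"),
  stringsAsFactors = FALSE
)

# Keep only Ohio and Washington, then count breaches per state
two_states <- toy[toy$State %in% c("OH", "WA"), , drop = FALSE]
breaches_by_state <- table(two_states$State)

print(breaches_by_state)
```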