The purpose of this report is to interpret statistics on healthcare data breaches. The data consists of breach events described by entity, state, covered entity type, individuals affected, breach submission date, type of breach, location, business associate presence, and web description. Collecting and interpreting this data tells the story of data breach incidents and their impact across various categories. The analysis uses backend data supplied by HIPAA Journal, which is cleaned and then investigated through summaries, mutations, and multiple visualizations to give a picture of the data as it exists today. The report examines the data and provides insight into breach incidents in terms of the variables listed above. My end goal, with outliers omitted, is to analyze the totality of the breach events and the damage caused, and to explain why this matters.
The visualizations below include average breach size by year, largest breaches, hacking/IT incidents by year, and various other tables and charts. They matter because, as depicted below, breach activity is increasing year over year across all fifty states. In addition, breach events of every type are more frequent during the week, which raises the question: are our business offices safe and secure? In conclusion, this analysis reinforces the importance of data security, proper defense, and proper disposal.
In the following visualizations, boxplot visual testing was used to identify and remove outliers before the subsequent plots were produced. Boxplot visual testing is a method in which outliers are removed with a filter once an appropriate range has been determined, which allows a sound analysis of the remaining data.
Notice the outliers skewing the data set.
Now we are ready for analysis.
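
As a rough illustration of the filtering step described above, the sketch below drops values that fall outside the usual boxplot whisker fences. The 1.5 x IQR rule, the function names, and the sample values are assumptions made for this example; the report itself chose its range by inspecting the boxplot.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) boxplot fences for a list of numbers."""
    # Quartiles of the data; the middle value (the median) is not needed here.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(values, k=1.5):
    """Keep only the values that lie inside the boxplot fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]

# Hypothetical counts of individuals affected, with one extreme value.
affected = [500, 650, 800, 1200, 1500, 2300, 3100, 4000, 78_800_000]
kept = drop_outliers(affected)
print(kept)
```

A fixed multiplier is only one way to pick the range. A cutoff chosen by eye from the boxplot can be applied with the same kind of filter.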
The following chart shows the number of reported breaches by year, with outliers omitted. Throughout this report, we count one breach for every record in which individuals were affected. With the outliers removed, we can visualize the number of reported breaches between 2009 and 2018 without anomalies.
Here we can see the average healthcare breach size by year. There is some variability, but for the most part the average number of individuals affected, with outliers omitted, follows a loosely seasonal pattern.
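
The yearly average can be sketched as a simple group-and-mean over the breach submission date. The records and the function name below are hypothetical and stand in for the cleaned data set.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical (breach submission date, individuals affected) records.
records = [
    (date(2016, 3, 14), 1200),
    (date(2016, 9, 2), 800),
    (date(2017, 1, 20), 3000),
    (date(2017, 6, 5), 1000),
    (date(2017, 11, 30), 2000),
]

def average_size_by_year(rows):
    """Group breach sizes by submission year and average each group."""
    sizes = defaultdict(list)
    for submitted, affected in rows:
        sizes[submitted.year].append(affected)
    return {year: mean(vals) for year, vals in sorted(sizes.items())}

averages = average_size_by_year(records)
print(averages)
```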
The chart below shows the number of hacking/IT incidents per year. Based on the chart, we can expect with some confidence that 2018 will exceed the prior year. The number of hacking incidents shows clear annual growth.
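
A minimal sketch of the count behind such a chart follows. It assumes that the type-of-breach field may list several types in one string, so it uses a substring match. The label text, the records, and the function name are illustrative assumptions.

```python
from collections import Counter

# Hypothetical (year, type of breach) records.
records = [
    (2016, "Hacking/IT Incident"),
    (2016, "Theft"),
    (2017, "Hacking/IT Incident"),
    (2017, "Hacking/IT Incident, Unauthorized Access/Disclosure"),
    (2017, "Loss"),
]

def hacking_incidents_by_year(rows, label="Hacking/IT Incident"):
    """Count the records per year whose type field mentions the label."""
    return Counter(year for year, breach_type in rows if label in breach_type)

counts = hacking_incidents_by_year(records)
print(sorted(counts.items()))
```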
The following chart compares total breach frequency with the day of the week. This visualization filters the type of breach to the seven valid types listed on the exam.
This text analysis visualization is paired with a data table that quantifies the terms used in the web description of each breach event and their frequencies. Compared with the hacking terminology below, there are slight differences in frequency, but the terms are largely consistent across both categories.
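
The term and frequency table can be approximated with a word count over the web descriptions. The stop-word list, the sample descriptions, and the function name below are assumptions for illustration only.

```python
import re
from collections import Counter

# A small, hypothetical stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "was", "in", "on", "that"}

def term_frequencies(descriptions, top_n=5):
    """Return the most common non-stop-word terms across all descriptions."""
    counts = Counter()
    for text in descriptions:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)

# Hypothetical web descriptions.
descriptions = [
    "An unencrypted laptop was stolen from an employee vehicle.",
    "The covered entity reported that a laptop containing patient records was stolen.",
]
top_terms = term_frequencies(descriptions, top_n=2)
print(top_terms)
```

Running the same function separately on the hacking records and on all records gives two tables that can be compared term by term.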
This chart displays the percentage of individuals affected by breach location, excluding outliers.
The following is a relative frequency table of individuals affected by state. It allows states to be ranked by the number of individuals affected by breaches.
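
One way to build such a table is to total the individuals affected per state, divide by the grand total, and sort in descending order. The state codes, counts, and function name below are hypothetical.

```python
from collections import defaultdict

def relative_frequency_by_state(rows):
    """Return (state, share of all individuals affected), largest share first."""
    totals = defaultdict(int)
    for state, affected in rows:
        totals[state] += affected
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []  # nothing to rank
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(state, total / grand_total) for state, total in ranked]

# Hypothetical (state, individuals affected) records.
records = [("CA", 3000), ("TX", 1500), ("CA", 1000), ("NY", 500), ("TX", 2000)]
table = relative_frequency_by_state(records)
for state, share in table:
    print(f"{state}: {share:.2%}")
```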
The following is a visualization of individuals affected by breaches per weekday. The day-of-week axis runs from one (Sunday) to seven (Saturday), in the order Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, and Saturday.
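
The Sunday-first numbering described above can be reproduced as follows. Python's `isoweekday()` numbers Monday as 1 and Sunday as 7, so the sketch remaps it. The records and function names are hypothetical.

```python
from datetime import date

DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]

def sunday_first_number(d):
    """Map a date to 1 (Sunday) through 7 (Saturday)."""
    # isoweekday() gives Monday=1 ... Sunday=7; the modulo moves Sunday to the front.
    return d.isoweekday() % 7 + 1

def affected_per_weekday(rows):
    """Total the individuals affected for each day number from 1 to 7."""
    totals = {n: 0 for n in range(1, 8)}
    for submitted, affected in rows:
        totals[sunday_first_number(submitted)] += affected
    return totals

# Hypothetical (breach submission date, individuals affected) records.
records = [
    (date(2018, 1, 7), 400),   # a Sunday
    (date(2018, 1, 8), 1500),  # a Monday
    (date(2018, 1, 15), 900),  # a Monday
    (date(2018, 1, 6), 250),   # a Saturday
]
totals = affected_per_weekday(records)
for number, total in totals.items():
    print(number, DAY_NAMES[number - 1], total)
```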