Introduction to the Data

The US Department of Health and Human Services (HHS), through its Office for Civil Rights (OCR), is required by law to collect and report breaches of citizens' protected health information (PHI). The tables and visualizations below present several analyses of these data breaches and of the individuals affected by them. For each breach, the data includes the name of the covered entity, the state where the breach occurred, the covered entity type, the number of individuals affected, the date and type of breach, the location of the breached information, whether a business associate was present, and a short web description of the breach.

Summary Statistics

The table below shows a few summary statistics I was interested in, all grouped at the state level.
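The state-level grouping can be sketched in Python. The exact statistics in the table are not listed here, so as an assumption this sketch computes the breach count plus the total, mean, and maximum individuals affected per state. The rows and the field names `state` and `individuals_affected` are hypothetical stand-ins, not the real column names.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; field names and values are made up for illustration.
breaches = [
    {"state": "IN", "individuals_affected": 12_000},
    {"state": "IN", "individuals_affected": 3_000},
    {"state": "TX", "individuals_affected": 4_500},
    {"state": "TX", "individuals_affected": 700},
    {"state": "VT", "individuals_affected": 600},
]


def summarize_by_state(rows):
    """Group rows by state and compute count, total, mean, and max individuals affected."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["state"]].append(row["individuals_affected"])
    # Sort by state code so the output table has a stable order.
    return {
        state: {
            "breaches": len(values),
            "total": sum(values),
            "mean": mean(values),
            "max": max(values),
        }
        for state, values in sorted(groups.items())
    }


for state, stats in summarize_by_state(breaches).items():
    print(state, stats)
```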

Number of health care data breaches by year

This visual shows the number of data breaches per year. The years 2013 and 2014 had the most breaches, while 2009 and 2018 had the fewest.
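Counting breaches per year is a single group-and-count step. A minimal sketch, assuming each breach date has already been parsed into a `datetime.date` (the sample dates are made up):

```python
from collections import Counter
from datetime import date

# Hypothetical breach dates, made up for illustration.
breach_dates = [
    date(2009, 11, 2),
    date(2013, 3, 4),
    date(2013, 7, 19),
    date(2014, 1, 10),
]


def breaches_per_year(dates):
    """Count breaches per calendar year, returned in ascending year order."""
    counts = Counter(d.year for d in dates)
    return dict(sorted(counts.items()))


print(breaches_per_year(breach_dates))  # {2009: 1, 2013: 2, 2014: 1}
```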

The 25 largest health care data breaches

The table below shows the full details of the 25 largest health care data breaches, sorted in descending order by the number of individuals affected. The largest breach affected 78.8 million individuals.
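Selecting the largest breaches amounts to ranking rows by individuals affected and keeping the first 25. One way to sketch it in Python uses `heapq.nlargest`, which returns its results largest first; the rows and field names below are hypothetical.

```python
import heapq

# Hypothetical rows; entity names and counts are made up for illustration.
rows = [
    {"entity": "Entity A", "individuals_affected": 500},
    {"entity": "Entity B", "individuals_affected": 90_000},
    {"entity": "Entity C", "individuals_affected": 2_500},
]


def largest_breaches(rows, n=25):
    """Return the n rows with the most individuals affected, in descending order."""
    return heapq.nlargest(n, rows, key=lambda r: r["individuals_affected"])


for row in largest_breaches(rows, n=2):
    print(row["entity"], row["individuals_affected"])
```

With fewer than `n` rows, `heapq.nlargest` simply returns all of them, sorted.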

Total health care records exposed by state for the top 10 states

The visual shows the 10 states with the most health care records exposed. Indiana far surpasses every other state in the number of records exposed.
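Totaling records by state and keeping the top 10 can be sketched with `collections.Counter`, whose `most_common(n)` method returns the n largest totals in descending order. The field names and sample values are hypothetical.

```python
from collections import Counter

# Hypothetical rows, made up for illustration.
rows = [
    {"state": "IN", "individuals_affected": 50_000},
    {"state": "CA", "individuals_affected": 8_000},
    {"state": "IN", "individuals_affected": 1_000},
    {"state": "NY", "individuals_affected": 12_000},
]


def top_states_by_records(rows, n=10):
    """Sum individuals affected per state and return the n largest (state, total) pairs."""
    totals = Counter()
    for row in rows:
        totals[row["state"]] += row["individuals_affected"]
    return totals.most_common(n)


print(top_states_by_records(rows))  # [('IN', 51000), ('NY', 12000), ('CA', 8000)]
```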

Number of health care hacking incidents by month

The chart below shows the distribution of hacking incidents by month. March and April had more hacking incidents than any other months of the year.
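The monthly hacking counts can be sketched as a filter followed by a count. This assumes dates are stored as `MM/DD/YYYY` strings and that hacking breaches are those whose type text contains "Hacking" (e.g. "Hacking/IT Incident"); both are assumptions about the raw format, and the rows are made up.

```python
from collections import Counter
from datetime import datetime

# Hypothetical rows, made up for illustration.
rows = [
    {"breach_date": "03/14/2016", "breach_type": "Hacking/IT Incident"},
    {"breach_date": "04/02/2017", "breach_type": "Hacking/IT Incident"},
    {"breach_date": "03/30/2015", "breach_type": "Hacking/IT Incident"},
    {"breach_date": "03/01/2015", "breach_type": "Theft"},
]


def hacking_by_month(rows):
    """Count hacking breaches per month number (1 = January), skipping other types."""
    counts = Counter()
    for row in rows:
        if "Hacking" in row["breach_type"]:
            month = datetime.strptime(row["breach_date"], "%m/%d/%Y").month
            counts[month] += 1
    return dict(sorted(counts.items()))


print(hacking_by_month(rows))  # {3: 2, 4: 1}
```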

Number of breaches by covered entity type

The table below shows how many breaches occurred for each covered entity type. Health care providers account for the majority of breaches among the four covered entity types.
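Counting breaches per covered entity type is a plain frequency count. A sketch with hypothetical rows; the type labels below are illustrative and may not match the data set's exact spelling.

```python
from collections import Counter

# Hypothetical rows, made up for illustration.
rows = [
    {"entity_type": "Healthcare Provider"},
    {"entity_type": "Healthcare Provider"},
    {"entity_type": "Health Plan"},
    {"entity_type": "Business Associate"},
    {"entity_type": "Healthcare Provider"},
]


def breaches_by_entity_type(rows):
    """Return (entity type, breach count) pairs, most frequent first."""
    return Counter(row["entity_type"] for row in rows).most_common()


print(breaches_by_entity_type(rows))
```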

On what day of the week are breaches most often reported?

We wanted to see on which days of the week breaches were most often reported. The visual below shows the distribution of breaches by day of the week.

This bar chart shows the number of breaches on each day of the week. Friday is the most common day for breaches, while Saturday and Sunday are the least common.
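A day-of-week count can be sketched with `date.weekday()`, which returns 0 for Monday through 6 for Sunday. Listing all seven days, including any with zero breaches, keeps the bar chart's x-axis complete. The dates are hypothetical.

```python
from collections import Counter
from datetime import date

# Index i matches date.weekday() == i.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

# Hypothetical breach dates, made up for illustration.
breach_dates = [date(2015, 1, 2), date(2015, 1, 9), date(2015, 1, 5), date(2015, 1, 10)]


def breaches_by_weekday(dates):
    """Count breaches per day of the week, filling in zeros for days with none."""
    counts = Counter(d.weekday() for d in dates)
    return {WEEKDAYS[i]: counts[i] for i in range(7)}


print(breaches_by_weekday(breach_dates))
```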

In which year (or years) were there at least 50 breaches from a ‘Business Associate’ covered entity type and at least 150 breaches from a healthcare provider covered entity type?

We wanted to see which years had at least 50 breaches from business associates and at least 150 breaches from health care providers. The table below shows the years in which both conditions were met.

Only two years, 2013 and 2014, meet both thresholds, which is somewhat surprising. The question could be explored further by looking at other covered entity types to see whether any other years had a notable number of breaches.
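The two-threshold filter can be sketched by counting breaches per (year, entity type) pair and keeping the years that clear both minimums. The field names, type labels, and sample rows are hypothetical; the default thresholds mirror the 50 and 150 in the question.

```python
from collections import Counter

# Hypothetical rows, made up for illustration.
rows = (
    [{"year": 2012, "entity_type": "Business Associate"}] * 2
    + [{"year": 2012, "entity_type": "Healthcare Provider"}] * 1
    + [{"year": 2013, "entity_type": "Business Associate"}] * 2
    + [{"year": 2013, "entity_type": "Healthcare Provider"}] * 3
)


def years_meeting_thresholds(rows, ba_min=50, provider_min=150):
    """Return years with >= ba_min business-associate and >= provider_min provider breaches."""
    counts = Counter((row["year"], row["entity_type"]) for row in rows)
    years = {row["year"] for row in rows}
    # Counter returns 0 for missing (year, type) pairs, so absent types fail the check.
    return sorted(
        year
        for year in years
        if counts[(year, "Business Associate")] >= ba_min
        and counts[(year, "Healthcare Provider")] >= provider_min
    )


# Small thresholds so the toy data is meaningful.
print(years_meeting_thresholds(rows, ba_min=2, provider_min=3))  # [2013]
```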

How has the type of breach (hacking, improper disposal, loss, etc.) changed for each year?

This visual shows how the types of breaches have changed over the years and which types became more or less common.

This visualization shows the number of breaches per year, with a color-coded legend that distinguishes the breach types, so you can see how many breaches of each type occurred every year. Theft is consistently one of the largest breach types from 2009 to 2013. From 2013 to 2016, unauthorized access/disclosure breaches also become more common. Improper disposal is one of the least common breach types every year.
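The data behind a color-coded yearly chart is a year-by-type count table. A sketch with hypothetical rows; it treats each breach-type value as its own category, so a row that lists several types in one field (if the raw data does that) would form a separate combined category unless it is split first.

```python
from collections import Counter, defaultdict

# Hypothetical rows, made up for illustration.
rows = [
    {"year": 2010, "breach_type": "Theft"},
    {"year": 2010, "breach_type": "Theft"},
    {"year": 2010, "breach_type": "Improper Disposal"},
    {"year": 2014, "breach_type": "Unauthorized Access/Disclosure"},
    {"year": 2014, "breach_type": "Theft"},
]


def types_by_year(rows):
    """Build {year: {breach type: count}} with years in ascending order."""
    table = defaultdict(Counter)
    for row in rows:
        table[row["year"]][row["breach_type"]] += 1
    return {year: dict(counts) for year, counts in sorted(table.items())}


for year, counts in types_by_year(rows).items():
    print(year, counts)
```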

What is the distribution of theft breaches by month?

It would be interesting to break down thefts by month to see whether health records are stolen more often in particular months. I intend to answer this question with a bar chart showing each month on the x-axis and the number of theft breaches on the y-axis, which requires grouping the data by month.
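The grouping step described above can be sketched as follows, producing one (month, count) pair per bar. It assumes ISO `YYYY-MM-DD` date strings and treats any type containing "Theft" as a theft; both are assumptions, and the rows are made up. Months with no thefts are filled with zero so all 12 bars appear.

```python
from collections import Counter
from datetime import date

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Hypothetical rows, made up for illustration.
rows = [
    {"breach_date": "2014-04-11", "breach_type": "Theft"},
    {"breach_date": "2015-04-03", "breach_type": "Theft"},
    {"breach_date": "2015-06-20", "breach_type": "Theft"},
    {"breach_date": "2015-06-21", "breach_type": "Loss"},
]


def thefts_by_month(rows):
    """Return {month abbreviation: theft count} for all 12 months, in calendar order."""
    counts = Counter(
        date.fromisoformat(row["breach_date"]).month
        for row in rows
        if "Theft" in row["breach_type"]
    )
    return {MONTHS[m - 1]: counts[m] for m in range(1, 13)}


print(thefts_by_month(rows))
```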

The bar chart shows the distribution of theft health care breaches by month. Most months fall between roughly 125 and 150 thefts, with the maximum in April at around 160 and the minimum in June at around 125. This view is useful if you are focused on thefts. The data set does not specify exactly what counts as theft (for example, whether it means someone stealing physical files), but the monthly distribution is still informative. The chart answers the question posed above by clearly showing the distribution of thefts by month. To break it down further, you could focus on a specific year, or normalize the data to compare thefts across years.
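One possible reading of "normalize" is to express each month as a share of that year's thefts, so years with different totals can be compared on the same scale. A minimal sketch under that assumption, with made-up dates:

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical theft dates, made up for illustration.
theft_dates = [date(2014, 4, 1), date(2014, 4, 20), date(2014, 6, 3), date(2015, 4, 9)]


def monthly_share_by_year(dates):
    """Return {year: {month: fraction of that year's thefts}}; each year's fractions sum to 1."""
    per_year = defaultdict(Counter)
    for d in dates:
        per_year[d.year][d.month] += 1
    shares = {}
    for year, months in sorted(per_year.items()):
        total = sum(months.values())
        # Counter returns 0 for months with no thefts, giving a share of 0.0.
        shares[year] = {m: months[m] / total for m in range(1, 13)}
    return shares


print(monthly_share_by_year(theft_dates)[2014][4])
```

Dividing by each year's own total hides differences in overall volume, so this view is best read alongside the raw yearly counts.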

What are the total health care records (individuals affected) exposed by state for the bottom 15 states?

This provides a comparison with the top 10 states visualized above. I intend to answer this question with a bar chart showing each of the 15 lowest states on the x-axis and the number of health care records exposed on the y-axis, which requires grouping the data at the state level.
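The state-level grouping for the bottom 15 can be sketched by summing per state and sorting the totals in ascending order. One caveat: a state with no reported breaches has no rows, so it cannot appear in this ranking. Field names and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows, made up for illustration.
rows = [
    {"state": "VT", "individuals_affected": 900},
    {"state": "WY", "individuals_affected": 600},
    {"state": "VT", "individuals_affected": 1_100},
    {"state": "CA", "individuals_affected": 250_000},
]


def bottom_states_by_records(rows, n=15):
    """Sum individuals affected per state and return the n smallest (state, total) pairs."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["state"]] += row["individuals_affected"]
    return sorted(totals.items(), key=lambda item: item[1])[:n]


print(bottom_states_by_records(rows, n=2))  # [('WY', 600), ('VT', 2000)]
```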

The bar chart displays the 15 states with the fewest exposed health care records. These totals likely depend in part on each state's population, which isn't in the data; however, none of these 15 states is among the most populous in the US. The graph answers the question by clearly showing the distribution of exposed records across these states. The contrast with the top states is also striking: the largest single health care breach affected 78.8 million individuals, while the largest total for any of the bottom 15 states is around 42,000.