Analysis Report Two - Data, Data Everywhere

Author

Faheedat Ajagbe

Executive Summary

There is a lot of potential in big data analytics for healthcare. The article “Big Data Analytics in Healthcare” describes big data as large, high-velocity, complex electronic health data sets that are too big to store and manage with traditional tools, so they must be stored, synthesized, and analyzed with advanced software and/or hardware, which can create challenges of its own. Two such challenges are threats to patient privacy and a shortage of the varied skills needed to use big data analytics tools. Healthcare organizations generate large amounts of data (“big data”), much of it now digitized. Because this data is growing rapidly, healthcare professionals use big data analytics to support clinical decisions, reduce waste and costs, and record patient information more efficiently, with the goal of better patient outcomes (Raghupathi and Raghupathi 2014). A successful application depends on strong data governance, advanced technology infrastructure, and trained professionals who can turn large amounts of information into meaningful healthcare insights. This report also covers the definition of big data and its characteristics, the “4 Vs”.

Introduction

Electronic patient records (EPRs), clinical documentation and support systems, billing systems, machine sensor data, and even social media posts all make up “big data”. Because the healthcare industry holds so much data, the speed at which each data type must be recorded and processed can be overwhelming. The current trend is the transition from paper-based patient records to digital ones. This shift is needed because of the sheer amount of data and the need to manage and interpret it. Big data analytics can be defined simply as the processing of large, complex data to identify patterns, detect diseases at early stages, and develop more insightful diagnoses, so that high-quality patient care can be delivered at a lower cost (Meites et al. 2023). My purpose for this report is to review the role big data plays in healthcare delivery, including its importance, characteristics, advantages, and disadvantages; to explain the data visualizations; and to give recommendations for the healthcare industry.

The Healthcare Context

The healthcare industry is data-driven: it depends on storing and analyzing patient information, and that information comes from many sources. These include web and social media data (Facebook, Twitter, LinkedIn, and blogs); machine-to-machine data (readings from devices that monitor vital signs); big transaction data (often healthcare claims); biometric data (patients’ fingerprints, X-rays, blood pressure, and pulse readings); and human-generated data (handwritten doctors’ notes and other paper documents).

Big data has four characteristics, the “4 Vs”. Volume is the sheer amount of data that is created and accumulated. Velocity is the rapid, often real-time pace at which data accumulates. Variety is the range of forms the data takes, from structured records to unstructured notes and images. Veracity is data assurance, meaning that the data is credible and free of errors.

Using big data analytics brings significant benefits. For example, doctors and other healthcare professionals can use analytics tools to detect diseases early, which matters most for life-threatening conditions. Big data analytics can also reduce patient readmissions and support more insightful diagnoses that lead to better outcomes (Manyika et al. 2011).

Data Visualizations

Visualization One - Two Table Join

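-- Join each patient to their admissions; a patient with several
-- admissions appears once per admission.
SELECT patients.subject_id, patients.dob, admissions.religion
FROM patients
INNER JOIN admissions
  ON patients.subject_id = admissions.subject_id
WHERE admissions.religion = 'MUSLIM'  -- keep only admissions recorded with this religion
LIMIT 5                               -- return at most five rows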

This two-table join connects the patients and admissions tables on subject_id and returns the date of birth and religion for five admissions recorded as 'MUSLIM'. The plot that follows shows how those dates of birth are distributed for that group.

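library(ggplot2)

# Violin plot of the query result: religion on the x-axis, date of birth on the y-axis.
# dob is plotted as returned by the query; it is not converted to age here.
ggplot(data = myquery1,
       aes(x = religion, y = dob)) +
  geom_violin() +
  labs(
    title = "Date of Birth by Religion",
    x = "Religion",
    y = "Date of Birth (dob)"
  )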
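
A plot of age rather than date of birth would need dob converted against an admission date. The following is a minimal sketch of that conversion, not the method used in this report. It assumes the query also returned an admission timestamp, here called admittime, which the query above does not select. The data frame example_rows and the helper age_at_admission are hypothetical, and the values are invented for illustration.

```r
# Hypothetical stand-in for query results; all values are invented.
# `admittime` is an assumed extra column that the query above does not select.
example_rows <- data.frame(
  subject_id = c(1, 2, 3),
  religion   = c("MUSLIM", "MUSLIM", "MUSLIM"),
  dob        = as.Date(c("1950-01-01", "1980-03-10", "2000-12-31")),
  admittime  = as.Date(c("2010-07-01", "2012-09-20", "2010-06-30"))
)

# Approximate age in whole years at admission: the difference in days
# divided by the average year length (365.25 days), rounded down.
age_at_admission <- function(dob, admittime) {
  days <- as.numeric(difftime(admittime, dob, units = "days"))
  floor(days / 365.25)
}

example_rows$age <- age_at_admission(example_rows$dob, example_rows$admittime)
print(example_rows[, c("subject_id", "religion", "age")])
```

With a column like this, aes(x = religion, y = age) could replace y = dob in the plot, and the violin would then describe ages.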

Visualization Two - Three Table Join

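-- Join patients to their admissions and to their ICU stays.
-- Note: icustays is matched on subject_id only, so a patient with several
-- admissions and several ICU stays yields one row per admission/stay pair.
SELECT patients.subject_id, patients.dob, admissions.religion, icustays.outtime
FROM patients
INNER JOIN admissions
  ON patients.subject_id = admissions.subject_id
INNER JOIN icustays
  ON patients.subject_id = icustays.subject_id
WHERE admissions.religion = 'MUSLIM'  -- keep only admissions recorded with this religion
LIMIT 5                               -- return at most five rows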

This three-table join retrieves data from three separate tables: patients, admissions, and icustays. For five admissions recorded as 'MUSLIM', it returns the time each ICU stay ended (outtime), which the plot that follows displays by religion.

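library(ggplot2)

# Violin plot of ICU out time by religion.
# myquery1 must hold the result of the three-table join, since it needs the outtime column.
# outtime is the timestamp at which the ICU stay ended, not a length in days.
ggplot(data = myquery1,
       aes(x = religion, y = outtime)) +
  geom_violin() +
  labs(
    title = "ICU Out Time by Religion",
    x = "Religion",
    y = "ICU Out Time (outtime)"
  )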
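
Length of stay in days is the difference between the end and the start of an ICU stay. The following is a minimal sketch of that calculation, not the method used in this report. It assumes the icustays table also provides the start of each stay in a column here called intime, which the query above does not select. The data frame example_stays and the helper icu_stay_days are hypothetical, and the values are invented for illustration.

```r
# Hypothetical stand-in for query results; all values are invented.
# `intime` (start of the ICU stay) is an assumed column not selected above.
example_stays <- data.frame(
  subject_id = c(1, 2),
  religion   = c("MUSLIM", "MUSLIM"),
  intime     = as.POSIXct(c("2010-03-01 08:00:00", "2010-05-10 12:00:00"), tz = "UTC"),
  outtime    = as.POSIXct(c("2010-03-03 20:00:00", "2010-05-11 00:00:00"), tz = "UTC")
)

# Length of stay in (fractional) days: end of stay minus start of stay.
icu_stay_days <- function(intime, outtime) {
  as.numeric(difftime(outtime, intime, units = "days"))
}

example_stays$los_days <- icu_stay_days(example_stays$intime, example_stays$outtime)
print(example_stays[, c("subject_id", "religion", "los_days")])
```

Plotting los_days on the y-axis instead of outtime would give a violin whose vertical axis is measured in days.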

Recommendations for Industry

The research and findings make clear that much of the data in this industry is still neither processed nor stored, and it needs to be stored, analyzed, and safely archived. For that reason, it is recommended that organizations make more use of big data analytics tools such as the Hadoop Distributed File System (HDFS), MapReduce, and HBase (Ankam 2016). Organizations should also use clinical decision support systems (CDSS), tools that help healthcare providers make better decisions by giving them useful medical information, patient details, and healthcare knowledge when they need it (Sutton et al. 2020). As these tools become more widely used, new user-friendly tools should also be developed. Another recommendation is stronger data security: healthcare organizations should keep patient information safe by improving their security systems, creating clear privacy rules, and following safety guidelines to protect patient trust. Finally, healthcare organizations should set simple rules about who can use healthcare data, how it is kept accurate, and how it is used safely and responsibly.

References

Ankam, Venkat. 2016. Big Data Analytics. Packt Publishing Ltd.
Manyika, James, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, Angela Hung Byers, et al. 2011. “Big Data: The Next Frontier for Innovation, Competition, and Productivity.”
Meites, Elissa, Martha Knuth, Kaely Hall, Patrick Dawson, Teresa W Wang, Marcienne Wright, Wei Yu, et al. 2023. “COVID-19 Scientific Publications from the Centers for Disease Control and Prevention, January 2020–January 2022.” Public Health Reports 138 (2): 241–47.
Raghupathi, Wullianallur, and Viju Raghupathi. 2014. “Big Data Analytics in Healthcare: Promise and Potential.” Health Information Science and Systems 2 (1): 3.
Sutton, Reed T, David Pincock, Daniel C Baumgart, Daniel C Sadowski, Richard N Fedorak, and Karen I Kroeker. 2020. “An Overview of Clinical Decision Support Systems: Benefits, Risks, and Strategies for Success.” NPJ Digital Medicine 3 (1): 17.