Analysis Report Two - Data, Data Everywhere

Author

Tristan Worlock

dbListTables(mydb)
 [1] "ADMISSIONS"         "CALLOUT"            "CAREGIVERS"        
 [4] "CHARTEVENTS"        "CPTEVENTS"          "DATETIMEEVENTS"    
 [7] "DIAGNOSES_ICD"      "DRGCODES"           "D_CPT"             
[10] "D_ICD_DIAGNOSES"    "D_ICD_PROCEDURES"   "D_ITEMS"           
[13] "D_LABITEMS"         "ICUSTAYS"           "INPUTEVENTS_CV"    
[16] "INPUTEVENTS_MV"     "LABEVENTS"          "MICROBIOLOGYEVENTS"
[19] "NOTEEVENTS"         "OUTPUTEVENTS"       "PATIENTS"          
[22] "PRESCRIPTIONS"      "PROCEDUREEVENTS_MV" "PROCEDURES_ICD"    
[25] "SERVICES"           "TRANSFERS"         

Executive Summary

The main theme I uncovered from this week’s lessons is that healthcare data is scattered across a plethora of tables. Using inner joins on the MIMIC-III database in posit.cloud helps to navigate all of this data more efficiently.

Graphing this data can also be a pain, so organizing graphs with color codes for gender and using violin plots or box plots gives healthcare workers a visual aid for seeing how the data behaves in different instances.

The data that I graphed showed inefficiencies in length of stay when grouped by insurance and by admission type. Both groupings had some extreme outliers.

Connecting the data to the readings, health data professionals need to start using automated AI platforms to record and transfer data, with a clinician’s oversight. Doing so can uncover the bottlenecks behind these lengths of stay, help facilities plan care, prepare for and more accurately predict ICU shortages, and overall create more quality time between patients and doctors, as well as more patient foot traffic to boost revenue.

Introduction

The readings assigned to us this week highlighted how clinical decision support systems (CDSS) and other emerging technologies, such as AI, wearable devices, and digital health platforms, can help patients gain a better perspective on their overall health, and even form a hypothesis about what issues they are having before they seek treatment. While each article approaches the topic of how data drives decisions in healthcare differently, all of them emphasize the growing role of data-driven decision making and the need to balance technological innovation with human clinical judgment.

The first thing I wanted to do was gain a better sense of what AI has looked like in healthcare. It is no secret that AI has slowly started to take over certain functions in our everyday lives, so I wanted a better overall perspective. I read a general literature review by Al Kuwaiti et al. (2023) called “A Review of the Role of Artificial Intelligence in Healthcare”. Healthcare is at a crossroads right now, as its costs have risen far faster than GDP, which puts health system sustainability at risk (Al Kuwaiti et al. 2023). A key concept that I picked up throughout the article is the shift toward digital and data-driven healthcare (Al Kuwaiti et al. 2023). AI, machine learning, wearable devices, telemedicine, electronic health records, and big data analytics are creating a more connected healthcare ecosystem. Together, these technologies support the four P’s of medicine: predictive, preventive, personalized, and participatory care. All of these allow healthcare organizations to move from reactive treatment toward proactive health management (Al Kuwaiti et al. 2023).

The first article given to us in Moodle, “An Overview of Clinical Decision Support Systems: Benefits, Risks, and Strategies for Success” by Sutton et al. (2020), provides a framework for understanding CDSS. The article defines CDSS as computerized tools designed to support clinicians by combining patient information with medical knowledge to improve decision-making (Sutton et al. 2020). One concept presented in the article is the difference between knowledge-based and non-knowledge-based systems. Knowledge-based CDSS rely on predefined clinical rules and guidelines, while non-knowledge-based systems use artificial intelligence and machine learning algorithms to identify patterns and generate recommendations from large datasets (Sutton et al. 2020). A major argument of the article is that CDSS can improve patient safety, clinical management, operational efficiency, and healthcare quality. Examples include reducing medication errors through drug interaction alerts, improving adherence to clinical guidelines, enhancing diagnostic accuracy, and reducing unnecessary healthcare costs (Sutton et al. 2020).

The next article that we were given, “Big Data Analytics in Healthcare: Promise and Potential” by Raghupathi and Raghupathi (2014), gives an overview of the massive volumes of healthcare data and how they can be leveraged. The article suggests that data from electronic health records, medical imaging, sensors, claims information, and social media can all be used to generate actionable insights that improve healthcare quality while reducing costs (Raghupathi and Raghupathi 2014). The article also outlines the four V’s of Volume, Velocity, Variety, and Veracity, which are core characteristics of health data (Raghupathi and Raghupathi 2014). Overall, big data analytics has the potential to transform healthcare by delivering higher-quality, data-driven care at lower cost. While the benefits are substantial, organizations must address data, technology, and governance challenges to fully realize the value.

From a Health Care Data Systems perspective, these readings all reinforce the importance of designing systems that are accurate, interoperable, user-friendly, and patient-centered. The future of healthcare will likely depend on an organization’s ability to integrate advanced technologies while preserving the human relationships and clinical judgment that remain central to quality patient care.

The Healthcare Context

Technology connects many different parts of a healthcare organization, including hospitals, clinics, pharmacies, laboratories, and billing departments, through systems like Electronic Health Records. This integration allows healthcare professionals to access and share patient information quickly, which improves coordination and decision-making for patient care. The advantages of technology in health care include better communication between providers, faster access to patient records, and improved patient safety through fewer errors and duplicate tests. The risks include privacy and security concerns such as data breaches, as well as the higher costs of implementation and regular maintenance.

Overall, technology that helps integrate data has its ups and downs. Integrated data provides a complete view of a patient’s health history, helping clinicians make better decisions, detect diseases earlier, personalize treatments, and monitor patient outcomes more effectively. However, poor-quality, outdated, or inaccurate data can lead to incorrect diagnoses or treatments. Information overload can also set in quickly and make it difficult for clinicians to identify the most important details, potentially affecting patient care.

Data Visualizations

VISUALIZATION 1: TWO-TABLE JOIN:

-- Length of stay in days: julianday() turns each timestamp into a day number
SELECT admissions.subject_id, insurance, CAST(julianday(dischtime) - julianday(admittime) AS REAL) AS length_of_stay
FROM admissions
INNER JOIN patients
ON patients.subject_id = admissions.subject_id
WHERE insurance IS NOT NULL

# One box per insurance type, showing the spread of length of stay
ggplot(data = myquery10,
       aes(x = insurance, y = length_of_stay)) +
  geom_boxplot()

VISUALIZATION 1: EXPLANATION:

I created this visualization to show admissions data broken out by insurance type, which is the kind of comparison hospitals use to reveal operational patterns for resource planning and cost analysis.

I used ‘julianday’ to convert the two date columns into numbers, so subtracting them gives a duration in days. I believe this is the same logic as the ‘cast’ trick from the practice assignment, just applied to dates instead of integers.
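The date math can be checked on its own, outside the report's R and SQL setup. The sketch below is a minimal illustration that uses Python's built-in sqlite3 module only so that it is self-contained; the two timestamps are made up for the example and are not taken from MIMIC-III.

```python
import sqlite3

# In-memory SQLite database; no MIMIC-III tables are needed for this check.
conn = sqlite3.connect(":memory:")

# Hypothetical admit and discharge timestamps, six and a half days apart.
row = conn.execute(
    "SELECT julianday('2101-10-26 12:00:00') - julianday('2101-10-20 00:00:00')"
).fetchone()
los_days = row[0]

# julianday() returns a fractional day number, so the difference is in days.
print(type(los_days).__name__, los_days)
conn.close()
```

Because the difference already comes back as a floating-point number of days, the CAST to REAL in the query appears to be a harmless safeguard rather than a requirement.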

The boxplot fits because length of stay is continuous, quantitative data, and I am comparing it across a qualitative grouping like ‘insurance’, which is exactly the use case the reading described.

Data Readings: One big thing that stands out is the outlier near 120 days under private insurance, a massively extended stay that pulls attention away from the rest of the data. Medicare shows two moderate outliers around 35-40 days, but its stays are otherwise tightly grouped around the median. Medicaid and private insurance have similar median stays, despite covering very different patient populations.

VISUALIZATION 2: THREE-TABLE JOIN:

-- ICU length of stay in days, measured from ICU intime to outtime
SELECT patients.subject_id, admission_type, CAST(julianday(icustays.outtime) - julianday(icustays.intime) AS REAL) AS icu_los
FROM patients
INNER JOIN admissions
ON patients.subject_id = admissions.subject_id
INNER JOIN icustays
ON patients.subject_id = icustays.subject_id

# One box per admission type, showing the spread of ICU stays
ggplot(data = myquery11,
       aes(x = admission_type, y = icu_los)) +
  geom_boxplot()

VISUALIZATION 2: EXPLANATION:

I used this visualization to show a different set of data that is vital to an efficient hospital. ICU bed turnover rates are a pain point for some higher foot-traffic hospitals, so understanding who is moving in and out of the ICU will help hospitals turn over beds more quickly while still being safe. If emergency admissions show longer or more variable ICU stays than elective ones, that is a direct input into staffing and bed-capacity forecasting.

I reused the date-math trick from the length-of-stay query and applied it to intime/outtime from ‘icustays’ instead of admittime/dischtime from ‘admissions’. So now I am measuring time in the ICU specifically, not the whole hospital stay. The three-table join chains ‘patients’, ‘admissions’, and ‘icustays’, all linked by subject_id, which is exactly like the Double Join example in the practice.
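One caveat worth checking with a three-table join on subject_id alone: a patient with more than one admission and more than one ICU stay produces every admission paired with every ICU stay. The sketch below is a toy illustration with made-up rows, not MIMIC-III data, and uses Python's built-in sqlite3 module so it runs on its own. It assumes the hadm_id admission key, which the MIMIC-III ADMISSIONS and ICUSTAYS tables both carry, is available as an alternative join column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients   (subject_id INTEGER);
CREATE TABLE admissions (subject_id INTEGER, hadm_id INTEGER, admission_type TEXT);
CREATE TABLE icustays   (subject_id INTEGER, hadm_id INTEGER, intime TEXT, outtime TEXT);

-- Hypothetical patient with two admissions, each with one ICU stay.
INSERT INTO patients VALUES (1);
INSERT INTO admissions VALUES (1, 100, 'EMERGENCY'), (1, 101, 'ELECTIVE');
INSERT INTO icustays VALUES
  (1, 100, '2101-01-01 00:00:00', '2101-01-03 00:00:00'),
  (1, 101, '2101-06-01 00:00:00', '2101-06-02 00:00:00');
""")

# Join on subject_id only: each admission is paired with every ICU stay.
by_subject = conn.execute("""
SELECT admission_type, julianday(outtime) - julianday(intime) AS icu_los
FROM patients
INNER JOIN admissions ON patients.subject_id = admissions.subject_id
INNER JOIN icustays   ON patients.subject_id = icustays.subject_id
""").fetchall()

# Join ICU stays to the admission they belong to via hadm_id.
by_admission = conn.execute("""
SELECT admission_type, julianday(outtime) - julianday(intime) AS icu_los
FROM patients
INNER JOIN admissions ON patients.subject_id = admissions.subject_id
INNER JOIN icustays   ON admissions.hadm_id = icustays.hadm_id
""").fetchall()

print(len(by_subject), len(by_admission))
conn.close()
```

In this toy case the subject_id join returns four rows for two real ICU stays, so an elective admission can end up labeled with an emergency stay's duration. Whether this affects the real query depends on how many patients have repeat admissions.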

The boxplot once again fits well here, since ‘admission_type’ has only a few categories and I wanted a clean read on the medians and the differences in spread between them. Emergency admissions, for example, often show very different ICU duration patterns than elective ones.

Data Reading: Emergency admissions show the widest spread, with numerous high outliers stretching up to 30 days or more. The median ICU stay for emergency admissions still looks short, however. Elective and urgent admissions are tightly clustered near zero, which suggests that these are typically short, predictable stays. The urgent category looks very flat, which usually points to a very small sample size for that group rather than a true finding.
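The small-sample suspicion about the urgent group can be checked by counting rows per category before reading much into the shape of a box. The sketch below shows the idea on a made-up admissions table, not the MIMIC-III data, using Python's built-in sqlite3 module so it is self-contained; the same GROUP BY query could be run against the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE admissions (subject_id INTEGER, admission_type TEXT)")

# Hypothetical rows: urgent admissions are deliberately rare here.
conn.executemany(
    "INSERT INTO admissions VALUES (?, ?)",
    [
        (1, "EMERGENCY"), (2, "EMERGENCY"), (3, "EMERGENCY"),
        (4, "ELECTIVE"), (5, "ELECTIVE"),
        (6, "URGENT"),
    ],
)

# Count how many rows feed each box in the boxplot.
counts = dict(
    conn.execute(
        "SELECT admission_type, COUNT(*) FROM admissions GROUP BY admission_type"
    ).fetchall()
)
print(counts)
conn.close()
```

A group with only a handful of rows will draw as a nearly flat box no matter what the underlying pattern is, so the count is worth reporting next to the plot.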

Recommendations for Industry

Based on the ICU length-of-stay and insurance length-of-stay analyses above, a few operational recommendations emerge for healthcare administrators.

First, administrators should build emergency-admission capacity models around variability, not averages. Emergency admissions showed a dramatically wider and more right-skewed distribution of ICU stays than elective or urgent cases, with multiple long-tail outliers extending past 30 days. Administrators who rely on average length of stay to forecast ICU bed needs are likely to understaff for emergency surges. I recommend that capacity planning use percentile-based models (such as the 90th percentile stay length) rather than the mean when budgeting for emergency-driven ICU demand. It is a “would rather have it and not need it than need it and not have it” situation.
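To make the mean-versus-percentile point concrete, here is a minimal sketch using Python's standard statistics module. The ICU stay lengths are invented to mimic a right-skewed pattern and are not values from the MIMIC-III query; the "inclusive" quantile method is one of several reasonable conventions.

```python
import statistics

# Hypothetical ICU stays in days: mostly short, with a long right tail.
icu_los_days = [0.8, 1.1, 1.3, 1.6, 2.0, 2.4, 3.1, 4.5, 12.0, 31.0]

mean_los = statistics.mean(icu_los_days)
median_los = statistics.median(icu_los_days)

# quantiles(n=10) returns the nine decile cut points; the last is the 90th percentile.
p90_los = statistics.quantiles(icu_los_days, n=10, method="inclusive")[-1]

print(f"median={median_los:.2f}  mean={mean_los:.2f}  p90={p90_los:.2f}")
```

With skewed data like this, the mean sits well above the median but still well below the 90th percentile, so a plan sized to the mean would fall short during the longest stays.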

Next, hospitals should conduct targeted case reviews of extreme-outlier stays. The single 120-day private-insurance stay and the 35-40 day Medicare outliers represent a disproportionate cost and bed-utilization risk relative to typical stays. Rather than treating these as just noise, I recommend a root-cause review that flags any stay beyond a defined threshold for a case management audit. These cases may point to discharge-planning bottlenecks, post-acute care shortages, or authorization delays that are fixable from an operations standpoint. Addressing them would improve many facilities’ efficiency and allow more patients to come through.
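One possible way to define the review threshold is the upper fence a boxplot uses, the third quartile plus 1.5 times the interquartile range. The sketch below applies that rule to made-up stays. The stay identifiers and durations are hypothetical, and the quartile convention here may differ slightly from the one ggplot2 uses, so treat it as an illustration of the idea rather than a reproduction of the plots.

```python
import statistics

# Hypothetical (stay_id, length_of_stay_days) pairs, not MIMIC-III data.
stays = [
    ("A", 3.0), ("B", 4.5), ("C", 5.0), ("D", 6.0),
    ("E", 7.5), ("F", 9.0), ("G", 38.0), ("H", 120.0),
]

los = [days for _, days in stays]

# Quartiles of the stay lengths; the middle cut point (median) is not needed.
q1, _, q3 = statistics.quantiles(los, n=4, method="inclusive")

# Upper fence: stays beyond this are flagged for case management review.
upper_fence = q3 + 1.5 * (q3 - q1)
flagged = [stay_id for stay_id, days in stays if days > upper_fence]

print(f"upper fence = {upper_fence:.2f} days, flagged = {flagged}")
```

A fixed cutoff chosen by case managers (for example, a set number of days) would work the same way; only the `upper_fence` line would change.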

References

Al Kuwaiti, Ahmed, Khalid Nazer, Abdullah Al-Reedy, Shaher Al-Shehri, Afnan Al-Muhanna, Arun Vijay Subbarayalu, Dhoha Al Muhanna, and Fahad A Al-Muhanna. 2023. “A Review of the Role of Artificial Intelligence in Healthcare.” Journal of Personalized Medicine 13 (6): 951.

Gupta, Nancy Sanjay, and Pravir Kumar. 2023. “Perspective of Artificial Intelligence in Healthcare Data Management: A Journey Towards Precision Medicine.” Computers in Biology and Medicine 162: 107051.

Raghupathi, Wullianallur, and Viju Raghupathi. 2014. “Big Data Analytics in Healthcare: Promise and Potential.” Health Information Science and Systems 2 (1): 3.

Siegel, Mark G, Michael J Rossi, and James H Lubowitz. 2024. “Artificial Intelligence and Machine Learning May Resolve Health Care Information Overload.” Arthroscopy: The Journal of Arthroscopic & Related Surgery 40 (6): 1721–1723.

Sutton, Reed T, David Pincock, Daniel C Baumgart, Daniel C Sadowski, Richard N Fedorak, and Karen I Kroeker. 2020. “An Overview of Clinical Decision Support Systems: Benefits, Risks, and Strategies for Success.” NPJ Digital Medicine 3 (1): 17.