Analysis Report Two - Data, Data Everywhere

Author

James Moore

Executive Summary

This report discusses how healthcare organizations can use large quantities of data, and how that same data can cause trouble for them. Using the MIMIC-III database, the assigned readings, and external research, I analyze and build examples with SQL queries and ggplot that show how large healthcare conglomerates can draw on data from many different sectors to improve decision making and patient care across the organization. The analysis looks at sepsis admissions and the organisms most often associated with them, along with ICU stays and how the length of a stay varies by admission type. These two topics go hand in hand, and together they show why an organized place to store data is essential: each visual required pulling data from several different tables to get results.

Introduction

The content from this week focused on the challenges healthcare organizations face with the large amount of data they take in. With so much data being created at all times, these organizations often have a difficult time storing it. Through medical records, laboratory and administrative systems, and electronic records, hospitals are constantly collecting data that can be used to improve care and operations, but that also creates storage and management struggles. The readings also emphasized the growth of artificial intelligence, wearable technology, and predictive analytics, and how these are adding to the volume of data being created and collected. Healthcare organizations and their leaders must find ways to use the data at their disposal while making sure it is stored and organized properly, so that the right clinicians and workers can easily access the most important and relevant information and avoid a cluttered mess that will ultimately cause issues and confusion for everyone involved.

The Healthcare Context

In healthcare today, successful leaders and organizations rely on data that comes from many different areas. Admissions, medical and laboratory records, pharmacy systems, and patient monitoring technology are just some of the sources healthcare organizations pull from to get the information essential to their operation. These sources ultimately help a healthcare organization improve patient outcomes, operational efficiency, and cost management (Ajegbile et al. 2024). Clinical decision support systems (CDSS) in particular were discussed as a way healthcare organizations are turning mass amounts of data into valuable information and recommendations. Research has shown that clinical decision support systems not only support decision making, but also improve patient safety and the quality of healthcare (Sutton et al. 2020). Data privacy and cybersecurity are very important, but they are hard to handle properly when there is as much data as many healthcare organizations possess. However, an effective CDSS will not just help manage those issues; it will also create opportunities and ideas for using the data that may never have been thought of or proposed without it.

Because technology is such a focal point of healthcare and so important to this topic, I dove into one key feature that I think will revolutionize how data is used and managed: artificial intelligence (AI). AI can be an amazing tool when used properly, so extensive training would be necessary if healthcare leaders want to make it a mainstay. AI is, however, already rapidly transforming healthcare. It enables advances in diagnostics, personalizes medicine, improves treatment planning, and increases operational efficiency (Fahim et al. 2025). The overwhelming amount of data that comes through the healthcare industry can cause many issues, but through my research and reading I have noticed that there are ways to manage it, and AI seems to be the next big stepping stone toward properly managing and storing data. Its usage is only expected to grow. The market for AI-driven healthcare technology is anticipated to go from 11.2 billion in 2023 to 427.5 billion by 2032 (Faiyazuddin et al. 2025). Data integration provides a plethora of benefits, but it also introduces many challenges. By using tools such as AI, the healthcare industry can blossom not only in its operations, but also in the care it gives and the overall patient experience.

Data Visualizations

Visualization One - Two Table Join

SELECT microbiologyevents.org_name
FROM microbiologyevents
INNER JOIN admissions
ON microbiologyevents.hadm_id = admissions.hadm_id
WHERE admissions.diagnosis LIKE '%SEPSIS%'
AND microbiologyevents.org_name IS NOT NULL

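library(dplyr)
library(ggplot2)

# myquery1 is assumed to hold the result of the SQL query above.
# Count each organism and keep the five most frequent.
myquery1_top <- myquery1 %>%
  count(org_name, sort = TRUE) %>%
  slice_head(n = 5)

# Horizontal bar chart of the top five organisms
ggplot(data = myquery1_top,
       aes(y = org_name, x = n)) +
  geom_col()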

After going through the practice and learning more about sepsis, I decided to build my first visualization around the admissions and microbiology data to show which organisms are most commonly found in sepsis patients. I witnessed first hand my sister's hospital admission for sepsis, so this visualization hits home for me. By using the data they accumulate from hospitals and other sources, healthcare organizations can better understand infection trends and make better decisions about how to prepare for and handle them. This visualization uses sepsis as just one of many potential examples of how a disease can be attacked more proactively, improving patient care and operations at the same time. Recognizing which bacteria or other organisms show up most often for a given disease helps with planning treatments and with monitoring and caring for the patients who have it.

I joined the microbiologyevents table to the admissions table using hadm_id. This gave me hospital admission data linked directly to specific microbiology results. Using the WHERE clause together with LIKE '%SEPSIS%', I then limited my results to admissions whose diagnosis mentions sepsis. In R, I counted each organism name and sorted the names from most common to least common. After sorting, I took the top five and graphed them into the visual that you now see above.
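The join, filter, and count described above can also be sketched end to end in one SQL statement. This is only a minimal illustration, assuming a small in-memory SQLite database: the rows, the placeholder organism names, and the helper top_organisms are all made up for the example and are not MIMIC-III data or part of my analysis. Here the counting is done in SQL with GROUP BY rather than in R.

```python
import sqlite3

# Toy tables standing in for MIMIC-III; every value below is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE admissions (hadm_id INTEGER, diagnosis TEXT);
CREATE TABLE microbiologyevents (hadm_id INTEGER, org_name TEXT);
INSERT INTO admissions VALUES
    (1, 'SEPSIS'), (2, 'UROSEPSIS'), (3, 'PNEUMONIA');
INSERT INTO microbiologyevents VALUES
    (1, 'ORGANISM A'), (1, 'ORGANISM B'),
    (2, 'ORGANISM A'), (2, NULL),
    (3, 'ORGANISM C');
""")


def top_organisms(conn, limit=5):
    """Return (org_name, count) pairs for sepsis admissions, most common first."""
    query = """
        SELECT m.org_name, COUNT(*) AS n
        FROM microbiologyevents AS m
        INNER JOIN admissions AS a ON m.hadm_id = a.hadm_id
        WHERE a.diagnosis LIKE '%SEPSIS%'   -- also matches e.g. UROSEPSIS
          AND m.org_name IS NOT NULL        -- drop cultures with no organism
        GROUP BY m.org_name
        ORDER BY n DESC, m.org_name         -- name breaks ties deterministically
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


result = top_organisms(conn)
print(result)
```

One thing the toy data makes visible is that the '%SEPSIS%' pattern matches any diagnosis containing that text, so an admission recorded as UROSEPSIS is counted as well.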

Visualization Two - Three Table Join

-- Join ICU stays on hadm_id so each stay is matched only to its own admission
SELECT patients.subject_id,
       admissions.admission_type,
       CAST(julianday(icustays.outtime) - julianday(icustays.intime) AS REAL) AS icu_los
FROM patients
INNER JOIN admissions
ON patients.subject_id = admissions.subject_id
INNER JOIN icustays
ON admissions.hadm_id = icustays.hadm_id

# myquery2 is assumed to hold the result of the SQL query above.
ggplot(data = myquery2,
       aes(x = admission_type, y = icu_los)) +
  geom_boxplot()

After diving into sepsis and the specific bacteria associated with those admissions, I wanted to take a look at ICU stays, as I feel the two go hand in hand. In the first visual I discussed how that information could be used to raise the quality of patient care and make operations in a healthcare organization more effective. ICU length of stay brings the same kind of value in its own way. If a healthcare organization takes its ICU admission data and uses it properly, it will understand how much traffic the ICU sees, why patients come in, and how long each stay typically lasts. With that knowledge, organizations will be able to treat patients better by knowing how to allocate resources, and they will also make the process from admission to discharge better and safer.

Since admission type has only a few categories, I felt a box plot fit this scenario perfectly. I drew on what I learned from earlier queries to calculate the length of each ICU stay in days from its in time and out time (intime and outtime), which I wanted to look at after my research and my experience with sepsis and the many ICU stays I have seen because of it. I found that the boxes for elective and urgent admissions are very tight, which tells me that these stays are usually more predictable and not as long. However, the box for urgent admissions is very flat, which indicates there is not much of a sample to go off of. The emergency admissions, on the other hand, are spread out much more widely, which leads me to believe they are often more unpredictable than the other admission types.
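The length-of-stay calculation, and the sample sizes behind each box, can be checked with a small sketch. It assumes a toy in-memory SQLite database: the rows, dates, and the helper los_by_type are made up for illustration and are not MIMIC-III data. It also compares two possible join keys, because a patient with more than one admission is paired with every one of their ICU stays when the join uses subject_id, while hadm_id keeps each stay with its own admission.

```python
import sqlite3
from statistics import median

# Toy tables standing in for MIMIC-III; every value below is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE admissions (subject_id INTEGER, hadm_id INTEGER, admission_type TEXT);
CREATE TABLE icustays (subject_id INTEGER, hadm_id INTEGER, intime TEXT, outtime TEXT);
INSERT INTO admissions VALUES
    (10, 100, 'EMERGENCY'), (10, 101, 'ELECTIVE'), (11, 102, 'EMERGENCY');
INSERT INTO icustays VALUES
    (10, 100, '2100-01-01 00:00:00', '2100-01-05 00:00:00'),
    (10, 101, '2100-06-01 00:00:00', '2100-06-02 00:00:00'),
    (11, 102, '2100-03-01 00:00:00', '2100-03-03 00:00:00');
""")


def los_by_type(conn, join_key):
    """Map each admission type to its sorted ICU lengths of stay in days."""
    if join_key not in ("subject_id", "hadm_id"):
        raise ValueError("join_key must be 'subject_id' or 'hadm_id'")
    query = f"""
        SELECT a.admission_type,
               julianday(i.outtime) - julianday(i.intime) AS icu_los
        FROM admissions AS a
        INNER JOIN icustays AS i ON a.{join_key} = i.{join_key}
    """
    groups = {}
    for admission_type, icu_los in conn.execute(query):
        groups.setdefault(admission_type, []).append(round(icu_los, 2))
    return {t: sorted(v) for t, v in groups.items()}


by_stay = los_by_type(conn, "hadm_id")        # one row per ICU stay
by_patient = los_by_type(conn, "subject_id")  # stays repeated across admissions

# Sample size and median per admission type, the numbers a box plot summarizes
for admission_type, values in by_stay.items():
    print(admission_type, "n =", len(values), "median =", median(values))
```

In this toy data the subject_id join gives the elective group a four-day stay that actually belongs to an emergency admission, so the choice of key can change both the sample size and the spread that a box plot shows.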

Recommendations for Industry

After diving into sepsis and ICU admissions and their lengths of stay, I have come to a few recommendations for the healthcare industry. My first recommendation is that, to be successful and efficient, leaders in the healthcare space must invest in data systems that store the data they gain from every part of the organization. From microbiology to ICU stays, these organizations should systematically put their data in a place where it is accessible and can be used to improve whatever needs improving. The SQL queries in this report show that the data came from many different tables and sources. That should indicate to healthcare leaders the importance of having the data from different parts of their organization all placed in one area. A cohesive data collection that ties together data from every part of the organization not only makes things more efficient, but also helps provide patients with more accurate and faster care.

The next recommendation I have is to embrace AI and train people to use it to help sort through data. As stated before, the use of AI in healthcare is only going to keep increasing. If organizations begin embracing that and investing in teaching it, then not only will operational efficiency rise, but decision making and resource allocation will become much easier for questions such as which bacteria cause the most sepsis admissions, or which ICU admissions are most common and have the longest or shortest duration.

Lastly, I would recommend collecting better data and larger samples around ICU admissions and stays. In the visual I provided, some admission types either did not have many samples, or their samples were very scattered and hard to predict. For ICU rooms and stays, making sure the data is mapped out and as conclusive as possible is very important, because it helps pinpoint how long to expect each admission type to occupy a room. That in turn helps an organization prepare for other potential ICU patients and know how many it can take while still providing great care. If healthcare organizations do not prioritize this, I can see ICUs becoming chaotic and hectic very easily, and the quality of patient care will then decrease and cause many problems.

References

Ajegbile, Mojeed Dayo, Janet Aderonke Olaboye, Chikwudi Cosmos Maha, and GJWJBPHS Tamunobarafiri. 2024. “Integrating Business Analytics in Healthcare: Enhancing Patient Outcomes Through Data-Driven Decision Making.” World J Biol Pharm Health Sci 19 (1): 243–50.

Fahim, Yosri A, Ibrahim W Hasani, Samer Kabba, and Waleed Mahmoud Ragab. 2025. “Artificial Intelligence in Healthcare and Medicine: Clinical Applications, Therapeutic Advances, and Future Perspectives.” European Journal of Medical Research 30 (1): 848.

Faiyazuddin, Md, Syed Jalal Q Rahman, Gaurav Anand, Reyaz Kausar Siddiqui, Rachana Mehta, Mahalaqua Nazli Khatib, Shilpa Gaidhane, Quazi Syed Zahiruddin, Arif Hussain, and Ranjit Sah. 2025. “The Impact of Artificial Intelligence on Healthcare: A Comprehensive Review of Advancements in Diagnostics, Treatment, and Operational Efficiency.” Health Science Reports 8 (1): e70312.

Sutton, Reed T, David Pincock, Daniel C Baumgart, Daniel C Sadowski, Richard N Fedorak, and Karen I Kroeker. 2020. “An Overview of Clinical Decision Support Systems: Benefits, Risks, and Strategies for Success.” NPJ Digital Medicine 3 (1): 17.