Analysis Report Two - Data, Data Everywhere

Author

Maddy Calandro

Executive Summary

A huge amount of data from wearable technology, lab systems, pharmaceutical systems, insurance records, clinical decision support systems, electronic health records, and patient monitoring tools surrounds healthcare organizations. This offers many opportunities to improve care, but it also leads to problems if the data is inaccurate, overwhelming, incomplete, or poorly handled.

This report’s main concept is that healthcare data can only enhance decision-making through rigorous organization, connection, and interpretation. AI, wearable technology, and clinical decision support systems are examples of new technologies that can assist doctors and patients in making better decisions. But these technologies also add to the volume of data that medical professionals and healthcare institutions have to handle. More data does not always translate into better care without robust data governance and well-designed systems.

I made two data visualizations using the MIMIC-III clinical database to demonstrate how SQL joins link data from different healthcare tables. The first graphic compares the distributions of common vital signs using a two-table join between chartevents and d_items. The second graphic compares vital sign distributions by gender using a three-table join between chartevents, d_items, and patients.

Healthcare administrators should balance offensive and defensive data initiatives. Hospitals should use data to enhance clinical judgment, patient care, and operational effectiveness, but they also have to protect data quality, cut down on pointless alerts, educate users, and avoid overloading doctors with information.

Introduction

Healthcare is going through a huge data revolution, according to the assigned readings. More health information is now available to doctors and patients than ever before. Clinical decision support systems, wearable technology, health applications, artificial intelligence, and electronic health records all open up new avenues for collaborative decision-making. Doctors can utilize clinical tools and medical skills to assess health data that patients bring to appointments from their own devices.

The readings do, however, also show that more data does not always lead to better care. Patients may find it difficult to tell the difference between accurate and inaccurate health information. Doctors remain crucial because they can interpret complicated data and relate it to the patient’s actual clinical situation. This supports collaborative decision-making, where patients and physicians work together instead of the physician simply giving one-way instructions.

Healthcare big data is also challenging because it comes from many sources, including electronic records, clinical decision systems, monitoring devices, insurance records, labs, pharmacy systems, and wearable devices. These data sources can support clinical decision-making, population health management, disease surveillance, and operational improvement (Raghupathi and Raghupathi 2014). Clinical decision support systems are especially important because they combine patient information with clinical knowledge to support clinician decision-making (Sutton et al. 2020).

Newer research also connects directly to this topic. Khosravi et al. explain that AI tools are increasingly being used to support decision-making in healthcare by improving the quality, efficiency, and personalization of care (Khosravi et al. 2024). Elhaddad and Hamam describe AI-driven clinical decision support systems as tools that can improve clinician decision-making, but they also emphasize that these systems still need careful design and implementation (Elhaddad and Hamam 2024).

Overall, the readings show that healthcare organizations need a balanced data strategy. An offensive data strategy uses data to improve outcomes, efficiency, and innovation. A defensive data strategy protects data accuracy, privacy, security, and reliability. In healthcare, both are necessary because data directly affects patient care.

The Healthcare Context

By linking systems like admissions, patient records, laboratories, pharmacy, ICU monitoring, insurance, billing, and clinical decision support, technology can integrate many parts of a healthcare organization. Healthcare professionals can see a fuller picture of the patient and the business when these systems are connected.

There are multiple ways in which this integration can raise the standard of care. Connected data, for example, can help doctors spot trends in patients’ vital signs, identify risks early, prevent prescription mistakes, and choose the best course of action. It can also help administrators understand population health trends, ICU demand, patient volume, and resource utilization. From the standpoint of offensive data strategy, this enables the organization to turn raw data into valuable insights.

Additionally, technology can help doctors and patients make better decisions together. Patients can monitor heart rate, sleep, activity, and other health variables with wearables and health applications. When this data is interpreted correctly, it may motivate patients to take a more active role in their care. Doctors are still needed, however, to judge whether the data is clinically useful. A health app trend or wearable notification can be helpful, but it can also be misleading without medical context.

The risks and challenges are also important. Healthcare data may be biased, inaccurate, redundant, or incomplete. Physicians may begin to disregard warnings if clinical decision support systems generate too many of them. AI systems may produce suggestions that don’t apply to all patient populations if they rely on biased or incomplete data. Physicians may feel overburdened rather than assisted if wearable data floods the electronic medical record.

Strong data governance is necessary for healthcare organizations. They need clear standards for system design, training, accountability, privacy, access, and data quality. Data integration can enhance care only if the organization ensures that the data is reliable, helpful, and not overwhelming for physicians.

Data Visualizations

The MIMIC-III clinical database is used in the two visualizations below to demonstrate how joining tables increases the usefulness of healthcare data. Important data is typically spread across multiple tables in healthcare databases. For instance, the actual clinical value may be stored in one table, and its meaning may be explained in another.

Both visualizations use SQL INNER JOINS because the goal is to connect related information across tables. The first visualization uses a two-table join to connect clinical measurements with their item labels. The second visualization uses a three-table join to connect clinical measurements, item labels, and patient demographic information.
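To make the idea of an inner join concrete, here is a minimal sketch in base R. It assumes two small made-up data frames (chartevents_toy and d_items_toy are hypothetical stand-ins, not real MIMIC-III rows) and uses merge(), which by default keeps only the rows whose key appears in both tables, much like an INNER JOIN.

```r
# Toy stand-ins for the two tables; all values are invented for illustration.
chartevents_toy <- data.frame(
  subject_id = c(1, 1, 2, 3),
  itemid     = c(220045, 220210, 220045, 999999),  # 999999 has no label
  valuenum   = c(94, 15, 88, 42)
)
d_items_toy <- data.frame(
  itemid = c(220045, 220210, 220277),
  label  = c("Heart Rate", "Respiratory Rate", "O2 saturation pulseoxymetry")
)

# merge() with the default all = FALSE behaves like an INNER JOIN:
# only rows whose itemid exists in both data frames are kept.
joined_toy <- merge(chartevents_toy, d_items_toy, by = "itemid")
print(joined_toy)
```

In this sketch the row with the unknown itemid is dropped, and the label with no recorded values never appears. An inner join on the real tables would be expected to behave the same way.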

These examples also use the new geoms from this week’s practice. The first visualization uses a boxplot, which is useful for showing the spread, median, and outliers of numeric clinical data. The second visualization uses a violin plot, which is useful for comparing the distribution of a numeric variable across groups.

Visualization One - Two Table Join

This first visualization uses a two-table join between the chartevents table and the d_items table. The chartevents table contains the actual recorded clinical values, while the d_items table explains what each itemid means. This join is important because a numeric value by itself does not explain what clinical measurement was taken.

This example compares three common patient measurements: heart rate, respiratory rate, and oxygen saturation. I used CAST() to make sure the values are treated as numbers instead of text. I also filtered out missing values and values of zero or less so the visualization would be easier to interpret. Implausibly high values are not removed, so they can still appear as outliers.

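# Two-table join: attach each value in chartevents to its label in d_items.
# Item IDs: 220045 = heart rate, 220210 = respiratory rate, 220277 = oxygen saturation.
# LIMIT without ORDER BY returns an arbitrary subset of rows, not a random sample.
myquery1 <- dbGetQuery(mydb, '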
SELECT
  d_items.label,
  CAST(chartevents.valuenum AS INT) AS val
FROM chartevents
INNER JOIN d_items
ON chartevents.itemid = d_items.itemid
WHERE d_items.itemid IN (220045, 220210, 220277)
  AND chartevents.valuenum IS NOT NULL
  AND chartevents.valuenum > 0
LIMIT 10000;
')

head(myquery1)
                        label val
1            Respiratory Rate  15
2                  Heart Rate  94
3 O2 saturation pulseoxymetry 100
4                  Heart Rate  88
5            Respiratory Rate  15
6 O2 saturation pulseoxymetry  99

ggplot(data = myquery1,
       aes(x = label, y = val)) +
  geom_boxplot() +
  theme_minimal() +
  labs(
    title = "Distribution of Selected Patient Vital Signs",
    subtitle = "Two-table join using CHARTEVENTS and D_ITEMS",
    x = "Clinical Measurement",
    y = "Recorded Value",
    caption = "Source: MIMIC-III Clinical Database v1.4"
  )

Vital signs play a significant role in clinical monitoring and decision-making, which makes this visualization important. A boxplot is helpful here because it displays the median, spread, and potential outliers for each measurement. This kind of visualization can help healthcare administrators judge whether clinical data looks normal or whether there are anomalous results that require closer examination.
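The numbers behind a boxplot can also be computed directly. The sketch below assumes a small made-up vector of heart rates (hr_toy is hypothetical, with one deliberately extreme value) and uses base R's boxplot.stats(), which applies the usual 1.5 times the hinge spread rule to flag outliers.

```r
# Made-up heart rate values; 250 is deliberately extreme.
hr_toy <- c(62, 70, 74, 78, 80, 84, 88, 92, 96, 250)

# boxplot.stats() returns the five numbers a boxplot draws (lower whisker,
# lower hinge, median, upper hinge, upper whisker) and the flagged outliers.
hr_stats <- boxplot.stats(hr_toy)

print(hr_stats$stats)  # whisker, hinge, and median positions
print(hr_stats$out)    # values drawn as individual outlier points
```

A flagged value is only a prompt for review. Whether it is a data entry error or a real clinical event still has to be decided by someone who knows the context.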

Because hospitals require accurate and trustworthy clinical data before utilizing it in dashboards, alerts, or decision support systems, this relates to defensive data strategy. It also relates to offensive data strategy since healthcare executives can use cleaned and reliable data to find patient trends and enhance care.

Visualization Two - Three Table Join

This second visualization uses a three-table join between the chartevents, d_items, and patients tables. The chartevents table provides recorded clinical values, the d_items table identifies what the measurement is, and the patients table adds demographic information such as gender.

This example extends the practice by comparing several vital sign measurements by gender instead of looking at heart rate alone. I used CAST() again so the values are treated as numbers, and I filtered out missing values and values of zero or less so the visualization would be easier to read.

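# Three-table join: chartevents supplies values, d_items supplies labels,
# and patients supplies gender (matched on subject_id).
myquery2 <- dbGetQuery(mydb, '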
SELECT 
  patients.gender,
  d_items.label,
  CAST(chartevents.valuenum AS INT) AS val
FROM chartevents
INNER JOIN d_items
ON chartevents.itemid = d_items.itemid
INNER JOIN patients
ON chartevents.subject_id = patients.subject_id
WHERE d_items.itemid IN (220045, 220210, 220277)
  AND chartevents.valuenum IS NOT NULL
  AND chartevents.valuenum > 0
LIMIT 10000;
')

head(myquery2)
  gender                       label val
1      F            Respiratory Rate  15
2      F                  Heart Rate  94
3      F O2 saturation pulseoxymetry 100
4      F                  Heart Rate  88
5      F            Respiratory Rate  15
6      F O2 saturation pulseoxymetry  99

ggplot(data = myquery2,
       aes(x = gender, y = val, fill = gender)) +
  geom_violin() +
  theme_minimal() +
  facet_wrap(~ label) +
  labs(
    title = "Vital Sign Distributions by Gender",
    subtitle = "Three-table join using CHARTEVENTS, D_ITEMS, and PATIENTS",
    x = "Gender",
    y = "Recorded Value",
    fill = "Gender",
    caption = "Source: MIMIC-III Clinical Database v1.4"
  )

This visualization is relevant for healthcare organizations because it shows how clinical data and demographic data can be combined to compare patient groups. The violin plot is useful because it shows the full distribution of values rather than only showing an average or count. This helps viewers see where values cluster and whether one group has more variation than another.
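A violin plot can be paired with a short numeric summary of each group. The sketch below assumes a made-up data frame (vitals_toy is hypothetical, not query output) and uses base R's tapply() to compute the median and interquartile range per gender, which gives a rough numeric version of where values cluster and how much they vary.

```r
# Invented heart rate readings for two groups.
vitals_toy <- data.frame(
  gender = c("F", "F", "F", "M", "M", "M"),
  val    = c(70, 80, 90, 60, 85, 110)
)

# Median shows where each group's values center;
# IQR shows how spread out the middle half of the values is.
group_median <- tapply(vitals_toy$val, vitals_toy$gender, median)
group_iqr    <- tapply(vitals_toy$val, vitals_toy$gender, IQR)

print(group_median)
print(group_iqr)
```

In this toy data the second group has the wider spread, which is the kind of difference a wider violin would suggest visually.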

This example also demonstrates the importance of properly interpreting healthcare data. A difference in vital sign distribution does not automatically prove a clinical cause. Other factors such as age, diagnosis, medications, ICU status, and severity of illness may also affect these measurements. For this reason, clinical judgment should be supported by healthcare data rather than replaced by it.

Recommendations for Industry

Strong data governance should be the first priority for healthcare management. Across departments, data must be accurate, consistent, secure, and well-defined. This is particularly crucial when data from several systems is combined for analysis.

Second, healthcare institutions should strengthen data quality checks. Data should be examined for missing values, inaccurate values, duplicate records, and unrealistic outliers before being used in dashboards, clinical decision support tools, or AI systems. Poor data can lead to poor decisions.
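A basic version of such a check can be sketched in a few lines of base R. The function name check_quality, the toy data, and the lower and upper bounds below are all hypothetical and chosen only for illustration. They are not clinical thresholds.

```r
# Count common data-quality problems in one numeric column.
# lower and upper are illustrative plausibility bounds, not clinical limits.
check_quality <- function(df, value_col, lower, upper) {
  values <- df[[value_col]]
  c(
    n_rows       = nrow(df),
    n_missing    = sum(is.na(values)),
    n_duplicates = sum(duplicated(df)),  # rows identical to an earlier row
    n_out_range  = sum(!is.na(values) & (values < lower | values > upper))
  )
}

# Invented readings: one duplicate row, one missing value, two implausible values.
readings_toy <- data.frame(
  subject_id = c(1, 1, 2, 3, 4),
  valuenum   = c(94, 94, NA, 0, 400)
)

report <- check_quality(readings_toy, "valuenum", lower = 20, upper = 300)
print(report)
```

Running a report like this before data reaches a dashboard or alerting tool gives reviewers a quick count of records that need attention.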

Third, hospitals should design clinical decision support systems that assist doctors without overwhelming them. Prioritizing alerts helps keep clinicians from being flooded with low-value warnings. Too many such alerts can cause alert fatigue, which could lead to crucial warnings being disregarded.

Fourth, rather than taking the place of medical judgment, healthcare institutions should embrace AI and analytics as support tools. Although AI can assist with pattern recognition, information summarization, and early diagnosis, doctors must still verify the accuracy and context of the data.

Fifth, hospital executives should provide administrators and clinical personnel with training on responsible data usage. Employees should understand the need for proper data entry as well as how data influences operational choices, quality reporting, and patient care.

Lastly, healthcare institutions must encourage patients and doctors to make decisions together. Wearables and health applications can provide valuable patient-generated data, but it must be carefully reviewed and monitored. Technology should be used to improve rather than to replace the patient-physician interaction.

References

Elhaddad, Malek, and Sara Hamam. 2024. “AI-Driven Clinical Decision Support Systems: An Ongoing Pursuit of Potential.” Cureus 16 (4): e57728.
Khosravi, Mohsen, Zahra Zare, Seyyed Morteza Mojtabaeian, and Reyhane Izadi. 2024. “Artificial Intelligence and Decision-Making in Healthcare: A Thematic Analysis of a Systematic Review of Reviews.” Health Services Research and Managerial Epidemiology 11: 23333928241234863.
Raghupathi, Wullianallur, and Viju Raghupathi. 2014. “Big Data Analytics in Healthcare: Promise and Potential.” Health Information Science and Systems 2 (1): 3.
Sutton, Reed T, David Pincock, Daniel C Baumgart, Daniel C Sadowski, Richard N Fedorak, and Karen I Kroeker. 2020. “An Overview of Clinical Decision Support Systems: Benefits, Risks, and Strategies for Success.” NPJ Digital Medicine 3 (1): 17.