Analysis Report Four - Health Privacy and Data Profiling

Author

Tristan Worlock

Executive Summary

This week’s assigned readings examined how healthcare organizations continue to adopt digital technologies such as EHRs, telehealth platforms, AI, and cloud-based health information systems to improve patient care and operational efficiency (Davidson & Winter, 2025). The readings show that while these technologies have improved communication, access to patient information, and clinical decision-making, they have also introduced significant challenges related to cybersecurity, patient privacy, clinical workload, and system interoperability (Davidson & Winter, 2025).

The most important theme I found across the readings is that technology alone will not improve healthcare outcomes. Organizations must also invest in cybersecurity, staff training, privacy protections, and change management (Kruse et al., 2017). Recent events, including the Change Healthcare ransomware attack and increasing federal oversight of digital health companies, demonstrate that healthcare organizations must treat data security and governance as strategic priorities rather than purely technical issues (Kruse et al., 2017).

Introduction

The assigned readings this week focused on how healthcare organizations use data as a strategic asset while addressing the challenges that accompany digital transformation. A major theme of the readings is the widespread adoption of EHRs, which the Health Information Technology for Economic and Clinical Health (HITECH) Act accelerated through financial incentives for hospitals (Adler-Milstein & Jha, 2017). Although EHRs improve information sharing and care coordination, healthcare professionals continue to report concerns about usability, interoperability, documentation burden, and workflow disruptions (Adler-Milstein & Jha, 2017).

The readings also introduce broader data strategy concepts involving artificial intelligence, telehealth, privacy, and cybersecurity. AI can improve clinical decision-making, but it also creates a grey area, since it should support, not replace, professional judgment (Davidson & Winter, 2025). Telehealth expands access to care, but it also raises concerns about how sensitive patient information is collected and shared (Davidson & Winter, 2025). Across all of the articles, effective data governance depends on transparency, strong cybersecurity practices, regulatory compliance, and organizational leadership that encourages responsible technology adoption (Davidson & Winter, 2025).

The Healthcare Context

Today’s healthcare organizations face the challenge of adopting digital technologies while protecting sensitive patient information. Although EHRs, telehealth, and AI have improved patient care, communication, and operational efficiency, they have also introduced concerns related to cybersecurity, data privacy, interoperability, and clinician workload. Recent events such as the 2024 Change Healthcare cyberattack demonstrated how a single cybersecurity breach can disrupt healthcare services nationwide and expose the personal information of millions of patients (Kruse et al., 2017).

In addition, increased FTC enforcement against companies like BetterHelp highlights the growing need for stronger privacy protections and transparent data practices (Kruse et al., 2017). Overall, as healthcare continues its digital transformation, organizations must balance technological innovation with effective cybersecurity, regulatory compliance, and strong data governance to protect patients while delivering high-quality care.

Pre-Visualization Table:

# Assumes `mydb` is an open DBI connection to the SQLite database used in this report
library(DBI)
candidates <- dbGetQuery(mydb, "
        SELECT p.subject_id, p.gender, p.dob, p.dod, a.hadm_id, a.admittime, a.dischtime, a.deathtime, a.admission_type, a.marital_status, a.ethnicity
        FROM patients p
        JOIN admissions a ON p.subject_id = a.subject_id
        WHERE p.expire_flag = 1
        AND a.deathtime IS NOT NULL
        ORDER BY a.dischtime DESC
        LIMIT 10
        ")
candidates
   subject_id gender                 dob                 dod hadm_id
1       41976      M 2136-07-28 00:00:00 2202-12-05 00:00:00  153826
2       41976      M 2136-07-28 00:00:00 2202-12-05 00:00:00  149469
3       41976      M 2136-07-28 00:00:00 2202-12-05 00:00:00  145024
4       41976      M 2136-07-28 00:00:00 2202-12-05 00:00:00  151798
5       41976      M 2136-07-28 00:00:00 2202-12-05 00:00:00  179418
6       41976      M 2136-07-28 00:00:00 2202-12-05 00:00:00  155297
7       41976      M 2136-07-28 00:00:00 2202-12-05 00:00:00  125013
8       41976      M 2136-07-28 00:00:00 2202-12-05 00:00:00  174863
9       41976      M 2136-07-28 00:00:00 2202-12-05 00:00:00  180546
10      41976      M 2136-07-28 00:00:00 2202-12-05 00:00:00  130681
             admittime           dischtime deathtime admission_type
1  2202-10-03 01:45:00 2202-10-11 16:30:00                EMERGENCY
2  2202-09-16 21:56:00 2202-09-23 16:20:00                EMERGENCY
3  2202-05-01 22:00:00 2202-05-04 18:42:00                EMERGENCY
4  2202-02-15 19:01:00 2202-02-19 16:42:00                EMERGENCY
5  2201-12-31 19:19:00 2202-01-03 17:55:00                EMERGENCY
6  2201-11-16 23:00:00 2201-11-19 16:30:00                EMERGENCY
7  2201-09-28 16:47:00 2201-10-01 15:53:00                EMERGENCY
8  2201-08-10 23:00:00 2201-08-13 16:55:00                EMERGENCY
9  2201-05-12 10:49:00 2201-05-19 14:04:00                EMERGENCY
10 2200-10-29 20:46:00 2200-11-03 18:45:00                EMERGENCY
   marital_status                      ethnicity
1         MARRIED HISPANIC/LATINO - PUERTO RICAN
2         MARRIED HISPANIC/LATINO - PUERTO RICAN
3         MARRIED HISPANIC/LATINO - PUERTO RICAN
4         MARRIED HISPANIC/LATINO - PUERTO RICAN
5         MARRIED HISPANIC/LATINO - PUERTO RICAN
6         MARRIED HISPANIC/LATINO - PUERTO RICAN
7         MARRIED HISPANIC/LATINO - PUERTO RICAN
8         MARRIED HISPANIC/LATINO - PUERTO RICAN
9         MARRIED HISPANIC/LATINO - PUERTO RICAN
10        MARRIED HISPANIC/LATINO - PUERTO RICAN
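One detail in this output is worth checking before treating these admissions as terminal stays: the deathtime column prints blank for every row, even though the query filters on `a.deathtime IS NOT NULL`. A true NULL would come back as `NA` rather than blank, so the values appear to be stored as empty strings, which `IS NOT NULL` does not exclude. Each row also lists a dod of 2202-12-05, later than the latest dischtime shown (2202-10-11 16:30:00). In SQL, a filter such as `NULLIF(TRIM(a.deathtime), '') IS NOT NULL` would treat blanks as missing. The base-R sketch below does the same normalization after the query; `blank_to_na` is a hypothetical helper, not part of DBI or any package.

```r
# Hypothetical helper: convert empty or whitespace-only strings to NA so that
# blank timestamps behave like SQL NULLs in later filtering.
blank_to_na <- function(x) {
  x[!is.na(x) & trimws(x) == ""] <- NA
  x
}

# Example usage on the query result above (assumes `candidates` exists):
# candidates$deathtime <- blank_to_na(candidates$deathtime)
# in_hospital_deaths <- candidates[!is.na(candidates$deathtime), ]
```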

Data Visualizations

VISUALIZATION #1

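# Pull the ICU stays for admission 153826; unit_duration is in days
# because it is a difference of julianday() values
icu_timeline <- dbGetQuery(mydb, "
        SELECT i.icustay_id, i.first_careunit, i.intime, i.outtime,
               julianday(i.outtime) - julianday(i.intime) AS unit_duration
        FROM icustays i
        WHERE i.hadm_id = 153826
        ORDER BY i.intime
        ")
# RSQLite returns timestamps as text by default; convert them so ggplot
# draws a continuous time axis
icu_timeline$intime  <- as.POSIXct(icu_timeline$intime, tz = "UTC")
icu_timeline$outtime <- as.POSIXct(icu_timeline$outtime, tz = "UTC")

library(ggplot2)
ggplot(data = icu_timeline,
       aes(x = intime, xend = outtime, y = first_careunit, yend = first_careunit)) + 
  geom_segment(linewidth = 6, color = "steelblue") +  # `size` for lines is deprecated since ggplot2 3.4.0
  labs(title = "ICU Unit Transfers During Terminal Stay", 
       x = "Date/Time", y = "Care Unit")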

EXPLANATION #1: I created Visualization #1 to show a direct use of the INTIME/OUTTIME timestamps: reconstructing where this patient physically was during their final hospitalization. For this patient’s admission (hadm_id 153826), it revealed something meaningful: the patient never left the MICU. The plot shows a single ICU stay of roughly 39 hours, and that absence of transfers is a data point in itself. It shows that this was not a case of care escalating across multiple units; the patient arrived critical enough to go straight to intensive care and did not survive the stay.
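The roughly 39-hour figure is presumably the gap between INTIME and OUTTIME. Because the query’s `julianday(i.outtime) - julianday(i.intime)` returns that gap in days, it has to be multiplied by 24 to read as hours. The base-R sketch below does the same conversion; `icu_hours` is a hypothetical helper, and it assumes timestamps in the `YYYY-MM-DD HH:MM:SS` text form the queries return, parsed as UTC so daylight-saving transitions cannot shift the result.

```r
# Hypothetical helper: ICU length of stay in hours from INTIME/OUTTIME values.
# Parsing both timestamps as UTC avoids daylight-saving gaps in the difference.
icu_hours <- function(intime, outtime, tz = "UTC") {
  start <- as.POSIXct(intime, tz = tz)
  end   <- as.POSIXct(outtime, tz = tz)
  as.numeric(difftime(end, start, units = "hours"))
}

# Equivalent in SQLite: (julianday(i.outtime) - julianday(i.intime)) * 24
# Example usage (assumes `icu_timeline` holds the query result):
# icu_timeline$unit_hours <- icu_hours(icu_timeline$intime, icu_timeline$outtime)
```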

VISUALIZATION #2

dbGetQuery(mydb, "
           SELECT DISTINCT d.label
           FROM labevents le
           JOIN d_labitems d ON le.itemid = d.itemid
           WHERE le.hadm_id = 153826
           ")
                             label
1   Alanine Aminotransferase (ALT)
2                          Albumin
3             Alkaline Phosphatase
4                        Anion Gap
5  Asparate Aminotransferase (AST)
6                      Bicarbonate
7                 Bilirubin, Total
8                         Chloride
9                       Creatinine
10                         Glucose
11                       Potassium
12                          Sodium
13                   Urea Nitrogen
14                       Basophils
15                     Eosinophils
16                      Hematocrit
17                      Hemoglobin
18                         INR(PT)
19                     Lymphocytes
20                             MCH
21                            MCHC
22                             MCV
23                       Monocytes
24                     Neutrophils
25                  Platelet Count
26                              PT
27                             PTT
28                             RDW
29                 Red Blood Cells
30               White Blood Cells
31                     Base Excess
32            Calculated Total CO2
33                         Lactate
34                            pCO2
35                              pH
36                             pO2
37                Epithelial Cells
38                             RBC
39                Specific Gravity
40                             WBC
41                           Yeast
42                  Calcium, Total
43                       Magnesium
44                       Phosphate
45                 25-OH Vitamin D
46                        Ferritin
47                            Iron
48    Iron Binding Capacity, Total
49             Parathyroid Hormone
50                     Transferrin
51            Creatine Kinase (CK)
52   Creatine Kinase, MB Isoenzyme
53                      Troponin T
lab_trend <- dbGetQuery(mydb, "
                        SELECT le.charttime, le.valuenum, le.valueuom, d.label
                        FROM labevents le
                        JOIN d_labitems d ON le.itemid = d.itemid
                        WHERE le.hadm_id = 153826
                        AND d.label = 'Creatinine'
                        ORDER BY le.charttime
                        ")
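# RSQLite returns charttime as text by default. On a character (discrete) axis each
# timestamp becomes its own group, which triggers geom_line()'s "one observation"
# warning; converting to date-times lets the line connect the measurements.
lab_trend$charttime <- as.POSIXct(lab_trend$charttime, tz = "UTC")

ggplot(data = lab_trend,
       aes(x = charttime, y = valuenum)) +
  geom_line(color = "firebrick") + 
  geom_point() +
  labs(title = "Creatinine Trend Over Terminal Stay",
       x = "Date/Time", y = "Creatinine (mg/dL)")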

EXPLANATION #2: I created this visualization to track the patient’s creatinine levels over the course of their terminal stay, using timestamped lab values pulled directly from the LABEVENTS table. Creatinine is a standard clinical marker of kidney function, and plotting it against exact CHARTTIME values shows whether renal function was declining, stable, or fluctuating in the lead-up to death. A rising trend would be consistent with acute kidney injury or broader multi-organ decline, which is common in critically ill patients nearing the end of life. The frequency of measurements also reflects how closely this patient was monitored during their stay. Overall, this shows how granular and precisely timestamped clinical data can be, to the point that it can reconstruct a patient’s physiological trajectory. It also reinforces how much sensitive detail hospitals collect and retain on individual patients.
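To make “a rising trend” concrete, the series can be screened against the absolute criterion in the KDIGO acute kidney injury guideline: a serum creatinine increase of at least 0.3 mg/dL within 48 hours. The sketch below is a simplified screen, not a diagnosis; it ignores KDIGO’s relative (1.5 times baseline) and urine-output criteria. `flag_creatinine_rise` is a hypothetical helper that assumes text `charttime` timestamps and `valuenum` in mg/dL, as in `lab_trend`.

```r
# Hypothetical helper: flag each creatinine value that is at least `delta`
# mg/dL above the lowest value measured in the preceding `window_hours`.
flag_creatinine_rise <- function(charttime, valuenum, delta = 0.3, window_hours = 48) {
  t <- as.POSIXct(charttime, tz = "UTC")
  ord <- order(t)                      # work in chronological order
  t <- t[ord]
  v <- valuenum[ord]
  flag <- rep(FALSE, length(v))
  for (i in seq_along(v)) {
    if (is.na(v[i])) next
    hours_before <- as.numeric(difftime(t[i], t, units = "hours"))
    # Earlier measurements only (strictly before t[i]) inside the window
    prior <- which(hours_before > 0 & hours_before <= window_hours & !is.na(v))
    if (length(prior) > 0 && v[i] - min(v[prior]) >= delta) flag[i] <- TRUE
  }
  data.frame(charttime = t, valuenum = v, rise_flag = flag)
}

# Example usage on the query result above:
# flag_creatinine_rise(lab_trend$charttime, lab_trend$valuenum)
```

Comparing each value only against the previous 48 hours means a slow drift over a long stay is not flagged by this criterion alone.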

Recommendations for Industry

Based on the visualizations I built, I think hospitals need to take a harder look at two things: how they use timestamp data operationally, and how carefully they protect it. On the operational side, I was surprised by how much can be reconstructed from admission, ICU, and lab timestamps alone. With a few queries, I was able to map out exactly where this patient was and how their condition changed, almost hour by hour. That same capability could be valuable for hospitals if they used it proactively. If administrators tracked ICU stay patterns and lab trends like creatinine in real time across all patients, they could catch signs of decline earlier and make faster staffing or care decisions, rather than piecing the story together after the fact as I did.
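Running this kind of check across every current admission, rather than for one patient after the fact, is mostly a grouping problem. The sketch below assumes a hypothetical data frame `labs` with `hadm_id`, `charttime`, and `valuenum` columns (for example, creatinine rows pulled from LABEVENTS for all admissions). For each admission it reports how far the latest value sits above the lowest value from the previous 48 hours, then sorts admissions by that rise. The 0.3 mg/dL threshold borrows the KDIGO absolute criterion; `screen_admissions` is a hypothetical name.

```r
# Hypothetical cross-admission screen: for each admission, compare the latest
# creatinine value with the lowest value recorded in the preceding window.
screen_admissions <- function(labs, delta = 0.3, window_hours = 48) {
  labs <- labs[!is.na(labs$valuenum), ]
  labs$charttime <- as.POSIXct(labs$charttime, tz = "UTC")
  per_admission <- lapply(split(labs, labs$hadm_id), function(d) {
    d <- d[order(d$charttime), ]
    latest <- d[nrow(d), ]
    age_hours <- as.numeric(difftime(latest$charttime, d$charttime, units = "hours"))
    recent <- d$valuenum[age_hours <= window_hours]  # includes the latest value itself
    data.frame(hadm_id = latest$hadm_id,
               latest_time = latest$charttime,
               latest_value = latest$valuenum,
               rise = latest$valuenum - min(recent))
  })
  out <- do.call(rbind, per_admission)
  out$flag <- out$rise >= delta
  rownames(out) <- NULL
  out[order(-out$rise), ]  # largest recent rise first, for triage
}
```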

At the same time, that same power is exactly why I think data governance needs to be taken more seriously. My recommendation is for healthcare organizations to invest in stronger role-based access controls and regular audits of who is running these kinds of granular queries, especially on identifiable patient data. The same data that makes hospitals better at predicting and responding to patient decline also needs to be locked down much more tightly than it probably is now.
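Role-based access control and query auditing are normally enforced inside the database server or EHR platform rather than in analysis code. Purely to illustrate how the two ideas fit together, the sketch below wraps any query function so that each call is checked against a role’s allowed tables and recorded in an audit log, whether or not it is permitted. The role names, table lists, and `make_audited_runner` are all hypothetical.

```r
# Hypothetical role-based wrapper with an audit trail, for illustration only.
make_audited_runner <- function(permissions) {
  entries <- data.frame(time = character(0), user = character(0), role = character(0),
                        tables = character(0), allowed = logical(0),
                        stringsAsFactors = FALSE)
  run <- function(user, role, tables, query_fn) {
    granted <- permissions[[role]]                 # NULL for an unknown role
    allowed <- !is.null(granted) && all(tables %in% granted)
    # Record every attempt, including denied ones, before running anything
    entries <<- rbind(entries, data.frame(time = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
                                          user = user, role = role,
                                          tables = paste(tables, collapse = ","),
                                          allowed = allowed, stringsAsFactors = FALSE))
    if (!allowed) {
      stop(sprintf("role '%s' is not permitted to query: %s", role,
                   paste(setdiff(tables, granted), collapse = ", ")))
    }
    query_fn()
  }
  list(run = run, audit_log = function() entries)
}

# Example usage (role names and table lists are assumptions):
# runner <- make_audited_runner(list(
#   quality_analyst = c("admissions", "icustays"),
#   icu_clinician   = c("admissions", "icustays", "labevents", "d_labitems")
# ))
# runner$run("tworlock", "quality_analyst", c("icustays"),
#            function() dbGetQuery(mydb, "SELECT COUNT(*) FROM icustays"))
# runner$audit_log()
```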

References

Adler-Milstein, J., & Jha, A. K. (2017). HITECH Act drove large gains in hospital electronic health record adoption. Health Affairs, 36(8), 1416–1422.

Davidson, E., & Winter, J. (2025). Consumer health data: Regulation, governance, and innovation.

Kruse, C. S., Frederick, B., Jacobson, T., & Monticone, D. K. (2017). Cybersecurity in healthcare: A systematic review of modern threats and trends. Technology and Health Care, 25(1), 1–10.

Neprash, H. T., McGlave, C. C., Cross, D. A., Virnig, B. A., Puskarich, M. A., Huling, J. D., Rozenshtein, A. Z., & Nikpay, S. S. (2022). Trends in ransomware attacks on US hospitals, clinics, and other health care delivery organizations, 2016-2021. JAMA Health Forum, 3(12), e224873.