Xiangyu Wang
2025-11-01
This analysis uses an open, hospital-based heart attack dataset from Zheen Hospital in Erbil (Iraq, 2019) to answer a concrete question: among patients who had already reached hospital with suspected cardiac problems, which ones had actually suffered a myocardial infarction?
The data are not drawn from the general population; they come from a short period of clinical activity in which doctors were already filtering for heart-related complaints. That is why heart-attack cases outnumber non–heart-attack cases and why the cohort is visibly older and slightly male-skewed. All later figures must therefore be read in this clinical context: we are contrasting patients who already “looked cardiac” to clinicians, not healthy controls.
Three kinds of information will matter repeatedly:
- baseline patient profile (age, sex)
- cardiovascular load at presentation (blood pressure and its categories)
- laboratory evidence of myocardial injury (CK-MB, Troponin)
This bar chart establishes the composition of the cohort: among the patients who presented during the study period, 810 were labelled as having a heart attack and 509 as not having a heart attack. This confirms that we are dealing with a hospital-based, high-risk population, not a general community sample. Consequently, all subsequent visualisations should be read as analyses within a clinically suspected group, asking: which variables best separate true myocardial infarction from other cardiac presentations?
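The class balance behind the bar chart can be checked with a few lines of arithmetic. The sketch below uses only the two counts quoted above; the variable names are illustrative.

```python
# Outcome counts as reported in the text.
counts = {"heart attack": 810, "no heart attack": 509}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Print each class with its count and its share of the cohort.
for label, n in counts.items():
    print(f"{label:>16}: {n:4d}  ({shares[label]:.1%})")
```

Roughly six in ten patients carry the heart-attack label, so any later comparison starts from an imbalanced baseline rather than an even split.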
Building on that starting point, the second figure clarifies who these suspected patients are. When age is shown separately for females and males, both panels reveal a predominance of middle-aged and older adults: females mostly between 50 and 70 years, males even more tightly concentrated between 55 and 70 years, with more heart-attack cases among men in this band. This tells us that many of the differences we will later observe are not random but arise within a hospital population that is already older and slightly male-skewed. Age and sex therefore constitute the demographic backdrop against which subsequent blood-pressure and biomarker differences should be interpreted.
The third figure tests whether age is merely a background characteristic or an actual discriminator. The violin–boxplot combination shows that the two outcome groups do not overlap perfectly: non–heart-attack patients cluster around the mid-40s to mid-60s, whereas the heart-attack group is shifted to the right, with a higher median and a longer upper tail. This confirms that, even within an already older hospital population, age still helps separate those who sustained myocardial injury from those who did not. This legitimises using age as a structural variable when we interpret blood-pressure and metabolic patterns later on.
With age accounted for, the fourth figure looks at baseline haemodynamic load. Plotting systolic against diastolic pressure with the 140/90 clinical thresholds shows that most patients occupy a moderately elevated zone, which is consistent with an older, higher-risk cohort. Heart-attack cases are clearly present in the hypertensive quadrant, indicating that high BP often coexists with myocardial infarction in this setting. Yet heart-attack cases also appear below the thresholds, demonstrating that blood pressure is an important but not sufficient signal. This motivates the transition to a more structured BP classification and, ultimately, to biochemical confirmation.
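As a rough illustration of how the 140/90 reference lines split the scatter into quadrants, the sketch below assigns each reading to a quadrant and tallies outcomes. The function name, the choice of `>=` at the boundary, and the readings themselves are assumptions made for illustration, not values from the dataset.

```python
from collections import Counter

SYS_THRESHOLD = 140  # mmHg, systolic reference line from the figure
DIA_THRESHOLD = 90   # mmHg, diastolic reference line from the figure


def bp_quadrant(systolic, diastolic):
    """Name the quadrant a reading falls into (boundary handling is assumed)."""
    high_sys = systolic >= SYS_THRESHOLD
    high_dia = diastolic >= DIA_THRESHOLD
    if high_sys and high_dia:
        return "both elevated"
    if high_sys:
        return "systolic only"
    if high_dia:
        return "diastolic only"
    return "below both"


# Made-up readings: (systolic, diastolic, heart_attack).
patients = [
    (160, 95, True),
    (150, 85, True),
    (118, 76, True),
    (125, 92, False),
    (110, 70, False),
]

# Count patients per (quadrant, outcome) pair.
tally = Counter((bp_quadrant(s, d), ha) for s, d, ha in patients)
for (quadrant, heart_attack), n in sorted(tally.items()):
    print(f"{quadrant:>15} | heart attack={heart_attack}: {n}")
```

Even in this toy sample a heart-attack case sits in the "below both" quadrant, which is the pattern the figure uses to argue that blood pressure alone is not sufficient.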
Accordingly, the fifth figure converts the raw BP cloud into clinically meaningful categories and asks, within each stratum, how many patients ended up with a heart-attack label. The pattern is graded: the Normal and Elevated groups show a balanced outcome mix, but from Hypertension stage 1 onward the heart-attack share increases, and in stage 2+ hypertension non-heart-attack cases recede. Because each patient contributes a single reading at presentation, this says nothing about how long pressure had been elevated; what it does suggest is that the BP category separates the outcomes more clearly than position in the raw scatter, and it strengthens the view that “age + BP” defines a vulnerable background. The scene is now set for adding the laboratory layer that actually confirms myocardial injury.
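The categorisation step might look like the sketch below. The report does not state which cut-offs it used, so the boundaries here are an assumption taken from the 2017 ACC/AHA guideline, and the records are invented to show the mechanics only.

```python
from collections import defaultdict


def bp_category(systolic, diastolic):
    """Assign a BP category using assumed ACC/AHA 2017 cut-offs (mmHg)."""
    if systolic >= 140 or diastolic >= 90:
        return "Hypertension stage 2+"
    if systolic >= 130 or diastolic >= 80:
        return "Hypertension stage 1"
    if systolic >= 120:
        return "Elevated"
    return "Normal"


def heart_attack_share(records):
    """Return the fraction of heart-attack cases within each BP category."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for systolic, diastolic, heart_attack in records:
        category = bp_category(systolic, diastolic)
        totals[category] += 1
        positives[category] += int(heart_attack)
    return {category: positives[category] / totals[category] for category in totals}


# Made-up records: (systolic, diastolic, heart_attack).
toy_records = [
    (115, 75, False), (112, 70, True),
    (124, 78, False), (126, 76, True),
    (134, 82, True), (132, 84, True), (136, 79, False),
    (150, 95, True), (162, 88, True),
]

category_shares = heart_attack_share(toy_records)
for category, share in category_shares.items():
    print(f"{category:>22}: {share:.0%} heart attack")
```

Reporting a share within each category, rather than raw counts, keeps strata of different sizes comparable, which is what a graded pattern across categories relies on.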
The sixth figure provides the confirmation layer. After pivoting CK-MB and Troponin into long format and plotting them on a log scale, both markers sit clearly higher in the heart-attack group than in the non-heart-attack group, with more of the distribution concentrated at elevated values. This indicates that a substantial share of the positive cases had laboratory evidence of myocardial injury, not just clinical suspicion. The log transformation keeps the few very high values from dominating the display and makes the upward shift of the whole group visible. In combination with the demographic and BP findings, the message is: age and blood pressure tell us who is vulnerable; cardiac biomarkers tell us who actually had the infarction.
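The pivot and log-transform described above can be sketched in plain Python. The column names (`ck_mb`, `troponin`) and every value are hypothetical; the point is only the reshaping into one record per measurement and the comparison on a log10 scale.

```python
import math
from statistics import median

# Made-up wide-format rows; values are illustrative, not from the dataset.
rows = [
    {"outcome": "heart attack", "ck_mb": 300.0, "troponin": 2.0},
    {"outcome": "heart attack", "ck_mb": 12.0, "troponin": 0.5},
    {"outcome": "heart attack", "ck_mb": 6.0, "troponin": 0.05},
    {"outcome": "no heart attack", "ck_mb": 2.0, "troponin": 0.005},
    {"outcome": "no heart attack", "ck_mb": 3.0, "troponin": 0.01},
    {"outcome": "no heart attack", "ck_mb": 4.0, "troponin": 0.008},
]

MARKERS = ("ck_mb", "troponin")

# Pivot to long format: one (outcome, marker, value) record per measurement.
long_rows = [(row["outcome"], marker, row[marker]) for row in rows for marker in MARKERS]


def median_log10(outcome, marker):
    """Median of log10(value) for one marker within one outcome group."""
    logs = [
        math.log10(value)
        for group, name, value in long_rows
        if group == outcome and name == marker and value > 0  # log needs positive values
    ]
    return median(logs)


for marker in MARKERS:
    for outcome in ("heart attack", "no heart attack"):
        print(f"{marker:>8} | {outcome:>15}: median log10 = {median_log10(outcome, marker):.2f}")
```

On the log scale the single extreme CK-MB value (300) sits about 1.4 units above the group median instead of dwarfing everything else, which is why the group-level shift becomes readable.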
The final plot compresses all previous evidence into a single ranking: among the routinely available numeric variables, which ones shift the most between the heart-attack and non-heart-attack groups? The ordering mirrors the visual story: CK-MB shows by far the largest difference, followed by age, then systolic blood pressure, with heart rate and blood sugar contributing smaller but consistent shifts. Diastolic pressure and a few others move little. This means that, for presentation purposes, the dashboard should emphasise exactly these variables and in this order: biomarkers first, age second, BP third, everything else as context. In this way, the exploratory section forms one coherent narrative: a high-risk hospital cohort → older, BP-loaded patients → categorical hypertension → biochemical confirmation → ranked summary of discriminative features.
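The report does not say which statistic underlies the ranking. One plausible choice, shown here purely as an assumption, is the absolute standardized mean difference (difference in group means divided by the pooled standard deviation), which puts variables measured in different units on a common scale. The variable names and values below are invented.

```python
import math
from statistics import mean, stdev


def standardized_difference(positive, negative):
    """Absolute difference in means divided by the pooled standard deviation."""
    n_pos, n_neg = len(positive), len(negative)
    pooled_variance = (
        (n_pos - 1) * stdev(positive) ** 2 + (n_neg - 1) * stdev(negative) ** 2
    ) / (n_pos + n_neg - 2)
    return abs(mean(positive) - mean(negative)) / math.sqrt(pooled_variance)


# Made-up values per variable: (heart-attack group, non-heart-attack group).
toy_groups = {
    "age": ([62, 70, 58, 66], [50, 55, 61, 48]),
    "systolic_bp": ([150, 135, 128, 142], [130, 126, 138, 120]),
    "diastolic_bp": ([85, 78, 80, 82], [80, 79, 84, 76]),
}

# Rank variables from largest to smallest standardized shift.
ranking = sorted(
    toy_groups,
    key=lambda name: standardized_difference(*toy_groups[name]),
    reverse=True,
)
for name in ranking:
    print(f"{name:>13}: {standardized_difference(*toy_groups[name]):.2f}")
```

For heavily skewed variables such as CK-MB, a difference in medians or in log-transformed means may be a more robust basis for ranking than raw means.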
This hospital-based exploratory analysis reveals a single clinical story through its figures.
* The cohort is not drawn from the general population. It is predominantly middle-aged and older, with men somewhat overrepresented: the kind of patients that clinicians routinely screen for cardiac events.
* Within this already high-risk group, older patients and those in higher blood pressure categories are more likely to carry a heart-attack label, indicating that chronic cardiovascular load is a significant factor in this setting.
* The log-scaled plots of CK-MB and troponin show an upward shift of the whole distribution in the heart-attack group only. This supports the view that these cases are not labelled on symptoms alone but also have biochemical evidence of myocardial injury.

Overall, the sequence of visuals indicates that in this dataset, “looking like a cardiac patient” and “actually having an infarction” are not the same, and the transition from one to the other is explained by age, blood pressure categories, and cardiac biomarkers.
CK-MB shows the largest gap because it captures the event itself. Age comes next because it identifies those most vulnerable to such an event. Systolic blood pressure follows because it reflects the haemodynamic burden that often precedes the event. The overarching conclusion is that robust discrimination for this high-risk hospital cohort does not come from any single variable in isolation, but from the accumulation of baseline vulnerability, current cardiovascular stress and objective evidence of injury. Viewed through this lens, the overlaps between groups in some of the earlier plots are expected rather than problematic: they represent patients who had the background risk but did not go on to sustain myocardial damage, which is precisely the distinction the dataset set out to record.
Rashid, T. A., & Hassan, B. (2022). Heart attack dataset (Version 1) [Data set]. Mendeley Data. https://doi.org/10.17632/wmhctcrt5v.1
Mohammadinia, F. (n.d.). Heart Attack Dataset – Tarik A. Rashid [Data set]. Kaggle. https://www.kaggle.com/datasets/fatemehmohammadinia/heart-attack-dataset-tarik-a-rashid/data