Before running the code, we will load the necessary R packages. The tidyverse package provides a cohesive set of functions for data manipulation, cleaning, and reshaping. The ggplot2 package, which tidyverse also attaches, lets us create highly customizable and detailed plots. DT renders interactive tables that users can sort, search, and scroll through. Finally, rsconnect deploys our Shiny app to shinyapps.io. library() only loads packages that are already installed, so a missing package must first be installed once with install.packages() (as in the commented-out line for rsconnect). Loading everything up front makes every function used later available and exposes a missing package immediately rather than partway through the analysis.
#install.packages('rsconnect')
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(DT)
library(rsconnect)
After executing this code, all required packages will be loaded: the dataset can be manipulated efficiently, plots can be built with full flexibility, tables will be interactive for the user, and the Shiny app can be deployed.
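Because library() stops at the first package that is not installed, it can help to check all of them at once before loading. The sketch below uses base R's requireNamespace() to list the missing packages; the helper name missingPackages() is hypothetical and not part of any package.

```r
# Hypothetical helper: return the packages that cannot be loaded
missingPackages <- function(pkgs) {
  # requireNamespace() returns FALSE instead of raising an error when quietly = TRUE
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

required <- c("tidyverse", "ggplot2", "DT", "rsconnect")
toInstall <- missingPackages(required)
if (length(toInstall) > 0) {
  message("Missing packages: ", paste(toInstall, collapse = ", "))
  # install.packages(toInstall)  # uncomment to install them once
}
```

Running this before the library() calls reports every missing package in one message instead of failing one package at a time.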
Before running this code, we will import the Heart Disease dataset for analysis. We will use read.csv() with sep = " " and header = FALSE because the file has no header row and separates values with spaces. We will then assign descriptive column names using colnames(). After importing, we will convert the integer-coded variables into factors using factor() with human-readable labels (e.g., converting 0 and 1 in the sex variable to "F" and "M"). Because factor() assigns labels to the codes in sorted order, the labels must be listed in that same order. This lets R treat these variables as categorical for plotting and summarization. Finally, we will add a unique PatientID using mutate(), move it to the first column with relocate(), and inspect the result with summary() and head().
#Loading and Preparing the Heart Disease Dataset
heart.dat <- read.csv("heart.dat.csv", sep = " ", header = FALSE)
# Column names in file order (varNames avoids reusing the name of base R's names())
varNames <- c("age", "sex", "cp", "restbp",
              "chol", "fbs", "restecg", "maxach", "exang", "oldpeak", "slope", "num",
              "thal", "disease")
colnames(heart.dat) <- varNames
# factor() matches labels to the sorted codes, so labels are listed in code order
heart.dat$sex <- factor(heart.dat$sex, labels = c("F", "M"))
heart.dat$cp <- factor(heart.dat$cp,
                       labels = c("Typ", "Atyp", "Non-Ang", "Asymp"))
# fbs is coded 0 = false, 1 = true (fasting blood sugar > 120 mg/dl), so labels run F, T
heart.dat$fbs <- factor(heart.dat$fbs, labels = c("F", "T"))
heart.dat$restecg <- factor(heart.dat$restecg,
                            labels = c("Normal", "Abnorm", "Hypertrophy"))
heart.dat$exang <- factor(heart.dat$exang, labels = c("N", "Y"))
heart.dat$slope <- factor(heart.dat$slope,
                          labels = c("Up", "Flat", "Down"))
heart.dat$thal <- factor(heart.dat$thal,
                         labels = c("Normal", "Fixed", "Reversible"))
heart.dat$disease <- factor(heart.dat$disease, labels = c("H", "S"))
# Add a unique identifier and make it the first column
heart.dat <- heart.dat %>%
  mutate(PatientID = 1:n()) %>%
  relocate(PatientID, .before = 1)
summary(heart.dat)
## PatientID age sex cp restbp
## Min. : 1.00 Min. :29.00 F: 87 Typ : 20 Min. : 94.0
## 1st Qu.: 68.25 1st Qu.:48.00 M:183 Atyp : 42 1st Qu.:120.0
## Median :135.50 Median :55.00 Non-Ang: 79 Median :130.0
## Mean :135.50 Mean :54.43 Asymp :129 Mean :131.3
## 3rd Qu.:202.75 3rd Qu.:61.00 3rd Qu.:140.0
## Max. :270.00 Max. :77.00 Max. :200.0
## chol fbs restecg maxach exang
## Min. :126.0 F:230 Normal :131 Min. : 71.0 N:181
## 1st Qu.:213.0 T: 40 Abnorm : 2 1st Qu.:133.0 Y: 89
## Median :245.0 Hypertrophy:137 Median :153.5
## Mean :249.7 Mean :149.7
## 3rd Qu.:280.0 3rd Qu.:166.0
## Max. :564.0 Max. :202.0
## oldpeak slope num thal disease
## Min. :0.00 Up :130 Min. :0.0000 Normal :152 H:150
## 1st Qu.:0.00 Flat:122 1st Qu.:0.0000 Fixed : 14 S:120
## Median :0.80 Down: 18 Median :0.0000 Reversible:104
## Mean :1.05 Mean :0.6704
## 3rd Qu.:1.60 3rd Qu.:1.0000
## Max. :6.20 Max. :3.0000
head(heart.dat)
## PatientID age sex cp restbp chol fbs restecg maxach exang oldpeak
## 1 1 70 M Asymp 130 322 F Hypertrophy 109 N 2.4
## 2 2 67 F Non-Ang 115 564 F Hypertrophy 160 N 1.6
## 3 3 57 M Atyp 124 261 F Normal 141 N 0.3
## 4 4 64 M Asymp 128 263 F Normal 105 Y 0.2
## 5 5 74 F Atyp 120 269 F Hypertrophy 121 Y 0.2
## 6 6 65 M Asymp 120 177 F Normal 140 N 0.4
## slope num thal disease
## 1 Flat 3 Normal S
## 2 Flat 0 Reversible H
## 3 Up 0 Reversible S
## 4 Flat 1 Reversible H
## 5 Up 1 Normal H
## 6 Up 0 Reversible H
After executing this code, the dataset will be clean and ready for analysis. All categorical variables now have descriptive labels, making plots easier to interpret. The PatientID column provides a unique identifier for each row. The summary() and head() output will allow users to verify the variable types, observe ranges for continuous variables, and see that the dataset has been successfully prepared for visualization and analysis.
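The labelling step depends on how factor() pairs labels with codes: labels are assigned to the sorted unique values, not in the order the values appear in the file. A minimal base R illustration with made-up codes (not rows from the dataset) shows the implicit mapping and an explicit alternative that states which code each label belongs to:

```r
# Made-up sex codes, in the order they might appear in a file
sexCodes <- c(1, 0, 1, 1, 0)

# Implicit: labels are matched to the sorted codes 0, 1
implicit <- factor(sexCodes, labels = c("F", "M"))

# Explicit: the code behind each label is spelled out
explicit <- factor(sexCodes, levels = c(0, 1), labels = c("F", "M"))

print(implicit)
table(explicit)
```

Both versions give the same factor here. The explicit form is more robust: if one code is absent from a subset of the data, it still assigns every label, whereas the implicit form raises an error because the number of labels no longer matches the number of observed codes.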
We will now create a scatter plot function to allow users to explore the relationships between two continuous variables (e.g., age and chol). This function will also colour points based on a categorical variable (e.g., sex) to reveal patterns or group differences. To organize the data, we will define continuousVars and categoricalVars as vectors containing the appropriate continuous and categorical variables. For this plot, varX and varY will be set to variables from continuousVars, while varCol will be chosen from categoricalVars. We will use ggplot() to initiate the plot, aes() with .data[[varX]], .data[[varY]], and .data[[varCol]] to map variables dynamically, and geom_point() to plot individual data points. The theme_minimal() function will give the plot a clean, uncluttered appearance, and labs() will be used to generate dynamic axis labels and a descriptive plot title.
#Scatter Plot Function (Two Continuous Variables Coloured by a Categorical Variable)
#Main variable groups
continuousVars <- c("age", "restbp", "chol", "maxach", "oldpeak")
categoricalVars <- c("disease", "exang", "fbs", "thal", "cp", "sex", "restecg")
#Candidate variables for each argument (inside the function, its own
#arguments take precedence over these global vectors)
varX <- continuousVars
varY <- continuousVars
varCol <- categoricalVars
#Make Graph
myScatterPlot <- function(varX, varY, varCol) {
  heart.dat %>%
    ggplot(aes(x = .data[[varX]],
               y = .data[[varY]],
               colour = .data[[varCol]])) +
    geom_point(size = 3, alpha = 0.7) +
    theme_minimal() +
    labs(
      x = varX,
      y = varY,
      colour = varCol,
      title = paste("Scatter plot of", varY, "vs", varX, "coloured by", varCol)
    )
}
#Example
myScatterPlot("age", "chol", "sex")
After running this code, users will be able to plot any two continuous variables against each other with points coloured by a categorical variable. The plot will show how the two variables relate and help users spot clusters, trends, and differences between groups.
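Because myScatterPlot() receives variable names as strings, a typo such as "cholesterol" only fails once the plot is drawn. One option, sketched below with the built-in mtcars data so it runs on its own, is to check each argument against its allowed vector with base R's match.arg() before building the plot. The names checkedScatter(), numVars, groupVars and carsDf are hypothetical stand-ins for the dataset's own vectors.

```r
library(ggplot2)

# Hypothetical stand-ins for continuousVars / categoricalVars, using mtcars
numVars <- c("mpg", "wt", "hp")
groupVars <- c("cyl", "am")
carsDf <- transform(mtcars, cyl = factor(cyl), am = factor(am))

checkedScatter <- function(df, varX, varY, varCol) {
  # match.arg() raises an error if a name is not one of the allowed choices
  varX <- match.arg(varX, numVars)
  varY <- match.arg(varY, numVars)
  varCol <- match.arg(varCol, groupVars)
  ggplot(df, aes(x = .data[[varX]], y = .data[[varY]],
                 colour = .data[[varCol]])) +
    geom_point(size = 3, alpha = 0.7) +
    theme_minimal() +
    labs(x = varX, y = varY, colour = varCol)
}

p <- checkedScatter(carsDf, "wt", "mpg", "cyl")
```

When the inputs come from fixed menus the check rarely fires, but it gives a clear error message for direct calls from the console.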
We will now create a box plot function to allow users to compare the distribution of a continuous variable across the levels of a categorical variable, and to split them further by a second categorical variable. ggplot() will be used with geom_boxplot() to display medians, quartiles, and outliers, and the fill aesthetic colours the boxes by the second categorical variable. This function will help users quickly see differences in the centre and spread of continuous measures across categories.
#Box Plot Function (Categorical vs Continuous, Optional Fill)
#Candidate variables for each argument (inside the function, its own
#arguments take precedence over these global vectors)
varX <- categoricalVars
varY <- continuousVars
varCol <- categoricalVars
#Make Graph
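myBoxPlot <- function(varX, varY, varCol = NULL) {
  p <- heart.dat %>%
    ggplot(aes(x = .data[[varX]], y = .data[[varY]])) +
    geom_boxplot() +
    theme_minimal() +
    labs(x = varX, y = varY,
         title = paste("Boxplot of", varY, "by", varX))
  #Add the fill only when a second categorical variable is supplied
  if (!is.null(varCol) && varCol != "") {
    p <- p + aes(fill = .data[[varCol]]) + labs(fill = varCol)
  }
  p
}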
#Example
myBoxPlot("exang", "chol", "sex")
After executing this code, users will be able to create boxplots showing the distribution of a continuous variable for each category. The boxes will be coloured to reflect any chosen categorical variable, making comparisons between groups visually intuitive.
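To see exactly what each box encodes, the statistics can be computed by hand. geom_boxplot() draws the box from the 25th to the 75th percentile with a line at the median, and the whiskers reach the most extreme values within 1.5 times the interquartile range (IQR) of the box; points beyond that are drawn as outliers. The base R sketch below reproduces those numbers per group using mtcars so it runs on its own. boxStats() is a hypothetical helper, and it assumes quantile()'s default method matches the one ggplot2 uses.

```r
# Hypothetical helper: the values a box plot draws, plus an outlier count
boxStats <- function(y) {
  q <- quantile(y, c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- q[3] - q[1]
  lowerFence <- q[1] - 1.5 * iqr
  upperFence <- q[3] + 1.5 * iqr
  inside <- y[y >= lowerFence & y <= upperFence]
  c(ymin = min(inside), lower = q[1], middle = q[2], upper = q[3],
    ymax = max(inside), outliers = sum(y < lowerFence | y > upperFence))
}

# One row per group, e.g. miles per gallon by number of cylinders
mpgStats <- t(sapply(split(mtcars$mpg, mtcars$cyl), boxStats))
print(mpgStats)
```

Applied to the heart data, the same call pattern, for example split(heart.dat$chol, heart.dat$exang), gives the numbers behind each box in myBoxPlot().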
We will now create a function, myCatPlot(), to allow users to explore one or two categorical variables in the Heart Disease dataset. The goal of this function will be to provide a clear overview of category counts, enabling users to understand the distribution of participants within each category and to detect potential patterns when two categorical variables are combined. We will design the function so that the second variable, varFill, is optional. If a second categorical variable is provided, we will combine it with the first using interaction() to create a new factor representing all possible category combinations. This approach will allow users to examine joint distributions dynamically while keeping the function reusable for different variable selections.
We will use ggplot() to create the visualization, mapping the combined variable to both the x-axis and the fill colour via aes(x = Combined, fill = Combined). The geom_bar() function counts the rows in each category or combination of categories and draws one bar per level. We will apply theme_minimal() to produce a clean, uncluttered plot, and use labs() to generate descriptive axis labels and a title from the selected variables. Finally, we will rotate the x-axis labels with theme(axis.text.x = element_text(angle = 45, hjust = 1)) so they do not overlap, even when category names are long or many combinations exist. This function will allow users to explore categorical distributions and relationships intuitively, providing a foundation for further analysis and comparison across the dataset.
#Categorical Plot Function (One or Two Categorical Variables)
#Candidate variables for each argument (varFill is optional)
varX <- categoricalVars
varFill <- categoricalVars
myCatPlot <- function(varX, varFill = NULL) {
  df <- heart.dat
  #Create combined variable (if varFill is given)
  df$Combined <- if (!is.null(varFill) && varFill != "") {
    interaction(df[[varX]], df[[varFill]], sep = "_")
  } else {
    df[[varX]]
  }
  #Labels and title
  label <- if (!is.null(varFill) && varFill != "") paste(varX, "&", varFill) else varX
  title <- paste("Bar plot of", label)
  #Make Graph
  ggplot(df, aes(x = Combined, fill = Combined)) +
    geom_bar() +
    theme_minimal() +
    labs(x = label, y = "Count", fill = label, title = title) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
#Examples
myCatPlot("sex") # Single categorical variable
myCatPlot("sex", "cp") # Combine two categorical variables
After executing this code, users will see a bar plot displaying the counts of participants for the selected categorical variable(s). If only a single variable (varX) is chosen, the plot will show a separate bar for each category, with the height of each bar representing the number of participants in that category. If a second variable (varFill) is selected, the plot will display bars for all combinations of the two variables, allowing users to see the joint distribution of categories.
The fill colour of the bars will correspond to the category or combination of categories, making groups easy to tell apart. The x-axis labels will be rotated 45 degrees to prevent overlapping text, keeping every category readable even when there are many or long category names. The plot title and axis labels will update automatically based on the selected variables, so users do not need to annotate the plot manually.
Overall, this function will allow users to quickly explore the composition of categorical variables, identify imbalanced or dominant groups, and detect relationships between two categorical variables, all in a single, clean plot.
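The combined variable in myCatPlot() comes from interaction(), and the bar heights are the counts of each combined level. The base R sketch below, using a few made-up values rather than rows from the dataset, shows the level names interaction() builds and how table() reproduces the counts that geom_bar() plots. By default interaction() keeps every possible combination as a level, including combinations with no observations; droplevels() removes those.

```r
# A few made-up observations for two categorical variables
sex <- factor(c("F", "M", "M"))
cp <- factor(c("Typ", "Asymp", "Typ"))

# Same call as in myCatPlot(): one level per sex/cp combination
combined <- interaction(sex, cp, sep = "_")
levels(combined)

# Counts per combination, i.e. the bar heights
table(combined)

# Drop combinations that never occur
table(droplevels(combined))

# A two-way table shows the same counts as a sex-by-cp grid
table(sex, cp)
```

An alternative design, not used in the function above, is to keep varX on the x-axis and map varFill to fill with geom_bar(position = "dodge"), which groups the bars by the first variable instead of listing every combination separately.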
We will create a function to allow users to select a range of columns and view the data as an interactive table. The select() function will subset the dataset by column range, and datatable() will render the data interactively, supporting scrolling, sorting, and searching.
# Make a table
makeTable <- function(startVar, endVar) {
  heart.dat %>%
    select(all_of(startVar):all_of(endVar)) %>%
    datatable(
      options = list(scrollX = TRUE),
      class = "cell-border stripe",
      rownames = FALSE
    )
}
# Example
makeTable("PatientID", "chol")
After executing this code, users will be able to interactively explore the data in a table format. Columns within the selected range will be displayed, and users can scroll horizontally, search for values, and sort data for easier exploration.
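The column range in makeTable() is positional: every column from startVar through endVar is kept, in the order the columns appear in the data. A base R sketch of the same idea, using a hypothetical helper columnRange() and the built-in mtcars data, also makes the failure mode explicit when a column name is misspelled:

```r
# Hypothetical helper: keep the columns from startVar through endVar
columnRange <- function(df, startVar, endVar) {
  idx <- match(c(startVar, endVar), names(df))
  if (anyNA(idx)) {
    stop("Unknown column(s): ",
         paste(c(startVar, endVar)[is.na(idx)], collapse = ", "))
  }
  df[, seq(idx[1], idx[2]), drop = FALSE]
}

# mtcars columns start mpg, cyl, disp, hp, drat, ...
head(columnRange(mtcars, "cyl", "hp"), 3)
```

If endVar comes before startVar, seq() counts down, so this sketch returns the columns in reverse order.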
We will create a glossary to describe each variable in the dataset. A data.frame will store variable names and descriptions. datatable() will render it interactively. This will provide users a quick reference to understand what each variable represents before creating plots or tables.
# Create a glossary table of variables
glossary <- data.frame(
  Variable = c("PatientID", "age", "sex", "cp", "restbp", "chol",
               "fbs", "restecg", "maxach", "exang", "oldpeak",
               "slope", "num", "thal", "disease"),
  Description = c(
    "Unique patient identifier",
    "Age in years",
    "Sex (F = female, M = male)",
    "Chest pain type (Typ = typical, Atyp = atypical, Non-Ang = non-anginal, Asymp = asymptomatic)",
    "Resting blood pressure (mm Hg)",
    "Serum cholesterol (mg/dl)",
    "Fasting blood sugar > 120 mg/dl (T = true, F = false)",
    "Resting electrocardiographic results (Normal, Abnorm, Hypertrophy)",
    "Maximum heart rate achieved",
    "Exercise induced angina (N = no, Y = yes)",
    "ST depression induced by exercise relative to rest",
    "Slope of the peak exercise ST segment (Up, Flat, Down)",
    "Number of major vessels coloured by fluoroscopy (0–3)",
    "Thalassemia (Normal, Fixed, Reversible)",
    "Heart disease status (H = healthy, S = heart disease present)"
  ),
  stringsAsFactors = FALSE
)
# Display as interactive table
datatable(glossary, options = list(scrollX = TRUE, pageLength = 5), rownames = FALSE)
After running this code, users will see an interactive table with variable names and detailed descriptions. The table will allow horizontal scrolling, sorting, and searching. This glossary ensures that users can understand each variable before selecting it for plots or tables, improving the clarity and usability of the Shiny app.
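Because the glossary is typed by hand, it can drift out of sync with the data if a column is renamed or added. A small check, sketched here with a toy data frame and a hypothetical helper glossaryGaps(), compares the two sets of names; for the app it would be called as glossaryGaps(heart.dat, glossary).

```r
# Hypothetical helper: columns missing from the glossary, and entries with no column
glossaryGaps <- function(df, glossary) {
  list(
    undocumented = setdiff(names(df), glossary$Variable),
    unknown = setdiff(glossary$Variable, names(df))
  )
}

# Toy example: "chol" has no entry and "bmi" is not a column
toy <- data.frame(PatientID = 1:2, age = c(70, 67), chol = c(322, 564))
toyGlossary <- data.frame(
  Variable = c("PatientID", "age", "bmi"),
  Description = c("Unique patient identifier", "Age in years", "Body mass index"),
  stringsAsFactors = FALSE
)
gaps <- glossaryGaps(toy, toyGlossary)
str(gaps)
```

Both elements are empty when every column is documented and every glossary entry matches a column.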