Asthma Presence

Author

Crows: Danielle Clarke, Artemas Souder, Kensley House

library(qrcode)
library(magick)
Linking to ImageMagick 6.9.13.29
Enabled features: cairo, fontconfig, freetype, heic, lcms, pango, raw, rsvg, webp
Disabled features: fftw, ghostscript, x11
qr <- qr_code("http://rpubs.com/khouse15/1428140")
img <- image_read("Crow_Compressed Zip/Ksu Logo-1.png")
plot(img) 

plot(qr)

Introduction

Knowing which factors are associated with the presence of asthma can be integral to understanding what might influence asthma severity. This data set provides a broad range of general healthcare information with which to explore this topic.

Research Questions

  • What factors are associated with the presence of Asthma?

  • How does BMI interact with Asthma Risk and Asthma Severity?

  • How does Asthma Risk vary with Age?

  • How does Gender interact with Medication Adherence?

Statement of Purpose

The purpose of this research is to gain a better understanding of what influences Asthma Risk and Severity. Our aim is to provide adequate analysis and to understand how certain personal factors might prove important for people with Asthma.

Data

The following data set was collected from Kaggle: Asthma Risk & Severity Dataset

Of the original seventeen variables present in the data set, ten will be kept for further analysis. BMI, Age, and Medication Adherence will be turned into categorical groups for general comparison purposes based on broad categories affecting Asthma Levels.

library(dplyr)  # provides mutate(), case_when(), filter(), select()

asthma_clean <- asthma |>
  mutate(BMI_groups = case_when(
            BMI < 18.5 ~ "Underweight",
            BMI <= 24.9 ~ "Normal",
            BMI <= 29.9 ~ "Overweight",
            BMI > 29.9 ~ "Obese"),
          asthma_y_n = case_when(
              Has_Asthma == 0 ~ "No Asthma",
              Has_Asthma == 1 ~ "Has Asthma")) |>
  # right = FALSE makes each bin include its lower bound and exclude its upper bound,
  # so an age of exactly 25 is placed in the "Middle Age" bin
  mutate(age_groups = cut(Age,
                          breaks = c(0, 25, 50, 75, 100),
                          labels = c("Young(0-25)", "Middle Age(26-50)", "Senior(51-75)", "Elder(76-100)"),
                          right = FALSE)) |>
  filter(!is.na(Gender), !is.na(Asthma_Control_Level), !is.na(Age)) |>
  select(Age, BMI, Gender, Allergies, Physical_Activity_Level, Occupation_Type, Comorbidities, Has_Asthma, BMI_groups, Medication_Adherence, Asthma_Control_Level, age_groups)
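Because the age bins are built with cut() and right = FALSE, it can help to check how values on the bin edges are assigned. The following is a minimal sketch using made-up ages (they are illustrative only and do not come from the data set); the breaks and labels are copied from the cleaning code above.

```r
# Illustrative ages only (not taken from the data set), chosen to sit on bin edges
ages <- c(0, 24, 25, 50, 75, 99, 100)

bins <- cut(ages,
            breaks = c(0, 25, 50, 75, 100),
            labels = c("Young(0-25)", "Middle Age(26-50)", "Senior(51-75)", "Elder(76-100)"),
            right = FALSE)

# Each interval is closed on the left and open on the right:
# 25 is labelled "Middle Age(26-50)", and 100 falls outside the last bin (NA)
edge_check <- data.frame(age = ages, age_group = bins)
print(edge_check)
```

If the intent is for the bins to match their labels exactly, one option would be right = TRUE together with include.lowest = TRUE, but that would change the group counts reported in this analysis.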

The table below is interactive, allowing you to explore the data set by using the search box.

Analysis

asthma_clean |>
  group_by(Has_Asthma) |>
  summarize(
    n = n(),
    mean_BMI = mean(BMI, na.rm = TRUE),
    mean_age = mean(Age, na.rm = TRUE),
    mean_adherence = mean(Medication_Adherence, na.rm = TRUE)
  )
# A tibble: 2 × 5
  Has_Asthma     n mean_BMI mean_age mean_adherence
       <dbl> <int>    <dbl>    <dbl>          <dbl>
1          0  7567     24.8     45.0          0.498
2          1  2433     25.9     44.7          0.499

Within the sample, 7,567 participants do not have asthma while 2,433 participants do. The mean BMI for individuals without asthma is 24.8, while those with asthma have a mean BMI of 25.9. The mean age is 44.7 for participants with asthma and 45.0 for their counterparts without asthma. Mean medication adherence is nearly identical in the two groups (0.498 and 0.499).

asthma_clean |>
  group_by(Comorbidities) |>
  summarize(
    n = n(),
    mean_BMI = mean(BMI, na.rm = TRUE),
    mean_age = mean(Age, na.rm = TRUE),
    mean_adherence = mean(Medication_Adherence, na.rm = TRUE)
  )
# A tibble: 4 × 5
  Comorbidities     n mean_BMI mean_age mean_adherence
  <chr>         <int>    <dbl>    <dbl>          <dbl>
1 Both            986     25.0     46.3          0.494
2 Diabetes       2029     25.1     44.5          0.494
3 Hypertension   2018     25.1     44.3          0.499
4 None           4967     25.0     45.1          0.500

Within the sample, the largest group of participants (4,967) does not have a co-morbidity. Nearly equal numbers of participants presented with a single co-morbidity, either diabetes (2,029) or hypertension (2,018), while fewer than 1,000 participants (986) have both diabetes and hypertension.

We began the analysis by looking at the tables below, which break down Comorbidities and BMI groups by Age group (Figures 1a and 1b). Next, we faceted bar charts of the variables to visualize the numbers as a percentage of their group (Figure 2).

hold<- table(asthma_clean$age_groups,asthma_clean$Comorbidities)


knitr::kable(hold, caption = " Figure 1a: Table of Comorbidities by Age Group")
Figure 1a: Table of Comorbidities by Age Group

                    Both  Diabetes  Hypertension  None
Young(0-25)          254       561           584  1320
Middle Age(26-50)    262       575           564  1398
Senior(51-75)        277       551           538  1427
Elder(76-100)        193       342           332   822

hold<- table(asthma_clean$age_groups,asthma_clean$BMI_groups)


knitr::kable(hold, caption = " Figure 1b: Table of BMI and Age as Categorical Variables")
Figure 1b: Table of BMI and Age as Categorical Variables

                    Normal  Obese  Overweight  Underweight
Young(0-25)           1090    450         932          247
Middle Age(26-50)     1118    442         991          248
Senior(51-75)         1144    431         948          270
Elder(76-100)          683    269         579          158

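
One way to check whether the BMI and age groupings in Figure 1b are related is a chi-square test of independence. The sketch below is not part of the original analysis; it re-enters the counts from Figure 1b by hand into a matrix called counts (a name chosen here for illustration) and uses base R's chisq.test().

```r
# Counts copied from Figure 1b (rows: age group, columns: BMI group)
counts <- matrix(
  c(1090, 450, 932, 247,
    1118, 442, 991, 248,
    1144, 431, 948, 270,
     683, 269, 579, 158),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    age_group = c("Young(0-25)", "Middle Age(26-50)", "Senior(51-75)", "Elder(76-100)"),
    bmi_group = c("Normal", "Obese", "Overweight", "Underweight")
  )
)

# Share of each BMI group within every age group (each row sums to 1)
row_props <- round(prop.table(counts, margin = 1), 3)
print(row_props)

# Pearson chi-square test of independence between age group and BMI group
test <- chisq.test(counts)
print(test)
```

A large p-value here would be consistent with BMI group and age group being unrelated in this sample; the test says nothing about asthma itself.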
pB <- asthma_clean |>
  filter(!is.na(Has_Asthma)) |>
  ggplot(aes(x = Gender, fill = Comorbidities)) +
  geom_bar(position = "dodge")+
  facet_grid(~ Has_Asthma)+
  labs(x= "Gender",
       title = "Figure 1c: Comorbidities by Gender",
       subtitle = "Faceted by Asthma")

pB

For males without asthma, hypertension was reported more than diabetes while it was reported less than diabetes in males with asthma.

asthma_clean|>
  ggplot(aes(x=BMI_groups, fill = age_groups))+
  geom_bar(position = "fill")+
  scale_y_continuous(
      name = "Percentage",
      labels = label_percent()
  )+
  facet_wrap(~Has_Asthma)+
  labs(x= "BMI Levels",
       title = "Figure 2: Faceted Bar Charts of BMI and Age Groups",
       subtitle = "Faceted by (Has_Asthma)")

From Figure 2, we see that individuals with asthma in the 26–50 age group seem to have a higher proportion of obesity than individuals without asthma in the same age range. In the 25 & under age range the pattern is reversed: individuals without asthma seem to have a higher proportion of obesity than individuals with asthma.

Age_distribution_by_group <- ggplot(asthma_clean, aes(x = age_groups)) +
  geom_bar() +
  labs(title = "Figure 3b: Bar chart of Age\nGroup Distribution") +
  theme_minimal()

# bins = 30 is the stat_bin() default, stated explicitly to silence the binwidth message
Age_distribution <- ggplot(asthma_clean, aes(x = Age, color = Gender)) +
  geom_histogram(bins = 30) +
  labs(title = "Figure 3a: Histogram of Age Distribution\nby Gender") +
  theme_minimal()

## combined chart showing both views side by side (the + operator on two plots requires the patchwork package)
Age_distribution + Age_distribution_by_group

Figures 3a and 3b show the distribution of age. Age appears fairly normally distributed, with a slight spike of individuals in their 30s and the fewest participants in the 76+ group.

age_asthma <- asthma_clean |>
  filter(Has_Asthma == 1) |>
  group_by(age_groups) |>
  summarize(count = n()) |>
  mutate(percentage = round(100 * count / sum(count), 1),
         label = print(paste(percentage, "%")))
[1] "27.2 %" "28.6 %" "27.9 %" "16.3 %"
# This code keeps only the rows where 'Has_Asthma' is 1 (has asthma) and stores the summary as a new data set called 'age_asthma'.
# Once filtered, the data is grouped by 'age_groups' and the percentage for each level is computed; print() echoes the labels shown above.

# Asthma by Age-Group pie chart
age_group_pie <- ggplot(age_asthma, aes(x = "", y = count, fill = age_groups)) +
  geom_bar(stat = "identity", width = 1) +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5), size = 4) +
  coord_polar("y") +
  theme_void() +
  ggtitle("Figure 4a: Asthma Percentage Based on Age 
          Group") +
  labs(fill = "Age Group") +
  scale_fill_manual(values = c("lightblue", "palegreen", "pink", "purple"))
#here a pie chart was constructed to show the percentage of asthma for each age group only using those who recorded having asthma 

# Asthma by Gender Pie Chart
gender_asthma <- asthma_clean |>
  filter(Has_Asthma == 1) |>
  group_by(Gender) |>
  summarize(count = n()) |>
  mutate(percentage = round(100 * count / sum(count), 1),
         label = print(paste(percentage, "%")))
[1] "48.2 %" "47.9 %" "3.9 %" 
# Created the data set 'gender_asthma', which keeps only the rows where 'Has_Asthma' is 1 and groups them by 'Gender', returning a percentage for each category.

gender_asthma_pie <- ggplot(gender_asthma, aes(x = "", y = count, fill = Gender)) +
  geom_bar(stat = "identity", width = 1) +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5), size = 4) +
  coord_polar("y") +
  theme_void() +
  ggtitle("     Figure 4b: Gender Asthma
          Percentage") +
  labs(fill = "Gender") +
  scale_fill_manual(values = c("lightblue", "pink","palegreen"))

#This pie chart shows the percentage of people from each gender that recorded having asthma (only from those with asthma)
age_group_pie + gender_asthma_pie

Figure 4a shows, using only the individuals who reported having asthma, the percentage that falls in each age group. The Middle Age (26-50) group reported asthma the most at 28.6%, followed by the Senior (27.9%), Young (27.2%), and Elder (16.3%) groups. Figure 4b shows, again only for those who reported having asthma, the percentage in each gender category, with those who did not disclose their gender reported as ‘other’.
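
The filter, group, count, and percentage steps behind the pie charts are repeated for several variables, so they could be wrapped in a small helper. The sketch below is one possible way to do that and is not the code used for the figures; pct_by() is a hypothetical function name and toy is a made-up data frame used only to show the call.

```r
library(dplyr)

# Hypothetical helper (not part of the original analysis):
# counts the rows in each level of group_var and adds a percentage and a text label
pct_by <- function(data, group_var) {
  data |>
    count({{ group_var }}, name = "count") |>
    mutate(percentage = round(100 * count / sum(count), 1),
           label = paste(percentage, "%"))
}

# Made-up example data, for illustration only
toy <- tibble(Gender = c("Female", "Female", "Male", "Other"))

res <- pct_by(toy, Gender)
print(res)
```

With the real data, a call such as pct_by(filter(asthma_clean, Has_Asthma == 1), age_groups) would be expected to reproduce the table behind Figure 4a, assuming the column names used in this report.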

ggplot(asthma_clean, aes(x = Has_Asthma, fill = Gender))+
  geom_bar()+
  # Has_Asthma is numeric (0/1), so a continuous scale is used for the breaks and labels
  scale_x_continuous(name = "Asthma Presence",
                     breaks = c(0, 1),
                     labels = c("No Asthma", "Has Asthma"))+
  annotate(geom = "label", x=1.5, y=4100,
           label = '24.33% Have Asthma', hjust = "center",
           vjust = "bottom",
           color = "red")+
  annotate(
    geom = "segment", x= 1.5, y=4100,
    xend = 1.1, yend = 2500,
    color = "blue",
    arrow = arrow(type = "closed"))+

  annotate(geom = "label", x=.75, y=5500,
           label = '75.67% Do Not Have Asthma', hjust = "left",
           color = "red")+
  annotate(
    geom = "segment", x= .75, y=5500,
    xend = 0.5, yend = 5000,
    color = "blue",
    arrow = arrow(type = "closed"))+
  labs(title = "Figure 5: Bar chart For Presence of Asthma by Gender")

Figure 5 shows the count of individuals with and without asthma by Gender. In both groups, males and females appear in approximately equal numbers, at about 48% each.
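
The percentages in the Figure 5 annotations are typed in by hand. As a hedged alternative, the labels could be computed from the group counts so they stay correct if the data changes. This sketch uses the two counts reported in the summary table earlier (7,567 and 2,433); the object names are chosen here for illustration.

```r
# Counts taken from the Has_Asthma summary table in this report
asthma_counts <- c("Do Not Have Asthma" = 7567, "Have Asthma" = 2433)

# Convert counts to percentages of the whole sample
asthma_pct <- 100 * asthma_counts / sum(asthma_counts)

# Build annotation text such as "24.33% Have Asthma"
asthma_labels <- sprintf("%.2f%% %s", asthma_pct, names(asthma_counts))
print(asthma_labels)
```

The resulting strings could then be passed to the label argument of annotate() in place of the hard-coded text.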

# bar chart of BMI group by asthma presence, faceted by occupation type
ggplot(asthma_clean, aes(x = BMI_groups, fill = as.character(Has_Asthma))) +
  facet_wrap(~Occupation_Type) + geom_bar() +
  scale_fill_discrete(labels = c("Does Not Have Asthma", "Has Asthma")) +
  labs(title = "Figure 6: Faceted Bar Chart for BMI by Asthma Presence and Occupation Location", fill = "Asthma Presence")

Figure 6 is a faceted bar chart that compares the BMI categories of individuals with and without asthma based on whether they worked indoors or outdoors. BMI does not show any clear association with asthma presence in this visualization.

Interactive Elements

plot_ly(
  data= asthma_clean,
  x=~BMI,
  type = "box"
)%>%
  layout(
    xaxis = list(title = "BMI")
  )
plot_ly(
  data= asthma_clean,
  x=~BMI,
  type = "histogram"
)%>%
  layout(
    xaxis = list(title = "BMI"),
    yaxis = list(title = "Number of People")
  )

The interactive Box Plot and Histogram pictured above show the BMI levels collected from the individuals used in the study. In the Box Plot one can see that the median recorded BMI was about 25, with a minimum of 15 and a maximum of 45.
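
The center line of a box plot marks the median rather than the mean, and the two can differ when a few large values pull the mean upward. The short sketch below illustrates this with made-up BMI values (bmi_toy is not taken from the data set) and base R only.

```r
# Made-up BMI values for illustration only; one large value skews the mean upward
bmi_toy <- c(15, 20, 24, 25, 26, 30, 45)

bmi_median <- median(bmi_toy)  # the value a box plot draws as its center line
bmi_mean   <- mean(bmi_toy)    # the arithmetic average, not shown on a default box plot

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
bmi_five <- fivenum(bmi_toy)

print(c(median = bmi_median, mean = bmi_mean))
print(bmi_five)
```

On the real data, summary(asthma_clean$BMI) would report both the median and the mean side by side.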

#density ridges plot
library(ggridges)
ggplot(asthma_clean, aes(x=BMI, y=as.character(Has_Asthma),
                   fill = Occupation_Type,
                   color = as.character(Has_Asthma)))+
  geom_density_ridges(alpha = .4,show.legend = FALSE)+
  labs(title = "Figure 7: Density Plot of BMI Distribution by occupation and asthma status"
      )
Picking joint bandwidth of 1

In Figure 7 we used a density ridges plot to analyze the distribution of BMI by occupation (indoor or outdoor) and asthma status.

# Medication adherence by activity level and gender (uses every row of asthma_clean)
p1 <- ggplot(asthma_clean, aes(x = Medication_Adherence, color = Gender)) +
  geom_histogram(bins = 30) +  # 30 bins is the default, stated explicitly to silence the binwidth message
  facet_wrap(~ Physical_Activity_Level) +
  labs(title = "Figure 8a: Stacked Histogram of Medication Adherence by Activity Level\nand Gender",
       subtitle = "All participants")

## Is there a difference in medication adherence between genders?
p2 <- ggplot(asthma_clean, aes(x = Medication_Adherence, fill = Gender)) +
  geom_density(alpha = .4) +
  labs(title = "Figure 8b: Density Plot of Medication Adherence\nby Gender",
       subtitle = "All participants")

p1 / p2

In Figure 8a we used stacked histograms to show the distribution of medication adherence, faceted by physical activity level (Active, Moderate, Sedentary) and outlined by gender. Underneath, in Figure 8b, we used a density plot to represent medication adherence with one curve per gender. Both plots use every row of asthma_clean, because the code does not filter on Has_Asthma. Based on Figures 8a and 8b, we concluded that medication adherence shows no apparent association with gender or physical activity level.

asthma_clean2 <- asthma_clean |>
  mutate(med_adherence = cut(Medication_Adherence,
                             breaks = c(0, 0.25, 0.5, 0.75, 1),
                             labels = c("Low (0% - 25%)", "Below 50% (26% - 50%)", "Above 50% (51% - 75%)", "High (76%-100%)"),
                             right = FALSE)) |>
  select(med_adherence, Asthma_Control_Level, Gender, Occupation_Type, Has_Asthma, Physical_Activity_Level, BMI, Medication_Adherence, age_groups, Age)
# This code created a new data set that splits 'Medication_Adherence' into four groups stored in the variable 'med_adherence'.
# With right = FALSE, each group includes its lower bound and excludes its upper bound.


female_only_ad <- asthma_clean2 |>
  filter(Gender == "Female") |>
  group_by(med_adherence) |>
  summarize(count = n()) |>
  mutate(percentage = round(100 * count / sum(count), 1),
         label = print(paste(percentage, "%")))
[1] "15.1 %" "34.5 %" "34.2 %" "16.1 %"
# Here a data set called 'female_only_ad' was created. It filters 'Gender' to 'Female' and groups the female observations by 'med_adherence', returning a percentage for each category.

female_adherence <- ggplot(female_only_ad, aes(x = "", y = count, fill = med_adherence)) +
  geom_bar(stat = "identity", width = 1) +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5), size = 2) +
  coord_polar("y") +
  theme_void() +
  ggtitle("Figure 9a: Female Medication
          Adherence") +
  labs(fill = "Adherence") +
  scale_fill_manual(values = c("lightblue", "palegreen", "pink", "purple"))
#This code created a pie chart showing the medication adherence for the female population


male_only_ad <- asthma_clean2 |>
  filter(Gender == "Male") |>
  group_by(med_adherence) |>
  summarize(count = n()) |>
  mutate(percentage = round(100 * count / sum(count), 1),
         label = print(paste(percentage, "%")))
[1] "15.5 %" "33.8 %" "34 %"   "16.7 %"
# Male Pie Chart
male_adherence <- ggplot(male_only_ad, aes(x = "", y = count, fill = med_adherence)) +
  geom_bar(stat = "identity", width = 1) +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5), size = 2) +
  coord_polar("y") +
  theme_void() +
  ggtitle("Figure 9b: Male Medication
          Adherence") +
  labs(fill = "Adherence") +
  scale_fill_manual(values = c("lightblue", "palegreen", "pink", "purple"))
#This code created a pie chart showing the medication adherence for the male population

female_adherence+male_adherence

Figures 9a and 9b show how well each gender adheres to asthma medication. There is no visible difference in medication adherence between the genders: the share in each adherence group differs by less than one percentage point.

# Pie Charts of Asthma Control by Gender
asthma_filtered <- asthma |>
  filter(!is.na(Gender), !is.na(Asthma_Control_Level)) |>
  select(Age, Asthma_Control_Level, Gender, Occupation_Type, Has_Asthma, Physical_Activity_Level, BMI, Medication_Adherence) |>
  # copy Gender after filtering so the new column always has the same number of rows
  mutate(gender_group = Gender)


female_only <- asthma_filtered |>
  filter(Gender == "Female") |>
  group_by(Asthma_Control_Level) |>
  summarize(count = n()) |>
  mutate(percentage = round(100 * count / sum(count), 1),
         label = print(paste(percentage, "%")))
[1] "75.6 %" "12 %"   "11.6 %" "0.7 %" 
#This code filtered the 'Gender' variable to only contain the recordings for 'Female' and stored it as a new data set called 'female_only'
#once filtered, the data was then grouped by asthma control level and the percentage for each level was determined 



#Female pie chart
female_control <- ggplot(female_only, aes(x = "", y = count, fill = Asthma_Control_Level)) +
  geom_bar(stat = "identity", width = 1) +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5), size = 2) +
  coord_polar("y") +
  theme_void() +
  ggtitle("Figure 10a: Female Asthma Control
          Levels") +
  labs(fill = "Asthma Control") +
  scale_fill_manual(values = c("lightblue", "palegreen", "pink", "purple"))
#This code used the 'female_only' data set to create a pie chart of the percentage of females surveyed that had no asthma control, poorly controlled asthma, well controlled asthma, and those that did not provide the information 


male_only <- asthma_filtered |>
  filter(Gender == "Male") |>
  group_by(Asthma_Control_Level) |>
  summarize(count = n()) |>
  mutate(percentage = round(100 * count / sum(count), 1),
         label = print(paste(percentage, "%")))
[1] "75.6 %" "12.7 %" "10.7 %" "0.9 %" 
#This code filtered the 'Gender' variable to only contain the recordings for 'Male' and stored it as a new data set called 'male_only'
#once filtered, the data was then grouped by asthma control level and the percentage for each level was determined 


# Male Pie Chart
male_control <- ggplot(male_only, aes(x = "", y = count, fill = Asthma_Control_Level)) +
  geom_bar(stat = "identity", width = 1) +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5), size = 2) +
  coord_polar("y") +
  theme_void() +
  ggtitle("Figure 10b: Male Asthma Control
          Levels") +
  labs(fill = "Asthma Control") +
  scale_fill_manual(values = c("lightblue", "palegreen", "pink", "purple"))
#This code used the 'male_only' data set to create a pie chart of the percentage of males surveyed that had no asthma control, poorly controlled asthma, well controlled asthma, and those that did not provide the information 

female_control+male_control

#This line created side-by-side pie charts for each gender

Figures 10a and 10b above show how well each gender is able to control their asthma. The percentages are nearly identical for females and males, so there is no apparent correlation between Gender and Asthma Control.

Results

  • Medication adherence showed no apparent association with gender or physical activity level.

  • BMI shows at most a weak association with asthma presence.

  • Asthma Risk does not show an apparent association with Age.

  • Asthma Control does not show an apparent association with Gender.

Conclusion

This project analyzed factors associated with asthma prevalence. Across the ten variables analyzed, most demographic and lifestyle factors, including Gender, Occupation, and Physical Activity Level, showed no strong correlation with asthma presence.

BMI appears to have a weak relationship with asthma, which suggests a need for further analysis with different variables. The density plot used for Figure 7 showed a slight shift toward higher BMI values for individuals with asthma when observing by occupation type (indoor or outdoor). However, the overlap between groups indicates that BMI is not a strong predictor of asthma presence in this dataset. Likewise, Medication Adherence did not vary enough across Gender, Physical Activity Levels, or Age groupings to suggest a correlation between the variables.

In conclusion, the absence of evidence supporting strong correlations between the observed variables does not imply that asthma is random. Rather, it highlights the need for additional variables, such as individual triggers/allergies and family history, in future analysis.

library(leaflet)
leaflet() |>
  addTiles() |>
  setView(lat =34.038, lng = -84.583, zoom = 16) |>
  addMarkers(lat = 34.038, lng = -84.583, popup = "Kennesaw State University - Marietta Campus,GA")

Contact Information

  • xsouder@students.kennesaw.edu

  • dclar175@students.kennesaw.edu

  • khouse15@students.kennesaw.edu