Crows: Danielle Clarke, Artemas Souder, Kensley House
Introduction
Knowing the factors that influence asthma can be integral to determining what factors might influence asthma severity. This data set provides plenty of general information in the healthcare domain with which to explore this topic.
Research Questions
What factors are associated with the presence of Asthma?
How does BMI interact with Asthma Risk and Asthma Severity?
Statement of Purpose
The purpose of this research is to gain a better understanding of what influences Asthma Risk and Severity. Our aim is to provide adequate analysis, and to understand how certain personal factors might prove important for people with Asthma.
library(readxl)
library(ggplot2)
library(tidyverse)
library(scales)
library(ggrepel)
library(patchwork)
library(gridExtra)
library(gganimate)
library(plotly)
library(knitr)
Variable Names in Asthma Risk Dataset
Variable_Names
Patient_ID
Age
Gender
BMI
Smoking_Status
Family_History
Allergies
Air_Pollution_Level
Physical_Activity_Level
Occupation_Type
Comorbidities
Medication_Adherence
Number_of_ER_Visits
Peak_Expiratory_Flow
FeNO_Level
Has_Asthma
Asthma_Control_Level
Of the seventeen variables present in the data set, nine will be kept for further analysis. BMI and Age will be turned into categorical groups for general comparison purposes based on broad categories affecting Asthma Levels.
asthma <- asthma |>
  mutate(
    # Age bands used for group comparisons
    Age_groups = case_when(
      Age < 26 ~ "25 & Under",
      Age >= 26 & Age < 51 ~ "26 - 50",
      Age >= 51 & Age < 76 ~ "51 - 75",
      Age >= 76 ~ "76+"
    ),
    # BMI bands; conditions are evaluated in order
    BMI_groups = case_when(
      BMI < 18.5 ~ "Underweight",
      BMI <= 24.9 ~ "Normal",
      BMI <= 29.9 ~ "Overweight",
      BMI > 29.9 ~ "Obese"
    ),
    # Text label for asthma status (not retained by the select() below)
    asthma_y_n = case_when(
      Has_Asthma == 0 ~ "No Asthma",
      Has_Asthma == 1 ~ "Has Asthma"
    )
  ) |>
  select(
    Age, BMI, Gender, Allergies, Physical_Activity_Level, Occupation_Type,
    Comorbidities, Has_Asthma, Age_groups, BMI_groups, Medication_Adherence
  )

asthma_clean <- asthma
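As a quick check of the grouping rules, the sketch below applies the same cut-offs to a handful of boundary values. The data frame `toy` and its values are made up for illustration; only the cut-offs are taken from the code above.

```r
library(dplyr)

# Hypothetical boundary values, chosen to sit on either side of each cut-off
toy <- tibble(
  Age = c(25, 26, 50, 51, 75, 76),
  BMI = c(18.4, 18.5, 24.9, 25.0, 29.9, 30.0)
)

toy_grouped <- toy |>
  mutate(
    Age_groups = case_when(
      Age < 26 ~ "25 & Under",
      Age < 51 ~ "26 - 50",
      Age < 76 ~ "51 - 75",
      TRUE ~ "76+"
    ),
    BMI_groups = case_when(
      BMI < 18.5 ~ "Underweight",
      BMI <= 24.9 ~ "Normal",
      BMI <= 29.9 ~ "Overweight",
      BMI > 29.9 ~ "Obese"
    )
  )

toy_grouped
```

Because the conditions are evaluated in order, a BMI between 24.9 and 25 (for example 24.95) is labelled "Overweight" under these rules.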
The table below is interactive allowing you to explore the data set through the use of the search box.
Within the sample, 7,567 participants do not have asthma while 2,433 participants do. The mean BMI for individuals without asthma is 24.8, while those with asthma have a mean BMI of 25.9. For those with asthma the mean age is 44.7, while it is 45.0 for their counterparts without asthma.
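Group counts and means of this kind can be produced with a grouped summary. The sketch below is a minimal illustration on made-up rows, not the report's data; only the column names `Has_Asthma`, `BMI` and `Age` match the data set, and `toy` and `asthma_summary` are hypothetical names.

```r
library(dplyr)

# Made-up rows standing in for asthma_clean
toy <- tibble(
  Has_Asthma = c(0, 0, 0, 1, 1),
  BMI = c(22.0, 25.0, 28.0, 24.0, 30.0),
  Age = c(30, 40, 50, 20, 60)
)

# One row per asthma status with the group size and group means
asthma_summary <- toy |>
  group_by(Has_Asthma) |>
  summarise(
    n = n(),
    mean_BMI = mean(BMI),
    mean_Age = mean(Age)
  )

asthma_summary
```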
Within the sample, the largest group of participants (4,967) does not have a co-morbidity. A further 4,047 participants presented with one co-morbidity (2,029 with diabetes and 2,018 with hypertension), while fewer than 1,000 participants (986) have both diabetes and hypertension.
We began the analysis by looking at the tables below, which cross-tabulate comorbidities and BMI groups against age groups (figures 1a and 1b). Next we faceted bar charts of the variables to visualize the numbers as a percentage of their group (figure 2).
hold <- table(asthma_clean$Age_groups, asthma_clean$Comorbidities)
knitr::kable(hold, caption = "Figure 1a: Table of Comorbidities by Age Group")
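Figure 1a: Table of Comorbidities by Age Group

|            | Both | Diabetes | Hypertension | None |
|:-----------|-----:|---------:|-------------:|-----:|
| 25 & Under |  265 |      583 |          612 | 1387 |
| 26 - 50    |  261 |      575 |          551 | 1380 |
| 51 - 75    |  281 |      547 |          552 | 1438 |
| 76+        |  179 |      324 |          303 |  762 |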
hold <- table(asthma_clean$Age_groups, asthma_clean$BMI_groups)
knitr::kable(hold, caption = "Figure 1b: Table of BMI and Age as categorical variables")
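Figure 1b: Table of BMI and Age as categorical variables

|            | Normal | Obese | Overweight | Underweight |
|:-----------|-------:|------:|-----------:|------------:|
| 25 & Under |   1133 |   467 |        990 |         257 |
| 26 - 50    |   1112 |   441 |        969 |         245 |
| 51 - 75    |   1158 |   443 |        945 |         272 |
| 76+        |    632 |   241 |        546 |         149 |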
For males without asthma, hypertension was reported more than diabetes while it was reported less than diabetes in males with asthma.
#| message: false
#| warning: false
#| paged-print: false
asthma_clean |>
  ggplot(aes(x = BMI_groups, fill = Age_groups)) +
  geom_bar(position = "fill") +
  scale_y_continuous(name = "Percentage", labels = label_percent()) +
  facet_wrap(~Has_Asthma) +
  labs(
    x = "BMI Levels",
    title = "Figure 2a: Faceted Bar charts of BMI and Age Groups",
    subtitle = "Faceted by (Has_Asthma)"
  )
From Figure 2a, we see that individuals with asthma in the 26–50 age group seem to have a higher proportion of obesity when compared to individuals without asthma in the same age range, while individuals without asthma in the 25 & under age range seem to have a higher proportion of obesity when compared to individuals with asthma in the same age range. In figure 2b the outlines for BMI are shown in a boxplot.
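With `position = "fill"`, each bar in Figure 2a is rescaled to 100%, so a coloured segment shows the share of one age group within a given BMI category and asthma status. A minimal sketch of the numbers behind such a chart, on made-up rows (`toy` and `bar_shares` are hypothetical names, and the values are not the report's data):

```r
library(dplyr)

# Made-up rows standing in for asthma_clean
toy <- tibble(
  Has_Asthma = c(0, 0, 0, 0, 1, 1),
  BMI_groups = c("Normal", "Normal", "Normal", "Obese", "Obese", "Obese"),
  Age_groups = c("26 - 50", "26 - 50", "76+", "76+", "26 - 50", "76+")
)

# Count each combination, then divide by the total of its bar
# (one bar = one BMI group within one facet)
bar_shares <- toy |>
  count(Has_Asthma, BMI_groups, Age_groups) |>
  group_by(Has_Asthma, BMI_groups) |>
  mutate(share = n / sum(n)) |>
  ungroup()

bar_shares
```

Reading the shares from a table like this can make it easier to compare segments that sit at different heights in the stacked bars.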
#| message: false
#| warning: false
Age_distribution_by_group <- ggplot(asthma_clean, aes(x = Age_groups)) +
  geom_bar() +
  labs(title = "Figure 3b: Bar chart of Age Group Distribution") +
  theme_minimal()

Age_distribution <- ggplot(asthma_clean, aes(x = Age, color = Gender)) +
  geom_histogram() +
  labs(title = "Figure 3a: Bar chart of Age Distribution by gender") +
  theme_minimal()

## combined chart to show how things differ
Age_distribution + Age_distribution_by_group
Figures 3a & 3b show the distribution of age. Age has a fairly normal distribution, with a slight spike of individuals in their 30s and the fewest members in the 76+ group.
#| message: false
#| warning: false
ggplot(asthma_clean, aes(x = Has_Asthma, fill = Gender)) +
  geom_bar() +
  scale_x_discrete(
    name = "Asthma Presence",
    breaks = c(0, 1),
    labels = c("No Asthma", "Has Asthma")
  ) +
  # Label and arrow for the asthma group
  annotate(
    geom = "label", x = 1.5, y = 4100,
    label = "24.33% Have Asthma",
    hjust = "center", vjust = "bottom", color = "red"
  ) +
  annotate(
    geom = "segment", x = 1.5, y = 4100, xend = 1.1, yend = 2500,
    color = "blue", arrow = arrow(type = "closed")
  ) +
  # Label and arrow for the no-asthma group
  annotate(
    geom = "label", x = .75, y = 5500,
    label = "75.67% Do Not Have Asthma",
    hjust = "left", color = "red"
  ) +
  annotate(
    geom = "segment", x = .75, y = 5500, xend = 0.5, yend = 5000,
    color = "blue", arrow = arrow(type = "closed")
  ) +
  labs(title = "Figure 4: Bar chart For Presence of Asthma by Gender")
Figure 4 shows the count of individuals with and without asthma by gender. In both groups there are approximately equal numbers of males and females, at about 48% each.
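The percentages in the Figure 4 annotations follow from the group counts reported earlier (7,567 without asthma and 2,433 with asthma). A small sketch of that arithmetic, which could also be used to build the label text rather than typing it by hand (the names `counts`, `pct` and `labels` are illustrative):

```r
# Counts reported earlier in the report
counts <- c("No Asthma" = 7567, "Has Asthma" = 2433)

# Share of the sample in each group, rounded to two decimals
pct <- round(100 * counts / sum(counts), 2)

# Label text in the same style as the Figure 4 annotations
labels <- c(
  sprintf("%.2f%% Do Not Have Asthma", pct[["No Asthma"]]),
  sprintf("%.2f%% Have Asthma", pct[["Has Asthma"]])
)

labels
# "75.67% Do Not Have Asthma" "24.33% Have Asthma"
```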
#| message: false
#| warning: false
# histogram for occ type in BMI
ggplot(asthma_clean, aes(x = BMI_groups, fill = as.character(Has_Asthma))) +
  facet_wrap(~Occupation_Type) +
  geom_bar() +
  scale_fill_discrete(labels = c("Does Not Have Asthma", "Has Asthma")) +
  labs(
    title = "Figure 5: Faceted Histogram for BMI by Asthma Presence and Occupation Location",
    fill = "Asthma Presence"
  )
In the following histogram (figure 5) we compared BMI categories of individuals with and without asthma based on whether they worked indoors or outdoors. BMI does not show any correlation with asthma presence in this visualization.
library(ggridges)
ggplot(
  asthma_clean,
  aes(
    x = BMI, y = as.character(Has_Asthma),
    fill = Occupation_Type, color = as.character(Has_Asthma)
  )
) +
  geom_density_ridges(alpha = .4, show.legend = FALSE) +
  labs(title = "Figure 6: Density Plot of BMI Distribution by occupation and asthma status")
In figure 6 we used a density graph to analyse the distribution of BMI by occupation (indoor or outdoor) and asthma status.
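A density plot shows overlap visually. One way to put a number on how far apart two groups sit is a standardized mean difference (Cohen's d) alongside a Welch t-test. The sketch below runs on simulated values only: the group sizes, means and standard deviations are placeholders, not the report's data, and it is not part of the original analysis.

```r
set.seed(1)

# Simulated BMI values, NOT the report's data: sizes, means and spread are placeholders
bmi_no_asthma <- rnorm(200, mean = 25, sd = 4)
bmi_asthma <- rnorm(200, mean = 26, sd = 4)

# Cohen's d: difference in means divided by the pooled standard deviation
n1 <- length(bmi_no_asthma)
n2 <- length(bmi_asthma)
pooled_sd <- sqrt(
  ((n1 - 1) * var(bmi_no_asthma) + (n2 - 1) * var(bmi_asthma)) / (n1 + n2 - 2)
)
cohens_d <- (mean(bmi_asthma) - mean(bmi_no_asthma)) / pooled_sd

# Welch two-sample t-test (the default when t.test() is given two vectors)
welch <- t.test(bmi_asthma, bmi_no_asthma)

cohens_d
welch$p.value
```

With real data, the two vectors would be the BMI values of `asthma_clean` split by `Has_Asthma`. A small d together with heavily overlapping densities would be consistent with a weak relationship.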
#| message: false
#| warning: false
p1 <- ggplot(asthma_clean, aes(x = Medication_Adherence, color = Gender)) +
  geom_histogram() +
  facet_wrap(~Physical_Activity_Level) +
  labs(
    title = "Figure 7a: Stacked bar chart of Medication Adherence by Activity Level and Gender",
    subtitle = "Individuals with Asthma"
  )

p2 <- ggplot(asthma_clean, aes(x = Medication_Adherence, fill = Gender)) +
  geom_density(alpha = .4) +
  labs(
    title = "Figure 7b: Histogram of Medication Adherence by Gender",
    subtitle = "Individuals with Asthma"
  )

# patchwork: p1 stacked above p2
p1 / p2
In figure 7a we used stacked histograms to show the distribution of medication adherence for individuals with asthma, faceted by physical activity level (Active, Moderate, Sedentary) and colored by gender. Underneath, in figure 7b, we used a density plot to represent the medication adherence for individuals with asthma, filled by gender. Based on figures 7a & 7b we concluded that medication adherence had no significant correlation with gender or physical activity level.
# Asthma by Age-Group pie chart
age_group_pie <- ggplot(age_asthma, aes(x = "", y = count, fill = age_groups)) +
  geom_bar(stat = "identity", width = 1) +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5), size = 5) +
  coord_polar("y") +
  theme_void() +
  ggtitle("Figure 8: Asthma Percentage Based on Age Group") +
  labs(fill = "Age Group") +
  scale_fill_manual(values = c("lightblue", "palegreen", "pink", "purple"))

age_group_pie
Here we created another data set called ‘asthma_filtered2’, which adds the ‘age_groups’ variable by categorising our original variable ‘Age’ into four groupings: young, middle-aged, senior, and elder. The code then filtered the ‘Has_Asthma’ variable to keep only the records coded ‘1’ (does have asthma) and stored the result as a new data set called ‘age_asthma’. Once filtered, the data was grouped by the variable ‘age_groups’ and the percentage for each level was determined. A pie chart was constructed to show the percentage of asthma for each age group, using only those who recorded having asthma.
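The steps described above can be sketched as follows. This is an illustration on made-up rows, not the report's own code: the age cut-offs for the four groupings are not stated in the report, so the boundaries below are assumptions, and the `_sketch` object names are hypothetical.

```r
library(dplyr)

# Made-up rows standing in for the report's data
toy <- tibble(
  Age = c(20, 30, 40, 60, 80, 22, 55, 78),
  Has_Asthma = c(1, 1, 1, 1, 0, 0, 1, 1)
)

# Assumed cut-offs: the report does not state the boundaries it used
asthma_filtered2_sketch <- toy |>
  mutate(age_groups = case_when(
    Age < 26 ~ "young",
    Age < 51 ~ "middle-aged",
    Age < 76 ~ "senior",
    TRUE ~ "elder"
  ))

# Keep only participants with asthma, then compute each group's share
age_asthma_sketch <- asthma_filtered2_sketch |>
  filter(Has_Asthma == 1) |>
  group_by(age_groups) |>
  summarise(count = n()) |>
  mutate(
    percentage = round(100 * count / sum(count), 1),
    label = paste(percentage, "%")
  )

age_asthma_sketch
```

The resulting `count` and `label` columns are the ones the pie chart code maps to `y` and to the text labels.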
# created the data set gender_asthma, which filtered the 'Has_Asthma' variable
# and grouped it by 'Gender', returning a percent for each category
gender_asthma_pie <- ggplot(gender_asthma, aes(x = "", y = count, fill = Gender)) +
  geom_bar(stat = "identity", width = 1) +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5), size = 5) +
  coord_polar("y") +
  theme_void() +
  ggtitle("Figure 9: Gender Asthma Percentage") +
  labs(fill = "Gender") +
  scale_fill_manual(values = c("lightblue", "pink", "palegreen"))

gender_asthma_pie
This pie chart shows the percentage of people from each gender that recorded having asthma (only from those with asthma)
# Female pie chart: medication adherence for the female population
female_adherence <- ggplot(female_only_ad, aes(x = "", y = count, fill = med_adherence)) +
  geom_bar(stat = "identity", width = 1) +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5), size = 5) +
  coord_polar("y") +
  theme_void() +
  ggtitle("Figure 10a: Female Medication Adherence") +
  labs(fill = "Adherence") +
  scale_fill_manual(values = c("lightblue", "palegreen", "pink", "purple"))

male_only_ad <- asthma_filtered3 |>
  filter(Gender == "Male") |>
  group_by(med_adherence) |>
  summarize(count = n()) |>
  mutate(
    percentage = round(100 * count / sum(count), 1),
    label = print(paste(percentage, "%"))
  )
[1] "15.5 %" "33.8 %" "34 %" "16.7 %"
# Male pie chart: medication adherence for the male population
male_adherence <- ggplot(male_only_ad, aes(x = "", y = count, fill = med_adherence)) +
  geom_bar(stat = "identity", width = 1) +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5), size = 5) +
  coord_polar("y") +
  theme_void() +
  ggtitle("Figure 10b: Male Medication Adherence") +
  labs(fill = "Adherence") +
  scale_fill_manual(values = c("lightblue", "palegreen", "pink", "purple"))

female_adherence + male_adherence
#| message: false
#| warning: false
# Asthma by Age-Group pie chart
age_group_pie <- ggplot(age_asthma, aes(x = "", y = count, fill = age_groups)) +
  geom_bar(stat = "identity", width = 1) +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5), size = 5) +
  coord_polar("y") +
  theme_void() +
  ggtitle("Figure 11: Asthma Percentage Based on Age Group") +
  labs(fill = "Age Group") +
  scale_fill_manual(values = c("lightblue", "palegreen", "pink", "purple"))

age_group_pie
Here a pie chart was constructed to show the percentage of asthma for each age group only using those who recorded having asthma.
This pie chart shows the percentage of people from each gender that recorded having asthma (only from those with asthma).
#| message: false
#| warning: false
# Medication Adherence for Genders
# New data set that breaks 'Medication_Adherence' into bands under the variable 'med_adherence'
asthma_filtered3 <- asthma_filtered2 |>
  mutate(med_adherence = cut(
    Medication_Adherence,
    breaks = c(0, 0.25, 0.5, 0.75, 1),
    labels = c("Low (0% - 25%) ", "Below 50% (26% - 50%)", "Above 50% (51% - 75%)", "High (76%-100%)"),
    right = FALSE
  )) |>
  select(
    med_adherence, Gender, Occupation_Type, Has_Asthma,
    Physical_Activity_Level, BMI, Medication_Adherence, age_groups
  )

# 'female_only_ad' filters 'Gender' to 'Female' and groups the female observations
# by med_adherence, returning a percentage for each category
female_only_ad <- asthma_filtered3 |>
  filter(Gender == "Female") |>
  group_by(med_adherence) |>
  summarize(count = n()) |>
  mutate(
    percentage = round(100 * count / sum(count), 1),
    label = print(paste(percentage, "%"))
  )
[1] "15.1 %" "34.5 %" "34.2 %" "16.1 %"
# Female pie chart: medication adherence for the female population
female_adherence <- ggplot(female_only_ad, aes(x = "", y = count, fill = med_adherence)) +
  geom_bar(stat = "identity", width = 1) +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5), size = 5) +
  coord_polar("y") +
  theme_void() +
  ggtitle("Figure 13a: Female Medication Adherence") +
  labs(fill = "Adherence") +
  scale_fill_manual(values = c("lightblue", "palegreen", "pink", "purple"))

male_only_ad <- asthma_filtered3 |>
  filter(Gender == "Male") |>
  group_by(med_adherence) |>
  summarize(count = n()) |>
  mutate(
    percentage = round(100 * count / sum(count), 1),
    label = print(paste(percentage, "%"))
  )
[1] "15.5 %" "33.8 %" "34 %" "16.7 %"
# Male pie chart: medication adherence for the male population
male_adherence <- ggplot(male_only_ad, aes(x = "", y = count, fill = med_adherence)) +
  geom_bar(stat = "identity", width = 1) +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5), size = 5) +
  coord_polar("y") +
  theme_void() +
  ggtitle("Figure 13b: Male Medication Adherence") +
  labs(fill = "Adherence") +
  scale_fill_manual(values = c("lightblue", "palegreen", "pink", "purple"))

female_adherence + male_adherence
This code created pie charts showing the medication adherence for the female and male populations.
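The adherence bands are built with `cut()` and `right = FALSE`, which makes each interval closed on the left and open on the right. The sketch below uses made-up values and shortened labels to show how boundary values are assigned under that setting. Whether the data contain an adherence value of exactly 1 is not known from the report.

```r
breaks <- c(0, 0.25, 0.5, 0.75, 1)
bands <- c("Low", "Below 50%", "Above 50%", "High")

# Made-up adherence values sitting on the band boundaries
x <- c(0, 0.25, 0.5, 0.99, 1)

# right = FALSE: intervals are [0, 0.25), [0.25, 0.5), [0.5, 0.75), [0.75, 1)
open_top <- cut(x, breaks = breaks, labels = bands, right = FALSE)

# include.lowest = TRUE also closes the last interval, so 1 lands in "High"
closed_top <- cut(x, breaks = breaks, labels = bands, right = FALSE,
                  include.lowest = TRUE)

data.frame(x, open_top, closed_top)
```

In `open_top` a value of exactly 1 becomes `NA`, so it would appear as a separate group in a grouped summary. Adding `include.lowest = TRUE` is one way to keep it in the top band.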
Results
Medication adherence had no significant correlation with gender or physical activity level
BMI does not show any correlation with asthma presence
Conclusion
This project analyzed factors associated with asthma prevalence. Of the eleven variables analysed, no strong correlations with asthma presence were present for most demographic and lifestyle factors, including gender, occupation, and physical activity level.
BMI appears to have a weak relationship with asthma, suggesting a need for further analysis with different variables. The density plot used for figure six showed a slight shift toward higher BMI values for individuals with asthma when observing by occupation type (indoor or outdoor). However, the overlap between groups indicates that BMI is not a strong predictor of asthma presence in this data set. Likewise, medication adherence did not vary enough across gender or physical activity levels to suggest a correlation between the variables.
In conclusion, the absence of evidence supporting strong correlations between the observed variables does not imply that asthma is random but rather it highlights the need for additional variables like individual triggers/allergies and family history in future analysis.