The data set (fev_dataset) is a data frame containing 654 observations (rows) and 5 variables (columns). All 5 variables are numeric: age (measured in years), FEV (Forced Expiratory Volume, measured in litres), ht (height, measured in inches), sex, and smoke. The sex variable is binary, with 0 meaning female and 1 meaning male. The smoke variable is also binary: 1 indicates that the child has someone who smokes in their household, whereas 0 indicates that there are no smokers in the household.
'data.frame': 654 obs. of 5 variables:
$ age : int 9 8 7 9 9 8 6 6 8 9 ...
$ FEV : num 1.71 1.72 1.72 1.56 1.9 ...
$ ht : num 57 67.5 54.5 53 57 61 58 56 58.5 60 ...
$ sex : int 0 0 0 1 1 0 0 0 0 0 ...
$ smoke: int 0 0 0 0 0 0 0 0 0 0 ...
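Because sex and smoke are stored as 0/1 integers, tables and plots can be easier to read once the codes are turned into labelled factors. The following is a minimal sketch that uses only the first five rows visible in the str() output above (FEV rounded as printed); the names fev_head, sex_label and smoke_label are made up for illustration and are not part of the original analysis.

```r
# First five rows as printed by str() above (FEV values are rounded as displayed)
fev_head <- data.frame(
  age   = c(9L, 8L, 7L, 9L, 9L),
  FEV   = c(1.71, 1.72, 1.72, 1.56, 1.90),
  ht    = c(57, 67.5, 54.5, 53, 57),
  sex   = c(0L, 0L, 0L, 1L, 1L),
  smoke = c(0L, 0L, 0L, 0L, 0L)
)

# Recode the 0/1 indicators as factors, using the coding described in the text
fev_head$sex_label <- factor(fev_head$sex, levels = c(0, 1),
                             labels = c("Female", "Male"))
fev_head$smoke_label <- factor(fev_head$smoke, levels = c(0, 1),
                               labels = c("No smoker in household",
                                          "Smoker in household"))

# Count the rows in each category
table(fev_head$sex_label)
```

The same two factor() calls could be applied to fev_dataset itself so that later plots show the labels rather than 0 and 1.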
hist(fev_dataset$age, xlab = "Age (years)", ylab = "Frequency", main = "Age Distribution of Participants with Asthma")
The histogram shows how many participants fall into each age group; it does not display FEV. From the histogram, most of the children in the sample are between the ages of 5 and 10.
boxplot(fev_dataset$ht, ylab = "Height (inches)", main = "Heights of Children with Asthma")
As seen in the box plot above, the median line sits near the center of the box, which suggests that the heights of the children are roughly symmetrically distributed. The box plot shows height only, so it does not describe how FEV is distributed.
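The visual judgement of symmetry can be backed up numerically by comparing the distance from the lower quartile to the median with the distance from the median to the upper quartile. This is a rough sketch that uses only the ten heights printed in the str() output, so it illustrates the calculation and says little about the full 654 observations; ht_sample, lower_half and upper_half are illustrative names.

```r
# The ten heights visible in the str() output (inches)
ht_sample <- c(57, 67.5, 54.5, 53, 57, 61, 58, 56, 58.5, 60)

# Quartiles and median (R's default quantile definition, type 7)
q <- quantile(ht_sample, probs = c(0.25, 0.5, 0.75))

# Widths of the lower and upper halves of the box
lower_half <- q[["50%"]] - q[["25%"]]
upper_half <- q[["75%"]] - q[["50%"]]

# Similar widths suggest a symmetric middle 50% of the data
c(lower_half = lower_half, upper_half = upper_half)
```

On the full data the same code would be run with fev_dataset$ht in place of ht_sample.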
mosaicplot(~ sex + smoke, data = fev_dataset, main = "Gender vs Smoke Exposure of Children")
The mosaic plot shows the distribution of males and females and their exposure to smokers. From the mosaic plot, males appear more likely than females to have no exposure to smokers in the household.
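A mosaic plot is a picture of a two-way table, so the comparison can also be read from row proportions. The sketch below shows the mechanics with table() and prop.table(). The ten rows in toy are invented purely for illustration and are not values from fev_dataset; only the 0/1 coding follows the text.

```r
# Hypothetical toy data, NOT taken from fev_dataset
# sex: 0 = female, 1 = male; smoke: 0 = no smoker in household, 1 = smoker
toy <- data.frame(
  sex   = c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1),
  smoke = c(1, 1, 0, 0, 1, 0, 0, 0, 0, 0)
)

# Two-way table of counts: rows are sex, columns are smoke
counts <- table(sex = toy$sex, smoke = toy$smoke)

# Proportion of each smoke category within each sex (each row sums to 1)
row_props <- prop.table(counts, margin = 1)
row_props
```

With the real data, replacing toy with fev_dataset would give the share of males and of females with and without household smoke exposure.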
Question 2
For the second question, I will produce scatterplots of FEV against two variables, the age and the height of the children, with each of these on the x-axis.
#| label: line_of_best_fit
# FEV is the response (y-axis) and age is the predictor (x-axis)
plot(FEV ~ age, data = fev_dataset, main = "FEV against Age of Children with Asthma", xlab = "Age (years)", ylab = "FEV (litres)", pch = 1)
# Fit the simple linear regression and overlay the line of best fit
my_model <- lm(FEV ~ age, data = fev_dataset)
abline(my_model, col = "purple", lty = 1)
The scatter plot above shows a positive relationship: as the age of the children increases, FEV also increases. This suggests that lung function, as measured by FEV, improves as the children get older.
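The direction of the relationship can also be read from the fitted model, because the sign of the slope coefficient gives the direction of the line of best fit. The sketch below shows how to extract it, using only the first five rows printed in the str() output (FEV rounded as displayed). Five rows are far too few to reproduce the pattern seen in the full data, so the numbers it produces should not be interpreted; fev_head and fit are illustrative names.

```r
# First five rows as printed by str() (illustration only)
fev_head <- data.frame(
  age = c(9, 8, 7, 9, 9),
  FEV = c(1.71, 1.72, 1.72, 1.56, 1.90)
)

# Simple linear regression with FEV as the response and age as the predictor
fit <- lm(FEV ~ age, data = fev_head)

# Intercept and slope; a positive slope means FEV rises with age
coef(fit)

# The slope on its own
slope <- coef(fit)[["age"]]
```

Running summary(my_model) on the model fitted to fev_dataset would additionally report the standard error and p-value for the slope.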
#| label: line_of_best_fit_height
# FEV is the response (y-axis) and height is the predictor (x-axis)
plot(FEV ~ ht, data = fev_dataset, main = "FEV against Height of Children with Asthma", xlab = "Height (inches)", ylab = "FEV (litres)", pch = 1)
# Fit the simple linear regression and overlay the line of best fit
my_model <- lm(FEV ~ ht, data = fev_dataset)
abline(my_model, col = "purple", lty = 1)
As seen in the scatter plot above, the relationship between height and FEV is positive: as the height of the children increases, FEV also increases. This indicates that taller participants tend to have a larger FEV, which suggests greater lung capacity.
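The strength of a linear relationship like this one can be summarised with the Pearson correlation coefficient, which lies between -1 and 1 and is positive when the two variables increase together. This is a minimal sketch of the calculation on the first five rows printed in the str() output (FEV rounded as displayed), so the value it gives is not the correlation for the full data set; ht_head, fev_values and r are illustrative names.

```r
# First five heights and FEV values as printed by str() (illustration only)
ht_head    <- c(57, 67.5, 54.5, 53, 57)
fev_values <- c(1.71, 1.72, 1.72, 1.56, 1.90)

# Pearson correlation between height and FEV
r <- cor(ht_head, fev_values, method = "pearson")
r
```

For the full analysis, cor(fev_dataset$ht, fev_dataset$FEV) would give the corresponding value for all 654 children.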