I removed anyone who did not complete the survey, anyone with a missing value on poor health, income, or state, and anyone living outside the 50 states (the District of Columbia, Guam, and Puerto Rico). I saved the filtered data frame as “Health.Test.” Next, I kept only the variables used in my analyses (poorhlth, sleptim1, pa1min_, X_incomg, and X_state) and gave them descriptive names. Finally, I aggregated the data so that each row holds the mean of poorhlth, sleptim1, and pa1min_ for one combination of state (X_state) and income group (X_incomg); missing sleep and physical activity values were ignored when computing these means. Because there are 50 states and 5 income groups, this reduced the original dataset from 491,775 observations to 250 (50 × 5) observations, which I used to test the relationships among the selected variables.
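The aggregation step can be illustrated on a small made-up data frame. The values, the states "A" and "B", and the income labels "Low"/"High" below are hypothetical, not BRFSS data; the sketch only reuses the same `aggregate()` call pattern as the analysis code to show that each state-by-income combination becomes one row and that `na.rm = TRUE` skips missing values inside a group.

```r
# Hypothetical toy data: 2 states x 2 income groups, 2 respondents each,
# with one missing sleep value (state A, low income)
toy <- data.frame(
  poor.health = c(2, 4, 6, 8, 1, 3, 5, 7),
  sleep       = c(7, NA, 6, 6, 8, 8, 7, 5),
  exercise    = c(100, 200, 50, 150, 300, 100, 0, 40),
  income      = rep(c("Low", "Low", "High", "High"), times = 2),
  state       = rep(c("A", "B"), each = 4)
)

# One row per income-by-state combination holding the mean of each numeric
# column; na.rm = TRUE is passed through to mean()
toy.agg <- aggregate(toy[, 1:3],
                     by = list(income = toy$income, state = toy$state),
                     FUN = mean, na.rm = TRUE)
print(toy.agg)
```

With 2 states × 2 income groups the result has 4 rows, and state A's low-income sleep mean is 7 because the missing value is dropped. The same arithmetic, 50 states × 5 income groups, accounts for the 250 rows of the aggregated BRFSS data.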
library(tidyverse)  # dplyr (filter), tibble (tibble), ggplot2 (plots)
load("brfss2013.RData")
# Filter values
# Kept only people who completed the survey
# Removed missing values (NA) on poor health, income, and state
# Removed non-states (District of Columbia, Guam, Puerto Rico)
Health.Test <- filter(brfss2013,
                      dispcode == "Completed interview",
                      !is.na(poorhlth),
                      !is.na(X_incomg),
                      !is.na(X_state),
                      !X_state %in% c("District of Columbia", "Guam", "Puerto Rico"))
# Included only variables of interest and renamed variables
Health.Test <- tibble('poor.health'=Health.Test$poorhlth,
'sleep'=Health.Test$sleptim1,
'exercise'=Health.Test$pa1min_,
'income'=Health.Test$X_incomg,
'State'=Health.Test$X_state)
# Aggregated data by state and income levels (mean of each numeric variable;
# na.rm = TRUE drops missing sleep/exercise values within each group)
Health.Test.agg_mean <- aggregate(Health.Test[, 1:3],
                                  by = list(income = Health.Test$income,
                                            state = Health.Test$State),
                                  FUN = mean, na.rm = TRUE)
The sample used in this investigation is not fully random: the data include only people who agreed to participate in the survey, so people who declined are excluded. The analyses I conducted examine the relationships among a select group of variables. Although these relationships are informative, they cannot establish directionality (meaning, one cannot conclude that there is a causal link between the variables), and they do not account for the possibility that variables not included in my analyses (e.g., race and gender) influence the dependent variable.
* * *
Research question 1: The first research question I asked is whether there is a negative relationship between the amount of sleep and the number of days poor physical or mental health prevented everyday activities. This question is informative because people often underestimate the health benefits of getting enough restful sleep. Researchers have reported that sleep is essential for good physical health (e.g., weight loss) and mental health (e.g., memory consolidation). Thus, I predicted that the more people sleep each night, the fewer physical/mental health problems they would report.
Research question 2: The second research question I asked is whether there is a negative relationship between the amount of physical activity and the number of days poor physical or mental health prevented everyday activities. This question is informative because many people believe they need to be more physically active to improve their overall health, yet research on physical activity has produced mixed results: some researchers report improved overall health, and others report no relationship.
Research question 3: The third research question I asked is whether income, sleep, and physical activity together predict physical/mental health. The previous two research questions each examined the correlation between only two variables, so this analysis uses a multiple linear regression. The dependent variable is the number of days poor physical or mental health prevented people from doing their usual activities, and the independent variables are income, sleep, and physical activity. The regression tests whether these predictors, entered together, explain a significant share of the variance in the dependent variable, and whether each predictor contributes when the others are held constant. The new variable in this analysis is income. I hypothesized that people with more financial resources would have better physical/mental health. This matters because anyone interested in improving physical/mental health for all Americans needs to consider how income affects it.
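Written out, the model for research question 3 looks like the equation below. This is a sketch assuming R's default treatment (dummy) coding of the income factor, with "less than $15,000" as the baseline; the indicator names D (income bands in thousands of dollars) are introduced here for illustration only. Four income dummies plus sleep and exercise give k = 6 predictors, which with n = 250 groups accounts for the F(6, 243) degrees of freedom reported in the results.

$$
\widehat{\text{poor.health}} = \beta_0 + \beta_1 D_{15\text{-}25} + \beta_2 D_{25\text{-}35} + \beta_3 D_{35\text{-}50} + \beta_4 D_{50+} + \beta_5\,\text{sleep} + \beta_6\,\text{exercise}
$$

$$
k = 6, \qquad n - k - 1 = 250 - 6 - 1 = 243
$$

Each D equals 1 when a group falls in that income band and 0 otherwise, so β1 through β4 compare each band with the baseline while sleep and exercise are held constant.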
Research question 1: To test the first research question, I ran a Pearson correlation between the amount of sleep and physical/mental health. This correlation showed a significant negative association between the number of days poor physical or mental health prevented usual activities and the number of hours a person slept, r(248) = -.55, p < .001. In state-by-income groups where people slept more each night, people reported fewer days on which their physical or mental health kept them from their everyday activities.
# Scatter Plot
# theme_bw() is a complete theme, so it goes before theme() to avoid
# overwriting the legend setting; an explicit formula silences the
# geom_smooth() message
ggplot(Health.Test.agg_mean,
       aes(sleep, poor.health)) +
  geom_point(alpha = .40) +
  geom_smooth(method = "lm", formula = y ~ x) +
  xlab("# Hours Asleep") +
  ylab("Days Physical/Mental Health Prevented Activities") +
  theme_bw() +
  theme(legend.position = "bottom")
##
## Pearson's product-moment correlation
##
## data: sleep and poor.health
## t = -10.459, df = 248, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.6338011 -0.4607988
## sample estimates:
## cor
## -0.5532372
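The numbers in this output are tied together by standard formulas: with n = 250 groups the degrees of freedom are n - 2 = 248, the test statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2), and `cor.test()` builds its 95% confidence interval from Fisher's z transform. The call that produced the output is not shown in this document, so the minimal sketch below simply recomputes those quantities from the printed r.

```r
# Recompute the Pearson test statistics from the printed correlation
r  <- -0.5532372   # sample estimate from the output above
n  <- 250          # number of state-by-income groups
df <- n - 2        # 248, the value reported as r(248)

t_stat  <- r * sqrt(df) / sqrt(1 - r^2)   # t statistic for H0: rho = 0
p_value <- 2 * pt(-abs(t_stat), df)       # two-sided p-value

# 95% confidence interval via Fisher's z transform, as cor.test() does
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(t_stat, 3)  # -10.459, matching the output
round(ci, 4)      # -0.6338 -0.4608
```

With the aggregated data loaded, `with(Health.Test.agg_mean, cor.test(sleep, poor.health))` is one call consistent with the `data: sleep and poor.health` line, although the original call may have differed.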
Research question 2: To test the second research question, I ran a Pearson correlation between minutes of physical activity and physical/mental health. Surprisingly, this correlation ran opposite to the predicted direction. There was a significant positive association between the number of days poor physical or mental health prevented usual activities and minutes of total physical activity per week, r(248) = .22, p < .001. In other words, groups that were more physically active reported more days on which their physical/mental health kept them from their everyday activities; because this is a correlation, it does not show that physical activity causes these days.
# Scatter Plot
# theme_bw() is a complete theme, so it goes before theme() to avoid
# overwriting the legend setting; an explicit formula silences the
# geom_smooth() message
ggplot(Health.Test.agg_mean,
       aes(exercise, poor.health)) +
  geom_point(alpha = .40) +
  geom_smooth(method = "lm", formula = y ~ x) +
  xlab("Minutes Of Total Physical Activity Per Week") +
  ylab("Days Physical/Mental Health Prevented Activities") +
  theme_bw() +
  theme(legend.position = "bottom")
##
## Pearson's product-moment correlation
##
## data: exercise and poor.health
## t = 3.5507, df = 248, p-value = 0.0004596
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.09857212 0.33487796
## sample estimates:
## cor
## 0.2199493
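Squaring a correlation gives the proportion of variance the two variables share, which puts the sleep and physical activity results on a common scale. A short sketch using the two estimates printed above:

```r
# Correlation estimates copied from the two cor.test() outputs above
r_sleep    <- -0.5532372
r_exercise <-  0.2199493

# Proportion of shared variance (r squared)
shared <- c(sleep = r_sleep^2, exercise = r_exercise^2)
round(shared, 3)   # sleep 0.306, exercise 0.048
```

Across state-by-income groups, sleep shares about 31% of its variance with poor health days, while physical activity shares under 5%, so the positive exercise association, although significant, is weak.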
Research question 3: The multiple linear regression indicated that the predictors together had a significant effect, F(6, 243) = 495.6, p < .001, R² = .92. Among the individual predictors, income and sleep were significant, ps < .001, whereas physical activity was not, p = .12. Income produced by far the largest test statistics (|t| from 19.0 to 39.9, compared with 5.2 for sleep), suggesting that income is the strongest predictor of physical/mental health in these data. According to the regression equation, holding sleep and physical activity constant, groups with incomes of $50,000 or more are predicted to report 6.6 fewer days on which their physical or mental health prevented them from doing their everyday activities than groups with incomes under $15,000 (the baseline comparison group in this model). Each additional hour of average sleep is associated with 1.7 fewer such days.
# Scatter Plot
# theme_bw() goes before theme() so the bottom legend position is kept
ggplot(Health.Test.agg_mean,
       aes(sleep, poor.health,
           color = income, size = exercise)) +
  geom_point() +
  scale_color_manual(values = c("red4", "red", "orange", "skyblue", "blue")) +
  xlab("# Hours Asleep") +
  ylab("Days Physical/Mental Health Prevented Activities") +
  theme_bw() +
  theme(legend.position = "bottom")
# Boxplot
ggplot(Health.Test.agg_mean, aes(income, poor.health, fill = income)) +
  geom_boxplot() +
  scale_fill_manual(values = c("red4", "red", "orange", "skyblue", "blue")) +
  xlab("Income") +
  ylab("Days Physical/Mental Health Prevented Activities") +
  theme_bw() +
  theme(legend.position = "none")  # fill repeats the x-axis, so the legend is redundant
# Multiple Regression
MultipleReg <- lm(poor.health ~ income + sleep + exercise, data = Health.Test.agg_mean)
summary(MultipleReg)
##
## Call:
## lm(formula = poor.health ~ income + sleep + exercise, data = Health.Test.agg_mean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.52468 -0.42979 -0.02853 0.38220 2.85483
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.0125203 2.2274577 9.882 < 2e-16
## income$15,000 to less than $25,000 -2.8254676 0.1485639 -19.019 < 2e-16
## income$25,000 to less than $35,000 -4.4922941 0.1582612 -28.385 < 2e-16
## income$35,000 to less than $50,000 -5.4071757 0.1566605 -34.515 < 2e-16
## income$50,000 or more -6.6072089 0.1654191 -39.942 < 2e-16
## sleep -1.7196834 0.3314312 -5.189 4.47e-07
## exercise -0.0011088 0.0007074 -1.567 0.118
##
## (Intercept) ***
## income$15,000 to less than $25,000 ***
## income$25,000 to less than $35,000 ***
## income$35,000 to less than $50,000 ***
## income$50,000 or more ***
## sleep ***
## exercise
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7017 on 243 degrees of freedom
## Multiple R-squared: 0.9245, Adjusted R-squared: 0.9226
## F-statistic: 495.6 on 6 and 243 DF, p-value: < 2.2e-16
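To make the dummy coding concrete, the sketch below plugs the printed coefficient estimates into the regression equation for two hypothetical groups that differ only in income. The sleep (7 hours) and exercise (300 minutes per week) values and the helper `predict_days()` are illustrative assumptions, not values from the analysis.

```r
# Coefficient estimates copied from the summary(MultipleReg) output above
b <- c(intercept = 22.0125203,
       inc50plus = -6.6072089,   # "$50,000 or more" vs. "less than $15,000"
       sleep     = -1.7196834,
       exercise  = -0.0011088)

# Hypothetical helper: predicted days for a group, given its income dummy
# effect (0 for the "less than $15,000" baseline), sleep, and weekly exercise
predict_days <- function(income_effect, sleep, exercise) {
  unname(b["intercept"] + income_effect + b["sleep"] * sleep + b["exercise"] * exercise)
}

# Two hypothetical groups with identical sleep and exercise
baseline <- predict_days(0, sleep = 7, exercise = 300)
high_inc <- predict_days(b[["inc50plus"]], sleep = 7, exercise = 300)

round(baseline, 2)             # 9.64
round(high_inc, 2)             # 3.03
round(baseline - high_inc, 2)  # 6.61, the income coefficient itself
```

Because sleep and exercise are held fixed, the difference equals the "$50,000 or more" coefficient whatever values are chosen for them. With the fitted object, `predict(MultipleReg, newdata = ...)` would produce the same kind of prediction directly, provided `newdata` uses the same income factor levels.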