if (!require(NHANES)) install.packages('NHANES')
## Loading required package: NHANES
library(NHANES)
data(NHANES)
d <- NHANES
if (!require(fastDummies)) install.packages('fastDummies')
## Loading required package: fastDummies
## Thank you for using fastDummies!
## To acknowledge our work, please cite the package:
## Kaplan, J. & Schlegel, B. (2023). fastDummies: Fast Creation of Dummy (Binary) Columns and Rows from Categorical Variables. Version 1.7.1. URL: https://github.com/jacobkap/fastDummies, https://jacobkap.github.io/fastDummies/.
library(fastDummies)
d <- dummy_cols(d,select_columns = c("PhysActive"))
?NHANES
Two continuous variables: BMI, Poverty. Two categorical variables: AgeDecade, Smoke100.
What is the relationship between being physically active and BMI, age (decade), poverty, and smoking history in Americans during the years 2009–2012?
d <- d[c("PhysActive","BMI","Poverty","AgeDecade","Smoke100","PhysActive_Yes","PhysActive_No")]
dim(d)
## [1] 10000 7
d <- na.omit(d)
### Scatterplot of independent variable BMI and dependent variable PhysActive_Yes
plot(d$BMI, d$PhysActive_Yes)
### Scatterplot of independent variable Poverty and dependent variable PhysActive_Yes
plot(d$Poverty, d$PhysActive_Yes)
### Two-way table for physical activity status and AgeDecade
with(d, table(AgeDecade, PhysActive_Yes))
## PhysActive_Yes
## AgeDecade 0 1
## 0-9 0 0
## 10-19 0 0
## 20-29 420 825
## 30-39 523 723
## 40-49 593 691
## 50-59 584 625
## 60-69 425 421
## 70+ 332 190
### Two-way table for physical activity status and Smoke100
with(d, table(Smoke100, PhysActive_Yes))
## PhysActive_Yes
## Smoke100 0 1
## No 1457 2095
## Yes 1420 1380
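As a rough check before fitting the model, the two-way table can be turned into proportions and an unadjusted odds ratio. The sketch below re-enters the four counts by hand (the names `tab`, `prop_active`, and `or_unadjusted` are only illustrative), so it runs without the NHANES data.

```r
# Counts copied from the Smoke100 x PhysActive_Yes table above
tab <- matrix(c(1457, 2095,
                1420, 1380),
              nrow = 2, byrow = TRUE,
              dimnames = list(Smoke100 = c("No", "Yes"),
                              PhysActive_Yes = c("0", "1")))

# Proportion physically active within each smoking group (row proportions)
prop_active <- prop.table(tab, margin = 1)[, "1"]

# Odds of being active in each group, then the unadjusted odds ratio
odds <- tab[, "1"] / tab[, "0"]
or_unadjusted <- unname(odds["Yes"] / odds["No"])

round(prop_active, 3)
round(or_unadjusted, 2)
```

This odds ratio ignores BMI, Poverty, and AgeDecade, so it should not be expected to equal the adjusted odds ratio for Smoke100Yes from the regression model.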
logit1 <- glm(PhysActive_Yes ~ BMI+Poverty+AgeDecade+Smoke100, data = d, family = "binomial")
summary(logit1)
##
## Call:
## glm(formula = PhysActive_Yes ~ BMI + Poverty + AgeDecade + Smoke100,
## family = "binomial", data = d)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.282240 0.140525 9.125 < 2e-16 ***
## BMI -0.040636 0.004109 -9.890 < 2e-16 ***
## Poverty 0.287023 0.016794 17.090 < 2e-16 ***
## AgeDecade 30-39 -0.441473 0.087187 -5.063 4.12e-07 ***
## AgeDecade 40-49 -0.711900 0.086950 -8.187 2.67e-16 ***
## AgeDecade 50-59 -0.862360 0.089461 -9.640 < 2e-16 ***
## AgeDecade 60-69 -0.925904 0.097796 -9.468 < 2e-16 ***
## AgeDecade 70+ -1.379051 0.114297 -12.066 < 2e-16 ***
## Smoke100Yes -0.275341 0.053906 -5.108 3.26e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 8749.4 on 6351 degrees of freedom
## Residual deviance: 8102.2 on 6343 degrees of freedom
## AIC: 8120.2
##
## Number of Fisher Scoring iterations: 4
exp(cbind(OR = coef(logit1), confint(logit1)))
## Waiting for profiling to be done...
## OR 2.5 % 97.5 %
## (Intercept) 3.6047069 2.7395816 4.7528065
## BMI 0.9601785 0.9524374 0.9679047
## Poverty 1.3324553 1.2894456 1.3771997
## AgeDecade 30-39 0.6430886 0.5419081 0.7627425
## AgeDecade 40-49 0.4907110 0.4136270 0.5816419
## AgeDecade 50-59 0.4221648 0.3540740 0.5028227
## AgeDecade 60-69 0.3961733 0.3268991 0.4796579
## AgeDecade 70+ 0.2518175 0.2010101 0.3146729
## Smoke100Yes 0.7593132 0.6831689 0.8439255
if (!require(jtools)) install.packages('jtools')
## Loading required package: jtools
library(jtools)
summ(logit1, exp = TRUE, confint = TRUE)
## MODEL INFO:
## Observations: 6352
## Dependent Variable: PhysActive_Yes
## Type: Generalized linear model
## Family: binomial
## Link function: logit
##
## MODEL FIT:
## χ²(8) = 647.18, p = 0.00
## Pseudo-R² (Cragg-Uhler) = 0.13
## Pseudo-R² (McFadden) = 0.07
## AIC = 8120.18, BIC = 8180.99
##
## Standard errors: MLE
## ----------------------------------------------------------------
## exp(Est.) 2.5% 97.5% z val. p
## --------------------- ----------- ------ ------- -------- ------
## (Intercept) 3.60 2.74 4.75 9.12 0.00
## BMI 0.96 0.95 0.97 -9.89 0.00
## Poverty 1.33 1.29 1.38 17.09 0.00
## AgeDecade 30-39 0.64 0.54 0.76 -5.06 0.00
## AgeDecade 40-49 0.49 0.41 0.58 -8.19 0.00
## AgeDecade 50-59 0.42 0.35 0.50 -9.64 0.00
## AgeDecade 60-69 0.40 0.33 0.48 -9.47 0.00
## AgeDecade 70+ 0.25 0.20 0.32 -12.07 0.00
## Smoke100Yes 0.76 0.68 0.84 -5.11 0.00
## ----------------------------------------------------------------
### BMI
For every one-unit increase in BMI, the odds of being physically active are multiplied by 0.96 on average (about a 4% decrease), controlling for all other independent variables.

### Poverty
For every one-unit increase in Poverty (the ratio of family income to the poverty guidelines, so higher values mean less poverty), the odds of being physically active are multiplied by 1.33 on average, controlling for all other independent variables.

### AgeDecade
AgeDecade is categorical, so each odds ratio compares one age group with the reference group, people aged 20-29, rather than describing a one-unit increase.
For people aged 30-39, the odds of being physically active are on average 0.64 times the odds for people aged 20-29, controlling for all other independent variables.
For people aged 40-49, the odds of being physically active are on average 0.49 times the odds for people aged 20-29, controlling for all other independent variables.
For people aged 50-59, the odds of being physically active are on average 0.42 times the odds for people aged 20-29, controlling for all other independent variables.
For people aged 60-69, the odds of being physically active are on average 0.40 times the odds for people aged 20-29, controlling for all other independent variables.
For people aged 70+, the odds of being physically active are on average 0.25 times the odds for people aged 20-29, controlling for all other independent variables.

### Smoke100
For people who have smoked at least 100 cigarettes in their life, the odds of being physically active are on average 0.76 times the odds for people with no such smoking history, controlling for all other independent variables.
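To make the "times the odds" wording concrete, an odds ratio can be converted into a percent change in the odds. This is a minimal sketch that re-enters three coefficients from summary(logit1) by hand; the names `b`, `or`, `pct_change`, and `or_bmi_5` are only illustrative.

```r
# Log-odds coefficients copied from summary(logit1)
b <- c(BMI = -0.040636, Poverty = 0.287023, Smoke100Yes = -0.275341)

or <- exp(b)                  # odds ratios
pct_change <- 100 * (or - 1)  # percent change in the odds per one-unit increase

round(or, 2)
round(pct_change, 1)

# For a 5-unit increase in BMI the odds are multiplied by exp(5 * b), not by 5 * OR
or_bmi_5 <- exp(5 * b[["BMI"]])
round(or_bmi_5, 2)
```

The same conversion applies to the AgeDecade rows, where the percent change is relative to the 20-29 reference group.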
### BMI
We are 95% confident that, in our population of interest, each one-unit increase in BMI multiplies the odds of being physically active by at least 0.95 and at most 0.97, controlling for all other independent variables.

### Poverty
We are 95% confident that, in our population of interest, each one-unit increase in Poverty multiplies the odds of being physically active by at least 1.29 and at most 1.38, controlling for all other independent variables.

### AgeDecade
We are 95% confident that, in our population of interest, the odds of being physically active for people aged 30-39 are at least 0.54 times and at most 0.76 times the odds for people aged 20-29, controlling for all other independent variables.
We are 95% confident that, in our population of interest, the odds of being physically active for people aged 40-49 are at least 0.41 times and at most 0.58 times the odds for people aged 20-29, controlling for all other independent variables.
We are 95% confident that, in our population of interest, the odds of being physically active for people aged 50-59 are at least 0.35 times and at most 0.50 times the odds for people aged 20-29, controlling for all other independent variables.
We are 95% confident that, in our population of interest, the odds of being physically active for people aged 60-69 are at least 0.33 times and at most 0.48 times the odds for people aged 20-29, controlling for all other independent variables.
We are 95% confident that, in our population of interest, the odds of being physically active for people aged 70+ are at least 0.20 times and at most 0.32 times the odds for people aged 20-29, controlling for all other independent variables.

### Smoke100
We are 95% confident that, in our population of interest, the odds of being physically active for people who have smoked at least 100 cigarettes in their life are at least 0.68 times and at most 0.84 times the odds for people with no such smoking history, controlling for all other independent variables.
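The interval for BMI can be approximated by hand from the estimate and its standard error. Note that this sketch computes a Wald interval, which is an assumption on my part for illustration: confint() reported "Waiting for profiling to be done...", so it used a profile-likelihood method, and the two can differ slightly in later decimals.

```r
# Estimate and standard error for BMI copied from summary(logit1)
est <- -0.040636
se  <- 0.004109

# Wald 95% interval on the log-odds scale, then exponentiated to the odds-ratio scale
z_crit  <- qnorm(0.975)
wald_ci <- exp(est + c(-1, 1) * z_crit * se)

round(wald_ci, 3)
```

Rounded to two decimals, this gives the same 0.95 to 0.97 range as the table.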
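### BMI
The p-value is very low (p < 0.001), so we reject the null hypothesis that there is no association between BMI and PhysActive, controlling for all other independent variables. The data support the alternative hypothesis that there is an association.

### Poverty
The p-value is very low (p < 0.001), so we reject the null hypothesis that there is no association between Poverty and PhysActive, controlling for all other independent variables. The data support the alternative hypothesis that there is an association.

### AgeDecade
The p-value for each age group compared with the 20-29 reference group is very low (p < 0.001), so we reject the null hypothesis that there is no association between AgeDecade and PhysActive, controlling for all other independent variables. The data support the alternative hypothesis that there is an association.

### Smoke100
The p-value is very low (p < 0.001), so we reject the null hypothesis that there is no association between Smoke100 and PhysActive, controlling for all other independent variables. The data support the alternative hypothesis that there is an association.

A low p-value is evidence against the null hypothesis; it does not prove that the alternative hypothesis is true.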
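The z values and p-values in the summary table come from dividing each estimate by its standard error. A minimal sketch with two rows copied by hand from summary(logit1) (the names `est`, `se`, `z`, and `p` are only illustrative):

```r
# Estimates and standard errors copied from summary(logit1)
est <- c(BMI = -0.040636, Smoke100Yes = -0.275341)
se  <- c(BMI =  0.004109, Smoke100Yes =  0.053906)

z <- est / se              # Wald z statistic
p <- 2 * pnorm(-abs(z))    # two-sided p-value from the standard normal

round(z, 2)
signif(p, 3)
```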
if (!require(DescTools)) install.packages('DescTools')
## Loading required package: DescTools
##
## Attaching package: 'DescTools'
## The following object is masked from 'package:jtools':
##
## %nin%
library(DescTools)
PseudoR2(logit1)
## McFadden
## 0.07396876
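The Pseudo R^2 (McFadden) is 0.074 for our regression model logit1, which is a low value. It means the model improves the log-likelihood by only about 7% compared with a model that has no independent variables, so BMI, Poverty, AgeDecade, and Smoke100 together explain little of the variation in PhysActive. The regression model will not reliably predict whether someone is 1 or 0 on the dependent variable, even though each independent variable is statistically significant.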
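McFadden's value can be reproduced from the two deviances printed by summary(logit1). This sketch assumes a 0/1 response, where the deviance equals -2 times the log-likelihood, so the ratio of deviances equals the ratio of log-likelihoods; the variable names are only illustrative.

```r
# Deviances and degrees of freedom copied from summary(logit1)
null_dev  <- 8749.4   # intercept-only model, 6351 df
resid_dev <- 8102.2   # fitted model, 6343 df

# McFadden pseudo R^2 = 1 - (fitted log-likelihood / null log-likelihood)
mcfadden <- 1 - resid_dev / null_dev

# Likelihood-ratio chi-square comparing the fitted model with the null model
lr_chisq <- null_dev - resid_dev
lr_df    <- 6351 - 6343
lr_p     <- pchisq(lr_chisq, df = lr_df, lower.tail = FALSE)

round(mcfadden, 3)
round(lr_chisq, 1)
lr_df
```

The chi-square agrees with the χ²(8) = 647.18 line in the summ() output up to rounding of the deviances. Together the two numbers say that the independent variables clearly improve on the null model, but only by a small amount.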
if (!require(ResourceSelection)) install.packages('ResourceSelection')
## Loading required package: ResourceSelection
## ResourceSelection 0.3-6 2023-06-27
library(ResourceSelection)
hoslem.test(logit1$y,predict(logit1, type = "response"))
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: logit1$y, predict(logit1, type = "response")
## X-squared = 34.11, df = 8, p-value = 3.881e-05
The p-value for the Hosmer-Lemeshow test on our model logit1 is extremely low (p < 0.001), so we reject the null hypothesis that the model fits the data well. This supports the alternative hypothesis that this logistic regression model does not fit our data well.
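The reported p-value is the upper tail of a chi-square distribution. A short sketch using the statistic and degrees of freedom printed above (8 df corresponds to the default of 10 groups minus 2); because the statistic is entered rounded to 34.11, the result may differ from the printed p-value in the last digits.

```r
# Test statistic and degrees of freedom copied from the hoslem.test() output
hl_stat <- 34.11
hl_df   <- 8

# Upper-tail probability of the chi-square distribution
hl_p <- pchisq(hl_stat, df = hl_df, lower.tail = FALSE)

# Critical value at the 0.05 level, for comparison with the statistic
hl_crit <- qchisq(0.95, df = hl_df)

signif(hl_p, 3)
round(hl_crit, 2)
```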
For the independent variables AgeDecade and Smoke100, I did not write any code to create dummy variables, so R created them automatically. Is that correct? So do we only need to make dummy variables for the dependent variable to code the regression model? Can you review the p-value and the Pseudo R^2 interpretations?
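One way to check the dummy-variable question is to look at the design matrix R builds from factor predictors. The sketch below uses a small made-up data frame (`toy` is hypothetical, not the NHANES sample) and model.matrix(), which applies the same factor coding that glm() uses internally.

```r
# Hypothetical toy data, not the NHANES sample
toy <- data.frame(
  PhysActive = factor(c("No", "Yes", "Yes", "No", "Yes", "No")),
  AgeDecade  = factor(c("20-29", "30-39", "40-49", "20-29", "30-39", "40-49")),
  Smoke100   = factor(c("No", "Yes", "No", "Yes", "No", "Yes"))
)

# Each factor becomes 0/1 columns, one per level except the first (reference) level
X <- model.matrix(~ AgeDecade + Smoke100, data = toy)
colnames(X)

# A 0/1 version of the dependent variable, equivalent to the PhysActive_Yes column
y01 <- as.integer(toy$PhysActive == "Yes")
y01
```

The column names show that the first level of each factor (20-29 and No) is dropped as the reference group, which is why the model output has no row for AgeDecade 20-29. For a binomial glm() the dependent variable can also be passed as a two-level factor, with the first level treated as failure, so a dummy column for PhysActive is one option rather than a requirement.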