You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.
Body Mass Index (BMI) and risk of cardiovascular disease: the Framingham study
What are the cases, and how many are there? The Framingham Heart Study is a long-term, ongoing cardiovascular cohort study on residents of the city of Framingham, Massachusetts. The study began in 1948 with 5,209 adult subjects from Framingham, and is now on its third generation of participants.
Describe the method of data collection. The Framingham Heart Study participants, and their children and grandchildren, voluntarily consented to undergo a detailed medical history, physical examination, and medical tests every two years, creating a wealth of data about physical and mental health, especially about cardiovascular disease. All subjects were white.
What type of study is this (observational/experiment)? A prospective, observational, longitudinal study.
If you collected the data, state self-collected. If not, provide a citation/link. www.kaggle.com
What is the response variable? Is it quantitative or qualitative? BMI. It is a quantitative variable, calculated as the subject's weight in kilograms divided by the square of the height in meters (kg/m²).
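As a quick illustration of the BMI formula, here is a minimal sketch in R. The helper name `bmi_from` and the example weight and height are hypothetical and are not taken from the dataset, which already ships a computed BMI column.

```r
# Hypothetical helper: BMI = weight (kg) / height (m)^2
bmi_from <- function(weight_kg, height_m) {
  weight_kg / height_m^2
}

# Example: a person weighing 70 kg who is 1.75 m tall
example_bmi <- bmi_from(70, 1.75)
round(example_bmi, 2)
```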
You should have two independent variables, one quantitative and one qualitative. The independent variables include sex (qualitative), age (quantitative), education (qualitative), smoking (qualitative), hypertension (qualitative), diabetes (qualitative), cholesterol (quantitative), and coronary heart disease (qualitative).
Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
Means will be calculated for all parameters in both men and women and in different age groups. The age group categories are: <30 years, 30 to 39 years, 40 to 49 years, 50 to 59 years, and ≥60 years. The majority of the individuals in the <30 years category were between 20 and 29 years of age, and the majority of the individuals in the ≥60 years category were between 60 and 69 years of age, in both men and women.
Subjects were also divided into 6 groups according to their BMI: <21.00, 21.00 to 22.99, 23.00 to 24.99, 25.00 to 27.49, 27.50 to 29.99, and ≥30.00 kg/m². These ranges are selected because they are similar to those selected in other large epidemiological studies of men and women.
To bring the distributions closer to normal, a logarithmic transformation will be applied to BMI and total cholesterol in both men and women. The PROC REG procedure will be used to test the association of BMI (as a continuous variable) with blood pressure, glucose, and plasma lipid levels after adjustment for age effects and exclusion of smokers. The odds ratios for each unit of BMI increase will be determined using PROC LOGIST, after the exclusion of smokers from the analysis to avoid residual effects of smoking.
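The age and BMI groupings described above map naturally onto cut() in base R. The following is a minimal sketch under stated assumptions: the values are toy numbers rather than the Framingham data, and the vector names and group labels are illustrative only.

```r
# Toy values, not taken from the Framingham data
age <- c(29, 30, 45, 59, 60, 68)
bmi <- c(20.5, 21.0, 24.99, 27.5, 29.99, 31.2)

# right = FALSE makes each interval closed on the left: [30, 40), [40, 50), ...
age_group <- cut(age,
                 breaks = c(-Inf, 30, 40, 50, 60, Inf),
                 labels = c("<30", "30-39", "40-49", "50-59", ">=60"),
                 right = FALSE)

bmi_group <- cut(bmi,
                 breaks = c(-Inf, 21, 23, 25, 27.5, 30, Inf),
                 labels = c("<21.00", "21.00-22.99", "23.00-24.99",
                            "25.00-27.49", "27.50-29.99", ">=30.00"),
                 right = FALSE)

# Natural-log transform, as planned for BMI and total cholesterol
log_bmi <- log(bmi)

table(age_group)
table(bmi_group)
```

Using left-closed intervals keeps boundary values such as age 60 or BMI 30.00 in the upper group, which matches the "≥" wording of the last category.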
require(rvest)
## Loading required package: rvest
## Loading required package: xml2
require(dplyr)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
require(stringr)
## Loading required package: stringr
require(tidyr)
## Loading required package: tidyr
require(ggplot2)
## Loading required package: ggplot2
require(readr)
## Loading required package: readr
##
## Attaching package: 'readr'
## The following object is masked from 'package:rvest':
##
## guess_encoding
require(broom)
## Loading required package: broom
fhs <- read_csv("https://raw.githubusercontent.com/johnpannyc/data-606-final-project/aaa4460bec757f87321b826800b2017a48b3d437/framingham.csv")
## Parsed with column specification:
## cols(
## male = col_integer(),
## age = col_integer(),
## education = col_integer(),
## currentSmoker = col_integer(),
## cigsPerDay = col_integer(),
## BPMeds = col_integer(),
## prevalentStroke = col_integer(),
## prevalentHyp = col_integer(),
## diabetes = col_integer(),
## totChol = col_integer(),
## sysBP = col_double(),
## diaBP = col_double(),
## BMI = col_double(),
## heartRate = col_integer(),
## glucose = col_integer(),
## TenYearCHD = col_integer()
## )
dim(fhs)
## [1] 4240 16
head(fhs)
## # A tibble: 6 x 16
## male age education currentSmoker cigsPerDay BPMeds prevalentStroke
## <int> <int> <int> <int> <int> <int> <int>
## 1 1 39 4 0 0 0 0
## 2 0 46 2 0 0 0 0
## 3 1 48 1 1 20 0 0
## 4 0 61 3 1 30 0 0
## 5 0 46 3 1 23 0 0
## 6 0 43 2 0 0 0 0
## # ... with 9 more variables: prevalentHyp <int>, diabetes <int>,
## # totChol <int>, sysBP <dbl>, diaBP <dbl>, BMI <dbl>, heartRate <int>,
## # glucose <int>, TenYearCHD <int>
tail(fhs)
## # A tibble: 6 x 16
## male age education currentSmoker cigsPerDay BPMeds prevalentStroke
## <int> <int> <int> <int> <int> <int> <int>
## 1 1 51 3 1 43 0 0
## 2 0 48 2 1 20 NA 0
## 3 0 44 1 1 15 0 0
## 4 0 52 2 0 0 0 0
## 5 1 40 3 0 0 0 0
## 6 0 39 3 1 30 0 0
## # ... with 9 more variables: prevalentHyp <int>, diabetes <int>,
## # totChol <int>, sysBP <dbl>, diaBP <dbl>, BMI <dbl>, heartRate <int>,
## # glucose <int>, TenYearCHD <int>
summary(fhs)
## male age education currentSmoker
## Min. :0.0000 Min. :32.00 Min. :1.000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:42.00 1st Qu.:1.000 1st Qu.:0.0000
## Median :0.0000 Median :49.00 Median :2.000 Median :0.0000
## Mean :0.4292 Mean :49.58 Mean :1.979 Mean :0.4941
## 3rd Qu.:1.0000 3rd Qu.:56.00 3rd Qu.:3.000 3rd Qu.:1.0000
## Max. :1.0000 Max. :70.00 Max. :4.000 Max. :1.0000
## NA's :105
## cigsPerDay BPMeds prevalentStroke prevalentHyp
## Min. : 0.000 Min. :0.00000 Min. :0.000000 Min. :0.0000
## 1st Qu.: 0.000 1st Qu.:0.00000 1st Qu.:0.000000 1st Qu.:0.0000
## Median : 0.000 Median :0.00000 Median :0.000000 Median :0.0000
## Mean : 9.006 Mean :0.02962 Mean :0.005896 Mean :0.3106
## 3rd Qu.:20.000 3rd Qu.:0.00000 3rd Qu.:0.000000 3rd Qu.:1.0000
## Max. :70.000 Max. :1.00000 Max. :1.000000 Max. :1.0000
## NA's :29 NA's :53
## diabetes totChol sysBP diaBP
## Min. :0.00000 Min. :107.0 Min. : 83.5 Min. : 48.0
## 1st Qu.:0.00000 1st Qu.:206.0 1st Qu.:117.0 1st Qu.: 75.0
## Median :0.00000 Median :234.0 Median :128.0 Median : 82.0
## Mean :0.02571 Mean :236.7 Mean :132.4 Mean : 82.9
## 3rd Qu.:0.00000 3rd Qu.:263.0 3rd Qu.:144.0 3rd Qu.: 90.0
## Max. :1.00000 Max. :696.0 Max. :295.0 Max. :142.5
## NA's :50
## BMI heartRate glucose TenYearCHD
## Min. :15.54 Min. : 44.00 Min. : 40.00 Min. :0.0000
## 1st Qu.:23.07 1st Qu.: 68.00 1st Qu.: 71.00 1st Qu.:0.0000
## Median :25.40 Median : 75.00 Median : 78.00 Median :0.0000
## Mean :25.80 Mean : 75.88 Mean : 81.96 Mean :0.1519
## 3rd Qu.:28.04 3rd Qu.: 83.00 3rd Qu.: 87.00 3rd Qu.:0.0000
## Max. :56.80 Max. :143.00 Max. :394.00 Max. :1.0000
## NA's :19 NA's :1 NA's :388
table(fhs$male)
##
## 0 1
## 2420 1820
hist(fhs$age)
hist(fhs$BMI, main = "Distribution of BMI in the Framingham Heart Study")
Boxplot of BMI:
boxplot(fhs$BMI)
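The histogram and boxplot above show that BMI is approximately normally distributed, with a mean of 25.80 and a median of 25.40; the mean sitting above the median and the maximum of 56.80 point to a mild right skew.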
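One way to check the normality claim numerically is the sample skewness, which is 0 for a symmetric sample and positive when there is a right tail. The sketch below is an assumption-laden illustration: `sample_skewness` is a hypothetical helper (a simple moment-based estimator), and the vectors are toy values rather than the Framingham data.

```r
# Moment-based sample skewness; 0 for a symmetric sample, > 0 for a right tail
sample_skewness <- function(x) {
  x <- x[!is.na(x)]               # drop missing values first
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

# Toy vectors, not the Framingham data
symmetric  <- c(1, 2, 3, 4, 5)
right_tail <- c(22, 23, 24, 25, 26, 27, 57)

sample_skewness(symmetric)
sample_skewness(right_tail)

# On the real data one could call, for example:
# sample_skewness(fhs$BMI); qqnorm(fhs$BMI); qqline(fhs$BMI)
```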
table(fhs$TenYearCHD)
##
## 0 1
## 3596 644
Among all 4,240 participants, 3,596 did not develop CHD within ten years (TenYearCHD = 0) and 644 did (TenYearCHD = 1).
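The counts above can be turned into proportions directly. A minimal sketch, with the two counts copied by hand from the table(fhs$TenYearCHD) output; the object names are illustrative.

```r
# Counts copied from the table(fhs$TenYearCHD) output above
chd_counts <- c("0" = 3596, "1" = 644)

# Share of participants in each group
chd_share <- chd_counts / sum(chd_counts)
round(chd_share, 4)

# With the data loaded, prop.table(table(fhs$TenYearCHD)) gives the same shares
```

The share for TenYearCHD = 1 agrees with the mean of TenYearCHD reported by summary(fhs), since the mean of a 0/1 variable is the proportion of ones.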
table(fhs$currentSmoker)
##
## 0 1
## 2145 2095
table(fhs$prevalentHyp)
##
## 0 1
## 2923 1317
table(fhs$diabetes)
##
## 0 1
## 4131 109
ggplot(data = fhs, aes(x = TenYearCHD, y = BMI,group=TenYearCHD)) +
geom_boxplot()
## Warning: Removed 19 rows containing non-finite values (stat_boxplot).
ggplot(data = fhs, aes(x = factor(TenYearCHD), y = BMI)) +
geom_boxplot()
## Warning: Removed 19 rows containing non-finite values (stat_boxplot).
#Explore Data
glimpse(fhs)
## Observations: 4,240
## Variables: 16
## $ male <int> 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0...
## $ age <int> 39, 46, 48, 61, 46, 43, 63, 45, 52, 43, 50, 43...
## $ education <int> 4, 2, 1, 3, 3, 2, 1, 2, 1, 1, 1, 2, 1, 3, 2, 2...
## $ currentSmoker <int> 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1...
## $ cigsPerDay <int> 0, 0, 20, 30, 23, 0, 0, 20, 0, 30, 0, 0, 15, 0...
## $ BPMeds <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0...
## $ prevalentStroke <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ prevalentHyp <int> 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1...
## $ diabetes <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ totChol <int> 195, 250, 245, 225, 285, 228, 205, 313, 260, 2...
## $ sysBP <dbl> 106.0, 121.0, 127.5, 150.0, 130.0, 180.0, 138....
## $ diaBP <dbl> 70.0, 81.0, 80.0, 95.0, 84.0, 110.0, 71.0, 71....
## $ BMI <dbl> 26.97, 28.73, 25.34, 28.58, 23.10, 30.30, 33.1...
## $ heartRate <int> 80, 95, 75, 65, 85, 77, 60, 79, 76, 93, 75, 72...
## $ glucose <int> 77, 76, 70, 103, 85, 99, 85, 78, 79, 88, 76, 6...
## $ TenYearCHD <int> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1...
The following ranges define the standard BMI categories:
Underweight: BMI less than 18.5
Normal weight: BMI 18.5 to 24.9
Overweight: BMI 25 to 29.9
Obese: BMI 30 or more
We are going to study whether obesity is a risk factor for CHD. Here we create two categories, matching the code below: non-obese (BMI <= 30) and obese (BMI > 30). Note that this strict cut-off places a BMI of exactly 30 in the non-obese group, unlike the standard definition above.
fhs$obesity<-ifelse(fhs$BMI>30, 1,0)
glimpse(fhs)
## Observations: 4,240
## Variables: 17
## $ male <int> 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0...
## $ age <int> 39, 46, 48, 61, 46, 43, 63, 45, 52, 43, 50, 43...
## $ education <int> 4, 2, 1, 3, 3, 2, 1, 2, 1, 1, 1, 2, 1, 3, 2, 2...
## $ currentSmoker <int> 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1...
## $ cigsPerDay <int> 0, 0, 20, 30, 23, 0, 0, 20, 0, 30, 0, 0, 15, 0...
## $ BPMeds <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0...
## $ prevalentStroke <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ prevalentHyp <int> 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1...
## $ diabetes <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ totChol <int> 195, 250, 245, 225, 285, 228, 205, 313, 260, 2...
## $ sysBP <dbl> 106.0, 121.0, 127.5, 150.0, 130.0, 180.0, 138....
## $ diaBP <dbl> 70.0, 81.0, 80.0, 95.0, 84.0, 110.0, 71.0, 71....
## $ BMI <dbl> 26.97, 28.73, 25.34, 28.58, 23.10, 30.30, 33.1...
## $ heartRate <int> 80, 95, 75, 65, 85, 77, 60, 79, 76, 93, 75, 72...
## $ glucose <int> 77, 76, 70, 103, 85, 99, 85, 78, 79, 88, 76, 6...
## $ TenYearCHD <int> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1...
## $ obesity <dbl> 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0...
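For comparison with the binary obesity indicator, the four standard BMI categories can be built with cut(). This is a minimal sketch on toy values (not the Framingham data); the names `bmi_cat` and the labels are illustrative. It also shows how a missing BMI propagates to NA in both variables, which is why the models drop 19 rows.

```r
# Toy BMI values (including a missing one), not the Framingham data
bmi <- c(17.9, 18.5, 24.9, 25.0, 29.9, 30.0, 33.1, NA)

# Four standard categories; right = FALSE puts exactly 30.0 into "Obese"
bmi_cat <- cut(bmi,
               breaks = c(-Inf, 18.5, 25, 30, Inf),
               labels = c("Underweight", "Normal weight", "Overweight", "Obese"),
               right = FALSE)

# The binary indicator used in this analysis is strict: BMI > 30
obesity <- ifelse(bmi > 30, 1, 0)

# useNA = "ifany" keeps the missing BMI visible instead of silently dropping it
table(bmi_cat, useNA = "ifany")
table(obesity, useNA = "ifany")
```

Note the one disagreement between the two codings: a BMI of exactly 30.0 is "Obese" under the standard categories but 0 under the strict indicator.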
mod2 <- glm( TenYearCHD ~ obesity + male+diabetes + prevalentHyp+currentSmoker, data = fhs,family=binomial)
mod2
##
## Call: glm(formula = TenYearCHD ~ obesity + male + diabetes + prevalentHyp +
## currentSmoker, family = binomial, data = fhs)
##
## Coefficients:
## (Intercept) obesity male diabetes prevalentHyp
## -2.46102 0.05307 0.49228 1.02231 0.96554
## currentSmoker
## 0.15445
##
## Degrees of Freedom: 4220 Total (i.e. Null); 4215 Residual
## (19 observations deleted due to missingness)
## Null Deviance: 3571
## Residual Deviance: 3391 AIC: 3403
summary(mod2)
##
## Call:
## glm(formula = TenYearCHD ~ obesity + male + diabetes + prevalentHyp +
## currentSmoker, family = binomial, data = fhs)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2750 -0.5632 -0.4358 -0.4047 2.2552
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.46102 0.08886 -27.697 < 2e-16 ***
## obesity 0.05307 0.12621 0.421 0.6741
## male 0.49228 0.09039 5.446 5.15e-08 ***
## diabetes 1.02231 0.21330 4.793 1.64e-06 ***
## prevalentHyp 0.96554 0.09126 10.580 < 2e-16 ***
## currentSmoker 0.15445 0.09150 1.688 0.0914 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3571.5 on 4220 degrees of freedom
## Residual deviance: 3391.4 on 4215 degrees of freedom
## (19 observations deleted due to missingness)
## AIC: 3403.4
##
## Number of Fisher Scoring iterations: 4
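The coefficients above are on the log-odds scale, so exponentiating them gives odds ratios. A minimal sketch, with the estimates and standard errors copied by hand from the summary(mod2) output and Wald 95% intervals computed as exp(estimate ± 1.96 × SE); the object names are illustrative.

```r
# Estimates and standard errors copied from the summary(mod2) output above
est <- c(obesity = 0.05307, male = 0.49228, diabetes = 1.02231,
         prevalentHyp = 0.96554, currentSmoker = 0.15445)
se  <- c(obesity = 0.12621, male = 0.09039, diabetes = 0.21330,
         prevalentHyp = 0.09126, currentSmoker = 0.09150)

# Odds ratio and Wald 95% interval: exp(estimate +/- z * SE)
z <- qnorm(0.975)
odds_ratios <- data.frame(
  OR    = exp(est),
  lower = exp(est - z * se),
  upper = exp(est + z * se)
)
round(odds_ratios, 3)

# With the fitted object available, the same quantities come from
# exp(coef(mod2)) and exp(confint.default(mod2)).
```

An interval that contains 1, as the one for obesity does here, corresponds to a coefficient that is not significant at the 5% level, which is consistent with the p-value of 0.6741 in the summary.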
table(fhs$obesity)
##
## 0 1
## 3685 536
fhs2<- filter(fhs, age>50)
Let us look at the new data frame fhs2, which keeps only participants older than 50.
glimpse(fhs2)
## Observations: 1,883
## Variables: 17
## $ male <int> 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0...
## $ age <int> 61, 63, 52, 52, 52, 60, 61, 60, 59, 61, 54, 56...
## $ education <int> 3, 1, 1, 1, 3, 1, 3, 1, 1, NA, 1, NA, 1, 1, 2,...
## $ currentSmoker <int> 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0...
## $ cigsPerDay <int> 30, 0, 0, 0, 20, 0, 0, 0, 0, 5, 20, 0, 0, 0, 0...
## $ BPMeds <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1...
## $ prevalentStroke <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ prevalentHyp <int> 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1...
## $ diabetes <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1...
## $ totChol <int> 225, 205, 260, 234, 215, 260, 272, 247, 209, 1...
## $ sysBP <dbl> 150.0, 138.0, 141.5, 148.0, 132.0, 110.0, 182....
## $ diaBP <dbl> 95.0, 71.0, 89.0, 78.0, 82.0, 72.5, 121.0, 88....
## $ BMI <dbl> 28.58, 33.11, 26.36, 34.17, 25.11, 26.59, 32.8...
## $ heartRate <int> 65, 60, 76, 70, 71, 65, 85, 72, 90, 72, 96, 72...
## $ glucose <int> 103, 85, 79, 113, 75, NA, 65, 74, 88, 75, 87, ...
## $ TenYearCHD <int> 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1...
## $ obesity <dbl> 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0...
table(fhs2$obesity)
##
## 0 1
## 1597 276
table(fhs2$TenYearCHD)
##
## 0 1
## 1452 431
mod3 <- glm( TenYearCHD ~ obesity + male+diabetes + prevalentHyp+currentSmoker, data = fhs2, family=binomial)
mod3
##
## Call: glm(formula = TenYearCHD ~ obesity + male + diabetes + prevalentHyp +
## currentSmoker, family = binomial, data = fhs2)
##
## Coefficients:
## (Intercept) obesity male diabetes prevalentHyp
## -1.9847 0.1058 0.5089 0.7806 0.7520
## currentSmoker
## 0.2604
##
## Degrees of Freedom: 1872 Total (i.e. Null); 1867 Residual
## (10 observations deleted due to missingness)
## Null Deviance: 2004
## Residual Deviance: 1925 AIC: 1937
summary(mod3)
##
## Call:
## glm(formula = TenYearCHD ~ obesity + male + diabetes + prevalentHyp +
## currentSmoker, family = binomial, data = fhs2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3615 -0.7207 -0.6417 -0.5075 2.0560
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.9847 0.1148 -17.287 < 2e-16 ***
## obesity 0.1058 0.1583 0.668 0.50389
## male 0.5089 0.1183 4.302 1.69e-05 ***
## diabetes 0.7806 0.2484 3.142 0.00168 **
## prevalentHyp 0.7520 0.1166 6.451 1.11e-10 ***
## currentSmoker 0.2604 0.1196 2.177 0.02945 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2003.6 on 1872 degrees of freedom
## Residual deviance: 1925.1 on 1867 degrees of freedom
## (10 observations deleted due to missingness)
## AIC: 1937.1
##
## Number of Fisher Scoring iterations: 4
fhs3<-filter(fhs2, male==1)
glimpse(fhs3)
## Observations: 780
## Variables: 17
## $ male <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ age <int> 52, 61, 54, 56, 52, 54, 51, 56, 53, 57, 60, 53...
## $ education <int> 1, NA, 1, NA, 1, 2, 4, 4, 1, 1, 1, 1, 4, 3, 1,...
## $ currentSmoker <int> 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0...
## $ cigsPerDay <int> 0, 5, 20, 0, 0, 0, 0, 20, 20, 0, 20, 20, 30, 0...
## $ BPMeds <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ prevalentStroke <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ prevalentHyp <int> 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0...
## $ diabetes <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ totChol <int> 260, 175, 214, 257, 178, 195, 216, 270, 220, 2...
## $ sysBP <dbl> 141.5, 134.0, 147.0, 153.5, 160.0, 132.0, 112....
## $ diaBP <dbl> 89.0, 82.5, 74.0, 102.0, 98.0, 83.5, 66.0, 79....
## $ BMI <dbl> 26.36, 18.59, 24.71, 28.09, 40.11, 26.21, 23.4...
## $ heartRate <int> 76, 72, 96, 72, 75, 75, 90, 95, 78, 75, 90, 60...
## $ glucose <int> 79, 75, 87, 75, 225, 100, 95, 93, 73, 64, 83, ...
## $ TenYearCHD <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0...
## $ obesity <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
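# Note: lm() ignores family = binomial (see the warning below), so this is a linear probability model, not a logistic regression
mod4 <- lm( TenYearCHD ~ obesity + diabetes + prevalentHyp+currentSmoker, data = fhs3, family=binomial)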
## Warning: In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
## extra argument 'family' will be disregarded
mod4
##
## Call:
## lm(formula = TenYearCHD ~ obesity + diabetes + prevalentHyp +
## currentSmoker, data = fhs3, family = binomial)
##
## Coefficients:
## (Intercept) obesity diabetes prevalentHyp currentSmoker
## 0.15968 0.07385 0.26057 0.11473 0.10064
summary(mod4)
##
## Call:
## lm(formula = TenYearCHD ~ obesity + diabetes + prevalentHyp +
## currentSmoker, data = fhs3, family = binomial)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.7095 -0.2744 -0.2335 0.5511 0.8403
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.15968 0.02677 5.964 3.75e-09 ***
## obesity 0.07385 0.05364 1.377 0.168947
## diabetes 0.26057 0.07666 3.399 0.000711 ***
## prevalentHyp 0.11473 0.03278 3.499 0.000493 ***
## currentSmoker 0.10064 0.03182 3.162 0.001627 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4379 on 772 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.04509, Adjusted R-squared: 0.04014
## F-statistic: 9.114 on 4 and 772 DF, p-value: 3.388e-07
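Because lm() discarded family = binomial, the mod4 coefficients are differences in CHD probability rather than log-odds, so they are not directly comparable with mod2 and mod3. The sketch below contrasts the two fits on simulated data; the data frame `toy`, its sample size, and the effect sizes are made up for illustration and say nothing about the Framingham results.

```r
set.seed(606)

# Simulated stand-in for the male subset; not the Framingham data
n <- 800
toy <- data.frame(
  obesity      = rbinom(n, 1, 0.15),
  prevalentHyp = rbinom(n, 1, 0.40)
)
# Made-up log-odds model used only to generate a binary outcome
toy$TenYearCHD <- rbinom(n, 1,
                         plogis(-1.5 + 0.3 * toy$obesity + 0.7 * toy$prevalentHyp))

# lm(): linear probability model, coefficients on the probability scale
fit_lm <- lm(TenYearCHD ~ obesity + prevalentHyp, data = toy)

# glm() with family = binomial: logistic regression, coefficients are log-odds
fit_glm <- glm(TenYearCHD ~ obesity + prevalentHyp, data = toy, family = binomial)

coef(fit_lm)
exp(coef(fit_glm))  # odds ratios
```

For the real analysis, the corresponding call would be glm(TenYearCHD ~ obesity + diabetes + prevalentHyp + currentSmoker, data = fhs3, family = binomial); its output is not shown here because it has not been run.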
fhs4 <- filter(fhs2, male==0)
glimpse(fhs4)
## Observations: 1,103
## Variables: 17
## $ male <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ age <int> 61, 63, 52, 52, 60, 61, 60, 59, 52, 53, 65, 63...
## $ education <int> 3, 1, 1, 3, 1, 3, 1, 1, 1, 3, 1, 2, 1, 1, 1, 1...
## $ currentSmoker <int> 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0...
## $ cigsPerDay <int> 30, 0, 0, 20, 0, 0, 0, 0, 0, 0, 0, 40, 3, 0, 9...
## $ BPMeds <int> 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0...
## $ prevalentStroke <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ prevalentHyp <int> 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1...
## $ diabetes <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0...
## $ totChol <int> 225, 205, 234, 215, 260, 272, 247, 209, NA, 31...
## $ sysBP <dbl> 150.0, 138.0, 148.0, 132.0, 110.0, 182.0, 130....
## $ diaBP <dbl> 95.0, 71.0, 78.0, 82.0, 72.5, 121.0, 88.0, 85....
## $ BMI <dbl> 28.58, 33.11, 34.17, 25.11, 26.59, 32.80, 30.3...
## $ heartRate <int> 65, 60, 70, 71, 65, 85, 72, 90, 70, 76, 90, 95...
## $ glucose <int> 103, 85, 113, 75, NA, 65, 74, 88, NA, 215, 87,...
## $ TenYearCHD <int> 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0...
## $ obesity <dbl> 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0...
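# Note: lm() ignores family = binomial (see the warning below), so this is a linear probability model, not a logistic regression
mod5 <- lm( TenYearCHD ~ obesity + diabetes + prevalentHyp + currentSmoker, data = fhs4,family=binomial)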
## Warning: In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
## extra argument 'family' will be disregarded
mod5
##
## Call:
## lm(formula = TenYearCHD ~ obesity + diabetes + prevalentHyp +
## currentSmoker, data = fhs4, family = binomial)
##
## Coefficients:
## (Intercept) obesity diabetes prevalentHyp currentSmoker
## 0.122729 -0.007899 0.076696 0.136878 0.001147
summary(mod5)
##
## Call:
## lm(formula = TenYearCHD ~ obesity + diabetes + prevalentHyp +
## currentSmoker, data = fhs4, family = binomial)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.3375 -0.2596 -0.1227 -0.1227 0.8852
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.122729 0.018800 6.528 1.02e-10 ***
## obesity -0.007899 0.031249 -0.253 0.800
## diabetes 0.076696 0.062879 1.220 0.223
## prevalentHyp 0.136878 0.024132 5.672 1.81e-08 ***
## currentSmoker 0.001147 0.026432 0.043 0.965
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.388 on 1091 degrees of freedom
## (7 observations deleted due to missingness)
## Multiple R-squared: 0.03228, Adjusted R-squared: 0.02873
## F-statistic: 9.097 on 4 and 1091 DF, p-value: 3.14e-07
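An alternative to splitting the data by sex is a single logistic model with an obesity-by-sex interaction, which tests directly whether the obesity effect differs between men and women. The following is a minimal sketch on simulated data; the data frame `toy` and its coefficients are invented for illustration and are not estimates from the Framingham data.

```r
set.seed(2018)

# Simulated data with two binary predictors; not the Framingham data
n <- 2000
toy <- data.frame(
  male    = rbinom(n, 1, 0.43),
  obesity = rbinom(n, 1, 0.15)
)
toy$TenYearCHD <- rbinom(n, 1,
                         plogis(-1.8 + 0.5 * toy$male + 0.1 * toy$obesity))

# obesity * male expands to obesity + male + obesity:male
fit_int <- glm(TenYearCHD ~ obesity * male, data = toy, family = binomial)

# The obesity:male row tests whether the obesity effect differs by sex
summary(fit_int)$coefficients

# Stratified fits, one logistic model per sex, for comparison
fits_by_sex <- lapply(split(toy, toy$male), function(d) {
  glm(TenYearCHD ~ obesity, data = d, family = binomial)
})
sapply(fits_by_sex, coef)
```

With only these two binary predictors the interaction model is saturated, so its obesity coefficient equals the obesity coefficient in the male = 0 stratum. Once further covariates such as diabetes or hypertension are added, the two approaches are no longer identical unless every covariate is also interacted with sex.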