library(ggplot2)
library(dplyr)
log( 10 , base=10 )
## [1] 1
log( 10 )
## [1] 2.302585
id: Patient identification numberage: Age at time of enrollment in the study
(years)height: Height at time of enrollment in the study
(inches)weight: Weight at time of enrollment in the study
(lbs)sbp: Systolic blood pressuredbp: Diastolic blood pressurechol: Cholesterol (mg/dL)ncigs: Number of cigarettes smoked per daytabp: Presence (1) or absence (0) of Type A
behaviorchd: Presence (1) or absence (0) of CHD over
followuptypchd: Type of CHD event (0 = none, 1 = MI or SD, 2 =
silent MI, 3 = angina)time: Followup time (days)arcus: Presence (1) or absence (0) of arcus
senilisbmi: Constructed variable; BMI = weight * 703 /
height^2WCGS = read.csv("https://raw.githubusercontent.com/cwolock/BIOST311/main/Datasets/wcgs.csv")
the appropriate explanatory variable is tabp, and the appropriate response variable is chd
#We can convert numeric binary variables (e.g. 0-1 variables) into "No-Yes" variables, as follows (uncomment the 2 lines below):
WCGS = WCGS %>% mutate( TypeA = ifelse(tabp==0,"No","Yes") ,
CHD = ifelse(chd==0,"No","Yes") )
# Make a barplot of the "No-Yes" explanatory variable, but where we fill the bars based on the "No-Yes" response variable (uncomment and complete the lines below):
ggplot( data=WCGS , aes(x = TypeA,
fill = CHD) ) +
geom_bar()
WCGS %>% select( TypeA , CHD ) %>% table( )
## CHD
## TypeA No Yes
## No 1486 79
## Yes 1411 178
P=0.112, odds=0.1262
P=0.0505, odds=0.0532
Odds Ratio=2.3722, the odds of someone developing CHD with type A behavior is a little under 2.5 times more likely than the odds of someone developing CHD without type A behavior
chd as
your response variable and tabp as your predictor.
Interpret the coefficient on tabp using an odds ratio.glm(chd~tabp,family=binomial,data=WCGS)
##
## Call: glm(formula = chd ~ tabp, family = binomial, data = WCGS)
##
## Coefficients:
## (Intercept) tabp
## -2.9344 0.8641
##
## Degrees of Freedom: 3153 Total (i.e. Null); 3152 Residual
## Null Deviance: 1781
## Residual Deviance: 1740 AIC: 1744
the coefficient for tabp is the number that is subtracted from the intercept to get the power of e which will give you the odds ratio
The coefficient for tabp is the power of e which represents the odds ratio
WCGS dataset. Call this new variable highbp.
To help, here is some code that you can uncomment and complete:WCGS = WCGS %>% mutate(highbp = case_when((sbp>=140 & dbp>=90)~TRUE,(sbp<=140&dbp<=90)~FALSE,(sbp>=140&dbp<=90)~FALSE,(sbp<=140&dbp>=90)~FALSE))
df<-WCGS
table(WCGS$highbp)
##
## FALSE TRUE
## 2701 453
print(453/3154)
## [1] 0.1436271
ggplot(data=WCGS,aes(x=chol,y=highbp))+geom_boxplot()
## Warning: Removed 12 rows containing non-finite values (`stat_boxplot()`).
people with higher cholesterol are more likely to have high blood
pressure.
chol using an odds ratio.glm(highbp~chol,family=binomial,data=WCGS)
##
## Call: glm(formula = highbp ~ chol, family = binomial, data = WCGS)
##
## Coefficients:
## (Intercept) chol
## -3.11001 0.00575
##
## Degrees of Freedom: 3141 Total (i.e. Null); 3140 Residual
## (12 observations deleted due to missingness)
## Null Deviance: 2585
## Residual Deviance: 2559 AIC: 2563
The chol coefficient represents the number subtracted from the intercept that will get you the power of e that equals the odds ratio, in this case, the odds ratio represents the strength of the relationship between high blood pressure and cholesterol, and since that ratio is equal to about 1, it can be inferred that the relationship between high blood pressure and cholesterol is weak
WCGS=WCGS%>%mutate(smoking_status=case_when(ncigs>=1~"smoker",ncigs==0~"non-smoker"))
glm(chd~tabp:smoking_status,family=binomial,data=WCGS)
##
## Call: glm(formula = chd ~ tabp:smoking_status, family = binomial, data = WCGS)
##
## Coefficients:
## (Intercept) tabp:smoking_statusnon-smoker
## -2.934 0.612
## tabp:smoking_statussmoker
## 1.070
##
## Degrees of Freedom: 3153 Total (i.e. Null); 3151 Residual
## Null Deviance: 1781
## Residual Deviance: 1732 AIC: 1738
for people of the same smoking status, people with type A behavior
are far less likely to develop CHD than those without it.
glm(chd~tabp:smoking_status,family=binomial,data=WCGS)
##
## Call: glm(formula = chd ~ tabp:smoking_status, family = binomial, data = WCGS)
##
## Coefficients:
## (Intercept) tabp:smoking_statusnon-smoker
## -2.934 0.612
## tabp:smoking_statussmoker
## 1.070
##
## Degrees of Freedom: 3153 Total (i.e. Null); 3151 Residual
## Null Deviance: 1781
## Residual Deviance: 1732 AIC: 1738
the non smoker coefficient refers to the number that you subtract
from the intercept to get the power of e that will get you the odds
ratio for non smokers, this odds ratio shows that among non-smokers,
those with type A behavior have odds that are about ten times worse than
those without type A behavior
WCGS=WCGS%>%
group_by(ncigs)%>%
mutate(ncigs_cat=cut(ncigs,42))
I don’t know
Party: R = republican and D = democrat
AutoCont: Amount of money received in contributions
from auto manufacturers
Senate = read.csv( 'https://raw.githubusercontent.com/vittorioaddona/data/main/Senate.csv' )
Name |Party|Vote|
:——–:|:—:|:–:| Murkowski R 1 Stevens R 1 Sessions R 1 Shelby R 1 Hutchinson R 1 Lincoln D 1 McCain R 0 Kyl R 1 Boxer D 0 Feinstein D 0 Allard R 1 Campbell R 1 Dodd D 0 Lieberman D 0 Carper D. 1 Biden. D. 0 Graham. D. 0 Nelson. D. 0 Cleland. D. 1 Miller. D. 1 Akaka. D. 0 Inouye. D. 0 Harkin. D. 0 Grassley. R. 1 Craig. R. 1 Crapo. R. 1 Durbin. D. 0 Fitzgerald. R. 1 Bayh. D. 1 Lugar. R. 1 Brownback. R. 1 Roberts. R. 1 Bunning. R. 1 McConnell. R. 1 Breaux. D. 1 Landrieu. D. 1 Kennedy. D. 0 Kerry. D. 0 Sarbanes. D. 0 Mikulski. D. 1 Collins. R. 0 Snowe. R. 0 Levin D. 1 Stabenow. D. 1 Dayton. D. 0 Wellstone. D. 0 Bond. R. 1 Carnahan. D. 1 Cochran. R. 1 Lott. R. 1 Baucus. D. 1 Burns. R. 1 Edwards. D. 0 Helms. R. 1 Conrad. D. 1 Dorgan. D. 1 Hagel. R. 1 Nelson. D. 1 Gregg R. 0 Smith. R. 1 Corzine. D. 0 Torricelli. D. 0 Bingaman. D. 0 Domenici. R. 1 Reid. D. 0 Ensign. R. 1 Clinton. D. 0 Schumer. D 0 DeWine. R. 1 Voinovich. R. 1 Inhofe. R. 1 Nickles. R. 1 Smith. R. 0 Wyden. D. 0 Santorum. R. 1 Specter. R. 1 Chafee. R. 0 Reed. D. 0 Hollings. D. 0 Thurmond. R. 1 Daschle. D. 0 Johnson. D. 1 Frist. R. 1 Thompson. R. 1 Gramm. R. 1 Hutchison. R. 1 Bennett. R. 1 Hatch. R. 1 Allen. R. 1 Warner. R. 1 Jeffords. D. 0 Leahy. D. 0 Cantwell. D. 0 Murray. D. 0 Feingold. D. 1 Kohl. D. 1 Rockefeller. D. 0 Byrd. D. 1 Enzi. R. 1 Thomas. R. 1
Democrats=0.387 Republicans=0.877
Dem=0.6333 Rep=7.13008
11.294
Vote using
party affiliation.glm(Vote~Party,family=binomial,Senate)
##
## Call: glm(formula = Vote ~ Party, family = binomial, data = Senate)
##
## Coefficients:
## (Intercept) PartyR
## -0.5213 2.4907
##
## Degrees of Freedom: 99 Total (i.e. Null); 98 Residual
## Null Deviance: 132.8
## Residual Deviance: 103.8 AIC: 107.8
add the pary r coefficient to the intercept coefficient and set that value as the power of e
Vote that
uses both Party and AutoCont. Provide an
interpretation of the coefficient on AutoCont.glm(Vote~Party+AutoCont,family=binomial,Senate)
##
## Call: glm(formula = Vote ~ Party + AutoCont, family = binomial, data = Senate)
##
## Coefficients:
## (Intercept) PartyR AutoCont
## -1.4001534 1.8810341 0.0001079
##
## Degrees of Freedom: 99 Total (i.e. Null); 97 Residual
## Null Deviance: 132.8
## Residual Deviance: 89.3 AIC: 95.3
Party increase
or decrease when you added AutoCont to the model? What does
this tell us about the contributions made to senators from each party
(i.e. which party must have received higher contributions)? Briefly
explain your reasoning.decrease, which means that the democrats received more contributions because their odds went up more
0.7252
fun = function(x,m,b) { exp(m*x+b)/(1+exp(m*x+b)) }
ggplot( data=Senate, aes(x=AutoCont,y=jitter(Vote,amount=0.03),color=Party) ) +
geom_point() +
stat_function(aes(color="R"),fun=fun,args=list(m=0.0001079,b=0.4808807)) +
stat_function(aes(color="D"),fun=fun,args=list(m=0.0001079,b=-1.4001534)) +
coord_cartesian(xlim = c(0,55000)) +
ylab("Vote")