Call:
lm(formula = Overall_Risk_Score ~ Air_Pollution + Alcohol_Use +
    Obesity + Occupational_Hazards, data = cancer_data)

Residuals:
      Min        1Q    Median        3Q       Max
-0.305332 -0.055599 -0.002445  0.056468  0.263308

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)          0.1688402  0.0064234   26.29   <2e-16 ***
Air_Pollution        0.0178606  0.0005837   30.60   <2e-16 ***
Alcohol_Use          0.0134763  0.0005705   23.62   <2e-16 ***
Obesity              0.0103827  0.0006076   17.09   <2e-16 ***
Occupational_Hazards 0.0121961  0.0005797   21.04   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.08296 on 1995 degrees of freedom
Multiple R-squared:  0.5466,  Adjusted R-squared:  0.5457
F-statistic: 601.2 on 4 and 1995 DF,  p-value: < 2.2e-16
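- Air pollution, alcohol use, obesity, and occupational hazards all have very small p-values (< 2e-16, far below the 0.05 significance level). For each predictor we therefore reject the null hypothesis that its coefficient is zero: each has a statistically significant, positive relationship with the overall risk score when the other predictors are held constant.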
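- The t values tell the same story, since each is the coefficient divided by its standard error and the p-values are computed from them. We used |t| > 2 as a rough cutoff matching the 0.05 significance level (the exact two-sided critical value with 1995 degrees of freedom is about 1.96), and every predictor exceeds it by a wide margin, with t values ranging from 17.09 (obesity) to 30.60 (air pollution).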
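- Lastly, the linear model explains about 55% of the variation in the overall risk score (multiple R-squared = 0.5466; adjusted R-squared = 0.5457), leaving roughly 45% unexplained by these four risk factors.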
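The quantities used in the bullets can be recomputed from the printed coefficient table alone. The sketch below is a minimal illustration that assumes only the numbers shown in the output (the `cancer_data` data frame is not needed); the sample size `n = 2000` is inferred from the 1995 residual degrees of freedom plus the 5 estimated coefficients, and the variable names are hypothetical.

```r
# Coefficient table copied from the summary(lm(...)) output above
coefs <- data.frame(
  term     = c("(Intercept)", "Air_Pollution", "Alcohol_Use",
               "Obesity", "Occupational_Hazards"),
  estimate = c(0.1688402, 0.0178606, 0.0134763, 0.0103827, 0.0121961),
  std_err  = c(0.0064234, 0.0005837, 0.0005705, 0.0006076, 0.0005797)
)

n <- 2000               # assumed: 1995 residual df + 5 estimated coefficients
k <- 4                  # number of predictors, excluding the intercept
df_resid <- n - k - 1   # 1995 residual degrees of freedom

# t value = estimate / standard error (reproduces the "t value" column)
coefs$t_value <- coefs$estimate / coefs$std_err

# Two-sided p-value from a t distribution with df_resid degrees of freedom
coefs$p_value <- 2 * pt(abs(coefs$t_value), df = df_resid, lower.tail = FALSE)

# Exact two-sided critical value at alpha = 0.05; the "cutoff of 2" approximates it
t_crit <- qt(0.975, df = df_resid)
coefs$significant <- abs(coefs$t_value) > t_crit

print(coefs, digits = 4)
cat("critical |t| at alpha = 0.05:", round(t_crit, 3), "\n")

# R-squared is the proportion of variance in the response explained by the model
r2 <- 0.5466
cat("variance explained:", round(100 * r2, 1), "%\n")

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid
cat("adjusted R-squared:", round(adj_r2, 4), "\n")
```

With the original data loaded, the same values are available directly from the fitted model via `summary(fit)$coefficients`, `summary(fit)$r.squared`, and `summary(fit)$adj.r.squared`, where `fit` is the object returned by `lm()`.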