This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
summary(cars)
## speed dist
## Min. : 4.0 Min. : 2.00
## 1st Qu.:12.0 1st Qu.: 26.00
## Median :15.0 Median : 36.00
## Mean :15.4 Mean : 42.98
## 3rd Qu.:19.0 3rd Qu.: 56.00
## Max. :25.0 Max. :120.00
You can also embed plots, for example:
Note that the echo = FALSE parameter was added to the
code chunk to prevent printing of the R code that generated the
plot.
# Load the dataset
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.5.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.5.2
## Warning: package 'tibble' was built under R version 4.5.2
## Warning: package 'tidyr' was built under R version 4.5.2
## Warning: package 'readr' was built under R version 4.5.2
## Warning: package 'purrr' was built under R version 4.5.2
## Warning: package 'stringr' was built under R version 4.5.2
## Warning: package 'forcats' was built under R version 4.5.2
## Warning: package 'lubridate' was built under R version 4.5.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.1 ✔ readr 2.1.6
## ✔ ggplot2 4.0.2 ✔ stringr 1.6.0
## ✔ lubridate 1.9.4 ✔ tibble 3.3.1
## ✔ purrr 1.2.1 ✔ tidyr 1.3.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(haven)
## Warning: package 'haven' was built under R version 4.5.2
library(janitor)
## Warning: package 'janitor' was built under R version 4.5.2
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(knitr)
## Warning: package 'knitr' was built under R version 4.5.2
library(kableExtra)
## Warning: package 'kableExtra' was built under R version 4.5.2
##
## Attaching package: 'kableExtra'
##
## The following object is masked from 'package:dplyr':
##
## group_rows
library(plotly)
## Warning: package 'plotly' was built under R version 4.5.2
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(broom)
library(ggeffects)
## Warning: package 'ggeffects' was built under R version 4.5.2
library(gtsummary)
## Warning: package 'gtsummary' was built under R version 4.5.2
library(GGally)
## Warning: package 'GGally' was built under R version 4.5.2
library(car)
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.5.2
##
## Attaching package: 'car'
##
## The following object is masked from 'package:purrr':
##
## some
##
## The following object is masked from 'package:dplyr':
##
## recode
library(lmtest)
## Warning: package 'lmtest' was built under R version 4.5.2
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 4.5.2
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.5.2
## corrplot 0.95 loaded
brfss_full <- read_xpt("C:\\Users\\safwa\\OneDrive - University at Albany - SUNY\\EPI 553\\Labs\\brfss dataset\\LLCP2020.XPT") %>%
clean_names()
# Display dataset dimensions
names(brfss_full)
## [1] "state" "fmonth" "idate" "imonth" "iday" "iyear"
## [7] "dispcode" "seqno" "psu" "ctelenm1" "pvtresd1" "colghous"
## [13] "statere1" "celphone" "ladult1" "colgsex" "numadult" "landsex"
## [19] "nummen" "numwomen" "respslct" "safetime" "ctelnum1" "cellfon5"
## [25] "cadult1" "cellsex" "pvtresd3" "cclghous" "cstate1" "landline"
## [31] "hhadult" "sexvar" "genhlth" "physhlth" "menthlth" "poorhlth"
## [37] "hlthpln1" "persdoc2" "medcost" "checkup1" "exerany2" "sleptim1"
## [43] "cvdinfr4" "cvdcrhd4" "cvdstrk3" "asthma3" "asthnow" "chcscncr"
## [49] "chcocncr" "chccopd2" "havarth4" "addepev3" "chckdny2" "diabete4"
## [55] "diabage3" "lastden4" "rmvteth4" "marital" "educa" "renthom1"
## [61] "numhhol3" "numphon3" "cpdemo1b" "veteran3" "employ1" "children"
## [67] "income2" "pregnant" "weight2" "height3" "deaf" "blind"
## [73] "decide" "diffwalk" "diffdres" "diffalon" "smoke100" "smokday2"
## [79] "stopsmk2" "lastsmk2" "usenow3" "alcday5" "avedrnk3" "drnk3ge5"
## [85] "maxdrnks" "flushot7" "flshtmy3" "shingle2" "pneuvac4" "fall12mn"
## [91] "fallinj4" "seatbelt" "drnkdri2" "hadmam" "howlong" "hadpap2"
## [97] "lastpap2" "hpvtest" "hplsttst" "hadhyst2" "pcpsaad3" "pcpsadi1"
## [103] "pcpsare1" "psatest1" "psatime" "pcpsars1" "colnscpy" "colntest"
## [109] "sigmscpy" "sigmtest" "bldstol1" "lstblds4" "stooldna" "sdnatest"
## [115] "vircolon" "vclntest" "hivtst7" "hivtstd3" "hivrisk5" "pdiabtst"
## [121] "prediab1" "insulin1" "bldsugar" "feetchk3" "doctdiab" "chkhemo3"
## [127] "feetchk" "eyeexam1" "diabeye" "diabedu" "toldcfs" "havecfs"
## [133] "workcfs" "toldhepc" "trethepc" "prirhepc" "havehepc" "havehepb"
## [139] "medshepb" "hlthcvr1" "cimemlos" "cdhouse" "cdassist" "cdhelp"
## [145] "cdsocial" "cddiscus" "caregiv1" "crgvrel4" "crgvlng1" "crgvhrs1"
## [151] "crgvprb3" "crgvalzd" "crgvper1" "crgvhou1" "crgvexpt" "ecigaret"
## [157] "ecignow" "marijan1" "usemrjn2" "rsnmrjn1" "lcsfirst" "lcslast"
## [163] "lcsnumcg" "lcsctscn" "cncrdiff" "cncrage" "cncrtyp1" "csrvtrt3"
## [169] "csrvdoc1" "csrvsum" "csrvrtrn" "csrvinst" "csrvinsr" "csrvdein"
## [175] "csrvclin" "csrvpain" "csrvctl2" "pcpsade1" "pcdmdec1" "hpvadvc4"
## [181] "hpvadsht" "tetanus1" "imfvpla1" "birthsex" "somale" "sofemale"
## [187] "trnsgndr" "acedeprs" "acedrink" "acedrugs" "aceprisn" "acedivrc"
## [193] "acepunch" "acehurt1" "aceswear" "acetouch" "acetthem" "acehvsex"
## [199] "rcsgendr" "rcsrltn2" "casthdx2" "casthno2" "qstver" "qstlang"
## [205] "metstat" "urbstat" "mscode" "ststr" "strwt" "rawrake"
## [211] "wt2rake" "imprace" "chispnc" "crace1" "cprace" "cllcpwt"
## [217] "dualuse" "dualcor" "llcpwt2" "llcpwt" "rfhlth" "phys14d"
## [223] "ment14d" "hcvu651" "totinda" "michd" "ltasth1" "casthm1"
## [229] "asthms1" "drdxar2" "exteth3" "alteth3" "denvst3" "prace1"
## [235] "mrace1" "hispanc" "race" "raceg21" "racegr3" "raceprv"
## [241] "sex" "ageg5yr" "age65yr" "age80" "age_g" "htin4"
## [247] "htm4" "wtkg3" "bmi5" "bmi5cat" "rfbmi5" "chldcnt"
## [253] "educag" "incomg" "smoker3" "rfsmok3" "drnkany5" "drocdy3"
## [259] "rfbing5" "drnkwk1" "rfdrhv7" "flshot7" "pneumo3" "rfseat2"
## [265] "rfseat3" "drnkdrv" "rfmam22" "mam5023" "rfpap35" "rfpsa23"
## [271] "clnscpy" "sgmscpy" "sgms10y" "rfblds4" "stoldna" "vircoln"
## [277] "sbontim" "crcrec1" "aidtst4"
# Recode and clean
brfss_mlr <- brfss_full%>%
mutate(
# Outcome: mentally unhealthy days in past 30 (88 = none = 0; 77/99 = DK/refused = NA)
menthlth_days = case_when(
menthlth == 88 ~ 0,
menthlth >= 1 & menthlth <= 30 ~ as.numeric(menthlth),
TRUE ~ NA_real_
),
# Physical health days (key predictor)
physhlth_days = case_when(
physhlth == 88 ~ 0,
physhlth >= 1 & physhlth <= 30 ~ as.numeric(physhlth),
TRUE ~ NA_real_
),
# Sleep hours (practical cap at 14)
sleep_hrs = case_when(
sleptim1 >= 1 & sleptim1 <= 14 ~ as.numeric(sleptim1),
TRUE ~ NA_real_
),
# Age (capped at 80 per BRFSS coding)
age = age80,
# Income category (ordinal 1–8)
income_cat = case_when(
income2 %in% 1:8 ~ as.numeric(income2),
TRUE ~ NA_real_
),
# Sex
sex = factor(sexvar, levels = c(1, 2), labels = c("Male", "Female")),
# Exercise in past 30 days (any physical activity)
exercise = factor(case_when(
exerany2 == 1 ~ "Yes",
exerany2 == 2 ~ "No",
TRUE ~ NA_character_
), levels = c("No", "Yes")),
# BMI (stored as integer × 100 in BRFSS)
bmi = ifelse(bmi5 > 0, bmi5 / 100, NA_real_),
# Income as labeled factor (for display)
income_f = factor(income2, levels = 1:8,
labels = c("<$10k", "$10-15k", "$15-20k", "$20-25k",
"$25-35k", "$35-50k", "$50-75k", ">$75k"))
) %>%
filter(
!is.na(menthlth_days),
!is.na(physhlth_days),
!is.na(sleep_hrs),
!is.na(age), age >= 18,
!is.na(income_cat),
!is.na(sex),
!is.na(exercise)
)
# ── Analytic sample (reproducible random sample) ────────────────────────────
set.seed(553)
brfss_mlr <- brfss_mlr %>%
select(menthlth_days, physhlth_days, sleep_hrs, age,
income_cat, income_f, sex, exercise, bmi) %>%
slice_sample(n = 5000)
# Save for lab activity
saveRDS(brfss_mlr,
"C:\\Users\\safwa\\OneDrive - University at Albany - SUNY\\EPI 553\\Labs\\brfss dataset\\brfss_mlr_2020.rds")
tibble(Metric = c("Observations", "Variables"),
Value = c(nrow(brfss_mlr), ncol(brfss_mlr))) %>%
kable(caption = "Analytic Dataset Dimensions") %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE)
| Metric | Value |
|---|---|
| Observations | 5000 |
| Variables | 9 |
1a. (5 pts) Create a descriptive statistics table
using tbl_summary() that includes all variables in the
dataset. Include means (SD) for continuous variables and n (%) for
categorical variables.
brfss_mlr %>%
select(menthlth_days, physhlth_days, sleep_hrs, age,
income_f, sex, exercise) %>%
tbl_summary(
label = list(
menthlth_days ~ "Mentally unhealthy days (past 30)",
physhlth_days ~ "Physically unhealthy days (past 30)",
sleep_hrs ~ "Sleep (hours/night)",
age ~ "Age (years)",
income_f ~ "Household income",
sex ~ "Sex",
exercise ~ "Any physical activity (past 30 days)"
),
statistic = list(
all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} ({p}%)"
),
digits = all_continuous() ~ 1,
missing = "no"
) %>%
add_n() %>%
bold_labels() %>%
modify_caption("**Table 1. Descriptive Statistics — BRFSS 2020 Analytic Sample (n = 5,000)**")
| Characteristic | N | N = 5,0001 |
|---|---|---|
| Mentally unhealthy days (past 30) | 5,000 | 3.8 (7.7) |
| Physically unhealthy days (past 30) | 5,000 | 3.3 (7.8) |
| Sleep (hours/night) | 5,000 | 7.1 (1.3) |
| Age (years) | 5,000 | 54.3 (17.2) |
| Household income | 5,000 | |
| <$10k | 190 (3.8%) | |
| $10-15k | 169 (3.4%) | |
| $15-20k | 312 (6.2%) | |
| $20-25k | 434 (8.7%) | |
| $25-35k | 489 (9.8%) | |
| $35-50k | 683 (14%) | |
| $50-75k | 841 (17%) | |
| >$75k | 1,882 (38%) | |
| Sex | 5,000 | |
| Male | 2,331 (47%) | |
| Female | 2,669 (53%) | |
| Any physical activity (past 30 days) | 5,000 | 3,874 (77%) |
| 1 Mean (SD); n (%) | ||
1b. (5 pts) Create a histogram of
menthlth_days. Describe the shape of the distribution. Is
it symmetric, right-skewed, or left-skewed? What are the implications of
this shape for regression modeling?
p_hist <- ggplot(brfss_mlr, aes(x = menthlth_days)) +
geom_histogram(binwidth = 1, fill = "steelblue", color = "white", alpha = 0.85) +
labs(
title = "Distribution of Mentally Unhealthy Days in the Past 30 Days",
subtitle = "BRFSS 2020 Analytic Sample (n = 5,000)",
x = "Number of Mentally Unhealthy Days",
y = "Count"
) +
theme_minimal(base_size = 13)
ggplotly(p_hist)
Distribution of Mentally Unhealthy Days (BRFSS 2020)
1c. (5 pts) Create a scatterplot matrix (using
GGally::ggpairs() or similar) for the continuous variables:
menthlth_days, physhlth_days,
sleep_hrs, and age. Comment on the direction
and strength of each pairwise correlation with the outcome.
library(GGally)
brfss_mlr %>%
select(menthlth_days, physhlth_days, sleep_hrs, age) %>%
ggpairs()
2a. (5 pts) Fit a simple linear regression model
regressing menthlth_days on sleep_hrs alone.
Write out the fitted regression equation.
model_A <- lm(menthlth_days ~ sleep_hrs, data =brfss_mlr )
summary(model_A)
##
## Call:
## lm(formula = menthlth_days ~ sleep_hrs, data = brfss_mlr)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.670 -3.845 -3.040 -0.040 31.785
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.47429 0.57712 16.42 <2e-16 ***
## sleep_hrs -0.80424 0.08025 -10.02 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.642 on 4998 degrees of freedom
## Multiple R-squared: 0.0197, Adjusted R-squared: 0.0195
## F-statistic: 100.4 on 1 and 4998 DF, p-value: < 2.2e-16
2b. (5 pts) Interpret the slope for sleep in a single sentence appropriate for a public health audience (no statistical jargon).
Ans: Each additional hour of sleep per night is associated with about 0.8 fewer days of poor mental health per month, on average.
2c. (5 pts) State the null and alternative hypotheses for the slope, report the t-statistic and p-value, and state your conclusion. What is the degree of freedom for this test?
Null Hypothesis 𝐻0 : 𝛽sleep = 0 There is no relationship between sleep duration and poor mental health days.
Alternative Hypothesis is
HA :𝛽sleep≠0 is Sleep duration is associated with poor mental health days.
3a. (5 pts) Fit three models:
menthlth_days ~ sleep_hrsmenthlth_days ~ sleep_hrs + age + sexmenthlth_days ~ sleep_hrs + age + sex + physhlth_days + income_cat + exercisemodel_A <- lm(menthlth_days ~ sleep_hrs, data = brfss_mlr)
model_B <- lm(menthlth_days ~ sleep_hrs + age + sex,
data = brfss_mlr)
model_C <- lm(menthlth_days ~ sleep_hrs + age + sex +
physhlth_days + income_cat + exercise,
data = brfss_mlr)
3b. (10 pts) Create a table comparing the sleep coefficient (\(\hat{\beta}\), SE, 95% CI, p-value) across Models A, B, and C. Does the sleep coefficient change substantially when you add more covariates? What does this suggest about confounding?
library(broom)
bind_rows(
tidy(model_A, conf.int = TRUE) %>% filter(term == "sleep_hrs") %>% mutate(Model = "A"),
tidy(model_B, conf.int = TRUE) %>% filter(term == "sleep_hrs") %>% mutate(Model = "B"),
tidy(model_C, conf.int = TRUE) %>% filter(term == "sleep_hrs") %>% mutate(Model = "C")
)
## # A tibble: 3 × 8
## term estimate std.error statistic p.value conf.low conf.high Model
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 sleep_hrs -0.804 0.0802 -10.0 2.03e-23 -0.962 -0.647 A
## 2 sleep_hrs -0.734 0.0793 -9.25 3.17e-20 -0.889 -0.578 B
## 3 sleep_hrs -0.509 0.0753 -6.76 1.57e-11 -0.657 -0.361 C
AnS : The coefficient for sleep hours decreases as additional covariates are added to the model. In Model A, each additional hour of sleep is associated with 0.80 fewer poor mental health days per month. After adjusting for age and sex in Model B, the estimate becomes slightly smaller (−0.73). In Model C, after further adjusting for physical health days, income, and exercise, the association decreases further to −0.51. This change suggests that some of the relationship between sleep and mental health is explained by other factors, particularly physical health and socioeconomic characteristics. However, the association remains strong and statistically meaningful across all models, indicating that sleep duration is independently related to mental health days even after accounting for these potential confounders.
3c. (10 pts) For Model C, write out the full fitted regression equation and interpret every coefficient in plain language appropriate for a public health report.
Ans : Intercept (β₀)- The intercept represents the predicted number of poor mental health days for a person in the reference categories (baseline sex, income group, and exercise status) when sleep hours, age, and physical health days are zero. While this value helps define the regression line mathematically, it is not typically meaningful in a real-world context.
Sleep Hours (β₁ = -0.509) Each additional hour of sleep per night is associated with about 0.51 fewer poor mental health days per month, on average, after accounting for age, sex, physical health, income, and exercise.
Age (β₂) Holding other factors constant, a one-year increase in age is associated with a β₂ change in the number of poor mental health days per month.
Sex (β₃) Compared with the reference sex group (usually males), individuals in the other sex category report β₃ more or fewer poor mental health days per month, on average, after accounting for other factors.
Physical Health Days (β₄) Each additional day of poor physical health is associated with β₄ additional poor mental health days, suggesting that physical and mental health problems often occur together.
Income Category (β₅) Compared with the reference income group, individuals in higher (or lower) income categories report β₅ fewer or more poor mental health days per month, after adjusting for other characteristics.
Exercise (β₆) People who report exercising have β₆ fewer (or more) poor mental health days per month compared with those who do not exercise, holding other factors constant.
4a. (5 pts) Report \(R^2\) and Adjusted \(R^2\) for each of the three models (A, B, C). Create a table. Which model explains the most variance in mental health days?
library(broom)
glance(model_A)
## # A tibble: 1 × 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.0197 0.0195 7.64 100. 2.03e-23 1 -17262. 34529. 34549.
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
glance(model_B)
## # A tibble: 1 × 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.0504 0.0498 7.52 88.3 1.10e-55 3 -17182. 34374. 34407.
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
glance(model_C)
## # A tibble: 1 × 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.157 0.156 7.09 155. 6.48e-181 6 -16885. 33785. 33837.
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
Ans: Model C explains the most variance in mental health days
4b. (5 pts) What is the Root MSE for Model C? Interpret it in practical terms — what does it tell you about prediction accuracy?
Ans: Root MSE = 7.09. On average, the predicted number of poor mental health days from Model C differs from the actual reported number of poor mental health days by about 7 days per month.
4c. (10 pts) Using the ANOVA output for Model C,
fill in the following table manually (i.e., compute the values using the
output from anova() or glance()):
anova(model_C)
## Analysis of Variance Table
##
## Response: menthlth_days
## Df Sum Sq Mean Sq F value Pr(>F)
## sleep_hrs 1 5865 5864.8 116.6678 < 2.2e-16 ***
## age 1 6182 6182.2 122.9832 < 2.2e-16 ***
## sex 1 2947 2947.1 58.6266 2.274e-14 ***
## physhlth_days 1 29456 29455.5 585.9585 < 2.2e-16 ***
## income_cat 1 2177 2176.8 43.3031 5.169e-11 ***
## exercise 1 92 92.1 1.8326 0.1759
## Residuals 4993 250993 50.3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
| Source | df | SS | MS | F |
|---|---|---|---|---|
| Model | 6 | 46,719 | 7,786.5 | 154.8 |
| Residual | 4,993 | 250,993 | 50.3 | |
| Total | 4,999 | 297,712 |
State the null hypothesis for the overall F-test and your conclusion. Ans: Null Hypothesis (H₀) :All regression coefficients are equal to zero. Alternative Hypothesis (H₁) : At least one predictor is associated with mental health days.
The overall F-test is highly statistically significant (p < 0.001). Therefore, we reject the null hypothesis and conclude that the predictors in Model C collectively explain variation in the number of poor mental health days. At least one of the variables in the model is associated with mental health days.
5a. (5 pts) For Model C, produce the four standard diagnostic plots (Residuals vs. Fitted, Normal Q-Q, Scale-Location, Cook’s Distance). Comment on what each plot tells you about the LINE assumptions.
par(mfrow=c(2,2))
plot(model_C)
Ans: Residual vs fitted suggests possible non linearity and
heteroscedasticity.
Q-Q plot shows clear deviation from normality.
Scale-location plot indicates non-constant variance.
Residual vs leverage shows no severe influential observations.
5b. (5 pts) Given the nature of the outcome (mental health days, bounded at 0 and 30, heavily right-skewed), which assumptions are most likely to be violated? Does this invalidate the analysis? Explain.
Ans: the outcome is bounded (0–30), discrete, and right-skewed, violations of normality and equal variance are expected. These issues do not necessarily invalidate the analysis, especially with a large sample size, but suggest that alternative models for count data (e.g., Poisson or negative binomial regression) could be more appropriate.
5c. (5 pts) Identify any observations with Cook’s Distance > 1. How many are there? What would you do with them in a real analysis?
cooks <- cooks.distance(model_C)
sum(cooks > 1)
## [1] 0
Ans: 0 observation.No observations had Cook’s Distance greater than 1.
Interpretation:In a real analysis, influential points should be investigated for data errors or unusual cases before deciding whether to exclude them.
Suppose you are writing the results section of a public health paper. Write a 3–4 sentence paragraph summarizing the findings from Model C for a non-statistical audience. Your paragraph should:
Ans : Adults who reported fewer hours of sleep tended to experience more days of poor mental health during the past month. Individuals with more physical health problems also reported substantially more poor mental health days. Higher income and regular exercise were associated with fewer days of poor mental health. Because the data were collected at a single point in time, these findings show relationships between these factors but do not establish cause-and-effect.
End of Lab Activity