Research Question: How do sex, state, and disease type predict chronic disease rates in the United States?
This project uses the U.S. Chronic Disease Indicators dataset to show patterns in chronic disease rates across the country. The dataset contains 309,215 observations and 34 variables. It covers health data reported for the United States as a whole, for individual states and DC, and for U.S. territories. Each row represents a specific health indicator for a given location, year, disease topic, and demographic group. While the dataset has many variables, this analysis focuses on the four most relevant to the research question:
DataValue: the reported value of the indicator, which depending on the indicator may be a rate, a percentage, or a count.
Stratification1: the demographic group. It covers Male and Female as well as age groups, race/ethnicity, school grade, and an Overall category.
Topic: the chronic disease category.
LocationAbbr: the state or territory abbreviation.
These variables allow us to study how chronic disease levels differ by demographic group, location, and disease type.
The dataset comes from Data.gov and can be accessed at the following link: https://catalog.data.gov/dataset/u-s-chronic-disease-indicators/resource/011ec939-38cc-4d22-b2e9-fb81217225c9
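Stratification1 mixes sex with age, race/ethnicity, grade, and Overall groups. One way to focus strictly on sex is to tabulate the levels first and then keep only the Male and Female rows. The minimal R sketch below is not part of the original analysis. It uses a small hypothetical data frame, `toy`, with invented values standing in for `cleaned`.

```r
# Hypothetical stand-in for `cleaned`; the values are invented for illustration
toy <- data.frame(
  Stratification1 = c("Male", "Female", "Overall", "Age 18-44", "Male", "Hispanic"),
  DataValue       = c(12.1, 10.4, 11.3, 8.9, 13.0, 9.5)
)

# Count how many rows fall in each demographic group
print(table(toy$Stratification1))

# Keep only the sex strata for a direct Male vs. Female comparison
sex_only <- subset(toy, Stratification1 %in% c("Male", "Female"))
sex_only$Stratification1 <- factor(sex_only$Stratification1)
print(levels(sex_only$Stratification1))

# On the real data, the equivalent check would be: table(cleaned$Stratification1)
```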
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("~/Downloads")  # folder containing the downloaded CSV
data <- read_csv("U.S._Chronic_Disease_Indicators (1).csv")  # read the raw CDI export
## Rows: 309215 Columns: 34
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): LocationAbbr, LocationDesc, DataSource, Topic, Question, DataValue...
## dbl (6): YearStart, YearEnd, DataValue, DataValueAlt, LowConfidenceLimit, H...
## lgl (10): Response, StratificationCategory2, Stratification2, Stratification...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Load Dataset
# Keep only the variables used in the analysis
data_small <- data %>%
  select(DataValue, Stratification1, LocationAbbr, Topic, YearStart)
Selecting the variables needed
# Drop rows missing the outcome or any predictor used in the model
cleaned <- data_small %>%
  filter(!is.na(DataValue), !is.na(Stratification1), !is.na(LocationAbbr), !is.na(Topic))
Cleaning the dataset and EDA
summary(cleaned$DataValue)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 12.4 27.0 694.4 57.8 2925456.0
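The mean (694.4) is far above the median (27.0), which suggests that a few very large values dominate DataValue. The sketch below uses a hypothetical vector to show how one extreme value pulls the mean upward. It also shows how `log1p()`, which computes log(1 + x), compresses the right tail. `log1p()` is used instead of `log()` because DataValue has a minimum of 0.0. Modeling on the log scale is an assumption offered for illustration, not what this analysis does.

```r
# Hypothetical values: four moderate rates plus one very large value
x <- c(10, 20, 30, 40, 100000)

# One extreme value pulls the mean far above the median
print(c(mean = mean(x), median = median(x)))

# log1p() = log(1 + x) compresses the right tail and is defined at 0
x_log <- log1p(x)
print(round(x_log, 2))

# A model on the log scale would then look like:
# lm(log1p(DataValue) ~ Stratification1 + LocationAbbr + Topic, data = cleaned)
```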
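For this analysis, I used a multiple linear regression model to examine how sex and other demographic groups (Stratification1), state, and disease type predict chronic disease rates in the United States. Because the outcome variable, DataValue, is continuous, lm() is an appropriate choice. The model summary reports the estimate, standard error, t value, and p-value for each coefficient. confint() gives the corresponding 95% confidence intervals. Together, these values show how the average DataValue changes across demographic groups, states, and disease categories. Because the predictors are categorical, lm() converts each one into indicator (dummy) variables. Each coefficient is then the difference in expected DataValue between that category and its variable's reference group, holding the other predictors constant. The reference group is the first level alphabetically, which does not appear in the coefficient list.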
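To make the reference-group interpretation concrete, the sketch below fits `lm()` to a tiny hypothetical dataset with one categorical predictor. With a single factor, the intercept equals the mean of the reference level. Each coefficient equals that level's mean minus the reference mean. In the full model above, the coefficients are the same kind of differences, but adjusted for the other predictors. Base R's `relevel()` changes which level serves as the reference.

```r
# Hypothetical toy data: one outcome and one categorical predictor
toy <- data.frame(
  value = c(10, 12, 20, 22, 30, 32),
  topic = factor(c("A", "A", "B", "B", "C", "C"))
)

# Group means: A = 11, B = 21, C = 31
print(tapply(toy$value, toy$topic, mean))

# lm() turns `topic` into indicator columns; level "A" (first alphabetically)
# becomes the reference and gets no column of its own
print(model.matrix(value ~ topic, data = toy))

fit <- lm(value ~ topic, data = toy)
# Intercept = mean of A; topicB and topicC = differences from A
print(coef(fit))

# relevel() makes "C" the reference instead
fit_c <- lm(value ~ relevel(topic, ref = "C"), data = toy)
print(coef(fit_c))
```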
# Fit the multiple linear regression; lm() dummy-codes the categorical predictors
model <- lm(DataValue ~ Stratification1 + LocationAbbr + Topic, data = cleaned)
summary(model)
##
## Call:
## lm(formula = DataValue ~ Stratification1 + LocationAbbr + Topic,
## data = cleaned)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15388 -972 -60 624 2910063
##
## Coefficients:
## Estimate
## (Intercept) -72.322
## Stratification1Age 0-44 -1559.477
## Stratification1Age 1-5 -453.044
## Stratification1Age 10-13 -524.653
## Stratification1Age 12-17 -428.589
## Stratification1Age 18-44 -738.314
## Stratification1Age 4 m - 5 y -554.564
## Stratification1Age 45-64 -847.928
## Stratification1Age 6-11 -423.262
## Stratification1Age 6-14 -555.599
## Stratification1Age 6-9 -513.017
## Stratification1American Indian or Alaska Native, non-Hispanic -1428.397
## Stratification1Asian or Pacific Islander, non-Hispanic -2561.558
## Stratification1Asian, non-Hispanic -1349.159
## Stratification1Black, non-Hispanic -1095.711
## Stratification1Female -286.484
## Stratification1Grade 10 -587.918
## Stratification1Grade 11 -585.769
## Stratification1Grade 12 -583.582
## Stratification1Grade 9 -589.833
## Stratification1Hawaiian or Pacific Islander, non-Hispanic -3089.947
## Stratification1Hispanic -1160.133
## Stratification1Male -244.324
## Stratification1Multiracial, non-Hispanic -985.752
## Stratification1Overall 258.191
## Stratification1White, non-Hispanic 1.780
## LocationAbbrAL 246.446
## LocationAbbrAR 148.399
## LocationAbbrAZ 299.231
## LocationAbbrCA 1452.190
## LocationAbbrCO 171.657
## LocationAbbrCT 144.817
## LocationAbbrDC -68.466
## LocationAbbrDE -59.478
## LocationAbbrFL 1500.001
## LocationAbbrGA 464.100
## LocationAbbrGU 619.396
## LocationAbbrHI 264.867
## LocationAbbrIA 107.020
## LocationAbbrID -74.300
## LocationAbbrIL 644.363
## LocationAbbrIN 365.572
## LocationAbbrKS 143.526
## LocationAbbrKY 246.335
## LocationAbbrLA 190.116
## LocationAbbrMA 319.379
## LocationAbbrMD 320.677
## LocationAbbrME -19.880
## LocationAbbrMI 611.206
## LocationAbbrMN 253.995
## LocationAbbrMO 358.145
## LocationAbbrMS 52.850
## LocationAbbrMT -11.431
## LocationAbbrNC 528.131
## LocationAbbrND -63.696
## LocationAbbrNE 69.524
## LocationAbbrNH -112.675
## LocationAbbrNJ 449.994
## LocationAbbrNM 40.701
## LocationAbbrNV 41.953
## LocationAbbrNY 985.457
## LocationAbbrOH 761.144
## LocationAbbrOK 246.466
## LocationAbbrOR 143.809
## LocationAbbrPA 774.404
## LocationAbbrPR 2322.837
## LocationAbbrRI -10.806
## LocationAbbrSC 261.141
## LocationAbbrSD -48.521
## LocationAbbrTN 398.372
## LocationAbbrTX 1138.902
## LocationAbbrUS 12193.639
## LocationAbbrUT 57.514
## LocationAbbrVA 385.973
## LocationAbbrVI 224.571
## LocationAbbrVT -114.358
## LocationAbbrWA 375.266
## LocationAbbrWI 243.799
## LocationAbbrWV 5.273
## LocationAbbrWY -201.407
## TopicArthritis 57.878
## TopicAsthma -192.354
## TopicCancer 2002.165
## TopicCardiovascular Disease 1958.491
## TopicChronic Kidney Disease 2004.379
## TopicChronic Obstructive Pulmonary Disease 3013.638
## TopicCognitive Health and Caregiving -372.399
## TopicDiabetes 798.566
## TopicDisability 151.484
## TopicHealth Status 64.822
## TopicImmunization 38.976
## TopicMaternal Health 90.922
## TopicMental Health 63.532
## TopicNutrition, Physical Activity, and Weight Status 105.806
## TopicOral Health 59.369
## TopicSleep 149.909
## TopicSocial Determinants of Health 65.596
## TopicTobacco 39.406
## Std. Error
## (Intercept) 332.625
## Stratification1Age 0-44 430.592
## Stratification1Age 1-5 1140.271
## Stratification1Age 10-13 1587.970
## Stratification1Age 12-17 1140.271
## Stratification1Age 18-44 251.080
## Stratification1Age 4 m - 5 y 1604.456
## Stratification1Age 45-64 231.551
## Stratification1Age 6-11 1140.271
## Stratification1Age 6-14 1604.456
## Stratification1Age 6-9 1587.970
## Stratification1American Indian or Alaska Native, non-Hispanic 235.867
## Stratification1Asian or Pacific Islander, non-Hispanic 375.541
## Stratification1Asian, non-Hispanic 249.421
## Stratification1Black, non-Hispanic 207.065
## Stratification1Female 198.276
## Stratification1Grade 10 541.259
## Stratification1Grade 11 541.259
## Stratification1Grade 12 541.259
## Stratification1Grade 9 541.259
## Stratification1Hawaiian or Pacific Islander, non-Hispanic 469.441
## Stratification1Hispanic 205.931
## Stratification1Male 198.528
## Stratification1Multiracial, non-Hispanic 226.483
## Stratification1Overall 196.190
## Stratification1White, non-Hispanic 197.602
## LocationAbbrAL 371.977
## LocationAbbrAR 370.241
## LocationAbbrAZ 357.107
## LocationAbbrCA 358.524
## LocationAbbrCO 362.320
## LocationAbbrCT 365.477
## LocationAbbrDC 380.273
## LocationAbbrDE 379.651
## LocationAbbrFL 373.374
## LocationAbbrGA 362.494
## LocationAbbrGU 434.812
## LocationAbbrHI 361.361
## LocationAbbrIA 370.794
## LocationAbbrID 384.690
## LocationAbbrIL 367.226
## LocationAbbrIN 367.325
## LocationAbbrKS 359.650
## LocationAbbrKY 375.705
## LocationAbbrLA 370.303
## LocationAbbrMA 363.185
## LocationAbbrMD 358.978
## LocationAbbrME 383.486
## LocationAbbrMI 360.178
## LocationAbbrMN 359.747
## LocationAbbrMO 363.075
## LocationAbbrMS 380.347
## LocationAbbrMT 376.498
## LocationAbbrNC 362.480
## LocationAbbrND 385.303
## LocationAbbrNE 366.839
## LocationAbbrNH 395.823
## LocationAbbrNJ 378.220
## LocationAbbrNM 366.586
## LocationAbbrNV 370.136
## LocationAbbrNY 353.950
## LocationAbbrOH 363.382
## LocationAbbrOK 359.172
## LocationAbbrOR 368.429
## LocationAbbrPA 365.590
## LocationAbbrPR 460.587
## LocationAbbrRI 374.802
## LocationAbbrSC 367.541
## LocationAbbrSD 383.130
## LocationAbbrTN 367.429
## LocationAbbrTX 357.847
## LocationAbbrUS 348.386
## LocationAbbrUT 367.723
## LocationAbbrVA 360.852
## LocationAbbrVI 668.248
## LocationAbbrVT 391.073
## LocationAbbrWA 354.096
## LocationAbbrWI 366.478
## LocationAbbrWV 385.187
## LocationAbbrWY 397.149
## TopicArthritis 193.849
## TopicAsthma 255.635
## TopicCancer 182.450
## TopicCardiovascular Disease 164.765
## TopicChronic Kidney Disease 1582.402
## TopicChronic Obstructive Pulmonary Disease 174.077
## TopicCognitive Health and Caregiving 290.450
## TopicDiabetes 198.709
## TopicDisability 283.127
## TopicHealth Status 171.834
## TopicImmunization 192.765
## TopicMaternal Health 667.952
## TopicMental Health 194.826
## TopicNutrition, Physical Activity, and Weight Status 171.661
## TopicOral Health 216.762
## TopicSleep 285.832
## TopicSocial Determinants of Health 192.839
## TopicTobacco 207.722
## t value Pr(>|t|)
## (Intercept) -0.217 0.827876
## Stratification1Age 0-44 -3.622 0.000293
## Stratification1Age 1-5 -0.397 0.691137
## Stratification1Age 10-13 -0.330 0.741104
## Stratification1Age 12-17 -0.376 0.707017
## Stratification1Age 18-44 -2.941 0.003277
## Stratification1Age 4 m - 5 y -0.346 0.729614
## Stratification1Age 45-64 -3.662 0.000250
## Stratification1Age 6-11 -0.371 0.710493
## Stratification1Age 6-14 -0.346 0.729129
## Stratification1Age 6-9 -0.323 0.746646
## Stratification1American Indian or Alaska Native, non-Hispanic -6.056 1.40e-09
## Stratification1Asian or Pacific Islander, non-Hispanic -6.821 9.07e-12
## Stratification1Asian, non-Hispanic -5.409 6.34e-08
## Stratification1Black, non-Hispanic -5.292 1.21e-07
## Stratification1Female -1.445 0.148494
## Stratification1Grade 10 -1.086 0.277389
## Stratification1Grade 11 -1.082 0.279149
## Stratification1Grade 12 -1.078 0.280948
## Stratification1Grade 9 -1.090 0.275828
## Stratification1Hawaiian or Pacific Islander, non-Hispanic -6.582 4.65e-11
## Stratification1Hispanic -5.634 1.77e-08
## Stratification1Male -1.231 0.218445
## Stratification1Multiracial, non-Hispanic -4.352 1.35e-05
## Stratification1Overall 1.316 0.188167
## Stratification1White, non-Hispanic 0.009 0.992813
## LocationAbbrAL 0.663 0.507633
## LocationAbbrAR 0.401 0.688555
## LocationAbbrAZ 0.838 0.402070
## LocationAbbrCA 4.050 5.11e-05
## LocationAbbrCO 0.474 0.635663
## LocationAbbrCT 0.396 0.691929
## LocationAbbrDC -0.180 0.857117
## LocationAbbrDE -0.157 0.875509
## LocationAbbrFL 4.017 5.89e-05
## LocationAbbrGA 1.280 0.200442
## LocationAbbrGU 1.425 0.154298
## LocationAbbrHI 0.733 0.463576
## LocationAbbrIA 0.289 0.772869
## LocationAbbrID -0.193 0.846847
## LocationAbbrIL 1.755 0.079316
## LocationAbbrIN 0.995 0.319627
## LocationAbbrKS 0.399 0.689840
## LocationAbbrKY 0.656 0.512044
## LocationAbbrLA 0.513 0.607667
## LocationAbbrMA 0.879 0.379193
## LocationAbbrMD 0.893 0.371696
## LocationAbbrME -0.052 0.958656
## LocationAbbrMI 1.697 0.089707
## LocationAbbrMN 0.706 0.480164
## LocationAbbrMO 0.986 0.323927
## LocationAbbrMS 0.139 0.889489
## LocationAbbrMT -0.030 0.975779
## LocationAbbrNC 1.457 0.145119
## LocationAbbrND -0.165 0.868698
## LocationAbbrNE 0.190 0.849685
## LocationAbbrNH -0.285 0.775905
## LocationAbbrNJ 1.190 0.234140
## LocationAbbrNM 0.111 0.911596
## LocationAbbrNV 0.113 0.909758
## LocationAbbrNY 2.784 0.005367
## LocationAbbrOH 2.095 0.036207
## LocationAbbrOK 0.686 0.492584
## LocationAbbrOR 0.390 0.696292
## LocationAbbrPA 2.118 0.034156
## LocationAbbrPR 5.043 4.58e-07
## LocationAbbrRI -0.029 0.976998
## LocationAbbrSC 0.711 0.477389
## LocationAbbrSD -0.127 0.899222
## LocationAbbrTN 1.084 0.278272
## LocationAbbrTX 3.183 0.001460
## LocationAbbrUS 35.000 < 2e-16
## LocationAbbrUT 0.156 0.875714
## LocationAbbrVA 1.070 0.284793
## LocationAbbrVI 0.336 0.736826
## LocationAbbrVT -0.292 0.769966
## LocationAbbrWA 1.060 0.289244
## LocationAbbrWI 0.665 0.505893
## LocationAbbrWV 0.014 0.989078
## LocationAbbrWY -0.507 0.612062
## TopicArthritis 0.299 0.765265
## TopicAsthma -0.752 0.451776
## TopicCancer 10.974 < 2e-16
## TopicCardiovascular Disease 11.887 < 2e-16
## TopicChronic Kidney Disease 1.267 0.205275
## TopicChronic Obstructive Pulmonary Disease 17.312 < 2e-16
## TopicCognitive Health and Caregiving -1.282 0.199792
## TopicDiabetes 4.019 5.85e-05
## TopicDisability 0.535 0.592623
## TopicHealth Status 0.377 0.705996
## TopicImmunization 0.202 0.839763
## TopicMaternal Health 0.136 0.891726
## TopicMental Health 0.326 0.744350
## TopicNutrition, Physical Activity, and Weight Status 0.616 0.537655
## TopicOral Health 0.274 0.784170
## TopicSleep 0.524 0.599956
## TopicSocial Determinants of Health 0.340 0.733735
## TopicTobacco 0.190 0.849539
##
## (Intercept)
## Stratification1Age 0-44 ***
## Stratification1Age 1-5
## Stratification1Age 10-13
## Stratification1Age 12-17
## Stratification1Age 18-44 **
## Stratification1Age 4 m - 5 y
## Stratification1Age 45-64 ***
## Stratification1Age 6-11
## Stratification1Age 6-14
## Stratification1Age 6-9
## Stratification1American Indian or Alaska Native, non-Hispanic ***
## Stratification1Asian or Pacific Islander, non-Hispanic ***
## Stratification1Asian, non-Hispanic ***
## Stratification1Black, non-Hispanic ***
## Stratification1Female
## Stratification1Grade 10
## Stratification1Grade 11
## Stratification1Grade 12
## Stratification1Grade 9
## Stratification1Hawaiian or Pacific Islander, non-Hispanic ***
## Stratification1Hispanic ***
## Stratification1Male
## Stratification1Multiracial, non-Hispanic ***
## Stratification1Overall
## Stratification1White, non-Hispanic
## LocationAbbrAL
## LocationAbbrAR
## LocationAbbrAZ
## LocationAbbrCA ***
## LocationAbbrCO
## LocationAbbrCT
## LocationAbbrDC
## LocationAbbrDE
## LocationAbbrFL ***
## LocationAbbrGA
## LocationAbbrGU
## LocationAbbrHI
## LocationAbbrIA
## LocationAbbrID
## LocationAbbrIL .
## LocationAbbrIN
## LocationAbbrKS
## LocationAbbrKY
## LocationAbbrLA
## LocationAbbrMA
## LocationAbbrMD
## LocationAbbrME
## LocationAbbrMI .
## LocationAbbrMN
## LocationAbbrMO
## LocationAbbrMS
## LocationAbbrMT
## LocationAbbrNC
## LocationAbbrND
## LocationAbbrNE
## LocationAbbrNH
## LocationAbbrNJ
## LocationAbbrNM
## LocationAbbrNV
## LocationAbbrNY **
## LocationAbbrOH *
## LocationAbbrOK
## LocationAbbrOR
## LocationAbbrPA *
## LocationAbbrPR ***
## LocationAbbrRI
## LocationAbbrSC
## LocationAbbrSD
## LocationAbbrTN
## LocationAbbrTX **
## LocationAbbrUS ***
## LocationAbbrUT
## LocationAbbrVA
## LocationAbbrVI
## LocationAbbrVT
## LocationAbbrWA
## LocationAbbrWI
## LocationAbbrWV
## LocationAbbrWY
## TopicArthritis
## TopicAsthma
## TopicCancer ***
## TopicCardiovascular Disease ***
## TopicChronic Kidney Disease
## TopicChronic Obstructive Pulmonary Disease ***
## TopicCognitive Health and Caregiving
## TopicDiabetes ***
## TopicDisability
## TopicHealth Status
## TopicImmunization
## TopicMaternal Health
## TopicMental Health
## TopicNutrition, Physical Activity, and Weight Status
## TopicOral Health
## TopicSleep
## TopicSocial Determinants of Health
## TopicTobacco
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16060 on 209098 degrees of freedom
## Multiple R-squared: 0.0182, Adjusted R-squared: 0.01775
## F-statistic: 39.96 on 97 and 209098 DF, p-value: < 2.2e-16
confint(model)  # 95% confidence interval for each coefficient
## 2.5 %
## (Intercept) -724.25846
## Stratification1Age 0-44 -2403.42755
## Stratification1Age 1-5 -2687.94647
## Stratification1Age 10-13 -3637.03462
## Stratification1Age 12-17 -2663.49118
## Stratification1Age 18-44 -1230.42446
## Stratification1Age 4 m - 5 y -3699.25733
## Stratification1Age 45-64 -1301.76246
## Stratification1Age 6-11 -2658.16425
## Stratification1Age 6-14 -3700.29291
## Stratification1Age 6-9 -3625.39905
## Stratification1American Indian or Alaska Native, non-Hispanic -1890.68982
## Stratification1Asian or Pacific Islander, non-Hispanic -3297.60963
## Stratification1Asian, non-Hispanic -1838.01819
## Stratification1Black, non-Hispanic -1501.55223
## Stratification1Female -675.09965
## Stratification1Grade 10 -1648.77148
## Stratification1Grade 11 -1646.62218
## Stratification1Grade 12 -1644.43536
## Stratification1Grade 9 -1650.68596
## Stratification1Hawaiian or Pacific Islander, non-Hispanic -4010.03949
## Stratification1Hispanic -1563.75315
## Stratification1Male -633.43378
## Stratification1Multiracial, non-Hispanic -1429.65291
## Stratification1Overall -126.33668
## Stratification1White, non-Hispanic -385.51550
## LocationAbbrAL -482.62079
## LocationAbbrAR -577.26478
## LocationAbbrAZ -400.68930
## LocationAbbrCA 749.49140
## LocationAbbrCO -538.48100
## LocationAbbrCT -571.50949
## LocationAbbrDC -813.79182
## LocationAbbrDE -803.58573
## LocationAbbrFL 768.19697
## LocationAbbrGA -246.37873
## LocationAbbrGU -232.82359
## LocationAbbrHI -443.39097
## LocationAbbrIA -619.72587
## LocationAbbrID -828.28353
## LocationAbbrIL -75.39090
## LocationAbbrIN -354.37684
## LocationAbbrKS -561.37901
## LocationAbbrKY -490.03788
## LocationAbbrLA -535.66857
## LocationAbbrMA -392.45358
## LocationAbbrMD -382.91205
## LocationAbbrME -771.50302
## LocationAbbrMI -94.73445
## LocationAbbrMN -451.09896
## LocationAbbrMO -353.47294
## LocationAbbrMS -692.62121
## LocationAbbrMT -749.35863
## LocationAbbrNC -182.31991
## LocationAbbrND -818.87963
## LocationAbbrNE -649.47229
## LocationAbbrNH -888.47919
## LocationAbbrNJ -291.30896
## LocationAbbrNM -677.79913
## LocationAbbrNV -683.50486
## LocationAbbrNY 291.72459
## LocationAbbrOH 48.92375
## LocationAbbrOK -457.50213
## LocationAbbrOR -578.30157
## LocationAbbrPA 57.85752
## LocationAbbrPR 1420.09893
## LocationAbbrRI -745.40822
## LocationAbbrSC -459.22988
## LocationAbbrSD -799.44703
## LocationAbbrTN -321.78078
## LocationAbbrTX 437.52939
## LocationAbbrUS 11510.81088
## LocationAbbrUT -663.21409
## LocationAbbrVA -321.28817
## LocationAbbrVI -1085.17857
## LocationAbbrVT -880.85117
## LocationAbbrWA -318.75408
## LocationAbbrWI -474.48994
## LocationAbbrWV -749.68331
## LocationAbbrWY -979.80938
## TopicArthritis -322.06031
## TopicAsthma -693.39184
## TopicCancer 1644.56865
## TopicCardiovascular Disease 1635.55523
## TopicChronic Kidney Disease -1097.09037
## TopicChronic Obstructive Pulmonary Disease 2672.45119
## TopicCognitive Health and Caregiving -941.67367
## TopicDiabetes 409.10158
## TopicDisability -403.43831
## TopicHealth Status -271.96751
## TopicImmunization -338.83744
## TopicMaternal Health -1218.24708
## TopicMental Health -318.32166
## TopicNutrition, Physical Activity, and Weight Status -230.64569
## TopicOral Health -365.47873
## TopicSleep -410.31564
## TopicSocial Determinants of Health -312.36307
## TopicTobacco -367.72334
## 97.5 %
## (Intercept) 579.6152
## Stratification1Age 0-44 -715.5267
## Stratification1Age 1-5 1781.8578
## Stratification1Age 10-13 2587.7285
## Stratification1Age 12-17 1806.3131
## Stratification1Age 18-44 -246.2044
## Stratification1Age 4 m - 5 y 2590.1297
## Stratification1Age 45-64 -394.0931
## Stratification1Age 6-11 1811.6400
## Stratification1Age 6-14 2589.0941
## Stratification1Age 6-9 2599.3641
## Stratification1American Indian or Alaska Native, non-Hispanic -966.1042
## Stratification1Asian or Pacific Islander, non-Hispanic -1825.5069
## Stratification1Asian, non-Hispanic -860.2991
## Stratification1Black, non-Hispanic -689.8689
## Stratification1Female 102.1319
## Stratification1Grade 10 472.9353
## Stratification1Grade 11 475.0846
## Stratification1Grade 12 477.2714
## Stratification1Grade 9 471.0208
## Stratification1Hawaiian or Pacific Islander, non-Hispanic -2169.8547
## Stratification1Hispanic -756.5138
## Stratification1Male 144.7858
## Stratification1Multiracial, non-Hispanic -541.8509
## Stratification1Overall 642.7186
## Stratification1White, non-Hispanic 389.0753
## LocationAbbrAL 975.5126
## LocationAbbrAR 874.0635
## LocationAbbrAZ 999.1519
## LocationAbbrCA 2154.8886
## LocationAbbrCO 881.7954
## LocationAbbrCT 861.1427
## LocationAbbrDC 676.8593
## LocationAbbrDE 684.6291
## LocationAbbrFL 2231.8057
## LocationAbbrGA 1174.5794
## LocationAbbrGU 1471.6164
## LocationAbbrHI 973.1258
## LocationAbbrIA 833.7665
## LocationAbbrID 679.6827
## LocationAbbrIL 1364.1159
## LocationAbbrIN 1085.5206
## LocationAbbrKS 848.4320
## LocationAbbrKY 982.7071
## LocationAbbrLA 915.9013
## LocationAbbrMA 1031.2121
## LocationAbbrMD 1024.2651
## LocationAbbrME 731.7432
## LocationAbbrMI 1317.1473
## LocationAbbrMN 959.0897
## LocationAbbrMO 1069.7631
## LocationAbbrMS 798.3204
## LocationAbbrMT 726.4968
## LocationAbbrNC 1238.5819
## LocationAbbrND 691.4885
## LocationAbbrNE 788.5200
## LocationAbbrNH 663.1293
## LocationAbbrNJ 1191.2961
## LocationAbbrNM 759.2004
## LocationAbbrNV 767.4101
## LocationAbbrNY 1679.1901
## LocationAbbrOH 1473.3642
## LocationAbbrOK 950.4346
## LocationAbbrOR 865.9202
## LocationAbbrPA 1490.9505
## LocationAbbrPR 3225.5759
## LocationAbbrRI 723.7955
## LocationAbbrSC 981.5126
## LocationAbbrSD 702.4048
## LocationAbbrTN 1118.5241
## LocationAbbrTX 1840.2739
## LocationAbbrUS 12876.4679
## LocationAbbrUT 778.2418
## LocationAbbrVA 1093.2351
## LocationAbbrVI 1534.3208
## LocationAbbrVT 652.1361
## LocationAbbrWA 1069.2862
## LocationAbbrWI 962.0872
## LocationAbbrWV 760.2294
## LocationAbbrWY 576.9944
## TopicArthritis 437.8168
## TopicAsthma 308.6832
## TopicCancer 2359.7623
## TopicCardiovascular Disease 2281.4275
## TopicChronic Kidney Disease 5105.8484
## TopicChronic Obstructive Pulmonary Disease 3354.8240
## TopicCognitive Health and Caregiving 196.8748
## TopicDiabetes 1188.0310
## TopicDisability 706.4068
## TopicHealth Status 401.6124
## TopicImmunization 416.7902
## TopicMaternal Health 1400.0917
## TopicMental Health 445.3866
## TopicNutrition, Physical Activity, and Weight Status 442.2572
## TopicOral Health 484.2160
## TopicSleep 710.1333
## TopicSocial Determinants of Health 443.5558
## TopicTobacco 446.5360
Interpretation
In this model, each coefficient is the estimated difference in average DataValue between a category and the reference group for that variable, holding the other predictors constant. A positive coefficient means the category has a higher expected DataValue than its reference group. A negative coefficient means it has a lower one. Some categories within each predictor were statistically significant (p < 0.05) and others were not. This suggests that certain demographic groups, states, and disease topics differ from their reference groups.

The largest t values belonged to two sets of terms. The first is the national-level rows (LocationAbbrUS, t = 35.0), whose very large estimate likely reflects national totals rather than a state-like difference. The second is the disease topics Chronic Obstructive Pulmonary Disease (t = 17.3), Cardiovascular Disease (t = 11.9), Cancer (t = 11.0), and Diabetes (t = 4.0). Several race/ethnicity groups were also significant, for example Asian or Pacific Islander (t = -6.8) and Hawaiian or Pacific Islander (t = -6.6). So were the age groups 0-44, 18-44, and 45-64. The Female and Male coefficients were not significant (p = 0.148 and p = 0.218).
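The printed summary splits estimates, p-values, and intervals into separate blocks. It can help to combine them into one table and rank the significant terms by |t|, which makes claims about the "strongest" predictors easy to check. The sketch below uses only base R (`coef(summary())`, `confint()`) on a small hypothetical dataset, `toy`. For this project, `fit` would be replaced by `model`.

```r
# Hypothetical toy data standing in for `cleaned`
toy <- data.frame(
  value = c(10, 12, 11, 20, 22, 21, 10, 13, 12, 21, 23, 20),
  topic = factor(rep(c("A", "B"), each = 3, times = 2)),
  sex   = factor(rep(c("Female", "Male"), each = 6))
)
fit <- lm(value ~ topic + sex, data = toy)

# One row per coefficient: estimate, SE, t, p, and the 95% CI side by side
tab <- cbind(coef(summary(fit)), confint(fit))

# Drop the intercept, keep p < 0.05, and sort by absolute t value
sig <- tab[rownames(tab) != "(Intercept)" & tab[, "Pr(>|t|)"] < 0.05, , drop = FALSE]
sig <- sig[order(abs(sig[, "t value"]), decreasing = TRUE), , drop = FALSE]
print(round(sig, 3))
```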
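For this multiple linear regression model, I checked the standard assumptions: linearity, independence, homoscedasticity (constant variance), normality of residuals, and multicollinearity. These checks help show whether the model suits the data and whether its results can be interpreted reliably. I used R's four default diagnostic plots for lm objects to assess the assumptions visually:
The Residuals vs Fitted plot helps assess linearity and constant variance.
The Normal Q-Q plot shows whether the residuals follow a normal distribution.
The Scale-Location plot checks for homoscedasticity.
The Residuals vs Leverage plot helps identify influential points.
These plots do not directly check independence or multicollinearity. Overall, the plots show some variability and deviations from the assumptions. The residual summary below supports this: the median residual is about -60, but the maximum is about 2.91 million, so the residuals are heavily right-skewed and far from normal. As a result, the p-values and confidence intervals should be interpreted with caution.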
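# Show the four default lm diagnostic plots in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(model)
# Summarize the distribution of the residuals
summary(residuals(model))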
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -15388.20 -972.09 -59.98 0.00 624.13 2910062.85
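The diagnostic plots are visual checks. A formal test for non-constant variance is the studentized (Koenker) Breusch-Pagan test. It regresses the squared residuals on the predictors and compares n times the R-squared of that fit with a chi-squared distribution whose degrees of freedom equal the number of slope terms. The sketch below implements it in base R on hypothetical data built to be heteroscedastic. The lmtest package's `bptest()` performs the same test. With about 209,000 rows in this model, a test like this will flag even small departures, so it complements the plots rather than replacing them.

```r
# Hypothetical data whose spread grows with x (deliberately heteroscedastic)
x <- 1:200
y <- x + x * rep(c(-1, 1), 100)
fit <- lm(y ~ x)

# Auxiliary regression of squared residuals on the predictor(s)
aux <- lm(residuals(fit)^2 ~ x)

# Studentized (Koenker) Breusch-Pagan statistic: n * R^2 of the auxiliary fit,
# compared with chi-squared on df = number of slope terms (1 here)
bp_stat <- length(x) * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
print(c(statistic = bp_stat, p.value = bp_p))

# For the project model, the auxiliary fit would regress residuals(model)^2 on
# Stratification1 + LocationAbbr + Topic, with df = 97 slope terms
```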
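In this project, I looked at how sex and other demographic groups, state, and disease topic relate to chronic disease rates. Four disease topics had significantly higher average values than the reference topic: Cancer, Cardiovascular Disease, Chronic Obstructive Pulmonary Disease, and Diabetes. By contrast, the Female and Male coefficients were not significant. Most state coefficients were not significant either. The exceptions were CA, FL, NY, OH, PA, PR, TX, and the national US rows. Several race/ethnicity groups and three age groups also differed significantly from the reference group.

The R-squared value was very low (0.018), so the model explains less than 2% of the variation in DataValue. This is not surprising. Chronic disease rates depend on many factors that were not included, and DataValue appears to combine indicators measured on very different scales (its median is 27.0, but its maximum is about 2.9 million). In the future, modeling one indicator type at a time, transforming the skewed outcome, or adding more variables could help improve the results.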
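Each dummy coefficient compares one category with its reference. Counting significant coefficients therefore does not directly show whether a variable such as state matters overall. A joint F-test that drops all of a variable's dummy columns at once answers that question, and in base R `drop1(fit, test = "F")` does this for each term. The sketch below uses hypothetical data with a large topic effect and a small sex effect. For this project, the call would be `drop1(model, test = "F")`.

```r
# Hypothetical toy data: three topics with a large effect, sex with a small one
toy <- data.frame(
  value = c(10, 12, 20, 22, 30, 31, 11, 12, 21, 20, 30, 32),
  topic = factor(rep(c("A", "B", "C"), each = 2, times = 2)),
  sex   = factor(rep(c("Female", "Male"), each = 6))
)
fit <- lm(value ~ topic + sex, data = toy)

# F test for removing each term (all of its dummy columns together)
# from the full model; `topic` is tested on 2 df at once
res <- drop1(fit, test = "F")
print(res)
```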