A. Introduction

Research Question: How do sex, state, and disease type predict chronic disease rates in the United States?

This project uses the U.S. Chronic Disease Indicators dataset to examine patterns in chronic disease rates across the country. The dataset contains 309,215 observations and 34 variables. Each row represents a specific health indicator for a given location, year, disease topic, and demographic group. Although the dataset has many variables, this analysis focuses on the ones most relevant to the research question: DataValue (the measured disease rate), Stratification1 (the demographic group, which includes the sex categories Male and Female alongside age, race/ethnicity, grade, and Overall groups), Topic (chronic disease category), and LocationAbbr (state abbreviation; the column also contains DC, U.S. territories, and a national US entry). These variables allow us to study how chronic disease levels differ by demographic group, state, and disease type.

The dataset is published on Data.gov and can be accessed directly at the following link: https://catalog.data.gov/dataset/u-s-chronic-disease-indicators/resource/011ec939-38cc-4d22-b2e9-fb81217225c9

B. Data Analysis

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("~/Downloads")
data <- read_csv("U.S._Chronic_Disease_Indicators (1).csv")
## Rows: 309215 Columns: 34
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): LocationAbbr, LocationDesc, DataSource, Topic, Question, DataValue...
## dbl  (6): YearStart, YearEnd, DataValue, DataValueAlt, LowConfidenceLimit, H...
## lgl (10): Response, StratificationCategory2, Stratification2, Stratification...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Load Dataset

data_small <- data %>%
  select(DataValue, Stratification1, LocationAbbr, Topic, YearStart)

Selecting the variables needed

cleaned <- data_small %>%
  filter(!is.na(DataValue), !is.na(Stratification1), !is.na(LocationAbbr), !is.na(Topic))

Cleaning the dataset and EDA
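
Because the research question concerns sex, one option is to keep only the rows whose Stratification1 value is Male or Female, and to drop the national aggregate rows (LocationAbbr equal to US) so that only individual locations are compared. The sketch below illustrates such a filter in base R on a small invented data frame; the `toy` values are hypothetical and are not taken from the dataset, while the column names match those selected above.

```r
# Hypothetical rows that mimic the structure of `cleaned`; all values are invented.
toy <- data.frame(
  DataValue       = c(10.2, 12.5, 11.3, 48.0, 9.7, 13.1),
  Stratification1 = c("Male", "Female", "Overall", "Age 18-44", "Male", "Female"),
  LocationAbbr    = c("CA", "CA", "CA", "TX", "US", "TX"),
  Topic           = c("Asthma", "Asthma", "Asthma", "Diabetes", "Asthma", "Diabetes"),
  stringsAsFactors = FALSE
)

# Keep only the sex strata and drop the national aggregate rows.
sex_only <- subset(
  toy,
  Stratification1 %in% c("Male", "Female") & LocationAbbr != "US"
)

# Count how many rows remain in each sex category.
table(sex_only$Stratification1)
```

With dplyr, the same logical condition could be placed inside filter().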

summary(cleaned$DataValue)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##       0.0      12.4      27.0     694.4      57.8 2925456.0
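
The mean (694.4) lies far above the third quartile (57.8), which suggests that a small number of very large values, possibly raw counts recorded alongside percentages or rates, dominate the average. One way to check this, assuming the dataset has a column describing the kind of value reported (called `DataValueType` here, which is an assumption because the column listing above truncates the names), is to summarise DataValue within each type. The data frame below is invented for illustration only.

```r
# Hypothetical data: four percentage-like values and two large counts.
toy <- data.frame(
  DataValue     = c(12.4, 27.0, 57.8, 33.1, 15000, 292000),
  DataValueType = c("Crude Prevalence", "Crude Prevalence", "Crude Prevalence",
                    "Crude Prevalence", "Number", "Number")
)

# Pooled over both scales, the large counts pull the mean far above the median.
overall_mean   <- mean(toy$DataValue)
overall_median <- median(toy$DataValue)

# Summarising within each value type separates the two scales.
by_type <- aggregate(DataValue ~ DataValueType, data = toy, FUN = median)
by_type
```

If the real data show the same pattern, fitting the model within a single value type would keep the outcome on one scale.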

C. Regression Analysis

For this analysis, I used a multiple linear regression model to examine how sex, state, and disease type predict chronic disease rates in the United States. The outcome variable, DataValue, is continuous, so an ordinary least squares model fit with lm() is a reasonable starting point. The model summary provides estimates, standard errors, t values, and p-values for each predictor, and confint() supplies the corresponding confidence intervals. These values show how the average disease rate changes depending on sex, state, or disease category. Because the predictors are categorical, each coefficient represents the difference in the expected DataValue between that level and the predictor's reference group, holding the other predictors fixed.
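
The reference-group interpretation can be illustrated with a small sketch. By default, lm() uses the first level of each factor as the reference, which is why one level of each predictor (for example AK among the locations) has no row in the coefficient table. The toy data and the names `toy`, `y`, and `group` below are hypothetical. With a single categorical predictor, the intercept equals the mean of the reference level and each coefficient equals the difference between a group mean and that reference mean; with several predictors, as in the model here, the coefficients are adjusted differences rather than raw differences in means.

```r
# Hypothetical data: three groups with means 11, 21, and 31.
toy <- data.frame(
  y     = c(10, 12, 20, 22, 30, 32),
  group = factor(c("A", "A", "B", "B", "C", "C"))
)

# Default treatment coding: "A" (the first level) is the reference.
fit <- lm(y ~ group, data = toy)
coef(fit)

# Group means, for comparison with the coefficients.
group_means <- tapply(toy$y, toy$group, mean)

# relevel() changes which level serves as the reference.
toy$group_b_ref <- relevel(toy$group, ref = "B")
fit_b <- lm(y ~ group_b_ref, data = toy)
coef(fit_b)
```

Choosing a meaningful reference level with relevel() before fitting can make the coefficient table easier to read, since every estimate is then a contrast against that chosen group.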

model <- lm(DataValue ~ Stratification1 + LocationAbbr + Topic, data = cleaned)
summary(model)
## 
## Call:
## lm(formula = DataValue ~ Stratification1 + LocationAbbr + Topic, 
##     data = cleaned)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
##  -15388    -972     -60     624 2910063 
## 
## Coefficients:
##                                                                Estimate
## (Intercept)                                                     -72.322
## Stratification1Age 0-44                                       -1559.477
## Stratification1Age 1-5                                         -453.044
## Stratification1Age 10-13                                       -524.653
## Stratification1Age 12-17                                       -428.589
## Stratification1Age 18-44                                       -738.314
## Stratification1Age 4 m - 5 y                                   -554.564
## Stratification1Age 45-64                                       -847.928
## Stratification1Age 6-11                                        -423.262
## Stratification1Age 6-14                                        -555.599
## Stratification1Age 6-9                                         -513.017
## Stratification1American Indian or Alaska Native, non-Hispanic -1428.397
## Stratification1Asian or Pacific Islander, non-Hispanic        -2561.558
## Stratification1Asian, non-Hispanic                            -1349.159
## Stratification1Black, non-Hispanic                            -1095.711
## Stratification1Female                                          -286.484
## Stratification1Grade 10                                        -587.918
## Stratification1Grade 11                                        -585.769
## Stratification1Grade 12                                        -583.582
## Stratification1Grade 9                                         -589.833
## Stratification1Hawaiian or Pacific Islander, non-Hispanic     -3089.947
## Stratification1Hispanic                                       -1160.133
## Stratification1Male                                            -244.324
## Stratification1Multiracial, non-Hispanic                       -985.752
## Stratification1Overall                                          258.191
## Stratification1White, non-Hispanic                                1.780
## LocationAbbrAL                                                  246.446
## LocationAbbrAR                                                  148.399
## LocationAbbrAZ                                                  299.231
## LocationAbbrCA                                                 1452.190
## LocationAbbrCO                                                  171.657
## LocationAbbrCT                                                  144.817
## LocationAbbrDC                                                  -68.466
## LocationAbbrDE                                                  -59.478
## LocationAbbrFL                                                 1500.001
## LocationAbbrGA                                                  464.100
## LocationAbbrGU                                                  619.396
## LocationAbbrHI                                                  264.867
## LocationAbbrIA                                                  107.020
## LocationAbbrID                                                  -74.300
## LocationAbbrIL                                                  644.363
## LocationAbbrIN                                                  365.572
## LocationAbbrKS                                                  143.526
## LocationAbbrKY                                                  246.335
## LocationAbbrLA                                                  190.116
## LocationAbbrMA                                                  319.379
## LocationAbbrMD                                                  320.677
## LocationAbbrME                                                  -19.880
## LocationAbbrMI                                                  611.206
## LocationAbbrMN                                                  253.995
## LocationAbbrMO                                                  358.145
## LocationAbbrMS                                                   52.850
## LocationAbbrMT                                                  -11.431
## LocationAbbrNC                                                  528.131
## LocationAbbrND                                                  -63.696
## LocationAbbrNE                                                   69.524
## LocationAbbrNH                                                 -112.675
## LocationAbbrNJ                                                  449.994
## LocationAbbrNM                                                   40.701
## LocationAbbrNV                                                   41.953
## LocationAbbrNY                                                  985.457
## LocationAbbrOH                                                  761.144
## LocationAbbrOK                                                  246.466
## LocationAbbrOR                                                  143.809
## LocationAbbrPA                                                  774.404
## LocationAbbrPR                                                 2322.837
## LocationAbbrRI                                                  -10.806
## LocationAbbrSC                                                  261.141
## LocationAbbrSD                                                  -48.521
## LocationAbbrTN                                                  398.372
## LocationAbbrTX                                                 1138.902
## LocationAbbrUS                                                12193.639
## LocationAbbrUT                                                   57.514
## LocationAbbrVA                                                  385.973
## LocationAbbrVI                                                  224.571
## LocationAbbrVT                                                 -114.358
## LocationAbbrWA                                                  375.266
## LocationAbbrWI                                                  243.799
## LocationAbbrWV                                                    5.273
## LocationAbbrWY                                                 -201.407
## TopicArthritis                                                   57.878
## TopicAsthma                                                    -192.354
## TopicCancer                                                    2002.165
## TopicCardiovascular Disease                                    1958.491
## TopicChronic Kidney Disease                                    2004.379
## TopicChronic Obstructive Pulmonary Disease                     3013.638
## TopicCognitive Health and Caregiving                           -372.399
## TopicDiabetes                                                   798.566
## TopicDisability                                                 151.484
## TopicHealth Status                                               64.822
## TopicImmunization                                                38.976
## TopicMaternal Health                                             90.922
## TopicMental Health                                               63.532
## TopicNutrition, Physical Activity, and Weight Status            105.806
## TopicOral Health                                                 59.369
## TopicSleep                                                      149.909
## TopicSocial Determinants of Health                               65.596
## TopicTobacco                                                     39.406
##                                                               Std. Error
## (Intercept)                                                      332.625
## Stratification1Age 0-44                                          430.592
## Stratification1Age 1-5                                          1140.271
## Stratification1Age 10-13                                        1587.970
## Stratification1Age 12-17                                        1140.271
## Stratification1Age 18-44                                         251.080
## Stratification1Age 4 m - 5 y                                    1604.456
## Stratification1Age 45-64                                         231.551
## Stratification1Age 6-11                                         1140.271
## Stratification1Age 6-14                                         1604.456
## Stratification1Age 6-9                                          1587.970
## Stratification1American Indian or Alaska Native, non-Hispanic    235.867
## Stratification1Asian or Pacific Islander, non-Hispanic           375.541
## Stratification1Asian, non-Hispanic                               249.421
## Stratification1Black, non-Hispanic                               207.065
## Stratification1Female                                            198.276
## Stratification1Grade 10                                          541.259
## Stratification1Grade 11                                          541.259
## Stratification1Grade 12                                          541.259
## Stratification1Grade 9                                           541.259
## Stratification1Hawaiian or Pacific Islander, non-Hispanic        469.441
## Stratification1Hispanic                                          205.931
## Stratification1Male                                              198.528
## Stratification1Multiracial, non-Hispanic                         226.483
## Stratification1Overall                                           196.190
## Stratification1White, non-Hispanic                               197.602
## LocationAbbrAL                                                   371.977
## LocationAbbrAR                                                   370.241
## LocationAbbrAZ                                                   357.107
## LocationAbbrCA                                                   358.524
## LocationAbbrCO                                                   362.320
## LocationAbbrCT                                                   365.477
## LocationAbbrDC                                                   380.273
## LocationAbbrDE                                                   379.651
## LocationAbbrFL                                                   373.374
## LocationAbbrGA                                                   362.494
## LocationAbbrGU                                                   434.812
## LocationAbbrHI                                                   361.361
## LocationAbbrIA                                                   370.794
## LocationAbbrID                                                   384.690
## LocationAbbrIL                                                   367.226
## LocationAbbrIN                                                   367.325
## LocationAbbrKS                                                   359.650
## LocationAbbrKY                                                   375.705
## LocationAbbrLA                                                   370.303
## LocationAbbrMA                                                   363.185
## LocationAbbrMD                                                   358.978
## LocationAbbrME                                                   383.486
## LocationAbbrMI                                                   360.178
## LocationAbbrMN                                                   359.747
## LocationAbbrMO                                                   363.075
## LocationAbbrMS                                                   380.347
## LocationAbbrMT                                                   376.498
## LocationAbbrNC                                                   362.480
## LocationAbbrND                                                   385.303
## LocationAbbrNE                                                   366.839
## LocationAbbrNH                                                   395.823
## LocationAbbrNJ                                                   378.220
## LocationAbbrNM                                                   366.586
## LocationAbbrNV                                                   370.136
## LocationAbbrNY                                                   353.950
## LocationAbbrOH                                                   363.382
## LocationAbbrOK                                                   359.172
## LocationAbbrOR                                                   368.429
## LocationAbbrPA                                                   365.590
## LocationAbbrPR                                                   460.587
## LocationAbbrRI                                                   374.802
## LocationAbbrSC                                                   367.541
## LocationAbbrSD                                                   383.130
## LocationAbbrTN                                                   367.429
## LocationAbbrTX                                                   357.847
## LocationAbbrUS                                                   348.386
## LocationAbbrUT                                                   367.723
## LocationAbbrVA                                                   360.852
## LocationAbbrVI                                                   668.248
## LocationAbbrVT                                                   391.073
## LocationAbbrWA                                                   354.096
## LocationAbbrWI                                                   366.478
## LocationAbbrWV                                                   385.187
## LocationAbbrWY                                                   397.149
## TopicArthritis                                                   193.849
## TopicAsthma                                                      255.635
## TopicCancer                                                      182.450
## TopicCardiovascular Disease                                      164.765
## TopicChronic Kidney Disease                                     1582.402
## TopicChronic Obstructive Pulmonary Disease                       174.077
## TopicCognitive Health and Caregiving                             290.450
## TopicDiabetes                                                    198.709
## TopicDisability                                                  283.127
## TopicHealth Status                                               171.834
## TopicImmunization                                                192.765
## TopicMaternal Health                                             667.952
## TopicMental Health                                               194.826
## TopicNutrition, Physical Activity, and Weight Status             171.661
## TopicOral Health                                                 216.762
## TopicSleep                                                       285.832
## TopicSocial Determinants of Health                               192.839
## TopicTobacco                                                     207.722
##                                                               t value Pr(>|t|)
## (Intercept)                                                    -0.217 0.827876
## Stratification1Age 0-44                                        -3.622 0.000293
## Stratification1Age 1-5                                         -0.397 0.691137
## Stratification1Age 10-13                                       -0.330 0.741104
## Stratification1Age 12-17                                       -0.376 0.707017
## Stratification1Age 18-44                                       -2.941 0.003277
## Stratification1Age 4 m - 5 y                                   -0.346 0.729614
## Stratification1Age 45-64                                       -3.662 0.000250
## Stratification1Age 6-11                                        -0.371 0.710493
## Stratification1Age 6-14                                        -0.346 0.729129
## Stratification1Age 6-9                                         -0.323 0.746646
## Stratification1American Indian or Alaska Native, non-Hispanic  -6.056 1.40e-09
## Stratification1Asian or Pacific Islander, non-Hispanic         -6.821 9.07e-12
## Stratification1Asian, non-Hispanic                             -5.409 6.34e-08
## Stratification1Black, non-Hispanic                             -5.292 1.21e-07
## Stratification1Female                                          -1.445 0.148494
## Stratification1Grade 10                                        -1.086 0.277389
## Stratification1Grade 11                                        -1.082 0.279149
## Stratification1Grade 12                                        -1.078 0.280948
## Stratification1Grade 9                                         -1.090 0.275828
## Stratification1Hawaiian or Pacific Islander, non-Hispanic      -6.582 4.65e-11
## Stratification1Hispanic                                        -5.634 1.77e-08
## Stratification1Male                                            -1.231 0.218445
## Stratification1Multiracial, non-Hispanic                       -4.352 1.35e-05
## Stratification1Overall                                          1.316 0.188167
## Stratification1White, non-Hispanic                              0.009 0.992813
## LocationAbbrAL                                                  0.663 0.507633
## LocationAbbrAR                                                  0.401 0.688555
## LocationAbbrAZ                                                  0.838 0.402070
## LocationAbbrCA                                                  4.050 5.11e-05
## LocationAbbrCO                                                  0.474 0.635663
## LocationAbbrCT                                                  0.396 0.691929
## LocationAbbrDC                                                 -0.180 0.857117
## LocationAbbrDE                                                 -0.157 0.875509
## LocationAbbrFL                                                  4.017 5.89e-05
## LocationAbbrGA                                                  1.280 0.200442
## LocationAbbrGU                                                  1.425 0.154298
## LocationAbbrHI                                                  0.733 0.463576
## LocationAbbrIA                                                  0.289 0.772869
## LocationAbbrID                                                 -0.193 0.846847
## LocationAbbrIL                                                  1.755 0.079316
## LocationAbbrIN                                                  0.995 0.319627
## LocationAbbrKS                                                  0.399 0.689840
## LocationAbbrKY                                                  0.656 0.512044
## LocationAbbrLA                                                  0.513 0.607667
## LocationAbbrMA                                                  0.879 0.379193
## LocationAbbrMD                                                  0.893 0.371696
## LocationAbbrME                                                 -0.052 0.958656
## LocationAbbrMI                                                  1.697 0.089707
## LocationAbbrMN                                                  0.706 0.480164
## LocationAbbrMO                                                  0.986 0.323927
## LocationAbbrMS                                                  0.139 0.889489
## LocationAbbrMT                                                 -0.030 0.975779
## LocationAbbrNC                                                  1.457 0.145119
## LocationAbbrND                                                 -0.165 0.868698
## LocationAbbrNE                                                  0.190 0.849685
## LocationAbbrNH                                                 -0.285 0.775905
## LocationAbbrNJ                                                  1.190 0.234140
## LocationAbbrNM                                                  0.111 0.911596
## LocationAbbrNV                                                  0.113 0.909758
## LocationAbbrNY                                                  2.784 0.005367
## LocationAbbrOH                                                  2.095 0.036207
## LocationAbbrOK                                                  0.686 0.492584
## LocationAbbrOR                                                  0.390 0.696292
## LocationAbbrPA                                                  2.118 0.034156
## LocationAbbrPR                                                  5.043 4.58e-07
## LocationAbbrRI                                                 -0.029 0.976998
## LocationAbbrSC                                                  0.711 0.477389
## LocationAbbrSD                                                 -0.127 0.899222
## LocationAbbrTN                                                  1.084 0.278272
## LocationAbbrTX                                                  3.183 0.001460
## LocationAbbrUS                                                 35.000  < 2e-16
## LocationAbbrUT                                                  0.156 0.875714
## LocationAbbrVA                                                  1.070 0.284793
## LocationAbbrVI                                                  0.336 0.736826
## LocationAbbrVT                                                 -0.292 0.769966
## LocationAbbrWA                                                  1.060 0.289244
## LocationAbbrWI                                                  0.665 0.505893
## LocationAbbrWV                                                  0.014 0.989078
## LocationAbbrWY                                                 -0.507 0.612062
## TopicArthritis                                                  0.299 0.765265
## TopicAsthma                                                    -0.752 0.451776
## TopicCancer                                                    10.974  < 2e-16
## TopicCardiovascular Disease                                    11.887  < 2e-16
## TopicChronic Kidney Disease                                     1.267 0.205275
## TopicChronic Obstructive Pulmonary Disease                     17.312  < 2e-16
## TopicCognitive Health and Caregiving                           -1.282 0.199792
## TopicDiabetes                                                   4.019 5.85e-05
## TopicDisability                                                 0.535 0.592623
## TopicHealth Status                                              0.377 0.705996
## TopicImmunization                                               0.202 0.839763
## TopicMaternal Health                                            0.136 0.891726
## TopicMental Health                                              0.326 0.744350
## TopicNutrition, Physical Activity, and Weight Status            0.616 0.537655
## TopicOral Health                                                0.274 0.784170
## TopicSleep                                                      0.524 0.599956
## TopicSocial Determinants of Health                              0.340 0.733735
## TopicTobacco                                                    0.190 0.849539
##                                                                  
## (Intercept)                                                      
## Stratification1Age 0-44                                       ***
## Stratification1Age 1-5                                           
## Stratification1Age 10-13                                         
## Stratification1Age 12-17                                         
## Stratification1Age 18-44                                      ** 
## Stratification1Age 4 m - 5 y                                     
## Stratification1Age 45-64                                      ***
## Stratification1Age 6-11                                          
## Stratification1Age 6-14                                          
## Stratification1Age 6-9                                           
## Stratification1American Indian or Alaska Native, non-Hispanic ***
## Stratification1Asian or Pacific Islander, non-Hispanic        ***
## Stratification1Asian, non-Hispanic                            ***
## Stratification1Black, non-Hispanic                            ***
## Stratification1Female                                            
## Stratification1Grade 10                                          
## Stratification1Grade 11                                          
## Stratification1Grade 12                                          
## Stratification1Grade 9                                           
## Stratification1Hawaiian or Pacific Islander, non-Hispanic     ***
## Stratification1Hispanic                                       ***
## Stratification1Male                                              
## Stratification1Multiracial, non-Hispanic                      ***
## Stratification1Overall                                           
## Stratification1White, non-Hispanic                               
## LocationAbbrAL                                                   
## LocationAbbrAR                                                   
## LocationAbbrAZ                                                   
## LocationAbbrCA                                                ***
## LocationAbbrCO                                                   
## LocationAbbrCT                                                   
## LocationAbbrDC                                                   
## LocationAbbrDE                                                   
## LocationAbbrFL                                                ***
## LocationAbbrGA                                                   
## LocationAbbrGU                                                   
## LocationAbbrHI                                                   
## LocationAbbrIA                                                   
## LocationAbbrID                                                   
## LocationAbbrIL                                                .  
## LocationAbbrIN                                                   
## LocationAbbrKS                                                   
## LocationAbbrKY                                                   
## LocationAbbrLA                                                   
## LocationAbbrMA                                                   
## LocationAbbrMD                                                   
## LocationAbbrME                                                   
## LocationAbbrMI                                                .  
## LocationAbbrMN                                                   
## LocationAbbrMO                                                   
## LocationAbbrMS                                                   
## LocationAbbrMT                                                   
## LocationAbbrNC                                                   
## LocationAbbrND                                                   
## LocationAbbrNE                                                   
## LocationAbbrNH                                                   
## LocationAbbrNJ                                                   
## LocationAbbrNM                                                   
## LocationAbbrNV                                                   
## LocationAbbrNY                                                ** 
## LocationAbbrOH                                                *  
## LocationAbbrOK                                                   
## LocationAbbrOR                                                   
## LocationAbbrPA                                                *  
## LocationAbbrPR                                                ***
## LocationAbbrRI                                                   
## LocationAbbrSC                                                   
## LocationAbbrSD                                                   
## LocationAbbrTN                                                   
## LocationAbbrTX                                                ** 
## LocationAbbrUS                                                ***
## LocationAbbrUT                                                   
## LocationAbbrVA                                                   
## LocationAbbrVI                                                   
## LocationAbbrVT                                                   
## LocationAbbrWA                                                   
## LocationAbbrWI                                                   
## LocationAbbrWV                                                   
## LocationAbbrWY                                                   
## TopicArthritis                                                   
## TopicAsthma                                                      
## TopicCancer                                                   ***
## TopicCardiovascular Disease                                   ***
## TopicChronic Kidney Disease                                      
## TopicChronic Obstructive Pulmonary Disease                    ***
## TopicCognitive Health and Caregiving                             
## TopicDiabetes                                                 ***
## TopicDisability                                                  
## TopicHealth Status                                               
## TopicImmunization                                                
## TopicMaternal Health                                             
## TopicMental Health                                               
## TopicNutrition, Physical Activity, and Weight Status             
## TopicOral Health                                                 
## TopicSleep                                                       
## TopicSocial Determinants of Health                               
## TopicTobacco                                                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16060 on 209098 degrees of freedom
## Multiple R-squared:  0.0182, Adjusted R-squared:  0.01775 
## F-statistic: 39.96 on 97 and 209098 DF,  p-value: < 2.2e-16
confint(model)
##                                                                     2.5 %
## (Intercept)                                                    -724.25846
## Stratification1Age 0-44                                       -2403.42755
## Stratification1Age 1-5                                        -2687.94647
## Stratification1Age 10-13                                      -3637.03462
## Stratification1Age 12-17                                      -2663.49118
## Stratification1Age 18-44                                      -1230.42446
## Stratification1Age 4 m - 5 y                                  -3699.25733
## Stratification1Age 45-64                                      -1301.76246
## Stratification1Age 6-11                                       -2658.16425
## Stratification1Age 6-14                                       -3700.29291
## Stratification1Age 6-9                                        -3625.39905
## Stratification1American Indian or Alaska Native, non-Hispanic -1890.68982
## Stratification1Asian or Pacific Islander, non-Hispanic        -3297.60963
## Stratification1Asian, non-Hispanic                            -1838.01819
## Stratification1Black, non-Hispanic                            -1501.55223
## Stratification1Female                                          -675.09965
## Stratification1Grade 10                                       -1648.77148
## Stratification1Grade 11                                       -1646.62218
## Stratification1Grade 12                                       -1644.43536
## Stratification1Grade 9                                        -1650.68596
## Stratification1Hawaiian or Pacific Islander, non-Hispanic     -4010.03949
## Stratification1Hispanic                                       -1563.75315
## Stratification1Male                                            -633.43378
## Stratification1Multiracial, non-Hispanic                      -1429.65291
## Stratification1Overall                                         -126.33668
## Stratification1White, non-Hispanic                             -385.51550
## LocationAbbrAL                                                 -482.62079
## LocationAbbrAR                                                 -577.26478
## LocationAbbrAZ                                                 -400.68930
## LocationAbbrCA                                                  749.49140
## LocationAbbrCO                                                 -538.48100
## LocationAbbrCT                                                 -571.50949
## LocationAbbrDC                                                 -813.79182
## LocationAbbrDE                                                 -803.58573
## LocationAbbrFL                                                  768.19697
## LocationAbbrGA                                                 -246.37873
## LocationAbbrGU                                                 -232.82359
## LocationAbbrHI                                                 -443.39097
## LocationAbbrIA                                                 -619.72587
## LocationAbbrID                                                 -828.28353
## LocationAbbrIL                                                  -75.39090
## LocationAbbrIN                                                 -354.37684
## LocationAbbrKS                                                 -561.37901
## LocationAbbrKY                                                 -490.03788
## LocationAbbrLA                                                 -535.66857
## LocationAbbrMA                                                 -392.45358
## LocationAbbrMD                                                 -382.91205
## LocationAbbrME                                                 -771.50302
## LocationAbbrMI                                                  -94.73445
## LocationAbbrMN                                                 -451.09896
## LocationAbbrMO                                                 -353.47294
## LocationAbbrMS                                                 -692.62121
## LocationAbbrMT                                                 -749.35863
## LocationAbbrNC                                                 -182.31991
## LocationAbbrND                                                 -818.87963
## LocationAbbrNE                                                 -649.47229
## LocationAbbrNH                                                 -888.47919
## LocationAbbrNJ                                                 -291.30896
## LocationAbbrNM                                                 -677.79913
## LocationAbbrNV                                                 -683.50486
## LocationAbbrNY                                                  291.72459
## LocationAbbrOH                                                   48.92375
## LocationAbbrOK                                                 -457.50213
## LocationAbbrOR                                                 -578.30157
## LocationAbbrPA                                                   57.85752
## LocationAbbrPR                                                 1420.09893
## LocationAbbrRI                                                 -745.40822
## LocationAbbrSC                                                 -459.22988
## LocationAbbrSD                                                 -799.44703
## LocationAbbrTN                                                 -321.78078
## LocationAbbrTX                                                  437.52939
## LocationAbbrUS                                                11510.81088
## LocationAbbrUT                                                 -663.21409
## LocationAbbrVA                                                 -321.28817
## LocationAbbrVI                                                -1085.17857
## LocationAbbrVT                                                 -880.85117
## LocationAbbrWA                                                 -318.75408
## LocationAbbrWI                                                 -474.48994
## LocationAbbrWV                                                 -749.68331
## LocationAbbrWY                                                 -979.80938
## TopicArthritis                                                 -322.06031
## TopicAsthma                                                    -693.39184
## TopicCancer                                                    1644.56865
## TopicCardiovascular Disease                                    1635.55523
## TopicChronic Kidney Disease                                   -1097.09037
## TopicChronic Obstructive Pulmonary Disease                     2672.45119
## TopicCognitive Health and Caregiving                           -941.67367
## TopicDiabetes                                                   409.10158
## TopicDisability                                                -403.43831
## TopicHealth Status                                             -271.96751
## TopicImmunization                                              -338.83744
## TopicMaternal Health                                          -1218.24708
## TopicMental Health                                             -318.32166
## TopicNutrition, Physical Activity, and Weight Status           -230.64569
## TopicOral Health                                               -365.47873
## TopicSleep                                                     -410.31564
## TopicSocial Determinants of Health                             -312.36307
## TopicTobacco                                                   -367.72334
##                                                                   97.5 %
## (Intercept)                                                     579.6152
## Stratification1Age 0-44                                        -715.5267
## Stratification1Age 1-5                                         1781.8578
## Stratification1Age 10-13                                       2587.7285
## Stratification1Age 12-17                                       1806.3131
## Stratification1Age 18-44                                       -246.2044
## Stratification1Age 4 m - 5 y                                   2590.1297
## Stratification1Age 45-64                                       -394.0931
## Stratification1Age 6-11                                        1811.6400
## Stratification1Age 6-14                                        2589.0941
## Stratification1Age 6-9                                         2599.3641
## Stratification1American Indian or Alaska Native, non-Hispanic  -966.1042
## Stratification1Asian or Pacific Islander, non-Hispanic        -1825.5069
## Stratification1Asian, non-Hispanic                             -860.2991
## Stratification1Black, non-Hispanic                             -689.8689
## Stratification1Female                                           102.1319
## Stratification1Grade 10                                         472.9353
## Stratification1Grade 11                                         475.0846
## Stratification1Grade 12                                         477.2714
## Stratification1Grade 9                                          471.0208
## Stratification1Hawaiian or Pacific Islander, non-Hispanic     -2169.8547
## Stratification1Hispanic                                        -756.5138
## Stratification1Male                                             144.7858
## Stratification1Multiracial, non-Hispanic                       -541.8509
## Stratification1Overall                                          642.7186
## Stratification1White, non-Hispanic                              389.0753
## LocationAbbrAL                                                  975.5126
## LocationAbbrAR                                                  874.0635
## LocationAbbrAZ                                                  999.1519
## LocationAbbrCA                                                 2154.8886
## LocationAbbrCO                                                  881.7954
## LocationAbbrCT                                                  861.1427
## LocationAbbrDC                                                  676.8593
## LocationAbbrDE                                                  684.6291
## LocationAbbrFL                                                 2231.8057
## LocationAbbrGA                                                 1174.5794
## LocationAbbrGU                                                 1471.6164
## LocationAbbrHI                                                  973.1258
## LocationAbbrIA                                                  833.7665
## LocationAbbrID                                                  679.6827
## LocationAbbrIL                                                 1364.1159
## LocationAbbrIN                                                 1085.5206
## LocationAbbrKS                                                  848.4320
## LocationAbbrKY                                                  982.7071
## LocationAbbrLA                                                  915.9013
## LocationAbbrMA                                                 1031.2121
## LocationAbbrMD                                                 1024.2651
## LocationAbbrME                                                  731.7432
## LocationAbbrMI                                                 1317.1473
## LocationAbbrMN                                                  959.0897
## LocationAbbrMO                                                 1069.7631
## LocationAbbrMS                                                  798.3204
## LocationAbbrMT                                                  726.4968
## LocationAbbrNC                                                 1238.5819
## LocationAbbrND                                                  691.4885
## LocationAbbrNE                                                  788.5200
## LocationAbbrNH                                                  663.1293
## LocationAbbrNJ                                                 1191.2961
## LocationAbbrNM                                                  759.2004
## LocationAbbrNV                                                  767.4101
## LocationAbbrNY                                                 1679.1901
## LocationAbbrOH                                                 1473.3642
## LocationAbbrOK                                                  950.4346
## LocationAbbrOR                                                  865.9202
## LocationAbbrPA                                                 1490.9505
## LocationAbbrPR                                                 3225.5759
## LocationAbbrRI                                                  723.7955
## LocationAbbrSC                                                  981.5126
## LocationAbbrSD                                                  702.4048
## LocationAbbrTN                                                 1118.5241
## LocationAbbrTX                                                 1840.2739
## LocationAbbrUS                                                12876.4679
## LocationAbbrUT                                                  778.2418
## LocationAbbrVA                                                 1093.2351
## LocationAbbrVI                                                 1534.3208
## LocationAbbrVT                                                  652.1361
## LocationAbbrWA                                                 1069.2862
## LocationAbbrWI                                                  962.0872
## LocationAbbrWV                                                  760.2294
## LocationAbbrWY                                                  576.9944
## TopicArthritis                                                  437.8168
## TopicAsthma                                                     308.6832
## TopicCancer                                                    2359.7623
## TopicCardiovascular Disease                                    2281.4275
## TopicChronic Kidney Disease                                    5105.8484
## TopicChronic Obstructive Pulmonary Disease                     3354.8240
## TopicCognitive Health and Caregiving                            196.8748
## TopicDiabetes                                                  1188.0310
## TopicDisability                                                 706.4068
## TopicHealth Status                                              401.6124
## TopicImmunization                                               416.7902
## TopicMaternal Health                                           1400.0917
## TopicMental Health                                              445.3866
## TopicNutrition, Physical Activity, and Weight Status            442.2572
## TopicOral Health                                                484.2160
## TopicSleep                                                      710.1333
## TopicSocial Determinants of Health                              443.5558
## TopicTobacco                                                    446.5360

Interpretation

In this model, each coefficient estimates how much the average DataValue differs from the reference group for that variable, holding the other predictors fixed. By default in R, the reference group is the first level of each factor, and it has no row of its own in the output. A positive coefficient means that category has a higher average DataValue than the reference group, and a negative coefficient means it has a lower one. A category is statistically significant at the 5% level when its p-value is below 0.05, or equivalently when its 95% confidence interval from confint(model) excludes zero. Note that Stratification1 contains age, race/ethnicity, and school grade categories in addition to sex. Several age and race/ethnicity categories were significant, but the intervals for Female (-675.1 to 102.1) and Male (-633.4 to 144.8) both include zero, so neither sex category was. Among locations, only a handful were significant: CA, FL, NY, OH, PA, PR, TX, and the national US row. The clearest disease topic effects were Cancer, Cardiovascular Disease, Chronic Obstructive Pulmonary Disease, and Diabetes, whose intervals are all entirely positive.
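The reading of coefficients, p-values, and confidence intervals described above can be sketched on a small example. This is a minimal illustration on simulated data, not the project's model: the data frame `toy`, its values, and the object names `toy_model` and `results` are assumptions made for the example.

```r
# Simulated stand-in data; the values are made up for illustration only.
set.seed(1)
toy <- data.frame(
  Topic = factor(rep(c("Asthma", "Cancer", "Sleep"), each = 40)),
  Stratification1 = factor(rep(c("Female", "Male"), times = 60))
)
# Give "Cancer" a higher mean so that one topic effect is clearly non-zero.
toy$DataValue <- 10 + 5 * (toy$Topic == "Cancer") + rnorm(nrow(toy))

toy_model <- lm(DataValue ~ Stratification1 + Topic, data = toy)

# The reference group is the first factor level; it gets no coefficient row.
levels(toy$Topic)[1]

# Combine estimates, 95% confidence intervals, and p-values in one table.
coefs <- summary(toy_model)$coefficients
ci <- confint(toy_model)
results <- data.frame(
  term = rownames(coefs),
  estimate = coefs[, "Estimate"],
  lower = ci[, 1],
  upper = ci[, 2],
  p_value = coefs[, "Pr(>|t|)"],
  row.names = NULL
)

# Flag terms with p < 0.05; these are the terms whose 95% CI excludes zero.
results$significant <- results$p_value < 0.05
results[results$significant, ]
```

The same table could in principle be built from the fitted model in this report by swapping `toy_model` for `model`, which would make it easier to scan the 98 coefficients than reading the stars column by eye.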

D. Model Assumptions and Diagnostics

For this multiple linear regression model, I checked the standard assumptions: linearity, independence, homoscedasticity, normality of residuals, and multicollinearity. These checks indicate whether the model suits the data and whether the results can be interpreted reliably. I used the four default diagnostic plots to assess the assumptions visually. The Residuals vs. Fitted plot addresses linearity and constant variance. The Q-Q plot shows whether the residuals follow a normal distribution. The Scale-Location plot checks for homoscedasticity, and the Residuals vs. Leverage plot flags influential points. Overall, the plots show some variability and deviations from these assumptions. The residual summary below points the same way: the median residual is about -60 while the maximum is about 2.9 million, so the residuals are strongly right-skewed rather than normal, and a few extreme observations may have a large influence on the fit.

par(mfrow = c(2,2))
plot(model)

summary(residuals(model))
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
##  -15388.20    -972.09     -59.98       0.00     624.13 2910062.85
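
A maximum residual this far from the rest suggests looking at the individual rows behind it. The following is a minimal sketch on simulated data with one planted extreme value; `toy`, `toy_model`, and the value 50000 are assumptions for illustration, not results from this dataset.

```r
# Simulated data with one extreme value, mimicking a heavy right tail.
set.seed(2)
toy <- data.frame(group = factor(rep(c("a", "b", "c"), each = 50)))
toy$y <- 100 + rnorm(nrow(toy), sd = 10)
toy$y[7] <- 50000  # hypothetical extreme observation

toy_model <- lm(y ~ group, data = toy)
res <- residuals(toy_model)

# Compare the centre of the residual distribution with its tails.
quantile(res, probs = c(0.01, 0.5, 0.99))

# Position of the largest residual and the Cook's distance of every row.
worst <- unname(which.max(res))
cooks <- cooks.distance(toy_model)

# Inspect the offending row and how influential it is.
toy[worst, ]
cooks[worst]
```

When applying this idea to the real model, keep in mind that lm() drops rows with missing values, so the position returned by which.max() need not equal the row number in the original data; names(res) keeps the original row labels.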

E. Conclusion and Future Directions

In this project, I looked at how sex, state, and disease topic relate to chronic disease rates. The results showed that a few disease topics (Cancer, Cardiovascular Disease, Chronic Obstructive Pulmonary Disease, and Diabetes) were significant, with higher average values than the baseline topic. Neither sex category was significant, and only a few locations were, so disease type appeared to account for more of the differences in DataValue than sex or state did. The R-squared value was low (0.0182), which means the model explained less than 2% of the variation in the data. This is partly expected because chronic disease rates depend on many other factors that were not included. In the future, adding more variables or testing different models could help improve the results.
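
One possible way to test a different model, given that Stratification1 also holds age, race/ethnicity, and grade categories, would be to keep only the Male and Female rows before fitting so that the sex coefficient is a direct contrast. The sketch below uses simulated data; `toy`, `sex_only`, and `sex_model` are hypothetical names and the values are made up.

```r
# Simulated stand-in for the real data; all values are made up.
set.seed(3)
toy <- data.frame(
  Stratification1 = rep(c("Female", "Male", "Overall", "Hispanic"), each = 30),
  Topic = rep(c("Asthma", "Cancer", "Diabetes"), times = 40),
  stringsAsFactors = FALSE
)
toy$DataValue <- 20 + 3 * (toy$Stratification1 == "Male") + rnorm(nrow(toy))

# Keep only the two sex strata and make Female the explicit reference level.
sex_only <- subset(toy, Stratification1 %in% c("Female", "Male"))
sex_only$Stratification1 <- factor(sex_only$Stratification1,
                                   levels = c("Female", "Male"))

# The Male coefficient now estimates the Male minus Female difference.
sex_model <- lm(DataValue ~ Stratification1 + Topic, data = sex_only)
coef(summary(sex_model))["Stratification1Male", ]
```

Whether this changes the conclusions for the real data is an open question; the sketch only shows the mechanics of the filtering step.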