Homework 6

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

suicide_data <- read_csv("C:/Users/Campo/Downloads/Death_rates_for_suicide__by_sex__race__Hispanic_origin__and_age__United_States.csv")

## Rows: 6390 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): INDICATOR, UNIT, STUB_NAME, STUB_LABEL, AGE, FLAG
## dbl (7): UNIT_NUM, STUB_NAME_NUM, STUB_LABEL_NUM, YEAR, YEAR_NUM, AGE_NUM, E...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(suicide_data)

## # A tibble: 6 × 13
##   INDICATOR     UNIT  UNIT_NUM STUB_NAME STUB_NAME_NUM STUB_LABEL STUB_LABEL_NUM
##   <chr>         <chr>    <dbl> <chr>             <dbl> <chr>               <dbl>
## 1 Death rates … Deat…        1 Total                 0 All perso…              0
## 2 Death rates … Deat…        1 Total                 0 All perso…              0
## 3 Death rates … Deat…        1 Total                 0 All perso…              0
## 4 Death rates … Deat…        1 Total                 0 All perso…              0
## 5 Death rates … Deat…        1 Total                 0 All perso…              0
## 6 Death rates … Deat…        1 Total                 0 All perso…              0
## # ℹ 6 more variables: YEAR <dbl>, YEAR_NUM <dbl>, AGE <chr>, AGE_NUM <dbl>,
## #   ESTIMATE <dbl>, FLAG <chr>

colnames(suicide_data)

##  [1] "INDICATOR"      "UNIT"           "UNIT_NUM"       "STUB_NAME"     
##  [5] "STUB_NAME_NUM"  "STUB_LABEL"     "STUB_LABEL_NUM" "YEAR"          
##  [9] "YEAR_NUM"       "AGE"            "AGE_NUM"        "ESTIMATE"      
## [13] "FLAG"

suicide_model <- lm(ESTIMATE ~ AGE_NUM + YEAR_NUM, data = suicide_data)

summary(suicide_model)

## 
## Call:
## lm(formula = ESTIMATE ~ AGE_NUM + YEAR_NUM, data = suicide_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.327  -7.956  -2.431   5.710  54.158 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.70495    0.40441   24.00  < 2e-16 ***
## AGE_NUM      1.94131    0.07664   25.33  < 2e-16 ***
## YEAR_NUM    -0.04736    0.01260   -3.76 0.000172 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.89 on 5481 degrees of freedom
##   (906 observations deleted due to missingness)
## Multiple R-squared:  0.1084, Adjusted R-squared:  0.1081 
## F-statistic: 333.1 on 2 and 5481 DF,  p-value: < 2.2e-16

#5, R-squared is 0.1084, so the model explains 10.84% of the variance in the dependent variable, ESTIMATE (the suicide death rate). The model has some predictive power, but most of the variance is left unexplained, which suggests that factors not included in the model also contribute to ESTIMATE.
#6, Both AGE_NUM and YEAR_NUM are statistically significant because their p-values (< 2e-16 and 0.000172) are below the .05 threshold, so both are significant predictors of ESTIMATE, although the low R-squared means they explain only a small share of it. AGE_NUM (Coefficient = 1.94131): for each one-unit increase in AGE_NUM, holding YEAR_NUM constant, the estimated suicide death rate (ESTIMATE) increases by approximately 1.94 units. AGE_NUM looks like a numeric code for the AGE groups rather than age in years (the other _NUM columns pair with label columns in the same way), so one unit is a step in that code, not necessarily one year of age. This suggests that the suicide death rate tends to be higher in older age groups. YEAR_NUM (Coefficient = -0.04736): for each one-unit increase in YEAR_NUM (moving forward in time), holding AGE_NUM constant, ESTIMATE decreases by about 0.047 units. This suggests a slight decline in suicide death rates over time.
#7, No, the model appears to violate the assumption of linearity. The residuals vs. fitted plot shows that the residuals are not randomly scattered around zero, which implies that a linear model with these two variables may not be the best fit for the data set. Unless I did something wrong here.
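
The R-squared and coefficient readings in #5 and #6 can be checked by hand. The sketch below is only an illustration: it uses the built-in mtcars data as a stand-in (not the suicide data set), and the names fit, r2_manual, base, bumped, and diff_pred are made up for this example. Both comparisons should return TRUE.

```r
# Stand-in data: mtcars ships with R; it is NOT the suicide data set.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# R-squared by hand: 1 - (residual sum of squares / total sum of squares)
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r2_manual <- 1 - ss_res / ss_tot

# Compare with the value reported by summary()
all.equal(r2_manual, summary(fit)$r.squared)

# A coefficient is the change in the prediction for a one-unit increase in
# that predictor while the other predictor is held fixed.
base   <- data.frame(wt = 3, hp = 110)
bumped <- data.frame(wt = 4, hp = 110)  # wt raised by one unit, hp unchanged
diff_pred <- predict(fit, bumped) - predict(fit, base)
all.equal(unname(diff_pred), unname(coef(fit)["wt"]))
```

The same two checks should carry over to suicide_model by swapping in ESTIMATE, AGE_NUM, and YEAR_NUM, keeping in mind that lm() dropped the rows with missing values before fitting.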


plot(suicide_model, which = 1)
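
To show what a linearity violation looks like in this kind of residuals vs. fitted plot, here is a minimal sketch on simulated data. Everything in it (sim, linear_fit, curved_fit, the quadratic shape) is made up for illustration and says nothing about the real suicide data.

```r
set.seed(1)  # simulated data, purely illustrative

# Hypothetical example: the response is curved in x, so a straight line misfits.
x <- seq(0, 10, length.out = 200)
y <- 2 + (x - 5)^2 + rnorm(200)
sim <- data.frame(x = x, y = y)

linear_fit <- lm(y ~ x, data = sim)           # assumes a straight-line relationship
curved_fit <- lm(y ~ poly(x, 2), data = sim)  # allows a quadratic relationship

# Residuals vs. fitted: a systematic curve in the residuals signals a missed
# nonlinearity; a patternless band around zero is what linearity predicts.
plot(linear_fit, which = 1)
plot(curved_fit, which = 1)

# On this simulated data the quadratic fit explains far more variance.
summary(linear_fit)$r.squared
summary(curved_fit)$r.squared
```

If the plot for suicide_model shows a similar pattern, one option to try (an untested suggestion, not part of the assignment) is to treat the age group as categorical, for example with factor(AGE), instead of as a single numeric slope.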

Jose Campos

2024-10-31