PAD 4833 Homework

QUESTION #1

Load your chosen dataset into Rmarkdown

#load libraries
library(tidyverse)

## Warning: package 'readr' was built under R version 4.5.3

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(readxl)

## Warning: package 'readxl' was built under R version 4.5.3

library(dplyr)

#Vet Data boilerplate, data cleaning
# create a file path
file_path <- "C:/Users/Administrator/Desktop/Graduate School/Applied Quant Methods/My Class Stuff/Data Project/Veteran Homelessness/2023 Homeless Veterans.xlsx"

# Read Sheet 2 ("2023")
sheet_2 <- read_excel(file_path, sheet = "2023")

# Remove Sheet 2 "2023" Totals row
vet_data_sheet_2 <- sheet_2 |> filter(State != "Total")

# Read Sheet 1 ("Change")
change_sheet <- read_excel(file_path, sheet = "Change")

# Remove Sheet 1 "Change" Totals row
change_column <- change_sheet |> filter(State != "Total")

# Merge change column into 2023 data
vet_data_long_variable_names <- vet_data_sheet_2 |> left_join(change_column, by = "State")

#change variable names
vet_data <- rename(vet_data_long_variable_names, 
                   CoC_Count = "Number of CoCs", 
                   ES_Count = "Sheltered ES Homeless Veterans", 
                   TH_Count = "Sheltered TH Homeless Veterans", 
                   SH_Count = "Sheltered SH Homeless Veterans", 
                   Sheltered = "Sheltered Total Homeless Veterans", 
                   Unsheltered = "Unsheltered Homeless Veterans", 
                   Homeless_Vets = "Homeless Veterans",
                   Homeless_Rate_Change = "Change in Veteran Homelessness, 2022-2023")


#Create column for unsheltered rate (share of homeless veterans who are unsheltered)
vet_data <- vet_data |> mutate(Unsheltered_Rate = Unsheltered / Homeless_Vets)

view(vet_data)
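
A quick way to confirm that a join like the one above matched every state is to compare the key columns before merging. The sketch below uses two small made-up data frames (`counts_toy` and `change_toy` are hypothetical stand-ins with invented values, not the homework data) and base R only, so it runs without the Excel file.

```r
# Toy stand-ins for the two sheets; the values are invented for illustration
counts_toy <- data.frame(
  State = c("AL", "AK", "AZ", "Total"),
  Homeless_Vets = c(10, 5, 20, 35)
)
change_toy <- data.frame(
  State = c("AL", "AZ", "Total"),
  Change = c(0.10, -0.05, 0.02)
)

# Drop the totals row from each sheet, as in the cleaning steps
counts_toy <- counts_toy[counts_toy$State != "Total", ]
change_toy <- change_toy[change_toy$State != "Total", ]

# States in the counts sheet with no match in the change sheet
unmatched_states <- setdiff(counts_toy$State, change_toy$State)
print(unmatched_states)

# Left join equivalent in base R: keep every row of counts_toy
merged_toy <- merge(counts_toy, change_toy, by = "State", all.x = TRUE)

# Unmatched states show up as NA in the joined column
print(merged_toy[is.na(merged_toy$Change), ])
```

With the real data, the same check could be run with dplyr's `anti_join(vet_data_sheet_2, change_column, by = "State")`, which returns the rows of the first table that have no match in the second.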

QUESTION #2

Select the dependent variable you are interested in, along with independent variables which you believe are causing the dependent variable

Dependent Variable: Total number of homeless veterans (Homeless_Vets)

Independent Variable 1: Number of Emergency Sheltered veterans (ES_Count)

Independent Variable 2: Number of Transitional Housing sheltered veterans (TH_Count)

Independent Variable 3: Number of Safe Haven sheltered veterans (SH_Count)

Independent Variable 4: Number of Continuums of Care (CoC_Count)

QUESTION #3

Create a linear model using the “lm()” command, save it to some object

# Linear model of homeless veterans on Emergency Shelter, Transitional Housing, Safe Haven, and Continuums of Care counts
Homeless_kitchen_sink_model <- lm(Homeless_Vets ~ ES_Count + CoC_Count + TH_Count + SH_Count, data = vet_data)

QUESTION #4

Call a “summary()” on your new model

summary(Homeless_kitchen_sink_model)

## 
## Call:
## lm(formula = Homeless_Vets ~ ES_Count + CoC_Count + TH_Count + 
##     SH_Count, data = vet_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -635.48  -96.19   43.23  105.89  561.43 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.0897    63.4340  -0.269  0.78875    
## ES_Count      1.2873     0.5140   2.504  0.01564 *  
## CoC_Count   -25.8525     9.2285  -2.801  0.00727 ** 
## TH_Count      1.4422     0.5657   2.549  0.01397 *  
## SH_Count     16.8539     1.4509  11.617 1.11e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 296.1 on 49 degrees of freedom
## Multiple R-squared:  0.9626, Adjusted R-squared:  0.9596 
## F-statistic: 315.5 on 4 and 49 DF,  p-value: < 2.2e-16

QUESTION #5

Interpret the model’s r-squared and p-values. How much of the dependent variable does the overall model explain? What are the significant variables? What are the insignificant variables?

The model yields a multiple R-squared of 0.9626 (adjusted R-squared: 0.9596), meaning it explains about 96% of the variation in the number of homeless veterans across the states. Based on the p-values, all four independent variables are significant at the 0.05 level: the Emergency Shelter count (p = 0.016), the Transitional Housing count (p = 0.014), the Safe Haven count (p < 0.001), and the Continuums of Care count (p = 0.007). There are no insignificant variables in the model. The overall model p-value (< 2.2e-16) is essentially zero, meaning that an F-statistic this large would be very unlikely if none of the independent variables were related to the number of homeless veterans.
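
The R-squared and p-values can also be pulled out of the summary object rather than read off the printout. The sketch below uses R's built-in `mtcars` data as a stand-in, since `vet_data` depends on a local file; `toy_model` is a hypothetical name, and the same calls should work on `Homeless_kitchen_sink_model`.

```r
# Stand-in model on built-in data (not the veteran data)
toy_model <- lm(mpg ~ wt + hp, data = mtcars)
toy_summary <- summary(toy_model)

# Multiple and adjusted R-squared are stored separately
r2 <- toy_summary$r.squared
adj_r2 <- toy_summary$adj.r.squared

# Per-variable p-values are the "Pr(>|t|)" column of the coefficient table
coef_table <- coef(toy_summary)
p_values <- coef_table[, "Pr(>|t|)"]

# Overall model p-value, computed from the stored F-statistic
f_stat <- toy_summary$fstatistic
model_p <- pf(f_stat["value"], f_stat["numdf"], f_stat["dendf"], lower.tail = FALSE)

print(round(c(r_squared = r2, adj_r_squared = adj_r2), 4))
print(names(p_values)[p_values < 0.05])
print(unname(model_p))
```

Reading both R-squared values from the object makes it harder to mix up the multiple and adjusted figures when writing the interpretation.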

QUESTION #6

Choose some significant independent variables. Interpret its Estimates (or Beta Coefficients). How do the independent variables individually affect the dependent variable? * If you have no significant variables, then just pick one and pretend it’s significant.

Emergency Shelter counts: For each additional veteran sheltered at an Emergency Shelter, the model predicts about 1.3 more homeless veterans in a state, holding the other variables constant.

Safe Haven counts: For each additional veteran sheltered at a Safe Haven shelter, the model predicts about 17 more homeless veterans in a state, holding the other variables constant.

Transitional Housing counts: For each additional veteran sheltered at a Transitional Housing shelter, the model predicts about 1.4 more homeless veterans in a state, holding the other variables constant.

Continuums of Care counts: For each additional Continuum of Care in a state, the model predicts about 26 fewer homeless veterans, holding the other variables constant.

All four independent variables have a statistically significant relationship with the total number of homeless veterans across the U.S. states.
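
Each estimate is the change in the predicted value for a one-unit increase in that variable while the others are held fixed. A minimal sketch of that idea, using R's built-in `mtcars` data as a stand-in (the model and the names `toy_model`, `baseline`, and `one_more` are illustrative, not the homework model):

```r
toy_model <- lm(mpg ~ wt + hp, data = mtcars)

# Two hypothetical cars that differ only by one unit of hp
baseline <- data.frame(wt = 3, hp = 100)
one_more <- data.frame(wt = 3, hp = 101)

# Difference in predictions when hp rises by 1 and wt is unchanged
pred_diff <- predict(toy_model, one_more) - predict(toy_model, baseline)

# It matches the hp coefficient
print(unname(pred_diff))
print(unname(coef(toy_model)["hp"]))

# 95% confidence intervals show the uncertainty around each estimate
print(confint(toy_model, level = 0.95))
```

Running `confint()` on the homework model would give a range for each estimate, which is a useful companion to the point values quoted in the interpretation.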

QUESTION #7

Does the model you create meet or violate the assumption of linearity? Show your work with “plot(x,which=1)”

plot(Homeless_kitchen_sink_model, which=1)

The trend line in the Residuals vs Fitted plot is not straight at all, whereas a linear relationship would produce a roughly flat line around zero. This suggests that the relationship between the variables is nonlinear. This model does NOT meet the linearity assumption.
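
For reference, the `which = 1` plot can be rebuilt by hand, which makes it clearer what the trend line represents. This sketch uses the built-in `mtcars` data as a stand-in and a `lowess()` smoother; `plot()` on an `lm` object draws a similar smoothed line by default, though its exact settings may differ.

```r
toy_model <- lm(mpg ~ wt + hp, data = mtcars)

fitted_vals <- fitted(toy_model)
resid_vals <- resid(toy_model)

# Scatter of residuals against fitted values with a zero reference line
plot(fitted_vals, resid_vals,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs Fitted (manual)")
abline(h = 0, lty = 2)

# Smoothed trend; a roughly flat line near zero is consistent with linearity
trend <- lowess(fitted_vals, resid_vals)
lines(trend, col = "red")

# Range of the smoothed trend, as a rough numeric summary of curvature
print(range(trend$y))
```

A trend that bends well away from the dashed zero line, relative to the spread of the residuals, is the pattern the linearity check is looking for.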