library(pastecs)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:pastecs':
##
## first, last
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
data <- read.csv("C:/Users/atg516/Desktop/Project/HHS_Unaccompanied_Alien_Children_Program.csv")
summary(data)
## Date Children.apprehended.and.placed.in.CBP.custody.
## Length:1015 Min. : 1.00
## Class :character 1st Qu.: 59.75
## Mode :character Median :123.00
## Mean :116.70
## 3rd Qu.:167.00
## Max. :333.00
## NA's :701
## Children.in.CBP.custody Children.transferred.out.of.CBP.custody
## Min. : 7.00 Min. : 0.00
## 1st Qu.: 82.25 1st Qu.: 84.75
## Median :233.50 Median :176.00
## Mean :218.11 Mean :164.52
## 3rd Qu.:300.25 3rd Qu.:227.50
## Max. :531.00 Max. :440.00
## NA's :701 NA's :701
## Children.in.HHS.Care Children.discharged.from.HHS.Care
## Min. : 2109 Min. : 2.0
## 1st Qu.: 6940 1st Qu.:196.0
## Median : 8998 Median :279.0
## Mean : 9196 Mean :283.3
## 3rd Qu.:10766 3rd Qu.:360.0
## Max. :22557 Max. :827.0
##
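# Keep rows where both HHS columns are present, then compute the discharge rate
# (children discharged divided by children in HHS care).
data_clean <- data %>%
  filter(!is.na(Children.discharged.from.HHS.Care) & !is.na(Children.in.HHS.Care)) %>%
  mutate(discharge_rate = Children.discharged.from.HHS.Care / Children.in.HHS.Care)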
View(data_clean)  # interactive only: opens the data viewer to inspect the cleaned data
I chose discharge_rate (children discharged divided by children in HHS care) as my dependent variable and Children.in.HHS.Care as my independent variable.
model <- lm(discharge_rate ~ Children.in.HHS.Care, data = data_clean)
summary(model)
##
## Call:
## lm(formula = discharge_rate ~ Children.in.HHS.Care, data = data_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.024568 -0.005341 -0.000029 0.006325 0.038167
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.411e-02 8.132e-04 29.643 < 2e-16 ***
## Children.in.HHS.Care 6.290e-07 8.218e-08 7.653 4.55e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.009566 on 1013 degrees of freedom
## Multiple R-squared: 0.05466, Adjusted R-squared: 0.05373
## F-statistic: 58.57 on 1 and 1013 DF, p-value: 4.554e-14
I fit this linear regression model to test whether the number of children in HHS care is associated with the discharge rate. The relationship is statistically significant (p < 0.001), with a coefficient of 6.29e-07.
This means that each additional child in HHS care is associated with an increase of about 6.29e-07 in the discharge rate, or roughly 0.00063 (0.063 percentage points) per 1,000 additional children, which is a very small effect.
However, the model’s adjusted R-squared is only 0.0537, meaning it explains 5.37% of the variability in the discharge rate. Although the association is statistically significant, the model has little predictive power on its own, and additional variables would likely be needed for more accurate predictions.
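A coefficient on the order of 1e-07 is hard to read, so one option is to express the predictor in units of 1,000 children. The sketch below illustrates this on a small synthetic data set, not the HHS data: `toy`, `in_care`, and the simulated values are placeholders invented for the example. Rescaling the predictor changes only the units of the slope; the fit itself (R-squared, p-value) is unchanged.

```r
# Synthetic stand-in for data_clean (hypothetical values, for illustration only)
set.seed(1)
toy <- data.frame(in_care = round(runif(200, min = 2000, max = 22000)))
toy$discharge_rate <- 0.024 + 6e-07 * toy$in_care + rnorm(200, sd = 0.01)

# Same model fitted on the raw scale and per 1,000 children in care
fit_raw <- lm(discharge_rate ~ in_care, data = toy)
fit_k   <- lm(discharge_rate ~ I(in_care / 1000), data = toy)

# The rescaled slope equals the raw slope multiplied by 1,000
slope_raw_per_1000 <- coef(fit_raw)[["in_care"]] * 1000
slope_k            <- coef(fit_k)[[2]]

# 95% confidence interval for the change in discharge rate per 1,000 children
confint(fit_k)
```

On the real data, the same idea would mean replacing `in_care` with `Children.in.HHS.Care` and `toy` with `data_clean`.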
plot(model, which = 1)
A residuals vs. fitted plot was used to check the assumption of linearity.
The red smoothing line showed a subtle curve, suggesting a slight departure from linearity.
Although the residuals were roughly centered around zero, this curvature indicates that a linear model may not fully capture the relationship between the number of children in HHS care and the discharge rate.
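One possible follow-up to a curved residual plot is to compare the straight-line model against one with a quadratic term. The sketch below does this on synthetic data with curvature deliberately built in; `toy`, `in_care`, and the simulated coefficients are assumptions made for illustration and say nothing about the HHS data. It is a sketch of the technique, not a claim about which model fits the real series better.

```r
# Synthetic data with mild curvature (hypothetical values, for illustration only)
set.seed(2)
toy <- data.frame(in_care = runif(300, min = 2000, max = 22000))
toy$discharge_rate <- 0.02 + 1.5e-06 * toy$in_care -
  4e-11 * toy$in_care^2 + rnorm(300, sd = 0.005)

# Straight-line model and a model with an added quadratic term
fit_lin  <- lm(discharge_rate ~ in_care, data = toy)
fit_quad <- lm(discharge_rate ~ poly(in_care, 2), data = toy)

# F-test for nested models: does the quadratic term reduce residual error?
cmp <- anova(fit_lin, fit_quad)
print(cmp)

# Residuals vs. fitted for the quadratic model, for visual comparison
plot(fit_quad, which = 1)
```

A small p-value in the second row of the `anova()` table would suggest that the quadratic term captures curvature the linear model misses.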