library(pastecs)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:pastecs':
## 
##     first, last
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
data <- read.csv("C:/Users/atg516/Desktop/Project/HHS_Unaccompanied_Alien_Children_Program.csv")
summary(data)
##      Date           Children.apprehended.and.placed.in.CBP.custody.
##  Length:1015        Min.   :  1.00                                 
##  Class :character   1st Qu.: 59.75                                 
##  Mode  :character   Median :123.00                                 
##                     Mean   :116.70                                 
##                     3rd Qu.:167.00                                 
##                     Max.   :333.00                                 
##                     NA's   :701                                    
##  Children.in.CBP.custody Children.transferred.out.of.CBP.custody
##  Min.   :  7.00          Min.   :  0.00                         
##  1st Qu.: 82.25          1st Qu.: 84.75                         
##  Median :233.50          Median :176.00                         
##  Mean   :218.11          Mean   :164.52                         
##  3rd Qu.:300.25          3rd Qu.:227.50                         
##  Max.   :531.00          Max.   :440.00                         
##  NA's   :701             NA's   :701                            
##  Children.in.HHS.Care Children.discharged.from.HHS.Care
##  Min.   : 2109        Min.   :  2.0                    
##  1st Qu.: 6940        1st Qu.:196.0                    
##  Median : 8998        Median :279.0                    
##  Mean   : 9196        Mean   :283.3                    
##  3rd Qu.:10766        3rd Qu.:360.0                    
##  Max.   :22557        Max.   :827.0                    
## 
data_clean <- data %>%
  filter(!is.na(Children.discharged.from.HHS.Care) & !is.na(Children.in.HHS.Care))

data_clean <- data_clean %>%
  mutate(discharge_rate = Children.discharged.from.HHS.Care / Children.in.HHS.Care)
View(data_clean)

I chose discharge_rate (children discharged from HHS care divided by children in HHS care) as the dependent variable and Children.in.HHS.Care as the independent variable.

model <- lm(discharge_rate ~ Children.in.HHS.Care, data = data_clean)
summary(model)
## 
## Call:
## lm(formula = discharge_rate ~ Children.in.HHS.Care, data = data_clean)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.024568 -0.005341 -0.000029  0.006325  0.038167 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          2.411e-02  8.132e-04  29.643  < 2e-16 ***
## Children.in.HHS.Care 6.290e-07  8.218e-08   7.653 4.55e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.009566 on 1013 degrees of freedom
## Multiple R-squared:  0.05466,    Adjusted R-squared:  0.05373 
## F-statistic: 58.57 on 1 and 1013 DF,  p-value: 4.554e-14

I fit a linear regression model to test whether the number of children in HHS care is associated with the discharge rate. The relationship is statistically significant (p = 4.55e-14, well below 0.001), with a slope coefficient of 6.29e-07.

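This means that as the number of children in HHS care increases, the discharge rate also increases, albeit by a very small amount: each additional child in care is associated with an increase of 6.29e-07 in the discharge rate, or about 0.00063 per 1,000 additional children.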
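
To put the size of the slope in context, the short sketch below plugs the coefficients reported in the summary(model) output into the fitted line. The coefficients are hard-coded from that output rather than read from the model object (in a live session, coef(model) would supply them), and the variable names are only illustrative.

```r
# Coefficients copied from the summary(model) output above.
intercept <- 2.411e-02
slope     <- 6.290e-07

# Change in discharge rate associated with 1,000 more children in care.
change_per_1000 <- slope * 1000
print(change_per_1000)

# Fitted discharge rate at the minimum, median, and maximum of
# Children.in.HHS.Care (values taken from the summary(data) output).
children_in_care <- c(min = 2109, median = 8998, max = 22557)
fitted_rate <- intercept + slope * children_in_care
print(round(fitted_rate, 4))
```

Across the full observed range of children in care, the fitted discharge rate moves only from roughly 0.025 to 0.038, which is consistent with describing the effect as very small.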

However, the model’s adjusted R-squared is only 0.0537, meaning it explains about 5.37% of the variability in the discharge rate. While the association is statistically significant, the model has little explanatory power, and additional variables would likely be needed for more accurate predictions.
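
As a quick check on where the adjusted figure comes from, the sketch below recomputes it from the multiple R-squared using the standard adjustment formula. The inputs are copied from the output above (n = 1015 rows, one predictor); the variable names are illustrative.

```r
# Values copied from the summary(model) output above.
r_squared <- 0.05466
n <- 1015  # observations: 1013 residual degrees of freedom + 2 coefficients
p <- 1     # number of predictors

# Standard adjusted R-squared formula.
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
print(round(adj_r_squared, 5))
```

With over a thousand observations and a single predictor, the adjustment is tiny, so the adjusted and unadjusted values tell the same story.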

plot(model, which = 1)

A residuals vs. fitted plot was used to check the assumption of linearity.

The red smoothing line shows a subtle curve, suggesting a slight violation of linearity.

While the residuals are relatively centered around zero, the slight curvature indicates that a linear model may not fully capture the relationship between the number of children in HHS care and the discharge rate.
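
One possible way to probe the curvature, offered only as a sketch and not as part of the analysis above, is to compare the straight-line fit against one that adds a quadratic term. The helper name compare_linear_quadratic is hypothetical; it assumes a data frame with the discharge_rate and Children.in.HHS.Care columns created earlier.

```r
# Hypothetical helper: compares a linear fit with a quadratic fit using a
# nested-model F-test. Assumes df has the columns used in the model above.
compare_linear_quadratic <- function(df) {
  linear    <- lm(discharge_rate ~ Children.in.HHS.Care, data = df)
  quadratic <- lm(discharge_rate ~ poly(Children.in.HHS.Care, 2), data = df)
  # A small p-value in the second row suggests the quadratic term improves the fit.
  anova(linear, quadratic)
}

# Run only if the cleaned data from earlier in this document is available.
if (exists("data_clean")) {
  print(compare_linear_quadratic(data_clean))
}
```

If the quadratic term turned out to be significant, that would support the reading of the residual plot; if not, the curvature may be too slight to matter in practice.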