The code below loads the dataset “CPSWaitingForAdoptionFY2014_2023.csv” as an object called “waitadopt”, and deletes the original file name.

library(readr)

CPSWaitingForAdoptionFY2014_2023 <- read_csv("CPSWaitingForAdoptionFY2014-2023.csv")
## Rows: 12158 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Region, Gender, Race/Ethnicity, Age Group
## dbl (3): Fiscal Year, Chidlren Waiting on Adoption 31 August, Average Months...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
waitadopt <- CPSWaitingForAdoptionFY2014_2023

rm(CPSWaitingForAdoptionFY2014_2023)

The first correlation to be analyzed is Gender and Average Months since Termination of Parental Rights. Most observations are “male” or “female,” but one was entered as “unknown.” The first chunk changes “Unknown” to “NA”.

waitadopt$Gender[waitadopt$Gender == 'Unknown'] <- NA

The chunk below recodes “gender” from “female” to “1”, and “male” to “0” onto a new column called “Gendervar”.

waitadopt2 <- waitadopt %>% mutate(Gendervar=ifelse(Gender=='Female',1,0))
waitadopt2
## # A tibble: 12,158 × 8
##    `Fiscal Year` Region    Gender `Race/Ethnicity` `Age Group`         
##            <dbl> <chr>     <chr>  <chr>            <chr>               
##  1          2023 1-Lubbock Female African American Birth to 5 Years Old
##  2          2023 1-Lubbock Female African American 6-12 Years Old      
##  3          2023 1-Lubbock Female African American 13-17 Years Old     
##  4          2023 1-Lubbock Female African American 13-17 Years Old     
##  5          2023 1-Lubbock Female African American Birth to 5 Years Old
##  6          2023 1-Lubbock Female African American Birth to 5 Years Old
##  7          2023 1-Lubbock Female African American Birth to 5 Years Old
##  8          2023 1-Lubbock Female African American 6-12 Years Old      
##  9          2023 1-Lubbock Female African American 6-12 Years Old      
## 10          2023 1-Lubbock Female African American 6-12 Years Old      
## # ℹ 12,148 more rows
## # ℹ 3 more variables: `Chidlren Waiting on Adoption 31 August` <dbl>,
## #   `Average Months since Termination of Parental Rights` <dbl>,
## #   Gendervar <dbl>

The code below performs a correlation operation between Gender and Average Months since Termination of Parental Rights:

cor.test(waitadopt2$Gendervar, waitadopt2$`Average Months since Termination of Parental Rights`)
## 
##  Pearson's product-moment correlation
## 
## data:  waitadopt2$Gendervar and waitadopt2$`Average Months since Termination of Parental Rights`
## t = -11.036, df = 12155, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.11717500 -0.08197491
## sample estimates:
##         cor 
## -0.09960612

This shows an extremely low p-value, which means that the observations are likely to be very similar if another sample is taken. The actual correlation is around “-1”, and given that males were coded as “0”, and females as “1”, the “negative” correlation suggests that males were more likely to wait longer. This can be further examined by creating a graph:

ggplot(waitadopt2, aes(x = `Gender`, y = `Average Months since Termination of Parental Rights`, group = `Gender`)) +
  geom_boxplot(outlier.alpha = .5, outlier.shape = 3) +
  labs(x = "Gender",
       y = "Avg Months since Parental Rights Term.") +
  theme_minimal()

The graph above makes it easier to understand the spread of the average months since termination of parental rights, grouped by gender. It is worth noting that the outliers don’t start showing until a certain threshold, and it makes it seem as if there are no values between the standard deviations and the outliers.

The second variable compared to the average months of parental rights is age group. The chunk below recodes “age group” into three new variables: 1) “zerotofive:” Newborns up to five years old, 2) “sixtotwelve”: Six to twelve year-olds, and 3) “thirtsevent” for thirteen to seventeen year-olds.

waitadopt2 <- waitadopt2 %>% mutate(zerotofive = ifelse(`Age Group` == 'Birth to 5 Years Old', 1, 0))
waitadopt2 <- waitadopt2 %>% mutate(sixtotwelve = ifelse(`Age Group` == '6-12 Years Old', 1, 0))
waitadopt2 <- waitadopt2 %>% mutate(thirtsevent = ifelse(`Age Group` == '13-17 Years Old', 1, 0))

The code below performs a correlation operation between the first age group, which includes only newborns to five year-olds, and Average Months since Termination of Parental Rights:

cor.test(waitadopt2$zerotofive, waitadopt2$`Average Months since Termination of Parental Rights`)
## 
##  Pearson's product-moment correlation
## 
## data:  waitadopt2$zerotofive and waitadopt2$`Average Months since Termination of Parental Rights`
## t = -56.039, df = 12156, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.4671180 -0.4388638
## sample estimates:
##        cor 
## -0.4531047

This shows an extremely low p-value, which means that the observations are likely to be very similar if another sample is taken. The actual correlation is around “-.46”, and given that children in other age groups were coded as “0”, and newborns to five year-olds as “1”, the “negative” correlation suggests that children age six and older were more likely to wait longer. This can be further examined by creating a graph:

ggplot(waitadopt2, aes(x = `zerotofive`, y = `Average Months since Termination of Parental Rights`, group = `Age Group`)) +
  geom_boxplot(outlier.alpha = .5, outlier.shape = 3) +
  labs(x = "Age Group",
       y = "Avg Months since Parental Rights Term.") +
  theme_minimal()

The graph above makes it pretty clear that newborns to five year-olds had a much lower average time since termination of parental rights.

The code below performs a correlation operation between the second age group, six to twelve year-olds, and Average Months since Termination of Parental Rights:

cor.test(waitadopt2$sixtotwelve, waitadopt2$`Average Months since Termination of Parental Rights`)
## 
##  Pearson's product-moment correlation
## 
## data:  waitadopt2$sixtotwelve and waitadopt2$`Average Months since Termination of Parental Rights`
## t = -8.9879, df = 12156, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.09888319 -0.06356656
## sample estimates:
##         cor 
## -0.08125038

This shows an extremely low p-value, which means that the observations are likely to be very similar if another sample is taken. The actual correlation is around “-.08”, and given that children in other age groups were coded as “0”, and six to twelve year-olds as “1”, the weak “negative” correlation suggests that children in other group ages were still more likely to wait longer. This can be further examined by creating a graph:

ggplot(waitadopt2, aes(x = `sixtotwelve`, y = `Average Months since Termination of Parental Rights`, group = `Age Group`)) +
  geom_boxplot(outlier.alpha = .5, outlier.shape = 3) +
  labs(x = "Age Group",
       y = "Avg Months since Parental Rights Term.") +
  theme_minimal()

The graph above makes it visible that six to twelve year-olds still had a lower average time since termination of parental rights, compared to thirteen to seventeen year-olds.

The code below performs a correlation operation between thirteen to seventeen year-olds, and Average Months since Termination of Parental Rights:

cor.test(waitadopt2$thirtsevent, waitadopt2$`Average Months since Termination of Parental Rights`)
## 
##  Pearson's product-moment correlation
## 
## data:  waitadopt2$thirtsevent and waitadopt2$`Average Months since Termination of Parental Rights`
## t = 75.955, df = 12156, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5551392 0.5792508
## sample estimates:
##       cor 
## 0.5673165

This shows an extremely low p-value, which means that the observations are likely to be very similar if another sample is taken. The actual correlation is around “.56”, and given that children in other age groups were coded as “0”, and thirteen to seventeen year-olds as “1”, the “positive” correlation suggests that children in this age group were more likely to wait longer than other children. This can be further examined by creating a graph:

ggplot(waitadopt2, aes(x = `thirtsevent`, y = `Average Months since Termination of Parental Rights`, group = `Age Group`)) +
  geom_boxplot(outlier.alpha = .5, outlier.shape = 3) +
  labs(x = "Age Group",
       y = "Avg Months since Parental Rights Term.") +
  theme_minimal()

The graph above makes it clear that thirteen to seventeen year-olds had a higher average time since termination of parental rights, compared to thirteen to seventeen year-olds.

The code below examines multiple variables using the PAIRS command:

pairs(~`Average Months since Termination of Parental Rights`+`Gendervar`+`zerotofive`+`sixtotwelve`+`thirtsevent`,data=waitadopt2)

The code below correlates gender and average time since termination of parental rights using the Kendall method. The Kendall method is used because the data is not normally distributed, and most variables being compared are categorical:

cor.test(waitadopt2$`Average Months since Termination of Parental Rights`,waitadopt2$Gendervar,method="kendall")
## 
##  Kendall's rank correlation tau
## 
## data:  waitadopt2$`Average Months since Termination of Parental Rights` and waitadopt2$Gendervar
## z = -9.0184, p-value < 2.2e-16
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##         tau 
## -0.06680054

Again, this p-value is very low, and the correlation is negative, which suggests that male children had longer average times since termination of parental rights.

The code below correlates the thirteen to seventeen year-old age group to the average times since termination of parental rights, using Kendall:

cor.test(waitadopt2$`Average Months since Termination of Parental Rights`,waitadopt2$thirtsevent,method="kendall")
## 
##  Kendall's rank correlation tau
## 
## data:  waitadopt2$`Average Months since Termination of Parental Rights` and waitadopt2$thirtsevent
## z = 61.515, p-value < 2.2e-16
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##       tau 
## 0.4556355

While the p-value is also very low, there is a strong positive relationship between being in the thirteen to seventeen year-old age group, and the average time since the termination of parental rights.