Homework 5

library(readxl)
district <- read_excel("district.xls")
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(pastecs)

## 
## Attaching package: 'pastecs'
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, last
## 
## The following object is masked from 'package:tidyr':
## 
##     extract

# Select "DISTNAME", "DPFPABILP", and "DDH00A001S22R" from the district data.
obj2 <- district %>% select(DISTNAME, DPFPABILP, DDH00A001S22R)

# Remove observations that are missing either variable or have a non-positive DDH00A001S22R.
obj2_cleaned <- obj2 %>% filter(!is.na(DPFPABILP) & !is.na(DDH00A001S22R) & DDH00A001S22R > 0)

library(dplyr)

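# Keep the two numeric variables and sort by DPFPABILP (descending), then DDH00A001S22R (ascending).
# This pipeline starts from district again, so it replaces the filtered obj2_cleaned created above.
obj2_cleaned <- district %>% select(DPFPABILP, DDH00A001S22R) %>% arrange(-DPFPABILP, DDH00A001S22R)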

obj2_cleaned

## # A tibble: 1,207 × 2
##    DPFPABILP DDH00A001S22R
##        <dbl>         <dbl>
##  1      26              58
##  2      18.8            94
##  3       9.1            71
##  4       8              75
##  5       7.8            69
##  6       6.8            54
##  7       6.5            56
##  8       5.9            54
##  9       5.9            70
## 10       5.8            72
## # ℹ 1,197 more rows
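
The selection, filtering, and sorting steps can also be chained into a single pipeline so that the rows removed by the filter stay removed after sorting. The following is a minimal sketch, not the code that produced the output in this report. The helper name clean_and_sort and the object name obj2_filtered are hypothetical, and the sketch assumes the district data frame has been loaded as above.

```r
library(dplyr)

# Hypothetical helper: keep the two numeric variables, drop rows that are
# missing either value or have a non-positive DDH00A001S22R, then sort.
clean_and_sort <- function(df) {
  df %>%
    select(DPFPABILP, DDH00A001S22R) %>%
    filter(!is.na(DPFPABILP), !is.na(DDH00A001S22R), DDH00A001S22R > 0) %>%
    arrange(desc(DPFPABILP), DDH00A001S22R)
}

# Apply the helper only if the district data is available in this session.
if (exists("district")) {
  obj2_filtered <- clean_and_sort(district)
}
```

Passing obj2_filtered to cor() or cor.test() would then use only the filtered rows, so the results may differ from the ones shown in this report.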

# Select some variables of interest and check for any obvious correlations using cor().
cor(obj2_cleaned, use = "complete.obs")

##                 DPFPABILP DDH00A001S22R
## DPFPABILP      1.00000000   -0.09548528
## DDH00A001S22R -0.09548528    1.00000000

#Examine the same variables visually using the PAIRS command.
pairs(~DPFPABILP+DDH00A001S22R,data = district)

#Select two variables that seem correlated (positively or negatively) and examine them using PEARSON, SPEARMAN or KENDALL.
cor.test(obj2_cleaned$DPFPABILP,obj2_cleaned$DDH00A001S22R,method="kendall")

## 
##  Kendall's rank correlation tau
## 
## data:  obj2_cleaned$DPFPABILP and obj2_cleaned$DDH00A001S22R
## z = -8.5832, p-value < 2.2e-16
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##        tau 
## -0.1736505

For this data set I chose Kendall's rank correlation because the data are not normally distributed. The test shows a negative association between the two variables (tau = -0.174). Because tau is close to 0, the association is weak. The p-value (< 2.2e-16) is far below conventional significance levels, so the association is statistically significant.
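
One way to back up the statement about non-normality is a Shapiro-Wilk test on each variable. This is only a sketch: the helper name normality_p is hypothetical, no such test was run in this report, and the code assumes obj2_cleaned exists as created above. A small p-value would suggest that a variable departs from a normal distribution, which supports using a rank-based method such as Kendall's tau.

```r
# Hypothetical helper: Shapiro-Wilk p-value for one numeric vector.
# shapiro.test() needs between 3 and 5000 non-missing values.
normality_p <- function(x) {
  x <- x[!is.na(x)]
  shapiro.test(x)$p.value
}

# Run the check on both variables only if obj2_cleaned is available.
if (exists("obj2_cleaned")) {
  sapply(obj2_cleaned[c("DPFPABILP", "DDH00A001S22R")], normality_p)
}
```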

Sarah Rodriguez

2024-10-10