library(readxl)
district<-read_excel("district.xls")
New data frame…
# Keep only the district name and the two variables being compared
sally <- district[, c("DISTNAME", "DPETSPEP", "DPFPASPEP")]
Summary statistics…
summary(district$DPETSPEP)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 9.90 12.10 12.27 14.20 51.70
summary(district$DPFPASPEP)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 5.800 8.900 9.711 12.500 49.000 5
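DPFPASPEP has 5 missing values (the NA's column above) and DPETSPEP has none, so I removed the incomplete rows before comparing the two variables.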
sally_clean<-na.omit(sally)
summary(sally_clean$DPETSPEP)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 9.9 12.2 12.3 14.2 51.7
summary(sally_clean$DPFPASPEP)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 5.800 8.900 9.711 12.500 49.000
After dropping the rows with missing values, 1202 observations (districts) remain in sally_clean.
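A quick way to confirm which column holds the missing values and how many rows survive `na.omit()` is to count `NA`s per column and then count the remaining rows. The sketch below uses a small made-up data frame (`toy`, hypothetical values) in place of `sally`; run on `sally` and `sally_clean`, the same calls should show the 5 NA's that `summary()` reported for DPFPASPEP and the number of rows left.

```r
# Hypothetical stand-in for `sally`: three districts, one missing DPFPASPEP value
toy <- data.frame(
  DISTNAME  = c("District A", "District B", "District C"),
  DPETSPEP  = c(10.5, 12.0, 14.2),
  DPFPASPEP = c(8.1, NA, 11.3)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# na.omit() drops every row that has an NA in any column
toy_clean <- na.omit(toy)

# Rows kept; this matches sum(complete.cases(toy))
n_left <- nrow(toy_clean)
print(n_left)
```

Because `na.omit()` works row-wise across all selected columns, selecting only the needed columns first (as `sally` does) avoids losing rows to missing values in unrelated columns.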
Point graph…
# installed.packages() only lists installed packages; install.packages() installs one
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
library(ggplot2)
ggplot(sally_clean, aes(x=DPFPASPEP, y=DPETSPEP))+geom_point()
The scatterplot shows a weak positive relationship between the two variables.
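To make a weak trend easier to see, a least-squares line can be layered on the same kind of scatterplot with `geom_smooth(method = "lm")`. The sketch below builds it on a simulated stand-in for `sally_clean`: the data frame `toy` and its values are hypothetical, and the noise that grows with DPFPASPEP is an assumption chosen only to produce a widening scatter. Swapping `toy` for `sally_clean` gives the version for the real data.

```r
library(ggplot2)

# Simulated stand-in for sally_clean (hypothetical values, not the district data)
set.seed(1)
toy <- data.frame(DPFPASPEP = runif(200, 0, 20))
# Assumed noise that grows with DPFPASPEP, giving a fan-shaped scatter
toy$DPETSPEP <- 8 + 0.3 * toy$DPFPASPEP +
  rnorm(200, sd = 1 + 0.2 * toy$DPFPASPEP)

# Scatterplot plus a least-squares line with its confidence band
p <- ggplot(toy, aes(x = DPFPASPEP, y = DPETSPEP)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x)

print(p)
```

Semi-transparent points (`alpha = 0.5`) help show where observations pile up, which matters when many districts share similar low values.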
Mathematical correlation…
cor(sally_clean$DPFPASPEP,sally_clean$DPETSPEP)
## [1] 0.3700234
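I interpreted this result (r ≈ 0.37) as a weak-to-moderate positive correlation. In the scatterplot, the points are tightly clustered where both percentages are low and spread out as either one increases, so the relationship is less consistent at higher values. Pearson's r summarizes the linear relationship across all districts at once, so this pattern is better described as increasing spread than as a correlation that weakens.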
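A single r of 0.37 summarizes the whole cloud of points, so one way to back up the interpretation is to add a confidence interval and a rank-based check. The sketch below uses base R's `cor.test()` on simulated vectors `x` and `y` (hypothetical values, not the district data); on the real data you would pass `sally_clean$DPFPASPEP` and `sally_clean$DPETSPEP`. It also shows that for a one-predictor linear model R² equals r², so r = 0.37 corresponds to roughly 14% of the variance in DPETSPEP (0.37² ≈ 0.137).

```r
# Simulated stand-in for the two sally_clean columns (hypothetical values)
set.seed(42)
x <- runif(300, 0, 20)
y <- 8 + 0.3 * x + rnorm(300, sd = 1 + 0.2 * x)

# Pearson r with a 95% confidence interval and p-value
pearson <- cor.test(x, y, method = "pearson")
print(pearson)

# Spearman's rho works on ranks, so it is less sensitive to outliers and
# long right tails (e.g. a max of 51.7 against a 3rd quartile of 14.2)
spearman <- cor.test(x, y, method = "spearman")
print(spearman$estimate)

# For a simple linear fit, R-squared equals the squared Pearson r
r <- cor(x, y)
r2 <- summary(lm(y ~ x))$r.squared
print(c(r = r, r_squared = r2))
```

If Pearson's r and Spearman's rho are close, the weak positive association is unlikely to be driven by a handful of extreme districts.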