library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(readxl)
district=read_excel("district.xls")
The columns "DISTNAME", "DPETSPEP", and "DPFPASPEP" are collected into a data frame referred to as dataframe1.
dataframe1=district %>% select(DISTNAME, DPETSPEP, DPFPASPEP)
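As a minimal sketch of selecting columns by name, the snippet below uses a small made-up data frame (`toy` and all of its values are hypothetical, not taken from district.xls). It uses base R indexing so it runs without extra packages. It also shows that quoted names passed to data.frame() are appended as constant columns rather than used to select existing ones.

```r
# Hypothetical toy data standing in for the district table
toy <- data.frame(
  DISTNAME  = c("A ISD", "B ISD", "C ISD"),
  DPETSPEP  = c(9.9, 12.1, 14.2),
  DPFPASPEP = c(5.8, NA, 12.5),
  OTHER     = c(1, 2, 3)
)

# Select three columns by name (base R counterpart of dplyr::select)
picked <- toy[, c("DISTNAME", "DPETSPEP", "DPFPASPEP")]
names(picked)

# A quoted name passed to data.frame() becomes a new constant column instead
appended <- data.frame(toy, "DISTNAME")
ncol(appended)  # one more column than toy; nothing was selected
```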
Summary for DPETSPEP
summary(dataframe1$DPETSPEP)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 9.90 12.10 12.27 14.20 51.70
Summary for DPFPASPEP
summary(dataframe1$DPFPASPEP)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 5.800 8.900 9.711 12.500 49.000 5
DPFPASPEP has 5 missing values (NA); the summary for DPETSPEP reports none.
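The NA count reported by summary() can be cross-checked directly. A small sketch on a made-up vector (the values are hypothetical):

```r
# Hypothetical vector with two missing entries
x <- c(5.8, NA, 8.9, 12.5, NA, 0)

sum(is.na(x))          # number of missing values
mean(x, na.rm = TRUE)  # mean over the non-missing values only
```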
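# Keep only the two variables being compared
compare_two<-district %>% select(DPETSPEP,DPFPASPEP)
# Keep rows with DPFPASPEP > 0; this also drops rows where DPFPASPEP is NA
compare_two2<-compare_two %>% filter(DPFPASPEP>0)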
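A comparison such as DPFPASPEP > 0 evaluates to NA for missing values, and dplyr's filter() keeps only rows where the condition is TRUE. The filter step therefore removes both the NA rows and any rows equal to zero. The sketch below illustrates this on hypothetical data using base R's subset(), which treats NA conditions the same way. complete.cases() is shown as one alternative that removes only the incomplete rows.

```r
# Hypothetical toy data: one NA and one zero in column b
toy <- data.frame(a = c(10, 11, 12, 13), b = c(4, NA, 0, 7))

positive_only <- subset(toy, b > 0)          # drops the NA row and the zero row
complete_only <- toy[complete.cases(toy), ]  # drops only the NA row

nrow(positive_only)
nrow(complete_only)
```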
Comparing DPFPASPEP and DPETSPEP with a ggplot2 scatter plot
ggplot(compare_two2,aes(DPETSPEP,DPFPASPEP)) + geom_point()
Checking the relationship numerically with the Pearson correlation coefficient.
cor(compare_two2$DPETSPEP,compare_two2$DPFPASPEP)
## [1] 0.371033
The correlation of about 0.37 indicates a positive but modest linear relationship between DPETSPEP and DPFPASPEP. The scatter plot supports this: the points form a loose but discernible band rising toward the upper right.
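To make the number concrete, Pearson's r is the covariance of the two variables divided by the product of their standard deviations. The sketch below computes it by hand on made-up vectors (hypothetical values, not the district data) and compares the result with cor(). cor.test() is included as one way to attach a confidence interval and p-value to the estimate.

```r
# Hypothetical paired observations
x <- c(9.9, 12.1, 14.2, 11.0, 13.5, 10.4)
y <- c(5.8, 8.9, 12.5, 9.7, 7.2, 6.1)

# Pearson's r from its definition
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_builtin <- cor(x, y)  # default method is "pearson"
all.equal(r_manual, r_builtin)

# Confidence interval and p-value for the correlation
result <- cor.test(x, y)
result$estimate
result$conf.int
```

If the NA rows had not been filtered out beforehand, cor() would return NA unless called with use = "complete.obs".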