Homework 9.11

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)

library(readxl)
district=read_excel("district.xls")

The data frame containing "DISTNAME", "DPETSPEP", and "DPFPASPEP" is referred to as dataframe1 from here on.

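# Keep only the three columns of interest; passing the names as strings to data.frame() would add constant text columns instead
dataframe1 <- district %>% select(DISTNAME, DPETSPEP, DPFPASPEP)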

Summary for DPETSPEP

summary(dataframe1$DPETSPEP)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    9.90   12.10   12.27   14.20   51.70

Summary for DPFPASPEP

summary(dataframe1$DPFPASPEP)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   5.800   8.900   9.711  12.500  49.000       5

DPFPASPEP has 5 missing values (the NA's entry in the summary above); DPETSPEP has none.
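The missing-value count reported by summary() can be cross-checked by counting NA entries directly. The following is a minimal sketch on a made-up vector (the values are illustrative only, not taken from district.xls); the same call should work on dataframe1$DPFPASPEP.

```r
# Toy vector standing in for a column with gaps (made-up values)
x <- c(5.8, NA, 8.9, 12.5, NA)

# is.na() flags each missing entry; sum() counts the TRUE flags
n_missing <- sum(is.na(x))
n_missing
```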

compare_two<-district %>% select(DPETSPEP,DPFPASPEP)

compare_two2<-compare_two %>% filter(DPFPASPEP>0)
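Note that filter(DPFPASPEP>0) drops two kinds of rows: those where DPFPASPEP is NA (filter() keeps only rows where the condition is TRUE) and those where it is exactly 0, which the summary minimum of 0.000 shows do exist. The sketch below illustrates the difference on a hypothetical toy data frame; the name toy and its values are assumptions for illustration, not part of the homework data.

```r
library(dplyr)

# Hypothetical toy data: one NA and one zero in column b
toy <- data.frame(a = c(10, 12, 14, 16),
                  b = c(5, NA, 0, 9))

# Condition b > 0 is NA for the missing row and FALSE for the zero row,
# so both are dropped
kept <- toy %>% filter(b > 0)

# Dropping only the missing rows keeps the zero
only_complete <- toy %>% filter(!is.na(b))

nrow(kept)
nrow(only_complete)
```

If the zeros are meant to stay in the comparison, the second form may be the better fit; the correlation reported below is based on the first.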

Comparing DPFPASPEP and DPETSPEP with ggplot

ggplot(compare_two2,aes(DPETSPEP,DPFPASPEP)) + geom_point()

Checking the relationship numerically with the Pearson correlation (the default for cor()).

cor(compare_two2$DPETSPEP,compare_two2$DPFPASPEP)

## [1] 0.371033

The correlation of ~0.37 indicates a positive but modest relationship. The scatterplot supports this visually, showing a loose but discernible diagonal pattern rising toward the top right.
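The filtering step matters for the correlation because cor() returns NA by default when either input contains a missing value. As a hedged sketch on made-up vectors (not the district data), the use = "complete.obs" argument is an alternative way to skip incomplete pairs without filtering first.

```r
# Made-up vectors; y has one missing value
x <- c(1, 2, 3, 4, 5)
y <- c(2, NA, 5, 4, 7)

# Default behaviour: any NA in the inputs makes the result NA
r_default <- cor(x, y)

# Use only the pairs where both values are present
r_complete <- cor(x, y, use = "complete.obs")

r_default
r_complete
```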

2024-09-12