library(readxl)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
setwd("C:/Users/pec278-9277/Documents/Week 3")
district <- read_excel("district.xls")
  1. The next step is to create a new data frame, named “dist_sub1”, that contains only DISTNAME, DPETSPEP, and DPFPASPEP. Modern Statistics with R describes the process in section 3.2.3.
dist_sub1 <- district |>
  select(DISTNAME, DPETSPEP, DPFPASPEP)
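As a minimal sketch of what `select()` does here, the example below applies the same call to a small made-up data frame. The `toy` object, its values, and the extra `OTHER` column are hypothetical and are not taken from district.xls.

```r
library(dplyr)

# Hypothetical toy data standing in for the district table
toy <- data.frame(
  DISTNAME  = c("A ISD", "B ISD", "C ISD"),
  DPETSPEP  = c(10.2, 12.5, 9.8),
  DPFPASPEP = c(8.1, NA, 7.4),
  OTHER     = c(1, 2, 3)
)

# Keep only the three columns of interest, in the order listed
toy_sub <- toy |>
  select(DISTNAME, DPETSPEP, DPFPASPEP)

# Confirm which columns remain
names(toy_sub)
```

Calling `names()` on the result is a quick way to confirm that only the requested columns were kept.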
  1. Generate summary statistics for DPETSPEP and DPFPASPEP using the new table.
summary(dist_sub1)
##    DISTNAME            DPETSPEP       DPFPASPEP     
##  Length:1207        Min.   : 0.00   Min.   : 0.000  
##  Class :character   1st Qu.: 9.90   1st Qu.: 5.800  
##  Mode  :character   Median :12.10   Median : 8.900  
##                     Mean   :12.27   Mean   : 9.711  
##                     3rd Qu.:14.20   3rd Qu.:12.500  
##                     Max.   :51.70   Max.   :49.000  
##                                     NA's   :5
  1. The summary above shows that DPFPASPEP has 5 missing values.

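  2. Adding “na.rm” to the summary() call does not change the output: summary() has no na.rm argument for data frames and already excludes missing values when computing its statistics, so the result still shows 1207 observations and reports the number of NA’s separately.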

summary(dist_sub1, na.rm)
##    DISTNAME            DPETSPEP       DPFPASPEP     
##  Length:1207        Min.   : 0.00   Min.   : 0.000  
##  Class :character   1st Qu.: 9.90   1st Qu.: 5.800  
##  Mode  :character   Median :12.10   Median : 8.900  
##                     Mean   :12.27   Mean   : 9.711  
##                     3rd Qu.:14.20   3rd Qu.:12.500  
##                     Max.   :51.70   Max.   :49.000  
##                                     NA's   :5
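Since `summary()` skips missing values on its own, `na.rm` matters mainly for functions such as `mean()`, which return NA when any value is missing. The sketch below uses a hypothetical toy data frame (not the district data) to show one way to count NAs, use `na.rm = TRUE`, and drop incomplete rows with `tidyr::drop_na()`.

```r
library(tidyr)

# Hypothetical toy data with one missing value
toy <- data.frame(
  DISTNAME  = c("A ISD", "B ISD", "C ISD", "D ISD"),
  DPETSPEP  = c(10, 12, 14, 16),
  DPFPASPEP = c(8, NA, 6, 10)
)

# Count missing values in each column
na_counts <- colSums(is.na(toy))

# mean() returns NA unless missing values are removed explicitly
mean_with_na    <- mean(toy$DPFPASPEP)
mean_without_na <- mean(toy$DPFPASPEP, na.rm = TRUE)

# Drop rows that have a missing value in DPFPASPEP
toy_complete <- toy |>
  drop_na(DPFPASPEP)

na_counts
mean_without_na
nrow(toy_complete)
```

Applied to dist_sub1, the same `drop_na()` call would be expected to remove the 5 rows that `summary()` reports as NA, although that result is not shown here.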
  1. To create a scatter plot comparing DPFPASPEP and DPETSPEP, first load ggplot2. The command below saves the chart as an object “Plot1”, which can be displayed by calling it from the console.
library(ggplot2)
# Store the scatter plot in Plot1, then print it
Plot1 <- ggplot(data = dist_sub1, mapping = aes(x = DPETSPEP, y = DPFPASPEP)) +
  geom_point()
Plot1
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_point()`).
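The warning reflects the rows with a missing DPFPASPEP value, which ggplot2 drops before drawing. One possible way to avoid it is to filter those rows out before plotting. The sketch below assumes a hypothetical toy data frame rather than the district data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data with one missing y value
toy <- data.frame(
  DPETSPEP  = c(10, 12, 14, 16),
  DPFPASPEP = c(8, NA, 6, 10)
)

# Keep only rows where both plotted variables are present
toy_plot <- toy |>
  filter(!is.na(DPETSPEP), !is.na(DPFPASPEP))

# The plot is built from complete rows only, so no rows are removed at draw time
p <- ggplot(data = toy_plot, mapping = aes(x = DPETSPEP, y = DPFPASPEP)) +
  geom_point()

p
```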

  1. A correlation between DPETSPEP and DPFPASPEP can be obtained by using the command below:
dist_sub1 |>
  with(cor(DPFPASPEP,
           DPETSPEP, use = "pairwise.complete.obs"))
## [1] 0.3700234

The result is 0.3700234.

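  1. The correlation of 0.37 indicates a weak positive linear relationship between special education enrollment (DPETSPEP) and special education expenditures (DPFPASPEP): districts with more special education students tend to spend somewhat more on special education, but the association is far from perfect.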
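To make explicit what the `use` argument of `cor()` does with missing values, the sketch below compares it with Pearson's r computed by hand after removing incomplete pairs. The vectors `x` and `y` are hypothetical toy values, not the district data.

```r
# Hypothetical toy values; y has one missing observation
x <- c(10, 12, 14, 16, 18)
y <- c(8, NA, 6, 10, 11)

# Let cor() discard incomplete pairs
r_builtin <- cor(x, y, use = "pairwise.complete.obs")

# Same calculation by hand: keep pairs where both values are present
ok <- complete.cases(x, y)
xc <- x[ok]
yc <- y[ok]

# Pearson's r: covariance term divided by the product of the spreads
r_manual <- sum((xc - mean(xc)) * (yc - mean(yc))) /
  sqrt(sum((xc - mean(xc))^2) * sum((yc - mean(yc))^2))

# Check that the two approaches agree
all.equal(r_builtin, r_manual)
```

With only two variables, pairwise deletion and dropping incomplete rows give the same result; the two approaches can differ when a correlation matrix is computed over more columns.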