R Markdown

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)  # ggplot2 and readr are already attached by tidyverse
district <- read_excel("district.xls")
library(pastecs)
## 
## Attaching package: 'pastecs'
## The following objects are masked from 'package:dplyr':
## 
##     first, last
## The following object is masked from 'package:tidyr':
## 
##     extract
stat.desc(district$DPETECOP)
##      nbr.val     nbr.null       nbr.na          min          max        range 
## 1.207000e+03 4.000000e+00 0.000000e+00 0.000000e+00 1.000000e+02 1.000000e+02 
##          sum       median         mean      SE.mean CI.mean.0.95          var 
## 7.332580e+04 6.190000e+01 6.075046e+01 6.251430e-01 1.226489e+00 4.717001e+02 
##      std.dev     coef.var 
## 2.171866e+01 3.575061e-01
district_clean <- district %>% 
  filter(!is.na(DPETECOP))

summary(district_clean$DPETECOP)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   47.95   61.90   60.75   77.15  100.00

DPETECOP is the percentage of students in each district who are economically disadvantaged. Across the 1,207 districts, values range from 0% to 100%, with a median of 61.9% and a mean of 60.8%; the middle half of districts fall between 47.95% and 77.15%. Because the median is slightly higher than the mean, the distribution is mildly left-skewed (negatively skewed): a tail of districts with low percentages pulls the mean below the median.
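The mean-versus-median comparison above is a rule of thumb; a moment-based skewness coefficient gives a numeric check of the same idea. The following is a minimal sketch in base R. The helper `sample_skewness()` and the vector `example_pct` are hypothetical names introduced for illustration, and the values are made up rather than taken from `district.xls`.

```r
# Moment-based sample skewness: mean cubed deviation divided by the
# cubed (population) standard deviation.
# Negative values indicate a longer left tail, positive values a longer right tail.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]                      # drop missing values
  deviations <- x - mean(x)
  mean(deviations^3) / mean(deviations^2)^1.5
}

# Hypothetical percentages (assumption, not real district data):
# mostly mid-to-high values plus a few very low ones.
example_pct <- c(5, 20, 45, 55, 60, 62, 65, 70, 75, 80, 85, 90)

mean(example_pct) < median(example_pct)  # TRUE: mean sits below the median
sample_skewness(example_pct)             # negative, i.e. left-skewed
```

With the real data, the same helper could be applied as `sample_skewness(district_clean$DPETECOP)` to confirm the direction of the skew suggested by the summary statistics.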

stats <- stat.desc(district_clean$DPETECOP)
print(stats)
##      nbr.val     nbr.null       nbr.na          min          max        range 
## 1.207000e+03 4.000000e+00 0.000000e+00 0.000000e+00 1.000000e+02 1.000000e+02 
##          sum       median         mean      SE.mean CI.mean.0.95          var 
## 7.332580e+04 6.190000e+01 6.075046e+01 6.251430e-01 1.226489e+00 4.717001e+02 
##      std.dev     coef.var 
## 2.171866e+01 3.575061e-01
ggplot(district_clean, aes(x = DPETECOP)) +
  geom_histogram(binwidth = 5, fill = "skyblue", color = "black") +
  theme_minimal() +
  labs(title = "Histogram of Economically Disadvantaged Students",
       x = "Percentage of Economically Disadvantaged Students",
       y = "Frequency")

district_transformed <- district_clean %>%
  mutate(sqrt_DPETECOP = sqrt(DPETECOP))
ggplot(district_transformed, aes(x = sqrt_DPETECOP)) +
  geom_histogram(binwidth = 0.5, fill = "lightgreen", color = "black") +
  theme_minimal() +
  labs(title = "Histogram of Sqrt Transformed Economically Disadvantaged Students",
       x = "Square Root of Percentage",
       y = "Frequency")
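
A square-root transform compresses large values more than small ones, so it is usually applied to right-skewed data; on left-skewed data it can make the skew stronger. One way to check whether a transform helped is to compare skewness before and after. The sketch below does this on made-up values, and also tries a reflected square root, which is one common option for left-skewed data and is not part of the analysis above. `sample_skewness()`, `example_pct`, and the `skew_*` names are hypothetical.

```r
# Moment-based sample skewness (negative = longer left tail).
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  deviations <- x - mean(x)
  mean(deviations^3) / mean(deviations^2)^1.5
}

# Hypothetical left-skewed percentages (assumption, not real district data).
example_pct <- c(5, 20, 45, 55, 60, 62, 65, 70, 75, 80, 85, 90)

skew_raw  <- sample_skewness(example_pct)
skew_sqrt <- sample_skewness(sqrt(example_pct))

# Reflect the values so the long tail points right, then take the square root.
# Adding 1 keeps every reflected value strictly positive.
reflected_sqrt <- sqrt(max(example_pct) + 1 - example_pct)
skew_reflected <- sample_skewness(reflected_sqrt)

# Compare the three coefficients side by side; values closer to 0 are more symmetric.
c(raw = skew_raw, sqrt = skew_sqrt, reflected_sqrt = skew_reflected)
```

Running the same comparison on `district_clean$DPETECOP` and `district_transformed$sqrt_DPETECOP` would show whether the square-root transform moved the real distribution closer to symmetry. Note that a reflected variable reverses direction: high values then correspond to low percentages.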