library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(pastecs)
##
## Attaching package: 'pastecs'
##
## The following objects are masked from 'package:dplyr':
##
## first, last
##
## The following object is masked from 'package:tidyr':
##
## extract
library(readxl) # only needed for .xlsx files; read_csv() below comes from readr (attached with tidyverse)
Public_School_Characteristics_2022_23 <- read_csv("Public_School_Characteristics_2022-23.csv")
## Rows: 101390 Columns: 77
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (23): NCESSCH, SURVYEAR, STABR, LEAID, ST_LEAID, LEA_NAME, SCH_NAME, LST...
## dbl (54): X, Y, OBJECTID, STATUS, TOTFRL, FRELCH, REDLCH, DIRECTCERT, PK, KG...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
pastecs::stat.desc(Public_School_Characteristics_2022_23$STUTERATIO)
## nbr.val nbr.null nbr.na min max
## 9.957600e+04 9.800000e+02 1.814000e+03 -2.000000e+00 3.600000e+03
## range sum median mean SE.mean
## 3.602000e+03 1.493953e+06 1.440000e+01 1.500315e+01 6.986391e-02
## CI.mean.0.95 var std.dev coef.var
## 1.369324e-01 4.860270e+02 2.204602e+01 1.469426e+00
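The variable summarized above, STUTERATIO, is the student-to-teacher ratio at each surveyed school. The mean ratio is about 15 students per teacher (median 14.4). The minimum of -2 is most likely a code for missing or invalid data, and the maximum of 3,600 is implausible enough that it is probably invalid as well. The column also contains 980 zero values and 1,814 missing (NA) values, and the standard deviation of about 22, large relative to the mean, is likely inflated by these extreme values. These need to be removed before the distribution can be examined.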
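To see what the counts in the `stat.desc()` output correspond to, here is a minimal sketch on a small made-up vector (`ratio` is hypothetical, not survey data). `nbr.val` counts non-missing values, `nbr.null` counts zeros, and `nbr.na` counts missing values; the negative count is an extra check for codes such as -2.

```r
# Hypothetical toy vector standing in for STUTERATIO (not survey data)
ratio <- c(14.4, 15.2, 0, -2, NA, 3600, 12.8, NA, 0)

n_valid    <- sum(!is.na(ratio))              # corresponds to nbr.val
n_zero     <- sum(ratio == 0, na.rm = TRUE)   # corresponds to nbr.null
n_na       <- sum(is.na(ratio))               # corresponds to nbr.na
n_negative <- sum(ratio < 0, na.rm = TRUE)    # sentinel-style codes such as -2

c(valid = n_valid, zero = n_zero, na = n_na, negative = n_negative)
```

Applied to `Public_School_Characteristics_2022_23$STUTERATIO`, the first three expressions should reproduce the 99,576 / 980 / 1,814 figures reported above.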
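# Keep only positive ratios. filter() already drops rows where the condition is NA,
# so drop_na() is a redundant safeguard. The implausible maximum (3,600) is still kept.
new_Public_School_Characteristics_2022_23 <- Public_School_Characteristics_2022_23 |>
  filter(STUTERATIO > 0) |>
  drop_na(STUTERATIO)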
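The filtering step can be checked on a toy tibble (the values are hypothetical). `filter()` keeps only rows where the condition is `TRUE`, and `NA > 0` evaluates to `NA`, so missing ratios are already dropped by `filter()` alone. The sketch also shows that a positivity filter does not remove a large value like 3,600.

```r
library(dplyr)
library(tibble)
library(tidyr)

# Hypothetical toy data: a valid ratio, a -2 code, a missing value, a zero, an outlier
toy <- tibble(STUTERATIO = c(14.4, -2, NA, 0, 3600))

filtered_only        <- toy |> filter(STUTERATIO > 0)
filtered_and_dropped <- toy |> filter(STUTERATIO > 0) |> drop_na(STUTERATIO)

filtered_only$STUTERATIO   # 14.4 and 3600 remain
nrow(filtered_only) == nrow(filtered_and_dropped)   # drop_na() removes nothing further
```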
hist(new_Public_School_Characteristics_2022_23$STUTERATIO)
# Natural-log transform to reduce right skew; valid because every remaining ratio is > 0
transformed_new_Public_School_Characteristics_2022_23 <- new_Public_School_Characteristics_2022_23 |>
  mutate(STUTERATIO_log = log(STUTERATIO))
hist(transformed_new_Public_School_Characteristics_2022_23$STUTERATIO_log)
hist(log(new_Public_School_Characteristics_2022_23$STUTERATIO))
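Why the ratios are filtered to positive values before taking logs, and what the transform does to the shape, can be illustrated with a small sketch. The values and the `sample_skewness()` helper are hypothetical (a simple moment-based skewness, not a pastecs function). On right-skewed data the log scale is less skewed, while `log()` returns `NaN` for negative inputs and `-Inf` for zero.

```r
# Hypothetical right-skewed student-teacher ratios (not survey data)
ratio <- c(8, 10, 12, 14, 15, 16, 18, 25, 40, 80)

# Hypothetical helper: moment-based sample skewness (positive = long right tail)
sample_skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / sd(x)^3
}

skew_raw <- sample_skewness(ratio)
skew_log <- sample_skewness(log(ratio))
c(raw = skew_raw, log = skew_log)   # the log scale is noticeably less skewed

# Why the > 0 filter matters: log() is only finite for positive inputs
edge <- suppressWarnings(log(c(-2, 0, 14.4)))
is.finite(edge)   # FALSE FALSE TRUE
```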