library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
districtbase <- read_xls("district.xls")
Homework #4
1. From the data you have chosen, select a variable that you are interested in
I will be examining the variable DA0GR21N.
districtbase2 <- districtbase %>%
  select(DISTNAME, DZCAMPUS, DPETSPEP, DA0AT21R, DA0GR21N) %>%
  na.omit()
pastecs::stat.desc(districtbase2$DA0GR21N)
## nbr.val nbr.null nbr.na min max range
## 1.081000e+03 0.000000e+00 0.000000e+00 1.000000e+00 1.158800e+04 1.158700e+04
## sum median mean SE.mean CI.mean.0.95 var
## 3.585130e+05 6.900000e+01 3.316494e+02 2.650349e+01 5.200417e+01 7.593325e+05
## std.dev coef.var
## 8.713968e+02 2.627464e+00
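From the districtbase dataset, I will examine graduation outcomes, stored in the variable DA0GR21N. In an administrative examination of a student population, graduation matters because it marks the end goal a student works toward in school: crossing the proverbial finish line. The values range from 1 to 11,588 with a median of 69, which suggests this variable records a count of graduates per district rather than a percentage rate. The mean (about 332) is far above the median, which points to a right-skewed distribution.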
ggplot(districtbase2, aes(x = DA0GR21N)) +
  geom_histogram(col = "red")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
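The `stat_bin()` message above asks for an explicit bin width. One common option is the Freedman-Diaconis rule, sketched below. This is only an illustration: the `grads` vector is simulated, hypothetical data standing in for DA0GR21N, and `fd_binwidth` is a name introduced here, not part of the original analysis.

```r
# Simulated right-skewed counts standing in for DA0GR21N (hypothetical data,
# NOT the real district file).
set.seed(42)
grads <- round(rlnorm(1000, meanlog = 4.3, sdlog = 1.5)) + 1

# Freedman-Diaconis rule: bin width = 2 * IQR / n^(1/3).
fd_binwidth <- 2 * IQR(grads) / length(grads)^(1 / 3)

# Build the histogram only if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(grads = grads), aes(x = grads)) +
    geom_histogram(binwidth = fd_binwidth, col = "red")
  # Call print(p) to draw the plot.
}
```

Passing `binwidth` (or `bins`) to `geom_histogram()` silences the message and makes the binning choice explicit. Whether this rule suits the real data would need to be checked against the actual histogram.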
I will now use the mutate function to take the square root of the DA0GR21N variable.
districtbase2 <- districtbase2 %>%
  mutate(DA0GR21Nsqrt = sqrt(DA0GR21N))
ggplot(districtbase2, aes(x = DA0GR21Nsqrt)) +
  geom_histogram(col = "red")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Even after the square-root transformation, this histogram still shows a pronounced skew to the right.
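To put a number on the skew rather than judging it by eye, one could compare a skewness coefficient before and after each transformation. The sketch below uses simulated, hypothetical data in place of DA0GR21N, and a small helper `skewness()` defined here with base R (it is not a function from the original analysis). The log transformation is included only as a possible alternative to the square root.

```r
# Simulated right-skewed counts standing in for DA0GR21N (hypothetical data).
set.seed(42)
grads <- round(rlnorm(1000, meanlog = 4.3, sdlog = 1.5)) + 1

# Moment-based sample skewness: the mean of the cubed z-scores,
# using the population (divide-by-n) standard deviation.
skewness <- function(x) {
  centered <- x - mean(x)
  z <- centered / sqrt(mean(centered^2))
  mean(z^3)
}

skew_raw  <- skewness(grads)
skew_sqrt <- skewness(sqrt(grads))
skew_log  <- skewness(log(grads))  # safe here because all values are >= 1

print(c(raw = skew_raw, sqrt = skew_sqrt, log = skew_log))
```

A value near zero indicates a roughly symmetric distribution, while larger positive values indicate a stronger right skew. If the square root leaves the real variable strongly skewed, a log transformation may be worth trying, assuming all values are positive (the minimum reported above is 1).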