1. From the data you have chosen, select a variable that you are interested in.
  2. Use pastecs::stat.desc to describe the variable. Include a few sentences about what the variable is and what it is measuring. Remember to load pastecs with library(pastecs).
  3. Remove NAs if needed using dplyr::filter (or anything similar).
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(pastecs)
## 
## Attaching package: 'pastecs'
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, last
## 
## The following object is masked from 'package:tidyr':
## 
##     extract
library(readxl)
district <- read_excel("district.xls")
# drop_na() with no column arguments removes every row that has an NA in any column
clean_district <- district |> drop_na()
stat.desc(clean_district$DPETGIFP, norm = TRUE)
##      nbr.val     nbr.null       nbr.na          min          max        range 
## 3.230000e+02 1.200000e+01 0.000000e+00 0.000000e+00 2.460000e+01 2.460000e+01 
##          sum       median         mean      SE.mean CI.mean.0.95          var 
## 2.234200e+03 6.500000e+00 6.917028e+00 1.962334e-01 3.860615e-01 1.243794e+01 
##      std.dev     coef.var     skewness     skew.2SE     kurtosis     kurt.2SE 
## 3.526746e+00 5.098644e-01 1.240916e+00 4.573423e+00 3.682716e+00 6.806890e+00 
##   normtest.W   normtest.p 
## 9.175607e-01 2.503586e-12

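DPETGIFP = STUDENTS: % GIFTED & TALENTED EDUCATION. It measures the percentage of students in a school district who participate in the Gifted and Talented program. Across the 323 districts left after removing NAs, the values range from 0 to 24.6 percent, with a mean of about 6.92 and a median of 6.5. The distribution is right-skewed (skewness of about 1.24), and the Shapiro-Wilk test (W ≈ 0.918, p ≈ 2.5e-12) rejects normality.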
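
The `drop_na()` call used earlier has no column arguments, so it drops a row whenever any column is missing, which can discard more districts than this one variable requires. The sketch below contrasts that with restricting the check to a single column. The toy data frame and its `other` column are made up for illustration and are not taken from district.xls.

```r
library(tidyr)

# Hypothetical toy data: `other` stands in for any unrelated column with gaps
toy <- data.frame(
  DPETGIFP = c(5.1, NA, 7.3, 0.0),
  other    = c(NA, 1, 2, 3)
)

all_cols <- drop_na(toy)            # drops rows with an NA in any column
one_col  <- drop_na(toy, DPETGIFP)  # drops rows only where DPETGIFP is NA

nrow(all_cols)
nrow(one_col)
```

Whether the stricter or the narrower filter is appropriate depends on whether the other columns are needed later in the analysis.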

  4. Provide a histogram of the variable (as shown in this lesson).
hist(clean_district$DPETGIFP, breaks = 40, probability = TRUE)
lines(density(clean_district$DPETGIFP), col = 'red', lwd = 2)

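  5. Transform the variable using the log transformation or square root transformation (whichever is more appropriate) using dplyr::mutate or something similar.
# DPETGIFP contains zeros (nbr.null = 12 above) and log(0) is -Inf, so log1p(), i.e. log(1 + x), is used instead
district_transformed <- clean_district |> mutate(DPETGIFP_log = log1p(DPETGIFP))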
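
Because stat.desc reports 12 zero values (nbr.null) for this variable, it may help to see how the candidate transformations treat a zero. The following base R sketch uses a small illustrative vector; 0, 6.5 and 24.6 echo the minimum, median and maximum reported above, while 2.5 is an arbitrary filler value.

```r
# Illustrative values only, not the real column
x <- c(0, 2.5, 6.5, 24.6)

log_x   <- log(x)    # log(0) is -Inf, so the first element is not finite
log1p_x <- log1p(x)  # log(1 + x), defined at zero
sqrt_x  <- sqrt(x)   # square root, also defined at zero

is.finite(log_x)
is.finite(log1p_x)
is.finite(sqrt_x)
```

Either `log1p()` or `sqrt()` keeps the zero-valued districts in the data; a plain `log()` would turn them into `-Inf`.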
  6. Provide a histogram of the transformed variable.
# Plot the transformed column (DPETGIFP_log), not the original DPETGIFP
hist(district_transformed$DPETGIFP_log, breaks = 40, probability = TRUE)
lines(density(district_transformed$DPETGIFP_log), col = 'red', lwd = 2)
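
One way to check whether a transformation reduced the skew, beyond looking at the histogram, is to compare the skewness entry of stat.desc before and after. The sketch below does this on simulated right-skewed data (an assumption standing in for the real column) and uses a square root transformation purely as an example.

```r
library(pastecs)

set.seed(1)
# Simulated right-skewed data standing in for a real column (an assumption)
x <- rgamma(300, shape = 2, rate = 0.3)

# stat.desc() on a vector returns a named vector, so entries can be picked by name
skew_before <- stat.desc(x, norm = TRUE)["skewness"]
skew_after  <- stat.desc(sqrt(x), norm = TRUE)["skewness"]

skew_before
skew_after
```

A skewness value closer to zero after the transformation suggests a more symmetric distribution.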