HOMEWORK

  1. From the data you have chosen, select a variable that you are interested in.
  2. Use pastecs::stat.desc to describe the variable. Include a few sentences about what the variable is and what it measures. Remember to load pastecs with library(pastecs).
  3. Remove NAs if needed using dplyr::filter (or anything similar).
  4. Provide a histogram of the variable (as shown in this lesson).
  5. Transform the variable using the log transformation or the square root transformation (whichever is more appropriate) using dplyr::mutate or something similar.
  6. Provide a histogram of the transformed variable.
  7. Submit via RPubs on Canvas.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(pastecs)
## 
## Attaching package: 'pastecs'
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, last
## 
## The following object is masked from 'package:tidyr':
## 
##     extract
library(readr)
CPI_Victims <- read_csv("CPI Victims.csv")
## Rows: 45891 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): Region Code, Region, Confirmed Victims, Sex, Race/Ethnicity, Age
## dbl (2): Fiscal Year, Victims
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
victims_data <- CPI_Victims
pastecs::stat.desc(victims_data$Victims)
##      nbr.val     nbr.null       nbr.na          min          max        range 
## 4.589100e+04 0.000000e+00 0.000000e+00 1.000000e+00 9.770000e+02 9.760000e+02 
##          sum       median         mean      SE.mean CI.mean.0.95          var 
## 2.626576e+06 1.200000e+01 5.723510e+01 5.185697e-01 1.016405e+00 1.234075e+04 
##      std.dev     coef.var 
## 1.110889e+02 1.940923e+00
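
The scientific notation above is hard to scan. A minimal sketch of one way to make it readable, using a small made-up vector (not the CPI data) so it runs on its own: round the result of stat.desc, then compare the mean with the median as a quick check for right skew. The name victims_example is hypothetical.

```r
library(pastecs)

# Toy values for illustration only; these are NOT taken from "CPI Victims.csv"
victims_example <- c(1, 3, 5, 12, 12, 40, 150, 977)

# For a plain numeric vector, stat.desc() returns a named numeric vector
desc <- pastecs::stat.desc(victims_example)

# Rounding avoids the scientific notation in the printed output
print(round(desc, 2))

# A mean well above the median suggests a long right tail
mean_above_median <- desc[["mean"]] > desc[["median"]]
print(mean_above_median)
```

In the real output above, the median is 12 and the mean is about 57.2, which is the same pattern and a reason to consider a log or square root transformation.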
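# This variable describes the number of victims, confirmed and unconfirmed, in Texas in 2025.
# Across 45,891 values (none missing) it ranges from 1 to 977, with a median of 12 and a mean of about 57.2,
# so the distribution is strongly right-skewed.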
# Victims itself has no NAs (nbr.na = 0 above); drop_na() removes rows with an NA in any column.
victims_data <- victims_data %>% drop_na()
hist(victims_data$Victims)
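
The assignment suggests dplyr::filter for removing NAs. As a rough sketch on a made-up data frame (the toy object and its values are hypothetical, not the CPI data), the difference between drop_na() with no arguments and a filter on a single column looks like this:

```r
library(dplyr)
library(tidyr)

# Toy data frame standing in for the real data; values are invented
toy <- data.frame(
  Victims = c(4, NA, 12, 30),
  Age     = c("1-3", "4-6", NA, "7-9")
)

# drop_na() with no arguments removes a row if ANY column is NA
toy_any <- toy %>% drop_na()

# filter() on one column only removes rows where that column is NA
toy_victims <- toy %>% filter(!is.na(Victims))

print(nrow(toy_any))
print(nrow(toy_victims))
```

Filtering on the variable of interest keeps rows that are only missing values in unrelated columns, which may or may not be what the analysis needs.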

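# Log transformation (Victims has a minimum of 1, so log() is defined for every row)
victims_data <- victims_data %>% mutate(victim_log = log(Victims))
hist(victims_data$victim_log)

# Square root transformation, stored in its own column so the log version is not overwritten
victims_data <- victims_data %>% mutate(victim_sqrt = sqrt(Victims))
hist(victims_data$victim_sqrt)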
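
Step 5 asks for whichever transformation is more appropriate. One way to decide, besides eyeballing the histograms, is to compare the skewness of each version and prefer the one closest to zero. The sketch below is an assumption-laden illustration: it uses simulated right-skewed counts rather than the CPI data, and skew() is a hypothetical helper implementing the simple moment-based skewness.

```r
# Moment-based sample skewness: third central moment over variance^(3/2)
skew <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

# Simulated right-skewed counts (all >= 1); NOT the real Victims column
set.seed(1)
counts <- ceiling(rlnorm(1000, meanlog = 2.5, sdlog = 1.5))

# Skewness of the raw values and of each candidate transformation
skews <- c(
  raw  = skew(counts),
  sqrt = skew(sqrt(counts)),
  log  = skew(log(counts))
)
print(round(skews, 2))

# The candidate whose skewness is closest to zero is the most symmetric
best <- names(which.min(abs(skews)))
print(best)
```

The same comparison could be run on victims_data$Victims; which transformation comes out ahead depends on the actual data.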