HOMEWORK
- From the data you have chosen, select a variable that you are
interested in.
- Use pastecs::stat.desc to describe the variable. Include a few
sentences about what the variable is and what it is measuring. Remember
to load pastecs with library(pastecs).
- Remove NAs if needed using dplyr::filter (or anything similar).
- Provide a histogram of the variable (as shown in this lesson).
- Transform the variable using the log transformation or the square root
transformation (whichever is more appropriate) using dplyr::mutate or
something similar.
- Provide a histogram of the transformed variable.
- Submit via RPubs on Canvas.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.0 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.1 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(pastecs)
##
## Attaching package: 'pastecs'
##
## The following objects are masked from 'package:dplyr':
##
## first, last
##
## The following object is masked from 'package:tidyr':
##
## extract
library(readr)
CPI_Victims <- read_csv("CPI Victims.csv")
## Rows: 45891 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): Region Code, Region, Confirmed Victims, Sex, Race/Ethnicity, Age
## dbl (2): Fiscal Year, Victims
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
victims_data <- CPI_Victims
pastecs::stat.desc(victims_data$Victims)
## nbr.val nbr.null nbr.na min max range
## 4.589100e+04 0.000000e+00 0.000000e+00 1.000000e+00 9.770000e+02 9.760000e+02
## sum median mean SE.mean CI.mean.0.95 var
## 2.626576e+06 1.200000e+01 5.723510e+01 5.185697e-01 1.016405e+00 1.234075e+04
## std.dev coef.var
## 1.110889e+02 1.940923e+00
# This variable describes the number of victims, confirmed and unconfirmed, in Texas in 2025.
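In the stat.desc output above, the mean (about 57.2) sits far above the median (12), which usually signals a long right tail. One rough way to put a number on that is a moment-based skewness. stat.desc(norm = TRUE) also reports skewness, but it runs shapiro.test, which only accepts between 3 and 5000 values, so it would likely fail on 45,891 rows. The sketch below is illustrative only: the `victims` vector is simulated as a stand-in for victims_data$Victims, and the `skewness` helper is a hypothetical name, not part of pastecs or the original analysis.

```r
# Simulated right-skewed counts (assumption: stand-in for victims_data$Victims).
set.seed(1)
victims <- ceiling(rlnorm(1000, meanlog = 2.5, sdlog = 1.5))

# Hypothetical helper: moment-based sample skewness.
# Positive values indicate a longer right tail.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Compare centre measures and skewness side by side.
c(mean = mean(victims), median = median(victims), skewness = skewness(victims))
```

With the real data, the same call would be skewness(victims_data$Victims).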
# Drop rows with an NA in any column (Victims itself has none: nbr.na is 0 above).
victims_data <- victims_data %>% drop_na()
hist(victims_data$Victims)

# Log transformation; safe here because the minimum of Victims is 1, so log() never sees 0.
victims_data <- victims_data %>% mutate(victim_log = log(Victims))
hist(victims_data$victim_log)

# Square root transformation, stored in its own column so it does not overwrite the log version.
victims_data <- victims_data %>% mutate(victim_sqrt = sqrt(Victims))
hist(victims_data$victim_sqrt)
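To decide which transformation is "more appropriate", one option is to compare the skewness of the raw, log, and square root versions and prefer the one closest to zero. This is a minimal sketch under stated assumptions: the `victims` vector is simulated rather than read from "CPI Victims.csv", and `skewness` is a hypothetical helper defined here, not a pastecs or dplyr function.

```r
# Simulated right-skewed counts (assumption: stand-in for victims_data$Victims).
set.seed(1)
victims <- ceiling(rlnorm(1000, meanlog = 2.5, sdlog = 1.5))

# Hypothetical helper: moment-based sample skewness.
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Skewness of each candidate; the value closest to 0 is the most symmetric.
skews <- c(
  raw  = skewness(victims),
  log  = skewness(log(victims)),   # requires all values > 0
  sqrt = skewness(sqrt(victims))
)
skews

# Name of the candidate whose skewness is nearest zero.
best <- names(which.min(abs(skews)))
best
```

For strongly right-skewed counts like these, the log usually pulls the tail in more than the square root does. If a variable contained zeros, log1p() would be a common substitute for log().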
