1. From the data you have chosen, select a variable that you are interested in.
2. Use pastecs::stat.desc to describe the variable. Include a few sentences about what the variable is and what it measures. Remember to load pastecs with "library(pastecs)".
3. Remove NAs if needed using dplyr::filter (or something similar).
4. Provide a histogram of the variable (as shown in this lesson).
5. Transform the variable using a log or square root transformation (whichever is more appropriate) using dplyr::mutate or something similar.
6. Provide a histogram of the transformed variable.
7. Submit via RPubs on Canvas.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(pastecs)
## 
## Attaching package: 'pastecs'
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, last
## 
## The following object is masked from 'package:tidyr':
## 
##     extract
dog_data <- read_csv("Dog_Bite_Data_20260204.csv")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 76472 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): Bite Number, Bite Type, Incident Date, Victim Relationship, Bite L...
## dbl  (2): Victim Age, Treatment Cost
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
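The parsing warning above suggests calling problems() to see which cells readr could not parse. The sketch below shows the idea on a tiny hypothetical CSV written to a temporary file; the file contents and the declared column types are assumptions for illustration, not the real dog-bite data. On the real data the call would simply be problems(dog_data).

```r
library(readr)

# Hypothetical miniature CSV standing in for the dog-bite file:
# two cells cannot be read as numbers
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Victim Age,Treatment Cost",
  "25,100",
  "unknown,200",
  "40,abc"
), tmp)

# Declare both columns as numeric so the bad cells are flagged
toy <- read_csv(tmp, col_types = cols(
  `Victim Age` = col_double(),
  `Treatment Cost` = col_double()
))

# One row per cell that failed to parse (row, col, expected, actual, file)
parse_issues <- problems(toy)
print(parse_issues)

# Cells that failed to parse are stored as NA
print(toy)
```

Unparseable cells become NA, so some of the missing victim ages counted by stat.desc may come from cells like these rather than from blank entries.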
pastecs::stat.desc(dog_data$`Victim Age`)
##      nbr.val     nbr.null       nbr.na          min          max        range 
## 5.078200e+04 1.050000e+02 2.569000e+04 0.000000e+00 7.680000e+02 7.680000e+02 
##          sum       median         mean      SE.mean CI.mean.0.95          var 
## 1.602031e+06 2.800000e+01 3.154722e+01 9.897458e-02 1.939912e-01 4.974589e+02 
##      std.dev     coef.var 
## 2.230379e+01 7.069968e-01

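The variable I have chosen to analyze is victim age (the "Victim Age" column), which records the age of each person bitten by a dog. The stat.desc summary shows 50,782 recorded ages, 25,690 missing values (NA), and 105 ages recorded as zero. Ages range from 0 to 768, with a median of 28, a mean of about 31.5, and a standard deviation of about 22.3. A maximum of 768 is not a plausible human age and is almost certainly a data-entry error.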
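To make the stat.desc output easier to read, the sketch below recomputes its main quantities with base R on a small hypothetical vector. The helper name describe_like_statdesc is made up for illustration. It assumes stat.desc's documented definitions (nbr.null counts zeros, SE.mean is the standard deviation divided by the square root of the number of non-missing values, and the confidence half-width uses a t quantile), so treat it as a reading aid rather than a replacement. Wrapping the real call in round(), for example round(stat.desc(dog_data$`Victim Age`), 2), also avoids the scientific notation shown above.

```r
# Hypothetical helper that mirrors the main statistics reported by
# pastecs::stat.desc for a single numeric vector
describe_like_statdesc <- function(x, p = 0.95) {
  n_na <- sum(is.na(x))
  x <- x[!is.na(x)]                     # statistics use non-missing values only
  n <- length(x)
  se <- sd(x) / sqrt(n)                 # standard error of the mean
  c(
    nbr.val  = n,
    nbr.null = sum(x == 0),             # number of zero values
    nbr.na   = n_na,
    min      = min(x),
    max      = max(x),
    range    = max(x) - min(x),
    sum      = sum(x),
    median   = median(x),
    mean     = mean(x),
    SE.mean  = se,
    CI.mean  = qt((1 + p) / 2, df = n - 1) * se,  # CI half-width (CI.mean.0.95 in pastecs)
    var      = var(x),
    std.dev  = sd(x),
    coef.var = sd(x) / mean(x)          # standard deviation relative to the mean
  )
}

# Hypothetical ages, including one zero and one missing value
ages <- c(0, 10, 20, NA, 30)
res <- describe_like_statdesc(ages)
round(res, 3)
```

As a check against the real summary, the standard deviation 22.30379 divided by sqrt(50782) is about 0.0990, matching SE.mean, and 22.30379 / 31.54722 is about 0.707, matching coef.var.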

# Remove NAs

dog_data <- dog_data %>% drop_na(`Victim Age`)
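The assignment suggests dplyr::filter for this step, and tidyr's drop_na() is one of the "similar" options. A minimal sketch on a hypothetical toy tibble, showing that the two approaches keep the same ages:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for dog_data
toy <- tibble(`Victim Age` = c(25, NA, 0, 768, NA, 41))

# filter() keeps only rows where the condition is TRUE
kept_filter <- toy %>% filter(!is.na(`Victim Age`))

# drop_na() drops rows with a missing value in the named column
kept_drop_na <- toy %>% drop_na(`Victim Age`)

nrow(kept_filter)
identical(kept_filter$`Victim Age`, kept_drop_na$`Victim Age`)
```

Because filter() also drops rows where the condition evaluates to NA, a later filter such as `Victim Age` < 100 would remove missing ages on its own as well.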
# Histogram of the variable

hist(dog_data$`Victim Age`, main = "Histogram of victim age", xlab = "Victim age")
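A single extreme value such as the 768 reported by stat.desc stretches the x-axis, so almost every observation falls into the first bar or two. The sketch below shows this numerically with hist(plot = FALSE) on a handful of hypothetical ages; the explicit breaks are an assumption chosen to make the counts easy to read.

```r
# Hypothetical ages with one data-entry error (768)
ages <- c(5, 12, 28, 30, 33, 45, 60, 768)

# Compute the histogram bins without drawing them
h <- hist(ages, breaks = seq(0, 800, by = 100), plot = FALSE)

# Seven of the eight values land in the first bin (0 to 100);
# every other bin is empty except the last one (700 to 800)
data.frame(
  from  = head(h$breaks, -1),
  to    = tail(h$breaks, -1),
  count = h$counts
)
```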

# Remove implausible ages (100 and over) to see how the histogram changes
dog_data <- dog_data %>% filter(`Victim Age` < 100)
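Before dropping rows, it can help to check how many of them the cutoff removes. A minimal sketch with dplyr::count() on hypothetical toy ages; the cutoff of 100 matches the filter above, but the data are made up:

```r
library(dplyr)

# Hypothetical toy ages; 102 and 768 are at or above the cutoff
toy <- tibble(`Victim Age` = c(3, 28, 45, 102, 768, 61))

# Tally rows below versus at or above the cutoff
cutoff_check <- toy %>% count(over_cutoff = `Victim Age` >= 100)
cutoff_check
```

On the real data, the same count() call on dog_data would report how many of the 50,782 non-missing ages the filter removes.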

hist(dog_data$`Victim Age`, main = "Histogram of victim age (under 100)", xlab = "Victim age")

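Once I dropped the implausible ages, the histogram was no longer stretched out by extreme values, but it still does not have a normal, bell-shaped curve. The ages appear right-skewed, which is consistent with the mean (about 31.5) sitting above the median (28) in the summary.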

# Transform with a square root rather than a log, because the variable contains zeros and log(0) is undefined (R returns -Inf)

dog_data <- dog_data %>% mutate(Transformed_age = sqrt(`Victim Age`))
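The reason for preferring the square root here is that the log of zero is not finite. A quick check in base R on hypothetical values; log1p() (log(1 + x)) is shown only as a commonly used alternative, not something the assignment requires:

```r
ages <- c(0, 1, 9, 100)

log(ages)    # first value is -Inf, which would distort summaries such as mean()
sqrt(ages)   # 0, 1, 3, 10: zeros stay finite
log1p(ages)  # log(1 + x): also finite at zero
```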

# Histogram of the transformed variable

hist(dog_data$Transformed_age, main = "Histogram of square-root victim age", xlab = "Square root of victim age")
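To check whether the square root actually made the distribution more symmetric, the histograms can be backed up with a skewness estimate: 0 for a symmetric distribution, positive for a long right tail. The helper below is a hypothetical sketch using the simple moment-based formula m3 / m2^(3/2), and the ages are made up for illustration. pastecs can also report skewness through stat.desc(x, norm = TRUE), though its exact estimator may differ from this one.

```r
# Hypothetical helper: moment-based sample skewness, m3 / m2^(3/2)
skewness_g1 <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Hypothetical right-skewed ages
ages <- c(1, 4, 9, 16, 100)

skewness_g1(ages)        # about 1.43
skewness_g1(sqrt(ages))  # about 1.14: less right-skewed after the transform
```

On the real data, the same comparison would be skewness_g1(dog_data$`Victim Age`) versus skewness_g1(dog_data$Transformed_age).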