library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.0 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(pastecs)
##
## Attaching package: 'pastecs'
##
## The following objects are masked from 'package:dplyr':
##
## first, last
##
## The following object is masked from 'package:tidyr':
##
## extract
dog_data <- read_csv("Dog_Bite_Data_20260204.csv")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
## Rows: 76472 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): Bite Number, Bite Type, Incident Date, Victim Relationship, Bite L...
## dbl (2): Victim Age, Treatment Cost
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
pastecs::stat.desc(dog_data$`Victim Age`)
## nbr.val nbr.null nbr.na min max range
## 5.078200e+04 1.050000e+02 2.569000e+04 0.000000e+00 7.680000e+02 7.680000e+02
## sum median mean SE.mean CI.mean.0.95 var
## 1.602031e+06 2.800000e+01 3.154722e+01 9.897458e-02 1.939912e-01 4.974589e+02
## std.dev coef.var
## 2.230379e+01 7.069968e-01
The variable I have chosen to analyze is victim age, which records the ages of people who have been bitten by dogs. The summary reports the minimum (0), maximum (768), median (28), mean (31.55), range, standard deviation (22.30), and number of NA values (25,690). A maximum of 768 is not a plausible age, so the variable contains at least one data-entry error.
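To make the fields in the summary easier to read, here is a minimal sketch that recomputes a few of them by hand in base R. The `ages` vector and the `toy_summary` name are made up for illustration and are not taken from the dog bite data. The field names mirror the `stat.desc()` output above, where `nbr.null` counts zeros and `nbr.na` counts missing values.

```r
# Hypothetical toy ages, NOT taken from the dog bite data
ages <- c(0, 2, 28, 35, NA, 61, 768)

# Work on the non-missing values only
valid <- ages[!is.na(ages)]

toy_summary <- c(
  nbr.val  = length(valid),            # number of non-missing values
  nbr.null = sum(valid == 0),          # number of values equal to zero
  nbr.na   = sum(is.na(ages)),         # number of missing values
  min      = min(valid),
  max      = max(valid),
  range    = max(valid) - min(valid),
  median   = median(valid),
  mean     = mean(valid),
  std.dev  = sd(valid)
)

# Print rounded values without scientific notation
print(format(round(toy_summary, 2), scientific = FALSE))
```

Rounding and turning off scientific notation is only a display choice; it can make values such as the maximum easier to spot than in the `e+02` form.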
# Remove rows with a missing Victim Age
dog_data <- dog_data %>% drop_na(`Victim Age`)
# Histogram of the variable
hist(dog_data$`Victim Age`)
# Remove age outliers (keep only ages below 100) to see whether the histogram changes
dog_data <- dog_data %>% filter(`Victim Age` < 100)
hist(dog_data$`Victim Age`)
Once I dropped the age outliers (ages of 100 or more), the histogram looks different, but it still does not have a normal bell-curve shape.
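As a rough illustration of what the `< 100` cutoff does, the sketch below applies the same rule to a made-up vector in base R and counts how many values it removes. The `ages`, `keep`, `kept`, and `n_removed` names are hypothetical. Note that the cutoff is strict, so an age of exactly 100 is dropped as well.

```r
# Hypothetical toy ages, NOT taken from the dog bite data
ages <- c(0, 4, 28, 52, 99, 100, 768)

# Same rule as the filter above: keep ages strictly below 100
keep <- ages < 100
kept <- ages[keep]

# Count how many values the cutoff removes
n_removed <- sum(!keep)

cat("values removed:", n_removed, "\n")
cat("max before:", max(ages), " max after:", max(kept), "\n")
```

Checking the number of removed values before overwriting the data frame is one way to confirm that the cutoff only trims a handful of extreme entries.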
# Transform with a square root rather than a log because the data contain zero values (log(0) is -Inf)
dog_data <- dog_data %>% mutate(Transformed_age = sqrt(`Victim Age`))
# Histogram of the transformed variable
hist(dog_data$Transformed_age)
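The sketch below illustrates, on made-up numbers, why a square root is a workable choice when zeros are present and how one might check whether the transform reduced right skew. The `ages` vector and the `skewness()` helper are assumptions for illustration; the helper uses the simple moment-based formula, not a function from any package loaded above.

```r
# Hypothetical right-skewed toy ages that include a zero,
# NOT taken from the dog bite data
ages <- c(0, 1, 4, 9, 16, 25, 81)

sqrt_age <- sqrt(ages)  # defined at zero: 0 1 2 3 4 5 9
log_age  <- log(ages)   # log(0) is -Inf, which breaks a histogram

# Hypothetical helper: moment-based sample skewness
# (0 for symmetric data, positive for a long right tail)
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

cat("skewness before:", skewness(ages), "\n")
cat("skewness after: ", skewness(sqrt_age), "\n")
```

A smaller positive skewness after the transform suggests the distribution is closer to symmetric, although it does not by itself show that the data are normal.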