── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(infer)
library(skimr)
BRIEF DESCRIPTION
For this assignment, after installing the DS Labs package (dslabs), I chose a data set called “murders”. The data set has 51 observations, one for each of the 50 states of the USA plus the District of Columbia. It contains five variables: the state name, the state abbreviation, the region, the total population of the state, and the number of murders in the state. After cleaning up the data set, I create a box plot of total murders per region to explore how murders vary across regions.
Loading package and data
# Load the DS Labs package (dslabs)
library(dslabs)
# Load the data set
data(murders)
# Look at the variable names
names(murders)
[1] "state" "abb" "region" "population" "total"
# Make all headers lowercase and remove spaces
names(murders) <- tolower(names(murders))
names(murders) <- gsub(" ", "", names(murders))
head(murders)
       state abb region population total
1    Alabama  AL  South    4779736   135
2     Alaska  AK   West     710231    19
3    Arizona  AZ   West    6392017   232
4   Arkansas  AR  South    2915918    93
5 California  CA   West   37253956  1257
6   Colorado  CO   West    5029196    65
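The header clean-up step leaves the names of murders unchanged, because they are already lowercase and contain no spaces. To see what the two calls do, here is a minimal sketch in base R. The vector `messy` is hypothetical and invented for illustration; it is not part of dslabs.

```r
# Hypothetical messy headers, invented for illustration (not from dslabs)
messy <- c("State Name", "ABB", "Total Murders ")

# Same two steps as in the clean-up: lowercase, then remove every space
clean <- gsub(" ", "", tolower(messy))
print(clean)
# "statename" "abb" "totalmurders"

# The actual column names of murders pass through unchanged
murders_names <- c("state", "abb", "region", "population", "total")
unchanged <- identical(gsub(" ", "", tolower(murders_names)), murders_names)
print(unchanged)
# TRUE
```

Note that `gsub(" ", "", x)` removes every space, including spaces between words, so "State Name" becomes "statename" rather than "state_name".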
summary(murders)
    state               abb                      region     population      
 Length:51          Length:51          Northeast    : 9   Min.   :  563626  
 Class :character   Class :character   South        :17   1st Qu.: 1696962  
 Mode  :character   Mode  :character   North Central:12   Median : 4339367  
                                       West         :13   Mean   : 6075769  
                                                          3rd Qu.: 6636084  
                                                          Max.   :37253956  
     total       
 Min.   :   2.0  
 1st Qu.:  24.5  
 Median :  97.0  
 Mean   : 184.4  
 3rd Qu.: 268.0  
 Max.   :1257.0  
Multivariate Box Plot
ggplot(murders, aes(x = region, y = total, fill = region)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Total Murders by Region",
    x = "Region",
    y = "Total Murders"
  ) +
  theme_minimal()
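A box plot draws the median of each group as the line inside the box, so it can help to look at those numbers directly. The sketch below is one possible way to do that in base R, not part of the original analysis. To stay self-contained it uses only the six rows shown in the head() output, stored in a hypothetical data frame `sample_rows`; on the full data set, `murders` would take its place and all four regions would appear.

```r
# First six rows of murders, copied from the head() output in this report
sample_rows <- data.frame(
  state  = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"),
  region = c("South", "West", "West", "South", "West", "West"),
  total  = c(135, 19, 232, 93, 1257, 65)
)

# Median of total per region: the centre line of each box
region_median <- tapply(sample_rows$total, sample_rows$region, median)
print(region_median)

# Largest total per region: the upper end of each region's range
region_max <- tapply(sample_rows$total, sample_rows$region, max)
print(region_max)
```

In this six-row subset the West median is 148.5 while its maximum is 1257 (California), which suggests how a single large state can stretch the upper end of a region's distribution.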