Hint: You can choose any data you like but can’t take one that is already taken by other groups.
library(tidyverse)
seattle_pets <- read_csv("seattle_pets.csv")
library(vcd)
Hint: Source and description of data, and definition of variables.
The source of the data is a set of articles on the most popular pet names in 2018; this one covers pets registered in Seattle. The variables are the date the animal was registered with the City of Seattle, the unique license number, the animal’s name, its species, its primary breed, its secondary breed (if it is a mix), and the ZIP code the animal is registered under.
Hint: Create at least two plots.
# keep only dogs and cats
Seattle_animals <- filter(seattle_pets,
                          species == "Dog" |
                            species == "Cat")
# bar chart of the number of registered dogs and cats
ggplot(Seattle_animals, aes(x = species)) +
  geom_bar()
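When `geom_bar()` has no y aesthetic, it counts the rows in each category (its default stat is "count"), so each bar's height is the number of registrations for that species. The sketch below reproduces that tally in base R with `table()`. The toy data frame `toy` is hypothetical and only stands in for `Seattle_animals`, so its counts are not the real ones.

```r
# Hypothetical toy data standing in for Seattle_animals (not the real counts)
toy <- data.frame(species = c("Dog", "Cat", "Dog", "Dog", "Cat"))

# geom_bar() draws one bar per level with height = number of rows in that level;
# table() returns the same per-level tally
species_counts <- table(toy$species)
print(species_counts)
# Cat 2, Dog 3
```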
# keep the 10 most common primary breeds; lump the rest into "Other"
Seattle_animals <-
  Seattle_animals %>%
  mutate(primary_breed = fct_lump(primary_breed, 10))
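`fct_lump(primary_breed, 10)` keeps the most frequent breeds and recodes everything else as "Other". The function below is a simplified, hypothetical base-R sketch of that idea (`lump_top_n` is not part of forcats). It keeps exactly the top `n` levels, does not reproduce `fct_lump`'s handling of ties, and leaves missing values as NA. The breed values are made up for illustration.

```r
# Simplified top-n lumping, similar in spirit to forcats::fct_lump(x, n).
# Hypothetical helper: ties are not treated the way fct_lump treats them.
lump_top_n <- function(x, n, other_level = "Other") {
  x <- as.character(x)
  # frequency of each level, most common first (NAs are not counted)
  counts <- sort(table(x), decreasing = TRUE)
  keep <- names(counts)[seq_len(min(n, length(counts)))]
  # keep top-n values, leave NA as NA, recode everything else
  lumped <- ifelse(is.na(x) | x %in% keep, x, other_level)
  factor(lumped, levels = c(keep, other_level))
}

# Hypothetical breed values, for illustration only
breeds <- c("Lab", "Lab", "Lab", "Poodle", "Poodle", "Pug", "Beagle")
lumped <- lump_top_n(breeds, 2)
print(table(lumped))
# Lab 3, Poodle 2, Other 2
```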
# stacked bar chart: primary-breed composition within each species
ggplot(Seattle_animals,
       aes(x = species,
           fill = primary_breed)) +
  geom_bar(position = "stack")
# keep the 5 most common ZIP codes; lump the rest into "Other"
Seattle_animal_zip <-
  Seattle_animals %>%
  mutate(zip_code = fct_lump(zip_code, 5))
# cross-tabulate species by (lumped) ZIP code
tbl <- xtabs(~ species + zip_code, Seattle_animal_zip)
ftable(tbl)
##         zip_code 98103 98115 98117 98122 98125 Other
## species
## Cat               1655  1626  1344   931   934 10677
## Dog               2845  3111  2524  1584  1939 22908
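Dogs outnumber cats roughly two to one in this table, so raw counts make it hard to compare the two species' ZIP-code distributions. Converting each row to proportions puts both species on the same scale. The sketch below rebuilds the table from the printed ftable() counts (so it runs without the CSV) and applies `prop.table()` by row; the object names `counts` and `row_shares` are illustrative.

```r
# Counts copied from the ftable() output (species x lumped ZIP code)
counts <- matrix(
  c(1655, 1626, 1344,  931,  934, 10677,
    2845, 3111, 2524, 1584, 1939, 22908),
  nrow = 2, byrow = TRUE,
  dimnames = list(species  = c("Cat", "Dog"),
                  zip_code = c("98103", "98115", "98117",
                               "98122", "98125", "Other"))
)

# Divide each row by its own total so every row sums to 1
row_shares <- prop.table(counts, margin = 1)
print(round(row_shares, 3))

# Row totals used as denominators
print(rowSums(counts))
# Cat 17167, Dog 34911
```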
mosaic(tbl,
       shade = TRUE,
       legend = TRUE,
       main = "Dogs and cats by ZIP code")
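With `shade = TRUE`, vcd's `mosaic()` colors each tile by its residual under a model in which species and ZIP code are independent; as we read the vcd documentation, these are Pearson residuals by default, and the default shading highlights tiles whose residuals exceed about 2 in absolute value. The sketch below computes those residuals by hand from the printed counts, using expected = row total * column total / N and (observed - expected) / sqrt(expected), then cross-checks them against `stats::chisq.test()`.

```r
# Counts copied from the ftable() output (species x lumped ZIP code)
counts <- matrix(
  c(1655, 1626, 1344,  931,  934, 10677,
    2845, 3111, 2524, 1584, 1939, 22908),
  nrow = 2, byrow = TRUE,
  dimnames = list(species  = c("Cat", "Dog"),
                  zip_code = c("98103", "98115", "98117",
                               "98122", "98125", "Other"))
)

# Expected counts if species and ZIP code were independent:
# expected[i, j] = row total i * column total j / grand total
n_total  <- sum(counts)
expected <- outer(rowSums(counts), colSums(counts)) / n_total

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson_resid <- (counts - expected) / sqrt(expected)
print(round(pearson_resid, 2))

# Cross-check: chisq.test() reports the same Pearson residuals
chi <- chisq.test(counts)
print(all.equal(as.vector(pearson_resid), as.vector(chi$residuals)))
print(chi)
```

Positive residuals mark species/ZIP combinations with more registrations than independence would predict; negative residuals mark fewer.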