Import data

Hint: You can choose any data you like but can’t take one that is already taken by other groups.

library(tidyverse)  # readr, dplyr, forcats and ggplot2
library(vcd)        # mosaic plots
seattle_pets <- read_csv("seattle_pets.csv")

Explain data

Hint: Source and description of data, and definition of variables.

The data come from articles on the most popular pet names in 2018; this dataset covers pets licensed in Seattle. The variables are:

- the date the animal was registered with Seattle
- the unique license number
- the animal’s name
- the animal’s species
- the animal’s primary breed
- the secondary breed, if the animal is mixed
- the zip code the animal is registered under
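Some of these variables are not always filled in. The secondary breed applies only to mixed animals, and the names chart later filters out missing `animals_name` values. A per-variable count of missing values therefore helps before plotting. The sketch below runs that count on a few hypothetical rows that reuse the column names from this page's code (`animals_name`, `species`, `primary_breed`, `zip_code`). The values are invented for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical rows mimicking four columns of seattle_pets (values are invented)
pets <- tibble(
  animals_name  = c("Luna", NA, "Lucy", "Charlie"),
  species       = c("Dog", "Cat", "Dog", "Goat"),
  primary_breed = c("Retriever, Labrador", "Domestic Shorthair", "Terrier", "Pygmy"),
  zip_code      = c("98103", "98115", NA, "98117")
)

# Count missing values in every column
missing_per_column <- pets %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
print(missing_per_column)
```

On the real file, replace `pets` with `seattle_pets`.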

Visualize data

Hint: Create at least two plots.

library(ggplot2)

# keep only dogs and cats
Seattle_animals <- filter(seattle_pets, species %in% c("Dog", "Cat"))

# bar chart of the number of dogs and cats
ggplot(Seattle_animals, aes(x = species)) + 
  geom_bar()
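`geom_bar()` draws one bar per species, and each bar's height is the number of rows for that species. A minimal sketch using hypothetical toy data shows that `dplyr::count()` returns those heights as a table. It also shows that `species %in% c("Dog", "Cat")` keeps the same rows as the chained `==` / `|` comparisons.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data standing in for seattle_pets
pets <- tibble(species = c("Dog", "Cat", "Dog", "Goat", "Pig", "Dog", "Cat"))

# Keep dogs and cats only
dogs_cats <- filter(pets, species %in% c("Dog", "Cat"))

# Bar heights that geom_bar() would draw, one row per species
species_counts <- count(dogs_cats, species)
print(species_counts)
```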

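# keep the 10 most common primary breeds (counted over dogs and cats together)
# and lump the remaining breeds into "Other"
Seattle_animals <- 
  Seattle_animals %>%
  mutate(primary_breed = fct_lump(primary_breed, 10))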
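`fct_lump(x, 10)` keeps the 10 most frequent levels and recodes every other value to `"Other"`. In current forcats, supplying `n` makes `fct_lump()` behave like `fct_lump_n()`. The toy sketch below uses hypothetical breed values and `n = 2` so the effect is easy to see. `"Other"` is placed as the last level.

```r
library(forcats)

# Hypothetical breed values: "Lab" is most common, then "Terrier"
breeds <- c("Lab", "Lab", "Lab", "Terrier", "Terrier", "Poodle", "Beagle")

# Keep the 2 most frequent levels; everything else becomes "Other"
lumped <- fct_lump_n(breeds, n = 2)
print(table(lumped))
```

Ties at the cut-off are kept. As a result, the real data can end up with slightly more than 10 named breeds.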

# stacked bar chart: breed composition within each species
ggplot(Seattle_animals, 
       aes(x = species, 
           fill = primary_breed)) + 
  geom_bar(position = "stack")
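Each segment of the stacked bars is a count of rows for one species and breed combination. The sketch below computes those segment heights from hypothetical lumped data. It also computes each breed's share within its species, which `geom_bar(position = "fill")` would draw instead of raw counts. Shares are easier to compare than raw counts, because dogs and cats differ so much in total number.

```r
library(dplyr)
library(tibble)

# Hypothetical rows after breed lumping (values are invented)
pets <- tibble(
  species       = c("Cat", "Cat", "Cat", "Dog", "Dog", "Dog", "Dog"),
  primary_breed = c("Domestic Shorthair", "Domestic Shorthair", "Other",
                    "Retriever, Labrador", "Other", "Other", "Other")
)

# Segment heights (n) and each breed's share within its species
breed_shares <- pets %>%
  count(species, primary_breed) %>%
  group_by(species) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()
print(breed_shares)
```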

# Which zip codes have the most registered pets?
# Top five zip codes for each species
library(tidytext)  # reorder_within() and scale_x_reordered()
Seattle_animals %>%
  count(species, zip_code, sort = TRUE) %>%
  group_by(species) %>%
  top_n(5, wt = n) %>% 
  ungroup() %>%
  mutate(zip_code = reorder_within(zip_code, n, species)) %>%
  ggplot(aes(zip_code, n, fill = species)) +
  geom_col() +
  facet_wrap(~species, scales = "free_y") +
  coord_flip() +
  scale_x_reordered() +
  theme(legend.position = "none") +
  labs(title = "Top Five Zip Codes by Category of Pets",
       x = "Zip Code",
       y = "Number of Pets")
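`top_n()` ranks rows by the count column `n`. Passing `wt = n` explicitly avoids dplyr's "Selecting by n" message. `top_n()` also keeps ties, so a facet can show more than five bars. In dplyr 1.0 and later, `top_n()` is superseded by `slice_max()`, which keeps ties by default as well. The sketch below uses hypothetical per-zip counts to show that both calls return the same number of rows.

```r
library(dplyr)
library(tibble)

# Hypothetical counts per species and zip code; two Cat zip codes tie at 5
zip_counts <- tribble(
  ~species, ~zip_code, ~n,
  "Cat",    "98103",    9,
  "Cat",    "98115",    5,
  "Cat",    "98117",    5,
  "Dog",    "98103",   12,
  "Dog",    "98115",   15,
  "Dog",    "98117",    7
)

# Modern form: the two largest n per species, ties kept (Cat returns 3 rows)
top_two <- zip_counts %>%
  group_by(species) %>%
  slice_max(n, n = 2) %>%
  ungroup()

# Superseded form used on this page, with the weight named explicitly
top_two_old <- zip_counts %>%
  group_by(species) %>%
  top_n(2, wt = n) %>%
  ungroup()

print(top_two)
```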


# What are the most popular pet names in Seattle?
# Top five names for each species
Seattle_animals %>%
  filter(!is.na(animals_name)) %>%
  count(species, animals_name, sort = TRUE) %>% 
  group_by(species) %>%
  top_n(5, wt = n) %>% 
  ungroup() %>%
  mutate(animals_name = reorder_within(animals_name, n, species)) %>%
  ggplot(aes(animals_name, n, fill = species)) +
  geom_col() +
  facet_wrap(~species, scales = "free_y") +
  coord_flip() +
  scale_x_reordered() +
  theme(legend.position = "none") +
  labs(title = "Top Five Most Popular Pet Names in Seattle by Category",
       x = NULL,
       y = "Number of Pets")
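The name chart relies on tidytext's `reorder_within()` and `scale_x_reordered()`. The same name (for example "Luna") can rank differently for cats and dogs. `reorder_within()` therefore pastes the facet variable onto each label with a `"___"` separator and orders the combined labels by count. `scale_x_reordered()` then strips the suffix from the axis labels. The sketch below uses hypothetical counts.

```r
library(tidytext)

# Hypothetical: the same name counted under two species
pet_name <- c("Luna", "Luna", "Lucy")
counts   <- c(3, 5, 4)
group    <- c("Cat", "Dog", "Dog")

# Labels become "name___species", ordered by ascending count
ordered_names <- reorder_within(pet_name, counts, group)
print(levels(ordered_names))
```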

Correlation and regression analysis

# keep the 5 zip codes with the most dogs and cats; lump the rest into "Other"
Seattle_animal_zip <- 
  Seattle_animals %>%
  mutate(zip_code = fct_lump(zip_code, 5))


tbl <- xtabs(~species + zip_code, Seattle_animal_zip)
ftable(tbl)
##         zip_code 98103 98115 98117 98122 98125 Other
## species                                             
## Cat               1655  1626  1344   931   934 10677
## Dog               2845  3111  2524  1584  1939 22908

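# mosaic plot; shade = TRUE colours tiles by Pearson residuals
# from the model of independence between species and zip code
mosaic(tbl,
       shade = TRUE,
       legend = TRUE, 
       main = "Pets by Zip Code in Seattle", 
       labeling = labeling_border(rot_labels = c(45, 0, 0, 90),
                                  offset_varnames = c(1, 0, 0, 0)))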
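The mosaic plot shades tiles by Pearson residuals but does not print a formal test. Base R's `chisq.test()` can test the same independence model using the counts printed by `ftable()` above, copied by hand into a matrix. This is a sketch, and the specific choice of a chi-squared test is an addition rather than part of the original analysis. With counts this large, even small departures from independence will be highly significant. The residuals show which zip codes lean toward dogs or cats.

```r
# Counts copied from the ftable() output above (species by lumped zip code)
tbl_counts <- matrix(
  c(1655, 1626, 1344,  931,  934, 10677,
    2845, 3111, 2524, 1584, 1939, 22908),
  nrow = 2, byrow = TRUE,
  dimnames = list(species  = c("Cat", "Dog"),
                  zip_code = c("98103", "98115", "98117", "98122", "98125", "Other"))
)

# Pearson chi-squared test of independence between species and zip code
indep_test <- chisq.test(tbl_counts)
print(indep_test)

# Pearson residuals: (observed - expected) / sqrt(expected)
print(round(indep_test$residuals, 2))
```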

Share interesting stories you found from the data

There are at least 10,000 more dogs than cats in the Seattle area. The species-by-zip-code table above counts 34,911 dogs and 17,167 cats. For cats, Domestic Shorthair outnumbers the other breeds. For dogs, more animals fall into the lumped "Other" category than into the breeds in the top 10.
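The dog-versus-cat claim can be checked against the `ftable()` output. `xtabs()` drops rows with a missing zip code, so these totals cover the pets that appear in that table. The short sketch below sums each species' row and compares them.

```r
# Species-by-zip counts copied from the ftable() output above
cats <- c(1655, 1626, 1344,  931,  934, 10677)
dogs <- c(2845, 3111, 2524, 1584, 1939, 22908)

total_cats <- sum(cats)             # 17167
total_dogs <- sum(dogs)             # 34911
difference <- total_dogs - total_cats  # 17744
ratio      <- total_dogs / total_cats  # about 2.03

print(c(cats = total_cats, dogs = total_dogs, difference = difference))
print(round(ratio, 2))
```

The gap of 17,744 comfortably exceeds 10,000, and dogs outnumber cats roughly two to one.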

Hide the messages, but display the code and its results on the webpage.

List names of all group members (both first and last name) at the top of the webpage.

Use the correct slug.