Name analysis

library(tidyverse)

library(wordcloud2)
library(babynames)

Analysis of the name Julia.

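babynames %>%
  filter(year == 2003, sex == "F") %>%          # girls born in 2003
  arrange(desc(n)) %>%                          # most common name first
  mutate(rank = row_number(),                   # 1 = most popular name
         percent = round(prop * 100, 1)) %>%    # share of girls, in percent
  filter(name == "Julia")                       # keep only Julia's row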

Julia was the 33rd most popular girl’s name in 2003, the year she was born.
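The rank comes from `row_number()`, which simply numbers rows in their current order. It reflects popularity only when the rows are sorted by `n`, largest first, and it breaks ties arbitrarily. A minimal sketch on a made-up tibble (the names and counts are hypothetical) contrasts it with `min_rank()`, which gives tied names the same rank:

```r
library(dplyr)
library(tibble)

# Hypothetical counts: "Ava" and "Mia" are tied
toy <- tibble(name = c("Emma", "Ava", "Mia", "Zoe"),
              n    = c(120, 90, 90, 40))

ranked <- toy %>%
  arrange(desc(n)) %>%                     # most common name first
  mutate(row_rank = row_number(),          # 1, 2, 3, 4: ties broken by row order
         tie_rank = min_rank(desc(n)))     # 1, 2, 2, 4: tied names share a rank
ranked
```

Which one to use depends on whether two names with identical counts should count as the same place in the ranking.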

Word cloud of girls’ names, 2003:

babynames %>%
  filter(year == 2003) %>%     # use only one year
  filter(sex == "F") %>%       # use only one sex
  select(name, n) %>%          # wordcloud2 reads the words first, then their frequencies
  top_n(100, n) %>%            # keep the 100 most common names so the cloud stays readable
  wordcloud2(size = .5)        # draw the cloud at half the default font size (1)
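One detail of `top_n()`: it keeps every row that ties for a place in the top `n`, so `top_n(100, n)` can return slightly more than 100 names. A small check on made-up counts (the names and numbers are hypothetical):

```r
library(dplyr)
library(tibble)

# Hypothetical counts with a tie for second place
toy <- tibble(name = c("Emma", "Ava", "Mia", "Zoe"),
              n    = c(120, 90, 90, 40))

top2 <- toy %>% top_n(2, n)   # asks for the top 2 rows...
nrow(top2)                    # ...but keeps 3, because Ava and Mia tie
```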

Graph of the popularity of the name Julia over time.

babynames %>%                                  # start with the data
  filter(name == "Julia", sex == "F") %>%      # choose the name and sex
  mutate(percent = round(prop * 100, 1)) %>%   # convert the proportion to a percent
  ggplot(aes(x = year, y = percent)) +         # year on the x-axis, percent on the y-axis
  geom_line(color = "red")                     # draw it as a red line graph
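The babynames documentation describes `prop` as `n` divided by the total number of applicants of that sex in that year, so `percent` is just `prop` rescaled to a 0 to 100 range. To see how such a within-year proportion is built from counts, here is a minimal sketch on hypothetical data. It only approximates what babynames does: it divides by the births listed in the table rather than by the total applicant count.

```r
library(dplyr)
library(tibble)

# Hypothetical counts of girls' names in two years
toy <- tibble(year = c(2002, 2002, 2003, 2003),
              name = c("Julia", "Emma", "Julia", "Emma"),
              n    = c(25, 75, 40, 60))

shares <- toy %>%
  group_by(year) %>%                           # proportions are computed within each year
  mutate(share   = n / sum(n),                 # fraction of that year's listed births
         percent = round(share * 100, 1)) %>%  # same rescaling used for the plot
  ungroup()
shares
```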

In which years was the name Julia most popular, as a proportion of all baby girls’ names?

babynames %>%                                # start with the dataset
  filter(name == "Julia", sex == "F") %>%    # only look at the name and sex you want
  top_n(10, prop) %>%                        # keep the 10 years with the highest prop
  arrange(desc(prop))                        # sort from most to least popular
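If you only want the single peak year rather than the top 10, filtering on the maximum proportion is enough. A sketch on hypothetical proportions (the `julia` tibble and its values are made up for illustration):

```r
library(dplyr)
library(tibble)

# Hypothetical yearly proportions for one name
julia <- tibble(year = 2000:2004,
                prop = c(0.0040, 0.0042, 0.0045, 0.0043, 0.0041))

peak <- julia %>%
  filter(prop == max(prop))   # keeps every year tied for the maximum
peak$year
```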

Comparison of the popularity of names in my family.

babynames %>%
  filter(name %in% c("Emma", "Michele", "Julia"),   # the family's names
         sex == "F") %>%                            # girls only
  ggplot(aes(x = year, y = n, color = name)) +      # one colored line per name
  geom_line()
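This comparison plots raw counts `n`, which also rise and fall with the total number of births each year. Plotting a proportion instead puts names from different eras on a common scale. A minimal sketch with hypothetical data, where the `total` column is an assumed stand-in for each year's number of births:

```r
library(dplyr)
library(tibble)
library(ggplot2)

# Hypothetical data: Michele has the same count in both years,
# but twice as many babies were born in 1960 as in 2000
toy <- tibble(year  = c(1960, 1960, 2000, 2000),
              name  = c("Michele", "Julia", "Michele", "Julia"),
              n     = c(500, 100, 500, 400),
              total = c(2000000, 2000000, 1000000, 1000000))

toy <- toy %>% mutate(prop = n / total)   # equal counts, unequal shares

p <- ggplot(toy, aes(x = year, y = prop, color = name)) +
  geom_line()                             # one line per name, on the proportion scale
p
```

With babynames itself, the same idea amounts to mapping `y = prop` instead of `y = n`.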