Literary Character Names

Author

Ella Kucera

This project looks at how popular literary character names have been as baby names in the United States across different periods of time.

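Baby-name counts come from the babynames R package. The list of literary character names was collected from Encyclopaedia Britannica's list of fictional characters: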

https://www.britannica.com/topic/list-of-fictional-characters-2045983

Hypothesis: Traditional literary names will reach peak popularity during the 1900s and lose popularity sharply in the 2000s, because unique names are becoming more popular for babies.

library(readxl)    # read_excel() for the character-name spreadsheet
library(babynames) # US baby names by year and sex
library(tidyverse) # dplyr, ggplot2, and the rest of the core tidyverse

# Read the list of literary character names; it needs a column called name
names <- read_excel("Names.xlsx")

# Keep every babynames row whose name appears in the character list
babynames |> 
  inner_join(names, by = "name", relationship = "many-to-many") -> char_names
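One thing worth checking about this join: if the character list contains the same first name more than once (two characters who share a name), an inner join repeats every matching babynames row once per duplicate, which inflates any later sum(n). The relationship = "many-to-many" argument silences dplyr's warning about that situation rather than removing the duplicates, so its presence may indicate repeated names in Names.xlsx; that is an assumption, not something confirmed here. The base-R sketch below shows the effect on made-up toy rows, using merge(), which performs an inner join by default.

```r
# Toy babynames-style rows; the counts are made up for illustration
baby <- data.frame(
  year = c(1990, 1991, 1990),
  sex  = c("F", "F", "M"),
  name = c("Anna", "Anna", "John"),
  n    = c(100, 120, 300)
)

# Hypothetical character list in which "Anna" appears twice
# (two different characters who share a first name)
chars <- data.frame(name = c("Anna", "Anna", "John"))

# merge() is an inner join by default: each Anna row in baby
# is matched once per Anna row in chars, so it appears twice
joined_raw <- merge(baby, chars, by = "name")

# De-duplicating the character list first keeps each baby-name row once
joined_unique <- merge(baby, unique(chars), by = "name")

total_raw    <- sum(joined_raw$n[joined_raw$name == "Anna"])       # 440
total_unique <- sum(joined_unique$n[joined_unique$name == "Anna"]) # 220
```

In the tidyverse pipeline, babynames |> semi_join(names, by = "name") would keep the matching babynames rows without duplicating them, since the extra columns of the character list are not used later.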

char_names |> 
  group_by(name) |> 
  filter(sex=="F") |> 
  summarize(total = sum(n)) |> 
  arrange(desc(total)) |> 
  head(15) -> literary_namesf

The top 15 female names in the data set, listed alphabetically, were: Anna, Anne, Carol, Catherine, Clara, Dorothy, Emma, Hazel, Jean, Mary, Nancy, Pamela, Robin, Sara, and Sophia.

char_names |> 
  group_by(name) |> 
  filter(sex=="M") |> 
  summarize(total = sum(n)) |> 
  arrange(desc(total)) |> 
  head(15) -> literary_namesm

The top 15 male names in the data set, listed alphabetically, were: Anthony, Charles, Christopher, Daniel, Edward, George, Jacob, James, John, Jonathan, Matthew, Nicholas, Richard, Samuel, and William.
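The two top-15 pipelines differ only in their sex filter, so both lists can come from one grouped step: total the counts per name and sex, then keep the k largest totals within each sex. Below is a base-R sketch of that idea on invented toy data (the names and counts are made up); top_k() is a hypothetical helper written for this example. In dplyr, grouping by sex and calling slice_max(total, n = 15) expresses the same idea, with the caveat that slice_max() keeps ties by default and can therefore return more than 15 rows.

```r
# Toy joined data with made-up counts, using the columns the analysis relies on
char_toy <- data.frame(
  name = c("Mary", "Mary", "Emma", "John", "John", "James", "Robin", "Robin"),
  sex  = c("F", "F", "F", "M", "M", "M", "F", "M"),
  n    = c(50, 40, 30, 60, 20, 70, 10, 5)
)

# Total births per name within each sex (like group_by + summarize)
totals <- aggregate(n ~ name + sex, data = char_toy, FUN = sum)

# Hypothetical helper: keep the k names with the largest totals per sex
top_k <- function(df, k) {
  by_sex <- split(df, df$sex)
  picked <- lapply(by_sex, function(d) head(d[order(-d$n), ], k))
  do.call(rbind, picked)
}

top2 <- top_k(totals, 2)
```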

# Popularity of the top 15 male literary names, 1900 onward
babynames |>
  filter(name %in% literary_namesm$name, sex == "M", year >= 1900) |> 
  ggplot(aes(year, prop, color = name)) +
  geom_line() +
  labs(x = "Year", y = "Proportion of male births", color = "Name")

This graph shows the male names from 1900 onward. Between 1900 and 1925, Nicholas, Charles, Anthony, William, George, James, John, and Edward stay at a fairly consistent level of popularity. After 1925, however, they begin to decline, while Richard, Christopher, Daniel, Matthew, and Samuel start to gain popularity. Christopher, Samuel, Jacob, and Matthew also gain popularity from the 1990s into the 2000s.
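Reading peaks off a chart with fifteen overlapping lines is difficult, and the hypothesis is specifically about when names peak. A minimal base-R sketch, run on made-up toy proportions rather than the real data, finds each name's peak year with which.max(). Applied to the filtered babynames rows, the same approach would list the year in which each literary name was most popular.

```r
# Toy proportions (made up) for two names across a few years
trend <- data.frame(
  name = rep(c("John", "Matthew"), each = 4),
  year = rep(c(1910, 1940, 1970, 2000), times = 2),
  prop = c(0.050, 0.040, 0.020, 0.010,
           0.001, 0.002, 0.012, 0.008)
)

# For each name, keep the single row with the highest proportion
peak_rows <- lapply(split(trend, trend$name), function(d) d[which.max(d$prop), ])
peaks <- do.call(rbind, peak_rows)
peaks[, c("name", "year", "prop")]
```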

# Same male names, 2000 onward
babynames |>
  filter(name %in% literary_namesm$name, sex == "M", year >= 2000) |> 
  ggplot(aes(year, prop, color = name)) +
  geom_line() +
  labs(x = "Year", y = "Proportion of male births", color = "Name")

This graph shows the male names from 2000 to 2017. In the early 2000s most of the names stayed at a consistent level; however, Nicholas, Richard, Samuel, Matthew, George, Anthony, Christopher, and Daniel begin to decline in popularity after 2015.
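A decline like this can be quantified as the relative change in prop between two years. The base-R sketch below uses made-up proportions for two names; with the real data, the input would be babynames filtered to the male literary names, and the years 2015 and 2017 here are an illustrative choice.

```r
# Toy proportions (made up) for two names in two years
recent <- data.frame(
  name = c("Daniel", "Daniel", "Samuel", "Samuel"),
  year = c(2015, 2017, 2015, 2017),
  prop = c(0.0060, 0.0045, 0.0040, 0.0038)
)

# Put the two years side by side; merge() appends the suffixes to prop
y2015  <- subset(recent, year == 2015, select = c(name, prop))
y2017  <- subset(recent, year == 2017, select = c(name, prop))
change <- merge(y2015, y2017, by = "name", suffixes = c("_2015", "_2017"))

# Relative change from 2015 to 2017 in percent (negative = decline)
change$pct_change <- 100 * (change$prop_2017 - change$prop_2015) / change$prop_2015
change
```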

# Popularity of the top 15 female literary names, 1900 onward
babynames |>
  filter(name %in% literary_namesf$name, sex == "F", year >= 1900) |> 
  ggplot(aes(year, prop, color = name)) +
  geom_line() +
  labs(x = "Year", y = "Proportion of female births", color = "Name")

The female names showed more variation than the male names, including less common names such as Dorothy, Robin, and Pamela. Unsurprisingly, Mary was the most popular name from 1900 to 1950, after which it slowly declined. In the early 1900s, Dorothy, Carol, Catherine, Anna, Nancy, and Jean were near the 0.02 mark. After 1950 the ranking shifts, and Pamela, Robin, Sara, Sophia, and Anna gain popularity. Hazel, Clara, Anne, and Emma were less popular and stayed near the bottom of the graph.
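The shift in the ranks can be made explicit by ranking the names within each year instead of comparing line heights by eye. Here is a base-R sketch using invented proportions (not values from babynames), where ave() applies rank() separately within each year.

```r
# Toy proportions (made up) for three names in two years
ranks_toy <- data.frame(
  name = rep(c("Mary", "Pamela", "Hazel"), times = 2),
  year = rep(c(1920, 1960), each = 3),
  prop = c(0.050, 0.001, 0.006,
           0.020, 0.010, 0.002)
)

# Rank names within each year (1 = most popular); negating prop
# makes the largest proportion receive rank 1
ranks_toy$rank <- ave(-ranks_toy$prop, ranks_toy$year,
                      FUN = function(x) rank(x, ties.method = "min"))
ranks_toy
```

In this toy example Pamela moves from third place in 1920 to second place in 1960.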

# Same female names, 2000 onward
babynames |>
  filter(name %in% literary_namesf$name, sex == "F", year >= 2000) |> 
  ggplot(aes(year, prop, color = name)) +
  geom_line() +
  labs(x = "Year", y = "Proportion of female births", color = "Name")

During the 2000s and 2010s, Emma and Anna consistently stayed above a proportion of 0.006 (0.6% of female births). Hazel gained popularity after 2010, possibly influenced by John Green's novel The Fault in Our Stars (2012), whose narrator is named Hazel, and its 2014 film adaptation. Sara, Anna, Mary, Robin, Pamela, Jean, Catherine, Carol, Anne, Dorothy, Nancy, and Clara stayed below a proportion of 0.003 from 2000 to 2015.
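The thresholds above are values of the prop column in babynames, which is a proportion of all births of that sex in a given year rather than a percentage. This small sketch converts a proportion into a percentage and into an approximate "1 in N babies" figure; prop_to_percent() and one_in_n() are hypothetical helpers made up for this example.

```r
# prop is a fraction of all births of that sex in that year,
# so multiplying by 100 gives a percentage
prop_to_percent <- function(prop) 100 * prop

# Another reading: roughly 1 in N babies of that sex received the name
one_in_n <- function(prop) round(1 / prop)

thresholds <- c(0.006, 0.003)
pct   <- prop_to_percent(thresholds) # 0.6 and 0.3
ratio <- one_in_n(thresholds)        # 167 and 333
```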

Overall, the female names showed more variation and more unique choices, while the male names stuck to the traditional or formal route. It's safe to say that my prediction was largely right: with unique names on the rise, it is unlikely that traditional literary names will regain their earlier popularity for a while.
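The hypothesis can also be checked more directly than by eye, by comparing each name's average prop before 2000 with its average from 2000 onward. The sketch below does this in base R on made-up toy data. The era cutoff at 2000 and the plain mean over years are simplifying assumptions; a ratio below 1 means the name was less common in the later period.

```r
# Toy proportions (made up) for two names, two years in each era
toy <- data.frame(
  name = rep(c("John", "Emma"), each = 4),
  year = rep(c(1905, 1915, 2005, 2015), times = 2),
  prop = c(0.050, 0.040, 0.008, 0.005,
           0.006, 0.004, 0.010, 0.010)
)

# Label each year by era: before 2000 vs. 2000 onward
toy$era <- ifelse(toy$year < 2000, "1900s", "2000s")

# Mean proportion per name (rows) and era (columns)
era_means <- tapply(toy$prop, list(toy$name, toy$era), mean)

# Ratio below 1 means the name was less common from 2000 onward
ratio    <- era_means[, "2000s"] / era_means[, "1900s"]
declined <- ratio < 1
```

In this toy example John fits the prediction while Emma does not, which mirrors the kind of exception the 2000s graph shows for Emma and Hazel.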
