I am looking at how frequently the names of Greek gods and goddesses are given to babies. I will start broad and work my way down to a short list of names. First, I will get a feel for the data by listing each name with its corresponding year and total count. I then want to take the top six names and break them down by gender. After that, I want to see whether the release of the Percy Jackson books and movie caused any spikes in the data. I hypothesize that there will be a spike in 2005-2009, when the books were released, and another in 2010, when the movie was released.

First, I will load the packages, read in the Greek gods data, and join it to the baby names data.

library(babynames)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.1
## ✔ readr   2.1.2     ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(readr)
greek_gods <- read_csv("C:/Users/stilt/OneDrive/Desktop/greek_gods.csv")
## Rows: 445 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): name-english, name-greek, main-type, sub-type, description
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
babynames
## # A tibble: 1,924,665 × 5
##     year sex   name          n   prop
##    <dbl> <chr> <chr>     <int>  <dbl>
##  1  1880 F     Mary       7065 0.0724
##  2  1880 F     Anna       2604 0.0267
##  3  1880 F     Emma       2003 0.0205
##  4  1880 F     Elizabeth  1939 0.0199
##  5  1880 F     Minnie     1746 0.0179
##  6  1880 F     Margaret   1578 0.0162
##  7  1880 F     Ida        1472 0.0151
##  8  1880 F     Alice      1414 0.0145
##  9  1880 F     Bertha     1320 0.0135
## 10  1880 F     Sarah      1288 0.0132
## # … with 1,924,655 more rows
colnames(greek_gods)[1] <- "name" 

greek_gods %>% 
  left_join(babynames, by="name") -> greek_god_names
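
One thing to keep in mind about this join: `left_join()` keeps every row of `greek_gods`, so a deity whose name never appears in `babynames` stays in the result with `NA` for `year`, `sex`, `n`, and `prop`. The sketch below uses two tiny made-up tibbles (`gods_toy` and `babies_toy` are hypothetical stand-ins, not the real data) to show that behavior next to `inner_join()`, which drops unmatched names.

```r
library(dplyr)

# Toy stand-ins for greek_gods and babynames; the values are invented for illustration.
gods_toy <- tibble(name = c("Athena", "Zeus", "Erebus"))
babies_toy <- tibble(
  name = c("Athena", "Athena", "Zeus"),
  year = c(2001, 2002, 2001),
  n    = c(10L, 12L, 5L)
)

# left_join keeps "Erebus" even though no toy baby has that name; its year and n are NA.
left_toy <- left_join(gods_toy, babies_toy, by = "name")

# inner_join keeps only names that appear in both tables.
inner_toy <- inner_join(gods_toy, babies_toy, by = "name")

# Names from the gods list that never matched a baby name.
unmatched <- left_toy %>%
  filter(is.na(n)) %>%
  pull(name)
```

In the real data, any name left unmatched this way would get an `NA` total from `sum(n)` unless `na.rm = TRUE` is passed.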

This is the joined data set, sorted to show the ten rows with the highest proportion.

greek_god_names %>% 
  arrange(desc(prop)) %>% 
  head(10)
## # A tibble: 10 × 9
##    name   `name-greek` `main-type` `sub-type` descri…¹  year sex       n    prop
##    <chr>  <chr>        <chr>       <chr>      <chr>    <dbl> <chr> <int>   <dbl>
##  1 Damon  Δαμων        god         sea        sea spi…  1976 M      2455 0.00150
##  2 Damon  Δαμων        god         sea        sea spi…  1974 M      2360 0.00145
##  3 Damon  Δαμων        god         sea        sea spi…  1975 M      2281 0.00141
##  4 Damon  Δαμων        god         sea        sea spi…  1977 M      2356 0.00138
##  5 Damon  Δαμων        god         sea        sea spi…  1973 M      2048 0.00127
##  6 Athena Ἀθηνᾶ        god         olympian   goddess…  2017 F      2365 0.00126
##  7 Damon  Δαμων        god         sea        sea spi…  1978 M      1986 0.00116
##  8 Damon  Δαμων        god         sea        sea spi…  1972 M      1926 0.00115
##  9 Athena Ἀθηνᾶ        god         olympian   goddess…  2016 F      2171 0.00113
## 10 Athena Ἀθηνᾶ        god         olympian   goddess…  2015 F      2048 0.00105
## # … with abbreviated variable name ¹​description
greek_god_names %>% 
  group_by(name, year) %>% 
  summarize(total = sum(n)) %>% 
  arrange(desc(total)) -> god_summary
## `summarise()` has grouped output by 'name'. You can override using the
## `.groups` argument.

I am creating a table that shows each name with its total count across all years.

god_summary %>% 
  group_by(name) %>%
  summarise(total = sum(total)) %>% 
  arrange(desc(total))
## # A tibble: 444 × 2
##    name    total
##    <chr>   <int>
##  1 Iris    80311
##  2 Damon   63566
##  3 Simon   58830
##  4 Daphne  35066
##  5 Phoebe  32518
##  6 Athena  31186
##  7 Lupe    24571
##  8 Angelia 22208
##  9 Rhea    16397
## 10 Thalia  12945
## # … with 434 more rows
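
The top six names could also be pulled out in code rather than typed by hand. This is only a sketch: it rebuilds the first eight rows of the table above as a small tibble (the object names `totals` and `top_six` are my own), then uses `slice_max()` and `pull()` to get the six largest totals as a character vector.

```r
library(dplyr)

# The first eight rows of the totals table printed above.
totals <- tibble(
  name  = c("Iris", "Damon", "Simon", "Daphne", "Phoebe", "Athena", "Lupe", "Angelia"),
  total = c(80311L, 63566L, 58830L, 35066L, 32518L, 31186L, 24571L, 22208L)
)

# Keep the six rows with the largest totals and extract just the names.
top_six <- totals %>%
  slice_max(total, n = 6) %>%
  pull(name)
```

A vector like `top_six` could then be used in `filter(name %in% top_six)` instead of a hardcoded list of names.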

I am graphing the six most popular names as a bar graph split into two panels by gender.

Based on these graphs, the majority of babies named Iris, Athena, Phoebe, and Daphne were female, and the majority of babies named Simon and Damon were male.

babynames %>% 
  filter(name %in% c("Iris", "Damon", "Simon", "Daphne", "Phoebe", "Athena")) %>%
  group_by(name, sex) %>% 
  summarize(mean = mean(prop)) %>%
  arrange(desc(mean)) %>%
  ggplot(aes(x = mean, y = reorder(name, mean))) + geom_col() + facet_wrap(~sex)
## `summarise()` has grouped output by 'name'. You can override using the
## `.groups` argument.
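
The graph compares the average `prop` for each sex. As far as I understand it, `prop` in `babynames` is calculated within each sex, so another way to check the "majority" claim is to compute each sex's share of the raw counts for a name. The sketch below does this on invented counts (`counts_toy` is hypothetical, not real output) to show the calculation.

```r
library(dplyr)

# Invented counts for illustration only; real values would come from babynames.
counts_toy <- tibble(
  name = c("Iris", "Iris", "Simon", "Simon"),
  sex  = c("F", "M", "F", "M"),
  n    = c(90L, 10L, 5L, 95L)
)

sex_share <- counts_toy %>%
  group_by(name, sex) %>%
  # Total babies per name and sex; keep the grouping by name for the next step.
  summarize(total = sum(n), .groups = "drop_last") %>%
  # Share of each sex within a name, so the shares for one name sum to 1.
  mutate(share = total / sum(total)) %>%
  ungroup()
```

A share above 0.5 for one sex would support saying that the majority of babies with that name were of that sex.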

I am looking at the six most popular names for the years after 2000. I believe there will be a spike in popularity in 2005-2009 because that is when the Percy Jackson series was released.

After looking at this graph, I can see that my initial hypothesis was incorrect: there is no spike between 2005 and 2009. There does seem to be a substantial spike for Iris and Athena around 2015. One possible explanation is that the teenagers who were reading Percy Jackson when it came out did not have children until about 10 years later and then started naming them Athena and Iris.

babynames %>%
  filter(year > 2000) -> greek_gods_2000 
greek_gods_2000 %>%
  filter(name %in% c("Iris", "Damon", "Simon", "Daphne", "Phoebe", "Athena")) %>%
  group_by(name, year) %>% 
  summarize(total = sum(n)) %>%
  ggplot(aes(year, total, color = name)) + geom_line()
## `summarise()` has grouped output by 'name'. You can override using the
## `.groups` argument.
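
Reading a spike off a line graph is a judgment call. One way to make it more concrete is to compute the year-over-year percent change for each name and flag years above some cutoff. The sketch below is a minimal version on invented yearly totals; `yearly_toy` and the 50% threshold are assumptions for illustration, not values from the real data.

```r
library(dplyr)

# Invented yearly totals for one name; not real babynames counts.
yearly_toy <- tibble(
  name  = "Athena",
  year  = 2001:2005,
  total = c(100L, 110L, 121L, 242L, 250L)
)

changes <- yearly_toy %>%
  group_by(name) %>%
  arrange(year, .by_group = TRUE) %>%
  # Percent change from the previous year; the first year of each name is NA.
  mutate(pct_change = (total - dplyr::lag(total)) / dplyr::lag(total) * 100) %>%
  ungroup()

# Flag years whose growth exceeds a hypothetical 50% threshold.
spikes <- changes %>%
  filter(pct_change > 50)
```

Applied to the real yearly totals, a table like `spikes` would list the years in which a name grew unusually fast.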

After seeing spikes starting to form in 2010-2015 in the last graph, I wanted to look at the names of the three main characters from the movie (Percy, Annabeth, and Grover).

This graph is interesting because it shows two clear spikes for the name Annabeth. The first spike coincides with the release of the Percy Jackson books, and the second begins after the release of the movie.

greek_gods_2000 %>%
  filter(name %in% c("Percy", "Annabeth", "Grover")) %>%
  group_by(name, year) %>% 
  summarize(total = sum(n)) %>%
  ggplot(aes(year, total, color = name)) + geom_line() -> main_characters
## `summarise()` has grouped output by 'name'. You can override using the
## `.groups` argument.
main_characters +
  annotate("segment", x = 2005, xend = 2005, y = 150, yend = 125) +
  annotate("text", x = 2005, y = 160, label = "Book Release", size = 2) +
  annotate("segment", x = 2010, xend = 2010, y = 160, yend = 125) +
  annotate("text", x = 2010, y = 170, label = "Movie Release", size = 2)
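
The release markers could also be drawn from a small data frame instead of separate `annotate()` calls, which makes it easier to add more events later. This is only a sketch: `events` and `series_toy` are hypothetical objects, and the counts in `series_toy` are invented so the example runs on its own.

```r
library(ggplot2)

# Release years and labels as used in the annotations above.
events <- data.frame(
  year  = c(2005, 2010),
  label = c("Book Release", "Movie Release")
)

# Toy series so the sketch is self-contained; the counts are invented.
series_toy <- data.frame(
  year  = 2001:2012,
  total = c(5, 6, 8, 9, 20, 30, 35, 33, 31, 45, 60, 70)
)

p <- ggplot(series_toy, aes(year, total)) +
  geom_line() +
  # One dashed vertical line per event.
  geom_vline(data = events, aes(xintercept = year), linetype = "dashed") +
  # Labels pinned to the top of the panel (y = Inf), nudged down with vjust.
  geom_text(data = events, aes(x = year, y = Inf, label = label),
            vjust = 1.5, size = 2)
```

Printing `p` draws the line with both markers; the same two layers could be added to `main_characters` in place of the `annotate()` calls.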

In conclusion, I found that Iris, Damon, Simon, Daphne, Phoebe, and Athena are the most popular names from the Greek gods and goddesses data set. I also found that my initial hypothesis, that the release of the Percy Jackson books would directly affect the popularity of Greek names, was incorrect. What I did find is that Athena and Iris rose in the years after the movie's release, with a substantial spike around 2015. There was also a clear spike for Annabeth, one of the main characters' names, around both the book and movie release dates. During this project, I ran into errors with the ggplots and how they were displaying the data. I also did not expect my hypothesis to be as far off as it was. In the future, I think it would be interesting to look at more of the names of specific characters in the Percy Jackson series instead of Greek names as a whole.