I am looking at how often the names of Greek gods and goddesses are given to babies. I will start broad and work my way down to a short list of names. First, I plan to view the data by listing each name with its corresponding year and total. I then want to take the top six names and break them down by gender. After that, I want to see whether the release of the Percy Jackson books and movie caused any spikes in the data. I hypothesize that there will be a spike in 2005-2009, when the books were released, and another in 2010, when the movie was released.
First, I will load the packages and the data.
library(babynames)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(readr)
greek_gods <- read_csv("C:/Users/stilt/OneDrive/Desktop/greek_gods.csv")
## Rows: 445 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): name-english, name-greek, main-type, sub-type, description
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
babynames
## # A tibble: 1,924,665 × 5
## year sex name n prop
## <dbl> <chr> <chr> <int> <dbl>
## 1 1880 F Mary 7065 0.0724
## 2 1880 F Anna 2604 0.0267
## 3 1880 F Emma 2003 0.0205
## 4 1880 F Elizabeth 1939 0.0199
## 5 1880 F Minnie 1746 0.0179
## 6 1880 F Margaret 1578 0.0162
## 7 1880 F Ida 1472 0.0151
## 8 1880 F Alice 1414 0.0145
## 9 1880 F Bertha 1320 0.0135
## 10 1880 F Sarah 1288 0.0132
## # … with 1,924,655 more rows
colnames(greek_gods)[1] <- "name"
greek_gods %>%
left_join(babynames, by="name") -> greek_god_names
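Because `left_join()` keeps every row of its left-hand table, any god whose name never appears in `babynames` stays in the result with `NA` in the baby-name columns. The sketch below illustrates this with two made-up toy tables (`gods` and `babies` are hypothetical stand-ins, not the real data) and compares the result with `inner_join()`, which would drop those unmatched names.

```r
library(dplyr)

# Hypothetical toy tables standing in for greek_gods and babynames
gods <- tibble(name = c("Athena", "Zagreus"))
babies <- tibble(
  name = c("Athena", "Athena", "Mary"),
  year = c(2016, 2017, 2017),
  n    = c(10L, 12L, 50L)
)

# left_join keeps "Zagreus" even though it has no match,
# filling year and n with NA
left_joined <- left_join(gods, babies, by = "name")

# inner_join keeps only names present in both tables
inner_joined <- inner_join(gods, babies, by = "name")

nrow(left_joined)   # 3 rows: two for Athena, one NA row for Zagreus
nrow(inner_joined)  # 2 rows: Athena only
```

Whether to keep the unmatched names is a design choice: the left join preserves the full list of gods, while an inner join would leave only names that were actually given to babies.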
This is the joined data set, sorted by proportion, showing the ten highest name-year rows.
greek_god_names %>%
arrange(desc(prop)) %>%
head(10)
## # A tibble: 10 × 9
## name `name-greek` `main-type` `sub-type` descri…¹ year sex n prop
## <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <int> <dbl>
## 1 Damon Δαμων god sea sea spi… 1976 M 2455 0.00150
## 2 Damon Δαμων god sea sea spi… 1974 M 2360 0.00145
## 3 Damon Δαμων god sea sea spi… 1975 M 2281 0.00141
## 4 Damon Δαμων god sea sea spi… 1977 M 2356 0.00138
## 5 Damon Δαμων god sea sea spi… 1973 M 2048 0.00127
## 6 Athena Ἀθηνᾶ god olympian goddess… 2017 F 2365 0.00126
## 7 Damon Δαμων god sea sea spi… 1978 M 1986 0.00116
## 8 Damon Δαμων god sea sea spi… 1972 M 1926 0.00115
## 9 Athena Ἀθηνᾶ god olympian goddess… 2016 F 2171 0.00113
## 10 Athena Ἀθηνᾶ god olympian goddess… 2015 F 2048 0.00105
## # … with abbreviated variable name ¹description
greek_god_names %>%
group_by(name, year) %>%
summarize(total = sum(n)) %>%
arrange(desc(total)) -> god_summary
## `summarise()` has grouped output by 'name'. You can override using the
## `.groups` argument.
I am creating a table to show each name with its corresponding total.
god_summary %>%
group_by(name) %>%
summarise(total = sum(total)) %>%
arrange(desc(total))
## # A tibble: 444 × 2
## name total
## <chr> <int>
## 1 Iris 80311
## 2 Damon 63566
## 3 Simon 58830
## 4 Daphne 35066
## 5 Phoebe 32518
## 6 Athena 31186
## 7 Lupe 24571
## 8 Angelia 22208
## 9 Rhea 16397
## 10 Thalia 12945
## # … with 434 more rows
I am graphing the six most popular names in two bar graphs, one per gender, using each name's mean proportion.
Based on these graphs, the majority of babies named Iris, Athena, Phoebe, and Daphne were female, and the majority named Simon and Damon were male.
babynames %>%
filter(name %in% c("Iris", "Damon", "Simon", "Daphne", "Phoebe", "Athena")) %>%
group_by(name, sex) %>%
summarize(mean = mean(prop)) %>%
arrange(desc(mean)) %>%
ggplot(aes(x = mean, y = reorder(name, mean))) + geom_col() + facet_wrap(~sex)
## `summarise()` has grouped output by 'name'. You can override using the
## `.groups` argument.
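The plot above averages `prop` across years, which gives every year equal weight regardless of how many babies were born. As a complementary check, the sketch below computes each name's share by sex from raw counts. The `toy` tibble is hypothetical, made-up data in the same shape as `babynames`, and `sex_share` is an illustrative name.

```r
library(dplyr)

# Hypothetical toy counts shaped like babynames (year, sex, name, n)
toy <- tibble(
  year = c(2000, 2000, 2001, 2001),
  sex  = c("F", "M", "F", "M"),
  name = c("Iris", "Iris", "Iris", "Iris"),
  n    = c(90L, 10L, 95L, 5L)
)

sex_share <- toy %>%
  group_by(name, sex) %>%
  summarize(total = sum(n), .groups = "drop") %>%  # total babies per name and sex
  group_by(name) %>%
  mutate(share = total / sum(total)) %>%           # fraction of the name's total
  ungroup()

sex_share
```

With the toy numbers, 185 of 200 babies are female, so the female share is 0.925. Swapping `toy` for the filtered `babynames` rows would give the same breakdown for the six names.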
I am looking at the six most popular names after the year 2000. I expect a spike in popularity in 2005-2009 because that is when the Percy Jackson series was released.
After looking at this graph, I found that my initial hypothesis was incorrect: there is no spike between 2005 and 2009. There does seem to be a substantial rise for Iris and Athena around 2015. One possible explanation is that the teenagers who read Percy Jackson when it came out did not have children until about 10 years later and then started naming them Athena and Iris.
babynames %>%
filter(year > 2000) -> greek_gods_2000
greek_gods_2000 %>%
filter(name %in% c("Iris", "Damon", "Simon", "Daphne", "Phoebe", "Athena")) %>%
group_by(name, year) %>%
summarize(total = sum(n)) %>%
ggplot(aes(year, total, color = name)) + geom_line()
## `summarise()` has grouped output by 'name'. You can override using the
## `.groups` argument.
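Judging a spike by eye can be subjective. One way to make it more concrete is to compute the year-over-year percent change in each name's total. The sketch below does this on made-up numbers: the `toy_totals` tibble and the 50% threshold are assumptions for illustration, not values from the analysis.

```r
library(dplyr)

# Hypothetical yearly totals for one name; the numbers are made up
toy_totals <- tibble(
  name  = "Athena",
  year  = 2004:2008,
  total = c(100, 110, 121, 121, 242)
)

yoy <- toy_totals %>%
  group_by(name) %>%
  arrange(year, .by_group = TRUE) %>%
  # percent change relative to the previous year (NA for the first year)
  mutate(pct_change = (total - lag(total)) / lag(total) * 100) %>%
  ungroup()

# Years where the total grew by more than an arbitrary 50% threshold
spikes <- filter(yoy, pct_change > 50)
spikes
```

In the toy data only 2008 passes the threshold, because the total doubles from 121 to 242. The same pipeline could be applied to the `name`, `year`, and `total` summary built from `greek_gods_2000`.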
After seeing spikes starting to form in 2010-2015 in the last graph, I wanted to look at the names of the three main characters from the movie (Percy, Annabeth, and Grover).
This graph is interesting because there are two clear spikes for the name Annabeth. The first occurs during the release of the Percy Jackson books, and the second begins after the release of the movie.
greek_gods_2000 %>%
filter(name %in% c("Percy", "Annabeth", "Grover")) %>%
group_by(name, year) %>%
summarize(total = sum(n)) %>%
ggplot(aes(year, total, color = name)) + geom_line() -> main_characters
## `summarise()` has grouped output by 'name'. You can override using the
## `.groups` argument.
main_characters +
annotate("segment", x = 2005, xend = 2005, y = 150, yend = 125) +
annotate("text", x = 2005, y = 160, label = "Book Release", size = 2) +
annotate("segment", x = 2010, xend = 2010, y = 160, yend = 125) +
annotate("text", x = 2010, y = 170, label = "Movie Release", size = 2)
In conclusion, I have found that Iris, Damon, Simon, Daphne, Phoebe, and Athena are the most popular names from the Greek god and goddess data set. I further found that my initial hypothesis, that the release of the Percy Jackson books directly affected the popularity of Greek names, was incorrect. What the data did show is a rise for Athena and Iris that began forming after the 2010 movie release and was substantial by around 2015. There was also a clear spike for Annabeth, one of the main characters' names, during both the book and movie releases. During this project I ran into errors with the ggplots and how they displayed the data. I also did not expect my hypothesis to be as far off as it was. In the future, I think it would be interesting to look at more of the names of specific characters in the Percy Jackson series instead of Greek names as a whole.