The babynames data set records the names given to babies in the United States from 1880 to 2017 (a name appears for a given year and sex only if it was used at least five times). Using this resource, I decided to explore the impact of Disney Princess films on the popularity of names in the US. I focused on the 12 official Disney Princesses, who share the following criteria: a female protagonist with some tie to royalty in her fictional universe.

Let’s begin by loading the necessary packages in R.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.1
## ✔ readr   2.1.2     ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(babynames)

The first step is to analyze the popularity for each princess’s name individually. We will begin with Snow White and continue chronologically.

Snow White (1937)

babynames %>%
  # keep only girls named Snow; sex must be filtered, not passed to aes()
  filter(name == "Snow", sex == "F") %>%
  ggplot(aes(year, n)) +
  geom_line(color = "blue2") +
  ggtitle("Popularity of the name Snow in US") +
  geom_vline(xintercept = 1937, color = "black", linetype = "dotted")


As the chart shows, there was no significant increase in the name Snow following the release of the Disney film ‘Snow White and the Seven Dwarfs’ in 1937 (represented by the dotted line). The release does mark the beginning of a slow, steady increase, and it is possible that the film contributed to it. The clearer increase comes after the year 2000, but its cause is unclear. Since this was the first Disney Princess film, Disney did not yet have the popularity that it has today and therefore had less influence over the population. We can check each of these assumptions by looking at the values in a table.

babynames %>% 
  filter(name == 'Snow', sex == "F") %>%
  arrange(desc(n))
## # A tibble: 49 × 5
##     year sex   name      n       prop
##    <dbl> <chr> <chr> <int>      <dbl>
##  1  2014 F     Snow     77 0.0000394 
##  2  2013 F     Snow     59 0.0000307 
##  3  2015 F     Snow     59 0.0000303 
##  4  2017 F     Snow     59 0.0000315 
##  5  2016 F     Snow     55 0.0000285 
##  6  2012 F     Snow     35 0.0000181 
##  7  2006 F     Snow     17 0.00000814
##  8  2008 F     Snow     17 0.00000817
##  9  2010 F     Snow     15 0.00000766
## 10  2009 F     Snow     14 0.00000692
## # … with 39 more rows

The table confirms that the ten most popular years for the name Snow all fall between 2006 and 2017. It is unclear what led to the name becoming so popular in recent years, but it is safe to say that the animated film was not the most significant factor in the popularity of this name.

The next name we will examine is Cinderella, and we can repeat the same process for each of the following princess names.
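Since the same plot is repeated for every princess, the process could be wrapped in a small function. The following is only a sketch: `plot_princess_name()` and its arguments are hypothetical names that are not part of the original analysis, and it assumes that only female records are of interest.

```r
library(dplyr)
library(ggplot2)
library(babynames)

# Hypothetical helper: yearly count of girls given `princess_name`,
# with a dotted vertical line at the film's `release_year`.
plot_princess_name <- function(princess_name, release_year, line_color = "blue2") {
  babynames %>%
    filter(name == princess_name, sex == "F") %>%
    ggplot(aes(year, n)) +
    geom_line(color = line_color) +
    geom_vline(xintercept = release_year, color = "black", linetype = "dotted") +
    ggtitle(paste("Popularity of the name", princess_name, "in US"))
}

# Example: the Snow White plot
snow_plot <- plot_princess_name("Snow", 1937)
```

Calling `plot_princess_name("Cinderella", 1950, "deepskyblue1")` would then produce the next chart in the same style, apart from any extra reference lines.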

Cinderella (1950)

babynames %>% 
  filter(name == 'Cinderella') %>%
  ggplot(aes(year, n, color = name)) + 
  geom_line(color="deepskyblue1") +
  ggtitle("Popularity of the Name Cinderella in the US") +
  geom_vline(xintercept = 1950, color = "black", linetype = "dotted") +
  geom_vline(xintercept = 1922, color ="gray", linetype = "dotted")


The graph shows that the name Cinderella has always varied in popularity. The popular Disney film that people associate with Cinderella was released in 1950, but Disney had already released an earlier version in 1922. The name is more popular around the release of both films. The years around the 1922 version are slightly more popular than those around the 1950 version, and it appears that the movies had an impact on the popularity of the name. Additionally, there have been countless remakes using the popular “Cinderella trope” of a rags-to-riches story that could contribute to the continued popularity of the name.

babynames %>% 
  filter(name == 'Cinderella') %>%
  arrange(desc(n))
## # A tibble: 82 × 5
##     year sex   name           n       prop
##    <dbl> <chr> <chr>      <int>      <dbl>
##  1  1922 F     Cinderella    25 0.0000200 
##  2  1921 F     Cinderella    23 0.0000180 
##  3  1951 F     Cinderella    23 0.0000124 
##  4  1927 F     Cinderella    21 0.0000170 
##  5  1952 F     Cinderella    20 0.0000105 
##  6  1950 F     Cinderella    19 0.0000108 
##  7  1924 F     Cinderella    18 0.0000139 
##  8  1958 F     Cinderella    18 0.00000872
##  9  1954 F     Cinderella    17 0.00000854
## 10  1917 F     Cinderella    16 0.0000142 
## # … with 72 more rows

The table suggests that the most popular years for the name Cinderella cluster around the release of both movies, although 1921, the second most popular year, comes just before the 1922 film. It is also important to consider the origin of this name, so we will load a table to see when the name was first used.

babynames %>% 
  filter(name == 'Cinderella')
## # A tibble: 82 × 5
##     year sex   name           n       prop
##    <dbl> <chr> <chr>      <int>      <dbl>
##  1  1894 F     Cinderella     7 0.0000297 
##  2  1897 F     Cinderella     5 0.0000201 
##  3  1900 F     Cinderella     5 0.0000157 
##  4  1904 F     Cinderella     8 0.0000274 
##  5  1907 F     Cinderella     6 0.0000178 
##  6  1911 F     Cinderella     7 0.0000158 
##  7  1912 F     Cinderella     5 0.00000852
##  8  1913 F     Cinderella    11 0.0000168 
##  9  1914 F     Cinderella     6 0.00000753
## 10  1915 F     Cinderella    11 0.0000107 
## # … with 72 more rows

It can also be noted that the first recorded use of the name Cinderella in this data set was in 1894. The Grimm brothers published their version of the Cinderella story in 1812, so it is possible that the story influenced the name, but only after a delay of about 80 years.
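The first recorded year can also be computed directly instead of being read off the printed table. This is a minimal sketch; `first_use` is an illustrative name, and "first use" here only means the first year in which the name was recorded at least five times, since rarer names do not appear in the data.

```r
library(dplyr)
library(babynames)

# First year in which Cinderella appears for girls, and the count that year
first_use <- babynames %>%
  filter(name == "Cinderella", sex == "F") %>%
  summarise(
    first_year   = min(year),
    n_first_year = n[which.min(year)]
  )

first_use
```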

Aurora (1959)

babynames %>%
  # keep only girls named Aurora; sex must be filtered, not passed to aes()
  filter(name == "Aurora", sex == "F") %>%
  ggplot(aes(year, n)) +
  geom_line(color = "deeppink1") +
  ggtitle("Popularity of the name Aurora in US") +
  geom_vline(xintercept = 1959, color = "black", linetype = "dotted")


The original Sleeping Beauty Disney film came out in 1959. Even though that was not a significantly popular year for the name, there is still a possibility that the Disney story affected this name’s popularity. The largest increase is in the more recent years leading up to 2017. In 2014 the movie Maleficent, which was based on the Sleeping Beauty story, came out, and it could have had an impact on the recent popularity of the name.

babynames %>% 
  filter(name == 'Aurora') %>%
  arrange(desc(n)) %>%
  head(20)
## # A tibble: 20 × 5
##     year sex   name       n     prop
##    <dbl> <chr> <chr>  <int>    <dbl>
##  1  2017 F     Aurora  4573 0.00244 
##  2  2016 F     Aurora  3983 0.00207 
##  3  2015 F     Aurora  3639 0.00187 
##  4  2014 F     Aurora  2739 0.00140 
##  5  2013 F     Aurora  2127 0.00111 
##  6  2012 F     Aurora  1903 0.000983
##  7  2011 F     Aurora  1724 0.000891
##  8  2010 F     Aurora  1527 0.000780
##  9  2009 F     Aurora  1487 0.000735
## 10  2008 F     Aurora  1173 0.000564
## 11  2006 F     Aurora  1067 0.000511
## 12  2007 F     Aurora  1066 0.000504
## 13  2004 F     Aurora   989 0.000490
## 14  2005 F     Aurora   950 0.000468
## 15  2003 F     Aurora   810 0.000404
## 16  2001 F     Aurora   684 0.000345
## 17  2002 F     Aurora   664 0.000336
## 18  2000 F     Aurora   557 0.000279
## 19  1999 F     Aurora   520 0.000267
## 20  1998 F     Aurora   503 0.000260

The table shows that the name has increased in popularity in recent years. I expanded the most popular list to 20, and every one of those years falls between 1998 and 2017. Before this steady increase, the graph shows that the late 1920s were the most popular period for the name Aurora, and it is unclear why. I believe that a possible reason the movie did not have a significant impact on the name is that the main character Aurora is mostly referred to as Sleeping Beauty and appears in only a small part of the film.

Ariel (1989)

babynames %>% 
  filter(name == 'Ariel') %>%
  ggplot(aes(year, n, color = name)) + 
  geom_line(color= "brown1") +
  ggtitle("Popularity of the name Ariel in US") +
  geom_vline(xintercept = 1989, color = "black", linetype = "dotted")


The graph shows that there is a significant increase in the name Ariel following the release of the movie. The Disney animated film “The Little Mermaid” was released in 1989, and since the name was far less common before the film, it is very likely that the movie had a significant influence on the name Ariel.

babynames %>%
  filter(name == 'Ariel') %>%
  arrange(desc(n))
## # A tibble: 208 × 5
##     year sex   name      n    prop
##    <dbl> <chr> <chr> <int>   <dbl>
##  1  1991 F     Ariel  5410 0.00266
##  2  1992 F     Ariel  3960 0.00198
##  3  1990 F     Ariel  3606 0.00176
##  4  1993 F     Ariel  2708 0.00137
##  5  2014 F     Ariel  2434 0.00125
##  6  2015 F     Ariel  2344 0.00120
##  7  1997 F     Ariel  2212 0.00116
##  8  1994 F     Ariel  2187 0.00112
##  9  2016 F     Ariel  2185 0.00113
## 10  1995 F     Ariel  2149 0.00112
## # … with 198 more rows

According to the table, the four most popular years for the name Ariel were the four years following the release of the movie (1990 to 1993), which strongly suggests that the film had an extremely strong influence on the name Ariel.

Belle (1991)

babynames %>% 
  filter(name == 'Belle') %>%
  ggplot(aes(year, n, color = name)) + 
  geom_line(color= "darkgoldenrod1" ) +
  ggtitle("Popularity of the name Belle in US") +
  geom_vline(xintercept = 1991, color = "black", linetype = "dotted")+
  geom_vline(xintercept = 2017, color = "gray", linetype = "dotted")


As can be seen on the graph, there were a significant number of babies named Belle in the 1880s, with the largest increase right before the 1920s. The Disney animated film “Beauty and the Beast” was released in 1991, which appears to coincide with another increase in the name’s popularity. Considering the name had been decreasing since the peak before the 1920s, I believe that the film had an impact on the name’s popularity, even if it did not create the largest increase. There was also a live action remake of “Beauty and the Beast” released in 2017 that could have been a factor in the final increase. However, we would need more information about baby names past 2017 in order to see that trend.

babynames %>% 
  filter(name == 'Belle') %>%
  arrange(desc(n)) %>%
  head(30)
## # A tibble: 30 × 5
##     year sex   name      n     prop
##    <dbl> <chr> <chr> <int>    <dbl>
##  1  1915 F     Belle   407 0.000398
##  2  1914 F     Belle   353 0.000443
##  3  2017 F     Belle   335 0.000179
##  4  1916 F     Belle   334 0.000308
##  5  1888 F     Belle   317 0.00167 
##  6  1917 F     Belle   314 0.000279
##  7  1890 F     Belle   304 0.00151 
##  8  2016 F     Belle   293 0.000152
##  9  1900 F     Belle   291 0.000916
## 10  1884 F     Belle   286 0.00208 
## # … with 20 more rows

The name Belle is one of the older names being studied from this Disney set. According to the table, the most popular time to use the name Belle was around 1915, so the Disney movie was not the largest influence on this name, but it is possible that the live action movie has been an influence on its recent popularity.

Jasmine (1992)

babynames %>%
  # keep only girls named Jasmine; sex must be filtered, not passed to aes()
  filter(name == "Jasmine", sex == "F") %>%
  ggplot(aes(year, n)) +
  geom_line(color = "cyan3") +
  ggtitle("Popularity of the name Jasmine in US") +
  geom_vline(xintercept = 1992, color = "black", linetype = "dotted")


From the graph we can see that following the release of the movie ‘Aladdin’ in 1992 there was an increase in the name Jasmine. However, we can also see that there was a steady increase in the name before the release of the movie, so the Disney movie was not the only cause of this increase. Therefore the popularity of the name Jasmine was not created by the Disney film, but it was very likely affected by it.

babynames %>% 
  filter(name == 'Jasmine', sex == "F") %>%
  arrange(desc(n))
## # A tibble: 84 × 5
##     year sex   name        n    prop
##    <dbl> <chr> <chr>   <int>   <dbl>
##  1  1993 F     Jasmine 12060 0.00612
##  2  1994 F     Jasmine 11711 0.00601
##  3  1991 F     Jasmine 11524 0.00567
##  4  1990 F     Jasmine 11035 0.00537
##  5  1992 F     Jasmine 10475 0.00523
##  6  1995 F     Jasmine 10279 0.00535
##  7  1996 F     Jasmine  9708 0.00506
##  8  1997 F     Jasmine  9678 0.00507
##  9  1989 F     Jasmine  9549 0.00479
## 10  1998 F     Jasmine  9484 0.00489
## # … with 74 more rows

As seen in the table, the two years following the release of the movie (1993 and 1994) were the two most popular years for the name Jasmine, although 1990 and 1991 both rank above the release year itself.
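One way to put a number on this kind of before-and-after reading is to compare the average yearly count in a short window on each side of the release year. The sketch below is an illustration rather than the method used in this post: `compare_around_release()` is a hypothetical helper, and the three-year window is an arbitrary assumption.

```r
library(dplyr)
library(babynames)

# Hypothetical helper: mean yearly count of girls with `princess_name` in the
# `window` years before and after `release_year` (release year itself excluded).
compare_around_release <- function(princess_name, release_year, window = 3) {
  babynames %>%
    filter(name == princess_name, sex == "F",
           year != release_year,
           abs(year - release_year) <= window) %>%
    mutate(period = if_else(year < release_year, "before", "after")) %>%
    group_by(period) %>%
    summarise(mean_n = mean(n), .groups = "drop")
}

jasmine_effect <- compare_around_release("Jasmine", 1992)
jasmine_effect
```

A comparison like this does not separate the film's effect from a trend that was already under way, so it is best read alongside the graph.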

Mulan (1998)

The first recorded use of the name Mulan was in 1998, the same year that the Disney film was released.

babynames %>% 
  filter(name == 'Mulan') %>%
  ggplot(aes(year, n, color = name)) + 
  geom_line(color="darkolivegreen") +
  ggtitle("Popularity of the name Mulan in US") +
  geom_vline(xintercept = 1998, color = "black", linetype = "dotted")


The graph shows that there was no recorded use of the name Mulan before the release of the film in 1998. It can also be seen that the name’s popularity has increased since about 2010. Even though there have been more popular years for the name Mulan since the film was released, the Disney movie clearly had an impact because it essentially created the name’s popularity.

babynames %>% 
  filter(name == 'Mulan') %>%
  arrange(desc(n))
## # A tibble: 19 × 5
##     year sex   name      n       prop
##    <dbl> <chr> <chr> <int>      <dbl>
##  1  2016 F     Mulan    32 0.0000166 
##  2  2017 F     Mulan    29 0.0000155 
##  3  2013 F     Mulan    27 0.0000140 
##  4  2014 F     Mulan    26 0.0000133 
##  5  2012 F     Mulan    23 0.0000119 
##  6  2015 F     Mulan    23 0.0000118 
##  7  1998 F     Mulan    16 0.00000826
##  8  2009 F     Mulan    11 0.00000544
##  9  2010 F     Mulan    11 0.00000562
## 10  2006 F     Mulan    10 0.00000479
## 11  2001 F     Mulan     8 0.00000404
## 12  2008 F     Mulan     8 0.00000384
## 13  2011 F     Mulan     8 0.00000413
## 14  1999 F     Mulan     7 0.0000036 
## 15  2000 F     Mulan     6 0.00000301
## 16  2002 F     Mulan     6 0.00000304
## 17  1999 M     Mulan     5 0.00000245
## 18  2005 F     Mulan     5 0.00000247
## 19  2007 F     Mulan     5 0.00000236

According to the table, 2016 and 2017 were the most popular years for the name Mulan. The origin of this increase is unclear. Disney did release a live action remake of Mulan, but it came out in 2020, which lies outside the range of this baby names data set. The seventh most popular year for the name was the year that the film was released, and as we saw on the graph there were no recorded uses of this name before the release of the film.

Pocahontas (1995)

babynames %>% 
  filter(name == 'Pocahantas')
## # A tibble: 0 × 5
## # … with 5 variables: year <dbl>, sex <chr>, name <chr>, n <int>, prop <dbl>

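This query returned no rows. Note, however, that the filter string is spelled 'Pocahantas' rather than 'Pocahontas', so the empty result does not by itself show that the name is absent from the data set. The query needs to be rerun with the correct spelling before concluding that the Disney movie had no effect on the popularity of the name.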
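An exact-match filter returns an empty tibble for any string that does not match exactly, so it can help to look for near matches before concluding that a name is absent. A minimal sketch, assuming the stringr package is available; `poca_names` and the prefix pattern `"^Poca"` are illustrative choices and not part of the original analysis.

```r
library(dplyr)
library(stringr)
library(babynames)

# Exact match on the spelling used in the query above
misspelled <- babynames %>%
  filter(name == "Pocahantas")

# Prefix match: lists every recorded name starting with "Poca", if any,
# with its total count across all years
poca_names <- babynames %>%
  filter(str_detect(name, "^Poca")) %>%
  count(name, wt = n, name = "total")

nrow(misspelled)
poca_names
```

If `poca_names` comes back empty as well, that would support the conclusion that no spelling of the name reached five uses in any year.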

Tiana (2009)

babynames %>% 
  filter(name == 'Tiana') %>%
  ggplot(aes(year, n, color = name)) + 
  geom_line(color="chartreuse2") +
  ggtitle("Popularity of the name Tiana in US") +
  geom_vline(xintercept = 2009, color = "black", linetype = "dotted")


From the graph it can be seen that the most popular years for the name Tiana were the years leading up to 2000, whereas the Disney animated film “The Princess and the Frog” was released in 2009. The reason behind the name’s popularity in the years prior to 2000 is unclear, but after that there was a decrease until the film was released. The film therefore still appears to have had an impact on the popularity of the name Tiana, because its release was followed by an increase in a name that had been decreasing for about 10 years.

  babynames %>% 
    filter(name == 'Tiana') %>%
    arrange(desc(n))
## # A tibble: 65 × 5
##     year sex   name      n     prop
##    <dbl> <chr> <chr> <int>    <dbl>
##  1  1998 F     Tiana  1028 0.000530
##  2  1995 F     Tiana  1025 0.000534
##  3  2010 F     Tiana   964 0.000492
##  4  1994 F     Tiana   948 0.000486
##  5  1999 F     Tiana   939 0.000482
##  6  1996 F     Tiana   937 0.000489
##  7  1997 F     Tiana   937 0.000491
##  8  2000 F     Tiana   887 0.000445
##  9  2001 F     Tiana   865 0.000437
## 10  1993 F     Tiana   825 0.000419
## # … with 55 more rows

According to the table, the third most popular year for the name Tiana was 2010, the year after the Disney movie was released in late 2009. Even though that was not the most popular year for the name Tiana, it can still be concluded that the movie likely had an impact on the name because the trend changed from decreasing to increasing.

Rapunzel (2010)

  babynames %>% 
    filter(name == 'Rapunzel') %>%
    ggplot(aes(year, n, color = name)) + 
    geom_line(color="orchid") +
    ggtitle("Popularity of the name Rapunzel in US") +
    geom_vline(xintercept = 2010, color = "black", linetype = "dotted")


The name Rapunzel was clearly not popular. The original story was published by the Grimm brothers in 1812, and the popular Disney film “Tangled”, based on the Rapunzel story, was released in 2010. However, neither date corresponds to any visible change on the graph, so it can be concluded that the Disney film most likely did not impact the popularity of the name. We should still look at a table to see more clearly when the name was used.

  babynames %>% 
    filter(name == 'Rapunzel') %>%
    arrange(desc(n))
## # A tibble: 2 × 5
##    year sex   name         n       prop
##   <dbl> <chr> <chr>    <int>      <dbl>
## 1  1959 F     Rapunzel     9 0.00000433
## 2  2017 F     Rapunzel     6 0.0000032

The table shows that the only years in which the name Rapunzel was recorded were 1959 and 2017. It is possible that the 2010 film was a factor in the sudden use of the name in 2017, but if that is the reason, it is unclear why that would occur in 2017 and not closer to the release of the film.

Merida (2012)

  babynames %>% 
    filter(name == 'Merida') %>%
    ggplot(aes(year, n, color = name)) + 
    geom_line(color = "darkcyan") +
    ggtitle("Popularity of the name Merida in US") +
    geom_vline(xintercept = 2012, color = "black", linetype = "dotted")

The name Merida shows a significant increase in recent years, coinciding with the release of the Disney Pixar movie “Brave” in 2012. As the graph shows, there were very few uses of the name until the release of the film, which was followed by one of the steepest relative increases we’ve seen so far.

  babynames %>% 
    filter(name == 'Merida') %>%
    arrange(desc(n))
## # A tibble: 58 × 5
##     year sex   name       n       prop
##    <dbl> <chr> <chr>  <int>      <dbl>
##  1  2017 F     Merida   116 0.0000619 
##  2  2013 F     Merida   110 0.0000572 
##  3  2014 F     Merida   103 0.0000528 
##  4  2016 F     Merida   101 0.0000524 
##  5  2015 F     Merida    99 0.0000509 
##  6  2012 F     Merida    19 0.00000981
##  7  1949 F     Merida    13 0.00000741
##  8  1957 F     Merida    12 0.00000572
##  9  1943 F     Merida    11 0.00000766
## 10  1968 F     Merida    11 0.00000643
## # … with 48 more rows

The table shows that the most common years for the name Merida were the years following the release of the film. It is logical to conclude that the Disney film had a significant impact on the popularity of the name Merida.

Moana (2016)

  babynames %>% 
    filter(name == 'Moana') %>%
    ggplot(aes(year, n, color = name)) + 
    geom_line(color = "steelblue1") +
    ggtitle("Popularity of the name Moana in US") +
    geom_vline(xintercept = 2016, color = "black", linetype = "dotted")

The graph shows a significant increase in the name Moana in recent years, coinciding with the release of the Disney movie in 2016. The data is limited here because the movie was released in 2016 and the baby names data stops in 2017.

  babynames %>% 
    filter(name == 'Moana') %>%
    arrange(desc(n))
## # A tibble: 60 × 5
##     year sex   name      n       prop
##    <dbl> <chr> <chr> <int>      <dbl>
##  1  2017 F     Moana   141 0.0000752 
##  2  2016 F     Moana    57 0.0000296 
##  3  2008 F     Moana    19 0.00000913
##  4  2013 F     Moana    19 0.00000988
##  5  2015 F     Moana    18 0.00000925
##  6  2006 F     Moana    16 0.00000766
##  7  2014 F     Moana    16 0.0000082 
##  8  1977 F     Moana    13 0.0000079 
##  9  2000 F     Moana    13 0.00000652
## 10  1970 F     Moana    12 0.00000655
## # … with 50 more rows

The most popular year for the name Moana was 2017, the year following the release of the movie. As with the name Merida, it was a fairly uncommon name in the US before the movie, so it is easy to see the impact that the movie had on the popularity of the name.

All Princesses

Now that we’ve looked at each princess individually, we will look at the effects of all the Disney princess names together. The first step is to create a variable holding the records for all of the princess names, excluding Pocahontas because the query above returned no records for that name.

# %in% tests membership in the set of names; == would recycle the vector
# element by element and silently drop most matching rows
PrincessesNames <- babynames %>%
  filter(name %in% c("Snow", "Cinderella", "Jasmine", "Aurora", "Ariel", "Belle", "Tiana", "Mulan", "Rapunzel", "Merida", "Moana"))
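When matching a column against several names at once, the choice of operator matters: `==` compares element by element and recycles the shorter vector, while `%in%` tests whether each value belongs to the set. The toy vectors below (`observed` and `wanted`, both made up for illustration) show the difference without needing the full data set.

```r
# Toy data: five observed names and two names we want to keep
observed <- c("Snow", "Ariel", "Belle", "Snow", "Ariel")
wanted   <- c("Ariel", "Snow")

# `==` recycles `wanted` as Ariel, Snow, Ariel, Snow, Ariel and compares
# position by position; R warns because 5 is not a multiple of 2
eq_match <- suppressWarnings(observed == wanted)
eq_match      # FALSE FALSE FALSE  TRUE  TRUE

# `%in%` checks each observed name against the whole set
in_match <- observed %in% wanted
in_match      # TRUE  TRUE FALSE  TRUE  TRUE
```

With `==`, only rows that happen to line up with the recycled vector are kept, so a filter built that way returns just a fraction of the matching records.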

Now we can use that variable to visualize it in a graph.

PrincessesNames %>%
  ggplot(aes(year, n, color = name)) +
  geom_line() +
  ggtitle("Popularity of Disney Princess Names in the US") +
  ylab("Popularity") +
  xlab("Year")


The graph of the Popularity of Disney Princess Names in the US shows that the names Ariel and Jasmine are clearly the most popular princess names. As already determined, the popularity of the name Jasmine was not created by the Disney film, since the name was already rising before the release, but the sudden popularity of the name Ariel was. In summary, the individual graphs suggest that the names Snow and Aurora were not impacted by the release of the Disney Princess films. The names Cinderella, Belle, Mulan, Tiana, and Rapunzel were affected by the release of their respective Disney films, but the films did not cause the most significant increase in each name. Finally, the names Ariel, Jasmine, Merida, and Moana were the most impacted by the release of their films and had the most uses in the years following the release.
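Because Ariel and Jasmine dominate a shared axis, the less common names are hard to read in a single chart. One possible alternative, sketched here under the assumption that only female records are of interest, is a faceted plot with a separate y scale per name and `prop` (the share of births that year) in place of the raw count. The object names `princess_names` and `facet_plot` are illustrative.

```r
library(dplyr)
library(ggplot2)
library(babynames)

princess_names <- c("Snow", "Cinderella", "Aurora", "Ariel", "Belle", "Jasmine",
                    "Mulan", "Tiana", "Rapunzel", "Merida", "Moana")

# One panel per name, each with its own y axis so rare names stay visible
facet_plot <- babynames %>%
  filter(name %in% princess_names, sex == "F") %>%
  ggplot(aes(year, prop)) +
  geom_line() +
  facet_wrap(~ name, scales = "free_y") +
  ggtitle("Popularity of Disney Princess Names in the US") +
  ylab("Share of girls born that year") +
  xlab("Year")

facet_plot
```

Free scales make each trend legible but remove the ability to compare heights across panels, so this view complements the combined chart and does not replace it.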
Explanation for the Increase
It also makes sense that in some cases the most popular year for a specific name is not the year immediately following the release of the movie, because a pregnancy lasts nine months. If someone watches one of the Disney princess movies, falls in love with the name of that princess, and wants to give it to a future child but is not yet pregnant, they will not have the opportunity to use the name for at least nine months. It could take a couple of years to become pregnant and carry the baby before having the opportunity to name it. Therefore, it is reasonable for a clear increase in babies named after a princess to continue for a few years after the movie is released.
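The size of this delay can be estimated from the data by finding, for each name, the most popular year within a few years of the release. The following is a sketch under stated assumptions: the release years are the ones given in this post, the five-year window is an arbitrary choice, and `releases` and `peak_lag` are illustrative names.

```r
library(dplyr)
library(babynames)

# Release years as given in this post, for four of the princess names
releases <- tibble(
  name    = c("Ariel", "Jasmine", "Merida", "Moana"),
  release = c(1989, 1992, 2012, 2016)
)

# Peak year for girls within five years of release, and the lag in years
peak_lag <- babynames %>%
  filter(sex == "F") %>%
  inner_join(releases, by = "name") %>%
  filter(year >= release, year <= release + 5) %>%
  group_by(name) %>%
  slice_max(order_by = n, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  transmute(name, release, peak_year = year, lag_years = year - release)

peak_lag
```

For names whose films were released close to 2017, the peak found this way may simply be the last year in the data, so the lag for those names should be treated as a lower bound.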
Limitations and Further Investigations
A limitation of this data is that it stops in the year 2017. We were unable to explore several effects of Disney names, such as the longer-term effects on the name Moana, the effect of more recent Disney remakes (such as the live action Aladdin), and the unofficial 13th Disney princess Raya, whose film was released in 2021. There could also be further exploration of the names that were not affected by the release of their movie, or were affected only modestly, to investigate what other causes increased their popularity. For example, the first recorded use of the name Mulan was in the year the movie was released, but the years 2012-2017 were the six most popular years for that name. What caused the increase so long after the release of the film?