The babynames data set provides counts of the names given to babies in the United States from 1880 to 2017. Using this resource, I decided to explore the impact of Disney Princess films on the popularity of names in the US. I focused on the 12 Disney Princesses on the commonly accepted list, whose criteria are a female protagonist with some kind of royal tie in her fictional universe.
Let’s begin by loading the necessary packages in R.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(babynames)
The first step is to analyze the popularity for each princess’s name individually. We will begin with Snow White and continue chronologically.
Snow White (1937)
babynames %>%
filter(name == "Snow", sex == "F") %>% # keep girls only; sex = "F" inside aes() does not filter
ggplot(aes(year, n, color = name)) +
geom_line(color="blue2") +
ggtitle("Popularity of the name Snow in US") +
geom_vline(xintercept = 1937, color = "black", linetype = "dotted")
As the chart shows, there was no significant increase in the name Snow
immediately following the release of the Disney film ‘Snow White and the
Seven Dwarfs’ in 1937 (marked by the dotted line). The release does mark
the beginning of a slow, steady increase for the name Snow, and it is
possible that the film contributed to that smaller increase. The clearest
increase comes after the year 2000, but its cause is unclear. Since this
was the first Disney Princess film, Disney did not yet have the
popularity that it has today and therefore may not have had as
significant an influence over the population. We can check these
observations from the graph by looking at the values in a table.
babynames %>%
filter(name == 'Snow', sex == "F") %>%
arrange(desc(n))
## # A tibble: 49 × 5
## year sex name n prop
## <dbl> <chr> <chr> <int> <dbl>
## 1 2014 F Snow 77 0.0000394
## 2 2013 F Snow 59 0.0000307
## 3 2015 F Snow 59 0.0000303
## 4 2017 F Snow 59 0.0000315
## 5 2016 F Snow 55 0.0000285
## 6 2012 F Snow 35 0.0000181
## 7 2006 F Snow 17 0.00000814
## 8 2008 F Snow 17 0.00000817
## 9 2010 F Snow 15 0.00000766
## 10 2009 F Snow 14 0.00000692
## # … with 39 more rows
The table confirms that the most popular years for the name Snow were 2012 to 2017, with the rest of the top ten falling between 2006 and 2010. It is unclear what led to the name becoming so popular in recent years, but it is safe to say that the 1937 animated film was not the most significant factor in the popularity of this name.
The next name we will examine is Cinderella, and we can repeat the same process for each of the following princess names.
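Since the same filter-and-plot steps are repeated for every princess, they could be wrapped in a small helper. The sketch below is one possible way to do that. The function name `plot_princess_name` and its arguments are hypothetical and not part of the original analysis, and the three-row `snow_rows` tibble is a stand-in built from the Snow table above so that the example runs without the babynames package.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper (an assumption, not the original code): plot the yearly
# count of girls given `princess_name` and mark `release_year` with a dotted line.
plot_princess_name <- function(data, princess_name, release_year, line_color = "blue2") {
  data %>%
    filter(name == princess_name, sex == "F") %>%
    ggplot(aes(year, n)) +
    geom_line(color = line_color) +
    ggtitle(paste("Popularity of the name", princess_name, "in US")) +
    geom_vline(xintercept = release_year, color = "black", linetype = "dotted")
}

# Stand-in data: three rows copied from the Snow table above.
snow_rows <- tibble(
  year = c(2012, 2013, 2014),
  sex  = "F",
  name = "Snow",
  n    = c(35L, 59L, 77L)
)

snow_plot <- plot_princess_name(snow_rows, "Snow", 1937)
```

With the full data loaded, a call such as `plot_princess_name(babynames, "Cinderella", 1950, "deepskyblue1")` would follow the same pattern.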
Cinderella (1950)
babynames %>%
filter(name == 'Cinderella') %>%
ggplot(aes(year, n, color = name)) +
geom_line(color="deepskyblue1") +
ggtitle("Popularity of the Name Cinderella in the US") +
geom_vline(xintercept = 1950, color = "black", linetype = "dotted") +
geom_vline(xintercept = 1922, color ="gray", linetype = "dotted")
The graph shows that the name Cinderella has always varied in
popularity. The Disney film that most people associate with Cinderella
was released in 1950 (black dotted line), but an earlier version was
released in 1922 (gray dotted line). We can see an increase in the years
around the release of both films, slightly larger around 1922, which
suggests that both movies had an impact on the popularity of the name.
Additionally, there have been countless remakes using the popular
rags-to-riches “Cinderella trope” that could contribute to the continued
popularity of the name.
babynames %>%
filter(name == 'Cinderella') %>%
arrange(desc(n))
## # A tibble: 82 × 5
## year sex name n prop
## <dbl> <chr> <chr> <int> <dbl>
## 1 1922 F Cinderella 25 0.0000200
## 2 1921 F Cinderella 23 0.0000180
## 3 1951 F Cinderella 23 0.0000124
## 4 1927 F Cinderella 21 0.0000170
## 5 1952 F Cinderella 20 0.0000105
## 6 1950 F Cinderella 19 0.0000108
## 7 1924 F Cinderella 18 0.0000139
## 8 1958 F Cinderella 18 0.00000872
## 9 1954 F Cinderella 17 0.00000854
## 10 1917 F Cinderella 16 0.0000142
## # … with 72 more rows
The table shows that most of the top years for the name Cinderella fall in or shortly after the release years of the two movies. But it is also important to consider the origin of this name, so we will load a table to see when the name was first used.
babynames %>%
filter(name == 'Cinderella')
## # A tibble: 82 × 5
## year sex name n prop
## <dbl> <chr> <chr> <int> <dbl>
## 1 1894 F Cinderella 7 0.0000297
## 2 1897 F Cinderella 5 0.0000201
## 3 1900 F Cinderella 5 0.0000157
## 4 1904 F Cinderella 8 0.0000274
## 5 1907 F Cinderella 6 0.0000178
## 6 1911 F Cinderella 7 0.0000158
## 7 1912 F Cinderella 5 0.00000852
## 8 1913 F Cinderella 11 0.0000168
## 9 1914 F Cinderella 6 0.00000753
## 10 1915 F Cinderella 11 0.0000107
## # … with 72 more rows
It can also be noted that the first recorded use of the name Cinderella in this data set was in 1894. The Grimm brothers published their version of the Cinderella story in 1812, so it is possible that the story influenced the name, but only after a delay of about 80 years.
Aurora (1959)
babynames %>%
filter(name == "Aurora", sex == "F") %>% # keep girls only; sex = "F" inside aes() does not filter
ggplot(aes(year, n, color = name)) +
geom_line(color= "deeppink1") +
ggtitle("Popularity of the name Aurora in US") +
geom_vline(xintercept = 1959, color = "black", linetype = "dotted")
The original Sleeping Beauty Disney film came out in 1959. Even
though that was not a significantly popular year for the name, there is
still a possibility that the Disney story affected this name’s
popularity. The largest increase is in the more recent years leading up
to 2017. In 2014 the movie Maleficent, which was based on the Sleeping
Beauty story, came out, and it could have had an impact on the recent
popularity of the name.
babynames %>%
filter(name == 'Aurora') %>%
arrange(desc(n)) %>%
head(20)
## # A tibble: 20 × 5
## year sex name n prop
## <dbl> <chr> <chr> <int> <dbl>
## 1 2017 F Aurora 4573 0.00244
## 2 2016 F Aurora 3983 0.00207
## 3 2015 F Aurora 3639 0.00187
## 4 2014 F Aurora 2739 0.00140
## 5 2013 F Aurora 2127 0.00111
## 6 2012 F Aurora 1903 0.000983
## 7 2011 F Aurora 1724 0.000891
## 8 2010 F Aurora 1527 0.000780
## 9 2009 F Aurora 1487 0.000735
## 10 2008 F Aurora 1173 0.000564
## 11 2006 F Aurora 1067 0.000511
## 12 2007 F Aurora 1066 0.000504
## 13 2004 F Aurora 989 0.000490
## 14 2005 F Aurora 950 0.000468
## 15 2003 F Aurora 810 0.000404
## 16 2001 F Aurora 684 0.000345
## 17 2002 F Aurora 664 0.000336
## 18 2000 F Aurora 557 0.000279
## 19 1999 F Aurora 520 0.000267
## 20 1998 F Aurora 503 0.000260
The table shows that the name has increased in popularity in recent years. I expanded the list to the top 20 rows, and every one of them is from 1998 or later. Before the steady increase since about 2000, the late 1920s were the most popular period for the name Aurora, although it is unclear why. I believe that a possible reason the movie did not have a significant impact on the name is that the main character Aurora is mainly referred to as Sleeping Beauty and appears in only a small portion of the film.
Ariel (1989)
babynames %>%
filter(name == 'Ariel') %>%
ggplot(aes(year, n, color = name)) +
geom_line(color= "brown1") +
ggtitle("Popularity of the name Ariel in US") +
geom_vline(xintercept = 1989, color = "black", linetype = "dotted")
The graph shows that there is a significant increase in the name
Ariel following the release of the movie. The Disney animated film “The
Little Mermaid” was released in 1989, and since the name was practically
non-existent before the film, it is clear that the movie had a
significant influence on the name Ariel.
babynames %>%
filter(name == 'Ariel') %>%
arrange(desc(n))
## # A tibble: 208 × 5
## year sex name n prop
## <dbl> <chr> <chr> <int> <dbl>
## 1 1991 F Ariel 5410 0.00266
## 2 1992 F Ariel 3960 0.00198
## 3 1990 F Ariel 3606 0.00176
## 4 1993 F Ariel 2708 0.00137
## 5 2014 F Ariel 2434 0.00125
## 6 2015 F Ariel 2344 0.00120
## 7 1997 F Ariel 2212 0.00116
## 8 1994 F Ariel 2187 0.00112
## 9 2016 F Ariel 2185 0.00113
## 10 1995 F Ariel 2149 0.00112
## # … with 198 more rows
According to the table, the four most popular years for the name Ariel were the four years following the release of the movie (1990 to 1993), which strongly suggests that the film had a major influence on the name Ariel.
Belle (1991)
babynames %>%
filter(name == 'Belle') %>%
ggplot(aes(year, n, color = name)) +
geom_line(color= "darkgoldenrod1" ) +
ggtitle("Popularity of the name Belle in US") +
geom_vline(xintercept = 1991, color = "black", linetype = "dotted")+
geom_vline(xintercept = 2017, color = "gray", linetype = "dotted")
As can be seen on the graph, there were a significant number of
babies named Belle in the 1880s, with the largest increase right before
the 1920s. The Disney animated film “Beauty and the Beast” was released
in 1991, which appears to coincide with another increase in the name’s
popularity. Considering that the name had been declining since the peak
before the 1920s, I believe that the film had an impact on the name’s
popularity, even if it did not create the largest increase. A live
action remake of “Beauty and the Beast” was also released in 2017 (gray
dotted line), which could have been a factor in the final increase.
However, we would need baby name data past 2017 in order to see that
trend.
babynames %>%
filter(name == 'Belle') %>%
arrange(desc(n)) %>%
head(30)
## # A tibble: 30 × 5
## year sex name n prop
## <dbl> <chr> <chr> <int> <dbl>
## 1 1915 F Belle 407 0.000398
## 2 1914 F Belle 353 0.000443
## 3 2017 F Belle 335 0.000179
## 4 1916 F Belle 334 0.000308
## 5 1888 F Belle 317 0.00167
## 6 1917 F Belle 314 0.000279
## 7 1890 F Belle 304 0.00151
## 8 2016 F Belle 293 0.000152
## 9 1900 F Belle 291 0.000916
## 10 1884 F Belle 286 0.00208
## # … with 20 more rows
The name Belle is one of the older names being studied from this Disney set. According to the table, the most popular time to use the name Belle was around 1915, so the Disney movie did not have the largest impact on this name, but it is possible that the live action movie has been an influence on its recent popularity.
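Because far fewer births were recorded in the 1880s than in the 1910s, raw counts (`n`) and shares of births (`prop`) can rank the same years differently. The sketch below re-sorts four rows copied from the table above by `prop` instead of `n`. The data frame name `belle_rows` is made up for illustration.

```r
# Four rows copied from the Belle table above (year, count, share of births).
belle_rows <- data.frame(
  year = c(1915, 2017, 1888, 1884),
  n    = c(407, 335, 317, 286),
  prop = c(0.000398, 0.000179, 0.00167, 0.00208)
)

# Rank by raw count, as in the table above.
by_count <- belle_rows[order(-belle_rows$n), ]

# Rank by share of births, which adjusts for how many births were recorded each year.
by_share <- belle_rows[order(-belle_rows$prop), ]

print(by_count$year)
print(by_share$year)
```

Measured by share, the 1880s rows move ahead of 1915 and 2017 falls to last among these four, so the choice of measure matters when comparing eras.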
Jasmine (1992)
babynames %>%
filter(name == "Jasmine", sex == "F") %>% # keep girls only; sex = "F" inside aes() does not filter
ggplot(aes(year, n, color = name)) +
geom_line(color="cyan3") +
ggtitle("Popularity of the name Jasmine in US") +
geom_vline(xintercept = 1992, color = "black", linetype = "dotted")
From the graph we can see that following the release of the movie
‘Aladdin’ in 1992 there was an increase in the name Jasmine. However, we
can also see that there was a steady increase in the name before the
release of the movie, so the Disney movie was not the only cause of this
increase. Therefore it is clear that the popularity of the name Jasmine
was not created by the Disney film, but it was likely affected by it.
babynames %>%
filter(name == 'Jasmine', sex == "F") %>%
arrange(desc(n))
## # A tibble: 84 × 5
## year sex name n prop
## <dbl> <chr> <chr> <int> <dbl>
## 1 1993 F Jasmine 12060 0.00612
## 2 1994 F Jasmine 11711 0.00601
## 3 1991 F Jasmine 11524 0.00567
## 4 1990 F Jasmine 11035 0.00537
## 5 1992 F Jasmine 10475 0.00523
## 6 1995 F Jasmine 10279 0.00535
## 7 1996 F Jasmine 9708 0.00506
## 8 1997 F Jasmine 9678 0.00507
## 9 1989 F Jasmine 9549 0.00479
## 10 1998 F Jasmine 9484 0.00489
## # … with 74 more rows
As seen in the table the two years following the release of the movie were the two most popular years for the name Jasmine.
Mulan (1998)
The first use of the name Mulan was in 1998, which was the same year that the Disney film was released.
babynames %>%
filter(name == 'Mulan') %>%
ggplot(aes(year, n, color = name)) +
geom_line(color="darkolivegreen") +
ggtitle("Popularity of the name Mulan in US") +
geom_vline(xintercept = 1998, color = "black", linetype = "dotted")
The graph shows that there was no use of the name Mulan before the
release of the film in 1998. It can also be seen that there has been an
increase in the name’s popularity since about 2010. Even though there
have been more popular years for the name Mulan since the film was
released, it is still clear that the Disney movie had an impact because
it essentially created the name’s popularity.
babynames %>%
filter(name == 'Mulan') %>%
arrange(desc(n))
## # A tibble: 19 × 5
## year sex name n prop
## <dbl> <chr> <chr> <int> <dbl>
## 1 2016 F Mulan 32 0.0000166
## 2 2017 F Mulan 29 0.0000155
## 3 2013 F Mulan 27 0.0000140
## 4 2014 F Mulan 26 0.0000133
## 5 2012 F Mulan 23 0.0000119
## 6 2015 F Mulan 23 0.0000118
## 7 1998 F Mulan 16 0.00000826
## 8 2009 F Mulan 11 0.00000544
## 9 2010 F Mulan 11 0.00000562
## 10 2006 F Mulan 10 0.00000479
## 11 2001 F Mulan 8 0.00000404
## 12 2008 F Mulan 8 0.00000384
## 13 2011 F Mulan 8 0.00000413
## 14 1999 F Mulan 7 0.0000036
## 15 2000 F Mulan 6 0.00000301
## 16 2002 F Mulan 6 0.00000304
## 17 1999 M Mulan 5 0.00000245
## 18 2005 F Mulan 5 0.00000247
## 19 2007 F Mulan 5 0.00000236
According to the table, 2016 and 2017 were the most popular years for the name Mulan. The origin of this increase is unclear. Disney did release a live action remake of Mulan, but that was in 2020, which lies outside the range of this baby names data set. The seventh most popular year for the name was the year that the film was released, and as we saw on the graph, there were no recorded uses of this name before the release of the film.
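To put numbers on the two periods described above, the sketch below copies the girls' rows from the Mulan table into a data frame (the name `mulan_f` is made up for illustration) and compares the average yearly count before 2012 with the average from 2012 onward. The 2012 cut-off is an assumption, chosen because the six most popular years all fall from 2012 onward.

```r
# Girls' rows copied from the Mulan table above.
mulan_f <- data.frame(
  year = c(1998, 1999, 2000, 2001, 2002, 2005, 2006, 2007, 2008, 2009,
           2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017),
  n    = c(16, 7, 6, 8, 6, 5, 10, 5, 8, 11,
           11, 8, 23, 27, 26, 23, 32, 29)
)

# First year in which the name is recorded.
first_year <- min(mulan_f$year)

# Average count over the recorded years in each period
# (years missing from the table are not counted as zero).
early_mean <- mean(mulan_f$n[mulan_f$year < 2012])
late_mean  <- mean(mulan_f$n[mulan_f$year >= 2012])

print(c(first_year = first_year, early_mean = early_mean, late_mean = late_mean))
```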
Pocahontas (1995)
babynames %>%
filter(name == 'Pocahantas')
## # A tibble: 0 × 5
## # … with 5 variables: year <dbl>, sex <chr>, name <chr>, n <int>, prop <dbl>
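The query returned no rows, but it searches for the spelling 'Pocahantas' rather than 'Pocahontas', so the result should be re-checked with the correct spelling before concluding that the Disney movie had no effect on the popularity of the name. An empty result would also only mean that the name was never given to at least five babies of one sex in a single year, which is the threshold for a name to appear in this data set.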
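An exact-match filter returns no rows whenever the search string is spelled even slightly differently from the data, so a pattern match on the first few letters is a more forgiving check. The sketch below uses a made-up vector `example_names` rather than the real data, so it says nothing about what babynames actually contains.

```r
# Made-up vector for illustration only; not taken from babynames.
example_names <- c("Pocahontas", "Snow", "Belle")

# Exact match with a misspelled search string finds nothing.
exact_hits <- example_names[example_names == "Pocahantas"]

# Matching on the leading letters tolerates the spelling difference.
pattern_hits <- example_names[grepl("^Poca", example_names)]

print(length(exact_hits))
print(pattern_hits)
```

On the real data, the same idea would be `babynames %>% filter(grepl("^Poca", name)) %>% distinct(name)`.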
Tiana (2009)
babynames %>%
filter(name == 'Tiana') %>%
ggplot(aes(year, n, color = name)) +
geom_line(color="chartreuse2") +
ggtitle("Popularity of the name Tiana in US") +
geom_vline(xintercept = 2009, color = "black", linetype = "dotted")
From the graph it can be seen that the most popular years for the
name Tiana were the years leading up to 2000, well before the Disney
animated film “The Princess and the Frog” was released in 2009. The
reason behind the name’s popularity in the years prior to 2000 is
unclear, but afterwards there was a decrease until the film was released.
The film therefore still appears to have had an impact on the popularity
of the name Tiana, because its release was followed by an increase in a
name that had been decreasing for about 10 years.
babynames %>%
filter(name == 'Tiana') %>%
arrange(desc(n))
## # A tibble: 65 × 5
## year sex name n prop
## <dbl> <chr> <chr> <int> <dbl>
## 1 1998 F Tiana 1028 0.000530
## 2 1995 F Tiana 1025 0.000534
## 3 2010 F Tiana 964 0.000492
## 4 1994 F Tiana 948 0.000486
## 5 1999 F Tiana 939 0.000482
## 6 1996 F Tiana 937 0.000489
## 7 1997 F Tiana 937 0.000491
## 8 2000 F Tiana 887 0.000445
## 9 2001 F Tiana 865 0.000437
## 10 1993 F Tiana 825 0.000419
## # … with 55 more rows
According to the table, the third most popular year for the name Tiana was 2010, the year after the Disney movie was released. Even though that was not the most popular year for the name Tiana, it can still be concluded that the movie had an impact on the name because it changed the trend from decreasing to increasing.
Rapunzel (2010)
babynames %>%
filter(name == 'Rapunzel') %>%
ggplot(aes(year, n, color = name)) +
geom_line(color="orchid") +
ggtitle("Popularity of the name Rapunzel in US") +
geom_vline(xintercept = 2010, color = "black", linetype = "dotted")
The name Rapunzel was clearly not popular. The Grimm brothers
published the original story in 1812, and the popular Disney film
“Tangled”, based on the Rapunzel story, was released in 2010. However,
neither date is reflected on the graph, and it can be concluded that the
Disney film most likely did not impact the popularity of the name. But we
should look at a table to see more clearly when the name was used.
babynames %>%
filter(name == 'Rapunzel') %>%
arrange(desc(n))
## # A tibble: 2 × 5
## year sex name n prop
## <dbl> <chr> <chr> <int> <dbl>
## 1 1959 F Rapunzel 9 0.00000433
## 2 2017 F Rapunzel 6 0.0000032
The table shows that the only years with recorded uses of the name Rapunzel were 1959 and 2017. It is possible that the 2010 film was a factor in the sudden use of the name in 2017, but if that is the reason, it is unclear why that would occur in 2017 and not closer to the release of the film.
Merida (2012)
babynames %>%
filter(name == 'Merida') %>%
ggplot(aes(year, n, color = name)) +
geom_line(color = "darkcyan") +
ggtitle("Popularity of the name Merida in US") +
geom_vline(xintercept = 2012, color = "black", linetype = "dotted")
The name Merida shows a significant increase in recent years, most likely
because the Disney Pixar movie “Brave” was released in 2012. As the graph
shows, there were very few uses of the name until the release of the
film, which was followed by the steepest increase we’ve seen so far.
babynames %>%
filter(name == 'Merida') %>%
arrange(desc(n))
## # A tibble: 58 × 5
## year sex name n prop
## <dbl> <chr> <chr> <int> <dbl>
## 1 2017 F Merida 116 0.0000619
## 2 2013 F Merida 110 0.0000572
## 3 2014 F Merida 103 0.0000528
## 4 2016 F Merida 101 0.0000524
## 5 2015 F Merida 99 0.0000509
## 6 2012 F Merida 19 0.00000981
## 7 1949 F Merida 13 0.00000741
## 8 1957 F Merida 12 0.00000572
## 9 1943 F Merida 11 0.00000766
## 10 1968 F Merida 11 0.00000643
## # … with 48 more rows
The table shows that the most common years for the name Merida were the years following the release of the film. It is logical to conclude that the Disney film had a significant impact on the popularity of the name Merida.
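One way to quantify how steep the increase was is to compute the year-over-year change in the count. The sketch below does this for the 2012 to 2017 rows copied from the table above. The data frame name `merida_rows` is made up for illustration.

```r
# Rows for 2012-2017 copied from the Merida table above.
merida_rows <- data.frame(
  year = 2012:2017,
  n    = c(19, 110, 103, 99, 101, 116)
)

# Change in count relative to the previous year (NA for the first row).
merida_rows$change <- c(NA, diff(merida_rows$n))

# Year with the largest single-year jump (which.max skips the NA).
biggest_jump_year <- merida_rows$year[which.max(merida_rows$change)]

print(merida_rows)
print(biggest_jump_year)
```

The largest jump is from 2012 to 2013, the year after the film's release.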
Moana (2016)
babynames %>%
filter(name == 'Moana') %>%
ggplot(aes(year, n, color = name)) +
geom_line(color = "steelblue1") +
ggtitle("Popularity of the name Moana in US") +
geom_vline(xintercept = 2016, color = "black", linetype = "dotted")
The graph shows that there is a significant increase in the name Moana
in recent years, most likely because the Disney movie was released in
2016. The data is limited here, because the movie was released in 2016
and the baby names data stops in 2017.
babynames %>%
filter(name == 'Moana') %>%
arrange(desc(n))
## # A tibble: 60 × 5
## year sex name n prop
## <dbl> <chr> <chr> <int> <dbl>
## 1 2017 F Moana 141 0.0000752
## 2 2016 F Moana 57 0.0000296
## 3 2008 F Moana 19 0.00000913
## 4 2013 F Moana 19 0.00000988
## 5 2015 F Moana 18 0.00000925
## 6 2006 F Moana 16 0.00000766
## 7 2014 F Moana 16 0.0000082
## 8 1977 F Moana 13 0.0000079
## 9 2000 F Moana 13 0.00000652
## 10 1970 F Moana 12 0.00000655
## # … with 50 more rows
The most popular year for the name Moana was 2017, the year following the release of the movie. As with Merida, the name was fairly uncommon in the US before the movie, so it is easy to see the impact that the movie had on its popularity.
All Princesses
Now that we’ve looked at each princess individually, we will look at the effects of all the Disney princesses’ names together. The first step is to create a variable for all of the princess names, excluding Pocahontas because the query above returned no records for that name.
# %in% tests membership in the set of names; == would recycle the vector
# element by element and drop most matching rows.
# sex == "F" keeps one row per name per year so the lines do not zigzag.
babynames %>%
filter(sex == "F", name %in% c("Snow", "Cinderella", "Jasmine", "Aurora", "Ariel", "Belle", "Tiana", "Mulan", "Rapunzel", "Merida", "Moana")) -> PrincessesNames
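Comparing a column to a vector of several names with `==` does not test membership: R recycles the shorter vector and compares element by element, so most matching rows are silently dropped. The `%in%` operator tests whether each value appears anywhere in the set. A minimal base R sketch with a made-up vector `nm`:

```r
# Made-up vector standing in for the name column.
nm <- c("Ariel", "Belle", "Ariel", "Belle", "Snow", "Ariel")
wanted <- c("Belle", "Ariel")

# `==` recycles `wanted` as Belle, Ariel, Belle, Ariel, ... and compares by position.
positional <- nm == wanted

# `%in%` checks each element of `nm` against the whole set.
membership <- nm %in% wanted

print(positional)
print(membership)
```

Only one of the five matching elements survives the positional comparison, while `%in%` keeps all five.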
Now we can use that variable to visualize it in a graph.
PrincessesNames %>%
ggplot(aes(year, n, color = name)) +
geom_line() +
ggtitle("Popularity of Disney Princess Names in the US") +
ylab("Popularity") +
xlab("Year")
The graph of the popularity of Disney Princess names in the US shows
that the names Ariel and Jasmine are clearly the most popular princess
names. As already determined, the popularity of the name Jasmine was not
a direct result of the Disney film, but the sudden popularity of the
name Ariel was. In summary, from the individual graphs, the names Snow
and Aurora were not noticeably impacted by the release of their Disney
Princess films. The names Cinderella, Belle, Mulan, Tiana, and Rapunzel
were affected by the release of their respective Disney films, but the
films did not cause the most significant increase in each name. Finally,
the names Ariel, Jasmine, Merida, and Moana were the most impacted by the
release of their films and had the most uses in the years following the
release.
Explanation for the Increase
It also makes sense that in some cases the most popular year for a
specific name is not the year immediately following the release of the
movie, because a pregnancy lasts nine months. If someone watches one of
the Disney princess movies, falls in love with the name of that princess,
and wants to give it to a future child, but is not yet pregnant, then
they will not have the opportunity to do so for at least nine months. It
could take a couple of years to become pregnant and carry the baby before
having the opportunity to name it. Therefore, it is reasonable to see a
clear increase in babies being named after a princess over the few years
following the movie’s release.
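The delay described above can be read off the tables by subtracting each film's release year from the name's most popular year. The sketch below does this for four names whose peak follows the film, using release years and peak years copied from the sections above. The data frame name `peaks` is made up for illustration.

```r
# Release years and most popular years copied from the sections above.
peaks <- data.frame(
  name         = c("Ariel", "Jasmine", "Merida", "Moana"),
  release_year = c(1989, 1992, 2012, 2016),
  peak_year    = c(1991, 1993, 2017, 2017)
)

# Years between the film's release and the name's most popular year.
peaks$lag_years <- peaks$peak_year - peaks$release_year

print(peaks)
```

For Merida and Moana the peak is the last year in the data, so the true lag may be longer.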
Limitations and Further Investigations
A limitation of this data is that it stops in the year 2017. We were
unable to explore several effects of Disney names, such as the
longer-term effect on the name Moana, the effect of more recent Disney
remakes (such as the live action Aladdin), and the unofficial 13th Disney
princess Raya, whose film was released in 2021. There could also be
further exploration of how Disney Princess films affect the popularity of
US names. One option would be to look more closely at the names that were
not affected by the release of their movie, or were affected but not most
strongly, and investigate what other causes increased the popularity of
these names. For example, the first use of the name Mulan was in the year
the movie was released, but the years 2012-2017 were the six most popular
years for that name. What caused the increase so long after the release
of the film?