Hypothesis: Parents do not choose their babies’ names based on how common a color is or on the color itself.
To test this hypothesis, I analyzed the trends (if any) for the basic color names in the babynames dataset (Red, Orange, Yellow, Green, Blue, Violet) and then cross-referenced babynames against a second dataset that lists color shade names grouped by color type.
This analysis uses R and RStudio with the babynames and tidyverse packages. It also uses the scales package to clean up the axis number formatting in some of the visuals.
To begin, I looked at the popularity, if any, of the basic rainbow color names over time. Note that the code below keeps only the male records (sex == "M"). From the resulting plot, I concluded that baby names based on the basic rainbow colors (Red, Orange, Yellow, Green, Blue, Violet) have decreased drastically in popularity and have not been popular for quite some time.
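library(babynames)
library(tidyverse)
# Basic rainbow color names to look up
rainbow = c("Red", "Orange", "Yellow", "Green", "Blue", "Violet")
# Keep only the male records whose name is one of the rainbow colors
babynames %>%
  filter(sex == "M" & name %in% rainbow) -> rainbowNames
# Draw one line per name; without color = name, all names are joined into a single zigzag line
ggplot(rainbowNames, aes(year, prop, color = name)) + geom_line()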
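One way to check the downward trend numerically, rather than only by eye, is to total the counts per decade. The sketch below is a minimal illustration under stated assumptions: it uses a small stand-in data frame built from five of the Green rows printed further down in this report, and the `toy`, `decade` and `by_decade` names are hypothetical, not part of the original analysis.

```r
library(dplyr)

# Toy stand-in for rainbowNames: five Green rows copied from the table of
# highest counts in this report (year and n only).
toy <- data.frame(
  year = c(1886, 1914, 1916, 1917, 1927),
  name = "Green",
  n    = c(45L, 46L, 47L, 42L, 48L),
  stringsAsFactors = FALSE
)

# Bucket each year into its decade, then total the counts per decade.
by_decade <- toy %>%
  mutate(decade = (year %/% 10) * 10) %>%
  group_by(decade) %>%
  summarise(total = sum(n))

by_decade
```

Replacing `toy` with `rainbowNames` would apply the same steps to the real data.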
To analyze this further, I totaled the uses of these color names in the filtered data. Because rainbowNames holds only male records, this gives a total of just 3,783 instances, and the highest single-year ‘n’ value is 48 (Green in 1927). Both numbers are very low.
rainbowNames %>%
summarise(total = sum(n))
## # A tibble: 1 x 1
## total
## <int>
## 1 3783
rainbowNames %>%
arrange(desc(n))
## # A tibble: 268 x 5
## year sex name n prop
## <dbl> <chr> <chr> <int> <dbl>
## 1 1927 M Green 48 0.0000413
## 2 1916 M Green 47 0.0000509
## 3 1914 M Green 46 0.0000673
## 4 1886 M Green 45 0.000378
## 5 1920 M Green 44 0.0000400
## 6 1918 M Green 43 0.000041
## 7 1917 M Green 42 0.0000438
## 8 1919 M Green 42 0.0000414
## 9 1921 M Green 42 0.0000369
## 10 1922 M Green 42 0.0000373
## # … with 258 more rows
To test my hypothesis further, I loaded a second dataset of color names to cross-reference with the babynames set for any overlap. I found the dataset on https://data.world/.
df <- read.csv("https://query.data.world/s/3hl25k2iqijv37rxwpb4voyofdh5yx", header=TRUE, stringsAsFactors=FALSE)
After importing the dataset into R, I compared the babynames and color names data frames and kept only the names that appear in both. I used the intersect() function on the two name columns, which returns the unique values common to both, and called the resulting character vector colorsList. These overlapping names are the ones I analyze in the rest of this report.
intersect(babynames$name, df$Color.Name) -> colorsList
colorsList
## [1] "Rose" "Myrtle" "Pearl" "Olive" "Ruby" "Violet"
## [7] "Veronica" "Iris" "Amber" "Magnolia" "Cherry" "Jasper"
## [13] "Pink" "Ivory" "Coral" "Lemon" "Almond" "Olivine"
## [19] "Gray" "Red" "Auburn" "Carmine" "Snow" "Ceil"
## [25] "Silver" "Vanilla" "Fawn" "White" "Emerald" "Jasmine"
## [31] "Ginger" "Onyx" "Salmon" "Lion" "Pear" "Orchid"
## [37] "Bubbles" "Ruddy" "Jonquil" "Scarlet" "Jade" "Ebony"
## [43] "Cerise" "Rajah" "Buff" "Capri" "Flame" "Jet"
## [49] "Teal" "Sand" "Beaver" "Sienna" "Tangerine" "Cadet"
## [55] "Cinnamon" "Amethyst" "Blue" "Aqua" "Burgundy" "Indigo"
## [61] "Topaz" "Sepia" "Denim" "Crimson" "Saffron" "Shadow"
## [67] "Azure" "Mahogany" "Sangria" "Tan" "Turquoise" "Champagne"
## [73] "Copper" "Lilac" "Sapphire" "Wisteria" "Beige" "Magenta"
## [79] "Sunset" "Cyan" "Xanadu" "Umber" "Tangelo" "Bronze"
## [85] "Maize" "Platinum" "Harlequin" "Thistle" "Linen" "Cobalt"
## [91] "Amaranth" "Cerulean"
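Because intersect() matches strings exactly, names that differ only in capitalization or stray spaces would not count as overlapping. Whether the color dataset contains such entries is not known from this report, so the sketch below is only an illustration on made-up vectors; `normalise_name` is a hypothetical helper, not part of the original analysis.

```r
# Hypothetical helper: trim whitespace and convert to "Title" case,
# e.g. " JASPER" becomes "Jasper".
normalise_name <- function(x) {
  x <- trimws(x)
  paste0(toupper(substr(x, 1, 1)), tolower(substr(x, 2, nchar(x))))
}

baby   <- c("Rose", "Jasper", "Mary")            # toy stand-in for babynames$name
colors <- c("rose", " JASPER", "Burnt Sienna")   # toy stand-in for df$Color.Name

# Exact matching finds nothing in common in this toy example
exact_match <- intersect(baby, colors)

# Matching after normalising the color names finds two shared names
loose_match <- intersect(baby, normalise_name(colors))
loose_match
```

If the two columns are already formatted consistently, the plain intersect() call used above gives the same result.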
This intersection gave me 92 names, but I wanted to narrow them down further. To do so, I totaled the counts for each name and sex combination in colorsList and kept the 15 largest totals. I called this visual chart1.
babynames %>%
  filter(name %in% colorsList) %>%
  # Total count per name and sex across all years
  group_by(name, sex) %>%
  summarise(total = sum(n)) %>%
  arrange(desc(total)) %>%
  # Keep the 15 largest totals
  head(15) %>%
  ggplot(aes(reorder(name, total), total)) + geom_col() + coord_flip() -> chart1
chart1
The chart produced by this code was not very clean. To make it easier to read, I added a theme, axis titles and a chart title, and I replaced the scientific notation on the count axis with comma-separated numbers using the scales package. Note that coord_flip() swaps the displayed axes: the names are still the x aesthetic and the totals are still the y aesthetic, which is why the code below sets the labels with scale_y_continuous() even though the totals appear along the horizontal axis.
# install.packages("scales")  # run once from the console if scales is not installed
library(scales)
chart1 + theme_light() +
xlab("Name") +
ylab("Total Number of Names Over Time") +
ggtitle("Most Popular Baby Names of Colors from 1880-2017") +
scale_y_continuous(labels = comma)
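To see what the comma labeller does on its own, here is a small sketch that formats a few numbers outside of a plot. The input values are arbitrary examples, not figures from the dataset.

```r
library(scales)

# Arbitrary example values, large enough that an axis might otherwise
# be labelled in scientific notation
totals <- c(250000, 1000000, 1234567)

# comma() inserts thousands separators and returns character labels
labels <- comma(totals)
labels
```

Passing `labels = comma` to scale_y_continuous() applies this same function to each axis break.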
These visuals suggest that the color names parents give their children are not necessarily based on the most common colors. To examine this more closely, and to compare male and female names across 1880-2017, I created name rankings for the first year (1880), the year 1949 (the midpoint of the range is 1948.5) and the final year (2017).
babynames %>%
filter(name %in% colorsList) %>%
filter(year == 1880, sex =="F") %>%
arrange(desc(prop)) %>%
mutate(rank = row_number()) %>%
mutate(percent = (prop * 100)) -> rankedColorsF1880
rankedColorsF1880
## # A tibble: 11 x 7
## year sex name n prop rank percent
## <dbl> <chr> <chr> <int> <dbl> <int> <dbl>
## 1 1880 F Rose 700 0.00717 1 0.717
## 2 1880 F Myrtle 615 0.00630 2 0.630
## 3 1880 F Pearl 569 0.00583 3 0.583
## 4 1880 F Olive 224 0.00229 4 0.229
## 5 1880 F Ruby 92 0.000943 5 0.0943
## 6 1880 F Violet 42 0.000430 6 0.0430
## 7 1880 F Veronica 14 0.000143 7 0.0143
## 8 1880 F Iris 11 0.000113 8 0.0113
## 9 1880 F Amber 9 0.0000922 9 0.00922
## 10 1880 F Magnolia 8 0.0000820 10 0.00820
## 11 1880 F Cherry 6 0.0000615 11 0.00615
babynames %>%
filter(name %in% colorsList) %>%
filter(year == 1880, sex =="M") %>%
arrange(desc(prop)) %>%
mutate(rank = row_number()) %>%
mutate(percent = (prop * 100)) -> rankedColorsM1880
rankedColorsM1880
## # A tibble: 7 x 7
## year sex name n prop rank percent
## <dbl> <chr> <chr> <int> <dbl> <int> <dbl>
## 1 1880 M Jasper 98 0.000828 1 0.0828
## 2 1880 M Pearl 62 0.000524 2 0.0524
## 3 1880 M Pink 30 0.000253 3 0.0253
## 4 1880 M Ivory 8 0.0000676 4 0.00676
## 5 1880 M Rose 7 0.0000591 5 0.00591
## 6 1880 M Ruby 6 0.0000507 6 0.00507
## 7 1880 M Myrtle 5 0.0000422 7 0.00422
During 1880, the male and female lists overlapped only in the names “Rose,” “Myrtle,” “Pearl” and “Ruby.” This was interesting to note. “Rose” had the highest count for females (700) but only 7 instances for males. “Myrtle” had the second-highest count for females and the seventh-highest for males. “Myrtle” is not a common color name, which suggests that parents chose it for reasons other than the color. “Pearl” had 569 instances for females compared to 62 for males, and “Ruby” had 92 instances for females and 6 for males.
Comparing the sexes in 1880, color names were much more common amongst females than males: the female list has 11 names, while the male list has only seven. There was some overlap between the two lists, but not enough for the names to be considered “popular.”
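The ranking pipeline is repeated for each year and sex, changing only two values. As a sketch of how that repetition could be wrapped up, the hypothetical helper below (`rank_names` is an assumed name, not part of the original analysis) takes the year and sex as arguments. It is demonstrated on a toy data frame whose Rose, Pearl and Jasper rows copy values from the 1880 tables; the Mary row is made up to show that non-color names are filtered out.

```r
library(dplyr)

# Hypothetical helper: rank the names in `names_to_keep` for one year and
# one sex, mirroring the filter/arrange/mutate steps used in this report.
rank_names <- function(data, names_to_keep, yr, sx) {
  data %>%
    filter(name %in% names_to_keep, year == yr, sex == sx) %>%
    arrange(desc(prop)) %>%
    mutate(rank = row_number(),
           percent = prop * 100)
}

# Toy data with the same columns as babynames
toy <- data.frame(
  year = c(1880, 1880, 1880, 1880),
  sex  = c("F", "F", "M", "F"),
  name = c("Pearl", "Rose", "Jasper", "Mary"),
  n    = c(569L, 700L, 98L, 1000L),            # Mary's n is made up
  prop = c(0.00583, 0.00717, 0.000828, 0.01),  # Mary's prop is made up
  stringsAsFactors = FALSE
)

ranked <- rank_names(toy, c("Rose", "Pearl", "Jasper"), 1880, "F")
ranked
```

With the real data, a call such as `rank_names(babynames, colorsList, 1949, "F")` should reproduce the corresponding table, assuming the same packages are loaded.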
babynames %>%
filter(name %in% colorsList) %>%
filter(year == 1949, sex =="F") %>%
arrange(desc(prop)) %>%
mutate(rank = row_number()) %>%
mutate(percent = (prop * 100)) -> rankedColorsF1949
rankedColorsF1949
## # A tibble: 20 x 7
## year sex name n prop rank percent
## <dbl> <chr> <chr> <int> <dbl> <int> <dbl>
## 1 1949 F Rose 5370 0.00306 1 0.306
## 2 1949 F Ruby 2576 0.00147 2 0.147
## 3 1949 F Veronica 1424 0.000811 3 0.0811
## 4 1949 F Iris 920 0.000524 4 0.0524
## 5 1949 F Ginger 665 0.000379 5 0.0379
## 6 1949 F Pearl 575 0.000328 6 0.0328
## 7 1949 F Myrtle 523 0.000298 7 0.0298
## 8 1949 F Violet 483 0.000275 8 0.0275
## 9 1949 F Cherry 292 0.000166 9 0.0166
## 10 1949 F Olive 97 0.0000553 10 0.00553
## 11 1949 F Amber 93 0.0000530 11 0.00530
## 12 1949 F Ivory 53 0.0000302 12 0.00302
## 13 1949 F Coral 47 0.0000268 13 0.00268
## 14 1949 F Magnolia 43 0.0000245 14 0.00245
## 15 1949 F Scarlet 29 0.0000165 15 0.00165
## 16 1949 F Fawn 26 0.0000148 16 0.00148
## 17 1949 F Jade 16 0.00000911 17 0.000911
## 18 1949 F Silver 6 0.00000342 18 0.000342
## 19 1949 F Ceil 5 0.00000285 19 0.000285
## 20 1949 F Emerald 5 0.00000285 20 0.000285
babynames %>%
filter(name %in% colorsList) %>%
filter(year == 1949, sex =="M") %>%
arrange(desc(prop)) %>%
mutate(rank = row_number()) %>%
mutate(percent = (prop * 100)) -> rankedColorsM1949
rankedColorsM1949
## # A tibble: 15 x 7
## year sex name n prop rank percent
## <dbl> <chr> <chr> <int> <dbl> <int> <dbl>
## 1 1949 M Jasper 193 0.000107 1 0.0107
## 2 1949 M Carmine 133 0.0000738 2 0.00738
## 3 1949 M Ivory 123 0.0000683 3 0.00683
## 4 1949 M Gray 32 0.0000178 4 0.00178
## 5 1949 M Rose 24 0.0000133 5 0.00133
## 6 1949 M Ruby 23 0.0000128 6 0.00128
## 7 1949 M Pearl 20 0.0000111 7 0.00111
## 8 1949 M Lemon 14 0.00000777 8 0.000777
## 9 1949 M Iris 11 0.0000061 9 0.00061
## 10 1949 M Cherry 10 0.00000555 10 0.000555
## 11 1949 M Auburn 9 0.00000499 11 0.000499
## 12 1949 M Almond 8 0.00000444 12 0.000444
## 13 1949 M Pink 8 0.00000444 13 0.000444
## 14 1949 M Ruddy 7 0.00000388 14 0.000388
## 15 1949 M Coral 6 0.00000333 15 0.000333
During 1949, the names “Rose,” “Ruby,” “Iris,” “Pearl,” “Cherry,” “Ivory” and “Coral” appeared in both the male and female lists. None of these names was in the original list of rainbow names (Red, Orange, Yellow, Green, Blue, Violet). Every shared name except “Ivory” (123 males, 53 females) was more numerous amongst females.
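The overlap between the male and female lists can also be computed rather than read off the tables. The sketch below joins two small data frames on name and flags which sex has the larger count. The rows are a subset copied from the 1949 tables above; the object and column names (`female`, `male`, `shared`, `more_common`) are illustrative assumptions.

```r
library(dplyr)

# Subset of the 1949 female and male tables (name and n only)
female <- data.frame(
  name = c("Rose", "Ruby", "Ivory", "Ginger"),
  n    = c(5370L, 2576L, 53L, 665L),
  stringsAsFactors = FALSE
)
male <- data.frame(
  name = c("Jasper", "Ivory", "Rose", "Ruby"),
  n    = c(193L, 123L, 24L, 23L),
  stringsAsFactors = FALSE
)

# Keep only names present in both tables and compare the two counts
shared <- inner_join(female, male, by = "name", suffix = c("_f", "_m")) %>%
  mutate(more_common = if_else(n_f > n_m, "F", "M"))

shared
```

The same join could be applied to the full ranked tables for any of the three years, for example `rankedColorsF1949` and `rankedColorsM1949`.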
babynames %>%
filter(name %in% colorsList) %>%
filter(year == 2017, sex =="F") %>%
arrange(desc(prop)) %>%
mutate(rank = row_number()) %>%
mutate(percent = (prop * 100)) -> rankedColorsF2017
rankedColorsF2017
## # A tibble: 49 x 7
## year sex name n prop rank percent
## <dbl> <chr> <chr> <int> <dbl> <int> <dbl>
## 1 2017 F Violet 4699 0.00251 1 0.251
## 2 2017 F Ruby 3540 0.00189 2 0.189
## 3 2017 F Jade 2725 0.00145 3 0.145
## 4 2017 F Jasmine 2256 0.00120 4 0.120
## 5 2017 F Rose 2059 0.00110 5 0.110
## 6 2017 F Iris 1969 0.00105 6 0.105
## 7 2017 F Sienna 1392 0.000742 7 0.0742
## 8 2017 F Olive 1241 0.000662 8 0.0662
## 9 2017 F Veronica 817 0.000436 9 0.0436
## 10 2017 F Magnolia 808 0.000431 10 0.0431
## # … with 39 more rows
babynames %>%
filter(name %in% colorsList) %>%
filter(year == 2017, sex =="M") %>%
arrange(desc(prop)) %>%
mutate(rank = row_number()) %>%
mutate(percent = (prop * 100)) -> rankedColorsM2017
rankedColorsM2017
## # A tibble: 32 x 7
## year sex name n prop rank percent
## <dbl> <chr> <chr> <int> <dbl> <int> <dbl>
## 1 2017 M Jasper 2083 0.00106 1 0.106
## 2 2017 M Onyx 188 0.0000958 2 0.00958
## 3 2017 M Gray 144 0.0000733 3 0.00734
## 4 2017 M Denim 141 0.0000718 4 0.00718
## 5 2017 M Jet 131 0.0000667 5 0.00667
## 6 2017 M Carmine 125 0.0000637 6 0.00637
## 7 2017 M Indigo 52 0.0000265 7 0.00265
## 8 2017 M Ivory 48 0.0000244 8 0.00244
## 9 2017 M Jade 40 0.0000204 9 0.00204
## 10 2017 M Crimson 36 0.0000183 10 0.00183
## # … with 22 more rows
During 2017, “Jade” was the only name that appeared in both of the top-ten lists shown above (the full tables have 49 and 32 rows). This is interesting, considering that the tables for earlier years had more names in common.
Another fascinating aspect of the 2017 tables is that the number of distinct color names is the highest of the three years for both sexes: 49 for females and 32 for males. We do not see the stereotypically common color names at the top of these lists. In addition, the names at the top may not have been chosen by parents because they are colors. The names in all of the tables sound unique and outside-the-box to me.
Looking back at the list produced by intersect(), the only basic color names in it are “Red,” “White,” “Gray,” “Blue” and “Violet.” In the rows shown for 1880, 1949 and 2017, “Red,” “White” and “Blue” do not appear at all, and “Gray” appears only in the male tables for 1949 (32 instances) and 2017 (144 instances). “Violet” is the exception: it ranks sixth, eighth and first among female names in 1880, 1949 and 2017, although it is also a flower name, like “Rose” and “Iris.” Overall, this supports the position that the color itself is not significant to parents when choosing a name for their baby.
The overall popularity of color-based names is therefore not the most informative thing to explore here. My reading of the data is that what matters to parents, when it comes to choosing a name for their baby, is the way the name sounds and fits with the overall family unit, although the data cannot show this directly. The uniqueness of the names over time is more important to look at than overall popularity.