In the late '90s, I was a Marketing Executive for an instant beverage company that regularly introduced new brands to meet consumer demand. One of the biggest challenges was giving the new products appropriate names. My boss always told me to make ‘M’ the first letter because, to him, McDonald's, Michael Jackson and Madonna were famous names.
At the time, I did not have the data science knowledge to reject, or fail to reject, my boss's hypothesis: ‘Famous brands/names start with the letter M’. Now, I have scraped some data from TheFamousPeople to look for patterns in famous names that could help create famous brand names.
```r
library(tidyverse)

# Scraped list of famous people, with a First_name column
famous <- read_csv('Famous names.csv')

# How often does each letter start a first name?
famous %>%
  select(First_name) %>%
  mutate(first_letter = str_sub(First_name, 1, 1)) %>%
  drop_na() %>%
  count(first_letter) %>%
  slice_max(n, n = 26) %>%                                # keep the 26 most frequent letters
  mutate(first_letter = fct_reorder(first_letter, n)) %>% # order bars by frequency
  ggplot(aes(first_letter, n)) +
  geom_col() +
  coord_flip() +
  ggtitle('Frequency chart of the first letter for First Name')
```
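The chart answers the question visually. To get the rank of a single letter as a number, a small helper could be used. This is a sketch only. `first_letter_rank()` and `toy_names` are hypothetical names introduced here, and the toy vector is made up rather than taken from the scraped data. First letters are upper-cased so that ‘mary’ and ‘Mary’ count together.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: rank of `letter` among the first letters of `names`.
# Rank 1 is the most common first letter; tied letters share the same rank.
# Returns integer(0) if no name starts with `letter`.
first_letter_rank <- function(names, letter = "M") {
  counts <- tibble(name = names) %>%
    filter(!is.na(name), name != "") %>%
    mutate(first_letter = str_to_upper(str_sub(name, 1, 1))) %>%
    count(first_letter) %>%
    mutate(rank = min_rank(desc(n)))
  counts$rank[counts$first_letter == str_to_upper(letter)]
}

# Made-up names for illustration only (not the TheFamousPeople data)
toy_names <- c("Michael", "Madonna", "mary", "John", "James", "Jack", "Jane", "Anna", NA)
first_letter_rank(toy_names, "M")  # 2: J (4 names) beats M (3 names)
```

On the real data, `first_letter_rank(famous$First_name, "M")` should match M's position in the chart. The exception is when some names begin with a lower-case letter, because this helper merges cases and the chart does not.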
```r
# How often does each letter end a first name?
famous %>%
  select(First_name) %>%
  mutate(last_letter = str_sub(First_name, -1)) %>%
  drop_na() %>%
  count(last_letter) %>%
  slice_max(n, n = 26) %>%                              # keep the 26 most frequent letters
  mutate(last_letter = fct_reorder(last_letter, n)) %>% # order bars by frequency
  ggplot(aes(last_letter, n)) +
  geom_col() +
  coord_flip() +
  ggtitle('Frequency chart of the last letter for First Name')
```
```r
# Distribution of first-name lengths (names of 20 or more characters are filtered out)
famous %>%
  select(First_name) %>%
  mutate(name_length = str_length(First_name)) %>%
  drop_na() %>%
  count(name_length) %>%
  filter(name_length < 20) %>%
  ggplot(aes(name_length, n)) +
  geom_col() +
  ggtitle('Distribution for the length of First Name')
```
```r
# Number of vowels per first name; lower-case first so a capitalised leading vowel is counted
famous %>%
  select(First_name) %>%
  mutate(vowel = str_count(str_to_lower(First_name), '[aeiou]')) %>%
  drop_na() %>%
  count(vowel) %>%
  filter(vowel < 9) %>%
  ggplot(aes(vowel, n)) +
  geom_col() +
  ggtitle('Number of vowels in First Name')
```
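A pattern like `'a|e|i|o|u'` is case-sensitive. Because first names are capitalised, a name that starts with a vowel loses that vowel from the count. A quick check with stringr shows the difference. As in the chart, ‘y’ is treated as a consonant here.

```r
library(stringr)

# Case-sensitive pattern: the capital "A" is not matched
str_count("Anna", "a|e|i|o|u")                          # 1 (only the final "a")
# Lower-casing first, or using a case-insensitive regex, counts both vowels
str_count(str_to_lower("Anna"), "[aeiou]")              # 2
str_count("Anna", regex("[aeiou]", ignore_case = TRUE)) # 2
```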
My ex-boss wasn’t completely wrong: M is the third most common first letter among these famous first names. However, if I were tasked with naming an instant beverage, I would call it ‘Jalan’.
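The four features charted above can be gathered into one profile for any candidate name. This makes it easy to compare ‘Jalan’ with, say, ‘Madonna’. `name_profile()` is a hypothetical helper written for illustration and is not part of any package. It uses the same stringr calls as the charts and counts vowels case-insensitively.

```r
library(stringr)
library(tibble)

# Hypothetical helper: one row per candidate name with the four charted features
name_profile <- function(name) {
  tibble(
    name         = name,
    first_letter = str_sub(name, 1, 1),
    last_letter  = str_sub(name, -1),
    name_length  = str_length(name),
    vowels       = str_count(str_to_lower(name), "[aeiou]")
  )
}

name_profile(c("Jalan", "Madonna"))
# Jalan:   first J, last n, 5 characters, 2 vowels
# Madonna: first M, last a, 7 characters, 3 vowels
```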