R Notebook

An analysis of the name Samantha, covering both the person and the popularity of the name. Name counts come from the Social Security Administration data in the babynames package, which begins in 1880 and runs through 2015 or later depending on the package version. (Semi-witty familial annotations are included and should be disregarded if they affect Samantha’s grade.)

library(tidyverse)

## ── Attaching packages ───────────────────────────────────────── tidyverse 1.2.1 ──

## ✔ ggplot2 3.1.0     ✔ purrr   0.2.5
## ✔ tibble  2.0.0     ✔ dplyr   0.7.8
## ✔ tidyr   0.8.2     ✔ stringr 1.3.1
## ✔ readr   1.3.1     ✔ forcats 0.3.0

## ── Conflicts ──────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(wordcloud2)
library(babynames)

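babynames %>%                                  # start with the data
  filter(year == 1988, sex == "F") %>%         # keep only girls born in 1988
  mutate(rank = row_number()) %>%              # row number = rank, assuming rows are sorted by count within year and sex
  mutate(percent = round(prop * 100, 1)) %>%   # convert the proportion to a percentage
  filter(name == "Samantha")                   # pull out Samantha's row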

Samantha's rank among female names in 1988: it was the 8th most popular name, given to 1.1% of girls born that year.
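The rank above comes from `row_number()`, which only works if the rows are already sorted by count. A minimal sketch of an explicit ranking, computed within every year so the rank can be followed over time, might look like the following. The object name `samantha_rank` is illustrative, and ties are handled with `min_rank()` as an assumption about how ties should count.

```r
library(dplyr)
library(babynames)

# Rank female names within each year by count (1 = most common).
# min_rank() gives tied names the same rank; this is a choice, not the only option.
samantha_rank <- babynames %>%
  filter(sex == "F") %>%
  group_by(year) %>%
  mutate(rank = min_rank(desc(n))) %>%
  ungroup() %>%
  filter(name == "Samantha") %>%
  select(year, n, prop, rank)

# Look up a single year, for example the 1988 row discussed above
samantha_rank %>%
  filter(year == 1988)
```

Because the rank is computed from `n` directly, it does not depend on the order of the rows in the dataset.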

babynames %>%
  filter(year == 1988) %>%     # use only one year
  filter(sex == "F") %>%       # use only one sex
  select(name, n) %>%          # select the two relevant variables: the name and how often it occurs
  top_n(100, n)  %>%           # use only the top names or it could get too big
  wordcloud2(size = .5, shape = "triangle") + WCtheme(1)    # generate the word cloud and add theme 1, 2, or 3

A visual representation of the top 100 female baby names in 1988. The larger a name appears, the more common it was that year. Samantha is relatively common but is not the largest name in the cloud.

babynames %>%                                    # start with the data
  filter(name == "Samantha", sex == "F") %>%     # choose the name and sex
  mutate(percent = round(prop * 100, 1)) %>%     # create a new variable called percent
  ggplot(aes(x = year, y = percent)) +           # put year on the x-axis and percent on the y-axis
  geom_line(color = "blue")                      # make it a line graph and give the line a color

Samantha became popular starting in the 1960s, coinciding with (and perhaps as a result of) the TV show Bewitched, which first aired in 1964. Though on the rise throughout the 1970s, it gained immense popularity from the late 1980s into the early 1990s. As a result of my folks’ attraction to the show and the crafty demeanor of its main character, my name was thrown into the popular ranks of the time. Thank you, Mom and Dad.
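To check the Bewitched timing visually, one option is to mark the premiere year on the trend line. The sketch below assumes the 1964 date given in the text; the object names `samantha`, `bewitched_year`, and `p` are illustrative and not part of the original analysis.

```r
library(dplyr)
library(ggplot2)
library(babynames)

# Samantha's share of female births by year, as a percentage
samantha <- babynames %>%
  filter(name == "Samantha", sex == "F") %>%
  mutate(percent = prop * 100)

bewitched_year <- 1964  # premiere year as stated in the text

# Line graph of the trend with a dashed vertical marker at the premiere year
p <- ggplot(samantha, aes(x = year, y = percent)) +
  geom_line(color = "blue") +
  geom_vline(xintercept = bewitched_year, linetype = "dashed") +
  labs(x = "Year", y = "Percent of female births")

p
```

A marker like this shows whether the rise lines up with the premiere, but it cannot show that the show caused it.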

babynames %>%                                   # start with the dataset
  filter(name == "Samantha", sex == "F") %>%    # only look at the name and sex you want
  top_n(120, prop) %>%                          # keep up to the 120 years with the highest proportion
  mutate(percent = round(prop * 100, 1)) %>%    # express the proportion as a percentage
  arrange(-prop)                                # sort in descending order

The years in which Samantha was most popular, sorted in descending order of popularity.

babynames %>%
  filter(name %in% c("Samantha", "Brittney", "Cindy")) %>%
  filter(sex == "F") %>% 
  ggplot(aes(x = year, y = n, color = name)) +
  geom_line()

Samantha achieved greater popularity than Cindy and Brittney in both duration and number. (Proving once and for all that she is the favorite amongst her siblings.) *Note that Brittney has multiple spellings and that Cindy is a nickname for Cynthia, which may or may not have skewed the results in Samantha’s favor.
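One way to probe the caveat in the note is to pool spelling variants before plotting. The sketch below is only an illustration: the groupings in `name_groups` are a hypothetical choice (which spellings belong together is a judgment call), and the object names `combined` and `p` are not from the original analysis.

```r
library(dplyr)
library(ggplot2)
library(babynames)

# Hypothetical mapping from individual spellings to a sibling "group"
name_groups <- tibble::tribble(
  ~name,      ~group,
  "Samantha", "Samantha",
  "Brittney", "Brittney (pooled spellings)",
  "Brittany", "Brittney (pooled spellings)",
  "Britney",  "Brittney (pooled spellings)",
  "Cindy",    "Cindy / Cynthia",
  "Cynthia",  "Cindy / Cynthia"
)

# Sum the yearly counts of all spellings within each group
combined <- babynames %>%
  filter(sex == "F") %>%
  inner_join(name_groups, by = "name") %>%
  group_by(group, year) %>%
  summarise(n = sum(n)) %>%
  ungroup()

# Same line graph as before, but with one line per pooled group
p <- ggplot(combined, aes(x = year, y = n, color = group)) +
  geom_line()

p
```

Whether the pooled lines change the sibling ranking is left to the plot itself. Adding or removing spellings in `name_groups` will change the outcome.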