Introduction:

A friend who has worked at a nursery for quite some time mentioned to me that in her class of 100 kids, none is called “Anthony”. I was very sad, since to me the name Anthony is very popular. Or was she just kidding me?

Method:

I was curious, so I gathered 258,000 name records, covering 1880 to 2008, from the data made available by the Social Security Administration.

Loading the data in R
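
The loading step itself is not shown, so here is a minimal sketch of how it might look, assuming the data sits in a CSV file with the five columns that appear in the summary. The file name `bnames.csv` and the helper `load_bnames()` are placeholders for illustration, not part of the original analysis.

```r
library(readr)

# Hypothetical file name: point this at wherever the SSA extract is saved.
bnames_path <- "bnames.csv"

# Hypothetical helper: read the CSV with explicit column types so that
# year is an integer and prop is a double.
load_bnames <- function(path) {
  read_csv(
    path,
    col_types = cols(
      year    = col_integer(),
      name    = col_character(),
      prop    = col_double(),
      sex     = col_character(),
      soundex = col_character()
    )
  )
}

# Only load when the file is actually present.
if (file.exists(bnames_path)) {
  bnames_csv <- load_bnames(bnames_path)
}
```

Spelling out the column types is optional, but it keeps `year` and `prop` from being guessed differently on another machine.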

Here is a summary of what the data looks like:

summary(bnames_csv)
##       year          name                prop               sex           
##  Min.   :1880   Length:258000      Min.   :0.0000260   Length:258000     
##  1st Qu.:1912   Class :character   1st Qu.:0.0000810   Class :character  
##  Median :1944   Mode  :character   Median :0.0001640   Mode  :character  
##  Mean   :1944                      Mean   :0.0008945                     
##  3rd Qu.:1976                      3rd Qu.:0.0005070                     
##  Max.   :2008                      Max.   :0.0815410                     
##    soundex         
##  Length:258000     
##  Class :character  
##  Mode  :character  
##                    
##                    
## 

From the summary, I am interested in finding out when the name Anthony was at its peak. I used the code below to sort the Anthony rows by proportion, largest first:

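library(dplyr)  # provides arrange() and desc()
# Keep only the rows for the name Anthony
Anthony <- bnames_csv[bnames_csv$name == "Anthony", ]
# Sort by proportion, largest first
b1 <- arrange(Anthony, desc(prop))
head(b1)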
## # A tibble: 6 x 5
##    year    name     prop   sex soundex
##   <int>   <chr>    <dbl> <chr>   <chr>
## 1  1987 Anthony 0.011862   boy    A535
## 2  1988 Anthony 0.011810   boy    A535
## 3  1967 Anthony 0.011739   boy    A535
## 4  1990 Anthony 0.011662   boy    A535
## 5  1989 Anthony 0.011660   boy    A535
## 6  1991 Anthony 0.011540   boy    A535
b2 <- arrange(Anthony)  # no sort key given, so the row order is unchanged
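
One quick way to read the table is to tally the top rows by decade. The sketch below retypes the six rows printed above into a small data frame (the names `anthony_top` and `by_decade` are just for illustration) and counts how many fall in each decade.

```r
library(dplyr)

# The six rows printed by head(b1), retyped by hand for illustration.
anthony_top <- data.frame(
  year = c(1987L, 1988L, 1967L, 1990L, 1989L, 1991L),
  prop = c(0.011862, 0.011810, 0.011739, 0.011662, 0.011660, 0.011540)
)

# Bucket each year into its decade and count the rows per decade.
by_decade <- anthony_top %>%
  mutate(decade = (year %/% 10L) * 10L) %>%
  count(decade)

by_decade
```

Three of the six top years fall in the 1980s and two in the 1990s, with 1967 as the one outlier.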

From the above, the proportion of babies named Anthony peaked in the late 1980s and early 1990s, with 1967 also among the top years. Is my friend’s observation correct? And what might be going on: have girls been given the name “Anthony” too? Let me check visually with the code below:

library(ggplot2)  # provides qplot()
qplot(year, prop, data = Anthony, geom = "point", color = interaction(sex, name))
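
Note that qplot() is deprecated as of ggplot2 3.4.0, so on a recent installation the same plot may be easier to write with ggplot(). Here is a sketch of an equivalent call. The small `anthony_sample` data frame is retyped from the rows shown earlier so that the snippet runs on its own; the real plot would use the full `Anthony` data instead.

```r
library(ggplot2)

# Six rows retyped from the head(b1) output, for illustration only.
anthony_sample <- data.frame(
  year = c(1987L, 1988L, 1967L, 1990L, 1989L, 1991L),
  name = "Anthony",
  prop = c(0.011862, 0.011810, 0.011739, 0.011662, 0.011660, 0.011540),
  sex  = "boy"
)

# Same mapping as the qplot() call: year on x, prop on y,
# one colour per sex/name combination.
p <- ggplot(anthony_sample,
            aes(x = year, y = prop, color = interaction(sex, name))) +
  geom_point()
```

Printing `p` draws the plot. With the full data, girls named Anthony would show up as a second colour.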

Conclusion:

From the graph above, a negligible number of girls were named Anthony in the 1990s. The name peaked around the late 1980s and early 1990s and started declining in the early 2000s. My friend is right: the name Anthony is not as popular as it used to be. Why has the name been declining? I couldn’t get data to tackle this question, and many factors outside this dataset might contribute to the decline.

PS

From the code below, the most popular boy and girl names in the data were:

# Keep one sex at a time, then sort by proportion, largest first
b1 <- arrange(filter(bnames_csv, sex == "boy"), desc(prop))
b2 <- arrange(filter(bnames_csv, sex == "girl"), desc(prop))
head(b1)
## # A tibble: 6 x 5
##    year    name     prop   sex soundex
##   <int>   <chr>    <dbl> <chr>   <chr>
## 1  1880    John 0.081541   boy    J500
## 2  1881    John 0.080975   boy    J500
## 3  1880 William 0.080511   boy    W450
## 4  1883    John 0.079066   boy    J500
## 5  1881 William 0.078712   boy    W450
## 6  1882    John 0.078314   boy    J500
head(b2)
## # A tibble: 6 x 5
##    year  name     prop   sex soundex
##   <int> <chr>    <dbl> <chr>   <chr>
## 1  1880  Mary 0.072381  girl    M600
## 2  1882  Mary 0.070431  girl    M600
## 3  1881  Mary 0.069986  girl    M600
## 4  1884  Mary 0.066990  girl    M600
## 5  1883  Mary 0.066737  girl    M600
## 6  1886  Mary 0.064334  girl    M600

The code below summarises the proportions for the name Anthony:

summarise(Anthony, min = min(prop), max = max(prop), mean(prop))
## # A tibble: 1 x 3
##       min      max `mean(prop)`
##     <dbl>    <dbl>        <dbl>
## 1 5.9e-05 0.011862  0.005338392

Even at its peak, the name Anthony reached a proportion of only about 1.2% (0.011862), and its mean proportion in the data is about 0.5%. That is fine; I now get that Anthony is not a common name.
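
To put these proportions next to my friend’s class of 100 kids, here is a rough back-of-the-envelope sketch. It treats prop as the share of boys given the name, assumes about half the class are boys, and assumes each boy’s name is an independent draw. It also uses the peak proportion, which is generous given the later decline. All of these are simplifying assumptions, and the variable names are just for illustration.

```r
p_peak <- 0.011862  # largest prop for Anthony, from the summary above
n_boys <- 50        # assumption: half of the 100 kids are boys

# Expected number of boys named Anthony in the class
expected_anthonys <- n_boys * p_peak

# Probability that none of the boys is named Anthony
p_no_anthony <- (1 - p_peak)^n_boys

round(expected_anthonys, 2)
round(p_no_anthony, 2)
```

Under these assumptions fewer than one Anthony is expected, and a class with no Anthony at all is slightly more likely than not (roughly 55%), so my friend’s observation is not surprising.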