Introduction:
My friend, who has been working at a nursery for quite some time now, mentioned to me that in her class of 100 kids, none of them is called “Anthony”. I was surprised and a little sad, since to me the name Anthony is very popular. Or was she just kidding me?
Method:
I was curious, so I gathered 258,000 name records from the data made available by the Social Security Administration, covering 1880 to 2008.
Loading the data in R
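A minimal sketch of this step, assuming the data sits in a CSV file with the five columns shown in the summary below. The file here is a hypothetical temporary file with made-up rows, and `bnames_toy` is an illustrative name, not the object used in the rest of this post:

```r
# Write a tiny made-up CSV to a temporary file (a stand-in for the real data file)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "year,name,prop,sex,soundex",
  "1880,John,0.08,boy,J500",
  "1880,Mary,0.07,girl,M600"
), csv_path)

# Read it into a data frame; stringsAsFactors = FALSE keeps names as character
bnames_toy <- read.csv(csv_path, stringsAsFactors = FALSE)

# Quick look at the column types and ranges
summary(bnames_toy)
```

With the real file, only the path passed to `read.csv()` would change.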
Here is a summary of what the data looks like:
summary(bnames_csv)
## year name prop sex
## Min. :1880 Length:258000 Min. :0.0000260 Length:258000
## 1st Qu.:1912 Class :character 1st Qu.:0.0000810 Class :character
## Median :1944 Mode :character Median :0.0001640 Mode :character
## Mean :1944 Mean :0.0008945
## 3rd Qu.:1976 3rd Qu.:0.0005070
## Max. :2008 Max. :0.0815410
## soundex
## Length:258000
## Class :character
## Mode :character
##
##
##
With the data loaded, I wanted to find out when the name Anthony was at its peak. I used the code below to sort the Anthony records by proportion, highest first:
library(dplyr)  # provides arrange() and desc()
# keep only the rows for the name Anthony
Anthony<-bnames_csv[bnames_csv$name=="Anthony",]
b1<-arrange(Anthony,desc(prop))
head(b1)
## # A tibble: 6 x 5
## year name prop sex soundex
## <int> <chr> <dbl> <chr> <chr>
## 1 1987 Anthony 0.011862 boy A535
## 2 1988 Anthony 0.011810 boy A535
## 3 1967 Anthony 0.011739 boy A535
## 4 1990 Anthony 0.011662 boy A535
## 5 1989 Anthony 0.011660 boy A535
## 6 1991 Anthony 0.011540 boy A535
b2<-arrange(Anthony,prop)  # assumed intent: ascending order, lowest proportions first
From the above, the proportion of babies named Anthony was at its peak in the late 1980s and early 1990s, with 1987 the highest year. Is my friend’s observation correct? What might be the cause? Have the nurses been naming girls “Anthony”? Let me check visually with the code below:
library(ggplot2)  # provides qplot()
qplot(year,prop,data=Anthony, geom = "point", color= interaction(sex,name))
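The same kind of plot can also be written with `ggplot()`, which is worth knowing because `qplot()` is deprecated in recent ggplot2 releases. This is only a sketch: `anthony_toy` is a hypothetical stand-in for the Anthony subset, and its values are made up for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the Anthony subset (values are made up)
anthony_toy <- data.frame(
  year = c(1985, 1990, 1995, 1985, 1990, 1995),
  name = "Anthony",
  prop = c(0.0110, 0.0120, 0.0100, 0.0001, 0.0001, 0.0001),
  sex  = c("boy", "boy", "boy", "girl", "girl", "girl")
)

# One point per year and sex, coloured by sex
p <- ggplot(anthony_toy, aes(x = year, y = prop, colour = sex)) +
  geom_point() +
  labs(x = "Year", y = "Proportion of babies", colour = "Sex")

print(p)
```

Replacing `anthony_toy` with the real `Anthony` data frame should give a plot equivalent to the `qplot()` call, with the legend showing only the sex.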
Conclusion:
From the graph above, there was a negligible number of girls named Anthony in the 1990s. Also, the name Anthony peaked around the late 1980s and early 1990s but started declining in the early 2000s. My friend is right: the name Anthony is not as popular as it used to be. Why has the name been declining? I couldn’t get data to tackle this question. There may be many factors, outside the scope of this data, that contribute to the decline of the name Anthony.
PS
Using the code below, the most popular girl and boy names in the data were:
# filter to one sex first, then sort by proportion, highest first
b1<- arrange(filter(bnames_csv,sex=="girl"),desc(prop))
b2<- arrange(filter(bnames_csv,sex=="boy"),desc(prop))
head(b1)
## # A tibble: 6 x 5
## year name prop sex soundex
## <int> <chr> <dbl> <chr> <chr>
## 1 1880 Mary 0.072381 girl M600
## 2 1882 Mary 0.070431 girl M600
## 3 1881 Mary 0.069986 girl M600
## 4 1884 Mary 0.066990 girl M600
## 5 1883 Mary 0.066737 girl M600
## 6 1886 Mary 0.064334 girl M600
head(b2)
## # A tibble: 6 x 5
## year name prop sex soundex
## <int> <chr> <dbl> <chr> <chr>
## 1 1880 John 0.081541 boy J500
## 2 1881 John 0.080975 boy J500
## 3 1880 William 0.080511 boy W450
## 4 1883 John 0.079066 boy J500
## 5 1881 William 0.078712 boy W450
## 6 1882 John 0.078314 boy J500
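Sorting the whole table shows the same name repeated across several years. To get a single top row per sex, one option is `group_by()` followed by `slice_max()`. A minimal sketch, assuming dplyr 1.0.0 or later; `toy` is a made-up data frame and its values are not from the SSA data:

```r
library(dplyr)

# Made-up rows for illustration only
toy <- data.frame(
  year = c(1880, 1881, 1880, 1881),
  name = c("John", "John", "Mary", "Mary"),
  prop = c(0.080, 0.070, 0.060, 0.065),
  sex  = c("boy", "boy", "girl", "girl")
)

# Keep the single highest-proportion row within each sex
top_by_sex <- toy %>%
  group_by(sex) %>%
  slice_max(prop, n = 1) %>%
  ungroup()

print(top_by_sex)
```

On the real data, this would return the peak year of the most popular name for each sex instead of six rows dominated by one or two names.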
Also, here is a summary of the Anthony proportions from the code below:
summarise(Anthony, min=min(prop),max=max(prop),mean(prop))
## # A tibble: 1 x 3
## min max `mean(prop)`
## <dbl> <dbl> <dbl>
## 1 5.9e-05 0.011862 0.005338392
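If the Anthony subset contains rows for both sexes, this summary mixes them, so the minimum and the mean can be pulled down by the much rarer group. A minimal sketch of a per-sex summary, using a hypothetical `anthony_toy` data frame with made-up values:

```r
library(dplyr)

# Hypothetical stand-in for the Anthony subset (values are made up)
anthony_toy <- data.frame(
  year = c(1990, 2000, 1990, 2000),
  prop = c(0.012, 0.010, 0.0001, 0.0003),
  sex  = c("boy", "boy", "girl", "girl")
)

# Summarise the proportion separately for each sex
by_sex <- anthony_toy %>%
  group_by(sex) %>%
  summarise(
    min  = min(prop),
    max  = max(prop),
    mean = mean(prop),
    .groups = "drop"
  )

print(by_sex)
```

Multiplying these columns by 100 gives percentages, which are easier to read than the raw proportions.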
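Even though the name Anthony peaked in the late 1980s and early 1990s, its proportion never rose above about 1.2% (0.011862 in 1987), and the average over the whole period is only about 0.5%. That is fine; I now get it that Anthony is not as common a name as I thought.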