I downloaded the Popular_Baby_Names data set from data.gov; it is based on baby names collected in New York City. The data set's access and use information specifically states that it is intended for public use.
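# Packages used below: dplyr (pipes, grouping), ggplot2 (plots), knitr and kableExtra (tables)
library(dplyr)
library(ggplot2)
library(knitr)
library(kableExtra)
baby_names <- read.csv(file = "/Users/brettmcgillivary/Documents/Coursework/R save files/Popular_Baby_Names.csv", header=TRUE, sep=",")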
I changed some header names to make columns easier to read.
names(baby_names)[1] <- 'Birth_Year'
names(baby_names)[4] <- 'Child_First_Name'
| | Birth_Year | Gender | Ethnicity | Child_First_Name | Count | Rank |
|---|---|---|---|---|---|---|
| 19413 | 2015 | MALE | WHITE NON HISPANIC | Joel | 32 | 72 |
| 19414 | 2016 | FEMALE | HISPANIC | Alayna | 10 | 74 |
| 19415 | 2015 | FEMALE | HISPANIC | Yaritza | 12 | 79 |
| 19416 | 2015 | MALE | WHITE NON HISPANIC | Mendel | 42 | 64 |
| 19417 | 2016 | MALE | ASIAN AND PACIFIC ISLANDER | Isaac | 21 | 48 |
| 19418 | 2015 | FEMALE | WHITE NON HISPANIC | Alessia | 12 | 81 |
kable(baby_names[1:20, 1:6]) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
| Birth_Year | Gender | Ethnicity | Child_First_Name | Count | Rank |
|---|---|---|---|---|---|
| 2011 | FEMALE | HISPANIC | GERALDINE | 13 | 75 |
| 2011 | FEMALE | HISPANIC | GIA | 21 | 67 |
| 2011 | FEMALE | HISPANIC | GIANNA | 49 | 42 |
| 2011 | FEMALE | HISPANIC | GISELLE | 38 | 51 |
| 2011 | FEMALE | HISPANIC | GRACE | 36 | 53 |
| 2011 | FEMALE | HISPANIC | GUADALUPE | 26 | 62 |
| 2011 | FEMALE | HISPANIC | HAILEY | 126 | 8 |
| 2011 | FEMALE | HISPANIC | HALEY | 14 | 74 |
| 2011 | FEMALE | HISPANIC | HANNAH | 17 | 71 |
| 2011 | FEMALE | HISPANIC | HAYLEE | 17 | 71 |
| 2011 | FEMALE | HISPANIC | HAYLEY | 13 | 75 |
| 2011 | FEMALE | HISPANIC | HAZEL | 10 | 78 |
| 2011 | FEMALE | HISPANIC | HEAVEN | 15 | 73 |
| 2011 | FEMALE | HISPANIC | HEIDI | 15 | 73 |
| 2011 | FEMALE | HISPANIC | HEIDY | 16 | 72 |
| 2011 | FEMALE | HISPANIC | HELEN | 13 | 75 |
| 2011 | FEMALE | HISPANIC | IMANI | 11 | 77 |
| 2011 | FEMALE | HISPANIC | INGRID | 11 | 77 |
| 2011 | FEMALE | HISPANIC | IRENE | 11 | 77 |
| 2011 | FEMALE | HISPANIC | IRIS | 10 | 78 |
# Number of rows per birth year and gender
df1 <- baby_names %>%
  group_by(Birth_Year, Gender) %>%
  count() %>%
  ungroup() %>%
  rename('Count' = n)
ggplot(df1, aes(x = Birth_Year, y = Count, fill = Gender)) +
  ggtitle("Count vs Birth_Year") +
  xlab("Year") + ylab("Count") +
  theme(
    plot.title = element_text(color="red", size=14, face="bold"),
    axis.title.x = element_text(color="blue", size=14, face="bold"),
    axis.title.y = element_text(color="#993333", size=14, face="bold")) +
  geom_bar(stat='identity', position = position_dodge(width = 0.9))
I found it interesting that there was a sharp decline in birth rates starting in 2015 and that, generally speaking, the ratio of males to females in the city is fairly even.
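One thing worth checking when reading this chart: `count()` without a weight tallies rows, so `df1` holds the number of name entries per year and gender rather than the number of babies. A minimal sketch of the difference follows, using a few made-up rows. The `toy` data frame and the output column names are hypothetical, and it assumes the `Count` column holds the number of babies given each name.

```r
library(dplyr)

# Toy rows standing in for baby_names; the numbers are made up for illustration.
toy <- data.frame(
  Birth_Year = c(2011, 2011, 2011, 2012),
  Gender     = c("FEMALE", "FEMALE", "MALE", "FEMALE"),
  Count      = c(13, 21, 40, 10)
)

# Number of rows (name entries) per year and gender, as df1 computes it.
rows_per_group <- toy %>%
  count(Birth_Year, Gender, name = "Rows")

# Sum of the Count column per year and gender (assumed to be babies per name).
births_per_group <- toy %>%
  count(Birth_Year, Gender, wt = Count, name = "Births")

print(rows_per_group)
print(births_per_group)
```

If the two summaries tell different stories on the real data, the weighted version is probably the closer proxy for births.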
# Number of rows per birth year and ethnicity
df2 <- baby_names %>%
  group_by(Birth_Year, Ethnicity) %>%
  count() %>%
  ungroup() %>%
  as.data.frame()
# Convert to character so the truncated labels can be rewritten
df2$Ethnicity <- as.character(df2$Ethnicity)
df2 <- df2 %>%
  mutate(Ethnicity = ifelse(Ethnicity=='ASIAN AND PACI', 'ASIAN AND PACIFIC ISLANDER', Ethnicity)) %>%
  mutate(Ethnicity = ifelse(Ethnicity=='BLACK NON HISP', 'BLACK NON HISPANIC', Ethnicity)) %>%
  mutate(Ethnicity = ifelse(Ethnicity=='WHITE NON HISP', 'WHITE NON HISPANIC', Ethnicity))
ggplot(df2, aes(x = Birth_Year, y = n, color = Ethnicity)) +
  ggtitle("Birth_Year vs Count with Ethnicity dimension") +
  xlab("Year") + ylab("Count") +
  geom_line()
It is interesting that the decline in birth rates was even across ethnicities. Looking for relevant news, I found multiple articles that attributed the decline in birth rates to a marked reduction in teen pregnancies and an increase in “induced terminations”.
It goes without saying that there were multiple issues with the plots. I found styling the plots far easier than creating the plots themselves. The line plot took me about 3 days to complete because I couldn’t figure out how to convert the ethnicity data so that I could mutate it. In the original data, an ethnicity can appear under two labels; for example, “ASIAN AND PACIFIC ISLANDER” and “ASIAN AND PACI” both refer to the same group. Eventually, I was able to convert the labels to characters and then mutate the names so they matched.
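The recoding step described above can be sketched on a few made-up rows. This is a minimal sketch, not the exact code used for the plot: the `toy` data frame and the `full_labels` lookup vector are hypothetical names, and it assumes the column was read in as a factor, which is why it is converted to character before the labels are replaced.

```r
library(dplyr)

# Toy rows standing in for baby_names; values are made up for illustration.
toy <- data.frame(
  Birth_Year = c(2011, 2011, 2012, 2012),
  Ethnicity  = c("ASIAN AND PACI", "WHITE NON HISP", "HISPANIC", "BLACK NON HISP"),
  stringsAsFactors = TRUE  # assumption: the column arrives as a factor
)

# Truncated label -> full label (the three pairs handled in the code above).
full_labels <- c(
  "ASIAN AND PACI" = "ASIAN AND PACIFIC ISLANDER",
  "BLACK NON HISP" = "BLACK NON HISPANIC",
  "WHITE NON HISP" = "WHITE NON HISPANIC"
)

toy_clean <- toy %>%
  mutate(
    # Factor -> character so new label values can be assigned freely.
    Ethnicity = as.character(Ethnicity),
    # Replace a label only when it appears in the lookup; keep it otherwise.
    Ethnicity = ifelse(Ethnicity %in% names(full_labels),
                       unname(full_labels[Ethnicity]),
                       Ethnicity)
  )

print(toy_clean)
```

Keeping the pairs in one lookup vector means a newly discovered truncated label only needs one extra entry rather than another `mutate()` call.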