I downloaded the Popular_Baby_Names data set from data.gov. It is based on baby names collected in New York City. The access and use information specifically states that the data is intended for public use.

baby_names <- read.csv(file = "/Users/brettmcgillivary/Documents/Coursework/R save files/Popular_Baby_Names.csv", header=TRUE, sep=",")

I changed some header names to make columns easier to read.

names(baby_names)[1] <- c('Birth_Year')
names(baby_names)[4] <- c('Child_First_Name')

Baby Names (tail) Table

	Birth_Year	Gender	Ethnicity	Child_First_Name	Count	Rank
19413	2015	MALE	WHITE NON HISPANIC	Joel	32	72
19414	2016	FEMALE	HISPANIC	Alayna	10	74
19415	2015	FEMALE	HISPANIC	Yaritza	12	79
19416	2015	MALE	WHITE NON HISPANIC	Mendel	42	64
19417	2016	MALE	ASIAN AND PACIFIC ISLANDER	Isaac	21	48
19418	2015	FEMALE	WHITE NON HISPANIC	Alessia	12	81
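
The code that produced this table is not shown above. A minimal sketch, assuming base R's tail() was used (it returns the last six rows by default, which matches the six rows shown); the small data frame here is a hypothetical stand-in for baby_names:

```r
# Hypothetical stand-in for baby_names; the values are illustrative only.
toy_names <- data.frame(
  Birth_Year = rep(2015:2016, each = 5),
  Child_First_Name = paste0("Name", 1:10),
  Count = 10:19,
  stringsAsFactors = FALSE
)

# tail() returns the last six rows by default and keeps the original row numbers.
last_rows <- tail(toy_names)
print(last_rows)
```

With the real data, the equivalent call would be tail(baby_names).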

Slightly longer printing of data

library(knitr)
library(kableExtra)
kable(baby_names[1:20, 1:6]) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

Birth_Year	Gender	Ethnicity	Child_First_Name	Count	Rank
2011	FEMALE	HISPANIC	GERALDINE	13	75
2011	FEMALE	HISPANIC	GIA	21	67
2011	FEMALE	HISPANIC	GIANNA	49	42
2011	FEMALE	HISPANIC	GISELLE	38	51
2011	FEMALE	HISPANIC	GRACE	36	53
2011	FEMALE	HISPANIC	GUADALUPE	26	62
2011	FEMALE	HISPANIC	HAILEY	126	8
2011	FEMALE	HISPANIC	HALEY	14	74
2011	FEMALE	HISPANIC	HANNAH	17	71
2011	FEMALE	HISPANIC	HAYLEE	17	71
2011	FEMALE	HISPANIC	HAYLEY	13	75
2011	FEMALE	HISPANIC	HAZEL	10	78
2011	FEMALE	HISPANIC	HEAVEN	15	73
2011	FEMALE	HISPANIC	HEIDI	15	73
2011	FEMALE	HISPANIC	HEIDY	16	72
2011	FEMALE	HISPANIC	HELEN	13	75
2011	FEMALE	HISPANIC	IMANI	11	77
2011	FEMALE	HISPANIC	INGRID	11	77
2011	FEMALE	HISPANIC	IRENE	11	77
2011	FEMALE	HISPANIC	IRIS	10	78

Birth Year vs Gender - Bar Chart

library(dplyr)
library(ggplot2)
# Count the rows (name entries) for each year and gender
df1 = baby_names %>% 
  group_by(Birth_Year, Gender) %>% count() %>% ungroup() %>% 
  rename('Count' = n)

ggplot(df1,aes(x = Birth_Year,y = Count, fill = Gender))+
  ggtitle("Count vs Birth_Year")+
  xlab("Year") + ylab("Count")+
  theme(
    plot.title = element_text(color="red", size=14, face="bold"),
    axis.title.x = element_text(color="blue", size=14, face="bold"),
    axis.title.y = element_text(color="#993333", size=14, face="bold"))+
  geom_bar(stat='identity',position = position_dodge(width = 0.9))

I found it interesting that there was a sharp decline in birth rates starting in 2015 and that, generally speaking, the ratio of males to females in the city is pretty even.
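
One detail worth keeping in mind when reading the chart: count() tallies rows, so each bar shows how many name entries the data set contains for that year and gender, not how many babies were born. A minimal sketch of the alternative, summing the Count column instead; the toy data frame and the Entries and Births column names are assumptions for illustration, not part of the original analysis:

```r
library(dplyr)

# Hypothetical toy data with the same columns used above.
toy_names <- data.frame(
  Birth_Year = c(2015, 2015, 2015, 2016, 2016),
  Gender = c("FEMALE", "FEMALE", "MALE", "FEMALE", "MALE"),
  Count = c(12, 30, 42, 10, 21),
  stringsAsFactors = FALSE
)

# Number of rows (name entries) per year and gender, as in df1.
rows_per_group <- toy_names %>%
  count(Birth_Year, Gender, name = "Entries")

# Number of babies per year and gender: sum the Count column.
births_per_group <- toy_names %>%
  group_by(Birth_Year, Gender) %>%
  summarise(Births = sum(Count), .groups = "drop")

print(rows_per_group)
print(births_per_group)
```

Plotting Births instead of the row count would show whether the drop after 2015 reflects fewer babies or simply fewer rows in the file.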

Line Graph

df2 = baby_names %>% 
  group_by( Birth_Year,Ethnicity) %>% count() %>% ungroup() %>% as.data.frame()
df2$Ethnicity = as.character(df2$Ethnicity)

df2 = df2 %>% 
  mutate(Ethnicity = ifelse(Ethnicity=='ASIAN AND PACI',    'ASIAN AND PACIFIC ISLANDER', Ethnicity)) %>% 
  mutate(Ethnicity = ifelse(Ethnicity=='BLACK NON HISP',    'BLACK NON HISPANIC', Ethnicity)) %>%
  mutate(Ethnicity = ifelse(Ethnicity=='WHITE NON HISP',    'WHITE NON HISPANIC', Ethnicity))

ggplot(df2, aes(x = Birth_Year, y = n , color = Ethnicity)) + 
  ggtitle("Birth_Year vs Count with Ethnicity dimension")+
  xlab("Year") + ylab("Count")+
  geom_line()

It is interesting that the decline in birth rates was even across ethnicities. While looking for relevant news, I found multiple articles that attributed the decline in birth rates to a marked reduction in teen pregnancies and an increase in “induced terminations.”

Issues

It goes without saying that there were multiple issues with the plots. I found styling the plots far easier than creating them. The line plot took me about 3 days to complete because I couldn’t figure out how to convert the ethnicity data so that I could mutate it. In the original data, several ethnicities appear under two labels; for example, “ASIAN AND PACIFIC ISLANDER” and “ASIAN AND PACI” both refer to the same group. Eventually, I was able to convert the values to characters and then mutate the names so that they matched.
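
The conversion described above can be sketched as follows, on a hypothetical toy data frame. This version uses dplyr's recode() in place of the chained ifelse() calls (a swapped-in technique, not the original code) and fixes the labels before counting, so both spellings land in the same group:

```r
library(dplyr)

# Hypothetical toy data: Ethnicity is a factor with short and long labels.
toy_names <- data.frame(
  Birth_Year = c(2012, 2012, 2013, 2013),
  Ethnicity = factor(c("ASIAN AND PACI", "WHITE NON HISP",
                       "ASIAN AND PACIFIC ISLANDER", "WHITE NON HISPANIC"))
)

cleaned <- toy_names %>%
  mutate(
    # Convert the factor to character first, then map short labels to long ones.
    Ethnicity = as.character(Ethnicity),
    Ethnicity = recode(Ethnicity,
      "ASIAN AND PACI" = "ASIAN AND PACIFIC ISLANDER",
      "BLACK NON HISP" = "BLACK NON HISPANIC",
      "WHITE NON HISP" = "WHITE NON HISPANIC"
    )
  )

# Count after cleaning so each ethnicity forms a single group per year.
per_group <- cleaned %>% count(Birth_Year, Ethnicity)
print(per_group)
```

Values that match none of the short labels pass through recode() unchanged, which is the same behavior the ifelse() chain relies on.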

Assignment 2

Brett McGillivary

11/22/2019
