Our team interpreted this question as follows: we aggregated the total frequency of each male and female baby name across all years (so each name appears only once) and output the names together with their respective frequencies.
The most popular male baby name is James, with a count of 5164280. The most popular female baby name is Mary, with a count of 4125675.
library(plyr) # provides ddply(), numcolwise(), arrange() and desc()
male = subset(mydf, mydf$Sex == 'M')
female = subset(mydf, mydf$Sex == 'F')
lookupTop = function(dataframe, iterations){ # takes a data frame and the number of top names to print
  aggregates = ddply(dataframe, 'Name', numcolwise(sum)) # sums each numeric column (e.g. Count) per name
  sorted_frame = arrange(aggregates, desc(Count)) # most frequent names first
  for(i in seq_len(iterations)){ # loops over the top `iterations` rows
    print(paste(as.character(sorted_frame[i,]$Name), sorted_frame[i,]$Count)) # prints name and total frequency
  }
}
lookupTop(male, 10)
## [1] "James 5164280"
## [1] "John 5124817"
## [1] "Robert 4820129"
## [1] "Michael 4362731"
## [1] "William 4117369"
## [1] "David 3621322"
## [1] "Joseph 2613304"
## [1] "Richard 2565301"
## [1] "Charles 2392779"
## [1] "Thomas 2311849"
lookupTop(female, 10)
## [1] "Mary 4125675"
## [1] "Elizabeth 1638349"
## [1] "Patricia 1572016"
## [1] "Jennifer 1467207"
## [1] "Linda 1452668"
## [1] "Barbara 1434397"
## [1] "Margaret 1248985"
## [1] "Susan 1121703"
## [1] "Dorothy 1107635"
## [1] "Sarah 1077746"
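For comparison, the same per-sex aggregation can be done in base R without plyr. The sketch below is a minimal illustration on a small made-up data frame (`toy`), assuming the columns `Name`, `Sex`, `Year` and `Count` match those of `mydf`; the helper `top_names()` is hypothetical and not part of the original analysis.

```r
# Hypothetical toy data standing in for mydf (column names assumed).
toy <- data.frame(
  Name  = c("James", "Mary", "James", "Mary", "John", "Anna"),
  Sex   = c("M", "F", "M", "F", "M", "F"),
  Year  = c(1950, 1950, 1980, 1980, 1980, 1980),
  Count = c(10, 12, 7, 3, 15, 9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: total Count per name for one sex, largest first.
top_names <- function(data, sex, n = 10) {
  rows <- data[data$Sex == sex, ]
  # Sum Count across all years for each name.
  totals <- aggregate(Count ~ Name, data = rows, FUN = sum)
  totals <- totals[order(totals$Count, decreasing = TRUE), ]
  head(totals, n)
}

top_male <- top_names(toy, "M")
top_female <- top_names(toy, "F")
print(top_male[1, ])
print(top_female[1, ])
```

Here `aggregate()` with a formula plays the role of `ddply(..., numcolwise(sum))`, and `order(..., decreasing = TRUE)` replaces `arrange(..., desc(Count))`; the first row of each result is the most popular name for that sex.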
The top five baby names for males and females in 1950 are shown below.
The top five baby names for males and females in 1980 are shown below.
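One way to produce per-year rankings like these is sketched below, assuming `mydf` has a numeric `Year` column alongside `Name`, `Sex` and `Count`. The helper `top_by_year()` and the toy data are hypothetical; with the real data one would call `top_by_year(mydf, 1950)` and `top_by_year(mydf, 1980)`.

```r
# Hypothetical toy data; real data is assumed to have Name, Sex, Year, Count columns.
toy <- data.frame(
  Name  = c("James", "Mary", "James", "Mary", "John", "Anna", "Linda"),
  Sex   = c("M", "F", "M", "F", "M", "F", "F"),
  Year  = c(1950, 1950, 1980, 1980, 1980, 1980, 1950),
  Count = c(10, 12, 7, 3, 15, 9, 11),
  stringsAsFactors = FALSE
)

# Hypothetical helper: top n names per sex for a single year.
top_by_year <- function(data, year, n = 5) {
  yr <- data[data$Year == year, ]
  # Sum counts per (Sex, Name) in case a name appears on several rows.
  totals <- aggregate(Count ~ Sex + Name, data = yr, FUN = sum)
  # split() returns a list with one data frame per sex ("F" and "M").
  lapply(split(totals, totals$Sex), function(d) {
    head(d[order(d$Count, decreasing = TRUE), c("Name", "Count")], n)
  })
}

res_1980 <- top_by_year(toy, 1980)
print(res_1980)
```

Returning a list keyed by sex keeps the male and female rankings separate, which matches how the two tables are reported.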
The top 10 baby names of all time have been saved to a CSV file named mostpop.csv in the current working directory.
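# Sum Count per name; male and female counts for the same name are combined.
df_count <- aggregate(df$Count, by = list(name = df$Name), FUN = sum)
# Sort by total count (column x), largest first.
df_output <- df_count[order(df_count$x, decreasing = TRUE), ]
only_10 <- head(df_output, 10)
print(only_10)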
## name x
## 17688 Liam 19860
## 9039 Emma 18697
## 21579 Noah 18442
## 21943 Olivia 17929
## 3578 Ava 14933
## 27868 William 14526
## 11757 Isabella 14479
## 25674 Sophia 13943
## 12546 James 13569
## 17971 Logan 13426
write.csv(only_10, 'mostpop.csv', row.names = FALSE) # omit the row-index column from the file
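By default `write.csv()` also writes the row names, so the row indices in the printed output (17688, 9039, …) become an extra unnamed first column in the file. A minimal sketch of the difference, using a two-row stand-in for `only_10` (the name `only_10_demo` is hypothetical) and temporary files so that mostpop.csv is not touched:

```r
# Two-row stand-in for only_10 (values taken from the printed output).
only_10_demo <- data.frame(name = c("Liam", "Emma"), x = c(19860, 18697),
                           stringsAsFactors = FALSE)

# Default: row names are written as an extra, unnamed first column.
with_rn <- tempfile(fileext = ".csv")
write.csv(only_10_demo, with_rn)
print(names(read.csv(with_rn)))

# row.names = FALSE keeps only the name and x columns.
without_rn <- tempfile(fileext = ".csv")
write.csv(only_10_demo, without_rn, row.names = FALSE)
print(names(read.csv(without_rn)))
```

When read back, the first file gains a column named `X` holding the old row indices, while the second round-trips to exactly `name` and `x`.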