Install the package “babynames”.
Plot the number of male and female babies named Taylor by year.
if (!require("babynames")) install.packages("babynames")
## Loading required package: babynames
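# Install tidyverse if it is missing, then attach it so the pipeline below can run
if (!require("tidyverse")) { install.packages("tidyverse"); library(tidyverse) }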
## Loading required package: tidyverse
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.5
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(babynames)
babynames %>% filter(name == "Taylor") %>% arrange(year) %>% ggplot(aes(year, n, color = sex)) + geom_line() +
xlab("year") +
ylab("Frequency") +
ggtitle("Year-wise Frequency of the Name Taylor for Females & Males")
Answer the following questions, showing plots to substantiate your
answers (except 4):
1. Is a 23-year-old named Quinn more likely to be a boy or a girl? Ans. The data run through 2017, and babies born in 2017 are 0 years old in that year, so I take 2018 as the reference year. A 23-year-old was then born in 2018 - 23 = 1995. I have also made a plot with 2023 as the reference year (birth year 2000). As per the plots, a 23-year-old named Quinn is more likely to be male.
quinn_23yr2017 <- babynames %>% filter(name == "Quinn" & year == (2018-23)) %>% select(sex, n) %>% ggplot(aes(sex, n)) + geom_col(fill="green") +
xlab("sex") +
ylab("Frequency") +
ggtitle("Sex-wise Frequency for the Name Quinn at 23 Years of Age")
quinn_23yr2017
# with 2023 as the reference year
quinn_23yr2023 <- babynames %>% filter(name == "Quinn" & year == (2023-23)) %>% select(sex, n) %>% ggplot(aes(sex, n)) + geom_col(fill="blue") +
xlab("sex") +
ylab("Frequency") +
ggtitle("Sex-wise Frequency for the Name Quinn at 23 Years of Age")
quinn_23yr2023
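The bar charts can be backed up with numbers. The sketch below computes, for a given birth year, the share of each sex among babies with a given name. The helper name `sex_share` is my own and is not part of babynames or dplyr; it assumes the data frame has the `year`, `sex`, `name` and `n` columns of `babynames`.

```r
library(dplyr)
library(babynames)

# Hypothetical helper (not from any package): share of each sex among
# babies called `nm` who were born in `birth_year`.
sex_share <- function(data, nm, birth_year) {
  data %>%
    filter(name == nm, year == birth_year) %>%
    mutate(share = n / sum(n)) %>%   # counts divided by the total for that name and year
    select(year, sex, n, share)
}

# A 23-year-old was born in 1995 (reference year 2018) or 2000 (reference year 2023)
sex_share(babynames, "Quinn", 2018 - 23)
sex_share(babynames, "Quinn", 2023 - 23)
```

A share above 0.5 for one sex means a Quinn born in that year is more likely to be of that sex.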
2. Is a 6-year-old named Quinn more likely to be a boy or a girl? Ans. As before, the data run through 2017, so I take 2018 as the reference year. A 6-year-old was then born in 2018 - 6 = 2012. I have also made a plot with 2023 as the reference year (birth year 2017). As per the plots, a 6-year-old named Quinn is more likely to be female.
quinn_6yr2017 <- babynames %>% filter(name == "Quinn" & year == (2018-6)) %>% select(sex, n) %>% ggplot(aes(sex, n)) + geom_col(fill="purple") +
xlab("sex") +
ylab("Frequency") +
ggtitle("Sex-wise Frequency for the Name Quinn at 6 Years of Age")
quinn_6yr2017
# with 2023 as the reference year
quinn_6yr2023 <- babynames %>% filter(name == "Quinn" & year == (2023-6)) %>% select(sex, n) %>% ggplot(aes(sex, n)) + geom_col(fill="yellow") +
xlab("sex") +
ylab("Frequency") +
ggtitle("Sex-wise Frequency for the Name Quinn at 6 Years of Age")
quinn_6yr2023
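Both Quinn questions can also be read off a single plot of the female share by birth year. This is only a sketch; the helper `female_share_by_year` is a name I made up, and years with no female record for the name are simply dropped from the line.

```r
library(dplyr)
library(ggplot2)
library(babynames)

# Hypothetical helper (not from any package): female share of babies
# called `nm`, one row per birth year that has a female record.
female_share_by_year <- function(data, nm) {
  data %>%
    filter(name == nm) %>%
    group_by(year) %>%
    mutate(share = n / sum(n)) %>%   # share within each birth year
    ungroup() %>%
    filter(sex == "F") %>%
    select(year, share)
}

female_share_by_year(babynames, "Quinn") %>%
  ggplot(aes(year, share)) +
  geom_line() +
  geom_hline(yintercept = 0.5, linetype = "dashed") +  # above the dashed line: more girls than boys
  xlab("Birth year") +
  ylab("Female share") +
  ggtitle("Female share of babies named Quinn by birth year")
```

Reading the curve at the birth year for a given age (reference year minus age) shows which sex is more likely for that age.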
3. What is your best guess as to how old a woman named Susan is? Ans. As per the data frame and plot, the most common birth year for women named Susan is 1955, so my best guess is 63 years old with 2018 as the reference year, or 68 years old with 2023 as the reference year.
susanage <- babynames %>% mutate(age = 2018-year) %>% filter(name == "Susan" & sex == "F" & age <= 100) %>% arrange(year)
susanage %>% ggplot(aes(age, n)) + geom_line() +
xlab("Age") +
ylab("Frequency") +
ggtitle("Finding Susan's age")
susanage %>% arrange(desc(n))
## # A tibble: 100 × 6
##     year sex   name      n   prop   age
##    <dbl> <chr> <chr> <int>  <dbl> <dbl>
##  1  1955 F     Susan 47397 0.0236    63
##  2  1954 F     Susan 47158 0.0237    64
##  3  1956 F     Susan 46567 0.0226    62
##  4  1957 F     Susan 45951 0.0219    61
##  5  1958 F     Susan 45172 0.0219    60
##  6  1953 F     Susan 44285 0.0230    65
##  7  1959 F     Susan 41598 0.0200    59
##  8  1952 F     Susan 41350 0.0217    66
##  9  1951 F     Susan 40227 0.0218    67
## 10  1960 F     Susan 39200 0.0188    58
## # … with 90 more rows
# with 2023 as the reference year
susanage <- babynames %>% mutate(age = 2023-year) %>% filter(name == "Susan" & sex == "F" & age <= 100) %>% arrange(year)
susanage %>% ggplot(aes(age, n)) + geom_line() +
xlab("Age") +
ylab("Frequency") +
ggtitle("Finding Susan's age")
susanage %>% arrange(desc(n))
## # A tibble: 95 × 6
##     year sex   name      n   prop   age
##    <dbl> <chr> <chr> <int>  <dbl> <dbl>
##  1  1955 F     Susan 47397 0.0236    68
##  2  1954 F     Susan 47158 0.0237    69
##  3  1956 F     Susan 46567 0.0226    67
##  4  1957 F     Susan 45951 0.0219    66
##  5  1958 F     Susan 45172 0.0219    65
##  6  1953 F     Susan 44285 0.0230    70
##  7  1959 F     Susan 41598 0.0200    64
##  8  1952 F     Susan 41350 0.0217    71
##  9  1951 F     Susan 40227 0.0218    72
## 10  1960 F     Susan 39200 0.0188    63
## # … with 85 more rows
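The answer above uses the single most common birth year. Another possible summary is the median age weighted by the number of births, sketched below. The helper `weighted_median_age` is a hypothetical name of my own, and the sketch assumes that birth counts stand in for the women alive today, which ignores mortality.

```r
library(dplyr)
library(babynames)

# Hypothetical helper (not from any package): the age at which the
# cumulative number of births, counted from the youngest, first reaches
# half of all births for that name and sex.
# Assumption: birth counts approximate the living population.
weighted_median_age <- function(data, nm, sx, ref_year, max_age = 100) {
  data %>%
    filter(name == nm, sex == sx) %>%
    mutate(age = ref_year - year) %>%
    filter(age <= max_age) %>%
    arrange(age) %>%
    mutate(cum_share = cumsum(as.numeric(n)) / sum(n)) %>%
    filter(cum_share >= 0.5) %>%
    slice(1) %>%
    pull(age)
}

weighted_median_age(babynames, "Susan", "F", ref_year = 2018)
weighted_median_age(babynames, "Susan", "F", ref_year = 2023)
```

If the weighted median is close to the age from the most common birth year, the guess does not depend much on which summary is used.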
4. Find the five most popular female names in the year 2017. Ans. Filtering the data to female names in 2017 and sorting by count in descending order gives the five most popular names: Emma, Olivia, Ava, Isabella and Sophia.
female_popular <- babynames %>% filter(year == 2017 & sex == "F") %>% arrange(desc(n))
head(female_popular, 5)
## # A tibble: 5 × 5
##    year sex   name         n    prop
##   <dbl> <chr> <chr>    <int>   <dbl>
## 1  2017 F     Emma     19738 0.0105
## 2  2017 F     Olivia   18632 0.00994
## 3  2017 F     Ava      15902 0.00848
## 4  2017 F     Isabella 15100 0.00805
## 5  2017 F     Sophia   14831 0.00791
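The same result can probably be had in one step with `slice_max()` from dplyr, which keeps the rows with the largest values of a column. A minimal sketch follows; the wrapper `top_names` is a hypothetical name of my own.

```r
library(dplyr)
library(babynames)

# Hypothetical wrapper (not from any package): the k most frequent names
# for a given year and sex, most frequent first.
top_names <- function(data, yr, sx, k = 5) {
  data %>%
    filter(year == yr, sex == sx) %>%
    slice_max(n, n = k, with_ties = FALSE)  # with_ties = FALSE returns exactly k rows
}

top_names(babynames, 2017, "F")
```

Setting `with_ties = FALSE` matters only when two names share the same count at the cut-off; `head()` after `arrange()` behaves the same way.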