mardia_dm1633_assignment1.knit

Install the package “babynames”. Plot the number of male and female babies named Taylor by year.

if (!require("babynames")) install.packages("babynames")

## Loading required package: babynames

if (!require("tidyverse")) install.packages("tidyverse")

## Loading required package: tidyverse

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(babynames)

babynames %>%
  filter(name == "Taylor") %>%
  arrange(year) %>%
  ggplot(aes(year, n, color = sex)) +
  geom_line() +
  xlab("Year") +
  ylab("Frequency") +
  ggtitle("Yearly frequency of the name Taylor by sex")

Answer the following questions, showing plots to substantiate your answers (except 4):

1. Is a 23-year-old named Quinn more likely to be a boy or a girl? Ans. The data runs through 2017, and babies born in 2017 would be 0 years old, so I use 2018 as the reference year for ages; a 23-year-old was then born in 1995. I have also made a graph with 2023 as the reference year (birth year 2000). As per the graphs, a 23-year-old named Quinn is more likely to be male.

# Age 23 with 2018 as the reference year, i.e. born in 1995
quinn_23yr2017 <- babynames %>%
  filter(name == "Quinn" & year == (2018 - 23)) %>%
  select(sex, n) %>%
  ggplot(aes(sex, n)) +
  geom_col(fill = "green") +
  xlab("Sex") +
  ylab("Frequency") +
  ggtitle("Frequency by sex for the name Quinn at age 23 (reference year 2018)")
quinn_23yr2017

With 2023 as the reference year:

# Age 23 with 2023 as the reference year, i.e. born in 2000
quinn_23yr2023 <- babynames %>%
  filter(name == "Quinn" & year == (2023 - 23)) %>%
  select(sex, n) %>%
  ggplot(aes(sex, n)) +
  geom_col(fill = "blue") +
  xlab("Sex") +
  ylab("Frequency") +
  ggtitle("Frequency by sex for the name Quinn at age 23 (reference year 2023)")
quinn_23yr2023
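The age arithmetic used above (birth year = reference year minus age) can also be wrapped in a small helper that returns each sex's share of births, which makes "more likely" a number rather than a bar height. This is only a sketch: `sex_share()` and its arguments are hypothetical names introduced here, not part of the assignment or of the babynames package.

```r
library(babynames)
library(dplyr)

# Hypothetical helper: share of each sex among babies called `target_name`
# who would be `age` years old in `reference_year`.
sex_share <- function(target_name, age, reference_year = 2018) {
  birth_year <- reference_year - age
  babynames %>%
    filter(name == target_name, year == birth_year) %>%
    mutate(share = n / sum(n)) %>%   # shares sum to 1 across the sexes
    select(year, sex, n, share)
}

quinn_23 <- sex_share("Quinn", age = 23)                             # born 1995
quinn_23_alt <- sex_share("Quinn", age = 23, reference_year = 2023)  # born 2000
quinn_23
```

The sex with the larger `share` is the more likely one for that birth year.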

2. Is a 6-year-old named Quinn more likely to be a boy or a girl? Ans. The data runs through 2017, and babies born in 2017 would be 0 years old, so I use 2018 as the reference year for ages; a 6-year-old was then born in 2012. I have also made a graph with 2023 as the reference year (birth year 2017). As per the graphs, a 6-year-old named Quinn is more likely to be female.

# Age 6 with 2018 as the reference year, i.e. born in 2012
quinn_6yr2017 <- babynames %>%
  filter(name == "Quinn" & year == (2018 - 6)) %>%
  select(sex, n) %>%
  ggplot(aes(sex, n)) +
  geom_col(fill = "purple") +
  xlab("Sex") +
  ylab("Frequency") +
  ggtitle("Frequency by sex for the name Quinn at age 6 (reference year 2018)")
quinn_6yr2017

With 2023 as the reference year:

# Age 6 with 2023 as the reference year, i.e. born in 2017
quinn_6yr2023 <- babynames %>%
  filter(name == "Quinn" & year == (2023 - 6)) %>%
  select(sex, n) %>%
  ggplot(aes(sex, n)) +
  geom_col(fill = "yellow") +
  xlab("Sex") +
  ylab("Frequency") +
  ggtitle("Frequency by sex for the name Quinn at age 6 (reference year 2023)")
quinn_6yr2023
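Since the two Quinn answers point in opposite directions, it may help to see the female share of the name across all birth years in one plot. The sketch below is an optional addition, not the assignment's required method; `quinn_share` and `quinn_share_plot` are names made up for this example.

```r
library(babynames)
library(dplyr)
library(ggplot2)

# Female share of all babies named Quinn, per birth year
quinn_share <- babynames %>%
  filter(name == "Quinn") %>%
  group_by(year) %>%
  summarise(share_female = sum(n[sex == "F"]) / sum(n), .groups = "drop")

# The dashed line at 0.5 marks where the name is equally common for both sexes
quinn_share_plot <- ggplot(quinn_share, aes(year, share_female)) +
  geom_line() +
  geom_hline(yintercept = 0.5, linetype = "dashed") +
  xlab("Birth year") +
  ylab("Share of Quinns who are female") +
  ggtitle("Female share of the name Quinn by birth year")
quinn_share_plot
```

Birth years where the line sits above 0.5 are years in which a Quinn is more likely to be a girl.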

3. What is your best guess as to how old a woman named Susan is? Ans. As per the data frame and graph, the most common birth year for women named Susan is 1955, so my best guess is 63 years old with 2018 as the reference year, or 68 years old with 2023 as the reference year.

susanage <- babynames %>%
  mutate(age = 2018 - year) %>%
  filter(name == "Susan" & sex == "F" & age <= 100) %>%
  arrange(year)

susanage %>%
  ggplot(aes(age, n)) +
  geom_line() +
  xlab("Age") +
  ylab("Frequency") +
  ggtitle("Finding Susan's age")

susanage %>% arrange(desc(n))

## # A tibble: 100 × 6
##     year sex   name      n   prop   age
##    <dbl> <chr> <chr> <int>  <dbl> <dbl>
##  1  1955 F     Susan 47397 0.0236    63
##  2  1954 F     Susan 47158 0.0237    64
##  3  1956 F     Susan 46567 0.0226    62
##  4  1957 F     Susan 45951 0.0219    61
##  5  1958 F     Susan 45172 0.0219    60
##  6  1953 F     Susan 44285 0.0230    65
##  7  1959 F     Susan 41598 0.0200    59
##  8  1952 F     Susan 41350 0.0217    66
##  9  1951 F     Susan 40227 0.0218    67
## 10  1960 F     Susan 39200 0.0188    58
## # … with 90 more rows

With 2023 as the reference year:

susanage <- babynames %>%
  mutate(age = 2023 - year) %>%
  filter(name == "Susan" & sex == "F" & age <= 100) %>%
  arrange(year)

susanage %>%
  ggplot(aes(age, n)) +
  geom_line() +
  xlab("Age") +
  ylab("Frequency") +
  ggtitle("Finding Susan's age (reference year 2023)")

susanage %>% arrange(desc(n))

## # A tibble: 95 × 6
##     year sex   name      n   prop   age
##    <dbl> <chr> <chr> <int>  <dbl> <dbl>
##  1  1955 F     Susan 47397 0.0236    68
##  2  1954 F     Susan 47158 0.0237    69
##  3  1956 F     Susan 46567 0.0226    67
##  4  1957 F     Susan 45951 0.0219    66
##  5  1958 F     Susan 45172 0.0219    65
##  6  1953 F     Susan 44285 0.0230    70
##  7  1959 F     Susan 41598 0.0200    64
##  8  1952 F     Susan 41350 0.0217    71
##  9  1951 F     Susan 40227 0.0218    72
## 10  1960 F     Susan 39200 0.0188    63
## # … with 85 more rows
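The answer above uses the most common age (the mode). As a cross-check, a birth-weighted median or mean age could be computed from the same counts. The sketch below assumes 2018 as the reference year and, like the tables above, weights by births only, so it does not account for who is still alive; the variable names are introduced for illustration.

```r
library(babynames)
library(dplyr)

susan <- babynames %>%
  filter(name == "Susan", sex == "F") %>%
  mutate(age = 2018 - year) %>%
  filter(age <= 100) %>%
  arrange(age)

# Mode: the age with the largest number of births
modal_age <- susan$age[which.max(susan$n)]

# Weighted median: youngest age at which cumulative births reach half the total
cum_share <- cumsum(susan$n) / sum(susan$n)
median_age <- susan$age[which(cum_share >= 0.5)[1]]

# Weighted mean age, with births as weights
mean_age <- weighted.mean(susan$age, susan$n)

c(mode = modal_age, median = median_age, mean = mean_age)
```

If the three summaries are close, the single-number guess is fairly robust; a large gap would suggest a skewed age distribution.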

4. Find the five most popular female names in the year 2017. Ans. Filtering to females born in 2017 and arranging by count in descending order gives the five most popular names: Emma, Olivia, Ava, Isabella and Sophia.

female_popular <- babynames %>% filter(year == 2017 & sex == "F") %>% arrange(desc(n))
head(female_popular, 5)

## # A tibble: 5 × 5
##    year sex   name         n    prop
##   <dbl> <chr> <chr>    <int>   <dbl>
## 1  2017 F     Emma     19738 0.0105 
## 2  2017 F     Olivia   18632 0.00994
## 3  2017 F     Ava      15902 0.00848
## 4  2017 F     Isabella 15100 0.00805
## 5  2017 F     Sophia   14831 0.00791
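The same top five can be obtained in one step with dplyr's `slice_max()`, which selects the rows with the largest values of a column. This is an alternative sketch rather than a replacement for the `arrange()` and `head()` approach above; `top5_female_2017` is a name chosen for this example.

```r
library(babynames)
library(dplyr)

top5_female_2017 <- babynames %>%
  filter(year == 2017, sex == "F") %>%
  slice_max(order_by = n, n = 5)   # with_ties = TRUE by default

top5_female_2017$name
```

Because `with_ties` defaults to `TRUE`, `slice_max()` can return more than five rows when counts are tied at the cutoff; pass `with_ties = FALSE` if exactly five rows are required.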