library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.8
## v tidyr 1.2.0 v stringr 1.4.0
## v readr 2.1.2 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
##load data for USA from HMD.
load the data for USA males and USA females. Add a variable country and set it to “USA” for combined data. Select country, Year, Age and qx. Make Age numeric. Eliminate any missing data.
# USA male data from HMD
USAM <- read_table("./USA/USA/STATS/mltper_1x1.txt", skip = 2) %>%
mutate(country = "USA") %>%
select(country, Year, Age, qxm = "qx") %>%
mutate(Age = as.numeric(Age)) %>%
drop_na()
##
## -- Column specification --------------------------------------------------------
## cols(
## Year = col_double(),
## Age = col_character(),
## mx = col_double(),
## qx = col_double(),
## ax = col_double(),
## lx = col_double(),
## dx = col_double(),
## Lx = col_double(),
## Tx = col_double(),
## ex = col_double()
## )
## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
# USA female data from HMD
USAF <- read_table("./USA/USA/STATS/fltper_1x1.txt", skip = 2) %>%
mutate(country = "USA") %>%
rename(qxf = qx) %>%
select(country, Year, Age, qxf) %>%
mutate(Age = as.numeric(Age)) %>%
drop_na()
##
## -- Column specification --------------------------------------------------------
## cols(
## Year = col_double(),
## Age = col_character(),
## mx = col_double(),
## qx = col_double(),
## ax = col_double(),
## lx = col_double(),
## dx = col_double(),
## Lx = col_double(),
## Tx = col_double(),
## ex = col_double()
## )
## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
#combine both male and female to one dataframe using inner join from USAM to USAF
USA = USAM %>%
inner_join(USAF)
## Joining, by = c("country", "Year", "Age")
#check the head for combined USA
head(USA)
## # A tibble: 6 x 5
## country Year Age qxm qxf
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 USA 1933 0 0.0649 0.0521
## 2 USA 1933 1 0.00999 0.00883
## 3 USA 1933 2 0.00466 0.00402
## 4 USA 1933 3 0.00333 0.00287
## 5 USA 1933 4 0.00253 0.00223
## 6 USA 1933 5 0.00209 0.00185
Problem 1
Prepare your analysis dataframe using data extracted from the Human Mortality Database. Begin by selecting another country(not Canada), which you will compare with the USA. When you are done, your dataframe should have the following variables: country, Year, Age, qxm, and qxf.
Do an str() on the dataframe and create a table of the values in the variable country. There should be two values in country, USA and whatever other country you select.
# Place your code here.
# japan male data from HMD
JPNM <- read_table("./JPN/JPN/STATS/mltper_1x1.txt", skip = 2) %>%
mutate(country = "JPN") %>%
select(country, Year, Age, qxm = "qx") %>%
mutate(Age = as.numeric(Age)) %>%
drop_na()
##
## -- Column specification --------------------------------------------------------
## cols(
## Year = col_double(),
## Age = col_character(),
## mx = col_double(),
## qx = col_double(),
## ax = col_double(),
## lx = col_double(),
## dx = col_double(),
## Lx = col_double(),
## Tx = col_double(),
## ex = col_double()
## )
## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
# JPN female data from HMD
JPNF <- read_table("./JPN/JPN/STATS/fltper_1x1.txt", skip = 2) %>%
mutate(country = "JPN") %>%
rename(qxf = qx) %>%
select(country, Year, Age, qxf) %>%
mutate(Age = as.numeric(Age)) %>%
drop_na()
##
## -- Column specification --------------------------------------------------------
## cols(
## Year = col_double(),
## Age = col_character(),
## mx = col_double(),
## qx = col_double(),
## ax = col_double(),
## lx = col_double(),
## dx = col_double(),
## Lx = col_double(),
## Tx = col_double(),
## ex = col_double()
## )
## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
#combine Japan's both male and female to one dataframe using inner join.
JPN = JPNM %>%
inner_join(JPNF)
## Joining, by = c("country", "Year", "Age")
#check the head for combined USA
head(JPN)
## # A tibble: 6 x 5
## country Year Age qxm qxf
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 JPN 1947 0 0.0895 0.0791
## 2 JPN 1947 1 0.0364 0.0350
## 3 JPN 1947 2 0.0171 0.0171
## 4 JPN 1947 3 0.0113 0.0112
## 5 JPN 1947 4 0.00745 0.007
## 6 JPN 1947 5 0.0051 0.00482
## combining the USA and Japan dataframe together using row bind.
USA_JPN = rbind(USA, JPN)
#check the unique variables in country of the combined usa and japan dataframe
unique(USA_JPN$country)
## [1] "USA" "JPN"
#check the structure
str(USA_JPN)
## tibble [17,710 x 5] (S3: tbl_df/tbl/data.frame)
## $ country: chr [1:17710] "USA" "USA" "USA" "USA" ...
## $ Year : num [1:17710] 1933 1933 1933 1933 1933 ...
## $ Age : num [1:17710] 0 1 2 3 4 5 6 7 8 9 ...
## $ qxm : num [1:17710] 0.06486 0.00999 0.00466 0.00333 0.00253 ...
## $ qxf : num [1:17710] 0.05208 0.00883 0.00402 0.00287 0.00223 ...
Problem 2
Create a graph of infant mortality for both countries over as many years as you have. Be sure to comment on the results. Use plotly.
# Place your code here.
#male infant mortality rate for usa and japan
g1 = USA_JPN %>%
filter(Age == 0 & Year > 1946) %>%
ggplot(aes(x = Year, y = qxm, color = country)) +
geom_point() +
ggtitle("Infant mortality rate for male for usa and japan")
ggplotly(g1)
From 1947 to 1961, the infant male mortality rate for Japan was higher than male infant mortality rate for USA. The 1947 infant mortality for Japan seems way higher for some reason. In 1962, both countrie’s male infant mortality seems to be same. From 1963, USA has high infant male mortality rate compared to japan. Overall, the male infant mortality rate seems to decrease as time progresses for both countries.
#female infant mortality rate for usa and japan
g2 = USA_JPN %>%
filter(Age == 0 & Year > 1946) %>%
ggplot(aes(x = Year, y = qxf, color = country)) +
geom_point() +
ggtitle("Infant mortality rate for female for usa and japan")
ggplotly(g2)
From 1947 to 1961, the infant female mortality rate for Japan was higher than male infant mortality rate for USA. The 1947 infant mortality for Japan seems way higher for some reason. In 1962, both countrie’s female infant mortality seems to be same. From 1963, USA has high infant female mortality rate compared to japan. Overall, the female infant mortality rate seems to decrease as time progresses for both countries.
Create a graph of mortality for your age and gender for both countries over as many years as you have. Be sure to comment on the results. Use plotly.
# Place your code here.
# graph for Age=33 and gender= male for both couturiers.
g3 = USA_JPN %>%
filter(Age == 33 & Year > 1946) %>%
ggplot(aes(x = Year, y = qxm, color = country)) +
geom_point()+
ggtitle("Age 33 Male mortality - USA and Japan")
ggplotly(g3)
The age 33 male mortality seems to be decreasing over time for both countries. For japan, the mortality seems very high from 1947 to 1951 for some reason as compared to other time line.. Japan’s male age 33 mortality seems higher till 1962. After that it is lower than USA’s.
Create a graph of mortality at age 21 for both genders against year. Use facet_wrap() to do this for both countries side-by-side. Be sure to comment. Use plotly.
# Place your code here.
g4 = USA_JPN %>%
filter(Age == 21) %>%
ggplot((aes(x = Year))) +
geom_point(aes(y = qxm), color = "blue") +
geom_point(aes(y = qxf), color = "red") +
facet_wrap(~country)
ggplotly(g4)
The mortality rate 21 years for japan seems to decrease significantly over time as we can see steep decline. The USA mortality is consistently decreasing over time.
Create a graph of the ratio qxm/qxf at age 21 for both countries in a single plot. Be sure to comment. Use plotly.
# Place your code here.
g5 = USA_JPN %>%
filter(Age == 21) %>%
mutate( Ratio = qxm/qxf) %>%
ggplot(aes(x = Year, y = Ratio, color = country)) +
geom_point() +
ggtitle("Ratio at Age 21 USA and Japan")
ggplotly(g5)
The ratio for age 21 is mostly always lower for Japan as compared to USA. We can also see both both countries, the ratio is increasing with time. That is to say male mortality for age 21 is increasing for both countries as compared to female mortality as time progresses.