For this project, I used three datasets from the Week 5 Discussion 5A posts. I prepared each one by creating a .csv file and importing the data, then tidied the data and performed an analysis on it. I also made sure that the code in the Quarto Markdown file is reproducible in a clean environment. I followed a process similar to the one we used in Assignment 5A with the Airline Delays data, since that assignment is very similar to this one.
Dataset 1:
Birth Rate by Countries posted by Brandon Chanderban
# A tibble: 63 × 6
   `Country Name` `Country Code` `Indicator Name`         `Indicator Code` Year
   <chr>          <chr>          <chr>                    <chr>            <chr>
 1 United States  USA            Birth rate, crude (per … SP.DYN.CBRT.IN   1960
 2 United States  USA            Birth rate, crude (per … SP.DYN.CBRT.IN   1961
 3 United States  USA            Birth rate, crude (per … SP.DYN.CBRT.IN   1962
 4 United States  USA            Birth rate, crude (per … SP.DYN.CBRT.IN   1963
 5 United States  USA            Birth rate, crude (per … SP.DYN.CBRT.IN   1964
 6 United States  USA            Birth rate, crude (per … SP.DYN.CBRT.IN   1965
 7 United States  USA            Birth rate, crude (per … SP.DYN.CBRT.IN   1966
 8 United States  USA            Birth rate, crude (per … SP.DYN.CBRT.IN   1967
 9 United States  USA            Birth rate, crude (per … SP.DYN.CBRT.IN   1968
10 United States  USA            Birth rate, crude (per … SP.DYN.CBRT.IN   1969
# ℹ 53 more rows
# ℹ 1 more variable: `Birth Rate` <dbl>
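The import-and-tidy steps that lead to a long tibble like the one above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the exact code used in this project: it assumes the source .csv is in a wide layout with one column per year, the names `wide_df`, `csv_path`, `long_df`, and `US_only_sketch` are hypothetical, and the second row is a placeholder with no real data. The three US values are copied from the table in this report.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(readr)

# Hypothetical wide layout: one row per country, one column per year.
# US values come from this report; the placeholder row carries no data.
wide_df <- tribble(
  ~`Country Name`, ~`Country Code`, ~`1960`, ~`1961`, ~`1962`,
  "United States", "USA",           23.7,    23.3,    22.4,
  "Placeholder",   "XXX",           NA,      NA,      NA
)

# Write to a temporary .csv and read it back, mimicking the import step.
csv_path <- tempfile(fileext = ".csv")
write_csv(wide_df, csv_path)
imported_df <- read_csv(csv_path, show_col_types = FALSE)

# Tidy: move the year columns into a Year / Birth Rate pair of columns.
long_df <- imported_df |>
  pivot_longer(
    cols = matches("^[0-9]{4}$"),   # every column named like a 4-digit year
    names_to = "Year",              # Year stays character, as in the tibble above
    values_to = "Birth Rate"
  )

# Keep only the US rows.
US_only_sketch <- long_df |>
  filter(`Country Code` == "USA")

print(US_only_sketch)
```

Selecting the year columns by pattern rather than by position keeps the sketch working if the file gains or loses a year column.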
library(gt)

# Show only Year and Birth Rate by hiding the identifier columns
US_only_df |>
  gt() |>
  cols_hide(columns = c(`Country Name`, `Country Code`, `Indicator Name`, `Indicator Code`)) |>
  tab_header(title = "Birth rates in the US (per 1,000 people)")
Birth rates in the US (per 1,000 people)
Year   Birth Rate
1960   23.7
1961   23.3
1962   22.4
1963   21.7
1964   21.1
1965   19.4
1966   18.4
1967   17.8
1968   17.6
1969   17.9
1970   18.4
1971   17.2
1972   15.6
1973   14.8
1974   14.8
1975   14.6
1976   14.6
1977   15.1
1978   15.0
1979   15.6
1980   15.9
1981   15.8
1982   15.9
1983   15.6
1984   15.6
1985   15.8
1986   15.6
1987   15.7
1988   16.0
1989   16.4
1990   16.7
1991   16.2
1992   15.8
1993   15.4
1994   15.0
1995   14.6
1996   14.4
1997   14.2
1998   14.3
1999   14.2
2000   14.4
2001   14.1
2002   14.0
2003   14.1
2004   14.0
2005   14.0
2006   14.3
2007   14.3
2008   14.0
2009   13.5
2010   13.0
2011   12.7
2012   12.6
2013   12.4
2014   12.5
2015   12.4
2016   12.2
2017   11.8
2018   11.6
2019   11.4
2020   10.9
2021   11.0
2022   0.0
To make trends easier to see, I created a line chart of the US birth rate data using the following code chunk.
library(ggplot2)

# Year is stored as character, so convert it to numeric for a continuous x axis
ggplot(US_only_df, aes(x = as.numeric(Year), y = `Birth Rate`)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  scale_x_continuous(breaks = seq(1960, 2022, by = 10)) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  # No colour aesthetic is mapped, so no colour label is set
  labs(
    title = "US Birth Rate Trends Over Time",
    subtitle = "Annual births per 1,000 persons",
    x = "Year",
    y = "Birth Rate"
  )
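From this plot, we can see that the US crude birth rate trended downward between 1960 and 2021, falling from 23.7 to 11.0 births per 1,000 people. The 2022 value of 0.0 is most likely a placeholder for missing data rather than a real observation, so it should not be read as part of the trend.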
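One way to check the downward trend numerically is sketched below. It is not part of the original analysis and rests on an assumption: that the 0.0 recorded for 2022 stands for a missing value rather than a true rate. The names `us_sample`, `us_clean`, and `trend` are hypothetical, and the rows are a subset of the values in the table above.

```r
library(dplyr)
library(tibble)

# A subset of the US values from the table above (Year kept as character,
# matching the tibble's <chr> column).
us_sample <- tribble(
  ~Year,  ~`Birth Rate`,
  "1960", 23.7,
  "1970", 18.4,
  "1980", 15.9,
  "1990", 16.7,
  "2000", 14.4,
  "2010", 13.0,
  "2020", 10.9,
  "2021", 11.0,
  "2022", 0.0
)

# Assumption: a rate of exactly 0 is a placeholder, so recode it as NA.
us_clean <- us_sample |>
  mutate(
    Year = as.integer(Year),
    `Birth Rate` = na_if(`Birth Rate`, 0)
  )

# Fit a simple linear trend on the non-missing years.
trend <- lm(`Birth Rate` ~ Year, data = filter(us_clean, !is.na(`Birth Rate`)))

# A negative slope means the rate falls, on average, each year.
coef(trend)[["Year"]]
```

Recoding the zero before fitting matters because a single 0.0 at the end of the series would otherwise pull the fitted line down and overstate the decline.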