library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("C:/Users/Thu Nguyen/Downloads/Midterm Project")
birth_data <- read_csv("arbuthnot.csv")
## Rows: 82 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): year, boys, girls
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
This dataset covers births in London from 1629 to 1710. It shows dramatic changes in the number of births over the years. The dataset was published by John Arbuthnot and records, for each year, the number of boys and the number of girls born. First, I clean the dataset by converting the column names to lowercase and replacing spaces.
I chose to clean the column names to make the data clearer and more consistent. I also checked the data for duplicate rows, which makes the information easier to analyze.
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
birth_data <- clean_names(birth_data)
# Check for duplicate rows without overwriting birth_data
get_dupes(birth_data)
## No variable names specified - using all columns.
## No duplicate combinations found of: year, boys, girls
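To illustrate what the cleaning step does, here is a minimal base R sketch. It assumes a hypothetical data frame `raw` with messy column names (the real file already has clean names) and only lowercases the names and replaces spaces; `clean_names()` handles more cases than this.

```r
# Hypothetical data frame with messy column names (not the real file)
raw <- data.frame(
  "Year" = c(1629, 1630),
  "Boys Born" = c(5218, 4858),
  "Girls Born" = c(4683, 4457),
  check.names = FALSE
)

# Lowercase the names, then replace runs of whitespace with an underscore
names(raw) <- gsub("\\s+", "_", tolower(names(raw)))
names(raw)

# Count fully duplicated rows; 0 means every row is unique
n_dupes <- sum(duplicated(raw))
n_dupes
```

The duplicate count plays the same role as the `get_dupes()` check: a result of 0 means no rows need to be removed.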
library(dplyr)
library(tidyr)
library(ggplot2)
birth_data <- data.frame(
year = 1629:1710,
boys = c(5218, 4858, 4422, 4994, 5158, 5035, 5106, 4917, 4703, 5359, 5366, 5518, 5470, 5460,
4793, 4107, 4047, 3768, 3796, 3363, 3079, 2890, 3231, 3220, 3196, 3441, 3655, 3668,
3396, 3157, 3209, 3724, 4748, 5216, 5411, 6041, 5114, 4678, 5616, 6073, 6506, 6278,
6449, 6443, 6073, 6113, 6058, 6552, 6423, 6568, 6247, 6548, 6822, 6909, 7577, 7575,
7484, 7575, 7737, 7487, 7604, 7909, 7662, 7602, 7676, 6985, 7263, 7632, 8062, 8426,
7911, 7578, 8102, 8031, 7765, 6113, 8366, 7952, 8379, 8239, 7840, 7640),
girls = c(4683, 4457, 4102, 4590, 4839, 4820, 4928, 4605, 4457, 4952, 4784, 5332, 5200, 4910,
4617, 3997, 3919, 3395, 3536, 3181, 2746, 2722, 2840, 2908, 2959, 3179, 3349, 3382,
3289, 3013, 2781, 3247, 4107, 4803, 4881, 5681, 4858, 4319, 5322, 5560, 5829, 5719,
6061, 6120, 5822, 5738, 5717, 5847, 6203, 6033, 6041, 6299, 6533, 6744, 7158, 7127,
7246, 7119, 7214, 7101, 7167, 7302, 7392, 7316, 7483, 6647, 6713, 7229, 7767, 7626,
7452, 7061, 7514, 7656, 7683, 5738, 7779, 7417, 7687, 7623, 7380, 7288)
)
birth_long <- birth_data %>%
  pivot_longer(cols = c(boys, girls), names_to = "gender", values_to = "births")
ggplot(birth_long, aes(x = year, y = births, color = gender)) +
  geom_line(linewidth = 1.2) +
  scale_color_manual(values = c("boys" = "blue", "girls" = "red")) +
  labs(
    title = "Births Per Year (1629–1710)",
    x = "Year",
    y = "Number of Births",
    color = "Gender"
  ) +
  theme_minimal()
After analyzing the data, I conclude that over the 82 years from 1629 to 1710 the number of births of both boys and girls changed dramatically. There was a dip of roughly two decades between about 1640 and 1665. After that, births rose and reached a total of 14,928 in 1710, which shows steady growth in the population, especially from the 1660s onward. To answer my question of whether there is a connection between the births of boys and girls in London: yes, because the gender gap remains fairly stable, with roughly 400 more boys than girls born per year. Potential reasons for the changes in the number of births are social, health, and environmental factors such as war, famine, and disease.
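The figures in the conclusion can be checked with a short sketch. It uses three rows copied from the data above (1629, 1650, and 1710) rather than the full table. The name `sample_births` and the derived columns `total` and `gap` are illustrative assumptions, not part of the original analysis.

```r
# Three rows copied from the data above (illustrative subset, not the full table)
sample_births <- data.frame(
  year  = c(1629, 1650, 1710),
  boys  = c(5218, 2890, 7640),
  girls = c(4683, 2722, 7288)
)

# Total births per year, e.g. 1710: 7640 + 7288 = 14928
sample_births$total <- sample_births$boys + sample_births$girls

# Gender gap per year: how many more boys than girls were born
sample_births$gap <- sample_births$boys - sample_births$girls

sample_births
```

Running the same two calculations on the full `birth_data` would give the total and the gap for every year, which is one way to judge how stable the gap really is.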
Data page: “Male and female births in London”, openintro.org (2025). Data adapted from https://towardsdatascience.com/the-birth-of-data-science-historys-first-hypothesis-test-python-insights-4745dccaf6d/ [online source]