First we get the “present” and “arbuthnot” objects
source("C:/Users/Exped/Desktop/Textbooks/606 Homeworks/Lab material/DATA606-master/inst/labs/Lab0/more/present.r")
source("C:/Users/Exped/Desktop/Textbooks/606 Homeworks/Lab material/DATA606-master/inst/labs/Lab0/more/arbuthnot.r")
df1 = arbuthnot
df2 = present
We use range to find the first and last years covered
range(df2$year)
## [1] 1940 2002
We use dim (short for dimensions) to get the dimensions of our dataframe
dim(df2)
## [1] 63 3
That is 63 rows and 3 columns
We use names to get the names of columns/attributes/variables
names(df2)
## [1] "year" "boys" "girls"
rangeOfYearsdf2 <- range(df2$year)
rangeOfYearsdf1 <- range(df1$year)
# Use the full column names; `$boy` and `$girl` only resolve through partial matching
sumOfBGdf1 <- sum(df1$boys) + sum(df1$girls)
sumOfBGdf2 <- sum(df2$boys) + sum(df2$girls)
We can see that the present df covers 19 fewer years (1940 to 2002, a span of 62 years) than the arbuthnot df (1629 to 1710, a span of 81 years)
However, the sample size (total births) for the present df, 2.3180942 × 10^8, is much larger than for the arbuthnot df, 938223
The present df is larger in scale by 2.308712 × 10^8 births
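The arithmetic behind these comparisons can be reproduced with a short sketch. The values below are copied from the results reported above rather than recomputed from the data, so the block runs without the lab's data files; the variable names (`presentYears`, `arbuthnotTotal`, and so on) are illustrative only.

```r
# Year ranges as reported above (hard-coded here as an assumption).
presentYears   <- c(1940, 2002)
arbuthnotYears <- c(1629, 1710)

# diff() on a length-2 range gives the span in years.
presentSpan   <- diff(presentYears)
arbuthnotSpan <- diff(arbuthnotYears)
spanGap       <- arbuthnotSpan - presentSpan

# Total births: arbuthnot's is exact, present's is the rounded figure shown above.
arbuthnotTotal <- 938223
presentTotal   <- 2.3180942e8
totalGap       <- presentTotal - arbuthnotTotal

print(c(presentSpan, arbuthnotSpan, spanGap))  # 62 81 19
print(format(totalGap, scientific = TRUE, digits = 7))
```

With the data loaded, the same quantities would come from `diff(range(df2$year))` and the `sumOfBG` variables instead of the hard-coded numbers.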
plot(df2$year,df2$boys/df2$girls, type = 's', main = 'Boys to girls ratio in present df', ylab='B/G Ratio', xlab = 'Years', xlim = c(1940, 2050))
lineOfBestFit = lm(df2$boys/df2$girls ~ df2$year)
abline(lineOfBestFit,col='#641399')
We see a steady decline in the boys-to-girls birth ratio. However, Arbuthnot’s observation still holds: the ratio stays above 1, so more boys than girls are born.
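Both claims in this paragraph can be checked numerically rather than by eye. The sketch below uses a small made-up data frame (`toy` is hypothetical, not the real present data) to show the pattern: the sign of the fitted slope indicates the direction of the trend, and `all(ratio > 1)` tests whether boys outnumber girls in every year.

```r
# Hypothetical data for illustration only; these are not real birth counts.
toy <- data.frame(
  year  = 2000:2004,
  boys  = c(1060, 1058, 1055, 1053, 1050),
  girls = c(1000, 1000, 1000, 1000, 1000)
)

toy$ratio <- toy$boys / toy$girls

# Fit ratio against year; passing data = keeps the coefficient names clean.
fit   <- lm(ratio ~ year, data = toy)
slope <- coef(fit)[["year"]]

declining  <- slope < 0            # TRUE when the ratio trends downward
boysExceed <- all(toy$ratio > 1)   # TRUE when boys outnumber girls every year

print(c(declining = declining, boysExceed = boysExceed))
```

Applied to the lab data, the same check would use `df2` in place of `toy`.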
answer = df2$year[df2$girls+df2$boys == max(df2$girls+df2$boys)]
The year with most births is 1961
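An alternative way to find this year is `which.max`, which returns the index of the first maximum and therefore always yields a single year, even if two years tie. A minimal sketch on a made-up data frame (`toy` and `busiestYear` are hypothetical names, and the counts are not real data):

```r
# Hypothetical counts for illustration only.
toy <- data.frame(
  year  = c(1990, 1991, 1992),
  boys  = c(10, 30, 20),
  girls = c(12, 28, 19)
)

# Total births per year, then the year at the position of the largest total.
total       <- toy$boys + toy$girls
busiestYear <- toy$year[which.max(total)]

print(busiestYear)  # 1991
```

The logical-index form used above returns every year that ties for the maximum, so the two approaches differ only when there are ties.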