Initialization

## 
## Welcome to CUNY DATA606 Statistics and Probability for Data Analytics 
## This package is designed to support this course. The text book used 
## is OpenIntro Statistics, 3rd Edition. You can read this by typing 
## vignette('os3') or visit www.OpenIntro.org. 
##  
## The getLabs() function will return a list of the labs available. 
##  
## The demo(package='DATA606') will list the demos that are available.
## 
## Attaching package: 'DATA606'
## The following object is masked from 'package:utils':
## 
##     demo

Exercise 1

arbuthnot$girls
##  [1] 4683 4457 4102 4590 4839 4820 4928 4605 4457 4952 4784 5332 5200 4910
## [15] 4617 3997 3919 3395 3536 3181 2746 2722 2840 2908 2959 3179 3349 3382
## [29] 3289 3013 2781 3247 4107 4803 4881 5681 4858 4319 5322 5560 5829 5719
## [43] 6061 6120 5822 5738 5717 5847 6203 6033 6041 6299 6533 6744 7158 7127
## [57] 7246 7119 7214 7101 7167 7302 7392 7316 7483 6647 6713 7229 7767 7626
## [71] 7452 7061 7514 7656 7683 5738 7779 7417 7687 7623 7380 7288

Exercise 2

The number of girls baptized appears to increase steadily, except from about 1650 to 1660, where it dips significantly. This may correspond to a historical event that lowered the population or decreased interest in religion. It could also be something like a war.
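The dip can be located numerically as well as by eye. The sketch below uses only the values at positions 15 to 33 of the vector printed in Exercise 1. It assumes the first record is for 1629, which is not shown in this output, and the 3,500 cutoff and the variable names are chosen purely for illustration.

```r
# Girl baptism counts at positions 15 to 33 of the vector printed in Exercise 1.
girls_excerpt <- c(4617, 3997, 3919, 3395, 3536, 3181, 2746, 2722, 2840, 2908,
                   2959, 3179, 3349, 3382, 3289, 3013, 2781, 3247, 4107)

# Assumption: the first record is 1629, so position i corresponds to year 1628 + i.
years_excerpt <- 1628 + 15:33

# Year with the fewest girl baptisms within the excerpt.
lowest_year <- years_excerpt[which.min(girls_excerpt)]

# Years in the excerpt where the count falls below 3,500 (an arbitrary cutoff).
low_years <- years_excerpt[girls_excerpt < 3500]

lowest_year
range(low_years)
```

Under that year assumption, the minimum in the excerpt falls in 1650, which is consistent with the dip described above.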

Exercise 3

arbuthnot$ratios <- arbuthnot$boys / (arbuthnot$boys + arbuthnot$girls)
ggplot(arbuthnot) + geom_line(mapping = aes(x = year, y = ratios), color = "red")

The plot appears to be noise around a mean of about 0.52. It seems that a higher proportion of boys than girls was baptized.

On Your Own

Question 1

present$year
##  [1] 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953
## [15] 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967
## [29] 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981
## [43] 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995
## [57] 1996 1997 1998 1999 2000 2001 2002
colnames(present)
## [1] "year"  "boys"  "girls"
dimensions <- c(NROW(present), NCOL(present))
dimensions 
## [1] 63  3

Question 2

present$counts <- present$boys + present$girls
present$counts
##  [1] 2360399 2513427 2808996 2936860 2794800 2735456 3288672 3699940
##  [9] 3535068 3559529 3554149 3750850 3846986 3902120 4017362 4047295
## [17] 4163090 4254784 4203812 4244796 4257850 4268326 4167362 4098020
## [25] 4027490 3760358 3606274 3520959 3501564 3600206 3731386 3555970
## [33] 3258411 3136965 3159958 3144198 3167788 3326632 3333279 3494398
## [41] 3612258 3629238 3680537 3638933 3669141 3760561 3756547 3809394
## [49] 3909510 4040958 4158212 4110907 4065014 4000240 3952767 3899589
## [57] 3891494 3880894 3941553 3959417 4058814 4025933 4021726

These counts are a couple of orders of magnitude greater than Arbuthnot’s.
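As a rough check on the size gap, the sketch below compares the smallest and largest yearly totals printed above with the smallest and largest girl counts printed in Exercise 1. Arbuthnot's yearly totals are not shown in this output, so the code assumes that total baptisms are roughly twice the girls' count; that doubling and the variable names are illustrative only.

```r
# Smallest and largest yearly totals in `present`, copied from the output above.
present_range <- c(2360399, 4268326)

# Smallest and largest yearly girl baptisms in `arbuthnot`, copied from Exercise 1.
arbuthnot_girls_range <- c(2722, 7779)

# Assumption: total baptisms are roughly twice the girls' count.
arbuthnot_total_range <- 2 * arbuthnot_girls_range

# Pair the smallest present total with the largest Arbuthnot total (and vice
# versa) to bound the gap, then express it in powers of ten.
orders <- log10(present_range / rev(arbuthnot_total_range))
round(orders, 1)
```

Under that assumption, both bounds fall between 2 and 3 orders of magnitude.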

Question 3

present$ratios <- present$boys/present$girls
present$ratios
##  [1] 1.054817 1.053969 1.058429 1.056767 1.055757 1.055391 1.058698
##  [8] 1.055449 1.053820 1.053760 1.053716 1.052078 1.050934 1.053399
## [15] 1.051460 1.050742 1.051286 1.050672 1.049374 1.049480 1.048873
## [22] 1.050057 1.047948 1.052717 1.047188 1.051137 1.048540 1.049964
## [29] 1.053417 1.052997 1.054719 1.051845 1.051271 1.052129 1.054797
## [36] 1.053605 1.052538 1.052569 1.052657 1.051749 1.052837 1.051615
## [43] 1.050597 1.051976 1.050199 1.052061 1.050876 1.050000 1.049991
## [50] 1.049720 1.049676 1.045849 1.050017 1.049955 1.047877 1.048928
## [57] 1.047062 1.047643 1.047190 1.048791 1.047998 1.045686 1.047986
ggplot(present) + geom_line(mapping = aes(x=year, y = ratios), color = "blue")

It appears that more boys than girls are born in the US, as the ratio of boys to girls never dips below about 1.045 to 1. If the data are unbiased, I think we can draw this conclusion.
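Note that Exercise 3 plots boys as a share of all baptisms, while this question uses boys divided by girls. To compare the two on the same scale, a ratio r of boys to girls can be converted to a proportion r / (1 + r). The sketch below applies that conversion to the lowest and highest ratios printed above; the variable names are illustrative.

```r
# Lowest and highest boy-to-girl ratios, copied from the printed output above.
ratio_range <- c(min = 1.045686, max = 1.058698)

# A ratio r of boys to girls corresponds to a proportion r / (1 + r) of boys.
proportion_boys <- ratio_range / (1 + ratio_range)
round(proportion_boys, 3)
```

Both values come out slightly above 0.51, so boys make up just over half of births in every year of the data.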

Question 4

## [1] "Highest Count: 4268326"
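The code that produced this line is not shown in the output. One possible way to find the peak and the year it occurred is sketched below, using a five-year excerpt of the counts printed in Question 2 (positions 20 to 24, which correspond to 1959 to 1963 in the printed year vector). The `excerpt` data frame and `peak` variable are hypothetical names; on the full data the same `which.max()` call would be applied to `present$counts`.

```r
# Five-year excerpt around the peak, copied from the printed years and counts.
excerpt <- data.frame(
  year   = 1959:1963,
  counts = c(4244796, 4257850, 4268326, 4167362, 4098020)
)

# which.max() returns the row index of the largest count.
peak <- excerpt[which.max(excerpt$counts), ]

paste("Highest Count:", peak$counts, "in", peak$year)
```

In this excerpt the highest count, 4268326, belongs to 1961.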