Example 1. What command would you use to extract just the counts of girls baptized?
source("http://www.openintro.org/stat/data/arbuthnot.R")
# arbuthnot$girls returns the yearly counts as a vector; sum() adds them into a single total
sum(arbuthnot$girls)
## [1] 453841
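The `$` operator on its own returns the column of yearly counts, while `sum()` collapses that column into one number. A minimal sketch of the difference, using only the first ten rows of `arbuthnot` as printed by `str(arbuthnot)` in this report (the name `arbuthnot_head` is made up for this illustration):

```r
# First ten rows of arbuthnot, copied from the str() output in this report
arbuthnot_head <- data.frame(
  year  = 1629:1638,
  girls = c(4683, 4457, 4102, 4590, 4839, 4820, 4928, 4605, 4457, 4952)
)

# Extract the counts themselves: one value per year
girls_counts <- arbuthnot_head$girls
girls_counts

# Collapse the counts into a single total for these ten years
girls_total <- sum(girls_counts)
girls_total
```

On the full data frame the same two steps are `arbuthnot$girls` and `sum(arbuthnot$girls)`.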
Example 2. Is there an apparent trend in the number of girls baptized over the years? How would you describe it?
library(ggplot2)
trend <- ggplot(arbuthnot, aes(x = year))
trend <- trend + geom_line(aes(y = girls, colour = "Girls"))
trend <- trend + geom_line(aes(y = boys, colour = "Boys"))
trend <- trend + scale_colour_manual(values = c("blue", "red"))
trend <- trend + labs(y = "Count",x = "Year",colour = "Legend")
trend <- trend + theme(legend.position = c(0.1,0.8))
trend <- trend + ggtitle("Baptism: Boys vs Girls Trend") + theme(plot.title = element_text(hjust = 0.5))
trend
The plot shows that, in the Arbuthnot data, the number of boys baptized is consistently higher than the number of girls in every year.
Example 3. Make a plot of the proportion of boys over time. What do you see?
Prop_Boys = arbuthnot$boys / (arbuthnot$boys + arbuthnot$girls)
summary(Prop_Boys)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.5027 0.5118 0.5157 0.5170 0.5210 0.5362
ggplot(data = arbuthnot, aes(x = year, y = Prop_Boys)) +
  geom_line(color = "blue", size = 1) +
  ggtitle("Proportion of Boys") +
  theme(plot.title = element_text(hjust = 0.5)) +
  labs(y = "Proportion = Boys / (Boys + Girls)", x = "Year")
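The proportion can also be stored as a column of the data frame, so that it travels with the data instead of living in a separate global vector, and `all()` gives a direct check of whether boys outnumber girls in every year. A small sketch on the first ten rows of `arbuthnot` taken from the `str()` output in this report (`arbuthnot_head` and `prop_boys` are illustrative names, not part of the original analysis):

```r
# First ten rows of arbuthnot, copied from the str() output in this report
arbuthnot_head <- data.frame(
  year  = 1629:1638,
  boys  = c(5218, 4858, 4422, 4994, 5158, 5035, 5106, 4917, 4703, 5359),
  girls = c(4683, 4457, 4102, 4590, 4839, 4820, 4928, 4605, 4457, 4952)
)

# Proportion of boys among all baptisms, kept as a column
arbuthnot_head$prop_boys <-
  arbuthnot_head$boys / (arbuthnot_head$boys + arbuthnot_head$girls)

# TRUE only if boys are the majority in every one of these years
more_boys_every_year <- all(arbuthnot_head$prop_boys > 0.5)
more_boys_every_year

# Smallest and largest proportion in the subset
range(arbuthnot_head$prop_boys)
```

For the full data, the minimum of 0.5027 reported by `summary(Prop_Boys)` already implies that the proportion is above 0.5 in all 82 years.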
Question 1. What years are included in this data set? What are the dimensions of the data frame and what are the variable or column names?
source("http://www.openintro.org/stat/data/present.R")
present$year
## [1] 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953
## [15] 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967
## [29] 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981
## [43] 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995
## [57] 1996 1997 1998 1999 2000 2001 2002
dim(present)
## [1] 63 3
names(present)
## [1] "year" "boys" "girls"
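Printing the whole `year` column works, but the span of years can be summarised more compactly. A minimal sketch, where `years` is a stand-in vector holding the same values that `present$year` printed above:

```r
# Stand-in for present$year: the 63 values printed in this report
years <- 1940:2002

# First and last year covered
year_range <- range(years)
year_range

# Number of years, which should match the row count from dim()
n_years <- length(years)
n_years

# TRUE if no year is skipped between the first and the last
is_consecutive <- all(diff(years) == 1)
is_consecutive
```

With the real data loaded, `range(present$year)` and `nrow(present)` give the same information.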
Question 2. How do these counts compare to Arbuthnot’s? Are they on a similar scale?
str(arbuthnot)
## 'data.frame': 82 obs. of 3 variables:
## $ year : int 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 ...
## $ boys : int 5218 4858 4422 4994 5158 5035 5106 4917 4703 5359 ...
## $ girls: int 4683 4457 4102 4590 4839 4820 4928 4605 4457 4952 ...
str(present)
## 'data.frame': 63 obs. of 3 variables:
## $ year : num 1940 1941 1942 1943 1944 ...
## $ boys : num 1211684 1289734 1444365 1508959 1435301 ...
## $ girls: num 1148715 1223693 1364631 1427901 1359499 ...
summary(arbuthnot)
## year boys girls
## Min. :1629 Min. :2890 Min. :2722
## 1st Qu.:1649 1st Qu.:4759 1st Qu.:4457
## Median :1670 Median :6073 Median :5718
## Mean :1670 Mean :5907 Mean :5535
## 3rd Qu.:1690 3rd Qu.:7576 3rd Qu.:7150
## Max. :1710 Max. :8426 Max. :7779
summary(present)
## year boys girls
## Min. :1940 Min. :1211684 Min. :1148715
## 1st Qu.:1956 1st Qu.:1799857 1st Qu.:1711405
## Median :1971 Median :1924868 Median :1831679
## Mean :1971 Mean :1885600 Mean :1793915
## 3rd Qu.:1986 3rd Qu.:2058524 3rd Qu.:1965538
## Max. :2002 Max. :2186274 Max. :2082052
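One way to put a number on the difference in scale is to divide the mean yearly counts reported by the two `summary()` calls. A rough sketch with the means typed in by hand from the output in this report (the vector names are illustrative):

```r
# Mean yearly counts, copied from the summary() output in this report
mean_boys  <- c(arbuthnot = 5907, present = 1885600)
mean_girls <- c(arbuthnot = 5535, present = 1793915)

# How many times larger the present means are than the Arbuthnot means
scale_boys  <- mean_boys[["present"]]  / mean_boys[["arbuthnot"]]
scale_girls <- mean_girls[["present"]] / mean_girls[["arbuthnot"]]

round(c(boys = scale_boys, girls = scale_girls))
```

With both data frames loaded, `mean(present$boys) / mean(arbuthnot$boys)` computes the same factor without retyping the numbers.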
The Arbuthnot data covers 82 years, 19 more than the 63 years in the present data. The summaries also show that the two data sets are not on a similar scale: the present data has a mean of about 1.9 million boys per year, against about 5,900 in the Arbuthnot data.
Question 3. Make a plot that displays the boy-to-girl ratio for every year in the data set. What do you see? Does Arbuthnot’s observation about boys being born in greater proportion than girls hold up in the U.S.? Include the plot in your response.
# Note: this is the proportion of boys among all births, boys / (boys + girls), not boys / girls
boy2girl_ratio = present$boys / (present$boys + present$girls)
summary(boy2girl_ratio)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.5112 0.5121 0.5125 0.5125 0.5130 0.5143
ggplot(data = present, aes(x = year, y = boy2girl_ratio)) +
  geom_line(color = "blue", size = 1) +
  ggtitle("Boy-to-Girl Ratio") +
  theme(plot.title = element_text(hjust = 0.5)) +
  labs(y = "Boy-to-Girl Ratio = Boys / (Boys + Girls)", x = "Year")
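In the present data the proportion of boys stays between 0.511 and 0.514, a much narrower range than the 0.503 to 0.536 seen in the Arbuthnot data. Because it is above 0.5 in every year, Arbuthnot’s observation that boys are born in greater proportion than girls also holds for the U.S.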
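The quantity plotted above is boys / (boys + girls), a proportion, whereas a literal boy-to-girl ratio would be boys / girls. The two carry the same information, since ratio = p / (1 - p), and a proportion above 0.5 corresponds to a ratio above 1. A short sketch on the first five rows of `present` taken from the `str()` output in this report (`present_head`, `prop_boys` and `ratio_boys_girls` are illustrative names):

```r
# First five rows of present, copied from the str() output in this report
present_head <- data.frame(
  year  = 1940:1944,
  boys  = c(1211684, 1289734, 1444365, 1508959, 1435301),
  girls = c(1148715, 1223693, 1364631, 1427901, 1359499)
)

# Proportion of boys among all births (what the report plots)
prop_boys <- present_head$boys / (present_head$boys + present_head$girls)

# Literal boy-to-girl ratio
ratio_boys_girls <- present_head$boys / present_head$girls

# The ratio can be recovered from the proportion
same_information <- isTRUE(all.equal(ratio_boys_girls, prop_boys / (1 - prop_boys)))

round(data.frame(year = present_head$year, prop_boys, ratio_boys_girls), 4)
```

Either measure leads to the same conclusion here: boys outnumber girls in each of these years.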
Question 4. In what year did we see the most total number of births in the U.S.?
present$total <- present$boys + present$girls
year <- present[order(-present$total), ]
year[1, ]
## year boys girls total
## 22 1961 2186274 2082052 4268326
The most births in the U.S. were recorded in 1961, with 4,268,326 in total.
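Sorting the whole data frame is one way to find the peak year; `which.max()` is a lighter alternative that returns the row index of the largest total directly. A minimal sketch on the first five rows of `present` from the `str()` output in this report (`present_head` and `peak_row` are illustrative names), so the year it finds is the peak of that subset only:

```r
# First five rows of present, copied from the str() output in this report
present_head <- data.frame(
  year  = 1940:1944,
  boys  = c(1211684, 1289734, 1444365, 1508959, 1435301),
  girls = c(1148715, 1223693, 1364631, 1427901, 1359499)
)

# Total births per year
present_head$total <- present_head$boys + present_head$girls

# Row with the largest total, without sorting the data frame
peak_row <- present_head[which.max(present_head$total), ]
peak_row
```

Applied to the full data frame, `present[which.max(present$total), ]` should return the same 1961 row as the sorted version.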