library(mosaic)
NCbirths <- read.csv("births.csv")
Habit and weight
A quantitative variable that might affect a baby's weight is Gained, because the weight a mother gains during pregnancy tends to be positively associated with the baby's birth weight. A categorical variable that might affect weight is Premie, because premature babies typically weigh less than babies carried to full term.
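One way to check both claims is to look at the correlation between weight and Gained and at the mean weight within each Premie group. The sketch below is a minimal base R illustration: `summarize_weight()` is a hypothetical helper, the toy rows are made up for demonstration and are not values from births.csv, and the "Yes"/"No" coding of Premie is an assumption.

```r
# Hypothetical helper: summarize how weight relates to Gained and Premie.
summarize_weight <- function(df) {
  list(
    # Linear association between birth weight and weight gained
    cor_gained = cor(df$weight, df$Gained, use = "complete.obs"),
    # Mean birth weight within each Premie group
    mean_by_premie = tapply(df$weight, df$Premie, mean, na.rm = TRUE)
  )
}

# Made-up toy rows for illustration only; NOT values from births.csv.
# The "Yes"/"No" coding of Premie is assumed.
toy <- data.frame(
  weight = c(120, 95, 130, 88, 115),
  Gained = c(30, 20, 40, 15, 28),
  Premie = c("No", "Yes", "No", "Yes", "No")
)

out <- summarize_weight(toy)
out
```

Calling `summarize_weight(NCbirths)` would apply the same summary to the real data, provided the column names match.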
result <- subset(NCbirths, subset = Habit == "NonSmoker",
                 select = c("weight", "Habit", "Gained", "Premie"))
Use the head() function to print out the first few rows of your data frame.
head(result)
mean(NCbirths$weight)
#> [1] 116.0591
# The unlabeled first column is the group of rows with a blank Habit value.
mean(weight ~ Habit, data = NCbirths)
#>           NonSmoker    Smoker
#>  118.6667  116.8416  108.4225
sd(NCbirths$weight)
#> [1] 20.40667
sd(weight ~ Habit, data = NCbirths)
#>           NonSmoker    Smoker
#>  21.09660  20.29014  20.03352
Births <- NCbirths[NCbirths$Habit != "", ]
Using Births, create two graphics that help answer this question: Does the birth weight of babies born to smoking mothers differ from that of babies born to non-smoking mothers?
histogram(~ weight | Habit, data = Births, layout = c(1, 2))
freqpolygon(~ weight | Habit, data = Births, layout = c(1, 2))
set.seed(45)
do(4) * rflip(n = 20, prob = 0.4)
set.seed(45)
flip.data45 <- do(1000) * rflip(n = 20, prob = 0.4)
head(flip.data45)
set.seed(225)
sample(1:1998, replace=FALSE, size = 25)
#> [1] 1464 1693 1217 944 1592 1161 1094 556 1362 615 749 1383 1549 1421 1495
#> [16] 1653 1866 1057 130 972 836 325 192 523 875
Use the sample() function, where the probability of heads (represented by 1) is 0.4. Print the proportion of heads obtained.
set.seed(45)
coin11 <- sample(c(1,0), replace = TRUE, size = 20, prob = c(.4, .6))
print(coin11)
#> [1] 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0
mean(coin11)
#> [1] 0.15
H0: pi = 0.75
Ha: pi > 0.75
Use the sample() function. Set the seed to 405. Do not use the do() * rflip() notation.
pi <- 0.75
p_hat <- 8/10
n <- 10
N <- 1000
sim_prop <- numeric(N)
set.seed(405)
for (i in 1:N) {
  flips <- sample(c(1, 0), size = n, replace = TRUE, prob = c(pi, 1 - pi))
  sim_prop[i] <- mean(flips)
}
histogram(sim_prop, breaks = 10)
sum(sim_prop >= p_hat) / N
#> [1] 0.522
(p_hat - mean(sim_prop)) / sd(sim_prop)
#> [1] 0.3839956
With a simulated p-value of 0.522 and a standardized statistic (z) of 0.38, there is little to no evidence against the null hypothesis (pi = 0.75) in favor of the alternative (pi > 0.75).
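As a rough cross-check on the simulated p-value, the same upper-tail probability can be computed exactly from the binomial distribution. The sketch below uses base R's pbinom() and dbinom(). The names `null_prob`, `n_shots`, and `made` are illustrative and not part of the original solution, and the calculation assumes the 10 trials are independent with a constant success probability.

```r
# Exact one-sided p-value for H0: pi = 0.75 vs Ha: pi > 0.75
null_prob <- 0.75  # hypothesized success probability (illustrative name)
n_shots   <- 10    # number of trials
made      <- 8     # observed successes, so p_hat = 8/10

# P(X >= 8) = 1 - P(X <= 7) when X ~ Binomial(10, 0.75)
exact_p <- pbinom(made - 1, size = n_shots, prob = null_prob,
                  lower.tail = FALSE)

# Same quantity, summing the probabilities of 8, 9, and 10 successes
exact_p_sum <- sum(dbinom(made:n_shots, size = n_shots, prob = null_prob))

exact_p
```

The exact value is about 0.526, close to the simulated 0.522, which suggests that 1000 repetitions approximate the null distribution reasonably well here.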
pi <- 0.75
p_hat <- 10/10
n <- 10
N <- 1000
sim_prop <- numeric(N)
set.seed(405)
for (i in 1:N) {
  flips <- sample(c(1, 0), size = n, replace = TRUE, prob = c(pi, 1 - pi))
  sim_prop[i] <- mean(flips)
}
sum(sim_prop >= p_hat) / N
#> [1] 0.055
Based on the simulated p-value of 0.055, there is moderate evidence against the null hypothesis (pi = 0.75). One limitation is the small sample of only 10 shots: with so few trials, the sample proportion varies widely from sample to sample, so even a perfect 10 out of 10 yields only moderate evidence.
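The effect of the small sample size can be made concrete with a short base R sketch. It computes the exact probability of 10 successes in 10 trials under the null, then the standard deviation of the sample proportion under the null for several sample sizes. The larger sizes (40 and 160) and the names `null_prob` and `sd_prop()` are hypothetical choices for illustration, and the trials are assumed to be independent.

```r
null_prob <- 0.75  # hypothesized success probability (illustrative name)

# Exact p-value for 10 out of 10: only the outcome X = 10 is this extreme
exact_p <- null_prob^10

# Standard deviation of the sample proportion under H0 for sample size n
# (hypothetical helper)
sd_prop <- function(n) sqrt(null_prob * (1 - null_prob) / n)

# Compare the original n = 10 with hypothetical larger samples
sizes  <- c(10, 40, 160)
spread <- sd_prop(sizes)

exact_p
data.frame(n = sizes, sd_of_p_hat = spread)
```

The exact p-value is about 0.056, in line with the simulated 0.055. The standard deviation of the sample proportion is about 0.137 at n = 10 and halves each time the sample size is quadrupled, which is why a larger sample would give a more decisive test.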