Data Frames and Subsetting

Question 1 (With Your TA)

(1 point) Read the births.csv file into an object called births. Print out the first 6 rows and verify that they match the lab manual.

# Type your CODE in here
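# mosaic provides the formula-based mean(), var(), and tally() used below and also attaches lattice for histogram()
library(mosaic)
births <- read.csv("births.csv")
head(births)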
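For reference, read.csv() turns a comma-separated file into a data frame, and head() shows its first 6 rows by default. Below is a minimal sketch that passes a small CSV through read.csv()'s text argument. The columns and values are hypothetical and are not taken from births.csv.

```r
# Hypothetical stand-in for births.csv: 8 rows, 2 made-up columns
csv_text <- "Mage,Marital
24,Married
31,Married
19,Unmarried
28,Married
35,Married
22,Unmarried
30,Married
26,Unmarried"

# read.csv() accepts literal CSV text through its `text` argument
toy <- read.csv(text = csv_text)

# head() returns the first 6 rows unless n is given
first_rows <- head(toy)
first_rows
nrow(first_rows)  # 6
```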

Question 2 (On Your Own)

(1 point) Print a subset of the births data that contains the eighth and eleventh rows, and the Visits and Gained variables. Only use numeric vectors to make this subset.

# Type your CODE in here
# Rows 8 and 11; columns 10 (Visits) and 16 (Gained)
births[c(8, 11), c(10, 16)]
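Numeric column indices like 10 and 16 depend on where Visits and Gained sit in the data frame. One way to look up such positions is to call match() on names(). The sketch below uses a small hypothetical data frame; the same call with names(births) would report the real positions.

```r
# Hypothetical data frame; the real births columns are not reproduced here
toy <- data.frame(
  id     = 1:12,
  Visits = c(10, 12, 8, 14, 11, 9, 13, 7, 12, 10, 15, 6),
  Gained = c(30, 25, 40, 35, 20, 28, 33, 45, 22, 31, 27, 38)
)

# match() returns the position of each name within names(toy)
cols <- match(c("Visits", "Gained"), names(toy))
cols  # 2 3

# Those positions can then be used as a purely numeric column index
toy[c(8, 11), cols]
```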

Question 3 (On Your Own)

(1 point) Print a subset of the births data that contains the second through seventh rows, and the Racemom and Racedad variables. Only use numeric vectors created with the colon operator to make this subset.

# Type your CODE in here
# Rows 2 through 7; columns 12 (Racemom) and 13 (Racedad)
births_sub <- births[2:7, 12:13]
births_sub
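The colon operator builds an integer sequence with step 1, so 2:7 is the same as c(2, 3, 4, 5, 6, 7). Because that sequence selects exactly 6 rows, head() on the subset prints every row. The following short check uses a hypothetical data frame.

```r
# The colon operator builds integer sequences with step 1
rows <- 2:7
rows                                        # 2 3 4 5 6 7
identical(rows, c(2L, 3L, 4L, 5L, 6L, 7L))  # TRUE
length(rows)                                # 6

# Hypothetical 10 x 4 data frame standing in for births
toy <- data.frame(a = 1:10, b = 11:20, c = 21:30, d = 31:40)
sub <- toy[2:7, 2:3]
dim(sub)                                    # 6 2
```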

Question 4 (On Your Own)

(1 point) Print a subset of the births data that contains the second, third, and sixth rows, and the Premie and weight variables. Only use a character vector to specify the columns to subset.

# Type your CODE in here
births[c(2, 3, 6), c("Premie", "weight")]
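Selecting columns with a character vector of names gives the same subset as selecting them by their numeric positions. Column names are also case-sensitive. The sketch below uses a small hypothetical data frame with made-up values.

```r
# Hypothetical data frame with made-up values
toy <- data.frame(
  Premie = c("No", "No", "Yes", "No", "No", "Yes"),
  weight = c(120, 115, 80, 130, 118, 90),
  Gender = c("Male", "Female", "Female", "Male", "Female", "Male")
)

# Same rows, columns chosen by name and by position
by_name   <- toy[c(2, 3, 6), c("Premie", "weight")]
by_number <- toy[c(2, 3, 6), c(1, 2)]
identical(by_name, by_number)  # TRUE

# Names are case-sensitive: "Weight" does not match "weight"
"Weight" %in% names(toy)       # FALSE
```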

Model Notation and Descriptive Statistics

Question 5 (On Your Own)

(1 point) Use model notation to print the mean of the father’s age conditional on marital status.

# Type your CODE in here
mean(~ Fage | Marital, data = births)
##   Married Unmarried 
##  31.57728  28.40992
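The formula mean(~ Fage | Marital, data = births) returns one mean per level of the grouping variable. This sketch assumes the formula interface comes from the mosaic package used in this lab. It cross-checks the idea in base R with tapply() and aggregate() on a small hypothetical data frame.

```r
# Hypothetical data frame with made-up ages
toy <- data.frame(
  Fage    = c(30, 34, 25, 29, 40, 27),
  Marital = c("Married", "Married", "Unmarried", "Unmarried", "Married", "Unmarried")
)

# One mean per level of Marital
means <- tapply(toy$Fage, toy$Marital, mean)
means

# The same result from base R's formula interface
agg <- aggregate(Fage ~ Marital, data = toy, FUN = mean)
agg
```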

Question 6 (On Your Own)

(1 point) Use model notation to print the mean of the baby’s weight conditional on their gender.

# Type your CODE in here
mean(~ weight | Gender, data = births)
##   Female     Male 
## 113.7503 118.1986

Question 7 (On Your Own)

(1 point) Use model notation to print the variance of the baby’s weight conditional on their premature status.

# Type your CODE in here
var(~ weight | Premie, data = births)
##       No      Yes 
## 255.3130 652.4227
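var() computes the sample variance separately within each group, dividing the sum of squared deviations by \(n - 1\). The base R sketch below uses hypothetical weights. It compares tapply(..., var) with that formula written out by hand.

```r
# Hypothetical weights, three per group
toy <- data.frame(
  weight = c(120, 110, 130, 70, 100, 130),
  Premie = c("No", "No", "No", "Yes", "Yes", "Yes")
)

# Base R: one sample variance per group
v <- tapply(toy$weight, toy$Premie, var)
v

# The same quantity by hand: squared deviations divided by (n - 1)
by_hand <- function(x) sum((x - mean(x))^2) / (length(x) - 1)
h <- sapply(split(toy$weight, toy$Premie), by_hand)
h
```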

Model Notation and Data Visualization

Question 8 (On Your Own)

(1 point) Use model notation to print a histogram of the mothers’ ages (unconditional).

# Type your CODE in here
histogram(~ Mage, data = births)

Question 9 (On Your Own)

(1 point) Use model notation to print a histogram of the mothers’ ages conditional on marital status.

# Type your CODE in here
histogram(~ Mage | Marital, data = births)
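A conditional histogram draws one panel per group, and each panel is built from the bin counts of that group's values. The base R sketch below computes those counts for hypothetical ages with hist(plot = FALSE). It uses the same breaks for both groups so that the panels are comparable.

```r
# Hypothetical ages and marital status
toy <- data.frame(
  Mage    = c(18, 22, 25, 27, 31, 33, 24, 29, 35, 38),
  Marital = c("Unmarried", "Unmarried", "Married", "Married", "Married",
              "Married", "Unmarried", "Married", "Married", "Married")
)

# Shared bin edges: [15,20], (20,25], (25,30], (30,35], (35,40]
brks <- seq(15, 40, by = 5)

# One vector of bin counts per group, computed without drawing anything
counts <- lapply(split(toy$Mage, toy$Marital),
                 function(x) hist(x, breaks = brks, plot = FALSE)$counts)
counts
```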

Simulating Binomial Random Variables

Question 10 (On Your Own)

(1 point) Set the seed to 95472, simulate \(M = 100\) coin flips, and save the result. Calculate and print the proportion.

# Type your CODE in here
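# Parameters
M <- 100     # number of coin flips
n <- 1       # one flip per trial
pi <- 0.5    # probability of heads (this masks R's built-in constant pi)
# Random Sampling
set.seed(95472)
X <- rbinom(M, size = n, prob = pi)
mean(X)      # proportion of heads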
## [1] 0.53
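Each draw from rbinom() with size = 1 is one coin flip, coded 1 for heads and 0 for tails. Therefore mean() of the result is the number of heads divided by M. The quick check below uses an arbitrary seed. Naming the probability prob rather than pi also avoids masking R's built-in constant pi.

```r
set.seed(1)                              # arbitrary seed for this illustration
M <- 100
prob <- 0.5
X <- rbinom(M, size = 1, prob = prob)    # 1 = heads, 0 = tails

# mean() of a 0/1 vector is the proportion of 1s
c(mean = mean(X), count_ratio = sum(X) / M)
```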

Question 11 (On Your Own)

(1 point) Set the seed to 83954, simulate \(M = 1000\) coin flips, and save the result. Calculate and print the proportion.

# Type your CODE in here
# Parameters
M <- 1000
n <- 1
pi <- 0.5
# Random Sampling
set.seed(83954)
X <- rbinom(M, size = n, prob = pi)
mean(X)
## [1] 0.525

Question 12 (On Your Own)

(1 point) Set the seed to 64732, simulate \(M = 10000\) coin flips, and save the result. Calculate and print the proportion.

# Type your CODE in here
# Parameters
M <- 10000
n <- 1
pi <- 0.5
# Random Sampling
set.seed(64732)
X <- rbinom(M, size = n, prob = pi)
mean(X)
## [1] 0.5047

Question 13 (On Your Own)

(1 point) Consider the proportion of heads that you printed from Questions 10, 11, and 12. What number is the proportion approaching as \(M\) increases? Why?

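As the number of flips M increases, the proportion of heads gets closer to 0.5 (0.53 for M = 100, 0.525 for M = 1000, and 0.5047 for M = 10000). This happens because each simulated flip has probability \(\pi = 0.5\) of landing heads. By the law of large numbers, the sample proportion converges to this true probability as the number of trials grows, because the random fluctuations of individual flips average out.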
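You can watch this convergence directly by tracking the running proportion of heads after each flip. The sketch below uses an arbitrary seed, so its results will differ from those in Questions 10 to 12. The standard error \(\sqrt{\pi(1-\pi)/M}\) shows that the typical distance from 0.5 shrinks like \(1/\sqrt{M}\).

```r
set.seed(2024)   # arbitrary seed for this illustration
M <- 10000
X <- rbinom(M, size = 1, prob = 0.5)

# Running proportion of heads after 1, 2, ..., M flips
running_p <- cumsum(X) / seq_len(M)

# Proportion and distance from 0.5 at a few checkpoints
checkpoints <- c(10, 100, 1000, 10000)
data.frame(M = checkpoints,
           prop = running_p[checkpoints],
           distance = abs(running_p[checkpoints] - 0.5))

# Standard error sqrt(0.5 * 0.5 / M): 0.158, 0.05, 0.0158, 0.005
sqrt(0.5 * 0.5 / checkpoints)
```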

Building Probability Distributions

Question 14 (With Your TA)

(1 point) State the null and alternative hypotheses. Write them in \(\LaTeX\) code (your TA will teach you).

\[H_{0}: \pi=0.5\] \[H_{1}: \pi>0.5\]

Question 15 (On Your Own)

(1 point) Like the example shown in the lab manual above, simulate \(M = 1000\) samples from a binomial distribution, with an appropriate \(n\) and \(\pi\) for this hypothesis test. Use 485 as the seed. Store these results in a vector called X. Print the frequencies of X with the tally() function.

# Type your CODE in here
# Parameters
M <- 1000
n <- 1
pi <- 0.5
# Random Sampling
set.seed(485)
X <- rbinom(M, size = n, prob = pi)
tally(X)
## X
##   0   1 
## 514 486
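Under H0, each simulated sample is the number of heads in n flips with probability 0.5, so every value lies between 0 and n. The sketch below uses a hypothetical n = 10, which is not a value from this section. It counts frequencies with base R's table(), which should give the same counts that mosaic's tally() prints.

```r
set.seed(485)                               # seed from Question 15
M <- 1000                                   # number of simulated samples
n <- 10                                     # HYPOTHETICAL number of flips per sample
prob_h0 <- 0.5                              # probability of heads under H0
X <- rbinom(M, size = n, prob = prob_h0)    # number of heads in each sample

# Frequency of each possible sum
freq <- table(X)
freq
```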

Question 16 (On Your Own)

(1 point) Like the example shown in the lab manual, use the tally() and data.frame() functions to start a data frame of sums and frequencies. Call this data frame prob_dist. Print this data frame.

# Type your CODE in here
# Columns: X (each observed sum) and Freq (how often it occurred)
prob_dist <- data.frame(tally(X))
prob_dist
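Wrapping a one-way frequency table in data.frame() gives one row per observed value. The values go in a column named after the variable (here X), and the counts go in Freq. The value column is a factor, so converting it back to numbers needs as.character() first. In the sketch below, base R's table() stands in for tally().

```r
set.seed(485)
X <- rbinom(1000, size = 1, prob = 0.5)

# Converting a one-way table gives one row per observed value
prob_dist <- data.frame(table(X))
names(prob_dist)   # "X" "Freq"
str(prob_dist)

# The X column is a factor; go through character to get the numeric sums
as.numeric(as.character(prob_dist$X))
```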

Question 17 (On Your Own)

(1 point) Create a character vector that enumerates all possible proportions that could be obtained from this experiment (as fractions) and store them as a column named p in the prob_dist data frame. Print the prob_dist data frame.

# Type your CODE in here
# Every possible number of heads (0 through n) written as a fraction of n
prob_dist$p <- paste0(0:n, "/", n)
prob_dist
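For n flips, the possible numbers of heads are 0, 1, ..., n, so paste0(0:n, "/", n) lists every proportion as a fraction. There is one caveat for larger n. table() omits sums that never occurred in the simulation, which would make the two lengths disagree. Fixing the factor levels keeps a zero-count row for each possible sum. The sketch below uses a hypothetical n = 4 and made-up sums.

```r
n <- 4  # HYPOTHETICAL number of flips

# Every possible number of heads, written as a fraction of n
p <- paste0(0:n, "/", n)
p          # "0/4" "1/4" "2/4" "3/4" "4/4"

# The matching numeric proportions
(0:n) / n  # 0.00 0.25 0.50 0.75 1.00

# HYPOTHETICAL simulated sums in which 0 and 1 never occur;
# fixing the levels keeps a count (possibly 0) for every possible sum
sums <- c(2, 3, 3, 4, 2)
tab <- table(factor(sums, levels = 0:n))
tab
```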

Question 18 (On Your Own)

(1 point) Convert the frequencies of the sums into proportions and store them as a column named prob_p in the prob_dist data frame. Print the prob_dist data frame.

# Type your CODE in here
prob_dist$prob_p <- prob_dist$Freq / M
prob_dist
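Dividing each frequency by M gives simulated probabilities that sum to 1, and base R's prop.table() performs the same division. The sketch below uses the frequencies 514 and 486 from the tally() output in Question 15.

```r
M <- 1000

# Frequencies from the tally() output in Question 15
prob_dist <- data.frame(X = factor(0:1), Freq = c(514L, 486L))

# Relative frequencies
prob_dist$prob_p <- prob_dist$Freq / M
prob_dist
sum(prob_dist$prob_p)      # 1

# prop.table() divides by the total count
prop.table(prob_dist$Freq)
```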

Question 19 (On Your Own)

(1 point) Calculate and print the \(p\)-value.

# Type your CODE in here
pvalue <- sum(prob_dist[1:1, "prob_p"])
pvalue
## [1] 0.514
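For \(H_{1}: \pi > 0.5\), the p-value is the share of simulated samples at least as extreme as the observed result in the direction of \(H_{1}\), that is, the upper tail. The sketch below assumes a hypothetical experiment with n = 10 flips and 8 observed heads; neither number comes from this section. It compares the simulated tail with the exact binomial tail from pbinom().

```r
set.seed(485)
M <- 1000
n <- 10          # HYPOTHETICAL number of flips
observed <- 8    # HYPOTHETICAL observed number of heads

# Null distribution: heads in n flips of a fair coin
X <- rbinom(M, size = n, prob = 0.5)

# Simulated p-value: share of null samples with at least `observed` heads
p_sim <- mean(X >= observed)

# Exact upper tail P(X >= observed) = P(X > observed - 1) under H0
p_exact <- pbinom(observed - 1, size = n, prob = 0.5, lower.tail = FALSE)

c(simulated = p_sim, exact = p_exact)
```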

Question 20 (On Your Own)

(1 point) What is the decision of this hypothesis test?

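We fail to reject H0, because the p-value (0.514) is larger than any conventional significance level (for example, \(\alpha = 0.05\)).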
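The decision compares the p-value with a significance level \(\alpha\). This section does not state \(\alpha\), so the sketch assumes the common default of 0.05. Any conventional level gives the same decision for a p-value of 0.514.

```r
# Decision rule for a significance level alpha (0.05 assumed as the default)
decide <- function(p_value, alpha = 0.05) {
  if (p_value <= alpha) "reject H0" else "fail to reject H0"
}

decide(0.514)  # "fail to reject H0"
decide(0.01)   # "reject H0"
```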

Question 21 (On Your Own)

(1 point) What is the conclusion of this hypothesis test?

There is not enough evidence to conclude that the coin is biased toward heads; the data are consistent with a fair coin (\(\pi = 0.5\)).