Self-grading Instructions

1. Review

Ensure that for each question, you have included:

  • Correct code: Verify the accuracy of the code syntax, labels and logic.

  • R outputs: Check if the outputs generated by R are correctly interpreted and relevant to the question.

  • Written answer: Make sure answers are in complete sentences. Reference the data dictionary/documentation for precision and thoroughness.

2. Revisions

If revisions are necessary, specify the question numbers requiring adjustments. Update the code and/or the written responses in your original submission to correct any inaccuracies or to provide additional clarity.

3. Reflection

Summarize the key concepts and skills you have practiced this week. Reflect on how these exercises have contributed to your understanding of the material and your ability to apply statistical analysis using R. Consider the challenges you faced, how you overcame them, what you learned from the process, and how you will get more support and clarity if challenges were not resolved.


Self-grading

  • Questions that need corrections:

  • My Reflections for this week:

A Reminder

  • Keep your # comments short inside a code chunk.

  • Write paragraphs above or below the code chunks.

Answer Key

Answer the following questions using the appropriate dataset and codebook. For each question, provide (1) your code, (2) the R output AND (3) the answer in complete sentences.

Set up and load the libraries

library(ggpubr)  # for ggboxplot
library(dplyr) # summarize and pipes %>%
options(scipen=999) # remove scientific notation 

Q1

Import the dataset named “birthweight_smoking.csv”, name your dataset.

smoking_data <- read.csv("birthweight_smoking.csv")
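
The Q2 rubric asks for a graph with no NA values or errors, so it can help to inspect the data right after importing it. The sketch below is one possible check, not part of the original key. It uses a small made-up data frame (`toy_data`, a hypothetical stand-in for `smoking_data`) so that it runs without the CSV file.

```r
# Toy stand-in for smoking_data; the values are made up for illustration.
toy_data <- data.frame(
  birthweight = c(3400, 2900, NA, 3650, 3100),
  alcohol     = c(0, 1, 0, 0, 1)
)

# Show column names and types.
str(toy_data)

# Count missing values in each column.
na_counts <- colSums(is.na(toy_data))
na_counts

# Keep only rows that are complete for the two variables being plotted.
keep <- complete.cases(toy_data[, c("birthweight", "alcohol")])
toy_complete <- toy_data[keep, ]
nrow(toy_complete)
```

The summaries in this key call mean() without na.rm = TRUE and still return numbers, which suggests birthweight has no missing values in the lab dataset. Running the same check on smoking_data would confirm that.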

Q2

Use ggboxplot() to create a boxplot for another categorical (i.e., including dummy) variable and birthweight. Find the mean, median and IQR using the summarise() function from the dplyr library. Discuss any differences in mean, median and IQR based on the graph and outputs. [2 points]

  • Correct code using ggboxplot() and display a correct graph (No NA and error) [1 point]
  • If the ggpubr and dplyr libraries are not loaded, you will not be able to use these functions.
  • Report the mean, median and IQR and the differences for the two groups [1 point]
  • 0.5 points deducted for missing part of the description or incorrect description.
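
As a reminder of what the IQR column reports, the interquartile range is the distance between the 75th and 25th percentiles. The sketch below illustrates this with made-up birthweights (the vector `bw` is hypothetical, not taken from the lab dataset) and compares a manual calculation with R's IQR().

```r
# Hypothetical birthweights in grams (made-up values).
bw <- c(2500, 2900, 3100, 3300, 3400, 3600, 3900, 4200)

# First and third quartiles, using quantile()'s default method.
q <- quantile(bw, probs = c(0.25, 0.75))

# IQR computed by hand as Q3 - Q1.
iqr_manual <- unname(q[2] - q[1])

# IQR computed by the built-in function (same default quantile method).
iqr_builtin <- IQR(bw)

all.equal(iqr_manual, iqr_builtin)
```

A wider IQR means the middle half of the birthweights is more spread out, which is what the height of the box in a boxplot shows.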

Discuss: Choose ONE of the following variables for discussion (except for smoker, which was demonstrated in the lab).

  • Besides smoker, the dummy variables are alcohol, unmarried, tripre0, tripre1, tripre2 and tripre3. Each is coded with two values, 1 and 0.
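
If it is unclear which columns are dummy variables, one way to check is to test whether every value in a column is 0 or 1. This is a minimal sketch under stated assumptions: `toy_data` and its `visits` column are hypothetical and only illustrate the idea. The codebook remains the authoritative source.

```r
# Toy data frame; all values and the visits column are made up.
toy_data <- data.frame(
  birthweight = c(3400, 2900, 3650, 3100),
  smoker      = c(0, 1, 0, 1),
  visits      = c(12, 5, 10, 0)
)

# TRUE for columns whose values are all 0 or 1.
is_dummy <- sapply(toy_data, function(x) all(x %in% c(0, 1)))

# Names of the columns that look like dummy variables.
dummy_names <- names(is_dummy)[is_dummy]
dummy_names
```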

For alcohol:

  • alcohol: The boxplot and the descriptive results suggest that the average birthweight of infants for non-drinking mothers was higher (3385.73 grams) than that for drinking mothers (3241.05 grams), with a difference of 144.68 grams.

  • Similarly, the median birthweight for non-drinking mothers was also higher (3425 grams) than that for drinking mothers (3374 grams), with a difference of 51 grams.

  • The interquartile range for non-drinking mothers (698 grams) was wider, indicating more spread and variation, than that for drinking mothers (556.25 grams), a difference of 141.75 grams.
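
The three differences reported for alcohol can be reproduced by subtracting the group values. The sketch below types in the numbers from the bullets above rather than reading them from the dataset.

```r
# Values copied from the alcohol bullets (non-drinking minus drinking).
mean_diff   <- 3385.73 - 3241.05
median_diff <- 3425 - 3374
iqr_diff    <- 698 - 556.25

# Collect the differences and round to 2 decimal places.
round(c(mean = mean_diff, median = median_diff, IQR = iqr_diff), 2)
```

The same subtraction works for any of the other dummy variables once the two group values are substituted.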

# alcohol
smoking_data %>% 
  group_by(alcohol) %>% 
  summarise(count = n(),
            mean_bw = mean(birthweight),
            median_bw = median(birthweight),
            IQR_bw = IQR(birthweight))
## # A tibble: 2 × 5
##   alcohol count mean_bw median_bw IQR_bw
##     <int> <int>   <dbl>     <dbl>  <dbl>
## 1       0  2942   3386.      3425   698 
## 2       1    58   3241.      3374   556.
ggboxplot(smoking_data, x = "alcohol", y = "birthweight", 
          color = "alcohol", palette = c("#00AFBB", "#FC4E07"),
          order = c("0",  "1"),
          ylab = "Infants' Birthweight (in grams)", 
          xlab = "Drinking Mothers")

For unmarried:

  • unmarried: The boxplot and the descriptive results suggest that the average birthweight of infants for married mothers (unmarried = 0) was higher (3448.08 grams) than that for unmarried mothers (unmarried = 1) (3160.68 grams), with a difference of 287.40 grams.

  • Similarly, the median birthweight for married mothers was also higher (3459 grams) compared to unmarried mothers (3204 grams), with a difference of 255 grams.

  • The interquartile range for unmarried mothers (709 grams) was wider, indicating more spread and variation, than that for married mothers (664.75 grams), a difference of 44.25 grams.

# unmarried 
smoking_data %>% 
  group_by(unmarried) %>% 
  summarise(count = n(),
            mean_bw = mean(birthweight),
            median_bw = median(birthweight),
            IQR_bw = IQR(birthweight))
## # A tibble: 2 × 5
##   unmarried count mean_bw median_bw IQR_bw
##       <int> <int>   <dbl>     <dbl>  <dbl>
## 1         0  2320   3448.      3459   665.
## 2         1   680   3161.      3204   709
ggboxplot(smoking_data, x = "unmarried", y = "birthweight", 
          color = "unmarried", palette = c("#00AFBB", "#FC4E07"),
          order = c("0",  "1"),
          ylab = "Infants' Birthweight (in grams)", 
          xlab = "Married and Unmarried Mothers")

For tripre0:

  • tripre0: The boxplot and the descriptive results suggest that the average birthweight of infants for mothers who had been to a prenatal visit (tripre0 = 0) was higher (3390.28 grams) than that for mothers who had never been to a prenatal visit (tripre0 = 1) (2655.4 grams), with a difference of 734.88 grams.

  • Similarly, the median birthweight for mothers who had been to a prenatal visit was also higher (3430 grams) than that for mothers who had never been to one (2693.5 grams), with a difference of 736.5 grams.

  • The interquartile range for mothers who had never been to a prenatal visit (1014 grams) was wider, indicating more spread and variation, than that for mothers who had been to a prenatal visit (698 grams), a difference of 316 grams.

# tripre0
smoking_data %>% 
  group_by(tripre0) %>% 
  summarise(count = n(),
            mean_bw = mean(birthweight),
            median_bw = median(birthweight),
            IQR_bw = IQR(birthweight))
## # A tibble: 2 × 5
##   tripre0 count mean_bw median_bw IQR_bw
##     <int> <int>   <dbl>     <dbl>  <dbl>
## 1       0  2970   3390.     3430     698
## 2       1    30   2655.     2694.   1014
ggboxplot(smoking_data, x = "tripre0", y = "birthweight", 
          color = "tripre0", palette = c("#00AFBB", "#FC4E07"),
          order = c("0",  "1"),
          ylab = "Infants' Birthweight (in grams)", 
          xlab = "No Prenatal Visits")

For tripre1:

  • tripre1: The boxplot and the descriptive results suggest that the average birthweight of infants for mothers who had a prenatal visit in the first trimester (tripre1 = 1) was higher (3415.784 grams) than that for mothers who had not (tripre1 = 0) (3248.179 grams), with a difference of 167.61 grams.

  • Similarly, the median birthweight was also higher (3430 grams) compared to the latter (3289 grams), with a difference of 141 grams.

  • The interquartile range for mothers who had a prenatal visit in the first trimester (652 grams) was narrower, indicating less spread and variation, than that for mothers who did not (737 grams), a difference of 85 grams.

# tripre1
smoking_data %>% 
  group_by(tripre1) %>% 
  summarise(count = n(),
            mean_bw = mean(birthweight),
            median_bw = median(birthweight),
            IQR_bw = IQR(birthweight))
## # A tibble: 2 × 5
##   tripre1 count mean_bw median_bw IQR_bw
##     <int> <int>   <dbl>     <dbl>  <dbl>
## 1       0   588   3248.      3289    737
## 2       1  2412   3416.      3430    652
ggboxplot(smoking_data, x = "tripre1", y = "birthweight", 
          color = "tripre1", palette = c("#00AFBB", "#FC4E07"),
          order = c("0",  "1"),
          ylab = "Infants' Birthweight (in grams)", 
          xlab = "Prenatal Visits in the First Trimester")

For tripre2:

  • tripre2: The boxplot and the descriptive results suggest that the average birthweight of infants for mothers who had a prenatal visit during the second trimester (tripre2 = 1) was lower (3290 grams) than that for mothers who had not (tripre2 = 0) (3399.72 grams), with a difference of 109.72 grams.

  • Similarly, the median birthweight was also lower (3317 grams) compared to the latter (3430 grams), with a difference of 113 grams.

  • The interquartile range for mothers who had a prenatal visit in the second trimester (711 grams) was wider, indicating more spread and variation, than that for mothers who did not (681 grams), a difference of 30 grams.

# tripre2
smoking_data %>% 
  group_by(tripre2) %>% 
  summarise(count = n(),
            mean_bw = mean(birthweight),
            median_bw = median(birthweight),
            IQR_bw = IQR(birthweight))
## # A tibble: 2 × 5
##   tripre2 count mean_bw median_bw IQR_bw
##     <int> <int>   <dbl>     <int>  <dbl>
## 1       0  2541   3400.      3430    681
## 2       1   459   3290.      3317    711
ggboxplot(smoking_data, x = "tripre2", y = "birthweight", 
          color = "tripre2", palette = c("#00AFBB", "#FC4E07"),
          order = c("0",  "1"),
          ylab = "Infants' Birthweight (in grams)", 
          xlab = "Prenatal Visits in the Second Trimester")

For tripre3:

  • tripre3: The boxplot and the descriptive results suggest that the average birthweight of infants for mothers who had a prenatal visit during the third trimester (tripre3 = 1) was lower (3233.92 grams) than that for mothers who had not (tripre3 = 0) (3388.02 grams), with a difference of 154.1 grams.

  • Similarly, the median birthweight was also lower (3317 grams) compared to the latter (3430 grams), with a difference of 113 grams.

  • The interquartile range for mothers who had a prenatal visit in the third trimester (751.5 grams) was wider, indicating more spread and variation, than that for mothers who did not (709 grams), a difference of 42.5 grams.

# tripre3
smoking_data %>% 
  group_by(tripre3) %>% 
  summarise(count = n(),
            mean_bw = mean(birthweight),
            median_bw = median(birthweight),
            IQR_bw = IQR(birthweight))
## # A tibble: 2 × 5
##   tripre3 count mean_bw median_bw IQR_bw
##     <int> <int>   <dbl>     <int>  <dbl>
## 1       0  2901   3388.      3430   709 
## 2       1    99   3234.      3317   752.
ggboxplot(smoking_data, x = "tripre3", y = "birthweight", 
          color = "tripre3", palette = c("#00AFBB", "#FC4E07"),
          order = c("0",  "1"),
          ylab = "Infants' Birthweight (in grams)", 
          xlab = "Prenatal Visits in the Third Trimester")

Q3

Which dummy variable(s) among unmarried, tripre0 and alcohol might make a significant difference in the birthweight of infants? Run a t.test() to test the differences across groups in the sample, respectively. [4 points]

  • Correct code for using t.test() and correct order of variables y ~ x [3 points]
  • The variables unmarried and tripre0 made a significant difference in infants’ birthweight, while alcohol did not. [1 point]
# the order of y and x matters for t.test()
# birthweight is the y (dependent variable) 
# unmarried, tripre0 and alcohol are the x (independent variable)

t.test(birthweight ~ unmarried, data = smoking_data) 
## 
##  Welch Two Sample t-test
## 
## data:  birthweight by unmarried
## t = 10.46, df = 990.49, p-value < 0.00000000000000022
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  233.4824 341.3207
## sample estimates:
## mean in group 0 mean in group 1 
##        3448.078        3160.676
t.test(birthweight ~ tripre0, data = smoking_data) 
## 
##  Welch Two Sample t-test
## 
## data:  birthweight by tripre0
## t = 4.8617, df = 29.295, p-value = 0.00003639
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##   425.8632 1043.9017
## sample estimates:
## mean in group 0 mean in group 1 
##        3390.282        2655.400
t.test(birthweight ~ alcohol, data = smoking_data) 
## 
##  Welch Two Sample t-test
## 
## data:  birthweight by alcohol
## t = 1.7523, df = 59.04, p-value = 0.08491
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -20.53091 309.88905
## sample estimates:
## mean in group 0 mean in group 1 
##        3385.731        3241.052
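
Instead of reading the numbers off the printed output, the pieces needed for Q4 can be pulled from the object that t.test() returns. The sketch below is an illustration on simulated data (`toy_data` is hypothetical, generated with rnorm(), not the lab dataset). The components `estimate`, `p.value` and `conf.int` are standard elements of the returned "htest" object.

```r
# Simulated toy data; these are not the lab values.
set.seed(1)
toy_data <- data.frame(
  birthweight = c(rnorm(40, mean = 3400, sd = 500),
                  rnorm(40, mean = 3100, sd = 500)),
  unmarried   = rep(c(0, 1), each = 40)
)

# Welch two-sample t-test, written as y ~ x like the calls above.
res <- t.test(birthweight ~ unmarried, data = toy_data)

# Group means and their difference (group 0 minus group 1).
res$estimate
mean_diff <- unname(res$estimate[1] - res$estimate[2])
mean_diff

# p-value and 95 percent confidence interval for the difference.
res$p.value
res$conf.int

# TRUE when the difference is significant at the 0.05 level.
res$p.value < 0.05
```

By default t.test() does not assume equal variances, which is why the output is labelled "Welch Two Sample t-test" and the degrees of freedom are not whole numbers.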

Q4

What conclusion can you draw from the t-test results from Q3? Interpret your findings based on the mean difference of the two groups and the p-value at the 0.05 alpha level. (Refer to WK3 lecture slides)

Each variable is worth 2 points, broken down as follows: [6 points]

  • mean difference (0.5 points)
  • p-value (0.5 points)
  • interpretation (0.5 points)
  • conclusion (0.5 points)

Round to 2 decimal places to be precise.
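
The rounding can be done in R rather than by hand. The sketch below uses the two group means printed in the Q3 output for unmarried. Note that round() drops trailing zeros when printing, so format() or sprintf() is one way to keep both decimal places in a written answer.

```r
# Group means copied from the t.test() output for unmarried.
m0 <- 3448.078  # mean in group 0 (married)
m1 <- 3160.676  # mean in group 1 (unmarried)

# Round each mean to 2 decimal places.
round(m0, 2)
round(m1, 2)

# Difference in means, rounded to 2 decimal places.
diff_rounded <- round(m0 - m1, 2)
diff_rounded

# Keep the trailing zero when reporting the number as text.
diff_text <- format(diff_rounded, nsmall = 2)
diff_text
sprintf("%.2f", m0 - m1)
```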

For unmarried:

  • The average birthweight among infants of married mothers (3448.08 grams) was higher than the average birthweight among infants of unmarried mothers (3160.68 grams) by 287.40 grams. (Or the other way round, as long as you describe the means and difference by subtracting the two means.)

  • The p-value of the two sample t-test for the mean difference is less than 0.05.

  • The results show that the difference in the birthweight between the two groups was statistically significant at the 0.05 level, indicating that infants whose mothers were unmarried had significantly lower birthweight on average than infants whose mothers were married.

  • Therefore, we reject the null hypothesis that there was no difference between the two groups at the 0.05 level.

For tripre0:

  • The average birthweight among infants of mothers who did not have prenatal visits (2655.4) was lower than the average birthweight among infants of mothers who had prenatal visits (3390.28) by 734.88 grams. (Or the other way round, as long as you describe the means and difference by subtracting the two means.)

  • The p-value of the two sample t-test for the mean difference is less than 0.05.

  • The results show that the difference in the birthweight between the two groups was statistically significant at the 0.05 level, indicating that infants whose mothers did not have prenatal visits had significantly lower birthweight on average than infants whose mothers had prenatal visits.

  • Therefore, we reject the null hypothesis that there was no difference between the two groups at the 0.05 level.

For alcohol:

  • The average birthweight among infants of mothers who drank alcohol during pregnancy (3241.05) was lower than the average birthweight among infants of mothers who did not drink alcohol during pregnancy (3385.73) by 144.68 grams. (Or the other way round, as long as you describe the means and difference by subtracting the two means.)

  • The p-value of the two sample t-test for the mean difference is 0.08 (0.08491 in the R output), which is greater than 0.05.

  • The results show that the difference in the birthweight between the two groups was not statistically significant at the 0.05 level, indicating that infants whose mothers drank alcohol during pregnancy did not have significantly lower birthweight on average than infants whose mothers did not drink alcohol during pregnancy.

  • Therefore, we cannot reject the null hypothesis at the 0.05 level. (A hypothesis can be rejected, but we do not accept or prove it correct. See Wk3 Slides)
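
The decision rule used for all three variables can be summarized in a few lines: compare each p-value with alpha = 0.05 and reject the null hypothesis only when the p-value is smaller. The sketch below types in the p-values from the Q3 output. For unmarried, R prints only an upper bound (p-value < 0.00000000000000022), so that bound is used here as a stand-in.

```r
# Significance level used in this lab.
alpha <- 0.05

# p-values copied from the Q3 output.
# For unmarried, 2.2e-16 is the printed upper bound, not the exact value.
p_values <- c(unmarried = 2.2e-16,
              tripre0   = 0.00003639,
              alcohol   = 0.08491)

# Reject H0 only when the p-value is below alpha.
decision <- ifelse(p_values < alpha, "reject H0", "fail to reject H0")
decision
```

The wording "fail to reject" is deliberate: a p-value above 0.05 does not show that the null hypothesis is true.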

The end of Lab 3 Assignment.