Ensure that for each question, you have included:
Correct codes: Verify the accuracy of the coding syntax, labels and logic.
R outputs: Check if the outputs generated by R are correctly interpreted and relevant to the question.
Written answer: Make sure answers are in complete sentences. Reference the data dictionary/documentation for precision and thoroughness.
If revisions are necessary, specify the question numbers requiring adjustments. Update the codes and/or the written responses in your original submission to correct any inaccuracies or to provide additional clarity.
Summarize the key concepts and skills you have practiced this week. Reflect on how these exercises have contributed to your understanding of the material and your ability to apply statistical analysis using R. Consider the challenges you faced, how you overcame them, what you learned from the process, and how you will get more support and clarity if challenges were not resolved.
Questions that need corrections:
My Reflections for this week:
Keep your # comments short in code chunks.
Write paragraphs above or below the code chunks.
Answer the following questions using the appropriate dataset and codebook. For each question, provide (1) your codes, (2) R outputs AND (3) the answer in complete sentences.
Set up and load the libraries
library(ggpubr) # for ggboxplot
library(dplyr) # summarize and pipes %>%
options(scipen=999) # remove scientific notation
Import the dataset named “birthweight_smoking.csv” and give your dataset a name.
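A minimal sketch of the import step follows. It assumes the CSV sits in the current working directory and uses the object name smoking_data, which is the name the later chunks in this lab refer to; adjust the path if your file is stored elsewhere.

```r
# Assumption: the CSV file is in the current working directory
csv_path <- "birthweight_smoking.csv"

if (file.exists(csv_path)) {
  # smoking_data is the object name used by the later chunks
  smoking_data <- read.csv(csv_path)
  # check variable names and types against the codebook
  str(smoking_data)
} else {
  message("File not found; check getwd() or give read.csv() the full path.")
}
```

The file.exists() guard is only there so the chunk runs without an error when the file is missing; in your own submission a plain read.csv() call is enough.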
Use ggboxplot() to create a boxplot for another categorical (i.e., including dummy) variable and birthweight. Find the mean, median and IQR using the summarise() function from the dplyr library. Discuss any differences in mean, median and IQR based on the graph and outputs. [2 points]
Use ggboxplot() and display a correct graph (no NA and no errors). [1 point]
If the ggpubr and dplyr libraries were not loaded, you will not be able to use these functions.
Discuss: Choose ONE of the following variables for discussion (except for smoker, which was demonstrated in the lab).
Besides smoker, the dummy variables (each coded 1 or 0) are alcohol, unmarried, tripre0, tripre1, tripre2 and tripre3.
For alcohol:
alcohol: The boxplot and the descriptive results suggest that the average birthweight of infants for non-drinking mothers was higher (3385.73 grams) than that for drinking mothers (3241.05 grams), a difference of 144.68 grams.
Similarly, the median birthweight for non-drinking mothers was also higher (3425 grams) than that for drinking mothers (3374 grams), a difference of 51 grams.
The interquartile range for non-drinking mothers was wider (698 grams) than that for drinking mothers (556.25 grams), a difference of 141.75 grams (i.e., more spread and variation).
# alcohol
smoking_data %>%
group_by(alcohol) %>%
summarise(count = n(),
mean_bw = mean(birthweight),
median_bw = median(birthweight),
IQR_bw = IQR(birthweight))
## # A tibble: 2 × 5
## alcohol count mean_bw median_bw IQR_bw
## <int> <int> <dbl> <dbl> <dbl>
## 1 0 2942 3386. 3425 698
## 2 1 58 3241. 3374 556.
ggboxplot(smoking_data, x = "alcohol", y = "birthweight",
color = "alcohol", palette = c("#00AFBB", "#FC4E07"),
order = c("0", "1"),
ylab = "Infants' Birthweight (in grams)",
xlab = "Drinking Mothers")
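The differences quoted for alcohol can be double-checked with a few lines of base R. The sketch below hard-codes the values reported in the discussion and the summarise() output instead of reading the dataset; the object names alcohol_stats and alcohol_diff are hypothetical.

```r
# Values copied from the alcohol discussion (0 = non-drinking, 1 = drinking)
alcohol_stats <- data.frame(
  alcohol   = c(0, 1),
  mean_bw   = c(3385.73, 3241.05),
  median_bw = c(3425, 3374),
  IQR_bw    = c(698, 556.25)
)

# group 0 minus group 1 for each statistic
stat_cols <- c("mean_bw", "median_bw", "IQR_bw")
alcohol_diff <- unlist(alcohol_stats[1, stat_cols]) -
  unlist(alcohol_stats[2, stat_cols])

round(alcohol_diff, 2)
```

The same pattern works for any of the other dummy variables: swap in that variable's reported means, medians and IQRs.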
For unmarried:
unmarried: The boxplot and the descriptive results suggest that the average birthweight of infants for married mothers (unmarried = 0) was higher (3448.08 grams) than that for unmarried mothers (unmarried = 1) (3160.68 grams), a difference of 287.40 grams.
Similarly, the median birthweight for married mothers was also higher (3459 grams) than that for unmarried mothers (3204 grams), a difference of 255 grams.
The interquartile range for unmarried mothers was wider (709 grams) than that for married mothers (664.75 grams), a difference of 44.25 grams (i.e., more spread and variation).
# unmarried
smoking_data %>%
group_by(unmarried) %>%
summarise(count = n(),
mean_bw = mean(birthweight),
median_bw = median(birthweight),
IQR_bw = IQR(birthweight))
## # A tibble: 2 × 5
## unmarried count mean_bw median_bw IQR_bw
## <int> <int> <dbl> <dbl> <dbl>
## 1 0 2320 3448. 3459 665.
## 2 1 680 3161. 3204 709
ggboxplot(smoking_data, x = "unmarried", y = "birthweight",
color = "unmarried", palette = c("#00AFBB", "#FC4E07"),
order = c("0", "1"),
ylab = "Infants' Birthweight (in grams)",
xlab = "Married and Unmarried Mothers")
For tripre0:
tripre0: The boxplot and the descriptive results suggest that the average birthweight of infants for mothers who had been to a prenatal visit (tripre0 = 0) was higher (3390.28 grams) than that for mothers who had never been to a prenatal visit (tripre0 = 1) (2655.40 grams), a difference of 734.88 grams.
Similarly, the median birthweight for the former group was also higher (3430 grams) than that for the latter (2693.5 grams), a difference of 736.5 grams.
The interquartile range for mothers who had never been to a prenatal visit was wider (1014 grams) than that for mothers who had a prenatal visit (698 grams), a difference of 316 grams (i.e., more spread and variation).
# tripre0
smoking_data %>%
group_by(tripre0) %>%
summarise(count = n(),
mean_bw = mean(birthweight),
median_bw = median(birthweight),
IQR_bw = IQR(birthweight))
## # A tibble: 2 × 5
## tripre0 count mean_bw median_bw IQR_bw
## <int> <int> <dbl> <dbl> <dbl>
## 1 0 2970 3390. 3430 698
## 2 1 30 2655. 2694. 1014
ggboxplot(smoking_data, x = "tripre0", y = "birthweight",
color = "tripre0", palette = c("#00AFBB", "#FC4E07"),
order = c("0", "1"),
ylab = "Infants' Birthweight (in grams)",
xlab = "No Prenatal Visits")
For tripre1:
tripre1: The boxplot and the descriptive results suggest that the average birthweight of infants for mothers who had a prenatal visit in the first trimester (tripre1 = 1) was higher (3415.784 grams) than that for mothers who had not (tripre1 = 0) (3248.179 grams), a difference of 167.61 grams.
Similarly, the median birthweight for the former group was also higher (3430 grams) than that for the latter (3289 grams), a difference of 141 grams.
The interquartile range for mothers who had a prenatal visit in the first trimester was narrower (652 grams) than that for mothers who had not (737 grams), a difference of 85 grams (i.e., less spread and variation).
# tripre1
smoking_data %>%
group_by(tripre1) %>%
summarise(count = n(),
mean_bw = mean(birthweight),
median_bw = median(birthweight),
IQR_bw = IQR(birthweight))
## # A tibble: 2 × 5
## tripre1 count mean_bw median_bw IQR_bw
## <int> <int> <dbl> <dbl> <dbl>
## 1 0 588 3248. 3289 737
## 2 1 2412 3416. 3430 652
ggboxplot(smoking_data, x = "tripre1", y = "birthweight",
color = "tripre1", palette = c("#00AFBB", "#FC4E07"),
order = c("0", "1"),
ylab = "Infants' Birthweight (in grams)",
xlab = "Prenatal Visits in the First Trimester")
For tripre2:
tripre2: The boxplot and the descriptive results suggest that the average birthweight of infants for mothers who had a prenatal visit during the second trimester (tripre2 = 1) was lower (3290 grams) than that for mothers who had not (tripre2 = 0) (3399.72 grams), a difference of 109.72 grams.
Similarly, the median birthweight for the former group was also lower (3317 grams) than that for the latter (3430 grams), a difference of 113 grams.
The interquartile range for mothers who had a prenatal visit during the second trimester was wider (711 grams) than that for mothers who had not (681 grams), a difference of 30 grams (i.e., more spread and variation).
# tripre2
smoking_data %>%
group_by(tripre2) %>%
summarise(count = n(),
mean_bw = mean(birthweight),
median_bw = median(birthweight),
IQR_bw = IQR(birthweight))
## # A tibble: 2 × 5
## tripre2 count mean_bw median_bw IQR_bw
## <int> <int> <dbl> <int> <dbl>
## 1 0 2541 3400. 3430 681
## 2 1 459 3290. 3317 711
ggboxplot(smoking_data, x = "tripre2", y = "birthweight",
color = "tripre2", palette = c("#00AFBB", "#FC4E07"),
order = c("0", "1"),
ylab = "Infants' Birthweight (in grams)",
xlab = "Prenatal Visits in the Second Trimester")
For tripre3:
tripre3: The boxplot and the descriptive results suggest that the average birthweight of infants for mothers who had a prenatal visit during the third trimester (tripre3 = 1) was lower (3233.92 grams) than that for mothers who had not (tripre3 = 0) (3388.02 grams), a difference of 154.10 grams.
Similarly, the median birthweight for the former group was also lower (3317 grams) than that for the latter (3430 grams), a difference of 113 grams.
The interquartile range for mothers who had a prenatal visit during the third trimester was wider (751.5 grams) than that for mothers who had not (709 grams), a difference of 42.5 grams (i.e., more spread and variation).
# tripre3
smoking_data %>%
group_by(tripre3) %>%
summarise(count = n(),
mean_bw = mean(birthweight),
median_bw = median(birthweight),
IQR_bw = IQR(birthweight))
## # A tibble: 2 × 5
## tripre3 count mean_bw median_bw IQR_bw
## <int> <int> <dbl> <int> <dbl>
## 1 0 2901 3388. 3430 709
## 2 1 99 3234. 3317 752.
ggboxplot(smoking_data, x = "tripre3", y = "birthweight",
color = "tripre3", palette = c("#00AFBB", "#FC4E07"),
order = c("0", "1"),
ylab = "Infants' Birthweight (in grams)",
xlab = "Prenatal Visits in the Third Trimester")
Which dummy variable(s) among unmarried, tripre0 and alcohol might make a significant difference in the birthweight of infants? Run a t.test() for each variable to test the difference between groups in the sample. [4 points]
y ~ x [3 points]
unmarried and tripre0 made a significant difference in infants’ birthweight, while alcohol did not. [1 point]
# the order of y and x matters for t.test()
# birthweight is the y (dependent variable)
# unmarried, tripre0 and alcohol are the x (independent variable)
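t.test(birthweight ~ unmarried, data = smoking_data)
t.test(birthweight ~ tripre0, data = smoking_data)
t.test(birthweight ~ alcohol, data = smoking_data)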
##
## Welch Two Sample t-test
##
## data: birthweight by unmarried
## t = 10.46, df = 990.49, p-value < 0.00000000000000022
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## 233.4824 341.3207
## sample estimates:
## mean in group 0 mean in group 1
## 3448.078 3160.676
##
## Welch Two Sample t-test
##
## data: birthweight by tripre0
## t = 4.8617, df = 29.295, p-value = 0.00003639
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## 425.8632 1043.9017
## sample estimates:
## mean in group 0 mean in group 1
## 3390.282 2655.400
##
## Welch Two Sample t-test
##
## data: birthweight by alcohol
## t = 1.7523, df = 59.04, p-value = 0.08491
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -20.53091 309.88905
## sample estimates:
## mean in group 0 mean in group 1
## 3385.731 3241.052
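The numbers needed for the written interpretation (group means, mean difference, p-value) can also be pulled out of the object that t.test() returns. The sketch below uses simulated data for illustration only, not the lab's dataset; toy_data, toy_test and the other object names are hypothetical.

```r
# Simulated data for illustration only; these are NOT the lab's data
set.seed(123)
toy_data <- data.frame(
  group = rep(c(0, 1), each = 50),
  y     = c(rnorm(50, mean = 3400, sd = 500),
            rnorm(50, mean = 3200, sd = 500))
)

# Welch two-sample t-test (the t.test() default); outcome on the left of ~
toy_test <- t.test(y ~ group, data = toy_data)

group_means <- unname(toy_test$estimate)   # mean in group 0, mean in group 1
mean_diff   <- group_means[1] - group_means[2]
ci_mid      <- mean(toy_test$conf.int)     # CI is centred on the mean difference
p_value     <- toy_test$p.value

round(c(mean_diff = mean_diff, ci_mid = ci_mid, p_value = p_value), 4)
```

Because the confidence interval is centred on the difference in means, its midpoint offers a quick consistency check: in the unmarried output, (233.4824 + 341.3207) / 2 is about 287.40, which matches 3448.078 - 3160.676.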
What conclusion can you draw from the t-test results from Q3? Interpret your findings based on the mean difference of the two groups and the p-value at the 0.05 alpha level. (Refer to WK3 lecture slides)
Each variable is worth 2 points, broken down as follows: [6 points]
Round to 2 decimal places to be precise.
For unmarried:
The average birthweight among infants of married mothers (3448.08 grams) was higher than the average birthweight among infants of unmarried mothers (3160.68 grams) by 287.40 grams. (Or the other way round, as long as you describe the means and difference by subtracting the two means.)
The p-value of the two sample t-test for the mean difference is less than 0.05.
The results show that the difference in the birthweight between the two groups was statistically significant at the 0.05 level, indicating that infants whose mothers were unmarried had significantly lower birthweight on average than infants whose mothers were married.
Therefore, we reject the null hypothesis that there was no difference between the two groups at the 0.05 level.
For tripre0:
The average birthweight among infants of mothers who did not have prenatal visits (2655.4) was lower than the average birthweight among infants of mothers who had prenatal visits (3390.28) by 734.88 grams. (Or the other way round, as long as you describe the means and difference by subtracting the two means.)
The p-value of the two sample t-test for the mean difference is less than 0.05.
The results show that the difference in the birthweight between the two groups was statistically significant at the 0.05 level, indicating that infants whose mothers did not have prenatal visits had significantly lower birthweight on average than infants whose mothers had prenatal visits.
Therefore, we reject the null hypothesis that there was no difference between the two groups at the 0.05 level.
For alcohol:
The average birthweight among infants of mothers who drank alcohol during pregnancy (3241.05) was lower than the average birthweight among infants of mothers who did not drink alcohol during pregnancy (3385.73) by 144.68 grams. (Or the other way round, as long as you describe the means and difference by subtracting the two means.)
The p-value of the two sample t-test for the mean difference is 0.085, which is greater than 0.05.
The results show that the difference in the birthweight between the two groups was not statistically significant at the 0.05 level, indicating that infants whose mothers drank alcohol during pregnancy did not have significantly lower birthweight on average than infants whose mothers did not drink alcohol during pregnancy.
Therefore, we cannot reject the null hypothesis at the 0.05 level. (A hypothesis can be rejected, but we do not accept or prove it correct. See Wk3 Slides)
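The decision rule applied to all three variables can be summarized in a short sketch. It hard-codes the p-values reported in the t.test() output; the unmarried value is only an upper bound, because R prints it as "< 0.00000000000000022". The object names are hypothetical.

```r
alpha <- 0.05

# p-values as reported in the t.test() output;
# the unmarried entry is an upper bound, not the exact value
p_values <- c(unmarried = 2.2e-16, tripre0 = 0.00003639, alcohol = 0.08491)

# reject H0 when p < alpha; otherwise fail to reject (never "accept")
decision <- ifelse(p_values < alpha, "reject H0", "fail to reject H0")
decision
```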
The end of Lab 3 Assignment.