Rows: 21 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (2): lc, hs
View(village)
1-
Null hypothesis: the intervention had no effect on living conditions, i.e., the mean change in LC is 0 (H0: mean change in LC = 0).
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.989681 0.4966727 6.01942 6.941596e-06
The estimated mean change in living conditions across all villages is approximately 2.99.
95% Confidence Interval for Mean Change in LC: (1.95, 4.03)
p-value: 6.94e-06. If the intervention had no effect, there would be only about a 0.000694% chance of observing a mean change in living conditions at least as far from 0 as 2.99 purely by chance.
The very small p-value leads us to reject the null hypothesis: the mean change in LC is significantly different from 0, indicating that the intervention had a significant positive impact on living conditions.
Because the estimated mean change is positive (and the whole confidence interval lies above 0), people's living conditions improved after the intervention.
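The intercept-only model above is equivalent to a one-sample t-test of H0: mean change in LC = 0, with 21 − 1 = 20 degrees of freedom. As a sketch, the R code below recomputes the t value, the two-sided p-value, and the 95% confidence interval from the reported estimate and standard error alone; it does not use the raw data. With the data loaded, `t.test(village$lc, mu = 0)` should give the same numbers.

```r
# Reported summary of the intercept-only model for LC
est    <- 2.989681   # estimated mean change in LC
se     <- 0.4966727  # its standard error
n      <- 21         # number of villages (Rows: 21)
df_res <- n - 1      # degrees of freedom for a one-sample t-test

t_stat <- est / se                                     # t value
p_val  <- 2 * pt(-abs(t_stat), df = df_res)            # two-sided p-value
ci     <- est + c(-1, 1) * qt(0.975, df = df_res) * se # 95% confidence interval

print(c(t = t_stat, p = p_val))
print(round(ci, 2))
```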
HS~LC
a-
systematic part: HS = B0 + B1(LC)
full model (systematic part + random part): HS = B0 + B1(LC) + e
e: random error, which accounts for the variability in HS that is not explained by LC
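To make the split between the systematic and random parts concrete, the sketch below simulates data from HS = B0 + B1(LC) + e and fits the model with `lm()`. The coefficient values, sample size, and error spread are invented for illustration only; they are not estimates from the village data.

```r
set.seed(42)
n  <- 500
B0 <- 1                               # hypothetical intercept
B1 <- 1.5                             # hypothetical slope
lc_sim <- rnorm(n, mean = 3, sd = 2)  # hypothetical LC values
e      <- rnorm(n, mean = 0, sd = 5)  # random part: variability in HS not explained by LC
hs_sim <- B0 + B1 * lc_sim + e        # systematic part plus random part

fit_sim <- lm(hs_sim ~ lc_sim)
print(coef(fit_sim))  # estimates should land near B0 = 1 and B1 = 1.5
```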
b- Make a scatter plot with a LOWESS line of changes in health status (HS) (y-axis) and changes in living conditions (LC) (x-axis). Does a simple linear regression appear as an appropriate model?
x is LC and y is HS
library(ggplot2)
# Scatter plot with a LOESS smoother; labs() must be added with "+" to take effect
ggplot(data = village, aes(x = lc, y = hs)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, color = "red", se = FALSE) +
  labs(x = "Change in Living Conditions (LC)", y = "Change in Health Status (HS)",
       title = "Scatter Plot of Changes in HS vs. LC")
plot(village$lc, village$hs,
     xlab = "Changes in Living Conditions (LC)", ylab = "Changes in Health Status (HS)",
     main = "Scatter Plot of Changes in HS vs. LC with LOWESS Line")
lines(lowess(village$lc, village$hs), col = "red")
The LOWESS line is clearly curved rather than straight, so the relationship between LC and HS does not look linear. A simple linear regression therefore does not appear appropriate, and a nonlinear model might be more suitable.
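The judgment above is visual. One way to back it up numerically is to compare the straight-line model with a model that adds a quadratic term, using a nested-model F-test via `anova()`. This is a sketch: `linearity_check` is a hypothetical helper name, and the example runs on simulated curved data rather than the village data; with the data loaded, `linearity_check(village$lc, village$hs)` would apply it to this problem.

```r
# Hypothetical helper: F-test of a straight line against a quadratic in x
linearity_check <- function(x, y) {
  linear    <- lm(y ~ x)
  quadratic <- lm(y ~ x + I(x^2))
  anova(linear, quadratic)  # a small p-value indicates curvature the line misses
}

# Example on simulated curved data (not the village data)
set.seed(1)
x <- runif(100, -3, 3)
y <- 2 + x^2 + rnorm(100, sd = 1)
res <- linearity_check(x, y)
print(res)
```

A quadratic term is only one possible alternative; a non-significant result does not prove linearity, especially with only 21 villages.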
c-
systematic part: HS = B0 + B1(LC)
systematic_model <- lm(hs ~ lc, data = village)
systematic_model
d- Null hypothesis: there is no relationship between changes in HS and changes in LC. Under the model HS = B0 + B1(LC) + e, this is H0: B1 = 0 (alternative Ha: B1 ≠ 0).
systematic_model <- lm(hs ~ lc, data = village)
summary(systematic_model)  # summary() gives the coefficient table and p-values below
Call:
lm(formula = hs ~ lc, data = village)
Residuals:
Min 1Q Median 3Q Max
-10.5068 -2.5308 0.3102 2.4989 11.2161
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.7967 2.1803 2.200 0.0404 *
lc 0.5021 0.5854 0.858 0.4018
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.959 on 19 degrees of freedom
Multiple R-squared: 0.03727, Adjusted R-squared: -0.0134
F-statistic: 0.7356 on 1 and 19 DF, p-value: 0.4018
The p-value for the slope (B1) is 0.4018, which is much greater than 0.05. As a result, we cannot reject the null hypothesis of no relationship between changes in health status (HS) and changes in living conditions (LC): the data do not provide significant evidence that changes in living conditions are associated with changes in health status in this dataset.
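As a check on the output above, the slope's t value is the estimate divided by its standard error, the two-sided p-value comes from a t distribution with 19 residual degrees of freedom, and in simple linear regression the overall F-statistic equals t squared. The sketch below recomputes these from the rounded values in the coefficient table, so small rounding differences are expected.

```r
b1     <- 0.5021  # reported slope estimate
se_b1  <- 0.5854  # reported standard error
df_res <- 19      # residual degrees of freedom

t_b1 <- b1 / se_b1
p_b1 <- 2 * pt(-abs(t_b1), df = df_res)  # two-sided p-value for H0: B1 = 0
F_b1 <- t_b1^2                           # equals the model F-statistic in simple regression

print(round(c(t = t_b1, p = p_b1, F = F_b1), 4))
```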
e-
# Correct village #13's HS value (15 -> -5), then redraw the plot
village$hs[13] <- -5
library(ggplot2)
ggplot(data = village, aes(x = lc, y = hs)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, color = "red", se = FALSE) +
  labs(x = "Change in Living Conditions (LC)", y = "Change in Health Status (HS)",
       title = "Scatter Plot of Changes in HS vs. LC")
plot(village$lc, village$hs,
     xlab = "Changes in Living Conditions (LC)", ylab = "Changes in Health Status (HS)",
     main = "Scatter Plot of Changes in HS vs. LC with LOWESS Line")
lines(lowess(village$lc, village$hs), col = "red")
Even after village #13's HS value is corrected from 15 to −5, the LOWESS line in the new scatter plots does not follow a straight path; it still shows a nonlinear pattern. A simple linear regression may therefore not be fully appropriate for modeling this relationship.
systematic_model2 <- lm(hs ~ lc, data = village)
summary(systematic_model2)
Call:
lm(formula = hs ~ lc, data = village)
Residuals:
Min 1Q Median 3Q Max
-10.532 -2.212 -0.282 2.928 8.760
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.9548 1.8960 0.504 0.62034
lc 1.4686 0.5091 2.885 0.00949 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.182 on 19 degrees of freedom
Multiple R-squared: 0.3046, Adjusted R-squared: 0.268
F-statistic: 8.322 on 1 and 19 DF, p-value: 0.00949
Null hypothesis: there is no relationship between changes in HS and changes in LC. Under the model HS = B0 + B1(LC) + e, this is H0: B1 = 0 (alternative Ha: B1 ≠ 0).
Since the new p-value for the slope (0.00949) is less than 0.05, we reject the null hypothesis of no relationship between HS and LC. After correcting the data, changes in living conditions are significantly associated with changes in health status: each one-unit increase in the change in LC is associated with an estimated 1.47-unit increase in the change in HS.
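A confidence interval for the corrected slope conveys the same conclusion with a range of plausible values. The sketch below builds the 95% interval from the reported estimate (1.4686), standard error (0.5091), and 19 residual degrees of freedom; with the data loaded, `confint(systematic_model2, "lc")` should agree up to rounding.

```r
b1     <- 1.4686  # reported slope after the correction
se_b1  <- 0.5091  # reported standard error
df_res <- 19      # residual degrees of freedom

ci_b1 <- b1 + c(-1, 1) * qt(0.975, df = df_res) * se_b1
print(round(ci_b1, 3))  # the interval excludes 0, matching p = 0.00949 < 0.05
```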
f- The model changed substantially after correcting the HS value for village #13 because this village was highly influential. Its LC value lies far from the mean LC (high leverage), and its erroneous HS value of 15 placed it far from the trend of the other villages, so it pulled the fitted line toward itself and flattened the slope. After the correction, the model better describes the relationship between LC and HS.
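Leverage and Cook's distance are the standard base-R tools for quantifying this kind of influence. The sketch below plants one observation with an extreme x value and an erroneous response in simulated data (not the village data) and shows that both measures flag it; the 2p/n and 4/n cutoffs are common rules of thumb, not strict thresholds. With the data loaded, the same calls can be applied to `systematic_model` to check village #13.

```r
set.seed(7)
x <- c(rnorm(20, mean = 3, sd = 1.5), -2)  # observation 21 has an extreme x value
y <- 1 + 1.5 * x + rnorm(21, sd = 2)
y[21] <- 15                                 # and an erroneous response far above the trend
fit <- lm(y ~ x)

lev  <- hatvalues(fit)       # leverage: how far each x is from mean(x)
cook <- cooks.distance(fit)  # overall influence on the fitted coefficients
n <- length(y)
p <- length(coef(fit))

print(which(lev > 2 * p / n))  # rule-of-thumb cutoff for high leverage
print(which(cook > 4 / n))     # rule-of-thumb cutoff for influential points
print(round(coef(fit), 2))                # slope with the bad point
print(round(coef(lm(y[-21] ~ x[-21])), 2)) # slope without it
```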
g-
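systematic_model2 <- lm(hs ~ lc, data = village)
par(mfrow = c(1, 2))  # two plots side by side
# Residuals vs. fitted values: look for curvature or a funnel shape
plot(fitted(systematic_model2), resid(systematic_model2),
     main = "Residuals vs Fitted Values", xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, col = "red")
# Scale-location: a trend in sqrt(|standardized residuals|) suggests non-constant variance
plot(fitted(systematic_model2), sqrt(abs(rstandard(systematic_model2))),
     main = "Scale-Location Plot", xlab = "Fitted values",
     ylab = "Sqrt(|Standardized residuals|)")
lines(lowess(fitted(systematic_model2), sqrt(abs(rstandard(systematic_model2)))), col = "red")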
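The plots above are checked by eye. As a complementary numerical check of constant variance, the sketch below computes Koenker's studentized Breusch-Pagan statistic by hand: n times the R² from regressing the squared residuals on the predictors, compared to a chi-square distribution. `bp_test` is a hypothetical helper name, and the example uses simulated data whose error spread grows with x, not the village data; the `lmtest` package's `bptest()` provides a packaged version. With the data loaded, `bp_test(systematic_model2)` would apply it here.

```r
# Hypothetical helper: Koenker's studentized Breusch-Pagan test
bp_test <- function(fit) {
  e2  <- resid(fit)^2
  Z   <- model.matrix(fit)[, -1, drop = FALSE]  # predictors without the intercept column
  aux <- lm(e2 ~ Z)                             # auxiliary regression of squared residuals
  stat  <- length(e2) * summary(aux)$r.squared
  df_bp <- ncol(Z)
  c(statistic = stat, df = df_bp,
    p.value = pchisq(stat, df = df_bp, lower.tail = FALSE))
}

# Example on simulated heteroscedastic data
set.seed(3)
x <- runif(300, 1, 10)
y <- 2 + 0.5 * x + rnorm(300, sd = 0.5 * x)  # error spread grows with x
het_fit <- lm(y ~ x)
bp <- bp_test(het_fit)
print(bp)
```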
Rows: 200 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): MODE
dbl (2): time, hosp
View(baby)
Research question: does the average time (TIME) from birth to the first breastfeeding differ by the mode of delivery (SVD = Spontaneous Vaginal Delivery, IVD = Instrumentally-assisted Vaginal Delivery, CS = Cesarean Section)? Response (y): TIME; explanatory variable (x): mode of delivery (SVD, IVD, CS).
a-
ggplot(baby, aes(x = factor(hosp), y = time, fill = factor(MODE))) +
  geom_boxplot() +
  labs(x = "Hospital", y = "Time from Birth to First Breastfeeding",
       fill = "Mode of Delivery") +
  ggtitle("Time from Birth to First Breastfeeding by Mode of Delivery and Hospital") +
  theme_minimal()
Based on the boxplots, I believe there is an interaction effect between mode of delivery and hospital on the time to first breastfeeding, since the differences in time between delivery modes do not appear to be the same in every hospital.
Call:
lm(formula = time ~ hosp * MODE, data = baby)
Residuals:
Min 1Q Median 3Q Max
-5.6444 -1.4755 -0.3875 1.4468 7.9668
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.4750 0.7513 15.273 < 2e-16 ***
hosp -2.2264 0.2581 -8.625 2.27e-15 ***
MODEIVD -4.4353 1.0502 -4.223 3.70e-05 ***
MODESVD -3.7635 0.9425 -3.993 9.24e-05 ***
hosp:MODEIVD 0.9730 0.3814 2.551 0.0115 *
hosp:MODESVD 0.7256 0.3329 2.180 0.0305 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.219 on 194 degrees of freedom
Multiple R-squared: 0.4581, Adjusted R-squared: 0.4442
F-statistic: 32.8 on 5 and 194 DF, p-value: < 2.2e-16
The p-values for the interaction terms are hosp:MODEIVD = 0.0115 and hosp:MODESVD = 0.0305.
Both values are below 0.05, so the null hypothesis of no interaction is rejected: there is a significant interaction effect between hospital and mode of delivery on the time to first breastfeeding. This means that the effect of mode of delivery on time to first breastfeeding varies by hospital.
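The two t-tests above each examine one interaction coefficient. A single joint test of both interaction terms is the partial F-test comparing the additive model (time ~ hosp + MODE, shown next) with the interaction model. As a sketch, the code below computes it from the reported R² values (0.4581 and 0.4372), n = 200, and 6 coefficients in the full model; because the R² values are rounded, the result is approximate. With the data loaded, `anova(lm(time ~ hosp + MODE, data = baby), lm(time ~ hosp * MODE, data = baby))` gives the exact test.

```r
r2_full <- 0.4581  # time ~ hosp * MODE (6 coefficients)
r2_red  <- 0.4372  # time ~ hosp + MODE (4 coefficients)
n       <- 200
p_full  <- 6
q       <- 2       # number of interaction coefficients being tested

F_stat <- ((r2_full - r2_red) / q) / ((1 - r2_full) / (n - p_full))
p_val  <- pf(F_stat, q, n - p_full, lower.tail = FALSE)
print(round(c(F = F_stat, p = p_val), 4))
```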
Call:
lm(formula = time ~ hosp + MODE, data = baby)
Residuals:
Min 1Q Median 3Q Max
-5.8722 -1.3608 -0.5212 1.3758 7.7727
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.9544 0.4899 20.318 < 2e-16 ***
hosp -1.6548 0.1429 -11.579 < 2e-16 ***
MODEIVD -1.9604 0.4389 -4.467 1.34e-05 ***
MODESVD -1.8612 0.3873 -4.806 3.06e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.25 on 196 degrees of freedom
Multiple R-squared: 0.4372, Adjusted R-squared: 0.4286
F-statistic: 50.76 on 3 and 196 DF, p-value: < 2.2e-16
This main-effects (additive) model shows the separate effects of hosp and mode of delivery. The p-values are: hosp < 2e-16; MODEIVD = 1.34e-05; MODESVD = 3.06e-06.
Based on this model, both hosp and MODE have significant effects on time. Specifically, each one-unit increase in hosp is associated with a decrease of about 1.65 in time, and both IVD (−1.96) and SVD (−1.86) are associated with significantly shorter times than the reference mode, CS.
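In both models above, hosp enters as a single numeric slope, which assumes the hospital codes lie on a meaningful numeric scale. If hosp is simply a hospital identifier (an assumption, since the codebook is not shown), treating it as a factor lets each hospital have its own mean, and the two fits can be compared with `anova()` because the numeric-slope model is nested in the factor model. The sketch below uses simulated data with four hypothetical hospital codes; it also relies on R's default of using the alphabetically first level (CS) as the reference mode. With the data loaded, the same comparison would use `data = baby`.

```r
set.seed(11)
sim <- data.frame(
  hosp = sample(1:4, 200, replace = TRUE),                    # hypothetical hospital codes
  MODE = sample(c("CS", "IVD", "SVD"), 200, replace = TRUE)
)
# Hypothetical outcome whose hospital effect is not linear in the code
sim$time <- 8 - 1.5 * (sim$hosp == 2) + rnorm(200, sd = 2)

as_numeric <- lm(time ~ hosp + MODE, data = sim)          # one slope for hosp
as_factor  <- lm(time ~ factor(hosp) + MODE, data = sim)  # separate mean per hospital
a <- anova(as_numeric, as_factor)  # does a linear trend in the code suffice?
print(a)
```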
Call:
lm(formula = time ~ MODE * BFH, data = baby)
Residuals:
Min 1Q Median 3Q Max
-6.0246 -1.1065 -0.0610 0.8418 5.7170
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.7564 0.3095 25.062 < 2e-16 ***
MODEIVD -2.8786 0.4168 -6.906 6.94e-11 ***
MODESVD -2.6729 0.3773 -7.085 2.51e-11 ***
BFH -6.4908 0.5311 -12.222 < 2e-16 ***
MODEIVD:BFH 2.1560 0.8338 2.586 0.0104 *
MODESVD:BFH 1.8245 0.6935 2.631 0.0092 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.831 on 194 degrees of freedom
Multiple R-squared: 0.631, Adjusted R-squared: 0.6215
F-statistic: 66.36 on 5 and 194 DF, p-value: < 2.2e-16
Both interaction p-values (MODEIVD:BFH = 0.0104, MODESVD:BFH = 0.0092) are below 0.05, so the null hypothesis of no interaction is rejected: there is a significant interaction between BFH (whether the hospital is baby friendly) and mode of delivery on time to first breastfeeding. We therefore retain the interaction model rather than reducing it to a main-effects-only model. Final model: the interaction model, time ~ MODE * BFH.
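Each comparison in the analysis that follows is a combination of coefficients from the MODE * BFH table, with CS as the reference mode: in non-baby-friendly hospitals (BFH = 0) only the MODE coefficients enter, while in baby-friendly hospitals the matching interaction terms are added. The sketch below reproduces the four point estimates from the reported coefficients.

```r
# Coefficients reported for lm(time ~ MODE * BFH, data = baby)
b <- c(MODEIVD = -2.8786, MODESVD = -2.6729,
       MODEIVD_BFH = 2.1560, MODESVD_BFH = 1.8245)

# Non-baby-friendly hospitals (BFH = 0)
svd_cs_non  <- b[["MODESVD"]]
svd_ivd_non <- b[["MODESVD"]] - b[["MODEIVD"]]

# Baby-friendly hospitals (BFH = 1): add the interaction terms
svd_cs_bfh  <- b[["MODESVD"]] + b[["MODESVD_BFH"]]
svd_ivd_bfh <- (b[["MODESVD"]] + b[["MODESVD_BFH"]]) -
               (b[["MODEIVD"]] + b[["MODEIVD_BFH"]])

est <- c(svd_cs_non = svd_cs_non, svd_ivd_non = svd_ivd_non,
         svd_cs_bfh = svd_cs_bfh, svd_ivd_bfh = svd_ivd_bfh)
print(round(est, 2))
```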
Analysis of this model
For non-baby-friendly hospitals, SVD vs. CS: estimated difference = −2.67, 95% CI [−3.42, −1.93]. In non-baby-friendly hospitals, spontaneous vaginal delivery (SVD) is therefore associated with a time to breastfeeding that is approximately 2.67 minutes shorter than after cesarean section (CS).
In non-baby-friendly hospitals, SVD vs. IVD: estimated difference = 0.21 minutes. The time to breastfeeding after SVD is slightly longer than after instrumentally-assisted vaginal delivery (IVD), but the difference is small.
For baby-friendly hospitals, SVD vs. CS: estimated difference = −0.85 minutes, 95% CI [−2.96, 1.26]. In baby-friendly hospitals, the difference in time to breastfeeding initiation between SVD and CS is smaller and not statistically significant (the interval contains 0), suggesting that baby friendliness may reduce the differences between modes of delivery.
For baby-friendly hospitals, SVD vs. IVD: estimated difference = −0.13 minutes, 95% CI [−0.48, 0.23]. In baby-friendly hospitals, the time difference between SVD and IVD is the smallest of these comparisons and is not statistically significant.
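For the baby-friendly comparisons, the standard error cannot be read off the coefficient table, because each estimate sums two or more coefficients whose estimates are correlated; the full covariance matrix from `vcov()` is needed. Recomputing those intervals this way is a worthwhile check on the values reported above. The sketch below defines a hypothetical helper, `contrast_ci`, that returns the estimate and t-based confidence interval for any linear combination of `lm` coefficients. It is demonstrated on simulated data with the same coefficient layout as the MODE * BFH model (not the baby data); with the data loaded, the same contrast vector would be applied to the fitted MODE * BFH model.

```r
# Hypothetical helper: estimate and CI for a linear combination L'beta of lm coefficients
contrast_ci <- function(fit, L, level = 0.95) {
  beta <- coef(fit)
  V    <- vcov(fit)                         # full covariance matrix of the estimates
  est  <- sum(L * beta)
  se   <- sqrt(drop(t(L) %*% V %*% L))      # includes covariances between coefficients
  tq   <- qt(1 - (1 - level) / 2, df = df.residual(fit))
  c(estimate = est, lower = est - tq * se, upper = est + tq * se)
}

# Simulated data with the same coefficient layout (not the baby data)
set.seed(5)
sim <- data.frame(MODE = factor(sample(c("CS", "IVD", "SVD"), 200, replace = TRUE)),
                  BFH  = rbinom(200, 1, 0.3))
sim$time <- 7.8 - 2.7 * (sim$MODE == "SVD") - 6.5 * sim$BFH + rnorm(200, sd = 1.8)
fit <- lm(time ~ MODE * BFH, data = sim)
# Coefficient order: (Intercept), MODEIVD, MODESVD, BFH, MODEIVD:BFH, MODESVD:BFH
L_svd_cs_bfh <- c(0, 0, 1, 0, 0, 1)  # SVD vs. CS in baby-friendly hospitals
print(contrast_ci(fit, L_svd_cs_bfh))
```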
In conclusion, the analysis suggests that in non-baby-friendly hospitals, the mode of delivery has a significant impact on breastfeeding initiation time. In baby-friendly hospitals, however, the differences in breastfeeding initiation time among delivery modes are not significant, suggesting that these hospitals may equalize breastfeeding initiation times regardless of the delivery method.