Hosseinali_M

library(readr)
village <- read_csv("~/Documents/UW/PHARM 656/6th Assignment/village.csv")
Rows: 21 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (2): lc, hs

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(village)

1-

Null hypothesis: the intervention had no effect on living conditions, i.e., the mean change in LC is 0.

no_effect <- lm(lc~1, data=village)
no_effect

Call:
lm(formula = lc ~ 1, data = village)

Coefficients:
(Intercept)  
       2.99  
CI_estimate <- confint(no_effect)
CI_estimate
              2.5 %   97.5 %
(Intercept) 1.95364 4.025722
p_value <- summary(no_effect)$coefficients
p_value
            Estimate Std. Error t value     Pr(>|t|)
(Intercept) 2.989681  0.4966727 6.01942 6.941596e-06

The intercept-only model estimates the average change in living conditions across all villages as approximately 2.99.

95% confidence interval for the mean change in LC: (1.95, 4.03)

p-value: 6.94e-06. If the true mean change were 0, there would be only about a 0.000694% chance of observing an average change at least as far from 0 as 2.99.

With such a small p-value we reject the null hypothesis: the mean change in living conditions differs significantly from 0, which is consistent with a positive effect of the intervention.

Because the estimated mean change is positive and the whole confidence interval lies above 0, living conditions improved on average after the intervention.
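The intercept-only regression used here is equivalent to a one-sample t-test of the mean against 0. The sketch below illustrates that equivalence on simulated numbers; `lc_sim` is a hypothetical stand-in and not the village data.

```r
# Simulated stand-in for the lc column (hypothetical values, not the village data)
set.seed(1)
lc_sim <- rnorm(21, mean = 3, sd = 2.3)

# Intercept-only regression: the intercept is the sample mean
fit0 <- lm(lc_sim ~ 1)
lm_row <- summary(fit0)$coefficients["(Intercept)", ]

# One-sample t-test of H0: mean = 0
tt <- t.test(lc_sim, mu = 0)

# Both approaches give the same t statistic, p-value, and 95% CI
same_t  <- all.equal(unname(lm_row["t value"]), unname(tt$statistic))
same_p  <- all.equal(unname(lm_row["Pr(>|t|)"]), tt$p.value)
same_ci <- all.equal(as.numeric(confint(fit0)), as.numeric(tt$conf.int))
c(same_t = same_t, same_p = same_p, same_ci = same_ci)
```

Either call could be used for question 1; the `lm()` form has the advantage of extending directly to the regression in question 2.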

HS~LC

a-

systematic part: E[HS | LC] = B0 + B1(LC)

random part: e, so that the full model is HS = B0 + B1(LC) + e

e: random error, which accounts for the variability in HS that is not explained by LC

b- Make a scatter plot with a LOWESS line of changes in health status (HS) (y-axis) and changes in living conditions (LC) (x-axis). Does a simple linear regression appear as an appropriate model?

x is LC and y is HS

library(ggplot2)

ggplot(data = village, aes(x=lc, y=hs)) + geom_point() +
  geom_smooth(method = "loess", color = "red", se = FALSE) +
  labs(x= "Living Condition", y="Health Status", title = "Scatter Plot of Changes in HS vs. LC")
`geom_smooth()` using formula = 'y ~ x'
plot(village$lc, village$hs, 
     xlab = "Changes in Living Conditions (LC)", 
     ylab = "Changes in Health Status (HS)",
     main = "Scatter Plot of Changes in HS vs. LC with LOWESS Line")
lines(lowess(village$lc, village$hs), col = "red")

The LOWESS line is curved rather than straight, so the relationship between changes in LC and changes in HS does not look linear. A simple linear regression therefore does not appear to be an appropriate model; a nonlinear model might fit better.

c-

systematic part: HS = B0 + B1(LC)

systematic_model <- lm(data = village, hs~lc)
systematic_model

Call:
lm(formula = hs ~ lc, data = village)

Coefficients:
(Intercept)           lc  
     4.7967       0.5021  
CI_2 <- confint(systematic_model)
CI_2
                 2.5 %   97.5 %
(Intercept)  0.2331811 9.360145
lc          -0.7231782 1.727345

B0 = 4.7967, 95% confidence interval: [0.2332, 9.3601]

B1 = 0.5021, 95% confidence interval: [-0.7232, 1.7273]

d- Null hypothesis: there is no linear relationship between changes in HS and changes in LC. In the model HS = B0 + B1(LC) + e, this corresponds to B1 = 0.

systematic_model <- lm(data = village, hs~lc)
systematic_model

Call:
lm(formula = hs ~ lc, data = village)

Coefficients:
(Intercept)           lc  
     4.7967       0.5021  
summary(systematic_model)

Call:
lm(formula = hs ~ lc, data = village)

Residuals:
     Min       1Q   Median       3Q      Max 
-10.5068  -2.5308   0.3102   2.4989  11.2161 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   4.7967     2.1803   2.200   0.0404 *
lc            0.5021     0.5854   0.858   0.4018  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.959 on 19 degrees of freedom
Multiple R-squared:  0.03727,   Adjusted R-squared:  -0.0134 
F-statistic: 0.7356 on 1 and 19 DF,  p-value: 0.4018

The p-value for the slope (B1) is 0.4018, which is much greater than 0.05. As a result, we cannot reject the null hypothesis of no relationship between changes in health status (HS) and changes in living conditions (LC). In other words, the data do not provide significant evidence that changes in living conditions are associated with changes in health status within this dataset.
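The slope test can be checked by hand from the reported estimate and standard error. The sketch below recomputes the t statistic, two-sided p-value, and 95% confidence interval using only the rounded values printed in the `summary()` output, so small rounding differences from R's own output are expected.

```r
# Values copied from the summary(systematic_model) output (rounded as printed)
est <- 0.5021   # slope estimate
se  <- 0.5854   # standard error of the slope
df  <- 19       # residual degrees of freedom

t_stat <- est / se                             # t statistic for H0: B1 = 0
p_val  <- 2 * pt(-abs(t_stat), df)             # two-sided p-value
ci     <- est + c(-1, 1) * qt(0.975, df) * se  # 95% confidence interval

round(c(t = t_stat, p = p_val, lower = ci[1], upper = ci[2]), 4)

# In simple linear regression the overall F statistic equals t^2
t_stat^2
```

The confidence interval containing 0 and the p-value above 0.05 are two views of the same result.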

e-

village$hs[13] <- -5

library(ggplot2)

ggplot(data = village, aes(x=lc, y=hs)) + geom_point() +
  geom_smooth(method = "loess", color = "red", se = FALSE) +
  labs(x= "Living Condition", y="Health Status", title = "Scatter Plot of Changes in HS vs. LC")
`geom_smooth()` using formula = 'y ~ x'
plot(village$lc, village$hs, 
     xlab = "Changes in Living Conditions (LC)", 
     ylab = "Changes in Health Status (HS)",
     main = "Scatter Plot of Changes in HS vs. LC with LOWESS Line")
lines(lowess(village$lc, village$hs), col = "red")

Even after changing village #13’s HS value from 15 to −5, the LOWESS line in the new scatter plots still does not follow a straight path and shows a nonlinear pattern. A simple linear regression may therefore still not be appropriate for modeling this relationship.

systematic_model2 <- lm(data = village, hs~lc)
summary(systematic_model2)

Call:
lm(formula = hs ~ lc, data = village)

Residuals:
    Min      1Q  Median      3Q     Max 
-10.532  -2.212  -0.282   2.928   8.760 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)   0.9548     1.8960   0.504  0.62034   
lc            1.4686     0.5091   2.885  0.00949 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.182 on 19 degrees of freedom
Multiple R-squared:  0.3046,    Adjusted R-squared:  0.268 
F-statistic: 8.322 on 1 and 19 DF,  p-value: 0.00949
CI_3 <- confint(systematic_model2)
CI_3
                 2.5 %   97.5 %
(Intercept) -3.0135932 4.923202
lc           0.4030809 2.534053

95% confidence interval for the slope B1: [0.4031, 2.5341]

Null hypothesis: there is no linear relationship between changes in HS and changes in LC. In the model HS = B0 + B1(LC) + e, this corresponds to B1 = 0.

Since the new p-value (0.00949) is less than 0.05, we reject the null hypothesis of no relationship between HS and LC. After correcting the data, changes in living conditions are significantly associated with changes in health status.

f- The model changed so much after correcting the HS value for Village #13 because that village’s LC value was an outlier, which gives the point a strong influence on the fitted regression line. With the miscoded HS value of 15 the point pulled the slope toward 0 (0.50); with the corrected value of −5 the slope rises to 1.47, and the model better reflects the relationship between LC and HS.
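Leverage and influence can be quantified rather than judged by eye. The sketch below uses simulated data, not the village data: the 21st point is given an extreme x value and a miscoded response, and `hatvalues()` and `cooks.distance()` are used to flag it. All variable names are illustrative.

```r
# Simulated data (hypothetical): point 21 has an extreme x value
set.seed(2)
x <- c(rnorm(20, mean = 3, sd = 1), 9)
y <- 1 + 1.5 * x + rnorm(21, sd = 2)

# Miscode the response of point 21 to mimic a data-entry error
y_bad <- y
y_bad[21] <- y[21] + 20

fit_ok  <- lm(y ~ x)       # fit with the correct value
fit_bad <- lm(y_bad ~ x)   # fit with the miscoded value

# Leverage depends only on x; the extreme-x point has the largest hat value
h <- hatvalues(fit_bad)
which.max(h)

# Cook's distance combines leverage and residual size
round(cooks.distance(fit_bad), 3)

# A single high-leverage point is enough to shift the slope
c(slope_ok = unname(coef(fit_ok)["x"]), slope_bad = unname(coef(fit_bad)["x"]))
```

Applying `hatvalues()` and `cooks.distance()` to the fitted village model would show whether Village #13 stands out in the same way.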

g-

systematic_model2 <- lm(data = village, hs~lc)

par(mfrow = c(1, 2))


plot(fitted(systematic_model2), resid(systematic_model2), 
     main = "Residuals vs Fitted Values",
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, col = "red")


plot(fitted(systematic_model2), sqrt(abs(resid(systematic_model2))), 
     main = "Scale-Location Plot",
     xlab = "Fitted values", ylab = "Sqrt(|Residuals|)")
abline(h = 0, col = "red")
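Scale-location plots are conventionally drawn with standardized residuals, which adjust each raw residual for its leverage, and with a smooth line to show any trend in spread. The sketch below shows that variant on simulated data (not the village data), as one possible alternative to the raw-residual version.

```r
# Simulated data (hypothetical), same size as the village data
set.seed(3)
x <- runif(21, 0, 8)
y <- 1 + 1.5 * x + rnorm(21, sd = 5)
fit <- lm(y ~ x)

# Standardized residuals: raw residual / (sigma * sqrt(1 - leverage))
r_std  <- rstandard(fit)
manual <- resid(fit) / (summary(fit)$sigma * sqrt(1 - hatvalues(fit)))
all.equal(unname(r_std), unname(manual))

# Scale-location plot with a LOWESS line; a roughly flat line suggests
# constant residual variance
plot(fitted(fit), sqrt(abs(r_std)),
     main = "Scale-Location Plot",
     xlab = "Fitted values", ylab = "Sqrt(|Standardized residuals|)")
lines(lowess(fitted(fit), sqrt(abs(r_std))), col = "red")
```

A horizontal reference line at 0 is uninformative on this plot because the plotted values are never negative; the smooth line is what carries the information.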

library(readr)
baby <- read_csv("~/Documents/UW/PHARM 656/6th Assignment/baby.csv")
Rows: 200 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): MODE
dbl (2): time, hosp

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(baby)

Question: does the average time (TIME) from birth to the first breastfeeding differ by the mode of delivery (SVD = Spontaneous Vaginal Delivery, IVD = Instrumentally-assisted Vaginal Delivery, and CS = Cesarean Section)? Here y = TIME and x = mode of delivery (SVD, IVD, CS).

a-

ggplot(baby, aes(x = factor(hosp), y = time, fill = factor(MODE) )) +
  geom_boxplot() +
  labs(x = "Hospital", y = "Time from Birth to First Breastfeeding", fill = "Mode of Delivery") +
  ggtitle("Time from Birth to First Breastfeeding by Mode of Delivery and Hospital") +
  theme_minimal()

The boxplots suggest an interaction between mode of delivery and hospital on the time to first breastfeeding: the differences between delivery modes do not look the same in every hospital.

b-

Baby_lm <- lm(data = baby, time~hosp*MODE)
summary(Baby_lm)

Call:
lm(formula = time ~ hosp * MODE, data = baby)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.6444 -1.4755 -0.3875  1.4468  7.9668 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   11.4750     0.7513  15.273  < 2e-16 ***
hosp          -2.2264     0.2581  -8.625 2.27e-15 ***
MODEIVD       -4.4353     1.0502  -4.223 3.70e-05 ***
MODESVD       -3.7635     0.9425  -3.993 9.24e-05 ***
hosp:MODEIVD   0.9730     0.3814   2.551   0.0115 *  
hosp:MODESVD   0.7256     0.3329   2.180   0.0305 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.219 on 194 degrees of freedom
Multiple R-squared:  0.4581,    Adjusted R-squared:  0.4442 
F-statistic:  32.8 on 5 and 194 DF,  p-value: < 2.2e-16

The p-values for the interaction terms are: hosp:MODEIVD = 0.0115, hosp:MODESVD = 0.0305.

Both values are below 0.05, so the null hypothesis of no interaction is rejected: there is a significant interaction between hospital and mode of delivery on the time to first breastfeeding. This means that the effect of mode of delivery on time to first breastfeeding varies by hospital. Note that hosp enters this model as a numeric variable, so each interaction term measures how the linear trend across hospital codes differs between that mode and the reference mode (CS).
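When an interaction involves several coefficients, a single F-test comparing the nested models is a common way to test all of them jointly. The sketch below uses simulated data with hospital coded as a factor; treating hospital as categorical is an assumption about the intended analysis, and the data frame `sim` is hypothetical.

```r
# Simulated, balanced data (hypothetical): 4 hospitals x 3 modes x 15 births
set.seed(4)
sim <- expand.grid(
  hosp = factor(1:4),
  MODE = factor(c("CS", "IVD", "SVD")),
  rep  = 1:15
)
sim$time <- 8 - 2 * (sim$MODE != "CS") - 3 * (sim$hosp == "4") +
  rnorm(nrow(sim), sd = 2)

# Nested models: main effects only vs. main effects plus interaction
fit_add <- lm(time ~ hosp + MODE, data = sim)
fit_int <- lm(time ~ hosp * MODE, data = sim)

# Joint F-test of all interaction terms: (4 - 1) * (3 - 1) = 6 df
cmp <- anova(fit_add, fit_int)
cmp
```

The `Pr(>F)` entry in the second row is the p-value for the null hypothesis that every interaction coefficient is 0.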

c-

Baby_lm2 <- lm(data = baby, time~hosp+MODE)
summary(Baby_lm2)

Call:
lm(formula = time ~ hosp + MODE, data = baby)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.8722 -1.3608 -0.5212  1.3758  7.7727 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   9.9544     0.4899  20.318  < 2e-16 ***
hosp         -1.6548     0.1429 -11.579  < 2e-16 ***
MODEIVD      -1.9604     0.4389  -4.467 1.34e-05 ***
MODESVD      -1.8612     0.3873  -4.806 3.06e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.25 on 196 degrees of freedom
Multiple R-squared:  0.4372,    Adjusted R-squared:  0.4286 
F-statistic: 50.76 on 3 and 196 DF,  p-value: < 2.2e-16

This model shows the independent (main) effects of hosp and mode of delivery. The p-values are as follows: hosp < 2e-16, MODEIVD = 1.34e-05, MODESVD = 3.06e-06.

Based on this model, both hosp and MODE are significantly associated with time. Each one-unit increase in the hospital code is associated with a time that is 1.65 shorter, and both IVD and SVD are associated with significantly shorter times than the reference mode, CS (by 1.96 and 1.86, respectively).

d-

baby$BFH <- ifelse(baby$hosp == 4, 1, 0)

baby$MODE <- factor(baby$MODE, levels = c("CS", "IVD", "SVD"))

interaction_model <- lm(time ~ MODE * BFH, data = baby )

summary(interaction_model)

Call:
lm(formula = time ~ MODE * BFH, data = baby)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.0246 -1.1065 -0.0610  0.8418  5.7170 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   7.7564     0.3095  25.062  < 2e-16 ***
MODEIVD      -2.8786     0.4168  -6.906 6.94e-11 ***
MODESVD      -2.6729     0.3773  -7.085 2.51e-11 ***
BFH          -6.4908     0.5311 -12.222  < 2e-16 ***
MODEIVD:BFH   2.1560     0.8338   2.586   0.0104 *  
MODESVD:BFH   1.8245     0.6935   2.631   0.0092 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.831 on 194 degrees of freedom
Multiple R-squared:  0.631, Adjusted R-squared:  0.6215 
F-statistic: 66.36 on 5 and 194 DF,  p-value: < 2.2e-16

Both interaction p-values (0.0104 and 0.0092) are less than 0.05, so the null hypothesis of no interaction is rejected: there is a significant interaction between the BFH factor (whether the hospital is baby friendly) and mode of delivery on time to first breastfeeding. We therefore retain the interaction model rather than reducing to a main-effects-only model.

Final model: Interaction Model

Analysis of this model

Differences that combine several coefficients need a standard error that accounts for the covariance between them. Because this model fits one mean per MODE-by-BFH group, the intervals for those differences are computed here as estimate ± t × 1.831 × sqrt(1/n1 + 1/n2), with the group sizes implied by the coefficient standard errors, and are therefore approximate.

For non-baby-friendly hospitals:

SVD vs. CS:
- Estimated difference: −2.67 minutes
- 95% CI: [−3.42, −1.93]

In non-baby-friendly hospitals, spontaneous vaginal delivery (SVD) is associated with a shorter time to first breastfeeding than cesarean section (CS), by approximately 2.67 minutes.

SVD vs. IVD:
- Estimated difference: 0.21 minutes (−2.67 − (−2.88))
- 95% CI: approximately [−0.49, 0.90]

In non-baby-friendly hospitals, SVD is associated with a slightly longer time to first breastfeeding than instrumentally-assisted vaginal delivery (IVD), but the difference is small (0.21 minutes) and not statistically significant.

For baby-friendly hospitals:

SVD vs. CS:
- Estimated difference: −0.85 minutes (−2.67 + 1.82)
- 95% CI: approximately [−2.00, 0.30]

In baby-friendly hospitals, the difference in time to first breastfeeding between SVD and CS is smaller and is not statistically significant, suggesting that baby friendliness may reduce the differences between modes of delivery.

SVD vs. IVD:
- Estimated difference: −0.13 minutes
- 95% CI: approximately [−1.50, 1.25]

In baby-friendly hospitals, the time difference between SVD and IVD is the smallest of the four comparisons and is not statistically significant.

In conclusion, the analysis suggests that in non-baby-friendly hospitals the mode of delivery has a significant impact on breastfeeding initiation time, driven by the longer times after CS. In baby-friendly hospitals, however, the differences in breastfeeding initiation time among delivery modes are not statistically significant, suggesting that these hospitals can equalize breastfeeding initiation times regardless of the delivery method.
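Group comparisons in an interaction model are linear combinations of coefficients, so their confidence intervals need the full coefficient covariance matrix from `vcov()`. The sketch below shows one way to compute them; `contrast_ci()` is a hypothetical helper written for this illustration, and the data are simulated rather than the baby data.

```r
# Simulated data (hypothetical): 3 modes x 2 hospital types x 10 births
set.seed(5)
sim <- expand.grid(
  MODE = factor(c("CS", "IVD", "SVD"), levels = c("CS", "IVD", "SVD")),
  BFH  = c(0, 1),
  rep  = 1:10
)
sim$time <- 8 - 2.5 * (sim$MODE != "CS") - 5 * sim$BFH + rnorm(nrow(sim), sd = 2)

fit <- lm(time ~ MODE * BFH, data = sim)

# Hypothetical helper: estimate and CI for the linear combination w' * beta
contrast_ci <- function(fit, w, level = 0.95) {
  est   <- sum(w * coef(fit))
  se    <- sqrt(as.numeric(t(w) %*% vcov(fit) %*% w))
  tcrit <- qt(1 - (1 - level) / 2, df.residual(fit))
  c(estimate = est, lower = est - tcrit * se, upper = est + tcrit * se)
}

# Coefficient order: (Intercept), MODEIVD, MODESVD, BFH, MODEIVD:BFH, MODESVD:BFH
w_svd_ivd_non <- c(0, -1, 1, 0,  0, 0)  # SVD vs IVD, BFH = 0
w_svd_cs_bfh  <- c(0,  0, 1, 0,  0, 1)  # SVD vs CS,  BFH = 1
w_svd_ivd_bfh <- c(0, -1, 1, 0, -1, 1)  # SVD vs IVD, BFH = 1

res <- rbind(
  "non-BFH: SVD vs IVD" = contrast_ci(fit, w_svd_ivd_non),
  "BFH: SVD vs CS"      = contrast_ci(fit, w_svd_cs_bfh),
  "BFH: SVD vs IVD"     = contrast_ci(fit, w_svd_ivd_bfh)
)
round(res, 2)
```

Each weight vector picks out the coefficients that differ between the two groups being compared; adding or subtracting the confidence limits of individual coefficients does not give a valid interval.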