| Anni.madre | N.gravidanze | Fumatrici | Gestazione | Peso | Lunghezza | Cranio | Tipo.parto | Ospedale | Sesso |
|---|---|---|---|---|---|---|---|---|---|
| 26 | 0 | 0 | 42 | 3380 | 490 | 325 | Nat | osp3 | M |
| 21 | 2 | 0 | 39 | 3150 | 490 | 345 | Nat | osp1 | F |
| 34 | 3 | 0 | 38 | 3640 | 500 | 375 | Nat | osp2 | M |
| 28 | 1 | 0 | 41 | 3690 | 515 | 365 | Nat | osp2 | M |
| 20 | 0 | 0 | 38 | 3700 | 480 | 335 | Nat | osp3 | F |
| 32 | 0 | 0 | 40 | 3200 | 495 | 340 | Nat | osp2 | F |

| Anni.madre | N.gravidanze | Fumatrici | Gestazione | Peso | Lunghezza | Cranio | Tipo.parto | Ospedale | Sesso |
|---|---|---|---|---|---|---|---|---|---|
| Min. : 0.00 | Min. : 0.0000 | Min. :0.0000 | Min. :25.00 | Min. : 830 | Min. :310.0 | Min. :235 | Ces: 728 | osp1:816 | F:1256 |
| 1st Qu.:25.00 | 1st Qu.: 0.0000 | 1st Qu.:0.0000 | 1st Qu.:38.00 | 1st Qu.:2990 | 1st Qu.:480.0 | 1st Qu.:330 | Nat:1772 | osp2:849 | M:1244 |
| Median :28.00 | Median : 1.0000 | Median :0.0000 | Median :39.00 | Median :3300 | Median :500.0 | Median :340 |  | osp3:835 |  |
| Mean :28.16 | Mean : 0.9812 | Mean :0.0416 | Mean :38.98 | Mean :3284 | Mean :494.7 | Mean :340 |  |  |  |
| 3rd Qu.:32.00 | 3rd Qu.: 1.0000 | 3rd Qu.:0.0000 | 3rd Qu.:40.00 | 3rd Qu.:3620 | 3rd Qu.:510.0 | 3rd Qu.:350 |  |  |  |
| Max. :46.00 | Max. :12.0000 | Max. :1.0000 | Max. :43.00 | Max. :4930 | Max. :565.0 | Max. :390 |  |  |  |

##
## Shapiro-Wilk normality test
##
## data: Peso
## W = 0.97066, p-value < 2.2e-16
The Shapiro-Wilk test on the Peso variable rejects the null hypothesis that the sample values are normally distributed (p-value < 2.2e-16).
Identified potential non-linearities:
Peso-Lunghezza
Peso-Cranio
Peso-Gestazione
##
## Welch Two Sample t-test
##
## data: Peso by Sesso
## t = -12.106, df = 2490.7, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group F and group M is not equal to 0
## 95 percent confidence interval:
## -287.1051 -207.0615
## sample estimates:
## mean in group F mean in group M
## 3161.132 3408.215
## Df Sum Sq Mean Sq F value Pr(>F)
## Ospedale 2 936237 468118 1.699 0.183
## Residuals 2497 687952305 275512
##
## Welch Two Sample t-test
##
## data: Peso by Tipo.parto
## t = -0.12968, df = 1493, p-value = 0.8968
## alternative hypothesis: true difference in means between group Ces and group Nat is not equal to 0
## 95 percent confidence interval:
## -46.27992 40.54037
## sample estimates:
## mean in group Ces mean in group Nat
## 3282.047 3284.916
- t-test Peso vs Sesso:
p-value very small (< 2.2e-16) -> the difference between the mean weights of the two sexes is significant -> Sesso will have to be kept as a control variable (best practice in medical analysis).
- ANOVA Peso vs Ospedale and t-test Peso vs Tipo.parto:
p-value high (0.183 and 0.8968) -> the differences between the mean weights are not significant -> taken on their own, these variables are candidates for removal from the model.
The significance level (alpha) is always set to 0.05 (5%).
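
To see how the confidence interval printed for Peso by Sesso relates to the t statistic, the figures can be re-derived by hand. The Python sketch below (the report itself was produced in R) recovers the standard error from the printed group means and t value, then rebuilds an approximate 95% interval. It assumes the normal quantile 1.96 in place of the exact t quantile, which is a close approximation at about 2490 degrees of freedom.

```python
# Figures copied from the Welch t-test output for Peso by Sesso.
mean_f = 3161.132
mean_m = 3408.215
t_stat = -12.106

# Difference in means (F minus M) and the standard error implied by t = diff / se.
diff = mean_f - mean_m
std_err = diff / t_stat

# Assumption: 1.96 (normal quantile) approximates the t quantile at df ~ 2490.
z_crit = 1.96
ci_low = diff - z_crit * std_err
ci_high = diff + z_crit * std_err

print(f"difference = {diff:.3f} g, standard error = {std_err:.2f} g")
print(f"approximate 95% CI = ({ci_low:.1f}, {ci_high:.1f})")
```

The rebuilt interval agrees with the printed one (-287.1051, -207.0615) to within a few hundredths of a gram; the small gap comes from the normal approximation.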
H0: The proportion of cesarean deliveries is the same across hospitals.
H1: At least one hospital differs in its cesarean rate.
##
## Pearson's Chi-squared test
##
## data: tab_osp_parto
## X-squared = 1.083, df = 2, p-value = 0.5819
The p-value (0.5819) is > 0.05, so there is no statistical evidence to reject the equality of the proportions of cesarean sections. In other words, on the available data the three hospitals do not show significant differences in the frequency of cesarean sections.
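
The p-value of this test can be checked without statistical software, because with 2 degrees of freedom the chi-squared upper tail has a simple closed form. The Python sketch below (not the R code used in the report) applies that closed form to the printed statistic; the shortcut is specific to df = 2 and would not hold for other table sizes.

```python
import math

# Test statistic and degrees of freedom from the Pearson chi-squared output.
x_squared = 1.083
df = 2

# For df = 2 the chi-squared survival function is exp(-x / 2).
# Other degrees of freedom need the incomplete gamma function instead.
if df != 2:
    raise ValueError("closed form only valid for df = 2")
p_value = math.exp(-x_squared / 2)

alpha = 0.05  # significance level used throughout the report
print(f"p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```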
H0: The population mean equals the reference value (3200 g for Peso, 50 cm for Lunghezza).
H1: The population mean differs from the reference value.
##
## One Sample t-test
##
## data: neonati_clean$Peso
## t = 8.0108, df = 2497, p-value = 1.731e-15
## alternative hypothesis: true mean is not equal to 3200
## 95 percent confidence interval:
## 3263.577 3304.791
## sample estimates:
## mean of x
## 3284.184
##
## One Sample t-test
##
## data: neonati_clean$Lunghezza
## t = 844.17, df = 2497, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 50
## 95 percent confidence interval:
## 493.6628 495.7287
## sample estimates:
## mean of x
## 494.6958
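The p-value is < 0.05 in both tests, so the sample means differ significantly from the hypothesized values. For Peso, the sample mean (3284 g) is above the reference value of 3200 g, with possible clinical implications. For Lunghezza, note that the variable is recorded in millimetres (sample mean 494.7) while the test uses mu = 50; the 50 cm reference corresponds to mu = 500, so the statistic t = 844.17 reflects a unit mismatch rather than a clinical difference.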
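
Lunghezza is stored in millimetres (sample mean 494.6958), whereas the one-sample test above was run against mu = 50. The Python sketch below (the report itself uses R) estimates what the statistic would be against 500 mm, using only the printed figures. It is a back-of-the-envelope check, assuming the 50 cm reference is meant to be 500 mm, not a re-run of the test on the data.

```python
# Figures copied from the one-sample t-test on Lunghezza.
sample_mean = 494.6958   # millimetres
mu_used = 50             # value the test above was run against
t_reported = 844.17

# Standard error implied by the reported statistic: t = (mean - mu) / se.
std_err = (sample_mean - mu_used) / t_reported

# Assumption: the 50 cm reference expressed in millimetres, i.e. mu = 500.
mu_mm = 500
t_rescaled = (sample_mean - mu_mm) / std_err

print(f"standard error = {std_err:.4f} mm")
print(f"t against 500 mm = {t_rescaled:.2f}")
```

With the units aligned, the statistic is about -10.07: the sample mean is still significantly different from the reference, but it lies slightly below 500 mm rather than far above 50.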
H0: The mean (of Peso or Lunghezza) is the same for M and F.
H1: The mean differs between M and F.
##
## Welch Two Sample t-test
##
## data: Peso by Sesso
## t = -12.115, df = 2488.7, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group F and group M is not equal to 0
## 95 percent confidence interval:
## -287.4841 -207.3844
## sample estimates:
## mean in group F mean in group M
## 3161.061 3408.496
##
## Welch Two Sample t-test
##
## data: Lunghezza by Sesso
## t = -9.5823, df = 2457.3, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group F and group M is not equal to 0
## 95 percent confidence interval:
## -11.939001 -7.882672
## sample estimates:
## mean in group F mean in group M
## 489.7641 499.6750
The p-value is < 0.05 in both tests, so there is a statistically significant difference in weight and length between males and females, in line with the many studies documenting slight anthropometric differences by sex at birth.
##
## Call:
## lm(formula = Peso ~ ., data = neonati_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1123.26 -181.53 -14.45 161.05 2611.89
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6735.7960 141.4790 -47.610 < 2e-16 ***
## Anni.madre 0.8018 1.1467 0.699 0.4845
## N.gravidanze 11.3812 4.6686 2.438 0.0148 *
## Fumatrici -30.2741 27.5492 -1.099 0.2719
## Gestazione 32.5773 3.8208 8.526 < 2e-16 ***
## Lunghezza 10.2922 0.3009 34.207 < 2e-16 ***
## Cranio 10.4722 0.4263 24.567 < 2e-16 ***
## Tipo.partoNat 29.6335 12.0905 2.451 0.0143 *
## Ospedaleosp2 -11.0912 13.4471 -0.825 0.4096
## Ospedaleosp3 28.2495 13.5054 2.092 0.0366 *
## SessoM 77.5723 11.1865 6.934 5.18e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 274 on 2487 degrees of freedom
## Multiple R-squared: 0.7289, Adjusted R-squared: 0.7278
## F-statistic: 668.7 on 10 and 2487 DF, p-value: < 2.2e-16
Adjusted R-squared: 0.7278
As hypothesized, the estimated effect of smoking on infant weight is negative; however, the high p-value (0.2719 > 0.05) means that this effect is not statistically significant.
The variable Anni.madre is not significant (p-value 0.4845); however, it will be retained because it may prove useful for data outside the dataset.
Explanation of the model coefficients (each effect holds the other variables constant):
For each additional year of Anni.madre, the weight increases by 0.8018 grams
For each additional unit of N.gravidanze, the weight increases by 11.3812 grams
For smoking mothers (Fumatrici = 1), the weight decreases by 30.2741 grams compared to non-smokers
For each additional week of Gestazione, the weight increases by 32.5773 grams
For each additional millimetre of Lunghezza, the weight increases by 10.2922 grams
For each additional millimetre of Cranio, the weight increases by 10.4722 grams
Compared to the baseline value (Tipo.parto = Ces), Tipo.parto = Nat implies an increase in weight of 29.6335 grams
Compared to the baseline value (Ospedale = osp1), Ospedale = osp2 implies a decrease in weight of 11.0912 grams
Compared to the baseline value (Ospedale = osp1), Ospedale = osp3 implies an increase in weight of 28.2495 grams
Compared to the baseline value (Sesso = F), Sesso = M implies an increase in weight of 77.5723 grams
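
The coefficient readings above can be combined into a single prediction. The Python sketch below (not the R code behind the report) applies the full-model coefficients to one hypothetical newborn. The input values are illustrative assumptions chosen near the sample medians, not a row of the dataset, and the factor levels are encoded as 0/1 dummies with Ces, osp1 and F as baselines.

```python
# Coefficients copied from the summary of lm(Peso ~ ., data = neonati_clean).
coef = {
    "(Intercept)": -6735.7960,
    "Anni.madre": 0.8018,
    "N.gravidanze": 11.3812,
    "Fumatrici": -30.2741,
    "Gestazione": 32.5773,
    "Lunghezza": 10.2922,
    "Cranio": 10.4722,
    "Tipo.partoNat": 29.6335,
    "Ospedaleosp2": -11.0912,
    "Ospedaleosp3": 28.2495,
    "SessoM": 77.5723,
}

# Hypothetical newborn (illustrative values only).
newborn = {
    "(Intercept)": 1,
    "Anni.madre": 28,      # years
    "N.gravidanze": 1,
    "Fumatrici": 0,        # non-smoking mother
    "Gestazione": 39,      # weeks
    "Lunghezza": 500,      # millimetres
    "Cranio": 340,         # millimetres
    "Tipo.partoNat": 1,    # natural delivery
    "Ospedaleosp2": 0,     # osp1 is the baseline
    "Ospedaleosp3": 0,
    "SessoM": 0,           # female
}

# Linear predictor: sum of coefficient times input value.
predicted = sum(coef[name] * newborn[name] for name in coef)
print(f"Predicted weight: {predicted:.1f} g")
```

Setting `SessoM` to 1 with everything else unchanged raises the prediction by exactly the Sesso coefficient, 77.5723 g.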
##
## Call:
## lm(formula = Peso ~ Anni.madre + N.gravidanze + Fumatrici + Gestazione +
## Lunghezza + Cranio + Tipo.parto + Sesso, data = neonati_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1140.10 -181.96 -14.86 160.30 2629.68
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6738.3356 141.5660 -47.599 < 2e-16 ***
## Anni.madre 0.8681 1.1479 0.756 0.4496
## N.gravidanze 11.6900 4.6733 2.501 0.0124 *
## Fumatrici -31.7061 27.5836 -1.149 0.2505
## Gestazione 32.8963 3.8248 8.601 < 2e-16 ***
## Lunghezza 10.2691 0.3012 34.098 < 2e-16 ***
## Cranio 10.4850 0.4268 24.564 < 2e-16 ***
## Tipo.partoNat 30.3855 12.1052 2.510 0.0121 *
## SessoM 78.0234 11.2013 6.966 4.17e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 274.4 on 2489 degrees of freedom
## Multiple R-squared: 0.7279, Adjusted R-squared: 0.727
## F-statistic: 832.3 on 8 and 2489 DF, p-value: < 2.2e-16
## Analysis of Variance Table
##
## Model 1: Peso ~ Anni.madre + N.gravidanze + Fumatrici + Gestazione + Lunghezza +
## Cranio + Tipo.parto + Sesso
## Model 2: Peso ~ Anni.madre + N.gravidanze + Fumatrici + Gestazione + Lunghezza +
## Cranio + Tipo.parto + Ospedale + Sesso
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 2489 187430749
## 2 2487 186743194 2 687555 4.5783 0.01036 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Call:
## lm(formula = Peso ~ Anni.madre + N.gravidanze + Gestazione +
## Lunghezza + Cranio + Tipo.parto + Sesso, data = neonati_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1139.50 -181.60 -14.59 160.14 2633.16
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6737.9269 141.5747 -47.593 < 2e-16 ***
## Anni.madre 0.8793 1.1479 0.766 0.4438
## N.gravidanze 11.4176 4.6676 2.446 0.0145 *
## Gestazione 32.6300 3.8180 8.546 < 2e-16 ***
## Lunghezza 10.2839 0.3009 34.176 < 2e-16 ***
## Cranio 10.4896 0.4268 24.574 < 2e-16 ***
## Tipo.partoNat 30.1222 12.1038 2.489 0.0129 *
## SessoM 77.8374 11.2008 6.949 4.67e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 274.4 on 2490 degrees of freedom
## Multiple R-squared: 0.7278, Adjusted R-squared: 0.727
## F-statistic: 950.9 on 7 and 2490 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = Peso ~ Anni.madre + Gestazione + Lunghezza + Cranio +
## Tipo.parto + Sesso, data = neonati_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1143.29 -184.07 -15.52 161.15 2617.47
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6748.7308 141.6472 -47.645 < 2e-16 ***
## Anni.madre 1.9144 1.0681 1.792 0.0732 .
## Gestazione 32.1537 3.8168 8.424 < 2e-16 ***
## Lunghezza 10.2496 0.3009 34.065 < 2e-16 ***
## Cranio 10.5733 0.4259 24.826 < 2e-16 ***
## Tipo.partoNat 29.3644 12.1120 2.424 0.0154 *
## SessoM 78.6331 11.2073 7.016 2.93e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 274.7 on 2491 degrees of freedom
## Multiple R-squared: 0.7271, Adjusted R-squared: 0.7264
## F-statistic: 1106 on 6 and 2491 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = Peso ~ Anni.madre + Gestazione + Lunghezza + Cranio +
## Sesso, data = neonati_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1163.03 -184.20 -14.07 163.24 2618.69
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6723.2290 141.3943 -47.550 < 2e-16 ***
## Anni.madre 1.8995 1.0692 1.777 0.0758 .
## Gestazione 32.2256 3.8205 8.435 < 2e-16 ***
## Lunghezza 10.2137 0.3008 33.954 < 2e-16 ***
## Cranio 10.6047 0.4261 24.887 < 2e-16 ***
## SessoM 78.6738 11.2182 7.013 2.99e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 275 on 2492 degrees of freedom
## Multiple R-squared: 0.7265, Adjusted R-squared: 0.7259
## F-statistic: 1324 on 5 and 2492 DF, p-value: < 2.2e-16
## Start: AIC=28054.55
## Peso ~ Anni.madre + N.gravidanze + Fumatrici + Gestazione + Lunghezza +
## Cranio + Tipo.parto + Ospedale + Sesso
##
## Df Sum of Sq RSS AIC
## - Anni.madre 1 36710 186779904 28053
## - Fumatrici 1 90677 186833870 28054
## <none> 186743194 28055
## - N.gravidanze 1 446244 187189438 28058
## - Tipo.parto 1 451073 187194266 28059
## - Ospedale 2 687555 187430749 28060
## - Sesso 1 3610705 190353899 28100
## - Gestazione 1 5458852 192202046 28124
## - Cranio 1 45318506 232061700 28595
## - Lunghezza 1 87861708 274604902 29016
##
## Step: AIC=28053.05
## Peso ~ N.gravidanze + Fumatrici + Gestazione + Lunghezza + Cranio +
## Tipo.parto + Ospedale + Sesso
##
## Df Sum of Sq RSS AIC
## - Fumatrici 1 91599 186871503 28052
## <none> 186779904 28053
## + Anni.madre 1 36710 186743194 28055
## - Tipo.parto 1 452049 187231953 28057
## - Ospedale 2 693914 187473818 28058
## - N.gravidanze 1 631082 187410986 28060
## - Sesso 1 3617809 190397713 28099
## - Gestazione 1 5424800 192204704 28123
## - Cranio 1 45569477 232349381 28596
## - Lunghezza 1 87852027 274631931 29014
##
## Step: AIC=28052.27
## Peso ~ N.gravidanze + Gestazione + Lunghezza + Cranio + Tipo.parto +
## Ospedale + Sesso
##
## Df Sum of Sq RSS AIC
## <none> 186871503 28052
## + Fumatrici 1 91599 186779904 28053
## + Anni.madre 1 37633 186833870 28054
## - Tipo.parto 1 444404 187315907 28056
## - Ospedale 2 702925 187574428 28058
## - N.gravidanze 1 608136 187479640 28058
## - Sesso 1 3601860 190473363 28098
## - Gestazione 1 5358199 192229702 28121
## - Cranio 1 45613331 232484834 28596
## - Lunghezza 1 88259386 275130889 29017
##
## Call:
## lm(formula = Peso ~ N.gravidanze + Gestazione + Lunghezza + Cranio +
## Tipo.parto + Ospedale + Sesso, data = neonati_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1113.07 -181.71 -16.66 161.08 2619.57
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6707.9252 136.0257 -49.314 < 2e-16 ***
## N.gravidanze 12.3360 4.3344 2.846 0.00446 **
## Gestazione 32.0386 3.7925 8.448 < 2e-16 ***
## Lunghezza 10.3059 0.3006 34.286 < 2e-16 ***
## Cranio 10.4920 0.4257 24.648 < 2e-16 ***
## Tipo.partoNat 29.4080 12.0875 2.433 0.01505 *
## Ospedaleosp2 -10.8939 13.4447 -0.810 0.41786
## Ospedaleosp3 28.7917 13.4969 2.133 0.03301 *
## SessoM 77.4657 11.1842 6.926 5.48e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 274 on 2489 degrees of freedom
## Multiple R-squared: 0.7287, Adjusted R-squared: 0.7278
## F-statistic: 835.7 on 8 and 2489 DF, p-value: < 2.2e-16
## Start: AIC=28118.62
## Peso ~ Anni.madre + N.gravidanze + Fumatrici + Gestazione + Lunghezza +
## Cranio + Tipo.parto + Ospedale + Sesso
##
## Df Sum of Sq RSS AIC
## - Anni.madre 1 36710 186779904 28111
## - Fumatrici 1 90677 186833870 28112
## - Ospedale 2 687555 187430749 28112
## - N.gravidanze 1 446244 187189438 28117
## - Tipo.parto 1 451073 187194266 28117
## <none> 186743194 28119
## - Sesso 1 3610705 190353899 28159
## - Gestazione 1 5458852 192202046 28183
## - Cranio 1 45318506 232061700 28654
## - Lunghezza 1 87861708 274604902 29074
##
## Step: AIC=28111.29
## Peso ~ N.gravidanze + Fumatrici + Gestazione + Lunghezza + Cranio +
## Tipo.parto + Ospedale + Sesso
##
## Df Sum of Sq RSS AIC
## - Fumatrici 1 91599 186871503 28105
## - Ospedale 2 693914 187473818 28105
## - Tipo.parto 1 452049 187231953 28110
## <none> 186779904 28111
## - N.gravidanze 1 631082 187410986 28112
## + Anni.madre 1 36710 186743194 28119
## - Sesso 1 3617809 190397713 28151
## - Gestazione 1 5424800 192204704 28175
## - Cranio 1 45569477 232349381 28649
## - Lunghezza 1 87852027 274631931 29066
##
## Step: AIC=28104.69
## Peso ~ N.gravidanze + Gestazione + Lunghezza + Cranio + Tipo.parto +
## Ospedale + Sesso
##
## Df Sum of Sq RSS AIC
## - Ospedale 2 702925 187574428 28098
## - Tipo.parto 1 444404 187315907 28103
## <none> 186871503 28105
## - N.gravidanze 1 608136 187479640 28105
## + Fumatrici 1 91599 186779904 28111
## + Anni.madre 1 37633 186833870 28112
## - Sesso 1 3601860 190473363 28145
## - Gestazione 1 5358199 192229702 28168
## - Cranio 1 45613331 232484834 28642
## - Lunghezza 1 88259386 275130889 29063
##
## Step: AIC=28098.42
## Peso ~ N.gravidanze + Gestazione + Lunghezza + Cranio + Tipo.parto +
## Sesso
##
## Df Sum of Sq RSS AIC
## - Tipo.parto 1 467626 188042054 28097
## <none> 187574428 28098
## - N.gravidanze 1 648873 188223301 28099
## + Ospedale 2 702925 186871503 28105
## + Fumatrici 1 100610 187473818 28105
## + Anni.madre 1 44184 187530244 28106
## - Sesso 1 3644818 191219246 28139
## - Gestazione 1 5457887 193032315 28162
## - Cranio 1 45747094 233321522 28636
## - Lunghezza 1 87955701 275530129 29051
##
## Step: AIC=28096.81
## Peso ~ N.gravidanze + Gestazione + Lunghezza + Cranio + Sesso
##
## Df Sum of Sq RSS AIC
## <none> 188042054 28097
## - N.gravidanze 1 621053 188663107 28097
## + Tipo.parto 1 467626 187574428 28098
## + Ospedale 2 726146 187315907 28103
## + Fumatrici 1 92548 187949505 28103
## + Anni.madre 1 45366 187996688 28104
## - Sesso 1 3650790 191692844 28137
## - Gestazione 1 5477493 193519547 28161
## - Cranio 1 46098547 234140601 28637
## - Lunghezza 1 87532691 275574744 29044
##
## Call:
## lm(formula = Peso ~ N.gravidanze + Gestazione + Lunghezza + Cranio +
## Sesso, data = neonati_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1149.37 -180.98 -15.57 163.69 2639.09
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6681.7251 135.8036 -49.201 < 2e-16 ***
## N.gravidanze 12.4554 4.3416 2.869 0.00415 **
## Gestazione 32.3827 3.8008 8.520 < 2e-16 ***
## Lunghezza 10.2455 0.3008 34.059 < 2e-16 ***
## Cranio 10.5410 0.4265 24.717 < 2e-16 ***
## SessoM 77.9807 11.2111 6.956 4.47e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 274.7 on 2492 degrees of freedom
## Multiple R-squared: 0.727, Adjusted R-squared: 0.7265
## F-statistic: 1327 on 5 and 2492 DF, p-value: < 2.2e-16
Adjusted R-squared by model:
Model 1 = 0.7278, Model 2 = 0.727, Model 3 = 0.727, Model 4 = 0.7264, Model 5 = 0.7259, Model AIC = 0.7278, Model BIC = 0.7265
## df BIC
## mod1 12 35215.45
## mod2 10 35208.98
## mod3 9 35202.49
## mod4 8 35200.66
## mod5 7 35198.72
## mod_aic 10 35201.52
## mod_bic 7 35193.65
## N.gravidanze Gestazione Lunghezza Cranio Sesso
## 1.023462 1.669779 2.075747 1.624568 1.040184
BIC:
The model obtained using BIC has the lowest BIC value: 35193.65.
VIF:
All values lie between 1.02 and 2.08, well below the threshold (5), so there is no danger of multicollinearity.
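
The BIC reported for mod_bic can be reproduced from quantities already printed in the report. The Python sketch below (a hand check, not the R call that produced the table) uses the Gaussian log-likelihood of a linear model, with the sample size taken as the residual degrees of freedom plus the number of coefficients and the RSS read from the last stepwise table.

```python
import math

# Values copied from the report for mod_bic.
n = 2498          # residual df (2492) plus 6 estimated coefficients
rss = 188042054   # residual sum of squares from the last stepwise table
k = 7             # 6 coefficients plus the error variance ("df" column)

# Gaussian log-likelihood evaluated at the maximum-likelihood variance rss / n.
log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)

# BIC = -2 * logLik + k * log(n)
bic = -2 * log_lik + k * math.log(n)
print(f"BIC = {bic:.2f}")
```

The penalty term `k * math.log(n)` is about 7.82 per parameter here, which is why the BIC-driven selection ends with fewer terms than the AIC-driven one (penalty 2 per parameter).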
##
## Call:
## lm(formula = Peso ~ N.gravidanze + Gestazione + Lunghezza + Cranio +
## Sesso + I(Lunghezza^2), data = neonati_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1169.62 -181.77 -12.79 163.77 1786.03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 212.288548 723.852095 0.293 0.769336
## N.gravidanze 14.085464 4.266175 3.302 0.000975 ***
## Gestazione 42.551398 3.876629 10.976 < 2e-16 ***
## Lunghezza -20.267001 3.162718 -6.408 1.76e-10 ***
## Cranio 10.651783 0.418894 25.428 < 2e-16 ***
## SessoM 69.968733 11.038797 6.338 2.75e-10 ***
## I(Lunghezza^2) 0.031655 0.003267 9.690 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 269.7 on 2491 degrees of freedom
## Multiple R-squared: 0.7369, Adjusted R-squared: 0.7363
## F-statistic: 1163 on 6 and 2491 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = Peso ~ N.gravidanze + Gestazione + Lunghezza + Cranio +
## Sesso + I(Gestazione^2), data = neonati_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1144.0 -181.5 -12.9 165.8 2661.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4646.7158 898.6322 -5.171 2.52e-07 ***
## N.gravidanze 12.5489 4.3381 2.893 0.00385 **
## Gestazione -81.2309 49.7402 -1.633 0.10257
## Lunghezza 10.3502 0.3040 34.045 < 2e-16 ***
## Cranio 10.6376 0.4282 24.843 < 2e-16 ***
## SessoM 75.7563 11.2435 6.738 1.99e-11 ***
## I(Gestazione^2) 1.5168 0.6621 2.291 0.02206 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 274.5 on 2491 degrees of freedom
## Multiple R-squared: 0.7276, Adjusted R-squared: 0.7269
## F-statistic: 1109 on 6 and 2491 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = Peso ~ N.gravidanze + Gestazione + Lunghezza + Cranio +
## Sesso + I(Cranio^2), data = neonati_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1138.6 -179.4 -14.8 163.4 2622.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 84.10118 1151.77280 0.073 0.94180
## N.gravidanze 12.76356 4.31259 2.960 0.00311 **
## Gestazione 38.90540 3.93291 9.892 < 2e-16 ***
## Lunghezza 10.48745 0.30157 34.776 < 2e-16 ***
## Cranio -31.79371 7.16973 -4.434 9.63e-06 ***
## SessoM 73.10236 11.16590 6.547 7.11e-11 ***
## I(Cranio^2) 0.06262 0.01059 5.915 3.77e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 272.8 on 2491 degrees of freedom
## Multiple R-squared: 0.7308, Adjusted R-squared: 0.7301
## F-statistic: 1127 on 6 and 2491 DF, p-value: < 2.2e-16
Adjusted R-squared by model: Model bic = 0.7265, Model bic_1 = 0.7363, Model bic_2 = 0.7269, Model bic_3 = 0.7301
## df BIC
## mod_bic 7 35193.65
## mod_bic_1 8 35109.04
## mod_bic_2 8 35196.21
## mod_bic_3 8 35166.63
## N.gravidanze Gestazione Lunghezza Cranio Sesso
## 1.025055 1.801815 238.007682 1.625780 1.046053
## I(Lunghezza^2)
## 230.033474
BIC:
Model_bic_1 has the lowest BIC value: 35109.04.
VIF:
There is multicollinearity between I(Lunghezza^2) and Lunghezza: their VIF values (230.0 and 238.0) are far above the threshold (5).
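
A VIF can be read as a statement about how well one predictor is explained by the others, since VIF = 1 / (1 - R²). The Python sketch below (not part of the original R analysis) inverts that relation for the printed values and flags those above the report's threshold of 5.

```python
# VIF values copied from the output for mod_bic_1.
vif = {
    "N.gravidanze": 1.025055,
    "Gestazione": 1.801815,
    "Lunghezza": 238.007682,
    "Cranio": 1.625780,
    "Sesso": 1.046053,
    "I(Lunghezza^2)": 230.033474,
}

threshold = 5  # rule-of-thumb cut-off used in this report

# VIF_j = 1 / (1 - R2_j), so R2_j = 1 - 1 / VIF_j.
implied_r2 = {name: 1 - 1 / value for name, value in vif.items()}
flagged = [name for name, value in vif.items() if value > threshold]

for name, value in vif.items():
    print(f"{name:15s} VIF = {value:8.2f}  implied R2 = {implied_r2[name]:.4f}")
print("above threshold:", flagged)
```

An implied R² above 0.99 for Lunghezza means the linear and squared terms carry almost the same information over the observed range, which is expected when a raw (uncentered) polynomial term is added.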
##
## Call:
## lm(formula = Peso ~ N.gravidanze + Gestazione + Cranio + Sesso +
## I(Lunghezza^2), data = neonati_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1161.44 -180.02 -11.17 165.90 2381.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.335e+03 1.442e+02 -30.059 < 2e-16 ***
## N.gravidanze 1.314e+01 4.298e+00 3.057 0.00226 **
## Gestazione 3.481e+01 3.713e+00 9.374 < 2e-16 ***
## Cranio 1.047e+01 4.212e-01 24.845 < 2e-16 ***
## SessoM 7.455e+01 1.110e+01 6.714 2.33e-11 ***
## I(Lunghezza^2) 1.081e-02 3.075e-04 35.160 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 271.9 on 2492 degrees of freedom
## Multiple R-squared: 0.7326, Adjusted R-squared: 0.7321
## F-statistic: 1365 on 5 and 2492 DF, p-value: < 2.2e-16
## df BIC
## mod_bic 7 35193.65
## mod_bic_1 8 35109.04
## mod_bic_1_1 7 35142.06
## N.gravidanze Gestazione Cranio Sesso I(Lunghezza^2)
## 1.023828 1.626672 1.617959 1.041656 2.006202
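BIC: The value has increased compared to mod_bic_1 (35142.06 vs 35109.04), but it is still lower than that of mod_bic (35193.65).
VIF: There is no longer any multicollinearity (the largest value is 2.01).
We will keep mod_bic_1_1.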
##
## Call:
## lm(formula = Peso ~ N.gravidanze + Gestazione + Lunghezza + Cranio +
## Sesso + Gestazione:Lunghezza, data = neonati_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1133.41 -179.98 -11.52 168.93 2652.65
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.991e+03 9.206e+02 -2.163 0.030631 *
## N.gravidanze 1.303e+01 4.321e+00 3.015 0.002594 **
## Gestazione -9.391e+01 2.481e+01 -3.785 0.000157 ***
## Lunghezza -8.476e-02 2.028e+00 -0.042 0.966661
## Cranio 1.076e+01 4.264e-01 25.234 < 2e-16 ***
## SessoM 7.225e+01 1.121e+01 6.445 1.38e-10 ***
## Gestazione:Lunghezza 2.729e-01 5.298e-02 5.151 2.79e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 273.3 on 2491 degrees of freedom
## Multiple R-squared: 0.7299, Adjusted R-squared: 0.7292
## F-statistic: 1122 on 6 and 2491 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = Peso ~ N.gravidanze + Gestazione + Lunghezza + Cranio +
## Sesso + Gestazione:Cranio, data = neonati_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1137.04 -181.47 -12.19 167.45 2695.30
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -187.95215 1106.93645 -0.170 0.86519
## N.gravidanze 13.12748 4.31382 3.043 0.00237 **
## Gestazione -140.78001 29.53978 -4.766 1.99e-06 ***
## Lunghezza 10.46687 0.30113 34.759 < 2e-16 ***
## Cranio -9.85430 3.47659 -2.834 0.00463 **
## SessoM 72.00219 11.18136 6.439 1.43e-10 ***
## Gestazione:Cranio 0.53389 0.09033 5.910 3.88e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 272.8 on 2491 degrees of freedom
## Multiple R-squared: 0.7308, Adjusted R-squared: 0.7301
## F-statistic: 1127 on 6 and 2491 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = Peso ~ N.gravidanze + Gestazione + Lunghezza + Cranio +
## Sesso + Lunghezza:Cranio, data = neonati_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1150.65 -180.93 -13.48 165.99 2865.46
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.803e+03 1.018e+03 -1.771 0.0767 .
## N.gravidanze 1.293e+01 4.323e+00 2.991 0.0028 **
## Gestazione 3.815e+01 3.967e+00 9.616 < 2e-16 ***
## Lunghezza -3.060e-01 2.203e+00 -0.139 0.8895
## Cranio -4.755e+00 3.192e+00 -1.490 0.1365
## SessoM 7.324e+01 1.120e+01 6.537 7.59e-11 ***
## Lunghezza:Cranio 3.157e-02 6.531e-03 4.835 1.41e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 273.5 on 2491 degrees of freedom
## Multiple R-squared: 0.7296, Adjusted R-squared: 0.7289
## F-statistic: 1120 on 6 and 2491 DF, p-value: < 2.2e-16
Adjusted R-squared by model:
Model_bic = 0.7265 | baseline model
Model_bic_4 = 0.7292 | the Gestazione:Lunghezza interaction adds little predictive capacity (+0.0027)
Model_bic_5 = 0.7301 | the Gestazione:Cranio interaction (coefficient 0.53389) gives the largest gain of the three (+0.0036)
Model_bic_6 = 0.7289 | the Lunghezza:Cranio interaction adds little predictive capacity (+0.0024)
## df BIC
## mod_bic 7 35193.65
## mod_bic_4 8 35175.01
## mod_bic_5 8 35166.68
## mod_bic_6 8 35178.14
## N.gravidanze Gestazione Lunghezza Cranio
## 1.024173 102.234068 2.108378 109.430962
## Sesso Gestazione:Cranio
## 1.048767 301.410457
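BIC:
Among the models in this table, mod_bic_5 has the lowest BIC value: 35166.68.
VIF:
Gestazione (102.2), Cranio (109.4) and Gestazione:Cranio (301.4) are far above the threshold (5): the interaction term is strongly collinear with its component variables. The remaining values are close to 1 or 2.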
## [1] "RMSE of model mod_bic: 274.37"
## [1] "RMSE of model mod_bic_1: 269.34"
## [1] "RMSE of model mod_bic_2: 274.08"
## [1] "RMSE of model mod_bic_3: 272.46"
## [1] "RMSE of model mod_bic_4: 272.92"
## [1] "RMSE of model mod_bic_5: 272.46"
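
The RMSE values are slightly smaller than the residual standard errors printed in the model summaries because the two divide the same residual sum of squares by different denominators. The Python sketch below (a hand check, not the original R code) shows this for mod_bic, assuming the RMSE was computed as the square root of the mean squared residual.

```python
import math

# Values copied from the report for mod_bic.
n = 2498          # observations in neonati_clean
rss = 188042054   # residual sum of squares
n_coef = 6        # intercept plus 5 predictors

# RMSE divides by n; the residual standard error divides by n - n_coef.
rmse = math.sqrt(rss / n)
rse = math.sqrt(rss / (n - n_coef))

print(f"RMSE = {rmse:.2f}")
print(f"residual standard error = {rse:.1f}")
```

With about 2500 observations and a handful of coefficients the two measures differ by well under one gram, so either can be used to rank these models.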
##
## Shapiro-Wilk normality test
##
## data: residuals(mod_bic_5)
## W = 0.97279, p-value < 2.2e-16
##
## studentized Breusch-Pagan test
##
## data: mod_bic_5
## BP = 84.705, df = 6, p-value = 3.801e-16
##
## Durbin-Watson test
##
## data: mod_bic_5
## DW = 1.9576, p-value = 0.1445
## alternative hypothesis: true autocorrelation is greater than 0
## 13 15 34 36 67 89
## 0.005747330 0.007609026 0.006784240 0.007352213 0.005989360 0.012817577
## 96 101 106 131 134 151
## 0.006146495 0.008444940 0.028194760 0.008650920 0.007876461 0.014278965
## 155 161 204 206 220 249
## 0.007925092 0.021393082 0.014567541 0.011697548 0.007490072 0.005839164
## 277 294 305 310 312 315
## 0.005862438 0.005915532 0.005640770 0.069170355 0.018098857 0.007419291
## 378 442 445 492 516 565
## 0.038362637 0.007816488 0.010389038 0.009076806 0.013228625 0.005764124
## 582 587 592 615 638 656
## 0.011674287 0.010321764 0.006476594 0.005958671 0.007120330 0.005982558
## 684 697 706 726 748 750
## 0.008886536 0.005999991 0.006031410 0.005800762 0.012150748 0.007694247
## 757 765 805 828 895 928
## 0.008217726 0.006677521 0.032449475 0.007259111 0.007366315 0.064739305
## 946 947 956 985 1014 1067
## 0.007594779 0.009685791 0.008659375 0.007083491 0.008573911 0.010209592
## 1091 1106 1130 1134 1181 1188
## 0.012879065 0.006797763 0.033998363 0.006363565 0.007731113 0.006517382
## 1200 1219 1238 1248 1273 1291
## 0.005629131 0.030699856 0.005918331 0.031717329 0.007529563 0.006179871
## 1293 1311 1321 1356 1357 1385
## 0.006293069 0.009801561 0.010149158 0.006630841 0.007732511 0.018979405
## 1400 1411 1428 1429 1450 1505
## 0.005936771 0.008050716 0.008198110 0.034614750 0.015106722 0.013335823
## 1551 1553 1556 1573 1593 1610
## 0.050017642 0.010102424 0.007092655 0.005694752 0.005627397 0.010710053
## 1619 1686 1693 1701 1712 1718
## 0.021786515 0.009720875 0.005678489 0.011295072 0.007098503 0.007068431
## 1727 1735 1780 1781 1809 1827
## 0.013551851 0.005600208 0.105097961 0.016980770 0.015055981 0.006083594
## 1868 1977 2016 2040 2046 2086
## 0.006246938 0.008805846 0.008844684 0.011937094 0.005766639 0.013322958
## 2089 2114 2115 2120 2140 2146
## 0.006474326 0.023946969 0.012422758 0.054102027 0.007930323 0.005863961
## 2148 2149 2157 2175 2200 2215
## 0.008293948 0.024776338 0.006255313 0.110152510 0.015250724 0.005603405
## 2216 2220 2221 2224 2225 2244
## 0.008617219 0.005942666 0.021633975 0.005842346 0.006220303 0.006930495
## 2257 2307 2317 2337 2359 2391
## 0.006355257 0.026451369 0.007789477 0.006100252 0.010102635 0.006109896
## 2408 2422 2437 2452 2458 2471
## 0.013231022 0.021615570 0.058362458 0.109498269 0.010261969 0.021040384
## 2478
## 0.005857822
## rstudent unadjusted p-value Bonferroni p
## 1551 10.348799 1.3258e-24 3.3118e-21
## 155 5.222863 1.9076e-07 4.7653e-04
## 1306 4.736467 2.2969e-06 5.7377e-03
## named numeric(0)
Residuals are randomly arranged around the mean (0) with no obvious patterns.
In the Q-Q plot the residuals lie along the diagonal.
In the plots the variance appears to be constant.
No points lie beyond the Cook's distance threshold.
Shapiro-Wilk: the p-value is < 2.2e-16, so we reject the null hypothesis that the residuals are normally distributed, despite the visual impression from the Q-Q plot.
Breusch-Pagan: the p-value is 3.801e-16, so we reject the null hypothesis of homoskedasticity; the residuals are heteroskedastic.
Durbin-Watson: the p-value is 0.1445, so we fail to reject the null hypothesis of no autocorrelation.
Many observations have high leverage.
The outlier test flags three observations (1551, 155 and 1306).
Observation 1551 is potentially very influential and could distort the model.
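
The "Bonferroni p" column in the outlier output is the unadjusted p-value multiplied by the number of observations tested, capped at 1. The Python sketch below (a hand check of the printed table, not the R call itself) reproduces that adjustment, assuming n = 2498 as implied by the model's degrees of freedom.

```python
# Unadjusted p-values copied from the outlier test output (keyed by observation).
unadjusted = {
    "1551": 1.3258e-24,
    "155": 1.9076e-07,
    "1306": 2.2969e-06,
}

n_obs = 2498   # assumption: residual df (2491) plus 7 coefficients of mod_bic_5
alpha = 0.05

# Bonferroni adjustment: multiply by the number of tests, cap at 1.
bonferroni = {obs: min(1.0, p * n_obs) for obs, p in unadjusted.items()}

for obs, p_adj in bonferroni.items():
    verdict = "outlier" if p_adj < alpha else "not flagged"
    print(f"observation {obs}: Bonferroni p = {p_adj:.4e} ({verdict})")
```

All three adjusted values stay below 0.05, so these observations remain flagged even after correcting for the roughly 2500 residuals examined.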
## [1] "Estimated newborn weight: 3268.1 grams"
The graph shows the relationship between “Peso” and “N.gravidanze”, colored according to the variable “Sesso”. There is no obvious difference in the behavior of the two groups as “N.gravidanze” varies. Generally, males weigh more than females.
The graph shows the relationship between “Peso” and “Fumatrici”, colored according to the variable “Sesso”. The behavior of the two groups is similar because the fitted lines have a similar slope. There is no obvious difference in the behavior of the two groups as “Fumatrici” varies. Generally, males weigh more than females.
All graphs show a linear relationship between the input and output variables. In each graph the same behavior is confirmed for both sexes, with males generally heavier than females.