This project, developed for Neonatal Health Solutions, aims to build a statistical model that accurately predicts newborn birth weight based on clinical data from three hospitals. By analyzing factors such as gestational age, maternal smoking, and delivery type, the model supports early risk detection and better neonatal care planning. The insights also help optimize hospital resources and improve public health strategies.
| Delivery type (Tipo.parto) | Relative frequency |
|---|---|
| Ces | 0.2912 |
| Nat | 0.7088 |

| Hospital (Ospedale) | Relative frequency |
|---|---|
| osp1 | 0.3264 |
| osp2 | 0.3396 |
| osp3 | 0.3340 |

| Sex (Sesso) | Relative frequency |
|---|---|
| F | 0.5024 |
| M | 0.4976 |

| Smoker (Fumatrici, 1 = yes) | Relative frequency |
|---|---|
| 0 | 0.9584 |
| 1 | 0.0416 |
So the entries look evenly distributed across the three hospitals and between the two sexes. By contrast, only about 29% of the births are cesarean, while about 71% are natural. Fortunately, only about 4% of the mothers are smokers.
| Statistic | Mother’s age (years), raw data |
|---|---|
| Min. | 0.00 |
| 1st Qu. | 25.00 |
| Median | 28.00 |
| Mean | 28.16 |
| 3rd Qu. | 32.00 |
| Max. | 46.00 |
The distribution looks symmetric, with the mean close to the median (about 28 years). However, there are some data-entry errors (mothers aged 0), so let’s remove these values and also present the distribution with a density plot:
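As a minimal illustration of this cleaning step, here is a Python sketch; the analysis itself was done in R. The toy ages and the cut-off `min_age=10` are hypothetical and not taken from the dataset. Passing `method="inclusive"` makes `statistics.quantiles` interpolate quartiles the same way as R's default `quantile()` (type 7), which is what `summary()` reports.

```python
import statistics

def clean_ages(ages, min_age=10):
    """Drop implausible maternal ages (e.g. 0, a data-entry error).

    min_age=10 is a hypothetical plausibility cut-off, for illustration only.
    """
    return [a for a in ages if a >= min_age]

def r_like_summary(values):
    """Six-number summary in the style of R's summary().

    method="inclusive" interpolates quartiles like R's default quantile type 7.
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": med,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }

# Hypothetical toy sample containing one data-entry error (age 0).
ages = [0, 25, 28, 31, 22, 35, 28, 19]
clean = clean_ages(ages)
print(r_like_summary(clean))
```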
| Statistic | Mother’s age (years), after cleaning |
|---|---|
| Min. | 13.00 |
| 1st Qu. | 25.00 |
| Median | 28.00 |
| Mean | 28.19 |
| 3rd Qu. | 32.00 |
| Max. | 46.00 |
| Statistic | Value |
|---|---|
| Standard Deviation of Mother’s Age | 5.22 |
| Skewness of Mother’s Age | 0.15 |
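The report does not state which estimator produced the skewness and kurtosis values. R packages such as `moments` and `e1071` use slightly different variants. The Python sketch below uses the plain moment-based definitions. With n ≈ 2500, the small-sample corrections change the results only marginally.

```python
def central_moment(xs, k):
    """k-th central moment, using the population (divide-by-n) convention."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** k for x in xs) / n

def skewness(xs):
    """Moment-based skewness g1 = m3 / m2**1.5; negative values mean a longer left tail."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3; 0 for a normal distribution."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3

# Hypothetical toy data with a long left tail (one very early value).
left_tail = [25, 37, 38, 39, 39, 40, 40]
print(round(skewness(left_tail), 2), round(excess_kurtosis(left_tail), 2))
```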
| Statistic | Number of pregnancies (N.gravidanze) |
|---|---|
| Min. | 0.00 |
| 1st Qu. | 0.00 |
| Median | 1.00 |
| Mean | 0.98 |
| 3rd Qu. | 1.00 |
| Max. | 12.00 |
| Pregnancies | Absolute frequency | Relative frequency (%) |
|---|---|---|
| 0 | 1095 | 43.84 |
| 1 | 817 | 32.71 |
| 2 | 340 | 13.61 |
| 3 | 150 | 6.00 |
| 4 | 48 | 1.92 |
| 5 | 21 | 0.84 |
| 6 | 11 | 0.44 |
| 7 | 1 | 0.04 |
| 8 | 8 | 0.32 |
| 9 | 2 | 0.08 |
| 10 | 3 | 0.12 |
| 11 | 1 | 0.04 |
| 12 | 1 | 0.04 |
So the median says that at least half of the mothers in the sample had at most 1 pregnancy (43.8% had none and 32.7% had one), which is consistent with present-day data. Also, there is apparently no fixed limit to the number of pregnancies in a woman’s life. Still, the observed values (up to 12, and very rare) look plausible based on everyday experience and data found online.
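The median of a discrete variable can be read directly off the cumulative frequencies. This Python sketch is an illustration, not the R code used in the analysis, and uses the counts from the table above. The cumulative count passes the middle positions (1249 and 1250 of 2498) at the value 1, and about 76.5% of the mothers had at most one pregnancy.

```python
# Absolute frequencies of the number of pregnancies, copied from the table above.
counts = {0: 1095, 1: 817, 2: 340, 3: 150, 4: 48, 5: 21, 6: 11,
          7: 1, 8: 8, 9: 2, 10: 3, 11: 1, 12: 1}

def value_at_position(freq, pos):
    """Value of the pos-th observation (1-based) in the sorted sample."""
    cumulative = 0
    for value in sorted(freq):
        cumulative += freq[value]
        if cumulative >= pos:
            return value
    raise ValueError("position exceeds sample size")

def median_from_counts(freq):
    """Median from a {value: count} table; averages the two middle values if n is even."""
    n = sum(freq.values())
    lower, upper = (n + 1) // 2, (n + 2) // 2
    return (value_at_position(freq, lower) + value_at_position(freq, upper)) / 2

n_total = sum(counts.values())  # 2498 mothers
percent = {k: 100 * v / n_total for k, v in counts.items()}
share_at_most_one = 100 * (counts[0] + counts[1]) / n_total

print(median_from_counts(counts), round(percent[0], 2), round(share_at_most_one, 2))
```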
| Statistic | Gestational age (weeks) |
|---|---|
| Min. | 25.00 |
| 1st Qu. | 38.00 |
| Median | 39.00 |
| Mean | 38.98 |
| 3rd Qu. | 40.00 |
| Max. | 43.00 |
| Statistic | Value |
|---|---|
| Standard Deviation of Gestational Age | 1.87 |
| Skewness of Gestational Age | -2.07 |
The distribution has a fairly long left tail (skewness -2.07, minimum 25 weeks), so the sample includes some preterm births. Mean and median are close, at about 39 weeks, as expected physiologically.
| Statistic | Neonatal length (mm) |
|---|---|
| Min. | 310.0 |
| 1st Qu. | 480.0 |
| Median | 500.0 |
| Mean | 494.7 |
| 3rd Qu. | 510.0 |
| Max. | 565.0 |
| Statistic | Value |
|---|---|
| Standard Deviation of Neonatal Length | 26.33 |
| Skewness of Neonatal Length | -1.51 |
Length is fairly symmetric around its center, but the skewness of -1.51 reflects a long left tail. There are some very short newborns, most likely born preterm.
| Statistic | Cranial circumference (mm) |
|---|---|
| Min. | 235.00 |
| 1st Qu. | 330.00 |
| Median | 340.00 |
| Mean | 340.03 |
| 3rd Qu. | 350.00 |
| Max. | 390.00 |
| Statistic | Value |
|---|---|
| Standard Deviation of Cranial circumference | 16.43 |
| Skewness of Cranial circumference | -0.79 |
| Excess Kurtosis of Cranial circumference | 2.94 |
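This variable is roughly symmetric, with a mild left skew (skewness -0.79) and heavier tails than a normal distribution (excess kurtosis 2.94). Its mean (about 340 mm) is in line with literature data.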
| Statistic | Birth weight (g) |
|---|---|
| Min. | 830.00 |
| 1st Qu. | 2990.00 |
| Median | 3300.00 |
| Mean | 3284.18 |
| 3rd Qu. | 3620.00 |
| Max. | 4930.00 |
| Statistic | Value |
|---|---|
| Standard Deviation of Birth Weight | 525.23 |
| Skewness of Birth Weight | -0.65 |
| Excess Kurtosis of Birth Weight | 2.03 |
This variable also looks roughly symmetric, although it has a longer left tail (skewness -0.65). Its shape looks suitable for a linear regression model.
Is cesarean delivery more frequent in one of the hospitals? To check this, we can run a chi-squared test of independence between two categorical variables: delivery type and hospital.
| Delivery type | osp1 | osp2 | osp3 |
|---|---|---|---|
| Ces | 242 | 254 | 232 |
| Nat | 574 | 594 | 602 |
Let’s perform the chi-squared test of independence:
| Statistic | Value |
|---|---|
| Chi-squared | 1.0800 |
| Degrees of freedom | 2 |
| P-value | 0.5819 |
p-value = 0.5819, so there is not enough evidence to reject the null hypothesis. The proportion of cesarean deliveries does not differ significantly among the three hospitals. Cesareans appear to be distributed homogeneously relative to the group sizes.
| Delivery Type | osp1 | osp2 | osp3 |
|---|---|---|---|
| Ces | 237.81 | 247.14 | 243.06 |
| Nat | 578.19 | 600.86 | 590.94 |
Indeed, the table of expected counts under independence is very similar to the observed one. This is consistent with independence between delivery type and hospital, although it does not prove it.
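The test statistic sums (observed − expected)²/expected over all cells. Each expected count is row total × column total / n. The Python sketch below is illustrative, not the R code used here, and recomputes both tables from the observed counts. For 2 degrees of freedom the chi-squared survival function is exactly exp(−x/2), so no statistics library is needed. R's `chisq.test()` applies Yates' continuity correction only to 2×2 tables, so the uncorrected statistic is the right one to compare with.

```python
import math

# Observed counts from the contingency table above (rows: Ces, Nat; columns: osp1-osp3).
observed = [[242, 254, 232],
            [574, 594, 602]]

def chi2_independence(table):
    """Pearson chi-squared test of independence for an r x c table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    # Expected counts under independence: row total * column total / n
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    stat = sum((o - e) ** 2 / e
               for o_row, e_row in zip(table, expected)
               for o, e in zip(o_row, e_row))
    dof = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, dof, expected

stat, dof, expected = chi2_independence(observed)
# Closed-form survival function, valid only for 2 degrees of freedom.
assert dof == 2
p_value = math.exp(-stat / 2)
print(f"X-squared = {stat:.4f}, df = {dof}, p-value = {p_value:.4f}")
print([[round(e, 2) for e in row] for row in expected])
```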
Assumed population means: 3300 g for birth weight and 500 mm for length (sources in the links below):
https://www.ospedalebambinogesu.it/da-0-a-30-giorni-come-si-presenta-e-come-cresce-80012/#:~:text=In%20media%20il%20peso%20nascita%20%C3%A8%20di%20circa,riguarda%20la%20lunghezza,%20pari%20mediamente%20a%2050%20centimetri.
https://www.my-personaltrainer.it/salute/lunghezza-neonato.html
Hypotheses (μ = mean of the population the sample comes from, μ₀ = reference value above):
H₀: μ = μ₀
H₁: μ ≠ μ₀
| Statistic | Value |
|---|---|
| t value | -1.51 |
| df | 2497 |
| p-value | 0.13 |
| Mean | 3284.18 |
| H0 Mean | 3300 |
| 95% CI Lower | 3263.58 |
| 95% CI Upper | 3304.79 |
| Statistic | Value |
|---|---|
| t value | -10.07 |
| df | 2497 |
| p-value | 2.10e-23 |
| Mean | 494.7 |
| H0 Mean | 500 |
| 95% CI Lower | 493.66 |
| 95% CI Upper | 495.73 |
The one-sample t-tests reveal two different outcomes for weight and length.
For weight, the sample mean is 3284.18 g, compared to the population mean of 3300 g. The p-value is 0.1324, and the 95% confidence interval is [3263.58, 3304.79]. Since the p-value is greater than 0.05 and the interval includes the population mean, we conclude that there is no statistically significant difference. This suggests that the average birth weight in the sample aligns with that of the general population.
On the other hand, length shows a significant deviation. The sample mean is 494.70 mm, the p-value is about 2.1e-23, and the 95% confidence interval [493.66, 495.73] does not include the expected 500 mm. This indicates a statistically significant difference: the newborns in this sample are, on average, about 5 mm shorter.
This likely reflects the presence of a substantial number of preterm births, since preterm babies tend to be physically smaller at birth. The weight result, however, appears to be balanced out by other infants in the sample. These infants are of average or above-average length but relatively heavy. This counterbalance could explain why the sample mean weight stays consistent with the population value, even though part of the sample consists of lighter, premature infants.
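As a cross-check, the t statistics and intervals can be recomputed from the summary statistics alone. The Python sketch below is an illustration rather than the R code used here. It replaces the t distribution with the normal one, which is harmless with 2497 degrees of freedom. As a result, the p-values differ slightly from R's `t.test()` output (about 0.1322 instead of 0.1324 for weight).

```python
import math

def one_sample_t(mean, sd, n, mu0, z_crit=1.96):
    """One-sample test of H0: mu = mu0 from summary statistics.

    With n - 1 = 2497 degrees of freedom the t distribution is practically
    normal, so the p-value and the 95% CI use the normal approximation.
    """
    se = sd / math.sqrt(n)                          # standard error of the mean
    t = (mean - mu0) / se
    p_two_sided = math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))
    ci = (mean - z_crit * se, mean + z_crit * se)
    return t, p_two_sided, ci

# Summary statistics reported above (n = 2498 after cleaning).
t_w, p_w, ci_w = one_sample_t(mean=3284.18, sd=525.23, n=2498, mu0=3300)
t_l, p_l, ci_l = one_sample_t(mean=494.70, sd=26.33, n=2498, mu0=500)
print(f"weight: t = {t_w:.2f}, p = {p_w:.4f}, CI = ({ci_w[0]:.2f}, {ci_w[1]:.2f})")
print(f"length: t = {t_l:.2f}, p = {p_l:.1e}, CI = ({ci_l[0]:.2f}, {ci_l[1]:.2f})")
```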
Now let’s check whether the difference between the means of female and male newborns is significantly different from zero, for weight, length and cranial circumference.
| Variable | Variance (F) | Variance (M) |
|---|---|---|
| Peso | 277215.90 | 243941.29 |
| Lunghezza | 758.73 | 578.22 |
| Cranio | 280.25 | 247.96 |
| Statistic | Value |
|---|---|
| t value | -12.11 |
| df | 2488.67 |
| p-value | < 2.2e-16 |
| Mean F | 3161.06 |
| Mean M | 3408.5 |
| 95% CI Lower | -287.48 |
| 95% CI Upper | -207.38 |
| Statistic | Value |
|---|---|
| t value | -9.58 |
| df | 2457.3 |
| p-value | < 2.2e-16 |
| Mean F | 489.76 |
| Mean M | 499.67 |
| 95% CI Lower | -11.94 |
| 95% CI Upper | -7.88 |
| Statistic | Value |
|---|---|
| t value | -7.44 |
| df | 2489.39 |
| p-value | 1.41e-13 |
| Mean F | 337.62 |
| Mean M | 342.46 |
| 95% CI Lower | -6.11 |
| 95% CI Upper | -3.56 |
p-value < 2.2e-16: This extremely small p-value provides strong evidence against the null hypothesis, indicating a statistically significant difference in birth weight between male and female newborns. 95% confidence interval [-287.48, -207.38]: The interval (female minus male) does not include 0 and lies entirely below it, so male newborns weigh significantly more than females. The mean difference is estimated between about 207 g and 287 g. Variances (F = 277216, M = 243941): Although the variances are relatively close, the Welch two-sample t-test was used for robustness.
p-value < 2.2e-16: Again, the result is highly significant, suggesting that sex has a notable effect on birth length. 95% confidence interval [-11.94, -7.88]: Male newborns are significantly longer than females, by approximately 8 to 12 mm. Variances (F = 758.73, M = 578.22): The difference in variance is more pronounced here, which justifies choosing Welch’s t-test over the classic one.
p-value = 1.414e-13: This result confirms a statistically significant difference in head circumference between the sexes. 95% confidence interval [-6.11, -3.56]: Male newborns have a significantly larger cranial circumference than females, with a mean difference of about 3.6 to 6.1 mm. Variances (F = 280.25, M = 247.96): The variances are relatively close, but Welch’s test was still preferred for consistency across comparisons.
All three Welch Two Sample t-tests—on weight, length, and head circumference—demonstrate statistically significant differences between male and female newborns. In all three anthropometric measures, males exhibit higher average values. Given that the group variances were not perfectly equal, especially for length, the use of Welch’s test provided a more reliable and conservative assessment. These findings strongly support the presence of sexual dimorphism at birth within this sample.
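The Welch statistic and its fractional degrees of freedom (the Welch–Satterthwaite approximation) need only the group means, variances and sizes. In this Python sketch, the means and variances come from the tables above. The group sizes are an assumption: they are reconstructed from the 50.24% / 49.76% split over the 2500 raw records. For that reason the degrees of freedom come out close to, but not exactly equal to, R's values.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    a, b = var1 / n1, var2 / n2
    t = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

# Assumed group sizes (approximate, from the sex frequency table over 2500 rows).
n_f, n_m = 1256, 1244
t_peso, df_peso = welch_t(3161.06, 277215.90, n_f, 3408.50, 243941.29, n_m)
t_len, df_len = welch_t(489.76, 758.73, n_f, 499.67, 578.22, n_m)
print(f"weight: t = {t_peso:.2f}, df = {df_peso:.1f}")
print(f"length: t = {t_len:.2f}, df = {df_len:.1f}")
```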
The set of graphs shown represents the relationships between birth
weight and several potentially influential variables. The goal is to
understand which factors are most strongly associated with birth weight
and which ones seem to have little or no effect.
Starting with maternal age, no relationship emerges: the correlation is practically zero (r = -0.02), and the data points are scattered with no clear trend. The same applies to the number of previous pregnancies, whose correlation is zero. In short, neither maternal age nor the number of prior pregnancies shows a linear association with newborn weight.
The situation is different for gestational age: here the correlation reaches r = 0.59, indicating a moderate positive relationship. As expected, the longer a pregnancy lasts, the heavier the baby tends to be, confirming the natural intrauterine growth process.
The strongest associations, however, come from the newborn’s morphometric variables. Length shows a very high correlation (r = 0.80) with weight: longer babies clearly tend to weigh more. Cranial circumference is also strongly correlated (r = 0.70), indicating that body size at birth is a major factor associated with birth weight.
In summary, the analysis suggests that the newborn’s physical characteristics and the length of gestation are the key elements to consider when analyzing birth weight. In contrast, apart from gestational age, the maternal factors included in this dataset show no evident effect, at least from a linear correlation standpoint.
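The r values above are Pearson correlation coefficients. The self-contained Python version below is illustrative, and its toy data are hypothetical. It makes the definition explicit: the covariance of the two variables divided by the product of their standard deviations.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data: gestational weeks and birth weights (g).
weeks = [36, 37, 38, 39, 40, 41]
weights = [2700, 2950, 3100, 3350, 3400, 3600]
print(round(pearson_r(weeks, weights), 3))
```

In a simple linear regression R² = r². So length alone (r = 0.80) would explain roughly 0.80² ≈ 64% of the variance in birth weight.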
Delivery type and hospital could logically be excluded from the start, but let’s begin with the full model and then simplify it.
##
## Call:
## lm(formula = Peso ~ Anni.madre + N.gravidanze + Fumatrici + Gestazione +
## Lunghezza + Cranio + Tipo.parto + Ospedale + Sesso, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1123.26 -181.53 -14.45 161.05 2611.89
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6735.7960 141.4790 -47.610 < 2e-16 ***
## Anni.madre 0.8018 1.1467 0.699 0.4845
## N.gravidanze 11.3812 4.6686 2.438 0.0148 *
## Fumatrici1 -30.2741 27.5492 -1.099 0.2719
## Gestazione 32.5773 3.8208 8.526 < 2e-16 ***
## Lunghezza 10.2922 0.3009 34.207 < 2e-16 ***
## Cranio 10.4722 0.4263 24.567 < 2e-16 ***
## Tipo.partoNat 29.6335 12.0905 2.451 0.0143 *
## Ospedaleosp2 -11.0912 13.4471 -0.825 0.4096
## Ospedaleosp3 28.2495 13.5054 2.092 0.0366 *
## SessoM 77.5723 11.1865 6.934 5.18e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 274 on 2487 degrees of freedom
## Multiple R-squared: 0.7289, Adjusted R-squared: 0.7278
## F-statistic: 668.7 on 10 and 2487 DF, p-value: < 2.2e-16
This model explains about 72.8% of the variability in birth weight (Adjusted R² = 0.7278), with good predictive accuracy (RSE = 274 g).
Significant predictors include:
gestational age (+32.6 g per week)
neonatal length (+10.3 g per mm)
cranial circumference (+10.5 g per mm)
male sex (+77.6 g)
number of pregnancies (+11.4 g per pregnancy)
Natural delivery (+29.6 g) and hospital osp3 (+28.2 g vs osp1) are also significant at the 5% level, though more weakly.
Since this is the full initial model, it can still be improved.
Let’s simplify the model by removing the variables whose effects are not significant, or only weakly so: mother’s age (Anni.madre), smoking status (Fumatrici) and hospital (Ospedale). Delivery type (Tipo.parto) is dropped as well, for the reason explained below.
Model 2 (`model2`): `lm(formula = Peso ~ N.gravidanze + Gestazione + Lunghezza + Cranio + Sesso, data = df)`
##
## Call:
## lm(formula = Peso ~ N.gravidanze + Gestazione + Lunghezza + Cranio +
## Sesso, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1149.37 -180.98 -15.57 163.69 2639.09
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6681.7251 135.8036 -49.201 < 2e-16 ***
## N.gravidanze 12.4554 4.3416 2.869 0.00415 **
## Gestazione 32.3827 3.8008 8.520 < 2e-16 ***
## Lunghezza 10.2455 0.3008 34.059 < 2e-16 ***
## Cranio 10.5410 0.4265 24.717 < 2e-16 ***
## SessoM 77.9807 11.2111 6.956 4.47e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 274.7 on 2492 degrees of freedom
## Multiple R-squared: 0.727, Adjusted R-squared: 0.7265
## F-statistic: 1327 on 5 and 2492 DF, p-value: < 2.2e-16
This regression model explains about 72.65% of the variability in birth weight (Adjusted R² = 0.7265), with a residual standard error of 274.7 g, indicating good predictive performance.
All included variables are statistically significant. Gestational age, length, and cranial circumference have a strong positive association with birth weight, confirming the central role of fetal growth. Male infants tend to weigh significantly more, and a higher number of pregnancies is also linked to a slight increase in weight.
In summary, the model highlights fetal development and sex as the most influential factors in determining birth weight.
Adjusted R² stays practically the same as in the previous model (0.7265 vs 0.7278). The removed variables added little, so dropping them did not hurt the model. This is actually a good sign: I simplified the model without losing explanatory power. I also removed the delivery type, because it is only known once the baby is born, and at that point predicting the weight is no longer useful.
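Adjusted R² corrects R² for the number of predictors, which is why dropping five terms barely changes it here. The short Python check below is illustrative. It reproduces both values from the multiple R² and the degrees of freedom printed in the two summaries. Note that the full model's p = 10 counts every dummy coefficient (e.g. Fumatrici1, Ospedaleosp2, Ospedaleosp3).

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2: penalizes R^2 for the number of predictors p (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

n = 2498  # residual df + estimated coefficients: 2487 + 11 in the full model
full = adjusted_r2(0.7289, n, p=10)    # 10 coefficients besides the intercept
reduced = adjusted_r2(0.7270, n, p=5)  # model2
print(f"full: {full:.4f}, reduced: {reduced:.4f}")
```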
| Variable | VIF |
|---|---|
| N.gravidanze | 1.02 |
| Gestazione | 1.67 |
| Lunghezza | 2.08 |
| Cranio | 1.62 |
| Sesso | 1.04 |
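All VIFs are close to 1 (the largest is 2.08, for Lunghezza), well below the usual rule-of-thumb thresholds of 5 or 10. Multicollinearity therefore does not appear to be a problem.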
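The VIF of predictor j is 1/(1 − R²_j), where R²_j comes from regressing that predictor on all the others. The Python sketch below is illustrative and uses the VIFs from the table. Inverting the formula shows that even Lunghezza, the most collinear predictor, shares only about 52% of its variance with the rest.

```python
def vif_from_r2(r2_aux):
    """VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the others."""
    return 1.0 / (1.0 - r2_aux)

def r2_from_vif(vif):
    """Inverse relation: share of a predictor's variance explained by the others."""
    return 1.0 - 1.0 / vif

# VIFs reported in the table above for model2.
vifs = {"N.gravidanze": 1.02, "Gestazione": 1.67, "Lunghezza": 2.08,
        "Cranio": 1.62, "Sesso": 1.04}
shared = {name: round(r2_from_vif(v), 2) for name, v in vifs.items()}
flagged = [name for name, v in vifs.items() if v > 5]  # rule-of-thumb cut-off
print(shared, flagged)
```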
Let’s now try a log transformation of gestational age (log_Gestazione). I think that beyond a certain point (around the normal term or slightly after), additional time in the womb no longer adds much weight, and a log transformation captures this diminishing effect.
##
## Call:
## lm(formula = Peso ~ N.gravidanze + log_Gestazione + Lunghezza +
## Cranio + Sesso, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1153.33 -180.96 -15.52 162.10 2638.06
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9680.4831 440.1171 -21.995 < 2e-16 ***
## N.gravidanze 12.2975 4.3449 2.830 0.00469 **
## log_Gestazione 1163.2891 140.9956 8.251 2.52e-16 ***
## Lunghezza 10.2554 0.3026 33.886 < 2e-16 ***
## Cranio 10.5297 0.4274 24.639 < 2e-16 ***
## SessoM 78.7070 11.2197 7.015 2.95e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 274.9 on 2492 degrees of freedom
## Multiple R-squared: 0.7265, Adjusted R-squared: 0.726
## F-statistic: 1324 on 5 and 2492 DF, p-value: < 2.2e-16
Adjusted R² is practically unchanged (0.726 vs 0.7265), but the coefficient of a log-transformed variable is less direct to interpret.
This regression model includes a log-transformed gestational age, which allows the effect of each additional week to shrink as gestation progresses. The model remains strong, with Adjusted R² = 0.726 and a residual standard error of 274.9 g.
All variables are statistically significant:
log(Gestation) has a strong positive effect (+1163 g per log-unit).
Length, cranial circumference and sex (male: +78.7 g) also show strong associations.
Number of pregnancies adds a small but still significant effect (+12.3 g per pregnancy).
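With log-transformed gestational age, the weight gained for one extra week is no longer constant. It equals the coefficient times ln(new week / old week). The Python sketch below assumes `log_Gestazione` is the natural logarithm of `Gestazione` (R's default `log()`), which the output does not state explicitly.

```python
import math

BETA_LOG_GEST = 1163.2891  # coefficient of log_Gestazione in the model above

def extra_weight(week_from, week_to, beta=BETA_LOG_GEST):
    """Predicted weight change (g) between two gestational ages, other predictors fixed.

    Assumes log_Gestazione = ln(Gestazione).
    """
    return beta * math.log(week_to / week_from)

for week in (30, 35, 39, 42):
    print(f"{week} -> {week + 1} weeks: +{extra_weight(week, week + 1):.1f} g")
```

Around term (39 to 40 weeks) this gives about +29 g, close to the constant +32.4 g per week of model2. Earlier weeks add slightly more, which is the diminishing effect the transformation was meant to capture.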
##
## Call:
## lm(formula = Peso ~ N.gravidanze + Gestazione + Lunghezza + Cranio +
## Sesso + Lunghezza:Cranio, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1150.65 -180.93 -13.48 165.99 2865.46
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.803e+03 1.018e+03 -1.771 0.0767 .
## N.gravidanze 1.293e+01 4.323e+00 2.991 0.0028 **
## Gestazione 3.815e+01 3.967e+00 9.616 < 2e-16 ***
## Lunghezza -3.060e-01 2.203e+00 -0.139 0.8895
## Cranio -4.755e+00 3.192e+00 -1.490 0.1365
## SessoM 7.324e+01 1.120e+01 6.537 7.59e-11 ***
## Lunghezza:Cranio 3.157e-02 6.531e-03 4.835 1.41e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 273.5 on 2491 degrees of freedom
## Multiple R-squared: 0.7296, Adjusted R-squared: 0.7289
## F-statistic: 1120 on 6 and 2491 DF, p-value: < 2.2e-16
Adjusted R² is slightly higher (0.7289 vs 0.7265 for model2), but the model is more complex.
$$\hat{Y} = -1803 + 12.93\,\text{N.gravidanze} + 38.15\,\text{Gestazione} - 0.306\,\text{Lunghezza} - 4.755\,\text{Cranio} + 73.24\,\text{SessoM} + 0.03157\,(\text{Lunghezza} \times \text{Cranio})$$
Intercept (-1803): the baseline value of weight when all predictors are 0 (not directly interpretable but necessary for the model).
Number of Pregnancies (N.gravidanze): each additional pregnancy increases birth weight by ~12.93 grams (statistically significant).
Gestational Age: each additional week increases birth weight by ~38.15 grams (highly significant).
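Body Length and Cranial circumference: their main effects are not individually significant, but they must be read together with the interaction term. For example, at a cranial circumference of 340 mm (about the sample mean), one extra mm of length adds about -0.306 + 0.03157 × 340 ≈ 10.4 g. This is close to the 10.2 g per mm of the additive model.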
Sex (SessoM): male newborns are expected to weigh ~73.24 grams more on average (very significant).
Length × Cranial circumference (interaction): this term is positive and highly significant. It suggests that a longer body combined with a larger head is associated with a higher birth weight.
R-squared: 0.7296
Adjusted R-squared: 0.7289
The model explains approximately 73% of the variance in birth weight, which indicates a strong fit (seemingly the best model so far).
Residual standard error: 273.5 grams, reflecting the average deviation of predictions from actual values.
This model confirms that gestational age, sex, and the interaction between body length and head size are the most influential predictors of birth weight. The number of pregnancies, although included, plays a lesser role than the fetal growth measures.
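The fitted equation can be written as a small prediction function. The Python sketch below is illustrative and uses the rounded coefficients above, so results can differ from R's `predict()` by a gram or so. It also computes the slope of length at a given cranial circumference, which is how the non-significant main effects should be read. At 340 mm the slope is about 10.4 g per mm.

```python
# Rounded coefficients of the interaction model (model_inter) from the output above.
COEF = {
    "intercept": -1803.0,
    "n_gravidanze": 12.93,
    "gestazione": 38.15,
    "lunghezza": -0.306,
    "cranio": -4.755,
    "sesso_m": 73.24,
    "lunghezza_x_cranio": 0.03157,
}

def predict_weight(n_gravidanze, gestazione, lunghezza, cranio, male):
    """Predicted birth weight (g) from the interaction model's fitted equation."""
    c = COEF
    return (c["intercept"]
            + c["n_gravidanze"] * n_gravidanze
            + c["gestazione"] * gestazione
            + c["lunghezza"] * lunghezza
            + c["cranio"] * cranio
            + c["sesso_m"] * (1 if male else 0)
            + c["lunghezza_x_cranio"] * lunghezza * cranio)

def length_slope(cranio):
    """Marginal effect of 1 mm of length (g/mm) at a given cranial circumference."""
    return COEF["lunghezza"] + COEF["lunghezza_x_cranio"] * cranio

print(round(predict_weight(1, 38, 490, 345, male=False), 1))
print(round(length_slope(340), 2))
```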
| Variable | GVIF | Df | GVIF^(1/(2*Df)) | Interacts With | Other Predictors |
|---|---|---|---|---|---|
| N.gravidanze | 1.02 | 1 | 1.01 | – | Gestazione, Lunghezza, Cranio, Sesso |
| Gestazione | 1.84 | 1 | 1.35 | – | N.gravidanze, Lunghezza, Cranio, Sesso |
| Lunghezza | 1.89 | 3 | 1.11 | Cranio | N.gravidanze, Gestazione, Sesso |
| Cranio | 1.89 | 3 | 1.11 | Lunghezza | N.gravidanze, Gestazione, Sesso |
| Sesso | 1.05 | 1 | 1.02 | – | N.gravidanze, Gestazione, Lunghezza, Cranio |
First let’s see the AIC and BIC of the models:
| Model | AIC | BIC |
|---|---|---|
| model1 | 35145.57 | 35215.45 |
| model2 | 35152.89 | 35193.65 |
| model_log | 35157.29 | 35198.05 |
| model_inter | 35131.55 | 35178.14 |
Both criteria favor the model with the interaction between cranial circumference and length, which has the lowest AIC and BIC. Let’s test whether its difference from model2 is significant:
## Analysis of Variance Table
##
## Model 1: Peso ~ N.gravidanze + Gestazione + Lunghezza + Cranio + Sesso +
## Lunghezza:Cranio
## Model 2: Peso ~ N.gravidanze + Gestazione + Lunghezza + Cranio + Sesso
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 2491 186293990
## 2 2492 188042054 -1 -1748064 23.374 1.415e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Even BIC, which penalizes more complex models more heavily, prefers the interaction model over model2, so let’s keep it. The ANOVA (partial F) test also shows that adding the interaction significantly reduces the residual sum of squares (F = 23.37, p = 1.4e-06).
But let’s try to understand why. Probably, when a newborn is long and also has a large head, the whole body has developed in a proportionate and complete way. This leads to a higher weight than in newborns who grew in only one of the two dimensions.
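For Gaussian linear models fitted to the same observations, the AIC and BIC differences reduce to n·ln(RSS of the larger model / RSS of the smaller model) plus the penalty for the extra coefficients. The ANOVA F statistic compares the RSS reduction with the residual variance of the larger model. The Python sketch below is illustrative. It recomputes all three from the RSS values in the ANOVA table, and the results match the AIC/BIC table up to rounding.

```python
import math

n = 2498                 # observations used in both fits
rss_inter = 186_293_990  # RSS of the model with Lunghezza:Cranio
rss_add = 188_042_054    # RSS of model2 (no interaction)
df_inter = 2491          # residual df of the interaction model
extra_params = 1         # the interaction adds one coefficient

# Differences (interaction model minus model2); negative means the interaction model wins.
log_ratio = math.log(rss_inter / rss_add)
delta_aic = n * log_ratio + 2 * extra_params
delta_bic = n * log_ratio + math.log(n) * extra_params

# Partial F-test, as in anova(model_inter, model2)
f_stat = ((rss_add - rss_inter) / extra_params) / (rss_inter / df_inter)
print(f"dAIC = {delta_aic:.2f}, dBIC = {delta_bic:.2f}, F = {f_stat:.3f}")
```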
| Observation | Bonferroni p-value | Studentized residual | Outlier |
|---|---|---|---|
| 1549 | 0.0e+00 | 11.18 | Yes |
| 155 | 3.0e-07 | 5.12 | Yes |
| 1305 | 1.8e-06 | 4.78 | Yes |
## Max Cook = 1.44
The one observation above the Cook’s distance line is probably no. 1549. It exceeds 0.5, a common Cook’s distance threshold, and it is indeed a very unusual case: a weight of 4370 g with a length of only 315 mm and a cranial circumference of 374 mm. With a maximum Cook’s distance of 1.44, above both the 0.5 and 1 cut-offs, this point does have a noticeable influence on the fit. Still, since it is a single case, let’s keep it.
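Cook’s distance combines the size of an observation’s residual with its leverage. The Python sketch below implements the standard formula. Its residual and leverage values are hypothetical; in R the values come from `cooks.distance()`. The common cut-offs are 4/n, 0.5 and 1, and the maximum of 1.44 reported above exceeds all of them.

```python
def cooks_distance(residual, leverage, n_coef, mse):
    """Cook's distance D_i = e_i^2 / (k * s^2) * h_ii / (1 - h_ii)^2.

    residual: raw residual e_i; leverage: hat value h_ii (0 <= h < 1);
    n_coef: number of estimated coefficients k, intercept included;
    mse: residual variance s^2 (the residual standard error squared).
    """
    return residual ** 2 / (n_coef * mse) * leverage / (1 - leverage) ** 2

def exceeded_cutoffs(d, n):
    """Which common rule-of-thumb thresholds a Cook's distance exceeds."""
    return {"4/n": d > 4 / n, "0.5": d > 0.5, "1": d > 1}

# Hypothetical observation in the interaction model (7 coefficients, RSE 273.5 g).
d = cooks_distance(residual=300, leverage=0.01, n_coef=7, mse=273.5 ** 2)
print(round(d, 5), exceeded_cutoffs(d, n=2498))
print(exceeded_cutoffs(1.44, n=2498))
```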
| Test | Statistic | DF | p.value |
|---|---|---|---|
| Breusch-Pagan (Homoscedasticity of Residuals) | 140.3300 | 6 | < 2.2e-16 |
| Durbin-Watson (Independence of Residuals) | 1.9500 | NA | 0.1211 |
| Shapiro-Wilk (Normality of Residuals) | 0.9697 | NA | < 2.2e-16 |
The Breusch-Pagan test rejects the null hypothesis of homoscedasticity (p < 2.2e-16). The variance of the residuals is not constant; in other words, there are “zones” where the model performs better than others. The Durbin-Watson test (DW = 1.95, p = 0.12) gives no evidence of autocorrelation, so the residuals can be considered independent. The Shapiro-Wilk test rejects normality (p < 2.2e-16), but this is probably due to the large sample size, which makes even small departures detectable. The residual distribution is fairly symmetric and centered on zero, though it has a long right tail.
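The Durbin-Watson statistic is the sum of squared differences between consecutive residuals divided by the residual sum of squares. It is approximately 2(1 − ρ̂), where ρ̂ is the lag-1 autocorrelation, so the observed 1.95 corresponds to ρ̂ ≈ 0.025. The Python sketch below is illustrative. It does not compute the p-value, which R functions such as `lmtest::dwtest()` provide. The statistic also depends on row order, so it is informative only if the data order carries some meaning, such as recording order.

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); values near 2 mean no lag-1 autocorrelation."""
    num = sum((cur - prev) ** 2 for prev, cur in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den

print(durbin_watson([1, -1, 1, -1]))  # 3.0: alternating signs, negative autocorrelation
print(durbin_watson([1, 1, 1, 1]))    # 0.0: identical residuals, strong positive autocorrelation
```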
## RMSE = 273.09
We have an RMSE of 273.09 g, meaning that the typical difference between the model’s predictions and the observed weights is about 273 grams. Since the RMSE is expressed in the same unit as the target variable (weight), it tells us directly how much error to expect and how well the model is performing. With a mean birth weight of about 3284 g, an RMSE of about 273 g (roughly 8%) is acceptable but not extremely accurate, which we already knew from the adjusted R² value.
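RMSE is the square root of the mean squared residual, i.e. √(RSS/n). The Python sketch below is illustrative and reproduces the value above from the RSS of the interaction model in the ANOVA table. The RMSE is slightly smaller than the residual standard error of 273.5 g, because the latter divides RSS by the residual degrees of freedom (n − 7) instead of n.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

RSS, N_OBS, RESID_DF = 186_293_990, 2498, 2491  # interaction model, ANOVA table above
rmse_model = math.sqrt(RSS / N_OBS)    # divides by n
rse_model = math.sqrt(RSS / RESID_DF)  # divides by residual df (as in summary())
print(f"RMSE = {rmse_model:.2f} g, RSE = {rse_model:.2f} g")
```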
## Prediction 1: female newborn, mother with 3 pregnancies, 39 weeks of gestation
## Predicted birth weight = 3266.38 g +/- 273 g
##
## Prediction 2: male newborn, mother with 0 pregnancies, 38 weeks of gestation
## Predicted birth weight = 3262.68 g +/- 273 g
##
## Prediction 3: female newborn, mother with 2 pregnancies, 40 weeks of gestation
## Predicted birth weight = 3291.60 g +/- 273 g
##
## Prediction 4: male newborn, mother with 4 pregnancies, 37 weeks of gestation
## Predicted birth weight = 3276.26 g +/- 273 g
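In the four predictions above, length and cranial circumference appear to be held at their sample means (about 494.7 mm and 340 mm), which is why the predicted weights are so close to each other. The following predictions also vary length and cranial circumference: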
## Prediction 1: A Female newborn from a mother with 1 pregnancy, 38 weeks of gestation,
## Cranial circumference = 345 mm, length = 490 mm
## Predicted birth weight = 3206.66 g +/- 273 g
##
## Prediction 2: A Male newborn from a mother with 2 pregnancies, 40 weeks of gestation,
## Cranial circumference = 360 mm, length = 510 mm
## Predicted birth weight = 3751.08 g +/- 273 g
##
## Prediction 3: A Female newborn from a mother with 5 pregnancies, 37 weeks of gestation,
## Cranial circumference = 375 mm, length = 500 mm
## Predicted birth weight = 3657.06 g +/- 273 g
##
## Prediction 4: A Male newborn from a mother with 0 pregnancies, 41 weeks of gestation,
## Cranial circumference = 340 mm, length = 495 mm
## Predicted birth weight = 3379.98 g +/- 273 g
##
## Prediction 5: A Female newborn from a mother with 3 pregnancies, 39 weeks of gestation,
## Cranial circumference = 355 mm, length = 505 mm
## Predicted birth weight = 3541.37 g +/- 273 g
##
## Prediction 6: A Male newborn from a mother with 6 pregnancies, 36 weeks of gestation,
## Cranial circumference = 365 mm, length = 485 mm
## Predicted birth weight = 3426.50 g +/- 273 g
##
## Prediction 7: A Female newborn from a mother with 2 pregnancies, 35 weeks of gestation,
## Cranial circumference = 350 mm, length = 480 mm
## Predicted birth weight = 3051.28 g +/- 273 g
##
## Prediction 8: A Male newborn from a mother with 4 pregnancies, 42 weeks of gestation,
## Cranial circumference = 370 mm, length = 515 mm
## Predicted birth weight = 4023.60 g +/- 273 g
In regression plots, the shaded area around the trend line is called the
confidence band. It represents the 95% confidence interval for the mean
predicted value of the response variable. This means we’re 95% confident
that the true average lies within that band at each value of the
predictor. It does not reflect the spread of individual data points.
Wider bands mean more uncertainty; narrower bands mean more precise estimates.
That’s why the band is narrowest where the data are densest, around the
center of the predictor’s distribution.
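In a simple regression like those drawn in these plots, the half-width of the confidence band at x₀ is proportional to s·√(1/n + (x₀ − x̄)²/Sxx). This is smallest at the mean of the predictor and grows towards the edges. The prediction interval for a single newborn adds the residual variance s² under the square root. That is why it is much wider and does reflect the spread of individual points. The Python sketch below uses hypothetical x values.

```python
import math

def mean_response_se(x0, xs, s):
    """Standard error of the fitted mean at x0 in a simple linear regression.

    s is the residual standard error; the band half-width is a t quantile times this.
    """
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return s * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)

def prediction_se(x0, xs, s):
    """Standard error for a single new observation at x0 (prediction interval)."""
    return math.sqrt(s ** 2 + mean_response_se(x0, xs, s) ** 2)

# Hypothetical gestational ages (weeks) and unit residual standard error.
xs = [36, 37, 38, 39, 40, 41, 42]
for x0 in (36, 39, 42):
    print(x0, round(mean_response_se(x0, xs, 1.0), 3), round(prediction_se(x0, xs, 1.0), 3))
```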
Another consideration: given the distributions of the variables in our dataset, the model tends to overestimate weight for the low (“left-tail”) values of the predictors, where most points lie below the regression line. This is likely also related to the failed homoscedasticity test, since the variance of the residuals is not constant across the range of the predictors.