Description:

This project, developed for Neonatal Health Solutions, aims to build a statistical model that accurately predicts newborn birth weight based on clinical data from three hospitals. By analyzing factors such as gestational age, maternal smoking, and delivery type, the model supports early risk detection and better neonatal care planning. The insights also help optimize hospital resources and improve public health strategies.

Assignment 1: Descriptive analysis of the variables:

Let’s start with tables of relative frequencies for the qualitative variables:

Relative Frequency: Delivery Type
Delivery Type	Relative Frequency
Ces	0.2912
Nat	0.7088

Relative Frequency: Hospital
Hospital	Relative Frequency
osp1	0.3264
osp2	0.3396
osp3	0.3340

Relative Frequency: Sex
Sex	Relative Frequency
F	0.5024
M	0.4976

Relative Frequency: Smoking Status (0 = non-smoker, 1 = smoker)
Smoking Status	Relative Frequency
0	0.9584
1	0.0416

It looks like the entries are almost evenly distributed across hospitals and between sexes. In contrast, only about 29% of the births are cesarean, while about 71% are natural births. Fortunately, only about 4% of the mothers are smokers.

Let’s see how the quantitative variables are distributed:

Let’s start with the mother’s age:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   25.00   28.00   28.16   32.00   46.00

The distribution looks symmetric, with the mean close to the median (about 28 years). However, there are some data-entry errors (mothers with a recorded age of 0), so let’s remove these values and also present the distribution with a density plot:

Summary Statistics of Mother’s Age
Statistic	Value
Min.	13.00
1st Qu.	25.00
Median	28.00
Mean	28.19
3rd Qu.	32.00
Max.	46.00

Descriptive Statistics – Mother’s Age
Statistic	Value
Standard Deviation of Mother’s Age	5.22
Skewness of Mother’s Age	0.15

Now let’s explore the number of pregnancies:

Summary Statistics of Number of Pregnancies
Statistic	Value
Min.	0.00
1st Qu.	0.00
Median	1.00
Mean	0.98
3rd Qu.	1.00
Max.	12.00

Absolute and Relative Frequencies of Number of Pregnancies
Pregnancies	Absolute	Relative_Percent
0	1095	43.84
1	817	32.71
2	340	13.61
3	150	6.00
4	48	1.92
5	21	0.84
6	11	0.44
7	1	0.04
8	8	0.32
9	2	0.08
10	3	0.12
11	1	0.04
12	1	0.04

The median tells us that at least half of the mothers in the sample had at most 1 pregnancy, which is consistent with present-day data. Higher counts (up to 12) are rare; there is apparently “no limit” to the number of pregnancies in a woman’s life, but the numbers look plausible based on everyday experience and on data found online.
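As a quick cross-check, the mean and median reported above can be recomputed directly from the frequency table. This is a minimal Python sketch that uses only the counts printed in the table; the helper name `value_at_rank` is illustrative and not part of the original analysis.

```python
# Absolute frequencies copied from the table above (pregnancies -> count).
counts = {0: 1095, 1: 817, 2: 340, 3: 150, 4: 48, 5: 21, 6: 11,
          7: 1, 8: 8, 9: 2, 10: 3, 11: 1, 12: 1}

n = sum(counts.values())
mean = sum(value * count for value, count in counts.items()) / n


def value_at_rank(counts, rank):
    """Return the observation at a 1-based rank in the sorted sample."""
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        if cumulative >= rank:
            return value
    raise ValueError("rank exceeds sample size")


# n is even, so the median averages the two central order statistics.
median = (value_at_rank(counts, n // 2) + value_at_rank(counts, n // 2 + 1)) / 2

# Share of mothers with at most one pregnancy.
share_at_most_one = (counts[0] + counts[1]) / n

print(f"n = {n}, mean = {mean:.2f}, median = {median}")
print(f"share with at most 1 pregnancy = {share_at_most_one:.1%}")
```

The counts add up to 2498 observations, which matches the degrees of freedom (2497) reported in the t-tests later on.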

Now let’s see the distribution of the Gestational age:

Summary Statistics of Gestation Weeks
Statistic	Value
Min.	25.00
1st Qu.	38.00
Median	39.00
Mean	38.98
3rd Qu.	40.00
Max.	43.00

Descriptive Statistics – Gestational Age
Statistic	Value
Standard Deviation of Gestational Age	1.87
Skewness of Gestational Age	-2.07

The distribution has a fairly long left tail (skewness = -2.07), so the sample includes some premature births. Mean and median are close, at about 39 weeks (as they should be physiologically).

Let’s explore the length of the newborn:

Summary Statistics of Baby Length (mm)
Statistic	Value
Min.	310.0
1st Qu.	480.0
Median	500.0
Mean	494.7
3rd Qu.	510.0
Max.	565.0

Descriptive Statistics – Neonatal Length
Statistic	Value
Standard Deviation of Neonatal Length	26.33
Skewness of Neonatal Length	-1.51

The bulk of the length distribution is roughly symmetric around 500 mm, but there is a long left tail (skewness = -1.51), meaning that there are some very small newborns (most likely born premature).

Now let’s see the Cranial circumference of the newborn:

Summary Statistics of Cranial Circumference (mm)
Statistic	Value
Min.	235.00
1st Qu.	330.00
Median	340.00
Mean	340.03
3rd Qu.	350.00
Max.	390.00

Descriptive Statistics – Cranial circumference
Statistic	Value
Standard Deviation of Cranial circumference	16.43
Skewness of Cranial circumference	-0.79
Excess Kurtosis of Cranial circumference	2.94

This variable is roughly symmetric (skewness = -0.79), with heavier tails than a normal distribution (excess kurtosis = 2.94), and its mean agrees with literature data.

Now let’s see the dependent variable, the weight of the newborn:

Summary Statistics of Birth Weight (g)
Statistic	Value
Min.	830.00
1st Qu.	2990.00
Median	3300.00
Mean	3284.18
3rd Qu.	3620.00
Max.	4930.00

Descriptive Statistics – Birth Weight
Statistic	Value
Standard Deviation of Birth Weight	525.23
Skewness of Birth Weight	-0.65
Excess Kurtosis of Birth Weight	2.03

This variable looks roughly symmetric, although the left tail is somewhat longer (skewness = -0.65). It looks suitable for linear regression.

Assignment 2: Analysis and modeling:

Is cesarean delivery more frequent in some hospitals than in others? The way to check is to perform a Chi-squared test of independence between the two categorical variables: type of delivery and hospital.

Contingency Table: Type of Delivery vs Hospital
Delivery_Type	osp1	osp2	osp3
Ces	242	254	232
Nat	574	594	602

Let’s perform Chi-squared test of independence:

Chi-squared Test of Independence: Type of Delivery vs Hospital
Statistic	Value
Chi-squared	1.0800
Degrees of freedom	2.0000
P-value	0.5819

p-value = 0.5819, so there is not enough evidence to reject the null hypothesis. The proportion of cesarean deliveries does not differ significantly among the three hospitals. Cesareans appear to be distributed homogeneously relative to the group sizes.

Expected Counts: Type of Delivery vs Hospital
Delivery Type	osp1	osp2	osp3
Ces	237.81	247.14	243.06
Nat	578.19	600.86	590.94

Indeed, the table of counts expected under independence is very similar to the observed one, which is why the test finds no evidence of an association.
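The expected counts and the test statistic can be reproduced by hand from the contingency table. The sketch below is a Python re-derivation, not the code that produced the output above. It relies on the fact that, for exactly 2 degrees of freedom, the upper tail of the chi-squared distribution has the closed form exp(-x/2), so no statistics library is needed.

```python
import math

# Observed counts copied from the contingency table above.
observed = {
    "Ces": {"osp1": 242, "osp2": 254, "osp3": 232},
    "Nat": {"osp1": 574, "osp2": 594, "osp3": 602},
}
hospitals = ["osp1", "osp2", "osp3"]

row_totals = {row: sum(cols.values()) for row, cols in observed.items()}
col_totals = {h: sum(observed[row][h] for row in observed) for h in hospitals}
n = sum(row_totals.values())

# Expected count under independence: row total * column total / grand total.
expected = {
    row: {h: row_totals[row] * col_totals[h] / n for h in hospitals}
    for row in observed
}

chi2 = sum(
    (observed[row][h] - expected[row][h]) ** 2 / expected[row][h]
    for row in observed
    for h in hospitals
)
df = (len(observed) - 1) * (len(hospitals) - 1)

# The closed-form tail probability below is valid only for df == 2.
if df != 2:
    raise ValueError("closed-form p-value requires exactly 2 degrees of freedom")
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.4f}, df = {df}, p = {p_value:.4f}")
```

For a table with a different shape the p-value would need a general chi-squared distribution function instead of the shortcut used here.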

Now let’s see whether the mean weight and mean length in the sample are statistically different from the reference values for the general population.

Assumed population means: 3300 g for weight and 500 mm for length (sources in the links below).

https://www.ospedalebambinogesu.it/da-0-a-30-giorni-come-si-presenta-e-come-cresce-80012/#:~:text=In%20media%20il%20peso%20nascita%20%C3%A8%20di%20circa,riguarda%20la%20lunghezza,%20pari%20mediamente%20a%2050%20centimetri.

https://www.my-personaltrainer.it/salute/lunghezza-neonato.html

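Hypotheses:

H₀: the mean of the population from which the sample is drawn equals the reference value

H₁: the mean of the population from which the sample is drawn differs from the reference value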

One-Sample t-Test: Birth Weight vs 3300g
Statistic	Value
t value	-1.51
df	2497
p-value	0.13
Mean	3284.18
H0 Mean	3300
95% CI Lower	3263.58
95% CI Upper	3304.79

One-Sample t-Test: Length vs 500mm
Statistic	Value
t value	-10.07
df	2497
p-value	2.10e-23
Mean	494.7
H0 Mean	500
95% CI Lower	493.66
95% CI Upper	495.73

The one-sample t-tests reveal two different outcomes for weight and length.

For weight, the sample mean is 3284.18 g, compared to the population mean of 3300 g. The p-value is 0.1324, and the 95% confidence interval is [3263.58, 3304.79]. Since the p-value is greater than 0.05 and the interval includes the population mean, we conclude that there is no statistically significant difference. This suggests that the average birth weight in the sample aligns with that of the general population.

On the other hand, length shows a significant deviation. The sample mean is 494.70 mm, the p-value is about 2.1e-23, and the 95% confidence interval [493.66, 495.73] does not include the expected 500 mm. This indicates a statistically significant difference: the newborns in this sample are, on average, shorter.

This may reflect the presence of preterm births in the sample, since premature babies tend to be physically smaller at birth. The mean weight, on the other hand, stays consistent with the reference value. One possible explanation is that the lighter, premature infants are balanced out by other infants who are relatively heavy for their length. This is a hypothesis rather than something the tests above demonstrate.
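The two t statistics can be checked from the summary statistics alone. The sketch below is in Python and rests on two assumptions: the standard deviations are the ones reported in the descriptive section (525.23 g and 26.33 mm), and p-values and confidence intervals use a normal approximation to the t distribution, which is very close for 2497 degrees of freedom but not identical to the values in the tables.

```python
import math
from statistics import NormalDist

N = 2498  # sample size implied by df = 2497 in the tables above


def one_sample_t(sample_mean, sample_sd, mu0, n):
    """Return (t, two-sided p, 95% CI) using a normal approximation."""
    se = sample_sd / math.sqrt(n)
    t = (sample_mean - mu0) / se
    # erfc keeps precision for very small tail probabilities.
    p = math.erfc(abs(t) / math.sqrt(2))
    z = NormalDist().inv_cdf(0.975)
    return t, p, (sample_mean - z * se, sample_mean + z * se)


t_weight, p_weight, ci_weight = one_sample_t(3284.18, 525.23, 3300, N)
t_length, p_length, ci_length = one_sample_t(494.70, 26.33, 500, N)

print(f"weight: t = {t_weight:.2f}, p = {p_weight:.3f}, "
      f"CI = [{ci_weight[0]:.2f}, {ci_weight[1]:.2f}]")
print(f"length: t = {t_length:.2f}, p = {p_length:.2e}, "
      f"CI = [{ci_length[0]:.2f}, {ci_length[1]:.2f}]")
```

Small differences from the tables (for example in the second decimal of t for length) come from the rounding of the inputs and from the normal approximation.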

Now let’s see whether the anthropometric variables differ significantly between the two sexes of the newborns.

Let’s test whether the difference between the means of the two groups is significantly different from zero.

Variance of Anthropometric Variables by Sex
Variable	Female	Male
Peso (weight, g²)	277215.90	243941.29
Lunghezza (length, mm²)	758.73	578.22
Cranio (cranial circumference, mm²)	280.25	247.96

Two-Sample t-Test (Welch): Birth Weight by Sex
Statistic	Value
t value	-12.11
df	2488.67
p-value	< 2.2e-16
Mean F	3161.06
Mean M	3408.5
95% CI Lower	-287.48
95% CI Upper	-207.38

Two-Sample t-Test (Welch): Length by Sex
Statistic	Value
t value	-9.58
df	2457.3
p-value	< 2.2e-16
Mean F	489.76
Mean M	499.67
95% CI Lower	-11.94
95% CI Upper	-7.88

Two-Sample t-Test (Welch): Cranial circumference by Sex
Statistic	Value
t value	-7.44
df	2489.39
p-value	1.41e-13
Mean F	337.62
Mean M	342.46
95% CI Lower	-6.11
95% CI Upper	-3.56

Weight

p-value < 2.2e-16: This extremely small p-value provides strong evidence against the null hypothesis, indicating a statistically significant difference in birth weight between male and female newborns.

95% Confidence Interval: [-287.48, -207.38]: Since the interval does not include 0 and lies entirely below it, male newborns weigh significantly more than females, with a mean difference ranging from approximately 207 g to 287 g.

Variance (F = 277216, M = 243941): Although the variances are relatively close, the Welch Two Sample t-test was used to ensure robustness.

Length

p-value < 2.2e-16: Again, the result is highly significant, suggesting that sex has a notable effect on birth length.

95% Confidence Interval: [-11.94, -7.88]: The interval shows that male newborns are significantly longer than females by approximately 8 to 12 mm.

Variance (F = 758.73, M = 578.22): The difference in variance is more pronounced here, justifying the choice of Welch’s t-test over the classic one.

Head circumference

p-value = 1.41e-13: This result confirms a statistically significant difference in head circumference between sexes.

95% Confidence Interval: [-6.11, -3.56]: Male newborns have a significantly larger cranial circumference than females, with an average difference of about 3.6 to 6.1 mm.

Variance (F = 280.25, M = 247.96): The variances are relatively close, but Welch’s test was still preferred to maintain consistency across comparisons.

All three Welch Two Sample t-tests—on weight, length, and head circumference—demonstrate statistically significant differences between male and female newborns. In all three anthropometric measures, males exhibit higher average values. Given that the group variances were not perfectly equal, especially for length, the use of Welch’s test provided a more reliable and conservative assessment. These findings strongly support the presence of sexual dimorphism at birth within this sample.
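To make the Welch computation concrete, here is a Python sketch that rebuilds the t statistic and the Welch-Satterthwaite degrees of freedom for birth weight from the group means and variances above. The group sizes are an assumption: they are not printed in the report and are inferred here from the sex proportions (0.5024 and 0.4976 of 2498 observations).

```python
import math

# ASSUMPTION: group sizes inferred from the relative frequencies of Sex;
# the report does not print them explicitly.
N_FEMALE, N_MALE = 1255, 1243


def welch(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """Welch two-sample t statistic, degrees of freedom and standard error."""
    a = var_a / n_a
    b = var_b / n_b
    se = math.sqrt(a + b)
    t = (mean_a - mean_b) / se
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (a + b) ** 2 / (a ** 2 / (n_a - 1) + b ** 2 / (n_b - 1))
    return t, df, se


# Female vs male birth weight (means and variances from the tables above).
t_stat, df, se = welch(3161.06, 277215.90, N_FEMALE, 3408.50, 243941.29, N_MALE)
diff = 3161.06 - 3408.50

print(f"difference = {diff:.2f} g, SE = {se:.2f} g")
print(f"t = {t_stat:.2f}, df = {df:.2f}")
```

With these assumed group sizes the degrees of freedom come out very close to the 2488.67 in the table, which suggests the inferred sizes are reasonable.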

Let’s create the Model:

First, let’s create scatterplots of the dependent variable against the quantitative ones, and also a complete one with all the variables:

The set of graphs shown represents the relationships between birth weight and several potentially influential variables. The goal is to understand which factors are most strongly associated with birth weight and which ones seem to have little or no effect.

Starting with maternal age, no relationship emerges: the correlation is practically zero (r = -0.02), and visually the data points are scattered with no clear trend. The same applies to the number of previous pregnancies, which shows a correlation of about zero. In short, neither maternal age nor the number of prior pregnancies shows a linear association with newborn weight when considered on its own.

The situation is different for gestational age: here the correlation reaches r = 0.59, indicating a moderate positive relationship. As expected, the longer a pregnancy lasts, the heavier the baby tends to be, confirming the natural intrauterine growth process.

The strongest associations, however, come from the newborn’s morphometric variables. Length shows a very high correlation (r = 0.80) with weight: longer babies clearly tend to weigh more. Cranial circumference is also strongly correlated (r = 0.70), indicating that body size at birth is a major factor associated with birth weight.

In summary, the analysis suggests that the newborn’s physical characteristics and the length of gestation are the key elements to consider when analyzing birth weight. In contrast, the maternal factors included in this dataset show no evident impact (with the partial exception of gestation length), at least from a linear correlation standpoint.

Full-model:

Type of delivery and hospital could be dropped on logical grounds, but let’s start from the full model and then simplify it.

## 
## Call:
## lm(formula = Peso ~ Anni.madre + N.gravidanze + Fumatrici + Gestazione + 
##     Lunghezza + Cranio + Tipo.parto + Ospedale + Sesso, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1123.26  -181.53   -14.45   161.05  2611.89 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -6735.7960   141.4790 -47.610  < 2e-16 ***
## Anni.madre        0.8018     1.1467   0.699   0.4845    
## N.gravidanze     11.3812     4.6686   2.438   0.0148 *  
## Fumatrici1      -30.2741    27.5492  -1.099   0.2719    
## Gestazione       32.5773     3.8208   8.526  < 2e-16 ***
## Lunghezza        10.2922     0.3009  34.207  < 2e-16 ***
## Cranio           10.4722     0.4263  24.567  < 2e-16 ***
## Tipo.partoNat    29.6335    12.0905   2.451   0.0143 *  
## Ospedaleosp2    -11.0912    13.4471  -0.825   0.4096    
## Ospedaleosp3     28.2495    13.5054   2.092   0.0366 *  
## SessoM           77.5723    11.1865   6.934 5.18e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 274 on 2487 degrees of freedom
## Multiple R-squared:  0.7289, Adjusted R-squared:  0.7278 
## F-statistic: 668.7 on 10 and 2487 DF,  p-value: < 2.2e-16

This model explains about 72.8% of the variability in birth weight (Adjusted R² = 0.7278), with a residual standard error of 274 g.

Predictors significant at the 5% level are gestational age (+32.6 g per week), neonatal length (+10.3 g per mm), cranial circumference (+10.5 g per mm), male sex (+77.6 g), number of pregnancies (+11.4 g per pregnancy), natural delivery (+29.6 g) and hospital 3 (+28.2 g relative to hospital 1).

This is only the full initial model, so it still needs to be simplified.

Let’s simplify model1 into model2:

Let’s remove the variables that are not significant (Anni.madre and Fumatrici), together with Ospedale and Tipo.parto, which are dropped on logical grounds even though some of their coefficients reach the 5% level.

lm(formula = Peso ~ N.gravidanze + Gestazione + Lunghezza + Cranio + Sesso, data = df)

## 
## Call:
## lm(formula = Peso ~ N.gravidanze + Gestazione + Lunghezza + Cranio + 
##     Sesso, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1149.37  -180.98   -15.57   163.69  2639.09 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -6681.7251   135.8036 -49.201  < 2e-16 ***
## N.gravidanze    12.4554     4.3416   2.869  0.00415 ** 
## Gestazione      32.3827     3.8008   8.520  < 2e-16 ***
## Lunghezza       10.2455     0.3008  34.059  < 2e-16 ***
## Cranio          10.5410     0.4265  24.717  < 2e-16 ***
## SessoM          77.9807    11.2111   6.956 4.47e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 274.7 on 2492 degrees of freedom
## Multiple R-squared:  0.727,  Adjusted R-squared:  0.7265 
## F-statistic:  1327 on 5 and 2492 DF,  p-value: < 2.2e-16

This regression model explains about 72.65% of the variability in birth weight (Adjusted R² = 0.7265), with a residual standard error of 274.7 g, practically the same as the full model.

All included variables are statistically significant. Gestational age, length, and cranial circumference have a strong positive association with birth weight, confirming the central role of fetal growth. Male infants tend to weigh significantly more, and a higher number of pregnancies is also linked to a slight increase in weight.

In summary, the model highlights fetal development and sex as the most influential factors in determining birth weight.

Adjusted R² stays almost the same as in the previous model. That means the removed variables didn’t add much, which is a good sign: I simplified the model without losing explanatory power. I also removed the type of delivery because it is only known once the baby is born, at which point predicting the weight is no longer useful.
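The comparison rests on adjusted R², which penalizes R² for the number of predictors. As a small Python check, the reported values can be recovered from the printed R², the sample size (2498) and the number of estimated slopes (10 in the full model, 5 in the reduced one). The inputs are the rounded figures from the summaries, so agreement is only up to rounding.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p slope coefficients and an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


N = 2498  # residual df + number of coefficients (e.g. 2487 + 11)

adj_full = adjusted_r2(0.7289, N, 10)     # model1: 10 slope coefficients
adj_reduced = adjusted_r2(0.7270, N, 5)   # model2: 5 slope coefficients

print(f"full model:    adjusted R2 = {adj_full:.4f}")
print(f"reduced model: adjusted R2 = {adj_reduced:.4f}")
print(f"difference:    {adj_full - adj_reduced:.4f}")
```

The difference is on the order of 0.001, which is what "almost the same" means in practice here.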

Let’s check multicollinearity in the cleaned model 2

Variance Inflation Factors (VIF) – Model 2
Variable	VIF
N.gravidanze	1.02
Gestazione	1.67
Lunghezza	2.08
Cranio	1.62
Sesso	1.04

All VIFs are close to 1 or 2, well below the commonly used warning threshold of 5, so multicollinearity does not appear to be a problem.

Let’s try a logarithmic transformation of the variable Gestazione,

because I think that at some point (at term or slightly beyond) the extra time the baby stays in the womb no longer matters much, so transforming it could be a good idea.

## 
## Call:
## lm(formula = Peso ~ N.gravidanze + log_Gestazione + Lunghezza + 
##     Cranio + Sesso, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1153.33  -180.96   -15.52   162.10  2638.06 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -9680.4831   440.1171 -21.995  < 2e-16 ***
## N.gravidanze      12.2975     4.3449   2.830  0.00469 ** 
## log_Gestazione  1163.2891   140.9956   8.251 2.52e-16 ***
## Lunghezza         10.2554     0.3026  33.886  < 2e-16 ***
## Cranio            10.5297     0.4274  24.639  < 2e-16 ***
## SessoM            78.7070    11.2197   7.015 2.95e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 274.9 on 2492 degrees of freedom
## Multiple R-squared:  0.7265, Adjusted R-squared:  0.726 
## F-statistic:  1324 on 5 and 2492 DF,  p-value: < 2.2e-16

Adjusted R² is essentially the same (0.7260 vs 0.7265 for model2), but the model is harder to read because of the log transformation.

This regression model uses a log-transformed gestational age, which lets each additional week count for less as gestation gets longer. The fit is practically unchanged, with Adjusted R² = 0.726 and a residual standard error of 274.9 grams.

All variables are statistically significant:

log(Gestation) has a strong positive effect (+1163 g per log-unit, i.e. roughly +11.6 g for a 1% increase in gestation length).

Length, cranial circumference and sex (male: +78.7 g) also show strong associations.

Number of pregnancies adds a small but still significant effect (+12.30 g per pregnancy).

Let’s try an interaction between length and cranial circumference:

## 
## Call:
## lm(formula = Peso ~ N.gravidanze + Gestazione + Lunghezza + Cranio + 
##     Sesso + Lunghezza:Cranio, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1150.65  -180.93   -13.48   165.99  2865.46 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -1.803e+03  1.018e+03  -1.771   0.0767 .  
## N.gravidanze      1.293e+01  4.323e+00   2.991   0.0028 ** 
## Gestazione        3.815e+01  3.967e+00   9.616  < 2e-16 ***
## Lunghezza        -3.060e-01  2.203e+00  -0.139   0.8895    
## Cranio           -4.755e+00  3.192e+00  -1.490   0.1365    
## SessoM            7.324e+01  1.120e+01   6.537 7.59e-11 ***
## Lunghezza:Cranio  3.157e-02  6.531e-03   4.835 1.41e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 273.5 on 2491 degrees of freedom
## Multiple R-squared:  0.7296, Adjusted R-squared:  0.7289 
## F-statistic:  1120 on 6 and 2491 DF,  p-value: < 2.2e-16

The R² adjusted is a bit higher, but the model is more complex.

Ŷ = −1803 + 12.93·N.gravidanze + 38.15·Gestazione − 0.306·Lunghezza − 4.755·Cranio + 73.24·SessoM + 0.03157·(Lunghezza × Cranio)

Intercept (-1803): the baseline value of weight when all predictors are 0 (not directly interpretable but necessary for the model).

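Number of Pregnancies (N.gravidanze): each additional pregnancy is associated with ~12.93 grams more birth weight (statistically significant).

Gestational Age: each additional week is associated with ~38.15 grams more birth weight (highly significant).

Body Length and Cranial circumference: the individual coefficients are not statistically significant, but with the interaction in the model they only describe the effect of one variable when the other equals 0, which never happens. The actual effect of length depends on cranial circumference and vice versa: for example, at a cranial circumference of 340 mm, one extra mm of length adds about −0.306 + 0.03157 × 340 ≈ 10.4 g.

Sex (SessoM): male newborns are expected to weigh ~73.24 grams more on average (highly significant).

Length × Cranial circumference (Interaction): this interaction term is positive and highly significant, suggesting that the combination of a longer body and a larger head leads to a higher birth weight than the two effects taken separately.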

R-squared: 0.7296

Adjusted R-squared: 0.7289

The model explains approximately 73% of the variance in birth weight, which indicates a strong fit (it looks like the best model so far).

Residual standard error: 273.5 grams, reflecting the typical deviation of predictions from actual values.

This model confirms that gestational age, sex, and the interaction between body and head size are the most influential predictors of birth weight. The number of pregnancies, although included, plays a lesser role compared to the fetal growth metrics.
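Because of the interaction, the slope for length is not a single number: it changes with cranial circumference, and vice versa. The short Python sketch below evaluates these conditional slopes using the rounded coefficients from the summary above. The function names are illustrative, and the chosen evaluation points (320, 340, 360 mm and 500 mm) are just examples near the observed range.

```python
# Rounded coefficients of the interaction model, copied from the summary above.
B_LENGTH = -0.306      # Lunghezza
B_HEAD = -4.755        # Cranio
B_INTER = 0.03157      # Lunghezza:Cranio


def length_slope(head_mm):
    """Grams per extra mm of length, at a given cranial circumference."""
    return B_LENGTH + B_INTER * head_mm


def head_slope(length_mm):
    """Grams per extra mm of cranial circumference, at a given length."""
    return B_HEAD + B_INTER * length_mm


for head in (320, 340, 360):
    print(f"cranial circumference {head} mm: "
          f"{length_slope(head):.2f} g per mm of length")

print(f"length 500 mm: {head_slope(500):.2f} g per mm of cranial circumference")
```

At typical values (around 340 mm and 500 mm) both conditional slopes are close to the roughly 10 g per mm estimated in the model without interaction, so the negative main-effect coefficients are not a sign that longer babies weigh less.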

Out of curiosity, let’s see what happens to the VIFs in this case:

Generalized Variance Inflation Factors (GVIF) – model_inter (type = ‘predictor’)
Variable	GVIF	Df	GVIF^(1/(2*Df))	Interacts With	Other Predictors
N.gravidanze	1.02	1	1.01	–	Gestazione, Lunghezza, Cranio, Sesso
Gestazione	1.84	1	1.35	–	N.gravidanze, Lunghezza, Cranio, Sesso
Lunghezza	1.89	3	1.11	Cranio	N.gravidanze, Gestazione, Sesso
Cranio	1.89	3	1.11	Lunghezza	N.gravidanze, Gestazione, Sesso
Sesso	1.05	1	1.02	–	N.gravidanze, Gestazione, Lunghezza, Cranio

Now let’s choose the best model among the ones developed:

First let’s see the AIC and BIC of the models:

Model Comparison Using AIC and BIC
Model	AIC	BIC
model1	35145.57	35215.45
model2	35152.89	35193.65
model_log	35157.29	35198.05
model_inter	35131.55	35178.14

Both criteria suggest keeping the model with the interaction between cranial circumference and length (it has the lowest AIC and the lowest BIC), but let’s test whether the improvement over model2 is significant:
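For a Gaussian linear model, AIC and BIC can be written in terms of the residual sum of squares. The Python sketch below applies those formulas to the two RSS values printed in the ANOVA comparison of model_inter and model2 in this report (186293990 and 188042054), counting the error variance as one extra parameter. This parameter count is the convention assumed here; it reproduces the table values to within rounding.

```python
import math


def aic_bic(rss, n, n_coef):
    """AIC and BIC of a Gaussian linear model from its residual sum of squares.

    n_coef counts the regression coefficients including the intercept;
    the error variance is added as one more estimated parameter.
    """
    k = n_coef + 1
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    aic = -2 * log_lik + 2 * k
    bic = -2 * log_lik + k * math.log(n)
    return aic, bic


N = 2498

aic_inter, bic_inter = aic_bic(186293990, N, 7)   # model_inter: 7 coefficients
aic_model2, bic_model2 = aic_bic(188042054, N, 6)  # model2: 6 coefficients

print(f"model_inter: AIC = {aic_inter:.2f}, BIC = {bic_inter:.2f}")
print(f"model2:      AIC = {aic_model2:.2f}, BIC = {bic_model2:.2f}")
```

The extra parameter costs 2 points of AIC and about 7.8 points of BIC (the natural log of 2498), and the drop in RSS more than pays for both.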

## Analysis of Variance Table
## 
## Model 1: Peso ~ N.gravidanze + Gestazione + Lunghezza + Cranio + Sesso + 
##     Lunghezza:Cranio
## Model 2: Peso ~ N.gravidanze + Gestazione + Lunghezza + Cranio + Sesso
##   Res.Df       RSS Df Sum of Sq      F    Pr(>F)    
## 1   2491 186293990                                  
## 2   2492 188042054 -1  -1748064 23.374 1.415e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since even BIC, which penalizes more complex models more heavily, prefers the model with the interaction over model2, let’s keep it. The ANOVA F-test points the same way: the reduction in residual sum of squares is significant.

But let’s try to understand why: probably because when a newborn is long and also has a large head, the entire body has developed in a proportionate and complete way. This leads to a higher weight compared to those who grow in only one of the two dimensions.
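The F statistic in the ANOVA table can be recomputed from the two residual sums of squares and their degrees of freedom. The Python sketch below does that. Since the two models differ by a single coefficient, it also checks that F is approximately the square of the t value of the interaction term (4.835 in the model summary).

```python
# Values copied from the ANOVA table above.
RSS_INTER, DF_INTER = 186293990, 2491   # model with Lunghezza:Cranio
RSS_MAIN, DF_MAIN = 188042054, 2492     # model2, without the interaction

# Partial F-test: extra variation explained per added parameter,
# relative to the residual variance of the larger model.
extra_ss = RSS_MAIN - RSS_INTER
extra_df = DF_MAIN - DF_INTER
f_stat = (extra_ss / extra_df) / (RSS_INTER / DF_INTER)

# With one added coefficient, F should match the squared t value.
T_INTERACTION = 4.835
t_squared = T_INTERACTION ** 2

print(f"extra sum of squares = {extra_ss}")
print(f"F = {f_stat:.3f}, t^2 = {t_squared:.3f}")
```

In other words, for a single added term the ANOVA comparison and the t-test on the interaction coefficient are the same test, which is why their p-values coincide.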

Now let’s evaluate RMSE and the residuals of the model_inter:

Leverage values:

Outliers of the dependent variable:

Outliers Detected by Bonferroni Test – model_inter
Observation	Bonf_p	R_Student	Outlier
1549	0.0e+00	11.18	Yes
155	3.0e-07	5.12	Yes
1305	1.8e-06	4.78	Yes

Cook’s distance lets us look at leverage and outliers together:

## Max Cook = 1.44

The one observation that goes beyond the Cook’s distance threshold of 0.5 is probably obs. n° 1549. It is indeed a very unusual case: the weight is 4370 g and the length only 315 mm, with a cranial circumference of 374 mm. It is a single observation out of 2498, so let’s keep it, although a Cook’s distance this large means its influence on the fit is not negligible.

Tests on residuals:

Diagnostic Tests on Residuals – model_inter
Test	Statistic	DF	p.value
Breusch-Pagan (Homoscedasticity of Residuals)	140.3300	6	< 2.2e-16
Durbin-Watson (Independence of Residuals)	1.9500	NA	1.2110e-01
Shapiro-Wilk (Normality of Residuals)	0.9697	NA	< 2.2e-16

The homoscedasticity test is not passed (we must reject the null hypothesis of constant variance). This means that the variance of the residuals is not the same everywhere, or in other words there are “zones” where the model performs better than in others. The Durbin-Watson test gives no evidence against independence of the residuals (p = 0.12). The Shapiro-Wilk test is not passed either (we reject normality), but with a sample this large the test flags even small departures; the distribution is still quite symmetric and centered on zero, although with a long right tail.

RMSE of the model with interaction:

## RMSE = 273.09

We have an RMSE of 273.09, which means that the predictions made by the model typically differ from the observed values by about 273 grams. Since RMSE is in the same unit as the target variable (the weight), this value tells us directly how much error to expect and how well the model is performing. Since most babies weigh around 3300 g, an RMSE of about 273 grams (roughly 8% of the mean weight) is acceptable but not extremely accurate, as the adjusted R² value had already told us.
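The RMSE and the residual standard error from the model summary are two slightly different summaries of the same residuals: RMSE divides the residual sum of squares by n, the residual standard error by the residual degrees of freedom. The Python sketch below recomputes both from the RSS of the interaction model as printed in the ANOVA table (186293990).

```python
import math

RSS = 186293990       # residual sum of squares of model_inter (ANOVA table)
N = 2498              # number of observations
RESIDUAL_DF = 2491    # N minus the 7 estimated coefficients
MEAN_WEIGHT = 3284.18 # sample mean of birth weight, in grams

rmse = math.sqrt(RSS / N)
residual_se = math.sqrt(RSS / RESIDUAL_DF)
relative_error = rmse / MEAN_WEIGHT

print(f"RMSE = {rmse:.2f} g")
print(f"residual standard error = {residual_se:.1f} g")
print(f"RMSE relative to mean weight = {relative_error:.1%}")
```

With about 2500 observations and only 7 coefficients the two measures are nearly identical, so either can be quoted as the typical prediction error.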

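Let’s try a few predictions. Length and cranial circumference are not specified in this first set; the predicted values are consistent with holding them at their sample means. The ± 273 g shown is the model’s RMSE, not a formal prediction interval: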

## Prediction 1: mother with 3 pregnancies, 39th week of gestation, female newborn
## Predicted birth weight = 3266.38 g  +/- 273 g
## 
## Prediction 2: mother with 0 pregnancies, 38th week of gestation, male newborn
## Predicted birth weight = 3262.68 g  +/- 273 g
## 
## Prediction 3: mother with 2 pregnancies, 40th week of gestation, female newborn
## Predicted birth weight = 3291.60 g  +/- 273 g
## 
## Prediction 4: mother with 4 pregnancies, 37th week of gestation, male newborn
## Predicted birth weight = 3276.26 g  +/- 273 g

Other predictions:

## Prediction 1: A Female newborn from a mother with 1 pregnancies, 38 weeks of gestation,
## Cranial circumference = 345 mm, length = 490 mm
## Predicted birth weight = 3206.66 g  +/- 273 g
## 
## Prediction 2: A Male newborn from a mother with 2 pregnancies, 40 weeks of gestation,
## Cranial circumference = 360 mm, length = 510 mm
## Predicted birth weight = 3751.08 g  +/- 273 g
## 
## Prediction 3: A Female newborn from a mother with 5 pregnancies, 37 weeks of gestation,
## Cranial circumference = 375 mm, length = 500 mm
## Predicted birth weight = 3657.06 g  +/- 273 g
## 
## Prediction 4: A Male newborn from a mother with 0 pregnancies, 41 weeks of gestation,
## Cranial circumference = 340 mm, length = 495 mm
## Predicted birth weight = 3379.98 g  +/- 273 g
## 
## Prediction 5: A Female newborn from a mother with 3 pregnancies, 39 weeks of gestation,
## Cranial circumference = 355 mm, length = 505 mm
## Predicted birth weight = 3541.37 g  +/- 273 g
## 
## Prediction 6: A Male newborn from a mother with 6 pregnancies, 36 weeks of gestation,
## Cranial circumference = 365 mm, length = 485 mm
## Predicted birth weight = 3426.50 g  +/- 273 g
## 
## Prediction 7: A Female newborn from a mother with 2 pregnancies, 35 weeks of gestation,
## Cranial circumference = 350 mm, length = 480 mm
## Predicted birth weight = 3051.28 g  +/- 273 g
## 
## Prediction 8: A Male newborn from a mother with 4 pregnancies, 42 weeks of gestation,
## Cranial circumference = 370 mm, length = 515 mm
## Predicted birth weight = 4023.60 g  +/- 273 g
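These predictions can be approximated by plugging the inputs into the fitted equation. The Python sketch below uses the rounded coefficients from the model summary, so it is not the code that produced the output above and its results differ from the printed ones by a fraction of a gram. The function and argument names are illustrative.

```python
# Rounded coefficients of model_inter, copied from the model summary.
COEF = {
    "intercept": -1803.0,
    "n_pregnancies": 12.93,
    "gestation": 38.15,
    "length": -0.306,
    "head": -4.755,
    "male": 73.24,
    "length_x_head": 0.03157,
}


def predict_weight(n_pregnancies, gestation_weeks, length_mm, head_mm, male):
    """Predicted birth weight in grams from the interaction model."""
    return (
        COEF["intercept"]
        + COEF["n_pregnancies"] * n_pregnancies
        + COEF["gestation"] * gestation_weeks
        + COEF["length"] * length_mm
        + COEF["head"] * head_mm
        + COEF["male"] * (1 if male else 0)
        + COEF["length_x_head"] * length_mm * head_mm
    )


# Inputs of predictions 1, 2 and 8 from the list above.
pred_1 = predict_weight(1, 38, 490, 345, male=False)
pred_2 = predict_weight(2, 40, 510, 360, male=True)
pred_8 = predict_weight(4, 42, 515, 370, male=True)

for label, value in (("1", pred_1), ("2", pred_2), ("8", pred_8)):
    print(f"Prediction {label}: {value:.1f} g")
```

The small gap with respect to the printed predictions comes only from using coefficients rounded to three or four significant digits.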

Visualizations:

In regression plots, the shaded area around the trend line is called the confidence band. It represents the 95% confidence interval for the mean predicted value of the response variable. This means we’re 95% confident that the true average lies within that band at each value of the predictor. It does not reflect the spread of individual data points. Wider bands = more uncertainty; narrower bands = more precise estimates. That’s why the band is narrowest where the data points are most dense.

Another consideration is that, because of how the variables in our dataset are distributed, the predictions tend to be overestimates at the lowest values of the predictors (most points there lie below the regression lines). This is consistent with the failed homoscedasticity test: the variance of the residuals is not the same across the whole range.

Neonatal Health Solutions_Main

2025-05-16