Introduction


This report analyzes the nutrient composition of two stages of Bebelac baby formula milk. It uses statistical tests to determine whether the two stages differ significantly, in order to build a clear picture of the milk and its different stages.


Data


After loading and preparing the data, I calculated the composition by nutrient for each stage by dividing each row of the Stage 2 and Stage 3 data frames by that row's sum. I then calculated the total nutrient composition for each stage to obtain basic descriptive statistics.
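Here is a minimal pandas sketch of the row normalization described above. It assumes each stage is stored as a DataFrame with nutrients as rows and the two label bases (`per 100 g` and `per 100 ml/13.7 g`, the labels that appear in the plots) as columns. The nutrient names and numbers are hypothetical placeholders, not values from the Bebelac label.

```python
import pandas as pd

# Hypothetical stand-in for one stage's table: rows are nutrients, columns are
# the two label bases. The numbers are placeholders, not label data.
stage2 = pd.DataFrame(
    {
        "per 100 g": [11.0, 22.0, 58.0, 40.0],
        "per 100 ml/13.7 g": [1.5, 3.0, 7.9, 5.5],
    },
    index=["Protein", "Fat", "Carbohydrates", "Lactose"],
)

# Divide each row by its own sum; axis=0 aligns the row sums with the index,
# so every row of the result adds up to 1.
stage2_composition = stage2.div(stage2.sum(axis=1), axis=0)

# Column totals give a simple overall summary for the stage.
stage2_totals = stage2.sum(axis=0)

print(stage2_composition.round(3))
print(stage2_totals)
```

The same two calls apply unchanged to the Stage 3 DataFrame.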
I wanted several descriptive statistics for each stage, including the mean, standard deviation, minimum, maximum, and quartiles. To get them, I selected and flattened the data for both stages. I then made a bar plot of the mean and standard deviation of each nutrient for each stage.
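One way to get these statistics is to flatten each stage's values into a single pandas Series and call `describe()`. That method reports the count, mean, sample standard deviation, minimum, quartiles, and maximum. The sketch below uses hypothetical placeholder tables (nutrients as rows, label bases as columns), not the actual label data.

```python
import pandas as pd

# Hypothetical placeholder tables: rows are nutrients, columns are label bases.
index = ["Protein", "Fat", "Carbohydrates"]
stage2 = pd.DataFrame(
    {"per 100 g": [11.0, 22.0, 58.0], "per 100 ml/13.7 g": [1.5, 3.0, 7.9]},
    index=index,
)
stage3 = pd.DataFrame(
    {"per 100 g": [10.5, 21.0, 60.0], "per 100 ml/13.7 g": [1.4, 2.9, 8.2]},
    index=index,
)

# Flatten each table into one Series of values (row by row).
flat2 = pd.Series(stage2.to_numpy().ravel(), name="Stage 2")
flat3 = pd.Series(stage3.to_numpy().ravel(), name="Stage 3")

# describe() gives count, mean, std (ddof=1), min, 25%, 50%, 75%, and max.
summary = pd.concat([flat2.describe(), flat3.describe()], axis=1)
print(summary.round(2))
```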



In the bar plot, the mean values are represented by the heights of the bars, while the error bars extending from the top of each bar indicate the standard deviation.
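Here is a minimal Matplotlib sketch of such a plot. It assumes the per-nutrient means and standard deviations have already been computed; the numbers below are hypothetical placeholders. Passing `yerr` to `Axes.bar` draws an error bar of plus and minus one standard deviation, centered on the top of each bar.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical per-nutrient means and standard deviations for each stage.
nutrients = ["Protein", "Fat", "Carbohydrates"]
means = {"Stage 2": [6.2, 12.5, 33.0], "Stage 3": [5.9, 12.8, 32.1]}
stds = {"Stage 2": [4.7, 9.5, 25.1], "Stage 3": [4.5, 9.6, 24.6]}

x = np.arange(len(nutrients))
width = 0.4
fig, ax = plt.subplots()
for i, stage in enumerate(means):
    # Bar height is the mean; yerr adds a +/- one-standard-deviation error bar.
    ax.bar(x + i * width, means[stage], width, yerr=stds[stage], capsize=4, label=stage)

# Center each nutrient label under its pair of bars.
ax.set_xticks(x + width / 2)
ax.set_xticklabels(nutrients)
ax.set_ylabel("Mean value")
ax.legend()
```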


I then performed a two-sample t-test and a one-way ANOVA F-test to determine whether the observed differences between the two stages are statistically significant.



The horizontal bar plots display the results of two statistical tests: the two-sample t-test and the one-way ANOVA F-test. The t-test compares the means of two groups, in this case the nutrient composition of Stage 2 per 100 g and of Stage 3 per 100 g. The ANOVA F-test compares the means of two or more groups. With exactly two groups, as here, it is equivalent to the standard (pooled-variance) two-sample t-test: F equals the square of t, and the two p-values are identical.
The first horizontal bar plot shows the results of the t-test: the t-statistic and its p-value. The t-statistic measures the difference between the two group means relative to the variation within the groups. The larger its absolute value, the stronger the evidence that the means differ. The p-value is the probability of obtaining a t-statistic at least as extreme as the observed one, assuming the null hypothesis (that the two group means are equal) is true. A small p-value (conventionally below 0.05) indicates that the observed difference is statistically significant, and the null hypothesis is rejected.
The second horizontal bar plot shows the results of the ANOVA F-test: the F-statistic and its p-value. The F-statistic measures the variation between the group means relative to the variation within the groups, and a large F-statistic is evidence that the means differ. The p-value is the probability of obtaining an F-statistic at least as large as the observed one, assuming the null hypothesis (that all group means are equal) is true. As with the t-test, a p-value below 0.05 is conventionally taken as statistically significant, in which case the null hypothesis is rejected.
In this case, both the t-test and the F-test give p-values greater than 0.05 (the common threshold for statistical significance). Neither test therefore detects a significant difference in nutrient composition between stage 2 and stage 3 of Bebelac milk. This is consistent with the two stages having a similar nutrient composition, although failing to reject the null hypothesis does not by itself prove that the compositions are the same.
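Both tests can be run with SciPy's `ttest_ind` and `f_oneway`. The sketch below uses hypothetical placeholder values, not the actual label data. It mainly illustrates a useful fact: with exactly two groups, the one-way ANOVA F-statistic equals the square of the pooled-variance t-statistic, and the two p-values are identical, so the two tests cannot disagree. Because the same nutrients are measured in both stages, a paired test such as `scipy.stats.ttest_rel` may also be worth considering.

```python
from scipy import stats

# Hypothetical per-100 g values for the same six nutrients in each stage.
stage2_per_100g = [11.0, 22.0, 58.0, 40.0, 0.40, 0.30]
stage3_per_100g = [10.5, 21.0, 60.0, 42.0, 0.50, 0.35]

# Two-sample t-test; equal_var=True (the default) pools the two variances.
t_stat, t_p = stats.ttest_ind(stage2_per_100g, stage3_per_100g)

# One-way ANOVA on the same two groups.
f_stat, f_p = stats.f_oneway(stage2_per_100g, stage3_per_100g)

alpha = 0.05  # conventional significance threshold
print(f"t = {t_stat:.4f}, p = {t_p:.4f}")
print(f"F = {f_stat:.4f}, p = {f_p:.4f}")
print("reject H0" if t_p < alpha else "fail to reject H0")
```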
Now let’s create some more detailed visualizations to compare the nutrient levels and nutritional composition of the two stages.
I will start by creating a horizontal bar chart that shows the nutrient levels for each stage.
I set the axis showing the nutrient levels to a logarithmic scale to better visualize the differences in nutrient levels.
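Here is a minimal sketch of such a chart using Matplotlib's `Axes.barh`, with hypothetical nutrient names and values. For horizontal bars the nutrient levels run along the x-axis, so the x-axis is the one set to a logarithmic scale here.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical nutrient levels; placeholders spanning several magnitudes.
nutrients = ["Protein", "Fat", "Carbohydrates", "Calcium", "Iron"]
stage2 = [11.0, 22.0, 58.0, 0.45, 0.006]
stage3 = [10.5, 21.0, 60.0, 0.55, 0.008]

y = np.arange(len(nutrients))
height = 0.4
fig, ax = plt.subplots()
# Offset the two stages vertically so each nutrient gets a pair of bars.
ax.barh(y - height / 2, stage2, height, label="Stage 2")
ax.barh(y + height / 2, stage3, height, label="Stage 3")
ax.set_yticks(y)
ax.set_yticklabels(nutrients)
# A log scale keeps both the large and the very small values readable.
ax.set_xscale("log")
ax.set_xlabel("Nutrient level (log scale)")
ax.legend()
```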





As we can see, the chart shows that some nutrients are present in higher levels in one stage than in the other.
Now let’s create a grouped bar chart that compares the nutritional composition of each nutrient between the two stages.



This chart shows how the proportions of each nutrient differ between the two stages, which can give us a better understanding of how the formulas differ in terms of their nutritional content.
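To compare proportions rather than raw amounts, one option is to divide each stage's column by its total. pandas can then draw the grouped bars with `DataFrame.plot.bar`, which places one group per nutrient and one bar per stage. The values below are hypothetical placeholders.

```python
import matplotlib

matplotlib.use("Agg")
import pandas as pd

# Hypothetical nutrient amounts per 100 g for each stage (placeholder values).
values = pd.DataFrame(
    {"Stage 2": [11.0, 22.0, 58.0, 4.0], "Stage 3": [10.5, 21.0, 60.0, 4.5]},
    index=["Protein", "Fat", "Carbohydrates", "Minerals"],
)

# Dividing by the column sums turns each stage into proportions of its own total.
proportions = values / values.sum(axis=0)

# One group of bars per nutrient, one bar per stage.
ax = proportions.plot.bar(rot=0)
ax.set_ylabel("Share of stage total")
```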
I then created a scatter plot with color-coded nutrient levels for the two stages of Bebelac baby formula. The goal was to identify any nutrients that are consistently present at higher or lower levels in one stage than in the other.



The plot compares nutrient levels in stage 2 and stage 3 and visualizes the relationship between them through the scatter points. The points are color-coded by nutrient level using a normalized colormap. A trendline, added with the seaborn library, represents the relationship between the nutrient levels in stage 2 and stage 3.
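Here is a minimal Matplotlib sketch of this kind of plot, with hypothetical values. The original analysis used seaborn for the trendline. Here a least-squares line from `numpy.polyfit` is swapped in so the example depends only on NumPy and Matplotlib. Coloring each point by the average of its two stage levels is also an assumption made for illustration. `matplotlib.colors.Normalize` maps those values onto the colormap's 0 to 1 range.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

# Hypothetical nutrient levels: element i is the same nutrient in both stages.
stage2 = np.array([11.0, 22.0, 58.0, 40.0, 4.0])
stage3 = np.array([10.5, 21.0, 60.0, 42.0, 4.5])

# Color each point by its average level, scaled to [0, 1] for the colormap.
level = (stage2 + stage3) / 2
norm = Normalize(vmin=level.min(), vmax=level.max())

fig, ax = plt.subplots()
points = ax.scatter(stage2, stage3, c=level, cmap="viridis", norm=norm)
fig.colorbar(points, ax=ax, label="Mean nutrient level")

# Least-squares trendline (stand-in for seaborn's regression line).
slope, intercept = np.polyfit(stage2, stage3, deg=1)
xs = np.linspace(stage2.min(), stage2.max(), 100)
ax.plot(xs, slope * xs + intercept, color="black", label="Least-squares fit")
ax.set_xlabel("Stage 2 level")
ax.set_ylabel("Stage 3 level")
ax.legend()
```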
The plot also shows the correlation coefficient between the nutrient levels in the two stages. A correlation coefficient of 1.00 indicates a perfect positive linear correlation: the points lie on an upward-sloping straight line. As the nutrient level increases in one stage, it therefore increases at a constant rate in the other; in other words, the two variables move in perfect tandem. This does not by itself mean the levels are proportional, because the line need not pass through the origin.
It is important to note that a correlation of 1.00 does not necessarily imply causation, but it does suggest that there is a strong relationship between the two stages of the milk.
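Two properties of the Pearson correlation coefficient are easy to check with `numpy.corrcoef`, using made-up numbers. First, a linear relationship with a nonzero intercept still gives r = 1, even though the values are not proportional. Second, a few large values can push r close to 1 even when the small values disagree in order. The second point is worth keeping in mind whenever the compared values differ greatly in size.

```python
import numpy as np

x = np.array([1.0, 2.0, 5.0, 10.0])

# Proportional (y = 2x) and linear-but-shifted (y = 2x + 3): both give r = 1.
r_proportional = np.corrcoef(x, 2.0 * x)[0, 1]
r_shifted = np.corrcoef(x, 2.0 * x + 3.0)[0, 1]

# Two small values in reversed order, two large values that agree closely.
small_and_large_a = np.array([0.01, 0.02, 50.0, 100.0])
small_and_large_b = np.array([0.02, 0.01, 51.0, 99.0])
r_mixed = np.corrcoef(small_and_large_a, small_and_large_b)[0, 1]

print(round(r_proportional, 6), round(r_shifted, 6), round(r_mixed, 4))
```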
To draw more accurate conclusions about nutrient levels and make more informed decisions based on my data, I decided to visualize the data with another scatter plot, this time with confidence intervals.
The confidence intervals let us compare the mean nutrient levels of the two stages and give an informal indication of whether the difference between them is statistically significant.
I first calculated the mean nutrient levels, standard deviations, and standard errors of the mean for the two stages (stage 2 and stage 3), and then the 95% confidence interval for each stage's mean. I plotted the nutrient-level data for both stages as a scatter plot with color-coded nutrient levels and overlaid the 95% confidence intervals for each stage as vertical red lines. The x-axis shows the stage number (2 or 3), and the y-axis shows the nutrient levels on a logarithmic scale.
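Here is a minimal sketch of the interval calculation using NumPy and SciPy, with hypothetical values. It assumes a t-based interval: the mean plus or minus the t critical value times the standard error. The original analysis may instead have used the normal approximation (1.96 standard errors), which gives a slightly narrower interval for small samples.

```python
import numpy as np
from scipy import stats


def mean_ci(values, confidence=0.95):
    """Return (mean, lower, upper) for a t-based confidence interval of the mean."""
    values = np.asarray(values, dtype=float)
    n = values.size
    mean = values.mean()
    sd = values.std(ddof=1)  # sample standard deviation
    sem = sd / np.sqrt(n)  # standard error of the mean
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    return mean, mean - t_crit * sem, mean + t_crit * sem


# Hypothetical nutrient levels for each stage (placeholder values).
stage2 = [11.0, 22.0, 58.0, 40.0, 4.0]
stage3 = [10.5, 21.0, 60.0, 42.0, 4.5]

for label, data in [("Stage 2", stage2), ("Stage 3", stage3)]:
    mean, lower, upper = mean_ci(data)
    print(f"{label}: mean = {mean:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

When values span several orders of magnitude, the lower bound can fall below zero; for the placeholder Stage 2 values above it does. Such a bound cannot be drawn on a logarithmic axis, so it is worth checking before plotting.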





In this plot, the confidence intervals are calculated for the mean nutrient levels of both stages (2 and 3), and are shown in the plot as red lines.
As we can see, the confidence intervals for the two stages of the milk overlap substantially. This suggests that there may not be a statistically significant difference between the two means, but it does not mean that there is no difference between the groups. A true difference may exist but be too small to detect with the current sample size or statistical method. The results should therefore be interpreted with care, taking into account other factors such as effect size, sample size, and statistical power.
I also created a heat map to compare nutrient levels between the stages of Bebelac baby formula. It shows the mean and standard deviation of each nutrient in each stage side by side. I colored it with a gradient from cool to warm colors, with blue representing lower levels and red representing higher levels. I also added annotations showing the exact value in each cell.
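Here is a minimal Matplotlib sketch of an annotated heat map, with hypothetical values. It uses `imshow` with the `coolwarm` colormap and writes each value into its cell with `Axes.text`. seaborn's `heatmap` with `annot=True` is a common alternative. Note that a single color scale shared by nutrients of very different magnitudes is dominated by the largest values.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical summary table: one row per nutrient, mean and std per stage.
summary = pd.DataFrame(
    {
        "Stage 2 mean": [6.2, 12.5, 33.0],
        "Stage 2 std": [4.7, 9.5, 25.1],
        "Stage 3 mean": [5.9, 12.8, 32.1],
        "Stage 3 std": [4.5, 9.6, 24.6],
    },
    index=["Protein", "Fat", "Carbohydrates"],
)

fig, ax = plt.subplots()
# coolwarm runs from blue (low) to red (high).
image = ax.imshow(summary.to_numpy(), cmap="coolwarm", aspect="auto")
ax.set_xticks(range(summary.shape[1]))
ax.set_xticklabels(summary.columns, rotation=45, ha="right")
ax.set_yticks(range(summary.shape[0]))
ax.set_yticklabels(summary.index)

# Annotate every cell; imshow places cell (row i, column j) at x=j, y=i.
for i in range(summary.shape[0]):
    for j in range(summary.shape[1]):
        ax.text(j, i, f"{summary.iat[i, j]:.1f}", ha="center", va="center")

fig.colorbar(image, ax=ax)
```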





From the heat map above, we can compare the nutrient levels between stages and spot notable differences. For example, the mean levels of most nutrients are higher in stage 3 than in stage 2. The exceptions are lactose, carbohydrates, and protein, which have higher mean levels in stage 2. This pattern indicates that there may be some important differences in the nutrient profiles of the two stages.
Additionally, the lower standard deviation of most nutrients in stage 3 suggests that the nutrient levels are more consistent or less variable in this stage compared to stage 2. This could be an important factor to consider when evaluating the quality or nutritional value of these products.
However, it’s important to note that the differences in nutrient levels and standard deviations between the two stages may not necessarily be significant without further statistical analysis. It’s also important to consider other factors that may affect the nutrient levels, such as differences in production methods or ingredient sourcing.
Next, I created a box plot and a violin plot side by side. Together they show the distribution of nutrient composition for each nutrient in the concatenated DataFrame, which includes data from both stages of Bebelac milk.





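The box plot shows the distribution of nutrient composition for each nutrient through four elements:
- the median (the line inside the box);
- the interquartile range (the box);
- the whiskers, which by default in Matplotlib and seaborn extend to the most extreme data points within 1.5 times the interquartile range from the box;
- any outliers (points beyond the whiskers).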
The violin plot shows the distribution of nutrient composition for each nutrient as a kernel density plot mirrored and rotated around a central axis. The width of the violin at any given point represents the density of the data at that point.
Together, these plots give us a visual representation of the spread and density of the nutrient composition for each nutrient. We can see the distribution of the data, the presence of outliers, and the central tendency of each nutrient. By comparing the plots for each nutrient, we can also observe any differences in the distribution of nutrient composition between the two stages of Bebelac milk.
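Here is a minimal Matplotlib sketch of the side-by-side layout. Small hypothetical samples stand in for the concatenated DataFrame's columns. `whis=1.5` is Matplotlib's default, written out here for clarity; it makes the whiskers reach the most extreme points within 1.5 times the interquartile range of the box.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical samples of composition values for each nutrient (placeholders).
data = {
    "Protein": [0.08, 0.09, 0.10, 0.11, 0.12, 0.25],
    "Fat": [0.20, 0.22, 0.23, 0.24, 0.26, 0.27],
    "Carbohydrates": [0.50, 0.55, 0.56, 0.58, 0.60, 0.62],
}
names = list(data)
samples = [data[name] for name in names]
positions = list(range(1, len(names) + 1))  # default positions of both plots

fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Box plot: median line, IQR box, whiskers, and outliers drawn as points.
box_parts = ax_box.boxplot(samples, whis=1.5)
ax_box.set_title("Box plot")

# Violin plot: a mirrored kernel density estimate for each nutrient.
violin_parts = ax_violin.violinplot(samples, showmedians=True)
ax_violin.set_title("Violin plot")

for ax in (ax_box, ax_violin):
    ax.set_xticks(positions)
    ax.set_xticklabels(names)
```

With these placeholder values, the 0.25 protein reading lies beyond the upper whisker and is drawn as an outlier.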
After identifying the nutrient differences and visualizing the overall distribution of the data, I wanted to be more specific. I compared the macronutrient composition of the formula in each stage with a stacked bar chart, using the Matplotlib library.
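Here is a minimal Matplotlib sketch of a stacked bar chart of macronutrient proportions, with hypothetical gram values. Each macronutrient is drawn with `Axes.bar`, and the `bottom` argument stacks it on top of the ones already drawn. Converting grams to shares of the stage total is an assumption made here so that each bar reaches 1.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

stages = ["Stage 2", "Stage 3"]
# Hypothetical grams per 100 g of powder for each stage (placeholders).
macros = {
    "Carbohydrates": np.array([58.0, 56.0]),
    "Fat": np.array([22.0, 21.0]),
    "Protein": np.array([11.0, 13.0]),
}
totals = sum(macros.values())  # element-wise total per stage

fig, ax = plt.subplots()
bottom = np.zeros(len(stages))
for name, grams in macros.items():
    share = grams / totals
    ax.bar(stages, share, bottom=bottom, label=name)
    bottom += share  # the next segment starts where this one ends

ax.set_ylabel("Share of macronutrients")
ax.legend()
```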

The stacked bar chart is important because it visually displays the proportion of carbohydrates, fats, and proteins in each stage. This information can be crucial for parents and caregivers who need to ensure that infants are getting the proper nutrition they need to grow and develop.
The chart allows for an easy comparison between the two stages, showing how the distribution of macronutrients changes as the infant grows and its nutritional needs change. For example, the proportion of carbohydrates is slightly higher in the stage 2 formula than in stage 3, which may reflect a greater need for energy in the earlier stages of growth.
Overall, the stacked bar chart is a helpful tool for understanding the nutritional composition of infant formula, and can assist parents and caregivers in making informed decisions about their child’s diet.
To further explore the distribution of nutrients by stage, I used pandas to sum the nutrient values for each stage and then plotted the data as pie charts. I also wanted to compare the total nutrient composition of each stage. These charts show the overall nutrient composition of each stage and which nutrients are the most prevalent.
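Here is a minimal sketch of this step, assuming hypothetical long-format data with one row per stage and nutrient. `groupby` sums the values, `unstack` moves the nutrients into columns, and `Axes.pie` with `autopct` labels each wedge with its percentage.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical long-format data: one row per (stage, nutrient) value.
df = pd.DataFrame(
    {
        "stage": ["Stage 2"] * 3 + ["Stage 3"] * 3,
        "nutrient": ["Carbohydrates", "Fat", "Protein"] * 2,
        "value": [58.0, 22.0, 11.0, 56.0, 21.0, 13.0],
    }
)

# Sum per stage and nutrient, then make one row per stage.
by_stage = df.groupby(["stage", "nutrient"])["value"].sum().unstack("nutrient")

fig, axes = plt.subplots(1, len(by_stage), figsize=(10, 4))
for ax, (stage, row) in zip(axes, by_stage.iterrows()):
    # autopct writes each wedge's share of the stage total as a percentage.
    wedges, texts, autotexts = ax.pie(row, labels=row.index, autopct="%1.1f%%")
    ax.set_title(stage)
```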



The pie charts above provide useful visualizations of the nutrient content of Bebelac baby formula by stage. The charts showing the distribution of nutrients by stage help to show the relative proportion of each nutrient in each stage of the formula, which can help reveal differences in nutrient composition between stages.
The pie chart of the total nutrient composition for each stage shows the overall nutrient content of each stage. This can be useful in determining if there are any stages that have a particularly high or low overall nutrient content, which could be relevant to the nutritional needs of infants in different age groups.
In the end, I used Python to create a variety of visualizations to explore the nutrient levels in Bebelac baby formula milk. The plots and statistics highlighted where the two stages differ and gave a better understanding of the formula's nutrient composition. This can help both parents and healthcare professionals make informed decisions about infant nutrition.
However, while some nutrients differ noticeably between the two stages, others differ only slightly or not at all, and the overall tests found no statistically significant difference. It is therefore essential to choose a baby formula based on the specific needs of the baby and to consult a healthcare provider for advice on proper infant nutrition.