Data
After loading and manipulating the data, I wanted to calculate the
composition by nutrient for each stage by dividing each row of the
Stage 2 and Stage 3 data frames by the sum of that row. I then
calculated the total nutrient composition for each stage to get some
basic descriptive statistics for each stage.
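The row normalization described above can be sketched in pandas as follows. This is a minimal sketch. The table layout is an assumption: rows are nutrients, columns are the two label bases, and the values are invented for illustration. `DataFrame.div(..., axis=0)` divides each row by its own total, and `describe()` then reports the count, mean, standard deviation, minimum, quartiles and maximum of each column.

```python
import pandas as pd

# Hypothetical stand-in for the Stage 2 data frame (values invented):
# rows are nutrients, columns are the two bases printed on the label.
stage2 = pd.DataFrame(
    {"per 100 g": [57.0, 23.0, 10.5], "per 100 ml/13.7 g": [7.8, 3.2, 1.4]},
    index=["Carbs", "Fat", "Protein"],
)

# Divide every row by its own row total, so each row now sums to 1.
stage2_comp = stage2.div(stage2.sum(axis=1), axis=0)

# Count, mean, std, min, quartiles and max for each column.
summary = stage2_comp.describe()
print(summary)
```

The same two lines would be applied unchanged to the Stage 3 data frame.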
I wanted to calculate several descriptive statistics for each stage,
including the mean, standard deviation, minimum, maximum, and quartiles.
To do this, I selected and flattened the data for both stages and
plotted the mean and standard deviation of each nutrient for each stage
as a bar chart.

In the bar plot, the mean values are represented by the heights of
the bars, while the error bars extending from the top of each bar
indicate the standard deviation.
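A bar plot of this kind can be drawn with matplotlib by passing the standard deviations as `yerr`. This is a sketch under assumptions: both stage tables are hypothetical, with one column per label basis, and each bar shows that column's mean with its sample standard deviation.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stage tables (rows: nutrients, columns: label bases).
stage2 = pd.DataFrame({"per 100 g": [57.0, 23.0, 10.5],
                       "per 100 ml/13.7 g": [7.8, 3.2, 1.4]})
stage3 = pd.DataFrame({"per 100 g": [55.0, 24.0, 11.5],
                       "per 100 ml/13.7 g": [7.5, 3.3, 1.6]})

x = np.arange(len(stage2.columns))
width = 0.38
fig, ax = plt.subplots()
for offset, (name, df) in zip((-width / 2, width / 2),
                              [("Stage 2", stage2), ("Stage 3", stage3)]):
    # Bar height = column mean; error bar = +/- one sample SD (pandas uses ddof=1).
    ax.bar(x + offset, df.mean().to_numpy(), width,
           yerr=df.std().to_numpy(), capsize=4, label=name)
ax.set_xticks(x)
ax.set_xticklabels(stage2.columns)
ax.set_ylabel("Mean nutrient level")
ax.legend()
```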
The horizontal bar plots display the results of two statistical
tests: the two-sample t-test and the one-way ANOVA F-test. The t-test
compares the means of two groups, in this case the nutrient composition
of Stage 2 per 100 g and Stage 3 per 100 g. The ANOVA F-test compares
the means of two or more groups. With exactly two groups it is
equivalent to the pooled-variance t-test (F = t²) and gives the same
p-value.
The first horizontal bar plot shows the results of the t-test,
displaying the t-statistic and p-value. The t-statistic measures the
difference between the means of the two groups relative to the variation
within each group. The further it is from zero, the stronger the
evidence that the means differ. The p-value is the probability of
obtaining a t-statistic at least as extreme as the observed one,
assuming that the null hypothesis (i.e., the means of the two groups are
equal) is true. A small p-value (typically less than 0.05) indicates
that the observed difference between the means is statistically
significant, and the null hypothesis is rejected at that significance
level.
The second horizontal bar plot shows the results of the ANOVA
F-test, displaying the F-statistic and p-value. The F-statistic measures
the variation between the group means relative to the variation within
each group. A large F-statistic is evidence that at least one group mean
differs from the others. The p-value is the probability of obtaining an
F-statistic at least as large as the observed one, assuming that the
null hypothesis (i.e., the means of all groups are equal) is true. As
with the t-test, a p-value below the chosen threshold (typically 0.05)
leads us to reject the null hypothesis.
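Both tests can be run with SciPy. With two groups, a pooled-variance t-test and a one-way ANOVA should agree exactly. The arrays below are hypothetical nutrient values, not the Bebelac data. `scipy.stats.ttest_ind` uses the pooled-variance (Student) form by default.

```python
import numpy as np
from scipy import stats

# Hypothetical nutrient values for the two stages (invented for illustration).
stage2 = np.array([7.8, 3.2, 1.4, 0.9, 0.05, 0.6])
stage3 = np.array([7.6, 3.4, 1.6, 1.0, 0.06, 0.7])

t_res = stats.ttest_ind(stage2, stage3)  # two-sample t-test, equal variances assumed
f_res = stats.f_oneway(stage2, stage3)   # one-way ANOVA on the same two groups

print(f"t = {t_res.statistic:.3f}, p = {t_res.pvalue:.3f}")
print(f"F = {f_res.statistic:.3f}, p = {f_res.pvalue:.3f}")

# With exactly two groups: F equals t squared and the p-values coincide.
assert np.isclose(f_res.statistic, t_res.statistic ** 2)
assert np.isclose(f_res.pvalue, t_res.pvalue)
```

If each nutrient appears once in each stage, the observations are paired. In that case a paired test such as `scipy.stats.ttest_rel` may be worth comparing against. That is a suggestion, not what the analysis above used.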
In this case, both the t-test and the F-test show no statistically
significant difference in nutrient composition between stage 2 and
stage 3 of Bebelac milk, since the p-values are greater than 0.05 (the
common threshold for statistical significance). This does not prove that
the two compositions are identical. However, the data give no evidence
that they differ, which is consistent with the nutrient composition
being broadly similar between the two stages of Bebelac milk.
Now let’s create some more detailed visualizations to compare the
nutrient levels and nutritional composition of the two stages.
I will start by creating a horizontal bar chart that shows the
nutrient levels for each stage.
Because the nutrient levels differ widely in magnitude, I set the
nutrient-level axis to a logarithmic scale. This keeps small values
visible next to large ones.
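A minimal matplotlib sketch of a horizontal bar chart with a logarithmic value axis is shown below. The nutrient names and levels are hypothetical. In `barh` the bar lengths run along the x-axis, so that is the axis given the log scale.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical nutrient levels in a common unit (invented for illustration).
nutrients = ["Carbs", "Fat", "Protein", "Calcium", "Iron"]
stage2 = np.array([7.8, 3.2, 1.4, 0.09, 0.001])
stage3 = np.array([7.6, 3.4, 1.6, 0.10, 0.0012])

y = np.arange(len(nutrients))
height = 0.4
fig, ax = plt.subplots()
# Two bars per nutrient, shifted up and down so they sit side by side.
ax.barh(y - height / 2, stage2, height=height, label="Stage 2")
ax.barh(y + height / 2, stage3, height=height, label="Stage 3")
ax.set_yticks(y)
ax.set_yticklabels(nutrients)
ax.set_xscale("log")  # bar lengths run along x in a horizontal chart
ax.set_xlabel("Nutrient level (log scale)")
ax.legend()
```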

As the chart shows, some nutrients are present at higher levels in
one stage than in the other.
Now let’s create a grouped bar chart that compares the nutritional
composition of each nutrient between the two stages.

This chart shows how the proportions of each nutrient differ between
the two stages, which can give us a better understanding of how the
formulas differ in terms of their nutritional content.
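The grouped bar chart can be sketched by first converting each stage to proportions of its own total, then offsetting the two sets of bars. The values below are hypothetical grams per 100 g. Normalizing per stage is an assumption about how "composition" is defined here.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

nutrients = ["Carbs", "Fat", "Protein", "Other"]
stage2 = np.array([57.0, 23.0, 10.5, 9.5])  # hypothetical g per 100 g
stage3 = np.array([55.0, 24.0, 11.5, 9.5])

# Proportions within each stage, so both formulas are on the same footing.
comp2 = stage2 / stage2.sum()
comp3 = stage3 / stage3.sum()

x = np.arange(len(nutrients))
width = 0.38
fig, ax = plt.subplots()
ax.bar(x - width / 2, comp2, width, label="Stage 2")
ax.bar(x + width / 2, comp3, width, label="Stage 3")
ax.set_xticks(x)
ax.set_xticklabels(nutrients)
ax.set_ylabel("Share of total")
ax.legend()
```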
The next plot compares nutrient levels in stage 2 and stage 3 by
plotting them against each other as a scatter plot. The points are
color-coded by nutrient level using a normalized colormap, and a
trendline drawn with the seaborn library summarizes the relationship
between the nutrient levels in stage 2 and stage 3.
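The sketch below builds a scatter plot of this kind with a normalized colormap. The data are hypothetical. It deliberately swaps seaborn for `numpy.polyfit`, which draws an ordinary least-squares line. That is the same kind of linear fit seaborn's `regplot` shows, minus its confidence band, so the sketch depends only on matplotlib and NumPy.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
import numpy as np

stage2 = np.array([7.8, 3.2, 1.4, 0.9, 0.6])  # hypothetical levels
stage3 = np.array([7.6, 3.4, 1.6, 1.0, 0.7])

# Scale colors to the observed range of stage 2 levels.
norm = Normalize(vmin=stage2.min(), vmax=stage2.max())
fig, ax = plt.subplots()
sc = ax.scatter(stage2, stage3, c=stage2, cmap="viridis", norm=norm)
fig.colorbar(sc, ax=ax, label="Stage 2 level")

# Least-squares trendline (stand-in for seaborn's regplot fit).
slope, intercept = np.polyfit(stage2, stage3, deg=1)
xs = np.linspace(stage2.min(), stage2.max(), 100)
ax.plot(xs, slope * xs + intercept, color="black")

# Pearson correlation shown in the title, rounded to two decimals.
r = np.corrcoef(stage2, stage3)[0, 1]
ax.set_title(f"r = {r:.2f}")
ax.set_xlabel("Stage 2")
ax.set_ylabel("Stage 3")
```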
The plot also shows the correlation coefficient between the nutrient
levels in the two stages. The displayed value of 1.00 (rounded to two
decimals) indicates a nearly perfect positive linear correlation.
Nutrients with high levels in stage 2 also have high levels in stage 3,
and the points lie almost exactly on a straight line. Strictly, a
coefficient of 1 means that one stage’s levels are a linear function of
the other’s. It does not require them to increase proportionally.
Because the nutrient levels span a wide range, a few nutrients with
large values can also push the coefficient close to 1 on their own.
It is important to note that correlation does not imply causation.
Here it simply indicates that the two stages of the milk have a very
similar nutrient profile.
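Two caveats about Pearson's r can be shown with NumPy on small invented arrays. First, an exact linear relationship that is not proportional (y = 2x + 3) still gives r = 1. Second, one very large value can drive r close to 1 even when the remaining points are perfectly negatively related.

```python
import numpy as np

# Linear but not proportional: y = 2x + 3 still gives r = 1.
x = np.array([1.0, 2.0, 3.0, 4.0])
r_linear = np.corrcoef(x, 2 * x + 3)[0, 1]

# One large value dominates: the four small pairs alone are perfectly
# negatively correlated, yet r over all five points is close to +1.
a = np.array([1.0, 2.0, 3.0, 4.0, 1000.0])
b = np.array([4.0, 3.0, 2.0, 1.0, 1000.0])
r_all = np.corrcoef(a, b)[0, 1]
r_small = np.corrcoef(a[:4], b[:4])[0, 1]

print(r_linear, r_all, r_small)
```

Computing the correlation on log-transformed levels is one simple way to check whether a few large nutrients are driving the coefficient.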
To draw more accurate conclusions about nutrient levels and make
more informed decisions based on my data, I decided to visualize the
data with another scatter plot, this time with confidence intervals.
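We can use the confidence intervals to compare the mean nutrient
levels of the two stages and get a sense of whether the difference
between them is statistically significant. Two 95% intervals that do
not overlap indicate a significant difference at the 5% level.
Overlapping intervals, however, do not by themselves rule one out.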
I first calculated the mean nutrient levels, standard deviations,
and standard errors of the mean for the two stages (stage 2 and stage 3)
and then calculated the 95% confidence interval for each stage. I then
plotted the nutrient-level data for both stages as a scatter plot with
color-coded nutrient levels and overlaid the 95% confidence intervals
for each stage as vertical red lines. The x-axis shows the stage number
(2 or 3), and the y-axis shows the nutrient levels on a logarithmic
scale.
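The calculation and plot described above can be sketched as follows. The nutrient levels are hypothetical. The interval uses the t distribution with n − 1 degrees of freedom (mean ± t × SEM, where SEM = SD / √n). Whether the original used the t value or the normal approximation of 1.96 is not stated; for large samples the two are close.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Hypothetical nutrient levels per stage.
levels = {
    2: np.array([4.0, 5.5, 6.1, 7.2, 5.0]),
    3: np.array([4.4, 5.9, 6.6, 7.0, 5.3]),
}

def mean_ci(x, confidence=0.95):
    """Return (mean, lower, upper) for a t-based confidence interval."""
    n = len(x)
    mean = x.mean()
    sem = x.std(ddof=1) / np.sqrt(n)  # standard error of the mean
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    return mean, mean - t_crit * sem, mean + t_crit * sem

fig, ax = plt.subplots()
cis = {}
for stage, x in levels.items():
    cis[stage] = mean_ci(x)
    # Individual levels at the stage's x position, colored by value.
    ax.scatter(np.full(len(x), stage), x, c=x, cmap="viridis")
    _, lo, hi = cis[stage]
    ax.vlines(stage, lo, hi, colors="red", linewidth=3)  # 95% CI
ax.set_xticks([2, 3])
ax.set_yscale("log")  # a non-positive lower bound cannot be shown on a log axis
ax.set_xlabel("Stage")
ax.set_ylabel("Nutrient level")
```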

In this plot, the confidence intervals are calculated for the mean
nutrient levels of both stages (2 and 3), and are shown in the plot as
red lines.
As we can see, the confidence intervals for the two stages of the
milk overlap substantially. This suggests that there may not be a
statistically significant difference between the means of the two
groups, but it does not mean that there is no difference between them.
A true difference may still exist but be too small to detect with the
current sample size or statistical method. It is therefore important to
interpret these results carefully and to consider effect size, sample
size, and statistical power alongside the p-values.
From the heat map above, we can compare the nutrient levels between
stages and spot notable differences. For example, the mean levels of
most nutrients are higher in stage 3 than in stage 2. The exceptions are
lactose, carbs, and protein, which have higher mean levels in stage 2.
This pattern suggests that there may be some important differences in
the nutrient profiles of these two stages.
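A heat map of mean levels per nutrient and stage can be sketched with matplotlib's `imshow`, with each cell annotated with its value. The table is hypothetical, and the original heat map may have included other statistics such as the standard deviation. `idxmax(axis=1)` reports which stage has the higher mean for each nutrient.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical mean levels (rows: nutrients, columns: stages).
means = pd.DataFrame(
    {"Stage 2": [5.6, 3.2, 1.6, 0.9], "Stage 3": [5.2, 3.4, 1.4, 1.0]},
    index=["Lactose", "Fat", "Protein", "Minerals"],
)

fig, ax = plt.subplots()
im = ax.imshow(means.to_numpy(), cmap="YlGnBu", aspect="auto")
ax.set_xticks(range(means.shape[1]))
ax.set_xticklabels(means.columns)
ax.set_yticks(range(means.shape[0]))
ax.set_yticklabels(means.index)
# Write each cell's value on top of its color.
for i in range(means.shape[0]):
    for j in range(means.shape[1]):
        ax.text(j, i, f"{means.iat[i, j]:.2f}", ha="center", va="center")
fig.colorbar(im, ax=ax)

# Stage with the higher mean for each nutrient.
higher = means.idxmax(axis=1)
print(higher)
```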
Additionally, the lower standard deviation of most nutrients in
stage 3 suggests that the nutrient levels are less variable in this
stage than in stage 2. This could be a factor to consider when
evaluating the quality or nutritional value of these products.
However, it’s important to note that the differences in nutrient
levels and standard deviations between the two stages may not be
statistically significant; confirming that would require further
statistical analysis. It’s also important to consider other factors
that may affect the nutrient levels, such as differences in production
methods or ingredient sourcing.
Next, I created a box plot and a violin plot side by side, showing
the distribution of nutrient composition for each nutrient in the
concatenated DataFrame, which includes data from both stages of
Bebelac milk.
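A side-by-side layout like this can be sketched with two matplotlib axes. The concatenated DataFrame here is hypothetical: random values stand in for the real data, one column per nutrient. Tick labels are set explicitly rather than through `boxplot`'s label argument, whose name differs between matplotlib versions.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical concatenated DataFrame: one column per nutrient,
# rows from both stages stacked together (random stand-in values).
combined = pd.DataFrame({
    "Carbs": rng.normal(7.7, 0.3, 20),
    "Fat": rng.normal(3.3, 0.2, 20),
    "Protein": rng.normal(1.5, 0.1, 20),
})

data = [combined[col].to_numpy() for col in combined.columns]
fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
ax_box.boxplot(data)                       # median, IQR box, whiskers, outliers
ax_box.set_title("Box plot")
ax_violin.violinplot(data, showmedians=True)  # mirrored kernel density estimate
ax_violin.set_title("Violin plot")
for ax in (ax_box, ax_violin):
    # Both functions place groups at positions 1..n by default.
    ax.set_xticks(range(1, len(combined.columns) + 1))
    ax.set_xticklabels(combined.columns)
```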

The box plot shows the distribution of nutrient composition for each
nutrient by displaying the median (the line in the box), the
interquartile range (the box), the whiskers, and any outliers (points
beyond the whiskers). By default, the whiskers extend to the most
extreme data points within 1.5 times the interquartile range of the box.
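The whisker rule can be made concrete with NumPy. This sketch computes the quartiles, whisker ends and outliers the way a Tukey-style box plot with the default `whis=1.5` does. For typical data it should match `matplotlib.cbook.boxplot_stats`. The sample values are invented.

```python
import numpy as np

def box_stats(x, whis=1.5):
    """Quartiles, whisker ends and outliers for a Tukey-style box plot."""
    x = np.asarray(x, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    # Whiskers stop at the most extreme data points still inside the fences.
    whislo = x[x >= lo_fence].min()
    whishi = x[x <= hi_fence].max()
    fliers = x[(x < lo_fence) | (x > hi_fence)]
    return {"q1": q1, "med": med, "q3": q3,
            "whislo": whislo, "whishi": whishi, "fliers": fliers}

result = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])
print(result)
```

Here the whiskers stop at 1 and 9 rather than at the full range, and 40 is drawn as an outlier.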
The violin plot shows the distribution of nutrient composition for
each nutrient as a kernel density plot mirrored and rotated around a
central axis. The width of the violin at any given point represents the
density of the data at that point.
Together, these plots give us a visual representation of the spread
and density of the nutrient composition for each nutrient. We can see
the distribution of the data, the presence of outliers, and the central
tendency of each nutrient. By comparing the plots for each nutrient, we
can also observe any differences in the distribution of nutrient
composition between the two stages of Bebelac milk.
The stacked bar chart is important because it visually displays the
proportions of carbohydrates, fats, and proteins in each stage. This
information can be crucial for parents and caregivers who need to ensure
that infants are getting the nutrition they need to grow and
develop.
The chart allows an easy comparison between the two stages,
showing how the balance of macronutrients changes as the infant grows
and its nutritional needs change. For example, the proportion of
carbohydrates is slightly higher in the stage 2 formula than in stage 3.
This may reflect a greater need for energy in the earlier stage of
growth.
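A stacked bar chart of macronutrient proportions can be sketched by converting grams to shares and stacking the segments with `bottom`. The gram values are hypothetical, not read from the Bebelac labels.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

stages = ["Stage 2", "Stage 3"]
# Hypothetical macronutrient grams per 100 ml for each stage.
macros = {
    "Carbohydrates": np.array([7.8, 7.6]),
    "Fat": np.array([3.2, 3.3]),
    "Protein": np.array([1.4, 1.6]),
}

# Turn grams into proportions so each stage's bar sums to 1.
totals = sum(macros.values())
props = {name: grams / totals for name, grams in macros.items()}

fig, ax = plt.subplots()
bottom = np.zeros(len(stages))
for name, share in props.items():
    ax.bar(stages, share, bottom=bottom, label=name)
    bottom += share  # next segment starts on top of this one
ax.set_ylabel("Share of macronutrients")
ax.legend()
```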
Overall, the stacked bar chart is a helpful tool for understanding
the nutritional composition of infant formula, and can assist parents
and caregivers in making informed decisions about their child’s
diet.
To further explore the distribution of nutrients by stage, I used
pandas to sum the nutrient values for each stage and then plotted the
data as pie charts. I also wanted to compare the total nutrient
composition of each stage. This helps to show the overall nutrient
composition of each stage and to identify which nutrients are the most
prevalent.
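The summing and plotting step can be sketched with a pandas `groupby` followed by one pie per stage. The long-format table is hypothetical (one row per nutrient per stage). `autopct` prints each wedge's percentage of that stage's total, and `shares` recomputes those percentages.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical long-format data: one row per nutrient per stage.
df = pd.DataFrame({
    "stage": ["Stage 2"] * 3 + ["Stage 3"] * 3,
    "nutrient": ["Carbs", "Fat", "Protein"] * 2,
    "amount": [7.8, 3.2, 1.4, 7.6, 3.3, 1.6],
})

# Sum of each nutrient within each stage (rows: stages, columns: nutrients).
totals = df.groupby(["stage", "nutrient"])["amount"].sum().unstack()
# Overall total per stage, for comparing the stages with each other.
stage_totals = df.groupby("stage")["amount"].sum()

fig, axes = plt.subplots(1, len(totals), figsize=(8, 4))
for ax, (stage, row) in zip(axes, totals.iterrows()):
    ax.pie(row.to_numpy(), labels=row.index.tolist(), autopct="%1.1f%%")
    ax.set_title(stage)

# Percentages matching the wedge labels.
shares = totals.div(totals.sum(axis=1), axis=0) * 100
```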

The pie charts above provide useful visualizations of the nutrient
content of Bebelac baby formula by stage. The charts showing the
distribution of nutrients by stage help us understand the relative
proportion of each nutrient in each stage of the formula. This can help
reveal notable differences in nutrient composition between stages.
The pie chart of the total nutrient composition for each stage shows
the overall nutrient content of each stage. This can be useful in
determining if there are any stages that have a particularly high or low
overall nutrient content, which could be relevant to the nutritional
needs of infants in different age groups.
In the end, I was able to use Python to create a variety of
visualizations to explore the nutrient levels in Bebelac baby formula
milk. The different plots and statistics helped to highlight the
similarities and differences between the two stages. They also gave a
better understanding of the formula’s nutrient composition, which can be
helpful for both parents and healthcare professionals in making informed
decisions about infant nutrition.