Using Multi-Level Regression Models to look at cereal rating differences

Jacqueline Nosrati

Overview

Cereals are a go-to breakfast food for many people. Studies suggest that eating breakfast cereal, and breakfast in general, can reduce the likelihood of obesity (Hunty, Gibson, and Ashwell 2013). It can also help reduce the decline in attention and memory over the morning. In a 2003 study, researchers gave schoolchildren different kinds of breakfast (one of two cereals, a sugary drink, or no breakfast) and tested their memory and attention during the morning. Children who had cereal showed smaller declines in memory and attention than those who had a sugary drink or no breakfast at all (Wesnes et al. 2003).

However, many cereals are high in sugar and calories, which can do more harm in the long run and might contribute to obesity, diabetes, and lower concentration levels. To address these health concerns and consumer demands, cereal manufacturers are currently reducing the sugar and calorie content of their products and offering healthier cereal choices. Manufacturers are also working to increase the amount of whole grains used in their cereals (Thomas et al. 2013).

In this week’s assignment I will use nutrition data of cereals from different cereal manufacturers to look at cereal ratings and whether the amounts of sugars and number of calories affect cereal ratings.

Load data set

For the analysis, the following variables were selected: cereal name, manufacturer, calories, sugars, and rating.

Recode mfr (manufacturer) variable into factor

For the manufacturer variable, A stands for American Home Food Products, G for General Mills, K for Kellogg's, N for Nabisco, P for Post, Q for Quaker Oats, and R for Ralston Purina. We convert this variable into a factor.

Cereal3
name                       mfr  calories  sugars     rating  n_manufacturer
100% Bran                  N          70       6  68.402973               4
100% Natural Bran          Q         120       8  33.983679               6
All-Bran                   K          70       5  59.425505               3
All-Bran with Extra Fiber  K          50       0  93.704912               3
Almond Delight             R         110       8  34.384843               7
Apple Cinnamon Cheerios    G         110      10  29.509541               2
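
The recoding of the manufacturer letters into the numeric n_manufacturer column can be illustrated with a small sketch. It is written in Python purely for illustration and is not the code used to produce the table. The assumption here is that the integer codes follow the alphabetical order of the letters, which agrees with the six rows shown. The names MANUFACTURERS and manufacturer_index are hypothetical.

```python
# Manufacturer letters and names as listed in the text.
MANUFACTURERS = {
    "A": "American Home Food Products",
    "G": "General Mills",
    "K": "Kellogg's",
    "N": "Nabisco",
    "P": "Post",
    "Q": "Quaker Oats",
    "R": "Ralston Purina",
}


def manufacturer_index(code):
    """Return the 1-based position of a letter code in alphabetical order.

    Assumption: this mirrors how the factor levels were numbered.
    """
    return sorted(MANUFACTURERS).index(code) + 1


# (name, mfr, n_manufacturer) copied from the table excerpt.
excerpt = [
    ("100% Bran", "N", 4),
    ("100% Natural Bran", "Q", 6),
    ("All-Bran", "K", 3),
    ("All-Bran with Extra Fiber", "K", 3),
    ("Almond Delight", "R", 7),
    ("Apple Cinnamon Cheerios", "G", 2),
]

for name, mfr, reported in excerpt:
    # Each computed index should agree with the reported n_manufacturer.
    print(name, mfr, manufacturer_index(mfr), reported, MANUFACTURERS[mfr])
```

Under this assumption, Manufacturer 4 is Nabisco, Manufacturer 2 is General Mills, and Manufacturer 5 is Post.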

First, we look at how many cereal manufacturers exist and how many cereals belong to each manufacturer.

[1] 7

We then computed the average rating, calories, and sugar amount per manufacturer. We see that Manufacturer 4 (Nabisco) has the highest average cereal rating. We also see that Manufacturer 2 (General Mills) has the highest average calorie content and that Manufacturer 5 (Post) has the highest average sugar amount.

Cereal Manufacturers & Cereal Ratings

Complete Pooling Models


Call:
lm(formula = rating ~ calories, data = Cereal3)

Residuals:
     Min       1Q   Median       3Q      Max 
-18.7201  -7.9317  -0.6678   5.9902  23.4161 

Coefficients:
            Estimate Std. Error t value             Pr(>|t|)    
(Intercept) 95.78802    6.55057  14.623 < 0.0000000000000002 ***
calories    -0.49701    0.06031  -8.241     0.00000000000414 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.24 on 75 degrees of freedom
Multiple R-squared:  0.4752,    Adjusted R-squared:  0.4682 
F-statistic: 67.92 on 1 and 75 DF,  p-value: 0.00000000000414

The output for the first complete pooling model shows that the predicted cereal rating (without accounting for differences among cereal manufacturers) is 95.79 for a cereal with 0 calories and that each additional calorie lowers the rating by 0.50 points on average. Both coefficients are statistically significant at the 0.001 level. For the next complete pooling model we replace calories with the amount of sugars as the only predictor.


Call:
lm(formula = rating ~ sugars, data = Cereal3)

Residuals:
    Min      1Q  Median      3Q     Max 
-17.853  -5.677  -1.439   5.160  34.421 

Coefficients:
            Estimate Std. Error t value             Pr(>|t|)    
(Intercept)  59.2844     1.9485   30.43 < 0.0000000000000002 ***
sugars       -2.4008     0.2373  -10.12  0.00000000000000115 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.196 on 75 degrees of freedom
Multiple R-squared:  0.5771,    Adjusted R-squared:  0.5715 
F-statistic: 102.3 on 1 and 75 DF,  p-value: 0.000000000000001153

The output for the second complete pooling model shows that the predicted cereal rating (without accounting for differences among cereal manufacturers) is 59.28 for a cereal with 0 grams of sugar and that each additional gram of sugar lowers the rating by 2.40 points on average. Both coefficients are statistically significant at the 0.001 level.


Call:
lm(formula = rating ~ calories + sugars, data = Cereal3)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.643  -6.339  -1.221   4.823  23.413 

Coefficients:
            Estimate Std. Error t value             Pr(>|t|)    
(Intercept) 84.11417    5.44513  15.448 < 0.0000000000000002 ***
calories    -0.27644    0.05755  -4.804        0.00000793294 ***
sugars      -1.71939    0.25225  -6.816        0.00000000216 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.083 on 74 degrees of freedom
Multiple R-squared:  0.6776,    Adjusted R-squared:  0.6689 
F-statistic: 77.78 on 2 and 74 DF,  p-value: < 0.00000000000000022

The output for the third complete pooling model shows that the predicted cereal rating (without accounting for differences among cereal manufacturers) is 84.11 for a cereal with 0 calories and 0 grams of sugar. Holding sugars constant, each additional calorie lowers the rating by 0.28 points on average, and holding calories constant, each additional gram of sugar lowers it by 1.72 points. Again, all coefficients are statistically significant at the 0.001 level. Finally, we look at rating differences with an interaction between calories and sugar amounts for the complete pooling model.
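
To make the coefficients concrete, the following sketch plugs one cereal from the data excerpt into the fitted equation of the third model. It is plain arithmetic on the reported estimates, written in Python for illustration only. The function name predict_rating is hypothetical.

```python
# Estimates copied from the lm(rating ~ calories + sugars) output.
INTERCEPT = 84.11417
B_CALORIES = -0.27644
B_SUGARS = -1.71939


def predict_rating(calories, sugars):
    """Predicted rating from the additive complete pooling model."""
    return INTERCEPT + B_CALORIES * calories + B_SUGARS * sugars


# 100% Bran from the data excerpt: 70 calories, 6 g sugars, rating 68.402973.
observed = 68.402973
predicted = predict_rating(70, 6)
residual = observed - predicted

print(round(predicted, 2))  # about 54.45
print(round(residual, 2))   # about 13.96
```

The model under-predicts this cereal by roughly 14 points, which lies within the residual range reported in the output (-15.6 to 23.4).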


Call:
lm(formula = rating ~ calories * sugars, data = Cereal3)

Residuals:
     Min       1Q   Median       3Q      Max 
-18.1761  -5.3793   0.1491   4.8486  15.7521 

Coefficients:
                  Estimate Std. Error t value             Pr(>|t|)    
(Intercept)     101.659340   7.560213  13.447 < 0.0000000000000002 ***
calories         -0.454542   0.078219  -5.811          0.000000151 ***
sugars           -5.038979   1.075503  -4.685          0.000012641 ***
calories:sugars   0.031056   0.009812   3.165              0.00226 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.631 on 73 degrees of freedom
Multiple R-squared:  0.7165,    Adjusted R-squared:  0.7049 
F-statistic: 61.51 on 3 and 73 DF,  p-value: < 0.00000000000000022

In the last complete pooling model, the coefficients for calories and sugars remain statistically significant at the 0.001 level, while the interaction between the two independent variables is significant at the 0.01 level (p = 0.002). The positive interaction coefficient (0.03) means that the negative association between sugars and rating becomes weaker as calories increase, and vice versa.
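
One way to read the interaction term is to compute the slope of sugars at different calorie levels. The sketch below does this with the estimates from the interaction model, using calorie values that appear in the data excerpt. It is illustrative arithmetic in Python, and the function name sugar_slope is hypothetical.

```python
# Estimates copied from the lm(rating ~ calories * sugars) output.
B_SUGARS = -5.038979
B_INTERACTION = 0.031056


def sugar_slope(calories):
    """Change in predicted rating per extra gram of sugar at a given calorie level."""
    return B_SUGARS + B_INTERACTION * calories


# Calorie values taken from the rows of the data excerpt.
for calories in (50, 70, 110, 120):
    print(calories, round(sugar_slope(calories), 2))
```

At 50 calories an extra gram of sugar is associated with a drop of about 3.49 rating points, while at 120 calories the drop is about 1.31 points.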

Comparing the 4 complete pooling models

Comparing the Models
                 Model 1    Model 2    Model 3    Model 4
(Intercept)      95.79***   59.28***   84.11***   101.66***
                 (6.55)     (1.95)     (5.45)     (7.56)
calories         -0.50***              -0.28***   -0.45***
                 (0.06)                (0.06)     (0.08)
sugars                      -2.40***   -1.72***   -5.04***
                            (0.24)     (0.25)     (1.08)
calories:sugars                                   0.03**
                                                  (0.01)
R2               0.48       0.58       0.68       0.72
Adj. R2          0.47       0.57       0.67       0.70
Num. obs.        77         77         77         77
RMSE             10.24      9.20       8.08       7.63
*** p < 0.001, ** p < 0.01, * p < 0.05

When we compare the four complete pooling models we see that the fourth model fits best, with an R2 of 0.72 and an adjusted R2 of 0.70, the highest among the four models.
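
Because R2 never decreases when predictors are added, the adjusted R2 is the fairer basis for this comparison. As a check, the sketch below recomputes the adjusted values from the reported R2, the 77 observations, and the number of predictors in each model, using the standard adjustment formula. It is written in Python for illustration, and the function name adjusted_r2 is hypothetical.

```python
N_OBS = 77  # number of cereals in each model


def adjusted_r2(r2, n_obs, n_predictors):
    """Standard adjustment: 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# (reported multiple R-squared, number of predictors) per model.
models = {
    "Model 1": (0.4752, 1),
    "Model 2": (0.5771, 1),
    "Model 3": (0.6776, 2),
    "Model 4": (0.7165, 3),  # calories, sugars, and their interaction
}

for name, (r2, n_predictors) in models.items():
    print(name, round(adjusted_r2(r2, N_OBS, n_predictors), 3))
```

The recomputed values agree with the reported adjusted R2 up to rounding of the inputs, and the fourth model remains ahead after the penalty for its extra term.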

No Pooling Model

Now we use the no pooling model to look at manufacturer differences in ratings. We see large differences in cereal ratings among the manufacturers. The ratings look roughly normally distributed.

When looking at rating differences with respect to calories, we now see a slightly right-skewed distribution as the number of calories increases.

Random Intercept Model

Linear mixed-effects model fit by maximum likelihood
 Data: Cereal3 
       AIC      BIC    logLik
  568.5035 577.8787 -280.2517

Random effects:
 Formula: ~1 | n_manufacturer
        (Intercept) Residual
StdDev:    6.374507 8.503316

Fixed effects: rating ~ calories 
               Value Std.Error DF   t-value p-value
(Intercept) 91.86110  6.322652 69 14.528886       0
calories    -0.44305  0.054472 69 -8.133517       0
 Correlation: 
         (Intr)
calories -0.899

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-1.76923784 -0.74835203  0.05835578  0.72828773  2.78544717 

Number of Observations: 77
Number of Groups: 7 
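
The two standard deviations in the random effects block can be turned into an intraclass correlation, that is, the share of the remaining rating variance (after accounting for calories) that lies between manufacturers. This quantity is not part of the output above. The sketch below derives it from the reported values, in Python for illustration, and the function name is hypothetical.

```python
# Standard deviations copied from the random intercept model output.
SD_INTERCEPT = 6.374507  # between-manufacturer standard deviation
SD_RESIDUAL = 8.503316   # within-manufacturer (residual) standard deviation


def intraclass_correlation(sd_between, sd_within):
    """Share of variance attributable to the grouping level."""
    var_between = sd_between ** 2
    var_within = sd_within ** 2
    return var_between / (var_between + var_within)


icc = intraclass_correlation(SD_INTERCEPT, SD_RESIDUAL)
print(round(icc, 2))  # about 0.36
```

Roughly a third of the variance left over after controlling for calories is associated with the manufacturer, which supports modeling manufacturer as a grouping level.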

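Random Slope Model

Before adding a random slope, we fit two more random intercept models (sugars alone, then calories and sugars together). The final model in this section adds a random slope for sugars across manufacturers and a calories-by-sugars interaction.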

Linear mixed-effects model fit by maximum likelihood
 Data: Cereal3 
       AIC      BIC    logLik
  550.0402 559.4154 -271.0201

Random effects:
 Formula: ~1 | n_manufacturer
        (Intercept) Residual
StdDev:    5.674263 7.540553

Fixed effects: rating ~ sugars 
               Value Std.Error DF   t-value p-value
(Intercept) 58.96186 2.7849719 69  21.17144       0
sugars      -2.17261 0.2126363 69 -10.21750       0
 Correlation: 
       (Intr)
sugars -0.469

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-1.76232047 -0.66734610  0.08421924  0.50447067  4.42116697 

Number of Observations: 77
Number of Groups: 7 

Linear mixed-effects model fit by maximum likelihood
 Data: Cereal3 
       AIC      BIC    logLik
  527.6888 539.4078 -258.8444

Random effects:
 Formula: ~1 | n_manufacturer
        (Intercept) Residual
StdDev:    5.195241  6.40292

Fixed effects: rating ~ calories + sugars 
               Value Std.Error DF   t-value p-value
(Intercept) 81.93156  5.038979 68 16.259557       0
calories    -0.25440  0.048437 68 -5.252119       0
sugars      -1.58308  0.213238 68 -7.424006       0
 Correlation: 
         (Intr) calors
calories -0.868       
sugars    0.262 -0.520

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-1.85332392 -0.62335188 -0.06762691  0.59790039  3.57112139 

Number of Observations: 77
Number of Groups: 7 

Linear mixed-effects model fit by maximum likelihood
 Data: Cereal3 
       AIC      BIC    logLik
  518.1851 536.9356 -251.0926

Random effects:
 Formula: ~sugars | n_manufacturer
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev        Corr  
(Intercept) 5.44074510787 (Intr)
sugars      0.00005841409 0     
Residual    5.72159189529       

Fixed effects: rating ~ calories * sugars 
                    Value Std.Error DF   t-value p-value
(Intercept)     100.89399  6.624025 67 15.231522  0.0000
calories         -0.44511  0.064296 67 -6.922701  0.0000
sugars           -5.05465  0.877583 67 -5.759745  0.0000
calories:sugars   0.03222  0.007927 67  4.064667  0.0001
 Correlation: 
                (Intr) calors sugars
calories        -0.924              
sugars          -0.651  0.639       
calories:sugars  0.708 -0.734 -0.976

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-2.0230039 -0.7014219  0.1141536  0.6372830  2.3697122 

Number of Observations: 77
Number of Groups: 7 

Final Model Comparison

Comparing the Models
                 Model 1    Model 2    Model 3    Model 4
(Intercept)      91.86***   58.96***   81.93***   100.89***
                 (6.32)     (2.78)     (5.04)     (6.62)
calories         -0.44***              -0.25***   -0.45***
                 (0.05)                (0.05)     (0.06)
sugars                      -2.17***   -1.58***   -5.05***
                            (0.21)     (0.21)     (0.88)
calories:sugars                                   0.03***
                                                  (0.01)
AIC              568.50     550.04     527.69     518.19
BIC              577.88     559.42     539.41     536.94
Log Likelihood   -280.25    -271.02    -258.84    -251.09
Num. obs.        77         77         77         77
Num. groups      7          7          7          7
*** p < 0.001, ** p < 0.01, * p < 0.05

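Comparing the models, we see that the last model, which adds the calories-by-sugars interaction and a random slope for sugars, is the best-fitting model since it has the lowest AIC and BIC (518.19 and 536.94, respectively). Looking at the m4_crating model, the standard deviation of the manufacturer-level intercepts is 5.44, meaning that average ratings vary by about 5.44 points between manufacturers after accounting for calories and sugars. The standard deviation of the random slope for sugars is close to zero (0.00006), so the association between sugars and rating is essentially the same across manufacturers. The residual standard deviation of 5.72 is the remaining rating variation among cereals within the same manufacturer.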
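
The AIC and BIC values in the comparison can be reproduced from the reported log-likelihoods. The sketch below is written in Python for illustration. The parameter counts are an assumption, counting fixed effects plus variance parameters, and the helper names aic and bic are hypothetical.

```python
import math

N_OBS = 77  # number of observations reported in each summary

# (log-likelihood, assumed number of estimated parameters) per model.
# Assumed counts: fixed effects + random-effect SDs/correlation + residual SD.
models = {
    "Model 1": (-280.2517, 4),  # 2 fixed + intercept SD + residual SD
    "Model 2": (-271.0201, 4),  # 2 fixed + intercept SD + residual SD
    "Model 3": (-258.8444, 5),  # 3 fixed + intercept SD + residual SD
    "Model 4": (-251.0926, 8),  # 4 fixed + 2 SDs + 1 correlation + residual SD
}


def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 logLik."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs=N_OBS):
    """Bayesian information criterion: k ln(n) - 2 logLik."""
    return n_params * math.log(n_obs) - 2 * log_lik


for name, (log_lik, n_params) in models.items():
    print(name, round(aic(log_lik, n_params), 2), round(bic(log_lik, n_params), 2))
```

With these counts the computed values agree with the reported AIC and BIC to two decimals. The fourth model carries three more parameters than the third, so its BIC advantage (536.94 versus 539.41) is much smaller than its AIC advantage.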

Bibliography

Hunty, Anne de la, Sigrid Gibson, and Margaret Ashwell. 2013. “Does Regular Breakfast Cereal Consumption Help Children and Adolescents Stay Slimmer? A Systematic Review and Meta-Analysis.” Obesity Facts 6 (1). Karger Publishers: 70–85.

Thomas, Robin G, Pamela R Pehrsson, Jaspreet KC Ahuja, Erin Smieja, and Kevin B Miller. 2013. “Recent Trends in Ready-to-Eat Breakfast Cereals in the US.” Procedia Food Science 2. Elsevier: 20–26.

Wesnes, Keith A, Claire Pincock, David Richardson, Gareth Helm, and Simon Hails. 2003. “Breakfast Reduces Declines in Attention and Memory over the Morning in Schoolchildren.” Appetite 41 (3). Elsevier: 329–31.

