Dataset

My dataset is nutrition data about food, chosen from "100+ Interesting Data Sets for Statistics". The data are provided by the USDA (United States Department of Agriculture). The dataset contains the content of 50 nutrients for 8,618 different foods and can be divided into several food groups, such as “Dairy and Egg Products”, “Pork Products”, “Vegetables and Vegetable Products”, “Beef Products”, and “Fast Foods”. For this project, I decided to focus on the nutrition of Fast Foods, because many people now eat fast food quite often: it is convenient and it is cheap. However, fast food is not good for our health, and most fast foods are very high in calories, which can contribute to obesity, cardiovascular disease (CVD), and hyperlipidemia.

The fast food dataset I generated has 338 observations (the count implied by the 334 residual degrees of freedom of the four-parameter model below) and 5 variables. The first column is a short description of the food, the second is energy (kcal), the third is protein (g), the fourth is total lipid (g), and the fifth is carbohydrate (g).

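# Load the fast food subset (comma-separated file with a header row)
fastf.table <- read.table("fastfood.csv", header = TRUE, sep = ',')
# Attach the columns so the model formulas below can refer to them by name
attach(fastf.table)
head(fastf.table)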
##                               Shrt_Desc Energ_Kcal Protein_.g.
## 1           FAST FOODS  BISCUIT  W/ EGG        274        8.53
## 2        FAST FOODS,BISCUIT,W/EGG&BACON        305       11.33
## 3          FAST FOODS,BISCUIT,W/EGG&HAM        233       10.64
## 4 BREAKFAST ITEMS,BISCUIT W/EGG&SAUSAGE        312       11.13
## 5     FAST FOODS,BISCUIT W/ EGG & STEAK        277       12.12
## 6   FAST FOODS,BISCUIT,W/EGG,CHS,&BACON        301       12.01
##   Lipid_Tot_.g. Carbohydrt_.g.
## 1         16.23          23.46
## 2         20.73          19.06
## 3         14.08          16.37
## 4         20.77          21.05
## 5         19.21          14.37
## 6         17.48          24.44
plot(fastf.table[,2:5])

Independent variable

My independent variables are Protein (g), Lipid Total (g), and Carbohydrate (g).

Dependent variable

My dependent variable is Energy Kilocalorie.

Null hypothesis \(H_0\)

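My null hypothesis is that the variation in Energy Kilocalorie is due to randomness and cannot be explained by any of the three independent variables (Protein (g), Lipid Total (g), Carbohydrate (g)); that is, \(H_0: \beta_1 = \beta_2 = \beta_3 = 0\), where the \(\beta_i\) are the regression coefficients of the three predictors.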
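
This hypothesis corresponds to the overall F-test that summary() prints on the last line of a fitted lm. As a minimal sketch of how that statistic and its p-value can be pulled out of a model, the snippet below uses simulated data; the names sim, protein, fat, carb, and kcal, and all simulated values, are assumptions for illustration and are not the USDA data.

```r
# Simulated stand-in data (NOT the USDA fast food data)
set.seed(1)
n <- 200
sim <- data.frame(
  protein = runif(n, 5, 25),
  fat     = runif(n, 2, 30),
  carb    = runif(n, 5, 50)
)
sim$kcal <- 4 * sim$protein + 9 * sim$fat + 4 * sim$carb + rnorm(n, sd = 20)

fit <- lm(kcal ~ protein + fat + carb, data = sim)

# summary()$fstatistic is a named vector: value, numdf, dendf
fstat <- summary(fit)$fstatistic

# Upper-tail probability of the F distribution gives the overall p-value
p_value <- pf(fstat[["value"]], fstat[["numdf"]], fstat[["dendf"]],
              lower.tail = FALSE)

fstat
p_value
```

A small p-value here leads to rejecting the hypothesis that all slope coefficients are zero.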

Multiple Linear model

Entry-wise

fastf1.lm <- lm(Energ_Kcal~Protein_.g.+Lipid_Tot_.g.+Carbohydrt_.g.)
summary(fastf1.lm)
## 
## Call:
## lm(formula = Energ_Kcal ~ Protein_.g. + Lipid_Tot_.g. + Carbohydrt_.g.)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -96.481 -18.069  -4.789  11.104 108.853 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     60.7836     6.2726   9.690  < 2e-16 ***
## Protein_.g.      2.5759     0.3169   8.127 8.63e-15 ***
## Lipid_Tot_.g.    9.4858     0.2594  36.570  < 2e-16 ***
## Carbohydrt_.g.   2.0889     0.1407  14.849  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 30.14 on 334 degrees of freedom
## Multiple R-squared:  0.8303, Adjusted R-squared:  0.8288 
## F-statistic: 544.7 on 3 and 334 DF,  p-value: < 2.2e-16
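
Written out, the fitted model is Energ_Kcal ≈ 60.78 + 2.58 × Protein + 9.49 × Lipid + 2.09 × Carbohydrate. As a small sketch, the snippet below applies the printed coefficients by hand to the six foods shown by head(); the object names coefs and first_six are illustrative choices, and because the coefficients are rounded as printed, the values will differ slightly from fitted(fastf1.lm).

```r
# Coefficients copied from the summary(fastf1.lm) output (rounded as printed)
coefs <- c(intercept = 60.7836, protein = 2.5759, lipid = 9.4858, carb = 2.0889)

# The six foods printed by head(fastf.table)
first_six <- data.frame(
  kcal    = c(274, 305, 233, 312, 277, 301),
  protein = c(8.53, 11.33, 10.64, 11.13, 12.12, 12.01),
  lipid   = c(16.23, 20.73, 14.08, 20.77, 19.21, 17.48),
  carb    = c(23.46, 19.06, 16.37, 21.05, 14.37, 24.44)
)

# Fitted value = intercept + sum of (coefficient * nutrient amount)
predicted <- coefs[["intercept"]] +
  coefs[["protein"]] * first_six$protein +
  coefs[["lipid"]]   * first_six$lipid +
  coefs[["carb"]]    * first_six$carb

# Residual = observed - fitted
data.frame(observed  = first_six$kcal,
           predicted = round(predicted, 1),
           residual  = round(first_six$kcal - predicted, 1))
```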

Hierarchical

A hierarchical regression model enters the predictors in a theoretically determined order. According to the Atwater factors, calories are calculated from the protein, fat, and carbohydrate content per 100 grams. I therefore enter the predictors in the order 1) Protein, 2) Lipid Total, 3) Carbohydrate.
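
For reference, the general Atwater factors are roughly 4 kcal per gram of protein, 9 kcal per gram of fat, and 4 kcal per gram of carbohydrate. The sketch below assumes those general factors (the USDA may use food-specific factors for individual items, so exact agreement is not expected) and applies them to the six foods printed by head() above. The object names are illustrative.

```r
# General Atwater factors in kcal per gram (an assumption; food-specific
# factors may differ)
atwater <- c(protein = 4, lipid = 9, carb = 4)

# The six foods printed by head(fastf.table)
first_six <- data.frame(
  kcal    = c(274, 305, 233, 312, 277, 301),
  protein = c(8.53, 11.33, 10.64, 11.13, 12.12, 12.01),
  lipid   = c(16.23, 20.73, 14.08, 20.77, 19.21, 17.48),
  carb    = c(23.46, 19.06, 16.37, 21.05, 14.37, 24.44)
)

# Energy estimated from the three macronutrients
atwater_kcal <- atwater[["protein"]] * first_six$protein +
  atwater[["lipid"]] * first_six$lipid +
  atwater[["carb"]]  * first_six$carb

# Compare the Atwater estimate with the reported energy value
data.frame(reported   = first_six$kcal,
           atwater    = round(atwater_kcal, 2),
           difference = round(atwater_kcal - first_six$kcal, 2))
```

For these six rows the estimate stays within a few kilocalories of the reported value, which is the theoretical motivation for using exactly these three predictors.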

fastf2.lm1 <- lm(Energ_Kcal~Protein_.g.)
fastf2.lm2 <- lm(Energ_Kcal~Protein_.g.+Lipid_Tot_.g.)
fastf2.lm3 <- lm(Energ_Kcal~Protein_.g.+Lipid_Tot_.g.+Carbohydrt_.g.)

summary(fastf2.lm1)
## 
## Call:
## lm(formula = Energ_Kcal ~ Protein_.g.)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -212.12  -41.05    4.09   36.26  353.22 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 225.3853     8.7312   25.81  < 2e-16 ***
## Protein_.g.   2.1895     0.6636    3.30  0.00107 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 71.79 on 336 degrees of freedom
## Multiple R-squared:  0.03139,    Adjusted R-squared:  0.0285 
## F-statistic: 10.89 on 1 and 336 DF,  p-value: 0.001072
summary(fastf2.lm2)
## 
## Call:
## lm(formula = Energ_Kcal ~ Protein_.g. + Lipid_Tot_.g.)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -153.075  -21.916    1.128   26.587  190.756 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   124.4243     5.8927  21.115   <2e-16 ***
## Protein_.g.     0.4453     0.3636   1.225    0.222    
## Lipid_Tot_.g.   9.5360     0.3337  28.577   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 38.78 on 335 degrees of freedom
## Multiple R-squared:  0.7182, Adjusted R-squared:  0.7166 
## F-statistic:   427 on 2 and 335 DF,  p-value: < 2.2e-16
summary(fastf2.lm3)
## 
## Call:
## lm(formula = Energ_Kcal ~ Protein_.g. + Lipid_Tot_.g. + Carbohydrt_.g.)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -96.481 -18.069  -4.789  11.104 108.853 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     60.7836     6.2726   9.690  < 2e-16 ***
## Protein_.g.      2.5759     0.3169   8.127 8.63e-15 ***
## Lipid_Tot_.g.    9.4858     0.2594  36.570  < 2e-16 ***
## Carbohydrt_.g.   2.0889     0.1407  14.849  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 30.14 on 334 degrees of freedom
## Multiple R-squared:  0.8303, Adjusted R-squared:  0.8288 
## F-statistic: 544.7 on 3 and 334 DF,  p-value: < 2.2e-16
anova(fastf2.lm1,fastf2.lm2,fastf2.lm3)
## Analysis of Variance Table
## 
## Model 1: Energ_Kcal ~ Protein_.g.
## Model 2: Energ_Kcal ~ Protein_.g. + Lipid_Tot_.g.
## Model 3: Energ_Kcal ~ Protein_.g. + Lipid_Tot_.g. + Carbohydrt_.g.
##   Res.Df     RSS Df Sum of Sq       F    Pr(>F)    
## 1    336 1731633                                   
## 2    335  503710  1   1227923 1351.75 < 2.2e-16 ***
## 3    334  303403  1    200307  220.51 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
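
Each F value in this table can be reproduced by hand: the drop in residual sum of squares from adding a predictor, divided by its degrees of freedom, over the residual mean square of the largest model. A sketch using the printed (rounded) values, so agreement is only to about two decimal places:

```r
# Residual sums of squares and residual df, copied from the anova table
rss    <- c(1731633, 503710, 303403)
res_df <- c(336, 335, 334)

# Reduction in RSS and in residual df at each step
drop_ss <- -diff(rss)      # 1227923, 200307
drop_df <- -diff(res_df)   # 1, 1

# Residual mean square of the largest model (Model 3) is the denominator
scale <- rss[3] / res_df[3]

f_values <- (drop_ss / drop_df) / scale
p_values <- pf(f_values, drop_df, res_df[3], lower.tail = FALSE)

round(f_values, 2)
p_values
```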

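Step-wise

Predictors are entered in decreasing order of their correlation with Energ_Kcal, as given by the correlation matrix: Lipid Total (0.85), Carbohydrate (0.23), then Protein (0.18).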

cor(fastf.table[,2:5])
##                Energ_Kcal Protein_.g. Lipid_Tot_.g. Carbohydrt_.g.
## Energ_Kcal      1.0000000   0.1771651    0.84674741     0.22659671
## Protein_.g.     0.1771651   1.0000000    0.16787646    -0.45626974
## Lipid_Tot_.g.   0.8467474   0.1678765    1.00000000    -0.06517789
## Carbohydrt_.g.  0.2265967  -0.4562697   -0.06517789     1.00000000
fastf3.lm1 <- lm(Energ_Kcal~Lipid_Tot_.g.)
fastf3.lm2 <- lm(Energ_Kcal~Lipid_Tot_.g.+Carbohydrt_.g.)
fastf3.lm3 <- lm(Energ_Kcal~Lipid_Tot_.g.+Carbohydrt_.g.+Protein_.g.)

summary(fastf3.lm1)
## 
## Call:
## lm(formula = Energ_Kcal ~ Lipid_Tot_.g.)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -160.517  -19.980    1.347   27.503  188.220 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   128.7909     4.6952   27.43   <2e-16 ***
## Lipid_Tot_.g.   9.6046     0.3292   29.18   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 38.81 on 336 degrees of freedom
## Multiple R-squared:  0.717,  Adjusted R-squared:  0.7161 
## F-statistic: 851.2 on 1 and 336 DF,  p-value: < 2.2e-16
summary(fastf3.lm2)
## 
## Call:
## lm(formula = Energ_Kcal ~ Lipid_Tot_.g. + Carbohydrt_.g.)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -144.730  -17.579    0.057   16.669   99.197 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     96.6342     4.8734   19.83   <2e-16 ***
## Lipid_Tot_.g.    9.8138     0.2800   35.05   <2e-16 ***
## Carbohydrt_.g.   1.5713     0.1371   11.46   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 32.94 on 335 degrees of freedom
## Multiple R-squared:  0.7967, Adjusted R-squared:  0.7955 
## F-statistic: 656.5 on 2 and 335 DF,  p-value: < 2.2e-16
summary(fastf3.lm3)
## 
## Call:
## lm(formula = Energ_Kcal ~ Lipid_Tot_.g. + Carbohydrt_.g. + Protein_.g.)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -96.481 -18.069  -4.789  11.104 108.853 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     60.7836     6.2726   9.690  < 2e-16 ***
## Lipid_Tot_.g.    9.4858     0.2594  36.570  < 2e-16 ***
## Carbohydrt_.g.   2.0889     0.1407  14.849  < 2e-16 ***
## Protein_.g.      2.5759     0.3169   8.127 8.63e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 30.14 on 334 degrees of freedom
## Multiple R-squared:  0.8303, Adjusted R-squared:  0.8288 
## F-statistic: 544.7 on 3 and 334 DF,  p-value: < 2.2e-16
anova(fastf3.lm1,fastf3.lm2,fastf3.lm3)
## Analysis of Variance Table
## 
## Model 1: Energ_Kcal ~ Lipid_Tot_.g.
## Model 2: Energ_Kcal ~ Lipid_Tot_.g. + Carbohydrt_.g.
## Model 3: Energ_Kcal ~ Lipid_Tot_.g. + Carbohydrt_.g. + Protein_.g.
##   Res.Df    RSS Df Sum of Sq       F    Pr(>F)    
## 1    336 505966                                   
## 2    335 363407  1    142559 156.935 < 2.2e-16 ***
## 3    334 303403  1     60004  66.055 8.633e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
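
The entry order in this section was chosen by hand from the correlation matrix. R also provides an automated alternative, step(), which adds or drops terms by AIC rather than by correlation, so it is a different criterion from the one used in this report. A minimal sketch of forward selection on simulated data follows; the names sim, protein, fat, carb, and kcal, and all simulated values, are made up for illustration.

```r
# Simulated stand-in data (NOT the USDA fast food data)
set.seed(42)
n <- 200
sim <- data.frame(
  protein = runif(n, 5, 25),
  fat     = runif(n, 2, 30),
  carb    = runif(n, 5, 50)
)
sim$kcal <- 4 * sim$protein + 9 * sim$fat + 4 * sim$carb + rnorm(n, sd = 20)

# Start from the intercept-only model and let step() add terms by AIC
null_fit <- lm(kcal ~ 1, data = sim)
selected <- step(null_fit,
                 scope = ~ protein + fat + carb,  # upper limit of the search
                 direction = "forward",
                 trace = 0)                       # suppress the step log

# Terms kept in the selected model, and the record of additions
attr(terms(selected), "term.labels")
selected$anova
```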

Plot the Regression Line

Lipid Total(g)

plot(Lipid_Tot_.g.,Energ_Kcal,pch = 21, bg = 'blue')
abline(fastf3.lm1)

Carbohydrate(g)

fastC.lm <- lm(Energ_Kcal~Carbohydrt_.g.)
plot(Carbohydrt_.g.,Energ_Kcal,pch = 21, bg = 'blue')
abline(fastC.lm)

Protein(g)

plot(Protein_.g.,Energ_Kcal,pch = 21, bg = 'blue')
abline(fastf2.lm1)

3D Scatterplot

I plot a 3D scatterplot using two independent variables, Lipid Total (g) and Carbohydrate (g), and add the regression plane of the two-predictor model fastf3.lm2.

library(scatterplot3d)
md <- scatterplot3d(Lipid_Tot_.g., Carbohydrt_.g., Energ_Kcal,
                    pch = 21, bg = 'blue', main = "Regression plane",
                    xlab = "Lipid total(g)", ylab = "carbohydrate(g)",
                    zlab = "Energy Kcal", axis = TRUE)

md$plane3d(fastf3.lm2)

95% Confidence Intervals
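
A 95% confidence interval for each coefficient is the estimate plus or minus the 0.975 quantile of the t distribution times the standard error; for a fitted model, confint(fastf1.lm) performs this computation. As a sketch, the same arithmetic is applied below to the rounded estimates and standard errors printed by summary(fastf1.lm), with 334 residual degrees of freedom. The object names are illustrative.

```r
# Estimates and standard errors copied from summary(fastf1.lm) (rounded)
est <- c("(Intercept)"    = 60.7836,
         "Protein_.g."    = 2.5759,
         "Lipid_Tot_.g."  = 9.4858,
         "Carbohydrt_.g." = 2.0889)
se <- c(6.2726, 0.3169, 0.2594, 0.1407)
res_df <- 334

# Two-sided 95% interval uses the 0.975 quantile of t with res_df df
t_crit <- qt(0.975, df = res_df)

ci <- cbind(lower = est - t_crit * se,
            upper = est + t_crit * se)
round(ci, 3)
```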

Interpret

To be completed in the final version.