My data set is nutrition data about food, chosen from 100+ Interesting Data Sets for Statistics. The data is provided by the USDA (United States Department of Agriculture). The data set contains the content of 50 nutrients for 8,618 different foods and can be divided into several food groups, such as “Dairy and Egg Products”, “Pork Products”, “Vegetables and Vegetable Products”, “Beef Products”, and “Fast Foods”. For this project, I decided to focus on the nutrition of Fast Foods, because nowadays many people eat fast food quite often, not only because it is convenient but also because it is cheap. However, fast food is not good for our health: the majority of these foods are very high in calories, which can contribute to obesity, cardiovascular disease (CVD), and hyperlipidemia.
The fast food data set I generated has 338 observations and 5 variables. The first column is a short description of the food. The second column is energy in kilocalories. The third column is protein (g). The fourth column is total lipid (g). The fifth column is carbohydrate (g).
fastf.table <- read.table("fastfood.csv", header = TRUE, sep = ",")
attach(fastf.table)
head(fastf.table)
## Shrt_Desc Energ_Kcal Protein_.g.
## 1 FAST FOODS BISCUIT W/ EGG 274 8.53
## 2 FAST FOODS,BISCUIT,W/EGG&BACON 305 11.33
## 3 FAST FOODS,BISCUIT,W/EGG&HAM 233 10.64
## 4 BREAKFAST ITEMS,BISCUIT W/EGG&SAUSAGE 312 11.13
## 5 FAST FOODS,BISCUIT W/ EGG & STEAK 277 12.12
## 6 FAST FOODS,BISCUIT,W/EGG,CHS,&BACON 301 12.01
## Lipid_Tot_.g. Carbohydrt_.g.
## 1 16.23 23.46
## 2 20.73 19.06
## 3 14.08 16.37
## 4 20.77 21.05
## 5 19.21 14.37
## 6 17.48 24.44
plot(fastf.table[,2:5])
My independent variables are protein (g), total lipid (g), and carbohydrate (g).
My dependent variable is energy (kcal).
My null hypothesis is that the variation in energy (kcal) is due to randomness and cannot be explained by any of the three independent variables (protein (g), total lipid (g), carbohydrate (g)).
fastf1.lm <- lm(Energ_Kcal~Protein_.g.+Lipid_Tot_.g.+Carbohydrt_.g.)
summary(fastf1.lm)
##
## Call:
## lm(formula = Energ_Kcal ~ Protein_.g. + Lipid_Tot_.g. + Carbohydrt_.g.)
##
## Residuals:
## Min 1Q Median 3Q Max
## -96.481 -18.069 -4.789 11.104 108.853
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 60.7836 6.2726 9.690 < 2e-16 ***
## Protein_.g. 2.5759 0.3169 8.127 8.63e-15 ***
## Lipid_Tot_.g. 9.4858 0.2594 36.570 < 2e-16 ***
## Carbohydrt_.g. 2.0889 0.1407 14.849 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30.14 on 334 degrees of freedom
## Multiple R-squared: 0.8303, Adjusted R-squared: 0.8288
## F-statistic: 544.7 on 3 and 334 DF, p-value: < 2.2e-16
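As a rough check on the null hypothesis, the overall F-statistic can be recomputed from the R-squared reported in the summary. This is only a sketch using the rounded values printed above; the variable names (`r2`, `k`, `df_resid`) are my own labels and not part of the original analysis.

```r
# Values copied from the summary(fastf1.lm) output (rounded as printed)
r2       <- 0.8303  # Multiple R-squared
k        <- 3       # number of predictors
df_resid <- 334     # residual degrees of freedom

# Overall F-statistic: explained variance per predictor divided by
# unexplained variance per residual degree of freedom
f_stat <- (r2 / k) / ((1 - r2) / df_resid)

# Upper-tail p-value of the F distribution with (k, df_resid) degrees of freedom
p_value <- pf(f_stat, df1 = k, df2 = df_resid, lower.tail = FALSE)

f_stat   # close to the 544.7 printed by summary(); small differences come from rounding
p_value
```

Because the p-value is far below 0.05, the null hypothesis that none of the three predictors explains energy is rejected.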
A hierarchical regression model enters the predictors in a theoretically determined order. According to the Atwater factors, calories are calculated from the protein, fat, and carbohydrate content per 100 grams of food. So I enter the predictors in the order 1) protein, 2) total lipid, 3) carbohydrate.
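For reference, the Atwater general factors assign roughly 4 kcal per gram of protein, 9 kcal per gram of fat, and 4 kcal per gram of carbohydrate. The sketch below applies them to the first row of the `head()` output; the function name `atwater_kcal` is hypothetical and introduced only for illustration.

```r
# Atwater general factors in kcal per gram (protein 4, fat 9, carbohydrate 4)
atwater_kcal <- function(protein_g, fat_g, carb_g) {
  4 * protein_g + 9 * fat_g + 4 * carb_g
}

# First row of head(fastf.table): protein 8.53 g, lipid 16.23 g, carbohydrate 23.46 g
est <- atwater_kcal(protein_g = 8.53, fat_g = 16.23, carb_g = 23.46)
est  # about 274.03, against the 274 kcal recorded for that food
```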
fastf2.lm1 <- lm(Energ_Kcal~Protein_.g.)
fastf2.lm2 <- lm(Energ_Kcal~Protein_.g.+Lipid_Tot_.g.)
fastf2.lm3 <- lm(Energ_Kcal~Protein_.g.+Lipid_Tot_.g.+Carbohydrt_.g.)
summary(fastf2.lm1)
##
## Call:
## lm(formula = Energ_Kcal ~ Protein_.g.)
##
## Residuals:
## Min 1Q Median 3Q Max
## -212.12 -41.05 4.09 36.26 353.22
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 225.3853 8.7312 25.81 < 2e-16 ***
## Protein_.g. 2.1895 0.6636 3.30 0.00107 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 71.79 on 336 degrees of freedom
## Multiple R-squared: 0.03139, Adjusted R-squared: 0.0285
## F-statistic: 10.89 on 1 and 336 DF, p-value: 0.001072
summary(fastf2.lm2)
##
## Call:
## lm(formula = Energ_Kcal ~ Protein_.g. + Lipid_Tot_.g.)
##
## Residuals:
## Min 1Q Median 3Q Max
## -153.075 -21.916 1.128 26.587 190.756
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 124.4243 5.8927 21.115 <2e-16 ***
## Protein_.g. 0.4453 0.3636 1.225 0.222
## Lipid_Tot_.g. 9.5360 0.3337 28.577 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 38.78 on 335 degrees of freedom
## Multiple R-squared: 0.7182, Adjusted R-squared: 0.7166
## F-statistic: 427 on 2 and 335 DF, p-value: < 2.2e-16
summary(fastf2.lm3)
##
## Call:
## lm(formula = Energ_Kcal ~ Protein_.g. + Lipid_Tot_.g. + Carbohydrt_.g.)
##
## Residuals:
## Min 1Q Median 3Q Max
## -96.481 -18.069 -4.789 11.104 108.853
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 60.7836 6.2726 9.690 < 2e-16 ***
## Protein_.g. 2.5759 0.3169 8.127 8.63e-15 ***
## Lipid_Tot_.g. 9.4858 0.2594 36.570 < 2e-16 ***
## Carbohydrt_.g. 2.0889 0.1407 14.849 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30.14 on 334 degrees of freedom
## Multiple R-squared: 0.8303, Adjusted R-squared: 0.8288
## F-statistic: 544.7 on 3 and 334 DF, p-value: < 2.2e-16
anova(fastf2.lm1,fastf2.lm2,fastf2.lm3)
## Analysis of Variance Table
##
## Model 1: Energ_Kcal ~ Protein_.g.
## Model 2: Energ_Kcal ~ Protein_.g. + Lipid_Tot_.g.
## Model 3: Energ_Kcal ~ Protein_.g. + Lipid_Tot_.g. + Carbohydrt_.g.
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 336 1731633
## 2 335 503710 1 1227923 1351.75 < 2.2e-16 ***
## 3 334 303403 1 200307 220.51 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
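The F values in this ANOVA table can be reproduced from the residual sums of squares. Below is a minimal sketch using the numbers printed above. Note that `anova()` divides each drop in RSS by the residual mean square of the largest model (model 3). The variable names are my own.

```r
# Residual sums of squares and residual degrees of freedom for models 1, 2, 3
rss      <- c(1731633, 503710, 303403)
df_resid <- c(336, 335, 334)

# Residual mean square of the largest model, used as the common denominator
ms_resid <- rss[3] / df_resid[3]

# Drop in RSS and in residual df at each step
ss_drop <- -diff(rss)       # 1227923 and 200307
df_drop <- -diff(df_resid)  # 1 and 1

f_values <- (ss_drop / df_drop) / ms_resid
round(f_values, 2)  # 1351.75 and 220.51, as in the table
```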
cor(fastf.table[,2:5])
## Energ_Kcal Protein_.g. Lipid_Tot_.g. Carbohydrt_.g.
## Energ_Kcal 1.0000000 0.1771651 0.84674741 0.22659671
## Protein_.g. 0.1771651 1.0000000 0.16787646 -0.45626974
## Lipid_Tot_.g. 0.8467474 0.1678765 1.00000000 -0.06517789
## Carbohydrt_.g. 0.2265967 -0.4562697 -0.06517789 1.00000000
fastf3.lm1 <- lm(Energ_Kcal~Lipid_Tot_.g.)
fastf3.lm2 <- lm(Energ_Kcal~Lipid_Tot_.g.+Carbohydrt_.g.)
fastf3.lm3 <- lm(Energ_Kcal~Lipid_Tot_.g.+Carbohydrt_.g.+Protein_.g.)
summary(fastf3.lm1)
##
## Call:
## lm(formula = Energ_Kcal ~ Lipid_Tot_.g.)
##
## Residuals:
## Min 1Q Median 3Q Max
## -160.517 -19.980 1.347 27.503 188.220
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 128.7909 4.6952 27.43 <2e-16 ***
## Lipid_Tot_.g. 9.6046 0.3292 29.18 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 38.81 on 336 degrees of freedom
## Multiple R-squared: 0.717, Adjusted R-squared: 0.7161
## F-statistic: 851.2 on 1 and 336 DF, p-value: < 2.2e-16
summary(fastf3.lm2)
##
## Call:
## lm(formula = Energ_Kcal ~ Lipid_Tot_.g. + Carbohydrt_.g.)
##
## Residuals:
## Min 1Q Median 3Q Max
## -144.730 -17.579 0.057 16.669 99.197
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 96.6342 4.8734 19.83 <2e-16 ***
## Lipid_Tot_.g. 9.8138 0.2800 35.05 <2e-16 ***
## Carbohydrt_.g. 1.5713 0.1371 11.46 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 32.94 on 335 degrees of freedom
## Multiple R-squared: 0.7967, Adjusted R-squared: 0.7955
## F-statistic: 656.5 on 2 and 335 DF, p-value: < 2.2e-16
summary(fastf3.lm3)
##
## Call:
## lm(formula = Energ_Kcal ~ Lipid_Tot_.g. + Carbohydrt_.g. + Protein_.g.)
##
## Residuals:
## Min 1Q Median 3Q Max
## -96.481 -18.069 -4.789 11.104 108.853
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 60.7836 6.2726 9.690 < 2e-16 ***
## Lipid_Tot_.g. 9.4858 0.2594 36.570 < 2e-16 ***
## Carbohydrt_.g. 2.0889 0.1407 14.849 < 2e-16 ***
## Protein_.g. 2.5759 0.3169 8.127 8.63e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30.14 on 334 degrees of freedom
## Multiple R-squared: 0.8303, Adjusted R-squared: 0.8288
## F-statistic: 544.7 on 3 and 334 DF, p-value: < 2.2e-16
anova(fastf3.lm1,fastf3.lm2,fastf3.lm3)
## Analysis of Variance Table
##
## Model 1: Energ_Kcal ~ Lipid_Tot_.g.
## Model 2: Energ_Kcal ~ Lipid_Tot_.g. + Carbohydrt_.g.
## Model 3: Energ_Kcal ~ Lipid_Tot_.g. + Carbohydrt_.g. + Protein_.g.
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 336 505966
## 2 335 363407 1 142559 156.935 < 2.2e-16 ***
## 3 334 303403 1 60004 66.055 8.633e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
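Comparing the two entry orders shows how the share of R-squared credited to each predictor depends on when it enters. The sketch below uses the rounded R-squared values from the summaries above; the object names are illustrative only.

```r
# Cumulative R-squared after each step, copied from the model summaries
r2_order1 <- c(Protein = 0.03139, Lipid = 0.7182, Carbohydrate = 0.8303)
r2_order2 <- c(Lipid = 0.717, Carbohydrate = 0.7967, Protein = 0.8303)

# Increment in R-squared contributed by each predictor at its own step
inc_order1 <- diff(c(0, r2_order1))
inc_order2 <- diff(c(0, r2_order2))

inc_order1
inc_order2
```

Both orders reach the same total of 0.8303, but the split differs: carbohydrate is credited with about 0.11 when entered last and about 0.08 when entered second. This happens because the predictors are correlated with each other (for example, protein and carbohydrate at about -0.46).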
plot(Lipid_Tot_.g.,Energ_Kcal,pch = 21, bg = 'blue')
abline(fastf3.lm1)
fastC.lm <- lm(Energ_Kcal~Carbohydrt_.g.)
plot(Carbohydrt_.g.,Energ_Kcal,pch = 21, bg = 'blue')
abline(fastC.lm)
plot(Protein_.g.,Energ_Kcal,pch = 21, bg = 'blue')
abline(fastf2.lm1)
I plot a 3D scatterplot using two independent variables, total lipid (g) and carbohydrate (g), together with the regression plane of the model fitted on both.
library(scatterplot3d)
## Warning: package 'scatterplot3d' was built under R version 3.1.3
md <- scatterplot3d(Lipid_Tot_.g., Carbohydrt_.g., Energ_Kcal,
                    pch = 21, bg = 'blue', main = "Regression plane",
                    xlab = "Lipid total(g)", ylab = "carbohydrate(g)",
                    zlab = "Energy Kcal", axis = TRUE)
md$plane3d(fastf3.lm2)
To be completed in the final version.