library(ggplot2)  # ggplot(), geom_point(), geom_smooth()
library(car)      # residualPlots()
fat <- read.csv("fat-meat-country.csv", header = TRUE)
head(fat)
## Country Year Obesity Meat GDP Working.Hours Life.Expectancy
## 1 China 1975 0.4 29.0714 1594 1974.898 63.915
## 2 China 1976 0.5 28.7700 1519 1974.207 64.631
## 3 China 1977 0.5 28.9344 1583 1973.435 65.278
## 4 China 1978 0.5 30.8798 1744 1972.727 65.857
## 5 China 1979 0.5 36.5790 1859 1972.104 66.377
## 6 China 1980 0.6 39.9492 1930 1971.497 66.844
str(fat)
## 'data.frame': 156 obs. of 7 variables:
## $ Country : chr "China" "China" "China" "China" ...
## $ Year : int 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 ...
## $ Obesity : num 0.4 0.5 0.5 0.5 0.5 0.6 0.6 0.6 0.7 0.7 ...
## $ Meat : num 29.1 28.8 28.9 30.9 36.6 ...
## $ GDP : num 1594 1519 1583 1744 1859 ...
## $ Working.Hours : num 1975 1974 1973 1973 1972 ...
## $ Life.Expectancy: num 63.9 64.6 65.3 65.9 66.4 ...
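The data come from Our World in Data. There are four countries (China, the United Arab Emirates, the United Kingdom and the United States), and for each of them the data set records the obesity rate (%) and meat consumption (kg) for every year from 1975 to 2013.
Convert the character variable Country to a factor.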
fat$Country <- as.factor(fat$Country)
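Converting Country to a factor matters for the regressions that follow: lm() expands a factor into dummy variables and treats the first level as the baseline, which is why China has no coefficient of its own in the summaries. A minimal sketch of that behaviour, using a small made-up vector (an assumption for illustration, not the real data):

```r
# Hypothetical toy vector standing in for fat$Country (not the real data).
country <- c("United States", "China", "United Kingdom", "China")

country_f <- as.factor(country)

# Levels are sorted alphabetically; the first one is the reference level.
levels(country_f)

# relevel() picks a different baseline if another comparison is more useful.
country_us <- relevel(country_f, ref = "United States")
levels(country_us)
```

With a different reference level, the country coefficients would be read as differences from that country instead of from China.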
ggplot(fat, aes(x = Meat, y = Obesity, color = Country)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'
The chart shows that each country has its own relationship between meat consumption and obesity rate. China, the USA and the UK show a positive correlation between meat consumption and obesity, and in the USA and the UK meat consumption is a strong indicator of the obesity rate. In the UAE, surprisingly, meat consumption and obesity rate are negatively correlated. Possible causes include differences in cuisine, culture or religion.
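Because the fitted lines differ in slope from group to group, one way to check this formally is to compare an additive model (one common slope) with an interaction model (one slope per group). The sketch below uses R's built-in iris data as a stand-in, with Species playing the role of Country. It illustrates the technique only and says nothing about the obesity data.

```r
# Built-in iris data as a stand-in: Species plays the role of Country.
common_slope <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)
group_slopes <- lm(Sepal.Length ~ Petal.Length * Species, data = iris)

# F-test: do group-specific slopes improve the fit over a common slope?
comparison <- anova(common_slope, group_slopes)
print(comparison)

# Slope within each group, from a separate fit per group.
slopes <- sapply(
  split(iris, iris$Species),
  function(d) coef(lm(Sepal.Length ~ Petal.Length, data = d))[["Petal.Length"]]
)
print(slopes)
```

For the data in this report, the analogous interaction formula would be Obesity ~ Meat * Country.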
fatmod <- lm(Obesity ~ Meat + Country, data = fat)
summary(fatmod)
##
## Call:
## lm(formula = Obesity ~ Meat + Country, data = fat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3016 -3.4382 -0.4874 2.3112 13.1494
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.70068 1.43295 -1.885 0.061393 .
## Meat 0.05088 0.01298 3.919 0.000134 ***
## CountryUnited Arab Emirates 9.30814 1.92384 4.838 3.20e-06 ***
## CountryUnited Kingdom 9.52275 1.85803 5.125 8.97e-07 ***
## CountryUnited States 8.90141 3.12237 2.851 0.004971 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.934 on 151 degrees of freedom
## Multiple R-squared: 0.721, Adjusted R-squared: 0.7136
## F-statistic: 97.55 on 4 and 151 DF, p-value: < 2.2e-16
The multiple R-squared of 0.721 indicates that the regression model accounts for 72.1% of the variability in the obesity rate.
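To make the "share of variability explained" reading concrete, R-squared can be recomputed by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below does this on the built-in mtcars data (a stand-in chosen so the example runs on its own; the model and variable names are assumptions, not part of this analysis).

```r
# Built-in mtcars data as a stand-in for the obesity data.
mod <- lm(mpg ~ wt + factor(cyl), data = mtcars)

rss <- sum(residuals(mod)^2)                   # variation left unexplained
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)  # total variation in the outcome
r2_manual <- 1 - rss / tss

# Should agree with the value reported by summary().
all.equal(r2_manual, summary(mod)$r.squared)
```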
hist(residuals(fatmod),
     xlab = "Value of residual",
     main = "",
     breaks = 20)
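The histogram suggests the residuals are roughly normally distributed, which supports the normality assumption behind the model's t- and F-tests. Normality of the residuals alone does not show that the model predicts well.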
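A histogram is a fairly coarse check of normality. A Q-Q plot and a Shapiro-Wilk test are common complements. The sketch below shows both on a model fitted to the built-in mtcars data, again as a stand-in and not as a result for the obesity model.

```r
# Built-in mtcars data as a stand-in for the obesity data.
mod <- lm(mpg ~ wt + factor(cyl), data = mtcars)
res <- residuals(mod)

# Q-Q plot: points close to the reference line suggest approximate normality.
qqnorm(res)
qqline(res)

# Shapiro-Wilk test: a small p-value is evidence against normality.
sw <- shapiro.test(res)
print(sw)
```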
plot(fatmod)
residualPlots(fatmod)
## Test stat Pr(>|Test stat|)
## Meat 4.0106 9.522e-05 ***
## Country
## Tukey test 2.7313 0.006308 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ggplot(fat, aes(x = GDP, y = Meat, color = Country)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
GDPmod <- lm(Meat ~ GDP + Country, data = fat)
summary(GDPmod)
##
## Call:
## lm(formula = Meat ~ GDP + Country, data = fat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -70.812 -15.893 -2.459 18.453 82.994
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.047e+01 5.094e+00 17.762 < 2e-16 ***
## GDP 3.529e-04 2.678e-04 1.318 0.19
## CountryUnited Arab Emirates 1.075e+02 1.271e+01 8.454 2.38e-14 ***
## CountryUnited Kingdom 1.061e+02 9.390e+00 11.299 < 2e-16 ***
## CountryUnited States 2.122e+02 1.170e+01 18.132 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30.87 on 149 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.8742, Adjusted R-squared: 0.8709
## F-statistic: 259 on 4 and 149 DF, p-value: < 2.2e-16
hist(residuals(GDPmod),
     xlab = "Value of residual",
     main = "",
     breaks = 20)
In the USA, the UK and China, people consume more meat as GDP grows, but in the UAE this does not seem to hold. In the additive model above, the common GDP slope is not statistically significant (p = 0.19).
ggplot(fat, aes(x = Meat, y = Working.Hours, color = Country)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 39 rows containing non-finite values (stat_smooth).
## Warning: Removed 39 rows containing missing values (geom_point).
WHmod <- lm(Working.Hours ~ Meat + Country, data = fat)
summary(WHmod)
##
## Call:
## lm(formula = Working.Hours ~ Meat + Country, data = fat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -101.018 -36.678 3.317 33.528 120.767
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1943.2679 16.7641 115.919 < 2e-16 ***
## Meat 1.1204 0.1595 7.024 1.72e-10 ***
## CountryUnited Kingdom -454.0813 21.5234 -21.097 < 2e-16 ***
## CountryUnited States -503.9420 37.6036 -13.401 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 50.44 on 113 degrees of freedom
## (39 observations deleted due to missingness)
## Multiple R-squared: 0.8932, Adjusted R-squared: 0.8904
## F-statistic: 315 on 3 and 113 DF, p-value: < 2.2e-16
hist(residuals(WHmod),
     xlab = "Value of residual",
     main = "",
     breaks = 20)
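In China, the UK and the USA, higher meat consumption goes together with longer working hours: about 1.12 more working hours for each additional kg of meat, holding country fixed. This is an association; it does not show that eating more meat requires working longer. The UAE has no coefficient in this model, and the 39 deleted observations match its 39 yearly rows, so its working hours appear to be missing throughout.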
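The working-hours model silently drops 39 rows because lm() omits incomplete cases by default. It can help to count how many rows a model actually used and to see where the missing values sit. The sketch below shows one way to do that on the built-in airquality data, where Ozone has missing values and Month stands in for a grouping variable such as Country.

```r
# Built-in airquality data: Ozone contains missing values.
mod <- lm(Ozone ~ Temp, data = airquality)

n_used    <- nobs(mod)               # rows actually used in the fit
n_dropped <- length(na.action(mod))  # rows removed because of NAs
n_missing <- sum(!complete.cases(airquality[, c("Ozone", "Temp")]))

c(used = n_used, dropped = n_dropped, missing = n_missing)

# Which groups do the incomplete rows belong to?
table(Month = airquality$Month, OzoneMissing = is.na(airquality$Ozone))
```

A table of this kind, tabulating Country against is.na(Working.Hours), would show directly which country accounts for the dropped rows.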
ggplot(fat, aes(x = Meat, y = Life.Expectancy, color = Country)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'
LEmod <- lm(Life.Expectancy ~ Meat + Country, data = fat)
summary(LEmod)
##
## Call:
## lm(formula = Life.Expectancy ~ Meat + Country, data = fat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.6651 -1.0081 -0.0284 1.0578 6.7204
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 66.212991 0.686288 96.480 < 2e-16 ***
## Meat 0.042401 0.006217 6.820 2.06e-10 ***
## CountryUnited Arab Emirates -2.850125 0.921393 -3.093 0.00236 **
## CountryUnited Kingdom 1.729099 0.889874 1.943 0.05387 .
## CountryUnited States -3.787228 1.495408 -2.533 0.01234 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.363 on 151 degrees of freedom
## Multiple R-squared: 0.617, Adjusted R-squared: 0.6068
## F-statistic: 60.81 on 4 and 151 DF, p-value: < 2.2e-16
hist(residuals(LEmod),
     xlab = "Value of residual",
     main = "",
     breaks = 20)