library(ggplot2)  # ggplot(), geom_point(), geom_smooth()
library(car)      # residualPlots()
fat <- read.csv("fat-meat-country.csv", header = TRUE)
head(fat)
##   Country Year Obesity    Meat  GDP Working.Hours Life.Expectancy
## 1   China 1975     0.4 29.0714 1594      1974.898          63.915
## 2   China 1976     0.5 28.7700 1519      1974.207          64.631
## 3   China 1977     0.5 28.9344 1583      1973.435          65.278
## 4   China 1978     0.5 30.8798 1744      1972.727          65.857
## 5   China 1979     0.5 36.5790 1859      1972.104          66.377
## 6   China 1980     0.6 39.9492 1930      1971.497          66.844
str(fat)
## 'data.frame':    156 obs. of  7 variables:
##  $ Country        : chr  "China" "China" "China" "China" ...
##  $ Year           : int  1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 ...
##  $ Obesity        : num  0.4 0.5 0.5 0.5 0.5 0.6 0.6 0.6 0.7 0.7 ...
##  $ Meat           : num  29.1 28.8 28.9 30.9 36.6 ...
##  $ GDP            : num  1594 1519 1583 1744 1859 ...
##  $ Working.Hours  : num  1975 1974 1973 1973 1972 ...
##  $ Life.Expectancy: num  63.9 64.6 65.3 65.9 66.4 ...

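The data come from Our World in Data. They cover four countries (China, the United Arab Emirates, the United Kingdom and the United States), each observed every year from 1975 to 2013, giving 4 × 39 = 156 rows. Along with the obesity rate (%) and meat consumption (kg), each row records GDP, working hours and life expectancy.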

Convert the character column `Country` to a factor so it can be used as a categorical predictor:

fat$Country <- as.factor(fat$Country)
ggplot(aes(y = Obesity, x = Meat, color = Country), data = fat) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'

The chart shows that the relationship between meat consumption and obesity rate differs from country to country. China, the USA and the UK show a positive correlation, and in the USA and the UK meat consumption appears to be a strong predictor of the obesity rate. Surprisingly, the UAE shows a negative correlation. Differences in cuisine, culture or religion are possible explanations, but these data cannot test them, and a correlation alone does not show that meat causes obesity.
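One way to check the visual impression that each country has its own slope is to compare a common-slope model (the form of `fatmod` below) with a model that lets the slope vary by country, using an F-test. The sketch runs on a simulated stand-in for `fat`: the data frame `sim` and all its values are made up for illustration; with the real data, replace `sim` with `fat`. The formula `Country / Meat - 1` is R's nested-term shorthand for one intercept and one Meat slope per country.

```r
# Simulated stand-in for `fat` (hypothetical values, not from the CSV):
# four countries, 39 years each, with a different meat-obesity slope per country
set.seed(1)
countries <- c("China", "United Arab Emirates", "United Kingdom", "United States")
sim <- data.frame(
  Country = factor(rep(countries, each = 39)),
  Meat    = runif(156, 20, 120)
)
true_slope <- c(0.05, -0.10, 0.20, 0.30)   # made-up slopes, in factor-level order
sim$Obesity <- 5 + true_slope[as.integer(sim$Country)] * sim$Meat + rnorm(156)

# Common slope for all countries vs. one slope per country
common  <- lm(Obesity ~ Meat + Country, data = sim)
varying <- lm(Obesity ~ Country / Meat - 1, data = sim)

# F-test: does a country-specific slope improve the fit?
slope_test <- anova(common, varying)
print(slope_test)

# Per-country slopes are the coefficients named "Country<level>:Meat"
b <- coef(varying)
slopes <- b[grep(":Meat$", names(b))]
print(round(slopes, 3))
```

A small p-value in `slope_test` would support fitting separate slopes instead of the single shared slope used in `fatmod`.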

fatmod <- lm(Obesity ~ Meat + Country, data = fat)
summary(fatmod)
## 
## Call:
## lm(formula = Obesity ~ Meat + Country, data = fat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3016 -3.4382 -0.4874  2.3112 13.1494 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 -2.70068    1.43295  -1.885 0.061393 .  
## Meat                         0.05088    0.01298   3.919 0.000134 ***
## CountryUnited Arab Emirates  9.30814    1.92384   4.838 3.20e-06 ***
## CountryUnited Kingdom        9.52275    1.85803   5.125 8.97e-07 ***
## CountryUnited States         8.90141    3.12237   2.851 0.004971 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.934 on 151 degrees of freedom
## Multiple R-squared:  0.721,  Adjusted R-squared:  0.7136 
## F-statistic: 97.55 on 4 and 151 DF,  p-value: < 2.2e-16

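The multiple R² of 0.721 (adjusted R² 0.714) means the model accounts for about 72.1% of the variance in the obesity rate. Part of this comes from the country indicators rather than from meat consumption alone. The Meat coefficient of 0.051 means that, within a given country, each additional kg of meat is associated with an obesity rate about 0.05 percentage points higher. The Country coefficients are offsets relative to China, the baseline level. Note that this model forces the same meat slope on every country, which the chart suggests is not the case (the UAE trend is negative).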
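The R² reported by `summary()` can be reproduced by hand as 1 - SS_res / SS_tot, and the adjusted R² applies a penalty for the number of predictors. The sketch below uses simulated data (`sim` and its coefficients are hypothetical) and also fits a Country-only model to show how much of the R² the country indicators explain on their own; running the same comparison on `fat` would show how much Meat adds beyond Country.

```r
# Simulated stand-in for `fat` (hypothetical values, not from the CSV)
set.seed(2)
countries <- c("China", "United Arab Emirates", "United Kingdom", "United States")
sim <- data.frame(
  Country = factor(rep(countries, each = 39)),
  Meat    = runif(156, 20, 120)
)
country_effect <- c(0, 9, 9.5, 9)           # made-up offsets, in factor-level order
sim$Obesity <- -3 + 0.05 * sim$Meat +
  country_effect[as.integer(sim$Country)] + rnorm(156, sd = 5)

m <- lm(Obesity ~ Meat + Country, data = sim)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res <- sum(residuals(m)^2)
ss_tot <- sum((sim$Obesity - mean(sim$Obesity))^2)
r2 <- 1 - ss_res / ss_tot

# Adjusted R^2 with n observations and k non-intercept coefficients
n <- nobs(m)
k <- length(coef(m)) - 1
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Share of variance explained by Country without Meat
r2_country_only <- summary(lm(Obesity ~ Country, data = sim))$r.squared
print(c(r2 = r2, adj_r2 = adj_r2, country_only = r2_country_only))
```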

hist( x = residuals(fatmod),
      xlab = "Value of residual",
      main = "",
      breaks = 20)

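The histogram checks the normality assumption behind the t-tests and p-values in the summary. The residuals appear roughly normally distributed, although the residual quartiles (median -0.49, range -9.30 to 13.15) hint at a slight right skew. Approximate normality supports the inference but does not by itself show that the model predicts well; the residual standard error of 4.93 percentage points is a more direct measure of how far predictions typically fall from the observed obesity rate.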
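A normal Q-Q plot and a Shapiro-Wilk test are two common complements to the residual histogram, and the residual standard error can be computed directly from the residuals. This is a sketch on simulated data (`x` and `y` are hypothetical); for the real model, set `m <- fatmod`. With larger samples the Shapiro-Wilk test tends to flag even small departures from normality, so it is best read alongside the plot.

```r
# Simulated data (hypothetical); with the real data use m <- fatmod
set.seed(3)
x <- runif(156, 20, 120)
y <- 1 + 0.05 * x + rnorm(156, sd = 5)
m <- lm(y ~ x)
r <- residuals(m)

# Normal Q-Q plot: points close to the line suggest roughly normal residuals
qqnorm(r)
qqline(r)

# Shapiro-Wilk test: a small p-value is evidence against normality
sw <- shapiro.test(r)
print(sw)

# Residual standard error: typical size of a residual, in the outcome's units
rse <- sqrt(sum(r^2) / df.residual(m))
print(rse)
```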

plot(fatmod)

residualPlots(fatmod)

##            Test stat Pr(>|Test stat|)    
## Meat          4.0106        9.522e-05 ***
## Country                                  
## Tukey test    2.7313         0.006308 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
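The `Meat` row printed by `residualPlots()` is a curvature (lack-of-fit) test: car adds a squared term for the predictor and tests whether its coefficient is zero, and the Tukey test does the same with the squared fitted values. The significant results above therefore hint that the obesity-meat relationship is not linear. The base-R sketch below reproduces the idea on simulated data with a built-in curve (`sim` and its coefficients are hypothetical). Applied to `fatmod`, the t value for `I(Meat^2)` should correspond to the Meat statistic above. For one added coefficient, the nested-model F statistic from `anova()` equals the square of that t statistic.

```r
# Simulated stand-in for `fat` with a deliberately curved Meat effect (hypothetical)
set.seed(4)
countries <- c("China", "United Arab Emirates", "United Kingdom", "United States")
sim <- data.frame(
  Country = factor(rep(countries, each = 39)),
  Meat    = runif(156, 20, 120)
)
sim$Obesity <- 2 + 0.002 * sim$Meat^2 + as.integer(sim$Country) + rnorm(156)

base <- lm(Obesity ~ Meat + Country, data = sim)
quad <- update(base, . ~ . + I(Meat^2))     # add the squared term

# t test on the squared term (the idea behind the Meat row of residualPlots())
t_quad <- summary(quad)$coefficients["I(Meat^2)", "t value"]

# Equivalent nested-model F test: F = t^2 for one added coefficient
f_test <- anova(base, quad)
print(c(t = t_quad, F = f_test$F[2], t_squared = t_quad^2))
```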
ggplot(aes(y = Meat, x = GDP, color = Country), data = fat) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).

GDPmod <- lm(Meat ~ GDP + Country, data = fat)
summary(GDPmod)
## 
## Call:
## lm(formula = Meat ~ GDP + Country, data = fat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -70.812 -15.893  -2.459  18.453  82.994 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 9.047e+01  5.094e+00  17.762  < 2e-16 ***
## GDP                         3.529e-04  2.678e-04   1.318     0.19    
## CountryUnited Arab Emirates 1.075e+02  1.271e+01   8.454 2.38e-14 ***
## CountryUnited Kingdom       1.061e+02  9.390e+00  11.299  < 2e-16 ***
## CountryUnited States        2.122e+02  1.170e+01  18.132  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 30.87 on 149 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.8742, Adjusted R-squared:  0.8709 
## F-statistic:   259 on 4 and 149 DF,  p-value: < 2.2e-16
hist( x = residuals(GDPmod),
      xlab = "Value of residual",
      main = "",
      breaks = 20)

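The per-country fitted lines suggest that meat consumption rises with GDP in the USA, the UK and China, but not in the UAE. In the pooled model, which forces one GDP slope on all four countries, the GDP coefficient (3.5e-4, p = 0.19) is not statistically significant, which is consistent with the slopes differing by country.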
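Because the points are coloured by `Country`, `geom_smooth(method = lm)` fits a separate line for each country, whereas `GDPmod` uses a single GDP slope. The per-country slopes behind the chart can be computed directly by splitting the data and fitting one regression per group. This is a sketch on simulated data: `sim` and its values are hypothetical, including two missing GDP values that mimic the warning above. With the real data, use `split(fat, fat$Country)`.

```r
# Simulated stand-in for `fat` (hypothetical values, not from the CSV)
set.seed(5)
countries <- c("China", "United Arab Emirates", "United Kingdom", "United States")
sim <- data.frame(
  Country = factor(rep(countries, each = 39)),
  GDP     = runif(156, 1000, 60000)
)
gdp_slope <- c(0.002, -0.0005, 0.001, 0.001)   # made-up slopes, in factor-level order
sim$Meat <- 40 + gdp_slope[as.integer(sim$Country)] * sim$GDP + rnorm(156, sd = 5)
sim$GDP[c(3, 80)] <- NA                        # two missing values, as in the warning

# One simple regression per country; lm() drops rows with NA by default
per_country_slope <- sapply(split(sim, sim$Country), function(d) {
  coef(lm(Meat ~ GDP, data = d))[["GDP"]]
})
print(signif(per_country_slope, 3))
```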

ggplot(aes(y = Working.Hours, x = Meat, color = Country), data = fat) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 39 rows containing non-finite values (stat_smooth).
## Warning: Removed 39 rows containing missing values (geom_point).
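The 39 dropped rows match exactly one country's 39 years. Together with the absence of any UAE coefficient in `WHmod` below (113 residual degrees of freedom plus 4 coefficients = 117 = 156 - 39 rows), this is consistent with the UAE having no working-hours data at all. Counting missing values per country is a quick way to confirm this. Because `lm()` drops factor levels that become empty after incomplete rows are removed, such a country silently disappears from the coefficient table. The sketch uses simulated data (`sim` is hypothetical, built so one country is entirely missing); with the real data, run the `tapply()` line on `fat`.

```r
# Simulated stand-in for `fat` (hypothetical): one country has no working-hours data
set.seed(6)
countries <- c("China", "United Arab Emirates", "United Kingdom", "United States")
sim <- data.frame(
  Country       = factor(rep(countries, each = 39)),
  Meat          = runif(156, 20, 120),
  Working.Hours = runif(156, 1400, 2000)
)
sim$Working.Hours[sim$Country == "United Arab Emirates"] <- NA

# Missing working-hours values per country
na_by_country <- tapply(is.na(sim$Working.Hours), sim$Country, sum)
print(na_by_country)

# lm() removes the incomplete rows, then drops the now-empty factor level
fit <- lm(Working.Hours ~ Meat + Country, data = sim)
print(nobs(fit))          # rows actually used
print(names(coef(fit)))   # no "United Arab Emirates" term
```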

WHmod <- lm(Working.Hours ~ Meat + Country, data = fat)
summary(WHmod)
## 
## Call:
## lm(formula = Working.Hours ~ Meat + Country, data = fat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -101.018  -36.678    3.317   33.528  120.767 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           1943.2679    16.7641 115.919  < 2e-16 ***
## Meat                     1.1204     0.1595   7.024 1.72e-10 ***
## CountryUnited Kingdom -454.0813    21.5234 -21.097  < 2e-16 ***
## CountryUnited States  -503.9420    37.6036 -13.401  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 50.44 on 113 degrees of freedom
##   (39 observations deleted due to missingness)
## Multiple R-squared:  0.8932, Adjusted R-squared:  0.8904 
## F-statistic:   315 on 3 and 113 DF,  p-value: < 2.2e-16
hist( x = residuals(WHmod),
      xlab = "Value of residual",
      main = "",
      breaks = 20)

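The chart suggests that in China, the UK and the USA, higher meat consumption goes together with longer working hours. In the pooled model, each additional kg of meat is associated with about 1.12 more working hours (p < 0.001), holding country fixed. This is an association, not evidence that eating more meat requires working longer. The UAE does not appear in this model because all 39 of its working-hours values are missing.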
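Meat consumption and working hours may both trend over 1975-2013, so part of the association in `WHmod` could reflect a shared time trend rather than a direct link. One hedged way to probe this is to add `Year` as a covariate and see how much the Meat coefficient changes. The sketch simulates a hypothetical case (`sim` and its coefficients are made up) in which Year drives both series and Meat has no direct effect on hours: the naive Meat coefficient is clearly non-zero, while the Year-adjusted one is close to zero. With the real data, compare `WHmod` with `update(WHmod, . ~ . + Year)`.

```r
# Simulated data (hypothetical): Year drives both Meat and Working.Hours,
# and Meat has no direct effect on hours
set.seed(7)
sim <- data.frame(
  Country = factor(rep(c("China", "United Kingdom", "United States"), each = 39)),
  Year    = rep(1975:2013, times = 3)
)
sim$Meat          <- 30 + 1.5 * (sim$Year - 1975) + rnorm(117, sd = 3)
sim$Working.Hours <- 2000 - 5 * (sim$Year - 1975) + rnorm(117, sd = 20)

naive    <- lm(Working.Hours ~ Meat + Country, data = sim)
adjusted <- update(naive, . ~ . + Year)   # control for the shared time trend

# Meat coefficient before and after adjusting for Year
print(c(naive = coef(naive)[["Meat"]], adjusted = coef(adjusted)[["Meat"]]))
```

Because Meat and Year are strongly correlated in such data, the adjusted coefficient will have a wider standard error, so any conclusion from this comparison on the real data should stay tentative.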

ggplot(aes(y = Life.Expectancy, x = Meat, color = Country), data = fat) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'

LEmod <- lm(Life.Expectancy ~ Meat + Country, data = fat)
summary(LEmod)
## 
## Call:
## lm(formula = Life.Expectancy ~ Meat + Country, data = fat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.6651 -1.0081 -0.0284  1.0578  6.7204 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 66.212991   0.686288  96.480  < 2e-16 ***
## Meat                         0.042401   0.006217   6.820 2.06e-10 ***
## CountryUnited Arab Emirates -2.850125   0.921393  -3.093  0.00236 ** 
## CountryUnited Kingdom        1.729099   0.889874   1.943  0.05387 .  
## CountryUnited States        -3.787228   1.495408  -2.533  0.01234 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.363 on 151 degrees of freedom
## Multiple R-squared:  0.617,  Adjusted R-squared:  0.6068 
## F-statistic: 60.81 on 4 and 151 DF,  p-value: < 2.2e-16
hist( x = residuals(LEmod),
      xlab = "Value of residual",
      main = "",
      breaks = 20)