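library(ggplot2)  # ggplot(), geom_point(), geom_smooth(), theme_bw()
library(car)      # residualPlots()
data <- read.csv("data.csv", header = TRUE)
head(data)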
##   Country Year Obesity    Meat  GDP Working.Hours Life.Expectancy
## 1   China 1975     0.4 29.0714 1594      1974.898          63.915
## 2   China 1976     0.5 28.7700 1519      1974.207          64.631
## 3   China 1977     0.5 28.9344 1583      1973.435          65.278
## 4   China 1978     0.5 30.8798 1744      1972.727          65.857
## 5   China 1979     0.5 36.5790 1859      1972.104          66.377
## 6   China 1980     0.6 39.9492 1930      1971.497          66.844
str(data)
## 'data.frame':    195 obs. of  7 variables:
##  $ Country        : chr  "China" "China" "China" "China" ...
##  $ Year           : int  1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 ...
##  $ Obesity        : num  0.4 0.5 0.5 0.5 0.5 0.6 0.6 0.6 0.7 0.7 ...
##  $ Meat           : num  29.1 28.8 28.9 30.9 36.6 ...
##  $ GDP            : num  1594 1519 1583 1744 1859 ...
##  $ Working.Hours  : num  1975 1974 1973 1973 1972 ...
##  $ Life.Expectancy: num  63.9 64.6 65.3 65.9 66.4 ...

The data come from Our World in Data. Country is the categorical variable (5 countries). For each country and each year from 1975 to 2013, the data set records the obesity rate (%), meat consumption (kg), GDP per capita, working hours, and life expectancy.

Convert the character variable Country to a factor.

data$Country <- as.factor(data$Country)
boxplot(Obesity ~ Country, data = data)

ggplot(aes(y = Obesity, x = Meat, color = Country), data = data) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'

The chart shows that the relationship between meat consumption and obesity rate differs by country. China, Saudi Arabia, the USA, and the UK show a positive correlation between meat consumption and obesity rate, and in the USA and the UK meat consumption appears to be a strong indicator of the obesity rate. In the UAE, surprisingly, the correlation is negative. The cause is unknown, given that Saudi Arabia has a similar cuisine, culture, and religion.
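Since the slopes in the chart differ by country, one option is to let the model estimate a separate slope per country through an interaction term. The sketch below uses a small simulated data frame (`toy`, with invented values and two made-up countries `A` and `B`) so that it runs on its own. On the real data the corresponding formula would be `Obesity ~ Meat * Country`.

```r
# Simulated stand-in for the real data: two countries with opposite slopes.
set.seed(1)
toy <- data.frame(
  Country = factor(rep(c("A", "B"), each = 20)),
  Meat    = rep(seq(20, 115, by = 5), times = 2)
)
toy$Obesity <- ifelse(toy$Country == "A",
                      5 + 0.10 * toy$Meat,    # positive slope in A
                      30 - 0.05 * toy$Meat) + # negative slope in B
  rnorm(nrow(toy), sd = 0.5)

# Meat * Country expands to Meat + Country + Meat:Country,
# so each country gets its own intercept and its own slope.
int_mod <- lm(Obesity ~ Meat * Country, data = toy)

# Slope for the reference country A, and slope for B (reference + interaction).
slope_A <- unname(coef(int_mod)["Meat"])
slope_B <- unname(coef(int_mod)["Meat"] + coef(int_mod)["Meat:CountryB"])
print(c(A = slope_A, B = slope_B))
```

A model without the interaction forces one common Meat slope on all countries, which cannot represent a country whose trend runs the other way.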

fatmod <- lm(Obesity ~ Meat + Country, data = data)
summary(fatmod)
## 
## Call:
## lm(formula = Obesity ~ Meat + Country, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.0811  -3.6896  -0.5381   3.2416  13.1815 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 -5.10949    1.38066  -3.701 0.000282 ***
## Meat                         0.07704    0.01219   6.322 1.82e-09 ***
## CountrySaudi Arabia         15.85829    1.19668  13.252  < 2e-16 ***
## CountryUnited Arab Emirates  6.15245    1.85880   3.310 0.001118 ** 
## CountryUnited Kingdom        6.53145    1.79885   3.631 0.000364 ***
## CountryUnited States         3.02664    2.96380   1.021 0.308464    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.023 on 189 degrees of freedom
## Multiple R-squared:  0.6986, Adjusted R-squared:  0.6906 
## F-statistic: 87.61 on 5 and 189 DF,  p-value: < 2.2e-16

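$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \varepsilon_i$$

$$\text{Obesity}_i = -5.11 + 0.077\,\text{Meat}_i + 15.86\,\text{SaudiArabia}_i + 6.15\,\text{UAE}_i + 6.53\,\text{UK}_i + 3.03\,\text{US}_i + \varepsilon_i$$

Each country term is a 0/1 dummy variable. The residual standard error, 5.023, estimates the standard deviation of ε and is not a term added to the equation.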

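Intercept β0 = -5.11: predicted obesity rate for China, the reference level of Country, when meat consumption is 0. It is not the grand mean of Obesity.
Continuous variable β1 (Meat) = 0.077: change in obesity rate (percentage points) per additional kilogram of meat, holding country fixed.

Categorical variables: each coefficient is the difference in obesity rate from China at the same meat consumption.
β2 Saudi Arabia = 15.86
β3 United Arab Emirates = 6.15
β4 United Kingdom = 6.53
β5 United States = 3.03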
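With R's default treatment coding, the first factor level acts as the reference and every other level gets its own 0/1 column. The sketch below builds the design matrix for the five country names alone (no model is fitted) to show which level is the reference and what the dummy columns look like.

```r
# Country names as in the data; the factor levels are sorted alphabetically,
# so "China" becomes the reference level under R's default treatment coding.
country <- factor(c("China", "Saudi Arabia", "United Arab Emirates",
                    "United Kingdom", "United States"))
levels(country)[1]          # reference level

# One 0/1 column per non-reference country; the China row is all zeros
# apart from the intercept column.
X <- model.matrix(~ country)
print(X)
```

This is why the summary table lists four country coefficients rather than five: China is absorbed into the intercept.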

R-squared: the model explains about 70% of the variance in obesity rate (multiple R-squared = 0.6986, adjusted R-squared = 0.6906).

The t value for United States (1.021) is below 1.96 and its p value (0.308) is above 0.05, so we cannot reject the null hypothesis that, at the same meat consumption, the obesity rate in the United States equals that in China.
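The t value and p value for this coefficient can be reproduced from the estimate, its standard error, and the residual degrees of freedom. The numbers below are copied from the summary output. The variable names are only illustrative.

```r
# Values copied from the summary(fatmod) output above.
estimate  <- 3.02664   # CountryUnited States
std_error <- 2.96380
df_resid  <- 189       # residual degrees of freedom

t_value <- estimate / std_error
p_value <- 2 * pt(-abs(t_value), df = df_resid)  # two-sided p value
t_crit  <- qt(0.975, df = df_resid)              # exact cutoff; 1.96 is the large-sample value

round(c(t = t_value, p = p_value, critical = t_crit), 3)
```

The exact 5% cutoff for 189 degrees of freedom is slightly larger than 1.96, which does not change the conclusion here.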

With each additional kilogram of meat consumed, the obesity rate rises by about 0.077 percentage points, holding country fixed.

hist( x = residuals(fatmod),
      xlab = "Value of residual",
      main = "",
      breaks = 20)

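The residuals look approximately normally distributed, which supports the t tests and p values reported above. Normal residuals alone do not show that the model predicts well; that is judged from R-squared and the residual standard error.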
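A histogram can hide departures from normality, so a normal Q-Q comparison and a Shapiro-Wilk test are common complements. The sketch below runs on simulated data (`toy`, invented values) so that it is self-contained. With the real model, `res` would be `residuals(fatmod)`, and `qqnorm(res); qqline(res)` would draw the plot.

```r
set.seed(7)
toy <- data.frame(x = runif(100, 0, 100))
toy$y <- 1 + 0.3 * toy$x + rnorm(100, sd = 5)  # simulated data, normal errors
fit <- lm(y ~ x, data = toy)
res <- residuals(fit)

# Q-Q coordinates without drawing: theoretical vs. sample quantiles.
# A correlation close to 1 means the points lie near a straight line.
qq <- qqnorm(res, plot.it = FALSE)
qq_cor <- cor(qq$x, qq$y)
qq_cor

# Shapiro-Wilk test: a small p value is evidence against normality.
sw <- shapiro.test(res)
sw$p.value
```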

residualPlots(fatmod)

##            Test stat Pr(>|Test stat|)    
## Meat          2.7901         0.005813 ** 
## Country                                  
## Tukey test    4.9244        8.461e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

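The fitted line in the residual plot is not horizontal, and both the curvature test for Meat (p = 0.0058) and the Tukey test (p = 8.5e-07) are significant. The linear, additive model therefore misses part of the relationship between meat consumption and obesity rate.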
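The curvature test reported by `residualPlots()` for a numeric predictor corresponds to adding a squared term for that predictor. A minimal sketch of the same idea, on simulated data with invented values, compares a linear and a quadratic model with an F test. On the real data the second formula would be `Obesity ~ Meat + I(Meat^2) + Country`.

```r
set.seed(3)
toy <- data.frame(Meat = runif(120, 20, 120))
# Simulated response with genuine curvature (illustrative values only).
toy$Obesity <- 2 + 0.02 * toy$Meat + 0.002 * toy$Meat^2 + rnorm(120, sd = 2)

lin_mod  <- lm(Obesity ~ Meat, data = toy)
quad_mod <- lm(Obesity ~ Meat + I(Meat^2), data = toy)

# F test of the nested models: a small p value favours the quadratic term.
cmp <- anova(lin_mod, quad_mod)
print(cmp)
```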

ggplot(aes(y = Meat, x = GDP, color = Country), data = data) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).

As GDP per capita rises, people in Saudi Arabia, the USA, the UK, and China consume more meat. The UAE does not follow this pattern.
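The warnings above report 2 removed rows, which points to missing values in GDP or Meat. A quick way to locate such gaps is sketched below on a toy data frame (invented values). On the real data the same calls would take `data` instead of `toy`.

```r
# Toy data frame with the same kind of gap (values invented for illustration).
toy <- data.frame(
  Meat = c(30, 35, 40, 45, 50),
  GDP  = c(1500, NA, 1700, NA, 1900)
)

colSums(is.na(toy))                    # missing values per column
toy[!complete.cases(toy), ]            # the rows that would be dropped

fit <- lm(Meat ~ GDP, data = toy)      # default na.action drops incomplete rows
nobs(fit)                              # rows actually used in the fit
```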

GDPmod <- lm(Meat ~ GDP + Country, data = data)
summary(GDPmod)
## 
## Call:
## lm(formula = Meat ~ GDP + Country, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -75.843 -13.076   1.253  15.801  85.292 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 8.904e+01  4.813e+00  18.502  < 2e-16 ***
## GDP                         6.650e-04  2.218e-04   2.998  0.00309 ** 
## CountrySaudi Arabia         1.723e+01  7.990e+00   2.157  0.03228 *  
## CountryUnited Arab Emirates 9.517e+01  1.104e+01   8.619 2.81e-15 ***
## CountryUnited Kingdom       9.879e+01  8.439e+00  11.706  < 2e-16 ***
## CountryUnited States        2.013e+02  1.023e+01  19.669  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29.38 on 187 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.8828, Adjusted R-squared:  0.8797 
## F-statistic: 281.7 on 5 and 187 DF,  p-value: < 2.2e-16

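$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \varepsilon_i$$

$$\text{Meat}_i = 89.04 + 0.000665\,\text{GDP}_i + 17.23\,\text{SaudiArabia}_i + 95.17\,\text{UAE}_i + 98.79\,\text{UK}_i + 201.3\,\text{US}_i + \varepsilon_i$$

Each country term is a 0/1 dummy variable. The residual standard error, 29.38, estimates the standard deviation of ε and is not a term added to the equation.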

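Intercept β0 = 89.04: predicted meat consumption (kg) for China, the reference level of Country, when GDP per capita is 0. It is not the grand mean of meat consumption.
Continuous variable β1 (GDP) = 6.650e-04: change in meat consumption (kg) per one-unit increase in GDP per capita, holding country fixed.

Categorical variables: each coefficient is the difference in meat consumption from China at the same GDP per capita.
β2 Saudi Arabia = 17.23
β3 United Arab Emirates = 95.17
β4 United Kingdom = 98.79
β5 United States = 201.3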

R-squared: the model explains about 88% of the variance in meat consumption (multiple R-squared = 0.8828, adjusted R-squared = 0.8797).
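For a model with an intercept, R-squared is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below checks this against `summary()` on simulated data (invented values, not the GDP data).

```r
set.seed(42)
x <- runif(50, 0, 100)
y <- 2 + 0.5 * x + rnorm(50, sd = 10)   # simulated, for illustration only
fit <- lm(y ~ x)

rss <- sum(residuals(fit)^2)            # unexplained variation
tss <- sum((y - mean(y))^2)             # total variation around the mean
r2_by_hand <- 1 - rss / tss

all.equal(r2_by_hand, summary(fit)$r.squared)
```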

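For each one-unit increase in GDP per capita, meat consumption rises by 6.650e-04 kg, which is about 0.67 kg per 1,000 units of GDP per capita, holding country fixed.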
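A coefficient as small as 6.650e-04 is easier to read after rescaling the predictor. The sketch below, on simulated data with invented values, shows that expressing GDP in thousands multiplies the slope by 1,000 and leaves the fit itself unchanged.

```r
set.seed(11)
toy <- data.frame(GDP = runif(60, 1000, 50000))
toy$Meat <- 90 + 0.0007 * toy$GDP + rnorm(60, sd = 5)  # simulated values

fit_units     <- lm(Meat ~ GDP, data = toy)
fit_thousands <- lm(Meat ~ I(GDP / 1000), data = toy)

# The slope per 1,000 units is 1,000 times the slope per unit;
# the fitted values are the same in both models.
slope_unit <- unname(coef(fit_units)[2])
slope_k    <- unname(coef(fit_thousands)[2])
c(per_unit = slope_unit, per_thousand = slope_k)
```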

hist( x = residuals(GDPmod),
      xlab = "Value of residual",
      main = "",
      breaks = 20)

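The residuals look approximately normally distributed, so the normality assumption behind the t tests is reasonable for this model. This is a check on the inference, not a measure of how well the model predicts.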

ggplot(aes(y = Working.Hours, x = Meat, color = Country), data = data) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 78 rows containing non-finite values (stat_smooth).
## Warning: Removed 78 rows containing missing values (geom_point).

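In China and the USA, higher meat consumption goes together with longer working hours. The UK shows the opposite pattern: meat consumption is lower when working hours are longer. The 78 removed rows match two countries over 39 years, and the model below has no coefficients for Saudi Arabia or the UAE, so working hours appear to be missing for those two countries.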
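When a whole country has no values for the response, `lm()` drops its rows and the country disappears from the coefficient table. The sketch below reproduces this on a toy data frame (countries `A`, `B`, `C` and all values are invented).

```r
toy <- data.frame(
  Country = factor(rep(c("A", "B", "C"), each = 4)),
  Meat    = c(30, 32, 35, 37, 60, 62, 65, 70, 100, 105, 110, 120),
  Working.Hours = c(2000, 1995, 1990, 1985, NA, NA, NA, NA,
                    1800, 1790, 1795, 1780)   # country B has no data at all
)

# Non-missing observations per country.
n_obs <- tapply(!is.na(toy$Working.Hours), toy$Country, sum)
n_obs

# lm() drops the incomplete rows, and country B disappears from the coefficients.
fit <- lm(Working.Hours ~ Meat + Country, data = toy)
names(coef(fit))
```

Running the `tapply()` line on the real data with `data$Working.Hours` and `data$Country` would confirm which countries contribute to the working-hours model.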

WHmod <- lm(Working.Hours ~ Meat + Country, data = data)
summary(WHmod)
## 
## Call:
## lm(formula = Working.Hours ~ Meat + Country, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -101.018  -36.678    3.317   33.528  120.767 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           1943.2679    16.7641 115.919  < 2e-16 ***
## Meat                     1.1204     0.1595   7.024 1.72e-10 ***
## CountryUnited Kingdom -454.0813    21.5234 -21.097  < 2e-16 ***
## CountryUnited States  -503.9420    37.6036 -13.401  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 50.44 on 113 degrees of freedom
##   (78 observations deleted due to missingness)
## Multiple R-squared:  0.8932, Adjusted R-squared:  0.8904 
## F-statistic:   315 on 3 and 113 DF,  p-value: < 2.2e-16

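$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \varepsilon_i$$

$$\text{WorkingHours}_i = 1943.27 + 1.12\,\text{Meat}_i - 454.08\,\text{UK}_i - 503.94\,\text{US}_i + \varepsilon_i$$

Each country term is a 0/1 dummy variable. The residual standard error, 50.44, estimates the standard deviation of ε and is not a term added to the equation.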

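Intercept β0 = 1943.27: predicted working hours for China, the reference level of Country, when meat consumption is 0. It is not a grand mean.
Continuous variable β1 (Meat) = 1.12: change in working hours per additional kilogram of meat, holding country fixed.

Categorical variables: each coefficient is the difference in working hours from China at the same meat consumption.
β2 United Kingdom = -454.08
β3 United States = -503.94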

R-squared: the model explains about 89% of the variance in working hours (multiple R-squared = 0.8932, adjusted R-squared = 0.8904).

With each additional kilogram of meat consumed, working hours increase by about 1.12 hours, holding country fixed.
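A prediction from this model is the intercept, plus the Meat coefficient times meat consumption, plus the coefficient of the country in question (nothing is added for China, the reference level). The sketch below uses the coefficients from the summary output. `predict_hours` is a hypothetical helper written for this illustration, and 120 kg is an arbitrary input.

```r
# Coefficients copied from summary(WHmod); China is the reference level.
b <- c(intercept = 1943.2679, meat = 1.1204,
       uk = -454.0813, us = -503.9420)

# Hypothetical helper: predicted working hours for a country at a given
# meat consumption (kg). `country` is "china", "uk", or "us".
predict_hours <- function(meat_kg, country = "china") {
  offset <- if (country == "china") 0 else b[[country]]
  b[["intercept"]] + b[["meat"]] * meat_kg + offset
}

predict_hours(120, "us")
predict_hours(51, "china") - predict_hours(50, "china")  # equals the Meat coefficient
```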

hist( x = residuals(WHmod),
      xlab = "Value of residual",
      main = "",
      breaks = 20)

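The residuals look approximately normally distributed, so the normality assumption behind the t tests is reasonable for this model. This is a check on the inference, not a measure of how well the model predicts.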

ggplot(aes(y = Life.Expectancy, x = Meat, color = Country), data = data) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'

In the UAE, life expectancy seems to drop as meat consumption rises. In the other four countries, life expectancy is higher when meat consumption is higher.

LEmod <- lm(Life.Expectancy ~ Meat + Country, data = data)
summary(LEmod)
## 
## Call:
## lm(formula = Life.Expectancy ~ Meat + Country, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.0858 -1.0242  0.0432  1.1115  7.8066 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 64.600466   0.734322  87.973  < 2e-16 ***
## Meat                         0.059912   0.006481   9.244  < 2e-16 ***
## CountrySaudi Arabia         -2.788041   0.636470  -4.380 1.96e-05 ***
## CountryUnited Arab Emirates -4.962628   0.988627  -5.020 1.19e-06 ***
## CountryUnited Kingdom       -0.273360   0.956739  -0.286    0.775    
## CountryUnited States        -7.719957   1.576332  -4.897 2.08e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.672 on 189 degrees of freedom
## Multiple R-squared:  0.6373, Adjusted R-squared:  0.6277 
## F-statistic: 66.41 on 5 and 189 DF,  p-value: < 2.2e-16

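$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \varepsilon_i$$

$$\text{LifeExpectancy}_i = 64.60 + 0.060\,\text{Meat}_i - 2.79\,\text{SaudiArabia}_i - 4.96\,\text{UAE}_i - 0.27\,\text{UK}_i - 7.72\,\text{US}_i + \varepsilon_i$$

Each country term is a 0/1 dummy variable. The residual standard error, 2.672, estimates the standard deviation of ε and is not a term added to the equation.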

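Intercept β0 = 64.60: predicted life expectancy (years) for China, the reference level of Country, when meat consumption is 0. It is not a grand mean.
Continuous variable β1 (Meat) = 0.060: change in life expectancy (years) per additional kilogram of meat, holding country fixed.

Categorical variables: each coefficient is the difference in life expectancy from China at the same meat consumption.
β2 Saudi Arabia = -2.79
β3 United Arab Emirates = -4.96
β4 United Kingdom = -0.27
β5 United States = -7.72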
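Country coefficients are always relative to the reference level, so choosing a different reference changes which comparisons appear in the table without changing the fit. A minimal sketch with `relevel()`, on simulated data with invented values:

```r
set.seed(5)
toy <- data.frame(
  Country = factor(rep(c("China", "United Kingdom", "United States"), each = 10)),
  Meat    = runif(30, 30, 120)
)
# Simulated life expectancy (illustrative values only).
toy$Life.Expectancy <- 65 + 0.05 * toy$Meat + rnorm(30, sd = 1)

fit_china <- lm(Life.Expectancy ~ Meat + Country, data = toy)

# Make the United Kingdom the reference level and refit.
toy$Country <- relevel(toy$Country, ref = "United Kingdom")
fit_uk <- lm(Life.Expectancy ~ Meat + Country, data = toy)

names(coef(fit_china))
names(coef(fit_uk))
```

The Meat slope and the fitted values are identical in both fits; only the intercept and the country contrasts are re-expressed.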

R-squared: the model explains about 64% of the variance in life expectancy (multiple R-squared = 0.6373, adjusted R-squared = 0.6277).

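The t value for United Kingdom (-0.286) lies between -1.96 and 1.96 and its p value (0.775) is above 0.05, so we cannot reject the null hypothesis that, at the same meat consumption, life expectancy in the United Kingdom equals that in China.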
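A 95% confidence interval gives the same conclusion in a different form. The sketch below builds the interval for the United Kingdom coefficient from the numbers in the summary output. With the fitted model available, `confint(LEmod)` would report intervals for all coefficients.

```r
# Values copied from the summary(LEmod) output above.
estimate  <- -0.273360   # CountryUnited Kingdom
std_error <- 0.956739
df_resid  <- 189

half_width <- qt(0.975, df = df_resid) * std_error
ci <- c(lower = estimate - half_width, upper = estimate + half_width)
ci

# The interval contains 0, which matches the non-significant t test.
ci[["lower"]] < 0 && ci[["upper"]] > 0
```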

With each additional kilogram of meat consumed, life expectancy increases by about 0.06 years, holding country fixed.

hist( x = residuals(LEmod),
      xlab = "Value of residual",
      main = "",
      breaks = 20)

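The residuals look approximately normally distributed, so the normality assumption behind the t tests is reasonable for this model. This is a check on the inference, not a measure of how well the model predicts.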