library(ggplot2)  # ggplot(), geom_point(), geom_smooth()
library(car)      # residualPlots()
data <- read.csv("data.csv", header = TRUE)
head(data)
## Country Year Obesity Meat GDP Working.Hours Life.Expectancy
## 1 China 1975 0.4 29.0714 1594 1974.898 63.915
## 2 China 1976 0.5 28.7700 1519 1974.207 64.631
## 3 China 1977 0.5 28.9344 1583 1973.435 65.278
## 4 China 1978 0.5 30.8798 1744 1972.727 65.857
## 5 China 1979 0.5 36.5790 1859 1972.104 66.377
## 6 China 1980 0.6 39.9492 1930 1971.497 66.844
str(data)
## 'data.frame': 195 obs. of 7 variables:
## $ Country : chr "China" "China" "China" "China" ...
## $ Year : int 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 ...
## $ Obesity : num 0.4 0.5 0.5 0.5 0.5 0.6 0.6 0.6 0.7 0.7 ...
## $ Meat : num 29.1 28.8 28.9 30.9 36.6 ...
## $ GDP : num 1594 1519 1583 1744 1859 ...
## $ Working.Hours : num 1975 1974 1973 1973 1972 ...
## $ Life.Expectancy: num 63.9 64.6 65.3 65.9 66.4 ...
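The data come from Our World in Data. They cover five countries (China, Saudi Arabia, the United Arab Emirates, the United Kingdom and the United States) and record, for each year from 1975 to 2013, the obesity rate (%), meat consumption (kg), GDP per capita, working hours and life expectancy.
Convert the character column Country to a factor: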
data$Country <- as.factor(data$Country)
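The order of the factor levels matters later, because `lm()` treats the first level as the reference category. The short sketch below, on a made-up character vector (not the real data), shows that `as.factor()` orders levels alphabetically and that `relevel()` can choose a different reference.

```r
# Toy vector for illustration only; the values are not taken from data.csv
country <- c("China", "United States", "Saudi Arabia", "China")

f <- as.factor(country)
print(levels(f))   # levels are sorted; the first one is lm()'s reference

# Pick another reference level if a different baseline is wanted
f2 <- relevel(f, ref = "United States")
print(levels(f2))
```

With the real data, `levels(data$Country)` shows which country the model coefficients are measured against.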
boxplot(Obesity ~ Country, data = data)
ggplot(data, aes(x = Meat, y = Obesity, color = Country)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'
The chart shows that each country has its own relationship between meat consumption and obesity rate. China, Saudi Arabia, the USA and the UK show a positive correlation between the two, and in the USA and the UK meat consumption is a strong indicator of the obesity rate. In the UAE, surprisingly, the correlation is negative. The cause is unclear, since Saudi Arabia, which has a similar cuisine, culture and religion, shows the opposite pattern.
fatmod <- lm(Obesity ~ Meat + Country, data = data)
summary(fatmod)
##
## Call:
## lm(formula = Obesity ~ Meat + Country, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.0811 -3.6896 -0.5381 3.2416 13.1815
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.10949 1.38066 -3.701 0.000282 ***
## Meat 0.07704 0.01219 6.322 1.82e-09 ***
## CountrySaudi Arabia 15.85829 1.19668 13.252 < 2e-16 ***
## CountryUnited Arab Emirates 6.15245 1.85880 3.310 0.001118 **
## CountryUnited Kingdom 6.53145 1.79885 3.631 0.000364 ***
## CountryUnited States 3.02664 2.96380 1.021 0.308464
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.023 on 189 degrees of freedom
## Multiple R-squared: 0.6986, Adjusted R-squared: 0.6906
## F-statistic: 87.61 on 5 and 189 DF, p-value: < 2.2e-16
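$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \varepsilon_i$$
$$\widehat{\text{Obesity}} = -5.11 + 0.077\,\text{Meat} + 15.86\,D_{\text{Saudi Arabia}} + 6.15\,D_{\text{UAE}} + 6.53\,D_{\text{UK}} + 3.03\,D_{\text{USA}}$$
Each $D$ is a 0/1 indicator for the named country, with China as the reference level. The residual standard error, 5.023, estimates the standard deviation of $\varepsilon_i$; it is not a term added to the prediction.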
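Intercept β0 = -5.11: the predicted obesity rate for the reference country (China) at Meat = 0, not the grand mean of Obesity
Interpretation for continuous variables: β1 (Meat) = 0.077, the change in obesity rate, in percentage points, per additional kilogram of meat, with country held fixed
Interpretation for categorical variables: each coefficient is the difference in predicted obesity rate from China at the same meat consumption
β2 Saudi Arabia = 15.86
β3 United Arab Emirates = 6.15
β4 United Kingdom = 6.53
β5 United States = 3.03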
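R-squared is 0.6986, so the model explains about 70% of the variance in obesity rate.
The t value for the United States (1.021) is below 1.96 in absolute value and its p-value (0.308) is above 0.05, so we cannot reject the null hypothesis that the United States and China have the same obesity rate at a given level of meat consumption.
With each additional kilogram of meat consumed, the predicted obesity rate rises by about 0.077 percentage points, holding country fixed.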
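To make the coefficients concrete, fitted values can be assembled by hand from the printed estimates. The sketch below does this for a hypothetical meat consumption of 100 kg (a value chosen only for illustration), comparing China, the reference level, with Saudi Arabia.

```r
# Coefficients copied from the summary(fatmod) output
b0      <- -5.10949   # intercept: China (reference level) at Meat = 0
b_meat  <-  0.07704   # slope shared by all countries
b_saudi <- 15.85829   # shift for Saudi Arabia relative to China

meat <- 100           # hypothetical consumption in kg, for illustration only

pred_china <- b0 + b_meat * meat
pred_saudi <- b0 + b_meat * meat + b_saudi

# The gap between the two predictions equals the country coefficient
print(c(China = pred_china, SaudiArabia = pred_saudi))
```

With the fitted model available, `predict(fatmod, newdata = ...)` would return the same kind of values without copying coefficients by hand.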
hist( x = residuals(fatmod),
xlab = "Value of residual",
main = "",
breaks = 20)
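The residuals look approximately normally distributed, which is consistent with the normality assumption behind the t and F tests. On its own this does not show that the model predicts well.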
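A histogram is a coarse check of normality. As a hedged sketch of two common complements, the block below draws a normal Q-Q plot and runs a Shapiro-Wilk test on simulated data. The data frame `sim` and its columns are invented for illustration; in the real analysis one would pass `residuals(fatmod)` instead.

```r
set.seed(1)  # make the simulated example reproducible

# Simulated stand-in for the real data (assumption: not the author's data)
sim <- data.frame(x = runif(100, 0, 10))
sim$y <- 2 + 0.5 * sim$x + rnorm(100, sd = 1)

fit <- lm(y ~ x, data = sim)
res <- residuals(fit)

# Points close to the reference line suggest approximately normal residuals
qqnorm(res)
qqline(res)

# Shapiro-Wilk test: a small p-value is evidence against normality
sw <- shapiro.test(res)
print(sw$p.value)
```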
residualPlots(fatmod)
## Test stat Pr(>|Test stat|)
## Meat 2.7901 0.005813 **
## Country
## Tukey test 4.9244 8.461e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
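The fitted line in the residual plot is not horizontal, and both the curvature test for Meat (p = 0.0058) and the Tukey test (p = 8.5e-07) are significant. The linear, additive model therefore does not describe the relationship between meat consumption and obesity rate well.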
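One way to let each country have its own slope, as the scatter plot suggests, is to add a Meat-by-Country interaction. The following is a minimal sketch on simulated data with two groups whose slopes have opposite signs. The data frame `sim` and the group names `A` and `B` are invented for illustration; this is not a re-analysis of the data above.

```r
set.seed(42)

n <- 40
sim <- data.frame(
  x     = rep(seq(0, 10, length.out = n), times = 2),
  group = factor(rep(c("A", "B"), each = n))
)
# Group A has slope +1, group B has slope -1
slope <- ifelse(sim$group == "A", 1, -1)
sim$y <- 5 + slope * sim$x + rnorm(2 * n, sd = 1)

fit_add <- lm(y ~ x + group, data = sim)  # one common slope
fit_int <- lm(y ~ x * group, data = sim)  # a separate slope per group

# F-test of whether the extra slope term improves the fit
cmp <- anova(fit_add, fit_int)
print(cmp)
```

On the real data the analogous call would be `lm(Obesity ~ Meat * Country, data = data)`, compared against `fatmod` in the same way.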
ggplot(data, aes(x = GDP, y = Meat, color = Country)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
In Saudi Arabia, the USA, the UK and China, meat consumption rises as GDP per capita grows. The UAE does not follow this pattern.
GDPmod <- lm(Meat ~ GDP + Country, data = data)
summary(GDPmod)
##
## Call:
## lm(formula = Meat ~ GDP + Country, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -75.843 -13.076 1.253 15.801 85.292
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.904e+01 4.813e+00 18.502 < 2e-16 ***
## GDP 6.650e-04 2.218e-04 2.998 0.00309 **
## CountrySaudi Arabia 1.723e+01 7.990e+00 2.157 0.03228 *
## CountryUnited Arab Emirates 9.517e+01 1.104e+01 8.619 2.81e-15 ***
## CountryUnited Kingdom 9.879e+01 8.439e+00 11.706 < 2e-16 ***
## CountryUnited States 2.013e+02 1.023e+01 19.669 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 29.38 on 187 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.8828, Adjusted R-squared: 0.8797
## F-statistic: 281.7 on 5 and 187 DF, p-value: < 2.2e-16
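$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \varepsilon_i$$
$$\widehat{\text{Meat}} = 89.04 + 0.000665\,\text{GDP} + 17.23\,D_{\text{Saudi Arabia}} + 95.17\,D_{\text{UAE}} + 98.79\,D_{\text{UK}} + 201.3\,D_{\text{USA}}$$
Each $D$ is a 0/1 indicator for the named country, with China as the reference level. The residual standard error, 29.38, estimates the standard deviation of $\varepsilon_i$; it is not a term added to the prediction.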
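Intercept β0 = 89.04: the predicted meat consumption for the reference country (China) at GDP = 0, not the grand mean of meat consumption
Interpretation for continuous variables: β1 (GDP) = 6.650e-04, the change in meat consumption (kg) per additional unit of GDP per capita, with country held fixed
Interpretation for categorical variables: each coefficient is the difference in predicted meat consumption from China at the same GDP per capita
β2 Saudi Arabia = 17.23
β3 United Arab Emirates = 95.17
β4 United Kingdom = 98.79
β5 United States = 201.3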
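R-squared is 0.8828, so the model explains about 88% of the variance in meat consumption.
For each additional unit of GDP per capita, predicted meat consumption rises by 6.650e-04 kg, or about 0.67 kg per 1,000 units, holding country fixed.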
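The adjusted R-squared in the output can be reproduced from the other printed values, which is a quick check on how the two figures relate. The sketch below uses the rounded numbers from the summary: 193 observations remain after two were dropped, and the residual degrees of freedom are 187.

```r
r2     <- 0.8828   # Multiple R-squared from summary(GDPmod)
n      <- 195 - 2  # observations used after dropping 2 with missing values
df_res <- 187      # residual degrees of freedom from the output

# Adjusted R-squared penalises the number of estimated coefficients
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res
print(round(adj_r2, 4))
```

The result agrees with the 0.8797 reported by `summary()`, up to the rounding of the inputs.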
hist( x = residuals(GDPmod),
xlab = "Value of residual",
main = "",
breaks = 20)
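The residuals look approximately normally distributed, which is consistent with the normality assumption behind the t and F tests. On its own this does not show that the model predicts well.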
ggplot(data, aes(x = Meat, y = Working.Hours, color = Country)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 78 rows containing non-finite values (stat_smooth).
## Warning: Removed 78 rows containing missing values (geom_point).
In China and the USA, higher meat consumption is associated with longer working hours. The UK shows the opposite: meat consumption is lower when working hours are longer. This is an association only; the plot does not show that one causes the other.
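The warnings above report 78 rows removed, so it is worth checking where the missing working hours sit before fitting a model. The sketch below counts missing values per country on a toy data frame (the values in `toy` are made up for illustration and are not taken from data.csv).

```r
# Toy data with missing working hours; values are illustrative only
toy <- data.frame(
  Country = rep(c("China", "Saudi Arabia", "United Kingdom"), each = 3),
  Working.Hours = c(1975, 1974, 1973, NA, NA, NA, 1900, 1890, NA)
)

# Number of missing Working.Hours values in each country
na_by_country <- tapply(is.na(toy$Working.Hours), toy$Country, sum)
print(na_by_country)

# Rows that a model using Working.Hours would actually keep
complete <- toy[!is.na(toy$Working.Hours), ]
print(nrow(complete))
```

Applied to the real data, the same `tapply()` call on `data$Working.Hours` and `data$Country` would show which countries contribute no rows to the working-hours model.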
WHmod <- lm(Working.Hours ~ Meat + Country, data = data)
summary(WHmod)
##
## Call:
## lm(formula = Working.Hours ~ Meat + Country, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -101.018 -36.678 3.317 33.528 120.767
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1943.2679 16.7641 115.919 < 2e-16 ***
## Meat 1.1204 0.1595 7.024 1.72e-10 ***
## CountryUnited Kingdom -454.0813 21.5234 -21.097 < 2e-16 ***
## CountryUnited States -503.9420 37.6036 -13.401 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 50.44 on 113 degrees of freedom
## (78 observations deleted due to missingness)
## Multiple R-squared: 0.8932, Adjusted R-squared: 0.8904
## F-statistic: 315 on 3 and 113 DF, p-value: < 2.2e-16
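$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \varepsilon_i$$
$$\widehat{\text{Working.Hours}} = 1943.27 + 1.12\,\text{Meat} - 454.08\,D_{\text{UK}} - 503.94\,D_{\text{USA}}$$
Each $D$ is a 0/1 indicator for the named country, with China as the reference level. The residual standard error, 50.44, estimates the standard deviation of $\varepsilon_i$; it is not a term added to the prediction.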
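Intercept β0 = 1943.27: the predicted working hours for the reference country (China) at Meat = 0, not a grand mean
Interpretation for continuous variables: β1 (Meat) = 1.12, the change in working hours per additional kilogram of meat, with country held fixed
Interpretation for categorical variables: each coefficient is the difference in predicted working hours from China at the same meat consumption. Only the UK and the USA appear because the 78 rows with missing working hours were dropped, which left no rows for Saudi Arabia or the UAE
β2 United Kingdom = -454.08
β3 United States = -503.94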
R-squared is 0.8932, so the model explains about 89% of the variance in working hours.
With each additional kilogram of meat consumed, predicted working hours rise by about 1.12 hours, holding country fixed.
hist( x = residuals(WHmod),
xlab = "Value of residual",
main = "",
breaks = 20)
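The residuals look approximately normally distributed, which is consistent with the normality assumption behind the t and F tests. On its own this does not show that the model predicts well.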
ggplot(data, aes(x = Meat, y = Life.Expectancy, color = Country)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'
In the UAE, life expectancy appears to fall as meat consumption rises. In the other four countries, life expectancy is higher when meat consumption is higher. These are associations, not evidence that eating meat changes life expectancy.
LEmod <- lm(Life.Expectancy ~ Meat + Country, data = data)
summary(LEmod)
##
## Call:
## lm(formula = Life.Expectancy ~ Meat + Country, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.0858 -1.0242 0.0432 1.1115 7.8066
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 64.600466 0.734322 87.973 < 2e-16 ***
## Meat 0.059912 0.006481 9.244 < 2e-16 ***
## CountrySaudi Arabia -2.788041 0.636470 -4.380 1.96e-05 ***
## CountryUnited Arab Emirates -4.962628 0.988627 -5.020 1.19e-06 ***
## CountryUnited Kingdom -0.273360 0.956739 -0.286 0.775
## CountryUnited States -7.719957 1.576332 -4.897 2.08e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.672 on 189 degrees of freedom
## Multiple R-squared: 0.6373, Adjusted R-squared: 0.6277
## F-statistic: 66.41 on 5 and 189 DF, p-value: < 2.2e-16
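$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \varepsilon_i$$
$$\widehat{\text{Life.Expectancy}} = 64.60 + 0.060\,\text{Meat} - 2.79\,D_{\text{Saudi Arabia}} - 4.96\,D_{\text{UAE}} - 0.27\,D_{\text{UK}} - 7.72\,D_{\text{USA}}$$
Each $D$ is a 0/1 indicator for the named country, with China as the reference level. The residual standard error, 2.672, estimates the standard deviation of $\varepsilon_i$; it is not a term added to the prediction.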
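Intercept β0 = 64.60: the predicted life expectancy for the reference country (China) at Meat = 0, not a grand mean
Interpretation for continuous variables: β1 (Meat) = 0.060, the change in life expectancy (years) per additional kilogram of meat, with country held fixed
Interpretation for categorical variables: each coefficient is the difference in predicted life expectancy from China at the same meat consumption
β2 Saudi Arabia = -2.79
β3 United Arab Emirates = -4.96
β4 United Kingdom = -0.27
β5 United States = -7.72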
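R-squared is 0.6373, so the model explains about 64% of the variance in life expectancy.
The t value for the United Kingdom (-0.286) is smaller than 1.96 in absolute value and its p-value (0.775) is above 0.05, so we cannot reject the null hypothesis that the United Kingdom and China have the same life expectancy at a given level of meat consumption.
With each additional kilogram of meat consumed, predicted life expectancy rises by about 0.06 years, holding country fixed.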
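The t value and p-value in the coefficient table follow directly from the estimate and its standard error. As a small worked check, the sketch below recomputes them for the United Kingdom coefficient from the printed numbers, using the two-sided t distribution with the model's 189 residual degrees of freedom.

```r
est <- -0.273360   # CountryUnited Kingdom estimate from summary(LEmod)
se  <-  0.956739   # its standard error
df  <- 189         # residual degrees of freedom

t_value <- est / se
p_value <- 2 * pt(-abs(t_value), df = df)  # two-sided p-value

print(round(c(t = t_value, p = p_value), 3))
```

Comparing |t| with 1.96 is the large-sample shortcut for the same test at the 5% level; the p-value from `pt()` is the exact version for this model.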
hist( x = residuals(LEmod),
xlab = "Value of residual",
main = "",
breaks = 20)
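The residuals look approximately normally distributed, which is consistent with the normality assumption behind the t and F tests. On its own this does not show that the model predicts well.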