We used a WHO table of average BMI and average life expectancy for 177 countries.
Dependent (predicted) variable Y: life expectancy, the average number of years a person in a given country can expect to live.
Independent (explanatory) variable X: BMI (Body Mass Index), calculated from a person's height and weight as \(BMI = \frac{w_{kg}}{h_m^2}\), where \(w_{kg}\) is the weight in kilograms and \(h_m\) is the height in meters. A BMI of 25.0 or more is considered overweight, while the healthy range is 18.5 to 24.9. In this data set, X is the average BMI of a country's population.
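The formula and cut-offs above can be expressed as a small R helper. This is a minimal sketch: the function names `bmi()` and `bmi_category()` are hypothetical, and the label "underweight" for values below 18.5 follows the standard WHO cut-off rather than anything stated in this text.

```r
# Hypothetical helper: BMI from weight in kilograms and height in meters
bmi <- function(weight_kg, height_m) {
  weight_kg / height_m^2
}

# Hypothetical helper: label BMI values with the cut-offs quoted above.
# right = FALSE gives intervals [18.5, 25) and [25, Inf),
# so 24.9 is "healthy" and 25.0 is "overweight".
bmi_category <- function(bmi_value) {
  as.character(cut(bmi_value,
                   breaks = c(-Inf, 18.5, 25, Inf),
                   labels = c("underweight", "healthy", "overweight"),
                   right = FALSE))
}

example_bmi <- bmi(70, 1.75)   # 70 / 1.75^2, about 22.86
bmi_category(c(17, example_bmi, 25, 27.3))
```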
We hypothesize that a country's average BMI is an explanatory variable for its average life expectancy.
We noticed observations with BMI between 0 and 10, which are physically impossible. We treated them as noise and omitted every observation with BMI smaller than 10.
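A minimal sketch of this cleaning step in R, assuming the data are loaded into a data frame called `df` with columns `BMI` and `Life.expectancy`; the four rows below are made up for illustration.

```r
# Illustrative rows only, not the WHO data
df <- data.frame(
  Country = c("A", "B", "C", "D"),
  BMI = c(25.1, 3.2, 22.4, 0.9),
  Life.expectancy = c(75.0, 70.5, 68.2, 61.0)
)

# Keep only physically plausible average BMI values (10 or more);
# subset() also drops rows where BMI is NA
df_clean <- subset(df, BMI >= 10)
nrow(df_clean)
```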
# Histogram of average BMI on the density scale, with a kernel density curve
hist(BMI, main = "BMI", breaks = 50, freq = FALSE, col = "lightblue")
lines(density(BMI))
The distribution appears bimodal, with peaks around 23 and 27. There are no values below 20 or above 32. This is plausible for country averages, since values outside this range would mean extreme underweight or extreme obesity across an entire population.
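One way to make the "two peaks" reading less visual is to locate the local maxima of the same kernel density estimate that `lines(density(BMI))` draws. The sketch below runs on simulated data with groups centred near 23 and 27, not on the WHO values, so it only illustrates the technique.

```r
# Simulated, bimodal "BMI" values (not the WHO data)
set.seed(42)
bmi_sim <- c(rnorm(200, mean = 23, sd = 0.8), rnorm(200, mean = 27, sd = 0.8))

d <- density(bmi_sim)
# A local maximum is where the slope of the density changes from + to -
peak_idx <- which(diff(sign(diff(d$y))) == -2) + 1
# Keep the two highest peaks and sort them by position
top2 <- peak_idx[order(d$y[peak_idx], decreasing = TRUE)][1:2]
modes <- sort(d$x[top2])
round(modes, 1)
```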
# Histogram of life expectancy on the density scale, with a kernel density curve
hist(Life.expectancy, main = "Life Expectancy", breaks = 50, freq = FALSE, col = "lightblue")
lines(density(Life.expectancy))
The life-expectancy distribution has a long left tail, a center around 72 (mean 71.82, median 74), and a maximum of 88 years.
Some observations show an average life expectancy below 55 years. We expect these to be countries with a low average BMI, mostly developing countries; we test this assumption later in the work.
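A sketch of how this assumption could be checked directly: compare the mean BMI of countries with life expectancy below 55 against that of the remaining countries. The data frame and its six rows are illustrative only.

```r
# Illustrative rows only, not the WHO data
df <- data.frame(
  BMI = c(21.0, 21.8, 22.5, 26.4, 27.3, 25.9),
  Life.expectancy = c(52.0, 54.5, 60.1, 74.0, 82.2, 76.2)
)

low_le <- df$Life.expectancy < 55
# Mean BMI per group: "FALSE" = 55 or above, "TRUE" = below 55
mean_bmi_by_group <- tapply(df$BMI, low_le, mean)
mean_bmi_by_group
```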
summary(Life.expectancy)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 51.00 66.10 74.00 71.82 77.00 88.00
summary(BMI)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20.40 23.70 26.20 25.58 27.20 31.80
\(SD_X = 2.28\) and \(SD_Y = 8.08\)
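# Simple linear regression of life expectancy on average BMI
reg.bmi.le <- lm(Life.expectancy ~ BMI)
plot(BMI, Life.expectancy, main = "Average BMI vs. Average Life Expectancy")
# Overlay the fitted regression line
abline(reg.bmi.le, lwd = 2, col = "green")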
\(\hat{a} = 2.135\) is the slope of the regression line: the estimated change, in years, in a country's average life expectancy when its average BMI increases by one unit.
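The slope can be cross-checked against the summary statistics, because in simple linear regression \(\hat{a} = r \cdot SD_Y / SD_X\) (equivalently \(cov(X,Y)/var(X)\)). The sketch below plugs in the rounded values from this section and then confirms the identity on simulated data.

```r
# Rounded values reported in this section
r    <- 0.603  # Pearson correlation
sd_x <- 2.28   # SD of average BMI
sd_y <- 8.08   # SD of life expectancy (years)

a_hat <- r * sd_y / sd_x
round(a_hat, 3)   # close to the fitted 2.135; the gap comes from rounding

# Same identity on simulated data: lm() slope equals cov(x, y) / var(x)
set.seed(1)
x <- rnorm(50, mean = 25.6, sd = 2.3)
y <- 17 + 2.1 * x + rnorm(50, sd = 6)
slope_lm <- unname(coef(lm(y ~ x))["x"])
slope_formula <- cov(x, y) / var(x)
all.equal(slope_lm, slope_formula)
```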
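\(\hat{b} = 17.19\) is the intercept of the regression line, so the fitted model is \(\hat{Y} = \hat{b} + \hat{a} \cdot BMI = 17.19 + 2.135 \cdot BMI\). Formally, the intercept is the predicted life expectancy at BMI = 0: a base of 17.19 years, to which the model adds \(\hat{a}\) years for every BMI unit. Since BMI = 0 lies far outside the observed range (20.4 to 31.8), the intercept should not be read as a real-world life expectancy; it only positions the line.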
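Likewise, the least-squares intercept satisfies \(\hat{b} = \bar{Y} - \hat{a}\bar{X}\), so the line passes through the point of means. This sketch recomputes it from the reported means and shows one prediction inside the observed BMI range; the helper `predict_le()` is hypothetical.

```r
# Rounded summary statistics reported in this section
mean_x <- 25.58   # mean average BMI
mean_y <- 71.82   # mean life expectancy (years)
a_hat  <- 2.135
b_hat  <- mean_y - a_hat * mean_x   # close to the reported 17.19

# Hypothetical helper: predicted life expectancy from the reported coefficients
predict_le <- function(bmi, a = 2.135, b = 17.19) b + a * bmi
predict_le(25)   # 70.565 years
```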
Pearson correlation coefficient: \(r(X,Y) := \frac{cov(X,Y)}{SD_X SD_Y} = 0.603\). The correlation is positive and moderately strong: countries with a higher average BMI tend to have a higher average life expectancy.
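A minimal sketch of the definition above on simulated data (not the WHO table), confirming that the manual formula matches R's `cor()`, whose default method is Pearson.

```r
# Simulated data with a positive linear relationship
set.seed(7)
x <- rnorm(100, mean = 25.6, sd = 2.3)
y <- 17 + 2.1 * x + rnorm(100, sd = 6.5)

r_manual  <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y)   # method = "pearson" by default
all.equal(r_manual, r_builtin)
```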
R-squared: \(R^2 := r^2(X,Y) = 0.364\)
Our model, based on average BMI, explains about 36.4% of the variance in a country's average life expectancy.
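For a single predictor, \(R^2\) is exactly the squared correlation. The sketch checks this with the reported \(r\) and, on simulated data, against `summary(lm(...))$r.squared`.

```r
# Reported correlation
r <- 0.603
r^2   # about 0.364

# Same identity checked against summary.lm() on simulated data
set.seed(7)
x <- rnorm(100, mean = 25.6, sd = 2.3)
y <- 17 + 2.1 * x + rnorm(100, sd = 6.5)
fit <- lm(y ~ x)
all.equal(summary(fit)$r.squared, cor(x, y)^2)
```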
Note: the relationship is not necessarily linear. A linear model with a positive slope implies that life expectancy keeps rising as BMI rises. In practice, a BMI of 25 or more indicates overweight (30 or more, obesity), which tends to shorten life rather than lengthen it. A parabolic (quadratic) model may therefore be more appropriate.
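The parabolic alternative can be fitted with `lm()` by adding a squared term. The sketch below uses simulated data in which life expectancy peaks at a moderate BMI; that shape is an assumption made for illustration, not a finding from the WHO data.

```r
# Simulated data: life expectancy peaks around BMI 26 (assumed shape)
set.seed(3)
bmi <- runif(150, min = 20, max = 32)
le  <- 80 - 0.6 * (bmi - 26)^2 + rnorm(150, sd = 3)

linear    <- lm(le ~ bmi)
quadratic <- lm(le ~ bmi + I(bmi^2))   # I() makes ^ arithmetic, not formula syntax

# A negative coefficient on BMI^2 means the curve opens downward
coef(quadratic)["I(bmi^2)"]
# F-test: does the squared term improve on the straight line?
anova(linear, quadratic)
```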
\(R^2\) is below 0.5, which is usually considered low. However, life expectancy is affected by countless other factors over a person's lifetime, so for a single explanatory variable, BMI explains a considerable share of the variation.
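\(RMSE = 6.46\) years of life expectancy. This is smaller than the spread of life expectancy itself (\(SD_Y = 8.08\) years), so the model predicts better than simply using the mean, but a typical error of about 6.5 years means its accuracy is moderate rather than high.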
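A sketch of the RMSE computation. The text does not say whether its RMSE divides the sum of squared residuals by \(n\) or by \(n-2\); the helper `rmse()` below (a hypothetical name) uses \(n\). With that convention, \(RMSE = SD_Y\sqrt{1-R^2}\) exactly when \(SD_Y\) also uses the divisor \(n\), which gives a quick consistency check on the reported numbers.

```r
# Hypothetical helper: root mean squared residual, divisor n
rmse <- function(model) sqrt(mean(residuals(model)^2))

# Approximate check with the reported values (their SD_Y likely uses n - 1)
sd_y <- 8.08
r2   <- 0.364
sd_y * sqrt(1 - r2)   # about 6.44, close to the reported 6.46

# Exact identity on simulated data when both use divisor n
set.seed(7)
x <- rnorm(100, mean = 25.6, sd = 2.3)
y <- 17 + 2.1 * x + rnorm(100, sd = 6.5)
fit <- lm(y ~ x)
sd_y_n <- sqrt(mean((y - mean(y))^2))
all.equal(rmse(fit), sd_y_n * sqrt(1 - summary(fit)$r.squared))
```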
Out of the 177 observations, we removed the 12 with the most extreme residuals. As a result, RMSE decreased from 6.46 to 5.43 years.
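A sketch of the outlier step on simulated data with 177 rows. It assumes the model is refitted after dropping the 12 observations with the largest absolute residuals; the text does not state whether a refit was done.

```r
# Simulated data with the same number of rows (not the WHO values)
set.seed(11)
n   <- 177
bmi <- rnorm(n, mean = 25.6, sd = 2.3)
le  <- 17 + 2.1 * bmi + rnorm(n, sd = 6.5)
dat <- data.frame(BMI = bmi, Life.expectancy = le)

fit_all <- lm(Life.expectancy ~ BMI, data = dat)
rmse <- function(model) sqrt(mean(residuals(model)^2))

# Row positions of the 12 observations with the largest |residual|
extreme  <- order(abs(residuals(fit_all)), decreasing = TRUE)[1:12]
dat_trim <- dat[-extreme, ]
fit_trim <- lm(Life.expectancy ~ BMI, data = dat_trim)   # assumed refit

c(rmse_before = rmse(fit_all), rmse_after = rmse(fit_trim))
c(r2_before = summary(fit_all)$r.squared, r2_after = summary(fit_trim)$r.squared)
```

Because the removed points are by construction the worst-fitting ones, and least squares on the remaining points can only lower the error further, RMSE is guaranteed to fall after this step; part of any such drop reflects the selection itself.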
\(R^2_{new} := r_{new}^2(X,Y) = 0.45\)
After removing these observations, the BMI-based model explains about 45% of the variance in average life expectancy.
Row | Country | BMI | Life.expectancy | a_hat | b_hat | Y_hat_LEYears | Residual |
---|---|---|---|---|---|---|---|
32 | Canada | 27.3 | 82.2 | 2.16905 | 16.73614 | 75.95 | 6.25 |
49 | Ecuador | 27.0 | 76.2 | 2.16905 | 16.73614 | 75.30 | 0.90 |
69 | Guyana | 26.1 | 66.2 | 2.16905 | 16.73614 | 73.35 | -7.15 |
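The fitted values and residuals in the table follow from its own `a_hat` and `b_hat` columns: \(\hat{Y} = \hat{b} + \hat{a} \cdot BMI\) and residual \(= Y - \hat{Y}\). These coefficients (2.16905 and 16.73614) differ slightly from the 2.135 and 17.19 reported earlier; the text does not say which fit produced them. The sketch below reproduces the three rows.

```r
# Coefficients as listed in the table
a_hat <- 2.16905
b_hat <- 16.73614

tab <- data.frame(
  Country = c("Canada", "Ecuador", "Guyana"),
  BMI = c(27.3, 27.0, 26.1),
  Life.expectancy = c(82.2, 76.2, 66.2)
)
# Fitted life expectancy and residual (observed minus fitted), rounded to 2 dp
tab$Y_hat_LEYears <- round(b_hat + a_hat * tab$BMI, 2)
tab$Residual <- round(tab$Life.expectancy - (b_hat + a_hat * tab$BMI), 2)
tab
```

Guyana's residual comes out negative because its observed life expectancy lies below the fitted line.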