We used the WHO table of average BMI and life expectancy for every country in the world (approximately 193 countries, down to 164 after removing errors in the data).
Source of the data.
Predicted (dependent) variable Y: life expectancy in years, i.e. the average number of years a person can expect to live in a given country.
Independent variable X: BMI (Body Mass Index), computed from a person’s height and weight. The formula is \(BMI = \frac{kg}{h_m^2}\), where \(kg\) is a person’s weight in kilograms and \(h_m\) is their height in meters. A BMI of 25.0 or more is considered overweight, while the healthy range is 18.5 to 24.9.
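As a small illustration of the formula and the cut-offs quoted above, the following sketch computes a BMI and classifies it. The function names and the "underweight" label for values below 18.5 are assumptions made for this example; the example person is hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Classify a BMI value using the cut-offs quoted in the text.

    The label "underweight" for values below 18.5 is an assumption of this sketch.
    """
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "healthy"
    return "overweight"


# Hypothetical person: 70 kg, 1.75 m tall.
example = bmi(70.0, 1.75)
print(round(example, 1), bmi_category(example))
```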
We hypothesize that BMI is an explanatory variable for life expectancy, because extreme obesity or thinness harms a person’s health and increases their risk of dying.
We noticed an anomaly in the data: there are observations with BMI between 0 and 10, which are impossible BMI values.
We removed this noise from the data: observations where BMI is smaller than 10 are omitted.
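The filtering step can be sketched as follows. The rows below are hypothetical placeholders in the shape (country, average BMI, life expectancy), not values from the WHO table, and the function name is an assumption of this example.

```python
# Hypothetical rows: (country, average BMI, life expectancy in years).
# Names and values are placeholders, not taken from the WHO table.
rows = [
    ("A", 2.5, 70.1),
    ("B", 24.3, 66.7),
    ("C", 9.9, 58.0),
    ("D", 53.3, 73.9),
]

MIN_PLAUSIBLE_BMI = 10.0  # threshold used in the text


def drop_implausible_bmi(records, threshold=MIN_PLAUSIBLE_BMI):
    """Keep only records whose BMI is at least `threshold`."""
    return [r for r in records if r[1] >= threshold]


clean = drop_implausible_bmi(rows)
print(len(rows), "->", len(clean))
```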
The resulting distribution is bimodal: two central values can be seen.
The life expectancy distribution has a long left tail and is centered
around 72; the highest life expectancy is 86 years.
Summary of life expectancy (years):
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 51.00 65.70 73.90 71.54 77.00 86.00
Summary of BMI:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 17.40 27.30 53.30 46.38 62.10 77.60
\(SD_x =\) 17.89, and \(SD_y =\) 8.2
\(\hat{a} =\) 0.32 is the slope of
the regression line: the expected change in life expectancy (in years)
when BMI increases by one unit.
\(\hat{b} =\) 56.68 is the intercept
of the regression line. It is the part of life expectancy that does
not depend on BMI. Simply put: a person lives 56.68 years to begin with,
and then, depending on their BMI, lives another \(\hat{a} \cdot BMI\) years.
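For simple linear regression the coefficients follow from the summary statistics: \(\hat{a} = r \cdot SD_y / SD_x\) and \(\hat{b} = \bar{y} - \hat{a}\bar{x}\). The sketch below plugs in the rounded values reported in this section (the correlation 0.699, the standard deviations, and the two means); the variable and function names are assumptions of this example.

```python
# Summary statistics as reported in this section (rounded values).
r = 0.699       # Pearson correlation between BMI and life expectancy
sd_x = 17.89    # standard deviation of BMI
sd_y = 8.2      # standard deviation of life expectancy (years)
mean_x = 46.38  # mean BMI
mean_y = 71.54  # mean life expectancy (years)

slope = r * sd_y / sd_x              # a_hat
intercept = mean_y - slope * mean_x  # b_hat


def predict(bmi_value: float) -> float:
    """Predicted life expectancy (years) for a given average BMI."""
    return intercept + slope * bmi_value


print(round(slope, 2), round(intercept, 2))
```

With these rounded inputs the computation reproduces the reported 0.32 and 56.68, and the fitted line passes through the point of means.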
Pearson Correlation Coefficient: \(r(X,Y):= \frac{cov(X,Y)}{SD_X SD_Y} =\) 0.699. The correlation between the variables is positive and high. It suggests that the higher the BMI, the higher the life expectancy. R Square Indicator: \(R^2:= r^2(X,Y)=\) 0.488.
This suggests that our model, based on average BMI, explains about 49% of the variance in average life expectancy across countries.
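The definition of \(r\) above translates directly into code. This is a minimal sketch using only the standard library; the function name and the toy data are hypothetical and serve only to exercise the formula.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: cov(X, Y) / (SD_X * SD_Y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/(n-1) factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data, only to exercise the function.
xs = [20.0, 25.0, 40.0, 55.0, 60.0]
ys = [60.0, 64.0, 70.0, 73.0, 78.0]
r = pearson_r(xs, ys)
print(round(r, 3), round(r ** 2, 3))  # r and R^2 for the toy data
```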
Note: The interpretation of the coefficients above suggests that the true relationship is not necessarily linear. In a linear model with a positive slope, life expectancy always increases as BMI increases. In practice, a BMI above 25 indicates overweight and is expected to shorten life expectancy. A parabolic model may therefore be more appropriate: up to a certain point an increase in BMI increases life expectancy, and beyond that point a further increase in BMI shortens it.
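One way to explore the parabolic idea is a least-squares fit of \(y = c_0 + c_1 x + c_2 x^2\), whose turning point lies at \(-c_1 / (2 c_2)\). The sketch below is not the model fitted in this report; it solves the normal equations in plain Python and is checked on hypothetical data generated from a known parabola. All names are assumptions of this example.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x^2 via the normal equations."""
    # Power sums S[k] = sum(x^k) and moment sums T[k] = sum(y * x^k).
    S = [sum(x ** k for x in xs) for k in range(5)]
    T = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x3 system: row i is [S[i], S[i+1], S[i+2] | T[i]].
    A = [[S[i + j] for j in range(3)] + [T[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(col + 1, 3):
            factor = A[row][col] / A[col][col]
            for k in range(col, 4):
                A[row][k] -= factor * A[col][k]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        known = sum(A[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (A[i][3] - known) / A[i][i]
    return coef  # [c0, c1, c2]


def turning_point(coef):
    """x at which the fitted parabola peaks (or bottoms out): -c1 / (2*c2)."""
    return -coef[1] / (2 * coef[2])


# Hypothetical data generated from an inverted parabola peaking at BMI 24,
# used only to check the fit; these are not values from the WHO table.
xs = [16.0, 20.0, 24.0, 28.0, 32.0, 36.0]
ys = [80.0 - 0.1 * (x - 24.0) ** 2 for x in xs]
c0, c1, c2 = fit_quadratic(xs, ys)
print(round(turning_point([c0, c1, c2]), 2), c2 < 0)
```

A negative \(c_2\) on real data would support the claim that life expectancy rises and then falls with BMI. For real data with large BMI values, centering \(x\) before fitting would improve numerical stability.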
We showed that \(R^2\) is fairly high. Let’s examine the fit a bit more.
\(RMSE=\) 5.88 years of life expectancy. The spread of the residuals is small relative to \(SD_y =\) 8.2, suggesting reasonable accuracy for our model.
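As a consistency check, in simple linear regression the residual spread is tied to \(R^2\): the residual sum of squares equals \((1 - R^2)\) times the total sum of squares of \(Y\). Using the rounded values reported in this section, and ignoring the small difference between dividing by \(n\), \(n-1\) or \(n-2\):

\[
RMSE \approx SD_y \sqrt{1 - R^2} = 8.2 \cdot \sqrt{1 - 0.488} \approx 8.2 \cdot 0.716 \approx 5.87
\]

This agrees with the reported 5.88 up to rounding.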
Out of the 165 observations, we removed the 10 values with the most extreme residuals under the first model, and refit the model to see whether the coefficients change.
After omitting the observations with the largest residuals, the RMSE decreased from 5.88 to 4.7.
Removing only a handful of observations lowered the RMSE by more than one year. This suggests that these observations were particularly unusual.
\(R^2_{new}:= r_{new}^2(X,Y)=\) 0.62
This suggests that, after removing the extreme observations, our model based on average BMI explains about 62% of the variance in average life expectancy across countries.
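The procedure of dropping the most extreme residuals and refitting can be sketched as follows. This is an illustration under stated assumptions, not the code used for this report: the function names are hypothetical, the toy data is invented, and the RMSE here divides by \(n\), which may differ slightly from the convention behind the reported values.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(xs, ys, slope, intercept):
    """Root mean squared residual of the fitted line (divides by n)."""
    sq = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(sq) / len(sq))


def refit_without_extremes(xs, ys, k):
    """Drop the k points with the largest |residual| under the first fit, then refit."""
    slope, intercept = fit_line(xs, ys)
    abs_res = [abs(y - (intercept + slope * x)) for x, y in zip(xs, ys)]
    # Indices sorted by absolute residual, smallest first; keep all but the last k.
    order = sorted(range(len(xs)), key=lambda i: abs_res[i])
    keep = sorted(order[: len(xs) - k])
    xs2 = [xs[i] for i in keep]
    ys2 = [ys[i] for i in keep]
    return xs2, ys2, fit_line(xs2, ys2)


# Hypothetical toy data with one obvious outlier (the last point).
xs = [20.0, 30.0, 40.0, 50.0, 60.0, 45.0]
ys = [62.0, 66.0, 69.0, 73.0, 76.0, 50.0]
before = rmse(xs, ys, *fit_line(xs, ys))
xs2, ys2, (slope2, intercept2) = refit_without_extremes(xs, ys, k=1)
after = rmse(xs2, ys2, slope2, intercept2)
print(round(before, 2), "->", round(after, 2))
```

Note that removing points chosen by their residuals will always lower the RMSE and tends to raise \(R^2\), so the improvement should be read as a sensitivity check rather than as evidence of a better model.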
| | Country | BMI | Life.Expectancy.Years | Y_hat_LEYears | Residual |
|---|---|---|---|---|---|
| 34 | Chad | 19.1 | 53.1 | 62.21 | 9.11 |
| 38 | Comoros | 24.2 | 63.5 | 63.94 | 0.44 |
| 141 | Senegal | 24.3 | 66.7 | 63.98 | 2.72 |
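The residual column can be checked against the other columns. The short sketch below recomputes the signed residual (observed minus fitted) for the three rows shown and confirms that the reported value matches its absolute value; for Chad and Comoros the signed residual is negative, meaning the model over-predicts.

```python
# Rows copied from the table above:
# (country, BMI, life expectancy, fitted value, reported residual).
rows = [
    ("Chad", 19.1, 53.1, 62.21, 9.11),
    ("Comoros", 24.2, 63.5, 63.94, 0.44),
    ("Senegal", 24.3, 66.7, 63.98, 2.72),
]

for country, bmi_value, observed, fitted, reported in rows:
    signed = observed - fitted  # negative when the model over-predicts
    # The reported column matches the absolute value of the signed residual.
    assert abs(abs(signed) - reported) < 0.005
    print(country, round(signed, 2))
```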