1) The Data-Set

We used a WHO table of average BMI and life expectancy for every country in the world (approximately 193 countries, down to 164 after removing mistakes in the data).

Source of the data.

Predicted / dependent variable Y: Life expectancy in years. The average number of years a person in a given country can expect to live.

Independent variable X: BMI, in Body Mass Index units, calculated from a person’s height and weight. The formula is \(BMI = \frac{kg}{h_m^2}\), where \(kg\) is a person’s weight in kilograms and \(h_m\) is their height in meters. A BMI of 25.0 or more is overweight, while the healthy range is 18.5 to 24.9.
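As a small illustration of the formula, here is a minimal Python sketch. The function names `bmi` and `bmi_category` and the example person are ours and hypothetical; the thresholds are the ones quoted above, and values between 24.9 and 25.0 are treated as still in the healthy range, which is our assumption.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight in kilograms divided by height in meters squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Classify a BMI value using only the thresholds quoted in the text."""
    if value >= 25.0:
        return "overweight"
    if value >= 18.5:
        return "healthy range"
    return "below healthy range"


# Hypothetical person: 70 kg, 1.75 m tall.
example = bmi(70.0, 1.75)
print(round(example, 1), bmi_category(example))
```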

We hypothesize that BMI is an explanatory variable for a person’s life expectancy, because extreme obesity or thinness harms a person’s health and increases their chance of dying.

1.5) Handling an anomaly in the data

We noticed an anomaly in the data, as shown: there are observations with BMI between 0 and 10, which are impossible BMI values. We removed this noise from the data by omitting observations where BMI is smaller than 10.
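The filtering step can be sketched as follows. The rows are made up for illustration and are not taken from the WHO table; the constant name `MIN_PLAUSIBLE_BMI` is ours.

```python
# Hypothetical rows: (country, BMI, life expectancy in years).
rows = [
    ("A", 2.1, 70.0),
    ("B", 23.5, 68.0),
    ("C", 9.9, 58.0),
    ("D", 55.0, 78.2),
]

# Observations with BMI smaller than 10 are treated as noise and omitted.
MIN_PLAUSIBLE_BMI = 10.0
clean = [row for row in rows if row[1] >= MIN_PLAUSIBLE_BMI]
removed = len(rows) - len(clean)

print(f"kept {len(clean)} rows, removed {removed}")
```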

2) Descriptive Statistics

Histograms:

A distribution with two central values (bimodal) can be seen.

The life expectancy distribution has a long left tail and a center around 72, and the highest value is 86.

Mean, Median, SD, Min, Max, and Quantiles:

Life expectancy (Y), in years:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   51.00   65.70   73.90   71.54   77.00   86.00
BMI (X):
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   17.40   27.30   53.30   46.38   62.10   77.60

\(SD_x =\) 17.89, and \(SD_y =\) 8.2
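A summary of this kind can be computed along the following lines. This is a sketch in Python on made-up values, not the code that produced the output above. We assume the sample standard deviation (dividing by n - 1) and the "inclusive" quantile method, which interpolates linearly between data points.

```python
import statistics


def summary(values):
    """Min, quartiles, mean, max and sample SD of a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(values),
        "q3": q3,
        "max": max(values),
        "sd": statistics.stdev(values),  # divides by n - 1
    }


# Made-up life expectancy values, for illustration only.
toy = [51.0, 60.0, 70.0, 75.0, 86.0]
result = summary(toy)
print({name: round(value, 2) for name, value in result.items()})
```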

3) Our Linear Regression Model

Linear Regression Coefficients

\(\hat{a} =\)   0.32 is the slope of the regression line: by how much life expectancy (in years) changes, on average, when BMI increases by one unit.
\(\hat{b} =\)  56.68 is the intercept of the regression line. It is the part of life expectancy that does not depend on BMI. Simply put: a person lives 56.68 years to begin with, and then, depending on their BMI, lives another \(\hat{a} \cdot BMI\) years.
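For reference, the two coefficients of a simple least-squares line can be computed as sketched below. The helper `fit_line` and the toy data are ours and hypothetical. The last lines use only the rounded figures reported in this document to check that the fitted line passes close to the point of means.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for the line y = a*x + b."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Made-up points lying exactly on y = 50 + 0.5*x.
xs = [20.0, 30.0, 40.0, 60.0]
ys = [60.0, 65.0, 70.0, 80.0]
a_hat, b_hat = fit_line(xs, ys)
print(a_hat, b_hat)

# Reported values: intercept 56.68, slope 0.32, mean BMI 46.38.
# A least-squares line passes through the means, so this should be
# close to the reported mean life expectancy of 71.54 (up to rounding).
predicted_at_mean = 56.68 + 0.32 * 46.38
print(round(predicted_at_mean, 2))
```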

Pearson Correlation Coefficient: \(r(X,Y):= \frac{cov(X,Y)}{SD_X SD_Y} =\) 0.699. The correlation between the variables is positive and fairly high. It suggests that the higher the average BMI, the higher the life expectancy.

R Square Indicator: \(R^2:= r^2(X,Y)=\) 0.488

This suggests that our average-BMI-based model explains almost half (about 49%) of the variance of average life expectancy in a country.
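As a consistency check on the reported figures, the slope of a simple regression line equals the correlation scaled by the ratio of the standard deviations. The calculation below uses the rounded values from this document, so the last digit may differ:

\[
\hat{a} = r(X,Y)\,\frac{SD_Y}{SD_X} \approx 0.699 \cdot \frac{8.2}{17.89} \approx 0.32,
\qquad
R^2 = r^2(X,Y) \approx 0.699^2 \approx 0.49
\]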

Note: The interpretation of the coefficients above suggests that the true relationship is not necessarily linear. In a linear model with a positive slope, life expectancy keeps increasing as BMI increases. In practice, a BMI of 25 or more indicates overweight and is therefore expected to shorten life expectancy. A parabolic model may be more appropriate in this case: up to a certain point an increase in BMI increases life expectancy, and beyond that point an increase in BMI shortens it.
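A parabolic model of this kind could be fitted as sketched below. This is not something we ran on the WHO data: the helper `fit_parabola`, the toy points, and the resulting turning point are all hypothetical. BMI is centered before fitting, which is our choice to keep the normal equations well conditioned.

```python
def fit_parabola(xs, ys):
    """Least-squares fit of y = c0 + c1*u + c2*u**2, where u = x - mean(x)."""
    center = sum(xs) / len(xs)
    us = [x - center for x in xs]
    # Power sums s[k] = sum(u**k) and moments t[k] = sum(y * u**k).
    s = [sum(u ** k for u in us) for k in range(5)]
    t = [sum(y * u ** k for u, y in zip(us, ys)) for k in range(3)]
    # Augmented 3x4 matrix of the normal equations.
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    c0, c1, c2 = (m[i][3] / m[i][i] for i in range(3))
    return center, c0, c1, c2


# Made-up points lying exactly on y = 80 - 0.25 * (x - 26)**2.
xs = [18.0, 22.0, 26.0, 30.0, 34.0]
ys = [64.0, 76.0, 80.0, 76.0, 64.0]

center, c0, c1, c2 = fit_parabola(xs, ys)
# A negative c2 means the curve rises and then falls; its peak is at:
turning_point = center - c1 / (2 * c2)
print(round(c2, 3), round(turning_point, 2))
```

A negative quadratic coefficient would correspond to the pattern described in the note: life expectancy rising with BMI up to the turning point and falling after it.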

Evaluation - Root Mean Square Error (RMSE)

We showed that \(R^2\) is fairly high. Let's examine the model a bit more.

\(RMSE=\)  5.88 years of life expectancy. This is smaller than \(SD_y =\) 8.2, the spread of life expectancy itself, which suggests reasonable accuracy of our model.
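RMSE can be computed as in the sketch below. We assume the definition that divides the sum of squared residuals by n; other conventions divide by n - 2. The toy values are made up. The last lines use the rounded figures reported in this document: for a simple regression, RMSE is approximately \(SD_y \sqrt{1 - R^2}\), which gives a value close to the reported 5.88.

```python
import math


def rmse(observed, predicted):
    """Square root of the mean squared residual (dividing by n)."""
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up values with residuals 1, -1, 7, -7.
observed = [71.0, 64.0, 80.0, 55.0]
predicted = [70.0, 65.0, 73.0, 62.0]
toy_rmse = rmse(observed, predicted)
print(toy_rmse)

# Rough check from the reported SD_y = 8.2 and R^2 = 0.488.
approx_rmse = 8.2 * math.sqrt(1 - 0.488)
print(round(approx_rmse, 2))
```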

   

Reference to Unusual Observations

Out of the 165 observations, we removed the 10 observations with the most extreme residuals under the first model, then refit the model to see whether the coefficients change.

After omitting the observations with the largest residuals, RMSE decreased from 5.88 to 4.7.

Removing these observations lowered RMSE by about 1.2 years (roughly 20%), which suggests that they were particularly unusual.

\(R^2_{new}:= r_{new}^2(X,Y)=\) 0.62

This suggests that, after removing the unusual observations, our average-BMI-based model explains about 62% of the variance of average life expectancy in a country.
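The procedure in this section (fit, drop the observations with the largest absolute residuals, refit) can be sketched as follows. The helper names and the toy data are ours and hypothetical, and RMSE here divides by n, which is an assumption.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = a*x + b."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def rmse(xs, ys, a, b):
    """Root mean square of the residuals y - (a*x + b), dividing by n."""
    return math.sqrt(sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs))


def drop_largest_residuals(xs, ys, k):
    """Fit once, then drop the k points with the largest absolute residuals."""
    a, b = fit_line(xs, ys)
    order = sorted(range(len(xs)), key=lambda i: abs(ys[i] - (a * xs[i] + b)), reverse=True)
    dropped = set(order[:k])
    kept = [i for i in range(len(xs)) if i not in dropped]
    return [xs[i] for i in kept], [ys[i] for i in kept]


# Made-up data: four points on y = 50 + 0.5*x plus one unusual point at x = 40.
xs = [20.0, 30.0, 40.0, 50.0, 60.0]
ys = [60.0, 65.0, 50.0, 75.0, 80.0]

a1, b1 = fit_line(xs, ys)
rmse_before = rmse(xs, ys, a1, b1)

xs2, ys2 = drop_largest_residuals(xs, ys, k=1)
a2, b2 = fit_line(xs2, ys2)
rmse_after = rmse(xs2, ys2, a2, b2)

print(f"before: a={a1:.2f}, b={b1:.2f}, RMSE={rmse_before:.2f}")
print(f"after:  a={a2:.2f}, b={b2:.2f}, RMSE={rmse_after:.2f}")
```

Note that dropping the points with the largest residuals lowers RMSE by construction, so the change in the coefficients is the more informative comparison.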

For a number of observations, we report \(y\), \(\hat{y}\), and the residual

We sampled 3 observations:
    Country  BMI Life.Expectancy.Years Y_hat_LEYears Residual
34     Chad 19.1                  53.1         62.21     9.11
38  Comoros 24.2                  63.5         63.94     0.44
141 Senegal 24.3                  66.7         63.98     2.72
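The residuals in the table can be recomputed from its own \(y\) and \(\hat{y}\) columns, as in the short sketch below. It defines the residual as \(y - \hat{y}\), which is our assumption about the sign convention. Under it, the table's Residual column matches the absolute values.

```python
# (country, observed life expectancy, predicted life expectancy), from the table.
sampled = [
    ("Chad", 53.1, 62.21),
    ("Comoros", 63.5, 63.94),
    ("Senegal", 66.7, 63.98),
]

residuals = {}
for country, y, y_hat in sampled:
    # Negative residual: the model predicts a higher life expectancy than observed.
    residuals[country] = y - y_hat
    print(f"{country:8s} residual={residuals[country]:+.2f} abs={abs(residuals[country]):.2f}")
```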