2024-01-23

Exercise 3.5.

For the country data:

  • Produce a matrix of scatter plots comparing the three numerical columns of data.

You could either use:

  • the pairs() function

  • or explore the car package for a more sophisticated representation.

  • Are any of the relationships linear? Look for best linear relationships using log() transformations and correlations.

First, round the data to get a general overview in the matrix of plots:

cd_pop <- round(cd$population, digits = -5)
cd_gdp <- round(cd$gdp_head, digits = -2)
cd_age <- round(cd$age_median / 5) * 5
cd_cal <- round(cd$kcals_day, digits = -2)

Build the data frame to pass to the pairs() function:

df_cd <- data.frame(
  population = cd_pop,
  gdp = cd_gdp,
  age = cd_age,
  kcal = cd_cal
)

The plot with the original (rounded) data:
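
The plot itself is not reproduced in these notes. A minimal sketch of how it could be drawn with pairs() follows. The `cd` values here are simulated stand-ins (an assumption, so that the example runs on its own) and are not the real country data; only the column names and the rounding are taken from the notes above.

```r
# Stand-in for the real `cd` data set: simulated values, NOT real figures.
set.seed(1)
n <- 50
log_gdp <- runif(n, log(500), log(60000))
cd <- data.frame(
  population = exp(runif(n, log(1e5), log(1e9))),
  gdp_head   = exp(log_gdp),
  age_median = 15 + 6 * (log_gdp - log(500)) + rnorm(n, sd = 3),
  kcals_day  = 1800 + 400 * (log_gdp - log(500)) + rnorm(n, sd = 200)
)

# Same rounding as in the notes
df_cd <- data.frame(
  population = round(cd$population, digits = -5),
  gdp        = round(cd$gdp_head, digits = -2),
  age        = round(cd$age_median / 5) * 5,
  kcal       = round(cd$kcals_day, digits = -2)
)

# Matrix of scatter plots: every column against every other column
pairs(df_cd, main = "Country data (rounded)")
```

With the real data, only the pairs() call is needed, since `df_cd` already exists.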

Now I take the log of each variable:

df_cd_log <- data.frame(
  population = log(cd_pop),
  gdp = log(cd_gdp),
  age = log(cd_age),
  kcal = log(cd_cal)
)

The plot with the log-transformed data:
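
This plot is also missing from the notes. A sketch of how it could be produced, together with the correlation matrices the exercise asks about, might look like this. The data are again simulated stand-ins for `cd` (an assumption, not the real country data), and the population floor of 1e5 is chosen so that no rounded value is 0, which would give log(0) = -Inf.

```r
# Stand-in for the real `cd` data set: simulated values, NOT real figures.
set.seed(1)
n <- 50
log_gdp <- runif(n, log(500), log(60000))
df_cd <- data.frame(
  population = round(exp(runif(n, log(1e5), log(1e9))), digits = -5),
  gdp        = round(exp(log_gdp), digits = -2),
  age        = round((15 + 6 * (log_gdp - log(500)) + rnorm(n, sd = 3)) / 5) * 5,
  kcal       = round(1800 + 400 * (log_gdp - log(500)) + rnorm(n, sd = 200),
                     digits = -2)
)

# Log-transform every column
df_cd_log <- as.data.frame(lapply(df_cd, log))

pairs(df_cd_log, main = "Country data (log scale)")

# Pearson correlations before and after the transformation;
# values closer to +1 or -1 suggest a more nearly linear relationship
cor_raw <- cor(df_cd)
cor_log <- cor(df_cd_log)
print(round(cor_raw, 2))
print(round(cor_log, 2))
```

Comparing the two matrices entry by entry shows which pairs become more nearly linear after taking logs.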

Closer look: age and gdp data:

Here I use the unrounded data on a log scale.
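
The notes do not say which of the two variables is log-transformed, so the sketch below simply compares the correlation for three candidate choices and then plots one of them with a fitted line. The data are simulated stand-ins for `cd` (an assumption); with the real data the best transformation may differ.

```r
# Stand-in for the real `cd` data set: simulated values, NOT real figures.
set.seed(1)
n <- 50
log_gdp <- runif(n, log(500), log(60000))
cd <- data.frame(
  gdp_head   = exp(log_gdp),
  age_median = 15 + 6 * (log_gdp - log(500)) + rnorm(n, sd = 3)
)

# Correlation under three candidate transformations
cors <- c(
  raw     = cor(cd$gdp_head, cd$age_median),
  log_gdp = cor(log(cd$gdp_head), cd$age_median),
  log_log = cor(log(cd$gdp_head), log(cd$age_median))
)
print(round(cors, 3))

# Scatter plot of median age against log(GDP per head), with a least-squares line
fit <- lm(age_median ~ log(gdp_head), data = cd)
plot(log(cd$gdp_head), cd$age_median,
     xlab = "log(GDP per head)", ylab = "Median age")
abline(fit)
```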

Closer look: age and kcal data:

Closer look: gdp and kcal data:

Combine them
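
One way to combine the three closer looks is to put them side by side in a single figure with par(mfrow = ...). This is only a sketch: the data are simulated stand-ins for `cd`, and logging only gdp is an assumption, since the notes do not record which transformations were finally used.

```r
# Stand-in for the real `cd` data set: simulated values, NOT real figures.
set.seed(1)
n <- 50
log_gdp <- runif(n, log(500), log(60000))
cd <- data.frame(
  gdp_head   = exp(log_gdp),
  age_median = 15 + 6 * (log_gdp - log(500)) + rnorm(n, sd = 3),
  kcals_day  = 1800 + 400 * (log_gdp - log(500)) + rnorm(n, sd = 200)
)

# Three panels in one row; remember the old settings so they can be restored
old_par <- par(mfrow = c(1, 3))

plot(log(cd$gdp_head), cd$age_median,
     xlab = "log(GDP per head)", ylab = "Median age")
abline(lm(age_median ~ log(gdp_head), data = cd))

plot(cd$age_median, cd$kcals_day,
     xlab = "Median age", ylab = "kcal per day")
abline(lm(kcals_day ~ age_median, data = cd))

plot(log(cd$gdp_head), cd$kcals_day,
     xlab = "log(GDP per head)", ylab = "kcal per day")
abline(lm(kcals_day ~ log(gdp_head), data = cd))

par(old_par)

# Correlation for each of the three pairs shown above
pair_cors <- c(
  age_gdp  = cor(log(cd$gdp_head), cd$age_median),
  age_kcal = cor(cd$age_median, cd$kcals_day),
  gdp_kcal = cor(log(cd$gdp_head), cd$kcals_day)
)
print(round(pair_cors, 3))
```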