library(datasets)
library(ggplot2)
library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
data(swiss)

The ‘swiss’ data set contains a standardized fertility measure and several socio-economic indicators for each of 47 French-speaking provinces of Switzerland around 1888, including:

[,1] Fertility: Ig, ‘common standardized fertility measure’
[,2] Agriculture: % of males involved in agriculture as occupation
[,3] Examination: % draftees receiving highest mark on army examination
[,4] Education: % education beyond primary school for draftees
[,5] Catholic: % ‘catholic’ (as opposed to ‘protestant’)
[,6] Infant.Mortality: live births who live less than 1 year
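Before plotting, it can help to confirm that the loaded data frame matches the description above. The following is a minimal sketch using only base R inspection functions; nothing in it goes beyond the ‘swiss’ data itself.

```r
library(datasets)
data(swiss)

# Dimensions: one row per province, one column per indicator
dim(swiss)

# Column names and storage types
str(swiss)

# First few provinces (row names are the province names)
head(swiss, 3)
```

The data frame should have 47 rows and the six columns listed above, in that order.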

Let’s observe the relationship between variables:

df <- ggpairs(swiss, lower = list(continuous = "smooth"))
df

The plot shows that the fertility rate is positively correlated with the agricultural employment rate.
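The visual impression can also be checked numerically. The sketch below computes the Pearson correlation between the two variables and tests whether it differs from zero; the object names `r` and `ct` are chosen here for illustration only.

```r
library(datasets)
data(swiss)

# Pearson correlation between fertility and agricultural employment
r <- cor(swiss$Fertility, swiss$Agriculture)
r

# Test of the null hypothesis that the true correlation is zero
ct <- cor.test(swiss$Fertility, swiss$Agriculture)
ct$p.value
```

With a single predictor, this test is equivalent to the t-test on the slope of a simple linear regression of Fertility on Agriculture, so the two p-values should agree.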

Fit a simple linear regression of the standardized fertility rate on the agricultural employment rate:

summary(lm(Fertility ~ Agriculture, data = swiss))$coefficients
##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 60.3043752 4.25125562 14.185074 3.216304e-18
## Agriculture  0.1942017 0.07671176  2.531577 1.491720e-02

This confirms it: the regression coefficient of the agricultural employment rate is 0.1942017 with p < 0.05, so the two are positively associated. Now, however, let us use the fertility rate as the dependent variable and all the other indicators as independent variables, and fit a multiple linear regression model:

summary(lm(Fertility ~ . , data = swiss))$coefficients
##                    Estimate  Std. Error   t value     Pr(>|t|)
## (Intercept)      66.9151817 10.70603759  6.250229 1.906051e-07
## Agriculture      -0.1721140  0.07030392 -2.448142 1.872715e-02
## Examination      -0.2580082  0.25387820 -1.016268 3.154617e-01
## Education        -0.8709401  0.18302860 -4.758492 2.430605e-05
## Catholic          0.1041153  0.03525785  2.952969 5.190079e-03
## Infant.Mortality  1.0770481  0.38171965  2.821568 7.335715e-03

The p-value of the agricultural employment rate is still less than 0.05, but its partial regression coefficient is now -0.1721140. After adjusting for the other indicators, the two are negatively associated, which is inconsistent with the result of the simple model. This sign reversal is called Simpson’s Paradox.
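One way to see where the reversal comes from is to add the other indicators one at a time and watch the Agriculture coefficient. The sketch below does this with nested models; the order in which covariates are added and the names `fit1` to `fit5` and `agri_coefs` are arbitrary choices made for this illustration, not part of the analysis above.

```r
library(datasets)
data(swiss)

# Nested models: start with Agriculture alone, then add one covariate at a time.
# The order of addition is an assumption made for this illustration.
fit1 <- lm(Fertility ~ Agriculture, data = swiss)
fit2 <- update(fit1, . ~ . + Education)
fit3 <- update(fit2, . ~ . + Examination)
fit4 <- update(fit3, . ~ . + Catholic)
fit5 <- update(fit4, . ~ . + Infant.Mortality)

# Extract the Agriculture coefficient from each model
fits <- list(fit1, fit2, fit3, fit4, fit5)
agri_coefs <- sapply(fits, function(f) unname(coef(f)["Agriculture"]))
names(agri_coefs) <- paste0("fit", 1:5)
agri_coefs

# Pairwise correlations among fertility, agriculture and education
cor(swiss[, c("Fertility", "Agriculture", "Education")])
```

The first and last entries of `agri_coefs` reproduce the two estimates shown above. If Agriculture turns out to be strongly correlated with another predictor such as Education, that predictor is a plausible confounder: the simple model attributes part of its effect to Agriculture, and the sign can change once it is held fixed.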