1 The geom_smooth function [1]

geom_smooth(mapping = NULL, data = NULL, stat = "smooth",
  position = "identity", ..., method = "auto", formula = y ~ x,
  se = TRUE, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)
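
As a rough illustration of the signature above, the following sketch adds a smoother to a scatter plot with the defaults written out explicitly. The mtcars dataset (shipped with R) and the columns wt and mpg are assumptions chosen only for demonstration.

```r
library(ggplot2)

# mtcars ships with base R; wt and mpg are used purely for illustration.
p_basic <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # The defaults from the signature are spelled out here:
  # automatic method choice, formula y ~ x, confidence band on.
  geom_smooth(method = "auto", formula = y ~ x, se = TRUE)

# Display the plot (opens a graphics device).
print(p_basic)
```

Setting se = FALSE would drop the shaded confidence band and keep only the fitted curve.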

2 The method parameter of geom_smooth

method  smoothing method (function) to use, e.g. "lm", "glm", "gam", "loess", "rlm".
        For method = "auto" the smoothing method is chosen based on the size of the largest group (across all panels). loess is used for fewer than 1,000 observations; otherwise
        gam is used with formula = y ~ s(x, bs = "cs"). Somewhat anecdotally, loess gives a better appearance, but is O(n^2) in memory, so does not work for larger datasets.
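
The selection rule quoted above can be sketched as a small function. Note that pick_smooth_method is a hypothetical helper written for this note, not part of ggplot2; it only restates the documented threshold.

```r
# Hypothetical helper mirroring the documented rule for method = "auto".
# It is NOT a ggplot2 function; the argument is the size of the largest group.
pick_smooth_method <- function(largest_group_size) {
  if (largest_group_size < 1000) "loess" else "gam"
}

pick_smooth_method(500)   # "loess"
pick_smooth_method(5000)  # "gam"
```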

2.1 Default methods

2.1.1 “loess”

Local regression. This is the default when \(n < 1000\); the degree of smoothing is controlled by the span parameter.

Memory use is \(O(n^2)\), so it is not suitable for large datasets.
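
To show the effect of span, here is a minimal sketch that overlays two loess fits on the same data. The dataset (mtcars), the two span values, and the colours are assumptions picked for illustration.

```r
library(ggplot2)

p_loess <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # Smaller span: each local fit uses fewer neighbours, giving a wigglier curve.
  geom_smooth(method = "loess", formula = y ~ x, span = 0.5,
              se = FALSE, colour = "red") +
  # Larger span: each local fit uses more of the data, giving a smoother curve.
  geom_smooth(method = "loess", formula = y ~ x, span = 1,
              se = FALSE, colour = "blue")

print(p_loess)
```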

2.1.2 “gam”

Generalized additive model (GAM) [2]. This is the default when \(n \ge 1000\); the model is fitted with the mgcv package.
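
A GAM smoother can also be requested explicitly, regardless of sample size. The sketch below assumes the mgcv package is installed and reuses the formula quoted in the documentation excerpt; mtcars is again only an illustrative dataset.

```r
library(ggplot2)

p_gam <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # s(x, bs = "cs") is a cubic regression spline with shrinkage,
  # the same formula that method = "auto" uses for large groups.
  geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"))

print(p_gam)
```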

2.2 Other options

2.2.1 “lm”

Linear model. With the default formula y ~ x, the fit is a straight line.
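
The following sketch contrasts the straight-line fit with a case where "lm" does not produce a straight line, namely a quadratic formula. The dataset and the choice of poly(x, 2) are assumptions for illustration.

```r
library(ggplot2)

p_lm <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # Default formula: an ordinary straight-line fit with a confidence band.
  geom_smooth(method = "lm", formula = y ~ x) +
  # Still a linear model (linear in its coefficients), but the curve is quadratic in x.
  geom_smooth(method = "lm", formula = y ~ poly(x, 2),
              se = FALSE, colour = "red")

print(p_lm)
```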

2.2.2 “rlm”

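Robust linear model [3]. Rather than deleting outliers, it reduces their influence by downweighting them (M-estimation), which gives a more robust linear fit. It comes from the MASS package.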
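
A possible way to compare the ordinary and robust fits is sketched below. It assumes the MASS package is installed and passes the fitting function directly as MASS::rlm; the string "rlm" should behave the same once MASS has been attached with library(MASS). The dataset and colours are illustrative.

```r
library(ggplot2)

p_rlm <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # Ordinary least squares fit, for reference.
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "blue") +
  # Robust fit: observations with large residuals receive lower weight.
  geom_smooth(method = MASS::rlm, formula = y ~ x, se = FALSE, colour = "red")

print(p_rlm)
```

Passing the function itself avoids attaching MASS, which would otherwise mask names such as select from other packages.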

2.2.3 “glm”

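Generalized linear model (GLM), an extension of ordinary least squares (OLS) regression. OLS assumes that the response variable is continuous and normally distributed, and that its expected value is a linear function of the predictors. A GLM relaxes these assumptions. First, the response may be count data (non-negative integers) or categorical data, with a distribution from the exponential family. Second, it is a function of the expected response (the link function), rather than the expected response itself, that is linear in the predictors. Fitting a GLM therefore requires specifying both the distribution family and the link function [4].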
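
In geom_smooth, the family and link can be supplied through the method.args argument, which is forwarded to the fitting function. The sketch below fits a logistic regression; treating the 0/1 column am of mtcars as the response is an assumption made only for illustration.

```r
library(ggplot2)

# am in mtcars is coded 0/1, so a binomial family with a logit link is a natural choice.
p_glm <- ggplot(mtcars, aes(x = wt, y = am)) +
  geom_point() +
  geom_smooth(method = "glm", formula = y ~ x,
              # Distribution family and link function are passed on to glm().
              method.args = list(family = binomial(link = "logit")))

print(p_glm)
```

Without method.args, glm falls back to the Gaussian family, which would reproduce the ordinary linear fit.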

3 References

  1. geom_smooth homepage
  2. Generalized additive models
  3. Robust models
  4. Generalized linear models