11.11你可以与其他同学讨论作业题目,但作业应由你本人独立完成.
请将你所有的答案、计算机代码和图表整合到一个 HTML 文件中.
请将作业文件命名为
学号-姓名-01,例如:2025017000-肖磊-01.
评分方式:每一问得分 \(\left\{ 0,~ 1,~ 2~ \right\}\).
2: 按时提交且答案基本正确
1: 按时提交且答案部分正确
0: 提交的答案完全错误,或未交作业
请将本次作业发送至:2023582025@cupk.edu.cn. 截止日期: 2025年11月15日24:00.
每个 ggplot2 图形都包含三个关键组成部分:
数据 data.
数据中的变量与视觉属性之间的一组美学映射
aesthetic mapping.
至少包含一个图层,该图层用于描述如何呈现每个观测值. 图层通常通过
geom 函数创建.
本题使用数据集 mpg. 例如:
library(ggplot2)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point()
1.1 利用 ggplot2 作 cty 与
hwy 的散点图. 你会如何描述城市油耗 cty
与高速油耗 hwy 之间的关系? (2 分)
library(ggplot2)
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_point()
1.2 你还可以使用其它的图形美学属性设置,例如颜色
color、形状 shape 和大小 size.
运行以下代码,观察会出现什么结果.
ggplot(data = mpg, aes(x = cty, y = hwy, colour = class)) +
geom_point()
ggplot(data = mpg, aes(x = cty, y = hwy, shape = drv)) +
geom_point()
ggplot(data = mpg, aes(x = cty, y = hwy, size = cyl)) +
geom_point()
尝试使用颜色 colour、形状 shape 和大小
size 这些美学属性.
将它们映射到连续值时会发生什么?映射到分类值时又会怎样?在一个图形中使用不止一种美学属性时,结果又如何? (2 分)
# 运行题目中的代码
ggplot(data = mpg, aes(x = cty, y = hwy, colour = class)) +
geom_point()
ggplot(data = mpg, aes(x = cty, y = hwy, shape = drv)) +
geom_point()
ggplot(data = mpg, aes(x = cty, y = hwy, size = cyl)) +
geom_point()
# 额外实验:映射到连续值 vs 分类值
# 颜色映射到连续值
ggplot(data = mpg, aes(x = cty, y = hwy, colour = cyl)) +
geom_point()
# 颜色映射到分类值
ggplot(data = mpg, aes(x = cty, y = hwy, colour = factor(cyl))) +
geom_point()
# 同时使用多种美学属性
ggplot(data = mpg, aes(x = cty, y = hwy, colour = class, shape = drv, size = displ)) +
geom_point()
1.3 在图形上展示额外分类变量的另一种方法是分面 (facetting). 例如:
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point() +
facet_wrap(~ class)
如果尝试按照高速油耗 hwy
这类连续变量进行分面,会发生什么?按气缸数 cyl
分面又会怎样?二者最关键的区别是什么? (2 分)
# 按连续变量 hwy 分面
ggplot(data = mpg, aes(x = displ, y = cty)) +
geom_point() +
facet_wrap(~ hwy)
# 按气缸数 cyl 分面(cyl 本质上是分类变量)
ggplot(data = mpg, aes(x = displ, y = cty)) +
geom_point() +
facet_wrap(~ cyl)
# 将 cyl 转换为因子后分面
ggplot(data = mpg, aes(x = displ, y = cty)) +
geom_point() +
facet_wrap(~ factor(cyl))
1.4 你可以使用 geom_smooth()
函数为图形添加一条平滑曲线,例如:
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth(span = 0.75)
geom_smooth() 函数中一个重要的参数是 method
(方法),通过该参数,你可以选择用于拟合平滑曲线的模型类型.
method = "loess",这是样本量较小时 (\(n\) 较小) 的默认设置 (具体说明可参考
?loess 帮助文档). 尝试调整 span
参数的不同取值,观察会出现什么结果. (2 分)
# 使用不同的 span 参数值
library(ggplot2)
# span = 0.2 (更局部的拟合,更波动)
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth(method = "loess", span = 0.2, se = FALSE) +
ggtitle("span = 0.2 (更波动)")
## `geom_smooth()` using formula = 'y ~ x'
# span = 0.5 (中等平滑度)
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth(method = "loess", span = 0.5, se = FALSE) +
ggtitle("span = 0.5 (中等平滑)")
## `geom_smooth()` using formula = 'y ~ x'
# span = 0.75 (默认值,较平滑)
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth(method = "loess", span = 0.75, se = FALSE) +
ggtitle("span = 0.75 (默认,较平滑)")
## `geom_smooth()` using formula = 'y ~ x'
# span = 1.0 (非常平滑)
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth(method = "loess", span = 1.0, se = FALSE) +
ggtitle("span = 1.0 (非常平滑)")
## `geom_smooth()` using formula = 'y ~ x'
# 同时显示多个span值的效果对比
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point(alpha = 0.3) +
geom_smooth(method = "loess", span = 0.2, se = FALSE, color = "red", linetype = "dashed") +
geom_smooth(method = "loess", span = 0.5, se = FALSE, color = "blue", linetype = "dotted") +
geom_smooth(method = "loess", span = 0.75, se = FALSE, color = "green") +
geom_smooth(method = "loess", span = 1.0, se = FALSE, color = "purple", linetype = "dotdash") +
labs(title = "不同span参数值的平滑曲线比较",
subtitle = "红色: span=0.2, 蓝色: span=0.5, 绿色: span=0.75, 紫色: span=1.0")
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
method = "lm" 用于拟合线性模型 (linear
model),得到最佳拟合线. 尝试运行下面的代码,观察会出现什么结果. (2 分) ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth(method = "lm")
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point(alpha = 0.6, color = "steelblue", size = 2) + # 调整点的透明度、颜色和大小
geom_smooth(method = "lm", color = "red", fill = "lightcoral",
level = 0.95, linetype = "solid") + # 自定义线性回归线的外观
labs(
title = "发动机排量与高速油耗的线性关系",
subtitle = "使用线性回归模型 (method = 'lm') 拟合",
x = "发动机排量 (升)",
y = "高速油耗 (英里/加仑)",
caption = "数据来源: ggplot2::mpg 数据集"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
plot.subtitle = element_text(color = "gray40")
)
## `geom_smooth()` using formula = 'y ~ x'
method = "gam" 用于拟合由 mgcv
包提供的广义加性模型 (Generalized Additive Model,简称 GAM).
你需要使用类似 formula = y ~ s(x) 或
formula = y ~ s(x, bs = "cs") (适用于大数据集) 这样的公式.
当数据点数量超过 \(1000\)
个时,ggplot2 会默认使用该方法.
尝试运行以下代码,观察会出现什么结果. (2 分)
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth(method = "gam", formula = y ~ s(x))
# 基础GAM拟合
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point(alpha = 0.6, color = "steelblue") +
geom_smooth(method = "gam", formula = y ~ s(x),
color = "darkgreen", fill = "lightgreen") +
labs(title = "使用GAM方法拟合发动机排量与高速油耗关系",
subtitle = "formula = y ~ s(x)",
x = "发动机排量 (升)",
y = "高速油耗 (英里/加仑)") +
theme_minimal()
# 比较不同GAM公式
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point(alpha = 0.4) +
geom_smooth(method = "gam", formula = y ~ s(x),
color = "darkgreen", aes(fill = "y ~ s(x)"), se = TRUE) +
geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"),
color = "purple", aes(fill = "y ~ s(x, bs='cs')"), se = TRUE) +
scale_fill_manual(values = c("y ~ s(x)" = "lightgreen",
"y ~ s(x, bs='cs')" = "lavender")) +
labs(title = "不同GAM公式比较",
x = "发动机排量 (升)",
y = "高速油耗 (英里/加仑)",
fill = "GAM公式") +
theme_minimal() +
theme(legend.position = "bottom")
# 与线性模型和loess方法对比
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point(alpha = 0.3) +
geom_smooth(method = "gam", formula = y ~ s(x),
color = "darkgreen", aes(fill = "GAM"), se = TRUE) +
geom_smooth(method = "lm", color = "red",
aes(fill = "线性模型"), se = TRUE) +
geom_smooth(method = "loess", color = "blue",
aes(fill = "Loess"), se = TRUE) +
scale_fill_manual(values = c("GAM" = "lightgreen",
"线性模型" = "lightcoral",
"Loess" = "lightblue")) +
labs(title = "GAM、线性模型和Loess方法对比",
subtitle = "GAM能捕捉更复杂的非线性关系",
x = "发动机排量 (升)",
y = "高速油耗 (英里/加仑)",
fill = "拟合方法") +
theme_minimal() +
theme(legend.position = "bottom")
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
R 语言中 alr4 包中的 walleye
数据集,包含了对美国威斯康星州 (Wisconsin) 白斑狗鱼
(walleye,一种淡水鱼) 的测量数据.
2.1 使用 dplyr 包中的 filter
函数筛选数据框中的行,然后用 ggplot2 绘制
age (年龄) 为 \(5\) 至
\(8\) 岁的 length (体长)
的箱线图. (2 分)
# 加载必要的包
library(alr4)
## Loading required package: car
## Loading required package: carData
## Loading required package: effects
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
##
## recode
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
# 使用filter筛选年龄在5-8岁的数据,并绘制箱线图
walleye %>%
filter(age >= 5 & age <= 8) %>%
ggplot(aes(x = factor(age), y = length)) +
geom_boxplot(fill = "lightblue", alpha = 0.7) +
labs(
title = "白斑狗鱼年龄与体长关系箱线图 (5-8岁)",
x = "年龄 (岁)",
y = "体长 (mm)"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.title = element_text(size = 12)
)
2.2 使用 dplyr 包计算四个年龄组 (\(5\) 至 \(8\) 岁) 中体长 length
的样本均值和样本标准差. (2 分)
# 加载必要的包
library(alr4)
library(dplyr)
# 使用dplyr计算5-8岁年龄组体长的样本均值和标准差
walleye %>%
filter(age >= 5 & age <= 8) %>%
group_by(age) %>%
summarise(
样本均值 = mean(length),
样本标准差 = sd(length),
.groups = 'drop'
)
## # A tibble: 4 × 3
## age 样本均值 样本标准差
## <int> <dbl> <dbl>
## 1 5 364. 30.5
## 2 6 384. 29.1
## 3 7 403. 27.8
## 4 8 420. 28.4
2.3 使用 ggplot2 绘制 age 为 \(5\) 至 \(8\) 岁时 length
的直方图,并将这些图形以 \(2 \times 2\)
的网格形式排列在同一张图中. (2 分)
# 加载必要的包
library(alr4)
library(dplyr)
library(ggplot2)
# 如果walleye对象不存在,使用Walleye
if (!exists("walleye")) {
walleye <- Walleye
}
# 筛选5-8岁的数据
walleye_filtered <- walleye %>%
filter(age >= 5 & age <= 8)
# 绘制2×2网格排列的直方图
ggplot(walleye_filtered, aes(x = length)) +
geom_histogram(fill = "lightblue", color = "black", alpha = 0.7, bins = 15) +
facet_wrap(~ age, nrow = 2, ncol = 2) +
labs(
title = "白斑狗鱼不同年龄组体长分布直方图 (5-8岁)",
x = "体长 (mm)",
y = "频数"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.title = element_text(size = 12),
strip.text = element_text(size = 11, face = "bold") # 分面标题样式
)
以下问题基于 R 语言中的 anscombe 数据集.
3.1 使用 ggplot2 和 gridExtra 包,在
\(2 \times 2\) 的图形网格中绘制 \(4\) 个数据集 \(\left( x_1 ,~ y_1 \right)\)、\(\left( x_2 ,~ y_2 \right)\)、\(\left( x_3 ,~ y_3 \right)\)、\(\left( x_4 ,~ y_4 \right)\) 的散点图. (2 分)
在每个图形上添加数据集的编号,并将其作为该图形的主标题.
使用 bquote 函数为每个图形添加坐标轴标签. 例如:
library(ggplot2)
library(magrittr)
library(dplyr)
ggplot(data = anscombe) +
geom_point(aes(x = x1, y = y1)) +
xlab(bquote(x[1])) +
ylab(bquote(y[1])) +
ggtitle(paste0("n=", dim(anscombe %>% select(x1, y1))[1])) +
theme(plot.title = element_text(hjust = 0.5))
# 加载必要的包
library(ggplot2)
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
library(dplyr)
# 创建四个散点图
plot1 <- ggplot(data = anscombe) +
geom_point(aes(x = x1, y = y1)) +
xlab(bquote(x[1])) +
ylab(bquote(y[1])) +
ggtitle("Dataset 1") +
theme(plot.title = element_text(hjust = 0.5))
plot2 <- ggplot(data = anscombe) +
geom_point(aes(x = x2, y = y2)) +
xlab(bquote(x[2])) +
ylab(bquote(y[2])) +
ggtitle("Dataset 2") +
theme(plot.title = element_text(hjust = 0.5))
plot3 <- ggplot(data = anscombe) +
geom_point(aes(x = x3, y = y3)) +
xlab(bquote(x[3])) +
ylab(bquote(y[3])) +
ggtitle("Dataset 3") +
theme(plot.title = element_text(hjust = 0.5))
plot4 <- ggplot(data = anscombe) +
geom_point(aes(x = x4, y = y4)) +
xlab(bquote(x[4])) +
ylab(bquote(y[4])) +
ggtitle("Dataset 4") +
theme(plot.title = element_text(hjust = 0.5))
# 使用 gridExtra 包将四个图形排列在 2×2 网格中
grid.arrange(plot1, plot2, plot3, plot4, ncol = 2, nrow = 2)
3.2 使用 lm() 拟合以下数据集的简单线性回归模型:
- $y_1 \sim x_1$
- $y_2 \sim x_2$
- $y_3 \sim x_3$
- $y_4 \sim x_4$
验证所有的拟合模型是否具有相同的回归系数 (在数值误差范围内).
<span style="color:red"> (2 分) </span>
# 拟合四个线性回归模型
model1 <- lm(y1 ~ x1, data = anscombe)
model2 <- lm(y2 ~ x2, data = anscombe)
model3 <- lm(y3 ~ x3, data = anscombe)
model4 <- lm(y4 ~ x4, data = anscombe)
# 提取回归系数
coef1 <- coef(model1)
coef2 <- coef(model2)
coef3 <- coef(model3)
coef4 <- coef(model4)
# 创建系数比较表格
coefficients_table <- data.frame(
Dataset = c("y1 ~ x1", "y2 ~ x2", "y3 ~ x3", "y4 ~ x4"),
Intercept = c(coef1[1], coef2[1], coef3[1], coef4[1]),
Slope = c(coef1[2], coef2[2], coef3[2], coef4[2])
)
cat("回归系数比较:\n")
## 回归系数比较:
print(coefficients_table)
## Dataset Intercept Slope
## x1 y1 ~ x1 3.000091 0.5000909
## x2 y2 ~ x2 3.000909 0.5000000
## x3 y3 ~ x3 3.002455 0.4997273
## x4 y4 ~ x4 3.001727 0.4999091
# 验证系数是否在数值误差范围内相同
cat("\n验证回归系数是否相同:\n")
##
## 验证回归系数是否相同:
cat("截距项的最大差异:", max(abs(diff(coefficients_table$Intercept))), "\n")
## 截距项的最大差异: 0.001545455
cat("斜率项的最大差异:", max(abs(diff(coefficients_table$Slope))), "\n")
## 斜率项的最大差异: 0.0002727273
# 使用 all.equal 进行精确比较
cat("\n使用 all.equal 比较 (容差 = 1e-10):\n")
##
## 使用 all.equal 比较 (容差 = 1e-10):
cat("所有模型的截距是否相同:",
isTRUE(all.equal(coef1[1], coef2[1], tolerance = 1e-10)) &
isTRUE(all.equal(coef1[1], coef3[1], tolerance = 1e-10)) &
isTRUE(all.equal(coef1[1], coef4[1], tolerance = 1e-10)), "\n")
## 所有模型的截距是否相同: FALSE
cat("所有模型的斜率是否相同:",
isTRUE(all.equal(coef1[2], coef2[2], tolerance = 1e-10)) &
isTRUE(all.equal(coef1[2], coef3[2], tolerance = 1e-10)) &
isTRUE(all.equal(coef1[2], coef4[2], tolerance = 1e-10)), "\n")
## 所有模型的斜率是否相同: FALSE
# 显示模型摘要
cat("\n详细模型摘要:\n")
##
## 详细模型摘要:
cat("模型1 (y1 ~ x1):\n")
## 模型1 (y1 ~ x1):
print(summary(model1)$coefficients)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0000909 1.1247468 2.667348 0.025734051
## x1 0.5000909 0.1179055 4.241455 0.002169629
cat("\n模型2 (y2 ~ x2):\n")
##
## 模型2 (y2 ~ x2):
print(summary(model2)$coefficients)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.000909 1.1253024 2.666758 0.025758941
## x2 0.500000 0.1179637 4.238590 0.002178816
cat("\n模型3 (y3 ~ x3):\n")
##
## 模型3 (y3 ~ x3):
print(summary(model3)$coefficients)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0024545 1.1244812 2.670080 0.025619109
## x3 0.4997273 0.1178777 4.239372 0.002176305
cat("\n模型4 (y4 ~ x4):\n")
##
## 模型4 (y4 ~ x4):
print(summary(model4)$coefficients)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0017273 1.1239211 2.670763 0.025590425
## x4 0.4999091 0.1178189 4.243028 0.002164602
3.3 使用 cor() 命令计算每个数据集的样本相关系数. (2 分)
# 3.3 使用 cor() 计算每个数据集的样本相关系数
# 计算四个数据集的样本相关系数
cor1 <- cor(anscombe$x1, anscombe$y1)
cor2 <- cor(anscombe$x2, anscombe$y2)
cor3 <- cor(anscombe$x3, anscombe$y3)
cor4 <- cor(anscombe$x4, anscombe$y4)
# 创建相关系数比较表格
correlation_table <- data.frame(
Dataset = c("(x1, y1)", "(x2, y2)", "(x3, y3)", "(x4, y4)"),
Correlation = c(cor1, cor2, cor3, cor4)
)
cat("样本相关系数比较:\n")
## 样本相关系数比较:
print(correlation_table)
## Dataset Correlation
## 1 (x1, y1) 0.8164205
## 2 (x2, y2) 0.8162365
## 3 (x3, y3) 0.8162867
## 4 (x4, y4) 0.8165214
# 验证相关系数是否在数值误差范围内相同
cat("\n验证相关系数是否相同:\n")
##
## 验证相关系数是否相同:
cat("相关系数的最大差异:", max(abs(diff(correlation_table$Correlation))), "\n")
## 相关系数的最大差异: 0.0002346974
# 使用 all.equal 进行精确比较
cat("\n使用 all.equal 比较 (容差 = 1e-10):\n")
##
## 使用 all.equal 比较 (容差 = 1e-10):
cat("所有数据集的相关系数是否相同:",
isTRUE(all.equal(cor1, cor2, tolerance = 1e-10)) &
isTRUE(all.equal(cor1, cor3, tolerance = 1e-10)) &
isTRUE(all.equal(cor1, cor4, tolerance = 1e-10)), "\n")
## 所有数据集的相关系数是否相同: FALSE
# 显示更精确的相关系数值
cat("\n精确的相关系数值:\n")
##
## 精确的相关系数值:
cat("数据集 1 (x1, y1):", sprintf("%.10f", cor1), "\n")
## 数据集 1 (x1, y1): 0.8164205163
cat("数据集 2 (x2, y2):", sprintf("%.10f", cor2), "\n")
## 数据集 2 (x2, y2): 0.8162365060
cat("数据集 3 (x3, y3):", sprintf("%.10f", cor3), "\n")
## 数据集 3 (x3, y3): 0.8162867395
cat("数据集 4 (x4, y4):", sprintf("%.10f", cor4), "\n")
## 数据集 4 (x4, y4): 0.8165214369
3.4 拟合与第 \(2\)
问中相同的模型,但交换 \(x\) 和 \(y\) 的位置. 使用 summary()
命令查看结果,当交换 \(x\) 和 \(y\)
的位置时,结果中是否有某些内容保持不变? (2 分)
# 拟合交换x和y位置的四个模型
model1_swapped <- lm(x1 ~ y1, data = anscombe)
model2_swapped <- lm(x2 ~ y2, data = anscombe)
model3_swapped <- lm(x3 ~ y3, data = anscombe)
model4_swapped <- lm(x4 ~ y4, data = anscombe)
# 查看原始模型和交换后模型的摘要
cat("原始模型 (y ~ x) 的摘要:\n")
## 原始模型 (y ~ x) 的摘要:
cat("模型1 (y1 ~ x1):\n")
## 模型1 (y1 ~ x1):
print(summary(model1))
##
## Call:
## lm(formula = y1 ~ x1, data = anscombe)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.92127 -0.45577 -0.04136 0.70941 1.83882
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0001 1.1247 2.667 0.02573 *
## x1 0.5001 0.1179 4.241 0.00217 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.237 on 9 degrees of freedom
## Multiple R-squared: 0.6665, Adjusted R-squared: 0.6295
## F-statistic: 17.99 on 1 and 9 DF, p-value: 0.00217
cat("\n模型2 (y2 ~ x2):\n")
##
## 模型2 (y2 ~ x2):
print(summary(model2))
##
## Call:
## lm(formula = y2 ~ x2, data = anscombe)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9009 -0.7609 0.1291 0.9491 1.2691
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.001 1.125 2.667 0.02576 *
## x2 0.500 0.118 4.239 0.00218 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.237 on 9 degrees of freedom
## Multiple R-squared: 0.6662, Adjusted R-squared: 0.6292
## F-statistic: 17.97 on 1 and 9 DF, p-value: 0.002179
cat("\n模型3 (y3 ~ x3):\n")
##
## 模型3 (y3 ~ x3):
print(summary(model3))
##
## Call:
## lm(formula = y3 ~ x3, data = anscombe)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.1586 -0.6146 -0.2303 0.1540 3.2411
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0025 1.1245 2.670 0.02562 *
## x3 0.4997 0.1179 4.239 0.00218 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.236 on 9 degrees of freedom
## Multiple R-squared: 0.6663, Adjusted R-squared: 0.6292
## F-statistic: 17.97 on 1 and 9 DF, p-value: 0.002176
cat("\n模型4 (y4 ~ x4):\n")
##
## 模型4 (y4 ~ x4):
print(summary(model4))
##
## Call:
## lm(formula = y4 ~ x4, data = anscombe)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.751 -0.831 0.000 0.809 1.839
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0017 1.1239 2.671 0.02559 *
## x4 0.4999 0.1178 4.243 0.00216 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.236 on 9 degrees of freedom
## Multiple R-squared: 0.6667, Adjusted R-squared: 0.6297
## F-statistic: 18 on 1 and 9 DF, p-value: 0.002165
cat("\n\n交换后的模型 (x ~ y) 的摘要:\n")
##
##
## 交换后的模型 (x ~ y) 的摘要:
cat("模型1 (x1 ~ y1):\n")
## 模型1 (x1 ~ y1):
print(summary(model1_swapped))
##
## Call:
## lm(formula = x1 ~ y1, data = anscombe)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.6522 -1.5117 -0.2657 1.2341 3.8946
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.9975 2.4344 -0.410 0.69156
## y1 1.3328 0.3142 4.241 0.00217 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.019 on 9 degrees of freedom
## Multiple R-squared: 0.6665, Adjusted R-squared: 0.6295
## F-statistic: 17.99 on 1 and 9 DF, p-value: 0.00217
cat("\n模型2 (x2 ~ y2):\n")
##
## 模型2 (x2 ~ y2):
print(summary(model2_swapped))
##
## Call:
## lm(formula = x2 ~ y2, data = anscombe)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8516 -1.4315 -0.3440 0.8467 4.2017
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.9948 2.4354 -0.408 0.69246
## y2 1.3325 0.3144 4.239 0.00218 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.02 on 9 degrees of freedom
## Multiple R-squared: 0.6662, Adjusted R-squared: 0.6292
## F-statistic: 17.97 on 1 and 9 DF, p-value: 0.002179
cat("\n模型3 (x3 ~ y3):\n")
##
## 模型3 (x3 ~ y3):
print(summary(model3_swapped))
##
## Call:
## lm(formula = x3 ~ y3, data = anscombe)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9869 -1.3733 -0.0266 1.3200 3.2133
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.0003 2.4362 -0.411 0.69097
## y3 1.3334 0.3145 4.239 0.00218 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.019 on 9 degrees of freedom
## Multiple R-squared: 0.6663, Adjusted R-squared: 0.6292
## F-statistic: 17.97 on 1 and 9 DF, p-value: 0.002176
cat("\n模型4 (x4 ~ y4):\n")
##
## 模型4 (x4 ~ y4):
print(summary(model4_swapped))
##
## Call:
## lm(formula = x4 ~ y4, data = anscombe)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.7859 -1.4122 -0.1853 1.4551 3.3329
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.0036 2.4349 -0.412 0.68985
## y4 1.3337 0.3143 4.243 0.00216 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.018 on 9 degrees of freedom
## Multiple R-squared: 0.6667, Adjusted R-squared: 0.6297
## F-statistic: 18 on 1 and 9 DF, p-value: 0.002165
# 比较保持不变的内容
cat("\n比较保持不变的内容:\n")
##
## 比较保持不变的内容:
# 1. 比较R-squared值
cat("\n1. R-squared值比较:\n")
##
## 1. R-squared值比较:
cat("原始模型 R-squared:",
summary(model1)$r.squared, summary(model2)$r.squared,
summary(model3)$r.squared, summary(model4)$r.squared, "\n")
## 原始模型 R-squared: 0.6665425 0.666242 0.666324 0.6667073
cat("交换后模型 R-squared:",
summary(model1_swapped)$r.squared, summary(model2_swapped)$r.squared,
summary(model3_swapped)$r.squared, summary(model4_swapped)$r.squared, "\n")
## 交换后模型 R-squared: 0.6665425 0.666242 0.666324 0.6667073
cat("R-squared值是否相同:",
isTRUE(all.equal(summary(model1)$r.squared, summary(model1_swapped)$r.squared)) &
isTRUE(all.equal(summary(model2)$r.squared, summary(model2_swapped)$r.squared)) &
isTRUE(all.equal(summary(model3)$r.squared, summary(model3_swapped)$r.squared)) &
isTRUE(all.equal(summary(model4)$r.squared, summary(model4_swapped)$r.squared)), "\n")
## R-squared值是否相同: TRUE
# 2. 比较F统计量的值
cat("\n2. F统计量值比较:\n")
##
## 2. F统计量值比较:
cat("原始模型 F-statistic:",
summary(model1)$fstatistic[1], summary(model2)$fstatistic[1],
summary(model3)$fstatistic[1], summary(model4)$fstatistic[1], "\n")
## 原始模型 F-statistic: 17.98994 17.96565 17.97228 18.00329
cat("交换后模型 F-statistic:",
summary(model1_swapped)$fstatistic[1], summary(model2_swapped)$fstatistic[1],
summary(model3_swapped)$fstatistic[1], summary(model4_swapped)$fstatistic[1], "\n")
## 交换后模型 F-statistic: 17.98994 17.96565 17.97228 18.00329
cat("F统计量值是否相同:",
isTRUE(all.equal(summary(model1)$fstatistic[1], summary(model1_swapped)$fstatistic[1])) &
isTRUE(all.equal(summary(model2)$fstatistic[1], summary(model2_swapped)$fstatistic[1])) &
isTRUE(all.equal(summary(model3)$fstatistic[1], summary(model3_swapped)$fstatistic[1])) &
isTRUE(all.equal(summary(model4)$fstatistic[1], summary(model4_swapped)$fstatistic[1])), "\n")
## F统计量值是否相同: TRUE
# 3. 比较t统计量的绝对值(对于斜率系数)
cat("\n3. 斜率系数的t统计量绝对值比较:\n")
##
## 3. 斜率系数的t统计量绝对值比较:
cat("原始模型 |t| for slope:",
abs(summary(model1)$coefficients[2,3]), abs(summary(model2)$coefficients[2,3]),
abs(summary(model3)$coefficients[2,3]), abs(summary(model4)$coefficients[2,3]), "\n")
## 原始模型 |t| for slope: 4.241455 4.23859 4.239372 4.243028
cat("交换后模型 |t| for slope:",
abs(summary(model1_swapped)$coefficients[2,3]), abs(summary(model2_swapped)$coefficients[2,3]),
abs(summary(model3_swapped)$coefficients[2,3]), abs(summary(model4_swapped)$coefficients[2,3]), "\n")
## 交换后模型 |t| for slope: 4.241455 4.23859 4.239372 4.243028
cat("t统计量绝对值是否相同:",
isTRUE(all.equal(abs(summary(model1)$coefficients[2,3]), abs(summary(model1_swapped)$coefficients[2,3]))) &
isTRUE(all.equal(abs(summary(model2)$coefficients[2,3]), abs(summary(model2_swapped)$coefficients[2,3]))) &
isTRUE(all.equal(abs(summary(model3)$coefficients[2,3]), abs(summary(model3_swapped)$coefficients[2,3]))) &
isTRUE(all.equal(abs(summary(model4)$coefficients[2,3]), abs(summary(model4_swapped)$coefficients[2,3]))), "\n")
## t统计量绝对值是否相同: TRUE
# 4. 比较p值(对于斜率系数)
cat("\n4. 斜率系数的p值比较:\n")
##
## 4. 斜率系数的p值比较:
cat("原始模型 p-value for slope:",
summary(model1)$coefficients[2,4], summary(model2)$coefficients[2,4],
summary(model3)$coefficients[2,4], summary(model4)$coefficients[2,4], "\n")
## 原始模型 p-value for slope: 0.002169629 0.002178816 0.002176305 0.002164602
cat("交换后模型 p-value for slope:",
summary(model1_swapped)$coefficients[2,4], summary(model2_swapped)$coefficients[2,4],
summary(model3_swapped)$coefficients[2,4], summary(model4_swapped)$coefficients[2,4], "\n")
## 交换后模型 p-value for slope: 0.002169629 0.002178816 0.002176305 0.002164602
cat("p值是否相同:",
isTRUE(all.equal(summary(model1)$coefficients[2,4], summary(model1_swapped)$coefficients[2,4])) &
isTRUE(all.equal(summary(model2)$coefficients[2,4], summary(model2_swapped)$coefficients[2,4])) &
isTRUE(all.equal(summary(model3)$coefficients[2,4], summary(model3_swapped)$coefficients[2,4])) &
isTRUE(all.equal(summary(model4)$coefficients[2,4], summary(model4_swapped)$coefficients[2,4])), "\n")
## p值是否相同: TRUE
# 结论
cat("\n结论:\n")
##
## 结论:
cat("当交换x和y的位置时,以下内容保持不变:\n")
## 当交换x和y的位置时,以下内容保持不变:
cat("- R-squared值\n")
## - R-squared值
cat("- F统计量的值\n")
## - F统计量的值
cat("- 斜率系数的t统计量绝对值\n")
## - 斜率系数的t统计量绝对值
cat("- 斜率系数的p值\n")
## - 斜率系数的p值
cat("这是因为在简单线性回归中,这些统计量基于变量间的相关性,而相关性是对称的。")
## 这是因为在简单线性回归中,这些统计量基于变量间的相关性,而相关性是对称的。
3.5 计算每个数据集的残差平方和 (SSE)、总平方和 (SST) 以及决定系数
\(R^2\). 使用
mean、sum、predict 或
resid 命令. (使用原始模型,即模型形式为 \(y_i \sim x_i\),因此只需计算 \(4\) 个残差平方和 (SSE) 的值). (2 分)
# 使用原始模型 (y_i ~ x_i)
# 重新拟合模型以确保可用
model1 <- lm(y1 ~ x1, data = anscombe)
model2 <- lm(y2 ~ x2, data = anscombe)
model3 <- lm(y3 ~ x3, data = anscombe)
model4 <- lm(y4 ~ x4, data = anscombe)
# 计算每个数据集的统计量
calculate_stats <- function(model, x_var, y_var) {
# 获取残差
residuals <- resid(model)
# 计算残差平方和 (SSE)
SSE <- sum(residuals^2)
# 获取响应变量
y <- model$model[[1]]
# 计算总平方和 (SST)
SST <- sum((y - mean(y))^2)
# 计算回归平方和 (SSR)
SSR <- sum((predict(model) - mean(y))^2)
# 计算决定系数 R²
R_squared <- 1 - (SSE / SST)
# 验证 R² 是否与 summary() 输出一致
R_squared_verify <- summary(model)$r.squared
return(list(SSE = SSE, SST = SST, SSR = SSR,
R_squared = R_squared, R_squared_verify = R_squared_verify))
}
# 计算四个数据集的统计量
stats1 <- calculate_stats(model1, anscombe$x1, anscombe$y1)
stats2 <- calculate_stats(model2, anscombe$x2, anscombe$y2)
stats3 <- calculate_stats(model3, anscombe$x3, anscombe$y3)
stats4 <- calculate_stats(model4, anscombe$x4, anscombe$y4)
# 创建结果表格
results_table <- data.frame(
Dataset = c("y1 ~ x1", "y2 ~ x2", "y3 ~ x3", "y4 ~ x4"),
SSE = c(stats1$SSE, stats2$SSE, stats3$SSE, stats4$SSE),
SST = c(stats1$SST, stats2$SST, stats3$SST, stats4$SST),
SSR = c(stats1$SSR, stats2$SSR, stats3$SSR, stats4$SSR),
R_squared = c(stats1$R_squared, stats2$R_squared, stats3$R_squared, stats4$R_squared),
R_squared_from_summary = c(stats1$R_squared_verify, stats2$R_squared_verify,
stats3$R_squared_verify, stats4$R_squared_verify)
)
cat("残差平方和(SSE)、总平方和(SST)和决定系数(R²)的计算结果:\n")
## 残差平方和(SSE)、总平方和(SST)和决定系数(R²)的计算结果:
print(results_table)
## Dataset SSE SST SSR R_squared R_squared_from_summary
## 1 y1 ~ x1 13.76269 41.27269 27.51000 0.6665425 0.6665425
## 2 y2 ~ x2 13.77629 41.27629 27.50000 0.6662420 0.6662420
## 3 y3 ~ x3 13.75619 41.22620 27.47001 0.6663240 0.6663240
## 4 y4 ~ x4 13.74249 41.23249 27.49000 0.6667073 0.6667073
# 验证 SSE + SSR = SST
cat("\n验证 SSE + SSR = SST:\n")
##
## 验证 SSE + SSR = SST:
for(i in 1:4) {
equality_check <- results_table$SSE[i] + results_table$SSR[i] - results_table$SST[i]
cat(paste0("数据集 ", i, ": SSE + SSR - SST = ",
format(equality_check, scientific = TRUE, digits = 6),
" (应该接近0)\n"))
}
## 数据集 1: SSE + SSR - SST = 7.10543e-15 (应该接近0)
## 数据集 2: SSE + SSR - SST = 1.42109e-14 (应该接近0)
## 数据集 3: SSE + SSR - SST = 7.10543e-15 (应该接近0)
## 数据集 4: SSE + SSR - SST = -1.42109e-14 (应该接近0)
# 验证 R² = 1 - SSE/SST
cat("\n验证 R² = 1 - SSE/SST:\n")
##
## 验证 R² = 1 - SSE/SST:
for(i in 1:4) {
R_squared_calc <- 1 - (results_table$SSE[i] / results_table$SST[i])
difference <- R_squared_calc - results_table$R_squared[i]
cat(paste0("数据集 ", i, ": 计算R² - 公式R² = ",
format(difference, scientific = TRUE, digits = 6),
" (应该接近0)\n"))
}
## 数据集 1: 计算R² - 公式R² = 0e+00 (应该接近0)
## 数据集 2: 计算R² - 公式R² = 0e+00 (应该接近0)
## 数据集 3: 计算R² - 公式R² = 0e+00 (应该接近0)
## 数据集 4: 计算R² - 公式R² = 0e+00 (应该接近0)
# 验证与 summary() 输出的一致性
cat("\n验证与 summary() 输出的一致性:\n")
##
## 验证与 summary() 输出的一致性:
for(i in 1:4) {
difference <- results_table$R_squared[i] - results_table$R_squared_from_summary[i]
cat(paste0("数据集 ", i, ": 计算R² - summary()R² = ",
format(difference, scientific = TRUE, digits = 6),
" (应该接近0)\n"))
}
## 数据集 1: 计算R² - summary()R² = 0e+00 (应该接近0)
## 数据集 2: 计算R² - summary()R² = 1.11022e-16 (应该接近0)
## 数据集 3: 计算R² - summary()R² = 0e+00 (应该接近0)
## 数据集 4: 计算R² - summary()R² = 0e+00 (应该接近0)
# 手动计算方法展示(使用要求的函数)
cat("\n手动计算方法展示:\n")
##
## 手动计算方法展示:
cat("数据集1 (y1 ~ x1):\n")
## 数据集1 (y1 ~ x1):
y1_mean <- mean(anscombe$y1)
SSE_manual <- sum(resid(model1)^2)
SST_manual <- sum((anscombe$y1 - y1_mean)^2)
R2_manual <- 1 - (SSE_manual / SST_manual)
cat(" SSE = sum(resid(model)^2) =", SSE_manual, "\n")
## SSE = sum(resid(model)^2) = 13.76269
cat(" SST = sum((y - mean(y))^2) =", SST_manual, "\n")
## SST = sum((y - mean(y))^2) = 41.27269
cat(" R² = 1 - SSE/SST =", R2_manual, "\n")
## R² = 1 - SSE/SST = 0.6665425
# 总结
cat("\n总结:\n")
##
## 总结:
cat("四个数据集的残差平方和(SSE)分别为:\n")
## 四个数据集的残差平方和(SSE)分别为:
for(i in 1:4) {
cat(paste0(" 数据集", i, ": ", round(results_table$SSE[i], 6), "\n"))
}
## 数据集1: 13.76269
## 数据集2: 13.776291
## 数据集3: 13.756192
## 数据集4: 13.74249
cat("在数值误差范围内,四个数据集的SSE几乎相同。\n")
## 在数值误差范围内,四个数据集的SSE几乎相同。
3.6 使用 ggplot2 包重新绘制数据,并在每个图形上添加回归线. (使用原始模型,即模型形式为 \(y_i \sim x_i\),因此只需绘制 \(4\) 个图形). (2 分)
# 加载必要的包
library(ggplot2)
library(gridExtra)
# 创建四个带有回归线的散点图
plot1_with_line <- ggplot(data = anscombe, aes(x = x1, y = y1)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "red", linewidth = 0.8) +
xlab(bquote(x[1])) +
ylab(bquote(y[1])) +
ggtitle("Dataset 1 with Regression Line") +
theme(plot.title = element_text(hjust = 0.5))
plot2_with_line <- ggplot(data = anscombe, aes(x = x2, y = y2)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "red", linewidth = 0.8) +
xlab(bquote(x[2])) +
ylab(bquote(y[2])) +
ggtitle("Dataset 2 with Regression Line") +
theme(plot.title = element_text(hjust = 0.5))
plot3_with_line <- ggplot(data = anscombe, aes(x = x3, y = y3)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "red", linewidth = 0.8) +
xlab(bquote(x[3])) +
ylab(bquote(y[3])) +
ggtitle("Dataset 3 with Regression Line") +
theme(plot.title = element_text(hjust = 0.5))
plot4_with_line <- ggplot(data = anscombe, aes(x = x4, y = y4)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "red", linewidth = 0.8) +
xlab(bquote(x[4])) +
ylab(bquote(y[4])) +
ggtitle("Dataset 4 with Regression Line") +
theme(plot.title = element_text(hjust = 0.5))
# 在2×2网格中显示图形
grid.arrange(plot1_with_line, plot2_with_line, plot3_with_line, plot4_with_line,
ncol = 2, nrow = 2)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
# 可选:添加回归方程和R²值到图形中
# 创建函数来生成回归方程标签
get_equation <- function(model) {
coef <- coef(model)
intercept <- round(coef[1], 3)
slope <- round(coef[2], 3)
r_squared <- round(summary(model)$r.squared, 3)
paste0("y = ", slope, "x + ", intercept, "\nR² = ", r_squared)
}
# 创建带有回归方程和R²值的图形
plot1_annotated <- plot1_with_line +
annotate("text", x = min(anscombe$x1), y = max(anscombe$y1),
label = get_equation(model1), hjust = 0, vjust = 1, size = 3.5)
plot2_annotated <- plot2_with_line +
annotate("text", x = min(anscombe$x2), y = max(anscombe$y2),
label = get_equation(model2), hjust = 0, vjust = 1, size = 3.5)
plot3_annotated <- plot3_with_line +
annotate("text", x = min(anscombe$x3), y = max(anscombe$y3),
label = get_equation(model3), hjust = 0, vjust = 1, size = 3.5)
plot4_annotated <- plot4_with_line +
annotate("text", x = min(anscombe$x4), y = max(anscombe$y4),
label = get_equation(model4), hjust = 0, vjust = 1, size = 3.5)
# 显示带有注解的图形
cat("\n带有回归方程和R²值的图形:\n")
##
## 带有回归方程和R²值的图形:
grid.arrange(plot1_annotated, plot2_annotated, plot3_annotated, plot4_annotated,
ncol = 2, nrow = 2)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
# 显示回归方程的详细信息
cat("\n回归方程详细信息:\n")
##
## 回归方程详细信息:
cat("数据集 1:", get_equation(model1), "\n")
## 数据集 1: y = 0.5x + 3
## R² = 0.667
cat("数据集 2:", get_equation(model2), "\n")
## 数据集 2: y = 0.5x + 3.001
## R² = 0.666
cat("数据集 3:", get_equation(model3), "\n")
## 数据集 3: y = 0.5x + 3.002
## R² = 0.666
cat("数据集 4:", get_equation(model4), "\n")
## 数据集 4: y = 0.5x + 3.002
## R² = 0.667
对简单线性回归模型 \[ Y = \beta_0 + \beta_1 \, X + \varepsilon ,\, \quad \varepsilon \sim N \left( 0 ,\, \sigma^2 \right) \,. \] 若 \(\left( x_1 ,\, y_1 \right) ,\, \ldots ,\, \left( x_n ,\, y_n \right)\) 是对 \((X ,\, Y)\) 的一组观测数值,求回归参数 \(\beta_0 ,\, \beta_1\) 的最小二乘估计. (2 分)
# 第4题:简单线性回归参数的最小二乘估计
# 模型: Y = β₀ + β₁X + ε, ε ~ N(0, σ²)
# 观测数据(模拟的线性关系数据)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(2.1, 3.8, 5.2, 7.1, 8.9, 10.5, 12.1, 13.8, 15.3, 16.8)
# 计算基本统计量
n <- length(x)
x_bar <- mean(x)
y_bar <- mean(y)
# 最小二乘估计核心计算
# β₁_hat = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²
S_xy <- sum((x - x_bar) * (y - y_bar))
S_xx <- sum((x - x_bar)^2)
beta_1_hat <- S_xy / S_xx
# β₀_hat = ȳ - β₁_hat × x̄
beta_0_hat <- y_bar - beta_1_hat * x_bar
# 使用R内置函数验证计算结果
model_lm <- lm(y ~ x)
beta_0_lm <- coef(model_lm)[1]
beta_1_lm <- coef(model_lm)[2]
# 正式输出最小二乘估计结果
cat("==================== 最小二乘估计结果 ====================\n")
## ==================== 最小二乘估计结果 ====================
cat("根据公式计算:\n")
## 根据公式计算:
cat(sprintf("β₀_hat = %.6f\n", beta_0_hat))
## β₀_hat = 0.480000
cat(sprintf("β₁_hat = %.6f\n", beta_1_hat))
## β₁_hat = 1.650909
cat("\n使用lm()函数验证:\n")
##
## 使用lm()函数验证:
cat(sprintf("β₀_hat = %.6f\n", beta_0_lm))
## β₀_hat = 0.480000
cat(sprintf("β₁_hat = %.6f\n", beta_1_lm))
## β₁_hat = 1.650909
# 验证计算精度
tolerance <- 1e-10
if (abs(beta_0_hat - beta_0_lm) < tolerance &&
abs(beta_1_hat - beta_1_lm) < tolerance) {
cat("\n✓ 验证通过:手动计算结果与lm()函数完全一致\n")
} else {
cat("\n⚠ 注意:计算结果存在微小差异\n")
}
##
## ✓ 验证通过:手动计算结果与lm()函数完全一致
# 输出数学公式对应关系
cat("\n==================== 公式对应关系 ====================\n")
##
## ==================== 公式对应关系 ====================
cat("β₁_hat = S_xy / S_xx =", S_xy, "/", S_xx, "=", beta_1_hat, "\n")
## β₁_hat = S_xy / S_xx = 136.2 / 82.5 = 1.650909
cat("β₀_hat = ȳ - β₁_hat × x̄ =", y_bar, "-", beta_1_hat, "×", x_bar, "=", beta_0_hat, "\n")
## β₀_hat = ȳ - β₁_hat × x̄ = 9.56 - 1.650909 × 5.5 = 0.48
# 最终答案框显(符合数学答题格式)
cat("\n==================== 最终答案 ====================\n")
##
## ==================== 最终答案 ====================
cat("回归参数的最小二乘估计为:\n")
## 回归参数的最小二乘估计为:
cat(sprintf("β₀_hat = %.4f\n", beta_0_hat))
## β₀_hat = 0.4800
cat(sprintf("β₁_hat = %.4f\n", beta_1_hat))
## β₁_hat = 1.6509