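- Research objective: identify the key factors that influence China's fiscal revenue.
- Data source: *China Statistical Yearbook* (1978–2023)
- Variables:
  - y: fiscal revenue (100 million yuan)
  - aav / iav / cav: value added of agriculture / industry / construction (100 million yuan)
  - pnum: population (10,000 persons)
  - tsrc: total retail sales of consumer goods (100 million yuan)
  - da: disaster-affected area (1,000 hectares)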
2025-06-05
| Variable | y | iav | cav | pnum | tsrc |
|---|---|---|---|---|---|
| y | 1.00 | 0.99 | 0.98 | 0.85 | 0.96 |
| iav | 0.99 | 1.00 | 0.96 | 0.83 | 0.94 |
Strong positive correlation: fiscal revenue moves closely with industrial value added (r = 0.99), construction value added (r = 0.98), and retail sales (r = 0.96); the correlation with population is weaker (r = 0.85).

    lm1 <- lm(y ~ cav, data = data_frame)
    summary(lm1)

    ##
    ## Call:
    ## lm(formula = y ~ cav, data = data_frame)
    ##
    ## Residuals:
    ##    Min     1Q Median     3Q    Max
    ## -18142  -3864  -1660   3112  20310
    ##
    ## Coefficients:
    ##              Estimate Std. Error t value Pr(>|t|)
    ## (Intercept) 2.288e+03  1.625e+03   1.408    0.166
    ## cav         2.715e+00  4.855e-02  55.924   <2e-16 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 8705 on 44 degrees of freedom
    ## Multiple R-squared:  0.9861,  Adjusted R-squared:  0.9858
    ## F-statistic:  3127 on 1 and 44 DF,  p-value: < 2.2e-16


    lm_m <- lm(y ~ . - year, data = data_frame)
    summary(lm_m)

    ##
    ## Call:
    ## lm(formula = y ~ . - year, data = data_frame)
    ##
    ## Residuals:
    ##      Min       1Q   Median       3Q      Max
    ## -11308.7  -3094.1   -429.1   2717.1   9092.3
    ##
    ## Coefficients:
    ##               Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)  4.995e+04  2.010e+04   2.486  0.01733 *
    ## aav          4.237e-01  5.328e-01   0.795  0.43132
    ## iav          3.406e-01  9.616e-02   3.542  0.00105 **
    ## cav         -3.674e+00  8.094e-01  -4.539 5.30e-05 ***
    ## pnum        -4.541e-01  2.166e-01  -2.096  0.04263 *
    ## tsrc         8.121e-01  1.165e-01   6.972 2.34e-08 ***
    ## da          -1.240e-01  1.477e-01  -0.839  0.40642
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 5075 on 39 degrees of freedom
    ## Multiple R-squared:  0.9958,  Adjusted R-squared:  0.9952
    ## F-statistic:  1548 on 6 and 39 DF,  p-value: < 2.2e-16


    lm_step <- step(lm_m, direction = "both")

    ## Start:  AIC=791.37
    ## y ~ (year + aav + iav + cav + pnum + tsrc + da) - year
    ##
    ##        Df  Sum of Sq        RSS    AIC
    ## - aav   1   16288915 1020944193 790.11
    ## - da    1   18146177 1022801455 790.19
    ## <none>               1004655278 791.37
    ## - pnum  1  113163056 1117818334 794.28
    ## - iav   1  323149250 1327804528 802.19
    ## - cav   1  530716300 1535371578 808.88
    ## - tsrc  1 1252073996 2256729274 826.59
    ##
    ## Step:  AIC=790.11
    ## y ~ iav + cav + pnum + tsrc + da
    ##
    ##        Df  Sum of Sq        RSS    AIC
    ## - da    1   22273335 1043217528 789.10
    ## <none>               1020944193 790.11
    ## + aav   1   16288915 1004655278 791.37
    ## - pnum  1  127007228 1147951421 793.50
    ## - cav   1  558299481 1579243674 808.17
    ## - iav   1  788344583 1809288776 814.43
    ## - tsrc  1 1277151144 2298095337 825.43
    ##
    ## Step:  AIC=789.1
    ## y ~ iav + cav + pnum + tsrc
    ##
    ##        Df  Sum of Sq        RSS    AIC
    ## <none>               1043217528 789.10
    ## + da    1   22273335 1020944193 790.11
    ## + aav   1   20416073 1022801455 790.19
    ## - pnum  1  277772181 1320989709 797.96
    ## - cav   1  635122400 1678339929 808.97
    ## - iav   1  970339994 2013557522 817.35
    ## - tsrc  1 1440169027 2483386555 827.00


    summary(lm_step)

    ##
    ## Call:
    ## lm(formula = y ~ iav + cav + pnum + tsrc, data = data_frame)
    ##
    ## Residuals:
    ##      Min       1Q   Median       3Q      Max
    ## -11686.0  -2962.8   -300.7   1879.0  10584.9
    ##
    ## Coefficients:
    ##               Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)  4.014e+04  1.341e+04   2.994  0.00466 **
    ## iav          4.130e-01  6.688e-02   6.175 2.44e-07 ***
    ## cav         -3.538e+00  7.081e-01  -4.996 1.14e-05 ***
    ## pnum        -4.041e-01  1.223e-01  -3.304  0.00198 **
    ## tsrc         8.100e-01  1.077e-01   7.523 3.04e-09 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 5044 on 41 degrees of freedom
    ## Multiple R-squared:  0.9957,  Adjusted R-squared:  0.9952
    ## F-statistic:  2351 on 4 and 41 DF,  p-value: < 2.2e-16

The final model selected by stepwise AIC retains four predictors: iav, cav, pnum, and tsrc.
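
The AIC values in the `step()` trace can be checked by hand. For `lm` fits, `step()` uses `extractAIC()`, which computes \(n \ln(\mathrm{RSS}/n) + 2p\), where \(p\) counts the estimated coefficients including the intercept. Taking \(n = 46\) (inferred from 39 residual degrees of freedom plus 7 coefficients in the full model) and the RSS values printed in the trace, the full and final models work out as:

\[ \mathrm{AIC}_{\text{full}} = 46 \ln\frac{1004655278}{46} + 2 \times 7 \approx 777.37 + 14 = 791.37 \]

\[ \mathrm{AIC}_{\text{final}} = 46 \ln\frac{1043217528}{46} + 2 \times 5 \approx 779.10 + 10 = 789.10 \]

Dropping aav and da raises the RSS slightly, but the saving of two parameters (4 AIC units) outweighs that increase, which is why the four-predictor model is preferred.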

    library(car)  # vif() is provided by the car package
    vif(lm_step)

    ##        iav        cav       pnum       tsrc
    ## 122.773342 633.605303   5.274357 454.865127

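The VIFs for iav, cav, and tsrc far exceed the common rule-of-thumb threshold of 10, indicating severe multicollinearity.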
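
To make the VIF figures concrete, the sketch below computes them by hand from the definition \(\mathrm{VIF}_j = 1/(1 - R_j^2)\), where \(R_j^2\) comes from regressing predictor \(j\) on the other predictors. The data frame `sim` and the helper name `vif_by_hand` are hypothetical, and the data are simulated for illustration only; they are not the yearbook series.

```r
# Simulated, strongly trending predictors (illustrative only, not the yearbook data)
set.seed(42)
n <- 46
trend <- seq_len(n)
sim <- data.frame(
  iav  = 100 * trend + rnorm(n, sd = 60),
  cav  = 20 * trend + rnorm(n, sd = 15),
  pnum = 96000 + 900 * trend + rnorm(n, sd = 4000),
  tsrc = 80 * trend + rnorm(n, sd = 50)
)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 taken from the auxiliary regression
# of predictor j on all remaining predictors
vif_by_hand <- sapply(names(sim), function(v) {
  aux <- lm(reformulate(setdiff(names(sim), v), response = v), data = sim)
  1 / (1 - summary(aux)$r.squared)
})

print(round(vif_by_hand, 2))
```

Read in reverse, the same formula gives \(R_j^2 = 1 - 1/\mathrm{VIF}_j\): the reported VIF of 633.6 for cav means that about 99.8% of its variance is explained by iav, pnum, and tsrc.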

    # PCA on the standardized predictors; the first and last columns are dropped
    # (assumed here to be year and the response y)
    com1 <- prcomp(data_frame[, -c(1, ncol(data_frame))], scale. = TRUE)
    pc_scores <- com1$x
    y <- data_frame$y
    # Principal component regression on the first two components
    model_pcr <- lm(y ~ PC1 + PC2, data = as.data.frame(pc_scores))
    summary(model_pcr)

    ##
    ## Call:
    ## lm(formula = y ~ PC1 + PC2, data = as.data.frame(pc_scores))
    ##
    ## Residuals:
    ##      Min       1Q   Median       3Q      Max
    ## -17600.3  -6617.2    171.1   6366.4  16244.2
    ##
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)  58027.4     1230.1  47.173  < 2e-16 ***
    ## PC1          31155.6      534.1  58.328  < 2e-16 ***
    ## PC2           4874.1     1806.1   2.699  0.00991 **
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 8343 on 43 degrees of freedom
    ## Multiple R-squared:  0.9875,  Adjusted R-squared:  0.987
    ## F-statistic:  1705 on 2 and 43 DF,  p-value: < 2.2e-16

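
The PC1 and PC2 coefficients are hard to read economically because each component mixes all predictors. One possible follow-up, sketched here on simulated data, maps the component coefficients back to the original variables using the PCA loadings, scales, and centers. The names `X`, `pca`, `fit`, `beta`, and `intercept` are hypothetical and the numbers are not from the yearbook data.

```r
# Simulated collinear predictors and response (illustrative only)
set.seed(1)
n <- 46
trend <- seq_len(n)
X <- data.frame(
  iav  = 100 * trend + rnorm(n, sd = 50),
  cav  = 20 * trend + rnorm(n, sd = 10),
  tsrc = 80 * trend + rnorm(n, sd = 40)
)
y <- 0.5 * X$iav + 0.3 * X$tsrc + rnorm(n, sd = 100)

# Principal component regression on the first k components
pca <- prcomp(X, scale. = TRUE)
fit <- lm(y ~ PC1 + PC2, data = as.data.frame(pca$x))
k <- 2

# Component coefficients -> coefficients on the standardized predictors
gamma <- coef(fit)[-1]
beta_std <- drop(pca$rotation[, 1:k] %*% gamma)

# Undo the scaling and centering to return to original units
beta <- beta_std / pca$scale
intercept <- unname(coef(fit)[1]) - sum(beta * pca$center)

# The back-transformed equation reproduces the PCR fitted values
pred_manual <- intercept + drop(as.matrix(X) %*% beta)
print(round(c(intercept = intercept, beta), 4))
```

The back-transformed coefficients describe exactly the same fit as the component regression, so they inherit its stability while being expressed per unit of each original variable.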

    abse <- abs(rstandard(lm_step))
    cor.test(data_frame$iav, abse, method = "spearman")

    ##
    ##  Spearman's rank correlation rho
    ##
    ## data:  data_frame$iav and abse
    ## S = 6558, p-value = 1.876e-05
    ## alternative hypothesis: true rho is not equal to 0
    ## sample estimates:
    ##       rho
    ## 0.5955597

The Spearman test (rho = 0.596, p < 0.001) indicates heteroscedasticity, so the response is log-transformed:
\[ \log(y) = -0.2452 + 2.241 \times 10^{-5}\,\mathrm{iav} - 1.588 \times 10^{-4}\,\mathrm{cav} + 7.230 \times 10^{-5}\,\mathrm{pnum} + 1.539 \times 10^{-5}\,\mathrm{tsrc} \]
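
The code for the log-transformed fit is not shown. A minimal sketch of that step, assuming the same `lm()` workflow with `log(y)` as the response followed by the same Spearman check on the absolute standardized residuals, could look like this. The data are simulated with multiplicative noise and only two predictors for brevity; `sim`, `fit_raw`, `fit_log`, and `spearman_check` are hypothetical names.

```r
# Simulated data with multiplicative noise, so the error spread grows with y
# (illustrative only, not the yearbook data)
set.seed(7)
n <- 46
trend <- seq_len(n)
sim <- data.frame(
  iav  = 100 * trend + rnorm(n, sd = 60),
  tsrc = 80 * trend + rnorm(n, sd = 50)
)
sim$y <- exp(6 + 4e-4 * sim$iav + 2e-4 * sim$tsrc + rnorm(n, sd = 0.1))

# Fit on the raw scale and on the log scale
fit_raw <- lm(y ~ iav + tsrc, data = sim)
fit_log <- lm(log(y) ~ iav + tsrc, data = sim)

# Spearman rank correlation between a predictor and |standardized residuals|
spearman_check <- function(fit) {
  cor.test(sim$iav, abs(rstandard(fit)), method = "spearman")
}
p_raw <- spearman_check(fit_raw)$p.value
p_log <- spearman_check(fit_log)$p.value

print(c(raw = p_raw, log = p_log))
print(coef(fit_log))
```

A small p-value for the raw fit together with a large one for the log fit would suggest that the transformation has stabilized the residual variance; the check should be rerun on the real data before relying on the log model.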
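| Variable | Coefficient sign | Economic interpretation |
|---|---|---|
| iav | Positive | Industrial growth drives tax revenue growth |
| cav | Negative | High costs and large deductions keep the sector's contribution low |
| pnum | Positive | A larger population means a broader tax base |
| tsrc | Positive | Active consumption raises VAT and other tax revenue |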
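
Because the response is on the log scale, each coefficient acts as a semi-elasticity: holding the other predictors fixed, raising \(x_j\) by \(\Delta\) multiplies the predicted \(y\) by \(e^{\beta_j \Delta}\). As a worked reading, assuming the logarithm is natural (as with R's `log()`) and taking illustrative increments of 10,000 for iav and 1,000 for cav (both in units of 100 million yuan):

\[ e^{2.241 \times 10^{-5} \times 10^{4}} - 1 = e^{0.2241} - 1 \approx 25.1\% \]

\[ e^{-1.588 \times 10^{-4} \times 10^{3}} - 1 = e^{-0.1588} - 1 \approx -14.7\% \]

The increments are chosen only to make the magnitudes readable. Given the collinearity among the predictors, these ceteris paribus effects are best treated as rough indications, not precise estimates.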