Data Visualization Final Report

Author

221527127刘政宏

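1 Report requirements

  • The final lab report consists of 5 chapters and 5 figures, with one figure per chapter.

  • The type of figure in each chapter is the student's choice. Complete the figure title before plotting, for example: Figure 1: Multi-variable bar chart.

  • Case data is collected independently. Different chapters may share one dataset, but students may not use the same dataset as one another.

  • Each chapter's dataset must be displayed with the datatable function, with a brief explanation of the data source and the meaning of the variables.

  • Each figure must be followed by a brief interpretation that makes at least one point about the figure.

  • Render the HTML file with the code shown, and submit the published URL to the "8、期末报告" (8. Final report) column of the shared document before June 22.

  • Grading criteria:

    • Each chapter's figure is worth 20 points

    • Figure renders correctly and is reasonably interpreted: 75%

    • Uniqueness of the data: 10%

    • Degree of figure customization: 15%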

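2 Categorical data visualization

2.1 Case data: explanation and display

  • The dataset compiles key Chinese fiscal data from 2010 to 2024, mainly covering general public budget revenue, expenditure, deficit, debt, and some financial indicators.

  • Data source: official reports from the National Bureau of Statistics, the Ministry of Finance, and the National Audit Office

  • Time span: 2010-2024

  • Units: amounts are in trillion yuan (万亿元), as labelled in the figures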

library(readxl)


data1 <- read_excel("zuoye.xlsx")

DT::datatable(data1,rownames = FALSE)
# Create the data frame (or read it from a CSV file)
fiscal_data <- data.frame(
  year = 2010:2024,
  revenue = c(8.30, 10.37, 11.73, 12.90, 14.04, 15.22, 15.96, 17.26, 18.34, 19.04, 18.29, 20.25, 20.37, 21.68, 21.97),
  expenditure = c(8.90, 10.89, 12.60, 14.00, 15.18, 17.58, 18.78, 20.33, 22.09, 23.89, 24.56, 24.63, 26.19, 27.46, 28.46)
)

2.2 Figure 1: Rose chart

library(ggplot2)
library(dplyr)
library(tidyr)

# Reshape the data to long format (the layout ggplot2 expects)
fiscal_long <- fiscal_data %>%
  pivot_longer(
    cols = c(revenue, expenditure),
    names_to = "category",
    values_to = "amount"
  )

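# Draw the rose chart
ggplot(fiscal_long, aes(x = factor(year), y = amount, fill = category)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar(start = 0) +  # Switch to polar coordinates
  scale_fill_manual(
    values = c("revenue" = "#4E79A7", "expenditure" = "#E15759"),
    # Set breaks explicitly so each label is paired with the right category;
    # without them the labels follow alphabetical order (expenditure, revenue)
    breaks = c("revenue", "expenditure"),
    labels = c("财政收入", "财政支出")
  ) +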
  labs(
    title = "中国财政收入与支出(2010-2024)",
    subtitle = "单位:万亿元",
    fill = "类别"
  ) +
  theme_minimal() +
  theme(
    axis.title = element_blank(),
    axis.text.y = element_blank(),
    panel.grid = element_blank(),
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12)
  )

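  • Interpretation: chart features
    • Each year occupies one sector of the chart.

    • The inner ring is fiscal revenue (blue) and the outer ring is fiscal expenditure (red).

    • Radial length: the radial thickness of each ring segment encodes the amount (trillion yuan). The bars are stacked, so the outer edge of a sector sits at revenue plus expenditure.

    • Angular width: every year spans the same angle (360° / 15 years = 24°).

    • Color: blue (revenue) and red (expenditure) separate the two series.

    • 2.2.0.1 Trends

      • Revenue growth: the blue ring thickens over time; 2024 revenue (21.97) is about 2.6 times the 2010 level (8.30).

      • Expenditure expands faster: the red ring thickens more, reaching about 3.2 times its 2010 level by 2024 (8.90 to 28.46).

      • Deficit: the deficit in a given year is how much thicker the red segment is than the blue one. The gap jumps in 2020 (6.27) and is widest in 2024 (6.49).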
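
The growth multiples, the year with the widest revenue-expenditure gap, and the sector angle can be checked directly from the data. The following is a minimal base R sketch that assumes the same revenue and expenditure values as fiscal_data; the names fiscal, growth, peak_year, and sector_angle are introduced here for illustration only.

```r
# Revenue and expenditure copied from fiscal_data (trillion yuan).
fiscal <- data.frame(
  year = 2010:2024,
  revenue = c(8.30, 10.37, 11.73, 12.90, 14.04, 15.22, 15.96, 17.26,
              18.34, 19.04, 18.29, 20.25, 20.37, 21.68, 21.97),
  expenditure = c(8.90, 10.89, 12.60, 14.00, 15.18, 17.58, 18.78, 20.33,
                  22.09, 23.89, 24.56, 24.63, 26.19, 27.46, 28.46)
)

# Deficit derived here as expenditure minus revenue.
fiscal$deficit <- fiscal$expenditure - fiscal$revenue

# Ratio of the last year's value to the first year's value for each series.
n <- nrow(fiscal)
growth <- c(
  revenue = fiscal$revenue[n] / fiscal$revenue[1],
  expenditure = fiscal$expenditure[n] / fiscal$expenditure[1]
)
print(round(growth, 2))

# Year in which the gap between expenditure and revenue is largest.
peak_year <- fiscal$year[which.max(fiscal$deficit)]
print(peak_year)

# Angle covered by each year's sector in the polar plot.
sector_angle <- 360 / n
print(sector_angle)
```

Printing fiscal[, c("year", "deficit")] gives the full gap series if a year-by-year comparison is needed.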

3 Data distribution visualization

3.1 Case data: explanation and display

library(readxl)


data1 <- read_excel("zuoye.xlsx")

DT::datatable(data1,rownames = FALSE)
# Create the data frame (or read it from a CSV file)
fiscal_data <- data.frame(
  year = 2010:2024,
  revenue = c(8.30, 10.37, 11.73, 12.90, 14.04, 15.22, 15.96, 17.26, 18.34, 19.04, 18.29, 20.25, 20.37, 21.68, 21.97),
  expenditure = c(8.90, 10.89, 12.60, 14.00, 15.18, 17.58, 18.78, 20.33, 22.09, 23.89, 24.56, 24.63, 26.19, 27.46, 28.46)
)

3.2 Figure 2: Box plot

# Load the required packages
library(ggplot2)
library(tidyr)
library(dplyr)

# Create the 2010-2024 Chinese fiscal data frame
fiscal_data <- data.frame(
  Year = 2010:2024,
  财政收入 = c(8.30, 10.37, 11.73, 12.90, 14.04, 15.22, 15.96, 17.26, 18.34, 19.04, 18.29, 20.25, 20.37, 21.68, 21.97),
  财政支出 = c(8.90, 10.89, 12.60, 14.00, 15.18, 17.58, 18.78, 20.33, 22.09, 23.89, 24.56, 24.63, 26.19, 27.46, 28.46),
  财政赤字 = c(0.60, 0.52, 0.87, 1.10, 1.14, 2.36, 2.82, 3.07, 3.75, 4.85, 6.27, 4.38, 5.83, 5.78, 6.49),
  国债余额 = c(6.75, 7.20, 7.76, 8.66, 9.56, 10.66, 12.01, 13.48, 14.96, 16.80, 20.89, 23.20, 25.19, 29.50, 34.57),
  地方政府债务余额 = c(10.72, 12.38, 15.89, 17.89, 15.40, 16.00, 15.32, 16.47, 18.39, 21.31, 25.66, 30.47, 35.07, 40.74, 47.54)
)

# Reshape the data to long format (convenient for ggplot)
fiscal_long <- fiscal_data %>%
  pivot_longer(
    cols = -Year,
    names_to = "指标",
    values_to = "数值"
  )

# Create the box plot
ggplot(fiscal_long, aes(x = 指标, y = 数值, fill = 指标)) +
  geom_boxplot(
    alpha = 0.7,           # Fill transparency
    outlier.shape = 21,    # Outlier point shape
    outlier.size = 3,      # Outlier point size
    outlier.color = "red", # Outlier point color
    notch = TRUE           # Notch marks an approximate confidence interval for the median
  ) +
  # Overlay the individual data points (semi-transparent)
  geom_jitter(
    width = 0.15, 
    alpha = 0.5,
    color = "blue"
  ) +
  # Add labels
  labs(
    title = "中国财政主要指标分布(2010-2024年)",
    subtitle = "箱线图展示中位数、四分位距及异常值",
    x = "财政指标",
    y = "数值(万亿元)",
    caption = "数据来源:财政部年度报告"
  ) +
  # Set the fill palette
  scale_fill_brewer(palette = "Set2") +
  # Theme adjustments
  theme_minimal(base_size = 12) +
  theme(
    legend.position = "none",  # Remove the legend
    axis.text.x = element_text(angle = 45, hjust = 1),  # Tilt the x-axis labels
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5, color = "gray40"),
    panel.grid.major.x = element_blank()
  ) +
  # Add median labels
  stat_summary(
    fun = median, 
    geom = "text", 
    aes(label = sprintf("%.1f", after_stat(y))),  # after_stat() replaces the deprecated ..y.. notation
    position = position_nudge(x = -0.4),
    size = 3.5,
    color = "darkred"
  )

# Optional: faceted display (avoids the problem of differing scales)
ggplot(fiscal_long, aes(x = 指标, y = 数值, fill = 指标)) +
  geom_boxplot() +
  facet_wrap(~指标, scales = "free_y", nrow = 2) +
  labs(title = "中国财政指标分布(分面显示)") +
  theme(axis.text.x = element_blank())

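  • Interpretation: box structure
    • Lower edge of the box: first quartile (Q1)

    • Line inside the box: median

    • Upper edge of the box: third quartile (Q3)

    • Box height: interquartile range (IQR = Q3 - Q1)

  • Whiskers
    • Each whisker extends to the most extreme data point within 1.5 × IQR of the box

    • Points beyond that range are outliers (the red points in the figure)

  • Key observations
    • Fiscal revenue and expenditure (财政收入, 财政支出): relatively concentrated, with medians of 17.26 and 20.33; neither series has outliers.

    • Fiscal deficit (财政赤字): the smallest values in absolute terms (0.52 to 6.49); the six largest values all come from 2019 onward.

    • National debt balance (国债余额): skewed toward high values, consistent with accelerating growth. Its 2024 value (34.57) is the maximum but stays inside the upper fence (about 41.4), so it is not an outlier.

    • Local government debt balance (地方政府债务余额): the widest range (10.72 to 47.54). Its 2024 value (47.54) lies just above the upper fence (about 46.7) and is the only outlier under the 1.5 × IQR rule.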
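
The 1.5 × IQR rule described above can be applied by hand to see which years, if any, fall outside the whiskers. This is a minimal sketch that assumes quartiles from quantile() with its default settings, which should match what geom_boxplot draws for unweighted data; the English names (indicators, find_outliers, outlier_years) are introduced here for illustration and are not part of the report's code.

```r
years <- 2010:2024

# Indicator values copied from the fiscal data frame (trillion yuan).
indicators <- list(
  revenue = c(8.30, 10.37, 11.73, 12.90, 14.04, 15.22, 15.96, 17.26,
              18.34, 19.04, 18.29, 20.25, 20.37, 21.68, 21.97),
  expenditure = c(8.90, 10.89, 12.60, 14.00, 15.18, 17.58, 18.78, 20.33,
                  22.09, 23.89, 24.56, 24.63, 26.19, 27.46, 28.46),
  deficit = c(0.60, 0.52, 0.87, 1.10, 1.14, 2.36, 2.82, 3.07,
              3.75, 4.85, 6.27, 4.38, 5.83, 5.78, 6.49),
  national_debt = c(6.75, 7.20, 7.76, 8.66, 9.56, 10.66, 12.01, 13.48,
                    14.96, 16.80, 20.89, 23.20, 25.19, 29.50, 34.57),
  local_debt = c(10.72, 12.38, 15.89, 17.89, 15.40, 16.00, 15.32, 16.47,
                 18.39, 21.31, 25.66, 30.47, 35.07, 40.74, 47.54)
)

# Return the years whose value lies outside Q1 - 1.5 * IQR or Q3 + 1.5 * IQR.
find_outliers <- function(x, years) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  is_outlier <- x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
  years[is_outlier]
}

outlier_years <- lapply(indicators, find_outliers, years = years)
print(outlier_years)
```

With only 15 observations per indicator, a single extreme year is enough to cross the fence, so the result is worth reading alongside the jittered points rather than on its own.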

4 Variable relationship visualization

4.1 Case data: explanation and display

library(readxl)


data1 <- read_excel("zuoye.xlsx")

DT::datatable(data1,rownames = FALSE)
# Create the data frame (or read it from a CSV file)
fiscal_data <- data.frame(
  year = 2010:2024,
  revenue = c(8.30, 10.37, 11.73, 12.90, 14.04, 15.22, 15.96, 17.26, 18.34, 19.04, 18.29, 20.25, 20.37, 21.68, 21.97),
  expenditure = c(8.90, 10.89, 12.60, 14.00, 15.18, 17.58, 18.78, 20.33, 22.09, 23.89, 24.56, 24.63, 26.19, 27.46, 28.46)
)

4.2 Figure 3: Scatterplot matrix

# Install only the packages that are missing, rather than reinstalling on every render
pkgs <- c("GGally", "ggplot2", "scales")
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://mirrors.tuna.tsinghua.edu.cn/CRAN/")
}

library(GGally)
# Create the fiscal data frame (the GDP figures are hypothetical)
fiscal_data <- data.frame(
  Year = 2010:2024,
  财政收入 = c(8.30, 10.37, 11.73, 12.90, 14.04, 15.22, 15.96, 17.26, 18.34, 19.04, 18.29, 20.25, 20.37, 21.68, 21.97),
  财政支出 = c(8.90, 10.89, 12.60, 14.00, 15.18, 17.58, 18.78, 20.33, 22.09, 23.89, 24.56, 24.63, 26.19, 27.46, 28.46),
  财政赤字 = c(0.60, 0.52, 0.87, 1.10, 1.14, 2.36, 2.82, 3.07, 3.75, 4.85, 6.27, 4.38, 5.83, 5.78, 6.49),
  国债余额 = c(6.75, 7.20, 7.76, 8.66, 9.56, 10.66, 12.01, 13.48, 14.96, 16.80, 20.89, 23.20, 25.19, 29.50, 34.57),
  地方债务 = c(10.72, 12.38, 15.89, 17.89, 15.40, 16.00, 15.32, 16.47, 18.39, 21.31, 25.66, 30.47, 35.07, 40.74, 47.54),
  GDP = c(40.2, 48.8, 54.0, 59.3, 64.4, 68.9, 74.6, 83.2, 91.9, 99.1, 101.4, 114.9, 121.0, 126.1, 130.0) # Hypothetical GDP data (trillion yuan)
)

# Build the scatterplot matrix
ggpairs(
  fiscal_data[, 2:7], # Columns to analyze
  columns = c("财政收入", "财政支出", "财政赤字", "国债余额", "地方债务", "GDP"),
  title = "中国财政指标关系矩阵(2010-2024)",
  upper = list(continuous = wrap("cor", size = 4)), # Upper triangle: correlation coefficients
  lower = list(continuous = wrap("points", alpha = 0.7, size = 2)), # Lower triangle: scatterplots
  diag = list(continuous = wrap("densityDiag", fill = "lightblue")) # Diagonal: density curves
) +
  theme_bw(base_size = 12) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    panel.grid.minor = element_blank(),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

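  • Interpretation: the scatterplot matrix shows:
    1. Strongly coordinated growth: fiscal expenditure, revenue, and GDP move closely together

    2. Debt-driven pattern: national debt and local government debt rise together in a "double helix"

    3. Structural risk

      • The deficit and both debt balances grow faster than the economy: from 2010 to 2024 the deficit rises to about 10.8 times its starting value, national debt to about 5.1 times, and local debt to about 4.4 times, against about 3.2 times for the hypothetical GDP series

      • Local debt becomes more dispersed, which may reflect risk building up in some regions, although this national aggregate cannot show regional differences directly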
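
The comparison of growth rates and the strength of the pairwise relationships can also be computed numerically. The sketch below uses base R only and assumes the same values as the data frame in this chapter, including the GDP column that the report itself labels as hypothetical; the English column names and the objects multiples and cor_matrix are introduced here for illustration.

```r
# Values copied from the fiscal data frame; GDP is the report's hypothetical series.
fiscal <- data.frame(
  revenue = c(8.30, 10.37, 11.73, 12.90, 14.04, 15.22, 15.96, 17.26,
              18.34, 19.04, 18.29, 20.25, 20.37, 21.68, 21.97),
  expenditure = c(8.90, 10.89, 12.60, 14.00, 15.18, 17.58, 18.78, 20.33,
                  22.09, 23.89, 24.56, 24.63, 26.19, 27.46, 28.46),
  deficit = c(0.60, 0.52, 0.87, 1.10, 1.14, 2.36, 2.82, 3.07,
              3.75, 4.85, 6.27, 4.38, 5.83, 5.78, 6.49),
  national_debt = c(6.75, 7.20, 7.76, 8.66, 9.56, 10.66, 12.01, 13.48,
                    14.96, 16.80, 20.89, 23.20, 25.19, 29.50, 34.57),
  local_debt = c(10.72, 12.38, 15.89, 17.89, 15.40, 16.00, 15.32, 16.47,
                 18.39, 21.31, 25.66, 30.47, 35.07, 40.74, 47.54),
  gdp = c(40.2, 48.8, 54.0, 59.3, 64.4, 68.9, 74.6, 83.2,
          91.9, 99.1, 101.4, 114.9, 121.0, 126.1, 130.0)
)

# Ratio of the 2024 value to the 2010 value for every column.
multiples <- sapply(fiscal, function(x) x[length(x)] / x[1])
print(round(sort(multiples, decreasing = TRUE), 2))

# Pearson correlation matrix, the same statistic ggpairs shows in the upper triangle.
cor_matrix <- cor(fiscal)
print(round(cor_matrix, 2))
```

Because every series trends upward over the period, high correlations here mostly reflect the shared time trend and should not be read as evidence of a causal link.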

5 Sample similarity visualization

5.1 Case data: explanation and display

library(readxl)


data1 <- read_excel("zuoye.xlsx")

DT::datatable(data1,rownames = FALSE)
# Create the data frame (or read it from a CSV file)
fiscal_data <- data.frame(
  year = 2010:2024,
  revenue = c(8.30, 10.37, 11.73, 12.90, 14.04, 15.22, 15.96, 17.26, 18.34, 19.04, 18.29, 20.25, 20.37, 21.68, 21.97),
  expenditure = c(8.90, 10.89, 12.60, 14.00, 15.18, 17.58, 18.78, 20.33, 22.09, 23.89, 24.56, 24.63, 26.19, 27.46, 28.46)
)

5.2 Figure 4: Faceted bubble chart

# Install only the packages that are missing, rather than reinstalling on every render
pkgs <- c("ggplot2", "dplyr", "scales", "tidyr")
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://mirrors.tuna.tsinghua.edu.cn/CRAN/")
}

library(ggplot2)
library(dplyr)
library(scales)
library(tidyr)
# Create the fiscal data frame (the GDP figures are hypothetical)
fiscal_data <- data.frame(
  Year = 2010:2024,
  财政收入 = c(8.30, 10.37, 11.73, 12.90, 14.04, 15.22, 15.96, 17.26, 18.34, 19.04, 18.29, 20.25, 20.37, 21.68, 21.97),
  财政支出 = c(8.90, 10.89, 12.60, 14.00, 15.18, 17.58, 18.78, 20.33, 22.09, 23.89, 24.56, 24.63, 26.19, 27.46, 28.46),
  财政赤字 = c(0.60, 0.52, 0.87, 1.10, 1.14, 2.36, 2.82, 3.07, 3.75, 4.85, 6.27, 4.38, 5.83, 5.78, 6.49),
  国债余额 = c(6.75, 7.20, 7.76, 8.66, 9.56, 10.66, 12.01, 13.48, 14.96, 16.80, 20.89, 23.20, 25.19, 29.50, 34.57),
  地方债务 = c(10.72, 12.38, 15.89, 17.89, 15.40, 16.00, 15.32, 16.47, 18.39, 21.31, 25.66, 30.47, 35.07, 40.74, 47.54),
  GDP = c(40.2, 48.8, 54.0, 59.3, 64.4, 68.9, 74.6, 83.2, 91.9, 99.1, 101.4, 114.9, 121.0, 126.1, 130.0) # Hypothetical GDP data (trillion yuan)
)

# Reshape to long format (needed for faceting)
fiscal_long <- fiscal_data %>%
  pivot_longer(
    cols = c(财政收入, 财政支出, 国债余额, 地方债务),
    names_to = "指标类型",
    values_to = "金额"
  )

# Faceted bubble chart
ggplot(fiscal_long, aes(x = Year, y = 金额)) +
  geom_point(
    aes(size = 财政赤字,  # Bubble size maps to the deficit
        color = GDP),    # Color maps to the GDP level
    alpha = 0.7) +
  scale_size_continuous(
    name = "财政赤字(万亿元)",
    range = c(3, 15),  # Range of bubble sizes
    breaks = c(1, 3, 6)) +
  scale_color_gradientn(
    name = "GDP(万亿元)",
    colors = c("#377EB8", "#4DAF4A", "#E41A1C"),
    values = rescale(c(40, 85, 130))) +
  facet_wrap(~指标类型, 
             ncol = 2,
             scales = "free_y") +  # Independent y-axis for each panel
  labs(
    title = "中国财政指标分面气泡图(2010-2024)",
    subtitle = "气泡大小=财政赤字 | 颜色=GDP规模",
    x = "年份",
    y = "金额(万亿元)"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    strip.background = element_rect(fill = "gray90"),
    strip.text = element_text(face = "bold"),
    legend.position = "right",
    panel.grid.minor = element_blank(),
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold")
  ) +
  scale_x_continuous(breaks = seq(2010, 2024, 2)) +
  geom_text(
    data = filter(fiscal_long, Year %in% c(2010, 2015, 2020, 2024)),
    aes(label = Year), 
    size = 3, 
    vjust = -1.5)

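  • Interpretation: the faceted bubble chart shows:
    1. Debt-driven pattern: local government debt and national debt rise together as "twin engines" (local debt from 10.72 to 47.54, national debt from 6.75 to 34.57)

    2. Pandemic shock: in 2020 the bubbles grow abruptly in every panel as the deficit rises from 4.85 to 6.27, and the revenue panel shows its only year-on-year decline (19.04 to 18.29)

    3. Non-linear relationship: the link between GDP and the fiscal indicators changes over time; for example, after 2019 revenue grows more slowly than the hypothetical GDP series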
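
The 2020 break can be made concrete with year-on-year differences. The following base R sketch assumes the same revenue and deficit values as the data frame in this chapter; the names fiscal, revenue_change, deficit_change, and falling_years are introduced here for illustration.

```r
# Revenue and deficit copied from the fiscal data frame (trillion yuan).
fiscal <- data.frame(
  year = 2010:2024,
  revenue = c(8.30, 10.37, 11.73, 12.90, 14.04, 15.22, 15.96, 17.26,
              18.34, 19.04, 18.29, 20.25, 20.37, 21.68, 21.97),
  deficit = c(0.60, 0.52, 0.87, 1.10, 1.14, 2.36, 2.82, 3.07,
              3.75, 4.85, 6.27, 4.38, 5.83, 5.78, 6.49)
)

# Change from the previous year; the first year has no predecessor, hence NA.
fiscal$revenue_change <- c(NA, diff(fiscal$revenue))
fiscal$deficit_change <- c(NA, diff(fiscal$deficit))

# Years in which revenue fell compared with the year before.
falling_years <- fiscal$year[which(fiscal$revenue_change < 0)]
print(falling_years)

# Full table of changes for reference.
print(fiscal[, c("year", "revenue_change", "deficit_change")])
```

The deficit_change column is useful for comparing the size of the 2020 jump with the other years in which the deficit rose sharply.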

6 Time series visualization

6.1 Case data: explanation and display

library(readxl)


data1 <- read_excel("zuoye.xlsx")

DT::datatable(data1,rownames = FALSE)
# Create the data frame (or read it from a CSV file)
fiscal_data <- data.frame(
  year = 2010:2024,
  revenue = c(8.30, 10.37, 11.73, 12.90, 14.04, 15.22, 15.96, 17.26, 18.34, 19.04, 18.29, 20.25, 20.37, 21.68, 21.97),
  expenditure = c(8.90, 10.89, 12.60, 14.00, 15.18, 17.58, 18.78, 20.33, 22.09, 23.89, 24.56, 24.63, 26.19, 27.46, 28.46)
)

6.2 Figure 5: Stream graph

options(repos = c(CRAN = "https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))
# Install only the packages that are missing, rather than reinstalling on every render
pkgs <- c("ggplot2", "ggstream", "dplyr", "tidyr")
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)

library(ggplot2)
library(ggstream) # Package dedicated to stream graphs
library(dplyr)
library(tidyr)

# Create the fiscal data in long format
fiscal_long <- data.frame(
  Year = rep(2010:2024, 3),
  Category = rep(c("财政收入", "财政支出", "财政赤字"), each = 15),
  Value = c(
    c(8.30, 10.37, 11.73, 12.90, 14.04, 15.22, 15.96, 17.26, 18.34, 19.04, 18.29, 20.25, 20.37, 21.68, 21.97), # Revenue
    c(8.90, 10.89, 12.60, 14.00, 15.18, 17.58, 18.78, 20.33, 22.09, 23.89, 24.56, 24.63, 26.19, 27.46, 28.46), # Expenditure
    c(0.60, 0.52, 0.87, 1.10, 1.14, 2.36, 2.82, 3.07, 3.75, 4.85, 6.27, 4.38, 5.83, 5.78, 6.49)  # Deficit
  )
) %>%
  mutate(Value = ifelse(Category == "财政赤字", -Value, Value)) # Store the deficit as negative values

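# Draw the basic stream graph
ggplot(fiscal_long, aes(x = Year, y = Value, fill = Category)) +
  geom_stream(
    type = "mirror",  # Mirrored (symmetric) layout
    bw = 0.6,         # Smoothing bandwidth
    extra_span = 0.2  # Extension at both ends
  ) +
  scale_fill_manual(
    values = c("财政收入" = "#4E79A7", 
               "财政支出" = "#E15759",
               "财政赤字" = "#59A14F"),
    # Explicit breaks keep each label paired with its category,
    # whatever order the locale sorts the category names in
    breaks = c("财政收入", "财政支出", "财政赤字"),
    labels = c("财政收入", "财政支出", "赤字(支出-收入)")
  ) +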
  labs(
    title = "中国财政收支流线图(2010-2024)",
    subtitle = "单位:万亿元 | 赤字显示为支出超出部分",
    x = "年份",
    y = "金额",
    fill = "类别"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, color = "gray40"),
    legend.position = "bottom",
    panel.grid.major.x = element_line(color = "gray90"),
    panel.grid.minor.x = element_blank()
  ) +
  scale_x_continuous(breaks = seq(2010, 2024, 2)) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray30")

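  • Interpretation: the width of each band represents the absolute size of that indicator
    • Upper band: fiscal expenditure (red)

    • Lower band: fiscal revenue (blue)

    • Green area in the middle: fiscal deficit (the part by which expenditure exceeds revenue)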
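
Band width shows the deficit in absolute terms only. To judge it relative to the size of the budget, the deficit can be expressed as a share of expenditure. This is a minimal base R sketch assuming the same revenue and expenditure values as above, with the deficit derived as expenditure minus revenue; the names fiscal, deficit_share, and peak_share_year are introduced here for illustration.

```r
# Revenue and expenditure copied from the fiscal data (trillion yuan).
fiscal <- data.frame(
  year = 2010:2024,
  revenue = c(8.30, 10.37, 11.73, 12.90, 14.04, 15.22, 15.96, 17.26,
              18.34, 19.04, 18.29, 20.25, 20.37, 21.68, 21.97),
  expenditure = c(8.90, 10.89, 12.60, 14.00, 15.18, 17.58, 18.78, 20.33,
                  22.09, 23.89, 24.56, 24.63, 26.19, 27.46, 28.46)
)

# Deficit derived as expenditure minus revenue, then divided by expenditure.
fiscal$deficit <- fiscal$expenditure - fiscal$revenue
fiscal$deficit_share <- fiscal$deficit / fiscal$expenditure

# Year in which the deficit takes up the largest share of expenditure.
peak_share_year <- fiscal$year[which.max(fiscal$deficit_share)]
print(peak_share_year)

# Shares as percentages, one row per year.
print(data.frame(year = fiscal$year,
                 deficit_share_pct = round(100 * fiscal$deficit_share, 1)))
```

The year with the largest share need not be the year with the largest absolute deficit, so the two views are worth reading side by side.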