11.4 Y-axis scaling
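Depending on how you scale a numeric y-axis, you can make an effect look huge or trivial. Consider the following example, which compares the 9-month salaries of male and female assistant professors. The data come from the `Salaries` (academic salaries) dataset in the carData package.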
# load data
data(Salaries, package="carData")
# get means, standard deviations, and 95% confidence intervals for assistant professor salary by sex
library(dplyr)
df <- Salaries %>%
  filter(rank == "AsstProf") %>%
  group_by(sex) %>%
  summarize(n = n(),
            mean = mean(salary),
            sd = sd(salary),
            se = sd / sqrt(n),
            ci = qt(0.975, df = n - 1) * se)
df
## # A tibble: 2 x 6
## sex n mean sd se ci
## <fct> <int> <dbl> <dbl> <dbl> <dbl>
## 1 Female 11 78050. 9372. 2826. 6296.
## 2 Male 56 81311. 7901. 1056. 2116.
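As a quick check on the `ci` column, the sketch below recomputes each 95% interval half-width as t(0.975, n - 1) times sd / sqrt(n). It uses only base R and the rounded `n` and `sd` values copied from the table above. The names `sd_sal`, `se` and `ci` are illustrative. Because `sd` is rounded, the results land close to the printed values but may not match them exactly.

```r
# n and sd copied (rounded) from the summary table above
n      <- c(Female = 11, Male = 56)
sd_sal <- c(Female = 9372, Male = 7901)

se <- sd_sal / sqrt(n)                # standard error of each group mean
ci <- qt(0.975, df = n - 1) * se      # half-width of the 95% t interval
round(ci)
```

The smaller female group (n = 11) has both a larger standard error and a larger t quantile, which is why its interval is roughly three times wider than the male one.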
# create and save the plot
library(ggplot2)
p <- ggplot(df,
            aes(x = sex,
                y = mean)) +
  geom_point(size = 4) +
  # group = 1 puts both discrete x levels in one group so the line connects them
  geom_line(aes(group = 1)) +
  scale_y_continuous(limits = c(77000, 82000),
                     labels = scales::dollar) +
  labs(title = "Mean salary differences by gender",
       subtitle = "9-mo academic salary in 2007-2008",
       caption = paste("source: Fox, J. and Weisberg, S. (2011)",
                       "An R Companion to Applied Regression,",
                       "Second Edition, Sage"),
       x = "Gender",
       y = "Salary")
p

First, let’s plot this with a y-axis going from 77,000 to 82,000.
# plot in a narrow range of y (repeat labels, since the new scale replaces the old one)
p + scale_y_continuous(limits = c(77000, 82000), labels = scales::dollar)
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.

The gender difference looks large. Next, let's plot the same data with the y-axis running from 0 to 125,000.
# plot in a wide range of y (repeat labels, since the new scale replaces the old one)
p + scale_y_continuous(limits = c(0, 125000), labels = scales::dollar)
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.

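Now there appears to be no gender difference at all! The goal of data visualization is to present findings with minimal distortion, which means choosing an appropriate range for the y-axis.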
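To put a number on the distortion, the sketch below computes the share of the visible y-axis taken up by the gap between the two means in each plot. It uses base R only. The means are copied (rounded) from the summary table, and the variable names are illustrative.

```r
# Group means copied (rounded) from the summary table above
female <- 78050
male   <- 81311
gap    <- male - female              # salary gap in dollars

# Fraction of the visible y-axis span covered by the gap
narrow <- gap / (82000 - 77000)      # y-axis from 77,000 to 82,000
wide   <- gap / (125000 - 0)         # y-axis from 0 to 125,000
round(c(narrow = narrow, wide = wide), 3)
```

The same $3,261 gap fills about 65% of the narrow axis but under 3% of the wide one. This is why the two plots tell such different stories.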
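Bar charts should almost always start at y = 0. For other charts, the right limits depend on subject-matter knowledge of the expected range of values. We can also improve the graph by adding indicators of uncertainty, here the 95% confidence intervals computed above.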
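Before adding error bars, it helps to know where their endpoints fall. When a y scale's `limits` exclude part of an error bar, ggplot2 by default sets the out-of-range values to NA and drops that bar with a "Removed ... rows" warning. The base-R sketch below uses means and half-widths copied (rounded) from the summary table. It checks which intervals fit inside a candidate range. The helper `inside()` is hypothetical and only for illustration.

```r
# Means and 95% CI half-widths copied (rounded) from the summary table above
mean_sal <- c(Female = 78050, Male = 81311)
ci       <- c(Female = 6296,  Male = 2116)

lower <- mean_sal - ci
upper <- mean_sal + ci
rbind(lower, upper)

# Hypothetical helper: does each interval lie entirely within [lo, hi]?
inside <- function(lo, hi) (lower >= lo) & (upper <= hi)
inside(77000, 82000)   # FALSE for both groups
inside(70000, 85000)   # TRUE for both groups
```

Neither interval fits within the 77,000 to 82,000 range of the first plot, while 70,000 to 85,000 contains both. That is consistent with the wider limits used in the confidence-interval plot.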
# quick version: means (red) with 95% confidence intervals (steel blue)
df %>%
  ggplot(aes(x = sex, y = mean)) +
  geom_point(size = 3, col = "red") +
  geom_errorbar(aes(ymin = mean - ci, ymax = mean + ci),
                width = 0.2, col = "steelblue") +
  geom_line(aes(group = 1))
# re-create the plot with a wider y-range so the full error bars fit
p <- ggplot(df,
            aes(x = sex, y = mean)) +
  geom_point(size = 4) +
  geom_line(aes(group = 1)) +
  scale_y_continuous(limits = c(70000, 85000),
                     labels = scales::dollar) +
  labs(title = "Mean salary differences by gender",
       subtitle = "9-mo academic salary in 2007-2008",
       caption = paste("source: Fox, J. and Weisberg, S. (2011)",
                       "An R Companion to Applied Regression,",
                       "Second Edition, Sage"),
       x = "Gender",
       y = "Salary")
# plot with confidence limits
p + geom_errorbar(aes(ymin = mean - ci,
                      ymax = mean + ci),
                  width = .1) +
  ggplot2::annotate("text",
                    label = "I-bars are 95% \nconfidence intervals",
                    x = 2,
                    y = 73500,
                    size = 4) +
  theme(text = element_text(family = "Times New Roman"))


Here is a fun trick: drawing a line segment across a discrete variable.
df %>%
  ggplot(aes(x = sex, y = mean)) +
  geom_point(size = 3) +
  # the discrete levels sit at x = 1 (Female) and x = 2 (Male)
  geom_segment(aes(x = 1, y = df$mean[1], xend = 2, yend = df$mean[2]),
               size = 1)
df %>%
  ggplot(aes(x = sex, y = mean)) +
  geom_point(size = 3) +
  geom_line(aes(group = 1)) +
  theme(text = element_text(family = "Times New Roman", face = "italic"))
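The hard-coded `x = 1` and `xend = 2` in the segment example work because ggplot2 draws the levels of a discrete axis at integer positions 1, 2, and so on, in level order. The small base-R illustration below assumes that `sex` has the levels Female and Male in that order, as the printed table suggests. The vector here is constructed for illustration rather than taken from `Salaries`.

```r
# factor() sorts levels alphabetically by default
sex <- factor(c("Female", "Male"))
levels(sex)       # the level order
as.integer(sex)   # the integer codes, matching the x positions ggplot2 uses
```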


