Fifty male and fifty female students fill out the same questionnaire at weekly intervals, starting five weeks before an important examination, to measure state anxiety. The research interests are:
1. whether there are gender differences in state anxiety
2. individual differences in state anxiety

Explore the answers to both questions with plots involving confidence intervals or error bars for the means.
Source: Von Eye, A., & Schuster C. (1998). Regression Analysis for Social Sciences. San Diego: Academic Press.
Column 1: Anxiety score 5 weeks before exam for female
Column 2: Anxiety score 4 weeks before exam for female
Column 3: Anxiety score 3 weeks before exam for female
Column 4: Anxiety score 2 weeks before exam for female
Column 5: Anxiety score 1 week before exam for female
Column 6: Anxiety score 5 weeks before exam for male
Column 7: Anxiety score 4 weeks before exam for male
Column 8: Anxiety score 3 weeks before exam for male
Column 9: Anxiety score 2 weeks before exam for male
Column 10: Anxiety score 1 week before exam for male
library(ggplot2)
# read the data
dta <- read.table("D:\\stateAnxiety.txt", header = TRUE)
head(dta)
  f1 f2 f3 f4 f5 m1 m2 m3 m4 m5
1 13 17 18 20 24 6 14 22 20 24
2 26 31 33 38 42 4 11 14 12 23
3 13 17 24 29 32 17 25 26 29 38
4 22 24 26 27 29 19 22 26 30 34
5 18 19 19 22 30 12 21 21 23 24
6 32 31 30 31 32 11 16 20 19 22
# 50 obs. of 10 variables
str(dta)
'data.frame': 50 obs. of 10 variables:
$ f1: int 13 26 13 22 18 32 16 18 14 20 ...
$ f2: int 17 31 17 24 19 31 16 22 17 19 ...
$ f3: int 18 33 24 26 19 30 21 25 23 23 ...
$ f4: int 20 38 29 27 22 31 27 29 21 25 ...
$ f5: int 24 42 32 29 30 32 30 35 25 28 ...
$ m1: int 6 4 17 19 12 11 14 9 12 11 ...
$ m2: int 14 11 25 22 21 16 23 18 16 13 ...
$ m3: int 22 14 26 26 21 20 26 20 23 17 ...
$ m4: int 20 12 29 30 23 19 29 20 26 14 ...
$ m5: int 24 23 38 34 24 22 33 24 32 20 ...
long <- stack(dta)
head(long)
  values ind
1 13 f1
2 26 f1
3 13 f1
4 22 f1
5 18 f1
6 32 f1
matplot(x=c(1:5),
y=t(dta[, -c(6:10)]),
main="Anxiety score for female",
type='b',
pch=1,
cex=.5,
col='#4428bc',
bty='n',
xlab="Time (Weeks)",
ylab="Anxiety score")
matplot(x=c(1:5),
y=t(dta[, -c(1:5)]),
main="Anxiety score for male",
type='b',
pch=1,
cex=.5,
col='#4480bc',
bty='n',
xlab="Time (Weeks)",
ylab="Anxiety score")
# base R
boxplot(values ~ ind, data = long)
# ggplot
p2<-ggplot(long,
aes(x = ind, y = values))+
labs(x='week',
y='Anxiety score') +
geom_boxplot()
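p2

The plots are quite intuitive.
They show that, for both female and male students, anxiety rises as the exam date approaches. Putting the female and male boxplots side by side suggests that the female students are more anxious than the male students.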
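The exercise asks for confidence intervals or error bars around the means, which the boxplots do not show. A minimal sketch of one way to do that follows. It uses only the six rows printed by `head(dta)` above, so the intervals are illustrative rather than results for the full sample; `dta6`, `long6`, `mean_ci` and `ci` are hypothetical names, and the t-based 95% interval is an assumed choice.

```r
library(ggplot2)

# First six rows of the anxiety data, copied from the head(dta) output;
# the full file has 50 rows, so these numbers are illustrative only.
dta6 <- data.frame(
  f1 = c(13, 26, 13, 22, 18, 32), f2 = c(17, 31, 17, 24, 19, 31),
  f3 = c(18, 33, 24, 26, 19, 30), f4 = c(20, 38, 29, 27, 22, 31),
  f5 = c(24, 42, 32, 29, 30, 32),
  m1 = c(6, 4, 17, 19, 12, 11),   m2 = c(14, 11, 25, 22, 21, 16),
  m3 = c(22, 14, 26, 26, 21, 20), m4 = c(20, 12, 29, 30, 23, 19),
  m5 = c(24, 23, 38, 34, 24, 22)
)

# Wide to long, then split the column name into gender and occasion
long6 <- stack(dta6)
long6$ind <- as.character(long6$ind)
long6$gender <- ifelse(substr(long6$ind, 1, 1) == "f", "female", "male")
long6$occasion <- as.integer(substr(long6$ind, 2, 2))  # 1 = five weeks before exam

# Mean with a t-based confidence interval (assumed choice of interval)
mean_ci <- function(x, level = 0.95) {
  n <- length(x)
  half <- qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(mean = mean(x), lower = mean(x) - half, upper = mean(x) + half)
}

# One row per gender x occasion; do.call() flattens the matrix column
ci <- aggregate(values ~ gender + occasion, data = long6, FUN = mean_ci)
ci <- do.call(data.frame, ci)

p_ci <- ggplot(ci, aes(occasion, values.mean, colour = gender)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = values.lower, ymax = values.upper), width = 0.1) +
  labs(x = "Measurement occasion (1 = five weeks before exam)",
       y = "Mean anxiety score")
p_ci
```

Replacing `dta6` with the full `dta` would give the intervals for all 50 students per group.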
Use the markdown file to replicate the contents of Weissgerber, T.L., Milic, N.M., Winham, S.J., Garovic, V.D. (2015). Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLOS Biology, 13. The two data sets are here provided: journal.pbio.1002128.s002.XLS and journal.pbio.1002128.s003.XLS. You can also download everything in a zip file from this location.
library(readxl)
# read in data from PLOS Biology article supplementary materials
independent_data <- read_excel("D:\\journal.pbio.1002128.s002.XLSX", sheet = 1)
# subset just groups 1-5 from the 'No overlapping points' sheet
independent_data <- independent_data[15:30,2:6]
# assign column names
names(independent_data) <- independent_data[1, ]
# remove first row with column names
independent_data <- independent_data[-1, ]
knitr::kable(independent_data)

| Group 1 | Group 2 | Group 3 | Group 4 | Group 5 |
|---|---|---|---|---|
| 5 | 7 | 9 | 42 | 2 |
| 3 | 3 | 7 | 2 | 0 |
| 6 | 9 | 10 | 5 | 3 |
| 8 | 10 | 12 | 55 | 5 |
| 10 | 33 | 14 | 9 | 7 |
| 13 | 15 | 17 | 12 | 10 |
| 1 | 18 | 20 | 15 | 13 |
| 4 | 6 | 40 | 3 | 1 |
| 18 | 20 | 22 | NA | 15 |
| 4 | 30 | 35 | NA | 1 |
| 7 | NA | 42 | NA | 4 |
| 9 | NA | 13 | NA | 6 |
| 14 | NA | NA | NA | 11 |
| 15 | NA | NA | NA | 12 |
| 17 | NA | NA | NA | 14 |
An important step is reshaping the data from their current wide format to a more tidy long format. Long formats are most useful for plotting and statistical analysis in R. Here’s what the data look like in the long format:
# reshape for plotting
library(tidyr)
independent_data_long <- gather(independent_data, group, value, `Group 1`:`Group 5`, convert = TRUE)
knitr::kable(independent_data_long)

| group | value |
|---|---|
| Group 1 | 5 |
| Group 1 | 3 |
| Group 1 | 6 |
| Group 1 | 8 |
| Group 1 | 10 |
| Group 1 | 13 |
| Group 1 | 1 |
| Group 1 | 4 |
| Group 1 | 18 |
| Group 1 | 4 |
| Group 1 | 7 |
| Group 1 | 9 |
| Group 1 | 14 |
| Group 1 | 15 |
| Group 1 | 17 |
| Group 2 | 7 |
| Group 2 | 3 |
| Group 2 | 9 |
| Group 2 | 10 |
| Group 2 | 33 |
| Group 2 | 15 |
| Group 2 | 18 |
| Group 2 | 6 |
| Group 2 | 20 |
| Group 2 | 30 |
| Group 2 | NA |
| Group 2 | NA |
| Group 2 | NA |
| Group 2 | NA |
| Group 2 | NA |
| Group 3 | 9 |
| Group 3 | 7 |
| Group 3 | 10 |
| Group 3 | 12 |
| Group 3 | 14 |
| Group 3 | 17 |
| Group 3 | 20 |
| Group 3 | 40 |
| Group 3 | 22 |
| Group 3 | 35 |
| Group 3 | 42 |
| Group 3 | 13 |
| Group 3 | NA |
| Group 3 | NA |
| Group 3 | NA |
| Group 4 | 42 |
| Group 4 | 2 |
| Group 4 | 5 |
| Group 4 | 55 |
| Group 4 | 9 |
| Group 4 | 12 |
| Group 4 | 15 |
| Group 4 | 3 |
| Group 4 | NA |
| Group 4 | NA |
| Group 4 | NA |
| Group 4 | NA |
| Group 4 | NA |
| Group 4 | NA |
| Group 4 | NA |
| Group 5 | 2 |
| Group 5 | 0 |
| Group 5 | 3 |
| Group 5 | 5 |
| Group 5 | 7 |
| Group 5 | 10 |
| Group 5 | 13 |
| Group 5 | 1 |
| Group 5 | 15 |
| Group 5 | 1 |
| Group 5 | 4 |
| Group 5 | 6 |
| Group 5 | 11 |
| Group 5 | 12 |
| Group 5 | 14 |
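In current tidyr releases `gather()` is superseded by `pivot_longer()`. The sketch below shows the same wide-to-long reshape on three rows copied from the wide table above; `wide_demo` and `long_demo` are hypothetical names. Unlike `gather()`, `pivot_longer()` returns the rows in row-wise order, and `values_drop_na = TRUE` drops the `NA` padding of the shorter groups.

```r
library(tidyr)

# Three rows copied from the wide table above; NA is padding for shorter groups
wide_demo <- data.frame(
  `Group 1` = c(4, 18, 4),
  `Group 2` = c(6, 20, 30),
  `Group 3` = c(40, 22, 35),
  `Group 4` = c(3, NA, NA),
  `Group 5` = c(1, 15, 1),
  check.names = FALSE  # keep the spaces in the column names
)

# One row per (group, value) pair, without the NA padding
long_demo <- pivot_longer(wide_demo, cols = everything(),
                          names_to = "group", values_to = "value",
                          values_drop_na = TRUE)
long_demo
```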
Now we are ready to plot, starting with subsetting just groups 1 and 2 from the long data frame. Open circles show measurements for each participant or observation.
# plot
library(ggplot2)
library(dplyr)
# subset groups 1 & 2
independent_data_long_groups_1_and_2 <- independent_data_long %>%
filter(group %in% c("Group 1", "Group 2"))
# plot
ggplot(independent_data_long_groups_1_and_2, aes(group, as.numeric(value))) +
geom_point(shape = 1, size = 4) +
stat_summary(fun = median, fun.min = median, fun.max = median, geom = "crossbar", width = 0.2, size = 1) +
xlab("") +
ylab("Measurement (units)") +
theme_minimal(base_size = 16)

Plotting groups 1, 2, and 3, the only thing that changes is the subsetting method:
# subset groups 1, 2 & 3
independent_data_long_groups_1_2_3 <- independent_data_long %>%
filter(group %in% c("Group 1", "Group 2", "Group 3"))
# plot
ggplot(independent_data_long_groups_1_2_3, aes(group, as.numeric(value))) +
geom_point(shape = 1, size = 4) +
stat_summary(fun = median, fun.min = median, fun.max = median, geom = "crossbar", width = 0.2, size = 1) +
xlab("") +
ylab("Measurement (units)") +
theme_minimal(base_size = 16)

Plotting groups 1 to 4:
# groups 1, 2, 3, & 4
independent_data_long_groups_1_2_3_4 <- independent_data_long %>%
filter(group %in% c("Group 1", "Group 2", "Group 3", "Group 4"))
# plot
ggplot(independent_data_long_groups_1_2_3_4, aes(group, as.numeric(value))) +
geom_point(shape = 1, size = 4) +
stat_summary(fun = median, fun.min = median, fun.max = median, geom = "crossbar", width = 0.2, size = 1) +
xlab("") +
ylab("Measurement (units)") +
theme_minimal(base_size = 16)

And finally plotting all five groups, no subsetting required:
# all five groups
ggplot(independent_data_long, aes(group, as.numeric(value))) +
geom_point(shape = 1, size = 4) +
stat_summary(fun = median, fun.min = median, fun.max = median, geom = "crossbar", width = 0.2, size = 1) +
xlab("") +
ylab("Measurement (units)") +
theme_minimal(base_size = 16)
library(readxl)
# read in data from PLOS Biology article supplementary materials
independent_data_j <- read_excel("D:\\journal.pbio.1002128.s002.XLSX", sheet = 2)
# subset data from the 'points jittered' sheet
independent_data_j <- independent_data_j[16:115,2:3]
# group numbers are not given in the spreadsheet, so we'll add them
independent_data_j$Groups <- c(rep(1, 20), rep(2, 20), rep(3, 20), rep(4, 20), rep(5, 20))
# assign column names
names(independent_data_j) <- c("Subject ID", "Measurement", "Group")

The data are already in a nice tidy long format, with Group Name in one column and Measurement Values in another column, so we don’t need to reshape them. We can go directly to plotting them, first two groups, then three, then four, then all five groups. Once again the only thing that varies is how we subset the original data.
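Because only the subset changes between the four figures, the repeated plotting call could be wrapped in a small helper. This is a sketch under stated assumptions: `plot_groups()` is a hypothetical function, and `demo` is made-up data standing in for `independent_data_j` so that the block runs without the Excel file.

```r
library(ggplot2)

# Hypothetical stand-in for independent_data_j: 5 groups of 20 made-up values
set.seed(1)
demo <- data.frame(Group = rep(1:5, each = 20),
                   Measurement = round(rnorm(100, mean = 10, sd = 3)))

# Jittered open circles plus a median crossbar for the requested groups
plot_groups <- function(data, groups) {
  d <- data[data$Group %in% groups, ]
  ggplot(d, aes(factor(Group), as.numeric(Measurement))) +
    geom_jitter(shape = 1, size = 4, width = 0.2, height = 0.2) +
    stat_summary(fun = median, fun.min = median, fun.max = median,
                 geom = "crossbar", width = 0.2) +
    xlab("") +
    ylab("Measurement (units)") +
    theme_minimal(base_size = 16)
}

# Groups 1-2, 1-3, 1-4 and 1-5, in that order
plots <- lapply(2:5, function(k) plot_groups(demo, 1:k))
plots[[1]]
```

With the real data, `plot_groups(independent_data_j, 1:3)` would stand in for the explicit `filter()` plus `ggplot()` pair for groups 1 to 3.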
# plot
library(ggplot2)
library(dplyr)
# groups 1 & 2
independent_data_j_groups_1_and_2 <- independent_data_j %>%
filter(Group %in% 1:2)
# plot
ggplot(independent_data_j_groups_1_and_2, aes(as.factor(Group), as.numeric(Measurement))) +
geom_jitter(shape = 1, size = 4, position=position_jitter(width = 0.2, height = 0.2)) +
stat_summary(fun = median, fun.min = median, fun.max = median, geom = "crossbar", width = 0.2, size = 1) +
xlab("") +
ylab("Measurement (units)") +
theme_minimal(base_size = 16)
# groups 1, 2 & 3
independent_data_j_groups_1_2_3 <- independent_data_j %>%
filter(Group %in% 1:3)
# plot
ggplot(independent_data_j_groups_1_2_3, aes(as.factor(Group), as.numeric(Measurement))) +
geom_jitter(shape = 1, size = 4, position=position_jitter(width = 0.2, height = 0.2)) +
stat_summary(fun = median, fun.min = median, fun.max = median, geom = "crossbar", width = 0.2, size = 1) +
xlab("") +
ylab("Measurement (units)") +
theme_minimal(base_size = 16)
# groups 1, 2, 3, & 4
independent_data_j_groups_1_2_3_4 <- independent_data_j %>%
filter(Group %in% 1:4)
# plot
ggplot(independent_data_j_groups_1_2_3_4, aes(as.factor(Group), as.numeric(Measurement))) +
geom_jitter(shape = 1, size = 4, position=position_jitter(width = 0.2, height = 0.2)) +
stat_summary(fun = median, fun.min = median, fun.max = median, geom = "crossbar", width = 0.2, size = 1) +
xlab("") +
ylab("Measurement (units)") +
theme_minimal(base_size = 16)
# all five groups
ggplot(independent_data_j, aes(as.factor(Group), as.numeric(Measurement))) +
geom_jitter(shape = 1, size = 4, position=position_jitter(width = 0.2, height = 0.2)) +
stat_summary(fun = median, fun.min = median, fun.max = median, geom = "crossbar", width = 0.2, size = 1) +
xlab("") +
ylab("Measurement (units)") +
theme_minimal(base_size = 16)
library(readxl)
# read in data from PLOS Biology article supplementary materials
One_group_two_conditions <- read_excel("D:\\journal.pbio.1002128.s003.XLS", sheet = 1)
# subset the rows and columns that hold the data
One_group_two_conditions <- One_group_two_conditions[12:23,1:3]
# assign column names
names(One_group_two_conditions) <- c("Subject ID", "Condition 1 Name", "Condition 2 Name")
One_group_two_conditions$difference <- as.numeric(One_group_two_conditions$`Condition 2 Name`) - as.numeric(One_group_two_conditions$`Condition 1 Name`)

The data in the Excel sheet are in an untidy wide format, so let’s convert them to a tidy long format:
# reshape for plotting
library(tidyr)
One_group_two_conditions_long <- gather(One_group_two_conditions, group, value, `Condition 1 Name`:`Condition 2 Name`, -`Subject ID`, -difference, convert = TRUE)

Now we can plot:
# plot
library(ggplot2)
library(gridExtra)
g1 <- ggplot(One_group_two_conditions_long, aes(group, as.numeric(value), group = `Subject ID`)) +
geom_point(shape = 1, size = 4) +
geom_line() +
xlab("") +
ylab("Measurement (units)") +
theme_minimal(base_size = 16) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
# differences
g2 <- ggplot(One_group_two_conditions_long, aes(x = 1, y = difference)) +
geom_point(shape = 1, size = 4) +
stat_summary(fun = median, fun.min = median, fun.max = median, geom = "crossbar", width = 0.001, size = 1) +
xlab("") +
ylab("Difference in Measurement (units)") +
theme_minimal(base_size = 16) +
scale_x_continuous(breaks = NULL) +
coord_fixed(ratio = 0.0005)
# combine the two plots
grid.arrange(g1, g2, ncol = 2)
library(readxl)
# read in data from PLOS Biology article supplementary materials
Two_groups_two_conditions <- read_excel("D:\\journal.pbio.1002128.s003.XLS", sheet = 2)
# subset the rows and columns that hold the data
Two_groups_two_conditions <- Two_groups_two_conditions[12:41,2:5]
# assign names to the four data columns first, then add the group label as a fifth column
names(Two_groups_two_conditions) <- c("Condition 1 Name A", "Condition 2 Name A", "Condition 1 Name B", "Condition 2 Name B")
Two_groups_two_conditions$group <- c(rep("Group 1 Name", 15), rep("Group 2 Name", 15))

The data in the Excel sheet are in an unusual structure, so a few steps for reshaping into a tidy form are needed. Here’s how we can tidy them and how they look after being tidied:
# convert to simple long form
Two_groups_two_conditions[,1] <- unlist(c(Two_groups_two_conditions[1:15,1], Two_groups_two_conditions[16:30,3]))
Two_groups_two_conditions[,2] <- unlist(c(Two_groups_two_conditions[1:15,2], Two_groups_two_conditions[16:30,4]))
# drop unneeded columns
Two_groups_two_conditions <- Two_groups_two_conditions[,c(1:2, 5)]
# assign column names
names(Two_groups_two_conditions) <- c("Condition 1", "Condition 2", "Group")
Two_groups_two_conditions$`Subject ID` <- 1:30
# compute differences
Two_groups_two_conditions$difference <- as.numeric(Two_groups_two_conditions$`Condition 2`) - as.numeric(Two_groups_two_conditions$`Condition 1`)
# convert to long again
library(tidyr)
Two_groups_two_conditions_long <- gather(Two_groups_two_conditions, condition, value, c(`Condition 1`, `Condition 2`), convert = TRUE)
knitr::kable(Two_groups_two_conditions_long)

| Group | Subject ID | difference | condition | value |
|---|---|---|---|---|
| Group 1 Name | 1 | 8 | Condition 1 | 5 |
| Group 1 Name | 2 | 4 | Condition 1 | 1 |
| Group 1 Name | 3 | 5 | Condition 1 | 7 |
| Group 1 Name | 4 | 2 | Condition 1 | 9 |
| Group 1 Name | 5 | 7 | Condition 1 | 2 |
| Group 1 Name | 6 | -1 | Condition 1 | 6 |
| Group 1 Name | 7 | 1 | Condition 1 | 4 |
| Group 1 Name | 8 | 3 | Condition 1 | 11 |
| Group 1 Name | 9 | -2 | Condition 1 | 14 |
| Group 1 Name | 10 | 6 | Condition 1 | 13 |
| Group 1 Name | 11 | NA | Condition 1 | NA |
| Group 1 Name | 12 | NA | Condition 1 | NA |
| Group 1 Name | 13 | NA | Condition 1 | NA |
| Group 1 Name | 14 | NA | Condition 1 | NA |
| Group 1 Name | 15 | NA | Condition 1 | NA |
| Group 2 Name | 16 | -2 | Condition 1 | 20 |
| Group 2 Name | 17 | -4 | Condition 1 | 13 |
| Group 2 Name | 18 | 1 | Condition 1 | 15 |
| Group 2 Name | 19 | 5 | Condition 1 | 8 |
| Group 2 Name | 20 | 2 | Condition 1 | 3 |
| Group 2 Name | 21 | 1 | Condition 1 | 7 |
| Group 2 Name | 22 | -7 | Condition 1 | 14 |
| Group 2 Name | 23 | 0 | Condition 1 | 12 |
| Group 2 Name | 24 | 3 | Condition 1 | 11 |
| Group 2 Name | 25 | 1 | Condition 1 | 9 |
| Group 2 Name | 26 | NA | Condition 1 | NA |
| Group 2 Name | 27 | NA | Condition 1 | NA |
| Group 2 Name | 28 | NA | Condition 1 | NA |
| Group 2 Name | 29 | NA | Condition 1 | NA |
| Group 2 Name | 30 | NA | Condition 1 | NA |
| Group 1 Name | 1 | 8 | Condition 2 | 13 |
| Group 1 Name | 2 | 4 | Condition 2 | 5 |
| Group 1 Name | 3 | 5 | Condition 2 | 12 |
| Group 1 Name | 4 | 2 | Condition 2 | 11 |
| Group 1 Name | 5 | 7 | Condition 2 | 9 |
| Group 1 Name | 6 | -1 | Condition 2 | 5 |
| Group 1 Name | 7 | 1 | Condition 2 | 5 |
| Group 1 Name | 8 | 3 | Condition 2 | 14 |
| Group 1 Name | 9 | -2 | Condition 2 | 12 |
| Group 1 Name | 10 | 6 | Condition 2 | 19 |
| Group 1 Name | 11 | NA | Condition 2 | NA |
| Group 1 Name | 12 | NA | Condition 2 | NA |
| Group 1 Name | 13 | NA | Condition 2 | NA |
| Group 1 Name | 14 | NA | Condition 2 | NA |
| Group 1 Name | 15 | NA | Condition 2 | NA |
| Group 2 Name | 16 | -2 | Condition 2 | 18 |
| Group 2 Name | 17 | -4 | Condition 2 | 9 |
| Group 2 Name | 18 | 1 | Condition 2 | 16 |
| Group 2 Name | 19 | 5 | Condition 2 | 13 |
| Group 2 Name | 20 | 2 | Condition 2 | 5 |
| Group 2 Name | 21 | 1 | Condition 2 | 8 |
| Group 2 Name | 22 | -7 | Condition 2 | 7 |
| Group 2 Name | 23 | 0 | Condition 2 | 12 |
| Group 2 Name | 24 | 3 | Condition 2 | 14 |
| Group 2 Name | 25 | 1 | Condition 2 | 10 |
| Group 2 Name | 26 | NA | Condition 2 | NA |
| Group 2 Name | 27 | NA | Condition 2 | NA |
| Group 2 Name | 28 | NA | Condition 2 | NA |
| Group 2 Name | 29 | NA | Condition 2 | NA |
| Group 2 Name | 30 | NA | Condition 2 | NA |
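As a quick check on the `difference` column, the sketch below recomputes the paired differences and their per-group medians from the non-missing rows of the table above. The data frame `paired` and its column names `cond1` and `cond2` are hypothetical; the values are copied from the table.

```r
# Non-missing rows copied from the table above
# (subjects 11-15 and 26-30 are NA padding and are left out)
paired <- data.frame(
  Group = rep(c("Group 1 Name", "Group 2 Name"), each = 10),
  cond1 = c(5, 1, 7, 9, 2, 6, 4, 11, 14, 13,
            20, 13, 15, 8, 3, 7, 14, 12, 11, 9),
  cond2 = c(13, 5, 12, 11, 9, 5, 5, 14, 12, 19,
            18, 9, 16, 13, 5, 8, 7, 12, 14, 10)
)

# Within-subject change: Condition 2 minus Condition 1
paired$difference <- paired$cond2 - paired$cond1

# Median change per group
med_diff <- tapply(paired$difference, paired$Group, median)
med_diff
```

The median change works out to 3.5 for Group 1 and 1 for Group 2.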
Now we can plot:
# plot
library(ggplot2)
g1 <- ggplot(Two_groups_two_conditions_long, aes(condition, as.numeric(value), group = `Subject ID`)) +
geom_point(size = 4, shape = 1) +
geom_line() +
xlab("") +
ylab("Measurement (units)") +
theme_minimal(base_size = 16) +
facet_grid(~Group) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
# difference
g2 <- ggplot(Two_groups_two_conditions_long, aes(Group, difference)) +
geom_point(size = 4, shape = 1) +
stat_summary(fun = median, fun.min = median, fun.max = median, geom = "crossbar", width = 0.3) +
xlab("") +
ylab("Difference in Measurement (units)") +
theme_minimal(base_size = 16) +
coord_fixed(ratio = 0.15)
# combine the two plots
grid.arrange(g1, g2, ncol = 2)

The dataset consists of a sample of 14 primary school children between 8 and 12 years old. The children were asked to respond on 8 emotion and coping-strategy scales for each of 6 situations: failing to fulfill assignments in class, not being allowed to play with other children, being forbidden to do something by the teacher, being a victim of bullying, having too much school work, and being forbidden to do something by the mother. Plot the data in some meaningful ways. You may have to manipulate the data into a different format first.
# read the data
dta <- read.table("D:\\coping.txt", header = TRUE)
str(dta)
'data.frame': 84 obs. of 10 variables:
$ annoy : int 4 4 2 4 4 4 3 3 3 4 ...
$ sad : int 2 4 2 3 2 3 2 1 1 4 ...
$ afraid : int 2 4 2 4 1 1 2 1 1 2 ...
$ angry : int 2 2 2 4 1 4 2 2 2 1 ...
$ approach : num 1 4 2.67 4 1 2.33 2 1.33 1 1.67 ...
$ avoid : num 2 3 3 1.5 2.75 2.5 1 4 1 4 ...
$ support : num 1 1.25 1 3.25 1.25 1 1.5 2.75 1.33 3.5 ...
$ agressive: num 2.5 1.5 2.33 1 1.5 3.67 1 2 1.67 2.5 ...
$ situation: chr "Fail" "NoPart" "TeacNo" "Bully" ...
$ sbj : chr "S2" "S2" "S2" "S2" ...
head(dta)
  annoy sad afraid angry approach avoid support agressive situation sbj
1 4 2 2 2 1.00 2.00 1.00 2.50 Fail S2
2 4 4 4 2 4.00 3.00 1.25 1.50 NoPart S2
3 2 2 2 2 2.67 3.00 1.00 2.33 TeacNo S2
4 4 3 4 4 4.00 1.50 3.25 1.00 Bully S2
5 4 2 1 1 1.00 2.75 1.25 1.50 Work S2
6 4 3 1 4 2.33 2.50 1.00 3.67 MomNo S2
names(dta) <- c("Annoy", "Sad", "Afraid", "Angry",
"Approach", "Avoidance", "Socialsupport", "Aggression",
"Situation", "ChildrenID")
# wide to long
# dplyr (not dbplyr) provides the %>% pipe used below
library(dplyr)
dta1<-dta%>%reshape::melt(id = c("ChildrenID", "Situation"),
variable_name = "Emotion")
head(dta1)
  ChildrenID Situation Emotion value
1 S2 Fail Annoy 4
2 S2 NoPart Annoy 4
3 S2 TeacNo Annoy 2
4 S2 Bully Annoy 4
5 S2 Work Annoy 4
6 S2 MomNo Annoy 4
str(dta1)
'data.frame': 672 obs. of 4 variables:
$ ChildrenID: chr "S2" "S2" "S2" "S2" ...
$ Situation : chr "Fail" "NoPart" "TeacNo" "Bully" ...
$ Emotion : Factor w/ 8 levels "Annoy","Sad",..: 1 1 1 1 1 1 1 1 1 1 ...
$ value : num 4 4 2 4 4 4 3 3 3 4 ...
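The exercise asks for a meaningful plot of the long data. One possibility, sketched here, is a panel per scale showing the rating in each situation. The block uses only the six rows of child S2 printed by `head(dta)` above; `coping6`, `long6` and `p_coping` are hypothetical names. With the full data, one line per child or a mean per situation could be drawn instead.

```r
library(ggplot2)

# Six rows for child S2, copied from the head(dta) output above
coping6 <- data.frame(
  Annoy         = c(4, 4, 2, 4, 4, 4),
  Sad           = c(2, 4, 2, 3, 2, 3),
  Afraid        = c(2, 4, 2, 4, 1, 1),
  Angry         = c(2, 2, 2, 4, 1, 4),
  Approach      = c(1, 4, 2.67, 4, 1, 2.33),
  Avoidance     = c(2, 3, 3, 1.5, 2.75, 2.5),
  Socialsupport = c(1, 1.25, 1, 3.25, 1.25, 1),
  Aggression    = c(2.5, 1.5, 2.33, 1, 1.5, 3.67),
  Situation     = c("Fail", "NoPart", "TeacNo", "Bully", "Work", "MomNo"),
  ChildrenID    = "S2"
)

# Wide to long in base R: stack() goes column by column,
# so the id columns are repeated once per scale
ids <- coping6[rep(seq_len(nrow(coping6)), times = 8), c("ChildrenID", "Situation")]
long6 <- cbind(ids, stack(coping6[, 1:8]))
names(long6)[3:4] <- c("value", "Emotion")

# One panel per scale, one line per child
p_coping <- ggplot(long6, aes(Situation, value, group = ChildrenID)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ Emotion, nrow = 2) +
  labs(x = "Situation", y = "Rating") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
p_coping
```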
Use the USPersonalExpenditure{datasets} for this problem. This data set consists of United States personal expenditures (in billions of dollars) in five categories: food and tobacco, household operation, medical and health, personal care, and private education, for the years 1940, 1945, 1950, 1955 and 1960. Plot the US personal expenditure data in the style of the third plot on the “Time Use” case study in the course web page. You might want to transform the dollar amounts to log base 10 units first.
dta <- USPersonalExpenditure
head(dta)
                      1940   1945  1950 1955  1960
Food and Tobacco 22.200 44.500 59.60 73.2 86.80
Household Operation 10.500 15.500 29.00 36.5 46.20
Medical and Health 3.530 5.760 9.71 14.0 21.10
Personal Care 1.040 1.980 2.45 3.4 5.40
Private Education 0.341 0.974 1.80 2.6 3.64
str(dta)
 num [1:5, 1:5] 22.2 10.5 3.53 1.04 0.341 44.5 15.5 5.76 1.98 0.974 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:5] "Food and Tobacco" "Household Operation" "Medical and Health" "Personal Care" ...
..$ : chr [1:5] "1940" "1945" "1950" "1955" ...
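The object is a matrix, so it has to be turned into a long data frame before plotting with ggplot2. The sketch below does that and applies the suggested log base 10 transform. The layout of the referenced “Time Use” plot is not shown here, so the dot plot with one panel per year is an assumed layout, and `exp_long` and `p_exp` are hypothetical names.

```r
library(ggplot2)

# Matrix -> long data frame: one row per category x year
exp_long <- as.data.frame(as.table(USPersonalExpenditure),
                          stringsAsFactors = FALSE)
names(exp_long) <- c("Category", "Year", "Billions")

# Log base 10 of the dollar amounts, as the exercise suggests
exp_long$log10_billions <- log10(exp_long$Billions)

# Dot plot: categories ordered by mean log expenditure, one panel per year
p_exp <- ggplot(exp_long,
                aes(log10_billions, reorder(Category, log10_billions))) +
  geom_point() +
  facet_wrap(~ Year, nrow = 1) +
  labs(x = "Expenditure (log10 billions of dollars)", y = NULL)
p_exp
```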
Use the Cushings{MASS} data set to generate a plot similar to the following one
# read the data
dta <- MASS::Cushings
str(dta)
'data.frame': 27 obs. of 3 variables:
$ Tetrahydrocortisone: num 3.1 3 1.9 3.8 4.1 1.9 8.3 3.8 3.9 7.8 ...
$ Pregnanetriol : num 11.7 1.3 0.1 0.04 1.1 0.4 1 0.2 0.6 1.2 ...
$ Type : Factor w/ 4 levels "a","b","c","u": 1 1 1 1 1 1 2 2 2 2 ...
head(dta)
   Tetrahydrocortisone Pregnanetriol Type
a1 3.1 11.70 a
a2 3.0 1.30 a
a3 1.9 0.10 a
a4 3.8 0.04 a
a5 4.1 1.10 a
a6 1.9 0.40 a
ggplot(dta, aes(Salary, SAT, color = Region)) +
geom_point(pch = 16, color = "peru") +
geom_label_repel(aes(label = State)) +
labs(x = "Mean Teacher Salary (x1000)", y = "Mean SAT score") +
scale_color_manual(values = c('dodgerblue', 'indianred', 'forestgreen', 'goldenrod')) +
theme_economist() +
theme(legend.position = 'top')
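The `Cushings` data frame has the columns `Tetrahydrocortisone`, `Pregnanetriol` and `Type`, with the patient labels stored as row names. A labelled scatter plot for it might therefore map those columns as in the sketch below. The target figure is not available here, so the log axes, the colours and the plain `geom_text()` labels are assumptions, and `cush`, `Label` and `p_cush` are hypothetical names.

```r
library(ggplot2)

cush <- MASS::Cushings
cush$Label <- rownames(cush)  # patient labels such as "a1" live in the row names

# Labelled scatter plot, one colour per type of syndrome
p_cush <- ggplot(cush, aes(Tetrahydrocortisone, Pregnanetriol, colour = Type)) +
  geom_point() +
  geom_text(aes(label = Label), vjust = -0.8, show.legend = FALSE) +
  scale_x_log10() +   # assumed: both excretion rates are strongly right-skewed
  scale_y_log10() +
  scale_colour_manual(values = c("dodgerblue", "indianred",
                                 "forestgreen", "goldenrod")) +
  labs(x = "Tetrahydrocortisone", y = "Pregnanetriol") +
  theme(legend.position = "top")
p_cush
```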