0413 HW exercise 1, 2
HW 1
This R script illustrates how to split the plot region to include histograms on the margins of a scatter diagram using the Galton{HistData} data set. Compile it as an HTML document with comments on each code chunk.
# Galton's data on the heights of parents and their children
install.packages("HistData")
Install the HistData package to access the Galton data set.
dta <- HistData::Galton
zones <- matrix(c(2, 0, 1, 3), ncol=2, byrow=TRUE)
- Save the Galton data set as dta.
- Create a matrix named zones with 2 rows and 2 columns, filling it from the vector by row:
2 0
1 3
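layout(zones, widths=c(4/5, 1/5), heights=c(1/5, 4/5))
Use layout() to divide the plotting region into the rows and columns given by zones. The widths and heights arguments set the relative size of each column and row (proportions of the plotting region, not inches).
Plot (2) is drawn in cell [1, 1], which takes 4/5 of the width and 1/5 of the height. Cell [1, 2] is 0, so it stays empty.
Plots (1) and (3) are drawn in cells [2, 1] and [2, 2] respectively: plot (1) takes 4/5 of the width and 4/5 of the height, and plot (3) takes 1/5 of the width and 4/5 of the height.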
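To check the arrangement before drawing anything, the layout can be previewed with layout.show(). This is a minimal sketch that reuses the zones matrix from above; the name n_panels is introduced here only for illustration.

```r
# Same layout matrix as in the homework: panel 2 on top, panels 1 and 3 below.
zones <- matrix(c(2, 0, 1, 3), ncol = 2, byrow = TRUE)

# layout() returns the number of panels it defined (n_panels is an illustrative name).
n_panels <- layout(zones, widths = c(4/5, 1/5), heights = c(1/5, 4/5))

# Outline each panel and print its number in the middle of its cell.
layout.show(n_panels)
```

The cell holding 0 is left blank in the preview, which is why the top-right corner of the final figure stays empty.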
xh <- with(dta, hist(parent, plot=FALSE))
yh <- with(dta, hist(child, plot=FALSE))
ub <- max(c(xh$counts, yh$counts))
Data manipulation
1. Use with() and hist(plot=FALSE) to compute the histogram information (such as counts and density) for the variables parent and child in dta without drawing the histograms.
2. Save the histogram information as xh and yh.
3. Find the maximum bin count across xh and yh and save it as ub.
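The three steps above can be sketched on a small made-up example. The vectors x and y and the explicit breaks are assumptions chosen so the counts are easy to check by hand; they are not the Galton data.

```r
# Toy heights (illustrative values only, not taken from Galton).
x <- c(64, 65.5, 66.5, 66.5, 67.5, 68.5, 68.5, 68.5, 69.5, 70.5)
y <- c(62.5, 63.5, 63.5, 64.5, 64.5, 64.5, 64.5, 65.5, 66.5, 66.5)

# plot = FALSE returns the histogram object without drawing it.
xh <- hist(x, breaks = 64:71, plot = FALSE)
yh <- hist(y, breaks = 62:67, plot = FALSE)

names(xh)    # includes "breaks", "counts", "density", "mids"
xh$counts    # bin counts for x
yh$counts    # bin counts for y

# Largest bin count across both histograms, used as a shared axis limit.
ub <- max(c(xh$counts, yh$counts))
ub
```

Using one shared upper bound keeps the bar lengths of the two marginal histograms comparable.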
Plot (1)
par(mar=c(3, 3, 1, 1))
with(dta, sunflowerplot(parent, child))
Sunflower plot
1. Adjust the plotting parameters: set the margin sizes (in lines of text) in the order bottom=3, left=3, top=1, right=1.
2. Sunflower plot: a location with a single observation is drawn as a plain point, and a location with several observations gets one petal per observation.
It shows that parents' and children's heights are both centered around 68 inches.
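To see where the petal counts come from, the multiplicity of each (x, y) location can be tabulated with xyTable(), which sunflowerplot() uses by default to count replicates. The points below are made up for illustration and are not Galton observations.

```r
# Six toy points at three distinct locations (illustrative values only).
x <- c(68, 68, 68, 70, 70, 72)
y <- c(68, 68, 68, 66, 66, 70)

# Count how many observations share each (x, y) location.
tab <- xyTable(x, y)
data.frame(x = tab$x, y = tab$y, n = tab$number)

# Locations with n > 1 are drawn with n petals; n == 1 is a plain point.
sunflowerplot(x, y)
```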
Plot (2)
par(mar=c(0, 3, 1, 1))
barplot(xh$counts, axes=FALSE, ylim=c(0, ub), space=0)
- Adjust the plotting parameters: set the margin sizes (in lines of text) in the order bottom=0, left=3, top=1, right=1.
- Bar plot: draw the bin counts of parent without axes. The y-axis runs from 0 to ub, the maximum bin count across both histograms, and there is no space between bars.
Plot (3)
par(mar=c(3, 0, 1, 1))
barplot(yh$counts, axes=FALSE, xlim=c(0, ub), space=0, horiz=TRUE)
- Adjust the plotting parameters: set the margin sizes (in lines of text) in the order bottom=3, left=0, top=1, right=1.
- Horizontal bar plot: draw the bin counts of child without axes. The x-axis runs from 0 to ub, the maximum bin count across both histograms, and there is no space between bars.
par(oma=c(3, 3, 0, 0))
mtext("Average height of parents (in inch)", side=1, line=2,
outer=TRUE, adj=0,
at=.4 * (mean(dta$parent) -
min(dta$parent))/(diff(range(dta$parent))))
mtext("Height of child (in inch)", side=2, line=2,
outer=TRUE, adj=0,
at=.4 * (mean(dta$child) - min(dta$child))/(diff(range(dta$child))))
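par(mar=c(3, 3, 1, 1))
- par(oma=c(3, 3, 0, 0)) sets the outer margin sizes (in lines of text) in the order bottom=3, left=3, top=0, right=0.
- mtext() writes "Average height of parents (in inch)" in the bottom outer margin (side=1, outer=TRUE) on margin line 2 (line=2 counts lines of text, not inches). The value 0.4 * (mean - min)/(max - min) is passed as at to set where the left-aligned text (adj=0) starts along that margin.
- The same process is used for the y-axis label (side=2, left).
- par(mar=c(3, 3, 1, 1)) then sets the figure margins back to bottom=3, left=3, top=1, right=1.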
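The position formula can be checked by hand on a small example. The helper label_at() and the vector heights are hypothetical names introduced for this sketch; only the formula itself comes from the code above.

```r
# Position of the label start along the margin: 0.4 * (mean - min) / (max - min).
# label_at() is an illustrative helper, not part of the original script.
label_at <- function(x, scale = 0.4) {
  scale * (mean(x) - min(x)) / diff(range(x))
}

# Toy values: mean = 68, min = 64, max = 72.
heights <- c(64, 66, 68, 70, 72)
pos <- label_at(heights)
pos  # 0.4 * (68 - 64) / (72 - 64) = 0.2
```

Because (mean - min)/(max - min) lies between 0 and 1, the result always falls between 0 and 0.4, which keeps the label start in the left part of the margin.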
HW 2
Deaths per 100,000 from male suicides for 5 age groups and 15 countries are given in the table below. The data set is available as suicides2{HSAUR3}. Construct side-by-side box plots for the data from different age groups and comment briefly.
Load the data, check its structure, rename the columns, and add the row names as a new variable.
library(HSAUR3)
dta<-HSAUR3::suicides2
str(dta)
'data.frame': 15 obs. of 5 variables:
$ A25.34: num 22 9 22 29 16 28 48 7 8 26 ...
$ A35.44: num 27 19 19 40 25 35 65 8 11 29 ...
$ A45.54: num 31 10 21 52 36 41 84 11 18 36 ...
$ A55.64: num 34 14 31 53 47 49 81 18 20 32 ...
$ A65.74: num 24 27 49 69 56 52 107 27 28 28 ...
head(dta)
        A25.34 A35.44 A45.54 A55.64 A65.74
Canada      22     27     31     34     24
Israel       9     19     10     14     27
Japan       22     19     21     31     49
Austria     29     40     52     53     69
France      16     25     36     47     56
Germany     28     35     41     49     52
colnames(dta) <- paste0(c("25-34", "35-44", "45-54", "55-64", "65-74"), "y")
dta$country <- rownames(dta)
Boxplot
dta1 <- reshape::melt(dta, id.vars="country")
boxplot(value ~ variable,
        data=dta1,
        varwidth=TRUE,
        cex.axis=.6,
        xlab="Age group",
        ylab="Suicides per 100,000 males")
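The wide-to-long step can also be done in base R with stack(), which avoids the reshape dependency. This is a sketch under stated assumptions: the data frame wide holds only the six countries printed by head(dta) above, so the resulting plot is illustrative and does not reproduce the full 15-country figure. The names wide and long are introduced here for illustration.

```r
# Six of the fifteen countries, copied from the head(dta) output above.
wide <- data.frame(
  A25.34 = c(22, 9, 22, 29, 16, 28),
  A35.44 = c(27, 19, 19, 40, 25, 35),
  A45.54 = c(31, 10, 21, 52, 36, 41),
  A55.64 = c(34, 14, 31, 53, 47, 49),
  A65.74 = c(24, 27, 49, 69, 56, 52),
  row.names = c("Canada", "Israel", "Japan", "Austria", "France", "Germany")
)

# stack() returns one row per cell: 'values' holds the rate, 'ind' the age group.
long <- stack(wide)

# Side-by-side box plots, one box per age group.
boxplot(values ~ ind, data = long,
        cex.axis = 0.6,
        xlab = "Age group",
        ylab = "Suicides per 100,000 males")
```

Unlike melt(), stack() does not keep the country label in the long table, which is fine here because the box plot only needs the rate and the age group.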