HW 1

This R script illustrates how to split the plot region to include histograms on the margins of a scatter diagram using the Galton{HistData} data set. Compile it as an HTML document with comments on each code chunk.

# Galton's data on the heights of parents and their children
install.packages("HistData")

Install the HistData package to access the Galton data set.

dta <- HistData::Galton
zones <- matrix(c(2, 0, 1, 3), ncol=2, byrow=TRUE)
  1. Save the Galton data set as dta.
  2. Create a 2 x 2 matrix named zones, filling it from the vector by row:
    2 0
    1 3
layout(zones, widths=c(4/5, 1/5), heights = c(1/5, 4/5))

layout() divides the device into a grid of rows and columns according to zones; widths and heights give the relative size of each column and row as fractions of the device, not inches.
Plot (2) is drawn in cell [1, 1], which takes 4/5 of the width and 1/5 of the height. Cell [1, 2] holds a 0, so it is left empty.
Plots (1) and (3) are drawn in cells [2, 1] and [2, 2] respectively: 4/5 of the width by 4/5 of the height, and 1/5 of the width by 4/5 of the height.
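
As a rough sketch of the layout described above, the snippet below builds the same zones matrix and asks layout() how many figures it defines. It draws on a null PDF device (pdf(NULL)) so that nothing is written to disk; that device choice and the variable name n_figs are assumptions made for this illustration, not part of the original script.

```r
# Build the zones matrix row by row: 2 0 / 1 3
zones <- matrix(c(2, 0, 1, 3), ncol = 2, byrow = TRUE)

# Open a null device so the sketch runs without a screen or an output file
pdf(NULL)

# layout() returns the number of figures; numeric widths/heights are relative
n_figs <- layout(zones, widths = c(4/5, 1/5), heights = c(1/5, 4/5))

# Outline each numbered region to check where plots 1, 2 and 3 will go
layout.show(n_figs)

invisible(dev.off())
```

On an interactive device, layout.show() outlines the three numbered regions, which is a quick way to confirm that the 0 cell in the top-right corner stays empty.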

xh <- with(dta, hist(parent, plot=FALSE))

yh <- with(dta, hist(child, plot=FALSE))

ub <- max(c(xh$counts, yh$counts))

Data manipulation
1. Use with() and hist(plot=FALSE) to compute the histogram information (such as counts and density) for the variables parent and child in dta without drawing anything.
2. Save the histogram objects as xh and yh.
3. Find the maximum count across xh and yh and save it as ub.
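
To make the histogram objects more concrete, here is a minimal sketch on two small made-up vectors (hypothetical toy data, not the Galton heights). It shows the components that hist(plot=FALSE) returns and how a shared upper bound is computed the same way as ub; the names h_x, h_y and ub_toy are invented for this illustration.

```r
# Toy data (hypothetical), not the Galton heights
x <- c(1, 2, 2, 3, 3, 3)
y <- c(1, 1, 1, 1, 2, 3)

# plot = FALSE returns the histogram object without drawing it
h_x <- hist(x, breaks = 0:3, plot = FALSE)
h_y <- hist(y, breaks = 0:3, plot = FALSE)

h_x$breaks  # bin boundaries
h_x$counts  # number of observations in each bin
h_x$mids    # bin midpoints

# Shared upper bound across both histograms, as ub is computed in the script
ub_toy <- max(c(h_x$counts, h_y$counts))
```

Using one bound for both marginal bar plots keeps their bar lengths on a common scale.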

plot(1)

par(mar=c(3, 3, 1, 1))

with(dta, sunflowerplot(parent, child))

Sunflower plot
1. Adjust the plotting parameters: mar sets the margin sizes in the order bottom=3, left=3, top=1, and right=1.
2. Draw a sunflower plot: points that fall on the same location are drawn as one flower, and each petal represents one observation.
It shows that the parents’ and children’s heights are centered around 68 inches.
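
The petal counts come from tallying observations that share the same coordinates. The sketch below uses xyTable() from grDevices on made-up coordinates (hypothetical toy data) to show that kind of tally; it is meant as an illustration of the idea, not as a description of everything sunflowerplot() does.

```r
# Toy coordinates (hypothetical): the pair (1, 1) occurs twice
x <- c(1, 1, 2)
y <- c(1, 1, 2)

# xyTable() tabulates how many observations share each (x, y) location
tab <- xyTable(x, y)

tab$x       # unique x locations
tab$y       # unique y locations
tab$number  # number of observations at each location
```

Applied to the Galton data, the same tally would give the number of parent/child pairs at each height combination.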

plot(2)

par(mar=c(0, 3, 1, 1))

barplot(xh$counts, axes=FALSE, ylim=c(0, ub), space=0)
  1. Adjust the plotting parameters: mar sets the margin sizes in the order bottom=0, left=3, top=1, and right=1.
  2. Bar plot: draw the parent histogram counts as bars without axes; the y-axis runs from 0 to ub, the largest count in either histogram, with no space between bars.

plot(3)

par(mar=c(3, 0, 1, 1))

barplot(yh$counts, axes=FALSE, xlim=c(0, ub), space=0, horiz=TRUE)
  1. Adjust the plotting parameters: mar sets the margin sizes in the order bottom=3, left=0, top=1, and right=1.
  2. Horizontal bar plot: draw the child histogram counts as horizontal bars without axes; the x-axis runs from 0 to ub, the largest count in either histogram, with no space between bars.
par(oma=c(3, 3, 0, 0))
  mtext("Average height of parents (in inch)", side=1, line=2, 
      outer=TRUE, adj=0, 
      at=.4 * (mean(dta$parent) -    
                 min(dta$parent))/(diff(range(dta$parent))))
  
  mtext("Height of child (in inch)", side=2, line=2, 
      outer=TRUE, adj=0,
      at=.4 * (mean(dta$child) - min(dta$child))/(diff(range(dta$child))))
dta

par(mar=c(3, 3, 1, 1))
  1. Adjust the plotting parameters: oma sets the outer margin sizes in the order bottom=3, left=3, top=0, and right=0.
  2. Add the text “Average height of parents (in inch)” to the bottom outer margin (side=1) on margin line 2 (line=2 counts lines of text, not inches), positioned at (mean-min)/(max-min)*0.4.
  3. Repeat the same process for the y-axis label (side=2, left).
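
The positioning formula can be checked by hand. The sketch below evaluates it on a small made-up vector (hypothetical toy values, not the Galton data); rel_mean and at_toy are names invented for this illustration. It assumes, as the mtext() documentation describes, that coordinates in the outer margins run from 0 to 1.

```r
# Toy heights (hypothetical), not the Galton data
x <- c(60, 65, 70, 75)

# Relative position of the mean within the range: 0 = minimum, 1 = maximum
rel_mean <- (mean(x) - min(x)) / diff(range(x))

# Scale by 0.4, as in the script, to get the value passed to at=
at_toy <- 0.4 * rel_mean
```

Here the mean sits halfway through the range, so the label would start at 0.4 * 0.5 = 0.2 of the way along the outer margin.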

HW 2

Deaths per 100,000 from male suicides for 5 age groups and 15 countries are given in the table below. The data set is available as suicides2{HSAUR3}. Construct side-by-side box plots for the data from different age groups and comment briefly.

Load the data, check its structure, rename the columns, and create a new variable from the row names.

library(HSAUR3)
dta<-HSAUR3::suicides2
str(dta)
'data.frame':   15 obs. of  5 variables:
 $ A25.34: num  22 9 22 29 16 28 48 7 8 26 ...
 $ A35.44: num  27 19 19 40 25 35 65 8 11 29 ...
 $ A45.54: num  31 10 21 52 36 41 84 11 18 36 ...
 $ A55.64: num  34 14 31 53 47 49 81 18 20 32 ...
 $ A65.74: num  24 27 49 69 56 52 107 27 28 28 ...
head(dta)
        A25.34 A35.44 A45.54 A55.64 A65.74
Canada      22     27     31     34     24
Israel       9     19     10     14     27
Japan       22     19     21     31     49
Austria     29     40     52     53     69
France      16     25     36     47     56
Germany     28     35     41     49     52
colnames(dta)<-c(paste0(c("25-34", "35-44", "45-54", "55-64", "65-74"), rep("y", 5)))

dta$country<-rownames(dta)

boxplot

dta1<-reshape::melt(dta, id.vars="country")

boxplot( value ~ variable , 
         data=dta1, 
         varwidth=TRUE,
         cex.axis=.6,
         xlab="Age group",
         ylab="Deaths per 100,000 males")