1 Inclass 1

Task: Render the Ch04.R script for replicating figures in Chapter 4 of Lattice: Multivariate Data Visualization with R (Sarkar, D. 2008) to html document with comments at each code chunk indicated by ‘##’.

1.1 VADeaths

## load package
pacman::p_load(lattice, dplyr, magrittr)

## view data
VADeaths

##       Rural Male Rural Female Urban Male Urban Female
## 50-54       11.7          8.7       15.4          8.4
## 55-59       18.1         11.7       24.3         13.6
## 60-64       26.9         20.3       37.0         19.3
## 65-69       41.0         30.9       54.6         35.1
## 70-74       66.0         54.3       71.1         50.0

## data Classes
class(VADeaths)

## [1] "matrix" "array"

## List all available methods for a S3 and S4 generic function (dotplot), or all methods for an S3 or S4 class.
methods("dotplot")

## [1] dotplot.array*   dotplot.default* dotplot.formula* dotplot.matrix* 
## [5] dotplot.numeric* dotplot.table*  
## see '?methods' for accessing help and source code

1.1.1 dotplot

## dotplot of VADeaths (groups=TRUE shows without grid)
dotplot(VADeaths, groups=FALSE)

## dotplot with 1 by 4 grid
dotplot(VADeaths, groups=FALSE, 
        layout = c(1, 4), # 1column * 4row 
        aspect = "fill", # aspect controls the physical aspect ratio of the panels
        #origin = 0, ##???
        type = c("p", "h"), # "p" denotes a point, "h" for histogram-like vertical lines
        main = "Death Rates in Virginia - 1940", # plot title
        xlab = "Rate (per 1000)") # x axis title

## dotplot with 1 pannel without grid and legend
dotplot(VADeaths, type="o", # "o" for overplotted points and lines
        auto.key=list(lines=TRUE, space="right"), # add legend,
        main="Death Rates in Virginia - 1940", 
        xlab="Rate (per 1000)")

1.1.2 barchart

## barchart with 1 by 4 grid
barchart(VADeaths, groups=FALSE,
         layout = c(1, 4), 
         aspect = "fill", 
         reference = FALSE, ##?
         main = "Death Rates in Virginia - 1940",
         xlab = "Rate (per 100)")

1.2 postdoc

## input postdoc data from latticeExtra package
data(postdoc, package="latticeExtra")

## table class
class(postdoc)

## [1] "table"

knitr::kable(postdoc)

	Expected or Additional Training	Work with Specific Person	Training Outside PhD Field	Other Employment Not Available	Other
Biological Sciences	6404	2427	1950	1779	602
Chemistry	865	308	292	551	168
Earth, Atmospheric, and Ocean Sciences	343	75	75	238	80
Engineering	586	464	288	517	401
Medical Sciences	205	137	82	68	74
Physics and Astronomy	1010	347	175	399	162
Social and Behavioral Sciences	1368	564	412	514	305
All Postdoctorates	11197	4687	3403	4406	1914

1.2.1 barchart

## proportion barchart
barchart(prop.table(postdoc, margin=1), 
         xlab = "Proportion",
         auto.key = list(adj=1)) # legend word position

1.2.2 dotplot

## dotplot grid by "Reasons for Taking First Postdoctoral Appointment"
dotplot(prop.table(postdoc, margin = 1), 
        groups = FALSE, 
        xlab = "Proportion",
        par.strip.text = list(abbreviate=TRUE, minlength=10)) # panel title with abbreviate

## 調整不好
dotplot(prop.table(postdoc, margin=1), 
        groups=FALSE, 
        index.cond=function(x, y) median(x),
        xlab="Proportion", 
        layout=c(1, 5), 
        aspect=0.6,
        scales=list(y=list(relation="free", rot=0)), 
        # relation: A character string that determines how axis limits are calculated for each panel. Possible values are "same" (default), "free" and "sliced". 
        
        ## For relation="same", the same limits, usually large enough to encompass all the data, are used for all the panels. 
        ## For relation="free", limits for each panel is determined by just the points in that panel.
        ## Behavior for relation="sliced" is similar, except that the length (max - min) of the scales are constrained to remain the same across panels.
        
        # rot: Angle (in degrees) by which the axis labels are to be rotated. Can be a vector of length 2, to control left/bottom and right/top axes separately.
        prepanel=function(x, y) {
          list(ylim=levels(reorder(y, x)))
          },
        panel=function(x, y, ...) {
            panel.dotplot(x, reorder(y, x), ...)
          }
        )

1.3 Chem97

## input Chem97 data from mlmRev package
data(Chem97, package="mlmRev")

## Cross Tabulation
## count the number of gcsescore (Average GCSE score of individual) by gender
gcsescore.tab <- xtabs(~ gcsescore + gender, Chem97)

## trans to data frame
gcsescore.df <- as.data.frame(gcsescore.tab)

## set the gcsescore as numeric variable
## 因為原本是factor，需要先轉成character
gcsescore.df$gcsescore <- as.numeric(as.character(gcsescore.df$gcsescore))

1.3.1 xyplot

## Freq by gcsescore grid by gender
xyplot(Freq ~ gcsescore | gender, 
       data = gcsescore.df, 
       type="h", 
       layout=c(1, 2), 
       xlab="Average GCSE Score")

## count the number of score (Point score on A-level Chemistry in 1997) by gender
score.tab <- xtabs(~score + gender, Chem97)

## trans to data frame
score.df <- as.data.frame(score.tab)

1.3.2 barchart

## ## Freq by score grid by gender
barchart(Freq ~ score | gender, score.df, origin = 0)

## The End

2 Inclass 2

The lattice function, xyplot, is used to construct a variety of scatter diagrams presented in the Western Collaborative Group Study web page.

Task: Replicate similar scatter diagrams by replacing the grouping variable with the conditioning variable, and vice versa, in the example.

Blood pressure, behavioral patterns, and coronary heart disease

This prospective cohort study was conducted by collaborators from UCSF and the SPH at Berkeley. The participants, all men, were recruited between June 1960 and December 1961. The main disease outcome of interest is coronary heart disease (CHD).
None of the men had CHD at the start of the study. They were followed annually for 8 or 9 years.

id Subject ID: age Age: age in years height Height: height in inches weight Weight: weight in pounds sbp Systolic blood pressure: mm Hg dbp Diastolic blood pressure: mm Hg chol Cholesterol: mg/100 ml behpat Behavior pattern: ncigs Smoking: Cigarettes/day dibpat Dichotomous behavior pattern: 0 = Type B; 1 = Type A chd69 Coronary heart disease event: 0 = none; 1 = yes typechd69 to be done time169 Observation (follow up) time: Days arcus Corneal arcus: 0 = none; 1 = yes

2.1 Data input

fL <- "http://courses.washington.edu/b517/Datasets/wcgs.csv"
dta <- read.csv(fL)
dplyr::glimpse(dta)

## Rows: 3,154
## Columns: 14
## $ id       <int> 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2~
## $ age      <int> 49, 42, 42, 41, 59, 44, 44, 40, 43, 42, 53, 41, 50, 43, 44, 5~
## $ height   <int> 73, 70, 69, 68, 70, 72, 72, 71, 72, 70, 69, 67, 72, 72, 71, 7~
## $ weight   <int> 150, 160, 160, 152, 150, 204, 164, 150, 190, 175, 167, 156, 1~
## $ sbp      <int> 110, 154, 110, 124, 144, 150, 130, 138, 146, 132, 146, 138, 1~
## $ dbp      <int> 76, 84, 78, 78, 86, 90, 84, 60, 76, 90, 94, 96, 90, 80, 80, 8~
## $ chol     <int> 225, 177, 181, 132, 255, 182, 155, 140, 149, 325, 223, 271, 2~
## $ behpat   <chr> "A2", "A2", "B3", "B4", "B3", "B4", "B4", "A2", "B3", "A2", "~
## $ ncigs    <int> 25, 20, 0, 20, 20, 0, 0, 0, 25, 0, 25, 20, 50, 30, 0, 3, 9, 0~
## $ dibpat   <chr> "A1,A2", "A1,A2", "B3,B4", "B3,B4", "B3,B4", "B3,B4", "B3,B4"~
## $ chd69    <chr> "no", "no", "no", "no", "yes", "no", "no", "no", "no", "no", ~
## $ typchd69 <chr> "no CHD", "no CHD", "no CHD", "no CHD", "MI or SD", "no CHD",~
## $ time169  <int> 1664, 3071, 3071, 3064, 1885, 3102, 3074, 3071, 3064, 1032, 3~
## $ arcus    <chr> "absent", "present", "absent", "absent", "present", "absent",~

2.2 Many faces of scatter plot

2.2.1 simple

xyplot(sbp ~ dbp, 
       data=dta,
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)")

2.2.2 add grid and regression line

xyplot(sbp ~ dbp, 
       data=dta,
       cex=.5,
       type=c("p", "g", "r"), 
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)")

2.2.3 spline smooth regression line

xyplot(sbp ~ dbp, 
       data=dta,
       type=c("p", "g", "spline"),
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)")

2.2.4 Title, subtitle, color, alpha

xyplot(sbp ~ dbp, 
       data=dta, 
       type = c("p", "g", "r"), 
       pch=16, 
       cex=.5,
       alpha=.7,
       col="indianred", 
       main="Linear association between blood pressure measures", 
       sub="n=3154", 
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)",
       par.settings=list(par.main.text=list(col="steelblue"), 
                         par.sub.text=list(cex=0.8, 
                                           col="steelblue"), 
                         par.xlab.text=list(cex=0.8, 
                                            col="black"), 
                         par.ylab.text=list(cex=0.8, 
                                            col="black")))

2.3 Grouping

2.3.1 CHD event

group by CHD event

xyplot(sbp ~ dbp,
       groups=chd69,
       data=dta,
       type=c("p", "g", "r"),
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)",
       auto.key=list(space="right"))

group by CHD event

2.3.2 Corneal arcus

missing value in Corneal arcus

xyplot(sbp ~ dbp,
       groups=arcus,
       data=dta,
       type=c("p", "g", "r"),
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)",
       auto.key=list(space="right"))

dta$chdclr <- ifelse(dta$chd69=="no", "lightgreen", "plum")

xyplot(sbp ~ dbp, 
       groups=chd69,
       K=dta$chd69, #K?
       data=dta, 
       type="p",
       scales=list(x=list(lim=c(55, 155)), 
                   y=list(lim=c(90, 240))), 
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)",
       panel=function(x, y, K, groups, ..., subscripts) {
          panel.xyplot(x, y, 
                       col=dta$chdclr, 
                       pch=ifelse(K[subscripts]=="no", 1, 16), 
                       alpha=0.6)
          panel.abline(v=80, 
                       lty=3, 
                       col="red")
          panel.abline(h=120, 
                       lty=3, 
                       col="red")})

2.4 Conditioning

2.4.1 Behavior pattern

grid by Behavior pattern

xyplot(sbp ~ dbp | behpat, 
       data=dta, 
       cex=.7,
       alpha=.4,
       type=c("p", "g", "r"),
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)")

2.4.2 Dichotomous behavior

xyplot(sbp ~ dbp | dibpat, 
       data=dta, 
       cex=.7,
       alpha=.4,
       type=c("p", "g", "r"),
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)")

2.5 Conditoning and grouping

2.5.1 condition by behpat, group by chd69

xyplot(sbp ~ dbp | behpat, groups=chd69,
       data=dta, 
       cex=.7,
       alpha=.4,
       layout=c(4, 1),
       type=c("p", "g", "r"),
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)",
       auto.key=list(space="top", 
                     text=c("no", "yes"), 
                     cex=0.7, 
                     column=2,
                     lines=TRUE, 
                     points=TRUE))

2.5.2 condition by dibpat, group by arcus

xyplot(sbp ~ dbp | dibpat, groups=arcus,
       data=dta, 
       cex=.7,
       alpha=.4,
       layout=c(2, 1),
       type=c("p", "g", "r"),
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)",
       auto.key=list(space="top", 
                     text=c("NA", "no", "yes"), 
                     cex=0.7, 
                     column=2,
                     lines=TRUE, 
                     points=TRUE))

xyplot(sbp ~ dbp | behpat, 
       data=dta, 
       col=dta$chdclr,
       type=c("p", "g"), 
       pch=19, 
       alpha=0.5, 
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)",
       par.settings=list(layout.heights=list(strip=0.9)),
                         par.strip.text=list(cex=0.7),
                         key=list(space="top",
                                  title="CHD status",
                                  cex.title=0.6,
                                  column=2,
                                  points=list(col=c("lightgreen", "plum"),
                                              cex=0.5, 
                                              pch=19),   
                                  text=list(lab=c("No", "Yes"),                                                          cex=0.7)))

2.6 References

Rosenman, RH et al. (1975). Coronary Heart Disease in the Western Collaborative Group Study. Journal of American Medical Association, 233(8), 872-877.

Vittinghoff, D.V. et al (2012). Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models. 2nd Ed. Springer-Verlag, Inc.

Data source

3 Inclass 3

The data set concerns student evaluation of instructor’s beauty and teaching quality for several courses at the University of Texas. The teaching evaluations were done at the end of the semester, and the beauty judgments were made later, by six students who had not attended the classes and were not aware of the course evaluations.

Source: Hamermesh, D.S., & Parker, A.M. (2005). Beauty in the classroom: instructor’s pulchritude and putative pedagogical productivity.a Economics and Education Review, 24, 369-376. Reported in Gelman, A., & Hill, J. (2006). Data analysis using regression and hierarchical/multilevel models. p. 51.

Data

Column 1: Course evaluation score
Column 2: Beauty score
Column 3: Gender of professor, 1 = Female, 0 = Male
Column 4: Pofessor age in years
Column 5: Minority status of professor, 1 = Minority, 0 = Others
Column 6: Tenure status of professor, 1 = Tenured, 0 = No
Column 7: Course ID

Task: Use the lattice package to produce the plot above. Hint: Use ‘reorder’ after obtaining regression coefficients to rearrange conditioning panels.

3.1 Data input

dat <- read.table(file="beautyCourseEval.txt", h=T, sep="")
head(dat)

##   eval     beauty sex age minority tenure courseID
## 1  4.3  0.2015666   1  36        1      0        3
## 2  4.5 -0.8260813   0  59        0      1        0
## 3  3.7 -0.6603327   0  51        0      1        4
## 4  4.3 -0.7663125   1  40        0      1        2
## 5  4.4  1.4214450   1  31        0      0        0
## 6  4.2  0.5002196   0  62        0      1        0

3.2 Plot

xyplot(eval ~ beauty | as.factor(courseID), 
       data = subset(dat, courseID!=0),
       xlab = "Beauty judgement score", 
       ylab = "Average course evaluation score",
       layout = c(6,5),
       type=c("p", "g", "r"),
       par.settings = simpleTheme(col = "steelblue", col.line = "steelblue"))

xyplot(eval ~ beauty | as.factor(courseID), 
       data = subset(dat, courseID==0),
       xlab = "Beauty judgement score", 
       ylab = "Average course evaluation score",
       type=c("p", "g", "r"),
       par.settings = simpleTheme(col = "steelblue", col.line = "steelblue"))

4 Exercises 1

Eight different physical measurements of 30 French girls were recorded from 4 to 15 years old.

Task: Explore various ways to display the data using trellis graphics.

Source: Sempe, M., et al. (1987). Multivariate and longitudinal data on growing children: Presentation of the French auxiological survey. In J. Janssen, et al. (1987). Data analysis. The Ins and Outs of solving real problems (pp. 3-6). New York: Plenum Press.

Column 1: Weight in grams
Column 2: Height in mms
Column 3: Head to butt length in mms
Column 4: Head circumference in mms
Column 5: Chest circumference in mms
Column 6: Arm length in mms
Column 7: Calf length in mms
Column 8: Pelvis circumference in mms
Column 9: Age in years
Column 10: Girl ID

Load in the data set and check the data structure

dat <- read.table("girlsGrowth.txt", header = TRUE)
head(dat)

##     Wt   Ht  Hb  Hc  Cc Arm Calf Pelvis age id
## 1 1456 1025 602 486 520 157  205    170   4 S1
## 2 1426  998 572 501 520 150  215    169   4 S2
## 3 1335  961 560 494 495 145  214    158   4 S3
## 4 1607 1006 595 497 560 178  218    172   4 S4
## 5 1684 1012 584 490 553 165  220    158   4 S5
## 6 1374 1012 580 492 525 158  202    167   4 S6

summary(dat)

##        Wt             Ht             Hb              Hc              Cc       
##  Min.   :1214   Min.   : 932   Min.   :546.0   Min.   :465.0   Min.   :476.0  
##  1st Qu.:1981   1st Qu.:1145   1st Qu.:631.0   1st Qu.:508.0   1st Qu.:575.0  
##  Median :2778   Median :1312   Median :694.5   Median :522.0   Median :630.0  
##  Mean   :3041   Mean   :1318   Mean   :707.2   Mean   :522.8   Mean   :656.4  
##  3rd Qu.:4025   3rd Qu.:1504   3rd Qu.:786.0   3rd Qu.:537.0   3rd Qu.:740.0  
##  Max.   :6285   Max.   :1682   Max.   :905.0   Max.   :585.0   Max.   :935.0  
##       Arm             Calf           Pelvis           age       
##  Min.   :144.0   Min.   :198.0   Min.   :150.0   Min.   : 4.00  
##  1st Qu.:170.0   1st Qu.:234.8   1st Qu.:190.0   1st Qu.: 6.75  
##  Median :186.5   Median :266.0   Median :209.0   Median : 9.50  
##  Mean   :192.8   Mean   :271.5   Mean   :214.8   Mean   : 9.50  
##  3rd Qu.:216.0   3rd Qu.:309.2   3rd Qu.:241.0   3rd Qu.:12.25  
##  Max.   :285.0   Max.   :380.0   Max.   :305.0   Max.   :15.00  
##       id           
##  Length:360        
##  Class :character  
##  Mode  :character  
##                    
##                    
##

length(unique(dat$age))

## [1] 12

# Present relationship between weight and height for each year and generally
xyplot(Wt ~ Ht, groups = factor(age), data = dat,
       auto.key = list(column=4), pch=16,
       type = c('p', 'r', 'g'), xlab= 'Height', ylab = 'Weight')

# Correlation
with(dat, cor(Wt, Ht))

## [1] 0.9491352

split(dat, dat$age) %>% 
  sapply(., function(df) {with(df, cor(Wt, Ht))})

##         4         5         6         7         8         9        10        11 
## 0.6589876 0.7131878 0.7563261 0.7550721 0.7381013 0.6127379 0.6706406 0.6738371 
##        12        13        14        15 
## 0.7103265 0.6313972 0.4659650 0.3489307

# Intecept and slope in linear model (weight ~ height)
split(dat, dat$age) %>%
  sapply(., function(df) {coefficients(lm(Wt ~ Ht, data = df))[1]})

##  4.(Intercept)  5.(Intercept)  6.(Intercept)  7.(Intercept)  8.(Intercept) 
##      -1492.131      -2071.391      -2397.710      -2525.049      -3362.959 
##  9.(Intercept) 10.(Intercept) 11.(Intercept) 12.(Intercept) 13.(Intercept) 
##      -3230.680      -4717.608      -4661.448      -5253.717      -6183.765 
## 14.(Intercept) 15.(Intercept) 
##      -4622.857      -1026.453

split(dat, dat$age) %>%
  sapply(., function(df) {coefficients(lm(Wt ~ Ht, data = df))[2]})

##     4.Ht     5.Ht     6.Ht     7.Ht     8.Ht     9.Ht    10.Ht    11.Ht 
## 3.019906 3.558213 3.827905 3.936904 4.638774 4.579293 5.745730 5.712182 
##    12.Ht    13.Ht    14.Ht    15.Ht 
## 6.131886 6.841161 5.939274 3.772245

Generally, height highly positively correlated with weight. But when grouping with age, the correaltion are not so high.
In the linear models of weight ~ height, all slopes are positive. This indicates that from age 4-15, height keep positively correlatied with weight in girls.
While the intercept of model with age 15 is quite different from others, indicating that girls get heavier in age 15 than other age.

# Present the relationship between weight (Wt) and other measurements for each age
# Standardize each measurement

dat_std <- dat[,1:9] %>% 
  mutate_all(function(x) {as.numeric(robustHD::standardize(x))})

dat_std$age <- dat$age
dat_std$id <- dat$id
summary(dat_std)

##        Wt                Ht                 Hb                Hc          
##  Min.   :-1.4813   Min.   :-1.89954   Min.   :-1.7435   Min.   :-2.78039  
##  1st Qu.:-0.8593   1st Qu.:-0.85086   1st Qu.:-0.8241   1st Qu.:-0.71183  
##  Median :-0.2137   Median :-0.03113   Median :-0.1372   Median :-0.03835  
##  Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.0000   Mean   : 0.00000  
##  3rd Qu.: 0.7977   3rd Qu.: 0.91785   3rd Qu.: 0.8526   3rd Qu.: 0.68324  
##  Max.   : 2.6301   Max.   : 1.79298   Max.   : 2.1399   Max.   : 2.99233  
##        Cc               Arm               Calf             Pelvis       
##  Min.   :-1.7838   Min.   :-1.6862   Min.   :-1.7252   Min.   :-1.9485  
##  1st Qu.:-0.8049   1st Qu.:-0.7872   1st Qu.:-0.8628   1st Qu.:-0.7464  
##  Median :-0.2610   Median :-0.2167   Median :-0.1295   Median :-0.1755  
##  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000  
##  3rd Qu.: 0.8267   3rd Qu.: 0.8034   3rd Qu.: 0.8855   3rd Qu.: 0.7862  
##  Max.   : 2.7548   Max.   : 3.1892   Max.   : 2.5457   Max.   : 2.7095  
##       age             id           
##  Min.   : 4.00   Length:360        
##  1st Qu.: 6.75   Class :character  
##  Median : 9.50   Mode  :character  
##  Mean   : 9.50                     
##  3rd Qu.:12.25                     
##  Max.   :15.00

#Turn data set into long form
dat_Wt <- dat_std |> select(-Wt, -id, -age) |> stack()
dat_Wt$Wt <- rep(dat_std$Wt, ncol(dat_std)-3)
dat_Wt$age <- rep(dat_std$age, ncol(dat_std)-3)
dat_Wt$id <- rep(dat_std$id, ncol(dat_std)-3)
str(dat_Wt)

## 'data.frame':    2520 obs. of  5 variables:
##  $ values: num  -1.44 -1.57 -1.76 -1.54 -1.51 ...
##  $ ind   : Factor w/ 7 levels "Ht","Hb","Hc",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Wt    : num  -1.29 -1.31 -1.38 -1.16 -1.1 ...
##  $ age   : int  4 4 4 4 4 4 4 4 4 4 ...
##  $ id    : chr  "S1" "S2" "S3" "S4" ...

xyplot(Wt ~ values | ind, 
       groups = paste0('Age ', age), 
       data = dat_Wt,
       auto.key = list(column=4), 
       pch=16, cex=.4, type = c('p', 'r', 'g'),
       xlab= 'Standardized measurement', 
       ylab = 'Standardized weight')

The slopes of Ht in predicting Wt are relatively similar in different ages. While slopes of other measurement in predicting Wt differ in different ages. Take head circumference (Hc) as an example, the slope of Hc in predicting Wt get higher when girls get older.

# Try another variables: Present the relationship between one measuremeant and other measurments for each age

# Put codes into a function
xyplots_dat <- function(var_name, var_label) {
  library(lattice)
  
  # Turn data set into long form
  dat_var <- dat_std %>% select(-var_name, -id, -age) %>% stack()
  dat_var$var_name <- rep(dat_std[,var_name], ncol(dat_std)-3)
  dat_var$age <- rep(dat_std$age, ncol(dat_std)-3)
  dat_var$id <- rep(dat_std$id, ncol(dat_std)-3)
  
  # Plot
  xyplot(var_name ~ values | ind, groups = paste0('Age ', age), data = dat_var,
         auto.key = list(column=4), pch=16, cex=.4, type = c('p', 'r', 'g'), 
         xlab = 'Standardized measurement',
         ylab = paste0('Standardized ', var_label))
}

# Head to butt length
xyplots_dat(var_name = 'Hb', var_label = 'length of head to butt')

## Note: Using an external vector in selections is ambiguous.
## i Use `all_of(var_name)` instead of `var_name` to silence this message.
## i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.

- In linear models of Hb ~ Ht of each age, both of intecepts and slopes are similar. - In linear models of Hb ~ Hc of each age,slopes are similar while intecepts differ.

# Arm
xyplots_dat(var_name = 'Arm', var_label = 'length of arm')

In linear models of Arm ~ Ht of each age, slope of age 15 is negative, which is quite different from other ages.
In linear models of Hb ~ Hb of each age, slopes of age 6 and 15 are quite smaller than other ages.

5 Exercise 2

Your manager gave you a sales data on several products in a SAS format.

Task: Your task is to summarize and report the data in tables and graphs using the R lattice package.

Re-code the region variable (1 to 4) by “Nothern”, “Southern”, “Eastern” and “Western”;
the district variable (1 - 5) by “North East”, “South East”, “South West”, “North West”, “Central West”;
the quarter variable (1-4) by “1st”, “2nd”, “3rd”, “4th”;
and the month variable (1-12) by “Jan”, “Feb”, etc. Set negative sales values to zero.

pacman::p_load('sas7bdat')
dat <- read.sas7bdat("sales.sas7bdat")
head(dat)

##    product category customer year month quarter market sales expense region
## 1    Shoes    Shoes     Acme 2001     1       1      1   300     240      1
## 2    Boots    Shoes     Acme 2001     1       1      1  2200    1540      1
## 3 Slippers Slippers     Acme 2001     1       1      1   900     540      1
## 4    Shoes    Shoes     Acme 2001     2       1      1   100      80      1
## 5    Boots    Shoes     Acme 2001     2       1      1  1400     980      1
## 6 Slippers Slippers     Acme 2001     2       1      1     0       0      1
##   district return constantv quantity
## 1        1      0         1       30
## 2        1      0         1      275
## 3        1      0         1      180
## 4        1      0         1       10
## 5        1      0         1      175
## 6        1      0         1        0

summary(dat)

##    product            category           customer              year     
##  Length:72          Length:72          Length:72          Min.   :2001  
##  Class :character   Class :character   Class :character   1st Qu.:2001  
##  Mode  :character   Mode  :character   Mode  :character   Median :2002  
##                                                           Mean   :2002  
##                                                           3rd Qu.:2002  
##                                                           Max.   :2002  
##      month          quarter         market          sales          expense    
##  Min.   : 1.00   Min.   :1.00   Min.   :1.000   Min.   :-1400   Min.   :-980  
##  1st Qu.: 3.75   1st Qu.:1.75   1st Qu.:1.000   1st Qu.: 1000   1st Qu.: 660  
##  Median : 6.50   Median :2.50   Median :2.000   Median : 1550   Median :1065  
##  Mean   : 6.50   Mean   :2.50   Mean   :1.667   Mean   : 1686   Mean   :1172  
##  3rd Qu.: 9.25   3rd Qu.:3.25   3rd Qu.:2.000   3rd Qu.: 2525   3rd Qu.:1860  
##  Max.   :12.00   Max.   :4.00   Max.   :2.000   Max.   : 4700   Max.   :2960  
##      region         district       return         constantv    quantity    
##  Min.   :1.000   Min.   :1.0   Min.   :0.0000   Min.   :1   Min.   :  0.0  
##  1st Qu.:1.000   1st Qu.:1.0   1st Qu.:0.0000   1st Qu.:1   1st Qu.:135.2  
##  Median :1.000   Median :1.0   Median :0.0000   Median :1   Median :220.0  
##  Mean   :1.333   Mean   :1.5   Mean   :0.6667   Mean   :1   Mean   :248.9  
##  3rd Qu.:1.000   3rd Qu.:1.0   3rd Qu.:0.0000   3rd Qu.:1   3rd Qu.:287.8  
##  Max.   :4.000   Max.   :5.0   Max.   :5.0000   Max.   :1   Max.   :940.0

# See the distribution of scales for each region.
densityplot(~ quantity | as.character(region), data = dat, layout=c(1, 3))

There is no sale in Eastern region (region=3).

Compare sales of each quarter

bwplot(quantity ~ factor(quarter), data = dat, xlab="Quarter")

Generally, the sale increase as the quarter goes by.

Compare sales of each market

bwplot(quantity ~ factor(market), data = dat, xlab="Market")

There is a larger sale in market 2 than makret 1. And the variation of sales of market 1 is smaller than that of market 2.

Compare sales of each product

bwplot(quantity ~ factor(product), data = dat, xlab="Market")

Slippers have a better sale.

Compare sales of each product in 2 markets

bwplot(quantity ~ factor(product) | factor(market), data = dat, xlab="Market")

The sale of slippers is not that good in market 1.

Compare sales change of months for each product

dotplot(quantity ~ month, groups = product, data = dat, 
        xlab="Month", ylab="Quantity", 
        type=c('p', 'g',"r"), auto.key=list(space="top", columns=3))

Compare sales of each customer in 2 markets

bwplot(quantity ~ factor(customer) | factor(market), data = dat, xlab="Market")

We have only one customer, Acme, in market 1.

Compare distribution of sales of each product in four quaters.

densityplot(~ quantity | factor(quarter), groups = product, data = dat, 
            xlab = "Quarter", auto.key = list(column=3), layout = c(1, 4))

Finding

We should develop markets in Eastern region
Sale of TwoFeet and BigX are lower than Acme. We should put more efforts on attracting these two customers, especially in market 1.
Keep a good relationship with our biggest customer, Acme.
Put more resources on marketing boots and shoes.
Try to develop products for spring to increase sales in month.

5.1 Reference

Source: Gupta, S. K. (2006). Data Management and Reporting Made Easy with SAS Learning Edition 2.0

6 Exercise 3

A sample of 40 psychology students at a large southwestern university took four subtests (Vocabulary, Similarities, Block Design, and Picture Completion) of the Wechsler (1981) Adult Intelligence Scale-Revised. The researchers also used Magnetic Resonance Imaging (MRI) to determine the brain size of the subjects.

Column 1: Subject ID
Column 2: Gender ID
Column 3: Full scale IQ
Column 4: Verbal IQ
Column 5: Performance IQ
Column 6: Body weight in pounds
Column 7: Height in inches
Column 8: Total pixel counts from 18 MRI scans

Task: Use appropriate lattice graphics to answer the following questions.

Are there gender differences in the three IQ scores?
Is the relationship between height and weight gender dependent?
Is the relationship between IQ and brain size (as measured by MRIcount) gender dependent?

6.1 Data input

brain <- read.table("brainsize.txt",h=T)
head(brain)

##   Sbj Gender FSIQ VIQ PIQ Weight Height MRICount
## 1   1 Female  133 132 124    118   64.5   816932
## 2   2   Male  140 150 124     NA   72.5  1001121
## 3   3   Male  139 123 150    143   73.3  1038437
## 4   4   Male  133 129 128    172   68.8   965353
## 5   5 Female  137 132 134    147   65.0   951545
## 6   6 Female   99  90 110    146   69.0   928799

6.2 Q1

Are there gender differences in the three IQ scores?

densityplot(~ FSIQ + VIQ + PIQ, data = brain, groups = Gender, 
            plot.points = FALSE,
            ref = TRUE,auto.key = list(space = "right"),
            layout=c(3,1),
            par.settings = simpleTheme(col.line = c("#E16A86", "#00AD9A")))

6.3 Q2

Is the relationship between height and weight gender dependent?

xyplot(Height ~ Weight, group = Gender, data = brain,
       type = c("g","r","p"), 
       par.settings = simpleTheme(col = c("#E16A86", "#00AD9A"), 
                                  pch = 20, 
                                  col.line = c("#E16A86", "#00AD9A")),
       auto.key = list(column = 2))

6.4 Q3

Is the relationship between IQ and brain size (as measured by MRIcount) gender dependent?

xyplot(MRICount ~ FSIQ+VIQ+PIQ , 
       group = Gender, 
       data = brain,
       layout=c(3,1),
       type = c("g","r","p"), 
       par.settings = simpleTheme(col = c("#E16A86", "#00AD9A"), 
                                  pch = 20, col.line = c("#E16A86", "#00AD9A")),
       auto.key = list(column = 2))

6.5 Reference

Source: Willerman, L., Schultz, R., Rutledge, J.N., & Bigler, E. (1991), In Vivo Brain Size and Intelligence, Intelligence, 15, 223-228.

20211115 exercises

Chang-Chun Chen

2021-11-15