1 Inclass 1

Task: Render the Ch04.R script for replicating figures in Chapter 4 of Lattice: Multivariate Data Visualization with R (Sarkar, D. 2008) to html document with comments at each code chunk indicated by ‘##’.

1.1 VADeaths

## load package
pacman::p_load(lattice, dplyr, magrittr)

## view data
VADeaths
##       Rural Male Rural Female Urban Male Urban Female
## 50-54       11.7          8.7       15.4          8.4
## 55-59       18.1         11.7       24.3         13.6
## 60-64       26.9         20.3       37.0         19.3
## 65-69       41.0         30.9       54.6         35.1
## 70-74       66.0         54.3       71.1         50.0
## data Classes
class(VADeaths)
## [1] "matrix" "array"
## List all available methods for a S3 and S4 generic function (dotplot), or all methods for an S3 or S4 class.
methods("dotplot")
## [1] dotplot.array*   dotplot.default* dotplot.formula* dotplot.matrix* 
## [5] dotplot.numeric* dotplot.table*  
## see '?methods' for accessing help and source code

1.1.1 dotplot

## dotplot of VADeaths (groups=TRUE shows without grid)
dotplot(VADeaths, groups=FALSE)

## dotplot with 1 by 4 grid
dotplot(VADeaths, groups=FALSE, 
        layout = c(1, 4), # 1column * 4row 
        aspect = "fill", # aspect controls the physical aspect ratio of the panels
        #origin = 0, ##???
        type = c("p", "h"), # "p" denotes a point, "h" for histogram-like vertical lines
        main = "Death Rates in Virginia - 1940", # plot title
        xlab = "Rate (per 1000)") # x axis title

## dotplot with 1 pannel without grid and legend
dotplot(VADeaths, type="o", # "o" for overplotted points and lines
        auto.key=list(lines=TRUE, space="right"), # add legend,
        main="Death Rates in Virginia - 1940", 
        xlab="Rate (per 1000)")

1.1.2 barchart

## barchart with 1 by 4 grid
barchart(VADeaths, groups=FALSE,
         layout = c(1, 4), 
         aspect = "fill", 
         reference = FALSE, ##?
         main = "Death Rates in Virginia - 1940",
         xlab = "Rate (per 100)")

1.2 postdoc

## input postdoc data from latticeExtra package
data(postdoc, package="latticeExtra")

## table class
class(postdoc)
## [1] "table"
knitr::kable(postdoc)
Expected or Additional Training Work with Specific Person Training Outside PhD Field Other Employment Not Available Other
Biological Sciences 6404 2427 1950 1779 602
Chemistry 865 308 292 551 168
Earth, Atmospheric, and Ocean Sciences 343 75 75 238 80
Engineering 586 464 288 517 401
Medical Sciences 205 137 82 68 74
Physics and Astronomy 1010 347 175 399 162
Social and Behavioral Sciences 1368 564 412 514 305
All Postdoctorates 11197 4687 3403 4406 1914

1.2.1 barchart

## proportion barchart
barchart(prop.table(postdoc, margin=1), 
         xlab = "Proportion",
         auto.key = list(adj=1)) # legend word position 

1.2.2 dotplot

## dotplot grid by "Reasons for Taking First Postdoctoral Appointment"
dotplot(prop.table(postdoc, margin = 1), 
        groups = FALSE, 
        xlab = "Proportion",
        par.strip.text = list(abbreviate=TRUE, minlength=10)) # panel title with abbreviate

## 調整不好
dotplot(prop.table(postdoc, margin=1), 
        groups=FALSE, 
        index.cond=function(x, y) median(x),
        xlab="Proportion", 
        layout=c(1, 5), 
        aspect=0.6,
        scales=list(y=list(relation="free", rot=0)), 
        # relation: A character string that determines how axis limits are calculated for each panel. Possible values are "same" (default), "free" and "sliced". 
        
        ## For relation="same", the same limits, usually large enough to encompass all the data, are used for all the panels. 
        ## For relation="free", limits for each panel is determined by just the points in that panel.
        ## Behavior for relation="sliced" is similar, except that the length (max - min) of the scales are constrained to remain the same across panels.
        
        # rot: Angle (in degrees) by which the axis labels are to be rotated. Can be a vector of length 2, to control left/bottom and right/top axes separately.
        prepanel=function(x, y) {
          list(ylim=levels(reorder(y, x)))
          },
        panel=function(x, y, ...) {
            panel.dotplot(x, reorder(y, x), ...)
          }
        )

1.3 Chem97

## input Chem97 data from mlmRev package
data(Chem97, package="mlmRev")
## Cross Tabulation
## count the number of gcsescore (Average GCSE score of individual) by gender
gcsescore.tab <- xtabs(~ gcsescore + gender, Chem97)
## trans to data frame
gcsescore.df <- as.data.frame(gcsescore.tab)
## set the gcsescore as numeric variable
## 因為原本是factor,需要先轉成character
gcsescore.df$gcsescore <- as.numeric(as.character(gcsescore.df$gcsescore))

1.3.1 xyplot

## Freq by gcsescore grid by gender
xyplot(Freq ~ gcsescore | gender, 
       data = gcsescore.df, 
       type="h", 
       layout=c(1, 2), 
       xlab="Average GCSE Score")

## count the number of score (Point score on A-level Chemistry in 1997) by gender
score.tab <- xtabs(~score + gender, Chem97)
## trans to data frame
score.df <- as.data.frame(score.tab)

1.3.2 barchart

## ## Freq by score grid by gender
barchart(Freq ~ score | gender, score.df, origin = 0)

## The End

2 Inclass 2

The lattice function, xyplot, is used to construct a variety of scatter diagrams presented in the Western Collaborative Group Study web page.

Task: Replicate similar scatter diagrams by replacing the grouping variable with the conditioning variable, and vice versa, in the example.

Blood pressure, behavioral patterns, and coronary heart disease

This prospective cohort study was conducted by collaborators from UCSF and the SPH at Berkeley. The participants, all men, were recruited between June 1960 and December 1961. The main disease outcome of interest is coronary heart disease (CHD).
None of the men had CHD at the start of the study. They were followed annually for 8 or 9 years.

id Subject ID: age Age: age in years height Height: height in inches weight Weight: weight in pounds sbp Systolic blood pressure: mm Hg dbp Diastolic blood pressure: mm Hg chol Cholesterol: mg/100 ml behpat Behavior pattern: ncigs Smoking: Cigarettes/day dibpat Dichotomous behavior pattern: 0 = Type B; 1 = Type A chd69 Coronary heart disease event: 0 = none; 1 = yes typechd69 to be done time169 Observation (follow up) time: Days arcus Corneal arcus: 0 = none; 1 = yes

2.1 Data input

fL <- "http://courses.washington.edu/b517/Datasets/wcgs.csv"
dta <- read.csv(fL)
dplyr::glimpse(dta)
## Rows: 3,154
## Columns: 14
## $ id       <int> 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2~
## $ age      <int> 49, 42, 42, 41, 59, 44, 44, 40, 43, 42, 53, 41, 50, 43, 44, 5~
## $ height   <int> 73, 70, 69, 68, 70, 72, 72, 71, 72, 70, 69, 67, 72, 72, 71, 7~
## $ weight   <int> 150, 160, 160, 152, 150, 204, 164, 150, 190, 175, 167, 156, 1~
## $ sbp      <int> 110, 154, 110, 124, 144, 150, 130, 138, 146, 132, 146, 138, 1~
## $ dbp      <int> 76, 84, 78, 78, 86, 90, 84, 60, 76, 90, 94, 96, 90, 80, 80, 8~
## $ chol     <int> 225, 177, 181, 132, 255, 182, 155, 140, 149, 325, 223, 271, 2~
## $ behpat   <chr> "A2", "A2", "B3", "B4", "B3", "B4", "B4", "A2", "B3", "A2", "~
## $ ncigs    <int> 25, 20, 0, 20, 20, 0, 0, 0, 25, 0, 25, 20, 50, 30, 0, 3, 9, 0~
## $ dibpat   <chr> "A1,A2", "A1,A2", "B3,B4", "B3,B4", "B3,B4", "B3,B4", "B3,B4"~
## $ chd69    <chr> "no", "no", "no", "no", "yes", "no", "no", "no", "no", "no", ~
## $ typchd69 <chr> "no CHD", "no CHD", "no CHD", "no CHD", "MI or SD", "no CHD",~
## $ time169  <int> 1664, 3071, 3071, 3064, 1885, 3102, 3074, 3071, 3064, 1032, 3~
## $ arcus    <chr> "absent", "present", "absent", "absent", "present", "absent",~

2.2 Many faces of scatter plot

2.2.1 simple

xyplot(sbp ~ dbp, 
       data=dta,
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)")

2.2.2 add grid and regression line

xyplot(sbp ~ dbp, 
       data=dta,
       cex=.5,
       type=c("p", "g", "r"), 
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)")

2.2.3 spline smooth regression line

xyplot(sbp ~ dbp, 
       data=dta,
       type=c("p", "g", "spline"),
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)")

2.2.4 Title, subtitle, color, alpha

xyplot(sbp ~ dbp, 
       data=dta, 
       type = c("p", "g", "r"), 
       pch=16, 
       cex=.5,
       alpha=.7,
       col="indianred", 
       main="Linear association between blood pressure measures", 
       sub="n=3154", 
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)",
       par.settings=list(par.main.text=list(col="steelblue"), 
                         par.sub.text=list(cex=0.8, 
                                           col="steelblue"), 
                         par.xlab.text=list(cex=0.8, 
                                            col="black"), 
                         par.ylab.text=list(cex=0.8, 
                                            col="black"))) 

2.3 Grouping

2.3.1 CHD event

  • group by CHD event
xyplot(sbp ~ dbp,
       groups=chd69,
       data=dta,
       type=c("p", "g", "r"),
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)",
       auto.key=list(space="right"))

  • group by CHD event

2.3.2 Corneal arcus

missing value in Corneal arcus

xyplot(sbp ~ dbp,
       groups=arcus,
       data=dta,
       type=c("p", "g", "r"),
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)",
       auto.key=list(space="right"))

dta$chdclr <- ifelse(dta$chd69=="no", "lightgreen", "plum")

xyplot(sbp ~ dbp, 
       groups=chd69,
       K=dta$chd69, #K?
       data=dta, 
       type="p",
       scales=list(x=list(lim=c(55, 155)), 
                   y=list(lim=c(90, 240))), 
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)",
       panel=function(x, y, K, groups, ..., subscripts) {
          panel.xyplot(x, y, 
                       col=dta$chdclr, 
                       pch=ifelse(K[subscripts]=="no", 1, 16), 
                       alpha=0.6)
          panel.abline(v=80, 
                       lty=3, 
                       col="red")
          panel.abline(h=120, 
                       lty=3, 
                       col="red")})

2.4 Conditioning

2.4.1 Behavior pattern

grid by Behavior pattern

xyplot(sbp ~ dbp | behpat, 
       data=dta, 
       cex=.7,
       alpha=.4,
       type=c("p", "g", "r"),
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)")

2.4.2 Dichotomous behavior

xyplot(sbp ~ dbp | dibpat, 
       data=dta, 
       cex=.7,
       alpha=.4,
       type=c("p", "g", "r"),
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)")

2.5 Conditoning and grouping

2.5.1 condition by behpat, group by chd69

xyplot(sbp ~ dbp | behpat, groups=chd69,
       data=dta, 
       cex=.7,
       alpha=.4,
       layout=c(4, 1),
       type=c("p", "g", "r"),
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)",
       auto.key=list(space="top", 
                     text=c("no", "yes"), 
                     cex=0.7, 
                     column=2,
                     lines=TRUE, 
                     points=TRUE))

2.5.2 condition by dibpat, group by arcus

xyplot(sbp ~ dbp | dibpat, groups=arcus,
       data=dta, 
       cex=.7,
       alpha=.4,
       layout=c(2, 1),
       type=c("p", "g", "r"),
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)",
       auto.key=list(space="top", 
                     text=c("NA", "no", "yes"), 
                     cex=0.7, 
                     column=2,
                     lines=TRUE, 
                     points=TRUE))

xyplot(sbp ~ dbp | behpat, 
       data=dta, 
       col=dta$chdclr,
       type=c("p", "g"), 
       pch=19, 
       alpha=0.5, 
       xlab="Diastolic blood pressure (mmHg)",
       ylab="Systolic blood pressure (mmHg)",
       par.settings=list(layout.heights=list(strip=0.9)),
                         par.strip.text=list(cex=0.7),
                         key=list(space="top",
                                  title="CHD status",
                                  cex.title=0.6,
                                  column=2,
                                  points=list(col=c("lightgreen", "plum"),
                                              cex=0.5, 
                                              pch=19),   
                                  text=list(lab=c("No", "Yes"),                                                          cex=0.7)))

2.6 References

Rosenman, RH et al. (1975). Coronary Heart Disease in the Western Collaborative Group Study. Journal of American Medical Association, 233(8), 872-877.

Vittinghoff, D.V. et al (2012). Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models. 2nd Ed. Springer-Verlag, Inc.

Data source

3 Inclass 3

The data set concerns student evaluation of instructor’s beauty and teaching quality for several courses at the University of Texas. The teaching evaluations were done at the end of the semester, and the beauty judgments were made later, by six students who had not attended the classes and were not aware of the course evaluations.

Source: Hamermesh, D.S., & Parker, A.M. (2005). Beauty in the classroom: instructor’s pulchritude and putative pedagogical productivity.a Economics and Education Review, 24, 369-376. Reported in Gelman, A., & Hill, J. (2006). Data analysis using regression and hierarchical/multilevel models. p. 51.

Data

Column 1: Course evaluation score
Column 2: Beauty score
Column 3: Gender of professor, 1 = Female, 0 = Male
Column 4: Pofessor age in years
Column 5: Minority status of professor, 1 = Minority, 0 = Others
Column 6: Tenure status of professor, 1 = Tenured, 0 = No
Column 7: Course ID

Task: Use the lattice package to produce the plot above. Hint: Use ‘reorder’ after obtaining regression coefficients to rearrange conditioning panels.

3.1 Data input

dat <- read.table(file="beautyCourseEval.txt", h=T, sep="")
head(dat)
##   eval     beauty sex age minority tenure courseID
## 1  4.3  0.2015666   1  36        1      0        3
## 2  4.5 -0.8260813   0  59        0      1        0
## 3  3.7 -0.6603327   0  51        0      1        4
## 4  4.3 -0.7663125   1  40        0      1        2
## 5  4.4  1.4214450   1  31        0      0        0
## 6  4.2  0.5002196   0  62        0      1        0

3.2 Plot

xyplot(eval ~ beauty | as.factor(courseID), 
       data = subset(dat, courseID!=0),
       xlab = "Beauty judgement score", 
       ylab = "Average course evaluation score",
       layout = c(6,5),
       type=c("p", "g", "r"),
       par.settings = simpleTheme(col = "steelblue", col.line = "steelblue"))

xyplot(eval ~ beauty | as.factor(courseID), 
       data = subset(dat, courseID==0),
       xlab = "Beauty judgement score", 
       ylab = "Average course evaluation score",
       type=c("p", "g", "r"),
       par.settings = simpleTheme(col = "steelblue", col.line = "steelblue"))

4 Exercises 1

Eight different physical measurements of 30 French girls were recorded from 4 to 15 years old.

Task: Explore various ways to display the data using trellis graphics.

Source: Sempe, M., et al. (1987). Multivariate and longitudinal data on growing children: Presentation of the French auxiological survey. In J. Janssen, et al. (1987). Data analysis. The Ins and Outs of solving real problems (pp. 3-6). New York: Plenum Press.

Column 1: Weight in grams
Column 2: Height in mms
Column 3: Head to butt length in mms
Column 4: Head circumference in mms
Column 5: Chest circumference in mms
Column 6: Arm length in mms
Column 7: Calf length in mms
Column 8: Pelvis circumference in mms
Column 9: Age in years
Column 10: Girl ID

Load in the data set and check the data structure

dat <- read.table("girlsGrowth.txt", header = TRUE)
head(dat)
##     Wt   Ht  Hb  Hc  Cc Arm Calf Pelvis age id
## 1 1456 1025 602 486 520 157  205    170   4 S1
## 2 1426  998 572 501 520 150  215    169   4 S2
## 3 1335  961 560 494 495 145  214    158   4 S3
## 4 1607 1006 595 497 560 178  218    172   4 S4
## 5 1684 1012 584 490 553 165  220    158   4 S5
## 6 1374 1012 580 492 525 158  202    167   4 S6
summary(dat)
##        Wt             Ht             Hb              Hc              Cc       
##  Min.   :1214   Min.   : 932   Min.   :546.0   Min.   :465.0   Min.   :476.0  
##  1st Qu.:1981   1st Qu.:1145   1st Qu.:631.0   1st Qu.:508.0   1st Qu.:575.0  
##  Median :2778   Median :1312   Median :694.5   Median :522.0   Median :630.0  
##  Mean   :3041   Mean   :1318   Mean   :707.2   Mean   :522.8   Mean   :656.4  
##  3rd Qu.:4025   3rd Qu.:1504   3rd Qu.:786.0   3rd Qu.:537.0   3rd Qu.:740.0  
##  Max.   :6285   Max.   :1682   Max.   :905.0   Max.   :585.0   Max.   :935.0  
##       Arm             Calf           Pelvis           age       
##  Min.   :144.0   Min.   :198.0   Min.   :150.0   Min.   : 4.00  
##  1st Qu.:170.0   1st Qu.:234.8   1st Qu.:190.0   1st Qu.: 6.75  
##  Median :186.5   Median :266.0   Median :209.0   Median : 9.50  
##  Mean   :192.8   Mean   :271.5   Mean   :214.8   Mean   : 9.50  
##  3rd Qu.:216.0   3rd Qu.:309.2   3rd Qu.:241.0   3rd Qu.:12.25  
##  Max.   :285.0   Max.   :380.0   Max.   :305.0   Max.   :15.00  
##       id           
##  Length:360        
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
length(unique(dat$age))
## [1] 12
# Present relationship between weight and height for each year and generally
xyplot(Wt ~ Ht, groups = factor(age), data = dat,
       auto.key = list(column=4), pch=16,
       type = c('p', 'r', 'g'), xlab= 'Height', ylab = 'Weight')

# Correlation
with(dat, cor(Wt, Ht))
## [1] 0.9491352
split(dat, dat$age) %>% 
  sapply(., function(df) {with(df, cor(Wt, Ht))})
##         4         5         6         7         8         9        10        11 
## 0.6589876 0.7131878 0.7563261 0.7550721 0.7381013 0.6127379 0.6706406 0.6738371 
##        12        13        14        15 
## 0.7103265 0.6313972 0.4659650 0.3489307
# Intecept and slope in linear model (weight ~ height)
split(dat, dat$age) %>%
  sapply(., function(df) {coefficients(lm(Wt ~ Ht, data = df))[1]})
##  4.(Intercept)  5.(Intercept)  6.(Intercept)  7.(Intercept)  8.(Intercept) 
##      -1492.131      -2071.391      -2397.710      -2525.049      -3362.959 
##  9.(Intercept) 10.(Intercept) 11.(Intercept) 12.(Intercept) 13.(Intercept) 
##      -3230.680      -4717.608      -4661.448      -5253.717      -6183.765 
## 14.(Intercept) 15.(Intercept) 
##      -4622.857      -1026.453
split(dat, dat$age) %>%
  sapply(., function(df) {coefficients(lm(Wt ~ Ht, data = df))[2]})
##     4.Ht     5.Ht     6.Ht     7.Ht     8.Ht     9.Ht    10.Ht    11.Ht 
## 3.019906 3.558213 3.827905 3.936904 4.638774 4.579293 5.745730 5.712182 
##    12.Ht    13.Ht    14.Ht    15.Ht 
## 6.131886 6.841161 5.939274 3.772245
  • Generally, height highly positively correlated with weight. But when grouping with age, the correaltion are not so high.
  • In the linear models of weight ~ height, all slopes are positive. This indicates that from age 4-15, height keep positively correlatied with weight in girls.
  • While the intercept of model with age 15 is quite different from others, indicating that girls get heavier in age 15 than other age.
# Present the relationship between weight (Wt) and other measurements for each age
# Standardize each measurement

dat_std <- dat[,1:9] %>% 
  mutate_all(function(x) {as.numeric(robustHD::standardize(x))})
dat_std$age <- dat$age
dat_std$id <- dat$id
summary(dat_std)
##        Wt                Ht                 Hb                Hc          
##  Min.   :-1.4813   Min.   :-1.89954   Min.   :-1.7435   Min.   :-2.78039  
##  1st Qu.:-0.8593   1st Qu.:-0.85086   1st Qu.:-0.8241   1st Qu.:-0.71183  
##  Median :-0.2137   Median :-0.03113   Median :-0.1372   Median :-0.03835  
##  Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.0000   Mean   : 0.00000  
##  3rd Qu.: 0.7977   3rd Qu.: 0.91785   3rd Qu.: 0.8526   3rd Qu.: 0.68324  
##  Max.   : 2.6301   Max.   : 1.79298   Max.   : 2.1399   Max.   : 2.99233  
##        Cc               Arm               Calf             Pelvis       
##  Min.   :-1.7838   Min.   :-1.6862   Min.   :-1.7252   Min.   :-1.9485  
##  1st Qu.:-0.8049   1st Qu.:-0.7872   1st Qu.:-0.8628   1st Qu.:-0.7464  
##  Median :-0.2610   Median :-0.2167   Median :-0.1295   Median :-0.1755  
##  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000  
##  3rd Qu.: 0.8267   3rd Qu.: 0.8034   3rd Qu.: 0.8855   3rd Qu.: 0.7862  
##  Max.   : 2.7548   Max.   : 3.1892   Max.   : 2.5457   Max.   : 2.7095  
##       age             id           
##  Min.   : 4.00   Length:360        
##  1st Qu.: 6.75   Class :character  
##  Median : 9.50   Mode  :character  
##  Mean   : 9.50                     
##  3rd Qu.:12.25                     
##  Max.   :15.00
#Turn data set into long form
dat_Wt <- dat_std |> select(-Wt, -id, -age) |> stack()
dat_Wt$Wt <- rep(dat_std$Wt, ncol(dat_std)-3)
dat_Wt$age <- rep(dat_std$age, ncol(dat_std)-3)
dat_Wt$id <- rep(dat_std$id, ncol(dat_std)-3)
str(dat_Wt)
## 'data.frame':    2520 obs. of  5 variables:
##  $ values: num  -1.44 -1.57 -1.76 -1.54 -1.51 ...
##  $ ind   : Factor w/ 7 levels "Ht","Hb","Hc",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Wt    : num  -1.29 -1.31 -1.38 -1.16 -1.1 ...
##  $ age   : int  4 4 4 4 4 4 4 4 4 4 ...
##  $ id    : chr  "S1" "S2" "S3" "S4" ...
xyplot(Wt ~ values | ind, 
       groups = paste0('Age ', age), 
       data = dat_Wt,
       auto.key = list(column=4), 
       pch=16, cex=.4, type = c('p', 'r', 'g'),
       xlab= 'Standardized measurement', 
       ylab = 'Standardized weight')

The slopes of Ht in predicting Wt are relatively similar in different ages. While slopes of other measurement in predicting Wt differ in different ages. Take head circumference (Hc) as an example, the slope of Hc in predicting Wt get higher when girls get older.

# Try another variables: Present the relationship between one measuremeant and other measurments for each age

# Put codes into a function
xyplots_dat <- function(var_name, var_label) {
  library(lattice)
  
  # Turn data set into long form
  dat_var <- dat_std %>% select(-var_name, -id, -age) %>% stack()
  dat_var$var_name <- rep(dat_std[,var_name], ncol(dat_std)-3)
  dat_var$age <- rep(dat_std$age, ncol(dat_std)-3)
  dat_var$id <- rep(dat_std$id, ncol(dat_std)-3)
  
  # Plot
  xyplot(var_name ~ values | ind, groups = paste0('Age ', age), data = dat_var,
         auto.key = list(column=4), pch=16, cex=.4, type = c('p', 'r', 'g'), 
         xlab = 'Standardized measurement',
         ylab = paste0('Standardized ', var_label))
}
# Head to butt length
xyplots_dat(var_name = 'Hb', var_label = 'length of head to butt')
## Note: Using an external vector in selections is ambiguous.
## i Use `all_of(var_name)` instead of `var_name` to silence this message.
## i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.

- In linear models of Hb ~ Ht of each age, both of intecepts and slopes are similar. - In linear models of Hb ~ Hc of each age,slopes are similar while intecepts differ.

# Arm
xyplots_dat(var_name = 'Arm', var_label = 'length of arm')

  • In linear models of Arm ~ Ht of each age, slope of age 15 is negative, which is quite different from other ages.
  • In linear models of Hb ~ Hb of each age, slopes of age 6 and 15 are quite smaller than other ages.

5 Exercise 2

Your manager gave you a sales data on several products in a SAS format.

Task: Your task is to summarize and report the data in tables and graphs using the R lattice package.

Re-code the region variable (1 to 4) by “Nothern”, “Southern”, “Eastern” and “Western”;
the district variable (1 - 5) by “North East”, “South East”, “South West”, “North West”, “Central West”;
the quarter variable (1-4) by “1st”, “2nd”, “3rd”, “4th”;
and the month variable (1-12) by “Jan”, “Feb”, etc. Set negative sales values to zero.

pacman::p_load('sas7bdat')
dat <- read.sas7bdat("sales.sas7bdat")
head(dat)
##    product category customer year month quarter market sales expense region
## 1    Shoes    Shoes     Acme 2001     1       1      1   300     240      1
## 2    Boots    Shoes     Acme 2001     1       1      1  2200    1540      1
## 3 Slippers Slippers     Acme 2001     1       1      1   900     540      1
## 4    Shoes    Shoes     Acme 2001     2       1      1   100      80      1
## 5    Boots    Shoes     Acme 2001     2       1      1  1400     980      1
## 6 Slippers Slippers     Acme 2001     2       1      1     0       0      1
##   district return constantv quantity
## 1        1      0         1       30
## 2        1      0         1      275
## 3        1      0         1      180
## 4        1      0         1       10
## 5        1      0         1      175
## 6        1      0         1        0
summary(dat)
##    product            category           customer              year     
##  Length:72          Length:72          Length:72          Min.   :2001  
##  Class :character   Class :character   Class :character   1st Qu.:2001  
##  Mode  :character   Mode  :character   Mode  :character   Median :2002  
##                                                           Mean   :2002  
##                                                           3rd Qu.:2002  
##                                                           Max.   :2002  
##      month          quarter         market          sales          expense    
##  Min.   : 1.00   Min.   :1.00   Min.   :1.000   Min.   :-1400   Min.   :-980  
##  1st Qu.: 3.75   1st Qu.:1.75   1st Qu.:1.000   1st Qu.: 1000   1st Qu.: 660  
##  Median : 6.50   Median :2.50   Median :2.000   Median : 1550   Median :1065  
##  Mean   : 6.50   Mean   :2.50   Mean   :1.667   Mean   : 1686   Mean   :1172  
##  3rd Qu.: 9.25   3rd Qu.:3.25   3rd Qu.:2.000   3rd Qu.: 2525   3rd Qu.:1860  
##  Max.   :12.00   Max.   :4.00   Max.   :2.000   Max.   : 4700   Max.   :2960  
##      region         district       return         constantv    quantity    
##  Min.   :1.000   Min.   :1.0   Min.   :0.0000   Min.   :1   Min.   :  0.0  
##  1st Qu.:1.000   1st Qu.:1.0   1st Qu.:0.0000   1st Qu.:1   1st Qu.:135.2  
##  Median :1.000   Median :1.0   Median :0.0000   Median :1   Median :220.0  
##  Mean   :1.333   Mean   :1.5   Mean   :0.6667   Mean   :1   Mean   :248.9  
##  3rd Qu.:1.000   3rd Qu.:1.0   3rd Qu.:0.0000   3rd Qu.:1   3rd Qu.:287.8  
##  Max.   :4.000   Max.   :5.0   Max.   :5.0000   Max.   :1   Max.   :940.0
# See the distribution of scales for each region.
densityplot(~ quantity | as.character(region), data = dat, layout=c(1, 3))

There is no sale in Eastern region (region=3).

Compare sales of each quarter

bwplot(quantity ~ factor(quarter), data = dat, xlab="Quarter")

Generally, the sale increase as the quarter goes by.

Compare sales of each market

bwplot(quantity ~ factor(market), data = dat, xlab="Market")

There is a larger sale in market 2 than makret 1. And the variation of sales of market 1 is smaller than that of market 2.

Compare sales of each product

bwplot(quantity ~ factor(product), data = dat, xlab="Market")

Slippers have a better sale.

Compare sales of each product in 2 markets

bwplot(quantity ~ factor(product) | factor(market), data = dat, xlab="Market")

The sale of slippers is not that good in market 1.

Compare sales change of months for each product

dotplot(quantity ~ month, groups = product, data = dat, 
        xlab="Month", ylab="Quantity", 
        type=c('p', 'g',"r"), auto.key=list(space="top", columns=3))

Compare sales of each customer in 2 markets

bwplot(quantity ~ factor(customer) | factor(market), data = dat, xlab="Market")

We have only one customer, Acme, in market 1.

Compare distribution of sales of each product in four quaters.

densityplot(~ quantity | factor(quarter), groups = product, data = dat, 
            xlab = "Quarter", auto.key = list(column=3), layout = c(1, 4))

Finding

  • We should develop markets in Eastern region
  • Sale of TwoFeet and BigX are lower than Acme. We should put more efforts on attracting these two customers, especially in market 1.
  • Keep a good relationship with our biggest customer, Acme.
  • Put more resources on marketing boots and shoes.
  • Try to develop products for spring to increase sales in month.

5.1 Reference

Source: Gupta, S. K. (2006). Data Management and Reporting Made Easy with SAS Learning Edition 2.0

6 Exercise 3

A sample of 40 psychology students at a large southwestern university took four subtests (Vocabulary, Similarities, Block Design, and Picture Completion) of the Wechsler (1981) Adult Intelligence Scale-Revised. The researchers also used Magnetic Resonance Imaging (MRI) to determine the brain size of the subjects.

  • Column 1: Subject ID
  • Column 2: Gender ID
  • Column 3: Full scale IQ
  • Column 4: Verbal IQ
  • Column 5: Performance IQ
  • Column 6: Body weight in pounds
  • Column 7: Height in inches
  • Column 8: Total pixel counts from 18 MRI scans

Task: Use appropriate lattice graphics to answer the following questions.

  • Are there gender differences in the three IQ scores?
  • Is the relationship between height and weight gender dependent?
  • Is the relationship between IQ and brain size (as measured by MRIcount) gender dependent?

6.1 Data input

brain <- read.table("brainsize.txt",h=T)
head(brain)
##   Sbj Gender FSIQ VIQ PIQ Weight Height MRICount
## 1   1 Female  133 132 124    118   64.5   816932
## 2   2   Male  140 150 124     NA   72.5  1001121
## 3   3   Male  139 123 150    143   73.3  1038437
## 4   4   Male  133 129 128    172   68.8   965353
## 5   5 Female  137 132 134    147   65.0   951545
## 6   6 Female   99  90 110    146   69.0   928799

6.2 Q1

  • Are there gender differences in the three IQ scores?
densityplot(~ FSIQ + VIQ + PIQ, data = brain, groups = Gender, 
            plot.points = FALSE,
            ref = TRUE,auto.key = list(space = "right"),
            layout=c(3,1),
            par.settings = simpleTheme(col.line = c("#E16A86", "#00AD9A")))

6.3 Q2

  • Is the relationship between height and weight gender dependent?
xyplot(Height ~ Weight, group = Gender, data = brain,
       type = c("g","r","p"), 
       par.settings = simpleTheme(col = c("#E16A86", "#00AD9A"), 
                                  pch = 20, 
                                  col.line = c("#E16A86", "#00AD9A")),
       auto.key = list(column = 2))

6.4 Q3

  • Is the relationship between IQ and brain size (as measured by MRIcount) gender dependent?
xyplot(MRICount ~ FSIQ+VIQ+PIQ , 
       group = Gender, 
       data = brain,
       layout=c(3,1),
       type = c("g","r","p"), 
       par.settings = simpleTheme(col = c("#E16A86", "#00AD9A"), 
                                  pch = 20, col.line = c("#E16A86", "#00AD9A")),
       auto.key = list(column = 2))

6.5 Reference

Source: Willerman, L., Schultz, R., Rutledge, J.N., & Bigler, E. (1991), In Vivo Brain Size and Intelligence, Intelligence, 15, 223-228.