Task: Render the Ch04.R script for replicating figures in Chapter 4 of Lattice: Multivariate Data Visualization with R (Sarkar, D. 2008) to html document with comments at each code chunk indicated by ‘##’.
## load package
::p_load(lattice, dplyr, magrittr)
pacman
## view data
VADeaths
## Rural Male Rural Female Urban Male Urban Female
## 50-54 11.7 8.7 15.4 8.4
## 55-59 18.1 11.7 24.3 13.6
## 60-64 26.9 20.3 37.0 19.3
## 65-69 41.0 30.9 54.6 35.1
## 70-74 66.0 54.3 71.1 50.0
## data Classes
class(VADeaths)
## [1] "matrix" "array"
## List all available methods for a S3 and S4 generic function (dotplot), or all methods for an S3 or S4 class.
methods("dotplot")
## [1] dotplot.array* dotplot.default* dotplot.formula* dotplot.matrix*
## [5] dotplot.numeric* dotplot.table*
## see '?methods' for accessing help and source code
## dotplot of VADeaths (groups=TRUE shows without grid)
dotplot(VADeaths, groups=FALSE)
## dotplot with 1 by 4 grid
dotplot(VADeaths, groups=FALSE,
layout = c(1, 4), # 1column * 4row
aspect = "fill", # aspect controls the physical aspect ratio of the panels
#origin = 0, ##???
type = c("p", "h"), # "p" denotes a point, "h" for histogram-like vertical lines
main = "Death Rates in Virginia - 1940", # plot title
xlab = "Rate (per 1000)") # x axis title
## dotplot with 1 pannel without grid and legend
dotplot(VADeaths, type="o", # "o" for overplotted points and lines
auto.key=list(lines=TRUE, space="right"), # add legend,
main="Death Rates in Virginia - 1940",
xlab="Rate (per 1000)")
## barchart with 1 by 4 grid
barchart(VADeaths, groups=FALSE,
layout = c(1, 4),
aspect = "fill",
reference = FALSE, ##?
main = "Death Rates in Virginia - 1940",
xlab = "Rate (per 100)")
## input postdoc data from latticeExtra package
data(postdoc, package="latticeExtra")
## table class
class(postdoc)
## [1] "table"
::kable(postdoc) knitr
Expected or Additional Training | Work with Specific Person | Training Outside PhD Field | Other Employment Not Available | Other | |
---|---|---|---|---|---|
Biological Sciences | 6404 | 2427 | 1950 | 1779 | 602 |
Chemistry | 865 | 308 | 292 | 551 | 168 |
Earth, Atmospheric, and Ocean Sciences | 343 | 75 | 75 | 238 | 80 |
Engineering | 586 | 464 | 288 | 517 | 401 |
Medical Sciences | 205 | 137 | 82 | 68 | 74 |
Physics and Astronomy | 1010 | 347 | 175 | 399 | 162 |
Social and Behavioral Sciences | 1368 | 564 | 412 | 514 | 305 |
All Postdoctorates | 11197 | 4687 | 3403 | 4406 | 1914 |
## proportion barchart
barchart(prop.table(postdoc, margin=1),
xlab = "Proportion",
auto.key = list(adj=1)) # legend word position
## dotplot grid by "Reasons for Taking First Postdoctoral Appointment"
dotplot(prop.table(postdoc, margin = 1),
groups = FALSE,
xlab = "Proportion",
par.strip.text = list(abbreviate=TRUE, minlength=10)) # panel title with abbreviate
## 調整不好
dotplot(prop.table(postdoc, margin=1),
groups=FALSE,
index.cond=function(x, y) median(x),
xlab="Proportion",
layout=c(1, 5),
aspect=0.6,
scales=list(y=list(relation="free", rot=0)),
# relation: A character string that determines how axis limits are calculated for each panel. Possible values are "same" (default), "free" and "sliced".
## For relation="same", the same limits, usually large enough to encompass all the data, are used for all the panels.
## For relation="free", limits for each panel is determined by just the points in that panel.
## Behavior for relation="sliced" is similar, except that the length (max - min) of the scales are constrained to remain the same across panels.
# rot: Angle (in degrees) by which the axis labels are to be rotated. Can be a vector of length 2, to control left/bottom and right/top axes separately.
prepanel=function(x, y) {
list(ylim=levels(reorder(y, x)))
},panel=function(x, y, ...) {
panel.dotplot(x, reorder(y, x), ...)
} )
## input Chem97 data from mlmRev package
data(Chem97, package="mlmRev")
## Cross Tabulation
## count the number of gcsescore (Average GCSE score of individual) by gender
<- xtabs(~ gcsescore + gender, Chem97) gcsescore.tab
## trans to data frame
<- as.data.frame(gcsescore.tab) gcsescore.df
## set the gcsescore as numeric variable
## 因為原本是factor,需要先轉成character
$gcsescore <- as.numeric(as.character(gcsescore.df$gcsescore)) gcsescore.df
## Freq by gcsescore grid by gender
xyplot(Freq ~ gcsescore | gender,
data = gcsescore.df,
type="h",
layout=c(1, 2),
xlab="Average GCSE Score")
## count the number of score (Point score on A-level Chemistry in 1997) by gender
<- xtabs(~score + gender, Chem97) score.tab
## trans to data frame
<- as.data.frame(score.tab) score.df
## ## Freq by score grid by gender
barchart(Freq ~ score | gender, score.df, origin = 0)
## The End
The lattice function, xyplot, is used to construct a variety of scatter diagrams presented in the Western Collaborative Group Study web page.
Task: Replicate similar scatter diagrams by replacing the grouping variable with the conditioning variable, and vice versa, in the example.
Blood pressure, behavioral patterns, and coronary heart disease
This prospective cohort study was conducted by collaborators from UCSF and the SPH at Berkeley. The participants, all men, were recruited between June 1960 and December 1961. The main disease outcome of interest is coronary heart disease (CHD).
None of the men had CHD at the start of the study. They were followed annually for 8 or 9 years.
id Subject ID: age Age: age in years height Height: height in inches weight Weight: weight in pounds sbp Systolic blood pressure: mm Hg dbp Diastolic blood pressure: mm Hg chol Cholesterol: mg/100 ml behpat Behavior pattern: ncigs Smoking: Cigarettes/day dibpat Dichotomous behavior pattern: 0 = Type B; 1 = Type A chd69 Coronary heart disease event: 0 = none; 1 = yes typechd69 to be done time169 Observation (follow up) time: Days arcus Corneal arcus: 0 = none; 1 = yes
<- "http://courses.washington.edu/b517/Datasets/wcgs.csv"
fL <- read.csv(fL)
dta ::glimpse(dta) dplyr
## Rows: 3,154
## Columns: 14
## $ id <int> 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2~
## $ age <int> 49, 42, 42, 41, 59, 44, 44, 40, 43, 42, 53, 41, 50, 43, 44, 5~
## $ height <int> 73, 70, 69, 68, 70, 72, 72, 71, 72, 70, 69, 67, 72, 72, 71, 7~
## $ weight <int> 150, 160, 160, 152, 150, 204, 164, 150, 190, 175, 167, 156, 1~
## $ sbp <int> 110, 154, 110, 124, 144, 150, 130, 138, 146, 132, 146, 138, 1~
## $ dbp <int> 76, 84, 78, 78, 86, 90, 84, 60, 76, 90, 94, 96, 90, 80, 80, 8~
## $ chol <int> 225, 177, 181, 132, 255, 182, 155, 140, 149, 325, 223, 271, 2~
## $ behpat <chr> "A2", "A2", "B3", "B4", "B3", "B4", "B4", "A2", "B3", "A2", "~
## $ ncigs <int> 25, 20, 0, 20, 20, 0, 0, 0, 25, 0, 25, 20, 50, 30, 0, 3, 9, 0~
## $ dibpat <chr> "A1,A2", "A1,A2", "B3,B4", "B3,B4", "B3,B4", "B3,B4", "B3,B4"~
## $ chd69 <chr> "no", "no", "no", "no", "yes", "no", "no", "no", "no", "no", ~
## $ typchd69 <chr> "no CHD", "no CHD", "no CHD", "no CHD", "MI or SD", "no CHD",~
## $ time169 <int> 1664, 3071, 3071, 3064, 1885, 3102, 3074, 3071, 3064, 1032, 3~
## $ arcus <chr> "absent", "present", "absent", "absent", "present", "absent",~
xyplot(sbp ~ dbp,
data=dta,
xlab="Diastolic blood pressure (mmHg)",
ylab="Systolic blood pressure (mmHg)")
xyplot(sbp ~ dbp,
data=dta,
cex=.5,
type=c("p", "g", "r"),
xlab="Diastolic blood pressure (mmHg)",
ylab="Systolic blood pressure (mmHg)")
xyplot(sbp ~ dbp,
data=dta,
type=c("p", "g", "spline"),
xlab="Diastolic blood pressure (mmHg)",
ylab="Systolic blood pressure (mmHg)")
xyplot(sbp ~ dbp,
data=dta,
type = c("p", "g", "r"),
pch=16,
cex=.5,
alpha=.7,
col="indianred",
main="Linear association between blood pressure measures",
sub="n=3154",
xlab="Diastolic blood pressure (mmHg)",
ylab="Systolic blood pressure (mmHg)",
par.settings=list(par.main.text=list(col="steelblue"),
par.sub.text=list(cex=0.8,
col="steelblue"),
par.xlab.text=list(cex=0.8,
col="black"),
par.ylab.text=list(cex=0.8,
col="black")))
xyplot(sbp ~ dbp,
groups=chd69,
data=dta,
type=c("p", "g", "r"),
xlab="Diastolic blood pressure (mmHg)",
ylab="Systolic blood pressure (mmHg)",
auto.key=list(space="right"))
missing value in Corneal arcus
xyplot(sbp ~ dbp,
groups=arcus,
data=dta,
type=c("p", "g", "r"),
xlab="Diastolic blood pressure (mmHg)",
ylab="Systolic blood pressure (mmHg)",
auto.key=list(space="right"))
$chdclr <- ifelse(dta$chd69=="no", "lightgreen", "plum")
dta
xyplot(sbp ~ dbp,
groups=chd69,
K=dta$chd69, #K?
data=dta,
type="p",
scales=list(x=list(lim=c(55, 155)),
y=list(lim=c(90, 240))),
xlab="Diastolic blood pressure (mmHg)",
ylab="Systolic blood pressure (mmHg)",
panel=function(x, y, K, groups, ..., subscripts) {
panel.xyplot(x, y,
col=dta$chdclr,
pch=ifelse(K[subscripts]=="no", 1, 16),
alpha=0.6)
panel.abline(v=80,
lty=3,
col="red")
panel.abline(h=120,
lty=3,
col="red")})
grid by Behavior pattern
xyplot(sbp ~ dbp | behpat,
data=dta,
cex=.7,
alpha=.4,
type=c("p", "g", "r"),
xlab="Diastolic blood pressure (mmHg)",
ylab="Systolic blood pressure (mmHg)")
xyplot(sbp ~ dbp | dibpat,
data=dta,
cex=.7,
alpha=.4,
type=c("p", "g", "r"),
xlab="Diastolic blood pressure (mmHg)",
ylab="Systolic blood pressure (mmHg)")
xyplot(sbp ~ dbp | behpat, groups=chd69,
data=dta,
cex=.7,
alpha=.4,
layout=c(4, 1),
type=c("p", "g", "r"),
xlab="Diastolic blood pressure (mmHg)",
ylab="Systolic blood pressure (mmHg)",
auto.key=list(space="top",
text=c("no", "yes"),
cex=0.7,
column=2,
lines=TRUE,
points=TRUE))
xyplot(sbp ~ dbp | dibpat, groups=arcus,
data=dta,
cex=.7,
alpha=.4,
layout=c(2, 1),
type=c("p", "g", "r"),
xlab="Diastolic blood pressure (mmHg)",
ylab="Systolic blood pressure (mmHg)",
auto.key=list(space="top",
text=c("NA", "no", "yes"),
cex=0.7,
column=2,
lines=TRUE,
points=TRUE))
xyplot(sbp ~ dbp | behpat,
data=dta,
col=dta$chdclr,
type=c("p", "g"),
pch=19,
alpha=0.5,
xlab="Diastolic blood pressure (mmHg)",
ylab="Systolic blood pressure (mmHg)",
par.settings=list(layout.heights=list(strip=0.9)),
par.strip.text=list(cex=0.7),
key=list(space="top",
title="CHD status",
cex.title=0.6,
column=2,
points=list(col=c("lightgreen", "plum"),
cex=0.5,
pch=19),
text=list(lab=c("No", "Yes"), cex=0.7)))
Rosenman, RH et al. (1975). Coronary Heart Disease in the Western Collaborative Group Study. Journal of American Medical Association, 233(8), 872-877.
Vittinghoff, D.V. et al (2012). Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models. 2nd Ed. Springer-Verlag, Inc.
The data set concerns student evaluation of instructor’s beauty and teaching quality for several courses at the University of Texas. The teaching evaluations were done at the end of the semester, and the beauty judgments were made later, by six students who had not attended the classes and were not aware of the course evaluations.
Source: Hamermesh, D.S., & Parker, A.M. (2005). Beauty in the classroom: instructor’s pulchritude and putative pedagogical productivity.a Economics and Education Review, 24, 369-376. Reported in Gelman, A., & Hill, J. (2006). Data analysis using regression and hierarchical/multilevel models. p. 51.
Data
Column 1: Course evaluation score
Column 2: Beauty score
Column 3: Gender of professor, 1 = Female, 0 = Male
Column 4: Pofessor age in years
Column 5: Minority status of professor, 1 = Minority, 0 = Others
Column 6: Tenure status of professor, 1 = Tenured, 0 = No
Column 7: Course ID
Task: Use the lattice package to produce the plot above. Hint: Use ‘reorder’ after obtaining regression coefficients to rearrange conditioning panels.
<- read.table(file="beautyCourseEval.txt", h=T, sep="")
dat head(dat)
## eval beauty sex age minority tenure courseID
## 1 4.3 0.2015666 1 36 1 0 3
## 2 4.5 -0.8260813 0 59 0 1 0
## 3 3.7 -0.6603327 0 51 0 1 4
## 4 4.3 -0.7663125 1 40 0 1 2
## 5 4.4 1.4214450 1 31 0 0 0
## 6 4.2 0.5002196 0 62 0 1 0
xyplot(eval ~ beauty | as.factor(courseID),
data = subset(dat, courseID!=0),
xlab = "Beauty judgement score",
ylab = "Average course evaluation score",
layout = c(6,5),
type=c("p", "g", "r"),
par.settings = simpleTheme(col = "steelblue", col.line = "steelblue"))
xyplot(eval ~ beauty | as.factor(courseID),
data = subset(dat, courseID==0),
xlab = "Beauty judgement score",
ylab = "Average course evaluation score",
type=c("p", "g", "r"),
par.settings = simpleTheme(col = "steelblue", col.line = "steelblue"))
Eight different physical measurements of 30 French girls were recorded from 4 to 15 years old.
Task: Explore various ways to display the data using trellis graphics.
Source: Sempe, M., et al. (1987). Multivariate and longitudinal data on growing children: Presentation of the French auxiological survey. In J. Janssen, et al. (1987). Data analysis. The Ins and Outs of solving real problems (pp. 3-6). New York: Plenum Press.
Column 1: Weight in grams
Column 2: Height in mms
Column 3: Head to butt length in mms
Column 4: Head circumference in mms
Column 5: Chest circumference in mms
Column 6: Arm length in mms
Column 7: Calf length in mms
Column 8: Pelvis circumference in mms
Column 9: Age in years
Column 10: Girl ID
Load in the data set and check the data structure
<- read.table("girlsGrowth.txt", header = TRUE)
dat head(dat)
## Wt Ht Hb Hc Cc Arm Calf Pelvis age id
## 1 1456 1025 602 486 520 157 205 170 4 S1
## 2 1426 998 572 501 520 150 215 169 4 S2
## 3 1335 961 560 494 495 145 214 158 4 S3
## 4 1607 1006 595 497 560 178 218 172 4 S4
## 5 1684 1012 584 490 553 165 220 158 4 S5
## 6 1374 1012 580 492 525 158 202 167 4 S6
summary(dat)
## Wt Ht Hb Hc Cc
## Min. :1214 Min. : 932 Min. :546.0 Min. :465.0 Min. :476.0
## 1st Qu.:1981 1st Qu.:1145 1st Qu.:631.0 1st Qu.:508.0 1st Qu.:575.0
## Median :2778 Median :1312 Median :694.5 Median :522.0 Median :630.0
## Mean :3041 Mean :1318 Mean :707.2 Mean :522.8 Mean :656.4
## 3rd Qu.:4025 3rd Qu.:1504 3rd Qu.:786.0 3rd Qu.:537.0 3rd Qu.:740.0
## Max. :6285 Max. :1682 Max. :905.0 Max. :585.0 Max. :935.0
## Arm Calf Pelvis age
## Min. :144.0 Min. :198.0 Min. :150.0 Min. : 4.00
## 1st Qu.:170.0 1st Qu.:234.8 1st Qu.:190.0 1st Qu.: 6.75
## Median :186.5 Median :266.0 Median :209.0 Median : 9.50
## Mean :192.8 Mean :271.5 Mean :214.8 Mean : 9.50
## 3rd Qu.:216.0 3rd Qu.:309.2 3rd Qu.:241.0 3rd Qu.:12.25
## Max. :285.0 Max. :380.0 Max. :305.0 Max. :15.00
## id
## Length:360
## Class :character
## Mode :character
##
##
##
length(unique(dat$age))
## [1] 12
# Present relationship between weight and height for each year and generally
xyplot(Wt ~ Ht, groups = factor(age), data = dat,
auto.key = list(column=4), pch=16,
type = c('p', 'r', 'g'), xlab= 'Height', ylab = 'Weight')
# Correlation
with(dat, cor(Wt, Ht))
## [1] 0.9491352
split(dat, dat$age) %>%
sapply(., function(df) {with(df, cor(Wt, Ht))})
## 4 5 6 7 8 9 10 11
## 0.6589876 0.7131878 0.7563261 0.7550721 0.7381013 0.6127379 0.6706406 0.6738371
## 12 13 14 15
## 0.7103265 0.6313972 0.4659650 0.3489307
# Intecept and slope in linear model (weight ~ height)
split(dat, dat$age) %>%
sapply(., function(df) {coefficients(lm(Wt ~ Ht, data = df))[1]})
## 4.(Intercept) 5.(Intercept) 6.(Intercept) 7.(Intercept) 8.(Intercept)
## -1492.131 -2071.391 -2397.710 -2525.049 -3362.959
## 9.(Intercept) 10.(Intercept) 11.(Intercept) 12.(Intercept) 13.(Intercept)
## -3230.680 -4717.608 -4661.448 -5253.717 -6183.765
## 14.(Intercept) 15.(Intercept)
## -4622.857 -1026.453
split(dat, dat$age) %>%
sapply(., function(df) {coefficients(lm(Wt ~ Ht, data = df))[2]})
## 4.Ht 5.Ht 6.Ht 7.Ht 8.Ht 9.Ht 10.Ht 11.Ht
## 3.019906 3.558213 3.827905 3.936904 4.638774 4.579293 5.745730 5.712182
## 12.Ht 13.Ht 14.Ht 15.Ht
## 6.131886 6.841161 5.939274 3.772245
# Present the relationship between weight (Wt) and other measurements for each age
# Standardize each measurement
<- dat[,1:9] %>%
dat_std mutate_all(function(x) {as.numeric(robustHD::standardize(x))})
$age <- dat$age
dat_std$id <- dat$id
dat_stdsummary(dat_std)
## Wt Ht Hb Hc
## Min. :-1.4813 Min. :-1.89954 Min. :-1.7435 Min. :-2.78039
## 1st Qu.:-0.8593 1st Qu.:-0.85086 1st Qu.:-0.8241 1st Qu.:-0.71183
## Median :-0.2137 Median :-0.03113 Median :-0.1372 Median :-0.03835
## Mean : 0.0000 Mean : 0.00000 Mean : 0.0000 Mean : 0.00000
## 3rd Qu.: 0.7977 3rd Qu.: 0.91785 3rd Qu.: 0.8526 3rd Qu.: 0.68324
## Max. : 2.6301 Max. : 1.79298 Max. : 2.1399 Max. : 2.99233
## Cc Arm Calf Pelvis
## Min. :-1.7838 Min. :-1.6862 Min. :-1.7252 Min. :-1.9485
## 1st Qu.:-0.8049 1st Qu.:-0.7872 1st Qu.:-0.8628 1st Qu.:-0.7464
## Median :-0.2610 Median :-0.2167 Median :-0.1295 Median :-0.1755
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.8267 3rd Qu.: 0.8034 3rd Qu.: 0.8855 3rd Qu.: 0.7862
## Max. : 2.7548 Max. : 3.1892 Max. : 2.5457 Max. : 2.7095
## age id
## Min. : 4.00 Length:360
## 1st Qu.: 6.75 Class :character
## Median : 9.50 Mode :character
## Mean : 9.50
## 3rd Qu.:12.25
## Max. :15.00
#Turn data set into long form
<- dat_std |> select(-Wt, -id, -age) |> stack()
dat_Wt $Wt <- rep(dat_std$Wt, ncol(dat_std)-3)
dat_Wt$age <- rep(dat_std$age, ncol(dat_std)-3)
dat_Wt$id <- rep(dat_std$id, ncol(dat_std)-3)
dat_Wtstr(dat_Wt)
## 'data.frame': 2520 obs. of 5 variables:
## $ values: num -1.44 -1.57 -1.76 -1.54 -1.51 ...
## $ ind : Factor w/ 7 levels "Ht","Hb","Hc",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Wt : num -1.29 -1.31 -1.38 -1.16 -1.1 ...
## $ age : int 4 4 4 4 4 4 4 4 4 4 ...
## $ id : chr "S1" "S2" "S3" "S4" ...
xyplot(Wt ~ values | ind,
groups = paste0('Age ', age),
data = dat_Wt,
auto.key = list(column=4),
pch=16, cex=.4, type = c('p', 'r', 'g'),
xlab= 'Standardized measurement',
ylab = 'Standardized weight')
The slopes of Ht in predicting Wt are relatively similar in different ages. While slopes of other measurement in predicting Wt differ in different ages. Take head circumference (Hc) as an example, the slope of Hc in predicting Wt get higher when girls get older.
# Try another variables: Present the relationship between one measuremeant and other measurments for each age
# Put codes into a function
<- function(var_name, var_label) {
xyplots_dat library(lattice)
# Turn data set into long form
<- dat_std %>% select(-var_name, -id, -age) %>% stack()
dat_var $var_name <- rep(dat_std[,var_name], ncol(dat_std)-3)
dat_var$age <- rep(dat_std$age, ncol(dat_std)-3)
dat_var$id <- rep(dat_std$id, ncol(dat_std)-3)
dat_var
# Plot
xyplot(var_name ~ values | ind, groups = paste0('Age ', age), data = dat_var,
auto.key = list(column=4), pch=16, cex=.4, type = c('p', 'r', 'g'),
xlab = 'Standardized measurement',
ylab = paste0('Standardized ', var_label))
}
# Head to butt length
xyplots_dat(var_name = 'Hb', var_label = 'length of head to butt')
## Note: Using an external vector in selections is ambiguous.
## i Use `all_of(var_name)` instead of `var_name` to silence this message.
## i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
- In linear models of Hb ~ Ht of each age, both of intecepts and slopes are similar. - In linear models of Hb ~ Hc of each age,slopes are similar while intecepts differ.
# Arm
xyplots_dat(var_name = 'Arm', var_label = 'length of arm')
Your manager gave you a sales data on several products in a SAS format.
Task: Your task is to summarize and report the data in tables and graphs using the R lattice package.
Re-code the region variable (1 to 4) by “Nothern”, “Southern”, “Eastern” and “Western”;
the district variable (1 - 5) by “North East”, “South East”, “South West”, “North West”, “Central West”;
the quarter variable (1-4) by “1st”, “2nd”, “3rd”, “4th”;
and the month variable (1-12) by “Jan”, “Feb”, etc. Set negative sales values to zero.
::p_load('sas7bdat')
pacman<- read.sas7bdat("sales.sas7bdat")
dat head(dat)
## product category customer year month quarter market sales expense region
## 1 Shoes Shoes Acme 2001 1 1 1 300 240 1
## 2 Boots Shoes Acme 2001 1 1 1 2200 1540 1
## 3 Slippers Slippers Acme 2001 1 1 1 900 540 1
## 4 Shoes Shoes Acme 2001 2 1 1 100 80 1
## 5 Boots Shoes Acme 2001 2 1 1 1400 980 1
## 6 Slippers Slippers Acme 2001 2 1 1 0 0 1
## district return constantv quantity
## 1 1 0 1 30
## 2 1 0 1 275
## 3 1 0 1 180
## 4 1 0 1 10
## 5 1 0 1 175
## 6 1 0 1 0
summary(dat)
## product category customer year
## Length:72 Length:72 Length:72 Min. :2001
## Class :character Class :character Class :character 1st Qu.:2001
## Mode :character Mode :character Mode :character Median :2002
## Mean :2002
## 3rd Qu.:2002
## Max. :2002
## month quarter market sales expense
## Min. : 1.00 Min. :1.00 Min. :1.000 Min. :-1400 Min. :-980
## 1st Qu.: 3.75 1st Qu.:1.75 1st Qu.:1.000 1st Qu.: 1000 1st Qu.: 660
## Median : 6.50 Median :2.50 Median :2.000 Median : 1550 Median :1065
## Mean : 6.50 Mean :2.50 Mean :1.667 Mean : 1686 Mean :1172
## 3rd Qu.: 9.25 3rd Qu.:3.25 3rd Qu.:2.000 3rd Qu.: 2525 3rd Qu.:1860
## Max. :12.00 Max. :4.00 Max. :2.000 Max. : 4700 Max. :2960
## region district return constantv quantity
## Min. :1.000 Min. :1.0 Min. :0.0000 Min. :1 Min. : 0.0
## 1st Qu.:1.000 1st Qu.:1.0 1st Qu.:0.0000 1st Qu.:1 1st Qu.:135.2
## Median :1.000 Median :1.0 Median :0.0000 Median :1 Median :220.0
## Mean :1.333 Mean :1.5 Mean :0.6667 Mean :1 Mean :248.9
## 3rd Qu.:1.000 3rd Qu.:1.0 3rd Qu.:0.0000 3rd Qu.:1 3rd Qu.:287.8
## Max. :4.000 Max. :5.0 Max. :5.0000 Max. :1 Max. :940.0
# See the distribution of scales for each region.
densityplot(~ quantity | as.character(region), data = dat, layout=c(1, 3))
There is no sale in Eastern region (region=3).
Compare sales of each quarter
bwplot(quantity ~ factor(quarter), data = dat, xlab="Quarter")
Generally, the sale increase as the quarter goes by.
Compare sales of each market
bwplot(quantity ~ factor(market), data = dat, xlab="Market")
There is a larger sale in market 2 than makret 1. And the variation of sales of market 1 is smaller than that of market 2.
Compare sales of each product
bwplot(quantity ~ factor(product), data = dat, xlab="Market")
Slippers have a better sale.
Compare sales of each product in 2 markets
bwplot(quantity ~ factor(product) | factor(market), data = dat, xlab="Market")
The sale of slippers is not that good in market 1.
Compare sales change of months for each product
dotplot(quantity ~ month, groups = product, data = dat,
xlab="Month", ylab="Quantity",
type=c('p', 'g',"r"), auto.key=list(space="top", columns=3))
Compare sales of each customer in 2 markets
bwplot(quantity ~ factor(customer) | factor(market), data = dat, xlab="Market")
We have only one customer, Acme, in market 1.
Compare distribution of sales of each product in four quaters.
densityplot(~ quantity | factor(quarter), groups = product, data = dat,
xlab = "Quarter", auto.key = list(column=3), layout = c(1, 4))
Finding
Source: Gupta, S. K. (2006). Data Management and Reporting Made Easy with SAS Learning Edition 2.0
A sample of 40 psychology students at a large southwestern university took four subtests (Vocabulary, Similarities, Block Design, and Picture Completion) of the Wechsler (1981) Adult Intelligence Scale-Revised. The researchers also used Magnetic Resonance Imaging (MRI) to determine the brain size of the subjects.
Task: Use appropriate lattice graphics to answer the following questions.
<- read.table("brainsize.txt",h=T)
brain head(brain)
## Sbj Gender FSIQ VIQ PIQ Weight Height MRICount
## 1 1 Female 133 132 124 118 64.5 816932
## 2 2 Male 140 150 124 NA 72.5 1001121
## 3 3 Male 139 123 150 143 73.3 1038437
## 4 4 Male 133 129 128 172 68.8 965353
## 5 5 Female 137 132 134 147 65.0 951545
## 6 6 Female 99 90 110 146 69.0 928799
densityplot(~ FSIQ + VIQ + PIQ, data = brain, groups = Gender,
plot.points = FALSE,
ref = TRUE,auto.key = list(space = "right"),
layout=c(3,1),
par.settings = simpleTheme(col.line = c("#E16A86", "#00AD9A")))
xyplot(Height ~ Weight, group = Gender, data = brain,
type = c("g","r","p"),
par.settings = simpleTheme(col = c("#E16A86", "#00AD9A"),
pch = 20,
col.line = c("#E16A86", "#00AD9A")),
auto.key = list(column = 2))
xyplot(MRICount ~ FSIQ+VIQ+PIQ ,
group = Gender,
data = brain,
layout=c(3,1),
type = c("g","r","p"),
par.settings = simpleTheme(col = c("#E16A86", "#00AD9A"),
pch = 20, col.line = c("#E16A86", "#00AD9A")),
auto.key = list(column = 2))
Source: Willerman, L., Schultz, R., Rutledge, J.N., & Bigler, E. (1991), In Vivo Brain Size and Intelligence, Intelligence, 15, 223-228.