Use trellis graphics to explore various ways to display the sample data from the National Longitudinal Survey of Youth.
The data are drawn from the National Longitudinal Survey of Youth (NLSY). The sample observations are from the 1986, 1988, 1990, and 1992 assessment periods. Children were selected to be in kindergarten, first, and second grade and to be of age 5, 6, or 7 at the first assessment (1986). Both reading and mathematical achievement scores are recorded. The former is a recognition subscore of the Peabody Individual Achievement Test (PIAT). This was scaled as the percentage of 84 items that were answered correctly. The same 84 items were administered at all four time points, providing a consistent scale over time. The data set is a subsample of 166 subjects with complete observations.
Source: Bollen, K.A. & Curran, P.J. (2006). Latent curve models. A structural equation perspective. p.59.
dta1 <- read.csv("C:/Users/TheorEco Lab/Desktop/108-2/DataManagement/0413/nlsy86long.csv", header = T)
str(dta1)
## 'data.frame': 664 obs. of 9 variables:
## $ id : int 2390 2560 3740 4020 6350 7030 7200 7610 7680 7700 ...
## $ sex : Factor w/ 2 levels "Female","Male": 1 1 1 2 2 2 2 2 1 2 ...
## $ race : Factor w/ 2 levels "Majority","Minority": 1 1 1 1 1 1 1 1 1 1 ...
## $ time : int 1 1 1 1 1 1 1 1 1 1 ...
## $ grade: int 0 0 0 0 1 0 0 0 0 0 ...
## $ year : int 6 6 6 5 7 5 6 7 6 6 ...
## $ month: int 67 66 67 60 78 62 66 79 76 67 ...
## $ math : num 14.29 20.24 17.86 7.14 29.76 ...
## $ read : num 19.05 21.43 21.43 7.14 30.95 ...
Column 1: Student ID
Column 2: Gender, male or female
Column 3: Race, minority or majority
Column 4: Measurement occasions
Column 5: Grade at which measurements were made, Kindergarten = 0, First grade = 1, Second grade = 2
Column 6: Age in years
Column 7: Age in months
Column 8: Math score
Column 9: Reading score
math ~ sex, race, grade, year
read ~ sex, race, grade, year
score ~ sex, race, grade, year
dta1$grade <- as.factor(dta1$grade)
library(lattice)
stripplot(math ~ grade| sex,
group=race,
data=dta1,
alpha=.8,
type=c('g','p'),
jitter.data=TRUE,
xlab="Race",
ylab="Math scores",
auto.key=list(space="top",
columns=2))
dotplot(read ~ grade | sex,
data=dta1,
pch=1,
cex=.5,
alpha=.5,
xlab="Grade",
ylab='Reading score',
par.settings=standard.theme(color=FALSE))
bwplot(math ~ grade | sex:race,
data=dta1,
xlab="Grade",
ylab = "Math score",
par.settings=standard.theme(color=FALSE))
xyplot(read ~ year | race,
groups=sex,
data=dta1,
xlab="Age",
ylab="Reading score",
type=c('p', 'g', 'r'),
between=list(x=0.5),
auto.key = list(points=TRUE, lines=TRUE, column=2),
par.settings=simpleTheme(col=c("black",
"gray"),
pch=c(1, 20),
col.line=c("black",
"gray")))
xyplot(read + math ~ year | race,
groups=sex,
data=dta1,
xlab="Age",
ylab="Score",
type=c('p', 'g', 'r'),
between=list(x=0.5),
auto.key = list(points=TRUE, lines=TRUE, column=2),
par.settings=simpleTheme(col=c("black",
"gray"),
pch=c(1, 20),
col.line=c("black",
"gray")))
histogram(~ read | sex,
data=dta1,
type='density',
layout=c(1, 2),
between=list(y=0.5),
panel=function(x,...) {
panel.histogram(x,...)
panel.mathdensity(dmath=dnorm,
lwd=1.2,
args=list(mean=mean(x, na.rm=T),
sd=sd(x, na.rm=T)), ...)
},
par.settings=standard.theme(color=FALSE))
densityplot(~ math,
groups=sex,
data=dta1,
auto.key=TRUE,
par.settings=standard.theme(color=FALSE))
qqmath(~ read | sex,
aspect="xy",
data=dta1,
type=c('p','g'),
prepanel=prepanel.qqmathline,
panel=function(x, ...) {
panel.qqmathline(x, ...)
panel.qqmath(x, ...)
},
par.settings=standard.theme(color=FALSE))
splom(~ dta1[,c("read", "math", "year")] | sex,
data=dta1,
pch='.',
axis.text.cex=0.3,
par.settings=standard.theme(color=FALSE))
Eight different physical measurements of 30 French girls were recorded from 4 to 15 years old. Explore various ways to display the data using trellis graphics.
Source: Sempe, M., et al. (1987). Multivariate and longitudinal data on growing children: Presentation of the French auxiological survey. In J. Janssen, et al. (1987). Data analysis. The Ins and Outs of solving real problems (pp. 3-6). New York: Plenum Press.
dta2 <- read.table("C:/Users/TheorEco Lab/Desktop/108-2/DataManagement/0413/girlsGrowth.txt", header = T)
str(dta2)
## 'data.frame': 360 obs. of 10 variables:
## $ Wt : int 1456 1426 1335 1607 1684 1374 1570 1450 1214 1456 ...
## $ Ht : int 1025 998 961 1006 1012 1012 1040 990 968 983 ...
## $ Hb : int 602 572 560 595 584 580 586 561 571 563 ...
## $ Hc : int 486 501 494 497 490 492 511 488 481 485 ...
## $ Cc : int 520 520 495 560 553 525 540 520 476 532 ...
## $ Arm : int 157 150 145 178 165 158 153 159 145 158 ...
## $ Calf : int 205 215 214 218 220 202 220 210 198 219 ...
## $ Pelvis: int 170 169 158 172 158 167 180 158 150 154 ...
## $ age : int 4 4 4 4 4 4 4 4 4 4 ...
## $ id : Factor w/ 30 levels "S1","S10","S11",..: 1 12 23 25 26 27 28 29 30 2 ...
Column 1: Weight in grams
Column 2: Height in mms
Column 3: Head to butt length in mms
Column 4: Head circumference in mms
Column 5: Chest circumference in mms
Column 6: Arm length in mms
Column 7: Calf length in mms
Column 8: Pelvis circumference in mms
Column 9: Age in years
Column 10: Girl ID
dta2$age <- as.character(dta2$age)
stripplot(Wt ~ Ht,
group = age,
data=dta2,
alpha=.8,
type=c('g','p'),
jitter.data=TRUE,
xlab="Height(mms)",
ylab="Weight(grams)",
auto.key=list(space="top",
columns=5))
parallelplot(~ dta2[, c("Hc", "Cc")] |age,
data = dta2,
horizontal.axis=F,
auto.key=list(column=6),
scales=list(x=list(rot=90)),
par.settings=standard.theme(color=FALSE))
xyplot(Pelvis ~ Ht,
groups = age,
data=dta2,
type="smooth",
panel=function(x, y, ...) {
panel.xyplot(x, y, ...)
panel.grid(h=-1,
v=-1,
col="gray80",
lty=3, ...)
panel.average(x, y, fun=mean,
horizontal=FALSE,
col='gray', ...)},
auto.key = list(lines = TRUE, column = 6)
)
Your manager gave you a sales data on sevral products in a SAS format. Your task is to summarize and report the data in tables and graphs using the R lattice package.
Source: Gupta, S. K. (2006). Data Management and Reporting Made Easy with SAS Learning Edition 2.0
Recode the region variable (1 to 4) by “Nothern”, “Southern”, “Eastern” and “Western”;
the district variable (1 - 5) by “North East”, “South East”, “South West”, “North West”, “Central West”;
the quarter variable (1-4) by “1st”, “2nd”, “3rd”, “4th”;
and the month variable (1-12) by “Jan”, “Feb”, etc. Set negative sales values to zero.
# install.packages("sas7bdat")
library(sas7bdat)
dta3 <- read.sas7bdat("C:/Users/TheorEco Lab/Desktop/108-2/DataManagement/0413/sales.sas7bdat")
dta3$income <- dta3$sales - dta3$expense
head(dta3)
## product category customer year month quarter market sales expense region
## 1 Shoes Shoes Acme 2001 1 1 1 300 240 1
## 2 Boots Shoes Acme 2001 1 1 1 2200 1540 1
## 3 Slippers Slippers Acme 2001 1 1 1 900 540 1
## 4 Shoes Shoes Acme 2001 2 1 1 100 80 1
## 5 Boots Shoes Acme 2001 2 1 1 1400 980 1
## 6 Slippers Slippers Acme 2001 2 1 1 0 0 1
## district return constantv quantity income
## 1 1 0 1 30 60
## 2 1 0 1 275 660
## 3 1 0 1 180 360
## 4 1 0 1 10 20
## 5 1 0 1 175 420
## 6 1 0 1 0 0
dta3$region <- as.factor(dta3$region)
levels(dta3$region)[1:4] <- c("Nothern", "Southern", "Eastern", "Western")
dta3$district <- as.factor(dta3$district)
levels(dta3$district)[1:5] <- c("North East", "South East", "South West", "North West", "Central West")
dta3$quarter <- as.factor(dta3$quarter)
levels(dta3$quarter)[1:4] <- c("1st", "2nd", "3rd", "4th")
dta3$month <- as.factor(dta3$month)
levels(dta3$month)[1:12] <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
head(dta3)
## product category customer year month quarter market sales expense region
## 1 Shoes Shoes Acme 2001 Jan 1st 1 300 240 Nothern
## 2 Boots Shoes Acme 2001 Jan 1st 1 2200 1540 Nothern
## 3 Slippers Slippers Acme 2001 Jan 1st 1 900 540 Nothern
## 4 Shoes Shoes Acme 2001 Feb 1st 1 100 80 Nothern
## 5 Boots Shoes Acme 2001 Feb 1st 1 1400 980 Nothern
## 6 Slippers Slippers Acme 2001 Feb 1st 1 0 0 Nothern
## district return constantv quantity income
## 1 North East 0 1 30 60
## 2 North East 0 1 275 660
## 3 North East 0 1 180 360
## 4 North East 0 1 10 20
## 5 North East 0 1 175 420
## 6 North East 0 1 0 0
library(lattice)
library(Rtools)
xyplot(sales ~ month, data=dta3, xlab="Month", type=c('p', 'g',"r"))
bwplot(sales ~ product | region,
data = dta3,
xlab = "Product",
ylab = "Quantity",
panel = function(x, y, ...) {
panel.bwplot(x,y, ...)
panel.pvalue(x, y,
std = NULL,
symbol = TRUE, pvalue = FALSE,
pval_digits = 1, cex = 0.8, offset = -0.5,
fixed_pos = NULL, verbose = TRUE
)
})
## p-value for comparison of 1
## with Boots Shoes Slippers = 1 0.1701787 0.3182698
## p-value for comparison of 1
## with Boots Shoes Slippers = 1 0.3582681 0.5092293
## p-value for comparison of 1
## with Boots Shoes Slippers = 1 0.5567856 0.8956515
bwplot(income ~ product | quarter,
data = dta3,
xlab = "Product",
ylab = "Income",
layout(1,4),
panel = function(x, y, ...) {
panel.bwplot(x,y, ...)
panel.pvalue(x, y,
std = NULL,
symbol = TRUE, pvalue = FALSE,
pval_digits = 1, cex = 0.6, offset = 3,
fixed_pos = NULL, verbose = TRUE
)
})
## p-value for comparison of 1
## with Boots Shoes Slippers = 1 0.9405557 0.1220574
## p-value for comparison of 1
## with Boots Shoes Slippers = 1 0.5005069 0.6120658
## p-value for comparison of 1
## with Boots Shoes Slippers = 1 0.5592303 0.5452863
## p-value for comparison of 1
## with Boots Shoes Slippers = 1 0.4357351 0.1375524
dotplot(income ~ month,
group=year,
data=dta3,
alpha=.8,
type=c('g','p',"r"),
jitter.data=TRUE,
xlab="Month",
ylab="Income",
auto.key=list(space="top",
columns=2))
xyplot(sales ~ product | region,
group=customer,
data=dta3,
alpha=.8,
type=c('g','p',"r"),
jitter.data=TRUE,
xlab="Product",
ylab="Sales",
auto.key=list(space="top",
columns=3))
bwplot(sales ~ quarter | region,
data = dta3,
xlab = "Product",
ylab = "Income",
layout(1,4),
panel = function(x, y, ...) {
panel.bwplot(x,y, ...)
panel.pvalue(x, y,
std = NULL,
symbol = TRUE, pvalue = FALSE,
pval_digits = 1, cex = 0.6, offset = 3,
fixed_pos = NULL, verbose = TRUE
)
})
## p-value for comparison of 1
## with 1st 2nd 3rd 4th = 1 0.07440167 0.004949564 1.105484e-05
## p-value for comparison of 3
## with 3rd = NA NA 1 NA
## p-value for comparison of 4
## with 4th = NA NA NA 1
parallelplot(~ dta3[, c("expense", "income")] | region,
groups = product,
data = dta3,
horizontal.axis=F,
auto.key=list(column=3),
scales=list(x=list(rot=90)),
par.settings=standard.theme(color=FALSE))
Use the Lattice package to graphically explore the age and gender effects on reaction time reported in the Bassin data example.
Each year the U.S. Naval Postgraduate School sets aside a “Discovery Day” during which the general public is invited into their laboratories. The data come from October 21 1995, when visitors could test their reaction times and hand-eye coordination in the Human Systems Integration Laboratory. The variable of interest, “anticipatory timing”, was measured by a Bassin timer, which measures the ability to estimate the speed of a moving light and its arrival at a designated point. The Timer consists of a 10 foot row of lights which is controlled by a variable speed potentiometer. The lights are switched on sequentially from one end to the other so that light ‘travels’ at 5 miles per hour down the Timer. Each visitor was instructed to anticipate the ‘arrival’ of the light at one end of the Timer and at that time to swing a plastic bat across a light beam at the same end of the Timer. An automatic timing device measured the difference between the breaking of the beam and the actual arrival of the light. A negative value of a trial variable indicates the bat broke the beam before the light actually arrived. Each of 113 visitors completed the trial five times. Age and gender were also recorded. Visitors tended to come in family groups, but that information was not recorded. It may be that subject #35, who is a two year old with much slower reaction times, should be deleted.
Source: OzData
dta4 <- read.table("C:/Users/TheorEco Lab/Desktop/108-2/DataManagement/0413/timetrial.dat.txt", header = TRUE)
dta4$mean <- (dta4$Trial1 + dta4$Trial2 + dta4$Trial3 + dta4$Trial4 + dta4$Trial5)/5
str(dta4)
## 'data.frame': 113 obs. of 8 variables:
## $ Sex : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
## $ Age : int 31 30 30 27 30 28 34 28 28 33 ...
## $ Trial1: num 0.051 0.074 0.051 0.182 0.077 0.103 -0.066 0.204 -0.231 -0.052 ...
## $ Trial2: num 0.023 0.006 0.094 0.166 0.001 0.065 0.031 -0.106 -0.124 -0.011 ...
## $ Trial3: num 0.106 0.003 0.084 -0.073 0 0.063 0.036 -0.09 -0.065 -0.025 ...
## $ Trial4: num 0.076 0.02 0.176 -0.044 -0.027 0.059 0.11 -0.04 -0.19 -0.014 ...
## $ Trial5: num 0.013 0.022 0.103 0.029 -0.2 0.059 0.045 -0.03 -0.211 -0.059 ...
## $ mean : num 0.0538 0.025 0.1016 0.052 -0.0298 ...
Column 1: Gender ID
Column 2: Age (year)
Column 3: Response time Trial 1
Column 4: Response time Trial 2
Column 5: Response time Trial 3
Column 6: Response time Trial 4
Column 7: Response time Trial 5
bwplot( mean ~ Sex,
data = dta4,
xlab = "Sex",
ylab = "Mean of Response time",
panel = function(x, y, ...) {
panel.bwplot(x,y, ...)
panel.pvalue(x, y,
std = NULL,
symbol = TRUE, pvalue = TRUE,
pval_digits = 1, cex = 0.6, offset = -2.5,
fixed_pos = NULL, verbose = TRUE
)
}
)
## p-value for comparison of 1
## with F M = 1 0.4239052
xyplot(mean ~ Age,
group=Sex,
data=dta4,
alpha=.8,
type=c('g','p',"r"),
jitter.data=TRUE,
xlab="Age",
ylab="Mean of Reaction Time",
ylim = c(-0.3, 0.5),
auto.key=list(space="top",
columns=2))