Homework 1

Use trellis graphics to explore various ways to display the sample data from the National Longitudinal Survey of Youth.

讀取資料

The data are drawn from the National Longitudinal Survey of Youth (NLSY). The sample observations are from the 1986, 1988, 1990, and 1992 assessment periods. Children were selected to be in kindergarten, first, and second grade and to be of age 5, 6, or 7 at the first assessment (1986). Both reading and mathematical achievement scores are recorded. The former is a recognition subscore of the Peabody Individual Achievement Test (PIAT). This was scaled as the percentage of 84 items that were answered correctly. The same 84 items were administered at all four time points, providing a consistent scale over time. The data set is a subsample of 166 subjects with complete observations.

Source: Bollen, K.A. & Curran, P.J. (2006). Latent curve models. A structural equation perspective. p.59.

dta1 <- read.csv("C:/Users/TheorEco Lab/Desktop/108-2/DataManagement/0413/nlsy86long.csv", header = T)
str(dta1)
## 'data.frame':    664 obs. of  9 variables:
##  $ id   : int  2390 2560 3740 4020 6350 7030 7200 7610 7680 7700 ...
##  $ sex  : Factor w/ 2 levels "Female","Male": 1 1 1 2 2 2 2 2 1 2 ...
##  $ race : Factor w/ 2 levels "Majority","Minority": 1 1 1 1 1 1 1 1 1 1 ...
##  $ time : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ grade: int  0 0 0 0 1 0 0 0 0 0 ...
##  $ year : int  6 6 6 5 7 5 6 7 6 6 ...
##  $ month: int  67 66 67 60 78 62 66 79 76 67 ...
##  $ math : num  14.29 20.24 17.86 7.14 29.76 ...
##  $ read : num  19.05 21.43 21.43 7.14 30.95 ...

Column 1: Student ID

Column 2: Gender, male or female

Column 3: Race, minority or majority

Column 4: Measurement occasions

Column 5: Grade at which measurements were made, Kindergarten = 0, First grade = 1, Second grade = 2

Column 6: Age in years

Column 7: Age in months

Column 8: Math score

Column 9: Reading score

嘗試不同種類的圖表

math ~ sex, race, grade, year

read ~ sex, race, grade, year

score ~ sex, race, grade, year

dta1$grade <- as.factor(dta1$grade)

Stripplot

library(lattice)
stripplot(math ~ grade| sex, 
          group=race,
          data=dta1, 
          alpha=.8,
          type=c('g','p'),
          jitter.data=TRUE, 
          xlab="Race",
          ylab="Math scores",
          auto.key=list(space="top", 
                        columns=2))

Dotplot

dotplot(read ~ grade | sex, 
       data=dta1, 
       pch=1, 
       cex=.5, 
       alpha=.5,
       xlab="Grade", 
       ylab='Reading score', 
       par.settings=standard.theme(color=FALSE))

Box-and-whisker plot

bwplot(math ~ grade | sex:race,
       data=dta1,
       xlab="Grade",
       ylab = "Math score",
       par.settings=standard.theme(color=FALSE))

xyplot

xyplot(read ~ year | race, 
       groups=sex, 
       data=dta1,  
       xlab="Age", 
       ylab="Reading score",
       type=c('p', 'g', 'r'),
       between=list(x=0.5),
       auto.key = list(points=TRUE, lines=TRUE, column=2),
       par.settings=simpleTheme(col=c("black", 
                                      "gray"),
                                pch=c(1, 20),
                                col.line=c("black", 
                                           "gray")))

xyplot(read + math ~ year | race, 
       groups=sex, 
       data=dta1,  
       xlab="Age", 
       ylab="Score",
       type=c('p', 'g', 'r'),
       between=list(x=0.5),
       auto.key = list(points=TRUE, lines=TRUE, column=2),
       par.settings=simpleTheme(col=c("black", 
                                      "gray"),
                                pch=c(1, 20),
                                col.line=c("black", 
                                           "gray")))

Histogram

histogram(~ read  | sex, 
          data=dta1, 
          type='density', 
          layout=c(1, 2),
          between=list(y=0.5),
          panel=function(x,...) {
            panel.histogram(x,...)
            panel.mathdensity(dmath=dnorm, 
                              lwd=1.2, 
                              args=list(mean=mean(x, na.rm=T),
                                        sd=sd(x, na.rm=T)), ...)
          },
          par.settings=standard.theme(color=FALSE))

Density plot

densityplot(~ math, 
            groups=sex, 
            data=dta1,
            auto.key=TRUE,
            par.settings=standard.theme(color=FALSE))

Q-Q plot

qqmath(~ read | sex, 
       aspect="xy", 
       data=dta1,
       type=c('p','g'),
       prepanel=prepanel.qqmathline,
       panel=function(x, ...) {
          panel.qqmathline(x, ...)
          panel.qqmath(x, ...)
       },
       par.settings=standard.theme(color=FALSE))

Scatter PlOt Matrix

splom(~ dta1[,c("read", "math", "year")] | sex, 
      data=dta1,
      pch='.', 
      axis.text.cex=0.3,
      par.settings=standard.theme(color=FALSE))

Homework 2

Eight different physical measurements of 30 French girls were recorded from 4 to 15 years old. Explore various ways to display the data using trellis graphics.

Source: Sempe, M., et al. (1987). Multivariate and longitudinal data on growing children: Presentation of the French auxiological survey. In J. Janssen, et al. (1987). Data analysis. The Ins and Outs of solving real problems (pp. 3-6). New York: Plenum Press.

讀取資料

dta2 <- read.table("C:/Users/TheorEco Lab/Desktop/108-2/DataManagement/0413/girlsGrowth.txt", header = T)
str(dta2)
## 'data.frame':    360 obs. of  10 variables:
##  $ Wt    : int  1456 1426 1335 1607 1684 1374 1570 1450 1214 1456 ...
##  $ Ht    : int  1025 998 961 1006 1012 1012 1040 990 968 983 ...
##  $ Hb    : int  602 572 560 595 584 580 586 561 571 563 ...
##  $ Hc    : int  486 501 494 497 490 492 511 488 481 485 ...
##  $ Cc    : int  520 520 495 560 553 525 540 520 476 532 ...
##  $ Arm   : int  157 150 145 178 165 158 153 159 145 158 ...
##  $ Calf  : int  205 215 214 218 220 202 220 210 198 219 ...
##  $ Pelvis: int  170 169 158 172 158 167 180 158 150 154 ...
##  $ age   : int  4 4 4 4 4 4 4 4 4 4 ...
##  $ id    : Factor w/ 30 levels "S1","S10","S11",..: 1 12 23 25 26 27 28 29 30 2 ...

Column 1: Weight in grams

Column 2: Height in mms

Column 3: Head to butt length in mms

Column 4: Head circumference in mms

Column 5: Chest circumference in mms

Column 6: Arm length in mms

Column 7: Calf length in mms

Column 8: Pelvis circumference in mms

Column 9: Age in years

Column 10: Girl ID

dta2$age <- as.character(dta2$age)

Stripplot

stripplot(Wt ~ Ht,
          group = age,
          data=dta2, 
          alpha=.8,
          type=c('g','p'),
          jitter.data=TRUE, 
          xlab="Height(mms)",
          ylab="Weight(grams)",
          auto.key=list(space="top", 
                        columns=5))

Dotplot

parallelplot(~ dta2[, c("Hc", "Cc")] |age, 
            data = dta2, 
            horizontal.axis=F, 
            auto.key=list(column=6),
            scales=list(x=list(rot=90)),
            par.settings=standard.theme(color=FALSE))

xyplot(Pelvis ~ Ht,
       groups = age,
       data=dta2, 
       type="smooth",
       panel=function(x, y, ...) {
        panel.xyplot(x, y, ...)
        panel.grid(h=-1, 
                   v=-1, 
                   col="gray80", 
                   lty=3, ...)
        panel.average(x, y, fun=mean, 
                      horizontal=FALSE, 
                      col='gray', ...)},
       auto.key = list(lines = TRUE, column = 6)
       )

Homework 3

Your manager gave you a sales data on sevral products in a SAS format. Your task is to summarize and report the data in tables and graphs using the R lattice package.

Source: Gupta, S. K. (2006). Data Management and Reporting Made Easy with SAS Learning Edition 2.0

Recode the region variable (1 to 4) by “Nothern”, “Southern”, “Eastern” and “Western”;

the district variable (1 - 5) by “North East”, “South East”, “South West”, “North West”, “Central West”;

the quarter variable (1-4) by “1st”, “2nd”, “3rd”, “4th”;

and the month variable (1-12) by “Jan”, “Feb”, etc. Set negative sales values to zero.

讀取資料

# install.packages("sas7bdat")
library(sas7bdat)
dta3 <- read.sas7bdat("C:/Users/TheorEco Lab/Desktop/108-2/DataManagement/0413/sales.sas7bdat")
dta3$income <- dta3$sales - dta3$expense
head(dta3)
##    product category customer year month quarter market sales expense region
## 1    Shoes    Shoes     Acme 2001     1       1      1   300     240      1
## 2    Boots    Shoes     Acme 2001     1       1      1  2200    1540      1
## 3 Slippers Slippers     Acme 2001     1       1      1   900     540      1
## 4    Shoes    Shoes     Acme 2001     2       1      1   100      80      1
## 5    Boots    Shoes     Acme 2001     2       1      1  1400     980      1
## 6 Slippers Slippers     Acme 2001     2       1      1     0       0      1
##   district return constantv quantity income
## 1        1      0         1       30     60
## 2        1      0         1      275    660
## 3        1      0         1      180    360
## 4        1      0         1       10     20
## 5        1      0         1      175    420
## 6        1      0         1        0      0

重新命名 labels

dta3$region <- as.factor(dta3$region)
levels(dta3$region)[1:4] <- c("Nothern", "Southern", "Eastern", "Western")

dta3$district <- as.factor(dta3$district)
levels(dta3$district)[1:5] <- c("North East", "South East", "South West", "North West", "Central West")

dta3$quarter <- as.factor(dta3$quarter)
levels(dta3$quarter)[1:4] <- c("1st", "2nd", "3rd", "4th")

dta3$month <- as.factor(dta3$month)
levels(dta3$month)[1:12] <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
head(dta3)
##    product category customer year month quarter market sales expense  region
## 1    Shoes    Shoes     Acme 2001   Jan     1st      1   300     240 Nothern
## 2    Boots    Shoes     Acme 2001   Jan     1st      1  2200    1540 Nothern
## 3 Slippers Slippers     Acme 2001   Jan     1st      1   900     540 Nothern
## 4    Shoes    Shoes     Acme 2001   Feb     1st      1   100      80 Nothern
## 5    Boots    Shoes     Acme 2001   Feb     1st      1  1400     980 Nothern
## 6 Slippers Slippers     Acme 2001   Feb     1st      1     0       0 Nothern
##     district return constantv quantity income
## 1 North East      0         1       30     60
## 2 North East      0         1      275    660
## 3 North East      0         1      180    360
## 4 North East      0         1       10     20
## 5 North East      0         1      175    420
## 6 North East      0         1        0      0
library(lattice)
library(Rtools)

總銷售量自年初到年尾逐漸升高

xyplot(sales ~ month, data=dta3, xlab="Month", type=c('p', 'g',"r"))

西區並無任何銷售額

bwplot(sales ~ product | region, 
       data = dta3, 
       xlab = "Product",
       ylab = "Quantity", 
       panel = function(x, y, ...) {
               panel.bwplot(x,y, ...)
               panel.pvalue(x, y,
               std = NULL,
               symbol = TRUE, pvalue = FALSE, 
               pval_digits = 1, cex = 0.8, offset = -0.5, 
               fixed_pos = NULL, verbose = TRUE
               )
       })

## p-value for comparison of  1 
##  with  Boots Shoes Slippers  =  1 0.1701787 0.3182698 
## p-value for comparison of  1 
##  with  Boots Shoes Slippers  =  1 0.3582681 0.5092293 
## p-value for comparison of  1 
##  with  Boots Shoes Slippers  =  1 0.5567856 0.8956515

三種商品替公司在各季所帶來的收入並無明顯差異

bwplot(income ~ product | quarter,
       data = dta3, 
       xlab = "Product",
       ylab = "Income",
       layout(1,4),
       panel = function(x, y, ...) {
               panel.bwplot(x,y, ...)
               panel.pvalue(x, y,
               std = NULL,
               symbol = TRUE, pvalue = FALSE, 
               pval_digits = 1, cex = 0.6, offset = 3, 
               fixed_pos = NULL, verbose = TRUE
               )
       })

## p-value for comparison of  1 
##  with  Boots Shoes Slippers  =  1 0.9405557 0.1220574 
## p-value for comparison of  1 
##  with  Boots Shoes Slippers  =  1 0.5005069 0.6120658 
## p-value for comparison of  1 
##  with  Boots Shoes Slippers  =  1 0.5592303 0.5452863 
## p-value for comparison of  1 
##  with  Boots Shoes Slippers  =  1 0.4357351 0.1375524

2001 年和 2002 年的收益都是隨月份增加

dotplot(income ~ month, 
          group=year,
          data=dta3,
          alpha=.8,
          type=c('g','p',"r"),
          jitter.data=TRUE, 
          xlab="Month",
          ylab="Income",
          auto.key=list(space="top", 
                        columns=2))

北區、南區、東區的客戶單一不重複,北區的客戶下單次數較多

xyplot(sales ~ product | region, 
          group=customer,
          data=dta3,
          alpha=.8,
          type=c('g','p',"r"),
          jitter.data=TRUE, 
          xlab="Product",
          ylab="Sales",
          auto.key=list(space="top", 
                        columns=3))

只有北區的銷售額四季皆有,南區集中在秋季,東區集中在冬季

bwplot(sales ~ quarter  | region,
       data = dta3, 
       xlab = "Product",
       ylab = "Income",
       layout(1,4),
       panel = function(x, y, ...) {
               panel.bwplot(x,y, ...)
               panel.pvalue(x, y,
               std = NULL,
               symbol = TRUE, pvalue = FALSE, 
               pval_digits = 1, cex = 0.6, offset = 3, 
               fixed_pos = NULL, verbose = TRUE
               )
       })

## p-value for comparison of  1 
##  with  1st 2nd 3rd 4th  =  1 0.07440167 0.004949564 1.105484e-05 
## p-value for comparison of  3 
##  with  3rd  =  NA NA 1 NA 
## p-value for comparison of  4 
##  with  4th  =  NA NA NA 1

大部分的產品都是賠錢賣

parallelplot(~ dta3[, c("expense", "income")] | region,
             groups = product,
            data = dta3, 
            horizontal.axis=F, 
            auto.key=list(column=3),
            scales=list(x=list(rot=90)),
            par.settings=standard.theme(color=FALSE))

Homework 4

Use the Lattice package to graphically explore the age and gender effects on reaction time reported in the Bassin data example.

Each year the U.S. Naval Postgraduate School sets aside a “Discovery Day” during which the general public is invited into their laboratories. The data come from October 21 1995, when visitors could test their reaction times and hand-eye coordination in the Human Systems Integration Laboratory. The variable of interest, “anticipatory timing”, was measured by a Bassin timer, which measures the ability to estimate the speed of a moving light and its arrival at a designated point. The Timer consists of a 10 foot row of lights which is controlled by a variable speed potentiometer. The lights are switched on sequentially from one end to the other so that light ‘travels’ at 5 miles per hour down the Timer. Each visitor was instructed to anticipate the ‘arrival’ of the light at one end of the Timer and at that time to swing a plastic bat across a light beam at the same end of the Timer. An automatic timing device measured the difference between the breaking of the beam and the actual arrival of the light. A negative value of a trial variable indicates the bat broke the beam before the light actually arrived. Each of 113 visitors completed the trial five times. Age and gender were also recorded. Visitors tended to come in family groups, but that information was not recorded. It may be that subject #35, who is a two year old with much slower reaction times, should be deleted.

Source: OzData

讀取資料

dta4 <- read.table("C:/Users/TheorEco Lab/Desktop/108-2/DataManagement/0413/timetrial.dat.txt", header = TRUE)
dta4$mean <- (dta4$Trial1 + dta4$Trial2 + dta4$Trial3 + dta4$Trial4 + dta4$Trial5)/5
str(dta4)
## 'data.frame':    113 obs. of  8 variables:
##  $ Sex   : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Age   : int  31 30 30 27 30 28 34 28 28 33 ...
##  $ Trial1: num  0.051 0.074 0.051 0.182 0.077 0.103 -0.066 0.204 -0.231 -0.052 ...
##  $ Trial2: num  0.023 0.006 0.094 0.166 0.001 0.065 0.031 -0.106 -0.124 -0.011 ...
##  $ Trial3: num  0.106 0.003 0.084 -0.073 0 0.063 0.036 -0.09 -0.065 -0.025 ...
##  $ Trial4: num  0.076 0.02 0.176 -0.044 -0.027 0.059 0.11 -0.04 -0.19 -0.014 ...
##  $ Trial5: num  0.013 0.022 0.103 0.029 -0.2 0.059 0.045 -0.03 -0.211 -0.059 ...
##  $ mean  : num  0.0538 0.025 0.1016 0.052 -0.0298 ...

Column 1: Gender ID

Column 2: Age (year)

Column 3: Response time Trial 1

Column 4: Response time Trial 2

Column 5: Response time Trial 3

Column 6: Response time Trial 4

Column 7: Response time Trial 5

男生和女生的反應平均時間並無明顯差異

bwplot( mean ~ Sex,
        data = dta4,
        xlab = "Sex",
        ylab = "Mean of Response time",
        panel = function(x, y, ...) {
          panel.bwplot(x,y, ...)
          panel.pvalue(x, y,
                       std = NULL,
                       symbol = TRUE, pvalue = TRUE, 
                       pval_digits = 1, cex = 0.6, offset = -2.5, 
                       fixed_pos = NULL, verbose = TRUE
          )
          }
       )

## p-value for comparison of  1 
##  with  F M  =  1 0.4239052

較年輕的男性反應較快,女性則反之

xyplot(mean ~ Age, 
          group=Sex,
          data=dta4,
          alpha=.8,
          type=c('g','p',"r"),
          jitter.data=TRUE, 
          xlab="Age",
          ylab="Mean of Reaction Time",
       ylim = c(-0.3, 0.5),
          auto.key=list(space="top", 
                        columns=2))