DataM: Homework Exercise 0413 - Trellis
HW exercise 1.
Use trellis graphics to explore various ways to display the sample data from the National Longitudinal Survey of Youth.
The data are drawn from the National Longitudinal Survey of Youth (NLSY). The sample observations are from the 1986, 1988, 1990, and 1992 assessment periods. Children were selected to be in kindergarten, first, and second grade and to be of age 5, 6, or 7 at the first assessment (1986). Both reading and mathematical achievement scores are recorded. The former is a recognition subscore of the Peabody Individual Achievement Test (PIAT). This was scaled as the percentage of 84 items that were answered correctly. The same 84 items were administered at all four time points, providing a consistent scale over time. The data set is a subsample of 166 subjects with complete observations.
Source: Bollen, K.A. & Curran, P.J. (2006). Latent curve models. A structural equation perspective. p.59.
- Column 1: Student ID
- Column 2: Gender, male or female
- Column 3: Race, minority or majority
- Column 4: Measurement occasions
- Column 5: Grade at which measurements were made, Kindergarten = 0, First grade = 1, Second grade = 2
- Column 6: Age in years
- Column 7: Age in months
- Column 8: Math score
- Column 9: Reading score
Load in the package lattice and the data set
'data.frame': 664 obs. of 9 variables:
$ id : int 2390 2560 3740 4020 6350 7030 7200 7610 7680 7700 ...
$ sex : Factor w/ 2 levels "Female","Male": 1 1 1 2 2 2 2 2 1 2 ...
$ race : Factor w/ 2 levels "Majority","Minority": 1 1 1 1 1 1 1 1 1 1 ...
$ time : int 1 1 1 1 1 1 1 1 1 1 ...
$ grade: int 0 0 0 0 1 0 0 0 0 0 ...
$ year : int 6 6 6 5 7 5 6 7 6 6 ...
$ month: int 67 66 67 60 78 62 66 79 76 67 ...
$ math : num 14.29 20.24 17.86 7.14 29.76 ...
$ read : num 19.05 21.43 21.43 7.14 30.95 ...
Gender difference
Draw density plots of the math score and the reading score, grouped by gender.
library(dplyr)  # provides %>% and select()
# Stack math and read into long format within each gender
lst_gender <- lapply(split(NLSY, NLSY$sex), function(df) {
  df_long <- df %>% select(math, read) %>% stack()
  df_long$gender <- as.character(df$sex)[1]
  return(df_long)
})
df_gender <- rbind(lst_gender[[1]], lst_gender[[2]])
densityplot(~ values | ind, groups = gender, data = df_gender,
            layout = c(1, 2), auto.key = list(columns = 2), xlab = 'Score')
Draw boxplots of the math score for each grade, grouped by gender.
Draw boxplots of the reading score for each grade, grouped by gender.
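The boxplot code is not shown in this report. A minimal sketch of one way to draw the two displays with lattice follows. It runs on a synthetic stand-in for the NLSY data: the data frame nlsy_demo and all its values are made up for illustration, and only the column names follow the structure listed above.

```r
library(lattice)

# Synthetic stand-in for the NLSY data (hypothetical values, same column names)
set.seed(413)
n <- 120
nlsy_demo <- data.frame(
  sex   = factor(rep(c("Female", "Male"), each = n / 2)),
  grade = rep(0:2, times = n / 3)
)
nlsy_demo$math <- 15 + 8 * nlsy_demo$grade + rnorm(n, sd = 5)
nlsy_demo$read <- 18 + 9 * nlsy_demo$grade + rnorm(n, sd = 6)

# One panel per gender, one box per grade
bp_math <- bwplot(math ~ factor(grade) | sex, data = nlsy_demo,
                  xlab = "Grade", ylab = "Mathematics score")
bp_read <- bwplot(read ~ factor(grade) | sex, data = nlsy_demo,
                  xlab = "Grade", ylab = "Reading score")
print(bp_math)
print(bp_read)
```

Conditioning on sex puts the two genders in adjacent panels with a shared y-axis, which makes the grade-by-grade comparison easy to read.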
Draw scatter plots of math score against reading score, with a regression line, for each grade, grouped by gender.
xyplot(math ~ read | factor(grade), groups = sex, data = NLSY,
       layout = c(4, 2), type = c('p', 'r', 'g'), auto.key = list(columns = 2),
       xlab = 'Reading score', ylab = 'Mathematics score')
Race
Draw density plots of the math score and the reading score, grouped by race.
# Stack math and read into long format within each race group
lst_race <- lapply(split(NLSY, NLSY$race), function(df) {
  df_long <- df %>% select(math, read) %>% stack()
  df_long$race <- as.character(df$race)[1]
  return(df_long)
})
df_race <- rbind(lst_race[[1]], lst_race[[2]])
densityplot(~ values | ind, groups = race, data = df_race,
            layout = c(1, 2), auto.key = list(columns = 2), xlab = 'Score')
Draw boxplots of the math score for each grade, grouped by race.
Draw boxplots of the reading score for each grade, grouped by race.
Draw scatter plots of math score against reading score, with a regression line, for each grade, grouped by race.
xyplot(math ~ read | factor(grade), groups = race, data = NLSY,
       layout = c(4, 2), type = c('p', 'r', 'g'), auto.key = list(columns = 2),
       xlab = 'Reading score', ylab = 'Mathematics score')
Repeated factorial ANOVA
Y: Math score
model_math <- aov(math ~ (sex * race * grade) + Error(id / (sex * race * grade) + time), data = NLSY)
summary(model_math)
Error: id
Df Sum Sq Mean Sq
sex 1 1651 1651
Error: time
Df Sum Sq Mean Sq
grade 1 170290 170290
Error: id:sex
Df Sum Sq Mean Sq
sex 1 127.2 127.2
Error: id:race
Df Sum Sq Mean Sq
sex 1 1065 1065
Error: id:grade
Df Sum Sq Mean Sq
sex 1 1228 1228
Error: id:sex:race
Df Sum Sq Mean Sq
sex 1 112.8 112.8
Error: id:sex:grade
Df Sum Sq Mean Sq
sex 1 357.7 357.7
Error: id:race:grade
Df Sum Sq Mean Sq
sex 1 452.9 452.9
Error: id:sex:race:grade
Df Sum Sq Mean Sq
sex 1 2.812 2.812
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
sex 1 146 146 1.992 0.15866
race 1 776 776 10.612 0.00118 **
grade 1 5375 5375 73.506 < 2e-16 ***
sex:race 1 98 98 1.340 0.24747
sex:grade 1 15 15 0.199 0.65551
race:grade 1 7 7 0.095 0.75807
sex:race:grade 1 97 97 1.322 0.25057
Residuals 647 47314 73
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Both the race effect and the grade effect on math score are significant. There is no significant gender difference in math score, and none of the interaction terms is significant.
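In the output above, every error stratum carries a single degree of freedom, which suggests that id entered the Error() term as a number rather than as a factor. A minimal sketch of a more conventional specification follows. It runs on synthetic data (the data frame demo and all its values are made up for illustration), converts id to a factor, and uses Error(id) so that sex and race are tested between subjects and grade within subjects. This is one possible formulation under those assumptions, not necessarily the analysis the exercise intends.

```r
# Synthetic repeated-measures data: 40 subjects, 4 occasions each (hypothetical values)
set.seed(1)
n_id <- 40
demo <- expand.grid(time = 1:4, id = seq_len(n_id))  # time varies fastest
demo$sex   <- factor(rep(c("Female", "Male"), each = 4, length.out = nrow(demo)))
demo$race  <- factor(rep(c("Majority", "Minority"), each = 8, length.out = nrow(demo)))
demo$grade <- demo$time - 1                           # numeric, as in the report
subj_eff   <- rnorm(n_id, sd = 4)                     # subject-level random shift
demo$math  <- 15 + 8 * demo$grade + subj_eff[demo$id] + rnorm(nrow(demo), sd = 3)

# Subject identifier must be a factor to define the between-subject stratum
demo$id <- factor(demo$id)

fit_math <- aov(math ~ sex * race * grade + Error(id), data = demo)
print(summary(fit_math))
```

With this specification the summary has one stratum for id, holding the between-subject terms, and one Within stratum holding grade and its interactions.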
Y: Reading score
model_read <- aov(read ~ (sex * race * grade) + Error(id / (sex * race * grade) + time), data = NLSY)
summary(model_read)
Error: id
Df Sum Sq Mean Sq
sex 1 437.5 437.5
Error: time
Df Sum Sq Mean Sq
grade 1 196976 196976
Error: id:sex
Df Sum Sq Mean Sq
sex 1 122.1 122.1
Error: id:race
Df Sum Sq Mean Sq
sex 1 621.2 621.2
Error: id:grade
Df Sum Sq Mean Sq
sex 1 1114 1114
Error: id:sex:race
Df Sum Sq Mean Sq
sex 1 188.9 188.9
Error: id:sex:grade
Df Sum Sq Mean Sq
sex 1 6.778 6.778
Error: id:race:grade
Df Sum Sq Mean Sq
sex 1 2038 2038
Error: id:sex:race:grade
Df Sum Sq Mean Sq
sex 1 6.317 6.317
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
sex 1 12 12 0.090 0.76397
race 1 1103 1103 8.580 0.00352 **
grade 1 4188 4188 32.591 1.73e-08 ***
sex:race 1 487 487 3.787 0.05209 .
sex:grade 1 77 77 0.600 0.43873
race:grade 1 120 120 0.936 0.33360
sex:race:grade 1 103 103 0.803 0.37065
Residuals 647 83150 129
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Both the race effect and the grade effect on reading score are significant. There is no significant gender difference in reading score. None of the interaction terms is significant at the 0.05 level, although the sex-by-race interaction is marginal (p = 0.052).
HW exercise 2.
Eight different physical measurements of 30 French girls were recorded from 4 to 15 years old. Explore various ways to display the data using trellis graphics.
Source: Sempe, M., et al. (1987). Multivariate and longitudinal data on growing children: Presentation of the French auxiological survey. In J. Janssen, et al. (1987). Data analysis. The Ins and Outs of solving real problems (pp. 3-6). New York: Plenum Press.
- Column 1: Weight in grams
- Column 2: Height in mms
- Column 3: Head to butt length in mms
- Column 4: Head circumference in mms
- Column 5: Chest circumference in mms
- Column 6: Arm length in mms
- Column 7: Calf length in mms
- Column 8: Pelvis circumference in mms
- Column 9: Age in years
- Column 10: Girl ID
[Solution and Answer]
Load in the data set and check the data structure
'data.frame': 360 obs. of 10 variables:
$ Wt : int 1456 1426 1335 1607 1684 1374 1570 1450 1214 1456 ...
$ Ht : int 1025 998 961 1006 1012 1012 1040 990 968 983 ...
$ Hb : int 602 572 560 595 584 580 586 561 571 563 ...
$ Hc : int 486 501 494 497 490 492 511 488 481 485 ...
$ Cc : int 520 520 495 560 553 525 540 520 476 532 ...
$ Arm : int 157 150 145 178 165 158 153 159 145 158 ...
$ Calf : int 205 215 214 218 220 202 220 210 198 219 ...
$ Pelvis: int 170 169 158 172 158 167 180 158 150 154 ...
$ age : int 4 4 4 4 4 4 4 4 4 4 ...
$ id : Factor w/ 30 levels "S1","S10","S11",..: 1 12 23 25 26 27 28 29 30 2 ...
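No plots are shown for this exercise. The sketch below illustrates two common trellis displays for longitudinal measurements of this kind: one panel per girl, and all growth curves superposed in a single panel. It uses synthetic heights for six hypothetical girls; the data frame girls_demo and its values are invented for illustration, and only the column names Ht, age and id follow the structure above.

```r
library(lattice)

# Synthetic growth data: 6 hypothetical girls measured yearly from age 4 to 15
set.seed(2)
girls_demo <- expand.grid(age = 4:15, id = paste0("S", 1:6))
girls_demo$Ht <- 1000 + 55 * (girls_demo$age - 4) +
  rnorm(nrow(girls_demo), sd = 15)

# One panel per girl: individual growth curves with a reference grid
p_panels <- xyplot(Ht ~ age | id, data = girls_demo, type = c("l", "g"),
                   xlab = "Age (years)", ylab = "Height (mm)")

# All girls superposed in one panel, one line per girl
p_overlay <- xyplot(Ht ~ age, groups = id, data = girls_demo, type = "l",
                    xlab = "Age (years)", ylab = "Height (mm)")
print(p_panels)
print(p_overlay)
```

The same two calls can be repeated for the other measurements by swapping Ht for another column.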
HW exercise 3.
Your manager gave you sales data on several products in SAS format. Your task is to summarize and report the data in tables and graphs using the R lattice package.
Source: Gupta, S. K. (2006). Data Management and Reporting Made Easy with SAS Learning Edition 2.0
- Recode the region variable (1 to 4) by “Northern”, “Southern”, “Eastern” and “Western”;
- the district variable (1 - 5) by “North East”, “South East”, “South West”, “North West”, “Central West”;
- the quarter variable (1-4) by “1st”, “2nd”, “3rd”, “4th”;
- and the month variable (1-12) by “Jan”, “Feb”, etc. Set negative sales values to zero.
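The recoding steps listed above can be sketched in base R as follows. The data frame sales_demo is a small made-up example with the same variable names; the real data come from the SAS file, whose loading code is not shown here.

```r
# Small hypothetical data frame with the variables to be recoded
sales_demo <- data.frame(
  region   = c(1, 2, 4, 1),
  district = c(1, 5, 3, 2),
  quarter  = c(1, 2, 3, 4),
  month    = c(1, 4, 9, 12),
  sales    = c(1500, -1400, 2500, 0)
)

# Recode numeric codes as labelled factors; unused levels are kept
sales_demo$region <- factor(sales_demo$region, levels = 1:4,
  labels = c("Northern", "Southern", "Eastern", "Western"))
sales_demo$district <- factor(sales_demo$district, levels = 1:5,
  labels = c("North East", "South East", "South West", "North West", "Central West"))
sales_demo$quarter <- factor(sales_demo$quarter, levels = 1:4,
  labels = c("1st", "2nd", "3rd", "4th"))
sales_demo$month <- factor(sales_demo$month, levels = 1:12, labels = month.abb)

# Set negative sales values to zero
sales_demo$sales <- pmax(sales_demo$sales, 0)
print(sales_demo)
```

Passing levels explicitly keeps categories that have no observations, so an empty region still appears in later tables and plots.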
[Solution and Answer]
Load in the data
product category customer year month
Boots :24 Shoes :48 Acme :60 Min. :2001 Min. : 1.00
Shoes :24 Slippers:24 BigX : 6 1st Qu.:2001 1st Qu.: 3.75
Slippers:24 TwoFeet: 6 Median :2002 Median : 6.50
Mean :2002 Mean : 6.50
3rd Qu.:2002 3rd Qu.: 9.25
Max. :2002 Max. :12.00
quarter market sales expense
Min. :1.00 Min. :1.000 Min. :-1400 Min. :-980
1st Qu.:1.75 1st Qu.:1.000 1st Qu.: 1000 1st Qu.: 660
Median :2.50 Median :2.000 Median : 1550 Median :1065
Mean :2.50 Mean :1.667 Mean : 1686 Mean :1172
3rd Qu.:3.25 3rd Qu.:2.000 3rd Qu.: 2525 3rd Qu.:1860
Max. :4.00 Max. :2.000 Max. : 4700 Max. :2960
region district return constantv
Min. :1.000 Min. :1.0 Min. :0.0000 Min. :1
1st Qu.:1.000 1st Qu.:1.0 1st Qu.:0.0000 1st Qu.:1
Median :1.000 Median :1.0 Median :0.0000 Median :1
Mean :1.333 Mean :1.5 Mean :0.6667 Mean :1
3rd Qu.:1.000 3rd Qu.:1.0 3rd Qu.:0.0000 3rd Qu.:1
Max. :4.000 Max. :5.0 Max. :5.0000 Max. :1
quantity
Min. : 0.0
1st Qu.:135.2
Median :220.0
Mean :248.9
3rd Qu.:287.8
Max. :940.0
See the distribution of sales for each region.
There are no sales in the Eastern region (region = 3).
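A region without sales only shows up in a table or plot if its factor level is kept. The sketch below shows this on a made-up data frame region_demo (hypothetical values): it totals sales by region with xtabs() and draws the totals with barchart().

```r
library(lattice)

# Hypothetical sales records; no record falls in region 3 (Eastern)
region_demo <- data.frame(
  region = c(1, 1, 2, 4, 1, 2),
  sales  = c(1000, 1500, 2500, 800, 1200, 0)
)
region_demo$region <- factor(region_demo$region, levels = 1:4,
  labels = c("Northern", "Southern", "Eastern", "Western"))

# Total sales per region; the empty Eastern level is kept with a total of 0
region_tab <- xtabs(sales ~ region, data = region_demo)
print(region_tab)

# Convert the table to a data frame (columns: region, Freq) for plotting
region_df <- as.data.frame(region_tab)
p_region <- barchart(Freq ~ region, data = region_df, origin = 0,
                     xlab = "Region", ylab = "Total sales")
print(p_region)
```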
Compare sales of each quarter
In general, sales increase from quarter to quarter.
Compare sales of each market
Sales are larger in market 2 than in market 1, and sales in market 1 vary less than those in market 2.
Compare sales of each product
Slippers sell better than the other products.
Compare sales of each product in 2 markets
Slippers do not sell as well in market 1.
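A comparison like this one can be backed by a small summary table next to the plot. The sketch below is one way to do that on made-up data: shop_demo and its values are hypothetical, and only the variable names product, market and sales follow the summary above.

```r
library(lattice)

# Hypothetical sales: 3 products x 2 markets x 8 records each
set.seed(3)
shop_demo <- expand.grid(product = c("Boots", "Shoes", "Slippers"),
                         market = 1:2, rep = 1:8)
shop_demo$sales <- round(1500 + 400 * shop_demo$market +
                           rnorm(nrow(shop_demo), sd = 300))

# Table of mean sales: products in rows, markets in columns
mean_sales <- tapply(shop_demo$sales,
                     list(product = shop_demo$product, market = shop_demo$market),
                     mean)
print(round(mean_sales))

# Boxplots of sales by product, one panel per market
p_market <- bwplot(sales ~ product | factor(market, labels = c("Market 1", "Market 2")),
                   data = shop_demo, xlab = "Product", ylab = "Sales")
print(p_market)
```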
Compare monthly sales changes for each product
dotplot(quantity ~ month, groups = product, data = dta3,
        xlab="Month", ylab="Quantity",
        type=c('p', 'g',"r"), auto.key=list(space="top", columns=3))
Compare sales of each customer in 2 markets
We have only one customer, Acme, in market 1.
Compare the distribution of sales of each product in the four quarters.
densityplot(~ quantity | factor(quarter), groups = product, data = dta3,
            xlab = "Quantity", auto.key = list(columns = 3), layout = c(1, 4))
Finding
- We should develop markets in the Eastern region.
- Sales to TwoFeet and BigX are lower than sales to Acme. We should put more effort into attracting these two customers, especially in market 1.
- Keep a good relationship with our biggest customer, Acme.
- Put more resources into marketing boots and shoes.
- Try to develop products for spring to increase sales in month
HW exercise 4.
Use the Lattice package to graphically explore the age and gender effects on reaction time reported in the Bassin data example.
Each year the U.S. Naval Postgraduate School sets aside a “Discovery Day” during which the general public is invited into their laboratories. The data come from October 21 1995, when visitors could test their reaction times and hand-eye coordination in the Human Systems Integration Laboratory. The variable of interest, “anticipatory timing”, was measured by a Bassin timer, which measures the ability to estimate the speed of a moving light and its arrival at a designated point. The Timer consists of a 10 foot row of lights which is controlled by a variable speed potentiometer. The lights are switched on sequentially from one end to the other so that light ‘travels’ at 5 miles per hour down the Timer. Each visitor was instructed to anticipate the ‘arrival’ of the light at one end of the Timer and at that time to swing a plastic bat across a light beam at the same end of the Timer. An automatic timing device measured the difference between the breaking of the beam and the actual arrival of the light. A negative value of a trial variable indicates the bat broke the beam before the light actually arrived. Each of 113 visitors completed the trial five times. Age and gender were also recorded. Visitors tended to come in family groups, but that information was not recorded. It may be that subject #35, who is a two year old with much slower reaction times, should be deleted.
Source: OzData
- Column 1: Gender ID
- Column 2: Age (year)
- Column 3: Response time Trial 1
- Column 4: Response time Trial 2
- Column 5: Response time Trial 3
- Column 6: Response time Trial 4
- Column 7: Response time Trial 5
[Solution and Answer]
Load in the data set
'data.frame': 113 obs. of 7 variables:
$ Sex : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
$ Age : int 31 30 30 27 30 28 34 28 28 33 ...
$ Trial1: num 0.051 0.074 0.051 0.182 0.077 0.103 -0.066 0.204 -0.231 -0.052 ...
$ Trial2: num 0.023 0.006 0.094 0.166 0.001 0.065 0.031 -0.106 -0.124 -0.011 ...
$ Trial3: num 0.106 0.003 0.084 -0.073 0 0.063 0.036 -0.09 -0.065 -0.025 ...
$ Trial4: num 0.076 0.02 0.176 -0.044 -0.027 0.059 0.11 -0.04 -0.19 -0.014 ...
$ Trial5: num 0.013 0.022 0.103 0.029 -0.2 0.059 0.045 -0.03 -0.211 -0.059 ...
Create new variables
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1.6360 -0.0880 0.2390 0.2676 0.5110 5.7950
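The code that creates the new variables is not shown. The sketch below assumes that RT_av, which the plots in this exercise use, is each visitor's mean over the five trials; that definition is an assumption, and RT_sd is a hypothetical extra variable added for illustration. The three rows are the first three visitors from the str() output above.

```r
# First three visitors, copied from the str() output above
oz_demo <- data.frame(
  Sex    = factor(c("M", "M", "M"), levels = c("F", "M")),
  Age    = c(31, 30, 30),
  Trial1 = c(0.051, 0.074, 0.051),
  Trial2 = c(0.023, 0.006, 0.094),
  Trial3 = c(0.106, 0.003, 0.084),
  Trial4 = c(0.076, 0.020, 0.176),
  Trial5 = c(0.013, 0.022, 0.103)
)

trial_cols <- paste0("Trial", 1:5)

# Per-visitor mean over the five trials (assumed definition of RT_av)
oz_demo$RT_av <- rowMeans(oz_demo[, trial_cols])

# Per-visitor standard deviation over the five trials (hypothetical extra variable)
oz_demo$RT_sd <- apply(oz_demo[, trial_cols], 1, sd)

print(oz_demo[, c("Sex", "Age", "RT_av", "RT_sd")])
```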
Age effect
Scatter plot
xyplot(RT_av ~ Age, data = OzData, type = c('p', 'r', 'g'),
       xlab = 'Age', ylab = 'Average reaction time')
Group ages to draw boxplots
# Split visitors at the median age into two groups
OzData$Age_group <- cut(OzData$Age,
                        breaks = c(0, median(OzData$Age), 100),
                        labels = c('Young', 'Old'))
bwplot(RT_av ~ Age_group, data = OzData,
       xlab = 'Age group', ylab = 'Average reaction time')
The average reaction time varies more among younger participants than among older participants.
Gender effect
The average reaction time varies more among females than among males.
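The plot behind this comparison is not shown. A minimal sketch of one way to make it follows: a boxplot of the average reaction time by gender, with the group standard deviations as a numeric check on the spread. The data frame rt_demo is synthetic, with made-up values; only the names Sex and RT_av follow this report.

```r
library(lattice)

# Synthetic average reaction times: 30 hypothetical visitors per gender
set.seed(5)
rt_demo <- data.frame(
  Sex   = factor(rep(c("F", "M"), each = 30)),
  RT_av = c(rnorm(30, mean = 0.05, sd = 0.12),
            rnorm(30, mean = 0.03, sd = 0.06))
)

# Spread of the average reaction time within each gender
sd_by_sex <- tapply(rt_demo$RT_av, rt_demo$Sex, sd)
print(round(sd_by_sex, 3))

# Boxplot of average reaction time by gender
p_sex <- bwplot(RT_av ~ Sex, data = rt_demo,
                xlab = "Gender", ylab = "Average reaction time")
print(p_sex)
```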
ANOVA
OzData_long <- OzData %>% select(contains('Trial')) %>% stack()
# Caution: stack() returns the values trial by trial (all of Trial1, then Trial2, ...),
# whereas rep(..., each = 5) assumes five consecutive rows per visitor;
# rep(..., times = 5) would match the stacked order.
OzData_long$ID <- rep(rownames(OzData), each=5)
OzData_long$Sex <- rep(OzData$Sex, each=5)
OzData_long$Age <- rep(OzData$Age, each=5)
OzData_long$Age_group <- rep(OzData$Age_group, each=5)
model_OzData <- aov(values ~ (Sex*Age_group) + Error(ID / (Sex*Age_group) + ind), data = OzData_long)
Warning in aov(values ~ (Sex * Age_group) + Error(ID/(Sex * Age_group) + :
Error() model is singular
Error: ID
Df Sum Sq Mean Sq F value Pr(>F)
Sex 1 0.104 0.10428 3.176 0.0775 .
Age_group 1 0.010 0.01014 0.309 0.5795
Sex:Age_group 1 0.017 0.01734 0.528 0.4689
Residuals 109 3.579 0.03284
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Error: ind
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 4 0.03211 0.008026
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 448 17.25 0.03849
Neither gender effect nor age effect is significant. The interaction effect is not significant, either.
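Two details of the long-format step deserve care. stack() returns the values column by column, so subject labels built with rep(..., each = 5) may not line up with their own trials. Nesting the between-subject factors inside ID in Error() is also the likely cause of the singularity warning. The sketch below is one alternative, on synthetic data (the data frame wide and its values are made up): reshape() keeps each row's ID, Sex and Age_group attached to its values, and Error(ID) alone defines the subject stratum, which should avoid the warning. It is a hedged illustration, not a re-analysis of the Bassin data.

```r
# Synthetic wide data: 24 hypothetical visitors, 5 trials each
set.seed(4)
n_sub <- 24
wide <- data.frame(
  ID        = factor(seq_len(n_sub)),
  Sex       = factor(rep(c("F", "M"), each = n_sub / 2)),
  Age_group = factor(rep(c("Young", "Old"), times = n_sub / 2))
)
trials <- matrix(rnorm(n_sub * 5, sd = 0.1), ncol = 5,
                 dimnames = list(NULL, paste0("Trial", 1:5)))
wide <- cbind(wide, trials)

# Wide to long: covariates are carried along with each value automatically
long <- reshape(wide, direction = "long",
                varying = paste0("Trial", 1:5), v.names = "values",
                timevar = "trial", times = 1:5, idvar = "ID")
long$trial <- factor(long$trial)

# Between-subject factors tested against the subject (ID) stratum
fit_rt <- aov(values ~ Sex * Age_group + Error(ID), data = long)
print(summary(fit_rt))
```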