DataM: Homework Exercise 0413 - Trellis

HW exercise 1.

Use trellis graphics to explore various ways to display the sample data from the National Longitudinal Survey of Youth.

The data are drawn from the National Longitudinal Survey of Youth (NLSY). The sample observations are from the 1986, 1988, 1990, and 1992 assessment periods. Children were selected to be in kindergarten, first, and second grade and to be of age 5, 6, or 7 at the first assessment (1986). Both reading and mathematical achievement scores are recorded. The former is a recognition subscore of the Peabody Individual Achievement Test (PIAT). This was scaled as the percentage of 84 items that were answered correctly. The same 84 items were administered at all four time points, providing a consistent scale over time. The data set is a subsample of 166 subjects with complete observations.

Source: Bollen, K.A. & Curran, P.J. (2006). Latent curve models. A structural equation perspective. p.59.

  • Column 1: Student ID
  • Column 2: Gender, male or female
  • Column 3: Race, minority or majority
  • Column 4: Measurement occasions
  • Column 5: Grade at which measurements were made, Kindergarten = 0, First grade = 1, Second grade = 2
  • Column 6: Age in years
  • Column 7: Age in months
  • Column 8: Math score
  • Column 9: Reading score

library(dplyr)
library(lattice)

Load in the package lattice and the data set

NLSY <- read.table('../data/NLSY.txt', header = TRUE, sep = ',')
head(NLSY)
str(NLSY)
'data.frame':   664 obs. of  9 variables:
 $ id   : int  2390 2560 3740 4020 6350 7030 7200 7610 7680 7700 ...
 $ sex  : Factor w/ 2 levels "Female","Male": 1 1 1 2 2 2 2 2 1 2 ...
 $ race : Factor w/ 2 levels "Majority","Minority": 1 1 1 1 1 1 1 1 1 1 ...
 $ time : int  1 1 1 1 1 1 1 1 1 1 ...
 $ grade: int  0 0 0 0 1 0 0 0 0 0 ...
 $ year : int  6 6 6 5 7 5 6 7 6 6 ...
 $ month: int  67 66 67 60 78 62 66 79 76 67 ...
 $ math : num  14.29 20.24 17.86 7.14 29.76 ...
 $ read : num  19.05 21.43 21.43 7.14 30.95 ...

Gender difference

Draw the density plot of math socore and that of reading score with grouping of gender.

lst_gender <- lapply(split(NLSY, NLSY$sex), function(df) {
  df_long <- df %>% select(math, read) %>% stack()
  df_long$gender <- as.character(df$sex)[1]
  return(df_long)
})
df_gender <- rbind(lst_gender[[1]], lst_gender[[2]])

densityplot(~ values | ind, groups = gender, data = df_gender,
            layout = c(1, 2), auto.key=list(column=2), xlab='Score')

Draw boxplots of math score of each grade with grouping of gender.

bwplot(math ~ factor(grade) | sex, data = NLSY,
       xlab = 'Grade', ylab = 'Mathematics score')

Draw boxplots of reading score of each grade with grouping of gender.

bwplot(read ~ factor(grade) | sex, data = NLSY,
       xlab = 'Grade', ylab = 'Reading score')

Draw scatter plots of math score and reading score with the regreesion line for each grade with grouping of gender.

xyplot(math ~ read | factor(grade), groups = sex, data = NLSY,
       layout=c(4, 2), type=c('p', 'r', 'g'), auto.key=list(column=2),
       xlab = 'Mathematics score', ylab = 'Reading score')


Race

Draw the density plot of math socore and that of reading score with grouping of race.

lst_race <- lapply(split(NLSY, NLSY$race), function(df) {
  df_long <- df %>% select(math, read) %>% stack()
  df_long$race <- as.character(df$race)[1]
  return(df_long)
})
df_race <- rbind(lst_race[[1]], lst_race[[2]])

densityplot(~ values | ind, groups = race, data = df_race,
            layout = c(1, 2), auto.key=list(column=2), xlab='Score')

Draw boxplots of math score of each grade with grouping of race.

bwplot(math ~ factor(grade) | race, data = NLSY,
       xlab = 'Grade', ylab = 'Mathematics score')

Draw boxplots of reading score of each grade with grouping of race.

bwplot(read ~ factor(grade) | race, data = NLSY,
       xlab = 'Grade', ylab = 'Reading score')

Draw scatter plots of math score and reading score with the regreesion line for each grade with grouping of race.

xyplot(math ~ read | factor(grade), groups = race, data = NLSY,
       layout=c(4, 2), type=c('p', 'r', 'g'), auto.key=list(column=2),
       xlab = 'Mathematics score', ylab = 'Reading score')

Repeated factorial ANOVA

Y: Math score

model_math <- aov(math ~ (sex * race * grade) + Error(id / (sex * race * grade) + time), data = NLSY)
summary(model_math)

Error: id
    Df Sum Sq Mean Sq
sex  1   1651    1651

Error: time
      Df Sum Sq Mean Sq
grade  1 170290  170290

Error: id:sex
    Df Sum Sq Mean Sq
sex  1  127.2   127.2

Error: id:race
    Df Sum Sq Mean Sq
sex  1   1065    1065

Error: id:grade
    Df Sum Sq Mean Sq
sex  1   1228    1228

Error: id:sex:race
    Df Sum Sq Mean Sq
sex  1  112.8   112.8

Error: id:sex:grade
    Df Sum Sq Mean Sq
sex  1  357.7   357.7

Error: id:race:grade
    Df Sum Sq Mean Sq
sex  1  452.9   452.9

Error: id:sex:race:grade
    Df Sum Sq Mean Sq
sex  1  2.812   2.812

Error: Within
                Df Sum Sq Mean Sq F value  Pr(>F)    
sex              1    146     146   1.992 0.15866    
race             1    776     776  10.612 0.00118 ** 
grade            1   5375    5375  73.506 < 2e-16 ***
sex:race         1     98      98   1.340 0.24747    
sex:grade        1     15      15   0.199 0.65551    
race:grade       1      7       7   0.095 0.75807    
sex:race:grade   1     97      97   1.322 0.25057    
Residuals      647  47314      73                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Both of race effect and grade effect is significant in math score. There is a significant difference in math score between minority race and majority race. All interaction terms are not significant.

Y: Reading score

model_read <- aov(read ~ (sex * race * grade) + Error(id / (sex * race * grade) + time), data = NLSY)
summary(model_read)

Error: id
    Df Sum Sq Mean Sq
sex  1  437.5   437.5

Error: time
      Df Sum Sq Mean Sq
grade  1 196976  196976

Error: id:sex
    Df Sum Sq Mean Sq
sex  1  122.1   122.1

Error: id:race
    Df Sum Sq Mean Sq
sex  1  621.2   621.2

Error: id:grade
    Df Sum Sq Mean Sq
sex  1   1114    1114

Error: id:sex:race
    Df Sum Sq Mean Sq
sex  1  188.9   188.9

Error: id:sex:grade
    Df Sum Sq Mean Sq
sex  1  6.778   6.778

Error: id:race:grade
    Df Sum Sq Mean Sq
sex  1   2038    2038

Error: id:sex:race:grade
    Df Sum Sq Mean Sq
sex  1  6.317   6.317

Error: Within
                Df Sum Sq Mean Sq F value   Pr(>F)    
sex              1     12      12   0.090  0.76397    
race             1   1103    1103   8.580  0.00352 ** 
grade            1   4188    4188  32.591 1.73e-08 ***
sex:race         1    487     487   3.787  0.05209 .  
sex:grade        1     77      77   0.600  0.43873    
race:grade       1    120     120   0.936  0.33360    
sex:race:grade   1    103     103   0.803  0.37065    
Residuals      647  83150     129                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Both of race effect and grade effect is significant in reading score. There is a significant difference in reading score between minority race and majority race. All interaction terms are not significant.


HW exercise 2.

Eight different physical measurements of 30 French girls were recorded from 4 to 15 years old. Explore various ways to display the data using trellis graphics.

Source: Sempe, M., et al. (1987). Multivariate and longitudinal data on growing children: Presentation of the French auxiological survey. In J. Janssen, et al. (1987). Data analysis. The Ins and Outs of solving real problems (pp. 3-6). New York: Plenum Press.

  • Column 1: Weight in grams
  • Column 2: Height in mms
  • Column 3: Head to butt length in mms
  • Column 4: Head circumference in mms
  • Column 5: Chest circumference in mms
  • Column 6: Arm length in mms
  • Column 7: Calf length in mms
  • Column 8: Pelvis circumference in mms
  • Column 9: Age in years
  • Column 10: Girl ID


[Solution and Answer]

Load in the data set and check the data structure

dta2 <- read.table('../data/data_hw0413_trellis_2.txt', header = TRUE)
head(dta2)
summary(dta2)
       Wt             Ht             Hb              Hc              Cc       
 Min.   :1214   Min.   : 932   Min.   :546.0   Min.   :465.0   Min.   :476.0  
 1st Qu.:1981   1st Qu.:1145   1st Qu.:631.0   1st Qu.:508.0   1st Qu.:575.0  
 Median :2778   Median :1312   Median :694.5   Median :522.0   Median :630.0  
 Mean   :3041   Mean   :1318   Mean   :707.2   Mean   :522.8   Mean   :656.4  
 3rd Qu.:4025   3rd Qu.:1504   3rd Qu.:786.0   3rd Qu.:537.0   3rd Qu.:740.0  
 Max.   :6285   Max.   :1682   Max.   :905.0   Max.   :585.0   Max.   :935.0  
                                                                              
      Arm             Calf           Pelvis           age              id     
 Min.   :144.0   Min.   :198.0   Min.   :150.0   Min.   : 4.00   S1     : 12  
 1st Qu.:170.0   1st Qu.:234.8   1st Qu.:190.0   1st Qu.: 6.75   S10    : 12  
 Median :186.5   Median :266.0   Median :209.0   Median : 9.50   S11    : 12  
 Mean   :192.8   Mean   :271.5   Mean   :214.8   Mean   : 9.50   S12    : 12  
 3rd Qu.:216.0   3rd Qu.:309.2   3rd Qu.:241.0   3rd Qu.:12.25   S13    : 12  
 Max.   :285.0   Max.   :380.0   Max.   :305.0   Max.   :15.00   S14    : 12  
                                                                 (Other):288  
length(unique(dta2$age))
[1] 12

Compute average measurements for each girl

       Wt             Ht             Hb              Hc              Cc       
 Min.   :1214   Min.   : 932   Min.   :546.0   Min.   :465.0   Min.   :476.0  
 1st Qu.:1981   1st Qu.:1145   1st Qu.:631.0   1st Qu.:508.0   1st Qu.:575.0  
 Median :2778   Median :1312   Median :694.5   Median :522.0   Median :630.0  
 Mean   :3041   Mean   :1318   Mean   :707.2   Mean   :522.8   Mean   :656.4  
 3rd Qu.:4025   3rd Qu.:1504   3rd Qu.:786.0   3rd Qu.:537.0   3rd Qu.:740.0  
 Max.   :6285   Max.   :1682   Max.   :905.0   Max.   :585.0   Max.   :935.0  
                                                                              
      Arm             Calf           Pelvis           age              id     
 Min.   :144.0   Min.   :198.0   Min.   :150.0   Min.   : 4.00   S1     : 12  
 1st Qu.:170.0   1st Qu.:234.8   1st Qu.:190.0   1st Qu.: 6.75   S10    : 12  
 Median :186.5   Median :266.0   Median :209.0   Median : 9.50   S11    : 12  
 Mean   :192.8   Mean   :271.5   Mean   :214.8   Mean   : 9.50   S12    : 12  
 3rd Qu.:216.0   3rd Qu.:309.2   3rd Qu.:241.0   3rd Qu.:12.25   S13    : 12  
 Max.   :285.0   Max.   :380.0   Max.   :305.0   Max.   :15.00   S14    : 12  
                                                                 (Other):288  
[1] 12

Present relationship between weight and height for each year and generally

xyplot(Wt ~ Ht, groups = factor(age), data = dta2,
       auto.key = list(column=4), pch=16,
       type = c('p', 'r', 'g'), xlab= 'Height', ylab = 'Weight')

# Correlation
with(dta2, cor(Wt, Ht))
[1] 0.9491352
split(dta2, dta2$age) %>% sapply(., function(df) {with(df, cor(Wt, Ht))})
        4         5         6         7         8         9        10        11 
0.6589876 0.7131878 0.7563261 0.7550721 0.7381013 0.6127379 0.6706406 0.6738371 
       12        13        14        15 
0.7103265 0.6313972 0.4659650 0.3489307 
# Intecept and slope in linear model (weight ~ height)
split(dta2, dta2$age) %>% sapply(., function(df) {coefficients(lm(Wt ~ Ht, data = df))[1]})
 4.(Intercept)  5.(Intercept)  6.(Intercept)  7.(Intercept)  8.(Intercept) 
     -1492.131      -2071.391      -2397.710      -2525.049      -3362.959 
 9.(Intercept) 10.(Intercept) 11.(Intercept) 12.(Intercept) 13.(Intercept) 
     -3230.680      -4717.608      -4661.448      -5253.717      -6183.765 
14.(Intercept) 15.(Intercept) 
     -4622.857      -1026.453 
split(dta2, dta2$age) %>% sapply(., function(df) {coefficients(lm(Wt ~ Ht, data = df))[2]})
    4.Ht     5.Ht     6.Ht     7.Ht     8.Ht     9.Ht    10.Ht    11.Ht 
3.019906 3.558213 3.827905 3.936904 4.638774 4.579293 5.745730 5.712182 
   12.Ht    13.Ht    14.Ht    15.Ht 
6.131886 6.841161 5.939274 3.772245 
  • Generally, height highly positively correlated with weight. But when grouping with age, the correaltion are not so high.
  • In the linear models of weight ~ height, all slopes are positive. This indicates that from age 4-15, height keep positively correlatied with weight in girls.
  • While the intercept of model with age 15 is quite different from others, indicating that girls get heavier in age 15 than other age.


Present the relationship between weight (Wt) and other measurments for each age

Standardize each measurement

dta2_std <- dta2 %>% mutate_all(function(x) {as.numeric(robustHD::standardize(x))})
Warning in mean.default(x): argument is not numeric or logical: returning NA
Warning in Ops.factor(x, center): '-' not meaningful for factors
dta2_std$age <- dta2$age
dta2_std$id <- dta2$id
summary(dta2_std)
       Wt                Ht                 Hb                Hc          
 Min.   :-1.4813   Min.   :-1.89954   Min.   :-1.7435   Min.   :-2.78039  
 1st Qu.:-0.8593   1st Qu.:-0.85086   1st Qu.:-0.8241   1st Qu.:-0.71183  
 Median :-0.2137   Median :-0.03113   Median :-0.1372   Median :-0.03835  
 Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.0000   Mean   : 0.00000  
 3rd Qu.: 0.7977   3rd Qu.: 0.91785   3rd Qu.: 0.8526   3rd Qu.: 0.68324  
 Max.   : 2.6301   Max.   : 1.79298   Max.   : 2.1399   Max.   : 2.99233  
                                                                          
       Cc               Arm               Calf             Pelvis       
 Min.   :-1.7838   Min.   :-1.6862   Min.   :-1.7252   Min.   :-1.9485  
 1st Qu.:-0.8049   1st Qu.:-0.7872   1st Qu.:-0.8628   1st Qu.:-0.7464  
 Median :-0.2610   Median :-0.2167   Median :-0.1295   Median :-0.1755  
 Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000  
 3rd Qu.: 0.8267   3rd Qu.: 0.8034   3rd Qu.: 0.8855   3rd Qu.: 0.7862  
 Max.   : 2.7548   Max.   : 3.1892   Max.   : 2.5457   Max.   : 2.7095  
                                                                        
      age              id     
 Min.   : 4.00   S1     : 12  
 1st Qu.: 6.75   S10    : 12  
 Median : 9.50   S11    : 12  
 Mean   : 9.50   S12    : 12  
 3rd Qu.:12.25   S13    : 12  
 Max.   :15.00   S14    : 12  
                 (Other):288  

Turn data set into long form

dta2_Wt <- dta2_std %>% select(-Wt, -id, -age) %>% stack
dta2_Wt$Wt <- rep(dta2_std$Wt, ncol(dta2_std)-3)
dta2_Wt$age <- rep(dta2_std$age, ncol(dta2_std)-3)
dta2_Wt$id <- rep(dta2_std$id, ncol(dta2_std)-3)
str(dta2_Wt)
'data.frame':   2520 obs. of  5 variables:
 $ values: num  -1.44 -1.57 -1.76 -1.54 -1.51 ...
 $ ind   : Factor w/ 7 levels "Ht","Hb","Hc",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Wt    : num  -1.29 -1.31 -1.38 -1.16 -1.1 ...
 $ age   : int  4 4 4 4 4 4 4 4 4 4 ...
 $ id    : Factor w/ 30 levels "S1","S10","S11",..: 1 12 23 25 26 27 28 29 30 2 ...

Plot

xyplot(Wt ~ values | ind, groups = paste0('Age ', age), data = dta2_Wt,
       auto.key = list(column=4), pch=16, cex=.4, type = c('p', 'r', 'g'),
       xlab= 'Standardized measurement', ylab = 'Standardized weight')

The slopes of Ht in predicting Wt are relatively similar in different ages. While slopes of other measurement in predicting Wt differ in different ages. Take head circumference (Hc) as an example, the slope of Hc in predicting Wt get higher when girls get older.

Try another variables: Present the relationship between one measuremeant and other measurments for each age

Put codes into a function

xyplots_dta2 <- function(var_name, var_label) {
  library(lattice)
  
  # Turn data set into long form
  dta2_var <- dta2_std %>% select(-var_name, -id, -age) %>% stack
  dta2_var$var_name <- rep(dta2_std[,var_name], ncol(dta2_std)-3)
  dta2_var$age <- rep(dta2_std$age, ncol(dta2_std)-3)
  dta2_var$id <- rep(dta2_std$id, ncol(dta2_std)-3)
  
  # Plot
  xyplot(var_name ~ values | ind, groups = paste0('Age ', age), data = dta2_var,
         auto.key = list(column=4), pch=16, cex=.4, type = c('p', 'r', 'g'), 
         xlab = 'Standardized measurement',
         ylab = paste0('Standardized ', var_label))
}


Head to butt length

xyplots_dta2(var_name = 'Hb', var_label = 'length of head to butt')
Note: Using an external vector in selections is ambiguous.
ℹ Use `all_of(var_name)` instead of `var_name` to silence this message.
ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
This message is displayed once per session.

  • In linear models of Hb ~ Ht of each age, both of intecepts and slopes are similar.
  • In linear models of Hb ~ Hc of each age,slopes are similar while intecepts differ.


Arm

xyplots_dta2(var_name = 'Arm', var_label = 'length of arm')

  • In linear models of Arm ~ Ht of each age, slope of age 15 is negative, which is quite different from other ages.
  • In linear models of Hb ~ Hb of each age, slopes of age 6 and 15 are quite smaller than other ages.


HW exercise 3.

Your manager gave you a sales data on sevral products in a SAS format. Your task is to summarize and report the data in tables and graphs using the R lattice package.

Source: Gupta, S. K. (2006). Data Management and Reporting Made Easy with SAS Learning Edition 2.0

  1. Recode the region variable (1 to 4) by “Nothern”, “Southern”, “Eastern” and “Western”;
  2. the district variable (1-5) by “North East”, “South East”, “South West”, “North West”, “Central West”;
  3. the quarter variable (1-4) by “1st”, “2nd”, “3rd”, “4th”;
  4. and the month variable (1-12) by “Jan”, “Feb”, etc. Set negative sales values to zero.


[Solution and Answer]

Load in the data

pacman::p_load('sas7bdat')
dta3 <- read.sas7bdat('../data/sales.sas7bdat')
head(dta3)
summary(dta3)
     product       category     customer       year          month      
 Boots   :24   Shoes   :48   Acme   :60   Min.   :2001   Min.   : 1.00  
 Shoes   :24   Slippers:24   BigX   : 6   1st Qu.:2001   1st Qu.: 3.75  
 Slippers:24                 TwoFeet: 6   Median :2002   Median : 6.50  
                                          Mean   :2002   Mean   : 6.50  
                                          3rd Qu.:2002   3rd Qu.: 9.25  
                                          Max.   :2002   Max.   :12.00  
    quarter         market          sales          expense         region     
 Min.   :1.00   Min.   :1.000   Min.   :-1400   Min.   :-980   Min.   :1.000  
 1st Qu.:1.75   1st Qu.:1.000   1st Qu.: 1000   1st Qu.: 660   1st Qu.:1.000  
 Median :2.50   Median :2.000   Median : 1550   Median :1065   Median :1.000  
 Mean   :2.50   Mean   :1.667   Mean   : 1686   Mean   :1172   Mean   :1.333  
 3rd Qu.:3.25   3rd Qu.:2.000   3rd Qu.: 2525   3rd Qu.:1860   3rd Qu.:1.000  
 Max.   :4.00   Max.   :2.000   Max.   : 4700   Max.   :2960   Max.   :4.000  
    district       return         constantv    quantity    
 Min.   :1.0   Min.   :0.0000   Min.   :1   Min.   :  0.0  
 1st Qu.:1.0   1st Qu.:0.0000   1st Qu.:1   1st Qu.:135.2  
 Median :1.0   Median :0.0000   Median :1   Median :220.0  
 Mean   :1.5   Mean   :0.6667   Mean   :1   Mean   :248.9  
 3rd Qu.:1.0   3rd Qu.:0.0000   3rd Qu.:1   3rd Qu.:287.8  
 Max.   :5.0   Max.   :5.0000   Max.   :1   Max.   :940.0  

See the distribution of scales for each region.

densityplot(~ quantity | as.character(region), data = dta3, layout=c(1, 3))

There is no sale in Eastern region (region=3).


Compare sales of each quarter

bwplot(quantity ~ factor(quarter), data = dta3, xlab="Quarter")

Generally, the sale increase as the quarter goes by.


Compare sales of each quarter

bwplot(quantity ~ factor(quarter), data = dta3, xlab="Quarter")

Compare sales of each market

bwplot(quantity ~ factor(market), data = dta3, xlab="Market")

There is a larger sale in market 2 than makret 1. And the variation of sales of market 1 is smaller than that of market 2.

Compare sales of each product

bwplot(quantity ~ factor(product), data = dta3, xlab="Market")

Slippers have a better sale.


Compare sales of each product in 2 markets

bwplot(quantity ~ factor(product) | factor(market), data = dta3, xlab="Market")

The sale of slippers is not that good in market 1.


Compare sales change of months for each product

dotplot(quantity ~ month, groups = product, data = dta3, 
        xlab="Month", ylab="Quantity", 
        type=c('p', 'g',"r"), auto.key=list(space="top", columns=3))


Compare sales of each customer in 2 markets

bwplot(quantity ~ factor(customer) | factor(market), data = dta3, xlab="Market")

We have only one customer, Acme, in market 1.

Compare distribution of sales of each product in four quaters.

densityplot(~ quantity | factor(quarter), groups = product, data = dta3, 
            xlab = "Quarter", auto.key = list(column=3), layout = c(1, 4))

Finding

  1. We should develop markets in Eastern region
  2. Sale of TwoFeet and BigX are lower than Acme. We should put more efforts on attracting these two customers, especially in market 1.
  3. Keep a good relationship with our biggest customer, Acme.
  4. Put more resources on marketing boots and shoes.
  5. Try to develop products for spring to increase sales in month

HW exercise 4.

Use the Lattice package to graphically explore the age and gender effects on reaction time reported in the Bassin data example.

Each year the U.S. Naval Postgraduate School sets aside a “Discovery Day” during which the general public is invited into their laboratories. The data come from October 21 1995, when visitors could test their reaction times and hand-eye coordination in the Human Systems Integration Laboratory. The variable of interest, “anticipatory timing”, was measured by a Bassin timer, which measures the ability to estimate the speed of a moving light and its arrival at a designated point. The Timer consists of a 10 foot row of lights which is controlled by a variable speed potentiometer. The lights are switched on sequentially from one end to the other so that light ‘travels’ at 5 miles per hour down the Timer. Each visitor was instructed to anticipate the ‘arrival’ of the light at one end of the Timer and at that time to swing a plastic bat across a light beam at the same end of the Timer. An automatic timing device measured the difference between the breaking of the beam and the actual arrival of the light. A negative value of a trial variable indicates the bat broke the beam before the light actually arrived. Each of 113 visitors completed the trial five times. Age and gender were also recorded. Visitors tended to come in family groups, but that information was not recorded. It may be that subject #35, who is a two year old with much slower reaction times, should be deleted.

Source: OzData

  • Column 1: Gender ID
  • Column 2: Age (year)
  • Column 3: Response time Trial 1
  • Column 3: Response time Trial 2
  • Column 3: Response time Trial 3
  • Column 3: Response time Trial 4
  • Column 3: Response time Trial 5

[Solution and Answer]

Load in the data set

OzData <- read.table('../data/OzData.txt', header = TRUE)
head(OzData)
str(OzData)
'data.frame':   113 obs. of  7 variables:
 $ Sex   : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
 $ Age   : int  31 30 30 27 30 28 34 28 28 33 ...
 $ Trial1: num  0.051 0.074 0.051 0.182 0.077 0.103 -0.066 0.204 -0.231 -0.052 ...
 $ Trial2: num  0.023 0.006 0.094 0.166 0.001 0.065 0.031 -0.106 -0.124 -0.011 ...
 $ Trial3: num  0.106 0.003 0.084 -0.073 0 0.063 0.036 -0.09 -0.065 -0.025 ...
 $ Trial4: num  0.076 0.02 0.176 -0.044 -0.027 0.059 0.11 -0.04 -0.19 -0.014 ...
 $ Trial5: num  0.013 0.022 0.103 0.029 -0.2 0.059 0.045 -0.03 -0.211 -0.059 ...

Create new variables

OzData$RT_av <- OzData %>% select(contains('Trial')) %>% rowSums()
summary(OzData$RT_av)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-1.6360 -0.0880  0.2390  0.2676  0.5110  5.7950 

Age effect

Scatter plot

xyplot(RT_av ~ Age, data = OzData, type = c('p', 'r', 'g'),
       xlab= 'Age', ylab = 'Average reaction time')

Group age to draw boxplot

OzData$Age_group <- cut(OzData$Age, 
                        breaks=c(0, median(OzData$Age), 100), 
                        labels=c('Young', 'Old'))

bwplot(RT_av ~ Age_group, data = OzData,
       xlab= 'Age group', ylab = 'Average reaction time')

The variation of average reaction time of younger participants is larger than that of older participants.


Gender effect

bwplot(RT_av ~ Sex, data = OzData, xlab= 'Gender', ylab = 'Average reaction time')

The variation of average reaction time of female is larger than that of male.


ANOVA

OzData_long <- OzData %>% select(contains('Trial')) %>% stack()
OzData_long$ID <- rep(rownames(OzData), each=5)
OzData_long$Sex <- rep(OzData$Sex, each=5)
OzData_long$Age <- rep(OzData$AGE, each=5)
OzData_long$Age_group <- rep(OzData$Age_group, each=5)

model_OzData <- aov(values ~ (Sex*Age_group) + Error(ID / (Sex*Age_group) + ind), data = OzData_long)
Warning in aov(values ~ (Sex * Age_group) + Error(ID/(Sex * Age_group) + :
Error() model is singular
summary(model_OzData)

Error: ID
               Df Sum Sq Mean Sq F value Pr(>F)  
Sex             1  0.104 0.10428   3.176 0.0775 .
Age_group       1  0.010 0.01014   0.309 0.5795  
Sex:Age_group   1  0.017 0.01734   0.528 0.4689  
Residuals     109  3.579 0.03284                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Error: ind
          Df  Sum Sq  Mean Sq F value Pr(>F)
Residuals  4 0.03211 0.008026               

Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)
Residuals 448  17.25 0.03849               

Neither gender effect nor age effect is significant. The interaction effect is not significant, either.

Jay Liao

2020-04-20