Data sets used in the following exercises can be found on Github.
1. Consider the following series of data collected on 8 subjects:
5.4 6.1 6.2 NA 6.2 5.6 19.0 6.3
Store these values in a variable named x. It turns out that the 7th observation is inexact and should be considered as a missing value. It is also suggested to replace the two missing values (observations 4 and 7) by the mean computed on the whole sample.
2. Create a treatment facteur, tx, of size 60, with balanced levels (30 std and 30 new) organized as follows:
std std std new new new ... std std std new new new
Replace label std with old. Then, update this vector so that the sequence of labels starts with new, instead of old (i.e., swap the two labels of the factor). Finally, intermix the two labels randomly.
3. Load the birthwt data set from the MASS package (data(birthwt, package="MASS")), and compute the following quantities:
ht).smoke=1) during pregnancy but was free of hypertension (ht=0).ptl variable: ptl2=0 if ptl=0, ptl2=1 when ptl>0.4. Based on the following artificial data of students' heights in two classrooms, indicate if the largest student comes from classroom A or B. Data were generated as follows:
d <- data.frame(height = rnorm(40, 170, 10), class = sample(LETTERS[1:2], 40, rep = TRUE))
d$height[sample(1:40, 1)] <- 220
5. Load the data included in lungcancer.txt. This data file features several coding problems (variables type, missing values, etc.). Inspect these data (e.g., using summary()) and suggest appropriate corrections.
6. Import the reading2.csv data file using read.csv() or read.csv2() depending on the type of field delimiter (refer the on-line documentation). Details about this study are given in Schmitt (1987). Note that the original data set was modified for the purpose of this exercise.
Treated vs. Control).wilcox.test).7. The fusion.dat file includes data from an experiment on visual perception relying on the use of random-dots stereograms (Frisby & Clatworthy, 1975). The aim of this study was to verify that prior knowledge of visual shape to detect enhances processing time for fusing the two images. One group of subjects (NV) received either no information or verbal vue only, while the other group (VV) received verbal and visual cues (drawing of the target object).
var.test()).8. The IQ_Brain_Size.txt data file contains data from a brain imaging study on brain volume and anthropometric characteristics of Monozygotic twins (Tramo et al. 1998). More details are available in the file header. It can be loaded as follows:
brain <- read.table("IQ_Brain_Size.txt", header = FALSE, skip = 27, nrows = 20)
This assumes that the data file sits in the current working directory (otherwise, you can change the pathname, or update the current working directory with setwd() or RStudio's facilities). Do read carefully the on-line help for read.table() to understand what the above command is actually doing.
CCMIDSA, HC, TOTSA, TOTVOL.TOTVOL.For questions (8) and (9), don't just give the p-values; give informative summary of existing differences (e.g., means ± standard deviation, standardized means difference), if any.
9. Here is an artificial data set to study the one-way ANOVA model.
set.seed(101)
k <- 3 # number of groups
ni <- 10 # number of observations per group
mi <- c(10, 12, 8) # group means
si <- c(1.2, 1.1, 1.1) # group standard deviations
grp <- gl(k, ni, k * ni, labels = c("A", "B", "C"))
resp <- c(rnorm(ni, mi[1], si[1]), rnorm(ni, mi[2], si[2]), rnorm(ni, mi[3], si[3]))
Note that the last command is not necessarily the most elegant way to generate individual responses, and such operations can easily be vectorized like this:
resp <- as.vector(mapply(rnorm, ni, mi, si))
All variables are then grouped into a data frame called dfrm:
dfrm <- data.frame(grp, resp)
rm(grp, resp) # delete original variables
C). Compare the results with those obtained from a t-test.10. The taste.dat file includes results from a study on appetite induced by different foods (Street & Carroll, 1972). Subjects assessed food flavor on a 7-point ordinal scale (-3 = awful to +3 = excellent). There were a total of 50 participants randomized to one of 4 conditions defined by the following factors: screen type (SCR, 0=coarse or 1=fine) and concentration of a liquid component (LIQ, 0=low or 1=high). The experiment was repeated 4 times, so that there are 16 groups of 50 participants in total. The response variable was the total score (SCORE) recorded for each treatment.
SCR + LIQ + SCR:LIQ)?help.search()), and apply the command to the current data. What conclusion can be drawn from its results?SCR factor? Compare with what would be obtained using a simple t-test.11. Data in the file brain_size.dat were collected by authors interested in the relationship between brain size and weight and intelligence (Willerman et al. 1991).
Hint: Be careful with missing data when loading the data.
MRI_count and Weight, with its 95% confidence interval.r.test() command from the psych package (install.packages("psych")).MRI_count and Weight , highlighted by gender levels.fm <- MRI_count ~ Weight. Does its 95% confidence interval include the value 0?12. Load data included in the R file sim.rda. Once loaded using data(), you will have access to a data frame named sim. This is a rectangular data set with 80 observations and 44 variables. The response variable is y, and explanatory variables are named x1 to x43. We seek to isolate among them predictors showing interesting relationship with y, in terms of positive or negative linear correlation. To keep a variable, a criteria based on a significant correlation test (unadjusted 5% level) will be considered.
y as the response variable and all candidates predictors. Comment the results.13. Consider the ToothGrowth dataset, available in R with the following command:
data(TootGrowth, package = "MASS")
We are interested in studying the effect of dose alone (i.e., don't pay attention to the supp factor). Compare and discuss the results of the following models:
dose treated as a factor with unordered levelsdose treated as a factor with ordered levels (ToothGrowth$dose <- factor(ToothGrowth$dose, ordered=TRUE))dose treated as a numeric variable.dose and I(dose^2) (which means the square of the dose) as numeric variables.14. Using data from Exercise 10, we will be interested in modeling the relationship between score and screen type, SCORE ~ SCR, without treating SCR as a factor.
SCORE ~ SCR, and show the regression line in a scatterplot display, together with individual data points.m, what represent the following values: taste$SCORE - fitted(m)?SCR treated as a grouping variable?SCR variable are currently coded as {0,1}. Update them to {-1,+1}. Does this affect the previous comparison? If so, in what sense?15. Authors studied the effect of different dieting treatment on 60 rats, and they considered two factors of interest: type of proteins used to feed the rats (beef, pork, cereals) and their concentration level (high or low) (Snedecor & Cochran, 1980). The response variable was the weight gain (in grams) after treatment. Data are available in the rats.rda data file.
load("rats.rda")
rat <- within(rat, {
Diet.Amount <- factor(Diet.Amount, levels = 1:2, labels = c("High", "Low"))
Diet.Type <- factor(Diet.Type, levels = 1:3, labels = c("Beef", "Pork", "Cereal"))
})
Beef/High, Beef/Low, Pork/High, Pork/Low, Cereal/High, et Cereal/Low. What is the result of the F-test for a one-way ANOVA?pairwise.t.test() with Bonferroni correction to identify pairs of treatments that differ significantly one from the other.Beef/High and Pork/High) vs. the average of all other treatments. Is the result significant at the 5% level? What does this result suggest?Frisby J and Clatworthy J (1975). “Learning to see complex random-dot steregrams.” Perception, 4, pp. 173-178.
Schmitt MC (1987). The Effects on an Elaborated Directed Reading Activity on the Metacomprehension Skills of Third Graders. PhD thesis, Purdue University.
Snedecor G and Cochran W (1980). Statistical Methods. Iowa State Press.
Street E and Carroll M (1972). “Preliminary Evaluation of a Food
Product.” In Tanur J (ed.), pp. 220-238. Holden-Day, San Francisco.
Tramo M, Loftus W, Green R, Stukel T, Weaver J and Gazzaniga M (1998). “Brain Size, Head Size, and IQ in Monozygotic Twins.” Neurology, 50, pp. 1246-1252.
Willerman L, Schultz R, Rutledge J and Bigler E (1991). “In Vivo Brain Size and Intelligence.” Intelligence, 15, pp. 223-228.