Week 3 YAY!

Conceptual questions

1. What are some disadvantages of using experimental control (i.e. random assignment) that statistical control doesn’t have?

Answer: Random assignment is not effective in small samples; group membership may have an effect. Also, an appropriate control group is required in differentiating treatment variables among groups. Moreover, if experimenters fail to identify all potential causes, third variables may exert a moderating effect (making it difficult to determining causal relationships). Finally, if comparing group means to test group relationships, we may be unable to detect effects that only happen at extremes.

2. What are some limitations of statistical control?

Answer: Statistical control may be under or over. In the case of undercontrol, a researcher may have failed to include all causes of the outcome correlated to the independent variables in the model. In overcontrol, effects of of the outcome might be contained in the predictor set.

3. What advantages are there to random assignment?

Answer: Model specification requires that all causes Y related to any X in the model is included in the model. By randomly assignment participants to groups, we expect that the only difference between is the treatment. Thus, no variable should be correlated to group membership and model specification is satisfied.

4. What are the advantages of using experimental and statistical control together?

Answer: Used together, experimental and statistical control increase plausibility of causal inference. Specifically, the Rubin Causal Model, calls for manipulation of the independent variable, temporal precedence (in which the cause occurs before the effect), and correct model specification combine to strengthen claims of causal inference.

5. What is cross-validation and why is it useful?

Answer: Cross-validation assists with predictive validity - i.e., considers whether a model derived from sample could be used to predict the outcome in another sample. In cross-validation, a model derived from one sample is used to estimate Y in another sample. The multiple correlation between Y and estimated Y in the new sample is then assessed.

6. Why is shrunken R useful?

Answer: Shrunken R-squared corrects for potential bias by comparing the sample size to the number of terms in the regression model. Lower sample size requires more correction (shrinking).

7. What are forward and backward stepwise regression?

Answer: Stepwise regression is algorithmically-guided selection of regression models, as compared to a theoretically-driven selection of regression models. In forward selection, regressors are evaluated to see which has the largest R^2, this is added first to the model. Subsequent regressors are added in order of largest R-2 change. In backward elimination, all regressors are entered into the model then deleted in the order of which reduces R-squared the least.

8. Why is stepwise regression problematic?

Answer: Stepwise regression is problematic because R-squared values are already upwardly biased. Collinearity problems are also worsened by this approach.

9. What are the advantages and disadvantages of using regressed change scores and gain/difference scores?

Answer: Gain scores may give more accurate estimates but is influenced by regression to the mean. Regressed change scores account for regression to the mean. Gain scores have bias in the estimator while residual scores have bias in the estimate.

Dichotomous predictor (aplied questions)

Prompt: In this assignment we use taste ratings of French fries that may or may not have been treated. Our outcome is how potatoy the French fries taste. 6 people tasted French fries with the treatment and 6 people tasted French fires without the treatment. They rated how potatoy, buttery, grassy, rancid, and painty the fries tasted. Let’s predict potato taste from treatment condition.

Before we run any analyses, let’s load our packages and data and make sure it’s in good shape

Here are the packages we will use this week

library(QuantPsyc)
## Loading required package: boot
## Loading required package: MASS
## 
## Attaching package: 'QuantPsyc'
## The following object is masked from 'package:base':
## 
##     norm
library(psych)
## 
## Attaching package: 'psych'
## The following object is masked from 'package:boot':
## 
##     logit
library(pwr)

Here we load the data

ff = read.csv("hw3data.csv")
#let's see what we got here
str(ff)
## 'data.frame':    696 obs. of  9 variables:
##  $ time     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ treatment: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ subject  : int  3 3 10 10 15 15 16 16 19 19 ...
##  $ rep      : int  1 2 1 2 1 2 1 2 1 2 ...
##  $ potatoy  : num  2.9 14 11 9.9 1.2 8.8 9 8.2 7 13 ...
##  $ buttery  : num  0 0 6.4 5.9 0.1 3 2.6 4.4 3.2 0 ...
##  $ grassy   : num  0 0 0 2.9 0 3.6 0.4 0.3 0 3.1 ...
##  $ rancid   : num  0 1.1 0 2.2 1.1 1.5 0.1 1.4 4.9 4.3 ...
##  $ painty   : num  5.5 0 0 0 5.1 2.3 0.2 4 3.2 10.3 ...
head(ff) #show only a few rows
##   time treatment subject rep potatoy buttery grassy rancid painty
## 1    1         1       3   1     2.9     0.0    0.0    0.0    5.5
## 2    1         1       3   2    14.0     0.0    0.0    1.1    0.0
## 3    1         1      10   1    11.0     6.4    0.0    0.0    0.0
## 4    1         1      10   2     9.9     5.9    2.9    2.2    0.0
## 5    1         1      15   1     1.2     0.1    0.0    1.1    5.1
## 6    1         1      15   2     8.8     3.0    3.6    1.5    2.3
#Next we have to do a few things to our data to make it work for the current research questions
#removing treatment 2 so will compare 1 to 3
#also removing all but first time point
#and removing pre-test
ff = ff[!ff$treatment == 2 & ff$time == 1 & ff$rep == 2,]
#"random" assignment - just assigning first 6 people to 1 condition and second 6 to people to other condition
#doing this by subestting since in reality each person got both treatments
length(unique(ff$subject))
## [1] 12
ff = ff[c(1:6, 19:24),]
#removing unnecesary columns
ff= ff[,c(2, 5:9)]
#dummy coding dichtomous predictor
ff$treatment[ff$treatment == 1] = 0
ff$treatment[ff$treatment == 3] = 1
#describe dataset and correlations
describe(ff)
##           vars  n mean   sd median trimmed  mad min  max range skew kurtosis
## treatment    1 12 0.50 0.52   0.50    0.50 0.74   0  1.0   1.0 0.00    -2.16
## potatoy      2 12 9.75 2.55   9.35    9.80 1.63   5 14.0   9.0 0.08    -0.91
## buttery      3 12 2.69 2.62   2.75    2.53 4.00   0  7.0   7.0 0.18    -1.69
## grassy       4 12 1.51 1.59   1.15    1.44 1.70   0  3.7   3.7 0.18    -1.93
## rancid       5 12 2.16 2.08   1.45    1.99 1.56   0  6.0   6.0 0.72    -1.04
## painty       6 12 2.63 3.82   0.00    2.13 0.00   0 10.3  10.3 0.92    -0.86
##             se
## treatment 0.15
## potatoy   0.74
## buttery   0.76
## grassy    0.46
## rancid    0.60
## painty    1.10
print(cor(ff)) #wrapping in print will do just that, print!
##             treatment    potatoy     buttery      grassy     rancid      painty
## treatment  1.00000000 -0.5598476  0.02322993 -0.09322804 -0.2640749 -0.03644808
## potatoy   -0.55984762  1.0000000 -0.24299827 -0.40605343  0.2061370 -0.23152006
## buttery    0.02322993 -0.2429983  1.00000000  0.16992407 -0.0604959 -0.28119059
## grassy    -0.09322804 -0.4060534  0.16992407  1.00000000  0.2127916  0.58280403
## rancid    -0.26407493  0.2061370 -0.06049590  0.21279163  1.0000000  0.50401512
## painty    -0.03644808 -0.2315201 -0.28119059  0.58280403  0.5040151  1.00000000

1. create a scatter plot of potato taste as a function of treatment. What does the scatterplot reveal?

#dichotomous predictor
plot(ff$treatment, ff$potatoy)

Answer: The scatter plot indicates that potatoes that did not receive treatment receive better taste reatings relative to those that do.

2. Run a regression model predicting potatoy taste from treatment. What is the intercept and slope? What do these numbers represent?

#running model
treatmodel = lm(potatoy ~ treatment, ff) 
#seeing output
summary(treatmodel)
## 
## Call:
## lm(formula = potatoy ~ treatment, data = ff)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.3833 -1.4917  0.0167  1.8917  2.8833 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  11.1167     0.9046  12.289 2.33e-07 ***
## treatment    -2.7333     1.2793  -2.137   0.0584 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.216 on 10 degrees of freedom
## Multiple R-squared:  0.3134, Adjusted R-squared:  0.2448 
## F-statistic: 4.565 on 1 and 10 DF,  p-value: 0.05837

Call: lm(formula = potatoy ~ treatment, data = ff)

Residuals: Min 1Q Median 3Q Max -3.3833 -1.4917 0.0167 1.8917 2.8833

Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.1167 0.9046 12.289 2.33e-07 *** treatment -2.7333 1.2793 -2.137 0.0584 .
— Signif. codes: 0 ‘’ 0.001 ‘’ 0.01 ‘’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.216 on 10 degrees of freedom Multiple R-squared: 0.3134, Adjusted R-squared: 0.2448 F-statistic: 4.565 on 1 and 10 DF, p-value: 0.05837

Answer: The intercept is 11.1 and the slope is -2.73. Th intercept represents potato taste with no treatment, while the slope indicates change in potato taste rating with treatment.

3. Calculate a standardized regression coefficient and the correlation between potatoy taste and treatment. Are they the same or different? Why?

#betas and rs
lm.beta(treatmodel)
##  treatment 
## -0.5598476
#some more info
cor.test(ff$potatoy, ff$treatment)
## 
##  Pearson's product-moment correlation
## 
## data:  ff$potatoy and ff$treatment
## t = -2.1366, df = 10, p-value = 0.05837
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.85805754  0.02070715
## sample estimates:
##        cor 
## -0.5598476

lm.beta(treatmodel) treatment -0.5598476 #some more info cor.test(ff\(potatoy, ff\)treatment)

Pearson's product-moment correlation

data: ff\(potatoy and ff\)treatment t = -2.1366, df = 10, p-value = 0.05837 alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval: -0.85805754 0.02070715 sample estimates: cor -0.5598476

Answer: The standardized correlation coefficient is -0.56 and the correlation are the same.The test of Beta yields the same result as the t-test of b.

5. If we convert the correlation to a t-value, that t-value will equal the t-value we saw where?

#can also get the r and store it in an object with:
treatr = cor(ff$treatment, ff$potatoy)
#can convert this r to a t and get the same t value in the regression model
sqrt(10*treatr^2/(1-treatr^2))
## [1] 2.13662

Answer:The t-value is equal to the t-value we saw for the Pearson correlation.

Power (applied)

1. Given our sample size and an alpha level of .05, what effect size do we have 80% power to detect? Do we have low or high power?

#what effect size do we have 80% power to detect?
pwr.r.test(n = 12, sig.level = .05, power = .8)
## 
##      approximate correlation power calculation (arctangh transformation) 
## 
##               n = 12
##               r = 0.7183955
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
 approximate correlation power calculation (arctangh transformation) 

          n = 12
          r = 0.7183955
  sig.level = 0.05
      power = 0.8
alternative = two.sided

Answer: We have 80% to detect an effect size of .71.

2. Given our sample size and an alpha level of .05, what power do we have for the effect size we found? This is called ____ power.

#what power do we have for the effect size we found?
pwr.r.test(n = 12, r = treatr, sig.level = .05)
## 
##      approximate correlation power calculation (arctangh transformation) 
## 
##               n = 12
##               r = 0.5598476
##       sig.level = 0.05
##           power = 0.501989
##     alternative = two.sided
approximate correlation power calculation (arctangh transformation) 

          n = 12
          r = 0.5598476
  sig.level = 0.05
      power = 0.501989
alternative = two.sided

Answer: We have 50% power for the effect size we found. This is low power.

3. Given an alpha level of .05, how many cases would we need in a follow-up study to have 80% power to detect the same effect size we obtained. This is called ___ power.

#how many people would we need to have 80% power for our effect size
pwr.r.test(r = treatr, sig.level = .05, power = .8)
## 
##      approximate correlation power calculation (arctangh transformation) 
## 
##               n = 21.88629
##               r = 0.5598476
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided

approximate correlation power calculation (arctangh transformation)

          n = 21.88629
          r = 0.5598476
  sig.level = 0.05
      power = 0.8
alternative = two.sided

Answer: We would need 22 cases to have 80% power to detect the same effect size.

Adding multiple predictors

1. Build a regression model in which potatoy taste is predicted by treatment and how buttery, grassy, rancid, and painty the French fries were. Rank these predictors in order of their importance (more accurately, effect consistency) based on the regression coefficients. Should you use bs or betas for this?

#adding a set of predictors
fullmodel = lm(potatoy ~ treatment + buttery + grassy + rancid + painty, ff)
summary(fullmodel)
## 
## Call:
## lm(formula = potatoy ~ treatment + buttery + grassy + rancid + 
##     painty, data = ff)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.60479 -1.11475  0.00082  1.21511  2.82154 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  12.2108     1.5100   8.087 0.000192 ***
## treatment    -2.5759     1.3476  -1.912 0.104488    
## buttery      -0.2268     0.2980  -0.761 0.475426    
## grassy       -0.5133     0.5869  -0.875 0.415396    
## rancid        0.3066     0.3980   0.770 0.470277    
## painty       -0.1708     0.2851  -0.599 0.570968    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.204 on 6 degrees of freedom
## Multiple R-squared:  0.5926, Adjusted R-squared:  0.253 
## F-statistic: 1.745 on 5 and 6 DF,  p-value: 0.2581
lm.beta(fullmodel)
##  treatment    buttery     grassy     rancid     painty 
## -0.5276057 -0.2333181 -0.3195422  0.2497109 -0.2559847

Call: lm(formula = potatoy ~ treatment + buttery + grassy + rancid + painty, data = ff)

Residuals: Min 1Q Median 3Q Max -2.60479 -1.11475 0.00082 1.21511 2.82154

Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.2108 1.5100 8.087 0.000192 *** treatment -2.5759 1.3476 -1.912 0.104488
buttery -0.2268 0.2980 -0.761 0.475426
grassy -0.5133 0.5869 -0.875 0.415396
rancid 0.3066 0.3980 0.770 0.470277
painty -0.1708 0.2851 -0.599 0.570968
— Signif. codes: 0 ‘’ 0.001 ‘’ 0.01 ‘’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.204 on 6 degrees of freedom Multiple R-squared: 0.5926, Adjusted R-squared: 0.253 F-statistic: 1.745 on 5 and 6 DF, p-value: 0.2581

lm.beta(fullmodel) treatment buttery grassy rancid painty -0.5276057 -0.2333181 -0.3195422 0.2497109 -0.2559847

Answer: Betas reveal the importance of different variables in a model. Given this, the predictors ranked would be: treatment (-.53), grassy (-.32), painty (-.26), rancid (.25), and buttery (-.23).

2. How much better is our model? What statistic do we use for this? Calculate a p-value for this statistic.

#change in R-squared
chgR2 = summary(fullmodel)$r.squared - summary(treatmodel)$r.squared 
chgR2
## [1] 0.2791368
#is this significant?
anova(treatmodel, fullmodel)
## Analysis of Variance Table
## 
## Model 1: potatoy ~ treatment
## Model 2: potatoy ~ treatment + buttery + grassy + rancid + painty
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     10 49.097                           
## 2      6 29.136  4    19.961 1.0277 0.4639
#to get exact p-value
#create object from anova test
a = anova(treatmodel, fullmodel)
#see what is in the object
str(a)
## Classes 'anova' and 'data.frame':    2 obs. of  6 variables:
##  $ Res.Df   : num  10 6
##  $ RSS      : num  49.1 29.1
##  $ Df       : num  NA 4
##  $ Sum of Sq: num  NA 20
##  $ F        : num  NA 1.03
##  $ Pr(>F)   : num  NA 0.464
##  - attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Model 1: potatoy ~ treatment\nModel 2: potatoy ~ treatment + buttery + grassy + rancid + painty"
#extract out the info
a$`Pr(>F)`[2]
## [1] 0.4638846

Answer: R-square change is .279. F(4) = 1.03, p = .46. The model is not significantly better.

3. Calculate the semipartial correlation of the added set of predictors. How does this relate to the statistic in the previous question? What does the semipartial correlation represent?

#change in adjusted Rsquared
chgadjR2 = summary(fullmodel)$adj.r.squared - summary(treatmodel)$adj.r.squared 
chgadjR2
## [1] 0.008265667

Answer: The semi-partial correlation is .52. This is the unique contribution of the added set of predictors in the prediction of taste rating.

4. Calculate the partial correlation of the added set of predictors. What does this number represent?

#semipartial correlation of added set of predictors
sqrt(chgR2)
## [1] 0.528334
#partial correlation of added set of predictors
sqrt(chgR2/(1-summary(treatmodel)$r.squared))
## [1] 0.6376259

Answer: The partial correlation of the added set of predictors is .64. This is the correlation between taste rating and the added set of predictors, holding them constant in both taste rating and in treatment.