Week 3 YAY!

Conceptual questions

1. What are some disadvantages of using experimental control (i.e. random assignment) that statistical control doesn't have?

Answer: Random assignment does not work very well with small samples, and suitable control groups can be difficult to come up with depending on the complexity of what we are studying.

2. What are some limitations of statistical control?

Answer: We can never control for absolutely all potential confounds.

3. What advantages are there to random assignment?

Answer: Random assignment distributes all variables (except the ones we want to study) randomly across groups, meaning there is presumably no difference between groups other than the variables we are interested in.

4. What are the advantages of using experimental and statistical control together?

Answer: We can experimentally control for as many variables as possible and then use statistical control to account for any remaining differences that random assignment does not resolve.
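
Not part of the question, but a minimal sketch of what combining the two looks like in R; outcome, condition, covariate, and d are hypothetical names, not from this assignment.

#sketch of combining experimental and statistical control (hypothetical variable names)
#condition is randomly assigned; covariate is measured and controlled for statistically
lm(outcome ~ condition + covariate, data = d)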

5. What is cross-validation and why is it useful?

Answer: Cross-validation is a way of testing our regression equation by applying it to different samples (or subsamples). It is useful because it lets us check that our regression equation has some generalizability.
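
As an illustration (not required for the homework), here is a minimal sketch of a simple holdout cross-validation; d, y, and x are hypothetical names, not from this assignment.

#minimal holdout cross-validation sketch (d, y, and x are hypothetical)
set.seed(123)                                         #make the random split reproducible
train_rows = sample(nrow(d), size = floor(nrow(d)/2)) #pick half the rows for training
fit = lm(y ~ x, data = d[train_rows, ])               #estimate the regression on the training half
preds = predict(fit, newdata = d[-train_rows, ])      #apply those weights to the holdout half
cor(preds, d$y[-train_rows])^2                        #cross-validated R-squared: how well the equation generalizes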

6. Why is shrunken R useful?

Answer: Shrunken R lets us estimate how well our sample regression weights would predict in the population, correcting for the inflation in the sample estimate.
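
For illustration, here is one common shrinkage formula by hand (the adjustment R itself reports as adjusted R-squared; the course may use a slightly different cross-validity formula). The R2, n, and k values below are made up.

#shrunken (adjusted) R-squared by hand; R2, n, and k are made-up illustration values
R2 = .31   #sample R-squared
n  = 12    #sample size
k  = 1     #number of predictors
1 - (1 - R2) * (n - 1) / (n - k - 1)   #shrunken estimate of the population R-squared (about .24)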

7. What are forward and backward stepwise regression?

Answer: Stepwise regression is a way of estimating the relative importance of different predictors in the model. Forward stepwise regression starts with no predictors and then adds them one at a time, starting with the predictor that has the largest correlation with Y. Backward stepwise regression starts with all predictors in the model and then eliminates them, starting with the least significant one.
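
A quick illustrative sketch with base R's step() (note it selects by AIC rather than by correlations or p-values, but the forward/backward logic is the same); full_model and null_model are hypothetical fitted lm objects.

#illustrative stepwise selection with base R's step(); full_model and null_model are hypothetical lm fits
backward_fit = step(full_model, direction = "backward")                              #start full, drop predictors
forward_fit  = step(null_model, scope = formula(full_model), direction = "forward")  #start empty, add predictors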

8. Why is stepwise regression problematic?

Answer: Stepwise regression is problematic because it still potentially allows many covariates to be kept in the model, inflating R.

9. What are the advantages and disadvantages of using regressed change scores and gain/difference scores?

Answer: Regressed change scores estimate each person's change as their distance from the regression line predicting the later score from the earlier one, rather than as a raw difference. However, neither regressed change scores nor simple gain/difference scores can be made to work as completely unbiased estimates of the true change we are trying to estimate.
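
For illustration, a minimal sketch of both kinds of scores, assuming a hypothetical data frame d with pre and post columns (not from this assignment).

#regressed change scores vs. gain scores (d, pre, and post are hypothetical)
change_fit = lm(post ~ pre, data = d)    #predict the later score from the earlier one
d$regressed_change = resid(change_fit)   #each person's distance from the regression line
d$gain = d$post - d$pre                  #simple gain/difference score for comparison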

Dichotomous predictor (applied questions)

Prompt: In this assignment we use taste ratings of French fries that may or may not have been treated. Our outcome is how potatoy the French fries taste. 6 people tasted French fries with the treatment and 6 people tasted French fries without the treatment. They rated how potatoy, buttery, grassy, rancid, and painty the fries tasted. Let's predict potato taste from treatment condition.

Before we run any analyses, let's load our packages and data and make sure everything is in good shape.

Here are the packages we will use this week

library(QuantPsyc)
## Warning: package 'QuantPsyc' was built under R version 4.0.3
## Loading required package: boot
## Loading required package: MASS
## 
## Attaching package: 'QuantPsyc'
## The following object is masked from 'package:base':
## 
##     norm
library(psych)
## 
## Attaching package: 'psych'
## The following object is masked from 'package:boot':
## 
##     logit
library(pwr)
## Warning: package 'pwr' was built under R version 4.0.3

Here we load the data

ff = read.csv("hw3data.csv")
#let's see what we got here
str(ff)
## 'data.frame':    696 obs. of  9 variables:
##  $ time     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ treatment: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ subject  : int  3 3 10 10 15 15 16 16 19 19 ...
##  $ rep      : int  1 2 1 2 1 2 1 2 1 2 ...
##  $ potatoy  : num  2.9 14 11 9.9 1.2 8.8 9 8.2 7 13 ...
##  $ buttery  : num  0 0 6.4 5.9 0.1 3 2.6 4.4 3.2 0 ...
##  $ grassy   : num  0 0 0 2.9 0 3.6 0.4 0.3 0 3.1 ...
##  $ rancid   : num  0 1.1 0 2.2 1.1 1.5 0.1 1.4 4.9 4.3 ...
##  $ painty   : num  5.5 0 0 0 5.1 2.3 0.2 4 3.2 10.3 ...
head(ff) #show only a few rows
##   time treatment subject rep potatoy buttery grassy rancid painty
## 1    1         1       3   1     2.9     0.0    0.0    0.0    5.5
## 2    1         1       3   2    14.0     0.0    0.0    1.1    0.0
## 3    1         1      10   1    11.0     6.4    0.0    0.0    0.0
## 4    1         1      10   2     9.9     5.9    2.9    2.2    0.0
## 5    1         1      15   1     1.2     0.1    0.0    1.1    5.1
## 6    1         1      15   2     8.8     3.0    3.6    1.5    2.3
#Next we have to do a few things to our data to make it work for the current research questions
#removing treatment 2 so will compare 1 to 3
#also removing all but first time point
#and removing pre-test
ff = ff[!ff$treatment == 2 & ff$time == 1 & ff$rep == 2,]
#"random" assignment - just assigning the first 6 people to one condition and the second 6 people to the other condition
#doing this by subsetting since in reality each person got both treatments
length(unique(ff$subject))
## [1] 12
ff = ff[c(1:6, 19:24),]
#removing unnecessary columns
ff= ff[,c(2, 5:9)]
#dummy coding the dichotomous predictor
ff$treatment[ff$treatment == 1] = 0
ff$treatment[ff$treatment == 3] = 1
#describe dataset and correlations
describe(ff)
##           vars  n mean   sd median trimmed  mad min  max range skew kurtosis
## treatment    1 12 0.50 0.52   0.50    0.50 0.74   0  1.0   1.0 0.00    -2.16
## potatoy      2 12 9.75 2.55   9.35    9.80 1.63   5 14.0   9.0 0.08    -0.91
## buttery      3 12 2.69 2.62   2.75    2.53 4.00   0  7.0   7.0 0.18    -1.69
## grassy       4 12 1.51 1.59   1.15    1.44 1.70   0  3.7   3.7 0.18    -1.93
## rancid       5 12 2.16 2.08   1.45    1.99 1.56   0  6.0   6.0 0.72    -1.04
## painty       6 12 2.63 3.82   0.00    2.13 0.00   0 10.3  10.3 0.92    -0.86
##             se
## treatment 0.15
## potatoy   0.74
## buttery   0.76
## grassy    0.46
## rancid    0.60
## painty    1.10
print(cor(ff)) #wrapping in print will do just that, print!
##             treatment    potatoy     buttery      grassy     rancid      painty
## treatment  1.00000000 -0.5598476  0.02322993 -0.09322804 -0.2640749 -0.03644808
## potatoy   -0.55984762  1.0000000 -0.24299827 -0.40605343  0.2061370 -0.23152006
## buttery    0.02322993 -0.2429983  1.00000000  0.16992407 -0.0604959 -0.28119059
## grassy    -0.09322804 -0.4060534  0.16992407  1.00000000  0.2127916  0.58280403
## rancid    -0.26407493  0.2061370 -0.06049590  0.21279163  1.0000000  0.50401512
## painty    -0.03644808 -0.2315201 -0.28119059  0.58280403  0.5040151  1.00000000

1. Create a scatter plot of potato taste as a function of treatment. What does the scatterplot reveal?

#dichotomous predictor
plot(ff$treatment, ff$potatoy)

Answer: The scatterplot shows that the predictor is binary: the points stack up in two columns, at treatment = 0 and treatment = 1. The untreated column appears to sit somewhat higher on potatoy taste than the treated column.

2. Run a regression model predicting potatoy taste from treatment. What is the intercept and slope? What do these numbers represent?

#running model
treatmodel = lm(potatoy ~ treatment, ff) 
#seeing output
summary(treatmodel)
## 
## Call:
## lm(formula = potatoy ~ treatment, data = ff)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.3833 -1.4917  0.0167  1.8917  2.8833 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  11.1167     0.9046  12.289 2.33e-07 ***
## treatment    -2.7333     1.2793  -2.137   0.0584 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.216 on 10 degrees of freedom
## Multiple R-squared:  0.3134, Adjusted R-squared:  0.2448 
## F-statistic: 4.565 on 1 and 10 DF,  p-value: 0.05837

Answer: The intercept (the predicted potatoy rating when treatment is 0, i.e., the untreated group's mean) is 11.1167, while the slope (the change in potatoy rating for a 1-unit change in treatment, i.e., the difference between the treated and untreated group means) is -2.7333.
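
As a quick check (not asked for), with a 0/1 predictor the intercept is just the mean of the untreated group and intercept + slope is the mean of the treated group.

#group means should come out to about 11.12 (treatment = 0) and 8.38 (treatment = 1)
tapply(ff$potatoy, ff$treatment, mean)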

3. Calculate a standardized regression coefficient and the correlation between potatoy taste and treatment. Are they the same or different? Why?

#betas and rs
lm.beta(treatmodel)
##  treatment 
## -0.5598476
#some more info
cor.test(ff$potatoy, ff$treatment)
## 
##  Pearson's product-moment correlation
## 
## data:  ff$potatoy and ff$treatment
## t = -2.1366, df = 10, p-value = 0.05837
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.85805754  0.02070715
## sample estimates:
##        cor 
## -0.5598476

Answer: They are the same. With a single predictor, the standardized regression coefficient equals the correlation, because standardizing puts both variables on the same (standard deviation) scale.
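
An optional way to see this: standardize both variables and refit, and the slope comes out equal to the correlation (about -.56).

#optional check: with both variables standardized, the slope equals r
lm(scale(potatoy) ~ scale(treatment), data = ff)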

5. If we convert the correlation to a t-value, that t-value will equal the t-value we saw where?

#can also get the r and store it in an object with:
treatr = cor(ff$treatment, ff$potatoy)
#can convert this r to a t and get the same t value in the regression model
sqrt(10*treatr^2/(1-treatr^2))
## [1] 2.13662

Answer: The t-value equals the t-value for the treatment coefficient in the regression model (and in cor.test), except that it comes out positive because squaring r in the conversion drops the sign.
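
If we skip the squaring and use the signed version of the conversion, we recover the regression t exactly, sign and all.

#signed conversion: t = r * sqrt(df) / sqrt(1 - r^2)
treatr * sqrt(10) / sqrt(1 - treatr^2)   #should give -2.1366, matching the regression output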

Power (applied)

1. Given our sample size and an alpha level of .05, what effect size do we have 80% power to detect? Do we have low or high power?

#what effect size do we have 80% power to detect?
pwr.r.test(n = 12, sig.level = .05, power = .8)
## 
##      approximate correlation power calculation (arctangh transformation) 
## 
##               n = 12
##               r = 0.7183955
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided

Answer: With n = 12 we have 80% power to detect an effect of r = .72 or larger. That is a very large effect, so our power for more typical effect sizes is low.

2. Given our sample size and an alpha level of .05, what power do we have for the effect size we found? This is called ____ power.

#what power do we have for the effect size we found?
pwr.r.test(n = 12, r = treatr, sig.level = .05)
## 
##      approximate correlation power calculation (arctangh transformation) 
## 
##               n = 12
##               r = 0.5598476
##       sig.level = 0.05
##           power = 0.501989
##     alternative = two.sided

Answer: We have power of about .50 for the effect size we actually observed. This is called post hoc (observed) power.

3. Given an alpha level of .05, how many cases would we need in a follow-up study to have 80% power to detect the same effect size we obtained. This is called ___ power.

#how many people would we need to have 80% power for our effect size
pwr.r.test(r = treatr, sig.level = .05, power = .8)
## 
##      approximate correlation power calculation (arctangh transformation) 
## 
##               n = 21.88629
##               r = 0.5598476
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided

Answer: We would need a sample of about 22 people (n = 21.89, rounded up). This is called a priori power.

Adding multiple predictors

1. Build a regression model in which potatoy taste is predicted by treatment and how buttery, grassy, rancid, and painty the French fries were. Rank these predictors in order of their importance (more accurately, effect consistency) based on the regression coefficients. Should you use bs or betas for this?

#adding a set of predictors
fullmodel = lm(potatoy ~ treatment + buttery + grassy + rancid + painty, ff)
summary(fullmodel)
## 
## Call:
## lm(formula = potatoy ~ treatment + buttery + grassy + rancid + 
##     painty, data = ff)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.60479 -1.11475  0.00082  1.21511  2.82154 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  12.2108     1.5100   8.087 0.000192 ***
## treatment    -2.5759     1.3476  -1.912 0.104488    
## buttery      -0.2268     0.2980  -0.761 0.475426    
## grassy       -0.5133     0.5869  -0.875 0.415396    
## rancid        0.3066     0.3980   0.770 0.470277    
## painty       -0.1708     0.2851  -0.599 0.570968    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.204 on 6 degrees of freedom
## Multiple R-squared:  0.5926, Adjusted R-squared:  0.253 
## F-statistic: 1.745 on 5 and 6 DF,  p-value: 0.2581
lm.beta(fullmodel)
##  treatment    buttery     grassy     rancid     painty 
## -0.5276057 -0.2333181 -0.3195422  0.2497109 -0.2559847

Answer: We should use betas, because the rating scales are arbitrary and differ across predictors, so the unstandardized bs are not comparable. Ranked by absolute beta, the order is treatment (-.53), grassy (-.32), painty (-.26), rancid (.25), and buttery (-.23).
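
One way to get that ranking directly is to sort the betas we already computed by absolute size (the sign only tells us direction, not importance).

#rank predictors by the absolute size of their betas
sort(abs(lm.beta(fullmodel)), decreasing = TRUE)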

2. How much better is our model? What statistic do we use for this? Calculate a p-value for this statistic.

#change in R-squared
chgR2 = summary(fullmodel)$r.squared - summary(treatmodel)$r.squared 
chgR2
## [1] 0.2791368
#is this significant?
anova(treatmodel, fullmodel)
## Analysis of Variance Table
## 
## Model 1: potatoy ~ treatment
## Model 2: potatoy ~ treatment + buttery + grassy + rancid + painty
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     10 49.097                           
## 2      6 29.136  4    19.961 1.0277 0.4639
#to get exact p-value
#create object from anova test
a = anova(treatmodel, fullmodel)
#see what is in the object
str(a)
## Classes 'anova' and 'data.frame':    2 obs. of  6 variables:
##  $ Res.Df   : num  10 6
##  $ RSS      : num  49.1 29.1
##  $ Df       : num  NA 4
##  $ Sum of Sq: num  NA 20
##  $ F        : num  NA 1.03
##  $ Pr(>F)   : num  NA 0.464
##  - attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Model 1: potatoy ~ treatment\nModel 2: potatoy ~ treatment + buttery + grassy + rancid + painty"
#extract out the info
a$`Pr(>F)`[2]
## [1] 0.4638846

Answer: Adding the set of taste predictors increases R-squared by .28, but this change is not statistically significant, F(4, 6) = 1.03, p = .46.
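
The same F can be computed by hand from the change in R-squared, which is a nice check that the anova() comparison really is testing the R-squared change.

#F for the R-squared change: (change in R2 / added df) / ((1 - full-model R2) / full-model residual df)
(chgR2 / 4) / ((1 - summary(fullmodel)$r.squared) / 6)   #should match the 1.03 from anova()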

3. Calculate the semipartial correlation of the added set of predictors. How does this relate to the statistic in the previous question? What does the semipartial correlation represent?

#change in adjusted Rsquared
chgadjR2 = summary(fullmodel)$adj.r.squared - summary(treatmodel)$adj.r.squared 
chgadjR2
## [1] 0.008265667

Answer: The change in adjusted R-squared is tiny (.008) because adjusted R-squared penalizes the model for each added predictor. The semipartial correlation of the added set is the square root of the change in (unadjusted) R-squared, computed in the next chunk (.53); squaring it gives back the R-squared change tested in the previous question, and it represents the correlation between potatoy taste and the part of the added predictors that is independent of treatment.

4. Calculate the partial correlation of the added set of predictors. What does this number represent?

#semipartial correlation of added set of predictors
sqrt(chgR2)
## [1] 0.528334
#partial correlation of added set of predictors
sqrt(chgR2/(1-summary(treatmodel)$r.squared))
## [1] 0.6376259

Answer: The partial correlation (.64) is the correlation between the added predictors and the outcome with the other predictor (treatment) partialled out of both the added set and the outcome.