Week 3 YAY!
1. What are some disadvantages of using experimental control (i.e. random assignment) that statistical control doesn’t have?
Answer: One disadvantage arises when the sample size is small: with few cases, random assignment may not fully balance the groups, which increases the chance that some hidden variable is still related to group membership. Random assignment also does nothing to reduce error variance, so it can be hard to detect true differences between randomized groups in situations where statistically controlling a known nuisance variable would work better.
2. What are some limitations of statistical control?
Answer: Some limitations of statistical control include the risk of over- or under-fitting the data, as well as problems created by collinearity among the regressors. Suppression can also arise, which is why it's important to understand your regressors of interest before conducting the analysis.
3. What advantages are there to random assignment?
Answer: Random assignment has the main advantage of “washing out” any possible unknown variable that may be related to group membership. It allows you to account for hidden or unknown variables without knowing what they are.
4. What are the advantages of using experimental and statistical control together?
Answer: Using these methods together combines their strengths: random assignment protects against unknown confounds, while statistical control of known nuisance variables reduces error variance, so the model stays interpretable while bias is kept low.
5. What is cross-validation and why is it useful?
Answer: Cross-validation is the process of taking half of your data and calculating a regression equation with that data. That regression equation is then tested on the remaining half of the data, and a correlation is computed between the predicted Y and the true Y of the test data. The process is repeated in the other direction, and the two correlation coefficients are averaged. It is useful because this averaged cross-validity coefficient estimates how well the model will predict new data, rather than how well it happens to fit the sample it was built on.
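Here is a minimal sketch of that split-half (double) cross-validation idea in R, assuming a hypothetical data frame dat with an outcome y and a predictor x (these names are placeholders, not part of the assignment data):
#split the data in half, fit on one half, test on the other, then reverse the roles
set.seed(123) #reproducible random split
half = sample(nrow(dat), floor(nrow(dat)/2))
m1 = lm(y ~ x, dat[half, ])
m2 = lm(y ~ x, dat[-half, ])
r1 = cor(predict(m1, dat[-half, ]), dat$y[-half]) #predicted vs. observed Y in the held-out half
r2 = cor(predict(m2, dat[half, ]), dat$y[half])
mean(c(r1, r2)) #averaged cross-validity coefficient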
6. Why is shrunken R useful?
Answer: The sample R squared capitalizes on chance and overestimates how well the model will do in the population, especially with many regressors and a small sample. Shrunken (adjusted) R squared corrects for the number of regressors and the sample size, giving a more honest estimate of the variance in Y the model would actually explain in new data.
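For reference, here is the shrinkage formula written as a small helper function (illustrative only, not something the assignment asks for); plugging in the full model from later in this homework (R squared = .5926, n = 12, k = 5 predictors) reproduces the adjusted R squared of about .25 that lm() reports:
#shrunken (adjusted) R-squared = 1 - (1 - R2) * (n - 1) / (n - k - 1)
shrunkenR2 = function(R2, n, k) 1 - (1 - R2) * (n - 1) / (n - k - 1)
shrunkenR2(.5926, 12, 5)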
7. What are forward and backward stepwise regression?
Answer: Forward stepwise regression starts with no regressors and adds them one at a time, at each step entering the regressor that produces the largest increase in R squared, until all regressors are included or some pre-determined stopping rule is reached (for example, no remaining regressor would produce a significant increase in R squared). Backward stepwise regression works in the opposite direction: it starts with all regressors in the model and removes the least useful one at each step, stopping when dropping any further regressor would significantly reduce R squared. A sketch of both directions appears below.
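As an illustration only (given the caution in the next answer), base R's step() can run both directions; this sketch assumes a hypothetical data frame dat with outcome y and candidate predictors x1, x2, x3:
#forward: start from the intercept-only model and add regressors one at a time
empty = lm(y ~ 1, dat)
full = lm(y ~ x1 + x2 + x3, dat)
step(empty, scope = formula(full), direction = "forward")
#backward: start with every candidate regressor and drop the least useful one at a time
step(full, direction = "backward")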
8. Why is stepwise regression problematic?
Answer: This is problematic because it's essentially p-hacking: the procedure capitalizes on chance features of the particular sample. It's important to approach regression with a theory-driven idea of which regressors should be included.
9. What are the advantages and disadvantages of using regressed change scores and gain/difference scores?
Answer: Gain (difference) scores have the advantage of being simple and easy to interpret, since their sign shows directly whether each case went up or down. Their disadvantages are low reliability and the fact that apparent change may just reflect regression to the mean. Regressed change scores instead use the residual from predicting the post score from the pre score, so they show the actual distance of each data point from the regression line, that is, how much more or less a person changed than expected given where they started. Those residualized scores can then be used to examine how much of the change in Y is associated with each X in the model.
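A quick sketch of the difference, assuming hypothetical pre and post columns in a data frame dat (these variables are not in the assignment data):
gain = dat$post - dat$pre #simple gain/difference score
regmodel = lm(post ~ pre, dat) #predict post from pre
regchange = resid(regmodel) #regressed change score: distance of each post score from the value expected given the pre score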
Prompt: In this assignment we use taste ratings of French fries that may or may not have been treated. Our outcome is how potatoy the French fries taste. 6 people tasted French fries with the treatment and 6 people tasted French fries without the treatment. They rated how potatoy, buttery, grassy, rancid, and painty the fries tasted. Let's predict potato taste from treatment condition.
Before we run any analyses, let's load our packages and data and make sure everything is in good shape.
Here are the packages we will use this week
library(QuantPsyc)
## Loading required package: boot
## Loading required package: MASS
##
## Attaching package: 'QuantPsyc'
## The following object is masked from 'package:base':
##
## norm
library(psych)
##
## Attaching package: 'psych'
## The following object is masked from 'package:boot':
##
## logit
library(pwr)
Here we load the data
ff = read.csv("hw3data.csv")
#let's see what we got here
str(ff)
## 'data.frame': 696 obs. of 9 variables:
## $ time : int 1 1 1 1 1 1 1 1 1 1 ...
## $ treatment: int 1 1 1 1 1 1 1 1 1 1 ...
## $ subject : int 3 3 10 10 15 15 16 16 19 19 ...
## $ rep : int 1 2 1 2 1 2 1 2 1 2 ...
## $ potatoy : num 2.9 14 11 9.9 1.2 8.8 9 8.2 7 13 ...
## $ buttery : num 0 0 6.4 5.9 0.1 3 2.6 4.4 3.2 0 ...
## $ grassy : num 0 0 0 2.9 0 3.6 0.4 0.3 0 3.1 ...
## $ rancid : num 0 1.1 0 2.2 1.1 1.5 0.1 1.4 4.9 4.3 ...
## $ painty : num 5.5 0 0 0 5.1 2.3 0.2 4 3.2 10.3 ...
head(ff) #show only a few rows
## time treatment subject rep potatoy buttery grassy rancid painty
## 1 1 1 3 1 2.9 0.0 0.0 0.0 5.5
## 2 1 1 3 2 14.0 0.0 0.0 1.1 0.0
## 3 1 1 10 1 11.0 6.4 0.0 0.0 0.0
## 4 1 1 10 2 9.9 5.9 2.9 2.2 0.0
## 5 1 1 15 1 1.2 0.1 0.0 1.1 5.1
## 6 1 1 15 2 8.8 3.0 3.6 1.5 2.3
#Next we have to do a few things to our data to make it work for the current research questions
#removing treatment 2 so we will compare 1 to 3
#also removing all but first time point
#and removing pre-test
ff = ff[!ff$treatment == 2 & ff$time == 1 & ff$rep == 2,]
#"random" assignment - just assigning first 6 people to 1 condition and second 6 to people to other condition
#doing this by subestting since in reality each person got both treatments
length(unique(ff$subject))
## [1] 12
ff = ff[c(1:6, 19:24),]
#removing unnecessary columns
ff= ff[,c(2, 5:9)]
#dummy coding dichotomous predictor
ff$treatment[ff$treatment == 1] = 0
ff$treatment[ff$treatment == 3] = 1
#describe dataset and correlations
describe(ff)
## vars n mean sd median trimmed mad min max range skew kurtosis
## treatment 1 12 0.50 0.52 0.50 0.50 0.74 0 1.0 1.0 0.00 -2.16
## potatoy 2 12 9.75 2.55 9.35 9.80 1.63 5 14.0 9.0 0.08 -0.91
## buttery 3 12 2.69 2.62 2.75 2.53 4.00 0 7.0 7.0 0.18 -1.69
## grassy 4 12 1.51 1.59 1.15 1.44 1.70 0 3.7 3.7 0.18 -1.93
## rancid 5 12 2.16 2.08 1.45 1.99 1.56 0 6.0 6.0 0.72 -1.04
## painty 6 12 2.63 3.82 0.00 2.13 0.00 0 10.3 10.3 0.92 -0.86
## se
## treatment 0.15
## potatoy 0.74
## buttery 0.76
## grassy 0.46
## rancid 0.60
## painty 1.10
print(cor(ff)) #wrapping in print will do just that, print!
## treatment potatoy buttery grassy rancid painty
## treatment 1.00000000 -0.5598476 0.02322993 -0.09322804 -0.2640749 -0.03644808
## potatoy -0.55984762 1.0000000 -0.24299827 -0.40605343 0.2061370 -0.23152006
## buttery 0.02322993 -0.2429983 1.00000000 0.16992407 -0.0604959 -0.28119059
## grassy -0.09322804 -0.4060534 0.16992407 1.00000000 0.2127916 0.58280403
## rancid -0.26407493 0.2061370 -0.06049590 0.21279163 1.0000000 0.50401512
## painty -0.03644808 -0.2315201 -0.28119059 0.58280403 0.5040151 1.00000000
1. Create a scatter plot of potato taste as a function of treatment. What does the scatterplot reveal?
#dichotomous predictor
plot(ff$treatment, ff$potatoy)
Answer: This scatterplot reveals that the treated fries (coded 1) tend to receive lower potatoy ratings than the untreated fries (coded 0).
2. Run a regression model predicting potatoy taste from treatment. What is the intercept and slope? What do these numbers represent?
#running model
treatmodel = lm(potatoy ~ treatment, ff)
#seeing output
summary(treatmodel)
##
## Call:
## lm(formula = potatoy ~ treatment, data = ff)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.3833 -1.4917 0.0167 1.8917 2.8833
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.1167 0.9046 12.289 2.33e-07 ***
## treatment -2.7333 1.2793 -2.137 0.0584 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.216 on 10 degrees of freedom
## Multiple R-squared: 0.3134, Adjusted R-squared: 0.2448
## F-statistic: 4.565 on 1 and 10 DF, p-value: 0.05837
Answer: The intercept in this output is 11.12 and the slope is -2.73. Because treatment is dummy coded 0/1, the intercept is the predicted (mean) potatoy rating for the untreated fries, and the slope is the difference between the two groups: fries that received the treatment are rated 2.73 units lower on the potatoy scale, on average.
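We can double-check this interpretation with the group means (using the ff data already loaded); with a 0/1 dummy code, the intercept should equal the mean for the untreated group and intercept + slope should equal the mean for the treated group:
#mean potatoy rating within each treatment group
aggregate(potatoy ~ treatment, ff, mean)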
3. Calculate a standardized regression coefficient and the correlation between potatoy taste and treatment. Are they the same or different? Why?
#betas and rs
lm.beta(treatmodel)
## treatment
## -0.5598476
#some more info
cor.test(ff$potatoy, ff$treatment)
##
## Pearson's product-moment correlation
##
## data: ff$potatoy and ff$treatment
## t = -2.1366, df = 10, p-value = 0.05837
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.85805754 0.02070715
## sample estimates:
## cor
## -0.5598476
Answer: They are the same! With a single predictor, the standardized regression coefficient is just the slope rescaled into standard-deviation units (b multiplied by sd(X)/sd(Y)), which is mathematically identical to the Pearson correlation between the two variables.
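To see this with the objects we already have (a quick check, not required by the assignment):
#beta = b * sd(x) / sd(y); with a single predictor this equals the Pearson r of -0.56
coef(treatmodel)["treatment"] * sd(ff$treatment) / sd(ff$potatoy)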
5. If we convert the correlation to a t-value, that t-value will equal the t-value we saw where?
#can also get the r and store it in an object with:
treatr = cor(ff$treatment, ff$potatoy)
#can convert this r to a t and get the same t value in the regression model
sqrt(10*treatr^2/(1-treatr^2))
## [1] 2.13662
Answer: This matches (in absolute value) the t value for the treatment coefficient in our regression model a few questions back, which is also the t that cor.test() reported above, t = -2.137.
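The same conversion written as a general formula (a small helper added here just to make the formula explicit; it is not part of the assignment):
#t = r * sqrt(n - 2) / sqrt(1 - r^2); with n = 12 this reproduces |t| = 2.137
r_to_t = function(r, n) r * sqrt(n - 2) / sqrt(1 - r^2)
r_to_t(treatr, 12)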
1. Given our sample size and an alpha level of .05, what effect size do we have 80% power to detect? Do we have low or high power?
#what effect size do we have 80% power to detect?
pwr.r.test(n = 12, sig.level = .05, power = .8)
##
## approximate correlation power calculation (arctangh transformation)
##
## n = 12
## r = 0.7183955
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
Answer: With n = 12 and alpha = .05, the effect size we have 80% power to detect is about r = .72. That is a very large effect, so for anything smaller than that our power is low.
2. Given our sample size and an alpha level of .05, what power do we have for the effect size we found? This is called ____ power.
#what power do we have for the effect size we found?
pwr.r.test(n = 12, r = treatr, sig.level = .05)
##
## approximate correlation power calculation (arctangh transformation)
##
## n = 12
## r = 0.5598476
## sig.level = 0.05
## power = 0.501989
## alternative = two.sided
Answer: For the effect size we actually observed (|r| = .56, roughly .16 smaller than the r = .72 we could detect with 80% power), our power is only about 50%. This is called observed (or post hoc) power.
3. Given an alpha level of .05, how many cases would we need in a follow-up study to have 80% power to detect the same effect size we obtained. This is called ___ power.
#how many people would we need to have 80% power for our effect size
pwr.r.test(r = treatr, sig.level = .05, power = .8)
##
## approximate correlation power calculation (arctangh transformation)
##
## n = 21.88629
## r = 0.5598476
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
Answer: We would need at least 22 people (since you cannot have a fraction of a person, I rounded 21.9 up) in order to achieve 80% power for this effect size. This is called prospective (a priori) power.
1. Build a regression model in which potatoy taste is predicted by treatment and how buttery, grassy, rancid, and painty the French fries were. Rank these predictors in order of their importance (more accurately, effect consistency) based on the regression coefficients. Should you use bs or betas for this?
#adding a set of predictors
fullmodel = lm(potatoy ~ treatment + buttery + grassy + rancid + painty, ff)
summary(fullmodel)
##
## Call:
## lm(formula = potatoy ~ treatment + buttery + grassy + rancid +
## painty, data = ff)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.60479 -1.11475 0.00082 1.21511 2.82154
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.2108 1.5100 8.087 0.000192 ***
## treatment -2.5759 1.3476 -1.912 0.104488
## buttery -0.2268 0.2980 -0.761 0.475426
## grassy -0.5133 0.5869 -0.875 0.415396
## rancid 0.3066 0.3980 0.770 0.470277
## painty -0.1708 0.2851 -0.599 0.570968
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.204 on 6 degrees of freedom
## Multiple R-squared: 0.5926, Adjusted R-squared: 0.253
## F-statistic: 1.745 on 5 and 6 DF, p-value: 0.2581
lm.beta(fullmodel)
## treatment buttery grassy rancid painty
## -0.5276057 -0.2333181 -0.3195422 0.2497109 -0.2559847
Answer: Betas should be used here, because the predictors are measured on different scales, so their raw b values are not directly comparable. Ranking the regressors by the absolute size of their betas gives: treatment (-.53), grassy (-.32), painty (-.26), rancid (.25), buttery (-.23). Treatment is clearly the most consistent predictor; the other four are much smaller and fairly similar in magnitude, so they likely contribute very little to the model's predictability.
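To make the ranking easy to see, we can sort the standardized coefficients by their absolute size (just a convenience, using the lm.beta() output above):
#rank the predictors by the magnitude of their betas
sort(abs(lm.beta(fullmodel)), decreasing = TRUE)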
2. How much better is our model? What statistic do we use for this? Calculate a p-value for this statistic.
#change in R-squared
chgR2 = summary(fullmodel)$r.squared - summary(treatmodel)$r.squared
chgR2
## [1] 0.2791368
#is this significant?
anova(treatmodel, fullmodel)
## Analysis of Variance Table
##
## Model 1: potatoy ~ treatment
## Model 2: potatoy ~ treatment + buttery + grassy + rancid + painty
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 10 49.097
## 2 6 29.136 4 19.961 1.0277 0.4639
#to get exact p-value
#create object from anova test
a = anova(treatmodel, fullmodel)
#see what is in the object
str(a)
## Classes 'anova' and 'data.frame': 2 obs. of 6 variables:
## $ Res.Df : num 10 6
## $ RSS : num 49.1 29.1
## $ Df : num NA 4
## $ Sum of Sq: num NA 20
## $ F : num NA 1.03
## $ Pr(>F) : num NA 0.464
## - attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Model 1: potatoy ~ treatment\nModel 2: potatoy ~ treatment + buttery + grassy + rancid + painty"
#extract out the info
a$`Pr(>F)`[2]
## [1] 0.4638846
Answer: The statistic to use here is the change in R squared: adding the four taste predictors increases R squared by about .28 (from .31 to .59). The model-comparison F test gives F(4, 6) = 1.03, p = .46, which is not significant, so we can conclude that including these additional regressors does not reliably improve our model.
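For completeness, the F that anova() reports can also be computed by hand from the change in R squared (a sketch using the objects created above; 4 is the number of added predictors and 6 is the residual df of the full model):
#F = (change in R2 / df change) / ((1 - R2 full) / residual df full)
Fchg = (chgR2 / 4) / ((1 - summary(fullmodel)$r.squared) / 6)
Fchg #should match the F of about 1.03 from anova()
pf(Fchg, 4, 6, lower.tail = FALSE) #and the p-value of about .46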
3. Calculate the semipartial correlation of the added set of predictors. How does this relate to the statistic in the previous question? What does the semipartial correlation represent?
#change in adjusted Rsquared
chgadjR2 = summary(fullmodel)$adj.r.squared - summary(treatmodel)$adj.r.squared
chgadjR2
## [1] 0.008265667
Answer: The semipartial correlation of the added set is the square root of the change in R squared, sqrt(.279) = .53 (computed in the next chunk). It represents the correlation between potatoy taste and the part of the added predictors that is independent of treatment, so squaring it recovers the change in R squared from the previous question. The change in adjusted R squared computed above is essentially zero (.008), which is consistent with that answer: the additional regressors do not make the model meaningfully more predictive.
4. Calculate the partial correlation of the added set of predictors. What does this number represent?
#semipartial correlation of added set of predictors
sqrt(chgR2)
## [1] 0.528334
#partial correlation of added set of predictors
sqrt(chgR2/(1-summary(treatmodel)$r.squared))
## [1] 0.6376259
Answer: The partial correlation (about .64) is the correlation between potatoy taste and the added set of predictors after the variance due to treatment has been removed from both. Squared, it tells us what proportion of the Y variance left unexplained by treatment the added predictors account for. Although that sounds like a decent amount, the non-significant F test above means we still conclude that the regressors beyond treatment do not contribute to the model in a meaningful way.