Week 3 YAY!

Conceptual questions

1. What are some disadvantages of using experimental control (i.e. random assignment) that statistical control doesn’t have?

Answer: One disadvantage of experimental control is that it can create situations that are not realistic: because the variables and conditions are so tightly controlled, the study risks producing data skewed in the desired direction, or a positive result that cannot be recreated under real-life circumstances. Additionally, there are ethical and practical issues with experimental control, since some variables cannot be manipulated as needed, either because doing so would be unethical or because it is simply not possible. Lastly, effective research must be objective, but it is not always possible to manipulate variables objectively, which can reduce the accuracy or generalizability of the results.

2. What are some limitations of statistical control?

Answer: Limitations of statistical control include the risks of undercontrol and overcontrol. All correlated causes of Y must be included in the regression model; if they are not, the model is undercontrolled, and statistical control fails to hold all extraneous factors constant. Conversely, a correctly specified model must not include any effects of Y in the predictor set; if it does, the model is subject to "overcontrol."
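
As a rough illustration of undercontrol, here is a minimal simulation (made-up variables, not from the assignment data) showing that omitting a correlated cause of y biases the coefficient for x:

set.seed(1)
z = rnorm(200) #a confound that causes both x and y
x = .5*z + rnorm(200)
y = .3*x + .6*z + rnorm(200)
coef(lm(y ~ x)) #undercontrolled: the slope for x is inflated
coef(lm(y ~ x + z)) #z included: the slope for x is near its true value of .3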

3. What advantages are there to random assignment?

Answer: Random assignment is considered the best way to correctly specify a model. Random assignment allows you to manipulate the independent variable instead of merely measuring it, ensuring that no extraneous variable is systematically correlated with group membership. This strongly enables causal inference.

4. What are the advantages of using experimental and statistical control together?

Answer: The advantage of using experimental control and statistical control together is that extraneous factors can be controlled to limit the risk of confounds affecting the data, while also ensuring that no variable is correlated with group membership. Using random assignment along with statistical control reduces the risk of overcontrolling or undercontrolling a specified model. The two approaches therefore balance each other's limitations.

5. What is cross-validation and why is it useful?

Answer: Cross-validation is a technique used to assess the validity of a regression equation in a new sample, i.e., how well the equation holds up in unseen data. Cross-validation is useful because it allows the researcher to flag problems with the model such as overfitting and selection bias.
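
A minimal sketch of split-half cross-validation on made-up data (the data frame dat, outcome y, and predictors x1 and x2 are all hypothetical):

set.seed(1)
dat = data.frame(x1 = rnorm(100), x2 = rnorm(100))
dat$y = .4*dat$x1 + .2*dat$x2 + rnorm(100)
#fit the equation in a random training half
train = sample(nrow(dat), nrow(dat)/2)
fit = lm(y ~ x1 + x2, dat[train, ])
#apply the equation to the unseen holdout half
pred = predict(fit, dat[-train, ])
cor(pred, dat$y[-train])^2 #cross-validated R-squared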

6. Why is shrunken R useful?

Answer: Shrunken R is useful because it estimates what the multiple R would be under cross-validation, i.e., how much the observed R is expected to shrink in a new sample.
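
One common shrinkage formula is the adjusted R-squared (Wherry-style) correction; here is a sketch, plugging in the values from the treatment model later in this assignment purely as an illustration:

shrunkenR = function(R2, n, k) sqrt(1 - (1 - R2)*(n - 1)/(n - k - 1))
shrunkenR(R2 = .3134, n = 12, k = 1) #an observed R of ~.56 shrinks to about .49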

7. What are forward and backward stepwise regression?

Answer: Forward stepwise regression begins with no predictors and tests the addition of each variable one at a time using a chosen model-fit criterion, keeping additions that improve fit. Backward stepwise regression begins with all the predictors and tests the removal of variables one at a time using a model-fit criterion, dropping those whose removal does not worsen fit.
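
A rough sketch with base R's step(), which adds or drops terms using AIC as the fit criterion (reusing the made-up dat from the cross-validation sketch above):

null = lm(y ~ 1, dat) #start with no predictors
full = lm(y ~ x1 + x2, dat) #all candidate predictors
step(null, scope = formula(full), direction = "forward") #forward stepwise
step(full, direction = "backward") #backward stepwise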

8. Why is stepwise regression problematic?

Answer: Stepwise regression is problematic because letting a fit criterion pick the variables capitalizes on chance: true causal effects may fail to show up as significant, while extraneous variables may show up as significant.

9. What are the advantages and disadvantages of using regressed change scores and gain/difference scores?

Answer: An advantage of using regressed change scores is that you model change with respect to the covariate (the pretest), which allows you to assess the relationship between X and change while adjusting for where people started. A disadvantage is that change scores can be unreliable and are subject to regression toward the mean. An advantage of using gain/difference scores is that you measure the same variable in different conditions and form a simple composite of pretest and posttest. A disadvantage is that gain scores are unreliable and also subject to regression toward the mean: students who start off with low pretest scores are more likely to score higher, or make "gains," on the posttest, whereas students with high pretest scores are likely to move down. (A sketch contrasting the two approaches follows.)
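
A minimal sketch contrasting the two approaches on made-up pretest/posttest data:

set.seed(2)
pre = rnorm(100) #made-up pretest scores
x = rnorm(100) #made-up predictor of interest
post = pre + .3*x + rnorm(100) #made-up posttest scores
summary(lm(post - pre ~ x)) #gain/difference-score analysis
summary(lm(post ~ pre + x)) #regressed change: posttest adjusted for pretest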

Dichotomous predictor (applied questions)

Prompt: In this assignment we use taste ratings of French fries that may or may not have been treated. Our outcome is how potatoy the French fries taste. Six people tasted French fries with the treatment and six people tasted French fries without the treatment. They rated how potatoy, buttery, grassy, rancid, and painty the fries tasted. Let’s predict potato taste from treatment condition.

Before we run any analyses, let’s load our packages and data and make sure it’s in good shape

Here are the packages we will use this week

library(QuantPsyc)
## Warning: package 'QuantPsyc' was built under R version 4.0.3
## Loading required package: boot
## Warning: package 'boot' was built under R version 4.0.3
## Loading required package: MASS
## 
## Attaching package: 'QuantPsyc'
## The following object is masked from 'package:base':
## 
##     norm
library(psych)
## Warning: package 'psych' was built under R version 4.0.3
## 
## Attaching package: 'psych'
## The following object is masked from 'package:boot':
## 
##     logit
library(pwr)

Here we load the data

ff = read.csv("hw3data.csv")
#let's see what we got here
str(ff)
## 'data.frame':    696 obs. of  9 variables:
##  $ time     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ treatment: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ subject  : int  3 3 10 10 15 15 16 16 19 19 ...
##  $ rep      : int  1 2 1 2 1 2 1 2 1 2 ...
##  $ potatoy  : num  2.9 14 11 9.9 1.2 8.8 9 8.2 7 13 ...
##  $ buttery  : num  0 0 6.4 5.9 0.1 3 2.6 4.4 3.2 0 ...
##  $ grassy   : num  0 0 0 2.9 0 3.6 0.4 0.3 0 3.1 ...
##  $ rancid   : num  0 1.1 0 2.2 1.1 1.5 0.1 1.4 4.9 4.3 ...
##  $ painty   : num  5.5 0 0 0 5.1 2.3 0.2 4 3.2 10.3 ...
head(ff) #show only a few rows
##   time treatment subject rep potatoy buttery grassy rancid painty
## 1    1         1       3   1     2.9     0.0    0.0    0.0    5.5
## 2    1         1       3   2    14.0     0.0    0.0    1.1    0.0
## 3    1         1      10   1    11.0     6.4    0.0    0.0    0.0
## 4    1         1      10   2     9.9     5.9    2.9    2.2    0.0
## 5    1         1      15   1     1.2     0.1    0.0    1.1    5.1
## 6    1         1      15   2     8.8     3.0    3.6    1.5    2.3
#Next we have to do a few things to our data to make it work for the current research questions
#removing treatment 2 so will compare 1 to 3
#also removing all but first time point
#and removing pre-test
ff = ff[!ff$treatment == 2 & ff$time == 1 & ff$rep == 2,]
#"random" assignment - just assigning first 6 people to 1 condition and second 6 to people to other condition
#doing this by subestting since in reality each person got both treatments
length(unique(ff$subject))
## [1] 12
ff = ff[c(1:6, 19:24),]
#removing unnecessary columns
ff= ff[,c(2, 5:9)]
#dummy coding the dichotomous predictor
ff$treatment[ff$treatment == 1] = 0
ff$treatment[ff$treatment == 3] = 1
#describe dataset and correlations
describe(ff)
##           vars  n mean   sd median trimmed  mad min  max range skew kurtosis
## treatment    1 12 0.50 0.52   0.50    0.50 0.74   0  1.0   1.0 0.00    -2.16
## potatoy      2 12 9.75 2.55   9.35    9.80 1.63   5 14.0   9.0 0.08    -0.91
## buttery      3 12 2.69 2.62   2.75    2.53 4.00   0  7.0   7.0 0.18    -1.69
## grassy       4 12 1.51 1.59   1.15    1.44 1.70   0  3.7   3.7 0.18    -1.93
## rancid       5 12 2.16 2.08   1.45    1.99 1.56   0  6.0   6.0 0.72    -1.04
## painty       6 12 2.63 3.82   0.00    2.13 0.00   0 10.3  10.3 0.92    -0.86
##             se
## treatment 0.15
## potatoy   0.74
## buttery   0.76
## grassy    0.46
## rancid    0.60
## painty    1.10
print(cor(ff)) #wrapping in print will do just that, print!
##             treatment    potatoy     buttery      grassy     rancid      painty
## treatment  1.00000000 -0.5598476  0.02322993 -0.09322804 -0.2640749 -0.03644808
## potatoy   -0.55984762  1.0000000 -0.24299827 -0.40605343  0.2061370 -0.23152006
## buttery    0.02322993 -0.2429983  1.00000000  0.16992407 -0.0604959 -0.28119059
## grassy    -0.09322804 -0.4060534  0.16992407  1.00000000  0.2127916  0.58280403
## rancid    -0.26407493  0.2061370 -0.06049590  0.21279163  1.0000000  0.50401512
## painty    -0.03644808 -0.2315201 -0.28119059  0.58280403  0.5040151  1.00000000

1. Create a scatter plot of potato taste as a function of treatment. What does the scatterplot reveal?

#dichotomous predictor
plot(ff$treatment, ff$potatoy)

Answer: Because ratings overlap, the scatterplot does not show every individual point clearly. It does show that the untreated group (labeled 0 on the x-axis) tends to have higher potatoy ratings than the treated group (labeled 1).
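
One way (a sketch, not required by the assignment) to make the overlapping points easier to see is to jitter the dichotomous predictor or use side-by-side boxplots:

plot(jitter(ff$treatment, amount = .05), ff$potatoy) #jitter so overlapping points separate
boxplot(potatoy ~ treatment, ff) #distribution of potatoy by group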

2. Run a regression model predicting potatoy taste from treatment. What is the intercept and slope? What do these numbers represent?

#running model
treatmodel = lm(potatoy ~ treatment, ff) 
#seeing output
summary(treatmodel)
## 
## Call:
## lm(formula = potatoy ~ treatment, data = ff)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.3833 -1.4917  0.0167  1.8917  2.8833 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  11.1167     0.9046  12.289 2.33e-07 ***
## treatment    -2.7333     1.2793  -2.137   0.0584 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.216 on 10 degrees of freedom
## Multiple R-squared:  0.3134, Adjusted R-squared:  0.2448 
## F-statistic: 4.565 on 1 and 10 DF,  p-value: 0.05837

Answer: In this regression equation, the intercept is 11.12 and the slope is -2.73. Because treatment is dummy coded, the intercept represents the mean potatoy rating for the group coded 0 (untreated); it is where the regression line crosses the y-axis. The slope represents the direction and steepness of the line; here it is the difference in mean potatoy ratings between the two groups, so the treated group rated the fries 2.73 points less potatoy on average.
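
The coefficients can be checked against the group means directly:

tapply(ff$potatoy, ff$treatment, mean)
#group 0 mean = intercept; group 1 mean = intercept + slope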

3. Calculate a standardized regression coefficient and the correlation between potatoy taste and treatment. Are they the same or different? Why?

#betas and rs
lm.beta(treatmodel)
##  treatment 
## -0.5598476
#some more info
cor.test(ff$potatoy, ff$treatment)
## 
##  Pearson's product-moment correlation
## 
## data:  ff$potatoy and ff$treatment
## t = -2.1366, df = 10, p-value = 0.05837
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.85805754  0.02070715
## sample estimates:
##        cor 
## -0.5598476

Answer: The standardized regression coefficient is -0.5598 and the correlation between potatoy taste and treatment is -0.5598. These two values are the same because, with a single predictor, the standardized regression coefficient equals the Pearson correlation: both express the strength of the relationship in standard-deviation units.

5. If we convert the correlation to a t-value, that t-value will equal the t-value we saw where?

#can also get the r and store it in an object with:
treatr = cor(ff$treatment, ff$potatoy)
#can convert this r to a t and get the same t value in the regression model
sqrt(10*treatr^2/(1-treatr^2))
## [1] 2.13662

Answer: The converted t-value (|t| = 2.137) equals the t-value we saw in both the Pearson correlation test and the regression output for the treatment slope.
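
Going the other way, a t with df = n - 2 converts back to an r; a quick check with the t from above:

tval = 2.13662
sqrt(tval^2/(tval^2 + 10)) #recovers |r| = .56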

Power (applied)

1. Given our sample size and an alpha level of .05, what effect size do we have 80% power to detect? Do we have low or high power?

#what effect size do we have 80% power to detect?
pwr.r.test(n = 12, sig.level = .05, power = .8)
## 
##      approximate correlation power calculation (arctangh transformation) 
## 
##               n = 12
##               r = 0.7183955
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided

Answer: With n = 12 and an alpha of .05, we have 80% power to detect only a very large effect, r = 0.718 (well above the r = .5 benchmark for a large effect). So unless the true effect is very large, we have low power.

2. Given our sample size and an alpha level of .05, what power do we have for the effect size we found? This is called ____ power.

#what power do we have for the effect size we found?
pwr.r.test(n = 12, r = treatr, sig.level = .05)
## 
##      approximate correlation power calculation (arctangh transformation) 
## 
##               n = 12
##               r = 0.5598476
##       sig.level = 0.05
##           power = 0.501989
##     alternative = two.sided

Answer: We have approximately 50% power. This is called observed (post hoc) power.

3. Given an alpha level of .05, how many cases would we need in a follow-up study to have 80% power to detect the same effect size we obtained. This is called ___ power.

#how many people would we need to have 80% power for our effect size
pwr.r.test(r = treatr, sig.level = .05, power = .8)
## 
##      approximate correlation power calculation (arctangh transformation) 
## 
##               n = 21.88629
##               r = 0.5598476
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided

Answer: We would need approximately 22 people to have 80% power. This is called a priori (prospective) power.
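
To see how power grows with sample size for our observed effect size, here is a quick sketch using the treatr object defined earlier:

ns = seq(6, 60, by = 2) #candidate sample sizes
pows = sapply(ns, function(n) pwr.r.test(n = n, r = treatr, sig.level = .05)$power)
plot(ns, pows, type = "b", xlab = "n", ylab = "power") #power curve
abline(h = .8, lty = 2) #the 80% target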

Adding multiple predictors

1. Build a regression model in which potatoy taste is predicted by treatment and how buttery, grassy, rancid, and painty the French fries were. Rank these predictors in order of their importance (more accurately, effect consistency) based on the regression coefficients. Should you use bs or betas for this?

#adding a set of predictors
fullmodel = lm(potatoy ~ treatment + buttery + grassy + rancid + painty, ff)
summary(fullmodel)
## 
## Call:
## lm(formula = potatoy ~ treatment + buttery + grassy + rancid + 
##     painty, data = ff)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.60479 -1.11475  0.00082  1.21511  2.82154 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  12.2108     1.5100   8.087 0.000192 ***
## treatment    -2.5759     1.3476  -1.912 0.104488    
## buttery      -0.2268     0.2980  -0.761 0.475426    
## grassy       -0.5133     0.5869  -0.875 0.415396    
## rancid        0.3066     0.3980   0.770 0.470277    
## painty       -0.1708     0.2851  -0.599 0.570968    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.204 on 6 degrees of freedom
## Multiple R-squared:  0.5926, Adjusted R-squared:  0.253 
## F-statistic: 1.745 on 5 and 6 DF,  p-value: 0.2581
lm.beta(fullmodel)
##  treatment    buttery     grassy     rancid     painty 
## -0.5276057 -0.2333181 -0.3195422  0.2497109 -0.2559847

Answer: For this model we should use the betas (standardized coefficients) to compare the predictors, since the predictors are on different scales and the raw bs are not directly comparable. Ranked by absolute beta: treatment (-.53), grassy (-.32), painty (-.26), rancid (.25), buttery (-.23).
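
The ranking can be pulled straight from the betas:

sort(abs(lm.beta(fullmodel)), decreasing = TRUE) #predictors ordered by |beta|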

2. How much better is our model? What statistic do we use for this? Calculate a p-value for this statistic.

#change in R-squared
chgR2 = summary(fullmodel)$r.squared - summary(treatmodel)$r.squared 
chgR2
## [1] 0.2791368
#is this significant?
anova(treatmodel, fullmodel)
## Analysis of Variance Table
## 
## Model 1: potatoy ~ treatment
## Model 2: potatoy ~ treatment + buttery + grassy + rancid + painty
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     10 49.097                           
## 2      6 29.136  4    19.961 1.0277 0.4639
#to get exact p-value
#create object from anova test
a = anova(treatmodel, fullmodel)
#see what is in the object
str(a)
## Classes 'anova' and 'data.frame':    2 obs. of  6 variables:
##  $ Res.Df   : num  10 6
##  $ RSS      : num  49.1 29.1
##  $ Df       : num  NA 4
##  $ Sum of Sq: num  NA 20
##  $ F        : num  NA 1.03
##  $ Pr(>F)   : num  NA 0.464
##  - attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Model 1: potatoy ~ treatment\nModel 2: potatoy ~ treatment + buttery + grassy + rancid + painty"
#extract out the info
a$`Pr(>F)`[2]
## [1] 0.4638846
pwr.r.test(n = 697, r = treatr, sig.level = .05)
## 
##      approximate correlation power calculation (arctangh transformation) 
## 
##               n = 697
##               r = 0.5598476
##       sig.level = 0.05
##           power = 1
##     alternative = two.sided

Answer: The full model explains an additional 27.9% of the variance; the statistic we use for this is the change in R-squared (ΔR² = .279), tested with a nested-model F test. The change is not significant, F(4, 6) = 1.03, p = .464. (Note that the power = 1 in the pwr.r.test output above is a power value, not a p-value.)
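
The same p-value can also be computed by hand from the F statistic and its degrees of freedom:

pf(1.0277, df1 = 4, df2 = 6, lower.tail = FALSE) #matches the anova p-value of .464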

3. Calculate the semipartial correlation of the added set of predictors. How does this relate to the statistic in the previous question? What does the semipartial correlation represent?

#change in adjusted R-squared (note: this is not the semipartial correlation itself; that is computed in the next chunk)
chgadjR2 = summary(fullmodel)$adj.r.squared - summary(treatmodel)$adj.r.squared 
chgadjR2
## [1] 0.008265667

Answer: The squared semipartial correlation of the added set is exactly the ΔR² we tested in the previous question, so the semipartial correlation is √.279 = .528 (computed in the next chunk; the .0083 above is the change in adjusted R-squared, not the semipartial). The semipartial correlation represents the correlation between potatoy taste and the part of the added predictor set that is independent of treatment, i.e., the set's unique contribution.

4. Calculate the partial correlation of the added set of predictors. What does this number represent?

#semipartial correlation of added set of predictors
sqrt(chgR2)
## [1] 0.528334
#partial correlation of added set of predictors
sqrt(chgR2/(1-summary(treatmodel)$r.squared))
## [1] 0.6376259

Answer: This number (.638) is the partial correlation of the added set: the correlation between the added predictors (buttery, grassy, rancid, painty) and potatoy taste with treatment partialed out of both. Squared, it tells us what proportion of the variance in potatoy left unexplained by treatment the added set accounts for, i.e., ΔR²/(1 − R² of the treatment-only model).
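
As a cross-check (a Frisch-Waugh-style sketch, not part of the assignment), the same set partial correlation comes from partialing treatment out of both potatoy and the added predictors, then regressing residuals on residuals:

#residualize the outcome and each added predictor on treatment
pr = data.frame(
  eY = resid(lm(potatoy ~ treatment, ff)),
  eB = resid(lm(buttery ~ treatment, ff)),
  eG = resid(lm(grassy ~ treatment, ff)),
  eR = resid(lm(rancid ~ treatment, ff)),
  eP = resid(lm(painty ~ treatment, ff)))
sqrt(summary(lm(eY ~ eB + eG + eR + eP, pr))$r.squared) #matches the .638 above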