Week 3 YAY!

Conceptual questions

1. What are some disadvantages of using experimental control (i.e. random assignment) that statistical control doesn’t have?

Answer: One disadvantage is that other third variables may still be influencing the relationship; random assignment only balances them in expectation, and experimental control can't always account for that in a given study. Another disadvantage is that experimental control needs a sufficiently large N for randomization to work and to avoid chance or null results.

2. What are some limitations of statistical control?

Answer: You have to worry about undercontrol (failing to account for all causes of Y that are correlated with the predictor of interest) and overcontrol (mistakenly controlling for variables that are actually effects of Y rather than causes).

3. What advantages are there to random assignment?

Answer: When using random assignment, no variables should be correlated with group membership. Essentially, it prevents results that might be due to participants' pre-existing, naturally occurring groups. Additionally, it is a necessary requirement for inferring causality.

4. What are the advantages of using experimental and statistical control together?

Answer: We capitalize on the advantages of both: experimental control lets us make causal claims, while statistically controlling for covariates reduces error variance and helps us build a properly specified model.

5. What is cross-validation and why is it useful?

Answer: Cross-validation is a way to check the predictive power of a model on a new sample. One sample is used to compute the regression equation, that equation is then applied to a second sample, and the predicted values are compared (correlated) with the observed values.
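A minimal sketch of the procedure (the data frame dat and the variables y and x are hypothetical, not part of this assignment):

#randomly split the sample: one half builds the model, the other half tests it
set.seed(1)
half = sample(nrow(dat), nrow(dat)/2)
build = dat[half,]
test = dat[-half,]
#compute the regression equation in the build sample
m = lm(y ~ x, build)
#apply that equation to the test sample and compare predicted to observed values
cor(predict(m, newdata = test), test$y) #the cross-validated R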

6. Why is shrunken R2 useful?

Answer: Shrunken R2 is useful because it corrects for the upward bias of the regular sample R2, giving a better estimate of how well the model would perform in the population or in a new sample.
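One common shrinkage formula (Wherry's, which is what summary.lm() reports as the adjusted R-squared) estimates the population R2 from the sample R2, the sample size n, and the number of predictors k; a quick sketch:

#shrunken R2 = 1 - (1 - R2)(n - 1)/(n - k - 1)
shrunkenR2 = function(R2, n, k) 1 - (1 - R2) * (n - 1) / (n - k - 1)
shrunkenR2(R2 = .3134, n = 12, k = 1) #matches the adjusted R2 of the treatment model below (up to rounding)
## [1] 0.24474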

7. What are forward and backward stepwise regression?

Answer: Forward stepwise regression adds regressors to the model one at a time, at each step entering the one that produces the largest increase in R2. Backward stepwise regression does the opposite: it starts with all regressors and, at each step, removes the one contributing the smallest increment to R2.
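Base R's step() function runs both procedures, though it adds and drops regressors by AIC rather than by R2 increments; a hedged sketch with hypothetical model terms:

#backward: start with everything, drop the weakest regressor at each step
backward = step(lm(y ~ x1 + x2 + x3, dat), direction = "backward")
#forward: start from an empty model, add the strongest regressor at each step
forward = step(lm(y ~ 1, dat), scope = ~ x1 + x2 + x3, direction = "forward")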

8. Why is stepwise regression problematic?

Answer: It will only enter significant regressors into the model, which sounds like a good thing, but real explanatory variables get excluded whenever they happen to be non-significant in a particular sample. The procedure also capitalizes on chance, so its results often fail to replicate.

9. What are the advantages and disadvantages of using regressed change scores and gain/difference scores?

Answer: One disadvantage is that results from gain/difference scores might be due to regression to the mean rather than an actual effect. A second disadvantage is that gain scores tend to have low reliability, especially when the pre- and post-test scores are highly correlated. An advantage is that they are intuitive and useful for seeing trends in the data; regressed change scores additionally guard against regression to the mean.
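To make the two approaches concrete (the data frame dat and the variables pre, post, and group are hypothetical):

#gain/difference score approach: does group predict the raw change?
dat$gain = dat$post - dat$pre
summary(lm(gain ~ group, dat))
#regressed change approach: predict post from group while controlling for pre;
#this guards against the regression to the mean that raw gains are prone to
summary(lm(post ~ pre + group, dat))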

Dichotomous predictor (applied questions)

Prompt: In this assignment we use taste ratings of French fries that may or may not have been treated. Our outcome is how potatoy the French fries taste. 6 people tasted French fries with the treatment and 6 people tasted French fries without the treatment. They rated how potatoy, buttery, grassy, rancid, and painty the fries tasted. Let’s predict potato taste from treatment condition.

Before we run any analyses, let's load our packages and data and make sure it's in good shape

Here are the packages we will use this week

library(QuantPsyc)
## Loading required package: boot
## Loading required package: MASS
## 
## Attaching package: 'QuantPsyc'
## The following object is masked from 'package:base':
## 
##     norm
library(psych)
## 
## Attaching package: 'psych'
## The following object is masked from 'package:boot':
## 
##     logit
library(pwr)

Here we load the data

ff = read.csv("hw3data.csv")
#let's see what we got here
str(ff)
## 'data.frame':    696 obs. of  9 variables:
##  $ time     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ treatment: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ subject  : int  3 3 10 10 15 15 16 16 19 19 ...
##  $ rep      : int  1 2 1 2 1 2 1 2 1 2 ...
##  $ potatoy  : num  2.9 14 11 9.9 1.2 8.8 9 8.2 7 13 ...
##  $ buttery  : num  0 0 6.4 5.9 0.1 3 2.6 4.4 3.2 0 ...
##  $ grassy   : num  0 0 0 2.9 0 3.6 0.4 0.3 0 3.1 ...
##  $ rancid   : num  0 1.1 0 2.2 1.1 1.5 0.1 1.4 4.9 4.3 ...
##  $ painty   : num  5.5 0 0 0 5.1 2.3 0.2 4 3.2 10.3 ...
head(ff) #show only a few rows
##   time treatment subject rep potatoy buttery grassy rancid painty
## 1    1         1       3   1     2.9     0.0    0.0    0.0    5.5
## 2    1         1       3   2    14.0     0.0    0.0    1.1    0.0
## 3    1         1      10   1    11.0     6.4    0.0    0.0    0.0
## 4    1         1      10   2     9.9     5.9    2.9    2.2    0.0
## 5    1         1      15   1     1.2     0.1    0.0    1.1    5.1
## 6    1         1      15   2     8.8     3.0    3.6    1.5    2.3
#Next we have to do a few things to our data to make it work for the current research questions
#removing treatment 2, so we will compare treatment 1 to treatment 3
#also removing all but the first time point
#and removing the pre-test (rep 1), keeping only rep 2
ff = ff[!ff$treatment == 2 & ff$time == 1 & ff$rep == 2,]
#"random" assignment - just assigning the first 6 people to one condition and the second 6 people to the other
#doing this by subsetting since in reality each person got both treatments
length(unique(ff$subject))
## [1] 12
ff = ff[c(1:6, 19:24),]
#removing unnecessary columns
ff = ff[,c(2, 5:9)]
#dummy coding the dichotomous predictor
ff$treatment[ff$treatment == 1] = 0
ff$treatment[ff$treatment == 3] = 1
#describe dataset and correlations
describe(ff)
##           vars  n mean   sd median trimmed  mad min  max range skew kurtosis
## treatment    1 12 0.50 0.52   0.50    0.50 0.74   0  1.0   1.0 0.00    -2.16
## potatoy      2 12 9.75 2.55   9.35    9.80 1.63   5 14.0   9.0 0.08    -0.91
## buttery      3 12 2.69 2.62   2.75    2.53 4.00   0  7.0   7.0 0.18    -1.69
## grassy       4 12 1.51 1.59   1.15    1.44 1.70   0  3.7   3.7 0.18    -1.93
## rancid       5 12 2.16 2.08   1.45    1.99 1.56   0  6.0   6.0 0.72    -1.04
## painty       6 12 2.63 3.82   0.00    2.13 0.00   0 10.3  10.3 0.92    -0.86
##             se
## treatment 0.15
## potatoy   0.74
## buttery   0.76
## grassy    0.46
## rancid    0.60
## painty    1.10
print(cor(ff)) #wrapping in print will do just that, print!
##             treatment    potatoy     buttery      grassy     rancid      painty
## treatment  1.00000000 -0.5598476  0.02322993 -0.09322804 -0.2640749 -0.03644808
## potatoy   -0.55984762  1.0000000 -0.24299827 -0.40605343  0.2061370 -0.23152006
## buttery    0.02322993 -0.2429983  1.00000000  0.16992407 -0.0604959 -0.28119059
## grassy    -0.09322804 -0.4060534  0.16992407  1.00000000  0.2127916  0.58280403
## rancid    -0.26407493  0.2061370 -0.06049590  0.21279163  1.0000000  0.50401512
## painty    -0.03644808 -0.2315201 -0.28119059  0.58280403  0.5040151  1.00000000

1. Create a scatter plot of potato taste as a function of treatment. What does the scatterplot reveal?

#dichotomous predictor
plot(ff$treatment, ff$potatoy)

Answer: The scatterplot of potatoy ratings by treatment shows two columns of points, since the predictor only takes the values 0 and 1; the treated fries (1) tend to have lower potatoy ratings than the untreated fries (0).
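Because every x-value is 0 or 1, points can overplot; an optional variant with a little horizontal jitter makes the two columns easier to read:

#same plot, but jitter the predictor slightly so overlapping points separate
plot(jitter(ff$treatment, amount = .05), ff$potatoy, xlab = "treatment", ylab = "potatoy")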

2. Run a regression model predicting potatoy taste from treatment. What is the intercept and slope? What do these numbers represent?

#running model
treatmodel = lm(potatoy ~ treatment, ff) 
#seeing output
summary(treatmodel)
## 
## Call:
## lm(formula = potatoy ~ treatment, data = ff)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.3833 -1.4917  0.0167  1.8917  2.8833 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  11.1167     0.9046  12.289 2.33e-07 ***
## treatment    -2.7333     1.2793  -2.137   0.0584 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.216 on 10 degrees of freedom
## Multiple R-squared:  0.3134, Adjusted R-squared:  0.2448 
## F-statistic: 4.565 on 1 and 10 DF,  p-value: 0.05837

Answer: The intercept is 11.12, which is the predicted potatoy rating when treatment = 0, i.e., the mean rating of the untreated group. The slope is -2.73, the difference between the group means: going from the untreated (0) to the treated (1) condition lowers the predicted potatoy rating by 2.73 units.
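As a check, the group means line up with the coefficients (mean of the 0 group = intercept; mean of the 1 group = intercept + slope):

#group means of potatoy by treatment condition
tapply(ff$potatoy, ff$treatment, mean)
##         0         1 
## 11.116667  8.383333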

3. Calculate a standardized regression coefficient and the correlation between potatoy taste and treatment. Are they the same or different? Why?

#betas and rs
lm.beta(treatmodel)
##  treatment 
## -0.5598476
#some more info
cor.test(ff$potatoy, ff$treatment)
## 
##  Pearson's product-moment correlation
## 
## data:  ff$potatoy and ff$treatment
## t = -2.1366, df = 10, p-value = 0.05837
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.85805754  0.02070715
## sample estimates:
##        cor 
## -0.5598476

Answer: They are the same (-0.56). With a single IV and DV, the standardized regression coefficient is simply the correlation, because there are no other regressors to control for.
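We can see this directly by standardizing both variables before fitting (a quick check, not part of the assignment): the slope of the fully standardized regression is just r.

#slope from z-scored outcome and predictor equals the correlation
coef(lm(scale(potatoy) ~ scale(treatment), ff))[2]
## scale(treatment) 
##       -0.5598476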

4. If we convert the correlation to a t-value, that t-value will equal a t-value we have already seen. Where?

#can also get the r and store it in an object with:
treatr = cor(ff$treatment, ff$potatoy)
#can convert this r to a t and get the same t value in the regression model
sqrt(10*treatr^2/(1-treatr^2))
## [1] 2.13662

Answer: Almost: it matches the t for the treatment slope in the regression output (t = -2.137) in absolute value, but squaring r discards the sign, so our converted value comes out positive instead of negative.
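A version of the conversion that keeps the sign (multiply by r itself instead of squaring it):

#t = r * sqrt(df / (1 - r^2)) preserves the sign of the correlation
treatr * sqrt(10 / (1 - treatr^2))
## [1] -2.13662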

Power (applied)

1. Given our sample size and an alpha level of .05, what effect size do we have 80% power to detect? Do we have low or high power?

#what effect size do we have 80% power to detect?
pwr.r.test(n = 12, sig.level = .05, power = .8)
## 
##      approximate correlation power calculation (arctangh transformation) 
## 
##               n = 12
##               r = 0.7183955
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided

Answer: We can detect an effect size of r = .72 with 80% power. Since only a very large effect is reliably detectable with n = 12, we have low power.

2. Given our sample size and an alpha level of .05, what power do we have for the effect size we found? This is called ____ power.

#what power do we have for the effect size we found?
pwr.r.test(n = 12, r = treatr, sig.level = .05)
## 
##      approximate correlation power calculation (arctangh transformation) 
## 
##               n = 12
##               r = 0.5598476
##       sig.level = 0.05
##           power = 0.501989
##     alternative = two.sided

Answer: This is called observed (post hoc) power. For the effect size we found (r = -.56) we have only about 50% power, which is low; that is consistent with our effect not reaching significance.

3. Given an alpha level of .05, how many cases would we need in a follow-up study to have 80% power to detect the same effect size we obtained. This is called ___ power.

#how many people would we need to have 80% power for our effect size
pwr.r.test(r = treatr, sig.level = .05, power = .8)
## 
##      approximate correlation power calculation (arctangh transformation) 
## 
##               n = 21.88629
##               r = 0.5598476
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided

Answer: About 22 people (technically 21.89, but the IRB doesn't like it when you try to split people into pieces so we are forced to round up). This is called a priori (prospective) power.
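pwr.r.test() returns its pieces in a list, so we can pull out n and round it up to whole participants in one line:

#extract the n component from the power object and round up
ceiling(pwr.r.test(r = treatr, sig.level = .05, power = .8)$n)
## [1] 22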

Adding multiple predictors

1. Build a regression model in which potatoy taste is predicted by treatment and how buttery, grassy, rancid, and painty the French fries were. Rank these predictors in order of their importance (more accurately, effect consistency) based on the regression coefficients. Should you use bs or betas for this?

#adding a set of predictors
fullmodel = lm(potatoy ~ treatment + buttery + grassy + rancid + painty, ff)
summary(fullmodel)
## 
## Call:
## lm(formula = potatoy ~ treatment + buttery + grassy + rancid + 
##     painty, data = ff)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.60479 -1.11475  0.00082  1.21511  2.82154 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  12.2108     1.5100   8.087 0.000192 ***
## treatment    -2.5759     1.3476  -1.912 0.104488    
## buttery      -0.2268     0.2980  -0.761 0.475426    
## grassy       -0.5133     0.5869  -0.875 0.415396    
## rancid        0.3066     0.3980   0.770 0.470277    
## painty       -0.1708     0.2851  -0.599 0.570968    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.204 on 6 degrees of freedom
## Multiple R-squared:  0.5926, Adjusted R-squared:  0.253 
## F-statistic: 1.745 on 5 and 6 DF,  p-value: 0.2581
lm.beta(fullmodel)
##  treatment    buttery     grassy     rancid     painty 
## -0.5276057 -0.2333181 -0.3195422  0.2497109 -0.2559847

Answer: We should use the betas: the predictors are on different scales (treatment is a 0/1 dummy while the taste ratings are continuous), so the raw bs aren't directly comparable. Ranking by absolute beta: treatment, grassy, painty, rancid, buttery.
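To get the ranking directly, sort the absolute values of the betas:

#rank predictors by the size of their standardized coefficients
sort(abs(lm.beta(fullmodel)), decreasing = TRUE)
## treatment    grassy    painty    rancid   buttery 
## 0.5276057 0.3195422 0.2559847 0.2497109 0.2333181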

2. How much better is our model? What statistic do we use for this? Calculate a p-value for this statistic.

#change in R-squared
chgR2 = summary(fullmodel)$r.squared - summary(treatmodel)$r.squared 
chgR2
## [1] 0.2791368
#is this significant?
anova(treatmodel, fullmodel)
## Analysis of Variance Table
## 
## Model 1: potatoy ~ treatment
## Model 2: potatoy ~ treatment + buttery + grassy + rancid + painty
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     10 49.097                           
## 2      6 29.136  4    19.961 1.0277 0.4639
#to get exact p-value
#create object from anova test
a = anova(treatmodel, fullmodel)
#see what is in the object
str(a)
## Classes 'anova' and 'data.frame':    2 obs. of  6 variables:
##  $ Res.Df   : num  10 6
##  $ RSS      : num  49.1 29.1
##  $ Df       : num  NA 4
##  $ Sum of Sq: num  NA 20
##  $ F        : num  NA 1.03
##  $ Pr(>F)   : num  NA 0.464
##  - attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Model 1: potatoy ~ treatment\nModel 2: potatoy ~ treatment + buttery + grassy + rancid + painty"
#extract out the info
a$`Pr(>F)`[2]
## [1] 0.4638846

Answer: Adding the set of taste predictors increased R2 by .28 (from .31 to .59); the statistic we use is the change in R2, tested with an incremental F test comparing the two nested models. The improvement is not significant: F(4, 6) = 1.03, p = .46 :(
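The same F can also be computed by hand from the change in R2, where df1 is the number of added predictors and df2 is the residual df of the full model:

#incremental F = (change in R2 / df1) / ((1 - full model R2) / df2)
#reproduces the F of about 1.03 from anova() above
(chgR2 / 4) / ((1 - summary(fullmodel)$r.squared) / 6)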

3. Calculate the semipartial correlation of the added set of predictors. How does this relate to the statistic in the previous question? What does the semipartial correlation represent?

#semipartial correlation of the added set: the square root of the change in R-squared
sqrt(chgR2)
## [1] 0.528334

Answer: The semipartial correlation is .53. It represents the correlation between the outcome Y and the part of the added regressors that is independent of treatment (the residualized predictors). It relates to the previous statistic because the squared semipartial correlation for a set is exactly the change in R2: sqrt(.279) = .53.

4. Calculate the partial correlation of the added set of predictors. What does this number represent?

#partial correlation of added set of predictors
sqrt(chgR2/(1-summary(treatmodel)$r.squared))
## [1] 0.6376259

Answer: The partial correlation is .64. It represents the correlation between the added predictors and Y after treatment has been partialed out of both, i.e., how strongly the added set relates to the part of Y that treatment leaves unexplained.