Midterm

The zero-order correlation (r) measures the strength and direction of the linear relationship between two variables without controlling for any other variables. Its merit is its simplicity and ease of interpretation. Its drawback is that it may not reflect the unique relationship between the two variables, because it does not account for covariates or confounding variables. The partial correlation (pr) measures the strength and direction of the linear relationship between two variables of interest while controlling for the other measured variables. Its merit is that it gives a cleaner estimate of the relationship between the predictor and the outcome by accounting for measured confounds. Its drawback is that it removes the variance accounted for by the non-focal variables from both the predictor and the outcome, so it cannot be read as the incremental variance a predictor explains in the whole outcome over and above the other variables. Partial correlation is often referred to as the “true r” relationship because it isolates how the predictor and outcome are correlated once the other measured variables are removed from both. In contrast, the semipartial correlation (sr) is often referred to as the “over and above” relationship between the predictor and the outcome: it removes the other measured variables from the predictor only, so the predictor has to account for whatever variability in the full outcome it can by itself. In short, the partial correlation removes the variance shared with the other variables from both the predictor and the outcome, whereas the semipartial correlation removes the redundant variance among the predictors only, and its square gives the incremental variance in the outcome that the predictor explains.
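The distinction among r, pr, and sr can be illustrated with a minimal sketch that builds the partial and semipartial correlations from regression residuals. The data are simulated, and the names x, y, z, residuals, and corr are illustrative assumptions rather than anything from the exam materials.

```python
import numpy as np

def residuals(target, control):
    """Residuals of target after a least-squares regression on control (with intercept)."""
    design = np.column_stack([np.ones_like(control), control])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef

def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Simulated (hypothetical) data: z is a covariate related to both x and y.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
x = 0.6 * z + rng.normal(size=n)
y = 0.5 * x + 0.5 * z + rng.normal(size=n)

r_xy = corr(x, y)                                 # zero-order: nothing removed
pr_xy = corr(residuals(x, z), residuals(y, z))    # partial: z removed from x AND y
sr_xy = corr(residuals(x, z), y)                  # semipartial: z removed from x only

print(f"r = {r_xy:.3f}, pr = {pr_xy:.3f}, sr = {sr_xy:.3f}")
```

Because sr leaves the outcome intact while pr shrinks it to its residual, the absolute value of sr can never exceed that of pr for the same variables.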

2a) The unstandardized coefficient (b) represents the expected change in the outcome variable (Y) for a one-unit change in the predictor variable (X), expressed in raw units (the variables' original scales), while holding all other predictors constant. The standardized coefficient (b*) represents the expected change in Y, in standard deviation units, for a one standard deviation change in X, while holding all other predictors constant. In a simple correlation, the two variables are each transformed into z-scores by subtracting their respective means and dividing by their respective standard deviations. The correlation coefficient is then calculated from the z-scores of the two variables, resulting in a standardized measure of the relationship between them. In partial and semipartial correlations, the standardization is built into the formula, because both are computed from zero-order correlations that are already standardized.
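As a small illustration of the raw versus standardized scale, the sketch below computes b and b* for a single predictor and compares b* with r. The data are simulated and the variable names are hypothetical; with only one predictor, b* and r coincide.

```python
import numpy as np

# Simulated (hypothetical) data in raw units.
rng = np.random.default_rng(1)
x = rng.normal(loc=170.0, scale=10.0, size=200)   # predictor
y = 2.0 * x + rng.normal(scale=15.0, size=200)    # outcome

# Unstandardized slope: change in y (raw units) per one-unit change in x.
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Standardized slope: change in y (in SDs) per one-SD change in x.
b_star = b * np.std(x, ddof=1) / np.std(y, ddof=1)

# Zero-order correlation, which is the average product of z-scores.
r = np.corrcoef(x, y)[0, 1]

print(f"b = {b:.3f}, b* = {b_star:.3f}, r = {r:.3f}")
```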

2b) As mentioned above, in a zero-order correlation the variables are transformed to z-scores as part of the calculation, by subtracting each variable's mean and dividing by its standard deviation. In partial and semipartial correlations, the standardization is carried by the formula itself. For example, the partial correlation between X and Y controlling for Z can be computed from the three zero-order correlations alone, where rXY is the zero-order correlation between X and Y, rXZ is the zero-order correlation between X and Z, and rYZ is the zero-order correlation between Y and Z. Because each of those simple correlations has already divided by the relevant standard deviations, no further standardization is needed. The standardized coefficient, by contrast, is calculated in the process of the regression analysis, where it adjusts the relationship between the two variables of interest for the effects of the other variables.
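The point that pr and sr need only the zero-order correlations can be sketched with the standard first-order formulas. The function names and the example correlations (.5, .3, .4) are hypothetical and chosen only for illustration.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Partial correlation of X and Y, with Z removed from both X and Y."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def semipartial_r(r_xy, r_xz, r_yz):
    """Semipartial correlation of X and Y, with Z removed from X only."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_xz**2)

# Hypothetical zero-order correlations; no raw scores or SDs are required.
pr = partial_r(r_xy=0.5, r_xz=0.3, r_yz=0.4)
sr = semipartial_r(r_xy=0.5, r_xz=0.3, r_yz=0.4)
print(f"pr = {pr:.3f}, sr = {sr:.3f}")
```

The two formulas share a numerator and differ only in the denominator, which is why pr and sr always have the same sign.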

2c) In correlation analysis the relationship between the variables is bidirectional: r and pr do not designate either variable as the predictor or the outcome (sr is a partial exception, since the other variables are removed from the predictor only). The standardized coefficient instead comes from a regression in which X is used to predict Y, so the relationship has a unidirectional and predictive nature. The standardized coefficient (b*) is useful for comparing the relative contribution of different predictors in the same model to the outcome variable, regardless of the scale of the variables. In a multiple regression analysis, the predictor variables may have different units of measurement and different ranges of values (e.g., meters for one variable, years for another, and minutes for a third), so their raw coefficients cannot be compared directly. Transforming the variables to z-scores puts them on a common scale. The resulting standardized coefficients indicate the change in the outcome variable, in standard deviation units, associated with a one standard deviation change in the predictor variable, while holding all other predictors constant.
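A minimal sketch of why standardization helps when predictors are on different scales: the same model is fit once on raw scores and once on z-scores. The data are simulated, and the predictor names (meters, years, minutes) simply echo the example units above.

```python
import numpy as np

def ols(predictors, outcome):
    """Least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(outcome)), predictors])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef

def zscore(a):
    """Column-wise z-scores using the sample standard deviation."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

# Simulated (hypothetical) predictors measured in very different units.
rng = np.random.default_rng(2)
n = 300
meters = rng.normal(50, 20, n)
years = rng.normal(10, 2, n)
minutes = rng.normal(30, 5, n)
X = np.column_stack([meters, years, minutes])
y = 0.05 * meters + 1.0 * years + 0.1 * minutes + rng.normal(size=n)

b = ols(X, y)[1:]                        # raw slopes, each in its own units
b_star = ols(zscore(X), zscore(y))[1:]   # standardized slopes, comparable

for name, raw, std in zip(["meters", "years", "minutes"], b, b_star):
    print(f"{name:8s} b = {raw:7.3f}   b* = {std:6.3f}")
```

Each standardized slope equals the raw slope multiplied by the ratio of the predictor's standard deviation to the outcome's, so fitting on z-scores and rescaling the raw slopes give the same result.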

3a) Comparing the p-values of the same predictors across the two groups is not the best strategy, because a p-value is only the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true. It is not the probability that the null hypothesis is true, nor the probability that the study hypothesis is true. NHST only addresses whether the observed relationship could plausibly have arisen by chance, so a predictor being significant in one group and not in the other does not show that its effect differs between the groups.

3b) A better method would be a single multiple regression analysis with sustained attention as the outcome variable and IQ, working memory capacity, age, and group (depressed vs. non-depressed) as predictor variables. The interaction between each predictor and group could also be included in the model to test directly whether the predictors' effects differ between the two groups. This would allow the researcher to examine whether the predictors have different effects on sustained attention in the two groups, while controlling for the effects of the other predictors and the group factor.
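The interaction approach can be sketched as follows. This is a simplified illustration with simulated data and a single predictor (IQ); working memory capacity and age would enter the design matrix in the same way. All variable names and effect sizes are hypothetical.

```python
import numpy as np

# Simulated (hypothetical) data: the IQ slope is built to differ by group.
rng = np.random.default_rng(3)
n = 400
group = rng.integers(0, 2, n).astype(float)   # 0 = non-depressed, 1 = depressed
iq = rng.normal(size=n)                       # IQ, already z-scored
attention = (0.5 * iq - 0.4 * iq * group - 0.3 * group
             + rng.normal(scale=0.5, size=n))

# One model for both groups: intercept, IQ, group, and the IQ x group product.
design = np.column_stack([np.ones(n), iq, group, iq * group])
coef, *_ = np.linalg.lstsq(design, attention, rcond=None)
intercept, b_iq, b_group, b_interaction = coef

slope_nondepressed = b_iq                  # IQ slope when group = 0
slope_depressed = b_iq + b_interaction     # IQ slope when group = 1

print(f"IQ slope, non-depressed: {slope_nondepressed:.3f}")
print(f"IQ slope, depressed:     {slope_depressed:.3f}")
print(f"difference (interaction): {b_interaction:.3f}")
```

The interaction coefficient is the estimated difference between the two groups' slopes, so testing it addresses the group comparison directly instead of comparing two separate p-values.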

4a) The expected value of the squared multiple correlation (R²) under the null hypothesis depends on the ratio of predictors to cases: it equals k/(N-1), where k is the number of predictors and N is the sample size. The common rule of thumb of at least 10 cases per predictor keeps this chance value small (about .10). The expected value is subtracted from the observed value to see how far the model exceeds chance. Our university researcher friend has 10 predictors with an N of 30, so the expectation with completely random data under the null hypothesis is 10/(30-1) = .34. Subtracting this from his observed .35 leaves about .01, which is essentially zero, so his model fits no better than would be expected by chance.

4b) The previous researchers had only 5 predictors, which gives a chance expectation of 5/(30-1) = .17. Their observed .22 exceeds that by about .05, so their model compares more favorably with chance even though its observed value is smaller. To remedy our researcher's problem, it would be wiser either to expand the sample size or to reduce the number of predictors, in order to avoid the inflation of the multiple correlation that occurs under the null hypothesis.
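The arithmetic above, and the k/(N-1) expectation itself, can be checked with a short sketch. The function names are hypothetical, and the simulation assumes normally distributed, mutually unrelated predictors and outcome.

```python
import numpy as np

def expected_r2_null(k, n):
    """Expected R-squared with k predictors and n cases when the null is true."""
    return k / (n - 1)

def simulate_null_r2(k, n, reps=2000, seed=0):
    """Mean R-squared over many regressions fit to purely random data."""
    rng = np.random.default_rng(seed)
    r2 = np.empty(reps)
    for i in range(reps):
        X = rng.normal(size=(n, k))
        y = rng.normal(size=n)            # outcome unrelated to X
        design = np.column_stack([np.ones(n), X])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2[i] = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return r2.mean()

for k, observed in [(10, 0.35), (5, 0.22)]:
    chance = expected_r2_null(k, 30)
    print(f"k = {k:2d}: chance = {chance:.3f}, "
          f"simulated = {simulate_null_r2(k, 30):.3f}, "
          f"observed - chance = {observed - chance:.3f}")
```

The simulated averages should land close to k/(N-1), which shows that a sizable R² can arise from noise alone when the number of predictors is large relative to the sample.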


John Majoubi

4/28/2023