Concept Study Guide
Statistical Modeling: A Fresh Approach
Daniel Kaplan
The questions are organized not by topic but by cognitive level, according to Bloom's Taxonomy.
Knowledge
- What is a standard deviation?
- What is a median?
- What is a percentile?
- What is an outlier?
- What's a categorical variable?
- What is a correlation coefficient?
- What is a model value?
- What is a variance?
- \( R^2 \) is a ratio. Of what?
- What's an indicator variable (sometimes called “dummy” variable)?
- What's a backdoor pathway?
- What's a null hypothesis?
- What are the two possible outcomes of a hypothesis test?
- What is a probability?
- What is an odds?
- What is a sampling frame?
- What is a random sample?
- What is a “rank transform”?
- What is resampling?
- What is shuffling?
- What is the form of a confidence interval?
- What's a Type I error? A Type II error?
- What is “bootstrapping”?
- What is a vector?
- What does it mean to be “orthogonal”?
- What is a skewed distribution?
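As one illustration of the "rank transform" question above, here is a minimal Python sketch. The function name `rank_transform` and the choice to give tied values their average rank are assumptions made for this example, not definitions taken from the book.

```python
def rank_transform(values):
    """Replace each value by its rank (1 = smallest).

    Assumption for this sketch: tied values share the average of the
    ranks they would otherwise occupy.
    """
    # Indices of the values, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


# Hypothetical data: 1000 is an outlier, and 20 appears twice.
example_ranks = rank_transform([10, 1000, 20, 20])
```

Note that the outlier simply becomes the largest rank: its distance from the other values no longer matters, only its position in the ordering.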
Comprehension
- For a given variable, how are the variance and the standard deviation related?
- Why does it make no sense to calculate the mean of a categorical variable?
- How do backdoor pathways potentially confuse interpretation of correlations as causation?
- Which percentile is the median?
- Why are indicator variables used to represent categorical variables?
- What is the criterion used for fitting a model (in linear models)?
- Why is the median robust to outliers?
- Why is the range of a variable not robust to outliers?
- Is the variance robust to outliers?
- What are the highest and lowest possible values for \( R^2 \)? Why?
- What's the difference between a null hypothesis and an alternative hypothesis?
- In ANOVA, F is a ratio. Of what?
- In a regression report, t is a ratio. Of what?
- An odds and a probability relate the same information. How are they related?
- How does sampling introduce variation?
- What is a sampling distribution?
- What is a confidence interval?
- What's the relationship between a standard error, a sampling distribution, and a margin of error?
- What's an intercept term?
- To which hypothesis does a Type I error relate? A Type II error?
- How do model vectors relate to a quantitative variable?
- What model vectors are created for a categorical variable?
- What kind of number is each of these things (e.g., an integer, positive, negative, and so on)?
  - a degree of freedom?
  - a variance?
  - a standard deviation?
  - a correlation coefficient \( r \)?
  - a coefficient of determination \( R^2 \)?
- How does an experiment differ from an observational study?
- What is a conditional probability and how does it differ from a joint or marginal probability?
- What is a probability density? How is it related to a cumulative probability?
- What is the link between these two conditional probabilities: \( p(a|b) \) and \( p(b|a) \)?
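For the questions above on odds versus probability and on the link between \( p(a|b) \) and \( p(b|a) \), the following Python sketch may help when checking an answer. The function names are hypothetical, and the last function is simply Bayes' rule, \( p(a|b) = p(b|a)\,p(a)/p(b) \).

```python
def odds_from_probability(p):
    """Odds in favor of an event whose probability is p (requires 0 <= p < 1)."""
    return p / (1.0 - p)


def probability_from_odds(odds):
    """Probability of an event whose odds in favor are `odds` (requires odds >= 0)."""
    return odds / (1.0 + odds)


def flip_conditional(p_b_given_a, p_a, p_b):
    """Bayes' rule: p(a|b) computed from p(b|a), p(a), and p(b)."""
    return p_b_given_a * p_a / p_b


# A probability of 0.75 corresponds to odds of 3 (that is, "3 to 1").
odds = odds_from_probability(0.75)
back_to_probability = probability_from_odds(odds)
```

The two conversion functions are inverses of each other, which is one way to see that odds and probability carry the same information.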
Application
- How do you calculate the variance of a variable?
- How do you calculate the variance of residuals from a model?
- How do you include an intercept term in a model? How do you exclude it?
- How would you display a distribution density for a quantitative variable?
- How do you include a main effect in a model?
- How do you include an interaction term in a model?
- How do you include a covariate in a model?
- How do you produce a regression report from a model?
- How do you produce an ANOVA report from a model?
- How does a p-value guide the outcome of a hypothesis test?
- If you fail to reject the null hypothesis, does that mean the alternative hypothesis is right?
- What is a risk ratio/probability ratio?
- Why are random samples preferred?
- What's the difference in the uses of \( r \) and \( R^2 \)?
- What are main effects and how do they differ from interaction terms?
- How do you resample a data set?
- How do you shuffle a variable in a data set (when constructing a model)?
- How would you measure the angle between two variables?
- Broadly speaking, how do you perform an experiment? What are the essential things to do?
- What is a rank transformation and what does it do to the distribution of a variable?
- What does the term “least squares” refer to?
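Several of the "how do you" questions above (variance, resampling, shuffling, the angle between two variables) can be tried out in a few lines. The sketch below uses only Python's standard library, which is a choice made for this example and not necessarily the software used with the book. The data values and function names are hypothetical, and the angle is computed after subtracting each variable's mean, a convention under which the cosine of the angle equals the correlation coefficient \( r \).

```python
import math
import random
import statistics


def resample(cases, rng):
    """Draw len(cases) cases at random, with replacement."""
    return [rng.choice(cases) for _ in cases]


def shuffle_variable(values, rng):
    """Return a shuffled copy of one variable, leaving the original untouched."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def angle_degrees(x, y):
    """Angle between two variables, treated as vectors after subtracting their means."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    dot = sum(a * b for a, b in zip(dx, dy))
    lengths = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cosine = max(-1.0, min(1.0, dot / lengths))
    return math.degrees(math.acos(cosine))


# Hypothetical measurements.
heights = [60.0, 62.0, 65.0, 70.0, 73.0]
weights = [110.0, 120.0, 140.0, 165.0, 180.0]

rng = random.Random(0)                      # seeded so the run is repeatable
variance = statistics.variance(heights)     # sample variance (divides by n - 1)
one_resample = resample(heights, rng)
shuffled_weights = shuffle_variable(weights, rng)
angle = angle_degrees(heights, weights)
```

Shuffling one variable before fitting a model breaks any relationship it has with the other variables while keeping its own distribution intact, whereas resampling keeps each case together and changes which cases appear.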
Analysis
- What's the relationship between the variance of model values, of residuals, and of the response?
- Why use a covariate in a model?
- What is the difference between an explanatory and a response variable?
- What is an alternative hypothesis used for?
- How can one use the power of a hypothesis test in interpreting the outcome of the test?
- How would you use a hypothetical causal network to identify covariates to include or exclude in a model?
- What's the relationship between the degrees of freedom listed in an ANOVA table?
- Given the units of a response variable A and explanatory variables B and C:
  - What are the units of the intercept?
  - What are the units of the coefficient on the main effect B?
  - What are the units of the coefficient on an interaction term B:C?
- What's the difference between a p-value and a significance level?
- For what purpose is resampling used?
- For what purpose is shuffling used?
- What does it mean for covariates to “eat variance”?
- How are degrees of freedom used?
- What does it mean to say that two model terms are “collinear”?
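The first question in this section, on how the variances of the model values, the residuals, and the response are related, can be explored numerically. The sketch below fits a straight-line model with an intercept by least squares to hypothetical data; the names `fit_line`, `x`, and `y` are assumptions made for this example.

```python
import statistics


def fit_line(x, y):
    """Least-squares intercept and slope for the model y ~ 1 + x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

intercept, slope = fit_line(x, y)
model_values = [intercept + slope * a for a in x]
residuals = [b - m for b, m in zip(y, model_values)]

var_response = statistics.variance(y)
var_model = statistics.variance(model_values)
var_residuals = statistics.variance(residuals)

# R^2 as a ratio of two variances.
r_squared = var_model / var_response
```

For a least-squares fit that includes an intercept, `var_model + var_residuals` should match `var_response` up to round-off, which is why \( R^2 \) can be read as the fraction of the response variance accounted for by the model.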
Synthesis
- What will happen to \( R^2 \) when you add a new explanatory term to a model?
- What factors influence the size of standard errors of coefficients?
- What's the point of converting an F statistic to a p-value?
- Why is orthogonality advantageous when interpreting a model?
- How could you change a model (not the data!) to reduce the size of the residuals?
- When would you use a rank transform?
- When would you use logistic regression instead of ordinary least squares regression?
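To experiment with the first question in this section, on what happens to \( R^2 \) when a term is added, the sketch below fits nested models by projecting the response onto the model vectors. It uses Gram-Schmidt orthogonalization as a simple stand-in for a full least-squares routine; the data, the function names, and the tolerance for skipping collinear vectors are all assumptions made for this example.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fitted_values(response, model_vectors):
    """Project the response onto the subspace spanned by the model vectors."""
    basis = []
    for vector in model_vectors:
        w = list(vector)
        # Remove the part of w that lies along each earlier basis vector.
        for b in basis:
            coef = dot(w, b) / dot(b, b)
            w = [wi - coef * bi for wi, bi in zip(w, b)]
        # Skip vectors that are (nearly) collinear with earlier ones.
        if dot(w, w) > 1e-12:
            basis.append(w)
    fitted = [0.0] * len(response)
    for b in basis:
        coef = dot(response, b) / dot(b, b)
        fitted = [f + coef * bi for f, bi in zip(fitted, b)]
    return fitted


def r_squared(response, model_vectors):
    """R^2 for a model whose vectors include an intercept vector of ones."""
    fitted = fitted_values(response, model_vectors)
    mean = sum(response) / len(response)
    ss_total = sum((y - mean) ** 2 for y in response)
    ss_residual = sum((y - f) ** 2 for y, f in zip(response, fitted))
    return 1.0 - ss_residual / ss_total


# Hypothetical data: z is an arbitrary extra variable.
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0]
ones = [1.0] * len(y)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z = [0.3, -1.2, 0.8, 0.1, -0.5, 1.4]

r2_small = r_squared(y, [ones, x])
r2_large = r_squared(y, [ones, x, z])
```

Comparing `r2_small` with `r2_large` shows the direction of the change; deciding whether the change is meaningful is a separate question, which the Evaluation section takes up.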
Evaluation
- How would you judge whether including a covariate in a model improves the model?
- How would you decide which is more appropriate for a given purpose: an ANOVA or a regression report?
- What's the advantage of using log odds in modeling probabilities?
- Why use a 95% coverage interval?
- How would you decide on an appropriate alternative hypothesis?
- How would you decide if a random sample is really random?
- Are smaller residuals always a sign that a model is better?