1 Basic concepts in statistics

1.1 Sample, population, inferential statistics

What is a sample?

What is a population? (Think about sample mean and population mean)

What is inferential statistics? (Try to connect sample to population)

1.2 Data type and distribution

What are the common types of data?

How will you describe a continuous variable? How will you describe a categorical variable?

What plot will you choose to visualize a continuous variable or a categorical variable?

What is left-skewed data and what is right-skewed data?

What is the numeric order of the mean, median, and mode for right-skewed data? What is the order if the data is normally distributed?
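As a hint for the right-skewed case, here is a minimal sketch using Python's standard `statistics` module. The sample values are made up for illustration; any small data set with a long right tail would do.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, a few are large.
sample = [1, 2, 2, 2, 3, 3, 4, 5, 9, 20]

mode = statistics.mode(sample)      # most frequent value
median = statistics.median(sample)  # middle value of the sorted data
mean = statistics.mean(sample)      # arithmetic average, pulled toward the long tail

print(f"mode={mode}, median={median}, mean={mean}")
```

In this sample the large values in the right tail pull the mean above the median, which in turn sits above the mode.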

How will you choose to visualize the association between two variables if the two variables are:

  • variable 1 is continuous and variable 2 is also continuous
  • variable 1 is continuous and variable 2 is binary
  • variable 1 is continuous and variable 2 is categorical but not ordered
  • variable 1 is continuous and variable 2 is categorical and ordered
  • variable 1 is categorical and variable 2 is categorical

2 Hypothesis testing

2.1 What

What are the common steps of hypothesis testing?

How will you explain a p-value to your neighbor who knows nothing about statistics?

What do you say if a test p-value is smaller than 0.05?

How will you explain a confidence interval to your neighbor who somehow got interested in statistics after your previous explanation?

If you have to choose one, will you report a p-value or a confidence interval, and why?

What is a type-I error and what is a type-II error?

How do type-I and type-II errors connect with power (or sample size)?
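One way to explore this connection is a rough power calculation. The sketch below assumes a two-sided, two-sample z-test with equal group sizes and a standardized effect size (mean difference divided by the common standard deviation). The function name `approx_power` and the chosen effect size of 0.5 are illustrative assumptions, and the normal approximation is only a stand-in for an exact t-test calculation.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)               # critical value set by the type-I error rate
    shift = effect_size * (n_per_group / 2) ** 0.5  # expected value of the test statistic
    # Probability of landing in either rejection region when the effect is real.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical scenario: standardized effect size 0.5, increasing sample size.
for n in (16, 32, 64, 128):
    power = approx_power(0.5, n)
    print(f"n per group = {n:3d}: power = {power:.2f}, type-II error = {1 - power:.2f}")
```

With the type-I error rate held fixed, a larger sample raises power, which is the same as lowering the type-II error rate (power = 1 - type-II error rate).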

2.2 When and how

When do you use each of the following tests?

  • T-test
  • F-test (ANOVA)
  • Chi-squared test
  • Pearson’s correlation test
  • non-parametric tests such as Wilcoxon rank-sum test, Kruskal-Wallis test, …

What test will you choose to understand the association between the following two variables, if:

  • variable 1 is continuous and variable 2 is also continuous
  • variable 1 is continuous and variable 2 is binary
  • variable 1 is continuous and variable 2 is categorical but not ordered
  • variable 1 is continuous and variable 2 is categorical and ordered
  • variable 1 is binary and variable 2 is binary
  • variable 1 is binary and variable 2 is categorical but not ordered
  • variable 1 is binary and variable 2 is categorical and ordered
  • variable 1 is categorical and variable 2 is categorical

What will you do if an ANOVA test is significant (comparison made across multiple groups)?

What will you suggest if multiple tests were conducted simultaneously?
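One common suggestion is to adjust the p-values for multiple comparisons. The sketch below shows the Bonferroni correction, chosen here only because it is the simplest option; less conservative alternatives such as Holm or Benjamini-Hochberg exist. The function name `bonferroni` and the p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p-values and per-test rejection decisions."""
    m = len(p_values)
    # Multiply each p-value by the number of tests, capping at 1.
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj < alpha for p_adj in adjusted]
    return adjusted, reject


# Hypothetical raw p-values from four tests run on the same data set.
p_values = [0.01, 0.04, 0.03, 0.20]
adjusted, reject = bonferroni(p_values)

for p, p_adj, r in zip(p_values, adjusted, reject):
    print(f"raw p = {p:.2f}, adjusted p = {p_adj:.2f}, reject H0: {r}")
```

Three of the four raw p-values fall below 0.05, but only one remains below 0.05 after adjustment.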


3 Regression

What is simple linear regression, and what is multiple linear regression?

  • In what scenario is multiple linear regression helpful?

When do you use linear regression, and what are the assumptions?

What if the assumptions were violated?

What if the outcome variable is not numeric? What if it’s binary, ordinal, or nominal?

How do you interpret \(\beta_1\) in this hypothetical model: \(\text{income}=\beta_0+\beta_1\cdot\text{age}+\beta_2\cdot\text{gender}+\beta_3\cdot\text{education years}\)?
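As a hint, the interpretation can be read off by comparing the model's expected income at two ages one year apart while the other predictors stay fixed. The symbols \(a\), \(g\), and \(e\) are introduced here only for illustration, and the sketch assumes gender is coded numerically (for example 0/1).

```latex
\[
\begin{aligned}
&E[\text{income}\mid \text{age}=a+1,\ \text{gender}=g,\ \text{education years}=e]
 - E[\text{income}\mid \text{age}=a,\ \text{gender}=g,\ \text{education years}=e] \\
&\quad = \bigl(\beta_0+\beta_1(a+1)+\beta_2 g+\beta_3 e\bigr)
 - \bigl(\beta_0+\beta_1 a+\beta_2 g+\beta_3 e\bigr) = \beta_1
\end{aligned}
\]
```

Under these assumptions, \(\beta_1\) is the expected difference in income associated with one additional year of age, holding gender and education years constant.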


4 Machine learning

What do you know about machine learning?

Is regression a type of machine learning?

What other types of machine learning models do you know of, and when are they commonly used?


5 Software

What statistical software do you use?

How familiar are you with each?

Pick one software package and give examples of what you used it for. (Consider a past project in which you used this software: what did you do, e.g., regression, t-test, data transformation, visualization?)

Why do you prefer this software over the others that you mentioned?