Plan for today
- Hypothesis Testing
- Difference in Means
- Regressions
January 15, 2018
Why do we need hypothesis testing?
Using Probability Theory to Measure Type 1 Errors
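One way to make this concrete is a small simulation, sketched here under assumed settings that are not from the slides: normal data, 30 observations per group, a 5% significance level, and 2000 repetitions. Both groups are drawn from the same population, so every rejection is a Type 1 error, and the rejection rate should land close to 5%.

```r
set.seed(42)  # arbitrary seed so the run is reproducible

alpha  <- 0.05   # assumed significance level
n_sims <- 2000   # assumed number of simulated experiments

# Each experiment draws two samples from the SAME population,
# so the null hypothesis (no difference in means) is true by construction.
p_values <- replicate(n_sims, {
  a <- rnorm(30, mean = 50, sd = 4)
  b <- rnorm(30, mean = 50, sd = 4)
  t.test(a, b)$p.value
})

# Share of experiments that wrongly reject the null
type1_rate <- mean(p_values < alpha)
type1_rate
```

Lowering `alpha` makes false rejections rarer, at the cost of making real differences harder to detect.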
Sample vs Population
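Formula for the difference in means:
\(\frac{\overline{x}_1 - \overline{x}_2}{\sqrt{\frac{\sigma^2_1}{n_1}+\frac{\sigma^2_2}{n_2}}} = t\)
Checking whether the mean of one group is greater than a value (V):
\(\frac{\overline{x} - V}{\sqrt{\frac{\sigma^2}{n}}} = t\)
In both formulas \(\overline{x}\) is a sample mean, \(n\) a sample size, and \(\sigma^2\) a variance, estimated in practice by the sample variance (the square of sd()).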
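As a rough illustration of the one-group formula, the sketch below computes t by hand and compares it with `t.test()`. The vector `x` and the value `V` are made up for the example; `mu` and `alternative = "greater"` are standard arguments of `t.test()`.

```r
set.seed(1)                        # arbitrary seed
x <- rnorm(30, mean = 50, sd = 4)  # hypothetical sample, not the slides' data
V <- 48                            # hypothetical comparison value

# Hand computation: (sample mean - V) / sqrt(sample variance / n)
n        <- length(x)
t_manual <- (mean(x) - V) / sqrt(var(x) / n)

# Built-in one-sided test of H1: mean > V
res <- t.test(x, mu = V, alternative = "greater")

t_manual
unname(res$statistic)  # same value as t_manual
res$p.value            # one-sided p-value, from a t distribution with n - 1 df
```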
mean(A) ; mean(B)
## [1] 49.88979
## [1] 44.47433
sd(A) ; sd(B)
## [1] 4.247403
## [1] 3.729741
sum(!is.na(A)) ; sum(!is.na(B))
## [1] 30
## [1] 30
t.test function:
t.test(A, B)
##
##  Welch Two Sample t-test
##
## data:  A and B
## t = 5.2475, df = 57.047, p-value = 2.358e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  3.348931 7.481990
## sample estimates:
## mean of x mean of y
##  49.88979  44.47433
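The t statistic and degrees of freedom reported by `t.test(A, B)` can be reproduced from the means, standard deviations, and counts printed earlier. This is a sketch that plugs in those rounded printed values, so agreement is only to a few significant digits. The degrees of freedom use the Welch-Satterthwaite approximation, which is what R's Welch test applies.

```r
# Summary statistics copied from the printed output (rounded)
mean_a <- 49.88979; mean_b <- 44.47433
sd_a   <- 4.247403; sd_b   <- 3.729741
n_a    <- 30;       n_b    <- 30

# Squared standard error contributed by each group
v_a <- sd_a^2 / n_a
v_b <- sd_b^2 / n_b

# Difference in means divided by its standard error
t_stat <- (mean_a - mean_b) / sqrt(v_a + v_b)

# Welch-Satterthwaite degrees of freedom
df <- (v_a + v_b)^2 / (v_a^2 / (n_a - 1) + v_b^2 / (n_b - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df)

t_stat   # close to the reported t = 5.2475
df       # close to the reported df = 57.047
p_value  # close to the reported p-value = 2.358e-06
```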
Each additional variable in the regression adds a dimension to the plot, but we can still look at a cross section of it.
##
## Call:
## lm(formula = Cirrhosis_death_rate ~ Pct_urban + Liquor_consumption_per_capita +
##     Wine_consumption_per_capita, data = dat)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -28.5939  -5.0002   0.7397   7.2051  18.1331
##
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)
## (Intercept)                     3.8706     7.1618   0.540 0.591738
## Pct_urban                       0.4965     0.1414   3.512 0.001078 **
## Liquor_consumption_per_capita   0.2286     0.1002   2.281 0.027702 *
## Wine_consumption_per_capita     1.6008     0.3919   4.085 0.000194 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.96 on 42 degrees of freedom
## Multiple R-squared:  0.796,  Adjusted R-squared:  0.7814
## F-statistic: 54.62 on 3 and 42 DF,  p-value: 1.503e-14
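Each row of the coefficient table can be read as its own hypothesis test of "this coefficient is zero". As a sketch using the rounded values printed for Pct_urban, the t value is the estimate divided by its standard error, and the p-value comes from a t distribution with the residual degrees of freedom (42).

```r
# Values copied from the Pct_urban row of the printed summary (rounded)
estimate  <- 0.4965
std_error <- 0.1414
df_resid  <- 42      # residual degrees of freedom from the summary

# t value: how many standard errors the estimate is away from 0
t_value <- estimate / std_error

# Two-sided p-value for H0: coefficient = 0
p_value <- 2 * pt(-abs(t_value), df = df_resid)

t_value  # close to the reported 3.512
p_value  # close to the reported 0.001078
```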
summary(mod)$r.squared
## [1] 0.7959899
summary(mod)$adj.r.squared
## [1] 0.7814178
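The adjusted R-squared can be recovered from the plain R-squared with the usual adjustment \(1 - (1 - R^2)\frac{n-1}{n-k-1}\), where k is the number of predictors. The sketch below uses the printed values; n = 46 is not printed directly and is inferred here from the 42 residual degrees of freedom plus the 4 estimated coefficients.

```r
r_squared <- 0.7959899  # summary(mod)$r.squared, as printed
k         <- 3          # number of predictors in the model
df_resid  <- 42         # residual degrees of freedom from the summary
n         <- df_resid + k + 1  # observations used in the fit: 46

# Penalize R-squared for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

adj_r_squared  # close to the reported 0.7814178
```

Unlike the plain R-squared, this value can fall when a predictor that adds little explanatory power is included.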
Direction of Causality
Omitted Variable Bias
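A small simulation can show the bias at work. Everything in this sketch is made up for illustration (the variable names, the coefficients 2 and 3, and the 0.8 link between the predictors). When `x2` affects `y` and is correlated with `x1`, leaving it out pushes its effect onto the `x1` coefficient.

```r
set.seed(7)  # arbitrary seed
n  <- 5000
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n)             # x2 is correlated with x1
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)  # true effect of x1 is 2

full  <- lm(y ~ x1 + x2)  # correctly specified model
short <- lm(y ~ x1)       # x2 omitted

coef(full)["x1"]   # near the true value 2
coef(short)["x1"]  # near 2 + 3 * 0.8 = 4.4: absorbs part of x2's effect
```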
Linearity in Parameters
Conditional Expected Value of Error is 0
No Heteroskedasticity
No Perfect Collinearity
##    male female
## 1:    1      0
## 2:    0      1
## 3:    1      0
## 4:    0      1
## 5:    0      1
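The table above is a standard example of perfect collinearity: male and female always sum to 1, which duplicates the intercept. The sketch below rebuilds those two columns in a plain data frame and adds a made-up outcome `y` (hypothetical values, not from the slides) to show how `lm()` reacts.

```r
# Dummy columns copied from the table above; y is made up for illustration
dat_dummy <- data.frame(
  male   = c(1, 0, 1, 0, 0),
  female = c(0, 1, 0, 1, 1),
  y      = c(10, 12, 11, 14, 13)
)

# The two dummies always sum to 1, which is exactly the intercept column
all(dat_dummy$male + dat_dummy$female == 1)

# With an intercept and both dummies, one coefficient cannot be estimated
fit_both <- lm(y ~ male + female, data = dat_dummy)
coef(fit_both)  # the female coefficient is NA

# Dropping one dummy removes the perfect collinearity
fit_one <- lm(y ~ male, data = dat_dummy)
coef(fit_one)
```

With one dummy dropped, the intercept is the mean outcome of the omitted group and the remaining coefficient is the difference from it.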
Data is Randomly 'Drawn' from the Population