Inference with R

One sample proportion test:

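The command is prop.test(x, n, p = what Ho says, alternative = "less" or "greater"). Here x is the number of successes, n is the sample size, and p is the proportion stated in the null hypothesis Ho. The choice of "less" or "greater" depends on the alternative hypothesis. If your alternative is less than, then write "less". If your alternative is greater than, then write "greater". If your alternative is not equal, then leave off the alternative statement, and R runs a two-sided test. Not equal is the alternative in most situations, so the statement is usually left off.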

Example: If n = 400 and x = 125 and your alternative is that the proportion is greater than 0.22, then

prop.test(125, 400, p = 0.22, alternative = "greater")

## 
##  1-sample proportions test with continuity correction
## 
## data:  125 out of 400
## X-squared = 19.409, df = 1, p-value = 5.275e-06
## alternative hypothesis: true p is greater than 0.22
## 95 percent confidence interval:
##  0.2745463 1.0000000
## sample estimates:
##      p 
## 0.3125
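The numbers in this output can be checked by hand. The sketch below recomputes the sample proportion and the continuity-corrected X-squared statistic for the same x, n, and p, and compares them with what prop.test stores in its result. The variable names (p_hat, stat_by_hand, result) are chosen here for illustration.

```r
x  <- 125   # number of successes
n  <- 400   # sample size
p0 <- 0.22  # proportion stated in Ho

# Sample proportion: 125 / 400
p_hat <- x / n

# Chi-square statistic with continuity correction (subtract 0.5 from |x - n*p0|)
stat_by_hand <- (abs(x - n * p0) - 0.5)^2 / (n * p0 * (1 - p0))

# The same test run by R; the result is a list you can pull pieces from
result <- prop.test(x, n, p = p0, alternative = "greater")

print(p_hat)
print(stat_by_hand)
print(unname(result$statistic))
print(result$p.value)
```

Pulling result$p.value out of the result is handy when you want to compare the p-value with your significance level in code instead of reading it off the printout.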

t-test

For a one sample t-test, the command is t.test(~variable, mu=what Ho says, alternative="less" or "greater", data=Data Frame). The choice of "less" or "greater" depends on the alternative hypothesis. If your alternative is less than, then use "less". If your alternative is greater than, then use "greater". If your alternative is not equal, then leave off the alternative statement. Again, usually this statement will be left off.

Example: If you want to see if the emission levels of a new engine are less than the national standards of 20, then

Engine <-read.csv("https://krkozak.github.io/MAT160/engine.csv")
t.test(~emission, data=Engine, alternative="less", mu=20)

## 
##  One Sample t-test
## 
## data:  emission
## t = -3.0016, df = 9, p-value = 0.007458
## alternative hypothesis: true mean is less than 20
## 95 percent confidence interval:
##      -Inf 18.89829
## sample estimates:
## mean of x 
##     17.17
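The form t.test(~variable, data=Data Frame) appears to rely on an add-on package being loaded (the mosaic package is one that supports it; this is an assumption, since the packages in use are not named here). If that form gives an error in your session, a minimal alternative is to pass the values to t.test as a plain vector. The emission readings below are invented for illustration and are not the values in engine.csv.

```r
# Hypothetical emission readings, invented for illustration only
emission <- c(15.2, 17.8, 16.4, 18.9, 14.7, 19.3, 16.1, 17.5)

# One sample t-test of Ho: mu = 20 against Ha: mu < 20,
# passing the data as a vector instead of a formula
result <- t.test(emission, mu = 20, alternative = "less")

print(result$statistic)  # t statistic
print(result$parameter)  # degrees of freedom, n - 1
print(result$p.value)
```

With a data frame, the same call would be written as t.test(Engine$emission, mu = 20, alternative = "less").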

One sample proportion confidence interval

The command is prop.test(x, n, conf.level=C), where C is the confidence level as a decimal.

Example: If you want to find a 90% confidence interval for the population proportion from a sample with n = 500 and x = 125, then

prop.test(125, 500, conf.level=0.90)

## 
##  1-sample proportions test with continuity correction
## 
## data:  125 out of 500
## X-squared = 124, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 90 percent confidence interval:
##  0.2185980 0.2841772
## sample estimates:
##    p 
## 0.25
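When only the interval is needed, it can be pulled out of the result instead of read from the printout. The test part of the output (X-squared and p-value) compares against the default p = 0.5 and can be ignored when you only want the interval. A short sketch, with variable names chosen here for illustration:

```r
# 90% confidence interval for a proportion, x = 125 successes out of n = 500
result <- prop.test(125, 500, conf.level = 0.90)

# conf.int holds the lower and upper limits
ci <- result$conf.int
lower <- ci[1]
upper <- ci[2]

print(lower)
print(upper)
print(attr(ci, "conf.level"))  # the confidence level that was used
```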

One sample t confidence interval

The command is t.test(~variable, data=Data Frame, conf.level=C)

Example: Find the 95% confidence interval for the mean amount of emission produced by the new engine being developed.

Engine <-read.csv("https://krkozak.github.io/MAT160/engine.csv")
t.test(~emission, data=Engine, conf.level=.95)

## 
##  One Sample t-test
## 
## data:  emission
## t = 18.211, df = 9, p-value = 2.071e-08
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  15.0372 19.3028
## sample estimates:
## mean of x 
##     17.17
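The interval that t.test reports is the sample mean plus or minus a margin of error, where the margin is a t critical value times the standard error. The sketch below builds the interval by hand and compares it with the one from t.test. The readings are invented for illustration and are not the values in engine.csv.

```r
# Hypothetical emission readings, invented for illustration only
emission <- c(15.2, 17.8, 16.4, 18.9, 14.7, 19.3, 16.1, 17.5)

n <- length(emission)
conf_level <- 0.95

# t critical value with n - 1 degrees of freedom
t_star <- qt(1 - (1 - conf_level) / 2, df = n - 1)

# Margin of error = critical value * standard error of the mean
margin <- t_star * sd(emission) / sqrt(n)
manual_ci <- c(mean(emission) - margin, mean(emission) + margin)

# The same interval from t.test
r_ci <- t.test(emission, conf.level = conf_level)$conf.int

print(manual_ci)
print(as.numeric(r_ci))
```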

Two-sample proportion test

The R command is prop.test(x=c(x1, x2), n=c(n1, n2), alternative = "less" or "greater"). The choice of "less" or "greater" depends on the alternative hypothesis, where "less" means proportion 1 is less than proportion 2. If your alternative is not equal, then leave off the alternative statement.

Example: You want to see if proportion 1 is different from proportion 2, using samples with n1=1000, n2=1200, x1=231, and x2=176.

prop.test(x=c(231, 176), n=c(1000,1200))

## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  c(231, 176) out of c(1000, 1200)
## X-squared = 25.173, df = 1, p-value = 5.241e-07
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.05050705 0.11815962
## sample estimates:
##    prop 1    prop 2 
## 0.2310000 0.1466667

If you want a confidence interval for the difference between the two proportions, then add conf.level=C, where C is the confidence level as a decimal.
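As a sketch of both options, the code below reuses the counts from the example above to get a 90% confidence interval for the difference in proportions, and then runs a one-sided test that proportion 1 is greater than proportion 2. The 90% level and the "greater" alternative are chosen here for illustration.

```r
x <- c(231, 176)    # successes in sample 1 and sample 2
n <- c(1000, 1200)  # sizes of sample 1 and sample 2

# Two-sided test with a 90% confidence interval for p1 - p2
result <- prop.test(x = x, n = n, conf.level = 0.90)
print(result$estimate)  # the two sample proportions
print(result$conf.int)  # interval for the difference p1 - p2

# One-sided test of Ha: p1 > p2
greater <- prop.test(x = x, n = n, alternative = "greater")
print(greater$p.value)
```

Changing conf.level changes only the interval. The two-sided p-value stays the same as in the output above.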

Two-sample paired test:

You need to mutate your Data Frame to include a new variable that finds the difference between two variable values for each individual. Suppose the two variables in your Data Frame that you want to find the difference in are called variablea and variableb. Then the command is

NewData Frame<-Data Frame%>% mutate(difference_variable=variablea-variableb)

Now the command for testing the hypothesis in R is t.test(~difference_variable, alternative="less" or "greater", data=NewData Frame). If your alternative is not equal, then leave off the alternative statement.

Example: Suppose you have the weights of women before and after a weight loss program, and you want to see if the mean weight loss (before minus after) is greater than 0.

Diet <-read.csv("https://krkozak.github.io/MAT160/weight_before_after.csv")
Newdiet<-Diet%>%
mutate(diff=before-after)
t.test(~diff, data=Newdiet, alternative="greater")

## 
##  One Sample t-test
## 
## data:  diff
## t = 16.738, df = 5, p-value = 6.955e-06
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
##  18.91162      Inf
## sample estimates:
## mean of x 
##      21.5
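Testing the differences with a one sample t-test gives the same result as R's built-in paired test, t.test(x, y, paired = TRUE). The sketch below shows both on a small set of weights that are invented for illustration and are not the values in weight_before_after.csv.

```r
# Hypothetical weights in pounds, invented for illustration only
before <- c(165, 172, 181, 158, 190, 176)
after  <- c(150, 160, 165, 148, 170, 162)

# Approach used above: one sample t-test on the differences
diff_test <- t.test(before - after, alternative = "greater")

# Equivalent: paired t-test on the two columns directly
paired_test <- t.test(before, after, paired = TRUE, alternative = "greater")

print(diff_test$statistic)
print(paired_test$statistic)
print(diff_test$p.value)
print(paired_test$p.value)
```

Either way, the order of subtraction matters. With before minus after, a weight loss shows up as a positive difference, so the alternative is "greater".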

Two-sample paired confidence interval:

You need to mutate your Data Frame to include a new variable that finds the difference between two variable values for each individual. Suppose the two variables in your Data Frame that you want to find the difference in are called variablea and variableb. Then the command is

NewData Frame<-Data Frame%>% mutate(difference_variable=variablea-variableb)

Now the command for creating a confidence interval in R is

t.test(~difference_variable, conf.level=C, data=NewData Frame). C must be written as a decimal.

Example: Suppose you want to find a 90% confidence interval for the mean amount of weight women lost from before the weight loss program to after the weight loss program.

Diet <-read.csv("https://krkozak.github.io/MAT160/weight_before_after.csv")
Newdiet<-Diet%>%
mutate(diff=before-after)
t.test(~diff, data=Newdiet, conf.level=0.90)

## 
##  One Sample t-test
## 
## data:  diff
## t = 16.738, df = 5, p-value = 1.391e-05
## alternative hypothesis: true mean is not equal to 0
## 90 percent confidence interval:
##  18.91162 24.08838
## sample estimates:
## mean of x 
##      21.5
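Notice that the lower limit of this 90% interval, 18.91162, is the same number as the lower limit in the one-sided "greater" test shown earlier at 95%. That is expected, since both leave 5% in the lower tail. The sketch below checks this on invented weights and also shows one way to add the difference column without %>% and mutate, assuming those are not available in your session. The name Diet_example and its values are hypothetical.

```r
# Hypothetical data frame, invented for illustration only
Diet_example <- data.frame(
  before = c(165, 172, 181, 158, 190, 176),
  after  = c(150, 160, 165, 148, 170, 162)
)

# Add the difference column using base R
Diet_example$diff <- Diet_example$before - Diet_example$after

# Two-sided 90% confidence interval for the mean difference
ci_90 <- t.test(Diet_example$diff, conf.level = 0.90)$conf.int

# One-sided 95% interval from the "greater" test
ci_one_sided <- t.test(Diet_example$diff, alternative = "greater",
                       conf.level = 0.95)$conf.int

print(as.numeric(ci_90))
print(as.numeric(ci_one_sided))  # same lower limit, upper limit is Inf
```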

Two-sample independent t-test

To run a hypothesis test comparing two populations, you need to figure out which variable is the quantitative variable you are measuring and which variable is the factor. Think of the factor as the qualitative variable that splits the individuals into the two groups that you are saying differ on the quantitative variable. The command is t.test(variable~factor, data=Data Frame, alternative="less" or "greater"). The choice of "less" or "greater" depends on the alternative hypothesis. If the alternative is not equal, which is usually the case, then leave off the alternative statement. Your Data Frame needs to be tidy data, with one column for the quantitative variable and one column for the factor.

Example: Suppose you want to see if the number of calories in beef hot dogs is different from the number of calories in poultry hot dogs. Note: for this Data Frame the quantitative variable is calories and the qualitative variable is type.

Hotdogs <-read.csv("https://krkozak.github.io/MAT160/hotdog_1.csv")
t.test(calories~type, data=Hotdogs)

## 
##  Welch Two Sample t-test
## 
## data:  calories by type
## t = 5.11, df = 34.09, p-value = 1.229e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  22.94024 53.23035
## sample estimates:
##    mean in group Beef mean in group Poultry 
##              156.8500              118.7647
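Two details of this output are worth a closer look. First, R runs the Welch version of the test by default, which does not assume the two groups have equal variances. Adding var.equal = TRUE gives the pooled version instead. Second, the groups are taken in alphabetical order of the factor, so "greater" here means the Beef mean is greater than the Poultry mean. The sketch below illustrates both on calorie counts that are invented for illustration and are not the values in hotdog_1.csv.

```r
# Hypothetical calorie counts, invented for illustration only
Hotdogs_example <- data.frame(
  calories = c(186, 181, 176, 149, 184, 190,
               129, 132, 102, 106, 94, 142),
  type = rep(c("Beef", "Poultry"), each = 6)
)

# Default: Welch test (unequal variances allowed)
welch <- t.test(calories ~ type, data = Hotdogs_example)

# Pooled test, which assumes equal variances
pooled <- t.test(calories ~ type, data = Hotdogs_example, var.equal = TRUE)

# Group order is alphabetical: Beef first, Poultry second
group_order <- levels(factor(Hotdogs_example$type))

# One-sided test of Ha: mean for Beef > mean for Poultry
one_sided <- t.test(calories ~ type, data = Hotdogs_example,
                    alternative = "greater")

print(welch$method)
print(pooled$method)
print(group_order)
print(one_sided$p.value)
```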

Two-sample independent confidence interval

The command is t.test(variable~factor, conf.level=C, data=Data Frame), where C is the confidence level as a decimal.

Example: Suppose you want to calculate a 95% confidence interval for the difference in the mean number of calories in beef hot dogs versus poultry hot dogs. Note: for this Data Frame the quantitative variable is calories and the qualitative variable is type.

Hotdogs <-read.csv("https://krkozak.github.io/MAT160/hotdog_1.csv")
t.test(calories~type, conf.level=0.95, data=Hotdogs)

## 
##  Welch Two Sample t-test
## 
## data:  calories by type
## t = 5.11, df = 34.09, p-value = 1.229e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  22.94024 53.23035
## sample estimates:
##    mean in group Beef mean in group Poultry 
##              156.8500              118.7647
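If your data has one column per group instead of being tidy, it needs to be reshaped before the variable~factor form can be used. One possible way in base R is stack(), sketched below on calorie counts that are invented for illustration. The names wide and tidy are hypothetical.

```r
# Hypothetical data with one column per group (not tidy),
# invented for illustration only
wide <- data.frame(
  Beef    = c(186, 181, 176, 149, 184, 190),
  Poultry = c(129, 132, 102, 106, 94, 142)
)

# stack() puts all values in one column and the group labels in another
tidy <- stack(wide)
names(tidy) <- c("calories", "type")

# 95% confidence interval for mean(Beef) - mean(Poultry)
ci <- t.test(calories ~ type, data = tidy, conf.level = 0.95)$conf.int

print(head(tidy))
print(as.numeric(ci))
```

The interval is for the first group minus the second, so here it estimates how many more calories beef hot dogs have on average than poultry hot dogs.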