---
title: "Inference"
output: html_document
---

Inference with R

One-sample proportion test:

The command is prop.test(x, n, p = the value from Ho, alternative = "less" or "greater"), where x is the number of successes and n is the sample size. The choice of "less" or "greater" depends on the alternative hypothesis. If your alternative is less than, write "less". If your alternative is greater than, write "greater". If your alternative is not equal, leave off the alternative statement, and R will run a two-sided test.

Example: If n = 400 and x = 125 and your alternative is greater than 0.22, then

prop.test(125, 400, p = 0.22, alternative = "greater")

## 
##  1-sample proportions test with continuity correction
## 
## data:  125 out of 400
## X-squared = 19.409, df = 1, p-value = 5.275e-06
## alternative hypothesis: true p is greater than 0.22
## 95 percent confidence interval:
##  0.2745463 1.0000000
## sample estimates:
##      p 
## 0.3125
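To check what `prop.test()` reports, the sketch below reproduces the example's X-squared statistic and one-sided p-value by hand. It assumes the default continuity correction (`correct = TRUE`). Under that correction the statistic is the squared continuity-corrected z-score, (|x - n p0| - 0.5)^2 / (n p0 (1 - p0)), and for a "greater" alternative the p-value is the upper normal tail of the signed square root. The variable names are illustrative only.

```r
# One-sample proportion test by hand (continuity-corrected), checked against prop.test()
x  <- 125    # number of successes
n  <- 400    # sample size
p0 <- 0.22   # value of p under Ho

yates     <- min(0.5, abs(x - n * p0))                    # continuity correction as prop.test() caps it
x_squared <- (abs(x - n * p0) - yates)^2 / (n * p0 * (1 - p0))

# alternative = "greater": signed square root of the statistic, then the upper normal tail
z       <- sign(x / n - p0) * sqrt(x_squared)
p_value <- pnorm(z, lower.tail = FALSE)

result <- prop.test(x, n, p = p0, alternative = "greater")
c(by_hand = x_squared, prop_test = unname(result$statistic))
c(by_hand = p_value, prop_test = result$p.value)
```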

One-sample t-test

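For a one-sample t-test, the command is t.test(~variable, mu = the value from Ho, alternative = "less" or "greater", data = Dataset). The choice of "less" or "greater" depends on the alternative hypothesis. If your alternative is less than, use "less". If your alternative is greater than, use "greater". If your alternative is not equal, leave off the alternative statement. The one-sided formula ~variable comes from the mosaic package, so load it with library(mosaic) first; base R's t.test() does not accept the ~variable form.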

Example: If you want to see whether the emission levels of a new engine are less than the national standard of 20, then

library(mosaic)   # provides the t.test(~variable, data = ...) formula form
Engine <- read.csv("https://krkozak.github.io/MAT160/engine.csv")
t.test(~emission, data = Engine, alternative = "less", mu = 20)

## 
##  One Sample t-test
## 
## data:  emission
## t = -3.0016, df = 9, p-value = 0.007458
## alternative hypothesis: true mean is less than 20
## 95 percent confidence interval:
##      -Inf 18.89829
## sample estimates:
## mean of x 
##     17.17
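The t statistic in this output is (x̄ - μ0) / (s / √n) with n - 1 degrees of freedom, and for a "less" alternative the p-value is the lower tail of the t distribution. The engine data are not bundled with R, so this sketch uses the built-in `mtcars$mpg` column purely as stand-in data (an assumption for illustration, not the emission values). It compares the hand calculation with base R's `t.test()`, which takes a numeric vector instead of the `~variable` formula.

```r
# One-sample t-test by hand, using mtcars$mpg as stand-in data (not the engine data)
x   <- mtcars$mpg
mu0 <- 20                          # hypothetical Ho value for this stand-in data

n      <- length(x)
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))
dof    <- n - 1
p_less <- pt(t_stat, dof)          # lower tail, for alternative = "less"

# Base R equivalent: pass the numeric vector directly instead of a formula
result <- t.test(x, mu = mu0, alternative = "less")
c(by_hand = t_stat, t_test = unname(result$statistic))
c(by_hand = p_less, t_test = result$p.value)
```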

One-sample proportion confidence interval

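The command is prop.test(x, n, conf.level = C), where C is the confidence level written as a decimal. R also reports a test against its default null value p = 0.5; you can ignore that p-value when you only want the interval.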

Example: If you want to find a 90% confidence interval for the population proportion from a sample with n = 500 and x = 125, then

prop.test(125, 500, conf.level=0.90)

## 
##  1-sample proportions test with continuity correction
## 
## data:  125 out of 500
## X-squared = 124, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 90 percent confidence interval:
##  0.2185980 0.2841772
## sample estimates:
##    p 
## 0.25
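The interval from `prop.test()` is not the textbook p̂ ± z·√(p̂(1 - p̂)/n) interval. It comes from inverting the score test, which gives a Wilson score interval with a continuity correction. The sketch below rebuilds the 90% interval from the example; `wilson_cc()` is a hypothetical helper written to follow the same steps as `prop.test()` for one proportion with the default `correct = TRUE`. For contrast, it also computes the simple (Wald) interval, which comes out slightly different.

```r
# Hypothetical helper: Wilson score interval with continuity correction for one
# proportion, following the same steps prop.test() uses for its interval
wilson_cc <- function(x, n, conf.level = 0.95, p0 = 0.5) {
  z     <- qnorm((1 + conf.level) / 2)      # two-sided critical value
  p_hat <- x / n
  yates <- min(0.5, abs(x - n * p0))        # prop.test() caps the correction
  z22n  <- z^2 / (2 * n)

  p_c   <- p_hat + yates / n                # corrected estimate for the upper bound
  upper <- if (p_c >= 1) {
    1
  } else {
    (p_c + z22n + z * sqrt(p_c * (1 - p_c) / n + z22n / (2 * n))) / (1 + 2 * z22n)
  }

  p_c   <- p_hat - yates / n                # corrected estimate for the lower bound
  lower <- if (p_c <= 0) {
    0
  } else {
    (p_c + z22n - z * sqrt(p_c * (1 - p_c) / n + z22n / (2 * n))) / (1 + 2 * z22n)
  }

  c(lower, upper)
}

by_hand <- wilson_cc(125, 500, conf.level = 0.90)
from_R  <- as.numeric(prop.test(125, 500, conf.level = 0.90)$conf.int)
rbind(by_hand, from_R)

# For contrast: the simple Wald interval p_hat +/- z * SE
p_hat <- 125 / 500
wald  <- p_hat + c(-1, 1) * qnorm(0.95) * sqrt(p_hat * (1 - p_hat) / 500)
wald
```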

One-sample t confidence interval

The command is t.test(~variable, data = Dataset, conf.level = C), where C is the confidence level written as a decimal.

Example: Find the 95% confidence interval for the mean amount of emission produced by the new engine being developed.

Engine <-read.csv("https://krkozak.github.io/MAT160/engine.csv")
t.test(~emission, data=Engine, conf.level=.95)

## 
##  One Sample t-test
## 
## data:  emission
## t = 18.211, df = 9, p-value = 2.071e-08
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  15.0372 19.3028
## sample estimates:
## mean of x 
##     17.17
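The interval is x̄ ± t*·s/√n, where t* is the t critical value with n - 1 degrees of freedom. Because this output tests μ = 0, its t statistic is just x̄ divided by the standard error. The sketch below uses that to recover the standard error from the printed numbers (mean 17.17, t = 18.211, df = 9) and rebuilds the interval with `qt()`; any tiny differences from R's interval come from rounding in the printed t value.

```r
# Rebuild the 95% t interval from the printed output (that test used mu = 0)
x_bar  <- 17.17
t_stat <- 18.211
dof    <- 9

se     <- x_bar / t_stat          # t = (x_bar - 0) / se, so se = x_bar / t
t_crit <- qt(0.975, dof)          # 95% two-sided critical value
ci     <- x_bar + c(-1, 1) * t_crit * se
ci
```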

Two-sample proportion test

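The R command is prop.test(x = c(x1, x2), n = c(n1, n2), alternative = "less" or "greater"), where x1 and x2 are the numbers of successes and n1 and n2 are the sample sizes. The choice of "less" or "greater" depends on the alternative hypothesis, which compares proportion 1 to proportion 2. If your alternative is not equal, then leave off the alternative statement.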

Example: You want to see if proportion 1 is greater than proportion 2, using samples with n1 = 1000, n2 = 1200, x1 = 231, and x2 = 176.

prop.test(x=c(231, 176), n=c(1000,1200), alternative = "greater")

## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  c(231, 176) out of c(1000, 1200)
## X-squared = 25.173, df = 1, p-value = 2.621e-07
## alternative hypothesis: greater
## 95 percent confidence interval:
##  0.05579805 1.00000000
## sample estimates:
##    prop 1    prop 2 
## 0.2310000 0.1466667
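For two proportions, the X-squared value is the same statistic as a chi-squared test on the 2 × 2 table of successes and failures, using the same continuity correction. The sketch below builds that table from the example's counts and compares the two statistics. `chisq.test()` only reports a two-sided p-value, so the sketch compares the statistic alone; the table layout and names are illustrative.

```r
# 2 x 2 table: rows are the two samples, columns are successes and failures
tab <- matrix(c(231, 1000 - 231,
                176, 1200 - 176),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("sample 1", "sample 2"), c("success", "failure")))

chi   <- chisq.test(tab)   # continuity correction is on by default for 2 x 2 tables
ptest <- prop.test(x = c(231, 176), n = c(1000, 1200), alternative = "greater")

# Same X-squared statistic from both functions
c(chisq_test = unname(chi$statistic), prop_test = unname(ptest$statistic))

# Observed difference p1 - p2 that the one-sided test is about
diff_hat <- 231 / 1000 - 176 / 1200
diff_hat
```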

If you want a confidence interval, then replace the alternative = "greater" statement with conf.level = C, where C is the confidence level written as a decimal.
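Applying that substitution to the same counts gives a 95% interval for p1 - p2. The sketch below pulls the bounds out of the returned object with `$conf.int`. With these counts the interval should lie entirely above 0, which agrees with the one-sided test's conclusion that proportion 1 is larger.

```r
# 95% confidence interval for p1 - p2: conf.level takes the place of the alternative argument
ci_result <- prop.test(x = c(231, 176), n = c(1000, 1200), conf.level = 0.95)
ci_result$conf.int   # lower and upper bounds for p1 - p2
ci_result$estimate   # the two sample proportions
```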

Two-sample paired test:

You need to mutate your Dataset to include a new variable that holds the difference between the two values for each individual. Suppose the two variables in your dataset that you want the difference of are called variablea and variableb. Then the command is

Newdataset <- Dataset %>% mutate(difference_variable = variablea - variableb)

Now the command for testing the hypothesis in R is t.test(~difference_variable, alternative = "less" or "greater", data = Newdataset). If your alternative is not equal, then leave off the alternative statement.

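Example: Suppose you have the weight of each woman before and after a weight loss program, and you want to see if the program reduces weight. Since diff = before - after, a weight loss means diff is greater than 0, so the alternative is "greater".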

Diet <-read.csv("https://krkozak.github.io/MAT160/weight_before_after.csv")
Newdiet <- Diet %>%
  mutate(diff = before - after)
t.test(~diff, data=Newdiet, alternative="greater")

## 
##  One Sample t-test
## 
## data:  diff
## t = 16.738, df = 5, p-value = 6.955e-06
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
##  18.91162      Inf
## sample estimates:
## mean of x 
##      21.5
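Taking differences and then running a one-sample t-test is exactly what a paired t-test does. The weight data are not bundled with R, so this sketch uses the built-in `sleep` data as stand-in data (an assumption for illustration): it records extra hours of sleep for the same 10 patients under two drugs, with rows ordered by patient within each group. Base R's `t.test(..., paired = TRUE)` should give the same t statistic and p-value as the one-sample test on the differences.

```r
# Paired t-test two ways, using the built-in sleep data as stand-in data
drug1 <- sleep$extra[sleep$group == "1"]
drug2 <- sleep$extra[sleep$group == "2"]   # same patients, same order (ID 1 to 10)

# 1) Difference first, then a one-sample t-test on the differences
d <- drug2 - drug1
one_sample <- t.test(d, alternative = "greater")

# 2) Base R's paired test on the two columns
paired <- t.test(drug2, drug1, paired = TRUE, alternative = "greater")

c(one_sample = unname(one_sample$statistic), paired = unname(paired$statistic))
```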

Two-sample paired confidence interval:

You need to mutate your Dataset to include a new variable that holds the difference between the two values for each individual. Suppose the two variables in your dataset that you want the difference of are called variablea and variableb. Then the command is

Newdataset <- Dataset %>% mutate(difference_variable = variablea - variableb)

Now the command for creating a confidence interval in R is

t.test(~difference_variable, conf.level = C, data = Newdataset), where C is the confidence level written as a decimal.

Example: Suppose you want to find a 90% confidence interval for the mean amount of weight a woman loses from before the weight loss program to after it.

Diet <-read.csv("https://krkozak.github.io/MAT160/weight_before_after.csv")
Newdiet <- Diet %>%
  mutate(diff = before - after)
t.test(~diff, data=Newdiet, conf.level=0.90)

## 
##  One Sample t-test
## 
## data:  diff
## t = 16.738, df = 5, p-value = 1.391e-05
## alternative hypothesis: true mean is not equal to 0
## 90 percent confidence interval:
##  18.91162 24.08838
## sample estimates:
## mean of x 
##      21.5
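The lower bound of this 90% interval (18.91162) is the same as the lower bound printed by the one-sided "greater" test in the paired test section. That is expected: a two-sided 90% interval leaves 5% in each tail, so its lower end equals a one-sided 95% lower bound. The sketch below checks this with the built-in `sleep` differences as stand-in data (an assumption, since the weight data are not bundled with R).

```r
# Differences for the same 10 patients (built-in sleep data, stand-in for before - after)
d <- sleep$extra[sleep$group == "2"] - sleep$extra[sleep$group == "1"]

two_sided_90 <- t.test(d, conf.level = 0.90)$conf.int
one_sided_95 <- t.test(d, alternative = "greater", conf.level = 0.95)$conf.int

c(two_sided_90_lower = two_sided_90[1], one_sided_95_lower = one_sided_95[1])
```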

Two-sample independent t-test

To test a hypothesis that compares the means of two independent populations, you need two variables: the quantitative variable whose means you are comparing, and the qualitative variable (the factor) that records which population each individual belongs to. The command is t.test(variable ~ factor, data = Dataset, alternative = "less" or "greater"). The choice of "less" or "greater" depends on the alternative hypothesis. If the alternative is not equal, then leave off the alternative statement. Your dataset needs to be tidy data, with one column for the quantitative variable and one column for the factor.

Example: Suppose you want to see if the mean amount of calories in beef hot dogs is greater than the mean in poultry hot dogs. For this dataset, the quantitative variable is calories and the qualitative variable is type.

Hotdogs <-read.csv("https://krkozak.github.io/MAT160/hotdog_1.csv")
t.test(calories~type, alternative="greater", data=Hotdogs)

## 
##  Welch Two Sample t-test
## 
## data:  calories by type
## t = 5.11, df = 34.09, p-value = 6.143e-06
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  25.4836     Inf
## sample estimates:
##    mean in group Beef mean in group Poultry 
##              156.8500              118.7647
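"Welch" in the output means R did not assume the two populations have equal variances; that is the default (`var.equal = FALSE`), and setting `var.equal = TRUE` gives the pooled test instead. The Welch statistic and its fractional degrees of freedom (34.09 in the hot dog output) come from the formulas below. The hot dog data are not bundled with R, so this sketch uses the built-in `mtcars` data (mpg by transmission type `am`) as stand-in data.

```r
# Welch two-sample t statistic and degrees of freedom by hand (mtcars as stand-in data)
g1 <- mtcars$mpg[mtcars$am == 0]   # first group: am = 0 (automatic)
g2 <- mtcars$mpg[mtcars$am == 1]   # second group: am = 1 (manual)

v1 <- var(g1) / length(g1)          # squared standard error of each group mean
v2 <- var(g2) / length(g2)

t_stat <- (mean(g1) - mean(g2)) / sqrt(v1 + v2)
dof    <- (v1 + v2)^2 / (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))

result <- t.test(mpg ~ am, data = mtcars)   # Welch test by default
c(t_by_hand = t_stat, t_test = unname(result$statistic))
c(df_by_hand = dof, t_test = unname(result$parameter))
```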

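Note: R subtracts the mean of the second group from the mean of the first group, and it orders the groups alphabetically (or by the factor levels, if the variable is already a factor), not by the order in which they appear in your dataset. Check the order on the "sample estimates" line of the output. In the hot dog output above, Beef is listed first, so alternative = "greater" tests whether beef hot dogs have more calories than poultry hot dogs. If the group you expect to be larger is listed second, use alternative = "less" instead.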
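To see which group R will treat as first, or to change it, you can work with the factor levels directly. The grouping variable is converted with `factor()`, whose levels are sorted alphabetically unless you set them. The sketch below first shows that default order on a hypothetical vector of labels. It then uses the built-in `ToothGrowth` data (supplement type `supp`) as stand-in data to show that reversing the levels flips the sign of t, which is why "greater" and "less" swap meaning.

```r
# Default level order is alphabetical (hypothetical labels for illustration)
levels(factor(c("Poultry", "Beef", "Poultry")))   # returns c("Beef", "Poultry")

# Stand-in data: ToothGrowth$supp is a factor with levels "OJ" then "VC"
tg <- ToothGrowth
default_order <- t.test(len ~ supp, data = tg)            # mean(OJ) - mean(VC)

tg$supp <- factor(tg$supp, levels = c("VC", "OJ"))         # put VC first
reversed_order <- t.test(len ~ supp, data = tg)           # mean(VC) - mean(OJ)

c(default = unname(default_order$statistic), reversed = unname(reversed_order$statistic))
```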

Two-sample independent confidence interval

The command is t.test(variable ~ factor, conf.level = C, data = Dataset), where C is the confidence level written as a decimal.

Example: Suppose you want to calculate a 95% confidence interval for the difference in the mean amount of calories in beef hot dogs versus poultry hot dogs. For this dataset, the quantitative variable is calories and the qualitative variable is type.

Hotdogs <-read.csv("https://krkozak.github.io/MAT160/hotdog_1.csv")
t.test(calories~type, conf.level=0.95, data=Hotdogs)

## 
##  Welch Two Sample t-test
## 
## data:  calories by type
## t = 5.11, df = 34.09, p-value = 1.229e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  22.94024 53.23035
## sample estimates:
##    mean in group Beef mean in group Poultry 
##              156.8500              118.7647
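The Welch interval is (x̄1 - x̄2) ± t*·SE, using the fractional degrees of freedom. Because this output tests a difference of 0, the standard error equals the difference in means divided by t. The sketch below uses that to rebuild the interval from the printed numbers (means 156.8500 and 118.7647, t = 5.11, df = 34.09); rounding in those printed values accounts for any tiny discrepancies.

```r
# Rebuild the 95% Welch interval from the printed output (that test used a difference of 0)
diff_means <- 156.8500 - 118.7647     # Beef minus Poultry
t_stat     <- 5.11
dof        <- 34.09

se     <- diff_means / t_stat           # t = (difference - 0) / se
t_crit <- qt(0.975, dof)                # qt() accepts non-integer degrees of freedom
ci     <- diff_means + c(-1, 1) * t_crit * se
ci
```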