Calculations of Inference

One sample proportion test:

The command is

prop.test(x, n, p = value from Ho, alternative = "less" or "greater")

The choice of "less" or "greater" depends on the alternative hypothesis. If your alternative is less than, write "less"; if your alternative is greater than, write "greater"; and if your alternative is not equal, leave off the alternative argument.

Example:

If n = 400, x = 125, and your alternative is that the proportion is greater than 0.22, then

prop.test(125, 400, p = 0.22, alternative = "greater") 
## 
##  1-sample proportions test with continuity correction
## 
## data:  125 out of 400
## X-squared = 19.409, df = 1, p-value = 5.275e-06
## alternative hypothesis: true p is greater than 0.22
## 95 percent confidence interval:
##  0.2745463 1.0000000
## sample estimates:
##      p 
## 0.3125

One sample proportion confidence interval

The command is prop.test(x, n, conf.level = C), where C is the confidence level as a decimal.

Example:

If you want to find a 90% confidence interval for the population proportion from a sample with n = 500 and x = 125:

prop.test(125, 500, conf.level=0.90) 
## 
##  1-sample proportions test with continuity correction
## 
## data:  125 out of 500
## X-squared = 124, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 90 percent confidence interval:
##  0.2185980 0.2841772
## sample estimates:
##    p 
## 0.25

One-sample t-test

For a one sample t-test, the command is

t.test(~variable, mu = value from Ho, alternative = "less" or "greater", data = Dataset)

The choice of "less" or "greater" depends on the alternative hypothesis. If your alternative is less than, use "less"; if your alternative is greater than, use "greater". If your alternative is not equal, leave off the alternative argument.

Example:

Suppose you want to see if the emission levels of a new engine are less than the national standard of 20, using the dataset Engine. The command would be:

t.test(~emission, data=Engine, alternative="less", mu=20)
## 
##  One Sample t-test
## 
## data:  emission
## t = -3.0016, df = 9, p-value = 0.007458
## alternative hypothesis: true mean is less than 20
## 95 percent confidence interval:
##      -Inf 18.89829
## sample estimates:
## mean of x 
##     17.17

One sample t confidence interval

The command is t.test(~variable, data = Dataset, conf.level = C), where C is the confidence level as a decimal.

Example:

Find the 95% confidence interval for the amount of emission produced by the new engine being developed using the dataset Engine.

t.test(~emission, data=Engine, conf.level=.95)
## 
##  One Sample t-test
## 
## data:  emission
## t = 18.211, df = 9, p-value = 2.071e-08
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  15.0372 19.3028
## sample estimates:
## mean of x 
##     17.17

Two-sample Proportion Test

The R command is

prop.test(x = c(x1, x2), n = c(n1, n2), alternative = "less" or "greater")

The choice of "less" or "greater" depends on the alternative hypothesis. If your alternative is not equal, leave off the alternative argument.

Example:

You want to see if proportion 1 is greater than proportion 2, using samples with n1 = 1000, x1 = 231, n2 = 1200, and x2 = 176.

prop.test(x=c(231, 176), n=c(1000,1200), alternative = "greater") 
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(231, 176) out of c(1000, 1200)
## X-squared = 25.173, df = 1, p-value = 2.621e-07
## alternative hypothesis: greater
## 95 percent confidence interval:
##  0.05579805 1.00000000
## sample estimates:
##    prop 1    prop 2 
## 0.2310000 0.1466667

If you want a confidence interval, then replace the alternative = "greater" argument with conf.level = C, where C is the confidence level as a decimal.
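
For example, a 95% confidence interval for the difference between the two proportions from the sample above could be found with the command below (a sketch of the command only; output not shown):

prop.test(x = c(231, 176), n = c(1000, 1200), conf.level = 0.95)  # same counts as the example above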

Two-sample paired test:

You need to mutate your Dataset to include a new variable that finds the difference between two variable values for each individual. Suppose the two variables in your dataset that you want to find the difference in are called variablea and variableb. Then the command is

Dataset <-
  Dataset |>
  mutate(difference_variable = variablea - variableb)

Now the command for testing the hypothesis in R is t.test(~difference_variable, alternative = "less" or "greater", data = Dataset). If your alternative is not equal, leave off the alternative argument.

Example:

Suppose the dataset Diet contains each woman's weight before and after a weight loss program, and you want to test whether the mean weight loss (before minus after) is greater than zero. First mutate the dataset.

Diet<-
  Diet|>
  mutate(difference=before-after)

To compute the test statistic and p-value:

t.test(~difference, data=Diet, alternative="greater") 
## 
##  One Sample t-test
## 
## data:  difference
## t = 16.738, df = 5, p-value = 6.955e-06
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
##  18.91162      Inf
## sample estimates:
## mean of x 
##      21.5

Two-sample paired confidence interval:

You need to mutate your Dataset to include a new variable that finds the difference between two variable values for each individual. Suppose the two variables in your dataset that you want to find the difference in are called variablea and variableb. Then the command is

Dataset <-
  Dataset |>
  mutate(difference_variable = variablea - variableb)

Now the command for creating a confidence interval in R is

t.test(~difference_variable, conf.level = C, data = Dataset), where C is the confidence level as a decimal.

Example

Suppose you want to find a 90% confidence interval for the amount of weight a woman lost from before the weight loss program to after, using the Diet dataset. The difference variable was already created in the last example.

t.test(~difference, data=Diet, conf.level=0.90)
## 
##  One Sample t-test
## 
## data:  difference
## t = 16.738, df = 5, p-value = 1.391e-05
## alternative hypothesis: true mean is not equal to 0
## 90 percent confidence interval:
##  18.91162 24.08838
## sample estimates:
## mean of x 
##      21.5

Two-Sample Independent t-test

To run a hypothesis test comparing two populations, you need to identify the quantitative variable and the categorical variable (the factor) that splits the individuals into the two groups you are comparing. Think of the factor as the categorical variable whose groups you are claiming differ on the quantitative variable. The command is:

t.test(quantitative_variable ~ categorical_variable, data = Dataset, alternative = "less" or "greater")

The choice of "less" or "greater" depends on the alternative hypothesis. If the alternative is not equal, leave off the alternative argument. Your dataset needs to be tidy data.

Example:

Suppose you want to see if the number of calories in beef hot dogs is different from the number in poultry hot dogs, using the dataset Hotdogs. Note: for this dataset, the quantitative variable is calories and the categorical variable is type.

t.test(calories~type, data=Hotdogs)
## 
##  Welch Two Sample t-test
## 
## data:  calories by type
## t = 5.11, df = 34.09, p-value = 1.229e-05
## alternative hypothesis: true difference in means between group Beef and group Poultry is not equal to 0
## 95 percent confidence interval:
##  22.94024 53.23035
## sample estimates:
##    mean in group Beef mean in group Poultry 
##              156.8500              118.7647

Note: R may not put the groups in the order you expect. To check, look at which group R lists first in the output (here, group Beef). If you think beef has more calories than poultry but Beef is not the first group, use alternative = "less" instead of "greater". A quick way to check the group order before running the test is sketched below.
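
A minimal sketch for checking the group order, assuming type is stored as text or as a factor in the Hotdogs dataset:

levels(factor(Hotdogs$type))  # the first level listed is the group R treats as the first group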

Two-sample independent confidence interval

The command is t.test(quantitative_variable ~ categorical_variable, conf.level = C, data = Dataset), where C is the confidence level as a decimal.

Example:

Suppose you want to calculate a 95% confidence interval for the difference in the mean number of calories in beef hot dogs versus poultry hot dogs. Note: for the Hotdogs dataset, the quantitative variable is calories and the categorical variable is type.

t.test(calories~type, conf.level=0.95, data=Hotdogs)
## 
##  Welch Two Sample t-test
## 
## data:  calories by type
## t = 5.11, df = 34.09, p-value = 1.229e-05
## alternative hypothesis: true difference in means between group Beef and group Poultry is not equal to 0
## 95 percent confidence interval:
##  22.94024 53.23035
## sample estimates:
##    mean in group Beef mean in group Poultry 
##              156.8500              118.7647

Note: R may not put the groups in the order you expect. For the confidence interval, the difference is the mean of the first group minus the mean of the second group (here, Beef minus Poultry), so check the group order in the output when interpreting the sign of the interval. If you want a particular group to come first, one option is sketched below.
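
A minimal sketch for making Beef the first group before running t.test, assuming the Hotdogs dataset has a type column that contains the value "Beef":

# Hypothetical relevel step: convert type to a factor and make "Beef" the first level
Hotdogs <-
  Hotdogs |>
  mutate(type = relevel(factor(type), ref = "Beef"))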

Chi-squared test for Independence

This test answers the question of whether two categorical variables are dependent or not. The calculation can be done in two different ways.

Way 1: You have a dataset:

cross_tabulation <- tally(variable1 ~ variable2, data = Dataset)

You can call the result anything you want; it does not have to be cross_tabulation.

Calculations:

chisq.test(cross_tabulation)

chisq.test(cross_tabulation)$expected - this lets you see the expected values so you can check that they are all more than 5. A sketch with hypothetical variable names follows.
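
For instance, if a dataset called Survey (a hypothetical name, not part of this handout) had two categorical variables called opinion and gender, Way 1 might look like this sketch:

cross_tabulation <- tally(opinion ~ gender, data = Survey)   # hypothetical dataset and variables
chisq.test(cross_tabulation)            # chi-squared test for independence
chisq.test(cross_tabulation)$expected   # expected counts (check that all are more than 5)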

Way 2: You have a contingency table:

row1 = c(data from row 1 separated by commas)
row2 = c(data from row 2 separated by commas)
Keep going until you have all of your rows typed in.
data.table = rbind(row1, row2, ...) - makes the data into a table, where the ... represents the rest of your rows. You can call it whatever you want; it does not have to be data.table.
data.table - use this if you want to look at the table
chisq.test(data.table) - calculates the chi-squared test for independence

chisq.test(data.table)$expected - this lets you see the expected values so you can check that they are all more than 5.

Example:

Is there a relationship between autism and breastfeeding? To determine if there is, a researcher asked mothers of autistic and non-autistic children to say what time period they breastfed their children. The data is in the following contingency table.

Autism and Breast Feeding Incidents

Autism   Not Breast Fed   Breast Fed less than 2 months   Breast Fed 2 to 6 months   Breast Fed more than 6 months
yes      241              198                             164                        215
no       20               25                              27                         44

yes = c(241, 198, 164, 215)
no = c(20, 25, 27, 44)
data.table = rbind(yes, no) 
data.table 
##     [,1] [,2] [,3] [,4]
## yes  241  198  164  215
## no    20   25   27   44
chisq.test(data.table)
## 
##  Pearson's Chi-squared test
## 
## data:  data.table
## X-squared = 11.217, df = 3, p-value = 0.01061
chisq.test(data.table)$expected
##          [,1]      [,2]      [,3]      [,4]
## yes 228.58458 195.30407 167.27837 226.83298
## no   32.41542  27.69593  23.72163  32.16702