aragorn = rnorm(50, mean = 180, sd = 10)
gimli = rnorm(50, mean = 132, sd = 15)
legolas = rnorm(50, mean = 195, sd = 15)
t.test(legolas, aragorn, alternative = "two.sided")
##
## Welch Two Sample t-test
##
## data: legolas and aragorn
## t = 6.6858, df = 87.098, p-value = 2.099e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 12.01283 22.17695
## sample estimates:
## mean of x mean of y
## 195.6145 178.5196
t.test(legolas, gimli, alternative = "two.sided")
##
## Welch Two Sample t-test
##
## data: legolas and gimli
## t = 20.826, df = 97.721, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 57.61429 69.75091
## sample estimates:
## mean of x mean of y
## 195.6145 131.9319
Both p-values are far below 0.05 (2.1e-09 and < 2.2e-16), so there is strong evidence that Legolas's mean height differs from that of both Aragorn and Gimli.
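To make the reported `t` and `df` less of a black box, here is a minimal sketch of how a Welch two-sample t-test can be computed by hand. The seed is an assumption added for reproducibility (the original code sets none), so these numbers will not match the output above.

```r
# Hypothetical seed: the original analysis does not set one.
set.seed(42)
legolas <- rnorm(50, mean = 195, sd = 15)
aragorn <- rnorm(50, mean = 180, sd = 10)

n_x <- length(legolas)
n_y <- length(aragorn)

# Squared standard error of each sample mean
se2_x <- var(legolas) / n_x
se2_y <- var(aragorn) / n_y

# Welch t statistic: difference in means over the combined standard error
t_stat <- (mean(legolas) - mean(aragorn)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (se2_x + se2_y)^2 / (se2_x^2 / (n_x - 1) + se2_y^2 / (n_y - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df)

# Compare with the built-in test
res <- t.test(legolas, aragorn, alternative = "two.sided")
c(by_hand = t_stat, t.test = unname(res$statistic))
```

The non-integer degrees of freedom in the output (for example 87.098) come from this approximation, which does not assume equal variances.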
var.test(gimli, legolas)
##
## F test to compare two variances
##
## data: gimli and legolas
## F = 1.1128, num df = 49, denom df = 49, p-value = 0.7098
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.6314953 1.9609876
## sample estimates:
## ratio of variances
## 1.112814
The p-value is high (p = 0.7098), so there is no evidence that the variances of Gimli's and Legolas's heights differ.
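As a rough check on the F test, the two-sided p-value can be recovered from the reported variance ratio and degrees of freedom alone. This is a sketch that copies `F` and the degrees of freedom from the output above; the variable names are illustrative.

```r
# Values copied from the var.test output above
f_stat <- 1.112814  # ratio of sample variances (gimli / legolas)
df_num <- 49        # numerator degrees of freedom (n - 1)
df_den <- 49        # denominator degrees of freedom (n - 1)

# Two-sided p-value: twice the smaller tail of the F distribution
upper_tail <- pf(f_stat, df_num, df_den, lower.tail = FALSE)
p_value <- 2 * min(upper_tail, 1 - upper_tail)

round(p_value, 4)
```

The result should agree, up to rounding, with the p-value printed by `var.test`.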
library(magrittr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
iris = read.csv("iris.csv")
iris %>%
  group_by(Species)
## # A tibble: 150 x 6
## # Groups: Species [3]
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species Code
## <dbl> <dbl> <dbl> <dbl> <chr> <int>
## 1 5.1 3.5 1.4 0.2 setosa 1
## 2 4.9 3 1.4 0.2 setosa 1
## 3 4.7 3.2 1.3 0.2 setosa 1
## 4 4.6 3.1 1.5 0.2 setosa 1
## 5 5 3.6 1.4 0.2 setosa 1
## 6 5.4 3.9 1.7 0.4 setosa 1
## 7 4.6 3.4 1.4 0.3 setosa 1
## 8 5 3.4 1.5 0.2 setosa 1
## 9 4.4 2.9 1.4 0.2 setosa 1
## 10 4.9 3.1 1.5 0.1 setosa 1
## # … with 140 more rows
setosa = iris %>%
  filter(Species == "setosa")
versicolor = iris %>%
  filter(Species == "versicolor")
virginica = iris %>%
  filter(Species == "virginica")
cor.test(setosa$Sepal.Length, setosa$Sepal.Width)
##
## Pearson's product-moment correlation
##
## data: setosa$Sepal.Length and setosa$Sepal.Width
## t = 7.6807, df = 48, p-value = 6.71e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5851391 0.8460314
## sample estimates:
## cor
## 0.7425467
cor.test(versicolor$Sepal.Length, versicolor$Sepal.Width)
##
## Pearson's product-moment correlation
##
## data: versicolor$Sepal.Length and versicolor$Sepal.Width
## t = 4.2839, df = 48, p-value = 8.772e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2900175 0.7015599
## sample estimates:
## cor
## 0.5259107
cor.test(virginica$Sepal.Length, virginica$Sepal.Width)
##
## Pearson's product-moment correlation
##
## data: virginica$Sepal.Length and virginica$Sepal.Width
## t = 3.5619, df = 48, p-value = 0.0008435
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2049657 0.6525292
## sample estimates:
## cor
## 0.4572278
The low p-values for all three species indicate a significant positive correlation between sepal length and sepal width within each species.
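The three per-species correlations can also be computed in one pass rather than from three separate data frames. The sketch below uses base R and the built-in `datasets::iris`, on the assumption that it holds the same measurements as `iris.csv` (which has an extra `Code` column).

```r
# Assumption: the built-in dataset matches the measurements in iris.csv
flowers <- datasets::iris

# Split the rows by species, then correlate the two sepal columns per group
by_species <- split(flowers, flowers$Species)
cors <- sapply(by_species, function(d) cor(d$Sepal.Length, d$Sepal.Width))

round(cors, 4)
```

`cor` returns only the estimate; `cor.test` is still needed for the p-value and confidence interval.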
deer = read.csv("deer.csv")
str(deer)
## 'data.frame': 1182 obs. of 9 variables:
## $ Farm : chr "AL" "AL" "AL" "AL" ...
## $ Month : int 10 10 10 10 10 10 10 10 10 10 ...
## $ Year : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Sex : int 1 1 1 1 1 1 1 1 1 1 ...
## $ clas1_4: int 4 4 3 4 4 4 4 4 4 4 ...
## $ LCT : num 191 180 192 196 204 190 196 200 197 208 ...
## $ KFI : num 20.4 16.4 15.9 17.3 NA ...
## $ Ecervi : num 0 0 2.38 0 0 0 1.21 0 0.8 0 ...
## $ Tb : int 0 0 0 0 NA 0 NA 1 0 0 ...
table(deer$Month)
##
## 1 2 3 4 5 6 7 8 9 10 11 12
## 256 165 27 3 2 35 11 19 58 168 189 188
chisq.test(table(deer$Month))
##
## Chi-squared test for given probabilities
##
## data: table(deer$Month)
## X-squared = 997.07, df = 11, p-value < 2.2e-16
Since the p-value for this test is very low, the number of deer caught differs significantly between months: the counts are not consistent with an equal number per month.
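With a single table and no other arguments, `chisq.test` runs a goodness-of-fit test against equal probabilities for every category. A minimal sketch of that calculation, using the monthly counts copied from the table above:

```r
# Monthly counts copied from table(deer$Month), January to December
observed <- c(256, 165, 27, 3, 2, 35, 11, 19, 58, 168, 189, 188)

# Under the null hypothesis every month has the same expected count
expected <- rep(sum(observed) / length(observed), length(observed))

# Pearson chi-squared statistic and its degrees of freedom
x_squared <- sum((observed - expected)^2 / expected)
df <- length(observed) - 1

# Upper-tail p-value
p_value <- pchisq(x_squared, df, lower.tail = FALSE)

round(x_squared, 2)
```

The statistic should match the `X-squared` value in the output, with 11 degrees of freedom.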
table(deer$Tb, deer$Farm)
##
## AL AU BA BE CB CRC HB LCV LN MAN MB MO NC NV PA PN QM RF RN
## 0 10 23 67 7 88 4 22 0 28 27 16 186 24 18 11 39 67 23 21
## 1 3 0 5 0 3 0 1 1 6 24 5 31 4 1 0 0 7 1 0
##
## RO SAL SAU SE TI TN VISO VY
## 0 31 0 3 16 9 16 13 15
## 1 0 1 0 10 0 2 1 4
chisq.test(table(deer$Tb, deer$Farm))
## Warning in chisq.test(table(deer$Tb, deer$Farm)): Chi-squared approximation may
## be incorrect
##
## Pearson's Chi-squared test
##
## data: table(deer$Tb, deer$Farm)
## X-squared = 129.09, df = 26, p-value = 1.243e-15
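Since the p-value is so low, it is unlikely that the association between tuberculosis cases and farms occurred by chance alone. Because several farms have very few animals, R warns that the chi-squared approximation may be incorrect, so this p-value should be treated with some caution.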
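The warning arises when expected cell counts are small. One possible way to address it, sketched below, is to inspect the expected counts and then ask `chisq.test` for a Monte Carlo p-value instead of the asymptotic one. The counts are copied from the table above; the seed and the number of replicates `B` are arbitrary choices, not part of the original analysis.

```r
# Counts copied from table(deer$Tb, deer$Farm): row "0" = negative, "1" = positive
tb_by_farm <- rbind(
  "0" = c(10, 23, 67, 7, 88, 4, 22, 0, 28, 27, 16, 186, 24, 18, 11, 39, 67,
          23, 21, 31, 0, 3, 16, 9, 16, 13, 15),
  "1" = c(3, 0, 5, 0, 3, 0, 1, 1, 6, 24, 5, 31, 4, 1, 0, 0, 7,
          1, 0, 0, 1, 0, 10, 0, 2, 1, 4)
)
colnames(tb_by_farm) <- c("AL", "AU", "BA", "BE", "CB", "CRC", "HB", "LCV",
                          "LN", "MAN", "MB", "MO", "NC", "NV", "PA", "PN",
                          "QM", "RF", "RN", "RO", "SAL", "SAU", "SE", "TI",
                          "TN", "VISO", "VY")

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tb_by_farm), colSums(tb_by_farm)) / sum(tb_by_farm)

# Number of cells with an expected count below 5, the usual rule of thumb
sum(expected < 5)

# Monte Carlo p-value; seed and B are arbitrary
set.seed(1)
sim <- chisq.test(tb_by_farm, simulate.p.value = TRUE, B = 2000)
sim$p.value
```

A simulated p-value can never be smaller than 1 / (B + 1), so it shows only that the result is small, not how small.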