Today in class, we reviewed topics from Probability and Mathematical Statistics, split up into groups and explained different hypothesis tests to the class, and began using RStudio.
The Central Limit Theorem was the main item that we reviewed from Probability class. The main idea is that, regardless of the shape of the original distribution (provided it has finite variance), the means of many samples drawn from the same population will be approximately normally distributed when each sample is large enough.
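To convince myself of this, here is a minimal simulation sketch. The exponential population, the seed, the number of samples, and the sample size are all my own assumptions for illustration and were not part of the class review.

```r
# Simulate the Central Limit Theorem with a strongly skewed population.
set.seed(1)                      # arbitrary seed, only for reproducibility
n_samples   <- 5000              # number of samples drawn (assumed value)
sample_size <- 50                # observations per sample (assumed value)

# Each column is one sample from an Exponential(rate = 1) population,
# which has mean 1 and standard deviation 1.
samples <- matrix(rexp(n_samples * sample_size, rate = 1),
                  nrow = sample_size)
sample_means <- colMeans(samples)

mean(sample_means)               # should be close to the population mean, 1
sd(sample_means)                 # should be close to 1 / sqrt(sample_size)

# The histogram should look roughly bell-shaped even though
# the exponential distribution itself is skewed to the right.
hist(sample_means, breaks = 40,
     main = "Sample means of Exponential(1) data")
```

Increasing sample_size should make the histogram look more symmetric, while a very small sample_size should leave some of the original skew visible.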
When we split up into groups, each group was given a situation that called for a hypothesis test, but the details varied: sometimes the alternative hypothesis was two-sided, and sometimes the population parameters were unknown. My group explained how to run a one-sided hypothesis test for a mean when only sample statistics are known.
Step 1. Identify the null and alternative hypotheses.
* Ho: mu=c
* Ha: mu>c
Step 2. State a significance level (denoted “alpha”)
Step 3. Calculate the test statistic
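* T=(xbar-c)/(s/n^(1/2)), where xbar is the sample mean, s is the sample standard deviation, and n is the sample size. Under Ho, T follows a t distribution with n-1 degrees of freedom.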
Step 4. Calculate the t-value such that P(T>t)=alpha
Step 5. Calculate the p-value
Step 6. If the p-value is less than the significance level, reject the null hypothesis
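The six steps above can be sketched in R using only summary statistics. The numbers below (xbar, s, n, and the hypothesized value c0) are hypothetical values I made up for illustration; they are not from the class example.

```r
# Hypothetical summary statistics (made up for illustration)
xbar  <- 52.3    # sample mean
s     <- 4.1     # sample standard deviation
n     <- 25      # sample size
c0    <- 50      # hypothesized mean under Ho (the "c" in Step 1)
alpha <- 0.05    # Step 2: significance level

# Step 3: test statistic
t_stat <- (xbar - c0) / (s / sqrt(n))

# Step 4: critical value t such that P(T > t) = alpha
t_crit <- qt(1 - alpha, df = n - 1)

# Step 5: p-value for Ha: mu > c (upper tail of the t distribution)
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)

# Step 6: reject Ho when the p-value is below the significance level
reject_h0 <- p_value < alpha

t_stat
t_crit
p_value
reject_h0
```

Comparing t_stat to t_crit and comparing p_value to alpha always lead to the same decision, so Steps 4 and 5 are two views of the same test.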
This is the method I used in previous courses, but, thanks to another group’s presentation, I now know how to conduct the same test in R.
To begin, I load the data that I wish to use. In this case, I would like to use the annual water level of Lake Huron, measured in feet.
data("LakeHuron")
Now that the Lake Huron data is loaded, I can apply R functions to it. I will start by displaying a summary of the data.
summary(LakeHuron)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 576.0 578.1 579.1 579.0 579.9 581.9
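Now, I would like to run a one-sided hypothesis test for the mean using my sample data. Because the test is one-sided, I set the alternative to “greater” or “less”. I will use a standard significance level of 0.05, so I list the confidence level as (1-0.05); this argument sets the level of the confidence interval that R reports, and I still compare the p-value to 0.05 myself. The mu argument is the hypothesized mean under the null hypothesis, not the sample mean. I notice that the sample mean is about 579.0 feet, so I will use 579.0 as my hypothesized value in the t-test.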
t.test(LakeHuron, alternative = "greater", conf.level = 0.95, mu = 579.0)
##
## One Sample t-test
##
## data: LakeHuron
## t = 0.03065, df = 97, p-value = 0.4878
## alternative hypothesis: true mean is greater than 579
## 95 percent confidence interval:
## 578.7829 Inf
## sample estimates:
## mean of x
## 579.0041
The R output tells me several things:
1. My test statistic (t=0.03)
2. Degrees of freedom (df=97)
3. The number of data points I had (df+1=98)
4. The p-value for this test (p-value=0.49)
5. A confirmation of my alternative hypothesis (mu>579)
6. A confirmation of my selected confidence level (95%)
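From these facts, I can choose to reject or fail to reject the null hypothesis. Because the p-value is greater than the significance level (0.49>0.05), I fail to reject the null hypothesis. This is not surprising, since I chose a hypothesized mean that is almost identical to the sample mean.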
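As a check, the same numbers can be recomputed by hand and compared with what t.test returns. This is only a sketch: the variable names (result, mu0, t_hand, p_hand) are my own, and I am relying on the statistic and p.value components of the object that t.test returns.

```r
data("LakeHuron")
alpha <- 0.05     # significance level
mu0   <- 579.0    # hypothesized mean under the null hypothesis

# Store the test result instead of only printing it
result <- t.test(LakeHuron, alternative = "greater", mu = mu0)

# Recompute the test statistic and p-value from the sample statistics
n      <- length(LakeHuron)
t_hand <- (mean(LakeHuron) - mu0) / (sd(LakeHuron) / sqrt(n))
p_hand <- pt(t_hand, df = n - 1, lower.tail = FALSE)

# Both comparisons should agree with the t.test output
all.equal(unname(result$statistic), t_hand)
all.equal(result$p.value, p_hand)

# Decision rule: TRUE would mean reject the null hypothesis
result$p.value < alpha
```

Storing the result in a variable also makes it easy to reuse the p-value later in an R Markdown document instead of copying it from the printed output.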
While re-learning how to do hypothesis tests in R was worthwhile, the most valuable lesson from today’s class was the tutorial on R and how to create and publish an R Markdown file. Prior to today’s class, I was familiar with R, but I only had experience creating R scripts and running them in the console. Whenever I wanted to explain something to myself, I would add a comment explaining the line of code I had just typed.
x <- 4 + 9  # Sum the values 4 and 9 and save the result to the variable "x"
Now that I know how to create an R Markdown file, this process is much cleaner.