Hypothesis Testing

20180212080200

Critical values

We have a Null Hypothesis and Alternative Hypothesis.

Our Alternative Hypothesis specifically states that there is a difference from some value specified in the Null Hypothesis.

This could be a difference in either direction (non-directional) or a directional difference.

What we need is a criterion for what constitutes a significant difference.

This is where critical values, p-values and Alpha Levels (α) come into play.

Overview

Alpha level  
Starting point: it is our threshold for what counts as a difference in our chosen test. Together, our alpha level and our statistical test tell us what our critical values are.


Critical value  
Cutoff for differences. A statistic that has passed this critical value rejects our null hypothesis.  


p-value  
It relates to both: it tells us the probability of finding a test statistic at least as extreme as ours purely by chance, assuming the null hypothesis is true. If it is smaller than our alpha, we reject the null hypothesis.  

Hypothesis statements

1. Set up a Null and an Alternative Hypothesis;
2. Determine the Alpha Level (α);
3. Gather data;
4. Analyse data;
5. Draw a conclusion.

20180212080401

Alpha level (α)

To understand what Alpha Levels (α) are we need an example.

Let´s use the following Cattle/Beef example!

The scenario of the investigation

  • A local ranch has 1000 head of adult steer;
  • The ranch wants to feed the steer a High Fat Feed Diet;
  • To see if there is a chance to increase their weight;
  • The ranch randomly selects a sample of 32 adult head from the population;
  • And feeds them this special diet for 2 months;
  • The diet will be considered satisfactory if the ranch sees an average difference in the weight of the sample of steer of at least 15 pounds;
  • This will justify the cost of the new diet;
  • Hopefully it will be a weight gain of greater than 15 pounds;
  • But in all fairness the diet could go both ways, gain or loss of weight;
  • If the cattle can only gain or only lose weight we have a one-tailed hypothesis;
  • If the cattle might either gain or lose weight we have a two-tailed hypothesis.

The results after two months

In the figure above we can see that:

  • The Null Hypothesis states that after two months there is no significant difference in weight (gain or loss) from 15 pounds;
  • The Alternative Hypothesis states that after two months there is a significant difference in weight (gain or loss) from 15 pounds;
  • The sample from the population was 32 randomly selected steer;
  • After 2 months the sample showed an average weight gain/loss of 16.4 pounds;
  • And the Standard Deviation of Weight gain/loss for the population was 4.6.

With all this information we can perform a single sample z-test to determine if our sample mean weight gain of 16.4 pounds is different from the null value of 15 pounds.

Visualization of the Null Hypothesis

We can see the proposed Null Hypothesis in the centre of the distribution.
We need to determine at what point along this distribution will the sample mean value be considered different from our Null Hypothesis.
This cutoff point is set by the Alpha Level (α).

Visualization of the Alpha Levels (α)

The Alpha Level (α) is ordinarily set at 0.05.

It means that we will say that something is different from the Null Hypothesis if the likelihood of it happening by chance is 5% or less.

In our case both tails are equally possible (the low end and the high end), so this is considered a two-tailed Hypothesis Test.

If we use an Alpha Level (α = 0.05) we would have:

20180212080402

Critical value

Since we are using a z-test and a Normal Distribution we can re-draw the graph as follows:

Observe that μ changes:

Before, μ = 15
Now, μ is equivalent to a z = 0

That is the idea of a z-score distribution.

Defining the Alpha Level (α)

In our example we are using α = 0.05.

Defining the Cutoffs for the chosen Alpha Level (α)

You can use the qnorm() function to calculate the cutoffs for our α. See here how.

So, for a standard normal distribution (mean 0, standard deviation 1), the cutoffs for our α are: -1.959964 and 1.959964.

They are specific to our statistical test and are known as critical values.

Observe that we are now using the scale of our statistical test, a z-test, in which μ corresponds to zero.

They are called critical because a statistical finding that falls beyond these values, farther away from the center of the distribution in either direction, will cause us to reject our Null Hypothesis.

These values designate where, at this specific Alpha Level (α), we would consider our finding to be different from the null.

Defining the z-score of the distribution
With these values we can calculate the equivalent z-score. See here how.

Using the formula we find a z-score of 1.72 for our example. This value does not fall beyond our critical values.

It is not in the critical region of our test distribution.

So we Fail to reject the Null Hypothesis.

If we want we can even see what these actual raw weight difference scores would be.
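As a rough sketch of that back-conversion, the critical z-values can be mapped onto the raw weight scale using the figures from the cattle example (μ = 15, σ = 4.6, n = 32). The variable names are only illustrative.

```r
# Figures taken from the cattle example
mu    <- 15    # null hypothesis mean (pounds)
sigma <- 4.6   # population standard deviation
n     <- 32    # sample size
alpha <- 0.05

se     <- sigma / sqrt(n)        # standard error of the mean
z_crit <- qnorm(1 - alpha / 2)   # upper critical z-value

# Raw-score cutoffs: null mean plus/minus critical z times the standard error
raw_cutoffs <- mu + c(-1, 1) * z_crit * se
raw_cutoffs
```

Under these assumptions the cutoffs come out near 13.4 and 16.6 pounds, so the observed 16.4 sits just inside the upper cutoff.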

20180212080403

p-value

We know that our critical values, informed by our alpha level, show us where the outer 5% of our test distribution lies.

Our sample mean had a raw score of 16.4, which corresponds to a z-statistic of 1.72.

The probability of finding a z-statistic of 1.72 or higher is:
1-pnorm(1.72)=0.0427

The probability of finding a z-statistic of -1.72 or lower is:
pnorm(-1.72)=0.0427

They are equal; adding them up gives our p-value.

p-value = 0.0427+0.0427

p-value of the two-tailed z-test = 0.0854 = 8.54%

In this case our p-value does not fall below our alpha level, which means that we cannot reject the Null Hypothesis.
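The same two-tailed p-value can be sketched in R directly from the example figures. This version keeps the z-statistic unrounded, so the result (about 0.085) may differ in the last digit from a hand calculation based on z = 1.72.

```r
# z-statistic from the cattle example, kept unrounded
z <- (16.4 - 15) / (4.6 / sqrt(32))

upper_tail <- 1 - pnorm(z)   # P(Z >= z)
lower_tail <- pnorm(-z)      # P(Z <= -z), equal to the upper tail by symmetry

p_value <- upper_tail + lower_tail
p_value

alpha <- 0.05
p_value < alpha   # FALSE here, so we fail to reject the null hypothesis
```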

20180214085000

Using a z-table

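You use a z-table when you already have the z-value and want to find the corresponding probability.

Let's suppose you have a z-value = 0.41.

The way you use a z-table is in the following manner:

The first two digits of the z-value (the units and the first decimal) give the row of the z-table. The second decimal gives the column.

So, in our case:

Row = 0.4
Column = 0.01

From the z-table above, the cumulative probability (the area to the left of z = 0.41) is:

P(Z ≤ 0.41) = 0.6591

A p-value is then derived from this area, for example 1 - 0.6591 = 0.3409 for an upper-tail test.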
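The table lookup can be cross-checked with pnorm(), which returns the area to the left of a z-value under the standard normal curve. A minimal sketch:

```r
z <- 0.41

# Cumulative probability P(Z <= z), the quantity a standard z-table lists
area_left <- pnorm(z)
round(area_left, 4)

# Area to the right of z, if an upper-tail probability is needed
upper_tail <- 1 - area_left
round(upper_tail, 4)
```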

20171125093800

ANOVA - Analysis of Variance

Checks whether there is a difference between the means of three or more groups for the same variable.

It helps us compare natural (within-group) error with between-group error, and determine if there is a difference somewhere between the groups.

Formally, it asks whether one INDEPENDENT categorical variable has an impact on one DEPENDENT quantitative variable.

It works with one categorical variable (One-Way ANOVA) or two categorical variables (Two-Way ANOVA), where one can see if the response across levels of one categorical variable differs based on another categorical variable.

What is the goal of an ANOVA analysis?
- To determine if significant mean differences exist between multiple groups

Two specific group means can be said to be significantly different if:
- a Tukey HSD pairwise comparison shows p < 0.05 (or the identified level of significance)

  1. One-Way ANOVA
  2. Two-Way ANOVA
  3. ANOVA - How to calculate the coefficients manually

20171219085500

One-Way ANOVA

Suppose a salesperson wants to compare the level of satisfaction of customers for four different insurance companies. Our question is: “Is there a difference in satisfaction scores across the four different insurance companies?”

The satisfaction scores for a sample of customers for each insurance company are recorded:

We could conduct a series of t-tests to determine if any of the sample means differ. However, this would be tedious and has a major flaw, which we will discuss shortly. Instead, we use something called the Analysis of Variance (ANOVA), which allows us to test the hypothesis that multiple population means are equal.

The Null and Alternative hypotheses for a one-way ANOVA can be written as:

H0 : Means of all factor levels are equal
HA : At least one factor level has a different mean

The ANOVA can be used when we want to test the means of three or more populations at once. Theoretically, we could test hundreds of population means using this procedure. This ANOVA is technically called “one-way” as it has just one main grouping factor: company. In the next chapter we’ll see how we can have an ANOVA with more than one factor.

Shortcomings of Comparing Multiple Means Using Previously Explained Methods

Why should we learn a new test called ANOVA, when we could just conduct a series of t-tests? To answer our question, we could just run six different independent samples t-tests, one for each pair of the four companies. It turns out this is a very bad idea, and has a major flaw: when more than one t-test is run, each at its own level of significance, the probability of making one or more Type I errors grows rapidly with the number of tests.
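To put a number on that flaw, here is a small sketch of the familywise error rate. It assumes the six tests are independent, which is only an approximation for pairwise comparisons that share groups, so the figure should be read as indicative.

```r
groups <- 4
alpha  <- 0.05

# Number of pairwise comparisons among the groups
n_tests <- choose(groups, 2)

# Probability of at least one Type I error across all tests,
# assuming (as a simplification) that the tests are independent
fwer <- 1 - (1 - alpha)^n_tests

n_tests
round(fwer, 3)
```

With four groups this gives six tests and a chance of roughly one in four of at least one false rejection, far above the nominal 5%.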

Assumptions of the ANOVA test

Before we can use the one-way ANOVA, we must see if we satisfy some assumptions, just like we had in our previous hypothesis tests:

  1. All observations are independent of one another and randomly selected from the population which they represent.
  2. The population at each factor level is approximately normal.
  3. The variances for each factor level are approximately equal to one another.

The Steps of the ANOVA Method

With the ANOVA method, we are actually analyzing the total variation of the scores, including the variation of the scores within the groups and the variation between the group means. Since we are interested in two different types of variation, we first calculate each type of variation independently and then calculate the ratio between the two, called an F-value.

ANOVA has its own distribution, called an F-distribution, that we need to use to set our critical values and test our hypothesis. The F-distribution also relies on degrees of freedom. Since the F-value is actually a ratio of two different sources of variance, we’ll need two different degrees of freedom.

When using the ANOVA method, we are testing the null hypothesis that the population means of our groups are equal. When we conduct the hypothesis test, we are assessing the probability of obtaining an F-statistic this extreme by chance. If we reject the null hypothesis that the means are equal, then we are saying that the difference we see is unlikely to have happened just by chance.

To test a hypothesis using the ANOVA method, there are several steps that we need to take. To help us in completing those steps, we need to employ a nice little tool called the ANOVA table:

Notice on the left side, there’s a column called “source.” This column lists where the variation in the test is coming from: Between the groups, within the groups, or all the variance for all the observations (Total). The columns may also be familiar to you as well: SS is the Sums of Squares (hint: we used Sums of Squares to calculate Standard Deviation and Variance) and df is the Degrees of Freedom. We’ll explain the other columns shortly.

Working through ANOVA

Let’s use the ANOVA table with our ANOVA calculation steps and some data from our company satisfaction example from above.

Take a look at the ANOVA table:

Interpreting the table above

Numerator Degree of Freedom: 3
Denominator Degree of Freedom: 28
SSTotal: 12.965
MSBetween: 2.055
MSWithin: 0.243
F Statistic: 8.457
F critical: 2.947

Calculations

Numerator Degree of Freedom = number of groups minus 1 (e.g. 4 - 1 = 3)
Denominator Degree of Freedom = number of observations minus the number of groups (e.g. 32 - 4 = 28)
MSBetween = SSBetween/DF = 6.166/3 = 2.055333, rounded to 2.055
MSWithin = SSWithin/DF = 6.799/28 = 0.2428214, rounded to 0.243
F statistic = MSBetween/MSWithin = 2.055/0.243 = 8.45679, rounded to 8.457
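The table arithmetic can be reproduced in R from the two sums of squares. This sketch keeps the mean squares unrounded, so its F statistic (about 8.46) differs slightly in the third decimal from the value obtained with rounded intermediates; qf() and pf() supply the critical value and p-value.

```r
ss_between <- 6.166
ss_within  <- 6.799
k          <- 4     # number of groups (companies)
n_total    <- 32    # total number of observations
alpha      <- 0.05

df_between <- k - 1
df_within  <- n_total - k

ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

f_stat  <- ms_between / ms_within
f_crit  <- qf(1 - alpha, df_between, df_within)   # upper-tail critical value
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

c(F = f_stat, F_critical = f_crit, p = p_value)
```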

Interpret the results of the hypothesis test

In ANOVA, the last step is to decide whether to reject the null hypothesis and then provide clarification about what that decision means.

In our example of the insurance companies, our F-value from the ANOVA test is greater than the F-critical value, so we would reject our Null Hypothesis.

We can conclude that the average customer satisfaction scores of the four insurance companies are not equal to one another - at least one of them is different from the others.

Now What?

Now that we’ve found a way to test the Null Hypothesis when we want to compare three or more population means, we need to learn one more step.

What happens when we reject the Null Hypothesis?

If we have an ANOVA that rejects the Null Hypothesis we must find out where the difference lies: what group or groups are different from one another.

To do this we use a post-hoc test. Post-hoc is Latin for “after this”, so we literally run another analysis after the ANOVA shows a rejection of the Null Hypothesis.

Let’s use our Insurance Company example

We saw that we rejected the Null Hypothesis, so we are allowed to run the post-hoc tests to discover more about the difference in the group means.

Here are the results of each comparison, along with the mean of the groups, the t-value, the comparison df, t-critical value, and the p-value:

If the reported p-value of the comparison was less than the corrected significance level, then we reject the Null hypothesis that the comparison means are equal.

Our post-hoc comparisons from the above table show us that Company 3 was significantly higher in customer satisfaction than Company 1 and Company 2, but not Company 4. Also, no other Company was significantly different from the others.
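The original satisfaction scores are not reproduced in these notes, so the sketch below uses simulated, purely hypothetical data (the group means and spread are made up) just to show the shape of the workflow: a one-way ANOVA with aov(), followed by pairwise t-tests with a Bonferroni correction as one possible way to correct the significance level.

```r
set.seed(123)

# Hypothetical data: 8 simulated satisfaction scores per company
scores <- data.frame(
  company = factor(rep(paste("Company", 1:4), each = 8)),
  satisfaction = c(
    rnorm(8, mean = 3.0, sd = 0.5),
    rnorm(8, mean = 3.1, sd = 0.5),
    rnorm(8, mean = 4.0, sd = 0.5),
    rnorm(8, mean = 3.5, sd = 0.5)
  )
)

# Omnibus test: is there any difference among the group means?
fit <- aov(satisfaction ~ company, data = scores)
summary(fit)

# Post-hoc: all pairwise comparisons with Bonferroni-adjusted p-values
posthoc <- pairwise.t.test(scores$satisfaction, scores$company,
                           p.adjust.method = "bonferroni")
posthoc$p.value
```

Each adjusted p-value below the chosen alpha flags a pair of companies whose means differ. TukeyHSD(fit) is another commonly used option for the same step.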

Lesson Summary

When testing multiple independent samples to determine if they come from the same population, we could conduct a series of separate t-tests in order to compare all possible pairs of means. However, a more precise and accurate analysis is the Analysis of Variance (ANOVA).

In ANOVA, we analyze the total variation of the scores, including the variation of the scores within the groups, the variation between the group means, and the total mean of all the groups (also known as the grand mean).

In this analysis, we calculate the F-value, which is the ratio of mean of squares between groups divided by the mean of squares within groups.

If we are able to reject our Null Hypothesis, we continue on, conducting post-hoc analyses to discover where the difference in the sample means lies.

20171219085600

Two-Way ANOVA

Sometimes, however, we are interested in testing the effects of more than one independent variable.

Say, for example, that a researcher is interested in determining the effects of different dosages of a dietary supplement on the performance of both males and females on a physical endurance test. The three different dosages of the medicine are low, medium, and high, and the genders are male and female.

There are several questions that can be answered by a study like this, such as:

“Does the medication improve physical endurance, as measured by the test?”
“Do males and females respond in the same way to the medication?”

As mentioned in the previous lesson, ANOVA allows us to examine the effect of a single independent variable on a dependent variable (e.g., the effectiveness of a reading program on student achievement). With two-way ANOVA, we are not only able to study the effect of two independent variables (e.g., the effect of dosages and gender on the results of a physical endurance test), but also the interaction between these variables.

Two-Way ANOVA Procedures

In two-way ANOVA, there are two independent variables and a single dependent variable. Changes in the dependent variable are assumed to be the result of changes in the independent variables.

In two-way ANOVA, we need to calculate a ratio that measures not only the variation between the dependent and independent variables, but also the interaction between the two independent variables.

Determining the total variation in two-way ANOVA includes calculating:

  1. Variation within the group (within-cell variation)
  2. Variation in the dependent variable attributed to one independent variable (variation among the row means)
  3. Variation in the dependent variable attributed to the other independent variable (variation among the column means), and
  4. Variation between the independent variables (the interaction effect).

For two-way ANOVA, we have three null hypotheses:

  1. In the population, the means for the rows equal each other. In the example above, we would say that the mean for males equals the mean for females.
  2. In the population, the means for the columns equal each other. In the example above, we would say that the means for the three dosages are equal.
  3. In the population, the null hypothesis would be that there is no interaction between the two variables. In the example above, we would say that there is no interaction between gender and amount of dosage, or that all effects equal 0.

One Example

Say that a gym teacher is interested in the effects of the length of an exercise program on the flexibility of male and female students. The teacher randomly selected 48 students (24 males and 24 females) and assigned them to exercise programs of varying lengths (1, 2, or 3 weeks). At the end of the programs, she measured the students’ flexibility and recorded the following results. Each cell represents the score of a student:

Question: Do gender and the length of an exercise program have an effect on the flexibility of students?

From these data, we can calculate the following summary statistics:

As we can see from the tables above, it appears that females have more flexibility than males and that the longer programs are associated with greater flexibility. Also, we can take a look at the standard deviation of each group to get an idea of the variance within groups. This information is helpful, but it is necessary to calculate the test statistic to more fully understand the effects of the independent variables and the interaction between these two variables.

Note that the computer finds the degrees of freedom for the interaction by multiplying together the degrees of freedom for each variable (rows and columns). From this summary table, we can see that all three F-ratios exceed their respective critical values.

This means that we can reject all three null hypotheses and conclude that:

  1. In the population, the mean for males differs from the mean for females.
  2. In the population, the means for the three exercise programs differ.
  3. There is an interaction between the length of the exercise program and the student’s gender.
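The students' scores are not reproduced in these notes, so the following sketch uses simulated, hypothetical flexibility values only to show how such a design could be analysed in R and how the degrees of freedom arise (48 students, 2 genders, 3 program lengths, 8 students per cell).

```r
set.seed(2018)

# Balanced design: 8 students in each gender x program cell
design <- expand.grid(
  student = 1:8,
  gender  = c("male", "female"),
  program = c("1 week", "2 weeks", "3 weeks")
)
design$gender  <- factor(design$gender)
design$program <- factor(design$program)

# Hypothetical outcome; real scores would replace this column
design$flexibility <- rnorm(nrow(design), mean = 30, sd = 5)

# Two main effects plus their interaction
fit <- aov(flexibility ~ gender * program, data = design)
anova_table <- summary(fit)[[1]]
anova_table
```

The Df column shows 1 for gender, 2 for program, 2 for the interaction (1 × 2) and 42 for the residual. Because the outcome here is random noise, the F-ratios themselves carry no meaning.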

Lesson Summary

With two-way ANOVA, we are not only able to study the effect of two independent variables, but also the interaction between these variables.

Determining the total variation in two-way ANOVA includes calculating the following:

  1. Variation within the group (within-cell variation)
  2. Variation in the dependent variable attributed to one independent variable (variation among the row means)
  3. Variation in the dependent variable attributed to the other independent variable (variation among the column means)
  4. Variation between the independent variables (the interaction effect)

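Where do we go from here?

We can decompose the interaction:

We can run two one-way ANOVAs, one for each gender, and we can run three t-tests, one for each program. (I didn't understand why it should be a t-test and not another one-way ANOVA. Presumably it is because each program compares only two groups, males and females, in which case a t-test and a one-way ANOVA are equivalent, with F = t².)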

20180202092400

ANOVA - How to calculate the coefficients

Calculating t-test VARIABLE

Calculating t-test CONSTANT

Coefficients / Standard Error = tStat
19.9800 / 4.389675533 = 4.551589

Calculating r squared

R Squared = 1 - (Sum of Squares Residuals / Sum of Squares Total)
or
R Squared = 1 - (SSResidual/SSTotal)

In this case

rSquared = 1-(587.11/2326) = 0.7475881341358555
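Both calculations in this note can be checked with a few lines of R, using only the numbers quoted above (the variable names are illustrative):

```r
# t statistic of a coefficient: estimate divided by its standard error
coefficient <- 19.9800
std_error   <- 4.389675533
t_stat      <- coefficient / std_error
t_stat

# R squared: share of total variation not left in the residuals
ss_residual <- 587.11
ss_total    <- 2326
r_squared   <- 1 - ss_residual / ss_total
r_squared
```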

Calculating f-test

Calculating standard error

20180210103905

What is Hypothesis Testing?

Definition

A hypothesis test is a procedure for assessing whether a difference between two means (or proportions) is due to chance.

A statistical hypothesis, sometimes called confirmatory data analysis, is a hypothesis that is testable on the basis of observing a process that is modeled via a set of random variables.

A statistical hypothesis test is a method of statistical inference.

Commonly, two statistical data sets are compared, or a data set obtained by sampling is compared against a synthetic data set from an idealized model.

A hypothesis is proposed for the statistical relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis that proposes no relationship between two data sets.

The comparison is deemed statistically significant if the relationship between the data sets would be an unlikely realization of the null hypothesis according to a threshold probability-the significance level.

Hypothesis tests are used in determining what outcomes of a study would lead to a rejection of the null hypothesis for a pre-specified level of significance.

The process of distinguishing between the null hypothesis and the alternative hypothesis is aided by identifying two conceptual types of errors (type 1 & type 2), and by specifying parametric limits on e.g. how much type 1 error will be permitted.

The courtroom analogy

When a case goes to court there is a presumption of innocence. Evidence is needed to prove otherwise. If not enough evidence is provided to prove that a crime has been committed, the verdict is stated as “not guilty”: there was not enough evidence to prove guilt.

Hypothesis testing is similar. There is an assumption that the Null Hypothesis (H0) is true. Enough evidence is needed to show that it is wrong, or in other words, to reject the null hypothesis.

When the evidence is insufficient, the correct phrasing is that we failed to reject the null hypothesis.

It leads us to think that there are two possibilities in hypothesis testing.

1. Reject the null hypothesis, or
2. Fail to reject the null hypothesis.

But in fact there are four.

Hypotheses

H0: There is no difference between the means (or percentages) of the two groups.
H1: There is a difference between the means (or percentages) of the two groups.

The H1 hypothesis depicts the research hypothesis - the statement that we hope to demonstrate is true.

Alpha Levels (p-value)

If p <= 0.05 we reject H0 and accept H1; otherwise we fail to reject H0.

In hypothesis testing one takes a sample from a population and asks how much it deviates from what the hypothesized population would produce.

The p-value describes how likely it would be to draw a sample at least this extreme from that population.

The null hypothesis means that your sample fits the hypothesized population - that there is no difference or change.

If the sample deviates a lot from the mean we are supposed to reject the null hypothesis, and state that the sample probably belongs to some other distribution with a different mean, the alternative hypothesis. Usually we reject the null hypothesis if the p-value <= 0.05.

This means that if you drew 100 samples that really came from the null hypothesis distribution, you would expect to reject the null hypothesis about 5 times even though it is true.
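That long-run 5% can be illustrated with a small simulation. The sketch below borrows the cattle example figures (mean 15, σ = 4.6, n = 32) purely as assumptions, draws many samples from a population where the null hypothesis is true, and counts how often a two-tailed z-test rejects anyway.

```r
set.seed(42)

mu    <- 15     # true mean, equal to the null value
sigma <- 4.6    # known population standard deviation
n     <- 32
alpha <- 0.05
z_crit <- qnorm(1 - alpha / 2)

# One TRUE/FALSE per simulated sample: did the z-test reject H0?
rejected <- replicate(10000, {
  x <- rnorm(n, mean = mu, sd = sigma)
  z <- (mean(x) - mu) / (sigma / sqrt(n))
  abs(z) > z_crit
})

# Share of false rejections (Type I errors); expected to be close to alpha
mean(rejected)
```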

Errors

Court Case Analogy

Type I error (alpha): rejecting H0 when H0 is true.
Type II error (beta): failing to reject H0 when H0 is false.

Note: the Type I error should be avoided, as it is the more harmful one.

Procedure for carrying out a hypothesis test

  • Hypothesis statements: Set up null and alternative hypothesis;
  • Determine our Alpha Level
  • Gather data;
  • Analyse data;
  • Draw a conclusion: Reject or Fail to reject our null hypothesis, and we have two possibilities to be correct in our conclusion, but we also have two possibilities to be incorrect.

201802017094101

Using qnorm() to find the cutoffs for our Alpha Level

Here is how to calculate the cutoffs for a specific alpha level.

In our example we are using α = 0.05.

In that case we have α/2 = 0.025 = 2.5% on each side of the distribution.

So…

Lower-End 2.5%

Left critical value: cumulative probability α/2 = 0.025

qnorm(.025)
## [1] -1.959964

Upper-End 2.5%
Right critical value: cumulative probability 1 - α/2 = 1 - 0.025 = 0.975

qnorm(.975)
## [1] 1.959964

201802017094102

Using a formula to find the z-score value

If we go through the mechanics and solve for our z statistic with the following formula:

z = (x̄ - μ) / (σ / sqrt(n))

Solving using the above formula we have:

x̄ (sample mean) = 16.4
μ = 15
σ = 4.6
n = 32

(16.4-15)/(4.6/sqrt(32))
## [1] 1.721651

20180211091603

The T-Distribution

When should you use a t-distribution?

Also known as Student's t-distribution.

It comes into play when we have small sample sizes and do not know the population standard deviation. In that situation the z-test and the Normal distribution just don't work well, because the z-test relies on knowing sigma σ (the standard deviation of the population), and with a small sample the test statistic computed from an estimated σ no longer follows a normal distribution.

Large sample theory states that the standard deviation of the sample is a good estimate of the standard deviation of the population, but as sample size decreases, the standard deviation (s) of each sample tends to get farther and farther away from the actual population Standard Deviation (σ).

The formula below, used in 1908, using only ‘n’ in the denominator, causes an underestimation of σ.

Unfortunately, the z-test is based on Sigma (σ), and as sample sizes decrease our estimate of Sigma (σ) will be off.
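The effect of the denominator can be seen on a small made-up sample (the values below are hypothetical). Dividing by n always gives a smaller estimate than dividing by n - 1, which is what R's built-in sd() does.

```r
x <- c(12, 15, 9, 14, 10)   # hypothetical small sample
n <- length(x)

sum_sq <- sum((x - mean(x))^2)   # sum of squared deviations from the mean

sd_n  <- sqrt(sum_sq / n)        # 'n' in the denominator
sd_n1 <- sqrt(sum_sq / (n - 1))  # 'n - 1' in the denominator

c(divide_by_n = sd_n, divide_by_n_minus_1 = sd_n1, builtin_sd = sd(x))
```

The gap between the two estimates shrinks as n grows, which is why the choice matters most for small samples.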

Fixing the small sample size problem

1st - Using a better estimation of sigma (σ);

2nd - Apply the t-distribution

The t-distribution takes the size of the sample into account. As the sample size INCREASES, the distribution of the standardized difference between sample mean and population mean, now called t instead of z, becomes more and more like a normal distribution.

But to use a t-distribution effectively we need to understand the concept of Degrees of Freedom.

Degrees of freedom

Each test that uses t-distribution has a different way to calculate the degrees of freedom, but we can define it as: Number of observations that are free to vary.

For instance: Suppose the mean grade of a population of students is 75. If we take 5 observations in a single sample with all 5 free to vary, we could come up with 12, 32, 99, 98, 79, which gives a mean of 64, which is different from 75. However, we could use a sample of 5 observations with only 4 degrees of freedom. Let's say the 4 freely varying values are 11, 12, 20, 30. In order to keep a mean of 75, the 5th value is not free to vary: it must be 302. So the mean of 11, 12, 20, 30, 302 is 75; the 5th grade, not free to vary, guarantees the mean we know to be true.

The same idea applies to 10 observations: we would have 9 degrees of freedom.
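The constraint in the grade example can be written out in a couple of lines: once the mean and n - 1 values are fixed, the last value follows from them.

```r
target_mean <- 75
n           <- 5
free_values <- c(11, 12, 20, 30)   # the four values that are free to vary

# The total must equal n * mean, so the last value is whatever is missing
last_value <- n * target_mean - sum(free_values)
last_value

mean(c(free_values, last_value))   # recovers the target mean
```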

t-distribution in action

t-distribution for 2 degrees of freedom

Compare it to a normal distribution. The peak is shorter and the tails fatter.

In the figure below we are using an alpha of .05 for a t-distribution with two degrees of freedom; observe the critical values.

Observe below the shape with 5 degrees of freedom and the new critical values. This new shape has a taller peak with correspondingly thinner tails. So the critical values are a little smaller, a little closer to the middle of the distribution.

Below there is a t-distribution with 10 degrees of freedom, with even thinner tails and smaller critical values.

Therefore, as our n increases the t-distribution becomes more and more like the normal (z) distribution, and our critical values for the same Alpha Level (α), whatever that value happens to be, get closer and closer to the critical values of the normal distribution.

And that is the t-distribution.

So, the answer to the question “Should we use a t-distribution over a normal distribution?” is: if you don't know the sigma (standard deviation) of the population, then use a t-distribution.
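The shrinking critical values can be tabulated with qt(), the t-distribution counterpart of qnorm(). A minimal sketch for a two-tailed alpha of 0.05 (the larger df values are added only for illustration):

```r
alpha <- 0.05
dfs   <- c(2, 5, 10, 30, 100)

# Upper critical value for each number of degrees of freedom
t_crit <- qt(1 - alpha / 2, df = dfs)

# Normal (z) critical value for comparison
z_crit <- qnorm(1 - alpha / 2)

data.frame(df = dfs, t_critical = round(t_crit, 3))
round(z_crit, 3)
```

The t critical values start well above the z value for 2 degrees of freedom and move steadily toward it as the degrees of freedom grow.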

20180224100300

Understanding the t-test

We use a single sample t-test when we do not know Sigma (σ), the population Standard Deviation, but the population is normally distributed. More generally, t-tests can be used, for example, to determine if two sets of data are significantly different from each other.

The following example will help to better understand the use of the t-test.

In brewing, the standard bitterness index, known as the International Bitterness Unit (IBU), is 40.

Beer is obtained by fermenting barley, and a new type of barley may or may not reach the standard IBU.

So when using a new barley, it is of interest to know whether it yields an IBU (bitterness index) within the international standard for a small batch of beer.

That is the case in this example.

Here is the data of IBU (International Bitterness Unit) scores for 12 small batches of beer, and we want to compare the sample mean to the comparison value, an IBU of 40.

We know nothing about the population: we don't know sigma (σ), the standard deviation.

So, we will use what's called a SINGLE SAMPLE T-TEST, because we don't know sigma σ (sd) of the population and we are only looking at one sample and comparing it to some reference value.

We are here assuming three things:

  1. The sample was randomly collected;
  2. We used independent observations (in our case we assume that each brew of beer was done independently);
  3. The population is approximately normally distributed.

A t-test is robust to deviations from the normality assumption.

Although we can look at a histogram of twelve items, it cannot tell us much about the population normality of the measure.

But for most continuous measures, the population distribution should be fairly normal.

Once we have the sample data and have confirmed the test assumptions, let's get down to the mechanics of the test.

This is a hypothesis test.

20180224100301

The mechanics of the t-test

Step 1: Hypothesis statements (we should state the null and alternative hypotheses)

Since our question is not directional, we will establish a simple difference (not equal to) alternative hypothesis
H0 : μ = 40
H1 : μ ≠ 40

Under the null hypothesis we assume that the population mean does not differ from the standard IBU of 40.

Step 2: Establish our Alpha Level

α = 0.05 (standard)

Step 3: Collect/Analyze data

If we were to calculate a z-test, we would use the following formula, where the denominator is the STANDARD ERROR OF THE MEAN:

The STANDARD ERROR OF THE MEAN is calculated from the population standard deviation, but with our current sample, we do not have the population standard deviation.

Therefore we will have to use a t-test, because it uses a different STANDARD ERROR OF THE MEAN as the denominator.

The STANDARD ERROR OF THE MEAN in a t-test is calculated using the SAMPLE STANDARD DEVIATION (giving the estimated standard error of the mean), as you can see below.
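For reference, the two statistics contrasted in this step are usually written as follows. This is a sketch in standard notation, where the sample mean is written as x̄ (\bar{x}), the population standard deviation as σ and the sample standard deviation as s; it may not match the layout of the original figures.

```latex
z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
\qquad
t = \frac{\bar{x} - \mu}{s / \sqrt{n}},
\quad
s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}
```

The only difference is the denominator: the z-test divides by the standard error built from σ, while the t-test divides by the standard error estimated from s.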

20180224100302

How to manually calculate the t-test

Suppose the standard bitterness index for a beer is 40.

Now suppose that for 12 samples of beer, brewed using a new barley, we obtained the following IBUs:

{38, 39, 41, 34, 37, 40, 38, 35, 37, 38, 36, 39}

We want to compare the SAMPLE MEAN to the comparison value, an IBU of 40.

We do not know anything about the population (we don't know sigma - σ).

Because we don't know σ (sigma) of the population, and because we are only looking at one sample and comparing it to some comparison value, we will use the single sample t-test.

Assumptions for this test are:

  1. Random sample from the population
  2. Independent observations (each brew of beer was done independently)
  3. Approximately normal population

Remember that although the t-test is robust to deviations from the normality assumption, the small sample size really can't inform us about the population normality of the measure. But for most continuous measures the population distribution should be fairly normal.

Step 1: Hypothesis Statements

H0 : μ = 40
Ha : μ ≠ 40

Will a new crop of barley affect the bitterness of Guinness beer? The question is not directional, so we will use a simple difference (not equal to) alternative hypothesis.

Step 2: Establish our alpha

0.05

Step 3: Collect/Analyse data

Since

t = (x̄ - μ) / SE

and

SE = s / sqrt(n)

we need to carry out a few calculations:

20180224110201

Create a vector with the sample data

# Creating a vector with the sample data
ibu = c(38, 39, 41, 34, 37, 40, 38, 35, 37, 38, 36, 39)

20180224110202

Calculate the standard error of the mean for the t-test

To calculate the standard error of the mean for the t-test we need:

  1. The standard deviation of the sample
sdAMOSTRA = round(sd(ibu),2)
  2. The square root of 'n'
  n = 12

Therefore, by the formula, the standard error of the mean is calculated as follows:

sdERROR = round(sdAMOSTRA/sqrt(n),2)

20180224110203

Calculate the t value

To calculate the t value by the formula we need, respectively:

  1. The null hypothesis value
  mu = 40
  2. The mean of the sample
meanAMOSTRA = round(mean(ibu),2)
  3. The standard deviation of the sample (already used above to obtain sdERROR)
sdAMOSTRA
## [1] 2.02

Calculating the t value we have

t = (meanAMOSTRA-mu)/ sdERROR
t
## [1] -4.017241
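
Rounding the intermediate values, as done above, shifts the result slightly. As a minimal sketch (the names se_exact and t_exact are introduced here only for illustration), the same calculation without intermediate rounding could look like this:

```r
# Sample of 12 IBU measurements from the text
ibu <- c(38, 39, 41, 34, 37, 40, 38, 35, 37, 38, 36, 39)
mu <- 40               # null hypothesis value
n <- length(ibu)       # sample size (12)

# Estimated standard error of the mean, with no intermediate rounding
se_exact <- sd(ibu) / sqrt(n)

# t statistic: (sample mean - null value) / standard error
t_exact <- (mean(ibu) - mu) / se_exact
round(t_exact, 4)
```

Keeping full precision until the last step gives about -4.011 instead of -4.017, so it is safer to round only the final result.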

Step 4 - Draw conclusions

Our conclusion is based on our analysis.

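Is this t-statistic of about -4.01 a large enough t value to claim a difference between this sample value and our null value? (With the rounded intermediate values above we get -4.017; without rounding the statistic is -4.011.)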

To answer that we turn to our critical values.

Remember we need degrees of freedom to use the t distribution, and in the case of the single sample t-test the degrees of freedom are equal to df = (n-1).

In our case, 11 degrees of freedom.

For a t distribution with 11 degrees of freedom, the critical values that cut off the lower and upper 2.5% are

t*(11) = ± 2.20 (to see this value, use a t-statistic table of cutoff values)
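
Instead of a printed table, the cutoff can also be obtained from R's quantile function for the t distribution. A small sketch, assuming the same alpha and degrees of freedom as above (the variable names are only illustrative):

```r
alpha <- 0.05
df <- 11                 # n - 1 for the 12 beer samples

# Upper critical value: the quantile that leaves alpha/2 in the upper tail
t_crit <- qt(1 - alpha / 2, df)

# Lower and upper cutoffs of the two-tailed test
round(c(-t_crit, t_crit), 2)
```

Because the t distribution is symmetric, the lower cutoff is simply the negative of the upper one.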

From the graph below

We can reject the null hypothesis since our calculated t statistic falls outside these two values and into the critical region of the t distribution.

So for 11 degrees of freedom we can reject the null hypothesis.

The IBU for this particular sample of beer batches brewed with this barley is significantly different from the tested IBU of 40. So if IBU is solely what distinguishes the taste of Guinness, it is safe to say that this new barley would not be good to use.

20180224100303

How to calculate the p-value

p-value (calculated using probabilities)

What is the probability of getting a t-statistic of -4.011 or smaller, or 4.011 or larger? We need both tails since we have a non-directional hypothesis test.

# Probabilities
prob1 = pt(-4.001, 11)
prob1
## [1] 0.001041315
prob2 = 1 -  pt(4.001, 11)
prob2
## [1] 0.001041315
# We add these probabilities together
prob = prob1 + prob2

# and we get our non directional p-value
prob
## [1] 0.00208263

Since 0.00208263 is less than our alpha level of 0.05, we can again reject the null hypothesis.
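
The two tail probabilities can also be computed in one step from the unrounded statistic. This is only a sketch (the names t_stat and p_two_sided are mine), relying on the symmetry of the t distribution:

```r
# Sample of 12 IBU measurements from the text
ibu <- c(38, 39, 41, 34, 37, 40, 38, 35, 37, 38, 36, 39)

# Unrounded t statistic against the null value of 40
t_stat <- (mean(ibu) - 40) / (sd(ibu) / sqrt(length(ibu)))
df <- length(ibu) - 1

# Two-sided p-value: twice the area in the tail beyond |t|
p_two_sided <- 2 * pt(-abs(t_stat), df)
p_two_sided
```

Doubling one tail is equivalent to adding the lower and upper tail probabilities, as done step by step above.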

20180224100304

How to use the t.test() function

Running the t.test() function

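In fact, this is essentially the same value you get if you run this test in R using the t.test() function. It reports p = 0.002047; the small difference from 0.00208 arises because the manual calculation above used -4.001 rather than the unrounded statistic of -4.0112.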

ibu = c(38, 39, 41, 34, 37, 40, 38, 35, 37, 38, 36, 39)

t.test(ibu,mu=40)
## 
##  One Sample t-test
## 
## data:  ibu
## t = -4.0112, df = 11, p-value = 0.002047
## alternative hypothesis: true mean is not equal to 40
## 95 percent confidence interval:
##  36.38634 38.94700
## sample estimates:
## mean of x 
##  37.66667

In fact, using the t.test() function is the easiest way to run the single sample t-test.

It even gives us the 95% confidence interval, which we can interpret.

We are 95% confident that the true population IBU for this type of brew process with this type of barley is between 36.38634 and 38.94700. This range does not capture our null hypothesis value of 40, which is exactly what we would expect since we rejected our null hypothesis.
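
To see where this interval comes from, it can be rebuilt by hand as the sample mean plus or minus the critical value times the estimated standard error. A minimal sketch (variable names are illustrative):

```r
# Sample of 12 IBU measurements from the text
ibu <- c(38, 39, 41, 34, 37, 40, 38, 35, 37, 38, 36, 39)
n <- length(ibu)

se <- sd(ibu) / sqrt(n)        # estimated standard error of the mean
t_crit <- qt(0.975, n - 1)     # two-tailed critical value for alpha = 0.05

# 95% confidence interval: mean -/+ critical value * standard error
ci <- mean(ibu) + c(-1, 1) * t_crit * se
ci
```

The interval uses the same critical value as the hypothesis test, which is why a 95% interval that excludes 40 agrees with rejecting the null hypothesis at alpha = 0.05.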

20180228155500

Should you use one-tail or two-tail test?

Some types of statistical tests specifically allow for a direction for the test statistic to be defined:

  1. z-test
  2. t-test

However, the choice of a direction might be biased.

For example, suppose a new climbing company has a new formula for fiber production that makes climbing ropes more resilient, giving its ropes a higher UIAA rating (the number of falls that a rope can handle before it needs to be replaced) than the leading competitors.

So in terms of hypothesis we have:

H0 : μ <= 17
Ha : μ > 17

Meaning that if μ is equal to or less than 17, the new company's rope is no better than the competitors'.

Let's investigate the results. Here is the data for the new rope.

Where 17.8 is the average of the UIAA rating for this sample.

If we solve for t we have:

So the t-value is 1.82.

Using a t-table we find that for a one-tail test the critical value is 1.761.

Therefore, in that case the t-value is greater than the critical value, and we would REJECT THE NULL HYPOTHESIS.

But is it the whole story?

So, recapitulating the hypothesis:

In that case we reject the null hypothesis, and therefore the new rope's rating is significantly higher than the competitors'.

Which is a great thing for the new company. In this case people can buy the new rope, for it is better than the competitors'.

If you use the t.test() function to calculate the t statistic you will come to the same conclusion.

The Set-up

There is, however, a catch here.

The trick here is the wording of the hypothesis. A different wording would lead to a different interpretation.

Let's suppose our hypothesis was non-directional, as follows:

H0 : μ = 17
Ha : μ ≠ 17

Using the same t-table for a two-tailed test we come up with a critical value of 2.145.

In this case the critical value of 2.145 is greater than the t-value found (1.82). The t-value (1.82) is within the boundaries set by the critical values (±2.145) and therefore, in this case, we cannot reject the null hypothesis.
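
The contrast between the two wordings can be checked numerically. The sketch below assumes df = 14, which is what the table values 1.761 and 2.145 correspond to; the raw rope data is not reproduced in these notes, so only the reported t-value of 1.82 is used:

```r
t_stat <- 1.82    # t-value reported in the text
df <- 14          # assumption: inferred from the table critical values

# Critical values for alpha = 0.05
crit_one_tail <- qt(0.95, df)     # all 5% in the upper tail
crit_two_tail <- qt(0.975, df)    # 2.5% in each tail

# Corresponding p-values
p_one_tail <- pt(t_stat, df, lower.tail = FALSE)
p_two_tail <- 2 * p_one_tail

round(c(one_tail = crit_one_tail, two_tail = crit_two_tail), 3)
c(reject_one_tail = t_stat > crit_one_tail,
  reject_two_tail = abs(t_stat) > crit_two_tail)
```

The same statistic is significant under the directional hypothesis and not significant under the non-directional one, which is exactly the trap described here.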

In this case we cannot claim that the company is making better ropes.

You would come to the same conclusion if you used the t.test() function.

The problem here is a moral one.

A company can therefore choose the direction that best suits its goals.

The solution (be skeptical): always prefer a non-directional hypothesis (two-tailed test). That is what serious scientific papers do!

20180303181500

Independent Samples t-test

Up to here we have only seen Hypothesis Testing as a way to compare a single sample mean to a comparison value: our null hypothesis value.

But we can also use the t-distribution to compare two independent samples to one another, allowing us to ask some interesting questions.

Conditions for Independent Sample t-test

  1. The two groups are completely independent;
  2. We want to compare the means of a continuous measure.

Assumptions

  1. Random sample;
  2. Independent groups;
  3. Population distribution for the two groups needs to be approximately normal.

Hypothesis Test

STEPS:

  1. Hypothesis Statements

    H0: µ1 - µ2 = 0 (There is no difference)
    H1: µ1 - µ2 ≠ 0

  2. Establish our alpha (α)

    α = 0.05

  3. Collect/Analyze Data

Calculating the t-statistic for Independent Samples

Let's break down the analysis. The t-statistic is:

t = ((x̄1 - x̄2) - (µ1 - µ2)) / SE

There are two groupings in the numerator:

  1. The first one represents the data;
  2. The second one represents the null hypothesis value.

So our numerator effectively compares our data to some hypothesised comparison value. In the case of the independent samples t-test, our data is actually the difference between two independent sample means.

Calculating the Standard Error for Independent Samples

The denominator is also familiar: it is a measure of standard error.

But there is a twist: it is a standard error based on the difference between the two sample means.

SE = sqrt(s1^2/n1 + s2^2/n2)

Observation: We need the standard deviation and the n of both samples.

Calculating the critical values

There is no formula to calculate the critical value by hand.

You should use a t table.

In a t table if you know the degrees of freedom, the alpha (usually 0.05) and whether you are using a one tail or a two tail test you can find the critical value for this distribution.

Calculating the degrees of freedom of independent sample t-test

Observation: An alternative, conservative option to the exact degrees of freedom calculation is to use the smaller of (n1-1) and (n2-1).
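
The pieces above can be put together in a small helper. This is only a sketch: welch_from_summary is a hypothetical function name, and I am assuming that the "exact" degrees of freedom are the Welch-Satterthwaite approximation (the fractional df = 22.20 in Example 1 suggests so). The inputs are the summary statistics of the smokers / non-smokers study from Example 2 of these notes.

```r
# Hypothetical helper: independent samples t-test from summary statistics
welch_from_summary <- function(mean1, sd1, n1, mean2, sd2, n2, null_diff = 0) {
  v1 <- sd1^2 / n1                 # variance of the first sample mean
  v2 <- sd2^2 / n2                 # variance of the second sample mean
  se <- sqrt(v1 + v2)              # standard error of the difference
  t_stat <- ((mean1 - mean2) - null_diff) / se
  # Welch-Satterthwaite degrees of freedom (assumed "exact" formula)
  df_exact <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  # Conservative alternative: smaller sample size minus one
  df_conservative <- min(n1, n2) - 1
  list(se = se, t = t_stat, df_exact = df_exact,
       df_conservative = df_conservative)
}

# Smokers (mean 80, sd 5, n 26) versus non-smokers (mean 74, sd 6, n 32)
res <- welch_from_summary(80, 5, 26, 74, 6, 32)
res
```

The conservative option always gives fewer degrees of freedom than the approximation, so it leads to a slightly larger critical value and a more cautious test.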

Conclusion

You should compare the t-statistic you found with the t-critical value.

If the t-statistic is beyond the critical value (greater than the positive critical value or smaller than the negative one), you should reject the null hypothesis.

In other words, the p-value < 0.05.

Examples

20180303181502

Example 1

Source of the study

In this example we are using the following study: Daake D. & Gueldner S. (1989) Imagery instruction and the control of postsurgical pain. Applied Nursing Research, 2(3), 114-20.

Study sample

32 voluntary patients were randomly assigned to 2 groups.

A control group that simply received procedural information prior to the surgery, and an experimental group, that received both procedural information and information on how to perform pleasant imagery to help with pain reduction.

Perceived pain was measured on the day after surgery using a visual analog scale:

I do not have any pain ————————- My pain could not be worse

A great way to measure continuous self reported data.

Subjects simply marked where they thought their pain was on a horizontal line.

Question: Will the group trained with pleasant imagery report different post-surgical perceived pain from the control group?

Study data

Where,

X1 and X2 are the means of the pain scale (final mean pain score).

S1 and S2 are the standard deviations of the two groups.

n1 and n2 are the number of participants in each group.

Calculating the t-statistic

Conclusions

Once we have the t-value we need the degrees of freedom to draw a conclusion.

However, the formula for the degrees of freedom of the Independent Samples t-test is more involved:

df = (s1^2/n1 + s2^2/n2)^2 / ((s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1))

Luckily you have software to make the calculations.

You can also use the value

(nlowest - 1) as the degrees of freedom, but only if you can't calculate the exact value.

In our example the df = 22.20

# Using a t distribution with 22.2 degrees of freedom gives us a critical value of plus or minus 2.07279

qt(0.025, 22.2)
## [1] -2.07279

Since our t-value falls beyond the critical value, into the critical region, we can: Reject the null hypothesis

In other words, the two groups of participants are not equal on their visual analog scale scores.

In fact, the experimental group had a lower perceived pain score than the control group, indicating that using pleasant imagery was effective at lowering perceived post-surgery pain. Observe that this is a two-tailed test.

Using the t.test() function

You can do the same as above using the t.test() function. However, you would need both datasets to do so.

So, let's suppose you had the following variables:

  1. Control: variable with the data from the control group. Scores of pain.
  2. Experimental: variable with the data from the experimental group. Scores of pain.

And you could run the t.test()

t.test(Control, Experimental)

And by doing so you could find that:

Which would lead us to the same conclusion: since the t statistic is beyond the critical value, we can reject the null hypothesis, with p-value < 0.05.

The p-value of 0.0002407 would also tell you to reject the null hypothesis.

This is from the examples of the course, and the data was not provided, so I could not run the test myself.

And for added visualization we could use a graph to show this difference.

20180303181503

Example 2

Doing the calculations by hand

Source of the study

A study was conducted to compare the resting pulse rates of college smokers and non-smokers.

Study sample

The data for a randomly selected group is summarized in the table below. Pulse rates were normally distributed within each group.

Hypothesis statements

H0: meanSmoker = meanNonSmokers, or meanSmoker - meanNonSmokers = 0
H1: meanSmoker > meanNonSmokers

Alpha level (α)

α = 0.05

Variables needed for the calculations

mean1 = 80 
mean2 = 74
SD1 = 5
SD2 = 6
N1 = 26
N2 = 32

Calculating Standard Error

  SE = sqrt(((SD1)^2/N1)+((SD2)^2/N2))
  SE
## [1] 1.444486

Calculating the t-statistic

  # The null hypothesis difference (0) is subtracted from the observed difference
  t.statistic = ((mean1-mean2) - (0))/SE

  t.statistic
## [1] 4.153728

Calculating the degrees of freedom

Using the conservative approach, the smaller sample size is 26, so we use df = n - 1.

df(n-1) = 25

Calculating the critical value (using a t-table)

Using a t-table we have

alpha = 0.05
one tail (h1: mean1 > mean2)
df(n-1) = 25

Therefore: Critical value = 1.708
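
The table lookup can be reproduced in R. A small sketch, assuming the conservative df = 25 and the t statistic computed above (variable names are illustrative):

```r
t_stat <- 4.153728    # t statistic computed above
df <- 25              # conservative degrees of freedom (smaller n minus one)
alpha <- 0.05

# One-tailed critical value: all of alpha sits in the upper tail
t_crit <- qt(1 - alpha, df)

# One-tailed p-value: area above the observed t statistic
p_one_tail <- pt(t_stat, df, lower.tail = FALSE)

round(t_crit, 3)
p_one_tail < alpha
```

For a one-tailed test the whole alpha goes into a single tail, which is why the cutoff is smaller than the two-tailed value for the same degrees of freedom.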

Conclusion

Since the t statistic (4.15) is greater than the critical value (1.708), we can reject the null hypothesis, with p-value < 0.05. Therefore the mean resting pulse rate of college smokers is significantly higher than that of non-smokers.

20180303181501

Paired Samples t-test

Formalizing

If we want to know if two means are different from one another and these two means are not independent then we must use what is called the Paired Sample t-test (dependent samples t-test).

This is also known as: Pre Post Design

Case Study

The study observed 42 children who had recently undergone a tonsillectomy.

A day after surgery, children were asked if they were in any pain and asked to point to a position on a Faces Pain Scale that represented their pain.

If they were in pain, they were asked (with their parents) if they wanted to receive acupuncture to help alleviate the pain.

31 participants opted to receive acupuncture and were asked their pain score immediately following the acupuncture procedure.

Let’s look at some simulated data based on the study where pre and post stand for the pain scale.

pre=c(10,9,8,8,7.5,7,7,7,7,6,6,6,5,4,4,4,4,4,2,2,7)
post=c(10,8,6,7,7,7,6,6,6,5,4,5,4,4,3,3,2,3,1,2,5)
diff = post-pre
SubjectsID=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21)
experiment = data.frame(SubjectsID,pre,post,diff)
str(experiment)
## 'data.frame':    21 obs. of  4 variables:
##  $ SubjectsID: num  1 2 3 4 5 6 7 8 9 10 ...
##  $ pre       : num  10 9 8 8 7.5 7 7 7 7 6 ...
##  $ post      : num  10 8 6 7 7 7 6 6 6 5 ...
##  $ diff      : num  0 -1 -2 -1 -0.5 0 -1 -1 -1 -1 ...

Let's compare these two sets of values with a plot

boxplot(experiment$pre, experiment$post,names=c("pre","post"), ylab="Pain Scale", main="Effect of acupuncture on pain Incorrect CI", xlab="Time")

We can see a large overlap between the pre and post scores. Looked at this way, the difference does not seem to be significant.

However, an independent samples t-test is not the test to be performed here, because we would violate important assumptions, namely:

  1. Violation of the assumption of independence. The columns are related.
  2. We would be using the wrong errors: the standard error of each column. Doing so would be saying that the pre-measure is unrelated to the post-measure, which is incorrect. We would be breaking the relationship between the two columns: pre and post are related to one another through the subject.

In order to capture this idea of relatedness and come up with the correct test, we turn to difference scores.

Since the two measures are related to one another through the subject, we can subtract one column from the other and come up with a change score.

This single value now represents the change from pre to post for each subject.

Averaging these values gives us the average change.

A single sample mean. Does that sound familiar?

And what if we wanted to know whether that mean difference was significant? If the treatment had no effect, then the comparison value (the average of the differences) should be zero (0).

See where this is going?

This is basically a single sample t-test (just using the difference scores as our single sample).
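
This equivalence can be checked directly in R with the simulated pre and post scores from above. The sketch below (the names change, single and paired are mine) runs a single sample t-test on the difference scores and a paired t-test on the two columns:

```r
# Simulated pain scores from the text
pre  <- c(10,9,8,8,7.5,7,7,7,7,6,6,6,5,4,4,4,4,4,2,2,7)
post <- c(10,8,6,7,7,7,6,6,6,5,4,5,4,4,3,3,2,3,1,2,5)

# One change score per subject
change <- post - pre

# Single sample t-test on the change scores against zero
single <- t.test(change, mu = 0)

# Paired t-test on the two related columns
paired <- t.test(post, pre, paired = TRUE)

c(single = unname(single$statistic), paired = unname(paired$statistic))
```

Both calls should report the same t statistic, degrees of freedom and p-value, since the paired test is computed from the differences.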

Assumptions

To do so we have some assumptions for this particular test:

  1. The sample of differences is random;
  2. Each difference score is independent from another;
  3. Subjects are independent;
  4. And the distribution of differences in the population is normal.

These assumptions sound a lot like those of the single sample t-test.

However, instead of looking at a single raw score, we are looking at a difference between two related raw scores, but it is still just one sample of values.

Step 1: Hypothesis statements

Our null hypothesis states that the difference between the two related scores will be some value, usually zero. Instead of μ we have the Greek letter delta (δ) representing a difference score.

Our alternative hypothesis is that the difference will not be equal to zero.

This is a non-directional test. We don't know what effect acupuncture will have on the perceived pain scores of the subjects.

Step 2: Establish our alpha

α = 0.05

Step 3: Collect / Analyse Data

The formula for t looks a lot like the single sample t-test, but instead of referencing a single sample of raw values, it references a single sample of differences:

t = (d̄ - δ) / (s_d / sqrt(n))

The numerator compares the mean difference from our data:

d̄

to some Null Hypothesis value:

δ

Usually zero.

The denominator is another standard error, this time the estimated standard error of the mean DIFFERENCE. Again it is similar to the single sample t-test estimated standard error of the mean, but this time it uses the standard deviation of the difference scores (s_d), with n being the number of difference values.

Solving for our example

pre=c(10,9,8,8,7.5,7,7,7,7,6,6,6,5,4,4,4,4,4,2,2,7)
post=c(10,8,6,7,7,7,6,6,6,5,4,5,4,4,3,3,2,3,1,2,5)
diff = post-pre
SubjectsID=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21)
experiment = data.frame(SubjectsID,pre,post,diff)

# Mean of the difference scores and the null hypothesis value
d = mean(diff)
delta = 0
nValue = nrow(experiment)

# Estimated standard error of the mean difference: sd(diff) / sqrt(n)
sdDiff = sd(diff)
SEd = sdDiff/sqrt(nValue)

# t statistic: (mean difference - null value) / standard error
t = (d - delta)/SEd
t
## [1] -6.970209

This result is different from the one in the video; as noted below, the video example has more observations than the 21 used here.

Step 4: Draw conclusions

Using a t-table we have a critical value of ± 2.086 for df = n - 1 = 20. The t we found (-6.97) is far beyond the critical value, so we reject the null hypothesis.

However, in the video example df = (n-1) = 30, so I am missing some data, although the conclusion is the same.
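
Both table values can be reproduced with qt(). A short sketch for the 20 degrees of freedom of the data used here and the 30 degrees of freedom of the video example (variable names are illustrative):

```r
alpha <- 0.05

# Two-tailed critical values: alpha/2 in each tail
crit_20 <- qt(1 - alpha / 2, 20)   # 21 difference scores
crit_30 <- qt(1 - alpha / 2, 30)   # 31 difference scores

round(c(df20 = crit_20, df30 = crit_30), 3)
```

The cutoff shrinks as the degrees of freedom grow, but a t statistic near -6.97 is far beyond either value.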

Below you can see the example of the video.

The explanation is as follows:

Does this t-statistic indicate a difference between the two samples?

To answer that we turn to our t-distribution and our degrees of freedom. The paired samples t-test uses the same idea of degrees of freedom as the single sample t-test (n-1), with n here being the number of difference values in the data. In the video, with 31 difference values, 30 (n-1) degrees of freedom give a critical value of ± 2.04 (using a t-table); for the 21 values used above, 20 degrees of freedom give ± 2.086. Since our t-statistic is beyond this critical value and into the critical region, we reject the null hypothesis.

The difference between the pre and post pain scores is in fact not zero. Examining the means of the two scores, we see that pain significantly decreases after acupuncture.

Using the t.test() function

t.test(experiment$pre, experiment$post, paired=TRUE)
## 
##  Paired t-test
## 
## data:  experiment$pre and experiment$post
## t = 6.9702, df = 20, p-value = 9.144e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.6840475 1.2683335
## sample estimates:
## mean of the differences 
##               0.9761905

If we want to visualize dependent samples, each subject should be represented by a line to indicate the relationship between pre and post, with a 95% confidence interval using the method proposed by Morey in 2005.

Although pre/post data is the primary use of the paired samples t-test, it can also be used to study twins (just a piece of trivia).

20180210103903

Categorical Data

When it comes to Hypothesis Testing for categorical variables, and only categorical variables, we have a few choices depending upon our data and our question of interest.

20180321234201

Chi-Square Goodness of Fit

Let's see how we run a hypothesis test on categorical data with just one categorical variable.

It is used when we have only one categorical variable and we can ask a simple question about the distribution of responses.

e.g.
Do you prefer red, black, or white?

It compares the observed counts from a table of counts to calculated expected counts.

To do this kind of test we will need a new statistical test, the Chi-Square Test and a new distribution, the Chi-Square distribution.

What does a person do when they sneeze or cough?

Do they use a tissue, the elbow, their hands, or do nothing?

In the study entitled “Examining university students' sneezing and coughing etiquette”, the following question was posed:

Was the pattern of behavior non-random?

From the table of counts below you can see the pattern is not random.

If it were, we would see a percentage of 25% for each category.

However the real question is:

Can we determine if this observed pattern is significantly different from what we would expect (a random distribution of responses)?

We can assess this with another hypothesis test, called the Chi-Square Goodness of Fit (developed by Karl Pearson and expanded by Ronald Fisher).

Assumptions

  1. Random sample;
  2. Independent observations;
  3. 3.1 No expected count can be less than 1
    3.2 No more than 20% of cells can have an expected count of less than 5.

Step 1: Hypothesis statements

H0: p = 1/4, for each of the possible behaviors
H1: Not all probabilities are 1/4.

Step 2: Alpha level

Standard 0.05

Step 3: Collect/Analyse

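The Chi-Square goodness of fit formula is as follows:

X^2 = Σ (O - E)^2 / E

where O is the observed count and E is the expected count of each category, summed over all categories.

X^2 is the so-called chi-square statistic.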

Let´s see one example:

Example - Calculating by hand

To solve the formula we need to do some table calculations:

After that we can sum up:

Which gives us a chi-square statistic of:

X^2 = 216.76
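
The table calculations can be reproduced in a few lines of R. This is a sketch using the observed counts reported by R further down in these notes (Elbow 86, Hand 204, NoCover 89, Tissue 2); the variable names are only illustrative:

```r
# Observed counts of each behaviour
observed <- c(Elbow = 86, Hand = 204, NoCover = 89, Tissue = 2)

# Under the null hypothesis every category gets 1/4 of the 381 observations
expected <- rep(sum(observed) / length(observed), length(observed))

# Contribution of each category: (observed - expected)^2 / expected
contribution <- (observed - expected)^2 / expected

# The chi-square statistic is the sum of the contributions
chi_sq <- sum(contribution)

round(contribution, 2)
round(chi_sq, 2)
```

Looking at the individual contributions shows which categories drive the statistic: here almost all of it comes from the hand and tissue categories.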

Questions

How do we know if that test statistic represents a significant departure of the observed counts from the expected counts? How can we tell if we reject or fail to reject our null hypothesis?

Chi-Square Distribution

To answer the question above we turn to the Chi-Square distribution.

The Chi-Square distribution is driven by the degrees of freedom. The test is directional: only the upper tail of the distribution is used.

In a Chi-Square distribution the critical value is always positive.

In our example, df = 3, since we have 4 cells.

So, df = m - 1 (where m stands for the number of categories of observed counts)

The figure above shows the critical value alongside the test statistic (X^2) of 216.76.

How do you come up with this value?

You use a table of critical values for the Chi-Square distribution.
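
As an alternative to the printed table, the critical value can be obtained from R's chi-square quantile function. A minimal sketch, assuming alpha = 0.05 and the 3 degrees of freedom of this example:

```r
alpha <- 0.05
df <- 4 - 1     # number of categories minus one

# Critical value: the quantile that leaves alpha in the upper tail
crit <- qchisq(1 - alpha, df)

# p-value of the observed statistic: area above 216.76
p_value <- pchisq(216.76, df, lower.tail = FALSE)

round(crit, 2)
p_value < alpha
```

Only the upper tail is used because large values of the statistic, and only large values, indicate a departure from the expected counts.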

Step 4: Draw Conclusions

Our X^2 is greater than the critical value (216.76 > 7.81).
So we reject the null hypothesis.
This behaviour does not follow a random distribution.
The pattern of behaviour is not random.

Calculating Chi-Square Using R

Observe that in order to apply the chisq.test() function you need a table of counts (a one-way contingency table).

list = read.csv('hypothesistest1.csv')
event = table(list$V1)
event
## 
##   Elbow    Hand NoCover  Tissue 
##      86     204      89       2
chisq.test(event)
## 
##  Chi-squared test for given probabilities
## 
## data:  event
## X-squared = 216.7638, df = 3, p-value < 2.2e-16

We can see the p-value (p-value < 2.2e-16) is way below our alpha (.05) .

We reject the null hypothesis.

Summarizing

The red line shows the reader what the expected count is, so the reader can easily see just how far off the observed counts are.

That's how we run a hypothesis test on categorical data with just one categorical variable.

20180321234202

Categorical Data (Expected distribution)

What if we didn't want a random distribution for our null hypothesis? Could we test a specific, expected distribution pattern? Sure, we can!

The Chi-Square test can be used to test an observed pattern of categorical data against a random (equal probability) distribution.

However, the test can also be used to test a specific null distribution, not just a random one.

This test is about comparing counts: what is observed and what is expected.

And in this particular case we (the investigators) can decide exactly what the expected values should be.

The underlying point here is that, given what we know about a certain category, we might have reasons to expect a heavier distribution on some of the categories than on others. That is the difference from the equal distribution in the previous application, which expected 1/4 (25%) for each category.

In our example, the CDC recommends using a tissue or the elbow to cough or sneeze. The other options might carry a bigger risk of transmission. So it is reasonable to expect a larger share of tissue/elbow use.

So instead of a random distribution we can set the distribution to be more aligned with the CDC recommendations (a different null distribution).

So if 70% is the expected use of tissue or elbow under the CDC recommendations, we have 35% for each of them, and the remaining 30% is split into 15% for each of the remaining two behaviours.

From here on we use the same procedure used for an equal distribution of probabilities.

Assumptions

  1. Random sample;
  2. Independent observations;
  3. 3.1 No expected count can be less than 1
    3.2 No more than 20% of cells can have an expected count of less than 5.

Step 1: Hypothesis statements

As you saw, the assumptions hold, but there are slight changes to the hypotheses.

H0:
p1=.35 (tissue)
p2=.35 (elbow)
p3=.15 (hand)
p4=.15 (no cover)

H1: Not all probabilities in H0 are correct.

Step 2: Alpha level (α)

Standard 0.05

Step 3: Collect/Analyse

Now we use the new specific percentages to calculate the expected counts.

With X^2 = 541.28 and an X^2 critical value of 7.81 (use a Chi-Square table to find the critical value), we are ready to draw conclusions.
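
The expected counts and the statistic for the specified distribution can be worked out by hand in R. A sketch, using the observed counts from these notes and the 35% / 35% / 15% / 15% null distribution described above (the names null_p and chi_sq are mine):

```r
# Observed counts of each behaviour
observed <- c(Elbow = 86, Hand = 204, NoCover = 89, Tissue = 2)

# Null distribution: 35% for tissue and elbow, 15% for hand and no cover
null_p <- c(Elbow = 0.35, Hand = 0.15, NoCover = 0.15, Tissue = 0.35)

# Expected counts: total number of observations times each probability
expected <- sum(observed) * null_p

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_sq <- sum((observed - expected)^2 / expected)

round(expected, 2)
round(chi_sq, 2)
```

Only the expected counts change compared with the equal-probability case; the formula and the degrees of freedom stay the same.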

Step 4: Draw Conclusions

Since:

Our X^2 > X^2 critical value.
541.28 > 7.81 (Chi-Square table)

We reject the null hypothesis.

The observed distribution of the categorical variable of behaviours does not follow the expected distribution of behaviours.

Calculating Chi-Square Using R

Again we can use the chisq.test() function, but this time we need to tell the function what the expected probabilities should be.

list = read.csv('hypothesistest1.csv')
event = table(list$V1)
event
## 
##   Elbow    Hand NoCover  Tissue 
##      86     204      89       2
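# Expected probabilities, in the same (alphabetical) order as the table: Elbow, Hand, NoCover, Tissue
expected_p = c(.35,.15,.15,.35)
chisq.test(event, p=expected_p)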
## 
##  Chi-squared test for given probabilities
## 
## data:  event
## X-squared = 541.2822, df = 3, p-value < 2.2e-16

This time we told the function what the expected probabilities should be.

Again the p-value < .05, so we reject the null hypothesis.

The takeaway from all this:  
1. The hypothesis test for a single categorical variable uses the Chi-Square goodness of fit test.  
2. That test can use either equal probabilities or specific probabilities we define.  
The mechanics of the test are just the same.

20180321234203

Chi-Square test of independence (Comparing two categorical variables)

Used when we want to compare two categorical variables.

  1. Can we expand the idea of comparing observed and expected counts to a contingency table?
  2. Can we move into a hypothesis test?

Sure we can, all it takes is a second categorical variable.

In other words:

Is one categorical variable independent of another categorical variable?

From the table above we already know how to examine the distribution of a single categorical variable of behaviour, namely, how people deal with a sneeze.

We use a Chi-Square Goodness of Fit, which allows us to use the observed counts and the expected counts from a hypothesised distribution to run a hypothesis test.

However, the researchers in the original experiment also recorded the gender of the subjects.

It allows us to go from a Table of Counts to a Contingency Table.

Based on the contingency table and with the use of row percentages:

We see that the behaviour distribution looks different for each gender. Perhaps the behaviour is not in fact independent of gender.

It looks like gender might be driving a different behaviour response.

How do we test the question:  
1) Is the respiratory event behaviour independent of gender?  
2) How do we test the independence of one categorical variable from another categorical variable?

Chi-Square test of independence

If we have categorical data (two categorical variables) and we want to examine how the distribution of one categorical variable might differ when we consider another categorical variable, we use the Chi-square test of independence (created by Karl Pearson).

Assumptions

  1. Random sample;
  2. Independent observations;
  3. Actually two assumptions rolled into one:
    3.1 No expected count < 1;
    3.2 No more than 20% of cells with an expected count of less than 5.

Step 1: Hypothesis statements

H0: Variable A and variable B are independent of one another.
Ha: Variable A and variable B are in fact not independent of one another.

Another way to think about this:
Under the Null Hypothesis, the relative proportions of one variable are independent of the second variable.

If this is true, the proportions or percentages of one variable would be the same for different values or levels of the second variable.

Step 2: Alpha level (α)

Standard 0.05

Step 3: Collect/Analyse

The formula for the Chi-square test of independence is the same as for the goodness of fit test:

X^2 = Σ (O - E)^2 / E

We have our observed counts. What about the expected counts? Where do they come from?

We can use our marginal distributions.

Remember the idea of the null: the proportions or percentages of one variable will be the same for different values of the second variable.

Let´s take a look at the contingency table.

Can we see what the proportions of behaviour SHOULD be if it was not associated with gender?

Look at the Marginal Distribution (2, 86, 204, 89 | 264, 117) below:

Now look at the Proportions related to that (0.0052, 0.2257, 0.5354, 0.2336 | 264, 117) below.

Namely:
Tissue (0.0052): 2 out of 381 observations.
Elbow (0.2257): 86 out of 381 observations.
Hand (0.5354): 204 out of 381 observations.
No cover (0.2336): 89 out of 381 observations.

So if all were right with the world and gender was not related to behaviour we should see the same proportions for each value of gender.

Let's calculate the EXPECTED COUNTS based on the marginal distributions above

As seen above, the EXPECTED PROPORTIONS were:

Tissue (0.0052): 0.52%
Elbow (0.2257): 22.57%
Hand (0.5354): 53.54%
No cover (0.2336): 23.36%

We have:
264 Women
117 Men

If we apply the proportions above to the counts of Men and Women we would have:

For Women:
Tissue (0.0052 x 264): 1.3728 Women
Elbow (0.2257 x 264): 59.5848 Women
Hand (0.5354 x 264): 141.3456 Women
No cover (0.2336 x 264): 61.6704 Women

For Men:
Tissue (0.0052 x 117): 0.6084 Men
Elbow (0.2257 x 117): 26.4069 Men
Hand (0.5354 x 117): 62.6418 Men
No cover (0.2336 x 117): 27.3312 Men

Let´s reassemble the contingency table with the results above:

Now we can move on to using the chi-square formula.

To solve the formula we need the observed and expected values.

The secret is to apply the chi-square formula to every cell of the contingency table, first for one level of gender (female) and then for the other (male), and add up all the results to find the overall X^2 value.

So, if we apply the formula:

we come up with the Chi-square statistic of

X^2 = 18.25

as you can see below.
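
The whole calculation can be sketched in R. The per-gender counts for elbow, hand and no cover are the ones used in the R table later in these notes; the tissue column (2 women, 0 men) is my inference from the row totals of 264 women and 117 men, so treat it as an assumption:

```r
# Observed contingency table (tissue column inferred from the row totals)
observed <- matrix(c(2, 61, 155, 46,
                     0, 25,  49, 43),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Women", "Men"),
                                   c("Tissue", "Elbow", "Hand", "NoCover")))

# Expected count of a cell: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic summed over all eight cells
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

round(expected, 2)
round(chi_sq, 2)
```

The expected counts here use the unrounded marginal proportions, so they differ slightly in the later decimals from the hand calculation above, which used proportions rounded to four places.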

Degrees of freedom

Now, let's find our degrees of freedom. In the chi-square test of independence, the degrees of freedom are the number of levels of one categorical variable minus one, times the number of levels of the other categorical variable minus one, as follows:

(Levels of A - 1) * (Levels of B - 1)

In other words:

(Levels of Behaviour - 1) * (Levels of Sex - 1)

We can think of it as rows minus one times columns minus one

(Rows - 1) * (Columns - 1)

In our example we have

(4-1)*(2-1) = 3 degrees of freedom

The chi-square critical value at our chosen alpha level for 3 degrees of freedom is (consult table)

X^2(3) critical = 7.81
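
Instead of consulting a printed table, the same cutoff can be looked up with qchisq(). The sketch assumes an alpha level of 0.05, which is consistent with the 7.81 quoted above.

```r
alpha <- 0.05                        # assumed alpha level
dof   <- (4 - 1) * (2 - 1)           # (rows - 1) * (columns - 1)
critical <- qchisq(1 - alpha, dof)   # upper-tail critical value
round(critical, 2)

# Upper-tail probability of the statistic reported above
pchisq(18.25, dof, lower.tail = FALSE)
```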

Step 4: Draw Conclusions

Since X^2 > Critical X^2

We reject the null hypothesis. The two variables are not independent of one another. We can readily see this if we look at the row percentages.

Use of tissue and elbow is about the same for males and females, but females tend to use their hands more than males, and males tend to use nothing at all more than females.

Houston we have a problem

A violation of an assumption:

  • Not only do we have a cell with an expected count of less than one;
  • But two cells with an expected count of less than five, which violates our 20% rule.
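
These two rules of thumb can be checked programmatically. A minimal sketch, rebuilding the expected counts from the marginal totals given earlier (variable names are illustrative):

```r
behaviour_totals <- c(Tissue = 2, Elbow = 86, Hand = 204, NoCover = 89)
gender_totals    <- c(Women = 264, Men = 117)
expected <- outer(gender_totals, behaviour_totals / sum(behaviour_totals))

any_below_one    <- any(expected < 1)    # is any expected count below 1?
share_below_five <- mean(expected < 5)   # fraction of cells with expected count below 5

any_below_one
share_below_five   # compare with the 20% rule
```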

There are two choices here:

  1. Continue and tell the reader that we violated one of our assumptions of chi-square and that our results may be off;
  2. Or we can try and fix it.

To fix it we have two options:

  1. We can combine levels, so that the expected counts are better and that combination needs to make sense, or;
  2. We can lose some data.

In our case our best bet is to drop the tissue data: only two people used a tissue, and both were male.

So we drop the tissue column and rerun the analysis.

Now we can run the test in R.

To do so you must reproduce the following table

events <- matrix(c(61,155,46,25,49,43),ncol=3,byrow=TRUE)
colnames(events) <- c("Elbow","Hand","NoCover")
rownames(events) <- c("Women","Men")
health <- as.table(events)
class(events)
## [1] "matrix"
class(health)
## [1] "table"
events 
##       Elbow Hand NoCover
## Women    61  155      46
## Men      25   49      43
health
##       Elbow Hand NoCover
## Women    61  155      46
## Men      25   49      43
chisq.test(events)
## 
##  Pearson's Chi-squared test
## 
## data:  events
## X-squared = 17.3078, df = 2, p-value = 0.0001744
chisq.test(health)
## 
##  Pearson's Chi-squared test
## 
## data:  health
## X-squared = 17.3078, df = 2, p-value = 0.0001744

You can see that the p-value is under our alpha level, so AGAIN we can reject the null hypothesis.
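
To see where the X-squared value comes from, the statistic can be recomputed by hand from the same table. This is a sketch that only uses base R; the intermediate names are illustrative.

```r
events <- matrix(c(61, 155, 46, 25, 49, 43), ncol = 3, byrow = TRUE,
                 dimnames = list(c("Women", "Men"), c("Elbow", "Hand", "NoCover")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(events), colSums(events)) / sum(events)

x2  <- sum((events - expected)^2 / expected)    # Pearson chi-square statistic
dof <- (nrow(events) - 1) * (ncol(events) - 1)
p   <- pchisq(x2, dof, lower.tail = FALSE)

c(X2 = x2, df = dof, p = p)
min(expected)   # smallest expected count, to re-check the assumption
```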

20180411162500

More Than Two Group Means

========================================================================

========================================================================

Types of test

Parametric: assume a Gaussian distribution for both variables.
Non-parametric: work with any distribution.

Types of comparison

Paired (dependent): comparison between two distinct variables.
Unpaired (independent): comparison of two subsets of cases of the same variable.

Application of the tests

t Student
Wilcoxon
Mann-Whitney U

Anova (to understand, see https://www.youtube.com/watch?v=-oZ2Etv5V1M)
Repeated-measures ANOVA (rANOVA): see https://statistics.laerd.com/statistical-guides/repeated-measures-anova-statistical-guide.php
MANOVA: see https://www.youtube.com/watch?v=e0EBqqwEZr4

Kruskal-Wallis

Measuring the p-value

        bt <- seq(60, 120, 1) 
        plot(bt, dnorm(bt, 90, 10), type="l", xlim=c(60, 120), main="two-tailed test") 
        pnorm(72, 90, 10)  
## [1] 0.03593032
        abline(v=72) 
        cord.x <- c(60,seq(60,72,1),72) 
        cord.y <- c(0,dnorm(seq(60, 72, 1), 90, 10),0) 
        polygon(cord.x,cord.y,col='skyblue') 
        cord.x1 <- c(108,seq(108,120,1),120) 
        cord.y1 <- c(0,dnorm(seq(108, 120, 1), 90, 10),0) 
        polygon(cord.x1,cord.y1,col='skyblue') 
        text(65, 0.005, round(pnorm(72, 90, 10), 3)) 
        text(115, 0.005, round(pnorm(72, 90, 10), 3)) 
        text(75, 0.02,  " p = 0.072 (blue shaded area) "  ) 
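
The number printed on the plot is simply the sum of the two shaded tail areas, which are equal by symmetry. A short sketch of that arithmetic, using the same mean (90), standard deviation (10) and cutoffs (72 and 108) as the plotting code:

```r
lower_tail <- pnorm(72, mean = 90, sd = 10)                        # area below 72
upper_tail <- pnorm(108, mean = 90, sd = 10, lower.tail = FALSE)   # area above 108
p_two_tailed <- lower_tail + upper_tail
round(p_two_tailed, 3)
```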

Functions

t-test

Created by William Gosset in 1908. Also known as Student's t-test, because the Guinness Brewery wouldn't allow Gosset to publish his results under his own name, so he used Student as a pseudonym.

Gosset's problem was “inferring from small sample sizes” and the context was process control sampling in the brewery.

So when standard deviation of the population is unknown you will have to infer the population standard deviation from the sample standard deviation.

The result of a t-test is the statistic T, which is not normally distributed; it follows the broader T distribution.

Our estimate of the population standard deviation grows better as sample sizes grow larger.

So, there is one T distribution for each sample size, or more generally, for each degree of freedom (DF).

For sample sizes larger than 300 the T distribution will be very similar to a Z distribution.
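
One way to see this convergence is to compare critical values of the T and Z distributions. A small sketch (the chosen degrees of freedom are arbitrary examples):

```r
dof    <- c(5, 10, 30, 100, 300)
t_crit <- qt(0.975, dof)     # two-tailed 5% critical values of T
z_crit <- qnorm(0.975)       # corresponding critical value of Z

data.frame(df = dof, t_crit = round(t_crit, 3), z_crit = round(z_crit, 3))
```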

Test functions replace measurement units such as cm or kg by variability as a new unit. The goal is to find out if the difference is large or negligible, given the variability.

    t.test()

    alternative = c("two.sided","less", "greater")

            One tailed test: "less" or "greater"
            Two tailed test: "two.sided"


    Paired

            Group wise test (default): Comparing the mean of two independent groups
            
                    paired = FALSE

            Repeated measures: One subject has been measured twice.
            
                    paired = TRUE

    var.equal

            Classical t-test

                    var.equal = TRUE

            Welch test (default): Compensates for different variations in the two groups.
            Calculates a different degree of freedom for the test.

                    var.equal = FALSE

Parametric

Group wise t-test (Unpaired)

Suppose you have two different samples (Control Group and Risk Group). You want to know whether the mean blood pressure differs between the populations you have drawn the samples from. So you compare the means of two independent groups.

To use a t-test, both samples must meet the parametric assumption (approximately normal data), since they are different samples of the same variable we want to measure.

weight = read.csv("ttests.csv")
weight = weight[,-1]
names(weight) = c("Control Group", "Risk Group")
weight
##    Control Group Risk Group
## 1             97         76
## 2             98         87
## 3             78         89
## 4             80         90
## 5             81         94
## 6             84         96
## 7             85         98
## 8             85         99
## 9             86        100
## 10            94        106
## 11           100        109
## 12           102        112
## 13           103        113
## 14           104        105

A sample of size n>=30 is usually considered large enough for you to assume that the sampling distribution of the mean is approximately normal, so you can use the t-test.

You can either use the Welch Two Sample t-test or the two sample t-test. See below how to do this.

One Sample t-test

var = c(38, 39, 41, 34, 37, 40, 38, 35, 37, 38, 36, 39)
mu = 40 # hypothesized value of the mean under the null hypothesis

t.test(var,mu=mu)
## 
##  One Sample t-test
## 
## data:  var
## t = -4.0112, df = 11, p-value = 0.002047
## alternative hypothesis: true mean is not equal to 40
## 95 percent confidence interval:
##  36.38634 38.94700
## sample estimates:
## mean of x 
##  37.66667
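
The t value in this output can be reproduced by hand, which makes the idea of variability as the unit concrete: the difference between the sample mean and the hypothesized mean is divided by the standard error. A sketch using the same data (the vector is called x here only to avoid shadowing R's var() function):

```r
x  <- c(38, 39, 41, 34, 37, 40, 38, 35, 37, 38, 36, 39)
mu <- 40                                   # hypothesized mean

se     <- sd(x) / sqrt(length(x))          # standard error of the mean
t_stat <- (mean(x) - mu) / se              # difference expressed in standard errors
dof    <- length(x) - 1
p      <- 2 * pt(-abs(t_stat), dof)        # two-tailed p-value

round(c(t = t_stat, df = dof, p = p), 4)
```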

Welch Two Sample t-test

In the example below R calculates the Welch t-test (var.equal=FALSE, the default), which takes into account a different variability in the two groups, comparing the means of two independent groups (paired=FALSE). Observe that the p-value is >= 0.05, so we do not reject the null hypothesis: we have no evidence of a difference between the means of these samples. Observe that the DF (degrees of freedom) is not a whole number; this is a result of the Welch calculation.

weight = read.csv("ttests.csv")
t.test(weight$A, weight$B)
## 
##  Welch Two Sample t-test
## 
## data:  weight$A and weight$B
## t = -1.8423, df = 25.684, p-value = 0.077
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -14.6635326   0.8063898
## sample estimates:
## mean of x mean of y 
##  91.21429  98.14286

Two Sample t-test

In the example below we will assume the same variability in the two groups (var.equal=TRUE). Nevertheless, the p-value is still >= 0.05, so we would not reject the null hypothesis. Observe that the DF is 26, which is the 28 measurements minus 1 per group.

weight = read.csv("ttests.csv")
t.test(weight$A, weight$B, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  weight$A and weight$B
## t = -1.8423, df = 26, p-value = 0.07686
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -14.6589062   0.8017634
## sample estimates:
## mean of x mean of y 
##  91.21429  98.14286
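
Since ttests.csv is not reproduced here, the two tests can be tried with the values typed in from the Control Group / Risk Group table shown earlier. This is a sketch; the vector names control and risk are illustrative.

```r
control <- c(97, 98, 78, 80, 81, 84, 85, 85, 86, 94, 100, 102, 103, 104)
risk    <- c(76, 87, 89, 90, 94, 96, 98, 99, 100, 106, 109, 112, 113, 105)

welch  <- t.test(control, risk)                     # var.equal = FALSE by default
pooled <- t.test(control, risk, var.equal = TRUE)   # classical two-sample t-test

# The t values coincide because both groups have the same size; only the DF differ
c(welch_t   = unname(welch$statistic),  welch_df  = unname(welch$parameter),
  pooled_t  = unname(pooled$statistic), pooled_df = unname(pooled$parameter))
```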

Group wise Anova (Unpaired)

Imagine that you have measured the blood pressure of a control group and the blood pressure of 3 experimental groups, each of them subjected to a different intervention.

In this case, since the variable measured is blood pressure and you have different samples of this variable, all samples must meet the parametric assumption to use the ANOVA test.

Group1 = c(2,3,7,2,6)
Group2 = c(10,8,7,5,10)
Group3 = c(10,13,14,13,15)


Contabined_Groups = data.frame(Group1, Group2,Group3)
Contabined_Groups
##   Group1 Group2 Group3
## 1      2     10     10
## 2      3      8     13
## 3      7      7     14
## 4      2      5     13
## 5      6     10     15
summary(Contabined_Groups)
##      Group1      Group2       Group3  
##  Min.   :2   Min.   : 5   Min.   :10  
##  1st Qu.:2   1st Qu.: 7   1st Qu.:13  
##  Median :3   Median : 8   Median :13  
##  Mean   :4   Mean   : 8   Mean   :13  
##  3rd Qu.:6   3rd Qu.:10   3rd Qu.:14  
##  Max.   :7   Max.   :10   Max.   :15
Stacked_Groups = stack(Contabined_Groups)
Stacked_Groups
##    values    ind
## 1       2 Group1
## 2       3 Group1
## 3       7 Group1
## 4       2 Group1
## 5       6 Group1
## 6      10 Group2
## 7       8 Group2
## 8       7 Group2
## 9       5 Group2
## 10     10 Group2
## 11     10 Group3
## 12     13 Group3
## 13     14 Group3
## 14     13 Group3
## 15     15 Group3
Anova_Results = aov(values ~ind, data=Stacked_Groups)                       

summary(Anova_Results) 
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## ind          2  203.3   101.7   22.59 8.54e-05 ***
## Residuals   12   54.0     4.5                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

How to present the results?

F(2,12)=22.59, p<.05

How to interpret the results?

You would reject the null hypothesis. The results are significant: they are unlikely to have arisen by random chance alone.

Which table gives the critical value for F(2,12)?
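
The critical value that is usually read from an F-distribution table (indexed by the numerator and denominator degrees of freedom) can also be obtained with qf(). A sketch assuming an alpha level of 0.05:

```r
alpha  <- 0.05                               # assumed alpha level
f_crit <- qf(1 - alpha, df1 = 2, df2 = 12)   # critical F for (2, 12) degrees of freedom
round(f_crit, 2)

# Upper-tail probability of the observed F value
pf(22.59, df1 = 2, df2 = 12, lower.tail = FALSE)
```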

Pair Wise t-test

On the other hand, suppose that you have measured the blood pressure of 30 people and subjected them to some kind of intervention.

After the intervention you measured the blood pressure of each person again.
In this case you are not asking whether the group mean has changed; instead you collect the individual change of each person and then calculate the mean of all the changes (differences). You want to ask whether this mean difference differs from zero.

You are doing repeated measures: one subject has been measured twice.

This strategy will allow you to detect the effect of an intervention even if the initial values of the subjects differ a lot.

If the variables Before and After below are normal, their difference is also normal.

weight = read.csv("ttests.csv")
names(weight) = c("Subject","Before", "After")
weight$difference = weight$Before-weight$After
weight
##    Subject Before After difference
## 1        1     97    76         21
## 2        2     98    87         11
## 3        3     78    89        -11
## 4        4     80    90        -10
## 5        5     81    94        -13
## 6        6     84    96        -12
## 7        7     85    98        -13
## 8        8     85    99        -14
## 9        9     86   100        -14
## 10      10     94   106        -12
## 11      11    100   109         -9
## 12      12    102   112        -10
## 13      13    103   113        -10
## 14      14    104   105         -1

In the example below we are dealing with a different experiment where the values of A and B are logically coupled in pairs.

Then we would set paired=TRUE.

Observe that in this case we get the mean of the differences instead of the mean of x and the mean of y.

The p-value <= 0.05 shows us that the mean of the paired differences between the two groups differs from zero, so at the 5% level we would reject the null hypothesis and conclude that there seems to be a difference between the two groups.

Perhaps it was due to the measurement of weights before and after some kind of intervention, like a diet.

weight = read.csv("ttests.csv")
        
t.test(weight$A, weight$B, paired=TRUE)
## 
##  Paired t-test
## 
## data:  weight$A and weight$B
## t = -2.4884, df = 13, p-value = 0.02717
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -12.943697  -0.913446
## sample estimates:
## mean of the differences 
##               -6.928571
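
A paired t-test is the same thing as a one-sample t-test on the individual differences, tested against zero. The sketch below types in the Before and After values from the table above (the vector names are illustrative) and checks that both calls agree.

```r
before <- c(97, 98, 78, 80, 81, 84, 85, 85, 86, 94, 100, 102, 103, 104)
after  <- c(76, 87, 89, 90, 94, 96, 98, 99, 100, 106, 109, 112, 113, 105)

paired_res <- t.test(before, after, paired = TRUE)
diff_res   <- t.test(before - after, mu = 0)   # one-sample test on the differences

mean(before - after)
all.equal(unname(paired_res$statistic), unname(diff_res$statistic))
```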

Non parametric

Group wise (Unpaired) Wilcoxon / Mann-Whitney U

If your sample has fewer than 30 observations and you do not know whether the sample comes from a normally distributed population, you should use a non-parametric test.

The Wilcoxon rank sum test (also known as Mann-Whitney U) can also be applied to ordinal values (1,2,3..).

weight = read.csv("ttests.csv")
weight = weight[,-1]
weight = weight[1:5,]
names(weight) = c("Control.Group", "Risk.Group")
weight
##   Control.Group Risk.Group
## 1            97         76
## 2            98         87
## 3            78         89
## 4            80         90
## 5            81         94

Observe from the Wilcoxon test below that the p-value = 1, so we do not reject the null hypothesis.

wilcox.test(weight$Control.Group,weight$Risk.Group)
## 
##  Wilcoxon rank sum test
## 
## data:  weight$Control.Group and weight$Risk.Group
## W = 13, p-value = 1
## alternative hypothesis: true location shift is not equal to 0

Observe from the Wilcoxon test below that the p-value = 0.065, which is above 0.05, so we also do not reject the null hypothesis.

dengue.fewer <-c(3000,3200,3500,5068,5679,6200,6300,7020)
scrub.typhus<-c(4400,4500,5900,6839,7561,9047,12300,14000)
wilcox.test(dengue.fewer,scrub.typhus)
## 
##  Wilcoxon rank sum test
## 
## data:  dengue.fewer and scrub.typhus
## W = 14, p-value = 0.06496
## alternative hypothesis: true location shift is not equal to 0
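
The W statistic reported by wilcox.test() for two independent samples is the number of (x, y) pairs in which the value from the first sample is larger than the value from the second (a tie would count as one half). A sketch that recounts it for the example above:

```r
dengue.fewer <- c(3000, 3200, 3500, 5068, 5679, 6200, 6300, 7020)
scrub.typhus <- c(4400, 4500, 5900, 6839, 7561, 9047, 12300, 14000)

# Compare every value of the first sample with every value of the second
W <- sum(outer(dengue.fewer, scrub.typhus, ">"))
W
```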

It is important to point out that the non-parametric Wilcoxon/Mann-Whitney test will also work if the values of the variables are ordinal.

So instead of:

weight = read.csv("ttests.csv")
weight = weight[,-1]
weight = weight[1:5,]
names(weight) = c("Control.Group", "Risk.Group")
weight
##   Control.Group Risk.Group
## 1            97         76
## 2            98         87
## 3            78         89
## 4            80         90
## 5            81         94

We could use the Wilcoxon/Mann-Whitney test if the values were ordinal, like the example below:

Control.Group <-c(1,5,3,2,6.5)
Risk.Group<-c(4,9,8,6.5,10)
tabelaordinal = data.frame(Control.Group,Risk.Group)
tabelaordinal
##   Control.Group Risk.Group
## 1           1.0        4.0
## 2           5.0        9.0
## 3           3.0        8.0
## 4           2.0        6.5
## 5           6.5       10.0
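
The notes stop at the data frame, so here is a sketch of how the test could be run on these ordinal values. Because 6.5 appears in both groups, an exact p-value cannot be computed; exact = FALSE asks for the normal approximation explicitly. The name ordinal_res is illustrative.

```r
Control.Group <- c(1, 5, 3, 2, 6.5)
Risk.Group    <- c(4, 9, 8, 6.5, 10)

# exact = FALSE: use the normal approximation, since the data contain a tie
ordinal_res <- wilcox.test(Control.Group, Risk.Group, exact = FALSE)
ordinal_res$statistic
ordinal_res$p.value
```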

Group wise Kruskal Wallis

The Kruskal-Wallis test is a version of the independent measures (One-Way) ANOVA that can be performed on ordinal or ranked data.

Imagine that you have measured the blood pressure of a control group and the blood pressure of 3 experimental groups, each of them subjected to a different intervention.

What is the chi-square table useful for if I already have the p-value?

Presentation of results:
H(2, N=18) = 2.854, p>.05 (We do not reject the null hypothesis)

#Example

Group1 = c(3.06,2.6,2.55,2.42,2.35)
Group2 = c(3.41,3.23,3.93,3.74,3.18)
Group3 = c(2.92,2.88,3.25,2.64,3.28)
Groups = data.frame(Group1,Group2, Group3)
Groups
##   Group1 Group2 Group3
## 1   3.06   3.41   2.92
## 2   2.60   3.23   2.88
## 3   2.55   3.93   3.25
## 4   2.42   3.74   2.64
## 5   2.35   3.18   3.28
Groups_Stacked = stack(Groups)
Groups_Stacked
##    values    ind
## 1    3.06 Group1
## 2    2.60 Group1
## 3    2.55 Group1
## 4    2.42 Group1
## 5    2.35 Group1
## 6    3.41 Group2
## 7    3.23 Group2
## 8    3.93 Group2
## 9    3.74 Group2
## 10   3.18 Group2
## 11   2.92 Group3
## 12   2.88 Group3
## 13   3.25 Group3
## 14   2.64 Group3
## 15   3.28 Group3
kruskal.test(Groups_Stacked$values,Groups_Stacked$ind)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Groups_Stacked$values and Groups_Stacked$ind
## Kruskal-Wallis chi-squared = 9.26, df = 2, p-value = 0.009755
# Reject the null hypothesis
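
The H statistic above can be recomputed from the rank sums of each group, and then compared with a chi-square critical value, which is what the chi-square table provides when no p-value is at hand. A sketch assuming an alpha level of 0.05; the helper names are illustrative.

```r
Group1 <- c(3.06, 2.60, 2.55, 2.42, 2.35)
Group2 <- c(3.41, 3.23, 3.93, 3.74, 3.18)
Group3 <- c(2.92, 2.88, 3.25, 2.64, 3.28)

values <- c(Group1, Group2, Group3)
groups <- rep(c("Group1", "Group2", "Group3"), each = 5)

N <- length(values)
rank_sums   <- tapply(rank(values), groups, sum)   # sum of ranks per group
n_per_group <- tapply(values, groups, length)

# Kruskal-Wallis H (no tie correction needed: all values are distinct)
H <- 12 / (N * (N + 1)) * sum(rank_sums^2 / n_per_group) - 3 * (N + 1)
H
qchisq(0.95, df = 2)   # critical value at the assumed alpha of 0.05
```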


# Example 

x <- c(2.9, 3.0, 2.5, 2.6, 3.2) # normal subjects
y <- c(3.8, 2.7, 4.0, 2.4)      # with obstructive airway disease
z <- c(2.8, 3.4, 3.7, 2.2, 2.0) # with asbestosis
kruskal.test(list(x, y, z))
## 
##  Kruskal-Wallis rank sum test
## 
## data:  list(x, y, z)
## Kruskal-Wallis chi-squared = 0.7714, df = 2, p-value = 0.68
# Do not reject the null hypothesis


# Example

require(graphics)
boxplot(Ozone ~ Month, data = airquality)

kruskal.test(Ozone ~ Month, data = airquality)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Ozone by Month
## Kruskal-Wallis chi-squared = 29.2666, df = 4, p-value = 6.901e-06
# Reject the null hypothesis

For situations like these (more than two independent groups with ordinal or non-normal data) you use the Kruskal-Wallis test.

Pair wise (Paired) Wilcoxon signed rank test

On the other hand, suppose that you have measured the blood pressure of 30 people and subjected them to some kind of intervention.

After the intervention you measured the blood pressure of each person again.

In this case you are not asking whether the group mean has changed; instead you collect the individual change of each person and then look at all the changes (differences) together.

You want to ask whether the typical change differs from zero.

You are doing repeated measures: one subject has been measured twice.

This strategy will allow you to detect the effect of an intervention even if the initial values of the subjects differ a lot.

You can use the Wilcoxon test whether or not the values are ordinal (1,2,3…).

weight = read.csv("ttests.csv")
names(weight) = c("Subject","Before", "After")
weight$difference = weight$Before-weight$After
weight
##    Subject Before After difference
## 1        1     97    76         21
## 2        2     98    87         11
## 3        3     78    89        -11
## 4        4     80    90        -10
## 5        5     81    94        -13
## 6        6     84    96        -12
## 7        7     85    98        -13
## 8        8     85    99        -14
## 9        9     86   100        -14
## 10      10     94   106        -12
## 11      11    100   109         -9
## 12      12    102   112        -10
## 13      13    103   113        -10
## 14      14    104   105         -1

See the example below

A <- c(91,91,93,106,97,108,97,105,106,103,105,96,105,95,90,101)
B <- c(90,89,85,99,93,104,89,103,102,95,103,95,105,87,86,101)
wilcox.test(A, B, paired=TRUE)
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  A and B
## V = 105, p-value = 0.001021
## alternative hypothesis: true location shift is not equal to 0
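
The V statistic is the sum of the ranks of the positive differences, after dropping pairs whose difference is zero. A sketch recomputing it for the A and B vectors above:

```r
A <- c(91, 91, 93, 106, 97, 108, 97, 105, 106, 103, 105, 96, 105, 95, 90, 101)
B <- c(90, 89, 85, 99, 93, 104, 89, 103, 102, 95, 103, 95, 105, 87, 86, 101)

d <- A - B
d <- d[d != 0]              # zero differences are discarded
ranks <- rank(abs(d))       # rank the absolute differences (ties get average ranks)
V <- sum(ranks[d > 0])      # sum of the ranks belonging to positive differences
V
```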