Statistical test

A statistical test provides a mechanism for making quantitative decisions about a process or processes. The intent is to determine whether there is enough evidence to “reject” a conjecture or hypothesis about the process. The conjecture is called the null hypothesis. Not rejecting may be a good result if we want to continue to act as if we “believe” the null hypothesis is true. Or it may be a disappointing result, possibly indicating we may not yet have enough data to “prove” something by rejecting the null hypothesis.

Test decision chart

PARAMETRIC TESTS

Z-Test

This test is used for testing the significance of the difference between two means when the sample is large (n > 30) and the variances are known. It is used for comparing a sample mean with a population mean, two sample means, a sample proportion with a population proportion, and two sample proportions.

Assumption

  • The sample must be randomly selected.
  • The data must be quantitative.
  • The sample size should be larger than 30.
  • The data should follow a normal distribution.
  • The sample variances should be approximately equal in the groups being compared.

Note: If the SD of the population is known, a Z-test can be applied even if the sample size is smaller than 30.

Test Statistics
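In the usual notation (x̄ = sample mean, µ = population mean, σ = known population SD, n = sample size), the one-sample Z statistic is

z = (x̄ − µ) / (σ / √n)

and, for comparing two means with known variances,

z = (x̄1 − x̄2) / √(σ1²/n1 + σ2²/n2)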

One sample t-Test

A one-sample t-test is used to test whether a population mean is significantly different from some hypothesized value.

Assumption

  • The dependent variable should be normally distributed.
  • The cases in the sample should be independent.
  • A single sample of size n is drawn.
  • The sample size is less than 30.
  • The sample should not contain any outliers.
  • The data are measurement data (interval or ratio scale).
  • The hypothesized population mean must be known.
  • The group has equal variance (homogeneity of variance).

Large values of |t| lead to rejection of the null hypothesis H0.

Test Statistics
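With s denoting the sample standard deviation and µ0 the hypothesized mean, the statistic takes its standard form

t = (x̄ − µ0) / (s / √n)

with n − 1 degrees of freedom.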

Two sample t-Test

A two-sample t-test is used to compare the means of two independent populations (denoted µ1 and µ2), i.e., to test the difference (d0) between the two population means. The greater the difference between the means, the greater the evidence that H0 is untrue.

Assumption

  • The data are continuous (not discrete).
  • The variances of the two populations are equal.
  • The two samples are independent. There is no relationship between the individuals in one sample and those in the other.
  • Both samples are simple random samples from their respective populations. Each individual in the population has an equal probability of being selected in the sample.
  • The data follow the normal probability distribution.

Test Statistics
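In its standard form, with ȳ1 and ȳ2 the sample means and sp the pooled standard deviation (see below), the statistic is

t = (ȳ1 − ȳ2 − d0) / [sp · √(1/n1 + 1/n2)]

with n1 + n2 − 2 degrees of freedom.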

Pooled variance Formula
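sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)

where s1² and s2² are the two sample variances.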

Paired t-Test

A paired t-test is used when we are interested in the difference between two variables for the same subject. Often the two variables are separated by time.

Assumption

  • The data are continuous (not discrete).
  • The data, i.e., the differences for the matched-pairs, follow a normal probability distribution.
  • The sample of pairs is a simple random sample from its population. Each individual in the population has an equal probability of being selected in the sample.
  • The paired t-test does not assume that observations within each group are normal, only that the differences are normal; nor does it assume that the two groups are homoscedastic.

Null hypothesis

The null hypothesis is that the mean difference between paired observations is zero. When the mean difference is zero, the means of the two groups must also be equal. Because of the paired design of the data, the null hypothesis of a paired t–test is usually expressed in terms of the mean difference.

Test Statistics
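With d̄ and sd denoting the mean and standard deviation of the n paired differences, the statistic takes its standard form

t = d̄ / (sd / √n)

with n − 1 degrees of freedom.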

One-Way ANOVA

One-way ANOVA is used to simultaneously compare two or more group means based on independent samples from each group. The bigger the variation among sample group means relative to the variation of individual measurements within the groups, the greater the evidence that the hypothesis of equal group means is untrue. It is an extension of the t-test. MSG is an estimate of the variability among groups and MSE is an estimate of the variability within groups. The one-way ANOVA method might be appropriate for comparing mean responses among a number of parallel dose groups or among various strata based on patient characteristics.

Assumption

  • Each group sample is drawn from a normally distributed population
  • All samples are drawn independently of each other
  • Within each sample, the observations are sampled randomly and independently of each other
  • Factor effects are additive
  • In ANOVA you assume variance homogeneity, which means that the within-group variance is constant across groups. This is expressed simply as: all populations have a common variance, σ1² = σ2² = … = σk² = σ², where σi² denotes the unknown variance of the ith population. The common variance σ² is estimated by s², which is a weighted average of the k sample variances.

Layout for the one-way ANOVA

The formula for the sample variance for Group i
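In standard notation, with yij the jth observation in Group i and ȳi the mean of the ni observations in Group i,

si² = Σj (yij − ȳi)² / (ni − 1)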

Because si² is the estimated variance within Group i, s² represents an average within-group variation over all groups. In ANOVA, s² is called the mean square error (MSE) and its numerator is the sum of squares for error (SSE). The ‘error’ is the deviation of each observation from its group mean. If SSE is expressed as the sum of squared errors,
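SSE = Σi Σj (yij − ȳi)²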

then the pooled variance s² is just SSE / (N − k). The denominator N − k, where N = n1 + n2 + … + nk is the total sample size over all samples, is known as the degrees of freedom associated with the error.

The variability among groups can be measured by the deviation of the average observation in each group from the overall average, ȳ. That is, the overall variance obtained by replacing each observation with its group mean (ȳi) represents the between-group variability, MSG. Its numerator is the sum of squares for groups (SSG), computed as
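SSG = Σi ni (ȳi − ȳ)²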

where ȳ is the mean of all N observations. Each group mean is treated as a single observation, so there are k − 1 degrees of freedom associated with the SSG. The mean square for the group effect is the sum of squares divided by its degrees of freedom:

MSG = SSG / (k–1)

When the null hypothesis is true, the variation between groups should be the same as the variation within groups. Therefore, under H0, the test statistic F = MSG / MSE should be close to 1; it has an F-distribution with k − 1 numerator degrees of freedom and N − k denominator degrees of freedom. Critical F-values based on the F-distribution are used to determine the rejection region.
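As an illustration of this F-test for equal group means, here is a minimal sketch using Python's scipy.stats with made-up data for three hypothetical parallel dose groups:

    from scipy import stats

    # hypothetical responses for three parallel dose groups
    low  = [12.1, 11.4, 13.0, 12.8, 11.9]
    mid  = [13.5, 14.1, 13.2, 14.8, 13.9]
    high = [15.2, 14.9, 16.1, 15.5, 14.7]

    # one-way ANOVA: F = MSG / MSE, H0: equal group means
    f_stat, p_value = stats.f_oneway(low, mid, high)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # small p-value -> reject H0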

Repeated Measures Analysis (RMANOVA)

Repeated measures ANOVA is the equivalent of the one-way ANOVA, but for related, not independent groups, and is the extension of the dependent t-test (paired). A repeated measures ANOVA is also referred to as a within-subjects ANOVA or ANOVA for correlated samples. All these names imply the nature of the repeated measures ANOVA, that of a test to detect any overall differences between related means.

It is used for two types of study design.

  • Change in mean scores over three or more time points.
  • Change in mean scores under three or more different conditions (to detect any overall differences between related means).

The repeated response measurements can be used to characterize a response profile over time.

Assumption

  • The observations within each treatment condition must be independent.
  • The population distribution within each treatment must be normal.
  • The variances of the population distributions for each treatment should be equal.

Layout for a Repeated Measures Design with 3 Groups

ANOVA Summary for Repeated Measures Design

Two Way ANOVA

The two-way ANOVA is a method for simultaneously analyzing two factors that affect a response. As in the one-way ANOVA, there is a group effect, such as treatment group or dose level. The two-way ANOVA also includes another identifiable source of variation called a blocking factor, whose variation can be separated from the error variation to give more precise group comparisons. For this reason, the two-way ANOVA layout is sometimes called a randomized block design.

Layout of Two-Way ANOVA

The general entries in a two-way ANOVA summary table are represented as shown below.

ANOVA Summary Table for the Two-Way ANOVA

  • SS represents the sum of squared deviations associated with the factor listed under ‘Source’.
  • The mean square (MS) is found by dividing the SS by the degrees of freedom. The MS represents a measure of variability associated with the factor listed under ‘Source’. When there is no effect due to the specified factor, this variability reflects measurement error variability, σ², which is also estimated by MSE.
  • The F-values are ratios of the effect mean squares to the mean square error (MSE). Under the null hypothesis of no effect, the F-ratio should be close to 1. These F-values are used as the test statistics for testing the null hypothesis of no mean differences among the levels of the factor.
  • The F-test for group (FG) tests the primary hypothesis of no group effect. Denoting the mean for the ith group by μi, the test is summarized as follows (see the sketch after this list):
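In outline (a sketch, with k group levels as above): null hypothesis H0: μ1 = μ2 = … = μk (no group effect); alternative HA: not all μi are equal; rejection region: FG = MSG / MSE greater than the critical F-value with the group (k − 1) and error degrees of freedom.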

Assumption

  • The dependent variable should be measured at the continuous level (i.e., interval or ratio variables).
  • The two independent variables should each consist of two or more categorical, independent groups.
  • The samples must be independent.
  • The variances of the populations must be equal.
  • The groups must have the same sample size.
  • There needs to be homogeneity of variances for each combination of the groups of the two independent variables.
  • The dependent variable should be approximately normally distributed for each combination of the groups of the two independent variables.
  • There should be no significant outliers. Outliers are data points within your data that do not follow the usual pattern.

F-Test

The F-test is designed to test if two population variances are equal. It does this by comparing the ratio of two variances. So, if the variances are equal, the ratio of the variances will be 1.

Assumption

  • The larger variance should always be placed in the numerator
  • The populations from which the samples were obtained must be normally or approximately normally distributed.
  • The samples must be independent of each other.
  • The variances of the populations must be equal.

Test Statistics
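In its usual form, the statistic is the ratio of the two sample variances,

F = s1² / s2²

with n1 − 1 and n2 − 1 degrees of freedom, where: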

  • s1² – the variance of the first sample (the larger of the two)
  • s2² – the variance of the second sample

If the null hypothesis σ1² = σ2² is true, the F test statistic simplifies to this ratio of sample variances, which is the test statistic used. Values of F far from 1 lead to rejection of the null hypothesis.

NON-PARAMETRIC TESTS

Chi-Square Test

The chi-square test is used to compare two independent binomial proportions, p1 and p2. In the analysis of clinical data, the binomial proportions typically represent a response rate, cure rate, survival rate, abnormality rate, or other rate. The test is used to compare such rates between a treated group and a parallel control group. It is an approximate test that may be used when the normal approximation to the binomial distribution is valid; an alternative to the chi-square test is Fisher’s exact test.

Observation is made of X1 responders out of n1 patients studied in one group, and X2 responders out of n2 patients in a second, independent group, as shown below.

Assume that each of the ni patients in Group i (i = 1, 2) has the same chance, pi, of responding, so that X1 and X2 are independent binomial random variables. The goal is to compare population ‘response’ rates (p1 vs. p2) based on these sample data. Compute
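p̂1 = X1 / n1,   p̂2 = X2 / n2,   and the pooled proportion p̂ = (X1 + X2) / (n1 + n2)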

Assuming that the normal approximation to the binomial distribution is applicable, the chi-square test summary is
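In outline (a standard form in this notation): null hypothesis H0: p1 = p2; alternative HA: p1 ≠ p2; test statistic

χ² = (p̂1 − p̂2)² / [p̂(1 − p̂)(1/n1 + 1/n2)]

rejection region: χ² greater than the critical chi-square value with 1 degree of freedom.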

The computing formula for the chi-square statistic is shown below,
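χ² = Σ (Oi − Ei)² / Ei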

where the Oi’s and Ei’s are the observed and expected cell frequencies, respectively, as shown below.

Observed (O) and Expected (E) Cell Frequencies

Assumption

  • All the observations must be independent. No individual item should be counted twice in the sample.
  • The total number of observations should be large. The chi-square test should not be used if n < 50.
  • All the events must be mutually exclusive.
  • For comparison purposes, the data must be in original units.
  • If a theoretical (expected) frequency is less than five, it should be pooled with the preceding or succeeding frequency so that the resulting sum is greater than five.
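For illustration, a minimal sketch using Python's scipy.stats on a hypothetical 2×2 table of treated vs. control responders:

    from scipy import stats

    # hypothetical counts: rows = treated / control, cols = responder / non-responder
    table = [[40, 60],
             [25, 75]]

    chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)
    print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")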

Wilcoxon Signed Rank Test

The Wilcoxon signed rank test is a non-parametric analog of the one-sample t-test. The signed rank test can be used to make inferences about a population mean or median without requiring the assumption of normally distributed data. The Wilcoxon signed rank test is based on the ranks of the data. In clinical trials it is used to compare responses between correlated or paired data. The layout is the same as that of the paired-difference t-test.

Assumption

  • It is used to compare two sets of scores that come from the same participants.
  • The observations are independent.
  • The scale of measurement is at least interval.
  • The sample population is symmetric.
  • The variable of interest is continuous.

Formula
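In one common form: rank the absolute differences |di|, and let T be the sum of the ranks of the positive differences. For larger samples, an approximate Z statistic is

z = [T − n(n + 1)/4] / √[n(n + 1)(2n + 1)/24]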

Mann-Whitney Test (Wilcoxon Rank Sum Test)

The Wilcoxon rank sum test is the non-parametric analog of the two-sample t-test, based on ranks of the data. It is used to compare location parameters, such as the mean or median, between two independent populations.

Assumption

  • Two independent populations are non-normally distributed.
  • It is developed for use with continuous numeric data.
  • Samples are drawn independently from each other

The test is also applied to the analysis of ordered categorical data.
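One common form of the statistic is

U = n1·n2 + n1(n1 + 1)/2 − R1

where n1 and n2 are the two sample sizes and R1 is the sum of the ranks assigned to the first sample.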

Kruskal-Wallis Test

The Kruskal-Wallis test is a non-parametric analogue of the one-way ANOVA. It is used to compare population location parameters (mean, median, etc.) among two or more groups based on independent samples. Just as the one-way ANOVA is an extension of the two-sample t-test, the Kruskal-Wallis test is an extension of the Wilcoxon rank sum test, based on ranks of the data, and is used to compare responses among three or more dose groups or treatment groups using samples of non-normally distributed response data.

Assumption

  • The Kruskal-Wallis H test is a nonparametric procedure that can be used to compare more than two populations in a completely randomized design.
  • Responses are non-normally distributed.
  • It is an extension of the Wilcoxon rank sum test to more than two groups.
  • We use the sums of the ranks of the k samples to compare the distributions, as in the statistic shown below.
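In its standard form, with N the total sample size, ni the size of group i, and Ri the sum of the ranks in group i,

H = [12 / (N(N + 1))] · Σi (Ri² / ni) − 3(N + 1)

which, under H0, is approximately chi-square distributed with k − 1 degrees of freedom.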

Layout of the Kruskal-Wallis Test

Friedman Test

The Friedman test is a non-parametric alternative to the one-way ANOVA with repeated measures. It is used to test for differences between groups when the dependent variable being measured is ordinal. It can also be used for continuous data that have violated the assumptions necessary to run the one-way ANOVA with repeated measures.

Assumption

  • One group is measured on three or more different occasions.
  • The group is a random sample from the population.
  • Your dependent variable should be measured at the ordinal or interval/ratio level.
  • Samples do not need to be normally distributed.
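In its common form, with n subjects ranked across k conditions and Rj the sum of ranks for condition j, the Friedman statistic is

χ²F = [12 / (nk(k + 1))] · Σj Rj² − 3n(k + 1)

which, under H0, is approximately chi-square distributed with k − 1 degrees of freedom.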

Binomial Test

The binomial test is used to make inferences about a proportion or response rate based on a series of independent observations, each resulting in one of two possible mutually exclusive outcomes. The outcomes can be response to treatment or no response, cure or no cure, survival or death, or, in general, event or non-event. In clinical trials, a common use of the binomial test is for estimating a response rate p using the number of patients (x) who respond to an investigational treatment out of a total of n studied.

Assumption

  • Items are dichotomous (i.e., each observation takes one of two values) and nominal.
  • The sample size is significantly less than the population size.
  • The sample is a fair representation of the population.
  • Sample items are independent (one item has no bearing on the probability of another).

The general formula for a binomial probability is:
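P(X = x) = [n! / (x!(n − x)!)] · p^x · (1 − p)^(n−x)

where n is the number of trials, x the number of events, and p the probability of an event on a single trial.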

Fisher’s Exact Test

The Fisher exact test is a test of significance that is used in place of the chi-square test in 2×2 tables, especially in cases of small samples.

The Fisher exact test gives the probability of obtaining a table at least as ‘strong’ as the observed one due to the chance of sampling, where ‘strong’ is defined by the proportion of the cases that fall on the diagonal with the most cases.

The Fisher exact test is generally used as a one-tailed test; however, it can also be used as a two-tailed test. It is sometimes called the Fisher-Irwin test because it was developed around the same time by Fisher, Irwin, and Yates in the 1930s.

Assumption

  • It is assumed that the sample has been drawn from the population by a process of random sampling.
  • A directional hypothesis is assumed.
  • It is assumed that the value of the first person or unit being sampled does not affect the value of the second person or unit being sampled. This assumption would be violated if the data were pooled.
  • Mutual exclusivity within the observations is assumed.
  • A dichotomous level of measurement of the variables is assumed.

Layout for Fisher’s Exact Test

Given equal proportions, p1 = p2, the probability of observing this configuration when the marginal totals are fixed is found from the hypergeometric probability distribution as
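P = [C(n1, X1) · C(n2, X2)] / C(n1 + n2, X1 + X2)

where X1 and X2 are the numbers of responders out of n1 and n2 patients, as in the chi-square layout above.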

Here, C(a, b) denotes the combinatorial symbol that represents “the number of ways b items can be selected from a set of a items”. (Note: the symbol ‘!’ is read ‘factorial’, with a! = a(a−1)(a−2)…(3)(2)(1); for example, 5! = (5)(4)(3)(2)(1) = 120.) The probability of the table configuration simplifies to
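P = [n1! · n2! · (X1 + X2)! · (N − X1 − X2)!] / [N! · X1! · X2! · (n1 − X1)! · (n2 − X2)!]

where N = n1 + n2.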

The p-value for the test, Fisher’s exact probability, is the probability of the observed configuration plus the sum of the probabilities of all other configurations with a more extreme result for fixed row and column totals.
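For illustration, a minimal sketch using Python's scipy.stats on a hypothetical 2×2 table:

    from scipy import stats

    # hypothetical counts: rows = treatment / control, cols = responder / non-responder
    table = [[7, 3],
             [2, 8]]

    odds_ratio, p_value = stats.fisher_exact(table, alternative="two-sided")
    print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.4f}")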

McNemar’s Test

The McNemar test is a non-parametric test for paired nominal data. It’s used when you are interested in finding a change in proportion for the paired data. For example, you could use this test to analyze retrospective case-control studies, where each treatment is paired with a control. It could also be used to analyze an experiment where two treatments are given to matched pairs. This test is sometimes referred to as McNemar’s Chi-Square test because the test statistic has a chi-square distribution.

Assumption

  • You must have one nominal variable with two categories (i.e., a dichotomous variable) and one independent variable with two connected groups.
  • The two groups of your dependent variable must be mutually exclusive. In other words, participants cannot appear in more than one group.
  • Your sample must be a random sample.

Layout for McNemar’s Test
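In the usual 2×2 paired layout, with b and c denoting the two discordant cell counts (pairs that change in one direction or the other), the McNemar statistic is

χ² = (b − c)² / (b + c)

with 1 degree of freedom.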

Cochran-Mantel-Haenszel

The CMH test is used in clinical trials to compare two binomial proportions from independent populations based on stratified samples. This test provides a means of combining a number of 2×2 tables of the type shown below. The stratification factor can represent patient subgroups, such as study center, gender, age group, or disease severity, and acts similarly to the blocking factor in a two-way ANOVA. The CMH test obtains an overall comparison of response rates adjusted for the stratification variables. The adjustment is simply a weighting of the 2×2 tables in proportion to the within-stratum sample sizes. The CMH test is often used in the comparison of response rates between two treatment groups in a multi-center study, using the study center as the stratum. It is a test used in the analysis of stratified or matched categorical data. It allows an investigator to test the association between a binary predictor or treatment and a binary outcome, such as case or control status, while taking the stratification into account.

Layout for the Cochran-Mantel-Haenszel Test

Let p1 and p2 denote the overall response rates for Group 1 and Group 2 respectively. For Stratum j, compute the quantities
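In a standard form of this test (notation consistent with the chi-square section, with X1j responders among n1j patients in Group 1 and X2j responders among n2j patients in Group 2 for Stratum j, Nj = n1j + n2j, and mj = X1j + X2j):

Ej = n1j · mj / Nj
Vj = n1j · n2j · mj · (Nj − mj) / [Nj² · (Nj − 1)]

The CMH statistic combines these over strata as χ²CMH = [Σj (X1j − Ej)]² / Σj Vj, with 1 degree of freedom.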
