Stats_La2: Chi-Square tests

Author

Aman Aftab



Abstract

Hypothesis testing is a technique for drawing inferences about a population from sample data. It helps determine which of two mutually exclusive claims about the population the sample data support better.

Null Hypothesis (H0): The null hypothesis is the default assumption that there is no effect, difference, or relationship in the population. It stands unless the data provide enough evidence to reject it. Its symbol is H0, pronounced H-naught.

Alternate Hypothesis (H1 or Ha): The alternate hypothesis is the logical opposite of the null hypothesis. It is accepted when the null hypothesis is rejected. Its symbol is H1.

The Chi-Square test is a statistical procedure for assessing the difference between observed and expected data. It can also be used to determine whether the categorical variables in our data are related: it helps to find out whether a difference between observed and expected frequencies is due to chance or to a relationship between the variables.

Introduction

A chi-square test is a statistical test that is used to compare observed and expected results. The goal of this test is to identify whether a disparity between actual and predicted data is due to chance or to a link between the variables under consideration. As a result, the chi-square test is a natural choice for understanding and interpreting the connection between two categorical variables.

A chi-square test or a comparable nonparametric test is required to test a hypothesis regarding the distribution of a categorical variable. Categorical variables, which indicate categories such as animals or countries, can be nominal or ordinal. They cannot have a normal distribution since they can only take a few particular values. For example, a meal delivery firm in India might want to investigate the link between gender, geography, and people’s food preferences.

In the standard applications of this test, the observations are classified into mutually exclusive classes. If the null hypothesis that there are no differences between the classes in the population is true, the test statistic computed from the observations follows a χ2 frequency distribution. The purpose of the test is to evaluate how likely the observed frequencies would be, assuming the null hypothesis is true. Test statistics that follow a χ2 distribution occur when the observations are independent. There are also χ2 tests for testing the null hypothesis of independence of a pair of random variables based on observations of the pairs.

The term chi-squared test often refers to tests for which the distribution of the test statistic approaches the χ2 distribution asymptotically, meaning that the sampling distribution of the test statistic (if the null hypothesis is true) approximates a chi-squared distribution more and more closely as sample sizes increase.
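The claim that the test statistic behaves like a χ2 variable when the null hypothesis is true can be checked with a small simulation. The sketch below is only an illustration under assumed settings: a fair six-sided die, 60 rolls per experiment, 2000 experiments, and a fixed random seed are all choices made here, not taken from the text. With six categories the reference distribution has 5 degrees of freedom, whose mean is 5 and whose 5% critical value is 11.070.

```python
import random

def pearson_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def simulate_null_statistics(n_rolls=60, n_repeats=2000, seed=0):
    """Roll a fair die n_rolls times, repeat n_repeats times,
    and return the chi-square statistic of every repeat."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    expected = [n_rolls / 6] * 6       # a fair die: equal expected counts
    statistics = []
    for _ in range(n_repeats):
        counts = [0] * 6
        for _ in range(n_rolls):
            counts[rng.randrange(6)] += 1
        statistics.append(pearson_statistic(counts, expected))
    return statistics

stats = simulate_null_statistics()
mean_stat = sum(stats) / len(stats)
# Share of simulated statistics beyond the 5% critical value for 5 degrees of freedom.
tail_share = sum(s > 11.070 for s in stats) / len(stats)
print(f"mean of simulated statistics: {mean_stat:.2f} (reference mean: 5)")
print(f"share above 11.070: {tail_share:.3f} (nominal level: 0.05)")
```

If the approximation holds, the mean should land near 5 and the tail share near 0.05; larger values of n_rolls should tighten the agreement.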

Formula

Figure 1

$$\chi^2_c = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

Where

c = Degrees of freedom

O = Observed Value

E = Expected Value

The degrees of freedom represent the number of values in a calculation that are free to vary. For a goodness-of-fit test with k categories there are k − 1 degrees of freedom; for a test of independence on a table with r rows and s columns there are (r − 1)(s − 1). The degrees of freedom determine which χ2 distribution the test statistic is compared against, so they must be calculated correctly for the test to be valid. These tests are frequently used to compare observed data with the data that would be expected if a particular hypothesis were true.

The observed values are those you gather yourself.

The expected values are the frequencies expected, based on the null hypothesis.
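As a minimal sketch of the calculation, assuming Figure 1 shows the usual Pearson statistic (the sum over categories of (O − E)² / E), the function below computes it from two lists of frequencies. The function name and the counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected frequency must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for three categories, with 20 expected in each.
observed = [18, 22, 20]
expected = [20, 20, 20]

statistic = chi_square_statistic(observed, expected)
# (4 + 4 + 0) / 20 works out to 0.4
print(statistic)
```

A statistic close to zero means the observed counts sit close to the expected ones; larger values indicate a larger overall discrepancy.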

Applications

1. Wald Test

In statistics, the Wald test (named after Abraham Wald) assesses constraints on statistical parameters based on the weighted distance between the unrestricted estimate and its hypothesized value under the null hypothesis, where the weight is the precision of the estimate. Intuitively, the larger this weighted distance, the less likely it is that the constraint is true. While the finite-sample distributions of Wald tests are generally unknown, the test statistic has an asymptotic χ2 distribution under the null hypothesis, a fact that can be used to determine statistical significance.
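To make the weighted distance concrete, here is a sketch of a Wald test for a single proportion. The setting is an assumption made for illustration: the parameter is a success probability, the estimate is the sample proportion, and its precision is the reciprocal of the estimated variance p̂(1 − p̂)/n. With one constraint the statistic is compared with a χ2 distribution with 1 degree of freedom, whose upper tail equals erfc(sqrt(x / 2)).

```python
import math

def wald_test_proportion(successes, n, p0):
    """Wald statistic and asymptotic p-value for H0: p = p0."""
    p_hat = successes / n                  # unrestricted estimate
    variance = p_hat * (1 - p_hat) / n     # estimated variance of p_hat
    if variance == 0:
        raise ValueError("estimate lies on the boundary; variance is zero")
    # Squared distance from the hypothesized value, weighted by the precision.
    w = (p_hat - p0) ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(w / 2))
    return w, p_value

# Hypothetical data: 60 successes in 100 trials, tested against p0 = 0.5.
w, p_value = wald_test_proportion(successes=60, n=100, p0=0.5)
print(f"Wald statistic: {w:.3f}, p-value: {p_value:.4f}")
```

Because the p-value relies on the asymptotic distribution, it is only an approximation for small samples.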

2. Minimum chi-square estimation

In statistics, minimum chi-square estimation is a method of estimation of unobserved quantities based on observed data.

In certain chi-square tests, one rejects a null hypothesis about a population distribution if a specified test statistic is too large, when that statistic would have approximately a chi-square distribution if the null hypothesis is true. In minimum chi-square estimation, one finds the values of parameters that make that test statistic as small as possible.

Among the consequences of its use is that the test statistic actually does have approximately a chi-square distribution when the sample size is large. Generally, one reduces by 1 the number of degrees of freedom for each parameter estimated by this method.
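The idea can be sketched with a one-parameter model. The example below is hypothetical: it assumes three genotype categories whose expected proportions are p², 2p(1 − p), and (1 − p)² for an unknown frequency p, and it finds the p that minimizes the chi-square statistic by a plain grid search. The counts and function names are invented for illustration, and a grid search is just one simple way to do the minimization.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def genotype_expected(p, n):
    """Expected counts under the assumed model with frequency p."""
    q = 1 - p
    return [n * p * p, n * 2 * p * q, n * q * q]

def minimum_chi_square(observed, steps=9999):
    """Grid search for the p in (0, 1) that minimizes the statistic."""
    n = sum(observed)
    best_p, best_stat = None, float("inf")
    for i in range(1, steps + 1):
        p = i / (steps + 1)
        stat = chi_square_statistic(observed, genotype_expected(p, n))
        if stat < best_stat:
            best_p, best_stat = p, stat
    return best_p, best_stat

observed = [30, 50, 20]                      # hypothetical genotype counts
p_hat, min_stat = minimum_chi_square(observed)
# Three categories, minus 1, minus 1 more for the estimated parameter.
degrees_of_freedom = len(observed) - 1 - 1
print(f"estimate: {p_hat:.4f}, minimized statistic: {min_stat:.4f}, df: {degrees_of_freedom}")
```

The last step mirrors the rule stated above: one degree of freedom is given up for the parameter that was estimated.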

3. Nonparametric statistics

Nonparametric statistics is the branch of statistics that is not based solely on parametric families of probability distributions (common examples of parameters are the mean and variance). Nonparametric statistics is based on either being distribution-free or having a specified distribution but with the distribution’s parameters unspecified. It includes both descriptive statistics and statistical inference. Nonparametric tests are often used when the assumptions of parametric tests are violated.

Nonparametric methods are widely used for studying populations that take on a ranked order (such as movie reviews receiving one to four stars). Their use may be necessary when data have a ranking but no clear numerical interpretation, such as when assessing preferences. In terms of levels of measurement, nonparametric methods result in ordinal data.

As nonparametric methods make fewer assumptions, their applicability is much wider than that of the corresponding parametric methods. In particular, they may be applied in situations where less is known about the application in question. Also, because they rely on fewer assumptions, nonparametric methods are more robust.

FAQs on Chi-Square test

1. What Does a Chi-Square Statistic Tell You?

Independence

A χ2 test for independence can tell us how likely it is that random chance alone can explain the observed difference between the actual frequencies in the data and the frequencies expected if the variables were independent.
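A test of independence can be sketched as follows, assuming the standard construction in which each expected frequency is (row total × column total) / grand total. The 2×2 table is hypothetical. For a 2×2 table there is 1 degree of freedom, and in that case the upper tail of the χ2 distribution has the closed form erfc(sqrt(x / 2)), which the example uses for the p-value.

```python
import math

def expected_frequencies(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def independence_statistic(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    expected = expected_frequencies(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical counts: rows are two groups, columns are two preferences.
table = [[20, 30],
         [30, 20]]
statistic, df = independence_statistic(table)
# Valid only because df == 1 here: chi-square(1) upper tail.
p_value = math.erfc(math.sqrt(statistic / 2))
print(f"statistic: {statistic:.2f}, df: {df}, p-value: {p_value:.4f}")
```

Here every expected count is 25, so the statistic is 4 × (5² / 25) = 4.0 and the p-value is a little under 0.05.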

Goodness-of-Fit

χ2 provides a way to test how well a sample of data matches the (known or assumed) characteristics of the larger population that the sample is intended to represent. This is known as goodness of fit.

If the sample data do not fit the expected properties of the population that we are interested in, then we would not want to use this sample to draw conclusions about the larger population.
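A goodness-of-fit check can be sketched in the same spirit. The example assumes a die that is claimed to be fair, with hypothetical roll counts. The helper chi_square_sf is written here for illustration rather than taken from a library: it evaluates the upper-tail probability of the χ2 distribution for whole-number degrees of freedom using the standard finite-sum closed forms.

```python
import math

def chi_square_sf(x, df):
    """Upper-tail probability P(X >= x) for chi-square with integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction sum.
    total, term = 0.0, 1.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total

def goodness_of_fit(observed, expected_proportions):
    """Statistic, degrees of freedom, and p-value against given proportions."""
    n = sum(observed)
    expected = [n * p for p in expected_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df, chi_square_sf(statistic, df)

observed_rolls = [8, 9, 10, 11, 12, 10]      # hypothetical counts from 60 rolls
statistic, df, p_value = goodness_of_fit(observed_rolls, [1 / 6] * 6)
print(f"statistic: {statistic:.2f}, df: {df}, p-value: {p_value:.4f}")
```

A large p-value such as this one means the sample gives no reason to doubt that it matches the assumed population proportions.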

2. When to Use a Chi-Square Test?

A chi-square test is used to help determine whether observed results are in line with expected results, and whether any difference between them can be attributed to chance.

A chi-square test is appropriate for this when the data being analyzed are from a random sample, and when the variable in question is a categorical variable. A categorical variable is one that consists of selections such as type of car, race, educational attainment, gender, or how much somebody likes a political candidate (from very much to very little).

These types of data are often collected via survey responses or questionnaires. Therefore, chi-square analysis is often most useful in analyzing this type of data.
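Before a chi-square test can be run on survey data, the raw responses have to be tallied into a table of observed frequencies. The sketch below shows one way to do that with the standard library; the response pairs and labels are hypothetical.

```python
from collections import Counter

def contingency_table(pairs):
    """Tally (row_category, column_category) pairs into a table of counts."""
    counts = Counter(pairs)
    row_labels = sorted({row for row, _ in pairs})
    col_labels = sorted({col for _, col in pairs})
    table = [[counts[(row, col)] for col in col_labels] for row in row_labels]
    return row_labels, col_labels, table

# Hypothetical survey answers: (region, food preference) per respondent.
responses = [
    ("north", "veg"), ("north", "veg"), ("north", "non-veg"),
    ("south", "veg"), ("south", "non-veg"), ("south", "non-veg"),
]

row_labels, col_labels, table = contingency_table(responses)
print(row_labels)   # ['north', 'south']
print(col_labels)   # ['non-veg', 'veg']
print(table)        # [[1, 2], [2, 1]]
```

A real survey would need far more responses than this; a common rule of thumb is that expected frequencies should not be very small for the χ2 approximation to be reliable.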

3. How to Perform a Chi-Square Test?

These are the basic steps whether you are performing a goodness of fit test or a test of independence:

  • Create a table of the observed and expected frequencies;

  • Use the formula to calculate the chi-square value;

  • Find the critical chi-square value using a chi-square value table or statistical software;

  • Compare the calculated chi-square value with the critical value;

  • Reject the null hypothesis if the calculated value exceeds the critical value; otherwise, fail to reject it.
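The steps above can be sketched end to end as follows. The counts are hypothetical, the significance level of 0.05 is an assumption, and the small lookup table holds the standard 5% critical values for 1 to 5 degrees of freedom in place of a full chi-square table or statistical software.

```python
# Standard upper 5% critical values of the chi-square distribution.
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def chi_square_decision(observed, expected, df):
    """Return the statistic, the 5% critical value, and whether H0 is rejected."""
    # Step 2: apply the formula to the observed and expected frequencies.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Step 3: look up the critical value for the given degrees of freedom.
    critical = CRITICAL_VALUES_05[df]
    # Steps 4 and 5: compare the two values and decide.
    reject = statistic > critical
    return statistic, critical, reject

# Step 1: table of observed and expected frequencies (hypothetical, 90 responses).
observed = [45, 25, 20]
expected = [30, 30, 30]

statistic, critical, reject = chi_square_decision(observed, expected, df=len(observed) - 1)
decision = "reject H0" if reject else "fail to reject H0"
print(f"statistic: {statistic:.3f}, critical value: {critical}, decision: {decision}")
```

Here the statistic is 350 / 30, roughly 11.667, which exceeds 5.991, so the null hypothesis is rejected at the 5% level.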

4. What Is a Chi-square Test Used for?

Chi-square is a statistical test used to examine the differences between categorical variables from a random sample in order to judge goodness of fit between expected and observed results. 

5. Who Uses Chi-Square Analysis?

Since chi-square applies to categorical variables, it is most often used by researchers who are studying survey response data. This type of research can range from demography to consumer and marketing research to political science and economics.

Conclusion

The test of independence and the test of goodness of fit are the two different kinds of chi-square tests. Both are used to judge whether the data are consistent with an assumption or a hypothesis. The result is a tool for decision-making. For instance:

As a test of independence, a business might wish to assess whether its new product, a herbal supplement that claims to offer users an energy boost, is reaching the people who are most likely to be interested. On the premise that active, health-conscious people are most likely to purchase it, the product is being sold on websites dedicated to sports and fitness. The business conducts a thorough poll to gauge interest in the product among various demographic groups. According to the survey, there is no connection between interest in this product and being health-conscious.

In a test of goodness of fit, a marketing professional is considering launching a new product that the company believes will be irresistible to women over 45. The company has conducted product testing panels of 500 potential buyers of the product. The marketing professional has information about the age and gender of the test panels. This allows the construction of a chi-square test showing the distribution by age and gender of the people who said they would buy the product. The result will show whether or not the likeliest buyer is a woman over 45. If the test shows that men over 45 or women between 18 and 44 are just as likely to buy the product, the marketing professional will revise the advertising, promotion, and placement of the product to appeal to this wider group of customers.

References

Wikipedia

Simplilearn

Investopedia