Week 7

This week we will perform A/B testing and use hypothesis testing to assess whether two groups differ.

Hypothesis 1: Legendary Pokemon

For our first hypothesis, let’s consider legendary Pokemon. Legendary Pokemon are special Pokemon that are typically found at the end of each Pokemon game, and you can typically only battle and capture them once, making them incredibly rare. There can be any number of Legendary Pokemon in a game, and some games have more legendary Pokemon than others.

Most Pokemon are modeled after real-life objects and creatures found in nature. However, some typings such as ‘Dragon’ and ‘Fairy’ are more ‘mystical’ than logical typings such as ‘water’ or ‘rock’. As such, let’s assume that …

\(H_0\): Legendary status is independent of Pokemon type (chi-square test).

\(H_1\): Legendary status is not independent of Pokemon type.

What we’re asking is whether the proportion of Legendary Pokemon should be roughly the same across all types. In other words, if 10% of all Pokemon are legendary overall, then we could expect, under independence, approx. 10% of water, fire, grass types etc. would be legendary.

First, let’s determine if we have enough data to perform a hypothesis test using the Neyman-Pearson framework. This framework allows us to be more objective and provides more explicit guidelines for when we can reject a hypothesis (important because statistics is most valuable in disproving hypotheses).

The test we’ll be using is the chi-square test. Since both columns are categorical, this test will measure how far the observed counts (O) deviate from the counts we would expect (E) under independence.

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

For our chosen significance level, let’s say \(\alpha = 0.05\), we reject our null hypothesis if:

\[ \chi^2 \geq \chi^2_{\alpha,\,df} \]

When the chi-square statistic becomes too large, this indicates that our observed and expected counts are very different, which leads us to reject the null hypothesis.

Here’s a table to make this even easier to read:

##           
##              0   1
##   bug       69   3
##   dark      26   3
##   dragon    20   7
##   electric  34   5
##   fairy     17   1
##   fighting  28   0
##   fire      47   5
##   flying     2   1
##   ghost     26   1
##   grass     74   4
##   ground    30   2
##   ice       21   2
##   normal   102   3
##   poison    32   0
##   psychic   36  17
##   rock      41   4
##   steel     18   6
##   water    108   6

These are our observed counts of non-legendary (column 0) and legendary (column 1) Pokemon for each type. Now that we have a table, we can perform a chi-squared test.

## Warning in chisq.test(legendary_table): Chi-squared approximation may be
## incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  legendary_table
## X-squared = 73.908, df = 17, p-value = 4.533e-09
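The result above can be reproduced directly from the printed counts. The sketch below is an assumption about the setup: it rebuilds `legendary_table` by hand from the table shown earlier rather than from the original data set (which is not shown here), then compares `chisq.test()` with the statistic computed from the formula.

```r
# Observed counts copied from the table above (0 = non-legendary, 1 = legendary).
types <- c("bug", "dark", "dragon", "electric", "fairy", "fighting",
           "fire", "flying", "ghost", "grass", "ground", "ice",
           "normal", "poison", "psychic", "rock", "steel", "water")
non_legendary <- c(69, 26, 20, 34, 17, 28, 47, 2, 26,
                   74, 30, 21, 102, 32, 36, 41, 18, 108)
legendary <- c(3, 3, 7, 5, 1, 0, 5, 1, 1, 4, 2, 2, 3, 0, 17, 4, 6, 6)

legendary_table <- cbind("0" = non_legendary, "1" = legendary)
rownames(legendary_table) <- types

# chisq.test() emits the approximation warning seen above; silence it here.
result <- suppressWarnings(chisq.test(legendary_table))

# The same statistic by hand: sum of (O - E)^2 / E over all cells,
# where E = row total * column total / grand total.
expected <- outer(rowSums(legendary_table), colSums(legendary_table)) /
  sum(legendary_table)
chi_sq <- sum((legendary_table - expected)^2 / expected)

print(result)
print(chi_sq)
```

Both routes should agree, which is a quick check that the formula and the built-in test are measuring the same thing.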

There are 18 possible types and 2 categories (legendary vs. non-legendary), meaning our degrees of freedom (df) are calculated as follows:

\[ df = (18-1)(2-1)= 17 \]

At \(\alpha = 0.05\), the critical chi-squared value is about

\[ \chi^2_{0.05,\,17} \approx 27.59 \]

So the decision rule is…

  1. Reject our null hypothesis if our test statistic is greater than or equal to 27.59
  2. Fail to reject our null hypothesis if our test statistic is less than 27.59
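As a minimal sketch, the critical value and the decision can be checked in R. The variable names here are illustrative, and the test statistic is simply typed in from the output above rather than recomputed.

```r
alpha <- 0.05
df <- (18 - 1) * (2 - 1)  # 18 types, 2 legendary categories

# Upper-tail critical value of the chi-squared distribution.
critical_value <- qchisq(1 - alpha, df)

# Test statistic as reported by chisq.test() above.
test_statistic <- 73.908

# Neyman-Pearson decision: reject the null when the statistic
# reaches or exceeds the critical value.
reject_null <- test_statistic >= critical_value

# Upper-tail probability of the statistic, for comparison with the
# p-value in the test output.
p_value <- pchisq(test_statistic, df, lower.tail = FALSE)

print(critical_value)
print(reject_null)
print(p_value)
```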

Let’s also look at our expected values to compare them with our observations:

## Warning in chisq.test(legendary_table): Chi-squared approximation may be
## incorrect
##           
##                     0         1
##   bug       65.707865 6.2921348
##   dark      26.465668 2.5343321
##   dragon    24.640449 2.3595506
##   electric  35.591760 3.4082397
##   fairy     16.426966 1.5730337
##   fighting  25.553059 2.4469413
##   fire      47.455680 4.5443196
##   flying     2.737828 0.2621723
##   ghost     24.640449 2.3595506
##   grass     71.183521 6.8164794
##   ground    29.203496 2.7965044
##   ice       20.990012 2.0099875
##   normal    95.823970 9.1760300
##   poison    29.203496 2.7965044
##   psychic   48.368290 4.6317104
##   rock      41.067416 3.9325843
##   steel     21.902622 2.0973783
##   water    104.037453 9.9625468
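The expected counts also explain the warning. `chisq.test()` warns that the approximation may be incorrect when any expected count is below 5, and many of the legendary cells are that small. The sketch below rebuilds the table from the printed counts (an assumption, since the original data set is not shown), counts the small cells, and, as one possible alternative rather than the method used in this write-up, estimates the p-value by Monte Carlo simulation.

```r
# Observed counts copied from the table above (0 = non-legendary, 1 = legendary).
types <- c("bug", "dark", "dragon", "electric", "fairy", "fighting",
           "fire", "flying", "ghost", "grass", "ground", "ice",
           "normal", "poison", "psychic", "rock", "steel", "water")
non_legendary <- c(69, 26, 20, 34, 17, 28, 47, 2, 26,
                   74, 30, 21, 102, 32, 36, 41, 18, 108)
legendary <- c(3, 3, 7, 5, 1, 0, 5, 1, 1, 4, 2, 2, 3, 0, 17, 4, 6, 6)
legendary_table <- cbind("0" = non_legendary, "1" = legendary)
rownames(legendary_table) <- types

# Expected counts under independence, as stored by chisq.test().
expected <- suppressWarnings(chisq.test(legendary_table))$expected

# Cells with an expected count below 5 trigger the warning.
small_cells <- sum(expected < 5)
print(small_cells)

# Alternative (not used above): simulate the null distribution instead of
# relying on the chi-squared approximation.
set.seed(1)
simulated <- chisq.test(legendary_table, simulate.p.value = TRUE, B = 2000)
print(simulated$p.value)
```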

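As we can see from both our tables and our critical value, we need to reject independence: the test statistic of 73.908 is well above 27.59, and the p-value of 4.533e-09 is far below 0.05. The psychic, dragon, and steel types in particular have many more legendary Pokemon than independence would predict.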

This is a mosaic plot. This kind of plot is best for showing independence vs dependence because it directly encodes our expected vs observed values from above.
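A plot of this kind can be drawn with base R's `mosaicplot()`. This is a sketch under assumptions: the table is rebuilt from the printed counts, and the title, axis labels, and shading choice are illustrative rather than the settings used for the original figure.

```r
# Observed counts copied from the table above (0 = non-legendary, 1 = legendary).
types <- c("bug", "dark", "dragon", "electric", "fairy", "fighting",
           "fire", "flying", "ghost", "grass", "ground", "ice",
           "normal", "poison", "psychic", "rock", "steel", "water")
non_legendary <- c(69, 26, 20, 34, 17, 28, 47, 2, 26,
                   74, 30, 21, 102, 32, 36, 41, 18, 108)
legendary <- c(3, 3, 7, 5, 1, 0, 5, 1, 1, 4, 2, 2, 3, 0, 17, 4, 6, 6)
legendary_table <- as.table(cbind("0" = non_legendary, "1" = legendary))
rownames(legendary_table) <- types

# shade = TRUE colors each tile by its Pearson residual, so tiles that
# depart from the independence model stand out.
mosaicplot(legendary_table,
           shade = TRUE,
           las = 2,  # rotate axis labels so type names stay readable
           main = "Legendary status by Pokemon type",
           xlab = "Type",
           ylab = "Legendary (0 = no, 1 = yes)")
```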

Hypothesis 2

For our next hypothesis, let’s assess whether the attack stat of Pokemon in our data set tends to stay around the average. In other words…

\(H_0\): The attack stat values in the Pokemon data set follow a normal distribution

\(H_1\): The attack stat values in the Pokemon data set do not follow a normal distribution

Instead of our previous framework, we’ll be using Fisher’s Significance Testing framework. In this framework, we assume that our null hypothesis is true and use the p-value to measure how surprising our observed test statistic would be under that assumption (small p-value: reject the null hypothesis; large p-value: do not reject it).

Based on our null hypothesis, we should look for the following:

Evidence that would contradict normality in this case are the following (non-exhaustive):

## 
##  Shapiro-Wilk normality test
## 
## data:  pokemon$attack
## W = 0.97948, p-value = 3.581e-09

The above calculation is a Shapiro-Wilk statistic (W). This gives a numerical measure of how normal the data looks, and is used to compute our p-value. If our data is closely aligned with a normal curve, W is close to 1; if the data deviates (with clusters or long tails), W gets smaller.

We can see from our numbers that our p-value is incredibly small, meaning that there is strong evidence against the hypothesis that Attack is normally distributed.
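To see how W behaves, here is a small sketch on simulated data. It does not use the Pokemon data set: the sample size, mean, and spread are made-up values chosen only for illustration. It compares a sample drawn from a normal distribution with a strongly right-skewed one.

```r
set.seed(42)
n <- 800  # hypothetical sample size, chosen for illustration

# One sample drawn from a normal distribution, one from a skewed
# (exponential) distribution with a long right tail.
normal_sample <- rnorm(n, mean = 80, sd = 30)
skewed_sample <- rexp(n, rate = 1 / 80)

normal_test <- shapiro.test(normal_sample)
skewed_test <- shapiro.test(skewed_sample)

# W should sit near 1 for the normal sample and noticeably lower
# for the skewed one.
print(normal_test$statistic)
print(skewed_test$statistic)

# The skewed sample should also yield a very small p-value.
print(skewed_test$p.value)
```

With a large sample, even a W that looks close to 1 (such as the 0.97948 reported above) can produce a very small p-value, because the test becomes sensitive to small departures from normality.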

Let’s plot this to drive our point home:

Although the distribution looks close to a bell curve at a glance, our test shows that the attack stat does not actually follow a normal distribution.