Final Data Analysis Project

Carlos Castillo, Roger Rodriguez, Jose Canovas, Jaume Reverte, Jordi Pages, Claudio Maiorana

Pokémon data analysis

Data content

The table below lists every variable we work with, its data type and a short description.

Name        DataType  Description
ID          int       ID for each Pokémon
Name        factor    Name of each Pokémon
Type 1      factor    Each Pokémon has a type, which determines its weakness/resistance to attacks
Type 2      factor    Some Pokémon are dual type and have a second type
Total       int       Sum of all the stats that come after this; a general guide to how strong a Pokémon is
HP          int       Hit points, or health; defines how much damage a Pokémon can withstand before fainting
Attack      int       The base modifier for normal attacks (Scratch, Punch, …)
Defense     int       The base damage resistance against normal attacks
SP Attack   int       Special attack; the base modifier for special attacks (Fire Blast, Bubble Beam, …)
SP Defense  int       The base damage resistance against special attacks
Speed       int       Determines which Pokémon attacks first each round
Generation  number    The generation the Pokémon belongs to
Legendary   boolean   Whether the Pokémon is legendary or not

Problem understanding

We chose this data set to help the community improve team building, battle strategies and probability estimates for Pokémon tournaments.

Descriptive statistics

Probability

We computed probabilities based on each Pokémon's Type 1 and its Legendary status, since Type 1 represents its basic element.

Population

We worked with a total population of 800 Pokémon, divided into 18 Type 1 categories and 2 Legendary statuses.

Joint Probability between Type 1 & Legendary

Marginal Probabilities

Conditional probability

Conditional probability of Type 1 given that the Pokémon is not legendary

Conditional probability of Type 1 given that the Pokémon is legendary

Conditional probability of being legendary given each Type 1

Steel

Rock

Psychic

Poison

Normal

Ice

Ground

Grass

Ghost

Flying

Fire

Fighting

Fairy

Electric

Dragon

Dark

Bug
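The joint, marginal and conditional probabilities in this Probability section can all be derived from the same Type 1 by Legendary contingency table. The sketch below is a minimal Python illustration of that derivation, not the code used for this report. The `rows` list is a hypothetical toy sample, not the real 800-row data set, and the function names are our own.

```python
from collections import Counter

# Hypothetical toy rows (type_1, legendary); NOT the real 800-row data set.
rows = [
    ("Dragon", True), ("Dragon", False), ("Dragon", False),
    ("Psychic", True), ("Psychic", False),
    ("Bug", False), ("Bug", False), ("Bug", False),
]

def joint(rows):
    """P(Type 1 = t and Legendary = l) for every observed pair."""
    n = len(rows)
    return {pair: count / n for pair, count in Counter(rows).items()}

def marginal(joint_p, axis):
    """Sum the joint table over the other variable (axis 0 = Type 1, axis 1 = Legendary)."""
    out = Counter()
    for pair, p in joint_p.items():
        out[pair[axis]] += p
    return dict(out)

def conditional(joint_p, given_axis):
    """P(other | given) = P(other and given) / P(given)."""
    marg = marginal(joint_p, given_axis)
    return {pair: p / marg[pair[given_axis]] for pair, p in joint_p.items()}

joint_p = joint(rows)
p_type = marginal(joint_p, 0)                # P(Type 1)
p_legendary = marginal(joint_p, 1)           # P(Legendary)
p_type_given_leg = conditional(joint_p, 1)   # P(Type 1 | Legendary)
p_leg_given_type = conditional(joint_p, 0)   # P(Legendary | Type 1)

print(p_leg_given_type[("Dragon", True)])
```

Keys of the conditional tables are still `(type_1, legendary)` pairs; only the denominator changes depending on which variable is conditioned on.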

Discrete Random Variable

Binomial

We want to know the probability, over a 20-battle competition, that the first Pokémon on the enemy team is Ghost type (imagine our first Pokémon is Normal type; Normal-type Pokémon are immune to Ghost-type attacks, so this would be beneficial for us).
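As a rough Python sketch of this binomial model, the number of battles (out of 20) in which the enemy leads with a Ghost type can be treated as Binomial(n = 20, p). The value of `P_GHOST` below is a placeholder assumption; in practice it would be the marginal probability of Ghost as Type 1 taken from the data. The function name is our own.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

N_BATTLES = 20
P_GHOST = 0.04  # ASSUMED placeholder; replace with P(Type 1 = Ghost) from the data

# Distribution of the number of battles with a Ghost-type lead
distribution = [binomial_pmf(k, N_BATTLES, P_GHOST) for k in range(N_BATTLES + 1)]

# Probability of facing a Ghost-type lead at least once in the competition
p_at_least_one = 1 - distribution[0]
print(p_at_least_one)
```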

HyperGeometric

We are in a battle where each player has a team of 6 Pokémon and our initial Pokémon is Grass type. We want to know the probability that X of the 6 opposing Pokémon are strong against our Grass-type Pokémon.
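A minimal Python sketch of the hypergeometric model follows, assuming the opposing team is 6 Pokémon drawn without replacement from the population of 800. The count `STRONG_VS_GRASS` is a placeholder assumption; it should be the number of Pokémon in the data whose type is strong against Grass (Fire, Ice, Poison, Flying or Bug). The function name is our own.

```python
from math import comb

def hypergeom_pmf(k, population, successes, draws):
    """P(X = k): k successes in `draws` draws without replacement."""
    return (
        comb(successes, k)
        * comb(population - successes, draws - k)
        / comb(population, draws)
    )

POPULATION = 800       # total Pokémon in the data set
STRONG_VS_GRASS = 200  # ASSUMED placeholder count; take the real count from the data
TEAM_SIZE = 6

# Probability that exactly x of the 6 opposing Pokémon are strong against Grass
for x in range(TEAM_SIZE + 1):
    print(x, hypergeom_pmf(x, POPULATION, STRONG_VS_GRASS, TEAM_SIZE))
```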

Continuous Random Variable

We could not find a natural example of a continuous random variable in this data, so we did some data carpentry and built a problem around the chance of an attack dealing more or less damage. Every time a Pokémon uses a special or physical attack, the damage falls within a range determined by a random number in the Pokémon damage formula; here that range goes from 49 to 52. Specifically, we want to know the probability that the damage dealt to the enemy is greater than or equal to 50, since the enemy Pokémon has 50 hit points left.
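If we assume the damage is uniformly distributed over the range from 49 to 52 (an assumption of this sketch, since the exact distribution is not stated here), the probability is the fraction of the range at or above 50: (52 - 50) / (52 - 49) = 2/3. A small Python sketch, with a function name of our own:

```python
def uniform_sf(x, low, high):
    """P(X >= x) for X ~ Uniform(low, high)."""
    if x <= low:
        return 1.0
    if x >= high:
        return 0.0
    return (high - x) / (high - low)

# ASSUMPTION: damage ~ Uniform(49, 52); the enemy has 50 hit points left
p_knock_out = uniform_sf(50, 49, 52)
print(p_knock_out)
```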

Inference

Confidence Interval (1 Population)

Standard formation for a balanced team composition: Physical Sweeper, Physical Sweeper, Special Sweeper, Special Sweeper, Physical Tank and Wall.

-> Physical Sweeper: Attack + Speed
-> Special Sweeper: SP Attack + Speed
-> Physical Tank: Attack + Defense
-> Wall: HP + Defense + SP Defense

If we take 300 Pokémon at random, what is the 95% confidence interval for the population mean of each team role (Physical Sweeper, Special Sweeper, Physical Tank and Wall)?
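The sketch below shows one common way to build such an interval in Python: sample mean plus or minus a normal quantile times the standard error. It is an illustration only and may differ in detail from the computation behind the results in this report. The `scores` list is synthetic, hypothetical data standing in for the 300 role scores (for example Attack + Speed), and the function name is our own.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Return (lower, sample_mean, upper) using the normal approximation."""
    n = len(sample)
    centre = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return centre - z * standard_error, centre, centre + z * standard_error

# HYPOTHETICAL synthetic role scores, not the real Pokémon sample
random.seed(0)
scores = [random.gauss(148, 50) for _ in range(300)]

low, centre, high = mean_confidence_interval(scores)
print(low, centre, high)
```

With 300 observations the normal quantile is very close to the corresponding t quantile, which is why the simpler approximation is used here.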

Confidence Interval (Physical Sweeper)

## We can be 95% confident that the population mean lies in the interval [ 142.8964 , 152.977 ] (sample mean: 147.9367 )

Confidence Interval (Special Sweeper)

## We can be 95% confident that the population mean lies in the interval [ 137.6793 , 147.3274 ] (sample mean: 142.5033 )

Confidence Interval (Physical Tank)

## We can be 95% confident that the population mean lies in the interval [ 144.6584 , 154.7549 ] (sample mean: 149.7067 )

Confidence Interval (Wall)

## We can be 95% confident that the population mean lies in the interval [ 209.0683 , 221.125 ] (sample mean: 215.0967 )

Hypothesis Testing (1 Population)

Using the data from the previous problem (300 random Pokémon), for each role in the team composition (Physical Sweeper, Special Sweeper, Physical Tank and Wall), is there enough evidence at the 5% level of significance to support the following hypotheses?

Hypothesis Testing (Physical Sweeper)

H0: μ <= 140 -> The mean is less than or equal to 140

H1: μ > 140 -> The mean is greater than 140

## Using the critical value:
## We reject the null hypothesis (H0: µ <= 140) at the 5% level of significance, as z ( 2.664255 ) is greater than the critical value ( 1.644854 ).
## Using the p-value:
## We reject the null hypothesis (H0: µ <= 140) at the 5% level of significance, as the p-value ( 0.003857957 ) is less than ( 0.05 ).
## There is enough evidence at the 5% level of significance to suggest that the mean is greater than 140
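Both decision rules for this right-tailed z-test can be sketched in a few lines of Python. This is an illustration using the standard library, not the code that produced the output above; the function name is our own, and the z statistic is taken directly from the Physical Sweeper result.

```python
from statistics import NormalDist

def right_tailed_z_test(z, alpha=0.05):
    """Return (critical_value, p_value, reject_h0) for H1: mu > mu0."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha)  # reject H0 when z exceeds this
    p_value = 1 - std_normal.cdf(z)           # area to the right of z
    return critical, p_value, z > critical

# z statistic reported for the Physical Sweeper test
critical, p_value, reject = right_tailed_z_test(2.664255)
print(critical, p_value, reject)
```

The two rules always agree: z exceeds the critical value exactly when the p-value is below alpha.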

Hypothesis Testing (Special Sweeper)

H0: μ <= 140 -> The mean is less than or equal to 140

H1: μ > 140 -> The mean is greater than 140

## Using the critical value:
## We fail to reject the null hypothesis (H0: µ <= 140) at the 5% level of significance, as z ( 1.422639 ) is less than the critical value ( 1.644854 ).
## Using the p-value:
## We fail to reject the null hypothesis (H0: µ <= 140) at the 5% level of significance, as the p-value ( 0.07742036 ) is greater than ( 0.05 ).
## There is not enough evidence at the 5% level of significance to suggest that the mean is greater than 140

Hypothesis Testing (Physical Tank)

H0: μ <= 150 -> The mean is less than or equal to 150

H1: μ > 150 -> The mean is greater than 150

## Using the critical value:
## We fail to reject the null hypothesis (H0: µ <= 150) at the 5% level of significance, as z ( 0.1260951 ) is less than the critical value ( 1.644854 ).
## Using the p-value:
## We fail to reject the null hypothesis (H0: µ <= 150) at the 5% level of significance, as the p-value ( 0.4498283 ) is greater than ( 0.05 ).
## There is not enough evidence at the 5% level of significance to suggest that the mean is greater than 150

Hypothesis Testing (Wall)

H0: μ <= 210 -> The mean is less than or equal to 210

H1: μ > 210 -> The mean is greater than 210

## Using the critical value:
## We reject the null hypothesis (H0: µ <= 210) at the 5% level of significance, as z ( 2.006301 ) is greater than the critical value ( 1.644854 ).
## Using the p-value:
## We reject the null hypothesis (H0: µ <= 210) at the 5% level of significance, as the p-value ( 0.0224121 ) is less than ( 0.05 ).
## There is enough evidence at the 5% level of significance to suggest that the mean is greater than 210

Confidence Interval (2 Population)

## Taking into account that A_PhS_StDes ( 53.1693 ), B_PhS_StDes ( 48.31645 ) and Phs_StDes ( 51.15841 ) are close to each other, we can treat the standard deviations as equal.
## We can be 95% confident that the difference between the population means lies in the interval [ 2.86549 , 3.13451 ] (sample difference: 3 )
## Taking into account that A_SpS_StDes ( 58.12458 ), B_SpS_StDes ( 50.4189 ) and SpS_StDes ( 53.05534 ) are close to each other, we can treat the standard deviations as equal.
## We can be 95% confident that the difference between the population means lies in the interval [ 1.75549 , 2.02451 ] (sample difference: 1.89 )
## Taking into account that A_PhT_StDes ( 57.28112 ), B_PhT_StDes ( 48.46968 ) and PhT_StDes ( 53.98066 ) are close to each other, we can treat the standard deviations as equal.
## We can be 95% confident that the difference between the population means lies in the interval [ 5.198823 , 5.467843 ] (sample difference: 5.333333 )
## Taking into account that A_W_StDes ( 68.74741 ), B_W_StDes ( 60.19865 ) and W_StDes ( 64.8473 ) are close to each other, we can treat the standard deviations as equal.
## We can be 95% confident that the difference between the population means lies in the interval [ 5.922157 , 6.191177 ] (sample difference: 6.056667 )
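When two samples are assumed to share a common standard deviation, a textbook interval for the difference of means uses the pooled standard deviation. The Python sketch below illustrates that standard formula; it is not necessarily the computation behind the output above. All input values (means, standard deviations and sample sizes of 300) are hypothetical, the function names are our own, and a normal quantile is used as a large-sample approximation to the t quantile.

```python
from math import sqrt
from statistics import NormalDist

def pooled_sd(sd_a, n_a, sd_b, n_b):
    """Pooled standard deviation of two samples assumed to share one variance."""
    return sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2))

def diff_confidence_interval(mean_a, mean_b, sd_a, n_a, sd_b, n_b, confidence=0.95):
    """Interval for mean_a - mean_b under the equal-variance assumption."""
    standard_error = pooled_sd(sd_a, n_a, sd_b, n_b) * sqrt(1 / n_a + 1 / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # large-sample approximation
    diff = mean_a - mean_b
    return diff - z * standard_error, diff + z * standard_error

# HYPOTHETICAL summary statistics for two samples A and B
low, high = diff_confidence_interval(
    mean_a=150.0, mean_b=147.0, sd_a=53.0, n_a=300, sd_b=48.0, n_b=300
)
print(low, high)
```

The margin of error scales with the pooled standard deviation, so samples with more spread give a wider interval for the same sample sizes.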

Hypothesis Testing (2 Population)

For each team role, the hypotheses are:

H0: μ1 = μ2 -> mean1 is equal to mean2

H1: μ1 != μ2 -> mean1 is not equal to mean2

Hypothesis Testing (Physical Sweeper)

## Using the critical value:
## We fail to reject the null hypothesis (H0: µ1 = µ2) at the 5% level of significance, as t ( 1.365306 ) is less than the critical value ( 1.644854 ).
## Using the p-value:
## We fail to reject the null hypothesis (H0: µ1 = µ2) at the 5% level of significance, as the p-value ( 0.08633507 ) is greater than ( 0.05 ).
## There is not enough evidence at the 5% level of significance to suggest that µ1 != µ2

Hypothesis Testing (Special Sweeper)

## Using the critical value:
## We fail to reject the null hypothesis (H0: µ1 = µ2) at the 5% level of significance, as t ( 0.05517363 ) is less than the critical value ( 1.647406 ).
## Using the p-value:
## We fail to reject the null hypothesis (H0: µ1 = µ2) at the 5% level of significance, as the p-value ( 0.4780093 ) is greater than ( 0.05 ).
## There is not enough evidence at the 5% level of significance to suggest that µ1 != µ2

Hypothesis Testing (Physical Tank)

## Using the critical value:
## We fail to reject the null hypothesis (H0: µ1 = µ2) at the 5% level of significance, as t ( 1.041655 ) is less than the critical value ( 1.647406 ).
## Using the p-value:
## We fail to reject the null hypothesis (H0: µ1 = µ2) at the 5% level of significance, as the p-value ( 0.1489963 ) is greater than ( 0.05 ).
## There is not enough evidence at the 5% level of significance to suggest that µ1 != µ2

Hypothesis Testing (Wall)

## Using the critical value:
## We reject the null hypothesis (H0: µ1 = µ2) at the 5% level of significance, as t ( 2.017567 ) is greater than the critical value ( 1.647406 ).
## Using the p-value:
## We reject the null hypothesis (H0: µ1 = µ2) at the 5% level of significance, as the p-value ( 0.02204127 ) is less than ( 0.05 ).
## There is enough evidence at the 5% level of significance to suggest that µ1 != µ2

Conclusion

We observed that the Pokémon with the best stats are the legendary ones and the Mega Evolutions; these are usually the best choices for every team role. They are usually Dragon/Flying types. Based on this, we identified three possible competitive teams:

-> A team made up entirely of legendary Pokémon. These have the best stats, but because such a team is somewhat predictable, it may run into a team built to counter it.
-> An anti-legendary team. Although its stats are lower, this team does well against legendaries.
-> A team built to beat the anti-legendary team. It does well against the anti-legendary team but is weak against legendaries because of their better stats.

This study has ended up being a mini guide for players who want to reach a higher level in the game, giving them a simple way to see which Pokémon are the best for each playing position.

We can also conclude that the worst combination that can be chosen is a team of Grass-type Pokémon.