Pokemon Go - Does higher CP lead to higher HP for Pidgey after evolution?

The relationship between CP and HP

Sam Duc Trieu - s3651854

Last updated: 04 June 2017

Introduction

“Pokemon Go is officially the biggest mobile game in the United States, with 21 million active daily users” (Kain, E 2016). As a result, many statistical studies have been conducted to explore the data behind this mobile game. Evolution is a key mechanic in the game: when a Pokemon evolves, it grows stronger, gaining higher HP (Hit Points) and higher CP (Combat Power). Pidgey is one of the most common Pokemon species that players can find in the game.

Introduction Cont.

Problem Statement

Data

In this research I use the open dataset collected by OpenIntro. The dataset consists of 75 observations of 27 variables, covering four popular Pokemon species that can be found in the game: Pidgey, Caterpie, Weedle and Eevee. My research focuses on the Pidgey species, for which the sample size is 39. Because the research investigates whether higher CP leads to higher HP after evolution, the core variables we analyse are CP and HP.

Data Cont.

As mentioned above, we focus on the CP and HP variables of the Pidgey species after evolution. Specifically:
- hp_new: Post-evolution Hit Points.
- cp_new: Post-evolution Combat Power.
Both variables are numeric and continuous.

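library(readr)  # read_csv()
library(dplyr)  # select(), filter(), %>%
pokemon <- read_csv("C:/Users/JustinTSD/Desktop/Intro To Stats Assignment 4/pokemon.csv")
# Keep only the identifying columns and the post-evolution CP and HP
Pokemon_sub <- pokemon %>% dplyr::select(name, species, cp_new, hp_new)
# Restrict the sample to the Pidgey species
Pidgey <- Pokemon_sub %>% filter(species == "Pidgey")
Pidgey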

Descriptive Statistics and Visualisation

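# Scatter plot of post-evolution HP against post-evolution CP
plot(hp_new ~ cp_new, data = Pidgey, xlab = "CP", ylab = "HP", main = "CP & HP of Pidgey after evolution")
# Fit the simple linear regression of HP on CP and overlay the fitted line
PidgeyHPModel <- lm(hp_new ~ cp_new, data = Pidgey)
abline(PidgeyHPModel, col = "red")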

Descriptive Statistics Cont.

CP_New=Pidgey %>% summarise(Min = min(cp_new,na.rm = TRUE),Q1 = quantile(cp_new,probs = .25,na.rm = TRUE),
                            Median = median(cp_new, na.rm = TRUE),
                            Q3 = quantile(cp_new,probs = .75,na.rm = TRUE),
                            Max = max(cp_new,na.rm = TRUE),Mean = mean(cp_new, na.rm = TRUE),
                            SD = sd(cp_new, na.rm = TRUE),n = n(),Missing = sum(is.na(cp_new))) 
HP_New=Pidgey %>% summarise(Min = min(hp_new,na.rm = TRUE),Q1 = quantile(hp_new,probs = .25,na.rm = TRUE),
                            Median = median(hp_new, na.rm = TRUE),
                            Q3 = quantile(hp_new,probs = .75,na.rm = TRUE),
                            Max = max(hp_new,na.rm = TRUE),Mean = mean(hp_new, na.rm = TRUE),
                            SD = sd(hp_new, na.rm = TRUE),n = n(),Missing = sum(is.na(hp_new)))
Factors=c("CP_New","HP_New")
Pidgey_DescriptiveStat=rbind(CP_New,HP_New)
PidgeyStat=data.frame(Factors,Pidgey_DescriptiveStat)
PidgeyStat

Hypothesis Testing

PidgeyHPModel%>%summary()
## 
## Call:
## lm(formula = hp_new ~ cp_new, data = Pidgey)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.686  -2.652   0.487   3.446   5.775 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 24.334296   1.346442   18.07   <2e-16 ***
## cp_new       0.090086   0.003197   28.18   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.149 on 37 degrees of freedom
## Multiple R-squared:  0.9555, Adjusted R-squared:  0.9543 
## F-statistic: 793.9 on 1 and 37 DF,  p-value: < 2.2e-16
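
For a simple linear regression with one predictor, the F-statistic and R-squared in the output above are tied to the slope's t value, which gives a quick consistency check. This is a worked sketch using only the rounded figures printed above, so small rounding differences are expected:

\[F = t^2 = 28.18^2 \approx 794.1\] \[R^2 = \frac{F}{F + df_2} = \frac{793.9}{793.9 + 37} \approx 0.9555\]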

Hypothesis Testing Cont.

\[R^2 = \frac{b L_{xy}}{L_{yy}} = 0.955\]

- A total of 95.5% of the variability in a Pidgey’s HP can be explained by a linear relationship with the Pidgey’s CP.

The F-test for linear regression:

H0: The data do not fit the linear regression model
HA: The data fit the linear regression model

The F-statistic is 793.9 on df1 = 1 and df2 = 37, with p-value < 0.001. Because this is below the 0.05 level of significance, we reject H0. There was statistically significant evidence that the data fit a linear regression model.

- The intercept is a = 24.334. It is the HP that the model predicts for a Pidgey whose CP is 0. We test the statistical significance of a:

\[H_0:\alpha=0\] \[H_A:\alpha\neq0\]

The t statistic is 18.07 with p < 0.001, so the intercept is statistically significant at the 0.05 level.

PidgeyHPModel%>%confint()%>%round(3)
##              2.5 % 97.5 %
## (Intercept) 21.606 27.062
## cp_new       0.084  0.097

The 95% CI for a is [21.606, 27.062]. It does not capture 0, so we reject H0.
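
The same output also supports a matching check for the slope, which is the coefficient that addresses the research question most directly. The following is a sketch read off the coefficient table and the confidence intervals above, using b for the estimated slope and \beta for its population value (the symbol \beta is an assumption; the original does not name it):

\[H_0:\beta=0\] \[H_A:\beta\neq0\] \[t=\frac{b}{SE(b)}=\frac{0.090086}{0.003197}\approx 28.18\]

With p < 0.001 and a 95% CI for the slope of [0.084, 0.097], which does not capture 0, the slope would also be judged statistically significant at the 0.05 level. Read this way, each additional point of CP is associated with roughly 0.09 additional HP after evolution.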

Hypothesis Testing Cont.

plot(PidgeyHPModel)

Hypothesis Testing Cont.

cor(Pidgey$hp_new,Pidgey$cp_new)
## [1] 0.9774826
library(Hmisc)
bivariate=as.matrix(dplyr::select(Pidgey,hp_new,cp_new))
rcorr(bivariate,type="pearson")
##        hp_new cp_new
## hp_new   1.00   0.98
## cp_new   0.98   1.00
## 
## n= 39 
## 
## 
## P
##        hp_new cp_new
## hp_new         0    
## cp_new  0

The correlation between HP and CP is r = 0.98, with a reported p-value of 0 (that is, p < 0.001 after rounding).

Hypothesis Testing Cont.

A hypothesis test for r:

\[H_0:r=0\] \[H_A:r\neq0\] \[t=r\sqrt{\frac{n-2}{1-r^2}}=29.956\]

A two-tailed p-value:

2*pt(q=29.956,df=39-2,lower.tail = FALSE)
## [1] 1.515035e-27
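
The statistic above can be recomputed directly from r and n. The sketch below assumes the rounded r = 0.98 quoted in the text and, for comparison, the unrounded correlation 0.9774826 printed by cor(). The helper name t_from_r is hypothetical and is not part of the original analysis.

```r
# Hypothetical helper: t statistic for testing a Pearson correlation against 0
t_from_r <- function(r, n) {
  r * sqrt((n - 2) / (1 - r^2))
}

n <- 39                                # Pidgey sample size
t_rounded   <- t_from_r(0.98, n)       # r rounded to two decimals, as in the formula above
t_unrounded <- t_from_r(0.9774826, n)  # r as printed by cor()

# Two-tailed p-values on n - 2 degrees of freedom
p_rounded   <- 2 * pt(t_rounded,   df = n - 2, lower.tail = FALSE)
p_unrounded <- 2 * pt(t_unrounded, df = n - 2, lower.tail = FALSE)

round(c(t_rounded, t_unrounded), 3)
```

The rounded r reproduces the 29.956 used above, while the unrounded r gives a value of about 28.18, in line with the slope's t value in the regression output. Both p-values are far below 0.001.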

The p-value is below 0.001, and therefore below the 0.05 level of significance, so we reject H0. There was a statistically significant positive correlation between HP and CP of the Pidgey species after evolution.

We also test H0 using a confidence interval approach. r is first converted to a z-score using Fisher's transformation:

\[z=\frac{1}{2}\ln\left(\frac{1+r}{1-r}\right)=\frac{1}{2}\ln\left(\frac{1+0.98}{1-0.98}\right)=2.298\]

Hypothesis Testing Cont.

library(psychometric)
r=cor(Pidgey$hp_new,Pidgey$cp_new)
CIr(r=r,n=39,level=.95)
## [1] 0.9571681 0.9882202

This CI does not capture 0, the value of r under H0, so we reject H0. There was a statistically significant positive correlation between HP and CP of the Pidgey species after evolution.
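
The interval above can be approximated by hand from the z-transformation. The sketch below assumes the usual Fisher approach: a standard error of 1/sqrt(n - 3) on the z scale, then back-transforming with tanh. Whether CIr() follows exactly these steps internally is an assumption, although the numbers agree. The correlation is typed in from the cor() output rather than recomputed from the data.

```r
r <- 0.9774826   # correlation as printed by cor() above
n <- 39          # Pidgey sample size

z      <- atanh(r)          # Fisher z, equal to 0.5 * log((1 + r) / (1 - r))
se_z   <- 1 / sqrt(n - 3)   # approximate standard error on the z scale
z_crit <- qnorm(0.975)      # critical value for a 95% interval

# Interval on the z scale, back-transformed to the correlation scale
ci <- tanh(z + c(-1, 1) * z_crit * se_z)
round(ci, 4)
```

Run as written, this should give roughly 0.957 to 0.988, in line with the CIr() output. Note that it uses the unrounded r, so z here is about 2.238 rather than the 2.298 obtained from r = 0.98.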

Discussion

References