Sam Duc Trieu - s3651854
Last updated: 04, June, 2017
In this research I use the open dataset collected by OpenIntro. The dataset consists of 75 observations of 27 variables, covering four popular Pokemon species found in this game: Pidgey, Caterpie, Weedle and Eevee. My research focuses on the Pidgey species, for which the sample size is 39. It investigates whether a higher CP is associated with a higher HP after evolution; therefore, the core variables we analyse are CP and HP.
As mentioned above, we focus on the CP and HP variables of the Pidgey species after evolution. Specifically:
- hp_new: Post-evolution Hit Points.
- cp_new: Post-evolution Combat Power.
Both of these numeric variables are continuous.
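library(readr)  # provides read_csv()
library(dplyr)  # provides select(), filter(), summarise() and %>%
# Load the data and keep only the variables needed for the analysis
pokemon <- read_csv("C:/Users/JustinTSD/Desktop/Intro To Stats Assignment 4/pokemon.csv")
Pokemon_sub = pokemon %>% dplyr::select(name, species, cp_new, hp_new)
Pidgey=Pokemon_sub%>%filter(species=="Pidgey")
Pidgey
# Scatter plot of post-evolution HP against CP with the fitted regression line
plot(hp_new~cp_new, data=Pidgey,xlab="CP",ylab="HP",main="CP & HP of Pidgey after evolution")
PidgeyHPModel<-lm(hp_new~cp_new,data=Pidgey)
abline(PidgeyHPModel,col="red")
CP_New=Pidgey %>% summarise(Min = min(cp_new,na.rm = TRUE),Q1 = quantile(cp_new,probs = .25,na.rm = TRUE),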
Median = median(cp_new, na.rm = TRUE),
Q3 = quantile(cp_new,probs = .75,na.rm = TRUE),
Max = max(cp_new,na.rm = TRUE),Mean = mean(cp_new, na.rm = TRUE),
SD = sd(cp_new, na.rm = TRUE),n = n(),Missing = sum(is.na(cp_new)))
HP_New=Pidgey %>% summarise(Min = min(hp_new,na.rm = TRUE),Q1 = quantile(hp_new,probs = .25,na.rm = TRUE),
Median = median(hp_new, na.rm = TRUE),
Q3 = quantile(hp_new,probs = .75,na.rm = TRUE),
Max = max(hp_new,na.rm = TRUE),Mean = mean(hp_new, na.rm = TRUE),
SD = sd(hp_new, na.rm = TRUE),n = n(),Missing = sum(is.na(hp_new)))
Factors=c("CP_New","HP_New")
Pidgey_DescriptiveStat=rbind(CP_New,HP_New)
PidgeyStat=data.frame(Factors,Pidgey_DescriptiveStat)
PidgeyStat
PidgeyHPModel%>%summary()
##
## Call:
## lm(formula = hp_new ~ cp_new, data = Pidgey)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -12.686  -2.652   0.487   3.446   5.775
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.334296   1.346442   18.07   <2e-16 ***
## cp_new       0.090086   0.003197   28.18   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.149 on 37 degrees of freedom
## Multiple R-squared: 0.9555, Adjusted R-squared: 0.9543
## F-statistic: 793.9 on 1 and 37 DF, p-value: < 2.2e-16
\[R^2 = \frac{bL_{xy}}{L_{yy}} = 0.955\]
- A total of 95.5% of the variability in a Pidgey’s HP can be explained by a linear relationship with its CP.
The F-test for linear regression:
H0: The data do not fit the linear regression model
HA: The data fit the linear regression model
The F-statistic is 793.9 on df1 = 1 and df2 = 37, with p-value < 0.001. Since this is below the 0.05 level of significance, we reject H0. There was statistically significant evidence that the data fit a linear regression model.
- The intercept is a = 24.334. It is the HP the model predicts for a Pidgey whose CP is 0. We test the statistical significance of a: \[H_0:\alpha=0\] \[H_A:\alpha\neq0\] t statistic = 18.07, p < 0.001. The intercept is statistically significant at the 0.05 level.
PidgeyHPModel%>%confint()%>%round(3)
##              2.5 % 97.5 %
## (Intercept) 21.606 27.062
## cp_new       0.084  0.097
The 95% CI for a is [21.606, 27.062]. Because this interval does not contain 0, we reject H0 for the intercept.
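The fitted line and the F-test above can be checked by hand from the printed model summary. The sketch below copies the coefficients, F statistic and slope t value from that output; the three CP values are hypothetical and chosen only for illustration, so the predictions are meaningful only if those values lie inside the range of CP observed in the sample. It relies on two standard identities for simple linear regression: F equals the squared t statistic of the slope, and R^2 = F / (F + residual df).

```r
# Values copied from the printed summary(PidgeyHPModel) output
intercept <- 24.334296
slope     <- 0.090086
f_stat    <- 793.9
t_slope   <- 28.18
df_resid  <- 37

# Hypothetical post-evolution CP values (an assumption, not taken from the data)
cp_example <- c(200, 400, 600)

# Predicted HP from the fitted line: HP = a + b * CP
hp_pred <- intercept + slope * cp_example

# With one predictor, F is the square of the slope's t statistic
f_from_t <- t_slope^2

# R-squared recovered from F and the residual degrees of freedom
r_squared <- f_stat / (f_stat + df_resid)

print(round(hp_pred, 2))
print(round(f_from_t, 1))
print(round(r_squared, 4))
```

The squared t value and the recovered R-squared should agree with the printed F-statistic and Multiple R-squared up to the rounding in the summary output.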
plot(PidgeyHPModel)
cor(Pidgey$hp_new,Pidgey$cp_new)
## [1] 0.9774826
library(Hmisc)
bivariate=as.matrix(dplyr::select(Pidgey,hp_new,cp_new))
rcorr(bivariate,type="pearson")
##        hp_new cp_new
## hp_new   1.00   0.98
## cp_new   0.98   1.00
##
## n= 39
##
##
## P
##        hp_new cp_new
## hp_new        0
## cp_new 0
The correlation between HP and CP is r = 0.98, and the p-value is reported as 0 after rounding (that is, p < 0.001).
A hypothesis test for the population correlation \(\rho\), using the rounded value r = 0.98: \[H_0:\rho=0\] \[H_A:\rho\neq0\] \[t=r\sqrt{\frac{n-2}{1-r^2}}=29.956\] A two-tailed p-value:
2*pt(q=29.956,df=39-2,lower.tail = FALSE)
## [1] 1.515035e-27
Since the p-value < 0.001 is below the 0.05 level of significance, we reject H0. There was a statistically significant positive correlation between HP and CP of the Pidgey species after evolution.
We also test H0 using a confidence interval approach. r is converted to a z-score: \[z=\frac{1}{2}\ln\left(\frac{1+r}{1-r}\right)=\frac{1}{2}\ln\left(\frac{1+0.98}{1-0.98}\right)=2.298\]
library(psychometric)
r=cor(Pidgey$hp_new,Pidgey$cp_new)
CIr(r=r,n=39,level=.95)
## [1] 0.9571681 0.9882202
This CI does not contain 0, the value of the correlation under H0, so we reject H0. There was a statistically significant positive correlation between HP and CP of the Pidgey species after evolution.
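Both correlation tests above can be reproduced by hand in base R. The sketch below uses the unrounded correlation printed by cor() rather than 0.98, so its t statistic comes out near 28.18 (the same value as the slope's t statistic in the regression summary) instead of 29.956. The interval is built with the usual Fisher z method: transform r to z, take z plus or minus a normal critical value times 1/sqrt(n - 3), and transform back. That this is the method behind CIr() is an assumption here, although the result should agree closely with the interval printed above.

```r
# Values taken from the output above
r <- 0.9774826   # unrounded sample correlation from cor()
n <- 39          # number of Pidgey observations

# t test of H0: rho = 0
t_stat <- r * sqrt((n - 2) / (1 - r^2))
p_two_sided <- 2 * pt(t_stat, df = n - 2, lower.tail = FALSE)

# Fisher z transformation: atanh(r) equals 0.5 * log((1 + r) / (1 - r))
z <- atanh(r)
se_z <- 1 / sqrt(n - 3)    # standard error of z
z_crit <- qnorm(0.975)     # critical value for a 95% interval

# Build the interval on the z scale, then back-transform to the r scale
ci_r <- tanh(z + c(-1, 1) * z_crit * se_z)

print(round(t_stat, 2))
print(p_two_sided)
print(round(ci_r, 4))
```

Working with the unrounded r matters because r is close to 1, where small changes in r produce large changes in both the t statistic and the z-score.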
Kain, E 2016, “‘Pokémon Go’ Is The Biggest Mobile Game In US History - And It’s About To Top Snapchat”, Forbes, 13 July, viewed 27 May 2017, https://www.forbes.com/sites/erikkain/2016/07/13/pokemon-go-is-the-biggest-mobile-game-in-us-history-and-its-about-to-top-snapchat/#1f8f864f5d5c.