options(width=100)
knitr::opts_chunk$set(out.width='1000px',dpi=200,message=FALSE,warning=FALSE)
#load packages
library(ggplot2)
library(dplyr)
library(gridExtra)
library(Amelia)
library(corrplot)
library(MASS)
library(caret)
I like videogames and I like numbers, analyzing data and making some nice visualizations. Most of the games I played the last few years were like Borderlands 2, ie shooter-looter where the end goal is actually getting the best loot (gears, weapons) possible. These games have key-points in common :
As the end goal of these games is to reach the level cap as soon as possible, in order to engage in end games activities (raid bosses, raid, rifts,etc …)), the progression to reach these caps is relatively fast.
Also, as a gamer, I always found the progression in the early levels very satisfactory because you actually see your character getting stronger (in stats and visually (more important gears/armors/weapons)) and also it gives the feeling that you are making actual progression. (I’m looking at you the Witcher 3)
So I decided to qualify this feeling by looking at the data and see how it translates in term of level progression vs. time played.
disclaimer :
There are 2 levels to max in Destiny : character level (LVL) and Light level (LIGHT). LVL increases by doing activities (story missions, strikes, patrols) whereas LIGHT is an aggregate of the LIGHT levels of all you weapons/gears.
LVL is a soft level cap and the LIGHT is the hard level cap, meaning you reach the cap quicker, whereas reaching the hard level cap requires more grinding.
The data are retrieved via Bungie’s API (HTTP request along with personal API key) under a JSON format :
id1 ='4611686018428669871' id2 ='2305843009399642403' #get character summary fullName = 'http://www.bungie.net/Platform/Destiny/2/Account/'+(id1)+'/Character/'+ (id2)
There are several features I saved and looked throughout the progression :
destinyData<-read.csv('destiny/destiny_Warlock.csv',header=TRUE,sep=',')
head(destinyData,2)
## X date time LVL LIGHT time_played time_played_session hashId Activity
## 1 1 2016-09-05 15:33:08 1 5 1 1 1846390409 Story
## 2 2 2016-09-05 15:34:02 1 5 1 1 1846390409 Story
## Location Planet Name Vanguard Crucible Dead.orbit New.monarchy Fwc Crypto
## 1 Old Russia Earth A Guardian Rises 0 0 0 0 0 0
## 2 Old Russia Earth A Guardian Rises 0 0 0 0 0 0
## Gunsmith Crota House Queen session
## 1 0 0 0 0 1
## 2 0 0 0 0 1
h1 <- ggplot(data=destinyData,aes(x=time_played,y=LVL)) +
geom_point(aes(color=factor(session)),size=2) +
xlab('time played [min]') + ylab('LVL') +
theme(legend.position="none",axis.text=element_text(size=12),axis.title=element_text(size=14)) +
geom_hline(yintercept = 40,size=1,alpha=.5,color='black')
g1 <- ggplot(data=destinyData,aes(x=time_played,y=LIGHT)) +
geom_point(aes(color=factor(session)),size=3) +
xlab('Time played [min]') + ylab('LIGHT') + ylim(0,350) +
theme(legend.position="none",axis.text=element_text(size=12),axis.title=element_text(size=14)) +
geom_hline(yintercept = 335,size=1,alpha=.5,color='black')
model<-lm(destinyData$LIGHT~ (destinyData$time_played + I(destinyData$time_played^2)))
myfit<-function(x) model$coefficient[3]*x^2 + model$coefficient[2]*x + model$coefficient[1]
g1 <- g1 + stat_function(fun=myfit, colour="black",size=1)
grid.arrange(h1,g1,ncol=2)
Comments :
LIGHT = a + b*TimePlayed + c*(TimePlayed)^{2}
Another useful plot is the LVL progression per session, meaning how the LVL increases each time played.
I defined a (gameplay) session each time I play this character ; most of the time I played for 1-2 hours. Having this quantity helps better to have an understanding of how the levels increased vs time/time_played_in_session.
What is shown here is the LIGHT increase vs the time played for each session, where session is in chronological order. Lines are linear fits.
R
has some packages to plot the residuals (true - fit) and some other diagnostic plots. But we also see that this simple polynomial function achieves a R-squared
relatively good (0.996) so it is enough to use it for predictions, meaning how long do I need to play to reach a certain LVL (note : I do this in the Borderlands 2
section)
summary(model)
##
## Call:
## lm(formula = destinyData$LIGHT ~ (destinyData$time_played + I(destinyData$time_played^2)))
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.6327 -3.4388 -0.6347 2.8556 18.7959
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.788e+00 3.497e-01 16.55 <2e-16 ***
## destinyData$time_played 2.409e-01 7.310e-04 329.58 <2e-16 ***
## I(destinyData$time_played^2) -5.228e-05 3.177e-07 -164.53 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.277 on 2026 degrees of freedom
## Multiple R-squared: 0.996, Adjusted R-squared: 0.996
## F-statistic: 2.525e+05 on 2 and 2026 DF, p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(model)
timeAct<-ggplot(data=destinyData,aes(x=Activity)) + geom_bar(aes(fill=factor(Planet))) + ylab("Time played [min]") + xlab("Activities") + scale_fill_manual(name="Planet",values=c("#3B9AB2", "#78B7C5", "#EBCC2A", "#E1AF00", "#F21A00","#9A8822", "#F5CDB4"))
print(timeAct)
Knowing these type of data is very valuable for the player because it gives insights on how he is playing and where he can improve on his character progression. For Destiny, I was surprised to notice how much I spent in Social Activities ( = Tower, Reef). Clearly I spent a lot of time for gears and weapons management/optimization. It is also reflected in the plots LIGHT vs. total time played where we see small peaks at recurrent times and traduces how I play : session of 1-2 hours of strikes/story missions then I go to the Tower to decrypt engrams, hence the increase in LIGHT level.
ggplot(data=destinyData,aes(x=time_played,y=LVL)) +
geom_point(aes(color=factor(Activity)),size=2) +
xlab('time played [min]') + ylab('LVL') +
theme(axis.text=element_text(size=12),axis.title=element_text(size=14),legend.position='top') +
scale_color_manual(name="Activity",values=c("#3B9AB2", "#78B7C5", "#EBCC2A", "#E1AF00", "#F21A00","#9A8822")) +
geom_hline(yintercept = 40,size=.5,alpha=.5,color='black')
The game has 3 main characteristics, which also are a combination of your gears and weapons levels :
For this study I used the game API to collect the data. However, contrary to Destiny, the API is hosted on Uplay : Ubisoft games portal, which made this analysis less precise because :
As for Destiny :
divisionData<-read.csv('thedivision/theDivision_character.csv',sep=',')
head(divisionData,2)
## X date time playtime maxLevel mainStory maxDZ maxUGRank TotalKills ItemsExtracted
## 1 1 2017-01-01 07:20:02 0 30 100 12 0 0 0
## 2 2 2017-01-01 07:21:01 0 30 100 12 0 0 0
## SkillKills RogueAgentsKilled tempo Session SessionTimePlayedSec TotTimePlayedSec
## 1 0 0 2017-01-01 07:20:02 1 0 0
## 2 0 0 2017-01-01 07:21:01 1 59 59
## SessionTimePlayedMin KillsPerSession PVE TotTimePlayedMin
## 1 0.0000000 0 1 0.0000000
## 2 0.9833333 0 1 0.9833333
g1 <- ggplot(data=divisionData, aes(x=SessionTimePlayedMin, y=SkillKills)) +
geom_line(aes(color=factor(Session))) +
xlab('Time played (min)') + ylab('Total Kills') +
theme(text = element_text(size=8))
#fit of kills per session with intercept at starting kills
g2 <- ggplot(data=divisionData, aes(x=SessionTimePlayedMin, y=TotalKills, color=factor(Session))) +
xlab('Time played (min)') + ylab('Total Kills') +
theme(text = element_text(size=8))
g2 <- g2 + geom_smooth(method='lm') + geom_point(size=1)
g2
Contrary to the character EXP / LVL ,the number of TotalKills is relatively identical throughout the sessions played.
tt<-as.data.frame(divisionData %>%
dplyr::group_by(Session) %>%
dplyr::select(Session, KillsPerSession) %>%
dplyr::summarise(session = mean(Session), maxKills = max(KillsPerSession)))
ggplot(data=tt, aes(x=Session, y=maxKills)) +
geom_bar(stat="identity") +
xlab('Session') +
theme(text = element_text(size=8))
#fit of kills per session with intercept = 0
k1 <- ggplot(data=divisionData, aes(x=SessionTimePlayedMin, y=KillsPerSession, color=factor(Session))) +
xlab('Time played (min)') + ylab('Kills per Session') +
theme(text = element_text(size=8),legend.position="top")
k1 <- k1 + geom_point(size=1) + geom_smooth(formula = y ~ x+ x*x)
#all kills vs all time, but fit independent per session
k2 <- ggplot(data=divisionData, aes(x= TotTimePlayedMin, y=TotalKills, color=factor(Session))) +
xlab('Time played (min)') + ylab('Total Kills') +
theme(text = element_text(size=8),legend.position="top")
k2 <- k2 + geom_point(size=1) + geom_smooth(formula = y ~ x+ x*x)
grid.arrange(k1,k2,ncol=2)
Diablo 3 characters have a lot of features to manage, making this analysis very enjoyable.
Diablo 3 has a soft level cap at 70, then further progression increases the parangon
level. (similar to BL2 BAR)
d3Data<-read.csv('d3/d3-wizard.csv',sep=',')
head(d3Data,2)
## X date time act LVL difficulty strength dexterity intelligence vitality damage
## 1 1 2017-03-19 07:19:15 1 2 hard 9 9 13 11 3
## 2 2 2017-03-19 07:21:04 1 2 hard 9 9 13 11 3
## toughness healing gold Parangon tempo hour min Session SessionTimePlayedSec
## 1 205 0 6001668 87 2017-04-19 07:19:15 7 19 1 0
## 2 228 18 5994237 87 2017-04-19 07:21:04 7 21 1 109
## TotTimePlayedSec SessionTimePlayedMin TotTimePlayedMin
## 1 0 0.000000 0.000000
## 2 109 1.816667 1.816667
The plot below shows the LVL progression
vs. the TimePlayed
(left is for the full time played, right is to exclude the linear part due to LVL cap, for fitting purpose). Linear smooth
is used for fitting.
l1<-ggplot(data=d3Data,aes(x= TotTimePlayedMin,y=LVL)) +
geom_point(aes(color=factor(Session))) +
theme(legend.position="top") +
geom_hline(yintercept = 70, size=.5,alpha=.5,color='black')
l2<-ggplot(data=d3Data,aes(x= TotTimePlayedMin,y=LVL)) +
geom_point(aes(color=factor(Session))) +
geom_smooth(fullrange=FALSE,color='black') +
theme(legend.position="top") + xlim(0,900) +
geom_hline(yintercept = 70, size=.5,alpha=.5,color='black')
grid.arrange(l1,l2,ncol=2)
The plot below shows the comparison with other hardcore attempts (monk_v2
,monk_v3
). The interesting thing here is the difference in level progression : we see that the progression :
monk_v2
to monk_v3
is better : it was due to a weapon that does 100% critical hit damage, hence increasing the XP obtained from enemies.monk
to wizard
: due to a better optimization of the difficulty settings throughout the story playtime. Increasing the difficulty in D3 increases the XP gained.monk2<-read.csv('d3/monk-v2.csv',sep=',')
monk3<-read.csv('d3/monk-v3.csv',sep=',')
levDif = c('hard','expert','master','torment_1')
monk2$difficulty_reorder<- factor(monk2$difficulty, levels = levDif)
monk3$difficulty_reorder<- factor(monk3$difficulty, levels = levDif)
d3Data$difficulty_reorder<- factor(d3Data$difficulty, levels = levDif)
ggplot() +
geom_line(data=monk2,aes(x=TotTimePlayedMin,y=LVL,color="monk_v2"),size=1) +
geom_line(data=monk3,aes(x=TotTimePlayedMin,y=LVL,color="monk_v3"),size=1) +
geom_line(data=d3Data,aes(x=TotTimePlayedMin,y=LVL,color="current_wiz"),size=1) +
scale_color_manual(name="Character",values=c(current_wiz="#46ACC8",monk_v2="#F21A00",monk_v3="#EBCC2A")) +
geom_hline(yintercept = 70, size=.5,alpha=.5,color='black')
The plot below shows the correlation matrices of the different numerical features. It’s useful to know since it gives insights of which features is correlated (as some features are a combination of others) with another. The thinner the ellipse is, the more correlated the 2 features are. A positive correlation (blue in this plot) means the 2 features increases in the same direction.
resNum <- d3Data %>% dplyr::select(difficulty,LVL,strength,dexterity,intelligence,vitality,damage,toughness,healing,gold,TotTimePlayedMin)
corNum <- sapply(resNum,is.numeric)
corData<-cor(resNum[,corNum])
corrPlot<-corrplot(corData,method='ellipse')
I will explain here briefly how I tracked the data in BL2 since there was no API available :
shield
, life
data ; another for current level
progression as well as the level of the curent mission ; the last one is for the money
,eridium
df<-read.csv('./bl2/data_updated.csv')
#convert to date format
df$tempo <- as.POSIXct(paste(df$date,df$time), format="%Y-%M-%d %H:%M:%S")
df$Time <- as.POSIXct(df$time, format="%H:%M:%S")
df$hour<-as.numeric(format(as.POSIXct(df$time,format="%H:%M:%S"),"%H"))
df$min<-as.numeric(format(as.POSIXct(df$time,format="%H:%M:%S"),"%M"))
#calculate the time played in each session
diffSec<-function(mycol){
start = 0
myvec<-c()
myvec[1]=start
for(iter in 2:length(mycol)){
start = start + abs(as.numeric(difftime(mycol[iter],mycol[iter-1],units="sec")))
myvec[iter]=start
}
return(myvec)
}
df$trueLevel<-(df$current / df$maxCurrent) + df$level
#define sessions and save result in a single dataframe
cnt<-0
mylist<-list()
mylist[[1]]<-df %>% filter(date=="2017-03-29")
mylist[[2]]<-df %>% filter(date=="2017-03-30")
mylist[[3]]<-df %>% filter(date=="2017-03-31")
mylist[[4]]<-df %>% filter(date=="2017-04-01" & hour<=12)
mylist[[5]]<-df %>% filter(date=="2017-04-01" & hour >12)
mylist[[6]]<-df %>% filter(date=="2017-04-02" & hour<=12)
mylist[[7]]<-df %>% filter(date=="2017-04-02" & hour>12)
mylist[[8]]<-df %>% filter(date=="2017-04-07" & hour<=12)
mylist[[9]]<-df %>% filter(date=="2017-04-07" & hour>12)
mylist[[10]]<-df %>% filter(date=="2017-04-08" & hour<=12)
mylist[[11]]<-df %>% filter(date=="2017-04-08" & hour>12)
mylist[[12]]<-df %>% filter(date=="2017-04-09" & hour<=12)
mylist[[13]]<-df %>% filter(date=="2017-04-09" & hour>12)
mylist[[14]]<-df %>% filter(date=="2017-04-11")
mylist[[15]]<-df %>% filter(date=="2017-04-12")
mylist[[16]]<-df %>% filter(date=="2017-04-13")
mylist[[17]]<-df %>% filter(date=="2017-04-14")
mylist[[18]]<-df %>% filter(date=="2017-04-15" & hour<=12)
mylist[[19]]<-df %>% filter(date=="2017-04-15" & hour>12)
mylist[[20]]<-df %>% filter(date=="2017-04-16" & hour<=12)
mylist[[21]]<-df %>% filter(date=="2017-04-16" & hour>12)
mylist[[22]]<-df %>% filter(date=="2017-04-17")
mylist[[23]]<-df %>% filter(date=="2017-04-18")
mylist[[24]]<-df %>% filter(date=="2017-04-19")
mylist[[25]]<-df %>% filter(date=="2017-04-20")
mylist[[26]]<-df %>% filter(date=="2017-04-21")
mylist[[27]]<-df %>% filter(date=="2017-04-22" & hour<=12)
mylist[[28]]<-df %>% filter(date=="2017-04-22" & hour>12)
mylist[[29]]<-df %>% filter(date=="2017-04-23")
mylist[[30]]<-df %>% filter(date=="2017-04-25")
mylist[[31]]<-df %>% filter(date=="2017-04-26")
mylist[[32]]<-df %>% filter(date=="2017-04-27")
currentMax<-0
totMax<-0
makeSession<-function(x,num){
session<-rep(num,x[1],length(x))
return(session)
}
#calculate seesion features per dataframe
for(i in 1:length(mylist)){
mylist[[i]]$Session<-makeSession(mylist[[i]]$Time,i)
mylist[[i]]$SessionTimePlayedSec <- diffSec(mylist[[i]]$Time)
mylist[[i]]$TotTimePlayedSec <- mylist[[i]]$SessionTimePlayedSec + totMax
mylist[[i]]$SessionTimePlayedMin <- mylist[[i]]$SessionTimePlayedSec / 60
currentMax<-mylist[[i]]$SessionTimePlayedSec[length(mylist[[i]]$time)]
totMax<-totMax + currentMax
}
#merge all datsers for summary
dataBL2<-do.call(rbind,mylist)
dataBL2$TotTimePlayedMin <- dataBL2$TotTimePlayedSec / 60
RES<-data.frame('lvl'=unique(dataBL2$level)+1, 'exp'= unique(dataBL2$maxCurrent))
makeDiff<-function(x){
res<-c()
res[1]<-x[1]
for(i in 2:length(x)){
res[i]<-x[i] - x[i-1]
}
return(res)
}
RES$toNextLevel<-makeDiff(RES$exp)
ggplot() +
geom_histogram(data=RES,aes(x=lvl,y=toNextLevel),stat='identity',alpha=.5) +
geom_point(data=RES,aes(x=lvl,y=toNextLevel)) +
stat_smooth(data=RES,aes(x=lvl,y=toNextLevel),method = "lm", formula = y ~ poly(x, 2), size = 1,color='#F21A00')
This plot is very interesting because it gives the needed EXP to level up. The fit is a polynomial degree 2 which matches pretty well the data.
Having the current level of the character and the EXP needed to level is invaluable because we can calculate a true
level, meaning a floating point data instead of integer. It gives a better progression curve description.
g<-ggplot(data=dataBL2,aes(x=TotTimePlayedMin,y=trueLevel)) + geom_point(aes(color=factor(Session)),size=4)
g <- g + stat_smooth(method = "lm", formula = y ~ sqrt(x), size = .5 , color='black')
g
The fit is trueLevel
= sqrt(time_played
). More on the validity of the fit in the next section.
One another interesting feature to look at is the difference between the LVL required for a mission and the actual character LVL. What is interesting is the plot below :
character LVL
and mission LVL
; black line is perfect correlation. We see tha I tend to be over-levelup, meaning to be 1 or 2 LVL higher than the one required. This can have an impact on the progression since doing such missions will reward in less EXP points.character LV
L and mission LVL
vs. TimePlayed
; during all this playthrough I’m above the LVL required.f1<-ggplot(data=dataBL2,aes(x=levelMission,y=trueLevel)) + geom_point(aes(color=factor(Session)),size=4) + geom_abline(intercept = 0, slope = 1,color='black') + theme(legend.position='top')
f2<-ggplot() +
geom_line(data=dataBL2,aes(x=TotTimePlayedMin,y=trueLevel,color="character")) +
geom_line(data=dataBL2,aes(x=TotTimePlayedMin,y=levelMission,color="mission_required")) +
scale_color_manual(name="LVL",values=c(character="#EBCC2A",mission_required="#F21A00")) +
ylab('LVL') + theme(legend.position='top')
grid.arrange(f1, f2,ncol=2)
g1<-ggplot(data=dataBL2,aes(x=TotTimePlayedMin,y=trueLevel)) + geom_point(aes(color=factor(Session)),size=1)
g2<-ggplot(data=dataBL2,aes(x=TotTimePlayedMin,y=trueLevel)) + geom_point(aes(color=factor(mode)),size=1)
g3<-ggplot(data=dataBL2,aes(x=TotTimePlayedMin,y=trueLevel)) + geom_point(aes(color=factor(type)),size=1)
grid.arrange(g1,g2,g3,ncol=1)
With this data, we can try different fit to estimate the trueLevel
as a function of time_played
. First I create 2 samples :
train<-data.frame(dataBL2 %>% filter(Session<24))
test<-data.frame(dataBL2 %>% filter(Session>=24))
model<-lm(formula = trueLevel ~ sqrt(TotTimePlayedMin),data=train)
summary(model)
##
## Call:
## lm(formula = trueLevel ~ sqrt(TotTimePlayedMin), data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.77363 -0.93707 -0.07234 0.95870 2.41917
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.419168 0.175064 -8.107 4.26e-14 ***
## sqrt(TotTimePlayedMin) 0.847083 0.005681 149.113 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.046 on 210 degrees of freedom
## Multiple R-squared: 0.9906, Adjusted R-squared: 0.9906
## F-statistic: 2.223e+04 on 1 and 210 DF, p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(model)
Then we can predict the trueLevel
on the test sample :
pred = predict(model, new=test, interval="confidence",level=0.95)
resModel<-data.frame(cbind(pred,true=test$trueLevel))
true_vs_fit<-ggplot(data=resModel,aes(x=true,y=fit)) + geom_point(size=2) + geom_abline(slope=1,color='red') + ggtitle("fitted LVL vs. true values")
true_fit_residuals<-ggplot(data=resModel,aes(x=true - fit)) + geom_histogram(bins=15) + ggtitle("LVL residuals")
grid.arrange(true_vs_fit, true_fit_residuals,ncol=2)
We see that :
LVL
(perfect correlation is symbolized by the red line)test$fit<- resModel$fit
ggplot() +
geom_point(data=train,aes(x=TotTimePlayedMin,y=trueLevel,color="train"),size=3) +
geom_point(data=test,aes(x=TotTimePlayedMin,y=trueLevel,color="test"),size=3) +
geom_line(data=test,aes(x=TotTimePlayedMin,y=fit,color="prediction"),linetype=2,size=1) +
scale_color_manual(name="data sample",values=c(train="#EBCC2A",test="#F21A00",prediction="#46ACC8"))
Comments :
LVL
test
data around the predicted values, the line fit is in good agreement with the continuation of the train
sample.Given the curve of trueLevel
vs. timePlayed
, we could have think to use a polynomial function instead of the sqrt
function.
g<-ggplot(data=dataBL2,aes(x=TotTimePlayedMin,y=trueLevel)) + geom_point(aes(color=factor(Session)),size=4)
g <- g + stat_smooth(method = "lm", formula = y ~ poly(x, 4), size = .5 , color='black')
g
modelPol4<-lm(trueLevel ~ (poly(TotTimePlayedMin,4)),data=train)
#modelPol4<-lm(trueLevel ~ (TotTimePlayedMin + I(TotTimePlayedMin^2) + I(TotTimePlayedMin^3) + I(TotTimePlayedMin^4)),data=train)
summary(modelPol4)
##
## Call:
## lm(formula = trueLevel ~ (poly(TotTimePlayedMin, 4)), data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.54270 -0.36255 0.01055 0.41131 1.28770
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.3882 0.0418 535.64 <2e-16 ***
## poly(TotTimePlayedMin, 4)1 154.8124 0.6086 254.38 <2e-16 ***
## poly(TotTimePlayedMin, 4)2 -19.2013 0.6086 -31.55 <2e-16 ***
## poly(TotTimePlayedMin, 4)3 8.1195 0.6086 13.34 <2e-16 ***
## poly(TotTimePlayedMin, 4)4 -7.5035 0.6086 -12.33 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6086 on 207 degrees of freedom
## Multiple R-squared: 0.9969, Adjusted R-squared: 0.9968
## F-statistic: 1.651e+04 on 4 and 207 DF, p-value: < 2.2e-16
The R-squared
score is a bit better, however it completely fails to predict unknown data (test
sample)
pred2 = predict(modelPol4, new=test, interval="confidence",level=0.95)
resModel2<-data.frame(cbind(pred2,true=test$trueLevel))
test$fit<- resModel2$fit
ggplot() +
geom_point(data=train,aes(x=TotTimePlayedMin,y=trueLevel,color="train"),size=3) +
geom_point(data=test,aes(x=TotTimePlayedMin,y=trueLevel,color="test"),size=3) +
geom_line(data=test,aes(x=TotTimePlayedMin,y=fit,color="prediction"),linetype=2,size=1) +
scale_color_manual(name="data sample",values=c(train="#EBCC2A",test="#F21A00",prediction="#46ACC8"))
anova(model,modelPol4)
History :