Gabriel Peixoto: Multiple Regression Activity

Loading packages

require(readxl)
## Loading required package: readxl
require(readr)
## Loading required package: readr
## Warning: package 'readr' was built under R version 4.2.2
require(skimr)
## Loading required package: skimr
require(lmtest)
## Loading required package: lmtest
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
require(car)
## Loading required package: car
## Loading required package: carData
require(dplyr)
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
## 
##     recode
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Importing and cleaning the data

df <- read_csv("PS4_GamesSales.csv")
## Rows: 1034 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Game, Year, Genre, Publisher
## dbl (5): North America, Europe, Japan, Rest of World, Global
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
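# Remove rows that contain NA values. Year is still a character column here,
# so this call drops no rows yet (1034 read, 209 dropped later, 825 kept).
df <- na.omit(df)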

Inspecting the data

head(df)
## # A tibble: 6 × 9
##   Game                   Year  Genre Publi…¹ North…² Europe Japan Rest …³ Global
##   <chr>                  <chr> <chr> <chr>     <dbl>  <dbl> <dbl>   <dbl>  <dbl>
## 1 Grand Theft Auto V     2014  Acti… Rockst…    6.06   9.71  0.6     3.02   19.4
## 2 Call of Duty: Black O… 2015  Shoo… Activi…    6.18   6.05  0.41    2.44   15.1
## 3 Red Dead Redemption 2  2018  Acti… Rockst…    5.26   6.21  0.21    2.26   13.9
## 4 Call of Duty: WWII     2017  Shoo… Activi…    4.67   6.21  0.4     2.12   13.4
## 5 FIFA 18                2017  Spor… EA Spo…    1.27   8.64  0.15    1.73   11.8
## 6 FIFA 17                2016  Spor… Electr…    1.26   7.95  0.12    1.61   10.9
## # … with abbreviated variable names ¹​Publisher, ²​`North America`,
## #   ³​`Rest of World`
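# Convert Year to numeric; entries that are not numbers become NA
df$Year <- as.numeric(df$Year)
# Drop the rows whose Year could not be converted
df <- na.omit(df)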
str(df)
## tibble [825 × 9] (S3: tbl_df/tbl/data.frame)
##  $ Game         : chr [1:825] "Grand Theft Auto V" "Call of Duty: Black Ops 3" "Red Dead Redemption 2" "Call of Duty: WWII" ...
##  $ Year         : num [1:825] 2014 2015 2018 2017 2017 ...
##  $ Genre        : chr [1:825] "Action" "Shooter" "Action-Adventure" "Shooter" ...
##  $ Publisher    : chr [1:825] "Rockstar Games" "Activision" "Rockstar Games" "Activision" ...
##  $ North America: num [1:825] 6.06 6.18 5.26 4.67 1.27 1.26 4.49 3.64 3.11 2.91 ...
##  $ Europe       : num [1:825] 9.71 6.05 6.21 6.21 8.64 7.95 3.93 3.39 3.83 3.97 ...
##  $ Japan        : num [1:825] 0.6 0.41 0.21 0.4 0.15 0.12 0.21 0.32 0.19 0.27 ...
##  $ Rest of World: num [1:825] 3.02 2.44 2.26 2.12 1.73 1.61 1.7 1.41 1.36 1.34 ...
##  $ Global       : num [1:825] 19.4 15.1 13.9 13.4 11.8 ...
##  - attr(*, "na.action")= 'omit' Named int [1:209] 448 467 631 725 728 729 730 734 735 736 ...
##   ..- attr(*, "names")= chr [1:209] "448" "467" "631" "725" ...
skim(df)
Data summary
Name                    df
Number of rows          825
Number of columns       9
_______________________
Column type frequency:
  character             3
  numeric               6
________________________
Group variables         None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
Game                   0              1    4   78      0       824           0
Genre                  0              1    3   16      0        17           0
Publisher              0              1    2   38      0       152           0

Variable type: numeric

skim_variable  n_missing  complete_rate     mean    sd    p0      p25      p50      p75     p100  hist
Year                   0              1  2015.97  1.30  2013  2015.00  2016.00  2017.00  2020.00  ▂▃▇▁▁
North America          0              1     0.26  0.62     0     0.00     0.05     0.19     6.18  ▇▁▁▁▁
Europe                 0              1     0.31  0.87     0     0.00     0.02     0.22     9.71  ▇▁▁▁▁
Japan                  0              1     0.04  0.12     0     0.00     0.00     0.04     2.17  ▇▁▁▁▁
Rest of World          0              1     0.11  0.27     0     0.00     0.02     0.09     3.02  ▇▁▁▁▁
Global                 0              1     0.72  1.74     0     0.03     0.12     0.56    19.39  ▇▁▁▁▁
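
The Year conversion above relies on as.numeric() turning non-numeric strings into NA, which na.omit() then drops. A minimal sketch on a made-up data frame (the object `toy` and its values are illustrative, not taken from PS4_GamesSales.csv):

```r
# Toy data: one Year entry is not a number (hypothetical values)
toy <- data.frame(
  Game = c("A", "B", "C"),
  Year = c("2014", "N/A", "2016"),
  stringsAsFactors = FALSE
)

# as.numeric() returns NA for strings it cannot parse (and warns about it)
toy$Year <- suppressWarnings(as.numeric(toy$Year))

# na.omit() drops every row that contains at least one NA
toy_clean <- na.omit(toy)

nrow(toy)        # rows before cleaning
nrow(toy_clean)  # rows after cleaning
```

In the real data the same two steps take the table from 1034 rows to 825, with the 209 dropped rows recorded in the na.action attribute shown by str(df).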

Fitting the model

# Regress Japan sales on genre, publisher, and sales in the other regions
model1 <- lm(Japan ~ Genre + Publisher + `North America` + Europe + `Rest of World`, data = df)
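
Since Genre and Publisher are character columns, lm() treats them as factors and expands each into one dummy column per level beyond the first. With 17 genres and 152 publishers in the skim() summary, that would give 16 + 151 + 3 = 170 slope terms plus an intercept, provided no column is dropped as collinear. A small sketch with made-up data shows the expansion (the `toy` object and its values are hypothetical):

```r
# Hypothetical toy data with one categorical and one numeric predictor
toy <- data.frame(
  y     = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
  genre = c("Action", "Shooter", "Sports", "Action", "Shooter", "Sports"),
  x     = c(0.5, 1.0, 1.5, 2.0, 2.5, 3.0)
)

# model.matrix() builds the design matrix that lm() uses internally
X <- model.matrix(y ~ genre + x, data = toy)

colnames(X)  # intercept, two genre dummies (first level is the baseline), x
ncol(X)      # 1 + (3 - 1) + 1 columns
```

A model with this many dummy columns relative to 825 rows leaves few observations per publisher, which is worth keeping in mind when reading the diagnostics.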

Plotting and interpreting the model diagnostics

# Arrange the four diagnostic plots in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(model1)
## Warning: not plotting observations with leverage one:
##   71, 116, 155, 163, 204, 225, 257, 294, 342, 367, 378, 385, 427, 468, 499, 501, 528, 549, 560, 567, 581, 596, 601, 614, 629, 641, 642, 649, 655, 672, 679, 687, 708, 709, 711, 714, 715, 718, 721, 724, 725, 730, 744, 746, 752, 757, 759, 760, 764, 765, 768, 771, 776, 778, 785, 790, 793, 796, 797, 823, 824, 825
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

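In the first plot (residuals vs. fitted), the residuals appear to follow a roughly linear pattern; in the second (normal Q-Q), they look approximately normally distributed; and in the third (scale-location), their variance looks homogeneous. These are visual readings only, and the Shapiro-Wilk test below does not support the normality impression.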
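
The "leverage one" warning printed by plot(model1) is what one would expect when a factor level occurs in a single row: its dummy column lets the model fit that row exactly, so the hat value is 1 and the standardized residual is undefined. With 152 publishers in 825 rows this is a plausible cause here, though the output does not confirm it. A sketch on made-up data (the `toy` object is hypothetical):

```r
# Hypothetical toy data: publisher "C" appears in exactly one row
toy <- data.frame(
  y         = c(1.2, 0.8, 2.1, 1.9, 5.0),
  publisher = c("A", "A", "B", "B", "C"),
  x         = c(0.1, 0.4, 0.3, 0.9, 0.5)
)

fit <- lm(y ~ publisher + x, data = toy)

# Hat values (leverages); the single "C" row has leverage 1
h <- hatvalues(fit)
round(h, 3)

# Levels that occur only once are the candidates for leverage-one rows
table(toy$publisher)
```

Running table(df$Publisher) on the real data would show how many publishers appear only once.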

Checking the assumptions with separate tests

### Checking normality with the Shapiro-Wilk test
shapiro.test(model1$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  model1$residuals
## W = 0.37162, p-value < 2.2e-16
# The Shapiro-Wilk test rejects normality of the residuals (p < 2.2e-16)

hist(model1$residuals)
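
For context on how the Shapiro-Wilk test reacts to skewed data, here is a sketch on simulated samples. The samples are generated, not taken from model1, and the seed is arbitrary:

```r
set.seed(123)  # arbitrary seed, only for reproducibility

# Two simulated samples of the same size
symmetric <- rnorm(200)  # drawn from a normal distribution
skewed    <- rexp(200)   # strongly right-skewed

# W close to 1 is compatible with normality; a small W and p-value are not
res_sym  <- shapiro.test(symmetric)
res_skew <- shapiro.test(skewed)

res_sym$statistic
res_skew$statistic
res_skew$p.value
```

The W = 0.37 reported for model1 is far from 1, which suggests residuals with a few very large values rather than a mild departure from normality.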

### Checking independence of the residuals

durbinWatsonTest(model1)
##  lag Autocorrelation D-W Statistic p-value
##    1      0.01790156      1.963336   0.386
##  Alternative hypothesis: rho != 0
# The Durbin-Watson test (p = 0.386) gives no evidence of first-order
# autocorrelation, so the residuals can be treated as independent
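
The Durbin-Watson statistic is computed from consecutive residuals and is approximately 2(1 - r), where r is the lag-1 autocorrelation; with r = 0.0179 this gives about 1.964, in line with the reported 1.963. A base-R sketch of the calculation on simulated data (the helper name `dw_stat` is hypothetical, not part of car):

```r
# Hypothetical helper: Durbin-Watson statistic from a residual vector
dw_stat <- function(e) {
  sum(diff(e)^2) / sum(e^2)
}

set.seed(1)  # arbitrary seed
x <- seq_len(100)
y <- 2 + 0.5 * x + rnorm(100)  # independent errors by construction
e <- residuals(lm(y ~ x))

# Lag-1 autocorrelation of the residuals
r1 <- sum(e[-1] * e[-length(e)]) / sum(e^2)

dw_stat(e)    # near 2 when residuals are uncorrelated
2 * (1 - r1)  # the usual approximation to the same quantity
```

The test looks at residuals in row order, so its meaning depends on how the rows of df are sorted.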

### Checking homoscedasticity
bptest(model1)
## 
##  studentized Breusch-Pagan test
## 
## data:  model1
## BP = 41.6, df = 170, p-value = 1
# The Breusch-Pagan test (p = 1) does not reject homoscedasticity, so the
# residual variance can be treated as homogeneous
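
The studentized Breusch-Pagan statistic can be reproduced by hand: regress the squared residuals on the model's predictors and multiply that regression's R-squared by the number of observations; the result is compared with a chi-squared distribution whose degrees of freedom equal the number of slope terms. The df = 170 above matches 16 genre dummies + 151 publisher dummies + 3 numeric predictors. A base-R sketch on simulated data (all variable names are illustrative):

```r
set.seed(42)  # arbitrary seed
n <- 200
x <- runif(n, 1, 10)
y <- 1 + 2 * x + rnorm(n, sd = x)  # error spread grows with x (heteroscedastic)

fit <- lm(y ~ x)

# Auxiliary regression of squared residuals on the same predictor
aux <- lm(residuals(fit)^2 ~ x)

# Studentized (Koenker) Breusch-Pagan statistic: n times the auxiliary R^2
bp_stat <- n * summary(aux)$r.squared
bp_df   <- length(coef(aux)) - 1  # number of slope terms
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

c(BP = bp_stat, df = bp_df, p.value = bp_p)
```

On the same data, lmtest::bptest(fit) with its default studentize = TRUE should return the same statistic.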