Gabriel Peixoto: Multiple Regression Activity

Loading packages

require(readxl)

## Loading required package: readxl

require(readr)

## Loading required package: readr

## Warning: package 'readr' was built under R version 4.2.2

require(skimr)

## Loading required package: skimr

require(lmtest)

## Loading required package: lmtest

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

require(car)

## Loading required package: car

## Loading required package: carData

require(dplyr)

## Loading required package: dplyr

## 
## Attaching package: 'dplyr'

## The following object is masked from 'package:car':
## 
##     recode

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Importing and cleaning the data

df <- read_csv("PS4_GamesSales.csv")

## Rows: 1034 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Game, Year, Genre, Publisher
## dbl (5): North America, Europe, Japan, Rest of World, Global
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Remove rows with missing values
df <- na.omit(df)

Inspecting the data

head(df)

## # A tibble: 6 × 9
##   Game                   Year  Genre Publi…¹ North…² Europe Japan Rest …³ Global
##   <chr>                  <chr> <chr> <chr>     <dbl>  <dbl> <dbl>   <dbl>  <dbl>
## 1 Grand Theft Auto V     2014  Acti… Rockst…    6.06   9.71  0.6     3.02   19.4
## 2 Call of Duty: Black O… 2015  Shoo… Activi…    6.18   6.05  0.41    2.44   15.1
## 3 Red Dead Redemption 2  2018  Acti… Rockst…    5.26   6.21  0.21    2.26   13.9
## 4 Call of Duty: WWII     2017  Shoo… Activi…    4.67   6.21  0.4     2.12   13.4
## 5 FIFA 18                2017  Spor… EA Spo…    1.27   8.64  0.15    1.73   11.8
## 6 FIFA 17                2016  Spor… Electr…    1.26   7.95  0.12    1.61   10.9
## # … with abbreviated variable names ¹Publisher, ²`North America`,
## #   ³`Rest of World`

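# Year was read as character; convert it to numeric. Entries that are not
# valid numbers become NA and are dropped by the second na.omit().
df$Year <- as.numeric(df$Year)
df <- na.omit(df)
str(df)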

## tibble [825 × 9] (S3: tbl_df/tbl/data.frame)
##  $ Game         : chr [1:825] "Grand Theft Auto V" "Call of Duty: Black Ops 3" "Red Dead Redemption 2" "Call of Duty: WWII" ...
##  $ Year         : num [1:825] 2014 2015 2018 2017 2017 ...
##  $ Genre        : chr [1:825] "Action" "Shooter" "Action-Adventure" "Shooter" ...
##  $ Publisher    : chr [1:825] "Rockstar Games" "Activision" "Rockstar Games" "Activision" ...
##  $ North America: num [1:825] 6.06 6.18 5.26 4.67 1.27 1.26 4.49 3.64 3.11 2.91 ...
##  $ Europe       : num [1:825] 9.71 6.05 6.21 6.21 8.64 7.95 3.93 3.39 3.83 3.97 ...
##  $ Japan        : num [1:825] 0.6 0.41 0.21 0.4 0.15 0.12 0.21 0.32 0.19 0.27 ...
##  $ Rest of World: num [1:825] 3.02 2.44 2.26 2.12 1.73 1.61 1.7 1.41 1.36 1.34 ...
##  $ Global       : num [1:825] 19.4 15.1 13.9 13.4 11.8 ...
##  - attr(*, "na.action")= 'omit' Named int [1:209] 448 467 631 725 728 729 730 734 735 736 ...
##   ..- attr(*, "names")= chr [1:209] "448" "467" "631" "725" ...
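The drop from 1034 rows to 825 comes from the Year conversion: any entry that cannot be parsed as a number becomes NA, and na.omit() then removes that row (209 rows, as recorded in the "na.action" attribute). A minimal base R sketch on made-up data, assuming the non-numeric entries look like "N/A" (the actual placeholder used in the file is not shown in this report):

```r
# Toy data, not taken from PS4_GamesSales.csv; "N/A" is an assumed placeholder.
toy <- data.frame(
  Game = c("A", "B", "C", "D"),
  Year = c("2014", "N/A", "2017", "N/A"),
  stringsAsFactors = FALSE
)

# as.numeric() turns strings it cannot parse into NA (and warns about it).
toy$Year <- suppressWarnings(as.numeric(toy$Year))
n_missing <- sum(is.na(toy$Year))

# na.omit() drops every row with at least one NA and records the dropped
# row indices in the "na.action" attribute of the result.
toy_clean <- na.omit(toy)
n_dropped <- length(attr(toy_clean, "na.action"))

print(c(missing = n_missing, dropped = n_dropped, kept = nrow(toy_clean)))
```

Checking the number of NA values right after the conversion makes it explicit how many rows the following na.omit() is going to remove.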

skim(df)

Data summary
Name	df
Number of rows	825
Number of columns	9
_______________________
Column type frequency:
character	3
numeric	6
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
Game	1	4	78	824
Genre	1	3	16	17
Publisher	1	2	38	152

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
Year	1	2015.97	1.30	2013	2015.00	2016.00	2017.00	2020.00	▂▃▇▁▁
North America	1	0.26	0.62	0	0.00	0.05	0.19	6.18	▇▁▁▁▁
Europe	1	0.31	0.87	0	0.00	0.02	0.22	9.71	▇▁▁▁▁
Japan	1	0.04	0.12	0	0.00	0.00	0.04	2.17	▇▁▁▁▁
Rest of World	1	0.11	0.27	0	0.00	0.02	0.09	3.02	▇▁▁▁▁
Global	1	0.72	1.74	0	0.03	0.12	0.56	19.39	▇▁▁▁▁

Fitting the model

model1 <- lm(Japan ~ Genre + Publisher + `North America` + Europe + `Rest of World`, data = df)
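Because Genre and Publisher are character columns, lm() expands each of them into one indicator column per level, minus a reference level. With 17 genres and 152 publishers (see the skim summary), that gives 16 + 151 indicators plus the 3 numeric regional predictors, or 170 slope coefficients, which agrees with the df reported by bptest() later in this report. A small sketch of the expansion on made-up data (every name in `toy` is hypothetical):

```r
# Toy data standing in for the real table; values are illustrative only.
toy <- data.frame(
  sales  = c(0.6, 0.4, 0.2, 0.4, 0.1, 0.3),
  genre  = c("Action", "Shooter", "Sports", "Action", "Shooter", "Sports"),
  europe = c(9.7, 6.1, 8.6, 6.2, 3.9, 3.4)
)

# model.matrix() shows the design matrix that lm() would build:
# an intercept, one indicator per non-reference genre, and the numeric column.
X <- model.matrix(sales ~ genre + europe, data = toy)
print(colnames(X))

# Number of slope coefficients in the toy design (everything but the intercept).
n_slopes <- ncol(X) - 1

# The same count for the model in this report, from the skim summary.
expected_slopes <- (17 - 1) + (152 - 1) + 3
print(expected_slopes)
```

With this many publisher indicators relative to 825 rows, many coefficients rest on very few observations, which is worth keeping in mind when reading the diagnostics.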

Visualizing and interpreting the model

par(mfrow = c(2, 2))
plot(model1)

## Warning: not plotting observations with leverage one:
##   71, 116, 155, 163, 204, 225, 257, 294, 342, 367, 378, 385, 427, 468, 499, 501, 528, 549, 560, 567, 581, 596, 601, 614, 629, 641, 642, 649, 655, 672, 679, 687, 708, 709, 711, 714, 715, 718, 721, 724, 725, 730, 744, 746, 752, 757, 759, 760, 764, 765, 768, 771, 776, 778, 785, 790, 793, 796, 797, 823, 824, 825

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
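The "leverage one" warning is what one would expect when a factor level contains a single observation: the indicator for that level fits the observation exactly, so its hat value is 1 and its residual is 0. A likely cause here, though not verified against the data, is publishers that appear only once. A sketch on made-up data:

```r
# Toy data: publisher "C" has a single game, so its indicator fits it exactly.
toy <- data.frame(
  y   = c(1.0, 1.2, 0.8, 2.0, 2.3, 5.0),
  pub = c("A", "A", "A", "B", "B", "C")
)

fit <- lm(y ~ pub, data = toy)

# Hat values (leverages); with one factor they equal 1 / (group size).
h <- hatvalues(fit)
print(round(h, 3))

# Observations whose leverage is numerically equal to one.
leverage_one <- which(h > 1 - 1e-8)
print(leverage_one)
```

Such observations carry no information about residual behaviour, which is why plot() leaves them out of the diagnostic panels.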

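In the first plot (residuals vs. fitted), the data appear to follow a linear pattern. The second (normal Q-Q) suggests that the residuals are approximately normally distributed, and the third (scale-location) suggests homogeneous variance. These are visual readings only; each assumption is tested separately below, and the Shapiro-Wilk test there does not support the normality reading.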

Checking each assumption with a separate test

### Checking normality with the Shapiro-Wilk test
shapiro.test(model1$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  model1$residuals
## W = 0.37162, p-value < 2.2e-16

# The Shapiro-Wilk test shows that the residuals are not normally distributed (p-value < 2.2e-16)

hist(model1$residuals)
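To get a feel for how the statistic behaves, the sketch below runs shapiro.test() on two simulated samples (not the game sales data): one drawn from a normal distribution and one from a strongly right-skewed exponential distribution. A W close to 1 is consistent with normality; the W of 0.37 reported for model1 is far from that.

```r
set.seed(123)

# Simulated samples for illustration only.
normal_sample <- rnorm(200)   # drawn from a normal distribution
skewed_sample <- rexp(200)    # strongly right-skewed

sw_normal <- shapiro.test(normal_sample)
sw_skewed <- shapiro.test(skewed_sample)

# W statistics side by side, then the p-value for the skewed sample.
print(c(W_normal = unname(sw_normal$statistic),
        W_skewed = unname(sw_skewed$statistic)))
print(sw_skewed$p.value)
```

With large samples the test rejects even for mild departures from normality, so it is usually read together with the histogram and the Q-Q plot rather than on its own.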

### Checking independence of the residuals

durbinWatsonTest(model1)

##  lag Autocorrelation D-W Statistic p-value
##    1      0.01790156      1.963336   0.386
##  Alternative hypothesis: rho != 0

# The Durbin-Watson test shows no evidence of autocorrelation (p-value = 0.386), so the residuals can be treated as independent
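The Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by their sum of squares, and it is approximately 2(1 - r), where r is the lag-1 autocorrelation. With r = 0.0179 from the output above, 2(1 - 0.0179) is about 1.964, in line with the reported 1.963. Values near 2 indicate no first-order autocorrelation. A hand-rolled sketch (dw_statistic is a hypothetical helper, not a function from car):

```r
# Durbin-Watson statistic computed directly from a residual vector.
dw_statistic <- function(e) {
  sum(diff(e)^2) / sum(e^2)
}

# Toy residuals that flip sign at every step (strong negative autocorrelation),
# which pushes the statistic above 2.
alternating <- c(1, -1, 1, -1)
dw_alt <- dw_statistic(alternating)
print(dw_alt)

# Approximation 2 * (1 - r) using the lag-1 autocorrelation reported above.
dw_approx <- 2 * (1 - 0.01790156)
print(round(dw_approx, 3))
```

The statistic depends on the order of the rows, so it is most meaningful when that order reflects some natural sequence.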

### Checking homoscedasticity
bptest(model1)

## 
##  studentized Breusch-Pagan test
## 
## data:  model1
## BP = 41.6, df = 170, p-value = 1

# The Breusch-Pagan test does not reject homoscedasticity (p-value = 1), so there is
# no evidence that the residual variance is heterogeneous
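For reference, the studentized (Koenker) form of the Breusch-Pagan statistic can be reproduced by regressing the squared residuals on the predictors and multiplying the R² of that regression by the sample size. Under homoscedasticity it is compared against a chi-squared distribution with as many degrees of freedom as there are slope coefficients (170 for model1). A sketch on simulated data, which should agree with lmtest::bptest() on the same fit:

```r
set.seed(42)

# Simulated homoscedastic data for illustration; not the game sales data.
n <- 100
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)
fit <- lm(y ~ x)

# Auxiliary regression: squared residuals on the same predictor(s).
aux <- lm(residuals(fit)^2 ~ x)

# Studentized Breusch-Pagan statistic: n times the auxiliary R-squared.
bp_stat <- n * summary(aux)$r.squared

# One slope coefficient in this toy model, hence df = 1.
p_value <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

print(c(BP = bp_stat, p_value = p_value))
```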