WOAD (World Offshore Accident Database) provides access to accident data for a wide range of offshore facility types. Experts at DNV GL have curated it since 1975, recording accident causes, locations, social and economic impacts, and more, which makes it valuable for a variety of risk management initiatives.
Offshore accident data for oil and gas facilities
- Access to the world’s most comprehensive available offshore accident data
- Information on over 6000 offshore accidents and incidents from 1970 to date
- Technical data on approximately 3700 offshore units
The World Offshore Accident Database is a tool used extensively by key players in the offshore industry worldwide, including rig owners, drilling operators, insurance companies, consultants, salvage companies and regulatory authorities.
Continuously updated offshore accident data
Knowledge of past accidents serves as important input to risk assessment initiatives and helps you ensure that known hazards are properly understood and effectively mitigated. The World Offshore Accident Database contains a vast range of information covering the different aspects of an accident: technical, economic, social, operational and so on. The range of useful applications of information from the World Offshore Accident Database is extensive. It shows the rate of accidents by unit type, geography, function, accident type and more, which is extremely useful for quantitative risk analysis (QRA).
The goal of this stage is to clean the data in preparation for organizing the relevant units of analysis.
The goal of this stage is to understand the data variables, both individually (univariate) and in their relationships with other variables (bivariate and multivariate). These analyses are used to establish a reference context that seeks to answer the following questions:
The goal of this stage is to identify the factors that shape the reference context.
The goal of this analysis is to apply regression and predictive models to establish possible scenarios.
The goal of this stage is to build a system capable of offering alternative courses of action based on the established scenarios.
library(rattle)
library(caret)
library(rpart)
library(rpart.plot)
library(corrplot)
library(randomForest)
library(RColorBrewer)
library(dplyr)
library(plotly)
library(ggplot2)
library(tidyr)
library(magrittr)
library(plotrix)
library(rgl)
library(lubridate)
library(ggplot2)
library(GGally)
library(corrplot)
library(corrgram)
library(ppcor)
library(readr)
library(ggvis)
library(gganimate)
library(gifski)
library(av)
library(magick)
library(viridis)
library(hrbrthemes)
# Diagonal panel for pairs(): histogram scaled to a maximum height of 1
panel.hist <- function(x, ...)
{
usr <- par("usr"); on.exit(par(usr))
par(usr = c(usr[1:2], 0, 1.5) )
h <- hist(x, plot = FALSE)
breaks <- h$breaks; nB <- length(breaks)
y <- h$counts; y <- y/max(y)
rect(breaks[-nB], 0, breaks[-1], y, col = "cyan", ...)
}
# 1.2 by Melina de Souza Leite
panel.lm <- function (x, y, col = par("col"), bg = NA, pch = par("pch"),
cex = 1, col.line="red") {
points(x, y, pch = pch, col = col, bg = bg, cex = cex)
ok <- is.finite(x) & is.finite(y)
if (any(ok)) {
abline(lm(y[ok]~x[ok]), col = col.line)
}
}
# 1.3 help(pairs) by Melina de Souza Leite
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
{
usr <- par("usr"); on.exit(par(usr))
par(usr = c(0, 1, 0, 1))
r <- abs(cor(x, y))
txt <- format(c(r, 0.123456789), digits = digits)[1]
txt <- paste0(prefix, txt)
if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
text(0.5, 0.5, txt, cex = cex.cor * r)
}
# Transform vector into a data frame with frequency of levels and proportion
.Unianalysis = function (x) {
y <- as.data.frame(as.table(table(x)))
y <- mutate(y, percentual = prop.table(y$Freq) *100)#Proportion
y <- arrange(y, desc(y$Freq))
return(y)
}
.BarPlot <- function (z) {
z %>%
ggplot( aes(x=x, y=Freq)) +
geom_bar(stat="identity", fill="#f68060", alpha=.6, width=.4) +
coord_flip() +
xlab("") +
theme_bw()
}
Cleaning steps (removals and adjustments):
- Normalized the data labels.
- Converted the file to CSV.
- Removed rows 1 and 2.
- Removed rows 3736 to 3738.
- Replaced empty cells in the “Human Cause” column with “not identified”.
- Removed row 418 because its data was incomplete.
- Treated the 3987 records that had no event cost data as 0, so that the existing values could be summed.
The human cost variable (HumanCost in the code) is the sum of all crew and third-party fatalities and injuries, with weight 10 for fatalities and weight 3 for injuries.
The chart below shows occurrences by type of unit in relation to human cost and degree of damage.
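As a small worked illustration of that weighting, the sketch below computes the human cost for three made-up records. The data frame `toy` and its values are hypothetical; only the column names and the weights (10 per fatality, 3 per injury) come from this report.

```r
# Hypothetical toy records; column names follow the dataset used in this report
toy <- data.frame(
  FatalitiesCrew      = c(0, 2, 1),
  Fatalities3rd_Party = c(0, 0, 1),
  InjuriesCrew        = c(1, 3, 0),
  Injuries3rd_Party   = c(0, 1, 0)
)

# Weight 10 per fatality and weight 3 per injury, crew and third parties alike
toy$HumanCost <- 10 * (toy$FatalitiesCrew + toy$Fatalities3rd_Party) +
  3 * (toy$InjuriesCrew + toy$Injuries3rd_Party)

print(toy$HumanCost)
```

A record with a single injury and no fatalities scores 3, which matches the minimum non-zero human cost reported in the summaries further down.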
setwd("~/Regression Models")
library(readr)
Cleaning_completo_woad <- read_delim("Cleaning_completo_woad.CSV",
";", escape_double = FALSE, col_types = cols(
DamageCost_million = col_number(),
DrillDepth_km = col_number(),
WaterDepth_m = col_number(),
WindSpeed_m_s= col_number(),
FatalitiesCrew = col_number(),
Fatalities3rd_Party = col_number(),
InjuriesCrew = col_number(),
Injuries3rd_Party = col_number(),
Accident_Date = col_date(format = "%m/%d/%Y")),
trim_ws = TRUE)
# Derive the weighted human cost and the year and month of each accident
trainVariable <- Cleaning_completo_woad %>%
mutate(HumanCost = (FatalitiesCrew * 10) + (Fatalities3rd_Party * 10) + (InjuriesCrew * 3) + (Injuries3rd_Party * 3)) %>%
mutate(Year = year(Accident_Date)) %>%
mutate(Month = month(Accident_Date))
by_Event_Function <-
trainVariable %>%
group_by(MainEvent, Function,TypeofUnit, Damage,SpillType, AccidentCategory,Year, Month) %>%
summarise(How_Many = n(),
Owner = n_distinct(Owner),
FatalitiesCrew = sum(FatalitiesCrew),
Fatalities3rd_Party = sum(Fatalities3rd_Party),
InjuriesCrew = sum(InjuriesCrew),
Injuries3rd_Party = sum(Injuries3rd_Party),
DamageCost_million= sum(DamageCost_million),
DrillDepth_km = mean(DrillDepth_km),
WaterDepth_m = mean(WaterDepth_m),
WindSpeed_m_s= sum(WindSpeed_m_s),
HumanCost = sum(HumanCost)) %>%
mutate(Mean_HumanCost = (FatalitiesCrew + Fatalities3rd_Party + InjuriesCrew + Injuries3rd_Party)/How_Many) %>%
arrange(desc(How_Many))
write.csv2(by_Event_Function, file = "by_Event_Function.CSV")
write.csv2(trainVariable, file = "trainVariable.CSV")
damage <- cbind(by_Event_Function$Damage,
by_Event_Function$How_Many,
by_Event_Function$HumanCost,
by_Event_Function$Mean_HumanCost,
by_Event_Function$DamageCost_million)
colnames(damage) <- cbind("Grau de danos","Ocorrências", "Total Custo Humano", "Média Custo Humano", "Danos em milões")
damage <- as.data.frame(na.omit(damage))
OverView <- apply(Cleaning_completo_woad, 2, .Unianalysis)
OverView2 <- sapply(OverView, distinct)
by_Year <- trainVariable %>%
group_by(Year) %>%
summarise(Freq = n(),
Owner = n_distinct(Owner),
FatalitiesCrew = sum(FatalitiesCrew),
Fatalities3rd_Party = sum(Fatalities3rd_Party),
InjuriesCrew = sum(InjuriesCrew),
Injuries3rd_Party = sum(Injuries3rd_Party),
DamageCost_million= sum(DamageCost_million),
HumanCost = sum(HumanCost),
WindSpeed_m_s= mean(WindSpeed_m_s))
by_HumanCost <- trainVariable %>%
filter(HumanCost > 0) %>%
group_by(HumanCost, AccidentCategory, Damage) %>%
summarise(Freq = n(),
Owner = n_distinct(Owner),
FatalitiesCrew = sum(FatalitiesCrew),
Fatalities3rd_Party = sum(Fatalities3rd_Party),
InjuriesCrew = sum(InjuriesCrew),
Injuries3rd_Party = sum(Injuries3rd_Party),
DamageCost_million= sum(DamageCost_million),
WindSpeed_m_s= mean(WindSpeed_m_s)) %>%
filter(Freq > 20)
dim(by_HumanCost)
## [1] 3 11
trainVariable_N <- trainVariable %>%
dplyr::filter(HumanCost > 0)
summary(trainVariable_N$HumanCost)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.00 3.00 10.00 32.89 27.00 1850.00
## [1] 872 43
Although the records with a human cost of 0 and those with a human cost above the third quartile were removed from the sample, the histogram still shows that the data are not normally distributed, which can distort the correlation between variables.
A sample with values closer to the mean was therefore selected. The cutoff found for human cost was 250, which caused a loss of records for evaluation because of the irregularity of the data. This makes the correlation analysis unfeasible, since it calls for at least 40 records.
trainVariable_N <- trainVariable %>%
dplyr::filter(HumanCost > 250)
summary(trainVariable_N$HumanCost)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 260.0 363.0 720.0 735.7 875.0 1850.0
## [1] 11 43
The p-value was 0.09387. Because it is above 0.05, the test does not reject normality for this sample, that is, the data do not differ significantly from a normal curve. A p-value < 0.05 would indicate that the data are not normally distributed.
##
## Shapiro-Wilk normality test
##
## data: trainVariable_N$HumanCost
## W = 0.87649, p-value = 0.09387
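The output above has the shape produced by `shapiro.test()` from base R's stats package. The sketch below shows the call and the 5% decision rule on simulated data; the vector `skewed` is a made-up right-skewed sample, not the WOAD records.

```r
# Simulated right-skewed sample standing in for a cost variable (not the WOAD data)
set.seed(42)
skewed <- rexp(200, rate = 1 / 30)

# Shapiro-Wilk test: the null hypothesis is that the sample comes from a normal distribution
res <- shapiro.test(skewed)
print(res)

# Decision rule at the 5% level
if (res$p.value < 0.05) {
  message("Normality rejected at the 5% level")
} else {
  message("No evidence against normality at the 5% level")
}
```

With very small samples, such as the 11 records kept here, the test has little power, so a p-value above 0.05 is weak evidence of normality rather than proof of it.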
A qualitative study based on these accident records is therefore suggested. The records are described below.
However, even though the human cost variable is not normally distributed, some analyses will still be carried out, provided that the limitations of the results are kept in mind.
Wind speed varies across accidents with greater damage.
## Damage WindSpeed_m_s
## 1 Insignif/no damage 24.54341
## 2 Minor damage 136.63356
## 3 Severe damage 251.19942
## 4 Significant damage 208.44087
## 5 Total loss 143.50564
## Damage WindSpeed_m_s
## 1 Insignif/no damage 0.9644248
## 2 Minor damage 3.8438178
## 3 Severe damage 6.0176391
## 4 Significant damage 6.6126021
## 5 Total loss 5.2136752
## Damage WindSpeed_m_s.Min. WindSpeed_m_s.1st Qu.
## 1 Insignif/no damage 0.0000000 0.0000000
## 2 Minor damage 0.0000000 0.0000000
## 3 Severe damage 0.0000000 0.0000000
## 4 Significant damage 0.0000000 0.0000000
## 5 Total loss 0.0000000 0.0000000
## WindSpeed_m_s.Median WindSpeed_m_s.Mean WindSpeed_m_s.3rd Qu.
## 1 0.0000000 0.9644248 0.0000000
## 2 0.0000000 3.8438178 0.0000000
## 3 0.0000000 6.0176391 0.0000000
## 4 0.0000000 6.6126021 0.0000000
## 5 0.0000000 5.2136752 0.0000000
## WindSpeed_m_s.Max.
## 1 60.0000000
## 2 83.0000000
## 3 62.0000000
## 4 83.0000000
## 5 82.0000000
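Tables with this layout are what `aggregate()` prints when a statistic is computed per group. The sketch below is one way they could be produced, assuming the first table is a per-group variance; the data frame `toy` is hypothetical and only the column names `Damage` and `WindSpeed_m_s` come from this report.

```r
# Hypothetical toy data; column names mirror the report's dataset
toy <- data.frame(
  Damage = c("Minor damage", "Minor damage", "Total loss", "Total loss", "Total loss"),
  WindSpeed_m_s = c(0, 10, 0, 20, 40)
)

# One statistic per damage level: variance, mean, and the six-number summary
wind_var     <- aggregate(WindSpeed_m_s ~ Damage, data = toy, FUN = var)
wind_mean    <- aggregate(WindSpeed_m_s ~ Damage, data = toy, FUN = mean)
wind_summary <- aggregate(WindSpeed_m_s ~ Damage, data = toy, FUN = summary)

print(wind_var)
print(wind_mean)
print(wind_summary)
```

In the real output the median and both quartiles are 0 for every damage level, which indicates that most records carry a wind speed of 0, so the group means and variances are driven by a small share of the records.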
Analysis of variance makes it possible to check, in a single analysis, whether the observed sample differences are real (caused by significant differences in the observed populations) or due to chance (arising from mere sampling variability).
The Scale-Location plot shows how the points are spread across the range of predicted values. The variation should be roughly equal across the whole range of the predictor; in our case the variation differs considerably across the range.
fit <- aov(HumanCost ~ WindSpeed_m_s + WaveHeight_m + DrillDepth_km + SpillAmount_m3 + as.factor(TypeofUnit) + as.factor(Function) + as.factor(MainEvent), data = trainVariable)
summary(fit)
## Df Sum Sq Mean Sq F value Pr(>F)
## WindSpeed_m_s 1 10482 10482 3.827 0.0507 .
## WaveHeight_m 1 66806 66806 24.393 9.00e-07 ***
## DrillDepth_km 1 0 0 0.000 0.9987
## SpillAmount_m3 1 33 33 0.012 0.9129
## as.factor(TypeofUnit) 19 211884 11152 4.072 1.12e-08 ***
## as.factor(Function) 17 23642 1391 0.508 0.9504
## as.factor(MainEvent) 20 180652 9033 3.298 1.32e-06 ***
## Residuals 1166 3193339 2739
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 5506 observations deleted due to missingness
## Call:
## aov(formula = fit)
##
## Terms:
## WindSpeed_m_s WaveHeight_m DrillDepth_km SpillAmount_m3
## Sum of Squares 10482 66806 0 33
## Deg. of Freedom 1 1 1 1
## as.factor(TypeofUnit) as.factor(Function) as.factor(MainEvent)
## Sum of Squares 211884 23642 180652
## Deg. of Freedom 19 17 20
## Residuals
## Sum of Squares 3193339
## Deg. of Freedom 1166
##
## Residual standard error: 52.33271
## 2 out of 63 effects not estimable
## Estimated effects may be unbalanced
## 5506 observations deleted due to missingness
## Warning: not plotting observations with leverage one:
## 100, 154, 193
## Warning: not plotting observations with leverage one:
## 100, 154, 193
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produzidos
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produzidos
The goal of the univariate analysis is to understand each data unit in the database under analysis, in order to establish a reference context. This was done through the key questions described in the next sections.
Main event
DT::datatable(OverView2$MainEvent, colnames = c('Variável', 'Freq.','Percentual'), filter = c("top"))
Accident category
DT::datatable(OverView2$AccidentCategory, colnames = c('Variável', 'Freq.','Percentual'), filter = c("top"))
Type of damage
Human cause
DT::datatable(OverView2$HumanCause, colnames = c('Variável', 'Freq.','Percentual'), filter = c("top"))
Equipment cause
DT::datatable(OverView2$EquipmentCause, colnames = c('Variável', 'Freq.','Percentual'), filter = c("top"))
Type of unit
DT::datatable(OverView2$TypeofUnit, colnames = c('Variável', 'Freq.','Percentual'), filter = c("top"))
Functions
DT::datatable(OverView2$Function, colnames = c('Variável', 'Freq.','Percentual'), filter = c("top"))
Main operation
DT::datatable(OverView2$MainOperation, colnames = c('Variável', 'Freq.','Percentual'), filter = c("top"))
Management model
DT::datatable(OverView2$Class_Society, colnames = c('Variável', 'Freq.','Percentual'), filter = c("top"))
Crew fatalities
DT::datatable(OverView2$FatalitiesCrew, colnames = c('Variável', 'Freq.','Percentual'), filter = c("top"))
Third-party fatalities
DT::datatable(OverView2$Fatalities3rd_Party, colnames = c('Variável', 'Freq.','Percentual'), filter = c("top"))
Crew injuries
# Uses the injuries column; the original call repeated FatalitiesCrew here
DT::datatable(OverView2$InjuriesCrew, colnames = c('Variável', 'Freq.','Percentual'), filter = c("top"))
Third-party injuries
# Uses the injuries column; the original call repeated Fatalities3rd_Party here
DT::datatable(OverView2$Injuries3rd_Party, colnames = c('Variável', 'Freq.','Percentual'), filter = c("top"))
The goal of this analysis is to identify possible correlations between pairs of units of analysis and so check whether the variables may be related.
trainVariable %>%
ggplot( aes(x=AccidentCategory, y=MainEvent, fill=AccidentCategory)) +
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.6) +
geom_jitter(color="black", size=0.4, alpha=0.9) +
theme_ipsum() +
theme(
legend.position="none",
plot.title = element_text(size=11)
) +
ggtitle("Principais eventos por tipo de gravidade") +
xlab("")
trainVariable %>%
ggplot( aes(x=Damage, y=MainEvent, fill=Damage)) +
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.6) +
geom_jitter(color="black", size=0.4, alpha=0.9) +
theme_ipsum() +
theme(
legend.position="none",
plot.title = element_text(size=11)
) +
ggtitle("Principais eventos por tipo de danos") + theme(axis.text.x = element_text(angle = 35, vjust = 0.5, hjust=1)) +
xlab("")
trainVariable %>%
ggplot( aes(x=Damage, y=AccidentCategory, fill=Damage)) +
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.6) +
geom_jitter(color="black", size=0.4, alpha=0.9) +
theme_ipsum() +
theme(
legend.position="none",
plot.title = element_text(size=11)
) +
ggtitle("Categorias de acidentes por tipo de danos") + theme(axis.text.x = element_text(angle = 35, vjust = 0.5, hjust=1)) +
xlab("")
trainVariable %>%
ggplot( aes(x=AccidentCategory, y=Function, fill=AccidentCategory)) +
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.6) +
geom_jitter(color="black", size=0.4, alpha=0.9) +
theme_ipsum() +
theme(
legend.position="none",
plot.title = element_text(size=11)
) +
ggtitle("Ocorrências de eventos por Funções e gravidade") +
xlab("")
trainVariable %>%
ggplot( aes(x=Damage, y=Function, fill=Damage)) +
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.6) +
geom_jitter(color="black", size=0.4, alpha=0.9) +
theme_ipsum() +
theme(
legend.position="none",
plot.title = element_text(size=11)
) +
ggtitle("Ocorrências de eventos por Funções e danos") + theme(axis.text.x = element_text(angle = 35, vjust = 0.5, hjust=1)) +
xlab("")
trainVariable %>%
ggplot( aes(x=AccidentCategory, y=TypeofUnit, fill=TypeofUnit)) +
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.6) +
geom_jitter(color="black", size=0.4, alpha=0.9) +
theme_ipsum() +
theme(
legend.position="none",
plot.title = element_text(size=11)
) +
ggtitle("Ocorrências por tipo de unidade e gravidade") +
xlab("")
trainVariable %>%
ggplot( aes(x=Damage, y=TypeofUnit, fill=Damage)) +
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.6) +
geom_jitter(color="black", size=0.4, alpha=0.9) +
theme_ipsum() +
theme(
legend.position ="none",
plot.title = element_text(size=11)
) +
ggtitle("Ocorrências por tipo de unidade e danos") + theme(axis.text.x = element_text(angle = 35, vjust = 0.5, hjust=1)) +
xlab("")
trainVariable %>%
ggplot( aes(x=Year, y=AccidentCategory, fill=AccidentCategory)) +
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.6) +
geom_jitter(color="black", size=0.4, alpha=0.9) +
theme_ipsum() +
theme(
legend.position="none",
plot.title = element_text(size=11)
) +
ggtitle("Ocorrências por gravidade e ano") +
xlab("")
trainVariable %>%
ggplot( aes(x=Year, y=Damage, fill=Damage)) +
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.6) +
geom_jitter(color="black", size=0.4, alpha=0.9) +
theme_ipsum() +
theme(
legend.position ="none",
plot.title = element_text(size=11)
) +
ggtitle("Ocorrências por danos e ano") + theme(axis.text.x = element_text( vjust = 0.5, hjust=1)) +
xlab("")
A simple analysis of variance shows that there are variations that should be investigated.
## AccidentCategory WindSpeed_m_s
## 1 Accident 177.34067
## 2 Incidnt/haz.sit 54.13867
## 3 Near miss 29.61625
## 4 Unsignificant 24.06966
##
## Bartlett test of homogeneity of variances
##
## data: WindSpeed_m_s by AccidentCategory
## Bartlett's K-squared = 1665.2, df = 3, p-value < 2.2e-16
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.000 0.000 2.857 0.000 83.000 15
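The Bartlett output above can be obtained with `bartlett.test()` from base R's stats package. The sketch below runs it on simulated data with two groups of clearly different spread; the data frame `toy` is hypothetical and only the column names come from this report.

```r
# Simulated data: two categories with clearly different spread (not the WOAD data)
set.seed(1)
toy <- data.frame(
  AccidentCategory = rep(c("Accident", "Near miss"), each = 50),
  WindSpeed_m_s = c(rnorm(50, mean = 5, sd = 12), rnorm(50, mean = 2, sd = 3))
)

# Per-group variance, then Bartlett's test of equal variances across groups
print(aggregate(WindSpeed_m_s ~ AccidentCategory, data = toy, FUN = var))
bt <- bartlett.test(WindSpeed_m_s ~ AccidentCategory, data = toy)
print(bt)
```

Bartlett's test is sensitive to departures from normality. For a variable dominated by zeros, as wind speed is here, a rank-based alternative such as `fligner.test()` (also in stats) may be a more cautious check.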
The goal of the multivariate analysis is to identify the possible influences of the various factors that permeate the context under analysis. A multivariate study provides input for diagnosis and for building predictive and prescriptive models.
By year
## Warning: Removed 15 rows containing missing values (geom_point).
By month
qplot(log(HumanCost), as.factor(Month), data = by_Event_Function, color = Damage, geom = c("point", "smooth"), method = "lm")
## Warning: Ignoring unknown parameters: method
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 4972 rows containing non-finite values (stat_smooth).
qplot(DamageCost_million, as.factor(Month), data = trainVariable, color = Damage, geom = c("point", "smooth"), method = "lm")
## Warning: Ignoring unknown parameters: method
## `geom_smooth()` using formula 'y ~ x'
As the chart below shows, there are no visible correlations.
by_Year2 <- trainVariable_N %>%
group_by(Year) %>%
summarise(Freq = n(),
Owner = n_distinct(Owner),
FatalitiesCrew = sum(FatalitiesCrew),
Fatalities3rd_Party = sum(Fatalities3rd_Party),
InjuriesCrew = sum(InjuriesCrew),
Injuries3rd_Party = sum(Injuries3rd_Party),
DamageCost_million= sum(DamageCost_million),
HumanCost = sum(HumanCost),
WindSpeed_m_s= mean(WindSpeed_m_s))
summary(by_Year[, 2:10])
## Freq Owner FatalitiesCrew Fatalities3rd_Party
## Min. : 2.00 Min. : 1.00 Min. : 0.00 Min. : 0.000
## 1st Qu.: 76.25 1st Qu.: 3.50 1st Qu.: 0.00 1st Qu.: 0.000
## Median :136.50 Median :40.50 Median : 15.00 Median : 3.000
## Mean :146.37 Mean :39.07 Mean : 27.96 Mean : 4.283
## 3rd Qu.:188.50 3rd Qu.:62.00 3rd Qu.: 39.25 3rd Qu.: 6.000
## Max. :484.00 Max. :93.00 Max. :180.00 Max. :27.000
## InjuriesCrew Injuries3rd_Party DamageCost_million HumanCost
## Min. : 0.00 Min. : 0.000 Min. : 0.00 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.0
## Median : 18.00 Median : 1.500 Median : 30.02 Median : 274.5
## Mean : 21.67 Mean : 6.109 Mean : 94.90 Mean : 405.7
## 3rd Qu.: 30.75 3rd Qu.: 7.000 3rd Qu.:107.43 3rd Qu.: 585.0
## Max. :108.00 Max. :90.000 Max. :903.34 Max. :2103.0
## WindSpeed_m_s
## Min. : 0.000
## 1st Qu.: 0.000
## Median : 1.048
## Mean : 2.335
## 3rd Qu.: 3.770
## Max. :20.584
pairs(by_Year[,3:7],
diag.panel = panel.hist,
upper.panel = panel.cor,
lower.panel = panel.lm,
main="Correlação Multivariável")
pairs(by_Year[, 3:10],
diag.panel = panel.hist,
upper.panel = panel.cor,
lower.panel = panel.lm,
main="Correlação Multivariável")
The chart below shows a significant number of accidents and incidents in August and September across the five decades of records. It is important to determine whether this is due to outliers, to the particularities of some country, or to some other infrequent factor (mere chance), or whether some relevant factor is at work in those months (significance).
qplot(log(How_Many), as.factor(Month), data = by_Event_Function, facets = . ~ AccidentCategory, color = Damage, geom = c("point", "smooth"), method = "lm")
## Warning: Ignoring unknown parameters: method
## `geom_smooth()` using formula 'y ~ x'
## Warning in qt((1 - level)/2, df): NaNs produzidos
## Warning in qt((1 - level)/2, df): NaNs produzidos
## Warning in max(ids, na.rm = TRUE): nenhum argumento não faltante para max;
## retornando -Inf
## Warning in max(ids, na.rm = TRUE): nenhum argumento não faltante para max;
## retornando -Inf
The chart below keeps only the events classified as Accidents and plots damage cost in millions against human cost, looking at the correlation for each type of damage. It shows a positive relationship for total losses, severe damage and significant damage, and a negative one for insignificant damage.
# Keep only the rows whose category is "Accident"; the original indexing did not filter any rows
qplot(HumanCost, DamageCost_million, data = dplyr::filter(trainVariable, AccidentCategory == "Accident"), color = Damage, geom = c("point", "smooth"), method = "lm")
## Warning: Ignoring unknown parameters: method
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 15 rows containing non-finite values (stat_smooth).
## Warning: Removed 15 rows containing missing values (geom_point).
The chart shows a positive correlation with human cost for Jackup, Jacket, semi-submersible and helicopter units in occurrences with total loss.
# Most basic bubble plot
by_Event_Function %>%
ggplot(aes(x= HumanCost, y= DamageCost_million, size= Mean_HumanCost, color=Damage)) +
geom_point(alpha=0.5) +
scale_size(range = c(.1, 20), name="Mean_HumanCost")
par(mfrow=c(1,3))
# Per Damage
ggplot(by_Event_Function, aes(fill=Damage, y=Function, x=How_Many)) +
geom_bar(position="fill", stat="identity") + ggtitle("Eventos por Função e danos")
ggplot(by_Event_Function, aes(fill=Damage, y=TypeofUnit, x=How_Many)) +
geom_bar(position="fill", stat="identity") + ggtitle("Eventos por Unidade e danos")
ggplot(by_Event_Function, aes(fill=Damage, y=MainEvent, x=How_Many)) +
geom_bar(position="fill", stat="identity") + ggtitle("Tipo de Eventos por danos")
# Per Accident Category
ggplot(by_Event_Function, aes(fill=AccidentCategory, y=Function, x=How_Many)) +
geom_bar(position="fill", stat="identity") + ggtitle("Eventos por Função e tipo")
ggplot(by_Event_Function, aes(fill=AccidentCategory, y=TypeofUnit, x=How_Many)) +
geom_bar(position="fill", stat="identity") + ggtitle("Eventos por Unidade e tipo")
ggplot(trainVariable, aes(fill=AccidentCategory, y=MainEvent, x=HumanCost)) +
geom_bar(position="fill", stat="identity") + ggtitle("Eventos por Categoria e tipo")
## Warning: Removed 15 rows containing missing values (position_stack).
## Warning: Removed 49 rows containing missing values (geom_bar).
Exploratory analyses aim to represent the context of a domain of analysis, which is why they tend to be “quick and dirty”. They are “quick” because we look for as much information about the context as possible, using charts that guide us through univariate, bivariate and multilevel views of the data, and “dirty” because this ends up producing a lot of noisy information.
Still, it is a strategy that aims to summarize the data (usually graphically) and highlight any broad features.
The occurrence data appear to have undergone changes in nomenclature over the five decades analyzed. For example, until 1982 there was no record of the term “Near Miss”, which suggests that the term came into use from 1983 onward, probably for some reason related to the categorization of events that made these occurrences easier to manage. A literature review on this point would be worthwhile. Below is a compilation of the data collected over almost five decades (46 years) in 91 countries, with the participation of 559
Regarding the profile of the events:
About 57% of all reported occurrences are due to equipment malfunction (32.92%) and weather problems (24.43%).
About 52% of all occurrences have as their main event a falling load / dropped object (20.41%), a release of fluid or gas (17.88%) or fire (13.84%).
As expected, about 85% of all reported occurrences take place in the Drilling (34.96%), Production (29.39%), and Drilling and production (20.72%) functions 4 Ar de Transferência
There is a moderate correlation (0.43) between financial cost and human cost.
There is a moderate correlation (0.49) between wind speed in meters per second and the occurrence of crew injuries.
(add more findings)
Explore basic questions and hypotheses (and perhaps rule some out)
Suggest modeling strategies for the “next step”
Only wind speed, measured in meters per second, is significant in the model. As the plots show, this model does not fit the data well.
fit1 <- glm(formula = HumanCost^2 ~ WindSpeed_m_s + WaterDepth_m + DrillDepth_km -1, family = "poisson", data = trainVariable)
## Warning: glm.fit: algorithm did not converge
##
## Call:
## glm(formula = HumanCost^2 ~ WindSpeed_m_s + WaterDepth_m + DrillDepth_km -
## 1, family = "poisson", data = trainVariable)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -20225.5 -5.9 -2.7 -1.9 5108.9
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## WindSpeed_m_s 1.375e-01 1.437e-05 9568 <2e-16 ***
## WaterDepth_m 1.274e-02 5.016e-06 2540 <2e-16 ***
## DrillDepth_km 7.013e-02 4.868e-06 14405 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 81106021 on 945 degrees of freedom
## Residual deviance: 457449413 on 942 degrees of freedom
## (5788 observations deleted due to missingness)
## AIC: 457450277
##
## Number of Fisher Scoring iterations: 25
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produzidos
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produzidos
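In the summary above the residual deviance is far larger than its degrees of freedom, which is a usual sign of overdispersion in a Poisson model. The sketch below shows one common way to check for it and to refit with a quasi-Poisson family. It uses simulated counts, not the WOAD data, and every name in it is hypothetical.

```r
# Simulated overdispersed counts (negative binomial), not the WOAD data
set.seed(3)
n <- 500
x <- runif(n, 0, 10)
y <- rnbinom(n, mu = exp(0.5 + 0.1 * x), size = 0.3)

# Plain Poisson fit and a rough dispersion check: values well above 1 suggest overdispersion
fit_pois <- glm(y ~ x, family = poisson)
dispersion <- deviance(fit_pois) / df.residual(fit_pois)
print(dispersion)

# Quasi-Poisson keeps the same coefficients but estimates the dispersion,
# which widens the standard errors accordingly
fit_quasi <- glm(y ~ x, family = quasipoisson)
print(summary(fit_quasi)$dispersion)
```

When the dispersion is large, the very small Poisson standard errors and p-values should be read with caution.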
library(rpart)
library(rpart.plot)
arvore <- rpart(HumanCost ~ WindSpeed_m_s + WaterDepth_m + DrillDepth_km, method = "poisson", data = trainVariable)
summary(trainVariable$HumanCost)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 0.00 0.00 4.27 0.00 1850.00 15
## Call:
## rpart(formula = HumanCost ~ WindSpeed_m_s + WaterDepth_m + DrillDepth_km,
## data = trainVariable, method = "poisson")
## n=6718 (15 observations deleted due to missingness)
##
## CP nsplit rel error xerror xstd
## 1 0.05798458 0 1.0000000 1.0041095 0.1475681
## 2 0.02273193 1 0.9420154 0.9805919 0.1366798
## 3 0.02045564 2 0.9192835 0.9941205 0.1391663
## 4 0.01060825 5 0.8579166 1.0167146 0.1440153
## 5 0.01000000 6 0.8473083 1.0564948 0.1507965
##
## Variable importance
## WindSpeed_m_s WaterDepth_m DrillDepth_km
## 48 39 13
##
## Node number 1: 6718 observations, complexity param=0.05798458
## events=28684, estimated rate=4.269723 , mean deviance=27.79048
## left son=2 (6054 obs) right son=3 (664 obs)
## Primary splits:
## WindSpeed_m_s < 7.5 to the left, improve=10825.520, (0 missing)
## WaterDepth_m < 144.5 to the right, improve= 3620.149, (3200 missing)
## DrillDepth_km < 3.65 to the right, improve= 1204.526, (5216 missing)
##
## Node number 2: 6054 observations
## events=19422, estimated rate=3.208168 , mean deviance=17.72852
##
## Node number 3: 664 observations, complexity param=0.02273193
## events=9262, estimated rate=13.94538 , mean deviance=103.2266
## left son=6 (192 obs) right son=7 (472 obs)
## Primary splits:
## WaterDepth_m < 26.5 to the left, improve=5472.268, (262 missing)
## DrillDepth_km < 3.1 to the left, improve=4224.825, (336 missing)
## WindSpeed_m_s < 52.5 to the right, improve=3155.478, (0 missing)
## Surrogate splits:
## WindSpeed_m_s < 55.5 to the right, agree=0.624, adj=0.09, (262 split)
##
## Node number 6: 192 observations
## events=294, estimated rate=1.534586 , mean deviance=11.08198
##
## Node number 7: 472 observations, complexity param=0.02045564
## events=8968, estimated rate=18.99269 , mean deviance=131.7177
## left son=14 (458 obs) right son=15 (14 obs)
## Primary splits:
## DrillDepth_km < 3.1 to the left, improve=3824.745, (256 missing)
## WaterDepth_m < 144.5 to the right, improve=3539.002, (236 missing)
## WindSpeed_m_s < 8.5 to the right, improve=2220.595, (0 missing)
##
## Node number 14: 458 observations, complexity param=0.02045564
## events=7248, estimated rate=15.81942 , mean deviance=112.8826
## left son=28 (429 obs) right son=29 (29 obs)
## Primary splits:
## WindSpeed_m_s < 8.5 to the right, improve=2830.1260, (0 missing)
## WaterDepth_m < 144.5 to the right, improve=2541.8540, (235 missing)
## DrillDepth_km < 0.77 to the right, improve= 179.6027, (256 missing)
##
## Node number 15: 14 observations
## events=1720, estimated rate=120.9059 , mean deviance=478.5848
##
## Node number 28: 429 observations, complexity param=0.01060825
## events=5353, estimated rate=12.47338 , mean deviance=85.37741
## left son=56 (267 obs) right son=57 (162 obs)
## Primary splits:
## WaterDepth_m < 98 to the right, improve=3590.4010, (221 missing)
## WindSpeed_m_s < 12.5 to the left, improve= 956.1647, (0 missing)
## DrillDepth_km < 0.77 to the right, improve= 140.2840, (241 missing)
## Surrogate splits:
## WindSpeed_m_s < 30.5 to the left, agree=0.582, adj=0.074, (221 split)
##
## Node number 29: 29 observations, complexity param=0.02045564
## events=1895, estimated rate=64.85553 , mean deviance=422.1828
## left son=58 (22 obs) right son=59 (7 obs)
## Primary splits:
## WaterDepth_m < 132.5 to the left, improve=2819.918, (14 missing)
##
## Node number 56: 267 observations
## events=1722, estimated rate=6.447528 , mean deviance=40.89064
##
## Node number 57: 162 observations
## events=3631, estimated rate=22.38739 , mean deviance=146.4727
##
## Node number 58: 22 observations
## events=45, estimated rate=2.068884 , mean deviance=8.404232
##
## Node number 59: 7 observations
## events=1850, estimated rate=255.8677 , mean deviance=1028.826
##
## Call:
## lm(formula = DamageCost_million^2 ~ HumanCost^2, data = trainVariable)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7931 -65 -65 -65 211535
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 65.313 46.462 1.406 0.16
## HumanCost 8.643 1.210 7.141 1.02e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3785 on 6716 degrees of freedom
## (15 observations deleted due to missingness)
## Multiple R-squared: 0.007536, Adjusted R-squared: 0.007388
## F-statistic: 50.99 on 1 and 6716 DF, p-value: 1.025e-12
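The call above has `HumanCost^2` on the right-hand side of the formula, yet the coefficient table lists a plain `HumanCost` term. On the right-hand side of an R formula `^` is a formula operator, so `x^2` collapses to `x`, and the squared value is only used as a regressor when it is wrapped in `I()`. The left-hand side, by contrast, is evaluated as ordinary arithmetic. The sketch below shows the difference on simulated data; the data frame `toy` and the model names are hypothetical.

```r
# Simulated data with a genuinely quadratic relationship (not the WOAD data)
set.seed(7)
toy <- data.frame(x = runif(100, 0, 10))
toy$y <- 2 + 0.5 * toy$x^2 + rnorm(100)

# Inside a formula, x^2 is formula algebra and collapses to x
m_plain <- lm(y ~ x^2, data = toy)

# I() protects the arithmetic, so the regressor is the squared value
m_squared <- lm(y ~ I(x^2), data = toy)

print(names(coef(m_plain)))
print(names(coef(m_squared)))
```

If the intent was to regress on squared human cost, the term would need to be written as `I(HumanCost^2)`.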
trainVariable$predito <- predict(ajuste, newdata = trainVariable)
trainVariable %>% dplyr::select(HumanCost, DamageCost_million, predito) %>% head() %>% knitr::kable()
| HumanCost | DamageCost_million | predito |
|---|---|---|
| 0 | 0.5 | 65.31251 |
| 0 | 0.0 | 65.31251 |
| 0 | 0.2 | 65.31251 |
| 0 | 0.0 | 65.31251 |
| 0 | 0.0 | 65.31251 |
| 37 | 3.5 | 385.11135 |
Applying Poisson regression with the variables
fit2 <- glm(formula = HumanCost ~ WindSpeed_m_s + WaterDepth_m + DrillDepth_km + as.factor(AccidentCategory) + as.factor(Damage) + as.factor(MainEvent) - 1, family = "poisson", data = trainVariable)
summary(fit2)
##
## Call:
## glm(formula = HumanCost ~ WindSpeed_m_s + WaterDepth_m + DrillDepth_km +
## as.factor(AccidentCategory) + as.factor(Damage) + as.factor(MainEvent) -
## 1, family = "poisson", data = trainVariable)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -19.224 -1.234 -0.090 0.000 65.463
##
## Coefficients:
## Estimate Std. Error
## WindSpeed_m_s 2.234e-02 9.337e-04
## WaterDepth_m 5.409e-03 1.719e-04
## DrillDepth_km 5.642e-03 1.067e-03
## as.factor(AccidentCategory)Accident -1.433e+01 7.118e+02
## as.factor(AccidentCategory)Incidnt/haz.sit -1.770e+01 7.118e+02
## as.factor(AccidentCategory)Near miss -3.521e+01 1.121e+03
## as.factor(AccidentCategory)Unsignificant -3.360e+01 7.927e+02
## as.factor(Damage)Minor damage -8.415e-01 1.308e-01
## as.factor(Damage)Severe damage 7.826e-01 1.248e-01
## as.factor(Damage)Significant damage 7.824e-02 1.301e-01
## as.factor(Damage)Total loss 2.921e+00 1.221e-01
## as.factor(MainEvent)Blowout 1.471e+01 7.118e+02
## as.factor(MainEvent)Breakage or fatigue 1.216e+01 7.118e+02
## as.factor(MainEvent)Capsizing,overturn,toppling 1.565e+01 7.118e+02
## as.factor(MainEvent)Collision,not offshore units 1.296e+01 7.118e+02
## as.factor(MainEvent)Collision,offshore units 1.468e+01 7.118e+02
## as.factor(MainEvent)Crane accident 1.682e+00 1.130e+03
## as.factor(MainEvent)Explosion 1.829e+01 7.118e+02
## as.factor(MainEvent)Falling load / Dropped object 1.581e+01 7.118e+02
## as.factor(MainEvent)Fire 1.473e+01 7.118e+02
## as.factor(MainEvent)Grounding 1.400e+01 7.118e+02
## as.factor(MainEvent)Helicopter accident 1.782e+01 7.118e+02
## as.factor(MainEvent)Leakage into hull -4.400e+00 1.071e+03
## as.factor(MainEvent)List, uncontrolled inclination -4.068e+00 9.217e+02
## as.factor(MainEvent)Loss of buoyancy or sinking 1.261e+01 7.118e+02
## as.factor(MainEvent)Machinery/propulsion failure 2.435e-01 5.762e+03
## as.factor(MainEvent)Other 1.516e+01 7.118e+02
## as.factor(MainEvent)Release of fluid or gas 5.354e-01 7.185e+02
## as.factor(MainEvent)Towline failure/rupture 1.755e+01 7.118e+02
## as.factor(MainEvent)Well problem, no blowout -1.795e+00 9.921e+02
## z value Pr(>|z|)
## WindSpeed_m_s 23.923 < 2e-16 ***
## WaterDepth_m 31.466 < 2e-16 ***
## DrillDepth_km 5.288 1.24e-07 ***
## as.factor(AccidentCategory)Accident -0.020 0.984
## as.factor(AccidentCategory)Incidnt/haz.sit -0.025 0.980
## as.factor(AccidentCategory)Near miss -0.031 0.975
## as.factor(AccidentCategory)Unsignificant -0.042 0.966
## as.factor(Damage)Minor damage -6.432 1.26e-10 ***
## as.factor(Damage)Severe damage 6.269 3.64e-10 ***
## as.factor(Damage)Significant damage 0.602 0.547
## as.factor(Damage)Total loss 23.924 < 2e-16 ***
## as.factor(MainEvent)Blowout 0.021 0.984
## as.factor(MainEvent)Breakage or fatigue 0.017 0.986
## as.factor(MainEvent)Capsizing,overturn,toppling 0.022 0.982
## as.factor(MainEvent)Collision,not offshore units 0.018 0.985
## as.factor(MainEvent)Collision,offshore units 0.021 0.984
## as.factor(MainEvent)Crane accident 0.001 0.999
## as.factor(MainEvent)Explosion 0.026 0.980
## as.factor(MainEvent)Falling load / Dropped object 0.022 0.982
## as.factor(MainEvent)Fire 0.021 0.983
## as.factor(MainEvent)Grounding 0.020 0.984
## as.factor(MainEvent)Helicopter accident 0.025 0.980
## as.factor(MainEvent)Leakage into hull -0.004 0.997
## as.factor(MainEvent)List, uncontrolled inclination -0.004 0.996
## as.factor(MainEvent)Loss of buoyancy or sinking 0.018 0.986
## as.factor(MainEvent)Machinery/propulsion failure 0.000 1.000
## as.factor(MainEvent)Other 0.021 0.983
## as.factor(MainEvent)Release of fluid or gas 0.001 0.999
## as.factor(MainEvent)Towline failure/rupture 0.025 0.980
## as.factor(MainEvent)Well problem, no blowout -0.002 0.999
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 48062 on 725 degrees of freedom
## Residual deviance: 16912 on 695 degrees of freedom
## (6008 observations deleted due to missingness)
## AIC: 17324
##
## Number of Fisher Scoring iterations: 16
## Warning: not plotting observations with leverage one:
## 318
## Warning: not plotting observations with leverage one:
## 318
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
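
The Poisson fit above reports a residual deviance of 16912 on 695 degrees of freedom, a ratio of about 24, which usually points to overdispersion. One common check is the Pearson dispersion statistic, and one possible remedy is a quasi-Poisson fit, which keeps the same coefficients but rescales the standard errors. The sketch below uses simulated counts, not the WOAD data, and every name in it is an assumption made for illustration:

```r
# Simulated overdispersed counts (negative binomial), illustrative only
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rnbinom(n, mu = exp(0.5 + 0.8 * x), size = 0.5)

# Plain Poisson fit, which assumes variance equal to the mean
fit_pois <- glm(y ~ x, family = poisson)

# Pearson dispersion statistic: values well above 1 suggest overdispersion
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
print(dispersion)

# Quasi-Poisson: same point estimates, standard errors scaled by sqrt(dispersion)
fit_quasi <- glm(y ~ x, family = quasipoisson)
se_pois <- coef(summary(fit_pois))[, "Std. Error"]
se_quasi <- coef(summary(fit_quasi))[, "Std. Error"]
print(cbind(se_pois, se_quasi))
```

The very large standard errors on several factor levels in the summary above (around 7e+02) are a separate issue that often arises when some levels contain only zero counts; the sketch does not address that.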
library(rpart)
library(rpart.plot)
arvore_V_E <- rpart(HumanCost ~ WindSpeed_m_s + WaterDepth_m + DrillDepth_km + as.factor(AccidentCategory), data = trainVariable)
# Note: the second argument of print() is matched to `digits`, so the summary itself is not displayed
print("These are the percentiles of the human cost", summary(trainVariable$HumanCost))
## [1] "These are the percentiles of the human cost"
## Call:
## rpart(formula = HumanCost ~ WindSpeed_m_s + WaterDepth_m + DrillDepth_km +
## as.factor(AccidentCategory), data = trainVariable)
## n=6718 (15 observations deleted due to missingness)
##
## CP nsplit rel error xerror xstd
## 1 0.02661014 0 1.0000000 1.000305 0.4070489
## 2 0.02367356 4 0.8881897 1.090430 0.4010282
## 3 0.01817880 5 0.8645161 1.097191 0.4009100
## 4 0.01000000 6 0.8463373 1.134120 0.4022135
##
## Variable importance
## WaterDepth_m WindSpeed_m_s
## 66 22
## as.factor(AccidentCategory)
## 12
##
## Node number 1: 6718 observations, complexity param=0.02661014
## mean=4.269723, MSE=1455.293
## left son=2 (4119 obs) right son=3 (2599 obs)
## Primary splits:
## as.factor(AccidentCategory) splits as RLLL, improve=0.0177621900, (0 missing)
## WindSpeed_m_s < 7.5 to the left, improve=0.0070606170, (0 missing)
## WaterDepth_m < 144.5 to the right, improve=0.0010778260, (3200 missing)
## DrillDepth_km < 3.65 to the right, improve=0.0006550697, (5216 missing)
## Surrogate splits:
## WindSpeed_m_s < 0.5 to the left, agree=0.638, adj=0.063, (0 split)
##
## Node number 2: 4119 observations
## mean=0.2311241, MSE=11.24496
##
## Node number 3: 2599 observations, complexity param=0.02661014
## mean=10.67026, MSE=3677.063
## left son=6 (2190 obs) right son=7 (409 obs)
## Primary splits:
## WindSpeed_m_s < 7.5 to the left, improve=0.007185765, (0 missing)
## WaterDepth_m < 66.5 to the left, improve=0.003849976, (1488 missing)
## DrillDepth_km < 3.15 to the left, improve=0.002623320, (1734 missing)
##
## Node number 6: 2190 observations
## mean=8.448858, MSE=694.4793
##
## Node number 7: 409 observations, complexity param=0.02661014
## mean=22.56479, MSE=19479.47
## left son=14 (361 obs) right son=15 (48 obs)
## Primary splits:
## WaterDepth_m < 64.5 to the left, improve=0.06229407, (181 missing)
## DrillDepth_km < 3.05 to the left, improve=0.02729433, (168 missing)
## WindSpeed_m_s < 8.5 to the right, improve=0.01114019, (0 missing)
##
## Node number 14: 361 observations
## mean=9.33795, MSE=2758.833
##
## Node number 15: 48 observations, complexity param=0.02661014
## mean=122.0417, MSE=134021.2
## left son=30 (15 obs) right son=31 (33 obs)
## Primary splits:
## WaterDepth_m < 144.5 to the right, improve=0.04860182, (0 missing)
## WindSpeed_m_s < 35 to the left, improve=0.02880777, (0 missing)
## DrillDepth_km < 2.25 to the left, improve=0.02426029, (22 missing)
##
## Node number 30: 15 observations
## mean=2.333333, MSE=18.08889
##
## Node number 31: 33 observations, complexity param=0.02367356
## mean=176.4545, MSE=185457.3
## left son=62 (25 obs) right son=63 (8 obs)
## Primary splits:
## WindSpeed_m_s < 35 to the left, improve=0.03781780, (0 missing)
## WaterDepth_m < 95.5 to the left, improve=0.02596779, (0 missing)
## DrillDepth_km < 0.965 to the left, improve=0.01239007, (14 missing)
##
## Node number 62: 25 observations, complexity param=0.0181788
## mean=129.08, MSE=181618.7
## left son=124 (18 obs) right son=125 (7 obs)
## Primary splits:
## WaterDepth_m < 98 to the left, improve=0.03914308, (0 missing)
## WindSpeed_m_s < 10.5 to the right, improve=0.03914308, (0 missing)
## DrillDepth_km < 0.435 to the right, improve=0.02035497, (10 missing)
## Surrogate splits:
## WindSpeed_m_s < 8.5 to the right, agree=0.76, adj=0.143, (0 split)
##
## Node number 63: 8 observations
## mean=324.5, MSE=168521.8
##
## Node number 124: 18 observations
## mean=76.5, MSE=79398.25
##
## Node number 125: 7 observations
## mean=264.2857, MSE=419081.6
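
In the CP table of the tree above, the cross-validated error (`xerror`) rises from about 1.00 with no splits to about 1.13 with six splits, which suggests the extra splits may not generalize. A common follow-up is to prune the tree at the `cp` value with the lowest `xerror`. The sketch below shows the mechanics on simulated data; the data frame `d`, the helper `n_leaves` and the choice `cp = 0.001` are assumptions for illustration only:

```r
library(rpart)

# Simulated data with one strong split on x1 (illustrative only)
set.seed(123)
n <- 400
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$y <- ifelse(d$x1 > 0.5, 10, 0) + rnorm(n)

# Grow a deliberately large tree by lowering the complexity threshold
tree <- rpart(y ~ x1 + x2, data = d, cp = 0.001)

# Pick the cp value whose cross-validated error is smallest
cp_table <- tree$cptable
best_cp <- cp_table[which.min(cp_table[, "xerror"]), "CP"]

# Prune back to that complexity
pruned <- prune(tree, cp = best_cp)

# Helper: count terminal nodes of an rpart object
n_leaves <- function(t) sum(t$frame$var == "<leaf>")
print(c(full = n_leaves(tree), pruned = n_leaves(pruned)))
```

When the minimum `xerror` sits at zero splits, as it does in the table above, this rule would prune back to the root, meaning the tree predicts no better than the overall mean under cross-validation.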
no_outliers <- trainVariable
# Note: despite its name, this keeps only the rows whose HumanCost value is in `outliers`
no_outliers <- no_outliers[which(no_outliers$HumanCost %in% outliers), ]
ver <- lm(formula = HumanCost ~ WindSpeed_m_s + as.factor(MainEvent) - 1, data = no_outliers)
dim(no_outliers)
## [1] 872 44
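
The vector `outliers` used in the subsetting step is defined outside this section. One common way to build such a vector is `boxplot.stats()`, which flags values beyond 1.5 times the interquartile range from the hinges; whether the analysis used this rule is an assumption. The sketch below, on made-up data, shows how `%in%` selects the flagged rows and how negating it selects the rest:

```r
# Made-up cost data with three extreme values (illustrative only)
set.seed(10)
d <- data.frame(cost = c(rpois(200, 2), 150, 300, 420))

# Values flagged by the boxplot rule (assumed definition of the outlier set)
outlier_values <- boxplot.stats(d$cost)$out

# Rows that ARE outliers (what `x %in% outlier_values` selects)
only_outliers <- d[d$cost %in% outlier_values, , drop = FALSE]

# Rows that are NOT outliers (negated condition)
without_outliers <- d[!(d$cost %in% outlier_values), , drop = FALSE]

print(c(total = nrow(d), kept_as_outliers = nrow(only_outliers),
        kept_without = nrow(without_outliers)))
```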
## Confidence interval
sumCoef <- summary(ver)$coefficients
sumCoef[1,1] + c(-1, 1) * qt(.975, df = ver$df.residual) * sumCoef[1,2]
## [1] 3.59840 6.07494
## [1] -76.54902 118.53901
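
The interval above is built by hand as estimate ± t quantile × standard error. As a cross-check, base R's `confint()` returns the same t-based intervals for an `lm` fit. A small sketch on simulated data (all object names here are illustrative) compares the two:

```r
# Simulated linear data (illustrative only)
set.seed(7)
d <- data.frame(x = rnorm(100))
d$y <- 2 + 1.5 * d$x + rnorm(100)
fit <- lm(y ~ x, data = d)

# Manual 95% intervals: estimate +/- t quantile * standard error
est <- coef(summary(fit))
t_crit <- qt(0.975, df = df.residual(fit))
manual <- est[, "Estimate"] + outer(est[, "Std. Error"], c(-1, 1)) * t_crit

# Built-in equivalent
auto <- confint(fit, level = 0.95)

print(manual)
print(auto)
```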
arvore <- rpart(HumanCost ~ WindSpeed_m_s, data = no_outliers)
# Note: the second argument of print() is matched to `digits`, so the summary itself is not displayed
print("These are the percentiles of the human cost", summary(trainVariable$HumanCost))
## [1] "These are the percentiles of the human cost"
## Call:
## rpart(formula = HumanCost ~ WindSpeed_m_s, data = no_outliers)
## n= 872
##
## CP nsplit rel error xerror xstd
## 1 0.10360931 0 1.0000000 1.0010964 0.4265433
## 2 0.03476275 1 0.8963907 1.0123428 0.4200013
## 3 0.02040746 2 0.8616279 1.0026011 0.4152389
## 4 0.01000000 3 0.8412205 0.9834447 0.4109972
##
## Variable importance
## WindSpeed_m_s
## 100
##
## Node number 1: 872 observations, complexity param=0.1036093
## mean=32.8945, MSE=10270.17
## left son=2 (811 obs) right son=3 (61 obs)
## Primary splits:
## WindSpeed_m_s < 7.5 to the left, improve=0.1036093, (0 missing)
##
## Node number 2: 811 observations
## mean=23.94821, MSE=1551.368
##
## Node number 3: 61 observations, complexity param=0.03476275
## mean=151.8361, MSE=110976.1
## left son=6 (52 obs) right son=7 (9 obs)
## Primary splits:
## WindSpeed_m_s < 35.5 to the left, improve=0.04598845, (0 missing)
##
## Node number 6: 52 observations, complexity param=0.02040746
## mean=122.1154, MSE=99197.41
## left son=12 (45 obs) right son=13 (7 obs)
## Primary splits:
## WindSpeed_m_s < 9 to the right, improve=0.03543067, (0 missing)
##
## Node number 7: 9 observations
## mean=323.5556, MSE=144439.8
##
## Node number 12: 45 observations
## mean=98.73333, MSE=46034.95
##
## Node number 13: 7 observations
## mean=272.4286, MSE=414847.4
Modelagem - Aprendizado Estatístico (Modeling - Statistical Learning): http://material.curso-r.com/modelos/