1 World Offshore Accident Database - Woad

Woad (World Offshore Accident Database) gives you access to accident data for diverse offshore facility types. It has been curated since 1975 by experts at DNV GL, providing accident causes, location, social and economic impacts, etc. that prove invaluable for a variety of risk management initiatives.

Offshore accident data for oil and gas facilities

  • Access to the world’s most comprehensive available offshore accident data
  • Information on over 6000 offshore accidents and incidents from 1970 to date
  • Technical data on approximately 3700 offshore units
  • The World Offshore Accident Database is a tool used extensively by key players in the offshore industry worldwide, including rig owners, drilling operators, insurance companies, consultants, salvage companies and regulatory authorities

Continuously updated offshore accident data

Knowledge of past accidents serves as important input to risk assessment initiatives and helps you ensure that known hazards are properly understood and effectively mitigated against. The World Offshore Accident Database contains a vast range of information covering the different aspects of an accident – technical, economic, social, operational etc. The range of useful applications of information from the World Offshore Accident Database is extensive. It shows the rate of accidents by unit type, geography, function, accident type and more - extremely useful for quantitative risk analysis (QRA).

2 Method

2.1 a) Phase 1 - Data preparation, organization and cleaning

The goal of this phase is to clean the data and prepare the organization of the relevant units of analysis.

2.2 b) Phase 2 - Descriptive analysis

The goal of this phase is to understand the data variables, both at the level of their own values (univariate) and in their relationships with other variables (bivariate and multivariate). These analyses are used to establish a reference context that answers the following questions:

  • How do the data variables behave?
  • Which variables are relevant for knowledge discovery in the database?
  • What are the data levels of the most relevant variables (considering completeness and coverage of the domain)?

2.3 c) Phase 3 - Diagnosis

The goal of this phase is to identify the factors that shape the reference context.

  • What do the summarized data reveal?
  • Are there apparent causes of the accidents?

2.4 d) Phase 4 - Predictive analysis

The goal of this analysis is to apply regression and predictive models to establish possible scenarios.

  • Which variables increase accidents?
  • Which ones decrease them?

2.5 e) Phase 5 - Prescriptive analysis

The goal of this phase is to establish a system that can offer alternative courses of action based on the established scenarios.

  • What should be done to reduce accidents?
  • What should be avoided?

3 R Package

library(rattle)
## Rattle: A free graphical interface for data science with R.
## Version 5.3.0 Copyright (c) 2006-2018 Togaware Pty Ltd.
## Type 'rattle()' to shake, rattle, and roll your data.
library(caret)
## Warning: package 'caret' was built under R version 3.6.3
## Loading required package: lattice
## Warning: package 'lattice' was built under R version 3.6.3
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.6.3
library(rpart)
library(rpart.plot)
library(corrplot)
## corrplot 0.84 loaded
library(randomForest)
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
## 
##     margin
## The following object is masked from 'package:rattle':
## 
##     importance
library(RColorBrewer)
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.3
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:randomForest':
## 
##     combine
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(plotly)
## Warning: package 'plotly' was built under R version 3.6.3
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(ggplot2)
library(tidyr)
## Warning: package 'tidyr' was built under R version 3.6.3
library(magrittr)
## 
## Attaching package: 'magrittr'
## The following object is masked from 'package:tidyr':
## 
##     extract
library(plotrix)
## Warning: package 'plotrix' was built under R version 3.6.3
library(rgl)
## Warning: package 'rgl' was built under R version 3.6.3
## 
## Attaching package: 'rgl'
## The following object is masked from 'package:plotrix':
## 
##     mtext3d
library(lubridate)
## Warning: package 'lubridate' was built under R version 3.6.3
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:dplyr':
## 
##     intersect, setdiff, union
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(ggplot2)
library(GGally)
## Warning: package 'GGally' was built under R version 3.6.3
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
## 
## Attaching package: 'GGally'
## The following object is masked from 'package:dplyr':
## 
##     nasa
library(corrplot)
library(corrgram)
## Registered S3 method overwritten by 'seriation':
##   method         from 
##   reorder.hclust gclus
## 
## Attaching package: 'corrgram'
## The following object is masked from 'package:lattice':
## 
##     panel.fill
library(ppcor)
## Loading required package: MASS
## Warning: package 'MASS' was built under R version 3.6.3
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:plotly':
## 
##     select
## The following object is masked from 'package:dplyr':
## 
##     select
library(readr)
library(ggvis)
## Warning: package 'ggvis' was built under R version 3.6.3
## 
## Attaching package: 'ggvis'
## The following objects are masked from 'package:plotly':
## 
##     add_data, hide_legend
## The following object is masked from 'package:ggplot2':
## 
##     resolution
library(gganimate)
## Warning: package 'gganimate' was built under R version 3.6.3
## 
## Attaching package: 'gganimate'
## The following object is masked from 'package:ggvis':
## 
##     view_static
library(gifski)
## Warning: package 'gifski' was built under R version 3.6.3
library(av)
## Warning: package 'av' was built under R version 3.6.3
library(magick)
## Warning: package 'magick' was built under R version 3.6.3
## Linking to ImageMagick 6.9.9.14
## Enabled features: cairo, freetype, fftw, ghostscript, lcms, pango, rsvg, webp
## Disabled features: fontconfig, x11
library(viridis)
## Loading required package: viridisLite
library(hrbrthemes)
## Warning: package 'hrbrthemes' was built under R version 3.6.3

4 Functions

# Diagonal panel for pairs(): histogram of x with bar heights scaled to a maximum of 1
panel.hist <- function(x, ...)
{
  usr <- par("usr"); on.exit(par(usr))
  par(usr = c(usr[1:2], 0, 1.5) )
  h <- hist(x, plot = FALSE)
  breaks <- h$breaks; nB <- length(breaks)
  y <- h$counts; y <- y/max(y)
  rect(breaks[-nB], 0, breaks[-1], y, col = "cyan", ...)
}

# 1.2 by Melina de Souza Leite 
panel.lm <- function (x, y, col = par("col"), bg = NA, pch = par("pch"), 
                      cex = 1, col.line="red") {
  points(x, y, pch = pch, col = col, bg = bg, cex = cex)
  ok <- is.finite(x) & is.finite(y)
  if (any(ok)) {
    abline(lm(y[ok]~x[ok]), col = col.line)
  }
}

# 1.3 help(pairs) by Melina de Souza Leite 
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
{
  usr <- par("usr"); on.exit(par(usr))
  par(usr = c(0, 1, 0, 1))
  r <- abs(cor(x, y))
  txt <- format(c(r, 0.123456789), digits = digits)[1]
  txt <- paste0(prefix, txt)
  if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
  text(0.5, 0.5, txt, cex = cex.cor * r)
}


# Turn a vector into a frequency table (levels, counts and percentages),
# sorted by descending frequency. The first column is named "x".
.Unianalysis <- function(x) {
  y <- as.data.frame(as.table(table(x)))
  y <- mutate(y, percentual = prop.table(Freq) * 100) # share of each level, in percent
  y <- arrange(y, desc(Freq))
  return(y)
}

# Horizontal bar chart of a frequency table produced by .Unianalysis()
.BarPlot <- function(z) {
  z %>%
    ggplot(aes(x = x, y = Freq)) +
    geom_bar(stat = "identity", fill = "#f68060", alpha = .6, width = .4) +
    coord_flip() +
    xlab("") +
    theme_bw()
}
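
As a quick illustration of what `.Unianalysis` returns, the sketch below builds the same kind of frequency table with base R only. The input vector is made up for illustration, and the name `freq` is an assumption that does not appear in the original analysis.

```r
# Hypothetical damage levels, for illustration only (not WOAD data)
x <- c("Minor damage", "Total loss", "Minor damage",
       "Severe damage", "Minor damage")

# Count each level; the resulting columns are named "x" and "Freq"
freq <- as.data.frame(table(x), stringsAsFactors = FALSE)

# Share of each level, in percent
freq$percentual <- 100 * freq$Freq / sum(freq$Freq)

# Most frequent level first
freq <- freq[order(-freq$Freq), ]
print(freq)
```

The column name `x` matters because `.BarPlot` maps it with `aes(x = x, y = Freq)`.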

5 Phase 1

5.1 Cleaning Woad data

Cleaning steps applied:

  • Normalization of the data labels
  • Conversion to CSV
  • Removal of rows 1 and 2
  • Removal of rows 3736 to 3738
  • Empty cells in the “Human Cause” column were replaced with “not identified”
  • Removal of row 418, because its data were incomplete
  • 3987 records with no event cost data were treated as 0, so that the existing values could be summed

The HumanCost variable (CustoHumano) is the sum of all fatalities and injuries among crew and third parties, with a weight of 10 for each fatality and a weight of 3 for each injury.

The plot below shows the occurrences by unit type in relation to human cost and degree of damage.
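
The weighting can be checked by hand on a few records. The sketch below is only an illustration: the three rows in `toy` are made-up counts, and the names `toy`, `w_fatality` and `w_injury` are assumptions. The column names and the weights 10 and 3 follow the text.

```r
# Made-up casualty counts for three events (not WOAD data)
toy <- data.frame(
  FatalitiesCrew      = c(0, 2, 0),
  Fatalities3rd_Party = c(0, 0, 1),
  InjuriesCrew        = c(1, 3, 0),
  Injuries3rd_Party   = c(0, 0, 2)
)

w_fatality <- 10  # weight of each fatality
w_injury   <- 3   # weight of each injury

# Weighted sum of fatalities and injuries, crew plus third parties
toy$HumanCost <- w_fatality * (toy$FatalitiesCrew + toy$Fatalities3rd_Party) +
  w_injury * (toy$InjuriesCrew + toy$Injuries3rd_Party)

toy$HumanCost  # 3 29 16
```

An event with a single injury therefore scores 3, and any event involving a fatality scores at least 10.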

setwd("~/Regression Models")
library(readr)
Cleaning_completo_woad <- read_delim("Cleaning_completo_woad.CSV", 
    ";", escape_double = FALSE, col_types = cols(
    DamageCost_million = col_number(),
    DrillDepth_km = col_number(), 
    WaterDepth_m = col_number(),
    WindSpeed_m_s= col_number(), 
    FatalitiesCrew = col_number(),
    Fatalities3rd_Party = col_number(),
    InjuriesCrew = col_number(),
    Injuries3rd_Party = col_number(),
    Accident_Date = col_date(format = "%m/%d/%Y")),
    trim_ws = TRUE)

5.2 Tidying data

trainVariable <- Cleaning_completo_woad %>%
              mutate(HumanCost = (FatalitiesCrew * 10) + (Fatalities3rd_Party * 10) + (InjuriesCrew * 3) +  (Injuries3rd_Party * 3)) %>%
              mutate(Year = year(Accident_Date)) %>% 
              mutate(Month =  month(Accident_Date)) %>% 
              filter(Year > 1985)

by_Event_Function <-
        trainVariable %>% 
        group_by(MainEvent, Function,TypeofUnit, Damage,SpillType, AccidentCategory,Year, Month) %>%
        summarise(How_Many = n(),
                  Owner = n_distinct(Owner),
                  FatalitiesCrew = sum(FatalitiesCrew),
                  Fatalities3rd_Party = sum(Fatalities3rd_Party),
                  InjuriesCrew = sum(InjuriesCrew),
                  Injuries3rd_Party = sum(Injuries3rd_Party),
                  DamageCost_million= sum(DamageCost_million), 
                  DrillDepth_km = mean(DrillDepth_km), 
                  WaterDepth_m = mean(WaterDepth_m),
                  WindSpeed_m_s= sum(WindSpeed_m_s),
                  HumanCost = sum(HumanCost)) %>%
       # Note: this is the unweighted number of casualties per event, not sum(HumanCost) / How_Many
       mutate(Mean_HumanCost = (FatalitiesCrew + Fatalities3rd_Party + InjuriesCrew + Injuries3rd_Party)/How_Many) %>%
       arrange(desc(How_Many))
   

write.csv2(by_Event_Function, file = "by_Event_Function.CSV")
write.csv2(trainVariable, file = "trainVariable.CSV")



# data.frame() keeps the numeric columns numeric; cbind() would coerce
# every column to character because Damage is text
damage <- data.frame(by_Event_Function$Damage,
                     by_Event_Function$How_Many,
                     by_Event_Function$HumanCost,
                     by_Event_Function$Mean_HumanCost,
                     by_Event_Function$DamageCost_million)

colnames(damage) <- c("Damage level", "Occurrences", "Total human cost", "Mean human cost", "Damage (millions)")

damage <- na.omit(damage)


OverView <- apply(Cleaning_completo_woad, 2, .Unianalysis)

# lapply() always returns a named list of data frames, so OverView2$MainEvent works
OverView2 <- lapply(OverView, distinct)



by_Year <- trainVariable %>%
          group_by(Year) %>%
          summarise(Freq = n(),
           Owner = n_distinct(Owner),
           FatalitiesCrew = sum(FatalitiesCrew),
           Fatalities3rd_Party = sum(Fatalities3rd_Party),
           InjuriesCrew = sum(InjuriesCrew),
           Injuries3rd_Party = sum(Injuries3rd_Party),
           DamageCost_million= sum(DamageCost_million), 
           HumanCost = sum(HumanCost),   
           WindSpeed_m_s= mean(WindSpeed_m_s)) 
  
  
by_HumanCost <- trainVariable %>%
          filter(HumanCost > 0) %>%
          group_by(HumanCost, AccidentCategory, Damage) %>%
          summarise(Freq = n(),
           Owner = n_distinct(Owner),
           FatalitiesCrew = sum(FatalitiesCrew),
           Fatalities3rd_Party = sum(Fatalities3rd_Party),
           InjuriesCrew = sum(InjuriesCrew),
           Injuries3rd_Party = sum(Injuries3rd_Party),
           DamageCost_million= sum(DamageCost_million), 
           WindSpeed_m_s= mean(WindSpeed_m_s))  %>% 
           filter(Freq > 20)
           

dim(by_HumanCost)
## [1]  3 11
by_Year<- by_Year %>% 
  mutate_all(replace_na, 0)

by_Event_Function <- by_Event_Function %>% 
  mutate_all(replace_na, 0)

6 Exploratory analysis

6.1 Checking Data Consistency

library(GGally)
ggpairs(trainVariable, columns = 24:30, ggplot2::aes(colour=Damage))

trainVariable_N <- trainVariable %>%
    dplyr::filter(HumanCost > 0)
summary(trainVariable_N$HumanCost)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.00    3.00   10.00   25.94   20.00 1850.00
dim(trainVariable_N)
## [1] 590  43

Although records with a human cost of 0 were removed from the sample (the filter above keeps HumanCost > 0; it does not remove values above the third quartile, so the maximum is still 1850), the histogram still shows that the data are not normally distributed, which can distort the correlation between variables.

histogram(trainVariable_N$HumanCost)

We therefore chose to select a sample with values closer to the mean. The minimum threshold found was a human cost of 250, which caused a loss of records because of the irregularity of the data (only 6 records remain, as the output below shows). This makes correlation analysis unfeasible, since at least 40 records are recommended.

trainVariable_N <- trainVariable %>%
    dplyr::filter(HumanCost > 250)

summary(trainVariable_N$HumanCost)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   260.0   270.0   363.0   669.3   796.5  1850.0
dim(trainVariable_N)
## [1]  6 43
histogram(trainVariable_N$HumanCost)

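The Shapiro-Wilk test below returns a p-value of 0.01916. A p-value below 0.05 indicates that the data depart from a normal curve, so the human cost values in this sample are not normally distributed; a p-value above 0.05 would have meant no significant departure from normality.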

shapiro.test(trainVariable_N$HumanCost)
## 
##  Shapiro-Wilk normality test
## 
## data:  trainVariable_N$HumanCost
## W = 0.74827, p-value = 0.01916
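
To make the reading of the test explicit, the sketch below applies `shapiro.test` to two synthetic samples: evenly spaced normal quantiles and evenly spaced exponential quantiles. Both samples and the helper `rejects_normality` are assumptions for illustration; the 0.05 threshold is the one used in the text.

```r
# Deterministic synthetic samples (no random draws)
normal_like <- qnorm(ppoints(50))  # quantiles of a standard normal
skewed      <- qexp(ppoints(50))   # quantiles of an exponential (right-skewed)

p_normal <- shapiro.test(normal_like)$p.value
p_skewed <- shapiro.test(skewed)$p.value

# Hypothetical helper: TRUE when normality is rejected at level alpha
rejects_normality <- function(p, alpha = 0.05) p < alpha

rejects_normality(p_normal)  # FALSE: normality is not rejected
rejects_normality(p_skewed)  # TRUE: normality is rejected
rejects_normality(0.01916)   # TRUE: the p-value reported for HumanCost above
```

With only 6 records the test has little power, so this result is best read together with the histogram rather than on its own.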

We therefore suggest carrying out qualitative research based on these accident records. The records are described below.

DT::datatable(trainVariable_N)

However, even though the human cost variable is not normally distributed, some analyses will still be carried out, provided that the limitations of the results are kept in mind.

6.2 Normality test variable Y “Frequency of events”

6.2.1 Histogram

summary(trainVariable$HumanCost)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
##    0.000    0.000    0.000    3.007    0.000 1850.000       12
histogram(trainVariable$HumanCost)

dim(trainVariable)
## [1] 5101   43

The data are not normally distributed, which limits the correlation analyses.

6.3 Analysis of variance

6.3.1 Simple analysis of variance

Wind-speed variance is larger in the groups of accidents with more damage than in the insignificant/no-damage group.

aggregate(WindSpeed_m_s ~ Damage, trainVariable, var)
##               Damage WindSpeed_m_s
## 1 Insignif/no damage      21.32638
## 2       Minor damage     185.08466
## 3      Severe damage     302.23962
## 4 Significant damage     260.15038
## 5         Total loss     146.02800
aggregate(WindSpeed_m_s ~ Damage, trainVariable,mean)
##               Damage WindSpeed_m_s
## 1 Insignif/no damage     0.8153341
## 2       Minor damage     4.6072664
## 3      Severe damage     6.2211896
## 4 Significant damage     6.5203620
## 5         Total loss     4.1061453
aggregate(WindSpeed_m_s ~ Damage, trainVariable,summary)
##               Damage WindSpeed_m_s.Min. WindSpeed_m_s.1st Qu.
## 1 Insignif/no damage          0.0000000             0.0000000
## 2       Minor damage          0.0000000             0.0000000
## 3      Severe damage          0.0000000             0.0000000
## 4 Significant damage          0.0000000             0.0000000
## 5         Total loss          0.0000000             0.0000000
##   WindSpeed_m_s.Median WindSpeed_m_s.Mean WindSpeed_m_s.3rd Qu.
## 1            0.0000000          0.8153341             0.0000000
## 2            0.0000000          4.6072664             0.0000000
## 3            0.0000000          6.2211896             0.0000000
## 4            0.0000000          6.5203620             0.0000000
## 5            0.0000000          4.1061453             0.0000000
##   WindSpeed_m_s.Max.
## 1         60.0000000
## 2         83.0000000
## 3         62.0000000
## 4         83.0000000
## 5         82.0000000

6.3.2 ANOVA

An analysis of variance makes it possible to check, in a single analysis, whether the observed sample differences are real (caused by significant differences in the observed populations) or due to chance (arising from mere sampling variability).

The Scale-Location plot shows how the points are spread across the range of fitted values. The spread should be roughly constant across the whole range; in our case it varies considerably, which suggests heteroscedasticity.

fit <- aov(HumanCost ~ WindSpeed_m_s + WaveHeight_m + DrillDepth_km + SpillAmount_m3 + as.factor(TypeofUnit) + as.factor(Function) + as.factor(MainEvent), data = trainVariable) 

summary(fit)
##                        Df Sum Sq Mean Sq F value   Pr(>F)    
## WindSpeed_m_s           1   3391    3391   2.294   0.1303    
## WaveHeight_m            1  33101   33101  22.394 2.71e-06 ***
## DrillDepth_km           1      0       0   0.000   0.9877    
## SpillAmount_m3          1      8       8   0.006   0.9397    
## as.factor(TypeofUnit)  16 101826    6364   4.306 4.51e-08 ***
## as.factor(Function)    16   4801     300   0.203   0.9997    
## as.factor(MainEvent)   20  53642    2682   1.815   0.0162 *  
## Residuals             666 984428    1478                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 4378 observations deleted due to missingness
aov(fit)
## Call:
##    aov(formula = fit)
## 
## Terms:
##                 WindSpeed_m_s WaveHeight_m DrillDepth_km SpillAmount_m3
## Sum of Squares         3390.9      33101.0           0.4            8.5
## Deg. of Freedom             1            1             1              1
##                 as.factor(TypeofUnit) as.factor(Function) as.factor(MainEvent)
## Sum of Squares               101825.9              4801.0              53642.3
## Deg. of Freedom                    16                  16                   20
##                 Residuals
## Sum of Squares   984428.0
## Deg. of Freedom       666
## 
## Residual standard error: 38.44633
## 2 out of 59 effects not estimable
## Estimated effects may be unbalanced
## 4378 observations deleted due to missingness
plot(fit)
## Warning: not plotting observations with leverage one:
##   521

## Warning: not plotting observations with leverage one:
##   521

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
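
The significant terms can also be pulled out of the ANOVA table programmatically instead of being read off the printout. The sketch below is a minimal illustration on R's built-in `warpbreaks` data set, not on the WOAD data; the names `fit_demo`, `tab`, `p_values` and `significant` are assumptions.

```r
# Two-factor ANOVA on a built-in data set (stand-in for the WOAD model)
fit_demo <- aov(breaks ~ wool + tension, data = warpbreaks)

# summary() returns a list; its first element is the ANOVA table
tab <- summary(fit_demo)[[1]]

# Row names are padded with spaces, so trim them before use
p_values <- setNames(tab[["Pr(>F)"]], trimws(rownames(tab)))

# Terms with p < 0.05 (the Residuals row has no p-value and is skipped)
significant <- names(p_values)[!is.na(p_values) & p_values < 0.05]
significant
```

Applied to `fit` above, the same steps would list the terms marked with stars in the printed table, namely WaveHeight_m, as.factor(TypeofUnit) and as.factor(MainEvent).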

6.4 Univariate analysis

The goal of the univariate analysis is to understand each data unit in the database so as to establish a reference context. This was done through the key questions described in the following sections.

6.4.1 What is the profile and number of the events that occurred?

Main event

DT::datatable(OverView2$MainEvent, colnames = c('Variable', 'Freq.','Percent'), filter = c("top"))
.BarPlot(OverView2$MainEvent)

Accident category

DT::datatable(OverView2$AccidentCategory, colnames = c('Variable', 'Freq.','Percent'), filter = c("top"))
.BarPlot(OverView2$AccidentCategory)

Damage type

DT::datatable(OverView2$Damage, colnames = c('Variable', 'Freq.','Percent'), filter = c("top"))
.BarPlot(OverView2$Damage)

6.4.2 What are the possible causes of the events?

Human cause

DT::datatable(OverView2$HumanCause, colnames = c('Variable', 'Freq.','Percent'), filter = c("top"))
.BarPlot(OverView2$HumanCause)

Equipment cause

DT::datatable(OverView2$EquipmentCause, colnames = c('Variable', 'Freq.','Percent'), filter = c("top"))
.BarPlot(OverView2$EquipmentCause)

6.4.3 Where on the platform do the events occur?

Type of unit

DT::datatable(OverView2$TypeofUnit, colnames = c('Variable', 'Freq.','Percent'), filter = c("top"))
.BarPlot(OverView2$TypeofUnit)

Functions

DT::datatable(OverView2$Function, colnames = c('Variable', 'Freq.','Percent'), filter = c("top"))
.BarPlot(OverView2$Function)

Main operation

DT::datatable(OverView2$MainOperation, colnames = c('Variable', 'Freq.','Percent'), filter = c("top"))
.BarPlot(OverView2$MainOperation)

6.4.4 Which management models are most frequent in these occurrences?

Management model (classification society)

DT::datatable(OverView2$Class_Society, colnames = c('Variable', 'Freq.','Percent'), filter = c("top"))
.BarPlot(OverView2$Class_Society)

6.4.5 What is the human cost of these events?

Crew fatalities

DT::datatable(OverView2$FatalitiesCrew, colnames = c('Variable', 'Freq.','Percent'), filter = c("top"))
.BarPlot(OverView2$FatalitiesCrew)

Third-party fatalities

DT::datatable(OverView2$Fatalities3rd_Party, colnames = c('Variable', 'Freq.','Percent'), filter = c("top"))
.BarPlot(OverView2$Fatalities3rd_Party)

Crew injuries

DT::datatable(OverView2$InjuriesCrew, colnames = c('Variable', 'Freq.','Percent'), filter = c("top"))
.BarPlot(OverView2$InjuriesCrew)

Third-party injuries

DT::datatable(OverView2$Injuries3rd_Party, colnames = c('Variable', 'Freq.','Percent'), filter = c("top"))
.BarPlot(OverView2$Injuries3rd_Party)

6.4.6 What is the financial cost of these events?

Damage cost in millions

DT::datatable(OverView2$DamageCost_million, colnames = c('Variable', 'Freq.','Percent'), filter = c("top"))
.BarPlot(OverView2$DamageCost_million)

6.5 Bivariate analysis

The goal of this analysis is to identify possible relationships between two units of analysis and thereby check whether a correlation between the variables is plausible.

6.5.1 Which events occur most often by type of occurrence?

trainVariable %>%
  ggplot( aes(x=AccidentCategory, y=MainEvent, fill=AccidentCategory)) +
    geom_boxplot() +
    scale_fill_viridis(discrete = TRUE, alpha=0.6) +
    geom_jitter(color="black", size=0.4, alpha=0.9) +
    theme_ipsum() +
    theme(
      legend.position="none",
      plot.title = element_text(size=11)
    ) +
    ggtitle("Main events by severity category") +
    xlab("")

trainVariable %>%
  ggplot( aes(x=Damage, y=MainEvent, fill=Damage)) +
    geom_boxplot() +
    scale_fill_viridis(discrete = TRUE, alpha=0.6) +
    geom_jitter(color="black", size=0.4, alpha=0.9) +
    theme_ipsum() +
    theme(
      legend.position="none",
      plot.title = element_text(size=11)
    ) +
    ggtitle("Main events by damage type") + theme(axis.text.x = element_text(angle = 35, vjust = 0.5, hjust=1)) +
    xlab("")

trainVariable %>%
  ggplot( aes(x=Damage, y=AccidentCategory, fill=Damage)) +
    geom_boxplot() +
    scale_fill_viridis(discrete = TRUE, alpha=0.6) +
    geom_jitter(color="black", size=0.4, alpha=0.9) +
    theme_ipsum() +
    theme(
      legend.position="none",
      plot.title = element_text(size=11)
    ) +
    ggtitle("Accident categories by damage type") + theme(axis.text.x = element_text(angle = 35, vjust = 0.5, hjust=1)) +
    xlab("")

6.5.2 In which functions do these events occur?

trainVariable %>%
  ggplot( aes(x=AccidentCategory, y=Function, fill=AccidentCategory)) +
    geom_boxplot() +
    scale_fill_viridis(discrete = TRUE, alpha=0.6) +
    geom_jitter(color="black", size=0.4, alpha=0.9) +
    theme_ipsum() +
    theme(
      legend.position="none",
      plot.title = element_text(size=11)
    ) +
    ggtitle("Event occurrences by function and severity") +
    xlab("")

trainVariable %>%
  ggplot( aes(x=Damage, y=Function, fill=Damage)) +
    geom_boxplot() +
    scale_fill_viridis(discrete = TRUE, alpha=0.6) +
    geom_jitter(color="black", size=0.4, alpha=0.9) +
    theme_ipsum() +
    theme(
      legend.position="none",
      plot.title = element_text(size=11)
    ) +
    ggtitle("Event occurrences by function and damage") + theme(axis.text.x = element_text(angle = 35, vjust = 0.5, hjust=1)) +
    xlab("")

6.5.3 In which types of unit do these events occur?

trainVariable %>%
  ggplot( aes(x=AccidentCategory, y=TypeofUnit, fill=TypeofUnit)) +
    geom_boxplot() +
    scale_fill_viridis(discrete = TRUE, alpha=0.6) +
    geom_jitter(color="black", size=0.4, alpha=0.9) +
    theme_ipsum() +
    theme(
      legend.position="none",
      plot.title = element_text(size=11)
    ) +
    ggtitle("Occurrences by unit type and severity") +
    xlab("")

trainVariable %>%
  ggplot( aes(x=Damage, y=TypeofUnit, fill=Damage)) +
    geom_boxplot() +
    scale_fill_viridis(discrete = TRUE, alpha=0.6) +
    geom_jitter(color="black", size=0.4, alpha=0.9) +
    theme_ipsum() +
    theme(
      legend.position ="none",
      plot.title = element_text(size=11)
    ) +
    ggtitle("Occurrences by unit type and damage") +  theme(axis.text.x = element_text(angle = 35, vjust = 0.5, hjust=1)) +
    xlab("")

6.5.4 How are these events distributed by year?

trainVariable %>%
  ggplot( aes(x=Year, y=AccidentCategory, fill=AccidentCategory)) +
    geom_boxplot() +
    scale_fill_viridis(discrete = TRUE, alpha=0.6) +
    geom_jitter(color="black", size=0.4, alpha=0.9) +
    theme_ipsum() +
    theme(
      legend.position="none",
      plot.title = element_text(size=11)
    ) +
    ggtitle("Occurrences by severity and year") +
    xlab("")

trainVariable %>%
  ggplot( aes(x=Year, y=Damage, fill=Damage)) +
    geom_boxplot() +
    scale_fill_viridis(discrete = TRUE, alpha=0.6) +
    geom_jitter(color="black", size=0.4, alpha=0.9) +
    theme_ipsum() +
    theme(
      legend.position ="none",
      plot.title = element_text(size=11)
    ) +
    ggtitle("Occurrences by damage and year") +  theme(axis.text.x = element_text( vjust = 0.5, hjust=1)) +
    xlab("")

6.5.4.1 Are there significant differences over time?

qplot(log(How_Many), Year, data = by_Event_Function, facets = . ~ Damage)

qplot(log(How_Many), Year, data = by_Event_Function, facets = . ~ AccidentCategory)

qplot(How_Many, as.factor(Month), data = by_Event_Function, facets = . ~ Damage)

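6.5.5 Are there significant differences in wind speed according to the severity of the occurrences?

A simple analysis of variance shows that there are variations worth investigating: the wind-speed variance differs widely between accident categories, and Bartlett's test below rejects the hypothesis of equal variances (p-value < 2.2e-16).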

aggregate(WindSpeed_m_s ~ AccidentCategory, trainVariable, var)
##   AccidentCategory WindSpeed_m_s
## 1         Accident     232.37595
## 2  Incidnt/haz.sit      44.55461
## 3        Near miss      30.04729
## 4    Unsignificant      16.46340
bartlett.test(WindSpeed_m_s ~ AccidentCategory, trainVariable)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  WindSpeed_m_s by AccidentCategory
## Bartlett's K-squared = 2118.3, df = 3, p-value < 2.2e-16
summary(trainVariable$WindSpeed_m_s)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.000   0.000   2.429   0.000  83.000      12

6.6 Multivariate analysis

The goal of the multivariate analysis is to identify the possible influences of the various factors that permeate the context under analysis. A multivariate study provides input for diagnosis and for building predictive and prescriptive models.

6.6.1 What is the relationship between year and month, type of occurrence and human cost?

By year

qplot(log(HumanCost), Year, data = trainVariable, facets = . ~ AccidentCategory)
## Warning: Removed 12 rows containing missing values (geom_point).

qplot(as.factor(Month), as.factor(Year), data = trainVariable, facets = . ~ Damage)

qplot(as.factor(Month), as.factor(Year), data = trainVariable, facets = . ~ AccidentCategory)

By month

qplot(log(HumanCost), as.factor(Month), data = by_Event_Function, color = Damage, geom = c("point", "smooth"), method = "lm")
## Warning: Ignoring unknown parameters: method
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 3693 rows containing non-finite values (stat_smooth).
## Warning in qt((1 - level)/2, df): NaNs produced
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max;
## returning -Inf

qplot(DamageCost_million, as.factor(Month), data = trainVariable, color = Damage, geom = c("point", "smooth"), method = "lm")
## Warning: Ignoring unknown parameters: method
## `geom_smooth()` using formula 'y ~ x'

6.6.2 Multivariate correlation

As the plot below shows, there are no visible correlations.

by_Year2 <- trainVariable_N %>%
          group_by(Year) %>%
          summarise(Freq = n(),
           Owner = n_distinct(Owner),
           FatalitiesCrew = sum(FatalitiesCrew),
           Fatalities3rd_Party = sum(Fatalities3rd_Party),
           InjuriesCrew = sum(InjuriesCrew),
           Injuries3rd_Party = sum(Injuries3rd_Party),
           DamageCost_million= sum(DamageCost_million), 
           HumanCost = sum(HumanCost),   
           WindSpeed_m_s= mean(WindSpeed_m_s))

summary(by_Year[, 2:10])
##       Freq           Owner       FatalitiesCrew  Fatalities3rd_Party
##  Min.   :  2.0   Min.   : 1.00   Min.   :  0.0   Min.   : 0.00      
##  1st Qu.:104.2   1st Qu.: 1.00   1st Qu.:  0.0   1st Qu.: 0.00      
##  Median :148.0   Median :49.50   Median : 11.0   Median : 3.00      
##  Mean   :170.0   Mean   :33.57   Mean   : 22.5   Mean   : 4.10      
##  3rd Qu.:197.5   3rd Qu.:60.75   3rd Qu.: 35.5   3rd Qu.: 5.75      
##  Max.   :484.0   Max.   :81.00   Max.   :180.0   Max.   :27.00      
##   InjuriesCrew   Injuries3rd_Party DamageCost_million   HumanCost     
##  Min.   : 0.00   Min.   : 0.000    Min.   :  0.000    Min.   :   0.0  
##  1st Qu.: 0.00   1st Qu.: 0.000    1st Qu.:  0.000    1st Qu.:   0.0  
##  Median : 9.50   Median : 1.500    Median :  1.325    Median : 214.0  
##  Mean   :15.87   Mean   : 7.033    Mean   : 82.452    Mean   : 334.7  
##  3rd Qu.:26.75   3rd Qu.: 6.750    3rd Qu.: 55.250    3rd Qu.: 502.8  
##  Max.   :80.00   Max.   :90.000    Max.   :903.335    Max.   :2103.0  
##  WindSpeed_m_s   
##  Min.   : 0.000  
##  1st Qu.: 0.000  
##  Median : 0.000  
##  Mean   : 1.705  
##  3rd Qu.: 2.069  
##  Max.   :20.584
pairs(by_Year[,3:7],  
      diag.panel = panel.hist,
      upper.panel = panel.cor,
      lower.panel = panel.lm,
      main="Multivariate correlation")

pairs(by_Year[, 3:10],  
      diag.panel = panel.hist,
      upper.panel = panel.cor,
      lower.panel = panel.lm,
      main="Multivariate correlation")

6.6.3 Occurrences by damage type, accident category and month of occurrence

The plot below illustrates a marked concentration of accidents and incidents in August and September over the 5 decades of occurrences. It is important to establish whether this is due to outliers, to the specific circumstances of some country or to some other infrequent factor (mere chance), or whether some factor makes these months genuinely relevant (significance).

qplot(log(How_Many),    as.factor(Month),   data    =   by_Event_Function, facets   =   .   ~   AccidentCategory, color =   Damage, geom    =   c("point",  "smooth"),  method  =   "lm")
## Warning: Ignoring unknown parameters: method
## `geom_smooth()` using formula 'y ~ x'
## Warning in qt((1 - level)/2, df): NaNs produced
## Warning in qt((1 - level)/2, df): NaNs produced
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max;
## returning -Inf

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max;
## returning -Inf
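One way to check whether a peak in particular months is more than chance is a chi-square goodness-of-fit test on the monthly counts. The sketch below is only an illustration: `monthly_counts` holds made-up values, not WOAD figures, and the null hypothesis assumed here is that events are spread in proportion to the length of each month.

```r
# Hypothetical monthly event counts (illustrative values, NOT taken from WOAD)
monthly_counts <- c(Jan = 40, Feb = 38, Mar = 42, Apr = 41, May = 39, Jun = 40,
                    Jul = 43, Aug = 70, Sep = 68, Oct = 41, Nov = 39, Dec = 40)

# Expected share of each month under "no seasonality": proportional to its length
days_in_month <- c(31, 28.25, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
expected_share <- days_in_month / sum(days_in_month)

# Chi-square goodness-of-fit test of the observed counts against those shares
seasonality_test <- chisq.test(monthly_counts, p = expected_share)

# Standardised residuals show which months deviate most from expectation
top_months <- names(sort(seasonality_test$stdres, decreasing = TRUE))[1:2]

print(seasonality_test$p.value)
print(top_months)
```

A small p-value only says that the months differ overall. Whether the difference comes from a few extreme years or countries would still need to be checked by repeating the test on subsets of the data.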

6.6.4 Accident-category data: financial cost versus human cost by damage type

In the plot below only the events classed as Accidents are kept. Damage cost in millions is plotted against human cost, with the relationship broken down by damage type. The plot suggests a positive relationship for total losses, severe damage and significant damage, and a negative one for insignificant damage.

# Keep only the rows whose AccidentCategory is "Accident"
qplot(HumanCost, DamageCost_million,
      data = subset(trainVariable, AccidentCategory == "Accident"),
      color = Damage, geom = c("point", "smooth"), method = "lm")
## Warning: Ignoring unknown parameters: method
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 12 rows containing non-finite values (stat_smooth).
## Warning: Removed 12 rows containing missing values (geom_point).

6.6.5 Human cost by unit type and degree of damage

The plot shows a positive association between human cost and the Jackup, Jacket, Semi-submersible and Helicopter unit types in occurrences with total loss.

trainVariable %>% ggvis(~HumanCost, ~TypeofUnit, fill = ~as.factor(Damage)) %>% layer_points()
# Most basic bubble plot
by_Event_Function %>%
   ggplot(aes(x= HumanCost, y= DamageCost_million, size= Mean_HumanCost, color=Damage)) +
    geom_point(alpha=0.5) +
    scale_size(range = c(.1, 20), name="Mean_HumanCost")

par(mfrow=c(1,3))
# Per Damage
ggplot(by_Event_Function, aes(fill=Damage, y=Function, x=How_Many)) + 
    geom_bar(position="fill", stat="identity") + ggtitle("Eventos por Função e danos")

ggplot(by_Event_Function, aes(fill=Damage, y=TypeofUnit, x=How_Many)) + 
    geom_bar(position="fill", stat="identity") + ggtitle("Eventos por Unidade e danos")

ggplot(by_Event_Function, aes(fill=Damage, y=MainEvent, x=How_Many)) + 
    geom_bar(position="fill", stat="identity") + ggtitle("Tipo de Eventos por danos")

# Per Accident Category

ggplot(by_Event_Function, aes(fill=AccidentCategory, y=Function, x=How_Many)) + 
    geom_bar(position="fill", stat="identity") + ggtitle("Eventos por Função e tipo")

ggplot(by_Event_Function, aes(fill=AccidentCategory, y=TypeofUnit, x=How_Many)) + 
    geom_bar(position="fill", stat="identity") + ggtitle("Eventos por Unidade e tipo")

ggplot(trainVariable, aes(fill=AccidentCategory, y=MainEvent, x=HumanCost)) + 
    geom_bar(position="fill", stat="identity") + ggtitle("Eventos por Categoria e tipo")
## Warning: Removed 12 rows containing missing values (position_stack).
## Warning: Removed 122 rows containing missing values (geom_bar).

6.7 Conclusions of the exploratory analyses

Exploratory analyses aim to represent the context of an analysis domain, and for that reason they tend to be “quick and dirty”. “Quick” because we look for as much information about the context as possible, using plots that guide us through univariate, bivariate and multilevel views of the data, which ends up bringing in a lot of noisy information (“dirty”).

Even so, it is a strategy that aims to summarise the data (usually graphically) and highlight any broad features.

The occurrence data appear to have undergone changes in nomenclature over the five decades analysed. For example, up to 1982 there is no record of the term “Near Miss”, which suggests that the term came into use from 1983 onwards, probably for some reason related to the categorisation of events that made these occurrences easier to manage. A literature review on this point would be worthwhile. Below is a compilation of the data collected over almost five decades (46 years) in 91 countries, with the participation of 559

Regarding the profile of the events:

  • About 57% of all reported occurrences are caused by equipment malfunction (32.92%) and weather problems (24.43%).

  • About 52% of all occurrences have as their main event a falling load / dropped object (20.41%), a release of fluid or gas (17.88%) or fire (13.84%).

  • As expected, about 85% of all reported occurrences take place in the Drilling (34.96%), Production (29.39%) and Drilling & Production (20.72%) functions.

  • There is a moderate correlation (0.43) between financial cost and human cost.

  • There is a moderate correlation (0.49) between wind speed in metres per second and crew injuries.
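Correlation coefficients such as those quoted above are easier to judge alongside a confidence interval and a rank-based alternative. The following is a minimal sketch on simulated data; `human_cost` and `damage_cost` are hypothetical stand-ins, not the WOAD columns, and the numbers it prints are unrelated to the 0.43 and 0.49 reported here.

```r
set.seed(42)  # reproducible toy data
n <- 200

# Hypothetical stand-ins for the report's variables (simulated, NOT WOAD data)
human_cost  <- rexp(n, rate = 1 / 10)
damage_cost <- 0.5 * human_cost + rnorm(n, sd = 8)

# Pearson correlation with a 95% confidence interval and p-value
pearson <- cor.test(human_cost, damage_cost, method = "pearson")

# Spearman rank correlation is less sensitive to a heavy right tail
spearman <- cor.test(human_cost, damage_cost, method = "spearman", exact = FALSE)

print(pearson$estimate)
print(pearson$conf.int)
print(spearman$estimate)
```

When a few extreme events dominate, the Pearson and Spearman estimates can differ noticeably, which is a quick sign that the linear correlation is driven by outliers.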

(add more findings)

Explore basic questions and hypotheses (and perhaps rule some out)

Suggest modelling strategies for the next step

7 Phase 2 - Diagnosis

7.1 With outliers

7.1.1 Predicting human cost from weather-condition variables

Only wind speed, measured in metres per second, is significant in the model (p = 0.048). As the diagnostic plots show, the model does not fit the data well: R² is below 0.01 and the overall F-test is not significant (p = 0.2194).

fit1 <- lm(formula = HumanCost ~ WindSpeed_m_s + WaterDepth_m + DrillDepth_km -1, data = trainVariable)

summary(fit1)
## 
## Call:
## lm(formula = HumanCost ~ WindSpeed_m_s + WaterDepth_m + DrillDepth_km - 
##     1, data = trainVariable)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -13.14  -1.56  -0.56  -0.28 902.88 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## WindSpeed_m_s 0.148722   0.075056   1.981    0.048 *
## WaterDepth_m  0.003918   0.010399   0.377    0.706  
## DrillDepth_km 0.046100   0.166384   0.277    0.782  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 38.75 on 580 degrees of freedom
##   (4518 observations deleted due to missingness)
## Multiple R-squared:  0.007588,   Adjusted R-squared:  0.002455 
## F-statistic: 1.478 on 3 and 580 DF,  p-value: 0.2194
plot(fit1)

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
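The `- 1` in the model formula removes the intercept, and for such models `summary()` measures R² against zero rather than against the mean of the response. The sketch below illustrates the effect on simulated data; `x` and `y` are hypothetical and unrelated to the WOAD variables.

```r
set.seed(1)

# Simulated data (hypothetical; not the WOAD variables)
x <- runif(100, 0, 20)
y <- 50 + 0.2 * x + rnorm(100, sd = 5)

fit_with_intercept <- lm(y ~ x)       # R^2 measured against mean(y)
fit_no_intercept   <- lm(y ~ x - 1)   # R^2 measured against zero

r2_with    <- summary(fit_with_intercept)$r.squared
r2_without <- summary(fit_no_intercept)$r.squared

# Recompute the no-intercept R^2 by hand: 1 - RSS / sum(y^2)
r2_manual <- 1 - sum(residuals(fit_no_intercept)^2) / sum(y^2)

print(c(with_intercept = r2_with, no_intercept = r2_without, manual = r2_manual))
```

Because the two definitions use different baselines, R² values from models with and without an intercept should not be compared directly.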

7.1.2 Predicting human cost from multiple variables

fit2 <- lm(formula = HumanCost ~ WindSpeed_m_s + as.factor(TypeofUnit) + as.factor(Function) + as.factor(MainEvent) + as.factor(MainOperation) - 1, data = trainVariable)

summary(fit2)
## 
## Call:
## lm(formula = HumanCost ~ WindSpeed_m_s + as.factor(TypeofUnit) + 
##     as.factor(Function) + as.factor(MainEvent) + as.factor(MainOperation) - 
##     1, data = trainVariable)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
##  -43.55   -3.31   -0.49    1.09 1841.86 
## 
## Coefficients: (1 not defined because of singularities)
##                                                    Estimate Std. Error t value
## WindSpeed_m_s                                       0.16819    0.05428   3.099
## as.factor(TypeofUnit)Artificial Island              2.43889   28.10744   0.087
## as.factor(TypeofUnit)Barge (not drilling)           6.82684   15.69727   0.435
## as.factor(TypeofUnit)Concrete structure            -3.74184   14.29596  -0.262
## as.factor(TypeofUnit)Drill barge                   -3.77525   15.86315  -0.238
## as.factor(TypeofUnit)Drill ship                    10.65833   14.85264   0.718
## as.factor(TypeofUnit)Drilling tender               -7.64189   24.30405  -0.314
## as.factor(TypeofUnit)FPSO/FSU                      -2.11210   14.40995  -0.147
## as.factor(TypeofUnit)Helicopter-Offshore duty      23.47213   13.97331   1.680
## as.factor(TypeofUnit)Jacket                        -2.35269   14.15019  -0.166
## as.factor(TypeofUnit)Jackup                        -1.46145   14.24890  -0.103
## as.factor(TypeofUnit)Loading buoy                   0.66558   18.88205   0.035
## as.factor(TypeofUnit)Mobile unit(not drill.)        2.02706   23.00098   0.088
## as.factor(TypeofUnit)Other/Unkn. fixed struct      -3.43126   18.21441  -0.188
## as.factor(TypeofUnit)Pipeline                       0.21942   16.44719   0.013
## as.factor(TypeofUnit)Semi-submersible              -3.40123   14.09529  -0.241
## as.factor(TypeofUnit)Submersible                   -5.54808   20.95355  -0.265
## as.factor(TypeofUnit)Subsea install./complet.      -2.53056   17.32781  -0.146
## as.factor(TypeofUnit)Tension leg platform          -3.82177   14.47337  -0.264
## as.factor(TypeofUnit)Well support structure        -8.98273   14.53719  -0.618
## as.factor(Function)Compression                      0.58574   13.63727   0.043
## as.factor(Function)Crane                            8.80605   12.09450   0.728
## as.factor(Function)Drilling                         4.37687    8.67607   0.504
## as.factor(Function)Drilling&Production              4.47716    8.83514   0.507
## as.factor(Function)Other                            0.88384   12.35067   0.072
## as.factor(Function)Pipelaying                       9.96140   12.70990   0.784
## as.factor(Function)Production                       1.62668    8.79773   0.185
## as.factor(Function)Pumping                         -1.37695   12.35999  -0.111
## as.factor(Function)Riser                            1.45585   11.83548   0.123
## as.factor(Function)Service                          3.51326   14.03002   0.250
## as.factor(Function)Storage                         -1.25837   11.54282  -0.109
## as.factor(Function)Transfer Air                          NA         NA      NA
## as.factor(Function)Transfer Gas                    -0.93275   15.47544  -0.060
## as.factor(Function)Transfer Hydrocarbon            -0.24375   12.45294  -0.020
## as.factor(Function)Transfer Oil                     0.15931   13.40076   0.012
## as.factor(Function)Transfer Unknown                -0.53913   21.04270  -0.026
## as.factor(Function)Water/gas injection              1.08467   11.75872   0.092
## as.factor(Function)Work Assistance                  6.33756   11.52225   0.550
## as.factor(MainEvent)Blowout                         5.56084    6.87076   0.809
## as.factor(MainEvent)Breakage or fatigue            -1.54916    4.66187  -0.332
## as.factor(MainEvent)Capsizing,overturn,toppling     8.30822    5.00605   1.660
## as.factor(MainEvent)Collision,not offshore units   -0.25953    5.22769  -0.050
## as.factor(MainEvent)Collision,offshore units       -0.01284    4.81403  -0.003
## as.factor(MainEvent)Crane accident                  0.36334    5.54991   0.065
## as.factor(MainEvent)Explosion                      13.95261    6.93682   2.011
## as.factor(MainEvent)Falling load / Dropped object   1.88661    4.23634   0.445
## as.factor(MainEvent)Fire                            5.36862    4.42509   1.213
## as.factor(MainEvent)Grounding                      -5.90610    7.38416  -0.800
## as.factor(MainEvent)Helicopter accident            23.16198    7.11864   3.254
## as.factor(MainEvent)Leakage into hull              -3.04666    8.45215  -0.360
## as.factor(MainEvent)List, uncontrolled inclination  4.31948    6.30750   0.685
## as.factor(MainEvent)Loss of buoyancy or sinking    21.03130    5.82311   3.612
## as.factor(MainEvent)Machinery/propulsion failure   -6.67331   11.97976  -0.557
## as.factor(MainEvent)Other                          -0.93044    4.75707  -0.196
## as.factor(MainEvent)Out of position, adrift        -0.93533    6.20395  -0.151
## as.factor(MainEvent)Release of fluid or gas         0.81225    4.32582   0.188
## as.factor(MainEvent)Towline failure/rupture        -4.41297    8.26597  -0.534
## as.factor(MainEvent)Well problem, no blowout        0.73468    4.79798   0.153
## as.factor(MainOperation)Accommodation              -0.23340   14.03767  -0.017
## as.factor(MainOperation)Completion/drill.aban       0.03445   11.49155   0.003
## as.factor(MainOperation)Construct. work unit       -9.12865   12.95619  -0.705
## as.factor(MainOperation)Demobilizing                2.38256   14.59081   0.163
## as.factor(MainOperation)Development Drilling       -2.87910   10.57588  -0.272
## as.factor(MainOperation)Drilling,unknwn phase      -1.71643   10.77638  -0.159
## as.factor(MainOperation)Exploration drilling       -0.92158   10.74718  -0.086
## as.factor(MainOperation)Idle                       -3.29113   12.93445  -0.254
## as.factor(MainOperation)Injection                  -0.32583   15.46495  -0.021
## as.factor(MainOperation)Loading of liquids          2.20766   18.34930   0.120
## as.factor(MainOperation)Mobilizing                 -3.21840   11.46181  -0.281
## as.factor(MainOperation)Other                      -0.95114   13.05258  -0.073
## as.factor(MainOperation)Production                 -0.70108   10.45675  -0.067
## as.factor(MainOperation)Repair work/under rep       3.22599   11.86818   0.272
## as.factor(MainOperation)Scrapped                   -5.37448   18.65075  -0.288
## as.factor(MainOperation)Service                    -7.09732   17.15889  -0.414
## as.factor(MainOperation)Stacked                     3.73964   13.33601   0.280
## as.factor(MainOperation)Standby                    -8.27994   20.46541  -0.405
## as.factor(MainOperation)Testing                    -3.69611   14.06333  -0.263
## as.factor(MainOperation)Transfer, dry              -6.21617   15.23668  -0.408
## as.factor(MainOperation)Transfer, wet              -2.05814   11.30293  -0.182
## as.factor(MainOperation)Under construction          5.27337   11.28647   0.467
## as.factor(MainOperation)Well workover              -1.83356   10.77367  -0.170
##                                                    Pr(>|t|)    
## WindSpeed_m_s                                      0.001956 ** 
## as.factor(TypeofUnit)Artificial Island             0.930858    
## as.factor(TypeofUnit)Barge (not drilling)          0.663652    
## as.factor(TypeofUnit)Concrete structure            0.793533    
## as.factor(TypeofUnit)Drill barge                   0.811901    
## as.factor(TypeofUnit)Drill ship                    0.473039    
## as.factor(TypeofUnit)Drilling tender               0.753211    
## as.factor(TypeofUnit)FPSO/FSU                      0.883476    
## as.factor(TypeofUnit)Helicopter-Offshore duty      0.093071 .  
## as.factor(TypeofUnit)Jacket                        0.867956    
## as.factor(TypeofUnit)Jackup                        0.918313    
## as.factor(TypeofUnit)Loading buoy                  0.971882    
## as.factor(TypeofUnit)Mobile unit(not drill.)       0.929778    
## as.factor(TypeofUnit)Other/Unkn. fixed struct      0.850586    
## as.factor(TypeofUnit)Pipeline                      0.989356    
## as.factor(TypeofUnit)Semi-submersible              0.809332    
## as.factor(TypeofUnit)Submersible                   0.791191    
## as.factor(TypeofUnit)Subsea install./complet.      0.883896    
## as.factor(TypeofUnit)Tension leg platform          0.791750    
## as.factor(TypeofUnit)Well support structure        0.536665    
## as.factor(Function)Compression                     0.965742    
## as.factor(Function)Crane                           0.466590    
## as.factor(Function)Drilling                        0.613952    
## as.factor(Function)Drilling&Production             0.612360    
## as.factor(Function)Other                           0.942954    
## as.factor(Function)Pipelaying                      0.433229    
## as.factor(Function)Production                      0.853318    
## as.factor(Function)Pumping                         0.911301    
## as.factor(Function)Riser                           0.902107    
## as.factor(Function)Service                         0.802282    
## as.factor(Function)Storage                         0.913194    
## as.factor(Function)Transfer Air                          NA    
## as.factor(Function)Transfer Gas                    0.951941    
## as.factor(Function)Transfer Hydrocarbon            0.984384    
## as.factor(Function)Transfer Oil                    0.990515    
## as.factor(Function)Transfer Unknown                0.979561    
## as.factor(Function)Water/gas injection             0.926509    
## as.factor(Function)Work Assistance                 0.582328    
## as.factor(MainEvent)Blowout                        0.418359    
## as.factor(MainEvent)Breakage or fatigue            0.739675    
## as.factor(MainEvent)Capsizing,overturn,toppling    0.097060 .  
## as.factor(MainEvent)Collision,not offshore units   0.960408    
## as.factor(MainEvent)Collision,offshore units       0.997872    
## as.factor(MainEvent)Crane accident                 0.947805    
## as.factor(MainEvent)Explosion                      0.044346 *  
## as.factor(MainEvent)Falling load / Dropped object  0.656097    
## as.factor(MainEvent)Fire                           0.225111    
## as.factor(MainEvent)Grounding                      0.423851    
## as.factor(MainEvent)Helicopter accident            0.001148 ** 
## as.factor(MainEvent)Leakage into hull              0.718521    
## as.factor(MainEvent)List, uncontrolled inclination 0.493497    
## as.factor(MainEvent)Loss of buoyancy or sinking    0.000308 ***
## as.factor(MainEvent)Machinery/propulsion failure   0.577523    
## as.factor(MainEvent)Other                          0.844939    
## as.factor(MainEvent)Out of position, adrift        0.880169    
## as.factor(MainEvent)Release of fluid or gas        0.851067    
## as.factor(MainEvent)Towline failure/rupture        0.593457    
## as.factor(MainEvent)Well problem, no blowout       0.878308    
## as.factor(MainOperation)Accommodation              0.986735    
## as.factor(MainOperation)Completion/drill.aban      0.997608    
## as.factor(MainOperation)Construct. work unit       0.481110    
## as.factor(MainOperation)Demobilizing               0.870296    
## as.factor(MainOperation)Development Drilling       0.785456    
## as.factor(MainOperation)Drilling,unknwn phase      0.873458    
## as.factor(MainOperation)Exploration drilling       0.931668    
## as.factor(MainOperation)Idle                       0.799162    
## as.factor(MainOperation)Injection                  0.983192    
## as.factor(MainOperation)Loading of liquids         0.904241    
## as.factor(MainOperation)Mobilizing                 0.778882    
## as.factor(MainOperation)Other                      0.941913    
## as.factor(MainOperation)Production                 0.946548    
## as.factor(MainOperation)Repair work/under rep      0.785775    
## as.factor(MainOperation)Scrapped                   0.773235    
## as.factor(MainOperation)Service                    0.679170    
## as.factor(MainOperation)Stacked                    0.779171    
## as.factor(MainOperation)Standby                    0.685805    
## as.factor(MainOperation)Testing                    0.792703    
## as.factor(MainOperation)Transfer, dry              0.683313    
## as.factor(MainOperation)Transfer, wet              0.855521    
## as.factor(MainOperation)Under construction         0.640359    
## as.factor(MainOperation)Well workover              0.864869    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 34.38 on 4354 degrees of freedom
##   (667 observations deleted due to missingness)
## Multiple R-squared:  0.05314,    Adjusted R-squared:  0.03575 
## F-statistic: 3.055 on 80 and 4354 DF,  p-value: < 2.2e-16
plot(fit2)
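The line “1 not defined because of singularities” and the `NA` row in the coefficient table usually mean that one dummy column is an exact linear combination of the others. The toy example below is a sketch of that situation only; the data frame `toy` and its columns `unit`, `fn` and `cost` are hypothetical and not taken from the WOAD data.

```r
set.seed(11)

# Toy data (hypothetical): level "z" of `fn` occurs only together with unit "C"
toy <- data.frame(
  unit = factor(rep(c("A", "A", "B", "B", "C", "C"), times = 2)),
  fn   = factor(rep(c("x", "y", "x", "y", "z", "z"), times = 2))
)
toy$cost <- rnorm(nrow(toy), mean = 10)

# No intercept: `unit` gets one dummy per level, `fn` gets dummies for y and z
fit <- lm(cost ~ unit + fn - 1, data = toy)

# The dummy for fn == "z" is identical to the dummy for unit == "C",
# so lm() cannot estimate both and reports NA for the later one
print(coef(fit))

# alias() spells out which combination of columns the dropped term duplicates
print(alias(fit)$Complete)
```

Running `alias()` on the fitted model is a quick way to see which factor levels are confounded before interpreting the remaining coefficients.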

7.2 Without outliers

7.2.1 Removing outliers

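# Values flagged by the boxplot rule (beyond 1.5 * IQR from the hinges)
outliers <- boxplot(trainVariable$HumanCost, plot=FALSE)$out
no_outliers <- trainVariable 
# Note: `%in% outliers` selects the rows flagged as outliers; negating the
# condition with `!` would drop them instead.
no_outliers <-no_outliers[which(no_outliers$HumanCost %in% outliers),]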
ver <- lm(formula = HumanCost ~ WindSpeed_m_s + as.factor(TypeofUnit) + as.factor(Function) + as.factor(MainEvent) + as.factor(MainOperation) - 1, data = no_outliers)
dim(no_outliers)
## [1] 590  43
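The difference between keeping and dropping the values flagged by `boxplot()` comes down to whether the `%in%` condition is negated. The sketch below shows both variants on a small simulated data frame; `toy` and its values are hypothetical, not WOAD records.

```r
set.seed(7)

# Toy data frame (hypothetical values, not WOAD records)
toy <- data.frame(HumanCost = c(rpois(95, lambda = 3), 60, 75, 90, 120, 300))

# Values beyond 1.5 * IQR from the box hinges, as reported by boxplot()
outlier_values <- boxplot(toy$HumanCost, plot = FALSE)$out

# Without negation the filter KEEPS only the flagged rows
only_outliers <- toy[toy$HumanCost %in% outlier_values, , drop = FALSE]

# With negation the filter DROPS the flagged rows
without_outliers <- toy[!(toy$HumanCost %in% outlier_values), , drop = FALSE]

print(dim(only_outliers))
print(dim(without_outliers))
```

Comparing `dim()` of the filtered data with the number of flagged values is a cheap sanity check. Rows with a missing `HumanCost` are never flagged, so the negated filter keeps them.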
# 95% confidence intervals for the first two coefficients
sumCoef <- summary(ver)$coefficients
sumCoef[1,1] + c(-1, 1) * qt(.975, df = ver$df.residual) * sumCoef[1,2]
## [1] 3.776777 7.433748
sumCoef[2,1] + c(-1, 1) * qt(.975, df = ver$df.residual) * sumCoef[2,2]
## [1] -407.2136  300.4147
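The manual interval above (estimate plus or minus a t quantile times the standard error) can be cross-checked against base R's `confint()`. This is a minimal sketch on simulated data; the data frame `d` and the model `fit` are hypothetical stand-ins for the model fitted in the report.

```r
set.seed(3)

# Simulated data (hypothetical; stands in for the model fitted in the report)
d <- data.frame(x = rnorm(50))
d$y <- 2 * d$x + rnorm(50)
fit <- lm(y ~ x - 1, data = d)

# Manual interval: estimate +/- t quantile * standard error
coefs <- summary(fit)$coefficients
manual_ci <- coefs[1, 1] + c(-1, 1) * qt(0.975, df = fit$df.residual) * coefs[1, 2]

# Same interval from the built-in helper
builtin_ci <- confint(fit, level = 0.95)

print(manual_ci)
print(builtin_ci)
```

`confint()` returns the intervals for all coefficients at once, which avoids indexing the coefficient matrix row by row.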

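7.2.2 Predicting human cost from weather-condition variables

Only wind speed, measured in metres per second, is significant in the model (p < 2e-16). As the diagnostic plots show, the model does not fit the data well. Only 38 observations remain once rows with missing values are dropped, and the R² of 0.876 comes from a model without an intercept, so it is measured against zero rather than against the mean of the response.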

fit1Nl <- lm(formula = HumanCost ~ WindSpeed_m_s + WaterDepth_m + DrillDepth_km -1, data = no_outliers)

summary(fit1Nl)
## 
## Call:
## lm(formula = HumanCost ~ WindSpeed_m_s + WaterDepth_m + DrillDepth_km - 
##     1, data = no_outliers)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -235.816    3.374   10.172   22.734  122.006 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## WindSpeed_m_s 18.376526   1.234126  14.890   <2e-16 ***
## WaterDepth_m  -0.002454   0.083713  -0.029    0.977    
## DrillDepth_km  4.193332   5.487789   0.764    0.450    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 55.71 on 35 degrees of freedom
##   (552 observations deleted due to missingness)
## Multiple R-squared:  0.8762, Adjusted R-squared:  0.8656 
## F-statistic: 82.59 on 3 and 35 DF,  p-value: 5.984e-16
plot(fit1Nl)

7.2.3 Predicting human cost from the main event

# Note: this call uses data = trainVariable, so it refits the model of section 7.1.2
fit2 <- lm(formula = HumanCost ~ WindSpeed_m_s + as.factor(TypeofUnit) + as.factor(Function) + as.factor(MainEvent) + as.factor(MainOperation) - 1, data = trainVariable)

summary(fit2)
## 
## Call:
## lm(formula = HumanCost ~ WindSpeed_m_s + as.factor(TypeofUnit) + 
##     as.factor(Function) + as.factor(MainEvent) + as.factor(MainOperation) - 
##     1, data = trainVariable)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
##  -43.55   -3.31   -0.49    1.09 1841.86 
## 
## Coefficients: (1 not defined because of singularities)
##                                                    Estimate Std. Error t value
## WindSpeed_m_s                                       0.16819    0.05428   3.099
## as.factor(TypeofUnit)Artificial Island              2.43889   28.10744   0.087
## as.factor(TypeofUnit)Barge (not drilling)           6.82684   15.69727   0.435
## as.factor(TypeofUnit)Concrete structure            -3.74184   14.29596  -0.262
## as.factor(TypeofUnit)Drill barge                   -3.77525   15.86315  -0.238
## as.factor(TypeofUnit)Drill ship                    10.65833   14.85264   0.718
## as.factor(TypeofUnit)Drilling tender               -7.64189   24.30405  -0.314
## as.factor(TypeofUnit)FPSO/FSU                      -2.11210   14.40995  -0.147
## as.factor(TypeofUnit)Helicopter-Offshore duty      23.47213   13.97331   1.680
## as.factor(TypeofUnit)Jacket                        -2.35269   14.15019  -0.166
## as.factor(TypeofUnit)Jackup                        -1.46145   14.24890  -0.103
## as.factor(TypeofUnit)Loading buoy                   0.66558   18.88205   0.035
## as.factor(TypeofUnit)Mobile unit(not drill.)        2.02706   23.00098   0.088
## as.factor(TypeofUnit)Other/Unkn. fixed struct      -3.43126   18.21441  -0.188
## as.factor(TypeofUnit)Pipeline                       0.21942   16.44719   0.013
## as.factor(TypeofUnit)Semi-submersible              -3.40123   14.09529  -0.241
## as.factor(TypeofUnit)Submersible                   -5.54808   20.95355  -0.265
## as.factor(TypeofUnit)Subsea install./complet.      -2.53056   17.32781  -0.146
## as.factor(TypeofUnit)Tension leg platform          -3.82177   14.47337  -0.264
## as.factor(TypeofUnit)Well support structure        -8.98273   14.53719  -0.618
## as.factor(Function)Compression                      0.58574   13.63727   0.043
## as.factor(Function)Crane                            8.80605   12.09450   0.728
## as.factor(Function)Drilling                         4.37687    8.67607   0.504
## as.factor(Function)Drilling&Production              4.47716    8.83514   0.507
## as.factor(Function)Other                            0.88384   12.35067   0.072
## as.factor(Function)Pipelaying                       9.96140   12.70990   0.784
## as.factor(Function)Production                       1.62668    8.79773   0.185
## as.factor(Function)Pumping                         -1.37695   12.35999  -0.111
## as.factor(Function)Riser                            1.45585   11.83548   0.123
## as.factor(Function)Service                          3.51326   14.03002   0.250
## as.factor(Function)Storage                         -1.25837   11.54282  -0.109
## as.factor(Function)Transfer Air                          NA         NA      NA
## as.factor(Function)Transfer Gas                    -0.93275   15.47544  -0.060
## as.factor(Function)Transfer Hydrocarbon            -0.24375   12.45294  -0.020
## as.factor(Function)Transfer Oil                     0.15931   13.40076   0.012
## as.factor(Function)Transfer Unknown                -0.53913   21.04270  -0.026
## as.factor(Function)Water/gas injection              1.08467   11.75872   0.092
## as.factor(Function)Work Assistance                  6.33756   11.52225   0.550
## as.factor(MainEvent)Blowout                         5.56084    6.87076   0.809
## as.factor(MainEvent)Breakage or fatigue            -1.54916    4.66187  -0.332
## as.factor(MainEvent)Capsizing,overturn,toppling     8.30822    5.00605   1.660
## as.factor(MainEvent)Collision,not offshore units   -0.25953    5.22769  -0.050
## as.factor(MainEvent)Collision,offshore units       -0.01284    4.81403  -0.003
## as.factor(MainEvent)Crane accident                  0.36334    5.54991   0.065
## as.factor(MainEvent)Explosion                      13.95261    6.93682   2.011
## as.factor(MainEvent)Falling load / Dropped object   1.88661    4.23634   0.445
## as.factor(MainEvent)Fire                            5.36862    4.42509   1.213
## as.factor(MainEvent)Grounding                      -5.90610    7.38416  -0.800
## as.factor(MainEvent)Helicopter accident            23.16198    7.11864   3.254
## as.factor(MainEvent)Leakage into hull              -3.04666    8.45215  -0.360
## as.factor(MainEvent)List, uncontrolled inclination  4.31948    6.30750   0.685
## as.factor(MainEvent)Loss of buoyancy or sinking    21.03130    5.82311   3.612
## as.factor(MainEvent)Machinery/propulsion failure   -6.67331   11.97976  -0.557
## as.factor(MainEvent)Other                          -0.93044    4.75707  -0.196
## as.factor(MainEvent)Out of position, adrift        -0.93533    6.20395  -0.151
## as.factor(MainEvent)Release of fluid or gas         0.81225    4.32582   0.188
## as.factor(MainEvent)Towline failure/rupture        -4.41297    8.26597  -0.534
## as.factor(MainEvent)Well problem, no blowout        0.73468    4.79798   0.153
## as.factor(MainOperation)Accommodation              -0.23340   14.03767  -0.017
## as.factor(MainOperation)Completion/drill.aban       0.03445   11.49155   0.003
## as.factor(MainOperation)Construct. work unit       -9.12865   12.95619  -0.705
## as.factor(MainOperation)Demobilizing                2.38256   14.59081   0.163
## as.factor(MainOperation)Development Drilling       -2.87910   10.57588  -0.272
## as.factor(MainOperation)Drilling,unknwn phase      -1.71643   10.77638  -0.159
## as.factor(MainOperation)Exploration drilling       -0.92158   10.74718  -0.086
## as.factor(MainOperation)Idle                       -3.29113   12.93445  -0.254
## as.factor(MainOperation)Injection                  -0.32583   15.46495  -0.021
## as.factor(MainOperation)Loading of liquids          2.20766   18.34930   0.120
## as.factor(MainOperation)Mobilizing                 -3.21840   11.46181  -0.281
## as.factor(MainOperation)Other                      -0.95114   13.05258  -0.073
## as.factor(MainOperation)Production                 -0.70108   10.45675  -0.067
## as.factor(MainOperation)Repair work/under rep       3.22599   11.86818   0.272
## as.factor(MainOperation)Scrapped                   -5.37448   18.65075  -0.288
## as.factor(MainOperation)Service                    -7.09732   17.15889  -0.414
## as.factor(MainOperation)Stacked                     3.73964   13.33601   0.280
## as.factor(MainOperation)Standby                    -8.27994   20.46541  -0.405
## as.factor(MainOperation)Testing                    -3.69611   14.06333  -0.263
## as.factor(MainOperation)Transfer, dry              -6.21617   15.23668  -0.408
## as.factor(MainOperation)Transfer, wet              -2.05814   11.30293  -0.182
## as.factor(MainOperation)Under construction          5.27337   11.28647   0.467
## as.factor(MainOperation)Well workover              -1.83356   10.77367  -0.170
##                                                    Pr(>|t|)    
## WindSpeed_m_s                                      0.001956 ** 
## as.factor(TypeofUnit)Artificial Island             0.930858    
## as.factor(TypeofUnit)Barge (not drilling)          0.663652    
## as.factor(TypeofUnit)Concrete structure            0.793533    
## as.factor(TypeofUnit)Drill barge                   0.811901    
## as.factor(TypeofUnit)Drill ship                    0.473039    
## as.factor(TypeofUnit)Drilling tender               0.753211    
## as.factor(TypeofUnit)FPSO/FSU                      0.883476    
## as.factor(TypeofUnit)Helicopter-Offshore duty      0.093071 .  
## as.factor(TypeofUnit)Jacket                        0.867956    
## as.factor(TypeofUnit)Jackup                        0.918313    
## as.factor(TypeofUnit)Loading buoy                  0.971882    
## as.factor(TypeofUnit)Mobile unit(not drill.)       0.929778    
## as.factor(TypeofUnit)Other/Unkn. fixed struct      0.850586    
## as.factor(TypeofUnit)Pipeline                      0.989356    
## as.factor(TypeofUnit)Semi-submersible              0.809332    
## as.factor(TypeofUnit)Submersible                   0.791191    
## as.factor(TypeofUnit)Subsea install./complet.      0.883896    
## as.factor(TypeofUnit)Tension leg platform          0.791750    
## as.factor(TypeofUnit)Well support structure        0.536665    
## as.factor(Function)Compression                     0.965742    
## as.factor(Function)Crane                           0.466590    
## as.factor(Function)Drilling                        0.613952    
## as.factor(Function)Drilling&Production             0.612360    
## as.factor(Function)Other                           0.942954    
## as.factor(Function)Pipelaying                      0.433229    
## as.factor(Function)Production                      0.853318    
## as.factor(Function)Pumping                         0.911301    
## as.factor(Function)Riser                           0.902107    
## as.factor(Function)Service                         0.802282    
## as.factor(Function)Storage                         0.913194    
## as.factor(Function)Transfer Air                          NA    
## as.factor(Function)Transfer Gas                    0.951941    
## as.factor(Function)Transfer Hydrocarbon            0.984384    
## as.factor(Function)Transfer Oil                    0.990515    
## as.factor(Function)Transfer Unknown                0.979561    
## as.factor(Function)Water/gas injection             0.926509    
## as.factor(Function)Work Assistance                 0.582328    
## as.factor(MainEvent)Blowout                        0.418359    
## as.factor(MainEvent)Breakage or fatigue            0.739675    
## as.factor(MainEvent)Capsizing,overturn,toppling    0.097060 .  
## as.factor(MainEvent)Collision,not offshore units   0.960408    
## as.factor(MainEvent)Collision,offshore units       0.997872    
## as.factor(MainEvent)Crane accident                 0.947805    
## as.factor(MainEvent)Explosion                      0.044346 *  
## as.factor(MainEvent)Falling load / Dropped object  0.656097    
## as.factor(MainEvent)Fire                           0.225111    
## as.factor(MainEvent)Grounding                      0.423851    
## as.factor(MainEvent)Helicopter accident            0.001148 ** 
## as.factor(MainEvent)Leakage into hull              0.718521    
## as.factor(MainEvent)List, uncontrolled inclination 0.493497    
## as.factor(MainEvent)Loss of buoyancy or sinking    0.000308 ***
## as.factor(MainEvent)Machinery/propulsion failure   0.577523    
## as.factor(MainEvent)Other                          0.844939    
## as.factor(MainEvent)Out of position, adrift        0.880169    
## as.factor(MainEvent)Release of fluid or gas        0.851067    
## as.factor(MainEvent)Towline failure/rupture        0.593457    
## as.factor(MainEvent)Well problem, no blowout       0.878308    
## as.factor(MainOperation)Accommodation              0.986735    
## as.factor(MainOperation)Completion/drill.aban      0.997608    
## as.factor(MainOperation)Construct. work unit       0.481110    
## as.factor(MainOperation)Demobilizing               0.870296    
## as.factor(MainOperation)Development Drilling       0.785456    
## as.factor(MainOperation)Drilling,unknwn phase      0.873458    
## as.factor(MainOperation)Exploration drilling       0.931668    
## as.factor(MainOperation)Idle                       0.799162    
## as.factor(MainOperation)Injection                  0.983192    
## as.factor(MainOperation)Loading of liquids         0.904241    
## as.factor(MainOperation)Mobilizing                 0.778882    
## as.factor(MainOperation)Other                      0.941913    
## as.factor(MainOperation)Production                 0.946548    
## as.factor(MainOperation)Repair work/under rep      0.785775    
## as.factor(MainOperation)Scrapped                   0.773235    
## as.factor(MainOperation)Service                    0.679170    
## as.factor(MainOperation)Stacked                    0.779171    
## as.factor(MainOperation)Standby                    0.685805    
## as.factor(MainOperation)Testing                    0.792703    
## as.factor(MainOperation)Transfer, dry              0.683313    
## as.factor(MainOperation)Transfer, wet              0.855521    
## as.factor(MainOperation)Under construction         0.640359    
## as.factor(MainOperation)Well workover              0.864869    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 34.38 on 4354 degrees of freedom
##   (667 observations deleted due to missingness)
## Multiple R-squared:  0.05314,    Adjusted R-squared:  0.03575 
## F-statistic: 3.055 on 80 and 4354 DF,  p-value: < 2.2e-16
plot(fit2)

7.2.4 Predicting human cost from weather-condition variables

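Only wind speed, measured in metres per second, is statistically significant in this model. As the diagnostic plots show, the model does not fit the data well. Two features of the output below call for caution: the model is fitted without an intercept (the - 1 term in the formula), so the reported R-squared is computed about zero rather than about the mean of HumanCost, and only 38 observations remain (35 residual degrees of freedom plus 3 coefficients) after the 552 rows with missing values are dropped.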

fit1N <- lm(formula = HumanCost ~ WindSpeed_m_s + WaterDepth_m + DrillDepth_km -1, data = no_outliers)

summary(fit1N)
## 
## Call:
## lm(formula = HumanCost ~ WindSpeed_m_s + WaterDepth_m + DrillDepth_km - 
##     1, data = no_outliers)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -235.816    3.374   10.172   22.734  122.006 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## WindSpeed_m_s 18.376526   1.234126  14.890   <2e-16 ***
## WaterDepth_m  -0.002454   0.083713  -0.029    0.977    
## DrillDepth_km  4.193332   5.487789   0.764    0.450    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 55.71 on 35 degrees of freedom
##   (552 observations deleted due to missingness)
## Multiple R-squared:  0.8762, Adjusted R-squared:  0.8656 
## F-statistic: 82.59 on 3 and 35 DF,  p-value: 5.984e-16
plot(fit1N)
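Because the model above is fitted without an intercept, its R-squared is not directly comparable to that of a model with one. The sketch below illustrates this on synthetic data only (the variables wind and cost are hypothetical stand-ins, not WOAD values): the response is generated independently of the predictor, yet the no-intercept fit still reports a high R-squared.

```r
# Synthetic stand-in data (hypothetical; NOT taken from WOAD).
set.seed(1)
n    <- 38                          # same sample size as the fit above
wind <- runif(n, min = 5, max = 30) # pretend wind speed in m/s
cost <- 50 + rnorm(n, sd = 10)      # response unrelated to wind, mean far from zero

# Same structure as fit1N: regression through the origin.
fit_no_int <- lm(cost ~ wind - 1)
# Conventional model with an intercept, for comparison.
fit_int    <- lm(cost ~ wind)

r2_no_int <- summary(fit_no_int)$r.squared
r2_int    <- summary(fit_int)$r.squared

# Without an intercept, R uses the uncentred total sum of squares, sum(y^2),
# instead of sum((y - mean(y))^2).
r2_manual <- 1 - sum(residuals(fit_no_int)^2) / sum(cost^2)

print(c(no_intercept = r2_no_int, with_intercept = r2_int, manual = r2_manual))
```

If the same comparison were run on the real data, refitting the model without the - 1 term would be one way to check whether the high R-squared reflects a real relationship with wind speed.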

7.3 Significant variables for which the wind factor increases human cost

as.factor(MainEvent)Blowout

as.factor(MainEvent)Capsizing,overturn,toppling

as.factor(MainEvent)Explosion

as.factor(MainEvent)Falling load / Dropped object

as.factor(MainEvent)Fire

as.factor(MainEvent)Helicopter accident

as.factor(MainEvent)Loss of buoyancy or sinking
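A list like the one above can also be extracted from a fitted model programmatically rather than read off the summary by hand. The following is a minimal sketch on synthetic data; the event labels, the effect sizes and the 0.05 threshold are assumptions made for illustration and are not the author's actual selection procedure.

```r
# Synthetic stand-in data (hypothetical; NOT taken from WOAD).
set.seed(42)
n <- 200
event <- factor(
  sample(c("Other", "Explosion", "Fire"), n, replace = TRUE),
  levels = c("Other", "Explosion", "Fire")  # "Other" is the reference level
)
# Only "Explosion" raises the simulated cost.
effect <- c(Other = 0, Explosion = 30, Fire = 0)
cost <- 10 + unname(effect[as.character(event)]) + rnorm(n, sd = 5)

fit <- lm(cost ~ event)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
ctab <- coef(summary(fit))

# Keep terms that are significant at the 5% level AND have a positive estimate.
keep    <- ctab[, "Pr(>|t|)"] < 0.05 & ctab[, "Estimate"] > 0
sig_pos <- setdiff(rownames(ctab)[keep], "(Intercept)")

print(sig_pos)
```

Filtering on the sign of the estimate as well as on the p-value matters here, since a significant term can just as well decrease the predicted cost.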

https://data.library.virginia.edu/diagnostic-plots/