library(tidyverse, warn.conflicts = F)
package ‘tidyverse’ was built under R version 3.3.2Loading tidyverse: ggplot2
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Loading tidyverse: dplyr
package ‘ggplot2’ was built under R version 3.3.2package ‘tibble’ was built under R version 3.3.2package ‘tidyr’ was built under R version 3.3.2Conflicts with tidy packages ------------------------------------------------------
filter(): dplyr, stats
lag(): dplyr, stats
library(rvest)
Carregando pacotes exigidos: xml2
Attaching package: ‘rvest’
The following object is masked from ‘package:readr’:
guess_encoding
library(plotly)
package ‘plotly’ was built under R version 3.3.2
Attaching package: ‘plotly’
The following object is masked from ‘package:ggplot2’:
last_plot
The following object is masked from ‘package:stats’:
filter
The following object is masked from ‘package:graphics’:
layout
library(cluster)
library(ggdendro)
theme_set(theme_light())
source("plota_solucoes_hclust.R")
Usaremos dados do Rotten Tomatoes sobre os filmes de Scarlett Johansson.
from_page <- read_html("https://www.rottentomatoes.com/celebrity/scarlett_johansson/") %>%
html_node("#filmographyTbl") %>% # A sintaxe da expressão é de um seletor à lá JQuery
html_table(fill=TRUE) %>% # Faz parse
as.tibble()
filmes = from_page %>%
filter(RATING != "No Score Yet",
`BOX OFFICE` != "—",
CREDIT != "Executive Producer") %>%
mutate(RATING = as.numeric(gsub("%", "", RATING)),
`BOX OFFICE` = as.numeric(gsub("[$|M]", "", `BOX OFFICE`))) %>%
filter(`BOX OFFICE` >= 1) # Tem dois filmes que não parecem ter sido lançados no mundo todo
NAs introduzidos por coerção
A intuição
Uma forma de descrever estrutura nos dados é percebendo grupos de observações mais semelhantes entre si que com o restante dos dados. Agrupamentos.
Por exemplo, observando as avaliações dos filmes:
filmes %>%
ggplot(aes(x = "Filmes", y = RATING)) +
geom_jitter(width = .01, height = 0, size = 2, alpha = .6)

filmes %>%
ggplot(aes(x = RATING)) +
geom_histogram(bins = 16) +
geom_rug()

Já considerando o ano de lançamento, não há uma estrutura clara de grupos.
filmes %>%
ggplot(aes(x = "Filmes", y = YEAR)) +
geom_jitter(width = .02, height = 0, size = 2, alpha = .6) +
scale_y_log10()

filmes %>%
ggplot(aes(x = `YEAR`)) +
geom_histogram(bins = 17) +
geom_rug()

tibble(a = c(rnorm(100, mean = 10, sd = 10),
rnorm(40, mean = 80, sd = 15))) %>%
ggplot(aes(x = "teste", y = a)) +
geom_jitter(width = .02, height = 0, size = 2, alpha = .6)

Para a renda do filme, observar uma escala linear ou logarÃtmica levam a conclusões diferentes.
filmes %>%
ggplot(aes(x = "Filmes", y = `BOX OFFICE`)) +
geom_jitter(width = .02, height = 0, size = 2, alpha = .6)

filmes %>%
ggplot(aes(x = "Filmes", y = `BOX OFFICE`)) +
geom_jitter(width = .02, height = 0, size = 2, alpha = .6) +
scale_y_log10()

filmes %>%
ggplot(aes(x = `BOX OFFICE`)) +
geom_histogram(bins = 20) +
geom_rug()

filmes %>%
ggplot(aes(x = `BOX OFFICE`)) +
geom_histogram(bins = 20) +
scale_x_log10() +
geom_rug()

Agrupamento
Precisamos de:
- Definição de proximidade/distância entre pontos
- Definição de proximidade/distância entre grupos ou grupos e pontos
- Processo de agrupamento
- Decidir quantos grupos existem
Para produzir uma solução de agrupamento. Depois vem o principal: avaliar e interpretar a solução. Agrupamento sempre dá um resultado. Nem sempre é útil.
Com uma dimensão
Há duas maneiras principais de agrupar: aglomerativa ou baseada em partição. Vamos explorar primeiro a hierárquica aglomerativa.
distancias.long = filmes %>%
select(RATING) %>%
dist(method = "euclidean") %>%
as.matrix %>%
reshape2::melt(varnames = c("row", "col"))
distancias.long %>%
ggplot(aes(x = row, y = col, fill = value)) +
geom_tile()

# distancias = filmes %>%
# select(RATING) %>%
# dist(method = "euclidean") %>%
# as.matrix %>%
# heatmap()
row.names(filmes) = NULL
agrupamento_h = filmes %>%
column_to_rownames("TITLE") %>% # hclust precisa dos rótulos em nomes de linha (ruim)
select(RATING) %>%
dist(method = "euclidean") %>%
hclust(method = "complete")
Setting row names on a tibble is deprecated.
ggdendrogram(agrupamento_h, rotate = T, size = 2)

ggdendrogram(agrupamento_h, rotate = T, size = 2) +
geom_hline(yintercept = 45, colour = "red")

Cada junção é um passo do algoritmo. A altura na dendrograma em cada passo significa a dissimilaridade entre os pontos ou grupos juntados naquele passo.
Na medida que vamos aglomerando, as dissimilaridades nas junções tendem a ir aumentando caso haja estrutura de grupos.
data.frame(k = 1:NROW(agrupamento_h$height),
height = agrupamento_h$height) %>%
ggplot(aes(x = k, y = height)) +
geom_line(colour = "grey") +
geom_point() +
labs(x = "Junções feitas (34 - clusters)", y = "Dissimilaridade na junção")

Vejamos as soluções com diferentes números de grupos.
solucoes = tibble(k = 1:9)
atribuicoes = solucoes %>%
group_by(k) %>%
do(cbind(filmes,
grupo = as.character(cutree(agrupamento_h, .$k))))
Unequal factor levels: coercing to character
atribuicoes %>%
ggplot(aes(x = "Filmes", y = RATING, colour = grupo)) +
geom_jitter(width = .02, height = 0, size = 2, alpha = .6) +
facet_wrap(~ paste(k, " grupos"))

stats::heatmap() é uma função que visualiza distâncias entre pontos organizando linhas e colunas via hclust:
filmes %>%
select(RATING) %>%
dist(method = "euclidean") %>%
as.matrix %>%
heatmap()

Variando o método de linkage
plota_hclusts_1d(filmes, "RATING",
linkage_method = "centroid", # single, complete, average, centroid, median, ...
ks = 1:6)
Setting row names on a tibble is deprecated.Unequal factor levels: coercing to character

names(iris)
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
Silhouetas
Dada a distância média de um ponto para os demais do seu cluster \(a(i)\) e a distância média do ponto para todos os demais do cluster mais próximo \(b(i)\), a largura da silhoueta de \(i\) é :
\[
s(i) := ( b(i) - a(i) ) / max( a(i), b(i) )
\]
Repare como 1 significa uma boa atribuição para \(i\), 0 significa indefinição e \(-1\) significa que há outro cluster onde \(i\) estaria melhor alocado.
distancias = filmes %>%
select(RATING) %>%
dist(method = "euclidean")
agrupamento_hs = filmes %>%
column_to_rownames("TITLE") %>%
select(RATING) %>%
dist(method = "euclidean") %>%
hclust(method = "complete")
Setting row names on a tibble is deprecated.
plot(silhouette(cutree(agrupamento_hs, k = 4), distancias))

plot(silhouette(cutree(agrupamento_hs, k = 2), distancias))

Duas dimensões
p = filmes %>%
ggplot(aes(x = RATING, y = `BOX OFFICE`, label = TITLE)) +
geom_point()
p

#ggplotly(p)
agrupamento_h_2d = filmes %>%
column_to_rownames("TITLE") %>%
select(RATING, `BOX OFFICE`) %>%
dist(method = "euclidean") %>%
hclust(method = "centroid")
Setting row names on a tibble is deprecated.
ggdendrogram(agrupamento_h_2d)

data.frame(k = NROW(agrupamento_h_2d$height):1,
height = agrupamento_h_2d$height) %>%
ggplot(aes(x = k, y = height)) +
geom_line(colour = "grey") +
geom_point() +
labs(x = "Número de clusters produzido", y = "Dissimilaridade na junção")

Como sempre, o algoritmo encontra grupos. No caso, parecem até bem separados. Vamos visualizá-los:
plota_hclusts_2d(agrupamento_h_2d,
filmes,
c("RATING", "`BOX OFFICE`"),
linkage_method = "centroid", ks = 1:6)
Unequal factor levels: coercing to character

O agrupamento está acontecendo todo em função de BOX OFFICE, apenas. Como as escalas são diferentes, BOX OFFICE domina qualquer cálculo de distância euclidiana.
Solução: standardize (aka scale).
agrupamento_h_2d = filmes %>%
column_to_rownames("TITLE") %>%
select(RATING, `BOX OFFICE`) %>%
#mutate(`BOX OFFICE` = log10(`BOX OFFICE`)) %>%
mutate_all(funs(scale)) %>%
dist(method = "euclidean") %>%
hclust(method = "ward.D")
Setting row names on a tibble is deprecated.
ggdendrogram(agrupamento_h_2d)

plota_hclusts_2d(agrupamento_h_2d,
filmes,
c("RATING", "`BOX OFFICE`"),
linkage_method = "ward.D", ks = 1:6) # + scale_y_log10()
Unequal factor levels: coercing to character

Mais variáveis
E se tivéssemos mais de duas variáveis?
filmes2 = agrupamento_h_md = filmes %>%
mutate(TITLE_LENGTH = nchar(TITLE))
dists = filmes2 %>%
column_to_rownames("TITLE") %>%
mutate(`BOX OFFICE` = log10(`BOX OFFICE`)) %>%
select(RATING, `BOX OFFICE`, TITLE_LENGTH, YEAR) %>%
mutate_all(funs(scale)) %>%
dist(method = "euclidean")
Setting row names on a tibble is deprecated.
agrupamento_h_md = dists %>%
hclust(method = "ward.D")
ggdendrogram(agrupamento_h_md, rotate = T)

cores = RColorBrewer::brewer.pal(4, "Set3")
plot(cluster::silhouette(cutree(agrupamento_h_md, k = 4), dists), col = cores, border = NA)

atribuicoes = tibble(k = 1:5) %>%
group_by(k) %>%
do(cbind(filmes2,
grupo = as.character(cutree(agrupamento_h_md, .$k))))
Unequal factor levels: coercing to character
atribuicoes_long = atribuicoes %>%
mutate(`BOX OFFICE` = scale(log10(`BOX OFFICE`)),
YEAR = scale(YEAR),
RATING = scale(RATING),
TITLE_LENGTH = scale(TITLE_LENGTH)) %>%
gather(key = "variavel", value = "valor", -TITLE, -k, -grupo, -CREDIT)
attributes are not identical across measure variables; they will be dropped
atribuicoes_long %>%
ggplot(aes(x = variavel, y = valor, group = grupo, colour = grupo)) +
geom_point(alpha = .4, position = position_dodge(width = .5)) +
facet_wrap(~ paste(k, " grupos")) +
labs(x = "", y = "z-score")

atribuicoes_long %>%
filter(k == 3) %>%
ggplot(aes(x = variavel,
y = valor,
colour = grupo)) +
geom_boxplot() +
geom_point(alpha = .4, position = position_jitter(width = .1)) +
facet_wrap(~ grupo) +
labs(x = "", y = "z-score")

atribuicoes_long %>%
filter(k == 4) %>%
ggplot(aes(x = variavel, y = valor, group = TITLE, colour = grupo)) +
geom_point(alpha = .3, size = .5) +
geom_line(alpha = .7) +
facet_wrap(~ paste("Grupo ", grupo)) +
labs(x = "", y = "z-score")

LS0tCnRpdGxlOiAiVGlwb3MgZGUgZmlsbWVzIgphdXRob3I6ICJOYXphcmVubyIKZGF0ZTogIjE2LzA0LzIwMTciCm91dHB1dDogCiAgICAKICAgIGh0bWxfbm90ZWJvb2s6CiAgICAgICAgdGhlbWU6IGNlcnVsZWFuCmVkaXRvcl9vcHRpb25zOiAKICBjaHVua19vdXRwdXRfdHlwZTogaW5saW5lCi0tLQoKYGBge3Igc2V0dXAsIGluY2x1ZGU9RkFMU0V9CmtuaXRyOjpvcHRzX2NodW5rJHNldChlY2hvID0gVFJVRSkKYGBgCgoKYGBge3IgbG9hZF9jb2RlfQpsaWJyYXJ5KHRpZHl2ZXJzZSwgd2Fybi5jb25mbGljdHMgPSBGKQpsaWJyYXJ5KHJ2ZXN0KQpsaWJyYXJ5KHBsb3RseSkKbGlicmFyeShjbHVzdGVyKQpsaWJyYXJ5KGdnZGVuZHJvKQp0aGVtZV9zZXQodGhlbWVfbGlnaHQoKSkKc291cmNlKCJwbG90YV9zb2x1Y29lc19oY2x1c3QuUiIpCmBgYAoKVXNhcmVtb3MgZGFkb3MgZG8gW1JvdHRlbiBUb21hdG9lc10oaHR0cHM6Ly93d3cucm90dGVudG9tYXRvZXMuY29tKSBzb2JyZSBvcyBmaWxtZXMgZGUgU2NhcmxldHQgSm9oYW5zc29uLiAKCmBgYHtyfQpmcm9tX3BhZ2UgPC0gcmVhZF9odG1sKCJodHRwczovL3d3dy5yb3R0ZW50b21hdG9lcy5jb20vY2VsZWJyaXR5L3NjYXJsZXR0X2pvaGFuc3Nvbi8iKSAlPiUgCiAgICBodG1sX25vZGUoIiNmaWxtb2dyYXBoeVRibCIpICU+JSAjIEEgc2ludGF4ZSBkYSBleHByZXNzw6NvIMOpIGRlIHVtIHNlbGV0b3Igw6AgbMOhIEpRdWVyeQogICAgaHRtbF90YWJsZShmaWxsPVRSVUUpICU+JSAjIEZheiBwYXJzZQogICAgYXMudGliYmxlKCkKCmZpbG1lcyA9IGZyb21fcGFnZSAlPiUgCiAgICBmaWx0ZXIoUkFUSU5HICE9ICJObyBTY29yZSBZZXQiLCAKICAgICAgICAgICBgQk9YIE9GRklDRWAgIT0gIuKAlCIsIAogICAgICAgICAgIENSRURJVCAhPSAiRXhlY3V0aXZlIFByb2R1Y2VyIikgJT4lCiAgICBtdXRhdGUoUkFUSU5HID0gYXMubnVtZXJpYyhnc3ViKCIlIiwgIiIsIFJBVElORykpLCAKICAgICAgICAgICBgQk9YIE9GRklDRWAgPSBhcy5udW1lcmljKGdzdWIoIlskfE1dIiwgIiIsIGBCT1ggT0ZGSUNFYCkpKSAlPiUgCiAgICBmaWx0ZXIoYEJPWCBPRkZJQ0VgID49IDEpICMgVGVtIGRvaXMgZmlsbWVzIHF1ZSBuw6NvIHBhcmVjZW0gdGVyIHNpZG8gbGFuw6dhZG9zIG5vIG11bmRvIHRvZG8KYGBgCgojIyBBIGludHVpw6fDo28KClVtYSBmb3JtYSBkZSBkZXNjcmV2ZXIgZXN0cnV0dXJhIG5vcyBkYWRvcyDDqSBwZXJjZWJlbmRvIGdydXBvcyBkZSBvYnNlcnZhw6fDtWVzIG1haXMgc2VtZWxoYW50ZXMgZW50cmUgc2kgcXVlIGNvbSBvIHJlc3RhbnRlIGRvcyBkYWRvcy4gQWdydXBhbWVudG9zLiAKClBvciBleGVtcGxvLCBvYnNlcnZhbmRvIGFzIGF2YWxpYcOnw7VlcyBkb3MgZmlsbWVzOgoKYGBge3J9CmZpbG1lcyAlPiUgCiAgICBnZ3Bsb3QoYWVzKHggPSAiRmlsbWVzIiwgeSA9IFJBVElORykpICsgCiAgICBnZW9tX2ppdHRlcih3aWR0aCA9IC4wMSwgaGVpZ2h0ID0gMCwgc2l6ZSA9IDIsIGFscGhhID0gLjYpCgpmaWxtZXMgJT4lIAogICAgZ2dwbG90KGFlcyh4ID0gUkFUSU5HKSkgKyAKICAgIGdlb21faGlzdG9ncmFtKGJpbnMgPSAxNikgKyAKICAgIGdlb21fcnVnKCkKYGBgCgpKw6EgY29uc2lkZXJhbmRvIG8gYW5vIGRlIGxhbsOnYW1lbnRvLCBuw6NvIGjDoSB1bWEgZXN0cnV0dXJhIGNsYXJhIGRlIGdydXBvcy4gCgpgYGB7cn0KZmlsbWVzICU+JSAKICAgIGdncGxvdChhZXMoeCA9ICJGaWxtZXMiLCB5ID0gWUVBUikpICsgCiAgICBnZW9tX2ppdHRlcih3aWR0aCA9IC4wMiwgaGVpZ2h0ID0gMCwgc2l6ZSA9IDIsIGFscGhhID0gLjYpICsgCiAgICBzY2FsZV95X2xvZzEwKCkKCmZpbG1lcyAlPiUgCiAgICBnZ3Bsb3QoYWVzKHggPSBgWUVBUmApKSArIAogICAgZ2VvbV9oaXN0b2dyYW0oYmlucyA9IDE3KSArIAogICAgZ2VvbV9ydWcoKQpgYGAKCmBgYHtyfQp0aWJibGUoYSA9IGMocm5vcm0oMTAwLCBtZWFuID0gMTAsIHNkID0gMTApLCAKICAgICAgICAgICAgIHJub3JtKDQwLCBtZWFuID0gODAsIHNkID0gMTUpKSkgJT4lIAogICAgZ2dwbG90KGFlcyh4ID0gInRlc3RlIiwgeSA9IGEpKSArIAogICAgZ2VvbV9qaXR0ZXIod2lkdGggPSAuMDIsIGhlaWdodCA9IDAsIHNpemUgPSAyLCBhbHBoYSA9IC42KSAKCmBgYAoKCgpQYXJhIGEgcmVuZGEgZG8gZmlsbWUsIG9ic2VydmFyIHVtYSBlc2NhbGEgbGluZWFyIG91IGxvZ2Fyw610bWljYSBsZXZhbSBhIGNvbmNsdXPDtWVzIGRpZmVyZW50ZXMuIAoKYGBge3J9CmZpbG1lcyAlPiUgCiAgICBnZ3Bsb3QoYWVzKHggPSAiRmlsbWVzIiwgeSA9IGBCT1ggT0ZGSUNFYCkpICsgCiAgICBnZW9tX2ppdHRlcih3aWR0aCA9IC4wMiwgaGVpZ2h0ID0gMCwgc2l6ZSA9IDIsIGFscGhhID0gLjYpICAKICAgIApmaWxtZXMgJT4lIAogICAgZ2dwbG90KGFlcyh4ID0gIkZpbG1lcyIsIHkgPSBgQk9YIE9GRklDRWApKSArIAogICAgZ2VvbV9qaXR0ZXIod2lkdGggPSAuMDIsIGhlaWdodCA9IDAsIHNpemUgPSAyLCBhbHBoYSA9IC42KSArIAogICAgc2NhbGVfeV9sb2cxMCgpCgpmaWxtZXMgJT4lIAogICAgZ2dwbG90KGFlcyh4ID0gYEJPWCBPRkZJQ0VgKSkgKyAKICAgIGdlb21faGlzdG9ncmFtKGJpbnMgPSAyMCkgKyAKICAgIGdlb21fcnVnKCkKCmZpbG1lcyAlPiUgCiAgICBnZ3Bsb3QoYWVzKHggPSBgQk9YIE9GRklDRWApKSArIAogICAgZ2VvbV9oaXN0b2dyYW0oYmlucyA9IDIwKSArIAogICAgc2NhbGVfeF9sb2cxMCgpICsgCiAgICBnZW9tX3J1ZygpCgpgYGAKCiMjIEFncnVwYW1lbnRvIAoKUHJlY2lzYW1vcyBkZTogCgoqIERlZmluacOnw6NvIGRlIHByb3hpbWlkYWRlL2Rpc3TDom5jaWEgZW50cmUgcG9udG9zCiogRGVmaW5pw6fDo28gZGUgcHJveGltaWRhZGUvZGlzdMOibmNpYSBlbnRyZSBncnVwb3Mgb3UgZ3J1cG9zIGUgcG9udG9zCiogUHJvY2Vzc28gZGUgYWdydXBhbWVudG8gCiogRGVjaWRpciBxdWFudG9zIGdydXBvcyBleGlzdGVtCgpQYXJhIHByb2R1emlyIHVtYSBzb2x1w6fDo28gZGUgYWdydXBhbWVudG8uIERlcG9pcyB2ZW0gbyBwcmluY2lwYWw6ICoqYXZhbGlhciBlIGludGVycHJldGFyKiogYSBzb2x1w6fDo28uIF9BZ3J1cGFtZW50byBzZW1wcmUgZMOhIHVtIHJlc3VsdGFkby4gTmVtIHNlbXByZSDDqSDDunRpbF8uIAoKIyMgQ29tIHVtYSBkaW1lbnPDo28KCkjDoSBkdWFzIG1hbmVpcmFzIHByaW5jaXBhaXMgZGUgYWdydXBhcjogYWdsb21lcmF0aXZhIG91IGJhc2VhZGEgZW0gcGFydGnDp8Ojby4gVmFtb3MgZXhwbG9yYXIgcHJpbWVpcm8gYSAqKmhpZXLDoXJxdWljYSBhZ2xvbWVyYXRpdmEqKi4KCmBgYHtyfQpkaXN0YW5jaWFzLmxvbmcgPSBmaWxtZXMgJT4lIAogICAgc2VsZWN0KFJBVElORykgJT4lCiAgICBkaXN0KG1ldGhvZCA9ICJldWNsaWRlYW4iKSAlPiUgCiAgICBhcy5tYXRyaXggJT4lIAogICAgcmVzaGFwZTI6Om1lbHQodmFybmFtZXMgPSBjKCJyb3ciLCAiY29sIikpCgpkaXN0YW5jaWFzLmxvbmcgJT4lIAogICAgZ2dwbG90KGFlcyh4ID0gcm93LCB5ID0gY29sLCBmaWxsID0gdmFsdWUpKSArIAogICAgZ2VvbV90aWxlKCkKCiMgZGlzdGFuY2lhcyA9IGZpbG1lcyAlPiUgCiMgICAgIHNlbGVjdChSQVRJTkcpICU+JQojICAgICBkaXN0KG1ldGhvZCA9ICJldWNsaWRlYW4iKSAlPiUgCiMgICAgIGFzLm1hdHJpeCAlPiUgCiMgICAgIGhlYXRtYXAoKQpgYGAKCgoKYGBge3J9CnJvdy5uYW1lcyhmaWxtZXMpID0gTlVMTAphZ3J1cGFtZW50b19oID0gZmlsbWVzICU+JSAKICAgIGNvbHVtbl90b19yb3duYW1lcygiVElUTEUiKSAlPiUgIyBoY2x1c3QgcHJlY2lzYSBkb3MgcsOzdHVsb3MgZW0gbm9tZXMgZGUgbGluaGEgKHJ1aW0pCiAgICBzZWxlY3QoUkFUSU5HKSAlPiUKICAgIGRpc3QobWV0aG9kID0gImV1Y2xpZGVhbiIpICU+JSAKICAgIGhjbHVzdChtZXRob2QgPSAiY29tcGxldGUiKQoKZ2dkZW5kcm9ncmFtKGFncnVwYW1lbnRvX2gsIHJvdGF0ZSA9IFQsIHNpemUgPSAyKSAKCmdnZGVuZHJvZ3JhbShhZ3J1cGFtZW50b19oLCByb3RhdGUgPSBULCBzaXplID0gMikgKyAKICAgIGdlb21faGxpbmUoeWludGVyY2VwdCA9IDQ1LCBjb2xvdXIgPSAicmVkIikKYGBgCgpDYWRhIGp1bsOnw6NvIMOpIHVtIHBhc3NvIGRvIGFsZ29yaXRtby4gQSBhbHR1cmEgbmEgZGVuZHJvZ3JhbWEgZW0gY2FkYSBwYXNzbyBzaWduaWZpY2EgYSBkaXNzaW1pbGFyaWRhZGUgZW50cmUgb3MgcG9udG9zIG91IGdydXBvcyBqdW50YWRvcyBuYXF1ZWxlIHBhc3NvLiAKCk5hIG1lZGlkYSBxdWUgdmFtb3MgYWdsb21lcmFuZG8sIGFzIGRpc3NpbWlsYXJpZGFkZXMgbmFzIGp1bsOnw7VlcyB0ZW5kZW0gYSBpciBhdW1lbnRhbmRvIGNhc28gaGFqYSBlc3RydXR1cmEgZGUgZ3J1cG9zLgoKYGBge3J9CmRhdGEuZnJhbWUoayA9IDE6TlJPVyhhZ3J1cGFtZW50b19oJGhlaWdodCksIAogICAgICAgICAgIGhlaWdodCA9IGFncnVwYW1lbnRvX2gkaGVpZ2h0KSAlPiUgCiAgICBnZ3Bsb3QoYWVzKHggPSBrLCB5ID0gaGVpZ2h0KSkgKyAKICAgIGdlb21fbGluZShjb2xvdXIgPSAiZ3JleSIpICsgCiAgICBnZW9tX3BvaW50KCkgKyAKICAgIGxhYnMoeCA9ICJKdW7Dp8O1ZXMgZmVpdGFzICgzNCAtIGNsdXN0ZXJzKSIsIHkgPSAiRGlzc2ltaWxhcmlkYWRlIG5hIGp1bsOnw6NvIikKCmBgYAoKVmVqYW1vcyBhcyBzb2x1w6fDtWVzIGNvbSBkaWZlcmVudGVzIG7Dum1lcm9zIGRlIGdydXBvcy4KCmBgYHtyfQpzb2x1Y29lcyA9IHRpYmJsZShrID0gMTo5KQoKYXRyaWJ1aWNvZXMgPSBzb2x1Y29lcyAlPiUgCiAgICBncm91cF9ieShrKSAlPiUgCiAgICBkbyhjYmluZChmaWxtZXMsIAogICAgICAgICAgICAgZ3J1cG8gPSBhcy5jaGFyYWN0ZXIoY3V0cmVlKGFncnVwYW1lbnRvX2gsIC4kaykpKSkgCgphdHJpYnVpY29lcyAlPiUgCiAgICBnZ3Bsb3QoYWVzKHggPSAiRmlsbWVzIiwgeSA9IFJBVElORywgY29sb3VyID0gZ3J1cG8pKSArIAogICAgZ2VvbV9qaXR0ZXIod2lkdGggPSAuMDIsIGhlaWdodCA9IDAsIHNpemUgPSAyLCBhbHBoYSA9IC42KSArIAogICAgZmFjZXRfd3JhcCh+IHBhc3RlKGssICIgZ3J1cG9zIikpCgpgYGAKCmBzdGF0czo6aGVhdG1hcCgpYCDDqSB1bWEgZnVuw6fDo28gcXVlIHZpc3VhbGl6YSBkaXN0w6JuY2lhcyBlbnRyZSBwb250b3Mgb3JnYW5pemFuZG8gbGluaGFzIGUgY29sdW5hcyB2aWEgYGhjbHVzdGA6CgpgYGB7cn0KZmlsbWVzICU+JSAKICAgIHNlbGVjdChSQVRJTkcpICU+JQogICAgZGlzdChtZXRob2QgPSAiZXVjbGlkZWFuIikgJT4lCiAgICBhcy5tYXRyaXggJT4lCiAgICBoZWF0bWFwKCkKYGBgCgojIyMgVmFyaWFuZG8gbyBtw6l0b2RvIGRlIGxpbmthZ2UKCmBgYHtyfQpwbG90YV9oY2x1c3RzXzFkKGZpbG1lcywgIlJBVElORyIsIAogICAgICAgICAgICAgICAgIGxpbmthZ2VfbWV0aG9kID0gImNlbnRyb2lkIiwgIyBzaW5nbGUsIGNvbXBsZXRlLCBhdmVyYWdlLCBjZW50cm9pZCwgbWVkaWFuLCAuLi4KICAgICAgICAgICAgICAgICBrcyA9IDE6NikKbmFtZXMoaXJpcykKYGBgCgojIyMgQ29tIG91dHJhcyB2YXJpw6F2ZWlzCgpBZ3J1cGFtZW50byBzZW1wcmUgZMOhIHVtIHJlc3VsdGFkby4gTWVzbW8gcXVhbmRvIGVsZSBuw6NvIMOpIMO6dGlsOgoKYGBge3J9CnBsb3RhX2hjbHVzdHNfMWQoZmlsbWVzLCAiWUVBUiIsIGxpbmthZ2VfbWV0aG9kID0gImNlbnRyb2lkIiwga3MgPSAxOjYpCmBgYAoKQ29tcGFyZSBhcyBzb2x1w6fDtWVzIHVzYW5kbyBhIGVzY2FsYSBsaW5lYXIgZGEgdmFyacOhdmVsIGUgYSB0cmFuc2Zvcm1hZGEgZW0gbG9nOgoKYGBge3J9CnBsb3RhX2hjbHVzdHNfMWQoZmlsbWVzLCAiYEJPWCBPRkZJQ0VgIiwgbGlua2FnZV9tZXRob2QgPSAiY2VudHJvaWQiLCBrcyA9IDE6NikKCmZpbG1lcyAlPiUgbXV0YXRlKGBCT1ggT0ZGSUNFYCA9IGxvZyhgQk9YIE9GRklDRWApKSAlPiUgCiAgICBwbG90YV9oY2x1c3RzXzFkKCJgQk9YIE9GRklDRWAiLCBsaW5rYWdlX21ldGhvZCA9ICJjZW50cm9pZCIsIGtzID0gMTo2KSArIAogICAgc2NhbGVfeV9sb2cxMCgpCmBgYAoKIyMgU2lsaG91ZXRhcwoKRGFkYSBhIGRpc3TDom5jaWEgbcOpZGlhIGRlIHVtIHBvbnRvIHBhcmEgb3MgZGVtYWlzIGRvIHNldSBjbHVzdGVyICRhKGkpJCBlIGEgZGlzdMOibmNpYSBtw6lkaWEgZG8gcG9udG8gcGFyYSB0b2RvcyBvcyBkZW1haXMgZG8gY2x1c3RlciBtYWlzIHByw7N4aW1vICRiKGkpJCwgYSBsYXJndXJhIGRhIHNpbGhvdWV0YSBkZSAkaSQgw6kgOiAKClxbCnMoaSkgOj0gKCBiKGkpIC0gYShpKSApIC8gbWF4KCBhKGkpLCBiKGkpICkKXF0KClJlcGFyZSBjb21vIDEgc2lnbmlmaWNhIHVtYSBib2EgYXRyaWJ1acOnw6NvIHBhcmEgJGkkLCAwIHNpZ25pZmljYSBpbmRlZmluacOnw6NvIGUgJC0xJCBzaWduaWZpY2EgcXVlIGjDoSBvdXRybyBjbHVzdGVyIG9uZGUgJGkkIGVzdGFyaWEgbWVsaG9yIGFsb2NhZG8uCgpgYGB7cn0KZGlzdGFuY2lhcyA9IGZpbG1lcyAlPiUgCiAgICBzZWxlY3QoUkFUSU5HKSAlPiUKICAgIGRpc3QobWV0aG9kID0gImV1Y2xpZGVhbiIpCgphZ3J1cGFtZW50b19ocyA9IGZpbG1lcyAlPiUgCiAgICBjb2x1bW5fdG9fcm93bmFtZXMoIlRJVExFIikgJT4lCiAgICBzZWxlY3QoUkFUSU5HKSAlPiUKICAgIGRpc3QobWV0aG9kID0gImV1Y2xpZGVhbiIpICU+JSAKICAgIGhjbHVzdChtZXRob2QgPSAiY29tcGxldGUiKQoKCnBsb3Qoc2lsaG91ZXR0ZShjdXRyZWUoYWdydXBhbWVudG9faHMsIGsgPSA0KSwgZGlzdGFuY2lhcykpCnBsb3Qoc2lsaG91ZXR0ZShjdXRyZWUoYWdydXBhbWVudG9faHMsIGsgPSAyKSwgZGlzdGFuY2lhcykpCmBgYAoKIyMgRHVhcyBkaW1lbnPDtWVzCgpgYGB7cn0KcCA9IGZpbG1lcyAlPiUgCiAgICBnZ3Bsb3QoYWVzKHggPSBSQVRJTkcsIHkgPSBgQk9YIE9GRklDRWAsIGxhYmVsID0gVElUTEUpKSArIAogICAgZ2VvbV9wb2ludCgpIApwCiNnZ3Bsb3RseShwKQpgYGAKCgpgYGB7cn0KYWdydXBhbWVudG9faF8yZCA9IGZpbG1lcyAlPiUgCiAgICBjb2x1bW5fdG9fcm93bmFtZXMoIlRJVExFIikgJT4lCiAgICBzZWxlY3QoUkFUSU5HLCBgQk9YIE9GRklDRWApICU+JQogICAgZGlzdChtZXRob2QgPSAiZXVjbGlkZWFuIikgJT4lIAogICAgaGNsdXN0KG1ldGhvZCA9ICJjZW50cm9pZCIpCgpnZ2RlbmRyb2dyYW0oYWdydXBhbWVudG9faF8yZCkKCmRhdGEuZnJhbWUoayA9IE5ST1coYWdydXBhbWVudG9faF8yZCRoZWlnaHQpOjEsIAogICAgICAgICAgIGhlaWdodCA9IGFncnVwYW1lbnRvX2hfMmQkaGVpZ2h0KSAlPiUgCiAgICBnZ3Bsb3QoYWVzKHggPSBrLCB5ID0gaGVpZ2h0KSkgKyAKICAgIGdlb21fbGluZShjb2xvdXIgPSAiZ3JleSIpICsgCiAgICBnZW9tX3BvaW50KCkgKyAKICAgIGxhYnMoeCA9ICJOw7ptZXJvIGRlIGNsdXN0ZXJzIHByb2R1emlkbyIsIHkgPSAiRGlzc2ltaWxhcmlkYWRlIG5hIGp1bsOnw6NvIikKCmBgYAoKQ29tbyBzZW1wcmUsIG8gYWxnb3JpdG1vIGVuY29udHJhIGdydXBvcy4gTm8gY2FzbywgcGFyZWNlbSBhdMOpIGJlbSBzZXBhcmFkb3MuIFZhbW9zIHZpc3VhbGl6w6EtbG9zOgoKYGBge3J9CnBsb3RhX2hjbHVzdHNfMmQoYWdydXBhbWVudG9faF8yZCwgCiAgICAgICAgICAgICAgICAgZmlsbWVzLCAKICAgICAgICAgICAgICAgICBjKCJSQVRJTkciLCAiYEJPWCBPRkZJQ0VgIiksIAogICAgICAgICAgICAgICAgIGxpbmthZ2VfbWV0aG9kID0gImNlbnRyb2lkIiwga3MgPSAxOjYpCmBgYAoKTyBhZ3J1cGFtZW50byBlc3TDoSBhY29udGVjZW5kbyB0b2RvIGVtIGZ1bsOnw6NvIGRlIEJPWCBPRkZJQ0UsIGFwZW5hcy4gQ29tbyBhcyBlc2NhbGFzIHPDo28gZGlmZXJlbnRlcywgQk9YIE9GRklDRSBkb21pbmEgcXVhbHF1ZXIgY8OhbGN1bG8gZGUgZGlzdMOibmNpYSBldWNsaWRpYW5hLiAKClNvbHXDp8Ojbzogc3RhbmRhcmRpemUgKGFrYSBzY2FsZSkuCgpgYGB7cn0KYWdydXBhbWVudG9faF8yZCA9IGZpbG1lcyAlPiUgCiAgICBjb2x1bW5fdG9fcm93bmFtZXMoIlRJVExFIikgJT4lCiAgICBzZWxlY3QoUkFUSU5HLCBgQk9YIE9GRklDRWApICU+JQogICAgI211dGF0ZShgQk9YIE9GRklDRWAgPSBsb2cxMChgQk9YIE9GRklDRWApKSAlPiUgCiAgICBtdXRhdGVfYWxsKGZ1bnMoc2NhbGUpKSAlPiUgCiAgICBkaXN0KG1ldGhvZCA9ICJldWNsaWRlYW4iKSAlPiUgCiAgICBoY2x1c3QobWV0aG9kID0gIndhcmQuRCIpCgpnZ2RlbmRyb2dyYW0oYWdydXBhbWVudG9faF8yZCkKCnBsb3RhX2hjbHVzdHNfMmQoYWdydXBhbWVudG9faF8yZCwgCiAgICAgICAgICAgICAgICAgZmlsbWVzLCAKICAgICAgICAgICAgICAgICBjKCJSQVRJTkciLCAiYEJPWCBPRkZJQ0VgIiksIAogICAgICAgICAgICAgICAgIGxpbmthZ2VfbWV0aG9kID0gIndhcmQuRCIsIGtzID0gMTo2KSAjICsgc2NhbGVfeV9sb2cxMCgpCmBgYAoKCiMjIE1haXMgdmFyacOhdmVpcwoKRSBzZSB0aXbDqXNzZW1vcyBtYWlzIGRlIGR1YXMgdmFyacOhdmVpcz8KCgpgYGB7cn0KZmlsbWVzMiA9IGFncnVwYW1lbnRvX2hfbWQgPSBmaWxtZXMgJT4lIAogICAgbXV0YXRlKFRJVExFX0xFTkdUSCA9IG5jaGFyKFRJVExFKSkgCgpkaXN0cyA9IGZpbG1lczIgJT4lIAogICAgY29sdW1uX3RvX3Jvd25hbWVzKCJUSVRMRSIpICU+JQogICAgbXV0YXRlKGBCT1ggT0ZGSUNFYCA9IGxvZzEwKGBCT1ggT0ZGSUNFYCkpICU+JSAKICAgIHNlbGVjdChSQVRJTkcsIGBCT1ggT0ZGSUNFYCwgVElUTEVfTEVOR1RILCBZRUFSKSAlPiUKICAgIG11dGF0ZV9hbGwoZnVucyhzY2FsZSkpICU+JSAKICAgIGRpc3QobWV0aG9kID0gImV1Y2xpZGVhbiIpCgphZ3J1cGFtZW50b19oX21kID0gZGlzdHMgJT4lIAogICAgaGNsdXN0KG1ldGhvZCA9ICJ3YXJkLkQiKQoKZ2dkZW5kcm9ncmFtKGFncnVwYW1lbnRvX2hfbWQsIHJvdGF0ZSA9IFQpCmNvcmVzID0gUkNvbG9yQnJld2VyOjpicmV3ZXIucGFsKDQsICJTZXQzIikKcGxvdChjbHVzdGVyOjpzaWxob3VldHRlKGN1dHJlZShhZ3J1cGFtZW50b19oX21kLCBrID0gNCksIGRpc3RzKSwgY29sID0gY29yZXMsIGJvcmRlciA9IE5BKQpgYGAKCmBgYHtyfQphdHJpYnVpY29lcyA9IHRpYmJsZShrID0gMTo1KSAlPiUgCiAgICBncm91cF9ieShrKSAlPiUgCiAgICBkbyhjYmluZChmaWxtZXMyLCAKICAgICAgICAgICAgIGdydXBvID0gYXMuY2hhcmFjdGVyKGN1dHJlZShhZ3J1cGFtZW50b19oX21kLCAuJGspKSkpIAoKYXRyaWJ1aWNvZXNfbG9uZyA9IGF0cmlidWljb2VzICU+JSAKICAgIG11dGF0ZShgQk9YIE9GRklDRWAgPSBzY2FsZShsb2cxMChgQk9YIE9GRklDRWApKSwgCiAgICAgICAgICAgWUVBUiA9IHNjYWxlKFlFQVIpLCAKICAgICAgICAgICBSQVRJTkcgPSBzY2FsZShSQVRJTkcpLCAKICAgICAgICAgICBUSVRMRV9MRU5HVEggPSBzY2FsZShUSVRMRV9MRU5HVEgpKSAlPiUgCiAgICBnYXRoZXIoa2V5ID0gInZhcmlhdmVsIiwgdmFsdWUgPSAidmFsb3IiLCAtVElUTEUsIC1rLCAtZ3J1cG8sIC1DUkVESVQpIAoKYXRyaWJ1aWNvZXNfbG9uZyAlPiUgCiAgICBnZ3Bsb3QoYWVzKHggPSB2YXJpYXZlbCwgeSA9IHZhbG9yLCBncm91cCA9IGdydXBvLCBjb2xvdXIgPSBncnVwbykpICsgCiAgICBnZW9tX3BvaW50KGFscGhhID0gLjQsIHBvc2l0aW9uID0gcG9zaXRpb25fZG9kZ2Uod2lkdGggPSAuNSkpICsgCiAgICBmYWNldF93cmFwKH4gcGFzdGUoaywgIiBncnVwb3MiKSkgKyAKICAgIGxhYnMoeCA9ICIiLCB5ID0gInotc2NvcmUiKQoKYXRyaWJ1aWNvZXNfbG9uZyAlPiUgCiAgICBmaWx0ZXIoayA9PSAzKSAlPiUKICAgIGdncGxvdChhZXMoeCA9IHZhcmlhdmVsLCAKICAgICAgICAgICAgICAgeSA9IHZhbG9yLCAKICAgICAgICAgICAgICAgY29sb3VyID0gZ3J1cG8pKSArIAogICAgZ2VvbV9ib3hwbG90KCkgKyAKICAgIGdlb21fcG9pbnQoYWxwaGEgPSAuNCwgcG9zaXRpb24gPSBwb3NpdGlvbl9qaXR0ZXIod2lkdGggPSAuMSkpICsgCiAgICBmYWNldF93cmFwKH4gZ3J1cG8pICsgCiAgICBsYWJzKHggPSAiIiwgeSA9ICJ6LXNjb3JlIikKCmF0cmlidWljb2VzX2xvbmcgJT4lIAogICAgZmlsdGVyKGsgPT0gNCkgJT4lCiAgICBnZ3Bsb3QoYWVzKHggPSB2YXJpYXZlbCwgeSA9IHZhbG9yLCBncm91cCA9IFRJVExFLCBjb2xvdXIgPSBncnVwbykpICsgCiAgICBnZW9tX3BvaW50KGFscGhhID0gLjMsIHNpemUgPSAuNSkgKyAKICAgIGdlb21fbGluZShhbHBoYSA9IC43KSArIAogICAgZmFjZXRfd3JhcCh+IHBhc3RlKCJHcnVwbyAiLCBncnVwbykpICsgCiAgICBsYWJzKHggPSAiIiwgeSA9ICJ6LXNjb3JlIikKYGBgCgo=