library(tidyverse, warn.conflicts = F)
package ‘tidyverse’ was built under R version 3.3.2Loading tidyverse: ggplot2
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Loading tidyverse: dplyr
package ‘ggplot2’ was built under R version 3.3.2package ‘tibble’ was built under R version 3.3.2package ‘tidyr’ was built under R version 3.3.2Conflicts with tidy packages ------------------------------------------------------
filter(): dplyr, stats
lag():    dplyr, stats
library(rvest)
Carregando pacotes exigidos: xml2

Attaching package: ‘rvest’

The following object is masked from ‘package:readr’:

    guess_encoding
library(plotly)
package ‘plotly’ was built under R version 3.3.2
Attaching package: ‘plotly’

The following object is masked from ‘package:ggplot2’:

    last_plot

The following object is masked from ‘package:stats’:

    filter

The following object is masked from ‘package:graphics’:

    layout
library(cluster)
library(ggdendro)
theme_set(theme_light())
source("plota_solucoes_hclust.R")

Usaremos dados do Rotten Tomatoes sobre os filmes de Scarlett Johansson.

from_page <- read_html("https://www.rottentomatoes.com/celebrity/scarlett_johansson/") %>% 
    html_node("#filmographyTbl") %>% # A sintaxe da expressão é de um seletor à lá JQuery
    html_table(fill=TRUE) %>% # Faz parse
    as.tibble()
filmes = from_page %>% 
    filter(RATING != "No Score Yet", 
           `BOX OFFICE` != "—", 
           CREDIT != "Executive Producer") %>%
    mutate(RATING = as.numeric(gsub("%", "", RATING)), 
           `BOX OFFICE` = as.numeric(gsub("[$|M]", "", `BOX OFFICE`))) %>% 
    filter(`BOX OFFICE` >= 1) # Tem dois filmes que não parecem ter sido lançados no mundo todo
NAs introduzidos por coerção

A intuição

Uma forma de descrever estrutura nos dados é percebendo grupos de observações mais semelhantes entre si que com o restante dos dados. Agrupamentos.

Por exemplo, observando as avaliações dos filmes:

filmes %>% 
    ggplot(aes(x = "Filmes", y = RATING)) + 
    geom_jitter(width = .01, height = 0, size = 2, alpha = .6)

filmes %>% 
    ggplot(aes(x = RATING)) + 
    geom_histogram(bins = 16) + 
    geom_rug()

Já considerando o ano de lançamento, não há uma estrutura clara de grupos.

filmes %>% 
    ggplot(aes(x = "Filmes", y = YEAR)) + 
    geom_jitter(width = .02, height = 0, size = 2, alpha = .6) + 
    scale_y_log10()

filmes %>% 
    ggplot(aes(x = `YEAR`)) + 
    geom_histogram(bins = 17) + 
    geom_rug()

tibble(a = c(rnorm(100, mean = 10, sd = 10), 
             rnorm(40, mean = 80, sd = 15))) %>% 
    ggplot(aes(x = "teste", y = a)) + 
    geom_jitter(width = .02, height = 0, size = 2, alpha = .6) 

Para a renda do filme, observar uma escala linear ou logarítmica levam a conclusões diferentes.

filmes %>% 
    ggplot(aes(x = "Filmes", y = `BOX OFFICE`)) + 
    geom_jitter(width = .02, height = 0, size = 2, alpha = .6)  

    
filmes %>% 
    ggplot(aes(x = "Filmes", y = `BOX OFFICE`)) + 
    geom_jitter(width = .02, height = 0, size = 2, alpha = .6) + 
    scale_y_log10()

filmes %>% 
    ggplot(aes(x = `BOX OFFICE`)) + 
    geom_histogram(bins = 20) + 
    geom_rug()

filmes %>% 
    ggplot(aes(x = `BOX OFFICE`)) + 
    geom_histogram(bins = 20) + 
    scale_x_log10() + 
    geom_rug()

Agrupamento

Precisamos de:

Para produzir uma solução de agrupamento. Depois vem o principal: avaliar e interpretar a solução. Agrupamento sempre dá um resultado. Nem sempre é útil.

Com uma dimensão

Há duas maneiras principais de agrupar: aglomerativa ou baseada em partição. Vamos explorar primeiro a hierárquica aglomerativa.

distancias.long = filmes %>% 
    select(RATING) %>%
    dist(method = "euclidean") %>% 
    as.matrix %>% 
    reshape2::melt(varnames = c("row", "col"))
distancias.long %>% 
    ggplot(aes(x = row, y = col, fill = value)) + 
    geom_tile()

# distancias = filmes %>% 
#     select(RATING) %>%
#     dist(method = "euclidean") %>% 
#     as.matrix %>% 
#     heatmap()
row.names(filmes) = NULL
agrupamento_h = filmes %>% 
    column_to_rownames("TITLE") %>% # hclust precisa dos rótulos em nomes de linha (ruim)
    select(RATING) %>%
    dist(method = "euclidean") %>% 
    hclust(method = "complete")
Setting row names on a tibble is deprecated.
ggdendrogram(agrupamento_h, rotate = T, size = 2) 

ggdendrogram(agrupamento_h, rotate = T, size = 2) + 
    geom_hline(yintercept = 45, colour = "red")

Cada junção é um passo do algoritmo. A altura na dendrograma em cada passo significa a dissimilaridade entre os pontos ou grupos juntados naquele passo.

Na medida que vamos aglomerando, as dissimilaridades nas junções tendem a ir aumentando caso haja estrutura de grupos.

data.frame(k = 1:NROW(agrupamento_h$height), 
           height = agrupamento_h$height) %>% 
    ggplot(aes(x = k, y = height)) + 
    geom_line(colour = "grey") + 
    geom_point() + 
    labs(x = "Junções feitas (34 - clusters)", y = "Dissimilaridade na junção")

Vejamos as soluções com diferentes números de grupos.

solucoes = tibble(k = 1:9)
atribuicoes = solucoes %>% 
    group_by(k) %>% 
    do(cbind(filmes, 
             grupo = as.character(cutree(agrupamento_h, .$k)))) 
Unequal factor levels: coercing to character
atribuicoes %>% 
    ggplot(aes(x = "Filmes", y = RATING, colour = grupo)) + 
    geom_jitter(width = .02, height = 0, size = 2, alpha = .6) + 
    facet_wrap(~ paste(k, " grupos"))

stats::heatmap() é uma função que visualiza distâncias entre pontos organizando linhas e colunas via hclust:

filmes %>% 
    select(RATING) %>%
    dist(method = "euclidean") %>%
    as.matrix %>%
    heatmap()

Variando o método de linkage

plota_hclusts_1d(filmes, "RATING", 
                 linkage_method = "centroid", # single, complete, average, centroid, median, ...
                 ks = 1:6)
Setting row names on a tibble is deprecated.Unequal factor levels: coercing to character

names(iris)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     

Com outras variáveis

Agrupamento sempre dá um resultado. Mesmo quando ele não é útil:

plota_hclusts_1d(filmes, "YEAR", linkage_method = "centroid", ks = 1:6)
Setting row names on a tibble is deprecated.Unequal factor levels: coercing to character

Compare as soluções usando a escala linear da variável e a transformada em log:

plota_hclusts_1d(filmes, "`BOX OFFICE`", linkage_method = "centroid", ks = 1:6)
Setting row names on a tibble is deprecated.Unequal factor levels: coercing to character

filmes %>% mutate(`BOX OFFICE` = log(`BOX OFFICE`)) %>% 
    plota_hclusts_1d("`BOX OFFICE`", linkage_method = "centroid", ks = 1:6) + 
    scale_y_log10()
Setting row names on a tibble is deprecated.Unequal factor levels: coercing to character

Silhouetas

Dada a distância média de um ponto para os demais do seu cluster \(a(i)\) e a distância média do ponto para todos os demais do cluster mais próximo \(b(i)\), a largura da silhoueta de \(i\) é :

\[ s(i) := ( b(i) - a(i) ) / max( a(i), b(i) ) \]

Repare como 1 significa uma boa atribuição para \(i\), 0 significa indefinição e \(-1\) significa que há outro cluster onde \(i\) estaria melhor alocado.

distancias = filmes %>% 
    select(RATING) %>%
    dist(method = "euclidean")
agrupamento_hs = filmes %>% 
    column_to_rownames("TITLE") %>%
    select(RATING) %>%
    dist(method = "euclidean") %>% 
    hclust(method = "complete")
Setting row names on a tibble is deprecated.
plot(silhouette(cutree(agrupamento_hs, k = 4), distancias))

plot(silhouette(cutree(agrupamento_hs, k = 2), distancias))

Duas dimensões

p = filmes %>% 
    ggplot(aes(x = RATING, y = `BOX OFFICE`, label = TITLE)) + 
    geom_point() 
p

#ggplotly(p)
agrupamento_h_2d = filmes %>% 
    column_to_rownames("TITLE") %>%
    select(RATING, `BOX OFFICE`) %>%
    dist(method = "euclidean") %>% 
    hclust(method = "centroid")
Setting row names on a tibble is deprecated.
ggdendrogram(agrupamento_h_2d)

data.frame(k = NROW(agrupamento_h_2d$height):1, 
           height = agrupamento_h_2d$height) %>% 
    ggplot(aes(x = k, y = height)) + 
    geom_line(colour = "grey") + 
    geom_point() + 
    labs(x = "Número de clusters produzido", y = "Dissimilaridade na junção")

Como sempre, o algoritmo encontra grupos. No caso, parecem até bem separados. Vamos visualizá-los:

plota_hclusts_2d(agrupamento_h_2d, 
                 filmes, 
                 c("RATING", "`BOX OFFICE`"), 
                 linkage_method = "centroid", ks = 1:6)
Unequal factor levels: coercing to character

O agrupamento está acontecendo todo em função de BOX OFFICE, apenas. Como as escalas são diferentes, BOX OFFICE domina qualquer cálculo de distância euclidiana.

Solução: standardize (aka scale).

agrupamento_h_2d = filmes %>% 
    column_to_rownames("TITLE") %>%
    select(RATING, `BOX OFFICE`) %>%
    #mutate(`BOX OFFICE` = log10(`BOX OFFICE`)) %>% 
    mutate_all(funs(scale)) %>% 
    dist(method = "euclidean") %>% 
    hclust(method = "ward.D")
Setting row names on a tibble is deprecated.
ggdendrogram(agrupamento_h_2d)

plota_hclusts_2d(agrupamento_h_2d, 
                 filmes, 
                 c("RATING", "`BOX OFFICE`"), 
                 linkage_method = "ward.D", ks = 1:6) # + scale_y_log10()
Unequal factor levels: coercing to character

Mais variáveis

E se tivéssemos mais de duas variáveis?

filmes2 = agrupamento_h_md = filmes %>% 
    mutate(TITLE_LENGTH = nchar(TITLE)) 
dists = filmes2 %>% 
    column_to_rownames("TITLE") %>%
    mutate(`BOX OFFICE` = log10(`BOX OFFICE`)) %>% 
    select(RATING, `BOX OFFICE`, TITLE_LENGTH, YEAR) %>%
    mutate_all(funs(scale)) %>% 
    dist(method = "euclidean")
Setting row names on a tibble is deprecated.
agrupamento_h_md = dists %>% 
    hclust(method = "ward.D")
ggdendrogram(agrupamento_h_md, rotate = T)

cores = RColorBrewer::brewer.pal(4, "Set3")
plot(cluster::silhouette(cutree(agrupamento_h_md, k = 4), dists), col = cores, border = NA)

atribuicoes = tibble(k = 1:5) %>% 
    group_by(k) %>% 
    do(cbind(filmes2, 
             grupo = as.character(cutree(agrupamento_h_md, .$k)))) 
Unequal factor levels: coercing to character
atribuicoes_long = atribuicoes %>% 
    mutate(`BOX OFFICE` = scale(log10(`BOX OFFICE`)), 
           YEAR = scale(YEAR), 
           RATING = scale(RATING), 
           TITLE_LENGTH = scale(TITLE_LENGTH)) %>% 
    gather(key = "variavel", value = "valor", -TITLE, -k, -grupo, -CREDIT) 
attributes are not identical across measure variables; they will be dropped
atribuicoes_long %>% 
    ggplot(aes(x = variavel, y = valor, group = grupo, colour = grupo)) + 
    geom_point(alpha = .4, position = position_dodge(width = .5)) + 
    facet_wrap(~ paste(k, " grupos")) + 
    labs(x = "", y = "z-score")

atribuicoes_long %>% 
    filter(k == 3) %>%
    ggplot(aes(x = variavel, 
               y = valor, 
               colour = grupo)) + 
    geom_boxplot() + 
    geom_point(alpha = .4, position = position_jitter(width = .1)) + 
    facet_wrap(~ grupo) + 
    labs(x = "", y = "z-score")

atribuicoes_long %>% 
    filter(k == 4) %>%
    ggplot(aes(x = variavel, y = valor, group = TITLE, colour = grupo)) + 
    geom_point(alpha = .3, size = .5) + 
    geom_line(alpha = .7) + 
    facet_wrap(~ paste("Grupo ", grupo)) + 
    labs(x = "", y = "z-score")

LS0tCnRpdGxlOiAiVGlwb3MgZGUgZmlsbWVzIgphdXRob3I6ICJOYXphcmVubyIKZGF0ZTogIjE2LzA0LzIwMTciCm91dHB1dDogCiAgICAKICAgIGh0bWxfbm90ZWJvb2s6CiAgICAgICAgdGhlbWU6IGNlcnVsZWFuCmVkaXRvcl9vcHRpb25zOiAKICBjaHVua19vdXRwdXRfdHlwZTogaW5saW5lCi0tLQoKYGBge3Igc2V0dXAsIGluY2x1ZGU9RkFMU0V9CmtuaXRyOjpvcHRzX2NodW5rJHNldChlY2hvID0gVFJVRSkKYGBgCgoKYGBge3IgbG9hZF9jb2RlfQpsaWJyYXJ5KHRpZHl2ZXJzZSwgd2Fybi5jb25mbGljdHMgPSBGKQpsaWJyYXJ5KHJ2ZXN0KQpsaWJyYXJ5KHBsb3RseSkKbGlicmFyeShjbHVzdGVyKQpsaWJyYXJ5KGdnZGVuZHJvKQp0aGVtZV9zZXQodGhlbWVfbGlnaHQoKSkKc291cmNlKCJwbG90YV9zb2x1Y29lc19oY2x1c3QuUiIpCmBgYAoKVXNhcmVtb3MgZGFkb3MgZG8gW1JvdHRlbiBUb21hdG9lc10oaHR0cHM6Ly93d3cucm90dGVudG9tYXRvZXMuY29tKSBzb2JyZSBvcyBmaWxtZXMgZGUgU2NhcmxldHQgSm9oYW5zc29uLiAKCmBgYHtyfQpmcm9tX3BhZ2UgPC0gcmVhZF9odG1sKCJodHRwczovL3d3dy5yb3R0ZW50b21hdG9lcy5jb20vY2VsZWJyaXR5L3NjYXJsZXR0X2pvaGFuc3Nvbi8iKSAlPiUgCiAgICBodG1sX25vZGUoIiNmaWxtb2dyYXBoeVRibCIpICU+JSAjIEEgc2ludGF4ZSBkYSBleHByZXNzw6NvIMOpIGRlIHVtIHNlbGV0b3Igw6AgbMOhIEpRdWVyeQogICAgaHRtbF90YWJsZShmaWxsPVRSVUUpICU+JSAjIEZheiBwYXJzZQogICAgYXMudGliYmxlKCkKCmZpbG1lcyA9IGZyb21fcGFnZSAlPiUgCiAgICBmaWx0ZXIoUkFUSU5HICE9ICJObyBTY29yZSBZZXQiLCAKICAgICAgICAgICBgQk9YIE9GRklDRWAgIT0gIuKAlCIsIAogICAgICAgICAgIENSRURJVCAhPSAiRXhlY3V0aXZlIFByb2R1Y2VyIikgJT4lCiAgICBtdXRhdGUoUkFUSU5HID0gYXMubnVtZXJpYyhnc3ViKCIlIiwgIiIsIFJBVElORykpLCAKICAgICAgICAgICBgQk9YIE9GRklDRWAgPSBhcy5udW1lcmljKGdzdWIoIlskfE1dIiwgIiIsIGBCT1ggT0ZGSUNFYCkpKSAlPiUgCiAgICBmaWx0ZXIoYEJPWCBPRkZJQ0VgID49IDEpICMgVGVtIGRvaXMgZmlsbWVzIHF1ZSBuw6NvIHBhcmVjZW0gdGVyIHNpZG8gbGFuw6dhZG9zIG5vIG11bmRvIHRvZG8KYGBgCgojIyBBIGludHVpw6fDo28KClVtYSBmb3JtYSBkZSBkZXNjcmV2ZXIgZXN0cnV0dXJhIG5vcyBkYWRvcyDDqSBwZXJjZWJlbmRvIGdydXBvcyBkZSBvYnNlcnZhw6fDtWVzIG1haXMgc2VtZWxoYW50ZXMgZW50cmUgc2kgcXVlIGNvbSBvIHJlc3RhbnRlIGRvcyBkYWRvcy4gQWdydXBhbWVudG9zLiAKClBvciBleGVtcGxvLCBvYnNlcnZhbmRvIGFzIGF2YWxpYcOnw7VlcyBkb3MgZmlsbWVzOgoKYGBge3J9CmZpbG1lcyAlPiUgCiAgICBnZ3Bsb3QoYWVzKHggPSAiRmlsbWVzIiwgeSA9IFJBVElORykpICsgCiAgICBnZW9tX2ppdHRlcih3aWR0aCA9IC4wMSwgaGVpZ2h0ID0gMCwgc2l6ZSA9IDIsIGFscGhhID0gLjYpCgpmaWxtZXMgJT4lIAogICAgZ2dwbG90KGFlcyh4ID0gUkFUSU5HKSkgKyAKICAgIGdlb21faGlzdG9ncmFtKGJpbnMgPSAxNikgKyAKICAgIGdlb21fcnVnKCkKYGBgCgpKw6EgY29uc2lkZXJhbmRvIG8gYW5vIGRlIGxhbsOnYW1lbnRvLCBuw6NvIGjDoSB1bWEgZXN0cnV0dXJhIGNsYXJhIGRlIGdydXBvcy4gCgpgYGB7cn0KZmlsbWVzICU+JSAKICAgIGdncGxvdChhZXMoeCA9ICJGaWxtZXMiLCB5ID0gWUVBUikpICsgCiAgICBnZW9tX2ppdHRlcih3aWR0aCA9IC4wMiwgaGVpZ2h0ID0gMCwgc2l6ZSA9IDIsIGFscGhhID0gLjYpICsgCiAgICBzY2FsZV95X2xvZzEwKCkKCmZpbG1lcyAlPiUgCiAgICBnZ3Bsb3QoYWVzKHggPSBgWUVBUmApKSArIAogICAgZ2VvbV9oaXN0b2dyYW0oYmlucyA9IDE3KSArIAogICAgZ2VvbV9ydWcoKQpgYGAKCmBgYHtyfQp0aWJibGUoYSA9IGMocm5vcm0oMTAwLCBtZWFuID0gMTAsIHNkID0gMTApLCAKICAgICAgICAgICAgIHJub3JtKDQwLCBtZWFuID0gODAsIHNkID0gMTUpKSkgJT4lIAogICAgZ2dwbG90KGFlcyh4ID0gInRlc3RlIiwgeSA9IGEpKSArIAogICAgZ2VvbV9qaXR0ZXIod2lkdGggPSAuMDIsIGhlaWdodCA9IDAsIHNpemUgPSAyLCBhbHBoYSA9IC42KSAKCmBgYAoKCgpQYXJhIGEgcmVuZGEgZG8gZmlsbWUsIG9ic2VydmFyIHVtYSBlc2NhbGEgbGluZWFyIG91IGxvZ2Fyw610bWljYSBsZXZhbSBhIGNvbmNsdXPDtWVzIGRpZmVyZW50ZXMuIAoKYGBge3J9CmZpbG1lcyAlPiUgCiAgICBnZ3Bsb3QoYWVzKHggPSAiRmlsbWVzIiwgeSA9IGBCT1ggT0ZGSUNFYCkpICsgCiAgICBnZW9tX2ppdHRlcih3aWR0aCA9IC4wMiwgaGVpZ2h0ID0gMCwgc2l6ZSA9IDIsIGFscGhhID0gLjYpICAKICAgIApmaWxtZXMgJT4lIAogICAgZ2dwbG90KGFlcyh4ID0gIkZpbG1lcyIsIHkgPSBgQk9YIE9GRklDRWApKSArIAogICAgZ2VvbV9qaXR0ZXIod2lkdGggPSAuMDIsIGhlaWdodCA9IDAsIHNpemUgPSAyLCBhbHBoYSA9IC42KSArIAogICAgc2NhbGVfeV9sb2cxMCgpCgpmaWxtZXMgJT4lIAogICAgZ2dwbG90KGFlcyh4ID0gYEJPWCBPRkZJQ0VgKSkgKyAKICAgIGdlb21faGlzdG9ncmFtKGJpbnMgPSAyMCkgKyAKICAgIGdlb21fcnVnKCkKCmZpbG1lcyAlPiUgCiAgICBnZ3Bsb3QoYWVzKHggPSBgQk9YIE9GRklDRWApKSArIAogICAgZ2VvbV9oaXN0b2dyYW0oYmlucyA9IDIwKSArIAogICAgc2NhbGVfeF9sb2cxMCgpICsgCiAgICBnZW9tX3J1ZygpCgpgYGAKCiMjIEFncnVwYW1lbnRvIAoKUHJlY2lzYW1vcyBkZTogCgoqIERlZmluacOnw6NvIGRlIHByb3hpbWlkYWRlL2Rpc3TDom5jaWEgZW50cmUgcG9udG9zCiogRGVmaW5pw6fDo28gZGUgcHJveGltaWRhZGUvZGlzdMOibmNpYSBlbnRyZSBncnVwb3Mgb3UgZ3J1cG9zIGUgcG9udG9zCiogUHJvY2Vzc28gZGUgYWdydXBhbWVudG8gCiogRGVjaWRpciBxdWFudG9zIGdydXBvcyBleGlzdGVtCgpQYXJhIHByb2R1emlyIHVtYSBzb2x1w6fDo28gZGUgYWdydXBhbWVudG8uIERlcG9pcyB2ZW0gbyBwcmluY2lwYWw6ICoqYXZhbGlhciBlIGludGVycHJldGFyKiogYSBzb2x1w6fDo28uIF9BZ3J1cGFtZW50byBzZW1wcmUgZMOhIHVtIHJlc3VsdGFkby4gTmVtIHNlbXByZSDDqSDDunRpbF8uIAoKIyMgQ29tIHVtYSBkaW1lbnPDo28KCkjDoSBkdWFzIG1hbmVpcmFzIHByaW5jaXBhaXMgZGUgYWdydXBhcjogYWdsb21lcmF0aXZhIG91IGJhc2VhZGEgZW0gcGFydGnDp8Ojby4gVmFtb3MgZXhwbG9yYXIgcHJpbWVpcm8gYSAqKmhpZXLDoXJxdWljYSBhZ2xvbWVyYXRpdmEqKi4KCmBgYHtyfQpkaXN0YW5jaWFzLmxvbmcgPSBmaWxtZXMgJT4lIAogICAgc2VsZWN0KFJBVElORykgJT4lCiAgICBkaXN0KG1ldGhvZCA9ICJldWNsaWRlYW4iKSAlPiUgCiAgICBhcy5tYXRyaXggJT4lIAogICAgcmVzaGFwZTI6Om1lbHQodmFybmFtZXMgPSBjKCJyb3ciLCAiY29sIikpCgpkaXN0YW5jaWFzLmxvbmcgJT4lIAogICAgZ2dwbG90KGFlcyh4ID0gcm93LCB5ID0gY29sLCBmaWxsID0gdmFsdWUpKSArIAogICAgZ2VvbV90aWxlKCkKCiMgZGlzdGFuY2lhcyA9IGZpbG1lcyAlPiUgCiMgICAgIHNlbGVjdChSQVRJTkcpICU+JQojICAgICBkaXN0KG1ldGhvZCA9ICJldWNsaWRlYW4iKSAlPiUgCiMgICAgIGFzLm1hdHJpeCAlPiUgCiMgICAgIGhlYXRtYXAoKQpgYGAKCgoKYGBge3J9CnJvdy5uYW1lcyhmaWxtZXMpID0gTlVMTAphZ3J1cGFtZW50b19oID0gZmlsbWVzICU+JSAKICAgIGNvbHVtbl90b19yb3duYW1lcygiVElUTEUiKSAlPiUgIyBoY2x1c3QgcHJlY2lzYSBkb3MgcsOzdHVsb3MgZW0gbm9tZXMgZGUgbGluaGEgKHJ1aW0pCiAgICBzZWxlY3QoUkFUSU5HKSAlPiUKICAgIGRpc3QobWV0aG9kID0gImV1Y2xpZGVhbiIpICU+JSAKICAgIGhjbHVzdChtZXRob2QgPSAiY29tcGxldGUiKQoKZ2dkZW5kcm9ncmFtKGFncnVwYW1lbnRvX2gsIHJvdGF0ZSA9IFQsIHNpemUgPSAyKSAKCmdnZGVuZHJvZ3JhbShhZ3J1cGFtZW50b19oLCByb3RhdGUgPSBULCBzaXplID0gMikgKyAKICAgIGdlb21faGxpbmUoeWludGVyY2VwdCA9IDQ1LCBjb2xvdXIgPSAicmVkIikKYGBgCgpDYWRhIGp1bsOnw6NvIMOpIHVtIHBhc3NvIGRvIGFsZ29yaXRtby4gQSBhbHR1cmEgbmEgZGVuZHJvZ3JhbWEgZW0gY2FkYSBwYXNzbyBzaWduaWZpY2EgYSBkaXNzaW1pbGFyaWRhZGUgZW50cmUgb3MgcG9udG9zIG91IGdydXBvcyBqdW50YWRvcyBuYXF1ZWxlIHBhc3NvLiAKCk5hIG1lZGlkYSBxdWUgdmFtb3MgYWdsb21lcmFuZG8sIGFzIGRpc3NpbWlsYXJpZGFkZXMgbmFzIGp1bsOnw7VlcyB0ZW5kZW0gYSBpciBhdW1lbnRhbmRvIGNhc28gaGFqYSBlc3RydXR1cmEgZGUgZ3J1cG9zLgoKYGBge3J9CmRhdGEuZnJhbWUoayA9IDE6TlJPVyhhZ3J1cGFtZW50b19oJGhlaWdodCksIAogICAgICAgICAgIGhlaWdodCA9IGFncnVwYW1lbnRvX2gkaGVpZ2h0KSAlPiUgCiAgICBnZ3Bsb3QoYWVzKHggPSBrLCB5ID0gaGVpZ2h0KSkgKyAKICAgIGdlb21fbGluZShjb2xvdXIgPSAiZ3JleSIpICsgCiAgICBnZW9tX3BvaW50KCkgKyAKICAgIGxhYnMoeCA9ICJKdW7Dp8O1ZXMgZmVpdGFzICgzNCAtIGNsdXN0ZXJzKSIsIHkgPSAiRGlzc2ltaWxhcmlkYWRlIG5hIGp1bsOnw6NvIikKCmBgYAoKVmVqYW1vcyBhcyBzb2x1w6fDtWVzIGNvbSBkaWZlcmVudGVzIG7Dum1lcm9zIGRlIGdydXBvcy4KCmBgYHtyfQpzb2x1Y29lcyA9IHRpYmJsZShrID0gMTo5KQoKYXRyaWJ1aWNvZXMgPSBzb2x1Y29lcyAlPiUgCiAgICBncm91cF9ieShrKSAlPiUgCiAgICBkbyhjYmluZChmaWxtZXMsIAogICAgICAgICAgICAgZ3J1cG8gPSBhcy5jaGFyYWN0ZXIoY3V0cmVlKGFncnVwYW1lbnRvX2gsIC4kaykpKSkgCgphdHJpYnVpY29lcyAlPiUgCiAgICBnZ3Bsb3QoYWVzKHggPSAiRmlsbWVzIiwgeSA9IFJBVElORywgY29sb3VyID0gZ3J1cG8pKSArIAogICAgZ2VvbV9qaXR0ZXIod2lkdGggPSAuMDIsIGhlaWdodCA9IDAsIHNpemUgPSAyLCBhbHBoYSA9IC42KSArIAogICAgZmFjZXRfd3JhcCh+IHBhc3RlKGssICIgZ3J1cG9zIikpCgpgYGAKCmBzdGF0czo6aGVhdG1hcCgpYCDDqSB1bWEgZnVuw6fDo28gcXVlIHZpc3VhbGl6YSBkaXN0w6JuY2lhcyBlbnRyZSBwb250b3Mgb3JnYW5pemFuZG8gbGluaGFzIGUgY29sdW5hcyB2aWEgYGhjbHVzdGA6CgpgYGB7cn0KZmlsbWVzICU+JSAKICAgIHNlbGVjdChSQVRJTkcpICU+JQogICAgZGlzdChtZXRob2QgPSAiZXVjbGlkZWFuIikgJT4lCiAgICBhcy5tYXRyaXggJT4lCiAgICBoZWF0bWFwKCkKYGBgCgojIyMgVmFyaWFuZG8gbyBtw6l0b2RvIGRlIGxpbmthZ2UKCmBgYHtyfQpwbG90YV9oY2x1c3RzXzFkKGZpbG1lcywgIlJBVElORyIsIAogICAgICAgICAgICAgICAgIGxpbmthZ2VfbWV0aG9kID0gImNlbnRyb2lkIiwgIyBzaW5nbGUsIGNvbXBsZXRlLCBhdmVyYWdlLCBjZW50cm9pZCwgbWVkaWFuLCAuLi4KICAgICAgICAgICAgICAgICBrcyA9IDE6NikKbmFtZXMoaXJpcykKYGBgCgojIyMgQ29tIG91dHJhcyB2YXJpw6F2ZWlzCgpBZ3J1cGFtZW50byBzZW1wcmUgZMOhIHVtIHJlc3VsdGFkby4gTWVzbW8gcXVhbmRvIGVsZSBuw6NvIMOpIMO6dGlsOgoKYGBge3J9CnBsb3RhX2hjbHVzdHNfMWQoZmlsbWVzLCAiWUVBUiIsIGxpbmthZ2VfbWV0aG9kID0gImNlbnRyb2lkIiwga3MgPSAxOjYpCmBgYAoKQ29tcGFyZSBhcyBzb2x1w6fDtWVzIHVzYW5kbyBhIGVzY2FsYSBsaW5lYXIgZGEgdmFyacOhdmVsIGUgYSB0cmFuc2Zvcm1hZGEgZW0gbG9nOgoKYGBge3J9CnBsb3RhX2hjbHVzdHNfMWQoZmlsbWVzLCAiYEJPWCBPRkZJQ0VgIiwgbGlua2FnZV9tZXRob2QgPSAiY2VudHJvaWQiLCBrcyA9IDE6NikKCmZpbG1lcyAlPiUgbXV0YXRlKGBCT1ggT0ZGSUNFYCA9IGxvZyhgQk9YIE9GRklDRWApKSAlPiUgCiAgICBwbG90YV9oY2x1c3RzXzFkKCJgQk9YIE9GRklDRWAiLCBsaW5rYWdlX21ldGhvZCA9ICJjZW50cm9pZCIsIGtzID0gMTo2KSArIAogICAgc2NhbGVfeV9sb2cxMCgpCmBgYAoKIyMgU2lsaG91ZXRhcwoKRGFkYSBhIGRpc3TDom5jaWEgbcOpZGlhIGRlIHVtIHBvbnRvIHBhcmEgb3MgZGVtYWlzIGRvIHNldSBjbHVzdGVyICRhKGkpJCBlIGEgZGlzdMOibmNpYSBtw6lkaWEgZG8gcG9udG8gcGFyYSB0b2RvcyBvcyBkZW1haXMgZG8gY2x1c3RlciBtYWlzIHByw7N4aW1vICRiKGkpJCwgYSBsYXJndXJhIGRhIHNpbGhvdWV0YSBkZSAkaSQgw6kgOiAKClxbCnMoaSkgOj0gKCBiKGkpIC0gYShpKSApIC8gbWF4KCBhKGkpLCBiKGkpICkKXF0KClJlcGFyZSBjb21vIDEgc2lnbmlmaWNhIHVtYSBib2EgYXRyaWJ1acOnw6NvIHBhcmEgJGkkLCAwIHNpZ25pZmljYSBpbmRlZmluacOnw6NvIGUgJC0xJCBzaWduaWZpY2EgcXVlIGjDoSBvdXRybyBjbHVzdGVyIG9uZGUgJGkkIGVzdGFyaWEgbWVsaG9yIGFsb2NhZG8uCgpgYGB7cn0KZGlzdGFuY2lhcyA9IGZpbG1lcyAlPiUgCiAgICBzZWxlY3QoUkFUSU5HKSAlPiUKICAgIGRpc3QobWV0aG9kID0gImV1Y2xpZGVhbiIpCgphZ3J1cGFtZW50b19ocyA9IGZpbG1lcyAlPiUgCiAgICBjb2x1bW5fdG9fcm93bmFtZXMoIlRJVExFIikgJT4lCiAgICBzZWxlY3QoUkFUSU5HKSAlPiUKICAgIGRpc3QobWV0aG9kID0gImV1Y2xpZGVhbiIpICU+JSAKICAgIGhjbHVzdChtZXRob2QgPSAiY29tcGxldGUiKQoKCnBsb3Qoc2lsaG91ZXR0ZShjdXRyZWUoYWdydXBhbWVudG9faHMsIGsgPSA0KSwgZGlzdGFuY2lhcykpCnBsb3Qoc2lsaG91ZXR0ZShjdXRyZWUoYWdydXBhbWVudG9faHMsIGsgPSAyKSwgZGlzdGFuY2lhcykpCmBgYAoKIyMgRHVhcyBkaW1lbnPDtWVzCgpgYGB7cn0KcCA9IGZpbG1lcyAlPiUgCiAgICBnZ3Bsb3QoYWVzKHggPSBSQVRJTkcsIHkgPSBgQk9YIE9GRklDRWAsIGxhYmVsID0gVElUTEUpKSArIAogICAgZ2VvbV9wb2ludCgpIApwCiNnZ3Bsb3RseShwKQpgYGAKCgpgYGB7cn0KYWdydXBhbWVudG9faF8yZCA9IGZpbG1lcyAlPiUgCiAgICBjb2x1bW5fdG9fcm93bmFtZXMoIlRJVExFIikgJT4lCiAgICBzZWxlY3QoUkFUSU5HLCBgQk9YIE9GRklDRWApICU+JQogICAgZGlzdChtZXRob2QgPSAiZXVjbGlkZWFuIikgJT4lIAogICAgaGNsdXN0KG1ldGhvZCA9ICJjZW50cm9pZCIpCgpnZ2RlbmRyb2dyYW0oYWdydXBhbWVudG9faF8yZCkKCmRhdGEuZnJhbWUoayA9IE5ST1coYWdydXBhbWVudG9faF8yZCRoZWlnaHQpOjEsIAogICAgICAgICAgIGhlaWdodCA9IGFncnVwYW1lbnRvX2hfMmQkaGVpZ2h0KSAlPiUgCiAgICBnZ3Bsb3QoYWVzKHggPSBrLCB5ID0gaGVpZ2h0KSkgKyAKICAgIGdlb21fbGluZShjb2xvdXIgPSAiZ3JleSIpICsgCiAgICBnZW9tX3BvaW50KCkgKyAKICAgIGxhYnMoeCA9ICJOw7ptZXJvIGRlIGNsdXN0ZXJzIHByb2R1emlkbyIsIHkgPSAiRGlzc2ltaWxhcmlkYWRlIG5hIGp1bsOnw6NvIikKCmBgYAoKQ29tbyBzZW1wcmUsIG8gYWxnb3JpdG1vIGVuY29udHJhIGdydXBvcy4gTm8gY2FzbywgcGFyZWNlbSBhdMOpIGJlbSBzZXBhcmFkb3MuIFZhbW9zIHZpc3VhbGl6w6EtbG9zOgoKYGBge3J9CnBsb3RhX2hjbHVzdHNfMmQoYWdydXBhbWVudG9faF8yZCwgCiAgICAgICAgICAgICAgICAgZmlsbWVzLCAKICAgICAgICAgICAgICAgICBjKCJSQVRJTkciLCAiYEJPWCBPRkZJQ0VgIiksIAogICAgICAgICAgICAgICAgIGxpbmthZ2VfbWV0aG9kID0gImNlbnRyb2lkIiwga3MgPSAxOjYpCmBgYAoKTyBhZ3J1cGFtZW50byBlc3TDoSBhY29udGVjZW5kbyB0b2RvIGVtIGZ1bsOnw6NvIGRlIEJPWCBPRkZJQ0UsIGFwZW5hcy4gQ29tbyBhcyBlc2NhbGFzIHPDo28gZGlmZXJlbnRlcywgQk9YIE9GRklDRSBkb21pbmEgcXVhbHF1ZXIgY8OhbGN1bG8gZGUgZGlzdMOibmNpYSBldWNsaWRpYW5hLiAKClNvbHXDp8Ojbzogc3RhbmRhcmRpemUgKGFrYSBzY2FsZSkuCgpgYGB7cn0KYWdydXBhbWVudG9faF8yZCA9IGZpbG1lcyAlPiUgCiAgICBjb2x1bW5fdG9fcm93bmFtZXMoIlRJVExFIikgJT4lCiAgICBzZWxlY3QoUkFUSU5HLCBgQk9YIE9GRklDRWApICU+JQogICAgI211dGF0ZShgQk9YIE9GRklDRWAgPSBsb2cxMChgQk9YIE9GRklDRWApKSAlPiUgCiAgICBtdXRhdGVfYWxsKGZ1bnMoc2NhbGUpKSAlPiUgCiAgICBkaXN0KG1ldGhvZCA9ICJldWNsaWRlYW4iKSAlPiUgCiAgICBoY2x1c3QobWV0aG9kID0gIndhcmQuRCIpCgpnZ2RlbmRyb2dyYW0oYWdydXBhbWVudG9faF8yZCkKCnBsb3RhX2hjbHVzdHNfMmQoYWdydXBhbWVudG9faF8yZCwgCiAgICAgICAgICAgICAgICAgZmlsbWVzLCAKICAgICAgICAgICAgICAgICBjKCJSQVRJTkciLCAiYEJPWCBPRkZJQ0VgIiksIAogICAgICAgICAgICAgICAgIGxpbmthZ2VfbWV0aG9kID0gIndhcmQuRCIsIGtzID0gMTo2KSAjICsgc2NhbGVfeV9sb2cxMCgpCmBgYAoKCiMjIE1haXMgdmFyacOhdmVpcwoKRSBzZSB0aXbDqXNzZW1vcyBtYWlzIGRlIGR1YXMgdmFyacOhdmVpcz8KCgpgYGB7cn0KZmlsbWVzMiA9IGFncnVwYW1lbnRvX2hfbWQgPSBmaWxtZXMgJT4lIAogICAgbXV0YXRlKFRJVExFX0xFTkdUSCA9IG5jaGFyKFRJVExFKSkgCgpkaXN0cyA9IGZpbG1lczIgJT4lIAogICAgY29sdW1uX3RvX3Jvd25hbWVzKCJUSVRMRSIpICU+JQogICAgbXV0YXRlKGBCT1ggT0ZGSUNFYCA9IGxvZzEwKGBCT1ggT0ZGSUNFYCkpICU+JSAKICAgIHNlbGVjdChSQVRJTkcsIGBCT1ggT0ZGSUNFYCwgVElUTEVfTEVOR1RILCBZRUFSKSAlPiUKICAgIG11dGF0ZV9hbGwoZnVucyhzY2FsZSkpICU+JSAKICAgIGRpc3QobWV0aG9kID0gImV1Y2xpZGVhbiIpCgphZ3J1cGFtZW50b19oX21kID0gZGlzdHMgJT4lIAogICAgaGNsdXN0KG1ldGhvZCA9ICJ3YXJkLkQiKQoKZ2dkZW5kcm9ncmFtKGFncnVwYW1lbnRvX2hfbWQsIHJvdGF0ZSA9IFQpCmNvcmVzID0gUkNvbG9yQnJld2VyOjpicmV3ZXIucGFsKDQsICJTZXQzIikKcGxvdChjbHVzdGVyOjpzaWxob3VldHRlKGN1dHJlZShhZ3J1cGFtZW50b19oX21kLCBrID0gNCksIGRpc3RzKSwgY29sID0gY29yZXMsIGJvcmRlciA9IE5BKQpgYGAKCmBgYHtyfQphdHJpYnVpY29lcyA9IHRpYmJsZShrID0gMTo1KSAlPiUgCiAgICBncm91cF9ieShrKSAlPiUgCiAgICBkbyhjYmluZChmaWxtZXMyLCAKICAgICAgICAgICAgIGdydXBvID0gYXMuY2hhcmFjdGVyKGN1dHJlZShhZ3J1cGFtZW50b19oX21kLCAuJGspKSkpIAoKYXRyaWJ1aWNvZXNfbG9uZyA9IGF0cmlidWljb2VzICU+JSAKICAgIG11dGF0ZShgQk9YIE9GRklDRWAgPSBzY2FsZShsb2cxMChgQk9YIE9GRklDRWApKSwgCiAgICAgICAgICAgWUVBUiA9IHNjYWxlKFlFQVIpLCAKICAgICAgICAgICBSQVRJTkcgPSBzY2FsZShSQVRJTkcpLCAKICAgICAgICAgICBUSVRMRV9MRU5HVEggPSBzY2FsZShUSVRMRV9MRU5HVEgpKSAlPiUgCiAgICBnYXRoZXIoa2V5ID0gInZhcmlhdmVsIiwgdmFsdWUgPSAidmFsb3IiLCAtVElUTEUsIC1rLCAtZ3J1cG8sIC1DUkVESVQpIAoKYXRyaWJ1aWNvZXNfbG9uZyAlPiUgCiAgICBnZ3Bsb3QoYWVzKHggPSB2YXJpYXZlbCwgeSA9IHZhbG9yLCBncm91cCA9IGdydXBvLCBjb2xvdXIgPSBncnVwbykpICsgCiAgICBnZW9tX3BvaW50KGFscGhhID0gLjQsIHBvc2l0aW9uID0gcG9zaXRpb25fZG9kZ2Uod2lkdGggPSAuNSkpICsgCiAgICBmYWNldF93cmFwKH4gcGFzdGUoaywgIiBncnVwb3MiKSkgKyAKICAgIGxhYnMoeCA9ICIiLCB5ID0gInotc2NvcmUiKQoKYXRyaWJ1aWNvZXNfbG9uZyAlPiUgCiAgICBmaWx0ZXIoayA9PSAzKSAlPiUKICAgIGdncGxvdChhZXMoeCA9IHZhcmlhdmVsLCAKICAgICAgICAgICAgICAgeSA9IHZhbG9yLCAKICAgICAgICAgICAgICAgY29sb3VyID0gZ3J1cG8pKSArIAogICAgZ2VvbV9ib3hwbG90KCkgKyAKICAgIGdlb21fcG9pbnQoYWxwaGEgPSAuNCwgcG9zaXRpb24gPSBwb3NpdGlvbl9qaXR0ZXIod2lkdGggPSAuMSkpICsgCiAgICBmYWNldF93cmFwKH4gZ3J1cG8pICsgCiAgICBsYWJzKHggPSAiIiwgeSA9ICJ6LXNjb3JlIikKCmF0cmlidWljb2VzX2xvbmcgJT4lIAogICAgZmlsdGVyKGsgPT0gNCkgJT4lCiAgICBnZ3Bsb3QoYWVzKHggPSB2YXJpYXZlbCwgeSA9IHZhbG9yLCBncm91cCA9IFRJVExFLCBjb2xvdXIgPSBncnVwbykpICsgCiAgICBnZW9tX3BvaW50KGFscGhhID0gLjMsIHNpemUgPSAuNSkgKyAKICAgIGdlb21fbGluZShhbHBoYSA9IC43KSArIAogICAgZmFjZXRfd3JhcCh+IHBhc3RlKCJHcnVwbyAiLCBncnVwbykpICsgCiAgICBsYWJzKHggPSAiIiwgeSA9ICJ6LXNjb3JlIikKYGBgCgo=