library(tidyverse)
library(openintro)
library(dplyr)
library(data.table)
library(arsenal)
## Warning: package 'arsenal' was built under R version 4.4.3

Overview

The purpose of this lab is to tidy and demonstrate an analysis of three different data sets. The data sets chosen were the Pokemon, cheese, and cancer data sets from the Week 5 Discussion.

Data set 1 - Pokemon Competitive Usage

The goal for the Pokemon data set is to use the usage values as target variables. In theory, almost all of a Pokemon’s properties can work as a predictor variable; even a Pokemon’s name and generation can affect its usage rate. For our analysis, we we focus on “total_stats” and “usage”.

Load and examine data

Load the Pokemon competitive usage data set. There are multiple columns including a Pokemon’s name, general properties, and usage information.

pokemon_usage_url <- 'https://raw.githubusercontent.com/Megabuster/Data607/refs/heads/main/data/project2/pokemon_competitive_analysis.csv'
pokemon_usage_raw <- fread(pokemon_usage_url)
head(pokemon_usage_raw)
##    index          name  type1   type2  ability1   ability2 hidden_ability    hp
##    <int>        <char> <char>  <char>    <char>     <char>         <char> <int>
## 1:     1     bulbasaur  grass  poison  overgrow No_ability    chlorophyll    45
## 2:     2       ivysaur  grass  poison  overgrow No_ability    chlorophyll    60
## 3:     3      venusaur  grass  poison  overgrow No_ability    chlorophyll    80
## 4:     3 venusaur-mega  grass  poison thick-fat No_ability           None    80
## 5:     3 venusaur-gmax  grass  poison  overgrow No_ability    chlorophyll    80
## 6:     4    charmander   fire No_type     blaze No_ability    solar-power    39
##    attack defense sp_atk sp_def speed total_stats legendary mythical
##     <int>   <int>  <int>  <int> <int>       <int>    <lgcl>   <lgcl>
## 1:     49      49     65     65    45         318     FALSE    FALSE
## 2:     62      63     80     80    60         405     FALSE    FALSE
## 3:     82      83    100    100    80         525     FALSE    FALSE
## 4:    100     123    122    120    80         625     FALSE    FALSE
## 5:     82      83    100    100    80         525     FALSE    FALSE
## 6:     52      43     60     50    65         309     FALSE    FALSE
##      generation Smogon_VGC_Usage_2022 Smogon_VGC_Usage_2023
##          <char>                <char>                <char>
## 1: generation-i                   0.0               NoUsage
## 2: generation-i                   0.0               NoUsage
## 3: generation-i              20.83915               NoUsage
## 4: generation-i               NoUsage               NoUsage
## 5: generation-i               NoUsage               NoUsage
## 6: generation-i                   0.0                   0.0
##    Smogon_VGC_Usage_2024 Worlds_VGC_Usage_2022 Worlds_VGC_Usage_2023
##                   <char>                <char>                <char>
## 1:                   0.0               NoUsage               NoUsage
## 2:                   0.0               NoUsage               NoUsage
## 3:                0.4441                 19.62               NoUsage
## 4:               NoUsage               NoUsage               NoUsage
## 5:               NoUsage               NoUsage               NoUsage
## 6:                   0.0               NoUsage               NoUsage
##    Worlds_VGC_Usage_2024
##                   <char>
## 1:               NoUsage
## 2:               NoUsage
## 3:                  0.09
## 4:               NoUsage
## 5:               NoUsage
## 6:               NoUsage

Wide to long data

Most of the columns are base features of the listed Pokemon, but the last 6 are VGC data which denotes the competitive format usage. Since these are observations, they should not be mixed with the other variables. This dataset is a strong candidate for converting from wide to long format. Use “melt” to combine all the VGC columns together.

pokemon_usage_long <- pokemon_usage_raw %>% melt(id.vars = c(colnames(pokemon_usage_raw)[1:17]), variable.name = 'competitive_format', value.name = 'usage')
head(pokemon_usage_long)
##    index          name  type1   type2  ability1   ability2 hidden_ability    hp
##    <int>        <char> <char>  <char>    <char>     <char>         <char> <int>
## 1:     1     bulbasaur  grass  poison  overgrow No_ability    chlorophyll    45
## 2:     2       ivysaur  grass  poison  overgrow No_ability    chlorophyll    60
## 3:     3      venusaur  grass  poison  overgrow No_ability    chlorophyll    80
## 4:     3 venusaur-mega  grass  poison thick-fat No_ability           None    80
## 5:     3 venusaur-gmax  grass  poison  overgrow No_ability    chlorophyll    80
## 6:     4    charmander   fire No_type     blaze No_ability    solar-power    39
##    attack defense sp_atk sp_def speed total_stats legendary mythical
##     <int>   <int>  <int>  <int> <int>       <int>    <lgcl>   <lgcl>
## 1:     49      49     65     65    45         318     FALSE    FALSE
## 2:     62      63     80     80    60         405     FALSE    FALSE
## 3:     82      83    100    100    80         525     FALSE    FALSE
## 4:    100     123    122    120    80         625     FALSE    FALSE
## 5:     82      83    100    100    80         525     FALSE    FALSE
## 6:     52      43     60     50    65         309     FALSE    FALSE
##      generation    competitive_format    usage
##          <char>                <fctr>   <char>
## 1: generation-i Smogon_VGC_Usage_2022      0.0
## 2: generation-i Smogon_VGC_Usage_2022      0.0
## 3: generation-i Smogon_VGC_Usage_2022 20.83915
## 4: generation-i Smogon_VGC_Usage_2022  NoUsage
## 5: generation-i Smogon_VGC_Usage_2022  NoUsage
## 6: generation-i Smogon_VGC_Usage_2022      0.0

Addressing oddities in the data

Every row is completely filled, but certain cells are effectively “NA” data: “No_type” in “type2”, “No_ability” in “ability2”, “None” in “hidden_ability”, and “NoUsage” in “usage”.

A Pokemon always has a “type1”, but might not have a “type2”. Two Pokemon with the same two types in opposite orders such as “grass” and “poison” or “poison” and “grass” are generally the same for our purposes.

For “No_ability” and “None”, “ability1” is always set and is the default if no other abilities exist. Unfortunately, there is no information within the dataset that determines which ability is being used and how often. For the scope of this analysis, these columns will not be changed.

In “usage”, “NoUsage” stands out because that should be the same as 0% usage. As 0% does exist within the “usage” column as “0.0”, what does “NoUsage” actually mean? We may be able to answer this, and thus decide how to proceed with tidying, by analyzing when we see each value. We will examine data for all Pokemon, used (any number above 0%), no usage (“NoUsage” value), and 0% usage.

used_df <- pokemon_usage_long %>% filter(usage != '0.0' & usage != 'NoUsage')
no_usage_df <- pokemon_usage_long %>% filter(usage == 'NoUsage')
zero_usage_df <- pokemon_usage_long %>% filter(usage == '0.0')
all_summar <- pokemon_usage_long %>% summarise(mean = mean(total_stats), median = median(total_stats))
used_summar <- used_df %>% summarise(mean = mean(total_stats), median = median(total_stats))
no_usage_summar <- no_usage_df %>% summarise(mean = mean(total_stats), median = median(total_stats))
zero_usage_summar <- zero_usage_df %>% summarise(mean = mean(total_stats), median = median(total_stats))
usage_means <- c(all_summar$mean, used_summar$mean, no_usage_summar$mean, zero_usage_summar$mean)
usage_medians <- c(all_summar$median, used_summar$median, no_usage_summar$median, zero_usage_summar$median)
usage_names <- c('all', 'used', 'no_usage', 'zero_usage')
barplot(usage_means, names.arg = usage_names, ylab = 'avg_total_stats')

data.frame(
  usage_type = usage_names,
  usage_mean = usage_means,
  usage_median = usage_medians
)
##   usage_type usage_mean usage_median
## 1        all   445.9593          470
## 2       used   510.0786          505
## 3   no_usage   445.1212          460
## 4 zero_usage   337.4237          320

The “total_stats” property is a collection of all the base stats of a Pokemon ranging from “hp” to “speed”. Logically, a Pokemon that is stronger, one that has higher stats, is more likely to be used. 0% usage Pokemon having the lowest total stats in the 300’s makes sense, but “no_usage” is close to the average of all Pokemon in the original dataset. The medians were also calculated as despite being numbers, “total_stats” are actually discrete and not continuous variables. Pokemon are distinctly given totals such as 470 or 505. The used set of Pokemon have the highest median at 505.

There are many potential directions at this juncture. While the original question was what effect does “total_stats” have on “usage”, a subgoal is to tidy the “NoUsage” values. One path forward is to see the distribution of each group’s “total_stats” to see if they have any odd distributions.

pokemon_usage_long %>%
  select(c('name', 'total_stats')) %>% 
  unique() %>%
  ggplot(aes(x = total_stats)) +
  geom_histogram(bins = 15, binwidth = 20) +
  labs(title = 'All Pokemon')

used_df %>%
  select(c('name', 'total_stats')) %>% 
  unique() %>%
  ggplot(aes(x = total_stats)) +
  geom_histogram(bins = 15, binwidth = 20) +
  labs(title = 'Used Pokemon')

no_usage_df %>%
  select(c('name', 'total_stats')) %>% 
  unique() %>%
  ggplot(aes(x = total_stats)) +
  geom_histogram(bins = 15, binwidth = 20) +
  labs(title = 'No Usage Pokemon')

zero_usage_df %>%
  select(c('name', 'total_stats')) %>% 
  unique() %>%
  ggplot(aes(x = total_stats)) +
  geom_histogram(bins = 15, binwidth = 20) +
  labs(title = 'Zero Usage Pokemon')

These histograms agree with what the averages implied. Used Pokemon usually had at least over 500 total stats. Zero usage Pokemon peaked at around 300 total stats. The plot of all Pokemon and no usage Pokemon are somehow still very similar. One more test can be done to verify this finding is the “comparef” function that will show how many observations, individual Pokemon, are not in the no usage group.

comparedf(pokemon_usage_long %>% select(c('name', 'total_stats')) %>% unique(), 
          no_usage_df %>% select(c('name', 'total_stats')) %>% unique())
## Compare Object
## 
## Function Call: 
## comparedf(x = pokemon_usage_long %>% select(c("name", "total_stats")) %>% 
##     unique(), y = no_usage_df %>% select(c("name", "total_stats")) %>% 
##     unique())
## 
## Shared: 2 non-by variables and 1272 observations.
## Not shared: 0 variables and 31 observations.
## 
## Differences found in 2/2 variables compared.
## 0 variables compared have non-identical attributes.

With this result that 1272/1303 Pokemon matched, it appears that “total_stats” has almost no correlation with the “NoUsage” value. Almost every Pokemon had a competitive format where it had “NoUsage” regardless of its stats. Given the original question of whether or not “total_stats” impacted usage, we will remove instances of “NoUsage” as they are merely noise for our purpose. This feels rudimentary, but “NoUsage” clearly does not mean 0% usage and cannot be trusted to provide meaningful information.

This leads to the next issue with the original data. The “usage” column treats its values as strings instead of numbers because of “NoUsage” and the small values that use “e” in them. We can start by making a data frame without “NoUsage” in it. The resulting “usage” column is still a character type. Turn the usage values into numerics so that they can be operated on.

pokemon_usage_df <- pokemon_usage_long %>% filter(usage != 'NoUsage')
# typeof(pokemon_usage_df$usage)
pokemon_usage_df_fixed <- transform(pokemon_usage_df, usage = as.numeric(usage))
head(pokemon_usage_df_fixed)
##    index       name  type1   type2 ability1   ability2 hidden_ability    hp
##    <int>     <char> <char>  <char>   <char>     <char>         <char> <int>
## 1:     1  bulbasaur  grass  poison overgrow No_ability    chlorophyll    45
## 2:     2    ivysaur  grass  poison overgrow No_ability    chlorophyll    60
## 3:     3   venusaur  grass  poison overgrow No_ability    chlorophyll    80
## 4:     4 charmander   fire No_type    blaze No_ability    solar-power    39
## 5:     5 charmeleon   fire No_type    blaze No_ability    solar-power    58
## 6:     6  charizard   fire  flying    blaze No_ability    solar-power    78
##    attack defense sp_atk sp_def speed total_stats legendary mythical
##     <int>   <int>  <int>  <int> <int>       <int>    <lgcl>   <lgcl>
## 1:     49      49     65     65    45         318     FALSE    FALSE
## 2:     62      63     80     80    60         405     FALSE    FALSE
## 3:     82      83    100    100    80         525     FALSE    FALSE
## 4:     52      43     60     50    65         309     FALSE    FALSE
## 5:     64      58     80     65    80         405     FALSE    FALSE
## 6:     84      78    109     85   100         534     FALSE    FALSE
##      generation    competitive_format    usage
##          <char>                <fctr>    <num>
## 1: generation-i Smogon_VGC_Usage_2022  0.00000
## 2: generation-i Smogon_VGC_Usage_2022  0.00000
## 3: generation-i Smogon_VGC_Usage_2022 20.83915
## 4: generation-i Smogon_VGC_Usage_2022  0.00000
## 5: generation-i Smogon_VGC_Usage_2022  0.00000
## 6: generation-i Smogon_VGC_Usage_2022  7.17374

The main question was the impact of “total_stats” on “usage”. Let’s plot the correlation between these values.

pokemon_usage_df_fixed %>% 
  ggplot(aes(x = total_stats, y = usage)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

pokemon_usage_df_fixed %>% 
  filter(usage > 1) %>%
  ggplot(aes(x = total_stats, y = usage)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

There does appear to be an effect on usage by total_stats. However, most usage rates are still clustered near 0% even among the Pokemon with higher total_stats. The highest usage rate Pokemon at around 60% usage and higher are generally in the 500s. Setting “usage” above 1% to remove the impact of barely used Pokemon shows a peak usage of Pokemon around 550 total stats. This implies that other predictors are also affecting usage.

The average usage percentage of a Pokemon is 1.6%. Keep this in mind for the final plot.

pokemon_usage_df_fixed %>% summarise(mean = mean(usage))
##       mean
## 1 1.608681
aggr_df <- aggregate(usage ~ total_stats, data = pokemon_usage_df_fixed, mean)
aggr_df %>%
  filter() %>%
  ggplot(aes(x = total_stats, y = usage)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

The individual Pokemon that were used a lot skew the results heavily. However, the trend here seems to fit what we had assumed. The average usage of lower “total_stat” Pokemon is extremely low compared to the higher ones. We can try to regroup the data into stat groups to reduce the impact of individual Pokemon outliers. This categorization is because “total_stat” is discrete and not continuous. A single Pokemon with a 525 stat total and 50% usage does not automatically mean we should see a higher usage rate for a 530 total Pokemon.

Pokemon final analysis

test_combine <- pokemon_usage_df_fixed  %>% reframe(stat_group = round(total_stats, digits = -2), name = name, usage = usage) %>% aggregate(usage ~ stat_group, mean)
test_combine
##   stat_group      usage
## 1        200 0.04548513
## 2        300 0.02637072
## 3        400 0.16347696
## 4        500 1.41415392
## 5        600 6.19699620
## 6        700 5.42428733
test_combine %>% 
  ggplot() +
  geom_bar(aes(x = stat_group, y = usage), stat = 'identity') +
  labs(title = "Pokemon usage by total stat group (rounded)")

Combine each Pokemon by total_stat groups, rounded to the nearest hundred. This means 550-649 will round to 600. Notably, this is now the highest average usage group. The correlation between “usage” and “total_stats” is much more apparent. The 600 and 700 total stat groups are by far the most represented in battles with 6% and 5% respectively. The 500 total stat group is down at 1.4% now, while the smaller groups average usage rates of well below 1%. 1.4% is still quite close to the 1.6% mean. It is possible there are some extraneous factors that prevent the highest “stat_group” from being the most used of all such as a ban list that would disallow usage of certain Pokemon during specific tournaments, but those variables were not collected in this data set.

Dataset 2 - Cheese

The data set chosen is a list of different cheeses and properties about them such as taste, origin, and nutritional facts. The goal is to compare the cheeses by region and determine if there’s a trend in flavor.

Start by loading the data.

cheese_url <- 'https://raw.githubusercontent.com/Megabuster/Data607/refs/heads/main/data/project2/cheese.csv'
cheese_raw <- fread(cheese_url)
head(cheese_raw)
##                cheese                                        url   milk
##                <char>                                     <char> <char>
## 1:         Aarewasser         https://www.cheese.com/aarewasser/    cow
## 2:   Abbaye de Belloc   https://www.cheese.com/abbaye-de-belloc/  sheep
## 3:   Abbaye de Belval   https://www.cheese.com/abbaye-de-belval/    cow
## 4:  Abbaye de Citeaux  https://www.cheese.com/abbaye-de-citeaux/    cow
## 5:    Abbaye de Tamié              https://www.cheese.com/tamie/    cow
## 6: Abbaye de Timadeuc https://www.cheese.com/abbaye-de-timadeuc/    cow
##        country               region family                       type
##         <char>               <char> <char>                     <char>
## 1: Switzerland                 <NA>   <NA>                  semi-soft
## 2:      France          Pays Basque   <NA>         semi-hard, artisan
## 3:      France                 <NA>   <NA>                  semi-hard
## 4:      France             Burgundy   <NA> semi-soft, artisan, brined
## 5:      France               Savoie   <NA>              soft, artisan
## 6:      France province of Brittany   <NA>                  semi-hard
##    fat_content calcium_content               texture    rind       color
##         <char>          <char>                <char>  <char>      <char>
## 1:        <NA>            <NA>               buttery  washed      yellow
## 2:        <NA>            <NA>   creamy, dense, firm natural      yellow
## 3:      40-46%            <NA>               elastic  washed       ivory
## 4:        <NA>            <NA> creamy, dense, smooth  washed       white
## 5:        <NA>            <NA>  creamy, open, smooth  washed       white
## 6:        <NA>            <NA>                  soft  washed pale yellow
##                   flavor             aroma vegetarian  vegan
##                   <char>            <char>     <lgcl> <lgcl>
## 1:                 sweet           buttery      FALSE  FALSE
## 2:         burnt caramel          lanoline       TRUE  FALSE
## 3:                  <NA>          aromatic      FALSE  FALSE
## 4: acidic, milky, smooth barnyardy, earthy      FALSE  FALSE
## 5:         fruity, nutty perfumed, pungent      FALSE  FALSE
## 6:         salty, smooth             nutty      FALSE  FALSE
##                       synonyms                             alt_spellings
##                         <char>                                    <char>
## 1:                        <NA>                                      <NA>
## 2: Abbaye Notre-Dame de Belloc                                      <NA>
## 3:                        <NA>                                      <NA>
## 4:                        <NA>                                      <NA>
## 5:                        <NA> Tamié, Trappiste de Tamie, Abbey of Tamie
## 6:                        <NA>                                      <NA>
##                                     producers
##                                        <char>
## 1:                                       Jumi
## 2:                                       <NA>
## 3:                                       <NA>
## 4:                                       <NA>
## 5:                                       <NA>
## 6: Abbaye Cistercienne NOTRE-DAME DE TIMADEUC

Tidying region and country columns

As one of the main variables we are analyzing today, we want to see how regions look. Nearly 28% of cheeses do not have a region. There is also very little consistency regarding the way regions are listed. There are cities, states, multiple municipalities, actual regions, regions within states, contracted names, names with prepositions, etc.

sum(is.na(cheese_raw$region)) / nrow(cheese_raw) * 100
## [1] 27.96967
# Sample country for missing data twice because apparently they are not listed as "NA"
sum(is.na(cheese_raw$country)) / nrow(cheese_raw) * 100
## [1] 0
sum(cheese_raw$country == '') / nrow(cheese_raw) * 100
## [1] 0.926706

A key to this analysis is that each “region” is going to be within a country and fortunately all of those have values… or do they? Actually, there are countries that are missing, but they are listed as empty strings instead of “NA”. Over 99% of countries are available making them a reliable starting point. Let’s group the countries together and see what regions are associated with each one.

aggr_country <- aggregate(region ~ country, cheese_raw, \(cheese_raw) paste(unique(cheese_raw), collapse = ", "))
head(aggr_country)
##                                                                  country
## 1                                                                       
## 2 Albania, Bulgaria, Croatia, Greece, Israel, Macedonia, Romania, Serbia
## 3                                                              Australia
## 4                                                                Austria
## 5                                                                Belgium
## 6                                          Belgium, Germany, Netherlands
##                                                                                                                                                                                                  region
## 1                                                                                                                                 Carrigtwohill, Cornwall, North East Victoria, Veneto, Inagh, Co Clare
## 2                                                                                                                                                                                                Trakia
## 3 South Australia, Brisbane, Mornington Peninsula, Melbourne, Pokolbin, Hunter Valley, Gippsland, Victoria, Victoria, Ballarat, Victoria, Tasmania, North East Victoria, New South Wales, Hunter Valley
## 4                                                                                           Vorarlberg, Bregenzerwald, Kleinwalsertal, Großwalsertal, Laiblachtal (Pfänderstock) and Rheintal, Sulzberg
## 5                                                                                                                                                                Flanders, Wallonia, Passendale, Postel
## 6                                                                                                                                                                                      Duchy of Limburg

Actually, the countries are not tidy either. There are multiple countries grouped together and sometimes a country has a sovereign state (England as a part of the United Kingdom). Without additional contextual information, there are multiple possible paths forward. One logical assumption is that countries or regions that are co-credited for a cheese are probably either similar culturally or are neighboring entities. The flaw in that assumption is that you could argue that there are similarities between the United States and Italy and between the United States and Mexico, but that does not transitively mean that Mexico and Italy are related.

From a quick check, we find that there are 6 instances where no region or country is provided. Since we’re trying to see the impact of location on the cheese’s properties, we will remove these cheeses as we are unable to analyze them without more context.

cheese_reduced <- cheese_raw %>% filter(!(cheese_raw$country == '' & is.na(cheese_raw$region)))

cheese_longest <- cheese_reduced %>% select(cheese, country, region, flavor) %>% separate_longer_delim(country, ', ') %>% separate_longer_delim(region, ', ') %>% separate_longer_delim(flavor, ', ')
head(cheese_longest)
##              cheese     country      region        flavor
## 1        Aarewasser Switzerland        <NA>         sweet
## 2  Abbaye de Belloc      France Pays Basque burnt caramel
## 3  Abbaye de Belval      France        <NA>          <NA>
## 4 Abbaye de Citeaux      France    Burgundy        acidic
## 5 Abbaye de Citeaux      France    Burgundy         milky
## 6 Abbaye de Citeaux      France    Burgundy        smooth

Cheese final analysis

Going over the resulting “cheese_longest” dataframe, we get a spoiler regarding the fact that this is the extent that we will be lengthening this data set today. Let’s see what impacts country or region have on flavor.

distinct_country_cheese <- cheese_longest %>% filter(country != '' & !is.na(flavor)) %>% select(cheese, country, flavor) %>% distinct() 

sum_country_cheese <- distinct_country_cheese %>% 
  select(country, flavor) %>%
  group_by(country, flavor) %>%
  summarise(total = n(), across(everything(), first), .groups = 'drop')

head(sum_country_cheese %>% arrange(desc(total)))
## # A tibble: 6 × 3
##   country       flavor  total
##   <chr>         <chr>   <int>
## 1 United States sweet     100
## 2 United States creamy     89
## 3 United States nutty      77
## 4 United States tangy      76
## 5 United States buttery    65
## 6 United States mild       52
aggregate(total ~ country + flavor, data = sum_country_cheese, sum) %>% ggplot(aes(x = country, y = total, fill = flavor)) +
  geom_bar(stat = 'identity', position = 'dodge') +
  labs(title = 'Cheese flavor trends per country') +
  coord_flip()

distinct_region_cheese <- cheese_longest %>% filter(!is.na(region) & !is.na(flavor)) %>% select(cheese, region, flavor) %>% distinct() 

sum_region_cheese <- distinct_region_cheese %>% 
  select(region, flavor) %>%
  group_by(region, flavor) %>%
  summarise(total = n(), across(everything(), first), .groups = 'drop')

head(sum_region_cheese %>% arrange(desc(total)))
## # A tibble: 6 × 3
##   region    flavor  total
##   <chr>     <chr>   <int>
## 1 Wisconsin sweet      31
## 2 Wisconsin nutty      21
## 3 Quebec    creamy     20
## 4 Wisconsin creamy     19
## 5 Quebec    buttery    18
## 6 Wisconsin mild       18

This data set pushes the limits of what can be reasonably viewed within a bar plot. It appears that Canada, Italy and the United States were especially well represented in terms of flavors of cheese. Notable trends from examining the top of the country specific data is the prevalence of sweet flavors which matches a common stereotype.

There were 380 unique region names still remaining in this modified data frame, so I saved us the trouble of trying to read that plot. The region table shows that Wisconsin produces a lot of sweet cheese as well. As an American state, this corroborates the implications from the previous table.

While the final dataframes were still not perfectly tidy, we were able to see notable consistent trends when analyzing the most recurring data within the cheese data set.

Dataset 3 - Cancer Data

This data set is about breast cancer cases from 2020-2024 at Houston’s MD Anderson Cancer Center. Tidying the data will focus on making the data longer. The analysis goal will be to see how race demographics impact case totals.

Tidying the data

Start by loading the data.

cancer_url <- 'https://raw.githubusercontent.com/Megabuster/Data607/refs/heads/main/data/project2/Untidydataset%20_624.csv'
cancer_raw <- fread(cancer_url)
head(cancer_raw)
##                                          V1       V2        V3       V4
##                                      <char>   <char>    <char>   <char>
## 1: Untidy Breast Cancer Dataset (2020-2024)                            
## 2:                               Patient_ID    Month Age_Group     Race
## 3:                                     P001  January     30-39    White
## 4:                                     P002  January     40-49    Black
## 5:                                     P003  January     50-59 Hispanic
## 6:                                     P004 February     30-39    Asian
##            V5         V6         V7         V8         V9            V10
##        <char>     <char>     <char>     <char>     <char>         <char>
## 1:                                                                      
## 2: Cases_2020 Cases_2021 Cases_2022 Cases_2023 Cases_2024 Total_Patients
## 3:         50         55         60         65         70            300
## 4:         90         95        100        105        120            510
## 5:         80         85         88         90        110            453
## 6:         70         75         80         85         90            400
##                V11
##             <char>
## 1:                
## 2:            Type
## 3: Invasive Ductal
## 4:         Lobular
## 5: Triple-Negative
## 6:           HER2+

This data set was converted from xlsx to csv type to streamline it with the other data sets. Inadvertently, this has led to the form of the data getting untidier than before. We will need to remove the first two rows and make row 3 the new column names.

names(cancer_raw) <- unlist(cancer_raw[2])
cancer_fixed_col <- cancer_raw[-c(1:2),]
head(cancer_fixed_col)
##    Patient_ID    Month Age_Group     Race Cases_2020 Cases_2021 Cases_2022
##        <char>   <char>    <char>   <char>     <char>     <char>     <char>
## 1:       P001  January     30-39    White         50         55         60
## 2:       P002  January     40-49    Black         90         95        100
## 3:       P003  January     50-59 Hispanic         80         85         88
## 4:       P004 February     30-39    Asian         70         75         80
## 5:       P005 February     40-49    White        110        115        120
## 6:       P006    March     50-59    Black        130        135        140
##    Cases_2023 Cases_2024 Total_Patients            Type
##        <char>     <char>         <char>          <char>
## 1:         65         70            300 Invasive Ductal
## 2:        105        120            510         Lobular
## 3:         90        110            453 Triple-Negative
## 4:         85         90            400           HER2+
## 5:        125        140            610    Inflammatory
## 6:        145        160            710 Invasive Ductal

With the column names now correctly assigned, let’s remove “total_patients” as it will be useless for us after we lengthen the data. Then we will use the “melt” function to extract the years into a column with the values being another column. Clean up the new “Year” column by removing “Cases_” and making the resulting years into numeric values.

cancer_fixed_col$Total_Patients <- NULL
head(cancer_fixed_col)
##    Patient_ID    Month Age_Group     Race Cases_2020 Cases_2021 Cases_2022
##        <char>   <char>    <char>   <char>     <char>     <char>     <char>
## 1:       P001  January     30-39    White         50         55         60
## 2:       P002  January     40-49    Black         90         95        100
## 3:       P003  January     50-59 Hispanic         80         85         88
## 4:       P004 February     30-39    Asian         70         75         80
## 5:       P005 February     40-49    White        110        115        120
## 6:       P006    March     50-59    Black        130        135        140
##    Cases_2023 Cases_2024            Type
##        <char>     <char>          <char>
## 1:         65         70 Invasive Ductal
## 2:        105        120         Lobular
## 3:         90        110 Triple-Negative
## 4:         85         90           HER2+
## 5:        125        140    Inflammatory
## 6:        145        160 Invasive Ductal
cancer_melt <- cancer_fixed_col %>% melt(id.vars = c(colnames(cancer_fixed_col)[1:4],colnames(cancer_fixed_col)[10]), variable.name = 'Year', value.name = 'Cases')
head(cancer_melt)
##    Patient_ID    Month Age_Group     Race            Type       Year  Cases
##        <char>   <char>    <char>   <char>          <char>     <fctr> <char>
## 1:       P001  January     30-39    White Invasive Ductal Cases_2020     50
## 2:       P002  January     40-49    Black         Lobular Cases_2020     90
## 3:       P003  January     50-59 Hispanic Triple-Negative Cases_2020     80
## 4:       P004 February     30-39    Asian           HER2+ Cases_2020     70
## 5:       P005 February     40-49    White    Inflammatory Cases_2020    110
## 6:       P006    March     50-59    Black Invasive Ductal Cases_2020    130
cancer_melt <- cancer_melt %>% mutate(Year = as.numeric(str_remove(Year, 'Cases_')))
cancer_melt <- cancer_melt %>% mutate(Cases = as.numeric(Cases))
head(cancer_melt)
##    Patient_ID    Month Age_Group     Race            Type  Year Cases
##        <char>   <char>    <char>   <char>          <char> <num> <num>
## 1:       P001  January     30-39    White Invasive Ductal  2020    50
## 2:       P002  January     40-49    Black         Lobular  2020    90
## 3:       P003  January     50-59 Hispanic Triple-Negative  2020    80
## 4:       P004 February     30-39    Asian           HER2+  2020    70
## 5:       P005 February     40-49    White    Inflammatory  2020   110
## 6:       P006    March     50-59    Black Invasive Ductal  2020   130

Cancer final analysis

With the data better formatted, let’s take advantage of having the cases and year properly split. Let’s see how many cancer cases each race had from 2020-2024.

aggregate(Cases ~ Race + Year, data = cancer_melt, sum)
##        Race Year Cases
## 1     Asian 2020   140
## 2     Black 2020   310
## 3  Hispanic 2020   300
## 4     White 2020   220
## 5     Asian 2021   150
## 6     Black 2021   325
## 7  Hispanic 2021   315
## 8     White 2021   235
## 9     Asian 2022   158
## 10    Black 2022   340
## 11 Hispanic 2022   328
## 12    White 2022   250
## 13    Asian 2023   165
## 14    Black 2023   355
## 15 Hispanic 2023   340
## 16    White 2023   265
## 17    Asian 2024   190
## 18    Black 2024   400
## 19 Hispanic 2024   390
## 20    White 2024   295
aggregate(Cases ~ Race + Year, data = cancer_melt, sum) %>% ggplot(aes(x = Year, y = Cases, fill = Race)) +
  geom_bar(stat = 'identity', position = 'dodge') +
  labs(title = 'Cancer rate from 2020-2024 by race')

Nothing looks out of place in this chart. The total cancer cases for each race increased gradually each year. This trend did not affect the ratio of cancer rates for each race against each other. Asians had the lowest rate while Blacks had the highest.

Findings and Recommendations

Not all untidy data look the same. Many similar techniques were used to tidy each data set, but different nuances were still required.

The Pokemon data set was a good candidate for a wide to long transformation. Since the goal was to compare “total_stats” and “usage” to determine if there was a correlation, the focus was on tidying the “usage” column. A choice was made to remove any data listed as “NoUsage” because through further analysis, there was not enough information to decide if a technique like imputation was better served. The need to analyze the data before determining a tidying strategy was a unique feature of this data set. Further work on the data could include breaking down the formats into specific years and tournaments to see time series trends.

The hardest data set to work with was the cheese data as without more context beyond what the data set provided, it was nearly impossible to account for every exception. Concessions had to be made when tidying because too many manual fixes and assumptions needed to be made. A good lesson from this work was that despite the issues, trend data still stood out. As more nuanced trends would be hard to detect, further work would be introducing geographic data sets that can help with cleaning and categorizing the countries and regions.

The main takeaway from working with the cancer data set was the importance of practice. It was the last data set to be tidied and that felt evident as many patterns were quicker to find. Further work could be to incorporate the other predictors.

LS0tDQp0aXRsZTogIkRhdGEgNjA3IC0gUHJvamVjdCAyIg0KYXV0aG9yOiAiTGF3cmVuY2UgWXUiDQpkYXRlOiAiYHIgU3lzLkRhdGUoKWAiDQpvdXRwdXQ6IG9wZW5pbnRybzo6bGFiX3JlcG9ydA0KLS0tDQoNCmBgYHtyIGxvYWQtcGFja2FnZXMsIG1lc3NhZ2U9RkFMU0V9DQpsaWJyYXJ5KHRpZHl2ZXJzZSkNCmxpYnJhcnkob3BlbmludHJvKQ0KbGlicmFyeShkcGx5cikNCmxpYnJhcnkoZGF0YS50YWJsZSkNCmxpYnJhcnkoYXJzZW5hbCkNCmBgYA0KDQojIyBPdmVydmlldw0KVGhlIHB1cnBvc2Ugb2YgdGhpcyBsYWIgaXMgdG8gdGlkeSBhbmQgZGVtb25zdHJhdGUgYW4gYW5hbHlzaXMgb2YgdGhyZWUgZGlmZmVyZW50IGRhdGEgc2V0cy4gVGhlIGRhdGEgc2V0cyBjaG9zZW4gd2VyZSB0aGUgUG9rZW1vbiwgY2hlZXNlLCBhbmQgY2FuY2VyIGRhdGEgc2V0cyBmcm9tIHRoZSBXZWVrIDUgRGlzY3Vzc2lvbi4gDQoNCiMjIERhdGEgc2V0IDEgLSBQb2tlbW9uIENvbXBldGl0aXZlIFVzYWdlDQoNClRoZSBnb2FsIGZvciB0aGUgUG9rZW1vbiBkYXRhIHNldCBpcyB0byB1c2UgdGhlIHVzYWdlIHZhbHVlcyBhcyB0YXJnZXQgdmFyaWFibGVzLiBJbiB0aGVvcnksIGFsbW9zdCBhbGwgb2YgYSBQb2tlbW9uJ3MgcHJvcGVydGllcyBjYW4gd29yayBhcyBhIHByZWRpY3RvciB2YXJpYWJsZTsgZXZlbiBhIFBva2Vtb24ncyBuYW1lIGFuZCBnZW5lcmF0aW9uIGNhbiBhZmZlY3QgaXRzIHVzYWdlIHJhdGUuIEZvciBvdXIgYW5hbHlzaXMsIHdlIHdlIGZvY3VzIG9uICJ0b3RhbF9zdGF0cyIgYW5kICJ1c2FnZSIuDQoNCiMjIyBMb2FkIGFuZCBleGFtaW5lIGRhdGENCg0KTG9hZCB0aGUgUG9rZW1vbiBjb21wZXRpdGl2ZSB1c2FnZSBkYXRhIHNldC4gVGhlcmUgYXJlIG11bHRpcGxlIGNvbHVtbnMgaW5jbHVkaW5nIGEgUG9rZW1vbidzIG5hbWUsIGdlbmVyYWwgcHJvcGVydGllcywgYW5kIHVzYWdlIGluZm9ybWF0aW9uLg0KDQpgYGB7ciBnZXQtcG9rZS1kYXRhfQ0KcG9rZW1vbl91c2FnZV91cmwgPC0gJ2h0dHBzOi8vcmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbS9NZWdhYnVzdGVyL0RhdGE2MDcvcmVmcy9oZWFkcy9tYWluL2RhdGEvcHJvamVjdDIvcG9rZW1vbl9jb21wZXRpdGl2ZV9hbmFseXNpcy5jc3YnDQpwb2tlbW9uX3VzYWdlX3JhdyA8LSBmcmVhZChwb2tlbW9uX3VzYWdlX3VybCkNCmhlYWQocG9rZW1vbl91c2FnZV9yYXcpDQpgYGANCg0KIyMjIFdpZGUgdG8gbG9uZyBkYXRhIA0KDQpNb3N0IG9mIHRoZSBjb2x1bW5zIGFyZSBiYXNlIGZlYXR1cmVzIG9mIHRoZSBsaXN0ZWQgUG9rZW1vbiwgYnV0IHRoZSBsYXN0IDYgYXJlIFZHQyBkYXRhIHdoaWNoIGRlbm90ZXMgdGhlIGNvbXBldGl0aXZlIGZvcm1hdCB1c2FnZS4gU2luY2UgdGhlc2UgYXJlIG9ic2VydmF0aW9ucywgdGhleSBzaG91bGQgbm90IGJlIG1peGVkIHdpdGggdGhlIG90aGVyIHZhcmlhYmxlcy4gVGhpcyBkYXRhc2V0IGlzIGEgc3Ryb25nIGNhbmRpZGF0ZSBmb3IgY29udmVydGluZyBmcm9tIHdpZGUgdG8gbG9uZyBmb3JtYXQuIFVzZSAibWVsdCIgdG8gY29tYmluZSBhbGwgdGhlIFZHQyBjb2x1bW5zIHRvZ2V0aGVyLiANCmBgYHtyIHdpZGUtdG8tbG9uZ30NCnBva2Vtb25fdXNhZ2VfbG9uZyA8LSBwb2tlbW9uX3VzYWdlX3JhdyAlPiUgbWVsdChpZC52YXJzID0gYyhjb2xuYW1lcyhwb2tlbW9uX3VzYWdlX3JhdylbMToxN10pLCB2YXJpYWJsZS5uYW1lID0gJ2NvbXBldGl0aXZlX2Zvcm1hdCcsIHZhbHVlLm5hbWUgPSAndXNhZ2UnKQ0KaGVhZChwb2tlbW9uX3VzYWdlX2xvbmcpDQpgYGANCg0KIyMjIEFkZHJlc3Npbmcgb2RkaXRpZXMgaW4gdGhlIGRhdGENCg0KRXZlcnkgcm93IGlzIGNvbXBsZXRlbHkgZmlsbGVkLCBidXQgY2VydGFpbiBjZWxscyBhcmUgZWZmZWN0aXZlbHkgIk5BIiBkYXRhOiAiTm9fdHlwZSIgaW4gInR5cGUyIiwgIk5vX2FiaWxpdHkiIGluICJhYmlsaXR5MiIsICJOb25lIiBpbiAiaGlkZGVuX2FiaWxpdHkiLCBhbmQgIk5vVXNhZ2UiIGluICJ1c2FnZSIuIA0KDQpBIFBva2Vtb24gYWx3YXlzIGhhcyBhICJ0eXBlMSIsIGJ1dCBtaWdodCBub3QgaGF2ZSBhICJ0eXBlMiIuIFR3byBQb2tlbW9uIHdpdGggdGhlIHNhbWUgdHdvIHR5cGVzIGluIG9wcG9zaXRlIG9yZGVycyBzdWNoIGFzICJncmFzcyIgYW5kICJwb2lzb24iIG9yICJwb2lzb24iIGFuZCAiZ3Jhc3MiIGFyZSBnZW5lcmFsbHkgdGhlIHNhbWUgZm9yIG91ciBwdXJwb3Nlcy4gDQoNCkZvciAiTm9fYWJpbGl0eSIgYW5kICJOb25lIiwgImFiaWxpdHkxIiBpcyBhbHdheXMgc2V0IGFuZCBpcyB0aGUgZGVmYXVsdCBpZiBubyBvdGhlciBhYmlsaXRpZXMgZXhpc3QuIFVuZm9ydHVuYXRlbHksIHRoZXJlIGlzIG5vIGluZm9ybWF0aW9uIHdpdGhpbiB0aGUgZGF0YXNldCB0aGF0IGRldGVybWluZXMgd2hpY2ggYWJpbGl0eSBpcyBiZWluZyB1c2VkIGFuZCBob3cgb2Z0ZW4uIEZvciB0aGUgc2NvcGUgb2YgdGhpcyBhbmFseXNpcywgdGhlc2UgY29sdW1ucyB3aWxsIG5vdCBiZSBjaGFuZ2VkLiANCg0KSW4gInVzYWdlIiwgIk5vVXNhZ2UiIHN0YW5kcyBvdXQgYmVjYXVzZSB0aGF0IHNob3VsZCBiZSB0aGUgc2FtZSBhcyAwJSB1c2FnZS4gQXMgMCUgZG9lcyBleGlzdCB3aXRoaW4gdGhlICJ1c2FnZSIgY29sdW1uIGFzICIwLjAiLCB3aGF0IGRvZXMgIk5vVXNhZ2UiIGFjdHVhbGx5IG1lYW4/IFdlIG1heSBiZSBhYmxlIHRvIGFuc3dlciB0aGlzLCBhbmQgdGh1cyBkZWNpZGUgaG93IHRvIHByb2NlZWQgd2l0aCB0aWR5aW5nLCBieSBhbmFseXppbmcgd2hlbiB3ZSBzZWUgZWFjaCB2YWx1ZS4gV2Ugd2lsbCBleGFtaW5lIGRhdGEgZm9yIGFsbCBQb2tlbW9uLCB1c2VkIChhbnkgbnVtYmVyIGFib3ZlIDAlKSwgbm8gdXNhZ2UgKCJOb1VzYWdlIiB2YWx1ZSksIGFuZCAwJSB1c2FnZS4NCg0KYGBge3Igbm91c2FnZS12cy16ZXJvfQ0KdXNlZF9kZiA8LSBwb2tlbW9uX3VzYWdlX2xvbmcgJT4lIGZpbHRlcih1c2FnZSAhPSAnMC4wJyAmIHVzYWdlICE9ICdOb1VzYWdlJykNCm5vX3VzYWdlX2RmIDwtIHBva2Vtb25fdXNhZ2VfbG9uZyAlPiUgZmlsdGVyKHVzYWdlID09ICdOb1VzYWdlJykNCnplcm9fdXNhZ2VfZGYgPC0gcG9rZW1vbl91c2FnZV9sb25nICU+JSBmaWx0ZXIodXNhZ2UgPT0gJzAuMCcpDQphbGxfc3VtbWFyIDwtIHBva2Vtb25fdXNhZ2VfbG9uZyAlPiUgc3VtbWFyaXNlKG1lYW4gPSBtZWFuKHRvdGFsX3N0YXRzKSwgbWVkaWFuID0gbWVkaWFuKHRvdGFsX3N0YXRzKSkNCnVzZWRfc3VtbWFyIDwtIHVzZWRfZGYgJT4lIHN1bW1hcmlzZShtZWFuID0gbWVhbih0b3RhbF9zdGF0cyksIG1lZGlhbiA9IG1lZGlhbih0b3RhbF9zdGF0cykpDQpub191c2FnZV9zdW1tYXIgPC0gbm9fdXNhZ2VfZGYgJT4lIHN1bW1hcmlzZShtZWFuID0gbWVhbih0b3RhbF9zdGF0cyksIG1lZGlhbiA9IG1lZGlhbih0b3RhbF9zdGF0cykpDQp6ZXJvX3VzYWdlX3N1bW1hciA8LSB6ZXJvX3VzYWdlX2RmICU+JSBzdW1tYXJpc2UobWVhbiA9IG1lYW4odG90YWxfc3RhdHMpLCBtZWRpYW4gPSBtZWRpYW4odG90YWxfc3RhdHMpKQ0KdXNhZ2VfbWVhbnMgPC0gYyhhbGxfc3VtbWFyJG1lYW4sIHVzZWRfc3VtbWFyJG1lYW4sIG5vX3VzYWdlX3N1bW1hciRtZWFuLCB6ZXJvX3VzYWdlX3N1bW1hciRtZWFuKQ0KdXNhZ2VfbWVkaWFucyA8LSBjKGFsbF9zdW1tYXIkbWVkaWFuLCB1c2VkX3N1bW1hciRtZWRpYW4sIG5vX3VzYWdlX3N1bW1hciRtZWRpYW4sIHplcm9fdXNhZ2Vfc3VtbWFyJG1lZGlhbikNCnVzYWdlX25hbWVzIDwtIGMoJ2FsbCcsICd1c2VkJywgJ25vX3VzYWdlJywgJ3plcm9fdXNhZ2UnKQ0KYmFycGxvdCh1c2FnZV9tZWFucywgbmFtZXMuYXJnID0gdXNhZ2VfbmFtZXMsIHlsYWIgPSAnYXZnX3RvdGFsX3N0YXRzJykNCg0KZGF0YS5mcmFtZSgNCiAgdXNhZ2VfdHlwZSA9IHVzYWdlX25hbWVzLA0KICB1c2FnZV9tZWFuID0gdXNhZ2VfbWVhbnMsDQogIHVzYWdlX21lZGlhbiA9IHVzYWdlX21lZGlhbnMNCikNCmBgYA0KVGhlICJ0b3RhbF9zdGF0cyIgcHJvcGVydHkgaXMgYSBjb2xsZWN0aW9uIG9mIGFsbCB0aGUgYmFzZSBzdGF0cyBvZiBhIFBva2Vtb24gcmFuZ2luZyBmcm9tICJocCIgdG8gInNwZWVkIi4gTG9naWNhbGx5LCBhIFBva2Vtb24gdGhhdCBpcyBzdHJvbmdlciwgb25lIHRoYXQgaGFzIGhpZ2hlciBzdGF0cywgaXMgbW9yZSBsaWtlbHkgdG8gYmUgdXNlZC4gMCUgdXNhZ2UgUG9rZW1vbiBoYXZpbmcgdGhlIGxvd2VzdCB0b3RhbCBzdGF0cyBpbiB0aGUgMzAwJ3MgbWFrZXMgc2Vuc2UsIGJ1dCAibm9fdXNhZ2UiIGlzIGNsb3NlIHRvIHRoZSBhdmVyYWdlIG9mIGFsbCBQb2tlbW9uIGluIHRoZSBvcmlnaW5hbCBkYXRhc2V0LiBUaGUgbWVkaWFucyB3ZXJlIGFsc28gY2FsY3VsYXRlZCBhcyBkZXNwaXRlIGJlaW5nIG51bWJlcnMsICJ0b3RhbF9zdGF0cyIgYXJlIGFjdHVhbGx5IGRpc2NyZXRlIGFuZCBub3QgY29udGludW91cyB2YXJpYWJsZXMuIFBva2Vtb24gYXJlIGRpc3RpbmN0bHkgZ2l2ZW4gdG90YWxzIHN1Y2ggYXMgNDcwIG9yIDUwNS4gVGhlIHVzZWQgc2V0IG9mIFBva2Vtb24gaGF2ZSB0aGUgaGlnaGVzdCBtZWRpYW4gYXQgNTA1Lg0KDQpUaGVyZSBhcmUgbWFueSBwb3RlbnRpYWwgZGlyZWN0aW9ucyBhdCB0aGlzIGp1bmN0dXJlLiBXaGlsZSB0aGUgb3JpZ2luYWwgcXVlc3Rpb24gd2FzIHdoYXQgZWZmZWN0IGRvZXMgInRvdGFsX3N0YXRzIiBoYXZlIG9uICJ1c2FnZSIsIGEgc3ViZ29hbCBpcyB0byB0aWR5IHRoZSAiTm9Vc2FnZSIgdmFsdWVzLiBPbmUgcGF0aCBmb3J3YXJkIGlzIHRvIHNlZSB0aGUgZGlzdHJpYnV0aW9uIG9mIGVhY2ggZ3JvdXAncyAidG90YWxfc3RhdHMiIHRvIHNlZSBpZiB0aGV5IGhhdmUgYW55IG9kZCBkaXN0cmlidXRpb25zLg0KYGBge3IgdXNhZ2UtZnJlcXVlbmN5LXBsb3RzfQ0KcG9rZW1vbl91c2FnZV9sb25nICU+JQ0KICBzZWxlY3QoYygnbmFtZScsICd0b3RhbF9zdGF0cycpKSAlPiUgDQogIHVuaXF1ZSgpICU+JQ0KICBnZ3Bsb3QoYWVzKHggPSB0b3RhbF9zdGF0cykpICsNCiAgZ2VvbV9oaXN0b2dyYW0oYmlucyA9IDE1LCBiaW53aWR0aCA9IDIwKSArDQogIGxhYnModGl0bGUgPSAnQWxsIFBva2Vtb24nKQ0KdXNlZF9kZiAlPiUNCiAgc2VsZWN0KGMoJ25hbWUnLCAndG90YWxfc3RhdHMnKSkgJT4lIA0KICB1bmlxdWUoKSAlPiUNCiAgZ2dwbG90KGFlcyh4ID0gdG90YWxfc3RhdHMpKSArDQogIGdlb21faGlzdG9ncmFtKGJpbnMgPSAxNSwgYmlud2lkdGggPSAyMCkgKw0KICBsYWJzKHRpdGxlID0gJ1VzZWQgUG9rZW1vbicpDQpub191c2FnZV9kZiAlPiUNCiAgc2VsZWN0KGMoJ25hbWUnLCAndG90YWxfc3RhdHMnKSkgJT4lIA0KICB1bmlxdWUoKSAlPiUNCiAgZ2dwbG90KGFlcyh4ID0gdG90YWxfc3RhdHMpKSArDQogIGdlb21faGlzdG9ncmFtKGJpbnMgPSAxNSwgYmlud2lkdGggPSAyMCkgKw0KICBsYWJzKHRpdGxlID0gJ05vIFVzYWdlIFBva2Vtb24nKQ0KemVyb191c2FnZV9kZiAlPiUNCiAgc2VsZWN0KGMoJ25hbWUnLCAndG90YWxfc3RhdHMnKSkgJT4lIA0KICB1bmlxdWUoKSAlPiUNCiAgZ2dwbG90KGFlcyh4ID0gdG90YWxfc3RhdHMpKSArDQogIGdlb21faGlzdG9ncmFtKGJpbnMgPSAxNSwgYmlud2lkdGggPSAyMCkgKw0KICBsYWJzKHRpdGxlID0gJ1plcm8gVXNhZ2UgUG9rZW1vbicpDQpgYGANCg0KVGhlc2UgaGlzdG9ncmFtcyBhZ3JlZSB3aXRoIHdoYXQgdGhlIGF2ZXJhZ2VzIGltcGxpZWQuIFVzZWQgUG9rZW1vbiB1c3VhbGx5IGhhZCBhdCBsZWFzdCBvdmVyIDUwMCB0b3RhbCBzdGF0cy4gWmVybyB1c2FnZSBQb2tlbW9uIHBlYWtlZCBhdCBhcm91bmQgMzAwIHRvdGFsIHN0YXRzLiBUaGUgcGxvdCBvZiBhbGwgUG9rZW1vbiBhbmQgbm8gdXNhZ2UgUG9rZW1vbiBhcmUgc29tZWhvdyBzdGlsbCB2ZXJ5IHNpbWlsYXIuIE9uZSBtb3JlIHRlc3QgY2FuIGJlIGRvbmUgdG8gdmVyaWZ5IHRoaXMgZmluZGluZyBpcyB0aGUgImNvbXBhcmVmIiBmdW5jdGlvbiB0aGF0IHdpbGwgc2hvdyBob3cgbWFueSBvYnNlcnZhdGlvbnMsIGluZGl2aWR1YWwgUG9rZW1vbiwgYXJlIG5vdCBpbiB0aGUgbm8gdXNhZ2UgZ3JvdXAuDQoNCmBgYHtyIGFsbC12cy1uby11c2FnZX0NCmNvbXBhcmVkZihwb2tlbW9uX3VzYWdlX2xvbmcgJT4lIHNlbGVjdChjKCduYW1lJywgJ3RvdGFsX3N0YXRzJykpICU+JSB1bmlxdWUoKSwgDQogICAgICAgICAgbm9fdXNhZ2VfZGYgJT4lIHNlbGVjdChjKCduYW1lJywgJ3RvdGFsX3N0YXRzJykpICU+JSB1bmlxdWUoKSkNCmBgYA0KV2l0aCB0aGlzIHJlc3VsdCB0aGF0IDEyNzIvMTMwMyBQb2tlbW9uIG1hdGNoZWQsIGl0IGFwcGVhcnMgdGhhdCAidG90YWxfc3RhdHMiIGhhcyBhbG1vc3Qgbm8gY29ycmVsYXRpb24gd2l0aCB0aGUgIk5vVXNhZ2UiIHZhbHVlLiBBbG1vc3QgZXZlcnkgUG9rZW1vbiBoYWQgYSBjb21wZXRpdGl2ZSBmb3JtYXQgd2hlcmUgaXQgaGFkICJOb1VzYWdlIiByZWdhcmRsZXNzIG9mIGl0cyBzdGF0cy4gR2l2ZW4gdGhlIG9yaWdpbmFsIHF1ZXN0aW9uIG9mIHdoZXRoZXIgb3Igbm90ICJ0b3RhbF9zdGF0cyIgaW1wYWN0ZWQgdXNhZ2UsIHdlIHdpbGwgcmVtb3ZlIGluc3RhbmNlcyBvZiAiTm9Vc2FnZSIgYXMgdGhleSBhcmUgbWVyZWx5IG5vaXNlIGZvciBvdXIgcHVycG9zZS4gVGhpcyBmZWVscyBydWRpbWVudGFyeSwgYnV0ICJOb1VzYWdlIiBjbGVhcmx5IGRvZXMgbm90IG1lYW4gMCUgdXNhZ2UgYW5kIGNhbm5vdCBiZSB0cnVzdGVkIHRvIHByb3ZpZGUgbWVhbmluZ2Z1bCBpbmZvcm1hdGlvbi4NCg0KVGhpcyBsZWFkcyB0byB0aGUgbmV4dCBpc3N1ZSB3aXRoIHRoZSBvcmlnaW5hbCBkYXRhLiBUaGUgInVzYWdlIiBjb2x1bW4gdHJlYXRzIGl0cyB2YWx1ZXMgYXMgc3RyaW5ncyBpbnN0ZWFkIG9mIG51bWJlcnMgYmVjYXVzZSBvZiAiTm9Vc2FnZSIgYW5kIHRoZSBzbWFsbCB2YWx1ZXMgdGhhdCB1c2UgImUiIGluIHRoZW0uIFdlIGNhbiBzdGFydCBieSBtYWtpbmcgYSBkYXRhIGZyYW1lIHdpdGhvdXQgIk5vVXNhZ2UiIGluIGl0LiBUaGUgcmVzdWx0aW5nICJ1c2FnZSIgY29sdW1uIGlzIHN0aWxsIGEgY2hhcmFjdGVyIHR5cGUuIFR1cm4gdGhlIHVzYWdlIHZhbHVlcyBpbnRvIG51bWVyaWNzIHNvIHRoYXQgdGhleSBjYW4gYmUgb3BlcmF0ZWQgb24uICAgDQoNCmBgYHtyIHRpZHktdXNhZ2UtY29sdW1ufQ0KcG9rZW1vbl91c2FnZV9kZiA8LSBwb2tlbW9uX3VzYWdlX2xvbmcgJT4lIGZpbHRlcih1c2FnZSAhPSAnTm9Vc2FnZScpDQojIHR5cGVvZihwb2tlbW9uX3VzYWdlX2RmJHVzYWdlKQ0KcG9rZW1vbl91c2FnZV9kZl9maXhlZCA8LSB0cmFuc2Zvcm0ocG9rZW1vbl91c2FnZV9kZiwgdXNhZ2UgPSBhcy5udW1lcmljKHVzYWdlKSkNCmhlYWQocG9rZW1vbl91c2FnZV9kZl9maXhlZCkNCmBgYA0KDQpUaGUgbWFpbiBxdWVzdGlvbiB3YXMgdGhlIGltcGFjdCBvZiAidG90YWxfc3RhdHMiIG9uICJ1c2FnZSIuIExldCdzIHBsb3QgdGhlIGNvcnJlbGF0aW9uIGJldHdlZW4gdGhlc2UgdmFsdWVzLg0KDQpgYGB7ciB1c2FnZS10b3RhbC1zdGF0cy1wbG90fQ0KcG9rZW1vbl91c2FnZV9kZl9maXhlZCAlPiUgDQogIGdncGxvdChhZXMoeCA9IHRvdGFsX3N0YXRzLCB5ID0gdXNhZ2UpKSArDQogIGdlb21fcG9pbnQoKSArDQogIGdlb21fc21vb3RoKCkNCnBva2Vtb25fdXNhZ2VfZGZfZml4ZWQgJT4lIA0KICBmaWx0ZXIodXNhZ2UgPiAxKSAlPiUNCiAgZ2dwbG90KGFlcyh4ID0gdG90YWxfc3RhdHMsIHkgPSB1c2FnZSkpICsNCiAgZ2VvbV9wb2ludCgpICsNCiAgZ2VvbV9zbW9vdGgoKQ0KYGBgDQoNClRoZXJlIGRvZXMgYXBwZWFyIHRvIGJlIGFuIGVmZmVjdCBvbiB1c2FnZSBieSB0b3RhbF9zdGF0cy4gSG93ZXZlciwgbW9zdCB1c2FnZSByYXRlcyBhcmUgc3RpbGwgY2x1c3RlcmVkIG5lYXIgMCUgZXZlbiBhbW9uZyB0aGUgUG9rZW1vbiB3aXRoIGhpZ2hlciB0b3RhbF9zdGF0cy4gVGhlIGhpZ2hlc3QgdXNhZ2UgcmF0ZSBQb2tlbW9uIGF0IGFyb3VuZCA2MCUgdXNhZ2UgYW5kIGhpZ2hlciBhcmUgZ2VuZXJhbGx5IGluIHRoZSA1MDBzLiBTZXR0aW5nICJ1c2FnZSIgYWJvdmUgMSUgdG8gcmVtb3ZlIHRoZSBpbXBhY3Qgb2YgYmFyZWx5IHVzZWQgUG9rZW1vbiBzaG93cyBhIHBlYWsgdXNhZ2Ugb2YgUG9rZW1vbiBhcm91bmQgNTUwIHRvdGFsIHN0YXRzLiBUaGlzIGltcGxpZXMgdGhhdCBvdGhlciBwcmVkaWN0b3JzIGFyZSBhbHNvIGFmZmVjdGluZyB1c2FnZS4gDQoNClRoZSBhdmVyYWdlIHVzYWdlIHBlcmNlbnRhZ2Ugb2YgYSBQb2tlbW9uIGlzIDEuNiUuIEtlZXAgdGhpcyBpbiBtaW5kIGZvciB0aGUgZmluYWwgcGxvdC4NCg0KYGBge3IgdXNhZ2UtYWdncmVnYXRlZC1tZWFufQ0KcG9rZW1vbl91c2FnZV9kZl9maXhlZCAlPiUgc3VtbWFyaXNlKG1lYW4gPSBtZWFuKHVzYWdlKSkNCmFnZ3JfZGYgPC0gYWdncmVnYXRlKHVzYWdlIH4gdG90YWxfc3RhdHMsIGRhdGEgPSBwb2tlbW9uX3VzYWdlX2RmX2ZpeGVkLCBtZWFuKQ0KYWdncl9kZiAlPiUNCiAgZmlsdGVyKCkgJT4lDQogIGdncGxvdChhZXMoeCA9IHRvdGFsX3N0YXRzLCB5ID0gdXNhZ2UpKSArDQogIGdlb21fcG9pbnQoKSArDQogIGdlb21fc21vb3RoKCkNCmBgYA0KDQpUaGUgaW5kaXZpZHVhbCBQb2tlbW9uIHRoYXQgd2VyZSB1c2VkIGEgbG90IHNrZXcgdGhlIHJlc3VsdHMgaGVhdmlseS4gSG93ZXZlciwgdGhlIHRyZW5kIGhlcmUgc2VlbXMgdG8gZml0IHdoYXQgd2UgaGFkIGFzc3VtZWQuIFRoZSBhdmVyYWdlIHVzYWdlIG9mIGxvd2VyICJ0b3RhbF9zdGF0IiBQb2tlbW9uIGlzIGV4dHJlbWVseSBsb3cgY29tcGFyZWQgdG8gdGhlIGhpZ2hlciBvbmVzLiBXZSBjYW4gdHJ5IHRvIHJlZ3JvdXAgdGhlIGRhdGEgaW50byBzdGF0IGdyb3VwcyB0byByZWR1Y2UgdGhlIGltcGFjdCBvZiBpbmRpdmlkdWFsIFBva2Vtb24gb3V0bGllcnMuIFRoaXMgY2F0ZWdvcml6YXRpb24gaXMgYmVjYXVzZSAidG90YWxfc3RhdCIgaXMgZGlzY3JldGUgYW5kIG5vdCBjb250aW51b3VzLiBBIHNpbmdsZSBQb2tlbW9uIHdpdGggYSA1MjUgc3RhdCB0b3RhbCBhbmQgNTAlIHVzYWdlIGRvZXMgbm90IGF1dG9tYXRpY2FsbHkgbWVhbiB3ZSBzaG91bGQgc2VlIGEgaGlnaGVyIHVzYWdlIHJhdGUgZm9yIGEgNTMwIHRvdGFsIFBva2Vtb24uDQoNCiMjIyBQb2tlbW9uIGZpbmFsIGFuYWx5c2lzIA0KDQpgYGB7ciB1c2FnZS1ieS1zdGF0LWdyb3VwfQ0KdGVzdF9jb21iaW5lIDwtIHBva2Vtb25fdXNhZ2VfZGZfZml4ZWQgICU+JSByZWZyYW1lKHN0YXRfZ3JvdXAgPSByb3VuZCh0b3RhbF9zdGF0cywgZGlnaXRzID0gLTIpLCBuYW1lID0gbmFtZSwgdXNhZ2UgPSB1c2FnZSkgJT4lIGFnZ3JlZ2F0ZSh1c2FnZSB+IHN0YXRfZ3JvdXAsIG1lYW4pDQp0ZXN0X2NvbWJpbmUNCnRlc3RfY29tYmluZSAlPiUgDQogIGdncGxvdCgpICsNCiAgZ2VvbV9iYXIoYWVzKHggPSBzdGF0X2dyb3VwLCB5ID0gdXNhZ2UpLCBzdGF0ID0gJ2lkZW50aXR5JykgKw0KICBsYWJzKHRpdGxlID0gIlBva2Vtb24gdXNhZ2UgYnkgdG90YWwgc3RhdCBncm91cCAocm91bmRlZCkiKQ0KYGBgDQoNCkNvbWJpbmUgZWFjaCBQb2tlbW9uIGJ5IHRvdGFsX3N0YXQgZ3JvdXBzLCByb3VuZGVkIHRvIHRoZSBuZWFyZXN0IGh1bmRyZWQuIFRoaXMgbWVhbnMgNTUwLTY0OSB3aWxsIHJvdW5kIHRvIDYwMC4gTm90YWJseSwgdGhpcyBpcyBub3cgdGhlIGhpZ2hlc3QgYXZlcmFnZSB1c2FnZSBncm91cC4gVGhlIGNvcnJlbGF0aW9uIGJldHdlZW4gInVzYWdlIiBhbmQgInRvdGFsX3N0YXRzIiBpcyBtdWNoIG1vcmUgYXBwYXJlbnQuIFRoZSA2MDAgYW5kIDcwMCB0b3RhbCBzdGF0IGdyb3VwcyBhcmUgYnkgZmFyIHRoZSBtb3N0IHJlcHJlc2VudGVkIGluIGJhdHRsZXMgd2l0aCA2JSBhbmQgNSUgcmVzcGVjdGl2ZWx5LiBUaGUgNTAwIHRvdGFsIHN0YXQgZ3JvdXAgaXMgZG93biBhdCAxLjQlIG5vdywgd2hpbGUgdGhlIHNtYWxsZXIgZ3JvdXBzIGF2ZXJhZ2UgdXNhZ2UgcmF0ZXMgb2Ygd2VsbCBiZWxvdyAxJS4gMS40JSBpcyBzdGlsbCBxdWl0ZSBjbG9zZSB0byB0aGUgMS42JSBtZWFuLiBJdCBpcyBwb3NzaWJsZSB0aGVyZSBhcmUgc29tZSBleHRyYW5lb3VzIGZhY3RvcnMgdGhhdCBwcmV2ZW50IHRoZSBoaWdoZXN0ICJzdGF0X2dyb3VwIiBmcm9tIGJlaW5nIHRoZSBtb3N0IHVzZWQgb2YgYWxsIHN1Y2ggYXMgYSBiYW4gbGlzdCB0aGF0IHdvdWxkIGRpc2FsbG93IHVzYWdlIG9mIGNlcnRhaW4gUG9rZW1vbiBkdXJpbmcgc3BlY2lmaWMgdG91cm5hbWVudHMsIGJ1dCB0aG9zZSB2YXJpYWJsZXMgd2VyZSBub3QgY29sbGVjdGVkIGluIHRoaXMgZGF0YSBzZXQuIA0KDQoNCiMjIERhdGFzZXQgMiAtIENoZWVzZQ0KDQpUaGUgZGF0YSBzZXQgY2hvc2VuIGlzIGEgbGlzdCBvZiBkaWZmZXJlbnQgY2hlZXNlcyBhbmQgcHJvcGVydGllcyBhYm91dCB0aGVtIHN1Y2ggYXMgdGFzdGUsIG9yaWdpbiwgYW5kIG51dHJpdGlvbmFsIGZhY3RzLiBUaGUgZ29hbCBpcyB0byBjb21wYXJlIHRoZSBjaGVlc2VzIGJ5IHJlZ2lvbiBhbmQgZGV0ZXJtaW5lIGlmIHRoZXJlJ3MgYSB0cmVuZCBpbiBmbGF2b3IuIA0KDQpTdGFydCBieSBsb2FkaW5nIHRoZSBkYXRhLiANCmBgYHtyIGdldC1jaGVlc2UtZGF0YX0NCmNoZWVzZV91cmwgPC0gJ2h0dHBzOi8vcmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbS9NZWdhYnVzdGVyL0RhdGE2MDcvcmVmcy9oZWFkcy9tYWluL2RhdGEvcHJvamVjdDIvY2hlZXNlLmNzdicNCmNoZWVzZV9yYXcgPC0gZnJlYWQoY2hlZXNlX3VybCkNCmhlYWQoY2hlZXNlX3JhdykNCmBgYA0KDQojIyMgVGlkeWluZyByZWdpb24gYW5kIGNvdW50cnkgY29sdW1ucw0KDQpBcyBvbmUgb2YgdGhlIG1haW4gdmFyaWFibGVzIHdlIGFyZSBhbmFseXppbmcgdG9kYXksIHdlIHdhbnQgdG8gc2VlIGhvdyByZWdpb25zIGxvb2suIE5lYXJseSAyOCUgb2YgY2hlZXNlcyBkbyBub3QgaGF2ZSBhIHJlZ2lvbi4gVGhlcmUgaXMgYWxzbyB2ZXJ5IGxpdHRsZSBjb25zaXN0ZW5jeSByZWdhcmRpbmcgdGhlIHdheSByZWdpb25zIGFyZSBsaXN0ZWQuIFRoZXJlIGFyZSBjaXRpZXMsIHN0YXRlcywgbXVsdGlwbGUgbXVuaWNpcGFsaXRpZXMsIGFjdHVhbCByZWdpb25zLCByZWdpb25zIHdpdGhpbiBzdGF0ZXMsIGNvbnRyYWN0ZWQgbmFtZXMsIG5hbWVzIHdpdGggcHJlcG9zaXRpb25zLCBldGMuIA0KDQpgYGB7ciBjaGVlc2UtdGlkeS1yZWdpb259DQpzdW0oaXMubmEoY2hlZXNlX3JhdyRyZWdpb24pKSAvIG5yb3coY2hlZXNlX3JhdykgKiAxMDANCiMgU2FtcGxlIGNvdW50cnkgZm9yIG1pc3NpbmcgZGF0YSB0d2ljZSBiZWNhdXNlIGFwcGFyZW50bHkgdGhleSBhcmUgbm90IGxpc3RlZCBhcyAiTkEiDQpzdW0oaXMubmEoY2hlZXNlX3JhdyRjb3VudHJ5KSkgLyBucm93KGNoZWVzZV9yYXcpICogMTAwDQpzdW0oY2hlZXNlX3JhdyRjb3VudHJ5ID09ICcnKSAvIG5yb3coY2hlZXNlX3JhdykgKiAxMDANCmBgYA0KQSBrZXkgdG8gdGhpcyBhbmFseXNpcyBpcyB0aGF0IGVhY2ggInJlZ2lvbiIgaXMgZ29pbmcgdG8gYmUgd2l0aGluIGEgY291bnRyeSBhbmQgZm9ydHVuYXRlbHkgYWxsIG9mIHRob3NlIGhhdmUgdmFsdWVzLi4uIG9yIGRvIHRoZXk/IEFjdHVhbGx5LCB0aGVyZSBhcmUgY291bnRyaWVzIHRoYXQgYXJlIG1pc3NpbmcsIGJ1dCB0aGV5IGFyZSBsaXN0ZWQgYXMgZW1wdHkgc3RyaW5ncyBpbnN0ZWFkIG9mICJOQSIuIE92ZXIgOTklIG9mIGNvdW50cmllcyBhcmUgYXZhaWxhYmxlIG1ha2luZyB0aGVtIGEgcmVsaWFibGUgc3RhcnRpbmcgcG9pbnQuIExldCdzIGdyb3VwIHRoZSBjb3VudHJpZXMgdG9nZXRoZXIgYW5kIHNlZSB3aGF0IHJlZ2lvbnMgYXJlIGFzc29jaWF0ZWQgd2l0aCBlYWNoIG9uZS4NCg0KYGBge3IgY2hlZXNlLWNvdW50cmllc30NCmFnZ3JfY291bnRyeSA8LSBhZ2dyZWdhdGUocmVnaW9uIH4gY291bnRyeSwgY2hlZXNlX3JhdywgXChjaGVlc2VfcmF3KSBwYXN0ZSh1bmlxdWUoY2hlZXNlX3JhdyksIGNvbGxhcHNlID0gIiwgIikpDQpoZWFkKGFnZ3JfY291bnRyeSkNCmBgYA0KDQpBY3R1YWxseSwgdGhlIGNvdW50cmllcyBhcmUgbm90IHRpZHkgZWl0aGVyLiBUaGVyZSBhcmUgbXVsdGlwbGUgY291bnRyaWVzIGdyb3VwZWQgdG9nZXRoZXIgYW5kIHNvbWV0aW1lcyBhIGNvdW50cnkgaGFzIGEgc292ZXJlaWduIHN0YXRlIChFbmdsYW5kIGFzIGEgcGFydCBvZiB0aGUgVW5pdGVkIEtpbmdkb20pLiBXaXRob3V0IGFkZGl0aW9uYWwgY29udGV4dHVhbCBpbmZvcm1hdGlvbiwgdGhlcmUgYXJlIG11bHRpcGxlIHBvc3NpYmxlIHBhdGhzIGZvcndhcmQuIE9uZSBsb2dpY2FsIGFzc3VtcHRpb24gaXMgdGhhdCBjb3VudHJpZXMgb3IgcmVnaW9ucyB0aGF0IGFyZSBjby1jcmVkaXRlZCBmb3IgYSBjaGVlc2UgYXJlIHByb2JhYmx5IGVpdGhlciBzaW1pbGFyIGN1bHR1cmFsbHkgb3IgYXJlIG5laWdoYm9yaW5nIGVudGl0aWVzLiBUaGUgZmxhdyBpbiB0aGF0IGFzc3VtcHRpb24gaXMgdGhhdCB5b3UgY291bGQgYXJndWUgdGhhdCB0aGVyZSBhcmUgc2ltaWxhcml0aWVzIGJldHdlZW4gdGhlIFVuaXRlZCBTdGF0ZXMgYW5kIEl0YWx5IGFuZCBiZXR3ZWVuIHRoZSBVbml0ZWQgU3RhdGVzIGFuZCBNZXhpY28sIGJ1dCB0aGF0IGRvZXMgbm90IHRyYW5zaXRpdmVseSBtZWFuIHRoYXQgTWV4aWNvIGFuZCBJdGFseSBhcmUgcmVsYXRlZC4gDQoNCkZyb20gYSBxdWljayBjaGVjaywgd2UgZmluZCB0aGF0IHRoZXJlIGFyZSA2IGluc3RhbmNlcyB3aGVyZSBubyByZWdpb24gb3IgY291bnRyeSBpcyBwcm92aWRlZC4gU2luY2Ugd2UncmUgdHJ5aW5nIHRvIHNlZSB0aGUgaW1wYWN0IG9mIGxvY2F0aW9uIG9uIHRoZSBjaGVlc2UncyBwcm9wZXJ0aWVzLCB3ZSB3aWxsIHJlbW92ZSB0aGVzZSBjaGVlc2VzIGFzIHdlIGFyZSB1bmFibGUgdG8gYW5hbHl6ZSB0aGVtIHdpdGhvdXQgbW9yZSBjb250ZXh0LiAgDQoNCmBgYHtyIGNoZWVzZS1jaGVjay1uYX0NCmNoZWVzZV9yZWR1Y2VkIDwtIGNoZWVzZV9yYXcgJT4lIGZpbHRlcighKGNoZWVzZV9yYXckY291bnRyeSA9PSAnJyAmIGlzLm5hKGNoZWVzZV9yYXckcmVnaW9uKSkpDQoNCmNoZWVzZV9sb25nZXN0IDwtIGNoZWVzZV9yZWR1Y2VkICU+JSBzZWxlY3QoY2hlZXNlLCBjb3VudHJ5LCByZWdpb24sIGZsYXZvcikgJT4lIHNlcGFyYXRlX2xvbmdlcl9kZWxpbShjb3VudHJ5LCAnLCAnKSAlPiUgc2VwYXJhdGVfbG9uZ2VyX2RlbGltKHJlZ2lvbiwgJywgJykgJT4lIHNlcGFyYXRlX2xvbmdlcl9kZWxpbShmbGF2b3IsICcsICcpDQpoZWFkKGNoZWVzZV9sb25nZXN0KQ0KYGBgDQoNCiMjIyBDaGVlc2UgZmluYWwgYW5hbHlzaXMNCkdvaW5nIG92ZXIgdGhlIHJlc3VsdGluZyAiY2hlZXNlX2xvbmdlc3QiIGRhdGFmcmFtZSwgd2UgZ2V0IGEgc3BvaWxlciByZWdhcmRpbmcgdGhlIGZhY3QgdGhhdCB0aGlzIGlzIHRoZSBleHRlbnQgdGhhdCB3ZSB3aWxsIGJlIGxlbmd0aGVuaW5nIHRoaXMgZGF0YSBzZXQgdG9kYXkuIExldCdzIHNlZSB3aGF0IGltcGFjdHMgY291bnRyeSBvciByZWdpb24gaGF2ZSBvbiBmbGF2b3IuDQoNCmBgYHtyIGZpZy5oZWlnaHQ9MTAsIGNoZWVzZS1wbG90LWZpbmFsfQ0KZGlzdGluY3RfY291bnRyeV9jaGVlc2UgPC0gY2hlZXNlX2xvbmdlc3QgJT4lIGZpbHRlcihjb3VudHJ5ICE9ICcnICYgIWlzLm5hKGZsYXZvcikpICU+JSBzZWxlY3QoY2hlZXNlLCBjb3VudHJ5LCBmbGF2b3IpICU+JSBkaXN0aW5jdCgpIA0KDQpzdW1fY291bnRyeV9jaGVlc2UgPC0gZGlzdGluY3RfY291bnRyeV9jaGVlc2UgJT4lIA0KICBzZWxlY3QoY291bnRyeSwgZmxhdm9yKSAlPiUNCiAgZ3JvdXBfYnkoY291bnRyeSwgZmxhdm9yKSAlPiUNCiAgc3VtbWFyaXNlKHRvdGFsID0gbigpLCBhY3Jvc3MoZXZlcnl0aGluZygpLCBmaXJzdCksIC5ncm91cHMgPSAnZHJvcCcpDQoNCmhlYWQoc3VtX2NvdW50cnlfY2hlZXNlICU+JSBhcnJhbmdlKGRlc2ModG90YWwpKSkNCg0KYWdncmVnYXRlKHRvdGFsIH4gY291bnRyeSArIGZsYXZvciwgZGF0YSA9IHN1bV9jb3VudHJ5X2NoZWVzZSwgc3VtKSAlPiUgZ2dwbG90KGFlcyh4ID0gY291bnRyeSwgeSA9IHRvdGFsLCBmaWxsID0gZmxhdm9yKSkgKw0KICBnZW9tX2JhcihzdGF0ID0gJ2lkZW50aXR5JywgcG9zaXRpb24gPSAnZG9kZ2UnKSArDQogIGxhYnModGl0bGUgPSAnQ2hlZXNlIGZsYXZvciB0cmVuZHMgcGVyIGNvdW50cnknKSArDQogIGNvb3JkX2ZsaXAoKQ0KDQpkaXN0aW5jdF9yZWdpb25fY2hlZXNlIDwtIGNoZWVzZV9sb25nZXN0ICU+JSBmaWx0ZXIoIWlzLm5hKHJlZ2lvbikgJiAhaXMubmEoZmxhdm9yKSkgJT4lIHNlbGVjdChjaGVlc2UsIHJlZ2lvbiwgZmxhdm9yKSAlPiUgZGlzdGluY3QoKSANCg0Kc3VtX3JlZ2lvbl9jaGVlc2UgPC0gZGlzdGluY3RfcmVnaW9uX2NoZWVzZSAlPiUgDQogIHNlbGVjdChyZWdpb24sIGZsYXZvcikgJT4lDQogIGdyb3VwX2J5KHJlZ2lvbiwgZmxhdm9yKSAlPiUNCiAgc3VtbWFyaXNlKHRvdGFsID0gbigpLCBhY3Jvc3MoZXZlcnl0aGluZygpLCBmaXJzdCksIC5ncm91cHMgPSAnZHJvcCcpDQoNCmhlYWQoc3VtX3JlZ2lvbl9jaGVlc2UgJT4lIGFycmFuZ2UoZGVzYyh0b3RhbCkpKQ0KYGBgDQoNClRoaXMgZGF0YSBzZXQgcHVzaGVzIHRoZSBsaW1pdHMgb2Ygd2hhdCBjYW4gYmUgcmVhc29uYWJseSB2aWV3ZWQgd2l0aGluIGEgYmFyIHBsb3QuIEl0IGFwcGVhcnMgdGhhdCBDYW5hZGEsIEl0YWx5IGFuZCB0aGUgVW5pdGVkIFN0YXRlcyB3ZXJlIGVzcGVjaWFsbHkgd2VsbCByZXByZXNlbnRlZCBpbiB0ZXJtcyBvZiBmbGF2b3JzIG9mIGNoZWVzZS4gTm90YWJsZSB0cmVuZHMgZnJvbSBleGFtaW5pbmcgdGhlIHRvcCBvZiB0aGUgY291bnRyeSBzcGVjaWZpYyBkYXRhIGlzIHRoZSBwcmV2YWxlbmNlIG9mIHN3ZWV0IGZsYXZvcnMgd2hpY2ggbWF0Y2hlcyBhIGNvbW1vbiBzdGVyZW90eXBlLiANCg0KVGhlcmUgd2VyZSAzODAgdW5pcXVlIHJlZ2lvbiBuYW1lcyBzdGlsbCByZW1haW5pbmcgaW4gdGhpcyBtb2RpZmllZCBkYXRhIGZyYW1lLCBzbyBJIHNhdmVkIHVzIHRoZSB0cm91YmxlIG9mIHRyeWluZyB0byByZWFkIHRoYXQgcGxvdC4gVGhlIHJlZ2lvbiB0YWJsZSBzaG93cyB0aGF0IFdpc2NvbnNpbiBwcm9kdWNlcyBhIGxvdCBvZiBzd2VldCBjaGVlc2UgYXMgd2VsbC4gQXMgYW4gQW1lcmljYW4gc3RhdGUsIHRoaXMgY29ycm9ib3JhdGVzIHRoZSBpbXBsaWNhdGlvbnMgZnJvbSB0aGUgcHJldmlvdXMgdGFibGUuIA0KDQpXaGlsZSB0aGUgZmluYWwgZGF0YWZyYW1lcyB3ZXJlIHN0aWxsIG5vdCBwZXJmZWN0bHkgdGlkeSwgd2Ugd2VyZSBhYmxlIHRvIHNlZSBub3RhYmxlIGNvbnNpc3RlbnQgdHJlbmRzIHdoZW4gYW5hbHl6aW5nIHRoZSBtb3N0IHJlY3VycmluZyBkYXRhIHdpdGhpbiB0aGUgY2hlZXNlIGRhdGEgc2V0Lg0KDQojIyBEYXRhc2V0IDMgLSBDYW5jZXIgRGF0YQ0KDQpUaGlzIGRhdGEgc2V0IGlzIGFib3V0IGJyZWFzdCBjYW5jZXIgY2FzZXMgZnJvbSAyMDIwLTIwMjQgYXQgSG91c3RvbidzIE1EIEFuZGVyc29uIENhbmNlciBDZW50ZXIuIFRpZHlpbmcgdGhlIGRhdGEgd2lsbCBmb2N1cyBvbiBtYWtpbmcgdGhlIGRhdGEgbG9uZ2VyLiBUaGUgYW5hbHlzaXMgZ29hbCB3aWxsIGJlIHRvIHNlZSBob3cgcmFjZSBkZW1vZ3JhcGhpY3MgaW1wYWN0IGNhc2UgdG90YWxzLg0KDQojIyMgVGlkeWluZyB0aGUgZGF0YQ0KDQpTdGFydCBieSBsb2FkaW5nIHRoZSBkYXRhLiANCg0KYGBge3IgZ2V0LWRuZC1kYXRhfQ0KY2FuY2VyX3VybCA8LSAnaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL01lZ2FidXN0ZXIvRGF0YTYwNy9yZWZzL2hlYWRzL21haW4vZGF0YS9wcm9qZWN0Mi9VbnRpZHlkYXRhc2V0JTIwXzYyNC5jc3YnDQpjYW5jZXJfcmF3IDwtIGZyZWFkKGNhbmNlcl91cmwpDQpoZWFkKGNhbmNlcl9yYXcpDQpgYGANCg0KVGhpcyBkYXRhIHNldCB3YXMgY29udmVydGVkIGZyb20geGxzeCB0byBjc3YgdHlwZSB0byBzdHJlYW1saW5lIGl0IHdpdGggdGhlIG90aGVyIGRhdGEgc2V0cy4gSW5hZHZlcnRlbnRseSwgdGhpcyBoYXMgbGVkIHRvIHRoZSBmb3JtIG9mIHRoZSBkYXRhIGdldHRpbmcgdW50aWRpZXIgdGhhbiBiZWZvcmUuIFdlIHdpbGwgbmVlZCB0byByZW1vdmUgdGhlIGZpcnN0IHR3byByb3dzIGFuZCBtYWtlIHJvdyAzIHRoZSBuZXcgY29sdW1uIG5hbWVzLg0KDQpgYGB7ciBjYW5jZXItZml4LWNvbG5hbWVzfQ0KbmFtZXMoY2FuY2VyX3JhdykgPC0gdW5saXN0KGNhbmNlcl9yYXdbMl0pDQpjYW5jZXJfZml4ZWRfY29sIDwtIGNhbmNlcl9yYXdbLWMoMToyKSxdDQpoZWFkKGNhbmNlcl9maXhlZF9jb2wpDQpgYGANCg0KV2l0aCB0aGUgY29sdW1uIG5hbWVzIG5vdyBjb3JyZWN0bHkgYXNzaWduZWQsIGxldCdzIHJlbW92ZSAidG90YWxfcGF0aWVudHMiIGFzIGl0IHdpbGwgYmUgdXNlbGVzcyBmb3IgdXMgYWZ0ZXIgd2UgbGVuZ3RoZW4gdGhlIGRhdGEuIFRoZW4gd2Ugd2lsbCB1c2UgdGhlICJtZWx0IiBmdW5jdGlvbiB0byBleHRyYWN0IHRoZSB5ZWFycyBpbnRvIGEgY29sdW1uIHdpdGggdGhlIHZhbHVlcyBiZWluZyBhbm90aGVyIGNvbHVtbi4gQ2xlYW4gdXAgdGhlIG5ldyAiWWVhciIgY29sdW1uIGJ5IHJlbW92aW5nICJDYXNlc18iIGFuZCBtYWtpbmcgdGhlIHJlc3VsdGluZyB5ZWFycyBpbnRvIG51bWVyaWMgdmFsdWVzLg0KYGBge3IgY2FuY2VyLW1lbHR9DQpjYW5jZXJfZml4ZWRfY29sJFRvdGFsX1BhdGllbnRzIDwtIE5VTEwNCmhlYWQoY2FuY2VyX2ZpeGVkX2NvbCkNCmNhbmNlcl9tZWx0IDwtIGNhbmNlcl9maXhlZF9jb2wgJT4lIG1lbHQoaWQudmFycyA9IGMoY29sbmFtZXMoY2FuY2VyX2ZpeGVkX2NvbClbMTo0XSxjb2xuYW1lcyhjYW5jZXJfZml4ZWRfY29sKVsxMF0pLCB2YXJpYWJsZS5uYW1lID0gJ1llYXInLCB2YWx1ZS5uYW1lID0gJ0Nhc2VzJykNCmhlYWQoY2FuY2VyX21lbHQpDQpjYW5jZXJfbWVsdCA8LSBjYW5jZXJfbWVsdCAlPiUgbXV0YXRlKFllYXIgPSBhcy5udW1lcmljKHN0cl9yZW1vdmUoWWVhciwgJ0Nhc2VzXycpKSkNCmNhbmNlcl9tZWx0IDwtIGNhbmNlcl9tZWx0ICU+JSBtdXRhdGUoQ2FzZXMgPSBhcy5udW1lcmljKENhc2VzKSkNCmhlYWQoY2FuY2VyX21lbHQpDQpgYGANCg0KIyMjIENhbmNlciBmaW5hbCBhbmFseXNpcw0KDQpXaXRoIHRoZSBkYXRhIGJldHRlciBmb3JtYXR0ZWQsIGxldCdzIHRha2UgYWR2YW50YWdlIG9mIGhhdmluZyB0aGUgY2FzZXMgYW5kIHllYXIgcHJvcGVybHkgc3BsaXQuIExldCdzIHNlZSBob3cgbWFueSBjYW5jZXIgY2FzZXMgZWFjaCByYWNlIGhhZCBmcm9tIDIwMjAtMjAyNC4NCg0KYGBge3IgY2FuY2VyLWFubnVhbC1wbG90fQ0KYWdncmVnYXRlKENhc2VzIH4gUmFjZSArIFllYXIsIGRhdGEgPSBjYW5jZXJfbWVsdCwgc3VtKQ0KYWdncmVnYXRlKENhc2VzIH4gUmFjZSArIFllYXIsIGRhdGEgPSBjYW5jZXJfbWVsdCwgc3VtKSAlPiUgZ2dwbG90KGFlcyh4ID0gWWVhciwgeSA9IENhc2VzLCBmaWxsID0gUmFjZSkpICsNCiAgZ2VvbV9iYXIoc3RhdCA9ICdpZGVudGl0eScsIHBvc2l0aW9uID0gJ2RvZGdlJykgKw0KICBsYWJzKHRpdGxlID0gJ0NhbmNlciByYXRlIGZyb20gMjAyMC0yMDI0IGJ5IHJhY2UnKQ0KYGBgDQoNCk5vdGhpbmcgbG9va3Mgb3V0IG9mIHBsYWNlIGluIHRoaXMgY2hhcnQuIFRoZSB0b3RhbCBjYW5jZXIgY2FzZXMgZm9yIGVhY2ggcmFjZSBpbmNyZWFzZWQgZ3JhZHVhbGx5IGVhY2ggeWVhci4gVGhpcyB0cmVuZCBkaWQgbm90IGFmZmVjdCB0aGUgcmF0aW8gb2YgY2FuY2VyIHJhdGVzIGZvciBlYWNoIHJhY2UgYWdhaW5zdCBlYWNoIG90aGVyLiBBc2lhbnMgaGFkIHRoZSBsb3dlc3QgcmF0ZSB3aGlsZSBCbGFja3MgaGFkIHRoZSBoaWdoZXN0Lg0KDQojIyBGaW5kaW5ncyBhbmQgUmVjb21tZW5kYXRpb25zDQoNCk5vdCBhbGwgdW50aWR5IGRhdGEgbG9vayB0aGUgc2FtZS4gTWFueSBzaW1pbGFyIHRlY2huaXF1ZXMgd2VyZSB1c2VkIHRvIHRpZHkgZWFjaCBkYXRhIHNldCwgYnV0IGRpZmZlcmVudCBudWFuY2VzIHdlcmUgc3RpbGwgcmVxdWlyZWQuIA0KDQpUaGUgUG9rZW1vbiBkYXRhIHNldCB3YXMgYSBnb29kIGNhbmRpZGF0ZSBmb3IgYSB3aWRlIHRvIGxvbmcgdHJhbnNmb3JtYXRpb24uIFNpbmNlIHRoZSBnb2FsIHdhcyB0byBjb21wYXJlICJ0b3RhbF9zdGF0cyIgYW5kICJ1c2FnZSIgdG8gZGV0ZXJtaW5lIGlmIHRoZXJlIHdhcyBhIGNvcnJlbGF0aW9uLCB0aGUgZm9jdXMgd2FzIG9uIHRpZHlpbmcgdGhlICJ1c2FnZSIgY29sdW1uLiBBIGNob2ljZSB3YXMgbWFkZSB0byByZW1vdmUgYW55IGRhdGEgbGlzdGVkIGFzICJOb1VzYWdlIiBiZWNhdXNlIHRocm91Z2ggZnVydGhlciBhbmFseXNpcywgdGhlcmUgd2FzIG5vdCBlbm91Z2ggaW5mb3JtYXRpb24gdG8gZGVjaWRlIGlmIGEgdGVjaG5pcXVlIGxpa2UgaW1wdXRhdGlvbiB3YXMgYmV0dGVyIHNlcnZlZC4gVGhlIG5lZWQgdG8gYW5hbHl6ZSB0aGUgZGF0YSBiZWZvcmUgZGV0ZXJtaW5pbmcgYSB0aWR5aW5nIHN0cmF0ZWd5IHdhcyBhIHVuaXF1ZSBmZWF0dXJlIG9mIHRoaXMgZGF0YSBzZXQuIEZ1cnRoZXIgd29yayBvbiB0aGUgZGF0YSBjb3VsZCBpbmNsdWRlIGJyZWFraW5nIGRvd24gdGhlIGZvcm1hdHMgaW50byBzcGVjaWZpYyB5ZWFycyBhbmQgdG91cm5hbWVudHMgdG8gc2VlIHRpbWUgc2VyaWVzIHRyZW5kcy4NCg0KVGhlIGhhcmRlc3QgZGF0YSBzZXQgdG8gd29yayB3aXRoIHdhcyB0aGUgY2hlZXNlIGRhdGEgYXMgd2l0aG91dCBtb3JlIGNvbnRleHQgYmV5b25kIHdoYXQgdGhlIGRhdGEgc2V0IHByb3ZpZGVkLCBpdCB3YXMgbmVhcmx5IGltcG9zc2libGUgdG8gYWNjb3VudCBmb3IgZXZlcnkgZXhjZXB0aW9uLiBDb25jZXNzaW9ucyBoYWQgdG8gYmUgbWFkZSB3aGVuIHRpZHlpbmcgYmVjYXVzZSB0b28gbWFueSBtYW51YWwgZml4ZXMgYW5kIGFzc3VtcHRpb25zIG5lZWRlZCB0byBiZSBtYWRlLiBBIGdvb2QgbGVzc29uIGZyb20gdGhpcyB3b3JrIHdhcyB0aGF0IGRlc3BpdGUgdGhlIGlzc3VlcywgdHJlbmQgZGF0YSBzdGlsbCBzdG9vZCBvdXQuIEFzIG1vcmUgbnVhbmNlZCB0cmVuZHMgd291bGQgYmUgaGFyZCB0byBkZXRlY3QsIGZ1cnRoZXIgd29yayB3b3VsZCBiZSBpbnRyb2R1Y2luZyBnZW9ncmFwaGljIGRhdGEgc2V0cyB0aGF0IGNhbiBoZWxwIHdpdGggY2xlYW5pbmcgYW5kIGNhdGVnb3JpemluZyB0aGUgY291bnRyaWVzIGFuZCByZWdpb25zLg0KDQpUaGUgbWFpbiB0YWtlYXdheSBmcm9tIHdvcmtpbmcgd2l0aCB0aGUgY2FuY2VyIGRhdGEgc2V0IHdhcyB0aGUgaW1wb3J0YW5jZSBvZiBwcmFjdGljZS4gSXQgd2FzIHRoZSBsYXN0IGRhdGEgc2V0IHRvIGJlIHRpZGllZCBhbmQgdGhhdCBmZWx0IGV2aWRlbnQgYXMgbWFueSBwYXR0ZXJucyB3ZXJlIHF1aWNrZXIgdG8gZmluZC4gRnVydGhlciB3b3JrIGNvdWxkIGJlIHRvIGluY29ycG9yYXRlIHRoZSBvdGhlciBwcmVkaWN0b3JzLg0KDQo=