Required packages

Provide the packages required to reproduce the report. Make sure you fulfilled the minimum requirement #10.

library(readr)
library(tidyr)
library(dplyr)
library(Hmisc)
library(lubridate)
library(outliers)
library(forecast)

Executive Summary

In this assignment I first merged two datasets using a join. I then inspected the types of the variables in the imported data and made the necessary changes, such as conversion to factors and numeric types. Next I checked whether the dataset satisfied the tidy data principles. It did not: the genres column held up to 3 genres per title rather than a single value, with the first genre being the main genre as per the IMDB website. I separated the values, used the main genre for my analysis and dropped the sub-genres. I then dealt with the missing values by replacing the NAs in the numerical columns with the column mean, and checked the numerical columns for outliers. Where outliers were detected I handled them using the capping method. The last requirement was to transform a variable towards a normal distribution, which I did using the square root transformation.

Data

The datasets I chose for this assignment come from the IMDB website and can be downloaded from https://www.imdb.com/interfaces/. The movie_title dataset contains information on movies, TV series, shorts, etc. from 2005 to 2022. Its attributes are tconst (a unique identifier given to each title), titleType (whether the title is a movie, TV series, short, etc.), primaryTitle, originalTitle, isAdult, startYear, runtimeMinutes and genres. The second dataset, ratings, contains tconst, averageRating and numVotes. The two datasets were joined on tconst, which is present in both, using a left join that keeps every row of movie_title, to produce a new dataset, movies.

movie_title <- read_csv("movie_title.csv")
Parsed with column specification:
cols(
  tconst = col_character(),
  titleType = col_character(),
  primaryTitle = col_character(),
  originalTitle = col_character(),
  isAdult = col_double(),
  startYear = col_double(),
  runtimeMinutes = col_double(),
  genres = col_character()
)
movie_title
ratings <- read_csv("ratings.csv") #Loading ratings dataset
Parsed with column specification:
cols(
  tconst = col_character(),
  averageRating = col_double(),
  numVotes = col_double()
)
ratings
movies <- left_join(movie_title, ratings, by="tconst") #Joining the two datasets/merging
movies
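
Because a left join keeps every row of the left table, titles with no entry in the ratings table end up with NA in averageRating and numVotes. The following is a minimal sketch of how that could be checked, using made-up toy data; `titles_toy`, `ratings_toy`, `joined_toy` and `unmatched_toy` are hypothetical names and are not part of the IMDB files.

```r
library(dplyr)

# Hypothetical toy data standing in for movie_title and ratings
titles_toy <- data.frame(
  tconst = c("tt001", "tt002", "tt003"),
  primaryTitle = c("A", "B", "C"),
  stringsAsFactors = FALSE
)
ratings_toy <- data.frame(
  tconst = c("tt001", "tt003"),
  averageRating = c(6.5, 8.1),
  numVotes = c(13, 229),
  stringsAsFactors = FALSE
)

# The left join keeps all three titles; tt002 has no rating, so it gets NA
joined_toy <- left_join(titles_toy, ratings_toy, by = "tconst")

# anti_join() lists the titles that found no match in the ratings table
unmatched_toy <- anti_join(titles_toy, ratings_toy, by = "tconst")
```

Counting the rows of `unmatched_toy` gives the number of titles that will carry NA ratings after the join.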

Understand

The movies dataset has 10 attributes and 67237 observations. On import the variables are of character and numeric type. titleType was converted to a factor, which showed that there are 10 types of titles in this data frame. The isAdult column indicates whether a title is adult (1) or not (0), so this column was also converted to a factor.

str(movies)
Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':    67237 obs. of  10 variables:
 $ tconst        : chr  "tt0065791" "tt0068943" "tt0069049" "tt0088751" ...
 $ titleType     : chr  "tvShort" "short" "movie" "movie" ...
 $ primaryTitle  : chr  "Góry o zmierzchu" "Miedzy Wroclawiem a Zielona Góra" "The Other Side of the Wind" "The Naked Monster" ...
 $ originalTitle : chr  "Góry o zmierzchu" "Miedzy Wroclawiem a Zielona Góra" "The Other Side of the Wind" "The Naked Monster" ...
 $ isAdult       : num  0 0 0 0 0 0 0 0 0 0 ...
 $ startYear     : num  2009 2010 2018 2005 2019 ...
 $ runtimeMinutes: num  28 11 122 100 20 80 36 73 75 90 ...
 $ genres        : chr  "Short" "Documentary,Short" "Drama" "Comedy,Horror,Sci-Fi" ...
 $ averageRating : num  6.5 5.1 6.9 5.6 5.8 6.6 8.6 6.2 NA 4.8 ...
 $ numVotes      : num  13 30 4926 229 27 ...
dim(movies)
[1] 67237    10
movies$titleType <- factor(movies$titleType)
is.factor(movies$titleType)
[1] TRUE
table(movies$titleType)

       movie        short    tvEpisode tvMiniSeries      tvMovie     tvSeries      tvShort    tvSpecial        video 
       13358        13203        23628          449         3784         3626          408          662         8109 
   videoGame 
          10 
movies <- mutate(movies, isAdult = factor(isAdult, levels = c(0, 1))) #Converting isAdult to factor
movies
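
As a small illustration of the factor conversion, the sketch below uses a made-up 0/1 vector. The `labels` argument and the names `is_adult_toy` and `is_adult_factor` are additions for readability and are not used in the report itself.

```r
# Hypothetical 0/1 flags like those stored in isAdult
is_adult_toy <- c(0, 1, 0, 0)

# levels fixes the allowed values; labels (an addition here) gives them readable names
is_adult_factor <- factor(is_adult_toy, levels = c(0, 1),
                          labels = c("NotAdult", "Adult"))

# Frequency of each level
table(is_adult_factor)
```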

Tidy & Manipulate Data I

According to the tidy data principles, each value must have its own cell. In this dataset the genres column holds up to 3 genres per title, with the first genre being the main genre of that title. To convert the dataset into tidy format, I separated genres into 3 columns: “main-genre”, “sub-genre1” and “sub-genre2”. Since I only wanted to consider the main genre in my analysis, I dropped the other two columns.

movies<- movies %>% separate('genres',into=c("main-genre", "sub-genre1", "sub-genre2"), sep=",")
Expected 3 pieces. Missing pieces filled with `NA` in 49856 rows [1, 2, 3, 5, 7, 8, 9, 10, 11, 12, 13, 15, 20, 21, 23, 24, 27, 29, 30, 31, ...].
movies
table(movies$`main-genre`)

     Action       Adult   Adventure   Animation   Biography      Comedy       Crime Documentary       Drama      Family 
       5300        3258        2413        4026         881       13050        3376        9326       11621         786 
    Fantasy   Game-Show     History      Horror       Music     Musical     Mystery        News  Reality-TV     Romance 
        340        1058         126        1455        1511         115         237         672        1422         371 
     Sci-Fi       Short       Sport   Talk-Show    Thriller         War     Western 
        313        4451         283         506         306          11          23 
table(movies$`sub-genre1`)

      Adult   Adventure   Animation   Biography      Comedy       Crime Documentary       Drama      Family     Fantasy 
          9        2268        1682          37        2831        1820        1024        6848        1579         812 
  Game-Show     History      Horror       Music     Musical     Mystery        News  Reality-TV     Romance      Sci-Fi 
        347         652         764         887         149         819         376        1164        2999         435 
      Short       Sport   Talk-Show    Thriller         War     Western 
       9990         441         719        1071         166          36 
table(movies$`sub-genre2`)

  Adventure   Animation   Biography      Comedy       Crime Documentary       Drama      Family     Fantasy   Game-Show 
          4        1357           4        1132         166          73        2471         875        1219          80 
    History      Horror       Music     Musical     Mystery        News  Reality-TV     Romance      Sci-Fi       Short 
        320         426         366         116        1921         143         413        1667         725        1487 
      Sport   Talk-Show    Thriller         War     Western 
        434         368        1243         250         121 
movies <- movies %>% select(-(`sub-genre1`),-(`sub-genre2`))
movies
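
An alternative way to tidy a multi-valued column, shown here only as a sketch and not as the approach taken in this report, is to reshape it to one row per title and genre with `separate_rows()` from tidyr. The data frame `genres_toy` and its values are made up for illustration.

```r
library(tidyr)

# Hypothetical toy data with a comma-separated genres column
genres_toy <- data.frame(
  tconst = c("tt001", "tt002"),
  genres = c("Comedy,Horror,Sci-Fi", "Drama"),
  stringsAsFactors = FALSE
)

# One row per title-genre pair instead of three genre columns
genres_long <- separate_rows(genres_toy, genres, sep = ",")
```

This long format keeps every genre, at the cost of repeating the title on several rows, whereas keeping only the main genre leaves one row per title.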

Tidy & Manipulate Data II

For this step I created a new column named PopularityIndex by multiplying averageRating by numVotes. The PopularityIndex gives a measure of the popularity of each title. Some titles might be rated highly but have received only a handful of votes, while others might have an average rating but votes from a large number of users. The product of the two columns therefore reflects popularity based on both votes and rating.

movies <- mutate(movies, PopularityIndex = averageRating * numVotes )
movies
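
To make the effect of the product concrete, here is a small worked sketch with two made-up titles; the data frame `popularity_toy` and its values are hypothetical.

```r
library(dplyr)

# Hypothetical titles: one highly rated with few votes, one average with many votes
popularity_toy <- data.frame(
  title = c("HighRatedFewVotes", "AverageRatedManyVotes"),
  averageRating = c(9.0, 6.0),
  numVotes = c(10, 5000),
  stringsAsFactors = FALSE
)

# Same calculation as in the report: rating multiplied by number of votes
# 9.0 * 10 = 90 and 6.0 * 5000 = 30000
popularity_toy <- mutate(popularity_toy, PopularityIndex = averageRating * numVotes)
```

In this toy case the title with the lower rating gets the far larger index, because the number of votes dominates the product.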

Scan I

On scanning the data for NA values I found that 3 numerical columns (averageRating, numVotes and PopularityIndex) each had 25538 NAs, about 38% of the 67237 rows. Because of this large number the NA values were not omitted and were instead replaced by the mean of each column. I also checked for NaN and Inf values and found that neither was present in my dataset.

sum(is.na(movies)) #Checking the total number of NA in the dataframe
[1] 76614
colSums(is.na(movies))  #Checking the total sum of NA in each column
         tconst       titleType    primaryTitle   originalTitle         isAdult       startYear  runtimeMinutes 
              0               0               0               0               0               0               0 
     main-genre   averageRating        numVotes PopularityIndex 
              0           25538           25538           25538 
# Replace the NAs in a numeric vector with the mean of its non-missing values
replaceNAbyMean <- function(x){
   replace(x, is.na(x), mean(x, na.rm = TRUE))
}
# Apply to the numeric columns only; mean() is not defined for character or factor columns
numeric_cols <- sapply(movies, is.numeric)
movies[numeric_cols] <- lapply(movies[numeric_cols], replaceNAbyMean)
colSums(is.na(movies)) 
         tconst       titleType    primaryTitle   originalTitle         isAdult       startYear  runtimeMinutes 
              0               0               0               0               0               0               0 
     main-genre   averageRating        numVotes PopularityIndex 
              0               0               0               0 
is.specialorNA<- function(x){
  if (is.numeric(x)) (sum(is.infinite(x)) | sum(is.nan(x)) | sum(is.na(x)))
}
sapply(movies, is.specialorNA)
$tconst
NULL

$titleType
NULL

$primaryTitle
NULL

$originalTitle
NULL

$isAdult
NULL

$startYear
[1] FALSE

$runtimeMinutes
[1] FALSE

$`main-genre`
NULL

$averageRating
[1] FALSE

$numVotes
[1] FALSE

$PopularityIndex
[1] FALSE
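
The share of missing values can be worked out from the counts reported by `colSums(is.na(movies))` and `dim(movies)`. The sketch below only redoes that arithmetic; the variable names `na_count`, `n_rows`, `na_share` and `total_na` are introduced here for illustration.

```r
# Counts taken from the report's own output
na_count <- 25538   # NAs in each of averageRating, numVotes and PopularityIndex
n_rows   <- 67237   # rows in movies

# Proportion of missing values per affected column, roughly 0.38
na_share <- na_count / n_rows

# Three affected columns account for the total of 76614 NAs
total_na <- 3 * na_count
```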

Scan II

I checked for outliers in all 4 numerical columns (runtimeMinutes, averageRating, numVotes and PopularityIndex) using boxplots and found that all 4 have a large number of outliers. I used the capping method to handle them: values beyond the 1.5 x IQR fences are replaced by the 5th or 95th percentile. I then plotted the boxplots again for all 4 variables.

movies$runtimeMinutes %>% boxplot(col="lightblue",main="RUN TIME MINUTES")

movies$averageRating %>% boxplot(col="lightblue",main="Average Rating")

movies$numVotes %>% boxplot(col="lightblue",main="Number of Votes")

movies$PopularityIndex %>% boxplot(col="lightblue",main="Popularity Index")

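# Cap outliers: values beyond the 1.5 * IQR fences are replaced by the 5th or 95th percentile
cap <- function(x){
  quantiles <- quantile( x, c(.05, 0.25, 0.75, .95 ) )
  x[ x < quantiles[2] - 1.5*IQR(x) ] <- quantiles[1]  # below lower fence -> 5th percentile
  x[ x > quantiles[3] + 1.5*IQR(x) ] <- quantiles[4]  # above upper fence -> 95th percentile
  return(x)
}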
movies$runtimeMinutes <-cap(movies$runtimeMinutes)
movies$runtimeMinutes %>% boxplot(col="purple",main="RUN TIME MINUTES")

movies$averageRating <-cap(movies$averageRating)
movies$averageRating %>% boxplot(col="purple",main="Average Rating")

movies$numVotes <-cap(movies$numVotes)
movies$numVotes %>% boxplot(col="purple",main="Number of Votes")

movies$PopularityIndex <-cap(movies$PopularityIndex)
movies$PopularityIndex %>% boxplot(col="purple",main="Popularity Index")
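
To show what capping does to a single extreme value, the sketch below applies the same `cap()` function to a made-up vector; `x_toy` and `x_capped` are hypothetical names and the numbers are not from the movies data.

```r
# Same capping rule as in the report
cap <- function(x){
  quantiles <- quantile(x, c(.05, 0.25, 0.75, .95))
  x[x < quantiles[2] - 1.5 * IQR(x)] <- quantiles[1]
  x[x > quantiles[3] + 1.5 * IQR(x)] <- quantiles[4]
  return(x)
}

# Hypothetical data: the values 1 to 19 plus one extreme value of 100
x_toy <- c(1:19, 100)

# Q1 = 5.75, Q3 = 15.25, IQR = 9.5, so the upper fence is 15.25 + 1.5 * 9.5 = 29.5
# 100 lies above the fence and is replaced by the 95th percentile (about 23.05)
x_capped <- cap(x_toy)
```

Only the extreme value changes; the other 19 values lie inside the fences and are left as they are.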

Transform

In this assignment we were required to transform at least one variable. I plotted a histogram of runtimeMinutes to check its distribution, which was not normal and was right-skewed. I tried a number of transformations, including Box-Cox, and found that the square root (sqrt) gave the best result.

hist_runtimeMinutes <- hist(movies$runtimeMinutes, col="seagreen", main="Histogram for Runtime Minutes", xlab="Minutes")
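
The effect of a square root transformation on right-skewed data can be sketched as follows. The data are simulated and only stand in for runtimeMinutes (an assumption), and the helper `skewness()` is defined here for illustration rather than taken from a package.

```r
set.seed(1)

# Simulated right-skewed values standing in for runtimeMinutes (an assumption)
runtime_toy <- rexp(1000, rate = 1 / 60)

# Sample skewness: third central moment divided by the cubed standard deviation
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Square root transformation, the method chosen in the report
runtime_sqrt <- sqrt(runtime_toy)

# Skewness before and after the transformation
skew_before <- skewness(runtime_toy)
skew_after  <- skewness(runtime_sqrt)
```

A skewness closer to zero after the transformation indicates a more symmetric distribution; calling `hist(runtime_sqrt)` would show the same thing visually.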


