Introductory Statistics (CRN: 6896)



Objective

Today we learn how to use RStudio to get a sense of our dataset. This means learning about the dataset's size, the number of variables, the number of observations, the types of variables, the number of missing values, and so forth. This is an initial step in any data analysis project. We need to know what we have before we can decide what to do.


The CSV format

Datasets come in many formats. A very common one is the csv format. CSV stands for comma-separated values, which surprisingly(!) means values are separated by commas. But what values? A dataset consists of variables (e.g., Age, Sex) and specific values (e.g., 12, Male). A variable is the name we give to the data on a certain characteristic. So if I have 30 people in a room and collect data on their age (12,23,12,…), sex (Male, Female, Other), and location (Manhattan, Brooklyn, Queens,…), add a made-up variable called subject (1,2,3,…) to number the people, and store everything in csv format, it will look like this:

subject, age, sex, location
1,12, Male, Brooklyn
2,23, Male, Queens
3,12, Female, Manhattan
4,14, Male, Brooklyn

The commas here act as separators between values. Usually, we open csv files using MS Excel or other spreadsheet programs, where we find the data in a table: the commas are replaced with grid lines. This looks much better to the human eye, doesn’t it?

Subject  age  sex     location
1        12   Male    Brooklyn
2        23   Male    Queens
3        12   Female  Manhattan
4        14   Male    Brooklyn
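
To see the link between the comma-separated text and the table, here is a minimal sketch that feeds the four example rows to read.csv() as a string rather than a file. The object names csv_text and mini are made up for this illustration, and strip.white = TRUE is used only because the sample rows have a space after each comma.

```r
# The four example rows, written as one piece of CSV text.
csv_text <- "subject, age, sex, location
1,12, Male, Brooklyn
2,23, Male, Queens
3,12, Female, Manhattan
4,14, Male, Brooklyn"

# text = reads from a string instead of a file on disk;
# strip.white = TRUE drops the spaces that follow the commas.
mini <- read.csv(text = csv_text, strip.white = TRUE)

print(mini)    # displayed as a table, one column per variable
names(mini)    # the variable names come from the first line of the text
```

Each comma in the text marks the boundary between two columns of the table.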


Reading a file in CSV format

Ok. Let’s learn how to read a csv file into R. The command is as intuitive as anything else in R: read.csv(). Remember, in R we have two types of things: objects (e.g., numbers, letters, tables) and functions (e.g., read.csv(), sum()). Functions take objects as inputs and produce objects as outputs. Because the output of a function is an object, it can be used as input to another function. So for example, sum(1,3) gives the sum of 1 and 3. Running this command produces the output 4. This output can be used as an input to another function. For example, the output of sum(sum(1,3), 7) is the sum of the number 7 and the sum of the numbers 1 and 3. Which is… 11.

What happens if we run these lines of code?

sum(1)
[1] 1
sum(1,2,3,4)
[1] 10
sum(1:5)
[1] 15
a<-3
b<-7
sum(a,b)
[1] 10
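
The idea that one function's output can feed another can be sketched in two equivalent ways: storing the intermediate result in an object first, or nesting the calls directly. The names inner, total, and nested are made up for this illustration.

```r
# Step by step: store the first sum, then pass it on.
inner <- sum(1, 3)        # 4
total <- sum(inner, 7)    # 11

# All at once: the inner call is evaluated first, then the outer one.
nested <- sum(sum(1, 3), 7)

total == nested           # TRUE: both routes give the same answer
```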

The command below reads a csv file, and gives it a name, lab2.sample:

lab2.sample <- read.csv("./sample.dataset.csv")


Let’s use common sense to decipher what this command is doing. If I asked you to open a file, what would be the first thing you’d ask me? Exactly: What file and from where? This is precisely what we are telling R in this command. We are giving it the address and the name of the file. Because we are using the command read.csv, R already knows to treat the file as a csv file (i.e., commas separate values). Once you run the command, RStudio creates a table like this:


Note that there is a line underneath the row of variable names which says either <int> or <fctr>. This line is not in the dataset but is added to show us the data type of each column. <int> means the column has integers (e.g., 0,1,2,3…) as values, and <fctr> means the column is a factor (i.e., each value is one of a fixed set of categories, such as Brooklyn, Manhattan, Queens, Bronx, Staten Island).
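
One caveat worth checking on your own machine: since R 4.0.0, read.csv() reads text columns as character (shown as <chr>) unless you ask for factors with stringsAsFactors = TRUE. The sketch below uses three rows typed in as text in place of the lab file; csv_text, as_factor, and as_text are names made up for this illustration.

```r
# Three sample rows, supplied as text so the example runs without the lab file.
csv_text <- "Subject,age,sex,location
1,17,Male,Queens
2,20,Female,Queens
3,20,Female,Brooklyn"

as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_text   <- read.csv(text = csv_text, stringsAsFactors = FALSE)

sapply(as_factor, class)   # text columns are stored as factors
sapply(as_text, class)     # text columns are stored as character strings
```

If your own table shows <chr> where this handout shows <fctr>, this setting is the likely reason.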

You can also use the command below to view your dataset:

View(lab2.sample)


Need help?

Inputs to a function are called arguments. To learn about a function’s arguments, you can type ? followed by the name of the function. This shows you the help page for that function. The help page opens in the bottom right pane. Help articles in R have a specific structure which you will get used to over time. But for now, type in ?sum and see what the help article tells you. It starts by telling you which package the function is a part of, and gives a short description of what it does. So in the case of sum, you see the following:



In the Usage section, you can see the arguments that a function takes, and below that there is a brief definition for each argument. For the function sum, we have the argument ... defined as numeric, complex, or logical vectors. This is a fancy way of saying a list of things you want to add up. Then there is na.rm, which can be set to TRUE or FALSE. This argument tells the function what to do when there are missing values. We call these missing values NA (not available). Let’s try the two lines of code below and see what happens:

sum(NA, 3, 4)
[1] NA
sum(NA, 3, 4, na.rm = T)
[1] 7

When you set na.rm = T (T is short for TRUE), you are telling R to remove NAs and give you the sum of whatever is left. By default, na.rm is set to FALSE, which is why the first line of code above returns NA.
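
The same argument shows up in other functions such as mean(), and is.na() lets you count how many values are missing. A small sketch, using a made-up vector called ages with two missing entries:

```r
# Made-up values for illustration; two of them are missing.
ages <- c(17, 20, NA, 18, NA)

is.na(ages)                 # TRUE wherever a value is missing
sum(is.na(ages))            # number of missing values (TRUE counts as 1)

mean(ages)                  # NA, because missing values are kept by default
mean(ages, na.rm = TRUE)    # mean of the three values that are present
```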

Tips:

  • Many arguments of a function have default values. This frees us from having to set them every time. You can look up the help page to see what arguments a function takes, what you can change to get the function to behave differently, and what the default values for those arguments are.
  • At the bottom of the help article, you have a set of examples. These examples are designed so that you can copy and paste them into your console and run them.
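
Besides the help page, defaults can also be inspected from the console. The sketch below uses args() and formals(), two base R functions; looking at read.csv here is just one convenient choice for illustration.

```r
# args() prints a function's arguments together with their defaults.
args(sum)                   # shows na.rm = FALSE
args(read.csv)              # shows header = TRUE, sep = ",", and so on

# formals() returns the defaults as a list, so one can be picked out by name.
formals(read.csv)$header    # default for header
formals(read.csv)$sep       # default separator
```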


You can see the examples in the console using the command below:

example(sum)

sum> ## Pass a vector to sum, and it will add the elements together.
sum> sum(1:5)
[1] 15

sum> ## Pass several numbers to sum, and it also adds the elements.
sum> sum(1, 2, 3, 4, 5)
[1] 15

sum> ## In fact, you can pass vectors into several arguments, and everything gets added.
sum> sum(1:2, 3:5)
[1] 15

sum> ## If there are missing values, the sum is unknown, i.e., also missing, ....
sum> sum(1:5, NA)
[1] NA

sum> ## ... unless  we exclude missing values explicitly:
sum> sum(1:5, NA, na.rm = TRUE)
[1] 15


Looking at the data

Now that we have the data file loaded in, we want to know things about the dataset. Here are some possible functions to get us started:


  • dim() is a function that tells us how many rows and columns (i.e., how many observations and variables) our dataset has. It shows us the dimensions of the dataset.
dim(lab2.sample)
[1] 30  4

So our dataset has 30 rows and 4 columns. That seems right.

  • head() shows you the first few lines. You can set what few means.
head(lab2.sample)
head(lab2.sample, 10)


  • summary() - gives you info about each variable (e.g., type, values, frequencies, NAs, etc.)
summary(lab2.sample)
    Subject           age            sex              location 
 Min.   : 1.00   Min.   :15.00   Female:17   Bronx        : 2  
 1st Qu.: 8.25   1st Qu.:16.25   Male  : 7   Brooklyn     : 6  
 Median :15.50   Median :19.00   Other : 6   Manhattan    :10  
 Mean   :15.50   Mean   :20.03               Queens       : 6  
 3rd Qu.:22.75   3rd Qu.:23.75               Staten Island: 6  
 Max.   :30.00   Max.   :28.00                                 


  • str() - similar to the two above. Also shows you the type of each variable and its first few values
str(lab2.sample)
'data.frame':   30 obs. of  4 variables:
 $ Subject : int  1 2 3 4 5 6 7 8 9 10 ...
 $ age     : int  17 20 20 18 16 15 26 26 16 24 ...
 $ sex     : Factor w/ 3 levels "Female","Male",..: 2 1 1 1 1 1 1 1 1 1 ...
 $ location: Factor w/ 5 levels "Bronx","Brooklyn",..: 4 4 2 4 3 2 3 3 5 5 ...
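
A few related base R functions answer the same questions one at a time. The sketch below rebuilds the first ten rows shown in the str() output as a stand-in data frame called demo (a made-up name), so it runs without the lab file.

```r
# Stand-in for the lab data: the first ten rows shown in the str() output.
demo <- data.frame(
  Subject  = 1:10,
  age      = c(17, 20, 20, 18, 16, 15, 26, 26, 16, 24),
  sex      = factor(c("Male", rep("Female", 9)),
                    levels = c("Female", "Male", "Other")),
  location = factor(c("Queens", "Queens", "Brooklyn", "Queens", "Manhattan",
                      "Brooklyn", "Manhattan", "Manhattan",
                      "Staten Island", "Staten Island"))
)

nrow(demo)               # number of observations (rows)
ncol(demo)               # number of variables (columns)
names(demo)              # the variable names
colSums(is.na(demo))     # number of missing values in each column
tail(demo, 3)            # the last three rows, the counterpart of head()
```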


Accessing variables

So far, we have run functions on an object that happened to be a dataset. What if we want to access a specific variable in the dataset? For example, what if we are interested only in the age variable? To access specific columns, we can use either of the following: dataset.name$variable.name or dataset.name[["variable.name"]]. Let’s run the two lines of code below:

lab2.sample$age
 [1] 17 20 20 18 16 15 26 26 16 24 18 26 20 16 28 18 19 18 26 23 20 16 27 15 20 26 15 15 19 18
lab2.sample[["age"]]
 [1] 17 20 20 18 16 15 26 26 16 24 18 26 20 16 28 18 19 18 26 23 20 16 27 15 20 26 15 15 19 18

Same goes for other variables (i.e., columns):

lab2.sample$sex
 [1] Male   Female Female Female Female Female Female Female Female Female Other  Female Female Female Female Female Other  Male   Male   Other 
[21] Male   Male   Female Male   Other  Female Female Other  Other  Male  
Levels: Female Male Other
lab2.sample$location
 [1] Queens        Queens        Brooklyn      Queens        Manhattan     Brooklyn      Manhattan     Manhattan     Staten Island Staten Island
[11] Brooklyn      Queens        Staten Island Manhattan     Manhattan     Queens        Manhattan     Manhattan     Bronx         Staten Island
[21] Staten Island Brooklyn      Staten Island Queens        Brooklyn      Bronx         Manhattan     Manhattan     Manhattan     Brooklyn     
Levels: Bronx Brooklyn Manhattan Queens Staten Island


Notice that for location and sex, there is a line at the end that starts with Levels. Remember when we talked about integers and factors? These two variables are factors, so each of their values is an item from a fixed list. What list? The list of values a factor can take is called its levels.
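
Here is a short sketch of the two access styles and of what a factor stores, using a made-up five-row stand-in called demo with the first five age and sex values from the output above.

```r
# Stand-in with the first five age and sex values from the lab data.
demo <- data.frame(
  age = c(17, 20, 20, 18, 16),
  sex = factor(c("Male", "Female", "Female", "Female", "Female"),
               levels = c("Female", "Male", "Other"))
)

identical(demo$age, demo[["age"]])   # TRUE: both forms return the same column

levels(demo$sex)        # the list of values the factor can take
table(demo$sex)         # how often each level occurs (a level can have 0)
as.integer(demo$sex)    # the position of each value within the levels
```

The integer codes in the last line are the numbers that str() prints after the level names.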


Descriptives

Now that we know how to access specific variables in a dataset, we want to find ways to describe them. Let’s focus on age. This variable is an integer. You can check by running the command is(lab2.sample$age). So let’s see what the mean age is:

mean(lab2.sample$age)
[1] 20.03333

What about the median?

median(lab2.sample$age)
[1] 19

Is this the median? Hmmm, let’s check. The median is the number that is exactly in the middle of a sorted list of values, or, if we have an even number of values, the median is the average of the two middle numbers. So how can we find the middle number? Remember, in R everything is intuitive. One thing we’d have to do first is to sort the numbers. But how? Is there a function for sorting values? Let’s try the function below.

apropos("sort")
[1] ".doSortWrap"         ".rs.sortCompletions" "is.unsorted"         "sort"                "sort.default"        "sort.int"           
[7] "sort.list"           "sort.POSIXlt"        "sortedXyData"       

The apropos command tells R to list every object whose name contains the word sort. We put the word in quotation marks because it is a piece of text to search for, not the name of an object. The results show us a bunch of functions, and one of them is called sort. Let’s see what it does:

?sort

Oh look, the help page says, ‘Sort (or order) a vector or factor (partially) into ascending or descending order’. That seems like what we are looking for.

So let’s try it on the variable age:

sort(lab2.sample$age)
 [1] 15 15 15 15 16 16 16 16 17 18 18 18 18 18 19 19 20 20 20 20 20 23 24 26 26 26 26 26 27 28

This seems right. So how do we find the middle numbers? We know that we have 30 rows, so the median is the average of the two middle values, the ones in positions 15 and 16. The values in these two spots are 19 and 19. Their average is 19. So our median is correct.

But what if we had a much larger dataset? How would we find a specific value? Let’s try this:

sort(lab2.sample$age)[15]
sort(lab2.sample$age)[16]
sort(lab2.sample$age)[30]
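
Putting these pieces together, the manual median check can be sketched so that it works for any number of values. The age values are copied from the output of lab2.sample$age above; sorted_age, n, and manual_median are names made up for this illustration.

```r
# The 30 age values, copied from the output of lab2.sample$age.
age <- c(17, 20, 20, 18, 16, 15, 26, 26, 16, 24, 18, 26, 20, 16, 28,
         18, 19, 18, 26, 23, 20, 16, 27, 15, 20, 26, 15, 15, 19, 18)

sorted_age <- sort(age)
n <- length(sorted_age)

if (n %% 2 == 0) {
  # Even number of values: average the two middle positions.
  manual_median <- mean(sorted_age[c(n / 2, n / 2 + 1)])
} else {
  # Odd number of values: take the single middle position.
  manual_median <- sorted_age[(n + 1) / 2]
}

manual_median                  # 19
manual_median == median(age)   # TRUE: matches R's built-in median()
```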

This is amazing!! We can use [] to access the values within a column. So for example, let’s look at the age data for the first person in our dataset:

lab2.sample$age[1]

How do we read this line of code? We are telling R to take the first row of the column named age of the dataset named lab2.sample. Remember, a dataset in R looks like a table, so it has rows and columns. When you type dataset.name[1,4], you are referring to row 1 and column 4. If you refer to the column using the $ sign, then all you need to do is specify the row number. Let’s try a few things.

lab2.sample$age
lab2.sample$age[1]
lab2.sample$age[5]
lab2.sample$age[2:5]

So now we know how to find individual datapoints, specific variables, and observations.
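
The row-and-column form of [] can be tried on a small stand-in. The sketch below uses a made-up data frame called demo holding the first five rows of the lab data; the column can be given by position or by name.

```r
# Stand-in with the first five rows of the lab data.
demo <- data.frame(
  Subject  = 1:5,
  age      = c(17, 20, 20, 18, 16),
  sex      = c("Male", "Female", "Female", "Female", "Female"),
  location = c("Queens", "Queens", "Brooklyn", "Queens", "Manhattan")
)

demo[1, 2]          # row 1, column 2: the first person's age
demo[1, "age"]      # the same value, naming the column instead
demo[1, 4]          # row 1, column 4: the first person's location
demo[3, ]           # leaving the column blank returns the whole row
demo[2:5, "age"]    # rows 2 to 5 of the age column
demo$age[2:5]       # the same values using the $ form
```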


Summary statistics

Let’s try the commands below to get some descriptive statistics split by the sex variable:

by(lab2.sample$age, lab2.sample$sex, mean)
by(lab2.sample$age, lab2.sample$sex, sd)
by(lab2.sample$age, lab2.sample$sex, IQR)
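
An alternative that returns the group results as a compact named vector is tapply(), which is swapped in here for by() purely as an illustration. The age and sex values are copied from the outputs shown earlier; group_means and group_sizes are made-up names.

```r
# The 30 age and sex values, copied from the earlier outputs.
age <- c(17, 20, 20, 18, 16, 15, 26, 26, 16, 24, 18, 26, 20, 16, 28,
         18, 19, 18, 26, 23, 20, 16, 27, 15, 20, 26, 15, 15, 19, 18)
sex <- factor(c("Male", rep("Female", 9), "Other", rep("Female", 5), "Other",
                "Male", "Male", "Other", "Male", "Male", "Female", "Male",
                "Other", "Female", "Female", "Other", "Other", "Male"))

group_means <- tapply(age, sex, mean)   # one mean per level of sex
group_sizes <- table(sex)               # number of people in each group

group_means
group_sizes

# Sanity check: weighting each group mean by its size gives the overall mean.
sum(as.vector(group_means) * as.vector(group_sizes)) / length(age)
mean(age)
```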

Another way to summarize all this is to use the command below:

summary(lab2.sample$age)
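
The numbers that summary() reports can also be requested individually. A sketch using the same 30 age values, copied from the earlier output:

```r
# The 30 age values, copied from the output of lab2.sample$age.
age <- c(17, 20, 20, 18, 16, 15, 26, 26, 16, 24, 18, 26, 20, 16, 28,
         18, 19, 18, 26, 23, 20, 16, 27, 15, 20, 26, 15, 15, 19, 18)

range(age)                                  # minimum and maximum: 15 and 28
quantile(age, probs = c(0.25, 0.5, 0.75))   # 16.25, 19, 23.75
IQR(age)                                    # 3rd quartile minus 1st: 7.5
sd(age)                                     # standard deviation
```

The quartiles match the 1st Qu., Median, and 3rd Qu. entries in the summary() output for age shown earlier.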


Plotting

hist(lab2.sample$age)
boxplot(age ~ sex, data = lab2.sample)
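
Both plotting functions accept optional arguments for titles and axis labels. A sketch with the age and sex values copied from the earlier outputs; the title and label texts are made up for illustration.

```r
# The 30 age and sex values, copied from the earlier outputs.
age <- c(17, 20, 20, 18, 16, 15, 26, 26, 16, 24, 18, 26, 20, 16, 28,
         18, 19, 18, 26, 23, 20, 16, 27, 15, 20, 26, 15, 15, 19, 18)
sex <- factor(c("Male", rep("Female", 9), "Other", rep("Female", 5), "Other",
                "Male", "Male", "Other", "Male", "Male", "Female", "Male",
                "Other", "Female", "Female", "Other", "Other", "Male"))

# main sets the title; xlab and ylab set the axis labels.
h <- hist(age, main = "Distribution of age", xlab = "Age")
boxplot(age ~ sex, main = "Age by sex", xlab = "Sex", ylab = "Age")

# hist() also returns the bar heights, which add up to the number of people.
sum(h$counts)
```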