Harold Nelson

9/10/2018

- Review the Packages tab and the command line.

Install the package mosaic using the Packages pane. A few other packages, including mosaicData, will also be installed.

Look at the documentation for mosaicData in the Packages pane. What is in the Alcohol data?

Try to examine this data frame in RStudio.

```
# str(Alcohol) will fail until mosaicData is loaded.
# Load it by checking the box to the left of the package in the Packages pane,
# or by calling library() as below. Now you can see it.
library("mosaicData")
str(Alcohol)
```

```
## 'data.frame': 411 obs. of 4 variables:
## $ X : int 139 328 517 706 895 980 997 1012 1084 1273 ...
## $ country: chr "Russia" "Russia" "Russia" "Russia" ...
## $ year : int 1985 1986 1987 1988 1989 1990 1990 1990 1990 1991 ...
## $ alcohol: num 13.3 10.8 11 11.6 12 ...
```

You may use either `=` or `<-` for assignment. The arrow emphasizes the direction in which information flows.

Does a reversed arrow (`->`) move information from left to right?
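A short sketch of the three assignment forms. The variable names are arbitrary choices made for illustration.

```r
a = 5      # assignment with "="
b <- 6     # left arrow: the value 6 flows left, into b
7 -> d     # reversed arrow: the value 7 flows right, into d
c(a, b, d) # prints 5 6 7
```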

- Creating an object in R does not automatically display it.

Note the difference in the following.

```
## [1] 0.881707322 -0.471165743 -1.297096793 0.446784815 -0.826080628
## [6] 0.141705507 -0.681989303 1.117945399 0.002295676 1.640320880
```

`## [1] 0.2067563`
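The lines below sketch one way to produce output like this. The names x and y are assumptions. Because rnorm() draws new random values on each run, your numbers will differ.

```r
x = rnorm(10)   # creates x but displays nothing
x               # typing the name displays the object
(y = rnorm(1))  # wrapping an assignment in parentheses creates it and displays it
```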

To create a vector we generally use the c() function. A vector can’t hold more than one type of entry. Use class() to find out what type of values a vector holds.

`## [1] "numeric"`

`## [1] "integer"`

`## [1] "1" "2" "Three"`

`## [1] "character"`

Note that R forces (coerces) compliance with its one-type rule.
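A sketch that produces the four outputs above. The original vectors are not shown, so these are assumptions. In particular, the integer example could equally have been written as c(1L, 2L, 3L).

```r
a = c(1, 2, 3)
class(a)             # "numeric"
b = 1:3              # the colon operator produces integers
class(b)             # "integer"
m = c(1, 2, "Three")
m                    # "1" "2" "Three": the numbers were coerced to character
class(m)             # "character"
```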

The c() function is very flexible with its inputs.

```
## [1] 1.0000000 2.0000000 3.0000000 4.0000000 5.0000000 -1.6299282
## [7] 1.6770383 0.6159077 0.3791377 6.0000000 7.0000000 8.0000000
## [13] 9.0000000 10.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [19] 0.0000000
```
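The output above is consistent with a single c() call that combines a sequence, four random draws, a second sequence, and repeated zeros. This is a reconstruction rather than the original code, and the four random values will differ on each run.

```r
v = c(1:5, rnorm(4), 6:10, rep(0, 5))  # c() accepts vectors, function results and repeats
v
```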

Use the c() function to create a vector w containing the following, in order:

1. The contents of a numeric vector created by c(12,95,26).
2. 23 values of 1.
3. The values 16, -4 and 0.
4. The mean of the vector in item 1.
5. The standard deviation of the vector in item 1.
6. The median of the vector in item 1.

```
## [1] 12.00000 95.00000 26.00000 1.00000 1.00000 1.00000 1.00000
## [8] 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
## [15] 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
## [22] 1.00000 1.00000 1.00000 1.00000 1.00000 16.00000 -4.00000
## [29] 0.00000 44.33333 44.43347 26.00000
```
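One possible solution. It assumes the name v for the vector from item 1.

```r
v = c(12, 95, 26)                  # item 1
w = c(v,                           # item 1: its contents
      rep(1, 23),                  # item 2: 23 ones
      16, -4, 0,                   # item 3
      mean(v), sd(v), median(v))   # items 4 to 6
w
```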

We can create logical vectors with the values TRUE and FALSE. Note that these are not character strings.

`## [1] TRUE TRUE FALSE FALSE TRUE`

We can also use the abbreviations T and F to save typing.

`## [1] TRUE TRUE FALSE FALSE TRUE`

When we apply numerical functions to logical vectors, the TRUE and FALSE values are coerced to act like 1 and 0 respectively.

`## [1] 3`

`## [1] 0.6`
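The following sketch reproduces the logical-vector outputs above. The name lv is an assumption. T and F are ordinary variables that can be reassigned, whereas TRUE and FALSE are reserved words, so the full spellings are safer in scripts.

```r
lv = c(TRUE, TRUE, FALSE, FALSE, TRUE)
lv
c(T, T, F, F, T)  # the same vector, typed with the abbreviations
sum(lv)           # 3: each TRUE counts as 1
mean(lv)          # 0.6: three TRUE values out of five
```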

Two useful facts:

- The sum of a vector of logical expressions is the count of TRUE values.
- The mean of a vector of logical expressions is the proportion of cases in which the logical expression is TRUE.

Example

`## [1] 1 2 3 4 5 6 7 8 9 10`

`## [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE`

`## [1] 5`

`## [1] 5`

`## [1] 0.5`
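The code for this example is not shown. Below is a sketch consistent with the outputs above; the vector x and the condition x > 5 are assumptions.

```r
x = 1:10
x
x > 5        # logical vector: TRUE where the condition holds
sum(x > 5)   # count of TRUE values: 5
mean(x > 5)  # proportion of TRUE values: 0.5
```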

Create a vector of 1000 observations drawn from a standard normal distribution.

What fraction of these observations is positive? Repeat. Do you get exactly the same answer?

`## [1] 0.514`

You get different answers with each run, but they are all close to 0.5.
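A sketch of the exercise, using the mean-of-a-logical trick from above. The name z is an assumption. If you need identical draws on every run, call set.seed() with a fixed number before rnorm().

```r
z = rnorm(1000)  # 1000 draws from a standard normal distribution
mean(z > 0)      # fraction of positive observations, close to 0.5
z = rnorm(1000)  # a fresh sample
mean(z > 0)      # usually a slightly different fraction
```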

NA means “not available.” NA values may exist in datasets you import, and you may also set the value of a variable to NA yourself. NaN (“not a number”) values arise from computations whose result is undefined, such as 0/0.

`## [1] NA`

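```
# Almost any computation involving an NA value will result in NA.
# Use "na.rm = TRUE" (or na.rm = T) to get numeric functions to skip the NA values.
x = c(1, 2, 3, NA)  # example vector (assumed), consistent with the outputs shown
mean(x, na.rm = TRUE)
```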

`## [1] 2`

is.na() is a logical function that tests each value for NA.

`## [1] FALSE FALSE FALSE TRUE`
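is.na() also flags NaN values, while is.nan() flags only NaN. The vector u in this sketch is a hypothetical example.

```r
u = c(4, NA, 0 / 0)  # 0/0 is undefined, so it produces NaN
u
is.na(u)             # FALSE TRUE TRUE: NA and NaN are both flagged
is.nan(u)            # FALSE FALSE TRUE: only NaN is flagged
```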

How do you count the number of NA values in a vector? Try with our vector x.
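One answer combines is.na() with the sum-of-logicals fact. The sketch assumes x = c(1, 2, 3, NA), which is consistent with the outputs above.

```r
x = c(1, 2, 3, NA)  # assumed example vector
sum(is.na(x))       # is.na() returns TRUE/FALSE, and sum() counts the TRUEs: 1
```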

Can we use logical equality (==) to test for NA values? Use our vector x.

`## [1] NA NA NA NA`
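The answer is no. Comparing any value with NA gives NA, so x == NA cannot tell you which entries are missing. The sketch again assumes x = c(1, 2, 3, NA).

```r
x = c(1, 2, 3, NA)  # assumed example vector
x == NA             # NA NA NA NA: every comparison with NA is NA
is.na(x)            # the reliable test: FALSE FALSE FALSE TRUE
```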

Calculations involving NA generally produce NA.

`## [1] NA`

`## [1] TRUE`

Normally, when we perform arithmetic on a pair of vectors, we expect them to have the same length, and the operation is applied element by element, so the results are intuitively obvious.

Example

`## [1] 1 2 3 4`

`## [1] 6 7 8 9`

`## [1] 7 9 11 13`

`## [1] 6 14 24 36`
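A sketch that reproduces the four outputs above. The names x and y match the text; the values are read off the printed output.

```r
x = 1:4
y = 6:9
x + y  # element by element: 7 9 11 13
x * y  # element by element: 6 14 24 36
```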

What happens if x and y are of different lengths? See if you can infer the rule.

`## [1] 1 2 3 4`

`## [1] 6 7`

`## [1] 7 9 9 11`
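The rule at work is recycling: R repeats the shorter vector until it matches the length of the longer one. In this sketch, y = c(6, 7) is an assumption that matches the printed output.

```r
x = 1:4
y = c(6, 7)
x + y          # y is recycled as 6 7 6 7, giving 7 9 9 11
x + rep(y, 2)  # the same result with the recycling written out
```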

What happens in this case?

`## [1] 1 2 3 4 5 6 7 8`

`## [1] 2 3 4`

```
## Warning in x + y: longer object length is not a multiple of shorter object
## length
```

`## [1] 3 5 7 6 8 10 9 11`

The same recycling happens, but because the length of x (8) is not a multiple of the length of y (3), R issues a warning.
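A sketch of this case. The values of x and y are read off the printed output. The modulus shows why R warns: the recycled y is cut off partway through.

```r
x = 1:8
y = 2:4
x + y                   # y is recycled as 2 3 4 2 3 4 2 3, with a warning
length(x) %% length(y)  # 2, not 0: 8 is not a multiple of 3
```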

`## [1] 1 2`

How do you explain this?

`## [1] 1 2 4 5 7 8 10`

Another example of the recycling rule.
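The code behind this output is not shown. One way to reproduce it is to index with a logical vector that is shorter than the data, so that the index itself is recycled. Both x = 1:10 and the index c(TRUE, TRUE, FALSE) are assumptions.

```r
x = 1:10
x[c(TRUE, TRUE, FALSE)]  # index recycled to length 10: keep two, drop one, repeat
```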