Harold Nelson
9/10/2018
Install the package mosaic using the Packages pane. A few other packages, including mosaicData, will also be installed.
Look at the documentation for mosaicData in the Packages pane. What is in the Alcohol data?
Try to examine this dataframe in RStudio.
# str(Alcohol) will fail
# Check the box to the left of the package to load it.
# Now you can see it.
library("mosaicData")
str(Alcohol)
## 'data.frame': 411 obs. of 4 variables:
## $ X : int 139 328 517 706 895 980 997 1012 1084 1273 ...
## $ country: chr "Russia" "Russia" "Russia" "Russia" ...
## $ year : int 1985 1986 1987 1988 1989 1990 1990 1990 1990 1991 ...
## $ alcohol: num 13.3 10.8 11 11.6 12 ...
You may use either “=” or “<-”. The arrow emphasizes the direction in which information flows.
Does a reversed arrow (->) move information from left to right?
Note the difference in the following.
## [1] -0.98083402 0.78891075 0.22587433 0.65428631 1.14779661
## [6] 0.71341770 2.47666419 -0.72802280 0.07813383 0.60015560
## [1] -0.2188568
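The code that produced the output above is not shown, so the following is only a minimal sketch of the assignment forms discussed here. The variable names a, b, d, y, m1 and m2 are illustrative. It also shows one place where "=" and "<-" are not interchangeable: inside a function call, "=" names an argument, while "<-" performs an assignment in the calling environment.

```r
# Three ways to bind the value 5 to a name.
a <- 5      # leftward arrow: the value flows into a
b = 5       # equals sign: same effect at the top level
5 -> d      # rightward arrow: the value flows from left to right

# Inside a function call the two forms differ.
m1 <- mean(x = c(1, 2, 3))    # "x =" only names mean()'s argument
m2 <- mean(y <- c(1, 2, 3))   # "y <-" also creates the variable y

m1   # 2
m2   # 2
y    # 1 2 3
```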
Generally we use the c() function to create vectors. A vector can’t hold more than one type of entry. Use class() to find out what type of values a vector holds.
## [1] "numeric"
## [1] "integer"
## [1] "1" "2" "Three"
## [1] "character"
Note that R forces (coerces) compliance with its one-type rule.
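The calls behind the output above are not shown. The sketch below is one plausible set of calls that gives the same results; the name v is an assumption.

```r
class(c(1, 2, 3))   # "numeric"
class(1:3)          # "integer"

# Mixing numbers with a string coerces everything to character.
v <- c(1, 2, "Three")
v                   # "1" "2" "Three"
class(v)            # "character"
```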
The c() function is very flexible with its inputs.
## [1] 1.0000000 2.0000000 3.0000000 4.0000000 5.0000000 1.2377674
## [7] 1.3356949 1.4055788 -0.5307456 6.0000000 7.0000000 8.0000000
## [13] 9.0000000 10.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [19] 0.0000000
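A call of the following shape would produce a vector like the one printed above: a sequence, four random normal values, another sequence, and five zeros. The exact original call is not shown, so this is an assumption; the name u is illustrative and the random values will differ from run to run.

```r
# c() accepts sequences, function results and other vectors as inputs
# and flattens them into one vector.
u <- c(1:5, rnorm(4), 6:10, rep(0, 5))
length(u)   # 19
class(u)    # "numeric"
```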
Use the c() function to create a vector w containing the following in order.
1. The contents of a numeric vector created by c(12,95,26).
2. 23 values of 1.
3. The values 16, -4 and 0.
4. The mean value of the vector in item 1.
5. The standard deviation of the vector in item 1.
6. The median of the vector in item 1.
## [1] 12.00000 95.00000 26.00000 1.00000 1.00000 1.00000 1.00000
## [8] 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
## [15] 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
## [22] 1.00000 1.00000 1.00000 1.00000 1.00000 16.00000 -4.00000
## [29] 0.00000 44.33333 44.43347 26.00000
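One way to build this vector, as a sketch. The helper name v is an assumption; only w is named in the exercise.

```r
v <- c(12, 95, 26)                       # item 1
w <- c(v, rep(1, 23), 16, -4, 0,         # items 1 to 3
       mean(v), sd(v), median(v))        # items 4 to 6
length(w)   # 32
```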
We can create logical vectors with the values TRUE and FALSE. Note that these are not character strings.
## [1] TRUE TRUE FALSE FALSE TRUE
We can also use the abbreviations T and F to save typing.
## [1] TRUE TRUE FALSE FALSE TRUE
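A short sketch of both spellings; the names b1 and b2 are illustrative. One caution: TRUE and FALSE are reserved words, whereas T and F are ordinary variables that can be reassigned, so the long forms are the safer choice in scripts.

```r
b1 <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
b2 <- c(T, T, F, F, T)
identical(b1, b2)           # TRUE
class(b1)                   # "logical"

# Quoted values are character strings, not logicals.
class(c("TRUE", "FALSE"))   # "character"
```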
When we apply numerical functions to logical vectors, the TRUE and FALSE values are coerced to act like 1 and 0 respectively.
## [1] 3
## [1] 0.6
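The two results above are consistent with applying sum() and mean() to the logical vector shown earlier. A minimal sketch, with the name b assumed:

```r
b <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
as.numeric(b)   # 1 1 0 0 1
sum(b)          # 3
mean(b)         # 0.6
```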
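Two useful facts: the sum of a logical vector is the number of TRUE values, and the mean is the fraction of values that are TRUE.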
Example
## [1] 1 2 3 4 5 6 7 8 9 10
## [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
## [1] 5
## [1] 5
## [1] 0.5
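The calls for this example are not shown. The sketch below reproduces the printed results under the assumption that the test was x > 5; the name big and the use of length() for the second count are guesses.

```r
x <- 1:10
big <- x > 5      # logical vector: FALSE five times, then TRUE five times
sum(big)          # 5, the number of values above 5
length(x[big])    # 5, the same count obtained by subsetting
mean(big)         # 0.5, the fraction of values above 5
```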
Create a vector of 1000 observations drawn from a standard normal distribution.
What fraction of these observations is positive? Repeat. Do you get exactly the same answer?
## [1] 0.503
You get different answers with each run, but they are all close to .5.
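A sketch of one way to do this exercise; the names z and frac are illustrative. Calling set.seed() with any fixed number before rnorm() would make a run repeatable, which is not done here because the exercise asks you to see the variation.

```r
z <- rnorm(1000)     # 1000 draws from a standard normal distribution
frac <- mean(z > 0)  # fraction of positive draws
frac

# A fresh sample gives a slightly different fraction.
mean(rnorm(1000) > 0)
```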
NA means “not available.” NA values may exist in datasets you import. You may also set the value of a variable to NA. NaN (“not a number”) values arise from computations with no defined numeric result, such as 0/0.
## [1] NA
# Any computation involving an NA value will result in NA.
# Use "na.rm = T" to get numeric functions to skip the NA values.
mean(x, na.rm = T)
## [1] 2
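The definition of x is not shown. The outputs in this section are consistent with x being c(1, 2, 3, NA), so the sketch below assumes that value.

```r
x <- c(1, 2, 3, NA)     # assumed definition of x
mean(x)                 # NA, because one value is missing
mean(x, na.rm = TRUE)   # 2, the mean of 1, 2 and 3

# NaN comes from an undefined computation rather than from missing data.
0 / 0                   # NaN
is.nan(0 / 0)           # TRUE
```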
is.na is a logical function which tests values.
## [1] FALSE FALSE FALSE TRUE
How do you count the number of NA values in a vector? Try with our vector x.
Can we use logical equality (==) to test for NA values? Use our vector x.
## [1] NA NA NA NA
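A sketch that answers both questions, again assuming x is c(1, 2, 3, NA). Because a logical vector sums to its count of TRUE values, sum(is.na(x)) counts the missing values. Comparing with == does not work, since the comparison itself involves NA.

```r
x <- c(1, 2, 3, NA)   # assumed definition of x
is.na(x)              # FALSE FALSE FALSE TRUE
sum(is.na(x))         # 1 missing value

x == NA               # NA NA NA NA, so == cannot detect NA
```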
Any calculation involving NA produces NA.
## [1] NA
## [1] TRUE
Normally, when we perform operations on pairs of vectors, we expect them to be the same length, and the operation is applied element by element.
Example
## [1] 1 2 3 4
## [1] 6 7 8 9
## [1] 7 9 11 13
## [1] 6 14 24 36
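The printed results match the following calls, shown here as a sketch since the original chunk is missing.

```r
x <- 1:4
y <- 6:9
x + y   # 7 9 11 13, each element of x added to the matching element of y
x * y   # 6 14 24 36, element-by-element product
```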
What happens if x and y are of different lengths? See if you can infer the rule.
## [1] 1 2 3 4
## [1] 6 7
## [1] 7 9 9 11
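The rule at work is recycling: the shorter vector is repeated until it is as long as the longer one. The sketch below makes the repetition explicit with rep(); the definition of y and the name y_long are assumptions.

```r
x <- 1:4
y <- c(6, 7)
x + y                                    # 7 9 9 11

# The same result with the recycling written out by hand.
y_long <- rep(y, length.out = length(x)) # 6 7 6 7
identical(x + y, x + y_long)             # TRUE
```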
What happens in this case?
## [1] 1 2 3 4 5 6 7 8
## [1] 2 3 4
## Warning in x + y: longer object length is not a multiple of shorter object
## length
## [1] 3 5 7 6 8 10 9 11
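The same recycling happens, but R issues a warning because the longer length (8) is not a multiple of the shorter length (3), so the last repetition of the shorter vector is cut off.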
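A sketch of this case with the recycled vector written out. The name y_long is illustrative, and suppressWarnings() is used only to keep the sketch quiet.

```r
x <- 1:8
y <- 2:4
length(x) %% length(y)                    # 2, not 0, which triggers the warning

y_long <- rep(y, length.out = length(x))  # 2 3 4 2 3 4 2 3, last cycle incomplete
s <- suppressWarnings(x + y)
s                                         # 3 5 7 6 8 10 9 11
identical(s, x + y_long)                  # TRUE
```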
## [1] 1 2
How do you explain this?
## [1] 1 2 4 5 7 8 10
Another example of the recycling rule.
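The call that produced 1 2 4 5 7 8 10 is not shown. One call that gives exactly this result, offered as a guess, is indexing 1:10 with a logical vector of length 3: the pattern TRUE, TRUE, FALSE is recycled along the ten positions, so every third element is dropped.

```r
x <- 1:10
keep <- c(TRUE, TRUE, FALSE)   # recycled to cover all ten positions
x[keep]                        # 1 2 4 5 7 8 10
```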