1.20 The built-in data set islands contains the areas (in thousands of square miles) of the world’s land masses that exceed 10,000 square miles. Use sort() with the argument decreasing=TRUE to find the seven largest land masses.
data("islands")
sort(islands, decreasing=TRUE)[1:7]
##          Asia        Africa North America South America    Antarctica
##         16988         11506          9390          6795          5500
##        Europe     Australia
##          3745          2968
1.21 Load the data set primes (UsingR). This is the set of prime numbers in [1,2003]. How many are there? How many in the range [1,100]? [100,1000]?
# Load UsingR, which provides the primes data set
# (the packages it depends on are attached automatically)
library(UsingR)
data("primes")
# How many elements in this data set
length(primes)
## [1] 304
# How many elements in the range [1,100]
length(primes[primes <= 100])
## [1] 25
# How many elements in the range [100,1000]
length(primes[primes>=100 & primes <= 1000])
## [1] 143
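To see where these counts come from without loading UsingR, the sketch below rebuilds the primes in [1, 2003] with a sieve of Eratosthenes. It assumes, as the exercise states, that the data set is exactly the primes up to 2003; the helper name `sieve_primes` is hypothetical and not part of UsingR.

```r
# Sieve of Eratosthenes: return all primes <= n (n >= 2).
# `sieve_primes` is a hypothetical helper, not a UsingR function.
sieve_primes <- function(n) {
  is_prime <- rep(TRUE, n)
  is_prime[1] <- FALSE                  # 1 is not prime
  p <- 2
  while (p * p <= n) {
    # cross off the multiples of p, starting at p^2
    if (is_prime[p]) is_prime[seq(p * p, n, by = p)] <- FALSE
    p <- p + 1
  }
  which(is_prime)                       # surviving indices are the primes
}

my_primes <- sieve_primes(2003)
length(my_primes)                             # 304
sum(my_primes <= 100)                         # 25
sum(my_primes >= 100 & my_primes <= 1000)     # 143
```

Using sum() on a logical vector counts the TRUE values directly, which is equivalent to the length(primes[...]) pattern above.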
1.22 Load the data set primes (UsingR). We wish to find all the twin primes. These are numbers \(p\) and \(p+2\), where both are prime.
Explain what primes[-1] returns.
primes[-1] returns the vector primes with its first element removed.
head(primes[-1])
## [1] 3 5 7 11 13 17
If n=length(primes), explain what primes[-n] returns.
primes[-n] returns the vector primes with its last (nth) element removed.
n <- length(primes)
tail(primes[-n])
## [1] 1973 1979 1987 1993 1997 1999
Why might primes[-1]-primes[-n] give clues as to what the twin primes are? How many twin primes are there in the data set?
primes[-1]-primes[-n] is the vector of differences between consecutive primes. A difference of 2 marks a twin-prime pair, so counting the entries equal to 2 gives the number of twin-prime pairs in the data set.
sum(primes[-1]-primes[-n]==2)
## [1] 61
Alternatively, again using only [-1] and [-n], compare each prime with its predecessor plus 2:
new_primes <- primes+2
sum(primes[-1]==new_primes[-n])
## [1] 61
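Counting the gaps equal to 2 gives the number of pairs; the same [-1]/[-n] indexing can also list the pairs themselves. The sketch below uses the primes below 100, generated with a simple trial-division test so that it runs without UsingR (`is_prime` and `small_primes` are illustrative names; with UsingR loaded, `primes` can be used in place of `small_primes`).

```r
# Trial-division primality test (illustrative stand-in for UsingR's primes)
is_prime <- function(k) k > 1 && all(k %% seq_len(floor(sqrt(k)))[-1] != 0)
small_primes <- Filter(is_prime, 1:100)

n <- length(small_primes)
gap <- small_primes[-1] - small_primes[-n]   # gaps between consecutive primes
# a gap of 2 at position i means small_primes[-n][i] and small_primes[-1][i] are twins
twins <- cbind(p = small_primes[-n][gap == 2],
               p_plus_2 = small_primes[-1][gap == 2])
twins   # 8 pairs: (3,5) (5,7) (11,13) (17,19) (29,31) (41,43) (59,61) (71,73)
```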
1.23 For the data set treering, which contains tree-ring widths in dimensionless units, use an R function to answer the following:
data("treering")
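# How many observations are there?
length(treering)
## [1] 7980
# What is the smallest observation?
min(treering)
## [1] 0
# What is the largest observation?
max(treering)
## [1] 1.908
# How many observations are bigger than 1.5?
sum(treering>1.5)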
## [1] 219
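A compact variant of the same answers: range() returns the minimum and maximum in one call, and because TRUE counts as 1, mean() of a logical vector gives the proportion of observations meeting a condition, whereas sum() gives the count. treering ships with base R (datasets package), so this runs as-is.

```r
data("treering")
range(treering)          # smallest and largest width together
mean(treering > 1.5)     # proportion (not count) of widths above 1.5
summary(treering)        # minimum, quartiles, mean and maximum
```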
1.24 The data set mandms (UsingR) contains the targeted color distribution in a bag of M&Ms as percentages for various types of packaging. Answer these questions.
# load the data set
data("mandms")
# find the packaging with exactly one color missing (percentage equal to 0)
names(which(rowSums(mandms==0)==1))
## [1] "Peanut Butter"
# an alternative way using subset function
rownames(subset(mandms,mandms[,1]==0|mandms[,2]==0|
mandms[,3]==0|mandms[,4]==0|
mandms[,5]==0|mandms[,6]==0 ))
## [1] "Peanut Butter"
# find the packaging types whose color percentages are all equal
names(which(rowSums(mandms==rowMeans(mandms))==6))
## [1] "Almond" "kid minis"
# an alternative way using subset function
rownames(subset(mandms,mandms[,1]==mandms[,2] &
mandms[,2]==mandms[,3] &
mandms[,3]==mandms[,4] &
mandms[,4]==mandms[,5] &
mandms[,5]==mandms[,6] &
mandms[,6]==mandms[,1]))
## [1] "Almond" "kid minis"
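# packaging in which a single color is the unique maximum of its row
# (compare each value with its own row's maximum, not the overall maximum)
unique_max <- apply(mandms, 1, function(row) sum(row == max(row)) == 1)
names(which(unique_max))
## [1] "milk chocolate"
# the color that is more likely than all the others in that packaging
names(which.max(unlist(mandms[names(which(unique_max)), ])))
## [1] "brown"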
# an alternative way using subset function
package <- subset(mandms,
(mandms[,1]>mandms[,2] & mandms[,1]>mandms[,3] & mandms[,1]>mandms[,4] &
mandms[,1]>mandms[,5] & mandms[,1]>mandms[,6])|
(mandms[,2]>mandms[,1] & mandms[,2]>mandms[,3] & mandms[,2]>mandms[,4] &
mandms[,2]>mandms[,5] & mandms[,2]>mandms[,6])|
(mandms[,3]>mandms[,1] & mandms[,3]>mandms[,2] & mandms[,3]>mandms[,4] &
mandms[,3]>mandms[,5] & mandms[,3]>mandms[,6])|
(mandms[,4]>mandms[,1] & mandms[,4]>mandms[,2] & mandms[,4]>mandms[,3] &
mandms[,4]>mandms[,5] & mandms[,4]>mandms[,6])|
(mandms[,5]>mandms[,1] & mandms[,5]>mandms[,2] & mandms[,5]>mandms[,3] &
mandms[,5]>mandms[,4] & mandms[,5]>mandms[,6])|
(mandms[,6]>mandms[,1] & mandms[,6]>mandms[,2] & mandms[,6]>mandms[,3] &
mandms[,6]>mandms[,4] & mandms[,6]>mandms[,5]))
# return the packaging satisfying the condition
rownames(package)
## [1] "milk chocolate"
# return the color name that is more likely than all the others
names(which.max(package))
## [1] "brown"
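The subset() conditions above spell out every pairwise comparison, so they only work for exactly six columns. The same three row-wise checks can be written with apply() for any number of columns. This sketch uses a small made-up data frame (`toy`, hypothetical values, not the real mandms percentages) so it runs without UsingR; with the package loaded, as.matrix(mandms) can be used in its place.

```r
# Hypothetical toy data: rows are packagings, columns are colors (percent)
toy <- data.frame(red    = c(10, 25,  0, 40),
                  blue   = c(20, 25, 50, 40),
                  green  = c(30, 25, 25, 10),
                  yellow = c(40, 25, 25, 10),
                  row.names = c("A", "B", "C", "D"))
m <- as.matrix(toy)

has_zero   <- apply(m, 1, function(r) any(r == 0))           # a color is missing
all_equal  <- apply(m, 1, function(r) length(unique(r)) == 1) # uniform distribution
unique_max <- apply(m, 1, function(r) sum(r == max(r)) == 1)  # one clear favorite

names(which(has_zero))     # "C"
names(which(all_equal))    # "B"
names(which(unique_max))   # "A" "C"  (D has a tie at 40)
# the favorite color in each row that has a unique maximum
winner <- apply(m[unique_max, , drop = FALSE], 1, function(r) names(which.max(r)))
winner                     # A = "yellow", C = "blue"
```

Exact equality is used throughout; for percentages stored as non-terminating decimals, a tolerance-based comparison may be safer.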
1.25 The time variable in the data set nym.2002 (UsingR) contains the time to finish for several participants in the 2002 New York City Marathon. Answer these questions.
data("nym.2002")
# how many finishing times are stored
length(nym.2002$time)
## [1] 1000
fastest <- min(nym.2002$time)
# return the fastest time in minutes
fastest
## [1] 147.3333
# convert the value into format 'Hours : Minutes'
paste(fastest %/% 60, round(fastest %% 60),sep=":")
## [1] "2:27"
slowest <- max(nym.2002$time)
# return the slowest time in minutes
slowest
## [1] 566.7833
# convert the value into format 'Hours : Minutes'
paste(slowest %/% 60, round(slowest %% 60), sep=":")
## [1] "9:27"
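The paste() conversion above rounds only the minutes part, so a time such as 119.6 minutes would print as "1:60", and 62 minutes as "1:2". A minimal sketch that rounds the total first and zero-pads the minutes (`to_hm` is a hypothetical helper name):

```r
# Hypothetical helper: format a time given in minutes as "h:mm"
to_hm <- function(minutes) {
  total <- round(minutes)                         # round to whole minutes first
  sprintf("%d:%02d", as.integer(total %/% 60),    # hours
                     as.integer(total %% 60))     # minutes, zero-padded
}

to_hm(147.3333)   # "2:27"
to_hm(566.7833)   # "9:27"
to_hm(62)         # "1:02"
to_hm(119.6)      # "2:00"
```

Because sprintf() is vectorized, to_hm(nym.2002$time) would format every finishing time at once.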
1.26 For the data set rivers, which is the longest river? The shortest?
# load the data set
data("rivers")
# length of the longest river, in miles
max(rivers)
## [1] 3710
# length of the shortest river, in miles
min(rivers)
## [1] 135
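The rivers vector stores only the lengths (in miles) of 141 major North American rivers and carries no names, so max() and min() report how long the longest and shortest rivers are, but not which rivers they are. A quick check:

```r
data("rivers")
is.null(names(rivers))   # TRUE: lengths only, no river names
length(rivers)           # 141 rivers
range(rivers)            # shortest and longest length in one call
which.max(rivers)        # position of the longest river in the vector
```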
1.27 The data set uspop contains decade-by-decade population figures for the United States from 1790 to 1970.
# load the data set
data("uspop")
# add the years as names of the data vector
names(uspop) <- seq(1790,1970,by=10)
# print the data with the corresponding years
uspop
## Time Series:
## Start = 1790
## End = 1970
## Frequency = 0.1
## 1790 1800 1810 1820 1830 1840 1850 1860 1870 1880
## 3.93 5.31 7.24 9.64 12.90 17.10 23.20 31.40 39.80 50.20
## 1890 1900 1910 1920 1930 1940 1950 1960 1970
## 62.90 76.00 92.00 105.70 122.80 131.70 151.30 179.30 203.20
# differences between consecutive census figures (population in millions)
delta <- diff(uspop)
# find the decade of the greatest increase
uspop[c(which.max(delta),which.max(delta)+1)]
## 1950 1960
## 151.3 179.3
max(delta)
## [1] 28
# check whether the population increased in every decade
# (TRUE means an increase, FALSE a decrease)
delta > 0
## Time Series:
## Start = 1800
## End = 1970
## Frequency = 0.1
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [15] TRUE TRUE TRUE TRUE
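The check above shows that the population grew in every decade. A related question is whether the increase itself grew every decade; applying diff() twice answers that. A sketch (the variable names are ours), converting the series to a plain numeric vector so that the year labels can be handled explicitly:

```r
data("uspop")
pop   <- as.numeric(uspop)              # population in millions, 1790-1970
years <- seq(1790, 1970, by = 10)

increase <- diff(pop)                   # growth over each decade
change_in_increase <- diff(increase)    # did the growth itself get larger?

# decades (labelled by their end year) whose increase was smaller than the one before
years[-(1:2)][change_in_increase < 0]   # 1920 1940 1970
```

So, although the population rose every decade, the size of the increase did not rise every decade.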