Loading Libraries

library(UsingR)

1.10

The three variables are Tree, age, and circumference.

colnames(Orange)
## [1] "Tree"          "age"           "circumference"

1.11

The average age of the trees in the Orange dataset is 922.1429 days. The age variable is recorded in days, not years.

mean(Orange$age)
## [1] 922.1429

1.12

The largest trunk circumference among the trees is 214 mm.

max(Orange$circumference)
## [1] 214

2.4

rep("a", times=5)           #Sequence 1
## [1] "a" "a" "a" "a" "a"
seq(1, 100, by = 2)         #Sequence 2
##  [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
## [26] 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99
rep(1:3, each = 3)          #Sequence 3
## [1] 1 1 1 2 2 2 3 3 3
rep(1:3, times = c(3, 2, 1))        #Sequence 4
## [1] 1 1 1 2 2 3
c(1:5, 4:1)             #Sequence 5
## [1] 1 2 3 4 5 4 3 2 1

2.20

The mean for months with 31 days (166.57) is lower than the mean for months with fewer than 31 days (205.6).

cd <- c(79, 74, 161, 127, 133, 210, 99, 143, 249, 249, 368, 302)
names(cd) <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
thirtyone <- cd[c(1, 3, 5, 7, 8, 10, 12)]
notthirtyone <- cd[c(2, 4, 6, 9, 11)]
mean(thirtyone)
## [1] 166.5714
mean(notthirtyone)
## [1] 205.6
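
The same comparison can be made without hard-coding index positions. The following is a minimal sketch that selects months by name; `long_months` and `has31` are illustrative names that are not part of the original solution, and `month.abb` is R's built-in vector of month abbreviations.

```r
cd <- c(79, 74, 161, 127, 133, 210, 99, 143, 249, 249, 368, 302)
names(cd) <- month.abb  # built-in constant: "Jan", "Feb", ..., "Dec"

# Months with 31 days, identified by name (illustrative helper vector)
long_months <- c("Jan", "Mar", "May", "Jul", "Aug", "Oct", "Dec")
has31 <- names(cd) %in% long_months  # logical vector, TRUE for 31-day months

mean_31 <- mean(cd[has31])       # mean for months with 31 days
mean_not31 <- mean(cd[!has31])   # mean for the remaining months
c(mean_31, mean_not31)
```

Selecting by name makes it easier to check that no month was assigned to the wrong group.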

2.21

In 1995, the amount dropped from the previous year (by 0.11, or about 9.3%). The biggest percentage increase occurred in 1991 (about 56.1%).

salary <- c(0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72)
names(salary) <- 1990:1999
diff(salary)
##  1991  1992  1993  1994  1995  1996  1997  1998  1999 
##  0.32  0.19  0.04  0.06 -0.11  0.10  0.21  0.06  0.28
diff(salary)/salary[-length(salary)] * 100
##      1991      1992      1993      1994      1995      1996      1997      1998 
## 56.140351 21.348315  3.703704  5.357143 -9.322034  9.345794 17.948718  4.347826 
##      1999 
## 19.444444
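
Rather than reading the answers off the printed output, the years can be extracted directly. This is a sketch under the same data; `pct_change`, `drop_years`, and `biggest_increase` are illustrative names not used in the original.

```r
salary <- c(0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72)
names(salary) <- 1990:1999

# Year-over-year percentage change, named by the later year
pct_change <- diff(salary) / salary[-length(salary)] * 100

# Years in which the amount fell relative to the previous year
drop_years <- names(pct_change)[pct_change < 0]

# Year with the largest percentage increase
biggest_increase <- names(which.max(pct_change))

drop_years
biggest_increase
```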

2.23

f <- function(x) mean(x^2) - mean(x)^2
f(1:10)
## [1] 8.25
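
The function `f` computes the population variance (the mean of the squares minus the square of the mean), which divides by n rather than n - 1. As a quick check, and not part of the original solution, it should agree with `var()` rescaled by (n - 1)/n; `pop_var` is an illustrative name.

```r
f <- function(x) mean(x^2) - mean(x)^2

x <- 1:10
n <- length(x)

# var() uses the n - 1 denominator, so rescale it to the n denominator
pop_var <- var(x) * (n - 1) / n

all.equal(f(x), pop_var)
```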

2.42a

58.16% of the rivers are less than 500 miles long.

sum(rivers<500)/length(rivers)
## [1] 0.5815603

2.42b

66.67% of the rivers are shorter than the mean length.

sum(rivers<(mean(rivers)))/length(rivers)
## [1] 0.6666667
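
Both proportions in 2.42a and 2.42b can also be written as the mean of a logical vector, since `TRUE` counts as 1 and `FALSE` as 0. This is an equivalent sketch, with `prop_short` and `prop_below_mean` as illustrative names.

```r
# Proportion of rivers shorter than 500 miles
prop_short <- mean(rivers < 500)

# Proportion of rivers shorter than the mean length
prop_below_mean <- mean(rivers < mean(rivers))

# Express both as percentages rounded to two decimals
round(100 * c(prop_short, prop_below_mean), 2)
```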

2.42c

The 75th percentile (third quartile) is 680 miles.

summary(rivers)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   135.0   310.0   425.0   591.2   680.0  3710.0
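
The third quartile can also be requested directly with `quantile()` instead of reading it from `summary()`. A short sketch, using the default quantile type and the illustrative name `q75`:

```r
# 0.75 quantile of river lengths, using R's default quantile definition
q75 <- quantile(rivers, probs = 0.75)

# Drop the "75%" name to get the bare number
unname(q75)
```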

2.44

The mean (591.18) is much higher than both the median (425) and the 25% trimmed mean (449.92), which is consistent with a long right tail pulling the mean upward.

mean(rivers)
## [1] 591.1844
median(rivers)
## [1] 425
mean(rivers, trim=0.25)
## [1] 449.9155
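
To see what `trim = 0.25` does, the trimmed mean can be reproduced by hand: sort the data, drop the lowest and highest 25% of observations (rounded down to a whole count), and average the rest. This is a sketch for illustration; `k` and `trimmed_by_hand` are names introduced here.

```r
x <- sort(rivers)
n <- length(x)

# Number of observations removed from each end of the sorted data
k <- floor(n * 0.25)

# Average of the middle observations only
trimmed_by_hand <- mean(x[(k + 1):(n - k)])

all.equal(trimmed_by_hand, mean(rivers, trim = 0.25))
```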

2.47

zscore <- scale(rivers)
mean(zscore)
## [1] -5.006707e-17
sd(zscore)
## [1] 1
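
The mean above is zero up to floating-point rounding error. As a check on what `scale()` does, the z-scores can be computed by hand by subtracting the mean and dividing by the standard deviation. A minimal sketch, with `z_manual` and `z_scaled` as illustrative names:

```r
# z-scores computed directly from the definition
z_manual <- (rivers - mean(rivers)) / sd(rivers)

# scale() returns a one-column matrix, so flatten it to a plain vector
z_scaled <- as.vector(scale(rivers))

all.equal(z_manual, z_scaled)
```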

2.47 Histogram and Boxplot

The data are skewed to the right, with a long right tail. The distribution is unimodal because there is only one peak. The boxplot flags outliers, all of which lie above roughly 1,200 miles. The most extreme outlier is at about 3,700 miles.

hist(rivers, freq = FALSE, xlab = "River Length (miles)", main = "Rivers Length Histogram")
lines(density(rivers))

boxplot(rivers, main = "Boxplot of Rivers Dataset", col = "blue", horizontal = TRUE)
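
The outliers read off the boxplot can be listed numerically. The sketch below assumes the default boxplot rule, which flags points more than 1.5 times the hinge spread beyond the hinges; `hinges`, `upper_fence`, and `outliers` are illustrative names.

```r
# Lower and upper hinges (2nd and 4th values of Tukey's five-number summary)
hinges <- fivenum(rivers)[c(2, 4)]

# Upper fence: upper hinge plus 1.5 times the spread between the hinges
upper_fence <- hinges[2] + 1.5 * diff(hinges)

# Rivers longer than the upper fence are drawn as individual points
outliers <- rivers[rivers > upper_fence]

upper_fence
range(outliers)
```

Only the upper fence is used here because the lower fence falls below the shortest river, so no low outliers are expected.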

2.62

The summary function for factors returns a count of the occurrences of each level.

summary(Cars93$Cylinders)
##      3      4      5      6      8 rotary 
##      3     49      2     31      7      1
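
For a factor, `summary()` gives the same counts as `table()`. The sketch below illustrates this on a small made-up factor (`cyl` is hypothetical data, not taken from Cars93), so it runs without loading any package.

```r
# Hypothetical factor for illustration only
cyl <- factor(c("4", "6", "4", "8", "rotary", "4", "6"))

counts_summary <- summary(cyl)  # named integer vector, one entry per level
counts_table <- table(cyl)      # contingency table of the same counts

all(counts_summary == as.vector(counts_table))
```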

2.64 Bar Graph

barplot(table(Cars93$Cylinders), 
        main = "Distribution of Cylinders in Cars93 Dataset",
        xlab = "Number of Cylinders", ylab = "Count",
        col = "blue", border = "black")