ST 710 HW 6

Loading Libraries

library(UsingR)

1.10

The three variables are Tree, age, and circumference.

colnames(Orange)

## [1] "Tree"          "age"           "circumference"

1.11

The average age of the trees in the Orange dataset is about 922.14 days (age is recorded in days since 1968/12/31, not in years).

mean(Orange$age)

## [1] 922.1429
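Orange is a balanced design: each of the five trees is measured at the same seven ages. If that holds, the overall mean age should equal the mean of the distinct ages. A quick check, assuming only the built-in datasets package:

```r
# Orange ships with base R's datasets package
ages <- unique(Orange$age)   # the distinct measurement ages (in days)
table(Orange$Tree)           # each tree appears the same number of times
mean(ages)                   # mean of the distinct ages
mean(Orange$age)             # overall mean age, as computed above
```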

1.12

The largest circumference among the trees is 214 mm.

max(Orange$circumference)

## [1] 214
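max() gives only the value. To see which tree and age the maximum belongs to, which.max() returns the row index of the first maximum. A small sketch:

```r
# which.max() returns the index of the first maximum
i <- which.max(Orange$circumference)
Orange[i, ]   # tree and age at which the largest circumference occurs
```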

2.4

rep("a", times=5)           #Sequence 1

## [1] "a" "a" "a" "a" "a"

seq(1, 100, by = 2)         #Sequence 2

##  [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
## [26] 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99

rep(1:3, each = 3)          #Sequence 3

## [1] 1 1 1 2 2 2 3 3 3

rep(1:3, times = c(3, 2, 1))        #Sequence 4

## [1] 1 1 1 2 2 3

c(1:5, 4:1)             #Sequence 5

## [1] 1 2 3 4 5 4 3 2 1
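Several of these sequences can be built in more than one way. The alternatives below are only illustrations, not the required answers. They are compared against the versions above:

```r
s2 <- seq(1, 100, by = 2)
s2_alt <- 2 * seq_len(50) - 1         # the first 50 odd numbers
s3 <- rep(1:3, each = 3)
s3_alt <- sort(rep(1:3, times = 3))   # repeat 1:3 three times, then sort
s5 <- c(1:5, 4:1)
s5_alt <- c(1:5, rev(1:4))            # count up, then reverse back down
```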

2.20

The mean for the months with 31 days (about 166.57) is lower than the mean for the months with fewer than 31 days (205.6).

cd <- c(79, 74, 161, 127, 133, 210, 99, 143, 249, 249, 368, 302)
names(cd) <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
thirtyone <- cd[c(1, 3, 5, 7, 8, 10, 12)]
notthirtyone <- cd[c(2, 4, 6, 9, 11)]
mean(thirtyone)

## [1] 166.5714

mean(notthirtyone)

## [1] 205.6
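Instead of listing month indices by hand, the grouping can be derived from the number of days in each month and the group means computed in one call with tapply(). This sketch assumes a non-leap year (February has 28 days), which does not affect the split:

```r
cd <- c(79, 74, 161, 127, 133, 210, 99, 143, 249, 249, 368, 302)
names(cd) <- month.abb   # built-in "Jan", "Feb", ..., "Dec"
# days in each month (non-leap year assumed)
days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
# group means: FALSE = fewer than 31 days, TRUE = 31 days
res <- tapply(cd, days == 31, mean)
res
```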

2.21

The salary dropped from the previous year only in 1995 (a change of -0.11, about -9.32%). The largest percentage increase occurred in 1991 (about 56.14%).

salary <- c(0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72)
names(salary) <- c(1990:1999)
diff(salary)

##  1991  1992  1993  1994  1995  1996  1997  1998  1999 
##  0.32  0.19  0.04  0.06 -0.11  0.10  0.21  0.06  0.28

diff(salary)/salary[-length(salary)] * 100

##      1991      1992      1993      1994      1995      1996      1997      1998 
## 56.140351 21.348315  3.703704  5.357143 -9.322034  9.345794 17.948718  4.347826 
##      1999 
## 19.444444
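Rather than reading the answers off the printed output, which() and which.max() can pick out the years directly. diff() keeps the names of the later year, so the names identify the year of each change:

```r
salary <- c(0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72)
names(salary) <- 1990:1999
change <- diff(salary)                  # year-over-year change
pct <- 100 * change / head(salary, -1)  # percent change relative to the previous year
names(which(change < 0))                # years with a decrease
names(which.max(pct))                   # year with the largest percent increase
```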

2.23

f <- function(x) mean(x^2) - mean(x)^2
f(1:10)

## [1] 8.25
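The function f computes the mean of the squares minus the square of the mean, which is the population variance (divisor n). R's var() uses the divisor n - 1, so the two should agree once var() is rescaled by (n - 1)/n:

```r
f <- function(x) mean(x^2) - mean(x)^2   # population variance (divisor n)
x <- 1:10
n <- length(x)
f(x)
var(x) * (n - 1) / n                     # rescaled sample variance (divisor n - 1)
```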

2.42a

About 58.16% of the rivers are less than 500 miles long.

sum(rivers<500)/length(rivers)

## [1] 0.5815603
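Because TRUE is treated as 1 and FALSE as 0, the mean of a logical vector is the proportion of TRUE values. That gives a shorter equivalent of sum(...)/length(...):

```r
# rivers ships with base R's datasets package
below500 <- rivers < 500         # TRUE where the river is shorter than 500 miles
mean(below500)                   # proportion of TRUE values
sum(below500) / length(rivers)   # the computation used above
```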

2.42b

About 66.67% of the rivers are shorter than the mean length.

sum(rivers<(mean(rivers)))/length(rivers)

## [1] 0.6666667
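The same proportion can be read from the empirical CDF. ecdf() returns a function Fn with Fn(t) equal to the proportion of observations less than or equal to t. Here the two agree because no river has exactly the mean length:

```r
Fn <- ecdf(rivers)             # empirical CDF of river lengths
Fn(mean(rivers))               # proportion of rivers <= the mean
mean(rivers < mean(rivers))    # proportion of rivers < the mean
```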

2.42c

The 75th percentile (third quartile) of river length is 680 miles.

summary(rivers)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   135.0   310.0   425.0   591.2   680.0  3710.0
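The 75th percentile can also be requested directly with quantile(). Its default (type 7) is the same rule summary() uses for its quartiles, so the values should match:

```r
quantile(rivers, 0.75)                # third quartile only
quantile(rivers, c(0.25, 0.5, 0.75))  # all three quartiles
```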

2.44

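The mean (about 591.18) is much higher than both the median (425) and the 25% trimmed mean (about 449.92). This is consistent with a right-skewed distribution in which a few very long rivers pull the mean upward.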

mean(rivers)

## [1] 591.1844

median(rivers)

## [1] 425

mean(rivers, trim=0.25)

## [1] 449.9155
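To see what trim = 0.25 does, the trimmed mean can be computed by hand: sort the data, drop floor(n * trim) observations from each end, and average the rest. The helper trimmed_mean below is hypothetical and written only for this illustration (valid for 0 <= trim < 0.5):

```r
# hypothetical helper: mirrors the documented behavior of mean(x, trim = p)
trimmed_mean <- function(x, trim) {
  n <- length(x)
  k <- floor(n * trim)             # observations dropped from each end
  mean(sort(x)[(k + 1):(n - k)])   # average of the middle observations
}
trimmed_mean(rivers, 0.25)
mean(rivers, trim = 0.25)
```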

2.47

zscore <- scale(rivers)
mean(zscore)

## [1] -5.006707e-17

sd(zscore)

## [1] 1
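scale() standardizes each value as (x - mean) / sd. The mean of -5.0e-17 above is zero up to floating-point rounding. A sketch that computes the z-scores by hand and compares them with scale(), which returns a one-column matrix:

```r
z <- (rivers - mean(rivers)) / sd(rivers)   # standardize by hand
all.equal(as.vector(scale(rivers)), z)      # drop the matrix attributes before comparing
mean(z)                                     # zero up to rounding error
sd(z)
```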

2.47 Histogram and Boxplot

The data are skewed to the right, with a long right tail. The distribution is unimodal, with a single peak. There are several outliers, all above roughly 1,200 miles. The most extreme outlier is about 3,700 miles (the maximum length is 3,710 miles).

hist(rivers, freq = FALSE, xlab = "River Length (miles)", main = "Histogram of River Lengths")
lines(density(rivers))   # overlay a kernel density estimate

boxplot(rivers, main = "Boxplot of Rivers Dataset", xlab = "River Length (miles)", col = "blue", horizontal = TRUE)
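The outliers drawn by boxplot() come from boxplot.stats(), which flags points beyond 1.5 times the box length from the hinges. Inspecting its output is one way to check the visual reading of the plot:

```r
bs <- boxplot.stats(rivers)   # the statistics boxplot() draws from
bs$stats                      # lower whisker, lower hinge, median, upper hinge, upper whisker
range(bs$out)                 # smallest and largest flagged outlier
length(bs$out)                # number of flagged outliers
```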

2.62

For a factor, summary() returns the number of occurrences of each level.

summary(Cars93$Cylinders)

##      3      4      5      6      8 rotary 
##      3     49      2     31      7      1
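For a factor with only a few levels, these counts should match table(), which is also what the bar graph in 2.64 is built from. A quick check, loading MASS directly since that is where Cars93 lives:

```r
library(MASS)                                # provides the Cars93 data frame
counts_summary <- summary(Cars93$Cylinders)  # counts per level via summary()
counts_table <- table(Cars93$Cylinders)      # the same counts via table()
counts_table
all(counts_summary == counts_table)
```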

2.64 Bar Graph

barplot(table(Cars93$Cylinders), 
        main = "Distribution of Cylinders in Cars93 Dataset",
        xlab = "Number of Cylinders", ylab = "Count",
        col = "blue", border = "black")

2025-03-18
