S2101 Lab Revision Exercise

Question on Descriptive Statistics

Textbook, Page 143, Question 11

Multiple Births: The numbers of various multiple births in the United States for the past 10 years are listed. Find the mean, median, range, variance, and standard deviation of each data set. Find the value that corresponds to the \(92^{nd}\) percentile. Find the \(1^{st}\), \(2^{nd}\) and \(3^{rd}\) quartiles. Which set of data is the most variable?

Triplets	Quadruplets	Quintuplets
5877	345	46
7110	468	85
5937	369	91
6898	434	69
6118	355	67
6885	501	85
6208	418	68
6742	506	77
6750	439	86
6742	512	67

Solution

triplets    <- c(5877, 7110, 5937, 6898, 6118, 6885, 6208, 6742, 6750, 6742)
quadruplets <- c(345, 468, 369, 434, 355, 501, 418, 506, 439, 512)
quintuplets <- c(46, 85, 91, 69, 67, 85, 68, 77, 86, 67)

# Triplets Descriptive Statistics

# mean(triplets)
# median(triplets)
# range(triplets)
# var(triplets)
# sd(triplets)
# scale(triplets)
quantile(triplets, 0.92)

##     92% 
## 6957.36
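The value 6957.36 is not one of the observations because `quantile()` interpolates between neighbouring ordered values. The sketch below reproduces it by hand, assuming R's default rule (`type = 7`); the names `h`, `lo`, `hi` and `p92_manual` are introduced here for illustration only. A textbook percentile rule may pick an observed value instead, so the two answers need not agree.

```r
# Reproduce quantile(triplets, 0.92) by hand, assuming the default type = 7 rule.
triplets <- c(5877, 7110, 5937, 6898, 6118, 6885, 6208, 6742, 6750, 6742)

x <- sort(triplets)      # ordered observations
n <- length(x)
p <- 0.92

h  <- (n - 1) * p + 1    # fractional position in the ordered data (9.28 here)
lo <- floor(h)           # index of the ordered value just below the position
hi <- ceiling(h)         # index of the ordered value just above the position

# Linear interpolation between the two neighbouring ordered values
p92_manual <- x[lo] + (h - lo) * (x[hi] - x[lo])

p92_manual               # compare with quantile(triplets, 0.92)
```

Here the position 9.28 falls between the 9th value (6898) and the 10th value (7110), so the result is 6898 + 0.28 × 212.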

# quantile(triplets, c(0.25, 0.50, 0.75))
# fivenum(triplets)
# IQR(triplets)

# Quadruplets Descriptive Statistics

# mean(quadruplets)
# median(quadruplets)
# range(quadruplets)
# var(quadruplets)
# sd(quadruplets)
# scale(quadruplets)
quantile(quadruplets, 0.92)

##    92% 
## 507.68

# quantile(quadruplets, c(0.25, 0.50, 0.75))
# fivenum(quadruplets)
# IQR(quadruplets)

# Quintuplets Descriptive Statistics

# mean(quintuplets)
# median(quintuplets)
# range(quintuplets)
# var(quintuplets)
# sd(quintuplets)
# scale(quintuplets)
quantile(quintuplets, 0.92)

##  92% 
## 87.4

# quantile(quintuplets, c(0.25, 0.50, 0.75))
# fivenum(quintuplets)
# IQR(quintuplets)

triplets.sum <- summary(triplets)
quadruplets.sum  <- summary(quadruplets)
quintuplets.sum  <- summary(quintuplets)

summ  <- cbind(triplets.sum, quadruplets.sum, quintuplets.sum)
summ

##         triplets.sum quadruplets.sum quintuplets.sum
## Min.         5877.00          345.00           46.00
## 1st Qu.      6140.50          381.25           67.25
## Median       6742.00          436.50           73.00
## Mean         6526.70          434.70           74.10
## 3rd Qu.      6851.25          492.75           85.00
## Max.         7110.00          512.00           91.00

# Most variable data set: the three sets have very different means, so compare
# their coefficients of variation (sd / mean) rather than their raw variances

triplets.cv  <- sd(triplets)/mean(triplets)
quadruplets.cv  <- sd(quadruplets)/mean(quadruplets)
quintuplets.cv  <- sd(quintuplets)/mean(quintuplets)

cvsum  <- cbind(triplets.cv, quadruplets.cv, quintuplets.cv)
cvsum

##      triplets.cv quadruplets.cv quintuplets.cv
## [1,]  0.06828257      0.1446333      0.1814433

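Therefore, the Quintuplets data are the most variable: they have the largest coefficient of variation (about 0.181, versus 0.145 for Quadruplets and 0.068 for Triplets).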
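The individual calls above can also be collected into one table. The following is a minimal sketch, not part of the original solution: `describe()`, `births`, `stats` and `most_variable` are hypothetical names, and `var()` and `sd()` are the sample (n - 1) versions.

```r
triplets    <- c(5877, 7110, 5937, 6898, 6118, 6885, 6208, 6742, 6750, 6742)
quadruplets <- c(345, 468, 369, 434, 355, 501, 418, 506, 439, 512)
quintuplets <- c(46, 85, 91, 69, 67, 85, 68, 77, 86, 67)

# Hypothetical helper: the statistics requested in the question, plus the CV
describe <- function(x) {
  c(mean     = mean(x),
    median   = median(x),
    range    = max(x) - min(x),   # range as a single number (max - min)
    variance = var(x),            # sample variance
    sd       = sd(x),             # sample standard deviation
    cv       = sd(x) / mean(x))   # coefficient of variation
}

births <- list(triplets = triplets,
               quadruplets = quadruplets,
               quintuplets = quintuplets)

# One column per data set, one row per statistic
stats <- sapply(births, describe)
round(stats, 4)

# Data set with the largest coefficient of variation
most_variable <- names(which.max(stats["cv", ]))
most_variable
```

Note that `range()` in base R returns the minimum and maximum; the helper reports their difference, which is the textbook definition of the range.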

Question Tabulation

Refer to Question 13, page 52

Favorite Coffee Flavor A survey was taken asking the favorite flavor of a coffee drink a person prefers. The responses were V = Vanilla, C = Caramel, M = Mocha, H = Hazelnut, and P = Plain. Construct a categorical frequency distribution for the data. Which class has the most data values and which class has the fewest data values?

Solution

coffee <- c("V","C", "P", "P", "M", "M", "P", "P", "M", "C",
"M", "M", "V", "M", "M", "M", "V", "M", "M", "M",
"P", "V", "C", "M", "V", "M", "C", "P", "M", "P",
"M", "M", "M", "P", "M", "M", "C", "V", "M", "C",
"C", "P", "M", "P", "M", "H", "H", "P", "H", "P")

FreqTable <- transform(table(coffee))
FreqTable$RelFreq <- prop.table(FreqTable$Freq)*100
FreqTable$CumFreq <- cumsum(FreqTable$Freq)
FreqTable

##   coffee Freq RelFreq CumFreq
## 1      C    7      14       7
## 2      H    3       6      10
## 3      M   22      44      32
## 4      P   12      24      44
## 5      V    6      12      50

Mocha (M) has the most data values \((44\%)\)
Hazelnut (H) has the fewest data values \((6\%)\)
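The largest and smallest classes can also be read off programmatically rather than by eye. This is a small sketch under the same data; `counts`, `most` and `fewest` are names chosen here for illustration. Comparing against `max()` and `min()` returns every tied class, which `which.max()` would not.

```r
coffee <- c("V", "C", "P", "P", "M", "M", "P", "P", "M", "C",
            "M", "M", "V", "M", "M", "M", "V", "M", "M", "M",
            "P", "V", "C", "M", "V", "M", "C", "P", "M", "P",
            "M", "M", "M", "P", "M", "M", "C", "V", "M", "C",
            "C", "P", "M", "P", "M", "H", "H", "P", "H", "P")

counts <- table(coffee)                         # frequency of each flavor

most   <- names(counts)[counts == max(counts)]  # class(es) with the most values
fewest <- names(counts)[counts == min(counts)]  # class(es) with the fewest values

most
fewest
round(100 * prop.table(counts), 1)              # relative frequencies in percent
```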

Question Graphical

Refer to Question 18, page 52

Stories in the World’s Tallest Buildings: The number of stories in each of a sample of the world’s 30 tallest buildings follows. Construct a histogram and a box plot.

Solution

stories <- c(88,88, 110, 88, 80, 69, 102, 78, 70, 55,
79, 85, 80, 100, 60, 90, 77, 55, 75, 55,
54, 60, 75, 64, 105, 56, 71, 70, 65, 72)

# Histogram of the number of stories
hs <- hist(stories,
     xlab = "Stories per building",
     ylab = "Frequency",
     col  = "cadetblue4",
     main = "Number of Stories for 30 Tallest Buildings")

# Single horizontal box plot of the same data
bp <- boxplot(stories,
        horizontal = TRUE,
        col  = "blue",
        xlab = "Number of stories",
        main = "Number of Stories for 30 Tallest Buildings")
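The numbers behind both plots can be inspected without drawing anything. The sketch below assumes the default class breaks chosen by `hist()` and the default 1.5 × IQR whisker rule of `boxplot.stats()`; `h`, `classes` and `bs` are illustrative names.

```r
stories <- c(88, 88, 110, 88, 80, 69, 102, 78, 70, 55,
             79, 85, 80, 100, 60, 90, 77, 55, 75, 55,
             54, 60, 75, 64, 105, 56, 71, 70, 65, 72)

# Histogram classes and counts, computed without plotting
h <- hist(stories, plot = FALSE)
classes <- data.frame(lower = head(h$breaks, -1),
                      upper = h$breaks[-1],
                      count = h$counts)
classes

# Values drawn by the box plot: lower whisker, lower hinge, median,
# upper hinge, upper whisker; $out lists points beyond the whiskers
bs <- boxplot.stats(stories)
bs$stats
bs$out
```

For these data the whiskers reach the minimum and maximum, so no building is flagged as an outlier.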

Question CI and Hypothesis (Page 437, Q.24)

A motorist claims that the South Boro Police issue an average of 60 speeding tickets per day. These data show the number of speeding tickets issued each day for a randomly selected period of 30 days. Assume \(\sigma= 13.42\). Is there enough evidence to reject the motorist’s claim at \(\alpha = 0.05\)? Use the P-value method.

Solution

speeding <- c(72, 45, 36, 68, 69, 71, 57, 60,
83, 26, 60, 72, 58, 87, 48, 59,
60, 56, 64, 68, 42, 57, 57,
58, 63, 49, 73, 75, 42, 63)

sigma  <- 13.42
mu     <- 60
n      <- length(speeding)

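# Confidence interval: qnorm(0.05, lower.tail = FALSE) is about 1.645, so this is a 90% interval
# (a 95% interval would use qnorm(0.025, lower.tail = FALSE), about 1.96)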
x_bar <- mean(speeding)
z     <- qnorm(0.05, lower.tail = FALSE)
E     <- z*sigma/sqrt(n)
LL    <- x_bar-E
UL    <- x_bar+E
CL    <- cbind(LL, UL)
CL

##           LL       UL
## [1,] 55.9032 63.96346

# Hypothesis Testing

z_test <- (mean(speeding)-mu)/(sigma/sqrt(n))

p_value <- 2*pnorm(abs(z_test), lower.tail = FALSE)
p_value

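## [1] 0.9782928

Since the P-value (0.978) is much larger than \(\alpha = 0.05\), we fail to reject \(H_0: \mu = 60\). There is not enough evidence to reject the motorist's claim.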
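The test steps can be wrapped into a reusable function. This is a minimal sketch, assuming a two-sided test with known \(\sigma\); `z_test_known_sigma()` and `res` are hypothetical names, not base R functions. The interval it returns uses `qnorm(1 - alpha / 2)`, so for \(\alpha = 0.05\) it is a 95% interval.

```r
speeding <- c(72, 45, 36, 68, 69, 71, 57, 60,
              83, 26, 60, 72, 58, 87, 48, 59,
              60, 56, 64, 68, 42, 57, 57,
              58, 63, 49, 73, 75, 42, 63)

# Hypothetical helper: two-sided one-sample z test with known sigma
z_test_known_sigma <- function(x, mu0, sigma, alpha = 0.05) {
  n      <- length(x)
  se     <- sigma / sqrt(n)                        # standard error of the mean
  z      <- (mean(x) - mu0) / se                   # test statistic
  p      <- 2 * pnorm(abs(z), lower.tail = FALSE)  # two-sided P-value
  z_crit <- qnorm(1 - alpha / 2)                   # two-sided critical value
  list(z       = z,
       p_value = p,
       ci      = mean(x) + c(-1, 1) * z_crit * se, # (1 - alpha) confidence interval
       reject  = p < alpha)                        # TRUE means reject H0
}

res <- z_test_known_sigma(speeding, mu0 = 60, sigma = 13.42)
res$p_value
res$reject
res$ci
```

The P-value decision and the interval agree: the null hypothesis is rejected exactly when 60 falls outside the interval.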

Question Correlation and Regression

Refer to page 547

Is there a linear relationship between the monthly average temperatures and the number of homicides committed during the month?
If so, how strong is the relationship between the average monthly temperature and the number of homicides committed?
If a relationship exists, can it be said that an increase in temperatures will cause an increase in the number of homicides occurring in that city?

Solution

month       <- c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")
temperature <- c(32,38,47,59, 70,80,84,83,76,64,49,37)
homicides   <- c(32,20,35,35, 49,49,53,56,62,29,36,32)
data        <- data.frame(month,temperature, homicides)
attach(data)

## The following objects are masked _by_ .GlobalEnv:
## 
##     homicides, month, temperature

cor(temperature, homicides)

## [1] 0.8357474

cor.test(temperature, homicides)

## 
##  Pearson's product-moment correlation
## 
## data:  temperature and homicides
## t = 4.813, df = 10, p-value = 0.0007097
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5031981 0.9526994
## sample estimates:
##       cor 
## 0.8357474

model <- lm(homicides ~ temperature)


plot(temperature, homicides)

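abline(model)

The sample correlation is \(r = 0.836\), and the test of \(H_0: \rho = 0\) gives a P-value of 0.0007, so there is a significant, strong positive linear relationship between the average monthly temperature and the number of homicides. Correlation alone does not establish causation, so it cannot be said that an increase in temperature will cause an increase in homicides.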
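The fitted line can be examined numerically as well as drawn. The sketch below passes the data frame through the `data` argument instead of using `attach()`, which avoids the masking message; the data frame name `weather` and the temperature value 60 are assumptions chosen only for illustration.

```r
temperature <- c(32, 38, 47, 59, 70, 80, 84, 83, 76, 64, 49, 37)
homicides   <- c(32, 20, 35, 35, 49, 49, 53, 56, 62, 29, 36, 32)
weather     <- data.frame(temperature, homicides)

# Simple linear regression of homicides on temperature
model <- lm(homicides ~ temperature, data = weather)

coef(model)                            # intercept and slope of the fitted line

# In simple regression, R-squared is the square of the correlation coefficient
r_squared <- summary(model)$r.squared
r_squared

# Predicted number of homicides at a hypothetical temperature of 60 degrees
predict(model, newdata = data.frame(temperature = 60))
```

A prediction like the last line is only reasonable inside the observed temperature range (32 to 84).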

Question Probability

A vending machine automatically pours soft drinks into cups. The amount of soft drink dispensed into a cup is normally distributed with mean of 7.6 oz and standard deviation of 0.4 oz.

(a) What is the probability that the machine will overflow an 8 oz cup?
(b) What is the probability that the amount dispensed by the machine is between 7.2 and 8.0 oz?
(c) What is the probability that the average amount dispensed in a random sample of 9 cups is less than 7.4 oz?
(d) Use the normal approximation to the binomial to find the probability that, in a random sample of 19 cups, 5 or more will not overflow an 8 oz cup.

Solution

Let \(X\) = amount of soft drink dispensed into a cup (oz).

\[\mu=7.6,~~ \sigma=0.4\]

(a) What is the probability that the machine will overflow an 8 oz cup?

By Hand:

\[P(X>8)=1-P(X<8)=1-P\left(Z<\frac{8-7.6}{0.4}\right)=1-P(Z<1)=0.1586\]

By R:

a <- pnorm(8, 7.6, 0.4, lower.tail = FALSE) 

cat(" the probability that the machine will overflow an 8 oz cup is: ", a)

##  the probability that the machine will overflow an 8 oz cup is:  0.1586553

(b) What is the probability that the amount dispensed by the machine is between 7.2 and 8.0 oz?

By Hand:

\[P(7.2<X<8.0)=P(X<8)-P(X<7.2)=P\left(Z<\frac{8-7.6}{0.4}\right)-P\left(Z<\frac{7.2-7.6}{0.4}\right)=P(Z<1)-P(Z<-1)=0.6827\]

By R:

b <- pnorm(8, 7.6, 0.4, lower.tail = TRUE)-pnorm(7.2, 7.6, 0.4, lower.tail = TRUE) 

cat(" the probability that the amount dispensed by the machine is between 7.2 and 8.0 oz is: ", b)

##  the probability that the amount dispensed by the machine is between 7.2 and 8.0 oz is:  0.6826895

(c) What is the probability that the average amount dispensed in a random sample of 9 cups is less than 7.4 oz?

By Hand:

\[n=9,~~ \sigma_{\bar{x}}=\frac{\sigma}{\sqrt{n}}=\frac{0.4}{\sqrt{9}}=0.1333 \longrightarrow P(\bar{X}<7.4) = P\left(Z<\frac{7.4-7.6}{0.1333}\right) = P(Z<-1.5) = 0.0668\]

By R:

c <- pnorm(7.4, 7.6, 0.1333, lower.tail = TRUE) 

cat(" the probability the average amount dispensed in a random sample of 9 cups is less than 7.4 oz is: ", c)

##  the probability the average amount dispensed in a random sample of 9 cups is less than 7.4 oz is:  0.06675863
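The result 0.06675863 differs slightly from \(P(Z<-1.5)\) because the standard error was rounded to 0.1333 before being passed to `pnorm()`. A minimal sketch that keeps the standard error unrounded follows; `se` and `p_mean` are illustrative names.

```r
mu    <- 7.6
sigma <- 0.4
n     <- 9

se <- sigma / sqrt(n)                     # standard error of the mean, unrounded

# P(sample mean < 7.4) under the sampling distribution N(mu, se^2)
p_mean <- pnorm(7.4, mean = mu, sd = se)
p_mean
```

Both versions round to 0.0668, so the hand calculation is unaffected.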

(d) Use the normal approximation to the binomial to find the probability that, in a random sample of 19 cups, 5 or more will not overflow an 8 oz cup.

By Hand:

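Let \(Y\) = number of cups (out of 19) that do not overflow, with \(p = P(X \le 8) = 1 - 0.1586 = 0.8414\).

\[n=19,~~ p=0.8414 \longrightarrow \mu=np=15.9866,~~ \sigma = \sqrt{npq}=1.5923\]

Applying the continuity correction, \(P(Y\ge5)\) is approximated by the normal area above 4.5:

\[P(Y\ge5)\approx 1-P\left(Z < \frac{4.5-15.9866}{1.5923}\right)=1-P(Z<-7.21)\approx 1\]

By R:

d <- pnorm(4.5, 15.9866, 1.5923, lower.tail = FALSE)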

cat(" the probability that in a random sample of 19 cups 5 or more will not overflow an 8 oz cup is: ", d)

##  the probability that in a random sample of 19 cups 5 or more will not overflow an 8 oz cup is:  1
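As a cross-check, the normal approximation can be compared with the exact binomial probability. The sketch below is an addition to the original solution; the variable names are illustrative. It also evaluates the common rule of thumb that both \(np\) and \(n(1-p)\) should be at least 5 before the approximation is used.

```r
# Probability that a single cup does not overflow: P(X <= 8)
p_no_overflow <- pnorm(8, mean = 7.6, sd = 0.4)
n <- 19

# Exact binomial: P(Y >= 5) = P(Y > 4)
exact <- pbinom(4, size = n, prob = p_no_overflow, lower.tail = FALSE)

# Normal approximation with continuity correction: area above 4.5
approx_cc <- pnorm(4.5,
                   mean = n * p_no_overflow,
                   sd   = sqrt(n * p_no_overflow * (1 - p_no_overflow)),
                   lower.tail = FALSE)

c(exact = exact, approx = approx_cc)

# Rule-of-thumb check for the approximation
c(np = n * p_no_overflow, nq = n * (1 - p_no_overflow))
```

Here \(n(1-p)\) is about 3, below the usual threshold of 5, so the approximation is borderline for this sample size; in this case both methods still give a probability that is essentially 1.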


Introduction to Statistics (SP2023)

2023-05-07
