The article “Study on the Life Distribution of Microdrills” (J. of Engr. Manufacture, 2002: 301–305) reported the following observations, listed in increasing order, on drill lifetime (number of holes that a drill machines before it breaks) when holes were drilled in a certain brass alloy:
11 14 20 23 31 36 39 44 47 50 59 61 65 67 68 71 74 76 78 79 81 84 85 89 91 93 96 99 101 104 105 105 112 118 123 136 139 141 148 158 161 168 184 206 248 263 289 322 388 513
(a) Why can a frequency distribution not be based on the class intervals 0–50, 50–100, 100–150, and so on?
(b) Construct a frequency distribution and histogram of the data using class boundaries 0, 50, 100, …, and then comment on interesting characteristics.
(c) Construct a frequency distribution and histogram of the natural logarithms of the lifetime observations, and comment on interesting characteristics.
(d) What proportion of the lifetime observations in this sample are less than 100? What proportion of the observations are at least 200?
The accompanying data set consists of observations on shear strength (lb) of ultrasonic spot welds made on a certain type of alclad sheet.
5434 4948 4521 4570 4990 5702 5241 5112 5015 4659 4806 4637 5670 4381 4820 5043 4886 4599 5288 5299 4848 5378 5260 5055 5828 5218 4859 4780 5027 5008 4609 4772 5133 5095 4618 4848 5089 5518 5333 5164 5342 5069 4755 4925 5001 4803 4951 5679 5256 5207 5621 4918 5138 4786 4500 5461 5049 4974 4592 4173 5296 4965 5170 4740 5173 4568 5653 5078 4900 4968 5248 5245 4723 5275 5419 5205 4452 5227 5555 5388 5498 4681 5076 4774 4931 4493 5309 5582 4308 4823 4417 5364 5640 5069 5188 5764 5273 5042 5189 4986
(a) Construct a relative frequency histogram based on ten equal-width classes with boundaries 4000, 4200, …. [The histogram will agree with the one in “Comparison of Properties of Joints Prepared by Ultrasonic Welding and Other Means” (J. of Aircraft, 1983: 552–556).] Comment on its features.
(b) The cumulative frequency and cumulative relative frequency for a particular class interval are the sum of frequencies and relative frequencies, respectively, for that interval and all intervals lying below it. If, for example, there are four intervals with frequencies 9, 16, 13, and 12, then the cumulative frequencies are 9, 25, 38, and 50, and the cumulative relative frequencies are 0.18, 0.50, 0.76, and 1.00. Compute the cumulative frequencies and cumulative relative frequencies for the data.
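The four-interval example in the statement above can be checked with a short R sketch. The vector names `freq`, `cum_freq`, and `cum_rel_freq` are illustrative and not part of the exercise.

```r
# Frequencies of the four class intervals from the example
freq <- c(9, 16, 13, 12)

# Cumulative frequency: running total of the class frequencies
cum_freq <- cumsum(freq)

# Cumulative relative frequency: running total divided by the sample size
cum_rel_freq <- cum_freq / sum(freq)

print(cum_freq)      # 9 25 38 50
print(cum_rel_freq)  # 0.18 0.50 0.76 1.00
```

The last cumulative relative frequency is always 1, which gives a quick check that no class has been dropped.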
The accompanying data set consists of observations on shower-flow rate (L/min) for a sample of \(n=129\) houses in Perth, Australia (“An Application of Bayes Methodology to the Analysis of Diary Records in a Water Use Study”, J. Amer. Stat. Assoc., 1987: 705–711):
4.6 12.3 7.1 7.0 4.0 9.2 6.7 6.9 11.5 5.1 11.2 10.5 14.3 8.0 8.8 6.4 5.1 5.6 9.6 7.5 7.5 6.2 5.8 2.3 3.4 10.4 9.8 6.6 3.7 6.4 8.3 6.5 7.6 9.3 9.2 7.3 5.0 6.3 13.8 6.2 5.4 4.8 7.5 6.0 6.9 10.8 7.5 6.6 5.0 3.3 7.6 3.9 11.9 2.2 15.0 7.2 6.1 15.3 18.9 7.2 5.4 5.5 4.3 9.0 12.7 11.3 7.4 5.0 3.5 8.2 8.4 7.3 10.3 11.9 6.0 5.6 9.5 9.3 10.4 9.7 5.1 6.7 10.2 6.2 8.4 7.0 4.8 5.6 10.5 14.6 10.8 15.5 7.5 6.4 3.4 5.5 6.6 5.9 15.0 9.6 7.8 7.0 6.9 4.1 3.6 11.9 3.7 5.7 6.8 11.3 9.3 9.6 10.4 9.3 6.9 9.8 9.1 10.6 4.5 6.2 8.3 3.2 4.9 5.0 6.0 8.2 6.3 3.8 6.0
(a) Construct a stem-and-leaf display of the data.
(b) What is a typical, or representative, flow rate?
(c) Does the display appear to be highly concentrated or spread out?
(d) Does the distribution of values appear to be reasonably symmetric? If not, how would you describe the departure from symmetry?
(e) Would you describe any observation as being far from the rest of the data (an outlier)?
The accompanying specific gravity values for various wood types used in construction appeared in the article “Bolted Connection Design Values Based on European Yield Model” (J. of Structural Engr., 1993: 2169–2186):
0.31 0.35 0.36 0.36 0.37 0.38 0.40 0.40 0.40 0.41 0.41 0.42 0.42 0.42 0.42 0.42 0.43 0.44 0.45 0.46 0.46 0.47 0.48 0.48 0.48 0.51 0.54 0.54 0.55 0.58 0.62 0.66 0.66 0.67 0.68 0.75
Construct a stem-and-leaf display using repeated stems, and comment on any interesting features of the display.
# Drill lifetime data (number of holes machined before the drill breaks)
data <- c(11, 14, 20, 23, 31, 36, 39, 44, 47, 50,
59, 61, 65, 67, 68, 71, 74, 76, 78, 79,
81, 84, 85, 89, 91, 93, 96, 99, 101, 104,
105, 105, 112, 118, 123, 136, 139, 141, 148, 158,
161, 168, 184, 206, 248, 263, 289, 322, 388, 513)
head(data)
## [1] 11 14 20 23 31 36
# We create a frequency table
breaks <- seq(0, 550, by=50)
data_cut <- cut(data, breaks, right=FALSE) #class intervals
freq_table <- table(data_cut) #frequency table
freq_df <- as.data.frame(freq_table) #create a dataframe for the frequency table
colnames(freq_df) <- c("Class Interval", "Frequency") #name the columns
freq_df$Relative_Frequency <- freq_df$Frequency / length(data) #relative frequency
print(freq_df) #print to view
## Class Interval Frequency Relative_Frequency
## 1 [0,50) 9 0.18
## 2 [50,100) 19 0.38
## 3 [100,150) 11 0.22
## 4 [150,200) 4 0.08
## 5 [200,250) 2 0.04
## 6 [250,300) 2 0.04
## 7 [300,350) 1 0.02
## 8 [350,400) 1 0.02
## 9 [400,450) 0 0.00
## 10 [450,500) 0 0.00
## 11 [500,550) 1 0.02
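The observation 50 lies exactly on a class boundary, which is why intervals such as 0–50 and 50–100 are ambiguous and why the table above relies on `right=FALSE` (classes closed on the left). As a rough illustration, the sketch below compares the two conventions for the first two classes. The names `lifetime`, `on_boundary`, `left_closed`, and `right_closed` are introduced here for illustration only.

```r
# Drill lifetime observations from the exercise
lifetime <- c(11, 14, 20, 23, 31, 36, 39, 44, 47, 50,
              59, 61, 65, 67, 68, 71, 74, 76, 78, 79,
              81, 84, 85, 89, 91, 93, 96, 99, 101, 104,
              105, 105, 112, 118, 123, 136, 139, 141, 148, 158,
              161, 168, 184, 206, 248, 263, 289, 322, 388, 513)
breaks <- seq(0, 550, by = 50)

# Observations that fall exactly on a class boundary
on_boundary <- lifetime[lifetime %in% breaks]
print(on_boundary)

# Left-closed classes [0,50), [50,100), ...: 50 is counted in the second class
left_closed <- table(cut(lifetime, breaks, right = FALSE))

# Right-closed classes (0,50], (50,100], ...: 50 is counted in the first class
right_closed <- table(cut(lifetime, breaks, right = TRUE))

print(left_closed[1:2])
print(right_closed[1:2])
```

Either convention gives a valid frequency distribution as long as it is stated and applied consistently; only the first two class counts change here.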
# Plot histogram
hist(data, breaks=breaks, right=FALSE, col="skyblue",
main="Histogram of Drill Lifetime Observations",
xlab="Drill Lifetime (number of holes)",
ylab="Frequency")
# Natural logarithm of the observations
log_data <- log(data)
round(head(log_data), 2) # rounding to two decimal places
## [1] 2.40 2.64 3.00 3.14 3.43 3.58
# We create a frequency table for the log-transformed data
log_breaks <- seq(2.25, 6.25, by=0.5)
log_data_cut <- cut(log_data, log_breaks, right=FALSE) # class intervals
log_freq_table <- table(log_data_cut) # frequency table
log_freq_df <- as.data.frame(log_freq_table) # create a dataframe for the frequency table
colnames(log_freq_df) <- c("Class Interval (ln)", "Frequency") # name the columns
log_freq_df$Relative_Frequency <- log_freq_df$Frequency / length(log_data) # relative frequency
print(log_freq_df) # print to view
## Class Interval (ln) Frequency Relative_Frequency
## 1 [2.25,2.75) 2 0.04
## 2 [2.75,3.25) 2 0.04
## 3 [3.25,3.75) 3 0.06
## 4 [3.75,4.25) 8 0.16
## 5 [4.25,4.75) 18 0.36
## 6 [4.75,5.25) 10 0.20
## 7 [5.25,5.75) 4 0.08
## 8 [5.75,6.25) 3 0.06
# Plot histogram of the log-transformed data
hist(log_data, breaks=log_breaks, right=FALSE, col="skyblue",
main="Histogram of Natural Logarithms of Drill Lifetime Observations",
xlab="Natural Logarithm of Drill Lifetime",
ylab="Frequency")
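The final part of the drill-lifetime exercise asks for two sample proportions, which the code above does not compute directly. A minimal sketch, assuming the same 50 observations (the names `lifetime`, `prop_below_100`, and `prop_at_least_200` are illustrative):

```r
# Drill lifetime observations from the exercise
lifetime <- c(11, 14, 20, 23, 31, 36, 39, 44, 47, 50,
              59, 61, 65, 67, 68, 71, 74, 76, 78, 79,
              81, 84, 85, 89, 91, 93, 96, 99, 101, 104,
              105, 105, 112, 118, 123, 136, 139, 141, 148, 158,
              161, 168, 184, 206, 248, 263, 289, 322, 388, 513)

# The mean of a logical vector is the proportion of TRUE values
prop_below_100 <- mean(lifetime < 100)
prop_at_least_200 <- mean(lifetime >= 200)

print(prop_below_100)     # 0.56
print(prop_at_least_200)  # 0.14
```

These values agree with the relative frequencies in the first frequency table: 0.18 + 0.38 = 0.56 for the classes below 100, and 0.04 + 0.04 + 0.02 + 0.02 + 0.02 = 0.14 for the classes from 200 upward.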
# Shear strength data (lb) for the ultrasonic spot welds
data <- c(5434, 4948, 4521, 4570, 4990, 5702, 5241,
5112, 5015, 4659, 4806, 4637, 5670, 4381,
4820, 5043, 4886, 4599, 5288, 5299, 4848,
5378, 5260, 5055, 5828, 5218, 4859, 4780,
5027, 5008, 4609, 4772, 5133, 5095, 4618,
4848, 5089, 5518, 5333, 5164, 5342, 5069,
4755, 4925, 5001, 4803, 4951, 5679, 5256,
5207, 5621, 4918, 5138, 4786, 4500, 5461,
5049, 4974, 4592, 4173, 5296, 4965, 5170,
4740, 5173, 4568, 5653, 5078, 4900, 4968,
5248, 5245, 4723, 5275, 5419, 5205, 4452,
5227, 5555, 5388, 5498, 4681, 5076, 4774,
4931, 4493, 5309, 5582, 4308, 4823, 4417,
5364, 5640, 5069, 5188, 5764, 5273, 5042,
5189, 4986)
head(data)
## [1] 5434 4948 4521 4570 4990 5702
# Parts (a) and (b) combined: frequencies, relative frequencies, and their cumulative versions
breaks <- seq(4000, 6000, by=200)
hist_data <- hist(data, breaks=breaks, right=FALSE, plot=FALSE)
relative_freq <- hist_data$counts / length(data)
cumulative_freq <- cumsum(hist_data$counts)
cumulative_relative_freq <- cumsum(relative_freq)
freq_table <- data.frame(
"Class Interval" = hist_data$breaks[-length(hist_data$breaks)],
"Frequency" = hist_data$counts,
"Relative Frequency" = relative_freq,
"Cumulative Frequency" = cumulative_freq,
"Cumulative Relative Frequency" = cumulative_relative_freq
)
print(freq_table)
## Class.Interval Frequency Relative.Frequency Cumulative.Frequency
## 1 4000 1 0.01 1
## 2 4200 2 0.02 3
## 3 4400 9 0.09 12
## 4 4600 12 0.12 24
## 5 4800 19 0.19 43
## 6 5000 22 0.22 65
## 7 5200 20 0.20 85
## 8 5400 7 0.07 92
## 9 5600 7 0.07 99
## 10 5800 1 0.01 100
## Cumulative.Relative.Frequency
## 1 0.01
## 2 0.03
## 3 0.12
## 4 0.24
## 5 0.43
## 6 0.65
## 7 0.85
## 8 0.92
## 9 0.99
## 10 1.00
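# hist(..., freq=FALSE) plots densities (relative frequency / class width), not
# relative frequencies, so store the relative frequencies in a copy of hist_data
rel_hist <- hist_data
rel_hist$counts <- relative_freq
plot(rel_hist, freq=TRUE, col="skyblue",
main="Relative Frequency Histogram of Shear Strength",
xlab="Shear Strength (lb)", ylab="Relative Frequency")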
# Shower-flow rate data (L/min) for the n = 129 houses
data <- c(4.6, 12.3, 7.1, 7.0, 4.0, 9.2, 6.7, 6.9, 11.5, 5.1,
11.2, 10.5, 14.3, 8.0, 8.8, 6.4, 5.1, 5.6, 9.6, 7.5,
7.5, 6.2, 5.8, 2.3, 3.4, 10.4, 9.8, 6.6, 3.7, 6.4,
8.3, 6.5, 7.6, 9.3, 9.2, 7.3, 5.0, 6.3, 13.8, 6.2,
5.4, 4.8, 7.5, 6.0, 6.9, 10.8, 7.5, 6.6, 5.0, 3.3,
7.6, 3.9, 11.9, 2.2, 15.0, 7.2, 6.1, 15.3, 18.9, 7.2,
5.4, 5.5, 4.3, 9.0, 12.7, 11.3, 7.4, 5.0, 3.5, 8.2,
8.4, 7.3, 10.3, 11.9, 6.0, 5.6, 9.5, 9.3, 10.4, 9.7,
5.1, 6.7, 10.2, 6.2, 8.4, 7.0, 4.8, 5.6, 10.5, 14.6,
10.8, 15.5, 7.5, 6.4, 3.4, 5.5, 6.6, 5.9, 15.0, 9.6,
7.8, 7.0, 6.9, 4.1, 3.6, 11.9, 3.7, 5.7, 6.8, 11.3,
9.3, 9.6, 10.4, 9.3, 6.9, 9.8, 9.1, 10.6, 4.5, 6.2,
8.3, 3.2, 4.9, 5.0, 6.0, 8.2, 6.3, 3.8, 6.0)
head(data)
## [1] 4.6 12.3 7.1 7.0 4.0 9.2
# Part (a): Construct a stem-and-leaf display
stem(data)
##
## The decimal point is at the |
##
## 2 | 23
## 3 | 2344567789
## 4 | 01356889
## 5 | 00001114455666789
## 6 | 0000122223344456667789999
## 7 | 00012233455555668
## 8 | 02233448
## 9 | 012233335666788
## 10 | 2344455688
## 11 | 2335999
## 12 | 37
## 13 | 8
## 14 | 36
## 15 | 0035
## 16 |
## 17 |
## 18 | 9
# Part (b): Use the median as a typical (representative) flow rate
typical_flow_rate <- median(data)
cat("\nTypical (Representative) Flow Rate:", typical_flow_rate, "\n")
##
## Typical (Representative) Flow Rate: 7
# Part (c): Determine if the display appears to be highly concentrated or spread out
spread <- sd(data)
cat("\nSpread (Standard Deviation):", spread, "\n")
##
## Spread (Standard Deviation): 3.076844
# Part (d): Determine if the distribution of values appears to be reasonably symmetric
# (requires the e1071 package; install it first with install.packages("e1071"))
skewness <- e1071::skewness(data)
cat("\nSkewness:", skewness, "\n")
##
## Skewness: 0.8647805
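The call above depends on the e1071 package. If that package is not available, a moment-based skewness coefficient can be sketched in base R. The version below divides the third central moment by the cube of the sample standard deviation, which appears to correspond to the default (`type = 3`) definition in `e1071::skewness`; treat that correspondence as an assumption rather than a guarantee. The names `flow` and `skew_coef` are illustrative.

```r
# Shower-flow rate observations (L/min) from the exercise
flow <- c(4.6, 12.3, 7.1, 7.0, 4.0, 9.2, 6.7, 6.9, 11.5, 5.1,
          11.2, 10.5, 14.3, 8.0, 8.8, 6.4, 5.1, 5.6, 9.6, 7.5,
          7.5, 6.2, 5.8, 2.3, 3.4, 10.4, 9.8, 6.6, 3.7, 6.4,
          8.3, 6.5, 7.6, 9.3, 9.2, 7.3, 5.0, 6.3, 13.8, 6.2,
          5.4, 4.8, 7.5, 6.0, 6.9, 10.8, 7.5, 6.6, 5.0, 3.3,
          7.6, 3.9, 11.9, 2.2, 15.0, 7.2, 6.1, 15.3, 18.9, 7.2,
          5.4, 5.5, 4.3, 9.0, 12.7, 11.3, 7.4, 5.0, 3.5, 8.2,
          8.4, 7.3, 10.3, 11.9, 6.0, 5.6, 9.5, 9.3, 10.4, 9.7,
          5.1, 6.7, 10.2, 6.2, 8.4, 7.0, 4.8, 5.6, 10.5, 14.6,
          10.8, 15.5, 7.5, 6.4, 3.4, 5.5, 6.6, 5.9, 15.0, 9.6,
          7.8, 7.0, 6.9, 4.1, 3.6, 11.9, 3.7, 5.7, 6.8, 11.3,
          9.3, 9.6, 10.4, 9.3, 6.9, 9.8, 9.1, 10.6, 4.5, 6.2,
          8.3, 3.2, 4.9, 5.0, 6.0, 8.2, 6.3, 3.8, 6.0)

# Third central moment (average cubed deviation from the mean)
m3 <- mean((flow - mean(flow))^3)

# Skewness coefficient: third central moment over the cubed sample sd
skew_coef <- m3 / sd(flow)^3
print(skew_coef)

# A mean above the median is another sign of a longer upper tail
print(c(mean = mean(flow), median = median(flow)))
```

A positive coefficient indicates a longer upper tail (positive skew), which is consistent with the shape of the stem-and-leaf display.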
# Part (e): Identify any observation that is far from the rest of the data (an outlier)
Q1 <- quantile(data, 0.25)
Q3 <- quantile(data, 0.75)
iqr <- Q3 - Q1 # lower-case name avoids masking the base function IQR()
lower_bound <- Q1 - 1.5 * iqr
upper_bound <- Q3 + 1.5 * iqr
outliers <- data[data < lower_bound | data > upper_bound]
cat("\nOutliers:", outliers, "\n")
##
## Outliers: 18.9
# Specific gravity data for the wood types
data <- c(0.31, 0.35, 0.36, 0.36, 0.37, 0.38, 0.40, 0.40, 0.40,
0.41, 0.41, 0.42, 0.42, 0.42, 0.42, 0.42, 0.43, 0.44,
0.45, 0.46, 0.46, 0.47, 0.48, 0.48, 0.48, 0.51, 0.54,
0.54, 0.55, 0.58, 0.62, 0.66, 0.66, 0.67, 0.68, 0.75)
# Construct stem-and-leaf display using repeated stems
stem(data, scale = 1)
##
## The decimal point is 1 digit(s) to the left of the |
##
## 3 | 1
## 3 | 56678
## 4 | 000112222234
## 4 | 5667888
## 5 | 144
## 5 | 58
## 6 | 2
## 6 | 6678
## 7 |
## 7 | 5