Assignment One Solution

Problem 1

The article “Study on the Life Distribution of Microdrills” (J. of Engr. Manufacture, 2002: 301–305) reported the following observations, listed in increasing order, on drill lifetime (number of holes that a drill machines before it breaks) when holes were drilled in a certain brass alloy:

11 14 20 23 31 36 39 44 47 50 59 61 65 67 68 71 74 76 78 79 81 84 85 89 91 93 96 99 101 104 105 105 112 118 123 136 139 141 148 158 161 168 184 206 248 263 289 322 388 513

(a) Why can a frequency distribution not be based on the class intervals 0–50, 50–100, 100–150, and so on?
(b) Construct a frequency distribution and histogram of the data using class boundaries 0, 50, 100, …, and then comment on interesting characteristics.
(c) Construct a frequency distribution and histogram of the natural logarithms of the lifetime observations, and comment on interesting characteristics.
(d) What proportion of the lifetime observations in this sample are less than 100? What proportion of the observations are at least 200?

Problem 2

The accompanying data set consists of observations on shear strength (lb) of ultrasonic spot welds made on a certain type of alclad sheet.

5434 4948 4521 4570 4990 5702 5241 5112 5015 4659 4806 4637 5670 4381 4820 5043 4886 4599 5288 5299 4848 5378 5260 5055 5828 5218 4859 4780 5027 5008 4609 4772 5133 5095 4618 4848 5089 5518 5333 5164 5342 5069 4755 4925 5001 4803 4951 5679 5256 5207 5621 4918 5138 4786 4500 5461 5049 4974 4592 4173 5296 4965 5170 4740 5173 4568 5653 5078 4900 4968 5248 5245 4723 5275 5419 5205 4452 5227 5555 5388 5498 4681 5076 4774 4931 4493 5309 5582 4308 4823 4417 5364 5640 5069 5188 5764 5273 5042 5189 4986

(a) Construct a relative frequency histogram based on ten equal-width classes with boundaries 4000, 4200, …. [The histogram will agree with the one in “Comparison of Properties of Joints Prepared by Ultrasonic Welding and Other Means” (J. of Aircraft, 1983: 552–556).] Comment on its features.
(b) The cumulative frequency and cumulative relative frequency for a particular class interval are the sum of frequencies and relative frequencies, respectively, for that interval and all intervals lying below it. If, for example, there are four intervals with frequencies 9, 16, 13, and 12, then the cumulative frequencies are 9, 25, 38, and 50, and the cumulative relative frequencies are 0.18, 0.50, 0.76, and 1.00. Compute the cumulative frequencies and cumulative relative frequencies for the data.
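The definition above is a running sum. The minimal R sketch below uses only the four example frequencies from the problem statement, not the weld data. It shows that `cumsum()` reproduces both sequences quoted in the text:

```r
# Example frequencies from the problem statement (four intervals, n = 50)
freq <- c(9, 16, 13, 12)
cum_freq <- cumsum(freq)               # running total of the frequencies
cum_rel_freq <- cum_freq / sum(freq)   # running total divided by n
print(data.frame(freq, cum_freq, cum_rel_freq))
```

The same `cumsum()` call is what produces the cumulative columns in the Problem Two solution.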

Problem 3

The accompanying data set consists of observations on shower-flow rate (L/min) for a sample of \(n=129\) houses in Perth, Australia (“An Application of Bayes Methodology to the Analysis of Diary Records in a Water Use Study”, J. Amer. Stat. Assoc., 1987: 705–711):

4.6 12.3 7.1 7.0 4.0 9.2 6.7 6.9 11.5 5.1 11.2 10.5 14.3 8.0 8.8 6.4 5.1 5.6 9.6 7.5 7.5 6.2 5.8 2.3 3.4 10.4 9.8 6.6 3.7 6.4 8.3 6.5 7.6 9.3 9.2 7.3 5.0 6.3 13.8 6.2 5.4 4.8 7.5 6.0 6.9 10.8 7.5 6.6 5.0 3.3 7.6 3.9 11.9 2.2 15.0 7.2 6.1 15.3 18.9 7.2 5.4 5.5 4.3 9.0 12.7 11.3 7.4 5.0 3.5 8.2 8.4 7.3 10.3 11.9 6.0 5.6 9.5 9.3 10.4 9.7 5.1 6.7 10.2 6.2 8.4 7.0 4.8 5.6 10.5 14.6 10.8 15.5 7.5 6.4 3.4 5.5 6.6 5.9 15.0 9.6 7.8 7.0 6.9 4.1 3.6 11.9 3.7 5.7 6.8 11.3 9.3 9.6 10.4 9.3 6.9 9.8 9.1 10.6 4.5 6.2 8.3 3.2 4.9 5.0 6.0 8.2 6.3 3.8 6.0

(a) Construct a stem-and-leaf display of the data.
(b) What is a typical, or representative, flow rate?
(c) Does the display appear to be highly concentrated or spread out?
(d) Does the distribution of values appear to be reasonably symmetric? If not, how would you describe the departure from symmetry?
(e) Would you describe any observation as being far from the rest of the data (an outlier)?

Problem 4

The accompanying specific gravity values for various wood types used in construction appeared in the article “Bolted Connection Design Values Based on European Yield Model” (J. of Structural Engr., 1993: 2169–2186):

0.31 0.35 0.36 0.36 0.37 0.38 0.40 0.40 0.40 0.41 0.41 0.42 0.42 0.42 0.42 0.42 0.43 0.44 0.45 0.46 0.46 0.47 0.48 0.48 0.48 0.51 0.54 0.54 0.55 0.58 0.62 0.66 0.66 0.67 0.68 0.75

Construct a stem-and-leaf display using repeated stems, and comment on any interesting features of the display.

Problem One

# The given data
data <- c(11, 14, 20, 23, 31, 36, 39, 44, 47, 50,
          59, 61, 65, 67, 68, 71, 74, 76, 78, 79,
          81, 84, 85, 89, 91, 93, 96, 99, 101, 104,
          105, 105, 112, 118, 123, 136, 139, 141, 148, 158,
          161, 168, 184, 206, 248, 263, 289, 322, 388, 513)

head(data)

## [1] 11 14 20 23 31 36

Frequency Table

# We create a frequency table
breaks <- seq(0, 550, by=50)
data_cut <- cut(data, breaks, right=FALSE) #class intervals
freq_table <- table(data_cut) #frequency table
freq_df <- as.data.frame(freq_table) #create a dataframe for the frequency table
colnames(freq_df) <- c("Class Interval", "Frequency") #name the columns
freq_df$Relative_Frequency <- freq_df$Frequency / length(data) #relative frequency
print(freq_df) #print to view

##    Class Interval Frequency Relative_Frequency
## 1          [0,50)         9               0.18
## 2        [50,100)        19               0.38
## 3       [100,150)        11               0.22
## 4       [150,200)         4               0.08
## 5       [200,250)         2               0.04
## 6       [250,300)         2               0.04
## 7       [300,350)         1               0.02
## 8       [350,400)         1               0.02
## 9       [400,450)         0               0.00
## 10      [450,500)         0               0.00
## 11      [500,550)         1               0.02
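The code above does not answer part (a) explicitly. The classes 0–50 and 50–100 share the endpoint 50, so an observation equal to 50 would belong to two classes at once. This sample contains one such value. The sketch below works on a local copy named `lifetime`, which is our name and not the original's. It finds the values that sit on a boundary. It then confirms that the half-open classes [0,50), [50,100), … produced by `cut(..., right = FALSE)` put every observation in exactly one class:

```r
# Drill lifetimes from Problem 1 (same values as `data` above)
lifetime <- c(11, 14, 20, 23, 31, 36, 39, 44, 47, 50,
              59, 61, 65, 67, 68, 71, 74, 76, 78, 79,
              81, 84, 85, 89, 91, 93, 96, 99, 101, 104,
              105, 105, 112, 118, 123, 136, 139, 141, 148, 158,
              161, 168, 184, 206, 248, 263, 289, 322, 388, 513)
# Observations lying exactly on a class boundary (a multiple of 50)
on_boundary <- lifetime[lifetime %% 50 == 0]
print(on_boundary)
# Half-open classes [a, b): each value is counted once, so the counts sum to n
counts <- table(cut(lifetime, seq(0, 550, by = 50), right = FALSE))
print(sum(counts) == length(lifetime))
```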

Histogram

# Plot histogram
hist(data, breaks=breaks, right=FALSE, col="skyblue", 
     main="Histogram of Drill Lifetime Observations",
     xlab="Drill Lifetime (number of holes)",
     ylab="Frequency")
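The frequency table shows a strongly right-skewed distribution. Most lifetimes (39 of 50) are below 150, while a few drills lasted much longer, up to 513 holes. One quick numerical check of that impression is sketched below. It again uses a local copy named `lifetime` (our name) and shows that the long upper tail pulls the mean above the median:

```r
# Drill lifetimes from Problem 1 (same values as `data` above)
lifetime <- c(11, 14, 20, 23, 31, 36, 39, 44, 47, 50,
              59, 61, 65, 67, 68, 71, 74, 76, 78, 79,
              81, 84, 85, 89, 91, 93, 96, 99, 101, 104,
              105, 105, 112, 118, 123, 136, 139, 141, 148, 158,
              161, 168, 184, 206, 248, 263, 289, 322, 388, 513)
# Mean above median is a typical symptom of right (positive) skew
center <- c(mean = mean(lifetime), median = median(lifetime))
print(center)   # mean 119.26, median 92
```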

Natural log transformation

# Natural logarithm of the observations
log_data <- log(data)
round(head(log_data), 2) # rounding to two decimal places

## [1] 2.40 2.64 3.00 3.14 3.43 3.58

Frequency Table

# We create a frequency table for the log-transformed data
log_breaks <- seq(2.25, 6.25, by=0.5)
log_data_cut <- cut(log_data, log_breaks, right=FALSE) # class intervals
log_freq_table <- table(log_data_cut) # frequency table
log_freq_df <- as.data.frame(log_freq_table) # create a dataframe for the frequency table
colnames(log_freq_df) <- c("Class Interval (ln)", "Frequency") # name the columns
log_freq_df$Relative_Frequency <- log_freq_df$Frequency / length(log_data) # relative frequency
print(log_freq_df) # print to view

##   Class Interval (ln) Frequency Relative_Frequency
## 1         [2.25,2.75)         2               0.04
## 2         [2.75,3.25)         2               0.04
## 3         [3.25,3.75)         3               0.06
## 4         [3.75,4.25)         8               0.16
## 5         [4.25,4.75)        18               0.36
## 6         [4.75,5.25)        10               0.20
## 7         [5.25,5.75)         4               0.08
## 8         [5.75,6.25)         3               0.06

Histogram of the log-transformed data

# Plot histogram of the log-transformed data
hist(log_data, breaks=log_breaks, right=FALSE, col="skyblue", 
     main="Histogram of Natural Logarithms of Drill Lifetime Observations",
     xlab="Natural Logarithm of Drill Lifetime",
     ylab="Frequency")
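Part (d) can be read from the first frequency table. There are 9 + 19 = 28 observations below 100 and 2 + 2 + 1 + 1 + 1 = 7 at or above 200. In R the same answer comes straight from the data, because the mean of a logical vector is the proportion of `TRUE` values. The local name `lifetime` is ours:

```r
# Drill lifetimes from Problem 1 (same values as `data` above)
lifetime <- c(11, 14, 20, 23, 31, 36, 39, 44, 47, 50,
              59, 61, 65, 67, 68, 71, 74, 76, 78, 79,
              81, 84, 85, 89, 91, 93, 96, 99, 101, 104,
              105, 105, 112, 118, 123, 136, 139, 141, 148, 158,
              161, 168, 184, 206, 248, 263, 289, 322, 388, 513)
prop_below_100 <- mean(lifetime < 100)       # 28 of 50 observations
prop_at_least_200 <- mean(lifetime >= 200)   # 7 of 50 observations
cat("Proportion < 100:", prop_below_100, "\n")
cat("Proportion >= 200:", prop_at_least_200, "\n")
```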

Problem Two

# The given data
data <- c(5434, 4948, 4521, 4570, 4990, 5702, 5241,
          5112, 5015, 4659, 4806, 4637, 5670, 4381,
          4820, 5043, 4886, 4599, 5288, 5299, 4848,
          5378, 5260, 5055, 5828, 5218, 4859, 4780,
          5027, 5008, 4609, 4772, 5133, 5095, 4618,
          4848, 5089, 5518, 5333, 5164, 5342, 5069,
          4755, 4925, 5001, 4803, 4951, 5679, 5256,
          5207, 5621, 4918, 5138, 4786, 4500, 5461,
          5049, 4974, 4592, 4173, 5296, 4965, 5170,
          4740, 5173, 4568, 5653, 5078, 4900, 4968,
          5248, 5245, 4723, 5275, 5419, 5205, 4452,
          5227, 5555, 5388, 5498, 4681, 5076, 4774,
          4931, 4493, 5309, 5582, 4308, 4823, 4417,
          5364, 5640, 5069, 5188, 5764, 5273, 5042,
          5189, 4986)

head(data)

## [1] 5434 4948 4521 4570 4990 5702

Frequency Table

# Parts (a) and (b) combined: compute frequencies, relative frequencies,
# cumulative frequencies, and cumulative relative frequencies
breaks <- seq(4000, 6000, by=200)
hist_data <- hist(data, breaks=breaks, right=FALSE, plot=FALSE)
relative_freq <- hist_data$counts / length(data)
cumulative_freq <- cumsum(hist_data$counts)
cumulative_relative_freq <- cumsum(relative_freq)
freq_table <- data.frame(
  "Class Interval" = hist_data$breaks[-length(hist_data$breaks)], # lower boundary of each [a, b) class
  "Frequency" = hist_data$counts,
  "Relative Frequency" = relative_freq,
  "Cumulative Frequency" = cumulative_freq,
  "Cumulative Relative Frequency" = cumulative_relative_freq
)
print(freq_table)

##    Class.Interval Frequency Relative.Frequency Cumulative.Frequency
## 1            4000         1               0.01                    1
## 2            4200         2               0.02                    3
## 3            4400         9               0.09                   12
## 4            4600        12               0.12                   24
## 5            4800        19               0.19                   43
## 6            5000        22               0.22                   65
## 7            5200        20               0.20                   85
## 8            5400         7               0.07                   92
## 9            5600         7               0.07                   99
## 10           5800         1               0.01                  100
##    Cumulative.Relative.Frequency
## 1                           0.01
## 2                           0.03
## 3                           0.12
## 4                           0.24
## 5                           0.43
## 6                           0.65
## 7                           0.85
## 8                           0.92
## 9                           0.99
## 10                          1.00
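A cumulative relative frequency is the proportion of observations below that class's upper boundary. For example, the 0.43 in row 5 (class [4800, 5000)) means that 43% of the welds have shear strength under 5000 lb. The check below compares this against the raw data, using a local vector named `strength` (our name, not the original's):

```r
# Shear strengths (lb) from Problem 2 (same values as `data` above)
strength <- c(5434, 4948, 4521, 4570, 4990, 5702, 5241,
              5112, 5015, 4659, 4806, 4637, 5670, 4381,
              4820, 5043, 4886, 4599, 5288, 5299, 4848,
              5378, 5260, 5055, 5828, 5218, 4859, 4780,
              5027, 5008, 4609, 4772, 5133, 5095, 4618,
              4848, 5089, 5518, 5333, 5164, 5342, 5069,
              4755, 4925, 5001, 4803, 4951, 5679, 5256,
              5207, 5621, 4918, 5138, 4786, 4500, 5461,
              5049, 4974, 4592, 4173, 5296, 4965, 5170,
              4740, 5173, 4568, 5653, 5078, 4900, 4968,
              5248, 5245, 4723, 5275, 5419, 5205, 4452,
              5227, 5555, 5388, 5498, 4681, 5076, 4774,
              4931, 4493, 5309, 5582, 4308, 4823, 4417,
              5364, 5640, 5069, 5188, 5764, 5273, 5042,
              5189, 4986)
# Proportion of welds below the upper boundary of the [4800, 5000) class
below_5000 <- mean(strength < 5000)
cat("Proportion below 5000 lb:", below_5000, "\n")
```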

Relative Frequency Histogram

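# freq=FALSE would plot densities (relative frequency / class width of 200),
# so plot the relative frequencies themselves by rescaling the counts
rf_hist <- hist(data, breaks=breaks, right=FALSE, plot=FALSE)
rf_hist$counts <- rf_hist$counts / length(data)
plot(rf_hist, col="skyblue", main="Relative Frequency Histogram of Shear Strength",
     xlab="Shear Strength (lb)", ylab="Relative Frequency")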
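Two quick summaries help when commenting on the histogram's features: the modal class and the median. The sketch below again uses a local copy named `strength` (our name). It shows that the tallest bar is the class [5000, 5200) and that the median, 5052 lb, falls inside that class. The bars on either side fall off fairly evenly, which is consistent with a unimodal, roughly symmetric shape:

```r
# Shear strengths (lb) from Problem 2 (same values as `data` above)
strength <- c(5434, 4948, 4521, 4570, 4990, 5702, 5241,
              5112, 5015, 4659, 4806, 4637, 5670, 4381,
              4820, 5043, 4886, 4599, 5288, 5299, 4848,
              5378, 5260, 5055, 5828, 5218, 4859, 4780,
              5027, 5008, 4609, 4772, 5133, 5095, 4618,
              4848, 5089, 5518, 5333, 5164, 5342, 5069,
              4755, 4925, 5001, 4803, 4951, 5679, 5256,
              5207, 5621, 4918, 5138, 4786, 4500, 5461,
              5049, 4974, 4592, 4173, 5296, 4965, 5170,
              4740, 5173, 4568, 5653, 5078, 4900, 4968,
              5248, 5245, 4723, 5275, 5419, 5205, 4452,
              5227, 5555, 5388, 5498, 4681, 5076, 4774,
              4931, 4493, 5309, 5582, 4308, 4823, 4417,
              5364, 5640, 5069, 5188, 5764, 5273, 5042,
              5189, 4986)
breaks <- seq(4000, 6000, by = 200)
h <- hist(strength, breaks = breaks, right = FALSE, plot = FALSE)
modal_class <- h$breaks[which.max(h$counts)]   # lower boundary of the tallest bar
cat("Modal class: [", modal_class, ",", modal_class + 200, ")\n")
cat("Median:", median(strength), "\n")
```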

Problem Three

# The given data
data <- c(4.6, 12.3, 7.1, 7.0, 4.0, 9.2, 6.7, 6.9, 11.5, 5.1,
          11.2, 10.5, 14.3, 8.0, 8.8, 6.4, 5.1, 5.6, 9.6, 7.5,
          7.5, 6.2, 5.8, 2.3, 3.4, 10.4, 9.8, 6.6, 3.7, 6.4,
          8.3, 6.5, 7.6, 9.3, 9.2, 7.3, 5.0, 6.3, 13.8, 6.2,
          5.4, 4.8, 7.5, 6.0, 6.9, 10.8, 7.5, 6.6, 5.0, 3.3,
          7.6, 3.9, 11.9, 2.2, 15.0, 7.2, 6.1, 15.3, 18.9, 7.2,
          5.4, 5.5, 4.3, 9.0, 12.7, 11.3, 7.4, 5.0, 3.5, 8.2,
          8.4, 7.3, 10.3, 11.9, 6.0, 5.6, 9.5, 9.3, 10.4, 9.7,
          5.1, 6.7, 10.2, 6.2, 8.4, 7.0, 4.8, 5.6, 10.5, 14.6,
          10.8, 15.5, 7.5, 6.4, 3.4, 5.5, 6.6, 5.9, 15.0, 9.6,
          7.8, 7.0, 6.9, 4.1, 3.6, 11.9, 3.7, 5.7, 6.8, 11.3,
          9.3, 9.6, 10.4, 9.3, 6.9, 9.8, 9.1, 10.6, 4.5, 6.2,
          8.3, 3.2, 4.9, 5.0, 6.0, 8.2, 6.3, 3.8, 6.0)
head(data)

## [1]  4.6 12.3  7.1  7.0  4.0  9.2

# Part (a): Construct a stem-and-leaf display
stem(data)

## 
##   The decimal point is at the |
## 
##    2 | 23
##    3 | 2344567789
##    4 | 01356889
##    5 | 00001114455666789
##    6 | 0000122223344456667789999
##    7 | 00012233455555668
##    8 | 02233448
##    9 | 012233335666788
##   10 | 2344455688
##   11 | 2335999
##   12 | 37
##   13 | 8
##   14 | 36
##   15 | 0035
##   16 | 
##   17 | 
##   18 | 9

# Part (b): Calculate the typical (representative) flow rate
typical_flow_rate <- median(data)
cat("\nTypical (Representative) Flow Rate:", typical_flow_rate, "\n")

## 
## Typical (Representative) Flow Rate: 7

# Part (c): Determine if the display appears to be highly concentrated or spread out
spread <- sd(data)
cat("\nSpread (Standard Deviation):", spread, "\n")

## 
## Spread (Standard Deviation): 3.076844
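A standard deviation on its own is hard to judge, so a five-number summary gives a more direct picture of the spread. The sketch below uses base R's `fivenum()`, which reports Tukey's hinges. For these 129 values the hinges coincide with the quartiles used in part (e). The middle half of the houses lie between 5.6 and 9.6 L/min, while the full range runs from 2.2 to 18.9 L/min. That suggests a fair amount of spread rather than a tight concentration. The local name `flow` is ours:

```r
# Shower-flow rates (L/min) from Problem 3 (same values as `data` above)
flow <- c(4.6, 12.3, 7.1, 7.0, 4.0, 9.2, 6.7, 6.9, 11.5, 5.1,
          11.2, 10.5, 14.3, 8.0, 8.8, 6.4, 5.1, 5.6, 9.6, 7.5,
          7.5, 6.2, 5.8, 2.3, 3.4, 10.4, 9.8, 6.6, 3.7, 6.4,
          8.3, 6.5, 7.6, 9.3, 9.2, 7.3, 5.0, 6.3, 13.8, 6.2,
          5.4, 4.8, 7.5, 6.0, 6.9, 10.8, 7.5, 6.6, 5.0, 3.3,
          7.6, 3.9, 11.9, 2.2, 15.0, 7.2, 6.1, 15.3, 18.9, 7.2,
          5.4, 5.5, 4.3, 9.0, 12.7, 11.3, 7.4, 5.0, 3.5, 8.2,
          8.4, 7.3, 10.3, 11.9, 6.0, 5.6, 9.5, 9.3, 10.4, 9.7,
          5.1, 6.7, 10.2, 6.2, 8.4, 7.0, 4.8, 5.6, 10.5, 14.6,
          10.8, 15.5, 7.5, 6.4, 3.4, 5.5, 6.6, 5.9, 15.0, 9.6,
          7.8, 7.0, 6.9, 4.1, 3.6, 11.9, 3.7, 5.7, 6.8, 11.3,
          9.3, 9.6, 10.4, 9.3, 6.9, 9.8, 9.1, 10.6, 4.5, 6.2,
          8.3, 3.2, 4.9, 5.0, 6.0, 8.2, 6.3, 3.8, 6.0)
# Minimum, lower hinge, median, upper hinge, maximum
five_num <- fivenum(flow)
print(five_num)
```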

# Part (d): Assess symmetry with the sample skewness (requires the e1071 package)
skew <- e1071::skewness(data)  # `skew` avoids reusing the function's own name
cat("\nSkewness:", skew, "\n")

## 
## Skewness: 0.8647805
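The positive skewness agrees with the stem-and-leaf display. Its upper tail is longer, stretching out to 15.5 and 18.9, while nothing falls below 2.2. The sketch below shows another simple symptom of positive skew: the mean, about 7.71 L/min, exceeds the median of 7.0 L/min. The local name `flow` is ours:

```r
# Shower-flow rates (L/min) from Problem 3 (same values as `data` above)
flow <- c(4.6, 12.3, 7.1, 7.0, 4.0, 9.2, 6.7, 6.9, 11.5, 5.1,
          11.2, 10.5, 14.3, 8.0, 8.8, 6.4, 5.1, 5.6, 9.6, 7.5,
          7.5, 6.2, 5.8, 2.3, 3.4, 10.4, 9.8, 6.6, 3.7, 6.4,
          8.3, 6.5, 7.6, 9.3, 9.2, 7.3, 5.0, 6.3, 13.8, 6.2,
          5.4, 4.8, 7.5, 6.0, 6.9, 10.8, 7.5, 6.6, 5.0, 3.3,
          7.6, 3.9, 11.9, 2.2, 15.0, 7.2, 6.1, 15.3, 18.9, 7.2,
          5.4, 5.5, 4.3, 9.0, 12.7, 11.3, 7.4, 5.0, 3.5, 8.2,
          8.4, 7.3, 10.3, 11.9, 6.0, 5.6, 9.5, 9.3, 10.4, 9.7,
          5.1, 6.7, 10.2, 6.2, 8.4, 7.0, 4.8, 5.6, 10.5, 14.6,
          10.8, 15.5, 7.5, 6.4, 3.4, 5.5, 6.6, 5.9, 15.0, 9.6,
          7.8, 7.0, 6.9, 4.1, 3.6, 11.9, 3.7, 5.7, 6.8, 11.3,
          9.3, 9.6, 10.4, 9.3, 6.9, 9.8, 9.1, 10.6, 4.5, 6.2,
          8.3, 3.2, 4.9, 5.0, 6.0, 8.2, 6.3, 3.8, 6.0)
# A long upper tail pulls the mean above the median
cat("Mean:", round(mean(flow), 2), " Median:", median(flow), "\n")
```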

# Part (e): Identify any observation that is far from the rest of the data (an outlier)
Q1 <- quantile(data, 0.25)
Q3 <- quantile(data, 0.75)
iqr <- Q3 - Q1  # lowercase name avoids masking the base IQR() function
lower_bound <- Q1 - 1.5 * iqr
upper_bound <- Q3 + 1.5 * iqr
outliers <- data[data < lower_bound | data > upper_bound]
cat("\nOutliers:", outliers, "\n")

## 
## Outliers: 18.9
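As a cross-check, base R's `boxplot.stats()` applies the same 1.5 × IQR rule and flags the same value. It uses Tukey's hinges, which here equal the quartiles 5.6 and 9.6. A wider 3 × IQR fence is the usual cutoff for an extreme outlier. Applying it shows that 18.9 is a mild outlier rather than an extreme one. The local names `flow`, `mild_fence` and `extreme_fence` are ours:

```r
# Shower-flow rates (L/min) from Problem 3 (same values as `data` above)
flow <- c(4.6, 12.3, 7.1, 7.0, 4.0, 9.2, 6.7, 6.9, 11.5, 5.1,
          11.2, 10.5, 14.3, 8.0, 8.8, 6.4, 5.1, 5.6, 9.6, 7.5,
          7.5, 6.2, 5.8, 2.3, 3.4, 10.4, 9.8, 6.6, 3.7, 6.4,
          8.3, 6.5, 7.6, 9.3, 9.2, 7.3, 5.0, 6.3, 13.8, 6.2,
          5.4, 4.8, 7.5, 6.0, 6.9, 10.8, 7.5, 6.6, 5.0, 3.3,
          7.6, 3.9, 11.9, 2.2, 15.0, 7.2, 6.1, 15.3, 18.9, 7.2,
          5.4, 5.5, 4.3, 9.0, 12.7, 11.3, 7.4, 5.0, 3.5, 8.2,
          8.4, 7.3, 10.3, 11.9, 6.0, 5.6, 9.5, 9.3, 10.4, 9.7,
          5.1, 6.7, 10.2, 6.2, 8.4, 7.0, 4.8, 5.6, 10.5, 14.6,
          10.8, 15.5, 7.5, 6.4, 3.4, 5.5, 6.6, 5.9, 15.0, 9.6,
          7.8, 7.0, 6.9, 4.1, 3.6, 11.9, 3.7, 5.7, 6.8, 11.3,
          9.3, 9.6, 10.4, 9.3, 6.9, 9.8, 9.1, 10.6, 4.5, 6.2,
          8.3, 3.2, 4.9, 5.0, 6.0, 8.2, 6.3, 3.8, 6.0)
print(boxplot.stats(flow)$out)             # values beyond the 1.5 x IQR fences
q <- unname(quantile(flow, c(0.25, 0.75)))
iqr <- q[2] - q[1]                         # 9.6 - 5.6 = 4.0
mild_fence <- q[2] + 1.5 * iqr             # 15.6
extreme_fence <- q[2] + 3 * iqr            # 21.6
cat("18.9 beyond mild fence:", 18.9 > mild_fence, "\n")
cat("18.9 beyond extreme fence:", 18.9 > extreme_fence, "\n")
```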

Problem Four

# Given data
data <- c(0.31, 0.35, 0.36, 0.36, 0.37, 0.38, 0.40, 0.40, 0.40,
          0.41, 0.41, 0.42, 0.42, 0.42, 0.42, 0.42, 0.43, 0.44,
          0.45, 0.46, 0.46, 0.47, 0.48, 0.48, 0.48, 0.51, 0.54,
          0.54, 0.55, 0.58, 0.62, 0.66, 0.66, 0.67, 0.68, 0.75)

# Construct a stem-and-leaf display with repeated (split) stems; for these data
# stem() puts leaves 0-4 and leaves 5-9 of each stem on separate lines
stem(data, scale = 1)

## 
##   The decimal point is 1 digit(s) to the left of the |
## 
##   3 | 1
##   3 | 56678
##   4 | 000112222234
##   4 | 5667888
##   5 | 144
##   5 | 58
##   6 | 2
##   6 | 6678
##   7 | 
##   7 | 5
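Whether `stem()` splits stems depends on its internal scaling heuristic, so it may help to spell out the construction. The sketch below builds the same repeated-stem display by hand. Each value is written in hundredths, the tens digit becomes the stem and the units digit becomes the leaf. Each stem is then split into a low line (leaves 0-4, labelled L) and a high line (leaves 5-9, labelled H). The L/H labels and the names `sg` and `display` are our own, not the original's.

The display shows several features. The bulk of the values (19 of 36) sit between 0.40 and 0.48, and there is a longer upper tail. A small group of values falls at 0.66-0.68 after a sparse stretch, and 0.75 stands on its own.

```r
# Specific gravity values from Problem 4 (same values as `data` above)
sg <- c(0.31, 0.35, 0.36, 0.36, 0.37, 0.38, 0.40, 0.40, 0.40,
        0.41, 0.41, 0.42, 0.42, 0.42, 0.42, 0.42, 0.43, 0.44,
        0.45, 0.46, 0.46, 0.47, 0.48, 0.48, 0.48, 0.51, 0.54,
        0.54, 0.55, 0.58, 0.62, 0.66, 0.66, 0.67, 0.68, 0.75)
hundredths <- round(sg * 100)          # round() guards against floating-point error
stems <- hundredths %/% 10             # tens digit: 3, 4, 5, 6 or 7
leaves <- hundredths %% 10             # units digit becomes the leaf
half <- ifelse(leaves < 5, "L", "H")   # L: leaves 0-4, H: leaves 5-9
# Every split stem from 3L to 7H, so empty lines (such as 7L) still appear
stem_levels <- paste0(rep(3:7, each = 2), c("L", "H"))
key <- factor(paste0(stems, half), levels = stem_levels)
display <- sapply(split(leaves, key), function(v) paste(sort(v), collapse = ""))
cat(sprintf("%3s | %s", names(display), display), sep = "\n")
```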

Faustus Maale

2025-01-26