Problem 1. The percentage of cotton in the material used to manufacture men’s shirts is given in the data file read below.

  (a) Compute the sample mean, variance, and median.
library(openxlsx)
# Read the homework data (absolute path on the author's machine)
dat= read.xlsx('/Users/seventails/Downloads/Stat/Week 5/HW4_data.xlsx')
mean(dat$Cotton_Percentage)
## [1] 34.79844
var(dat$Cotton_Percentage)
## [1] 1.860791
median(dat$Cotton_Percentage)
## [1] 34.7
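As a check on what `mean()`, `var()`, and `median()` compute, the sketch below applies the definitions directly. Note that `var()` is the sample variance, with an \(n-1\) denominator. The vector `x` is a small hypothetical example, not the homework data.

```r
# Hypothetical cotton percentages (illustration only, not HW4_data.xlsx)
x <- c(34.2, 33.6, 33.8, 34.7, 37.8, 32.6)
n <- length(x)

# Sample mean: sum of the observations divided by n
xbar <- sum(x) / n

# Sample variance: squared deviations divided by n - 1, as var() does
s2 <- sum((x - xbar)^2) / (n - 1)

# Median: middle value of the sorted data (average of the two middle values when n is even)
xs <- sort(x)
med <- if (n %% 2 == 1) xs[(n + 1) / 2] else mean(xs[c(n / 2, n / 2 + 1)])

c(mean = xbar, variance = s2, median = med)
```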
  (b) Construct a stem-and-leaf display for the data.
stem(dat$Cotton_Percentage)
## 
##   The decimal point is at the |
## 
##   32 | 156789
##   33 | 11456666688
##   34 | 011122355666667777779
##   35 | 00111234456789
##   36 | 2346888
##   37 | 13689
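With "The decimal point is at the |", each stem in this display is the integer part of a value and each leaf is its first decimal digit. A minimal sketch of that split on a few hypothetical values (it reproduces only this unit-stem layout, not every scaling `stem()` can choose):

```r
# Hypothetical cotton percentages recorded to one decimal place
x <- c(32.1, 32.5, 33.1, 34.7, 34.7, 35.0)

# Stem = integer part; leaf = first decimal digit (rounded to absorb floating-point error)
stems  <- floor(x)
leaves <- round((x - stems) * 10)

# Group the leaves by stem and sort them within each stem
display <- lapply(split(leaves, stems), sort)
for (s in names(display)) cat(s, "|", paste(display[[s]], collapse = ""), "\n")
```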
  (c) Construct a frequency distribution and histogram for the cotton content.
# Numeric vector of cotton percentages, used as input to cut()
percentages= dat$Cotton_Percentage

# Bin edges: classes of width 1 from 32 to 39
bins= seq(32, 39, by= 1)

# Assign each value to a bin; cut() uses right-closed intervals (a, b] by default
freq= cut(percentages, bins)

# Tabulate the resulting factor as a frequency distribution
transform(table(freq))
##      freq Freq
## 1 (32,33]    6
## 2 (33,34]   12
## 3 (34,35]   22
## 4 (35,36]   12
## 5 (36,37]    7
## 6 (37,38]    5
## 7 (38,39]    0
# Histogram using the same bins as the frequency table
hist(percentages, breaks = bins, main = "Histogram for Cotton Content", xlab = "cotton percentage")
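Because `cut()` builds right-closed intervals \((a, b]\) by default, a value lying exactly on a bin edge is counted in the lower bin. The sketch below shows this on hypothetical values and also adds relative and cumulative frequencies, which are often reported alongside a frequency distribution.

```r
# Hypothetical values, including some that sit exactly on bin edges
x <- c(32.1, 33.0, 33.5, 34.0, 34.0, 37.9)
bins <- seq(32, 39, by = 1)

# right = TRUE (the default) gives intervals of the form (a, b]
freq <- table(cut(x, bins, right = TRUE))

# Frequency, relative frequency and cumulative frequency in one table
freq_table <- data.frame(
  bin      = names(freq),
  freq     = as.vector(freq),
  rel_freq = as.vector(freq) / length(x),
  cum_freq = cumsum(as.vector(freq))
)
freq_table
```

Here 33.0 lands in `(32,33]` and both 34.0 values land in `(33,34]`.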

(d) Construct a box plot of the data and comment on the information in this display.

boxplot(dat$Cotton_Percentage)

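The box plot shows that cotton content ranges from about 32 to 38 percent (32.1 to 37.9 in the stem-and-leaf display). The median, 34.7, lies near the center of the box, and the two halves of the box have similar lengths, so the middle half of the data is roughly symmetric. No points are plotted beyond the whiskers, so there are no outliers in the data.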
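`boxplot()` draws a point as an outlier when it lies more than 1.5 times the hinge spread beyond the hinges (its default `range = 1.5`); the hinges come from `fivenum()` and the rule is applied in `boxplot.stats()`. The sketch below recomputes the fences by hand on a hypothetical vector with one extreme value and checks the result against `boxplot.stats()`.

```r
# Hypothetical data with one extreme value (illustration only)
x <- c(32.5, 33.1, 33.8, 34.2, 34.7, 35.0, 35.6, 36.3, 41.0)

# Tukey five-number summary: min, lower hinge, median, upper hinge, max
fn <- fivenum(x)
lower_hinge <- fn[2]
upper_hinge <- fn[4]
spread <- upper_hinge - lower_hinge

# Points more than 1.5 * spread beyond the hinges are drawn as outliers
lower_fence <- lower_hinge - 1.5 * spread
upper_fence <- upper_hinge + 1.5 * spread
outliers <- x[x < lower_fence | x > upper_fence]
outliers
```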

Problem 2. Suppose \(X\) has a Bernoulli distribution with parameter \(p\). Show that the sample mean \(\overline{X}\) is an MVUE of \(p\).

Let \(X_{1}, X_{2}, \dots, X_{n}\) be independent Bernoulli trials with success parameter \(p\), and take the estimator of \(p\) to be \(d(X)= \overline{X}\), the sample mean. Then \(\overline{X}\) is unbiased:
\[\begin{aligned} E_{p}\overline{X} &= \frac{1}{n}\left(EX_{1} + EX_{2} + \cdots + EX_{n}\right)\\ &= \frac{1}{n}(p + p + \cdots + p)\\ &= p \end{aligned}\]
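Unbiasedness alone does not show that \(\overline{X}\) has minimum variance. One standard way to complete the argument, sketched here, is to show that the variance of \(\overline{X}\) attains the Cramér–Rao lower bound for unbiased estimators of \(p\). Here \(f(x; p)\) is the Bernoulli probability mass function, \(I(p)\) is the Fisher information of one observation, and the last line uses the independence of the \(X_{i}\):

\[\begin{aligned} \log f(x; p) &= x\log p + (1-x)\log(1-p)\\ \frac{\partial^{2}}{\partial p^{2}}\log f(x; p) &= -\frac{x}{p^{2}} - \frac{1-x}{(1-p)^{2}}\\ I(p) = -E_{p}\left[\frac{\partial^{2}}{\partial p^{2}}\log f(X; p)\right] &= \frac{p}{p^{2}} + \frac{1-p}{(1-p)^{2}} = \frac{1}{p(1-p)}\\ \text{CRLB} = \frac{1}{n I(p)} &= \frac{p(1-p)}{n}\\ \operatorname{Var}_{p}(\overline{X}) = \frac{\operatorname{Var}_{p}(X_{1})}{n} &= \frac{p(1-p)}{n} \end{aligned}\]

Since \(\overline{X}\) is unbiased and its variance equals the lower bound, no unbiased estimator of \(p\) has a smaller variance, so \(\overline{X}\) is an MVUE of \(p\).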

Problem 3. Use the MLE method to build estimators for the parameters of the following distributions:

  (a) Bernoulli
    \[\begin{aligned} \text{The likelihood function is}\\ L(p) &= \prod_{i=1}^{n} p^{X_{i}}(1-p)^{1-X_{i}}\\ \text{The log-likelihood function is}\\ LL(p) &= \sum_{i=1}^{n}\log\left[p^{X_{i}}(1-p)^{1-X_{i}}\right]\\ &= \sum_{i=1}^{n}\left[X_{i}\log p + (1-X_{i})\log(1-p)\right]\\ &= Y\log p + (n-Y)\log(1-p), && \text{where } Y = \sum_{i=1}^{n} X_{i} \end{aligned}\]

Taking the first derivative of the log-likelihood with respect to \(p\) and setting it to 0 gives the MLE:

\[\begin{aligned} \frac{\partial LL(p)}{\partial p} &= \frac{Y}{p} - \frac{n-Y}{1-p} = 0\\ Y(1-p) &= (n-Y)p\\ \hat{p} &= \frac{Y}{n} = \frac{1}{n}\sum_{i=1}^{n} X_{i} = \overline{X}\\ \text{The estimator is the sample mean.} \end{aligned}\]
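A quick numerical check of the closed form: maximizing the Bernoulli log-likelihood with `optimize()` on simulated data should land on the sample mean. The true \(p = 0.3\), the sample size, and the seed are arbitrary choices for illustration.

```r
set.seed(1)
# Simulated Bernoulli sample with a hypothetical true p = 0.3
x <- rbinom(200, size = 1, prob = 0.3)

# Bernoulli log-likelihood: Y log p + (n - Y) log(1 - p), with Y = sum(x)
loglik <- function(p) sum(x) * log(p) + (length(x) - sum(x)) * log(1 - p)

# Maximise numerically over (0, 1) and compare with the closed form Y / n
fit <- optimize(loglik, interval = c(1e-6, 1 - 1e-6), maximum = TRUE)
c(numerical = fit$maximum, closed_form = mean(x))
```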
  (b) Exponential
    \[\begin{aligned} \text{The likelihood function is}\\ L(\lambda) &= \prod_{i=1}^{n}\lambda e^{-\lambda X_{i}}\\ &= \lambda^{n}e^{-\lambda \sum_{i=1}^{n}X_{i}}\\ \text{The log-likelihood function is}\\ LL(\lambda) &= n\log\lambda - \lambda\sum_{i=1}^{n}X_{i}\\ &= n\log\lambda - \lambda Y, && \text{where } Y = \sum_{i=1}^{n} X_{i}\\ \frac{\partial LL(\lambda)}{\partial \lambda} &= \frac{n}{\lambda} - Y = 0\\ \hat{\lambda} &= \frac{n}{Y} = \frac{n}{\sum_{i=1}^{n}X_{i}} = \frac{1}{\overline{X}}\\ \text{The estimator is the reciprocal of the sample mean.} \end{aligned}\]
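  (c) Lognormal
    \[\begin{aligned} \text{With } \theta_{0} = \mu \text{ and } \theta_{1} = \sigma^{2}\text{, the likelihood function is}\\ L(\theta) &= \prod_{i=1}^{n}\frac{1}{X_{i}\sqrt{2\pi\theta_{1}}}\, e^{-\frac{(\log X_{i}-\theta_{0})^{2}}{2\theta_{1}}}\\ \text{The log-likelihood function is}\\ LL(\theta) &= -\sum_{i=1}^{n}\log X_{i} - \frac{n}{2}\log(2\pi\theta_{1}) - \frac{1}{2\theta_{1}}\sum_{i=1}^{n}(\log X_{i}-\theta_{0})^{2}\\ \frac{\partial LL(\theta)}{\partial \theta_{0}} &= \frac{1}{\theta_{1}}\sum_{i=1}^{n}(\log X_{i}-\theta_{0}) = 0\\ \hat{\mu} = \hat{\theta}_{0} &= \frac{1}{n}\sum_{i=1}^{n}\log X_{i}\\ \frac{\partial LL(\theta)}{\partial \theta_{1}} &= -\frac{n}{2\theta_{1}} + \frac{1}{2\theta_{1}^{2}}\sum_{i=1}^{n}(\log X_{i}-\theta_{0})^{2} = 0\\ \hat{\sigma}^{2} = \hat{\theta}_{1} &= \frac{1}{n}\sum_{i=1}^{n}(\log X_{i}-\hat{\theta}_{0})^{2}\\ \text{The estimators are the mean and the divide-by-}n\text{ variance of the } \log X_{i}. \end{aligned}\]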
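The exponential and lognormal closed forms can be checked the same way: maximize the log-likelihood numerically on simulated data and compare with the formulas. This is a sketch under arbitrary assumptions (true rate 2, `meanlog = 1`, `sdlog = 0.5`, 500 draws each, fixed seed); `optim()` minimizes, so it is given the negative lognormal log-likelihood in \((\theta_{0}, \theta_{1})\).

```r
set.seed(2)

# Exponential: simulated sample with a hypothetical true rate lambda = 2
x_exp <- rexp(500, rate = 2)
loglik_exp <- function(lambda) length(x_exp) * log(lambda) - lambda * sum(x_exp)
fit_exp <- optimize(loglik_exp, interval = c(1e-6, 50), maximum = TRUE)
lambda_hat <- 1 / mean(x_exp)                          # closed-form MLE

# Lognormal: simulated sample with hypothetical meanlog = 1, sdlog = 0.5
x_ln <- rlnorm(500, meanlog = 1, sdlog = 0.5)
n <- length(x_ln)
theta0_hat <- mean(log(x_ln))                          # closed-form MLE of mu
theta1_hat <- sum((log(x_ln) - theta0_hat)^2) / n      # closed-form MLE of sigma^2

# Negative log-likelihood in (theta0, theta1 = sigma^2)
negll_ln <- function(par) {
  if (par[2] <= 0) return(Inf)                         # sigma^2 must be positive
  -sum(dlnorm(x_ln, meanlog = par[1], sdlog = sqrt(par[2]), log = TRUE))
}
fit_ln <- optim(c(0, 1), negll_ln, control = list(reltol = 1e-12))

print(c(numerical = fit_exp$maximum, closed_form = lambda_hat))
print(rbind(numerical = fit_ln$par, closed_form = c(theta0_hat, theta1_hat)))
```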

Problem 6 (24 points total)

  (a) Calculate the least squares estimates of the slope and intercept. Graph the regression line. (12 points)
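# Summary sums given in the problem: x = surface temperature (°F), y = pavement deflection.
# Here sigma_* denotes a sum (Σ), not a standard deviation.
n= 20
sigma_Y= 12.7      # Σ y_i
sigma_Y2= 8.8      # Σ y_i^2 (not needed for the estimates)
sigma_X=1487       # Σ x_i
sigma_X2=143215    # Σ x_i^2
sigma_XY= 1083     # Σ x_i y_i

# Corrected sum of squares of x and sum of cross-products
S_XX= sigma_X2- (sigma_X^2/n)
S_XY= sigma_XY-((sigma_Y*sigma_X)/n)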

# Least squares slope: b1 = S_XY / S_XX
b1= S_XY/S_XX
b1
## [1] 0.004248918
# Least squares intercept: b0 = ybar - b1 * xbar
b0= sigma_Y/n- (b1*(sigma_X/n))
b0
## [1] 0.319093
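# Graph the fitted line y = b0 + b1 * x over temperatures from -100 to 100 °F
curve(b0 + b1*x, from = -100, to = 100,
      main = "Fitted regression line",
      xlab = "Surface temperature (°F)", ylab = "Pavement deflection")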
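The shortcut formulas used above (\(S_{XX}\), \(S_{XY}\), \(b_{1} = S_{XY}/S_{XX}\), \(b_{0} = \bar{y} - b_{1}\bar{x}\)) give the same estimates as `lm()` when raw data are available. Only summary sums are given in this problem, so the sketch below uses a small hypothetical data set purely to show the agreement.

```r
# Hypothetical (temperature, deflection) pairs, for illustration only
x <- c(70, 65, 80, 90, 55, 75, 85, 60)
y <- c(0.60, 0.58, 0.66, 0.70, 0.52, 0.63, 0.68, 0.55)
n <- length(x)

# Least squares estimates from summary sums, as in the code above
S_XX <- sum(x^2) - sum(x)^2 / n
S_XY <- sum(x * y) - sum(x) * sum(y) / n
b1 <- S_XY / S_XX
b0 <- mean(y) - b1 * mean(x)

# lm() fits the same simple linear regression, so the coefficients should agree
fit <- lm(y ~ x)
rbind(summary_sums = c(intercept = b0, slope = b1),
      lm           = unname(coef(fit)))
```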

(b) Use the equation of the fitted line to predict what pavement deflection would be observed when the surface temperature is 85°F. (4 points)

# Predicted deflection at a surface temperature of 85°F
b0+(b1*85)
## [1] 0.680251
  (c) What is the mean pavement deflection when the surface temperature is 90°F? (4 points)
# Estimated mean deflection at a surface temperature of 90°F
b0+(b1*90)
## [1] 0.7014956
  (d) What change in mean pavement deflection would be expected for a 1°F change in surface temperature? (4 points)
# The slope b1 is the expected change in mean deflection per 1°F increase in temperature
b1
## [1] 0.004248918