Mustangs = read.csv("C:/Users/Will/OneDrive/Documents/School/375T Predictive Analytics/HW1/Mustangs.csv")
price = (Mustangs$Price)
n = length(price)
N = 10^4
price.mean = numeric(N)
price.sd = numeric(N)
price.median = numeric(N)
for (i in 1:N)
{
  x = sample(price, n, replace = TRUE)
  price.mean[i] = mean(x)
  price.sd[i] = sd(x)
  price.median[i] = median(x)
}
#par(mfrow=c(1,2))
hist(price.mean, main="Bootstrap Distribution of Means")
qqnorm(price.mean)
qqline(price.mean)
biasBoot = mean(price.mean)-mean(price)
biasBoot/sd(price.mean)
## [1] 0.009192021
We observe from the histogram and Q-Q plot that the distribution of bootstrap means is bell-shaped and only slightly skewed. Furthermore, the ratio of the bootstrap bias (the mean of the bootstrap means minus the mean of our sample) to the bootstrap standard error is about 0.9%, well under 10% in absolute value. The preferred method is therefore the 2*SE method, where our 1 - alpha (approximately 95%) is captured by the interval
mean(price.mean) - 2*sd(price.mean)
## [1] 11.65472
mean(price.mean) + 2*sd(price.mean)
## [1] 20.34522
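The bias check and the 2*SE interval above can be wrapped in one helper. The following is a minimal sketch rather than part of the original analysis: `two_se_interval` is a hypothetical name, and the data are simulated instead of being read from Mustangs.csv.

```r
# Hypothetical helper: bias/SE ratio and 2*SE interval from bootstrap replicates.
# boot_stats: vector of bootstrap statistics
# observed:   the same statistic computed on the original sample
two_se_interval = function(boot_stats, observed) {
  se = sd(boot_stats)                   # bootstrap standard error
  bias = mean(boot_stats) - observed    # bootstrap estimate of bias
  center = mean(boot_stats)             # same center as used above
  list(bias_ratio = bias / se,
       lower = center - 2 * se,
       upper = center + 2 * se)
}

# Simulated stand-in for the price data (assumption: not the Mustangs sample)
set.seed(1)
fake_price = rexp(25, rate = 1/15)
boot_mean = replicate(10^4, mean(sample(fake_price, replace = TRUE)))

res = two_se_interval(boot_mean, mean(fake_price))
res$bias_ratio          # small in absolute value when the statistic is nearly unbiased
c(res$lower, res$upper)
```

The helper keeps the bootstrap mean as the center so that it matches the calculation above; centering on the observed statistic instead is a common alternative.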
#par(mfrow=c(1,2))
hist(price.sd, main="Bootstrap Distribution of Standard Deviations")
qqnorm(price.sd)
qqline(price.sd)
biasBoot = mean(price.sd)-sd(price)
abs(biasBoot/sd(price.sd))
## [1] 0.202325
Our histogram resembles a bell curve, but is observably left-skewed. The ratio abs(biasBoot/sd(price.sd)) is about 20%, above the 10% threshold, which supports our assessment. We are better off using the percentile method, where our 1 - alpha (95%) is within the interval
upperBound = sort(price.sd)[9750]
deltaUpper = abs(mean(price.sd) - upperBound)
lowerBound = sort(price.sd)[250]
deltaLower = abs(mean(price.sd) - lowerBound)
mean(price.sd) - deltaLower
## [1] 7.179445
mean(price.sd) + deltaUpper
## [1] 13.88588
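Because each delta is added back to the same mean it was measured from, the two bounds above are simply the 250th and 9750th sorted bootstrap values, provided those values fall on either side of the mean. As a sketch of a shortcut, base R's `quantile` gives nearly the same bounds. The data here are simulated, and `quantile` interpolates between order statistics by default, so its values can differ slightly from the sorted-index version.

```r
set.seed(2)
fake_price = rexp(25, rate = 1/15)   # simulated stand-in, not the Mustangs data
N = 10^4
boot_sd = replicate(N, sd(sample(fake_price, replace = TRUE)))

# Sorted-index version, as in the analysis above
sorted = sort(boot_sd)
index_ci = c(sorted[250], sorted[9750])

# quantile() version; the default type = 7 interpolates between neighbors
quantile_ci = unname(quantile(boot_sd, c(0.025, 0.975)))

index_ci
quantile_ci
```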
#par(mfrow=c(1,2))
hist(price.median, main="Bootstrap Distribution of Medians")
qqnorm(price.median)
qqline(price.median)
We can observe from the histogram alone that neither the percentile method nor the 2*SE method is appropriate for building a 95% interval for the median.
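One likely reason the bootstrap medians look so irregular is that, for an odd sample size, a bootstrap median can only be one of the values already in the sample, so its distribution is discrete and lumpy. The sketch below illustrates this on simulated data by counting distinct values; the variable names and the sample size of 25 are assumptions for illustration.

```r
set.seed(3)
fake_price = round(rexp(25, rate = 1/15), 1)   # simulated stand-in, n = 25 (odd)
N = 10^4
boot_median = replicate(N, median(sample(fake_price, replace = TRUE)))
boot_mean = replicate(N, mean(sample(fake_price, replace = TRUE)))

length(unique(boot_median))   # at most 25: each median is one of the sample values
length(unique(boot_mean))     # many more distinct values

# Relative frequency of the most common bootstrap medians
head(sort(table(boot_median), decreasing = TRUE) / N)
```

With an even sample size the median can also land on the midpoint of two sample values, but the set of possible medians is still small compared with the set of possible means.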
Manhattan = read.csv("C:/Users/Will/OneDrive/Documents/School/375T Predictive Analytics/HW1/Manhattan.csv")
rent = (Manhattan$Rent)
n = length(rent)
N = 10^4
rent.mean = numeric(N)
rent.sd = numeric(N)
rent.median = numeric(N)
for (i in 1:N)
{
  x = sample(rent, n, replace = TRUE)
  rent.mean[i] = mean(x)
  rent.sd[i] = sd(x)
  rent.median[i] = median(x)
}
#par(mfrow=c(1,2))
hist(rent.mean, main="Bootstrap Distribution of Means")
qqnorm(rent.mean)
qqline(rent.mean)
biasBoot = mean(rent.mean)-mean(rent)
abs(biasBoot/sd(rent.mean))
## [1] 0.01362008
The histogram and Q-Q plot appear to indicate a significant right skew, even though the ratio of biasBoot to sd(rent.mean) is below 2%. With that said, we will try the percentile method.
upperBound = sort(rent.mean)[9750]
deltaUpper = abs(mean(rent.mean) - upperBound)
lowerBound = sort(rent.mean)[250]
deltaLower = abs(mean(rent.mean) - lowerBound)
mean(rent.mean) - deltaLower
## [1] 2655.45
mean(rent.mean) + deltaUpper
## [1] 3815.6
This interval is similar to the one derived from the "2*SE" method. Per our instructions, the percentile method is preferable.
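The 2*SE interval used for this comparison is not shown in the write-up. As a sketch of how the two intervals can be placed side by side, the example below computes both from the same bootstrap means. It uses simulated right-skewed data rather than Manhattan.csv, so the numbers will not match those above.

```r
set.seed(4)
fake_rent = rlnorm(20, meanlog = 8, sdlog = 0.4)   # simulated right-skewed stand-in
N = 10^4
boot_mean = replicate(N, mean(sample(fake_rent, replace = TRUE)))

# 2*SE interval centered on the mean of the bootstrap means
two_se = mean(boot_mean) + c(-2, 2) * sd(boot_mean)

# Percentile interval from the 250th and 9750th sorted values
percentile = sort(boot_mean)[c(250, 9750)]

rbind(two_se, percentile)
```

When the bootstrap distribution is right-skewed, the percentile bounds tend to sit slightly to the right of the 2*SE bounds, which is one way to see how much the skew matters.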
#par(mfrow=c(1,2))
hist(rent.sd, main="Bootstrap Distribution of Standard Deviations")
qqnorm(rent.sd)
qqline(rent.sd)
biasBoot = mean(rent.sd)-sd(rent)
abs(biasBoot/sd(rent.sd))
## [1] 0.2252147
Analyzing our Q-Q plot, we see that the distribution of the bootstrap standard deviations has fatter tails and a less smooth spread than our previous subjects. That said, the distribution is somewhat bell-shaped, and the skew is small. In fact, the ratio of the deltas (the absolute differences between the mean and each 2.5% bound) is less than 5%. Given this, I would consider using the percentile method, with the caveat that larger samples should be considered. Our interval is then
upperBound = sort(rent.sd)[9750]
deltaUpper = abs(mean(rent.sd) - upperBound)
lowerBound = sort(rent.sd)[250]
deltaLower = abs(mean(rent.sd) - lowerBound)
mean(rent.sd) - deltaLower
## [1] 510.826
mean(rent.sd) + deltaUpper
## [1] 2004.196
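The comparison of the deltas mentioned above can be made explicit. The original does not show that calculation, so this sketch takes one possible reading: how far deltaUpper/deltaLower is from 1. Both the formula and the simulated data are assumptions.

```r
set.seed(5)
fake_rent = rlnorm(20, meanlog = 8, sdlog = 0.4)   # simulated stand-in, not Manhattan.csv
N = 10^4
boot_sd = replicate(N, sd(sample(fake_rent, replace = TRUE)))

sorted = sort(boot_sd)
deltaLower = abs(mean(boot_sd) - sorted[250])
deltaUpper = abs(mean(boot_sd) - sorted[9750])

# Assumed asymmetry measure: 0 means both bounds are equally far from the mean
asymmetry = abs(deltaUpper / deltaLower - 1)
asymmetry
```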
#par(mfrow=c(1,2))
hist(rent.median, main="Bootstrap Distribution of Medians")
qqnorm(rent.median)
qqline(rent.median)
The distribution of bootstrap medians is strongly skewed, enough so that it is better to report a descriptive summary of the distribution than an interval.
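A summary of this kind might look like the following sketch, which reports the usual summary statistics and the frequency of each distinct bootstrap median. The data are simulated and the variable names are hypothetical; the same calls could be applied to rent.median.

```r
set.seed(6)
fake_rent = round(rlnorm(20, meanlog = 8, sdlog = 0.4))   # simulated stand-in
N = 10^4
boot_median = replicate(N, median(sample(fake_rent, replace = TRUE)))

# Minimum, quartiles, mean, and maximum of the bootstrap medians
s = summary(boot_median)
s

# How often each distinct median value occurs, most frequent first
freq = sort(table(boot_median), decreasing = TRUE)
head(freq)
```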