Mustang Price

Create bootstrap

# Load the Mustang data (local file path)
Mustangs = read.csv("C:/Users/Will/OneDrive/Documents/School/375T Predictive Analytics/HW1/Mustangs.csv")
price = Mustangs$Price
n = length(price)       # sample size
N = 10^4                # number of bootstrap resamples
price.mean = numeric(N)
price.sd = numeric(N)
price.median = numeric(N)
for (i in 1:N)
{
  # Resample n prices with replacement and record each statistic
  x = sample(price, n, replace = TRUE)
  price.mean[i] = mean(x)
  price.sd[i] = sd(x)
  price.median[i] = median(x)
}
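The same loop is repeated for the Manhattan data below, so it may help to wrap it in a function. The sketch below is one way to do that in base R with `replicate()`. The name `boot_stats` is hypothetical, and the commented `set.seed()` call is only there to make results reproducible; the original code sets no seed, so rerunning it gives slightly different numbers each time.

```r
# Hypothetical helper (not part of the original assignment): draws N bootstrap
# resamples of x and returns the mean, SD and median of each resample.
boot_stats = function(x, N = 10^4) {
  n = length(x)
  out = replicate(N, {
    s = sample(x, n, replace = TRUE)   # one resample of size n, with replacement
    c(mean = mean(s), sd = sd(s), median = median(s))
  })
  # replicate() returns a 3 x N matrix; transpose so each row is one resample
  as.data.frame(t(out))
}

# Example usage on the Mustang prices:
# set.seed(1)
# price.boot = boot_stats(price)
# head(price.boot)
```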

Analyze mean

#par(mfrow=c(1,2))
hist(price.mean, main="Bootstrap Distribution of Means")

qqnorm(price.mean)
qqline(price.mean)

biasBoot = mean(price.mean)-mean(price)
biasBoot/sd(price.mean)

## [1] 0.009192021

We observe that the histogram and QQ plot of the bootstrap means are bell-shaped and only slightly skewed. Furthermore, the ratio of the bootstrap bias (the mean of the bootstrap means minus the sample mean) to the bootstrap standard error is about 0.009, well within the ±10% rule of thumb. The preferred method is therefore the 2*SE method, which gives the approximate 95% confidence interval

mean(price.mean) - 2*sd(price.mean)

## [1] 11.65472

mean(price.mean) + 2*sd(price.mean)

## [1] 20.34522
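The interval above is centered on `mean(price.mean)`, the mean of the bootstrap means. A common textbook form of the 2*SE interval centers on the statistic computed from the original sample instead; with a bias ratio under 1% the two versions are nearly identical here. A minimal sketch of that form, with a hypothetical helper name:

```r
# Hypothetical helper: approximate 95% interval from the bootstrap SE.
# 'observed' is the statistic computed on the original sample;
# 'boot' holds the bootstrap replicates of that statistic.
two_se_interval = function(observed, boot) {
  se = sd(boot)                        # bootstrap standard error
  c(lower = observed - 2 * se, upper = observed + 2 * se)
}

# Usage on the Mustang means (centered on mean(price), not mean(price.mean)):
# two_se_interval(mean(price), price.mean)
```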

Analyze standard deviation

#par(mfrow=c(1,2))
hist(price.sd, main="Bootstrap Distribution of Standard Deviations")

qqnorm(price.sd)
qqline(price.sd)

biasBoot = mean(price.sd)-sd(price)
abs(biasBoot/sd(price.sd))

## [1] 0.202325
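The bias check is done twice in this section (and twice more for the Manhattan data), so it can be packaged as a small function. This is a sketch with a hypothetical name; the 0.1 cutoff is the rule of thumb used in the text, not something the function enforces.

```r
# Hypothetical helper: bootstrap bias as a fraction of the bootstrap SE.
# Values above 0.1 suggest the bias is too large to ignore.
bias_ratio = function(observed, boot) {
  bias = mean(boot) - observed          # bootstrap estimate of bias
  abs(bias) / sd(boot)                  # scale by the bootstrap standard error
}

# Usage, mirroring the checks in this section
# (the printed values above were about 0.009 and 0.20):
# bias_ratio(mean(price), price.mean)
# bias_ratio(sd(price), price.sd)
```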

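Our histogram resembles a bell curve but is visibly left-skewed. The absolute ratio of the bootstrap bias to the bootstrap standard error, about 0.20, also exceeds the 10% threshold, another sign that the 2*SE method is not appropriate here. We are better off using the percentile method, which gives the approximate 95% confidence interval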

# 95% percentile bounds: the 250th and 9,750th of the 10^4 sorted bootstrap SDs
upperBound = sort(price.sd)[9750]
deltaUpper = abs(mean(price.sd) - upperBound)
lowerBound = sort(price.sd)[250]
deltaLower = abs(mean(price.sd) - lowerBound)

mean(price.sd) - deltaLower

## [1] 7.179445

mean(price.sd) + deltaUpper

## [1] 13.88588
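Because the lower bound lies below the bootstrap mean and the upper bound above it, `mean(price.sd) - deltaLower` and `mean(price.sd) + deltaUpper` simply reproduce `lowerBound` and `upperBound`, so the percentile interval can be read directly from the sorted replicates. A minimal sketch with a hypothetical helper name: for N = 10^4 it picks the same 250th and 9,750th order statistics used above, while base R's `quantile()` interpolates between neighbouring values and may differ slightly in the last digits.

```r
# Hypothetical helper: percentile interval from bootstrap replicates.
# For length(boot) == 10^4 and level = 0.95 this uses sort(boot)[250]
# and sort(boot)[9750], as in the code above.
percentile_interval = function(boot, level = 0.95) {
  N = length(boot)
  alpha = 1 - level
  sorted = sort(boot)
  lower = sorted[max(1, round(N * alpha / 2))]
  upper = sorted[round(N * (1 - alpha / 2))]
  c(lower = lower, upper = upper)
}

# Usage, plus the interpolating alternative from base R:
# percentile_interval(price.sd)
# quantile(price.sd, c(0.025, 0.975))
```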

Analyze median

#par(mfrow=c(1,2))
hist(price.median, main="Bootstrap Distribution of Medians")

qqnorm(price.median)
qqline(price.median)

The histogram alone shows that neither the percentile method nor the 2*SE method is appropriate for a confidence interval for the median.
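One reason the bootstrap median behaves this way: each resample's median is either one of the original observations or the midpoint of two of them, so the bootstrap medians tend to take relatively few distinct values and the histogram is lumpy rather than smooth. A quick diagnostic, sketched with a hypothetical function name:

```r
# Hypothetical diagnostic: how many distinct values do the bootstrap medians take,
# and which values occur most often?
median_support = function(boot_medians, top = 5) {
  counts = sort(table(boot_medians), decreasing = TRUE)
  list(distinct = length(counts),
       most_common = head(counts, top))
}

# Usage on the Mustang data:
# median_support(price.median)
```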

Manhattan Rents

Create Bootstrap

# Load the Manhattan rent data (local file path)
Manhattan = read.csv("C:/Users/Will/OneDrive/Documents/School/375T Predictive Analytics/HW1/Manhattan.csv")
rent = Manhattan$Rent
n = length(rent)        # sample size
N = 10^4                # number of bootstrap resamples
rent.mean = numeric(N)
rent.sd = numeric(N)
rent.median = numeric(N)
for (i in 1:N)
{
  # Resample n rents with replacement and record each statistic
  x = sample(rent, n, replace = TRUE)
  rent.mean[i] = mean(x)
  rent.sd[i] = sd(x)
  rent.median[i] = median(x)
}

Analyze mean

#par(mfrow=c(1,2))
hist(rent.mean, main="Bootstrap Distribution of Means")

qqnorm(rent.mean)
qqline(rent.mean)

biasBoot = mean(rent.mean)-mean(rent)
abs(biasBoot/sd(rent.mean))

## [1] 0.01362008

The histogram and QQ plot indicate a noticeable right skew, even though the ratio of biasBoot to sd(rent.mean) is below 2% (about 0.014). Given the skew, we try the percentile method, which gives the interval

# 95% percentile bounds: the 250th and 9,750th of the 10^4 sorted bootstrap means
upperBound = sort(rent.mean)[9750]
deltaUpper = abs(mean(rent.mean) - upperBound)
lowerBound = sort(rent.mean)[250]
deltaLower = abs(mean(rent.mean) - lowerBound)

mean(rent.mean) - deltaLower

## [1] 2655.45

mean(rent.mean) + deltaUpper

## [1] 3815.6

This interval is similar to the one from the 2*SE method. Per our instructions, the percentile method is preferable.
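The comparison with the 2*SE method can be made explicit by computing both intervals side by side from the same replicates. With right skew, one would expect the percentile interval to extend further above the center than below it. A sketch with a hypothetical helper name; the 2*SE row is centered on the original sample statistic.

```r
# Hypothetical comparison: 2*SE interval (centered on the sample statistic)
# next to the 95% percentile interval, for the same bootstrap replicates.
compare_intervals = function(observed, boot) {
  se = sd(boot)
  sorted = sort(boot)
  N = length(boot)
  rbind(
    two_se     = c(lower = observed - 2 * se, upper = observed + 2 * se),
    percentile = c(lower = sorted[max(1, round(0.025 * N))],
                   upper = sorted[round(0.975 * N)])
  )
}

# Usage on the Manhattan rents:
# compare_intervals(mean(rent), rent.mean)
```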

Analyze standard deviation

#par(mfrow=c(1,2))
hist(rent.sd, main="Bootstrap Distribution of Standard Deviations")

qqnorm(rent.sd)
qqline(rent.sd)

biasBoot = mean(rent.sd)-sd(rent)
abs(biasBoot/sd(rent.sd))

## [1] 0.2252147

The QQ plot shows that the distribution of the bootstrap standard deviations has fatter tails and is less smooth than the previous distributions. Still, the distribution is roughly bell-shaped and the skew is small: the two deltas (the absolute differences between the bootstrap mean and each 2.5% tail bound) differ by less than 5%. The bias ratio of about 0.23 does exceed the 10% rule of thumb, however, so I would use the percentile method with the caveat that a larger sample should be considered. The interval is

# 95% percentile bounds: the 250th and 9,750th of the 10^4 sorted bootstrap SDs
upperBound = sort(rent.sd)[9750]
deltaUpper = abs(mean(rent.sd) - upperBound)
lowerBound = sort(rent.sd)[250]
deltaLower = abs(mean(rent.sd) - lowerBound)

mean(rent.sd) - deltaLower

## [1] 510.826

mean(rent.sd) + deltaUpper

## [1] 2004.196
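The symmetry claim about the deltas is not computed in the code above. The sketch below reports both deltas and, interpreting the 5% figure as the relative difference between them, the gap as a fraction of the lower delta. The function name and that interpretation are assumptions.

```r
# Hypothetical check of the symmetry claim: how far the 2.5% and 97.5% bounds
# sit from the bootstrap mean, and the relative gap between those two distances.
delta_asymmetry = function(boot, lower_idx = 250, upper_idx = 9750) {
  sorted = sort(boot)
  center = mean(boot)
  d_lower = abs(center - sorted[lower_idx])
  d_upper = abs(center - sorted[upper_idx])
  c(deltaLower = d_lower, deltaUpper = d_upper,
    relative_gap = abs(d_upper - d_lower) / d_lower)
}

# Usage on the Manhattan rent SDs:
# delta_asymmetry(rent.sd)
```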

Analyze median

#par(mfrow=c(1,2))
hist(rent.median, main="Bootstrap Distribution of Medians")

qqnorm(rent.median)
qqline(rent.median)

The distribution of bootstrap medians is strongly skewed, so rather than a confidence interval it is better to report a summary of the distribution.
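One way to produce such a summary is to report the five-number summary and mean together with the 2.5% and 97.5% quantiles, leaving the reader to judge the shape. A minimal sketch with a hypothetical function name:

```r
# Hypothetical summary for a skewed, lumpy bootstrap distribution:
# min, quartiles, median, mean and max, plus the 2.5% and 97.5% quantiles.
summarize_boot = function(boot) {
  list(summary = summary(boot),
       tails   = quantile(boot, c(0.025, 0.975)))
}

# Usage on the Manhattan rent medians:
# summarize_boot(rent.median)
```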

Bootstrap

William Chance - wtc464

January 31, 2019
