1. Use the file HotelRate to analyze nightly hotel rates in the U.S.

| Mean_Rate | Median_Rate | Quart_1 | Quart_3 |
| --- | --- | --- | --- |
| 159 | 161 | 138 | 168 |

The mean ($159) and median ($161) hotel rates are nearly equal, indicating that the distribution is roughly symmetric. We should not expect much skew in either direction.

  1. Use the file TaxPrep to answer the following questions about the cost of tax preparation from a sample of tax returns in the U.S.

| Mean_Cost | Median_Cost | First_Quart | Third_Quart | P_90 | SD | Upper | Lower |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 165 | 152 | 115 | 184 | 237 | 65 | 360 | -30.4 |

a.  Compute the mean, median, and mode. From this information do you believe the distribution to be symmetrical or skewed?

The mean ($165) is somewhat higher than the median ($152), which suggests a small amount of right skew.

b.  Compute the first and third quartiles. 
c.  Compute and interpret the 90th Percentile.
d.  Compute the sample standard deviation and interpret the results.

A sample standard deviation of $65 means that costs typically differ from the mean of $165 by about $65, which suggests a small to moderate amount of variation.

e.  Create a histogram for the variable cost. Describe the shape of the distribution. Is this consistent with your response from above using the mean and median? Explain this result.

The histogram suggests there is some right skew in the distribution. Most observations fall between $0 and $200, with a few observations in the right tail. This is consistent with the mean being larger than the median.

f.  Determine the value for cost that would be considered an outlier both above and below the mean.
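
For part f, the Upper and Lower values in the summary table are consistent with cutoffs placed three standard deviations from the mean. The following is a minimal Python sketch of that rule, assuming the rounded summary statistics from the table and a multiplier of 3; the function name `outlier_bounds` is hypothetical.

```python
# Rounded summary statistics copied from the TaxPrep table (in dollars).
mean_cost = 165
sd_cost = 65

def outlier_bounds(mean, sd, k=3):
    """Return (lower, upper) cutoffs located k standard deviations from the mean."""
    return mean - k * sd, mean + k * sd

lower, upper = outlier_bounds(mean_cost, sd_cost)
print(f"Lower = {lower}, Upper = {upper}")  # Lower = -30, Upper = 360
```

The lower cutoff differs slightly from the reported -30.4 because the inputs here are rounded. Since a cost cannot be negative, only the upper cutoff can flag outliers in practice.
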
  1. Use the data file J-Ville-House to answer the following questions. The data contains a sample from houses in the Jacksonville area.

| Mean_Price | Median_Price | SDev_Price |
| --- | --- | --- |
| 85659 | 72000 | 49650 |

a.  Calculate the mean and median for the variable price. Discuss the difference in values between the two and how it relates to the symmetry in the distribution.

The mean ($85,659) exceeds the median ($72,000) by approximately $13,700. This suggests the distribution is right skewed. There are likely a small number of expensive houses pulling the mean to the right.

b.  Report the standard deviation for the variable price. How would you characterize the variation in prices across houses? That is, is there a lot of variation or a little?

The standard deviation of $49,650 indicates that there is a lot of variation in prices across houses. This makes sense because houses are typically very different from one another.

c.  Calculate the coefficient of variation for the variable price. Interpret this value in terms of the level of variation for house prices.

Coefficient of Variation = 0.58. This supports our previous answer that there is a lot of variation in price. A C.V. of 58% means that the standard deviation is 58% of the mean, which indicates a high level of variation in price.
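
The coefficient of variation is the standard deviation divided by the mean. A short Python check, assuming the rounded values from the summary table above:

```python
# Rounded summary statistics from the J-Ville-House table (in dollars).
mean_price = 85659
sd_price = 49650

# Coefficient of variation: standard deviation expressed as a share of the mean.
cv = sd_price / mean_price
print(f"CV = {cv:.2f} ({cv:.0%})")  # CV = 0.58 (58%)
```
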

d.  What is the z-score for a house that is priced at $125,000?

z = 0.792

e.  What is the z-score for a house that is priced at $60,000?

z = -0.517
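
Both z-scores in parts d and e come from subtracting the mean from the price and dividing by the standard deviation. A minimal sketch, assuming the rounded mean and standard deviation from the summary table; the helper name `z_score` is hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Rounded summary statistics from the J-Ville-House table (in dollars).
MEAN_PRICE = 85659
SD_PRICE = 49650

for price in (125_000, 60_000):
    print(f"z({price}) = {z_score(price, MEAN_PRICE, SD_PRICE):.3f}")
# z(125000) = 0.792
# z(60000) = -0.517
```
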

f.  What is the z-score for the minimum priced house? Using the z-score criteria, would this be considered an outlier? Explain.

Zmin = -1.705. This would not be considered an outlier because it is within three standard deviations of the mean.

g.  Using the z-score criteria, calculate the upper boundary above the mean price that would constitute an outlier. Use this value to determine if there are any outliers for the variable price.

Upper Bound = 234609.946, Max Price = 271000. Since the maximum house price exceeds the upper bound of three standard deviations above the mean, there is at least one outlier above the mean in the data.
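
As a check on the arithmetic, the upper bound can be reproduced from the rounded mean and standard deviation in the summary table, using the three-standard-deviation criterion. The small gap from 234609.946 is due to rounding of the inputs.

\[
\bar{x} + 3s \approx 85{,}659 + 3(49{,}650) = 234{,}609 < 271{,}000
\]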

h.  Assuming house prices are fairly symmetric, approximately what percentage of prices would lie within ± two standard deviations from the mean?

Approximately 95% using the empirical rule.
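
The 95% figure in the empirical rule is an approximation to the normal distribution. The sketch below, which assumes prices are exactly normal (a stronger assumption than "fairly symmetric"), computes the share of values within k standard deviations of the mean using the error function from Python's standard library. The function name `share_within` is hypothetical.

```python
import math

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.1%}")
# within ±1 SD: 68.3%
# within ±2 SD: 95.4%
# within ±3 SD: 99.7%
```
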

  1. Use the file SO2Auction to analyze the distribution of permit prices in the Sulfur Dioxide auction market.

a.  For the variable Price Per Permit, report the Five Number Summary. From this information, discuss the shape of the distribution for permit prices.

| Min | First_Quartile | Second_Quartile | Third_Quartile | Max |
| --- | --- | --- | --- | --- |
| 127 | 131 | 136 | 144 | 250 |

The distances between the minimum, the first quartile, the second quartile (median), and the third quartile are fairly consistent, suggesting that roughly three-fourths of the data lie relatively close together. The distance between the third quartile and the maximum is substantially larger, suggesting that there is some right skew in the distribution. A few high bids are pulling the average to the right.

b.  Calculate the interquartile range for the Price Per Permit variable. Interpret this value in relation to the amount of variability in permit prices.

IQR = 12.159. The IQR is the range spanned by the middle 50% of the data. Here that range is only about $12, suggesting a small amount of variation between the first and third quartiles.

c.  Using the quartile criteria for determining outliers, what upper threshold level of permit prices would be considered an outlier? Are there values that would be considered outliers from above?

Upper Threshold = 161.739, Max = 250. Since the maximum auction bid is greater than the upper threshold, there are outliers above the median.

d.  Using the quartile criteria for determining outliers, what lower threshold level of permit prices would be considered an outlier? Are there values that would be considered an outlier from below?

Lower Threshold = 113.102, Min = 127.296. Since the minimum auction bid is not lower than the lower threshold, there are no outliers below the median.
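
The thresholds in parts c and d follow the usual quartile rule: 1.5 times the IQR below the first quartile and above the third quartile. A minimal Python sketch, assuming that multiplier and the rounded quartiles from the five-number summary; the function name `iqr_fences` is hypothetical.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier thresholds based on the interquartile range."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Rounded quartiles from the five-number summary table.
lower, upper = iqr_fences(131, 144)
print(lower, upper)  # 111.5 163.5
```

These values differ from the reported 113.102 and 161.739 because the reported thresholds were computed from unrounded quartiles. The conclusions are the same: the maximum of 250 is above the upper threshold, and the minimum of about 127 is above the lower threshold.
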

  1. The following table categorizes car accidents by age. Specifically, it tabulates the percentage of accidents on record from a sample of insurance customers who are classified as either less than 85 years old or at least 85 years old. Use the table to answer the following questions.

Table 1: Accident Reports by Age

| | Less than 85 years old | At least 85 years old | Total |
| --- | --- | --- | --- |
| Accident | 8% | 24% | 32% |
| No Accident | 64% | 4% | 68% |
| Total | 72% | 28% | 100% |
  1. What is the joint probability that someone is less than 85 years old and is also in an accident?
8%
  1. What percentage of people in the sample are at least 85 years old?
28%
  1. What percentage of people in the sample were in an accident?
32%
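
The three answers above can be read from Table 1: a joint probability is an interior cell, and a marginal percentage is a row or column total. The sketch below rebuilds the totals from the four interior cells and, as an illustration beyond what the questions ask, also computes the accident rate within each age group. The dictionary layout and the function name `marginal` are assumptions made for this example.

```python
# Joint percentages from Table 1, keyed by (outcome, age group).
joint = {
    ("accident", "under_85"): 8,
    ("accident", "85_plus"): 24,
    ("no_accident", "under_85"): 64,
    ("no_accident", "85_plus"): 4,
}

def marginal(position, label):
    """Sum the joint percentages whose key matches label at the given position."""
    return sum(p for key, p in joint.items() if key[position] == label)

p_accident = marginal(0, "accident")   # row total: 32
p_85_plus = marginal(1, "85_plus")     # column total: 28
p_under_85 = marginal(1, "under_85")   # column total: 72

# Conditional probabilities: accident rate within each age group.
p_acc_given_under = joint[("accident", "under_85")] / p_under_85
p_acc_given_85_plus = joint[("accident", "85_plus")] / p_85_plus

print(p_accident, p_85_plus, p_under_85)  # 32 28 72
print(f"{p_acc_given_under:.3f} {p_acc_given_85_plus:.3f}")  # 0.111 0.857
```
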
  1. Use the file StockPrice to examine the relationship between the price of Apple stock and the price of Netflix stock. The data contains a sample of 10 observations of the closing price for each stock over the last year.

  1. Create a scatter diagram for Apple and Netflix stock prices. Place Apple on the x-axis and Netflix on the y-axis. Add a trend line to the scatter plot. Visually interpret the sign (\(\pm\)) and the strength of the relationship.

With the trend line and scatter diagram it is evident that there is a fairly strong positive relationship between the two variables.

  1. Compute the sample covariance between Apple and Netflix stock prices.

    COV = 70.18

  2. Compute the correlation coefficient between Apple and Netflix stock prices. Interpret this value in terms of the sign (\(\pm\)) and strength of the relationship. Is this result consistent with your

    CORR = 0.257

The correlation coefficient of 0.257 is positive but closer to 0 than to 1, indicating a weak to moderate positive relationship between the two stock prices.
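
The sample covariance and the correlation coefficient are linked: the correlation is the covariance divided by the product of the two sample standard deviations. The sketch below shows both calculations in plain Python. The input lists are made-up toy values chosen for easy arithmetic, not the StockPrice data, and the function names are hypothetical.

```python
import math

def sample_covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Covariance scaled by both sample standard deviations."""
    var_x = sample_covariance(x, x)  # the covariance of x with itself is its variance
    var_y = sample_covariance(y, y)
    return sample_covariance(x, y) / math.sqrt(var_x * var_y)

# Toy values for illustration only (NOT the Apple and Netflix prices).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

print(f"COV = {sample_covariance(x, y):.2f}")  # COV = 2.00
print(f"CORR = {correlation(x, y):.3f}")       # CORR = 0.800
```

Unlike the covariance, the correlation has no units and always lies between -1 and 1, which is why it is the better measure for judging the strength of a relationship.
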