Name

  1. The data set “Price” contains data on over 400 housing transactions in Jacksonville. The city is divided into five distinct wards. Ward describes the location of the house, Price measures the sales price of the house, Sqft measures the size of the house in square feet, and quality rates the condition of the house.
  1. Describe the entities on which the data are collected for this study. Does this represent a population or a sample of houses in Jacksonville?
  1. Characterize each variable in the data set as categorical (qualitative) or quantitative. If a variable is categorical, determine whether it is nominal or ordinal. If a variable is quantitative, determine whether it is discrete or continuous.

| Ward | Count | Percent (%) |
|---|---:|---:|
| Ward1 | 42 | 9.6 |
| Ward2 | 106 | 24.1 |
| Ward3 | 28 | 6.4 |
| Ward4 | 171 | 39.0 |
| Ward5 | 92 | 21.0 |

Ward 3 contains 6.4% of the houses in the sample. The wards are not represented equally: Ward 4 has the most houses (171, or 39.0%), and Ward 3 has the fewest (28).
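Each percentage in the ward table is that ward's count divided by the total number of houses. A minimal Python sketch of the arithmetic, using only the counts from the table (the variable names are illustrative):

```python
# Ward counts copied from the frequency table.
ward_counts = {"Ward1": 42, "Ward2": 106, "Ward3": 28, "Ward4": 171, "Ward5": 92}

total = sum(ward_counts.values())  # 439 houses in the sample

# Percent of the sample in each ward, rounded to one decimal place as in the table.
ward_percent = {ward: round(100 * count / total, 1) for ward, count in ward_counts.items()}

for ward, pct in ward_percent.items():
    print(f"{ward}: {pct}%")
```

The rounded percentages add up to 100.1 rather than exactly 100 because each one is rounded separately.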

  1. Create a bar plot of the variable Ward, expressed as percentages, and include it in this document. Add appropriate labels to the chart.

  1. Create a histogram for the variable Price. Create exactly 6 bins and label the axes correctly. Is this histogram symmetric or asymmetric? Explain this result.
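A histogram with a fixed number of equal-width bins splits the range from the smallest to the largest price into equal intervals and counts the houses in each one. The Python sketch below assumes equal-width bins that start at the sample minimum ($1,000) and end at the maximum ($271,000), both taken from the five-number summary; the software used for the assignment may choose bin edges differently. The helper names `equal_width_edges` and `bin_counts` are hypothetical, and the prices passed to `bin_counts` are made up for illustration.

```python
def equal_width_edges(lo, hi, k):
    """Return the k + 1 edges of k equal-width bins covering [lo, hi]."""
    width = (hi - lo) / k
    return [lo + i * width for i in range(k + 1)]


def bin_counts(values, edges):
    """Count values per bin. Bins are [left, right) except the last, which
    also includes its right edge. Values outside [lo, hi] are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            left, right = edges[i], edges[i + 1]
            is_last = i == len(counts) - 1
            if left <= v < right or (is_last and v == right):
                counts[i] += 1
                break
    return counts


# Assumption: the 6 bins span the sample minimum and maximum of Price.
edges = equal_width_edges(1000, 271000, 6)
print(edges)  # 6 bins, each $45,000 wide

# Hypothetical prices, only to show how houses are assigned to bins.
print(bin_counts([1000, 50000, 72000, 271000], edges))
```

Under the same assumption, 20 bins would each be $13,500 wide and 2 bins would each be $135,000 wide, which is the trade-off behind the next two questions.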

  1. Would you advise increasing the number of bins to 20? Explain this result.

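No. Twenty bins would spread roughly 440 houses so thinly that many bins, especially in the right tail, would contain only a few houses. The histogram would look jagged, and random variation would obscure the overall shape of the distribution.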

  1. Would you advise reducing the number of bins to 2? Explain this result.

No. Two bins would lump the data together so coarsely that the shape of the distribution, including any skew, would be hidden.

  1. Calculate the mean and the median for the variable Price. Interpret these values. Compare the mean and the median and relate the relative difference in values to the histogram. Does there appear to be skew from these measures?

| Mean Price ($) | Median Price ($) |
|---:|---:|
| 85,659 | 72,000 |

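About half of the houses in the sample sold for $72,000 or less (the median), while the average sale price was $85,659 (the mean). The mean exceeds the median by about $13,700, which indicates right skew: a relatively small number of high-priced houses pull the mean above the median. This is consistent with the shape of the histogram.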
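One way to put a single number on the mean-median gap is Pearson's median skewness coefficient, 3 × (mean - median) / standard deviation, where a positive value indicates right skew. This measure is not part of the assignment and is included only as an illustration; the sketch uses the rounded mean, median, and standard deviation reported in this document.

```python
mean_price = 85659    # sample mean of Price
median_price = 72000  # sample median of Price
sd_price = 49650      # standard deviation of Price (from the standard deviation question)

# Pearson's second (median) skewness coefficient; a value > 0 suggests right skew.
pearson_skew = 3 * (mean_price - median_price) / sd_price

print(f"Mean - median: {mean_price - median_price}")
print(f"Pearson median skewness: {pearson_skew:.2f}")
```

With these inputs the coefficient is about 0.83, a positive value consistent with right skew.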

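  1. Create a table for the five-number summary and discuss the symmetry of the distribution.

| Statistic | Percentile | Price ($) |
|---|---:|---:|
| Minimum | 0% | 1,000 |
| First quartile (Q1) | 25% | 53,000 |
| Median | 50% | 72,000 |
| Third quartile (Q3) | 75% | 114,700 |
| Maximum | 100% | 271,000 |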

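The distribution is right-skewed. The distance from Q3 to the maximum ($156,300) is about three times the distance from the minimum to Q1 ($52,000), and Q3 lies farther above the median ($42,700) than Q1 lies below it ($19,000).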
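The symmetry check compares the four pieces of the five-number summary: in a roughly symmetric distribution the two tails, and the two halves of the box, would have similar lengths. A short Python sketch of that comparison using the values from the table (the dictionary keys are illustrative):

```python
# Five-number summary of Price from the table.
summary = {"min": 1000, "q1": 53000, "median": 72000, "q3": 114700, "max": 271000}

lower_tail = summary["q1"] - summary["min"]    # minimum to Q1
upper_tail = summary["max"] - summary["q3"]    # Q3 to maximum
lower_box = summary["median"] - summary["q1"]  # Q1 to median
upper_box = summary["q3"] - summary["median"]  # median to Q3

print(f"Tails: lower {lower_tail:,}, upper {upper_tail:,}")
print(f"Box halves: lower {lower_box:,}, upper {upper_box:,}")

# Longer upper pieces on both comparisons point to right skew.
if upper_tail > lower_tail and upper_box > lower_box:
    print("Right-skewed")
else:
    print("No clear right skew")
```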

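  1. What house price would put you in the top 10% of house prices in Jacksonville from the sample?

| Percentile | Price ($) |
|---:|---:|
| 90% | 156,200 |

A house priced above $156,200 (the 90th percentile) would be in the top 10% of house prices in the sample.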
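The raw sale prices are not reproduced in this document, so the sketch below uses a short, made-up list of prices purely to show how a 90th-percentile cutoff can be computed with Python's standard `statistics.quantiles`. The `method="inclusive"` option interpolates between neighboring values and should match Excel's `PERCENTILE.INC`; whether it matches the software that produced $156,200 is an assumption.

```python
import statistics

# Hypothetical sale prices for illustration only; these are NOT the Jacksonville data.
prices = [40000, 50000, 60000, 70000, 80000, 90000, 100000, 120000, 150000, 250000]

# n=10 returns the 9 cut points that split the data into deciles;
# the last cut point is the 90th percentile.
deciles = statistics.quantiles(prices, n=10, method="inclusive")
p90 = deciles[-1]

print(f"A price above {p90:,.0f} is in the top 10% of this sample.")
```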
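  1. Create a pivot table in Excel that groups houses by ward and calculate the average house price for each ward. Compare the average house prices across wards.

| Ward | Average Price ($) |
|---|---:|
| Ward1 | 44,798 |
| Ward2 | 52,388 |
| Ward3 | 96,537 |
| Ward4 | 83,098 |
| Ward5 | 144,098 |

Average prices differ substantially across wards. Ward 5 has the highest average price ($144,098), more than three times that of Ward 1 ($44,798), which has the lowest. Wards 3 ($96,537) and 4 ($83,098) fall in the middle, and Ward 2 ($52,388) is close to Ward 1.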
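A pivot table of averages is a group-by-and-average operation. The Python sketch below shows one way to do the same thing with plain dictionaries (the helper `average_by_group` and its small record list are hypothetical), and then uses the ward counts and ward averages reported in this document to check that the count-weighted average of the ward means recovers the overall mean price.

```python
from collections import defaultdict


def average_by_group(records):
    """Group (ward, price) pairs by ward and return the mean price of each ward."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ward, price in records:
        totals[ward] += price
        counts[ward] += 1
    return {ward: totals[ward] / counts[ward] for ward in totals}


# Hypothetical records, only to show the grouping step.
demo = [("Ward1", 40000), ("Ward1", 50000), ("Ward5", 150000)]
print(average_by_group(demo))

# Counts and ward averages from the tables in this document.
ward_counts = {"Ward1": 42, "Ward2": 106, "Ward3": 28, "Ward4": 171, "Ward5": 92}
ward_avg = {"Ward1": 44798, "Ward2": 52388, "Ward3": 96537, "Ward4": 83098, "Ward5": 144098}

# The overall mean equals the count-weighted average of the ward means.
weighted = sum(ward_counts[w] * ward_avg[w] for w in ward_counts) / sum(ward_counts.values())
print(round(weighted))
```

The weighted average comes to about $85,659.35, which rounds to the reported mean of $85,659, so the ward table and the overall mean are consistent.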
  1. Calculate the standard deviation for house price and the coefficient of variation. Comment on the level of variation in house prices.

| Standard Deviation ($) | Coefficient of Variation |
|---:|---:|
| 49,650 | 0.58 |

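House prices vary considerably. The standard deviation ($49,650) is about 58% of the mean price ($85,659), which indicates a high level of variation relative to the average price.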
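The coefficient of variation is the standard deviation divided by the mean, which makes it a unit-free measure of spread. A short Python sketch using the rounded values reported above:

```python
mean_price = 85659  # sample mean of Price
sd_price = 49650    # standard deviation of Price

# Coefficient of variation: standard deviation relative to the mean.
cv = sd_price / mean_price

print(f"CV = {cv:.2f} ({cv:.0%} of the mean)")
```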

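  1. Calculate the z-score for a house priced at $130,000.

| Price ($) | z-score |
|---:|---:|
| 130,000 | 0.893 |

A house priced at $130,000 is about 0.89 standard deviations above the mean.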
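  1. Calculate the z-score for a house priced at $70,000.

| Price ($) | z-score |
|---:|---:|
| 70,000 | -0.315 |

A house priced at $70,000 is about 0.32 standard deviations below the mean.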
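A z-score measures how many standard deviations a value lies above or below the mean: z = (x - mean) / standard deviation. The Python sketch below reproduces both z-scores from the rounded mean and standard deviation; the `z_score` helper is a hypothetical name.

```python
mean_price = 85659  # sample mean of Price
sd_price = 49650    # standard deviation of Price


def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


for price in (130000, 70000):
    print(f"z({price:,}) = {z_score(price, mean_price, sd_price):.3f}")
```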
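  1. Using the z-score method, what value of house would be considered an outlier above the mean?

| Upper Outlier Cutoff, z-score Method ($) |
|---:|
| 234,610 |

Using the z-score method, a house priced above about $234,610 (the mean plus three standard deviations, z = 3) would be considered a high outlier.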
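The reported cutoff appears to be the mean plus three standard deviations (z = 3), a common rule of thumb for flagging outliers. The Python sketch below uses the rounded values from this document; it gives $234,609, and the $1 difference from $234,610 presumably comes from rounding the mean and standard deviation.

```python
mean_price = 85659  # sample mean of Price (rounded)
sd_price = 49650    # standard deviation of Price (rounded)
z_cutoff = 3        # rule of thumb: z > 3 flags a high outlier

upper_z = mean_price + z_cutoff * sd_price

print(f"Upper z-score cutoff: {upper_z:,}")
print("Maximum price (271,000) is an outlier:", 271000 > upper_z)
```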
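  1. Using the quartile method, what value of house would be considered a high outlier?

| Upper Outlier Cutoff, Quartile Method ($) |
|---:|
| 207,250 |

Using the quartile method, the upper fence is Q3 + 1.5 × IQR = 114,700 + 1.5 × (114,700 - 53,000) = 207,250, so a house priced above $207,250 would be considered a high outlier.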
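The quartile method (Tukey's fences) flags values above Q3 + 1.5 × IQR, where IQR = Q3 - Q1. A Python sketch using the quartiles from the five-number summary:

```python
q1 = 53000   # first quartile of Price
q3 = 114700  # third quartile of Price

iqr = q3 - q1                 # interquartile range
upper_fence = q3 + 1.5 * iqr  # Tukey's upper fence

print(f"IQR = {iqr:,}; upper fence = {upper_fence:,.0f}")
print("Maximum price (271,000) is an outlier:", 271000 > upper_fence)
```

The largest sale price in the sample ($271,000) is above both this fence and the $234,610 z-score cutoff, so it would be flagged as an outlier under either method. The quartile fence is lower because it is built from Q3 and the IQR rather than from the mean and standard deviation.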