Name
- The data set “Price” contains data on more than 400 housing transactions
in Jacksonville. The city is divided into five distinct wards that
describe the location of each house. Price measures the sale price of
the house, Sqft measures the size of the house in square feet, and
Quality rates the condition of the house.
- Describe the entities on which the data are collected for this
study. Does this represent a population or a sample of houses in
Jacksonville?
- The elements for this data set are the individual houses.
- This is a sample of houses in Jacksonville, not the full population.
- Characterize each variable in the data set as categorical
(qualitative) or quantitative. If the variable is categorical determine
if it is nominal or ordinal. If the variable is quantitative then
determine if it is discrete or continuous.
Ward - Categorical Nominal
Price - Quantitative Continuous
Sqft - Quantitative Continuous
Quality - Categorical Ordinal
- Create a percent frequency table for the variable Wards and include
it in this document. What percentage of the observations are from ward
3? Is each ward represented equally? Explain your answer.
| Ward  | Ward_Count | Percent |
|-------|------------|---------|
| Ward1 | 42         | 9.6     |
| Ward2 | 106        | 24.1    |
| Ward3 | 28         | 6.4     |
| Ward4 | 171        | 39.0    |
| Ward5 | 92         | 21.0    |
Ward 3 accounts for 6.4% of the houses in the sample. The wards are not
represented equally: equal representation would put each ward at 20%,
but Ward 4 has the largest share (39.0%) and Ward 3 has the smallest
(6.4%).
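As a check on the table, the percentages can be recomputed from the counts. This is a minimal Python sketch that assumes only the ward counts reported above; the variable names are illustrative.

```python
# Ward counts copied from the frequency table above.
ward_counts = {"Ward1": 42, "Ward2": 106, "Ward3": 28, "Ward4": 171, "Ward5": 92}

total = sum(ward_counts.values())  # number of houses in the sample

# Percent frequency = 100 * count / total, rounded to one decimal place.
ward_percent = {ward: round(100 * n / total, 1) for ward, n in ward_counts.items()}

for ward, pct in ward_percent.items():
    print(f"{ward}: {ward_counts[ward]:>3} houses, {pct:.1f}%")

# Equal representation would give each of the five wards the same share.
equal_share = 100 / len(ward_counts)
print(f"Equal share per ward: {equal_share:.1f}%")
```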
- Create a bar plot for the variable Wards and express as a percent
and include in this document. Add appropriate labels to the chart.

- Create a histogram for the variable Price. Create exactly 6 bins and
label the axes correctly. Is this histogram symmetric or asymmetric?
Explain this result.

- Would you advise increasing the number of bins to 20? Explain this
result.
No. With 20 bins the data would be spread too thinly, and the extra
detail would make the overall shape of the distribution harder to see.
- Would you advise reducing the number of bins to 2? Explain this
result.
No. With only 2 bins the data would be lumped together, hiding the
shape of the distribution.
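To make the trade-off concrete, the sketch below computes the equal-width bin size for 2, 6, and 20 bins. It assumes the bins span the minimum (1,000) and maximum (271,000) prices reported in the five-number summary; a spreadsheet or plotting tool may choose different edges.

```python
# Minimum and maximum sale price as reported in the five-number summary.
price_min, price_max = 1_000, 271_000

def equal_width_edges(low, high, bins):
    """Return the bins + 1 edges that split [low, high] into equal-width bins."""
    width = (high - low) / bins
    return [low + i * width for i in range(bins + 1)]

for bins in (2, 6, 20):
    edges = equal_width_edges(price_min, price_max, bins)
    print(f"{bins:>2} bins -> width {edges[1] - edges[0]:,.0f}")
```

Under these assumptions each of the 6 bins covers a 45,000 range, compared with 135,000 for 2 bins and 13,500 for 20 bins.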
- Calculate the mean and the median for the variable Price. Interpret
these values. Compare the mean and the median and relate the relative
difference in values to the histogram. Does there appear to be skew from
these measures?
| Average | Median |
|---------|--------|
| 85659   | 72000  |
The mean ($85,659) is well above the median ($72,000), which indicates
right skew: a relatively small number of expensive houses pull the mean
upward while the median is unaffected. This is consistent with the
histogram as well.
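The size of the gap between the two measures can be quantified directly. This short sketch uses only the mean and median reported above; expressing the gap relative to the median is one illustrative choice, not part of the assignment.

```python
# Mean and median sale price as reported above.
mean_price = 85_659
median_price = 72_000

gap = mean_price - median_price      # positive gap suggests right skew
relative_gap = gap / median_price    # gap as a fraction of the median

print(f"Mean exceeds median by {gap:,} ({relative_gap:.1%} of the median)")
```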
- Create a table for the five-number summary and discuss the symmetry
of the distribution.
| Percentile | Price  |
|------------|--------|
| 0%         | 1000   |
| 25%        | 53000  |
| 50%        | 72000  |
| 75%        | 114700 |
| 100%       | 271000 |
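There is an indication of right skew, particularly in the distance
between the maximum and the third quartile (271,000 - 114,700 =
156,300), which is far larger than the distance between the first
quartile and the minimum (53,000 - 1,000 = 52,000).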
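The symmetry comparison can be sketched by measuring the distances between adjacent values of the five-number summary. The values are those reported in the table; the variable names are illustrative.

```python
# Five-number summary of Price as reported in the table.
p_min, q1, median, q3, p_max = 1_000, 53_000, 72_000, 114_700, 271_000

lower_tail = q1 - p_min      # minimum to first quartile
lower_box = median - q1      # first quartile to median
upper_box = q3 - median      # median to third quartile
upper_tail = p_max - q3      # third quartile to maximum

print(f"Lower tail: {lower_tail:,}  Upper tail: {upper_tail:,}")
print(f"Lower box:  {lower_box:,}  Upper box:  {upper_box:,}")

# In a symmetric distribution the lower and upper distances would be similar.
# Here both upper distances are larger, which points to right skew.
right_skew_hint = upper_tail > lower_tail and upper_box > lower_box
```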
- What house price would put you in the top 10% of house prices in
Jacksonville from the sample?
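The top 10% cutoff is the 90th percentile of Price, which requires the raw data and cannot be derived from the summaries in this document. The sketch below shows the calculation on a made-up toy list (not the Jacksonville sample); the "inclusive" interpolation method used here should correspond to a spreadsheet's inclusive percentile function, though that correspondence is an assumption.

```python
import statistics

# Made-up toy prices for illustration only; NOT the Jacksonville sample.
toy_prices = [40_000, 50_000, 55_000, 60_000, 70_000, 72_000,
              80_000, 95_000, 110_000, 150_000, 200_000]

def percentile_90(prices):
    """Return the 90th percentile using linear interpolation between data points."""
    # n=10 yields the nine deciles; the last one is the 90th percentile.
    # method="inclusive" treats the data as spanning its own min and max.
    return statistics.quantiles(prices, n=10, method="inclusive")[-1]

cutoff = percentile_90(toy_prices)
print(f"Toy 90th percentile: {cutoff:,.0f}")
```

A house priced above the cutoff would fall in the top 10% of the data supplied to the function.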
- Create a pivot table in excel that groups houses by wards and
calculate the average house price for each ward. Compare the average
house prices across wards.
| Ward  | Avg_Price |
|-------|-----------|
| Ward1 | 44798     |
| Ward2 | 52388     |
| Ward3 | 96537     |
| Ward4 | 83098     |
| Ward5 | 144098    |
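One way to sanity-check the ward averages is to weight them by the ward counts from the frequency table; the result should reproduce the overall mean of about 85,659. This sketch assumes both tables describe the same set of houses.

```python
# Ward counts (frequency table) and ward average prices (pivot table).
ward_counts = {"Ward1": 42, "Ward2": 106, "Ward3": 28, "Ward4": 171, "Ward5": 92}
ward_avg_price = {"Ward1": 44_798, "Ward2": 52_388, "Ward3": 96_537,
                  "Ward4": 83_098, "Ward5": 144_098}

# Weighted mean: sum of (count * ward average) divided by the total count.
total_houses = sum(ward_counts.values())
overall_mean = sum(ward_counts[w] * ward_avg_price[w] for w in ward_counts) / total_houses
print(f"Overall mean implied by ward averages: {overall_mean:,.0f}")

# Compare the most and least expensive wards.
highest = max(ward_avg_price, key=ward_avg_price.get)
lowest = min(ward_avg_price, key=ward_avg_price.get)
ratio = ward_avg_price[highest] / ward_avg_price[lowest]
print(f"{highest} averages {ratio:.1f} times the price of {lowest}")
```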
- Calculate the standard deviation for house price and the coefficient
of variation. Comment on the level of variation in house prices.
| St_Dev | Coeff_Variation |
|--------|-----------------|
| 49650  | 0.58            |
House prices vary substantially: the standard deviation is about 58%
of the mean price.
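The coefficient of variation is the standard deviation divided by the mean. A minimal check using the values reported in this document:

```python
# Standard deviation and mean of Price as reported above.
st_dev = 49_650
mean_price = 85_659

# Coefficient of variation: spread expressed as a fraction of the mean.
coeff_variation = st_dev / mean_price
print(f"Coefficient of variation: {coeff_variation:.2f}")
```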
- Calculate the z-score for a house priced at $130,000.
- Calculate the z-score for a house priced at $70,000
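Both z-scores follow from z = (x - mean) / standard deviation. The sketch below assumes the mean (85,659) and standard deviation (49,650) reported earlier in this document; the function name is illustrative.

```python
# Mean and standard deviation of Price as reported earlier.
MEAN_PRICE = 85_659
ST_DEV = 49_650

def z_score(price, mean=MEAN_PRICE, st_dev=ST_DEV):
    """Return how many standard deviations a price lies from the mean."""
    return (price - mean) / st_dev

for price in (130_000, 70_000):
    print(f"${price:,}: z = {z_score(price):.2f}")
```

A positive z-score means the price is above the mean and a negative one means it is below.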
- Using the z-score method, what value of house would be considered an
outlier above the mean?
- Using the quartile method, what value of house would be considered
an outlier above the mean?
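Both upper cutoffs can be sketched from the summary statistics reported earlier. Two conventions are assumed here and may differ from the course's: a z-score above 3 marks an outlier, and the quartile method uses the upper fence Q3 + 1.5 × IQR.

```python
# Summary statistics of Price as reported earlier in this document.
mean_price, st_dev = 85_659, 49_650
q1, q3 = 53_000, 114_700

# z-score method (assumed convention: z > 3 is an outlier).
Z_LIMIT = 3
z_cutoff = mean_price + Z_LIMIT * st_dev

# Quartile method (assumed convention: above Q3 + 1.5 * IQR is an outlier).
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr

print(f"z-score cutoff:  {z_cutoff:,.0f}")
print(f"Quartile cutoff: {upper_fence:,.0f}")
```

Under these assumptions, the reported maximum price of 271,000 lies above both cutoffs.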