- Use the data file “mpg” to answer the following questions. The data
contains a sample of cars manufactured from 1999 to 2008.
- What are the elements for this data set and how many observations
does it include?
- The elements of this data set are cars.
- There are 234 observations.
- Does this data represent a sample or a population?
- This is a sample, not a population. It includes only 234 cars
manufactured between 1999 and 2008, not every car produced in that period.
- Characterize each variable as categorical or quantitative. If the
variable is categorical, determine if it is nominal or ordinal. If the
variable is quantitative specify if it is discrete or continuous.
- year - quantitative discrete
- manufacturer - categorical nominal
- model - categorical nominal
- cty - quantitative continuous
- hwy - quantitative continuous
- class - categorical nominal
- Construct a percentage frequency table and bar plot for the variable
“Class”. Are the classes represented equally? Which class is represented
the most? Label the bar plot correctly with a main title and axis
titles.
SUV is the most represented class, with 26% of the observations. The
classes are not represented equally. With seven categories, an even split
would put about 14% (100/7 ≈ 14.3%) of the observations in each class, but
the actual shares range from 2% (2seater) to 26% (suv), with the other
classes falling in between.
| Class | Count | Percent (%) |
|---|---|---|
| suv | 62 | 26 |
| compact | 47 | 20 |
| midsize | 41 | 18 |
| subcompact | 35 | 15 |
| pickup | 33 | 14 |
| minivan | 11 | 5 |
| 2seater | 5 | 2 |
| Total | 234 | 100 |
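As a cross-check of the table, the following minimal Python sketch recomputes the percentages from the counts above and the equal-share benchmark of 100/7. It is not necessarily how the original table was produced. `percent_table` is a hypothetical helper name, and percentages are rounded to whole numbers to match the table.

```python
# Counts per class, copied from the frequency table above (n = 234).
class_counts = {
    "suv": 62,
    "compact": 47,
    "midsize": 41,
    "subcompact": 35,
    "pickup": 33,
    "minivan": 11,
    "2seater": 5,
}

def percent_table(counts):
    """Return (category, count, percent) rows sorted by count, largest first,
    plus the total count. Percentages are rounded to whole numbers."""
    total = sum(counts.values())
    rows = [
        (name, n, round(100 * n / total))
        for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    ]
    return rows, total

rows, total = percent_table(class_counts)
for name, n, pct in rows:
    print(f"{name:<11} {n:>4} {pct:>4}")
# The full sample is 100% of the observations by definition.
print(f"{'Total':<11} {total:>4} {100:>4}")

# Share each class would have if all seven classes were represented equally.
equal_share = 100 / len(class_counts)
print(f"Equal share: {equal_share:.1f}%")
```

Because the rounded shares happen to sum to exactly 100 here, the recomputed column matches the table line for line. With other counts, rounding can leave the column summing to 99 or 101.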

- Construct a histogram for the variable HWY MPG. Group the data as
you see fit so that the histogram provides a good representation for the
distribution. Include a title and axis labels in the chart. Discuss the
shape of the distribution relating to symmetry and skew.
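The question leaves the grouping to the analyst. To show what choosing a bin width means, and how skew can be put into numbers, here is a minimal standard-library Python sketch. `bin_counts` and `skewness` are hypothetical helper names. The `toy` list is made up for illustration and is **not** the hwy column, so its output says nothing about the actual highway MPG distribution. A positive moment skewness indicates a longer right tail (right skew), and a value near zero indicates rough symmetry.

```python
import math
from collections import Counter

def bin_counts(values, width, start):
    """Group values into half-open bins [start + k*width, start + (k+1)*width)
    and return (left edge, right edge, count) for every bin in range."""
    counts = Counter(math.floor((v - start) / width) for v in values)
    lo, hi = min(counts), max(counts)
    # Include empty bins so gaps in the distribution stay visible.
    return [(start + k * width, start + (k + 1) * width, counts.get(k, 0))
            for k in range(lo, hi + 1)]

def skewness(values):
    """Moment coefficient of skewness: > 0 means a longer right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical MPG-like values for illustration only (NOT the hwy data).
toy = [12, 15, 16, 17, 17, 18, 19, 20, 22, 26, 29, 35]
for left, right, n in bin_counts(toy, width=5, start=10):
    print(f"[{left}, {right}): {'#' * n}")
print(f"skewness = {skewness(toy):.2f}")
```

Re-running `bin_counts` with a smaller or larger `width` shows the trade-off behind the question. Very narrow bins make the shape noisy, and very wide bins hide it.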

- The following three histograms describe the distribution of City
MPG. Which of the three do you believe provides the best representation
of the distribution? Explain your reasoning.
Figure B provides the best overall representation of the
distribution. Figure C would also be acceptable. Figure A
does not provide enough information.
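The three figures are not reproduced here, so their bin widths cannot be checked directly. As a rough reference point for comparing histograms of the same 234 observations, two common rules of thumb for the number of bins are Sturges' rule, ⌈log₂ n⌉ + 1, and the square-root rule, ⌈√n⌉. The following sketch applies both. These are heuristics, not the criterion used in the original answer, and the function names are hypothetical.

```python
import math

def sturges_bins(n):
    """Sturges' rule: ceil(log2(n)) + 1 bins."""
    return math.ceil(math.log2(n)) + 1

def sqrt_bins(n):
    """Square-root rule: ceil(sqrt(n)) bins."""
    return math.ceil(math.sqrt(n))

n = 234  # number of observations in the mpg data
print("Sturges:", sturges_bins(n))    # 9
print("Square root:", sqrt_bins(n))   # 16
```

For n = 234 these suggest somewhere around 9 to 16 bins. A histogram with far fewer bins than that tends to hide the shape of the distribution, and one with far more tends to look ragged.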
