Objectives

  1. Construct stem-and-leaf plots to summarize the distribution of a single numerical variable
  2. Utilize a legend to identify the scaling of a distribution
  3. Define a distribution as the values a variable takes and the frequency of those values
  4. Understand conceptually, and define precisely, what an outlier is
  5. Recognize that frequency tables are a way of sorting variable observations into “bins” in order to determine distributions
  6. Understand that a histogram is the visual representation of a frequency table
  7. Construct frequency tables and histograms given a data set
  8. Analyze the density of data within a given interval using frequency tables and histograms
  9. Utilize histograms to construct a cumulative frequency diagram in order to analyze the cumulative distribution of a variable
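
Objectives 5 through 9 can be previewed with a minimal Python sketch. It sorts observations into bins, prints a text histogram, and accumulates the counts into a cumulative frequency column. The function name frequency_table, the bin settings, and the ages are assumptions made up for illustration; they are not values from the class spreadsheet.

```python
from itertools import accumulate

def frequency_table(values, start, width, n_bins):
    """Count how many values fall in each bin [lo, lo + width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:  # values outside the bins are ignored
            counts[index] += 1
    bins = [(start + i * width, start + (i + 1) * width) for i in range(n_bins)]
    return bins, counts

# Hypothetical ages, invented for illustration (not from the BIRTHS tab).
ages = [19, 22, 23, 25, 25, 27, 28, 31, 34, 41]

bins, counts = frequency_table(ages, start=15, width=5, n_bins=6)
cumulative = list(accumulate(counts))  # running total of the bin counts

# One row per bin: the bar of '#' characters is a sideways histogram.
for (lo, hi), count, running in zip(bins, counts, cumulative):
    print(f"[{lo}, {hi})  {'#' * count:<5} count={count}  cumulative={running}")
```

The last cumulative value equals the number of observations that landed in a bin, which is a quick check that nothing was dropped.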

Key vocabulary

  1. A distribution refers to the values a variable takes and the frequency of those values
  2. An outlier is an observation that appears extreme relative to the rest of the data
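
Both vocabulary terms can be illustrated with a short Python sketch. The distribution is stored as a mapping from each value to its frequency. For outliers, the definition above is visual ("appears extreme"), so the sketch swaps in the 1.5 x IQR rule, one common convention that this handout does not itself prescribe. The MPG values are hypothetical and not taken from the CARS tab.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical city MPG values, invented for illustration.
mpg = [21, 22, 22, 23, 25, 25, 25, 27, 28, 46]

# A distribution: each value the variable takes, paired with its frequency.
distribution = Counter(mpg)
for value in sorted(distribution):
    print(value, distribution[value])

# Assumption: flag outliers with the 1.5 x IQR rule (not defined in this handout).
q1, _median, q3 = quantiles(mpg, n=4)
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [v for v in mpg if v < low_fence or v > high_fence]
print("Flagged as possible outliers:", outliers)
```

A flagged value is only a candidate; whether it is a data-entry error or a genuinely unusual observation still has to be judged from context.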

Case studies

Case study 1: Using the births data set part I

On our spreadsheet for today, go to the tab titled “BIRTHS”. In 2004, the state of North Carolina released to the public a large data set containing information on births recorded in the state. This data set has been of interest to medical researchers who study the relationship between the habits and practices of expectant mothers and the birth of their children. The tab contains a random sample of 1,000 cases from this data set.

  1. Use Google Sheets to create a histogram of the variable mage, which stands for “mother’s age”. Place this histogram in a separate tab and name it “HIST - mAGE”.
  2. Look at the histogram you just created, which is a visual summary of the distribution of mothers’ ages in the data set. At what ages do most mothers seem to have children?
  3. Again using the histogram of mother’s age, do you notice any outliers in the distribution? What are they? What do you think is going on with those observations?
  4. If you were to describe the shape of the distribution, how would you describe it?
  5. Using the first 10 observations of mage, create a stem-and-leaf plot depicting the distribution of mage.
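
As a way to check a hand-drawn stem-and-leaf plot, here is a minimal Python sketch. The function stem_and_leaf and its leaf_unit parameter are hypothetical names introduced for this example, the sketch assumes non-negative values, and the ages are invented rather than copied from the first 10 rows of mage. The final line it produces is the legend that tells the reader how to scale each stem and leaf.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Return the lines of a stem-and-leaf plot, ending with a legend.

    Each value is rounded to a whole number of leaf units. The last digit
    is the leaf and the remaining digits are the stem.
    Assumes non-negative values.
    """
    scaled = sorted(round(v / leaf_unit) for v in values)
    leaves = defaultdict(list)
    for s in scaled:
        stem, leaf = divmod(s, 10)
        leaves[stem].append(leaf)

    lines = []
    # Include empty stems so gaps in the distribution stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {digits}")

    # Legend: shows how one stem and leaf combine into a data value.
    stem, leaf = divmod(scaled[0], 10)
    lines.append(f"Key: {stem} | {leaf} means {scaled[0] * leaf_unit:g}")
    return lines

# Hypothetical ages, invented for illustration.
ages = [19, 22, 23, 25, 25, 27, 28, 31, 34, 41]
lines = stem_and_leaf(ages)
print("\n".join(lines))
```

For a variable recorded with decimals, such as a birth weight, calling the function with leaf_unit=0.1 makes the tenths digit the leaf, and the legend changes to match.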

Case study 2: Using the births data set part II

  1. Use Google Sheets to create a histogram of the variable weight, which records the weight of a baby at birth. Place this histogram in a separate tab and name the tab “HIST - WEIGHT”.
  2. Look at the histogram you just created, which is a visual summary of the distribution of baby weights in the data set. At what weight do most babies seem to be born?
  3. Again using the histogram of baby weights, do you notice any outliers in the distribution? What are they? What do you think is going on with those observations?
  4. If you were to describe the shape of the distribution, how would you describe it?
  5. Using the first 10 observations of weight, create a stem-and-leaf plot depicting the distribution of weight.

Case study 3: Using the cars data set part I

On our spreadsheet for today, go to the tab titled “CARS”. This is a data matrix with 54 rows and 6 columns. The columns represent the variables type, price, mpgCity, driveTrain, passengers, and weight for a sample of 54 cars from 1993.

  1. Use Google Sheets to create a histogram of the variable mpgCity, which stands for “Miles per gallon, city”. Place this histogram in a separate tab and name it “HIST - MPG”.
  2. Look at the histogram you just created, which is a visual summary of the distribution of the cars’ MPG in the data set. What MPG do most cars in the distribution seem to get?
  3. Again using the histogram of car MPG, do you notice any outliers in the distribution? What are they? What do you think is going on with those observations?
  4. If you were to describe the shape of the distribution, how would you describe it?
  5. Using the first 10 observations of mpgCity, create a stem-and-leaf plot depicting the distribution of mpgCity.

Case study 4: Using the cars data set part II

  1. Use Google Sheets to create a histogram of the variable weight, which stands for “Weight of car (lbs)”. Place this histogram in a separate tab and name it “HIST - carWEIGHT”.
  2. Look at the histogram you just created, which is a visual summary of the distribution of the cars’ weights in the data set. What weight do most cars in the distribution seem to have?
  3. Again using the histogram of car weights, do you notice any outliers in the distribution? What are they? What do you think is going on with those observations?
  4. If you were to describe the shape of the distribution, how would you describe it?
  5. Using the first 10 observations of weight, create a stem-and-leaf plot depicting the distribution of weight.