beer.data <- read.csv("http://homepages.gac.edu/~anienow2/MCS_142/Data/beer.csv", header=TRUE, sep=",")

Homework Problem

1.62 Alcohol content of beer. Brewing beer involves a variety of steps that can affect the alcohol content. A Web site gives the percent alcohol for 86 domestic brands of beer.

  1. Use graphical and numerical summaries of your choice to describe these data. Give reasons for your choice.
hist(beer.data$PercentAlcohol, main="Histogram of Percent Alcohol in 86 Domestic Beers", xlab="Percent Alcohol")

The mean is 4.7593023, the standard deviation is 0.7523106, and the median is 4.7. The histogram shows the shape of the distribution of percent alcohol across the 86 domestic beers, that is, how many beers fall in each range of values. The mean and median are nearly equal, so both describe the center well.
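The numerical summaries quoted above can be reproduced with base R. This is a minimal sketch, assuming the data frame `beer.data` loaded at the top and its `PercentAlcohol` column; the helper name `describe_center_spread` and the toy vector are illustrative and not part of the assignment.

```r
# Hypothetical helper: center and spread of a numeric vector.
describe_center_spread <- function(x) {
  c(mean = mean(x), sd = sd(x), median = median(x))
}

# Toy values for illustration only (not the beer data).
toy <- c(4.2, 4.7, 5.0, 5.5, 0.4)
describe_center_spread(toy)

# With the homework data this would be:
# describe_center_spread(beer.data$PercentAlcohol)
```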

boxplot(beer.data$PercentAlcohol, horizontal=TRUE, xlab="Percent Alcohol", main="Boxplot of Percent Alcohol in 86 Domestic Beers")

R's summary() output lists the minimum, 1st quartile, median, mean, 3rd quartile, and maximum, respectively: 0.4, 4.325, 4.7, 4.759, 5, 6.5. The five-number summary is this list without the mean. The boxplot makes the outlier clearly visible.
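The difference between `summary()` output and a five-number summary can be checked directly. The sketch below uses toy values, not the beer data. Note that `fivenum()` reports Tukey's hinges, which can differ slightly from the quartiles that `summary()` reports.

```r
toy <- c(0.4, 4.2, 4.5, 4.7, 4.9, 5.0, 6.5)  # illustrative values only

six  <- summary(toy)   # Min, 1st Qu., Median, Mean, 3rd Qu., Max
five <- fivenum(toy)   # min, lower hinge, median, upper hinge, max

length(six)   # six values, because the mean is included
length(five)  # five values

# With the homework data this would be:
# summary(beer.data$PercentAlcohol)
# fivenum(beer.data$PercentAlcohol)
```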

  2. The data set contains an outlier. Explain why this particular beer is unusual and how its outlier status is related to how it is marketed.

Observation 57 is an outlier. I know this because which.min(beer.data$PercentAlcohol) returns the position of the smallest value in the dataset. Observation 57 is an outlier because its percent alcohol, 0.4, is much lower than that of the other observations. An alcohol content this low is consistent with a beer marketed as non-alcoholic.
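Locating the smallest value does not by itself show that it is an outlier, so a common check is the 1.5 × IQR rule. The sketch below uses toy values; the helper name `flag_outliers` is hypothetical. It uses `quantile()`, whereas `boxplot()` uses hinges, so the two can disagree slightly on borderline points.

```r
# Hypothetical helper: positions of values outside the 1.5 * IQR fences.
flag_outliers <- function(x) {
  q   <- unname(quantile(x, c(0.25, 0.75)))
  iqr <- q[2] - q[1]
  which(x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr)
}

toy <- c(4.2, 4.5, 0.4, 4.7, 4.9, 5.0, 5.3)  # illustrative values only
which.min(toy)      # position of the smallest value
flag_outliers(toy)  # positions flagged by the 1.5 * IQR rule

# With the homework data this would be:
# flag_outliers(beer.data$PercentAlcohol)
```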

1.63 An outlier for alcohol content of beer. Refer to the previous exercise.

  1. Calculate the mean with and without the outlier. Do the same for the median. Explain how these statistics change when the outlier is excluded.

The mean with the outlier is 4.759302; without it, the mean is 4.810588. The median is 4.7 both with and without the outlier. Removing the outlier raises the mean by about 0.05 but leaves the median unchanged, because the mean is pulled toward extreme values while the median is resistant to them. Without the outlier, the mean better reflects the typical beer in the dataset.
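The with-and-without comparison can be done by dropping the outlier's position with negative indexing. This is a sketch on toy values, not the beer data; with the homework data the index would be 57, as found by `which.min()`.

```r
toy <- c(4.2, 4.5, 0.4, 4.7, 4.9, 5.0, 5.3)  # illustrative values only
i   <- which.min(toy)                        # position of the low outlier

# Negative indexing drops element i without modifying the original vector.
c(mean_with = mean(toy), mean_without = mean(toy[-i]))
c(median_with = median(toy), median_without = median(toy[-i]))

# With the homework data this would be:
# mean(beer.data$PercentAlcohol[-57]); median(beer.data$PercentAlcohol[-57])
```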

  2. Calculate the standard deviation with and without the outlier. Do the same for the quartiles. Explain how these statistics change when the outlier is excluded.

With the outlier, the standard deviation is 0.7523106; without it, the standard deviation is 0.5863575. Removing the outlier therefore lowers the standard deviation, which means the remaining observations lie closer to their mean on average. The standard deviation, like the mean, is sensitive to a single extreme value.
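The question also asks about the quartiles, which can be compared the same way as the standard deviation. This is a sketch on toy values, not the beer data; the variable name `trimmed` is illustrative.

```r
toy     <- c(4.2, 4.5, 0.4, 4.7, 4.9, 5.0, 5.3)  # illustrative values only
trimmed <- toy[-which.min(toy)]                  # same values minus the low outlier

c(sd_with = sd(toy), sd_without = sd(trimmed))

# First and third quartiles, with and without the outlier.
quantile(toy,     c(0.25, 0.75))
quantile(trimmed, c(0.25, 0.75))

# With the homework data this would be:
# quantile(beer.data$PercentAlcohol[-57], c(0.25, 0.75))
```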

  3. Write a short paragraph summarizing what you have learned in this exercise.

This exercise taught me how to use RStudio to display data and calculate statistics from it. It also taught me how to identify outliers and how to exclude them when calculating statistics. From that, I learned how strongly a single outlier can affect the mean and standard deviation, while the median is barely affected. With the outlier removed, the statistics better represent the bulk of the data.

1.64 Calories in beer. Refer to the previous two exercises. The data set also gives the calories per 12 ounces of beverage.

  1. Analyze the data and summarize the distribution of calories for these 86 brands of beer.
hist(beer.data$Calories, main="Histogram of Calories in 86 Domestic Beers", xlab="Calories")

The mean is 141.0581395 and the standard deviation is 27.7913887. A standard deviation this large relative to the mean indicates that calorie counts vary widely across brands, so the mean alone does not describe the distribution well.

  2. In the previous exercise you identified one brand of beer as an outlier. To what extent is this brand an outlier in the distribution of calories? Explain your answer.
boxplot(beer.data$Calories, main="Boxplot of Calories in 86 Domestic Beers", horizontal=TRUE, xlab="Calories")

As seen in the boxplot above, the brand that was an outlier for the percent alcohol data is not an outlier for the calories data. According to the boxplot, there are no outliers depicted.
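The visual reading of the boxplot can be confirmed numerically with `boxplot.stats()`, which returns the points that `boxplot()` would draw beyond the whiskers. This is a sketch on toy values, not the calorie data.

```r
toy_calories <- c(70, 95, 110, 140, 145, 150, 160)  # illustrative values only

stats <- boxplot.stats(toy_calories)
stats$out          # values drawn as outliers; an empty vector means none
length(stats$out)  # number of outliers

# With the homework data this would be:
# boxplot.stats(beer.data$Calories)$out
# beer.data$Calories[57]   # calories for the low-alcohol beer
```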

  3. The distribution of calories suggests that there may be two groups of beers which might be marketed differently. Examine the data file carefully and explain the characteristics of the two groups.

These two groups appear to be the higher-calorie beers and the lower-calorie beers, that is, regular beers and low-calorie (light) beers. Calories and carbohydrates are related: the higher-calorie beers tend to have more carbohydrates than the lower-calorie beers. The connection between calories and percent alcohol is less noticeable.
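The strength of these connections could be quantified with correlations and group means. The sketch below uses a toy data frame; the column name `Carbohydrates` and the cutoff of 120 calories are assumptions for illustration and may not match the actual file.

```r
# Toy data for illustration only (not the beer data).
toy <- data.frame(
  Calories       = c(95, 100, 110, 140, 150, 160),
  Carbohydrates  = c(3, 4, 6, 11, 13, 14),
  PercentAlcohol = c(4.2, 4.1, 4.3, 4.6, 5.0, 4.9)
)

# Correlation of calories with each of the other two variables.
cor(toy$Calories, toy$Carbohydrates)
cor(toy$Calories, toy$PercentAlcohol)

# Split at a hypothetical cutoff and compare the two groups.
toy$Group <- ifelse(toy$Calories < 120, "light", "regular")
group_means <- aggregate(cbind(Calories, Carbohydrates) ~ Group,
                         data = toy, FUN = mean)
group_means
```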