*Submit your homework to Canvas by the due date and time. Email your lecturer if you have extenuating circumstances and need to request an extension.

*If an exercise asks you to use R, include a copy of the code and output. Please edit your code and output to be only the relevant portions.

*If a problem does not specify how to compute the answer, you may use any appropriate method. I may ask you to use R or manual calculations on your exams, so practice accordingly.

*You must include an explanation and/or intermediate calculations for an exercise to be complete.

*Be sure to submit the Homework 1 Autograde Quiz, which will give you ~20 of your 40 accuracy points.

*50 points total: 40 points accuracy, and 10 points completion

Basics of Statistics and Summarizing Data Graphically (I)

Exercise 1. A number of individuals are interested in the proportion of citizens within a county who will vote to use tax money to upgrade a professional football stadium in the upcoming vote. Consider the following methods:

The Football Team Owner surveyed 10,000 people attending one of the football games held in the stadium. Seventy three percent (73%) of respondents said they supported the use of tax money to upgrade the stadium.

The Pollster generated 1,000 random numbers between 1-40,768 (number of county voters in last election) and surveyed the 1,000 citizens who corresponded to those numbers on the voting roll. Forty seven percent (47%) of respondents said they supported the use of tax money to upgrade the stadium.

  1. What is the population of interest? What is the parameter of interest? Will this parameter ever be calculated?

The population of interest is all citizens in the county who will vote in the upcoming vote. The parameter of interest is the proportion of those voters who will vote to use tax money to upgrade the football stadium. Neither survey calculates this parameter, because neither accounts for every voter: the football team owner's method only includes people who attend football games, and the pollster's method uses simple random sampling to survey only a portion of the county's voters. The parameter will be known only once the votes in the actual election are counted.

  1. What were the sample sizes used and statistics calculated from those samples? Are these simple random samples from the population of interest?

For the football team owner's method, the sample size is 10,000 and the statistic is 73% in support of the upgrade. For the pollster's method, the sample size is 1,000 and the statistic is 47% in support. The owner's sample is not a simple random sample from the population of interest, because only people attending a football game could be chosen. The pollster's sample is smaller, but it is a simple random sample from the voting roll of the last election, which closely approximates the population of interest, because every voter on that roll had the same chance of being selected.
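As a rough illustration of the pollster's method, the sketch below draws 1,000 positions from a roll of 40,768 voters. The exercise does not say how the random numbers were generated; drawing without replacement and the variable names here are assumptions made for the example.

```r
set.seed(2024)                       # arbitrary seed, only for reproducibility
n_roll   <- 40768                    # voters on the roll (from the exercise)
n_sample <- 1000                     # pollster's sample size

# ASSUMPTION: numbers are drawn without replacement, so no voter is picked twice
roll_ids <- sample.int(n_roll, size = n_sample, replace = FALSE)

head(sort(roll_ids))                 # first few selected roll positions
anyDuplicated(roll_ids)              # 0 means every selected position is unique
```

Because every position on the roll is equally likely to be drawn, every voter on the roll has the same chance of being surveyed.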

  1. The football team owner claims that the survey done at the football stadium will better predict the voting outcome because the sample size was much larger. What is your response?

This claim is incorrect. Although the sample is larger, it does not represent all the citizens of the county: the respondents all attend football games, so they are more likely to favor upgrades to the stadium, and citizens who do not attend games were not accounted for. A larger sample does not correct this selection bias.
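The point that sample size does not fix a biased sample can be sketched with a small simulation. Everything about the simulated county below is hypothetical: the number of game attendees (12,000) and the support rates for attendees (0.73) and non-attendees (0.36) are made-up values chosen only to illustrate the idea, not facts from the exercise.

```r
set.seed(1)                          # fixed seed so the sketch is reproducible
n_voters <- 40768                    # roll size from the exercise

# ASSUMPTION: 12,000 voters attend games; attendees support the upgrade
# with probability 0.73, everyone else with probability 0.36
attends  <- seq_len(n_voters) <= 12000
supports <- rbinom(n_voters, size = 1, prob = ifelse(attends, 0.73, 0.36))
true_p   <- mean(supports)           # the parameter in this made-up county

# Large sample drawn only from attendees (like the stadium survey)
big_biased <- mean(supports[sample(which(attends), 10000)])
# Small simple random sample from the whole roll (like the pollster)
small_srs  <- mean(supports[sample.int(n_voters, 1000)])

c(true = true_p, stadium = big_biased, srs = small_srs)
```

Under these assumptions, the stadium estimate stays near the attendees' support rate no matter how many attendees are surveyed, while the smaller random sample lands much closer to the true proportion.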

Exercise 2. After manufacture, computer disks are tested for errors. The table below tabulates the number of errors detected on each of the 100 disks produced in a day.

| Number of Defects | Number of Disks |
|---|---|
| 0 | 41 |
| 1 | 31 |
| 2 | 15 |
| 3 | 8 |
| 4 | 5 |

  1. Describe the type of data that is being recorded about the sample of 100 disks, being as specific as possible.

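The data are numerical (quantitative) and discrete: each observation is a count of defects on a disk, which can only take whole-number values.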

  1. Code for a frequency histogram showing the frequency for number of errors on the 100 disks is given below.
  1. Knit the document and confirm that the histogram displays in the knitted file.
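error.data <- c(rep(0, 41), rep(1, 31), rep(2, 15), rep(3, 8), rep(4, 5))  # one value per disk
hist(error.data, breaks = seq(from = -0.5, to = 4.5, by = 1),  # bin edges at -0.5, 0.5, ..., 4.5
     xlab = "Defects", main = "Number of Defects",
     labels = TRUE, ylim = c(0, 50))  # labels = TRUE prints the count above each bar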

  1. Describe what the rep() function does in this code chunk.

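The rep() function repeats a value a given number of times: rep(x, y) returns x repeated y times. Here, rep(0,41) produces 41 zeros, rep(1,31) produces 31 ones, and so on, so that error.data contains one value for each of the 100 disks. This rebuilds the raw data that hist() needs from the frequency table.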
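A short sketch of how rep() behaves on this data set. The names counts and error.data.alt are introduced here only for illustration; the vectorized call with the times argument is an equivalent alternative, not the form used in the assignment.

```r
rep(2, 3)                            # the value 2 repeated 3 times: 2 2 2

counts <- c(41, 31, 15, 8, 5)        # disks with 0, 1, 2, 3, 4 defects
error.data <- c(rep(0, 41), rep(1, 31), rep(2, 15), rep(3, 8), rep(4, 5))

# Equivalent vectorized form: the times argument pairs each value with its count
error.data.alt <- rep(0:4, times = counts)

length(error.data)                   # 100, one entry per disk
all(error.data == error.data.alt)    # TRUE, both forms give the same values
table(error.data)                    # recovers the frequency table
```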

  1. Describe how the breaks command affects the histogram’s appearance in the above code.

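Here, breaks is a vector of bin boundaries generated by seq(): -0.5, 0.5, 1.5, 2.5, 3.5, 4.5. These boundaries create five bins of width 1, each centered on a whole number of defects, so every bar corresponds to exactly one defect count. Using fewer breaks would give fewer, wider bins; using more breaks would give more, narrower bins.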
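To see the effect of breaks without drawing a plot, the sketch below compares the assignment's unit-width boundaries with a hypothetical wider set (wide_breaks, chosen here purely for contrast) and inspects the bin counts that hist() computes.

```r
error.data <- rep(0:4, times = c(41, 31, 15, 8, 5))

unit_breaks <- seq(from = -0.5, to = 4.5, by = 1)   # -0.5 0.5 1.5 2.5 3.5 4.5
wide_breaks <- seq(from = -0.5, to = 5.5, by = 2)   # -0.5 1.5 3.5 5.5 (hypothetical)

# plot = FALSE returns the binning without drawing anything
h_unit <- hist(error.data, breaks = unit_breaks, plot = FALSE)
h_wide <- hist(error.data, breaks = wide_breaks, plot = FALSE)

h_unit$counts   # 41 31 15 8 5: one bin per defect count
h_wide$counts   # 72 23 5: defect counts 0-1, 2-3, and 4 merged into wider bins
```

With the wider boundaries, the separate counts for 0 and 1 defects can no longer be read off the plot, which is why unit-width bins centered on the integers suit this data.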

  1. Describe how setting ylim = c(0,30) instead of ylim = c(0,50) would change the histogram’s appearance. Which value for ylim is preferable for clear communication of the data?

Setting ylim = c(0,30) changes the y-axis range from (0,50) to (0,30). The counts for 0 defects (41 disks) and 1 defect (31 disks) both exceed the new upper limit of 30, so the tallest bars and their count labels run past the top of the axis and not all of the data are shown clearly. ylim = c(0,50) is preferable because it shows every bar and label in full and keeps the histogram clear.
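The comparison between the two limits comes down to simple arithmetic on the bar heights, sketched below. The 20% headroom rule at the end is an illustrative assumption for choosing a limit, not something specified in the exercise.

```r
counts <- c("0" = 41, "1" = 31, "2" = 15, "3" = 8, "4" = 5)  # disks per defect count

counts[counts > 30]      # bars taller than an upper limit of 30: 0 and 1 defects
counts[counts > 50]      # empty: no bar exceeds an upper limit of 50

# ASSUMPTION (illustrative rule, not from the exercise): leave about 20%
# headroom above the tallest bar so the count labels also fit
suggested_ylim <- c(0, ceiling(1.2 * max(counts)))
suggested_ylim           # 0 50
```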