Stat 371 Homework #1 Due Wednesday Sept 14th 11:59 pm

*Submit your homework to Canvas by the due date and time. Email your lecturer if you have extenuating circumstances and need to request an extension.

*If an exercise asks you to use R, include a copy of the code and output. Please edit your code and output so that they include only the relevant portions.

*If a problem does not specify how to compute the answer, you may use any appropriate method. I may ask you to use R or to do manual calculations on your exams, so practice accordingly.

*You must include an explanation and/or intermediate calculations for an exercise to be complete.

*Be sure to submit the HWK1 Autograde Quiz, which will give you ~20 of your 40 accuracy points.

Basics of Statistics and Summarizing Data Graphically (I)

Exercise 1. A number of individuals are interested in the proportion of citizens within a county who will vote to use tax money to upgrade a professional football stadium in the upcoming vote. Consider the following methods:

The Football Team Owner surveyed 10,000 people attending one of the football games held in the stadium. Seventy-three percent (73%) of respondents said they supported the use of tax money to upgrade the stadium.

The Pollster generated 1,000 random numbers between 1 and 40,768 (the number of county voters in the last election) and surveyed the 1,000 citizens who corresponded to those numbers on the voting roll. Forty-seven percent (47%) of respondents said they supported the use of tax money to upgrade the stadium.
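As a rough sketch of the Pollster's method, R's sample() can draw 1,000 distinct positions from a roll of 40,768 voters. The seed value is an arbitrary assumption added only so the draw is reproducible; the exercise does not say what software the Pollster actually used.

```r
# Hypothetical sketch of the Pollster's draw; the seed is arbitrary.
set.seed(14)
n_roll <- 40768                          # voters on the roll (from the exercise)
picked <- sample(n_roll, size = 1000)    # 1,000 positions drawn without replacement

length(picked)          # 1000
anyDuplicated(picked)   # 0: no voter is picked twice
range(picked)           # both ends lie between 1 and 40768
```

Sampling without replacement (the default in sample()) matters here: if the 1,000 numbers were generated independently, the same voter could be selected twice and the duplicate would need to be replaced.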

What is the population of interest? What is the parameter of interest? Will this parameter ever be calculated?

The population of interest is the citizens of the county who will vote in the upcoming vote on using tax money to upgrade the stadium. The parameter of interest is the proportion of those voters who will vote to use tax money for the upgrade. This parameter will be calculated: once the votes are counted after the election, the proportion of "yes" votes is the exact value of the parameter.

What were the sample sizes used and statistics calculated from those samples? Are these simple random samples from the population of interest?

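The sample sizes were 10,000 people for the Football Team Owner's survey and 1,000 people for the Pollster's survey. The statistics calculated from these samples were the sample proportions of respondents who supported using tax money to upgrade the stadium: 73% (0.73) for the Football Team Owner and 47% (0.47) for the Pollster. The Pollster's survey is a simple random sample, since the voters were chosen at random from the voting roll, although the roll lists voters from the last election, who may differ slightly from the people who will vote in the upcoming one. The Football Team Owner's survey is not a simple random sample: it includes only people who attended a game, and some of them may not even be county voters.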

The football team owner claims that the survey done at the football stadium will better predict the voting outcome because the sample size was much larger. What is your response?

It is true that the football team owner's sample was much larger. However, the owner surveyed only people attending a football game, so the sample was not randomly selected from the county's voters. People who attend games already enjoy and support football, so they are likely to favor upgrading the stadium more than voters in general, and some attendees may not be county voters at all. A larger sample reduces random variation, but it does not correct this selection bias, so the owner's survey will not better predict the voting outcome. The Pollster's smaller simple random sample is more likely to reflect the county's voters.
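The argument above can be illustrated with a small simulation. This is a hypothetical sketch: the share of voters who attend games and the support rates of fans and non-fans are made-up assumptions chosen only to show the pattern, not values from the exercise. Because the owner's respondents come only from fans, the owner's estimate settles near the fans' support rate rather than the county's.

```r
# Hypothetical simulation: a large biased sample vs. a small random sample.
# Every population value below is an assumption for illustration, not data.
set.seed(371)

n_voters <- 40768                          # voters on the roll (from the exercise)
is_fan   <- rbinom(n_voters, 1, 0.30)      # assumed: 30% of voters attend games
# assumed support rates: 75% among fans, 35% among everyone else
supports <- rbinom(n_voters, 1, ifelse(is_fan == 1, 0.75, 0.35))

true_p <- mean(supports)                   # the parameter (knowable only in a simulation)

# Pollster: simple random sample of 1,000 voters, without replacement
p_srs <- mean(supports[sample(n_voters, 1000)])

# Owner: 10,000 respondents drawn only from fans (a convenience sample)
fan_ids <- which(is_fan == 1)
p_owner <- mean(supports[sample(fan_ids, 10000)])

round(c(parameter = true_p, pollster = p_srs, owner = p_owner), 3)
```

Increasing the owner's sample size would only make p_owner a more precise estimate of the fans' support rate, not of the county's.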

Exercise 2: After manufacture, computer disks are tested for errors. The table below tabulates the number of errors detected on each of the 100 disks produced in a day.

Number of Defects	Number of Disks
0	41
1	31
2	15
3	8
4	5

Describe the type of data that is being recorded about the sample of 100 disks, being as specific as possible.

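The variable recorded for each disk is its number of defects (errors). This is quantitative data, because each value is a numerical count. More specifically, it is discrete: a count can take only whole-number values (0, 1, 2, ...), and in this sample the observed values are 0 through 4.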
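Because the number of defects is discrete and only five values were observed, the frequency table summarizes these data completely. A minimal R sketch (the variable names are illustrative, not from the assignment) that checks the total and adds relative frequencies:

```r
# Frequency table from the exercise: defect counts and number of disks
defects  <- 0:4
disks    <- c(41, 31, 15, 8, 5)
sum(disks)                          # 100 disks in total
rel_freq <- disks / sum(disks)      # proportion of disks with each defect count
data.frame(defects, disks, rel_freq)
```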

Code for a frequency histogram showing the frequency for number of errors on the 100 disks is given below.

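# Rebuild the raw data from the table: one value per disk (100 values)
error.data <- c(rep(0, 41), rep(1, 31), rep(2, 15), rep(3, 8), rep(4, 5))
# Bins from -0.5 to 4.5 in steps of 1: one unit-wide bar per defect count
hist(error.data, breaks = seq(from = -0.5, to = 4.5, by = 1),
     xlab = "Defects", main = "Number of Defects",
     labels = TRUE, ylim = c(0, 60))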

bi. Knit the document and confirm that the histogram displays in the knitted file.

error.data <- c(rep(0, 41), rep(1, 31), rep(2, 15), rep(3, 8), rep(4, 5))
hist(error.data, breaks = seq(from = -0.5, to = 4.5, by = 1),
     xlab = "Defects", main = "Number of Defects",
     labels = TRUE, ylim = c(0, 50))

bii. Describe what the rep() function does in this code chunk.

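The rep() function repeats a value a given number of times: rep(x, n) returns a vector containing x repeated n times. For example, rep(0, 41) produces 41 zeros, one for each disk with 0 defects, and rep(1, 31) produces 31 ones. Combining these pieces with c() rebuilds the raw data set (a vector of 100 values, one number of defects per disk) from the frequency table. hist() bins raw values rather than reading counts, so this step is what allows it to draw the frequency histogram.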
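A quick check of what rep() produces, as a minimal sketch using the same data as the exercise:

```r
# rep(x, times) repeats the value x, `times` times
rep(2, 3)            # 2 2 2

error.data <- c(rep(0, 41), rep(1, 31), rep(2, 15), rep(3, 8), rep(4, 5))
length(error.data)   # 100: one entry per disk
table(error.data)    # recovers the counts 41, 31, 15, 8, 5
```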

biii. Describe how this breaks command affects the histogram’s appearance in this code chunk.

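The breaks argument sets the boundaries of the histogram's bins. The seq() call generates the break points -0.5, 0.5, 1.5, 2.5, 3.5, and 4.5, so the histogram has five bars, each one unit wide and centered on one of the possible defect counts 0, 1, 2, 3, and 4. As a result, each bar counts exactly one value, and the x-axis runs from -0.5 to 4.5, matching where the shaded bars begin and end. Without this argument, hist() would choose its own break points, which need not line up with the integer values.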
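To see the bins these breaks create without drawing anything, hist() can be called with plot = FALSE, which returns the break points, bar midpoints, and counts. A minimal sketch:

```r
error.data <- c(rep(0, 41), rep(1, 31), rep(2, 15), rep(3, 8), rep(4, 5))
brks <- seq(from = -0.5, to = 4.5, by = 1)
brks                 # -0.5 0.5 1.5 2.5 3.5 4.5

# plot = FALSE returns the binning without drawing the histogram
h <- hist(error.data, breaks = brks, plot = FALSE)
h$mids               # 0 1 2 3 4: each bar is centered on one defect count
h$counts             # 41 31 15 8 5
```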

biv. Describe how setting ylim=c(0,30) instead of ylim=c(0,50) would change the histogram’s appearance. Which value for ylim is preferable for clear communication of the data?

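The ylim argument sets the limits of the y-axis. Setting ylim=c(0,30) instead of ylim=c(0,50) would make the y-axis stop at about 30. The tallest bar (41 disks with 0 defects) would be cut off at the top of the plot, and the bar for 1 defect (31 disks) would reach the top edge, so their true heights and count labels would no longer be shown clearly. The upper limit must be at least 41, the height of the tallest bar, and ideally a little higher to leave room for the labels. Therefore ylim=c(0,50) is preferable for clear communication of the data.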
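The smallest usable upper limit can be read off the data. The 20% headroom below is one possible choice (an assumption, not a course rule) that happens to give 50 here:

```r
error.data <- c(rep(0, 41), rep(1, 31), rep(2, 15), rep(3, 8), rep(4, 5))
counts <- table(error.data)
max(counts)                    # 41: height of the tallest bar
names(counts)[counts > 30]     # "0" "1": counts that exceed an upper limit of 30

# assumed rule of thumb: ~20% headroom so the labels = TRUE counts are not clipped
upper <- ceiling(max(counts) * 1.2)   # 50
hist(error.data, breaks = seq(from = -0.5, to = 4.5, by = 1),
     xlab = "Defects", main = "Number of Defects",
     labels = TRUE, ylim = c(0, upper))
```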


GWENETH CHILDS


Instructors to contact regarding grading questions for HWK1: