Rachel Walsh

1. If you wanted to estimate the mean height of all the students at UW Madison, which one of the following sampling strategies would be best? Why? (Note that none of the methods are true simple random samples.)
a. Measure the heights of 100 students found in the gym during basketball intramurals.
b. Measure the heights of the engineering majors.
c. Measure the heights of the students selected by choosing the first name on each page of a list of students enrolled that semester.

C. It draws from the entire list of enrolled students, so it gives the most diverse sample, and the way students are selected is not tied to an outside variable that could be related to height (playing basketball in a, choice of major in b).
Is it possible to determine the direction in which (increase/decrease) the mean changes? Or how much the mean changes? If so, by how much does it change? If not, why not?

Yes, the mean decreases if one number in the data set is decreased. With 12 values and a mean of 24, changing 11.9 to 1.19 changes the mean as follows, where x is the sum of the other 11 values: 24 = (11.9 + x)/12
24*12
## [1] 288
288-11.9
## [1] 276.1
(1.19+276.1)/12
## [1] 23.1075
24-23.1075
## [1] 0.8925
The mean decreases by 0.8925, from 24 to 23.1075.
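Is it possible to determine the direction in which the median changes? Or how much the median changes? If so, by how much does it change? If not, why not?

Not fully. Lowering one value can never raise the median, so it either stays the same or decreases, but you would need to know the other numbers to determine which of these happens and by how much.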
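The median answer can be illustrated with a minimal sketch. The two data sets below are hypothetical (they are not from the assignment) and were chosen only so that each contains the value 11.9.

```r
# Hypothetical data sets (assumed for illustration), each containing 11.9
a <- c(1, 5, 11.9, 20, 30)      # 11.9 is the middle value
b <- c(11.9, 20, 25, 30, 40)    # 11.9 is the smallest value

# Replace 11.9 with 1.19 in each set
a_new <- replace(a, a == 11.9, 1.19)
b_new <- replace(b, b == 11.9, 1.19)

median(a); median(a_new)   # the median drops from 11.9 to 5
median(b); median(b_new)   # the median stays at 25
```

In both cases the same value is lowered by the same amount, yet the effect on the median depends entirely on where the other values sit.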
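Is it possible to predict the direction in which the standard deviation changes? If so, does it get larger or smaller? If not, why not? Describe why it is difficult to predict by how much the standard deviation will change in this case.

Yes for the direction: the changed value (11.9) was already below the mean of 24 and moves even farther from it (to 1.19), so the standard deviation gets larger. It is difficult to predict by how much because the standard deviation depends on the deviations of all 12 values from the mean, and we do not know the other 11 numbers, so we cannot compute the deviations.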
Males<-c(54.2, 39.6, 52.3, 48.4, 35.9, 30.4, 25.2, 45.4, 48.9, 48.9, 45.8, 44.0, 52.5, 48.3, 59.9, 51.7, 38.6, 39.1, 49.9, 38.3)
Females<-c(30.3, 43.0, 25.7, 26.7, 27.3, 31.9, 53.7, 32.9, 19.4, 23.7, 23.3, 23.3, 37.8, 39.5, 33.5, 30.4, 28.5)
sort(Males)
## [1] 25.2 30.4 35.9 38.3 38.6 39.1 39.6 44.0 45.4 45.8 48.3 48.4 48.9 48.9 49.9
## [16] 51.7 52.3 52.5 54.2 59.9
sort(Females)
## [1] 19.4 23.3 23.3 23.7 25.7 26.7 27.3 28.5 30.3 30.4 31.9 32.9 33.5 37.8 39.5
## [16] 43.0 53.7
hist(Males, xlim = c(25,60))
hist(Females, xlim = c(18,55), ylim = c(0,6))
b. Compare the shape of the throws from the male and female students observed in this sample.

The female histogram is shifted further left (lower velocities) than the male histogram and is based on fewer observations (17 versus 20). It is right skewed, with most values between 20 and 35 and a tail toward higher velocities. The male histogram sits further right and is left skewed, with most values between 45 and 55 and a tail toward lower velocities.

c. Compute the mean and median throw velocities observed separately for the male and female students using R. Compare both measures of center across the two groups.
mean(Males)
## [1] 44.865
median(Males)
## [1] 47.05
mean(Females)
## [1] 31.22941
median(Females)
## [1] 30.3
Males have a higher center on both measures: the mean is 44.865 versus 31.229, and the median is 47.05 versus 30.3.

d. Compute and compare the standard deviation in throw velocities observed in the male and female students.
sd(Males)
## [1] 8.513845
sd(Females)
## [1] 8.519666
The standard deviations are nearly identical (8.514 for males and 8.520 for females), with the females' being slightly larger.

e. Use R to create side-by-side boxplots of the two sets so they are easily comparable.
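genders <- c("Males", "Females")
# names = genders labels each box so the two groups can be told apart
boxplot(Males, Females, names = genders, horizontal = TRUE, main = "Males and Females", xlab = "Throw velocity")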
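The numbers behind the female boxplot can be inspected directly. The sketch below assumes the default `boxplot()` rule, which flags points more than 1.5 times the hinge spread beyond the hinges; the variable names `five`, `iqr`, and `upper_fence` are introduced here for illustration.

```r
Females <- c(30.3, 43.0, 25.7, 26.7, 27.3, 31.9, 53.7, 32.9, 19.4,
             23.7, 23.3, 23.3, 37.8, 39.5, 33.5, 30.4, 28.5)

five <- fivenum(Females)             # min, lower hinge, median, upper hinge, max
iqr <- five[4] - five[2]             # spread between the hinges
upper_fence <- five[4] + 1.5 * iqr   # values above this are drawn as points

upper_fence
Females[Females > upper_fence]        # values plotted individually
max(Females[Females <= upper_fence])  # where the upper whisker ends
boxplot.stats(Females)$out            # R's own list of outlying values
```

`boxplot.stats()` is the function `boxplot()` uses internally, so its `out` component should agree with the manual fence calculation.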
f. Explain why the highest value in the Female Velocity boxplot is shown as a point. That is, explain what calculations determined that 53.7 was an outlying value. Also specify to what value the upper female whisker extends.

For the females, Q1 = 25.7 and Q3 = 33.5, so IQR = 33.5 - 25.7 = 7.8. The upper fence is Q3 + 1.5(IQR) = 33.5 + 11.7 = 45.2. Since 53.7 is greater than 45.2, it is classified as an outlier and drawn as a point. The upper whisker extends to 43.0, the largest female value that does not exceed the fence.

g. What would be the mean and median throw velocity if we combined the throw velocities into one large data set? Show how only one of the mean or median can be calculated from your female and male summary measures in part (c).
Throw <- c(Males, Females)
mean(Throw)
## [1] 38.6
median(Throw)
## [1] 38.6
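The combined mean can also be recovered from the group summaries in part (c) by weighting each group mean by its sample size. This is a minimal sketch using the rounded means printed above; the variable names are introduced here for illustration.

```r
n_m <- 20; mean_m <- 44.865      # males: sample size and mean from part (c)
n_f <- 17; mean_f <- 31.22941    # females: sample size and mean from part (c)

# Each group total is n * mean, so the combined mean is the total over all 37 throws
combined_mean <- (n_m * mean_m + n_f * mean_f) / (n_m + n_f)
combined_mean
```

The same cannot be done for the median: the group medians (47.05 and 30.3) do not determine where the middle of the 37 ordered values falls, so the combined median has to be computed from the full data.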
sd(Throw)
## [1] 10.86166
The standard deviation of the combined data (10.86) is larger than that of either group (about 8.5) because the combined set covers a wider spread of values. This makes sense because the males generally threw at higher velocities than the females.

4. After manufacture, computer disks are tested for errors. The table below tabulates the number of errors detected on each of the 100 disks produced in a day.

Number of Defects    0    1    2    3    4
Number of Disks     41   31   15    8    5

a. Describe the type of data that is being recorded about the sample of 100 disks, being as specific as possible.

The data are numbers, making them quantitative, and they are discrete because the number of defects on a disk is a count that can only take whole-number values. The data are not continuous because a disk cannot have a value in between, such as 1.5 defects.

b. A frequency histogram showing the frequency for number of errors on the 100 disks is given below. Write the R code to produce this frequency histogram by creating bins at [-.5, .5), [.5, 1.5), etc. Be sure to create useful labels.
# Expand the frequency table into 100 individual observations
Defects <- rep(0:4, times = c(41, 31, 15, 8, 5))
# right = FALSE makes the bins left-closed: [-.5, .5), [.5, 1.5), etc.
hist(Defects, breaks = seq(-0.5, 4.5, by = 1), right = FALSE, main = "Defects per Disk (100 Disks)", xlab = "Number of Defects", ylab = "Number of Disks")
mean(Defects)
## [1] 1.05
median(Defects)
## [1] 1
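The same mean and median can be recovered from the frequency table without expanding it into 100 values. This is a minimal sketch; the names `defects`, `disks`, and `cum` are introduced here for illustration.

```r
defects <- 0:4
disks <- c(41, 31, 15, 8, 5)
n <- sum(disks)                       # 100 disks in total

# Mean: each defect count weighted by how many disks had it
mean_hand <- sum(defects * disks) / n

# Median: with 100 values, average the 50th and 51st ordered values,
# located with the cumulative counts
cum <- cumsum(disks)
val50 <- defects[which(cum >= 50)[1]]
val51 <- defects[which(cum >= 51)[1]]
median_hand <- (val50 + val51) / 2

mean_hand
median_hand
```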
Mean by hand: (0(41) + 1(31) + 2(15) + 3(8) + 4(5))/100 = (31+30+24+20)/100 = 1.05. Median by hand: with 100 values, the median is the average of the 50th and 51st ordered values. The first 41 values are zeros and the next 31 (positions 42 through 72) are ones, so both middle values are 1 and the median is 1. This makes sense as the shape is right skewed: most of the values are at the beginning of the range, and the tail toward higher counts pulls the mean (1.05) slightly above the median (1).

e. Calculate the sample standard deviation “by hand” and using R. Are the values consistent between the two methods?
sd(Defects)
## [1] 1.157976
by hand:
41*(0-1.05)^2+31*(1-1.05)^2+15*(2-1.05)^2+8*(3-1.05)^2+5*(4-1.05)^2
## [1] 132.75
132.75/99
## [1] 1.340909
(1.340909)^(1/2)
## [1] 1.157976
The values are exactly the same.

f. Explain why the histogram is better able to show the discrete nature of the data than a boxplot.

The histogram gives each defect count (0, 1, 2, 3, 4) its own bar, so the distinct values of the discrete quantitative data and their frequencies are easy to see. A boxplot only summarizes the data with five numbers, so it groups all of the data together and does not visually display how many disks had each count.

g. Suppose a customer came to pick up a single computer disk from the 100 produced on that day. What is the probability that disk has at least 1 defect?

(31+15+8+5)/100 = 59/100 = 0.59, or 59%.

h. Suppose a customer came to pick up three computer disks from the 100 produced on that day. What is the probability that at least 1 of the three disks has at least 1 defect?

This is 1 minus the probability that all three disks have no defects: 1 - (41/100)(40/99)(39/98) = 0.9340754
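The answer to part (h) can be checked in R three ways. This sketch assumes the three disks are drawn without replacement and that every set of three disks is equally likely; the variable names are introduced here for illustration.

```r
# Direct product: chance that all three disks come from the 41 defect-free disks
p_none <- (41/100) * (40/99) * (39/98)
p_direct <- 1 - p_none

# Counting argument: ways to choose 3 defect-free disks out of all ways to choose 3
p_choose <- 1 - choose(41, 3) / choose(100, 3)

# Hypergeometric distribution: 59 disks with defects, 41 without, 3 drawn,
# probability of drawing 0 disks with defects
p_hyper <- 1 - dhyper(0, m = 59, n = 41, k = 3)

c(p_direct, p_choose, p_hyper)
```

All three expressions describe the same event, so they should agree up to rounding.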