Homework 1: Due September 3, 2023 by 11:59 pm

#Change the author above to your name. Type your answers as comments. To make a comment, start the line with a number sign. To insert an R chunk, go to the green +C to the left of Run (top right) and click the R. To run the R chunk click the green arrow on the right top of the chunk. You may discuss your ideas with classmates, but please write your own answers and R code. When you are done, Knit to a pdf and upload in the homework box. Each question is worth 4 points for 20 points total. SAVE YOUR R FILES if you are working on the school server!!

Question 1: 1.20 from our book

How Fast Do Homing Pigeons Go? Homing pigeons have an amazing ability to find their way home over extremely long distances. How fast do they go on these trips? In the 2019 Midwest Classic, held in Topeka, Kansas, the fastest bird went 1676 YPM (yards per minute), which is about 56 miles per hour. The top seven finishers included three female pigeons (Hens) and four male pigeons (Cocks). Their speeds, in YPM, are given in Table below.

#a) How many cases are there in this dataset? How many variables are there and what are they? Is each variable categorical or quantitative?

#There are 7 cases (the seven pigeons) and 2 variables: gender (Hen or Cock) and speed in YPM. Gender is categorical and speed is quantitative.

b) Use R to display the information as a dataset with cases as rows and variables as columns. See the smoke example. You may omit row names or just use pigeon1, pigeon2, and so on.

print("Pigeon Speeds")
## [1] "Pigeon Speeds"
# One row per pigeon (case); Gender is categorical and Speed (YPM) is quantitative.
# A data frame is used because a matrix would store the speeds as text.
pigeon <- data.frame(
  Gender = c("Hen","Hen","Hen","Cock","Cock","Cock","Cock"),
  Speed = c(1676,1452,1499,1458,1435,1418,1413),
  row.names = c("pigeon1","pigeon2","pigeon3","pigeon4","pigeon5","pigeon6","pigeon7")
)
pigeon
##         Gender Speed
## pigeon1    Hen  1676
## pigeon2    Hen  1452
## pigeon3    Hen  1499
## pigeon4   Cock  1458
## pigeon5   Cock  1435
## pigeon6   Cock  1418
## pigeon7   Cock  1413
henTimes <- c(1676, 1452, 1499)
cockTimes <- c(1458, 1435, 1418, 1413)
print("Original Table")
## [1] "Original Table"
cat("Hen ", henTimes,"\n")
## Hen  1676 1452 1499
cat("Cock", cockTimes, "\n\n")
## Cock 1458 1435 1418 1413
print("Example of Table Using a Matrix")
## [1] "Example of Table Using a Matrix"
smoke <- matrix(c(51,43,22,92,28,21,68,22,9),ncol=3,byrow=TRUE)
colnames(smoke) <- c("High","Low","Middle")
rownames(smoke) <- c("current","former","never")
smoke <- as.table(smoke)
smoke
##         High Low Middle
## current   51  43     22
## former    92  28     21
## never     68  22      9

Question 2: 1.62 from our book

Pennsylvania High School Seniors describes a dataset, stored in PASeniors, for a sample of students who filled out a survey through the US Census at School project. When downloading the sample we specified Pennsylvania as the state and Grade 12 as the school year, then the website chose a random sample of 457 students from among all students who matched those criteria. We’d like to generalize results from this sample to a larger population. Discuss whether this would be reasonable for each of the groups listed below.

a) The 457 students in the original sample.

Yes, this is reasonable. The results describe these 457 students exactly because they are the sample itself, so no generalization is needed.

b) All Pennsylvania high school seniors who participated in the Census at School survey.

#This is reasonable. The 457 students were randomly selected from all Pennsylvania seniors who participated in the Census at School survey, so the sample should be representative of that group.

c) All Pennsylvania high school seniors.

#This is less reasonable. The sample was drawn only from students who filled out the survey, and those students may differ from Pennsylvania seniors who did not participate.

d) All students in the United States who participated in the Census at School survey.

#This is not reasonable. The sample includes only Pennsylvania seniors, so it does not represent participants in other states or other grades.

Question 3: 1.108 from our book

Infections Can Lower IQ A headline in June 2015 proclaims “Infections can lower IQ.” The headline is based on a study in which scientists gave an IQ test to Danish men at age 19. They also analyzed the hospital records of the men and found that 35% of them had been in a hospital with an infection such as an STI or a urinary tract infection. The average IQ score was lower for the men who had an infection than for the men who hadn’t.

a) What are the cases in this study?

The cases are the Danish men in the study, each of whom took an IQ test at age 19.

b) What is the explanatory variable? Is it categorical or quantitative?

The explanatory variable is whether or not the man had been in a hospital with an infection. It is a categorical variable.

c) What is the response variable? Is it categorical or quantitative?

The response variable is the IQ score. It is a quantitative variable.

d) Does the headline imply causation?

Yes, the headline implies causation: "can lower" says that having an infection causes IQ to be lower.

e) Is the study an experiment or an observational study?

It is an observational study.

f) Is it appropriate to conclude causation in this case?

No, it is not appropriate to conclude causation. Because this is an observational study, confounding variables may explain the association.

print("Question 4")
## [1] "Question 4"

Question 4: 2.28 from our book

Can Dogs Smell Cancer? Scientists are working to train dogs to smell cancer, including early stage cancer that might not be detected with other means. In previous studies, dogs have been able to distinguish the smell of bladder cancer, lung cancer, and breast cancer. Now, it appears that a dog in Japan has been trained to smell bowel cancer. Researchers collected breath and stool samples from patients with bowel cancer as well as from healthy people. The dog was given five samples in each test, one from a patient with cancer and four from healthy volunteers. The dog correctly selected the cancer sample in 33 out of 36 breath tests and in 37 out of 38 stool tests.

a) The cases in this study are the individual tests. What are the variables?

The variables are the type of test (breath or stool) and whether the dog correctly identified the cancer sample (yes or no). Both are categorical.

b) Make a two-way table displaying the results of the study using R. Include the totals. Refer to the smoke example again. Your row names will be breath test, stool tests and total. You figure out the rest.
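One possible way to build this table, modeled on the smoke example, is sketched below. The incorrect counts (3 and 1) are not stated in the problem; they are derived as total minus correct. The name `dogTests` and the column labels are illustrative choices.

```r
# Counts from the problem: 33 of 36 breath tests and 37 of 38 stool tests correct.
# Incorrect counts are derived: 36 - 33 = 3 and 38 - 37 = 1.
dogTests <- matrix(c(33, 3,
                     37, 1), ncol = 2, byrow = TRUE)

# Append a total row (column sums), then a total column (row sums).
dogTests <- rbind(dogTests, colSums(dogTests))
dogTests <- cbind(dogTests, rowSums(dogTests))

colnames(dogTests) <- c("Correct", "Incorrect", "Total")
rownames(dogTests) <- c("breath tests", "stool tests", "total")
dogTests <- as.table(dogTests)
dogTests
```

The totals are computed with `colSums()` and `rowSums()` rather than typed in by hand, so they stay consistent with the individual counts.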

c) What proportion of the breath samples did the dog get correct? What proportion of the stool samples did the dog get correct?

The dog was correct in 33/36 ≈ 0.917 of the breath tests and 37/38 ≈ 0.974 of the stool tests.

d) Of all the tests the dog got correct, what proportion were stool tests?

Of the 70 tests the dog got correct (33 breath + 37 stool), 37 were stool tests, so the proportion is 37/70 ≈ 0.529.
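The proportions asked for in parts c) and d) can be checked with a few lines of R. This is only a sketch using the counts stated in the problem; the variable names are illustrative.

```r
# Correct answers and total tests, by test type (counts from the problem).
correct <- c(breath = 33, stool = 37)
total   <- c(breath = 36, stool = 38)

# Part c): proportion correct within each test type.
propCorrect <- correct / total
round(propCorrect, 3)

# Part d): share of all correct tests that were stool tests.
# The denominator is the number of correct tests, not the number of tests.
propStool <- correct["stool"] / sum(correct)
round(propStool, 3)
```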

print("Question 5")
## [1] "Question 5"

Question 5:

i) 2.76: Insect Weights Consider a dataset giving the adult weight of species of insects. Most species of insects weigh less than 5 grams, but there are a few species that weigh a great deal, including the largest insect known: the rare and endangered Giant Weta from New Zealand, which can weigh as much as 71 grams. Describe the shape of the distribution of weights of insects. Is it symmetric or skewed? If it is skewed, is it skewed to the left or skewed to the right? Which will be larger, the mean or the median?

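The distribution will be skewed to the right: most species weigh less than 5 grams, but a few very heavy species create a long right tail. The mean will be larger than the median because the large values pull the mean upward.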
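To illustrate how a long right tail affects the mean and the median, here is a small sketch. The weights are made-up values, not real insect data, apart from the 71-gram maximum mentioned in the problem.

```r
# Hypothetical weights in grams: mostly small, with one very large value (71).
weights <- c(0.1, 0.3, 0.5, 1, 1.5, 2, 3, 4, 71)

mean(weights)    # pulled upward by the single 71-gram value
median(weights)  # the middle value, not affected by how large the maximum is

# TRUE when the mean exceeds the median, as expected for right skew.
mean(weights) > median(weights)
```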

ii) 2.87: (Make up your own data set. Do not give the same answer as in the book!!)

Create a Dataset Give any set of five numbers satisfying the condition that:

a) The mean of the numbers is substantially less than the median.

(1, 25, 26, 28, 30): the mean is 22 and the median is 26.

b) The mean of the numbers is substantially more than the median.

(10, 15, 20, 25, 100): the mean is 34 and the median is 20.

c) The mean and the median are equal.

(10, 10, 10, 10, 10)
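A quick way to check answers like these is to compute both statistics in R. The sketch below uses the sets from parts b) and c) together with one illustrative set for the condition in part a). The helper name `compareCenter` is a hypothetical choice, not something from the book.

```r
# Returns the mean and median of a numeric vector side by side.
compareCenter <- function(x) c(mean = mean(x), median = median(x))

setA <- c(1, 25, 26, 28, 30)    # illustrative set with the mean below the median
setB <- c(10, 15, 20, 25, 100)  # part b): mean above the median
setC <- c(10, 10, 10, 10, 10)   # part c): mean equal to the median

compareCenter(setA)  # mean 22, median 26
compareCenter(setB)  # mean 34, median 20
compareCenter(setC)  # mean 10, median 10
```

One small value far below the rest pulls the mean under the median, and one large value far above the rest pulls it over.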

print("end")
## [1] "end"