Homework 1: Due September 3, 2023 by 11:59 pm
#Change the author above to your name. Type your answers as comments.
To make a comment, start the line with a number sign. To insert an R
chunk, go to the green +C to the left of Run (top right) and click the
R. To run the R chunk click the green arrow on the right top of the
chunk. You may discuss your ideas with classmates, but please write your
own answers and R code. When you are done, Knit to a pdf and upload in
the homework box. Each question is worth 4 points for 20 points total.
SAVE YOUR R FILES if you are working on the school server!!
Question 1: 1.20 from our book
How Fast Do Homing Pigeons Go? Homing pigeons have an amazing
ability to find their way home over extremely long distances. How fast
do they go on these trips? In the 2019 Midwest Classic, held in Topeka,
Kansas, the fastest bird went 1676 YPM (yards per minute), which is
about 56 miles per hour. The top seven finishers included three female
pigeons (Hens) and four male pigeons (Cocks). Their speeds, in YPM, are
given in the table below.
a) How many cases are there in this dataset? How many variables are
there and what are they? Is each variable categorical or quantitative?
#There are 7 cases and 2 variables: gender and speed (in YPM). Gender is
categorical and speed is quantitative.
b) Use R to display the information as a dataset with cases as rows
and variables as columns. See the smoke example. You may omit row
names or just use pigeon1, pigeon2, ...
print("Pigeon Speeds")
## [1] "Pigeon Speeds"
# Gender is categorical and Speed is quantitative, so use a data frame.
# A numeric matrix with 2 columns would recycle the 7 speeds into both columns.
pigeon <- data.frame(Gender = c("Hen","Hen","Hen","Cock","Cock","Cock","Cock"),
                     Speed = c(1676,1452,1499,1458,1435,1418,1413),
                     row.names = paste0("pigeon", 1:7))
pigeon
##         Gender Speed
## pigeon1    Hen  1676
## pigeon2    Hen  1452
## pigeon3    Hen  1499
## pigeon4   Cock  1458
## pigeon5   Cock  1435
## pigeon6   Cock  1418
## pigeon7   Cock  1413
henTimes <- c(1676, 1452, 1499)
cockTimes <- c(1458, 1435, 1418, 1413)
print("Original Table")
## [1] "Original Table"
cat("Hen ", henTimes,"\n")
## Hen 1676 1452 1499
cat("Cock", cockTimes, "\n\n")
## Cock 1458 1435 1418 1413
print("Example of Table Using a Matrix")
## [1] "Example of Table Using a Matrix"
smoke <- matrix(c(51,43,22,92,28,21,68,22,9),ncol=3,byrow=TRUE)
colnames(smoke) <- c("High","Low","Middle")
rownames(smoke) <- c("current","former","never")
smoke <- as.table(smoke)
smoke
## High Low Middle
## current 51 43 22
## former 92 28 21
## never 68 22 9
Question 2: 1.62 from our book
Pennsylvania High School Seniors describes a dataset, stored in
PASeniors, for a sample of students who filled out a survey through the
US Census at School project. When downloading the sample we specified
Pennsylvania as the state and Grade 12 as the school year, then the
website chose a random sample of 457 students from among all students
who matched those criteria. We’d like to generalize results from this
sample to a larger population. Discuss whether this would be reasonable
for each of the groups listed below.
a) The 457 students in the original sample.
Yes. The results describe these 457 students directly, because they are
the sample the data came from, so no generalization is needed.
b) All Pennsylvania high school seniors who participated in the
Census at School survey.
#Yes, this is reasonable. The website chose the 457 students at random
from all Pennsylvania seniors who took the survey, so the sample should
be representative of that group.
c) All Pennsylvania high school seniors.
#This is questionable. Students who participated in Census at School
may differ from seniors who did not, so the sample may not represent all
Pennsylvania high school seniors.
d) All students in the United States who participated in the Census at
School survey.
#This is not reasonable. The sample contains only Pennsylvania seniors,
so it cannot represent participants from other states or other grades.
Question 3: 1.108 from our book
Infections Can Lower IQ A headline in June 2015 proclaims
“Infections can lower IQ.” The headline is based on a study in which
scientists gave an IQ test to Danish men at age 19. They also analyzed
the hospital records of the men and found that 35% of them had been in a
hospital with an infection such as an STI or a urinary tract infection.
The average IQ score was lower for the men who had an infection than for
the men who hadn’t.
a) What are the cases in this study?
The cases are the Danish men in the study, who took an IQ test at age 19.
b) What is the explanatory variable? Is it categorical or
quantitative?
The explanatory variable is whether the man had been hospitalized with
an infection (such as an STI or a urinary tract infection). It is
categorical.
c) What is the response variable? Is it categorical or
quantitative?
The response variable is IQ score. It is quantitative.
d) Does the headline imply causation?
Yes. The headline implies causation: “can lower” claims that infections
cause IQ to decrease.
e) Is the study an experiment or an observational study?
It is an observational study.
f) Is it appropriate to conclude causation in this case?
No, it is not appropriate to conclude causation. This is an
observational study, so confounding variables could explain the
association.
print("Question 4")
## [1] "Question 4"
Question 4: 2.28 from our book
Can Dogs Smell Cancer? Scientists are working to train dogs to smell
cancer, including early stage cancer that might not be detected with
other means. In previous studies, dogs have been able to distinguish the
smell of bladder cancer, lung cancer, and breast cancer. Now, it appears
that a dog in Japan has been trained to smell bowel cancer. Researchers
collected breath and stool samples from patients with bowel cancer as
well as from healthy people. The dog was given five samples in each
test, one from a patient with cancer and four from healthy volunteers.
The dog correctly selected the cancer sample in 33 out of 36 breath
tests and in 37 out of 38 stool tests.
a) The cases in this study are the individual tests. What are the
variables?
The variables are the type of test (breath or stool) and whether the
dog correctly selected the cancer sample.
b) Make a two-way table displaying the results of the study using R.
Include the totals. Refer to the smoke example again. Your row names
will be breath test, stool tests and total. You figure out the rest.
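One way part b) might be approached, following the smoke example. This is a minimal sketch: the column labels `Correct`, `Incorrect`, and `Total` are assumptions (the assignment only names the rows), and the incorrect counts are derived from the problem statement as 36 - 33 and 38 - 37.

```r
# Counts from the problem statement: 33 of 36 breath tests and
# 37 of 38 stool tests were correct, so 3 and 1 were incorrect.
dogs <- matrix(c(33, 36 - 33,
                 37, 38 - 37),
               ncol = 2, byrow = TRUE)

# Column labels are an assumption; the assignment only specifies the rows.
colnames(dogs) <- c("Correct", "Incorrect")
rownames(dogs) <- c("breath test", "stool tests")

# Append a row of column totals, then a column of row totals.
dogs <- rbind(dogs, total = colSums(dogs))
dogs <- cbind(dogs, Total = rowSums(dogs))

dogs <- as.table(dogs)
dogs
```

Building the totals with `colSums()` and `rowSums()` keeps every step visible; `addmargins()` is a shorter alternative once the basic table exists.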
c) What proportion of the breath samples did the dog get correct?
What proportion of the stool samples did the dog get correct?
The dog was correct in 33/36 (about 0.917) of the breath tests and in
37/38 (about 0.974) of the stool tests.
d) Of all the tests the dog got correct, what proportion were stool
tests?
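The proportions asked for in parts c) and d) can be computed directly from the counts in the problem statement. The sketch below uses illustrative variable names that are not part of the assignment. For part d), the denominator is the number of correct tests (33 + 37 = 70), not the number of stool tests, which gives 37/70, or about 0.529.

```r
# Counts taken from the problem statement.
breathCorrect <- 33
breathTests   <- 36
stoolCorrect  <- 37
stoolTests    <- 38

# c) Proportion correct within each type of test.
propBreath <- breathCorrect / breathTests
propStool  <- stoolCorrect / stoolTests

# d) Of all tests the dog got correct, the proportion that were stool tests.
propStoolAmongCorrect <- stoolCorrect / (breathCorrect + stoolCorrect)

round(c(breath = propBreath,
        stool = propStool,
        stoolAmongCorrect = propStoolAmongCorrect), 3)
```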
i) 2.76: Insect Weights Consider a dataset giving the adult weight
of species of insects. Most species of insects weigh less than 5 grams,
but there are a few species that weigh a great deal, including the
largest insect known: the rare and endangered Giant Weta from New
Zealand, which can weigh as much as 71 grams. Describe the shape of the
distribution of weights of insects. Is it symmetric or skewed? If it is
skewed, is it skewed to the left or skewed to the right? Which will be
larger, the mean or the median?
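The distribution is skewed to the right: most species weigh less than
5 grams, and a few very heavy species form a long right tail. Those
large values pull the mean up, so the mean will be larger than the
median.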
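A small illustration of how a few large values affect the mean and the median. The weights below are hypothetical, made up only to mimic the description (mostly under 5 grams, one value at 71 grams); they are not real insect data.

```r
# Hypothetical weights in grams (made up for illustration, not real data):
# most values are under 5 g, with one extreme value at 71 g.
weights <- c(0.1, 0.2, 0.5, 1, 1.5, 2, 3, 4, 71)

# The single large value pulls the mean up but barely affects the median.
mean(weights)
median(weights)
```

With data shaped like this, the mean lands well above the median, which is the usual signature of a right-skewed distribution.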
ii) 2.87: (Make up your own data set. Do not give the same answer as
in the book!!)
Create a Dataset Give any set of five numbers satisfying the
condition that:
a) The mean of the numbers is substantially less than the
median.
(1, 50, 51, 52, 56), which has mean 42 and median 51
b) The mean of the numbers is substantially more than the
median.
(10, 15, 20, 25, 100)
c) The mean and the median are equal.
(10, 10, 10, 10, 10)
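Answers like these are easy to check in R. The helper `compareCenter` below is a hypothetical name introduced for this sketch. The first set is an illustrative choice with one very small value, so its mean falls below its median; the other two are the sets given for parts b) and c).

```r
# Return the mean and median of a numeric vector side by side.
compareCenter <- function(x) {
  c(mean = mean(x), median = median(x))
}

# Illustrative set with one very small value: mean below median.
lowTail <- compareCenter(c(1, 50, 51, 52, 56))

# Part b: one very large value pulls the mean above the median.
highTail <- compareCenter(c(10, 15, 20, 25, 100))

# Part c: identical values give equal mean and median.
equalSet <- compareCenter(c(10, 10, 10, 10, 10))

rbind(lowTail, highTail, equalSet)
```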
print("end")
## [1] "end"