There are 10 questions, and each question (or part of a question) is worth 7.5 points. When completed, knit the file to .HTML, save it as Test#1_LastName, and submit the .HTML file to the Test #1 assignment link in Canvas.

Due Date: Thursday, April 16, 2020, by 11:59 p.m. EST.

  1. If you have data in case form and you want to use the xtabs() function to construct a crosstabulation of categorical variables, would there be a variable in front of the ‘~’ sign in the xtabs() function? State Yes or No. Explain your response.

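Answer: No, we don’t need a variable in front of the ‘~’ sign. In case form each row is a single observation, so xtabs() simply counts the rows that fall in each cell. Only when the data have already been tabulated in frequency form do we need to put the count variable in front of the ‘~’ sign in the xtabs() function.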
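
A minimal sketch of the difference, using two made-up toy data frames (`cases` and `freqs` are hypothetical and not part of the test data):

```r
# Case form: one row per observation, no count column (hypothetical toy data)
cases <- data.frame(
  sex    = c("F", "F", "M", "M", "M"),
  smoker = c("No", "Yes", "No", "No", "Yes")
)
# Nothing on the left of '~': xtabs() counts the rows in each cell
tab_case <- xtabs(~ sex + smoker, data = cases)

# Frequency form: one row per cell, with the count stored in a column
freqs <- data.frame(
  sex    = c("F", "F", "M", "M"),
  smoker = c("No", "Yes", "No", "Yes"),
  Freq   = c(1, 1, 2, 1)
)
# The count column goes on the left of '~'
tab_freq <- xtabs(Freq ~ sex + smoker, data = freqs)

print(tab_case)
print(tab_freq)
```

Both calls should give the same 2 x 2 table, since the two data frames describe the same five observations.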

  2. This problem uses the DanishWelfare data frame that you used in Homework #1 (#2.4 on p. 61).
    a. The code below uses structable() to create a formatted table from the DanishWelfare data frame in the vcd library. Run the code in the first code chunk and examine the output that is produced. In the second code chunk, modify the code (still using structable()) so that marital status (Status) is on the columns instead of the rows.
library(vcd)
## Warning: package 'vcd' was built under R version 3.6.3
## Loading required package: grid
#run code below
data("DanishWelfare",package="vcd")

#creating a crosstabulation of alcohol consumption (Alcohol), location (Urban) and
#marital status(Status)
structable(Alcohol ~ Urban + Status, DanishWelfare)
##                         Alcohol <1 1-2 >2
## Urban         Status                     
## Copenhagen    Widow              4   4  4
##               Married            4   4  4
##               Unmarried          4   4  4
## SubCopenhagen Widow              4   4  4
##               Married            4   4  4
##               Unmarried          4   4  4
## LargeCity     Widow              4   4  4
##               Married            4   4  4
##               Unmarried          4   4  4
## City          Widow              4   4  4
##               Married            4   4  4
##               Unmarried          4   4  4
## Country       Widow              4   4  4
##               Married            4   4  4
##               Unmarried          4   4  4
#insert your modified code below
structable(Status~Urban+Alcohol, DanishWelfare)
##                       Status Widow Married Unmarried
## Urban         Alcohol                               
## Copenhagen    <1                 4       4         4
##               1-2                4       4         4
##               >2                 4       4         4
## SubCopenhagen <1                 4       4         4
##               1-2                4       4         4
##               >2                 4       4         4
## LargeCity     <1                 4       4         4
##               1-2                4       4         4
##               >2                 4       4         4
## City          <1                 4       4         4
##               1-2                4       4         4
##               >2                 4       4         4
## Country       <1                 4       4         4
##               1-2                4       4         4
##               >2                 4       4         4
    b. View the DanishWelfare data frame. Is the data frame in case form or frequency form?

Answer: It is in frequency form.

    c. Using your answer from part b, if you were to apply xtabs() to the DanishWelfare data frame, what is the name of the variable that would be to the left of the ‘~’ sign in the xtabs() function? If there should be no variable name to the left of the ‘~’ sign, explain why this is the case.

Answer: The count variable, which is named ‘Freq’ in the DanishWelfare data frame, goes to the left of the ‘~’ sign in the xtabs() function.
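
A short sketch of such a call, assuming the vcd package is installed and that the count column is named `Freq` (this can be confirmed with `names(DanishWelfare)`); the choice of Alcohol and Status as table variables is only for illustration:

```r
library(vcd)  # provides the DanishWelfare data frame
data("DanishWelfare", package = "vcd")

# Freq holds the cell counts, so it goes to the left of '~'
tab <- xtabs(Freq ~ Alcohol + Status, data = DanishWelfare)
print(tab)

# The table total should equal the total of the Freq column
sum(tab) == sum(DanishWelfare$Freq)
```

This may also explain why every cell in the structable() output above shows 4: those calls appear to count rows of the data frame rather than sum Freq, and each Urban/Status/Alcohol combination has one row per income level.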

  3. Describe a binomial experiment that would generate binomial data. That is, explain what the experiment is and make a case for how the experiment meets the three criteria for binomial data. Do not use an example from the book or one discussed in class. Specifically, do not use any coin-flipping or dice examples. Come up with your own example! Think about some process/experiment that you deal with on a daily basis (work or personal). Place your response below.

Reminder: Three criteria for a binomial experiment (from our class notes): 1. n independent trials (state n and explain why the trials are independent) 2. only one of two outcomes; “success” and “failure” (specify what is a “success” and what is a “failure”) 3. the probability of “success” stays the same from trial to trial (state p and why the probability stays the same from trial to trial)

Answer: An FBI survey shows that about 80% of all property crimes go unsolved. Suppose I randomly select 10 such crimes in my town; I can then find the probability that exactly 5 of them are unsolved.

In this experiment: (1) There are n = 10 independent trials, one per selected crime. Because the crimes are selected at random, whether one crime is solved should not affect whether another is solved. (2) Each trial has only two outcomes, Yes or No. Yes (success): the crime is unsolved; No (failure): the crime is solved. (3) The probability of ‘Yes’ stays the same from trial to trial: p = 0.80, the proportion of unsolved property crimes in the FBI survey. (Data come from the stat.PSU.edu website.)
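
As a sketch, the probability mentioned in the answer can be computed with dbinom(), taking n = 10 and p = 0.80 from the description above (the variable names are chosen here for illustration):

```r
n <- 10    # number of randomly selected property crimes
p <- 0.80  # probability that a crime goes unsolved ("success")

# P(X = 5): exactly 5 of the 10 crimes are unsolved
prob_5 <- dbinom(5, size = n, prob = p)
prob_5

# Same value from the binomial formula, as a check
choose(n, 5) * p^5 * (1 - p)^5
```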

  4. You are going to be working with a researcher assisting him in designing a survey for 100 subjects. One of the questions is ‘Were you vaccinated for the flu this year?’

Is this a binomial experiment? State Yes or No. If Yes, describe the three criteria that make this experiment Binomial. If No, state why this is not a Binomial experiment.

Answer: Yes, it is a binomial experiment. (a) It has n = 100 independent trials, one per subject. (b) Each trial has two outcomes: Yes or No. (c) The probability p that a person was vaccinated (Yes) can be estimated from the collected data, and p stays the same from trial to trial.

  5. A student answers 10 quiz questions completely at random; the first five are true/false, the second five are multiple choice, with four options each.

Is this a binomial experiment? State Yes or No. If Yes, describe the three criteria that make this experiment Binomial. If No, state why this is not a Binomial experiment.

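Answer: No, it is not a binomial experiment. The first five questions have two answer options each, but the second five have four options each. As a result, the probability of a correct answer is 1/2 on the first five trials and 1/4 on the last five, so the probability of success does not stay the same from trial to trial.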
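
A rough numerical illustration of this point, not required by the question: the number of correct answers is the sum of a Bin(5, 1/2) and a Bin(5, 1/4) count, and its distribution differs from a single binomial with the same mean. The comparison value p = 0.375 and all variable names are assumptions made here for the sketch.

```r
# X1 = correct answers among 5 true/false questions, p = 1/2
# X2 = correct answers among 5 four-option questions, p = 1/4
p_tf <- dbinom(0:5, size = 5, prob = 1/2)
p_mc <- dbinom(0:5, size = 5, prob = 1/4)

# Distribution of the total X1 + X2 (values 0..10) by convolution
p_total <- sapply(0:10, function(k) {
  i <- max(0, k - 5):min(5, k)   # feasible values of X1
  sum(p_tf[i + 1] * p_mc[k - i + 1])
})

# A single binomial with the same mean: n = 10, p = (1/2 + 1/4) / 2 = 0.375
p_binom <- dbinom(0:10, size = 10, prob = 0.375)

round(cbind(total = 0:10, mixed = p_total, binomial = p_binom), 4)
```

The two probability columns do not match, which is consistent with the total not being binomial.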

  6. Use the appropriate R function (must be one we discussed in class) and find the probability of 6 successes from a Bin(10,1/4) distribution.
dbinom(6,10,0.25)
## [1] 0.016222
  7. Use the appropriate R function (must be one we discussed in class) and find the probability of 5 or fewer successes from a Bin(10,1/4) distribution.
pbinom(5,10,0.25)
## [1] 0.9802723
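
As a quick cross-check (a sketch, not part of the required answer), the cumulative probability from pbinom() should equal the sum of the individual dbinom() probabilities for 0 through 5 successes:

```r
# P(X <= 5) is the sum of P(X = 0), ..., P(X = 5)
cum_from_pmf <- sum(dbinom(0:5, size = 10, prob = 0.25))
cum_direct   <- pbinom(5, size = 10, prob = 0.25)

all.equal(cum_from_pmf, cum_direct)
```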
  8. You are a healthcare analyst working for a hospital. You are interested in the number of patient discharges every day. X = # patients discharged in one day. You know the mean number of patients discharged is 4.0.
    a. What is the probability that more than 6 patients will be discharged in one day?

Use the appropriate R function (must be one we discussed in class) to find the probability.

#P(X>6)
meanofdischarge=4.0
ppois(6,meanofdischarge,lower.tail = FALSE)
## [1] 0.110674
    b. What is the probability that 6 or more patients will be discharged in one day?

Use the appropriate R function (must be one we discussed in class) to find the probability.

#P(X>=6) = P(X>5)
ppois(5,meanofdischarge,lower.tail = FALSE)
## [1] 0.2148696
  9. Using the same scenario in #8, find the probability that exactly 6 patients are discharged in a day.

Use the appropriate R function (must be one we discussed in class) to find the probability.

#P(X=6)
dpois(6,meanofdischarge)
## [1] 0.1041956
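
The three Poisson answers are linked by P(X >= 6) = P(X > 6) + P(X = 6). A short sketch that checks this, assuming the same mean of 4.0 (the variable names here are illustrative):

```r
lambda <- 4.0  # mean number of discharges per day

p_more_than_6 <- ppois(6, lambda, lower.tail = FALSE)  # P(X > 6)
p_6_or_more   <- ppois(5, lambda, lower.tail = FALSE)  # P(X >= 6) = P(X > 5)
p_exactly_6   <- dpois(6, lambda)                      # P(X = 6)

all.equal(p_6_or_more, p_more_than_6 + p_exactly_6)
```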
  10. Using the data for the London Cycling Deaths (introduced in Example 3.6 on pp. 71-72), create a hanging rootgram. What is this graph telling you?
data("CyclingDeaths", package="vcdExtra")
CyclingDeaths
##           date deaths
## 1   2005-01-01      1
## 2   2005-01-15      0
## 3   2005-01-29      0
## 4   2005-02-12      0
## 5   2005-02-26      1
## 6   2005-03-12      1
## 7   2005-03-26      1
## 8   2005-04-09      0
## 9   2005-04-23      2
## 10  2005-05-07      0
## 11  2005-05-21      1
## 12  2005-06-04      0
## 13  2005-06-18      3
## 14  2005-07-02      1
## 15  2005-07-16      1
## 16  2005-07-30      2
## 17  2005-08-13      0
## 18  2005-08-27      0
## 19  2005-09-10      1
## 20  2005-09-24      0
## 21  2005-10-08      0
## 22  2005-10-22      0
## 23  2005-11-05      3
## 24  2005-11-19      0
## 25  2005-12-03      2
## 26  2005-12-17      0
## 27  2005-12-31      1
## 28  2006-01-14      0
## 29  2006-01-28      1
## 30  2006-02-11      0
## 31  2006-02-25      0
## 32  2006-03-11      1
## 33  2006-03-25      1
## 34  2006-04-08      0
## 35  2006-04-22      1
## 36  2006-05-06      0
## 37  2006-05-20      1
## 38  2006-06-03      1
## 39  2006-06-17      0
## 40  2006-07-01      0
## 41  2006-07-15      3
## 42  2006-07-29      0
## 43  2006-08-12      0
## 44  2006-08-26      3
## 45  2006-09-09      1
## 46  2006-09-23      1
## 47  2006-10-07      1
## 48  2006-10-21      0
## 49  2006-11-04      1
## 50  2006-11-18      1
## 51  2006-12-02      0
## 52  2006-12-16      1
## 53  2006-12-30      1
## 54  2007-01-13      0
## 55  2007-01-27      0
## 56  2007-02-10      0
## 57  2007-02-24      3
## 58  2007-03-10      0
## 59  2007-03-24      1
## 60  2007-04-07      1
## 61  2007-04-21      1
## 62  2007-05-05      0
## 63  2007-05-19      0
## 64  2007-06-02      0
## 65  2007-06-16      1
## 66  2007-06-30      0
## 67  2007-07-14      0
## 68  2007-07-28      0
## 69  2007-08-11      0
## 70  2007-08-25      2
## 71  2007-09-08      0
## 72  2007-09-22      0
## 73  2007-10-06      1
## 74  2007-10-20      0
## 75  2007-11-03      0
## 76  2007-11-17      0
## 77  2007-12-01      2
## 78  2007-12-15      1
## 79  2007-12-29      0
## 80  2008-01-12      1
## 81  2008-01-26      0
## 82  2008-02-09      1
## 83  2008-02-23      0
## 84  2008-03-08      1
## 85  2008-03-22      0
## 86  2008-04-05      1
## 87  2008-04-19      1
## 88  2008-05-03      0
## 89  2008-05-17      0
## 90  2008-05-31      0
## 91  2008-06-14      1
## 92  2008-06-28      0
## 93  2008-07-12      0
## 94  2008-07-26      1
## 95  2008-08-09      0
## 96  2008-08-23      0
## 97  2008-09-06      1
## 98  2008-09-20      1
## 99  2008-10-04      0
## 100 2008-10-18      1
## 101 2008-11-01      0
## 102 2008-11-15      2
## 103 2008-11-29      0
## 104 2008-12-13      2
## 105 2008-12-27      0
## 106 2009-01-10      1
## 107 2009-01-24      1
## 108 2009-02-07      0
## 109 2009-02-21      0
## 110 2009-03-07      0
## 111 2009-03-21      0
## 112 2009-04-04      2
## 113 2009-04-18      0
## 114 2009-05-02      1
## 115 2009-05-16      1
## 116 2009-05-30      0
## 117 2009-06-13      0
## 118 2009-06-27      2
## 119 2009-07-11      0
## 120 2009-07-25      0
## 121 2009-08-08      0
## 122 2009-08-22      0
## 123 2009-09-05      1
## 124 2009-09-19      0
## 125 2009-10-03      0
## 126 2009-10-17      1
## 127 2009-10-31      1
## 128 2009-11-14      0
## 129 2009-11-28      1
## 130 2009-12-12      0
## 131 2009-12-26      1
## 132 2010-01-09      0
## 133 2010-01-23      1
## 134 2010-02-06      1
## 135 2010-02-20      0
## 136 2010-03-06      2
## 137 2010-03-20      0
## 138 2010-04-03      1
## 139 2010-04-17      1
## 140 2010-05-01      0
## 141 2010-05-15      1
## 142 2010-05-29      0
## 143 2010-06-12      0
## 144 2010-06-26      0
## 145 2010-07-10      1
## 146 2010-07-24      1
## 147 2010-08-07      0
## 148 2010-08-21      0
## 149 2010-09-04      0
## 150 2010-09-18      0
## 151 2010-10-02      0
## 152 2010-10-16      0
## 153 2010-10-30      0
## 154 2010-11-13      0
## 155 2010-11-27      0
## 156 2010-12-11      0
## 157 2010-12-25      1
## 158 2011-01-08      0
## 159 2011-01-22      1
## 160 2011-02-05      0
## 161 2011-02-19      0
## 162 2011-03-05      1
## 163 2011-03-19      1
## 164 2011-04-02      1
## 165 2011-04-16      2
## 166 2011-04-30      0
## 167 2011-05-14      1
## 168 2011-05-28      1
## 169 2011-06-11      1
## 170 2011-06-25      0
## 171 2011-07-09      0
## 172 2011-07-23      2
## 173 2011-08-06      0
## 174 2011-08-20      0
## 175 2011-09-03      0
## 176 2011-09-17      0
## 177 2011-10-01      1
## 178 2011-10-15      1
## 179 2011-10-29      1
## 180 2011-11-12      0
## 181 2011-11-26      1
## 182 2011-12-10      0
## 183 2011-12-24      1
## 184 2012-01-07      0
## 185 2012-01-21      0
## 186 2012-02-04      0
## 187 2012-02-18      0
## 188 2012-03-03      1
## 189 2012-03-17      2
## 190 2012-03-31      0
## 191 2012-04-14      0
## 192 2012-04-28      1
## 193 2012-05-12      0
## 194 2012-05-26      0
## 195 2012-06-09      0
## 196 2012-06-23      2
## 197 2012-07-07      1
## 198 2012-07-21      1
## 199 2012-08-04      0
## 200 2012-08-18      0
## 201 2012-09-01      0
## 202 2012-09-15      0
## 203 2012-09-29      0
## 204 2012-10-13      1
## 205 2012-10-27      1
## 206 2012-11-10      1
## 207 2012-11-24      1
## 208 2012-12-08      0
a=table(CyclingDeaths$deaths)
a
## 
##   0   1   2   3 
## 114  75  14   5
b=goodfit(a)
summary(b)
## 
##   Goodness-of-fit test for poisson distribution
## 
##                       X^2 df  P(> X^2)
## Likelihood Ratio 4.151738  2 0.1254474
plot(b, type="hanging", shade=TRUE)

Type interpretation of this graph below (what is this graph telling you?):

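The observed data are reasonably consistent with a Poisson distribution: the likelihood ratio goodness-of-fit test (p = 0.125) does not reject the Poisson model. The red line shows the fitted counts on the square-root scale. The bars for 1 death and 3 deaths extend below the zero horizontal line, which means the expected counts are less than the observed counts, so the model underestimates these frequencies. The bars for 0 deaths and 2 deaths end above the zero horizontal line, which means the expected counts are greater than the observed counts, so the model overestimates these frequencies.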
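
The direction of each bar can be checked by hand. The sketch below uses only the observed counts from the table above and assumes the Poisson mean is estimated by the sample mean (the maximum-likelihood estimate); it is an approximation of what goodfit() computes, not its exact code.

```r
# Observed numbers of fortnights with 0, 1, 2, 3 deaths (from the table above)
deaths   <- 0:3
observed <- c(114, 75, 14, 5)

# Maximum-likelihood estimate of the Poisson mean is the sample mean
lambda_hat <- sum(deaths * observed) / sum(observed)

# Expected counts under Poisson(lambda_hat)
expected <- sum(observed) * dpois(deaths, lambda_hat)

# In a hanging rootgram each bar hangs from sqrt(expected) and has
# height sqrt(observed), so its bottom sits at sqrt(expected) - sqrt(observed)
bar_bottom <- sqrt(expected) - sqrt(observed)

round(data.frame(deaths, observed, expected, bar_bottom), 2)
```

A negative `bar_bottom` corresponds to a bar that drops below the zero line (observed greater than expected); a positive value corresponds to a bar that stops above it.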