friday13 <- read.csv("friday.csv")
STAT 410: Exam 1 Practice
Preliminaries
- Rename the Exam file to FirstNameExam1Practice.qmd. Save it and the data file “friday.csv” in your working directory.
- Before changing anything below, verify that this Exam template will render as either a PDF or HTML.
- Answer all questions below by adding text to this document. Code should be placed in the given code chunks. If you are unable to produce code that runs, turn a code chunk off by setting the chunk option `eval = FALSE`. Your code will still be visible for partial credit.
- When time is up, submit your final qmd and rendered files on Canvas.
- This exam is open notes. For the practice exam, you may use all your electronic files from the course and the R help menu. For the exam, you are limited to 10 sides of printed notes and the R help menu. No other online resources are permitted. Students who violate these rules are subject to the SDSU Academic Dishonesty Policy and will receive a 0 on the exam.
Exam questions
- [2 pts] Write code to read in the data file “friday.csv” into a data frame called friday13.
This data set addresses issues of how superstitions regarding Friday the 13th affect human behavior, and whether Friday the 13th is an unlucky day. Scanlon, et al. (BMJ) collected data on traffic and shopping patterns and accident frequency for Fridays the 6th and 13th between October of 1989 and November of 1992. The variables include:
Variable | Definition |
---|---|
type | traffic, shopping, or accident data |
date | Year and month data were collected |
sixth | count of events on Friday the 6th |
thirteenth | count of events on Friday the 13th |
location | time frame for traffic counts, shopping center for shopping |
- [2 pts] Add a column to the data frame that computes the difference in events on the sixth and the thirteenth (sixth minus thirteenth)
friday13wdiff <- friday13 |> mutate(difference = sixth - thirteenth)
- [2 pts] Create another data frame called shop that contains only the rows with shopping data.
shop <- friday13wdiff |>
  filter(type == "shopping")
- [5 pts] Make either a dotplot or a boxplot of differences in shoppers, colored or filled by location. Comment on what you notice about the data from your plot.
Within most locations, the differences do not appear to be normally distributed. The box plot also suggests a left skew in the differences overall, so a reasonable guess is that they are not normally distributed about 0.
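The plot itself does not appear in this rendering. As a rough sketch only, a boxplot of differences filled by location could be drawn as follows, assuming ggplot2 is installed. The data frame shop_demo and all of its values are invented for illustration and are not the exam data.

```r
library(ggplot2)

# Hypothetical stand-in for the shop data frame; values are invented
shop_demo <- data.frame(
  location   = rep(c("Store A", "Store B"), each = 4),
  difference = c(-120, -40, 10, 35, -300, -60, -15, 20)
)

# One box per location, filled by location
p <- ggplot(shop_demo, aes(x = location, y = difference, fill = location)) +
  geom_boxplot() +
  labs(x = "Location", y = "Difference (sixth - thirteenth)")
p
```

Swapping geom_boxplot() for geom_dotplot(binaxis = "y", stackdir = "center") would give the dotplot variant the question also allows.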
- [5 pts] Assess the normality of the differences. Comment on your results.
Shapiro-Wilk normality test
data: friday13wdiff$difference
W = 0.58917, p-value = 8.492e-12
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The Shapiro-Wilk normality test gave a p-value of 8.492 × 10^-12, so we reject the null hypothesis of normality: the differences are not normally distributed. A histogram or density plot of the differences shows a clear right skew.
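The code that produced this output is not shown in the rendering. A minimal sketch of the same kind of check, using only base R and a small invented vector (diff_demo is hypothetical, not the exam data), might look like this:

```r
# Hypothetical right-skewed differences (invented values, not the exam data)
diff_demo <- c(-30, -12, -5, -3, -1, 0, 2, 4, 9, 15, 40, 250, 900, 2400)

# Shapiro-Wilk test: the null hypothesis is that the data are normally distributed
sw <- shapiro.test(diff_demo)
sw$statistic  # W; values well below 1 suggest a departure from normality
sw$p.value    # a small p-value is evidence against normality

# Base-graphics histogram as a visual check of the skew
hist(diff_demo, breaks = 10, main = "Differences", xlab = "sixth - thirteenth")
```

Pairing the test with a plot is useful because the test alone says nothing about the direction of the skew.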
- [5 pts] Conduct a test to determine if the center of the distribution of difference in shopping is greater than zero. Give your conclusion (you do not need to write out all five steps)
One Sample t-test
data: shop$difference
t = -1.7822, df = 44, p-value = 0.9592
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
-90.31719 Inf
sample estimates:
mean of x
-46.48889
With a p-value of 0.9592, far above 0.05, we fail to reject the null hypothesis. There is insufficient statistical evidence to conclude that the center of the distribution of the difference in shopping (6th minus 13th) is greater than 0.
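The decision rule for a one-sided test like this can be sketched in base R. The vector diff_demo, the 5% level, and the Wilcoxon follow-up are assumptions made for illustration. They are not taken from the exam data or the exam's required method.

```r
# Hypothetical differences with a negative sample mean (invented, not the exam data)
diff_demo <- c(-120, -85, -60, -40, -22, -10, 5, 12, 30, 48)

# H0: mu = 0 versus Ha: mu > 0
tt <- t.test(diff_demo, mu = 0, alternative = "greater")
tt$statistic  # negative here because the sample mean is below 0
tt$p.value    # exceeds 0.5 whenever the t statistic is negative

# Decision at an assumed 5% significance level
alpha <- 0.05
decision <- if (tt$p.value < alpha) "reject H0" else "fail to reject H0"
decision

# Rank-based alternative that does not rely on normality
wt <- wilcox.test(diff_demo, mu = 0, alternative = "greater")
wt$p.value
```

A large one-sided p-value means the data point away from the alternative, so it can never count as evidence for it.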
- [4 pts] Returning to the friday13 data set, looking only at the traffic rows, give the mean and standard deviation by time period. Store this in a data frame called trafficsummary, then print this data frame. It should have the mean and standard deviation for 7 to 8 and the mean and standard deviation for 9 to 10.
friday13onlytraffic <- friday13 |>
  filter(type == "traffic") |>
  mutate(difference = sixth - thirteenth)
trafficsummary <- friday13onlytraffic |>
  group_by(location) |>
  summarize(meandiff = mean(difference), sddiff = sd(difference))
print(trafficsummary)
- [2 pts] Referencing the friday13 data set, print the accident data in order from most negative to most positive diff. Your printout should include only the columns date and diff.
friday13wdiff |>
  dplyr::select(date, difference) |>
  arrange(difference)
date difference
1 1992, November -774
2 1992, March -297
3 1990, July -266
4 1990, July -248
5 1990, July -244
6 1990, July -242
7 1991, September -227
8 1990, July -194
9 1992, March -169
10 1990, July -163
11 1990, July -146
12 1992, March -136
13 1992, November -123
14 1991, December -115
15 1992, March -97
16 1991, December -81
17 1992, November -55
18 1991, December -41
19 1992, November -34
20 1992, March -33
21 1992, November -26
22 1992, March -24
23 1991, December -11
24 1991, September -10
25 1992, November -7
26 1990, July -6
27 1989, October -4
28 1991, September -3
29 1992, March -3
30 1992, March -3
31 1991, September -3
32 1992, March -1
33 1991, December 1
34 1991, December 9
35 1992, November 11
36 1991, September 14
37 1991, September 18
38 1991, December 21
39 1992, March 32
40 1991, September 47
41 1990, July 60
42 1992, November 67
43 1992, November 73
44 1991, December 102
45 1992, November 107
46 1991, December 118
47 1991, September 119
48 1991, September 159
49 1991, September 205
50 1991, December 209
51 1990, July 302
52 1992, November 321
53 1990, July 698
54 1991, September 1037
55 1990, July 1104
56 1992, November 1839
57 1991, September 1889
58 1991, December 1911
59 1991, December 2416
60 1992, March 2761
61 1992, March 4382
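The code shown for this question has no step that restricts the rows to accident data. A minimal sketch of adding that filter, assuming dplyr is installed, is given below. The data frame demo and its values are a hypothetical miniature, not the exam data.

```r
library(dplyr)

# Hypothetical miniature of friday13wdiff (invented values, not the exam data)
demo <- data.frame(
  type       = c("accident", "traffic", "accident", "shopping", "accident"),
  date       = c("1989, October", "1990, July", "1991, September",
                 "1992, March", "1992, November"),
  difference = c(-4, 698, 3, -297, -7)
)

accident_sorted <- demo |>
  filter(type == "accident") |>   # keep accident rows only
  select(date, difference) |>     # keep the two requested columns
  arrange(difference)             # most negative first
accident_sorted
```

Filtering before select() matters here, because type is no longer available once only date and difference are kept.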