STAT 410: Exam 1 Practice

Author

Jackson Bain

Preliminaries

Rename the Exam file to FirstNameExam1Practice.qmd. Save it and the data file “friday.csv” in your working directory.
Before changing anything below, verify that this Exam template will render as either a PDF or HTML.
Answer all questions below by adding text to this document. Code should be placed in the given code chunks. If you are unable to produce code that runs, turn a code chunk off by setting the chunk option eval = FALSE. Your code will still be visible for partial credit.
When time is up, submit your final qmd and rendered files on Canvas.
This exam is open notes. For the practice exam, you may use all your electronic files from the course and the R help menu. For the exam, you are limited to 10 sides of printed notes and the R help menu. No other online resources are permitted. Students who violate these rules are subject to the SDSU Academic Dishonesty Policy and will receive a 0 on the exam.

Exam questions

[2 pts] Write code to read in the data file “friday.csv” into a dataframe called friday13.

friday13 <- read.csv("friday.csv")

This data set addresses how superstitions about Friday the 13th affect human behavior and whether Friday the 13th is an unlucky day. Scanlon et al. (BMJ) collected data on traffic patterns, shopping patterns, and accident frequency for Fridays the 6th and 13th between October 1989 and November 1992. The variables are:

Variable	Definition
type	traffic, shopping, or accident data
date	Year and month data were collected
sixth	count of events on Friday the 6th
thirteenth	count of events on Friday the 13th
location	time frame for traffic counts; shopping center for shopping data
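Before any analysis, a quick structural check can confirm that the data frame has the five variables listed in the table. The helper `missing_columns()` is hypothetical, and the toy data frame holds made-up placeholder values used only to exercise it, not data from Scanlon et al. With the real data you would call `missing_columns(friday13)`.

```r
# Variables listed in the table above
expected_cols <- c("type", "date", "sixth", "thirteenth", "location")

# Hypothetical helper: which expected variables are absent from a data frame?
missing_columns <- function(df, expected = expected_cols) {
  setdiff(expected, names(df))
}

# Made-up placeholder rows for illustration only (not the friday.csv data)
toy <- data.frame(
  type = c("traffic", "shopping"),
  date = c("placeholder", "placeholder"),
  sixth = c(100, 200),
  thirteenth = c(90, 210),
  location = c("7 to 8", "center A")
)

missing_columns(toy)          # character(0): nothing is missing
missing_columns(toy[, 1:3])   # "thirteenth" "location"
```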

[2 pts] Add a column to the data frame that computes the difference in events between the sixth and the thirteenth (sixth minus thirteenth).

library(dplyr)  # provides mutate(), filter(), group_by(), summarize(), arrange()
friday13wdiff <- friday13 |> mutate(difference = sixth - thirteenth)
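The prompt asks for the column to be added to the data frame itself, whereas the answer above stores the result in a new data frame, friday13wdiff. A minimal base R sketch that adds the column in place is below. It runs on a made-up two-row stand-in called `toy13` (a hypothetical name); with the real data, replace `toy13` with `friday13`.

```r
# Made-up stand-in for friday13 (illustrative values only)
toy13 <- data.frame(sixth = c(120, 80), thirteenth = c(100, 95))

# Add the difference (sixth minus thirteenth) directly to the same data frame
toy13$difference <- toy13$sixth - toy13$thirteenth

toy13$difference   # 20 -15
```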

[2 pts] Create another data frame called shop that contains only the rows with shopping data.

shop <- friday13wdiff |> 
  filter(type == "shopping")

[5 pts] Make either a dotplot or a boxplot of the shopping differences, colored or filled by location. Comment on what you notice about the data from your plot.
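The plotting code is not shown in this document. One way to draw the box plot in base R, assuming shop has the columns difference and location, is sketched below on a made-up stand-in (`shop_toy`, a hypothetical name). A ggplot2 version would be `ggplot(shop, aes(x = location, y = difference, fill = location)) + geom_boxplot()`.

```r
# Made-up stand-in for shop (illustrative values only)
shop_toy <- data.frame(
  location = rep(c("center A", "center B"), each = 4),
  difference = c(-30, -10, 5, 40, -60, -20, -5, 15)
)

# One box per location, each filled with its own palette color
locs <- sort(unique(shop_toy$location))
fills <- seq_along(locs) + 1   # palette indices 2, 3, ... (skips black)
bp <- boxplot(difference ~ location, data = shop_toy,
              col = fills, xlab = "Location",
              ylab = "Sixth minus thirteenth")
abline(h = 0, lty = 2)         # dashed reference line at zero difference
```

The returned object `bp` holds the five-number summaries per location, which can help when writing the comment on the plot.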

Most of the location groups do not appear to be normally distributed. The box plot of the differences also suggests left skewness, so a reasonable guess is that the differences are not normally distributed around 0.

[5 pts] Assess the normality of the differences. Comment on your results.


    Shapiro-Wilk normality test

data:  friday13wdiff$difference
W = 0.58917, p-value = 8.492e-12


The p-value of the Shapiro-Wilk normality test was 8.492 × 10^-12, so we reject the null hypothesis of normality: there is strong evidence that the differences are not normally distributed. A histogram or density plot of the differences also shows a clear right skew.
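The code behind the output above is not shown. A minimal sketch of the checks described (a Shapiro-Wilk test plus a histogram and a normal Q-Q plot) follows. Setting `breaks` explicitly plays the same role as choosing `binwidth` in ggplot2. The helper `check_normality()` is hypothetical, and the simulated right-skewed sample stands in for the real differences; with the actual data you would call it on `friday13wdiff$difference` or `shop$difference`.

```r
# Hypothetical helper: Shapiro-Wilk test plus two visual checks side by side
check_normality <- function(x, breaks = "Sturges") {
  op <- par(mfrow = c(1, 2))
  on.exit(par(op))            # restore the previous plot layout afterwards
  hist(x, breaks = breaks, main = "Histogram", xlab = "Difference")
  qqnorm(x)
  qqline(x)                   # points bending away from this line suggest non-normality
  shapiro.test(x)             # a small p-value is evidence against normality
}

# Simulated right-skewed sample (not the friday.csv data)
set.seed(410)
x <- rexp(60, rate = 1 / 100) - 50
result <- check_normality(x, breaks = 15)
result$p.value
```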

[5 pts] Conduct a test to determine if the center of the distribution of differences in shopping is greater than zero. Give your conclusion (you do not need to write out all five steps).


    One Sample t-test

data:  shop$difference
t = -1.7822, df = 44, p-value = 0.9592
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
 -90.31719       Inf
sample estimates:
mean of x 
-46.48889

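Because the p-value (0.9592) is far greater than 0.05, we fail to reject the null hypothesis. There is not sufficient statistical evidence to conclude that the center of the distribution of differences in shopping between the 6th and the 13th is greater than 0; in fact, the sample mean difference is negative (-46.49).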
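The t-test output above corresponds to a call such as `t.test(shop$difference, alternative = "greater")`. Because the normality check suggested the differences are not normal, a Wilcoxon signed-rank test, which tests the median and assumes only a roughly symmetric distribution, is one possible alternative; it is offered here as a suggestion, not as part of the original answer. Both tests are sketched on simulated data (`d` is a hypothetical stand-in for `shop$difference`).

```r
# Simulated stand-in for shop$difference (not the real data)
set.seed(13)
d <- rnorm(45, mean = -40, sd = 170)

# One-sided tests of H0: center = 0 versus H1: center > 0
t_res <- t.test(d, mu = 0, alternative = "greater")
w_res <- wilcox.test(d, mu = 0, alternative = "greater")

# Decision at the 5% level: reject H0 only when p < alpha
alpha <- 0.05
c(t_pvalue = t_res$p.value, wilcoxon_pvalue = w_res$p.value)
c(t_reject = t_res$p.value < alpha, wilcoxon_reject = w_res$p.value < alpha)
```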

[4 pts] Returning to the friday13 data set and looking only at the traffic rows, give the mean and standard deviation by time period. Store this in a data frame called trafficsummary, then print this data frame. It should have the mean and standard deviation for 7 to 8 and the mean and standard deviation for 9 to 10.

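friday13onlytraffic <- friday13 |>
  filter(type == "traffic") |>
  mutate(difference = sixth - thirteenth)

# Mean and standard deviation of the difference for each time period
trafficsummary <- friday13onlytraffic |>
  group_by(location) |>
  summarize(meandiff = mean(difference), sddiff = sd(difference))

print(trafficsummary)  # print() shows the table in the rendered file; view() only opens an interactive viewer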
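For comparison, a base R version of the same grouped summary is sketched below. The helper `group_mean_sd()` is hypothetical, the traffic rows are made up, and the output column names (meandiff, sddiff) mirror the dplyr answer.

```r
# Hypothetical helper: mean and standard deviation of `value` within each level of `group`
group_mean_sd <- function(df, value, group) {
  pieces <- split(df[[value]], df[[group]])   # one numeric vector per group
  data.frame(
    group = names(pieces),
    meandiff = unname(vapply(pieces, mean, numeric(1))),
    sddiff = unname(vapply(pieces, sd, numeric(1)))
  )
}

# Made-up traffic rows for illustration only
traffic_toy <- data.frame(
  location = c("7 to 8", "7 to 8", "9 to 10", "9 to 10"),
  difference = c(100, 200, 50, 150)
)

trafficsummary_toy <- group_mean_sd(traffic_toy, "difference", "location")
print(trafficsummary_toy)
```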

[2 pts] Referencing the friday13 data set, print the accident data in order from most negative to most positive diff. Your printout should include only the columns date and diff.

friday13wdiff |> 
  dplyr::select(date, difference) |> 
  arrange(difference)

               date difference
1   1992,  November       -774
2      1992,  March       -297
3       1990,  July       -266
4       1990,  July       -248
5       1990,  July       -244
6       1990,  July       -242
7  1991,  September       -227
8       1990,  July       -194
9      1992,  March       -169
10      1990,  July       -163
11      1990,  July       -146
12     1992,  March       -136
13  1992,  November       -123
14  1991,  December       -115
15     1992,  March        -97
16  1991,  December        -81
17  1992,  November        -55
18  1991,  December        -41
19  1992,  November        -34
20     1992,  March        -33
21  1992,  November        -26
22     1992,  March        -24
23  1991,  December        -11
24 1991,  September        -10
25  1992,  November         -7
26      1990,  July         -6
27   1989,  October         -4
28 1991,  September         -3
29     1992,  March         -3
30     1992,  March         -3
31 1991,  September         -3
32     1992,  March         -1
33  1991,  December          1
34  1991,  December          9
35  1992,  November         11
36 1991,  September         14
37 1991,  September         18
38  1991,  December         21
39     1992,  March         32
40 1991,  September         47
41      1990,  July         60
42  1992,  November         67
43  1992,  November         73
44  1991,  December        102
45  1992,  November        107
46  1991,  December        118
47 1991,  September        119
48 1991,  September        159
49 1991,  September        205
50  1991,  December        209
51      1990,  July        302
52  1992,  November        321
53      1990,  July        698
54 1991,  September       1037
55      1990,  July       1104
56  1992,  November       1839
57 1991,  September       1889
58  1991,  December       1911
59  1991,  December       2416
60     1992,  March       2761
61     1992,  March       4382
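The printout above lists all 61 rows of friday13wdiff, including the shopping and traffic rows, because the pipeline never filters on type; the prompt asks only for the accident rows. In the dplyr pipeline, adding `filter(type == "accident")` before `dplyr::select()` would restrict it. The same steps are sketched in base R below on made-up rows (`acc_toy` is a hypothetical name).

```r
# Made-up rows for illustration only (not the friday.csv data)
acc_toy <- data.frame(
  type = c("accident", "traffic", "accident", "shopping"),
  date = c("1990,  July", "1990,  July", "1991,  September", "1992,  March"),
  difference = c(-3, 150, -6, 40)
)

# Keep accident rows, then only the date and difference columns,
# sorted from most negative to most positive difference
accidents <- acc_toy[acc_toy$type == "accident", c("date", "difference")]
accidents <- accidents[order(accidents$difference), ]
print(accidents, row.names = FALSE)
```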