Welcome to RMarkdown!
data("discoveries")
What years of data do we have information on?
discoveries
## Time Series:
## Start = 1860
## End = 1959
## Frequency = 1
## [1] 5 3 0 2 0 3 2 3 6 1 2 1 2 1 3 3 3 5 2 4 4 0 2 3 7
## [26] 12 3 10 9 2 3 7 7 2 3 3 6 2 4 3 5 2 2 4 0 4 2 5 2 3
## [51] 3 6 5 8 3 6 6 0 5 2 2 2 6 3 4 4 2 2 4 7 5 3 3 0 2
## [76] 2 2 1 3 4 2 2 1 1 1 2 1 4 4 3 2 1 4 1 1 1 0 0 2 0
We have data for the years 1860 through 1959.
Based on the data, how many great discoveries were there in 1863?
There were 2 great discoveries in 1863. The series starts in 1860, so 1863 is the fourth value in the output above.
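A single year can also be looked up directly rather than counted by position. The following is a minimal sketch, assuming `discoveries` is the built-in yearly time series loaded above; the variable name `count_1863` is only illustrative.

```r
data("discoveries")

# window() subsets a time series by its time stamps, not by index position
count_1863 <- as.numeric(window(discoveries, start = 1863, end = 1863))
print(count_1863)
```

Because `window()` works on the years themselves, the same call can be reused for any other year by changing `start` and `end`.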
What is another question we could explore using this data set?
“Based on the data, how many great discoveries were there between the years 1860 and 1880?”
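A question like this one could be answered with a short sketch along the following lines, assuming both end years (1860 and 1880) are meant to be included; `total_1860_1880` is an illustrative name.

```r
data("discoveries")

# Keep only the years 1860 through 1880 (inclusive), then add up the counts
total_1860_1880 <- sum(window(discoveries, start = 1860, end = 1880))
print(total_1860_1880)
```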
max(discoveries)
## [1] 12
What was the largest number of great discoveries recorded by the World Almanac and Book of Facts?
12
Was the largest number of great discoveries made in the 19th century (before 1900) or in the 20th century (1900 or later)?
plot(discoveries)
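In the 19th century (before 1900). The peak of 12 is the 26th value in the series, which corresponds to 1885.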
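The year of the peak can also be checked in code instead of being read off the plot. This is a small sketch, assuming the built-in `discoveries` series; `peak_year` is an illustrative name.

```r
data("discoveries")

# which.max() gives the position of the largest value;
# time() gives the year attached to each position
peak_year <- time(discoveries)[which.max(discoveries)]
print(peak_year)
```

If several years tied for the maximum, `which.max()` would return only the first of them.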
Suppose you are asked to describe what information this graph is showing. How would you reply? Note: There is no one right answer, but a good way to start is to comment on what information is on the x axis and what information is on the y axis, and then comment on any general story you think can be told based on the graph.
The x axis shows the year (1860 to 1959) and the y axis shows the number of great discoveries recorded for that year. The count rises to a peak of 12 in the mid-1880s and is generally lower toward the end of the series.
What is the average number of great discoveries recorded in this data set?
mean(discoveries)
## [1] 3.1
3.1
What is the smallest number of great discoveries recorded in this data set?
min(discoveries)
## [1] 0
0
data("faithful")
How many rows are in this data set? How many columns?
272 rows and 2 columns.
How many numeric variables do we have in the faithful data set? How many categorical variables?
2 numeric (eruptions and waiting) and 0 categorical.
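The size of the data set and the type of each column can be checked directly. This is a minimal sketch, assuming the built-in `faithful` data frame; `column_is_numeric` is an illustrative name.

```r
data("faithful")

# dim() returns the number of rows, then the number of columns
print(dim(faithful))

# Check each column's type; TRUE means the column is numeric
column_is_numeric <- vapply(faithful, is.numeric, logical(1))
print(column_is_numeric)
```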
faithful$waiting
## [1] 79 54 74 62 85 55 88 85 51 85 54 84 78 47 83 52 62 84 52 79 51 47 78 69 74
## [26] 83 55 76 78 79 73 77 66 80 74 52 48 80 59 90 80 58 84 58 73 83 64 53 82 59
## [51] 75 90 54 80 54 83 71 64 77 81 59 84 48 82 60 92 78 78 65 73 82 56 79 71 62
## [76] 76 60 78 76 83 75 82 70 65 73 88 76 80 48 86 60 90 50 78 63 72 84 75 51 82
## [101] 62 88 49 83 81 47 84 52 86 81 75 59 89 79 59 81 50 85 59 87 53 69 77 56 88
## [126] 81 45 82 55 90 45 83 56 89 46 82 51 86 53 79 81 60 82 77 76 59 80 49 96 53
## [151] 77 77 65 81 71 70 81 93 53 89 45 86 58 78 66 76 63 88 52 93 49 57 77 68 81
## [176] 81 73 50 85 74 55 77 83 83 51 78 84 46 83 55 81 57 76 84 77 81 87 77 51 78
## [201] 60 82 91 53 78 46 77 84 49 83 71 80 49 75 64 76 53 94 55 76 50 82 54 75 78
## [226] 79 78 78 70 79 70 54 86 50 90 54 54 77 79 64 75 47 86 63 85 82 57 82 67 74
## [251] 54 83 73 73 88 80 71 83 56 79 78 84 58 83 43 60 75 81 46 90 46 74
What command would access just the column in the data set that tells us about the number of eruptions?
faithful$eruptions
##Question 12
What command would tell us the largest number of eruptions in this data set?
max(faithful$eruptions)
If you had to wait 80 minutes in between Old Faithful eruption cycles, how many times do you expect the geyser to erupt during its eruption cycle?
plot(x = faithful$waiting, y = faithful$eruptions, xlab= "The X Axis Label", ylab= "The Y Axis Label", col = "blue")
About 4 to 4.5 times, reading the scatter plot at a waiting time of 80 minutes.
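As a cross-check on reading the plot by eye, one option is to fit a straight line through the points and evaluate it at 80 minutes. The sketch below is not part of the worksheet's method; it assumes a linear relationship is a reasonable summary, and the names `fit` and `estimate_at_80` are illustrative.

```r
data("faithful")

# Fit eruptions as a linear function of waiting time
fit <- lm(eruptions ~ waiting, data = faithful)

# Evaluate the fitted line at a waiting time of 80 minutes
estimate_at_80 <- unname(predict(fit, newdata = data.frame(waiting = 80)))
print(estimate_at_80)
```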
By adapting the code above, create a scatter plot where (1) the x axis is labeled Waiting Time in Minutes, (2) the y axis is labeled Number of Eruptions, and (3) the dots are colored red.
plot(x = faithful$waiting, y = faithful$eruptions, xlab= "Waiting Time in Minutes", ylab= "Number of Eruptions", col = "red")
An individual who is visiting the park wants to know if your work so far indicates that longer waiting times are associated with more eruptions during the eruption cycle. Based on the plot, does it look like this is the case for these 272 eruptions? Explain your reasoning.
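Yes. The points in the scatter plot trend upward from left to right: shorter waiting times (roughly 45 to 60 minutes) go with lower eruption values of around 2, while longer waiting times (roughly 70 to 95 minutes) go with higher values of around 4 to 5.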
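The visual impression can be backed up with a correlation coefficient. This is a small sketch, assuming the Pearson correlation (the default for `cor()`) is an acceptable summary here; `waiting_eruption_cor` is an illustrative name.

```r
data("faithful")

# Values near +1 mean longer waits tend to go with larger eruption values
waiting_eruption_cor <- cor(faithful$waiting, faithful$eruptions)
print(waiting_eruption_cor)
```

A strongly positive value supports the claim of an association, though it does not by itself show that longer waits cause the larger values.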