I got interested in this data because I remember visiting Yellowstone, where they told us that this geyser is one of the most consistently periodic geysers out there. That’s why it’s named “Faithful”, I suppose. Anyway, it’s amazing to see the geyser erupt. I’d like to know whether the time you have to wait is indeed very regular, and whether the waiting time has anything to do with the eruption time.
Just in case you don’t know what a geyser is, here’s a picture:

[Image: Old Faithful Geyser]
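# Read the Old Faithful data from the Rdatasets mirror (the same data also ships with base R as faithful)
geyser_DB = read.csv("https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/datasets/faithful.csv")
# Keep only the two measurements, dropping the row-index column the CSV adds
# (selecting by name works whether that column is called X or rownames)
geyser_DB = geyser_DB[, c("eruptions", "waiting")]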
summary(geyser_DB)
## eruptions waiting
## Min. :1.600 Min. :43.0
## 1st Qu.:2.163 1st Qu.:58.0
## Median :4.000 Median :76.0
## Mean :3.488 Mean :70.9
## 3rd Qu.:4.454 3rd Qu.:82.0
## Max. :5.100 Max. :96.0
The median wait is 76 minutes (the mean is about 71), and the median eruption lasts 4 minutes. Now I know I was relatively lucky, because I only waited about 5 minutes.
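The figures quoted above can be checked directly. This is a minimal sketch that uses the copy of the same data bundled with base R (the built-in faithful data frame, with columns eruptions and waiting, both in minutes), so it runs without a download; it assumes that copy matches the CSV loaded above. The variable names wait_min and eruption_min are just illustrative.

```r
# Built-in copy of the Old Faithful data (datasets package, loaded by default)
data(faithful)

wait_min <- faithful$waiting        # waiting times, minutes
eruption_min <- faithful$eruptions  # eruption durations, minutes

print(median(wait_min))                   # typical wait
print(mean(wait_min))                     # pulled below the median by the cluster of short waits
print(quantile(wait_min, c(0.25, 0.75)))  # 1st and 3rd quartiles: the middle 50% of waits
print(median(eruption_min))               # typical eruption length
```

The gap between the mean (about 71) and the median (76) is a first hint that the waits are not spread symmetrically around a single center.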
Let’s rename the variables so their units are explicit. According to the R documentation for this dataset, both columns are measured in minutes, which matches the summary above (eruptions range from 1.6 to 5.1).
geyser_DB = setNames(geyser_DB, c("eruption_min", "waiting_min"))
Let’s first take a look at a scatterplot of waiting time vs eruption time.
plot(
  geyser_DB$waiting_min,
  geyser_DB$eruption_min,
  main = "Waiting Time vs Eruption Time",
  xlab = "Waiting Time - Minutes",
  ylab = "Eruption Time - Minutes"
)
This shows a positive relationship between waiting time and eruption time. The points also fall into two clusters: once the waiting time passes 70 minutes or so, the eruption time roughly doubles, from about 2 minutes to 4 minutes or more!
Now let’s see what box plots show.
boxplot(geyser_DB$eruption_min)
title("Eruption Time - Minutes")
boxplot(geyser_DB$waiting_min)
title("Waiting Time - Minutes")
The box for waiting time spans the 1st to 3rd quartiles, so about 50% of the time you can expect to wait between 58 and 82 minutes. The longest wait is 96 minutes (from the summary)! Okay, so the eruption will happen if you wait long enough. It’s indeed faithful.

### Histogram

Now let’s look at the histograms.
hist(geyser_DB$waiting_min, main = "Histogram of Waiting Time", xlab = "Waiting Time in Minutes")
hist(geyser_DB$eruption_min, main = "Histogram of Eruption Time", xlab = "Eruption Time in Minutes")
You are going to see either a roughly 2-minute eruption or a roughly 4-minute eruption, with not much in between. The data do not include the height of each eruption, which would have been really interesting. Waiting times are split the same way: if you just missed an eruption, you are likely to wait around 50 or 80 minutes. I say, go to the bathroom between the 65- and 70-minute marks if you want to take a chance.
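One way to put numbers on the “2 minutes or 4 minutes” split is to label each eruption as short or long and compare the typical wait in each group. This sketch again uses base R’s built-in faithful data; the 3-minute cutoff and the names eruption_type and wait_by_type are illustrative choices of mine, read off the gap in the eruption histogram rather than taken from any source.

```r
data(faithful)

# Hypothetical cutoff: eruptions under 3 minutes are "short", the rest "long"
eruption_type <- ifelse(faithful$eruptions < 3, "short", "long")

print(table(eruption_type))  # how many eruptions fall in each group

# Typical eruption length and typical wait (both in minutes) per group
length_by_type <- tapply(faithful$eruptions, eruption_type, median)
wait_by_type <- tapply(faithful$waiting, eruption_type, median)
print(length_by_type)
print(wait_by_type)
```

If the two-cluster reading of the plots is right, the short group should also show a clearly shorter median wait than the long group.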
“Old Faithful Geyser” is indeed faithful: in this data it erupted after waits of anywhere from 43 to 96 minutes, most often around the 50-minute and 80-minute marks. If you want to see a lengthier eruption, hope for an 80-minute wait, which is likely to go with an eruption of about 4 minutes. The height of each eruption would have really enhanced this analysis of how long you need to wait for the more spectacular views.
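To attach a rough number to “an 80-minute wait goes with a roughly 4-minute eruption”, one option is a straight-line fit of eruption length on waiting time. This is a sketch, not the analysis above: it uses base R’s built-in faithful data with cor(), lm() and predict() from the stats package, and a single straight line glosses over the two-cluster shape seen in the scatterplot.

```r
data(faithful)

# Strength of the linear relationship between waiting time and eruption length
r <- cor(faithful$waiting, faithful$eruptions)
print(r)

# Straight-line fit: eruption length (min) as a function of waiting time (min)
fit <- lm(eruptions ~ waiting, data = faithful)
print(coef(fit))

# Predicted eruption length for an 80-minute wait
pred_80 <- predict(fit, newdata = data.frame(waiting = 80))
print(pred_80)
```

With this data the correlation should come out strongly positive and the prediction at 80 minutes a little over 4 minutes, consistent with the long-eruption cluster.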