2/6/2020

Summary

This is a reproducible presentation, part of Week 4 of the Developing Data Products MOOC on Coursera.

  • Based on the geyser dataset in the MASS package.
  • Studies the waiting time between consecutive eruptions of a geyser as a function of the duration of the previous eruption.

Data description

Scatter

I plotted the waiting time between consecutive eruptions against the duration of the previous eruption.

Waiting time tends to decrease as the duration of the previous eruption increases.

Three clusters appear to be visible on the scatter plot, an impression reinforced by the 2D density contour lines.
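
A minimal sketch of how such a scatter plot with 2D density contours could be drawn in base R. It assumes the `duration` and `waiting` columns of `MASS::geyser` are plotted as they come; how those columns map onto "previous eruption" should be checked against the dataset's documentation. The plotting library used in the presentation is not stated, so `kde2d()` and `contour()` stand in here as one possible choice.

```r
library(MASS)  # provides the geyser dataset and kde2d()

# Assumption: use the two columns exactly as stored in the dataset.
x <- geyser$duration
y <- geyser$waiting

# Scatter plot of waiting time against eruption duration.
plot(x, y,
     pch = 19, col = "grey40",
     xlab = "Eruption duration (min)",
     ylab = "Waiting time (min)",
     main = "Geyser eruptions")

# Two-dimensional kernel density estimate on a 50 x 50 grid,
# overlaid as contour lines to highlight dense regions.
dens <- kde2d(x, y, n = 50)
contour(dens, add = TRUE, col = "steelblue")
```

The grid size `n = 50` is an arbitrary illustrative value; a coarser or finer grid changes only how smooth the contours look.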

Clusters

I used K-means (seed=1, centers=3, all other options left at their defaults) to separate the data into 3 clusters based on duration and waiting time. These clusters are easily visible on the scatter plot.
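
The clustering step could look roughly like the sketch below. The seed and the number of centers come from the text; everything else is an assumption, in particular that the two columns are passed to `kmeans()` unscaled. Because waiting time spans a much larger numeric range than duration, unscaled K-means weights it more heavily.

```r
library(MASS)  # provides the geyser dataset

set.seed(1)  # seed stated in the text

# Assumption: cluster on the raw, unscaled columns.
xy <- geyser[, c("duration", "waiting")]
fit <- kmeans(xy, centers = 3)  # all other arguments left at their defaults

# Colour each point by its assigned cluster and mark the cluster centres.
plot(xy$duration, xy$waiting,
     col = fit$cluster, pch = 19,
     xlab = "Eruption duration (min)",
     ylab = "Waiting time (min)")
points(fit$centers, pch = 4, cex = 2, lwd = 2)

# Number of observations assigned to each cluster.
print(fit$size)
```

Wrapping `xy` in `scale()` before clustering would give both variables equal weight; whether the presentation did so is not stated.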

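There is no noticeable trend within any of the clusters, which suggests that the overall downward trend comes from differences between clusters rather than from a relationship within them.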
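
One way to check this observation numerically is to compare the overall correlation with the correlation inside each cluster. This is a sketch under the same assumptions as the clustering itself (raw `duration` and `waiting` columns from `MASS::geyser`, seed 1, three centers); the names `within_cor` and `overall_cor` are illustrative only.

```r
library(MASS)  # provides the geyser dataset

set.seed(1)
xy <- geyser[, c("duration", "waiting")]
fit <- kmeans(xy, centers = 3)

# Pearson correlation between duration and waiting across all observations.
overall_cor <- cor(xy$duration, xy$waiting)

# The same correlation computed separately within each cluster.
within_cor <- sapply(split(xy, fit$cluster),
                     function(d) cor(d$duration, d$waiting))

print(round(c(overall = overall_cor, within_cor), 2))
```

Within-cluster correlations close to zero alongside a clearly non-zero overall correlation would be consistent with the trend being driven by the separation between clusters.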