Plots are a useful tool for seeing relationships in a dataset. To understand the relationship between Petal Length and Petal Width in the Fisher dataset, we can create a plot using the “iris.csv” file. First, read in the Fisher dataset with the following code:
iris = read.csv("iris.csv", stringsAsFactors = TRUE)
When the data is properly read in, we can create a plot with the following code:
plot(iris$Petal.Length, iris$Petal.Width, xlab = "Petal Length", ylab = "Petal Width", main = "Petal Length vs. Petal Width",
ylim = c(0,3), pch = 16)
The plot() command creates the plot we see above. The first two arguments inside the parentheses supply the data for the x axis and the y axis. “xlab =” and “ylab =” label the axes. “main =” gives your plot a title. “ylim =” sets the lower and upper limits of the y axis. “pch =” chooses the symbol used to draw each point.
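To see which symbol each “pch =” number draws, it can help to plot them all once. The sketch below is only an illustration: the name pch.codes, the coordinates, and the label placement are assumptions and not part of the tutorial's own code. It uses the codes 0 through 25, the numbered plotting symbols in base R.

```r
# Numbered plotting symbols available in base R graphics
pch.codes = 0:25

# Draw each symbol once, left to right, at the same height
plot(pch.codes, rep(1, length(pch.codes)), pch = pch.codes,
     xlab = "pch value", ylab = "", yaxt = "n", ylim = c(0.5, 1.5),
     main = "Plotting symbols by pch value")

# Print each code just below its symbol
text(pch.codes, 0.8, labels = pch.codes, cex = 0.7)
```

The filled circle used throughout this section is symbol 16.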
While this plot is useful, the Fisher dataset contains more information that we can use to make sense of it!
We can create a plot for a specific species by creating a subset of the dataset with the “subset()” command:
setosa.sub = subset(iris, iris$Species == "setosa")
This command keeps only the rows of the dataset whose Species is “setosa”. It filters the data; it does not sort it.
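A quick way to confirm what “subset()” returned is to count the rows and tabulate the species. The sketch below assumes that R's built-in iris data frame (datasets::iris) has the same columns as “iris.csv”; the names iris.demo, setosa.demo, and setosa.demo2 are made up for this illustration.

```r
# Assumption: built-in datasets::iris stands in for "iris.csv"
iris.demo = datasets::iris

# Keep only the rows whose Species is "setosa"; row order is unchanged
setosa.demo = subset(iris.demo, Species == "setosa")

# The same filter written with square-bracket indexing
setosa.demo2 = iris.demo[iris.demo$Species == "setosa", ]

# How many rows were kept by each approach
nrow(setosa.demo)
nrow(setosa.demo2)

# Every remaining row is setosa; the other levels still exist with zero counts
table(setosa.demo$Species)
```

Inside “subset()”, column names can be used directly, so Species == "setosa" is equivalent to iris.demo$Species == "setosa".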
We can create a plot for this subset of the setosa species using the following code:
plot(setosa.sub$Petal.Length, setosa.sub$Petal.Width, xlab = "Petal Length", ylab = "Petal Width", main = "Setosa- Petal Length vs. Petal Width",
ylim = c(0,1), pch = 16)
Notice how this plot is zoomed in to only show where data about the setosa species is located.
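The zoom comes from the “ylim =” values, which can be chosen by looking at the range of the data first. The sketch below is one possible approach, again assuming the built-in datasets::iris matches “iris.csv”; iris.demo, setosa.demo, and y.limits are hypothetical names.

```r
# Assumption: built-in datasets::iris stands in for "iris.csv"
iris.demo = datasets::iris
setosa.demo = subset(iris.demo, Species == "setosa")

# Smallest and largest petal width among the setosa flowers
range(setosa.demo$Petal.Width)

# Start the axis at 0 and round the maximum up to the next whole number
y.limits = c(0, ceiling(max(setosa.demo$Petal.Width)))

plot(setosa.demo$Petal.Length, setosa.demo$Petal.Width,
     xlab = "Petal Length", ylab = "Petal Width",
     main = "Setosa - Petal Length vs. Petal Width",
     ylim = y.limits, pch = 16)
```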
We can plot the data using unique symbology based on variables such as petal length, petal width, or species.
First, we will use symbols. Remember that the “pch =” argument sets the symbol used to draw each point. We can use this argument to give each species in our dataset its own symbol.
The code for a plot with petal length by petal width with species coded as unique symbols looks like this:
plot(iris$Petal.Length, iris$Petal.Width, xlab = "Petal Length", ylab = "Petal Width", main = "Petal Length vs. Petal Width",
ylim = c(0,3), pch = as.numeric(iris$Species))
Now, each species has its own symbol because of the argument “pch = as.numeric(iris$Species)”. This plot shows more data; however, it is missing a legend to tell us which species corresponds to each symbol. To create a legend, we use the following code:
plot(iris$Petal.Length, iris$Petal.Width, xlab = "Petal Length", ylab = "Petal Width", main = "Petal Length vs. Petal Width",
ylim = c(0,3), pch = as.numeric(iris$Species))
legend("topleft", legend = levels(iris$Species), pch = 1:3)
The first argument to “legend()” sets the placement of the legend. The second supplies the labels for the legend from the dataset. The third gives the symbols used in the plot.
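The legend lines up with the plot because a factor stores each species as an integer code, numbered in the order of its levels. The sketch below shows that mapping, assuming the built-in datasets::iris matches “iris.csv”; iris.demo, species.levels, and species.codes are hypothetical names.

```r
# Assumption: built-in datasets::iris stands in for "iris.csv"
iris.demo = datasets::iris

# Level names, in the order R stores them
species.levels = levels(iris.demo$Species)

# Integer code of each row's species; these are the pch values used in the plot
species.codes = as.numeric(iris.demo$Species)

# One row per level: the code next to the species name
data.frame(code = seq_along(species.levels), species = species.levels)

# The codes that actually occur in the data
unique(species.codes)
```

Because the first level gets code 1, the second code 2, and the third code 3, passing the level names together with symbols 1 to 3 to “legend()” pairs each label with the symbol drawn for that species.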
We can use a similar technique to create a plot with unique colors for the Fisher dataset. Remember that the “col =” argument sets the color used in the plot. To give each species its own color, we set the symbol with “pch = 16” and set “col =” to the Species column in the iris dataset:
plot(iris$Petal.Length, iris$Petal.Width, xlab = "Petal Length", ylab = "Petal Width", main = "Petal Length vs. Petal Width", col = iris$Species, pch = 16)
legend("topleft", legend = levels(iris$Species), pch = 16, col = 1:3)
Make sure you include a legend for each plot!
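Passing the Species factor to “col =” uses the integer code of each species to pick a color from R's current palette, so the exact colors depend on that palette. If you would rather choose the colors yourself, one option is to index a vector of color names by the factor. This is a sketch under assumptions: the color choices and the names iris.demo, species.colors, and point.colors are hypothetical, and the built-in datasets::iris stands in for “iris.csv”.

```r
# Assumption: built-in datasets::iris stands in for "iris.csv"
iris.demo = datasets::iris

# Hypothetical color choices, one per species, in level order
species.colors = c("steelblue", "darkorange", "forestgreen")

# Indexing by a factor uses its integer codes, giving one color per row
point.colors = species.colors[iris.demo$Species]

plot(iris.demo$Petal.Length, iris.demo$Petal.Width,
     xlab = "Petal Length", ylab = "Petal Width",
     main = "Petal Length vs. Petal Width",
     col = point.colors, pch = 16)

# Reuse the same color vector so the legend matches the points
legend("topleft", legend = levels(iris.demo$Species),
       pch = 16, col = species.colors)
```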
Finally, we combine these techniques to give each species its own color and symbol, while also scaling the symbols based on sepal width. The “cex =” argument scales each symbol by the values you give it. The code to do so looks like this:
plot(iris$Petal.Length, iris$Petal.Width, xlab = "Petal Length", ylab = "Petal Width", main = "Petal Length vs. Petal Width",
ylim = c(0,3), pch = as.numeric(iris$Species), col = iris$Species, cex = iris$Sepal.Width)
legend("topleft", legend = levels(iris$Species), pch = 1:3, col = 1:3)
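Because “cex =” is a multiplier relative to the default symbol size of 1, passing sepal widths directly draws every symbol at least twice the default size (in R's built-in copy of the data the values run from 2.0 to 4.4). If the symbols overlap too much, one option is to rescale the widths to a smaller range first. The sketch below is an assumption-laden illustration: the target range of 0.5 to 2 and the names iris.demo, w, and cex.scaled are hypothetical.

```r
# Assumption: built-in datasets::iris stands in for "iris.csv"
iris.demo = datasets::iris
w = iris.demo$Sepal.Width

# Linearly rescale sepal width to symbol sizes between 0.5 and 2
cex.scaled = 0.5 + (w - min(w)) / (max(w) - min(w)) * 1.5

# Check the smallest and largest symbol size
range(cex.scaled)

plot(iris.demo$Petal.Length, iris.demo$Petal.Width,
     xlab = "Petal Length", ylab = "Petal Width",
     main = "Petal Length vs. Petal Width", ylim = c(0, 3),
     pch = as.numeric(iris.demo$Species), col = iris.demo$Species,
     cex = cex.scaled)
legend("topleft", legend = levels(iris.demo$Species), pch = 1:3, col = 1:3)
```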
A smoothing line is a useful tool to show trends in data. To illustrate this, we can use the original plot we created with the code:
plot(iris$Petal.Length, iris$Petal.Width, xlab = "Petal Length", ylab = "Petal Width", main = "Petal Length vs. Petal Width",
ylim = c(0,3), pch = 16)
Next, to add a smoothing line, we use the command “scatter.smooth()” with the same arguments, which looks like this:
scatter.smooth(iris$Petal.Length, iris$Petal.Width, xlab = "Petal Length", ylab = "Petal Width", main = "Petal Length vs. Petal Width",
ylim = c(0,3), pch = 16)
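“scatter.smooth()” draws the points and a loess curve in one step. If you want to keep an existing plot() call and add the curve on top, for example to give the line its own color, one alternative is to compute the curve with “loess.smooth()” and draw it with “lines()”. This is a sketch, assuming the built-in datasets::iris matches “iris.csv”; iris.demo, smooth.fit, and the line color and width are hypothetical choices.

```r
# Assumption: built-in datasets::iris stands in for "iris.csv"
iris.demo = datasets::iris

# Draw the scatter plot first
plot(iris.demo$Petal.Length, iris.demo$Petal.Width,
     xlab = "Petal Length", ylab = "Petal Width",
     main = "Petal Length vs. Petal Width", ylim = c(0, 3), pch = 16)

# Compute the loess curve: a list with x and y coordinates
smooth.fit = loess.smooth(iris.demo$Petal.Length, iris.demo$Petal.Width)

# Add the curve to the existing plot
lines(smooth.fit, col = "red", lwd = 2)
```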