#install.packages("MASS")
library(MASS)
library(tidyverse)
data("oats")
?oats
1) What does each row of this dataset represent?
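Each row represents one sub-plot of the split-plot field trial. It records the block (B), the oat variety (V), the nitrogen fertilizer level (N) applied to that sub-plot, and the resulting yield (Y).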
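The row layout can be checked directly. This is a minimal sketch, assuming the MASS and tidyverse packages are installed as above. If each row is one sub-plot, the rows should split evenly into the 12 variety/nitrogen combinations, with each combination appearing once per block.

```r
library(MASS)
library(tidyverse)
data("oats")

# One row per sub-plot: 6 blocks x 3 varieties x 4 nitrogen levels
nrow(oats)
head(oats)

# Number of sub-plots for each variety/nitrogen combination (one per block)
oats %>% count(V, N)
```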
2) What do the columns of this dataset represent?
Indicate whether each variable in the study is numerical or categorical. If numerical, identify as continuous or discrete. If categorical, indicate if the variable is ordinal.
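There are four columns: B (block), V (oat variety), N (nitrogen fertilizer level) and Y (yield). B and V are categorical and not ordinal. N is categorical and ordinal, because its levels (0.0cwt, 0.2cwt, 0.4cwt and 0.6cwt per acre) have a natural order; it could also be treated as numerical and discrete, since only four amounts were applied. Y is numerical and continuous, measured in quarter-pounds (1/4 lb) per sub-plot.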
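The variable types can also be confirmed in R. This sketch assumes only the MASS package; the column name `N_ord` is a hypothetical addition used to show how the ordering of the nitrogen levels could be made explicit.

```r
library(MASS)
data("oats")

# Storage class of each column: B, V and N are factors, Y is numeric
sapply(oats, class)

# The nitrogen levels, in cwt per acre
levels(oats$N)

# Whether R already stores N as an ordered factor
is.ordered(oats$N)

# Hypothetical ordered copy of N that makes the ordinal nature explicit
oats$N_ord <- factor(oats$N, levels = levels(oats$N), ordered = TRUE)
```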
4) Create a hypothesis about nitrogen fertilizer concentration levels, without first looking at the data.
The higher the concentration of nitrogen in the fertilizer, the higher the oat yield is likely to be.
Graphics and EDA
5) Plot 1: Use ggplot to create a side-by-side boxplot, which illustrates the yield distribution for each nitrogen fertilizer concentration level and allows for both visual comparison across and within treatments.
ggplot(oats, aes(x = N, y = Y)) +
  geom_boxplot() +
  labs(x = "Nitrogen fertilizer (cwt/acre)", y = "Yield (1/4 lb per sub-plot)")
6) Look at Plot 1. What are your observations from this plot? Does your hypothesis from part (4) appear to be supported?
From this plot, yield appears to increase as the nitrogen level increases: the boxes shift upward from 0.0cwt to 0.6cwt. This supports my hypothesis, although a plot alone does not prove it. Pretty neat!
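The visual trend can be backed up with group summaries. This is a sketch assuming MASS and tidyverse; the object name `yield_by_n` is illustrative. If the mean yield rises at every step from 0.0cwt to 0.6cwt, that is consistent with the hypothesis, though a formal test would be needed to go beyond "supported".

```r
library(MASS)
library(tidyverse)
data("oats")

# Count, mean and median yield (1/4 lb per sub-plot) at each nitrogen level
yield_by_n <- oats %>%
  group_by(N) %>%
  summarise(n = n(), mean_Y = mean(Y), median_Y = median(Y))
yield_by_n

# TRUE if the mean yield increases at every step up in nitrogen
all(diff(yield_by_n$mean_Y) > 0)
```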
7) Plot 2: Now use ggplot to create a side-by-side boxplot, which illustrates the yield distribution for each oat variety treatment. Let’s add some color! Fill the boxes with a different color for each variety.
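ggplot(oats, aes(x = N, y = Y, fill = V)) +
  geom_boxplot() +
  labs(x = "Nitrogen fertilizer (cwt/acre)", y = "Yield (1/4 lb per sub-plot)", fill = "Variety")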
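The plot above places nitrogen on the x-axis and uses fill to separate the varieties. A variant that follows the question's wording more literally puts variety itself on the x-axis, pooling all nitrogen levels into one box per variety. This is a sketch assuming MASS and tidyverse; `plot2_by_variety` is an illustrative name.

```r
library(MASS)
library(tidyverse)
data("oats")

# One box per variety (all nitrogen levels pooled), filled by variety
plot2_by_variety <- ggplot(oats, aes(x = V, y = Y, fill = V)) +
  geom_boxplot() +
  labs(x = "Oat variety", y = "Yield (1/4 lb per sub-plot)", fill = "Variety")
plot2_by_variety
```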
8) Look at Plot 2. What are your observations from this plot? Do any of the varieties stand out as being the best producer? Explain.
From this plot, Golden.rain appears to have higher yields than the other varieties at most nitrogen levels. However, Marvellous has the best yield at the highest nitrogen treatment and the largest single yield overall.
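To compare the varieties with numbers rather than by eye, the yields can be summarised per variety. This sketch assumes MASS and tidyverse; `yield_by_v` is an illustrative name. Sorting by the mean makes it easy to see which variety has the highest average yield across all nitrogen levels.

```r
library(MASS)
library(tidyverse)
data("oats")

# Count, mean and median yield per variety, sorted from highest to lowest mean
yield_by_v <- oats %>%
  group_by(V) %>%
  summarise(n = n(), mean_Y = mean(Y), median_Y = median(Y)) %>%
  arrange(desc(mean_Y))
yield_by_v
```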
9) Plot 3: Add facets to your plot from part (6) to compare yields across nitrogen fertilizer concentration levels and the three oat varieties.
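ggplot(oats, aes(x = N, y = Y, fill = V)) +
  geom_boxplot() +
  facet_grid(. ~ V) +
  labs(x = "Nitrogen fertilizer (cwt/acre)", y = "Yield (1/4 lb per sub-plot)", fill = "Variety")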
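The faceted plot above gives each variety its own panel, which shows the nitrogen trend within a variety well. Question 10 also asks whether the ordering of the varieties stays the same across nitrogen levels. Faceting by nitrogen instead, with variety on the x-axis, puts the three varieties side by side within each panel. This is a sketch assuming MASS and tidyverse; `plot3_by_n` is an illustrative name.

```r
library(MASS)
library(tidyverse)
data("oats")

# One panel per nitrogen level, with the three varieties compared inside each
plot3_by_n <- ggplot(oats, aes(x = V, y = Y, fill = V)) +
  geom_boxplot() +
  facet_grid(. ~ N) +
  labs(x = "Oat variety", y = "Yield (1/4 lb per sub-plot)", fill = "Variety") +
  # Angle the variety labels so they do not overlap in the narrow panels
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
plot3_by_n
```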
10) Look at Plot 3. What are your observations from this plot? Do the trends and relationships you observed in parts (6) and (8) appear to be consistent? For instance, does that ordering of the varieties remain consistent across the levels of the fertilizer concentrations or does it change? What do you observe?
My observations are consistent with parts (6) and (8). Golden.rain appears to have the higher yield trend below the 0.6cwt nitrogen level. Then I noticed that Marvellous has the highest yield at the highest nitrogen concentration (0.6cwt). Regardless of which variety has the largest yield at the highest concentration, yield trends upward for all varieties as the nitrogen concentration increases.
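The claims about variety ordering and the upward trend can be checked against a table of mean yields for every variety/nitrogen combination. This is a sketch assuming MASS and tidyverse; `cell_means` and `trend_by_variety` are illustrative names. The trend check relies on the nitrogen levels being stored in increasing order (0.0cwt to 0.6cwt).

```r
library(MASS)
library(tidyverse)
data("oats")

# Mean yield for every variety/nitrogen combination (6 sub-plots each)
cell_means <- oats %>%
  group_by(V, N) %>%
  summarise(mean_Y = mean(Y), .groups = "drop")

# Wide table: one row per variety, one column per nitrogen level
cell_means %>%
  pivot_wider(names_from = N, values_from = mean_Y)

# Within each variety, does the mean yield rise at every step up in nitrogen?
trend_by_variety <- cell_means %>%
  group_by(V) %>%
  summarise(always_increasing = all(diff(mean_Y) > 0))
trend_by_variety
```

Reading down each nitrogen column of the wide table shows which variety has the highest mean at that level, which is a direct way to answer whether the ordering changes.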
Conclusion
11) What advice would you give the farmer after exploring these data?
I would advise the farmer that Golden.rain consistently gives a higher yield across a range of nitrogen concentrations. However, if the farmer decides to use the highest nitrogen concentration (0.6cwt), then Marvellous looks like the best variety to use in that situation.