March 25, 2020

Library Packages in Use and Data Frame Read-in

Dataset cereal

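library(dplyr)    # select(), filter() and the %>% pipe
library(ggplot2)  # 2D scatter plots with fitted lines
library(plotly)   # interactive 3D scatter plots
cerealAll <- read.csv("cereal.csv", sep = ",", header = TRUE)
# Drop the columns that are not plotted
cerealDisplay <- cerealAll %>% select(-name, -type, -weight, -cups, -vitamins, -shelf)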
head(cerealDisplay)
  mfr calories protein fat sodium fiber carbo sugars potass   rating
1   N       70       4   1    130  10.0   5.0      6    280 68.40297
2   Q      120       3   5     15   2.0   8.0      8    135 33.98368
3   K       70       4   1    260   9.0   7.0      5    320 59.42551
4   K       50       4   0    140  14.0   8.0      0    330 93.70491
5   R      110       2   2    200   1.0  14.0      8     -1 34.38484
6   G      110       2   2    180   1.5  10.5     10     70 29.50954
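Row 5 of the output above shows `potass` = -1, which cannot be a real potassium content. The sketch below assumes that -1 is a placeholder for a missing value. The slides do not state this, so treat it as an assumption. Under that assumption, the code recodes such entries to `NA` in the numeric columns before any lines are fitted. `recode_missing()` is a hypothetical helper name.

```r
# recode_missing(): hypothetical helper. ASSUMES -1 is a missing-value code
# in the numeric nutrition columns and replaces it with NA.
recode_missing <- function(df, code = -1) {
  # Only touch numeric columns, so text columns such as mfr stay as-is
  num_cols <- vapply(df, is.numeric, logical(1))
  df[num_cols] <- lapply(df[num_cols], function(col) {
    col[which(col == code)] <- NA
    col
  })
  df
}

# Usage on the full data set read in above:
# cerealClean <- recode_missing(cerealAll)
# colSums(is.na(cerealClean))   # how many values were recoded per column
```

With the default `na.action`, `lm()` drops rows that contain `NA`, and ggplot2 removes them with a warning. Recoded rows are therefore left out of the fitted lines rather than pulling them toward -1.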

3D Plot of Nutrition Values

Each color is one manufacturer. Click a color in the legend to toggle that manufacturer on or off, and look for patterns within a manufacturer and between the three variables. Play around with turning certain colors on and off!

Code of the Previous Slide

# One color per manufacturer code (A, G, K, N, P, Q, R)
mfrColors <- c('#FF0000','#00FF00','#0000FF','#FFFF00','#00FFFF',
               '#FF00FF','#000000')

# Interactive 3D scatter plot, one trace per manufacturer
fig <- plot_ly(cerealAll, x = ~sugars, y = ~potass, z = ~carbo,
               color = ~mfr, colors = mfrColors)
fig <- fig %>% add_markers()
fig <- fig %>% layout(scene = list(xaxis = list(title = 'Sugars'),
                                   yaxis = list(title = 'Potassium'),
                                   zaxis = list(title = 'Carbohydrates')))

fig

Carbs VS Sugar (General Mills)

\(\text{Sugar} = \beta_0 + \beta_1\cdot \text{Carbs}\)
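The line drawn by `geom_smooth(method = lm)` is the ordinary least-squares fit. For \(n\) cereals with predictor \(x_i\) (here carbohydrates) and response \(y_i\) (here sugar), the estimates that minimize the sum of squared residuals are:

\[
\hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2},
\qquad
\hat\beta_0 = \bar{y} - \hat\beta_1\,\bar{x}
\]

Equivalently, \(\hat\beta_1 = r\,s_y/s_x\), where \(r\) is the Pearson correlation and \(s_x, s_y\) are the sample standard deviations. The sign of the fitted slope therefore always matches the sign of the correlation.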

Carbs VS Potassium (Kellogg's)

\(\text{Potassium} = \beta_0 + \beta_1\cdot \text{Carbs}\)

Code of the Previous Slide

# Subsets by manufacturer code: G = General Mills, K = Kellogg's
cereal1 <- cerealAll %>% filter(mfr == 'G')
cereal2 <- cerealAll %>% filter(mfr == 'K')

# Sugar vs carbohydrates (General Mills) with a least-squares line
ggplot(cereal1, aes(x=carbo, y=sugars)) +
    geom_point(shape=16, color="darkgreen") +
    geom_smooth(method=lm, color="red", formula=y~x) +
    labs(x="Carbohydrates", y="Sugar")
# Potassium vs carbohydrates (Kellogg's) with a least-squares line
ggplot(cereal2, aes(x=carbo, y=potass)) +
    geom_point(shape=18, color="blue") +
    geom_smooth(method=lm, color="orange", formula=y~x) +
    labs(x="Carbohydrates", y="Potassium")
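`geom_smooth()` draws the fitted line but does not print \(\beta_0\) and \(\beta_1\). The minimal sketch below uses base R's `lm()` to recover them for the same subsets, along with \(R^2\). `fit_line()` is a hypothetical helper, and the usage lines assume `cereal1` and `cereal2` from the code above.

```r
# fit_line(): hypothetical helper that fits y = b0 + b1 * x by ordinary
# least squares (the same model geom_smooth(method = lm) draws) and returns
# the estimated coefficients and R-squared.
fit_line <- function(df, x, y) {
  fit <- lm(reformulate(x, response = y), data = df)
  b <- coef(fit)
  c(b0 = unname(b[1]), b1 = unname(b[2]),
    r2 = summary(fit)$r.squared)
}

# Usage with the subsets defined above:
# fit_line(cereal1, "carbo", "sugars")   # Sugar = b0 + b1 * Carbs
# fit_line(cereal2, "carbo", "potass")   # Potassium = b0 + b1 * Carbs
```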

Analysis of the Data

The previous two examples show that these pairs of variables are not strongly correlated, although rotating the 3D plot to certain angles can make a strong correlation appear. No need to fear, though! Looking at more variables may lead to something more interesting.
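The sketch below makes "not strongly correlated" concrete by ranking every pair of numeric nutrition variables by the absolute value of Pearson's \(r\). It uses base R. `rank_correlations()` is a hypothetical helper, and it handles missing values pairwise.

```r
# rank_correlations(): hypothetical helper that lists every pair of numeric
# columns with Pearson's r, strongest (largest |r|) first.
rank_correlations <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]
  r <- cor(num, use = "pairwise.complete.obs")
  # Keep each pair once by taking the upper triangle of the matrix
  idx <- which(upper.tri(r), arr.ind = TRUE)
  out <- data.frame(var1 = rownames(r)[idx[, "row"]],
                    var2 = colnames(r)[idx[, "col"]],
                    r = r[idx])
  out[order(-abs(out$r)), , drop = FALSE]
}

# Usage: head(rank_correlations(cerealDisplay), 10)
```

Non-numeric columns such as `mfr` are skipped automatically, so `cerealDisplay` can be passed as-is.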

Calories, Sugars, Ratings Scatter Space

Using another 3D plot, this time of calories, sugars and rating, we may find variables that are more strongly correlated.
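The slides do not show the code for this plot. The sketch below assumes it reuses the `plot_ly()` pattern from the earlier 3D slide. Which variable goes on which axis is also an assumption. `plot_cal_sugar_rating()` is a hypothetical helper name.

```r
library(plotly)

# plot_cal_sugar_rating(): hypothetical helper that reuses the plot_ly()
# pattern from the earlier 3D slide with calories, sugars and rating.
plot_cal_sugar_rating <- function(df, colors = NULL) {
  plot_ly(df, x = ~sugars, y = ~calories, z = ~rating,
          color = ~mfr, colors = colors) %>%
    add_markers() %>%
    layout(scene = list(xaxis = list(title = "Sugars"),
                        yaxis = list(title = "Calories"),
                        zaxis = list(title = "Rating")))
}

# Usage: plot_cal_sugar_rating(cerealAll, mfrColors)
```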

Sugar VS Calories (Kellogg's)

\(\text{Calories} = \beta_0 + \beta_1\cdot \text{Sugar}\)

Calories VS Ratings (Kellogg's)

\(\text{Calories} = \beta_0 + \beta_1\cdot \text{Rating}\)
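The equation above uses Rating as the predictor and Calories as the response. Swapping the roles gives a different line, not simply the inverted one. The two least-squares slopes multiply to \(r^2\), so they agree only when the correlation is perfect. The base-R sketch below checks this. `compare_directions()` is a hypothetical helper name.

```r
# compare_directions(): hypothetical helper that fits y ~ x and x ~ y and
# returns both OLS slopes together with the squared correlation.
compare_directions <- function(x, y) {
  b_yx <- unname(coef(lm(y ~ x))[2])  # slope of y regressed on x
  b_xy <- unname(coef(lm(x ~ y))[2])  # slope of x regressed on y
  c(b_yx = b_yx, b_xy = b_xy, product = b_yx * b_xy, r2 = cor(x, y)^2)
}

# Usage on the Kellogg's subset (dplyr loaded as in the earlier slides):
# kel <- cerealAll %>% filter(mfr == "K")
# compare_directions(kel$rating, kel$calories)  # Calories = b0 + b1 * Rating
```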

Data Analysis 2

As we have seen, looking at different variables turned up two that correlate more strongly with one another than the earlier pairs did, though the result is still not especially striking. Next we try more scatter plots, this time including all manufacturers rather than isolating a single one (Kellogg's was used above because it had the most cereals in the data).
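You can check the remark about Kellogg's having the most data by counting rows per `mfr` code. The sketch below does this in base R. `count_by_mfr()` is a hypothetical wrapper around `table()`.

```r
# count_by_mfr(): hypothetical helper returning the number of cereals per
# manufacturer code, largest first.
count_by_mfr <- function(df) {
  sort(table(df$mfr), decreasing = TRUE)
}

# Usage: count_by_mfr(cerealAll)
```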

Sugar VS Calories (All)

\(\text{Calories} = \beta_0 + \beta_1\cdot \text{Sugar}\)

Calories VS Ratings (All)

\(\text{Calories} = \beta_0 + \beta_1\cdot \text{Rating}\)

Sugar VS Ratings (All)

\(\text{Rating} = \beta_0 + \beta_1\cdot \text{Sugar}\)
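Pooling all manufacturers can hide or exaggerate a trend that differs between brands. The sketch below fits \(\text{Rating} = \beta_0 + \beta_1\cdot \text{Sugar}\) once on all cereals and once per manufacturer, so the slopes can be compared. `slopes_by_group()` is a hypothetical helper. It skips groups with fewer than 3 cereals, an arbitrary threshold chosen because a line through two points always fits perfectly.

```r
# slopes_by_group(): hypothetical helper. Fits y = b0 + b1 * x by OLS on the
# pooled data and within each group, and returns the slopes b1.
slopes_by_group <- function(df, x, y, group, min_n = 3) {
  f <- reformulate(x, response = y)
  slope <- function(d) unname(coef(lm(f, data = d))[2])
  parts <- split(df, df[[group]])
  # Drop groups too small for a meaningful line (threshold is arbitrary)
  parts <- parts[vapply(parts, nrow, integer(1)) >= min_n]
  c(All = slope(df), vapply(parts, slope, numeric(1)))
}

# Usage: slopes_by_group(cerealAll, "sugars", "rating", "mfr")
```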

Credits

Credits go to:

  • Kaggle for the data set
  • ggplot2 for its good documentation
  • plotly for its good documentation
  • pandoc for converting between the different document formats