by: T.Ryan
This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
Scatter plots can be combined with geom_smooth() to give a visual idea of the mean and of the direction in which the data are trending. Another option is geom_quantile(), which can show several levels of the trend at once, such as the lower 25% or the top 5% of the data. The catch is that you have to be careful, because these lines can be misleading.
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
ggplot(mtcars, aes(x = mpg, y = hp, color = cyl, size = cyl)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess'
With this plot you can see that mpg is higher for cars with lower hp and fewer cylinders. Now let us set method = "lm" in geom_smooth() to draw a straight line instead.
ggplot(mtcars, aes(x = mpg, y = hp, color = cyl, size = cyl)) +
  geom_point() +
  geom_smooth(method = "lm")
method = "lm" fits a linear model, so the smooth is drawn as a straight line.
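As a rough check on what the straight line represents, the same kind of fit can be computed directly with lm(). This is a minimal sketch, assuming the smooth is fit to all 32 cars as a single group (cyl is numeric here, so it should not split the data into groups); the name fit is only for illustration.

```r
# Fit a straight-line model of hp as a function of mpg,
# the same kind of model that method = "lm" draws
fit <- lm(hp ~ mpg, data = mtcars)

# Intercept and slope of the line; a negative slope means
# hp falls as mpg rises
coef(fit)
```

The sign and size of the mpg coefficient put a number on the downward trend seen in the plot.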
ggplot(mtcars, aes(x = mpg, y = hp, color = cyl, size = cyl)) +
  geom_point() +
  geom_smooth(se = TRUE)
## `geom_smooth()` using method = 'loess'
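Another option is se, which controls the shaded confidence band around the smooth line. The band shows how uncertain the fitted trend is, which is not the same as the spread of the individual points. se defaults to TRUE, so this plot matches the first one.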
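To see roughly what a confidence band stands for, the interval can be computed by hand. This sketch uses lm() and predict() rather than the loess smooth in the plot above, purely to keep it short; fit, grid and band are illustrative names, not part of the original notebook.

```r
# Straight-line fit of hp on mpg (a stand-in for the loess smooth)
fit <- lm(hp ~ mpg, data = mtcars)

# Evenly spaced mpg values across the observed range
grid <- data.frame(mpg = seq(min(mtcars$mpg), max(mtcars$mpg), length.out = 50))

# Fitted value plus lower and upper 95% confidence limits at each mpg
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
head(band)
```

In the plot itself, geom_smooth(se = FALSE) hides the band, and the level argument changes the confidence level it is drawn at.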
ggplot(mtcars, aes(x = mpg, y = hp, color = cyl, size = cyl)) +
  geom_point() +
  geom_quantile(quantiles = c(0.25, 0.75))
## Loading required package: SparseM
##
## Attaching package: 'SparseM'
## The following object is masked from 'package:base':
##
## backsolve
## Smoothing formula not specified. Using: y ~ x
geom_quantile() lets you show the viewer different levels of the data, here the 25th and 75th percentiles. The lines are still rigid, though, and could give a false representation of the data.
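The quantile lines can be reproduced, at least approximately, by calling the quantreg package directly. A sketch, assuming quantreg is installed and that the default formula y ~ x reported in the message above corresponds to hp ~ mpg; qfit is an illustrative name.

```r
library(quantreg)

# Linear quantile regression of hp on mpg at the 25th and 75th percentiles
qfit <- rq(hp ~ mpg, tau = c(0.25, 0.75), data = mtcars)

# One column of intercept and slope per quantile
coef(qfit)
```

Each column describes one straight line, which is why the fit stays rigid no matter how the points curve.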
q10 <- seq(0.05, 0.95, by = 0.05)
ggplot(mtcars, aes(x = mpg, y = hp, color = cyl, size = cyl)) +
  geom_point() +
  geom_quantile(quantiles = q10)
## Smoothing formula not specified. Using: y ~ x
## Warning in rq.fit.br(wx, wy, tau = tau, ...): Solution may be nonunique
Use seq() to set how many quantile lines you wish to show. Here it gives 19 lines, from the 5th to the 95th percentile in steps of 5.
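Before plotting, it can help to check how many quantiles seq() actually produced and that they all lie strictly between 0 and 1. A small sketch; q and q_alt are illustrative names.

```r
# Quantiles from 0.05 to 0.95 in steps of 0.05
q <- seq(0.05, 0.95, by = 0.05)

# Number of quantile lines that would be drawn
length(q)

# Guard against a slipped decimal point: every quantile must be in (0, 1)
stopifnot(all(q > 0 & q < 1))

# Alternative: ask for a number of lines instead of a step size
q_alt <- seq(0.05, 0.95, length.out = 19)
```

Using length.out is handy when you care about the number of lines rather than the spacing between them.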
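# q10 is defined here but not passed to geom_quantile(), so the default
# quantiles (0.25, 0.5, 0.75) are drawn; add quantiles = q10 to use it
q10 <- seq(0.20, 0.80, by = 0.2)
ggplot(mtcars, aes(x = mpg, y = hp, color = cyl, size = cyl)) +
  geom_point() +
  geom_quantile(method = "rqss")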
## Smoothing formula not specified. Using: y ~ qss(x, lambda = 1)
method = "rqss" fits smoothed quantile lines, so the plot shows the levels of the data and also follows the flow of the data more closely.
q10 <- seq(0.20, 0.80, by = 0.2)
ggplot(mtcars, aes(x = mpg, y = hp, color = cyl, size = cyl)) +
  geom_point() +
  geom_quantile(method = "rqss", lambda = 0.1)
## Smoothing formula not specified. Using: y ~ qss(x, lambda = 0.1)
Lowering lambda to 0.1 lets the lines follow the movement of the data in more detail.
q10 <- seq(0.20, 0.80, by = 0.2)
ggplot(mtcars, aes(x = mpg, y = hp, color = cyl, size = cyl)) +
  geom_point() +
  geom_quantile(method = "rqss", lambda = 0.05)
## Smoothing formula not specified. Using: y ~ qss(x, lambda = 0.05)
The smaller the value of lambda, the more closely the lines follow the data shown. Unfortunately, the resulting lines are not very smooth.
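To compare several lambda values without copying the plot code each time, the plots can be built in a loop. This is only a sketch: it drops the color and size mappings to keep the example short, assumes ggplot2 and quantreg are installed, and the names lambdas and plots are made up for illustration.

```r
library(ggplot2)

# Lambda values tried in this notebook
lambdas <- c(1, 0.1, 0.05)

# Build one plot per lambda; nothing is drawn until a plot is printed
plots <- lapply(lambdas, function(l) {
  ggplot(mtcars, aes(x = mpg, y = hp)) +
    geom_point() +
    geom_quantile(method = "rqss", lambda = l) +
    ggtitle(paste("lambda =", l))
})

# Print any element, e.g. plots[[2]], to draw it
```

Printing the plots one after another makes the trade-off between smoothness and closeness to the data easier to judge.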