by: T.Ryan

This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.

Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.

library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag():    dplyr, stats
summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000

When using scatter plots, geom_smooth() can add a smoothed line that gives a visual idea of the mean and of the direction in which the data are trending. Another option is geom_quantile(), which shows several levels of the trend at once, such as the lower 25% or the top 5% of the data. The catch is that you have to be careful, because these lines can be misleading.

str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
ggplot(mtcars, aes(x = mpg, y = hp, color = cyl, size = cyl)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess'

With this plot you can see that mpg is higher for cars with lower hp and fewer cylinders. Now let us pass method = "lm" to geom_smooth() to get a straight line.

ggplot(mtcars, aes(x = mpg, y = hp, color = cyl, size = cyl)) +
  geom_point() +
  geom_smooth(method = "lm")

method = "lm" fits a linear model, so the smooth is drawn as a straight line.
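
As a rough check on what the straight line represents, the same kind of fit can be reproduced outside ggplot2 with lm(). This is only a minimal sketch: the object name fit is illustrative, and it assumes the line is fitted to all 32 cars at once, which should hold here because cyl is numeric and so does not split the data into groups.

```r
# Fit the least-squares line of hp on mpg, the same model form (y ~ x)
# that geom_smooth(method = "lm") uses for this plot.
fit <- lm(hp ~ mpg, data = mtcars)

# Intercept and slope of the line; a negative slope means hp falls as mpg rises.
coef(fit)
```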

ggplot(mtcars, aes(x = mpg, y = hp, color = cyl, size = cyl)) +
  geom_point() +
  geom_smooth(se = TRUE)
## `geom_smooth()` using method = 'loess'

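One option is to set se = TRUE, which draws a confidence band around the smoothed line and gives a better idea of how uncertain the line is. TRUE is already the default for se, so this plot looks the same as the first one.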
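
For comparison, here is a sketch of the same scatter plot with the band turned off and with a narrower band. The level argument of geom_smooth() sets the confidence level of the band (0.95 by default); the 0.80 used here is only an illustrative choice, and the color and size mappings are left out to keep the example short.

```r
library(ggplot2)

# Shared scatter plot of hp against mpg.
base <- ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point()

# No confidence band: only the smoothed line is drawn.
p_no_band <- base + geom_smooth(se = FALSE)

# Narrower band: an 80% confidence interval instead of the default 95%.
p_80 <- base + geom_smooth(se = TRUE, level = 0.80)
```

Printing p_no_band or p_80 draws the corresponding plot.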

ggplot(mtcars, aes(x = mpg, y = hp, color = cyl, size = cyl)) +
  geom_point() +
  geom_quantile(quantiles = c(0.25, 0.75))
## Loading required package: SparseM
## 
## Attaching package: 'SparseM'
## The following object is masked from 'package:base':
## 
##     backsolve
## Smoothing formula not specified. Using: y ~ x

geom_quantile() shows the viewer different levels of the data, but the straight lines are still rigid and could give a false representation of the data.
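
geom_quantile() relies on the quantreg package for its fits, so the two lines can also be inspected as numbers. The sketch below assumes quantreg is installed and fits the same 25% and 75% lines with rq(); the name fit_q is illustrative.

```r
library(quantreg)

# Fit the 25% and 75% quantile regression lines of hp on mpg.
fit_q <- rq(hp ~ mpg, tau = c(0.25, 0.75), data = mtcars)

# One column per quantile, holding the intercept and slope of that line.
coef(fit_q)
```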

q10 <- seq(0.05, 0.95, by = 0.05)
ggplot(mtcars, aes(x = mpg, y = hp, color = cyl, size = cyl)) +
  geom_point() +
  geom_quantile(quantiles = q10)
## Smoothing formula not specified. Using: y ~ x
## Warning in rq.fit.br(wx, wy, tau = tau, ...): Solution may be nonunique

Use seq() to set how many quantile lines you wish to show.
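
A small sketch of how the arguments to seq() control the number of lines. The names q_fine and q_coarse are illustrative; either vector could be passed to the quantiles argument of geom_quantile().

```r
# A step of 5% from 5% to 95% gives 19 quantile levels, so 19 lines.
q_fine <- seq(0.05, 0.95, by = 0.05)
length(q_fine)  # 19

# Alternatively, fix the number of lines and let seq() work out the step.
q_coarse <- seq(0.1, 0.9, length.out = 5)
q_coarse  # 0.1 0.3 0.5 0.7 0.9
```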

q10 <- seq(0.20, 0.80, by = 0.2)
ggplot(mtcars, aes(x = mpg, y = hp, color = cyl, size = cyl)) +
  geom_point() +
  geom_quantile(method = "rqss", quantiles = q10)
## Smoothing formula not specified. Using: y ~ qss(x, lambda = 1)

method = "rqss" shows the levels of the data while also following the flow of the data more closely.

q10 <- seq(0.20, 0.80, by = 0.2)
ggplot(mtcars, aes(x = mpg, y = hp, color = cyl, size = cyl)) +
  geom_point() +
  geom_quantile(method = "rqss", quantiles = q10, lambda = 0.1)
## Smoothing formula not specified. Using: y ~ qss(x, lambda = 0.1)

A smaller lambda lets the lines follow the movement of the data in more detail.

q10 <- seq(0.20, 0.80, by = 0.2)
ggplot(mtcars, aes(x = mpg, y = hp, color = cyl, size = cyl)) +
  geom_point() +
  geom_quantile(method = "rqss", quantiles = q10, lambda = 0.05)
## Smoothing formula not specified. Using: y ~ qss(x, lambda = 0.05)

The smaller the value of lambda, the more closely the lines follow the flow of the data shown. Unfortunately, the resulting lines are not very smooth.
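
To compare several lambda values without repeating the chunk, the plots can be built in a loop. This is a sketch only: the names lambdas and plots are illustrative, the color and size mappings are dropped for brevity, and drawing the plots assumes the quantreg package is installed.

```r
library(ggplot2)

# Lambda values used in this notebook, from smoothest to most flexible.
lambdas <- c(1, 0.1, 0.05)

# Build one plot per lambda; nothing is drawn until a plot is printed.
plots <- lapply(lambdas, function(l) {
  ggplot(mtcars, aes(x = mpg, y = hp)) +
    geom_point() +
    geom_quantile(method = "rqss", lambda = l) +
    ggtitle(paste("lambda =", l))
})
```

Printing plots[[1]], plots[[2]] and plots[[3]] in turn shows the lines becoming less smooth as lambda shrinks.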