R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any R code chunks embedded in the document. You can embed an R code chunk like this:

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Including Plots

You can also embed plots, for example:

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.

require(tidyverse)
## Loading required package: tidyverse
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.1.1     v dplyr   1.0.6
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
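roc <- iris %>%
  # Long format: one row per (Species, Metric, Value)
  gather(Metric, Value, -Species) %>%
  # Treat virginica as the positive class
  mutate(Positive = Species == "virginica") %>%
  # Count positives and negatives at each distinct value of each metric
  group_by(Metric, Value) %>%
  summarise(Positive = sum(Positive),
            Negative = n() - sum(Positive)) %>%
  # Sweep thresholds from the highest value down, accumulating rates per metric
  arrange(-Value) %>%
  mutate(TPR = cumsum(Positive) / sum(Positive),
         FPR = cumsum(Negative) / sum(Negative))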
## `summarise()` has grouped output by 'Metric'. You can override using the `.groups` argument.
if (interactive()) View(roc)  # open the data viewer only in an interactive session
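The pipeline above builds each ROC curve by sorting the distinct values in descending order and accumulating the share of positives and negatives seen so far. As a rough illustration of the same idea in base R, here is a minimal sketch on made-up data. The names `scores`, `labels`, `roc_points`, and `toy_roc` are hypothetical and do not come from this document.

```r
# Hypothetical toy data: a higher score should indicate the positive class
scores <- c(0.9, 0.8, 0.8, 0.6, 0.4, 0.3)
labels <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)

roc_points <- function(scores, labels) {
  # Distinct thresholds, highest first (tied scores collapse into one point)
  thresholds <- sort(unique(scores), decreasing = TRUE)
  # Positives and negatives sitting exactly at each threshold
  pos <- vapply(thresholds, function(t) sum(labels[scores == t]), numeric(1))
  neg <- vapply(thresholds, function(t) sum(!labels[scores == t]), numeric(1))
  # Cumulative shares give the true and false positive rates
  data.frame(
    Value = thresholds,
    TPR = cumsum(pos) / sum(pos),
    FPR = cumsum(neg) / sum(neg)
  )
}

toy_roc <- roc_points(scores, labels)
print(toy_roc)
```

Each row is one point of the curve: lowering the threshold to `Value` classifies every observation at or above it as positive.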
roc %>%
  group_by(Metric) %>%
  # Trapezoidal rule: strip width diff(FPR) times mean height (lead(TPR) + TPR) / 2
  summarise(AUC = sum(diff(FPR) * na.omit(lead(TPR) + TPR)) / 2)
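The area under a ROC curve can be approximated with the trapezoidal rule: each step along the FPR axis contributes its width times the mean of the two neighbouring TPR values, so the result always lies between 0 and 1. Below is a minimal base R sketch under stated assumptions: the `fpr` and `tpr` vectors are made-up values, `trapezoid_auc` is a hypothetical helper, and prepending the origin (0, 0) is a choice made for this sketch rather than something the pipeline above does.

```r
# Hypothetical ROC points, ordered from the strictest threshold to the loosest
fpr <- c(0, 1/3, 1/3, 2/3, 1)
tpr <- c(1/3, 2/3, 1, 1, 1)

trapezoid_auc <- function(fpr, tpr) {
  # Anchor the curve at the origin so the first strip is not lost
  fpr <- c(0, fpr)
  tpr <- c(0, tpr)
  # Width of each strip times the mean height of its two endpoints
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

auc <- trapezoid_auc(fpr, tpr)
print(auc)
```

For these toy points the area is 5/6, which matches the share of positive/negative pairs ranked correctly (with ties counted as one half) in the data the points were derived from.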
roc %>% 
  ggplot(aes(x = FPR, y = TPR, color = Metric)) +
  geom_line() +
  geom_abline(lty = 2) +
  xlab("False positive rate (1-specificity)") + 
  ylab("True positive rate (sensitivity)") +
  ggtitle("ROC curves for predicting the virginica iris species")