2018-05-28

Index

  1. Introduction
  2. Overview
  3. Using the package
  4. Related work
  5. Summary and outlook

Why conditional visualisation

When a model involves more than two predictors, there is no direct way to visualize the model behavior.

What is conditional visualisation
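
As a rough sketch of the idea in base R (not taken from the slides; the mtcars model and the choice of medians as the fixed values are assumptions for illustration): fit a model with three predictors, hold two of them fixed, and look at the fitted model along the remaining one. That one-dimensional slice is a section.

```r
# Base R only; condvis is not needed for this sketch.
# Fit a model with three predictors.
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Choose a section: vary wt, hold hp and disp fixed at their medians.
# (Medians are an arbitrary illustrative choice.)
grid <- data.frame(
  wt   = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50),
  hp   = median(mtcars$hp),
  disp = median(mtcars$disp)
)
grid$mpg_hat <- predict(fit, newdata = grid)

# Plot the fitted model along the section.
plot(mpg_hat ~ wt, data = grid, type = "l",
     xlab = "wt", ylab = "predicted mpg")
```

Changing the fixed values of hp and disp moves the section and gives a different slice of the same fitted model.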

R package: condvis

  • Package allows interactively visualising sections through data space, showing fitted models and observed data

  • Built on base R graphics, with option to use Shiny

  • Works for models: lm, glm, gam, svm, randomForest, …

Example: FEV data

  • FEV data: relating lung health and smoking in children (Kahn, 2005)
  • Used for teaching students about conditional relationships
  • Response is forced expiratory volume (FEV), a proxy for lung health
  • Predictors are age, height, male and smoke; the first rows of the data:

    age    fev  height  male  smoke
      9  1.708    57.0     0      0
      8  1.724    67.5     0      0
      7  1.720    54.5     0      0
      9  1.558    53.0     1      0
      9  1.895    57.0     1      0
      8  2.336    61.0     0      0

Boxplot

condvis: ceplot

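library("e1071"); library("condvis")  # svm() and ceplot()
# The data preview lists the sex indicator as `male`; `gender` must exist in fev for this call.
m1 <- svm(fev ~ gender + smoke + age + height, data = fev)
ceplot(data = fev, model = m1, sectionvars = "smoke", type = "separate")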

Data: mtcars

m2 <- lm(mpg ~ wt + hp, data = mtcars)
ceplot(data = mtcars, model = m2, sectionvars = "hp")

Condition selector plot

mtcars$cyl <- as.factor(mtcars$cyl)
m3 <- lm(mpg ~ wt + hp + disp, data = mtcars)
ceplot(data = mtcars, model = m3, 
       sectionvars = c("wt","cyl"), type = "shiny")

Full scatterplot

The plot below is only shown when type = "separate".

m4 <- lm(mpg ~ wt + hp + disp + carb + cyl + vs, data = mtcars)
ceplot(data = mtcars, model = m4, conditionvars = c("wt","cyl","disp"),
       selectortype = "full", type = "separate")

Parallel coordinates

Set selectortype = "pcp"; as with the full scatterplot, this is only shown in separate mode.

m4 <- lm(mpg ~ wt + hp + disp + carb + cyl + vs, data = mtcars)
ceplot(data = mtcars, model = m4, conditionvars = c("wt","cyl","disp"),
       selectortype = "pcp", type = "separate")

Visualizing sections

Distance from the section

\[d(x_i,x_j)=\lVert x_i-x_j \rVert_p+\lambda M(x_i,x_j)\]

Minkowski distance on the numeric elements: \[\lVert x_i-x_j \rVert_p\]

The number of mismatches on the categorical elements: \[M(x_i,x_j)\]
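
The distance above can be sketched in base R as follows. This is an illustration under stated assumptions, not the condvis implementation: section_distance() is a hypothetical helper, the defaults p = 2 and lambda = 1 are arbitrary, and no scaling of the variables is done.

```r
# Distance from a section point to every observation:
# Minkowski distance on numeric variables + lambda * categorical mismatches.
# section_distance() is a hypothetical helper, not a condvis function.
section_distance <- function(point, data, p = 2, lambda = 1) {
  stopifnot(nrow(point) == 1, all(names(point) %in% names(data)))
  vars <- names(point)
  is_num <- vapply(data[vars], is.numeric, logical(1))

  # Minkowski distance over the numeric variables.
  num_vars <- vars[is_num]
  d_num <- if (length(num_vars) > 0) {
    diffs <- abs(sweep(as.matrix(data[num_vars]), 2,
                       unlist(point[num_vars])))
    rowSums(diffs^p)^(1 / p)
  } else {
    0
  }

  # Number of mismatches over the categorical variables.
  cat_vars <- vars[!is_num]
  mismatches <- Reduce(`+`, lapply(cat_vars, function(v) {
    as.character(data[[v]]) != as.character(point[[v]])
  }), 0)

  d_num + lambda * mismatches
}

# Usage: mtcars with cyl treated as categorical.
cars <- mtcars
cars$cyl <- as.factor(cars$cyl)

# Section point: wt = 3, cyl = 6.
point <- data.frame(wt = 3, cyl = "6")
d <- section_distance(point, cars[c("wt", "cyl")], p = 2, lambda = 1)

# Observations within an illustrative cut-off of 0.5 count as near the section.
near <- rownames(cars)[d < 0.5]
```

With lambda = 1, a car with a different number of cylinders is pushed at least one unit away from the section, so only 6-cylinder cars with wt close to 3 fall inside the 0.5 cut-off.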

Distance from the section

Using the package

Graphic type and layout

  • Default option: section and condition selector plots on the same device.

  • Separate option: type = "separate"; section and condition selector plots on two different devices.
  • Shiny option: type = "shiny"; similar to the default but allows some extra interactivity.

Interacting with the graphics

  • Interaction is via the arrow keys and the mouse; pressing the "s" key saves a snapshot.

Example using the FEV data

library("randomForest"); library("mgcv"); library("covreg")
# Fit three models to the same data and compare them in one ceplot call.
m3 <- list(RF = randomForest(fev ~ ., data = fev),
           lm = lm(fev ~ ., data = fev),
           gam = mgcv::gam(fev ~ smoke + gender + age + height, data = fev))
ceplot(data = fev, model = m3, sectionvars = "smoke", type = "default")

Example using the powerplant data

library("e1071"); library("mgcv")
m4 <- list(svm = svm(PE ~ ., data = powerplant),
           gam = gam(PE ~ AT + V + AP + RH, data = powerplant))
ceplot(data = powerplant, model = m4["svm"], sectionvars = "AT",
       type = "separate")

Powerplant example 2

ceplot(data = powerplant, model = m4, sectionvars = "AT",
       type = "separate", threshold = 0.5)

ceplot(data = powerplant, model = m4["svm"], sectionvars = c("AT", "V"),
       type = "separate", view3d = TRUE, threshold = 0.2)

Wine data

library("randomForest")
data("wine", package = "condvis")
wine$Class <- as.factor(wine$Class)
m5 <- randomForest(Class ~ Alcohol + Malic + Ash + Magnesium +
                     Phenols + Flavanoids, data = wine)
  • It is difficult to comprehend a region in six dimensions, but a two-dimensional section is straightforward.

First, Alcohol and Phenols

ceplot(data = wine, model = m5, sectionvars = c("Alcohol", "Phenols"),
       type = "shiny")

Parallel coordinates condition selector

ceplot(data = wine, model = m5, sectionvars = c("Alcohol", "Phenols"),
       type = "separate", selectortype = "pcp", threshold = 2)

Related work

Summary

  • The condvis package allows the user to interactively take 2-D and 3-D sections in data space and visualize fitted models where they intersect the section.

  • The strength of condvis lies in creating low-dimensional visualizations of fitted models in high-dimensional space.

  • The method does not suit situations where categorical predictors have more than 4 or 5 levels.