% Spatio-Temporal Imputation Methods: A Case Study
% Lingbing Feng
% 2012/10/29

Missing Value

Missing Piece


Agenda

Spatio-Temporal Data

Spatio-temporal data: data with spatial (geographic) and temporal components, plus attributes.

You can think of it as a set of multivariate time series that are geographically distributed.

Examples

Why impute?

It is difficult when

Our Goal

  1. A simple, intuitive method

  2. Takes both spatial and temporal information into consideration

  3. Computationally efficient

What we have achieved

Related Literature

We found

SVD imputation

** We consider it the competitor **

Methods: about SVD

** The “crown jewel” of linear algebra (Gilbert Strang, 2009) **

A little more

How does SVDmiss() work?
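As a rough illustration of SVD-based imputation in general, the sketch below fills missing cells with column means, takes a low-rank SVD approximation, overwrites only the missing cells with the approximated values, and repeats until the filled values stop changing. The function name `svd_impute` and its arguments `rank`, `tol`, and `max_iter` are assumptions made for this example; they are not the interface of SVDmiss(), and the sketch is not claimed to match its internals.

```r
# Generic iterative SVD imputation (illustrative sketch only).
# `svd_impute`, `rank`, `tol`, `max_iter` are hypothetical names.
svd_impute <- function(X, rank = 1, tol = 1e-6, max_iter = 200) {
  X <- as.matrix(X)
  miss <- is.na(X)
  if (!any(miss)) return(X)  # nothing to impute

  # Step 1: start from the column means of the observed values.
  col_means <- colMeans(X, na.rm = TRUE)
  X[miss] <- col_means[col(X)[miss]]

  for (i in seq_len(max_iter)) {
    # Step 2: rank-`rank` approximation of the currently filled matrix.
    s <- svd(X, nu = rank, nv = rank)
    approx <- s$u %*% diag(s$d[seq_len(rank)], nrow = rank) %*% t(s$v)

    # Step 3: update the missing cells only, then check how much they moved.
    change <- sqrt(mean((X[miss] - approx[miss])^2))
    X[miss] <- approx[miss]
    if (change < tol) break
  }
  X
}

# Usage: a matrix that is exactly rank 1, with one cell removed.
truth <- outer(1:6, c(1, 2, 3))
obs <- truth
obs[2, 3] <- NA
filled <- svd_impute(obs, rank = 1)
filled[2, 3]  # compare with truth[2, 3]
```

The observed cells are never modified, so the quality of the result depends on how well a low-rank structure describes the data and on the choice of `rank`.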

Comments on SVDmiss()

Our method: the general idea

Illustration

Some Notes

Case Study in Australia: Data and EDA

Representation (Space-wide format)

        date X048039 X049023 X050004 X050018 X050028 X050031 X050052
1 1911-01-01    33.2    58.3      NA    80.2    95.0   189.4   116.4
2 1911-02-01   149.5   133.6      NA    60.2    70.1    57.2    68.9
3 1911-03-01    14.0    22.8      NA    17.8    41.2    30.0    42.4
4 1911-04-01     0.0     0.8     8.1     0.0     0.0     0.0     0.4
5 1911-05-01    47.4    39.9    42.0    72.0    61.3    45.5    31.1
6 1911-06-01     6.6    23.0    16.5    16.3    12.9    30.5    13.4
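With the data in this space-wide layout (one row per month, one column per station), the share of missing months per station can be read off column by column. The sketch below rebuilds only the six rows printed above in a data frame called `rain` (a name assumed here for illustration) and computes that share.

```r
# Rebuild the six rows printed above; `rain` is an illustrative name.
rain <- data.frame(
  date    = as.Date(c("1911-01-01", "1911-02-01", "1911-03-01",
                      "1911-04-01", "1911-05-01", "1911-06-01")),
  X048039 = c(33.2, 149.5, 14.0, 0.0, 47.4, 6.6),
  X049023 = c(58.3, 133.6, 22.8, 0.8, 39.9, 23.0),
  X050004 = c(NA, NA, NA, 8.1, 42.0, 16.5),
  X050018 = c(80.2, 60.2, 17.8, 0.0, 72.0, 16.3),
  X050028 = c(95.0, 70.1, 41.2, 0.0, 61.3, 12.9),
  X050031 = c(189.4, 57.2, 30.0, 0.0, 45.5, 30.5),
  X050052 = c(116.4, 68.9, 42.4, 0.4, 31.1, 13.4)
)

# Proportion of missing months per station (drop the date column first).
miss_rate <- colMeans(is.na(rain[, -1]))

# Stations ordered from most to least missing.
sort(miss_rate, decreasing = TRUE)
```

In these six rows only station X050004 has gaps (its first three months).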

Location of stations

Missing Pattern (spatially)


Structure Heatmap (stations reordered by missingness)

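library(ggplot2)              # theme() and element_blank() come from ggplot2
source(file = "functions.R")  # project helper functions
HeatStruct(hqmr.cube.reorder) + theme(axis.text.x = element_blank())  # hide x-axis labels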


Cross-Validation
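The slides do not spell out the cross-validation scheme, so the following is only a sketch of one common setup for imputation methods: hide a random share of the observed cells, impute, and score the imputed values against the hidden truth with RMSE. The names `cv_rmse`, `impute_fun`, `holdout`, and `mean_impute` are assumptions for this example, and the column-mean imputer is a stand-in rather than either of the methods compared in this talk.

```r
# Hold out a random share of the observed cells, impute, and score on them.
# `impute_fun` maps a matrix with NAs to a completed matrix (hypothetical interface).
cv_rmse <- function(X, impute_fun, holdout = 0.1) {
  X <- as.matrix(X)
  observed <- which(!is.na(X))
  n_test <- max(1, floor(holdout * length(observed)))
  test_idx <- sample(observed, size = n_test)

  X_train <- X
  X_train[test_idx] <- NA          # hide the held-out cells
  X_hat <- impute_fun(X_train)     # impute them

  sqrt(mean((X_hat[test_idx] - X[test_idx])^2))
}

# Stand-in imputer for the demo: fill each column with its observed mean.
mean_impute <- function(X) {
  m <- colMeans(X, na.rm = TRUE)
  miss <- is.na(X)
  X[miss] <- m[col(X)[miss]]
  X
}

set.seed(1)
X <- matrix(rnorm(200, mean = 50, sd = 10), nrow = 40)
cv_rmse(X, mean_impute)
```

Any imputation function with the same matrix-in, matrix-out shape can be passed as `impute_fun`, which makes it easy to score two methods on identical held-out cells by fixing the seed.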

CV results

Performance (CUTOFF)


Comparison


Comparison (more)


Comments

Simulation should be able to:

Reviewing the structure heatmap


Structure Analysis for simulation


Simulation workhorse (a missing vector): big idea

** Many thanks to Alan for the big idea and to Gen for helping me so much with the programme! **

Algorithm: MissingSimulation()

Specification for our data

Try MissingSimulation()

 [1] 0 0 1 1 0 0 1 1 1 0 0 1 0 1 0 1 1 1 0 1
 [1] 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0

And you can play around with it to fit your needs.
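For readers who want something to experiment with, here is a minimal sketch of one way to generate a 0/1 missingness vector with runs of consecutive gaps, using a two-state Markov chain. This is not the MissingSimulation() function from the slides, whose algorithm is not reproduced here; the name `simulate_missing`, the parameters `p_start` and `p_stay`, and the convention that 1 marks a missing point are all assumptions for this example.

```r
# Hypothetical helper, NOT the MissingSimulation() from the slides.
# Assumed convention: 1 = missing, 0 = observed.
# p_start: probability that an observed point is followed by a missing one
#          (also used for the first point).
# p_stay:  probability that a missing point is followed by another missing one.
simulate_missing <- function(n, p_start = 0.2, p_stay = 0.6) {
  stopifnot(n >= 1)
  z <- integer(n)
  z[1] <- rbinom(1, 1, p_start)
  for (t in seq_len(n)[-1]) {
    p <- if (z[t - 1] == 1) p_stay else p_start
    z[t] <- rbinom(1, 1, p)
  }
  z
}

set.seed(2012)
simulate_missing(20)                               # longer, more frequent gaps
simulate_missing(20, p_start = 0.05, p_stay = 0.3) # sparse, short gaps
```

Raising `p_stay` lengthens the runs of missing values, while `p_start` controls how often a new gap begins.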

Integration to our data problem

HeatStruct(BlockMissing()) + theme(axis.text.x = element_blank())


Evaluating Imputation Performance for both methods

Comparison

Choosing \( \rho \) by simulation

Choosing \( \rho \) by simulation (cont'd)

Conclusion

Outlook and future work

About the slides and package

Thanks & Questions