Instructions

Using R, keep counters of equal-width grid-cells (base counters for micro-cluster definitions) of a 2-dimensional continuous data stream using different window models (landmark, sliding, weighted, fading).

The code should work with an evolving stream from a single record of the PhysioNet Challenge https://physionetchallenges.github.io/2020/

This training set consists of 6,877 (male: 3,699; female: 3,178) 12-lead ECG recordings lasting from 6 seconds to 60 seconds. Each recording was sampled at 500 Hz. All data is provided in WFDB format with a MATLAB v4 file and a header containing patient sex, age, and diagnosis (Dx) information at the end of the header file. The code should be applicable to one 12-dimensional record file.

Packages ‘R.matlab’ (for reading data from ‘.mat’ files) and ‘stream’ (for accessing the data as a stream) should be used.

Submitted code file should include comments to improve readability.

Setup

First, load the libraries:

Import the data using readMat from the R.matlab package. The data comes wrapped in a list, so we extract its first element to work directly with the matrix:

Now we create the data stream interface for the leads I and V2:

Memory Stream Interface
Class: DSD_Memory, DSD_R, DSD_data.frame, DSD 
With NA clusters in 2 dimensions 
Contains 21500 data points - currently at position 1 - loop is FALSE 

Next, let’s create the matrix that will hold the counters for the grid. The database is stored in 16 bits, so values can range from −32768 to 32767. Nevertheless, for the purpose of this exercise, a quick inspection of the .mat files shows that a range of −8192 to 8191 (14 bits) is enough. The code below narrows the range further to ±f_n (3000) to make the plots easier to read.

Now we are ready to implement the four algorithms for the grid counting: landmark, sliding, weighted, fading.

Landmark

The landmark model is not really a ‘window’. It simply keeps updating the counters as new points arrive, and nothing is ever forgotten:


Wait for the animation (about 3 seconds with apparently no changes)
The cells that have a value > 0 are plotted in ice-white color.
The maximum value is green.

Sliding Window

The sliding window is a little bit different. We use a fixed window size and, at each new observation, drop the oldest one once it no longer fits in the window. Because this is a counter matrix, no counter can exceed the window size, and each time a point leaves the window we subtract one from the counter of the cell it fell in:

# Sliding Window algorithm
f_n <- 3000 # The normalizing factor, so we focus on a smaller interval, just for better plotting
n <- 300 # we don't want to reach the end of the dataset in this example.
grid <- grid_base # let's make a copy of the original grid.
reset_stream(stream)

w <- 100 # the window size
window <- list()

invisible(saveGIF( # save all graphics in an animated gif
  for (i in seq_len(n)) {
    # retrieve one observation from stream
    points <- get_points(stream)

    # normalize the value to add to the matrix
    points <- points / f_n

    # transform the values to match the matrix indexes
    points <- floor((points + 1) * d / 2 + 0.5)

    # at the beginning, just count as the landmark did
    grid[points[, 1], points[, 2]] <- grid[points[, 1], points[, 2]] + 1

    # store the current points in window. `as.character` is used to store them with labels instead of index,
    # avoiding the creation of several NULL's if `i` doesn't follow the sequence.
    window[[as.character(i)]] <- points

    if (length(window) > w) {
      # as the window advances, start subtracting values of the oldest points
      grid[window[[1]][, 1], window[[1]][, 2]] <- grid[window[[1]][, 1], window[[1]][, 2]] - 1
      window <- window[-1] # removes the oldest point
    }

    # plot the grid. Using palette "Greens 2", so the white is actually an 'ice' color
    image(grid,
      main = "Sliding Window", col = hcl.colors(2^14, palette = "Greens 2", rev = TRUE),
      zlim = c(1e-5, max(grid)), xlab = names(points)[1], ylab = names(points)[2],
      xaxt = "n", yaxt = "n"
    )
    axis(1, at = seq(0, 1, length.out = d), labels = seq(-f_n, f_n, length.out = d))
    axis(2, at = seq(0, 1, length.out = d), labels = seq(-f_n, f_n, length.out = d))
  },
  "window.gif",
  interval = 0.01, autobrowse = FALSE, ani.res = 96, ani.height = 500
))

Wait for the animation (about 3 seconds with apparently no changes)
The cells that have a value > 0 are plotted in ice-white color.
The maximum value is green.

Weighted Window

The weighted window applies an alpha factor that reduces the weight of older observations. We still have to keep the observations in the window list, so that what remains of an observation can be removed once it leaves the window.

# Weighted Sliding Window algorithm
f_n <- 3000 # The normalizing factor, so we focus on a smaller interval, just for better plotting
n <- 300 # we don't want to reach the end of the dataset in this example.
grid <- grid_base # let's make a copy of the original grid.
reset_stream(stream)

w <- 100 # the window size
eps <- 0.05
alpha <- eps^(1 / w)
window <- list()

invisible(saveGIF( # save all graphics in an animated gif
  for (i in seq_len(n)) {
    # retrieve one observation from stream
    points <- get_points(stream)

    # normalize the value to add to the matrix
    points <- points / f_n

    # transform the values to match the matrix indexes
    points <- floor((points + 1) * d / 2 + 0.5)

    # apply alpha to the entire grid and then sum the next one
    grid <- grid * alpha
    grid[points[, 1], points[, 2]] <- grid[points[, 1], points[, 2]] + 1

    # store the current points in window. `as.character` is used to store them with labels instead of index,
    # avoiding the creation of several NULL's if `i` doesn't follow the sequence.
    window[[as.character(i)]] <- points

    if (length(window) > w) {
      # as the window advances, remove what is left of the oldest point: it was
      # added w steps ago, so after the decay above its remaining weight is alpha^w
      grid[window[[1]][, 1], window[[1]][, 2]] <- grid[window[[1]][, 1], window[[1]][, 2]] - alpha^w
      window <- window[-1] # removes the oldest point
    }

    # plot the grid. Using palette "Greens 2", so the white is actually an 'ice' color
    image(grid,
      main = "Weighted Sliding Window", col = hcl.colors(2^14, palette = "Greens 2", rev = TRUE),
      zlim = c(1e-5, max(grid)), xlab = names(points)[1], ylab = names(points)[2],
      xaxt = "n", yaxt = "n"
    )
    axis(1, at = seq(0, 1, length.out = d), labels = seq(-f_n, f_n, length.out = d))
    axis(2, at = seq(0, 1, length.out = d), labels = seq(-f_n, f_n, length.out = d))
  },
  "weighted.gif",
  interval = 0.01, autobrowse = FALSE, ani.res = 96, ani.height = 500
))

Wait for the animation (about 3 seconds with apparently no changes)
The cells that have a value > 0 are plotted in ice-white color.
The maximum value is green.

Fading Window

Finally, the fading window is quite similar to the weighted window. The only difference is that we do not keep any window: we just apply alpha to all counters at every step, so old observations fade out but are never explicitly removed.


Wait for the animation (about 3 seconds with apparently no changes)
The cells that have a value > 0 are plotted in ice-white color.
The maximum value is green.


---
title: "HEADS - HIDA: Assignment 1"
output: 
  html_notebook: 
    highlight: pygments
    theme: united
    toc: yes
author: Francisco Bischoff
---

## Instructions

Using R, keep counters of equal-width grid-cells (base counters for micro-cluster definitions) of a 2-dimensional continuous data stream using different window models (landmark, sliding, weighted, fading).

The code should work with an evolving stream from a single record of the PhysioNet Challenge https://physionetchallenges.github.io/2020/

This training set consists of 6,877 (male: 3,699; female: 3,178) 12-lead ECG recordings lasting from 6 seconds to 60 seconds. Each recording was sampled at 500 Hz. All data is provided in WFDB format with a MATLAB v4 file and a header containing patient sex, age, and diagnosis (Dx) information at the end of the header file. The code should be applicable to one 12-dimensional record file.

Packages 'R.matlab' (for reading data from '.mat' files) and 'stream' (for accessing the data as a stream) should be used.

Submitted code file should include comments to improve readability.

## Setup

First, load the libraries:

```{r setup, message = FALSE}
library(R.matlab)
library(stream)
library(animation) # animated gifs!
```

Import the data using `readMat` from the `R.matlab` package. The data comes wrapped in a `list`, so we extract its first element to work directly with the `matrix`:

```{r import}
# Import the data ----
data <- readMat("A2020.mat")
data <- data[[1]] # get rid of List
data <- t(data) # transpose the matrix, so each column represents one lead.
data <- as.data.frame(data)
colnames(data) <- c("I", "II", "III", "aVR", "aVL", "aVF", "V1", "V2", "V3", "V4", "V5", "V6")
```

Now we create the data stream interface for the leads `I` and  `V2`:

```{r streaming}
# Set seed for replication purposes ----
set.seed(2020)

# Create the Data Streaming obj----
stream <- DSD_Memory(data[, c("I", "V2")], n = NULL) # n is NULL just to silence linter warnings
stream
```

Next, let's create the matrix that will hold the counters for the grid. The database is stored in 16 bits, so values can range from −32768 to 32767. Nevertheless, for the purpose of this exercise, a quick inspection of the .mat files shows that a range of −8192 to 8191 (14 bits) is enough. The chunks below narrow the range further to ±`f_n` (3000) to make the plots easier to read.

```{r the_grid}
# Create the grid ----
d <- 17 # this is the dimension of the grid (17x17)
grid_base <- matrix(0, nrow = d, ncol = d) # matrix filled with zeroes
```
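
The chunks below turn each value into a matrix index with `floor((x / f_n + 1) * d / 2 + 0.5)`. As a minimal sketch of that mapping, the hypothetical helper `to_cell()` (not part of the original notebook) applies the same formula and additionally clamps the result to `1..d`. The clamping is an assumption added here: without it, values close to `-f_n` map to index 0, which R silently ignores, and values well above `f_n` map past `d`, which raises a "subscript out of bounds" error.

```r
# Hypothetical helper (an illustration, not the notebook's own code):
# map raw values to grid indexes in 1..d, clamping out-of-range values
# to the border cells.
to_cell <- function(x, f_n = 3000, d = 17) {
  idx <- floor((x / f_n + 1) * d / 2 + 0.5) # same formula as the chunks below
  pmin(pmax(idx, 1), d)                     # keep the index inside the matrix
}

to_cell(c(-5000, -3000, 0, 3000, 5000))
# 1 1 9 17 17
```

Whether clamping or discarding out-of-range points is preferable depends on how the border cells should be interpreted; the sketch only shows one option.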

Now we are ready to implement the four algorithms for the grid counting: landmark, sliding, weighted, fading.

## Landmark

The landmark model is not really a 'window'. It simply keeps updating the counters as new points arrive, and nothing is ever forgotten:

```{r landmark, fig.height=5, fig.width=5}
# Landmark algorithm
f_n <- 3000 # The normalizing factor, so we focus on a smaller interval, just for better plotting
n <- 300 # we don't want to reach the end of the dataset in this example.
grid <- grid_base # let's make a copy of the original grid.
reset_stream(stream)

invisible(saveGIF( # save all graphics in an animated gif
  for (i in seq_len(n)) {
    # retrieve one observation from stream
    points <- get_points(stream)

    # normalize the value to add to the matrix
    points <- points / f_n

    # transform the values to match the matrix indexes
    points <- floor((points + 1) * d / 2 + 0.5)

    grid[points[, 1], points[, 2]] <- grid[points[, 1], points[, 2]] + 1

    # plot the grid. Using palette "Greens 2", so the white is actually an 'ice' color
    image(grid,
      main = "Landmark", col = hcl.colors(2^14, palette = "Greens 2", rev = TRUE),
      zlim = c(1e-5, max(grid)), xlab = names(points)[1], ylab = names(points)[2],
      xaxt = "n", yaxt = "n"
    )
    axis(1, at = seq(0, 1, length.out = d), labels = seq(-f_n, f_n, length.out = d))
    axis(2, at = seq(0, 1, length.out = d), labels = seq(-f_n, f_n, length.out = d))
  },
  "landmark.gif",
  interval = 0.01, autobrowse = FALSE, ani.res = 96, ani.height = 500
))
```

<center>![](landmark.gif)
<br>Wait for the animation (about 3 seconds with apparently no changes)
<br>The cells that have a value > 0 are plotted in ice-white color.
<br>The maximum value is green.
</center>
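
To see the landmark counters without the animation, the following sketch counts 300 points on a 17x17 grid. It runs on synthetic Gaussian data rather than on the ECG record, and the names `xy`, `cells`, `d_demo`, `f_demo` and `landmark` are placeholders chosen for this illustration. The index mapping is the one used above, with an added clamp to `1..d_demo`.

```r
# Synthetic stand-in for two ECG leads (an assumption; not PhysioNet data).
set.seed(1)
xy <- cbind(rnorm(300, sd = 800), rnorm(300, sd = 800))

d_demo <- 17   # grid dimension
f_demo <- 3000 # normalizing factor

# Same index mapping as the notebook, clamped to 1..d_demo.
cells <- pmin(pmax(floor((xy / f_demo + 1) * d_demo / 2 + 0.5), 1), d_demo)

landmark <- matrix(0, nrow = d_demo, ncol = d_demo)
for (i in seq_len(nrow(cells))) {
  # A two-column index matrix addresses exactly one cell per row.
  ij <- cells[i, , drop = FALSE]
  landmark[ij] <- landmark[ij] + 1
}

sum(landmark) # every point is counted once and never forgotten: 300
```

Because nothing is ever removed, the total of all counters always equals the number of points seen so far.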

## Sliding Window

The sliding window is a little bit different. We use a fixed window size and, at each new observation, drop the oldest one once it no longer fits in the window. Because this is a counter matrix, no counter can exceed the window size, and each time a point leaves the window we subtract one from the counter of the cell it fell in:

```{r sliding}
# Sliding Window algorithm
f_n <- 3000 # The normalizing factor, so we focus on a smaller interval, just for better plotting
n <- 300 # we don't want to reach the end of the dataset in this example.
grid <- grid_base # let's make a copy of the original grid.
reset_stream(stream)

w <- 100 # the window size
window <- list()

invisible(saveGIF( # save all graphics in an animated gif
  for (i in seq_len(n)) {
    # retrieve one observation from stream
    points <- get_points(stream)

    # normalize the value to add to the matrix
    points <- points / f_n

    # transform the values to match the matrix indexes
    points <- floor((points + 1) * d / 2 + 0.5)

    # at the beginning, just count as the landmark did
    grid[points[, 1], points[, 2]] <- grid[points[, 1], points[, 2]] + 1

    # store the current points in window. `as.character` is used to store them with labels instead of index,
    # avoiding the creation of several NULL's if `i` doesn't follow the sequence.
    window[[as.character(i)]] <- points

    if (length(window) > w) {
      # as the window advances, start subtracting values of the oldest points
      grid[window[[1]][, 1], window[[1]][, 2]] <- grid[window[[1]][, 1], window[[1]][, 2]] - 1
      window <- window[-1] # removes the oldest point
    }

    # plot the grid. Using palette "Greens 2", so the white is actually an 'ice' color
    image(grid,
      main = "Sliding Window", col = hcl.colors(2^14, palette = "Greens 2", rev = TRUE),
      zlim = c(1e-5, max(grid)), xlab = names(points)[1], ylab = names(points)[2],
      xaxt = "n", yaxt = "n"
    )
    axis(1, at = seq(0, 1, length.out = d), labels = seq(-f_n, f_n, length.out = d))
    axis(2, at = seq(0, 1, length.out = d), labels = seq(-f_n, f_n, length.out = d))
  },
  "window.gif",
  interval = 0.01, autobrowse = FALSE, ani.res = 96, ani.height = 500
))
```

<center>![](window.gif)
<br>Wait for the animation (about 3 seconds with apparently no changes)
<br>The cells that have a value > 0 are plotted in ice-white color.
<br>The maximum value is green.
</center>
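
One way to check the sliding-window bookkeeping is the invariant that the counters always sum to the number of points currently in the window. The sketch below asserts this at every step. It is an illustration under assumptions: the cell indexes are random rather than derived from the ECG stream, and a fixed-size circular buffer (`buffer`) replaces the growing `list` used above, which is a design alternative and not the notebook's method.

```r
set.seed(2)
# Synthetic cell indexes (an assumption; not derived from the ECG record).
cells <- matrix(sample(1:17, 2 * 250, replace = TRUE), ncol = 2)

w_demo <- 100                                          # window size
sliding <- matrix(0, nrow = 17, ncol = 17)             # the counters
buffer <- matrix(NA_integer_, nrow = w_demo, ncol = 2) # circular buffer of cells

for (i in seq_len(nrow(cells))) {
  slot <- (i - 1) %% w_demo + 1 # buffer position that is overwritten at step i
  if (i > w_demo) {
    # the slot still holds the point that entered w_demo steps ago: expire it
    old <- buffer[slot, , drop = FALSE]
    sliding[old] <- sliding[old] - 1
  }
  new <- cells[i, , drop = FALSE]
  sliding[new] <- sliding[new] + 1
  buffer[slot, ] <- new

  # invariant: the counters sum to the number of points inside the window
  stopifnot(sum(sliding) == min(i, w_demo))
}
```

The circular buffer keeps memory at exactly `w_demo` rows and avoids reallocating the window at every step.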

## Weighted Window

The weighted window applies an alpha factor that reduces the weight of older observations. We still have to keep the observations in the window list, so that what remains of an observation can be removed once it leaves the window.

```{r weighted}
# Weighted Sliding Window algorithm
f_n <- 3000 # The normalizing factor, so we focus on a smaller interval, just for better plotting
n <- 300 # we don't want to reach the end of the dataset in this example.
grid <- grid_base # let's make a copy of the original grid.
reset_stream(stream)

w <- 100 # the window size
eps <- 0.05
alpha <- eps^(1 / w)
window <- list()

invisible(saveGIF( # save all graphics in an animated gif
  for (i in seq_len(n)) {
    # retrieve one observation from stream
    points <- get_points(stream)

    # normalize the value to add to the matrix
    points <- points / f_n

    # transform the values to match the matrix indexes
    points <- floor((points + 1) * d / 2 + 0.5)


    # apply alpha to the entire grid and then sum the next one
    grid <- grid * alpha
    grid[points[, 1], points[, 2]] <- grid[points[, 1], points[, 2]] + 1

    # store the current points in window. `as.character` is used to store them with labels instead of index,
    # avoiding the creation of several NULL's if `i` doesn't follow the sequence.
    window[[as.character(i)]] <- points

    if (length(window) > w) {
      # as the window advances, remove what is left of the oldest point: it was
      # added w steps ago, so after the decay above its remaining weight is alpha^w
      grid[window[[1]][, 1], window[[1]][, 2]] <- grid[window[[1]][, 1], window[[1]][, 2]] - alpha^w
      window <- window[-1] # removes the oldest point
    }

    # plot the grid. Using palette "Greens 2", so the white is actually an 'ice' color
    image(grid,
      main = "Weighted Sliding Window", col = hcl.colors(2^14, palette = "Greens 2", rev = TRUE),
      zlim = c(1e-5, max(grid)), xlab = names(points)[1], ylab = names(points)[2],
      xaxt = "n", yaxt = "n"
    )
    axis(1, at = seq(0, 1, length.out = d), labels = seq(-f_n, f_n, length.out = d))
    axis(2, at = seq(0, 1, length.out = d), labels = seq(-f_n, f_n, length.out = d))
  },
  "weighted.gif",
  interval = 0.01, autobrowse = FALSE, ani.res = 96, ani.height = 500
))
```

<center>![](weighted.gif)
<br>Wait for the animation (about 3 seconds with apparently no changes)
<br>The cells that have a value > 0 are plotted in ice-white color.
<br>The maximum value is green.
</center>
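
With `alpha <- eps^(1 / w)`, a point that entered `w` steps ago has decayed to exactly `alpha^w = eps`, and that is the amount to subtract when it leaves the window. The sketch below checks this on a small 5x5 grid by comparing the incremental update against a direct sum of `alpha^age` over the last `w` points. The data are synthetic cell indexes and all names ending in `_demo` are placeholders for this illustration.

```r
set.seed(3)
# Synthetic cell indexes on a 5x5 grid (an assumption; not the ECG data).
cells <- matrix(sample(1:5, 2 * 250, replace = TRUE), ncol = 2)

w_demo <- 100
eps_demo <- 0.05
alpha_demo <- eps_demo^(1 / w_demo) # chosen so that alpha_demo^w_demo == eps_demo

# Incremental update: decay everything, add the new point, expire the oldest.
weighted <- matrix(0, nrow = 5, ncol = 5)
for (i in seq_len(nrow(cells))) {
  weighted <- weighted * alpha_demo
  new <- cells[i, , drop = FALSE]
  weighted[new] <- weighted[new] + 1
  if (i > w_demo) {
    old <- cells[i - w_demo, , drop = FALSE]
    weighted[old] <- weighted[old] - alpha_demo^w_demo
  }
}

# Direct computation: each of the last w_demo points contributes alpha^age.
n_demo <- nrow(cells)
expected <- matrix(0, nrow = 5, ncol = 5)
for (j in (n_demo - w_demo + 1):n_demo) {
  c_j <- cells[j, , drop = FALSE]
  expected[c_j] <- expected[c_j] + alpha_demo^(n_demo - j)
}

all.equal(weighted, expected) # the two should agree up to floating-point error
```

Subtracting any other amount leaves a small residue in the cell of every expired point, which then decays slowly instead of disappearing.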

## Fading Window

Finally, the fading window is quite similar to the weighted window. The only difference is that we do not keep any window: we just apply alpha to all counters at every step, so old observations fade out but are never explicitly removed.

```{r fading}
# Fading Window algorithm
f_n <- 3000 # The normalizing factor, so we focus on a smaller interval, just for better plotting
n <- 300 # we don't want to reach the end of the dataset in this example.
grid <- grid_base # let's make a copy of the original grid.
reset_stream(stream)

w <- 100 # the horizon used to derive alpha (no window is stored)
eps <- 0.05
alpha <- eps^(1 / w)

invisible(saveGIF( # save all graphics in an animated gif
  for (i in seq_len(n)) {
    # retrieve one observation from stream
    points <- get_points(stream)

    # normalize the value to add to the matrix
    points <- points / f_n

    # transform the values to match the matrix indexes
    points <- floor((points + 1) * d / 2 + 0.5)

    # apply alpha to the entire grid and then sum the next one
    grid <- grid * alpha
    grid[points[, 1], points[, 2]] <- grid[points[, 1], points[, 2]] + 1

    # plot the grid. Using palette "Greens 2", so the white is actually an 'ice' color
    image(grid,
      main = "Fading Window", col = hcl.colors(2^14, palette = "Greens 2", rev = TRUE),
      zlim = c(1e-5, max(grid)), xlab = names(points)[1], ylab = names(points)[2],
      xaxt = "n", yaxt = "n"
    )
    axis(1, at = seq(0, 1, length.out = d), labels = seq(-f_n, f_n, length.out = d))
    axis(2, at = seq(0, 1, length.out = d), labels = seq(-f_n, f_n, length.out = d))
  },
  "fading.gif",
  interval = 0.01, autobrowse = FALSE, ani.res = 96, ani.height = 500
))
```

<center>![](fading.gif)
<br>Wait for the animation (about 3 seconds with apparently no changes)
<br>The cells that have a value > 0 are plotted in ice-white color.
<br>The maximum value is green.
</center>
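
Since nothing is removed in the fading model, the total weight on the grid does not stop at the window size. It follows the geometric series `1 + alpha + alpha^2 + ...`, which converges to `1 / (1 - alpha)`. The sketch below tracks only that total, not the full grid, and compares it with the limit. The names `total` and `*_demo` and the choice of 2000 steps are assumptions for this illustration.

```r
w_demo <- 100
eps_demo <- 0.05
alpha_demo <- eps_demo^(1 / w_demo) # a point keeps eps_demo of its weight after w_demo steps

# Total weight on the grid: decay what is there, then add one new point.
total <- 0
for (i in 1:2000) {
  total <- total * alpha_demo + 1
}

# After many steps the simulated total should be very close to the limit.
c(simulated = total, limit = 1 / (1 - alpha_demo))
```

This bounded total is what lets the fading model run indefinitely with a single matrix and no stored observations.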

