Overview

The proposed template for the functionality can be viewed in this sketch. This is a draft of what the functionality the dashboard might include if there is sufficient support for implementation from center staff. All graphical demonstrations are created using mock data.
The functionality is threefold at present:

Center Management Dashboard

Graph 1:

A line graph of the number of students applying for a course
The AUC (area-under-curve) is split by color and binned by a selectable unit of time into stacked bar graphs indicating the ratio of old students/new students per unit time
A prediction for the closing date of a course can be created from the average of two linear regression lines:
- applications per unit time since first application for retreat
- applications in the past week
This will provide the basis for a moving “expected date of close” estimate to provide a guideline for old students intent on attending the course as to when they will need to have the application completed by to avoid being waitlisted. This could also be paired with an automatic email notification feature when the proposed close date reaches 1 week etc.

Graph 2

Graph 2 will provide a statistical overview of various metrics per student across a given time period. Default time bins will be seasons, with adjustments of quarters, months, weeks, days. The graph will have y-axis as the # of students admitted, and x-axis as the date. Stacked bar graphs will show the number of os v ns per retreat across the duration of the retreat. These can be regressed against electricity cost, food cost, donations etc to establish an amount per student of a given resource per given time period.

Long Term Trends

Both graphs can provide prospective insights into the fluctuation in variables across longer time periods such as seasons and years to allow for seasonal comparisons as well as annual projections. The mathematical relationship of the student base to time can be determined as linear, exponential or otherwise.

Ride Suggestion

A ride suggestion algorithm can be created with KNN (k nearest neighbors) and the Google Maps API. The application process can be modified to include inputs for:

Their location of departure
An optional checkbox to select “offering a ride” or “needing a ride”
If offering a ride is selected, the following appear:
- a date of departure selection dialog
- # of seats available (taking into account luggage of each traveler)

With this information, the OpenMaps API can then be queried to provide an inline user selection (with PHP) of the latitude and longitude of that location. Because kNN is directionally indifferent, we need to create a custom algorithm to find the individuals that are closest to the driver’s path to the center. To do so we will need to compute the distance of parallel lines between the rider’s origin point \(r_{la},r_{lo}\) and the vector between the driver’s origin and the center \(\vec{d_{oc}}\) for a rough estimate of whom will be closest to the path of travel. We can determine which riders will be encompassed in a query for a drive by finding the bisecting half way point \(distance(\vec{d_{oc}})/2\) on the direction vector from the driver origin to center, and creating a circular radius equivalent to half the distance of the journey with an overlapping sphere of 10 mile radius from their point of origin (a driver would likely be willing to pick up someone with 10 miles even if in the opposite direction). The distance of each of rider’s origin falling within these radii to the will be calculated using an equation for a perpendicular line, and k closest riders will be determined where k is equal to the number of open seats.

The k closest riders will then be submitted to the Google Maps API to determine the amount of time added to the driver’s trip in order to pick up the rider - the total added time will be computed for each and for all (combinations can be manually calculated from this info). The Google Maps API provides 2500 free queries per 24 hour period, thus the queries will be as follows:

Time to the center (1 query)
Time to the center with each of the closest riders (k queries)
Time to the center with all of the closest riders (1 query)

With the pre-computation described above, the queries per day will be unlikely to exceed the 24 hour quota.

Exceptions

Exceptions might occur where a person resides just outside of a 10 mile radius but also in the same large city, and would be willing to take a Lyft or public transportation to reach the driver. To account for such exceptions, a regex matching system based on city name would be helpful to identify and match these exceptions.

Potentially relevant links:

Graph 1

dRange <- seq.Date(lubridate::ymd("2015-01-01"), lubridate::ymd("2018-05-25"), by = "day")
applications <- data.frame(Date = dRange, NS = round(runif(length(dRange), 1, 20)), 
    OS = round(runif(length(dRange), 1, 20)/3), stringsAsFactors = F)
applications %<>% gather(NS, OS, key = Status, value = Applied) %>% arrange(Date)  # Gather the New Student and Old student column as a factor by date
applications %<>% mutate(Wk = lubridate::week(Date), WkBgn = lubridate::floor_date(Date, 
    "week"))  # Bin these values by week for a clearer graph
appWk <- applications %>% group_by(WkBgn, Status) %>% summarize(Applied = sum(Applied))

ggplot(data = appWk %>% filter(WkBgn >= lubridate::ymd("2018-01-01")), mapping = aes(x = WkBgn, 
    y = Applied)) + geom_bar(stat = "identity", position = "stack", aes(y = Applied, 
    fill = Status)) + geom_text(data = appT <- appWk %>% group_by(WkBgn) %>% summarize(Total = sum(Applied)) %>% 
    right_join(appWk, by = "WkBgn") %>% filter(WkBgn >= lubridate::ymd("2018-01-01")), 
    aes(x = WkBgn, y = Applied, label = paste0(round(Applied/Total * 100), "%")), 
    size = 4, vjust = 1) + geom_smooth(data = appT, aes(y = Total)) + scale_fill_manual(values = c(NS = "#FF7878", 
    OS = "#5CFF66")) + scale_x_date(date_breaks = "week", date_labels = "%m-%e") + 
    theme_minimal(base_size = 11) + xlab("# of Applications") + ylab("Time YTD 2018 (by wk)") + 
    ggtitle("Demo of Graph 1", subtitle = "Graph created using Demo Data") + labs(caption = "New Student/Old Student ratio stacked bar graph with Loess average trendline of total applications") + 
    theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5), 
        axis.text.x = element_text(angle = 60, size = 8))

googlesheets::gs_auth()
coursesheet <- googlesheets::gs_url("https://docs.google.com/spreadsheets/d/1Qx4a2SbT136pRcYputkOIWj8qdX_F565owJn3cnL7Wo/edit?ts=5b0b40d9#gid=0")
coursedf <- googlesheets::gs_read(coursesheet)
cdf <- coursedf[1:15, ] %>% separate(col = `Course Dates`, into = c("Begin", "End"), 
    sep = "\\s\\-\\s?") %>% mutate_at(.vars = vars(Begin), .funs = funs(lubridate::mdy(paste(., 
    "2018", sep = "/"))))

enddate <- function(col1, col2) {
    if (is.na(col1)) {
        col1 <- col2
    }
    if (str_count(col1, "\\/") < 1) {
        col1 <- lubridate::mdy(paste(lubridate::month(col2), col1, "2018", sep = "/"))
    } else if (str_count(col1, "\\/") < 2) {
        col1 <- lubridate::mdy(paste(col1, "2018", sep = "/"))
    } else {
        col1 <- lubridate::mdy(col1)
    }
    
    return(col1)
}
cdf %>% mutate(EndDate = enddate(End, Start))
# ----------------------- Sun May 27 20:17:49 2018 ------------------------# It
# will be easier just to use constructed data for demo purposes as the data is
# ultimately going to be different for the final implementation.

# ----------------------- Sun May 27 13:00:06 2018 ------------------------#
# Aggregate statistics by day in the previous created dataset do not allow for
# the linear regression model predicting course closure date to be built. A new
# dataset will be constructed to accommodate this task.
Courses <- seq.Date(lubridate::ymd("2018-01-04"), lubridate::now() %>% lubridate::as_date(), 
    by = "13 days")  # Create mock course dates
addCourse <- function(Dt, Courses) {
    # p <- rev(exp(1:length(Courses))) %>% {. / sum(.)}
    out <- sample(Courses[Courses > Dt], 1)
    return(out)
}
applications <- data.frame(Date = sample(dRange, 5000, replace = T), Student.No = seq(1, 
    5000, 1), Type = sample(c("NS", "OS"), 5000, replace = T, prob = c(0.66, 0.34)), 
    Status = sample(c("Accepted", "In Process", "Rejected"), 5000, replace = T, prob = c(0.5, 
        0.4, 0.1)))
applications %<>% filter(Date > lubridate::mdy("01/01/2018")) %>% rowwise %>% mutate(Course = addCourse(Date, 
    Courses))

Due to the number and diversity of variables involved in this analysis, I believe that trying to develop a demo with dummy data will be a wasted effort. The code for this particular calculation will be heavily dependent on where, how, and in what units the data is stored and retrieved and thus the actual code necessary to complete the analysis will likely entirely invalidate any mock data demo. This section will be skipped until further information is available.

Graph 2

Mock Electric Usage per Student

Elec <- data.frame(Date = lubridate::mdy(paste(seq(1, 5, 1), "25", "2018", sep = "-")), 
    Usage = runif(5, 20000, 50000), stringsAsFactors = F)
App_By_Mo <- applications %>% group_by(Course) %>% count(Status) %>% filter(Status == 
    "Accepted") %>% mutate(Mo = lubridate::month(Course)) %>% ungroup %>% group_by(Mo) %>% 
    summarise(TotalS = sum(n))
Elec %>% mutate(Mo = lubridate::month(Date)) %>% left_join(App_By_Mo, by = "Mo") %>% 
    ggplot(mapping = aes(x = factor(Date))) + geom_bar(stat = "identity", aes(y = Usage/TotalS), 
    fill = "green") + labs(title = "Electricity Usage Per Student", subtitle = "kwH / Student", 
    caption = "Created using mock data", x = "Month", y = "kwH") + theme(plot.title = element_text(hjust = 0.5), 
    plot.subtitle = element_text(hjust = 0.5))

Ride Suggestion

The level of complexity of the algorithm documented above may not be feasible for the purposes of this project, but a simpler mechanism that employs kNN and a directional filtering mechanism may be implementable in a reasonable amount of time and provide sufficient matching to help individuals without vehicular transporation get to the center with greater ease.

RS <- data.frame(Student = 1:6, Type = c("D", rep("R", 5)), Lon = c(40.695331, 41.322251, 
    41.264751, 41.829653, 42.08225, 42.3781439), Lat = c(-73.9256047, -72.970297, 
    -75.8360507, -72.632024, -72.566943, -72.5487467), City = c("Brooklyn", "New Haven", 
    "Wilkes-Barre", "South Windsor", "Springfield", "Hadley"), State = c("NY", "CT", 
    "PA", "CT", "MA", "MA"))

spdep::knearneigh(as.matrix(RS[, c("Lon", "Lat")]), k = 3)

## $nn
##      [,1] [,2] [,3]
## [1,]    2    4    5
## [2,]    4    5    6
## [3,]    1    2    4
## [4,]    5    6    2
## [5,]    4    6    2
## [6,]    5    4    2
## 
## $np
## [1] 6
## 
## $k
## [1] 3
## 
## $dimension
## [1] 2
## 
## $x
##           Lon       Lat
## [1,] 40.69533 -73.92560
## [2,] 41.32225 -72.97030
## [3,] 41.26475 -75.83605
## [4,] 41.82965 -72.63202
## [5,] 42.08225 -72.56694
## [6,] 42.37814 -72.54875
## 
## attr(,"class")
## [1] "knn"
## attr(,"call")
## spdep::knearneigh(x = as.matrix(RS[, c("Lon", "Lat")]), k = 3)

The algorithm will have to be built such that the driver’s shortest path is used as the point from which the nearest locations are calculated.

VMC Dashboard

Stephen Synchronicity

2018-06-03

Overview

Center Management Dashboard

Graph 1:

Graph 2

Long Term Trends

Ride Suggestion

Graph 1

Graph 2

Mock Electric Usage per Student

Ride Suggestion