The Aster graph functions use the Aster Database SQL-GR framework, which enables large-scale graph analysis in-database. These functions offer a wide range of specialized graph analyses, but their integration and visualization are beyond the scope of the Aster database. At the same time, a few competing graph products, such as Neo4j, offer both. toaster attempts to fill this gap with comprehensive graph functionality that expands and integrates Aster functions. It also exposes Aster graph data inside R as network (graph) objects, which expands the set of available graph functionality. As always, toaster does this by taking advantage of Aster's in-database performance, scalability and distributed architecture.
Make sure you have the toaster package installed along with its dependencies:
install.packages("toaster")
library(toaster)
Two other packages are not required, but you may want to install them to run a couple of the examples: intergraph and igraph.
The toaster demo data set dallas was updated to include graph data. Please use its latest copy, available from the toaster GitHub wiki. The Aster database should be accessible via ODBC, and the Teradata Aster ODBC driver should be installed. For details, please see here.
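Teradata Aster is a relational data store that uses relational tables for all data structures. By convention, every graph should be represented using two tables. The vertices table has one row per vertex, identified by a key column and optionally holding vertex attributes. The edges table has one row per edge, identified by source and target columns and optionally holding edge attributes.
In turn, every SQL-GR analytical function expects, at a minimum, the following specification:
To simplify and streamline graph operations in Aster, toaster provides a specialized object, toagraph (created with toaGraph), that holds metadata about an Aster graph. The graph metadata consists of the following: the vertices and edges table names, whether the graph is directed, the vertex key column, the edge source and target columns, optional vertex and edge attribute names, and optional SQL WHERE filters on vertices and edges.
Here are two graphs defined on the Open Dallas Police data. All the examples that follow use them: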
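The following minimal base-R sketch makes the two-table convention and the graph metadata concrete. It runs on hypothetical toy data rather than the Dallas data set. The plain list `meta` only mirrors the toaGraph arguments and is not toaster's internal representation. `conformsToMeta()` is a hypothetical helper that checks whether a pair of tables fits the metadata.

```r
# Hypothetical toy data (not the Dallas data) laid out by the two-table
# convention: one row per vertex and one row per edge.
vertices <- data.frame(officer = c("A1", "A2", "B7", "C3"),
                       offense_count = c(5L, 2L, 9L, 1L),
                       stringsAsFactors = FALSE)
edges <- data.frame(officer1 = c("A1", "A1", "A2"),
                    officer2 = c("A2", "B7", "C3"),
                    weight = c(0.50, 0.25, 0.05),
                    stringsAsFactors = FALSE)

# Hypothetical plain-list stand-in for the graph metadata; it mirrors the
# toaGraph() arguments but is not toaster's internal representation.
meta <- list(vertices = "dallaspolice_officer_vertices",
             edges = "dallaspolice_officer_edges_un",
             directed = FALSE,
             key = "officer", source = "officer1", target = "officer2",
             vertexAttrnames = "offense_count", edgeAttrnames = "weight")

# TRUE when the named columns exist and every edge endpoint is a vertex key.
conformsToMeta <- function(meta, vertices, edges) {
  keys <- vertices[[meta$key]]
  all(c(meta$key, meta$vertexAttrnames) %in% names(vertices)) &&
    all(c(meta$source, meta$target, meta$edgeAttrnames) %in% names(edges)) &&
    all(edges[[meta$source]] %in% keys) &&
    all(edges[[meta$target]] %in% keys)
}

conformsToMeta(meta, vertices, edges)
```

The endpoint check reflects a general integrity expectation for graph tables. It is not a rule stated by toaster.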
# undirected graph of police officers
policeGraphUn = toaGraph(vertices = "dallaspolice_officer_vertices",
edges = "dallaspolice_officer_edges_un",
directed = FALSE,
key = "officer", source = "officer1", target = "officer2",
vertexAttrnames = c("offense_count"),
edgeAttrnames = c("weight"))
# directed graph of police officers
policeGraphDi = toaGraph(vertices = "dallaspolice_officer_vertices",
edges = "dallaspolice_officer_edges_di",
directed = TRUE,
key = "officer", source = "officer1", target = "officer2",
vertexAttrnames = c("offense_count"),
edgeAttrnames = c("weight"))
Note that toagraph stores only metadata and requires no ODBC connectivity.
In toaster we always use de facto R standards to hold the results of computations performed in Aster. Earlier examples of this approach are the kmeans, lm and DocumentTermMatrix objects used in the corresponding computation functions. The same applies to graphs, where the choice is between igraph and network objects from the packages of the same names. The choice of the network object is not obvious, and you can find details here. Moreover, the intergraph package lets you always use the two interchangeably (more examples later).
The function computeGraph transfers a graph from the Aster database into R memory as a network object. There it can be visualized with ggplot2-based tools (such as GGally::ggnet2), with igraph, and with other graph visualization functions. The simplest invocation of computeGraph looks like this:
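# conn is an open ODBC connection to the Aster database (for example, one created with RODBC::odbcDriverConnect)
netPoliceUn = computeGraph(conn, policeGraphUn)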
GGally::ggnet2(netPoliceUn, node.label="vertex.names", node.size="offense_count",
legend.position="none", label.size=3)
To finish, let's visualize the same graph with igraph:
library(intergraph) # asIgraph() converts a network object into an igraph object
igraphPoliceUn = asIgraph(netPoliceUn)
plot(igraphPoliceUn)
With toaster, most computations of centrality indices and other graph metrics are available through just two functions: computeGraphHistogram and computeGraphMetric. The supported metrics include those used in the examples below: degree, shortestpath, clustering, pagerank, betweenness, closeness and k-degree.
A compact and repeatable way to visualize and profile graph metrics lets toaster complement the power of Aster's distributed big data database with a simple workflow that delivers effective results. computeGraphHistogram computes the distribution of a given metric over the graph vertices and returns a result that you can pass to createHistogram.
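The kind of computation that computeGraphHistogram performs in-database can be illustrated locally. The following minimal base-R sketch is not toaster's implementation, and its design is an assumption. It uses equal-width bins over [startvalue, endvalue], which default to the observed range, and it stores each undirected edge once. The edge list and the helpers `vertexDegree()` and `binMetric()` are hypothetical.

```r
# Hypothetical undirected edge list (toy data), each edge stored once.
edges <- data.frame(officer1 = c("A1", "A1", "A1", "A2"),
                    officer2 = c("A2", "B7", "C3", "C3"),
                    stringsAsFactors = FALSE)

# Degree of each vertex in an undirected graph: count how often the vertex
# appears as either endpoint.
vertexDegree <- function(edges, source = "officer1", target = "officer2") {
  table(c(edges[[source]], edges[[target]]))
}

# Equal-width binning into numbins bins between startvalue and endvalue;
# values outside the range become NA in cut() and are not counted.
binMetric <- function(values, numbins,
                      startvalue = min(values), endvalue = max(values)) {
  if (endvalue <= startvalue) endvalue <- startvalue + 1  # avoid a zero-width range
  breaks <- seq(startvalue, endvalue, length.out = numbins + 1)
  # right = FALSE gives [a, b) bins; include.lowest = TRUE closes the last bin
  bins <- cut(values, breaks = breaks, right = FALSE, include.lowest = TRUE)
  data.frame(bin_start = head(breaks, -1),
             bin_end   = breaks[-1],
             count     = as.vector(table(bins)))
}

deg <- vertexDegree(edges)
print(deg)
binMetric(as.vector(deg), numbins = 2)
# counts: 1 vertex in [1, 2) and 3 vertices in [2, 3]
```

In toaster the same kind of aggregation runs inside Aster. Only the per-bin counts need to reach R, which keeps the workflow practical for large graphs.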
library(ggplot2) # theme() and element_text() used below come from ggplot2
hdegreePolice = computeGraphHistogram(conn, policeGraphUn, type='degree', numbins=36)
createHistogram(hdegreePolice,
title = "Dallas Police Graph Degree Distribution",
xlab='Degree', ylab='Count',
themeExtra = theme(axis.text.x = element_text(angle = 0)))
hshortestpathPolice = computeGraphHistogram(conn, policeGraphUn, type='shortestpath',
numbins = 10)
createHistogram(hshortestpathPolice,
title = "Dallas Police Shortest Path Distribution",
xlab = "Distance", ylab = "Count",
themeExtra = theme(axis.text.x = element_text(angle = 0)))
# Clustering Coefficient Distribution
hclusterPolice = computeGraphHistogram(conn, policeGraphUn, type='clustering',
numbins = 100)
createHistogram(hclusterPolice,
title = "Dallas Police Clustering Coefficients Distribution",
xlab = "Clustering Coefficient", ylab = "Count",
breaks=c(seq(0.0, 0.99, 0.1),0.99),
themeExtra = theme(axis.text.x = element_text(angle = 0)))
hpagerankPolice = computeGraphHistogram(conn, policeGraphDi, type='pagerank',
numbins = 100)
createHistogram(hpagerankPolice, title="Dallas Police PageRank Distribution",
xlab="PageRank", ylab = "Count",
breaks=seq(0.0, 0.00140, 0.0001),
themeExtra = theme(axis.text.x = element_text(angle = 0)))
hbetweenPolice = computeGraphHistogram(conn, policeGraphUn, type='betweenness',
numbins = 100, startvalue=0, endvalue=5000)
createHistogram(hbetweenPolice, title = "Dallas Police Betweenness Distribution",
xlab = "Betweenness", ylab = "Count",
breaks=seq(0,4450,300),
themeExtra = theme(axis.text.x = element_text(angle = 0)))
hclosenessPolice = computeGraphHistogram(conn, policeGraphUn, type='closeness',
numbins=100)
createHistogram(hclosenessPolice, title = "Dallas Police Closeness Distribution",
xlab = "Closeness", ylab = "Count",
breaks=seq(0,1,0.05),
themeExtra = theme(axis.text.x = element_text(angle = 0)))
hkdegreePolice = computeGraphHistogram(conn, policeGraphUn, type='k-degree',
numbins = 100)
createHistogram(hkdegreePolice, title = "Dallas Police K-degree Distribution",
xlab = "K-degree", ylab = "Count",
breaks=seq(0,2200,200),
themeExtra = theme(axis.text.x = element_text(angle = 0)))
Analyzing graph distributions is an effective first step, but it lacks the precision needed to zero in on concrete graph vertices. Certain metrics, such as PageRank or betweenness centrality, often single out vertices of critical importance in the underlying network. Determining the vertices with the top values of a given metric is therefore the next natural step in graph analysis.
For this purpose toaster offers a second function, computeGraphMetric, which calculates the top vertices for every metric it supports. For example, you can determine the top 30 vertices by degree, or by betweenness centrality (a measure of the influence a node has over the spread of information through the network). You simply pass the metric type and the number of top vertices to return:
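The idea of ranking vertices by a metric and keeping the top ones can be sketched in base R. The following minimal sketch works on hypothetical metric values already computed for each vertex. `topVertices()` is a hypothetical helper, not part of toaster. Breaking ties by vertex key is a choice made here to keep the result deterministic.

```r
# Hypothetical metric values keyed by vertex (toy numbers).
metric <- data.frame(key = c("A1", "A2", "B7", "C3", "D4"),
                     betweenness = c(12.5, 0, 40.0, 7.25, 40.0),
                     stringsAsFactors = FALSE)

# Return the top n vertices by the given metric column, highest first;
# ties are broken by key so the ordering is deterministic.
topVertices <- function(df, metricCol, n) {
  ord <- order(-df[[metricCol]], df$key)
  head(df[ord, , drop = FALSE], n)
}

topVertices(metric, "betweenness", 3)
```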
Top degree vertices:
topDegree = computeGraphMetric(conn, policeGraphUn, type="degree", top=30)
createTopMetricPlot(topDegree, 'degree', ylab='Degree', title='Top 30 Officers (Vertex Degree)')
Other metrics, such as the centrality measures betweenness and PageRank, work the same way:
topbetweenness = computeGraphMetric(conn, policeGraphUn, type='betweenness', top=30)
createTopMetricPlot(topbetweenness, 'betweenness', ylab='Betweenness',
title='Top 30 Officers (Betweenness)')
# Top PageRank Vertices
topPagerankPolice = computeGraphMetric(conn, policeGraphDi, type='pagerank', top=30)
createTopMetricPlot(topPagerankPolice, 'pagerank', ylab='PageRank',
title='Top 30 Officers (PageRank)')
We saw earlier how to load a whole graph from Aster into R memory. With big data, visualizing a whole graph is rarely useful and is sometimes impossible due to memory constraints. For this reason toaster offers several ways to control which graph data, and how much of it, is loaded from Aster into network objects. Combined with R visualization functions, these mechanisms effectively become a graph query interface to the Aster graph data store.
The first option is to specify SQL WHERE filters on the graph vertices, the graph edges, or both when defining the toagraph metadata object:
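Loading part of a graph means keeping a subset of vertices and the edges among them. The following minimal base-R sketch uses hypothetical toy data. It assumes that only edges whose endpoints both pass the vertex filter are kept, which yields an induced subgraph. This is an assumption about toaster's behavior. The sketch also assumes that Aster's `~` operator follows PostgreSQL semantics, which are an unanchored, case-sensitive POSIX regex match, as in `grepl()`. `inducedEdges()` is a hypothetical helper.

```r
# Hypothetical toy edge list mixing letter-prefixed and numeric keys.
edges <- data.frame(officer1 = c("A1", "A1", "12", "B7"),
                    officer2 = c("B7", "34", "A1", "C3"),
                    stringsAsFactors = FALSE)
allVertices <- unique(c(edges$officer1, edges$officer2))

# Vertex filter: unanchored, case-sensitive regex, like officer ~ '[A-Z].*'.
keep <- allVertices[grepl("[A-Z].*", allVertices)]

# Induced subgraph: keep an edge only if both endpoints survived the filter.
inducedEdges <- function(edges, keep, source = "officer1", target = "officer2") {
  edges[edges[[source]] %in% keep & edges[[target]] %in% keep, , drop = FALSE]
}

subEdges <- inducedEdges(edges, keep)
print(keep)
print(subEdges)
```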
policeGraphUnFilter = toaGraph(vertices = "dallaspolice_officer_vertices",
edges = "dallaspolice_officer_edges_un",
directed = FALSE,
key = "officer", source = "officer1", target = "officer2",
vertexAttrnames = c("offense_count"),
edgeAttrnames = c("weight"),
vertexWhere = "officer ~ '[A-Z].*'")
showGraph(conn, policeGraphUnFilter,
node.label="vertex.names", node.size="offense_count",
legend.position="none", label.size=3)
You can also do this on a per-call basis without defining filters in the metadata, because toaster graph compute functions always offer arguments that override the corresponding filters in toagraph. Note that the toagraph object below is the original policeGraphUn with no filters, and this time we use both vertex and edge filters:
showGraph(conn, policeGraphUn,
vertexWhere = "officer ~ '[A-Z].*'",
edgeWhere = " weight > 0.10 ",
node.label="vertex.names", node.size="offense_count",
legend.position="none", label.size=3)
Subgraphs can also be defined by conditions on the vertex keys that use arbitrary logic and data beyond the graph tables. computeGraph has an optional argument v that is either a list of vertex key values or a SQL query that returns such values when executed in Aster:
library(network) # %v% and get.vertex.attribute() come from the network package
subGraphOfficerLetters = computeGraph(conn, policeGraphUn,
v = "SELECT officer
FROM dallaspolice_officer_vertices
WHERE officer ~ '[A-Z].*'")
subGraphOfficerLetters %v% "color" =
substr(get.vertex.attribute(subGraphOfficerLetters, "vertex.names"), 1, 1)
GGally::ggnet2(subGraphOfficerLetters, node.label="vertex.names", label.size=3,
node.size="offense_count", size.cut=TRUE, node.color="color",
palette = "Set2", legend.position="none")
Subgraph using a list of vertices:
subGraphOfficerM = computeGraph(conn, policeGraphUn,
v=list('M175','M190','M203','M183','M202','M193','M201'))
GGally::ggnet2(subGraphOfficerM, node.label="vertex.names", node.size="offense_count",
node.color="vertex.names", size.cut=3, legend.position="none", label.size=3,
palette = "Set3")
Subgraph of the top degree nodes:
topDegreeNet = computeGraph(conn, policeGraphUn, v=as.list(as.character(topDegree$key)))
topDegreeNet %v% "degree" =
topDegree[match(get.vertex.attribute(topDegreeNet, "vertex.names"), topDegree$key), "degree"]
GGally::ggnet2(topDegreeNet, node.label="vertex.names", node.size="degree",
legend.position="none")
The ego graph (or neighborhood graph) requires a vertex v and an integer order greater than 0. It is the subgraph induced by v and all vertices reachable from v within a distance not greater than the order. Implementing ego graphs in Aster requires a combination of several graph functions and SQL. The toaster function computeEgoGraph does exactly that:
egoCenters = as.list(as.character(topPagerankPolice$key[1:3]))
print(egoCenters)
#> [[1]]
#> [1] "4682"
#>
#> [[2]]
#> [1] "10459"
#>
#> [[3]]
#> [1] "7970"
egoGraphsTopPagerank = computeEgoGraph(conn, policeGraphDi, order = 1, ego = egoCenters)
egoGraph = setVertexColor(egoGraphsTopPagerank[[1]], egoCenters[[1]])
GGally::ggnet2(egoGraph, node.label="vertex.names", node.size="offense_count",
legend.position="none", color="color")
egoGraph = setVertexColor(egoGraphsTopPagerank[[2]], egoCenters[[2]])
GGally::ggnet2(egoGraph, node.label="vertex.names", node.size="offense_count",
legend.position="none", color="color")
egoGraph = setVertexColor(egoGraphsTopPagerank[[3]], egoCenters[[3]])
GGally::ggnet2(egoGraph, node.label="vertex.names", node.size="offense_count",
legend.position="none", color="color")
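The ego graph definition above can be sketched in base R as a breadth-first search limited to order steps, followed by taking the induced subgraph. This minimal sketch uses hypothetical toy data. It does not reproduce how computeEgoGraph works in Aster, and `egoVertices()` and `egoSubgraph()` are hypothetical names. For directed graphs, the sketch follows only out-edges. That is an assumption: igraph's ego functions, for instance, ignore edge direction by default.

```r
# Hypothetical toy directed edge list: a cycle A->B->C->D->A plus E->F.
edges <- data.frame(officer1 = c("A", "B", "C", "D", "E"),
                    officer2 = c("B", "C", "D", "A", "F"),
                    stringsAsFactors = FALSE)

# Vertices within distance <= order of ego (breadth-first search).
# Directed graphs follow source -> target only; undirected use both directions.
egoVertices <- function(edges, ego, order, directed = TRUE,
                        source = "officer1", target = "officer2") {
  stopifnot(order >= 1)
  from <- edges[[source]]
  to <- edges[[target]]
  if (!directed) {
    tmp <- c(from, to)
    to <- c(to, from)  # uses the original 'from', so both directions are added
    from <- tmp
  }
  visited <- ego
  frontier <- ego
  for (step in seq_len(order)) {
    frontier <- setdiff(unique(to[from %in% frontier]), visited)
    if (length(frontier) == 0) break  # nothing new is reachable
    visited <- c(visited, frontier)
  }
  visited
}

# Ego graph: the reached vertices plus the edges induced among them.
egoSubgraph <- function(edges, ego, order, directed = TRUE,
                        source = "officer1", target = "officer2") {
  v <- egoVertices(edges, ego, order, directed, source, target)
  e <- edges[edges[[source]] %in% v & edges[[target]] %in% v, , drop = FALSE]
  list(vertices = v, edges = e)
}

egoSubgraph(edges, "A", order = 2)
```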