Annotation

Here we will compute minimum spanning tree (\(MST\)) for some graph using Prim’s MST algorithm. The code is written in R. The task is from Algorithms: Design and Analysis, Part 2 Stanford University course on Coursera.

Minimum Spanning Trees: Definition

Suppose that we have undirected connected graph \(G(V,E)\) (\(V\) - set of vertices, \(E\) - set of edges of this graph) with each edge \(e\in E\) having some cost \(c_{e}\) accociated with it (cost may be negative). Spanning tree for this graph \(G\) will be the graph \(ST_G(V,E')\) such that it is a tree (i.e. has no cycles), has the same set of vertices \(V\) and set of edges \(E'\) that is a subset of \(E\): \(E'\subset E\). For any spanning tree \(ST_G\) (moreover, for any subgraph \(G'\subset G\)) we can compute its cost simply by adding up costs of every edge in that tree: \(C(ST_G)=\sum_{e\in E'}c_e\). Finally, we will call a spanning tree minimum spanning tree (\(MST_G\)) of graph \(G\) if it has minimal cost among all of the spanning trees of graph \(G\).

Finding MST: Prim’s algorithm

To build minimum spanning tree one can use Prim’s algorithm. It works as follows:

  1. Randomly select starting vertex \(v_0\), add it to \(MST\).
  2. Choose the vertex that is connected to the current state of \(MST\) by the cheapest possible edge. Add this vertex and this cheapest edge to \(MST\).
  3. Repeat step 2 until you are out of vertices.

Implementation

The graph we will use is contained in this file: http://spark-public.s3.amazonaws.com/algo2/datasets/edges.txt. This file describes an undirected graph with integer edge costs. It has the format

[number_of_nodes] [number_of_edges]

[one_node_of_edge_1] [other_node_of_edge_1] [edge_1_cost]

[one_node_of_edge_2] [other_node_of_edge_2] [edge_2_cost]

stat=read.table("http://spark-public.s3.amazonaws.com/algo2/datasets/edges.txt",nrows=1)
graph=read.table("http://spark-public.s3.amazonaws.com/algo2/datasets/edges.txt",skip=1,header=FALSE, col.names=c("V1","V2","cost"))

The following piece of code applies Prim’s algorithm to this graph. The cost of tree is saved into mstcost variable, the \(MST\) itself - into mstedge array in the same format as the original graph.

mst<-vector()#vector of MST vertices
mstedge<-matrix(nrow=499,ncol=3)#MST edges
mst[1]=1
mstcost=0
for (i in 2:as.numeric(stat[1])){
  #choosing only that edges that has one end in current MST and another end not in MST
  cand=graph[xor(graph$V2 %in% mst, graph$V1 %in% mst),]
  #order the candidate edges by its cost and choose the cheapest
  nxt=as.vector(cand[order(cand$cost)[1],])
  #add new vertices to MST and drop repeated vertices
  mst=unique(c(mst,as.numeric(nxt[1]),as.numeric(nxt[2])))
  #increase C(MST) by cost of new edge
  mstcost=mstcost+as.numeric(nxt[3])
  #save new edge into mstedge
  mstedge[i-1,1]=as.numeric(nxt[1])
  mstedge[i-1,2]=as.numeric(nxt[2])
  mstedge[i-1,3]=as.numeric(nxt[3])
}

Since the answer for the task was supposed to be \(C(MST)\) let’s look what is saved in the mstcost variable:

mstcost
## [1] -3612829

The cost of the calculated \(MST\) is correct \(\blacksquare\)