Here we will compute minimum spanning tree (\(MST\)) for some graph using Prim’s MST algorithm. The code is written in R. The task is from Algorithms: Design and Analysis, Part 2 Stanford University course on Coursera.
Suppose that we have undirected connected graph \(G(V,E)\) (\(V\) - set of vertices, \(E\) - set of edges of this graph) with each edge \(e\in E\) having some cost \(c_{e}\) accociated with it (cost may be negative). Spanning tree for this graph \(G\) will be the graph \(ST_G(V,E')\) such that it is a tree (i.e. has no cycles), has the same set of vertices \(V\) and set of edges \(E'\) that is a subset of \(E\): \(E'\subset E\). For any spanning tree \(ST_G\) (moreover, for any subgraph \(G'\subset G\)) we can compute its cost simply by adding up costs of every edge in that tree: \(C(ST_G)=\sum_{e\in E'}c_e\). Finally, we will call a spanning tree minimum spanning tree (\(MST_G\)) of graph \(G\) if it has minimal cost among all of the spanning trees of graph \(G\).
To build minimum spanning tree one can use Prim’s algorithm. It works as follows:
The graph we will use is contained in this file: http://spark-public.s3.amazonaws.com/algo2/datasets/edges.txt. This file describes an undirected graph with integer edge costs. It has the format
[number_of_nodes] [number_of_edges]
[one_node_of_edge_1] [other_node_of_edge_1] [edge_1_cost]
[one_node_of_edge_2] [other_node_of_edge_2] [edge_2_cost]
…
stat=read.table("http://spark-public.s3.amazonaws.com/algo2/datasets/edges.txt",nrows=1)
graph=read.table("http://spark-public.s3.amazonaws.com/algo2/datasets/edges.txt",skip=1,header=FALSE, col.names=c("V1","V2","cost"))
The following piece of code applies Prim’s algorithm to this graph. The cost of tree is saved into mstcost
variable, the \(MST\) itself - into mstedge
array in the same format as the original graph.
mst<-vector()#vector of MST vertices
mstedge<-matrix(nrow=499,ncol=3)#MST edges
mst[1]=1
mstcost=0
for (i in 2:as.numeric(stat[1])){
#choosing only that edges that has one end in current MST and another end not in MST
cand=graph[xor(graph$V2 %in% mst, graph$V1 %in% mst),]
#order the candidate edges by its cost and choose the cheapest
nxt=as.vector(cand[order(cand$cost)[1],])
#add new vertices to MST and drop repeated vertices
mst=unique(c(mst,as.numeric(nxt[1]),as.numeric(nxt[2])))
#increase C(MST) by cost of new edge
mstcost=mstcost+as.numeric(nxt[3])
#save new edge into mstedge
mstedge[i-1,1]=as.numeric(nxt[1])
mstedge[i-1,2]=as.numeric(nxt[2])
mstedge[i-1,3]=as.numeric(nxt[3])
}
Since the answer for the task was supposed to be \(C(MST)\) let’s look what is saved in the mstcost
variable:
mstcost
## [1] -3612829
The cost of the calculated \(MST\) is correct \(\blacksquare\)