Why a network graph?
What we will see is that while star agents get the bulk of the listings, those listings are sold by the same agents over and over. Each Listing Agent has a coterie of Selling Agents. When a Listing Agent gets a new listing, some agents will say, ‘I always like Fran’s listings, so I am going to go to that open house.’ There is a lot in that sentence, but for this project we will leave it at this: the agent knows from experience that they often have the buyer for Fran’s listings.
There are two parts to a network graph: the nodes and the edges. The nodes are the points and the edges are the lines connecting those points. In the following lines we will first make a single list of all of the names, ignoring whether they were the Listing Agent or the Selling Agent. Then we will make a table of the relationships between each pair of agents: who had the listing and who brought the buyer. In network parlance that table is an edge list; the same information laid out as a square grid is an adjacency matrix.
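To make the node and edge vocabulary concrete, here is a minimal sketch in base R. The agent names and sales are made up for illustration and are not taken from the NorthShore data.

```r
# Toy data: each row is one sale (names are invented for illustration).
sales <- data.frame(
  ListingAgent = c("Fran", "Fran", "Fran", "Gus"),
  SellingAgent = c("Ann", "Ann", "Betty", "Betty"),
  stringsAsFactors = FALSE
)

# Nodes: every distinct agent, whichever side of the deal they were on.
node_names <- unique(c(sales$ListingAgent, sales$SellingAgent))

# Edge list: one row per listing/selling pair, with the number of deals.
edge_list <- aggregate(deals ~ ListingAgent + SellingAgent,
                       data = transform(sales, deals = 1), FUN = sum)

# Adjacency matrix: the same information as a square table
# (rows = listing agent, columns = selling agent).
adjacency <- table(
  factor(sales$ListingAgent, levels = node_names),
  factor(sales$SellingAgent, levels = node_names)
)

print(edge_list)
print(adjacency)
```

The edge list stays short because it only stores pairs that actually did a deal, while the adjacency matrix has a cell for every possible pair, most of them zero.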
This next part is such an unabashed copy of Jesse Sadler’s network analysis tutorial (“https://www.jessesadler.com/post/network-analysis-with-r/”) that we will use those same labels.
Here we make a list of just the names of the Listing and Selling Agents, simultaneously renaming the column to “label”.
Remember we are using real data, so I have anonymized it by replacing the names with numbers.
To keep our graphs readable we have to limit the amount of data. Let us look at only the 50 highest-priced sales and the agents behind them.
library(tidyverse)  # dplyr verbs and tibble::rowid_to_column() are used below
# Keep only the columns the network needs: the two agents and the sale price.
Network_Data <- NorthShore %>%
  select(SellingAgent, ListingAgent, SoldPrice)
head(Network_Data)
## # A tibble: 6 x 3
## SellingAgent ListingAgent SoldPrice
## <int> <int> <int>
## 1 36 702 585000
## 2 73 47 695000
## 3 115 259 330000
## 4 306 125 465000
## 5 348 226 470000
## 6 579 263 785000
# Keep the 50 highest-priced sales (top_n() keeps any ties as well).
Network_Data_50 <- Network_Data %>% top_n(50, SoldPrice)
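Note that top_n(50) ranks by SoldPrice, so it keeps the most expensive sales rather than the busiest pairs of agents. If the aim were instead the most active pairs, one option would be to count deals per pair first and rank on that count. The following is only a sketch in base R on made-up numbers; the data frame and the cut-off n are illustrative assumptions.

```r
# Made-up sales; agent IDs and prices are illustrative only.
sales <- data.frame(
  ListingAgent = c(1, 1, 1, 2, 2, 3),
  SellingAgent = c(7, 7, 8, 7, 7, 9),
  SoldPrice    = c(300, 350, 900, 400, 450, 800)
)

# Count deals per listing/selling pair.
pair_counts <- aggregate(SoldPrice ~ ListingAgent + SellingAgent,
                         data = sales, FUN = length)
names(pair_counts)[3] <- "deals"

# Keep the n most active pairs (n = 2 here; the text above uses 50).
n <- 2
top_pairs <- head(pair_counts[order(-pair_counts$deals), ], n)
print(top_pairs)
```

Unlike top_n(), head() cuts off at exactly n rows, so pairs tied with the last kept row are dropped.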
sources <- Network_Data_50 %>%
  distinct(ListingAgent) %>%
  rename(label = ListingAgent)
#head(sources,1)
destinations <- Network_Data_50 %>%
  distinct(SellingAgent) %>%
  rename(label = SellingAgent)
#head(destinations,1)
And we will combine those two lists, creating the new list “nodes”.
nodes <-full_join(sources , destinations, by = "label")
#head(nodes)
Network graphs require each node to have a unique identifying number.
nodes<- nodes%>% rowid_to_column("id")
#head(nodes)
Next we will work on the connections between the agents, known as edges in network graphing. First we create a table that counts the deals for each combination of agents. In other words, if Ann and Betty did three deals, this table will show one row for that pair with a weight of 3. Then we left-join our nodes to this new data frame.
per_route <- Network_Data_50 %>%
  group_by(ListingAgent, SellingAgent) %>%
  summarise(weight = n()) %>%
  ungroup()
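The Ann and Betty example can be checked on a toy table. This sketch uses base R and invented prices; it also shows total dollar volume per pair as an alternative weight, which is an assumption of mine and not something the analysis above uses.

```r
# Three made-up deals between the same two agents, plus one other deal.
deals <- data.frame(
  ListingAgent = c("Ann", "Ann", "Ann", "Cara"),
  SellingAgent = c("Betty", "Betty", "Betty", "Betty"),
  SoldPrice    = c(500000, 610000, 450000, 700000),
  stringsAsFactors = FALSE
)

# Weight as the number of deals per pair (what n() counts above).
by_count <- aggregate(SoldPrice ~ ListingAgent + SellingAgent,
                      data = deals, FUN = length)

# Alternative weight: total dollar volume per pair.
by_volume <- aggregate(SoldPrice ~ ListingAgent + SellingAgent,
                       data = deals, FUN = sum)

print(by_count)
print(by_volume)
```

Counting deals treats every sale equally; weighting by volume would let a single expensive sale outweigh several modest ones.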
Let's compare nodes to the new data frame.
#head(nodes)
#head(per_route)
edges <- per_route %>%
  left_join(nodes, by = c("ListingAgent" = "label")) %>%
  rename(from = id)
#head(edges)
edges <- edges %>%
  left_join(nodes, by = c("SellingAgent" = "label")) %>%
  rename(to = id)
head(edges)
## # A tibble: 6 x 5
## ListingAgent SellingAgent weight from to
## <int> <int> <int> <int> <int>
## 1 20 775 1 23 63
## 2 21 43 1 6 43
## 3 31 1121 1 35 76
## 4 65 117 1 9 46
## 5 72 1169 1 37 78
## 6 75 600 1 1 39
#head(edges)
edges<- select(edges, from, to , weight)
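The two left joins above simply look up each agent's node id from its label. A minimal base R sketch of the same lookup, using match() and a handful of id/label values copied from the printed output in this post (only a subset of the real tables):

```r
# A few nodes from the output above: id is the node number, label the agent number.
nodes_small <- data.frame(id = c(1, 6, 39, 43), label = c(75, 21, 600, 43))

# Two agent pairs from the edges printout, still identified by agent label.
routes_small <- data.frame(
  ListingAgent = c(21, 75),
  SellingAgent = c(43, 600),
  weight       = c(1, 1)
)

# Look up the node id for each side of the deal.
edges_small <- data.frame(
  from   = nodes_small$id[match(routes_small$ListingAgent, nodes_small$label)],
  to     = nodes_small$id[match(routes_small$SellingAgent, nodes_small$label)],
  weight = routes_small$weight
)
print(edges_small)
```

The from and to values agree with rows 2 and 6 of the edges table printed above (6 to 43 and 1 to 39).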
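Next we will start graphing using the network package.
It is the clusters that interest us here.
library(network)
# Assumption: routes_network is built from the edge list as in the tutorial cited above.
routes_network <- network(edges, vertex.attr = nodes, matrix.type = "edgelist", ignore.eval = FALSE)
plot(routes_network, vertex.cex = 3)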

Now let's switch from the network package to the igraph package.
detach(package:network)
rm(routes_network)
library(igraph)
igraph builds its graph from the edges and nodes tables we have already created.
routes_igraph<-graph_from_data_frame( d= edges , vertices = nodes, directed =TRUE)
Printing the igraph object lists every edge, that is, every relationship between our realtors. (This is an edge list rather than an adjacency matrix.)
routes_igraph
## IGRAPH 2636618 DNW- 79 47 --
## + attr: name (v/c), label (v/n), weight (e/n)
## + edges from 2636618 (vertex names):
## [1] 23->63 6 ->43 35->76 9 ->46 37->78 1 ->39 10->47 18->55 13->50 11->48
## [11] 12->49 12->56 21->61 30->69 25->66 19->57 19->59 14->51 17->54 34->74
## [21] 22->62 24->65 20->60 29->68 3 ->41 2 ->40 8 ->45 8 ->69 15->52 15->64
## [31] 15->70 26->66 28->67 31->72 27->66 4 ->42 4 ->58 4 ->71 4 ->75 32->73
## [41] 33->73 16->53 7 ->44 7 ->53 5 ->42 38->79 36->77
Similar to the network graph, but the nodes are labeled.
plot(routes_igraph,edge.arrow.size = 0.2)

plot(routes_igraph, layout= layout_with_graphopt, edge.arrow.size = 0.2)

Next let us try tidygraph and ggraph.
library(tidygraph)
library(ggraph)
routes_tidy <- tbl_graph(nodes = nodes, edges = edges, directed = TRUE)
routes_tidy
## # A tbl_graph: 79 nodes and 47 edges
## #
## # A rooted forest with 32 trees
## #
## # Node Data: 79 x 2 (active)
## id label
## <int> <int>
## 1 1 75
## 2 2 619
## 3 3 618
## 4 4 702
## 5 5 814
## 6 6 21
## # ... with 73 more rows
## #
## # Edge Data: 47 x 3
## from to weight
## <int> <int> <int>
## 1 23 63 1
## 2 6 43 1
## 3 35 76 1
## # ... with 44 more rows
routes_tidy %>%
activate(edges)%>%
arrange(desc(weight))
## # A tbl_graph: 79 nodes and 47 edges
## #
## # A rooted forest with 32 trees
## #
## # Edge Data: 47 x 3 (active)
## from to weight
## <int> <int> <int>
## 1 19 59 2
## 2 8 69 2
## 3 4 58 2
## 4 23 63 1
## 5 6 43 1
## 6 35 76 1
## # ... with 41 more rows
## #
## # Node Data: 79 x 2
## id label
## <int> <int>
## 1 1 75
## 2 2 619
## 3 3 618
## # ... with 76 more rows
Sparser? True. But think of it as a litmus test: is it worth developing this graph? Yes, because there are a number of clusters.
ggraph(routes_tidy)+
geom_edge_link()+
geom_node_point()+
theme_graph()
## Using `nicely` as default layout

ggraph(routes_tidy, layout = "graphopt") +
geom_node_point() +
geom_edge_link(aes(width = weight), alpha = 0.8) +
scale_edge_width(range = c(0.2, 2)) +
geom_node_text(aes(label = label), repel = TRUE) +
labs(edge_width = "weight") +
theme_graph()

#ggsave("Network.png")
A<-knitr::include_graphics("Network1.png")
A

At last we see whom the real estate companies should target for recruitment: not the unpoachable star agents but the satellite agents. For one thing they are poachable, and for another, because of their much lower commission split, the company will make significantly more money.
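One rough way to read the stars and satellites off the edge table, rather than off the picture, is to count how many selling agents each listing agent is connected to. This is only a sketch in base R: the edges are a subset copied from the igraph printout above, the numbers are node ids rather than agent labels, and the threshold of three partners for a "hub" is an arbitrary assumption.

```r
# A subset of the edges printed above (node ids, not agent labels).
edges_sample <- data.frame(
  from = c(4, 4, 4, 4, 15, 15, 15, 7, 7, 16, 5),
  to   = c(42, 58, 71, 75, 52, 64, 70, 44, 53, 53, 42)
)

# Number of selling-agent partners per listing agent.
out_degree <- sort(table(edges_sample$from), decreasing = TRUE)

# Hypothetical rule: a hub is a listing agent with at least 3 partners.
hubs <- names(out_degree)[out_degree >= 3]

# Satellites: the selling agents attached to those hubs.
satellites <- sort(unique(edges_sample$to[edges_sample$from %in% as.integer(hubs)]))

print(out_degree)
print(hubs)
print(satellites)
```

On the full edges table the same counts could be joined back to nodes to recover the anonymized agent numbers.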