options(scipen = 9999999)

Overview

The more houses real estate companies sell, the more money they make, of course. But unlike most industries, they cannot increase the number of transactions that take place. No one can be coerced into selling their home. What real estate companies can do, though, is increase the number of transactions that they participate in.

Not all real estate agents sell the same number of homes. In fact, the distribution is VERY skewed. In an attempt to increase their sales, real estate companies try to recruit the top Listing Agents.

It happens to be the case that only a few agents get the majority of listings. Real estate managers get blinded by the marquee number: the amount of property a Star Listing Agent sold. Every month, real estate managers discuss and strategize ways to recruit these top agents. I think there is a better approach.

These star agents rarely switch firms. Unlike average agents, who work on a 50/50 split with the company, star agents have an 80/20 split. That is as high as one will find in the industry. There is nothing a competing firm can offer a star agent to induce them to switch firms. The result is that the star agents stay where they are, and any new people who come to an office were not recruited but came in on their own, for whatever reasons they might have. In sum, in an industry where the only way of increasing earnings is by getting more successful salespeople, no systematic approach is taken.
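
To make the split arithmetic concrete, here is a small sketch. The $10,000 gross commission is a hypothetical figure chosen only for illustration; the shares the company keeps (50% from an average agent, 20% from a star) follow from the 50/50 and 80/20 splits described above.

```r
# Hypothetical gross commission on one sale (illustrative, not from the data)
gross_commission <- 10000

# Share of the commission the company keeps under each split
company_share <- c(average = 0.50, star = 0.20)

# Dollars the company keeps per deal
company_take <- gross_commission * company_share
company_take

# How many times more the company keeps on an average agent's deal
ratio <- unname(company_take["average"] / company_take["star"])
ratio
```

Under these assumptions the company keeps two and a half times as much on a deal closed by an average agent as on the same deal closed by a star.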

A word about the data. This is real-world data, so we have anonymized it. These are sales that took place over a 5-year period.

The Packages

library(tidyverse)
library(ggplot2)
library(network)
library(igraph)
library(tidygraph)
library(ggraph)
library(forcats)
library(scales)
library(treemap)

The Data

NorthShore<-read_csv("NorthShoreAnonymized.csv")
Commision_Splits<- read_csv("Commison_Splits.csv")

The commission-split file is imported the same way and stored as “Commision_Splits”.

A Pie Chart View Of the World

Real estate managers' view of the world: a pie chart. Managers fixate on the largest slices. A pie chart summarizes a real estate manager's view of their market. There may be hundreds of agents, but these few do the majority of the sales. If these agents could be persuaded to switch firms, it would be a big boost for the office.

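# Keep only the listing agent and the sale price, dropping incomplete rows
trees_10 <- NorthShore %>%
  select(ListingAgent, SoldPrice) %>%
  drop_na()
trees_10$ListingAgent <- as.factor(trees_10$ListingAgent)
# top_n() with no `wt` ranks by the last column (SoldPrice),
# so this keeps the 10 highest-priced sales
trees_10 <- trees_10 %>% top_n(10)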
## Selecting by SoldPrice
trees_10<- trees_10%>%
  group_by(ListingAgent)%>%
  count()%>%
  ungroup()%>%
  mutate(per=trees_10$SoldPrice/sum(trees_10$SoldPrice))%>%
  arrange(desc(ListingAgent))
trees_10
## # A tibble: 10 x 3
##    ListingAgent     n    per
##    <fct>        <int>  <dbl>
##  1 810              1 0.105 
##  2 774              1 0.105 
##  3 702              1 0.0741
##  4 634              1 0.0678
##  5 627              1 0.125 
##  6 619              1 0.122 
##  7 601              1 0.0886
##  8 375              1 0.122 
##  9 31               1 0.107 
## 10 21               1 0.0831
trees_10<- trees_10%>%
          mutate(percent=trees_10$per*100)
#head(trees_10,1)
ggplot(data=trees_10)+
  geom_bar(aes(x="", y=per, fill=ListingAgent), stat="identity", width = 1)+
  coord_polar("y", start=0)+
  theme_void()+
  geom_text(aes(x=1, y = cumsum(per) - per/2, label=round(percent,digits = 1)))+
  ggtitle("R.E. Managers of the World")
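
Note that `top_n(10)` in the chunk above runs before any grouping, so it ranks individual sales rather than agents, which is why every agent shows `n = 1`. If the goal is the ten agents with the largest total volume, one possible approach is to aggregate first and rank second. The sketch below uses made-up agent IDs and prices, keeps three agents instead of ten to stay small, and `volume` is a column name introduced here for illustration.

```r
library(dplyr)

# Hypothetical toy sales; agent IDs and prices are invented for illustration
toy_sales <- data.frame(
  ListingAgent = c(1, 1, 2, 3, 3, 3, 4),
  SoldPrice    = c(500000, 300000, 900000, 200000, 250000, 150000, 400000)
)

top_agents <- toy_sales %>%
  filter(!is.na(ListingAgent), !is.na(SoldPrice)) %>%
  group_by(ListingAgent) %>%
  # one row per agent: number of sales and total dollar volume
  summarise(n = n(), volume = sum(SoldPrice), .groups = "drop") %>%
  # keep the agents with the largest total volume
  slice_max(volume, n = 3, with_ties = FALSE) %>%
  # each agent's share of the volume among those kept
  mutate(per = volume / sum(volume))

top_agents
```

Because `per` is computed inside the same pipeline, each share stays attached to the agent it belongs to.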

Treemap

But this pie chart only shows how the 10 most successful Listing Agents fared against each other. Next we will create a treemap to see how these 10 fared compared to all of the others, and then we will see just why managers fixate on them. In fact, 21 Listing Agents out of hundreds listed approximately 1/3 of all of the homes in this market.

tree_data <- NorthShore %>%
  select(ListingAgent, SoldPrice)
tree_data$ListingAgent <- as.factor(tree_data$ListingAgent)
#dim(tree_data)
#head(tree_data,1)
treemap(tree_data,
        index = "ListingAgent",
        vSize = "SoldPrice",
        type = "index")
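
A concentration figure such as "21 agents listed about a third of the homes" can be checked with a cumulative share. The following is a minimal sketch on invented data, not the computation used for the number above; `listings`, `rank`, and `cum_share` are names introduced here.

```r
library(dplyr)

# Hypothetical listings: one row per home, agent IDs invented for illustration
toy_listings <- data.frame(
  ListingAgent = c(rep(101, 5), rep(102, 3), 103, 104, 105, 106)
)

concentration <- toy_listings %>%
  count(ListingAgent, name = "listings") %>%   # listings per agent
  arrange(desc(listings)) %>%                  # biggest listers first
  mutate(rank = row_number(),
         cum_share = cumsum(listings) / sum(listings))

# Smallest number of agents needed to cover at least one third of listings
agents_for_third <- concentration %>%
  filter(cum_share >= 1/3) %>%
  slice(1) %>%
  pull(rank)

concentration
agents_for_third
```

On the real data the same pipeline would run on `NorthShore`, assuming each row is one sale.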

Network Graph

Why a network graph?

What we will see is that while star agents get the bulk of the listings, these many listings are sold by the same agents over and over. Each Listing Agent has a coterie of Selling Agents. When a Listing Agent gets a new listing, some agents will say, ‘I always like Fran’s listings, so I am going to go to that Open House.’ There is so much in that sentence, but for this project we will just leave it at this: the agent knows from experience that they often have the buyer for ‘Fran’s’ listing.

There are two parts to a network graph: the Nodes and the Edges. The Nodes are the points, and the Edges are the lines connecting those points. In the following lines we will first make a single list of all of the names, ignoring whether they were the Listing Agent or the Selling Agent. Then we will make a table of the relationships (which in network parlance is known as an edge list) between each pair of agents: who had the listing and who had the buyer.

This next part is such an unabashed copy of Jesse Sadler’s network graph tutorial (“https://www.jessesadler.com/post/network-analysis-with-r/”) that we will use those same labels.

Here we make a list of just the names of the Listing and Selling Agents, simultaneously renaming the column to “label”.

Remember we are using real data, so I have anonymized it, replacing the names with numbers.

To make our graphs readable we have to limit the amount of data. Let us look at only the 50 highest-priced sales and the pairs of agents behind them.

Network_Data <- NorthShore %>%
  select(SellingAgent, ListingAgent, SoldPrice)
head(Network_Data)
## # A tibble: 6 x 3
##   SellingAgent ListingAgent SoldPrice
##          <int>        <int>     <int>
## 1           36          702    585000
## 2           73           47    695000
## 3          115          259    330000
## 4          306          125    465000
## 5          348          226    470000
## 6          579          263    785000
Network_Data_50<- Network_Data%>% top_n(50)
## Selecting by SoldPrice
sources<- Network_Data_50 %>%
  distinct(ListingAgent)%>%
  rename(label=ListingAgent)
#head(sources,1)
destinations <-Network_Data_50%>%
    distinct(SellingAgent)%>%
    rename(label=SellingAgent)
#head(destinations,1)

And we will combine those two lists, creating the new list “Nodes”.

nodes <-full_join(sources , destinations, by = "label")
#head(nodes)

Network graphs require a unique identifying number for each node.

nodes<- nodes%>% rowid_to_column("id")
#head(nodes)

Next we will work on the connections between the agents, known as the Edges in network graphing. First we create a table where the deals of each combination of agents are counted. In other words, if Ann and Betty did three deals, this table will show one line with a weight of 3. Then we left_join our Nodes to this new data frame.

per_route<- Network_Data_50%>%
  group_by(ListingAgent,SellingAgent)%>%
  summarise(weight=n() ) %>%
  ungroup()
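
The `weight` above counts deals per pair. If the dollar volume per pair were also wanted, a second summary column could be added in the same step. This is a sketch on invented data; the `volume` column is an assumption and is not used elsewhere in this analysis.

```r
library(dplyr)

# Hypothetical deals: agents 1 and 9 did three deals together, agents 2 and 8 did one
toy_deals <- data.frame(
  ListingAgent = c(1, 1, 1, 2),
  SellingAgent = c(9, 9, 9, 8),
  SoldPrice    = c(400000, 500000, 600000, 700000)
)

per_pair <- toy_deals %>%
  group_by(ListingAgent, SellingAgent) %>%
  summarise(weight = n(),              # number of deals for the pair
            volume = sum(SoldPrice),   # total dollars for the pair
            .groups = "drop")

per_pair
```

Passing `.groups = "drop"` returns an ungrouped table, so no separate `ungroup()` call is needed.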

Let's compare Nodes to the new data frame and join the two.

#head(nodes)
#head(per_route)
edges<- per_route%>%
  left_join(nodes, by =c("ListingAgent"="label"))%>%
  rename(from =id)
#head(edges)
edges<-edges%>%
  left_join(nodes, by = c("SellingAgent"="label"))%>%
  rename(to=id)
head(edges)
## # A tibble: 6 x 5
##   ListingAgent SellingAgent weight  from    to
##          <int>        <int>  <int> <int> <int>
## 1           20          775      1    23    63
## 2           21           43      1     6    43
## 3           31         1121      1    35    76
## 4           65          117      1     9    46
## 5           72         1169      1    37    78
## 6           75          600      1     1    39
#head(edges)
edges<- select(edges, from, to , weight)
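
Since `left_join()` keeps rows that find no match and fills them with `NA`, it can be worth checking that every edge endpoint actually received a node id. A minimal sketch on invented tables, where agent 99 is deliberately left out of the node list:

```r
library(dplyr)

# Hypothetical node and pair tables; agent 99 has no entry in toy_nodes
toy_nodes <- data.frame(id = 1:3, label = c(10, 20, 30))
toy_pairs <- data.frame(
  ListingAgent = c(10, 20, 99),
  SellingAgent = c(20, 30, 10),
  weight       = c(1, 2, 1)
)

toy_edges <- toy_pairs %>%
  left_join(toy_nodes, by = c("ListingAgent" = "label")) %>%
  rename(from = id) %>%
  left_join(toy_nodes, by = c("SellingAgent" = "label")) %>%
  rename(to = id)

# Edges whose listing or selling agent found no node come back with NA ids
unmatched <- toy_edges %>% filter(is.na(from) | is.na(to))
unmatched
```

In this analysis the node list is built from the same rows as the edges, so no unmatched rows are expected; the check mainly guards against later changes to either table.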

The network() function transforms the data from an ordinary data frame into an object that holds the relationships between the elements.

routes_network<- network(edges, vertex.attr = nodes, matrix.type="edgelist",ignore.eval=FALSE)
class(routes_network)
## [1] "network"
routes_network
##  Network attributes:
##   vertices = 79 
##   directed = TRUE 
##   hyper = FALSE 
##   loops = FALSE 
##   multiple = FALSE 
##   bipartite = FALSE 
##   total edges= 47 
##     missing edges= 0 
##     non-missing edges= 47 
## 
##  Vertex attribute names: 
##     id label vertex.names 
## 
##  Edge attribute names: 
##     weight

Next we will start graphing using the Network package.

It is the clusters that interest us here.

plot(routes_network, vertex.cex=3)

Now let's switch from the network package to the igraph package.

detach(package:network)
rm(routes_network)

igraph reuses the already established Edges and Nodes.

routes_igraph<-graph_from_data_frame( d= edges , vertices = nodes, directed =TRUE)

Here is the edge list showing every relationship between our realtors.

routes_igraph
## IGRAPH 2636618 DNW- 79 47 -- 
## + attr: name (v/c), label (v/n), weight (e/n)
## + edges from 2636618 (vertex names):
##  [1] 23->63 6 ->43 35->76 9 ->46 37->78 1 ->39 10->47 18->55 13->50 11->48
## [11] 12->49 12->56 21->61 30->69 25->66 19->57 19->59 14->51 17->54 34->74
## [21] 22->62 24->65 20->60 29->68 3 ->41 2 ->40 8 ->45 8 ->69 15->52 15->64
## [31] 15->70 26->66 28->67 31->72 27->66 4 ->42 4 ->58 4 ->71 4 ->75 32->73
## [41] 33->73 16->53 7 ->44 7 ->53 5 ->42 38->79 36->77
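
The printout above is an edge list. If an actual adjacency matrix is wanted, igraph can derive one from the graph with `as_adjacency_matrix()`. The sketch below uses a three-node toy graph with invented labels and weights, so the matrix stays small enough to read.

```r
library(igraph)

# Hypothetical edge list: A listed homes that B and C sold, B listed one that C sold
toy_edges <- data.frame(
  from   = c("A", "A", "B"),
  to     = c("B", "C", "C"),
  weight = c(3, 1, 2)
)

g <- graph_from_data_frame(toy_edges, directed = TRUE)

# Dense matrix: rows are listing agents, columns are selling agents,
# cells hold the edge weight (0 where the pair has no edge)
adj <- as_adjacency_matrix(g, attr = "weight", sparse = FALSE)
adj
```

For the 79-node graph in this analysis the matrix would be mostly zeros, which is one reason the edge list is the more practical representation here.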

Similar to the network graph, but the nodes are labeled.

plot(routes_igraph,edge.arrow.size = 0.2)

plot(routes_igraph, layout= layout_with_graphopt,  edge.arrow.size = 0.2)

Next let us try Tidygraph and GGraph.

routes_tidy<- tbl_graph(nodes=nodes,edges=edges,directed = TRUE)
routes_tidy
## # A tbl_graph: 79 nodes and 47 edges
## #
## # A rooted forest with 32 trees
## #
## # Node Data: 79 x 2 (active)
##      id label
##   <int> <int>
## 1     1    75
## 2     2   619
## 3     3   618
## 4     4   702
## 5     5   814
## 6     6    21
## # ... with 73 more rows
## #
## # Edge Data: 47 x 3
##    from    to weight
##   <int> <int>  <int>
## 1    23    63      1
## 2     6    43      1
## 3    35    76      1
## # ... with 44 more rows
routes_tidy %>%
  activate(edges)%>%
  arrange(desc(weight))
## # A tbl_graph: 79 nodes and 47 edges
## #
## # A rooted forest with 32 trees
## #
## # Edge Data: 47 x 3 (active)
##    from    to weight
##   <int> <int>  <int>
## 1    19    59      2
## 2     8    69      2
## 3     4    58      2
## 4    23    63      1
## 5     6    43      1
## 6    35    76      1
## # ... with 41 more rows
## #
## # Node Data: 79 x 2
##      id label
##   <int> <int>
## 1     1    75
## 2     2   619
## 3     3   618
## # ... with 76 more rows

Sparser? True. But think of it as a litmus test: is it worth developing this graph? Yes, because there are a number of clusters.

ggraph(routes_tidy)+
  geom_edge_link()+
  geom_node_point()+
  theme_graph()
## Using `nicely` as default layout

ggraph(routes_tidy, layout = "graphopt") + 
  geom_node_point() +
  geom_edge_link(aes(width = weight), alpha = 0.8) + 
  scale_edge_width(range = c(0.2, 2)) +
  geom_node_text(aes(label = label), repel = TRUE) +
  labs(edge_width = "weight") +
  theme_graph()
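
The clusters visible in these plots can also be counted rather than eyeballed. One way, sketched here on an invented toy graph, is igraph's `components()`, which groups nodes that are connected to each other; `mode = "weak"` ignores edge direction. The labels `L1`, `S1`, and so on are hypothetical.

```r
library(igraph)

# Hypothetical edges: listing agent L1 works with S1 and S2;
# listing agents L2 and L3 both work with S3
toy_edges <- data.frame(
  from = c("L1", "L1", "L2", "L3"),
  to   = c("S1", "S2", "S3", "S3")
)

g <- graph_from_data_frame(toy_edges, directed = TRUE)

clusters <- components(g, mode = "weak")
clusters$no      # number of clusters
clusters$csize   # number of agents in each cluster
```

Applied to `routes_igraph`, the same call would give the number of clusters and the size of each.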

#ggsave("Network.png")
A<-knitr::include_graphics("Network1.png")
A

At last we see whom the real estate companies should target for recruitment: not the unpoachable star agents but the satellite agents. For one thing, they are poachable, and for another, because of their much lower commission split, the company will make significantly more money.
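
One possible way to turn this idea into a shortlist is to rank Selling Agents by how many deals they closed on other agents' listings. The sketch below runs on an invented pair table; the threshold of two deals and the column names `deals` and `partners` are assumptions made for illustration, not part of the analysis above.

```r
library(dplyr)

# Hypothetical pair table in the same shape as per_route, with invented IDs
toy_pairs <- data.frame(
  ListingAgent = c(1, 1, 2, 2, 3),
  SellingAgent = c(11, 12, 11, 13, 14),
  weight       = c(3, 1, 2, 2, 1)
)

shortlist <- toy_pairs %>%
  group_by(SellingAgent) %>%
  summarise(deals    = sum(weight),               # deals closed as selling agent
            partners = n_distinct(ListingAgent),  # distinct listing agents worked with
            .groups  = "drop") %>%
  filter(deals >= 2) %>%                          # assumed cut-off for a repeat seller
  arrange(desc(deals))

shortlist
```

A real shortlist would presumably also join in the commission splits, which would require knowing the columns of `Commision_Splits`.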