A Network Graph To Guide Real Estate Companies In Recruitment
Unlike most other companies real estate companies cannot increase the number of sales . The bes they can do is to try to get a larger share of the sales that will occur. Real estate sales are not evenly distribute among the sales force but are HIGHLY skewed. To this aim real estate companies try to recruit star agents but for reasons we will discuss this is just plain futile. But this is just as well. It turns out that each star agent has their own coterie of agents who sell the star agents listings. And not only are these satelite agents poachable but becasue of differences in commision splits the companies will make more money from recruiting the satelite agents than if they had recruited the star.
The packages
library(tidyverse)
## -- Attaching packages ---------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0 v purrr 0.2.5
## v tibble 1.4.2 v dplyr 0.7.6
## v tidyr 0.8.1 v stringr 1.3.1
## v readr 1.1.1 v forcats 0.3.0
## -- Conflicts ------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggplot2)
library(treemap)
The data
Copy&PAste into your favorite browser then C&P into your favorite spreadseet naming it “FakeNews.csv”. Then import it as usual.
Becasue we are using real data from a NYC suburb we will annonymize it. First of all we have replaced the multiple listing numbers with 1: 3500. We have also replaced the actual Listing and Selling Agents with numbers . But the numbers must be consisitant since we will be investigateing the number of deals between the agents. So every time the name Abe appears we need it be the same number. This is so well done in R it is worth a digression demo of how this was done.
fake<- read_csv("FakeNames.csv")
## Parsed with column specification:
## cols(
## ML = col_integer(),
## ListingAgent = col_character(),
## SellingAgent = col_character(),
## ListingBroker = col_character(),
## SellingBroker = col_character()
## )
head(fake)
## # A tibble: 6 x 5
## ML ListingAgent SellingAgent ListingBroker SellingBroker
## <int> <chr> <chr> <chr> <chr>
## 1 1 Abe Abe Jones Jones
## 2 2 Ben Abe Smith Jones
## 3 3 Cam Cam Ford Ford
## 4 4 Dan Dan Klein Klein
## 5 5 Eve Dan Ehrlich Klein
## 6 6 Eve Dan Ehrlich Klein
For example, anywhere the name Abe appeared it is now replace by 1 in both columns.
fake$ListingAgent<-as.numeric(as.factor(fake$ListingAgent))
head(fake)
## # A tibble: 6 x 5
## ML ListingAgent SellingAgent ListingBroker SellingBroker
## <int> <dbl> <chr> <chr> <chr>
## 1 1 1 Abe Jones Jones
## 2 2 2 Abe Smith Jones
## 3 3 3 Cam Ford Ford
## 4 4 4 Dan Klein Klein
## 5 5 5 Dan Ehrlich Klein
## 6 6 5 Dan Ehrlich Klein
fake$SellingAgent<-as.numeric(as.factor(fake$SellingAgent))
head(fake)
## # A tibble: 6 x 5
## ML ListingAgent SellingAgent ListingBroker SellingBroker
## <int> <dbl> <dbl> <chr> <chr>
## 1 1 1 1 Jones Jones
## 2 2 2 1 Smith Jones
## 3 3 3 2 Ford Ford
## 4 4 4 3 Klein Klein
## 5 5 5 3 Ehrlich Klein
## 6 6 5 3 Ehrlich Klein
fake
## # A tibble: 15 x 5
## ML ListingAgent SellingAgent ListingBroker SellingBroker
## <int> <dbl> <dbl> <chr> <chr>
## 1 1 1 1 Jones Jones
## 2 2 2 1 Smith Jones
## 3 3 3 2 Ford Ford
## 4 4 4 3 Klein Klein
## 5 5 5 3 Ehrlich Klein
## 6 6 5 3 Ehrlich Klein
## 7 7 5 4 Ehrlich Ehrlich
## 8 8 5 1 Ehrlich Jones
## 9 9 5 2 Ehrlich Ford
## 10 10 5 4 Ehrlich Ehrlich
## 11 11 6 6 Post Rimmer
## 12 12 6 6 Post Rimmer
## 13 13 6 5 Post Post
## 14 14 7 1 Rimmer Jones
## 15 15 7 2 Rimmer Ford
Next we want to annoomize the Brokerages. But instead of doing the same thing and leaving them as Numbers which might get confusing when we graph we will first convert to numbers and then convert back to consistant but randomly chosen letters.
So first we convert the brokerages to numbers
fake$ListingBroker<-as.numeric(as.factor(fake$ListingBroker))
fake$SellingBroker<-as.numeric(as.factor(fake$SellingBroker))
fake
## # A tibble: 15 x 5
## ML ListingAgent SellingAgent ListingBroker SellingBroker
## <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 1 3 3
## 2 2 2 1 7 3
## 3 3 3 2 2 2
## 4 4 4 3 4 4
## 5 5 5 3 1 4
## 6 6 5 3 1 4
## 7 7 5 4 1 1
## 8 8 5 1 1 3
## 9 9 5 2 1 2
## 10 10 5 4 1 1
## 11 11 6 6 5 6
## 12 12 6 6 5 6
## 13 13 6 5 5 5
## 14 14 7 1 6 3
## 15 15 7 2 6 2
Now that we have bleached out the brokerages names we convert the respctive numbers back to letters only so that our graph is easier to read.
fake$ListingBroker<- chartr("123456789" , "ABCDEFGHI" , fake$ListingBroker)
fake$SellingBroker<- chartr("123456789" , "ABCDEFGHI" , fake$SellingBroker)
fake
## # A tibble: 15 x 5
## ML ListingAgent SellingAgent ListingBroker SellingBroker
## <int> <dbl> <dbl> <chr> <chr>
## 1 1 1 1 C C
## 2 2 2 1 G C
## 3 3 3 2 B B
## 4 4 4 3 D D
## 5 5 5 3 A D
## 6 6 5 3 A D
## 7 7 5 4 A A
## 8 8 5 1 A C
## 9 9 5 2 A B
## 10 10 5 4 A A
## 11 11 6 6 E F
## 12 12 6 6 E F
## 13 13 6 5 E E
## 14 14 7 1 F C
## 15 15 7 2 F B
Import our data
data<-read_csv("ReducedData2.csv")
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
## X1 = col_integer(),
## ListingID = col_integer(),
## ListingAgent = col_integer(),
## ListingBroker = col_character(),
## SellingAgent = col_integer(),
## SellingBroker = col_character(),
## SoldPrice = col_integer(),
## ContractDate = col_integer()
## )
dim(data)
## [1] 3314 8
Lets drop any rows missing data
data%>%drop_na()
## # A tibble: 3,228 x 8
## X1 ListingID ListingAgent ListingBroker SellingAgent SellingBroker
## <int> <int> <int> <chr> <int> <chr>
## 1 1 2728 702 BAC 36 AEI
## 2 2 1 47 GF 73 II
## 3 3 1019 259 A0A 115 II
## 4 4 712 125 A0B 306 II
## 5 5 965 226 AAF 348 AEB
## 6 6 1098 263 BAC 579 AGC
## 7 7 342 75 FH 600 II
## 8 8 1837 405 B0F 608 AFC
## 9 9 6 447 AAE 657 AEA
## 10 10 2465 619 DG 824 AGC
## # ... with 3,218 more rows, and 2 more variables: SoldPrice <int>,
## # ContractDate <int>
dim(data)
## [1] 3314 8
head(data)
## # A tibble: 6 x 8
## X1 ListingID ListingAgent ListingBroker SellingAgent SellingBroker
## <int> <int> <int> <chr> <int> <chr>
## 1 1 2728 702 BAC 36 AEI
## 2 2 1 47 GF 73 II
## 3 3 1019 259 A0A 115 II
## 4 4 712 125 A0B 306 II
## 5 5 965 226 AAF 348 AEB
## 6 6 1098 263 BAC 579 AGC
## # ... with 2 more variables: SoldPrice <int>, ContractDate <int>
Not sure where that “X1”collumn came from but let us get rid of it.
data%>%
select(-c(X1))
## # A tibble: 3,314 x 7
## ListingID ListingAgent ListingBroker SellingAgent SellingBroker
## <int> <int> <chr> <int> <chr>
## 1 2728 702 BAC 36 AEI
## 2 1 47 GF 73 II
## 3 1019 259 A0A 115 II
## 4 712 125 A0B 306 II
## 5 965 226 AAF 348 AEB
## 6 1098 263 BAC 579 AGC
## 7 342 75 FH 600 II
## 8 1837 405 B0F 608 AFC
## 9 6 447 AAE 657 AEA
## 10 2465 619 DG 824 AGC
## # ... with 3,304 more rows, and 2 more variables: SoldPrice <int>,
## # ContractDate <int>
We are intersted in which are the most productive Listing and Selling pairs so we will drop any observations where the Listing agent was also the Selling Agent.
data <- data[(data$ListingAgent != data$SellingAgent),]
dim(data)
## [1] 3313 8
Apparently only one agent was on both sides of a deal.
Let us make sure each transaction is not entered twice
distinct(data, ListingID, .keep_all = TRUE)
## # A tibble: 3,230 x 8
## X1 ListingID ListingAgent ListingBroker SellingAgent SellingBroker
## <int> <int> <int> <chr> <int> <chr>
## 1 1 2728 702 BAC 36 AEI
## 2 2 1 47 GF 73 II
## 3 3 1019 259 A0A 115 II
## 4 4 712 125 A0B 306 II
## 5 5 965 226 AAF 348 AEB
## 6 6 1098 263 BAC 579 AGC
## 7 7 342 75 FH 600 II
## 8 8 1837 405 B0F 608 AFC
## 9 9 6 447 AAE 657 AEA
## 10 10 2465 619 DG 824 AGC
## # ... with 3,220 more rows, and 2 more variables: SoldPrice <int>,
## # ContractDate <int>
dim(data)
## [1] 3313 8
Our premise is that some agents sell a lot more real estate than other agents but how much more? The Treemap below gives some indication.
treemap(data,
index = c("ListingAgent"),
vSize = "SoldPrice" ,
palette = "Greens" ,
title = "Some Agents Sell A LOT More" ,
fontsize.title = 14)

Again, we are interested in the most productive pairs so we will group our data pairwiseweight for purposes later.
And we will change Sold Price to **
data_2<- data %>%
group_by(ListingAgent, SellingAgent) %>%
summarise(weight = sum(SoldPrice)) %>%
ungroup()
Quite a bit of compression
dim(data_2)
## [1] 2630 3
There are 840 unique listing agents. For our purposes this would be nothing but a tangled black blob of a graph so we will winnow the nuber down to the 50 most productive teams.
NumberOfListingAgents<-length(unique(data_2$ListingAgent))
NumberOfListingAgents
## [1] 840
We have already grouped them so all we need to do now is order them.
data_2<- data_2[order(-data_2$weight) , ]
And now we will select the top 50 pairs.
data_2<- data_2[1:50,]
head(data_2,10)
## # A tibble: 10 x 3
## ListingAgent SellingAgent weight
## <int> <int> <int>
## 1 702 979 22745450
## 2 405 588 15541000
## 3 80 130 12588500
## 4 344 510 11016333
## 5 591 839 10361365
## 6 738 1030 7505000
## 7 375 544 6758500
## 8 739 1031 6725950
## 9 373 536 6549000
## 10 65 103 6231995