RealEstateNetworkGraph

A Network Graph To Guide Real Estate Companies In Recruitment

The packages

library(tidyverse)

## -- Attaching packages ---------------------------------------------------------------------------- tidyverse 1.2.1 --

## v ggplot2 3.1.0     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.6
## v tidyr   0.8.1     v stringr 1.3.1
## v readr   1.1.1     v forcats 0.3.0

## -- Conflicts ------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(ggplot2)
library(treemap)

The data

Copy&PAste into your favorite browser then C&P into your favorite spreadseet naming it “FakeNews.csv”. Then import it as usual.

FakeNames: https://docs.google.com/spreadsheets/d/e/2PACX-1vTScinOGCLXceFsa5JSPXDzI83doTJUtDtSBFq3JVJ3namZwT3LTBuWcujV4UYr0fzjTEugOluh-nN1/pubhtml

This is the data we will actually work with. Again, C&P it in to a spreadsheet then save it as a csv entitled “data”: https://docs.google.com/spreadsheets/d/e/2PACX-1vQmdYoCFgOYVsV1cdXOmPWDN9eldxuGf1CggmLndOlgNzzPVpduWRkNiAu-lMFs4wqq8lxlD5eXhHJs/pubhtml

Becasue we are using real data from a NYC suburb we will annonymize it. First of all we have replaced the multiple listing numbers with 1: 3500. We have also replaced the actual Listing and Selling Agents with numbers . But the numbers must be consisitant since we will be investigateing the number of deals between the agents. So every time the name Abe appears we need it be the same number. This is so well done in R it is worth a digression demo of how this was done.

fake<- read_csv("FakeNames.csv")

## Parsed with column specification:
## cols(
##   ML = col_integer(),
##   ListingAgent = col_character(),
##   SellingAgent = col_character(),
##   ListingBroker = col_character(),
##   SellingBroker = col_character()
## )

head(fake)

## # A tibble: 6 x 5
##      ML ListingAgent SellingAgent ListingBroker SellingBroker
##   <int> <chr>        <chr>        <chr>         <chr>        
## 1     1 Abe          Abe          Jones         Jones        
## 2     2 Ben          Abe          Smith         Jones        
## 3     3 Cam          Cam          Ford          Ford         
## 4     4 Dan          Dan          Klein         Klein        
## 5     5 Eve          Dan          Ehrlich       Klein        
## 6     6 Eve          Dan          Ehrlich       Klein

For example, anywhere the name Abe appeared it is now replace by 1 in both columns.

fake$ListingAgent<-as.numeric(as.factor(fake$ListingAgent))
head(fake)

## # A tibble: 6 x 5
##      ML ListingAgent SellingAgent ListingBroker SellingBroker
##   <int>        <dbl> <chr>        <chr>         <chr>        
## 1     1            1 Abe          Jones         Jones        
## 2     2            2 Abe          Smith         Jones        
## 3     3            3 Cam          Ford          Ford         
## 4     4            4 Dan          Klein         Klein        
## 5     5            5 Dan          Ehrlich       Klein        
## 6     6            5 Dan          Ehrlich       Klein

fake$SellingAgent<-as.numeric(as.factor(fake$SellingAgent))
head(fake)

## # A tibble: 6 x 5
##      ML ListingAgent SellingAgent ListingBroker SellingBroker
##   <int>        <dbl>        <dbl> <chr>         <chr>        
## 1     1            1            1 Jones         Jones        
## 2     2            2            1 Smith         Jones        
## 3     3            3            2 Ford          Ford         
## 4     4            4            3 Klein         Klein        
## 5     5            5            3 Ehrlich       Klein        
## 6     6            5            3 Ehrlich       Klein

fake

## # A tibble: 15 x 5
##       ML ListingAgent SellingAgent ListingBroker SellingBroker
##    <int>        <dbl>        <dbl> <chr>         <chr>        
##  1     1            1            1 Jones         Jones        
##  2     2            2            1 Smith         Jones        
##  3     3            3            2 Ford          Ford         
##  4     4            4            3 Klein         Klein        
##  5     5            5            3 Ehrlich       Klein        
##  6     6            5            3 Ehrlich       Klein        
##  7     7            5            4 Ehrlich       Ehrlich      
##  8     8            5            1 Ehrlich       Jones        
##  9     9            5            2 Ehrlich       Ford         
## 10    10            5            4 Ehrlich       Ehrlich      
## 11    11            6            6 Post          Rimmer       
## 12    12            6            6 Post          Rimmer       
## 13    13            6            5 Post          Post         
## 14    14            7            1 Rimmer        Jones        
## 15    15            7            2 Rimmer        Ford

Next we want to annoomize the Brokerages. But instead of doing the same thing and leaving them as Numbers which might get confusing when we graph we will first convert to numbers and then convert back to consistant but randomly chosen letters.

So first we convert the brokerages to numbers

fake$ListingBroker<-as.numeric(as.factor(fake$ListingBroker))
fake$SellingBroker<-as.numeric(as.factor(fake$SellingBroker))
fake

## # A tibble: 15 x 5
##       ML ListingAgent SellingAgent ListingBroker SellingBroker
##    <int>        <dbl>        <dbl>         <dbl>         <dbl>
##  1     1            1            1             3             3
##  2     2            2            1             7             3
##  3     3            3            2             2             2
##  4     4            4            3             4             4
##  5     5            5            3             1             4
##  6     6            5            3             1             4
##  7     7            5            4             1             1
##  8     8            5            1             1             3
##  9     9            5            2             1             2
## 10    10            5            4             1             1
## 11    11            6            6             5             6
## 12    12            6            6             5             6
## 13    13            6            5             5             5
## 14    14            7            1             6             3
## 15    15            7            2             6             2

Now that we have bleached out the brokerages names we convert the respctive numbers back to letters only so that our graph is easier to read.

fake$ListingBroker<- chartr("123456789" , "ABCDEFGHI" , fake$ListingBroker)
fake$SellingBroker<- chartr("123456789" , "ABCDEFGHI" , fake$SellingBroker)
fake

## # A tibble: 15 x 5
##       ML ListingAgent SellingAgent ListingBroker SellingBroker
##    <int>        <dbl>        <dbl> <chr>         <chr>        
##  1     1            1            1 C             C            
##  2     2            2            1 G             C            
##  3     3            3            2 B             B            
##  4     4            4            3 D             D            
##  5     5            5            3 A             D            
##  6     6            5            3 A             D            
##  7     7            5            4 A             A            
##  8     8            5            1 A             C            
##  9     9            5            2 A             B            
## 10    10            5            4 A             A            
## 11    11            6            6 E             F            
## 12    12            6            6 E             F            
## 13    13            6            5 E             E            
## 14    14            7            1 F             C            
## 15    15            7            2 F             B

Import our data

data<-read_csv("ReducedData2.csv")

## Warning: Missing column names filled in: 'X1' [1]

## Parsed with column specification:
## cols(
##   X1 = col_integer(),
##   ListingID = col_integer(),
##   ListingAgent = col_integer(),
##   ListingBroker = col_character(),
##   SellingAgent = col_integer(),
##   SellingBroker = col_character(),
##   SoldPrice = col_integer(),
##   ContractDate = col_integer()
## )

dim(data)

## [1] 3314    8

Lets drop any rows missing data

data%>%drop_na()

## # A tibble: 3,228 x 8
##       X1 ListingID ListingAgent ListingBroker SellingAgent SellingBroker
##    <int>     <int>        <int> <chr>                <int> <chr>        
##  1     1      2728          702 BAC                     36 AEI          
##  2     2         1           47 GF                      73 II           
##  3     3      1019          259 A0A                    115 II           
##  4     4       712          125 A0B                    306 II           
##  5     5       965          226 AAF                    348 AEB          
##  6     6      1098          263 BAC                    579 AGC          
##  7     7       342           75 FH                     600 II           
##  8     8      1837          405 B0F                    608 AFC          
##  9     9         6          447 AAE                    657 AEA          
## 10    10      2465          619 DG                     824 AGC          
## # ... with 3,218 more rows, and 2 more variables: SoldPrice <int>,
## #   ContractDate <int>

dim(data)

## [1] 3314    8

head(data)

## # A tibble: 6 x 8
##      X1 ListingID ListingAgent ListingBroker SellingAgent SellingBroker
##   <int>     <int>        <int> <chr>                <int> <chr>        
## 1     1      2728          702 BAC                     36 AEI          
## 2     2         1           47 GF                      73 II           
## 3     3      1019          259 A0A                    115 II           
## 4     4       712          125 A0B                    306 II           
## 5     5       965          226 AAF                    348 AEB          
## 6     6      1098          263 BAC                    579 AGC          
## # ... with 2 more variables: SoldPrice <int>, ContractDate <int>

Not sure where that “X1”collumn came from but let us get rid of it.

data%>%
  select(-c(X1))

## # A tibble: 3,314 x 7
##    ListingID ListingAgent ListingBroker SellingAgent SellingBroker
##        <int>        <int> <chr>                <int> <chr>        
##  1      2728          702 BAC                     36 AEI          
##  2         1           47 GF                      73 II           
##  3      1019          259 A0A                    115 II           
##  4       712          125 A0B                    306 II           
##  5       965          226 AAF                    348 AEB          
##  6      1098          263 BAC                    579 AGC          
##  7       342           75 FH                     600 II           
##  8      1837          405 B0F                    608 AFC          
##  9         6          447 AAE                    657 AEA          
## 10      2465          619 DG                     824 AGC          
## # ... with 3,304 more rows, and 2 more variables: SoldPrice <int>,
## #   ContractDate <int>

We are intersted in which are the most productive Listing and Selling pairs so we will drop any observations where the Listing agent was also the Selling Agent.

data <- data[(data$ListingAgent != data$SellingAgent),]

dim(data)

## [1] 3313    8

Apparently only one agent was on both sides of a deal.

Let us make sure each transaction is not entered twice

distinct(data, ListingID, .keep_all = TRUE)

## # A tibble: 3,230 x 8
##       X1 ListingID ListingAgent ListingBroker SellingAgent SellingBroker
##    <int>     <int>        <int> <chr>                <int> <chr>        
##  1     1      2728          702 BAC                     36 AEI          
##  2     2         1           47 GF                      73 II           
##  3     3      1019          259 A0A                    115 II           
##  4     4       712          125 A0B                    306 II           
##  5     5       965          226 AAF                    348 AEB          
##  6     6      1098          263 BAC                    579 AGC          
##  7     7       342           75 FH                     600 II           
##  8     8      1837          405 B0F                    608 AFC          
##  9     9         6          447 AAE                    657 AEA          
## 10    10      2465          619 DG                     824 AGC          
## # ... with 3,220 more rows, and 2 more variables: SoldPrice <int>,
## #   ContractDate <int>

dim(data)

## [1] 3313    8

Our premise is that some agents sell a lot more real estate than other agents but how much more? The Treemap below gives some indication.

treemap(data,
        index = c("ListingAgent"),
        vSize = "SoldPrice" ,
        palette = "Greens" ,
        title = "Some Agents Sell A LOT More" ,
        fontsize.title = 14)

Again, we are interested in the most productive pairs so we will group our data pairwiseweight for purposes later.

And we will change Sold Price to **

data_2<- data %>% 
  group_by(ListingAgent, SellingAgent) %>%
  summarise(weight = sum(SoldPrice)) %>% 
  ungroup()

Quite a bit of compression

dim(data_2)

## [1] 2630    3

There are 840 unique listing agents. For our purposes this would be nothing but a tangled black blob of a graph so we will winnow the nuber down to the 50 most productive teams.

NumberOfListingAgents<-length(unique(data_2$ListingAgent))
NumberOfListingAgents

## [1] 840

We have already grouped them so all we need to do now is order them.

data_2<- data_2[order(-data_2$weight) , ]

And now we will select the top 50 pairs.

data_2<- data_2[1:50,]

head(data_2,10)

## # A tibble: 10 x 3
##    ListingAgent SellingAgent   weight
##           <int>        <int>    <int>
##  1          702          979 22745450
##  2          405          588 15541000
##  3           80          130 12588500
##  4          344          510 11016333
##  5          591          839 10361365
##  6          738         1030  7505000
##  7          375          544  6758500
##  8          739         1031  6725950
##  9          373          536  6549000
## 10           65          103  6231995