Eighth Deliverable

Introduction

The Lazega network records friendships among the 71 attorneys (partners and associates) of a corporate law firm in the northeastern US. A friendship tie is recorded when an attorney named a colleague in response to the following question:

Friendship network: “Would you go through this list, and check the names of those you socialize with outside work. You know their family, they know yours, for instance. I do not mean all the people you are simply on a friendly level with, or people you happen to meet at Firm functions.”

For our analysis, we examine whether a lawyer’s friends are associated with the hourly rate that lawyer charges and the fees that lawyer collects.

idfriend <- "1dwX4kKlx-ctkU0JyH74p3r1w-jJdTAqi" 
friendshiplazega <- read.csv(sprintf("https://docs.google.com/uc?id=%s&export=download", idfriend))

friendshiplazega <- as.matrix(friendshiplazega)
str(friendshiplazega)

##  int [1:71, 1:71] 0 0 0 0 0 0 0 0 0 0 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:71] "V1" "V2" "V3" "V4" ...

dim(friendshiplazega)

## [1] 71 71

#Reading attributes also provided by Lazega
idattributes <- "1e0GtrRS5PFFNdnd1e4fJcjeuBZ6deF7g"
datattrout <- read.csv(sprintf("https://docs.google.com/uc?id=%s&export=download", idattributes))

#Transforming the matrices to spatial form (row normalized) as shown in equation (21)
#Attorneys with no friendship ties have a row sum of 0, so their rows become NaN here
friendshiplazega <- friendshiplazega / rowSums(friendshiplazega)
summary(rowSums(friendshiplazega))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##       1       1       1       1       1       1       6
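The six NA's in the summary above come from attorneys who named no friends: their row sum is zero, so the division is 0/0. As a minimal sketch on a hypothetical 3x3 toy matrix (not the Lazega data; the names `A`, `W_naive`, `W` and `rs` are made up for illustration), the following shows where the NaN rows arise and one way to normalize without producing them:

```r
# Toy adjacency matrix (hypothetical, not the Lazega data): person 3 names no friends
A <- matrix(c(0, 1, 1,
              1, 0, 0,
              0, 0, 0), nrow = 3, byrow = TRUE)

# Naive row normalization: the all-zero row becomes NaN (0 / 0)
W_naive <- A / rowSums(A)
n_nan_rows <- sum(is.nan(rowSums(W_naive)))

# Dividing only the rows with at least one tie leaves isolates as rows of zeros
rs <- rowSums(A)
W <- A
W[rs > 0, ] <- A[rs > 0, , drop = FALSE] / rs[rs > 0]

print(n_nan_rows)
print(rowSums(W))
```

Either route ends in the same weights matrix; the second simply avoids the intermediate NaN values.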

Part 1

Lawyers’ hourly rates and fees collected are positively associated with those of their friends. Moran’s I is 0.52 for HrRATE90 and 0.46 for FeesCollec90, against an expectation of about -0.016 under no dependence (values near the expectation indicate randomness; values approaching 1 indicate strong clustering). Both statistics are highly significant (p < 2.2e-16 and p = 1.192e-14), so there is strong social dependence in our data, and it is somewhat stronger for hourly rate than for fees collected.

# Replacing NaN values (rows with no ties) with zeros as shown in equation (22)
friendshiplazega[is.na(friendshiplazega)] <- 0

# Running Moran test for spatial dependence
library(spdep)  # provides mat2listw(), moran.test() and lag.listw()
test.listwAd <- mat2listw(friendshiplazega)
moran.test(datattrout$HrRATE90, test.listwAd, zero.policy = TRUE)

## 
##  Moran I test under randomisation
## 
## data:  datattrout$HrRATE90  
## weights: test.listwAd  n reduced by no-neighbour observations
##   
## 
## Moran I statistic standard deviate = 8.6798, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.520938960      -0.015625000       0.003821452

moran.test(datattrout$FeesCollec90,test.listwAd, zero.policy=TRUE)

## 
##  Moran I test under randomisation
## 
## data:  datattrout$FeesCollec90  
## weights: test.listwAd  n reduced by no-neighbour observations
##   
## 
## Moran I statistic standard deviate = 7.628, p-value = 1.192e-14
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.455321539      -0.015625000       0.003811733
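For reference, the reported expectation of -0.015625 equals -1/64, which is consistent with 65 attorneys remaining once the six without ties are dropped. The sketch below computes the standard Moran's I formula by hand on made-up data, to show what the statistic measures. The toy matrix, the outcome `x` and the helper `moran_i` are hypothetical and are not part of the original analysis; `moran.test` additionally handles no-neighbour observations and the variance, which this sketch does not.

```r
# Hypothetical toy data: 4 people on a line (1-2-3-4), symmetric ties
A <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
W <- A / rowSums(A)          # row-normalized weights; every row sums to 1 here
x <- c(10, 12, 20, 22)       # made-up outcome

moran_i <- function(x, W) {
  z  <- x - mean(x)          # deviations from the mean
  n  <- length(x)
  s0 <- sum(W)               # sum of all weights (equals n when every row sums to 1)
  (n / s0) * sum(W * outer(z, z)) / sum(z^2)
}

I_obs <- moran_i(x, W)
I_exp <- -1 / (length(x) - 1)   # expectation under no dependence
print(c(observed = I_obs, expected = I_exp))
```

An observed value well above the expectation, as in this toy case, means that connected people tend to have similar values.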

Part 2

Because Moran’s I indicates social dependence, the independence assumption of ordinary regression may be violated. To test this formally, we fit an empty (intercept-only) model and test its residuals for social dependence. The residuals for hourly rate are left-skewed, with a long tail towards low values and most of the data clustered at higher values. The histogram of residuals for fees collected is approximately normal.

The Moran’s I tests on the residuals confirm the dependence found in Part 1. Both p-values are well below .05 (HrRATE90: p < 2.2e-16; FeesCollec90: p = 1.192e-14), indicating social dependence in our data. We need to isolate and control for peer effects in our analysis.

#### first we run an empty (intercept-only) model, without our connection data or spatial regression
model_hr <- lm(HrRATE90 ~ 1, data = data.frame(datattrout))
hr_null <- model_hr$residuals


#### repeat for fees
model_fees <- lm(FeesCollec90 ~ 1, data = data.frame(datattrout))
fees_null <- model_fees$residuals

####look at the residual distribution
hist(hr_null)

hist(fees_null)

####run the moran test to determine if residuals are independent
moran.test(hr_null,test.listwAd, zero.policy=TRUE)

## 
##  Moran I test under randomisation
## 
## data:  hr_null  
## weights: test.listwAd  n reduced by no-neighbour observations
##   
## 
## Moran I statistic standard deviate = 8.6798, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.520938960      -0.015625000       0.003821452

moran.test(fees_null,test.listwAd, zero.policy=TRUE)

## 
##  Moran I test under randomisation
## 
## data:  fees_null  
## weights: test.listwAd  n reduced by no-neighbour observations
##   
## 
## Moran I statistic standard deviate = 7.628, p-value = 1.192e-14
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.455321539      -0.015625000       0.003811733
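The two residual tests above reproduce the Part 1 statistics exactly (0.5209 and 0.4553). That is expected: an intercept-only OLS model only subtracts the mean, and Moran's I is already computed on deviations from the mean. A small check on made-up numbers (the vector `y` is hypothetical, not the Lazega data):

```r
# Made-up outcome values (not the Lazega data)
y <- c(100, 150, 90, 200, 175)

fit <- lm(y ~ 1)                      # empty (intercept-only) OLS model
res <- unname(residuals(fit))

# The fitted intercept is the sample mean, so the residuals are y - mean(y)
same_as_centered <- isTRUE(all.equal(res, y - mean(y)))
print(same_as_centered)
```

The residual test becomes informative once the model contains covariates or a spatial term, as in the models that follow.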

To control for these peer effects, we refit the empty model as a spatial autoregression (spautolm) using our weighted friendship connection list. To test whether this model properly accounts for social dependence, we run a Moran’s I test on its residuals. For both hourly rate and fees collected, the residual Moran’s I is close to its expectation under randomness (-0.005 and -0.007, against -0.016) and the p-values are greater than .05 (0.43 and 0.45), so there is no evidence of remaining social dependence. The autoregressive parameter lambda is large and significant in both models (0.90 and 0.87). We have controlled for the connection effects within the data, so the empty spatial model needs no further modification on these grounds.

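library(spatialreg)  # spautolm() is in spatialreg in current releases (older spdep versions exported it)
HR90Ad <- spautolm(formula = HrRATE90 ~ 1, data = data.frame(datattrout), listw = test.listwAd)
summary(HR90Ad)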

## 
## Call: spautolm(formula = HrRATE90 ~ 1, data = data.frame(datattrout), 
##     listw = test.listwAd)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -127.9043  -16.6800   -1.1633   19.2322  113.7546 
## 
## Coefficients: 
##             Estimate Std. Error z value  Pr(>|z|)
## (Intercept)  127.904     13.707  9.3313 < 2.2e-16
## 
## Lambda: 0.90349 LR test value: 51.678 p-value: 6.5392e-13 
## Numerical Hessian standard error of lambda: 0.067377 
## 
## Log likelihood: -357.4523 
## ML residual variance (sigma squared): 1241, (sigma: 35.228)
## Number of observations: 71 
## Number of parameters estimated: 3 
## AIC: 720.9

Fees90Ad <- spautolm(formula = FeesCollec90 ~ 1, data = data.frame(datattrout), listw = test.listwAd)
summary(Fees90Ad)

## 
## Call: spautolm(formula = FeesCollec90 ~ 1, data = data.frame(datattrout), 
##     listw = test.listwAd)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -216274.2  -39115.2   -6300.7   41926.4  156100.4 
## 
## Coefficients: 
##             Estimate Std. Error z value  Pr(>|z|)
## (Intercept)   141650      25998  5.4484 5.082e-08
## 
## Lambda: 0.87081 LR test value: 40.096 p-value: 2.4183e-10 
## Numerical Hessian standard error of lambda: 0.08461 
## 
## Log likelihood: -895.3846 
## ML residual variance (sigma squared): 4788700000, (sigma: 69200)
## Number of observations: 71 
## Number of parameters estimated: 3 
## AIC: 1796.8

hNULL<-residuals(HR90Ad)
#Testing residuals for sp dependence
moran.test(hNULL,test.listwAd, zero.policy=TRUE)

## 
##  Moran I test under randomisation
## 
## data:  hNULL  
## weights: test.listwAd  n reduced by no-neighbour observations
##   
## 
## Moran I statistic standard deviate = 0.17573, p-value = 0.4303
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      -0.005093321      -0.015625000       0.003591805

hNULL<-residuals(Fees90Ad)
moran.test(hNULL,test.listwAd, zero.policy=TRUE)

## 
##  Moran I test under randomisation
## 
## data:  hNULL  
## weights: test.listwAd  n reduced by no-neighbour observations
##   
## 
## Moran I statistic standard deviate = 0.13626, p-value = 0.4458
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      -0.007285569      -0.015625000       0.003745885

Part 2.2

The Moran test indicates that we have accounted for social dependence. However, there is a power dynamic in these relationships, which suggests we should investigate its impact on our model. Friendship with partners likely increases the mentorship, advice, or influence available to a lawyer. This could raise the hourly rate they charge and perhaps bolster the fees they collect through greater exposure to clients.

Part 3

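Being a partner has a significant positive effect on both outcomes: partners bill about $67 more per hour (p = 3.2e-11) and collect about $71,000 more in fees (p = 0.002). Once partner is included, lambda falls from 0.90 to 0.53 for hourly rate and from 0.87 to 0.75 for fees collected. In the residual tests, Moran’s I for hourly rate moves closer to its expectation (from -0.005 to -0.010, against -0.016), while for fees collected it is essentially unchanged; neither is significant.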

#### Repeat the spatial models, now adding the partner variable as a covariate

HR90Ad <- spautolm(formula = HrRATE90 ~ partner, data = data.frame(datattrout), listw = test.listwAd)
summary(HR90Ad)

## 
## Call: spautolm(formula = HrRATE90 ~ partner, data = data.frame(datattrout), 
##     listw = test.listwAd)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -184.5283  -16.0358   -2.0851   17.9191   82.5835 
## 
## Coefficients: 
##             Estimate Std. Error z value  Pr(>|z|)
## (Intercept) 117.7425     8.3332 14.1293 < 2.2e-16
## partner      66.7858    10.0640  6.6361 3.221e-11
## 
## Lambda: 0.53377 LR test value: 3.8789 p-value: 0.048898 
## Numerical Hessian standard error of lambda: 0.26431 
## 
## Log likelihood: -347.566 
## ML residual variance (sigma squared): 1018.2, (sigma: 31.909)
## Number of observations: 71 
## Number of parameters estimated: 4 
## AIC: 703.13

Fees90Ad <- spautolm(formula = FeesCollec90 ~ partner, data = data.frame(datattrout), listw = test.listwAd)
summary(Fees90Ad)

## 
## Call: 
## spautolm(formula = FeesCollec90 ~ partner, data = data.frame(datattrout), 
##     listw = test.listwAd)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -221898.7  -37379.2   -1568.4   43189.5  154364.2 
## 
## Coefficients: 
##             Estimate Std. Error z value  Pr(>|z|)
## (Intercept)   125629      22670  5.5415 2.998e-08
## partner        71126      23259  3.0580  0.002228
## 
## Lambda: 0.7503 LR test value: 10.971 p-value: 0.00092571 
## Numerical Hessian standard error of lambda: 0.159 
## 
## Log likelihood: -891.786 
## ML residual variance (sigma squared): 4473300000, (sigma: 66882)
## Number of observations: 71 
## Number of parameters estimated: 4 
## AIC: 1791.6

hNULL<-residuals(HR90Ad)
#Testing residuals for sp dependence
moran.test(hNULL,test.listwAd, zero.policy=TRUE)

## 
##  Moran I test under randomisation
## 
## data:  hNULL  
## weights: test.listwAd  n reduced by no-neighbour observations
##   
## 
## Moran I statistic standard deviate = 0.10212, p-value = 0.4593
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      -0.010132327      -0.015625000       0.002893019

hNULL<-residuals(Fees90Ad)
moran.test(hNULL,test.listwAd, zero.policy=TRUE)

## 
##  Moran I test under randomisation
## 
## data:  hNULL  
## weights: test.listwAd  n reduced by no-neighbour observations
##   
## 
## Moran I statistic standard deviate = 0.15546, p-value = 0.4382
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      -0.006179954      -0.015625000       0.003691367

Partner status may give a person disproportionate influence in a law firm, so it is worth exploring as a lagged indicator. Here the lag is the share of a lawyer’s friends who are partners. In the regression, moving from having no partners among one’s friends to having only partners is associated with a ~$95 increase in hourly rate (p = 1.2e-11). Once we control for this lagged variable, the remaining social dependence is no longer significant (lambda = 0.40, LR test p = 0.205).

datattrout$lag.partnersAd <- lag.listw(test.listwAd, datattrout$partner, zero.policy=TRUE, na.action=na.omit)

summary(datattrout$lag.partnersAd)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00000 0.08889 0.50000 0.51876 0.87500 1.00000

summary(datattrout$partner)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   1.000   0.507   1.000   1.000
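With row-normalized weights, the lagged variable is each attorney's friend-average of `partner`, that is, the share of named friends who are partners, which is why it ranges from 0 to 1 in the summary above. The sketch below reproduces that calculation with plain matrix multiplication on a hypothetical 4-person example (not the Lazega data). Using `%*%` in place of `lag.listw` is an assumption made here to keep the example free of package dependencies.

```r
# Hypothetical 4-person example (not the Lazega data); person 4 names no friends
A <- matrix(c(0, 1, 1, 0,
              1, 0, 0, 0,
              1, 1, 0, 1,
              0, 0, 0, 0), nrow = 4, byrow = TRUE)
partner <- c(1, 0, 1, 0)   # 1 = partner, 0 = associate

# Row-normalize; rows with no ties stay all zero
rs <- rowSums(A)
W <- A
W[rs > 0, ] <- A[rs > 0, , drop = FALSE] / rs[rs > 0]

# Spatial lag: for each person, the weighted average of friends' partner status
lag_partner <- as.vector(W %*% partner)
print(lag_partner)
```

Person 1 names one partner and one associate, so the lag is 0.5; person 4 names nobody and receives 0, which mirrors the zero-policy treatment of attorneys without ties.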

HR90Ad <- spautolm(formula = HrRATE90 ~ lag.partnersAd, data = data.frame(datattrout), listw = test.listwAd)
summary(HR90Ad)

## 
## Call: 
## spautolm(formula = HrRATE90 ~ lag.partnersAd, data = data.frame(datattrout), 
##     listw = test.listwAd)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -107.09624  -17.75898   -0.59362   18.86619  122.90376 
## 
## Coefficients: 
##                Estimate Std. Error z value  Pr(>|z|)
## (Intercept)    107.0962     8.8789 12.0619 < 2.2e-16
## lag.partnersAd  94.7810    13.9828  6.7784 1.215e-11
## 
## Lambda: 0.40163 LR test value: 1.6036 p-value: 0.20539 
## Numerical Hessian standard error of lambda: 0.29418 
## 
## Log likelihood: -351.1784 
## ML residual variance (sigma squared): 1141.5, (sigma: 33.787)
## Number of observations: 71 
## Number of parameters estimated: 4 
## AIC: 710.36

hr_null <- residuals(HR90Ad)
moran.test(hr_null,test.listwAd, zero.policy=TRUE)