Useful commands or R Markdown Cheat Sheet

1 Steps for second deliverable

This is where the steps go

library(spatialreg)

## Loading required package: spData

## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`

## Loading required package: Matrix

## Loading required package: sf

## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE

library(spdep)

## 
## Attaching package: 'spdep'

## The following objects are masked from 'package:spatialreg':
## 
##     get.ClusterOption, get.coresOption, get.mcOption,
##     get.VerboseOption, get.ZeroPolicyOption, set.ClusterOption,
##     set.coresOption, set.mcOption, set.VerboseOption,
##     set.ZeroPolicyOption

library(igraph)

## 
## Attaching package: 'igraph'

## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum

## The following object is masked from 'package:base':
## 
##     union

library(classInt)
library(RColorBrewer)
#Friendship
idfriend <- "1dwX4kKlx-ctkU0JyH74p3r1w-jJdTAqi"
friendshiplazega <- read.csv(sprintf("https://docs.google.com/uc?id=%s&export=download", idfriend))

friendshiplazega <- as.matrix(friendshiplazega)
advicelazegac <- friendshiplazega
str(friendshiplazega)

##  int [1:71, 1:71] 0 0 0 0 0 0 0 0 0 0 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:71] "V1" "V2" "V3" "V4" ...

dim(friendshiplazega)

## [1] 71 71

idattributes <- "1e0GtrRS5PFFNdnd1e4fJcjeuBZ6deF7g"
datattrout <- read.csv(sprintf("https://docs.google.com/uc?id=%s&export=download", idattributes))
friendshiplazega <-friendshiplazega /rowSums(friendshiplazega)
summary(rowSums(friendshiplazega))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##       1       1       1       1       1       1       6

#Question 1: 
#remain isolates
friendshiplazega[is.na(friendshiplazega)]<-0
listwAd<-mat2listw(friendshiplazega)
# hour rate
moran.test(datattrout$HrRATE90,listwAd, zero.policy=TRUE)

## 
##  Moran I test under randomisation
## 
## data:  datattrout$HrRATE90  
## weights: listwAd  
## n reduced by no-neighbour observations  
## 
## Moran I statistic standard deviate = 8.6798, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.520938960      -0.015625000       0.003821452

# Fees
moran.test(datattrout$FeesCollec90,listwAd, zero.policy=TRUE)

## 
##  Moran I test under randomisation
## 
## data:  datattrout$FeesCollec90  
## weights: listwAd  
## n reduced by no-neighbour observations  
## 
## Moran I statistic standard deviate = 7.628, p-value = 1.192e-14
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.455321539      -0.015625000       0.003811733

# Question 2: 
#########
#Visualizing Local Moran’s I clusters and outliers 
#########
mt <- moran.test(datattrout$FeesCollec90, listwAd, zero.policy=TRUE) # saving the coefficient
label_x = "Individual Fees Collected"
label_y = "Lagged Individual Fees Collected"
#graph format
mp <- moran.plot(datattrout$FeesCollec90, listwAd, zero.policy=T,
labels=datattrout$id, xlab = label_x, ylab = label_y)
title(main="Moran’s Plot", cex.main=2, col.main="grey11", 
font.main=2, sub=paste("Plot includes 71 lawyers (Moran’s I = ", 
round(mt$ estimate[1], 3), ", p < .0001)", sep=""), cex.sub=1.15, col.sub="grey11", font.sub=2,)

mt1 <- moran.test(datattrout$HrRATE90, listwAd, zero.policy=TRUE) # saving the coefficient
label_x = "HR rate Collected"
label_y = "Lagged HR rate Collected"
#graph format
mp1 <- moran.plot(datattrout$HrRATE90, listwAd, zero.policy=T,
                  labels=datattrout$id, xlab = label_x, ylab = label_y)
title(main="Moran’s Plot", cex.main=2, col.main="grey11", 
      font.main=2, sub=paste("Plot includes 71 lawyers (Moran’s I = ", 
                             round(mt1$ estimate[1], 3), ", p < .0001)", sep=""), cex.sub=1.15, col.sub="grey11", font.sub=2,)

#present isolates
summary(listwAd, zero.policy=T)

## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 71 
## Number of nonzero links: 575 
## Percentage nonzero weights: 11.40647 
## Average number of links: 8.098592 
## 6 regions with no links:
## 3, 6, 37, 44, 47, 55
## 3 disjoint connected subgraphs
## Non-symmetric neighbours list
## Link number distribution:
## 
##  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 19 22 23 25 
##  6  3  2  4  9  3  6  3  8  5  2  2  4  2  2  3  1  1  2  2  1 
## 3 least connected regions:
## 8 23 71 with 1 link
## 1 most connected region:
## 31 with 25 links
## 
## Weights style: M 
## Weights constants summary:
##    n   nn S0       S1       S2
## M 65 4225 65 18.28909 277.2203

#Question 3: Remove isolates
sub_listNAd <- subset(listwAd[[2]], subset=card(listwAd[[2]])> 0) # at least one neighbor, one friendship; remove isolates
sub_listNAd #neighbor list, no isolates

## Neighbour list object:
## Number of regions: 65 
## Number of nonzero links: 565 
## Percentage nonzero weights: 13.37278 
## Average number of links: 8.692308 
## Non-symmetric neighbours list

sub_listWAd <- nb2listw(sub_listNAd, glist=NULL, style="W", zero.policy=NULL) # list of weights
summary(sub_listWAd)

## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 65 
## Number of nonzero links: 565 
## Percentage nonzero weights: 13.37278 
## Average number of links: 8.692308 
## Non-symmetric neighbours list
## Link number distribution:
## 
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 19 22 23 25 
##  4  1  5  8  3  7  4  7  4  2  3  3  2  4  2  1  2  2  1 
## 4 least connected regions:
## 7 8 23 71 with 1 link
## 1 most connected region:
## 31 with 25 links
## 
## Weights style: W 
## Weights constants summary:
##    n   nn S0       S1       S2
## W 65 4225 65 19.43774 283.1838

#making the spatial dataframe
sub_datattrout <- subset(datattrout, subset=card(listwAd[[2]]) > 0)
dim(sub_datattrout)

## [1] 65 21

# by removing isolates:
moran.test(sub_datattrout$HrRATE90,sub_listWAd, zero.policy=TRUE)

## 
##  Moran I test under randomisation
## 
## data:  sub_datattrout$HrRATE90  
## weights: sub_listWAd    
## 
## Moran I statistic standard deviate = 10.907, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.684439193      -0.015625000       0.004119984

moran.test(sub_datattrout$FeesCollec90,sub_listWAd, zero.policy=TRUE)

## 
##  Moran I test under randomisation
## 
## data:  sub_datattrout$FeesCollec90  
## weights: sub_listWAd    
## 
## Moran I statistic standard deviate = 8.5307, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.528467238      -0.015625000       0.004067992

#########
#Visualizing Local Moran’s I clusters and outliers WITHOUT isolates
#########
# hr rate plot
mt1 <- moran.test(sub_datattrout$HrRATE90, sub_listWAd, zero.policy=TRUE)
label_x = "Hr Rate Collected"
label_y = "Lagged Hr Rates Collected"
mp1 <- moran.plot(sub_datattrout$HrRATE90, sub_listWAd, zero.policy=T,
                  labels=sub_datattrout$id, xlab = label_x, ylab = label_y)
title(main="Moran’s Plot", cex.main=2, col.main="grey11", 
      font.main=2, sub=paste("Plot includes 71 lawyers (Moran’s I = ", 
                             round(mt1$ estimate[1], 3), ", p < .0001)", sep=""), cex.sub=1.15, col.sub="grey11", font.sub=2,)

# Individual fee plot
mt <- moran.test(sub_datattrout$FeesCollec90, sub_listWAd, zero.policy=TRUE) 
label_x = "Individual Fee Collected"
label_y = "Lagged Individual Fees Collected"
mp <- moran.plot(sub_datattrout$FeesCollec90, sub_listWAd, zero.policy=T,
                 labels=sub_datattrout$id, xlab = label_x, ylab = label_y)
title(main="Moran’s Plot", cex.main=2, col.main="grey11", 
      font.main=2, sub=paste("Plot includes 71 lawyers (Moran’s I = ", 
                             round(mt$ estimate[1], 3), ", p < .0001)", sep=""), cex.sub=1.15, col.sub="grey11", font.sub=2,)

#########
####HIGHER ORDER NEIGHBORS 
# Per Hour Rate
plot.spcor(sp.correlogram(sub_listNAd, sub_datattrout$HrRATE90, order = 6, method = "I", zero.policy=T), xlab = "Social lags", main = "Social correlogram: Autocorrelation with CIs")

sp.correlogram(sub_listNAd, sub_datattrout$HrRATE90, order = 6, method = "I", zero.policy=T) # calculate social correlogram for autocorrelations

## Spatial correlogram for sub_datattrout$HrRATE90 
## method: Moran's I
##           estimate expectation    variance standard deviate Pr(I) two sided    
## 1 (65)  0.68443919 -0.01562500  0.00411998          10.9066       < 2.2e-16 ***
## 2 (64)  0.28650045 -0.01587302  0.00070339          11.4011       < 2.2e-16 ***
## 3 (65) -0.30029146 -0.01562500  0.00074808         -10.4079       < 2.2e-16 ***
## 4 (61) -0.62131921 -0.01666667  0.00439316          -9.1226       < 2.2e-16 ***
## 5 (29) -0.52730140 -0.03571429  0.01166387          -4.5518        5.32e-06 ***
## 6 (4)  -0.09714050 -0.33333333  0.16780179           0.5766          0.5642    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Individual Fee
plot.spcor(sp.correlogram(sub_listNAd, sub_datattrout$FeesCollec90, order = 6, method = "I", zero.policy=T), xlab = "Social lags", main = "Social correlogram: Autocorrelation with CIs")

sp.correlogram(sub_listNAd, sub_datattrout$FeesCollec90, order = 6, method = "I", zero.policy=T)

## Spatial correlogram for sub_datattrout$FeesCollec90 
## method: Moran's I
##           estimate expectation    variance standard deviate Pr(I) two sided    
## 1 (65)  0.52846724 -0.01562500  0.00406799           8.5307       < 2.2e-16 ***
## 2 (64)  0.20081293 -0.01587302  0.00069469           8.2212       < 2.2e-16 ***
## 3 (65) -0.17772748 -0.01562500  0.00073905          -5.9628       2.479e-09 ***
## 4 (61) -0.62090021 -0.01666667  0.00434728          -9.1642       < 2.2e-16 ***
## 5 (29) -0.51826134 -0.03571429  0.01156964          -4.4862       7.250e-06 ***
## 6 (4)  -0.08581660 -0.33333333 -0.23608125              NaN             NaN    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

2 Please replicate the procedures used in social dependence or peer effects to answer the following questions but using a friendship dataset instead:

2.1 Is there evidence of stronger dependence in the per hour rate compared to the fees brought in 1990 amounts?

Answer:

The outcome of Moran’s I test for the per-hour rate is 0.52, which is higher than the outcome of Moran’s I test for individual fees, 0.45, in 1990. Both results are positive, indicating that the outcomes align in the same direction as those of peers. This suggests that per-hour rates are more stable and less affected by external factors compared to individual fees. Therefore, there is evidence of stronger spatial dependence in the per-hour rate compared to the fees brought in during 1990.

2.2 Do you have isolates in the model, if so what are the implications?

Answer:

6 isolates identified: A03, A06, A37, A44, A47, A55

Implication: These six lawyers did not exhibit any social interactions outside of work. They had no friendships with people outside their workplace and were recorded as earning $0 per hour or having $0 in fees—figures that are inaccurate These isolates represent inaccurate data points due to disconnected actors in the network, which may be affecting the dependence analysis by introducing noise.

2.3 If you have isolates, after removing them please answer the following:

2.3.1 How do the Moran’s I estimates change?

Answer:

After removing the isolates, Moran’s I estimates increased compared to the results when isolates were included. Moran’s I for the per-hour rate increased from 0.52 to 0.68, while Moran’s I for fees increased from 0.45 to 0.528.

2.3.2 Are there any differences in these changes by outcome?

Answer:

Removing isolates reduces the effects of outliers and clarifies the relationships in the data. The relationships among the per-hour rates and the fees both become clearer, as the removal of noise caused by isolates reveals stronger clustering patterns. This suggests that more meaningful and useful spatial dependencies can be identified in the absence of isolates.

2.3.3 How many higher order neighbors did you find?

Answer:

In both the per-hour rate and fees outcomes, we identified one higher-order neighbor (second-order neighbor).

2.3.4 Does this change by outcome?

Answer:

This did not change across outcomes because the number of higher-order neighbors remained consistent. Based on the outcomes for the per-hour rate and individual fees, there is only one higher-order neighbor.

Deliverable 5: Social Dpendence or Peer Effect: Friendship Network

Jinze Yang

December 16, 2024

1 Steps for second deliverable