In this lab, you’ll be building, estimating, and interpreting actor-based longitudinal network models using RSiena. Tom Snijders has led the development of the Siena framework, with RSiena supplanting a specialized Windows program in 2011. You can learn more about RSiena, research using the framework, and more here.
RSiena fits actor-based longitudinal network models, which examine the effect of network ties over time on a certain behavior, or the effect of a certain behavior on tie formation over time. SIENA stands for Simulation Investigation for Empirical Network Analysis.
You will be using an excerpt of data from the Teenage Friends and Lifestyle Study. You can learn more about this data set, created by Prof. Snijders, here. The data set includes 3 network files containing friendship relationships between 50 teenage girls recorded at 3 consecutive points in time.
s50-network1.dat
s50-network2.dat
s50-network3.dat
The data also include information about the smoking behavior of the 50 female students (s50-smoke.dat). The smoking variable has three levels: 1 (does not smoke), 2 (smokes occasionally), and 3 (smokes regularly).
The core analysis conducted in this lab consists of taking a set of intuitive or plain language hypotheses about our data set and converting them into hypotheses in network terminology. Once we have our hypotheses, we will operationalize them into parameters in our model. Finally, we will interpret these results by testing our hypotheses.
First, we will create testable hypotheses based on intuitions about habit and friendship formation.
Formulate hypotheses using network terminology discussed throughout the course based on the following friendship relations. If you get stuck, it may be helpful to look at the terms included in the model below for hints on how to translate plain-language intuitions into network terminology. The first one is done for you:
Hypothesis 1: Ties between students are not random.
Relational Hypotheses:
Hypothesis 2: Friendship ties show reciprocity.
Hypothesis 3: Transitivity is observed in friendship ties.
Hypothesis 4: Degree centrality for smokers is less than for non-smokers.
Hypothesis 5: Smokers will have lower in-degree centrality than non-smokers.
Hypothesis 6: Friendship links can be predicted by homophily.
Smoking Behavior Hypotheses:
Hypothesis 7: Smoking behavior is likely to increase with time (age).
Hypothesis 8: Actors will show similar smoking behavior to other actors with whom they share a friendship tie.
To better intuitively understand our network, let’s examine three network visualizations showing the friendship network over time and the smoking behavior of the nodes.
Let’s visualize our network at the three time periods under discussion. Node size increases with smoking behavior. Green nodes represent no smoking, yellow nodes represent light smoking, and red nodes represent heavy smoking. Think about the macro-level features of each network.
Which two nodes represent isolates in the network at all three time periods?
Nodes 13 and 20 are isolates in the network at all three time periods.
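The isolate question can also be checked numerically rather than by eye. The following is a minimal base R sketch on three made-up 4-node adjacency matrices (`wave1`, `wave2`, `wave3` and the helper `is_isolate` are hypothetical names, not the lab data): a node counts as an isolate in a wave when it neither sends nor receives a tie.

```r
# Three hypothetical waves of a directed 4-node network (toy data, not the s50 files)
wave1 <- matrix(0, 4, 4); wave1[1, 2] <- 1; wave1[2, 1] <- 1
wave2 <- matrix(0, 4, 4); wave2[1, 2] <- 1; wave2[3, 1] <- 1
wave3 <- matrix(0, 4, 4); wave3[2, 3] <- 1

# A node is an isolate when its out-degree (row sum) and in-degree (column sum) are both zero
is_isolate <- function(adj) rowSums(adj) + colSums(adj) == 0

# Keep only the nodes that are isolates in every wave
always_isolated <- which(Reduce(`&`, lapply(list(wave1, wave2, wave3), is_isolate)))
print(always_isolated)
```

Applied to the three 50-by-50 friendship matrices instead of the toy waves, the same logic would list the nodes that stay unconnected at all three time periods.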
Describe the change in the network over time. Think about the formation of clusters and the incidence of smoking behavior.
Time T1
At time T1 the network shows 7 clusters of students. Some clusters are large and some are very small; one, for example, has only two nodes. The network contains 5 occasional smokers and 7 heavy smokers, and the majority of students are non-smokers. One cluster consists entirely of non-smokers, and another consists of non-smokers plus a single heavy smoker. A third cluster is a mix of heavy smokers, occasional smokers, and non-smokers. Occasional smokers have friendships with both heavy smokers and non-smokers. Some nodes, such as 19 and 11, act as connectors within the clusters. The clusters combine the three groups in different ways (non-smokers only, heavy smokers with non-smokers, and occasional smokers with non-smokers), which suggests that students socialize with all kinds of students.
Time T2
The network at time T2 is more connected than at T1. There are 5 main clusters, and nodes such as 37, 32, 21, and 40 act as connectors. Clusters that were disconnected at T1 have linked up; for example, nodes 40 and 21 are now connected. There are 9 heavy smokers and 8 occasional smokers; the rest are non-smokers. Some nodes, such as node 19, have become heavy smokers, while others, such as node 11, have reduced their smoking from heavy to occasional. The network appears to have divided into a cluster of non-smokers and a cluster of smokers. Nodes 21 and 35 are important connectors: removing them would separate out a cluster of non-smokers.
Time T3
At time T3 there are 16 heavy smokers, more than at T2, while the number of occasional smokers has dropped to 2; the rest are non-smokers. Node 17, an occasional smoker, is a connector: if it were removed, node 22 would lose its connection to everyone else, and the cluster would split into separate non-smoker groups, one of which would have ties to only one heavy smoker. Node 35 is another important connector, with no reciprocated ties; removing it would separate out a large cluster of non-smokers. By now the influence of heavy smokers on non-smokers is visible. For example, at T1 nodes V6 and V8 formed a dyad with a reciprocated tie; at T2, V6 still shared a reciprocated tie with the heavy smoker V8; and by T3, V6 had also become a heavy smoker.
Using the three visualizations, evaluate hypothesis five.
The three visualizations suggest that, over time, the network has roughly segregated into smokers and non-smokers. Heavy smokers are mostly friends with other heavy smokers, and non-smokers with other non-smokers. We also observe an increase in the number of heavy smokers. The visualizations therefore support the homophily hypothesis (Hypothesis 6 in the list above): students with similar smoking behavior are more likely to become friends, so homophily can predict friendship ties.
To build a SIENA model, we need to create dependent variables, explanatory variables, a combination of both types of variables, and our model specification.
First, we create a SIENA data object including the longitudinal friendship network and the smoking behavioral variable. The result of creating that object, smokeBehXfriendship:
## Dependent variables: friendship, smokingbeh
## Number of observations: 3
##
## Nodeset Actors
## Number of nodes 50
##
## Dependent variable friendship
## Type oneMode
## Observations 3
## Nodeset Actors
## Densities 0.046 0.047 0.05
##
## Dependent variable smokingbeh
## Type behavior
## Observations 3
## Nodeset Actors
## Range 1 - 3
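A data object like the one printed above could be assembled along the following lines. This is a sketch, not the lab's own code block: it assumes the RSiena package is installed and uses the copies of this data set that ship with the package (`s501`, `s502`, `s503` for the three friendship waves and `s50s` for smoking) in place of the .dat files; the variable names `friendship` and `smokingbeh` are chosen to match the printout.

```r
library(RSiena)

# Stack the three 50 x 50 adjacency matrices into a 50 x 50 x 3 array
# (assumption: the packaged s501..s503 matrices stand in for the .dat files)
friendship <- sienaDependent(array(c(s501, s502, s503), dim = c(50, 50, 3)))

# Smoking is a 50 x 3 matrix (one column per wave), declared as a behavior variable
smokingbeh <- sienaDependent(s50s, type = "behavior")

# Combine the network and the behavior into one SIENA data object
smokeBehXfriendship <- sienaDataCreate(friendship, smokingbeh)
print(smokeBehXfriendship)
```

To work from the lab's files instead, each matrix could be read with `as.matrix(read.table(...))` before being passed to `sienaDependent()`.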
Using our hypotheses above, we will construct a list of parameters to test using our Siena model. A table of those parameters follows:
## name effectName include fix test
## 1 friendship constant friendship rate (period 1) TRUE FALSE FALSE
## 2 friendship constant friendship rate (period 2) TRUE FALSE FALSE
## 3 friendship outdegree (density) TRUE FALSE FALSE
## 4 friendship reciprocity TRUE FALSE FALSE
## 5 friendship smokingbeh alter TRUE FALSE FALSE
## 6 friendship smokingbeh ego TRUE FALSE FALSE
## 7 friendship same smokingbeh TRUE FALSE FALSE
## 8 smokingbeh rate smokingbeh (period 1) TRUE FALSE FALSE
## 9 smokingbeh rate smokingbeh (period 2) TRUE FALSE FALSE
## 10 smokingbeh smokingbeh linear shape TRUE FALSE FALSE
## 11 smokingbeh smokingbeh quadratic shape TRUE FALSE FALSE
## 12 smokingbeh smokingbeh total similarity TRUE FALSE FALSE
## initialValue parm
## 1 4.69604 0
## 2 4.32885 0
## 3 -1.46770 0
## 4 0.00000 0
## 5 0.00000 0
## 6 0.00000 0
## 7 0.00000 0
## 8 0.81720 0
## 9 0.43579 0
## 10 -0.22314 0
## 11 0.00000 0
## 12 0.00000 0
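An effects table of this shape can be produced roughly as follows. The sketch assumes the RSiena package and its packaged copies of the data (`s501`, `s502`, `s503`, `s50s`); `myeff` is a hypothetical name for the effects object. The rate, outdegree, reciprocity, linear shape, and quadratic shape effects come from `getEffects()` by default, so only the smoking-related effects need to be added.

```r
library(RSiena)

# Rebuild the data object so this sketch runs on its own
friendship <- sienaDependent(array(c(s501, s502, s503), dim = c(50, 50, 3)))
smokingbeh <- sienaDependent(s50s, type = "behavior")
smokeBehXfriendship <- sienaDataCreate(friendship, smokingbeh)

# Start from the default effects for both dependent variables
myeff <- getEffects(smokeBehXfriendship)

# Friendship dynamics: does smoking affect ties sent (ego), ties received (alter),
# and ties between actors with the same smoking level (same)?
myeff <- includeEffects(myeff, egoX, altX, sameX, interaction1 = "smokingbeh")

# Smoking dynamics: do actors move toward the smoking level of their friends?
myeff <- includeEffects(myeff, totSim, name = "smokingbeh",
                        interaction1 = "friendship")
print(myeff)
```

Each hypothesis should map onto one row of the printed table, which is what makes the later significance tests interpretable.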
Next, we will create our model. You can learn about the function that creates Siena models by typing ?sienaModelCreate into your R console.
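One way to wrap the estimation step is sketched below. It assumes the RSiena package; `fit_siena`, the project name, and the seed are hypothetical choices, and the sketch calls `sienaAlgorithmCreate()`, which is the name I would expect current RSiena releases to use for the algorithm-settings function. The `prev` argument is passed on as `prevAns`, so a second call can restart from an earlier result.

```r
library(RSiena)

# Hypothetical helper (not part of RSiena): run one estimation pass.
# dat  : object from sienaDataCreate()
# eff  : effects object from getEffects() / includeEffects()
# prev : optional earlier siena07() result to use as the starting point
fit_siena <- function(dat, eff, prev = NULL, seed = 1234) {
  # "lab2_smoking" is a made-up project name; RSiena uses it for its output file
  alg <- sienaAlgorithmCreate(projname = "lab2_smoking", seed = seed)
  siena07(alg, data = dat, effects = eff, prevAns = prev,
          batch = TRUE, verbose = FALSE)
}

# Usage, assuming a data object smokeBehXfriendship and an effects object myeff exist:
# ans1 <- fit_siena(smokeBehXfriendship, myeff)
# ans1 <- fit_siena(smokeBehXfriendship, myeff, prev = ans1)  # restart from ans1
# ans1                                                        # print the results table
```

Because estimation is simulation-based, the estimates differ slightly from run to run unless the seed is fixed.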
## Estimates, standard errors and convergence t-ratios
##
## Estimate Standard Convergence
## Error t-ratio
## Network Dynamics
## 1. rate constant friendship rate (period 1) 5.7774 ( 0.9741 ) -0.0387
## 2. rate constant friendship rate (period 2) 4.4886 ( 0.7035 ) -0.0529
## 3. eval outdegree (density) -2.7837 ( 0.3266 ) -0.0273
## 4. eval reciprocity 2.7695 ( 0.1984 ) 0.0029
## 5. eval smokingbeh alter 0.1120 ( 0.1484 ) -0.0223
## 6. eval smokingbeh ego 0.1294 ( 0.1622 ) -0.0159
## 7. eval same smokingbeh 0.7121 ( 0.4334 ) -0.0044
##
## Behavior Dynamics
## 8. rate rate smokingbeh (period 1) 3.1841 ( 1.4982 ) 0.0193
## 9. rate rate smokingbeh (period 2) 3.1899 ( 1.4731 ) 0.0575
## 10. eval smokingbeh linear shape -1.3800 ( 0.4368 ) -0.0367
## 11. eval smokingbeh quadratic shape 1.9284 ( 0.3733 ) -0.0587
## 12. eval smokingbeh total similarity 1.1089 ( 0.5927 ) 0.0727
##
## Overall maximum convergence ratio: 0.1669
##
##
## Total of 3060 iteration steps.
We need to check the convergence ratios in the final column to evaluate the reliability of our simulation. The absolute value of each individual convergence t-ratio should be less than 0.1, and the overall maximum convergence ratio should be less than 0.25.
Has your model converged sufficiently? If not, note which terms have not converged, rerun your model using the previous Siena model values as your starting point, and re-evaluate. Repeat this process until the overall maximum convergence ratio and the convergence t-ratio for each term are within acceptable levels. If your model has not converged, uncomment prevAns = ans1 in the code block titled Model Creation, rerun that block of code, and print the results. This will use the values generated by the previous model creation as the starting point in the estimation and proceed through the model construction process again. See pp. 58–59 of the RSiena manual for more information.
The model has converged sufficiently: the absolute values of all convergence t-ratios are less than 0.1, and the overall maximum convergence ratio, 0.1669, is less than 0.25.
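The two convergence criteria can be checked mechanically. The base R sketch below uses the convergence t-ratios and the overall maximum convergence ratio transcribed by hand from the results table above; the variable names are hypothetical.

```r
# Convergence t-ratios for parameters 1 to 12, copied from the results table
t_conv <- c(-0.0387, -0.0529, -0.0273, 0.0029, -0.0223, -0.0159, -0.0044,
            0.0193, 0.0575, -0.0367, -0.0587, 0.0727)
overall_max <- 0.1669

# Parameters whose absolute t-ratio reaches the 0.1 threshold (none expected here)
not_converged <- which(abs(t_conv) >= 0.1)

# Both criteria must hold for the run to count as converged
converged <- length(not_converged) == 0 && overall_max < 0.25

cat("Largest absolute t-ratio:", max(abs(t_conv)), "\n")
cat("Converged:", converged, "\n")
```

If `not_converged` were non-empty, its entries would be the row numbers of the terms to watch when rerunning the model.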
The Estimate column, also reported as the theta vector within the fitted RSiena object, gives the strength and direction of each effect: how much it raises or lowers an actor’s tendency to form a tie or change a behavior, based on interactions within and between networks or on the presence or absence of a behavior. The Standard Error column gives the uncertainty of each estimate.
The following table presents the Estimate (theta) score from the preceding table, divided by Standard Error. Recall that when carrying out a t-test, a parameter is significant at the .05 level when the absolute value of the t-score is greater than 2.
Parameter | sigvalues |
---|---|
constant friendship rate (period 1) | 5.93 |
constant friendship rate (period 2) | 6.38 |
outdegree (density) | -8.52 |
reciprocity | 13.96 |
smokingbeh alter | 0.75 |
smokingbeh ego | 0.80 |
same smokingbeh | 1.64 |
rate smokingbeh (period 1) | 2.13 |
rate smokingbeh (period 2) | 2.17 |
smokingbeh linear shape | -3.16 |
smokingbeh quadratic shape | 5.17 |
smokingbeh total similarity | 1.87 |
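The sigvalues column can be reproduced by dividing each estimate by its standard error. The base R sketch below uses the estimates and standard errors transcribed by hand from the results table; the variable names are hypothetical. With a fitted object, I would expect `ans1$theta` and `ans1$se` to supply the same two vectors, though that is an assumption about the object's structure.

```r
effect <- c("constant friendship rate (period 1)", "constant friendship rate (period 2)",
            "outdegree (density)", "reciprocity", "smokingbeh alter", "smokingbeh ego",
            "same smokingbeh", "rate smokingbeh (period 1)", "rate smokingbeh (period 2)",
            "smokingbeh linear shape", "smokingbeh quadratic shape",
            "smokingbeh total similarity")

# Estimates and standard errors copied from the results table
estimate  <- c(5.7774, 4.4886, -2.7837, 2.7695, 0.1120, 0.1294, 0.7121,
               3.1841, 3.1899, -1.3800, 1.9284, 1.1089)
std_error <- c(0.9741, 0.7035, 0.3266, 0.1984, 0.1484, 0.1622, 0.4334,
               1.4982, 1.4731, 0.4368, 0.3733, 0.5927)

# t-ratio = estimate / standard error; |t| > 2 is the rule of thumb for p < .05
sigvalues   <- round(estimate / std_error, 2)
significant <- abs(sigvalues) > 2

print(data.frame(effect, sigvalues, significant))
```

The `significant` column gives a quick yes or no for each parameter before writing up the hypotheses.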
For each of your hypotheses, indicate which parameter operationalizes your hypothesis. Using the tables created above, evaluate that hypothesis, and report whether your results were significant. If you’re having difficulty matching up parameters with your hypotheses, take a look at pp. 41-49, § 6.2, in the RSiena Manual.
Hypothesis 1: Ties between students are not random.
The constant friendship rate parameters for period 1 and period 2 operationalize this hypothesis. The two estimates are close to each other, positive, and significant (t-ratios of 5.93 and 6.38, both greater than 2). Hence the hypothesis is supported.
Hypothesis 2: Friendship ties show reciprocity.
The reciprocity parameter operationalizes this hypothesis. Its positive and significant value (t-ratio 13.96) indicates that the network has reciprocated ties between nodes, so the hypothesis is supported.
Hypothesis 4: Degree centrality for smokers is less than for non-smokers.
The smokingbeh ego parameter, which captures whether actors with higher smoking values send more or fewer friendship ties, operationalizes this hypothesis. Its t-ratio is 0.80: the sign is positive, contrary to the hypothesis, and the absolute value is less than 2, so the effect is not significant. The hypothesis is not supported.
Hypothesis 5:
Hypothesis 6: Friendship links can be predicted by homophily.
The same smokingbeh parameter operationalizes this hypothesis. Its t-ratio is 1.64: the sign is positive, but the absolute value is less than 2, so the effect is not significant. Hence we do not have enough evidence to support the hypothesis.
Hypothesis 7: Smoking behavior is likely to increase with time (age).
The smokingbeh quadratic shape parameter operationalizes this hypothesis. Its positive value indicates that the behavior is self-reinforcing, potentially addictive, and likely to increase with time.
The estimate is 1.9284 with a standard error of 0.3733, giving a t-ratio of 5.17. The positive and significant value supports this hypothesis.
Hypothesis 8: Actors will show similar smoking behavior to other actors with whom they share a friendship tie.
The smokingbeh total similarity parameter operationalizes this hypothesis. It captures whether actors tend to adopt smoking levels similar to those of the alters they are tied to. Its t-ratio is 1.87: the sign is positive, but the absolute value is slightly below 2, so the effect falls just short of significance at the .05 level and the hypothesis is not supported by this model.
After knitting your file to RPubs, copy the URL and paste it into the comment field of the Lab 2 Assignment on Canvas. Save this .Rmd file and submit it in the file portion of your Canvas assignment. Make sure to review your file and its formatting. Run spell check (built into RStudio) and proofread your answers before submitting. If you can’t publish to RPubs, save your HTML file as a PDF and submit that instead. There are many different ways to do this with different browsers. Google it.