In this lab, you’ll build, estimate, and interpret actor-based longitudinal network models using RSiena. Tom Snijders has led the development of the Siena framework, with RSiena supplanting a specialized Windows program in 2011. You can learn more about RSiena, research using the framework, and more here. RSiena estimates stochastic actor-oriented models for longitudinal network data, which let us examine how network ties affect a behavior over time, or how a behavior affects tie formation over time. SIENA stands for Simulation Investigation for Empirical Network Analysis.
You will be using an excerpt of data from the Teenage Friends and Lifestyle Study. You can learn more about this data set, created by Prof. Snijders, here. The data set includes three network files containing friendship relations among 50 teenage girls, recorded at three consecutive points in time:

- s50-network1.dat
- s50-network2.dat
- s50-network3.dat
The data also includes information about the smoking behavior of the 50 female students (s50-smoke.dat). The smoking variable has three levels: 1 (does not smoke), 2 (smokes occasionally) and 3 (smokes regularly).
The core analysis in this lab consists of translating a set of intuitive, plain-language hypotheses about our data set into hypotheses stated in network terminology. We will then operationalize these hypotheses as parameters in our model, estimate the model, and interpret the results to test the hypotheses.
First, we will create testable hypotheses based on intuitions about habit and friendship formation.
Formulate hypotheses in the network terminology discussed throughout the course, based on the following friendship relations. If you get stuck, it may help to look at the terms included in the model below for hints on how to translate plain-language intuitions into network terminology. The first one is done for you:
Hypothesis 1: Ties between students are not random.
Relational Hypotheses:
Hypothesis 2: Friendship ties will tend to be reciprocated (reciprocity).
Hypothesis 3: Friendship ties will tend to be transitive: friends of friends will tend to become friends.
Hypothesis 4: Smokers will tend to have lower degree centrality in the friendship network than non-smokers.
Hypothesis 5: Smokers will tend to have low in-degree centrality, that is, they will receive few friendship nominations.
Hypothesis 6: Smoking homophily will shape the friendship network: non-smokers who befriend smokers may become attracted to smoking later in life, so friends will tend to resemble each other in smoking behavior.
Smoking Behavior Hypotheses:
Hypothesis 7: Smokers who have not gone through any therapy or medication will tend to increase their smoking; that is, smoking will increase over time.
Hypothesis 8: Through homophily in the friendship network, actors will tend to pick up the smoking habits of their friends; that is, non-smokers will start smoking like their smoker friends.
To build intuition about the network, let’s visualize the friendship network and the smoking behavior of the nodes at the three time periods under discussion. Node size increases with smoking behavior: green nodes represent no smoking, yellow nodes represent light smoking, and red nodes represent heavy smoking. Think about the macro-level features of each network.
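The plotting code is not shown in the text. As a minimal sketch, assuming the igraph package and the copy of the s50 excerpt that ships with RSiena (objects `s501`, `s502`, `s503` for the networks and `s50s` for smoking), the three panels could be drawn as follows; the exact colors and size scaling are illustrative choices, not necessarily the lab's.

```r
library(RSiena)  # provides the s50 excerpt: s501, s502, s503 (networks), s50s (smoking)
library(igraph)

waves <- list(s501, s502, s503)
smoke_cols <- c("green", "yellow", "red")  # 1 = none, 2 = occasional, 3 = regular

old_par <- par(mfrow = c(1, 3), mar = c(1, 1, 2, 1))
for (w in seq_along(waves)) {
  g <- graph_from_adjacency_matrix(waves[[w]], mode = "directed")
  smoke_w <- s50s[, w]                      # smoking level of each girl at wave w
  plot(g,
       vertex.color = smoke_cols[smoke_w],  # color by smoking level
       vertex.size = 4 + 3 * smoke_w,       # node size grows with smoking level
       vertex.label.cex = 0.6,
       edge.arrow.size = 0.2,
       main = paste("Time", w))
}
par(old_par)
```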
Which two nodes represent isolates in the network at all three time periods? Nodes 13 and 20 are isolates in all three networks.

Describe the change in the network over time. Think about the formation of clusters and the incidence of smoking behavior. The Time 1 graph shows six clusters, with actors V 19, V 11 and V 41 acting as bridges in the network. At Time 1, actors V 41, V 44, V 15, V 16 and V 1 are light smokers; actors V 11, V 2, V 26, V 42, V 12, V 23 and V 8 are heavy smokers; and the rest of the actors are non-smokers. The behavior of actors V 26 and V 29 from Time 1 to Time 3 is also worth noting. V 19 is one of the bridge actors at Time 1 and becomes a heavy smoker at Times 2 and 3, which suggests that homophily and smoking intensity have increased over time in this actor’s network. Actor V 8 is a heavy smoker in all three networks. It is an isolate at Time 1 and later becomes friends with a light smoker, V 6, apparently affecting V 6 to the extent that V 6 is a heavy smoker by Time 3. Interestingly, actor V 24, a non-smoker, is friends with a heavy smoker, V 23. Across all three networks, V 24 remains unaffected by interactions with V 23 (a heavy smoker), V 27 (a light smoker at Time 2), or V 6 (a heavy smoker at Time 3).
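The isolates can also be checked directly from the adjacency matrices. A minimal sketch, assuming RSiena's bundled copies of the s50 networks (`s501`, `s502`, `s503`): a node counts as an isolate in a wave when its out-degree plus in-degree (row sum plus column sum) is zero, and we keep the nodes that are isolated in every wave. The helper name `isolates()` is our own.

```r
library(RSiena)  # bundled s50 networks: s501, s502, s503

# Indices of nodes with no outgoing and no incoming ties in one wave
isolates <- function(adj) which(rowSums(adj) + colSums(adj) == 0)

# Nodes that are isolates in all three waves
always_isolated <- Reduce(intersect, lapply(list(s501, s502, s503), isolates))
always_isolated
```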
Using the three visualizations, evaluate hypothesis five. Actor V 23 can be used as an example to examine hypothesis 5, that smokers will have low in-degree centrality in their friendship network. Actor V 23 remains friends with heavy smokers at Times 2 and 3.

# Creating the SIENA Model
To build a SIENA model, we need to create dependent variables, explanatory variables, a combination of both types of variables, and our model specification.
First, we create a SIENA data object that includes the longitudinal friendship network and the smoking behavior variable. Printing the resulting data object, smokeBehXfriendship, gives:
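The code that builds this object is not shown in the text. A minimal sketch, assuming the copy of the same s50 excerpt bundled with RSiena (`s501`, `s502`, `s503` for the networks and `s50s` for smoking) in place of the .dat files; the object names are chosen to match the printed summary.

```r
library(RSiena)

# Reading the lab's files instead would look like:
#   as.matrix(read.table("s50-network1.dat"))
# Here we use RSiena's bundled copies of the same data.
friendship <- sienaDependent(array(c(s501, s502, s503), dim = c(50, 50, 3)))
smokingbeh <- sienaDependent(s50s, type = "behavior")

# Combine the network and the behavior into one SIENA data object
smokeBehXfriendship <- sienaDataCreate(friendship, smokingbeh)
smokeBehXfriendship
```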
## Dependent variables: friendship, smokingbeh
## Number of observations: 3
##
## Nodeset Actors
## Number of nodes 50
##
## Dependent variable friendship
## Type oneMode
## Observations 3
## Nodeset Actors
## Densities 0.046 0.047 0.05
##
## Dependent variable smokingbeh
## Type behavior
## Observations 3
## Nodeset Actors
## Range 1 - 3
Using our hypotheses above, we will construct a list of parameters to test using our Siena model. A table of those parameters follows:
## name effectName include fix test
## 1 friendship constant friendship rate (period 1) TRUE FALSE FALSE
## 2 friendship constant friendship rate (period 2) TRUE FALSE FALSE
## 3 friendship outdegree (density) TRUE FALSE FALSE
## 4 friendship reciprocity TRUE FALSE FALSE
## 5 friendship smokingbeh alter TRUE FALSE FALSE
## 6 friendship smokingbeh ego TRUE FALSE FALSE
## 7 friendship same smokingbeh TRUE FALSE FALSE
## 8 smokingbeh rate smokingbeh (period 1) TRUE FALSE FALSE
## 9 smokingbeh rate smokingbeh (period 2) TRUE FALSE FALSE
## 10 smokingbeh smokingbeh linear shape TRUE FALSE FALSE
## 11 smokingbeh smokingbeh quadratic shape TRUE FALSE FALSE
## 12 smokingbeh smokingbeh total similarity TRUE FALSE FALSE
## initialValue parm
## 1 4.69604 0
## 2 4.32885 0
## 3 -1.46770 0
## 4 0.00000 0
## 5 0.00000 0
## 6 0.00000 0
## 7 0.00000 0
## 8 0.81720 0
## 9 0.43579 0
## 10 -0.22314 0
## 11 0.00000 0
## 12 0.00000 0
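An effects object matching the table above can be sketched as follows, again assuming the bundled s50 data. The short names `egoX`, `altX`, `sameX`, `quad` and `totSim` are RSiena's names for the smokingbeh ego, smokingbeh alter, same smokingbeh, quadratic shape and total similarity effects; `getEffects()` already includes the rate, outdegree, reciprocity and linear shape effects.

```r
library(RSiena)

friendship <- sienaDependent(array(c(s501, s502, s503), dim = c(50, 50, 3)))
smokingbeh <- sienaDependent(s50s, type = "behavior")
smokeBehXfriendship <- sienaDataCreate(friendship, smokingbeh)

# Default effects: rate parameters, outdegree (density), reciprocity, linear shape
myeff <- getEffects(smokeBehXfriendship)

# Selection effects of smoking on friendship: ego, alter and same (homophily)
myeff <- includeEffects(myeff, egoX, altX, sameX, interaction1 = "smokingbeh")

# Behavior effects: quadratic shape and influence through total similarity to friends
myeff <- includeEffects(myeff, quad, name = "smokingbeh")
myeff <- includeEffects(myeff, totSim, name = "smokingbeh", interaction1 = "friendship")

myeff  # prints the included effects with their initial values
```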
Next, we will create our model. You can learn about the function that creates Siena models by typing ?sienaModelCreate into your R console.
## Estimates, standard errors and convergence t-ratios
##
## Estimate Standard Convergence
## Error t-ratio
## Network Dynamics
## 1. rate constant friendship rate (period 1) 5.1870 ( NA ) -0.1013
## 2. rate constant friendship rate (period 2) 4.0660 ( NA ) -0.4282
## 3. eval outdegree (density) -2.9894 ( NA ) 0.2552
## 4. eval reciprocity 2.7120 ( NA ) -0.4172
## 5. eval smokingbeh alter -0.3817 ( NA ) -3.7410
## 6. eval smokingbeh ego 0.4047 ( NA ) -0.9260
## 7. eval same smokingbeh 0.8677 ( NA ) 1.0466
##
## Behavior Dynamics
## 8. rate rate smokingbeh (period 1) 129.4845 ( NA )
## 9. rate rate smokingbeh (period 2) 137.7225 ( NA )
## 10. eval smokingbeh linear shape -32.5199 ( NA ) -5.8516
## 11. eval smokingbeh quadratic shape 35.1530 ( NA ) -2.7810
## 12. eval smokingbeh total similarity 0.2780 ( NA ) 1.3680
##
## Overall maximum convergence ratio: NA
##
##
## Warning: *** Warning: Noninvertible estimated covariance matrix ***
##
## Total of 3060 iteration steps.
We need to check the convergence t-ratios in the final column to evaluate the reliability of our simulation. The absolute value of each individual t-ratio should be less than 0.1, and the overall maximum convergence ratio should be less than 0.25.
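The two rules of thumb can be written as a small check. `check_convergence()` is a hypothetical helper, not an RSiena function; for a fitted object you would pass it the convergence t-ratios and the overall maximum ratio, which RSiena stores as `ans1$tconv` and `ans1$tconv.max`. The example uses three t-ratios from the first run printed above.

```r
# Hypothetical helper: apply the convergence rules of thumb described above.
# tconv: convergence t-ratios; tconv_max: overall maximum convergence ratio.
check_convergence <- function(tconv, tconv_max) {
  not_converged <- which(!is.na(tconv) & abs(tconv) >= 0.1)
  ok <- !is.na(tconv_max) && tconv_max < 0.25 && length(not_converged) == 0
  list(ok = ok, not_converged = not_converged)
}

# Three t-ratios from the first (unconverged) run printed above
first_run <- c("outdegree (density)" = 0.2552,
               "reciprocity" = -0.4172,
               "smokingbeh alter" = -3.7410)
check_convergence(first_run, NA)
```

All three terms are flagged, and an overall maximum of NA also counts as not converged.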
Has your model converged sufficiently? If not, note which terms have not converged, rerun your model using the previous Siena model values as your starting point, and re-evaluate. Repeat this process until the overall maximum convergence ratio and the convergence t-ratio for each term are within acceptable levels. If your model has not converged, uncomment prevAns = ans1 in the code block titled Model Creation, rerun that block of code, and print the results. This uses the estimates from the previous run as the starting point and proceeds through the estimation process again. See pp. 58-59 of the RSiena manual for more information.
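A sketch of the estimate-and-rerun loop, assuming the bundled s50 data and the effects described above. `sienaAlgorithmCreate()` (the current name for what older material calls `sienaModelCreate()`), `siena07()` and its `prevAns` argument are RSiena functions; the `projname`, the `seed`, the cap of five reruns and the `converged()` helper are our own illustrative choices.

```r
library(RSiena)

# Data and effects as in the model specification (bundled s50 data)
friendship <- sienaDependent(array(c(s501, s502, s503), dim = c(50, 50, 3)))
smokingbeh <- sienaDependent(s50s, type = "behavior")
smokeBehXfriendship <- sienaDataCreate(friendship, smokingbeh)
myeff <- getEffects(smokeBehXfriendship)
myeff <- includeEffects(myeff, egoX, altX, sameX, interaction1 = "smokingbeh")
myeff <- includeEffects(myeff, quad, name = "smokingbeh")
myeff <- includeEffects(myeff, totSim, name = "smokingbeh", interaction1 = "friendship")

myalgorithm <- sienaAlgorithmCreate(projname = "s50_smoking", seed = 2011)
ans1 <- siena07(myalgorithm, data = smokeBehXfriendship, effects = myeff, batch = TRUE)

# Convergence rules of thumb: all |t-ratio| < 0.1 and overall maximum < 0.25
converged <- function(fit) {
  !is.na(fit$tconv.max) && fit$tconv.max < 0.25 &&
    all(abs(fit$tconv) < 0.1, na.rm = TRUE)
}

# Rerun from the previous estimates (prevAns) until converged, at most 5 times
reruns <- 0
while (!converged(ans1) && reruns < 5) {
  ans1 <- siena07(myalgorithm, data = smokeBehXfriendship, effects = myeff,
                  prevAns = ans1, batch = TRUE)
  reruns <- reruns + 1
}
ans1
```

The cap keeps the loop from running indefinitely if the model never meets the criteria; in that case the specification itself may need rethinking.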
After rerunning with the previous estimates as starting values, the model converged sufficiently according to the requirements above; the first run printed above had not. The individual convergence t-ratios of the final run are -0.0008, 0.0240, 0.0437, 0.0509, -0.0417, -0.0589, 0.0428, -0.0121, -0.0216, -0.0060, -0.0480 and 0.0756, all of which are less than 0.1 in absolute value. Further, the overall maximum convergence ratio is 0.1630, which is less than 0.25, so the model has converged sufficiently.
# Understanding the Estimate Column
The Estimate column, stored as the theta vector in the fitted RSiena object, gives the estimated model parameters. For the rate parameters, the estimate is the expected number of opportunities for change per actor between consecutive observations. For the evaluation (eval) parameters, a positive estimate means that actors tend to make changes, in their ties or in their behavior, that increase the corresponding statistic, while a negative estimate means they tend to avoid such changes. The Standard Error column measures the uncertainty of each estimate.
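The following table presents each estimate (theta) from the preceding table divided by its standard error. Recall that in a t-test, a parameter is significant at the .05 level when the absolute value of the t-score is greater than about 2. Because the estimated covariance matrix of the run shown above was not invertible, its standard errors, and therefore all of the t-scores below, are NA.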
| Parameter | t-score (Estimate / Standard Error) |
|---|---|
| constant friendship rate (period 1) | NA |
| constant friendship rate (period 2) | NA |
| outdegree (density) | NA |
| reciprocity | NA |
| smokingbeh alter | NA |
| smokingbeh ego | NA |
| same smokingbeh | NA |
| rate smokingbeh (period 1) | NA |
| rate smokingbeh (period 2) | NA |
| smokingbeh linear shape | NA |
| smokingbeh quadratic shape | NA |
| smokingbeh total similarity | NA |
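Each t-score is the estimate divided by its standard error. A minimal sketch with a hypothetical helper, `t_table()`; with a fitted RSiena object one could pass `ans1$theta` and `sqrt(diag(ans1$covtheta))`. The example also shows why every entry above is NA: a missing standard error makes the ratio missing.

```r
# Hypothetical helper: t-score = estimate / standard error,
# flagged as significant at the .05 level when |t| > 2.
t_table <- function(estimate, se) {
  tval <- estimate / se
  data.frame(estimate = estimate, se = se, t = round(tval, 2),
             significant = !is.na(tval) & abs(tval) > 2)
}

# Reciprocity from the run printed above: its standard error is NA,
# so the t-score is NA and the parameter cannot be flagged as significant.
t_table(c(reciprocity = 2.7120), NA)
```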
For each of your hypotheses, indicate which parameter operationalizes it. Using the tables created above, evaluate that hypothesis and report whether your results were significant. If you’re having difficulty matching up parameters with your hypotheses, take a look at pp. 41-49, § 6.2, in the RSiena Manual.
Hypothesis 1: As stated above, ties between students will not be random, suggesting that friendship networks are less likely to change over time. The constant friendship rate parameters for Period 1 and Period 2 are both positive with t-scores greater than 2. Because these values are significant, the parameters support hypothesis 1. For example, dividing the constant friendship rate for Period 1 (5.7749) by its standard error (0.8901) gives 5.7749 / 0.8901 ≈ 6.49 > 2.
Hypothesis 2: As stated above, friendship ties will tend to be reciprocal. The reciprocity estimate is positive (2.7692) with a standard error of 0.2093, giving a t-score of 2.7692 / 0.2093 ≈ 13.23 > 2. The parameter is significant and supports hypothesis 2.
Hypothesis 3: N/A (no transitivity parameter was included in the model).
Hypothesis 4: As stated above, smokers will tend to have lower degree centrality in their network than non-smokers. This is operationalized by the smokingbeh ego parameter, whose t-score is 0.1161 / 0.1846 ≈ 0.63 < 2. The parameter is not significant, so hypothesis 4 is not supported. This is consistent with the networks at Times 1, 2 and 3, in which heavy smokers are friends with a number of other actors and are not isolates.
Hypothesis 5: N/A
Hypothesis 6: As stated above, homophily tends to give rise to more friendship ties in a network. This is operationalized by the smokingbeh total similarity parameter, whose t-score is 1.1347 / 0.6272 ≈ 1.81 < 2. The parameter is not significant, so hypothesis 6 is not supported.
Hypothesis 7: As stated above, smoking will increase over time. The smoking behavior rate t-score is greater than 2 for Period 1 but less than 2 for Period 2, which on its own does not support the hypothesis. However, the smokingbeh quadratic shape estimate is positive (1.9128), and its t-score, 1.9128 / 0.4307 ≈ 4.44, is greater than 2. This parameter is significant and supports hypothesis 7.
Hypothesis 8: As stated above, non-smokers will tend to become light smokers, and light smokers will tend to become heavy smokers. This is operationalized by the smokingbeh alter parameter, whose t-score is 0.1118 / 0.1627 ≈ 0.69 < 2. The parameter is not significant, suggesting that this pattern does not dominate among the actors in the network, so hypothesis 8 is not supported.
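The arithmetic behind these judgments can be collected in one place. This sketch only reuses the estimates and standard errors quoted in the answers above (from the converged run, which is not printed here) and applies the same |t| > 2 rule.

```r
# Estimates and standard errors as quoted in the hypothesis answers above
reported <- data.frame(
  hypothesis = c("H1", "H2", "H4", "H6", "H7", "H8"),
  parameter  = c("constant friendship rate (period 1)", "reciprocity",
                 "smokingbeh ego", "smokingbeh total similarity",
                 "smokingbeh quadratic shape", "smokingbeh alter"),
  estimate   = c(5.7749, 2.7692, 0.1161, 1.1347, 1.9128, 0.1118),
  se         = c(0.8901, 0.2093, 0.1846, 0.6272, 0.4307, 0.1627)
)

# t-score and significance at the .05 level (|t| > 2)
reported$t <- round(reported$estimate / reported$se, 2)
reported$significant <- abs(reported$t) > 2
reported
```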
After knitting your file to RPubs, copy the URL and paste it into the comment field of the Lab 2 Assignment on Canvas. Save this .Rmd file and submit it in the file portion of your Canvas assignment. Make sure to review your file and its formatting. Run spell check (built into RStudio) and proofread your answers before submitting. If you can’t publish to RPubs, save your HTML file as a PDF and submit that instead. There are many different ways to do this with different browsers; Google it.