Lab 4: RSiena

Exploring Habit & Friendship Formation Over Time

Amber Johnson

2017-12-23

Introduction

In this lab, you'll be building, estimating, and interpreting actor-based longitudinal network models using RSiena. Tom Snijders has led the development of the Siena framework, with RSiena supplanting a specialized Windows program in 2011. You can learn more about RSiena and research using the framework on the Siena website. RSiena fits actor-based models to longitudinal network data in order to examine the effects of network ties on a behavior over time, or the effect of a behavior on tie formation over time. SIENA stands for Simulation Investigation for Empirical Network Analysis.

You will be using an excerpt of data from the Teenage Friends and Lifestyle Study. You can learn more about this data set, created by Prof. Snijders, on the Siena website. The data set includes 3 network files containing friendship relationships between 50 teenage girls recorded at 3 consecutive points in time:

  1. s50-network1.dat

  2. s50-network2.dat

  3. s50-network3.dat

The data also includes information about the smoking behavior of the 50 female students (s50-smoke.dat). The smoking variable has three levels: 1 (does not smoke), 2 (smokes occasionally) and 3 (smokes regularly).
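The .dat files are plain text. Assuming each network file is a 50 × 50 matrix of whitespace-separated 0/1 entries (row i, column j is 1 when student i nominates student j), and that s50-smoke.dat has one row per student and one column per wave, they could be read as follows. This is a Python sketch for illustration only; the lab itself reads the data in R, and the helper names are hypothetical.

```python
"""Sketch: parse Siena-style .dat files (whitespace-separated integer matrices)."""


def parse_matrix(text):
    """Return the rows of a whitespace-separated integer matrix as lists of ints."""
    rows = [[int(tok) for tok in line.split()]
            for line in text.splitlines() if line.strip()]
    # Every row should have the same number of columns (assumed file layout).
    if any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("ragged matrix")
    return rows


def load_matrix(path):
    """Read a file such as s50-network1.dat or s50-smoke.dat from disk."""
    with open(path) as handle:
        return parse_matrix(handle.read())


# Toy 3-actor network: actor 0 nominates actor 1, and actor 1 nominates actor 0 back.
toy = parse_matrix("0 1 0\n1 0 0\n0 0 0\n")
print(toy)  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```

With the real files, `load_matrix("s50-network1.dat")` would return 50 rows of 50 entries under the stated layout assumption.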

The core analysis conducted in this lab consists of taking a set of intuitive, plain-language hypotheses about our data set and converting them into hypotheses in network terminology. Once we have our hypotheses, we will operationalize them as parameters in our model. Finally, we will estimate the model and interpret the results to test our hypotheses.

First, we will create testable hypotheses based on intuitions about habit and friendship formation.

Hypothesis Construction (20 pts)

Analysis

Using the network terminology discussed throughout the course, formulate hypotheses based on the following intuitions about friendship relations. If you get stuck, it may be helpful to look at the terms included in the model below for hints on how to translate plain-language intuitions into network terminology. The first one is done for you:

  1. Establishing and maintaining friendships takes time and resources. Students will not befriend people indiscriminately.

Hypothesis 1: Ties between students are not random.

Relational Hypotheses:

  1. If a student nominates a person as a friend, that person is also likely to consider the student a friend.

Hypothesis 2: Transitive ties will not be random. / There will be a non-random level of transitivity between nodes.

  1. Students will be friends with the friends of their friends (if A -> B and B -> C, then A -> C).

Hypothesis 3: An actor's smoking behavior (frequency) will negatively affect the number and/or quality (if ties are weighted or ranked) of their friendship relationships in the network.

  1. Leaving school to find a place where smoking is permitted takes time that could otherwise be used for socializing. Students who smoke more will have less time to establish and maintain friendships.

Hypothesis 4: Higher smoking frequency will be associated with more negative friendship outcomes (e.g., fewer reciprocal ties, lower eigenvector centrality). These data were collected in the mid-1990s, when anti-smoking norms were beginning to take hold among younger generations; this effect, if found, could be a cohort effect related to increasing stigma against smoking in that period.

  1. Smoking is increasingly frowned upon in the U.S. Students who smoke more will likely not be very popular and few people will nominate them as friends.

Hypothesis 5: We expect to see a high level of homophily in smoking behavior, with a higher likelihood of ties between people with similar smoking behavior.

  1. People with similar smoking behavior will be more likely to become friends.

Hypothesis 6: We expect the frequency of smoking to increase across the three measured time periods. / We hypothesize that there will be more smokers in the network as the collection periods advance from age 13 to age 15.

Smoking Behavior Hypotheses:

  1. Students will likely smoke more as they get older.

Hypothesis 7: Over the three time periods, we expect a reduction in the variance of smoking behavior in the network, particularly within dyads (individual ties) and small cliques of fewer than four members.

  1. Friendship relations will make students more similar in their smoking behavior.

Hypothesis 8: Sports behavior will be negatively associated with regular smoking behavior, particularly if the actor is consistently a regular athlete (across two or more waves).

Loading Data & Visualizing the Friendship Network Over Time (10 pts)

To build an intuitive understanding of our network, let's examine three network visualizations showing the friendship network over time and the smoking behavior of the nodes.

Let’s visualize our network at the three time periods under discussion. Node size increases with smoking behavior. Green nodes represent no smoking, yellow nodes represent light smoking, and red nodes represent heavy smoking. Think about the macro-level features of each network.
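The color and size encoding described above amounts to a lookup from smoking level to drawing attributes. In this Python sketch the color names follow the text, but the pixel sizes are arbitrary assumptions and the function name is hypothetical.

```python
"""Sketch: map smoking levels (1-3) to node colors and sizes for one wave."""

# Colors follow the text; sizes are assumed values that grow with smoking level.
COLORS = {1: "green", 2: "yellow", 3: "red"}
SIZES = {1: 100, 2: 200, 3: 300}


def node_styles(smoking):
    """Return (colors, sizes) lists, one entry per actor."""
    colors = [COLORS[level] for level in smoking]
    sizes = [SIZES[level] for level in smoking]
    return colors, sizes


colors, sizes = node_styles([1, 3, 2, 1])
print(colors)  # ['green', 'red', 'yellow', 'green']
print(sizes)   # [100, 300, 200, 100]
```

Lists like these can be passed to a plotting library, for example as the `node_color` and `node_size` arguments of `networkx.draw`.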

Network at Time 1

Network at Time 2

Network at Time 3

Analysis

Which two nodes represent isolates in the network at all three time periods?

Nodes 13 and 18.
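Isolates can also be found programmatically. An isolate has no incoming and no outgoing ties, so an actor who is isolated in every wave lies in the intersection of the per-wave isolate sets. The sketch below uses toy adjacency matrices and hypothetical helper names. Indices are 0-based, so 1 is added to match node labels such as 13 and 18.

```python
"""Sketch: actors with no ties (in or out) in every observed wave."""


def isolates(adj):
    """Indices of actors with no incoming and no outgoing ties in one wave."""
    n = len(adj)
    return {
        i for i in range(n)
        if not any(adj[i][j] or adj[j][i] for j in range(n) if j != i)
    }


def isolates_in_all_waves(waves):
    """Actors that are isolates in every adjacency matrix in `waves`."""
    result = isolates(waves[0])
    for adj in waves[1:]:
        result &= isolates(adj)
    return result


# Toy data: actor 2 never sends or receives a tie.
wave1 = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
wave2 = [[0, 0, 0], [1, 0, 0], [0, 0, 0]]
always_isolated = isolates_in_all_waves([wave1, wave2])
print(sorted(i + 1 for i in always_isolated))  # [3] (1-based labels)
```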

Describe the change in the network over time. Think about the formation of clusters and the incidence of smoking behavior.

The network becomes more clustered over time, and students within the same cluster become more similar in their smoking behavior. There are two components with a high incidence of smoking and two with a low incidence (and one that lies only on the periphery of its component, without high betweenness or eigenvector centrality).
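Component structure can be checked numerically as well as visually. This sketch, using toy data and hypothetical helper names, finds weak components (ignoring tie direction) with a depth-first search and reports the mean smoking level of each component that has more than one member.

```python
"""Sketch: weak components and their mean smoking level (toy data)."""


def weak_components(adj):
    """Groups of actors connected when tie direction is ignored (isolates included)."""
    n = len(adj)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            i = stack.pop()
            comp.add(i)
            for j in range(n):
                if j not in seen and (adj[i][j] or adj[j][i]):
                    seen.add(j)
                    stack.append(j)
        components.append(comp)
    return components


def component_smoking(components, smoking):
    """Mean smoking level (1-3) of each component with more than one member."""
    return [sum(smoking[i] for i in comp) / len(comp)
            for comp in components if len(comp) > 1]


# Toy wave: 0 -> 1, 3 -> 2, actor 4 isolated.
adj = [[0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0]]
smoking = [1, 1, 3, 2, 1]
comps = weak_components(adj)
print(comps)                              # [{0, 1}, {2, 3}, {4}]
print(component_smoking(comps, smoking))  # [1.0, 2.5]
```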

Using the three visualizations, evaluate hypothesis five.

The absolute difference in smoking behavior between two students (ranging from 0 for identical behavior to |1 - 3| = 2 for a non-smoker paired with a regular smoker) should predict reciprocal friendship ties, with a difference of 0 associated with the highest likelihood of a mutual friendship tie.
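A descriptive version of this check can be computed for a single wave: group every nomination by the absolute difference in smoking level between sender and receiver (0, 1 or 2), then compute the share that are reciprocated. This is a sketch on toy data with a hypothetical function name. It describes one wave and is not the model-based test reported later.

```python
"""Sketch: share of reciprocated nominations by |smoking_i - smoking_j|."""
from collections import defaultdict


def reciprocity_by_difference(adj, smoking):
    """Map each smoking difference to the fraction of its ties that are mutual."""
    sent = defaultdict(int)
    mutual = defaultdict(int)
    n = len(adj)
    for i in range(n):
        for j in range(n):
            if i != j and adj[i][j]:
                diff = abs(smoking[i] - smoking[j])
                sent[diff] += 1
                if adj[j][i]:
                    mutual[diff] += 1
    return {d: mutual[d] / sent[d] for d in sorted(sent)}


# Toy wave: actors 0 and 1 (both non-smokers) nominate each other;
# actor 2 (regular smoker) nominates actor 0, who does not nominate back.
adj = [[0, 1, 0], [1, 0, 0], [1, 0, 0]]
smoking = [1, 1, 3]
print(reciprocity_by_difference(adj, smoking))  # {0: 1.0, 2: 0.0}
```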

Creating the SIENA Model

To build a SIENA model, we need to create the dependent variables, any explanatory variables, a data object combining both types of variables, and our model specification (the effects to include).

First, we create a SIENA data object, smokeBehXfriendship, that combines the longitudinal friendship network and the smoking behavior variable. Printing it gives:

## Dependent variables:  friendship, smokingbeh 
## Number of observations: 3 
## 
## Nodeset                  Actors 
## Number of nodes              50 
## 
## Dependent variable friendship      
## Type               oneMode         
## Observations       3               
## Nodeset            Actors          
## Densities          0.046 0.047 0.05
## 
## Dependent variable smokingbeh
## Type               behavior  
## Observations       3         
## Nodeset            Actors    
## Range              1 - 3
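The Densities row reports, for each wave, the number of observed ties divided by the number of possible directed ties, n(n - 1), which is 50 × 49 = 2450 here. The sketch below computes this for a toy matrix (the function name is hypothetical) and converts the reported first-wave density back into an approximate tie count. The count is approximate because the printed density is rounded.

```python
"""Sketch: density of a directed network with no self-ties."""


def density(adj):
    """Observed ties divided by possible directed ties, n * (n - 1)."""
    n = len(adj)
    ties = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))


print(density([[0, 1, 0], [1, 0, 0], [0, 0, 0]]))  # 2 ties / 6 possible

# Reported wave-1 density 0.046 for 50 actors implies roughly this many ties:
print(round(0.046 * 50 * 49))  # 113
```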

Using our hypotheses above, we will construct a list of parameters to test using our Siena model. A table of those parameters follows:

##    name       effectName                          include fix   test 
## 1  friendship constant friendship rate (period 1) TRUE    FALSE FALSE
## 2  friendship constant friendship rate (period 2) TRUE    FALSE FALSE
## 3  friendship outdegree (density)                 TRUE    FALSE FALSE
## 4  friendship reciprocity                         TRUE    FALSE FALSE
## 5  friendship smokingbeh alter                    TRUE    FALSE FALSE
## 6  friendship smokingbeh ego                      TRUE    FALSE FALSE
## 7  friendship same smokingbeh                     TRUE    FALSE FALSE
## 8  smokingbeh rate smokingbeh (period 1)          TRUE    FALSE FALSE
## 9  smokingbeh rate smokingbeh (period 2)          TRUE    FALSE FALSE
## 10 smokingbeh smokingbeh linear shape             TRUE    FALSE FALSE
## 11 smokingbeh smokingbeh quadratic shape          TRUE    FALSE FALSE
## 12 smokingbeh smokingbeh total similarity         TRUE    FALSE FALSE
##    initialValue parm
## 1     4.69604   0   
## 2     4.32885   0   
## 3    -1.46770   0   
## 4     0.00000   0   
## 5     0.00000   0   
## 6     0.00000   0   
## 7     0.00000   0   
## 8     0.81720   0   
## 9     0.43579   0   
## 10   -0.22314   0   
## 11    0.00000   0   
## 12    0.00000   0
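The three smoking effects in the network part of the table measure different things. The sketch below computes one actor's statistics under our reading of their definitions in the RSiena manual's list of effects. The ego statistic is v_i multiplied by the actor's outdegree. The alter statistic is the sum of v_j over the actor's friends. The same statistic counts friends with exactly the same smoking level. It assumes v is mean-centered smoking (RSiena centers variables by default; here we center within one wave for simplicity). The function name is hypothetical.

```python
"""Sketch: per-actor statistics behind the ego, alter and same effects."""


def effect_statistics(adj, smoking, i):
    """Return ego, alter and same statistics for actor i (assumed definitions)."""
    n = len(adj)
    mean = sum(smoking) / n
    v = [s - mean for s in smoking]  # mean-centered smoking (assumption)
    out_ties = [j for j in range(n) if j != i and adj[i][j]]
    return {
        # ego: do heavier smokers send more (positive) or fewer (negative) ties?
        "ego": v[i] * len(out_ties),
        # alter: do actors prefer to nominate heavier smokers?
        "alter": sum(v[j] for j in out_ties),
        # same: number of friends with an identical smoking level
        "same": sum(1 for j in out_ties if smoking[j] == smoking[i]),
    }


# Toy wave: actor 0 (non-smoker) nominates actor 1 (non-smoker) and actor 2 (regular).
stats = effect_statistics([[0, 1, 1], [0, 0, 0], [0, 0, 0]], [1, 1, 3], 0)
print(stats)
```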

Next, we will create and estimate our model. You can learn about the function that creates Siena models by typing ?sienaModelCreate into your R console; the model is then estimated with siena07. The estimation results are:

## Estimates, standard errors and convergence t-ratios
## 
##                                                Estimate   Standard   Convergence 
##                                                             Error      t-ratio   
## Network Dynamics 
##    1. rate constant friendship rate (period 1)  5.7916  ( 0.8731   )   -0.0161   
##    2. rate constant friendship rate (period 2)  4.4948  ( 0.7935   )   -0.0737   
##    3. eval outdegree (density)                 -2.7727  ( 0.2744   )   -0.0220   
##    4. eval reciprocity                          2.7752  ( 0.2001   )   -0.0103   
##    5. eval smokingbeh alter                     0.1162  ( 0.1583   )    0.0212   
##    6. eval smokingbeh ego                       0.1199  ( 0.1698   )    0.0074   
##    7. eval same smokingbeh                      0.6936  ( 0.3824   )   -0.0127   
## 
## Behavior Dynamics
##    8. rate rate smokingbeh (period 1)           3.0748  ( 1.2365   )   -0.0095   
##    9. rate rate smokingbeh (period 2)           3.0055  ( 2.0184   )   -0.0730   
##   10. eval smokingbeh linear shape             -1.3818  ( 0.4388   )   -0.0789   
##   11. eval smokingbeh quadratic shape           1.9369  ( 0.3745   )   -0.0771   
##   12. eval smokingbeh total similarity          1.1051  ( 0.6261   )    0.0561   
## 
## Overall maximum convergence ratio:    0.1329 
## 
## 
## Total of 3060 iteration steps.

Checking Convergence (5 pts)

We need to check the convergence t-ratios in the final column to evaluate the reliability of our simulation-based estimates. The absolute value of each individual t-ratio should be less than 0.1, and the overall maximum convergence ratio should be less than 0.25.
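The two rules of thumb can be applied mechanically to the reported values. In this sketch the t-ratios and the overall maximum convergence ratio are copied from the estimation output above, and the function name is hypothetical. The overall ratio is taken as reported by RSiena, because it is computed from the simulated statistics and cannot be derived from the individual t-ratios.

```python
"""Sketch: check the convergence rules of thumb against the reported output."""

# Convergence t-ratios for effects 1-12 and the overall maximum, from the output above.
T_RATIOS = [-0.0161, -0.0737, -0.0220, -0.0103, 0.0212, 0.0074, -0.0127,
            -0.0095, -0.0730, -0.0789, -0.0771, 0.0561]
OVERALL_MAX = 0.1329


def has_converged(t_ratios, overall_max, t_limit=0.1, overall_limit=0.25):
    """Return (converged?, largest absolute t-ratio) using the stated thresholds."""
    worst = max(abs(t) for t in t_ratios)
    return worst < t_limit and overall_max < overall_limit, worst


ok, worst = has_converged(T_RATIOS, OVERALL_MAX)
print(ok, worst)  # True 0.0789
```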

Analysis

Has your model converged sufficiently? If not, note which terms have not converged, rerun your model using the previous Siena model values as your starting point, and re-evaluate. Repeat this process until the overall maximum convergence ratio and the convergence t-ratio for each term are within acceptable levels. To do this, uncomment prevAns = ans1 in the code block titled Model Creation, then rerun that block and print the results. The estimation will then start from the values produced by the previous run and proceed through the model construction process again. See pp. 58-59 of the RSiena manual for more information.

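The model has converged sufficiently. Every convergence t-ratio is below 0.1 in absolute value (the largest is -0.0789, for the linear shape effect), and the overall maximum convergence ratio is 0.1329, below the 0.25 threshold. Note that the overall maximum convergence ratio is computed by RSiena over linear combinations of the parameters; it is not the sum of the individual t-ratios.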

Understanding the Estimate Column

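The Estimate column (stored as theta in the fitted RSiena object) gives the weight of each effect in the actors' evaluation (objective) function. For network effects, a positive estimate means actors prefer tie changes that increase the corresponding statistic (for example, reciprocating a nomination), and a negative estimate means they avoid them. Because actors' choices follow a multinomial logit, a difference in the evaluation function acts as a log-odds. For rate parameters, the estimate is the expected number of opportunities each actor has to change a tie (or their smoking behavior) between two consecutive waves. The Standard Error column measures the uncertainty of each estimate: the smaller the standard error relative to the estimate, the more precisely the effect is estimated.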
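To make the log-odds reading concrete, the sketch below applies it to the reported estimates. It ignores the smoking ego and alter effects and all other terms, so the numbers are illustrative rather than part of the lab's output. Creating a tie to someone who already nominated the actor and creating a tie to someone who did not both incur the outdegree term; only the first also gains the reciprocity term. The function name is hypothetical.

```python
"""Sketch: turn evaluation-function differences into odds ratios."""
import math

# Evaluation-effect estimates copied from the results table above.
OUTDEGREE = -2.7727
RECIPROCITY = 2.7752
SAME_SMOKING = 0.6936


def odds_ratio(delta_objective):
    """Multiplicative change in choice odds when the evaluation function rises by delta."""
    return math.exp(delta_objective)


# Reciprocating vs. non-reciprocating new tie (both pay the outdegree term).
reciprocated = OUTDEGREE + RECIPROCITY
unreciprocated = OUTDEGREE
print(round(odds_ratio(reciprocated - unreciprocated), 2))  # 16.04

# Tie to a same-smoking-level student vs. a different level, all else equal
# (this effect is not significant, t = 1.81).
print(round(odds_ratio(SAME_SMOKING), 2))  # 2.0
```

In words: all else equal, a reciprocating tie is about 16 times as likely to be chosen as a non-reciprocating one.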

Evaluating Significance

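The following table presents each Estimate (theta) from the preceding table divided by its Standard Error. Recall that when carrying out a t-test, a parameter is significant at the .05 level when the absolute value of the t-ratio is greater than about 2 (more precisely, 1.96). The rate parameters are positive by construction, so their t-ratios are not meaningful tests; focus on the evaluation effects.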

Effect                           Estimate / SE
constant friendship rate (period 1)       6.63
constant friendship rate (period 2)       5.66
outdegree (density)                     -10.10
reciprocity                              13.87
smokingbeh alter                          0.73
smokingbeh ego                            0.71
same smokingbeh                           1.81
rate smokingbeh (period 1)                2.49
rate smokingbeh (period 2)                1.49
smokingbeh linear shape                  -3.15
smokingbeh quadratic shape                5.17
smokingbeh total similarity               1.77
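The table can be reproduced and checked from the estimation output. This Python sketch copies the estimates and standard errors from that output and treats |t| > 2 as significant at the .05 level. It sets the rate parameters aside because they are positive by construction. The function name is hypothetical.

```python
"""Sketch: t-ratios (Estimate / SE) and a rough significance verdict."""

# (estimate, standard error) pairs copied from the estimation output above.
ESTIMATES = {
    "constant friendship rate (period 1)": (5.7916, 0.8731),
    "constant friendship rate (period 2)": (4.4948, 0.7935),
    "outdegree (density)": (-2.7727, 0.2744),
    "reciprocity": (2.7752, 0.2001),
    "smokingbeh alter": (0.1162, 0.1583),
    "smokingbeh ego": (0.1199, 0.1698),
    "same smokingbeh": (0.6936, 0.3824),
    "rate smokingbeh (period 1)": (3.0748, 1.2365),
    "rate smokingbeh (period 2)": (3.0055, 2.0184),
    "smokingbeh linear shape": (-1.3818, 0.4388),
    "smokingbeh quadratic shape": (1.9369, 0.3745),
    "smokingbeh total similarity": (1.1051, 0.6261),
}


def t_ratios(estimates):
    """Estimate divided by standard error for every effect."""
    return {name: est / se for name, (est, se) in estimates.items()}


for name, t in t_ratios(ESTIMATES).items():
    if "rate" in name:
        verdict = "rate parameter (not tested)"
    else:
        verdict = "significant" if abs(t) > 2 else "not significant"
    print(f"{name:38s} {t:7.2f}  {verdict}")
```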

Reporting Your Results (65 pts)

Analysis

For each of your hypotheses, indicate which parameter operationalizes it. Using the tables created above, evaluate the hypothesis and report whether your results were significant. If you're having difficulty matching up parameters with your hypotheses, take a look at pp. 41-49, § 6.2, in the RSiena Manual.

Hypothesis 1: This hypothesis is supported. It is operationalized by the outdegree (density) parameter, whose estimate is strongly negative (-2.77, t = -10.10). Because creating a tie lowers an actor's evaluation function, students do not nominate others indiscriminately; ties form mainly when other effects, such as reciprocity, offset this cost.

Hypothesis 2: This hypothesis is supported. Reciprocal ties are not random: the reciprocity parameter is 2.78 (t = 13.87), the largest t-ratio in the model. If ties were random, we would expect an estimate near zero. Reciprocity is also far stronger than homophily in smoking: the same smokingbeh parameter is only 0.69 (t = 1.81).

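Hypothesis 3: This hypothesis is not supported. It is operationalized by the smokingbeh ego effect, which captures whether heavier smokers send more or fewer friendship nominations. The estimate is small and positive (0.12, t = 0.71), so there is no evidence that smokers form fewer ties. The smokingbeh linear shape (-1.38, t = -3.15) and quadratic shape (1.94, t = 5.17) effects are significant, but they describe the dynamics of smoking behavior itself rather than its effect on network ties; the positive quadratic term suggests that smoking levels tend to move toward the extremes of the scale.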

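Hypothesis 4: This hypothesis is not supported, and perhaps cannot be fully tested with the current data. It is operationalized by the smokingbeh alter effect, which captures whether heavier smokers receive more or fewer nominations; its estimate (0.12, t = 0.73) is not significant. Smoking alone does not seem to account for friendship formation, and making a claim about the reason for any change (such as a cohort effect) would require much more extensive study. It would be interesting to compare similar data across different generations.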

Hypothesis 5: This hypothesis is not supported at the .05 level, although the result approaches significance. It is operationalized by the same smokingbeh effect, which is positive (0.69) but has a t-ratio of only 1.81, below the threshold of 2.

Hypothesis 6: Yes, this hypothesis is supported, insofar as we have data spanning two to three years and there is more smoking in the later time period. However, the result might differ if applied to the broader public or to the current era. I can imagine that, with health concerns and changing norms, people may now smoke less as they age, which was not the case in the past, even the recent past.

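Hypothesis 7: The descriptive evidence is consistent with this hypothesis: the variance of smoking behavior decreases from 1.7 to 1.06 across the time periods. However, the parameter that operationalizes social influence, smokingbeh total similarity (1.11, t = 1.77), is positive but not significant at the .05 level, so the model does not confirm it.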
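The variance figures quoted for Hypothesis 7 are descriptive. Below is a minimal sketch of that calculation on made-up toy data, with a hypothetical function name. It uses the population variance (`statistics.pvariance`). The sample variance (`statistics.variance`) gives slightly larger values, and the text does not state which one produced 1.7 and 1.06. A falling variance alone cannot distinguish influence from selection, which is why the total similarity parameter is the model-based test.

```python
"""Sketch: variance of smoking levels in each wave (toy data)."""
from statistics import pvariance


def variance_by_wave(smoking_waves):
    """Population variance of smoking levels (1-3) for each wave."""
    return [pvariance(wave) for wave in smoking_waves]


# Toy data: four students over two waves, becoming more similar.
print(variance_by_wave([[1, 3, 1, 3], [1, 2, 2, 3]]))  # values 1 and 0.5
```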

Hypothesis 8: Sport behavior is not included in the data we used, so this hypothesis cannot be tested here. It could be checked by adding sport behavior as a variable and seeing whether it is uniquely tied to smoking level (compared with friends' smoking behavior and/or reciprocity).

Submitting the Lab (5 pts)

After knitting your file and publishing it to RPubs, copy the URL and paste it into the comment field of the Lab 4 Assignment on Canvas. Save this .Rmd file and submit it in the file portion of your Canvas assignment. Make sure to review your file and its formatting. Run spell check (built into RStudio) and proofread your answers before submitting. If you can't publish to RPubs, save your HTML file as a PDF and submit that instead (there are many ways to do this in different browsers; a quick web search will turn up instructions).