This report analyzes a 1966 dataset of physicians in four American cities, collected by Coleman, Katz and Menzel for their study of medical innovation. The authors sought to understand how network ties influenced the physicians' adoption of tetracycline, a new drug at the time. The dataset records three types of ties: (i) friendship between physicians, (ii) patterns of advice, and (iii) professional discussions. Node attributes are drawn from a 13-question survey covering basic information about the physicians' practices, hobbies and professional lives.
We seek to understand which node attributes influence the physicians' relations within the network structure. Physicians were surveyed in four US cities, and there are no ties between the communities of different cities. For this modelling, we take the largest component, which consists of the physicians residing in Peoria, Illinois. The graph below shows a simple visualization of the network of Peoria physicians, with edge colour representing the three types of ties.
The network is composed of 117 nodes, representing the physicians, and 543 ties across the three tie types. There are 13 node attributes, one for each question asked in the survey: city of residence (“city”), adoption date of the drug (“adoption date”), the year the physician started in the profession (“med_sch_yr”), large physicians' meetings attended (“meetings”), medical journals subscribed to (“jours”), who free time is spent with (“free_time”), medical matters discussed outside of work (“discuss”), doctors’ club membership (“clubs”), occupation of the three closest friends (“friends”), time spent practicing in the community (“community”), patients seen per week (“patients”), physical proximity to other physicians (“proximity”), and specialty (“specialty”). In our analysis, we focus on the five attributes we find most compelling: “meetings”, “med_sch_yr”, “clubs”, “community” and “specialty”. Intuitively, we believe these attributes are the most likely to influence the pattern of network ties, and we use a regression model to test this.
The Blau Index describes how nodes are distributed across the possible categories (answers) of each node attribute. A score of 0 indicates homogeneity (all nodes fall in the same category for that attribute), while scores approaching 1 indicate heterogeneity (nodes are spread evenly across many categories).
The results show us that “clubs” is the only attribute with low diversity, because the majority of nodes do not belong to any club. The other attributes display a rather high diversity or heterogeneity, meaning nodes are spread out across the possible categories for each attribute.
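To make the Blau Index concrete, the following is a minimal sketch in Python, assuming the standard definition of one minus the sum of squared category proportions. The function name `blau_index` and the toy attribute values are hypothetical and are not taken from the survey data.

```python
from collections import Counter


def blau_index(categories):
    """Blau index: 1 minus the sum of squared category proportions."""
    n = len(categories)
    if n == 0:
        raise ValueError("need at least one observation")
    counts = Counter(categories)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical attribute values, for illustration only (not the survey data).
all_same = ["none"] * 10                # every node in one category
mostly_same = ["none"] * 9 + ["club"]   # one dominant category
evenly_split = ["a", "b", "c", "d"] * 5  # four equally sized categories

for name, values in [("all_same", all_same),
                     ("mostly_same", mostly_same),
                     ("evenly_split", evenly_split)]:
    print(name, round(blau_index(values), 3))
```

With k equally sized categories the index equals 1 - 1/k, so an attribute with four categories cannot score above 0.75. This is worth keeping in mind when comparing diversity across attributes with different numbers of categories.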
The E-I Index helps us understand whether there is more or less homophily for a specific attribute than we would expect from a series of randomly generated networks with the same number of nodes and ties. The line in each graph below represents the observed network, while the curve represents the distribution of values across the randomly generated networks.
For “meetings”, the observed heterophily is within the expected range for a network of the same size and density. For “med_sch_yr”, we see less heterophily than expected, and the result is somewhat statistically significant because the observed value falls outside the grey area. For “clubs”, we observe homophily, as the E-I Index is below 0; the curve indicates that this is also what we would expect for a network with this attribute diversity and density, although there is even more homophily than expected. For “community”, we observe a rather significant result: much less heterophily than expected. “specialty” shows less heterophily than “community”, because its E-I Index is below 0.5 (around 0.25), and it is also lower than expected, as indicated by its distance from the curve. In summary, of the four attributes that show heterophily, “community” has the most significant result, with the largest shortfall in heterophily relative to the randomly generated networks.
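The comparison described above can be sketched as follows. This is an illustrative Python example, not the procedure used to produce the graphs: it assumes the usual E-I definition (external ties minus internal ties, divided by all ties) and a null model of random directed networks with the same number of nodes and ties. The toy network, the attribute values and the helper names `ei_index` and `random_edges` are all hypothetical.

```python
import random


def ei_index(edges, attr):
    """E-I index: (external - internal) / (external + internal) ties."""
    external = sum(1 for u, v in edges if attr[u] != attr[v])
    internal = len(edges) - external
    return (external - internal) / len(edges)


def random_edges(nodes, n_edges, rng):
    """Draw a random directed network with the same nodes and tie count."""
    possible = [(u, v) for u in nodes for v in nodes if u != v]
    return rng.sample(possible, n_edges)


# Hypothetical toy network, for illustration only (not the Peoria data).
attr = {0: "gp", 1: "gp", 2: "gp",
        3: "surgeon", 4: "surgeon", 5: "surgeon"}
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (0, 3)]

observed = ei_index(edges, attr)

# Null distribution: E-I index of many random networks of the same size.
rng = random.Random(42)
nodes = list(attr)
null = [ei_index(random_edges(nodes, len(edges), rng), attr)
        for _ in range(2000)]

# Share of random networks at least as homophilous as the observed one.
p_lower = sum(1 for x in null if x <= observed) / len(null)
print(f"observed E-I = {observed:.3f}")
print(f"mean E-I under the null = {sum(null) / len(null):.3f}")
print(f"share of random networks with E-I <= observed = {p_lower:.4f}")
```

A negative observed value signals homophily in absolute terms, while its position relative to the null distribution shows whether that homophily exceeds what the network's size, density and attribute composition would produce by chance.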
Based on these initial results, we test five hypotheses.
Hypothesis 1: Physicians are more likely to be friends with those who have been in the community for a similar length of time (homophily based on “community”).
Hypothesis 2: Doctors who attended meetings of professional societies during the last 12 months receive more discussion ties (because we believe they had more conversations about the drug).
Hypothesis 3: Doctors have more friendship ties among peers of the same specialty.
Hypothesis 4: Doctors who have been practicing for longer receive more discussion ties (because we believe they are more respected among their peers).
Hypothesis 5: Doctors who belong to a doctors' club receive more friendship ties.
Here, we are more interested in what influences receiving ties than sending them, because we are looking at the “friendship” and “discussion” ties. The way data collection was structured for these ties means that most nodes send roughly the same number of ties (around 3), because respondents were asked to do so (to designate 3 doctors on average).
First, we apply the regression model to the “friendship” ties to test hypotheses 1, 3 and 5; we then apply it to the “discussion” ties to test hypotheses 2 and 4.
## # A tibble: 4 × 4
## term estimate statistic p.value
## <chr> <dbl> <dbl> <dbl>
## 1 (intercept) -3.51 -53.6 0
## 2 same community 0.637 6.55 0
## 3 same specialty 0.594 6.58 0
## 4 alter clubs -0.00779 -0.517 0.81
## # A tibble: 1 × 9
## pseudo.r.squared AIC AICc BIC chi.squared deviance null.deviance
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.514 4482. 4482. 4512. 0 4474. 18815.
## # ℹ 2 more variables: df.residual <int>, nobs <int>
## # A tibble: 5 × 4
## term estimate statistic p.value
## <chr> <dbl> <dbl> <dbl>
## 1 (intercept) -2.90 -23.0 0
## 2 alter meetings 0.0603 2.67 0.215
## 3 ego med_sch_yr -0.0833 -3.68 0
## 4 alter med_sch_yr -0.0556 -2.20 0.3
## 5 same med_sch_yr 0.763 7.52 0
## # A tibble: 1 × 9
## pseudo.r.squared AIC AICc BIC chi.squared deviance null.deviance
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.514 4497. 4497. 4535. 0 4487. 18815.
## # ℹ 2 more variables: df.residual <int>, nobs <int>
Regarding the regression model for “friendship”, the results for both same(“community”) and same(“specialty”) are statistically significant and their estimates are positive, as shown in the first table of results. This indicates homophily in friendship based on how long physicians have been in the community (of doctors) and on belonging to the same specialty. We therefore reject the null hypothesis of no homophily for “community” and “specialty”. However, the result for alter(“clubs”) is not statistically significant, so we find no evidence that doctors receive more friendship ties based on their membership in clubs. Therefore, we cannot reject the null hypothesis for receiving ties based on “clubs”.
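To give a sense of the size of these effects, the sketch below converts the friendship estimates into odds ratios and implied tie probabilities. It assumes the estimates are log-odds from a logistic model of tie presence, which the deviance and pseudo R-squared columns suggest but the output does not state outright. The coefficient values are copied from the friendship model output; the helper `inv_logit` is illustrative.

```python
import math


def inv_logit(x):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Estimates copied from the friendship model output.
# Assumption: they are log-odds coefficients from a logistic tie model.
intercept = -3.51
same_community = 0.637
same_specialty = 0.594

# Odds ratios: multiplicative change in the odds of a friendship tie.
or_community = math.exp(same_community)
or_specialty = math.exp(same_specialty)

# Implied tie probabilities, with all other terms held at zero.
p_baseline = inv_logit(intercept)
p_same_community = inv_logit(intercept + same_community)

print(f"odds ratio, same community: {or_community:.2f}")
print(f"odds ratio, same specialty: {or_specialty:.2f}")
print(f"baseline tie probability: {p_baseline:.3f}")
print(f"tie probability, same community: {p_same_community:.3f}")
```

Under that assumption, sharing a “community” or “specialty” category raises the odds of a friendship tie by a factor of a little under two, starting from a low baseline probability that reflects the sparsity of the network.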
Regarding the regression model for “discussion”, the result for alter(“meetings”) is not statistically significant, so we cannot reject the null hypothesis that meeting attendance has no effect on receiving discussion ties. The result for same(“meetings”) is also not statistically significant, so we find no evidence of homophily based on attending physicians’ conventions and again cannot reject the null hypothesis. For alter(“med_sch_yr”), the result is likewise not statistically significant, so it seems that doctors do not receive more discussion ties based on how long they have been practicing. Finally, we do see statistical significance for same(“med_sch_yr”), meaning that there is homophily based on how long doctors have been practicing medicine.
Our analysis took the largest component of the network, the physicians of Peoria, to analyse which node attributes influence patterns of relationships among physicians. We applied a regression model with permutation-based significance tests to two types of ties, “friendship” and “discussion”. Permutation generates a series of random graphs and compares the tie patterns in our network with those in the random graphs, in order to understand which node attributes, if any, influence the patterns. The random graphs keep the size, density and structure of the observed network, which makes the comparison more accurate.
Our results show homophily on several attributes of this network. In terms of friendship, doctors who have been in the community for the same amount of time are more likely to group together, and they also relate more to peers of the same specialty. In terms of discussion, doctors are more likely to hold discussions with those who started practicing in the same year. These results are intuitive, matching the assumptions we can make about how doctors are likely to relate to each other.
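The permutation logic can be illustrated with a simplified sketch. It is not the regression procedure used in this report: it tests a single statistic (the number of ties between nodes sharing an attribute value) by shuffling attribute values across nodes while leaving the ties untouched, which is one common way to preserve a network's size, density and structure. The toy data and the function names are hypothetical.

```python
import random


def same_attribute_ties(edges, attr):
    """Count ties whose sender and receiver share the attribute value."""
    return sum(1 for u, v in edges if attr[u] == attr[v])


def permutation_p_value(edges, attr, n_perm=2000, seed=1):
    """One-sided p-value from shuffling attribute values across nodes.

    The edge list is never changed, so every permuted data set keeps the
    observed network's size, density and structure.
    """
    rng = random.Random(seed)
    nodes = list(attr)
    values = [attr[n] for n in nodes]
    observed = same_attribute_ties(edges, attr)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        shuffled = dict(zip(nodes, values))
        if same_attribute_ties(edges, shuffled) >= observed:
            at_least_as_large += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Hypothetical toy network: two cohorts with mostly within-cohort ties.
attr = {0: "1940s", 1: "1940s", 2: "1940s", 3: "1940s",
        4: "1950s", 5: "1950s", 6: "1950s", 7: "1950s"}
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2),
         (4, 5), (5, 6), (6, 7), (7, 4), (4, 6),
         (1, 5)]

observed, p_value = permutation_p_value(edges, attr)
print(f"same-cohort ties observed: {observed} of {len(edges)}")
print(f"permutation p-value: {p_value:.4f}")
```

Because only the attribute labels move, any difference between the observed count and the permuted counts reflects how attributes line up with ties rather than differences in network shape.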