Online version script: https://rpubs.com/lincw/praktikum-cytoscape

1 Analyze protein interaction data with Cytoscape

The aim of this part is to visualize and analyze a host-pathogen protein interaction map derived from a Y2H experiment using the software Cytoscape. The network is extended with additional interactions and protein properties from online resources. Furthermore, expression fold-change values derived from an RNA-Seq experiment are integrated to examine the expression dynamics after pathogen treatment.

1.1 Software in this course

Cytoscape

Cytoscape is an open-source software platform for visualizing complex networks and integrating these with any type of attribute data.

2 GUI Basics

2.1 Prepare Cytoscape

2.2 Required data

2.3 Cytoscape Basics

After starting Cytoscape, five sample sessions are offered. To get familiar with Cytoscape’s user interface, select the “Yeast Perturbation” sample session.

Figure 1. Cytoscape user interface.

2.3.1 Working environment

A selection of elements in the Cytoscape window is shown in Figure 2. The numbers in the following list correspond to the numbers in the figure.

  1. Network tab: All network collections / networks of the current session are shown here.
  2. In the Style tab, the visualization of the network components (nodes, edges, network) can be adjusted.
  3. In the Selection tab, a set of nodes / edges based on the specified selection criteria can be selected.
  4. The network galFiltered.sif is part of the network collection galFiltered.sif.
  5. The galFiltered.sif network contains 331 nodes.
  6. The number of edges in the network: 362
  7. In the Table Panel, three different tables are available: node properties, edge properties and network properties
  8. The network panel
  9. The number of selected nodes and edges in the network panel
  10. The number of hidden nodes and edges
  11. Text field to search for nodes

Figure 2. Overview of the Cytoscape interface.

2.4 Import Network

In most cases networks are not available as Cytoscape files but as Excel or comma-separated files with at least two columns. Usually each row contains an interacting pair of nodes and optional interaction parameters (e.g. the network to which an interaction belongs, or the screen / experiment in which it was identified). Nodes can represent different entities: proteins, genes, people, … Such a list of interactions can be imported into Cytoscape, where it is represented as a network. To import a network, follow these steps:

  1. Start Cytoscape
  2. To create a network from an Excel file, select File -> Import -> Network -> File
  3. Go to the folder “Internship” and select the file “Ath - Psy interactions from Mukhtar et al 2011.xlsx”

Figure 3. Import network from file.

  1. Check that column ida is annotated as “Source Node” and column idb as “Target Node” by clicking the small arrow in the column header to the right of the column name.
  2. Press “OK” to import the network.

Figure 4. Preview of import content.

  1. The imported interactions are now shown as a network with the default style.

Figure 5. Visualization of imported network.
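
The two-column format described above can also be turned into a network outside the GUI. The following is a minimal sketch, assuming a small made-up edge list; the node identifiers are hypothetical, and only the column names ida and idb are taken from the import dialog.

```python
from collections import defaultdict

# Hypothetical interaction rows: (ida, idb), i.e. source node and target node.
rows = [
    ("effector_1", "AT1G00001"),
    ("effector_1", "AT1G00002"),
    ("effector_2", "AT1G00002"),
]


def build_network(pairs):
    """Return an undirected adjacency map built from interacting pairs."""
    adjacency = defaultdict(set)
    for ida, idb in pairs:
        adjacency[ida].add(idb)
        adjacency[idb].add(ida)
    return dict(adjacency)


network = build_network(rows)
n_nodes = len(network)
# Every edge is stored at both of its end points, hence the division by two.
n_edges = sum(len(partners) for partners in network.values()) // 2
print(n_nodes, "nodes,", n_edges, "edges")
```

Each row contributes one edge, and a node is created the first time its identifier appears in either column.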

2.5 Import Protein Properties

Importing node and edge attributes makes it possible to integrate data from different sources. This supports the analysis of the network by adding information to the network components. In this step the species and some other protein attributes are imported.

  1. To import additional protein properties select File -> Import -> Table -> File
  2. In the open file selection dialog, go to the folder “Internship” and select the file “Ath - Psy Protein Properties from Mukhtar et al 2011.xlsx”. Press Open
  3. Press OK to start import.
  4. Save the network

Figure 6. Import annotation from table.
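
Conceptually, a table import joins attribute rows to nodes through a key column. The sketch below illustrates this idea under stated assumptions: the node names and attribute rows are made up, the key column is called Locus_id as in the annotation import later in this script, and rows without a matching node are simply skipped in this sketch.

```python
# Hypothetical node table: one dictionary of attributes per node, keyed by node name.
node_table = {
    "AT1G00001": {"name": "AT1G00001"},
    "effector_1": {"name": "effector_1"},
}

# Hypothetical attribute rows as they might appear in a properties file.
property_rows = [
    {"Locus_id": "effector_1", "Species": "Pseudomonas syringae"},
    {"Locus_id": "not_in_network", "Species": "Arabidopsis thaliana"},
]


def import_table(table, rows, key_column):
    """Copy attribute values onto nodes whose name matches the key column."""
    matched = 0
    for row in rows:
        node = table.get(row[key_column])
        if node is None:
            continue  # this sketch skips rows without a matching node
        node.update({k: v for k, v in row.items() if k != key_column})
        matched += 1
    return matched


matched = import_table(node_table, property_rows, key_column="Locus_id")
print(matched, "rows matched")
```

Nodes that receive no row keep their existing attributes, which is why some proteins can end up without a Species value.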

2.6 Adjust Style

Styles control how nodes and edges are represented. Each style property can be set either to a static value or dynamically, depending on the value of a selected attribute of a network element. Here we create a style that adjusts the representation of proteins / nodes depending on the value of the species attribute.

  1. Select tab “Style”
  2. Press the menu button “≡” to the right of the drop-down list
  3. Select “Create a new Style…”

Figure 7. Adjust the network style.

  1. As name for the new style enter “Effectors” in the dialog window and press “OK”

Now adjust the node color depending on the value of the species attribute of the proteins. Note that some proteins that belong to Arabidopsis thaliana have no value for the “Species” attribute.

  1. On the tab “Node” the representation of the protein nodes can be adjusted. To adjust the node color, select the attribute “Fill color”. To set the color depending on the species, select the value “Species” in the dropdown element to the right of the “Column” field.
  2. As the color should depend on the value of the species field, select “Discrete Mapping” in the dropdown element in the row “Mapping Type”.
  3. Two additional rows are now shown: “Arabidopsis thaliana” and “Pseudomonas syringae”. Set green both for the value “Arabidopsis thaliana”, by pressing the “…” button on the right side, and for the default color, by pressing the red square. The green default ensures that Arabidopsis proteins without a “Species” value are also shown in green.
  4. For “Pseudomonas syringae” set a red color.

Figure 8. New visualization style of network.

  1. Now select the value that should be displayed as the node name in the network. For this purpose select the column “name” as value for the attribute “Label”. Select “Passthrough Mapping” as mapping type, because the value of this column is shown directly in the network.
  2. More properties to adjust the style of the network can be found in the “Properties” dropdown field. Select the “Label Position” attribute and set the default position of the label below the node.
  3. Save the network

Figure 9. Modify the node label position.
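
The two mapping types used in this section can be pictured as simple functions. This is an illustrative sketch only, not Cytoscape's implementation: a discrete mapping is a lookup table with a default value, and a passthrough mapping returns the column value unchanged. The node entries are hypothetical.

```python
DEFAULT_FILL = "green"  # default used when a node has no Species value

# Discrete mapping: one color per distinct attribute value.
species_to_color = {
    "Arabidopsis thaliana": "green",
    "Pseudomonas syringae": "red",
}


def fill_color(node):
    """Discrete mapping with a fallback to the default value."""
    return species_to_color.get(node.get("Species"), DEFAULT_FILL)


def label(node):
    """Passthrough mapping: the column value is shown unchanged."""
    return node["name"]


nodes = [
    {"name": "effector_1", "Species": "Pseudomonas syringae"},
    {"name": "AT1G00001"},  # Arabidopsis protein without a Species value
]
styled = [(label(n), fill_color(n)) for n in nodes]
print(styled)
```

The second node has no Species entry and therefore falls back to the default color.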

2.7 Analyze Network Component Properties

Network components (nodes and edges) are characterized by several network measures, which are derived from the position of the component in the network. Here we focus on the node properties degree, shortest path, betweenness, clustering coefficient and assortativity. These measures characterize single nodes and can be used to determine the importance of a node for a network, e.g. hubs are assumed to be important for organisms to be healthy. Furthermore, the distribution of the measures can be used to infer the overall network structure, which gives an indication of the robustness of the network against perturbations, e.g. by pathogens or mutations.

Figure 10. Network measures.

  • Degree: number of interacting proteins of A.
  • Shortest path: the path between two nodes that has the smallest number of steps among all alternative paths.
  • Betweenness: describes the centrality of nodes in a network. It measures how often a node lies on the shortest paths between other nodes.
  • Clustering coefficient: a measure of the degree of interconnectivity in the neighborhood of a node.
  • Assortativity: average degree of the nearest neighbors of a node. Negative correlation of assortativity with degree => nodes with a high degree interact with nodes with a low degree; positive correlation => hubs interact with hubs.
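
To make these definitions concrete, the sketch below computes four of the measures on a small hypothetical network in plain Python. It only illustrates the definitions and is not the algorithm used by Cytoscape; betweenness is left out to keep the example short.

```python
from collections import deque

# Hypothetical undirected toy network as an adjacency map.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}


def degree(g, node):
    """Number of interaction partners of a node."""
    return len(g[node])


def shortest_path_length(g, start, goal):
    """Breadth-first search; returns the number of steps, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in g[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def clustering_coefficient(g, node):
    """Fraction of neighbor pairs that are themselves connected."""
    neighbors = sorted(g[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in g[neighbors[i]]
    )
    return links / (k * (k - 1) / 2)


def average_neighbor_degree(g, node):
    """Average degree of the direct neighbors of a node."""
    neighbors = g[node]
    if not neighbors:
        return 0.0
    return sum(len(g[n]) for n in neighbors) / len(neighbors)


print(degree(graph, "A"), shortest_path_length(graph, "B", "E"))
print(clustering_coefficient(graph, "A"), average_neighbor_degree(graph, "A"))
```

In this toy network, only one of the three possible pairs among the neighbors of A (B and C) is connected, so the clustering coefficient of A is 1/3.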

Figure 11. Three representative network models: random network, scale-free network and hierarchical network. (Reference: Yamada, T. & Bork, P. Evolution of biomolecular networks: lessons from metabolic and protein interactions. Nature Reviews Molecular Cell Biology 10, 791–803 (2009).)


To determine node properties in Cytoscape the included tool NetworkAnalyzer can be used.

  1. Go to Tools -> NetworkAnalyzer -> Network Analysis -> Analyze Network
  2. In the “NetworkAnalyzer – Network Interpretation” dialog window select “Treat the network as undirected”.
  3. Press OK
  4. The calculated values are added to the node table and the edge table. The node values and additional statistics are also shown in the “Results Panel”.

Figure 12. Analyze network in Cytoscape.

Figure 13. The content of node table in Cytoscape.

Now adjust the style so that the node’s degree attribute is used to set the node’s size.

  1. Select the “Size” attribute on the “Node” tab of the “Style” tab.
  2. As Column select “Degree”
  3. As mapping type select “Continuous Mapping”
  4. Double click on the diagram right of “Current Mapping” and set the minimum node size to 15 and the maximum node size to 40. To do so, first click on the black arrow to edit the node size field for the minimum size, then click on the right black arrow for the maximum size.
  5. Press OK
  6. Save the network

Figure 14. Adjust the node size based on the degree parameter.

Figure 15. The output after adjustment.
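
A continuous mapping of degree to node size can be thought of as an interpolation between the minimum and maximum size. The following sketch assumes a purely linear interpolation between the sizes 15 and 40 used above; the degree values are hypothetical.

```python
def continuous_size(degree, min_degree, max_degree, min_size=15.0, max_size=40.0):
    """Linearly interpolate a node size between min_size and max_size."""
    if max_degree == min_degree:
        return min_size  # all nodes have the same degree
    fraction = (degree - min_degree) / (max_degree - min_degree)
    return min_size + fraction * (max_size - min_size)


# Hypothetical degree values for three nodes.
degrees = {"A": 1, "B": 6, "C": 11}
lowest, highest = min(degrees.values()), max(degrees.values())
sizes = {node: continuous_size(d, lowest, highest) for node, d in degrees.items()}
print(sizes)
```

The node with the lowest degree gets size 15, the node with the highest degree gets size 40, and all others fall in between.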

2.8 Integrate Additional Interactions

To elucidate plant-pathogen protein interactions and gain insight into plant immunity, it is not enough to identify direct plant-pathogen protein interactions; it is also necessary to integrate protein-protein interactions from the host plant to identify interaction partners of pathogen targets. These interaction partners can serve as “guard” proteins of pathogen effector targets. [Dodds and Rathjen, Plant immunity: towards an integrated view of plant-pathogen interactions, 2010, Nature Reviews Genetics]

Here we integrate two additional Arabidopsis protein-protein interaction sets derived with the same Y2H interaction mapping pipeline as the effector-host interactions. The first set contains additional interactions of NB-LRR proteins (which act as guard proteins) that were tested for binding against 8000 Arabidopsis proteins in Mukhtar et al, 2011. NB-LRR proteins are known to form complexes with other proteins to recognize effector proteins: “RIN4 forms exclusive complexes with the NB-LRR proteins RPM1 and RESISTANCE TO PSEUDOMONAS SYRINGAE 2 (RPS2). Degradation of RIN4 by the protease effector AvrRpt2 de-represses RPS2, whereas AvrB or AvrRPM1-mediated phosphorylation of RIN4 activates RPM1. Thus, modification of RIN4 by the effectors explains how an individual NB-LRR (in this case, RPM1) can recognize more than one effector.” [Dodds and Rathjen, Plant immunity: towards an integrated view of plant-pathogen interactions, 2010, Nature Reviews Genetics]

The second set is an interaction map in which almost all proteins that were tested against Pseudomonas effectors were also tested against each other. This network is called Arabidopsis Interactome 1 and was published in [Arabidopsis Interactome Mapping Consortium, Science, 2011].

Other sources of protein-protein interactions are public databases, in which interactions from various organisms are available. For Arabidopsis, many interactions can be found in IntAct and BioGRID.

2.8.1 Integrate Arabidopsis thaliana NB-LRR interactions from Mukhtar et al, 2011

  1. File -> Import -> Network -> File…
  2. Select File “Ath - Ath NBLRR interactions from Mukhtar et al 2011.xlsx”
  3. In the “Import Network From Table” dialog adjust the column properties
  4. Check that ida is annotated as “Source Node” and idb is annotated as “Target Node”
  5. Press OK

2.8.2 Integrate AI1Main interactions from Arabidopsis Interactome Mapping Consortium, 2011

  1. File -> Import -> Network -> File…
  2. Select File “Arabidopsisinteractome_SOM_TableS4_AI-1Main only.xlsx”
  3. In the “Import Network From Table” dialog adjust the column properties
  4. Set TAIR_LOCUS_IDA as “Source Node”
  5. Set TAIR_LOCUS_IDB as “Target Node”
  6. Set all other columns to “Not Imported”
  7. Press OK
  8. Two new networks are shown in the network tab

Figure 16. Import another network.

2.8.3 Integrate NB-LRR annotations

  1. Select File -> Import -> Table -> File…
  2. In the open file selection dialog, go to the folder “Internship” and select the file “Ath - Psy Protein Properties from Mukhtar et al 2011.xlsx”. Press Open
  3. Locus_id column must be selected as Key
  4. Press OK
  5. Save the network

Comprehensive information about Arabidopsis genes can be found in two online resources: TAIR (The Arabidopsis Information Resource http://www.arabidopsis.org/) and Araport (Arabidopsis Information Portal https://www.araport.org/). TAIR is more up to date than Araport, but requires a paid subscription once a certain number of queries is exceeded. Araport is free of charge and integrates data from external databases.

  1. Identify the Arabidopsis protein that has the highest number of interactions in the Ath - Psy (Arabidopsis - Pseudomonas) network.
  2. Check in Araport whether this protein is related to immunity.

The gene information can be downloaded and integrated into the network. Import comprehensive gene annotation from Araport:

  1. Select File -> Import -> Table -> File…
  2. In the open file selection dialog, go to the folder “Internship” and select the file “araport11 gene annotation.xlsx”. Press Open
  3. Column “name” must be selected as key
  4. Press OK
  5. Save the network

Merge the single networks into one big network

  1. Go to Tools -> Merge -> Networks…
  2. In the dialog window “Advanced Network Merge” move all three networks from the left side (“Available Networks”) to the right side (“Networks to Merge”)
  3. Press Merge
  4. After merging is finished -> save the network

Figure 17. Merge networks into a giant one.

Figure 18. The options of the merge function. The three options are Union, Intersection and Difference; Union was used here.
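
The three merge options correspond to the usual set operations on edge lists. The sketch below shows them with Python sets on hypothetical edges; it treats interactions as undirected and uses a plain set difference, which may not match every detail of how Cytoscape handles nodes in the Difference option.

```python
def undirected(edges):
    """Normalize edges so that (a, b) and (b, a) count as the same interaction."""
    return {frozenset(edge) for edge in edges}


# Hypothetical edge lists standing in for the three imported networks.
effector_host = undirected([("effector_1", "AT1G00001")])
nblrr = undirected([("AT1G00001", "AT1G00002")])
ai1_main = undirected([("AT1G00002", "AT1G00001"), ("AT1G00002", "AT1G00003")])

union = effector_host | nblrr | ai1_main  # every interaction from any network
intersection = nblrr & ai1_main           # interactions present in both networks
difference = ai1_main - nblrr             # interactions only in the first network

print(len(union), len(intersection), len(difference))
```

With Union, an interaction reported in several networks appears only once in the merged network.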

2.8.4 Integration of RNA-Seq expression data

Another type of data that can be integrated is expression data. RNA-Seq data can be used to examine the impact of a pathogen on the expression of genes and to investigate whether the expression change of a gene correlates with the position of the protein in the network. In our case it is interesting to check which genes are differentially regulated upon pathogen infection.

RNA-Seq data can be found in the GEO (Gene Expression Omnibus) repository hosted by NCBI. A suitable dataset, which profiles the expression of genes in Arabidopsis thaliana at different time points under treatment with Pseudomonas syringae, is GSE88798.

From this dataset, the log2 fold change values between treatment with Pto DC3000 carrying a vector with AvrRpm1 and mock treatment are provided for all time points in the file GSE88798_log2FoldChanges_Pto_AvrRpm1.csv. This data set now has to be imported as node attributes for the merged network.

  1. Select network “Merged Network”
  2. Select File -> Import -> Table -> File…
  3. In the open file selection dialog, go to the folder “Internship” and select the file GSE88798_log2FoldChanges_Pto_AvrRpm1.csv. Press Open
  4. Click on “Advanced Options…”
  5. As delimiter select only “;” (semicolon) -> Press OK
  6. For every column log2FoldChange_* change the data type to “Floating Point”
  7. Press OK

Figure 19. Import gene expression table.
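
The delimiter and data type settings above have a direct counterpart when reading such a file in a script. The following sketch parses a semicolon-delimited table and converts the fold-change columns to floating point numbers. The excerpt is made up: the key column name "name", the gene identifiers and the column log2FoldChange_6 are assumptions, and only the prefix log2FoldChange_ and the column log2FoldChange_1 come from this script.

```python
import csv
import io

# Hypothetical excerpt; the real file has one log2FoldChange_* column per time point.
sample = io.StringIO(
    "name;log2FoldChange_1;log2FoldChange_6\n"
    "AT1G00001;2.5;0.3\n"
    "AT1G00002;-1.2;NA\n"
)


def read_fold_changes(handle):
    """Parse a semicolon-delimited table and convert fold-change columns to float."""
    table = {}
    for row in csv.DictReader(handle, delimiter=";"):
        values = {}
        for column, text in row.items():
            if column.startswith("log2FoldChange_"):
                try:
                    values[column] = float(text)
                except ValueError:
                    values[column] = None  # missing or non-numeric entry
        table[row["name"]] = values
    return table


fold_changes = read_fold_changes(sample)
print(fold_changes["AT1G00001"]["log2FoldChange_1"])
```

Storing the values as numbers rather than text is what later allows range filters and continuous color mappings on these columns.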

The focus of this analysis is on the effector targets and their interaction partners. Therefore, a subnetwork is extracted from the merged network.

  1. Select all proteins that belong to the species “Pseudomonas syringae” in the table panel. To do so, sort the table by the column “Species” and select the proteins from Pseudomonas syringae.
  2. Right click on the selected rows and select “Select nodes from selected rows”.
  3. Press the button “First Neighbors of selected nodes (undirected)” to select direct targets of Pseudomonas effectors.
  4. Press again the button “First Neighbors of selected nodes (undirected)” to select the interaction partners of the effector targets.
  5. Press the button “New Network from Selection (all edges)”.

Figure 20. Select nodes from node table.

Figure 21. Generate a subnetwork from selection.
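
The selection steps above amount to expanding a seed set by its first neighbors twice and keeping the edges among the selected nodes. A minimal sketch on a hypothetical toy network (all node names are made up):

```python
# Hypothetical merged network as an undirected adjacency map.
graph = {
    "effector_1": {"target_1"},
    "target_1": {"effector_1", "partner_1"},
    "partner_1": {"target_1", "distant_1"},
    "distant_1": {"partner_1"},
}
species = {"effector_1": "Pseudomonas syringae"}


def add_first_neighbors(g, selected):
    """Return the selection extended by all direct neighbors."""
    extended = set(selected)
    for node in selected:
        extended |= g[node]
    return extended


def induced_subnetwork(g, selected):
    """Keep only the selected nodes and the edges between them."""
    return {node: g[node] & selected for node in selected}


selection = {n for n in graph if species.get(n) == "Pseudomonas syringae"}
selection = add_first_neighbors(graph, selection)  # adds the effector targets
selection = add_first_neighbors(graph, selection)  # adds partners of the targets
subnetwork = induced_subnetwork(graph, selection)
print(sorted(subnetwork))
```

Nodes that are more than two steps away from any effector, such as distant_1 here, are not part of the subnetwork.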

Now we want to filter for interactions, where the Arabidopsis interaction partners of effector targets are upregulated 1 h after treatment. This can be done on the tab “Select”. On this tab the network can be filtered by the available columns.

  1. Select the new network “Merged Network(1)”
  2. Select tab “Select”.
  3. Filter for Pst Proteins by pressing “+” and select “Column Filter”.
  4. Select column “Node: Species”, “is” and enter the value “Pseudomonas syringae”.
  5. Add an additional column filter, select column “Node: log2FoldChange_1” and “is”, and select the values between 2 and the maximum value of 5.002.

Figure 22. Filter nodes with filter function.

Figure 23. Filter with bacterial species.

Figure 24. Add the second filter parameter.

  1. Change the concatenation operator to Match any (OR).
  2. Press toolbar button “First Neighbors of Selected Nodes (Undirected)”
  3. Press toolbar button “New Network From Selection (all edges)”
  4. The new network “Merged Network(2)” should contain 154 nodes and 309 interactions
  5. Press the toolbar button “Apply Preferred Layout”.
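
The effect of combining two column filters with Match any (OR) rather than Match all (AND) can be sketched as follows. The node attributes are hypothetical, and the threshold of 2 mirrors the lower bound chosen in the filter above.

```python
# Hypothetical node attributes of the subnetwork.
nodes = {
    "effector_1": {"Species": "Pseudomonas syringae", "log2FoldChange_1": None},
    "AT1G00001": {"Species": "Arabidopsis thaliana", "log2FoldChange_1": 3.1},
    "AT1G00002": {"Species": "Arabidopsis thaliana", "log2FoldChange_1": 0.4},
}


def is_pathogen(attrs):
    """Column filter on Species."""
    return attrs.get("Species") == "Pseudomonas syringae"


def is_upregulated(attrs, minimum=2.0):
    """Column filter on log2FoldChange_1 with a lower bound."""
    value = attrs.get("log2FoldChange_1")
    return value is not None and value >= minimum


def select(table, filters, match_any=True):
    """Combine filters with OR (match any) or AND (match all)."""
    combine = any if match_any else all
    return {
        name
        for name, attrs in table.items()
        if combine(f(attrs) for f in filters)
    }


selected_any = select(nodes, [is_pathogen, is_upregulated], match_any=True)
selected_all = select(nodes, [is_pathogen, is_upregulated], match_any=False)
print(sorted(selected_any), sorted(selected_all))
```

With Match all, nothing is selected in this toy example because no node is both a pathogen protein and upregulated, which illustrates why the OR operator is needed here.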

Prepare a new style

  1. Create new style: “FoldChange_1h”
  2. Node, Transparency: Default value 200
  3. Node, Fill Color: Default value = darkgreen, Column = log2FoldChange_1, Mapping Type = Continuous Mapping
  4. Node, Shape: Column = Species, Mapping Type = Discrete Mapping, Pseudomonas syringae = Round Rectangle
  5. Node, Label: Column = Gene_Name

3 Tasks

3.1 Task 1

Find NTL9 and identify all interacting Arabidopsis proteins that are 2-fold upregulated. Check these proteins in Araport for their roles / functions. Are they related to immunity, pathogens, …?

3.2 Task 2

Identify another subnetwork consisting of a different Pseudomonas effector, an effector target and an interactor protein that is at least 2-fold upregulated.

3.3 Task 3

Adjust the filter to select all proteins that have a log2 fold change >= 2 at any time point, then change the Column in Fill Color to the other time points. How does the expression change? Which genes are upregulated at which time point?