Online version script: https://rpubs.com/lincw/praktikum-cytoscape
The aim of this part is to visualize and analyze a host-pathogen protein interaction map derived from a Y2H experiment using the software Cytoscape. The network is extended with additional interactions and protein properties from online resources. Furthermore, expression fold change values derived from an RNA-Seq experiment are integrated to examine the expression dynamics after pathogen treatment.
Cytoscape is an open-source software platform for visualizing complex networks and integrating these with any type of attribute data.
After Cytoscape has started, five sample sessions are offered. To get familiar with Cytoscape’s user interface, select the “Yeast Perturbation” sample session.
Figure 1. Cytoscape user interface.
A selection of elements in the Cytoscape window is shown in figure 2. The numbers in the following list correspond to the numbers in the figure.
Figure 2. Introduction to the Cytoscape interface.
In most cases networks are not available as Cytoscape files but as Excel or comma-separated files with at least two columns. Usually each row contains an interacting pair of nodes and optional interaction parameters (e.g. to which network an interaction belongs or in which screen / experiment an interaction has been identified). Nodes can represent different entities: proteins, genes, people, … Such a list of interactions can be imported into Cytoscape, where it is represented as a network. To import a network, follow these steps:
Figure 3. Import network from file.
Figure 4. Preview of import content.
Figure 5. Visualization of imported network.
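The two-column format described above can be illustrated outside Cytoscape with a minimal Python sketch. The column names `source`, `target` and `screen` and the three example rows are assumptions made for illustration; they are not taken from the course files.

```python
import csv
import io

# Hypothetical example content; a real file would come from the Y2H screen.
EDGE_CSV = """source,target,screen
AvrB,RIN4,Y2H
AvrRpm1,RIN4,Y2H
RIN4,RPM1,Y2H
"""


def read_edge_list(handle, source_col="source", target_col="target"):
    """Return (nodes, edges); every edge keeps its optional extra columns."""
    nodes, edges = set(), []
    for row in csv.DictReader(handle):
        src = row[source_col].strip()
        tgt = row[target_col].strip()
        # Everything that is not a node column is treated as an edge attribute.
        attrs = {k: v for k, v in row.items() if k not in (source_col, target_col)}
        nodes.update((src, tgt))
        edges.append((src, tgt, attrs))
    return nodes, edges


nodes, edges = read_edge_list(io.StringIO(EDGE_CSV))
print(len(nodes), "nodes,", len(edges), "edges")
```

Each row becomes one edge, and every identifier seen in either node column becomes a node, which mirrors the choice of source and target columns in the import dialog.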
Importing node and edge attributes makes it possible to integrate data from different sources. This supports the analysis by adding information to the network components. In this step the species and some other protein attributes are imported.
Figure 6. Import annotation from table.
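Conceptually, the attribute import is a join between the node table and the annotation table on a shared key column. The sketch below shows that idea under assumptions: the key column `name`, the `Species` values and the identifier `XYZ1` are hypothetical, and Cytoscape's own import logic may differ in detail.

```python
# Node table as it might look after the network import (illustrative).
node_table = {
    "AvrB": {"name": "AvrB"},
    "RIN4": {"name": "RIN4"},
    "RPM1": {"name": "RPM1"},
}

# Annotation rows keyed by the same identifier (hypothetical values).
annotation_rows = [
    {"name": "AvrB", "Species": "Pseudomonas syringae"},
    {"name": "RIN4", "Species": "Arabidopsis thaliana"},
    {"name": "XYZ1", "Species": "Arabidopsis thaliana"},  # not in the network
]


def import_attributes(table, rows, key="name"):
    """Copy attribute columns onto matching nodes; return the unmatched keys."""
    unmatched = []
    for row in rows:
        node = table.get(row[key])
        if node is None:
            unmatched.append(row[key])
            continue
        node.update({k: v for k, v in row.items() if k != key})
    return unmatched


unmatched = import_attributes(node_table, annotation_rows)
print("rows without a matching node:", unmatched)
```

Nodes without a matching annotation row (here `RPM1`) simply keep an empty attribute, which is why some proteins can end up without a species value.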
Styles control how nodes and edges are represented. Each style property can be set to a static value or mapped dynamically to the value of a selected attribute of a network element. Here we create a style that adjusts the representation of proteins / nodes depending on the value of the species attribute.
Figure 7. Adjust the network style.
Now adjust the node color depending on the value of the species attribute of the proteins. Note that some proteins that belong to Arabidopsis thaliana have no value for the “Species” attribute.
Figure 8. New visualization style of network.
Figure 9. Modify the node label position.
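A discrete mapping of this kind behaves like a lookup table with a fallback: nodes whose attribute is empty receive the default value of the style property. The following sketch illustrates this; the colour codes and the exact species strings are assumptions, not values prescribed by the course material.

```python
# Assumed colours and species strings, chosen only for illustration.
DEFAULT_COLOR = "#AAAAAA"
SPECIES_COLORS = {
    "Pseudomonas syringae": "#D95F02",
    "Arabidopsis thaliana": "#1B9E77",
}


def node_color(attributes, column="Species"):
    """Discrete mapping: look up the colour, fall back to the default."""
    return SPECIES_COLORS.get(attributes.get(column), DEFAULT_COLOR)


print(node_color({"Species": "Arabidopsis thaliana"}))  # mapped colour
print(node_color({}))                                   # no species value
```

Because of the fallback, Arabidopsis proteins without a “Species” value are drawn in the default colour unless that default is itself set to the Arabidopsis colour.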
Network components (nodes and edges) are characterized by several network measures, which are derived from the position of the component in the network. Here we focus on the node properties degree, shortest path, betweenness, clustering coefficient and assortativity. These measures characterize single nodes and can be used to determine the importance of a node for a network, e.g. hubs are assumed to be important for organisms to be healthy. Furthermore, the distribution of the measures can be used to infer the overall network structure, which indicates how robust the network is against perturbations, e.g. by pathogens or mutations.
Figure 10. Network measures.
Figure 11. Three representative network models: random network, scale-free network and hierarchical network. (Reference: Yamada, T. & Bork, P. Evolution of biomolecular networks: lessons from metabolic and protein interactions. Nature Reviews Molecular Cell Biology 10, 791–803 (2009).)
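Three of the measures named above (degree, clustering coefficient and shortest path length) can be computed by hand on a small undirected toy network. The sketch below uses a made-up five-node graph and plain Python; it is meant to make the definitions concrete and does not reproduce how Cytoscape computes them internally.

```python
from collections import deque

# Made-up toy network: a triangle A-B-C with a tail C-D-E.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]


def build_adjacency(edges):
    """Undirected adjacency sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj, node):
    """Number of direct interaction partners."""
    return len(adj[node])


def clustering_coefficient(adj, node):
    """Fraction of neighbour pairs that are themselves connected."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def shortest_path_length(adj, start, goal):
    """Breadth-first search; returns None if the nodes are not connected."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None


adj = build_adjacency(EDGES)
print(degree(adj, "C"), clustering_coefficient(adj, "C"), shortest_path_length(adj, "A", "E"))
```

In this toy graph node C has the highest degree, while A and B sit in a fully connected neighbourhood and therefore have the highest clustering coefficient.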
To determine node properties in Cytoscape, the built-in tool NetworkAnalyzer can be used.
Figure 12. Analyze network in Cytoscape.
Figure 13. The content of node table in Cytoscape.
Now adjust the style so that the node’s degree attribute is used to set the node’s size.
Figure 14. Adjust the node size based on the degree parameter.
Figure 15. The output after adjustment.
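Mapping a numeric attribute such as degree to node size amounts to a linear interpolation between a minimum and a maximum size. The sketch below assumes a size range of 20 to 80 and clamps values outside the observed degree range; both choices are illustrative and can be set differently in the style editor.

```python
def continuous_size(value, v_min, v_max, size_min=20.0, size_max=80.0):
    """Linearly interpolate a node size from an attribute value."""
    if v_max == v_min:
        return size_min  # all nodes share one value, nothing to scale
    clamped = max(v_min, min(v_max, value))
    fraction = (clamped - v_min) / (v_max - v_min)
    return size_min + fraction * (size_max - size_min)


# Degrees between 1 and 11 mapped onto sizes between 20 and 80.
for deg in (1, 6, 11):
    print(deg, continuous_size(deg, 1, 11))
```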
To elucidate plant-pathogen protein interactions and gain insight into plant immunity, it is not enough to identify direct plant-pathogen protein interactions. It is also necessary to integrate protein-protein interactions from the host plant to identify interaction partners of pathogen targets. These interaction partners can serve as “guard” proteins of pathogen effector targets. [Dodds and Rathjen, Plant immunity: towards an integrated view of plant-pathogen interactions, 2010, Nature Reviews Genetics]
Here we integrate two additional Arabidopsis protein-protein interaction sets derived with the same Y2H interaction mapping pipeline as the effector-host interactions. The first set consists of additional interactions of NB-LRR proteins (which act as guard proteins), which were tested for binding against 8000 Arabidopsis proteins in Mukhtar et al., 2011. NB-LRR proteins are known to form complexes with other proteins to recognize effector proteins: “RIN4 forms exclusive complexes with the NB-LRR proteins RPM1 and RESISTANCE TO PSEUDOMONAS SYRINGAE 2 (RPS2). Degradation of RIN4 by the protease effector AvrRpt2 de-represses RPS2, whereas AvrB or AvrRPM1-mediated phosphorylation of RIN4 activates RPM1. Thus, modification of RIN4 by the effectors explains how an individual NB-LRR (in this case, RPM1) can recognize more than one effector.” [Dodds and Rathjen, Plant immunity: towards an integrated view of plant-pathogen interactions, 2010, Nature Reviews Genetics]
The second set is an interaction map in which almost all proteins that were tested against Pseudomonas effectors were also tested against each other. This network is called Arabidopsis Interactome 1 and was published in [Arabidopsis Interactome Mapping Consortium, Science, 2011].
Other sources of protein-protein interactions are public databases, in which interactions from various organisms are available. For Arabidopsis, many interactions can be found in IntAct and BioGRID.
Figure 16. Import another network.
Comprehensive information about Arabidopsis genes can be found in two online resources: TAIR (The Arabidopsis Information Resource, http://www.arabidopsis.org/) and Araport (Arabidopsis Information Portal, https://www.araport.org/). TAIR is more up to date than Araport, but requires a paid subscription once a certain number of queries is exceeded. Araport is free of charge and integrates data from external databases.
The gene information can be downloaded and integrated into the network. Import comprehensive gene annotation from Araport:
Merge the single networks into one large network
Figure 17. Merge networks into a giant one.
Figure 18. Options of the merge function. There are three options: Union, Intersection and Difference. Union was used here.
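The three merge options correspond roughly to set operations on the edges of the networks. The sketch below is a simplified analogy with made-up protein names: it treats each undirected interaction as an unordered pair and ignores how Cytoscape matches nodes by a chosen column and merges attributes.

```python
def edge_set(pairs):
    """Undirected edges as unordered pairs, so (a, b) equals (b, a)."""
    return {frozenset(pair) for pair in pairs}


# Two made-up networks that share one interaction (P1-P2).
net_a = edge_set([("P1", "P2"), ("P2", "P3")])
net_b = edge_set([("P2", "P1"), ("P3", "P4")])

union = net_a | net_b          # every interaction from either network
intersection = net_a & net_b   # only interactions present in both
difference = net_a - net_b     # interactions of net_a missing from net_b

print(len(union), len(intersection), len(difference))
```

Union is the natural choice here, because the goal is one network that contains the effector-host interactions together with all additional host interactions.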
Expression data are a different type of data that can be integrated. RNA-Seq data can be used to examine the impact of a pathogen on gene expression and to investigate whether the expression change of a gene correlates with the position of its protein in the network. In our case it is interesting to check which genes are differentially regulated upon pathogen infection.
RNA-Seq data can be found in the GEO (Gene Expression Omnibus) repository hosted by NCBI. A suitable dataset, which profiles the expression of genes in Arabidopsis thaliana at different time points under treatment with Pseudomonas syringae, is GSE88798.
From this dataset, log2 fold change values between treatment with Pto DC3000 carrying a vector with AvrRpm1 and mock treatment are provided for all time points in the file GSE88798_log2FoldChanges_Pto_AvrRpm1.csv. This data set now has to be imported as node attributes for the merged network.
Select the file GSE88798_log2FoldChanges_Pto_AvrRpm1.csv and press Open.
Figure 19. Import gene expression table.
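A table of log2 fold changes can be read into a per-gene dictionary, and a log2 value converts to a plain fold change via 2 raised to that value, so a log2 fold change of 1 means 2-fold upregulation. The sketch below uses hypothetical column names (`gene_id`, `log2FC_1h`, `log2FC_6h`) and placeholder gene identifiers, since the actual layout of the GSE88798 file is not described here.

```python
import csv
import io

# Hypothetical layout; column names and gene IDs are placeholders.
EXPRESSION_CSV = """gene_id,log2FC_1h,log2FC_6h
AT1G00001,1.5,-0.2
AT1G00002,NA,2.0
"""


def read_log2fc(handle, id_col="gene_id"):
    """Return {gene: {column: float or None}}; unparsable cells become None."""
    table = {}
    for row in csv.DictReader(handle):
        values = {}
        for col, raw in row.items():
            if col == id_col:
                continue
            try:
                values[col] = float(raw)
            except (TypeError, ValueError):
                values[col] = None  # e.g. "NA" or an empty cell
        table[row[id_col]] = values
    return table


def fold_change(log2fc):
    """Convert a log2 fold change into a plain fold change."""
    return 2.0 ** log2fc


table = read_log2fc(io.StringIO(EXPRESSION_CSV))
print(table["AT1G00001"]["log2FC_1h"], fold_change(1.0))
```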
The focus of this analysis is on the effector targets and their interaction partners. Therefore, a subnetwork is extracted from the merged network.
Figure 20. Select nodes from node table.
Figure 21. Generate a subnetwork from selection.
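Extracting the effector targets together with their interaction partners corresponds to taking the selected nodes, adding their first neighbours, and keeping only the edges between the retained nodes. The sketch below shows this on a made-up network; the node names are hypothetical and the function is an illustration, not Cytoscape's implementation.

```python
def first_neighbor_subnetwork(edges, seeds):
    """Keep the seed nodes, their direct neighbours and the edges among them."""
    seeds = set(seeds)
    keep = set(seeds)
    for u, v in edges:
        if u in seeds:
            keep.add(v)
        if v in seeds:
            keep.add(u)
    sub_edges = [(u, v) for u, v in edges if u in keep and v in keep]
    return keep, sub_edges


# Made-up network: effector E1 binds target T1, which binds P1; P1-P2-P3 is a chain.
edges = [("E1", "T1"), ("T1", "P1"), ("P1", "P2"), ("P2", "P3")]
sub_nodes, sub_edges = first_neighbor_subnetwork(edges, ["T1"])
print(sorted(sub_nodes), sub_edges)
```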
Now we want to filter for interactions in which the Arabidopsis interaction partners of effector targets are upregulated 1 h after treatment. This can be done on the “Select” tab, where the network can be filtered by the available columns.
Figure 22. Filter nodes with filter function.
Figure 23. Filter with bacterial species.
Figure 24. Add the second filter parameter.
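Combining two filter conditions, one on the species column and one on an expression column, is equivalent to keeping the nodes for which all conditions hold. The sketch below assumes a column named `log2FC_1h` and a threshold of 1.0 (2-fold upregulation); both are illustrative assumptions, as are the node names.

```python
def is_species(species, column="Species"):
    """Condition: the node's species column equals the given species."""
    def check(attrs):
        return attrs.get(column) == species
    return check


def at_least(column, threshold):
    """Condition: the numeric column is present and >= threshold."""
    def check(attrs):
        value = attrs.get(column)
        return value is not None and value >= threshold
    return check


def select_nodes(node_table, conditions):
    """Return the names of all nodes that satisfy every condition."""
    return sorted(
        name
        for name, attrs in node_table.items()
        if all(cond(attrs) for cond in conditions)
    )


# Made-up node table with an assumed expression column.
node_table = {
    "P1": {"Species": "Arabidopsis thaliana", "log2FC_1h": 1.8},
    "P2": {"Species": "Arabidopsis thaliana", "log2FC_1h": 0.3},
    "P3": {"Species": "Arabidopsis thaliana"},
    "E1": {"Species": "Pseudomonas syringae", "log2FC_1h": None},
}

selected = select_nodes(
    node_table,
    [is_species("Arabidopsis thaliana"), at_least("log2FC_1h", 1.0)],
)
print(selected)
```

Nodes without an expression value are never selected by the numeric condition, which matters for pathogen effectors that have no entry in the plant expression table.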
Prepare a new style
Find NTL9 and identify all interacting Arabidopsis proteins that are 2-fold upregulated. Check these proteins in Araport for their roles / functions. Are they related to immunity, pathogens, …?
Identify another subnetwork consisting of another Pseudomonas effector, an effector target and an interactor protein that is at least 2-fold upregulated.
Adjust the filter to select all proteins that have a log2 fold change >= 2 at any time point, then change Column in Fill Color to other time points. How does expression change? Which genes are upregulated at which time point?
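The "at any time point" condition is a logical OR across the expression columns. As a rough sketch, the function below returns, for every gene that passes, the time points at which its log2 fold change reaches the threshold; the time point labels and values are made up and do not come from GSE88798.

```python
def upregulated_timepoints(values, threshold=2.0):
    """Time points at which the log2 fold change reaches the threshold."""
    return [tp for tp, v in values.items() if v is not None and v >= threshold]


def select_any_timepoint(expression, threshold=2.0):
    """Genes that pass the threshold at one or more time points."""
    hits = {}
    for gene, values in expression.items():
        timepoints = upregulated_timepoints(values, threshold)
        if timepoints:
            hits[gene] = timepoints
    return hits


# Made-up values; the time point labels are hypothetical.
expression = {
    "G1": {"1h": 0.4, "6h": 2.3, "24h": 2.0},
    "G2": {"1h": 1.9, "6h": None, "24h": -0.5},
    "G3": {"1h": 3.1, "6h": 0.2, "24h": 0.0},
}

hits = select_any_timepoint(expression)
print(hits)
```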