Sources¶
BioGRID¶
Download and convert BioGRID to BEL.
Run this script with python -m bio2bel.sources.biogrid
The interaction information contained in BioGRID can be categorized into protein interactions, genetic interactions, chemical associations, and post-translational modifications. BioGRID includes information from major model organisms and humans.
The file downloaded from BioGRID is a zip archive containing a single file formatted in PSI MITAB level 2.5 compatible Tab Delimited Text file format, containing all interaction and associated annotation data.
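As a rough illustration of that layout, the sketch below reads the single tab-delimited member of such a zip archive using only the standard library. The function name `iter_mitab_rows` and the in-memory sample are hypothetical and are not part of bio2bel. It also assumes that the first line is a header row prefixed with `#`, as is common for MITAB exports.

```python
import csv
import io
import zipfile


def iter_mitab_rows(archive):
    """Yield one dict per interaction from the only member of a MITAB zip archive."""
    with zipfile.ZipFile(archive) as zf:
        (member,) = zf.namelist()  # the archive is expected to hold a single file
        with zf.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # QUOTE_NONE keeps the double quotes that MITAB uses inside fields
            reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            header = [name.lstrip("#") for name in next(reader)]
            for row in reader:
                yield dict(zip(header, row))


# Tiny in-memory stand-in for the downloaded archive (made-up identifiers).
_sample = (
    "#ID Interactor A\tID Interactor B\tInteraction Types\n"
    'entrez gene/locuslink:1\tentrez gene/locuslink:2\t'
    'psi-mi:"MI:0915"(physical association)\n'
)
_buffer = io.BytesIO()
with zipfile.ZipFile(_buffer, "w") as zf:
    zf.writestr("sample.mitab.txt", _sample)
_buffer.seek(0)

rows = list(iter_mitab_rows(_buffer))
print(rows[0]["Interaction Types"])
```

Passing a file path instead of the in-memory buffer works the same way, since `zipfile.ZipFile` accepts both.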
The interaction types in BioGRID were in the PSI-MI (Proteomics Standards Initiative - Molecular Interactions Controlled Vocabulary) format and were mapped to BEL relations. The following table shows examples of how interaction types in BioGRID were mapped to BEL or other ontologies.
| PSI-MI (BioGRID) | Mapped BEL term | Source | Target |
|---|---|---|---|
| psi-mi:"MI:0794"(synthetic genetic interaction defined by inequality) | | | |
| psi-mi:"MI:0915"(physical association) | | | |
| psi-mi:"MI:0407"(direct interaction) | | | |
Summary statistics of the BEL graph generated in the BioGRID module:
| Key | Value |
|---|---|
| Version | v3.5.183 |
| Nodes | 293030 |
| Edges | 3127695 |
| Citations | 9 |
| Components | 1225 |
| Density | 3.64E-05 |
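The reported density is consistent with the standard definition for a directed graph without self-loops, that is, the number of edges divided by the number of possible ordered node pairs. The source does not state which definition was used, so the sketch below is a check under that assumption; the function name `directed_density` is hypothetical.

```python
def directed_density(n_nodes: int, n_edges: int) -> float:
    """Return the edge count divided by the number of possible ordered node pairs."""
    if n_nodes < 2:
        return 0.0
    return n_edges / (n_nodes * (n_nodes - 1))


# Node and edge counts taken from the BioGRID summary table.
biogrid_density = directed_density(293030, 3127695)
print(f"{biogrid_density:.2E}")  # 3.64E-05
```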
IntAct¶
Download and convert IntAct to BEL.
Run with python -m bio2bel.sources.intact
IntAct is an interaction database with information about interacting proteins, their relations, and the experiments in which these interactions were found. Among the interactions documented in IntAct are protein modifications, associations, direct interactions, binding interactions, and cleavage reactions. These interactions were grouped according to their biological interpretation and mapped to the corresponding BEL relation. The interaction types in IntAct have a higher granularity than those in BioGRID.
Thanks to the default BEL namespace of protein modifications, pybel.language.pmod_namespace, post-translational protein modifications can be identified very accurately. For example, the glycosylation of a protein can be described in BEL by pybel.dsl.ProteinModification('Glyco'). Although many protein modifications had corresponding terms in BEL, some interaction types in IntAct could not be mapped directly, such as gtpase reaction or aminoacylation reaction.
Therefore, other vocabularies like the Gene Ontology (GO) or the Molecular Process Ontology (MOP) were used to find corresponding interaction terms. These terms were then annotated with the name, namespace, and identifier. IntAct uses the PSI-MI (Proteomics Standards Initiative - Molecular Interactions Controlled Vocabulary) format to identify interaction types. The following table shows examples of how the interactions from IntAct were mapped to BEL or other ontologies.
| Source Type | Target Type | Interaction Type | BEL Example |
|---|---|---|---|
| Protein | Protein | psi-mi:"MI:0193"(amidation reaction) | p('uniprot', 'P62865') increases p('uniprot', 'P10731') |
| Protein | Protein | psi-mi:"MI:1327"(sulfurtransfer reaction) | p('uniprot', 'Q46925') increases p('uniprot', 'P0AGF2') |
| Protein | Protein | psi-mi:"MI:0945"(oxidoreductase activity electron transfer reaction) | p('uniprot', 'P0A3E0') increases p('uniprot', 'P21890') |
| Protein | Protein | psi-mi:"MI:0217"(phosphorylation reaction) | p('uniprot', 'P53999') increases p('uniprot', 'P68400') |
| Protein | Protein | psi-mi:"MI:0567"(neddylation reaction) | p('uniprot', 'Q86XK2') increases p('uniprot', 'Q15843') |
| Protein | Protein | psi-mi:"MI:1148"(ampylation reaction) | p('uniprot', 'P60953-2') increases p('uniprot', 'Q9BVA6') |
| Protein | Protein | psi-mi:"MI:0883"(gtpase reaction) | p('chebi', '15996') increases p('uniprot', 'Q9HCN4') |
| Protein | Protein | psi-mi:"MI:0557"(adp ribosylation reaction) | p('uniprot', 'P09874') increases p('uniprot', 'P13010') |
| Protein | Protein | psi-mi:"MI:0211"(lipid addition) | p('chebi', '15532') increases p('uniprot', 'Q9BR61') |
| Protein | Protein | psi-mi:"MI:0192"(acetylation reaction) | p('uniprot', 'O15350') increases p('uniprot', 'Q09472') |
| Protein | Protein | psi-mi:"MI:0844"(phosphotransfer reaction) | p('chebi', '15422') increases p('uniprot', 'O13297') |
| Protein | Protein | psi-mi:"MI:0220"(ubiquitination reaction) | p('uniprot', 'P32121') increases p('uniprot', 'Q00987') |
| Protein | Protein | psi-mi:"MI:0213"(methylation reaction) | p('uniprot', 'O60016') increases p('uniprot', 'P09988') |
| Protein | Protein | psi-mi:"MI:0214"(myristoylation reaction) | p('chebi', '15532') increases p('uniprot', 'Q9BR61') |
| Protein | Protein | psi-mi:"MI:0216"(palmitoylation reaction) | p('uniprot', 'P60880') increases p('uniprot', 'Q8IUH5') |
| Protein | Gene | psi-mi:"MI:0701"(dna strand elongation) | p('uniprot', 'Q9NYJ8') increases g('uniprot', 'Q62073') |
| Protein | Protein | psi-mi:"MI:1250"(isomerase reaction) | p('uniprot', 'Q13526') increases p('uniprot', 'Q3UVX5') |
| Protein | Protein | psi-mi:"MI:0559"(glycosylation reaction) | p('uniprot', 'P18177') increases p('uniprot', 'P63000') |
| Protein | Protein | psi-mi:"MI:0566"(sumoylation reaction) | p('uniprot', 'P56693') increases p('uniprot', 'P63165') |
| Protein | Protein | psi-mi:"MI:0882"(atpase reaction) | p('chebi', '15422') increases p('uniprot', 'Q9ZNT0') |
| Protein | Protein | psi-mi:"MI:1146"(phospholipase reaction) | p('chebi', '40265') increases p('uniprot', 'P30041') |
| Protein | Protein | psi-mi:"MI:0556"(transglutamination reaction) | p('uniprot', 'P40337') increases p('uniprot', 'P21980') |
| Protein | Protein | psi-mi:"MI:1143"(aminoacylation reaction) | p('uniprot', 'Q89VT6') increases p('uniprot', 'Q89VT8') |
| Protein | Protein | psi-mi:"MI:0210"(hydroxylation reaction) | p('uniprot', 'Q16665') increases p('uniprot', 'Q96KS0') |
| Protein | Protein | psi-mi:"MI:1355"(lipid cleavage) | p('chebi', '64583') decreases p('uniprot', 'F1N588') |
| Protein | Protein | psi-mi:"MI:0212"(lipoprotein cleavage reaction) | p('uniprot', 'P10515') decreases p('uniprot', 'Q9Y6E7') |
| Protein | Protein | psi-mi:"MI:2280"(deamidation reaction) | p('uniprot', 'Q86YW7') decreases p('uniprot', 'P21163') |
| Protein | Protein | psi-mi:"MI:0204"(deubiquitination reaction) | p('uniprot', 'Q93009') decreases p('uniprot', 'P04637') |
| Protein | Protein | psi-mi:"MI:0569"(deneddylation reaction) | p('uniprot', 'Q96LD8') decreases p('uniprot', 'P62913') |
| Protein | Protein | psi-mi:"MI:0985"(deamination reaction) | p('uniprot', 'Q8VSD5') decreases p('uniprot', 'P61088') |
| Protein | Protein | psi-mi:"MI:0871"(demethylation reaction) | p('uniprot', 'P68432') decreases p('uniprot', 'P41229') |
| Protein | Protein | psi-mi:"MI:0570"(protein cleavage) | p('uniprot', 'P04275') decreases p('uniprot', 'Q76LX8') |
| Protein | Gene | psi-mi:"MI:0572"(dna cleavage) | p('uniprot', 'A4GXA9') decreases g('uniprot', 'Q96NY9') |
| Protein | Protein | psi-mi:"MI:0197"(deacetylation reaction) | p('uniprot', 'Q71U36') decreases p('uniprot', 'Q9UBN7') |
| Protein | Protein | psi-mi:"MI:0199"(deformylation reaction) | p('uniprot', 'Q62962') decreases p('uniprot', 'Q9EP80') |
| Protein | Protein | psi-mi:"MI:1140"(decarboxylation reaction) | p('chebi', '16810') decreases p('uniprot', 'P9WJA9') |
| Protein | Rna | psi-mi:"MI:0902"(rna cleavage) | p('uniprot', 'Q99714') decreases r('uniprot', 'O15091') |
| Protein | Protein | psi-mi:"MI:0194"(cleavage reaction) | p('uniprot', 'O14727') decreases p('uniprot', 'P42574') |
| Protein | Protein | psi-mi:"MI:0203"(dephosphorylation reaction) | p('uniprot', 'Q78DX7') decreases p('uniprot', 'P29351') |
| Protein | Protein | psi-mi:"MI:1127"(putative self interaction) | p('uniprot', 'O64517') association p('uniprot', 'O64517') |
| Protein | Protein | psi-mi:"MI:0915"(physical association) | p('uniprot', 'P34708-1') association p('uniprot', 'P34709') |
| Protein | Protein | psi-mi:"MI:0914"(association) | p('uniprot', 'P50570') association p('uniprot', 'Q99961') |
| Protein | Protein | psi-mi:"MI:1126"(self interaction) | p('uniprot', 'P28481') association p('uniprot', 'P28481') |
| Protein | Protein | psi-mi:"MI:0414"(enzymatic reaction) | p('uniprot', 'P15646') association p('uniprot', 'Q02555') |
| Protein | Protein | psi-mi:"MI:0403"(colocalization) | p('uniprot', 'P00519') association p('uniprot', 'Q92558') |
| Protein | Protein | psi-mi:"MI:0407"(direct interaction) | p('uniprot', 'P49418') regulates p('uniprot', 'O43426') |
| Protein | Protein | psi-mi:"MI:0195"(covalent binding) | p('uniprot', 'P0CG48') hasComponent p('uniprot', 'P63146') |
| Protein | Protein | psi-mi:"MI:0408"(disulfide bond) | p('uniprot', 'P73728') hasComponent p('uniprot', 'P73728') |
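The lookup from an interaction type to a BEL relation can be sketched as follows. This is a minimal illustration and not the module's actual code: the names `RELATION_BY_MI`, `parse_interaction_type`, and `get_relation` are hypothetical, the dictionary holds only five rows from the table above, and the pattern assumes straight double quotes around the identifier.

```python
import re

# Subset of the table above: PSI-MI identifier -> BEL relation.
RELATION_BY_MI = {
    "MI:0217": "increases",     # phosphorylation reaction
    "MI:0203": "decreases",     # dephosphorylation reaction
    "MI:0915": "association",   # physical association
    "MI:0407": "regulates",     # direct interaction
    "MI:0195": "hasComponent",  # covalent binding
}

# Matches strings such as: psi-mi:"MI:0217"(phosphorylation reaction)
INTERACTION_TYPE_RE = re.compile(r'^psi-mi:"(?P<id>MI:\d{4})"\((?P<name>.+)\)$')


def parse_interaction_type(value):
    """Split a PSI-MI interaction type string into (identifier, name)."""
    match = INTERACTION_TYPE_RE.match(value)
    if match is None:
        raise ValueError(f"not a PSI-MI interaction type: {value!r}")
    return match.group("id"), match.group("name")


def get_relation(value):
    """Return the BEL relation for an interaction type, or None if unmapped."""
    mi_id, _name = parse_interaction_type(value)
    return RELATION_BY_MI.get(mi_id)


print(get_relation('psi-mi:"MI:0407"(direct interaction)'))  # regulates
```

Returning `None` for unmapped identifiers leaves it to the caller to decide whether to skip or log such rows.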
For negative protein modifications, in which a group is split off from the protein (for example, decarboxylation reaction), the corresponding positive term (here, protein carboxylation) is used together with a relation describing the decrease of the target.
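One way to picture this pairing is a small lookup from the removal reaction to its positive counterpart, combined with the decreases relation. Only the decarboxylation pairing is stated in the text; the other entries are assumed by analogy, and the names `POSITIVE_TERM` and `describe_removal` are hypothetical.

```python
# Removal reaction -> positive modification term used to describe it.
# Only the first entry comes from the text; the rest are assumed by analogy.
POSITIVE_TERM = {
    "decarboxylation reaction": "protein carboxylation",
    "dephosphorylation reaction": "protein phosphorylation",
    "deacetylation reaction": "protein acetylation",
    "demethylation reaction": "protein methylation",
}


def describe_removal(interaction_name):
    """Return (positive term, BEL relation) for a removal reaction, else None."""
    positive_term = POSITIVE_TERM.get(interaction_name)
    if positive_term is None:
        return None
    return positive_term, "decreases"


print(describe_removal("decarboxylation reaction"))
# ('protein carboxylation', 'decreases')
```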
In the case of gtpase reaction and atpase reaction, the source protein's ability to catalyze GTP or ATP hydrolysis had to be captured. Therefore, pybel.dsl.activity() was added as the subject_modifier of the source protein.
A special case was dna strand elongation. Here, the target was a gene, and to capture the notion of the DNA strand elongation process, the corresponding GO term was added as a pybel.dsl.GeneModification. In the case of DNA or RNA cleavage, the target was represented as a pybel.dsl.Gene or pybel.dsl.Rna entity, respectively.
For the relation isomerase reaction there was no corresponding term in BEL denoting this process. Therefore, the molecular process isomerization from the MOP was used and annotated.
As IntAct and BioGRID are both interaction databases, the general code from biogrid.py could be taken as an initial approach. Because the interaction types in IntAct are more granular, many modifications and special cases, such as those mentioned above, had to be investigated further and were handled case by case.
Moreover, a particularly interesting type of information in IntAct is the negative interaction data, which state that a target is not activated by the source. A future improvement would be to map this type of relation to negative BEL. In machine learning tasks like link prediction in graphs, these negative edges could be used as negative samples to enhance the prediction quality of the model.
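The idea of using negative interactions as negative samples can be sketched as below. This is not implemented in the module; it assumes that each interaction record carries a boolean flag marking it as negative, and the function name and tuple layout are hypothetical.

```python
def build_link_prediction_samples(interactions):
    """Turn (source, target, is_negative) triples into node pairs and 0/1 labels."""
    pairs, labels = [], []
    for source, target, is_negative in interactions:
        pairs.append((source, target))
        labels.append(0 if is_negative else 1)  # 0 = reported non-interaction
    return pairs, labels


# Made-up identifiers; the boolean marks a reported negative interaction.
example = [
    ("A", "B", False),
    ("A", "C", True),
    ("B", "C", False),
]
pairs, labels = build_link_prediction_samples(example)
print(labels)  # [1, 0, 1]
```

Compared with randomly sampled non-edges, such curated negatives are backed by experimental evidence that the interaction does not occur.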
IntAct also assigns internal accession numbers to some complexes, but there are no mappings yet from IntAct to other preferred resources like ComplexPortal. Therefore, these complexes are not taken into account in this module. For further information on this matter, please follow the ongoing discussion on Twitter: https://twitter.com/cthoyt/status/1252345260740456453
Next to IntAct and BioGRID, there are also other data resources that make use of the PSI-MI 2.5 format:
Biomolecular Interaction Network Database (BIND) [bind]
Summary statistics of the BEL graph generated in the IntAct module:
| Key | Value |
|---|---|
| Version | v2020-03-31 |
| Nodes | 100115 |
| Edges | 1294252 |
| Citations | 20568 |
| Components | 3119 |
| Density | 1.29E-04 |
Protein Interactions Database (PID)¶
PID Importer.
bio2bel.sources.pid.get_graph_from_cx(network_uuid, cx)
    Get a PID network from NDEx.
    Return type: BELGraph
class bio2bel.sources.pid.Protein(**kwargs)
    Protein from PID.
    A simple constructor that allows initialization from kwargs.
    Sets attributes on the constructed instance using the names and values in kwargs. Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
    entrez_id
        Entrez ID of the protein
    hgnc_id: ClassVar[Column]
        HGNC ID of the protein
    hgnc_symbol: ClassVar[Column]
        HGNC symbol of the protein
class bio2bel.sources.pid.Pathway(**kwargs)
    Pathway from PID.
    A simple constructor that allows initialization from kwargs.
    Sets attributes on the constructed instance using the names and values in kwargs. Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
    identifier: ClassVar[Column]
        The local unique identifier for this pathway
    name: ClassVar[Column]
        The preferred label for this pathway
class bio2bel.sources.pid.Manager(*args, **kwargs)
    Manager for PID.
    Doesn’t let this class get instantiated if the pathway_model.
    namespace_model
        alias of bio2bel.sources.pid.Pathway
    pathway_model
        alias of bio2bel.sources.pid.Pathway
    protein_model
        alias of bio2bel.sources.pid.Protein