Sources

BioGRID

Download and convert BioGRID to BEL.

Run this script with python -m bio2bel.sources.biogrid

The interaction information contained in BioGRID can be categorized into protein interactions, genetic interactions, chemical associations, and post-translational modifications. BioGRID includes information from major model organisms and humans.

The file downloaded from BioGRID is a zip archive containing a single tab-delimited text file in a PSI-MITAB 2.5 compatible format, which holds all interaction and associated annotation data.
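As a rough illustration of reading such an archive, the sketch below uses only the Python standard library. The function name `iter_mitab_rows` and the short column keys are hypothetical labels chosen here, not part of bio2bel; the column order follows the 15 standard PSI-MITAB 2.5 columns, and the assumption that header lines start with `#` should be checked against the actual download. The demo data is made up.

```python
import csv
import io
import zipfile

# Short labels (hypothetical) for the 15 standard PSI-MITAB 2.5 columns, in file order.
MITAB_25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_method", "first_author", "publication_ids",
    "taxid_a", "taxid_b", "interaction_types", "source_database",
    "interaction_ids", "confidence",
]


def iter_mitab_rows(zip_file):
    """Yield one dict per interaction from the single file inside a zip archive."""
    with zipfile.ZipFile(zip_file) as archive:
        (member,) = archive.namelist()  # fails loudly unless there is exactly one file
        with archive.open(member) as binary:
            text = io.TextIOWrapper(binary, encoding="utf-8", newline="")
            # QUOTE_NONE keeps the double quotes in cells like psi-mi:"MI:0407"(...)
            reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            for fields in reader:
                if not fields or fields[0].startswith("#"):
                    continue  # skip blank lines and header lines (assumed to start with '#')
                yield dict(zip(MITAB_25_COLUMNS, fields))


# Made-up demo archive: one header line and one interaction row.
header = "#" + "\t".join(MITAB_25_COLUMNS)
row = ["example:A", "example:B"] + ["-"] * 9 + ['psi-mi:"MI:0407"(direct interaction)', "-", "-", "-"]
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("demo.mitab.txt", header + "\n" + "\t".join(row) + "\n")
buffer.seek(0)

rows = list(iter_mitab_rows(buffer))
print(rows[0]["id_a"], rows[0]["interaction_types"])
```

Reading lazily with a generator avoids loading the whole file into memory, which matters for an export of this size.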

The interaction types in BioGRID are given as terms from the PSI-MI (Proteomics Standards Initiative - Molecular Interactions) controlled vocabulary and were mapped to BEL relations. The following table shows examples of how interaction types in BioGRID were mapped to BEL or other ontologies.

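======================================================================  ==============================  =================  =================
PSI-MI (BioGRID)                                                        Mapped BEL term                 Source             Target
======================================================================  ==============================  =================  =================
psi-mi:"MI:0794"(synthetic genetic interaction defined by inequality)   pybel.BELGraph.add_association  pybel.dsl.Gene     pybel.dsl.Gene
psi-mi:"MI:0915"(physical association)                                  pybel.BELGraph.add_association  pybel.dsl.Protein  pybel.dsl.Protein
psi-mi:"MI:0407"(direct interaction)                                    pybel.BELGraph.add_binds        pybel.dsl.Protein  pybel.dsl.Protein
======================================================================  ==============================  =================  =================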
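The lookup behind such a mapping can be sketched as a small dictionary keyed by PSI-MI identifier. This is a minimal sketch and not the module's actual code: `parse_psi_mi`, `lookup_mapping`, and `BIOGRID_EXAMPLE_MAPPING` are hypothetical names, and the dictionary holds only the three example rows from the table above, with the BEL method and node classes stored as plain strings.

```python
import re

# Matches cells such as: psi-mi:"MI:0407"(direct interaction)
PSI_MI_PATTERN = re.compile(r'psi-mi:"(MI:\d+)"\((.+)\)')

# PSI-MI identifier -> (BELGraph method name, source node class, target node class).
# Only the three example rows of the table are included here.
BIOGRID_EXAMPLE_MAPPING = {
    "MI:0794": ("add_association", "Gene", "Gene"),
    "MI:0915": ("add_association", "Protein", "Protein"),
    "MI:0407": ("add_binds", "Protein", "Protein"),
}


def parse_psi_mi(cell):
    """Split a PSI-MI cell into its identifier and its human-readable name."""
    match = PSI_MI_PATTERN.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"not a PSI-MI term: {cell!r}")
    return match.group(1), match.group(2)


def lookup_mapping(cell):
    """Return the mapping tuple for a PSI-MI cell, or None if it is not covered."""
    identifier, _name = parse_psi_mi(cell)
    return BIOGRID_EXAMPLE_MAPPING.get(identifier)


print(parse_psi_mi('psi-mi:"MI:0407"(direct interaction)'))
print(lookup_mapping('psi-mi:"MI:0915"(physical association)'))
```

Returning `None` for unmapped identifiers leaves it to the caller to decide whether to skip the row or log it.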

Summary statistics of the BEL graph generated in the BioGRID module:

==========  ========
Key         Value
==========  ========
Version     v3.5.183
Nodes       293030
Edges       3127695
Citations   9
Components  1225
Density     3.64E-05
==========  ========
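The reported density is consistent with the usual definition for a directed graph, the number of edges divided by n(n - 1) for n nodes. That this is the formula used is an inference from the numbers, not something stated by the module; the function name below is hypothetical.

```python
def directed_density(nodes, edges):
    """Density of a directed graph: edges divided by the number of ordered node pairs."""
    if nodes < 2:
        return 0.0
    return edges / (nodes * (nodes - 1))


biogrid_density = directed_density(nodes=293030, edges=3127695)
print(f"{biogrid_density:.2E}")  # prints 3.64E-05
```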

bio2bel.sources.biogrid.get_bel()

Get a BEL graph for BioGRID.

Return type: BELGraph

IntAct

Download and convert IntAct to BEL.

Run with python -m bio2bel.sources.intact

IntAct is an interaction database with information about interacting proteins, their relations, and the experiments in which these interactions were found. Among the interactions documented in IntAct are protein modifications, associations, direct interactions, binding interactions, and cleavage reactions. These interactions were grouped according to their biological interpretation and mapped to the corresponding BEL relation. The interaction types in IntAct are more fine-grained than those in BioGRID.

Thanks to the default BEL namespace of protein modifications, pybel.language.pmod_namespace, post-translational protein modifications can be identified precisely. For example, the glycosylation of a protein can be described in BEL by pybel.dsl.ProteinModification('Glyco'). Although many protein modifications had corresponding terms in BEL, some interaction types in IntAct could not be mapped directly, such as gtpase reaction or aminoacylation reaction.

Therefore, other vocabularies like the Gene Ontology (GO) or the Molecular Process Ontology (MOP) were used to find corresponding interaction terms. These terms were then annotated with the name, namespace, and identifier. IntAct uses the PSI-MI (Proteomics Standards Initiative - Molecular Interactions) controlled vocabulary to identify interaction types. The following table shows examples of how the interactions from IntAct were mapped to BEL or other ontologies.

===========  ===========  ====================================================================  ============================================================
Source Type  Target Type  Interaction Type                                                      BEL Example
===========  ===========  ====================================================================  ============================================================
Protein      Protein      psi-mi:"MI:0193"(amidation reaction)                                  p('uniprot', 'P62865') increases p('uniprot', 'P10731')
Protein      Protein      psi-mi:"MI:1327"(sulfurtransfer reaction)                             p('uniprot', 'Q46925') increases p('uniprot', 'P0AGF2')
Protein      Protein      psi-mi:"MI:0945"(oxidoreductase activity electron transfer reaction)  p('uniprot', 'P0A3E0') increases p('uniprot', 'P21890')
Protein      Protein      psi-mi:"MI:0217"(phosphorylation reaction)                            p('uniprot', 'P53999') increases p('uniprot', 'P68400')
Protein      Protein      psi-mi:"MI:0567"(neddylation reaction)                                p('uniprot', 'Q86XK2') increases p('uniprot', 'Q15843')
Protein      Protein      psi-mi:"MI:1148"(ampylation reaction)                                 p('uniprot', 'P60953-2') increases p('uniprot', 'Q9BVA6')
Protein      Protein      psi-mi:"MI:0883"(gtpase reaction)                                     p('chebi', '15996') increases p('uniprot', 'Q9HCN4')
Protein      Protein      psi-mi:"MI:0557"(adp ribosylation reaction)                           p('uniprot', 'P09874') increases p('uniprot', 'P13010')
Protein      Protein      psi-mi:"MI:0211"(lipid addition)                                      p('chebi', '15532') increases p('uniprot', 'Q9BR61')
Protein      Protein      psi-mi:"MI:0192"(acetylation reaction)                                p('uniprot', 'O15350') increases p('uniprot', 'Q09472')
Protein      Protein      psi-mi:"MI:0844"(phosphotransfer reaction)                            p('chebi', '15422') increases p('uniprot', 'O13297')
Protein      Protein      psi-mi:"MI:0220"(ubiquitination reaction)                             p('uniprot', 'P32121') increases p('uniprot', 'Q00987')
Protein      Protein      psi-mi:"MI:0213"(methylation reaction)                                p('uniprot', 'O60016') increases p('uniprot', 'P09988')
Protein      Protein      psi-mi:"MI:0214"(myristoylation reaction)                             p('chebi', '15532') increases p('uniprot', 'Q9BR61')
Protein      Protein      psi-mi:"MI:0216"(palmitoylation reaction)                             p('uniprot', 'P60880') increases p('uniprot', 'Q8IUH5')
Protein      Gene         psi-mi:"MI:0701"(dna strand elongation)                               p('uniprot', 'Q9NYJ8') increases g('uniprot', 'Q62073')
Protein      Protein      psi-mi:"MI:1250"(isomerase reaction)                                  p('uniprot', 'Q13526') increases p('uniprot', 'Q3UVX5')
Protein      Protein      psi-mi:"MI:0559"(glycosylation reaction)                              p('uniprot', 'P18177') increases p('uniprot', 'P63000')
Protein      Protein      psi-mi:"MI:0566"(sumoylation reaction)                                p('uniprot', 'P56693') increases p('uniprot', 'P63165')
Protein      Protein      psi-mi:"MI:0882"(atpase reaction)                                     p('chebi', '15422') increases p('uniprot', 'Q9ZNT0')
Protein      Protein      psi-mi:"MI:1146"(phospholipase reaction)                              p('chebi', '40265') increases p('uniprot', 'P30041')
Protein      Protein      psi-mi:"MI:0556"(transglutamination reaction)                         p('uniprot', 'P40337') increases p('uniprot', 'P21980')
Protein      Protein      psi-mi:"MI:1143"(aminoacylation reaction)                             p('uniprot', 'Q89VT6') increases p('uniprot', 'Q89VT8')
Protein      Protein      psi-mi:"MI:0210"(hydroxylation reaction)                              p('uniprot', 'Q16665') increases p('uniprot', 'Q96KS0')
Protein      Protein      psi-mi:"MI:1355"(lipid cleavage)                                      p('chebi', '64583') decreases p('uniprot', 'F1N588')
Protein      Protein      psi-mi:"MI:0212"(lipoprotein cleavage reaction)                       p('uniprot', 'P10515') decreases p('uniprot', 'Q9Y6E7')
Protein      Protein      psi-mi:"MI:2280"(deamidation reaction)                                p('uniprot', 'Q86YW7') decreases p('uniprot', 'P21163')
Protein      Protein      psi-mi:"MI:0204"(deubiquitination reaction)                           p('uniprot', 'Q93009') decreases p('uniprot', 'P04637')
Protein      Protein      psi-mi:"MI:0569"(deneddylation reaction)                              p('uniprot', 'Q96LD8') decreases p('uniprot', 'P62913')
Protein      Protein      psi-mi:"MI:0985"(deamination reaction)                                p('uniprot', 'Q8VSD5') decreases p('uniprot', 'P61088')
Protein      Protein      psi-mi:"MI:0871"(demethylation reaction)                              p('uniprot', 'P68432') decreases p('uniprot', 'P41229')
Protein      Protein      psi-mi:"MI:0570"(protein cleavage)                                    p('uniprot', 'P04275') decreases p('uniprot', 'Q76LX8')
Protein      Gene         psi-mi:"MI:0572"(dna cleavage)                                        p('uniprot', 'A4GXA9') decreases g('uniprot', 'Q96NY9')
Protein      Protein      psi-mi:"MI:0197"(deacetylation reaction)                              p('uniprot', 'Q71U36') decreases p('uniprot', 'Q9UBN7')
Protein      Protein      psi-mi:"MI:0199"(deformylation reaction)                              p('uniprot', 'Q62962') decreases p('uniprot', 'Q9EP80')
Protein      Protein      psi-mi:"MI:1140"(decarboxylation reaction)                            p('chebi', '16810') decreases p('uniprot', 'P9WJA9')
Protein      Rna          psi-mi:"MI:0902"(rna cleavage)                                        p('uniprot', 'Q99714') decreases r('uniprot', 'O15091')
Protein      Protein      psi-mi:"MI:0194"(cleavage reaction)                                   p('uniprot', 'O14727') decreases p('uniprot', 'P42574')
Protein      Protein      psi-mi:"MI:0203"(dephosphorylation reaction)                          p('uniprot', 'Q78DX7') decreases p('uniprot', 'P29351')
Protein      Protein      psi-mi:"MI:1127"(putative self interaction)                           p('uniprot', 'O64517') association p('uniprot', 'O64517')
Protein      Protein      psi-mi:"MI:0915"(physical association)                                p('uniprot', 'P34708-1') association p('uniprot', 'P34709')
Protein      Protein      psi-mi:"MI:0914"(association)                                         p('uniprot', 'P50570') association p('uniprot', 'Q99961')
Protein      Protein      psi-mi:"MI:1126"(self interaction)                                    p('uniprot', 'P28481') association p('uniprot', 'P28481')
Protein      Protein      psi-mi:"MI:0414"(enzymatic reaction)                                  p('uniprot', 'P15646') association p('uniprot', 'Q02555')
Protein      Protein      psi-mi:"MI:0403"(colocalization)                                      p('uniprot', 'P00519') association p('uniprot', 'Q92558')
Protein      Protein      psi-mi:"MI:0407"(direct interaction)                                  p('uniprot', 'P49418') regulates p('uniprot', 'O43426')
Protein      Protein      psi-mi:"MI:0195"(covalent binding)                                    p('uniprot', 'P0CG48') hasComponent p('uniprot', 'P63146')
Protein      Protein      psi-mi:"MI:0408"(disulfide bond)                                      p('uniprot', 'P73728') hasComponent p('uniprot', 'P73728')
===========  ===========  ====================================================================  ============================================================

For negative protein modifications, in which a group is removed from the protein (for example, decarboxylation reaction), the corresponding positive term (here, protein carboxylation) is used together with a relation describing a decrease of the target.
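The rule for removal reactions can be sketched as a lookup from the removal reaction to its positive counterpart. This is an illustrative sketch only: `describe_modification_edge` and `REMOVAL_TO_POSITIVE` are hypothetical names, the dictionary lists just a few of the reactions from the table, and the pairings other than decarboxylation are assumptions based on the reaction names.

```python
# Removal reaction (IntAct interaction name) -> positive modification term.
# Only decarboxylation -> carboxylation is stated in the text; the rest are assumed.
REMOVAL_TO_POSITIVE = {
    "decarboxylation reaction": "carboxylation",
    "deubiquitination reaction": "ubiquitination",
    "dephosphorylation reaction": "phosphorylation",
    "deacetylation reaction": "acetylation",
    "demethylation reaction": "methylation",
}

SUFFIX = " reaction"


def describe_modification_edge(interaction_name):
    """Return (relation, positive modification term) for an IntAct interaction name."""
    positive = REMOVAL_TO_POSITIVE.get(interaction_name)
    if positive is not None:
        # A group is removed: keep the positive term and flip the relation.
        return "decreases", positive
    # Otherwise treat the name as an addition reaction and drop the trailing " reaction".
    if interaction_name.endswith(SUFFIX):
        interaction_name = interaction_name[: -len(SUFFIX)]
    return "increases", interaction_name


print(describe_modification_edge("decarboxylation reaction"))
print(describe_modification_edge("phosphorylation reaction"))
```

An explicit dictionary is used instead of stripping a "de" prefix, since a prefix rule would also match unrelated names.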

In the case of gtpase reaction and atpase reaction, the notion of the source protein taking on the ability to catalyze GTP or ATP hydrolysis had to be captured. Therefore, pybel.dsl.activity() was added as the subject_modifier of the source protein.

A special case was dna strand elongation. Here, the target was a gene, and to capture the notion of the DNA strand elongation process, the corresponding GO term was added as a pybel.dsl.GeneModification. In the case of DNA or RNA cleavage, the target was set to an entity of type pybel.dsl.Gene or pybel.dsl.Rna, respectively.

For the relation isomerase reaction there was no corresponding term in BEL denoting this process. Therefore, the molecular process isomerization from the MOP was used and annotated.

As IntAct and BioGRID are both interaction databases, the general code from biogrid.py could be taken as a starting point. Because IntAct's interaction types are more fine-grained, many modifications and the special cases mentioned above had to be investigated further and were handled case by case.

Moreover, an interesting type of information in IntAct is the negative interaction data, which states that a target is not activated by the source. A future improvement would be to map this type of relation to negative BEL. In machine learning tasks like link prediction in graphs, these negative edges could be used as negative samples to improve the prediction quality of the model.
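How curated negative edges might feed a link prediction task can be sketched as follows. This is a hypothetical illustration, not part of the module: `label_edges` is an invented name, and the edges are placeholder strings rather than IntAct data.

```python
def label_edges(positive_edges, negative_edges):
    """Return (source, target, label) triples: 1 for observed edges, 0 for curated negatives."""
    positives = set(positive_edges)
    examples = [(source, target, 1) for source, target in sorted(positives)]
    # Drop curated negatives that also occur as positives to avoid conflicting labels.
    negatives = set(negative_edges) - positives
    examples += [(source, target, 0) for source, target in sorted(negatives)]
    return examples


training_examples = label_edges(
    positive_edges=[("A", "B"), ("B", "C")],
    negative_edges=[("A", "C"), ("A", "B")],
)
print(training_examples)
```

Curated negatives of this kind replace the randomly sampled node pairs that are otherwise commonly used as negative examples.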

IntAct also gives internal accession numbers to some complexes, but there are no mappings from IntAct to other preferred resources like ComplexPortal yet. Therefore, these complexes are not taken into account in this module. For further information on this matter, please follow the ongoing discussion on Twitter (https://twitter.com/cthoyt/status/1252345260740456453).

Besides IntAct and BioGRID, other data resources also make use of the PSI-MI 2.5 format:

bind: https://academic.oup.com/database/article/doi/10.1093/database/baq037/461120

hprd: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1347503/

dip: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC102387/

Summary statistics of the BEL graph generated in the IntAct module:

==========  ===========
Key         Value
==========  ===========
Version     v2020-03-31
Nodes       100115
Edges       1294252
Citations   20568
Components  3119
Density     1.29E-04
==========  ===========

bio2bel.sources.intact.get_bel()

Get a BEL graph for IntAct.

Return type: BELGraph

Protein Interactions Database (PID)

PID Importer.

bio2bel.sources.pid.iterate_graphs()

Iterate over PID networks as pairs of network UUID and BEL graph.

Return type: Iterable[Tuple[str, BELGraph]]

bio2bel.sources.pid.get_graph_from_cx(network_uuid, cx)

Get a PID network from NDEx.

Return type: BELGraph

class bio2bel.sources.pid.Protein(**kwargs)

Protein from PID.

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

entrez_id

Entrez ID of the protein

hgnc_id: ClassVar[sqlalchemy.sql.schema.Column]

HGNC id of the protein

hgnc_symbol: ClassVar[sqlalchemy.sql.schema.Column]

HGNC symbol of the protein

to_pybel()

Return a protein.

Return type: Protein

class bio2bel.sources.pid.Pathway(**kwargs)

Pathway from PID.

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

identifier: ClassVar[sqlalchemy.sql.schema.Column]

HGNC gene family id of the protein

name: ClassVar[sqlalchemy.sql.schema.Column]

HGNC gene family name of the protein

class bio2bel.sources.pid.Manager(*args, **kwargs)

Manager for PID.

Doesn’t let this class get instantiated if the pathway_model.

namespace_model

alias of Pathway

pathway_model

alias of Pathway

protein_model

alias of Protein

populate(*args, **kwargs)

Populate the PID database.

Return type: None

TFRegulons

Exporter for TFregulons.

bio2bel.sources.tfregulons.get_df()

Get the TFregulons dataframe.

Return type: DataFrame

bio2bel.sources.tfregulons.get_hgnc_ids(graph)

Get HGNC identifiers for nodes in the graph.

Return type: Set[str]

bio2bel.sources.tfregulons.get_bel()

Get the entirety of TFregulons as BEL.

Return type: BELGraph

bio2bel.sources.tfregulons.enrich_graph(graph)

Enrich a graph with transcription factors affecting the genes/RNAs/proteins in the graph.

Return type: None
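One plausible reading of the enrichment step is a filter over transcription factor to target pairs that keeps those whose target is already in the graph. The sketch below illustrates that idea with plain Python data; it is not the module's implementation, `select_regulator_edges` is a hypothetical name, and the identifiers are placeholders.

```python
def select_regulator_edges(graph_gene_ids, tf_target_pairs):
    """Return the (transcription factor, target) pairs whose target is already in the graph."""
    return [
        (tf, target)
        for tf, target in tf_target_pairs
        if target in graph_gene_ids
    ]


# Placeholder identifiers standing in for HGNC identifiers.
genes_in_graph = {"GENE_X", "GENE_Y"}
pairs = [("TF_A", "GENE_X"), ("TF_B", "GENE_Z"), ("TF_B", "GENE_Y")]

new_edges = select_regulator_edges(genes_in_graph, pairs)
print(new_edges)
```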