Bio2BEL HMDB
Bio2BEL HMDB is a package that lets the user work with a local SQLite version of the Human Metabolome Database (HMDB).
Besides creating the local database, it provides functions that enrich Biological Expression Language (BEL) graphs with the information about metabolites, proteins, and diseases that is present in HMDB.
It can also write HMDB BEL namespaces for these graphs.
Installation
Setup
1. Create a bio2bel_hmdb.Manager object
>>> from bio2bel_hmdb import Manager
>>> manager = Manager()
2. Create the tables in the database
>>> manager.create_all()
3. Populate the database
This step takes some time, since the HMDB XML data needs to be downloaded, parsed, and fed into the database line by line.
>>> manager.populate()
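Because the HMDB dump is a single large XML file, one common way to load it is to parse it as a stream and write rows in batches, so memory use stays flat. The group_size parameter of populate may control something similar, but this page does not document it. The following is a minimal standard-library sketch of that idea; the tiny XML document, the table layout, and the batch_size parameter are simplified assumptions, not Bio2BEL HMDB's actual parser or schema.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for the HMDB XML dump
# (the real file is much larger and uses an XML namespace).
SAMPLE_XML = b"""<hmdb>
  <metabolite><accession>HMDB00001</accession><name>1-Methylhistidine</name></metabolite>
  <metabolite><accession>HMDB00002</accession><name>1,3-Diaminopropane</name></metabolite>
  <metabolite><accession>HMDB00005</accession><name>2-Ketobutyric acid</name></metabolite>
</hmdb>"""


def populate(conn, source, batch_size=2):
    """Stream <metabolite> elements from source and insert them in batches.

    batch_size is an illustrative assumption, not Bio2BEL HMDB's group_size.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metabolite (accession TEXT PRIMARY KEY, name TEXT)"
    )
    batch = []
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "metabolite":
            continue
        batch.append((elem.findtext("accession"), elem.findtext("name")))
        elem.clear()  # drop the finished element's children to save memory
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO metabolite VALUES (?, ?)", batch)
            conn.commit()
            batch.clear()
    if batch:  # flush whatever is left after the last full batch
        conn.executemany("INSERT INTO metabolite VALUES (?, ?)", batch)
        conn.commit()


conn = sqlite3.connect(":memory:")
populate(conn, io.BytesIO(SAMPLE_XML))
print(conn.execute("SELECT COUNT(*) FROM metabolite").fetchone()[0])  # 3
```

Committing once per batch rather than once per row is what keeps a load of this size tolerable; the batch size trades memory for the number of transactions.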
Enrichment
Enrich BEL graphs
In the current build, BEL graphs containing metabolites can be enriched with associated disease or protein information, and BEL graphs containing diseases or proteins can be enriched with associated metabolites. The functions for this are explained further in BEL Serialization.
2. Enriching BEL graphs
A BEL graph containing metabolites (represented using the HMDB namespace) can be enriched with disease and protein information from HMDB.
2.1 Metabolites-Proteins
For a graph containing metabolites:
>>> enrich_metabolites_proteins(bel_graph, manager)
The result of this will be a BEL graph which now includes relations between the metabolites and proteins.
For a graph containing proteins (named using UniProt identifiers):
>>> enrich_proteins_metabolites(bel_graph, manager)
This will result in a BEL graph where the proteins are linked to associated metabolites.
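Conceptually, both directions reduce to looking up each matching node in an index of metabolite-protein pairs and adding an edge for every hit. The sketch below illustrates this with plain Python sets instead of a PyBEL BELGraph and the Manager. The made-up identifiers, the (namespace, name) node tuples, the "UNIPROT" namespace keyword, and the "association" relation label are all assumptions for illustration, not what the real enrichment functions emit.

```python
from collections import defaultdict

# Made-up (metabolite, protein) pairs standing in for the MetaboliteProtein table.
ASSOCIATIONS = [("M1", "P1"), ("M1", "P2"), ("M2", "P2")]

# Index both directions once, so each node lookup is a dictionary access.
PROTEINS_BY_METABOLITE = defaultdict(set)
METABOLITES_BY_PROTEIN = defaultdict(set)
for metabolite, protein in ASSOCIATIONS:
    PROTEINS_BY_METABOLITE[metabolite].add(protein)
    METABOLITES_BY_PROTEIN[protein].add(metabolite)


def enrich(nodes, edges, source_ns, target_ns, index):
    """Link every node in source_ns to its associated partners in target_ns.

    nodes is a set of (namespace, name) tuples; edges is a set of
    (source, relation, target) tuples. Both are modified in place.
    """
    for namespace, name in list(nodes):  # copy, because nodes grows below
        if namespace != source_ns:
            continue  # only nodes from the expected namespace are enriched
        for partner in index.get(name, ()):
            target = (target_ns, partner)
            nodes.add(target)
            edges.add(((namespace, name), "association", target))


# Metabolites -> proteins (compare enrich_metabolites_proteins).
nodes = {("HMDB", "M1"), ("HGNC", "TP53")}
edges = set()
enrich(nodes, edges, "HMDB", "UNIPROT", PROTEINS_BY_METABOLITE)

# Proteins -> metabolites (compare enrich_proteins_metabolites).
protein_nodes = {("UNIPROT", "P2")}
protein_edges = set()
enrich(protein_nodes, protein_edges, "UNIPROT", "HMDB", METABOLITES_BY_PROTEIN)
```

Building both indexes up front means the reverse direction costs nothing extra; the HGNC node is left alone because it is not in the namespace being enriched.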
2.2 Metabolites-Diseases
For a graph containing metabolites:
>>> enrich_metabolites_diseases(bel_graph, manager)
The result of this will be a BEL graph which now includes relations between the metabolites and diseases.
For a graph containing diseases (named using HMDB identifiers):
>>> enrich_diseases_metabolites(bel_graph, manager)
This will result in a BEL graph where the diseases are linked to associated metabolites.
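HMDB stores metabolite-disease links together with their supporting literature, as the MetaboliteDiseaseReference table in the data model suggests, so an enrichment step can carry the citation along with each edge. The sketch below, again on plain Python sets rather than a BELGraph, links pathology nodes to metabolites with one edge per reference. The rows, the "HMDB_D" disease namespace keyword, and the edge layout are hypothetical. It also shows why a pathology node from another namespace (here "DOID") is left untouched even when its name matches, which is the limitation listed in the roadmap.

```python
from collections import defaultdict

# Made-up rows standing in for MetaboliteDiseaseReference:
# (metabolite accession, disease name, PubMed ID of the supporting reference).
ROWS = [
    ("M1", "disease a", "111"),
    ("M1", "disease a", "222"),
    ("M2", "disease b", "333"),
]

# Index: disease name -> list of (metabolite, PubMed ID) pairs.
METABOLITES_BY_DISEASE = defaultdict(list)
for metabolite, disease, pmid in ROWS:
    METABOLITES_BY_DISEASE[disease].append((metabolite, pmid))


def enrich_diseases(nodes, edges, disease_ns="HMDB_D", metabolite_ns="HMDB"):
    """Link pathology nodes in disease_ns to metabolites, one edge per reference.

    Edges are (metabolite, relation, disease, pubmed_id) tuples. Pathology nodes
    from any other namespace are skipped, even if their name matches.
    """
    for namespace, name in list(nodes):
        if namespace != disease_ns:
            continue
        for metabolite, pmid in METABOLITES_BY_DISEASE.get(name, []):
            source = (metabolite_ns, metabolite)
            nodes.add(source)
            edges.add((source, "association", (namespace, name), pmid))


nodes = {("HMDB_D", "disease a"), ("DOID", "disease b")}
edges = set()
enrich_diseases(nodes, edges)
```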
bio2bel_hmdb.enrich.enrich_diseases_metabolites(graph: pybel.struct.graph.BELGraph, manager: Optional[bio2bel_hmdb.manager.Manager] = None)
Enrich a BEL graph that contains HMDB diseases with the HMDB metabolites associated with those diseases.
bio2bel_hmdb.enrich.enrich_metabolites_diseases(graph: pybel.struct.graph.BELGraph, manager: Optional[bio2bel_hmdb.manager.Manager] = None)
Enrich a BEL graph that contains metabolites with the diseases associated with those metabolites.
Manager
The Manager is a key component of Bio2BEL HMDB. This class is used to create, populate, and query the local HMDB version.
class bio2bel_hmdb.manager.Manager(*args, **kwargs)
Manager for metabolite-protein and metabolite-disease associations.
get_hmdb_accession()
Create a list of all HMDB metabolite identifiers present in the database.
Return type: list
get_hmdb_diseases()
Create a list of all disease names present in the database.
Return type: list
get_metabolite_by_accession(hmdb_metabolite_accession: str) → Optional[bio2bel_hmdb.models.Metabolite]
Query the constructed HMDB database and extract a metabolite object.
Parameters: hmdb_metabolite_accession – HMDB metabolite identifier
Example:
>>> import bio2bel_hmdb
>>> manager = bio2bel_hmdb.Manager()
>>> manager.get_metabolite_by_accession("HMDB00072")
get_reference_by_pubmed_id(pubmed_id: str) → Optional[bio2bel_hmdb.models.Reference]
Get a reference by its PubMed identifier if it exists.
Parameters: pubmed_id – The PubMed identifier to search
populate(source: Optional[str] = None, map_dis: bool = True, group_size: int = 500000)
Populate the database with the HMDB data.
Parameters:
- source – Path to an .xml file. If None, the whole HMDB is downloaded and used for population.
- map_dis – Whether diseases should be mapped.
query_disease_associated_metabolites(disease_name: str) → List[bio2bel_hmdb.models.Metabolite]
Return a list of the metabolite-disease interactions associated with a disease.
Parameters: disease_name – HMDB disease name
query_metabolite_associated_diseases(hmdb_metabolite_id: str) → List[bio2bel_hmdb.models.Disease]
Query the constructed HMDB database to get the metabolite-associated disease relations for BEL enrichment.
Parameters: hmdb_metabolite_id – HMDB metabolite identifier
query_metabolite_associated_proteins(hmdb_metabolite_id: str) → Optional[List[bio2bel_hmdb.models.Protein]]
Query the constructed HMDB database to get the metabolite-associated protein relations for BEL enrichment.
Parameters: hmdb_metabolite_id – HMDB metabolite identifier
query_protein_associated_metabolites(uniprot_id)
Query the constructed HMDB database to get the metabolite relations associated with a protein.
Parameters: uniprot_id (str) – UniProt identifier of the protein whose associated metabolite relations should be returned
Return type: list
Models
The data model for the local HMDB version consists of 22 different tables that represent the relations found in the original HMDB data.
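Most of these tables are either entity tables (such as Metabolite or Protein) or association tables that link two entities many-to-many (such as MetaboliteProtein). The following sqlite3 sketch shows that pattern and a join that answers the same question as Manager.query_metabolite_associated_proteins. The table and column names are simplified assumptions and not the package's actual ORM models.

```python
import sqlite3

# Simplified, hypothetical schema: two entity tables plus one association table,
# mirroring the Metabolite / Protein / MetaboliteProtein split.
SCHEMA = """
CREATE TABLE metabolite (id INTEGER PRIMARY KEY, accession TEXT UNIQUE, name TEXT);
CREATE TABLE protein (id INTEGER PRIMARY KEY, uniprot_id TEXT UNIQUE);
CREATE TABLE metabolite_protein (
    metabolite_id INTEGER REFERENCES metabolite(id),
    protein_id INTEGER REFERENCES protein(id),
    PRIMARY KEY (metabolite_id, protein_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up rows: metabolite M1 is linked to P1 and P2, metabolite M2 only to P2.
conn.executemany("INSERT INTO metabolite VALUES (?, ?, ?)",
                 [(1, "M1", "metabolite one"), (2, "M2", "metabolite two")])
conn.executemany("INSERT INTO protein VALUES (?, ?)", [(1, "P1"), (2, "P2")])
conn.executemany("INSERT INTO metabolite_protein VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2)])


def proteins_for_metabolite(conn, accession):
    """Return UniProt IDs linked to a metabolite through the association table."""
    rows = conn.execute(
        """
        SELECT p.uniprot_id
        FROM protein AS p
        JOIN metabolite_protein AS mp ON mp.protein_id = p.id
        JOIN metabolite AS m ON m.id = mp.metabolite_id
        WHERE m.accession = ?
        ORDER BY p.uniprot_id
        """,
        (accession,),
    )
    return [uniprot_id for (uniprot_id,) in rows]


print(proteins_for_metabolite(conn, "M1"))  # ['P1', 'P2']
```

Keeping the links in a separate table lets one metabolite map to many proteins and vice versa without duplicating either entity's row.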
class bio2bel_hmdb.models.Biofluid(**kwargs)
Table storing the different biofluids.
- biofluid – Name of the biofluid
class bio2bel_hmdb.models.Biofunction(**kwargs)
Table storing the ‘biofunctions’ annotations.
class bio2bel_hmdb.models.CellularLocation(**kwargs)
Table storing the cellular location GO annotations.
class bio2bel_hmdb.models.Disease(**kwargs)
Table storing the diseases and their identifiers.
- dion – Disease Ontology name for this disease, found using string matching
- hpo – Human Phenotype Ontology name for this disease, found using string matching
- mesh_diseases – MeSH disease name for this disease, found using string matching
- name – Name of the disease
- omim_id – OMIM identifier associated with the disease
class bio2bel_hmdb.models.Metabolite(**kwargs)
Table storing the metabolites and all the information HMDB provides about them.
- accession – Accession ID of the metabolite
- average_molecular_weight – Average molecular weight of the metabolite
- bigg_id – BiGG ID of the metabolite
- biocyc_id – BioCyc ID of the metabolite
- cas_registry_number – CAS registry number of the metabolite
- chebi_id – ChEBI identifier of the metabolite
- chemical_formula – Chemical formula of the metabolite
- chemspider_id – ChemSpider ID of the metabolite
- creation_date – Date when the metabolite was added to HMDB
- description – Description including some information about the metabolite
- drugbank_id – DrugBank identifier of the metabolite
- drugbank_metabolite_id – DrugBank metabolite ID of the metabolite
- foodb_id – FooDB ID of the metabolite
- het_id – Het ID of the metabolite
- inchi – InChI of the metabolite
- inchikey – InChIKey of the metabolite
- iupac_name – IUPAC name of the metabolite
- kegg_id – KEGG ID of the metabolite
- knapsack_id – KNApSAcK ID of the metabolite
- metagene – Metagene ID of the metabolite
- metlin_id – METLIN ID of the metabolite
- monisotopic_molecular_weight – Monoisotopic molecular weight of the metabolite
- name – Name of the metabolite
- nugowiki – NuGOwiki ID of the metabolite
- phenol_explorer_compound_id – Phenol-Explorer compound ID of the metabolite
- phenol_explorer_metabolite_id – Phenol-Explorer metabolite ID of the metabolite
- pubchem_compound_id – PubChem compound ID of the metabolite
- serialize_to_bel() → pybel.dsl.node_classes.Abundance – Serialize the metabolite object to a PyBEL node data dictionary.
- smiles – SMILES representation of the metabolite
- state – Aggregate state of the metabolite
- synthesis_reference – Synthesis reference citation of the metabolite
- trivial – Trivial name of the metabolite
- update_date – Date when the entry was last updated
- version – Current version listing the metabolite
- wikipedia – Wikipedia name of the metabolite
class bio2bel_hmdb.models.MetaboliteBiofluid(**kwargs)
Table representing the metabolite-biofluid relations.
class bio2bel_hmdb.models.MetaboliteBiofunction(**kwargs)
Table storing the many-to-many relations between metabolites and biofunction annotations.
class bio2bel_hmdb.models.MetaboliteCellularLocation(**kwargs)
Table storing the many-to-many relations between metabolites and cellular location GO annotations.
class bio2bel_hmdb.models.MetaboliteDiseaseReference(**kwargs)
Table storing the relations between diseases and metabolites.
class bio2bel_hmdb.models.MetabolitePathway(**kwargs)
Table storing the different relations between pathways and metabolites.
class bio2bel_hmdb.models.MetaboliteProtein(**kwargs)
Table representing the many-to-many relationship between metabolites and proteins.
class bio2bel_hmdb.models.MetaboliteReference(**kwargs)
Table representing the many-to-many relationship between metabolites and references.
class bio2bel_hmdb.models.MetaboliteSynonym(**kwargs)
Table storing the synonyms of metabolites.
- synonym – Synonym for the metabolite
class bio2bel_hmdb.models.MetaboliteTissue(**kwargs)
Table storing the different relations between tissues and metabolites.
class bio2bel_hmdb.models.Pathway(**kwargs)
Table storing the different pathways.
- kegg_map_id – KEGG Map identifier of the pathway
- name – Name of the pathway
- smpdb_id – SMPDB identifier of the pathway
class bio2bel_hmdb.models.PropertyKinds(**kwargs)
Table storing the kinds of chemical properties, e.g. logP. Not used for BEL enrichment.
- kind – The kind of chemical property, e.g. logP or melting point
class bio2bel_hmdb.models.PropertySource(**kwargs)
Table storing the sources of properties, e.g. software such as ‘ALOGPS’. Not used for BEL enrichment.
class bio2bel_hmdb.models.PropertyValues(**kwargs)
Table storing the values of chemical properties. Not used for BEL enrichment.
- value – Value of a chemical property (e.g. logP) that is linked to the properties and metabolites
class bio2bel_hmdb.models.Protein(**kwargs)
Table storing the protein information.
- gene_name – Name of the protein-coding gene
- protein_accession – HMDB accession number of the protein
- protein_type – Protein type, such as ‘enzyme’
- serialize_to_bel() → pybel.dsl.node_classes.Protein – Serialize the protein object to a PyBEL node data dictionary.
- uniprot_id – UniProt identifier of the protein
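Metabolite.serialize_to_bel and Protein.serialize_to_bel return PyBEL node objects, which correspond to BEL abundance terms a(...) and protein terms p(...). As a rough illustration of what such a node stands for, the sketch below renders the equivalent BEL term strings with a simplified quoting rule (quote any value that is not a plain token of letters, digits, and underscores). It does not use PyBEL, the helper names are hypothetical, and the "UNIPROT" namespace keyword is an assumption.

```python
import re

# Simplified rule for this sketch: values made only of letters, digits, and
# underscores are written bare; anything else is double-quoted and escaped.
_PLAIN_VALUE = re.compile(r"[A-Za-z0-9_]+")


def quote_value(value):
    """Return value as it would appear after NAMESPACE: in a BEL term."""
    if _PLAIN_VALUE.fullmatch(value):
        return value
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def bel_abundance(namespace, name):
    """BEL abundance term, the string form of a metabolite node."""
    return f"a({namespace}:{quote_value(name)})"


def bel_protein(namespace, name):
    """BEL protein term, the string form of a protein node."""
    return f"p({namespace}:{quote_value(name)})"


print(bel_abundance("HMDB", "HMDB00072"))          # a(HMDB:HMDB00072)
print(bel_abundance("HMDB", "1-Methylhistidine"))  # a(HMDB:"1-Methylhistidine")
print(bel_protein("UNIPROT", "P12345"))            # p(UNIPROT:P12345)
```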
class bio2bel_hmdb.models.Reference(**kwargs)
Table storing literature references.
- pubmed_id – PubMed identifier of the article
- reference_text – Citation of the reference article
Creating BEL Namespaces
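A BEL namespace file lists every allowed value of a namespace together with its encoding (for example A for abundance). The sketch below writes a minimal, simplified .belns document for a list of HMDB accessions; in practice the list could come from Manager.get_hmdb_accession(). Only a small subset of the sections and keys of the BEL namespace format is shown, and the function name write_belns is hypothetical, not part of Bio2BEL HMDB.

```python
import io


def write_belns(names, keyword, name_string, encoding, out):
    """Write a minimal, simplified BEL namespace (.belns) document to out.

    Only a subset of the format is emitted; a real namespace file usually
    carries further sections (for example author and citation metadata).
    """
    out.write("[Namespace]\n")
    out.write(f"Keyword={keyword}\n")
    out.write(f"NameString={name_string}\n\n")
    out.write("[Processing]\n")
    out.write("DelimiterString=|\n\n")
    out.write("[Values]\n")
    for name in sorted(set(names)):  # deduplicate and keep a stable order
        out.write(f"{name}|{encoding}\n")


# In practice the names could come from manager.get_hmdb_accession().
buffer = io.StringIO()
write_belns(["HMDB00072", "HMDB00001", "HMDB00072"], "HMDB",
            "Human Metabolome Database", "A", buffer)
print(buffer.getvalue())
```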
Current Status
What is still missing?
Not all of the information in HMDB is integrated yet.
Bio2BEL HMDB does not yet include:
- Taxonomy information
- Spectra information
- Experimental properties (the data model is implemented, but the tables are not populated)
- Predicted properties (the data model is implemented, but the tables are not populated)
- Normal concentrations
- Abnormal concentrations
Bio2BEL HMDB still lacks functions to:
- convert metabolite namespaces from and to HMDB identifiers
- query the database more broadly (only querying diseases and proteins by metabolite identifier, and vice versa, is supported right now)
Roadmap
The next steps in the development of Bio2BEL HMDB are:
- add namespace mappings from metabolite HMDB identifiers to different databases/namespaces
- add query functions for several tables and entries
- make the BEL enrichment functions work automatically even when pathology nodes are not in the HMDB disease namespace
- include the missing HMDB tables and relations listed above
- possibly parallelize the database population to improve run time