How to Make a Bio2BEL Package

In this tutorial, we’re going to explain how to make your own Bio2BEL package using miRTarBase as an example. This package already exists and is an excellent example.

Naming the Package

The package should be named bio2bel_XXX, with the source name written in all lowercase letters even if the source uses stylized capitalization. This means our example package will be called bio2bel_mirtarbase.

Note that the repository can be named differently from the package. On the Bio2BEL organization on GitHub, we have chosen to use simply a lowercase name of the source to eliminate redundancy in the URL.
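The naming rule can be written down in a couple of lines. The helper below is a hypothetical illustration, not a function provided by Bio2BEL; it only shows that the stylized capitalization of the source name is dropped.

```python
def get_package_name(source_name: str) -> str:
    """Return the Bio2BEL package name for a data source (hypothetical helper)."""
    # Lowercase the source name, even if the source stylizes its capitalization
    return 'bio2bel_' + source_name.lower()


print(get_package_name('miRTarBase'))  # bio2bel_mirtarbase
```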

Organizing Constants

The package should have a top-level module named constants.py as an easily accessible location for constants. A variable called MODULE_NAME should be defined with the lowercase name of the source. Additionally, the functions bio2bel.get_data_dir() and bio2bel.get_connection() should be used to locate the appropriate data directory and to load the connection configuration.

# /src/bio2bel_mirtarbase/constants.py

from bio2bel import get_data_dir, get_connection

MODULE_NAME = 'mirtarbase'
DATA_DIRECTORY_PATH = get_data_dir(MODULE_NAME)
DEFAULT_CONNECTION = get_connection(MODULE_NAME)

Making a Manager

There should be a concrete implementation of bio2bel.AbstractManager. For consistent style, we recommend implementing this in a top-level module called manager.py and naming the class Manager. Check the miRTarBase repository for an example of the package structure and an example of the implementation of a Manager.

class bio2bel.AbstractManager(*args, **kwargs)[source]

This is a base class for implementing your own Bio2BEL manager.

It already includes functions to handle configuration, construction of a connection to a database using SQLAlchemy, creation of the tables defined by your own sqlalchemy.ext.declarative.declarative_base(), and has hooks to override that populate and make simple queries to the database. Since AbstractManager inherits from abc.ABC and is therefore an abstract class, there are a few class variables, functions, and properties that need to be overridden.
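To illustrate what inheriting from abc.ABC implies, here is a minimal sketch using a simplified stand-in class. ToyAbstractManager and both subclasses are hypothetical and far smaller than the real bio2bel.AbstractManager; the point is only that a subclass cannot be instantiated until every abstract method is overridden.

```python
from abc import ABC, abstractmethod
from typing import Mapping


class ToyAbstractManager(ABC):
    """A simplified stand-in for bio2bel.AbstractManager (not the real class)."""

    @abstractmethod
    def populate(self) -> None:
        """Populate the database."""

    @abstractmethod
    def is_populated(self) -> bool:
        """Check if the database is already populated."""

    @abstractmethod
    def summarize(self) -> Mapping[str, int]:
        """Summarize the database."""


class IncompleteManager(ToyAbstractManager):
    """Overrides only one of the three abstract methods."""

    def populate(self) -> None:
        pass


class CompleteManager(ToyAbstractManager):
    """Overrides every abstract method, so it can be instantiated."""

    def __init__(self) -> None:
        self.rows = []

    def populate(self) -> None:
        self.rows.extend(['a', 'b'])

    def is_populated(self) -> bool:
        return 0 < len(self.rows)

    def summarize(self) -> Mapping[str, int]:
        return {'rows': len(self.rows)}


try:
    IncompleteManager()
except TypeError as e:
    # Python refuses to instantiate a class with remaining abstract methods
    print('Cannot instantiate:', e)

manager = CompleteManager()
manager.populate()
print(manager.is_populated(), manager.summarize())
```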

Overriding the Module Name

First, the class-level variable module_name must be set to a string corresponding to the name of the data source.

from bio2bel import AbstractManager

class Manager(AbstractManager):
    module_name = 'mirtarbase'  # note: use lower case module names

In general, this should correspond to the same value as MODULE_NAME set in constants.py, so it can also be set by assigning that value:

from bio2bel import AbstractManager
from .constants import MODULE_NAME

class Manager(AbstractManager):
    module_name = MODULE_NAME

Setting the Declarative Base

Building on the previous example, the (private) abstract property bio2bel.AbstractManager._base must be overridden to return the value from your sqlalchemy.ext.declarative.declarative_base(). We chose to make this an instance-level property instead of a class-level variable so each manager could have its own information about connections to the database.

As a minimal example:

from sqlalchemy.ext.declarative import DeclarativeMeta, declarative_base

from bio2bel import AbstractManager

Base: DeclarativeMeta = declarative_base()

class Manager(AbstractManager):
    module_name = 'mirtarbase'  # note: use lower case module names

    @property
    def _base(self) -> DeclarativeMeta:
        return Base

In general, the models should be defined in a module called models.py so the Base can also be imported.

from sqlalchemy.ext.declarative import DeclarativeMeta

from bio2bel import AbstractManager

from .constants import MODULE_NAME
from .models import Base

class Manager(AbstractManager):
    module_name = MODULE_NAME

    @property
    def _base(self) -> DeclarativeMeta:
        return Base

Populating the Database

How to populate the database using your SQLAlchemy models depends heavily on the data source, so it is hard to give a good general example without looking at real code. See the previously mentioned implementation of a Manager.

from sqlalchemy.ext.declarative import DeclarativeMeta

from bio2bel import AbstractManager

from .constants import MODULE_NAME
from .models import Base

class Manager(AbstractManager):
    module_name = MODULE_NAME

    @property
    def _base(self) -> DeclarativeMeta:
        return Base

    def populate(self) -> None:
        ...

Checking the Database is Populated

A method for checking if the database has been populated already must be implemented as well. The easiest way to implement this is to check that there’s a non-zero count of whatever the most important model in the database is.

from sqlalchemy.ext.declarative import DeclarativeMeta

from bio2bel import AbstractManager

from .constants import MODULE_NAME
from .models import Base

class Manager(AbstractManager):
    module_name = MODULE_NAME

    @property
    def _base(self) -> DeclarativeMeta:
        return Base

    def populate(self) -> None:
        ...

    def is_populated(self) -> bool:
        return 0 < self.session.query(MyImportantModel).count()

There are several mixins that can be optionally inherited:

  1. bio2bel.manager.flask_manager.FlaskMixin: the Flask Mixin creates a Flask-Admin web application.

  2. bio2bel.manager.namespace_manager.BELNamespaceManagerMixin: the BEL Namespace Manager Mixin exports a BEL namespace and interacts with PyBEL.

  3. bio2bel.manager.bel_manager.BELManagerMixin: the BEL Manager Mixin exports a BEL script and interacts with PyBEL.

Build an abstract manager from either a connection or an engine/session.

The remaining keyword arguments are passed to build_engine_session().

Parameters
  • connection (Optional[str]) –

  • engine

  • session

abstract is_populated()[source]

Check if the database is already populated.

Return type

bool

abstract populate(*args, **kwargs)[source]

Populate the database.

Return type

None

abstract summarize()[source]

Summarize the database.

Return type

Mapping[str, int]
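As a rough sketch of how is_populated() and summarize() relate to each other, the following example uses the standard library's sqlite3 module in place of SQLAlchemy so that it runs without extra dependencies. ToyManager, its table names, and the inserted rows are hypothetical; a real manager would query its SQLAlchemy models through self.session instead.

```python
import sqlite3
from typing import Mapping


class ToyManager:
    """A stand-in manager backed by sqlite3 instead of SQLAlchemy (illustration only)."""

    tables = ('mirna', 'target')  # hypothetical table names

    def __init__(self) -> None:
        self.conn = sqlite3.connect(':memory:')
        for table in self.tables:
            self.conn.execute(f'CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT)')

    def populate(self) -> None:
        # Insert a handful of made-up rows
        self.conn.executemany('INSERT INTO mirna (name) VALUES (?)', [('mir-1',), ('mir-2',)])
        self.conn.execute('INSERT INTO target (name) VALUES (?)', ('gene-1',))
        self.conn.commit()

    def _count(self, table: str) -> int:
        return self.conn.execute(f'SELECT COUNT(*) FROM {table}').fetchone()[0]

    def is_populated(self) -> bool:
        # Check the most important table for a non-zero count
        return 0 < self._count('mirna')

    def summarize(self) -> Mapping[str, int]:
        # Map each table name to its number of rows
        return {table: self._count(table) for table in self.tables}


manager = ToyManager()
print(manager.is_populated())  # False
manager.populate()
print(manager.is_populated())  # True
print(manager.summarize())  # {'mirna': 2, 'target': 1}
```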

create_all(check_first=True)[source]

Create the empty database (tables).

Parameters

check_first (bool) – Defaults to True, don’t issue CREATEs for tables already present in the target database. Defers to sqlalchemy.sql.schema.MetaData.create_all()

drop_all(check_first=True)[source]

Drop all tables from the database.

Parameters

check_first (bool) – Defaults to True, only issue DROPs for tables confirmed to be present in the target database. Defers to sqlalchemy.sql.schema.MetaData.drop_all()

classmethod get_cli()[source]

Get the click main function to use as a command line interface.

Return type

Group

property connection

Return this manager’s connection string.

Return type

str

Mixins

Flask

class bio2bel.manager.flask_manager.FlaskMixin(*args, **kwargs)[source]

A mixin for building a Flask-Admin interface.

This class can be used as a mixin, meaning that a class inheriting from AbstractManager can also multiple-inherit from this class. It contains functions to build a flask application for easy viewing of the contents of the database.
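The general idea of a mixin contributing functionality through multiple inheritance can be sketched with toy classes. Everything below is hypothetical and is not how Bio2BEL is actually implemented: the classes are stand-ins, and get_commands() is an invented method that returns command names, where each class extends the result of the next class in the method resolution order.

```python
from typing import List


class ToyBase:
    """Common base so that every class can safely call super()."""

    @classmethod
    def get_commands(cls) -> List[str]:
        return []


class ToyAbstractManager(ToyBase):
    """A simplified stand-in for bio2bel.AbstractManager (not the real class)."""

    @classmethod
    def get_commands(cls) -> List[str]:
        # Extend whatever the next class in the MRO provides
        return super().get_commands() + ['populate', 'drop']


class ToyFlaskMixin(ToyBase):
    """A stand-in mixin that contributes a web command."""

    @classmethod
    def get_commands(cls) -> List[str]:
        return super().get_commands() + ['web']


class PlainManager(ToyAbstractManager):
    pass


class WebManager(ToyAbstractManager, ToyFlaskMixin):
    pass


print(PlainManager.get_commands())  # ['populate', 'drop']
print(WebManager.get_commands())  # ['web', 'populate', 'drop']
```

Because each class delegates to super(), adding the mixin to the list of base classes is enough to pick up its command; the manager itself needs no extra code.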

First, you’ll have to make sure that flask and flask-admin are installed. They can be installed with Bio2BEL using the package extra called “web” like:

$ pip install bio2bel[web]

Or, installed directly with pip:

$ pip install flask flask-admin

Besides this, all that’s necessary to use this mixin is to define the class variable flask_admin_models as a list of SQLAlchemy models you’d like to see.

>>> from sqlalchemy.ext.declarative import DeclarativeMeta
>>>
>>> from bio2bel import AbstractManager
>>> from bio2bel.manager.flask_manager import FlaskMixin
>>>
>>> from .constants import MODULE_NAME
>>> from .models import Base, Evidence, Interaction, Mirna, Species, Target
>>>
>>> class Manager(AbstractManager, FlaskMixin):
...    module_name = MODULE_NAME
...    flask_admin_models = [Evidence, Interaction, Mirna, Species, Target]
...
...    @property
...    def _base(self) -> DeclarativeMeta:
...        return Base
...
...    def populate(self) -> None:
...        ...

Build an abstract manager from either a connection or an engine/session.

The remaining keyword arguments are passed to build_engine_session().

Parameters
  • connection (Optional[str]) –

  • engine

  • session

flask_admin_models = Ellipsis

Represents a list of SQLAlchemy classes to make a Flask-Admin interface.

get_flask_admin_app(url=None, secret_key=None)[source]

Create a Flask application.

Parameters

url (Optional[str]) – Optional mount point of the admin application. Defaults to '/'.

Return type

flask.Flask

classmethod get_cli()[source]

Add a click main function to use as a command line interface.

Return type

Group

BEL Namespace

class bio2bel.manager.namespace_manager.BELNamespaceManagerMixin(*args, **kwargs)[source]

A mixin for generating a BEL namespace file and uploading it to the PyBEL database.

First, you’ll have to make sure that pybel is installed. This can be done with pip like:

$ pip install pybel

To use this mixin, you need to properly implement the AbstractManager, and add additional class variables and functions.

namespace_model: The SQLAlchemy class that represents the entity to serialize into the namespace

>>> from bio2bel import AbstractManager
>>> from bio2bel.manager.namespace_manager import BELNamespaceManagerMixin
>>> from .models import HumanGene
>>>
>>> class MyManager(AbstractManager, BELNamespaceManagerMixin):
...     module_name = 'hgnc'
...     ...
...     namespace_model = HumanGene

Several fields from Identifiers.org should be populated, including:

  1. identifiers_recommended

  2. identifiers_pattern

  3. identifiers_miriam

  4. identifiers_namespace

  5. identifiers_url

>>> from bio2bel import AbstractManager
>>> from bio2bel.manager.namespace_manager import BELNamespaceManagerMixin
>>> from .models import HumanGene
>>>
>>> class MyManager(AbstractManager, BELNamespaceManagerMixin):
...     module_name = 'hgnc'
...     ...
...     namespace_model = HumanGene
...     identifiers_recommended = 'HGNC'
...     identifiers_pattern = '...'
...     identifiers_miriam = 'MIR:00000080'
...     identifiers_namespace = 'hgnc'
...     identifiers_url = 'http://identifiers.org/hgnc/'

Two methods need to be implemented. First, the static method _get_identifier should take in the namespace model and give back the database identifier. For us, this is easy, since the HumanGene class has an attribute called hgnc_id.

Perhaps in the future, we will enforce the convention that the namespace model should have a field called <module name>_id, but having this method gives lots of flexibility.

This is also a good place to add more specific type annotations (not yet tested with MyPy).

>>> from bio2bel import AbstractManager
>>> from bio2bel.manager.namespace_manager import BELNamespaceManagerMixin
>>> from .models import HumanGene
>>>
>>> class MyManager(AbstractManager, BELNamespaceManagerMixin):
...     module_name = 'hgnc'
...     ...
...     namespace_model = HumanGene
...     identifiers_recommended = 'HGNC'
...     identifiers_pattern = '...'
...     identifiers_miriam = 'MIR:00000080'
...     identifiers_namespace = 'hgnc'
...     identifiers_url = 'http://identifiers.org/hgnc/'
...
...     @staticmethod
...     def _get_identifier(model: HumanGene) -> str:
...         return model.hgnc_id

Last, we must implement the method _create_namespace_entry_from_model, which encodes the logic of building a pybel.manager.models.NamespaceEntry from the Bio2BEL repository’s namespace model.

For a repository like ChEBI, this is very simple, but for HGNC there is reason to add additional logic to get the proper encodings.

>>> from bio2bel import AbstractManager
>>> from bio2bel.manager.namespace_manager import BELNamespaceManagerMixin
>>> from pybel.manager.models import Namespace, NamespaceEntry
>>> from .models import HumanGene
>>>
>>> class MyManager(AbstractManager, BELNamespaceManagerMixin):
...     module_name = 'hgnc'
...     ...
...     namespace_model = HumanGene
...     identifiers_recommended = 'HGNC'
...     identifiers_pattern = '...'
...     identifiers_miriam = 'MIR:00000080'
...     identifiers_namespace = 'hgnc'
...     identifiers_url = 'http://identifiers.org/hgnc/'
...
...     @staticmethod
...     def _get_identifier(model: HumanGene) -> str:
...         return model.hgnc_id
...
...     def _create_namespace_entry_from_model(self, model: HumanGene, namespace: Namespace) -> NamespaceEntry:
...         # `encodings` is assumed to be a mapping from locus type to BEL encoding, defined elsewhere
...         return NamespaceEntry(
...             encoding=encodings.get(model.locus_type, 'GRP'),
...             identifier=model.hgnc_id,
...             name=model.hgnc_symbol,
...             namespace=namespace,
...         )

has_names = True

Can be set to False for namespaces that don’t have labels

add_namespace_to_graph(graph)[source]

Add this manager’s namespace to the graph.

Return type

Namespace

upload_bel_namespace(update=False)[source]

Upload the namespace to the PyBEL database.

Parameters

update (bool) – Should the namespace be updated first?

Return type

Namespace

drop_bel_namespace()[source]

Remove the default namespace if it exists.

Return type

Optional[Namespace]

write_bel_namespace(file, use_names=False)[source]

Write as a BEL namespace file.

Return type

None

write_bel_annotation(file)[source]

Write as a BEL annotation file.

Return type

None

write_bel_namespace_mappings(file, **kwargs)[source]

Write a BEL namespace mapping file.

Return type

None

write_directory(directory)[source]

Write a BEL namespace for identifiers, names, name hash, and mappings to the given directory.

Return type

bool

get_namespace_hash(hash_fn=None)[source]

Get the namespace hash.

Defaults to MD5.

Return type

str

classmethod get_cli()[source]

Get a click main function with added BEL namespace commands.

Return type

Group

BEL Network

class bio2bel.manager.bel_manager.BELManagerMixin[source]

A mixin for generating a pybel.BELGraph representing BEL.

First, you’ll have to make sure that pybel is installed. This can be done with pip like:

$ pip install pybel

To use this mixin, you need to properly implement the bio2bel.AbstractManager, and additionally define a function named to_bel that returns a BEL graph.

>>> from bio2bel import AbstractManager
>>> from bio2bel.manager.bel_manager import BELManagerMixin
>>> from pybel import BELGraph
>>>
>>> class MyManager(AbstractManager, BELManagerMixin):
...     def to_bel(self) -> BELGraph:
...         pass

count_relations()[source]

Count the number of BEL relations generated.

Return type

int

abstract to_bel(*args, **kwargs)[source]

Convert the database to BEL.

Example implementation outline:

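from bio2bel import AbstractManager
from bio2bel.manager.bel_manager import BELManagerMixin
from pybel import BELGraph
from pybel.constants import DECREASES
# Assumed import path for the DSL functions used below
from pybel.dsl import mirna as mirna_dsl, rna as rna_dsl

from .models import Interaction

class MyManager(AbstractManager, BELManagerMixin):
    module_name = 'mirtarbase'

    def to_bel(self) -> BELGraph: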
        rv = BELGraph(
            name='miRTarBase',
            version='1.0.0',
        )

        for interaction in self.session.query(Interaction):
            mirna = mirna_dsl('mirtarbase', interaction.mirna.mirtarbase_id)
            rna = rna_dsl('hgnc', interaction.target.hgnc_id)

            rv.add_qualified_edge(
                mirna,
                rna,
                DECREASES,
                ...
            )

        return rv

Return type

BELGraph

to_indra_statements(*args, **kwargs)[source]

Dump as a list of INDRA statements.

Return type

List[indra.Statement]

classmethod get_cli()[source]

Get a click main function with added BEL commands.

Return type

Group

Organizing the Manager

This class should be importable from the top level of the package. In our example, this means that you can either import the manager class with from bio2bel_mirtarbase import Manager, or import bio2bel_mirtarbase and refer to bio2bel_mirtarbase.Manager.

This can be accomplished by importing the Manager in the top-level __init__.py.

# /src/bio2bel_mirtarbase/__init__.py

from .manager import Manager

__all__ = ['Manager']

__title__ = 'bio2bel_mirtarbase'
...

A full example of the __init__.py for miRTarBase can be found in the miRTarBase repository.

Making a Command Line Interface

The package should include a top-level module called cli.py. Normally, click can be used to build nice Command Line Interfaces like:

import click

@click.group()
def main():
    pass

@main.command()
def command_1():
    pass

However, if you’ve properly implemented an AbstractManager, then you can use AbstractManager.get_cli() to generate the main function and automatically implement several commands.

# /src/bio2bel_mirtarbase/cli.py

from .manager import Manager

main = Manager.get_cli()

if __name__ == '__main__':
    main()

This command line application will automatically have commands for populate, drop, and web. It can be extended with additional commands in the same way as main in the first example.

Additionally, if the optional function to_bel is implemented in the manager, then several other commands (e.g., to_bel_file, upload_bel, etc.) become available as well.

Setting up __main__.py

Finally, the top-level __main__.py should import main and should have 3 lines, reading exactly as follows:

# /src/bio2bel_mirtarbase/__main__.py

from .cli import main

if __name__ == '__main__':
    main()

Entry Points in setup.py

Bio2BEL uses the entry points loader to find packages in combination with setuptools’s entry_points argument.

# /setup.py

import setuptools

setuptools.setup(
    # ...
    entry_points={
        'bio2bel': [
            'mirtarbase = bio2bel_mirtarbase',
        ],
    },
    # ...
)

This directly enables the Bio2BEL CLI to operate using the package’s cli so it’s possible to call things like bio2bel mirtarbase populate or bio2bel mirtarbase drop.
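To see how entry point discovery works in general, the sketch below lists everything registered under a group using the standard library's importlib.metadata. This is an illustration of the mechanism only: the function name find_entry_points is hypothetical, and Bio2BEL's own loader may be implemented differently.

```python
import sys
from importlib import metadata
from typing import Dict


def find_entry_points(group: str = 'bio2bel') -> Dict[str, str]:
    """Map each entry point name in the group to the object reference it targets."""
    if sys.version_info >= (3, 10):
        # Newer Python versions support selecting a group directly
        entry_points = metadata.entry_points(group=group)
    else:
        # Older versions return a dictionary keyed by group name
        entry_points = metadata.entry_points().get(group, [])
    return {entry_point.name: entry_point.value for entry_point in entry_points}


# With bio2bel_mirtarbase installed, the result would be expected to contain
# an item mapping 'mirtarbase' to 'bio2bel_mirtarbase'
print(find_entry_points())
```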

Additionally, a command-line interface called bio2bel_mirtarbase should be registered that points directly to the main function in cli.py.

# /setup.py

import setuptools

setuptools.setup(
    # ...
    entry_points={
        'bio2bel': [
            'mirtarbase = bio2bel_mirtarbase',
        ],
        'console_scripts': [
            'bio2bel_mirtarbase = bio2bel_mirtarbase.cli:main',
        ],
    },
    # ...
)

Check the miRTarBase repository for a full example of a setup.py.

Testing

Though it’s not a requirement, writing tests is a plus. There are several testing classes available in bio2bel.testing to enable writing tests quickly.

# /tests/constants.py

from bio2bel.testing import make_temporary_cache_class_mixin
from bio2bel_mirtarbase import Manager

TemporaryCacheClassMixin = make_temporary_cache_class_mixin(Manager)

Additionally, this class can be subclassed directly to override the class-level populate function:

class PopulatedTemporaryCacheClassMixin(TemporaryCacheClassMixin):
    @classmethod
    def populate(cls):
        cls.manager.populate(url='... test data path ...')

Keep in mind that your populate function will probably have different argument names, especially if there are multiple files necessary to populate. Using test data instead of full source data is preferred for faster testing!
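The pattern of a class-level temporary cache can be sketched with the standard library alone. The classes below are hypothetical stand-ins: ToyManager uses sqlite3 rather than a real Bio2BEL manager, and ToyTemporaryCacheClassMixin only approximates the kind of set-up and tear-down such a testing mixin might perform, not what bio2bel.testing actually does.

```python
import os
import shutil
import sqlite3
import tempfile
import unittest


class ToyManager:
    """A stand-in for a Bio2BEL manager, backed by sqlite3 (illustration only)."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute('CREATE TABLE IF NOT EXISTS mirna (name TEXT)')

    def populate(self, names) -> None:
        self.conn.executemany('INSERT INTO mirna (name) VALUES (?)', [(n,) for n in names])
        self.conn.commit()

    def count_mirnas(self) -> int:
        return self.conn.execute('SELECT COUNT(*) FROM mirna').fetchone()[0]


class ToyTemporaryCacheClassMixin(unittest.TestCase):
    """Create one temporary database per test class and remove it afterwards."""

    @classmethod
    def setUpClass(cls) -> None:
        cls.directory = tempfile.mkdtemp()
        cls.manager = ToyManager(os.path.join(cls.directory, 'test.db'))
        cls.populate()

    @classmethod
    def tearDownClass(cls) -> None:
        cls.manager.conn.close()
        shutil.rmtree(cls.directory)

    @classmethod
    def populate(cls) -> None:
        """Override in subclasses to load test data."""


class TestPopulated(ToyTemporaryCacheClassMixin):
    @classmethod
    def populate(cls) -> None:
        # Small test data, not the full source
        cls.manager.populate(['mir-1', 'mir-2'])

    def test_count(self) -> None:
        self.assertEqual(2, self.manager.count_mirnas())


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestPopulated)
result = unittest.TextTestRunner().run(suite)
print(result.wasSuccessful())
```

Populating once per class rather than once per test keeps the suite fast when several tests read from the same data.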