How to Make a Bio2BEL Package
In this tutorial, we explain how to make your own Bio2BEL package, using miRTarBase as an example. The miRTarBase package already exists and serves as a reference throughout.
Naming the Package
The package should be named bio2bel_XXX, with all lowercase letters for the name of the package, even if the source uses stylized capitalization. This means our example package will be called bio2bel_mirtarbase.
Note that the repository can be named differently from the package. On the Bio2BEL organization on GitHub, we have chosen to use simply a lowercase name of the source to eliminate redundancy in the URL.
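The naming rule can be sketched as a small helper. The function name package_name is hypothetical and not part of Bio2BEL; it only illustrates lowercasing the source name and adding the bio2bel_ prefix.

```python
def package_name(source: str) -> str:
    """Return the conventional Bio2BEL package name for a source name."""
    # Lowercase everything, even stylized names such as 'miRTarBase'.
    return 'bio2bel_' + source.strip().lower()


print(package_name('miRTarBase'))  # bio2bel_mirtarbase
```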
Organizing Constants
The package should have a top-level module named constants.py as an easily accessible location for constants. A variable called MODULE_NAME should be defined with the lowercase name of the source. Additionally, the functions bio2bel.get_data_dir() and bio2bel.get_connection() can be used to locate the appropriate directory for data and for configuration loading.
# /src/bio2bel_mirtarbase/constants.py
from bio2bel import get_data_dir, get_connection
MODULE_NAME = 'mirtarbase'
DATA_DIRECTORY_PATH = get_data_dir(MODULE_NAME)
DEFAULT_CONNECTION = get_connection(MODULE_NAME)
Making a Manager
There should be a concrete implementation of bio2bel.AbstractManager. For consistent style, we recommend implementing this in a top-level module called manager.py and naming the class Manager. Check the miRTarBase repository for an example of the package structure and an example of the implementation of a Manager.
class bio2bel.AbstractManager(*args, **kwargs)
This is a base class for implementing your own Bio2BEL manager.
It already includes functions to handle configuration, construction of a connection to a database using SQLAlchemy, and creation of the tables defined by your own sqlalchemy.ext.declarative.declarative_base(), and it has hooks to override that populate and make simple queries to the database. Since AbstractManager inherits from abc.ABC and is therefore an abstract class, there are a few class variables, functions, and properties that need to be overridden.
Overriding the Module Name
First, the class-level variable module_name must be set to a string corresponding to the name of the data source.
from bio2bel import AbstractManager

class Manager(AbstractManager):
    module_name = 'mirtarbase'  # note: use lower case module names
In general, this should also correspond to the same value as MODULE_NAME set in constants.py and can also be set with an assignment to this value:
from bio2bel import AbstractManager
from .constants import MODULE_NAME

class Manager(AbstractManager):
    module_name = MODULE_NAME
Setting the Declarative Base
Building on the previous example, the (private) abstract property bio2bel.AbstractManager._base must be overridden to return the value from your sqlalchemy.ext.declarative.declarative_base(). We chose to make this an instance-level property instead of a class-level variable so each manager could have its own information about connections to the database.
As a minimal example:
from sqlalchemy.ext.declarative import DeclarativeMeta, declarative_base
from bio2bel import AbstractManager

Base: DeclarativeMeta = declarative_base()

class Manager(AbstractManager):
    module_name = 'mirtarbase'  # note: use lower case module names

    @property
    def _base(self) -> DeclarativeMeta:
        return Base
In general, the models should be defined in a module called models.py so the Base can also be imported.
from sqlalchemy.ext.declarative import DeclarativeMeta
from bio2bel import AbstractManager
from .constants import MODULE_NAME
from .models import Base

class Manager(AbstractManager):
    module_name = MODULE_NAME

    @property
    def _base(self) -> DeclarativeMeta:
        return Base
Populating the Database
Deciding how to populate the database using your SQLAlchemy models depends heavily on the structure of the source data, so it is hard to give a good example without checking real code. See the previously mentioned implementation of a Manager.
from sqlalchemy.ext.declarative import DeclarativeMeta
from bio2bel import AbstractManager
from .constants import MODULE_NAME
from .models import Base

class Manager(AbstractManager):
    module_name = MODULE_NAME

    @property
    def _base(self) -> DeclarativeMeta:
        return Base

    def populate(self) -> None:
        ...
Checking the Database is Populated
A method for checking if the database has been populated already must be implemented as well. The easiest way to implement this is to check that there’s a non-zero count of whatever the most important model in the database is.
from sqlalchemy.ext.declarative import DeclarativeMeta
from bio2bel import AbstractManager
from .constants import MODULE_NAME
from .models import Base

# MyImportantModel is a placeholder for the most important model defined in models.py.

class Manager(AbstractManager):
    module_name = MODULE_NAME

    @property
    def _base(self) -> DeclarativeMeta:
        return Base

    def populate(self) -> None:
        ...

    def is_populated(self) -> bool:
        return 0 < self.session.query(MyImportantModel).count()
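The override pattern described above can be tried without installing Bio2BEL or SQLAlchemy. The following is a minimal sketch under stated assumptions: SketchAbstractManager and SketchManager are hypothetical stand-ins (not bio2bel.AbstractManager), an in-memory sqlite3 database replaces the SQLAlchemy session, and the rows are placeholders.

```python
import sqlite3
from abc import ABC, abstractmethod


class SketchAbstractManager(ABC):
    """Hypothetical stand-in for an abstract manager (not bio2bel.AbstractManager)."""

    module_name: str = ''

    def __init__(self) -> None:
        # An in-memory SQLite database stands in for the real connection.
        self.connection = sqlite3.connect(':memory:')
        self.connection.execute('CREATE TABLE mirna (id INTEGER PRIMARY KEY, name TEXT)')

    @abstractmethod
    def populate(self) -> None:
        """Fill the database from the source data."""

    @abstractmethod
    def is_populated(self) -> bool:
        """Check whether the database already contains data."""


class SketchManager(SketchAbstractManager):
    module_name = 'mirtarbase'

    def populate(self) -> None:
        # Two placeholder rows stand in for the parsed source data.
        rows = [('placeholder-mirna-1',), ('placeholder-mirna-2',)]
        self.connection.executemany('INSERT INTO mirna (name) VALUES (?)', rows)
        self.connection.commit()

    def is_populated(self) -> bool:
        # A non-zero count in the most important table means "populated".
        (count,) = self.connection.execute('SELECT COUNT(*) FROM mirna').fetchone()
        return 0 < count


manager = SketchManager()
before = manager.is_populated()
manager.populate()
after = manager.is_populated()
print(before, after)
```

Because the base class inherits from abc.ABC, trying to instantiate SketchAbstractManager directly raises a TypeError until both abstract methods are overridden.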
There are several mixins that can be optionally inherited:
bio2bel.manager.flask_manager.FlaskMixin: the Flask Mixin creates a Flask-Admin web application.
bio2bel.manager.namespace_manager.BELNamespaceManagerMixin: the BEL Namespace Manager Mixin exports a BEL namespace and interacts with PyBEL.
bio2bel.manager.bel_manager.BELManagerMixin: the BEL Manager Mixin exports a BEL script and interacts with PyBEL.
Build an abstract manager from either a connection or an engine/session.
The remaining keyword arguments are passed to build_engine_session().
Parameters: connection, engine, session
create_all(check_first=True)
Create the empty database (tables).
Parameters: check_first (bool): Defaults to True; don’t issue CREATEs for tables already present in the target database. Defers to sqlalchemy.sql.schema.MetaData.create_all().
drop_all(check_first=True)
Drop all tables from the database.
Parameters: check_first (bool): Defaults to True; only issue DROPs for tables confirmed to be present in the target database. Defers to sqlalchemy.sql.schema.MetaData.drop_all().
Mixins
Flask
class bio2bel.manager.flask_manager.FlaskMixin(*args, **kwargs)
A mixin for building a Flask-Admin interface.
This class can be used as a mixin, meaning that a class inheriting from AbstractManager can also multiple-inherit from this class. It contains functions to build a flask application for easy viewing of the contents of the database.
First, you’ll have to make sure that flask and flask-admin are installed. They can be installed with Bio2BEL using the package extra called “web” like:
$ pip install bio2bel[web]
Or, installed directly with pip:
$ pip install flask flask-admin
Besides this, all that’s necessary to use this mixin is to define the class variable flask_admin_models as a list of SQLAlchemy models you’d like to see.
>>> from sqlalchemy.ext.declarative import DeclarativeMeta
>>>
>>> from bio2bel import AbstractManager
>>> from bio2bel.manager.flask_manager import FlaskMixin
>>>
>>> from .constants import MODULE_NAME
>>> from .models import Base, Evidence, Interaction, Mirna, Species, Target
>>>
>>> class Manager(AbstractManager, FlaskMixin):
...     module_name = MODULE_NAME
...     flask_admin_models = [Evidence, Interaction, Mirna, Species, Target]
...
...     @property
...     def _base(self) -> DeclarativeMeta:
...         return Base
...
...     def populate(self) -> None:
...         ...
Build an abstract manager from either a connection or an engine/session.
The remaining keyword arguments are passed to build_engine_session().
Parameters: connection, engine, session
flask_admin_models: Union[sqlalchemy.ext.declarative.api.DeclarativeMeta, List[sqlalchemy.ext.declarative.api.DeclarativeMeta]]
Represents a list of SQLAlchemy classes to make a Flask-Admin interface.
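The mixin idea itself, a second base class that reads a class variable defined by the subclass, can be sketched in plain Python. Everything here is a hypothetical stand-in: SketchBase, SketchAdminMixin, admin_models, and list_admin_views are illustrative names and do not reflect how FlaskMixin is actually implemented.

```python
from typing import List, Union


class SketchBase:
    """Hypothetical stand-in for a manager base class."""

    module_name = ''


class SketchAdminMixin:
    """Hypothetical mixin that reads a class variable set by the subclass."""

    admin_models: Union[type, List[type]] = []

    def list_admin_views(self) -> List[str]:
        models = self.admin_models
        # Accept either a single class or a list of classes.
        if not isinstance(models, list):
            models = [models]
        return [model.__name__ for model in models]


class Mirna:
    """Placeholder model class."""


class Target:
    """Placeholder model class."""


class SketchManager(SketchBase, SketchAdminMixin):
    module_name = 'mirtarbase'
    admin_models = [Mirna, Target]


views = SketchManager().list_admin_views()
print(views)
```

The subclass only declares data; the behavior comes from the mixin, which is why defining a single class variable is enough to opt in.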
BEL Namespace
class bio2bel.manager.namespace_manager.BELNamespaceManagerMixin(*args, **kwargs)
A mixin for generating a BEL namespace file and uploading it to the PyBEL database.
First, you’ll have to make sure that pybel is installed. This can be done with pip like:
$ pip install pybel
To use this mixin, you need to properly implement the AbstractManager, and add additional class variables and functions.
namespace_model: The SQLAlchemy class that represents the entity to serialize into the namespace
>>> from bio2bel import AbstractManager
>>> from bio2bel.manager.namespace_manager import BELNamespaceManagerMixin
>>> from .models import HumanGene
>>>
>>> class MyManager(AbstractManager, BELNamespaceManagerMixin):
...     module_name = 'hgnc'
...     ...
...
...     namespace_model = HumanGene
Several fields from Identifiers.org should be populated, including:
identifiers_recommended
identifiers_pattern
identifiers_miriam
identifiers_namespace
identifiers_url
>>> from bio2bel import AbstractManager
>>> from bio2bel.manager.namespace_manager import BELNamespaceManagerMixin
>>> from .models import HumanGene
>>>
>>> class MyManager(AbstractManager, BELNamespaceManagerMixin):
...     module_name = 'hgnc'
...     ...
...
...     namespace_model = HumanGene
...     identifiers_recommended = 'HGNC'
...     identifiers_pattern = '...'
...     identifiers_miriam = 'MIR:00000080'
...     identifiers_namespace = 'hgnc'
...     identifiers_url = 'http://identifiers.org/hgnc/'
Two methods need to be implemented. First, the static method _get_identifier should take in the namespace model and give back the database identifier. For us, this is easy, since the HumanGene class has an attribute called hgnc_id.
Perhaps in the future, we will enforce the convention that the namespace model should have a field called <module name>_id, but having this method gives lots of flexibility.
This is also a good place to add more specific type annotations (not yet tested with MyPy).
>>> from bio2bel import AbstractManager
>>> from bio2bel.manager.namespace_manager import BELNamespaceManagerMixin
>>> from .models import HumanGene
>>>
>>> class MyManager(AbstractManager, BELNamespaceManagerMixin):
...     module_name = 'hgnc'
...     ...
...
...     namespace_model = HumanGene
...     identifiers_recommended = 'HGNC'
...     identifiers_pattern = '...'
...     identifiers_miriam = 'MIR:00000080'
...     identifiers_namespace = 'hgnc'
...     identifiers_url = 'http://identifiers.org/hgnc/'
...
...     @staticmethod
...     def _get_identifier(model: HumanGene) -> str:
...         return model.hgnc_id
Last, we must implement the method _create_namespace_entry_from_model, which encodes the logic of building a pybel.manager.models.NamespaceEntry from the Bio2BEL repository’s namespace model.
For a repository like ChEBI, this is very simple, but for HGNC there is reason to add additional logic to get the proper encodings.
>>> from bio2bel import AbstractManager
>>> from bio2bel.manager.namespace_manager import BELNamespaceManagerMixin
>>> from pybel.manager.models import Namespace, NamespaceEntry
>>> from .models import HumanGene
>>>
>>> class MyManager(AbstractManager, BELNamespaceManagerMixin):
...     module_name = 'hgnc'
...     ...
...
...     namespace_model = HumanGene
...     identifiers_recommended = 'HGNC'
...     identifiers_pattern = '...'
...     identifiers_miriam = 'MIR:00000080'
...     identifiers_namespace = 'hgnc'
...     identifiers_url = 'http://identifiers.org/hgnc/'
...
...     @staticmethod
...     def _get_identifier(model: HumanGene) -> str:
...         return model.hgnc_id
...
...     def _create_namespace_entry_from_model(self, model: HumanGene, namespace: Namespace) -> NamespaceEntry:
...         # encodings is assumed to be a dict from locus type to BEL encoding, defined elsewhere
...         return NamespaceEntry(
...             encoding=encodings.get(model.locus_type, 'GRP'),
...             identifier=model.hgnc_id,
...             name=model.hgnc_symbol,
...             namespace=namespace,
...         )
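The encodings lookup used in the example above could be shaped roughly as follows. This is only a sketch: the mapping entries, the HumanGeneSketch dataclass, and the get_encoding helper are assumptions made for illustration, and the identifiers are placeholders rather than real HGNC records.

```python
from dataclasses import dataclass

# Illustrative mapping from locus type to BEL namespace encoding letters
# (G = gene, R = RNA, P = protein, M = microRNA). The entries are assumptions.
encodings = {
    'gene with protein product': 'GRP',
    'RNA, micro': 'GRM',
    'RNA, long non-coding': 'GR',
}


@dataclass
class HumanGeneSketch:
    """Hypothetical stand-in for the HumanGene model."""

    hgnc_id: str
    hgnc_symbol: str
    locus_type: str


def get_encoding(model: HumanGeneSketch) -> str:
    # Fall back to 'GRP' for locus types missing from the mapping,
    # mirroring the default used in the example above.
    return encodings.get(model.locus_type, 'GRP')


micro = HumanGeneSketch('1', 'PLACEHOLDER_A', 'RNA, micro')
other = HumanGeneSketch('2', 'PLACEHOLDER_B', 'some unlisted locus type')
print(get_encoding(micro), get_encoding(other))
```

Keeping the mapping in one dictionary means new locus types can be supported without touching the method that builds the namespace entry.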
write_bel_namespace_mappings(file, **kwargs)
Write a BEL namespace mapping file.
BEL Network
class bio2bel.manager.bel_manager.BELManagerMixin
A mixin for generating a pybel.BELGraph representing BEL.
First, you’ll have to make sure that pybel is installed. This can be done with pip like:
$ pip install pybel
To use this mixin, you need to properly implement the bio2bel.AbstractManager, and additionally define a function named to_bel that returns a BEL graph.
>>> from bio2bel import AbstractManager
>>> from bio2bel.manager.bel_manager import BELManagerMixin
>>> from pybel import BELGraph
>>>
>>> class MyManager(AbstractManager, BELManagerMixin):
...     def to_bel(self) -> BELGraph:
...         pass
abstract to_bel(*args, **kwargs)
Convert the database to BEL.
Example implementation outline:
from bio2bel import AbstractManager
from bio2bel.manager.bel_manager import BELManagerMixin
from pybel import BELGraph
from pybel.constants import DECREASES
from .models import Interaction

# mirna_dsl and rna_dsl are assumed to be PyBEL DSL node constructors imported elsewhere.

class MyManager(AbstractManager, BELManagerMixin):
    module_name = 'mirtarbase'

    def to_bel(self):
        rv = BELGraph(
            name='miRTarBase',
            version='1.0.0',
        )
        for interaction in self.session.query(Interaction):
            mirna = mirna_dsl('mirtarbase', interaction.mirna.mirtarbase_id)
            rna = rna_dsl('hgnc', interaction.target.hgnc_id)
            rv.add_qualified_edge(
                mirna,
                rna,
                DECREASES,
                ...
            )
        return rv
Return type: BELGraph
Organizing the Manager
This class should be importable from the top level. In our example, this means that you can either import the manager class with from bio2bel_mirtarbase import Manager, or use import bio2bel_mirtarbase and refer to bio2bel_mirtarbase.Manager.
This can be accomplished by importing the Manager in the top-level __init__.py.
# /src/bio2bel_mirtarbase/__init__.py
from .manager import Manager
__all__ = ['Manager']
__title__ = 'bio2bel_mirtarbase'
...
A full example of the __init__.py for miRTarBase can be found in the miRTarBase repository.
Making a Command Line Interface
The package should include a top-level module called cli.py. Normally, click can be used to build nice Command Line Interfaces like:
import click

@click.group()
def main():
    pass

@main.command()
def command_1():
    pass
However, if you’ve properly implemented an AbstractManager, then you can use AbstractManager.get_cli()
to
generate the main function and automatically implement several commands.
# /src/bio2bel_mirtarbase/cli.py
from .manager import Manager
main = Manager.get_cli()
if __name__ == '__main__':
    main()
This command line application will automatically have commands for populate
, drop
, and web
. It can be
extended like main
from the first example as well.
Additionally, if the optional function to_bel
is implemented in the manager, then several other commands
(e.g., to_bel_file
, upload_bel
, etc.) become available as well.
Setting up __main__.py
Finally, the top-level __main__.py should import main and should have 3 lines, reading exactly as follows:
# /src/bio2bel_mirtarbase/__main__.py
from .cli import main
if __name__ == '__main__':
    main()
Entry Points in setup.py
Bio2BEL uses the entry points loader to find packages in combination with setuptools’s entry_points argument.
# /setup.py
import setuptools
setuptools.setup(
    ...
    entry_points={
        'bio2bel': [
            'mirtarbase = bio2bel_mirtarbase',
        ],
    },
    ...
)
This directly enables the Bio2BEL CLI to operate using the package’s cli so it’s possible to call things like
bio2bel mirtarbase populate
or bio2bel mirtarbase drop
.
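How registered packages can be discovered through an entry point group can be sketched with the standard library. This is not Bio2BEL's own loader, only an illustration of the general mechanism; find_registered_modules is a hypothetical helper name, and the result is empty when no package registering under the group is installed.

```python
from importlib.metadata import entry_points
from typing import Dict


def find_registered_modules(group: str = 'bio2bel') -> Dict[str, str]:
    """Map entry point names in a group to the objects they point to."""
    eps = entry_points()
    if hasattr(eps, 'select'):
        # Python 3.10 and later expose a selection interface.
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


modules = find_registered_modules()
print(modules)
```

With the setup.py shown above installed, the returned dictionary would be expected to contain a mirtarbase key pointing at bio2bel_mirtarbase.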
Additionally, a command-line interface called bio2bel_mirtarbase should be registered as well, pointing directly to the main function in cli.py.
# /setup.py
import setuptools
setuptools.setup(
    ...
    entry_points={
        'bio2bel': [
            'mirtarbase = bio2bel_mirtarbase',
        ],
        'console_scripts': [
            'bio2bel_mirtarbase = bio2bel_mirtarbase.cli:main',
        ],
    },
    ...
)
Check the miRTarBase repository for a full example of a setup.py.
Testing
Though it’s not a requirement, writing tests is a plus. There are several testing classes available in
bio2bel.testing
to enable writing tests quickly.
# /tests/constants.py
from bio2bel.testing import make_temporary_cache_class_mixin
from bio2bel_mirtarbase import Manager
TemporaryCacheClassMixin = make_temporary_cache_class_mixin(Manager)
Additionally, this class can also be subclassed directly to override the class-level populate function:
class PopulatedTemporaryCacheClassMixin(TemporaryCacheClassMixin):
    @classmethod
    def populate(cls):
        cls.manager.populate(url='... test data path ...')
Keep in mind that your populate function will probably have different argument names, especially if there are multiple files necessary to populate. Using test data instead of full source data is preferred for faster testing!
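The class-level populate pattern can be illustrated with the standard library alone. The sketch below is not how bio2bel.testing is implemented; SketchTemporaryCacheClassMixin is a hypothetical stand-in that creates one temporary SQLite database per test class and calls a populate classmethod once, and the inserted row is a placeholder for a small test data file.

```python
import os
import sqlite3
import tempfile
import unittest


class SketchTemporaryCacheClassMixin(unittest.TestCase):
    """Hypothetical stand-in: one temporary database shared by a test class."""

    @classmethod
    def setUpClass(cls) -> None:
        cls.directory = tempfile.TemporaryDirectory()
        cls.path = os.path.join(cls.directory.name, 'test.db')
        cls.connection = sqlite3.connect(cls.path)
        cls.connection.execute('CREATE TABLE mirna (name TEXT)')
        cls.populate()

    @classmethod
    def tearDownClass(cls) -> None:
        # Close the connection before removing the temporary directory.
        cls.connection.close()
        cls.directory.cleanup()

    @classmethod
    def populate(cls) -> None:
        """Override in subclasses to load test data."""


class TestPopulated(SketchTemporaryCacheClassMixin):
    @classmethod
    def populate(cls) -> None:
        # A single placeholder row stands in for a small test data file.
        cls.connection.execute("INSERT INTO mirna VALUES ('placeholder-mirna')")
        cls.connection.commit()

    def test_count(self) -> None:
        (count,) = self.connection.execute('SELECT COUNT(*) FROM mirna').fetchone()
        self.assertEqual(1, count)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestPopulated)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Populating once per class rather than once per test keeps the suite fast, which is the same motivation as preferring small test data over the full source data.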