Using Operon¶
The primary user interaction with Operon is through the command line interface, exposed as the executable operon
.
Subcommands are provided to install, manage, and run pipelines compatible with the Operon framework.
Initialization¶
Operon keeps track of pipelines and configurations, among other metadata, in a hidden directory. Before Operon can be used, this directory structure needs to be initialized:
$ operon init [init-path]
Where init-path
is an optional path pointing to the location where the hidden .operon
folder should be
initialized. If this path isn’t given it will default to the user’s home directory.
By default, whenever Operon needs to access the .operon
folder it will look in the user’s home directory. If the
.operon
folder has been initialized elsewhere, there must be a shell environment variable OPERON_HOME
which
points to the directory containing the .operon
folder.
Note
If a path other than the user’s home directory is given to the init
subprogram, it will attempt to add the
OPERON_HOME
environment variable to the user’s shell session in ~/.bashrc
, ~/.bash_profile
, or
~/.profile
.
After a successful initialization, a new shell session should be started for tab-completion and the OPERON_HOME
environment variable to take effect.
Pipeline Installation¶
To install an Operon compatible pipeline into the Operon system:
$ operon install /path/to/pipeline.py [-y]
The pipeline file will be copied into the Operon system and optionally Python package dependencies, as specified by
the pipeline just installed, can be installed into the current Python environment using pip
.
Caution
If install
attempts to install Python package dependencies, it will attempt to do so using the --upgrade
flag to pip
. If in the current Python environment those packages already exist, they will be either upgraded
or downgraded, which may cause other software to stop functioning properly.
Pipeline Configuration¶
To configure an Operon pipeline with platform-static values and optionally use Miniconda to install software executables that the pipeline uses:
$ operon configure <pipeline-name> [-h] [--location LOCATION] [--blank]
If this is the first time the pipeline has been configured and Miniconda is found in PATH
, then the configure
subprogram will attempt to create a new conda environment, install software instances that the pipeline uses, then
inject those software paths into the next configuration step. If a conda environment for this pipeline has been
created before, configure
can attempt to inject those software paths instead.
For the configuration step, Operon will ask the user to provide values for the pipeline which will not change from
run to run such as software paths, paths to reference files, etc. The question is followed by a value in brackets
([]
), which is the used value if no input is provided. If a conda environment is used, this value in brackets will
be the injected software path.
By default, the configuration file is written into the .operon
folder where it will automatically be called up
when the user runs operon run
. If --location
is given as a path, the configuration file will be written
out there instead.
Seeing Pipeline States¶
To see all pipelines in the Operon system and whether each has a corresponding configuration file:
$ operon list
To see detailed information about a particular pipeline, such as current configuration, command line options, any required dependencies, etc:
$ operon show <pipeline-name>
Run a Pipeline¶
To run an installed pipeline:
$ operon run <pipeline-name> [--pipeline-config CONFIG] [--parsl-config CONFIG] \
[--logs-dir DIR] [pipeline-options]
The set of accepted pipeline-options
is defined by the pipeline itself and are meant to be values that change from
run to run, such as input files, metadata, etc. Three options will always exist:
pipeline-config
can point to a pipeline config to use for this run onlyparsl-config
can point to a file containing JSON that represents a Parsl config to use for this run onlylogs-dir
can point to a location where log files from this run should be deposited; if it doesn’t exist, it will be created; defaults to the currect directory
When an Operon pipeline is run, under the hood it creates a Parsl workflow which can be run in many different ways
depending on the accompanying Parl configuration. This means that while the definition for a pipeline run with the
run
subprogram is consistent, that actual execution model may vary if the Parsl configuration varies.
Parsl Configuration¶
Parsl is the package the powers Operon and and is responsible for Operon’s powerful and flexible parallel execution. Operon itself is only a front-end abstraction of a Parsl workflow; the actual execution model is fully Parsl-specific and as such it’s advised to check out the Parsl documentation to get a sense for how to design a Parls configuration for a specific need-case.
The run
subprogram attempts to pull a Parsl configuration from the user in the following order:
- From the command line argument
--parsl-config
- From the pipeline configuration key
parsl_config
- From a platform default JSON file located at
$OPERON_HOME/.operon/parsl_config.json
- A default parsl configuration provided by the pipeline
- A package default parsl configuration of 8 workers using Python threads
For more detailed information, refer to AnotherPage
Command Line Help¶
All subcommands can be followed by a -h
, --help
, or help
to get a more detailed explanation for how it
should be used.